[HN Gopher] ZFS on a single core RISC-V hardware with 512MB
       ___________________________________________________________________
        
       ZFS on a single core RISC-V hardware with 512MB
        
       Author : magicalhippo
       Score  : 77 points
       Date   : 2022-03-13 17:16 UTC (5 hours ago)
        
 (HTM) web link (andreas.welcomes-you.com)
 (TXT) w3m dump (andreas.welcomes-you.com)
        
       | 2Gkashmiri wrote:
        | Hope this, coupled with more tech on RISC-V hardware, can bring
        | it to the level of the Raspberry Pi, with all the community,
        | hardware devices, accessories and all that.
        | 
        | Will it take a decade? Less?
        
         | FullyFunctional wrote:
         | I hope it won't be a decade, but remember the (original)
         | Raspberry Pi launched on a very mature part, with a _very_
         | mature (ancient) ISA.
         | 
          | Outside the discounted pricing, Intel has promised to tape out
          | SiFive's P650. Rivos, Tenstorrent, and others are also working
          | on fast cores, but it'll be at least 2-3 years before they hit
          | the market, if they ever do.
          | 
          | So far SiFive's dual-issue in-order core (Geekbench 5.4.1 score
          | ~40, like on the now-cancelled BeagleV) is the fastest chip you
          | can buy as a lay person. The D1 (Geekbench 5.4.1 score ~32) is
          | cheaper but less powerful.
        
       | michaelmrose wrote:
        | There has never been a reason for memory to be correlated with
        | storage capacity, nor any reason to believe that such a
        | correlation ought to exist.
        | 
        | Nobody ever said, "Well, I plugged in a 20TB external hard drive,
        | so I'd better plug in a few more sticks of RAM so it works."
        | 
        | Dedup needs RAM in proportion to storage because it maintains an
        | entry in an in-memory table for every block it stores.
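        | 
        | As a rough back-of-envelope sketch (the ~320 bytes per entry and
        | the 128K recordsize are assumptions, not exact ZFS figures):
        | 
        |     # Rough estimate of ZFS dedup table (DDT) memory use.
        |     # Assumes ~320 bytes of RAM per unique 128K block; the real
        |     # numbers depend on pool layout and record size.
        |     BYTES_PER_DDT_ENTRY = 320      # assumed average
        |     RECORDSIZE = 128 * 1024        # default 128K records
        | 
        |     def ddt_ram_bytes(storage_bytes):
        |         return storage_bytes / RECORDSIZE * BYTES_PER_DDT_ENTRY
        | 
        |     TIB = 1024 ** 4
        |     for tib in (1, 2, 10):
        |         print(f"{tib} TiB of unique blocks -> "
        |               f"~{ddt_ram_bytes(tib * TIB) / 1024**3:.1f} GiB DDT")
        |     # 1 TiB -> ~2.5 GiB of DDT, so a 512MB board covers at most a
        |     # couple hundred GB of deduped data before spilling to disk.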
        
         | lazide wrote:
          | You, uh, just kind of contradicted yourself?
          | 
          | All file systems have metadata which is good to keep in memory.
          | Having built several 50+TB NAS boxes recently, I can say it
          | isn't just ZFS either. And the cost of not having enough RAM
          | isn't always some linear performance penalty; it can be kernel
          | panics, exponential decay in performance, etc.
        
           | michaelmrose wrote:
            | I didn't contradict myself at all: virtually nobody needs
            | dedup, and it's not remotely worth the RAM cost for 99.9% of
            | users.
            | 
            | Can you quantify what you are saying? What OS/filesystem?
            | What minimum RAM requirements for what amount of storage?
        
             | lazide wrote:
              | That's a good point - I've seen free memory drop every time
              | I've built the larger file systems (and not just from the
              | cache), but I never tried to quantify it. And I don't see
              | any good stats or notes on it.
              | 
              | Seems like no one is building these larger systems on boxes
              | small enough for it to matter, or at least Google isn't
              | finding it.
        
               | michaelmrose wrote:
                | Another way of saying this is that RAM usage doesn't
                | meaningfully scale with storage size for the storage
                | systems actual, non-theoretical people encounter, because
                | the minimum RAM available on any system one encounters is
                | sufficient to service the amount of storage it is
                | possible to use on said system.
        
         | magicalhippo wrote:
          | > There has never been a reason for memory to be correlated
          | with storage capacity, nor any reason to believe that such a
          | correlation ought to exist.
          | 
          | However, specific implementations can indeed have memory
          | requirements that scale in relation to storage capacity. For
          | example, if the implementation keeps the bitmap of free space
          | in memory, then more storage = larger bitmap = more memory
          | required.
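          | 
          | As a toy illustration of that kind of scaling (the flat bitmap
          | and 4K block size are just assumptions for the arithmetic; ZFS
          | itself tracks free space with space maps, not a single bitmap):
          | 
          |     # How big does an in-memory free-space bitmap get as the
          |     # pool grows? One bit per assumed 4 KiB block.
          |     BLOCK = 4 * 1024
          | 
          |     def bitmap_bytes(storage_bytes):
          |         return storage_bytes // BLOCK // 8
          | 
          |     TB = 10 ** 12
          |     for tb in (2, 20):
          |         print(f"{tb} TB -> {bitmap_bytes(tb * TB) / 2**20:.0f} MiB")
          |     # 2 TB -> ~58 MiB, 20 TB -> ~582 MiB: a big chunk of a
          |     # 512MB system before any data is even cached.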
         | 
          | There have been several attempts in ZFS to reduce its memory
          | overhead. I'm pretty sure that if you took a decade-old version
          | of ZFS you'd struggle to run it on a system with 512MB of RAM.
        
           | michaelmrose wrote:
            | At present 512MB of RAM is notable for how ridiculously tiny
            | it is, while 2TB is still an acceptable amount of storage.
            | Without resorting to decades-obsolete software, can you pin
            | down exactly how much storage it would take to render that
            | tiny amount of RAM unusable, and then how much storage it
            | would take to render a machine with 4GB of RAM likewise
            | unusable, so that we may demonstrate memory usage scaling
            | with storage?
        
       | FullyFunctional wrote:
        | Ha, this is awesome, thanks for checking that out. One point of
        | note though: I'm pretty sure it would have been faster to build
        | the kernel + OpenZFS on Debian/RISC-V in QEMU. QEMU on decent
        | hardware runs very fast, much faster than the D1.
       | 
       | ADD: Geekbench 5.4.1 on RISC-V
       | 
       | - under QEMU/Ryzen 9 3900XT: 82
       | 
       | - under QEMU/M1: 76
       | 
       | - Native D1: 32 (https://browser.geekbench.com/v5/cpu/13259016)
       | 
        | The M1 result is skewed because, for some reason, AES emulation
        | is much faster on the Ryzen. The rest of the integer workloads
        | run faster on the M1, by up to 30%.
        
       | bombcar wrote:
        | Anyone have low-power RISC-V or ARM hardware that supports many
        | SATA ports?
        
         | mustache_kimono wrote:
          | Don't know why no one has made such a board a priority. Seems
          | like a sweet spot.
        
         | vorpalhex wrote:
         | Would also love to see this, even if it's experimental or beta.
         | 
         | PiBox is the only contender I am aware of.
        
       | dark-star wrote:
       | But can it do dedupe on such a box? I think the recommendation is
       | still "1GB of RAM for each TB of storage" if you're using
       | dedupe...
       | 
        | I still have some boards with ~512MB of RAM lying around (an
        | UltraSPARC, for example) that I'd love to repurpose as a cheap
        | NAS, just for the heck of doing it on a non-x86 platform...
        
         | Wowfunhappy wrote:
          | Yeah, I think the author may be mixing up recommendations for
          | dedup vs non-dedup. The solution is always to not enable dedup;
          | it's a niche feature that's not worthwhile outside of _very_
          | specific scenarios.
        
           | R0b0t1 wrote:
            | The speed you want them to run at is a factor also. The rule
            | of thumb hasn't applied for a while; he's right to note that
            | in the post.
        
             | Wowfunhappy wrote:
             | I thought it does generally apply for dedup, though,
             | because ZFS is then required to keep the dedup tables in
             | memory?
        
         | lazide wrote:
          | I've tried dedup out, and even with a large, powerful box and a
          | LOT of duplicate files (multi-TB repositories of media files
          | which get duplicated several times due to coarse snapshotting
          | from other, less fancy systems), I got near-zero deduplication.
          | I think it was literally in the low single-digit percents.
          | 
          | ZFS dedup is block-based, and the actual block size varies with
          | the data feed rate for most workloads (ZFS queues up async
          | writes and merges them), so in practice, once a file gets some
          | non-zero block offset somewhere, which happens all the time,
          | even identical files don't dedup.
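          | 
          | A toy sketch of the general failure mode (generic fixed-block
          | hashing, not ZFS's actual record logic): once identical content
          | no longer lines up on the same block boundaries, the block
          | hashes stop matching.
          | 
          |     # Generic illustration: block-aligned dedup only matches
          |     # blocks whose contents line up exactly on boundaries.
          |     import hashlib, os
          | 
          |     BLOCK = 4096  # assumed block size for the illustration
          | 
          |     def block_hashes(data: bytes):
          |         return {hashlib.sha256(data[i:i + BLOCK]).hexdigest()
          |                 for i in range(0, len(data), BLOCK)}
          | 
          |     payload = os.urandom(1 << 20)        # 1 MiB of data
          |     shifted = b"\x00" * 7 + payload      # same data, shifted 7 bytes
          | 
          |     a, b = block_hashes(payload), block_hashes(shifted)
          |     print(f"shared blocks: {len(a & b)} of {len(a)}")  # typically 0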
        
           | Wowfunhappy wrote:
           | Wow, that's worse than I realized! Honestly, this makes me
           | wonder whether the feature should even exist in ZFS. Given
           | the enormous hardware requirements and minimal savings...
           | well, I'd be curious to hear if anyone has ever found a real
           | use case.
        
             | FullyFunctional wrote:
              | Does anyone actually use dedup? I think even the OpenZFS
              | documentation says compression is more useful in practice.
              | If anything, dedup should be an offline feature, run on a
              | schedule set by the operator.
             | 
              | My setup tries to get the absolute highest bandwidth: NVMe
              | sticks in a stripe (I get my redundancy elsewhere), no
              | compression, no dedup, and yet it can only hit ~3.5 GB/s
              | reads (TrueNAS Core, EPYC 7443P, Samsung 980 PRO, 256 GiB).
              | I hope TrueNAS SCALE will perform better.
        
               | watersb wrote:
                | My first ever large (>4TB) ZFS pool is still stuck with
                | dedup. It's a backup server and gets about 2x with
                | deduplication.
               | 
               | At the time, it was the difference between slow and
               | impossible: I couldn't afford another 2x of disks.
               | 
               | These days, the pool could fit on a portable SSD that
               | would fit in my pocket.
               | 
                | Careful file-based dedup on top of ZFS might be more
                | effective.
               | 
                | Small changes to single, large files see some advantage
                | with block-based deduplication. You see this in
                | collections of disk images for virtual machines.
                | 
                | You might see that in database applications, depending on
                | log structure. I don't know; I don't have that
                | experience.
               | 
                | For most of us, file-based deduplication might work out
                | better, and it is almost certainly easier to understand.
                | You can come up with a mental model of what you're
                | working with: dealing with successive collections of
                | files.
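                | 
                | As a minimal sketch of the idea (hash whole files, then
                | hardlink exact duplicates; real tools like jdupes or
                | rdfind handle permissions, races and partial hashing far
                | more carefully):
                | 
                |     # Minimal file-level dedup sketch: replace exact
                |     # duplicate files under a directory with hardlinks.
                |     import hashlib, os, sys
                | 
                |     def sha256(path, bufsize=1 << 20):
                |         h = hashlib.sha256()
                |         with open(path, "rb") as f:
                |             for chunk in iter(lambda: f.read(bufsize), b""):
                |                 h.update(chunk)
                |         return h.hexdigest()
                | 
                |     seen = {}  # digest -> first path with that content
                |     for root, _, files in os.walk(sys.argv[1]):
                |         for name in files:
                |             p = os.path.join(root, name)
                |             if os.path.islink(p) or not os.path.isfile(p):
                |                 continue
                |             d = sha256(p)
                |             if d in seen:
                |                 os.unlink(p)
                |                 os.link(seen[d], p)  # dedup via hardlink
                |             else:
                |                 seen[d] = p
                |     # Note: hardlinked copies share future writes, which
                |     # is fine for read-only backup trees.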
               | 
               | Even though files are just another abstraction over
               | blocks, it's an abstraction that leaks less without the
               | deduplication.
               | 
               | I haven't used a combination of encryption and
               | deduplication. That was Really Hard for ZFS to implement,
               | and I'm not sure how meaningful such a combination is in
               | practice.
        
               | bombcar wrote:
                | It would be nice if ZFS were able to combine dedup and
                | compression - basically be able to notice that a
                | block/file/datastream is similar/identical to another
                | one, and do compression along with a pointer...
        
               | lazide wrote:
                | Practically speaking, the tradeoffs needed to make that
                | work are unlikely to make you or anyone else happy except
                | in some VERY specific workloads.
        
               | willis936 wrote:
                | ZFS can have both features enabled at once.
                | 
                | There is no clean way to disable either, though.
                | Compression can be removed from files by rewriting them,
                | but removing deduplication requires copying all the data
                | over to a fresh pool.
        
           | mlok wrote:
            | ZFS dedup has been wonderful for me: dedupratio = 7.05x (144
            | GB stored on a 25 GB volume, and still 1.3 GB left free). I
            | use it for backups of versions of the same folders and files
            | slowly evolving over a long period of time (>15 years), which
            | gives a lot of duplication, of course. (I could also use
            | compression on top of it.)
        
           | magicalhippo wrote:
            | While regular dedup is only a win for _highly_ specific
            | workloads, the file-based deduplication[1][2] which is in the
            | works seems like it could have some potential.
           | 
           | They discussed it, along with some options for a background-
           | scanning dedup service (trying to find potential files to
           | dedup), in the February leadership meeting[3].
           | 
           | [1]: https://openzfs.org/wiki/OpenZFS_Developer_Summit_2020_t
           | alks...
           | 
           | [2]: https://youtu.be/hYBgoaQC-vo
           | 
           | [3]: https://www.youtube.com/watch?v=hij7PGGjevc
        
           | spullara wrote:
           | Wow, that is a very naive dedup algorithm.
        
             | lazide wrote:
              | Without restricting pretty heavily how you can interact
              | with files, or causing severe bottlenecks, it's probably
              | the best that can be done, since the FS API doesn't provide
              | any real guarantees about what data WILL be written later,
              | or how much of it, etc. So it has to figure things out as
              | it goes, with minimal performance impact.
        
       ___________________________________________________________________
       (page generated 2022-03-13 23:00 UTC)