What I'm actually more interested in is what Apple intends to use instead of ZFS.
All indications are that ZFS will not be the next filesystem for OS X, yet they'll need to replace HFS+ soon, seeing as it's the worst-performing and fewest-featured filesystem in use with this generation of operating systems. HFS+ is showing its age in multiple places, and Apple could really take advantage of some of the newer features available in modern filesystems to make a huge difference and increase the power and performance behind some of the current features (a la Time Machine), with techniques such as snapshots, transactional reads/writes, and more.
AFAIK the main argument against the use of ZFS is that it would be relatively resource hungry, making it a bad candidate for use in e.g. a MacBook Air. I have Googled, but couldn't find recent data about this. Does anybody know of any? If performance isn't good enough on some target hardware, could one do better, or would it require dropping features? Which ones?
Without encryption, compression, or deduplication turned on, I don't think there would be very much overhead: more than traditional file systems, definitely, but still relatively little.
Deduplication requires lots of memory (I think I saw 1-2GB per TB of data quoted someplace) and will increase CPU load a bit, but for certain workloads it can save a lot of writes and disk space (think virtual machine images or a dataset like Dropbox's).
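To put a rough number on the memory cost, here is a back-of-the-envelope sketch in Python. It assumes one in-memory table entry per unique block; the 320-byte entry size and the block sizes are assumed values for illustration, not figures from this thread, and the function name is made up.

```python
def dedup_table_ram_bytes(pool_bytes, block_bytes=128 * 1024, entry_bytes=320):
    """Rough RAM needed to hold one dedup-table entry per unique block.

    entry_bytes is an assumed per-entry cost, not a measured one.
    """
    unique_blocks = pool_bytes // block_bytes
    return unique_blocks * entry_bytes

TIB = 1024 ** 4
GIB = 1024 ** 3

# Smaller average blocks mean more entries, so more RAM per TiB of data.
for block_kib in (128, 64, 8):
    ram = dedup_table_ram_bytes(TIB, block_bytes=block_kib * 1024)
    print(f"{block_kib:>3} KiB blocks: {ram / GIB:.1f} GiB of table per TiB")
```

Under these assumptions the estimate lands in the same order of magnitude as the quoted figure for large blocks, and grows quickly as the average block size shrinks.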
I'm not involved in ZFS development (just an avid user) so take this with a grain of salt.
With new Core iX processors now being able to do AES in hardware, I wonder if the CPU requirement for encryption will go down significantly if and when support is enabled.
ZFS lets you turn on encryption/compression/dedupe on a per-filesystem basis, and filesystems are extremely light and easy to create in ZFS (one or several per home directory or installed app is normal), so it's easy enough to apply those features only where they're needed.
As ZFS checksums every block in the CPU anyway, doing an additional crypto step isn't all that bad, especially in a server environment where cores are plentiful.
ZFS will also verify the checksum of each data block, which means that by default it is not a zero-copy filesystem. sendfile(), recvfile(), etc. will all require at least one read pass.
This isn't necessarily a bad thing--in my experience, ZFS has caught on-disk bit rot several times. However, it may not be worth it on an iPhone or other low-power device. Fortunately, it can be disabled.
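The verify-on-read behaviour described above can be sketched with a toy block store in Python. This is only an illustration of the idea, not how ZFS is implemented: the class and its fields are hypothetical, and SHA-256 is used as a stand-in checksum.

```python
import hashlib

class ChecksummedStore:
    """Toy block store that keeps a checksum alongside each block."""

    def __init__(self):
        self.blocks = {}     # block id -> bytearray (the "on-disk" data)
        self.checksums = {}  # block id -> digest recorded at write time

    def write(self, block_id, data):
        self.blocks[block_id] = bytearray(data)
        self.checksums[block_id] = hashlib.sha256(data).digest()

    def read(self, block_id):
        # Every read has to pass over the data once to verify it,
        # which is why a plain zero-copy path is not possible here.
        data = bytes(self.blocks[block_id])
        if hashlib.sha256(data).digest() != self.checksums[block_id]:
            raise IOError(f"checksum mismatch in block {block_id}")
        return data

store = ChecksummedStore()
store.write(0, b"hello world" * 100)
print("clean read ok:", store.read(0).startswith(b"hello"))

store.blocks[0][5] ^= 0x01  # flip a single bit to simulate bit rot
try:
    store.read(0)
except IOError as exc:
    print("detected:", exc)
```

Skipping the comparison in `read` is the toy equivalent of disabling checksums: reads get cheaper, and the flipped bit goes unnoticed.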
I recently got Time Machine working on a network drive that was ZFS underneath (with deduplication turned on) and it didn't make a very big difference.
Deduplication would be nice, but I think it won't show a lot of benefit under many common workloads. Also, it can use up a ton of memory for larger drives (it would be a huge boon for SSDs, though).
Deduplication matters most for people who edit big files. In my case, I have some virtual machines with hard drive images stored on my main filesystem. If I didn't exclude them from Time Machine, they would completely fill my backup drive in a matter of days, but with block-level deduplication, there would only be slight overhead compared with storing the same data directly on the HFS+ file system instead of in the images.
The reason deduplication won't help much with large mail archives is that it operates at the block level. In order to be effective, you need to have two copies of the same data that are also divided along the same block boundaries (the second part is arguably the bigger problem).
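Here is a small Python sketch of the boundary problem. It models fixed-size block deduplication in the simplest possible way; the 4 KiB block size and the helper names are assumptions for illustration.

```python
import hashlib
import os

BLOCK = 4096  # assumed block size, for illustration only

def block_hashes(data, block=BLOCK):
    """Hash each fixed-size block, as a block-level deduplicator would see it."""
    return [hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)]

def shared_blocks(old, new):
    """Count blocks of `new` whose exact content already exists in `old`."""
    known = set(block_hashes(old))
    return sum(1 for h in block_hashes(new) if h in known)

original = os.urandom(BLOCK * 64)        # 64 blocks of random data
appended = original + os.urandom(BLOCK)  # same boundaries, one new block at the end
shifted = b"X" + original                # one inserted byte moves every boundary

for name, data in (("appended", appended), ("shifted", shifted)):
    print(f"{name}: {shared_blocks(original, data)} of "
          f"{len(block_hashes(data))} blocks already stored")
```

The appended copy reuses every old block, while the copy with a single byte inserted at the front shares none, even though almost all of the data is identical.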
Also, many applications (backup utilities, VCSes, etc.) already try to avoid duplicate data to varying degrees, which makes dedup even less beneficial.
If you are dealing with a lot of text, though (like in a mail archive, or with lots of source code) compression really shines. It has way less memory overhead, pretty low CPU overhead, and it can significantly increase disk throughput.
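A quick way to see the difference between text and incompressible data is the Python sketch below. It uses zlib from the standard library as a stand-in compressor, and the sample mail text is made up and far more repetitive than a real archive, so the ratio it shows is an upper bound rather than a benchmark.

```python
import os
import zlib

# Made-up, highly repetitive sample; real mail compresses less dramatically.
message = ("From: alice@example.com\nTo: bob@example.com\n"
           "Subject: meeting notes\n\nSee you at ten tomorrow.\n")
text = (message * 2000).encode()
noise = os.urandom(len(text))  # stands in for already-compressed or encrypted data

for name, data in (("text", text), ("random", noise)):
    packed = zlib.compress(data, 6)
    print(f"{name}: {len(data)} -> {len(packed)} bytes")
```

Fewer bytes on disk also means fewer bytes to read back, which is where the throughput gain comes from when the disk is the bottleneck.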
You'll notice I was careful to say "under many common workloads" in my other post; it was because of all this. For most people, most of the time, there would just be no real benefit and a decent performance hit with dedup.
You're missing my point: it's not duplicate blocks in a single mail archive that I'm concerned about, it's duplicate blocks across multiple versions of the same mail archive.
New mails are typically appended to the end of the archive, but the rest of it remains more or less the same across versions.
Also, a well-written block-level deduplication algorithm will work regardless of boundary shifts. I designed and wrote the proprietary block-level algorithm used in Genie Timeline, and can tell you that it saves terabytes of data in corporations for PSTs alone. It doesn't rely on fixed block boundaries, and it is capable of detecting slid or shifted data.
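For readers wondering how shifted data can be detected at all, one publicly known approach is content-defined chunking, where chunk boundaries are chosen by a rolling hash of the data itself. The Python sketch below is a generic illustration of that idea and is not the Genie Timeline algorithm, which is not described here; the window size, mask, and hash constants are all assumed values.

```python
import hashlib
import os

WINDOW = 48                        # bytes covered by the rolling hash
MASK = (1 << 11) - 1               # cut when the low 11 bits are zero (~2 KiB chunks)
BASE = 257
MOD = (1 << 61) - 1
TOP = pow(BASE, WINDOW - 1, MOD)   # weight of the byte that leaves the window

def chunks(data):
    """Split data wherever a rolling hash of the last WINDOW bytes hits a pattern."""
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])                    # trailing partial chunk
    return out

def shared(old, new):
    """Count chunks of `new` whose content already appears among chunks of `old`."""
    known = {hashlib.sha256(c).digest() for c in chunks(old)}
    return sum(hashlib.sha256(c).digest() in known for c in chunks(new))

original = os.urandom(256 * 1024)
shifted = b"X" + original          # one inserted byte at the very front
new_chunks = chunks(shifted)
print(shared(original, shifted), "of", len(new_chunks), "chunks already stored")
```

Because each cut depends only on nearby bytes, the boundaries realign shortly after the inserted byte and nearly all later chunks match. The price is variable-sized chunks, which a filesystem built around fixed-size blocks cannot adopt directly.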
I'm admittedly not an expert, but isn't that a very different scenario?
In other words, ZFS operates at the block level: it can't make any assumptions about the contents other than "this block is the same as this block" or "this block is not the same as this block." It can't (based on my understanding) realize that "this block is almost the same as this block" and shift things accordingly, because then it would be messing with application data that might be corrupted by the changes.
If the application is making those shifts, on the other hand, it would work just fine, because the application could be smart enough to shuffle and unshuffle things in the most efficient way it knows how. The file system doesn't have that luxury.
If I'm still missing something, please let me know. Anyway, it'd be pretty straightforward to properly test this so maybe I'll do that.
Time Machine uses hard links within its directory structure, so dedupe or CoW wouldn't be much help unless it was redesigned to use ZFS-specific features (i.e., instead of rebuilding an entire hard link structure as TM does, you'd clone the TM filesystem, then apply the source changes).
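To illustrate the hard-link approach, here is a simplified Python model of a backup that links unchanged files and copies changed ones. It is only a sketch of the general technique: the `snapshot` function and the file names are hypothetical, and it compares file contents directly rather than using whatever change detection Time Machine actually relies on.

```python
import os
import tempfile

def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()

def snapshot(source_files, prev_dir, new_dir):
    """Create new_dir, hard-linking files that are unchanged since prev_dir."""
    os.makedirs(new_dir)
    for name, content in source_files.items():
        target = os.path.join(new_dir, name)
        previous = os.path.join(prev_dir, name) if prev_dir else None
        if previous and os.path.exists(previous) and read_bytes(previous) == content:
            os.link(previous, target)      # unchanged: reuse the existing inode
        else:
            with open(target, "wb") as f:  # new or changed: store a full copy
                f.write(content)

def same_inode(a, b):
    return os.stat(a).st_ino == os.stat(b).st_ino

with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "backup-1")
    second = os.path.join(root, "backup-2")
    snapshot({"a.txt": b"same", "disk.img": b"old image"}, None, first)
    snapshot({"a.txt": b"same", "disk.img": b"new image"}, first, second)
    shared_a = same_inode(os.path.join(first, "a.txt"), os.path.join(second, "a.txt"))
    shared_b = same_inode(os.path.join(first, "disk.img"), os.path.join(second, "disk.img"))

print("a.txt shared:", shared_a)
print("disk.img shared:", shared_b)
```

Sharing happens per file, so a large file with one changed block is stored again in full. That is the gap a snapshot or clone at the filesystem level would close.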