Hacker News
Amazon S3 file system with improved caching: the itch I scratched over Christmas (github.com/russross)
36 points by russross on Dec 28, 2009 | hide | past | favorite | 9 comments


This project has one of the best READMEs I've seen in recent days of looking at a lot of open source code. Good overview, well written.


Very cool. This looks like a real version of the Ruby fusefs I wrote to grok all of the s3organizer vs. s3sync vs. whatever schemes for differentiating files vs. directories in S3:

http://github.com/stephenh/s3fsr

Since I used Ruby's fusefs, nothing is streamed and it's single-threaded, limitations I assume this C++ implementation doesn't have to deal with.


Would it be possible to reuse an existing HTTP caching solution like Squid or nginx for the caching, since S3 exposes a REST API?


I'm not sure it would interact nicely with the request authentication system that S3 uses.
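To make the wrinkle concrete: every S3 REST request carries an Authorization header derived from an HMAC over the verb, date, and resource, so two fetches of the same object a second apart are not byte-identical, which complicates keying a shared HTTP cache on the signed traffic. A minimal sketch of the signing scheme S3 used at the time (Signature Version 2); the function name and arguments here are illustrative, not from s3fslite:

```python
import base64
import hashlib
import hmac

def s3_auth_header(access_key, secret_key, verb, resource, date,
                   content_md5="", content_type=""):
    """Build the Authorization header for an S3 REST request (Signature V2)."""
    # The string-to-sign concatenates the request essentials, newline-separated.
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    # Sign with HMAC-SHA1 under the account's secret key, then base64-encode.
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    return "AWS %s:%s" % (access_key, base64.b64encode(digest).decode())
```

Because the Date header is part of the string-to-sign, the signature changes from request to request, which is part of why a pass-through cache layer below the file system is simpler than caching the signed HTTP traffic itself.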

I think a generic cache layer would be a better solution. A bit of googling turns up fuse-cache, which sounds like roughly the right thing (although I haven't examined it in detail), and FS-Cache, which also sounds like a discrete cache layer that can be added to any file system. Basically, you mount it, and it passes requests through to any other mount (like s3fslite) while adding an on-disk cache layer.

I haven't tested any of these, but that seems like an approach worth pursuing.

- Russ


Very nice! I was rather disappointed with the FuseOverAmazon version I tried a couple of months ago. I will definitely give this a try. Thanks for the great documentation on how to use it as well.

Great job!


Neat! Rather than having to issue a find, perhaps a background task that primes the cache as soon as you mount?


I'm hesitant to automatically fire off that many requests, especially when they may not end up being necessary. If you are using the same machine and preserving the cache, it will already be primed each time you mount the bucket, except the first time (or any time you delete the cache database file).

Using find is just a trick I used whenever I'd corrupt the cache or change the DB schema while developing it, and then wanted to go in and test it again interactively.
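For reference, the find trick amounts to a single recursive walk of the mount point, which touches every entry once and leaves the attribute cache warm; the path here is hypothetical:

```shell
# Hypothetical mount point; substitute wherever the bucket is mounted.
MOUNT=/mnt/bucket
# One recursive walk stats every directory and file, priming the cache.
[ -d "$MOUNT" ] && find "$MOUNT" > /dev/null
```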

I should probably mention that reducing the number of requests was one of my primary goals. The first time I played with s3fs (the one I forked), my bill for the month was roughly 10% storage and bandwidth, and 90% requests (or was it 20/80?).

Anyway, thanks for the feedback; I do appreciate it!

- Russ


Does this use http to upload everything to S3?


Yes, it does. Adding https as an option is something I'll probably look into.

edit: It uses libcurl for transfers, and libcurl supports https, so getting a secure connection is as simple as adding the option:

    url=https://s3.amazonaws.com
at mount time.

I've added that to the README file.
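For anyone trying this, the full invocation would look something like the following, assuming s3fslite keeps the s3fs command-line form (bucket name, then mount point, then -o options); the bucket and mount point are placeholders:

```shell
# Mount a bucket with transfers over https instead of plain http.
s3fs mybucket /mnt/mybucket -o url=https://s3.amazonaws.com
```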



