Since the author is using Redis for a system like that, I wanted to do some math about how many documents it can store per gigabyte.
Assuming a 64-bit instance (which is more memory hungry) and an average paste size of 512 bytes, every 2 million documents require 1 GB (just tested with redis-benchmark).
This means that if you receive a paste every minute, it takes about 4 years to use 1 GB of memory.
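A quick back-of-the-envelope check of those numbers (ignoring Redis's per-key overhead, which the redis-benchmark test would have absorbed):

```python
GIB = 2**30                 # bytes in 1 GiB
paste_size = 512            # assumed average paste size in bytes

# Documents that fit in 1 GiB, ignoring per-key overhead.
docs_per_gib = GIB // paste_size
print(docs_per_gib)         # 2097152, i.e. roughly 2 million

# At one paste per minute, time needed to accumulate that many documents.
minutes_per_year = 60 * 24 * 365
years = docs_per_gib / minutes_per_year
print(round(years, 1))      # 4.0
```

So "2 million documents per GB" and "4 years at one paste per minute" are consistent with each other.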
Many problems that at first glance appear hard to handle with an in-memory DB turn out, on closer inspection, to be quite addressable.
That said, this is the kind of problem where the working set is very small compared to the whole set of documents stored, and where documents are written rarely and accessed mostly read-only, so a *SQL DB would work very well for this use case.
A Redis server could be a very good addition for real-time stats about accesses to the documents: the number of times each document was read, a sorted set of the latest created documents, the top documents of the month by page views, and so forth.
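Each of those stats maps to a single Redis command (INCR for the read counter, ZADD for the latest-created set, ZINCRBY for the monthly top). A minimal sketch of the bookkeeping, modeled with plain Python dicts so it runs without a server; the key names in the comments (views:&lt;id&gt;, recent, top:&lt;month&gt;) are my own invention:

```python
from collections import defaultdict

# In Redis: INCR views:<id>  /  ZADD recent <ts> <id>  /  ZINCRBY top:<month> 1 <id>
views = defaultdict(int)                     # per-document read counters
recent = []                                  # (timestamp, doc_id) pairs
top = defaultdict(lambda: defaultdict(int))  # month -> doc_id -> page views

def created(doc_id, ts):
    recent.append((ts, doc_id))              # ZADD recent <ts> <doc_id>

def read(doc_id, month):
    views[doc_id] += 1                       # INCR views:<doc_id>
    top[month][doc_id] += 1                  # ZINCRBY top:<month> 1 <doc_id>

def latest(n):
    # ZREVRANGE recent 0 n-1
    return [doc for _, doc in sorted(recent, reverse=True)[:n]]

def top_docs(month, n):
    # ZREVRANGE top:<month> 0 n-1 WITHSCORES
    return sorted(top[month].items(), key=lambda kv: -kv[1])[:n]

created("abc", 1); created("def", 2)
read("abc", "2011-12"); read("abc", "2011-12"); read("def", "2011-12")
print(latest(2))                  # ['def', 'abc']
print(top_docs("2011-12", 1))     # [('abc', 2)]
```

The nice property is that every one of these updates is O(log N) or O(1) in Redis, so they are cheap enough to run on every page view.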
I think it would be a really interesting project to put together a way to gather statistics like this. I'm particularly interested in seeing things like average reads per doc, the interval between reads, etc. (as you mention).
Thanks for the research, and I look forward to more coming out in the way of statistics soon!
More related to the central topic of Redis as a store for this, I think it really depends on the read-to-write ratio. Gathering stats like this would be very useful, and it's very easy to write a SQL adapter for haste-server, since it's just a set/get with an optional expire.
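Since the store interface really is just set/get with an optional expiry, a SQL adapter stays tiny. haste-server itself is Node.js, but here is a rough sketch of the same idea in Python with sqlite3 (the table and column names are my own invention):

```python
import sqlite3
import time

class SQLStore:
    """Minimal key/value store with optional per-key expiry."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pastes "
            "(key TEXT PRIMARY KEY, value TEXT, expires_at REAL)")

    def set(self, key, value, expire=None):
        # expire is a lifetime in seconds; NULL means the paste never expires.
        expires_at = time.time() + expire if expire is not None else None
        self.db.execute(
            "INSERT OR REPLACE INTO pastes VALUES (?, ?, ?)",
            (key, value, expires_at))
        self.db.commit()

    def get(self, key):
        row = self.db.execute(
            "SELECT value, expires_at FROM pastes WHERE key = ?",
            (key,)).fetchone()
        if row is None:
            return None
        value, expires_at = row
        if expires_at is not None and expires_at < time.time():
            return None        # expired: treat as missing
        return value

store = SQLStore()
store.set("abc", "hello world")
print(store.get("abc"))        # hello world
```

Expiry is handled lazily on read here; a periodic DELETE of expired rows would keep the table from growing unboundedly.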
In theory, a pastebin-like site can see a lot of writes for the analytics: think of a paste appearing on HN or another very busy site. If you want to do real-time stats, Redis will handle the load without issues.
What about a Redis feature where you could mark a key as swappable? Hot keys could stay in memory, cold keys could stay on disk. This would work great with append-only mode, in essence like Bitcask.