TL;DR: They compare one specific general-purpose persistent key-value store with transaction support, Kyoto Cabinet [1], running on one specific machine (no details given beyond a 3.2 GHz clock), with an implementation of an in-memory hash map with chaining (32-byte keys, 16-byte values) on a Stratix IV FPGA with 8 GiB of external DDR SDRAM running at 244 MHz. They find that the FPGA is one to two orders of magnitude faster, depending on the operation. The FPGA design is essentially a large but slow associative memory. They also ignore any communication overhead between the host and the FPGA.
I don't think this is really a relevant result. My old 2.5 GHz Core i3 easily achieves their five million operations per second when I just use an in-memory hash map. I tested this with the simplest possible C# program, adding ten million strings to a Dictionary<String, String>.
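For anyone who wants to repeat that kind of sanity check, here is a minimal sketch in Python rather than C#. It is not the program from the comment above: the function name time_inserts and the key and value formats are made up for illustration, and absolute numbers from Python will differ from what a C# Dictionary gives on the same machine.

```python
import time

def time_inserts(n):
    """Insert n distinct string keys into a dict; return (table, inserts per second)."""
    # Build the strings up front so that only the inserts are timed.
    keys = [f"key-{i}" for i in range(n)]
    values = [f"value-{i}" for i in range(n)]
    table = {}
    start = time.perf_counter()
    for k, v in zip(keys, values):
        table[k] = v
    # Guard against a zero reading on very small n.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return table, n / elapsed

if __name__ == "__main__":
    table, rate = time_inserts(1_000_000)
    print(f"{len(table)} entries, {rate / 1e6:.2f} million inserts per second")
```

This only measures inserts into an already running process, with no persistence and no transactions, which is exactly why it is closer to what the FPGA design does than to what Kyoto Cabinet does.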
This may have been true in the past, but now that individual cores are starting to reach diminishing returns from more transistors, the game will be different, though it might turn out the same.
If a key-value store uses concurrency well, it might continue to benefit from better hardware. Likewise, if an FPGA key-value store builds in more concurrency, it might be able to gain substantially in overall throughput.
I would say there is not much to be gained here; a key-value store is just too simple to benefit much from custom logic. You hash the key, then you read from or write to a memory location based on the resulting hash. With a fast hash function, the bottleneck on modern hardware is very likely the memory bandwidth, and the same would very likely apply to an FPGA implementation unless you go the extra mile and also build an exceptionally fast memory interface. You could probably get some speed-up from dedicated logic that calculates the hashes, but what good is that if you then have to wait for the memory anyway?
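To make "hash the key, then touch one memory location" concrete, here is a rough sketch of a hash map with chaining in Python. It is not the design from the paper: the class name ChainedHashMap, the bucket count, and the choice of BLAKE2b as the hash function are all assumptions made for illustration.

```python
import hashlib

class ChainedHashMap:
    """Toy hash map with chaining: each bucket holds a list of (key, value) pairs."""

    def __init__(self, num_buckets=1024):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key: bytes) -> int:
        # Step 1: hash the key (BLAKE2b is an arbitrary choice here).
        digest = hashlib.blake2b(key, digest_size=8).digest()
        # Step 2: reduce the hash to a bucket number, i.e. a memory location.
        return int.from_bytes(digest, "little") % len(self.buckets)

    def put(self, key: bytes, value: bytes) -> None:
        chain = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite an existing entry
                return
        chain.append((key, value))  # new key: extend the chain

    def get(self, key: bytes):
        # Walk the chain in the selected bucket; return None if the key is absent.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return None
```

Apart from computing the hash, every operation is one bucket access plus a short walk along the chain, so the time is dominated by how quickly that memory can be reached.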
[1] http://fallabs.com/kyotocabinet/