Does anyone have numbers on memory bandwidth and latency?
The x1 cost per GB is about 2/3 that of r3 instances, but you get 4x as many memory channels if you spec the same amount of memory via r3 instances, so the cost per memory channel is more than twice as high for x1 as for r3. DRAM is valuable precisely because of its speed, but that speed is not cost-effective on the x1. As such, the x1 is really for applications that can't scale with distributed memory. (Nothing new here, but this point is often overlooked.)
Similarly, you get a lot more SSDs with several r3 instances, so the aggregate disk bandwidth is also more cost-effective with r3.
Not sure I quite understand your math here. The largest R3 instance is the r3.8xlarge with 244 GB of memory; 4 times that would only get you to about 1 TB. Also, this: "DRAM is valuable precisely because of its speed" is wrong (https://en.wikipedia.org/wiki/Dynamic_random-access_memory).
1. 4 of those R3 instances cost less than the X1 but offer nearly double the bandwidth. The X1 is cheaper per GB, but much more expensive per GB/s.
2. If DRAM were not faster than NVRAM/SSD, nobody would use it. "Speed" involves both bandwidth and latency. Latency is probably similar or somewhat higher for the X1 instances, but I haven't seen numbers. We can make better estimates about realizable bandwidth based on the system stats.
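To make the $/GB vs $/(GB/s) distinction concrete, here is a back-of-envelope sketch. The memory sizes (244 GB per r3.8xlarge, ~2 TB for the x1) come from the thread; the hourly prices and per-instance bandwidth figures are hypothetical placeholders chosen only to match the thread's ratios (x1 $/GB about 2/3 of r3; 4 r3.8xlarge giving nearly double the x1's aggregate bandwidth), not quoted AWS rates or measured numbers.

```python
# Hypothetical back-of-envelope comparison: x1 vs 4x r3.8xlarge.
# Prices and bandwidth values are illustrative assumptions, not AWS quotes.

def cost_metrics(price_per_hour, mem_gb, bandwidth_gbs):
    """Return cost per GB of memory and per GB/s of bandwidth."""
    return {
        "per_gb": price_per_hour / mem_gb,
        "per_gbs": price_per_hour / bandwidth_gbs,
    }

# 4x r3.8xlarge (hypothetical $2.66/hr each, ~100 GB/s each)
r3 = cost_metrics(price_per_hour=2.66 * 4,
                  mem_gb=244 * 4,        # 976 GB total, ~half the x1
                  bandwidth_gbs=100 * 4)

# 1x x1 (hypothetical $13.34/hr, ~210 GB/s from fewer channels per GB)
x1 = cost_metrics(price_per_hour=13.34,
                  mem_gb=1952,
                  bandwidth_gbs=210)

# The x1 wins on $/GB but loses badly on $/(GB/s).
print(x1["per_gb"] < r3["per_gb"])    # True
print(x1["per_gbs"] > r3["per_gbs"])  # True
```

With these placeholder numbers the x1's $/GB is roughly 2/3 of r3's, while its $/(GB/s) is more than double, which is the shape of the trade-off being argued above.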