The current flash architecture is, from a building blocks perspective, identical...

Tuna-Fish · on Oct 31, 2012

> The current flash architecture is, from a building blocks perspective, identical to dynamic RAM architecture.

This is true of typical NOR flash, but not NAND flash. The smallest word you can read from typical, modern NAND is 8kB. This is not a feature of the controller, but of the way the bit array is laid out. While you could build this memory bus out of NOR, it would be quite expensive -- NAND is not only much more dense, but because it's a commodity with a lot of competition, cost per mm^2 is also much lower than NOR.
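To put a rough number on the page-granularity point: a minimal sketch of the read amplification when a CPU wants one cache line but the array can only return a whole page. The 8 kB page is the figure from the comment above; the 64-byte cache line and the helper name `read_amplification` are assumptions for illustration, and the request is assumed not to straddle a page boundary.

```python
PAGE_SIZE = 8 * 1024   # smallest NAND read unit quoted above, in bytes
CACHE_LINE = 64        # assumed CPU cache line size, in bytes


def read_amplification(request_bytes: int, page_bytes: int = PAGE_SIZE) -> float:
    """Bytes fetched from the array per byte actually requested.

    Assumes the request starts on a page boundary, so it touches
    ceil(request_bytes / page_bytes) pages.
    """
    pages_touched = -(-request_bytes // page_bytes)  # ceiling division
    return pages_touched * page_bytes / request_bytes


print(read_amplification(CACHE_LINE))  # 128.0
```

Under those assumptions, every cache-line miss served straight from NAND moves 128 times more data out of the array than the core asked for.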

wmf · on Oct 30, 2012

> If you look at how Intel and others have constructed their PCIe cards which have flash on them, you will see that the controller presents a "memory" type interface to the PCIe bus as it would in the case of connecting it to the physical memory bus.

I'm pretty sure this is not correct; you still have to use DMA. Doing MMIO to flash may make a core unhappy (or at least extremely bored) when it blocks for ~20us.
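For a sense of scale on "extremely bored", here is a back-of-the-envelope sketch of how many cycles a core would sit stalled on a blocking read. The ~20 us latency is the figure from the comment above; the 3 GHz clock and the 100 ns DRAM latency used for comparison are assumed round numbers, not measurements.

```python
def stalled_cycles(latency_s: float, clock_hz: float) -> int:
    """Clock cycles that elapse while a blocking load waits for its data."""
    return round(latency_s * clock_hz)


FLASH_READ_LATENCY = 20e-6   # ~20 us, as quoted in the thread
ASSUMED_DRAM_LATENCY = 100e-9  # assumed round figure for a DRAM miss
ASSUMED_CLOCK = 3e9          # assumed 3 GHz core clock

print(stalled_cycles(FLASH_READ_LATENCY, ASSUMED_CLOCK))    # 60000
print(stalled_cycles(ASSUMED_DRAM_LATENCY, ASSUMED_CLOCK))  # 300
```

With these assumed numbers a synchronous flash read costs on the order of a couple of hundred DRAM misses, which is why DMA (or some asynchronous fetch) looks attractive.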

ChuckMcM · on Oct 30, 2012

I think we're talking past each other.

I completely agree with you that it would be challenging and probably quite unsatisfactory to take existing flash controllers and 'pretend' they were a memory controller.

What I am suggesting is that if there is a processor out there which can "consume" a large amount of flash attached to the physical memory bus, then you will see controllers that are designed to operate well in that mode. Current ARM chips, for example, often have the PSMI "bus", which is used for pseudo-static memory, and sometimes people hook up LCD controllers to that bus as well.

AMD is big enough to create a market for such flash controllers. And they could put hooks in the TLB such that a cache line fetch from that space could happen asynchronously with other stuff going on. I've got core memory planes that have slower read speeds than flash, so I know it's possible to make it work :-) But as you (wmf) point out, it hasn't been done yet (other than internal NAND flash for embedded devices).

So if I were writing up the MRD or PRD [1] for the controller chip, I'd start with: provides a multi-channel way of doing loads and stores of program data at the cache line level. I'd put the wear leveling in the controller to minimize implementation complexity. I might also provide some 'staging' static RAM, much like the 'open page' registers in a DRAM controller, to keep track of requests that had already happened so that I could do read-ahead to improve sequential access.

[1] When I worked at Intel each new chip started with a 'Market Requirements Document' (MRD) which described the market for the chip and a 'Product Requirements Document' (PRD) which was a description of a product that could be sold into that market.
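The 'staging' RAM idea can be sketched as a toy model: page-oriented storage behind a cache-line read interface, with a few staged pages and one-page read-ahead. This is only an illustration under stated assumptions, not any real controller's design. The class `StagingBuffer`, its `slots` parameter, the LRU eviction policy, and the 64-byte line size are all hypothetical; wear leveling and writes are not modeled.

```python
from collections import OrderedDict

PAGE_SIZE = 8 * 1024   # NAND page size quoted earlier in the thread
CACHE_LINE = 64        # assumed CPU cache line size, in bytes


class StagingBuffer:
    """Toy model of controller-side staging RAM in front of page-oriented NAND."""

    def __init__(self, backing: bytes, slots: int = 4):
        self.backing = backing        # stands in for the NAND array
        self.slots = slots            # how many pages may be staged at once
        self.pages = OrderedDict()    # page number -> page bytes, LRU order
        self.nand_reads = 0           # full-page reads issued to the "NAND"

    def _load(self, page_no: int) -> None:
        """Stage one whole page; the array cannot return anything smaller."""
        if page_no in self.pages:
            self.pages.move_to_end(page_no)   # mark as most recently used
            return
        start = page_no * PAGE_SIZE
        if start >= len(self.backing):
            return                            # nothing exists past the end
        self.pages[page_no] = self.backing[start:start + PAGE_SIZE]
        self.nand_reads += 1
        if len(self.pages) > self.slots:
            self.pages.popitem(last=False)    # evict least recently used

    def read_line(self, addr: int) -> bytes:
        """Return the cache line containing addr, staging pages as needed."""
        if not 0 <= addr < len(self.backing):
            raise IndexError("address outside the modeled flash")
        page_no, offset = divmod(addr, PAGE_SIZE)
        missed = page_no not in self.pages
        self._load(page_no)
        page = self.pages[page_no]            # grab it before read-ahead evicts
        if missed:
            self._load(page_no + 1)           # simple sequential read-ahead
        line_start = offset - offset % CACHE_LINE
        return page[line_start:line_start + CACHE_LINE]


flash = bytes(i % 256 for i in range(4 * PAGE_SIZE))
buf = StagingBuffer(flash)
buf.read_line(100)              # miss: stages page 0 and reads ahead page 1
buf.read_line(PAGE_SIZE + 10)   # hit: page 1 is already staged
print(buf.nand_reads)           # 2
```

In this model a sequential scan pays the page-read latency once per two pages, and every other line is served from the staging RAM, which is the effect the read-ahead is meant to have.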

Tuna-Fish · on Oct 31, 2012

> But as you (wmf) point out it hasn't been done yet (other than internal NAND flash for embedded devices)

It's not done there either. When NAND is used in such applications, its contents are usually copied over to volatile RAM before use.