Given a background in programming, you can ramp up on this in an afternoon. Here's an intro "crib sheet":
Your application believes that it has all the RAM to itself. This is a lie that the operating system and hardware tell your application so that the addresses it uses (virtual addresses) are decoupled from the physical RAM addresses. Learn more about virtual memory here: http://en.wikipedia.org/wiki/Virtual_memory
In order to keep this mirage working, the computer needs to map from virtual address to physical address. Instead of tracking every single address, it tracks fixed-size spans of addresses (pages). So, the addresses your application sees as 0 to 4095 might map to physical addresses 8192 to 12287. Using fixed-size spans keeps the mapping small and the lookups fast.
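To make the span idea concrete, here is a minimal sketch of a lookup. The page_table contents and the translate name are made up for illustration; a real page table lives in kernel memory and is walked by hardware, not by a Python dict.

```python
PAGE_SIZE = 4096  # 4 KB, the standard small page size

# Hypothetical mapping: virtual page number -> physical frame number.
page_table = {0: 2, 1: 7, 2: 3}


def translate(virtual_address: int) -> int:
    """Translate a virtual address by looking up its page, then re-adding the offset."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    frame_number = page_table[page_number]  # a KeyError here stands in for a page fault
    return frame_number * PAGE_SIZE + offset


if __name__ == "__main__":
    for va in (0, 4095, 4096, 8200):
        print(f"virtual {va:>5} -> physical {translate(va)}")
```

Only the page number is looked up; the offset inside the page passes through unchanged, which is why one table entry can cover a whole span.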
This article is about using bigger spans (0 to about 1 billion, i.e. 1 GB) instead of the standard 4 KB. The advantage is that the mapping from virtual to physical is stored in memory as a tree, and bigger spans mean you need fewer nodes in the tree. Fewer nodes mean fewer traversals/indirections to find the node you are looking for. Less work means faster performance.
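A rough back-of-the-envelope version of that claim, as a sketch. It assumes x86-64 style 4-level paging (48-bit virtual addresses, 9 address bits resolved per tree level); the 32 GB region is just an example size, and the function names are made up.

```python
KIB, MIB, GIB = 2**10, 2**20, 2**30

VIRTUAL_ADDRESS_BITS = 48  # assumption: x86-64 with 4-level paging
BITS_PER_LEVEL = 9         # assumption: each tree level resolves 9 address bits


def leaf_entries(region_bytes: int, page_bytes: int) -> int:
    """Number of page mappings needed to cover a region (rounded up)."""
    return -(-region_bytes // page_bytes)


def walk_levels(page_bytes: int) -> int:
    """Tree levels traversed to resolve one address for a given page size."""
    offset_bits = page_bytes.bit_length() - 1  # log2 of a power-of-two page size
    return (VIRTUAL_ADDRESS_BITS - offset_bits) // BITS_PER_LEVEL


if __name__ == "__main__":
    region = 32 * GIB  # hypothetical working set
    for name, size in (("4 KB", 4 * KIB), ("2 MB", 2 * MIB), ("1 GB", GIB)):
        print(f"{name}: {leaf_entries(region, size)} mappings, {walk_levels(size)} levels")
```

Under these assumptions a 1 GB page needs half as many levels per lookup as a 4 KB page, and the number of mappings for the same region drops by a factor of 262144.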
The details about caching and the TLB (translation lookaside buffer) entry counts in the processor have to do with how much dedicated space different parts of the CPU have for this mapping information.
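One way to think about that dedicated space is "TLB reach": how much memory the cached mappings cover at once. The sketch below uses made-up entry counts (64 slots for 4 KB pages, 4 slots for 1 GB pages) purely for illustration; real counts vary by CPU model and are not taken from the article.

```python
KIB, GIB = 2**10, 2**30


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of memory covered when every TLB slot holds a distinct page."""
    return entries * page_bytes


if __name__ == "__main__":
    # Hypothetical entry counts, chosen only to show the scale difference.
    small = tlb_reach(64, 4 * KIB)
    huge = tlb_reach(4, GIB)
    print(f"4 KB pages: {small // KIB} KB covered")
    print(f"1 GB pages: {huge // GIB} GB covered")
```

Even with far fewer slots, the bigger pages cover much more memory before a lookup has to fall back to walking the tree.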
The details about offsets and changing how the memory was accessed, which gave better or worse performance in the 4 KB vs. 1 GB tradeoff, have to do with whether the mapping information was in the cache or not. It is similar to alignment: http://en.wikipedia.org/wiki/Data_structure_alignment
Finally, in order to use these 1 GB maps instead of 4 KB maps, the programmer has to use a special way of allocating memory from the operating system called mmap: http://en.wikipedia.org/wiki/Mmap
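For a feel of what that looks like, here is a hedged sketch on Linux. The flag values are copied by hand from the Linux headers for x86-64 and are an assumption here (Python's mmap module is not assumed to export them); map_anonymous is a made-up helper name. The huge-page request only succeeds if the system has 1 GB pages reserved, so the sketch falls back to normal pages otherwise.

```python
import mmap

# Assumption: Linux x86-64 values from <linux/mman.h>, defined by hand.
MAP_HUGETLB = 0x40000
MAP_HUGE_SHIFT = 26
MAP_HUGE_1GB = 30 << MAP_HUGE_SHIFT  # 30 = log2(1 GB)


def map_anonymous(length: int, huge: bool = True):
    """Return (mapping, used_huge_pages) for an anonymous private mapping.

    Tries 1 GB huge pages first when huge=True; falls back to ordinary
    pages if the kernel refuses (for example, no huge pages reserved).
    """
    base_flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    if huge:
        try:
            huge_flags = base_flags | MAP_HUGETLB | MAP_HUGE_1GB
            return mmap.mmap(-1, length, flags=huge_flags), True
        except (OSError, ValueError, OverflowError):
            pass  # fall through to a regular mapping
    return mmap.mmap(-1, length, flags=base_flags), False
```

With huge pages, length should be a multiple of 1 GB, for example map_anonymous(2**30). The returned flag tells you whether the request was actually honored, which matters when you are benchmarking the two page sizes against each other.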