I think it's worth noting that EleutherAI is a grassroots collection of researchers, which distinguishes it from academia/industry labs.
As part of their work on democratizing AI, they're now hoping to replicate GPT-3 and release it for free (unlike OpenAI's API).
I would encourage everyone interested to join their discord server (https://discord.gg/BK2v3EJ) -- they're extremely friendly and I think it's a project worth contributing to.
Connor Leahy, who I think is a sort of BDFL figure for EleutherAI, mentioned in a Slate Star Codex online meetup I attended that Google donated millions of dollars' worth of preemptible TPU credits to the project. There is a video of the meetup on YouTube somewhere. Struck me as a really smart kid with a lot of passion.
Haha Connor (although one of the main participants) definitely isn't a BDFL - we don't have any BDFLs :)
We don't really have much of a hierarchy at all - it's mostly just a collection of researchers of widely varying backgrounds all interested in ML research.
I'm not sure what a BDFL figure is, but Google does not give us millions of dollars. We are a part of TFRC, a program where researchers and non-profits can borrow TPUs when they're not being used. You could say that we are indirectly funded as a result, but it's nowhere near millions of dollars and it doesn't reflect any kind of special relationship with Google.
They'll probably run it on the scientific clusters of various universities, or on collections of idle lab desktop machines. Both of these tend to sit idle a lot of the time, based on my experience at uni in Europe.
GPT-3 used 570GB of Common Crawl after filtering, but only 40% of the CC data was seen even once during training, and CC is only 60% of the training data. You could work through the math to find the rough size of GPT-3's training data, but it sounds like The Pile is of comparable size.
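For anyone who wants the math spelled out, here is a rough sketch using only the three figures quoted above. It assumes one particular reading: that the 40% of Common Crawl actually seen makes up 60% of everything the model saw. The function and variable names are made up for illustration, and the result is a back-of-envelope number, not an official figure.

```python
# Figures as quoted in the comment above; treated as given, not verified here.
CC_FILTERED_GB = 570          # Common Crawl size after filtering
CC_FRACTION_SEEN = 0.40       # share of CC seen at least once during training
CC_SHARE_OF_TRAINING = 0.60   # share of the training data that is CC


def estimate_training_data_gb(cc_gb, fraction_seen, cc_share):
    """Estimate the total data seen in training, in GB.

    Assumption: the portion of CC that was seen accounts for
    `cc_share` of all the data seen.
    """
    cc_seen_gb = cc_gb * fraction_seen
    return cc_seen_gb / cc_share


cc_seen = CC_FILTERED_GB * CC_FRACTION_SEEN
total = estimate_training_data_gb(
    CC_FILTERED_GB, CC_FRACTION_SEEN, CC_SHARE_OF_TRAINING
)
print(f"CC seen in training: about {cc_seen:.0f} GB")
print(f"Estimated total seen: about {total:.0f} GB")
```

Under that reading the estimate comes out to a few hundred GB. A different reading of the 60% figure (for example, as a share of the raw corpus rather than of the data seen) would give a different number.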
Yeah, the Pile is approximately the size of the GPT-3 training data, which is not a coincidence--one major reason we created the Pile (though certainly not the only one) was for our GPT-3 replication project.