Hacker News

The scaling laws for transformers _deliberately_ factor in the amount of data as well as the amount of compute needed in order to scale.
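For what it's worth, that data/compute trade-off can be made concrete. A minimal sketch, assuming the Chinchilla rule of thumb (Hoffmann et al., 2022): roughly 20 training tokens per parameter, with training compute C ≈ 6·N·D FLOPs. The constants are rounded approximations from that paper, not exact values:

```python
import math

def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    """Return a (params, tokens) pair that roughly balances a FLOP budget.

    From C = 6*N*D and the rule of thumb D = 20*N:
        C = 120*N^2  =>  N = sqrt(C / 120),  D = 20*N
    """
    n_params = math.sqrt(compute_flops / 120.0)
    n_tokens = 20.0 * n_params
    return n_params, n_tokens

# Example: a ~5.76e23 FLOP budget lands near Chinchilla's actual
# configuration (~70B parameters trained on ~1.4T tokens).
n, d = chinchilla_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

The point the rule makes is exactly the one above: doubling your FLOP budget without also growing the dataset leaves you off the compute-optimal frontier.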

The premise of this article, that data is more important than compute, has been obvious to people who are paying attention.

Sorry, but the unnecessary sensationalism in this article was mildly annoying to me. It reads as if the author discovered some novel insight. A bit like that doctor who published a "novel" paper about how to find the area under a curve.



> The premise of this article... has been obvious to people who are paying attention.

Well, forgive me but I feel that the article is a much-needed injection of context into my thinking around the Bitter Lesson. I like the imperative to preface compute requests with data roadmaps.

I'm not an AI guy. Not an ML engineer. I've been studiously avoiding the low-level stuff, actually, because I didn't want to half-ass it when off-the-shelf solutions were still providing tremendous novelty and value for my customers.

So, for most of my career, "compute" has been practically irrelevant! RAM and disk constraints presented more frequent obstacles than processor cycles did. I would have easily told you that data presents more of a bottleneck to value than CPU. But that's just the era of computing I came up in.

The last few years have been different. Suddenly compute is at a premium again. So it's easy to think "if only I had more" and "line goes up!" and forget about s-curves and logarithmic scaling.

Is the article unnecessarily sensationalist? I don't know, maybe you've been overestimating how much the rest of us are "paying attention."[0]

0. https://xkcd.com/2501/



