Hacker News

The scaling laws for transformers _deliberately_ factor in the amount of data as well as the amount of compute needed in order to scale.
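For what it's worth, that data/compute trade-off can be made concrete. A minimal sketch, assuming the Chinchilla rule of thumb (Hoffmann et al., 2022): roughly 20 training tokens per parameter, with training compute C ≈ 6·N·D FLOPs. The constants are rounded approximations from that paper, not exact values:

```python
import math

def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    """Return a (params, tokens) pair that roughly balances a FLOP budget.

    From C = 6*N*D and the rule of thumb D = 20*N:
        C = 120*N^2  =>  N = sqrt(C / 120),  D = 20*N
    """
    n_params = math.sqrt(compute_flops / 120.0)
    n_tokens = 20.0 * n_params
    return n_params, n_tokens

# Example: a ~5.76e23 FLOP budget lands near Chinchilla's actual
# configuration (~70B parameters trained on ~1.4T tokens).
n, d = chinchilla_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

The point the rule makes is exactly the one above: doubling your FLOP budget without also growing the dataset leaves you off the compute-optimal frontier.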

The premise of this article, that data is more important than compute, has been obvious to people who are paying attention.

Sorry, but the unnecessary sensationalism in this article was mildly annoying to me. It reads as if the author discovered some novel insight. A bit like that doctor who published a "novel" paper about how to find the area under a curve.



> The premise of this article... has been obvious to people who are paying attention.

Well, forgive me but I feel that the article is a much-needed injection of context into my thinking around the Bitter Lesson. I like the imperative to preface compute requests with data roadmaps.

I'm not an AI guy. Not an ML engineer. I've been studiously avoiding the low-level stuff, actually, because I didn't want to half-ass it when off-the-shelf solutions were still providing tremendous novelty and value for my customers.

So, for most of my career, "compute" has been practically irrelevant! RAM and disk constraints presented more frequent obstacles than processor cycles did. I would have easily told you that data presents more of a bottleneck to value than CPU. But that's just the era of computing I came up in.

The last few years have been different. Suddenly compute is at a premium again. So it's easy to think "if only I had more" and "line goes up!" and forget about s-curves and logarithmic scaling.

Is the article unnecessarily sensationalist? I don't know, maybe you've been overestimating how much the rest of us are "paying attention."[0]

0. https://xkcd.com/2501/



