RL or no RL, AI cannot escape the distribution it's trained on. It's just that t...

crthpl · 2026-05-09T15:09:30

Yes it can! That's the whole point of RL! It generates slightly out-of-distribution rollouts and rewards the good ones, which shifts the distribution of the output.

energy123 · 2026-05-09T19:21:54

That's not out of distribution; that's inside the distribution of the rollouts. If you don't create rollouts for chess, then it doesn't know how to play chess, no matter how smart it is at the tasks you did create rollouts for. It's structurally stuck in its distribution.

djeastm · 2026-05-10T02:59:07

What if it doesn't need to escape the distribution? What if it can just exhaust the distribution we already have, much more broadly and efficiently than humans can?

Then the answers to our bleeding-edge questions are already there; we just need an AI's ability to target them. Then retrain on the improvements and go from there.

Just a thought.