RL or no RL, AI cannot escape the distribution it's trained on. It's just that the labs will put so much into the distribution that we won't be able to tell the difference easily, nor will it matter for most tasks. The reason AI does well on ARC-AGI-2 is that the labs created synthetic training data using similar puzzles.
Yes, it can! That's the whole point of RL: it generates slightly out-of-distribution rollouts and rewards the good ones, which shifts the output distribution.
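To make the mechanism concrete, here is a minimal sketch of a REINFORCE-style update on a toy three-action problem. Everything in it (the `softmax` and `reinforce` helpers, the reward function, the starting logits) is a made-up illustration, not how any lab actually trains a model. It only shows the basic loop: sample from the current policy, then push probability toward whatever got rewarded.

```python
import math
import random

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(logits, reward_fn, steps=2000, lr=0.1, seed=0):
    """Toy REINFORCE: sample an action, then scale the log-prob gradient by the reward."""
    rng = random.Random(seed)
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        # The "rollout": one action sampled from the current policy.
        action = rng.choices(range(len(logits)), weights=probs)[0]
        reward = reward_fn(action)
        # d/d(logit_i) of log pi(action) is (1 if i == action else 0) - probs[i].
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return logits

# Hypothetical setup: action 2 starts out unlikely but is the only rewarded one.
initial = [2.0, 0.0, -2.0]
reward_fn = lambda a: 1.0 if a == 2 else 0.0

before = softmax(initial)
after = softmax(reinforce(initial, reward_fn))
print("P(action 2) before:", before[2])
print("P(action 2) after: ", after[2])
```

Note that the update only ever fires on actions the policy actually sampled, so in this toy the distribution can only move toward things it already assigns some probability to. Whether that counts as leaving the distribution is the question under debate here.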
That's not out of distribution; that's inside the distribution of the rollouts. If you don't create rollouts for the game of Chess, then it doesn't know how to play Chess, no matter how smart it is at the tasks you did create rollouts for. It's structurally stuck in its distribution.
What if it doesn't need to escape the distribution, and can instead just exhaust the current distribution far more broadly and efficiently than humans can?
So the answers to our bleeding-edge questions are already there; we just need an AI's ability to target them. Then retrain on the improvements and go from there.