Hacker News

Sorry, are you familiar with what a next token distribution is, mathematically speaking?

If you are not, let me introduce you to the term: a probability distribution.

Just because it has profound properties ... doesn't make it anything other than a probability distribution.
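For concreteness: a next-token distribution is just a softmax over the model's output logits, and it satisfies the defining properties of any probability distribution. A minimal sketch (the vocabulary size and logit values here are made up for illustration):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)

# The two defining properties of a probability distribution:
assert all(p >= 0 for p in probs)          # non-negative
assert abs(sum(probs) - 1.0) < 1e-9       # sums to one
```

Whatever emergent behavior the model exhibits, sampling the next token is mathematically just drawing from this categorical distribution.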

> has all the trappings of the stochastic parrot-style HN-discourse that has been consistently wrong for almost a decade now

Perhaps respond to my actual comment rather than to whatever meta-level grouping you wish to interpret it as part of?

> It contains a number of premises that we have no business being confident in. We are potentially witnessing the obviation of human cognitive labor.

What premises? Be clear.



I think they are questioning whether human feedback is even necessary to make progress, i.e. whether the premise that RL needs to be RLHF is true.

My (limited) understanding is that LLMs are not capable of escaping their learned distribution by simply feeding on their own output.
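That limitation can be illustrated with a toy simulation (a made-up categorical distribution standing in for a model, not an actual LLM): if each "generation" is re-fit purely on samples from the previous one, tokens outside the original support can never appear, so the support can only shrink, never grow.

```python
import random

random.seed(0)

def sample_counts(probs, n):
    # Draw n tokens from a categorical distribution and count occurrences.
    counts = [0] * len(probs)
    for _ in range(n):
        r = random.random()
        acc = 0.0
        for i, p in enumerate(probs):
            acc += p
            if r < acc:
                counts[i] += 1
                break
    return counts

# Initial "model": note the last token has zero probability.
probs = [0.5, 0.3, 0.15, 0.05, 0.0]
for generation in range(20):
    counts = sample_counts(probs, 50)   # small sample = noisy "training set"
    total = sum(counts)
    probs = [c / total for c in counts]  # re-fit on the model's own output

# The zero-probability token never appears in any generation:
# self-generated data cannot take the model out of its own distribution.
assert probs[-1] == 0.0
```

Rare tokens also tend to die out over generations due to finite-sample noise, which is the usual "model collapse" argument in miniature.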

But the question is whether the required external (out of distribution) "stimulus" needs to come from humans.

Could LLMs design experiments/interventions to get feedback from their environment like human scientists would?

I have my doubts that this is possible without an inherent causal reasoning capability, but I'm not sure.



