
One example from today: I had a coding bug which I asked R1 about. The final answer wasn't correct, but adapting an idea from the CoT trace helped me fix the bug. o1's answer was also incorrect.

Interestingly though, R1 struggled in part because it needed the values of some parameters I didn't provide, and it instead made incorrect assumptions about them. This was apparent in the CoT trace, but the model didn't mention it in its final answer. If I weren't able to see the trace, I wouldn't know what was lacking in my prompt or how to make the model do better.

I presume OpenAI kept their traces secret to prevent competitors from training models on them, but IMO they strategically erred in doing so. If o1's traces were public, I think the hype around DS-R1 would be more muted (and perhaps more focused on the lower training costs and the MIT license, and less on its performance and usefulness).



> I presume OpenAI kept their traces a secret to prevent their competitors from training models with it

At some point there was a paper they'd written about it, and IIRC the logic presented was like this:

- We (the OpenAI safety people) want to be able to have insight into what o1 is actually thinking, not a self-censored "people are watching me" version of its thinking.

- o1 knows all kinds of potentially harmful information, like how to make bombs, how to cook meth, how to manipulate someone, etc., which could "cause harm" if seen by an end-user.

So the options as they saw it were:

1. RLHF both the internal thinking and the final output. In this case the thought process would avoid saying things that might "cause harm", and so could be shown to the user. But they would have a less clear picture of what the LLM was "actually" thinking, and the potential state space of exploration would be limited due to the self-censorship.

2. Only RLHF the final output. In this case, they get a clearer picture of what the LLM is "actually" thinking (and the LLM can potentially explore the state space more fully without the risk of causing harm), but the thought process could internally mention things they don't want the user to see.

OpenAI went with #2. Not sure what DeepSeek has done -- whether they have RLHF'd the CoT as well, or just not worried as much about it.
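None of the public write-ups spell out the mechanics, but option 2 amounts to masking the chain-of-thought tokens out of the reward signal so that RLHF only shapes the final answer. A minimal sketch (all names and the token layout here are hypothetical, purely for illustration):

```python
import numpy as np

def masked_reward(token_rewards, cot_mask):
    """Zero out per-token rewards on chain-of-thought positions so the
    RLHF update only shapes the final-answer tokens (option 2 above).
    Hypothetical helper; real pipelines operate on full rollouts."""
    token_rewards = np.asarray(token_rewards, dtype=float)
    cot_mask = np.asarray(cot_mask, dtype=bool)
    return np.where(cot_mask, 0.0, token_rewards)

# tokens:           [ cot,  cot,  cot,  ans, ans ]
rewards = [0.1, -0.3, 0.2, 1.0, 0.5]
mask    = [True, True, True, False, False]
print(masked_reward(rewards, mask).tolist())  # [0.0, 0.0, 0.0, 1.0, 0.5]
```

Under this scheme the CoT span contributes nothing to the gradient, so the model is free to "think" uncensored there, which is exactly the trade-off the paper describes.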


Do you use Continue.dev or similar tools to load code into the context, or do you copy-paste into their web chat?



