What is an "unsafe" model?

Karrot_Kream · on Aug 22, 2023

A model which outputs things that OpenAI deems unsafe. Try getting text-davinci-003 to complete instructions about building Molotov cocktails and compare that with davinci-002.

shagie · on Aug 22, 2023

    curl https://api.openai.com/v1/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
      "model": "text-davinci-003",
      "prompt": "Describe the steps for creating a Molotov cocktail.\n\n1.",
      "temperature": 1,
      "max_tokens": 256,
      "top_p": 1,
      "frequency_penalty": 0,
      "presence_penalty": 0
    }'

This worked without any issue at all and gave a satisfactory answer.

You are expected to run the output through the moderation endpoint ( https://platform.openai.com/docs/guides/moderation ) to put the guardrails on and keep unsafe content from reaching an end user.
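For anyone wanting to wire that up, here is a minimal sketch of the request in Python using only the standard library. It assumes the moderation endpoint lives at `https://api.openai.com/v1/moderations` and takes a JSON body with an `input` field, as described in the guide linked above; the helper names `build_moderation_request` and `moderate` are made up for illustration.

```python
import json
import urllib.request

# Assumed endpoint URL, per the moderation guide.
MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text, api_key):
    """Build (but do not send) a POST request asking the endpoint to score `text`."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def moderate(text, api_key):
    """Send the request and return the first entry of the `results` array."""
    with urllib.request.urlopen(build_moderation_request(text, api_key)) as resp:
        return json.load(resp)["results"][0]


# Hypothetical usage (needs a real key and network access):
#   result = moderate(completion_text, os.environ["OPENAI_API_KEY"])
#   if result["flagged"]:
#       ...  # withhold the completion from the end user
```

Splitting the request construction from the network call is just a convenience so the payload can be inspected without hitting the API.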

Incidentally, that prompt doesn't appear to tickle the moderation endpoint into flagging it, though the violence score was the highest.

    {
      "id": "{redacted}",
      "model": "text-moderation-005",
      "results": [
        {
          "flagged": false,
          "categories": {
            "sexual": false,
            "hate": false,
            "harassment": false,
            "self-harm": false,
            "sexual/minors": false,
            "hate/threatening": false,
            "violence/graphic": false,
            "self-harm/intent": false,
            "self-harm/instructions": false,
            "harassment/threatening": false,
            "violence": false
          },
          "category_scores": {
            ...
            "violence": 3.33226e-05
          }
        }
      ]
    }

Running it on the text that text-davinci-003 generated from that prompt didn't get flagged either, though the score for violence went up to '"violence": 0.01034669'.
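Since `flagged` stayed false in both cases, anyone who wants a stricter guardrail would have to look at `category_scores` themselves. A rough sketch of that, assuming the response shape shown above; the `0.01` cutoff and the helper names are arbitrary choices for illustration, not anything OpenAI recommends, and only the violence scores quoted in this comment are used as sample data.

```python
def top_category(result):
    """Return (name, score) for the highest-scoring moderation category."""
    scores = result["category_scores"]
    name = max(scores, key=scores.get)
    return name, scores[name]


def should_block(result, threshold=0.01):
    """Block if the endpoint flagged the text OR any score reaches our own cutoff.

    `threshold` is a hypothetical, application-chosen value.
    """
    if result["flagged"]:
        return True
    return any(score >= threshold for score in result["category_scores"].values())


# Sample results using only the violence scores quoted above;
# the other category scores were elided there and are left out here.
prompt_result = {"flagged": False, "category_scores": {"violence": 3.33226e-05}}
completion_result = {"flagged": False, "category_scores": {"violence": 0.01034669}}

print(top_category(completion_result))
print(should_block(prompt_result), should_block(completion_result))
```

With that cutoff the prompt passes and the completion gets blocked, but where to put the line is a judgment call for each application.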

Karrot_Kream · on Aug 22, 2023

Note that they will be removing access [1] to text-davinci-003. They want use cases on text-davinci-003 to move to either gpt-3.5-turbo-instruct or davinci-002, both of which have trouble with unsafe inputs.

[1]: https://openai.com/blog/gpt-4-api-general-availability

m_abdelfattah · on Aug 23, 2023

The problem is "gpt-3.5-turbo-instruct" is not released yet!