There should be a benchmark that tells the AI it's previous answer was wrong and...

		nprateem on Dec 20, 2024 \| parent \| context \| favorite \| on: OpenAI O3 breakthrough high score on ARC-AGI-PUB There should be a benchmark that tells the AI it's previous answer was wrong and test the number of times it either corrects itself or incorrectly capitulates, since it seems easy to trip them up when they are in fact right.