
I'm not sure it'd help much with what they're talking about.

E.g. go back in time and imagine you didn't yet know computers could be really good at performing integration, because nobody had tried to make them do it. If someone asked you how to tell whether something is intelligent, "the ability to easily work through integrals or multiply extremely large numbers" might seem like a great test to propose.

Skip forward to the modern era and it's blatantly obvious that CASes like Mathematica on a modern computer range from "ridiculously better than the average person" to "impossibly better than the best person" depending on the test. At the same time, it becomes painfully obvious a CAS is wholly unrelated to general intelligence, and just because your test could have been solved by an AGI doesn't mean solving it proves the solver is an AGI.
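
To make that concrete, here's a few lines of Python with SymPy (my own illustration, not anything from the presentation) doing the kind of symbolic integration and huge multiplication that would once have looked like a hallmark of intelligence:

    # A CAS does symbolic integration and enormous multiplications trivially.
    from sympy import symbols, integrate, exp, sin

    x = symbols('x')

    # Symbolic integration that would take a person a while by hand.
    print(integrate(x**2 * exp(x) * sin(x), x))

    # An "extremely large multiplication" is a non-event for a computer.
    print(2**4096 * 3**2048)

None of that tells you anything about whether SymPy is generally intelligent, which is exactly the problem with such a test.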

So you come up with a new test... but you have the same problem as before: it seems like anything non-human completely bombs it and an AGI would do well... but how do you know the thing that eventually solves it is actually an AGI, and not just another clearly unrelated system?

Short of a more clever approach, what GP is saying is that the goalposts must keep being moved until it's no longer obvious the thing isn't AGI; it's not about the average human getting a certain, lower score.

.

All that aside, to answer your original question: in the presentation it was said the average human gets 85% and this was the first model to beat that. It was also said a second version of the test is being worked on. They have some papers on their site with clear examples of why the current test measures a lot that is unrelated to whether something is really AGI (a brute-force method was shown to get >50% back in 2020), so their aim is to set up a new goalpost test and see how things shake out this time.
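
For a rough sense of what a brute-force method means here, the sketch below is my own illustration (Python; the candidate transformations and the example task are invented, and this is not the actual 2020 approach): enumerate simple transformations and keep whichever one reproduces the worked examples.

    # Hypothetical brute-force "solver": try a small set of canned
    # transformations and return the first one that fits every example pair.
    def candidates():
        return [
            ("reverse", lambda xs: list(reversed(xs))),
            ("sort", lambda xs: sorted(xs)),
            ("double", lambda xs: [2 * v for v in xs]),
            ("drop_first", lambda xs: xs[1:]),
        ]

    def solve(example_pairs, test_input):
        for name, f in candidates():
            if all(f(inp) == out for inp, out in example_pairs):
                return name, f(test_input)
        return None  # a real search would enumerate compositions of transforms

    # Hidden rule in this made-up task: reverse the list.
    examples = [([1, 2, 3], [3, 2, 1]), ([5, 9], [9, 5])]
    print(solve(examples, [7, 0, 4]))  # -> ('reverse', [4, 0, 7])

A search like that can rack up points on a benchmark without anything resembling general intelligence, which is the sort of gap being described.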



> So you come up with a new test... but you have the same problem as before: it seems like anything non-human completely bombs it and an AGI would do well... but how do you know the thing that eventually solves it is actually an AGI, and not just another clearly unrelated system?

We should skip to the end and just define a task like "it's AGI if it can predict, with 100% accuracy, the average human's next action in any situation". Anything that can do that is as good as AGI, even if people manage to find a proxy for the task.


Generality is not binary. It's a spectrum. And these models are already general in ways those things you've mentioned simply weren't.

What exactly is AGI to you? If it's simply a generally intelligent machine, then what are you waiting for? What else is there to be sure of? There's nothing narrow about these models.

Humans love to believe they're oh so special, so much so that there will always be debates on whether 'AGI' has arrived. If you are waiting for that to be settled, you'll be waiting a very long time, even if a machine arrives that takes us to the next frontier in science.


> There's nothing narrow about these models.

There is: they can't create new ideas the way humanity can. An AGI should be able to replace humanity in terms of thinking, otherwise it isn't general; you would just have a model specialized at reproducing thoughts and patterns humans have already had. It still can't recreate science from scratch the way humanity did, meaning it can't do science properly.

Comparing an AI to a single individual is not how you measure AGI. If a group of humans performs better, then you can't use the AI to replace that group, and thus the AI isn't an AGI, since it couldn't replace that group of humans.

So for example, if a group of programmers writes more reliable programs than the AI, then you can't replace that group of programmers with the AI, even if you duplicate that AI many times, since the AI isn't capable of reproducing the same level of reliability when run in parallel. An AI run in parallel is still just an AI, and an ensemble model is still just an AI, so the model the AI has to beat is the human ensemble called humanity.

If we lower the bar a bit, it at least has to beat 100 000 humans working together to make a job obsolete. All the tutorials and similar resources are made by other humans as well; if you remove the job, those would also disappear, and the AI would have to do the work of all of those people, so if it can't, humans will still be needed.

It's possible you will be able to substitute parts of those human ensembles with AI much sooner, but then we just call it a tool. (We also call narrow humans tools, so it's fair.)


I see these models create new ideas, at least by the standard humans are held to, so this just falls flat for me.


You don't just need to create an idea, you need to be able to create ideas that, on average, progress in a positive direction. Humans can evidently do that; AI can't. When AIs work too long without human input, you always end up with nonsense.

In order to write general programs you need to have that skill. Every new code snippet needs to be evaluated by the system: does it make the codebase better or not? The lack of that ability is why you can't just loop an LLM today to replace programmers. It might be possible to automate it for specific programming tasks, but not general-purpose programming.
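
As a sketch of what I mean by looping an LLM (everything here is a hypothetical placeholder I made up, not a real API), the hard part is the judging step, not the generating step:

    # Hypothetical loop. generate_patch and is_improvement are stand-ins;
    # a reliable, general-purpose is_improvement is the missing piece.
    def generate_patch(codebase, task):
        return ""  # stand-in for an LLM call that proposes a change

    def is_improvement(codebase, patch):
        # Tests and linters cover narrow cases; the general judgment
        # "does this make the codebase better?" is what we can't automate.
        return False

    def improve(codebase, task, max_iters=100):
        for _ in range(max_iters):
            patch = generate_patch(codebase, task)
            if is_improvement(codebase, patch):
                codebase = codebase + patch
        return codebase  # without a reliable judge, looping longer doesn't help

    print(improve("def main(): pass", "add error handling"))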

Overcoming that hurdle is not something I think LLMs can ever do; you need a totally different kind of architecture, one that is trained to reason rather than to mimic. I don't know how to train something that can reason about noisy, unstructured data. We will probably figure that out at some point, but it probably won't be LLMs as they are today.


I'm firmly in the "absolutely nothing special about human intelligence" camp, so don't let my dismissal of this as AGI fuel any misconceptions about why I think that.

As for what AGI is? Well, the inability to describe that brings us full circle in this thread - I'll tell you for sure once I've seen it for the first time and have the power of hindsight to say what was missing. I think these models are the closest we've come, but it feels like there are at least one or two more "4o->o1" style architecture changes, ones less about increased model fitting and more about changing how the model arrives at an output, before we get to what I'd be willing to call AGI.

Who knows though, maybe some of those changes come along and it's closer but still missing some process to reason well enough to be AGI rather than a midway tool.


"Short of a more clever way what GP is saying is the goalposts must keep being moved until it's not so obvious the thing isn't AGI, not that the average human gets a certain score which is worse."

Best way of stating that I've heard.

The goalposts must keep moving until we understand enough about what is happening.

I usually pooh-pooh goalpost moving, but this makes sense.



