10+ years ago I expected we would get AI that would impact blue collar work long before AI that impacted white collar work. Not sure exactly where I got the impression, but I remember some "rising tide of AI" analogy and graphic that had artists and scientists positioned on the high ground.

Recently it doesn't seem to be playing out as such. The current best LLMs I find marvelously impressive (despite their flaws), and yet... where are all the awesome robots? Why can't I buy a robot that loads my dishwasher for me?

Last year this really started to bug me, and after digging into it with some friends I think we collectively realized something that may be a hint at the answer.

As far as we know, it took roughly 100M-1B years to evolve human-level "embodiment" (from single-celled organisms to humans), but only around 100k-1M years for humanity to evolve language, knowledge transfer, and abstract reasoning.

So it makes me wonder, is embodiment (advanced robotics) 1000x harder than LLMs from an information processing perspective?



> So it makes me wonder, is embodiment (advanced robotics) 1000x harder than LLMs from an information processing perspective?

Essentially, yes, but I would go further in saying that embodiment is harder than intelligence in and of itself.

I would argue that intelligence is a very simple and primitive mechanism compared to the evolved animal body, and the effectiveness of our own intelligence is circumstantial. We manage to dominate the world mainly by using brute force to simplify our environment and then maintaining and building systems on top of that simplified environment. If we didn't have the proper tools to selectively ablate our environment's complexity, the combinatorial explosion of factors would be too much to model and our intelligence would be of limited usefulness.

And that's what we see with LLMs: I think they model relatively faithfully what, say, separates humans from chimps, but they lack the animal library of innate world understanding which is supposed to ground intellect and stop it from hallucinating nonsense. They're trained on human language, which is basically the shadows in Plato's cave. They're very good at tasks that operate in that shadow world, like writing emails, or programming, or writing trite stories, but most of our understanding of the world isn't encoded in language, except very, very implicitly, which is not enough.

What trips us up here is that we find language-related tasks difficult, but that's likely because the ability evolved recently, not because they are intrinsically difficult (likewise, we find mental arithmetic difficult, but it is not intrinsically so). As it turns out, language is simple. Programming is simple. I expect that logic and reasoning are also simple. The evolved animal primitives that actually interface with the real world, on the other hand, appear to be much more complicated (but time will tell).


Nicely said. This all aligns with my intuition, with one caveat.

I think you and I are using different definitions of intelligence. I'm bought into Karl Friston's free energy principle and think it's intelligence all the way down. There is no separating embodiment and intelligence.

The LLM distinction is intelligence via symbols as opposed to embodied intelligence, which is why I really like your shadow world analogy. Without getting caught up in subtle differences in our ontologies, I agree wholeheartedly.


You're right, we probably have different ontologies. To me an intelligent system is a system which aims to realize a goal through modelling its environment and planning actions to bring about that intended state. That's more or less what humans do and I think that's more in line with the colloquial understanding of it.

There are basically two approaches to defining intelligence, I think. You can either define it in terms of capability, in which case a system that has no intent and does not plan can be more intelligent than one that does, simply by virtue of being more effective. Or you can define it in terms of mechanism: something is intelligent if it operates in a specific way. But it may then turn out to be the case that some non-intelligent systems are more effective than some intelligent systems. Or you can do both and assume that there is some specific mechanism (human intelligence, conveniently) that is intrinsically better than the others, which is a mistake people commonly make and is the source of a lot of confusion.

I tend to go for the second approach because I think it's a more useful framing to talk about ourselves, but the first is also consistent. As long as we know what the other means.


If intelligence is treated as a scale, should it be measured primarily by (a) the diversity of valid actions an entity can take combined with its ability to collect and process information about its environment and predict outcomes, or (b) only by its ability to collect and process information and predict outcomes?

In either case, the smallest unit of intelligence could be seen as a component of a two-field or particle interaction, where information is exchanged and an outcome is determined. Scaled up, these interactions generate emergent properties, and at each higher level of abstraction, new layers of intelligence appear that drive increasing complexity. Under such a view, a less intelligent system might still excel in a narrow domain, while a more intelligent system, effective across a broader range, might perform worse in that same narrow context.

Depending on the context of the conversation, I might go along with some cut-off on the scale, but I don't see why the scale isn't continuous. Maybe it has stacked s-curves though...

We just happen to exist at an interesting spot on the fractal that's currently the highest point we can see. So it makes sense we would start with our own intelligence as the idea of intelligence itself.


I think it's an issue of hierarchies and the Society of Mind (Minsky). If a human hand, or any animal's end effector, touches a hot stove, a lower-level process instantly pulls the hand/paw away from the heat. There are no doubt thousands of these 'smart body, no brain' interactions that take over in certain situations, conscious thinking not required.

Ken Goldberg shows that getting robots to operate in the real world using the methods that have been successful in getting LLMs to do things we consider smart -- ingesting huge amounts of training data -- seems unlikely to work. The vast gap between what little data a company like Physical Intelligence has vs. what GPT-5 uses is shown here (84 seconds in): https://drive.google.com/file/d/16DzKxYvRutTN7GBflRZj57WgsFN...

Ken advocates plenty of Good Old-Fashioned Engineering to help close this gap, and worries that demos like Optimus actually set the field back because they set expectations too high. Like the AI researchers who were shocked by LLMs' advances, it's possible something out of left field will close this training gap for robots. I think it'll be at least 5 more years before robots are among us as useful in-house servants. We'll see soon enough whether the LLM hype has spilled over too much into the humanoid robot domain.


> But it may then turn out to be the case that some non-intelligent systems are more effective than some intelligent systems.

That is surely the case in limited scopes. For example, even non-neural-net chess engines are better at chess than any human.

I think the fair way to compare neural networks with human intelligence is to limit their training to the number of games that a human professional can reasonably play in a lifetime. AlphaGo won't be much good after playing, let's say, ten thousand games, even starting from the corpus of existing human games.


>There is no separating embodiment and intelligence.

And yet whatever IQ you have, it can't make you just play the violin without actually having embodied practice first.


If you have sufficient motor control and dexterity, the amount of required practice should be approximately zero. Just calculate the required finger position and bow orientation, pressure, and velocity for optimal production of the desired sound and do that. That is not how humans perform physical tasks though.


> That is not how humans perform physical tasks though

is it not though? wouldn't it just be that our processing center isn't located completely in the skull as we typically think, but is extended to our spinal cord and nervous system? Something is being processed, you're just not conscious of the entire process. This is especially clear to me as a musician: as you're learning to play, you have to be absolutely aware of all of those processes until you can finally just let go and play!


You've captured a lot here with your shadow world summary. Very well done - I've been feeling this, and now you've turned it into words, and I'm pretty sure you're correct!


> We manage to dominate the world mainly by using brute force to simplify our environment and then maintaining and building systems on top of that simplified environment. If we didn't have the proper tools to selectively ablate our environment's complexity…

This is very interesting and I feel there is a lot to unpack here. Could you elaborate on this theory with a few more paragraphs (or books/blogs that elucidate it)? In what ways do we use brute force to simplify the environment, and are there not ways in which we use highly sophisticated, leveraged methods to simplify it? What proper tools allow us to selectively ablate complexity? Why does our intelligence only operate on simplified forms?

Also, what would convince you that symbolic intelligence is actually “harder” than embodied intelligence? To me the natural test is how hard it is for each one to create the other. We know it took a few billion years to go from embodied intelligence (ie organisms that can undergo evolution, with enough diversity to survive nearly any conditions on Earth) to sophisticated symbolic intelligence. What if it turns out that within 100 years, symbolic intelligence (contained in LLM like systems) could produce the insights to eg create new synthetic life from scratch that was capable of undergoing self-sustained evolution in diverse and chaotic environments? Would this convince you that actually symbolic intelligence is the harder problem?


Not OP, but several examples:

A. Instead of building a house on random terrain with random materials, we prefer to first flatten the site, then use standard materials (e.g. bricks), produced from a simple source (e.g. a large and relatively homogeneous deposit of clay).

B. For mental tasks, it's often said that a person can handle only about 7 items at a time (if you disagree, multiply by 2-3). But when you ride a bike you process many more inputs at the same time: you hear a car behind you, you see a person on the right, you feel your balance, you anticipate your direction, you squint your eyes if there's strong wind or sun on your face, you take a breath of air. On top of that, all the processes of your body adjust to support your riding: heart, liver, stomach...

C. “Spherical cows” in physics. (Google this if needed)


> Why does our intelligence only operate on simplified forms?

Part of the issue with discussing this is that our understanding of complexity is subjective and adapted to our own capabilities. But the gist of it is that the difficulty of modelling and predicting the behavior of a system scales very sharply with its complexity. At the end of the scale, chaotic systems are basically unintelligible. Since modelling is the bread and butter of intelligence, any action that makes the environment more predictable has outsized utility. Someone else gave pretty good examples, but I think it's generally obvious when you observe how "symbolic-smart" people think (engineers, rationalists, autistic people, etc.) They try to remove as many uncontrolled sources of complexity as possible. And they will rage against those that cannot be removed, if they don't flat out pretend they don't exist. Because in order to realize their goals, they need to prove things about these systems, and it doesn't take much before that becomes intractable.

One example of a system that I suspect to be intractable is human society itself. It is made out of intelligent entities, but as a whole I don't think it is intelligent, or that it has any overarching intent. It is insanely complex, however, and our attempts to model its behavior do not exactly have a good record. We can certainly model what would happen if everybody did this or that (aka a simpler humanity), but everybody doesn't do this and that, so that's moot. I think it's an illuminating example of the limitations of symbolic intelligence: we can create technology (simple), but we have absolutely no idea what the long term consequences are (complex). Even when we do, we can't do anything about it. The system is too strong, it's like trying to flatten the tides.

> To me the natural test is how hard it is for each one to create the other.

I don't think so. We already observe that humans, the quintessential symbolic intelligences, have created symbolic intelligence before embodied intelligence. In and of itself, that's a compelling data point that embodied is harder. And it appears likely that if LLMs were tasked to create symbolic intelligences, even assuming no access to previous research, they would recreate themselves faster than they would create embodied intelligences. Possibly they would do so faster than evolution, but I don't see why that matters, if they also happen to recreate symbolic intelligence even faster than that. In other words, if symbolic is harder... how the hell did we get there so quick? You see what I mean? It doesn't add up.

On a related note, I'd like to point out an additional subtlety regarding intelligence. Intelligence (unlike, say, evolution) has goals and it creates things to further these goals. So you create a new synthetic life. That's cool. But do you control it? Does it realize your intent? That's the hard part. That's the chief limitation of intelligence. Creating stuff that is provably aligned with your goals. If you don't care what happens, sure, you can copy evolution, you can copy other methods, you can create literally anything, perhaps very quickly, but that's... not smart. If we create synthetic life that eats the universe, that's not an achievement, that's a failure mode. (And if it faithfully realizes our intent then yeah I'm impressed.)


I think a lot of this is true, but not as critical as is being interpreted.

Compare the economics of purely cognitive AI to in-world robotics AI.

Pure cognitive: massive-scale systems for fast, frictionless and incredibly efficient deployment of cognitive systems and distribution of their benefits are already solved. On tap, even. Cloud computing and the Internet.

What is the amortized cost per task? Almost nothing.

In-world: The cost of extracting raw resources, parts chain, material process chain, manufacturing, distributing, maintaining, etc.

Then what is the amortized cost per task, for one robot?

Several orders of magnitude more expensive, per task! There is no comparison.

Doing that profitably isn’t going to be the norm for many years.

At what price does a kitchen robot make sense? Not at $1,000,000. "Only $100,000?" "Only $25,000?" "Only $10k?" Lower than that?

Compared to a Claude plan? One that many people still turn down just to use the free tier?
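Back-of-envelope, with every number made up purely for illustration:

    # Hypothetical amortized cost per task
    robot_price = 25_000          # USD (assumption)
    lifetime_years = 5            # useful life (assumption)
    maintenance_per_year = 1_000  # USD/year (assumption)
    robot_tasks_per_day = 5

    robot_total = robot_price + maintenance_per_year * lifetime_years
    robot_tasks = robot_tasks_per_day * 365 * lifetime_years
    print(f"robot: ${robot_total / robot_tasks:.2f} per task")    # ~$3.29

    llm_plan_per_month = 20       # USD subscription (assumption)
    llm_tasks_per_day = 50
    llm_cost = llm_plan_per_month * 12 / (llm_tasks_per_day * 365)
    print(f"LLM plan: ${llm_cost:.4f} per task")                  # ~$0.0132

Even with assumptions generous to the robot, that's a gap of a couple of orders of magnitude before you touch any of the AI problems.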

Long before general house-helper robots make any economic sense, we will have had walking, talking, socializing, profitable-to-build sex robots at higher price points for price-insensitive owners.

There are people who will pay high prices for that, when costs come down.

That will be the canary for general robotic servants or helpers.

The cost isn’t intelligence. There isn’t a particular challenge with in-world information processing and control. It’s the cost of the physical thing that processing happens in.

This is a purely economic problem. Not an AI problem at all.


It took about the same amount of time to evolve human-level intelligence as human-level mobility. Pretty much no other animal walks on two legs...


This is interesting to think about. It’s basically just birds and primates. Birds have an ancient evolutionary tree as they are dinosaurs, which did actually walk on two legs. But the gap between dinos and primates walking on two feet, I think, is tens of millions of years. So yea pretty long time.


This makes me think something else, though. Once we were able to reason about the physics behind the way things can move, we invented wheels. From there it's a few thousand years to steam engines and a couple hundred more years to jet planes and space travel.

We may have needed a billion years of evolution from a cell swimming around to a bipedal organism. But we are no longer speed limited by evolution. Is there any reason we couldn't teach a sufficiently intelligent disembodied mind the same physics and let it pick up where we left off?

I like the notion of the LLM's understanding being "shadows on the wall of Plato's cave," and language may be just that. But math and physics can describe the world much more precisely, and if you pair them with the linguistic descriptors, a wall shadow is not very different from what we perceive with our own senses and learn to navigate.


Note that wheels, steam engines, jet planes, spaceships wouldn't survive on their own in nature. Compared to natural structures, they are very simple, very straightforward. And while biological organisms are adapted to survive or thrive in complicated, ever-changing ecosystems, our machines thrive in sanitized environments. Wheels thrive on flat surfaces like roads, jet planes thrive in empty air devoid of trees, and so on. We ensure these conditions are met, and so far, pretty much none of our technology would survive without us. All this to say, we're playing a completely different game from evolution. A much, much easier game. Apples and oranges.

As for limits, in my opinion, there are a few limits human intelligence has that evolution doesn't. For example, intent is a double-edged sword: it is extremely effective if the environment can be accurately modelled and predicted, but if it can't be, it's useless. Intelligence is limited by chaos and the real world is chaotic: every little variation will eventually snowball into large scale consequences. "Eventually" is the key word here, as it takes time, and different systems have different sensitivities, but the point is that every measure has a half-life of sorts. It doesn't matter if you know the fundamentals of how physics work, it's not like you can simulate physics, using physics, faster than physics. Every model must be approximate and therefore has a finite horizon in which its predictions are valid. The question is how long. The better we are at controlling the environment so that it stays in a specific regime, the more effective we can be, but I don't think it's likely we can do this indefinitely. Eventually, chaos overpowers everything and nothing can be done.
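A toy illustration of that finite prediction horizon, using the chaotic logistic map (the cheapest chaotic system available - a toy, not physics; all constants are arbitrary):

    # Two simulations of the same "world", initial measurements
    # differing by one part in a billion
    x, y = 0.400000000, 0.400000001
    r = 3.9  # parameter in the chaotic regime
    for step in range(1, 101):
        x, y = r * x * (1 - x), r * y * (1 - y)
        if abs(x - y) > 0.1:
            print(f"trajectories disagree badly after {step} steps")
            break

Past that point, one trajectory tells you nothing about the other, no matter how exact the underlying rule is. And better initial measurements only extend the horizon logarithmically.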

Evolution, of course, having no intent, just does whatever it does, including things no intelligence would ever do because it could never prove to its satisfaction that it would help realize its intent.


Okay, but (1) we don't need to simulate physics faster than physics to make accurate-enough predictions to fly a plane, in our heads, or build a plane on paper, or to model flight in code. (2) If that's only because we've cleared out the trees and the Canada Geese and whatnot from our simplified model and "built the road" for the wheels, then necessity is also the mother of invention. "Hey, I want to fly but I keep crashing into trees" could lead an AI agent to keep crashing, or model flying chainsaws, or eventually something that would flatten the ground in the shape of a runway. In other words, why are we assuming that agents cannot shape the world (virtual, for now) to facilitate their simplified mechanical and physical models of "flight" or "rolling" in the same way that we do?

Also, isn't that what's actually scary about AI, in a nutshell? The fact that it may radically simplify our world to facilitate e.g. paper clip production?


> we don't need to simulate physics faster than physics to make accurate-enough predictions to fly a plane

No, but that's only a small part of what you need to model. It won't help you negotiate a plane-saturated airspace, or avoid missiles being shot at you, for example, but even that is still a small part. Navigation models won't help you with supply chains and acquiring the necessary energy and materials for maintenance. Many things can -- and will -- go wrong there.

> In other words, why are we assuming that agents cannot shape the world

I'm not assuming anything, sorry if I'm giving the wrong impression. They could. But the "shapability" of the world is an environment constraint, it isn't fully under the agent's control. To take the paper clipper example, it's not operating with the same constraints we are. For one, unlike us (notwithstanding our best efforts to do just that), it needs to "simplify" humanity. But humanity is a fast, powerful, reactive, unpredictable monster. We are harder to cut than trees. Could it cull us with a supervirus, or by destroying all oxygen, something like that? Maybe. But it's a big maybe. Such brute force requires a lot of resources, the acquisition of which is something else it has to do, and it has to maintain supply chains without accidentally sabotaging them by destroying too much.

So: yes. It's possible that it could do that. But it's not easy, especially if it has to "simplify" humans. And when we simplify, we use our animal intelligence quite a bit to create just the right shapes. An entity that doesn't have that has a handicap.


>Also, isn't that what's actually scary about AI, in a nutshell? The fact that it may radically simplify our world to facilitate e.g. paper clip production?

No, it's more about massive job losses and people left to float alone, mass increase in state control and surveillance, mass brain rot due to AI slop, and full deterioration of responsibility and services through automation and AI as a "responsibility shield".


Something that isn't obvious when we're talking about the invention of the wheel: we aren't really talking about the round thing itself, we're talking about the invention of the axle, which allowed mounting a stationary cart on moving wheels.


And the roadways (later, rails) on which it operates.

Meanwhile, entire civilizations in South America developed with little to no use of wheels, because the terrain was unsuited to roads.


It wasn't actually just terrain. It was the availability of draft animals, climate conditions and, most importantly... economics.

Wheeled vehicles aren't inherently better in a natural environment unless they're more efficient economically than the alternatives: pack animals, people carrying cargo, boats, etc.

South America didn't have good draft animals and lots of Africa didn't have the proper economic incentives: Sahara had bad surfaces where camels were absolutely better than carts and sub Saharan Africa had climate, terrain, tsetse flies and whatnot that made standard pack animals economically inefficient.

Humans are smart and lazy; they will do the easiest thing that lets them achieve their goals. This sometimes leads them to local maxima. That's why many "obvious" inventions took thousands of years to appear (the cotton gin, for example).


Yes, only humans, birds, sifakas, pangolins, kangaroos, and giant ground sloths. Only those six groups of creatures, and various lizards including the Jesus lizard which is bipedal on water, just those seven groups and sometimes goats and bears.


I get what you mean; that's why the "basically" is there. Most of the animals on your list, kangaroos and some lemurs being the exceptions, do not move around primarily as bipeds. The ability to walk on two legs occasionally is different from genuinely having two legs and two arms.


And once every while, my cat.


Human-level mobility however is not much to write home about. Just one more variation of the many types seen in animals.

Human level intelligence is, otoh, qualitatively and quantitatively a bigger deal.


I wouldn't agree completely. Being bipedal frees up the hands for... anything, really.

We're better than most animals because we have tools. We have great tools because we have hands.


Birds? Bears whose front paws got injured? https://youtu.be/kcIkQaLJ9r8


Birds didn't develop hands, neither did bears. Also bears can't walk 100km on their hind legs, but we can.


Talking about "time to evolve something" seems patently absurd and unscientific to me. All of nature evolved simultaneously. Nature didn't first make the human body and then go "that's perfect for filling the dishwasher, now to make it talk amongst itself" and then evolve intelligence. It all evolved at the same time, in conjunction.

You cannot separate the mind and the body. They are the same physiological and material entity. Trying anyway is of course classic western canon.


>Nature didn't first make the human body and then go "that's perfect for filling the dishwasher, now to make it talk amongst itself" and then evolve intelligence. It all evolved at the same time, in conjunction.

Nature didn't make decisions about anything.

But it also absolutely didn't "all evolved at the same time, in conjunction" (if by that you mean all features, regarding body and intelligence, at the same rate).

>You cannot separate the mind and the body. They are the same physiological and material entity

The substrate is. Doesn't mean the nature of abstract thinking is the same as the nature of the body, in the same way the software as algorithm is not the same as hardware, even if it can only run on hardware.

But to the point: this is not about separating the "mind and the body". It's about how you can have the humanoid form and all the typical human body functions for millions of years before you get human-level intelligence, which only arrived after much further evolution.

>Trying anyway is of course classic western canon.

It's also classic eastern canon, and several others besides.


> The substrate is. Doesn't mean the nature of abstract thinking is the same as the nature of the body, in the same way the software as algorithm is not the same as hardware, even if it can only run on hardware.

In this you are positing the existence of a _soul_ that exists separately from the body, and is portable amongst bodies. Analogous to how an algorithm (disembodied software) exists outside of the hardware and is portable amongst it (by embodying it as software).

I don't agree with that at all, and it's impossible to know if you're right, but I can at least understand why you have a hard time with my argument and the east-west difference, if the tradition of the existence of a soul is that "obvious" to you.


I think whether it's "portable amongst bodies" is orthogonal. A specific consciousness of person X can very well only exist within the specific body of person X, and my argument still remains the same (not saying it's right, just that it's not premised on the constraint that there's a soul and it's independent/portable being true).

The argument is that whether consciousness is independent of a specific body or not, it's still of a different nature.

The consciousness part uses the body (e.g. nervous system, neurons, etc.), but its nature is informational exchange, and its essence is not in the construction of the body as a physical machine (though that's its base), but in the stored "weights" encoding memories and world-knowledge.

Same as how, with a CPU, the specific program it runs is not defined by the CPU but by the memory contents (data, variables, and logic code). It might as well run on an abstract CPU, or one made of water tubes or billiard balls.

Of course in our case, consciousness runs on a body - and only a specific body - and can't exist without one (the same way a program can't exist as a running program without a CPU). But that doesn't mean it's of the same nature as the body - just that the body is its substrate.


Plato's "Allegory of the cave" was uninteresting and uninformative when I first read it more than 50 years ago. It remains so today.

https://en.wikipedia.org/wiki/Allegory_of_the_cave

Also, other than in sculpture/dentistry/medicine, I find "ablation" not to be a particularly insightful metaphor either. Although I see ablation's application to LLMs, I simply had to laugh when I first read about it: I envisioned starting with a Greyhound bus and blowing off parts until it was a Lotus 7 sports car!8-). Good luck with that! Kind of like fixing the TV set by kicking it (but it _does_ work sometimes!).

Perhaps we should refrain somewhat from applying metaphors/simile/allegories to describe LLMs relative to human intelligence unless they provide some insight of significant value.


>Plato's "Allegory of the cave" was uninteresting and uninformative when I first read it more than 50 years ago. It remains so today.

Anything can be uninteresting and uninformative when one doesn't see its interestingness or can't grok its information.

It has, however, stood for millennia as a great device for describing multiple layers of abstraction, deeper reality vs. appearance, and so on, with utility as such in countless domains.


No. The Allegory is a fragment of a poor unfinished story and little more. You don't need it to explain "multiple layers of abstractions, deeper reality vs appearance" as you say. In fact, you don't need it for anything at all except to explain Plato's "Allegory of the cave". Sheesh.

coldtea says "...with utility as such in countless domains." So when's the last time you referred to the "Allegory of the cave" in your day, other than on HN?


>So when's the last time you referred to the "Allegory of the cave" in your day, other than on HN?

Several times. But it was with broadly educated people, not over-specialized one-dimensional ones.


I don’t think that’s what ablation is about. It’s more like blowing parts off a bus until it ceases to be a bus. Then you find the minimal set of bus parts required to still be a bus, and that’s an indication that those parts are important to the central task of being a bus.
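In ML terms, a toy ablation loop looks like this (the component names and the scoring function are placeholders, not a real model):

    # Toy ablation study: disable one component at a time and measure
    # how much performance drops; big drops mark the load-bearing parts.
    def evaluate(disabled):
        # Stand-in for a real benchmark harness (numbers invented)
        impact = {"attention": 0.35, "feedforward": 0.20,
                  "layernorm": 0.10, "positional": 0.15}
        return 0.90 - sum(impact[c] for c in disabled)

    baseline = evaluate(disabled=[])
    for component in ("attention", "feedforward", "layernorm", "positional"):
        drop = baseline - evaluate(disabled=[component])
        print(f"without {component}: score drops by {drop:.2f}")

Systematic, not random: you learn which parts the "bus" can't lose.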


taneq says "I don't think that's what ablation is about. It's more like blowing parts off a bus until it ceases to be a bus."

Different people have different goals. You want some form of minimal bus and I want a Lotus 7. There's no guarantee either of us reach our goal.

Ablation is about disassembling something randomly, whether little by little or on an arbitrary scale until [SOMETHING INTERESTING OR DESIRABLE HAPPENS].

https://en.wikipedia.org/wiki/Ablation_(artificial_intellige...

Ablation is laughable but sometimes useful. It is also easy, mostly brainless, NOT guaranteed to provide any useful information (so you've an excuse for the wasted resources), and occasionally provides insight. It's a good tool for software engineers who have no (or seek no) understanding of their system, so I think of ablation as a "last resort" solution (another being to randomly modify code until it "works") that I disdain.

But I'm old so I'm probably wrong! Burn those CPU towers down, boys and girls!


> 10+ years ago I expected we would get AI that would impact blue collar work long before AI that impacted white collar work.

We did.

Like, to the point that the AI that radically impacted blue collar work isn't even part of what is considered “AI” any more.


I think it's Benedict Evans who frequently posts about 'blue collar' AI work not looking like humanoid robots, but instead like Amazon fulfillment centers keeping track of millions of individual items, or tomato-picking robots with MV cameras keeping only the ripe ones as they pick at absurd rates.

There are endless corners of the physical world right now where it's not worth automating a task if you need to assign an engineer and develop a software competency as a manufacturing or retail company, but would absolutely be worth it if you had a generalizable model that you could point-and-shoot at them.


Or a generalized model to develop them in a virtual sandbox before deploying them physically, which I think is more likely.


I think the bottleneck for this is still the cost of the robot's physical hardware, and its maintenance.

You need a fairly robust one that needs little maintenance, with a multitude of good sensors and precise actuators, to be even remotely useful for a sufficiently wide range of tasks (so that you get economies of scale). None of that comes cheap.


Part of the answer to this puzzle is that your dishwasher itself is a robot that washes dishes, and has had enormous impact on blue collar jobs since its invention and widespread deployment. There are tons of labor saving devices out there doing blue collar work that we don't think of as robots or as AI.


Not a robotics guy, but to the extent that the same fundamentals hold --

I think it's a degrees-of-freedom question. Given the (relatively) low conditional entropy of natural language, there aren't actually that many degrees of (true) freedom. In the real world, on the other hand, there are massively more degrees of freedom, both in general (3 dimensions, 6 degrees of movement per joint, M joints, continuous vs. discrete space, etc.) and also given the path dependence of actions, the non-standardized nature of actuators, kinematics, etc.

All in, you get crushed by the curse of dimensionality. Given N degrees of true freedom, you need O(exp(N)) data points to achieve the same performance. Folks do a bunch of clever things to address that dimensionality explosion, but I think the overly reductionist point still stands: although the real world is theoretically verifiable (and theoretically could produce infinite data), in practice we currently have exponentially less real-world data for an exponentially harder problem.
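A toy sketch of that O(exp(N)) blow-up, assuming the dumbest possible strategy of grid-sampling each degree of freedom independently (resolution and joint counts made up):

    # Samples needed to cover every joint-configuration combination
    resolution = 10  # distinguishable positions per degree of freedom
    for n_dof in (1, 3, 7, 20, 40):
        print(f"{n_dof:>2} DoF -> {float(resolution) ** n_dof:.0e} samples")

Real methods are far smarter than a grid, but that exponent is what they are all fighting.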

Real roboticists should chime in...


This understates the complexity of the problem. I have built a career modeling/learning entity behavior in the physical world at scale. Language is almost a trivial case by comparison.

Even the existence of most relationships in the physical world can only be inferred, never mind dimensionality. The correlations are often weak unless you are able to work with data sets that far exceed the entire corpus of all human text, and sometimes not even then. Language has relatively unambiguous structure that simply isn't the norm in real space-time data models. In some cases we can't unambiguously resolve causality and temporal ordering in the physical world. Human brains aren't fussed by this.

There is a powerful litmus test for things "AI" can do. Theoretically, indexing and learning are equivalent problems. There are many practical data models for which no scalable indexing algorithm exists in literature. This has an almost perfect overlap with data models that current AI tech is demonstrably incapable of learning. A company with novel AI tech that can learn a hard data model can demonstrate a zero-knowledge proof of capability by qualitatively improving indexing performance of said data models at scale.

Synthetic "world models" so thoroughly nerf the computer science problem that they won't translate to anything real.


But we don't need to know all the things that could happen if M joints moved in every possible way at the same time. We operate within normal constraints. When you see someone trip on a sidewalk and recover before falling on their face, that's still a physical system taking signals and suggesting corrections that could be simulated in a relatively straightforward Newtonian virtual reality, and trained on a billion times with however many virtual joints and actuators.
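To make that concrete, here's a minimal sketch of such a correction loop: a PD controller balancing an inverted pendulum, the standard toy model of not falling on your face. All gains and constants are invented:

    import math

    g, length, dt = 9.8, 1.0, 0.01  # gravity, limb length, timestep
    kp, kd = 60.0, 12.0             # hand-tuned feedback gains (assumption)
    theta, omega = 0.3, 0.0         # tipped ~17 degrees, as after a stumble

    for _ in range(300):            # 3 simulated seconds
        reflex = -kp * theta - kd * omega  # corrective "muscle" signal
        alpha = (g / length) * math.sin(theta) + reflex
        omega += alpha * dt
        theta += omega * dt
    print(f"angle after recovery: {theta:.4f} rad")  # ~0, upright again

Two state variables and two gains - no world model, no cortex. The hard part in reality is sensing and actuation, not the control law for cases like this.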

In terms of "world building", it makes sense for the "world" to not be dreamed up by an AI, but to have hard deterministic limits to bump up against in training.

I guess what I mean is that humans in the world constantly face a lot of conditions that can lead to undefined behavior as well, but 99% of the time not falling on your face is good enough to get you a job washing dishes.


In other words, self driving cars and robot vacuum cleaners cannot exist. Hmm.


LOL. Both of those are very limited and work in 2D spaces in highly constrained environments especially designed for them.


Also not a robotics guy, but that all sounds right to me...

What I do have deep experience in is market abstractions and jobs to be done theory. There are so many ways to describe intent, and it's extremely hard to describe intent precisely. So in addition to all the dimensions you brought up that relate to physical space, there is also the hard problem of mapping user intent to action with minimal "error", especially since the errors can have big consequences in the physical world. In other words, the "intent space" also has many dimensions to it, far beyond what LLMs can currently handle.

On one end of the spectrum of consequences, the robot loads my dishwasher with so much overlap that a bunch of the dishes don't get cleaned (what I really want is for the dishes to be clean, not for the dishes to be in the dishwasher); on the other end, we get the robot that overpowers humanity and turns the universe into paperclips.

So maybe we have to master LLMs and probably a whole other paradigm before robots can really be general purpose and useful.


As far as I can see, classic methods (the kind used in teaching children) could create at least an order of magnitude more data than we have now, just by paraphrasing text (classic NLP), but it depends on the language (I'll try to explain).

Text really does have a lot of degrees of freedom, but it depends on the language, and even more on the type of alphabet - modern English, with its phonetic alphabet, is the worst choice, because it is the simplest: nearly nobody uses second or third hidden meanings (I've heard of 2-3 to 5-6 meanings depending on the source); hieroglyphic languages are much more information-rich (10-22 meanings); and, interestingly, phonetic languages in totalitarian countries (like Russian) are also much richer (8-12 meanings), because speakers learned to hide meanings from the government to avoid punishment.

Language difference (more dimensions) could be an explanation for China's current achievements, superior to the West's, and it could also be a hint at how to boost Western achievements - I mean, use more scientists from Eastern Europe and pay more attention to Eastern European languages.

For 3D robots, I see only one way - a computationally simulated environment.


Autonomous vehicles are an interesting subset.

Even though the system rules and I/O are tightly constrained, they're still struggling to match human performance in an open-world scenario, after a gigantic R&D investment with a crystal clear path to return.

Fifteen years ago I thought that'd be a robustly solved problem by now. It's getting there, but I think I'll still need to invest in driving lessons for my teenage kids. Which is pretty annoying, honestly: expensive, dangerous for a newly qualified driver, and a massive waste of time that could be used for better things. (OK, track days and mountain passes are fun. 99% of driving is just boring, unnecessary suckage).

What's notable: AVs have vastly better sensors than humans, masses of compute, potentially 10X reaction speed. What they struggle with is nuance and complexity.

Also, AVs don't have to solve the exact same problems as a human driver. For example, parking lots: they don't need to figure out echelon parking or multi-storey lots, they can drop their passengers and drive somewhere else further away to park.


> in practice we currently have exponentially less real-world data for an exponentially harder problem

Is that where learning comes in? Any actual AGI machine will be able to learn. We should be able to buy a robot that comes ready to learn and we teach it all the things we want it to do. That might mean a lot of broken dishes at first, but it's about what you would expect if you were to ask a toddler to load your dishes into the dishwasher.

My personal bar for when we reach actual AGI is when it can be put in a robot body that can navigate our world, understand spatial relationships, and can learn from ordinary people.


We think this because ten years ago we were all having our minds blown by DeepMind's game playing achievements and videos of dancing robots and thought this meant blue collar work would be solved imminently.

But most of these solutions were more crude than they let on, and you wouldn't really know unless you were working in AI already.

Watch John Carmack's recent talk at Upper Bound if you want to see him destroy like a trillion dollars' worth of AI hype.

https://m.youtube.com/watch?v=rQ-An5bhkrs&t=11303s&pp=2AGnWJ...

Spoiler: we're nowhere close to AGI


> But most of these solutions were more crude than they let on, and you wouldn't really know unless you were working in AI already.

Same with LLMs. Despite having seen this play out before, and being aware of this, people are falling for it again.


Thank you for this update. I vividly remember, a few years ago, the excitement of John Carmack announcing he was retreating into his cave to do some deep work on AGI, pushing the boundaries of current AI research. I truly appreciate Carmack's intellectual honesty now in announcing "yeah, no, LLMs are not the way to go to recreate anything remotely close to human intelligence." In fact, and I quote him, "we do not even have a line of sight to [the fundamentals of intelligence]."

I'm honestly relieved that one of the brightest minds in computing, with all the resources and desire to create actual super-intelligences, has had to sharply temper his expectations.


I don't think that quote from Carmack represents some deeply considered conclusion. He started off his efforts with embodiment. He either never considered LLMs a path towards AGI, or thought he didn't personally have anything to contribute to LLMs (he talked about it early on in his journey but I don't remember the specifics). He didn't spend a year investigating LLMs and then decide that they weren't the path to AGI. The point is that he has no special insight regarding LLMs' relationship to AGI, and it's misleading to imply that his current effort towards building AGI that eschews LLMs is an expert opinion.


Yes, I meant to say that, for Carmack, no type of modern AI research has figured out the path to actual general intelligence. I just didn't want to use the meaningless "AI" buzzword, and these days all the focus and money is on large language models, especially when talking about the end goal of AGI.


> 10+ years ago I expected we would get AI that would impact blue collar work long before AI that impacted white collar work. Not sure exactly where I got the impression, but I remember some "rising tide of AI" analogy and graphic that had artists and scientists positioned on the high ground.

The moment you strip away the magical thinking and the humanization ("hallucinations" rather than bugs), what you realize is that this is just progress. Ford in the 1960s putting in the first robot arms vs. auto manufacturing today. The phone: from switchboard operators, to mechanical switching, to digital, to... (I think the phone is in some odd hybrid era with text, but only time will tell). Draftsmen in the 1970s all replaced by AutoCAD by the '90s. Go further back: in 1920, 30 percent of Americans were farmers; today that's less than 2 percent.

Humans, on very human scales are very good at finding all new ways of making ourselves "busy" and "productive".


The big robot AI issue is: no data!

There is a lot of high-quality text from diverse domains, and a lot of audio, images and video around. The largest robotics datasets are absolutely pathetic in size compared to that. We didn't collect or stockpile the right data in advance. Embodiment may be hard by itself, but doing embodiment in this data-barren wasteland is living hell.

So you throw everything but the kitchen sink at the problem. You pre-train on non-robotics data to squeeze transfer learning for all it's worth, you run hard sims and a hundred flavors of data augmentation, you get hardware and set up actual warehouses with test benches where robots try their hand at specific tasks to collect more data.
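One of those sim flavors is domain randomization: re-roll the simulator's physics every episode so the policy can't overfit one idealized world. A minimal sketch (parameter names, ranges, and the training hook are all invented):

    import random

    def sample_sim_params():
        # Re-rolled per training episode (all ranges hypothetical)
        return {
            "friction":     random.uniform(0.4, 1.2),
            "object_mass":  random.uniform(0.1, 2.0),   # kg
            "motor_delay":  random.uniform(0.00, 0.05), # seconds
            "camera_noise": random.uniform(0.0, 0.02),  # sensor noise stddev
        }

    for episode in range(3):
        params = sample_sim_params()
        print(f"episode {episode}: {params}")
        # run_episode(policy, make_sim(**params))  # hypothetical hook

The hope is that the real world ends up looking like just one more draw from that distribution. In practice, the sim-to-real gap is exactly where the "meh" performance comes from.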

And all of that combined only gets you to "meh" real world performance - slow, flaky, fairly brittle, and on relatively narrow tasks. Often good enough for an impressive demo, but not good enough to replace human workers yet.

There's a reason why a lot of those bleeding edge AI powered robots are designed for and ship with either teleoperation capabilities, or demonstration-replay capabilities. Companies that are doing this hope to start pushing units first, and then use human operators to start building up some of the "real world" datasets they need to actually train those robots to be more capable of autonomous operation.

Having to deal with Capital H Hardware is the big non-AI issue. You can push ChatGPT to 100 million devices, as long as you have a product people want to use for the price of "free", and the GPUs to deal with inference demand. You can't materialize 100 million actual physical robot bodies out of nowhere for free, GPUs or no GPUs. Scaling up is hard and expensive.


> And all of that combined only gets you to "meh" real world performance - slow, flaky, fairly brittle, and on relatively narrow tasks. Often good enough for an impressive demo, but not good enough to replace human workers yet.

Sounds like LLMs to me.


It's like GPT-3.5 - a proof-of-concept tech demo more than a product.

I don't think further improvements are impossible, not at all. They're just hard to get at.


Embodiment is 1000x harder from a physical perspective.

Look at how hard it is for us to make reliable laptop hinges, or at the articulated car door handle trend (started by Tesla), where the handles constantly break.

These are simple mechanisms compared to any animal or human body. Our bodies last up to 80-100 years thanks not just to constant regeneration but to organic super-materials that rival anything synthetic in durability within their spec range. Nature is full of this, like spider silk that is stronger than steel by weight, or joints that can take repeated impacts for decades. This is what hundreds of millions to billions of years of evolution gets you.

We can build robots this good but they are expensive, so expensive that just hiring someone to do it manually is cheaper. So the problem is that good quality robots are still much more expensive than human labor.

The only areas where robots have replaced human labor are where the economics work, like huge-volume manufacturing, or where humans can't easily go or can't perform. The latter includes tasks like lifting and moving things thousands of times larger than humans can, or environments like high temperatures, deep space, the bottom of the ocean, radioactive environments, etc.


The problem is not the robot loading the dishwasher, it is the dishwasher. The dishwasher (and general kitchen electronics) industry has not innovated in a long time.

My prediction is a new player will come in and vertically integrate these currently disjoint industries and products. The tableware used should be compatible with the dishwasher, the packaging of my groceries should be compatible with the cooking system. Like a mini-factory.

But current vendors have no financial incentive to do so, because if you take a step back, the whole notion of filling one room of your apartment with random electronics just to cook a meal once in a blue moon is deeply inefficient. End-to-end food automation is coming to the restaurant business, and I hope it pushes the price of meals so far down that having a dedicated room for a kitchen in the apartment is simply not worth it.

That's the "utopia" version of things.

In reality, we see prices for fast food (the most automated food business) going up while quality is going down. Does it make the established players more vulnerable to disruption? I think so.


This exists already in the form of "ready meals", a.k.a. TV dinners. Fast food shops are already substantially mechanised; huge efforts have been made to robotize cooking, but people are still cheaper to hire. It's still nowhere near the quality of home-cooked food.


Yes, there are a lot of garbage microwave food offerings, especially popular with the US population. As a European I'm talking about quality food made with an automated process and end-to-end automation, including ingredient procurement and cleanup.

Not in competition with trash food but with proper food and local ingredients.


> the whole notion of putting one room of your apartment full with random electronics just to cook a meal once in a blue moon is deeply inefficient

You don't use your kitchen? After the rooms we sleep in, the kitchen is probably the most used space in my home. We are planning an upcoming renovation of our home and the kitchen is where we plan on spending the most money.

> The tableware used should be compatible with the dishwasher

Aside from non-dishwasher safe items, what tableware is incompatible with a dishwasher?


Yes, of course I use it a lot. It is a great hobby. But I only use it because it is kind of forced upon us. It's just so inefficient nowadays. Cooking used to be for the whole homestead or for a large family. Now it is mostly only for the immediate family. All the machines are not utilized properly. When people discussed car sharing it was exactly the same argument, and I feel it also applies to kitchens.

With the "tableware" argument I meant something like a standardized (magnetic?) adapter for grabbing plates, forks and knives so they can easily be moved by machines/robots.

I feel a company like Ikea is perfectly set up to make this idea a reality, but they'll never do so because they make much more money when every single household buys all these appliances and items for their own kitchen.

Just from the perspective of a single household in a densely populated city I think it'd be nice to have freshly cooked, reproducibly prepared meals with high-quality ingredients available to me. Like an automated soup kitchen with cleanup. Without all the layers of plastic wrapping needed to move produce from large-scale distributors into single-household fridges and so on.


I think what a lot of people missed when they were talking about shared cars a few years ago is that people seem to mostly like their cars. They spend far more on them than they need to. The average price for a new car is almost $50k now when a vehicle costing half that would satisfy most people's needs.

I'm guessing people mostly overspend on kitchens as well. When our renovation happens, I'm sure we will and I'll feel pretty good about it.

For cars and kitchens, utilization considerations seem to be ranked way, way below things like comfort and convenience and beauty.


> 10+ years ago I expected we would get AI that would impact blue collar work long before AI that impacted white collar work.

I'm not sure where people get this impression from, even back decades ago. Hardware is always harder than software. We had chess engines in the 20th century but a robotic hand that could move pieces? That was obviously not as easy because dealing with the physical world always has issues that dealing with the virtual doesn't.


Robots are only harder because they have expensive hardware. We already have robots that can load dishwashers and do other manual work but humans are cheaper so there isn't much of a market for them.

The rising tide idea came from a 1997 paper by Moravec. Here's a nice graphic and subsequent history https://lifearchitect.ai/flood/

Interestingly, Moravec also stated: "When the highest peaks are covered, there will be machines that can interact as intelligently as any human on any subject. The presence of minds in machines will then become self-evident." We pretty much have those today so by 1997 standards, machines have minds, yet somehow we moved the goalposts and decided that doesn't count anymore. Even if LLMs end up being strictly more capable than every human on every subject, I'm sure we'll find some new excuse why they don't have minds or aren't really intelligent.


> Interestingly, Moravec also stated: "When the highest peaks are covered, there will be machines that can interact as intelligently as any human on any subject. The presence of minds in machines will then become self-evident"

> We pretty much have those today so by 1997 standards, machines have minds, yet somehow we moved the goalposts and decided that doesn't count anymore

What you describe as "moving the goalposts" could also just be explained as simply not meeting the standard of "as intelligently as any human on any subject".

Even in the strongest possible example of LLMs' strengths - applying their encyclopedic knowledge and their (more limited) ability to apply that knowledge to a given subject - I don't think they meet that bar. Especially if we're comparing to a human over a time period greater than 30 minutes or so.



