#RoboticFoundationModels

Tero Keski-Valkama (tero@rukii.net)
2026-01-27

Multi-agentic foundation models are important for #robotics and #automation in negotiated and adversarial settings such as #traffic and #warfare.

But how do we implement them? I have previously drafted a data-centric architecture for decomposing agentic representations for #UniversalEmbodiment in a GitHub repository.

But LLMs have already internalized multi-agentic representations, so why can't we utilize them directly? For example, in text you can easily ask an LLM to describe all the persons or agents present in a scene and their intents.

We can and we must certainly utilize these! But these representations aren't grounded.

What we need to do is craft robotic foundation model training data around scenarios where multiple agents are present.

First, start acausally from what ultimately happened: how was the scenario negotiated between the participants, who drove first, and what attack and evasive patterns were used?

Since we then know what happened, we can go back in time and ask the foundation model to identify all the participants in the feed, and complete their intentions using information from the ultimate outcome.
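A minimal sketch of what such hindsight data crafting could look like, assuming a recorded episode format and a vision-language annotator; the episode fields and the `vlm` methods are hypothetical, not an existing API:

```python
from dataclasses import dataclass, field


@dataclass
class AgentAnnotation:
    """Hindsight annotation for one participant in a recorded episode."""
    agent_id: str
    description: str      # e.g. "cyclist approaching from the left"
    inferred_intent: str  # intent completed from the known outcome
    outcome: str          # what this agent ultimately did


@dataclass
class TrainingExample:
    """One multi-agent training example grounded in raw sensor data."""
    frames: list    # camera / lidar frames up to time t
    controls: list  # ego control signals up to time t
    agents: list = field(default_factory=list)


def craft_example(episode, t, vlm):
    """Build a training example by working backwards from the outcome.

    `episode` is a full recorded scenario, `t` is the cut-off time, and
    `vlm` is a vision-language model used as an annotator. All of these
    interfaces are assumptions for illustration only.
    """
    outcome_summary = vlm.summarize(episode.frames)    # what ultimately happened
    agents = vlm.identify_agents(episode.frames[:t])   # who is present at time t
    annotations = [
        AgentAnnotation(
            agent_id=a.id,
            description=a.description,
            # Complete the agent's intent at time t using knowledge of the outcome.
            inferred_intent=vlm.infer_intent(a, outcome_summary),
            outcome=vlm.describe_outcome(a, episode.frames),
        )
        for a in agents
    ]
    return TrainingExample(
        frames=episode.frames[:t],
        controls=episode.controls[:t],
        agents=annotations,
    )
```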

The foundation model can then utilize all the language-space knowledge it has about multi-agent environments, while also anchoring it to the visual and control signals present in the training data.

This allows the model not only to answer questions about what each participant intends to do, but also to anchor this to multi-modal sensory information and to project embodiment-related control intents onto all the participants in the scenario, not only the ego agent.

The ego agent becomes just a special case in robotic control; the model should learn to generalize and project control intents onto all agents present in the data.
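One way to express this in a training objective is to make ego occupy one agent slot like any other. The sketch below assumes per-agent control-intent targets produced by the hindsight labeling above; the tensor shapes and loss choice are illustrative assumptions, not a prescribed implementation.

```python
import torch
import torch.nn.functional as F


def multi_agent_control_loss(pred_intents, target_intents, agent_mask):
    """Control-intent loss applied uniformly to every agent in the scene.

    pred_intents:   (batch, num_agents, horizon, control_dim) model outputs
    target_intents: (batch, num_agents, horizon, control_dim) hindsight targets
    agent_mask:     (batch, num_agents), 1 for agents present in the scene

    The ego agent occupies one slot like any other; nothing in the loss
    singles it out, so the model learns to project control intents onto
    every participant it perceives.
    """
    per_element = F.mse_loss(pred_intents, target_intents, reduction="none")
    per_agent = per_element.mean(dim=(-1, -2))          # (batch, num_agents)
    masked = per_agent * agent_mask
    return masked.sum() / agent_mask.sum().clamp(min=1)
```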

Ultimately this allows the foundation model to learn from the perceived and projected experiences of others: to imitate, or deliberately not imitate, what it has seen other agents do.

It's all about crafting data, not really about sophisticated model architectures.

#RoboticFoundationModels #FoundationModels #PhysicalAI #AI #AGI

Tero Keski-Valkama (tero@rukii.net)
2026-01-01

It's not really about structured versus unstructured environments for #robots anymore. It's static versus agentic.

Robots in the real world will encounter other agents. Autonomous cars will need to negotiate with all kinds of other road users, including cats, which are everywhere in Spain at least. There was a video from East Asia where an old lady was drying her vegetables on the road and an autonomous car insisted on driving over them while the lady tried her best to defend them.

So, for any autonomous robot "in the real world" the true challenge is no longer that there are no standard grasping surfaces and items aren't in predefined places. Those are solved problems.

The challenge is in agentic environments where the system needs to understand the other living or at least moving entities and their objectives to appropriately navigate the inherently social situations.

This isn't only about cats trying to trip humanoid robots on stairs. It's also about non-living things like fire. Humans model fire psychologically as an entity with an intent. Hence they are evolutionarily adapted to keeping a fire burning, or to limiting its destruction by putting it out.

Human psychology is very Aristotelian in the way it models heavy things as "wanting" to go down. Robotic psychology will need a similar understanding to be able to negotiate, guide and harness dynamic entities in the world effectively.

For these purposes we will need to replace static world models with agentic world models which properly accommodate non-ego agents and non-ego intents in the world. What's cool about that is that it will also enable a model to learn from third-party experience, which is always more abundant than ego experience. Monkey-see-monkey-do, or in some cases learn to absolutely not do.
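One way to read the difference is as an interface change. The class names below are hypothetical and only illustrate the contrast between a static world model and an agentic one that models every participant's actions:

```python
from typing import Protocol


class StaticWorldModel(Protocol):
    """Predicts how the world changes in response to ego actions only."""

    def predict(self, state, ego_action):
        """Return the next world state given the ego agent's action."""
        ...


class AgenticWorldModel(Protocol):
    """Accommodates non-ego agents and non-ego intents explicitly."""

    def infer_intents(self, state):
        """Return a mapping from each perceived agent to its inferred intent."""
        ...

    def predict(self, state, actions):
        """Return the next state given actions for all agents, ego included.

        Because every agent's action is modelled, the same machinery can be
        trained on third-party experience: observed agents supply the
        (state, action, outcome) triples that ego never had to live through.
        """
        ...
```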

Let's work together on this and surpass the human level in agentic, living environments as well!

#UniversalEmbodiment #RoboticFoundationModels #AI

Tero Keski-Valkama (tero@rukii.net)
2025-12-21

Why do we need universal embodiment with in-context learning of the embodiment? Because the embodiment isn't fixed. Of course there are the common degradations and even partial mechanical failures, but also imagine:

A humanoid robot sits in a car's driver's seat and drives the car; the motor planning and reasoning shouldn't be on the level of turning the steering wheel this many degrees and so on, but on the level of the changed embodiment, which is now the car.

The same goes for using tools, adapting the embodiment ad hoc for the purpose, and letting the same model design, build, repair and customize embodiments. It is all synergistic, and while the current paradigm of embodied AI doesn't aim for this next step yet, we will need to create it at some point.
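A sketch of what an embodiment-conditioned policy could look like, with the embodiment supplied in context rather than baked into the weights; the `EmbodimentSpec` fields and the `policy.plan` call are assumptions for illustration, not an existing API:

```python
from dataclasses import dataclass


@dataclass
class EmbodimentSpec:
    """In-context description of the current embodiment."""
    name: str           # e.g. "humanoid" or "humanoid driving a sedan"
    actuators: list     # available control channels
    description: str    # natural-language account of dynamics and limits


def act(policy, observation, embodiment: EmbodimentSpec):
    """Query the same foundation model under different embodiments.

    The policy is conditioned on the embodiment spec in context, so when the
    humanoid sits in the driver's seat we swap in the car's spec instead of
    reasoning about steering-wheel angles through the humanoid's arms.
    `policy.plan` is a hypothetical interface.
    """
    return policy.plan(observation=observation, embodiment=embodiment)


humanoid = EmbodimentSpec(
    name="humanoid",
    actuators=["joint_torques"],
    description="bipedal humanoid, two arms, standard manipulation limits",
)

car = EmbodimentSpec(
    name="humanoid driving a sedan",
    actuators=["steering", "throttle", "brake"],
    description="front-wheel-drive sedan controlled through the driver's seat",
)
```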

Conveniently, when we combine this with multi-agent and intent characterization, embodiment adaptation becomes much easier, and we also get truly social robots that are able to negotiate and communicate in real-world multi-agentic spaces.

#UniversalEmbodiment #RoboticFoundationModels

eicker.news ᳇ tech news (technews@eicker.news)
2025-12-20

#Emergentcapabilities in #largelanguagemodels, such as in-context learning, can also appear in #visionlanguageaction (#VLA) models. Scaling up #roboticfoundationmodels allows for emergent human-to-robot transfer, improving performance on tasks demonstrated in human videos by approximately 2x. physicalintelligence.company/r #tech #media #news
