What Is Physical AI?
Physical AI refers to artificial intelligence systems that operate in and interact with the physical world through a body — a robot, a vehicle, a drone, or any machine with sensors and actuators. Unlike digital AI, which processes text, images, and code within a computer, Physical AI must perceive real environments through cameras and sensors, reason about physical dynamics (gravity, friction, object permanence), and execute actions with real consequences through motors and grippers.
The term gained mainstream traction when NVIDIA CEO Jensen Huang described Physical AI at GTC 2024 as "the next frontier of AI" — intelligence that understands and operates in the physical world. Huang positioned NVIDIA's Isaac platform (simulation, synthetic data, and inference hardware) as the enabling infrastructure, and the framing resonated because it captured a shift already happening across robotics: the transition from hand-coded robot behaviors to learned behaviors trained on data.
Physical AI is not new in concept. Researchers have studied embodied intelligence and situated cognition since the 1980s, and the academic field of "embodied AI" has existed for over two decades. What changed in 2024-2026 is the convergence of three capabilities that made Physical AI practical: large-scale simulation (NVIDIA Isaac, MuJoCo), foundation models that transfer across tasks (RT-2, pi-0, Octo), and affordable robot hardware for data collection. The combination means that for the first time, it is feasible to train a robot to perform a new task in days rather than months of engineering.
Why Physical AI Is Different from Digital AI
Digital AI (LLMs, image generators, code assistants) and Physical AI share underlying machine learning techniques, but the engineering challenges are fundamentally different:
- The perception-action loop: Digital AI processes input and produces output in a single forward pass. Physical AI runs in a continuous loop — sense, decide, act, observe the result, repeat — at 10-50 Hz. Every action changes the world, which changes the next observation. This closed-loop nature means errors compound: a small positioning error at step 1 may cause a grasp failure at step 10.
- Real-time constraints: A chatbot can take 2 seconds to respond. A robot arm moving at 1 m/s travels 2 cm in 20 milliseconds. Physical AI policies must run at control-loop speed (20-50 Hz for manipulation, 200+ Hz for locomotion), which constrains model size and inference hardware.
- Physical consequences: A bad LLM response wastes the user's time. A bad robot action breaks hardware, damages objects, or injures people. Physical AI requires safety constraints, force limits, and collision avoidance layered around the learned policy — engineering that digital AI does not need.
- Data scarcity: Digital AI trains on billions of text documents scraped from the internet. Physical AI has no equivalent data source. Every training demonstration must be physically collected on a real robot or simulated in a physics engine. This makes Physical AI data 1,000-10,000x more expensive per sample than digital AI data.
- The reality gap: Digital AI trains and deploys in the same medium (computers). Physical AI often trains in simulation and deploys in reality — two environments that never perfectly match. Bridging this sim-to-real gap is a core challenge that digital AI simply does not face.
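The sense-decide-act loop described in the first bullet can be sketched as a rate-limited control loop. The sensor, policy, and actuator functions below are hypothetical stand-ins for real robot interfaces:

```python
import time

CONTROL_HZ = 20            # manipulation-range control rate
DT = 1.0 / CONTROL_HZ      # 50 ms period

def get_observation():
    """Stand-in for camera + joint-encoder readout."""
    return {"image": None, "joints": [0.0] * 7}

def select_action(obs):
    """Stand-in for a learned policy: observation -> action."""
    return [0.0] * 7       # e.g., joint velocity targets

def send_action(action):
    """Stand-in for the actuator interface."""
    pass

def control_loop(steps):
    for _ in range(steps):
        t0 = time.monotonic()
        obs = get_observation()       # sense
        action = select_action(obs)   # decide
        send_action(action)           # act: this changes the next observation
        # Sleep out the remainder of the period to hold a fixed rate.
        time.sleep(max(0.0, DT - (time.monotonic() - t0)))
    return steps

control_loop(steps=5)
```

The sleep on the remainder of the period is what holds the fixed 20 Hz rate; if `select_action` inference takes longer than `DT`, the loop overruns and the robot acts on stale observations.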
The Physical AI Stack
Physical AI systems follow a layered architecture. Understanding each layer clarifies where the hard engineering problems lie:
THE PHYSICAL AI STACK
↓ Physical World (objects, surfaces, forces, humans)
↓ Sensors — cameras, LiDAR, F/T, tactile, proprioception
↓ Perception — object detection, pose estimation, scene understanding
↓ World Model — physics prediction, consequence forecasting
↓ Policy — action selection (learned or programmed)
↓ Actuators — motors, grippers, wheels, legs
↓ Physical World (loop repeats at 10-50 Hz)
Sensors capture the state of the physical world. For manipulation, the minimum sensor set is an RGB camera and joint encoders (proprioception). Advanced setups add depth cameras (Intel RealSense), force-torque sensors at the wrist, and tactile sensors on fingertips. Each additional sensor modality improves robustness but increases data collection complexity and policy input dimensionality.
Perception transforms raw sensor data into structured representations the policy can use — object bounding boxes, 6-DOF poses, semantic segmentation, point clouds. Pre-trained vision models (DINO, SAM, GroundingDINO) have made perception the most commoditized layer of the stack. Many Physical AI systems skip explicit perception entirely, feeding raw images directly to end-to-end policies.
World Model predicts what will happen when the robot takes an action. Explicit world models (physics simulators like MuJoCo, Isaac Sim) are used for planning and training. Learned world models (video prediction, latent dynamics models) are an active research frontier. NVIDIA's foundation world model approach uses video generation to predict future frames conditioned on robot actions.
Policy is the decision-making core — given the current observation, what action should the robot take? This is where imitation learning (BC, Diffusion Policy, ACT), reinforcement learning (PPO, SAC), and foundation model approaches (RT-2, pi-0, Octo) operate. The policy is the layer that most directly benefits from the recent ML advances driving Physical AI forward.
Actuators execute the policy's commanded actions in the physical world — joint torques, end-effector velocities, gripper commands. Actuator quality sets hard limits on what the policy can achieve. A policy trained in simulation with perfect actuators will fail on a real robot with backlash, friction, and current limits.
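The layered flow above can be expressed as a minimal composition of callables. This is a sketch of the stack's control flow under our own naming, not any framework's API; the world model is omitted because in a purely reactive policy it lives inside action selection:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class PhysicalAIStack:
    sense: Callable[[], Any]              # Sensors: world -> raw data
    perceive: Callable[[Any], Any]        # Perception: raw data -> state
    select_action: Callable[[Any], Any]   # Policy: state -> action
    actuate: Callable[[Any], None]        # Actuators: action -> motion

    def step(self) -> Any:
        raw = self.sense()
        state = self.perceive(raw)
        action = self.select_action(state)
        self.actuate(action)
        return action

# Wire the layers with trivial stand-ins to show the data flow:
commands = []
stack = PhysicalAIStack(
    sense=lambda: {"pixel": 1},               # fake camera frame
    perceive=lambda raw: raw["pixel"],        # "detect" the object
    select_action=lambda state: state * 2,    # toy policy
    actuate=commands.append,                  # record the motor command
)
print(stack.step())  # → 2
```

Swapping any single layer (a better perception model, a retrained policy) leaves the rest of the loop untouched, which is the practical payoff of the layered view.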
The Three Approaches to Physical AI
There are three dominant approaches to creating Physical AI systems, each with different data requirements, compute costs, and practical tradeoffs:
| Approach | Core Idea | Data Requirement | Compute | Strengths | Weaknesses |
|---|---|---|---|---|---|
| Reinforcement Learning | Learn from trial and error in simulation | Billions of simulated steps | Very high (GPU cluster) | Can discover superhuman strategies, no human demos needed | Sim-to-real gap, reward engineering, sample inefficient in real world |
| Imitation Learning | Learn from human demonstrations | 50-500 real demonstrations per task | Moderate (single GPU) | Data-efficient, produces human-like behavior, works with real data | Bounded by demonstrator skill, struggles with out-of-distribution |
| Foundation Models / VLAs | Scale across many tasks with one large model | Millions of episodes across diverse tasks | Extreme (GPU cluster months) | Generalization, language-conditioned, zero-shot transfer | Enormous data/compute cost, still emerging, hard to fine-tune |
In practice, most successful Physical AI deployments in 2026 use imitation learning. It has the lowest barrier to entry: collect 100-200 demonstrations via teleoperation, train a policy with Diffusion Policy or ACT, and deploy. RL dominates locomotion (where simulation is accurate) and dexterous manipulation in simulation. Foundation models are the long-term bet but require resources only available to well-funded labs and companies.
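To make the imitation-learning recipe concrete, here is a toy behavior-cloning sketch: supervised regression from observations to demonstrated actions. The linear policy and synthetic "demonstrations" stand in for real teleoperation data; production systems use Diffusion Policy or ACT on image inputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy demonstrations: observation = object position, and the "expert"
# simply moves toward the object, so the target action equals the obs.
obs = rng.uniform(-1, 1, size=(200, 2))   # 200 demos, 2-D observations
actions = obs.copy()                      # expert actions

# Behavior cloning reduces to supervised learning. With a linear
# policy (action = obs @ W), fitting W is a least-squares problem.
W, *_ = np.linalg.lstsq(obs, actions, rcond=None)

# Evaluate on held-out observations, as you would with real demos.
test_obs = rng.uniform(-1, 1, size=(50, 2))
mse = float(np.mean((test_obs @ W - test_obs) ** 2))
print(f"held-out MSE: {mse:.2e}")   # near zero: expert mapping recovered
```

The real-world version replaces the linear map with a deep network and the synthetic arrays with recorded camera frames and robot actions, but the training objective is the same regression.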
Physical AI Companies to Watch in 2026
- NVIDIA — provides the simulation (Isaac Sim/Lab/Gym), synthetic data generation (Replicator), and inference hardware (Jetson Thor) that underpin most Physical AI development. Not building robots, but building the platform everyone else uses.
- Physical Intelligence (pi) — founded by Karol Hausman, Sergey Levine, and others from Google Brain/DeepMind. Building foundation models for robot manipulation (pi-0). Their approach: one large model trained on diverse manipulation data that generalizes to new tasks with minimal fine-tuning. Raised $400M+ in 2024-2025.
- Figure AI — building the Figure 02 humanoid for warehouse and manufacturing work. Integrated a large language model for task understanding with a manipulation policy for execution. Partnered with BMW for factory deployment. Valued at $2.6B.
- Agility Robotics — Digit humanoid robot designed specifically for warehouse logistics. Purpose-built bipedal form factor for human-designed environments. Operating a pilot factory in Salem, Oregon, producing Digit units for Amazon and other logistics customers.
- 1X Technologies — NEO humanoid focused on household tasks. Funded by OpenAI. Taking a data-centric approach: collecting massive teleoperation datasets in homes to train general-purpose household manipulation policies.
- Apptronik — Apollo humanoid designed for industrial environments. Partnership with Mercedes-Benz for automotive manufacturing. Emphasizes reliability and safety certification over cutting-edge AI capabilities.
- Boston Dynamics — the pioneer. Atlas (electric humanoid, 2024 redesign), Spot (quadruped, commercially deployed), and Stretch (warehouse robot). Transitioning from impressive demos to commercial Physical AI deployments.
- Unitree — making Physical AI hardware affordable. G1 humanoid ($16,000), H1 ($90,000), and Go2 quadruped ($1,600). Chinese manufacturer producing robots at price points that make Physical AI research accessible to universities and small labs worldwide.
- Hugging Face (LeRobot) — open-source framework for Physical AI. LeRobot provides data collection, training (ACT, Diffusion Policy, TDMPC), and deployment tools with support for real hardware. Democratizing Physical AI the way Hugging Face democratized NLP.
- SVRC — provides the physical infrastructure for Physical AI development: robot hardware (OpenArm, Franka, UR5e), data collection services, teleoperation systems, and a data platform for managing robot datasets at scale.
Hardware for Physical AI
Physical AI systems need bodies. The choice of hardware platform determines what tasks are possible, what data you can collect, and how your policies will generalize:
- Robot arms (6-7 DOF): The most practical platform for Physical AI research and deployment today. Arms like Franka Panda, UR5e, xArm, and OpenArm provide precise manipulation in a fixed workspace. Ideal for tabletop tasks: pick-and-place, assembly, sorting, packing. Mature ecosystem of end effectors, sensors, and software. Cost: $5,000-30,000.
- Humanoid robots: Full-body platforms (Figure 02, Unitree G1/H1, Agility Digit, 1X NEO) that can navigate human environments and manipulate with arms and hands. The hardware form factor best suited for general-purpose Physical AI — but also the hardest to control, most expensive ($16,000-250,000), and least mature. Most humanoid Physical AI in 2026 is still R&D, not deployment.
- Quadruped robots: Four-legged platforms (Boston Dynamics Spot, Unitree Go2, Anymal) excel at locomotion in rough terrain and inspection tasks. Physical AI for locomotion is the most mature sub-field — RL policies trained in simulation transfer reliably to real quadrupeds. Adding manipulation (arm-on-quadruped) is an active frontier.
- Mobile manipulators: A robot arm mounted on a mobile base (TIAGo, Fetch, Hello Robot Stretch). Combines manipulation with navigation for tasks that span multiple locations — shelf stocking, lab transport, home assistance. The engineering challenge is coordinating base movement with arm manipulation.
Data: The Fuel of Physical AI
Physical AI is bottlenecked by data. Unlike LLMs that consume the internet, Physical AI models need demonstrations of robots interacting with the physical world — and that data does not exist on the internet.
Why real-world robot data is scarce and expensive: Every demonstration requires a physical robot, a human teleoperator, a task setup, and recording infrastructure. A single high-quality manipulation demonstration takes 30-120 seconds to collect, plus setup time. At scale, collecting 10,000 demonstrations of a single task costs $20,000-100,000 in operator time and hardware wear. Collecting diverse data across hundreds of tasks — the requirement for foundation models — costs millions.
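A back-of-envelope cost model makes the arithmetic explicit. All parameters here are illustrative, and the function covers operator time only; the figures above also include hardware wear, which this sketch omits:

```python
def demo_collection_cost(n_demos, seconds_per_demo, setup_frac=0.5,
                         operator_rate_usd_per_hr=50.0):
    """Rough operator-time cost of teleoperated demo collection.

    setup_frac: resets and task setup as a fraction of demo time
    (assumed value). Hardware wear and supervision are excluded.
    """
    hours = n_demos * seconds_per_demo * (1 + setup_frac) / 3600.0
    return hours * operator_rate_usd_per_hr

# 10,000 demos at 120 s each with 50% setup overhead at $50/hr:
print(f"${demo_collection_cost(10_000, 120):,.0f}")  # → $25,000
```

Scaling the same arithmetic to hundreds of tasks for foundation-model training is what pushes total data budgets into the millions.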
The data flywheel concept: The most promising scaling strategy is the data flywheel: deploy robots in the real world to perform tasks autonomously, use human teleoperators to intervene and correct when the robot fails, record those corrections as new training data, retrain the policy, and redeploy. Each cycle improves the policy and reduces the intervention rate, generating better data more cheaply. Tesla's Autopilot follows this pattern. Physical AI companies like 1X and Figure are building flywheel infrastructure for manipulation.
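The flywheel can be caricatured as a simulation loop. The success rates and improvement-per-correction numbers below are invented for illustration; real gains depend on the policy, the task, and the quality of corrective demonstrations:

```python
import random

random.seed(0)

def flywheel_sim(p_success=0.6, episodes_per_cycle=100, cycles=5,
                 gain_per_correction=0.0005):
    """Toy flywheel: deploy, log interventions, 'retrain', redeploy."""
    history = []
    p = p_success
    for _ in range(cycles):
        # Deploy: every failed episode triggers a human correction.
        interventions = sum(random.random() > p
                            for _ in range(episodes_per_cycle))
        history.append((round(p, 3), interventions))
        # Retrain: each corrective demo nudges success up, capped at 99%.
        p = min(0.99, p + gain_per_correction * interventions)
    return history

for p, n_fix in flywheel_sim():
    print(f"success={p:.3f}  interventions={n_fix}")
```

The qualitative pattern is the point: the success rate climbs each cycle while the intervention count (and hence the marginal cost of new data) trends down.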
Open X-Embodiment: The Open X-Embodiment dataset (Google DeepMind, 2023) aggregated manipulation data from 22 robot platforms across 21 institutions — over 1 million episodes spanning 527 skills. It demonstrated that cross-embodiment training (learning from data collected on different robots) produces policies that generalize better than single-robot training. This dataset established the paradigm that Physical AI, like digital AI, benefits from data diversity and scale.
Physical AI vs. Robotics vs. Embodied AI
The terminology in this field is confusing. Here is an honest disambiguation:
- Robotics is the broadest term. It encompasses all aspects of designing, building, and programming robots — mechanical engineering, electrical engineering, control systems, perception, and AI. Most industrial robots today use hand-coded programs, not AI. Robotics includes Physical AI but is not synonymous with it.
- Embodied AI is the academic term (used since the 1990s) for AI systems that have physical bodies and interact with the real world. It emphasizes the theoretical insight that intelligence requires embodiment — you cannot truly understand the physical world without acting in it. Embodied AI research spans cognitive science, robotics, and AI.
- Physical AI is the industry term (popularized 2024) for the same concept, but with an engineering and commercial focus. Physical AI emphasizes building practical systems: training robot policies on data, deploying them in real environments, and scaling. It is essentially "embodied AI" rebranded for the era of foundation models and commercial humanoids.
- In practice: use "robotics" when discussing hardware and mechanical systems, "embodied AI" in academic papers and conference submissions, and "Physical AI" when talking to investors, executives, and the general public. They describe overlapping but not identical concepts.
Where Physical AI Works Today
Honest assessment of Physical AI maturity by application domain as of 2026:
- Warehouse logistics (mature): Pick-and-place of known objects from bins and shelves. Companies: Amazon (Sparrow), Covariant (core team hired by Amazon in 2024), Berkshire Grey. Success rate: 95-99% on trained object categories. This is the most commercially viable Physical AI application today.
- Manufacturing assembly (emerging): Simple assembly steps — inserting screws, snapping connectors, placing components. Requires force-controlled manipulation and sub-millimeter precision. Success rate: 85-95% for trained tasks. Human-in-the-loop supervision still required for edge cases.
- Lab automation (emerging): Pipetting, sample transfer, plate handling in biology and chemistry labs. Structured environment with predictable objects. Physical AI adds flexibility over traditional liquid handlers. Companies: Automata (LINQ), Opentrons, Arctoris.
- Agriculture (early): Harvesting delicate crops (strawberries, tomatoes), weeding, pruning. Challenging due to outdoor conditions, object variability, and delicate manipulation requirements. Companies: Agrobot, AppHarvest, Iron Ox. Limited commercial deployment.
- Household (research): Cleaning, tidying, cooking, laundry. The ultimate Physical AI challenge due to extreme environment and object diversity. No commercially viable household Physical AI exists in 2026. Active research at 1X, Toyota Research Institute, and university labs.
What Is Not Working Yet
Honest assessment of Physical AI limitations — what the demos do not show:
- Open-world manipulation: Picking up novel objects the system has never seen, in configurations it has never encountered. Foundation models show promising generalization but still fail 20-40% of the time on truly novel objects. Production systems solve this by constraining the environment, not by achieving general manipulation.
- Unstructured environments: Homes, construction sites, disaster zones — places where nothing is standardized and everything varies. Physical AI systems trained in clean labs fail dramatically when deployed in messy real-world conditions. The gap between demo environments and real deployment environments remains the field's central challenge.
- Long-horizon tasks: Tasks requiring 50+ sequential manipulation steps (cooking a meal, assembling furniture, cleaning a room). Current policies reliably handle 5-15 steps. Beyond that, error accumulation causes failure. Hierarchical planning with Physical AI execution is an active research direction but not production-ready.
- Multi-robot coordination: Multiple Physical AI agents working together in shared space. Collision avoidance, task allocation, and handoff coordination multiply the complexity. Most deployed systems use simple turn-taking protocols, not learned coordination.
- Robustness to failure: When a Physical AI policy fails mid-task (drops an object, misjudges a grasp), recovery requires real-time replanning that most learned policies cannot do. Industrial deployments add hand-coded recovery procedures around the learned policy core.
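The long-horizon failure mode follows directly from compounding: if each step succeeds independently with probability p, an n-step task succeeds with probability roughly p^n:

```python
# Per-step reliability compounds over the horizon: even high per-step
# success rates collapse once tasks require many sequential steps.
for p in (0.99, 0.95):
    for n in (10, 50):
        print(f"per-step p={p}, horizon n={n:>2}: task success = {p**n:.3f}")
```

At 99% per-step reliability a 50-step task still fails roughly 40% of the time, and at 95% it fails over 90% of the time, which is why 5-15 steps is the current practical ceiling.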
Getting Started with Physical AI
A practical path from zero to deploying a Physical AI system, ordered by increasing commitment and cost:
- Step 1: Simulation (cost: $0, time: 1-2 weeks). Start in simulation to learn the Physical AI workflow without hardware risk. Install MuJoCo or Isaac Sim. Train a BC or RL policy on a standard benchmark (e.g., RoboMimic Lift task, see our RoboMimic tutorial). Understand the training loop, evaluation metrics, and failure modes before touching a real robot.
- Step 2: Teleoperation data collection (cost: $5,000-15,000, time: 2-4 weeks). Acquire a robot arm (OpenArm, xArm, or used Franka from $8,000) and a teleoperation interface (Meta Quest 3 at $500 or SpaceMouse at $300). Define a single manipulation task. Collect 100-200 demonstrations. This is where you learn what makes data "good" — consistent task execution, proper camera placement, and clean action recording.
- Step 3: Policy training and deployment (cost: $1,000-3,000 GPU time, time: 1-2 weeks). Train a Diffusion Policy or ACT model on your collected data. Deploy on the real robot. Measure success rate over 50+ trials. Debug the inevitable failures — wrong camera angle, action space mismatch, object placement variation. This step teaches you the real engineering of Physical AI, which is mostly debugging, not model architecture.
- Step 4: Iteration and scaling (ongoing). Collect more data for failure cases. Add task variations (different objects, positions, lighting). Build the data flywheel: deploy, observe failures, collect corrective demos, retrain. This is where Physical AI becomes a continuous engineering process rather than a one-time project.
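For Step 3's "success rate over 50+ trials", it helps to report a confidence interval rather than a bare percentage. A Wilson score interval needs only the standard library (the function name here is ours, not from any robotics package):

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """95% Wilson score interval for a success rate over n trials."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z * math.sqrt(p * (1 - p) / trials
                          + z**2 / (4 * trials**2)) / denom)
    return center - half, center + half

lo, hi = wilson_interval(40, 50)
print(f"observed 80% over 50 trials -> 95% CI [{lo:.2f}, {hi:.2f}]")
```

With 40 successes in 50 trials the interval is roughly [0.67, 0.89]: an "80% success rate" headline hides a wide band until you run many more trials.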
For teams that want to skip hardware setup, SVRC provides turnkey data collection — you define the task, SVRC collects the demonstrations on calibrated hardware, and delivers datasets ready for training.
Frequently Asked Questions
- Who coined the term Physical AI? The term was popularized by NVIDIA CEO Jensen Huang at GTC 2024. The concept of embodied intelligence predates the term by decades, but Huang's framing as "Physical AI" gave the field a unifying industry label.
- Is Physical AI the same as embodied AI? Essentially yes. Embodied AI is the academic term used since the 1990s. Physical AI is the industry term from 2024 that emphasizes practical engineering and commercial deployment rather than theoretical understanding of embodied cognition.
- What companies are working on Physical AI? Major players: NVIDIA (platform), Physical Intelligence (foundation models), Figure AI (humanoids), Agility Robotics (warehouse humanoids), 1X Technologies (household humanoids), Unitree (affordable hardware), Hugging Face/LeRobot (open-source tools), and SVRC (hardware and data infrastructure).
- How much data does Physical AI need? For a single task via imitation learning: 50-200 demonstrations. For a foundation model: millions of episodes across diverse tasks. Quality matters more than quantity — 100 expert demonstrations often outperform 10,000 noisy ones.
- What is the Physical AI stack? Five system layers (Sensors, Perception, World Model, Policy, Actuators) bounded by the Physical World at both ends, running in a continuous loop at 10-50 Hz.
- What hardware do I need to get started? Simulation-only: workstation with NVIDIA GPU ($2,000-5,000). Real-world: robot arm ($5,000-30,000) + cameras ($350-1,000) + teleoperation interface ($500-3,000) + GPU workstation ($3,000-5,000). Minimum viable real Physical AI setup: $10,000-25,000.
- Will Physical AI replace human workers? Not in 2026. Physical AI works reliably only in structured environments with well-defined tasks. It will augment workers in dangerous, repetitive, and precision-critical roles before replacing anyone in open-ended work. General-purpose Physical AI is decades away.