Beyond Singularity:
The Democratic Promise of Physical AI

Why robotics, the least mature branch of AI, may be the most open and the most democratizing
Tonghe Zhang
June 2026

The Singularity

Never before in human history have we stood so close to the singularity: the point at which AI, a digital species built from human experience, becomes powerful enough to escape full human control. What makes this prospect especially troubling is not only its irreversibility, but the conditions under which it is unfolding. Superhuman AI is emerging under an oligopoly. A small number of frontier laboratories, backed by extraordinary concentrations of private capital, possess the knowledge, compute, and organizational power required to build it, while the rest of the world risks falling steadily behind. It is deeply unsettling that, in the near future, even well-educated people, including many who helped build these systems, may no longer be able to fully comprehend the most advanced AI if it remains closed to the public.

How We Got Here

This transformation took only three or four years, a fleeting moment in time yet a giant leap in human history. How did we get here? Modern AI was built through search and learning at scale, or more concretely, through the distillation of human knowledge and the use of feedback to enable self-improvement. We trained models on the accumulated record of human civilization to endow them with general knowledge and common sense, then refined them with the reasoning traces of experts such as mathematicians, physicians, and engineers. AI’s engineering ability improved through a similar process. At first, we built external harnesses around models, using human-designed rules to regulate their outputs. We then distilled those regulated outputs back into training so that the models could internalize sound engineering principles without external scaffolding. Beyond that, we placed AI in controlled virtual environments where it could act, receive reliable feedback, and improve through iterative search. That combination of distilled human knowledge, scaffolded refinement, and feedback-driven self-improvement is what made the recent leap in AI possible.

Language Is Not the World

These principles were sufficient to solve language modeling, the leading frontier of AI. Language did not become the first frontier by accident. It is a remarkable medium: it compresses human knowledge into symbolic form, supports abstract reasoning, and allows us to articulate and attack our hardest intellectual problems. It is also both a channel of communication and an interface for software tools, which is why AI has become so capable at conversation and code generation. Yet there are still many areas in which AI cannot match humans. It is weaker at understanding the physical world through perception, the world we experience through endless combinations of shape, color, light, material, and motion. It is weaker still when asked to change that world directly through physical interaction, which is the domain of robotics. Robotics demands more than perception or description; it requires the ability to act on the basis of physical common sense and to alter the environment through force, contact, and control. Paradoxically, this least mature branch of AI is also the most open, the least monopolized, and in some sense the freest. For that reason, it may also have the greatest potential to benefit a much broader range of people.

The Democratic Promise of Physical AI

Advancing physical AI would create the possibility of general-purpose robots that liberate people broadly, not only the educated or technically literate. It would make possible machines that repair appliances, clean living spaces, and reduce the need for humans to perform dangerous or highly repetitive work that demands extreme care, threatens their health, or drains their time and energy. More importantly, physical AI could be democratized more naturally than purely digital AI because its interface is inherently more accessible than language. Today, language models usually reach people through a search box, a website, an IDE, or some other computational interface, all of which presume a certain degree of textual and technical literacy. Physical AI, by contrast, would not be confined to such interfaces. Its physical presence would itself serve as a more direct and accessible interface. It would be present in everyday life, perceptible through sight and touch, and therefore more easily adopted by a much wider population: people with less formal technological education, children, the elderly, and disabled or otherwise underprivileged people, who together make up a far larger share of humanity than the technically literate.

Physical AI may also prove more democratizing because of its deep dependence on factors beyond algorithm research, such as operations, hardware, and manufacturing. Unlike language modeling or video generation, physical interaction does not come with massive quantities of readily trainable data, whether proprietary or open-source. Such data cannot simply be harvested from the web; it must be collected through real-world labor, organized across many regions, and standardized through operational workflows that involve ordinary people around the globe. The robots themselves must also be built through supply chains that span continents, rather than through a stack that a single sovereign power or a handful of firms can easily dominate. For that reason, progress in physical AI would not enrich only frontier researchers and chip designers. It would stimulate a much broader industrial ecosystem spanning mechanics, electronics, communications, industrial design, and the emerging workforce involved in robot data collection.

Openness and Its Price

Another advantage of physical AI arises precisely from its underdevelopment. The field remains highly chaotic and far from mature. No pre-training scaling laws comparable to those of language models have yet been established for robotics, which means that no entrenched monopoly has fully formed. Algorithmic paradigms, multi-fingered dexterous hands, and data-collection devices are still evolving month by month, while open research continues to flourish in a way reminiscent of AI before 2022. In that sense, robotics remains historically open in a way that language modeling no longer is. Yet this openness comes with a price. The slow progress of robotics cannot be directly remedied by the same technologies that transformed AI in virtual domains, such as language modeling and video generation. If robotics were merely a trivial extension of language or vision, the existing AI oligopoly would already have solved it. But robotics lies on the opposite side of the problem. It is not about describing the world in text or pixels; it is about changing the world through mechanical interaction. Language cannot fully characterize the subtle motions of the fingers when tying a rope, because such movements are too fine to be captured adequately in words. Vision alone cannot explain how we fasten a seat belt or turn a door handle in complete darkness, relying only on touch. Physical intelligence requires more than conceptual abstraction or visual observation, because high-level intention and observable consequence are not the same as action itself.

The Bottleneck: General Dexterous Manipulation

At the center of the challenge of building physical intelligence lies general dexterous manipulation, whose bottlenecks are twofold. The first is the absence of physical interaction data at the scale and fidelity required to achieve human-level speed and precision. This limitation determines whether robots remain little more than entertainment or become genuinely useful systems with far greater commercial value. Data that reflects the diversity of the physical world cannot be synthesized adequately in simulation, nor can text or video serve as a complete substitute; it must be derived directly from physical interaction itself. The second bottleneck is the lack of annotations and evaluation signals spanning a sufficiently broad range of manipulation behaviors. The availability of such annotations and reward signals determines whether we can control a robot’s behavior through vision-language instructions and whether robots can inherit the compositional generalization that makes human intelligence so powerful, allowing them to complete new tasks with few or no demonstrations. Without those capabilities, the cost of building robotics models scales with every new hardware platform and every new task, rather than being amortized across them. Only when such transfer becomes possible can robotics foundation models become a genuinely scalable business and sustain the growth of physical AI.

Our Hope

Thankfully, we still have much to leverage in solving robotics. The fact that robotics is being tackled last also means that we can inherit both the methods and the bitter lessons of earlier AI revolutions. The central algorithmic philosophy, scaling capability through search and learning, remains relevant. But to reproduce that scaling behavior in robotics, we must confront something that other fields largely took for granted: the cost of data collection. In most AI domains, data collection is a secondary concern because large datasets already exist on the web. Their diversity is assumed rather than built. Robotics is different. Data scarcity introduces a new class of operational and managerial problems: how to organize collection at scale, how to motivate workers from diverse backgrounds to perform tedious tasks over long periods, how to identify and train the most talented operators for the most precise interactions, and how to encourage improvisation that meaningfully expands the diversity of physical behavior in the dataset. These challenges have already become part of the unspoken advantage behind successful frontier robotics models. Although it remains unclear whether current paradigms and hardware can capture the speed and versatility of human-object interaction well enough to support zero-shot manipulation, it is difficult to see how real-world interaction data can be bypassed altogether. This will be a long slog, and it will demand the co-evolution of hardware, operations, and algorithms.

Although the path of large-scale data collection and end-to-end training is painful and expensive, we should not be tempted by the easier but more dangerous alternative: customized engineering and task-specific data collection for every new task. That approach may deliver fast results at first, but it soon plateaus, remains bounded by the size and skill of the engineering team, and fundamentally fails to scale. Zero-shot capability, however daunting, is the central objective of robotics, because it determines whether robots can transfer knowledge across tasks rather than be rebuilt one task at a time. In the end, whether we treat scalability as a first-class principle will determine whether physical AI becomes a genuinely high-value industry or merely another low-margin systems-integration business.

And that question is not merely economic. It is ultimately about what AI is for. In my view, the purpose of developing AI should not be to replace human beings in their highest intellectual and creative pursuits, such as writing poetry, creating art, or doing mathematics. Rather, it should liberate us from the drudgery of ordinary life, returning time to us so that we may pursue meaning, deepen our bonds with others, and seek inner peace. Unfortunately, digital AI has not been fully directed toward that end. But physical AI still offers us the opportunity to shape a different future. That is why I hope more people will devote themselves to this endeavor: not merely for technological progress, but for the common good of humanity as a whole.