Inside the New Robotics Race: Data, Models, and Manufacturing

Innovation rarely emerges in isolation. More often, it is born in conversations among engineers, founders, researchers, and investors trying to understand where technology is heading.

Over the course of the past year, I attended dozens of conferences around the world; business trips sometimes lasted for months, and meetings with partners and clients took place everywhere from Asia to North America. Yet one of my recent trips, to Switzerland, turned out to be particularly interesting – largely because of the people and the conversations that happened there.

Zurich proved to be one of the places where the future of robotics and Physical AI is actively being discussed today. And the deeper these conversations go, the more obvious it becomes that the real race in robotics is unfolding around data.

Europe’s Silicon Valley

Zurich has traditionally been associated with the financial sector, but in recent years it has increasingly been called Europe’s Silicon Valley. Much of this reputation is tied to ETH Zurich, one of the most respected engineering universities in Europe. It attracts researchers, PhD students, entrepreneurs, and engineers from around the world. As a result, a powerful technological ecosystem has formed around the university, where research, startups, and industrial projects evolve side by side.

One of the reasons for my trip was to get a deeper understanding of what Introspector can offer the robotics market, which has been booming since the beginning of 2025. It is an industry that a wide range of startups are trying to enter, while technological breakthroughs from major tech companies are actively reshaping it. Yet despite all this momentum, the field still raises more questions than it answers.

Zurich is also home to our partners Lightly, who helped introduce me to peers working at the intersection of robotics, computer vision, and AI. There is one important aspect of the local technology ecosystem that I would like to highlight: people here are remarkably open and welcoming. They are not afraid to share their ideas and hypotheses, to talk about the challenges they are trying to solve, and the experiments they are running. As a result, you begin to understand the market’s real context and where the industry is heading much more quickly.

By the way, when people ask me how the European “Silicon Valley” differs from the American one, the answer often surprises them. In Zurich, the balance between work and life feels much stronger: sports in the morning, focused work during the day in a calm yet productive rhythm, and evenings spent in the mountains with family or simply relaxing. In San Francisco, there is often a sense that you constantly need to prove that you are working harder than everyone else. In Zurich, the pace is different – more sustainable. Yet the level of technological ambition here is no lower.

Better data before better robots

One of the main takeaways from this trip was a rather simple observation: many people today want to work in robotics. But despite the enormous interest in the industry, most teams are still in an exploratory phase, trying to understand what role they can play in the new wave of robotics and Physical AI.

Many conversations eventually converge on the same topic: data. Today, the industry lacks data on dexterity tasks, i.e., fine motor skills. In this area, robots’ capabilities remain extremely limited. What humans do with their hands almost automatically – picking up an object, turning it, carefully placing it somewhere, or performing a small manipulation – remains one of the most challenging tasks for robots.

The key to progress here lies primarily in large-scale, properly collected datasets. Today, people often talk about egocentric datasets, recorded from a first-person perspective, where the system captures human actions as if it were performing them itself. However, in practice, it turns out that the very concept of an “egocentric dataset” can mean very different things and raises a number of technical questions. Where should the camera be placed? On the forehead, on the chest, or perhaps at eye level? What sensors should accompany the video recording? If we are capturing hand movements, should operators use special gloves? And if so, should those gloves include tactile sensors, gyroscopes, or other motion-tracking systems?
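To make these questions concrete, here is one hypothetical schema for a single frame of such a recording. Every field is an assumption – one possible configuration among the many currently being debated, not an industry standard:

```python
# A hypothetical schema for one frame of an egocentric recording.
# The field set is illustrative: which sensors to include (depth,
# glove IMUs, tactile pads) is exactly the open question above.
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import numpy as np

class CameraMount(Enum):
    FOREHEAD = "forehead"
    CHEST = "chest"
    EYE_LEVEL = "eye_level"   # e.g., smart glasses

@dataclass
class GloveReading:
    imu_orientation: np.ndarray        # (4,) quaternion from the glove IMU
    imu_angular_velocity: np.ndarray   # (3,) rad/s
    tactile_pressure: Optional[np.ndarray] = None  # (n_pads,) if tactile pads are fitted

@dataclass
class EgocentricFrame:
    timestamp_ns: int
    mount: CameraMount
    rgb: np.ndarray                       # (H, W, 3) camera image
    depth: Optional[np.ndarray] = None    # (H, W) depth map, if a depth sensor is used
    left_glove: Optional[GloveReading] = None
    right_glove: Optional[GloveReading] = None
```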

An even more complex question arises: how to properly capture the depth of motion. After all, it is important to understand not only the position of a hand in a two-dimensional plane, but also how it moves through three-dimensional space – forward, backward, up, or down.
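When a depth camera is part of the rig, one standard building block for answering this question is pinhole back-projection, which lifts a 2D keypoint plus its depth reading into a 3D point. A minimal sketch, assuming a calibrated RGB-D setup (all values illustrative):

```python
# Recover a 3D hand position from a calibrated RGB-D camera via
# pinhole back-projection. Intrinsics (fx, fy, cx, cy) come from
# camera calibration; (u, v) is a detected 2D keypoint and z its
# depth reading in meters.
import numpy as np

def backproject(u: float, v: float, z: float,
                fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Lift pixel (u, v) with depth z to a 3D point in the camera frame."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Example: a wrist keypoint at pixel (640, 360), 0.45 m from the camera.
point = backproject(640, 360, 0.45, fx=910.0, fy=910.0, cx=640.0, cy=360.0)
```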

So far, the industry has not reached a unified answer. That is why many teams today are experimenting with different sensor configurations, recording methods, and dataset formats.

Multimodal systems

As soon as the conversation turns to data collection for robotics, another topic quickly emerges – additional sensors and multimodality, which enable the capture of body movements, hand actions, and object interactions with greater precision. They also help reduce errors during dataset collection.

When a person records their actions on camera, there is always a risk that part of the material will be unusable. The camera may shift slightly, the shooting angle may be incorrect, or the operator may accidentally turn the wrong way or perform a movement too quickly. As a result, a significant portion of the recorded material is discarded. A simple example: to obtain one hour of truly usable video, an operator often needs to record around two hours of raw footage.

Additional sensors help compensate for some of these problems. Even if the camera shifts slightly, sensor data can still make it possible to reconstruct the movement of the hand or the position of the body in space. As a result, instead of two hours of recording, it might take roughly one hour and twenty minutes to obtain the same amount of usable data. This significantly increases the efficiency of dataset collection and reduces the cost of creating such datasets.
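As a rough sketch of how this redundancy plays out in a QA pipeline (field names and thresholds here are hypothetical), a segment can be kept whenever either the video passes inspection or the sensor stream covers it:

```python
# Illustrative sketch: deciding whether a recorded segment is usable.
# Without sensor backup, a segment with bad video is discarded; with
# IMU data, the hand trajectory can often still be reconstructed.
from dataclasses import dataclass

@dataclass
class Segment:
    video_ok: bool        # passed visual QA (framing, angle, motion blur)
    imu_coverage: float   # fraction of the segment covered by valid IMU data

def is_usable(seg: Segment, min_imu_coverage: float = 0.95) -> bool:
    # Good video is always usable; otherwise fall back to sensor data.
    return seg.video_ok or seg.imu_coverage >= min_imu_coverage

def usable_yield(segments: list[Segment]) -> float:
    return sum(is_usable(s) for s in segments) / len(segments)

# Rough numbers from the text: ~50% yield from video alone (one usable
# hour per two recorded) versus ~75% with sensor backup (one usable
# hour per 1h20m recorded).
```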

It is therefore no coincidence that interest in multimodal data annotation is growing. It has become one of the more visible trends directly connected to the development of robotics and embodied AI.

The next point is the labeling of such datasets. We have encountered similar questions at Keymakr when working with client datasets for robotics cases: what should such an annotation look like in practice? Should it be skeletal? Two-dimensional or three-dimensional? Should elements of reinforcement learning be incorporated into the pipeline? There are dozens of such questions. Engineers themselves admit that no one can yet say with certainty which particular data configuration will ultimately lead to a real technological breakthrough.
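For illustration only, one possible shape for a 3D skeletal hand annotation might look like the following; the joint convention, coordinate frame, and fields are all assumptions, since, as noted, no configuration has yet proven itself:

```python
# A hypothetical annotation record for a 3D skeletal hand label.
# Joint count and fields are illustrative; whether to use 2D or 3D
# skeletons, and which joints, is one of the open questions above.
from dataclasses import dataclass
import numpy as np

HAND_JOINTS = 21  # a common convention: wrist + 4 joints per finger

@dataclass
class HandSkeletonLabel:
    frame_id: int
    joints_3d: np.ndarray    # (HAND_JOINTS, 3) positions in meters, camera frame
    joints_2d: np.ndarray    # (HAND_JOINTS, 2) pixel projections for 2D consumers
    visibility: np.ndarray   # (HAND_JOINTS,) 1.0 = visible, 0.0 = occluded
    annotator_id: str        # for QA and inter-annotator agreement checks
```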

These concerns are understandable. Building complex datasets is an expensive process. Every mistake in the data structure can cost thousands or even millions of dollars. It is possible to collect the “wrong” dataset or record it under conditions that are difficult to reproduce in the real world, ultimately undermining the entire project. That is precisely why today, more and more attention is being paid to both the models themselves and the quality and architecture of the data on which those models are trained.

What kind of robots does the market need?

Classic industrial robots, which have been operating on automotive assembly lines for decades, actually require very little computer vision or complex AI models. Their task is extremely specific: to perform strictly repetitive movements – left, right, up, down – with high precision and consistency. In this area, they have long surpassed humans.

A completely different category is humanoid robots. These systems require “brains”: the ability to navigate space, perceive the surrounding environment, understand the context of a situation, and control manipulators not through pre-programmed trajectories but by adapting to the real world.

Even with the high level of automation on modern factory floors, many tasks are still performed by humans. Moving an object, picking up a box, sorting parts, fastening a component, or organizing materials – these are small actions that require flexibility and coordination. This area remains one of the most difficult to automate, and it is precisely here that humanoid systems may find their role.

Many of the teams I spoke with are using a similar business model. They approach a factory and propose solving a specific production case. For example, a worker may spend the entire day moving boxes between warehouse zones. Engineers suggest a relatively simple experiment: equip the worker with a camera and a set of sensors, record thousands of hours of their actions, and use this data to train a model that will control a humanoid robot. In this way, the robot learns to perform exactly the tasks carried out by the human worker.
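At its core, this is imitation learning. A minimal behavior-cloning sketch of the idea follows; all dimensions, architecture choices, and names are purely illustrative, not any particular team's method:

```python
# Behavior cloning: learn a policy that maps the worker's recorded
# observations to the actions they took. Dimensions are placeholders.
import torch
import torch.nn as nn

OBS_DIM = 512   # e.g., visual features + proprioception from the recordings
ACT_DIM = 24    # e.g., target joint positions for the humanoid's arms and hands

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(obs: torch.Tensor, expert_action: torch.Tensor) -> float:
    # Supervised imitation: regress the recorded human action.
    optimizer.zero_grad()
    loss = loss_fn(policy(obs), expert_action)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, teams layer far more on top of this – visual encoders, action chunking, safety constraints – but the economic logic is the same: the recorded human demonstrations are the training signal.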

In essence, the company purchases a humanoid platform, while the development team builds a custom model that replicates the behavior of a specific operator. This is not a universal intelligence capable of solving any task. Rather, it is a set of skills trained for a particular scenario or group of production tasks. For many engineers today, this approach appears far more realistic. Instead of attempting to create a universal robot immediately, teams focus on narrow but economically viable automation scenarios.

The business dimension

If the future lies in custom models, it is important to understand that, from an economic perspective, this is a fairly long development path.

Each industry is essentially its own world. Every production environment has its own processes, workflows, and exceptions. A robot trained to operate in an automotive factory cannot simply be transferred to food manufacturing or warehouse logistics. In each case, the system must be retrained from scratch.

This leads to the next logical question: who will be the first customers of such technology?

At this stage, the primary adopters are likely to be large enterprises – those with the budgets to experiment and for whom automation can generate a meaningful economic impact. Today, a humanoid robot costs roughly $60,000–$90,000 for the hardware alone, and that is only the base configuration. On top of that come maintenance costs, batteries, charging stations, infrastructure, and software.

As a result, the companies most capable of experimenting with such systems are large organizations, automotive manufacturers, food corporations, and major industrial enterprises.

Of course, smaller sectors may also see some early adopters. Some companies may purchase one or two robots for specific tasks. However, in most cases, these businesses are simply not ready to invest hundreds of thousands of euros into collecting and annotating the custom datasets required to train systems for highly specific operational scenarios. For them, human labor still remains the cheaper option.

The long game of robotics innovation

We eventually arrive at a fundamental economic question: what is more efficient – a human or a robot? If we look at today’s economy, the answer is obvious: human labor is cheaper, adapts more quickly to new conditions, and doesn’t require complex infrastructure.

So why does the industry continue to invest in robotics today? The answer is largely strategic.

Many companies understand that a kind of race for technological leadership is underway. They are already developing solutions, despite the high costs, to be ahead when the economics of robotics shift.

As electronics advance, component costs decline, and computing efficiency improves, robotics will inevitably become more affordable. And when that happens, the advantage will belong to the companies that have already built models, accumulated data, and established the necessary technological infrastructure.

Imagine, for example, that new regulations emerge allowing the large-scale use of humanoid robots in manufacturing. Or that governments begin subsidizing the robotization of industries. In such a scenario, the market could grow dramatically within just a few years. And those who prepared in advance, those with existing models, research, datasets, and a ready technological stack, will be the ones who benefit most.

That is why development continues even now, despite the fact that the business economics may not yet look ideal. For many companies, it is an investment in the future – in the moment when technologies become more accessible, and demand increases sharply.

And in this race, as in many technological revolutions, one factor often proves decisive: who started earlier. In this sense, today’s robotics strongly resembles the early stages of artificial intelligence. Back then, there were also more questions than answers. Yet it was the teams that began working with data and infrastructure earlier than others that ultimately shaped the direction of the entire industry.

Michael Abramov is the founder & CEO of Introspector, bringing more than 15 years of software engineering and computer vision AI systems experience to building enterprise-grade labelling tools.

Michael began his career as a software engineer and R&D manager, building scalable data systems and managing cross-functional engineering teams. Until 2025, he served as the CEO of Keymakr, a data labelling service company, where he pioneered human-in-the-loop workflows, advanced QA systems, and bespoke tooling to support large-scale computer vision and autonomy data needs.

He holds a B.Sc. in Computer Science and a background in engineering and creative arts, bringing a multidisciplinary lens to solving hard problems. Michael lives at the intersection of technology innovation, strategic product leadership, and real-world impact, driving forward the next frontier of autonomous systems and intelligent automation.