Author | Huang Nan

Editor | Yuan Silai

Hard Kr has learned that Daimon Robotics recently completed a 100-million-yuan Series A financing round, jointly invested by Huichuan Industry Investment, an industrial fund under Huichuan Technology, and China Telecom. The funds will be used to further build an ultra-large-scale physical interaction dataset, accelerate research and development of the physical world model, and drive the data flywheel and a closed business loop in real physical scenarios.

Daimon Robotics formally began operations in 2023. Its core team has long focused on robotic dexterous manipulation and physical interaction intelligence. Professor Wang Yu, the co-founder and chief scientist, previously served as the founding dean of the Robotics Research Institute at the Hong Kong University of Science and Technology. The concepts he proposed, such as “embodied skills” and “skill cloning,” are important parts of Daimon Robotics’ core technology roadmap. Dr. Duan Jianghua, the founder and CEO, and the main technical leaders all come from the core team of the Robotics Research Institute at the Hong Kong University of Science and Technology and have 10 years of experience in manipulation intelligence. Yuan Weihao, the chief AI scientist, was previously a multimodal research expert at Alibaba’s Tongyi Lab and has cutting-edge experience in transferring world models to robotic physical manipulation.

As the popularity of embodied intelligence continues to rise, the industry logic is undergoing profound changes. The sector has evolved along a clear path: from early competition over robots’ walking and motion control capabilities to the exploration of differentiated algorithm architectures and the “embodied brain.” Each round of hotspots has accumulated key foundations for the industry’s breakthroughs.

As humanoid robots move from stage demonstrations to real-world operations, the threshold for fine manipulation in complete, practical machines continues to rise. Whether high-quality physical interaction data can be collected has become a key dividing line for the industry’s deployment.

In mainstream pure-vision perception solutions, sensors can only capture the appearance of objects and cannot identify physical characteristics such as hardness, softness, friction coefficient, and deformation under force, making it difficult to help robots predict how objects will change. In contrast, physical interaction data that integrates touch can fully record key parameters such as instantaneous force and material properties, build up physical common sense during large-scale model training, accelerate convergence, help robots establish physical causal cognition, and carry out various fine operations.

Daimon starts from the collection and annotation of physical interaction data, gradually builds a complete technical chain covering perception, operation, and learning, and then constructs a world model that can provide physical common sense for robots.

At the cognitive level, its model aligns the vision and touch modalities, enabling robots to infer the physical properties of objects from images and to infer object shape from touch. At the execution level, with the help of high-frequency tactile feedback, it helps the system complete perception, judgment, and motion correction within milliseconds of contact, forming closed-loop control.

Performing fine operations such as threading grapes and inserting eggs with physical intuition (Source/Enterprise)

“For robots to be able to work, an understanding of the causality in the physical world and feedback based on real contact are essential,” Duan Jianghua, the CEO of Daimon Robotics, told Hard Kr. “A robot that can do parkour and somersaults will have greatly reduced application value if it can’t pick up a sponge with just the right amount of force to wipe an object. Vision is a non-contact remote signal. It can tell you where an object is, but it can’t tell you why a sponge deforms when you touch it. Touch, on the other hand, is the ‘feel’ at the moment of contact and is the key to judging physical causality and achieving fine operations.”

However, technology and models alone are not enough. How to drive the continuous iteration of the physical world model with a closed data loop and professional evaluation standards is another major challenge currently facing the industry. Duan Jianghua told Hard Kr, “The essence of the shortage of tactile data is that the data representation method for vision has been relatively unified, while there is no standard for touch, and there is a lack of a large-scale, multimodal real data collection system.”

To solve this problem, Daimon has built an “outward-distributed” embodied data collection network. Unlike the traditional model, which relies on fixed laboratories and teleoperation for data collection, the “outward-distributed” network decentralizes the laboratory and distributes collection across society, which can deliver authentic scenarios, a step change in collection efficiency, and lower marginal costs.

In April 2026, Daimon Robotics, in collaboration with dozens of leading domestic and international institutions including Google DeepMind, released Daimon-Infinity, the world’s largest full-modal physical world dataset with touch, which contains tactile information such as texture, hardness, and mechanics. It also open-sourced 10,000 hours of data for free use by the industry. Based on the dataset, a systematic evaluation standard was established, and in June the company released RobOmni, a full-modal benchmark system for physical interaction capabilities that supports both “real data training + simulator training” modes.

Human infants learn about the world and develop their intelligence by touching. For robots that are about to move from factories into households, this lesson cannot be skipped either. After solving the problems of “seeing clearly” and “walking steadily,” “touching accurately” is becoming the last and most critical “kilometer” for embodied intelligence to enter the physical world. Daimon revealed to Hard Kr that the shipment volume of its visual-tactile sensors currently ranks first in the world. It is trying to define its own standards in this technological process of “feel.”

The following is an excerpt from an interview between Hard Kr and Duan Jianghua, the CEO of Daimon Robotics (slightly edited):

Hard Kr: From perception to execution, embodied intelligence must cross the gap from “understanding” to “working.” How does Daimon’s physical world model handle the fusion of visual and tactile modalities and low-level control? When facing complex manipulation tasks, what can this architecture help robots do that they could not do before?

Duan Jianghua: Our model infers physical causality. In terms of model structure, we split physical contact into two layers: the cognitive layer and the execution layer.

What the cognitive layer does is enable two-way mapping between vision and touch in the same semantic space. This is similar to human synesthesia. When you see a strawberry, you know it will have a granular texture without squeezing it. When you insert a key into a lock to open a door, your hand blocks your view at the moment the key enters the lock. Without seeing the contact state between the key and the keyhole, people rely on intuition and feel to complete the operation: whether it is inserted, whether it is stuck, and whether to turn it. We hope robots can also do this.

Daimon Robotics uses a gripper to pick up an egg (Source/Enterprise)

There are two mechanisms running concurrently in the execution layer. One is a high-frequency tactile servo at the hundred-hertz level, similar to a spinal reflex. Without going through higher-level reasoning, as soon as an object begins to show a slipping tendency, a compensatory action is sent out, even before the visual frame has switched. It is like washing dishes when a plate covered with dish soap starts to slip a little. You do not need to look at it; your fingers instinctively tighten to hold the plate.

The other is physical world reasoning. The model continuously predicts the operation states over the next few steps and provides correction strategies in advance, before a mistake actually happens. It is like pouring water from a kettle into a cup with one hand. As the water flows out, the kettle’s center of gravity is constantly changing. Your brain continuously predicts the weight distribution of the kettle in the next moment based on the water flow rate and smoothly adjusts the tilt angle of your wrist in advance to ensure a steady flow of water.

These two mechanisms correspond to millisecond-level reactions and multi-step lookahead, respectively. Although they operate on different time scales, they work together in the same task. This is the most important structural difference compared with pure-vision manipulation models.
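To make the two time scales concrete, here is a minimal toy sketch, not Daimon’s implementation. It assumes a simple friction model in which a grip holds as long as friction times grip force exceeds the tangential load. The loop rates, the friction coefficient, the `GripState` fields, and the function names `reflex_step`, `plan_step`, and `simulate` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

FAST_HZ = 500      # assumed rate of the tactile reflex loop
SLOW_EVERY = 50    # assumed: the planner runs once every 50 fast ticks
FRICTION = 0.5     # assumed friction coefficient between finger and object


@dataclass
class GripState:
    grip_force: float  # normal force applied by the gripper (N)
    load: float        # tangential load the grip must resist (N)


def slip_margin(state: GripState) -> float:
    """Positive while friction can hold the load, negative once it slips."""
    return FRICTION * state.grip_force - state.load


def reflex_step(state: GripState, safety: float = 0.05) -> bool:
    """Fast loop: tighten at once when slip is detected, with no lookahead."""
    margin = slip_margin(state)
    if margin < 0:
        state.grip_force += -margin / FRICTION + safety
        return True
    return False


def plan_step(state: GripState, load_rate: float, horizon_s: float,
              safety: float = 0.2) -> None:
    """Slow loop: predict the load over the horizon and pre-tighten."""
    predicted_load = state.load + max(load_rate, 0.0) * horizon_s
    needed = predicted_load / FRICTION + safety
    state.grip_force = max(state.grip_force, needed)


def simulate(use_planner: bool, ticks: int = 500, load_rate: float = 1.0,
             disturbance_tick: int = 310, disturbance: float = 0.5):
    """Ramp the load, inject one unpredicted jump, count reflex firings."""
    dt = 1.0 / FAST_HZ
    state = GripState(grip_force=2.0, load=0.5)
    reflex_count = 0
    for tick in range(ticks):
        state.load += load_rate * dt          # predictable ramp
        if tick == disturbance_tick:
            state.load += disturbance         # jump the planner cannot foresee
        if use_planner and tick % SLOW_EVERY == 0:
            plan_step(state, load_rate, horizon_s=SLOW_EVERY * dt)
        if reflex_step(state):
            reflex_count += 1
    return state, reflex_count


if __name__ == "__main__":
    for flag in (True, False):
        final, fired = simulate(use_planner=flag)
        print(f"planner={flag}: reflex fired {fired} times, "
              f"final margin {slip_margin(final):.3f} N")
```

In this toy setup the planner absorbs the predictable ramp ahead of time, so the reflex is left to handle only the unpredicted jump and its aftermath. Without the planner, the reflex has to keep catching slips that could have been anticipated.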

Hard Kr: Daimon has recently released a dataset and a benchmark for robotic physical interaction capabilities. What is the relationship between these and the physical world model you are working on?

Duan Jianghua: The dataset is the fuel, the physical world model is the engine, and the benchmark is the tachometer.

Traditional datasets, whether visual or simulated, record “pixel changes” or “trajectories.” But to enable robots to understand the physical world, these are far from enough. For example, is an object soft or hard? Is its surface smooth or rough? How much normal pressure, tangential force, and slipping tendency are there when grasping? These all belong to physical property information. The Daimon-Infinity dataset collects more than a dozen modalities, including pressure, deformation, texture, stiffness, and slipping tendency.

The greatest difficulty is not collecting a single modality alone, but strictly aligning these more than a dozen tactile modalities with visual images and action commands at millisecond-level precision in space and time.

Daimon Robotics completes the task of threading grapes autonomously (Source/Enterprise)

For example, when a robot’s finger touches an object, the tactile sensor should record the pressure distribution and texture information at the contact point. At the same time, the camera should record the image at that moment, and the control system should record the joint angle and torque at that moment. These three must be synchronized to the millisecond level in time; otherwise, the model will have difficulty learning the correct causal logic.
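One common way to achieve this kind of synchronization is nearest-timestamp matching with a tolerance. The sketch below is illustrative only and does not describe Daimon’s actual pipeline. It assumes each stream is a time-sorted list of `(timestamp_ms, payload)` pairs, uses the camera as the reference clock, and drops any frame that lacks a tactile or control sample within `tol_ms`. The function names and the 1 ms default are assumptions.

```python
from bisect import bisect_left


def nearest(timestamps, t):
    """Return the index of the value in a sorted list that is closest to t."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Choose between the neighbours on either side of the insertion point.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align(tactile, camera, control, tol_ms=1.0):
    """Match every camera frame with the nearest tactile and control samples.

    A frame is kept only if both matches lie within tol_ms of the frame time.
    Returns a list of (timestamp_ms, tactile, image, control) tuples.
    """
    if not tactile or not control:
        return []
    tac_ts = [t for t, _ in tactile]
    ctl_ts = [t for t, _ in control]
    aligned = []
    for t_cam, image in camera:
        i = nearest(tac_ts, t_cam)
        j = nearest(ctl_ts, t_cam)
        if abs(tac_ts[i] - t_cam) <= tol_ms and abs(ctl_ts[j] - t_cam) <= tol_ms:
            aligned.append((t_cam, tactile[i][1], image, control[j][1]))
    return aligned


if __name__ == "__main__":
    # Synthetic streams: tactile every 2 ms, control every 1 ms, camera ~30 fps.
    tactile = [(float(t), f"tac{t}") for t in range(0, 100, 2)]
    control = [(float(t), f"ctl{t}") for t in range(0, 100)]
    camera = [(0.0, "img0"), (33.3, "img1"), (66.7, "img2"), (120.0, "img3")]
    for row in align(tactile, camera, control):
        print(row)
```

With the synthetic streams above, the last camera frame is discarded because no tactile or control sample falls within the tolerance. Dropping such frames is the conservative choice when a misaligned sample would teach the model a wrong cause-and-effect pairing.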

With the data and the model in place, the next question arises: how to determine whether the model has really learned physical causality? This is the significance of Daimon’s release of RobOmni.

Existing benchmark evaluations in the embodied field generally focus on the visual perception modality, emphasizing the robot’s generalized grasping and long-horizon planning tasks. Evaluation standards for the tactile perception modality and contact-rich fine operations are not yet mature.

The industry still lacks a standardized evaluation benchmark for tactile perception and dexterous manipulation. There is no unified standard across different models and data, making it difficult to quantify tactile capabilities and systematically verify the generalization ability of models.

We’ve seen that some teams specializing in simulation and Sim2Real have recently started to introduce visual-tactile fusion evaluations. This shows that the whole industry frontier is reaching a consensus: pure vision is not enough for robots to truly understand and interact with the world, and touch is unavoidable. RobOmni fills this gap and provides a standardized, comparable, reproducible, and scalable verification entry point for physical interaction capabilities.

Without a ruler, we can’t measure progress. Without standards, the industry can’t form a joint force. So we need to make a ruler first and then measure the world.

Comments from investors:

A relevant person in charge of Huichuan Industry Investment said that for embodied intelligence to achieve a generational leap in real-scenario operations, using tactile perception to complete physical causal logic is a necessary path. Daimon Robotics is one of the few companies in the industry that starts from physical causal logic, is driven by massive visual-tactile data, and promotes the deployment of the physical world model in fine-operation scenarios. Huichuan Technology has long been deeply involved in industrial automation and intelligent robots and is well aware of the strategic value of multimodal perception in fine-operation scenarios. In the future, based on Huichuan’s scenario and industry knowledge, we look forward to jointly building a tactile neural network for the era of embodied intelligence with Daimon.

A relevant person in charge of China Telecom Investment Company said that for embodied intelligence to achieve large-scale commercial deployment, it not only depends on the continuous iterative upgrade of cloud-based large-model computing power but also relies heavily on high-precision physical perception capabilities and a multimodal data system as support. Daimon Robotics has deep accumulation in the visual-tactile perception track and has built a solid core technology barrier. As a backbone force in the construction of a digital China, China Telecom is fully implementing the “Cloud Transformation, Digital Transformation, and Intelligence Benefits” strategy. In the future, we look forward to collaborating deeply with Daimon Robotics to jointly create deployable and replicable industry solutions for embodied intelligence, build new digital infrastructure to empower the development of new productive forces, and help accelerate the high-quality development of the embodied intelligence industry to achieve win-win outcomes across the ecosystem.


