My name is Yihao, and I am a Ph.D. student in Computer Science at Johns Hopkins University. My research lies at the intersection of medical robotics, mixed reality, and artificial intelligence. I focus on language-conditioned manipulation, particularly modeling it for reasoning-intensive, long-horizon tasks. To support this work, I also develop medical robotic infrastructure that enables more capable and intelligent systems.
My Ph.D. advisor is Prof. Mehran Armand, who holds appointments in the Departments of Computer Science, Mechanical Engineering, and Orthopaedic Surgery, and at the Applied Physics Laboratory. I was advised by Prof. Peter Kazanzides during my MSE, and by Prof. Zheng Liu and Prof. Jian Liu during my undergraduate studies. Over the last few years, I have had the pleasure of interning at Moon Surgical and Rokae Robotics.
At the center of my recent studies is the State Machine Serialization Language (SMSL). I am investigating how best to formalize the mapping from complex, interleaved workflows to a structured subtask space and the transitions between subtasks. The central challenge is that the required level of structure depends on the semantics of a scene, so task hierarchy and open-endedness are my focus. Such structure should be capturable by an agent and passed to downstream VLA models for execution.
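To make the idea concrete, here is a minimal sketch of what a serialized subtask state machine could look like. The schema, state names, and predicates below are my own illustrative assumptions for this page, not the actual SMSL specification.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical sketch: a long-horizon task expressed as subtasks with
# scene-dependent preconditions and outcome-driven transitions, serialized
# to JSON so a downstream VLA policy can consume it. Names are illustrative.

@dataclass
class Subtask:
    name: str                # e.g. "grasp_needle"
    precondition: str        # scene predicate that must hold on entry
    transitions: dict = field(default_factory=dict)  # outcome -> next subtask

def serialize(subtasks):
    """Serialize the subtask graph to JSON for downstream execution."""
    return json.dumps([asdict(s) for s in subtasks], indent=2)

# A toy surgical pick-handover-drive task with a retry loop on failure.
task = [
    Subtask("grasp_needle", "needle_visible",
            {"success": "handover", "failure": "grasp_needle"}),
    Subtask("handover", "needle_grasped",
            {"success": "drive_needle", "failure": "grasp_needle"}),
    Subtask("drive_needle", "needle_in_right_gripper",
            {"success": "done"}),
]

print(serialize(task))
```

The point of the serialization is that the agent reasons over the explicit graph (hierarchy, retries, termination) while the low-level policy only ever sees one well-scoped subtask at a time.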
Selected Publications
* Corresponding author
Look Before You Leap: Using Serialized State Machine for Language Conditioned Robotic Manipulation
This paper proposes a framework that uses serialized Finite State Machines (FSMs) to generate demonstrations for imitation learning in robotic manipulation. The method addresses the issue of cascading errors caused by incomplete demonstration coverage, especially in long-horizon, precision-heavy tasks.
dARt Vinci: Egocentric Data Collection for Surgical Robot Learning at Scale
Y Liu*, YC Ku, J Zhang, H Ding, P Kazanzides, M Armand
The paper presents dARt Vinci, a scalable data collection platform for robot learning in surgical settings that leverages Augmented Reality (AR) and a high-fidelity physics engine to capture nuanced surgical maneuvers without requiring a physical robot setup.
An Image-Guided Robotic System for Transcranial Magnetic Stimulation
Y Liu*, J Zhang, L Ai, J Tian, S Sefati, H Liu, A Martin-Gomez, A Kheradmand, M Armand
This paper presents an image-guided robotic system for transcranial magnetic stimulation (TMS) that aims to standardize coil pose planning based on detailed brain anatomy and improve placement accuracy. The system addresses limitations of manual TMS by introducing a reference pose and enhancing repeatability through robotic actuation.
EVD Surgical Guidance with Retro-Reflective Tool Tracking and Spatial Reconstruction Using Head-Mounted Augmented Reality Device
H Li, W Yan, D Liu, L Qian, Y Yang, Y Liu, Z Zhao, H Ding, G Wang
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2024
This paper introduces an AR-based guidance framework for External Ventricular Drain (EVD) surgery that leverages Time-of-Flight (ToF) depth sensors for accurate, non-invasive tracking and surface reconstruction. We built a correction method that reduces errors by over 85%.
Calibration of Augmented Reality Headset with External Tracking System Using AX=YB
L Ai, Y Liu*, M Armand, A Martin-Gomez
IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2024
This paper proposes a robust virtual-to-real calibration method for Augmented Reality by formulating the problem as a hand-eye/robot-world calibration using the AX=YB model. The approach leverages both head-mounted display self-localization and external tracking, with additional techniques for data synchronization and outlier filtering based on the Frobenius norm.
Autokinesis Reveals a Threshold for Perception of Visual Motion
Y Liu, J Tian, A Martin-Gomez, Q Arshad, M Armand, A Kheradmand
This study investigates how autokinesis—illusory motion perception in dark environments—affects visual motion perception in humans. Using a novel optical tracking method, we found that at lower motion speeds in darkness, perceived motion aligned more with autokinesis, while brighter environments or higher speeds improved accuracy.
Segment Any Medical Model Extended
Y Liu*, J Zhang, A Diaz-Pinto, H Li, A Martin-Gomez, A Kheradmand, M Armand
This paper presents SAMM Extended (SAMME), a unified platform designed to enhance and evaluate the Segment Anything Model (SAM) for medical image segmentation. SAMME supports variant integration, fine-tuning, new interaction modes, and improved communication protocols.
A Roadmap Towards Automated and Regulated Robotic Systems
This paper proposes a roadmap for integrating generative AI into regulated and autonomous robotic systems, particularly for safety-critical domains like medicine. We argue that while generative models are suitable for low-level tasks, their outputs should undergo regulatory oversight before robotic execution. The State Machine Serialization Language (SMSL) is introduced in this work.
On the Fly Robotic-Assisted Medical Instrument Planning and Execution Using Mixed Reality
L Ai, Y Liu*, M Armand, A Kheradmand, A Martin-Gomez
IEEE International Conference on Robotics and Automation (ICRA), 2024
This paper presents a mixed reality framework designed to reduce the complexity of operating Robotic-Assisted Medical Systems (RAMS) by enhancing human-robot interaction. The system enables real-time planning and execution through 3D anatomical overlays, collision detection, and an intuitive robot programming interface, supported by a user-friendly head-mounted display calibration.
GBEC: Geometry-Based Hand-Eye Calibration
Y Liu*, J Zhang, Z She, A Kheradmand, M Armand
IEEE International Conference on Robotics and Automation (ICRA), 2024
This paper introduces Geometry-Based End-Effector Calibration (GBEC), a novel method for hand-eye calibration that relies solely on the physical geometry between the robot end-effector and the attached sensor, improving accuracy and repeatability over traditional pose-based methods like AX=XB.
Real-Time Robust Shape Estimation of Deformable Linear Objects
J Zhang, Z Zhang, Y Liu*, Y Chen, A Kheradmand, M Armand
IEEE International Conference on Robotics and Automation (ICRA), 2024
This paper presents a robust, real-time shape estimation method for linear deformable objects using scattered, unordered key points without the need for dense point clouds or fully visible markers. A probability-based labeling algorithm determines the correct order of key points, followed by piecewise spline interpolation to reconstruct the object's shape.
SAMM (Segment Any Medical Model): A 3D Slicer Integration to SAM
This extension is the first working medical segmentation tool using the foundation model Segment Anything (SAM). SAMM enables near real-time image mask inference with a latency of just 0.6 seconds, facilitating the development, evaluation, and application of SAM on medical images.