Haokun Zhu*, Zongtai Li, Zihan Liu, Kevin Guo, Zhengzhi Lin, Yuxin Cai, Guofei Chen, Chen Lv, Wenshan Wang, Jean Oh, Ji Zhang
Carnegie Mellon University, New York University, Nanyang Technological University
[Project Page] [arXiv]
- [2026-03] Paper released on arXiv.
- [2026-03] Project page is online.
- [2026-04] Code released for Unity simulation, wheeled robot, Unitree Go2, and Unitree G1 platforms.
Object navigation in real-world environments remains a significant challenge in embodied AI. We present SysNav, a three-level object navigation system that decouples semantic reasoning, navigation planning, and motion control. The framework uses Vision-Language Models for high-level semantic guidance and a hierarchical room-based navigation strategy that treats rooms as the minimal decision-making units, combined with classical exploration for in-room navigation. Across 190 real-world experiments on three robot embodiments (wheeled, quadruped, humanoid), we demonstrate a 4-5x improvement in navigation efficiency over existing baselines. The system also achieves state-of-the-art results on the HM3D-v1, HM3D-v2, MP3D, and HM3D-OVON simulation benchmarks.
- Find Refrigerator in Lounge. ▶ Watch on YouTube
- Find Blue Trash Can in Classroom. ▶ Watch on YouTube
- Find Microwave Oven near Refrigerator. ▶ Watch on YouTube
| Platform | Task | System View | Third-person View |
|---|---|---|---|
| Wheeled Robot | Find the microwave_oven. | wheeled_system_view.webm | wheeled_third_person.webm |
| Quadruped (Go2) | Find the blue trash_can. | go2_system_view.webm | go2_third_person.webm |
| Humanoid (G1) | Find the tv_monitor on the black desk. | g1_system_view.webm | g1_third_person.webm |
More demos on our project page.
This repository supports three robot embodiments, each maintained on its own branch. Switch to the corresponding branch (git checkout unitree_go2 / git checkout unitree_g1) before building and running on a Unitree robot.
Wheeled Robot + Unity simulation — main (you are here)
- Custom wheeled vehicle with Mecanum wheels (indoor carpet) or standard wheels (hard floor / outdoors)
- Livox Mid-360 lidar + Ricoh Theta Z1 360-degree camera
- Motor controller connected via USB serial (/dev/ttyACM0 by default)
- Gaming laptop (RTX 4090) as the processing computer
- PS3/Xbox-style joystick for teleoperation
Detailed hardware photos and assembly info: Real-robot Setup → Hardware.
Unitree Go2 Quadruped — unitree_go2
- Unitree Go2 quadruped, controlled via WebRTC
- Livox Mid-360 lidar + Ricoh Theta Z1 360-degree camera
- Asus NUC 14 Pro (Intel Core Ultra 5) as the onboard computer
- Desktop workstation / Laptop with NVIDIA RTX 4090 for the semantic mapping and VLM reasoning
- Wired / WiFi network shared between robot, NUC, and desktop
Unitree G1 Humanoid — unitree_g1
- Unitree G1 humanoid, controlled via WebRTC
- Livox Mid-360 lidar + Ricoh Theta Z1 360-degree camera
- Asus NUC 14 Pro (Intel Core Ultra 5) as the onboard computer
- Desktop workstation / Laptop with NVIDIA RTX 4090 for the semantic mapping and VLM reasoning
- Wired / WiFi network shared between robot, NUC, and desktop
- Demo
- Platforms
- Installation
- Simulation Setup
- Real-robot Setup
- Bagfile Setup
- Credits
- Citation
- License
The system has been tested on Ubuntu 24.04 with ROS2 Jazzy.
Install ROS2 Jazzy, then:
echo "source /opt/ros/jazzy/setup.bash" >> ~/.bashrc
source ~/.bashrc
Install system dependencies:
sudo apt update
sudo apt install ros-jazzy-desktop-full ros-jazzy-pcl-ros libpcl-dev git
sudo apt install -y nlohmann-json3-dev
sudo apt install ros-jazzy-backward-ros
git submodule update --init --recursive
pip install -r requirement.txt --break-system-package
# detectron2
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git' --break-system-package
# pytorch3d
pip install "git+https://github.com/facebookresearch/pytorch3d.git" --no-build-isolation --break-system-package
# sam2
cd src/semantic_mapping/semantic_mapping/external/sam2
pip install -e . --break-system-package
cd checkpoints && ./download_ckpts.sh && cd ../..
# spacy
python -m spacy download en_core_web_sm --break-system-package
# CLIP
pip install git+https://github.com/ultralytics/CLIP.git --break-system-package
# YOLO models
python set_yolo_e.py
python set_yolo_world.py
Install Sophus (from src/slam/dependency/Sophus):
mkdir build && cd build
cmake .. -DBUILD_TESTS=OFF
make && sudo make install
Install Ceres Solver (from src/slam/dependency/ceres-solver):
mkdir build && cd build
cmake ..
make -j6 && sudo make install
Install GTSAM (from src/slam/dependency/gtsam):
mkdir build && cd build
cmake .. -DGTSAM_USE_SYSTEM_EIGEN=ON -DGTSAM_BUILD_WITH_MARCH_NATIVE=OFF
make -j6 && sudo make install
sudo /sbin/ldconfig -v
Install Livox-SDK2 (from src/utilities/livox_ros_driver2/Livox-SDK2):
mkdir build && cd build
cmake ..
make && sudo make install
Configure the lidar IP in src/utilities/livox_ros_driver2/config/MID360_config.json: set the IP to 192.168.1.1xx, where xx are the last two digits of the lidar serial number.
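The IP rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule; the function name and the sample serial number are hypothetical and are not part of the repository or the Livox SDK.

```python
def lidar_ip_from_serial(serial: str, prefix: str = "192.168.1.1") -> str:
    """Return prefix + the last two digits of the serial, e.g. 192.168.1.1xx."""
    last_two = serial.strip()[-2:]
    if len(last_two) != 2 or not (last_two.isascii() and last_two.isdigit()):
        raise ValueError(f"serial must end in two digits, got {serial!r}")
    return prefix + last_two


if __name__ == "__main__":
    # Hypothetical serial number, used only to illustrate the rule.
    print(lidar_ip_from_serial("47MDL1234567"))  # prints 192.168.1.167
```

The resulting address is the value to enter for the lidar in MID360_config.json.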
Compile the driver:
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release --packages-select livox_ros_driver2
For simulation (skips SLAM and lidar driver):
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release --packages-skip arise_slam_mid360 arise_slam_mid360_msgs livox_ros_driver2
For real robot (full build, requires steps 3-4):
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release
The VLM node supports two providers via the OpenAI-compatible interface. Set one of the following:
Gemini (default), with a key from Google AI Studio:
export GEMINI_API_KEY="your-api-key-here"
Qwen (DashScope), with a key from Alibaba Cloud DashScope:
export DASHSCOPE_API_KEY="your-api-key-here"
If both keys are set, Gemini is used by default; override with export VLM_PROVIDER=qwen. Optionally override the Qwen model names with QWEN_MODEL / QWEN_MODEL_LITE. Add the line(s) to ~/.bashrc so they persist across terminal sessions.
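The precedence described above can be sketched as follows. This is a minimal illustration, not the VLM node's actual code: the function name is hypothetical, and the behavior for cases the text does not cover (no key set, or VLM_PROVIDER=qwen without a DashScope key) is an assumption.

```python
import os
from typing import Mapping, Optional


def select_vlm_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick "gemini" or "qwen" following the precedence described above."""
    env = os.environ if env is None else env
    has_gemini = bool(env.get("GEMINI_API_KEY"))
    has_qwen = bool(env.get("DASHSCOPE_API_KEY"))
    override = env.get("VLM_PROVIDER", "").strip().lower()

    if override == "qwen":
        # Assumption: an explicit override without a key is treated as an error.
        if not has_qwen:
            raise RuntimeError("VLM_PROVIDER=qwen but DASHSCOPE_API_KEY is not set")
        return "qwen"
    if has_gemini:
        return "gemini"  # default whenever a Gemini key is present
    if has_qwen:
        return "qwen"  # only the DashScope key is set
    raise RuntimeError("set GEMINI_API_KEY or DASHSCOPE_API_KEY")


if __name__ == "__main__":
    print(select_vlm_provider())
```

Running it in a shell where the keys are exported shows which provider the rule above would pick for the current environment.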
The system is integrated with Unity environment models for simulation. Download a Unity environment model (home_building_1.zip is recommended) and unzip the files into the src/base_autonomy/vehicle_simulator/mesh/unity folder. On computers without a powerful GPU, try the without_360_camera version for a higher rendering rate.
The environment model files should look like:
mesh/
    unity/
        environment/
            Model_Data/
            Model.x86_64
            UnityPlayer.so
            Dimensions.csv
            Categories.csv
        map.ply
        object_list.txt
        traversable_area.ply
        map.jpg
        render.jpg
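A quick way to confirm the unzip landed in the right place is to check for the entries listed above. The sketch below is a hypothetical helper, not part of the repository, and it assumes the player files sit under environment/ while the map files sit directly under unity/; adjust EXPECTED if your download is laid out differently.

```python
from pathlib import Path
from typing import List

# Assumed nesting: player files under environment/, map files directly under unity/.
EXPECTED = [
    "environment/Model_Data",
    "environment/Model.x86_64",
    "environment/UnityPlayer.so",
    "environment/Dimensions.csv",
    "environment/Categories.csv",
    "map.ply",
    "object_list.txt",
    "traversable_area.ply",
    "map.jpg",
    "render.jpg",
]


def missing_entries(unity_dir: str) -> List[str]:
    """Return the expected entries that do not exist under unity_dir."""
    root = Path(unity_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    unity = "src/base_autonomy/vehicle_simulator/mesh/unity"
    for rel in missing_entries(unity):
        print(f"missing: {unity}/{rel}")
```

Run it from the repository root; no output means every listed entry was found.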
Launch the system:
./system_simulation.sh
Once data appears in RVIZ, use the 'Waypoint' button to set waypoints and navigate the vehicle around.

Base autonomy (smart joystick, waypoint, and manual modes)
The system supports three operating modes:
- Smart joystick mode (default): The vehicle follows joystick commands while avoiding collisions. Use the control panel in RVIZ or the right joystick on the controller.
- Waypoint mode: The vehicle follows waypoints while avoiding collisions. Use the 'Waypoint' button in RVIZ, or click 'Resume Navigation to Goal' to switch to this mode.
- Manual mode: The vehicle follows joystick commands without collision avoidance. Press the 'manual-mode' button on the controller.
Alternatively, users can run a ROS node to send a series of waypoints:
source install/setup.sh
ros2 launch waypoint_example waypoint_example.launch
Click the 'Resume Navigation to Goal' button in RVIZ, and the vehicle will navigate inside the boundary following the waypoints. More information about the base autonomy system is available on the Autonomous Exploration Development Environment website.
Launch the system with the exploration planner:
./system_simulation_with_exploration_planner.sh
Click the 'Resume Navigation to Goal' button in RVIZ to start the exploration. Users can adjust the navigation boundary by updating the boundary polygon in src/exploration_planner/tare_planner/data/boundary.ply.
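As a rough sketch of how a replacement boundary polygon could be generated, the snippet below writes polygon corners as an ASCII PLY vertex list. It assumes the planner reads a plain vertex list with float x, y, z properties; that layout is an assumption, so compare the header against the shipped boundary.ply before replacing the file. The function name and the example coordinates are hypothetical.

```python
from typing import Iterable, Tuple


def boundary_ply(vertices: Iterable[Tuple[float, float]], z: float = 0.0) -> str:
    """Serialize 2D polygon corners as an ASCII PLY vertex list at height z."""
    pts = list(vertices)
    if len(pts) < 3:
        raise ValueError("a polygon needs at least three vertices")
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(pts)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    body = [f"{x:.3f} {y:.3f} {z:.3f}" for x, y in pts]
    return "\n".join(header + body) + "\n"


if __name__ == "__main__":
    # Hypothetical 10 m x 5 m rectangle in map coordinates.
    print(boundary_ply([(0, 0), (10, 0), (10, 5), (0, 5)]), end="")
```

Redirect the output to a new file and keep a backup of the original boundary.ply until the planner is confirmed to load the new one.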
Note: On ARM computers, download the corresponding OR-Tools binary release and replace the include and lib folders under src/exploration_planner/tare_planner/or-tools.

Base autonomy with exploration planner
The vehicle hardware is designed to support advanced AI. Space is left for users to install a Jetson AGX Orin computer or a gaming laptop. The vehicle is equipped with a 19V and a 110V inverter (both 400W) to power sensors and computers. A wireless HDMI module transmits signals to a control station.
We supply two types of wheels: Mecanum wheels for indoor carpet, and standard wheels for hard floor and outdoors.
Install Ubuntu 24.04 and ROS2 Jazzy on the processing computer. Add the user to the dialout group:
echo "source /opt/ros/jazzy/setup.bash" >> ~/.bashrc
source ~/.bashrc
sudo adduser 'username' dialout
sudo reboot now
Follow the Installation section to install all dependencies and compile the full repository. Connect the motor controller via USB and, if needed, update the serial device path in src/base_autonomy/local_planner/launch/local_planner.launch and src/utilities/teleop_joy_controller/launch/teleop_joy_controller.launch (default: /dev/ttyACM0).
Test the teleoperation:
source install/setup.sh
ros2 launch teleop_joy_controller teleop_joy_controller.launch
The system uses a Ricoh Theta Z1 360-degree camera. The camera driver and lidar-to-camera calibration tools are maintained in a separate repository. Clone it alongside this repo and follow its README to build and configure:
https://github.com/jizhang-cmu/360_camera/tree/jazzy
Launch the full system:
./system_real_robot.sh
Launch with the exploration planner:
./system_real_robot_with_exploration_planner.sh
To run the system with a recorded bagfile, open three terminals:
Terminal 1 - Launch the system:
./system_bagfile.sh
# or with exploration planner:
./system_bagfile_with_exploration_planner.sh
Terminal 2 - Republish camera images:
ros2 run image_transport republish \
  --ros-args \
  -p in_transport:=compressed \
  -p out_transport:=raw \
  --remap in/compressed:=/camera/image/compressed \
  --remap out:=/camera/image
Terminal 3 - Play the bagfile:
source install/setup.bash
ros2 bag play bagfolder_path/bagfile_name.mcap
Example bagfiles are available here.
Note: Before processing bagfiles, ensure the repository has been fully compiled following the Installation section.
The project is led by Ji Zhang's group at Carnegie Mellon University.
The base autonomy system is based on Autonomous Exploration Development Environment. The SLAM module is an upgraded implementation of LOAM.
If you find this work useful, please consider citing:
@article{zhu2026sysnav,
  title={SysNav: Multi-Level Systematic Cooperation Enables Real-World, Cross-Embodiment Object Navigation},
  author={Zhu, Haokun and Li, Zongtai and Liu, Zihan and Guo, Kevin and Lin, Zhengzhi and Cai, Yuxin and Chen, Guofei and Lv, Chen and Wang, Wenshan and Oh, Jean and Zhang, Ji},
  journal={arXiv preprint arXiv:2603.06914},
  year={2026}
}
This project is licensed under the BSD 3-Clause License.
Some third-party packages retain their original open-source licenses (BSD, MIT, Apache 2.0, GPLv3). See individual package.xml files for per-package license declarations.






