This repository contains the full implementation of my Sensing and Perception Group Project at King’s College London:
NAO Robot Autonomous Ball Retrieval System
Sensing and Perception Group Project | King's College London | August 2025
This project develops a comprehensive sensing and perception framework for the NAO V5 humanoid robot to autonomously detect, track, navigate to, and kick a tennis ball. Inspired by RoboCup Soccer and ball-kid assistance on tennis courts, the system integrates multiple robotics domains:
- Computer Vision: Real-time ball detection and tracking using OpenCV
- Path Planning: Dynamic obstacle avoidance with the A* algorithm
- SLAM: Sparse 3D reconstruction inspired by ORB-SLAM2
- Motion Planning: Custom kick kinematics with balance constraints
- Human-Robot Interaction: Voice command recognition system
The robot was tested in Quad Lab, King's College London.
- Project Objectives
- System Architecture
- Simulation Environment
- Technical Implementations
- Results & Performance
- Installation & Setup
- Demo Videos
- Challenges & Solutions
- Future Work
- Acknowledgments
- References
This project implements a fully autonomous navigation pipeline for the NAO humanoid robot, enabling the robot to:
- Detect a target object (tennis ball)
- Build and maintain a grid-based world representation
- Compute an optimal path using the A* algorithm
- Avoid static obstacles and reach the target reliably
- Execute the computed path in simulation and on a real NAO robot
```
┌─────────────────┐
│  Voice Command  │
│   Recognition   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│  Ball Detection │◄─────┤  NAO Camera  │
│    (OpenCV)     │      └──────────────┘
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│ Visual Tracking │◄─────┤ Head Control │
│ (Proportional)  │      │  (ALProxy)   │
└────────┬────────┘      └──────────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│   SLAM System   │◄─────┤   Feature    │
│   (ORB-based)   │      │  Extraction  │
└────────┬────────┘      └──────────────┘
         │
         ▼
┌─────────────────┐
│  Path Planning  │
│ (A* Algorithm)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│     Motion      │◄─────┤     Kick     │
│    Execution    │      │  Kinematics  │
└─────────────────┘      └──────────────┘
```

The system consists of four primary layers (a control-loop sketch in code follows the list):
- Perception
  - Image-based ball detection (optional extension)
  - Occupancy grid generation
  - Static obstacle identification
- Planning
  - A*-based global path planner
  - Manhattan distance heuristic
  - Node expansion, open/closed set management
- Simulation & Visualisation
  - Webots for physics-based robot simulation
  - RViz/Foxglove for visualising the grid and planned path
- Execution
  - NAOqi API for body movement
  - Path smoothing and waypoint tracking
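To make the layering concrete, here is a minimal, runnable control-loop sketch. Every helper below is an illustrative stub standing in for one layer; none of these names are the repository's actual module API.

```python
# Illustrative stubs only: each function stands in for one layer.

def detect_ball(frame):
    """Perception stub: would run HSV ball detection and map the
    detection into an occupancy-grid cell."""
    return (12, 7)  # pretend the ball lies in grid cell (12, 7)

def plan_path(start, goal):
    """Planning stub: would call the A* implementation shown later."""
    return [start, goal]

def follow_waypoints(path):
    """Execution stub: would issue NAOqi walk commands along the path."""
    print("walking via {}".format(path))

def run_pipeline():
    robot_cell = (0, 0)
    frame = None  # would come from the NAO camera subscription
    goal = detect_ball(frame)
    if goal is not None:
        follow_waypoints(plan_path(robot_cell, goal))

run_pipeline()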
The project integrates multiple tools:

- Simulation (Gazebo):
  - Full NAO model
  - Obstacle environment
  - Tennis-ball placement
  - Kinematic control
- Visualisation (RViz/Foxglove):
  - Grid visualisation
  - Path expansion timeline
  - Debugging of occupancy cells
  - Real-time monitoring
  - Playback of navigation logs

Gazebo trade-offs:
- More realistic integration with ROS tools and MoveIt planning
- More fragile on newer Ubuntu versions
Algorithm Steps:
- Image Acquisition: Capture RGB frames from NAO's camera (320×240 resolution)
- Color Filtering: Apply HSV color space conversion and yellow mask
- Noise Reduction: Morphological operations (erosion + dilation)
- Contour Detection: Identify closed contours using OpenCV
- Circle Validation: Filter circular contours and compute center coordinates
Proportional control for head tracking:

```
θ = k × (x - x_center)
```

where:
- θ = angular adjustment
- k = proportional gain constant
- x = ball center x-coordinate
- x_center = image frame center
Distance estimation from radius:
distance ≈ f(radius) [inverse relationship]
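A short sketch combining the two formulas above. The gain, focal length, and robot IP are illustrative assumptions, not the project's tuned values; the distance estimate uses a standard pinhole model to make the inverse relationship concrete.

```python
from naoqi import ALProxy  # NAOqi Python SDK (Python 2.7)

NAO_IP = "192.168.1.10"    # hypothetical robot address
motion = ALProxy("ALMotion", NAO_IP, 9559)

K_GAIN = 0.001             # proportional gain k, rad per pixel (illustrative)
X_CENTER = 160             # image frame center for 320x240 frames
BALL_RADIUS_M = 0.0335     # regulation tennis ball radius in metres
FOCAL_PX = 280.0           # assumed camera focal length in pixels

def track_ball(x, radius_px):
    """Pan the head toward the ball and estimate its distance."""
    theta = K_GAIN * (x - X_CENTER)              # theta = k * (x - x_center)
    motion.changeAngles("HeadYaw", -theta, 0.1)  # negative: image x grows rightward
    # Pinhole model gives the inverse relationship distance ~ f(radius):
    return FOCAL_PX * BALL_RADIUS_M / radius_px
```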
Ball detection from the NAO camera, outputting the coordinates and radius of the ball.
```python
# Ball detection core logic
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, lower_yellow, upper_yellow)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # erosion + dilation to remove noise
# OpenCV 3.x returns (image, contours, hierarchy)
_, contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
    ((x, y), radius) = cv2.minEnclosingCircle(contour)
    if radius > min_radius:
        cv2.circle(frame, (int(x), int(y)), int(radius), (0, 255, 0), 2)
        theta = k * (x - x_center)  # proportional head control
```

The A* implementation uses a Manhattan distance heuristic for efficient pathfinding on a 2D grid:
```
h(n) = |x_n - x_goal| + |y_n - y_goal|
```

- 8-way movement (diagonal movement allowed)
- Dynamic obstacle detection and avoidance
- Real-time path replanning (15-50ms per obstacle)
- Optimal path reconstruction via parent node tracking
| Metric | A* Algorithm | Dijkstra Algorithm | Improvement |
|---|---|---|---|
| Success Rate | 92% (50+ runs) | 88% | +4.5% |
| Path Length | Optimized | Baseline | 12% shorter |
| Replanning Time | 15-50ms | 25-70ms | 40% faster |
| Memory Usage | Moderate | High | Lower |
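The A* listing below references two helpers, `get_neighbors` and `manhattan_distance`. A sketch of both follows; the grid convention (indexed `grid[y][x]`, with 0 = free and 1 = occupied) is an assumption, not necessarily the repository's encoding.

```python
def manhattan_distance(a, b):
    """Manhattan heuristic: h(n) = |x_n - x_goal| + |y_n - y_goal|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def get_neighbors(cell, grid):
    """8-way neighbors of (x, y), skipping out-of-bounds and occupied cells."""
    x, y = cell
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]) and grid[ny][nx] == 0:
                yield (nx, ny)
```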
```python
from Queue import PriorityQueue  # stdlib "queue" on Python 3

def reconstruct_path(came_from, current):
    """Walk parent links back from the goal to recover the path."""
    path = [current]
    while current in came_from:
        current = came_from[current]
        path.append(current)
    return path[::-1]

def a_star(start, goal, grid):
    open_set = PriorityQueue()
    open_set.put((0, start))
    came_from = {}
    g_score = {start: 0}
    while not open_set.empty():
        current = open_set.get()[1]
        if current == goal:
            return reconstruct_path(came_from, current)
        for neighbor in get_neighbors(current, grid):
            tentative_g = g_score[current] + 1
            if tentative_g < g_score.get(neighbor, float('inf')):
                came_from[neighbor] = current
                g_score[neighbor] = tentative_g
                f_score = tentative_g + manhattan_distance(neighbor, goal)
                open_set.put((f_score, neighbor))
    return None  # no path found
```

This visual SLAM system adapts the ORB-SLAM2 architecture to Python 2.7 constraints:
Pipeline Stages:
- Feature Extraction: ORB (Oriented FAST and Rotated BRIEF) feature detection (up to 3000 features)
- Feature Matching: FLANN-based descriptor matching across frames
- Motion Estimation: Essential matrix computation with RANSAC outlier rejection
- Keyframe Selection: Add keyframes on significant camera translation (see the sketch after this list)
- Triangulation: 3D point reconstruction from matched features
- Loop Closure: Periodic global optimization (threshold: 10+ keyframes)
- Map Building: Covisibility graph construction
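The keyframe test in stage 4 can be as simple as a translation-magnitude check. A minimal sketch; the 0.1 m threshold is an illustrative value, not the project's tuned one.

```python
import numpy as np

KEYFRAME_TRANSLATION_M = 0.1  # illustrative threshold

def should_add_keyframe(t_current, t_last_keyframe):
    """Insert a keyframe once the camera has translated significantly."""
    return np.linalg.norm(t_current - t_last_keyframe) > KEYFRAME_TRANSLATION_M
```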
(1) NAO in the Gazebo environment with the ball; (2) covisibility graph of landmarks and the robot camera trajectory
ORB Feature Detection:
```python
orb = cv2.ORB_create(nfeatures=3000)
keypoints, descriptors = orb.detectAndCompute(image, None)
```

Feature Matching with FLANN:
```python
FLANN_INDEX_LSH = 6
index_params = dict(algorithm=FLANN_INDEX_LSH,
                    table_number=6,
                    key_size=12,
                    multi_probe_level=1)
flann = cv2.FlannBasedMatcher(index_params, {})
matches = flann.knnMatch(desc1, desc2, k=2)
```

Essential Matrix & Camera Motion:
```python
E, mask = cv2.findEssentialMat(pts1, pts2, focal=focal, pp=(cx, cy),
                               method=cv2.RANSAC, prob=0.999, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, focal=focal, pp=(cx, cy))
```
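Stage 5 (triangulation) then recovers 3D landmarks from the inlier matches. A sketch, assuming `K` is the 3×3 intrinsic matrix and `R`, `t` come from `recoverPose` above:

```python
import numpy as np
import cv2

# Assumes K (3x3 intrinsics), R and t from recoverPose above, and that
# pts1, pts2 hold the matched pixel coordinates as Nx2 float arrays.
P1 = K.dot(np.hstack([np.eye(3), np.zeros((3, 1))]))  # first camera at the origin
P2 = K.dot(np.hstack([R, t]))                          # second camera from recovered pose

# triangulatePoints expects 2xN inputs, hence the transposes
points_4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
points_3d = (points_4d[:3] / points_4d[3]).T           # dehomogenise to Nx3 landmarks
```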
Particle Filter SLAM:

Particle filter-based SLAM showing belief map evolution and robot state estimation
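A minimal, self-contained sketch of one predict/update/resample cycle of such a filter. The noise parameters and the simple range-to-landmark measurement model are illustrative, not the project's exact models.

```python
import numpy as np

N = 500                                 # particle count (illustrative)
particles = np.zeros((N, 3))            # robot state hypotheses: (x, y, heading)
weights = np.ones(N) / N

def predict(particles, dx, dtheta, noise=(0.02, 0.05)):
    """Propagate particles through a noisy odometry motion model."""
    particles[:, 2] += dtheta + np.random.randn(N) * noise[1]
    particles[:, 0] += dx * np.cos(particles[:, 2]) + np.random.randn(N) * noise[0]
    particles[:, 1] += dx * np.sin(particles[:, 2]) + np.random.randn(N) * noise[0]

def update(weights, particles, z, landmark, sigma=0.1):
    """Reweight particles by the likelihood of a range measurement z to a landmark."""
    expected = np.linalg.norm(particles[:, :2] - landmark, axis=1)
    weights *= np.exp(-0.5 * ((z - expected) / sigma) ** 2)
    weights += 1e-300                   # guard against all-zero weights
    weights /= weights.sum()

def resample(particles, weights):
    """Multinomial resampling: duplicate high-weight particles."""
    idx = np.random.choice(N, size=N, p=weights)
    return particles[idx].copy(), np.ones(N) / N
```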
Development Steps:
- Physical Teaching: Manually guide NAO's leg through desired kick motion
- Joint Recording: Capture joint angles using Choregraphe timeline
- Motion Refinement: Fine-tune keyframes for smooth trajectory
- Balance Constraint: Weight shift to right leg + CoM recentering
- Cartesian Control: End-effector position interpolation
- Testing & Iteration: Validate stability and kick effectiveness
NAO Leg Degrees of Freedom (6 DOF per leg):
- HipYawPitch and HipPitch: position adjustment
- HipRoll: lateral movement
- KneePitch: leg extension
- AnklePitch and AnkleRoll: foot orientation
Kick motion visualization in Choregraphe showing successful execution
```python
from naoqi import ALProxy
import motion  # NAOqi spatial frame constants (FRAME_ROBOT, etc.)

motionProxy = ALProxy("ALMotion", nao_ip, 9559)  # nao_ip: robot's network address

# Balance preparation: enable whole-body control, lock the right foot,
# and constrain balance over the legs
motionProxy.wbEnable(True)
motionProxy.wbFootState("Fixed", "RLeg")
motionProxy.wbEnableBalanceConstraint(True, "Legs")

# Cartesian interpolation for the kick (Position6D: x, y, z, wx, wy, wz)
effector = "LLeg"
space = motion.FRAME_ROBOT
path = [
    [0.00, 0.1, 0.05, 0.0, 0.0, 0.0],  # retract
    [0.15, 0.1, 0.05, 0.0, 0.0, 0.0],  # forward kick
    [0.00, 0.1, 0.00, 0.0, 0.0, 0.0],  # return
]
times = [1.0, 2.0, 3.0]  # seconds per waypoint
motionProxy.positionInterpolation(effector, space, path, 0x3f, times, True)
motionProxy.post.goToPosture("StandInit", 1.0)
```

Challenges:
- Center of gravity balance during single-leg support
- Preventing robot fall-over post-kick
- Timing coordination between leg and arm movements
Due to Python 2.7 constraints on NAO, a novel dual-script system was implemented.
System Flow:
- Script 1 (Python 3.12): Runs on laptop, captures microphone input
- Speech Recognition: Processes audio using Google Speech API
- File I/O: Writes transcription to shared .txt file
- Script 2 (Python 2.7): Polls file, executes NAO commands via NAOqi
- Cleanup: Clears file after command execution to manage memory
```python
# Python 3.12 - speech recognition script (runs on the laptop)
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    audio = recognizer.listen(source)
text = recognizer.recognize_google(audio)

with open("command.txt", "w") as f:
    f.write(text)
```

```python
# Python 2.7 - NAO-side script polling the shared file
import os
import time

while True:
    if os.path.exists("command.txt"):
        with open("command.txt", "r") as f:
            command = f.read().strip()
        if command == "go get the ball":
            execute_ball_retrieval()     # project routine triggering the pipeline
        open("command.txt", "w").close()  # clear the file after execution
    time.sleep(0.5)
```

Limitations:
- Unable to run directly on NAO due to microphone compatibility issues
- Choregraphe simulation software incompatibility
- Workaround demonstrates concept but not fully integrated
| Component | Metric | Performance | Notes |
|---|---|---|---|
| Ball Detection | Accuracy | 95%+ | Controlled lighting conditions |
| Ball Detection | Frame Rate | 15-20 FPS | 320×240 resolution |
| Ball Detection | Detection Range | 0.5 m - 3 m | Based on ball size |
| Path Planning | Success Rate | 92% | 50+ test runs |
| Path Planning | Path Optimality | 12% better than Dijkstra | Length comparison |
| Path Planning | Replanning Time | 15-50 ms | Per obstacle update |
| SLAM | Feature Detection | Up to 3000 ORB features | Per frame |
| SLAM | Keyframe Threshold | 10+ frames | For global optimization |
| SLAM | Map Density | Sparse | Monocular constraints |
| Kick Kinematics | Success in Simulation | 100% | Choregraphe testing |
| Kick Kinematics | Real-world Stability | Unstable | Falls post-kick (needs tuning) |
Strengths:
- Robust ball detection under varying ball positions
- Efficient path planning with obstacle avoidance
- Successful SLAM feature extraction and matching
- Modular, maintainable codebase
- Comprehensive documentation
Limitations:
- Legacy Python 2.7 constraints limit modern libraries
- Kick kinematics require fine-tuning for stability
- SLAM trajectory distortion due to incomplete loop closure
- Speech recognition not fully integrated with NAO
- Limited testing time with physical robot
- NAO V5 Humanoid Robot
- Computer running Ubuntu 14.04 (for ROS Indigo compatibility)
- Minimum 4GB RAM, 20GB storage
- Python 2.7.x (NAO compatibility)
- Python 3.12+ (Speech recognition)
- ROS Indigo
- NAOqi SDK 2.1.4.13
- OpenCV 3.x
- NumPy 1.x
- Gazebo 2.x
- MoveIt
- Choregraphe 2.1.4

- Clone Repository
```bash
git clone https://github.com/Degas01/nao_robot.git
cd nao_robot
```

- Set Up Python 2.7 Environment (NAO)
```bash
virtualenv -p python2.7 venv_nao
source venv_nao/bin/activate
pip install -r requirements.txt
```

- Set Up Python 3.12 Environment (Speech)
```bash
python3.12 -m venv venv_speech
source venv_speech/bin/activate
pip install -r requirements_py312.txt
```

- Install ROS Indigo & Dependencies
```bash
sudo sh -c 'echo "deb http://packages.ros.org/ros/ubuntu trusty main" > /etc/apt/sources.list.d/ros-latest.list'
sudo apt-get update
sudo apt-get install ros-indigo-desktop-full
sudo apt-get install ros-indigo-naoqi-driver
sudo apt-get install ros-indigo-moveit
```

- Build ROS Workspace
```bash
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src
catkin_init_workspace
ln -s /path/to/nao-autonomous-ball-retrieval .
cd ~/catkin_ws
catkin_make
source devel/setup.bash
```

- Install Gazebo & NAO Models
```bash
sudo apt-get install gazebo2
cd ~/catkin_ws/src
git clone https://github.com/ros-naoqi/nao_meshes.git
git clone https://github.com/ros-naoqi/nao_robot.git
cd ~/catkin_ws
catkin_make
```

Runtime sequence:

- Initialize NAO robot connection (see the connection sketch after this list)
- Start ball detection module
- Wait for voice command "go get the ball"
- Begin visual tracking and SLAM
- Compute path using A*
- Navigate to ball location
- Execute kick when in range
- Return to start position
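Step 1 of this sequence typically looks like the following; the IP address is a placeholder for your robot's, and the posture call mirrors the kick script above.

```python
from naoqi import ALProxy

NAO_IP = "192.168.1.10"  # hypothetical; replace with your robot's address
PORT = 9559

motion = ALProxy("ALMotion", NAO_IP, PORT)
posture = ALProxy("ALRobotPosture", NAO_IP, PORT)
motion.wakeUp()                        # stiffen joints
posture.goToPosture("StandInit", 0.5)  # stand ready for the pipeline
```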
VID-20250329-WA0012.mp4
VID-20250329-WA0014.mp4
Nao_Astar.mp4
kick_sim.mp4
Speech.Recognition.Showcase.mp4
Challenge 1: Legacy Software Constraints

Problem:
- NAO requires Python 2.7 and the NAOqi SDK, incompatible with modern libraries (YOLO, TensorFlow)
- The pip package ecosystem is deprecated for Python 2.7

Solution:
- Use OpenCV 3.x (the last version supporting Python 2.7) for ball detection
- Implement the ORB-SLAM2 pipeline from scratch using available libraries
- Create a dual-script architecture for speech recognition (Python 3.12 ↔ Python 2.7)

Outcome: Increased development complexity, but ensured NAO compatibility
Challenge 2: Kick Stability

Problem:
- NAO falls over after executing the kick motion in the real world
- The center of gravity shifts excessively during single-leg balance

Solution:
- Implemented weight shift to the supporting leg using wbFootState
- Added balance constraints with wbEnableBalanceConstraint
- Manual joint fine-tuning (ongoing)
- Future: predictive balance model with IMU integration

Outcome: Works in simulation; requires further real-robot tuning
Challenge 3: SLAM Trajectory Drift

Problem:
- Camera trajectory shows significant drift over time
- The loop closure mechanism is incomplete, causing accumulated error

Solution:
- Implement a bag-of-words (BoW) approach for better loop detection
- Integrate IMU data for motion prediction (ORB-SLAM3 approach)
- Add bundle adjustment optimization after loop closure

Outcome: The sparse map is still useful for local navigation (0-5 m range)
Challenge 4: MoveIt/Gazebo Integration

Problem:
- MoveIt is unable to update NAO joint poses dynamically in Gazebo
- Planned trajectories execute in RViz but not on the simulated robot

Root cause: ROS Indigo + Gazebo 2.x compatibility issues with the NAO controller

Solution:
- Test kick planning separately in RViz (visual validation)
- Execute pre-computed trajectories via Python scripts
- Use Choregraphe for kinematic validation

Long-term fix: Upgrade to ROS Noetic + Gazebo 11 (requires a NAO SDK update)
Challenge 5: Speech Recognition on NAO

Problem:
- NAO's onboard microphone is undetectable by speech recognition libraries
- Choregraphe audio modules are incompatible with external Python scripts

Solution:
- Use the laptop microphone for speech capture (Python 3.12)
- File-based communication between the Python 3.12 and Python 2.7 scripts
- NAO executes commands from the parsed text file

Outcome: Not fully autonomous (requires an external laptop)
- Kick Stability Enhancement
  - Integrate a Kalman filter for balance prediction
  - Add ZMP (Zero Moment Point) calculation for dynamic stability
  - Implement adaptive kick force based on ball distance
  - Test with various ball positions and weights
- SLAM Optimization
  - Implement bag-of-words for robust loop closure
  - Add bundle adjustment after every N keyframes
  - Integrate IMU data for a motion prior (ORB-SLAM3 style)
  - Dense reconstruction using patch-based stereo
- Path Planning Enhancements
  - Add dynamic replanning for moving obstacles
  - Implement RRT* for complex environments
  - Integrate the SLAM map directly into the A* cost function
  - Test in an outdoor tennis court environment
- Multi-Ball Tracking
  - Extend detection to handle multiple balls simultaneously
  - Prioritize the closest ball using depth estimation
  - Implement a ball sorting strategy (e.g., nearest-first)
- Human Interaction
  - Gesture recognition for commands (waving, pointing)
  - Ball handoff detection using pressure sensors
  - Natural language dialogue system
- Energy Efficiency
  - Optimize gait for battery conservation
  - Sleep mode when idle
  - Periodic recharging behavior
- RoboCup Soccer Integration
  - Multi-agent coordination with other NAO robots
  - Opponent detection and avoidance
  - Goal recognition and scoring strategy
- Deep Learning Integration
  - Replace OpenCV detection with YOLOv8 (requires Python 3.x migration)
  - Deep reinforcement learning for kick optimization
  - Neural SLAM (e.g., NeuralRecon)
- Full Autonomy
  - Eliminate the external laptop dependency for speech
  - Onboard edge computing module (e.g., Jetson Nano)
  - 5G connectivity for cloud offloading
- Multi-modal Fusion: Combine vision, IMU, and pressure sensors for robust state estimation
- Sim-to-Real Transfer: Train policies in simulation, deploy on real robot
- Explainable AI: Visualize decision-making process for debugging and trust
- King's College London: provided the NAO robot and lab facilities (Quad Lab)
- SoftBank Robotics: NAO robot platform and NAOqi SDK
- ROS community: ROS Indigo, Gazebo, and MoveIt packages
- Team members: Harry Braganza, Hitesh Anavai, Mohammad Islam and Kriti Chauhan
- OpenCV: computer vision library
- Raúl Mur-Artal and Juan D. Tardós: SLAM architecture inspiration
- Dr. Oya Celiktutan and the teaching assistants: guidance
- RoboCup community: resources and documentation

References:

- RoboCup Standard Platform League. https://spl.robocup.org/
- Li, Q. & Zhao, Y. (2024). "Tennis Ball Recognition in Complex Scenes Based on Improved YOLOv5." ICAACE. DOI: 10.1109/icaace61206.2024.10548503
- Leiva, L.A. et al. (2018). "Playing soccer without colors in the SPL: A convolutional neural network approach." arXiv:1811.12493.
- Bradski, G. (2008). Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly.
- Baevski, A. et al. (2020). "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations." arXiv:2006.11477.
- Hart, P., Nilsson, N., & Raphael, B. (1968). "A Formal Basis for the Heuristic Determination of Minimum Cost Paths." IEEE Transactions on Systems Science and Cybernetics, 4(2), 100-107.
- Kalman, R.E. (1960). "A New Approach to Linear Filtering and Prediction Problems." Journal of Basic Engineering, 82(1), 35-45.

