SortBot

Project overview.


An intelligent robotic sorting system built in PyBullet that combines deep reinforcement learning (PPO) with object detection (YOLOv8). The uArm robot detects, picks up, and sorts coloured boxes into their designated zones autonomously.

This project was developed as part of the AI in Robotics (41118) course at the University of Technology Sydney, Australia.

Animated GIF of UArm successfully sorting boxes
Animated GIF of UArm successfully sorting boxes in a PyBullet window

How it works.


Our project is an automated sorting system that uses a UArm Metal robot arm to tackle everyday sorting challenges. The arm is equipped with a suction-cup end effector that gently picks up each box and places it in the designated zone matching its colour. A camera paired with a You Only Look Once (YOLO) algorithm detects and distinguishes the boxes by colour, while a reinforcement learning policy trained with Proximal Policy Optimisation (PPO) guides the arm's movements with adaptive precision.
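As a rough illustration of that flow, the sketch below wires detection results into a pick-and-place loop. The `arm` interface, helper names, and zone coordinates are placeholders for illustration, not the project's actual API.

```python
# Hedged sketch of the top-level sorting loop; the `arm` object and the
# zone coordinates are illustrative placeholders, not the real interface.
SORT_ZONES = {"red": (0.25, -0.15), "green": (0.25, 0.00), "blue": (0.25, 0.15)}

def sort_detected_boxes(detections, arm, hover_z=0.05):
    """detections: list of (colour, (x, y)) pairs in the robot's world frame."""
    for colour, (x, y) in detections:
        arm.move_to(x, y, hover_z)             # position the suction cup over the box
        arm.suction_on()                       # grip the box
        goal_x, goal_y = SORT_ZONES[colour]
        arm.move_to(goal_x, goal_y, hover_z)   # carry it to the matching zone
        arm.suction_off()                      # release it in the zone
```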

Key components:

For visual perception, we employ Ultralytics YOLOv8, a state-of-the-art object detection model, to localise and classify boxes from a top-down camera feed. YOLOv8 detects each object’s class (small, medium, large) and predicts its bounding box coordinates. These detections are then translated into real-world positions, allowing the robot to autonomously identify where each object is and decide where to place it based on its class. YOLOv8 offers fast, accurate, and real-time inference, making it ideal for the robotic sorting task.

YOLO bounding boxes, with classes representing the colour they will sort into.
Training process for the YOLO algorithm, where different views of the objects are shown and labelled.
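To make the detection-to-position step concrete, here is a minimal sketch assuming a trained YOLOv8 model and a fixed top-down camera over a known workspace. The weights file name, image resolution, and workspace bounds are illustrative assumptions rather than the project's actual values.

```python
from ultralytics import YOLO

IMG_W, IMG_H = 640, 640        # top-down camera resolution (assumed)
X_MIN, X_MAX = 0.10, 0.40      # workspace extent along robot x, in metres (assumed)
Y_MIN, Y_MAX = -0.20, 0.20     # workspace extent along robot y, in metres (assumed)

model = YOLO("sortbot_yolo.pt")  # hypothetical trained weights file

def pixel_to_world(u, v):
    """Map a pixel coordinate (u, v) to an (x, y) position on the table plane."""
    x = X_MIN + (v / IMG_H) * (X_MAX - X_MIN)
    y = Y_MIN + (u / IMG_W) * (Y_MAX - Y_MIN)
    return x, y

def detect_boxes(image):
    """Return a list of (class_name, (x, y)) pairs for every detected box."""
    result = model(image)[0]                  # single-image inference
    detections = []
    for box in result.boxes:
        u, v, _, _ = box.xywh[0].tolist()     # bounding-box centre in pixels
        class_name = result.names[int(box.cls[0])]
        detections.append((class_name, pixel_to_world(u, v)))
    return detections
```

The (class, position) pairs returned here are what a sorting loop like the one above would consume.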

This project uses Proximal Policy Optimization (PPO), a reinforcement learning algorithm, from Stable Baselines3 to train a robotic arm to navigate precisely to target positions. PPO balances exploration and exploitation by updating the policy in a controlled manner, making it well-suited for continuous control tasks. In our case, PPO learns joint movements that align the robot’s end-effector with object and goal locations. The training process optimises for efficiency and accuracy in movement, enabling the robot to generalise across varied object positions without explicit hard-coding.

Snapshot of PPO training process, featuring the reward system.
Simulated UArm which will be controlled to pick up and sort boxes using PPO.
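The snippet below is a simplified sketch of that training setup with Stable Baselines3. The stand-in `ReachEnv` task, reward shaping, and timestep count are assumptions for illustration; the project's real environment is the PyBullet UArm simulation.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class ReachEnv(gym.Env):
    """Toy task: a 2D end-effector must move onto a randomly placed target."""
    def __init__(self):
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Box(-0.05, 0.05, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.ee = self.np_random.uniform(-0.5, 0.5, size=2).astype(np.float32)
        self.target = self.np_random.uniform(-0.5, 0.5, size=2).astype(np.float32)
        self.steps = 0
        return np.concatenate([self.ee, self.target]), {}

    def step(self, action):
        self.ee = np.clip(self.ee + action, -1.0, 1.0)
        dist = float(np.linalg.norm(self.ee - self.target))
        reward = -dist + (10.0 if dist < 0.02 else 0.0)   # dense penalty + success bonus
        self.steps += 1
        terminated = dist < 0.02
        truncated = self.steps >= 200
        obs = np.concatenate([self.ee, self.target]).astype(np.float32)
        return obs, reward, terminated, truncated, {}

model = PPO("MlpPolicy", ReachEnv(), verbose=1)
model.learn(total_timesteps=100_000)          # train the reaching policy
model.save("ppo_reach_demo")                  # hypothetical output name
```

A dense distance penalty plus a success bonus is one common way to shape a reaching reward; the reward actually used in our training may differ.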

Team members.


The best team?

This project was built through a collaborative effort, with each team member contributing to different aspects, from AI and vision systems to mechanical design and documentation. As a group of students, we explored ideas, planned solutions, and worked together to integrate everything into the final system.

Below is a breakdown of our roles:

Matthew Truong doodle

Matthew Truong LinkedIn

Mechanical design lead

Designed the UArm and its parts for use in simulation.

Michele Liang doodle

Michele Liang LinkedIn

Vision assist

Helped with computer vision and the YOLO algorithm, and created documentation (+ this website!).

Minh Nguyen doodle

Minh Nguyen LinkedIn

AI Lead

Led the creation and training of the PPO and YOLO models, and assisted with mechanical design.

Lauren Seeto doodle

Lauren Seeto LinkedIn

Robot controller

Created the UArm controller and assisted with documentation.

Luis Pratama doodle

Luis Pratama LinkedIn

Vision lead

Created the final video and led the computer vision and YOLO work.

Final video.


Check out the final video below!

Code.


Check out our GitHub repository for the code!

Screenshot of main branch in the GitHub repository
A simulation environment for a robotic arm in the Bullet Physics ExampleBrowser (OpenGL3+), with a terminal on the left logging the arm's movements and goal positions.
Raw input for YOLO, with the coloured boxes as target objects and without bounding boxes.
Image of the UArm Metal in real life.
YOLO predicted values (left) and ground truth (right).
Concept for the reward system of the StackBot, the idea predecessor of the SortBot.
Performance logs from PPO training.
URDF model of the UArm with defined axes at each joint.

Powered by GitHub Pages | Template from w3.css