SortBot

Project overview.


An intelligent robotic sorting system built in PyBullet that combines deep reinforcement learning (PPO) with object detection (YOLOv8). The uArm robot detects, picks up, and sorts coloured boxes into their designated zones autonomously.

This project was developed as part of the AI in Robotics (41118) course at the University of Technology Sydney, Australia.

Animated GIF of UArm successfully sorting boxes
Animated GIF of UArm successfully sorting boxes in a PyBullet window

How it works.


Our project is an automated sorting system that uses a UArm Metal robot arm to tackle everyday sorting challenges. The arm is equipped with a suction-cup end effector that gently picks up each box and places it in the designated zone matching its colour. A camera paired with a You Only Look Once (YOLO) algorithm detects and distinguishes the boxes by colour, while a reinforcement learning policy trained with Proximal Policy Optimisation (PPO) guides the arm's movements with adaptive precision.
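As a rough illustration of that flow, the sketch below wires detection results into a pick-and-place loop. The `arm` interface, helper names, and zone coordinates are placeholders for illustration, not the project's actual API.

```python
# Hedged sketch of the top-level sorting loop; the `arm` object and the
# zone coordinates are illustrative placeholders, not the real interface.
SORT_ZONES = {"red": (0.25, -0.15), "green": (0.25, 0.00), "blue": (0.25, 0.15)}

def sort_detected_boxes(detections, arm, hover_z=0.05):
    """detections: list of (colour, (x, y)) pairs in the robot's world frame."""
    for colour, (x, y) in detections:
        arm.move_to(x, y, hover_z)             # position the suction cup over the box
        arm.suction_on()                       # grip the box
        goal_x, goal_y = SORT_ZONES[colour]
        arm.move_to(goal_x, goal_y, hover_z)   # carry it to the matching zone
        arm.suction_off()                      # release it in the zone
```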

Key components:

For visual perception, we employ Ultralytics YOLOv8, a state-of-the-art object detection model, to localise and classify boxes from a top-down camera feed. YOLOv8 detects each object’s class (small, medium, large) and predicts its bounding box coordinates. These detections are then translated into real-world positions, allowing the robot to autonomously identify where each object is and decide where to place it based on its class. YOLOv8 offers fast, accurate, and real-time inference, making it ideal for the robotic sorting task.

YOLO bounding boxes, with classes representing the colour they will sort into.
Training process for the YOLO algorithm, where different views of the objects are shown and labelled.
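To make the detection-to-position step concrete, here is a minimal sketch assuming a trained YOLOv8 model and a fixed top-down camera over a known workspace. The weights file name, image resolution, and workspace bounds are illustrative assumptions rather than the project's actual values.

```python
from ultralytics import YOLO

IMG_W, IMG_H = 640, 640        # top-down camera resolution (assumed)
X_MIN, X_MAX = 0.10, 0.40      # workspace extent along robot x, in metres (assumed)
Y_MIN, Y_MAX = -0.20, 0.20     # workspace extent along robot y, in metres (assumed)

model = YOLO("sortbot_yolo.pt")  # hypothetical trained weights file

def pixel_to_world(u, v):
    """Map a pixel coordinate (u, v) to an (x, y) position on the table plane."""
    x = X_MIN + (v / IMG_H) * (X_MAX - X_MIN)
    y = Y_MIN + (u / IMG_W) * (Y_MAX - Y_MIN)
    return x, y

def detect_boxes(image):
    """Return a list of (class_name, (x, y)) pairs for every detected box."""
    result = model(image)[0]                  # single-image inference
    detections = []
    for box in result.boxes:
        u, v, _, _ = box.xywh[0].tolist()     # bounding-box centre in pixels
        class_name = result.names[int(box.cls[0])]
        detections.append((class_name, pixel_to_world(u, v)))
    return detections
```

The (class, position) pairs returned here are what a sorting loop like the one above would consume.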

This project uses Proximal Policy Optimization (PPO), a reinforcement learning algorithm, from Stable Baselines3 to train a robotic arm to navigate precisely to target positions. PPO balances exploration and exploitation by updating the policy in a controlled manner, making it well-suited for continuous control tasks. In our case, PPO learns joint movements that align the robot’s end-effector with object and goal locations. The training process optimises for efficiency and accuracy in movement, enabling the robot to generalise across varied object positions without explicit hard-coding.

Snapshot of PPO training process, featuring the reward system.
Simulated UArm which will be controlled to pick up and sort boxes using PPO.
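The snippet below is a simplified sketch of that training setup with Stable Baselines3. The stand-in `ReachEnv` task, reward shaping, and timestep count are assumptions for illustration; the project's real environment is the PyBullet UArm simulation.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class ReachEnv(gym.Env):
    """Toy task: a 2D end-effector must move onto a randomly placed target."""
    def __init__(self):
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Box(-0.05, 0.05, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.ee = self.np_random.uniform(-0.5, 0.5, size=2).astype(np.float32)
        self.target = self.np_random.uniform(-0.5, 0.5, size=2).astype(np.float32)
        self.steps = 0
        return np.concatenate([self.ee, self.target]), {}

    def step(self, action):
        self.ee = np.clip(self.ee + action, -1.0, 1.0)
        dist = float(np.linalg.norm(self.ee - self.target))
        reward = -dist + (10.0 if dist < 0.02 else 0.0)   # dense penalty + success bonus
        self.steps += 1
        terminated = dist < 0.02
        truncated = self.steps >= 200
        obs = np.concatenate([self.ee, self.target]).astype(np.float32)
        return obs, reward, terminated, truncated, {}

model = PPO("MlpPolicy", ReachEnv(), verbose=1)
model.learn(total_timesteps=100_000)          # train the reaching policy
model.save("ppo_reach_demo")                  # hypothetical output name
```

A dense distance penalty plus a success bonus is one common way to shape a reaching reward; the reward actually used in our training may differ.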

Team members.


The best team?

This project was built through a collaborative effort, with each team member contributing to different aspects, from AI and vision systems to mechanical design and documentation. As a group of students, we explored ideas, planned solutions, and worked together to integrate everything into the final system.

Below is a breakdown of our roles:

Matthew Truong doodle

Matthew Truong LinkedIn

Mechanical design lead

Designed the UArm and its parts for use in simulation.

Michele Liang doodle

Michele Liang LinkedIn

Vision assist

Helped with computer vision and the YOLO algorithm, and created documentation (+ this website!).

Minh Nguyen doodle

Minh Nguyen LinkedIn

AI Lead

Led the creation and training of the PPO and YOLO models, and assisted with mechanical design.

Lauren Seeto doodle

Lauren Seeto LinkedIn

Robot controller

Created the UArm controller and assisted with documentation.

Luis Pratama doodle

Luis Pratama LinkedIn

Vision lead

Created the final video and led the computer vision and YOLO work.

Final video.


Check out the final video below!

Code.


Check out our GitHub repository for the code!

Screenshot of main branch in the GitHub repository
A simulation environment for a robotic arm in the Bullet Physics ExampleBrowser (OpenGL3+), with a terminal on the left logging the arm's movements and goal positions.
Raw input for YOLO, with the coloured boxes as target objects and without bounding boxes.
Image of the UArm Metal in real life.
YOLO predicted values (left) and ground truth (right).
Concept for the reward system of the StackBot, the idea predecessor of the SortBot.
Performance logs from PPO training.
URDF model of the UArm with defined axes at each joint.

Powered by GitHub Pages | Template from w3.css