The SeparAItor is a proof of concept of an autonomous recycling robot that uses deep learning to enhance human capabilities in a task that is both crucially important and severely neglected (around 9% of plastic waste is recycled, while 79% goes to landfills or into the environment).
Since recycling is mainly a sorting exercise, and one of deep learning's main applications is image classification, this was a very natural fit. Keras (in TensorFlow 2) is used for the machine learning part, mainly due to its simplicity, along with some OpenCV for image preprocessing.

Base
The base consists of a simple pan and tilt system with a box on top. It communicates with the computer over Bluetooth and is powered by an on-board battery, so the actuator part is completely independent. A Lazy Susan turntable bearing and a 608zz axial bearing provide mechanical stability to the system.
To simplify the electronics, two standard servos control the base. Since a servo only sweeps about 180 degrees, a 2:1 ratio between the pan servo's gear and the gear in the base allows the box to rotate a full 360 degrees.
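In practice the 2:1 gearing just means the servo is commanded half of the desired base angle. A minimal sketch of that mapping, using a hypothetical helper that is not part of the original code:

```python
# Sketch of the pan mapping, assuming a 2:1 gear ratio between the pan
# servo and the base: the servo only travels 0-180 degrees, so the gearing
# doubles that range into a full 0-360 degree base rotation.
def base_to_servo_angle(base_angle):
    """Convert a desired base angle (0-360) into a servo command (0-180)."""
    if not 0 <= base_angle <= 360:
        raise ValueError("base angle must be within 0-360 degrees")
    return base_angle / 2.0
```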
Finally, two super-sticky, anti-slip silicone gel pads, like the ones used to hold phones on car dashboards, are used both to hold the cardboard box on top and to keep the base firmly in place, stopping it from rotating and sliding around.
With the base assembled and the overhead camera set up, dataset acquisition was the next step. While I initially downloaded images from the internet to train on, these were either renders or professional-looking pictures designed to sell the product, most often just showing the side with the logo. Since I needed it to work with real-life objects, I thought the best way was to take pictures under real conditions.
Using the "take_picture.py" script I was able to save pictures by pressing a key. These images were first resized to the adequate dimensions and then had their background removed, to keep only the relevant parts of the image for training.
Finally, since adding an object not only changes that part, but also creates shadows and can tilt the base, only the largest single blob was deemed part of the object.
Since this method of adding objects repeatedly at different angles was very time-consuming I only took 300 images for each one of the six classes, which makes for a very small dataset. To compensate for this I used the pretrained VGG19 network as the convolutional part of my network, as well as using K-fold cross-validation to get a better overview of the training process.
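The transfer-learning setup can be sketched as follows. The six classes match the post, but the 224x224 input size, the dense-head dimensions, and the optimizer are illustrative assumptions rather than the project's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

def build_model(num_classes=6, weights="imagenet"):
    # Reuse VGG19's convolutional base, pretrained on ImageNet, and freeze
    # it so the small dataset only trains the new classification head.
    conv_base = VGG19(weights=weights, include_top=False,
                      input_shape=(224, 224, 3))
    conv_base.trainable = False
    model = models.Sequential([
        conv_base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```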
Finally, since K-fold validation involves creating and loading multiple instances of the model into the GPU, it would run out of memory pretty fast if run sequentially, so each fold was run in a different process to free the VRAM between iterations.
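The one-process-per-fold trick can be sketched with the standard library's multiprocessing module. `train_fold` here is a hypothetical stand-in for the real build/train/evaluate code; the point is only that TensorFlow's VRAM is released when each child process exits:

```python
import multiprocessing as mp

def train_fold(fold_index, queue):
    # In the real script, TensorFlow would be imported and the model built
    # and trained here, inside the child, so all VRAM it allocates is
    # returned to the system when this process terminates.
    accuracy = 0.0  # placeholder for this fold's validation accuracy
    queue.put((fold_index, accuracy))

def run_kfold(num_folds=5):
    results = []
    for fold in range(num_folds):
        queue = mp.Queue()
        p = mp.Process(target=train_fold, args=(fold, queue))
        p.start()
        results.append(queue.get())  # blocks until the fold reports back
        p.join()                     # VRAM is freed when the child exits
    return results
```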
Once the neural network performed reasonably well it was just a matter of writing some code to finish the system up. After connecting to the serial port over Bluetooth, the script loads the trained model and initializes the camera.
A MOG2 background subtractor is used here as well to detect motion in the image, which indicates that a new object has been added. Once there is a falling edge (no movement detected after a previous detection, meaning the object has come to rest), the image is processed and both the class and the corresponding bin are identified.
This information is then sent to the base, which will drop the item in the correct bin and send a message once it's done. The background image of the empty tray is then updated and the system is ready for a new detection.
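The round-trip with the base can be sketched as below. The newline-terminated bin number and the "done" reply are a hypothetical protocol for illustration; `port` would be a pyserial `serial.Serial` opened on the Bluetooth serial port, but any object with `write` and `readline` works:

```python
def sort_into_bin(port, bin_number):
    """Send a bin number to the base and wait for its completion message.

    `port` is any file-like serial object, e.g. serial.Serial from pyserial
    opened on the base's Bluetooth COM port.
    """
    port.write(f"{bin_number}\n".encode())    # tell the base which bin to use
    reply = port.readline().decode().strip()  # block until the base answers
    if reply != "done":
        raise RuntimeError(f"unexpected reply from base: {reply!r}")
```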
Post imported from http://www.alvaroferran.com/projects/separaitor