Machine Learning Recycling Project — Part 2 — The Development Starts!

Nathan Bailey
9 min read · Nov 2, 2023


In my previous blog, I detailed the proposal for an upcoming machine-learning project to sort waste into recyclables and non-recyclables automatically. In this blog, I detail the current development completed and what the next stages will be.

Scraping the council data

The first stage was determining which council the user resides in, given their postcode. There are several ways to do this. The simplest would be to use a browser automation library such as Selenium from Python to open the relevant UK government website, input the postcode and scrape the results. However, this is not ideal: it requires a browser window to be opened rather than running in the background, which creates a clunky system from the user’s point of view.

I settled on creating a script that makes a POST request to the government’s website and processes the response. The details of the POST request were determined using Postman, and the request itself was made with the requests Python module.
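A minimal sketch of what such a script could look like is shown here. The endpoint URL, the form field name and the HTML markup being parsed are all placeholders chosen for illustration, not the real values used by the government website, and the function names are hypothetical. The `post` argument exists only so the sketch can be exercised without network access.

```python
import re

# Placeholder endpoint and form field: assumptions, not the real site details.
LOOKUP_URL = "https://example.invalid/find-local-council"
FORM_FIELD = "postcode"


def normalise_postcode(postcode):
    """Upper-case the postcode and put one space before the last 3 characters."""
    compact = re.sub(r"\s+", "", postcode).upper()
    return compact[:-3] + " " + compact[-3:]


def parse_council(html):
    """Extract the council name from the response body.

    Assumes (hypothetically) that the name sits in an element with the
    class "council-name"; the real page markup may differ.
    """
    match = re.search(r'class="council-name"[^>]*>([^<]+)<', html)
    return match.group(1).strip() if match else None


def lookup_council(postcode, post=None):
    """POST the postcode to the lookup page and return the council name."""
    if post is None:
        import requests  # imported lazily so the sketch can be tested offline
        post = requests.post
    response = post(LOOKUP_URL,
                    data={FORM_FIELD: normalise_postcode(postcode)},
                    timeout=10)
    response.raise_for_status()
    return parse_council(response.text)
```

Because no browser is involved, a call such as `lookup_council("cv4 7al")` runs entirely in the background, which is the main advantage over the Selenium approach.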

Next, a sample council database was collected. To start, I collected the recycling requirements from 3 councils and formed a JSON database, which can be seen below. This will be extended in the future to cover all councils. For now, it is sufficient for the proof-of-concept design that we require. The JSON format means that the database can easily be parsed in a Python script once the council has been retrieved from the government website.

{
    "entries": [
        {
            "council": "Cambridge City Council",
            "Recyclable": ["paper", "cardboard", "glass", "plastic", "metal"],
            "Non-Recyclable": [""],
            "Organics": true
        },
        {
            "council": "Coventry City Council",
            "Recyclable": ["paper", "cardboard", "glass", "plastic", "metal"],
            "Non-Recyclable": [""],
            "Organics": true
        },
        {
            "council": "Wokingham Borough Council",
            "Recyclable": ["paper", "cardboard", "plastic", "metal"],
            "Non-Recyclable": ["glass"],
            "Organics": true
        }
    ]
}
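As a rough illustration of how this database might be queried, the sketch below loads an abridged copy of the entries above and maps a predicted material to a bin. The function names, the bin labels and the "organic" class name are assumptions made for the example, not part of the project code.

```python
import json

# Abridged copy of the sample database above (two of the three councils).
COUNCIL_DB = json.loads("""
{"entries": [
  {"council": "Cambridge City Council",
   "Recyclable": ["paper", "cardboard", "glass", "plastic", "metal"],
   "Non-Recyclable": [""],
   "Organics": true},
  {"council": "Wokingham Borough Council",
   "Recyclable": ["paper", "cardboard", "plastic", "metal"],
   "Non-Recyclable": ["glass"],
   "Organics": true}
]}
""")


def find_council(db, council_name):
    """Return the entry whose "council" field matches, or None."""
    wanted = council_name.strip().lower()
    for entry in db["entries"]:
        if entry["council"].lower() == wanted:
            return entry
    return None


def bin_for(entry, material):
    """Map a predicted class to a bin using one council's rules.

    The returned bin names are hypothetical labels for illustration.
    """
    material = material.lower()
    if material == "organic":
        return "organics" if entry["Organics"] else "general waste"
    if material in entry["Recyclable"]:
        return "recycling"
    return "general waste"
```

With these entries, glass goes to recycling under Cambridge City Council's rules but to general waste under Wokingham Borough Council's, which is exactly the kind of council-specific difference the database is meant to capture.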

Collecting an initial dataset

The end goal will be to collect a dataset specifically for this project, with the images captured when objects are placed in a bin. This will ensure that when the model is deployed, the frames passed to the model closely match the data it was trained on. However, for initial development, we build a dataset from images that are already available.

There already exist multiple ‘waste datasets’ and these have been collected on GitHub. The most appropriate dataset found was collected in a paper [1] which aimed to classify objects into 1 of 4 recyclable categories using CNNs. The dataset was formed from multiple existing datasets and labelled by hand. This was a good starting point for the project and allowed development to occur quickly without a dataset needing to be manually collected or labelled.

This dataset was then expanded upon to include organic (food waste) objects using 2 additional datasets:

Sample Images From Dataset

As an aside, the paper [1] that the dataset came from gives us a good starting point for this project as a whole. The authors train multiple CNNs to classify objects into recyclable and non-recyclable classes and then deploy the solution in a practical application. It therefore gives us a benchmark to compare our solution against and helps us identify improvements that could be made.

The authors of this paper focus on large networks, which for our purpose are not appropriate. We want our networks to be small, lightweight and fast, such that they could be used in a real-world application. The authors do deploy the network in a real-world application but do not consider the speed at which the inference results are produced. It is also useful to note that they only take 1 frame per second from the camera to perform image recognition, making the application slow.

They also fail to properly utilise the power of the networks. The training graphs in the paper show that these networks suffer greatly from overfitting: the validation (or testing) metrics are much lower than the training metrics. This is an area that could be improved upon in our project.

Determining and training the initial model

For the classification network, we require a small model that can achieve high accuracy (ideally 90%+). Initially, the edge device to be used was a Google Coral Development Board/USB accelerator, so we added a requirement that the network must be convertible from PyTorch to TensorFlow Lite (TF-Lite), so that it could be deployed on the board. However, for the reasons explained below, this device was not used in the final solution.

Initially, the network chosen for the classification task was a MobileNet-V3, pre-trained on ImageNet. This was modified so that the output layer produced 5 outputs, one for each class, and then trained on the collected dataset.

As seen from the table below, the validation accuracy of this model was lower than we would like it to be. Additionally, due to the constructs used in this network, the PyTorch model could not be converted to a TF-Lite format so it could not be used in our solution.

A further 2 networks were trialled. MobileNet-V2 achieved 90% accuracy, but EfficientNet-Lite-v4 produced the best accuracy and was fast enough in our edge-device trials to be deployed. Both networks could also be converted to TF-Lite. We also bear in mind that a network can later be optimised and pruned to achieve faster speeds with only small compromises in accuracy. For these reasons, we chose EfficientNet-Lite-v4 as the classification network.

Initial Results of Proposed Networks

For the training process, we used the CrossEntropyLoss function, which is typically used in classification tasks, and optimised with SGD using momentum and weight decay. We also employed a learning rate scheduler to decrease the learning rate if validation accuracy did not improve for a given number of epochs. We started with a base learning rate of 0.001 and decayed this down to 0.0000000000001. Lastly, we applied early stopping to end the training process early if validation accuracy did not improve.
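The scheduling and early-stopping logic described above can be sketched in plain Python as follows. This is a simplified illustration rather than the training code used in the project: the class names, the decay factor and the patience values are assumptions, and in practice a framework's built-in scheduler would normally be used instead.

```python
class PlateauScheduler:
    """Cut the learning rate when validation accuracy stops improving."""

    def __init__(self, lr=0.001, factor=0.1, patience=3,
                 min_lr=0.0000000000001):
        self.lr = lr              # base learning rate quoted above
        self.factor = factor      # assumed multiplicative decay
        self.patience = patience  # assumed number of tolerated bad epochs
        self.min_lr = min_lr      # floor quoted above
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_accuracy):
        """Record one epoch's validation accuracy and return the new rate."""
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


class EarlyStopping:
    """Signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def should_stop(self, val_accuracy):
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def simulate(accuracies, scheduler, stopper):
    """Replay per-epoch validation accuracies; return (epochs_run, final_lr)."""
    epochs_run = 0
    for acc in accuracies:
        epochs_run += 1
        scheduler.step(acc)
        if stopper.should_stop(acc):
            break
    return epochs_run, scheduler.lr
```

Setting the early-stopping patience higher than the scheduler's patience gives the reduced learning rate a chance to help before training is abandoned.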

From the training graphs below, we can see that unlike the paper mentioned above, we have little to no overfitting, improving on their work.

Training Graphs for EfficientNet-Lite-v4

Increasing the accuracy

To increase the accuracy of the trained network, we explored the use of data augmentation to increase the size of the dataset without having to collect and label more data. The idea behind this is to use transforms such as random rotation, flips, colour jitter and Gaussian blurs to create “new” images that then can be used in the training dataset. This increases the number of samples for each class which in turn can increase the performance of the model. Additionally, adding these augmentations to the images allows the network to become more robust to changes and more generalizable [4].

We apply 5 augmentations to our training images, increasing the dataset by 5x. Unfortunately, this did not increase our training accuracy. It was still worth investigating, however, as the augmentations may turn out to have made the model more robust later in the development process.

Example Augmented Images
Training Graphs for EfficientNet-Lite-v4 with augmented images
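To make the idea concrete, here is a small pure-Python sketch that produces 5 augmented copies of each image. It is only an illustration under simplifying assumptions: images are nested lists of greyscale pixel values, a fixed 90-degree turn stands in for random rotation, a brightness shift stands in for colour jitter, and a 3x3 mean filter stands in for a Gaussian blur. The function names are hypothetical, and a real pipeline would typically use an image library's transforms.

```python
import random


def hflip(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Reverse the order of the rows."""
    return [row[:] for row in img[::-1]]


def rotate90(img):
    """Rotate 90 degrees clockwise (stand-in for random rotation)."""
    return [list(col) for col in zip(*img[::-1])]


def brightness_jitter(img, rng, max_delta=30):
    """Shift every pixel by one random offset, clamped to 0..255."""
    delta = rng.randint(-max_delta, max_delta)
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


def box_blur(img):
    """3x3 mean filter (a simple stand-in for a Gaussian blur)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out


def augment_dataset(samples, seed=0):
    """Return 5 augmented (image, label) pairs per input sample."""
    rng = random.Random(seed)
    augmented = []
    for img, label in samples:
        augmented.append((hflip(img), label))
        augmented.append((vflip(img), label))
        augmented.append((rotate90(img), label))
        augmented.append((brightness_jitter(img, rng), label))
        augmented.append((box_blur(img), label))
    return augmented
```

The labels are carried over unchanged, which is what lets augmentation enlarge the dataset without any extra labelling effort.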

In addition to the data augmentation above, I also looked into 2 further augmentation techniques: CutMix and MixUp. These are interesting concepts that combine 2 images of different classes. The first, CutMix, pastes a patch of one image onto another so that both images share the same space and are still visible [2]. MixUp [3], on the other hand, makes one image transparent and places it on top of the other, blending the two. In both cases the labels are then mixed to include the classes of the two images. CutMix aims to increase performance and robustness, while MixUp aims to improve the network’s generalisation.

CutMix
MixUp

These were not used in the development of the network but were interesting to look into.
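For readers who want to see the mechanics, the two techniques can be sketched in a few lines of plain Python. This is an illustrative simplification: images are nested lists of pixel values, labels are one-hot lists, and the mixing weight and patch position are passed in explicitly, whereas the original papers sample them randomly. The function names are hypothetical.

```python
def mix_labels(label_a, label_b, lam):
    """Weighted average of two one-hot labels; lam is the weight of A."""
    return [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]


def mixup(img_a, label_a, img_b, label_b, lam):
    """Blend two same-sized images pixel by pixel, and their labels too."""
    mixed = [[lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(img_a, img_b)]
    return mixed, mix_labels(label_a, label_b, lam)


def cutmix(img_a, label_a, img_b, label_b, top, left, height, width):
    """Paste a rectangular patch of image B onto a copy of image A.

    The label weight of A is the fraction of its pixels that survive.
    """
    mixed = [row[:] for row in img_a]
    for y in range(top, top + height):
        for x in range(left, left + width):
            mixed[y][x] = img_b[y][x]
    total_pixels = len(img_a) * len(img_a[0])
    lam = 1 - (height * width) / total_pixels
    return mixed, mix_labels(label_a, label_b, lam)
```

The key difference is visible in the code: MixUp changes every pixel a little, while CutMix leaves most pixels untouched and replaces one region entirely.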

Creating the full system

Once the classification network had been created, this was integrated into a complete system that could take a video as input.

First, we had to create a simple detection network to work out whether the current frame captured from the camera contained an object. To do this, we constructed a simple convolutional neural network with four convolutional layers and two fully connected layers, visualized in the figure below. Each convolutional layer was followed by batch normalization and a max pooling layer to downsample the spatial dimensions of the feature maps. The fully connected layers also used dropout.

Detection Network

We then wrote a simple program that takes a video input and samples frames from it. Each sampled frame is cropped and passed to the detection network. If an object is detected, the cropped image is then passed to the classification network to determine the class of the object.

Full System Visualized
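The control flow of this system can be sketched as below. The two networks are represented by plain callables so that the example stays self-contained; the sampling interval, the centre crop and all function names are assumptions for illustration rather than the project's actual settings.

```python
def sample_frames(frames, every_n):
    """Yield every n-th frame from an iterable of frames."""
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            yield frame


def center_crop(frame, size):
    """Cut a size x size square from the middle of a frame (nested lists)."""
    height, width = len(frame), len(frame[0])
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in frame[top:top + size]]


def run_pipeline(frames, detect, classify, every_n=5, crop_size=2):
    """Sample, crop, detect, then classify only when an object is present.

    `detect` returns True/False for a crop and `classify` returns a class
    name; both stand in for the trained networks.
    """
    results = []
    for frame in sample_frames(frames, every_n):
        crop = center_crop(frame, crop_size)
        if detect(crop):
            results.append(classify(crop))
    return results
```

Running the lightweight detection step first means the larger classification network is only invoked on frames that actually contain an object.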

Determining the edge device to use

Initially, the plan was to use edge devices that I already owned, which in my case was the Google Coral development board. However, I had access to a Jetson Nano as well so I evaluated both using the EfficientNet-Lite model in the full system created above.

Jetson Nano and Google Coral

The Jetson Nano has the advantage in development as it can run PyTorch models directly. The Google Coral, on the other hand, only supports TensorFlow Lite (TF-Lite) models running in INT-8 precision. This meant that all PyTorch models had to be converted to TF-Lite. This is not a trivial process, but it was made easier by using the PyTorch to Keras Python package. The resulting Keras model could then be converted to TF-Lite using built-in functions in TensorFlow.

Before the model was used on the Coral, it was converted to INT-8 precision so that it could fully exploit the Edge TPU on the device.
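To give a feel for what INT-8 precision means, the sketch below shows affine quantisation of a list of floating-point values, where a real value is approximated as scale * (q - zero_point) with q an 8-bit integer. This is a generic illustration of the idea, not the converter's actual implementation, and the function names are hypothetical.

```python
def quant_params(values, qmin=-128, qmax=127):
    """Pick a scale and zero point mapping the value range onto int8.

    The range is widened to include 0.0 so that zero is exactly
    representable.
    """
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers in [qmin, qmax]."""
    return [min(qmax, max(qmin, round(v / scale) + zero_point))
            for v in values]


def dequantize(qvalues, scale, zero_point):
    """Recover approximate floats from the quantised integers."""
    return [scale * (q - zero_point) for q in qvalues]
```

Each value can only be recovered to within roughly one quantisation step, which is the precision being traded away in exchange for the speed of integer arithmetic.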

TF-Lite does not support every operation, and this was highlighted when converting our models. For example, the MobileNet-V3 could not be converted, which was the reason for choosing an alternative network, as seen above.

Interestingly, converting the EfficientNet-Lite significantly reduced the accuracy of the network, to the point where it could no longer be used reliably. Unsurprisingly, due to the lower precision and the fact that the Edge TPU on the device is optimised for TF-Lite, the Google Coral achieved higher speeds. However, upon reviewing the software running on the Jetson Nano, it was clear that the program could process frames fast enough to be used in our system even without further model optimisations. Therefore, the Jetson Nano was chosen as the device to use.

Edge Device Results

A future task would be to determine exactly where in the conversion process the model loses its accuracy. This has not been fully explored yet, but removing the conversion to INT-8 precision did not fix the problem, so quantisation is not the root cause. The issue instead appears to stem from the conversion from PyTorch to TF-Lite.

Nathan Bailey

MSc AI and ML Student @ ICL. Ex ML Engineer @ Arm, Ex FPGA Engineer @ Arm + Intel, University of Warwick CSE Graduate, Climber. https://www.nathanbaileyw.com