February 15, 2023

AI life sciences use-case of colony forming units counting


Nowadays, it is hard to overestimate the contributions of machine learning (ML) and artificial intelligence (AI) to scientific research. Numerous novel ideas are being actively explored to make ML models more accurate, robust, and usable. Adapting ML methods and techniques to the specifics of individual scientific disciplines takes time and is of interest to many teams of scientists. As researchers, we can leverage the best of AI's innovations to automate tedious and repetitive tasks, work with larger data volumes, and tackle more demanding research activities.

Let's not get it wrong. It is not that ML is a new concept in the world of science. Now it's just easier for us to make full use of it. We owe it to the variety of open-source software libraries, public training datasets, excellent documentation, and, most importantly, the ability to explain the result of the ML model.

As researchers, we are becoming increasingly efficient in using information technology. More and more researchers can write good computer code. Thanks to this, achieving proficiency in computer-aided data analysis techniques is within reach of more research teams. There is no longer a barrier posed by computer code and programming in conducting exciting research.

ML has excellent potential to play a supporting role in daily scientific activities. Even now, you are probably aided by advanced systems whose key feature is learning from data. The task is getting more manageable due to the miniaturization of hardware capable of running ML models. Among the available solutions, small-size devices like microcontrollers are exciting.

A great example is computer vision (CV). Being able to deploy even a complex vision model on a board not larger than a box of matches opens many exciting opportunities. That and the availability of hardware components, support from great DIY communities willing to share their knowledge, and friendly programming ecosystems give the researchers all the tools to create an advanced prototyping playground.

The life sciences are a perfect testing ground for artificial intelligence at the edge. To prove it, we will tackle a widespread problem: counting bacterial colonies after incubation on an agar plate. In this blog post, you will find a description of our simple experiment and some results. We've also included a short discussion section.

So, let’s grab our coffees, Petri dishes, and a microcontroller. It’s definitely a fine day for science!

This blog post was created by awesome siblings: Asst. Prof. Justyna Adamiak, Ph.D., and Maciej Adamiak, Ph.D., supported by their qualified assistants Karolinka and Jagódka.

Calculating bacterial number

Table 1. Solid media types.

If you’ve ever bought a probiotic drink, you’ve probably noticed the term “CFU” on the label. Colony Forming Units (CFU) counting is the widely used gold standard method for determining bacterial cell numbers. A CFU simply refers to an individual colony of bacteria, fungi, or yeast, and counts can be reported per analyzed area or volume, depending on the purpose of your experiment. In our case, we checked the magnitude of contamination of multiple objects used daily. We made a stamp of a chosen object on the surface of solid media by gently pressing and holding it for about 10 seconds. The surface of a plate is like a canvas for counting. After 24 to 36 hours of incubation at 25°C to 30°C (77°F to 86°F), you can see colonies that vary in size, shape, and color or, as we call it, morphology.
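To make the "per analyzed area or volume" reporting concrete, here is a minimal sketch of the usual plate-count arithmetic. The function names, the 90 mm dish diameter, and the example numbers are our illustrative assumptions, not values measured in this experiment.

```python
import math


def cfu_per_ml(colonies: int, dilution_factor: float, plated_volume_ml: float) -> float:
    """Plate-count formula: colonies x dilution factor / plated volume (mL)."""
    if plated_volume_ml <= 0:
        raise ValueError("plated_volume_ml must be positive")
    return colonies * dilution_factor / plated_volume_ml


def cfu_per_cm2(colonies: int, sampled_area_cm2: float) -> float:
    """Surface contamination: colonies divided by the stamped area (cm^2)."""
    if sampled_area_cm2 <= 0:
        raise ValueError("sampled_area_cm2 must be positive")
    return colonies / sampled_area_cm2


def dish_area_cm2(diameter_mm: float = 90.0) -> float:
    """Area of a round dish; 90 mm is an assumed, commonly used diameter."""
    radius_cm = diameter_mm / 20.0  # mm -> cm, then halve the diameter
    return math.pi * radius_cm ** 2


if __name__ == "__main__":
    # Hypothetical example: 42 colonies counted on a plate.
    print(cfu_per_ml(42, dilution_factor=1000, plated_volume_ml=0.1))
    print(cfu_per_cm2(42, dish_area_cm2()))
```

For a stamped object, as in our case, the per-area variant is the relevant one; the per-volume variant applies when a diluted liquid sample is spread on the plate.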

Fig 1. Incubated bacteria colonies; from the left-upper corner: fingerprint, coin, mobile phone, air sample.


We planned a multistage experiment that consisted of the following tasks: 

  1. Create a Do-It-Yourself monitoring platform based on Arduino and Raspberry Pi - a decent portion of hardware wiring and additive manufacturing was involved in this task.
  2. Prepare agar plates and incubate different types of bacteria - we used everyday items to gather bacteria samples. Not for the faint of heart 😅.
  3. Gather and annotate training data - acquiring photos and labeling bacteria clusters went smoothly and was, surprisingly, not a time-consuming task.
  4. Train and test a machine learning object detection model.
  5. Prepare a data analysis layer - it was vital for us to show how object detection results can be utilized in the following stages of a research process.
  6. Perform analysis of bacteria growth based on image time series - it was a time-consuming but satisfying part of our experiment.
  7. Prepare video material presenting our findings.
Fig 2. Plan of our experiment.


Monitoring device

We love the possibilities that 3D printing gives for rapid prototype development. After several iterations, we decided that our monitoring device would be a simple semi-transparent jar. This made storage easy and let us experiment with various diffused light sources to create the best conditions for the camera sensor. Drilling a few holes assured uninterrupted airflow and bacteria growth. 

The purpose of the solid lid was to hide a microcontroller and a camera. We used graphene filament to print the cover to make it more durable. From the available vision modules, Arduino Portenta combined with a LoRa vision shield and good old Raspberry Pi with a camera were the best choices due to their small size and sufficient computational power to support a machine vision model. We utilized two distinct sensors to check which approach would be better in a full-scale experiment.

For more information regarding ML on Arduino Portenta visit this blog post.

Fig. 3a. Monitoring device - Arduino Portenta.
Fig. 3b. Monitoring device - Raspberry Pi.

Data collection and preprocessing

Fig. 4. Data preparation process.

The sample preparation and data collection process was simple. We applied the bacteria in the chosen medium by stamping the object into the agar plate. In some cases, we utilized the object directly. In others, we used a cotton swab to wipe the object and transfer the bacteria to the medium. The Petri dishes were then placed in our monitoring containers. 

Table 2. Dataset description. Ad 1. Taken from my daughter's hands after a day at kindergarten - 100% biohazard 😆

The devices were configured to take a photo every 5 minutes and then send it to the storage. The Arduino jar captured 320px x 320px grayscale images. The RGB images acquired using Raspberry Pi were at 2048px x 2048px resolution.
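A 5-minute interval adds up quickly over an incubation that lasts a day or more. The sketch below only computes the capture schedule and file names; it does not talk to any camera. The function names, the file naming pattern, and the start date are hypothetical and chosen for illustration.

```python
from datetime import datetime, timedelta


def capture_schedule(start: datetime, duration_hours: float, interval_minutes: int = 5):
    """Return the timestamps at which a photo would be taken (both ends included)."""
    step = timedelta(minutes=interval_minutes)
    end = start + timedelta(hours=duration_hours)
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += step
    return times


def frame_name(device: str, timestamp: datetime) -> str:
    """Build a sortable file name such as 'rpi_20230101T000500.png'."""
    return f"{device}_{timestamp:%Y%m%dT%H%M%S}.png"


if __name__ == "__main__":
    start = datetime(2023, 1, 1, 0, 0, 0)  # hypothetical start of incubation
    for hours in (24, 36):
        frames = capture_schedule(start, hours)
        print(hours, "h ->", len(frames), "frames, last:", frame_name("rpi", frames[-1]))
```

Under these assumptions, one sample yields roughly 290 to 430 frames per device, which is worth keeping in mind when planning storage for the high-resolution images.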

Working in a home laboratory enabled us to maintain a constant temperature and humidity. We needed an additional setup to ensure proper light conditions in the containers. Combining the embedded light sources and a shadeless photographic tent did the job. This is something to consider in the next iteration of the container. 

We gradually removed older samples during data acquisition by pouring bleach on them. Between capturing each sample, the container and the laboratory were sterilized using a UV lamp.

After collecting all the samples, the data was preprocessed to fit the needs of the machine learning training phase. First, we cropped the images with a constant-size circle to remove everything besides the working area. Then we converted the RGB images to grayscale. We did not need the color information in this experiment, and dropping it makes the model simpler.
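The two preprocessing steps can be sketched with NumPy alone. This is an illustration under our own assumptions rather than the exact code we ran: the function names are hypothetical, the circle center and radius are placeholders, and the grayscale conversion uses the common ITU-R BT.601 luma weights.

```python
import numpy as np


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 image to H x W uint8 using BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb[..., :3].astype(np.float64) @ weights
    return np.rint(gray).astype(np.uint8)


def crop_circle(image: np.ndarray, center: tuple, radius: int, fill: int = 0) -> np.ndarray:
    """Blank out everything outside the circle, then crop to its bounding square."""
    cy, cx = center
    h, w = image.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    out = image.copy()
    out[~inside] = fill  # works for both grayscale and RGB arrays
    y0, y1 = max(cy - radius, 0), min(cy + radius + 1, h)
    x0, x1 = max(cx - radius, 0), min(cx + radius + 1, w)
    return out[y0:y1, x0:x1]


if __name__ == "__main__":
    # Synthetic stand-in for a 2048 x 2048 RGB photo of a Petri dish.
    photo = np.random.default_rng(0).integers(0, 256, (2048, 2048, 3), dtype=np.uint8)
    dish = crop_circle(to_grayscale(photo), center=(1024, 1024), radius=900)
    print(dish.shape, dish.dtype)
```

Because the camera and the dish do not move between frames, the same center and radius can be reused for every image in a time-lapse.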

In the end, we had two distinct datasets: Raspberry Pi high-resolution and Arduino Portenta low-resolution grayscale time-lapses.

With the help of the annotation software, we also assigned the object markers (points). We started with the simplest case possible. We marked only clusters of bacteria without trying to identify specific types or species. There will be time for this during more advanced research supported by microscopic analysis and a spectral camera.


Training a machine learning model for an embedded device was the project phase where the magic happened. Fortunately, designing and deploying a model with an adequately curated dataset and a dedicated platform is simple. You can use a predefined model provided by various vendors or implement your own. We recommend the latter option only to people with a lot of knowledge of TensorFlow or PyTorch.

We decided to treat finding bacteria colonies as object detection and instance segmentation computer vision tasks.

Fig. 5. Model architecture.

Object detection models can find areas of the image representing specific classes of things. This is great for counting. Of course, bacteria colonies cannot be treated as concrete objects. Rather than things, they are the amorphous stuff that we have to find and name. That’s why we used a point detection model, which utilizes heat maps during detection rather than the well-known bounding boxes. Two approaches were tested: the first on grayscale Arduino Portenta images, and the second on RGB images captured using Raspberry Pi.
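To show how point annotations relate to heat maps, here is a minimal sketch of both directions: rendering points as a training target and decoding a heat map back into points. It is not our model. The Gaussian rendering, the `sigma` value, the threshold, and the local-maximum decoding are common conventions we assume here, and the network that predicts the heat map is left out entirely.

```python
import numpy as np


def points_to_heatmap(points, shape, sigma: float = 2.0) -> np.ndarray:
    """Render (row, col) point annotations as Gaussian bumps with peak value 1."""
    h, w = shape
    yy, xx = np.mgrid[:h, :w]
    heatmap = np.zeros(shape, dtype=np.float64)
    for r, c in points:
        bump = np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, bump)  # overlapping bumps keep the larger value
    return heatmap


def heatmap_to_points(heatmap: np.ndarray, threshold: float = 0.5):
    """Return pixels above the threshold that are strict maxima of their 8 neighbours."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    is_peak = heatmap >= threshold
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_peak &= heatmap > neighbour
    return [(int(r), int(c)) for r, c in zip(*np.nonzero(is_peak))]


if __name__ == "__main__":
    annotated = [(5, 5), (20, 12)]  # hypothetical colony markers
    target = points_to_heatmap(annotated, shape=(32, 32))
    print(len(heatmap_to_points(target)), "colonies found")
```

Counting then reduces to taking the length of the decoded list, which is why point detection fits amorphous colonies better than bounding boxes do.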

By the way, using an ML model is optional in such a simple task. Satisfactory results could be achieved by properly thresholding the image and applying watershed or by simple blob detection.
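For comparison, the classical route can be sketched as thresholding followed by connected-component counting. This is a simplified stand-in for blob detection, written with NumPy and the standard library only. It assumes colonies are brighter than the background, uses hypothetical names and threshold values, and does not attempt watershed, so touching colonies are counted as one.

```python
from collections import deque

import numpy as np


def count_blobs(gray: np.ndarray, threshold: int, min_area: int = 1):
    """Find 4-connected regions brighter than `threshold` with at least `min_area` pixels."""
    mask = gray > threshold
    visited = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    blobs = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or visited[r, c]:
                continue
            # Breadth-first flood fill collects one connected region.
            queue = deque([(r, c)])
            visited[r, c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:  # small specks (e.g. dust) are ignored
                ys, xs = zip(*pixels)
                centroid = (sum(ys) / len(ys), sum(xs) / len(xs))
                blobs.append({"area": len(pixels), "centroid": centroid})
    return blobs


if __name__ == "__main__":
    plate = np.zeros((20, 20), dtype=np.uint8)
    plate[2:5, 2:5] = 200      # synthetic colony
    plate[10:14, 10:14] = 220  # synthetic colony
    plate[18, 18] = 255        # single-pixel speck
    print(len(count_blobs(plate, threshold=128, min_area=4)), "colonies")
```

The pure-Python loop is slow on 2048px x 2048px images; in practice, a library routine such as `scipy.ndimage.label` would be the more typical choice for the labeling step.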

Nevertheless, trying multiple algorithms when experimenting is always a good approach.


The truth is we failed miserably when using images captured using Arduino Portenta. We managed to acquire some results, but due to insufficient image resolution (320px x 320px), the model detected only the largest objects, which appeared in the middle of the incubation process. The vision shield sensor is not designed for such applications. We were prepared for such an outcome. Now we know for sure that a device with different specifications is needed. The experiment failed successfully 🙂

Fortunately, images from Raspberry Pi were good enough to perform the detection process. In the movie below, you can see one of the test samples processed using the algorithm. The result is still a little bit noisy but significantly faster than counting each instance by hand. The noise can be reduced considerably by preparing a less dusty monitoring device with an additional moisture-wicking mechanism.

Video 1. Counting bacteria colonies - clean result.
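Once every frame has a colony count, the time-lapse becomes a growth curve. The sketch below shows one possible software-side way to tame noisy detections: a rolling median over the per-frame counts. This is a technique we swap in for illustration, not the hardware fix described above, and the function names, window size, and sample counts are all hypothetical.

```python
from statistics import median


def smooth_counts(counts, window: int = 3):
    """Rolling median over per-frame counts; the window shrinks at both ends."""
    half = window // 2
    return [median(counts[max(0, i - half):i + half + 1]) for i in range(len(counts))]


def first_detection_hours(counts, interval_minutes: int = 5, min_count: int = 1):
    """Hours from the first frame until the count first reaches `min_count`, else None."""
    for index, value in enumerate(counts):
        if value >= min_count:
            return index * interval_minutes / 60.0
    return None


if __name__ == "__main__":
    # Hypothetical detections per frame; the isolated 3 and 9 mimic dust or glare.
    raw = [0, 0, 3, 0, 0, 1, 1, 2, 2, 9, 3, 3]
    smoothed = smooth_counts(raw)
    print(smoothed)
    print(first_detection_hours(smoothed))
```

A median is used instead of a mean because a single spurious frame should not shift the curve at all, while a genuine new colony persists across frames and survives the filter.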


How can machine learning contribute to life sciences? 

Machine learning has already proved to be a promising problem solver across many industry sectors, and quite recently it has started making inroads in life science by providing a powerful tool for surveying and classifying biological data [1,2,3]. The life science industry is gradually benefiting from deploying artificial intelligence and machine learning in its research agendas, especially in genomics, chemistry, biophysics, microscopy, and medical analysis. It has become increasingly relevant in healthcare, for example in cancer research, studying brain cells, and developing therapeutics. Numerous studies show that machine learning algorithms dive into data in ways that humans can’t and can detect features that might otherwise be impossible to catch, for example when classifying cellular images or making genomic connections [1,3,4]. Machine learning tools are evolving rapidly, and to fully use their potential, life science labs will soon need dedicated computational expertise, collaborations, or both.

Limitations of machine learning

Although machine learning promises a breakthrough in the scientific world, it has some limitations worth mentioning: 

  1. Lack of large-scale, high-quality data - how to train your model with less data, which is often the case in life science; 
  2. Quality of data - your model is as good as the input data; 
  3. The balance between an accurate algorithm and a biologically explainable outcome is still difficult to achieve; 
  4. Lack of solid, biologically relevant judgment - computers can’t fully replace scientists [1,5].


Needless to say, we had a lot of fun with the assignment! Planning the experiment, designing and building a monitoring device, incubating bacteria, and training a machine learning model - everything in one short life science project.

Above all, we have learned how many possibilities machine learning offers, even in the craziest scenarios. As you have probably noticed, there are multiple applications of embedded artificial intelligence. Counting bacterial colonies is just the beginning.

You are probably wondering what we will do next with our data. We plan to work on expanding the dataset. Incubation takes a lot of effort. Perhaps we will add microscopic images and try a multimodal approach. Who knows? 🙂 We will definitely let you know when we make the collection public.

In addition, we need another iteration of our device. The current one is fine but handles only one sample at a time.

Thanks for reading, and good luck with your experiments!


[1] Webb S. (2018) Deep learning for biology. Nature 554, 555-557.

[2] Pugliese R. et al. (2021) Machine learning-based approach: global trends, research directions, and regulatory standpoints. Data Science and Management 4, 19-29.

[3] Bajorath J. (2022) Artificial intelligence in interdisciplinary life science and drug discovery research. Future Science OA 8, FSO792.

[4] Ghosh S., Dasgupta R. (2022) A brief overview of applications of machine learning in life sciences. In: Machine Learning in Biological Sciences. Springer.

[5] Ching T. et al. (2018) Opportunities and obstacles for deep learning in biology and medicine. Journal of the Royal Society Interface 15, 20170387.

Reviewed by Adam Wawrzyński and Kamil Rzechowski