
ReasonField Lab
a SoftwareMill Group Company
What is OpenCV? A library for image processing.
This one sentence summarises the OpenCV library, but there is much more to it. It was originally designed in C++ as a highly efficient open-source computer vision library for image processing operations. Today OpenCV is also available in Python and Java, and it is widely used in machine learning for pre-processing and post-processing steps as well as in regular computer vision algorithms. It can load, modify, convert, and save digital images, perform morphological and geometrical operations on them, and even track objects or extract features with classical computer vision techniques.
This tutorial covers the most commonly used and essential OpenCV functions in Python.
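If you are using venv, install the package with pip: pip install opencv-python
If you are using a mamba environment: mamba install -c conda-forge opencv
If you are using poetry: poetry add opencv-python
In each case, a specific OpenCV version can be requested by pinning it in the same command, for example opencv-python==<version> with pip.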
We will start with the basic functions for loading an image, displaying it, and saving it to a file. For the purposes of this tutorial, we will use the lenna.png image, which is well known in the computer vision community.
The flags argument of cv2.imread lets us load an image in different modes, for example IMREAD_UNCHANGED, IMREAD_GRAYSCALE, or IMREAD_COLOR. Please refer to cv2.ImreadModes for all possible modes.
The cv2.waitKey function waits the given number of milliseconds for a key press in an OpenCV window. If the delay parameter is 0, it waits indefinitely for keyboard input. A delay of 0 is therefore the usual choice, as it waits for the user's reaction. If any other delay is set, the wait ends after the given time or after a key press, whichever happens first.
We can now retrieve our image shape:
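Images loaded with OpenCV are numpy arrays, so the shape comes straight from the array. The sketch below uses a synthetic array in place of the loaded image; the 512 by 384 size is an arbitrary assumption.

```python
import numpy as np

# Stand-in for an image returned by cv2.imread (rows, columns, channels).
image = np.zeros((512, 384, 3), dtype=np.uint8)

height, width, channels = image.shape
print(height, width, channels)

# A grayscale image has no channel axis, so its shape has two entries.
gray = image[:, :, 0]
print(gray.shape)
```

Note that the order is height first, then width, which is the opposite of the (x, y) order used by many drawing functions.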
One important note is that images loaded with OpenCV are in BGR format by default. This means that the first channel of the 3-channel OpenCV array is Blue, not Red. The cv2.imshow function accounts for this and displays images correctly. However, if we display an OpenCV-loaded image with matplotlib, we will notice some odd behavior, as presented in the output image in Figure 1.
This happens because matplotlib expects RGB order, so the Red and Blue channels appear swapped. To fix it, we can convert the image with cv2.cvtColor and the cv2.COLOR_BGR2RGB flag.
Now everything looks normal again.
The most basic OpenCV image type is np.uint8, a 1-byte unsigned integer that can store values from 0 to 255. A grayscale image uses 1 byte per pixel, a color image uses 3 bytes per pixel, and an image with an additional alpha channel uses 4 bytes per pixel. It is worth mentioning that jpg images never have an alpha channel, so if your image has transparency, please use png.
The standard image type after loading with OpenCV is np.uint8:
Before feeding an image to a neural network, we need to cast it to float, and we usually normalize it as well, most simply by dividing the pixel values by 255 so that they fall in the range 0-1.
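A minimal sketch of the cast-and-scale step, assuming the simple convention of dividing by 255 and using float32 as the target type. Two equivalent numpy spellings are shown.

```python
import numpy as np

# Tiny uint8 "image" covering the darkest, a middle, and the brightest value.
image = np.array([[0, 128, 255]], dtype=np.uint8)

# Variant 1: cast with astype, then scale.
scaled_a = image.astype(np.float32) / 255.0

# Variant 2: cast with the np.float32 constructor, then scale.
scaled_b = np.float32(image) / 255.0

print(scaled_a.dtype, scaled_a.min(), scaled_a.max())
```

The cast has to happen before the division. Otherwise the arithmetic runs on the integer array and the result type is left to numpy's promotion rules.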
If we would like to normalize an image so that its minimum value becomes 0 and its maximum value becomes 1, we can rescale it by its own minimum and maximum (min-max normalization).
Here we normalized the image to the range 0-1. Often, however, we would like to normalize the image with its mean and standard deviation. We can do this both ways, using numpy and OpenCV:
It can also be done directly in a deep learning framework, for example with torchvision. This is often preferable, as the operation can then usually benefit from GPU acceleration.
Coming back to image types: if we load, for example, medical images such as X-rays, or depth images, they usually come as 16-bit grayscale maps. This means that after loading them into memory (with a flag such as IMREAD_UNCHANGED that preserves the bit depth) we will have uint16 arrays with values from 0 to 65535.
To display such an image, we need to convert it to uint8.
The last image type that is sometimes used for storing binary masks is bool.
The histogram is a plot of the intensity distribution of an image. It counts the number of pixels at each possible intensity value. In the case of uint8, those values are 0-255. By looking at the histogram, one can get some useful information about the image, like the intensity distribution, contrast, or image brightness. A histogram can be created only for a single-channel (grayscale) image, so in the case of multi-channel images we need to compute the histogram for each channel separately.
As we can see from the histogram, there are a lot of red, high-intensity pixels in the Lena image. This matches what we see, as the image does indeed look red.
Histogram equalization stretches the histogram to improve the image contrast. If the pixels occupy only the lower half of the possible intensity values (0-127) or only the upper half (128-255), the contrast will be low, as all the pixels have similar intensity. However, if we equalize the histogram, the pixels in the image will be spread over all possible intensity levels, which increases the contrast.
Brightness adjustment is simply the addition or subtraction of a constant value applied to the whole image. It is equivalent to shifting the histogram left or right, with values clamped to 0 or 255, depending on the direction of the shift.
As we can see in Figure 7, subtracting a constant of 20 from the red channel shifts its whole histogram to the left and makes the whole image look less red. If, on the other hand, we shift all channels, we will see the brightness adjustment effect.
To adjust brightness, we first need to convert the image to a wider signed type such as int64, because adding or subtracting directly on uint8 would wrap around (a circular shift) instead of clamping.
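The difference between wrapping and clamping can be sketched as follows. The helper adjust_brightness is a hypothetical name introduced for this example.

```python
import numpy as np

image = np.array([[10, 128, 250]], dtype=np.uint8)

# Direct uint8 arithmetic wraps around: 250 + 20 becomes 14.
wrapped = image + np.uint8(20)


def adjust_brightness(img, shift):
    """Shift all pixel values by a constant and clamp the result to 0..255."""
    shifted = img.astype(np.int64) + shift
    return np.clip(shifted, 0, 255).astype(np.uint8)


brighter = adjust_brightness(image, 20)
darker = adjust_brightness(image, -20)

print(wrapped, brighter, darker)
```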
Contrast adjustment is equivalent to scaling the histogram, that is, multiplying the pixel values by a constant factor.
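A minimal sketch of contrast adjustment by multiplication, with clamping to the uint8 range. The helper adjust_contrast and the factors 1.5 and 0.5 are assumptions chosen for illustration.

```python
import numpy as np


def adjust_contrast(img, factor):
    """Multiply pixel values by factor and clamp the result to 0..255."""
    scaled = img.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


image = np.array([[0, 50, 100, 200]], dtype=np.uint8)

higher = adjust_contrast(image, 1.5)  # stretches the histogram
lower = adjust_contrast(image, 0.5)   # compresses the histogram

print(higher, lower)
```

A factor above 1 spreads the values apart and saturates the brightest ones, while a factor below 1 pulls them closer together.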
Gamma correction is a non-linear image brightness adjustment.
$$I_{out} = \left(\frac{I_{in}}{255}\right)^{\gamma} \cdot 255$$
It allows us to lighten dark image areas without saturating the others. It is especially useful for images taken in bad lighting conditions, such as the image below.
Gamma correction makes the image bright in all areas while avoiding saturation. The results are shown in Figure 9.
OpenCV is best suited for image pre- and post-processing or classical computer vision applications. Here we will briefly see how to perform rectangle detection using OpenCV.
We are given an image with two rectangles.
First, we need to load the given image in grayscale mode. This is important because we will use a thresholding operation that only works on single-channel images.
Before applying the thresholding operation, we need to blur the image a little. This removes noise from the image and makes the thresholding output smoother.
For thresholding, we use an adaptive threshold. As we can see, there is a shadow in Figure 10. Adaptive thresholding compensates for the uneven brightness in different parts of the image.
After thresholding, we already get some nice contours, as shown in Figure 11.
However, we would like to get rid of the noise in the image before moving to the next step, contour detection. Therefore, we apply the morphological opening operation to the thresholded image.
OK, it looks a bit better now. We can move on to the contour detection step.
After getting the contours from Figure 12, we move to the filtering step. We only want to keep contours that look like rectangles and squares. We first define MIN_ALLOWED_AREA, which is used to reject all contours that are too small (noise in the image). In a for loop, we go contour by contour and compute a minimum enclosing rectangle for each one. Next, we compute the contour area and compare the two areas. If they are close to each other, the contour might be a rectangle, and we mark it with the rectangle and a label in the image. Please find the full code below.
The final result is presented in Figure 13. All required rectangles were correctly detected.
OpenCV is a great computer vision library when it comes to regular computer vision tasks and image processing. Its main advantages are speed, support for multiple languages, and the variety of image-processing functions that it implements. This huge open-source library was originally designed in C++, and in both Python and Java we can still notice some implementation decisions that originate from C++. Before the neural network era, it was also one of the main libraries for solving computer vision problems like text detection, object detection, object classification, object recognition, feature extraction, face detection, image segmentation, motion tracking, and object tracking. Personally, I find OpenCV to be a great support to neural networks in input image processing tasks, for example image filtering, or in camera calibration.