January 12, 2023

What is OpenCV used for?

What is OpenCV? A library for image processing.

That one sentence summarises the OpenCV library, but there is much more to it. It was originally designed in C++ as a highly efficient open-source computer vision library for image processing operations. Today OpenCV is also available in Python and Java and is widely used in machine learning for pre-processing and post-processing steps, as well as in classical computer vision algorithms. It can load, modify, convert and save digital images, perform morphological and geometrical operations on them, and even track objects or extract features with classical computer vision techniques.

This tutorial covers the most commonly used and essential OpenCV functions in Python.

Installation

If you are using venv:


pip install opencv-python

If you are using a mamba environment:


mamba install -c conda-forge opencv

If you are using poetry:


poetry add opencv-python

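With pip and poetry, a specific OpenCV version can be requested by appending it to the package name as shown below (with mamba the package is called opencv, so the pin is written as opencv=4.6.0):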


opencv-python==4.6.0.66


Image Loading and Basic Manipulations

We will start with basic functions for loading an image, displaying it, and saving it to a file. For this tutorial, we will use lenna.png, an image well known in the computer vision community.


import cv2
import numpy as np
import sys

# load the image
file_path: str = "lenna.png"
img: np.ndarray = cv2.imread(filename=file_path,flags=cv2.IMREAD_UNCHANGED)

# check whether image was correctly loaded
if img is None:
  sys.exit(f"Could not read the image {file_path}.")

# display image and wait for keyboard input

cv2.imshow(winname="Display image", mat=img)
# wait for the user key press and hide the window
key = cv2.waitKey(delay=0)

# if the `s` was pressed save the image
if key == ord("s"):
  cv2.imwrite(filename="lenna_out.png", img=img)

The flags argument of cv2.imread allows loading an image in different modes, for example IMREAD_UNCHANGED, IMREAD_GRAYSCALE, or IMREAD_COLOR. Please refer to cv2.ImreadModes for all possible modes.

The cv2.waitKey function waits the given number of milliseconds for a key press in an OpenCV window. If the delay parameter is 0, it waits indefinitely for keyboard input. Therefore, a delay of 0 is usually used, as it waits for the user's reaction. If any other value is set, the function returns after the given time or after a key press, whichever happens first.

We can now retrieve our image shape:


h, w, c = img.shape

BGR vs. RGB

One important note is that images loaded with OpenCV are in BGR format by default. It means that the first channel of a 3-channel OpenCV array is Blue, not Red. The cv2.imshow function takes this into account and displays images correctly. However, if we display an OpenCV-loaded image with matplotlib, we will notice some weird behavior, as presented in the output image in Figure 1.


import matplotlib.pyplot as plt
from matplotlib.image import AxesImage

imgplot: AxesImage = plt.imshow(X=img)
plt.show()


This is because matplotlib expects RGB, so the Red and Blue channels appear swapped. To fix it, we can convert the image with cvtColor.


rgb_image: np.ndarray = cv2.cvtColor(src=img, code=cv2.COLOR_BGR2RGB)

Right now, everything looks back to normal.

OpenCV image types

The most basic OpenCV image type is np.uint8, a 1-byte unsigned integer that can store values from 0 to 255. A grayscale image uses 1 byte per pixel, a color image 3 bytes per pixel, and a color image with an alpha channel 4 bytes per pixel. It is worth mentioning that jpg images never have an alpha channel, so if your image has transparency, please use png.

The standard image type after loading with OpenCV is np.uint8:


>>> print(img.dtype)
uint8

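Before feeding the image to a neural network, it is necessary to cast it to float, and usually we also normalize it. This can be achieved with either of the following snippets (the second divides by the image maximum, so it matches the first only when the brightest pixel is 255):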


normalized_img: np.ndarray = img.astype(np.float32)/255.0


normalized_img: np.ndarray = cv2.normalize(img.astype(np.float32), dst=None, alpha=1.0, beta=0.0, norm_type=cv2.NORM_INF)

If we would like to normalize an image so that the minimum value is 0 and the maximum value is 1, we could do:


float_img: np.ndarray = img.astype(np.float32)
normalized_img: np.ndarray = (float_img - np.min(float_img))/(np.max(float_img) - np.min(float_img))


normalized_img: np.ndarray = cv2.normalize(img.astype(np.float32), dst=None, alpha=1.0, beta=0.0, norm_type=cv2.NORM_MINMAX)

Here we normalized the image to the 0-1 range. However, we often want to normalize the image with a mean and standard deviation instead. We can do it with numpy:


mean: tuple[float, ...] = (0.485, 0.456, 0.406)
std: tuple[float, ...] = (0.229, 0.224, 0.225)

float_img: np.ndarray = img.astype(np.float32)
normalized_img: np.ndarray = (float_img/255.0 - mean)/std

Or we can do it directly in a deep learning framework, for example with torchvision. This is often preferred, as the operation can then run on the GPU.


import torch
from torchvision import transforms

# named `normalize` so that it does not shadow the imported `transforms` module
normalize = torch.nn.Sequential(
  transforms.Normalize(mean, std),
)
normalized_img: torch.Tensor = normalize(torch.tensor(float_img/255.0).permute(2,0,1))
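One detail worth checking with per-channel statistics is the channel order. The values used above look like the commonly quoted ImageNet statistics, which are given in RGB order (this is an assumption, since the text does not name their source), while OpenCV images are BGR. A minimal numpy sketch of two equivalent ways to handle this, using a random synthetic image:

```python
import numpy as np

# per-channel statistics, assumed here to be in RGB order
mean_rgb: np.ndarray = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std_rgb: np.ndarray = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# random stand-in for an image loaded with OpenCV (BGR order)
rng = np.random.default_rng(0)
bgr: np.ndarray = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

# option 1: convert the image to RGB, then normalize
rgb: np.ndarray = bgr[..., ::-1]
normalized_rgb: np.ndarray = (rgb.astype(np.float32) / 255.0 - mean_rgb) / std_rgb

# option 2: keep the image in BGR and reverse the statistics instead
normalized_bgr: np.ndarray = (bgr.astype(np.float32) / 255.0 - mean_rgb[::-1]) / std_rgb[::-1]
```

Both options give the same numbers per pixel; they differ only in the channel order of the output, which has to match what the network was trained on.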

Coming back to image types: medical images such as x-ray images, as well as depth images, usually come as 16-bit grayscale maps. After loading them into memory, we get uint16 arrays with values from 0 to 65535.


>>> import pydicom
>>> dicom = pydicom.dcmread("./heart.DCM")
>>> data = dicom.pixel_array
>>> print(data.dtype)
uint16

If we would like to display it, we need to convert the image to uint8.


normalized_img: np.ndarray = cv2.normalize(data.astype(np.float32), dst=None, alpha=255.0, beta=0.0, norm_type=cv2.NORM_INF).astype(np.uint8)
cv2.imshow("dicom_image", np.transpose(normalized_img[:3],(1,2,0)))
cv2.waitKey(0)

Figure 3. Visualizing a dicom image stored on 16 bits. A single plane of the knee MR scan. Source: Figshare
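The same 16-bit to 8-bit conversion can be sketched in plain numpy. The example below assumes a tiny synthetic array with a 12-bit value range stored as uint16, instead of a real DICOM file, and scales by the actual maximum so that the brightest pixel maps to 255.

```python
import numpy as np

# synthetic 12-bit-like data stored as uint16 (stand-in for dicom.pixel_array)
data: np.ndarray = np.array([[0, 1024], [2048, 4095]], dtype=np.uint16)

# guard against an all-zero image to avoid dividing by zero
max_value: int = max(int(data.max()), 1)

# scale to [0, 255], round, and only then cast to uint8
scaled: np.ndarray = np.round(data.astype(np.float32) * (255.0 / max_value)).astype(np.uint8)

print(scaled.dtype)  # uint8
```

Scaling to 255 rather than 256 matters here: a value of 256 does not fit into uint8 and would wrap around to 0 after the cast.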

The last image type that is sometimes used for storing binary masks is bool.


h, w, c = img.shape
mask: np.ndarray = np.ones_like(a=img,dtype=bool)
mask[h//2-20:h//2+20,w//2-20:w//2+20] = 0

img *= mask

cv2.imshow(winname="masked_image",mat=img)
cv2.waitKey(delay=0)

Basic image operations and image processing

Image histogram

The histogram is a plot of the intensity distribution of an image. It counts the number of pixels for each possible intensity value. In the case of uint8, those values are 0-255. By looking at the histogram, one can get useful information about the image, such as its intensity distribution, contrast, or brightness. A histogram is computed on a single channel, so for multi-channel images we compute it for each channel separately.


colors: tuple[str] = ('b','g','r')
for i,color in enumerate(colors):
  histogram: np.ndarray = cv2.calcHist(images=[img], channels=[i], mask=None, histSize=[256], ranges=[0,256])
  plt.plot(histogram, color=color)
  plt.xlim([0,256])
plt.show()

As we can see from the histogram, there are many high-intensity red pixels in the Lenna image, which matches the fact that the image indeed looks red.

Histogram equalization

Histogram equalization stretches the histogram to improve the image contrast. If pixels occupy only the lower half of the possible intensity values (0-127) or the upper half (128-255), the contrast will be low, as all the pixels have similar intensities. However, if we equalize the histogram, the pixels will occupy all possible intensity levels, which increases the contrast.


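def image_histogram(img: np.ndarray, colors: tuple[str, ...]) -> None:
  # plot one histogram curve per image channel
  for i, color in enumerate(colors):
    histogram: np.ndarray = cv2.calcHist(images=[img], channels=[i], mask=None, histSize=[256], ranges=[0,256])
    plt.plot(histogram, color=color)
    plt.xlim([0,256])

# equalize each channel separately and stack them back into an (h, w, 3) image
equalized_channels: list[np.ndarray] = [cv2.equalizeHist(src=img[:,:,channel]) for channel in range(img.shape[2])]
equalized_img: np.ndarray = np.stack(equalized_channels, axis=2)

colors: tuple[str, ...] = ('r','g','b')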

fig = plt.figure()
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(223)
ax4 = fig.add_subplot(224)
ax1.title.set_text('Original histogram')
ax2.title.set_text('Equalized histogram')
ax3.title.set_text('Original image')
ax4.title.set_text('Equalized image')
plt.subplot(2,2,4)
rgb_equalized_img: np.ndarray = cv2.cvtColor(src=equalized_img, code=cv2.COLOR_BGR2RGB)
plt.imshow(rgb_equalized_img)
plt.subplot(2,2,2)
image_histogram(img=rgb_equalized_img, colors=colors)

plt.subplot(2,2,3)
rgb_img: np.ndarray = cv2.cvtColor(src=img, code=cv2.COLOR_BGR2RGB)
plt.imshow(rgb_img)
plt.subplot(2,2,1)
image_histogram(img=rgb_img, colors=colors)

plt.show()
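For intuition about what equalization does internally, here is a minimal numpy sketch of the textbook approach based on the cumulative distribution function. It is an illustration under that assumption, not a reproduction of OpenCV's implementation, and the helper name equalize_gray is made up for the example.

```python
import numpy as np

def equalize_gray(gray: np.ndarray) -> np.ndarray:
  """Textbook CDF-based histogram equalization for a uint8 grayscale image."""
  hist = np.bincount(gray.ravel(), minlength=256)
  cdf = hist.cumsum()
  cdf_min = cdf[cdf > 0][0]                # CDF value at the darkest present level
  denom = max(int(cdf[-1] - cdf_min), 1)   # avoid division by zero for flat images
  # map each intensity so that the CDF becomes approximately linear
  lut = np.clip(np.round((cdf - cdf_min) / denom * 255.0), 0, 255).astype(np.uint8)
  return lut[gray]

# synthetic low-contrast image: values only between 100 and 131
rng = np.random.default_rng(0)
low_contrast: np.ndarray = rng.integers(100, 132, size=(64, 64), dtype=np.uint8)
equalized: np.ndarray = equalize_gray(low_contrast)

print(low_contrast.min(), low_contrast.max())
print(equalized.min(), equalized.max())  # 0 255
```

The darkest level present in the input is mapped to 0 and the brightest to 255, so the narrow input range is stretched over the full uint8 range.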

Contrast and brightness adjustment

Brightness

Brightness adjustment is simply the addition or subtraction of a constant value applied to the whole image. It is equivalent to shifting the histogram left or right, with values clamped to 0 or 255, depending on the direction of the shift.


shift: np.ndarray = np.array([0,0,-40], dtype=int)
img = img.astype(int)
img += shift
img = np.clip(a=img, a_min=0, a_max=255)
img = img.astype(np.uint8)

for i, color in enumerate(colors):
  histogram: np.ndarray = cv2.calcHist(images=[img], channels=[i], mask=None, histSize=[256], ranges=[0,256])
  plt.plot(histogram, color=color)
  plt.xlim([0,256])

rgb_image: np.ndarray = cv2.cvtColor(src=img, code=cv2.COLOR_BGR2RGB)
plt.subplot(2,2,2)
plt.imshow(rgb_image)

plt.show()

Figure 7. Results of brightness modification on Lenna image.

As we can see in Figure 7, subtracting a constant from the red channel (40 in the snippet above) shifts that channel's histogram to the left and makes the whole image look less red. If we instead shift all channels by the same value, we get the overall brightness adjustment effect.

To adjust brightness, we first need to convert the image to a wider signed integer type (astype(int), typically int64), because adding directly to uint8 values would wrap around (for example, 250 + 10 becomes 4) instead of clamping.

Contrast

Contrast adjustment is equivalent to scaling the histogram, that is, multiplying the pixel values by a constant.


scale: np.ndarray = np.array([1.4, 1.4, 1.0], dtype=float)
img = img.astype(float)
img *= scale
img = np.clip(a=img, a_min=0, a_max=255)
img = img.astype(np.uint8)

for i, color in enumerate(colors):
  histogram: np.ndarray = cv2.calcHist(images=[img], channels=[i], mask=None, histSize=[256], ranges=[0,256])
  plt.plot(histogram, color=color)
  plt.xlim([0,256])

rgb_image: np.ndarray = cv2.cvtColor(src=img, code=cv2.COLOR_BGR2RGB)
plt.subplot(2,2,2)
plt.imshow(rgb_image)

plt.show()

Figure 8. Results of contrast modification on Lenna image.

Gamma correction

Gamma correction is a non-linear image brightness adjustment.

$$I_{out} = \left(\frac{I_{in}}{255}\right)^{\gamma} \cdot 255$$

With a gamma below 1, it lightens dark image areas without saturating the bright ones. It is especially useful for images taken in bad lighting conditions, such as the image below.


def image_histogram(img: np.ndarray,colors: tuple[str]) -> None:
  for i, color in enumerate(colors):
    histogram: np.ndarray = cv2.calcHist(images=[img], channels=[i], mask=None, histSize=[256], ranges=[0,256])
    plt.plot(histogram, color=color)
    plt.xlim([0,256])

colors: tuple[str] = ('b','g','r')

plt.subplot(2,2,3)
plt.imshow(cv2.cvtColor(src=img, code=cv2.COLOR_BGR2RGB))
plt.subplot(2,2,1)
image_histogram(img=img, colors=colors)

# gamma correction
plt.subplot(2,2,2)
gamma: float = 0.4
img = img.astype(float)
# apply the power law on values scaled to [0, 1], then scale back to [0, 255]
img = np.clip(a=np.power(img / 255.0, gamma) * 255.0, a_min=0, a_max=255)
img = img.astype(np.uint8)

image_histogram(img=img, colors=colors)

rgb_image: np.ndarray = cv2.cvtColor(src=img, code=cv2.COLOR_BGR2RGB)
plt.subplot(2,2,4)
plt.imshow(rgb_image)

plt.show()

Figure 9. Results of gamma correction applied to an image with uneven brightness. Source: Canon

Gamma correction allows for making the image bright in all areas while avoiding saturation. The results are shown in Figure 9.

OpenCV in computer vision applications

OpenCV is best suited for image pre- and post-processing and for classical computer vision applications. Here we will briefly see how to detect rectangles using OpenCV.

We are given an image with two rectangles.

Figure 10. Cards image for the rectangles detection task. Source: Sostrenegrene

First, we need to load the image in grayscale mode. This is important because we will use a thresholding operation, which only works on single-channel images.


# load the image in grayscale
file_path: str = "./cards.png"
img: np.ndarray = cv2.imread(filename=file_path, flags=cv2.IMREAD_GRAYSCALE)

Before applying the thresholding operation, we blur the image slightly. This removes noise from the image and makes the thresholding output smoother.


img = cv2.medianBlur(src=img, ksize=5)

For thresholding, we use an adaptive threshold. As we can see, there is a shadow in Figure 10. Adaptive thresholding compensates for the uneven brightness in different parts of the image.


thresholded_image: np.ndarray = cv2.adaptiveThreshold(
  src=img,
  maxValue=255,
  adaptiveMethod=cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
  thresholdType=cv2.THRESH_BINARY,
  blockSize=11,
  C=2,
)

After thresholding, we already get some nice contours as shown in Figure 11.

Figure 11. Cards after adaptive gaussian thresholding operation.

However, we would like to get rid of the noise in the image before moving to the next step, contour detection. Therefore, we apply the morphological opening operation to the thresholded image.


kernel5x5: np.ndarray = np.ones(shape=(5,5), dtype=np.uint8)
opened_image: np.ndarray = cv2.morphologyEx(
  src=thresholded_image,
  op=cv2.MORPH_OPEN,
  kernel=kernel5x5,
)

Figure 12. Cards after opening operation.

It looks a bit better now. We can move on to the contour detection step.


contours, hierarchy = cv2.findContours(
  image=opened_image,
  mode=cv2.RETR_LIST,
  method=cv2.CHAIN_APPROX_TC89_L1,
)

After getting the contours from Figure 12, we move to the filtering step. We only want to keep contours that look like rectangles and squares. We first define MIN_ALLOWED_AREA, which rejects all contours that are too small (noise in the image). In a for loop, we go contour by contour and compute the minimum-area enclosing rectangle for each one. We then compute the contour area and compare the two areas. If they are close to each other, the contour is probably a rectangle, and we mark it with a rectangle and a label in the image. The full code is listed below.

The final result is presented in Figure 13. All required rectangles were correctly detected.

Figure 13. Final results for the rectangles detection task.


# load the image in grayscale
file_path: str = "./cards.png"
img: np.ndarray = cv2.imread(filename=file_path, flags=cv2.IMREAD_GRAYSCALE)

# check whether image was correctly loaded
if img is None:
  sys.exit(f"Could not read the image {file_path}.")

img = cv2.medianBlur(src=img, ksize=5)
thresholded_image: np.ndarray = cv2.adaptiveThreshold(
  src=img,
  maxValue=255,
  adaptiveMethod=cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
  thresholdType=cv2.THRESH_BINARY,
  blockSize=11,
  C=2,
)

kernel5x5: np.ndarray = np.ones(shape=(5,5), dtype=np.uint8)
opened_image: np.ndarray = cv2.morphologyEx(
  src=thresholded_image,
  op=cv2.MORPH_OPEN,
  kernel=kernel5x5,
)

contours, hierarchy = cv2.findContours(
  image=opened_image,
  mode=cv2.RETR_LIST,
  method=cv2.CHAIN_APPROX_TC89_L1,
)

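from typing import Final  # needed for the Final annotation below

h, w = img.shape
MIN_ALLOWED_AREA: Final[float] = 0.1*h*w
for cnt in contours:
  # minAreaRect returns ((center_x, center_y), (width, height), angle)
  rect: tuple = cv2.minAreaRect(points=cnt)
  box: np.ndarray = cv2.boxPoints(box=rect)
  # np.int0 was removed in NumPy 2.0, so cast the corners explicitly
  box = box.astype(np.int32)
  box_area: float = cv2.contourArea(contour=box)
  area: float = cv2.contourArea(contour=cnt)
  # skip degenerate contours to avoid dividing by zero
  if box_area == 0 or area == 0:
    continue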

  # min/max keeps the ratio in (0, 1], so only the lower bound matters
  areas_ratio: float = (min(area, box_area)/max(area, box_area))
  if area > MIN_ALLOWED_AREA and areas_ratio > 0.8:
    # cast to plain ints for the text origin
    x1, y1 = (int(v) for v in cnt[0][0])
    # label rectangle detected on the image
    cv2.putText(img, 'Rectangle', (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    sorted_bbox: np.ndarray = box[np.argsort(np.linalg.norm(box, axis=1))]
    # corners are in order bottom left, bottom right or top left, top left or bottom right, top right
    # to be in the drawing order last two corners need to switch places
    sorted_bbox_tmp: np.ndarray = sorted_bbox[2].copy()
    sorted_bbox[2] = sorted_bbox[3]
    sorted_bbox[3] = sorted_bbox_tmp
    # draw detected rectangle on the image
    img = cv2.drawContours(
        image=img,
        contours=[sorted_bbox],
        contourIdx=-1,
        color=(0,255,0),
        thickness=3,
    )
cv2.imshow(winname="result", mat=img)
cv2.waitKey(delay=0)

Summary

OpenCV is a great library for classical computer vision tasks and image processing. Its main advantages are speed, support for multiple languages, and the variety of image-processing functions it implements. This huge open-source library was originally designed in C++, and in both Python and Java we can still notice some implementation decisions that originate from C++. Before the neural network era, it was also one of the main libraries for solving computer vision problems like text detection, object detection, object classification, object recognition, feature extraction, face detection, image segmentation, motion tracking, and object tracking. Personally, I find OpenCV to be a great support to neural networks in input image processing tasks, for example image filtering, and in camera calibration.
