July 3, 2023

Face recognition in a new AI era


In this blog post, we will explore face recognition: its applications, its components, common problems in the field, and some solutions.

What is Face recognition?

Face recognition is a Computer Vision task that identifies people based on facial features. Other biometric methods exist, such as fingerprint and iris scanning, widely featured in spy movies. The considerable advantage of face recognition is that it is less invasive than the two methods above.

On the other hand, face recognition systems might have lower quality. Still, advancements in Deep Learning in recent years have made it possible to reach a wider audience, e.g., Face ID on Apple smartphones or similar features on Android devices.

Face Recognition and AI Act

According to plans for the AI Act in the EU:

  • real-time face recognition systems will be banned,
  • retroactive usage of face recognition systems will only be available to prosecute serious crimes after judicial authorization,
  • biometric categorization systems (gender, age, race, etc.) will be banned,
  • sourcing face recognition datasets from social media or CCTV cameras (violating the human right to privacy) will be banned.

Applications

Now, let's look into where we can see face recognition systems around us. Sometimes their usage is controversial, and opinions on whether it is ethical are divided. 

Law enforcement

We frequently hear about face recognition systems being used for law enforcement. In movies, officers often run algorithms across CCTV footage, cross-reference it with a “database”, and then find a “match”.

Their usage is often controversial: some places (like San Francisco in 2019) have banned it, while others (like Australia) embrace it, especially after the Covid-19 pandemic.

As mentioned before, the AI Act in the EU also plans to limit the usage of face recognition systems.  

User authentication

Another widespread usage of face recognition systems is user authentication. You may encounter it when entering an airport, accessing a restricted research facility, or signing in to an online course.

Their presence at airports has increased the throughput of passport control and reduced human error.

Account Verification

Another application is account verification, which often happens when you need to verify your account with a bank, investment platform, or social media service. This step is also called KYC (“Know Your Customer”). Account verification limits banking fraud, e.g., when someone tries to take out a loan in your name. Social media fraud can also have dire consequences, as impersonators of influential accounts can move stock markets or sway political campaigns.

You may have experienced it when opening an account on platforms like Revolut, Bunq, or other Internet banks. Investment platforms like Degiro or Coinbase use similar services. Social media platforms like Twitter, Facebook, and Tinder also perform account verification during user onboarding. The verification process looks quite similar everywhere: you often have to show your face straight on and then from the left and right profile. Part of the process is a liveness test, in which you must perform specific movements to prove that you are a human, not a generated video or simply a picture.

Attendance

Another use of face recognition systems is class attendance, especially with the advent of online classes, where cameras must be used. In such a setup, you can apply a face recognition system to identify students at the entrance to the exam hall to minimize the risk of fraud or impersonation. You can also use it as an automated tool to check who attended a particular class and who did not.

Again, the usage of these systems is controversial: in Sweden, one primary school received a hefty fine of tens of thousands of dollars for using cameras in the classroom without the students' knowledge and consent.

Problems with face recognition

Having gone over possible applications, we can now explore problems that might arise in face recognition systems.

The first of them is systematic occlusion (examples can be seen in the graphic below). It occurs when parts of the face are persistently covered, e.g., by glasses, hair, facial hair like a moustache or beard, scars, or clothing-related items like headwear.

Examples of systematic occlusion

Another branch of problems is temporary occlusion, e.g., pose variation, an object or body part covering the face, or environmental occlusions like shadows introduced by lighting. Examples of this kind of occlusion can be seen below.

Examples of temporary occlusion

As face recognition systems try to focus on the nitty-gritty details of the face to identify or verify a person correctly, they are vulnerable to low-quality pictures and poor lighting. 

Last but not least are problems related to the underrepresentation of certain groups. For quite a long time, face recognition datasets were dominated by middle-aged Caucasian males, which resulted in worse performance for other groups, notably women, Black and Asian people, and older people. To mitigate these issues, datasets were enriched, and performance on these groups has improved significantly, but such biases are still worth considering when creating a dataset. This aligns well with the dataset requirements of the AI Act.

Branches of face recognition

We can now explain the different branches of face recognition. The first distinction is what kind of task we want to solve. If we want to answer the question “Is this that person?”, we call it face verification. Face verification is often used when we want to restrict access or check someone's identity (e.g., at the airport, our face is compared with the face from the passport, and the system checks whether we are the same person as in the passport photo).

On the other hand, if we want to answer the question “Who is that person?”, we call this branch face identification. An example of this type is a face recognition system suggesting that the person in the photos on Facebook is none other than Harry Potter.

Another branching level appears when we consider whether we wish to freeze the database of identities and only compare within its scope (subject-dependent) or whether we want a universal system that works out of the box for faces the model has never seen (subject-independent).

To visualize the matter better, let's consider a system restricting access to Hogwarts. We created a model when Harry and Ron were in their first year, and they should be granted access. One year after the model's creation, Harry and Ron are again granted access. The caveat appears when we consider Ginny Weasley, Ron's younger sister. With a subject-dependent model, she will not be granted access because her picture was not in the database when the model was trained. With a subject-independent model, on the other hand, if Ginny's picture is added to the latest database, she will be granted access. Now consider criminals like Lord Voldemort: he was granted access as a student, but we would prefer to revoke that access once he became the most wanted person in the Wizarding World.

As you can imagine, subject-independent models are more widely used, as databases and access rights change over time, and retraining the model every time is infeasible.

Face recognition pipeline

Let’s now discuss what the face recognition pipeline looks like. Initially, we have a feed from the camera as an image or video. Then we perform face detection or tracking (depending on the input) and get a bounding box around the prospective face. After that, we align and crop the picture to keep only the chosen face, removing the rest of the image as it might introduce noise. At this point, we have a clean picture with a single face. Then we extract features from that face and compare them with the faces in the database. If the face in the incoming feed matches a face in the database closely enough, we report a match; if not, no match is found.
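
To make the pipeline concrete, here is a minimal sketch in Python. The helpers `detect_faces`, `align_face`, and `extract_embedding` are hypothetical placeholders for whatever detector, aligner, and embedding model you plug in; only the matching logic is spelled out.

```python
import numpy as np

SIM_THRESHOLD = 0.6  # cosine-similarity cut-off; tune on validation data

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recognize(frame, database):
    """Return the best-matching identity for the first detected face, or None."""
    boxes = detect_faces(frame)          # hypothetical detector -> bounding boxes
    if not boxes:
        return None                      # no face in the frame
    face = align_face(frame, boxes[0])   # hypothetical alignment + crop step
    embedding = extract_embedding(face)  # hypothetical embedding network
    best_id, best_sim = None, SIM_THRESHOLD
    for identity, ref in database.items():  # database: name -> reference embedding
        sim = cosine_similarity(embedding, ref)
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id                       # None means "no match found"
```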

Face processing

One-to-many face processing example from Deep Face Recognition: A Survey
Many-to-one face processing example from Deep Face Recognition: A Survey

One of the steps in improving the system's robustness is a pair of augmentation schemes called “one-to-many” and “many-to-one”. These augmentations aim to mitigate systematic and temporary occlusions so that all the important facial features are presented to the model. “One-to-many” methods produce many pictures of a face from a single image. Thanks to that, the model sees pose- and condition-specific information in each image, and we can feed the model a larger number of faces without collecting new data.
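
As a simplified stand-in (the survey's one-to-many methods typically synthesize new poses with 3D face models, which is beyond a short snippet), here is a sketch that generates several perturbed views of a single face image with torchvision transforms; the file path and parameter values are illustrative:

```python
from PIL import Image
from torchvision import transforms

# Random pose- and lighting-like perturbations applied to one source image
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.3, contrast=0.3),
    transforms.RandomResizedCrop(size=152, scale=(0.8, 1.0)),
])

face = Image.open("face.jpg")  # hypothetical path to a single aligned face
views = [augment(face) for _ in range(10)]  # "one-to-many": 10 training views
```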

“Many-to-one” methods create one super-face from multiple faces from different angles. As stated before, this solution increases robustness to occlusions even further than “one-to-many” augmentation. 

The classical approach to biometrics

Now, let's explore how biometrics were classically used to recognize a person by their facial features.

Classical biometric measurements used for the face are:

  • Distance between the eyes. 
  • Distance from the forehead to the chin.
  • Distance between the nose and mouth.
  • Depth of the eye sockets.
  • The shape of the cheekbones.
  • Contour of the lips, ears, and chin.

Example of facial landmarks presented in Harry Potter

As you already know from the problems above, classical methods are imprecise, since some of these characteristics change over time or are easily occluded.
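
Still, for illustration, here is a minimal sketch of how such measurements could be computed from landmark coordinates. The landmark names and pixel positions below are hypothetical; in practice they would come from a landmark detector such as dlib's 68-point model.

```python
import math

# Hypothetical landmark coordinates (x, y) in pixels
landmarks = {
    "left_eye": (120, 140), "right_eye": (180, 140),
    "nose_tip": (150, 180), "mouth_center": (150, 210),
    "forehead": (150, 90), "chin": (150, 260),
}

def dist(a, b):
    """Euclidean distance between two named landmarks."""
    return math.dist(landmarks[a], landmarks[b])

# A classical "feature vector" built from a few of the distances listed above
features = [
    dist("left_eye", "right_eye"),    # distance between the eyes
    dist("forehead", "chin"),         # forehead-to-chin distance
    dist("nose_tip", "mouth_center"), # nose-to-mouth distance
]
# Ratios are often preferred over raw distances to stay scale-invariant
eye_to_face_ratio = features[0] / features[1]
```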

Machine learning solutions

Datasets

In face recognition, many datasets were collected with different objectives in mind. Here I will cover three of the most popular ones: VGGFace, MSCeleb-1M, and Megaface.

VGGFace

This dataset contains face instances across different poses and ages, focusing on the robustness of face recognition systems in changing environments. It consists of 3.3M photos of 9k unique identities.

MSCeleb-1M 

This dataset consists of about 4 million instances with around 87k unique identities. Supplementary to MSCeleb-1M, the authors added a focused dataset called AsianCeleb with almost 94k unique identities and 3 million instances. The first version of the dataset had quite noisy labels and dataset biases. Consequently, the authors improved the labels, removed low-quality examples, and rebalanced the dataset with a better race, gender, and age distribution.

Megaface

This dataset is probably one of the largest available in face recognition, with more than 4.7 million pictures of 670k unique identities. It targets another problem: how to perform face identification for many identities with a minimal number of pictures per identity.

Approaches

Face recognition approaches throughout the years.

Holistic approaches

Within machine learning, we can trace how the approaches historically appeared. Initially, there were holistic approaches, which use all available pixels of the face as features for classification.

Eigenfaces, which came to light in 1991, use Principal Component Analysis (PCA) to compress the face image and represent it with fewer features. The first part of the name comes from the fact that PCA uses eigenvectors and eigenvalues calculated on all available faces in the training set, so, within some approximation, “each new face is represented as a linear combination of the faces in the training set”.
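
A minimal Eigenfaces sketch with scikit-learn might look like the following; the dataset choice and number of components are illustrative, not the original 1991 setup:

```python
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA

# Labeled Faces in the Wild: grayscale face images, flattened to vectors
faces = fetch_lfw_people(min_faces_per_person=20)
X, y = faces.data, faces.target

# Project the pixel space onto the top 100 "eigenfaces"
pca = PCA(n_components=100, whiten=True).fit(X)
X_pca = pca.transform(X)

# Each row of pca.components_ is an eigenface; a new face is (approximately)
# a linear combination of them
coeffs = pca.transform(X[:1])                 # coefficients of one face
reconstruction = pca.inverse_transform(coeffs)  # back to pixel space
```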

Fisherfaces came as a successor of Eigenfaces, using Linear Discriminant Analysis (LDA) instead of PCA. LDA is a dimensionality reduction method that preserves separability between the classes, unlike PCA, which does not use labels at all.
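
Assuming the same `X` and `y` as in the PCA sketch above, swapping in LDA is a small change; note that, unlike PCA, LDA consumes the identity labels:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# LDA uses the labels y to find directions that separate identities;
# n_components is bounded by (number of classes - 1)
lda = LinearDiscriminantAnalysis(n_components=50)
X_lda = lda.fit_transform(X, y)
```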

Approaches based on local handcrafted features

Another branch is local handcrafted features, like Gabor filters, combined with spatial pooling or histogram calculations. A classifier applied on top of these specialized features then answers whether the user is who they claim to be.

Gabor filters are a method often used in texture analysis; in practice, a Gabor filter is a Gaussian kernel function modulated by a sinusoidal signal.
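
As a quick sketch, OpenCV can generate and apply a Gabor kernel; the parameter values and the input path below are arbitrary examples:

```python
import cv2
import numpy as np

img = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input image

# Gaussian envelope (sigma) modulated by a sinusoid of wavelength lambd,
# oriented at angle theta
kernel = cv2.getGaborKernel(
    ksize=(21, 21), sigma=4.0, theta=np.pi / 4,
    lambd=10.0, gamma=0.5, psi=0,
)
filtered = cv2.filter2D(img, cv2.CV_32F, kernel)

# In practice, a bank of orientations is used and the responses are
# pooled or turned into histograms before classification
responses = [
    cv2.filter2D(img, cv2.CV_32F, cv2.getGaborKernel((21, 21), 4.0, t, 10.0, 0.5, 0))
    for t in np.linspace(0, np.pi, 8, endpoint=False)
]
```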

Deep learning approaches

DeepFace architecture from the original paper

DeepFace (released in 2014) was one of the first deep learning approaches. It uses 3D face alignment (based on 67 fiducial points), performs face frontalization (via a classical Computer Vision solution), and then runs face recognition on 152x152 RGB pictures using an 8-layer network consisting of convolutional, locally connected, and fully connected layers. AlexNet highly influenced this architecture.

DeepID (released in 2015) was inspired by VGGNet and GoogleNet, combining stacked convolutions and inception layers. This solution uses contrastive loss to minimize the distance between the produced features for positive pairs (matching faces) and maximize the distance for negative pairs (non-matching faces).
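
A minimal PyTorch sketch of a contrastive loss (the exact formulation in DeepID may differ in details such as the margin):

```python
import torch.nn.functional as F

def contrastive_loss(emb1, emb2, same_identity, margin=1.0):
    """same_identity: 1.0 for matching pairs, 0.0 for non-matching pairs."""
    d = F.pairwise_distance(emb1, emb2)  # Euclidean distance per pair
    # Pull matching pairs together; push non-matching pairs beyond the margin
    loss = same_identity * d.pow(2) + (1 - same_identity) * F.relu(margin - d).pow(2)
    return loss.mean()
```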

FaceNet (released in 2015) also operates in a Euclidean embedding space, taking GoogleNet as its inspiration. This solution uses triplet loss, which always grabs three examples at once (anchor, positive, and negative) and simultaneously minimises the anchor-positive distance and maximises the anchor-negative distance.
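
The triplet loss itself is compact; here is a sketch over hypothetical embedding tensors (PyTorch also ships a ready-made `nn.TripletMarginLoss`):

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Push d(anchor, negative) at least `margin` beyond d(anchor, positive)."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Equivalent built-in:
# loss_fn = torch.nn.TripletMarginLoss(margin=0.2)
```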

Comparison of losses: Triplet loss, Tuplet loss, and ArcFace loss

A more recent solution, ArcFace (released in 2018), uses a ResNet-100 backbone and a specially designed loss. Whereas triplet loss compares distances between individual examples, ArcFace works on the angles between feature vectors and learned class centres, adding an angular margin penalty. Thanks to this more elaborate loss, models using it seem more robust to noisy and hard faces.
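
A sketch of the ArcFace loss as a classification head in PyTorch; the scale `s` and margin `m` below are commonly reported defaults, and the class-centre matrix `W` is learned during training:

```python
import torch
import torch.nn.functional as F

class ArcFaceHead(torch.nn.Module):
    """Additive angular margin loss over learned class centres (a sketch)."""
    def __init__(self, emb_dim, n_classes, s=64.0, m=0.5):
        super().__init__()
        # One learned centre (row) per identity
        self.W = torch.nn.Parameter(torch.randn(n_classes, emb_dim))
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine of the angle between each embedding and each class centre
        cos = F.linear(F.normalize(embeddings), F.normalize(self.W))
        cos = cos.clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cos)
        # Add the angular margin m only at the target-class position
        target = F.one_hot(labels, num_classes=cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos)
        return F.cross_entropy(self.s * logits, labels)
```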

The latest SOTA results build on the solutions above, fine-tuning all available hyperparameters.

Conclusions

In this blog post, we analyzed the field of face recognition: its exciting and controversial applications (especially given the current landscape of incoming regulations) and the problems that are bread and butter for practitioners in the field. We also explored the major branches of face recognition (including the difference between face identification, “Who is in the photo?”, and face verification, “Are you the person you claim to be?”), how the classical and machine learning approaches evolved, what interesting tricks they used, and where they are now. I hope this analysis gives you insight into face recognition.