How can one quickly look up people from a large database?

Vocabulary

  • Face detection: Finding all faces in an image.
  • Face representation: The simplest way to represent a face is as an image (pixel / color values). This is not very space-efficient and likely makes follow-up tasks hard. Face embeddings are another representation: a face becomes a point on the unit sphere in $\mathbb{R}^{128}$ (as in FaceNet, if I recall correctly).
  • Face verification: Given two face representations, deciding whether they show the same person.
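
To make the verification step concrete, here is a toy sketch under the assumptions above: unit-norm 128-dimensional embeddings, with random vectors standing in for a real embedding network, and a made-up distance threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x):
    # Stand-in for a real embedding network: project onto the unit sphere.
    return x / np.linalg.norm(x)

a = embed(rng.normal(size=128))
b = embed(rng.normal(size=128))

def same_person(e1, e2, threshold=1.1):
    # Verification = thresholding the Euclidean distance between embeddings.
    # The threshold value here is illustrative, not tuned on real data.
    return bool(np.linalg.norm(e1 - e2) < threshold)

print(same_person(a, a))   # -> True (identical embeddings, distance 0)
```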

Question

I was wondering how identifying a person among many candidates can work. Finding a face in an image works quite well and fast enough for most applications, and so does face verification. But I'm not sure how this scales when you compare one face not against one other face, but against millions or billions of faces.

Suppose you have a lot of example faces labeled with the identity of the person. Think of Facebook, where many people tagged friends. Or of countries with biometric passports.

In these real applications, the lookup is easy because you can simply brute-force compare against all candidates:

  • Facebook: The only candidates are your friends, so ~200 candidates.
  • EU airport fast entry: Your face is compared against the passport photo, so there is only one candidate.

But then think of dystopian books and movies in which cameras identify anybody. While tracking helps to reduce the problem, finding a match among millions or billions of examples is computationally very heavy. Assuming a single face verification takes ~200 ms, a million candidates would already take about 56 hours, a billion users about 6 years, and all ~7.5 billion people currently alive about 48 years.

So with that many candidates, you don't want to compare against all candidates.

When you use the face-embedding, it becomes a nearest neighbor search in $\mathbb{R}^{128}$.

Calculating the Euclidean distance between two vectors in $\mathbb{R}^{128}$ takes roughly 15 μs (see "Timing" below). A single sweep over $7.5 \cdot 10^9$ people would therefore take about 31 hours. Way better, but still pretty long.
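
It is worth noting that the 15 μs figure is per Python call; a single vectorized NumPy pass over a whole batch of stored embeddings amortizes that overhead. A minimal sketch, with random vectors standing in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.random((100_000, 128)).astype(np.float32)   # stand-in for stored embeddings
query = rng.random(128).astype(np.float32)

# All 100,000 distances in one vectorized pass instead of a Python loop
# of per-pair calls; this is typically orders of magnitude faster.
dists = np.linalg.norm(db - query, axis=1)
best = int(np.argmin(dists))
print(best, float(dists[best]))
```

This is still a linear scan, only with a much smaller constant factor.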

While the face-embedding approach pre-computes a good face representation, scanning over all examples is still a fairly naive approach. If it were only $\mathbb{R}^1$, one could use a simple binary search tree. For a few dimensions, something like a k-d tree might work. But what about 128 dimensions?

Is there another approach to get the person quicker?

Timing

import timeit

import numpy as np

durations = timeit.repeat('np.linalg.norm(a-b)',
                          setup='import numpy as np;a=np.random.random(128);b=np.random.random(128)',
                          repeat=1000,
                          number=3)
# Each entry of `durations` is the time for `number=3` calls,
# so divide by 3 to get the per-call time.
per_call = [d / 3 for d in durations]
print('min: {min:5.1f}μs, mean: {mean:5.1f}μs, max: {max:6.1f}μs'
      .format(min=min(per_call) * 10**6,
              mean=np.mean(per_call) * 10**6,
              max=max(per_call) * 10**6))

Topic: image-recognition, search, scalability, bigdata

Category Data Science


You’ve got the right idea with a k-d tree. The basic idea is that you don’t calculate an embedding for every face every time you do a query - you keep your database up to date and store the embeddings you have already calculated in your index. When you get a “query” (a new photo of a face) you generate the embedding for that and then search the index (possibly a k-d tree) for embeddings nearby that new embedding. You can also have more than one index, so that if the face is spotted in London you can first query the index that only covers faces likely to be seen in London.
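
The index-then-query pattern described in this answer can be sketched with SciPy's k-d tree, with random vectors standing in for precomputed embeddings. (In 128 dimensions a k-d tree tends to degrade toward brute force, which is why approximate-nearest-neighbor libraries such as FAISS or Annoy are the usual choice at scale.)

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
embeddings = rng.random((10_000, 128))   # "database" computed ahead of time
tree = cKDTree(embeddings)               # the index, kept up to date offline

query = embeddings[42] + 0.001           # a new photo's embedding, near person 42
dist, idx = tree.query(query, k=1)       # nearest stored embedding
print(int(idx))  # -> 42
```

Building the tree is a one-time cost; each query then only touches a fraction of the database.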

The most difficult part of your work is likely to be creating a good embedding, and you’d probably create this by generating a high-dimensional embedding with a neural network and then mapping this to a much smaller embedding using manifold learning (e.g. t-SNE or UMAP).

This article is about a team using a similar approach: https://datascopeanalytics.com/blog/building-a-visual-search-algorithm/


That problem is called identification: mapping a percept to a specific entity.

One common option is hashing: take a percept and map it to a specific integer. If two percepts map to the same integer, they are the same entity; if they map to different integers, they are different entities. Hashing takes constant time to look up an entity, no matter how many entities there are.

In the case of facial recognition, the hash function is best learned through deep learning.
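
A plain hash breaks as soon as the percept is noisy, so in practice "hashing" here usually means locality-sensitive hashing, where nearby embeddings are likely to share a bucket. A toy sketch with random hyperplanes (a learned hash, as this answer suggests, would replace the random planes):

```python
import numpy as np

rng = np.random.default_rng(0)
planes = rng.normal(size=(16, 128))   # 16 random hyperplanes -> a 16-bit hash

def lsh_hash(e):
    # Each hyperplane contributes one bit: which side of it the embedding lies on.
    bits = (planes @ e) > 0
    return int(np.packbits(bits).view(np.uint16)[0])

# Build constant-time-lookup buckets for a fake database of embeddings.
db = rng.normal(size=(1_000, 128))
buckets = {}
for i, e in enumerate(db):
    buckets.setdefault(lsh_hash(e), []).append(i)

# A slightly perturbed percept of person 7 very likely lands in the same bucket.
query = db[7] + 1e-6 * rng.normal(size=128)
print(7 in buckets.get(lsh_hash(query), []))
```

Only the candidates in the matching bucket then need an exact distance check.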


You could do this progressively. For example, you could cluster the embeddings and try to classify ethnicity, gender and age.

Then you only need to search among people of that ethnicity, gender and age.

Similar procedures can reduce the search region further. Gender alone already halves the number of comparisons; age can split it into roughly 11 parts (using, say, 10-year bins; the parts will differ in size, but it is still a nice improvement); and you can reduce it further by separating by ethnicity.
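
As a sketch of this idea, with hypothetical classifier outputs for the attribute labels and random vectors standing in for embeddings, the point being only that coarse attributes shrink the pool before the distance search:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
embeddings = rng.random((n, 128))
genders = rng.integers(0, 2, n)       # hypothetical classifier outputs, 2 classes
age_bins = rng.integers(0, 11, n)     # 11 ten-year age bins

def identify(query_emb, query_gender, query_age_bin):
    # Restrict the brute-force distance search to the matching demographic bin
    # (~n / 22 candidates on average instead of n).
    mask = (genders == query_gender) & (age_bins == query_age_bin)
    candidates = np.flatnonzero(mask)
    dists = np.linalg.norm(embeddings[candidates] - query_emb, axis=1)
    return int(candidates[np.argmin(dists)])

target = 123
print(identify(embeddings[target], genders[target], age_bins[target]))  # -> 123
```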

Demographics of the world (Wikipedia)

Age: According to the 2006 CIA World Factbook, around 27% of the world's population is below 15 years of age.

  • 0–14 years: 26.3% (male 944,987,919 / female 884,268,378)
  • 15–64 years: 65.9% (male 2,234,860,865 / female 2,187,838,153)
  • 65 years and over: 7.9% (male 227,164,176 / female 289,048,221) (2011 est.)

Gender: This can be approximated by a half/half distribution.

Ethnicity: The largest ethnic group is Han (Sino-Tibetan -> Chinese), with about 1.3 billion members.

So, for the worst case, a 15–64-year-old member of the Han ethnic group of a given gender, you have

$$ 1.3 \times 10^9 \times 0.5 \times 65.9\% = 428.35 \times 10^6 \text{ people} $$

Using your time estimate of $15\,\mu s$ per comparison, we have:

$$ 15 \times 10^{-6}\,\text{s} \times 428.35 \times 10^6 = 6425\,\text{s} \approx 1.78 \text{ hours} $$

The second-worst case would be Arabs of the same age group, which would take about $37 \text{ minutes}$, and for the UK it would take about $18 \text{ minutes}$.

This is still a slow procedure, but you can probably find clusters inside these large groups to reduce the search time even further.

I suppose it is viable to separate the 15–64 age range into at least 3 groups. If this can be done, the worst case shrinks accordingly.
