I and countless others have spent the last several years referring to the National Institute of Standards and Technology’s Face Recognition Vendor Test, or FRVT. I guess some people have spent almost a quarter century referring to FRVT, because the term has been in use since 1999.
Starting now, you’re not supposed to use the FRVT acronym any more.
To bring clarity to our testing scope and goals, what was formerly known as FRVT has been rebranded and split into FRTE (Face Recognition Technology Evaluation) and FATE (Face Analysis Technology Evaluation). Tracks that involve the processing and analysis of images will run under the FATE activity, and tracks that pertain to identity verification will run under FRTE. All existing participation and submission procedures remain unchanged.
The change actually makes sense, since tasks such as age estimation and presentation attack detection (liveness detection) do not directly relate to the identification of individuals.
Us old folks just have to get used to the change.
I just hope that the new “FATE” acronym doesn’t mean that some algorithms are destined to perform better than others.
What’s more, these results change on a monthly basis, so it’s quite possible that the #1 vendor in some category in February 2022 was no longer the #1 vendor in March 2022. (And if your company markets years-old FRVT results, stop it!)
This is the August 15, 2023 peek at three ways to slice and dice the NIST FRVT results.
And a bunch of vendors will be mad at me because I didn’t choose THEIR preferred slicing and dicing, or their ways to exclude results (not including Chinese algorithms, not including algorithms used in surveillance, etc.). The mad vendors can write their own blog posts (or ask Bredemarket to ghostwrite them on their behalf).
NIST FRVT 1:1, VISABORDER
The phrase “NIST FRVT 1:1, VISABORDER” is shorthand for the NIST one-to-one version of the Face Recognition Vendor Test, using the VISABORDER probe and gallery data. This happens to be the default way in which NIST sorts the 1:1 accuracy results, but of course you can sort them against any other probe/gallery combination, and get a different #1 vendor.
As of August 15, the top two accuracy algorithms for VISABORDER came from Cloudwalk. Here are all of the top ten.
But NIST doesn’t just measure accuracy for a bunch of different probe-target combinations. It also measures performance, since the most accurate algorithm in the world won’t do you any good if it takes forever to compare the face templates.
One caveat regarding these measures is that NIST conducts the tests on a standardized set of equipment, so that results between vendors can be compared. This is important to note, because a comparison that takes 103 milliseconds on NIST’s equipment will yield a different time on a customer’s equipment.
One of the many performance measures is “Comparison Time (Mate).” There is also a performance measure for “Comparison Time (Non-mate).”
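If it helps to see the slicing and dicing concretely, here is a minimal sketch, with made-up vendors and numbers (NIST publishes these results as web tables, not in this form), showing how the same result set can crown different #1 vendors depending on the column you sort by:

```python
# Made-up vendors and numbers, purely to illustrate sorting the results.
results = [
    # (vendor, FNMR on VISABORDER, mated comparison time in milliseconds)
    ("vendor_a", 0.0028, 103.0),
    ("vendor_b", 0.0031, 2.1),
    ("vendor_c", 0.0045, 0.4),
]

by_accuracy = sorted(results, key=lambda r: r[1])  # lower FNMR is better
by_speed = sorted(results, key=lambda r: r[2])     # lower time is better

print("Most accurate:", by_accuracy[0][0])  # vendor_a
print("Fastest:", by_speed[0][0])           # vendor_c
```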
So in this test, the fastest vendor algorithm comes from Trueface. Again, here are the top 10.
Now I know what some of you are saying. “John,” you say, “the 1:1 test only measures a comparison of one face against one other face, or what NIST calls verification. What if you’re searching against a database of faces, or identification?”
Well, NIST has a 1:N test to measure that particular use case. Or use cases, because again you can slice and dice the results in so many different ways.
When looking at accuracy, the default NIST 1:N sort is by:
Probe images from the BORDER database.
Gallery images from a 1,600,000 record VISA database.
Cloudwalk happens to be the #1 vendor in this slicing and dicing of the test. Here are the top ten.
The usual cautions apply that everyone, including NIST, emphasizes that these test results do not guarantee similar results in an operational environment. Even if the algorithm author ported its algorithm to an operational system with absolutely no changes, the operational system will have a different hardware configuration and will have different data.
For example, none of the NIST 1:N tests use databases with more than 12 million records. Even 20 years ago, Behnam Bavarian correctly noted that biometric databases would eventually surpass hundreds of millions of records, or even billions of records. There is no way that NIST could assemble a test database that large.
Iris recognition continues to make the news. Let’s review what iris recognition is and its benefits (and drawbacks), why Apple made the news last month, and why Worldcoin is making the news this month.
What is iris recognition?
There are a number of biometric modalities that can identify individuals by “who they are” (one of the five factors of authentication). A few examples include fingerprints, faces, voices, and DNA. All of these modalities purport to uniquely (or nearly uniquely) identify an individual.
One other way to identify individuals is via the irises in their eyes. I’m not a doctor, but presumably the Cleveland Clinic employs medical professionals who are qualified to define what the iris is.
The iris is the colored part of your eye. Muscles in your iris control your pupil — the small black opening that lets light into your eye.
But why use irises rather than, say, fingerprints and faces? The best person to answer this is John Daugman. (At this point several of you are intoning, “John Daugman.” With reason. He’s the inventor of iris recognition.)
(I)ris patterns become interesting as an alternative approach to reliable visual recognition of persons when imaging can be done at distances of less than a meter, and especially when there is a need to search very large databases without incurring any false matches despite a huge number of possibilities. Although small (11 mm) and sometimes problematic to image, the iris has the great mathematical advantage that its pattern variability among different persons is enormous.
Daugman, John, “How Iris Recognition Works.” IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 14, NO. 1, JANUARY 2004. Quoted from page 21. (PDF)
Or in non-scientific speak, one benefit of iris recognition is that you know it is accurate, even when submitting a pair of irises in a one-to-many search against a huge database. How huge? We’ll discuss later.
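For the technically curious, Daugman’s paper compares two iris codes (binary patterns extracted from iris images) by fractional Hamming distance: the fraction of disagreeing bits among the bits that both images captured cleanly. Here is a minimal sketch of that comparison (my simplification; the real method also handles rotation alignment, which I’ve omitted):

```python
import numpy as np

def fractional_hamming_distance(code_a, code_b, mask_a, mask_b):
    """Daugman-style comparison of two binary iris codes.

    code_a, code_b: boolean arrays of iris code bits (e.g., 2,048 bits).
    mask_a, mask_b: boolean arrays marking bits unobscured by eyelids,
                    lashes, or reflections.
    """
    usable = mask_a & mask_b                    # bits visible in both images
    disagreements = (code_a ^ code_b) & usable  # XOR finds differing bits
    return disagreements.sum() / usable.sum()

# Two codes from the same eye should yield a small distance; codes from
# different eyes cluster near 0.5 (about half the bits disagree).
rng = np.random.default_rng(0)
code = rng.random(2048) < 0.5
noisy = code ^ (rng.random(2048) < 0.05)  # same eye, 5% bit noise
other = rng.random(2048) < 0.5            # a different eye
mask = np.ones(2048, dtype=bool)
print(fractional_hamming_distance(code, noisy, mask, mask))  # ~0.05
print(fractional_hamming_distance(code, other, mask, mask))  # ~0.5
```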
Brandon Mayfield and fingerprints
Remember that Daugman’s paper was released roughly two months before Brandon Mayfield was misidentified in a fingerprint comparison. (Everyone now intone “Brandon Mayfield.”)
While some of the issues associated with Mayfield’s misidentification had nothing to do with forensic science (Al Jazeera spends some time discussing bias, and Itiel Dror also looked at bias post-Mayfield), this still shows that fingerprints from two different people can be remarkably similar, and that it takes care to properly identify people.
Police agencies, witnesses, and faces
And of course there are recent examples of facial misidentifications (both by police agencies and witnesses), again not necessarily forensic science related, and again showing the similarity of faces from two different people.
At the root of iris recognition’s accuracy is the data-richness of the iris itself. The IrisAccess system captures over 240 degrees of freedom or unique characteristics in formulating its algorithmic template. Fingerprints, facial recognition and hand geometry have far less detailed input in template construction.
Enough about claims. What about real results? The IREX 10 test, independently administered by the U.S. National Institute of Standards and Technology, measures the identification (one-to-many) accuracy of submitted algorithms. At the time I am writing this, the ten most accurate algorithms provide false negative identification rates (FNIR) between 0.0022 ± 0.0004 and 0.0037 ± 0.0005 when two eyes are used. (Single eye accuracy is lower.) By the time you see this, the top ten algorithms may have changed, because the vendors are always improving.
IREX10 two-eye accuracy, top ten algorithms as of July 28, 2023. (Link)
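If FNIR notation is unfamiliar, here’s a back-of-the-envelope reading of those numbers (my arithmetic, not a NIST-published figure):

```python
# Back-of-the-envelope: what an FNIR of 0.0022 means in practice.
fnir = 0.0022       # false negative identification rate, two eyes
searches = 100_000  # hypothetical mated searches (the person IS enrolled)

expected_misses = fnir * searches
print(f"Roughly {expected_misses:.0f} enrolled people missed "
      f"per {searches:,} mated searches")  # roughly 220
```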
While the IREX10 one-to-many tests are conducted against databases of less than a million records, it is estimated that iris one-to-many accuracy remains high even with databases of a billion people—something we will return to later in this post.
Iris drawbacks
OK, so if irises are so accurate, why aren’t we dumping our fingerprint readers and face readers and just using irises?
In short, because of the high friction in capturing irises. You can use high-resolution cameras to capture fingerprints and faces from far away, but as of now iris capture usually requires you to get very close to the capture device.
Iris image capture circa 2020 from the U.S. Federal Bureau of Investigation. (Link)
Which I guess is better than the old days when you had to put your eye right up against the capture device, but it’s still not as friendly (or intrusive) as face capture, which can be achieved as you’re walking down a passageway in an airport or sports stadium.
Irises and Apple Vision Pro
So how are irises being used today? You may or may not have heard last month’s hoopla about the Apple Vision Pro, which uses irises for one-to-one authentication.
I’m not going to spend a ton of time delving into this, because I just discussed Apple Vision Pro in June. In fact, I’m just going to quote from what I already said.
In short, as you wear the headset (which by definition is right on your head, not far away), the headset captures your iris images and uses them to authenticate you.
It’s a one-to-one comparison, not the one-to-many comparison that I discussed earlier in this post, but it is used to uniquely identify an individual.
But iris recognition doesn’t have to be used for identification.
Irises and Worldcoin
“But wait a minute, John,” you’re saying. “If you’re not using irises to determine if a person is who they say they are, then why would anyone use irises?”
Over the past several years, I’ve analyzed a variety of identity firms. Earlier this year I took a look at Worldcoin….Worldcoin’s World ID emphasizes privacy so much that it does not conclusively prove a person’s identity (it only proves a person’s uniqueness)…
That’s the only thing that I’ve said about Worldcoin, at least publicly. (I looked at Worldcoin privately earlier in 2023, but that report is not publicly accessible and even I don’t have it any more.)
The Worldcoin Foundation today announced that Worldcoin, a project co-founded by Sam Altman, Alex Blania and Max Novendstern, is now live and in a production-grade state.
The launch includes the release of the World ID SDK and plans to scale Orb operations to 35+ cities across 20+ countries around the world. In tandem, the Foundation’s subsidiary, World Assets Ltd., minted and released the Worldcoin token (WLD) to the millions of eligible people who participated in the beta; WLD is now transactable on the blockchain….
“In the age of AI, the need for proof of personhood is no longer a topic of serious debate; instead, the critical question is whether or not the proof of personhood solutions we have can be privacy-first, decentralized and maximally inclusive,” said Worldcoin co-founder and Tools for Humanity CEO Alex Blania. “Through its unique technology, Worldcoin aims to provide anyone in the world, regardless of background, geography or income, access to the growing digital and global economy in a privacy preserving and decentralized way.”
Worldcoin does NOT positively identify people…but it can still pay you
A very important note: Worldcoin’s purpose is not to determine identity (that a person is who they say they are). Worldcoin’s purpose is to determine uniqueness: namely, that a person (whoever they are) is unique among all the billions of people in the world. Once uniqueness is determined, the person can get money money money with an assurance that the same person won’t get money twice.
Iris biometrics outperform other biometric modalities and already achieved false match rates beyond 1.2×10⁻¹⁴ (one false match in one trillion[9]) two decades ago[10]—even without recent advancements in AI. This is several orders of magnitude more accurate than the current state of the art in face recognition.
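To see why such a tiny false match rate matters for uniqueness checks, consider a back-of-the-envelope calculation (mine, not Worldcoin’s). Every new enrollee must be compared against everyone already enrolled, so per-comparison error rates get multiplied by the gallery size:

```python
# Back-of-the-envelope union bound, not a Worldcoin-published figure.
fmr = 1.2e-14            # per-comparison false match rate from the quote
gallery = 1_000_000_000  # hypothetical gallery of a billion enrollees

# Approximate chance that a genuinely new, unique person is falsely
# flagged as a duplicate of someone already enrolled:
p_false_duplicate = gallery * fmr
print(p_false_duplicate)  # 1.2e-05, roughly 1 in 83,000 enrollments
```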
It’s been years since I talked about Identity Assurance Levels (IALs) in any detail, but I wanted to delve into two of the levels and see when IAL3 is necessary, and when it is not.
The U.S. National Institute of Standards and Technology has defined “identity assurance levels” (IALs) that can be used when dealing with digital identities. It’s helpful to review how NIST has defined the IALs.
Assurance in a subscriber’s identity is described using one of three IALs:
IAL1: There is no requirement to link the applicant to a specific real-life identity. Any attributes provided in conjunction with the subject’s activities are self-asserted or should be treated as self-asserted (including attributes a [Credential Service Provider] CSP asserts to an [Relying Party] RP). Self-asserted attributes are neither validated nor verified.
IAL2: Evidence supports the real-world existence of the claimed identity and verifies that the applicant is appropriately associated with this real-world identity. IAL2 introduces the need for either remote or physically-present identity proofing. Attributes could be asserted by CSPs to RPs in support of pseudonymous identity with verified attributes. A CSP that supports IAL2 can support IAL1 transactions if the user consents.
IAL3: Physical presence is required for identity proofing. Identifying attributes must be verified by an authorized and trained CSP representative. As with IAL2, attributes could be asserted by CSPs to RPs in support of pseudonymous identity with verified attributes. A CSP that supports IAL3 can support IAL1 and IAL2 identity attributes if the user consents.
For purposes of this post, IAL1 is (if I may use a technical term) a nothingburger. It may be good enough for a Gmail account, but these days even social media accounts are more likely to require IAL2.
So what’s the practical difference between IAL2 and IAL3?
If we ignore IAL1 and concentrate on IAL2 and IAL3, we can see one difference between the two. IAL2 allows remote, unsupervised identity proofing, while IAL3 requires (in practice) that any remote identity proofing is supervised.
Much of my time at my previous employer Incode Technologies involved unsupervised remote identity proofing (IAL2). For example, if a woman wants to set up an account at a casino, she can complete the onboarding process to set up the account on her phone, without anyone from the casino being present to make sure she isn’t faking her face or her ID. (Fraud detection is the “technologies” part of Incode Technologies, and that’s how they make sure she isn’t faking.)
IAL3, on the other hand, requires supervision even when the proofing is performed remotely. One example is NextgenID’s supervised remote in-person proofing (SRIP):

SRIP provides remote supervision of in-person proofing using NextgenID’s Identity Stations, an all-in-one system designed to securely perform all enrollment processes and workflow requirements. The station facilitates the complete and accurate capture at IAL levels 1, 2 and 3 of all required personal identity documentations and includes a full complement of biometric capture support for face, fingerprint, and iris.
Now there are some other differences between IAL2 and IAL3 in terms of the proofing, so NIST came up with a handy dandy chart that allows you to decide which IAL you need.
At this point, the agency understands that some level of proofing is required. Step 3 is intended to look at the potential impacts of an identity proofing failure to determine if IAL2 or IAL3 is the most appropriate selection. The primary identity proofing failure an agency may encounter is accepting a falsified identity as true, therefore providing a service or benefit to the wrong or ineligible person. In addition, proofing, when not required, or collecting more information than needed is a risk in and of itself. Hence, obtaining verified attribute information when not needed is also considered an identity proofing failure. This step should identify if the agency answered Step 1 and 2 incorrectly, realizing they do not need personal information to deliver the service. Risk should be considered from the perspective of the organization and to the user, since one may not be negatively impacted while the other could be significantly harmed. Agency risk management processes should commence with this step.
Even with the complexity of the flowchart, some determinations can be pretty simple. For example, if any of the six risks listed under question 3 are determined to be “high,” then you must use IAL3.
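In code form, that one branch of the flowchart might look like the following drastically simplified sketch (the risk names are my placeholders, and the moderate-risk mapping glosses over the flowchart’s other steps):

```python
def select_ial(risks: dict) -> str:
    """Drastically simplified sketch of one branch of NIST's IAL
    selection flowchart; NOT the full decision logic.

    risks maps each risk category (e.g., "personal_safety",
    "financial_loss") to "low", "moderate", or "high".
    """
    if any(level == "high" for level in risks.values()):
        return "IAL3"  # any high risk forces IAL3
    if any(level == "moderate" for level in risks.values()):
        return "IAL2"  # my simplification; the flowchart has more steps
    return "IAL1"

print(select_ial({"personal_safety": "high", "financial_loss": "low"}))  # IAL3
```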
But the whole exercise is a lot to work through, and you need to work through it yourself. When I pasted the PNG file for the flowchart above into this blog post, I noticed that the filename is “IAL_CYOA.png.” And we all know what “CYOA” means.
But if you do the work, you’ll be better informed on the procedures you need to use to verify the identities of people.
One footnote: although NIST is a U.S. organization, its identity assurance levels (including IAL2 and IAL3) are used worldwide, including by the World Bank. So everyone should be familiar with them.
After a lack of appearances in the Bredemarket blog (none since November), Pangiam is making an appearance again, based on announcements by Biometric Update and Trueface itself about a new revision of the Trueface facial recognition SDK.
The new revision includes a number of features, including a new model for masked faces and some technical improvements.
So what is this revision called?
Version 1.0.
“Wait,” you’re asking yourself. “Version 1.0 is the NEW version? It sounds like the ORIGINAL version. Shouldn’t the new version be 2.0?”
Well, no. The original version was V0. Trueface is now ready to release V1.
Well, almost ready.
If you go to the Trueface SDK reference page, you’ll see that Trueface releases are categorized as “alpha,” “beta,” and “stable.”
When I viewed the page on the afternoon of March 28, the latest stable release was 0.33.14634.
If you want to use the version 1.0 that is being “introduced” (Pangiam’s word), you have to go to the latest beta release, which was 1.0.16286.
And if you want to go bleeding edge alpha, you can get release 1.1.16419.
(Again, this was on the afternoon of March 28, and may change by the time you read this.)
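If you wanted to keep track of which channel is most recent, a hypothetical sketch (my illustration, not Trueface tooling) might compare the version strings numerically:

```python
# Hypothetical illustration, not Trueface tooling: compare the releases
# mentioned above by parsing each version string into a numeric tuple.
releases = {
    "stable": "0.33.14634",
    "beta": "1.0.16286",
    "alpha": "1.1.16419",
}

def parse(version):
    return tuple(int(part) for part in version.split("."))

newest = max(releases, key=lambda channel: parse(releases[channel]))
print(newest, releases[newest])  # alpha 1.1.16419 is the most recent
```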
Now most biometric vendors don’t expose this much detail about their software. Some don’t even provide any release information, especially for products with long delivery times where the version that a customer will eventually get doesn’t even have locked-down requirements yet. But Pangiam has chosen to provide this level of detail.
Oh, and Pangiam/Trueface also actively participates in the ongoing NIST FRVT testing. Information on the 1:1 performance of the trueface-003 algorithm can be found here. Information on the 1:N performance of the trueface-000 algorithm can be found here.
(When I wrote this in 2022 I used the then-current FRVT terminology. I’ve updated to FRTE as warranted.)
As I’ve noted before, there are a number of facial recognition companies that claim to be the #1 NIST facial recognition vendor. I’m here to help you cut through the clutter so you know who the #1 NIST facial recognition vendor truly is.
You can confirm this information yourself by visiting the NIST FRTE 1:1 Verification and FRTE 1:N Identification pages. The old FRVT, by the way, stood for “Face Recognition Vendor Test”—and has subsequently been replaced by FRTE, “Face Recognition Technology Evaluation.”
Now how can ALL dozen-plus of these entities be number 1?
Easy.
The NIST 1:1 and 1:N tests include many different accuracy and performance measurements, and each of the entities listed above placed #1 in at least one of these measurements. And all of the databases, database sizes, and use cases measure very different things.
Visage Technologies was #1 in the 1:1 performance measurements for template generation time, in milliseconds, for 480×720 and 960×1440 data.
Meanwhile, NEC was #1 in the 1:N Identification (T>0) accuracy measurements for gallery border, probe border with a delta T greater than or equal to 10 years, N = 1.6 million.
Not to be confused with the 1:N Identification (T>0) accuracy measurements for gallery visa, probe border, N = 1.6 million, where the #1 algorithm was not from NEC.
And not to be confused with the 1:N Investigation (R = 1, T = 0) accuracy measurements for gallery border, probe border with a delta T greater than or equal to 10 years, N = 1.6 million, where the #1 algorithm was not from NEC.
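To make the arithmetic behind all those #1 claims concrete, here’s a minimal sketch with made-up vendors and scores: with many (measurement, dataset) combinations, there are many distinct first places to win.

```python
# Made-up vendors and scores, just to show how many "#1" slots exist.
scores = {
    ("1:1 FNMR", "visaborder"): {"vendor_a": 0.003, "vendor_b": 0.004},
    ("1:1 template time, ms", "960x1440"): {"vendor_a": 180.0, "vendor_b": 45.0},
    ("1:N FNIR", "border, dT >= 10 yrs, N=1.6M"): {"vendor_a": 0.02, "vendor_b": 0.01},
}

for (measure, dataset), by_vendor in scores.items():
    winner = min(by_vendor, key=by_vendor.get)  # lower is better in all three
    print(f"#1 in {measure} / {dataset}: {winner}")
# Three categories, two different "#1 vendors" -- now scale that to
# hundreds of measurement/dataset combinations.
```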
And can I add a few more caveats?
First caveat: Since all of these tests are ongoing tests, you can probably find a slightly different set of #1 algorithms if you look at the January data, and you will probably find a slightly different set of #1 algorithms when the March data is available.
Second caveat: These are the results for the unqualified #1 NIST categories. You can add qualifiers, such as “#1 non-Chinese vendor” or “#1 western vendor” or “#1 U.S. vendor” to vault a particular algorithm to the top of the list.
Third caveat: You can add even more qualifiers, such as “within the top five NIST vendors” and (one I admit to having used before) “a top tier NIST vendor in multiple categories.” This can mean whatever you want it to mean. (As can “dramatically improved” algorithm, which may mean that you vaulted from position #300 to position #200 in one of the categories.)
Fourth caveat: Even if a particular NIST test applies to your specific use case, #1 performance on a NIST test does not guarantee that a facial recognition system supplied by that entity will yield #1 performance with your database in your environment. The algorithm sent to NIST may or may not make it into a production system. And even if it does, performance against a particular NIST test database may not yield the same results as performance against a Rhode Island criminal database, a French driver’s license database, or a Nigerian passport database. For more information on this, see Mike French’s LinkedIn article “Why agencies should conduct their own AFIS benchmarks rather than relying on others.”
So now that you know who the #1 NIST facial recognition vendor is, do you feel more knowledgeable?
Although I’ll grant that a NIST accuracy or performance claim is better than some other claims, such as self-test results.
As many of you know, there have been many claims about bias in facial recognition, which have even led to the formation of an Algorithmic Justice League.
Whoops, wrong Justice League. But you get the idea. “Gender Shades” and stuff like that, which I’ve written about before.
Back to Baker’s article, which makes a number of excellent points about bias in facial recognition, including the studies performed by NIST (referenced later in this post), but I loved one comparison in particular that Baker drew.
So technical improvements may narrow but not entirely eliminate disparities in face recognition. Even if that’s true, however, treating those disparities as a moral issue still leads us astray. To see how, consider pharmaceuticals. The world is full of drugs that work a bit better or worse in men than in women. Those drugs aren’t banned as the evil sexist work of pharma bros. If the gender differential is modest, doctors may simply ignore the difference, or they may recommend a different dose for women. And even when the differential impact is devastating—such as a drug that helps men but causes birth defects when taken by pregnant women—no one wastes time condemning those drugs for their bias. Instead, they’re treated like any other flawed tool, minimizing their risks by using a variety of protocols from prescription requirements to black box warnings.
As a (tangential) example of this, I recently read an article entitled “To begin addressing racial bias in medicine, start with the skin.” This article does not argue that we should ban dermatology because conditions are more often misdiagnosed in people with darker skin. Instead, the article argues that we should improve dermatology to reduce these biases.
In the same manner, the biometric industry and its stakeholders should strive to minimize bias in facial recognition and other biometrics, not ban the technologies. See NIST’s study (NISTIR 8280, PDF) in this regard, referenced in Baker’s article.
In addition to what Baker said, let me again note that facial recognition should be judged against the alternatives. While I believe that alternatives (even passwords) should be offered, consider that automated facial recognition supported by trained examiner review is much more accurate than witness (mis)identification, and I don’t think we want to rely solely on the latter.
Because falsely imprisoning someone due to non-algorithmic witness misidentification is as bad as kryptonite.
I’ve worked in the general area of contactless fingerprint capture for years, initially while working for a NIST CRADA partner. While most of the NIST CRADA partners are still pursuing contactless fingerprint technology, there are also new entrants.
In the pre-COVID days, the primary advantage of contactless fingerprint capture was speed. As I noted in an October 2021 post:
Actually this effort launched before that, as there were efforts in 2004 and following years to capture a complete set of fingerprints within 15 seconds; those efforts led, among other things, to the smartphone software we are seeing today.
By 2016, several companies had entered into cooperative research and development agreements with NIST to develop contactless fingerprint capture software, either for dedicated devices or for smartphones. Most of those early CRADA participants are still around today, albeit under different names.
I’ve previously written posts about two of these CRADA partners, Telos ID (previously Diamond Fortress) and Sciometrics (the supplier for Integrated Biometrics).
But these aren’t the only players in the contactless fingerprint market. There are always new entrants in a market where there is opportunity.
A month before I wrote my post about Integrated Biometrics/Sciometrics’ SlapShot, a company called Tech5 released its own product.
T5-AirSnap Finger uses a smartphone’s built-in camera to perform finger detection, enhancement, image processing and scaling, generating images that can be transmitted for identity verification or registration within seconds, according to the announcement. The resulting images are suitable for use with standard AFIS solutions, and comparison against legacy datasets…
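Tech5’s actual pipeline is proprietary, but the general shape of the steps the announcement lists (detection, enhancement, scaling) might look something like this generic OpenCV sketch (my illustration, not Tech5’s code; finger detection is omitted):

```python
import cv2

def contactless_finger_to_print(photo_path: str, scale: float):
    """Generic sketch of contactless capture processing, NOT Tech5's code:
    grayscale -> local contrast enhancement -> rescaling toward the
    ~500 ppi resolution that AFIS systems conventionally expect."""
    image = cv2.imread(photo_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Local contrast enhancement makes ridge/valley structure visible.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)

    # Real products estimate the scale factor from finger geometry or
    # depth so that ridge spacing approximates a contact scan.
    return cv2.resize(enhanced, None, fx=scale, fy=scale)
```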
Parthe has noted the importance of smartphone-based contactless fingerprint capture:
“We all carry these awesome computers in our hands,” Parthe explains. “It’s a perfectly packaged hardware device that is ideal for any capture technology. Smartphones are powerful compute devices on the edge, with a nice integrated camera with auto-focus and flash. And now phones also come with multiple cameras which can help with better focus and depth estimation. This allows the users to take photos of their fingers and the software takes care of the rest. I’d just like to point out here that we’re talking about using the phone’s camera to capture biometrics and using a smartphone to take the place of a dedicated reader. We’re not talking about the in-built fingerprint acquisition we’re all familiar with on many devices which is the means of accessing the device itself.”
I’ve made a similar point before. While dedicated devices may not completely disappear, multi-purpose devices that we already have are the preferable way to go.
For more information about T5-AirSnap Finger, visit this page.
Tech5’s results for NIST’s Proprietary Fingerprint Template (PFT) Evaluation III, possibly using an algorithm similar to that in T5-AirSnap Finger, are detailed here.
This report, currently published in draft form, reviews the methods that forensic laboratories use to interpret evidence containing a mixture of DNA from two or more people.
The problem of mixtures is more pronounced in DNA analysis than in analysis of other biometrics. You aren’t going to encounter two overlapping irises or two overlapping faces in the real world. (Well, not normally.)
You can certainly encounter overlapping voices (in a recorded conversation) or overlapping fingerprints (when two or more people touched the same item).
But there are methods to separate one biometric sample from another.
It’s a little more complicated when you’re dealing with DNA.
Distinguishing one person’s DNA from another in these mixtures, estimating how many individuals contributed DNA, determining whether the DNA is even relevant or is from contamination, or whether there is a trace amount of suspect or victim DNA make DNA mixture interpretation inherently more challenging than examining single-source samples. These issues, if not properly considered and communicated, can lead to misunderstandings regarding the strength and relevance of the DNA evidence in a case.
As some of you know, I have experience with “rapid DNA” instruments that provide a mostly-automated way to analyze DNA samples. Because these instruments are mostly automated and designed for use by non-scientific personnel, they are not able to analyze all of the types of DNA that would be analyzed by a forensic laboratory.
Therefore, this draft document is silent on the topic of rapid DNA, despite the fact that co-author Peter Vallone has years of experience in rapid DNA.
I am not a scientist, but in my view the absence of any reference to rapid DNA strongly suggests that it’s premature at this time to apply these instruments to DNA mixtures, such as rape cases in which both the assailant’s and the victim’s DNA are present in a sample.
Granted, there may be rape cases in which the DNA of the assailant may be present with no mixture.
You have to be REALLY careful before claiming that rapid DNA instruments can be used to wipe out the backlog of rape test kits. However, rapid DNA can be used to clear less complicated DNA cases so that the laboratories can concentrate on the more complex cases.
For those who have never looked at FRVT before, it does not merely report the accuracy results of searches against one database, but reports accuracy results for searches against eight different databases of different types and of different sizes (N).
Mugshot, Mugshot, N = 12,000,000
Mugshot, Mugshot, N = 1,600,000
Mugshot, Webcam, N = 1,600,000
Mugshot, Profile, N = 1,600,000
Visa, Border, N = 1,600,000
Visa, Kiosk, N = 1,600,000
Border, Border 10+YRS, N = 1,600,000
Mugshot, Mugshot 12+YRS, N = 3,000,000
This is actually good for the vendors who submit their biometric algorithms, because even if the algorithm performs poorly on one of the databases, it may perform wonderfully on one of the other seven. That’s how so many vendors can trumpet that their algorithm is the best. When you throw in other qualifiers such as “top five,” “best non-Chinese vendor,” and even “vastly improved,” you can see how dozens of vendors can issue “NIST says we’re the best” press releases.
Not that I knock the practice; after all, I myself have done this for years. But you need to know how to interpret these press releases, and what they’re really saying. Remember this when you read the vendor announcement toward the end of this post.
Anyway, I went to check the current results, which when you originally visit the page are sorted in the order of the fifth database, the Visa Border database. And this is what I saw this morning (October 27):
For the most part, the top five for the Visa Border test contain the usual players. North Americans will be most familiar with IDEMIA and NEC, and Cloudwalk and Sensetime have been around for a while.
A new algorithm from a not-so-new provider
But I had never noticed Cubox in the NIST testing before. And the number attached to the Cubox algorithm, “000,” indicates that this is Cubox’s first submission.
And Cubox did exceptionally well, especially for a first submission.
As you can see by the superscripts attached to each numeric value, Cubox had the second most accurate algorithm for the Visa Border test, the most accurate algorithm for the Visa Kiosk test, and placed no lower than 12th in the six (of eight) tests in which it participated. Considering that 302 algorithms have been submitted over the years, that’s pretty remarkable for a first-time submission.
Well, as an ex-IDEMIA employee, my curious nature kicked in.
The Cubox that submitted an algorithm to NIST is a South Korean firm with the website cubox.aero, self-described as “The Leading Provider in Biometrics” (aren’t they all?) with fingerprint and face solutions. Cubox competes in the access control and border control markets.
Cubox’s ten-year history and “overseas” page detail its growth in its markets and the solutions it has provided in South Korea, Mongolia, and Vietnam.
And although Cubox hasn’t trumpeted its performance on its own website (at least in the English version; I don’t know about the Korean version), Cubox has publicized its accomplishment in a LinkedIn post.
But before you get excited about the NIST results from Cubox, Sensetime, or any of the algorithm providers, remember that the NIST test is just a test. NIST cautions people about this, I have cautioned people about this (see the fourth point in this post), and Mike French has also discussed this.
However, it is also important to remember that NIST does not test operational systems, but rather technology submitted as software development kits or SDKs. Sometimes these submissions are labeled as research (or just not labeled), but in reality it cannot be known if these algorithms are included in the product that an agency will ultimately receive when they purchase a biometric system. And even if they are “the same”, the operational architecture could produce different results with the same core algorithms optimized for use in a NIST study.
The very fact that test results vary between the NIST databases explicitly tells you that a number one ranking on one database does not mean that you’ll get a number one ranking on every database. And as French reminds us, when you take an operational algorithm in an operational system with a customer database, the results may be quite different.
Which is why French recommends that any government agency purchasing a biometric system should conduct its own test, with vendor operational systems (rather than test systems) loaded with the agency’s own data.
Incidentally, if your agency needs a forensic expert to help with a biometric procurement or implementation, check out the consulting services offered by French’s company, Applied Forensic Services.