We’ve been talking about the death of the bicycle since the time of the Wright Brothers and Henry Ford.
But we still haven’t achieved it.
Wilbur Wright building a bicycle over a century ago, before he came to his senses. By Wright brothers – Library of Congress, reproduction number LC-DIG-ppprs-00540, Public Domain, https://commons.wikimedia.org/w/index.php?curid=2217030
What will it take to make the death of the bicycle a reality?
Why does the bicycle need to die?
I think that all intelligent people agree that the bicycle needs to die. But just to be extra-cautious, I will again enumerate the reasons why the death of the bicycle is absolutely necessary.
Photo by Adam Coppola, taken under contract for PeopleForBikes and released into the public domain with the consent of the subjects. CC0, https://commons.wikimedia.org/w/index.php?curid=46251073
The bicycle is too slow. Perhaps the bicycle was suitable for 19th century life, but today it’s an embarrassment. The speed of the bicycle has long been surpassed by automobiles from the aforementioned Ford, and airplanes from the aforementioned Wrights. It poses a danger as slow-moving bicycle traffic risks getting hit by faster-moving vehicles, unless extraordinary measures are undertaken to separate bicycles from normal traffic. For this reason alone the bicycle must die.
The bicycle is too weak. If that weren’t enough, take a look at the weakness of the bicycle and the huge threat from this weakness. You can completely destroy the bicycle and its rider with a simple puddle of oil, a nail, or a misplaced brick. This is yet another reason why the bicycle must die.
The bicycle is too inefficient. Other factors of transportation are much better equipped to carry loads of people and goods. The bicycle? Forget it. Any attempt to carry a reasonable load of goods on a bicycle is doomed to failure.
The bicycle is too easy to steal. It takes some effort to steal other factors of transportation, but it is pitifully easy to steal a bike, or part of a bike.
Despite everyone knowing about these security and personal threats for years if not decades, use of the bicycle continues to persist.
And we have to put a stop to it.
Why does the bicycle continue to live?
The problem is that a few wrongheaded individuals continue to promote bicycle use in a misguided way.
Some of them argue that bicycles provide health benefits that you can’t realize with other factors of transportation. Any so-called health benefits are completely erased by the damage that could happen when a bicycle rider ends up face down on the pavement.
Others argue that you can mitigate the problems with bicycles by requiring riders to change to a new bicycle every 90 days. This is also misguided, because even if you do this, the threats from bicycle use continue to occur from day one.
Make sure your bicycle has a wheel, spokes, seat, and drink holder, and don’t use any of the last six bicycles you previously used. By Havang(nl) – Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=2327525
How do we solve this?
People have tried to hasten the death of the bicycle, but its use still persists.
We have continued to advance other factors of transportation, both from the efforts of vendors, as well as the efforts of industry associations such as the International Bus and Infiniti Association (IBIA) and the MANX (Moving At Necessary eXpress) Alliance.
Yet resistance persists. Even the National Institute of Standards and Technology (NIST), which should know better, continues to define bicycle use as a standard factor of transportation.
The three most recognized factors of transportation include “something you pedal” (such as a bicycle), “something you drive” (such as an automobile), and “something you ride” (such as a bus).
NIST Special Publication 800-8-2. Link unavailable.
It is imperative that both governments and businesses completely ban use of the bicycle in favor of other forms of transportation. Our security as a nation depends on this.
Do your part to bring about the death of the bicycle in favor of other factors of transportation, and ensure that we will enjoy a bicycleless future.
A personal note
I don’t agree with anything I just wrote.
Despite its faults, I still believe that the bicycle has a proper place in our society, perhaps as one of several factors of transportation in an MFT (multi-factor transportation) arrangement.
And, if you haven’t figured it out yet, I’m not on board with the complete death of the password either. Passwords (and PINs) have their place. And when used properly they’re not that bad (even if these 2021 figures are off by an order of magnitude today).
I’ve talked about why NIST separated its FRVT efforts into FRTE and FATE.
But I haven’t talked about how NIST did this.
And as you all know, the second most important question after why is how.
Why the great renaming took place
As I noted back in August, NIST chose to split its Face Recognition Vendor Test (FRVT) into two parts—FRTE (Face Recognition Technology Evaluation) and FATE (Face Analysis Technology Evaluation).
In essence, the Face Recognition Vendor Test had become a hodgepodge of different things. Some of the older tests were devoted to identification of individuals (face recognition), while some of the newer tests were looking at issues other than individual identification (face analysis).
Of course, this confusion between identification and non-identification is nothing new, which is why some of the people who read Gender Shades falsely concluded that if the three algorithms couldn’t classify people by sex or race, they couldn’t identify them as individuals.
But I digress. (I won’t do it again.)
NIST explained at the time:
Tracks that involve the processing and analysis of images will run under the FATE activity, and tracks that pertain to identity verification will run under FRTE.
To date, most of my personal attention (and probably most of yours) has been paid to what was previously called FRVT 1:1 and FRVT 1:N.
These two tests are now part of FRTE, and were simply renamed to FRTE 1:1 and FRTE 1:N. They’ve even (for now) retained the same URLs, although that may change in the future.
Other tests now in the FRTE bucket include the “Still Face and Iris 1:N Identification” effort (PDF), which has apparently also been reclassified as an FRTE activity.
What is in FATE?
Obviously, presentation attack detection (PAD) testing falls into the FATE category, since this does not measure the identification of an individual, but whether a person is truly there or not. The first results have been released; I previously wrote about this here.
The next obvious category is age estimation testing, which again does not try to identify an individual, but estimate how old the person is. This testing has not yet started, but I talked about the concept of age estimation previously.
It is very possible that NIST will add additional FRTE and FATE tests in the future. These may be brand new tests, or variations of existing tests. For example, when all of us started wearing face masks a couple of years ago, NIST simulated face masks on its existing facial images and created the data for the face mask test described above.
What do you think NIST should test next, either in the FRTE or the FATE category?
More on morphing
And yes, I’m concluding this post with this video. By the way, this is the full version that (possibly intentionally) caused a ton of controversy and was immediately banned for nearly a quarter century. The morphing starts at 5:30. The crotch-grabbing starts right after the 7:00 mark.
Perhaps because of the lack of controversy with Godley & Creme’s earlier effort, Ashley Clark prefers it to the later Michael Jackson/John Landis effort.
Whereas Godley & Creme used editing technology to embrace and reflect the ambiguous murk of thwarted love, Jackson and Landis imposed an artificial sheen on the complexity of identity; a sheen that feels poignant if not outright tragic in the wake of Jackson’s ultimate appearance and fate. Really, it did matter if he was black or white.
One of the main application areas of facial morphing for criminal purposes is forging identity documents. The attack targets face-based identity verification systems and procedures. Most often it involves passports; however, any ID document with a photo can be compromised.
One well-known case happened in 2018 when a group of activists merged together a photo of Federica Mogherini, the High Representative of the European Union for Foreign Affairs and Security Policy, and a member of their group. Using this morphed photo, they managed to obtain an authentic German passport.
Well, the FATE side of the house has released its first two studies, including one entitled “Face Analysis Technology Evaluation (FATE) Part 10: Performance of Passive, Software-Based Presentation Attack Detection (PAD) Algorithms” (NIST Internal Report NIST IR 8491; PDF here).
I and countless others have spent the last several years referring to the National Institute of Standards and Technology’s Face Recognition Vendor Test, or FRVT. I guess some people have spent almost a quarter century referring to FRVT, because the term has been in use since 1999.
Starting now, you’re not supposed to use the FRVT acronym any more.
To bring clarity to our testing scope and goals, what was formerly known as FRVT has been rebranded and split into FRTE (Face Recognition Technology Evaluation) and FATE (Face Analysis Technology Evaluation). Tracks that involve the processing and analysis of images will run under the FATE activity, and tracks that pertain to identity verification will run under FRTE. All existing participation and submission procedures remain unchanged.
The change actually makes sense, since tasks such as age estimation and presentation attack detection (liveness detection) do not directly relate to the identification of individuals.
Us old folks just have to get used to the change.
I just hope that the new “FATE” acronym doesn’t mean that some algorithms are destined to perform better than others.
What’s more, these results change on a monthly basis, so it’s quite possible that the #1 vendor in some category in February 2022 was no longer the #1 vendor in March 2022. (And if your company markets years-old FRVT results, stop it!)
This is the August 15, 2023 peek at three ways to slice and dice the NIST FRVT results.
And a bunch of vendors will be mad at me because I didn’t choose THEIR preferred slicing and dicing, or their ways to exclude results (not including Chinese algorithms, not including algorithms used in surveillance, etc.). The mad vendors can write their own blog posts (or ask Bredemarket to ghostwrite them on their behalf).
NIST FRVT 1:1, VISABORDER
The phrase “NIST FRVT 1:1, VISABORDER” is shorthand for the NIST one-to-one version of the Face Recognition Vendor Test, using the VISABORDER probe and gallery data. This happens to be the default way in which NIST sorts the 1:1 accuracy results, but of course you can sort them against any other probe/gallery combination, and get a different #1 vendor.
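The “different #1 vendor per probe/gallery combination” point can be sketched in a few lines of Python. The vendor names and FNMR values below are made up for illustration; real figures come only from NIST’s published results pages.

```python
# Hypothetical sketch: ranking 1:1 accuracy results by dataset.
# All algorithm names and FNMR values here are invented, not NIST data.
results = [
    {"algorithm": "vendor_a-003", "dataset": "VISABORDER", "fnmr": 0.0031},
    {"algorithm": "vendor_b-007", "dataset": "VISABORDER", "fnmr": 0.0027},
    {"algorithm": "vendor_a-003", "dataset": "MUGSHOT",    "fnmr": 0.0040},
    {"algorithm": "vendor_c-001", "dataset": "VISABORDER", "fnmr": 0.0035},
]

def rank(results, dataset):
    """Return algorithms for one dataset, ranked by false non-match rate
    (lower is better)."""
    subset = [r for r in results if r["dataset"] == dataset]
    return sorted(subset, key=lambda r: r["fnmr"])

# Sorting against a different probe/gallery combination can crown a
# different #1 algorithm.
print(rank(results, "VISABORDER")[0]["algorithm"])  # vendor_b-007
print(rank(results, "MUGSHOT")[0]["algorithm"])     # vendor_a-003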
As of August 15, the top two accuracy algorithms for VISABORDER came from Cloudwalk. Here are all of the top ten.
But NIST doesn’t just measure accuracy for a bunch of different probe-target combinations. It also measures performance, since the most accurate algorithm in the world won’t do you any good if it takes forever to compare the face templates.
One caveat regarding these measures is that NIST conducts the tests on a standardized set of equipment, so that results between vendors can be compared. This is important to note, because a comparison that takes 103 milliseconds on NIST’s equipment will yield a different time on a customer’s equipment.
One of the many performance measures is “Comparison Time (Mate).” There is also a performance measure for “Comparison Time (Non-mate).”
So in this test, the fastest vendor algorithm comes from Trueface. Again, here are the top 10.
Now I know what some of you are saying. “John,” you say, “the 1:1 test only measures a comparison of one face against one other face, or what NIST calls verification. What if you’re searching one face against a database of faces, or identification?”
Well, NIST has a 1:N test to measure that particular use case. Or use cases, because again you can slice and dice the results in so many different ways.
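The verification/identification distinction can be sketched as follows. This is a toy illustration with made-up similarity scores and a made-up threshold, not how any NIST submission actually works: a real system compares face templates, and the operating threshold is tuned per deployment.

```python
# Toy sketch of the two use cases. Scores and threshold are invented.
THRESHOLD = 0.8  # illustrative operating point, not a NIST value

def verify(score: float) -> bool:
    """1:1 verification: does this face match one claimed identity?"""
    return score >= THRESHOLD

def identify(scores: dict[str, float]) -> list[str]:
    """1:N identification: which gallery entries match the probe?
    Returns candidates above threshold, best score first."""
    return sorted(
        (name for name, s in scores.items() if s >= THRESHOLD),
        key=lambda name: -scores[name],
    )

print(verify(0.91))                                          # True
print(identify({"alice": 0.91, "bob": 0.42, "carol": 0.85})) # ['alice', 'carol']
```

Verification answers a yes/no question about one pair; identification returns a ranked candidate list, which is why the two are tested (and measured) separately.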
When looking at accuracy, the default NIST 1:N sort is by:
Probe images from the BORDER database.
Gallery images from a 1,600,000 record VISA database.
Cloudwalk happens to be the #1 vendor in this slicing and dicing of the test. Here are the top ten.
The usual cautions apply that everyone, including NIST, emphasizes that these test results do not guarantee similar results in an operational environment. Even if the algorithm author ported its algorithm to an operational system with absolutely no changes, the operational system will have a different hardware configuration and will have different data.
For example, none of the NIST 1:N tests use databases with more than 12 million records. Even 20 years ago, Behnam Bavarian correctly noted that biometric databases would eventually surpass hundreds of millions of records, or even billions of records. There is no way that NIST could assemble a test database that large.
Iris recognition continues to make the news. Let’s review what iris recognition is and its benefits (and drawbacks), why Apple made the news last month, and why Worldcoin is making the news this month.
What is iris recognition?
There are a number of biometric modalities that can identify individuals by “who they are” (one of the five factors of authentication). A few examples include fingerprints, faces, voices, and DNA. All of these modalities purport to uniquely (or nearly uniquely) identify an individual.
One other way to identify individuals is via the irises in their eyes. I’m not a doctor, but presumably the Cleveland Clinic employs medical professionals who are qualified to define what the iris is.
The iris is the colored part of your eye. Muscles in your iris control your pupil — the small black opening that lets light into your eye.
But why use irises rather than, say, fingerprints and faces? The best person to answer this is John Daugman. (At this point several of you are intoning, “John Daugman.” With reason. He’s the inventor of iris recognition.)
(I)ris patterns become interesting as an alternative approach to reliable visual recognition of persons when imaging can be done at distances of less than a meter, and especially when there is a need to search very large databases without incurring any false matches despite a huge number of possibilities. Although small (11 mm) and sometimes problematic to image, the iris has the great mathematical advantage that its pattern variability among different persons is enormous.
Daugman, John, “How Iris Recognition Works.” IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 14, NO. 1, JANUARY 2004. Quoted from page 21. (PDF)
Or in non-scientific speak, one benefit of iris recognition is that you know it is accurate, even when submitting a pair of irises in a one-to-many search against a huge database. How huge? We’ll discuss later.
Brandon Mayfield and fingerprints
Remember that Daugman’s paper was released roughly two months before Brandon Mayfield was misidentified in a fingerprint comparison. (Everyone now intone “Brandon Mayfield.”)
While some of the issues associated with Mayfield’s misidentification had nothing to do with forensic science (Al Jazeera spends some time discussing bias, and Itiel Dror also looked at bias post-Mayfield), this still shows that fingerprints are remarkably similar and that it takes care to properly identify people.
Police agencies, witnesses, and faces
And of course there are recent examples of facial misidentifications (both by police agencies and witnesses), again not necessarily forensic science related, and again showing the similarity of faces from two different people.
At the root of iris recognition’s accuracy is the data-richness of the iris itself. The IrisAccess system captures over 240 degrees of freedom or unique characteristics in formulating its algorithmic template. Fingerprints, facial recognition and hand geometry have far less detailed input in template construction.
Enough about claims. What about real results? The IREX 10 test, independently administered by the U.S. National Institute of Standards and Technology, measures the identification (one-to-many) accuracy of submitted algorithms. At the time I am writing this, the ten most accurate algorithms provide false negative identification rates (FNIR) between 0.0022 ± 0.0004 and 0.0037 ± 0.0005 when two eyes are used. (Single eye accuracy is lower.) By the time you see this, the top ten algorithms may have changed, because the vendors are always improving.
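For those who want the FNIR metric made concrete, here is a minimal sketch. The outcomes below are simulated for illustration; only the order of magnitude is chosen to echo the IREX 10 figures quoted above.

```python
# Sketch: false negative identification rate (FNIR) is the fraction of
# searches for enrolled people that fail to return the correct person.
def fnir(mated_search_outcomes: list[bool]) -> float:
    """Each entry is True when the search returned the enrolled mate."""
    misses = sum(1 for hit in mated_search_outcomes if not hit)
    return misses / len(mated_search_outcomes)

# 10,000 simulated mated searches with 22 misses gives an FNIR of
# 0.0022, the same order as the top two-eye results quoted above.
outcomes = [True] * 9978 + [False] * 22
print(fnir(outcomes))
```

(NIST reports FNIR at a fixed false positive operating point; this sketch ignores that detail and just shows the miss-rate arithmetic.)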
IREX10 two-eye accuracy, top ten algorithms as of July 28, 2023. (Link)
While the IREX10 one-to-many tests are conducted against databases of less than a million records, it is estimated that iris one-to-many accuracy remains high even with databases of a billion people—something we will return to later in this post.
Iris drawbacks
OK, so if irises are so accurate, why aren’t we dumping our fingerprint readers and face readers and just using irises?
In short, because of the high friction in capturing irises. You can use high-resolution cameras to capture fingerprints and faces from far away, but as of now iris capture usually requires you to get very close to the capture device.
Iris image capture circa 2020 from the U.S. Federal Bureau of Investigation. (Link)
Which I guess is better than the old days when you had to put your eye right up against the capture device, but it’s still not as friendly (or unobtrusive) as face capture, which can be achieved as you’re walking down a passageway in an airport or sports stadium.
Irises and Apple Vision Pro
So how are irises being used today? You may or may not have heard last month’s hoopla about the Apple Vision Pro, which uses irises for one-to-one authentication.
I’m not going to spend a ton of time delving into this, because I just discussed Apple Vision Pro in June. In fact, I’m just going to quote from what I already said.
In short, as you wear the headset (which by definition is right on your head, not far away), the headset captures your iris images and uses them to authenticate you.
It’s a one-to-one comparison, not the one-to-many comparison that I discussed earlier in this post, but it is used to uniquely identify an individual.
But iris recognition doesn’t have to be used for identification.
Irises and Worldcoin
“But wait a minute, John,” you’re saying. “If you’re not using irises to determine if a person is who they say they are, then why would anyone use irises?”
Over the past several years, I’ve analyzed a variety of identity firms. Earlier this year I took a look at Worldcoin….Worldcoin’s World ID emphasizes privacy so much that it does not conclusively prove a person’s identity (it only proves a person’s uniqueness)…
That’s the only thing that I’ve said about Worldcoin, at least publicly. (I looked at Worldcoin privately earlier in 2023, but that report is not publicly accessible and even I don’t have it any more.)
The Worldcoin Foundation today announced that Worldcoin, a project co-founded by Sam Altman, Alex Blania and Max Novendstern, is now live and in a production-grade state.
The launch includes the release of the World ID SDK and plans to scale Orb operations to 35+ cities across 20+ countries around the world. In tandem, the Foundation’s subsidiary, World Assets Ltd., minted and released the Worldcoin token (WLD) to the millions of eligible people who participated in the beta; WLD is now transactable on the blockchain….
“In the age of AI, the need for proof of personhood is no longer a topic of serious debate; instead, the critical question is whether or not the proof of personhood solutions we have can be privacy-first, decentralized and maximally inclusive,” said Worldcoin co-founder and Tools for Humanity CEO Alex Blania. “Through its unique technology, Worldcoin aims to provide anyone in the world, regardless of background, geography or income, access to the growing digital and global economy in a privacy preserving and decentralized way.”
Worldcoin does NOT positively identify people…but it can still pay you
A very important note: Worldcoin’s purpose is not to determine identity (that a person is who they say they are). Worldcoin’s purpose is to determine uniqueness: namely, that a person (whoever they are) is unique among all the billions of people in the world. Once uniqueness is determined, the person can get money money money with an assurance that the same person won’t get money twice.
Iris biometrics outperform other biometric modalities and already achieved false match rates beyond 1.2 × 10⁻¹⁴ (one false match in one trillion[9]) two decades ago[10]—even without recent advancements in AI. This is several orders of magnitude more accurate than the current state of the art in face recognition.
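A quick back-of-envelope calculation shows why a per-comparison rate that small matters at Worldcoin scale. The false match rate is taken from the quote above; the one-billion gallery size is my assumption for illustration.

```python
# Back-of-envelope: expected false matches in one deduplication search.
fmr = 1.2e-14             # per-comparison false match rate (quoted above)
gallery = 1_000_000_000   # assumed: one billion enrolled irises

expected_false_matches_per_search = fmr * gallery
print(expected_false_matches_per_search)  # on the order of 1e-05

# Roughly one spurious duplicate per ~83,000 searches at this scale:
print(round(1 / expected_false_matches_per_search))
```

In other words, even searching every new enrollee against a billion existing irises, spurious duplicates would be expected only once every tens of thousands of enrollments, if the quoted rate holds.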
It’s been years since I talked about Identity Assurance Levels (IALs) in any detail, but I wanted to delve into two of the levels and see when IAL3 is necessary, and when it is not.
The U.S. National Institute of Standards and Technology has defined “identity assurance levels” (IALs) that can be used when dealing with digital identities. It’s helpful to review how NIST has defined the IALs.
Assurance in a subscriber’s identity is described using one of three IALs:
IAL1: There is no requirement to link the applicant to a specific real-life identity. Any attributes provided in conjunction with the subject’s activities are self-asserted or should be treated as self-asserted (including attributes a [Credential Service Provider] CSP asserts to an [Relying Party] RP). Self-asserted attributes are neither validated nor verified.
IAL2: Evidence supports the real-world existence of the claimed identity and verifies that the applicant is appropriately associated with this real-world identity. IAL2 introduces the need for either remote or physically-present identity proofing. Attributes could be asserted by CSPs to RPs in support of pseudonymous identity with verified attributes. A CSP that supports IAL2 can support IAL1 transactions if the user consents.
IAL3: Physical presence is required for identity proofing. Identifying attributes must be verified by an authorized and trained CSP representative. As with IAL2, attributes could be asserted by CSPs to RPs in support of pseudonymous identity with verified attributes. A CSP that supports IAL3 can support IAL1 and IAL2 identity attributes if the user consents.
For purposes of this post, IAL1 is (if I may use a technical term) a nothingburger. It may be good enough for a Gmail account, but these days even social media accounts are more likely to require IAL2.
So what’s the practical difference between IAL2 and IAL3?
If we ignore IAL1 and concentrate on IAL2 and IAL3, we can see one difference between the two. IAL2 allows remote, unsupervised identity proofing, while IAL3 requires (in practice) that any remote identity proofing is supervised.
Much of my time at my previous employer Incode Technologies involved unsupervised remote identity proofing (IAL2). For example, if a woman wants to set up an account at a casino, she can complete the onboarding process to set up the account on her phone, without anyone from the casino being present to make sure she wasn’t faking her face or her ID. (Fraud detection is the “technologies” part of Incode Technologies, and that’s how they make sure she isn’t faking.)
SRIP provides remote supervision of in-person proofing using NextgenID’s Identity Stations, an all-in-one system designed to securely perform all enrollment processes and workflow requirements. The station facilitates the complete and accurate capture at IAL levels 1, 2 and 3 of all required personal identity documentations and includes a full complement of biometric capture support for face, fingerprint, and iris.
Now there are some other differences between IAL2 and IAL3 in terms of the proofing, so NIST came up with a handy dandy chart that allows you to decide which IAL you need.
At this point, the agency understands that some level of proofing is required. Step 3 is intended to look at the potential impacts of an identity proofing failure to determine if IAL2 or IAL3 is the most appropriate selection. The primary identity proofing failure an agency may encounter is accepting a falsified identity as true, therefore providing a service or benefit to the wrong or ineligible person. In addition, proofing, when not required, or collecting more information than needed is a risk in and of itself. Hence, obtaining verified attribute information when not needed is also considered an identity proofing failure. This step should identify if the agency answered Step 1 and 2 incorrectly, realizing they do not need personal information to deliver the service. Risk should be considered from the perspective of the organization and to the user, since one may not be negatively impacted while the other could be significantly harmed. Agency risk management processes should commence with this step.
Even with the complexity of the flowchart, some determinations can be pretty simple. For example, if any of the six risks listed under question 3 are determined to be “high,” then you must use IAL3.
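The “any high risk forces IAL3” rule above can be sketched as a tiny function. The six category labels below are my paraphrase of the NIST SP 800-63-3 risk categories, and the function itself is my own illustration, not anything NIST publishes.

```python
# Hypothetical sketch of the IAL selection rule described above.
# Category names paraphrase NIST SP 800-63-3; the code is illustrative.
RISK_CATEGORIES = [
    "inconvenience_distress_or_damage_to_reputation",
    "financial_loss_or_liability",
    "harm_to_agency_programs_or_public_interests",
    "unauthorized_release_of_sensitive_information",
    "personal_safety",
    "civil_or_criminal_violations",
]

def select_ial(risk_levels: dict[str, str]) -> str:
    """Pick IAL2 or IAL3 from per-category risk levels (low/moderate/high).
    Assumes the agency has already decided some proofing is required."""
    if any(risk_levels.get(cat) == "high" for cat in RISK_CATEGORIES):
        return "IAL3"
    return "IAL2"

print(select_ial({"personal_safety": "high"}))                  # IAL3
print(select_ial({"financial_loss_or_liability": "moderate"}))  # IAL2
```

Of course the real flowchart asks more questions than this (including whether proofing is needed at all), which is why you still have to work through it yourself.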
But the whole exercise is a lot to work through, and you need to work through it yourself. When I pasted the PNG file for the flowchart above into this blog post, I noticed that the filename is “IAL_CYOA.png.” And we all know what “CYOA” means.
But if you do the work, you’ll be better informed on the procedures you need to use to verify the identities of people.
One footnote: although NIST is a U.S. organization, its identity assurance levels (including IAL2 and IAL3) are used worldwide, including by the World Bank. So everyone should be familiar with them.
After a lack of appearances in the Bredemarket blog (none since November), Pangiam is making an appearance again, based on announcements by Biometric Update and Trueface itself about a new revision of the Trueface facial recognition SDK.
The new revision includes a number of features, including a new model for masked faces and some technical improvements.
So what is this revision called?
Version 1.0.
“Wait,” you’re asking yourself. “Version 1.0 is the NEW version? It sounds like the ORIGINAL version. Shouldn’t the new version be 2.0?”
Well, no. The original version was V0. Trueface is now ready to release V1.
Well, almost ready.
If you go to the Trueface SDK reference page, you’ll see that Trueface releases are categorized as “alpha,” “beta,” and “stable.”
When I viewed the page on the afternoon of March 28, the latest stable release was 0.33.14634.
If you want to use the version 1.0 that is being “introduced” (Pangiam’s word), you have to go to the latest beta release, which was 1.0.16286.
And if you want to go bleeding edge alpha, you can get release 1.1.16419.
(Again, this was on the afternoon of March 28, and may change by the time you read this.)
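The channel ordering above follows directly from the version numbers, assuming the observed "major.minor.build" pattern. The channel names come from the SDK reference page; the parsing and ordering logic here is my own sketch, not Trueface's.

```python
# Sketch: ordering Trueface-style release strings numerically.
# Assumes a "major.minor.build" pattern (e.g. 0.33.14634, 1.0.16286).
def parse(version: str) -> tuple[int, ...]:
    """Split a dotted version string into comparable integer parts."""
    return tuple(int(part) for part in version.split("."))

# The March 28 snapshot described above:
releases = {
    "stable": "0.33.14634",
    "beta":   "1.0.16286",
    "alpha":  "1.1.16419",
}

# Newest release first: alpha ahead of beta ahead of stable.
ordered = sorted(releases, key=lambda ch: parse(releases[ch]), reverse=True)
print(ordered)  # ['alpha', 'beta', 'stable']
```

Comparing tuples of integers (rather than raw strings) matters here: as strings, "0.33.14634" would sort after "0.4.x", which is not what anyone means by a newer release.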
Now most biometric vendors don’t expose this much detail about their software. Some don’t even provide any release information, especially for products with long delivery times where the version that a customer will eventually get doesn’t even have locked-down requirements yet. But Pangiam has chosen to provide this level of detail.
Oh, and Pangiam/Trueface also actively participates in the ongoing NIST FRVT testing. Information on the 1:1 performance of the trueface-003 algorithm can be found here. Information on the 1:N performance of the trueface-000 algorithm can be found here.
(When I wrote this in 2022 I used the then-current FRVT terminology. I’ve updated to FRTE as warranted.)
As I’ve noted before, there are a number of facial recognition companies that claim to be the #1 NIST facial recognition vendor. I’m here to help you cut through the clutter so you know who the #1 NIST facial recognition vendor truly is.
You can confirm this information yourself by visiting the NIST FRTE 1:1 Verification and FRTE 1:N Identification pages. The old FRVT, by the way, stood for “Face Recognition Vendor Test”—and has subsequently been replaced by FRTE, “Face Recognition Technology Evaluation.”
Now how can ALL dozen-plus of these entities be number 1?
Easy.
The NIST 1:1 and 1:N tests include many different accuracy and performance measurements, and each of the entities listed above placed #1 in at least one of these measurements. And all of the databases, database sizes, and use cases measure very different things.
Visage Technologies was #1 in the 1:1 performance measurements for template generation time, in milliseconds, for 480×720 and 960×1440 data.
Meanwhile, NEC was #1 in the 1:N Identification (T>0) accuracy measurements for gallery border, probe border with a delta T greater than or equal to 10 years, N = 1.6 million.
Not to be confused with the 1:N Identification (T>0) accuracy measurements for gallery visa, probe border, N = 1.6 million, where the #1 algorithm was not from NEC.
And not to be confused with the 1:N Investigation (R = 1, T = 0) accuracy measurements for gallery border, probe border with a delta T greater than or equal to 10 years, N = 1.6 million, where the #1 algorithm was not from NEC.
And can I add a few more caveats?
First caveat: Since all of these tests are ongoing tests, you can probably find a slightly different set of #1 algorithms if you look at the January data, and you will probably find a slightly different set of #1 algorithms when the March data is available.
Second caveat: These are the results for the unqualified #1 NIST categories. You can add qualifiers, such as “#1 non-Chinese vendor” or “#1 western vendor” or “#1 U.S. vendor” to vault a particular algorithm to the top of the list.
Third caveat: You can add even more qualifiers, such as “within the top five NIST vendors” and (one I admit to having used before) “a top tier NIST vendor in multiple categories.” This can mean whatever you want it to mean. (As can “dramatically improved” algorithm, which may mean that you vaulted from position #300 to position #200 in one of the categories.)
Fourth caveat: Even if a particular NIST test applies to your specific use case, #1 performance on a NIST test does not guarantee that a facial recognition system supplied by that entity will yield #1 performance with your database in your environment. The algorithm sent to NIST may or may not make it into a production system. And even if it does, performance against a particular NIST test database may not yield the same results as performance against a Rhode Island criminal database, a French driver’s license database, or a Nigerian passport database. For more information on this, see Mike French’s LinkedIn article “Why agencies should conduct their own AFIS benchmarks rather than relying on others.”
So now that you know who the #1 NIST facial recognition vendor is, do you feel more knowledgeable?
Although I’ll grant that a NIST accuracy or performance claim is better than some other claims, such as self-test results.
As many of you know, there have been many claims about bias in facial recognition, which have even led to the formation of an Algorithmic Justice League.
Whoops, wrong Justice League. But you get the idea. “Gender Shades” and stuff like that, which I’ve written about before.
Back to Hall’s article, which makes a number of excellent points about bias in facial recognition, including the studies performed by NIST (referenced later in this post), but I loved one comparison that Baker wrote about.
So technical improvements may narrow but not entirely eliminate disparities in face recognition. Even if that’s true, however, treating those disparities as a moral issue still leads us astray. To see how, consider pharmaceuticals. The world is full of drugs that work a bit better or worse in men than in women. Those drugs aren’t banned as the evil sexist work of pharma bros. If the gender differential is modest, doctors may simply ignore the difference, or they may recommend a different dose for women. And even when the differential impact is devastating—such as a drug that helps men but causes birth defects when taken by pregnant women—no one wastes time condemning those drugs for their bias. Instead, they’re treated like any other flawed tool, minimizing their risks by using a variety of protocols from prescription requirements to black box warnings.
As a (tangential) example of this, I recently read an article entitled “To begin addressing racial bias in medicine, start with the skin.” This article does not argue that we should ban dermatology because conditions are more often misdiagnosed in people with darker skin. Instead, the article argues that we should improve dermatology to reduce these biases.
In the same manner, the biometric industry and stakeholder should strive to minimize bias in facial recognition and other biometrics, not ban it. See NIST’s study (NISTIR 8280, PDF) in this regard, referenced in Baker’s article.
In addition to what Baker said, let me again note that when judging the use of facial recognition, it should be compared against the alternatives. While I believe that alternatives should be offered, even passwords, consider that automated facial recognition supported by trained examiner review is much more accurate than witness (mis)identification. I don’t think we want to solely rely on that.
Because falsely imprisoning someone due to non-algorithmic witness misidentification is as bad as kryptonite.