I was encouraged to check out k-ID, a firm that tracks age compliance laws on a global basis. It also lets companies ensure that their users comply with these laws.
“Age assurance refers to a range of methods and technologies used to estimate, verify or confirm someone’s age. Different countries allow different methods, but here are a few commonly used by k-ID…”
The following methods are then listed:
Face Scan: Your age is estimated by completing a video selfie
ID Scan: Your age is confirmed by scanning a government-issued ID
Credit Card: Confirm you’re an adult by using a valid credit card
Note that k-ID’s age assurance methods include age estimation (via your face), age verification (via your government ID), and age who-knows-what (IMHO, possession of a credit card proves nothing, especially if it’s someone else’s).
But if k-ID truly applies the appropriate laws to age assurance, it’s a step in the right direction. Because keeping track of laws in hundreds of thousands of jurisdictions can be a…um…challenge.
An announcement from Paravision says its biometric age estimation technology has achieved Level 3 certification from the Age Check Certification Scheme (ACCS), the leading independent certification body for age estimation. The results make it one of only six companies globally to receive ACCS’s highest-level designation for compliance.
San Francisco-based Paravision’s age estimation tech posted 100 percent precision in Challenge 25 compliance, with 0 subjects falsely identified as over 25 years old. It also scored a 0 percent Failure to Acquire Rate, meaning that every image submitted for analysis returned a result. Mean Absolute Error (MAE) was 1.37 years, with Standard Deviation of 1.17.
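For the curious, here is roughly what those headline metrics mean in code. This is a minimal sketch with made-up numbers; the data and variable names are mine, not Paravision’s or ACCS’s:

```python
import statistics

# Hypothetical ground-truth ages and algorithm estimates; illustrative only.
# A None estimate would represent a failure to acquire.
true_ages = [19, 22, 24, 31, 45, 52]
estimates = [20.1, 23.5, 24.8, 29.9, 44.2, 53.0]

acquired = [(t, e) for t, e in zip(true_ages, estimates) if e is not None]

# Mean Absolute Error and its standard deviation, in years.
errors = [abs(e - t) for t, e in acquired]
mae = statistics.mean(errors)
sd = statistics.stdev(errors)

# Challenge 25 compliance: count subjects under 25 whose estimate clears 25.
# "0 subjects falsely identified as over 25" means this count is zero.
false_over_25 = sum(1 for t, e in acquired if t < 25 and e >= 25)

# Failure to Acquire Rate: share of images that returned no result at all.
fta_rate = (len(estimates) - len(acquired)) / len(estimates)
```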
Now this is an impressive achievement, and Paravision is a quality company, and Joey Pritikin is a quality biometric executive, but…well, let me share the other story first, involving a Yoti customer (not Yoti).
Fenix responded that it set a challenge threshold at 23 years of age. Any user estimated to be that age or younger based on their face biometrics is required to use a secondary method for age verification.
Fenix had set OnlyFans’ challenge age, it turns out, at 20 years old. A correction to 23 years old was carried out on January 16, and then Fenix changed it again three days later, to 21 years old, Ofcom says.
Now Biometric Update was very clear that “Yoti provides the tech, but does not set the threshold.”
Challenge ages and legal ages
But do challenge thresholds have any meaning? I addressed that issue back in May 2024.
Many of the tests used a “Challenge-T” policy, such as “Challenge 25.” In other words, the test doesn’t estimate whether a person IS a particular age, but whether a person is WELL ABOVE a particular age….
So if you have to be 21 to access a good or service, the algorithm doesn’t estimate if you are over 21. Instead, it estimates whether you are over 25. If the algorithm thinks you’re over 25, you’re good to go. If it thinks you’re 24, pull out your ID card.
And if you want to be more accurate, raise the challenge age from 25 to 28.
NIST admits that this procedure results in a “tradeoff between protecting young people and inconveniencing older subjects” (where “older” is someone who is above the legal age but below the challenge age).
You may be asking why the algorithms have to set a challenge age above the lawful age, thus inconveniencing people above the lawful age but below the challenge age.
The reason is simple.
Age estimation is not all that accurate.
I mean, it’s accurate enough if I (a person well above the age of 21 years) must indicate whether I’m old enough to drink, but it’s not sufficiently accurate for a drinker on their 21st birthday (in the U.S.), or a 13 year old getting their first social media account (where lawful).
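In code, a challenge policy is nothing more than a threshold check on the estimate, with an ID fallback for anyone below it. A minimal sketch of my own, not any vendor’s implementation:

```python
LEGAL_AGE = 21      # what the law requires
CHALLENGE_AGE = 25  # what the estimator is actually asked to clear

def challenge_25_gate(estimated_age: float) -> str:
    """Challenge 25 policy for a legal age of 21."""
    if estimated_age >= CHALLENGE_AGE:
        return "accept"      # the estimate alone is good enough
    # Anyone between LEGAL_AGE and CHALLENGE_AGE gets inconvenienced here.
    return "request ID"

print(challenge_25_gate(31.2))  # -> accept
print(challenge_25_gate(24.0))  # -> request ID, even though 24 is legal
```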
Not an official document.
If you have a government-issued ID, age verification based upon that ID is a much better (albeit less convenient) solution.
« L’internaute doit simplement effectuer 3 mouvements de la main et l’avant-bras devant la caméra de son écran (ordinateur, tablette, smartphone). En quelques secondes, il/elle vérifie son âge sans dévoiler son identité. »
Help me, Google Translate; you’re my only hope.
“The Internet user simply has to make 3 movements of the hand and forearm in front of the camera on their screen (computer, tablet, smartphone). In a few seconds, he/she verifies his/her age without revealing his/her identity.”
The method is derived from a 1994 scientific paper entitled “Rapid aimed limb movements: Age differences and practice effects in component submovements.” The abstract of the paper reads as follows:
“Two experiments are reported in which younger and older adults practiced rapid aimed limb movements toward a visible target region. Ss were instructed to make the movements as rapidly and as accurately as possible. Kinematic details of the movements were examined to assess the differences in component submovements between the 2 groups and to identify changes in the movements due to practice. The results revealed that older Ss produced initial ballistic submovements that had the same duration but traveled less far than those of younger Ss. Additionally, older Ss produced corrective secondary submovements that were longer in both duration and distance than those of the younger subjects. With practice, younger Ss modified their submovements, but older Ss did not modify theirs even after extensive practice on the task. The results show that the mechanisms underlying movements of older adults are qualitatively different from those in younger adults.”
So what does this mean? Needemand has a separate BorderAge website—thankfully in English—that illustrates the first part of the user instructions.
I don’t know what happens after that, but the process definitely has an “active liveness” vibe, except instead of proving you’re real, you’re proving you’re old, or old enough.
Now I’m not sure if the original 1994 study results were ever confirmed across worldwide populations. But it wouldn’t be the first scheme that is unproven. Do we KNOW that fingerprints are unique?
Another question I have concerns the granularity of the age estimation solution. Depending upon your use case and jurisdiction, you may have to show that your age is 13, 16, 18, 21, or 25. Not sure if BorderAge gets this granular.
But if you want a way to estimate age and preserve anonymity (the solution blocks faces and has too low of a resolution to capture friction ridges), BorderAge may fit the bill.
“So depending upon your needs, you can argue that”
This frame was followed by three differing answers to the “Where is ByteDance From?” question.
But isn’t there only one answer to the question? How can there be three?
It all depends upon your needs.
Who is the best age estimation vendor?
I shared an illustrative example of this last year. When the National Institute of Standards and Technology (NIST) tested its first six age estimation algorithms, it published the results for everyone to see.
“Because NIST conducts so many different tests, a vendor can turn to any single test in which it placed first and declare it is the best vendor.
“So depending upon the test, the best age estimation vendor (based upon accuracy and/or resource usage) may be Dermalog, or Incode, or ROC (formerly Rank One Computing), or Unissey, or Yoti. Just look for that “(1)” superscript….
“Out of the 6 vendors, 5 are the best. And if you massage the data enough you can probably argue that Neurotechnology is the best also.
“So if I were writing for one of these vendors, I’d argue that the vendor placed first in Subtest X, Subtest X is obviously the most important one in the entire test, and all the other ones are meaningless.”
Are you the best? Only if I’m writing for you
I will let you in on a little secret.
When I wrote things for IDEMIA, I always said that IDEMIA was the best.
When I wrote things for Incode, I always said that Incode was the best.
And when I write things for each of my Bredemarket clients, I always say that my client is the best.
I recently had to remind a prospect of this fact. This particular prospect has a very strong differentiator from its competitors. When the prospect asked for past writing samples, I included this caveat:
“I have never written about (TOPIC 1) or (TOPIC 2) from (PROSPECT’S) perspective, but here are some examples of my writing on both topics.”
I then shared four writing samples, including something I wrote for my former employer Incode about two years ago. I did this knowing that my prospect would disagree with my assertions that Incode’s product is so great…and greater than the prospect’s product.
If this loses me the business, I can accept that. Anyone with any product marketing experience in the identity industry is guaranteed to have said SOMETHING offensive to most of the 80+ companies in the industry.
How do I write for YOU?
But let’s say that you’re an identity firm and you decide to contract with Bredemarket anyway, even though I’ve said nice things about your competitors in the past.
How do we work together to ensure that I say nice things about you?
By the time we’re done, we will hopefully have painted a hero picture of your company, describing why you are the preferred solution for your customers—better than IDEMIA, Incode, or anyone else.
(Unless of course IDEMIA or Incode contracts with Bredemarket, in which case I will edit the sentence above just a bit.)
So let’s talk
If you would like to work with Bredemarket for differentiated content, proposal, or analysis work, book a free meeting on my “CPA” page.
As I’ve mentioned before, when the National Institute of Standards and Technology (NIST) tests biometric modalities such as finger and face, it conducts each test in a bunch of different ways.
But NIST doesn’t do this just to make the vendors happy. NIST does this because biometrics are used in many, many ways.
Let’s look at recent age estimation testing, which currently tests 15 algorithms rather than the original 6.
Governments and private entities can estimate ages for people at the pub, people buying weed, or people gambling. And then there’s the use case that is getting a lot of attention these days—people accessing social media.
Child Online Safety, Ages 13-16 (in my country anyway)
When NIST conceived the age estimation tests, the social media providers generally required their users to be 13 years of age or older. For this reason, one of NIST’s age estimation tests focused upon whether age estimation algorithms could reliably identify those who were 13 years old vs. those who were not.
Age 8-12 – False Positive Rates (FPR) are proportions of subjects aged 8 to 12 but whose age is estimated from 13 to 16 (below 17).
This covers the case in which a social media provider requires people to be 13 years old or older, someone between 8 and 12 tries to sign up for the social media service anyway…AND SUCCESSFULLY DOES SO.
You want the “false positive rate” to be as low as possible in this case, so that’s what NIST measures.
As of December 10, the best-performing algorithm of the 15 tested had a false positive rate (FPR) of 0.0467. The second was close at 0.0542, with the third at 0.0828.
The 15th was a distant last at 0.2929.
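To make the metric concrete, here’s how that false positive rate is defined, written against NIST’s published definition; the code itself is my own illustration:

```python
def fpr_age_8_to_12(records):
    """Proportion of subjects aged 8-12 whose age is estimated as 13-16.

    `records` is an iterable of (true_age, estimated_age) pairs.
    """
    in_scope = [(t, e) for t, e in records if 8 <= t <= 12]
    passed = [e for t, e in in_scope if 13 <= e < 17]  # estimated 13-16 (below 17)
    return len(passed) / len(in_scope) if in_scope else 0.0

# Two 10-year-olds estimated at 14 and 11: one slips through, so FPR = 0.5.
print(fpr_age_8_to_12([(10, 14.0), (10, 11.0)]))
```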
But the worst-tested algorithm is much better on other tests
But before you conclude that the 15th algorithm in the “8-12” test is a dud, take a look at how that same algorithm performed on some of the OTHER age estimation tests.
For the age 17-22 test (“False Positive Rates (FPR) are proportions of subjects aged 17 to 22 but whose age is estimated from 13 to 16 (below 17)”), this algorithm was the second MOST accurate.
And the algorithm is pretty good at correctly classifying 13-16 year olds.
It also performs well in the “challenge 25” tests (addressing some of the use cases I mentioned above such as alcohol purchases).
So it looks like this particular algorithm doesn’t (currently) do well with kids, but it does VERY well with adults.
So before you use the NIST tests as a starting point to determine if an algorithm is good for you, make sure you evaluate the CORRECT test, including the CORRECT data.
Normal people look forward to the latest album or movie. A biometric product marketing expert instead looks forward to an inaugural test report from the National Institute of Standards and Technology (NIST) on age estimation and verification using faces.
I’ve been waiting for this report for months now (since I initially mentioned it in July 2023), and in April NIST announced it would be available in the next few weeks.
NIST news release
Yesterday I learned of the report’s public availability via a NIST news release.
A new study from the National Institute of Standards and Technology (NIST) evaluates the performance of software that estimates a person’s age based on the physical characteristics evident in a photo of their face. Such age estimation and verification (AEV) software might be used as a gatekeeper for activities that have an age restriction, such as purchasing alcohol or accessing mature content online….
The new study is NIST’s first foray into AEV evaluation in a decade and kicks off a new, long-term effort by the agency to perform frequent, regular tests of the technology. NIST last evaluated AEV software in 2014….
(The new test) asked the algorithms to specify whether the person in the photo was over the age of 21.
Well, sort of. We’ll get to that later.
Current AEV results
I was in the middle of a client project on Thursday and didn’t have time to read the detailed report, but I did have a second to look at the current results. As with its other ongoing tests, NIST will update the age estimation and verification (AEV) results as these six vendors (and others) submit new algorithms.
Why does NIST test age estimation, or anything else?
The Information Technology Laboratory and its Information Access Division
NIST campus, Gaithersburg MD. From https://www.nist.gov/ofpm/historic-preservation-nist/gaithersburg-campus. I visited it once, when Safran’s acquisition of Motorola’s biometric business was awaiting government approval. I may or may not have spoken to a Sagem Morpho employee at this meeting, even though I wasn’t supposed to in case the deal fell through.
One of NIST’s six research laboratories is its Information Technology Laboratory (ITL), charged “to cultivate trust in information technology (IT) and metrology.” Since NIST is part of the U.S. Department of Commerce, Americans (and others) who rely on information technology need an unbiased source on the accuracy and validity of this technology. NIST cultivates trust through a myriad of independent tests.
Some of those tests are carried out by one of ITL’s six divisions, the Information Access Division (IAD). This division focuses on “human action, behavior, characteristics and communication.”
The difference between FRTE and FATE
While there is a lot of IAD “characteristics” work that excites biometric folks, including ANSI/NIST standard work, contactless fingerprint capture, the Fingerprint Vendor Technology Evaluation (ugh), and other topics, we’re going to focus on our new favorite acronyms, FRTE (Face Recognition Technology Evaluation) and FATE (Face Analysis Technology Evaluation). If these acronyms are new to you, I talked about them last August (and the deprecation of the old FRVT acronym).
Basically, the difference between “recognition” and “analysis” in this context is that recognition identifies an individual, while analysis identifies a characteristic of an individual. So the infamous “Gender Shades” study, which tested the performance of three algorithms in identifying people’s sex and race, is an example of analysis.
Age analysis
The age of a person is another example of analysis. In and of itself an age cannot identify an individual, since around 385,000 people are born every day. Even with lower birth rates when YOU were born, there are tens or hundreds of thousands of people who share your birthday.
They say it’s your birthday. It’s my birthday too, yeah. From https://www.youtube.com/watch?v=fkZ9sT-z13I. Paul’s original band never filmed a promotional video for this song.
And your age matters in the situations I mentioned above. Even when marijuana is legal in your state, you can’t sell it to a four year old. And that four year old can’t (or shouldn’t) sign up for Facebook either.
You can check a person’s ID, but that takes time and only works when a person has an ID. The only IDs that a four year old has are their passport (for the few who have one) and their birth certificate (which is non-standard from county to county and thus difficult to verify). And not even all adults have IDs, especially in third world countries.
Self-testing
So companies like Yoti developed age estimation solutions that didn’t rely on government-issued identity documents. The companies tested their performance and accuracy themselves (see the PDF of Yoti’s March 2023 white paper here). However, there are two drawbacks to this:
While I am certain that Yoti wouldn’t pull any shenanigans, results from a self-test always engender doubt. Is the tester truly honest about its testing? Does it (intentionally or unintentionally) gloss over things that should be tested? After all, the purpose of a white paper is for a vendor to present facts that lead a prospect to buy a vendor’s solution.
Even with its self-tests, Yoti did not have the ability (or the legal permission) to test the accuracy of its age estimation competitors.
How NIST tests age estimation
Enter NIST, where the scientists took a break from metrological testing or whatever to conduct an independent test. NIST asked vendors to participate in a test in which NIST personnel would run the test on NIST’s computers, using NIST’s data. This prevented the vendors from skewing the results; they handed their algorithms to NIST and waited several months for NIST to tell them how they did.
I won’t go into it here, but it’s worth noting that a NIST test is just a test, and test results may not be the same when you implement a vendor’s age estimation solution on CUSTOMER computers with CUSTOMER data.
The NIST internal report I awaited
NOW let’s turn to the actual report, NIST IR 8525 “Face Analysis Technology Evaluation: Age Estimation and Verification.”
NIST needed a set of common data to test the vendor algorithms, so it used “around eleven million photos drawn from four operational repositories: immigration visas, arrest mugshots, border crossings, and immigration office photos.” (These were provided by the U.S. Departments of Homeland Security and Justice.) All of these photos include the actual ages of the persons (although mugshots only include the year of birth, not the date of birth), and some include sex and country-of-birth information.
For each algorithm and each dataset, NIST recorded the mean absolute error (MAE), which is the mean number of years between the algorithm’s estimated age and the actual age. NIST also recorded other error measurements, and for certain tests (such as a test of whether or not a person is 17 years old) the false positive rate (FPR).
The challenge with the methodology
Many of the tests used a “Challenge-T” policy, such as “Challenge 25.” In other words, the test doesn’t estimate whether a person IS a particular age, but whether a person is WELL ABOVE a particular age. Here’s how NIST describes it:
For restricted-age applications such as alcohol purchase, a Challenge-T policy accepts people with age estimated at or above T but requires additional age assurance checks on anyone assessed to have age below T.
So if you have to be 21 to access a good or service, the algorithm doesn’t estimate if you are over 21. Instead, it estimates whether you are over 25. If the algorithm thinks you’re over 25, you’re good to go. If it thinks you’re 24, pull out your ID card.
And if you want to be more accurate, raise the challenge age from 25 to 28.
NIST admits that this procedure results in a “tradeoff between protecting young people and inconveniencing older subjects” (where “older” is someone who is above the legal age but below the challenge age).
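You can put rough numbers on that tradeoff. If you model the estimator’s error as roughly normal around the true age (my assumption here, not NIST’s), the probability that a legal-age adult gets challenged is just the probability that their estimate falls below the challenge age:

```python
import math

def challenge_probability(true_age: float, challenge_age: float, sigma: float) -> float:
    """P(estimated age < challenge_age), assuming error ~ Normal(0, sigma)."""
    z = (challenge_age - true_age) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# With a ~2-year error spread, a legal 22-year-old facing Challenge 25
# gets sent for an ID check about 93% of the time...
print(round(challenge_probability(22, 25, sigma=2.0), 2))
# ...while a 30-year-old almost never does.
print(round(challenge_probability(30, 25, sigma=2.0), 2))
```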
NIST also performed a variety of demographic tests that I won’t go into here.
What the NIST age estimation test says
OK, forget about all that. Let’s dig into the results.
So depending upon the test, the best age estimation vendor (based upon accuracy and/or resource usage) may be Dermalog, or Incode, or ROC (formerly Rank One Computing), or Unissey, or Yoti. Just look for that “(1)” superscript.
You read that right. Out of the 6 vendors, 5 are the best. And if you massage the data enough you can probably argue that Neurotechnology is the best also.
So if I were writing for one of these vendors, I’d argue that the vendor placed first in Subtest X, Subtest X is obviously the most important one in the entire test, and all the other ones are meaningless.
But the truth is what NIST said in its news release: there is no single standout algorithm. Different algorithms perform better based upon the sex or national origin of the people. Again, you can read the report for detailed results here.
What the report didn’t measure
NIST always clarifies what it did and didn’t test. In addition to the aforementioned caveat that this was a test environment that will differ from your operational environment, NIST provided some other comments.
The report excludes performance measured in interactive sessions, in which a person can cooperatively present and re-present to a camera. It does not measure accuracy effects related to disguises, cosmetics, or other presentation attacks. It does not address policy nor recommend AV thresholds as these differ across applications and jurisdictions.
Of course NIST is just starting this study, and could address some of these things in later studies. For example, its ongoing facial recognition accuracy tests never looked at the use case of people wearing masks until after COVID arrived and that test suddenly became important.
What about 22 year olds?
As noted above, the test used a Challenge 25 or Challenge 28 model, which measured whether a person who needed to be 21 appeared to be 25 or 28 years old. This makes sense, given that current age estimation technology measures MAE in years, not days. NIST calculated the “inconvenience” to people who are over 21 but under the challenge age (25 or 28) under this method.
While a lot of attention is paid to the use cases for 21 year olds (buying booze) and 18 year olds (viewing porn), states and localities have also paid a lot of attention to the use cases for 13 year olds (signing up for social media). In fact, some legislators are less concerned about a 20 year old buying a beer than a 12 year old receiving text messages from a Meta user.
NIST tests for these in the “child online safety” tests, particularly these two:
Age < 13 – False Positive Rates (FPR) are proportions of subjects aged below 13 but whose age is estimated from 13 to 16 (below 17).
Age ≥ 17 – False Positive Rates (FPR) are proportions of subjects aged 17 or older but whose age is estimated from 13 to 16.
However, the visa database is the only one that includes data of individuals with actual ages below age 13. The youngest ages in the other datasets are 14, or 18, or even 21, rendering them useless for the child online safety tests.
Why NIST researchers are great researchers
The mark of a great researcher is their ability to continue to get funding for their research, which is why so many scientific papers conclude with the statement “further study is needed.”
Here’s how NIST stated it:
Future work: The FATE AEV evaluation remains open, so we will continue to evaluate and report on newly submitted prototypes. In future reports we will: evaluate performance of implementations that can exploit having a prior known-age reference photo of a subject (see our API); consider whether video clips afford improved accuracy over still photographs; and extend demographic and quality analyses.
Translation: if Congress doesn’t continue to give NIST money, then high school students will get drunk or high, young teens will view porn, and kids will encounter fraudsters on Facebook. It’s up to you, Congress.
Vendors and researchers are paying a lot of attention to estimating ages by using a person’s face, and all of us are awaiting NIST’s results on its age estimation tests.
But before you declare dorsal hand features as the solution to age estimation, consider:
As the title states, the study only looked at females. No idea if my masculine hand features are predictive. (Anecdotally, more males work at tasks such as bricklaying that affect the hands, including the knuckle texture that was highlighted in the study.)
As the title states, the study only looked at people from India. No idea if my American/German/English/etc. hand features are predictive. (To be fair, the subjects had a variety of skin tones.)
The study only had 1,454 subjects. Better than a study that used fewer than 20 people, but still not enough. More research is needed.
And even with all of that, the mean absolute error in age estimation was over 4 years.
The Prism Project’s home page at https://www.the-prism-project.com/, illustrating the Biometric Digital Identity Prism as of March 2024. From Acuity Market Intelligence and FindBiometrics.
With over 100 firms in the biometric industry, offerings are naturally going to differ—even if all the firms are TRYING to copy each other and offer “me too” solutions.
I’ve worked for over a dozen biometric firms as an employee or independent contractor, and I’ve analyzed over 80 biometric firms in competitive intelligence exercises, so I’m well aware of the vast implementation differences between the biometric offerings.
Some of the implementation differences provoke vehement disagreements between biometric firms regarding which choice is correct. Yes, we FIGHT.
Let’s look at three (out of many) of these implementation differences and see how they affect YOUR company’s content marketing efforts—whether you’re engaging in identity blog post writing, or some other content marketing activity.
The three biometric implementation choices
Firms that develop biometric solutions make (or should make) the following choices when implementing their solutions.
Presentation attack detection. Assuming the solution incorporates presentation attack detection (liveness detection), or a way of detecting whether the presented biometric is real or a spoof, the firm must decide whether to use active or passive liveness detection.
Age assurance. When choosing age assurance solutions that determine whether a person is old enough to access a product or service, the firm must decide whether or not age estimation is acceptable.
Biometric modality. Finally, the firm must choose which biometric modalities to support. While there are a number of modality wars involving all the biometric modalities, this post is going to limit itself to the question of whether or not voice biometrics are acceptable.
I will address each of these questions in turn, highlighting the pros and cons of each implementation choice. After that, we’ll see how this affects your firm’s content marketing.
(I)nstead of capturing a true biometric from a person, the biometric sensor is fooled into capturing a fake biometric: an artificial finger, a face with a mask on it, or a face on a video screen (rather than a face of a live person).
This tomfoolery is called a “presentation attack” (because you’re attacking security with a fake presentation).
And an organization called iBeta is one of the testing facilities authorized to test in accordance with the standard and to determine whether a biometric reader can detect the “liveness” of a biometric sample.
(Friends, I’m not going to get into passive liveness and active liveness. That’s best saved for another day.)
Now I could cite a firm using active liveness detection to say why it’s great, or I could cite a firm using passive liveness detection to say why it’s great. But perhaps the most balanced assessment comes from facia, which offers both types of liveness detection. How does facia define the two types of liveness detection?
Active liveness detection, as the name suggests, requires some sort of activity from the user. If a system is unable to detect liveness, it will ask the user to perform some specific actions such as nodding, blinking or any other facial movement. This allows the system to detect natural movements and separate it from a system trying to mimic a human being….
Passive liveness detection operates discreetly in the background, requiring no explicit action from the user. The system’s artificial intelligence continuously analyses facial movements, depth, texture, and other biometric indicators to detect an individual’s liveness.
Pros and cons
Briefly, the pros and cons of the two methods are as follows:
While active liveness detection offers robust protection, requires clear consent, and acts as a deterrent, it is hard to use, complex, and slow.
Passive liveness detection offers an enhanced user experience via ease of use and speed and is easier to integrate with other solutions, but it incorporates privacy concerns (passive liveness detection can be implemented without the user’s knowledge) and may not be used in high-risk situations.
So in truth the choice is up to each firm. I’ve worked with firms that used both liveness detection methods, and while I’ve spent most of my time with passive implementations, the active ones can work also.
A perfect wishy-washy statement that will get BOTH sides angry at me. (Except perhaps for companies like facia that use both.)
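One common middle path, and roughly what facia’s own description of active liveness hints at, is a step-up flow: run passive detection first and escalate to an active challenge only when the passive score is inconclusive. A hypothetical sketch, not any particular vendor’s API:

```python
from enum import Enum
from typing import Callable

class Liveness(Enum):
    LIVE = "live"
    SPOOF = "spoof"

def check_liveness(passive_score: float,
                   run_active_challenge: Callable[[], bool]) -> Liveness:
    """Step-up liveness check: passive first, active only if inconclusive.

    `passive_score` is a 0-1 confidence from a passive detector, and
    `run_active_challenge` performs a blink/nod challenge and reports the
    outcome. Both, and the thresholds, are hypothetical.
    """
    if passive_score >= 0.9:   # confidently live; no user action needed
        return Liveness.LIVE
    if passive_score <= 0.1:   # confidently a spoof; reject outright
        return Liveness.SPOOF
    # Inconclusive: fall back to an explicit user challenge.
    return Liveness.LIVE if run_active_challenge() else Liveness.SPOOF
```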
If you need to know a person’s age, you can ask them. Because people never lie.
Well, maybe they do. There are two better age assurance methods:
Age verification, where you obtain a person’s government-issued identity document with a confirmed birthdate, confirm that the identity document truly belongs to the person, and then check the date of birth to determine whether the person is old enough to access the product or service.
Age estimation, where you don’t use a government-issued identity document and instead examine the face and estimate the person’s age.
I changed my mind on age estimation
I’ve gone back and forth on this. As I previously mentioned, my employment history includes time with a firm that produces driver’s licenses for the majority of U.S. states. And back when that firm was providing my paycheck, I was financially incentivized to champion age verification based upon the driver’s licenses that my company (or occasionally some inferior company) produced.
But as age assurance applications moved into other areas such as social media use, a problem occurred since 13 year olds usually don’t have government IDs. A few of them may have passports or other government IDs, but none of them have driver’s licenses.
But does age estimation work? I’m not sure if ANYONE has posted a non-biased view, so I’ll try to do so myself.
The pros of age estimation include its applicability to all ages including young people, its protection of privacy since it requires no information about the individual’s identity, and its ease of use since you don’t have to dig for your physical driver’s license or your mobile driver’s license—your face is already there.
The huge con of age estimation is that it is by definition an estimate. If I show a bartender my driver’s license before buying a beer, they will know whether I am 20 years and 364 days old and ineligible to purchase alcohol, or whether I am 21 years and 0 days old and eligible. Estimates aren’t that precise.
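Verification, by contrast, is exact to the day. The date arithmetic is trivial; in a real deployment the hard part is confirming that the ID is genuine and belongs to the presenter, not this calculation. A minimal sketch:

```python
from datetime import date

def is_at_least(dob: date, years: int, today: date) -> bool:
    """True if a person born on `dob` has reached `years` as of `today`."""
    try:
        birthday = dob.replace(year=dob.year + years)
    except ValueError:  # born February 29; treat March 1 as the birthday
        birthday = date(dob.year + years, 3, 1)
    return today >= birthday

# 20 years and 364 days old: ineligible. One day later: eligible.
print(is_at_least(date(2004, 6, 15), 21, today=date(2025, 6, 14)))  # False
print(is_at_least(date(2004, 6, 15), 21, today=date(2025, 6, 15)))  # True
```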
Fingerprints, palm prints, faces, irises, and everything up to gait. (And behavioral biometrics.) There are a lot of biometric modalities out there, and one that has been around for years is the voice biometric.
I’ve discussed this topic before, and the partial title of the post (“We’ll Survive Voice Spoofing”) gives away how I feel about the matter, but I’ll present both sides of the issue.
No one can deny that voice spoofing exists and is effective, but many of the examples cited by the popular press are cases in which a HUMAN (rather than an ALGORITHM) was fooled by a deepfake voice. But voice recognition software can also be fooled.
Take a study from the University of Waterloo, summarized here, that proclaims: “Computer scientists at the University of Waterloo have discovered a method of attack that can successfully bypass voice authentication security systems with up to a 99% success rate after only six tries.”
If you re-read that sentence, you will notice that it includes the words “up to.” Those words are significant if you actually read the article.
In a recent test against Amazon Connect’s voice authentication system, they achieved a 10 per cent success rate in one four-second attack, with this rate rising to over 40 per cent in less than thirty seconds. With some of the less sophisticated voice authentication systems they targeted, they achieved a 99 per cent success rate after six attempts.
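It’s worth doing the arithmetic on what “up to 99 per cent after six attempts” implies. If each attempt succeeds independently with probability p (an assumption of mine, for illustration), the chance of at least one success in n tries is 1 - (1 - p)^n, so the reported 99 percent over six tries works out to a per-attempt rate of roughly 54 percent against those weaker systems:

```python
# Cumulative success over n independent attempts is 1 - (1 - p)^n.
# Invert it to find the per-attempt rate implied by a reported figure.
def per_attempt_rate(cumulative: float, attempts: int) -> float:
    return 1.0 - (1.0 - cumulative) ** (1.0 / attempts)

print(per_attempt_rate(0.99, 6))  # ~0.54 for the less sophisticated systems
```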
Other voice spoofing studies
Similar to Gender Shades, the University of Waterloo study does not appear to have tested hundreds of voice recognition algorithms. But there are other studies.
The 2021 NIST Speaker Recognition Evaluation (PDF here) tested results from 15 teams, but this test was not specific to spoofing.
A test that was specific to spoofing was the ASVspoof 2021 test with 54 team participants, but the ASVspoof 2021 results are only accessible in abstract form, with no detailed results.
Another test, this one with results, is the SASV2022 challenge, with 23 valid submissions. Here are the top 10 performers and their error rates.
You’ll note that the top performers don’t have error rates anywhere near the University of Waterloo’s 99 percent.
So some firms will argue that voice recognition can be spoofed and thus cannot be trusted, while other firms will argue that the best voice recognition algorithms are rarely fooled.
What does this mean for your company?
Obviously, different firms are going to respond to the three questions above in different ways.
For example, a firm that offers face biometrics but not voice biometrics will convey how voice is not a secure modality due to the ease of spoofing. “Do you want to lose tens of millions of dollars?”
A firm that offers voice biometrics but not face biometrics will emphasize its spoof detection capabilities (and cast shade on face spoofing). “We tested our algorithm against that voice fake that was in the news, and we detected the voice as a deepfake!”
There is no universal truth here, and the message your firm conveys depends upon your firm’s unique characteristics.
And those characteristics can change.
Once when I was working for a client, that firm had made a particular choice on one of these three questions. Therefore, when I was writing for the client, I wrote in a way that argued the client’s position.
After I stopped working for this particular client, the client’s position changed and the firm adopted the opposite view of the question.
Therefore I had to message the client and say, “Hey, remember that piece I wrote for you that said this? Well, you’d better edit it, now that you’ve changed your mind on the question…”
Bear this in mind as you create your blog, white paper, case study, or other identity/biometric content, or have someone like the biometric content marketing expert Bredemarket work with you to create your content. There are people who sincerely hold the opposite belief of your firm…but your firm needs to argue that those people are, um, misinformed.