What if Machine Learning Models Can’t Get Generative AI Training Data?

An image of a neural network. By DancingPhilosopher – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=135594693

Machine learning models need training data to improve their accuracy—something I know from my many years in biometrics.

And it’s difficult to get that training data, something else I know from my many years in biometrics. Consider the acronyms GDPR, CPRA, and especially BIPA. It’s very hard to get data to train biometric algorithms, so they are trained on relatively limited data sets.

At the same time that biometric algorithm training data is limited, Kevin Indig believes that generative AI large language models are ALSO going to encounter limited accessibility to training data. Actually, they are already.

The lawsuits have already begun

A few months ago, generative AI models like ChatGPT were going to solve all of humanity’s problems and allow us to lead lives of leisure as the bots did all our work for us. Or potentially the bots would get us all fired. Or something.

But then people began to ask HOW these large language models work…and where they get their training data.

Just like the biometric firms that grab images and associated data from the web without asking permission (you know the example that I’m talking about), some are alleging that LLM developers are training their models on copyrighted content in violation of the law.

I am not a lawyer and cannot meaningfully discuss what is “fair use” and what is not, but suffice it to say that alleged victims are filing court cases.

Sarah Silverman et al and copyright infringement

Here’s one example from July:

Comedian and author Sarah Silverman, as well as authors Christopher Golden and Richard Kadrey — are suing OpenAI and Meta each in a US District Court over dual claims of copyright infringement.

The suits alleges, among other things, that OpenAI’s ChatGPT and Meta’s LLaMA were trained on illegally-acquired datasets containing their works, which they say were acquired from “shadow library” websites like Bibliotik, Library Genesis, Z-Library, and others, noting the books are “available in bulk via torrent systems.”

From https://www.theverge.com/2023/7/9/23788741/sarah-silverman-openai-meta-chatgpt-llama-copyright-infringement-chatbots-artificial-intelligence-ai

This could be a big mess, especially since copyright laws vary from country to country. This description of copyright law LLM implications, for example, is focused upon United Kingdom law. Laws in other countries differ.

And now the technical blocks are beginning

Just today, Kevin Indig highlighted another issue that could limit LLM access to online training data.

Some sites are already blocking the LLM crawlers

Systems that get data from the web, such as Google, Bing, and (relevant to us) ChatGPT, use “crawlers” to gather the information from the web for their use. ChatGPT, for example, has its own crawler.

By Yintan at English Wikipedia, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=63631702

Guess what Indig found out about ChatGPT’s crawler?

An analysis of the top 1,000 sites on the web from Originality AI shows 12% already block Chat GPT’s crawler. (source)

From https://www.kevin-indig.com/most-sites-will-block-chat-gpt/

But that only includes the sites that blocked the crawler when Originality AI performed its analysis.
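Sites usually express this kind of block in their robots.txt file, and OpenAI documents GPTBot as the user agent token for its crawler. Here is a minimal sketch, using only Python's standard library, of how you might test whether a given robots.txt shuts that crawler out. The sample robots.txt content and the function name blocks_crawler are my own illustrations, not anything taken from Originality AI's analysis.

```python
from urllib.robotparser import RobotFileParser


def blocks_crawler(robots_txt: str, user_agent: str = "GPTBot", path: str = "/") -> bool:
    """Return True if the robots.txt text disallows `user_agent` from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


# Hypothetical robots.txt: blocks GPTBot everywhere, allows every other crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for agent in ("GPTBot", "Googlebot"):
        print(agent, "blocked:", blocks_crawler(SAMPLE_ROBOTS_TXT, agent))
```

Note that robots.txt is a request, not an enforcement mechanism. It only works against crawlers that choose to honor it.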

More sites will block the LLM crawlers

Indig believes that in the future, the share of the top 1,000 sites that block ChatGPT’s crawler will rise significantly…to 84%. His belief is based on analyzing the business models of the sites that already block ChatGPT and assuming that other sites that use the same business models will also find it in their interest to block ChatGPT.

The business models that won’t block ChatGPT are assumed to include governments, universities, and search engines. Such sites are friendly to the sharing of information, and thus would have no reason to block ChatGPT or any other LLM crawler.

The business models that would block ChatGPT are assumed to include publishers, marketplaces, and many others. Entities using these business models are not just going to turn their content over to an LLM for free.
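The arithmetic behind this kind of forecast is simple: label each site with a business model, decide which models are likely to block, and compute the share. The sketch below is only an illustration of that reasoning. The site list is made up, and the two-model set of likely blockers is a deliberate simplification of Indig's categories (he includes "many others").

```python
# Simplified assumption: only these business models are treated as likely blockers.
LIKELY_BLOCKERS = {"publisher", "marketplace"}


def projected_block_share(site_models):
    """Return the fraction of sites whose business model is a likely blocker."""
    if not site_models:
        raise ValueError("need at least one site")
    blocked = sum(1 for model in site_models if model in LIKELY_BLOCKERS)
    return blocked / len(site_models)


# Hypothetical sample of five sites, NOT real data from the top 1,000.
HYPOTHETICAL_SITES = ["publisher", "publisher", "marketplace", "government", "university"]

if __name__ == "__main__":
    share = projected_block_share(HYPOTHETICAL_SITES)
    print(f"Projected share blocking the crawler: {share:.0%}")
```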

As Indig explains regarding the top two blocking business models:

By Karl Thomas Moore – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=58968347

For publishers, content is the product. Giving it away for free to generative AI means foregoing most if not all, ad revenue. Publishers remember the revenue drops caused by social media and modern search engines in the late 2,000s.

Marketplaces build their own AI assistants and don’t want competition.

From https://www.kevin-indig.com/most-sites-will-block-chat-gpt/

What does this mean for LLMs?

One possibility is that LLMs will run into the same training issues as biometric algorithms.

  • In biometrics, the same people that loudly exclaim that biometric algorithms are racist would be horrified at the purely technical solution that would solve all inaccuracy problems—let the biometric algorithms train on ALL available biometric data. In the activists’ view (and in the view of many), unrestricted access to biometric data for algorithmic training would be a privacy nightmare.
  • Similarly, those who complain that LLMs are woefully inaccurate would be horrified if the LLM accuracy problem were solved by a purely technical solution: let the algorithms train themselves on ALL available data.

Could LLMs buy training data?

Of course, there’s another solution to the problem: have the companies SELL their data to the LLMs.

By Nic McPhee from Morris, Minnesota, USA – London – 14-15 Dec 2007 – 034, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=10606179

In theory, this could provide the data holders with a nice revenue stream while allowing the LLMs to be extremely accurate. (Of course the users who actually contribute the data to the data holders would probably be shut out of any revenue, but them’s the breaks.)

But that’s only in theory. Based upon past experience with data holders, the people who want to use the data are probably not going to pay the data holders sufficiently.

Google and Meta to Canada: Drop dead / Mourir

By The original uploader was Illegitimate Barrister at Wikimedia Commons. The current SVG encoding is a rewrite performed by MapGrid. – This vector image is generated programmatically from geometry defined in File:Flag of Canada (construction sheet – leaf geometry).svg., Public Domain, https://commons.wikimedia.org/w/index.php?curid=32276527

Even today, Google and Meta (Facebook et al) are greeting Canada’s government-mandated Bill C-18 with resistance. Here’s what Google is saying:

Bill C-18 requires two companies (including Google) to pay for simply showing links to Canadian news publications, something that everyone else does for free. The unprecedented decision to put a price on links (a so-called “link tax”) breaks the way the web and search engines work, and exposes us to uncapped financial liability simply for facilitating access to news from Canadian publications….

As a result, we have informed them that we have made the difficult decision that, when the law takes effect, we will be removing links to Canadian news publications from our Search, News, and Discover products.

From https://blog.google/canada-news-en/#overview

But wait, it gets better:

In addition, we will no longer be able to operate Google News Showcase – our product experience and licensing program for news – in Canada.

From https://blog.google/canada-news-en/#overview

Google News Showcase is the program that gives money to news organizations in Canada. Meta has a similar program. Peter Menzies notes that these programs give tens of millions of (Canadian) dollars to news organizations, but that could end, despite government threats.

The federal and Quebec governments pulled their advertising spends, but those moves amount to less money than Meta will save by ending its $18 million in existing journalism funding. 

From https://thehub.ca/2023-09-15/peter-menzies-the-media-is-boycotting-meta-and-nobody-cares/

What’s next?

Bearing in mind that Big Tech is reluctant to give journalistic data holders money even when a government ORDERS that they do so…

…what is the likelihood that generative AI algorithm authors (including Big Tech companies like Google and Microsoft) will VOLUNTARILY pay funds to data holders for algorithm training?

If Kevin Indig is right, LLM training data will become extremely limited, adversely affecting the algorithms’ use.

There’s a Reason Why “Tech” is a Four-Letter Word

By Tomia, original image en:User:Polylerus – Own work (Vector drawing based on Image:Profanity.JPG), Public Domain, https://commons.wikimedia.org/w/index.php?curid=3332425

We often use the phrase “four-letter word” to refer to cuss words that shouldn’t be said in polite company. Occasionally, we have our own words that we personally consider to be four-letter words. (Such as “BIPA.”)

There are some times when we resign ourselves to the fact that “tech” can be a four-letter word also. But there’s actually a good reason for the problems we have with today’s technology.

Tech can be dim

Just this week I was doing something on my smartphone and my screen got really dim all of a sudden, with no explanation.

So I went to my phone’s settings, and my brightness setting was down at the lowest level.

For no reason.

“Any sufficiently advanced technology is indistinguishable from magic.”

– Arthur C. Clarke, quoted here.

So I increased my screen’s brightness, and everything was back to normal. Or so I thought.

A little while later, my screen got dim again, so I went to the brightness setting…and was told that my brightness was very high. (Could have fooled me.)

I can’t remember what I did next (because when you are trying to fix something you can NEVER remember what you did next), but later my screen brightness was fine.

For no reason.

Was Arthur C. Clarke right? And if so, WHY was he right?

Perhaps it’s selective memory, but I don’t recall having this many technology problems when I was younger.

The shift to multi-purpose devices

Part of the reason for the increasing complexity of technology is that we make fewer and fewer single-purpose devices, and are manufacturing more and more multi-purpose devices.

One example of the shift: if I want to write a letter today, I can write it on my smartphone. (Assuming the screen is bright enough.) This same smartphone can perform my banking activities, play games, keep track of Bredemarket’s earnings…oh, and make phone calls.

Smartphones are an example of technological convergence:

Technological convergence is a term that describes bringing previously unrelated technologies together, often in a single device. Smartphones might be the best possible example of such a convergence. Prior to the widespread adoption of smartphones, consumers generally relied on a collection of single-purpose devices. Some of these devices included telephones, wrist watches, digital cameras and global positioning system (GPS) navigators. Today, even low-end smartphones combine the functionality of all these separate devices, easily replacing them in a single device.

From a consumer perspective, technological convergence is often synonymous with innovation.

From https://www.techtarget.com/searchdatacenter/definition/technological-convergence

And the smartphone example certainly demonstrates innovation from the previous-generation single-purpose devices.

When I was a kid, if I wanted to write a letter, I had two choices:

  1. I could set a piece of paper on the table and write the letter with a writing implement such as a pen or pencil.
  2. I could roll a piece of paper into a typewriter and type the letter.

These were, for the most part, single-purpose devices. Sure, I could make a paper airplane out of the piece of paper, but I couldn’t use the typewriter to play a game or make a phone call.

Turning our attention to the typewriter, it certainly was a manufacturing marvel, and intricate precision was required to design the hammers that would hit the typewriter ribbon and leave their impressions on the piece of paper. And typewriters could break, and repairmen (back then they were mostly men) could fix them.

A smartphone is much more innovative than a typewriter. But it’s infinitely harder to figure out what is wrong with a smartphone.

The smartphone hardware alone is incredibly complex, with components from a multitude of manufacturers. Add the complexities of the operating system and all the different types of software that are loaded on a smartphone, and a single problem could result from a myriad of causes.

No wonder it seems like magic, even for the best of us.

Explaining technology

But this complexity has provided a number of jobs:

  • The helpful person at your cellular service provider who has acquired just enough information to recognize and fix an errant application.
  • The many people in call centers (the legitimate call centers, not the “we found a problem with your Windows computer” call scammers) who perform the same tasks at a distance.
  • All the people who write instructions on how to use and fix all of our multi-purpose devices, from smartphones to computers to remote controls.
By Earl Andrew at English Wikipedia – Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=17793658

Oh, and the people that somehow have to succinctly explain to prospects why these multi-purpose devices are so great.

Because no one’s going to run into problems with technology unless they acquire the technology. And your firm has to get them to acquire your technology.

Crafting a technology marketing piece

So your firm’s marketer or writer has to craft some type of content that will make a prospect aware of your technology, and/or induce the prospect to consider purchasing the technology, and/or ideally convert the prospect into a paying customer.

Before your marketer or writer crafts the content, they have to answer some basic questions.

By Evan-Amos – Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=11293857

Using a very simple single-purpose example of a hammer, here are the questions with explanations:

  • Why does the prospect need this technology? And why do you provide this technology? This rationale for why you are in business, and why your product exists, will help you make the sale. Does your prospect want to buy a hammer from a company that got tired of manufacturing plastic drink stirrers, or do they want to buy a hammer from a forester who wants to empower people to build useful items?
  • How does your firm provide this technology? If I want to insert a nail into a piece of wood, do I need to attach your device to an automobile or an aircraft carrier? No, the hammer will fit in your hand. (Assuming you have hands.)
  • What is the technology? Notice that the “why” and “how” questions come before the “what” question, because “why” and “how” are more critical. But you still have to explain what the technology is (with the caveat I mention below). Perhaps some of your prospects have no idea what a hammer is. Don’t assume they already know.
  • What is the goal of the technology? Does a hammer help you floss your teeth? No, it puts nails into wood.
  • What are the benefits of the technology? Although I previously said that you should explain what the technology is, most prospects aren’t looking for detailed schematics. They primarily care about what the technology will do for them. For example, that hammer can keep their wooden structure from falling down. They don’t care about the exact composition of the metal in the hammer head.
  • Finally, who is the target audience for the technology? I don’t want to read through an entire marketing blurb and order a basic hammer, only to discover later that the product won’t help me keep two diamonds together but is really intended for wood. So don’t send an email to jewelers about your hammer. They have their own tools.
By Mauro Cateb – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=90944472

(UPDATE OCTOBER 23, 2023: “SIX QUESTIONS YOUR CONTENT CREATOR SHOULD ASK YOU” IS SO 2022. DOWNLOAD THE NEWER “SEVEN QUESTIONS YOUR CONTENT CREATOR SHOULD ASK YOU” HERE.)

Once you answer these questions (more about the six questions in the Bredemarket e-book available here), your marketer or writer can craft your content.

Or, if you need help, Bredemarket (the technology content marketing expert) can craft your content, whether it’s a blog post, case study, white paper, or something else.

I’ve helped other technology firms explain their “hammers” to their target audiences, explaining the benefits, and answering the essential “why” questions about the hammers.

Can I help your technology firm communicate your message? Contact me.


Five Topics a Biometric Content Marketing Expert Needs to Understand

As a child, did you sleep at night dreaming that someday you could become a biometric content marketing expert?

I didn’t either. Frankly, I didn’t even work in biometrics professionally until I was in my 30s.

If you have a mad adult desire to become a biometric content marketing expert, here are five topics that I (a self-styled biometric content marketing expert) think you need to understand.

Topic One: Biometrics

Sorry to be Captain Obvious, but if you’re going to talk about biometrics you need to know what you’re talking about.

The days in which an expert could confine themselves to a single biometric modality are long past. Why? Because once you declare yourself an iris expert, someone is bound to ask, “How does iris recognition compare to facial recognition?”

Only some of the Biometrics Institute’s types of biometrics. Full list at https://www.biometricsinstitute.org/what-is-biometrics/types-of-biometrics/

And there are a number of biometric modalities. In addition to face and iris, the Biometrics Institute has cataloged a list of other biometric modalities, including fingerprints/palmprints, voice, DNA, vein, finger/hand geometry, and some more esoteric ones such as gait, keystrokes, and odor. (I wouldn’t want to manage the NIST independent testing for odor.)

As far as I’m concerned, the point isn’t to select the best biometric and ignore all the others. I’m a huge fan of multimodal biometrics, in which a person’s identity is verified or authenticated by multiple biometric types. It’s harder to spoof multiple biometrics than it is to spoof a single one. And even if you spoof two of them, what if the system checks for odor and you haven’t spoofed that one yet?
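To make the multimodal point concrete, here is a small sketch of one possible decision rule: accept only when every required modality meets its threshold. The modality names, scores, and threshold values are all hypothetical, and real systems often use more sophisticated score fusion than this simple all-must-pass rule.

```python
def multimodal_accept(scores, thresholds):
    """Accept only if every required modality has a score at or above its threshold.

    A modality that is missing from `scores` counts as a failure.
    """
    return all(scores.get(modality, 0.0) >= minimum
               for modality, minimum in thresholds.items())


# Hypothetical thresholds for a system that checks three modalities.
REQUIRED = {"face": 0.90, "iris": 0.90, "odor": 0.80}

if __name__ == "__main__":
    # A spoofer who has faked face and iris but has no odor sample is rejected.
    spoof_attempt = {"face": 0.95, "iris": 0.97}
    print("Spoof accepted:", multimodal_accept(spoof_attempt, REQUIRED))
```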

Topic Two: All the other factors

In the same way that I don’t care for people who select one biometric and ignore the others, I don’t care for some in the “passwords are dead” crowd who go further and say, “Passwords are dead. Use biometrics instead.”

Although I admire the rhyming nature of the phrase.

If you want a robust identity system, you need to use multiple factors in identity verification and authentication.

  • Something you know.
  • Something you have.
  • Something you are (i.e. biometrics).
  • Something you do.
  • Somewhere you are.

Again, use of multiple factors protects against spoofing. Maybe someone can create a gummy fingerprint, but can they also create a fake passport AND spoof the city in which you are physically located?

From https://www.youtube.com/shorts/mqfHAc227As

Don’t assume that biometrics answers all the ills of the world. You need other factors.
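One way to see why two biometrics are not the same thing as two factors: count the distinct factor categories, not the number of credentials. The sketch below is my own illustration of the five factors listed above. The class and function names are hypothetical, and the credentials in the example are placeholders.

```python
from enum import Enum


class Factor(Enum):
    KNOW = "something you know"
    HAVE = "something you have"
    ARE = "something you are"
    DO = "something you do"
    WHERE = "somewhere you are"


def distinct_factors(evidence):
    """Return the set of factor categories present in (factor, credential) pairs."""
    return {factor for factor, _credential in evidence}


def is_multifactor(evidence, minimum=2):
    """True when the evidence spans at least `minimum` different factor categories."""
    return len(distinct_factors(evidence)) >= minimum


if __name__ == "__main__":
    two_biometrics = [(Factor.ARE, "fingerprint"), (Factor.ARE, "face")]
    mixed = [(Factor.ARE, "fingerprint"), (Factor.HAVE, "passport"), (Factor.WHERE, "city")]
    print("Two biometrics multifactor?", is_multifactor(two_biometrics))
    print("Mixed evidence multifactor?", is_multifactor(mixed))
```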

And if you master these factors, you are not only a biometric content marketing expert, but also an identity content marketing expert.

Topic Three: How biometrics are used

It’s not enough to understand the technical ins and outs of biometric capture, matching, and review. You need to know how biometrics are used.

  • One-to-one vs. one-to-many. Is the biometric that you acquire only compared to a single biometric sample, or to a database of hundreds, thousands, millions, or billions of other biometric samples?
  • Markets. When I started in biometrics, I only participated in two markets: law enforcement (catch bad people) and benefits (get benefit payments to the right people). There are many other markets. Just recently I have written about financial identity and educational identity. I’ve worked with about a dozen other markets personally, and there are many more.
  • Use cases. Related to markets, you need to understand the use cases that biometrics can address. Taking the benefits example, there’s a use case in which a person enrolls for benefits, and the government agency wants to make sure that the person isn’t already enrolled under another name. And there’s a use case when benefits are paid to make sure that the authorized recipient receives their benefits, and no one else receives their benefits.
  • Legal and privacy issues. It is imperative that you understand the legal ramifications that affect your chosen biometric use case in your locality. For example, if your house has a doorbell camera that uses “familiar face detection” to identify the faces of people that come to your door, and the people that come to your door are residents of the state of Illinois, you have a BIG BIPA (Biometric Information Privacy Act) problem.
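The one-to-one vs. one-to-many distinction in the first bullet can be sketched in a few lines. This is a toy illustration only: real biometric matchers do not compare short lists of numbers with cosine similarity, and the feature vectors, names, and 0.9 threshold here are all made up.

```python
import math


def similarity(a, b):
    """Cosine similarity between two toy feature vectors (stand-in for a real matcher)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(probe, enrolled, threshold=0.9):
    """One-to-one: does the probe match the single claimed identity?"""
    return similarity(probe, enrolled) >= threshold


def identify(probe, gallery, threshold=0.9):
    """One-to-many: search the whole gallery and return candidates, best score first."""
    scored = [(person, similarity(probe, template)) for person, template in gallery.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical two-person gallery.
GALLERY = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}

if __name__ == "__main__":
    probe = [0.99, 0.05]
    print("Verified as alice:", verify(probe, GALLERY["alice"]))
    print("Candidates:", [person for person, _score in identify(probe, GALLERY)])
```

The benefits enrollment use case above (is this person already enrolled under another name?) is a one-to-many search, while confirming the authorized recipient at payment time is a one-to-one check.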

Any identity content marketing expert or biometric content marketing expert worth their salt will understand these and related issues.

Topic Four: Content marketing

This is another Captain Obvious point. If you want to present yourself as a biometric content marketing expert or identity content marketing expert, you have to have a feel for content marketing.

Here’s how HubSpot defines content marketing:

The definition of content marketing is simple: It’s the process of publishing written and visual material online with the purpose of attracting more leads to your business. These can include blog posts, pages, ebooks, infographics, videos, and more.

From https://blog.hubspot.com/marketing/content-marketing

Here are all the types of content in which one content marketer claims proficiency (as of July 27, 2023, subject to change):

Articles • Battlecards (80+) • Blog Posts (400+) • Briefs/Data/Literature Sheets • Case Studies (12+) • Competitive Analyses • Email Newsletters (200+) • Event/Conference/Trade Show Demonstration Scripts • FAQs • Plans • Playbooks • Presentations • Proposal Templates • Proposals (100+) • Quality Improvement Documents • Requirements • Scientific Book Chapters • Smartphone Application Content • Social Media (Facebook, Instagram, LinkedIn, Threads, TikTok, Twitter) • Strategic Analyses • Web Page Content • White Papers and E-Books

From https://www.linkedin.com/in/jbredehoft/, last updated 7/27/2023.

Now frankly, that list is pretty weak. You’ll notice that it doesn’t include Snapchat.

But content marketers need to be comfortable with creating at least one type of content.

Topic Five: How L-1 Identity Solutions came to be

Yes, an identity content marketing expert needs to thoroughly understand how L-1 Identity Solutions came to be.

I’m only half joking.

Back in the late 1990s and early 2000s (I’ll ignore FpVTE results for a moment), the fingerprint world in which I worked recognized four major vendors: Cogent, NEC, Printrak (later part of Motorola), and Sagem Morpho.

And then there were all these teeny tiny vendors that offered biometric and non-biometric solutions, including the fierce competitors Identix and Digital Biometrics, the fierce competitors Viisage and Visionics, and a bunch of other companies like Iridian.

Well, there WERE all these teeny tiny vendors.

Until Bob LaPenta bought them all up and combined them into a single company, L-1 Identity Solutions. (LaPenta was one of the “Ls” in L-3, so he chose the name L-1 when he started his own company.)

So around 2008 the Big Four (including a post-FpVTE Motorola) became the Big Five, since L-1 Identity Solutions was now at the table with the big boys.

But then several things happened:

  • Motorola started selling off parts of itself. One of those parts, its Biometric Business Unit, was purchased by Safran (the company formed after Sagem and Snecma merged). This affected me because I, a Motorola employee, became an employee of MorphoTrak, the subsidiary formed when Sagem Morpho de facto acquired “Printrak” (Motorola’s Biometric Business Unit). So now the Big Five were the Big Four.
  • Make that the Big Three, because Safran also bought L-1 Identity Solutions, which became MorphoTrust. MorphoTrak and MorphoTrust were separate entities, and in fact competed against each other, so maybe we should say that the Big Four still existed.
  • Oh, and by the way, the independent company Cogent was acquired by 3M (although NEC considered buying it).
  • A few years later, 3M sold bits of itself (including the Cogent bit) to Gemalto.
  • Then in 2017, Advent International (which owned Oberthur) acquired bits of Safran (the “Morpho” part) and merged them with Oberthur to form IDEMIA. As a consequence of this, MorphoTrust de facto acquired MorphoTrak, ending the competition but requiring me to have two separate computers to access the still-separate MorphoTrust and MorphoTrak computer networks. (In passing, I have heard from two sources, but have not confirmed myself, that the possible sale of IDEMIA is on hold.)
  • And Gemalto was acquired by Thales.

So as of 2023, the Big Three (as characterized by Maxine Most and FindBiometrics) are IDEMIA, NEC, and Thales.

Why do I mention all this? Because all these mergers and acquisitions have resulted in identity practitioners working for a dizzying number of firms.

As of August 2023, I myself have worked for five identity firms, but in reality four of the five are the same firm because the original Printrak International kept on getting acquired (Motorola, Safran, IDEMIA).

And that’s nothing. One of my former Printrak coworkers (R.M.) has also worked for Digital Biometrics (now part of IDEMIA), Cross Match Technologies (now part of ASSA ABLOY), Iridian (now part of IDEMIA), Datastrip, Creative Information Technology, AGNITiO, iTouch Biometrics, NDI Recognition Systems, iProov, and a few other firms here and there.

The point is that everybody knows everybody because everybody has worked with (and against) everybody. And with all the job shifts, it’s a regular Peyton Place.

By ABC Television – eBay item, photo front, photo back, Public Domain, https://commons.wikimedia.org/w/index.php?curid=17252688

Not sure which one is me, which one is R.M., and who the other people are.

Do you need an identity content marketing expert today?

Do you need someone who not only knows biometrics and content marketing, but also all the other factors, their uses, and even knows the tangled history of L-1?

Someone who offers:

  • No identity learning curve?
  • No content learning curve?
  • Proven results?

If I can help you create your identity content, contact me.

Digital identity and…the United Nations Sustainable Development Goals?

Over the last few years, I have approached digital identity(ies) from a particular perspective, concentrating on the different types of digital identities that we have (none of us has a single identity, when you think about it), and the usefulness of these identities for various purposes, including purposes in which the identity of the person must be well established.

I have also noted the wide list of organizations that have expressed an interest in digital identity. Because of pressing digital identity needs, many of these organizations have moved forward with their own digital identity proposals, although now they are devoting more effort to ensure that their individual proposals play well with the proposals of other organizations.

Enter the United Nations (or part of it)

Well, let’s add one more organization to the list of those concerned about digital identity: the United Nations.

Although actually “the United Nations” is in reality a whole bunch of separate organizations that kinda sorta work together under the UN umbrella. But each of these organizations can get some oomph (an international relations diplomatic term) from trumpeting a UN affiliation.

So let’s look at the Better Than Cash Alliance.

Based at the United Nations, the Better Than Cash Alliance is a partnership of governments, companies, and international organizations that accelerates the transition from cash to responsible digital payments to help achieve the Sustainable Development Goals

Note right off the bat that the Better Than Cash Alliance is not focused on digital identity per se, but digital payments. (Chris Burt of Biometric Update notes this focus.) Of course, digital payments and digital identity are necessarily intertwined, as we will see in a minute.

Enter the Sustainable Development Goals

But more importantly, digital payments themselves are not the ultimate goal of the Better Than Cash Alliance. Digital payments are only a means to an end to realize the United Nations Sustainable Development Goals, issued by a different UN organization.

Because of its primary focus, the Better Than Cash Alliance concentrates on issues that I myself have only studied in passing. For example, I have concentrated on the issues faced by people with no verifiable identity, but have not specifically looked at this from the lens of Sustainable Development Goal number 5, Gender Equality.

Principle 2 of the UN Principles for Responsible Digital Payments (October 2021 revision)

For this post, however, I’m going to focus on the digital identity aspects of the Better Than Cash Alliance and its report, UN Principles for Responsible Digital Payments (PDF), which was just updated this month (October 2021).

One of the key factors outlined in the report is “trust.” Now trust can have a variety of meanings (including trust that the information about my identity will not be used to throw me into a terrorist concentration camp), but for my purposes I want to concentrate on the trust that I, as a digital payments recipient, will receive the payments to which I am entitled.

To that end, the revised principles include items such as “ensure funds are protected and accessible” (principle 2), “champion value chain accountability” (principle 9), and other principles that impact on digital identity.

The introduction to the discussion on principle 2 highlights the problem:

A prerequisite of digital payments is that they match or surpass the qualities of cash. All users rightly expect their funds to be safe and readily available, but this is not always the case. The causal factors behind this are multiplex.

(“Multiplex”? Yes, this document was written by government committees. Or movie theater owners.)

AMC Ontario Mills. (California, not Canada.) By Coolcaesar – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=104309320

To avoid the multiplexity of these issues, one offered response is to “proactively track and protect against unauthorized transactions, including fraud and mistakes.” This can be done by several methods near and dear to us in identity-land:

Advocate for appropriate security controls to mitigate transaction risks (e.g., biometric security [34], two factor authentication [35], limits on logins or transaction amounts [36], creating “need-to-know” administrative privileges for interacting with client data).

Now most people who read this report aren’t interested in the footnotes. But I am. Here are footnotes 34, 35, and 36 from the document.

34 Examples include the use of biometrics in India’s Aadhaar identification system, and UNHCR’s use of iris technology to distribute cash to refugees in Jordan

35 See EU PSD2 Articles 97–98, Ghana’s Payments Systems and Service Act, 2019 (section 65(1)), and Malawi’s 2019 e-Money regulations (section 17)

36 India Master Direction on Prepaid Payment Instruments, Section 15.3

Of course the report could have cited other examples, such as the use of fingerprints for benefits payments in the United States in the 1990s and 2000s, but I’m sure that falls afoul of some Sustainable Development Goal.

Although it’s harder to criticize a UN entity, such as the aforementioned UNHCR, when it uses biometrics.

Or maybe it isn’t that hard, when you think about Access Now’s criticisms of the UNHCR program.

Refugees should not be required to hand over personal biometric data in exchange for basic needs such as purchasing food, or accessing money. However, iris scan technology supplied by UK-registered company, IrisGuard, is reportedly being used by the World Food Programme (WFP) and the United Nations High Commissioner for Refugees (UNHCR) in refugee camps and urban centers in Jordan.

Based on reports suggesting the absence of meaningful consent, and an opaque privacy policy, Access Now has serious objections to the lack of transparency and privacy safeguards around this precarious tech rollout. 

Wow. Jordan is as bad as Illinois. Maybe Jordan needs a BIPA! Hope their doorbell cameras aren’t a problem…

So while the Better Than Cash Alliance is focusing on other things, it’s at least paying lip service to some of the stronger identity controls that many in the identity industry advocate.

Of course, it’s outside of the scope of the Better Than Cash Alliance to dictate HOW to implement “appropriate security controls.”

But anything that saves the whales AND the plankton (and complies with BIPA) will be met with approval.

A second “biometrics is evil” post (Amazon One)

This is a follow-up to something I wrote a couple of weeks ago. I concluded that earlier post by noting that when you say that something needs to be replaced because it is bad, you need to evaluate the replacement to see if it is any better…or worse.

First, the recap

Before moving forward, let me briefly recap my points from the earlier post. If you like, you can read the entire post here.

  • Amazon is incentivizing customers ($10) to sign up for its Amazon One palm print program.
  • Amazon is not the first company to use biometrics to speed retail purchases. Pay By Touch and the University of Maryland Dining Hall have already done this, as has every single store that lets you use Apple Pay, Google Pay, or Samsung Pay.
  • Amazon One is not only being connected in the public eye to unrelated services such as Amazon Rekognition, and to unrelated studies such as Gender Shades (which dealt with classification, not recognition), but has also been accused of “asking people to sell their bodies.” Yet companies that offer similar services are not being demonized in the same way.
  • If you don’t use Amazon One to pay for your purchases, that doesn’t necessarily mean that you are protected from surveillance. I’ll dive into that in this post.

Now that we’re caught up, let’s look at the latest player to enter the Amazon One controversy.

Yes, U.S. Senators can be bipartisan

If you listen to the “opinion” news services, you get the feeling that the United States Senate has devolved into two warring factions that can’t get anything done. But Senators have always worked together (see Edward Kennedy and Dan Quayle), and they continue to work together today.

Specifically, three Senators are working together to ask Amazon a few questions: Bill Cassidy, M.D. (R-LA), Amy Klobuchar (D-MN), and Jon Ossoff (D-GA).

And naturally they issued a press release about it.

Now arguments can be made about whether Congressional press releases and hearings merely constitute grandstanding, or whether they are serious attempts to better the nation. Of course, anything that I oppose is obviously grandstanding, and anything I support is obviously a serious effort.

But for the moment let’s assume that the Senators have serious concerns about the privacy of American consumers, and that the nation demands answers to these questions from Amazon.

Here are the Senators’ questions, from the press release:

  1. Does Amazon have plans to expand Amazon One to additional Whole Foods, Amazon Go, and other Amazon store locations, and if so, on what timetable? 
  2. How many third-party customers has Amazon sold (or licensed) Amazon One to? What privacy protections are in place for those third parties and their customers?
  3. How many users have signed up for Amazon One? 
  4. Please describe all the ways you use data collected through Amazon One, including from third-party customers. Do you plan to use data collected through Amazon One devices to personalize advertisements, offers, or product recommendations to users? 
  5. Is Amazon One user data, including the Amazon One ID, ever paired with biometric data from facial recognition systems? 
  6. What information do you provide to consumers about how their data is being used? How will you ensure users understand and consent to Amazon One’s data collection, storage, and use practices when they link their Amazon One and Amazon account information?
  7. What actions have you taken to ensure the security of user data collected through Amazon One?

So when will we investigate other privacy-threatening technologies?

In a sense, the work of these three Senators should be commended, because if Amazon One is not implemented properly, serious privacy breaches could happen which could adversely impact American citizens. And this is the reason why many states and municipalities have moved to restrict the use of biometrics by private businesses.

And we know that Amazon is evil, because Slate said so back in January 2020.

The online bookseller has evolved into a giant of retail, resale, meal delivery, video streaming, cloud computing, fancy produce, original entertainment, cheap human labor, smart home tech, surveillance tech, and surveillance tech for smart homes….The company’s “last mile” shipping operation has led to burnout, injuries, and deaths, all connected to a warehouse operation that, while paying a decent minimum wage, is so efficient in part because it treats its human workers like robots who sometimes get bathroom breaks.

But why stop with Amazon? After all, Slate’s list included 29 other companies (while Amazon tops the list, other “top”-ranked companies include Facebook, Alphabet, Palantir Technologies, and Uber), to say nothing of entire industries that are capable of massive privacy violations.

Privacy breaches are not just tied to biometric systems, but can be tied to any system that stores private data. Restricting or banning biometric systems won’t solve anything, since all of these abuses could potentially occur on other systems.

  • When will the Senators ask these same questions to Apple, Google (part of the aforementioned Alphabet), and Samsung to find out when these companies will expand their “Pay” services? They won’t even have to ask all seven questions, because we already know the answer to question 5.
  • Oh, and while we’re at it, what about Mastercard, Visa, American Express, Discover, and similar credit card services that are often tied to information from our bank accounts? How do these firms personalize their offerings? Who can buy all that data?
  • And while we’re looking at credit cards, what about the debit cards issued by the banks, which are even more vulnerable to abuse? Let’s have the banks publicly reveal all the ways in which they protect user data.
  • You know, you have to watch out for those money orders also. How often do money order issuers ask consumers to show their government ID? What happens to that data?
  • Oh, and what about those gift cards that stores issue? What happens to the location and purchase data that is collected for those gift cards?
  • When people use cash to pay for goods, what is the resolution of the surveillance cameras that are trained on the cash registers? Can those surveillance cameras read the serial numbers on the bills that are exchanged? What assurances can the stores give that they are not tracking those serial numbers as they flow through the economy?

If you think that it’s silly to shut down every single payment system that could result in a privacy violation…you’re right.

Obviously if Amazon is breaking federal law, it should be prosecuted accordingly.

And if Amazon is breaking state law (such as Illinois’ BIPA), then…well, that’s not the Senators’ business, that’s the business of class action lawyers.

But now the ball is in Amazon’s court, and Amazon will either provide thousands of pages of documents, a few short answers, a response indicating that the Senators are asking for confidential information on future product plans, or (unlikely with Amazon, but possible with other companies) a reply stating that the Senators can go pound sand.

Either way, the “Amazon is evil” campaign will continue.

Franchisees and BIPA

In other contexts, I have written about the relationship between franchisors and franchisees, which in some respects is similar to the way gig drivers work “with” (not “for”) Uber, Lyft, and the like. In many cases, the products that are advertised by a particular company are not made by that company, but by a franchisee of that company who is entirely separate from the parent company, but who is responsible for doing things the way the parent company wants them done. If you’re a franchisee, you CAN’T…um…“have it your way.”

This Whopper probably wasn’t made by Burger King itself, but by a franchisee of Burger King. By Tokfo – Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=37367904

Speaking of which, here is an example of an article that confuses franchisor and franchisee. The Buzzfeed article, in typical Buzzfeed style, is entitled “This Is What Happened After A Bunch Of Employees At A Burger King Quit.” (Because of malfunctioning air conditioning, a number of employees put in their two weeks’ notice, leaving a “We All Quit” sign as they left.) You have to read ANOTHER article (from NBC) to find this little statement:

“Our franchisee is looking into this situation to ensure this doesn’t happen in the future,” a Burger King spokesperson said.

Yes, the employees’…um…beef wasn’t with Burger King itself (or its Brazilian/Canadian/American parent Restaurant Brands International), but with whoever manages the local franchise.

Well, now this world of franchisors and franchisees has entered the biometric world, according to a post in Greensfelder, a self-described “franchising & distribution law blog.”

Greensfelder’s post starts by explaining to its readers what BIPA is (something you already know if you read MY blog) and how franchisees are affected.

Plaintiffs are suing both franchisors and franchisees. Franchisors are being sued for collecting the information themselves for their own employees and also for the actions of their franchisees on theories of joint and several liability, vicarious liability, agency and alter ego. A recently filed case alleges that a franchisor mandates and controls virtually every aspect of its franchise locations, including the use of certain equipment that collects biometric information to track employees’ time and attendance and to monitor cash register systems for fraud.

This benefits the lawyers, who get to collect double the damages by claiming that both the franchisor and the franchisee are separately liable.

Greensfelder’s takeaway for franchisors:

Franchisors should be careful about mandating franchisee use of biometric procedures and devices without first checking applicable law and also making sure that their own policies and procedures are in compliance with those laws.

I’m not sure who is providing takeaways for franchisees.

Other than the usual advice to read the franchise agreement very, very carefully.

Will the Kami Doorbell Camera sell in Illinois?

There was a recent press release that I missed until Biometric Update started talking about it two days later. The January 19 press release from Kami was entitled “Kami Releases Smart Video Doorbell With Facial Recognition Capabilities.” The subhead announced, “The device also offers user privacy controls.”

And while reading that Kami press release, I noticed a potential issue that wasn’t fully addressed in the press release, or (so far) in the media coverage of the press release. That issue relates to that four-letter word “BIPA.”

This post explains what BIPA is and why it’s important.

  • But it starts by looking at smart video doorbells.
  • Next, it looks at this particular press release about a smart video doorbell.
  • Then we’ll look at a competitor’s smart video doorbell, and a particular decision that the competitor made because of BIPA.
  • Only then will we dive into BIPA.
  • Finally, we’ll circle back to Kami, and how it may be affected by BIPA. (Caution: I’m not a lawyer.)

What is a smart video doorbell?

Many of us can figure out what a smart video doorbell would do, since Kami isn’t the first company to offer such a product. (I’ll talk about another company in a little bit.)

The basic concept is that the owner of the video doorbell (whom I’ll refer to as the “user,” to be consistent with Kami’s terminology) manages a small database of faces that could be recognized by the video doorbell. For example, if I owned such a device, I would definitely want to enroll my face and the face of my wife, and I would probably want to enroll the faces of other relatives and close friends. Doing this would create an allowlist of people who are known to the smart video doorbell system.

However, because technology itself is neutral, I need to point out two things about a standard smart video doorbell implementation:

  • Depending upon the design, you can enroll a person into the system without the person knowing it. If the user of the system controls the enrollment, then the user has complete control over the people that are enrolled into the system. All I need is a picture of the person, and I can use that picture to enroll the person into my smart video doorbell. I can grab a picture that I took from New Year’s Eve, or I could even grab a picture from the Internet. After all, if President Joe Biden walked up to my front door, I’d definitely want to know about it. Now there are technological solutions to this; for example, liveness detection could be used to ensure that the person who is enrolling in the system is a live person and not a picture. But I’m not aware of any system that requires liveness detection for this particular use case.
  • You can enroll a person into the system for ANY reason. Usually consumer smart video doorbells are presented as a way to let you know when friends and family come to the door. But the technology has no way of detecting whether you are actually enrolling a “friend.” Perhaps you want to know when your ex-girlfriend comes to the door. Or perhaps you have a really good picture of the guy who’s been breaking into homes in your neighborhood. Now enterprise and government systems account for this by supporting separate allowlists and blocklists, but frankly you can put anyone on to any list for any reason.
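To make those two points concrete, here is a minimal Python sketch of the list handling behind such a doorbell. Everything in it is hypothetical: the `DoorbellRegistry` class, its method names, and the use of a plain string as a stand-in for a face template are my assumptions for illustration, not Kami's or anyone else's actual design. Real systems compare face templates using a similarity score rather than exact matches.

```python
from dataclasses import dataclass, field


@dataclass
class DoorbellRegistry:
    """Hypothetical registry; a string stands in for a real face template."""
    allowlist: dict = field(default_factory=dict)  # template -> label
    blocklist: dict = field(default_factory=dict)  # template -> label

    def enroll(self, template: str, label: str, blocked: bool = False) -> None:
        # Note what is missing: no liveness check, no consent from the
        # enrolled person, and no validation of why they are being enrolled.
        target = self.blocklist if blocked else self.allowlist
        target[template] = label

    def classify(self, template: str) -> str:
        # Blocklist is checked first so a blocked face never reads as familiar.
        if template in self.blocklist:
            return f"alert: {self.blocklist[template]}"
        if template in self.allowlist:
            return f"familiar: {self.allowlist[template]}"
        return "stranger"


registry = DoorbellRegistry()
registry.enroll("template-from-party-photo", "wife")
registry.enroll("template-from-web-photo", "neighborhood burglar", blocked=True)
```

Nothing in `enroll` asks where the picture came from or whether its subject agreed, which is exactly the gap described above. The user alone decides who goes on which list.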

So with that introduction, let’s see what Kami is offering, and why it’s different.

The Kami Doorbell Camera

Let’s return to the Kami press release. It, as well as the description of the item in Kami’s online store, parallels a lot of the features that you can find in any smart video doorbell.

Know exactly who’s at your door. Save the faces of friends and family in your Kami or YI Home App, allowing you to get notified if the person outside your front door is a familiar face or a stranger.

And it has other features, such as an IP-65 rating stating that the camera will continue to work outdoors in challenging weather conditions.

However, Yamin Durrani, Kami’s CEO, emphasized a particular point in the press release:

“The Kami Doorbell Camera was inspired by a greater need for safety and peace of mind as people spend more time at home and consumers’ increasing desire to reside in smart homes,” said Yamin Durrani, CEO of Kami. “However, we noticed one gaping hole in the smart doorbell market — it was lacking an extremely advanced security solution that also puts the user in complete control of their privacy. In designing our video doorbell camera we considered all the ways people live in their homes to elegantly combine accelerated intelligence with a level of customization and privacy that is unmatched in today’s market. The result is a solution that provides comfort, safety and peace of mind.”

Privacy for the user(s) makes sense, because you don’t want someone hacking into the system and stealing the pictures and other stored information. As described, Kami lets the user(s) control their own data, and the system has presumably been designed from the ground up to support this.

But Kami isn’t the only product out there.

One of Kami’s competitors has an interesting footnote in its product description

There’s this company called Google. You may have heard of it. And Google offers a product called Nest Aware. This product is a subscription service that works with Nest cameras and provides various types of alerts for activities within the range of the cameras.

And Nest even has a feature that sounds, um, familiar to Kami users. Nest refers to the feature as “familiar face detection.”

Nest speakers and displays listen for unusual sounds. Nest cameras can spot a familiar face.4 And they all send intelligent alerts that matter.

So it sounds like Nest Aware has the same type of “allowlist” feature that allows the Nest Aware user to enroll friends and family (or whoever) into the system, so that they can be automatically recognized and so you can receive relevant information.

Hmm…did you note that there is a footnote next to the mention of “familiar face”? Let’s see what that footnote says.

4. Familiar face alerts not available on Nest Cams used in Illinois.

To the average consumer, that footnote probably looks a little odd. Why would this feature not be available in Illinois, but available in all the other states?

Or perhaps the average consumer may recall another Google app from three years ago, the Google Arts & Culture app. That app became all the rage when it introduced a feature that let you compare your face to the faces on famous works of art. Well, it let you perform that comparison…unless you lived in Illinois or Texas.
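One way a vendor could implement that kind of carve-out is a simple region check in front of the feature. The sketch below is an assumption on my part, not Google's code: the `RESTRICTED_REGIONS` table, the feature names, and the `feature_enabled` function are all hypothetical, and the sketch ignores the hard question of how the vendor learns where the user actually is.

```python
# Hypothetical table: feature name -> US states where it is switched off.
RESTRICTED_REGIONS = {
    "familiar_face_alerts": {"IL"},
    "art_selfie_match": {"IL", "TX"},
}


def feature_enabled(feature: str, state_code: str) -> bool:
    """Return True unless the feature is switched off for this state."""
    blocked_states = RESTRICTED_REGIONS.get(feature, set())
    return state_code.upper() not in blocked_states
```

A check like this is cheap to build, which may be part of why switching a feature off in one state is an attractive response to legal risk.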

So what’s the big deal about Illinois?

Those of us who are active in the facial recognition industry, or people who are active in the privacy industry, are well aware of the Illinois Biometric Information Privacy Act, or BIPA. This Act, which was passed in 2008, provides Illinois residents control over the use of their biometric data. And if a company violates that control, the resident is permitted to sue the offending company. And class action lawsuits are allowed, thus increasing the possible damages to the offending company.

And there are plenty of lawyers that are willing to help residents exercise their rights under BIPA.

One early example of a BIPA lawsuit was filed against L.A. Tan. This firm offered memberships, and rather than requiring the member to present a membership card, the member simply placed his or her fingerprint onto a scanner to verify membership. But under BIPA, that could be a problem:

The plaintiffs in the L.A. Tan case alleged that the company, which used customers’ fingerprint scans in lieu of key fobs for tanning membership ID purposes, violated the BIPA by failing to obtain the customers’ written consent to use the fingerprint data and by not disclosing to customers the company’s plans for storing the data or destroying it in the event a tanning customer terminated her salon membership or a franchise closed. The plaintiffs did not claim L.A. Tan illegally sold or lost customers’ fingerprint data, just that it did not handle the data as carefully as the BIPA requires.

L.A. Tan ended up settling the case for over a million dollars, but Illinois Policy wondered:

This outcome is reassuring for anyone concerned about the handling of private information like facial-recognition data and fingerprints, but it also could signal a flood of similar lawsuits to come.

And there certainly was a flood of lawsuits. I was working in strategic marketing at the time, and I would duly note the second lawsuit filed under BIPA, and then the third lawsuit, and the fourth…Eventually I stopped counting.

As of June 2019, 324 such lawsuits had been filed in total, including 161 in the first six months of 2019 alone. And some big names have been sued under BIPA.

Facebook settled for $650 million.

Google was sued in October 2019 over Google Photos, again in February 2020 over Google Photos, again in April 2020 over its G Suite for Education, again in July 2020 over its use of IBM’s Diversity in Faces dataset, and probably several other times besides.

So you can understand why Google is a little reluctant to sell Nest Aware’s familiar face detection feature in Illinois.

So where does that leave Kami?

Here’s where the problem may lie. Based upon the other lawsuits, it appears that lawyers are alleging that before an Illinois resident’s biometric features are stored in a database, the person has to give consent for the biometric to be stored, and the person has to be informed of his or her rights under BIPA.

So such explicit permission has to be given for every biometric database physically within the state of Illinois?

Yes…and then some. Remember that Facebook’s and Google’s databases aren’t necessarily physically located within the state of Illinois, but those companies have been sued under BIPA. I’m not a lawyer, but conceivably an Illinois resident could sue a Swiss company, with its databases in Switzerland, for violating BIPA.

Now when someone sets up a Kami system, does the Kami user ensure that every Illinois resident has received the proper BIPA notices? And if the Kami user doesn’t do that, is Kami legally liable?

For all I know, the Kami enrollment feature may include explicit BIPA questions, such as “Is the person in this picture a resident of Illinois?” Then again, it may not.
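For illustration, here is a rough sketch of what such an enrollment gate might look like. It is not Kami's implementation and it is not legal advice: the `EnrollmentRequest` fields and the `may_enroll` function are hypothetical, and the checks simply mirror what the lawsuits described above allege is required (notice and consent before a biometric is stored).

```python
from dataclasses import dataclass


@dataclass
class EnrollmentRequest:
    """Hypothetical answers collected by the app before a face is stored."""
    subject_is_illinois_resident: bool
    subject_was_informed: bool = False          # told what is stored and for how long
    subject_gave_written_consent: bool = False


def may_enroll(request: EnrollmentRequest) -> bool:
    # Non-Illinois subjects pass through in this sketch; other states'
    # laws are out of scope here.
    if not request.subject_is_illinois_resident:
        return True
    return request.subject_was_informed and request.subject_gave_written_consent
```

The weak point is obvious: the gate relies on the user answering the residency question honestly, and on the user (not the person in the picture) reporting that consent was given.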

Again, I’m not a lawyer, but it’s interesting to note that Google, which does have access to a bunch of lawyers, decided to dodge the issue by not selling familiar face detection to Illinois residents.

Which doesn’t answer the question of an Iowa Nest Aware familiar face detection user who enrolls an Illinois resident…