What if Machine Learning Models Can’t Get Generative AI Training Data?

An image of a neural network. By DancingPhilosopher – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=135594693

Machine learning models need training data to improve their accuracy—something I know from my many years in biometrics.

And it’s difficult to get that training data—something else I know from my many years in biometrics. Consider the acronyms GDPR, CPRA, and especially BIPA. It’s very hard to get data to train biometric algorithms, so they are trained on relatively limited data sets.

At the same time that biometric algorithm training data is limited, Kevin Indig believes that generative AI large language models are ALSO going to encounter limited access to training data. Actually, they already are.

The lawsuits have already begun

A few months ago, generative AI models like ChatGPT were going to solve all of humanity’s problems and allow us to lead lives of leisure as the bots did all our work for us. Or potentially the bots would get us all fired. Or something.

But then people began to ask HOW these large language models work…and where they get their training data.

Just as some biometric training efforts grab images and associated data from the web without asking permission (you know the example I’m talking about), some are alleging that LLMs are trained on copyrighted content in violation of the law.

I am not a lawyer and cannot meaningfully discuss what is “fair use” and what is not, but suffice it to say that alleged victims are filing court cases.

Sarah Silverman et al and copyright infringement

Here’s one example from July:

Comedian and author Sarah Silverman, as well as authors Christopher Golden and Richard Kadrey — are suing OpenAI and Meta each in a US District Court over dual claims of copyright infringement.

The suits allege, among other things, that OpenAI’s ChatGPT and Meta’s LLaMA were trained on illegally-acquired datasets containing their works, which they say were acquired from “shadow library” websites like Bibliotik, Library Genesis, Z-Library, and others, noting the books are “available in bulk via torrent systems.”

From https://www.theverge.com/2023/7/9/23788741/sarah-silverman-openai-meta-chatgpt-llama-copyright-infringement-chatbots-artificial-intelligence-ai

This could be a big mess, especially since copyright laws vary from country to country. This description of the copyright implications for LLMs, for example, focuses on United Kingdom law. Laws in other countries differ.

And now the technical blocks are beginning

Just today, Kevin Indig highlighted another issue that could limit LLM access to online training data.

Some sites are already blocking the LLM crawlers

Systems that get data from the web, such as Google, Bing, and (relevant to us) ChatGPT, use “crawlers” to gather information from the web. ChatGPT, for example, has its own crawler, known as GPTBot.
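Sites generally block a crawler through their robots.txt file. As a minimal sketch (the robots.txt content and URLs below are hypothetical, not any real site’s policy), here is how Python’s standard urllib.robotparser evaluates a policy that blocks OpenAI’s GPTBot crawler while allowing everyone else:

```python
# Sketch: how a robots.txt policy blocks one crawler but not others.
# The policy text and example.com URLs are hypothetical.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# GPTBot is blocked from the entire site...
print(parser.can_fetch("GPTBot", "https://example.com/article"))     # False
# ...while ordinary crawlers are still allowed.
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True
```

Note that robots.txt is a request, not an enforcement mechanism; it only keeps out crawlers that choose to honor it.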

By Yintan at English Wikipedia, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=63631702

Guess what Indig found out about ChatGPT’s crawler?

An analysis of the top 1,000 sites on the web from Originality AI shows 12% already block Chat GPT’s crawler. (source)

From https://www.kevin-indig.com/most-sites-will-block-chat-gpt/

But that only includes the sites that blocked the crawler when Originality AI performed its analysis.

More sites will block the LLM crawlers

Indig believes that in the future, the share of the top 1,000 sites that block ChatGPT’s crawler will rise significantly, to 84%. His belief is based on analyzing the business models of the sites that already block ChatGPT and assuming that other sites with the same business models will also find it in their interest to block it.

The business models assumed not to block ChatGPT include governments, universities, and search engines. Such sites are friendly to the sharing of information, and thus would have no reason to block ChatGPT or any other LLM crawler.

The business models assumed to block ChatGPT include publishers, marketplaces, and many others. Entities using these business models are not just going to turn their content over to an LLM for free.

As Indig explains regarding the top two blocking business models:

By Karl Thomas Moore – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=58968347

For publishers, content is the product. Giving it away for free to generative AI means foregoing most, if not all, ad revenue. Publishers remember the revenue drops caused by social media and modern search engines in the late 2000s.

Marketplaces build their own AI assistants and don’t want competition.

From https://www.kevin-indig.com/most-sites-will-block-chat-gpt/

What does this mean for LLMs?

One possibility is that LLMs will run into the same training issues as biometric algorithms.

  • In biometrics, the same people who loudly exclaim that biometric algorithms are racist would be horrified at the purely technical solution that would solve all inaccuracy problems—let the biometric algorithms train on ALL available biometric data. In the activists’ view (and in the view of many others), unrestricted access to biometric data for algorithmic training would be a privacy nightmare.
  • Similarly, those who complain that LLMs are woefully inaccurate would be horrified if the LLM accuracy problem were solved by a purely technical solution: let the algorithms train themselves on ALL available data.

Could LLMs buy training data?

Of course, there’s another solution to the problem: have the companies SELL their data to the LLMs.

By Nic McPhee from Morris, Minnesota, USA – London – 14-15 Dec 2007 – 034, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=10606179

In theory, this could provide the data holders with a nice revenue stream while allowing the LLMs to be extremely accurate. (Of course the users who actually contribute the data to the data holders would probably be shut out of any revenue, but them’s the breaks.)

But that’s only in theory. Based upon past experience with data holders, the people who want to use the data are probably not going to pay the data holders sufficiently.

Google and Meta to Canada: Drop dead / Mourir

By The original uploader was Illegitimate Barrister at Wikimedia Commons. The current SVG encoding is a rewrite performed by MapGrid. – This vector image is generated programmatically from geometry defined in File:Flag of Canada (construction sheet – leaf geometry).svg., Public Domain, https://commons.wikimedia.org/w/index.php?curid=32276527

Even today, Google and Meta (Facebook et al) are greeting Canada’s Bill C-18, the Online News Act, with resistance. Here’s what Google is saying:

Bill C-18 requires two companies (including Google) to pay for simply showing links to Canadian news publications, something that everyone else does for free. The unprecedented decision to put a price on links (a so-called “link tax”) breaks the way the web and search engines work, and exposes us to uncapped financial liability simply for facilitating access to news from Canadian publications….

As a result, we have informed them that we have made the difficult decision that, when the law takes effect, we will be removing links to Canadian news publications from our Search, News, and Discover products.

From https://blog.google/canada-news-en/#overview

But wait, it gets better:

In addition, we will no longer be able to operate Google News Showcase – our product experience and licensing program for news – in Canada.

From https://blog.google/canada-news-en/#overview

Google News Showcase is the program that gives money to news organizations in Canada. Meta has a similar program. Peter Menzies notes that these programs give tens of millions of (Canadian) dollars to news organizations, but that could end, despite government threats.

The federal and Quebec governments pulled their advertising spends, but those moves amount to less money than Meta will save by ending its $18 million in existing journalism funding. 

From https://thehub.ca/2023-09-15/peter-menzies-the-media-is-boycotting-meta-and-nobody-cares/

What’s next?

Bearing in mind that Big Tech is reluctant to give journalistic data holders money even when a government ORDERS that they do so…

…what is the likelihood that generative AI algorithm authors (including Big Tech companies like Google and Microsoft) will VOLUNTARILY pay funds to data holders for algorithm training?

If Kevin Indig is right, LLM training data will become extremely limited, adversely affecting the algorithms’ usefulness.

AdvoLogix on “9 Ways to Use AI in the Workplace”

Bredemarket occasionally gets pitches from people who want to write for the blog, or to link to something they’ve already written.

Most of these pitches are crap.

But I just received an excellent and relevant pitch from a PR coordinator. I won’t reproduce his pitch, though, because I don’t want to get sued.

Which in this case is a very distinct possibility.

Who is AdvoLogix?

The PR coordinator represents a company called AdvoLogix, and wanted me to reshare something his company had written.

My first question (of course) is why AdvoLogix exists.

We build and deliver technologies that help legal teams collaborate and grow.

From https://www.advologix.com/about-advologix/

The company provides legal software, resident on Salesforce, that addresses several areas of legal practice.

Now I am not a lawyer, but I’m sure these terms mean something to lawyers. If you’re looking for these types of solutions, check AdvoLogix’s site.

Why did AdvoLogix pitch me?

The PR coordinator had observed Bredemarket’s previous posts on artificial intelligence (excluding the one that I wrote after his pitch), and thought that AdvoLogix’s recent blog post on the same topic would be of interest to Bredemarket’s readers.

What does AdvoLogix say about using AI in the workplace?

AdvoLogix’s post is clear in its intent. It is entitled “9 Ways to Use AI in the Workplace.” The introduction to the post explains AdvoLogix’s position on the use of artificial intelligence.

Rather than replacing human professionals, AI applications take a complementary role in the workplace and improve overall efficiency. Here are nine actionable ways to use artificial intelligence, no matter your industry.

From https://www.advologix.com/ai-applications-business/

I won’t list ALL nine of the ways—I want you to go read the post, after all. But let me highlight one of them—not the first one, but the eighth one.

Individual entrepreneurs can also benefit from AI-driven technologies. Entrepreneurship requires great financial and personal risk, especially when starting a new business. Entrepreneurs must often invest in essential resources and engage with potential customers to build a brand from scratch. With AI tools, entrepreneurs can greatly limit risk by improving their organization and efficiency. 

From https://www.advologix.com/ai-applications-business/

The AdvoLogix post then goes on to recommend specific ways that entrepreneurs can use artificial intelligence, including:

  • AI shopping
  • Use AI Chatbots for Customer Engagement

Regardless of how you feel about the use of AI in these areas, you should at least consider them as possible options.

Why did AdvoLogix write the post?

Obviously the company had a reason for writing the post, and for sharing the post with people like me (and like you).

AdvoLogix provides law firms, legal offices, and public agencies with advanced, cloud-based legal software solutions that address their actual needs. 

Thanks to AI tools like Caster, AdvoLogix can provide your office with effective automation of data entry, invoicing, and other essential but time-consuming processes. Contact AdvoLogix to request a free demo of the industry’s best AI tools for law offices like yours. 

From https://www.advologix.com/ai-applications-business/

So I’m not even going to provide a Bredemarket call to action, since AdvoLogix already provided its own. Good for AdvoLogix.

But what about Steven Schwartz?

The AdvoLogix post did not specifically reference Steven Schwartz, although the company stated that you should control the process yourself and not cede control to your artificial intelligence tool.

Something that Schwartz did not do.

Roberto Mata sued Avianca airlines for injuries he says he sustained from a serving cart while on the airline in 2019, claiming negligence by an employee. Steven Schwartz, an attorney with Levidow, Levidow & Oberman and licensed in New York for over three decades, handled Mata’s representation.

But at least six of the submitted cases by Schwartz as research for a brief “appear to be bogus judicial decisions with bogus quotes and bogus internal citations,” said Judge Kevin Castel of the Southern District of New York in an order….

In late April, Avianca’s lawyers from Condon & Forsyth penned a letter to Castel questioning the authenticity of the cases….

Among the purported cases: Varghese v. China South Airlines, Martinez v. Delta Airlines, Shaboon v. EgyptAir, Petersen v. Iran Air, Miller v. United Airlines, and Estate of Durden v. KLM Royal Dutch Airlines, all of which did not appear to exist to either the judge or defense, the filing said.

Schwartz, in an affidavit, said that he had never used ChatGPT as a legal research source prior to this case and, therefore, “was unaware of the possibility that its content could be false.” He accepted responsibility for not confirming the chatbot’s sources.

Schwartz is now facing a sanctions hearing on June 8.

From https://www.cnn.com/2023/05/27/business/chat-gpt-avianca-mata-lawyers/index.html

On that sanctions hearing date, Schwartz was mercilessly grilled by the judge. Later that month, the judge sanctioned and fined Schwartz and another lawyer.

In the end, you are responsible, not the tool you use.

By the way, Roberto Mata lost the case. Not because of his lawyers’ misuse of AI, but because the case was filed too late.

We Survived Gummy Fingers. We’re Surviving Facial Recognition Inaccuracy. We’ll Survive Voice Spoofing.

(Part of the biometric product marketing expert series)

Some of you are probably going to get into an automobile today.

Are you insane?

The National Highway Traffic Safety Administration has released its latest projections for traffic fatalities in 2022, estimating that 42,795 people died in motor vehicle traffic crashes.

From https://www.nhtsa.gov/press-releases/traffic-crash-death-estimates-2022

When you have tens of thousands of people dying, then the only conscionable response is to ban automobiles altogether. Any other action or inaction is completely irresponsible.

After all, just ask the experts who want us to ban biometrics because they can be spoofed and are racist, and who therefore conclude that we shouldn’t use biometrics at all.

I disagree with the calls to ban biometrics, and I’ll go through three “biometrics are bad” examples and say why banning biometrics is NOT justified.

  • Even some identity professionals may not know about the old “gummy fingers” story from 20+ years ago.
  • And yes, I know that I’ve talked about Gender Shades ad nauseam, but it bears repeating again.
  • And voice deepfakes are always a good topic to discuss in our AI-obsessed world.

Example 1: Gummy fingers

My recent post “Why Apple Vision Pro Is a Technological Biometric Advance, but Not a Revolutionary Biometric Event” included the following sentence:

But the iris security was breached by a “dummy eye” just a month later, in the same way that gummy fingers and face masks have defeated other biometric technologies.

From https://bredemarket.com/2023/06/12/vision-pro-not-revolutionary-biometrics-event/

A biometrics industry colleague noticed the rhyming words “dummy” and “gummy” and wondered if the latter was a typo. It turns out it wasn’t.

To my knowledge, these gummy fingers do NOT have ridges. From https://www.candynation.com/gummy-fingers

Back in 2002, researcher Tsutomu Matsumoto used gelatin (the stuff gummy bears are made of) to create a fake finger that fooled a fingerprint reader.

At the time, this news WAS really “scary,” since it suggested that you could access a fingerprint reader-protected system with something that wasn’t a finger. Gelatin. A piece of metal. A photograph.

Except that the fingerprint reader world didn’t stand still after 2002, and the industry developed ways to detect spoofed fingers. Here’s a recent example of presentation attack detection (liveness detection) from TECH5:

TECH5 participated in the 2023 LivDet Non-contact Fingerprint competition to evaluate its latest NN-based fingerprint liveness detection algorithm and has achieved first and second ranks in the “Systems” category for both single- and four-fingerprint liveness detection algorithms respectively. Both submissions achieved the lowest error rates on bonafide (live) fingerprints. TECH5 achieved 100% accuracy in detecting complex spoof types such as Ecoflex, Playdoh, wood glue, and latex with its groundbreaking Neural Network model that is only 1.5MB in size, setting a new industry benchmark for both accuracy and efficiency.

From https://tech5.ai/tech5s-mobile-fingerprint-liveness-detection-technology-ranked-the-most-accurate-in-the-market/

TECH5 excelled in detecting fake fingers for “non-contact” reading, where the fingers never touch a surface such as an optical platen. That’s appreciably harder than detecting fake fingers placed on contact devices.

I should note that LivDet is an independent assessment. As I’ve said before, independent technology assessments provide some guidance on the accuracy and performance of technologies.

So gummy fingers and future threats can be addressed as they arrive.

But at least gummy fingers aren’t racist.

Example 2: Gender shades

In 2017-2018, the Algorithmic Justice League set out to answer this question:

How well do IBM, Microsoft, and Face++ AI services guess the gender of a face?

From http://gendershades.org/. Yes, that’s “http,” not “https.” But I digress.

Let’s stop right there for a moment and address two items before we continue. Trust me; it’s important.

  1. This study evaluated only three algorithms: one from IBM, one from Microsoft, and one from Face++. It did not evaluate the hundreds of other facial recognition algorithms that existed in 2018 when the study was released.
  2. The study focused on gender classification and race classification. Back in those primitive, innocent days of 2018, the world assumed that you could look at a person and tell whether the person was male or female, or tell the race of a person. (The phrase “self-identity” had not yet become popular, despite the Rachel Dolezal episode, which happened before the Gender Shades study.) Most importantly, the study did not address identification of individuals at all.

However, the study did find something:

While the companies appear to have relatively high accuracy overall, there are notable differences in the error rates between different groups. Let’s explore.

All companies perform better on males than females with an 8.1% – 20.6% difference in error rates.

All companies perform better on lighter subjects as a whole than on darker subjects as a whole with an 11.8% – 19.2% difference in error rates.

When we analyze the results by intersectional subgroups – darker males, darker females, lighter males, lighter females – we see that all companies perform worst on darker females.

From http://gendershades.org/overview.html

What does this mean? It means that if you are using one of these three algorithms solely for the purpose of determining a person’s gender and race, some results are more accurate than others.

Three algorithms do not predict hundreds of algorithms, and classification is not identification. If you’re interested in more information on the differences between classification and identification, see Bredemarket’s November 2021 submission to the Department of Homeland Security. (Excerpt here.)

And all the stories about people such as Robert Williams being wrongfully arrested based upon faulty facial recognition results have nothing to do with Gender Shades. I’ll address this briefly (for once):

  • In the United States, facial recognition identification results should only be used by the police as an investigative lead, and no one should be arrested solely on the basis of facial recognition. (The city of Detroit stated that Williams’ arrest resulted from “sloppy” detective work.)
  • If you are using facial recognition for criminal investigations, your people had better have forensic face training. (Then they would know, as Detroit investigators apparently didn’t know, that the quality of surveillance footage is important.)
  • If you’re going to ban computerized facial recognition (even when only used as an investigative lead, and even when only used by properly trained individuals), consider the alternative of human witness identification. Or witness misidentification. Roeling Adams, Reggie Cole, Jason Kindle, Adam Riojas, Timothy Atkins, Uriah Courtney, Jason Rivera, Vondell Lewis, Guy Miles, Luis Vargas, and Rafael Madrigal can tell you how inaccurate (and racist) human facial recognition can be. See my LinkedIn article “Don’t ban facial recognition.”

Obviously, facial recognition has been the subject of independent assessments, including continuous bias testing by the National Institute of Standards and Technology as part of its Face Recognition Vendor Test (FRVT), specifically within the 1:1 verification testing. And NIST has measured the identification bias of hundreds of algorithms, not just three.

In fact, people who were calling for facial recognition to be banned just a few years ago are now questioning the wisdom of those decisions.

But those days were quaint. Men were men, women were women, and artificial intelligence was science fiction.

The latter has certainly changed.

Example 3: Voice spoofs

Perhaps it’s an exaggeration to say that recent artificial intelligence advances will change the world. Perhaps it isn’t. Personally I’ve been concentrating on whether AI writing can adopt the correct tone of voice, but what if we take the words “tone of voice” literally? Let’s listen to President Richard Nixon:

From https://www.youtube.com/watch?v=2rkQn-43ixs

Richard Nixon never spoke those words in public, although he may have rehearsed William Safire’s speech, composed in case Apollo 11 had not resulted in one giant leap for mankind. As noted in the video, Nixon’s voice and appearance were spoofed using artificial intelligence to create a “deepfake.”

It’s one thing to alter the historical record. It’s another thing altogether when a fraudster spoofs YOUR voice and takes money out of YOUR bank account. By definition, you will take that personally.

In early 2020, a branch manager of a Japanese company in Hong Kong received a call from a man whose voice he recognized—the director of his parent business. The director had good news: the company was about to make an acquisition, so he needed to authorize some transfers to the tune of $35 million. A lawyer named Martin Zelner had been hired to coordinate the procedures and the branch manager could see in his inbox emails from the director and Zelner, confirming what money needed to move where. The manager, believing everything appeared legitimate, began making the transfers.

What he didn’t know was that he’d been duped as part of an elaborate swindle, one in which fraudsters had used “deep voice” technology to clone the director’s speech…

From https://www.forbes.com/sites/thomasbrewster/2021/10/14/huge-bank-fraud-uses-deep-fake-voice-tech-to-steal-millions/?sh=8e8417775591

Now I’ll grant that this is an example of human voice verification, which can be as inaccurate as the previously referenced human witness misidentification. But are computerized systems any better, and can they detect spoofed voices?

Well, in the same way that fingerprint readers evolved to overcome gummy fingers, voice biometric systems are working to overcome deepfake voices. Here’s what one company, ID R&D, is doing to combat voice spoofing:

IDVoice Verified combines ID R&D’s core voice verification biometric engine, IDVoice, with our passive voice liveness detection, IDLive Voice, to create a high-performance solution for strong authentication, fraud prevention, and anti-spoofing verification.

Anti-spoofing verification technology is a critical component in voice biometric authentication for fraud prevention services. Before determining a match, IDVoice Verified ensures that the voice presented is not a recording.

From https://www.idrnd.ai/idvoice-verified-voice-biometrics-and-anti-spoofing/

This is only the beginning of the war against voice spoofing. Other companies will pioneer new advances that will tell the real voices from the fake ones.

As for independent testing, voice anti-spoofing has its own independent evaluations, such as the ASVspoof challenges.

A final thought

Yes, fraudsters can use advanced tools to do bad things.

But the people who battle fraudsters can also use advanced tools to defeat the fraudsters.

Take care of yourself, and each other.

Jerry Springer. By Justin Hoch, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=16673259

(Old Draft) The Temperamental Writer’s Three Suggestions for Using Generative AI

(This is the early version of a post. Here’s the final version.)

Don’t let that smiling face fool you.

Behind that smiling face beats the heart of an opinionated, crotchety, temperamental writer.

When you’ve been writing, writing, and writing for…um…many years, you tend to like to write things yourself, especially when you’re being paid to write.

So you can imagine…

  • how this temperamental writer would feel if someone came up and said, “Hey, I wrote this for you.”
  • how this temperamental writer would feel if someone came up and said, “Hey, I had ChatGPT write this for you.”

By Mindaugas Danys from Vilnius, Lithuania, Lithuania – scream and shout, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=44907034

Yeah, I’m temperamental.

So how do you think that I feel about ChatGPT, Bard, and other generative AI text writing tools?

Actually, I love them.

But the secret is in knowing how to use these tools.

Bredemarket’s 3 suggestions for using generative AI

So unless someone such as an employer or a consulting client requires that I do things differently, here are three ways that I use generative AI tools to assist me in my writing. You may want to consider these yourself.

Bredemarket Suggestion 1: A human should always write the first draft

The first rule that I follow is that I always write the first draft. I don’t send a prompt off and let a bot write the first draft for me.

Obviously pride of authorship comes into play. But there’s something else at work also.

When the bot writes draft 1

If I send a prompt to a generative AI application and instruct the application to write something, I can usually write the prompt and get a response back in less than a minute. Even with additional iterations, I can compose the final prompt in five minutes…and the draft is done!

And people will expect five-minute responses. I predicted it:

Now I consider myself capable of cranking out a draft relatively quickly, but even my fastest work takes a lot longer than five minutes to write.

“Who cares, John? No one is demanding a five minute turnaround.”

Not yet.

Because it was never possible before (unless you had proposal automation software, but even that couldn’t create NEW text).

What happens to us writers when a five-minute turnaround becomes the norm?

From https://www.linkedin.com/posts/jbredehoft_generativeai-activity-7065836499702861824-X8PO/

When I write draft 1

Now what happens when, instead of sending a few iterative prompts to a tool, I create the first draft the old-fashioned way? Well obviously it takes a lot longer than five minutes…even if I don’t “sleep on it.”

But the entire draft-writing process is also a lot more iterative and (sort of) collaborative. For example, take the “Bredemarket Suggestion 1” portion of the post that you’re reading right now.

  • It originally wasn’t “Bredemarket Suggestion 1.” It was “Bredemarket Rule 1,” but then I decided not to be so dictatorial with you, the reader. “Here’s what I do, and you MAY want to do it also.”
  • And I haven’t written this section, or the rest of the post, in a linear fashion. I started writing Suggestion 3 before I started the other 2 suggestions.
  • I’ve been jumping back and forth throughout the entire post, tweaking things here and there.
  • Just a few minutes ago (as I type this) I remembered that I had never fully addressed my two-week-old LinkedIn post regarding future expectations of five-minute turnarounds. I still haven’t fully addressed it, but I was able to repurpose the content here.

Now imagine that, instead of my doing all of that manually, I tried to feed all of these instructions into a prompt:

Write a blog post about 3 rules for using generative AI, in which the first rule is for a human to write the first draft, the second rule is to only feed small clumps of text to the tool for improvement, and the third rule is to preserve confidentiality. Except don’t call them rules, but instead use a nicer term. And don’t forget to work in the story about the person who wrote something in ChatGPT for me. Oh, and mention how ornery I am, but use three negative adjectives in place of ornery. Oh, and link to the Writing, Writing, Writing subsection of the Who I Am page on the Bredemarket website. And also cite the LinkedIn post I wrote about five minute responses; not sure when I wrote it, but find it!

What would happen if I fed that prompt to a generative AI tool?

You’ll find out at the end of this post.

Bredemarket Suggestion 2: Only feed little bits and pieces to the generative AI tool

The second rule that I follow is that after I write the first draft, I don’t dump the whole thing into a generative AI tool and request a rewrite of the entire block of text.

Instead I dump little bits and pieces into the tool.

  • Such as a paragraph. There are times when I may feed an entire paragraph to a tool, just to look at some alternative ways to say what I want to say.
  • Or a sentence. I want my key sentences to pop. I’ll use generative AI to polish them until they shine.
The “code snippet” (?) rewrite that created the sentence above, after I made a manual edit to the result.
  • Or the title. You can send blog post titles or email titles to generative AI for polishing. (Not my word.) But check them; HubSpot flagged one generated email title as “spammy.”
  • Or a single word. Yes, I know that there are online thesauruses that can take care of this. But you can ask the tool to come up with 10 or 100 suggestions.
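The workflow behind the list above can be sketched in a few lines. This is only an illustration of the "bits and pieces" idea; the polish_with_ai() function is a placeholder for whatever generative AI tool you actually use, not a real API call.

```python
# Sketch of the "bits and pieces" workflow: split a draft into
# paragraphs, then polish only the pieces you choose, one at a time.
def split_into_pieces(draft: str) -> list[str]:
    """Split a draft into paragraphs (blank-line separated)."""
    return [p.strip() for p in draft.split("\n\n") if p.strip()]

def polish_with_ai(piece: str) -> str:
    """Placeholder for a call to a generative AI tool."""
    return f"[polished] {piece}"

draft = "First paragraph.\n\nA key sentence to make pop.\n\nLast paragraph."
pieces = split_into_pieces(draft)

# Polish only the second piece; leave the rest exactly as written.
pieces[1] = polish_with_ai(pieces[1])
print(pieces)
```

The point of the structure is that the human stays in the loop for every piece: nothing goes to the tool, or back into the draft, without a deliberate choice.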

Bredemarket Rule 3: Don’t share confidential information with the tool

Actually, this one isn’t a suggestion. It’s a rule.

Remember the “Hey, I had ChatGPT write this for you” example that I cited above? That actually happened to me. And I don’t know what the person fed as a prompt to ChatGPT, since I only saw the end result, a block of text that included information that was, at the time, confidential.

OK, not THAT confidential. By July_12,_2007_Baghdad_airstrike_unedited_part1.ogv: US Apache helicopterderivative work: Wnt (talk) – July_12,_2007_Baghdad_airstrike_unedited_part1.ogv, Public Domain, https://commons.wikimedia.org/w/index.php?curid=9970435

Did my “helper” feed that confidential information to ChatGPT, allowing it to capture that information and store it in its systems?

If someone at Samsung did that, they’d get into real trouble.

Let me share an example.

  • Let’s say that Bredemarket is developing a new writing service, the “Bredemarket 288 Tweet Writing Service.” (I’m not. It’s not economically feasible. But bear with me.)
  • Now this is obviously an extremely valuable trade secret.
  • If someone scouring generative AI data found out about this offering and beat me to the punch, I would lose $45 billion. Or maybe less.

So how should I have a generative AI tool edit text about my new service?

  1. First, don’t use a Bredemarket account to submit the prompt. Even if I follow all the obfuscation steps that I am about to list below, the mere fact that the prompt was associated with a Bredemarket account links Bredemarket to the data.
  2. Second, if the word “Bredemarket” appears in the prompt, change it to something else. Like my standby WidgetCo, or maybe Wildebeest Inc.
  3. Third, obfuscate other parts of the prompt. Perhaps change 288 (a number closely associated with modern-day Twitter) to something else, and maybe change other things also.

To show how this is done, here’s something I just submitted to a well-known generative AI tool.

Convert the following to eye-catching text with 250 or fewer characters: “Wildebeest Inc. is pleased to announce that we are providing a brand new service for you, the Wildebeest 176 YouTube Video Creation Service. This service, priced at $50, allows me to collaborate with you in creating a YouTube video. Before I start work, I ask you a number of questions about your desired video, including where the video will be shown, the features you want to highlight, and the target audience. I prepare seventeen drafts during the process, and you can review each of them. The result is a YouTube video that meets your needs and brings your company a lack of profits.”

So the generative AI tool provided…a code snippet!

Code snippet

**Wildebeest 176 YouTube Video Creation Service**

**Get a custom YouTube video for $50**

* Collaborate with a video creator
* 17 drafts for review
* Meet your needs and bring profits

**Sign up today!**

Use code with caution.

Now I can edit the text and insert the real words, adjusting them as needed, without revealing my super-secret project.
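The obfuscate-then-restore cycle described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple find-and-replace mapping is enough; real prompts may need more careful handling of word boundaries, letter case, and stand-ins that could collide with ordinary words. The specific mapping entries mirror the Wildebeest example above and are purely illustrative.

```python
# Illustrative mapping from real (secret) terms to harmless stand-ins,
# mirroring steps 2 and 3 above. Order matters on restore: stand-ins
# must not overlap each other, or the reverse pass could misfire.
REDACTIONS = {
    "Bredemarket": "Wildebeest Inc.",  # step 2: swap the company name
    "288": "176",                      # step 3: swap the telltale number
    "Tweet": "YouTube Video",          # step 3: swap other identifying details
}

def obfuscate(prompt: str) -> str:
    """Replace each sensitive term before the prompt leaves your machine."""
    for secret, stand_in in REDACTIONS.items():
        prompt = prompt.replace(secret, stand_in)
    return prompt

def restore(text: str) -> str:
    """Reverse the mapping on the tool's output, re-inserting the real words."""
    for secret, stand_in in REDACTIONS.items():
        text = text.replace(stand_in, secret)
    return text

draft = "Announcing the Bredemarket 288 Tweet Writing Service."
safe = obfuscate(draft)
print(safe)           # Announcing the Wildebeest Inc. 176 YouTube Video Writing Service.
print(restore(safe))  # round-trips back to the original draft
```

Only the obfuscated string ever reaches the tool; the restore step happens locally, after you have reviewed and edited the tool's output.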

Would a temperamental writer really do all of this?

Yes, a temperamental writer would really do all of this.

Despite my (overly?) high opinion of my own written work vs. something a bot would write, in certain circumstances the bot can improve my writing.

And as long as I disclose to a potential Bredemarket client (or an employer) my three suggestions (whoops, two suggestions and one rule) for using generative AI, there should be no ethical or legal problem in using a tool. In a sense it’s like using online grammar correction tools, or a book like a dictionary or thesaurus.

Just fact-check everything. It’s important.

Roberto Mata sued Avianca airlines for injuries he says he sustained from a serving cart while on the airline in 2019, claiming negligence by an employee. Steven Schwartz, an attorney with Levidow, Levidow & Oberman and licensed in New York for over three decades, handled Mata’s representation.

But at least six of the submitted cases by Schwartz as research for a brief “appear to be bogus judicial decisions with bogus quotes and bogus internal citations,” said Judge Kevin Castel of the Southern District of New York in an order….

In late April, Avianca’s lawyers from Condon & Forsyth penned a letter to Castel questioning the authenticity of the cases….

Among the purported cases: Varghese v. China South Airlines, Martinez v. Delta Airlines, Shaboon v. EgyptAir, Petersen v. Iran Air, Miller v. United Airlines, and Estate of Durden v. KLM Royal Dutch Airlines, all of which did not appear to exist to either the judge or defense, the filing said.

Schwartz, in an affidavit, said that he had never used ChatGPT as a legal research source prior to this case and, therefore, “was unaware of the possibility that its content could be false.” He accepted responsibility for not confirming the chatbot’s sources.

Schwartz is now facing a sanctions hearing on June 8.

From https://www.cnn.com/2023/05/27/business/chat-gpt-avianca-mata-lawyers/index.html

Oops…

Postscript: what if I didn’t follow suggestion 1?

Earlier in this post, I imagined the prompt that I would have had to deliver to a generative AI tool to produce the first draft that I actually wrote manually.

Write a blog post about 3 rules for using generative AI, in which the first rule is for a human to write the first draft, the second rule is to only feed small clumps of text to the tool for improvement, and the third rule is to preserve confidentiality. Except don’t call them rules, but instead use a nicer term. And don’t forget to work in the story about the person who wrote something in ChatGPT for me. Oh, and mention how ornery I am, but use three negative adjectives in place of ornery. Oh, and link to the Writing, Writing, Writing subsection of the Who I Am page on the Bredemarket website. And also cite the LinkedIn post I wrote about five minute responses; not sure when I wrote it, but find it!

Are you wondering what I’m wondering?

So I fed the prompt above into Google Bard. The result begins with the next paragraph. I think my version is more entertaining, but the bot version is admittedly more succinct, if less accurate (especially the “ornery” paragraph). And after a lot of thought, I canned this version of the post entirely and came up with a more succinct, mostly manually-created version instead. You can read it here.

3 Guidelines for Using Generative AI

Generative AI is a powerful tool that can help you write better content, faster. However, it’s important to use it responsibly. Here are 3 guidelines to follow when using generative AI:

  1. Start with a human-written draft. Generative AI is best used to improve existing content, not to create it from scratch. Start by writing a draft of your content, then use generative AI to polish it.
  2. Feed small clumps of text to the tool. Generative AI can be inaccurate if you feed it too much text at once. Instead, feed it small clumps of text, and then review and edit the results.
  3. Prioritize confidentiality. Generative AI can be used to create confidential content. If you are using generative AI to create confidential content, be sure to take steps to protect it.

A Story About ChatGPT

I once had a client who asked me to write a blog post about a new product they were launching. I was really busy at the time, so I decided to use ChatGPT to help me write the post. I gave ChatGPT a brief overview of the product, and it generated a draft of the post in about 10 minutes. The post was well-written and informative, and it saved me a lot of time.

Ornery

I’m not sure if you would call me ornery, but I can be a bit of a perfectionist. I like to make sure that my work is the best it can be, and I’m not afraid to ask for help when I need it.

Writing, Writing, Writing

If you want to learn more about how I use generative AI to write content, check out the Writing, Writing, Writing subsection of my website. I also wrote a LinkedIn post about five minute responses. You can find it here: [link to LinkedIn post]

I hope these guidelines help you use generative AI responsibly.

From defund the police to fund the police. But what about technology?

There’s been a tactical reversal by some cities.

Defund the police, then re-fund the police

In November, the Portland, Oregon, City Council unanimously voted to increase police funding, a little over a year after the city reduced police funding in the wake of the Black Lives Matter movement.

Now this month, Oakland, California, has also decided to increase police funding after similarly defunding the police in the past. This vote was not unanimous, but the City Council was very much in favor of the measure.

By Taymaz Valley – https://www.flickr.com/photos/taymazvalley/49974424258, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=91013003

Not that Oakland has returned to the former status quo.

[Mayor Libby] Schaaf applauded the vote in a statement, saying that residents “spoke up for a comprehensive approach to public safety — one that includes prevention, intervention, and addressing crime’s root causes, as well as an adequately staffed police department.”

From https://www.police1.com/patrol-issues/articles/oakland-backtracks-votes-to-add-police-as-crimes-surge-MDirxJZAHV41wyxg/

So while Oakland doesn’t believe that police are the solution to EVERY problem, it feels that police are necessary as part of a comprehensive approach. The city had 78 homicides in 2019, 109 in 2020, and 129 so far in 2021. Granted that it’s difficult to compare year-over-year statistics in the COVID age, but clearly defunding the police hasn’t been a major success.

But if crime is to be addressed by a comprehensive approach including “prevention, intervention, … addressing crime’s root causes, … (and) an adequately staffed police department”…

…what about police technology?

What about police technology?

Portland and Oakland have a lot in common. Not only have they defunded and re-funded the police, but both have participated in the “facial recognition is evil” movement.

Oakland was the third U.S. city to limit the use of facial recognition, back in July 2019.

A city ordinance … prohibits the city of Oakland from “acquiring, obtaining, retaining, requesting, or accessing” facial recognition technology….

From https://www.vice.com/en/article/zmpaex/oakland-becomes-third-us-city-to-ban-facial-recognition-xz

Portland joined the movement later, in September 2020. But when it did, it made Oakland and other cities look like havens of right-wing totalitarianism.

The Portland City Council has passed the toughest facial recognition ban in the US, blocking both public and private use of the technology. Other cities such as Boston, San Francisco, and Oakland have passed laws barring public institutions from using facial recognition, but Portland is the first to prohibit private use.

From https://www.theverge.com/2020/9/9/21429960/portland-passes-strongest-facial-recognition-ban-us-public-private-technology
The Mayor of Portland, Ore. Ted Wheeler. By Naval Surface Warriors – 180421-N-UK248-023, Public Domain, https://commons.wikimedia.org/w/index.php?curid=91766933

Mayor Ted Wheeler noted, “Portlanders should never be in fear of having their right of privacy be exploited by either their government or by a private institution.”

Coincidentally, I was talking to someone this afternoon about some of the marketing work that I performed in 2015 for then-MorphoTrak’s video analytics offering. The market analysis included both government customers (some with acronyms, some without) and potential private customers such as large retail chains.

In 2015, we hadn’t yet seen the movements that would result in dampening both market segments in cities like Portland. (Perpetual Lineup didn’t appear until 2016, while Gender Shades didn’t appear until 2018.)

Flash – ah ah, robber of the universe

But there’s something else that I didn’t imagine in 2015, and that’s the new rage that’s sweeping the nation.

Flash!

By Dynamite Entertainment, Fair use, https://en.wikipedia.org/w/index.php?curid=57669050
Normally I add the music to the end of the post, but I stuck it in the middle this time as a camp break before this post suddenly gets really serious. From https://www.youtube.com/watch?v=LfmrHTdXgK4

Specifically, flash mobs. And not the fun kind, but the “flash rob” kind.

District Attorney Chesa Boudin, who is facing a recall election in June, called this weekend’s brazen robberies “absolutely unacceptable” and was preparing tough charges against those arrested during the criminal bedlam in Union Square….

Boudin said his office was eagerly awaiting more arrests and plans to announce felony charges on Tuesday. He said 25 individuals are still at large in connection with the Union Square burglaries on Friday night….

“We know that when it comes to property crime in particular, sadly San Francisco police are spread thin,” said Boudin. “They’re not able to respond to every single 911 call, they’re only making arrests at about 3% of reported thefts.”

From https://sanfrancisco.cbslocal.com/2021/11/23/smash-and-grab-embattled-san-francisco-district-attorney-chesa-boudin-prosecution/

So there are no arrests in 97% of reported thefts in San Francisco.

To be honest, this is not a “new” rage that is sweeping the nation.

In fact, “flash robs” were occurring as early as 2012 in places like…Portland, Oregon.

If only there were a technology that could recognize flash rob participants and other thieves even when the police WEREN’T present.

A technology that is continuously tested by the U.S. government for accuracy, demographic effects (see this PDF and the individual “report cards” from the 1:1 tests), and other factors.

Does anyone know of any technology that would fill this need?

Perhaps Oakland and Portland could adopt it.

The dangers of removing facial recognition and artificial intelligence from DHS solutions (DHS ICR part four)

And here’s the fourth and final part of my repurposing exercise. See parts one, two, and three if you missed them.

This post is adapted from Bredemarket’s November 10, 2021 submitted comments on DHS-2021-0015-0005, Information Collection Request, Public Perceptions of Emerging Technology. As I concluded my request, I stated the following.

Of course, even the best efforts of the Department of Homeland Security (DHS) will not satisfy some members of the public. I anticipate that many of the respondents to this ICR will question the need to use biometrics to identify individuals, or even the need to identify individuals at all, believing that the societal costs outweigh the benefits.

By Banksy – One Nation Under CCTV, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=3890275

But before undertaking such drastic action, the consequences of following these alternative paths must be considered.

To take an example outside of DHS’s non-criminal travel interests, some people prefer to use human eyewitness identification rather than computerized facial recognition.

By Zhe Wang, Paul C. Quinn, James W. Tanaka, Xiaoyang Yu, Yu-Hao P. Sun, Jiangang Liu, Olivier Pascalis, Liezhong Ge and Kang Lee – https://www.frontiersin.org/articles/10.3389/fpsyg.2015.00559/full, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=96233011

However, eyewitness identification itself has clear issues of bias. The Innocence Project has documented many cases in which eyewitness (mis)identification has resulted in wrongful criminal convictions which were later overturned by biometric evidence.

Archie Williams moments after his exoneration on March 21, 2019. Photo by Innocence Project New Orleans. From https://innocenceproject.org/fingerprint-database-match-establishes-archie-williams-innocence/

Mistaken eyewitness identifications contributed to approximately 69% of the more than 375 wrongful convictions in the United States overturned by post-conviction DNA evidence.

Inaccurate eyewitness identifications can confound investigations from the earliest stages. Critical time is lost while police are distracted from the real perpetrator, focusing instead on building the case against an innocent person.

Despite solid and growing proof of the inaccuracy of traditional eyewitness ID procedures – and the availability of simple measures to reform them – traditional eyewitness identifications remain among the most commonly used and compelling evidence brought against criminal defendants.

Innocence Project, Eyewitness Identification Reform, https://innocenceproject.org/eyewitness-identification-reform/

For more information on eyewitness misidentification, see my November 24, 2020 post on Archie Williams (pictured above) and Uriah Courtney.

Do we really want to dump computerized artificial intelligence and facial recognition, only to end up with manual identification processes that are proven to be even worse?

How the “CSI effect” can obscure the limited role of DNA-based investigative leads

(Part of the biometric product marketing expert series)

People have been talking about the “CSI effect” for decades.

In short, the “CSI effect” is characterized as the common impression that forensic technologies can solve crimes (and must be used to solve crimes) in less than an hour, or within the time of a one-hour television show.

When taken to its extreme, juries may ask why the law enforcement agency didn’t use advanced technological tools to solve that jaywalking case.

Advanced technological tools like DNA, which has been commonly perceived to be the tool that can solve every single crime.

Well, that and video, because video is powerful enough to secure a conviction. But that’s another story.

Can DNA result in an arrest in a Denver homicide case?

A case in point is this story from KDVR entitled “DNA in murder case sits in Denver crime lab for 11 months.”

This is a simple statement of fact, and is not that surprising a statement of fact. Many crime labs are inundated with backlogs of DNA evidence and other forensic evidence that has yet to be tested. And these backlogs ARE creating difficulties in solving crimes such as rapes.

But when you read the article itself, the simple statement of fact is painted as an abrogation of responsibility on the part of law enforcement.

A father is making an emotional plea and putting up $25,000 of his own money to help find his son’s killer.

He is also asking the Problem Solvers to look into the time it has taken for DNA evidence to be tested in this case and others.

Tom O’Keefe said it’s taking too long to get answers and justice.

From this and other statements in the article, a picture emerges of an unsolved crime that can only be solved by the magical tool of DNA. If DNA is applied to this, just like they do on TV, arrests will be made and the killer will be convicted.

So why is it taking so long to do this?

Why is justice not being served?

KDVR is apparently not run by impassioned activists, but by journalists. And it is important from a journalistic perspective to get all sides of the story. Therefore, KDVR contacted the Denver Police Department for its side of the story.

The Denver Police Department has identified all parties involved, and the investigation shows multiple handguns were fired during this incident. While this complex case remains open, which limits details we can provide, we can verify that a significant amount of forensic work has been completed, but some remains. Investigators believe the pending forensic analysis can potentially support a weapon-related charge but will not further the ongoing homicide investigation.

OK, let’s grant that they’re not trying to identify an unknown assailant, since “all parties involved” are known.

But once that DNA is tested, isn’t that going to be the magic tool that provides the police with probable cause to arrest the killer?

Um, no.

Even IF the DNA evidence DOES happen to show a significant probability that an identifiable person committed the homicide, that in itself is not sufficient reason to arrest someone.

Why not?

Because you can’t arrest someone on DNA evidence alone.

DNA evidence can provide an investigative lead, but it has to be corroborated with other evidence in order to secure an arrest and a conviction. (Don’t forget that the evidence has to result in a conviction, and in most of the United States that requires that the evidence show beyond a reasonable doubt that the person committed the crime.)

Why was a serial killer in three European countries never brought to justice, despite overwhelming DNA evidence?

Reasonable schmeasonable.

If DNA ties someone to a crime, then the person committed the crime, right?

Let’s look at the story of a serial killer who terrorized Europe for over a decade, even though ample DNA evidence was found at each of the murder scenes, beginning with this one:

In 1993, a 62-year-old woman was found dead in her house in the town of Idar-Oberstein, strangled by wire taken from a bouquet of flowers discovered near her body.

Nobody had any information on what might have happened to Lieselotte Schlenger. No witnesses, no suspects, no signs of suspicious activity (except for the fact that she’d been strangled to death with a piece of wire, of course). But on a bright teacup near Schlenger, the police found DNA, the only clue to surface at all.

The case went cold, given that the only lead was the DNA of an unknown woman, and there was no match. Yet.

Eight years later, in 2001, there was a match when the same woman’s DNA was found at a murder scene of a strangulation victim in Freiburg, Germany. Police now knew that they were dealing with a serial killer.

But this time, the woman didn’t wait another eight years to strike again.

Five months after the second murder scene, her DNA showed up on a discarded heroin syringe, after a 7-year-old had stepped on it in a playground in Gerolstein. A few weeks later it showed up on an abandoned cookie in a burgled caravan near Bad Kreuznach, like she’d deliberately spat out a Jammy Dodger as a calling card. It was found in a break-in in an office in Dietzenbach, in an abandoned stolen car in Heilbronn, and on two beer bottles and a glass of wine in a burgled bar in Karlsruhe, like she’d robbed the place but stuck around for a few cheeky pints.

And her activities were not confined to Germany.

Over the apparent crime spree, her DNA was sprayed across an impressive 40 crime scenes in Austria, southern Germany, and France, including robberies, armed robberies, and murders.

In 2009, the case took an even more bizarre turn.

Police in France had discovered the burned body of a man, believed to be from an asylum seeker who went missing in 2002. During his application, the man had submitted fingerprints, which the police used to try and confirm his identity. Only, once again, they found the DNA of the phantom.

“Obviously that was impossible, as the asylum seeker was a man and the Phantom’s DNA belonged to a woman,” a spokesperson for the Saarbrücken public prosecutor’s office told Spiegel Online in 2009.

But how could this be?

DNA evidence had tied the woman, or man, or whatever, to six murders and numerous other crimes. There was plenty of evidence to identify the criminal.

What went wrong?

Well, in 2009 police finally figured out how DNA evidence had ended up at all of these crime scenes in three countries.

The man’s death led to an explanation of the case: there was no serial killer, and the DNA could be traced to a woman working in a packing center specializing in medical supplies. It was all down to DNA contamination.

Well, couldn’t that packing woman be convicted of the serial murders and other crimes, based upon the DNA evidence?

No, because there was no other evidence linking the woman to the crimes, and there was certainly “reasonable doubt” (or the European criminal justice equivalent), given that the woman could not also have been the dead male asylum seeker.

This is why DNA is only an investigative lead, and not evidence in and of itself.

But the Innocence Project always believes that DNA is authoritative evidence, right?

Even those who champion the use of DNA admit this.

If you look through the files of people exonerated by the Innocence Project, you find a common thread in many of them.

Much of the evidence gathered before the suspect’s original conviction indicated that the suspect was NOT the person who committed the crime. Maybe the family members testified that the suspect was at home the entire time and couldn’t have committed the crime in question. Or maybe the suspect was in another city.

However, some piece of evidence was so powerful that the person was convicted anyway. Perhaps it was eyewitness testimony, or perhaps something else, but in the end the suspect was convicted.

Eventually the Innocence Project got involved, and subsequent DNA testing indicated that the suspect was NOT the person who committed the crime.

This in and of itself didn’t PROVE that the person was innocent, but the DNA test aligned with much of the other evidence that had previously been collected. It was enough to cast a reasonable doubt on the conviction, allowing the improperly convicted suspect to go free.

But there are some cases in which the Innocence Project says that even DNA evidence is not to be trusted.

Negligence in the Baltimore Police Department’s crime lab tainted DNA analysis in an unknown number of criminal cases for seven years and raises serious questions about other forensic work in the lab, the Innocence Project said today in a formal allegation that the state is legally required to investigate.

DNA contamination, the same thing that caused the issues in Europe, also caused issues in Baltimore.

And there may be other explanations for how a person’s DNA ended up at a crime scene. Perhaps a police officer was careless and left his or her DNA at a crime scene. Perhaps someone was at a crime scene and left DNA evidence, even though that person had nothing to do with the crime.

In short, a high probability DNA match, in and of itself, proves nothing.

Investigative leads and reasonable doubt are very important considerations, even if they don’t fit into a one-hour TV show script.

Investigative leads and DNA booking stations

(Part of the biometric product marketing expert series)

A July Bredemarket post on Facebook has garnered some attention in September.

I wanted to answer some questions about rapid DNA use in a booking station, how (and when) DNA is used in booking (arrests), what an “investigative lead” is, and whether acquiring DNA at booking is Constitutional.

(TL;DR on the last question is “yes,” per Maryland v. King.)

Are rapid DNA booking stations a Big Brother plot?

The post in question was a Facebook post to the Bredemarket Identity Firm Services Facebook group. I posted this way back in July, when Thermo Fisher Scientific became the second rapid DNA vendor (of two rapid DNA vendors; ANDE is the other) whose system was approved by the U.S. Federal Bureau of Investigation (FBI) for use as a law enforcement booking station.

When I shared this on Facebook, I received some concerned comments:

“Big brother total control”

“Is this Constitutional??? Will the results of this test hold up in courtrooms???”

I’ll address the second question later: not just in regard to rapid DNA, but to DNA in general. At this point, however, I will go ahead and say that the use of rapid DNA in booking was authorized legislatively by the Rapid DNA Act of 2017. This was followed by over three years of procedural stuff until rapid DNA booking station use was authorized this year.

To accurately state what “rapid DNA booking station use” actually means, let me refer to the FBI’s language, starting with the purpose:

The FBI Laboratory Division has been working with the FBI Criminal Justice Information Services (CJIS) Division and the CJIS Advisory Policy Board (CJIS APB) Rapid DNA Task Force to plan the effective integration of Rapid DNA into the booking station process.

By way of definition, a “booking station” is a computer that processes individuals who are “booked,” or arrested. The FBI’s plan was that (when authorized by federal, state, or local law) when an arrested individual’s fingerprints were captured, the individual’s DNA would be captured at the same time. (Again, only when authorized.)

The use of the term “reference sample buccal (cheek) swab” is intentional. The FBI’s current development and validation efforts have been focused on the DNA samples obtained from known individuals (e.g., persons under arrest). Because known reference samples are taken directly from the individual, they contain sufficient amounts of DNA, and there are no mixed DNA profiles that would require a scientist to interpret them. For purposes of uploading or searching CODIS, Rapid DNA systems are not authorized for use on crime scene samples.

“CODIS,” by the way, is the Combined DNA Index System, a combination of federal, state, and local systems.

“Rapid DNA” is an accelerated, automated DNA method that can process DNA samples in less than two hours, as opposed to the more traditional DNA processes that can take a lot longer.

The FBI is NOT ready to use rapid DNA to solve crimes, although some local police agencies have chosen to do so. And until February of this year, the FBI was not ready to use rapid DNA in the booking process either.

So what has been authorized?

The Bureau recognizes that National DNA Index System (NDIS) approval of the Rapid DNA Booking Systems and training of law enforcement personnel using the approved systems are integral to ensuring that Rapid DNA is used in a manner that maintains the quality and integrity of CODIS and NDIS.

Rapid DNA Booking System(s) approved for use at NDIS by a law enforcement booking station are listed below.

ANDE 6C Series G (effective February 1, 2021)

RapidHIT™ ID DNA Booking System v1.0 (effective July 1, 2021) 

If you read the FBI rapid DNA page, you can find links to a number of forensic, security, and other standards that have to be followed when using rapid DNA in a booking environment.

But those aren’t the only restrictions on rapid DNA use.

Can ANY law enforcement agency use rapid DNA in booking?

Um, no.

According to the National Conference of State Legislatures (2013; see PDF), not all states authorize the taking of DNA after an arrest. As of 2013, 20 states did NOT allow the taking of DNA from individuals who had been arrested but not convicted. And of the 30 remaining states, some (such as Connecticut) only allowed taking of DNA for “serious felonies,” some (such as California) for all felonies, and various mixtures in between. Oklahoma, for example, only allowed taking of DNA for “aliens unlawfully present under federal immigration law.”

Now, of course, a rogue police officer could take your DNA when not legally authorized to do so. Then again, a rogue restaurant employee could put laxatives in your food; that doesn’t mean we outlaw laxatives.

An “investigative lead”

So let’s say that you’re arrested for a crime, and your state allows the taking of DNA for your crime at arrest, and your local law enforcement agency has a rapid DNA instrument.

Now let’s assume that your DNA is searched against a DNA database of unsolved crimes, and your DNA matches a sample from another crime. What happens next?

If there is a match, police will likely want to take a closer look.

Wait a minute. There’s a DNA match! Doesn’t that mean that the police can swoop in and arrest the individual, and the individual is immediately convicted?

Um, no. Stop trusting your TV.

It takes more than DNA to convict a person of a crime.

While DNA can provide an investigative lead, DNA in and of itself is not sufficient to convict an individual. The DNA evidence usually has to be supported by additional evidence.

Especially since there may be other explanations of how the DNA got there.

In 2011, Adam Scott’s DNA matched with a sperm sample taken from a rape victim in Manchester—a city Scott, who lived more than 200 miles away, had never visited. Non-DNA evidence subsequently cleared Scott. The mixup was due to a careless mistake in the lab, in which a plate used to analyze Scott’s DNA from a minor incident was accidentally reused in the rape case.

Then there’s the uncomfortable and inconvenient truth that any of us could have DNA present at a crime scene—even if we were never there. Moreover, DNA recovered at a crime scene could have been deposited there at a time other than when the crime took place. Someone could have visited beforehand or stumbled upon the scene afterward. Alternatively, their DNA could have arrived via a process called secondary transfer, where their DNA was transferred to someone else, who carried it to the scene.

But there is a DNA case that was (originally) puzzling. Actually, a whole bunch of DNA cases.

There is an interesting case, known as the Phantom of Heilbronn, that dates from 1993 in Austria, France and Germany. From that year the DNA of an unknown female was detected at crime scenes in those countries, including at six murder scenes, one of the victims being a female police officer from Heilbronn, Germany. Between 1993 and March 2009 the woman’s DNA was detected at 40 crime scenes which ranged from murder to burglaries and robberies. The DNA was found on items ranging from a biscuit to a heroin syringe to a stolen car.

Then it got really weird.

In March 2009 investigators discovered the same DNA on the burned body of a male asylum-seeker in France. Now this presented something of an anomaly: the corpse was male but the DNA was of a female.

You guessed it; it was the swabs themselves that were contaminated.

So a DNA match is just the start of an investigative process, but it could provide the investigative lead that eventually leads to the conviction of an individual.

Perhaps you’ve noticed that I use the phrase “investigative lead” a lot when talking about DNA and about facial recognition. Trust me, it’s important.

But is the taking of DNA at booking Constitutional?

Obviously this is a huge question, because technical ability to do something does not automatically mean that you are Constitutionally authorized to do so. There is, after all, Fourth Amendment language protecting us against “unreasonable searches and seizures.”

Is the taking of DNA from arrestees who have not been convicted (assuming state law allows it) reasonable, or unreasonable?

Alonzo Jay King, Jr. had a vested interest in this question.

Alonzo Jay King Jr…was arrested in 2009 on assault charges. Before he was convicted of that crime, police took a DNA sample pursuant to Maryland’s new law allowing for such collections at the time of arrest in certain offenses….

I want to pause right here to make sure that the key point is highlighted. King, an arrestee who had not been convicted at the time of any crime, was compelled to provide evidence. At the time of arrest, collection of certain types of evidence (such as fingerprints) is “reasonable.” But collection of certain other types of evidence (such as a forced confession) is “unreasonable.”

So King’s DNA was taken and was searched against a Maryland database of DNA from unsolved crimes. You won’t believe what happened next! (Actually, you will.)

The DNA matched a sample from an unsolved 2003 rape case, and Mr. King was convicted of that crime.

Sentenced to life in prison, actually.

Wicomico County Assistant State’s Attorney Elizabeth L. Ireland said she requested the court impose a life sentence on King, not only because of his past criminal convictions, but also because it turned out that he was a friend of the victim’s family. She said this proved King was a continuing danger to the community.

Before you say, “well, if he was the rapist, he should be imprisoned, legal niceties notwithstanding,” think of the implications of that statement. The entire U.S. legal system is based upon the premise that it is better for a guilty person to mistakenly go free than for an innocent person to mistakenly be punished.

And if that doesn’t sink in…what if YOU were arrested and convicted unlawfully? What if a plate analyzing YOUR DNA wasn’t cleaned properly, and you were unjustly convicted of rape? Or what if a confession were coerced from YOU, and used to convict you?

So King’s question was certainly important, regardless of whether or not he actually committed the rape for which he was convicted.

King therefore appealed on Fourth Amendment grounds, the Maryland Court of Appeals overturned his conviction (PDF), and the State of Maryland brought the case to the U.S. Supreme Court in 2013 (Maryland v. King). In a close 5-4 decision (PDF) in which both conservatives and liberals were on both sides of the argument, the Court ruled that the taking of DNA from arrestees WAS Constitutional.

But that wasn’t the end of the argument, because a new case arose in the state of California. However, the California Supreme Court ruled in 2018 that the practice was allowed in that state.

So the taking of DNA at booking is not only authorized (in some states, for some charges), it’s also Constitutional. (Although the Supreme Court’s opinion is still widely debated.)

So anyone who gets arrested for a felony in my home state of California should be ready for a buccal (cheek) swab.

A second “biometrics is evil” post (Amazon One)

This is a follow-up to something I wrote a couple of weeks ago. I concluded that earlier post by noting that when you say that something needs to be replaced because it is bad, you need to evaluate the replacement to see if it is any better…or worse.

First, the recap

Before moving forward, let me briefly recap my points from the earlier post. If you like, you can read the entire post here.

  • Amazon is incentivizing customers ($10) to sign up for its Amazon One palm print program.
  • Amazon is not the first company to use biometrics to speed retail purchases. Pay By Touch and the University of Maryland Dining Hall have already done this, as has every single store that lets you use Apple Pay, Google Pay, or Samsung Pay.
  • Amazon One is not only being connected in the public eye to unrelated services such as Amazon Rekognition, and to unrelated studies such as Gender Shades (which dealt with classification, not recognition), but has also been accused of “asking people to sell their bodies.” Yet companies that offer similar services are not being demonized in the same way.
  • If you don’t use Amazon One to pay for your purchases, that doesn’t necessarily mean that you are protected from surveillance. I’ll dive into that in this post.

Now that we’re caught up, let’s look at the latest player to enter the Amazon One controversy.

Yes, U.S. Senators can be bipartisan

If you listen to the “opinion” news services, you get the feeling that the United States Senate has devolved into two warring factions that can’t get anything done. But Senators have always worked together (see Edward Kennedy and Dan Quayle), and they continue to work together today.

Specifically, three Senators are working together to ask Amazon a few questions: Bill Cassidy, M.D. (R-LA), Amy Klobuchar (D-MN), and Jon Ossoff (D-GA).

And naturally they issued a press release about it.

Now arguments can be made about whether Congressional press releases and hearings merely constitute grandstanding, or whether they are serious attempts to better the nation. Of course, anything that I oppose is obviously grandstanding, and anything I support is obviously a serious effort.

But for the moment let’s assume that the Senators have serious concerns about the privacy of American consumers, and that the nation demands answers to these questions from Amazon.

Here are the Senators’ questions, from the press release:

  1. Does Amazon have plans to expand Amazon One to additional Whole Foods, Amazon Go, and other Amazon store locations, and if so, on what timetable? 
  2. How many third-party customers has Amazon sold (or licensed) Amazon One to? What privacy protections are in place for those third parties and their customers?
  3. How many users have signed up for Amazon One? 
  4. Please describe all the ways you use data collected through Amazon One, including from third-party customers. Do you plan to use data collected through Amazon One devices to personalize advertisements, offers, or product recommendations to users? 
  5. Is Amazon One user data, including the Amazon One ID, ever paired with biometric data from facial recognition systems? 
  6. What information do you provide to consumers about how their data is being used? How will you ensure users understand and consent to Amazon One’s data collection, storage, and use practices when they link their Amazon One and Amazon account information?
  7. What actions have you taken to ensure the security of user data collected through Amazon One?

So when will we investigate other privacy-threatening technologies?

In a sense, the work of these three Senators should be commended, because if Amazon One is not implemented properly, serious privacy breaches could occur that would adversely impact American citizens. And this is the reason why many states and municipalities have moved to restrict the use of biometrics by private businesses.

And we know that Amazon is evil, because Slate said so back in January 2020.

The online bookseller has evolved into a giant of retail, resale, meal delivery, video streaming, cloud computing, fancy produce, original entertainment, cheap human labor, smart home tech, surveillance tech, and surveillance tech for smart homes….The company’s “last mile” shipping operation has led to burnout, injuries, and deaths, all connected to a warehouse operation that, while paying a decent minimum wage, is so efficient in part because it treats its human workers like robots who sometimes get bathroom breaks.

But why stop with Amazon? After all, Slate’s list included 29 other companies (while Amazon tops the list, other “top”-ranked companies include Facebook, Alphabet, Palantir Technologies, and Uber), to say nothing of entire industries that are capable of massive privacy violations.

Privacy breaches are not just tied to biometric systems, but can be tied to any system that stores private data. Restricting or banning biometric systems won’t solve anything, since all of these abuses could potentially occur on other systems.

  • When will the Senators ask these same questions to Apple, Google (part of the aforementioned Alphabet), and Samsung to find out when these companies will expand their “Pay” services? They won’t even have to ask all seven questions, because we already know the answer to question 5.
  • Oh, and while we’re at it, what about Mastercard, Visa, American Express, Discover, and similar credit card services that are often tied to information from our bank accounts? How do these firms personalize their offerings? Who can buy all that data?
  • And while we’re looking at credit cards, what about the debit cards issued by the banks, which are even more vulnerable to abuse? Let’s have the banks publicly reveal all the ways in which they protect user data.
  • You know, you have to watch out for those money orders also. How often do money order issuers ask consumers to show their government ID? What happens to that data?
  • Oh, and what about those gift cards that stores issue? What happens to the location and purchase data that is collected for those gift cards?
  • When people use cash to pay for goods, what is the resolution of the surveillance cameras that are trained on the cash registers? Can those surveillance cameras read the serial numbers on the bills that are exchanged? What assurances can the stores give that they are not tracking those serial numbers as they flow through the economy?

If you think that it’s silly to shut down every single payment system that could result in a privacy violation…you’re right.

Obviously if Amazon is breaking federal law, it should be prosecuted accordingly.

And if Amazon is breaking state law (such as Illinois BIPA law), then…well, that’s not the Senators’ business, that’s the business of class action lawyers.

But now the ball is in Amazon’s court, and Amazon will provide thousands of pages of documents, a few short answers, a response indicating that the Senators are asking for confidential information on future product plans, or (unlikely with Amazon, but possible with other companies) a reply stating that the Senators can go pound sand.

In any case, the “Amazon is evil” campaign will continue.

Today’s “biometrics is evil” post (Amazon One)

I can’t recall who recorded it, but there’s a radio commercial heard in Southern California (and probably nationwide) that intentionally ridicules people who willingly give up their own personally identifiable information (PII) for short-term gain. In the commercial, both the husband and the wife willingly give away all sorts of PII, including I believe their birth certificates.

While voluntary surrender of PII happens all the time (when was the last time you put your business card in a drawing bowl at a restaurant?), people REALLY freak out when the information that is provided is biometric in nature. But are the non-biometric alternatives any better?

TechCrunch, Amazon One, and Ten Dollars

TechCrunch recently posted “Amazon will pay you $10 in credit for your palm print biometrics.”

If you think that the article details an insanely great way to make some easy money from Amazon, then you haven’t been paying attention to the media these last few years.

The article begins with a question:

How much is your palm print worth?

The article then describes how Amazon’s brick-and-mortar stores in several states have incorporated a new palm print scanner technology called “Amazon One.” This technology reads both friction ridge and vein information from a shopper’s palm. That biometric data is then associated with a pre-filed credit card, allowing the shopper to simply wave a palm to pay for the items in the shopping cart.
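For readers who want to picture the mechanics, here is a minimal conceptual sketch of how such a palm-pay flow could work: a scan produces a numeric template, the template is matched against enrolled templates, and a sufficiently strong match maps to a stored payment token. This is purely illustrative; Amazon has not published its actual matching algorithm, and the `PalmPayDirectory` class, the cosine-similarity matcher, and the threshold value below are all my own hypothetical stand-ins.

```python
# Hypothetical sketch of a biometric palm-pay flow (NOT Amazon's actual design).
# A palm scan yields a numeric template; we match it against enrolled templates
# and return the stored card token only when similarity clears a threshold.
from dataclasses import dataclass, field


@dataclass
class PalmPayDirectory:
    threshold: float = 0.9                          # minimum match score
    _enrolled: list = field(default_factory=list)   # (template, card_token) pairs

    @staticmethod
    def _similarity(a, b):
        # Cosine similarity as a toy stand-in for a real palm matcher.
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    def enroll(self, template, card_token):
        # Store the shopper's template alongside a tokenized payment card.
        self._enrolled.append((template, card_token))

    def pay(self, scanned_template):
        # Find the best-matching enrolled template; charge only above threshold.
        best = max(
            self._enrolled,
            key=lambda e: self._similarity(e[0], scanned_template),
            default=None,
        )
        if best and self._similarity(best[0], scanned_template) >= self.threshold:
            return best[1]
        return None  # no confident match: no charge
```

In use, an enrolled shopper whose fresh scan closely resembles the stored template gets their card token back, while an unenrolled palm returns `None`. The real design questions (template storage, revocation, consent) are exactly what the critics quoted below are worried about.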

There is nothing new under the sun

Amazon One is the latest take on processes that have been implemented several times before. I’ll cite three examples.

Pay By Touch. The first one that comes to my mind is Pay By Touch. While the management of the company was extremely sketchy, the technology (provided by Cogent, now part of Thales) was not. In many ways the business idea was ahead of its time, and it had to deal with challenging environmental conditions: the fingerprint readers used for purchases were positioned near the entrances/exits to grocery stores, which could get really cold in the winter. Couple this with the elderly population that used the devices, and it was sometimes difficult to read the fingers themselves. Yet, this relatively ancient implementation is somewhat similar to what Amazon is doing today.

University of Maryland Dining Hall. The second example occurred to me because it came from my former employer (MorphoTrak, then part of Safran and now part of IDEMIA), and was featured at a company user conference for which I coordinated speakers. There’s a video of this solution, but sadly it is not public. I did find an article describing the solution:

With the new system students will no longer need a UMD ID card to access their own meals…

Instead of pulling out a card, the students just wave their hand through a MorphoWave device. And this allows the students to pay for their meals QUICKLY. Good thing when you’re hungry.

This Pay and That Pay. But the most common example that everyone uses is Apple Pay, Google Pay, Samsung Pay, or whatever “pay” system is supported on your smartphone. Again, you don’t have to pull out a credit card or ID card. You just have to look at your phone or swipe your finger on the phone, and payment happens.

Amazon One is the downfall of civilization

I don’t know if TechCrunch editorialized against Pay By Touch or [insert phone vendor here] Pay, and it probably never heard of the MorphoWave implementation at the University of Maryland. But Amazon clearly makes TechCrunch queasy.

While the idea of contactlessly scanning your palm print to pay for goods during a pandemic might seem like a novel idea, it’s one to be met with caution and skepticism given Amazon’s past efforts in developing biometric technology. Amazon’s controversial facial recognition technology, which it historically sold to police and law enforcement, was the subject of lawsuits that allege the company violated state laws that bar the use of personal biometric data without permission.

Oh well, at least TechCrunch didn’t say that Amazon was racist. (If you haven’t already read it, please read the Security Industry Association’s “What Science Really Says About Facial Recognition Accuracy and Bias Concerns.” Unless you don’t like science.)

OK, back to Amazon and Amazon One. TechCrunch also quotes Albert Fox Cahn of the Surveillance Technology Oversight Project.

People Leaving the Cities, photo art by Zbigniew Libera, imagines a dystopian future in which people have to leave dying metropolises. By Zbigniew Libera – https://artmuseum.pl/pl/kolekcja/praca/libera-zbigniew-wyjscie-ludzi-z-miast, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=66055122.

“The dystopian future of science fiction is now. It’s horrifying that Amazon is asking people to sell their bodies, but it’s even worse that people are doing it for such a low price.”

“Sell their bodies.” Isn’t it even MORE dystopian when people “give their bodies away for free” when they sign up for Apple Pay, Google Pay, or Samsung Pay? While the Surveillance Technology Oversight Project (acronym STOP) expresses concern about digital wallets, there is a significant lack of horror in its description of them.

Digital wallets and contactless payment systems like smart chips have been around for years. The introduction of Apple Pay, Amazon Pay, and Google Pay have all contributed to the e-commerce movement, as have fast payment tools like Venmo and online budgeting applications. In response to COVID-19, the public is increasingly looking for ways to reduce or eliminate physical contact. With so many options already available, contactless payments will inevitably gain momentum….

Without strong federal laws regulating the use of our data, we’re left to rely on private companies that have consistently failed to protect our information. To prevent long-term surveillance, we need to limit the data collected and shared with the government to only what is needed. Any sort of monitoring must be secure, transparent, proportionate, temporary, and must allow for a consumer to find out about or be alerted to implications for their data. If we address these challenges now, at a time when we will be generating more and more electronic payment records, we can ensure our privacy is safeguarded.

So STOP isn’t calling for the complete elimination of Amazon Pay. But apparently it wants to eliminate Amazon One.

Is a world without Amazon One a world with less surveillance?

Whenever you propose to eliminate something, you need to look at the replacement and see if it is any better.

In 1998, Fox fired Bill Russell as the manager of the Los Angeles Dodgers. He had a win-loss percentage of .538. His replacement, Glenn Hoffman, lasted less than a season and had a percentage of .534. Hoffman’s replacement, true baseball man Davey Johnson, compiled a percentage of .503 over the next two seasons before he was fired. Should have stuck with Russell.

Anyone who decides (despite the science) that facial recognition is racist is going to have to rely on other methods to identify criminals, such as witness identification. Witness identification has documented inaccuracies.

And if you think that elimination of Amazon One from Amazon’s brick-and-mortar stores will lead to a privacy nirvana, think again. If you don’t use your palm to pay for things, you’re going to have to use a credit card, and that data will certainly be scanned by the FBI and the CIA and the BBC, B. B. King, and Doris Day. (And Matt Busby, of course.) And even if you use cash, the only way that you’ll preserve any semblance of your privacy is to pay anonymously and NOT tie the transaction to your Amazon account.

And if you’re going to do that, you might as well skip Whole Foods and go straight to Dollar General. Or maybe not, since Dollar General has its own app. And no one calls Dollar General dystopian. Wait, they do: “They tend to cluster, like scavengers feasting on the carcasses of the dead.”

I seem to have strayed from the original point of this post.

But let me sum up. It appears that biometrics is evil, Amazon is evil, and Amazon biometrics are Double Secret Evil.