Voracious Crawlers: Cloudflare and AI

(Animals strike curious poses. Imagen 4.)

Matthew Prince of Cloudflare recently described an alleged imbalance affecting content creators, and what Cloudflare and others are doing about it. It turns out that today’s AI web crawlers behave differently than yesterday’s search web crawlers.

The revolution 

Prince began his article by describing a win-win deal facilitated by a content-gathering company known as Google. Google’s web crawlers would acquire site content, but the content creators would win also.

“The deal that Google made with content creators was simple: let us copy your content for search, and we’ll send you traffic. You, as a content creator, could then derive value from that traffic in one of three ways: running ads against it, selling subscriptions for it, or just getting the pleasure of knowing that someone was consuming your stuff.”

Sounds like a win-win to me.

The new power generation

What Prince didn’t say is that not everyone was thrilled with the arrangement.

Let’s start with Spain, and the relationship between Spanish online publications and Google Noticias (Google News). 

(Image via Imagen 4)

The publishers thought they were getting the raw end of the deal, since Google would present summaries of the publishers’ content on Google pages, but no one would go to the publishers’ pages. Why bother? Google had shared the important stuff.

So Spain passed a law in 2014 requiring Google Noticias to pay…and Google Noticias shut down in Spain in December of that year.

“Reacting to a law that requires news sites in Spain to charge for their content, Google shut down its Google News service in the country….The tech company and other news aggregators would face steep fines if they publish headlines and abstracts without paying.”

At the time, I cast this as a battle between the nations and plucky individuals fighting for freedom…ignoring the fact that Google (cited twice below) was more powerful than some nations.

“So it’s possible for individuals to flout the laws of nations. The nations, however, are fighting back. Spain has passed content laws that are forcing Google to shut down Google Noticias in Spain. Swedish laws have brought the Pirate Bay offline. Russia is enacting laws that are forcing Google (again) to take its engineers out of Russia.”

As an aside, it’s worth noting that several nations subsequently banded together to implement GDPR, shifting more power to the governments.

Oh, and the Spanish law was changed to conform with European Union copyright law. As a result, Google Noticias came back online in Spain in 2022, eight years later.

3rdeyegirl (bear with me here)

Back to Cloudflare’s Matthew Prince, who talked about a brand new voracious web crawler that didn’t feel like a win-win. Rather than presenting links to outside content, or summaries of content accompanied by prominent links, AI tools (including Google’s own) would simply present the summaries, burying the links.

“Google itself has changed. While ten years ago they presented a list of links and said that success was getting you off their site as quickly as possible, today they’ve added an answer box and more recently AI Overviews which answer users’ questions without them having to leave Google.com. With the answer box they reported that 75 percent of queries were answered without users leaving Google. With the more recent launch of AI Overviews it’s even higher.”

(Image via Imagen 4)

So the new AI-sponsored web crawlers and their implementation effectively serve to keep readers in the walled gardens of OpenAI, Google, Microsoft, and the rest.

Walled gardens again? It’s just human nature. Having to click on a link to go somewhere else causes friction. This very post links to Cloudflare’s article, my old Empoprise-BI blog, and a multitude of other sources. And I bet you won’t click on ANY of those links to view the other content. I know. WordPress tells me.

As Prince himself puts it: 

“…increasingly we aren’t consuming originals, we’re consuming derivatives.”

Ran out of band names

Back in 2023 I already noted a move to block AI web crawlers.

So what’s Cloudflare doing about the AI web crawlers that are sucking information away with little or no return to the content creators?

Blocking them by default.

“That changes today, July 1, what we’re calling Content Independence Day. Cloudflare, along with a majority of the world’s leading publishers and AI companies, is changing the default to block AI crawlers unless they pay creators for their content. That content is the fuel that powers AI engines, and so it’s only fair that content creators are compensated directly for it.”
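Cloudflare's default block happens at the network level, but a site owner can also state the same preference in robots.txt. Here is a minimal sketch, assuming a hypothetical robots.txt, a placeholder example.com URL, and the GPTBot user agent token that OpenAI publishes for its crawler. It uses Python's standard urllib.robotparser module to check which crawlers the file permits.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask one AI crawler to stay out, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/post"  # placeholder URL
    for agent in ("GPTBot", "Googlebot"):
        print(agent, can_crawl(agent, page))
```

Keep in mind that robots.txt is only a request. A crawler can ignore it, which is why blocking at the network level is a bigger deal.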

Cloudflare envisions a marketplace in which AI companies will pay creators for high quality content.

However, today’s content creators may face the same challenges that Spanish periodicals faced from 2014 to 2022. They may prevent their content from being ripped off…but no one will ever know because the people who go to ChatGPT will never learn about them.

Because in the end, most people are happy with derived content.

But your hungry people want to hear from you.

If you are a tech marketer who needs help creating content, talk to Bredemarket.

Content for tech marketers.

Ubiquity Via Focus, The Recap

June 2025 is almost over, so I can evaluate my performance against my goal.

  • Did I focus? Somewhat, both in my professional and my personal life.
  • Did I achieve ubiquity? Nope. But the blog has enjoyed record impressions and visitors. And would have achieved more if I hadn’t run afoul of the search engine gatekeepers.
  • Did I improve Bredemarket’s “capabilities to serve you”? Yes.

So what is my goal for July? Stay tuned.

Ubiquity Via Focus.

Mistaken Identity

I generated this picture in Imagen 4 after reading an AI art prompt suggestion from Danie Wylie. (I have mentioned her before in the Bredemarket blog…twice.)

The AI exercise raises a question.

What if you are in the middle of an identity verification or authentication process, and only THEN discover that a fraudster is impersonating you at that very moment?

Applying Common Sense to Employment Fraud

Jobseekers need to know their potential employer when something about a job opportunity doesn’t feel right. And there are ways to do that.

Trusting the person who says to trust your gut

I’ve previously talked about how common sense can minimize the chances of being fooled by a deepfake.

But common sense can help prevent other types of fraud such as employment fraud, as noted by Rachel Lund, chief risk officer with Sandia Area Federal Credit Union.

“Trust your gut- if it feels off, it probably is.”

But can we trust Lund? 

Using search engines for employment fraud scam research

Let’s look at another tip of hers:

“Research the company: Google “[Company Name] + Scam” and see if anything comes up.”

Although you can use Bing. Google isn’t the only search engine out there.
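If you want to repeat Lund's tip for several companies, here is a small sketch that builds the search URL for either engine. The function name is made up for illustration, and putting the company name in quotation marks is my assumption about how to phrase the query, not part of Lund's advice.

```python
from urllib.parse import urlencode

# Public search endpoints for the two engines mentioned above.
SEARCH_ENGINES = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
}


def scam_search_url(company: str, engine: str = "google") -> str:
    """Build a '"<company>" scam' search URL for the chosen engine."""
    query = f'"{company}" scam'
    return f"{SEARCH_ENGINES[engine]}?{urlencode({'q': query})}"


if __name__ == "__main__":
    for engine in SEARCH_ENGINES:
        print(scam_search_url("Sandia Area Federal Credit Union", engine))
```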

So I entered “Sandia Area Federal Credit Union Scam” into Bing…and found out about its warnings about scams.

From Microsoft.

As far as Bing is concerned, Sandia Area Federal Credit Union is not a scammer itself.

But Bing (and Google) are old-fashioned dinosaurs.

Using generative AI for employment fraud scam research

So I clicked on the tab for Copilot results. (ChatGPT isn’t the only generative AI tool out there.)

From Microsoft.

Well, it’s good to know that a regulated credit union isn’t a scammer.

So credit unions are fine

But what about something with a slightly sleazier reputation…like stuffing envelopes?

From Microsoft.

OK, Copilot isn’t hot on envelope stuffing opportunities. 

So envelope stuffing isn’t fine

But what if we get personal?

From Microsoft.

TL;DR: “That’s not us.”

Know your business. Know your employer.

How to Prepare for Your 30 Minute Meeting With Bredemarket

You are the CMO, marketing leader, or other leader at an identity, biometric, or technology firm.

You’ve made the decision to work with Bredemarket to create your content, proposal, or analysis.

You’ve gone to the https://bredemarket.com/cpa/ page and scheduled a “Free 30 minute content needs assessment” with me on my Calendly calendar. We will talk via Google Meet.

You’ve answered the preliminary questions I asked in the meeting request.

So…what now?

I will make it real simple. I will ask you a single simple question:

“Why?”

  • Why does your company exist, and why is it really great and why are your competitors terrible?
  • Same with your products and services. Why are they great and why are the competing ones terrible?
  • Or maybe the competitors and their products/services are great and YOURS are terrible. It’s a private call, and we can talk freely.

We have 30 minutes to chat, and at the end of that time you and I will jointly determine:

  • Why we should (or should not) work together
  • How we should work together
  • What I will do, and what you will do

See you soon.

Deepfake App Secret Purposes and Age Non-verification

It’s nearly impossible to battle a tidal wave.

CBS News recently reported on the attempts of Meta and others to remove advertisements for “nudify” apps from their platforms. The intent of these apps is to take pictures of existing people (for example, “Scarlett Johansson and Anne Hathaway”) and create deepfake nudes based on the source material.

Two versions of “what does this app do”

But the apps may present their purposes differently when applying for Apple App Store and Google Play Store approval.

“The problem with apps is that they have this dual-use front where they present on the app store as a fun way to face swap, but then they are marketing on Meta as their primary purpose being nudification. So when these apps come up for review on the Apple or Google store, they don’t necessarily have the wherewithal to ban them.”

How old are you? If you say so

And there’s another problem. While the apps are marketed to adult men, their users extend beyond that.

“CBS News’ 60 Minutes reported on the lack of age verification on one of the most popular sites using artificial intelligence to generate fake nude photos of real people. 

“Despite visitors being told that they must be 18 or older to use the site…60 Minutes was able to immediately gain access to uploading photos once the user clicked “accept” on the age warning prompt, with no other age verification necessary.”

We’ve seen this so-called “age verification” before.

From another age-regulated industry.

But if whack-a-mole fighting against deepfake generators won’t work, what will?

I don’t have the answer. Even common sense won’t help here.

Veo 3 and Deepfakes

(Not a video, but a still image from Imagen 4)

My Google Gemini account does not include access to Google’s new video generation tool Veo 3. But I’m learning about its capabilities from sources such as TIME magazine.

Which claims to be worried.

“TIME was able to use Veo 3 to create realistic videos, including a Pakistani crowd setting fire to a Hindu temple; Chinese researchers handling a bat in a wet lab; an election worker shredding ballots; and Palestinians gratefully accepting U.S. aid in Gaza. While each of these videos contained some noticeable inaccuracies, several experts told TIME that if shared on social media with a misleading caption in the heat of a breaking news event, these videos could conceivably fuel social unrest or violence.”

However, TIME notes that the ability to create fake videos has existed for years. So why worry now?

“Veo 3 videos can include dialogue, soundtracks and sound effects. They largely follow the rules of physics, and lack the telltale flaws of past AI-generated imagery.”

Some of this could be sensationalism. After all, simple text can communicate misinformation.

And you can use common sense to detect deepfakes…sometimes.

Mom’s spaghetti 

Then again, some of the Veo 3 deepfakes look pretty good. Take this example of Will Smith slapping down some pasta at Eminem’s restaurant. The first part of the short was generated with old technology, the last part with Veo 3.

Now I am certain that Google will attempt to put guardrails on Veo 3, as it has attempted to do with other products.

But what will happen if a guardrail-lacking Grok video generator is released?

Or if someone creates a non-SaaS video generator that a user can run on their own with all guardrails disabled?

Increase the impact of your deepfake detection technology

In that case, deepfake detection technology will become even more critical.

Does your firm offer deepfake detection technology?

Do you want your prospects to know how your technology benefits them?

Here’s how Bredemarket can help you help your prospects: https://bredemarket.com/cpa/

Expanding My Generative AI Picture Prompts

I’m experimenting with more detailed prompts for generative AI.

If you haven’t noticed, I use a ton of AI-generated images in Bredemarket blog posts and social media posts. They primarily feature wildebeests, wombats, and iguanas, although sometimes they feature other things.

My prompts for these images are usually fairly short, no more than two sentences.

But when I saw some examples of prompts written by Danie Wylie—yes, the same Danie Wylie who wrote the Facebook post earlier this year at the https://m.facebook.com/story.php?story_fbid=pfbid0nvmhyuLpn3jwMv8K8sbK5EXfS4kcpjfWHicgj4BJhdFLMme87P5fvPSYf9CwjRH7l&id=100001380243595&mibextid=wwXIfr URL—then I realized that I could include a lot more detail in my own image prompts.

If you read Wylie’s Facebook post, or my own subsequent post at the https://bredemarket.com/2025/06/03/when-hivellm-pitches-an-anti-fraud-professional/ URL, then you know exactly what the picture depicts. 

Plus some other stuff buried in the details.

By the way, here is my prompt, which Google Gemini (Imagen 4) stored as “Eerie Scene: Sara’s Fake Bills.”

“Draw a realistic picture of a ghost-like woman wearing a t-shirt with the name “Sara.” She is holding out a large stack of dollar bills that is obviously fake because the picture on the bill is a picture of a clown with orange face makeup wearing a blue suit and a red tie. Next to Sara is a dead tree with a beehive hanging from it. Bees buzz around the beehive. A laptop with the word “HiveLLM” on the screen sits on the rocky ground beneath the tree. It is night time, and the full moon casts an eerie glow over the landscape.”

I didn’t get exactly what I wanted—the bills are two-faced—but close enough. And the accident of two-faced bills is a GOOD thing.

How detailed are your picture prompts?

Eerie.

The Monk Skin Tone Scale

(Part of the biometric product marketing expert series)

Now that I’ve dispensed with the first paragraph of Google’s page on the Monk Skin Tone Scale, let’s look at the meat of the page.

I believe we all agree on the problem: the need to measure the accuracy of facial analysis and facial recognition algorithms for different populations. For purposes of this post we will concentrate on a proxy for race, a person’s skin tone.

Why skin tone? Because we have hypothesized (I believe correctly) that the performance of facial algorithms is influenced by the skin tone of the person, not by whether or not they are Asian or Latino or whatever. Don’t forget that the designated races have a variety of skin tones within them.

But how many skin tones should one use?

40 point makeup skin tone scale

The beauty industry has identified over 40 different skin tones for makeup, but an approach this granular would overwhelm a machine learning evaluation:

[L]arger scales like these can be challenging for ML use cases, because of the difficulty of applying that many tones consistently across a wide variety of content, while maintaining statistical significance in evaluations. For example, it can become difficult for human annotators to differentiate subtle variation in skin tone in images captured in poor lighting conditions.
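To see why more tones strain statistical significance, consider a rough sketch. It assumes a hypothetical evaluation set of 2,000 labeled images split evenly across the tones (real sets are rarely that balanced) and a made-up 5% error rate. It uses the standard normal-approximation margin of error for a proportion, z * sqrt(p * (1 - p) / n).

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error (95% when z=1.96) for proportion p, n samples."""
    return z * math.sqrt(p * (1.0 - p) / n)


def per_tone_margin(total_images: int, tones: int, p: float = 0.05) -> float:
    """Margin of error for one tone when images are split evenly (an assumption)."""
    images_per_tone = total_images // tones
    return margin_of_error(p, images_per_tone)


if __name__ == "__main__":
    TOTAL = 2000  # hypothetical evaluation set size
    for tones in (6, 10, 40):
        moe = per_tone_margin(TOTAL, tones)
        print(f"{tones:>2} tones: {TOTAL // tones} images each, +/- {moe:.1%}")
```

With the same images, the 40-tone split leaves only 50 images per tone, and the margin of error ends up wider than the 5% error rate being measured. That is before accounting for annotators who disagree about which of 40 tones a poorly lit face belongs to.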

6 point Fitzpatrick skin tone scale

The first attempt at categorizing skin tones was the Fitzpatrick system.

To date, the de-facto tech industry standard for categorizing skin tone has been the 6-point Fitzpatrick Scale. Developed in 1975 by Harvard dermatologist Thomas Fitzpatrick, the Fitzpatrick Scale was originally designed to assess UV sensitivity of different skin types for dermatological purposes.

However, using this skin tone scale led to….(drumroll)…bias.

[T]he scale skews towards lighter tones, which tend to be more UV-sensitive. While this scale may work for dermatological use cases, relying on the Fitzpatrick Scale for ML development has resulted in unintended bias that excludes darker tones.

10 point Monk Skin Tone (MST) Scale

Enter Dr. Ellis Monk, whose biography could be ripped from today’s headlines.

Dr. Ellis Monk—an Associate Professor of Sociology at Harvard University whose research focuses on social inequalities with respect to race and ethnicity—set out to address these biases.

If you’re still reading this and haven’t collapsed in a rage of fury, here’s what Dr. Monk did.

Dr. Monk’s research resulted in the Monk Skin Tone (MST) Scale—a more inclusive 10-tone scale explicitly designed to represent a broader range of communities. The MST Scale is used by the National Institute of Health (NIH) and the University of Chicago’s National Opinion Research Center, and is now available to the entire ML community.

From https://skintone.google/the-scale.

Where is the MST Scale used?

According to Biometric Update, iBeta has developed a demographic bias test based upon ISO/IEC 19795-10, which itself incorporates the Monk Skin Tone Scale.
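What might a skin tone based bias check look like? The sketch below is NOT the iBeta test or the ISO/IEC 19795-10 procedure, neither of which is described here. It only illustrates the general idea of grouping genuine comparison attempts by MST tone and comparing error rates. The function names and the sample results are made up.

```python
from collections import defaultdict


def false_non_match_rate_by_tone(attempts):
    """attempts: iterable of (mst_tone, matched) pairs for genuine comparison attempts.

    Returns {tone: fraction of genuine attempts that failed to match}.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for tone, matched in attempts:
        if not 1 <= tone <= 10:
            raise ValueError(f"MST tone must be 1-10, got {tone}")
        totals[tone] += 1
        if not matched:
            failures[tone] += 1
    return {tone: failures[tone] / totals[tone] for tone in sorted(totals)}


def worst_to_best_ratio(rates):
    """Ratio of the highest to the lowest per-tone error rate (inf if lowest is 0)."""
    lowest, highest = min(rates.values()), max(rates.values())
    return float("inf") if lowest == 0 else highest / lowest


if __name__ == "__main__":
    # Made-up results: (MST tone, did the genuine attempt match?)
    sample = [(2, True)] * 98 + [(2, False)] * 2 + [(9, True)] * 95 + [(9, False)] * 5
    rates = false_non_match_rate_by_tone(sample)
    print(rates, worst_to_best_ratio(rates))
```

A ratio well above 1 suggests the algorithm performs worse for some tones than for others, provided each tone has enough samples to trust the rates.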

At least for now. Biometric Update notes that other skin tone measurements are under development, including the “Colorimetric Skin Tone (CST)” and INESC TEC/Fraunhofer Institute research that uses “ethnicity labels as a continuous variable instead of a discrete value.”

But will there be enough data for variable 8.675309?

Revisiting Amazon Rekognition, May 2025

(Part of the biometric product marketing expert series)

A recent story about Meta face licensing changes caused me to get reflective.

“This openness to facial recognition could signal a turning point that could affect the biometric industry. 

“The so-called “big” biometric players such as IDEMIA, NEC, and Thales are teeny tiny compared to companies like Meta, Alphabet, and Amazon. If the big tech players ever consented to enter the law enforcement and surveillance market in a big way, they could put IDEMIA, NEC, and Thales out of business. 

“However, wholesale entry into law enforcement/surveillance could damage their consumer business, so the big tech companies have intentionally refused to get involved – or if they have gotten involved, they have kept their involvement a deep dark secret.”

Then I thought about the “Really Big Bunch” product that offered the greatest threat to the “Big 3” (IDEMIA, NEC, and Thales)—Amazon Rekognition, which directly competed in Washington County, Oregon until Amazon imposed a one-year moratorium on police use of facial recognition in June 2020. The moratorium was subsequently extended until further notice.

I last looked at Rekognition in June 2024, when Amazon teamed up with HID Global and may have teamed up with the FBI.

So what’s going on now?

Hard to say. I have been unable to find any newly announced Amazon Rekognition law enforcement customers.

That doesn’t mean that nothing is happening. Perhaps the government buyers are keeping their mouths shut.

Plus, there is this page, “Use cases that involve public safety.”

Nothing controversial on the page itself:

  • “Have appropriately trained humans review all decisions to take action that might impact a person’s civil liberties or equivalent human rights.”
  • “Train personnel on responsible use of facial recognition systems.”
  • “Provide public disclosures of your use of facial recognition systems.”
  • “In all cases, facial comparison matches should be viewed in the context of other compelling evidence, and shouldn’t be used as the sole determinant for taking action.” (In other words, INVESTIGATIVE LEAD only.)
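
The "investigative lead only" idea in the last bullet can be sketched in code. This is a minimal illustration, not Amazon's API: the Candidate type, its field names, and the 99% threshold are all assumptions made for the example. A match above the threshold becomes a lead for trained human review, never an automatic decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    subject_id: str    # hypothetical identifier for a gallery record
    similarity: float  # hypothetical similarity score, 0-100


def investigative_leads(candidates, threshold: float = 99.0):
    """Return candidates at or above threshold, best first, as leads for human review."""
    leads = [c for c in candidates if c.similarity >= threshold]
    return sorted(leads, key=lambda c: c.similarity, reverse=True)


def may_take_action(lead: Candidate, human_reviewed: bool, corroborating_evidence: bool) -> bool:
    """A facial match alone never justifies action; review and other evidence are both required."""
    return human_reviewed and corroborating_evidence


if __name__ == "__main__":
    results = [Candidate("A", 99.4), Candidate("B", 87.0), Candidate("C", 99.9)]
    for lead in investigative_leads(results):
        allowed = may_take_action(lead, human_reviewed=True, corroborating_evidence=False)
        print(lead.subject_id, lead.similarity, allowed)
```

Note that may_take_action ignores the similarity score entirely. However high the score, the match only opens the door to review and to other evidence.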

Nothing controversial at all, and I am…um…99% certain (geddit?) that IDEMIA, NEC, and Thales would endorse all these points.

But why does Amazon even need such a page, if Rekognition is only used to find missing children?

Maybe this is a pre-June 2020 page that Amazon forgot to take down.

Or maybe not.

Couple this with the news about Meta, and there’s the possibility that the Really Big Bunch may enter the markets currently dominated by the Big Three.

Imagine if the DHS HART system, delayed for years, were resurrected…with Alphabet or Amazon or Meta technology.

We are still in the time of uncertainty…and may never go back.

(Large and small wildebeests via Imagen 3)