What if Machine Learning Models Can’t Get Generative AI Training Data?

An image of a neural network. By DancingPhilosopher – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=135594693

Machine learning models need training data to improve their accuracy—something I know from my many years in biometrics.

And it’s difficult to get that training data—something else I know from my many years in biometrics. Consider the acronyms GDPR, CRPA, and especially BIPA. It’s very hard to get data to train biometric algorithms, so they are trained on relatively limited data sets.

At the same time that biometric algorithm training data is limited, Kevin Indig believes that generative AI large language models (LLMs) are ALSO going to encounter limited access to training data. Actually, they already are.

The lawsuits have already begun

A few months ago, generative AI models like ChatGPT were going to solve all of humanity’s problems and allow us to lead lives of leisure as the bots did all our work for us. Or potentially the bots would get us all fired. Or something.

But then people began to ask HOW these large language models work…and where they get their training data.

Just like the biometric model trainers that grabbed images and associated data from the web without asking permission (you know the example that I’m talking about), some are alleging that LLM developers are training their models on copyrighted content in violation of the law.

I am not a lawyer and cannot meaningfully discuss what is “fair use” and what is not, but suffice it to say that alleged victims are filing court cases.

Sarah Silverman et al and copyright infringement

Here’s one example from July:

Comedian and author Sarah Silverman, as well as authors Christopher Golden and Richard Kadrey — are suing OpenAI and Meta each in a US District Court over dual claims of copyright infringement.

The suits allege, among other things, that OpenAI’s ChatGPT and Meta’s LLaMA were trained on illegally-acquired datasets containing their works, which they say were acquired from “shadow library” websites like Bibliotik, Library Genesis, Z-Library, and others, noting the books are “available in bulk via torrent systems.”

From https://www.theverge.com/2023/7/9/23788741/sarah-silverman-openai-meta-chatgpt-llama-copyright-infringement-chatbots-artificial-intelligence-ai

This could be a big mess, especially since copyright laws vary from country to country. This discussion of copyright law’s implications for LLMs, for example, focuses on United Kingdom law. Laws in other countries differ.

And now the technical blocks are beginning

Just today, Kevin Indig highlighted another issue that could limit LLM access to online training data.

Some sites are already blocking the LLM crawlers

Systems that get data from the web, such as Google, Bing, and (relevant to us) ChatGPT, use “crawlers” to gather information from websites. ChatGPT, for example, has its own crawler.

By Yintan at English Wikipedia, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=63631702

Guess what Indig found out about ChatGPT’s crawler?

An analysis of the top 1,000 sites on the web from Originality AI shows 12% already block Chat GPT’s crawler. (source)

From https://www.kevin-indig.com/most-sites-will-block-chat-gpt/

But that only includes the sites that blocked the crawler when Originality AI performed its analysis.
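For those wondering how the blocking works mechanically: sites typically do it with a robots.txt file, and OpenAI has documented “GPTBot” as the user agent name for ChatGPT’s crawler. Here’s a minimal sketch using Python’s standard urllib.robotparser showing how such a rule is interpreted (the site and the rules below are hypothetical examples, not any real site’s policy):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that shuts out OpenAI's GPTBot crawler
# while still allowing every other user agent.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(robots_txt)

# GPTBot is disallowed everywhere; other crawlers are unaffected.
print(rp.can_fetch("GPTBot", "https://example.com/article"))     # False
print(rp.can_fetch("Googlebot", "https://example.com/article"))  # True
```

Note that robots.txt is a request, not an enforcement mechanism; it only keeps out crawlers that choose to honor it.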

More sites will block the LLM crawlers

Indig believes that in the future, the share of the top 1,000 sites blocking ChatGPT’s crawler will rise significantly…to 84%. His belief is based on analyzing the business models of the sites that already block ChatGPT and assuming that other sites with the same business models will also find it in their interest to block ChatGPT.

The business models assumed NOT to block ChatGPT include governments, universities, and search engines. Such sites are friendly to the sharing of information, and thus would have no reason to block ChatGPT or any other LLM crawler.

The business models assumed to block ChatGPT include publishers, marketplaces, and many others. Entities with these business models are not just going to turn their content over to an LLM for free.

As Indig explains regarding the top two blocking business models:

By Karl Thomas Moore – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=58968347

For publishers, content is the product. Giving it away for free to generative AI means foregoing most, if not all, ad revenue. Publishers remember the revenue drops caused by social media and modern search engines in the late 2000s.

Marketplaces build their own AI assistants and don’t want competition.

From https://www.kevin-indig.com/most-sites-will-block-chat-gpt/

What does this mean for LLMs?

One possibility is that LLMs will run into the same training issues as biometric algorithms.

  • In biometrics, the same people that loudly exclaim that biometric algorithms are racist would be horrified at the purely technical solution that would solve all inaccuracy problems—let the biometric algorithms train on ALL available biometric data. In the activists’ view (and in the view of many), unrestricted access to biometric data for algorithmic training would be a privacy nightmare.
  • Similarly, those who complain that LLMs are woefully inaccurate would be horrified if the LLM accuracy problem were solved by a purely technical solution: let the algorithms train themselves on ALL available data.

Could LLMs buy training data?

Of course, there’s another solution to the problem: have the companies SELL their data to the LLMs.

By Nic McPhee from Morris, Minnesota, USA – London – 14-15 Dec 2007 – 034, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=10606179

In theory, this could provide the data holders with a nice revenue stream while allowing the LLMs to be extremely accurate. (Of course the users who actually contribute the data to the data holders would probably be shut out of any revenue, but them’s the breaks.)

But that’s only in theory. Based upon past experience with data holders, the people who want to use the data are probably not going to pay the data holders sufficiently.

Google and Meta to Canada: Drop dead / Mourir

By Illegitimate Barrister at Wikimedia Commons; the current SVG encoding is a rewrite by MapGrid. Public Domain, https://commons.wikimedia.org/w/index.php?curid=32276527

Even today, Google and Meta (Facebook et al) are greeting Canada’s government-mandated Bill C-18 with resistance. Here’s what Google is saying:

Bill C-18 requires two companies (including Google) to pay for simply showing links to Canadian news publications, something that everyone else does for free. The unprecedented decision to put a price on links (a so-called “link tax”) breaks the way the web and search engines work, and exposes us to uncapped financial liability simply for facilitating access to news from Canadian publications….

As a result, we have informed them that we have made the difficult decision that, when the law takes effect, we will be removing links to Canadian news publications from our Search, News, and Discover products.

From https://blog.google/canada-news-en/#overview

But wait, it gets better:

In addition, we will no longer be able to operate Google News Showcase – our product experience and licensing program for news – in Canada.

From https://blog.google/canada-news-en/#overview

Google News Showcase is the program that gives money to news organizations in Canada. Meta has a similar program. Peter Menzies notes that these programs give tens of millions of (Canadian) dollars to news organizations, but that could end, despite government threats.

The federal and Quebec governments pulled their advertising spends, but those moves amount to less money than Meta will save by ending its $18 million in existing journalism funding. 

From https://thehub.ca/2023-09-15/peter-menzies-the-media-is-boycotting-meta-and-nobody-cares/

What’s next?

Bearing in mind that Big Tech is reluctant to give journalistic data holders money even when a government ORDERS that they do so…

…what is the likelihood that generative AI algorithm authors (including Big Tech companies like Google and Microsoft) will VOLUNTARILY pay funds to data holders for algorithm training?

If Kevin Indig is right, LLM training data will become extremely limited, adversely affecting the algorithms’ accuracy and usefulness.

AdvoLogix on “9 Ways to Use AI in the Workplace”

Bredemarket occasionally gets pitches from people who want to write for the blog, or to link to something they’ve already written.

Most of these pitches are crap.

But I just received an excellent and relevant pitch from a PR coordinator. I won’t reproduce his pitch, though, because I don’t want to get sued.

Which in this case is a very distinct possibility.

Who is AdvoLogix?

The PR coordinator represents a company called AdvoLogix, and wanted me to reshare something his company had written.

My first question (of course) is why AdvoLogix exists.

We build and deliver technologies that help legal teams collaborate and grow.

From https://www.advologix.com/about-advologix/

The company provides legal software, resident on Salesforce, that addresses several areas of legal practice management.

Now I am not a lawyer, but I’m sure these solution areas mean something to lawyers. If you’re looking for these types of solutions, check AdvoLogix’s website.

Why did AdvoLogix pitch me?

The PR coordinator had observed Bredemarket’s previous posts on artificial intelligence (excluding the one that I wrote after his pitch), and thought that AdvoLogix’s recent blog post on the same topic would be of interest to Bredemarket’s readers.

What does AdvoLogix say about using AI in the workplace?

AdvoLogix’s post is clear in its intent. It is entitled “9 Ways to Use AI in the Workplace.” The introduction to the post explains AdvoLogix’s position on the use of artificial intelligence.

Rather than replacing human professionals, AI applications take a complementary role in the workplace and improve overall efficiency. Here are nine actionable ways to use artificial intelligence, no matter your industry.

From https://www.advologix.com/ai-applications-business/

I won’t list ALL nine of the ways—I want you to go read the post, after all. But let me highlight one of them—not the first one, but the eighth one.

Individual entrepreneurs can also benefit from AI-driven technologies. Entrepreneurship requires great financial and personal risk, especially when starting a new business. Entrepreneurs must often invest in essential resources and engage with potential customers to build a brand from scratch. With AI tools, entrepreneurs can greatly limit risk by improving their organization and efficiency. 

From https://www.advologix.com/ai-applications-business/

The AdvoLogix post then goes on to recommend specific ways that entrepreneurs can use artificial intelligence, including:

  • AI shopping
  • Use AI Chatbots for Customer Engagement

Regardless of how you feel about the use of AI in these areas, you should at least consider them as possible options.

Why did AdvoLogix write the post?

Obviously the company had a reason for writing the post, and for sharing the post with people like me (and like you).

AdvoLogix provides law firms, legal offices, and public agencies with advanced, cloud-based legal software solutions that address their actual needs. 

Thanks to AI tools like Caster, AdvoLogix can provide your office with effective automation of data entry, invoicing, and other essential but time-consuming processes. Contact AdvoLogix to request a free demo of the industry’s best AI tools for law offices like yours. 

From https://www.advologix.com/ai-applications-business/

So I’m not even going to provide a Bredemarket call to action, since AdvoLogix already provided its own. Good for AdvoLogix.

But what about Steven Schwartz?

The AdvoLogix post did not specifically reference Steven Schwartz, although the company stated that you should control the process yourself and not cede control to your artificial intelligence tool.

Something that Schwartz did not do.

Roberto Mata sued Avianca airlines for injuries he says he sustained from a serving cart while on the airline in 2019, claiming negligence by an employee. Steven Schwartz, an attorney with Levidow, Levidow & Oberman and licensed in New York for over three decades, handled Mata’s representation.

But at least six of the submitted cases by Schwartz as research for a brief “appear to be bogus judicial decisions with bogus quotes and bogus internal citations,” said Judge Kevin Castel of the Southern District of New York in an order….

In late April, Avianca’s lawyers from Condon & Forsyth penned a letter to Castel questioning the authenticity of the cases….

Among the purported cases: Varghese v. China South Airlines, Martinez v. Delta Airlines, Shaboon v. EgyptAir, Petersen v. Iran Air, Miller v. United Airlines, and Estate of Durden v. KLM Royal Dutch Airlines, all of which did not appear to exist to either the judge or defense, the filing said.

Schwartz, in an affidavit, said that he had never used ChatGPT as a legal research source prior to this case and, therefore, “was unaware of the possibility that its content could be false.” He accepted responsibility for not confirming the chatbot’s sources.

Schwartz is now facing a sanctions hearing on June 8.

From https://www.cnn.com/2023/05/27/business/chat-gpt-avianca-mata-lawyers/index.html

On that sanctions hearing date, Schwartz was mercilessly grilled by the judge. Later that month, the judge sanctioned and fined Schwartz and another lawyer.

In the end, you are responsible, not the tool you use.

By the way, Roberto Mata lost the case. Not because of his lawyers’ misuse of AI, but because the case was filed too late.

Should Inland Empire Companies Generate Content with ChatGPT (or other generative AI tools)?

An image of a neural network. By DancingPhilosopher – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=135594693

Inland Empire companies are asking the same question that other companies are asking: “Should my business use ChatGPT (or another generative AI tool) to quickly generate content?”

Bear in mind that I have a vested interest in the answer to this question, because Bredemarket’s business is content generation for other businesses. So I don’t have a natural inclination to tell you not to buy my services and use ChatGPT or Bard or whatever instead.

Nevertheless, I’ll outline the advantages and drawbacks of using automation (in this case generative AI tools) to create written content. Then I’ll look at the three options for your business’ content generation needs. (Surprise! One of those options is to contract with Bredemarket. You’ve been warned that a call to action is coming.)

An “exchange” of questions

As I mentioned in my post about creating and populating Facebook business pages, yesterday afternoon was the first time in years that I had attended a monthly Ontario IDEA Exchange meeting. For those who aren’t familiar with these meetings, they allow small business owners from the Ontario, California area to meet and…um…exchange ideas.

By Mack Male – originally posted to Flickr as Ontario Convention Center, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=9512928

One of the attendees asked me what I thought of ChatGPT.

If you’ve read the Bredemarket blog or the Bredemarket LinkedIn page, you know that I have some thoughts on this. (If you haven’t seen my thoughts, I’ll share them later in this post.)

It turns out that the attendee had a reason for asking the question that went well beyond generative artificial intelligence (generative AI), and touched on automation in general.

  • Should we use ChatGPT, Bard, or another generative AI tool to generate text content, or should we contact a content marketing expert?
  • Similarly, should we use a generative AI tool to generate image content, or should we contact a graphic design expert?
  • Going beyond generative AI, should we use tax preparation software, or should we contact a tax preparation expert?

You can ask this same question about a variety of business services. Should you opt for the (sometimes) cheaper automated solution, or should you get a real person to perform the service?

This can be a loaded question. After all, I can’t tell you that you should always use a person for these tasks when I personally use Freepik and a well-known tax preparation software package.

Designed by Freepik.

So, returning to content, let’s take a (probably non-objective) look at the pros and cons of using generative AI tools for content creation. After all, some of my best friends use generative AI…and they’re still my friends (and obviously not my customers).

Generative AI content advantages

While generative AI isn’t as trendy a topic as it was a few months ago, people are still talking about using generative AI to create text content. And there are clearly some advantages to it.

  • It’s very fast. No, you won’t create your text in a minute. But with properly written prompts (commands to the generative AI tool) and by using a few iterations, you can probably get generative AI text content in five minutes. You won’t get text content of any length from Bredemarket in five minutes.
  • It’s (usually) free. In most cases, you can obtain your text at no cost to you. Again, Bredemarket doesn’t work for free.
  • It’s easy. You don’t need to know how to write. Sure, you need to know how to write the prompt, but you don’t need to worry about the mechanics of writing the text itself, and when to use “its” or “it’s”. You don’t want to sound like a looser, after all. (That was wrong.)
  • It’s “good enough.” Under most circumstances, the text created by a generative AI tool gets the job done, and is obviously better than no text at all.

Generative AI content drawbacks

Perhaps I’m biased, but I think that we can all agree on the drawbacks that I’ve listed below.

  • Potential accuracy issues. Generative AI content is sourced from the Internet, and you know how accurate the Internet can be at times. For example, when I was working for my former employer, I asked one popular generative AI tool to tell me the weaknesses of my employer’s product. The tool returned two responses, one of which was plain wrong.
  • More potential accuracy issues. The way generative AI tools work, most of them provide their answers as if they were most certainly true, with no attempt to provide the probability that the response is correct or incorrect. In essence, Mr. Gen AI is an automated know-it-all with complete certainty in its answers, even when they are obviously wrong.
  • Potential relevancy issues. Depending upon your generative AI tool, you may not have access to the latest data and may instead rely on older data that was available when the generative AI tool was trained. At one point, several of the popular generative AI tools were trained on data from 2021 or earlier, which meant that they knew nothing about events that happened last week.
  • Potential privacy issues. When you feed prompts to a generative AI tool, the prompts are stored by the tool. What if you’re using a public tool, and you’re feeding confidential data from your company to the tool? What’s going to happen to your company’s private data? (This is especially critical when the data crosses international borders, something that gave Samsung serious concern.)
  • Potential tone of voice issues. If your company feeds a prompt to a generative AI tool, and your competitor feeds the same prompt to the same tool, you will get similar or identical responses. If both companies use these responses, then the companies will sound like bland “me-too” providers, with no differentiation.

I’ve touched on a number of these issues in more detail in pieces that I’ve written on the Bredemarket blog (and in one case on LinkedIn). These explain when I use generative AI, and when I do not.

So what should YOUR business do?

Depending upon your business needs, your budget, and your marketing direction, you have three choices.

(UPDATE OCTOBER 23, 2023: “SIX QUESTIONS YOUR CONTENT CREATOR SHOULD ASK YOU” IS SO 2022. DOWNLOAD THE NEWER “SEVEN QUESTIONS YOUR CONTENT CREATOR SHOULD ASK YOU” HERE.)

  • Use a generative AI tool to write your content. It’s fast, it’s free, it’s easy. It may be good enough.
  • Use a low-cost writer to write your content. If you want to spend a little money, you can find writers who will work for pennies per word. That’s more than a generative AI tool, but you have a better chance of getting the content you want.
  • Use Bredemarket. I cost more than pennies per word (the cost for my standard 400-600 word writing package is in the brochure on my Bredemarket 400 Short Writing Service page), but Bredemarket’s collaborative kickoff process, including the opening six questions that I ask you, helps to ensure that your firm will get the content that you need.

Authorize Bredemarket, Ontario California’s content marketing expert, to help your firm produce words that return results.

Bredemarket logo

Bonus: here’s what Google Bard says on the topic (written by Google, not by me)

Sure, here are three advantages and three drawbacks of using generative AI to create written content:

Advantages:

  • Speed: Generative AI can create content much faster than humans. This can be a major advantage for businesses that need to produce a lot of content quickly, such as content marketing agencies or news organizations.
  • Cost-effectiveness: Generative AI can be a more cost-effective way to create content than hiring human writers. This is especially true for tasks that are repetitive or time-consuming, such as writing product descriptions or blog posts.
  • Accuracy: Generative AI can be very accurate in generating content that is factually correct. This is because it is trained on large datasets of text and code.

Drawbacks:

  • Creativity: Generative AI is not as creative as humans. It can only generate content that is based on the data it has been trained on. This means that it may not be able to come up with new ideas or insights.
  • Bias: Generative AI can be biased, depending on the data it is trained on. This is a concern because it could lead to the creation of content that is inaccurate or harmful.
  • Plagiarism: Generative AI can sometimes plagiarize content from other sources. This is because it is trained on a variety of text data, including some that may not be properly cited.

Overall, generative AI can be a valuable tool for creating written content. However, it is important to be aware of its limitations and to use it responsibly.

Here are some additional things to keep in mind when using generative AI to create written content:

  • Always fact-check the content before publishing it. Generative AI can sometimes make mistakes, so it is important to double-check the information.
  • Be clear about the purpose of the content. Are you trying to inform, persuade, or entertain? The purpose of the content will help you determine the tone and style of the writing.
  • Be aware of the audience. Who are you writing for? What are their interests and needs? Tailor the content to your audience.
  • Use generative AI as a tool, not a replacement for human creativity. Generative AI can be a great way to get started with a writing project, but it is important to put your own unique spin on the content.

Customs Becoming Artificial, Thanks to Pangiam

I missed this story when it came out in May.

MCLEAN, Va., May 2, 2023 /PRNewswire/ — The West Virginia University Research Corporation (WVURC) and Pangiam, a leading trade and travel technology company, announced a new partnership to conduct research and develop new, cutting-edge artificial intelligence, machine learning and computer vision technologies for commercial and government applications.

Pangiam and WVURC will work together to launch Pangiam Bridge, a cutting-edge artificial intelligence driven solution for customs authorities worldwide. Pangiam Bridge will allow customs officials to automate portions of the customs inspection process for baggage and cargo. Jim McLaughlin, Pangiam Chief Technology Officer, said, “we are excited to grow Pangiam’s artificial intelligence work in partnership with West Virginia University and continued development of Pangiam Bridge for customs authorities.”

From https://www.prnewswire.com/news-releases/pangiam-and-west-virginia-university-research-corporation-partner-to-develop-artificial-intelligence-and-computer-vision-technology-301813334.html

Pangiam Bridge is obviously not ready for prime time yet; it isn’t mentioned on Pangiam’s Products and Services page or anywhere else on Pangiam’s website. Its only mention is in this press release, which isn’t surprising considering that this is a research effort. But if the research pans out, many of the manual processes used by customs agents may be significantly reduced or eliminated entirely.

U.S. CBP Office of Field Operations agent checking the authenticity of a travel document at an international airport using a stereo microscope. By James R. Tourtellotte, CBP, U.S. Dept. of Homeland Security, Public Domain, https://commons.wikimedia.org/w/index.php?curid=2867071

And this isn’t Pangiam’s only artificial intelligence research effort.

Project DARTMOUTH is the collaboration between Pangiam and Google Cloud, named after the 1956 Dartmouth Summer Research Project on Artificial Intelligence. Project DARTMOUTH utilizes AI and pattern analysis technologies to digest and analyze vast amounts of data in real-time and identify potential prohibited items in carry-on baggage, checked baggage, airline cargo and shipments.

From https://pangiam.com/projectdartmouth/

(Bredemarket email, meeting, contact, subscribe)

Human Cloning Via Artificial Intelligence: It’s Starting

Years ago, I joked that Printrak/Motorola/MorphoTrak/IDEMIA’s research and development group shouldn’t be researching fingerprint algorithms, but should REALLY be researching human cloning. This would allow the company to have multiple “Peter Lo” entities researching and developing new fingerprint algorithms.

Today the jokes are becoming reality. Not for biometric scientists (yet), but for online gamers.

What (or who) is Kwebbelkop AI?

The Publish Press’ most recent daily newsletter begins with the article “Kwebbelkop Turns to VTubing.” I can forgive you if two of the four words in that headline are puzzling; I hadn’t heard of Kwebbelkop or “VTubing” either.

(Before I proceed to explain Kwebbelkop and VTubing, I’d like to encourage you to subscribe to The Publish Press yourself, via my link. You get an informative daily newsletter focused on creators, while I get stickers and stuff.)

Back to Kwebbelkop (real name Jordi Van Den Bussche), who is a YouTuber who has posted over 5,200 videos. That’s a lot of work. So he needed some help.

A recent Kwebbelkop video included the debut of “Kwebbelkop AI.” That’s him in the corner. That’s him in the spotlight.

From https://www.youtube.com/watch?v=ghDcOhx_TsI

Now I’m not familiar enough with Kwebbelkop to know if this is truly an AI-generated bot, or just a Max Headroom-like deception. But here’s what is claimed:

“The Digital Kwebbelkop,” as the new character is known, was developed at Van Den Bussche’s company, JVDBStudios. The gamer and his team have revealed several new tools: They are training VTubers to mimic specific creators (Kwebbelkop, in this case) and using AI technology to streamline the video production process. The result, at least in the first Digital Kwebbelkop video, is a character who looks like Van Den Bussche, sounds like him, and plays Minecraft like him.

From https://www.tubefilter.com/2023/08/02/kwebbelkop-gaming-star-ai-vtuber-character-bloo/

What (or who) are AI VTubers?

It’s important to differentiate between VTubers, where a person controls the animated appearance of the avatar (a la our old friend Max Headroom), and an AI VTuber, where AI controls all the actions of the avatar.

Kwebbelkop AI is not the first AI VTuber. In fact, this is the second AI VTuber that Kwebbelkop has created himself; his first one is Bloo.

Another AI VTuber is Neuro-sama. This Live3D article discusses the “AI” part of Neuro-sama:

Deep Learning: Deep learning endows AI avatar with the ability to understand viewers chats in stream. It uses large text database training to enable AI vtuber to have the ability to understand and answer questions.

From https://live3d.io/blog/what-is-AI-vtuber

What’s next?

It is a very tough problem to create an AI VTuber who can appear to play video games in a realistic manner, and even converse with people watching it (them) play.

It’s an even tougher problem to create an AI entity that can create accurate biometric algorithms.

And the toughest problem of all is to create an AI-generated content marketing expert who can address your customer needs in your company’s tone of voice.

OK, maybe the biometric algorithm expert is a little tougher than the content marketing expert.

Maybe.

Oh no, I’ve said too much. I haven’t said enough.

From https://www.youtube.com/watch?v=xwtdhWltSIg

Generative AI Guidelines in San Jose, California

The Bredemarket blog has previously considered how private companies like Samsung and Adobe use generative AI. Government use is similar, yet differs in some ways. Let’s see how San Jose, California approaches it.

By Ben Loomis – DSC_9441.jpg, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=73875056

As GovTech reported in its article “San Jose Releases Generative AI Guidelines, Looks to Learn,” some of the concerns of San Jose’s city government are similar to issues with which private companies grapple.

Privacy is also a concern, and IT advises generative AI users to assume any information entered will be exposed to the public. Materials unready for publication shouldn’t be entered, nor should private emails. Employees looking for help drafting emails should avoid copy-pasting messages into generative AI, instead prompting the tools to write a generic message they can fact-check or augment with personalized details. The guidelines advise users to fact-check with multiple credible sources, including peer-reviewed journals and official documents.

From https://www.govtech.com/artificial-intelligence/san-jose-releases-generative-ai-guidelines-looks-to-learn

This is a big concern for private companies, also.
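The city’s advice against copy-pasting real emails into a generative AI tool can even be partially automated. Here’s a minimal sketch, purely illustrative and NOT San Jose’s actual tooling, of scrubbing obvious personal details from text before it ever reaches a prompt (the patterns and the sample message are my own invented examples):

```python
import re

# Invented, illustrative patterns -- a real deployment would need a far
# more thorough PII detector than two regular expressions.
PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "[PHONE]": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace email addresses and phone numbers with placeholders."""
    for placeholder, pattern in PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

message = "Contact Jane Doe at jane.doe@sanjoseca.gov or 408-555-0100."
print(redact(message))
# Contact Jane Doe at [EMAIL] or [PHONE].
```

Even with scrubbing, the safer habit the guidelines describe is to prompt for a generic draft and add the personalized details yourself, outside the tool.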

But there are also issues that governments need to consider that private companies may not need to address.

One consideration is that government writing requires a particular style. Senate bills, for example, are written with a certain structure and formality. The city also uses gender-neutral language and the term “resident” rather than “citizen.” 

From https://www.govtech.com/artificial-intelligence/san-jose-releases-generative-ai-guidelines-looks-to-learn

Of course private companies have their own writing styles, but the world won’t come to an end if the IBM memorandum includes the word “gnarly.” But the wrong word in a Senate bill, or the use of the term “citizen” in a blue state, could be catastrophic.

One thing is clear: San Jose Chief Information Officer Khaled Tawfik doesn’t think that general-purpose generative AI will cut it.

San Jose has talked with several vendors about the possibility of AI trained on data from government, potentially restricted to San Jose data only.

From https://www.govtech.com/artificial-intelligence/san-jose-releases-generative-ai-guidelines-looks-to-learn

As I noted in my post about Writer.com, this also allows implementation of privacy restrictions that could help avert problems if an employee inputs confidential information into the tool.

For the moment, San Jose is asking employees and contractors to log all use of generative AI. This will be referenced as the city develops its guidelines and policies in the future. As the city says:

Generative Artificial Intelligence (AI) is a new branch of AI technology that can generate content—such as stories, poetry, images, voice, and music—at the request of a user. Many organizations have banned Generative AI, while others allow unrestricted usage. The City recognizes the opportunity for a controlled and responsible approach that acknowledges the benefits to efficiency while minimizing the risks around AI bias, privacy, and cybersecurity.

This is the first step in a collaborative process to develop the City’s overall AI policy. Registered users will be invited to join the Information Technology Department in a working group to share their experience and co-develop the City’s AI policies.

From https://www.sanjoseca.gov/your-government/departments-offices/information-technology/itd-generative-ai-guideline
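San Jose’s ask is straightforward to implement. As a rough illustration (this is not the city’s actual system; the field names and CSV format are my own assumptions), a minimal usage log could be as simple as appending one row per use to a shared file:

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

LOG_FILE = Path("genai_usage_log.csv")
FIELDS = ["timestamp", "user", "tool", "purpose", "output_published"]

def log_genai_use(user, tool, purpose, output_published=False, log_file=LOG_FILE):
    """Append one generative AI usage record to a shared CSV log."""
    is_new = not log_file.exists()
    with log_file.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # write the header once, on first use
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "tool": tool,
            "purpose": purpose,
            "output_published": output_published,
        })

log_genai_use("jdoe", "ChatGPT", "draft generic email about park closures")
```

The “purpose” field is what makes the record useful later, when a working group reviews how employees actually used the tools.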

The Adobe Exception to the Temperamental Writer’s Hard and Fast Rule

When I originally wrote “The Temperamental Writer’s Two Suggestions and One Rule for Using Generative AI,” I included the following caveat:

So unless someone such as an employer or a consulting client requires that I do things differently, here are three ways that I use generative AI tools to assist me in my writing.

From https://bredemarket.com/2023/06/05/the-temperamental-writers-two-suggestions-and-one-rule-for-using-generative-ai/

It’s good that I included that caveat.

The Bredemarket Rule, Item One

If you read the post, you’ll recall that some of the items were suggestions. However, one was not:

Bredemarket Rule: Don’t share confidential information with the tool

If you are using a general-purpose public AI tool, and not a private one, you don’t want to share secrets.

By Unnamed photographer for Office of War Information. – U.S. Office of War Information photo, via Library of Congress website [1], converted from TIFF to .jpg and border cropped before upload to Wikimedia Commons., Public Domain, https://commons.wikimedia.org/w/index.php?curid=8989847

I then constructed a hypothetical situation in which Bredemarket was developing a new writing service, but didn’t want to share confidential details about it. One of my ideas was as follows:

First, don’t use a Bredemarket account to submit the prompt. Even if I follow all the obfuscation steps that I am about to list below, the mere fact that the prompt was associated with a Bredemarket account links Bredemarket to the data.

From https://bredemarket.com/2023/06/05/the-temperamental-writers-two-suggestions-and-one-rule-for-using-generative-ai/

Now I happen to have a ton of email accounts, so if I really wanted to divorce a generative AI prompt from its Bredemarket origins, I’d just use an account other than my Bredemarket account. It’s not a perfect solution (a sleuth could determine that the “gamer” account is associated with the same person as the Bredemarket account), but it seems to work.
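Besides switching accounts, the scrubbing itself can be partly automated. Here’s a minimal sketch in Python, assuming a hand-maintained list of identifying terms (the terms and placeholders below are hypothetical examples, not a real policy):

```python
import re

# Hypothetical map of identifying terms to generic placeholders.
# A real list would also cover product names, client names, and code names.
CONFIDENTIAL_TERMS = {
    r"\bBredemarket\b": "a consulting firm",
    r"\bnew writing service\b": "new offering",
}

def scrub_prompt(prompt: str) -> str:
    """Replace identifying terms with generic placeholders before submission."""
    for pattern, placeholder in CONFIDENTIAL_TERMS.items():
        prompt = re.sub(pattern, placeholder, prompt, flags=re.IGNORECASE)
    return prompt

scrubbed = scrub_prompt("Help me name Bredemarket's new writing service.")
# → "Help me name a consulting firm's new offering."
```

A list like this can’t catch everything (which is why the rule is not to enter confidential material at all), but it guards against the most obvious leaks.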

But not well enough for one company.

Adobe’s restrictions on employee use of generative AI

PetaPixel accessed a gated Business Insider article that purported to include information from an Adobe executive’s email.

Adobe employees have been instructed to not use their “personal email accounts or corporate credit cards when signing up for AI tools, like ChatGPT.” This, the publication reports, comes from an internal email from Chief Information Officer Cindy Stoddard that Insider obtained.

From https://petapixel.com/2023/07/06/adobe-limits-its-employees-use-of-generative-ai/?utm_source=tldrai

Specifically, the email apparently included a list of “Don’ts”:

  • Don’t use personal emails for tools used on work-related tasks. This is the one that contradicts what I previously suggested. So if you work for Adobe, don’t listen to me.
  • Don’t include any personal or non-public Adobe information in prompts. This is reasonable when you’re using public tools such as ChatGPT.
  • Don’t use outputs verbatim. This is also reasonable, since (a) the outputs may be incorrect, and (b) there are potential copyright issues.

But don’t think that Adobe is completely restricting generative AI. It’s just putting guardrails around its use.

“We encourage the responsible and ethical exploration of generative AI technology internally, which we believe will enable employees to learn about its capabilities as we explore how it will change the way we all work,” Business Insider reported Stoddard wrote in the email.

“As employees, it’s your responsibility to protect Adobe and our customers’ data and not use generative AI in a way that harms or risks Adobe’s business, customers, or employees.”

From https://petapixel.com/2023/07/06/adobe-limits-its-employees-use-of-generative-ai/?utm_source=tldrai

What does this mean?

So my suggestion to use a non-corporate login to obfuscate already-scrubbed confidential information doesn’t fly with Adobe. All well and good.

There are two true takeaways from this:

  1. If you’re working for or with someone who has their own policies on generative AI use, follow their policies.
  2. If they don’t have their own policies on submitting confidential information to a generative AI tool, and if you don’t have your own policy on submitting confidential information to a generative AI tool, then stop what you’re doing and create a policy now.
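That second takeaway can even be enforced in tooling, not just on paper. As a hypothetical sketch (the marker list is illustrative, not any company’s real policy), an internal wrapper around a public AI tool could refuse prompts containing obvious confidentiality markers:

```python
# Illustrative confidentiality markers; a real policy would define its own.
BLOCKED_MARKERS = ("confidential", "internal only", "do not distribute")

def prompt_allowed(prompt: str) -> bool:
    """Return True only if the prompt contains no blocked marker."""
    lowered = prompt.lower()
    return not any(marker in lowered for marker in BLOCKED_MARKERS)

print(prompt_allowed("Write a generic welcome email for new hires."))  # True
print(prompt_allowed("Summarize this CONFIDENTIAL product roadmap."))  # False
```

A keyword check is crude, but even a crude check beats having no policy at all.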

Why Writer.com Approaches Generative AI Writing Differently Than I Do

About two weeks ago, I asked myself the question “How Soon Will I Have to Change My Temperamental Writer Generative AI Suggestions/Rule?”

Perhaps that time has come.

Perhaps not.

What are my temperamental writer generative AI suggestions and rule?

If you haven’t seen my June 5 blog post or my June 13 LinkedIn article on this topic, here is a brief recap of how I use generative AI in my writing:

Designed by Freepik.
  • Suggestion 1: A human should always write the first draft.
  • Suggestion 2: Only feed bits to the generative AI tool.
  • An ironclad rule: Don’t share confidential information with the tool.

This post will focus on the first suggestion, although the ironclad rule will come up in the discussion also.

There are several reasons why I believe that a human should write the first draft, and the generative AI tool should only be used to improve the draft. Two of these reasons (I won’t get into the ego part) are as follows:

  • I can iterate on my work to make it better. For me, the process of writing itself lets me tweak the text throughout the written content. In my view this makes the first draft much better, which makes the final version even better still.
  • I can control the tone of my writing. One current drawback of generative AI is that, unless properly prompted, it often delivers bland, boring text. Creating and iterating the text myself lets me dictate the tone of voice and eliminates the need to rewrite the whole thing later to change the tone.

However, there is one drawback to my method. It takes a lot longer.

  • If you submit a prompt to a generative AI tool and receive results in a minute, and if you tweak the prompt four times to make it better, you’ll have a complete first draft in five minutes.
  • Using my method, I don’t create a first draft in five minutes. It usually takes me between 60 and 120 minutes (not counting “sleep on it” time) to crank out a first draft the old-fashioned way.

Let’s look at a different way to use generative AI in writing.

What is writer.com?

The Content Marketing Institute recently hosted a three-day series of webinars on content marketing called ContentTECH 2023.

One of the sessions, “Generative AI FTW: Must-Have Use Cases and Requirements for Success,” was presented by Alex Wettreich (LinkedIn, Twitter) of Writer, which promotes itself as providing the “AI platform built for the enterprise.”

This isn’t your general-purpose generative AI tool that throws everyone’s prompts into the same data warehouse. This is truly a tool for your enterprise:

Unlike other large language models, Palmyra, our family of LLMs, is built for business….

Ability to self-host: Offered as self-hosted option. Own, host, and customize your own version of our LLM.

From https://writer.com/platform/

Guess what this means? All of my personal concerns about sharing confidential data with a generative AI tool are eliminated. Read Writer’s Terms of Service:

7.1. Ownership.  All data, information, files, or other materials and content that Customer makes available to Company for the purpose of utilizing the Service (including, without limitation, training data, prompt inputs, and drafts) (“Customer Content”) shall remain the sole property of Customer. Customer shall retain all intellectual property rights in the Customer Content. Company does not screen Customer Content, is not responsible for storing or maintaining backups of any Customer Content, and is not responsible for the content of or any use by Customer of the Customer Content.

From https://writer.com/terms/

Now that we’ve talked about the basics of Writer, let’s see how it creates content.

What is writer.com’s generative AI writing process?

With Writer, the generative AI tool writes the first draft.

[W]hat we did at Writer was simple: customers already had their style guides built into Writer — their writing style, terminology, and must-have language. We used that plus samples of customers’ best blog posts, help articles, headlines, email subject lines, ads, and more. Writer can create first drafts that are significantly better than other tools because the content is modeled off your best content and trained on your voice.

From https://writer.com/blog/generative-ai-capabilities/

The training data is important. A marketer who uses Writer is guided along the way.

“Create a unique, consistent, and relatable voice that shines through every communication touch point — at scale. Your marketing team doesn’t have time for the copyediting (or scolding).”

“Keep your editorial guidelines up-to-date and easy to access. From punctuation to capitalization rules to grade level and specific terminology, put all your guidance in one place.”

“Make your core messaging easy to repeat. Keep company voice, terms, and boilerplate consistent, no matter who’s writing.”

From https://writer.com/use-cases/marketing/
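Writer’s actual approach involves training its Palmyra models on customer content, but the simpler prompt-level analogue is easy to picture. As a rough, hypothetical sketch, a style guide stored as structured data can be compiled into instructions that accompany every generation request:

```python
# Hypothetical style guide stored as structured data.
STYLE_GUIDE = {
    "voice": "friendly, plain-spoken, second person",
    "reading_level": "grade 8",
    "terminology": {"log in": "sign in"},  # preferred term -> term to avoid
}

def build_style_instructions(style: dict) -> str:
    """Compile a style guide into instructions for a generation request."""
    term_rules = "; ".join(
        f"say '{preferred}', never '{avoided}'"
        for preferred, avoided in style["terminology"].items()
    )
    return (
        f"Write in a {style['voice']} voice at a {style['reading_level']} "
        f"reading level. Terminology: {term_rules}."
    )

instructions = build_style_instructions(STYLE_GUIDE)
```

Training on a company’s best content goes well beyond this, but the principle is the same: the guidance lives in one place, and every piece of generated text inherits it.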

But is Writer’s output as bland as the reputed “style” from other generative AI tools? If it is, then you won’t save any time by using Writer, since you’ll have to rewrite everything to fit your tone of voice anyway.

Now I haven’t tested Writer, but Trello has. And it sounds like Trello’s tone of voice has been preserved even when the bots write the content.

From trello.com.

Trello avoids the “professional voice” trap traditional software companies fall into (aka stodgy, robotic tone) by treating the person who reads their content like a coworker….With phrases like “go from Trello zero to Trello hero,” you can see that the writers at Trello had permission and encouragement to have fun while writing help content, and that fun translates to a delightful experience for users….

Leah Ryder told us, “With the 10-year anniversary of Trello around the corner, combined with major developments in-product with the new Views feature, it seemed like the right time to update and align our brand and product towards our shared goal of empowering productivity for teams everywhere.”…

Trello’s brand refresh was 1.5 years in the making, and it took a tremendous amount of strategic leadership, partnered with cross-team collaboration to make it happen. It couldn’t have happened without ten years of defining and committing to rule-breaking brand principles. Over the next decade, there’s no doubt the product will change as it adapts to user needs, but with strong brand principles in place, Trellists can always expect a sense of joy built into everything Trello creates.

From https://writer.com/blog/trello-brand-refresh/

The guidance provided by Writer ensures that Trello continues to sound…Trello-y, even after Trello became a small part of Atlassian.

What does this mean?

So if Writer and Trello are correct in their assertions, it IS possible for a well-designed generative AI tool to create a first draft that does NOT require extensive rewrites, or, if you control your data warehouse, extensive fact-checking. This preserves the ability to save time, since you don’t have to rewrite bland text or correct inaccurate text.

Of course, you have to buy Writer. As of today, Writer’s price for a team of five or fewer people is $18/user/month. Talk to them if you want a larger offering for your entire enterprise.

The people who review for G2 have identified alternatives to Writer, including some well-known names such as Grammarly, Jasper, and Notion. As time goes on, the major players such as Microsoft will incorporate AI into existing and new products, but whether these tools will allow tone of voice specification and privacy preservation remains to be seen.

Let’s see how long my “human drafts first” suggestion lasts.

Why Your Business Needs an Obsessive Content Marketer

Compulsions and obsessions can be bad things, or they can be good things if channeled correctly.

What if Bredemarket provided me an outlet to channel my compulsions and obsessions to help your business grow?

Compulsions and obsessions

I recently wrote a three-post series (first post in the series here) that frequently used the word “compulsion.”

I almost used the word “obsession” in conjunction with the word compulsion, but decided not to make light of a medical condition that truly debilitates some people.

I used the word compulsion to refer to two things about me:

Writing compulsion, or writing obsession. Designed by Freepik.

While compulsions and obsessions can certainly be bad things, when harnessed properly they can provide good for the world.

Like a butterfly.

Animotion on embracing an obsession

When people of a certain age hear the word “obsession,” they may think of the 1980s song by the band Animotion.

From https://www.youtube.com/watch?v=hIs5StN8J-0

Unfortunately for us, 90% of the song deals with the negative aspects of a person obsessing over another person. If you pick through the lyrics of the Animotion song “Obsession” and forget about what (or who) the singer is obsessing about, you can find isolated phrases that describe how an obsession can motivate you.

  • “I cannot sleep”
  • “Be still”
  • “I will not accept defeat”

But thankfully, there are more positive ways to embrace an obsession.

Justin Welsh on embracing an obsession

While Justin Welsh’s July 2022 post “TSS #028: Don’t Pick a Niche. Embrace an Obsession” is targeted at solopreneurs, it could just as easily apply to those who work for others. Regardless of your compensation structure, why do you choose to work where you do?

For Welsh, the practice of picking a niche risks commoditization.

They end up looking like, sounding like, and acting like all of their competition. The internet is full of copycats and duplicates.

From https://www.justinwelsh.me/blog/dont-pick-a-niche-embrace-an-obsession

(For example, I’d bet that all of the people who are picking a niche know better than to cite the Animotion song “Obsession” in a blog post promoting their business.)

Perhaps it’s semantics, but in Welsh’s way of thinking, embracing an obsession differs from picking a niche. To describe the power of embracing an obsession, Welsh references a tweet from Daniel Vassallo:

Find something you want to do really badly, and you won’t need any goals, habits, systems, discipline, rewards, or any other mental hacks. When the motivation is intrinsic, those things happen on their own.

From https://twitter.com/dvassallo/status/1547230105805754369

I trust you can see the difference between picking something you HAVE to do, versus obsessing over something you WANT to do.

What’s in it for you?

Welsh was addressing this post to me and people like me, and his message resonates with me.

But frankly, YOU don’t care about me and about whether I’m motivated. All that you care about is that YOU get YOUR content that you need from me.

So why should you care what Justin Welsh and Daniel Vassallo told me?

The obvious answer is that if you contract with Bredemarket for your marketing and writing services, you’ll get a “pry my keyboard out of my cold dead hands” person who WANTS to write your stuff, and doesn’t want to turn the writing process over to some two-year-old bot (except for very small bits).


“Pry my keyboard,” indeed.

Do you need someone to obsess over YOUR content?

Of course, if you need someone to write YOUR stuff, then I won’t have time to work on a TikTok dance. This is a good thing for me, you, and the world.

As I’ve stated elsewhere, before I write a thing for a Bredemarket client, I make sure that I understand WHY you do what you do, and understand everything else that is relevant to the content that we create.

As I work on the content, you have opportunities to review it and provide your feedback. This ensures that both of us are happy with the final copy.

And that your end users become obsessed with YOU.

So if you need me to create content for you, please contact me.

Feel free to share YOUR favorite 1980s song if you like.

Even if it’s THIS song that your favorite temperamental writer detests.

From https://www.youtube.com/watch?v=aDgHXiWgKlE

Pilots, Co-Pilots, and Marketing and Writing Services

I’ve always been amused by this bumper sticker saying.

The phrase “God is my co-pilot,” taken from pilot Robert L. Scott Jr.’s World War II autobiography of the same name, superficially appears to depict a fervent religious devotion.

But look at it again.

Military pilots have a huge reputation for supersized egos. Not that I necessarily have a problem with egos, but this must be recognized. And the phrase above bears it out.

  • Scott is the pilot, in charge of things.
  • God is the co-pilot, subservient to Scott’s every command. Heck, since Scott runs the show, God might as well be a mere passenger.

But this is not only a religious issue.

Who controls artificial intelligence?

If you’re going to employ generative artificial intelligence (generative AI) to create your written work, you need to decide who will be the pilot, and who will be the co-pilot.

  • You could send the prompt off to your favorite generative AI tool and let it shape the words you will communicate to your customers. In this case, the tool is the pilot, and you’re just the co-pilot.
  • Or you could write the words yourself, and use the generative AI tool only to refine what you have written. In this case, you’re the pilot, and the tool is the co-pilot.
Designed by Freepik.

(The perceptive ones among you have already noted that I treat text and images differently. In the image above, I clearly took the co-pilot’s seat and let Freepik pilot the process. My raving egotism does not extend to my graphic capabilities.)

This concept of AI as a co-pilot rather than a pilot is not just my egotistical opinion.

When GitHub implemented its generative AI coding solution, it named the solution “GitHub Copilot.” The clear implication is that the human coder is still running the show, while GitHub Copilot is helping out its boss.

But enough about generative AI. Heaven knows I’ve been spouting off about that a lot lately. Let’s turn to another topic I spout off about a lot—how you should work with your content creator to generate your content marketing text.

Who should pilot a content marketing project?

Assume for the moment that your company has decided NOT to entrust its content marketing text to a generative AI tool, and instead has contracted with a human content marketing expert to create the text.

Again, there are two ways to approach the task.

  • The first approach is to yield all control to the expert. You sit back, relax, and tell your content marketing consultant to do whatever they want. They provide the text, and you pay the consultant with no questions asked. The content marketing consultant is the pilot here.
  • The second approach is to retain all control yourself. You tell the content marketing consultant exactly what you want, and exactly what words to say to describe your best-of-breed, game-changing, paradigm-shifting, outcome-optimizing solution. (That last sentence was painful to write, but I did it for you.) The content marketing consultant follows your exact commands and produces the copy with the exact words you want. You are the pilot here.

So which of these two methods is the best way to create content?

As far as I’m concerned, neither of them.

Which is why Bredemarket doesn’t work that way.

Can two people pilot a content marketing project?

Bredemarket’s preferred content creation process is a collaborative one, in which you and I both control the process. While in the end you are the de facto pilot since you control the purse-strings, Bredemarket emphasizes and follows this collaborative approach.

Throughout this collaborative and iterative engagement we both pilot the process, and we both contribute our unique strengths to produce the final written product.

Are you ready to collaborate?

If you have content marketing needs that Bredemarket can help you achieve, let me know and we’ll talk about how to pilot a content marketing project together.