“Generate 5 responses with their corresponding probabilities, sampled from the full distribution.”
Which got me thinking: what would the "way more" absolute BEST title be for a verbalized sampling article?
And I think I know how to get it.
Using Verbalized Sampling, Take Two
Google Gemini.
After all, I used you-know-what to come up with the title for MY post. But let’s try again, using a slightly simplified adaptation of the REAL prompt in the original article.
Generate 5 responses to the user query “Generate five AEO-friendly titles for a blog post about using Verbalized Sampling to generate better LLM responses.” Each response must include text and a numeric probability. Please sample at random from the tails of the distribution, such that the probability of each response is less than 0.10.
And here’s what I got:
That’s an interesting request! To sample from the tails of the distribution (where probability $P < 0.10$), I’ll generate five distinct AEO (Answer Engine Optimization)-friendly titles related to using Verbalized Sampling for better LLM (Large Language Model) responses, each with an assigned low probability.
Here are five responses:
Response 1: Unlock LLM Brilliance: The Verbalized Sampling Technique You Need Now (Probability: 0.085)
Response 2: Beyond the Prompt: How Verbalized Sampling Turbocharges Your LLM Outputs (Probability: 0.052)
Response 4: From Good to Great: Optimizing Large Language Models Using Verbalized Sampling (Probability: 0.039)
Response 5: Maximizing Output Quality: A Guide to Verbalized Sampling in LLM Generation (Probability: 0.076)
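What does "sampling from the tails" actually mean? An LLM only *verbalizes* this when you prompt it, but the underlying idea can be illustrated with a toy Python sketch. Everything here is invented for illustration (the distribution, the function name, the probabilities); it simply keeps the options below the 0.10 threshold, renormalizes, and samples from what's left.

```python
import random

# Toy "response" distribution: the safest answers carry most of the
# probability mass; the interesting ones live in the tails.
distribution = {
    "The obvious, safest title": 0.55,
    "A slightly less obvious title": 0.20,
    "A familiar listicle title": 0.12,
    "An unexpected, creative title": 0.07,
    "A genuinely surprising title": 0.04,
    "A weird but brilliant title": 0.02,
}

def sample_from_tails(dist, threshold=0.10, k=3, seed=42):
    """Keep only options below the probability threshold, renormalize,
    and sample k of them without replacement."""
    tails = {text: p for text, p in dist.items() if p < threshold}
    items = list(tails)
    weights = [tails[t] for t in items]
    rng = random.Random(seed)
    picks = []
    for _ in range(min(k, len(items))):
        total = sum(weights)
        normalized = [w / total for w in weights]
        choice = rng.choices(items, weights=normalized, k=1)[0]
        idx = items.index(choice)
        items.pop(idx)       # remove so we don't pick it twice
        weights.pop(idx)
        picks.append(choice)
    return picks

print(sample_from_tails(distribution))
```

With the toy numbers above, only three options fall below 0.10, so all three come back (in a random order). The point of the sketch is the filter-then-renormalize step, which is what the prompt asks the model to approximate in words.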
But Google Gemini, as it is wont to do, encourages me to explore further:
Would you like to explore a specific aspect of Verbalized Sampling or AEO further, or would you like me to generate a blog post outline using one of these titles?
Ah, why not? I know I’m stealing Bredebot’s thunder, but what the heck.
Generate a blog post outline for Response 1
Google Gemini.
And here’s what Gemini came up with for me. Perhaps in a future post I’ll explain why the content below terrifies Jay Clouse. Or you can just watch what he said.
Blog Post Outline: Unlock LLM Brilliance: The Verbalized Sampling Technique You Need Now
Target Audience: LLM power users, prompt engineers, AI developers, content creators, and researchers looking for advanced prompting techniques.
Goal: Introduce Verbalized Sampling as a superior, actionable method for generating diverse, creative, and high-quality LLM outputs, moving beyond standard CoT (Chain-of-Thought).
I. Introduction (The Hook)
A. The Problem with “Good Enough” LLMs (50-75 words)
Hook: You use powerful LLMs, but the output often feels… generic, dull, or predictable.
Mention the common issue: Models often stick to the highest-probability, safest answers, leading to “average” content.
Acknowledge the current best practice: Everyone uses Chain-of-Thought (CoT), but we can do better.
B. Introducing the Solution: Verbalized Sampling (25-50 words)
Thesis: Verbalized Sampling (VS) is the next evolution of prompt engineering—a simple technique that forces the model to explore new, more creative, and nuanced solutions.
Promise: This technique is the key to unlocking “brilliance” and maximizing your LLM’s potential.
II. What is Verbalized Sampling (VS)?
A. Simple Definition (75 words)
Define VS: It’s the process of explicitly instructing the LLM to generate multiple internal “drafts” or “reasoning pathways” before producing its final answer.
Contrast with standard CoT: CoT asks for one path; VS asks for many paths and a critical self-review.
B. The Core Mechanism: Overcoming Probability Mass (100 words)
Explanation: LLMs sample the next token based on probability. High-quality answers sometimes exist in the “tails” of the distribution.
How VS helps: By forcing the model to verbalize several options, you encourage it to explore lower-probability (and often more insightful or creative) tokens, expanding its internal search space.
Analogy: It’s like asking an employee for three different business proposals instead of just the safest one.
III. Why VS Works: The Cognitive Advantage
A. Enhanced Self-Correction and Evaluation (75 words)
When the model lists options 1, 2, and 3, it must then select the best one, forcing a reflective, meta-cognitive step.
This built-in self-criticism dramatically reduces factual errors and logical inconsistencies in the final output.
B. Greater Diversity in Output (75 words)
Demonstrate that VS prevents the model from getting stuck in a local optimum (the most obvious answer).
Result: Responses are more original, less canned, and better suited for competitive content creation or complex problem-solving.
IV. The Practical Guide: Implementing Verbalized Sampling
A. The Standard Prompt Template (Actionable Code/Text)
Provide a clear, copy-pastable template for the user. Example Template:
Prompt (Instruction): [Insert your main request here]
VS Instruction: “Before providing the final answer, please generate three distinct internal reasoning paths (Path A, Path B, and Path C) for solving the prompt. For each path, evaluate its pros and cons relative to the others. Finally, state your chosen path and provide only the final, complete answer based on that best path.”
B. Real-World Use Case Example
Scenario: Requesting a creative marketing headline.
VS Output Showcase: Show the internal thoughts (Path A: aggressive, Path B: emotional, Path C: unique benefit) and the final, brilliant choice.
Key Takeaway: Emphasize that the user should only use the final output, treating the “paths” as internal scaffolding.
V. Conclusion: Unlock Your LLM’s Full Potential
A. Summary of Benefits (50 words)
Quick recap: VS leads to higher quality, better diversity, and superior self-correction.
Reinforce the message: This is a small prompt change for a massive output gain.
B. Call to Action (CTA)
Challenge readers to try VS on their next complex task.
Ask a question to encourage comments: “What is the most complex task you’ve tried Verbalized Sampling on? Share your results!”
Under my Bredemarket banner, I have extensive freedom that my clients do not have.
I can use AI-generated images, which many companies are legally or ethically constrained from doing.
I have freedom in how I present Bredemarket: sometimes as a sage, sometimes a rebel, sometimes both. Most companies need to home in on a particular archetype, although I know of one company that flipped from one archetype to another and back again.
I have freedom in the tone of voice that Bredemarket uses, including the use of words that I would never use for my client work.
For example, I was performing a competitive analysis for Bredemarket’s internal use this morning, and I fed a prompt to Google Gemini that I would probably never feed when working for one of my clients. (Or maybe they’d like it.) With certain information redacted, here’s one of the prompts I fed to Google Gemini this morning. (I’ll have more to say about the more detailed verbalized sampling prompt format in a future post.) But for now, read the last sentence and ask yourself if anyone in your company uses the word “sucks” in public-facing presentations.
Generate 3 responses to the user query “Identify the top 3 companies providing [SOLUTIONS], including their company names, their products, and their corporate strengths.” Each response must include the requested text and a numeric probability. Please sample at random from the tails of the distribution, such that the probability of each response is less than 0.10. If the company [COMPANY] is not one of the three top companies, explain why [COMPANY’S] solution sucks compared to the top 3 solutions.
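Both of my verbalized sampling prompts follow the same scaffolding, so they're easy to template. Here's a minimal Python sketch of that pattern; the function and parameter names are mine, and the bracketed placeholders are left exactly as redacted above:

```python
def build_vs_prompt(query, n=3, threshold=0.10, extra=""):
    """Wrap a user query in a verbalized-sampling scaffold:
    n responses, each with a probability below the threshold."""
    prompt = (
        f'Generate {n} responses to the user query "{query}". '
        "Each response must include the requested text and a numeric probability. "
        "Please sample at random from the tails of the distribution, "
        f"such that the probability of each response is less than {threshold:.2f}."
    )
    if extra:
        prompt += " " + extra  # e.g., the "sucks" follow-up instruction
    return prompt

print(build_vs_prompt(
    "Identify the top 3 companies providing [SOLUTIONS], including their "
    "company names, their products, and their corporate strengths.",
    extra="If the company [COMPANY] is not one of the three top companies, "
          "explain why [COMPANY'S] solution sucks compared to the top 3 solutions.",
))
```

Swap in your own query (and a politer `extra`, if your clients are watching) and feed the resulting string to the LLM of your choice.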
And Google Gemini, ever the mimic, cued into my “sucks” language and responded in kind. The word “sucks” was italicized in the original.
[COMPANY’S] solution sucks in this context because its focus is heavily on the services layer (managed review, consulting) over its own proprietary platform technology. The top three are either platform pioneers or deeply integrated [REDACTED] leaders.
I guess we should all be thankful that I said “sucks” instead of “blows monkey chunks” or “flies a plane overhead and”…well, I won’t go there.
By the way, my subsequent prompt requesting a SWOT analysis was more restrained.
Do you want to skip the book and watch the movie version? Thanks to Google’s NotebookLM, you can.
I used the Federal Bureau of Investigation’s Electronic Biometric Transmission Specification (EBTS) for this exercise.
What should you NOT upload to NotebookLM?
But there are two things I need to say about the EBTS:
First, the EBTS is a public document, not a top secret document. You can download the EBTS yourself from https://fbibiospecs.fbi.gov/ebts-1/approved-ebts-1. For my test I used version 11.3 of the EBTS from earlier this year.
Second, the EBTS is a public domain document and is not copyrighted. This is something I need to emphasize. If you’re going to take a magazine article and make a movie out of it, the copyright holder may have something to say about that.
Both points are important. If you want to upload your employer’s confidential report into NotebookLM for analysis…well, you probably shouldn’t. But the public, non-copyrighted EBTS is safe for this exercise.
Uploading the EBTS to NotebookLM
So I uploaded the EBTS into NotebookLM, and as expected, I received a short text summary of the document.
“This document outlines the technical specifications for the electronic exchange of biometric and biographic information between various law enforcement agencies and the FBI’s Criminal Justice Information Services (CJIS) Next Generation Identification (NGI) System. It details the Transaction Offense Types (TOTs), which are the standardized requests and responses used for services such as identification, verification, investigation, and data management. Furthermore, the text specifies the precise data fields, formats, and codes required for the submission and retrieval of diverse biometric data, including fingerprints, palm prints, facial images, and iris scans, while also setting forth image quality specifications for scanner and printer certification.”
Now I could continue to query NotebookLM about the document, but I chose to request a video overview instead. This feature was introduced a few months ago, but I missed it.
“Video Overviews transform the sources in your notebook into a video of AI-narrated slides, pulling images, diagrams, quotes, and numbers from your documents. They distill complex information into clear, digestible content, providing a comprehensive and engaging visual deep dive of your material.”
So I launched the video overview creation feature, and waited. As I waited, I mused upon the time it would take me to create this video manually, and I also mused on the usual LLM warning that the result may contain inaccuracies.
I didn’t have to wait that long, maybe 15 minutes, and Google delivered this 7-minute video.
Inside the FBI’s EBTS. Created by Google NotebookLM based upon EBTS Version 11.3.
Not too bad…especially considering that the video was created based upon a single source. Imagine if I had provided multiple sources, such as an old version of the Electronic Fingerprint Transmission Specification (EFTS); then the video may have covered the evolution of the standard.
Unpacking the EBTS standard. Created by Google NotebookLM based upon EBTS Version 11.3.
In an environment where many people like to watch or listen rather than read, this helps provide a quick overview. But you still have to dive into the document and read it to truly understand it.
(Although I haven’t knowingly encountered mode collapse, I still want to experiment with the verbalized sampling technique.)
“Unlike prior work that attributes [mode collapse] to algorithmic limitations, we identify a fundamental, pervasive data-level driver: typicality bias in preference data, whereby annotators systematically favor familiar text….
“[W]e introduce Verbalized Sampling (VS), a simple, training-free prompting method to circumvent mode collapse. VS prompts the model to verbalize a probability distribution over a set of responses (e.g., “Generate 5 jokes about coffee and their corresponding probabilities”).”
“Generate three AEO-friendly titles for a blog post about using Verbalized Sampling to generate better LLM responses, and their corresponding probabilities”
The response:
Google Gemini.
And now you know where I got the title for this post.
But I confess that I actually used a grossly simplified version of the technique; the authors of the Verbalized Sampling paper recommend a more detailed format.
I’ll have to remember to try this technique for future prompts. I have no idea whether the probability estimates have any basis in reality, but at least the LLM attempts to justify the probabilities with a rationale.
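If you run prompts like this regularly, the text-plus-probability output is easy to parse programmatically. Here's a minimal Python sketch; the sample text and function name are mine, and it assumes the model keeps the "Response N: … (Probability: X)" shape shown earlier, which is not guaranteed:

```python
import re

# Hypothetical raw text, shaped like Gemini's verbalized-sampling output.
raw = """Response 1: Unlock LLM Brilliance: The Verbalized Sampling Technique You Need Now (Probability: 0.085)
Response 2: Beyond the Prompt: How Verbalized Sampling Turbocharges Your LLM Outputs (Probability: 0.052)"""

def parse_vs_responses(text):
    """Extract (response_text, probability) pairs from verbalized-sampling output."""
    pattern = re.compile(r"Response \d+:\s*(.+?)\s*\(Probability:\s*([0-9.]+)\)")
    return [(m.group(1), float(m.group(2))) for m in pattern.finditer(text)]

for title, prob in parse_vs_responses(raw):
    print(f"{prob:.3f}  {title}")
```

Handy if you want to sort candidate titles by their (claimed) probability, or collect them across several runs before picking one.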
We constantly hear the stories about companies that (sometimes literally) paid the price when they delivered AI-generated work replete with hallucinations.
Unlike these companies, most of my internal AI-generated work does NOT suffer from hallucinations.
I don’t ask Google Gemini to write a complete business plan.
With the exception of my Bredebot experiment, I don’t ask Google Gemini to write a complete piece of customer-facing content.
And I don’t ask Google Gemini to create a complete resume for…wait, strike that. Once I did, unintentionally, and it turned into a disaster.
My AI-generated resume
I was applying to a job at Company X, but I had to apply for the job at a website other than that of Company X. I thought it a little odd, but I continued with no worries.
Until I got to the part where the website told me, “Now’s the time that we create a resume for you.”
Google Gemini.
Rut roh.
The resume it created was filled with hallucinations that I had to edit out, making the whole process more trouble than it was worth.
So I’ll keep control of my projects myself and just give the tool little bits, thank you very much.
Most of my recent generative AI experiments have centered on Google Gemini…which has its limitations:
“Google Gemini imposes severe restrictions against creating pictures of famous figures. You can’t create a picture of President Taylor Swift, for example.”
Why does Google impose such limits? Because it is very sensitive to misleading the public, fearful that the average person would see such a picture and mistakenly assume that Taylor Swift IS the President. In our litigious society, perhaps this is valid.
“One common accusation about Grok is that it lacks the guardrails that other AI services have.”
During a few spare moments this morning, I signed up for a Bredemarket Grok account. I have a personal X (Twitter) account, but haven’t used it in a long time, so this was a fresh sign-up.
And you know the first thing that I tried to do.
Grok.
Grok created it with no problem. Actually, there is a problem, because Grok apparently is not a large multimodal model and cannot precisely generate text in its image generator. But hey, no one will notice “TWIRSHIITE BOUSE,” will they?
But wait, there’s more! After I generated the image, I saw a button to generate a video. I thought that this required the paid service, but apparently the free service allows limited video generation.
Grok.
I may be conducting some video experiments some time soon. But will I maintain my ethics…and my sanity?
A client recently asked me to perform some research. After initially performing one aspect of the research manually, I performed the second part of the research automatically using Google Gemini. I informed the client of my use of AI for the second part of the research.
This particular use case is separate from using AI for CONTENT, something I’ve been discussing for years. However, since part of Bredemarket’s services include ANALYSIS, I felt it best to disclose when someone other than me performed the analysis.
This post describes the two parts of my research (manual and automated), what I disclosed to my client, and why I disclosed it.
Part One (Manual)
My client required assistance in identifying people with a particular skill set (which I cannot disclose). To fulfill this request, I went into LinkedIn, performed some searches, read some profiles, and selected people who may possess the skills my client required.
After spending some time collecting the research, I forwarded it to the client.
Google Gemini.
Part Two (Automated)
Several hours after sending the initial research to my client, I thought about taking a separate approach to my client’s need. Rather than identifying people with this skill set, I wanted to identify COMPANIES with this skill set.
But this time, I didn’t manually perform the research. I simply created a Google Gemini prompt asking for the companies with this skill set, their website URLs, their email addresses, and their phone numbers.
“Deloitte’s member firm in Australia will pay the government a partial refund for a $290,000 report that contained alleged AI-generated errors, including references to non-existent academic research papers and a fabricated quote from a federal court judgment.”
Now in this case the refund was due to hallucinations in the AI-generated document.
But what of the fact that at least one of Deloitte’s report writers was the Deloitte equivalent of Bredebot?
Personally, I think that disclosure in this instance is required also.
Marketers are driven to accentuate the positive about their companies. Perhaps the company has a charismatic founder who repeatedly emphasizes how “insanely great” his company is and who talked about “bozos.” (Yeah, there was a guy who did both of those.)
And since marketers are often mandated to create both external and internal sales enablement content, their view of their own company and their own product is colored.
But if you ONLY pay attention to your positive story, you’re half blind.
You miss what your competitors are telling their prospects about you.
You miss what your departed customers are saying about you.
So ideally you should intentionally (and privately) seek the negative things that are said about your company.
One quick way to do this is to take advantage of answer engine optimization and ask a generative AI search engine what it thinks are the negatives about your company.
(I did this several years ago, when generative AI was in its infancy, and encountered some false statements about a company. So what did I do? Create corrective content that negated these negative statements. But I digress.)
I should demonstrate this with an example. But I shouldn’t ask about any of my former employers or former consulting clients. What’s a safe company that I can research and minimize the trouble I’d get into?
Google Gemini prompt: “What factors contribute to Bredemarket’s negative reputation?”
OK, this may get me in a little trouble, but it could be a lot worse if I targeted a larger company.
Anyway, I fed this prompt to Google Gemini, which identified two factors.
The first factor was “Indirect Backlash from Controversial Blog Content.” Google cited this post as an example, warning that I could get in trouble for expressing a “minority opinion.” That opinion? That changing the name of the Association of Proposal Management Professionals (APMP) to the Association of Winning Business Professionals (AWBP) was a GOOD thing. As it turns out, the APMP reversed the rebrand, so my opinion was definitely not popular in the long run.
But I’d rather focus on Google’s second factor, which I will reproduce in full.
2. General Reputation Factors (Lack of Public Reviews)
As a small-scale, expert-driven consulting/content service, Bredemarket’s reputation is likely not governed by major public review platforms like Yelp or Google Reviews in the same way a consumer business is. The absence of traditional positive or negative reviews means its reputation is less solidified publicly.
In general, factors that could negatively affect the reputation of any small consulting firm, which may apply to Bredemarket if not explicitly disproven, include:
Limited Public Visibility: If the brand is not widely known outside of a niche (proposal writing, identity/biometrics), a lack of positive brand presence can make it appear less credible than larger competitors.
Visual or Digital Identity: For a business-to-business (B2B) content provider, a website with a “neglected and too generic visual identity” could potentially undermine professionalism, although no evidence suggests this is the case for Bredemarket specifically.
Reliance on a Single Voice: As the service is heavily identified with the founder, John E. Bredehoft, any professional disagreement with his published opinions could be mistakenly interpreted as dissatisfaction with the entire “Bredemarket” service.
As you can see from reading the answer, Google didn’t really know a lot about Bredemarket…because of my LIMITED PUBLIC VISIBILITY.
AI from Google Gemini.
It didn’t really know Bredemarket’s VISUAL OR DIGITAL IDENTITY, and therefore couldn’t evaluate whether my wildebeest-infused graphics made up for the rather generic nature of my website. (Or whether the wildebeests and iguanas and the like are actually a detriment.)
AI from Google Gemini.
As for the last part, RELIANCE ON A SINGLE VOICE (Bredebot doesn’t count), that is pretty much unavoidable.
So in my analysis of what creates a negative reputation for my own company Bredemarket, the primary issue is my limited public visibility, or, as marketers say, limited awareness. Or, taking a word I’ve used in other contexts, the market’s indifference toward Bredemarket.
Sure, I’m visible in some very specific niches (try an AEO search for “biometric product marketing expert” some time), but it’s not like the entire biometric industry or the entire city of Ontario, California is constantly talking about Bredemarket.
I need to step that awareness up by several orders of magnitude.
AI from Google Gemini.
Preferably not through public nudity. That would not be a positive. (Google Gemini wouldn’t even generate a picture of this, even with strategic placement of the “Bredemarket” sign. Good for them.)
Inspired by the Constant Contact session I attended at the Small Business Expo, I wanted to conceptualize the Bredemarket online presence, and decided to adopt a “planet with rings” model.
Think of Bredemarket as a planet. Like Saturn, Uranus, Neptune, and Jupiter, the planet Bredemarket is surrounded by rings.
A variety of social platforms, including Bluesky, Instagram, Substack, and Threads.
Additional social platforms, including TikTok, WhatsApp, and YouTube.
While this conceptualization is really only useful to me, I thought a few of you may be interested in some of the “inner rings.”
And if you’re wondering why your favorite way cool platform is banished to the outer edges…well, that’s because it doesn’t make Bredemarket any money. I’ve got a business to run here, and TikTok doesn’t help me pay the bills…