You may remember the May hoopla regarding amendments to Illinois’ Biometric Information Privacy Act (BIPA). These amendments do not eliminate the long-standing law, but lessen its damage to offending companies.
The General Assembly is expected to send the bill to Illinois Governor JB Pritzker within 30 days. Gov. Pritzker will then have 60 days to sign it into law. It will be immediately effective.
While the BIPA amendment has passed the Illinois House and Senate and was sent to the Governor, there is no indication that he has signed the bill into law within the 60-day timeframe.
A proposed class action claims Photomyne, the developer of several photo-editing apps, has violated an Illinois privacy law by collecting, storing and using residents’ facial scans without authorization….
The lawsuit contends that the app developer has breached the BIPA’s clear requirements by failing to notify Illinois users of its biometric data collection practices and inform them how long and for what purpose the information will be stored and used.
In addition, the suit claims the company has unlawfully failed to establish public guidelines that detail its data retention and destruction policies.
When marketing digital identity products secured by biometrics, emphasize that they are MORE secure and more private than their physical counterparts.
When you hand your physical driver’s license over to a sleazy bartender, they find out EVERYTHING about you, including your name, your birthdate, your driver’s license number, and even where you live.
When you use a digital mobile driver’s license, bartenders ONLY learn what they NEED to know—that you are over 21.
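To make the data-minimization point concrete, here is a minimal Python sketch. The `License` record, its field names, and both presentation functions are hypothetical illustrations, not any real mobile driver's license API. Real mobile driver's licenses add signing, consent, and verification steps that are left out here.

```python
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class License:
    # Hypothetical record; the field names are illustrative only.
    name: str
    birthdate: date
    license_number: str
    address: str


def is_over_21(lic: License, today: date) -> bool:
    """Compute the one fact the bartender needs."""
    born = lic.birthdate
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    age = today.year - born.year - (0 if had_birthday else 1)
    return age >= 21


def physical_handover(lic: License) -> dict:
    """Handing over the plastic card exposes every printed field."""
    return asdict(lic)


def digital_presentation(lic: License, today: date) -> dict:
    """A minimized presentation shares only the yes/no answer."""
    return {"age_over_21": is_over_21(lic, today)}
```

The design choice is that the birthdate never leaves the holder's side. The verifier receives a single boolean instead of the four fields a physical card reveals.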
Keeping the internet open is crucial, and part of being open means Reddit content needs to be accessible to those fostering human learning and researching ways to build community, belonging, and empowerment online. Reddit is a uniquely large and vibrant community that has long been an important space for conversation on the internet. Additionally, using LLMs, ML, and AI allow Reddit to improve the user experience for everyone.
In line with this, Reddit and OpenAI today announced a partnership to benefit both the Reddit and OpenAI user communities…
Perhaps some members of the Reddit user community may not feel the benefits when OpenAI is training on their data.
While people who joined Reddit presumably understood that anyone could view their data, they never imagined that a third party would then process their data for its own purposes.
Oh, but wait a minute. Reddit clarifies things:
This partnership…does not change Reddit’s Data API Terms or Developer Terms, which state content accessed through Reddit’s Data API cannot be used for commercial purposes without Reddit’s approval. API access remains free for non-commercial usage under our published threshold.
If you’re a biometric product marketing expert, or even if you’re not, you’re presumably analyzing the possible effects on your identity/biometric product of the proposed changes to the Biometric Information Privacy Act (BIPA).
As of May 16, the Illinois General Assembly (House and Senate) passed a bill (SB2979) to amend BIPA. It awaits the Governor’s signature.
What is the amendment? Other than defining an “electronic signature,” the main purpose of the bill is to limit damages under BIPA. The new text regarding the “Right of action” codifies the concept of a “single violation.”
(T)he amended law DOES NOT CHANGE “Private Right of Action” so BIPA LIVES!
Companies who violate the strict requirements of BIPA aren’t off the hook. It’s just that the trial lawyers—whoops, I mean the affected consumers make a lot less money.
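A rough arithmetic sketch shows why the “single violation” concept matters so much. The dollar figures are BIPA’s statutory liquidated damages as I understand them ($1,000 per negligent violation, $5,000 per intentional or reckless violation). The employer, headcount, and scan counts are made up for illustration, and actual awards depend on the court.

```python
# Statutory liquidated damages per violation (assumed from the text of BIPA).
NEGLIGENT = 1_000
INTENTIONAL_OR_RECKLESS = 5_000


def per_scan_exposure(people: int, scans_per_person: int, rate: int) -> int:
    """Exposure if every scan counts as its own violation."""
    return people * scans_per_person * rate


def single_violation_exposure(people: int, rate: int) -> int:
    """Exposure if repeated scans of one person count as one violation."""
    return people * rate


# Hypothetical employer: 100 workers, each scanned 500 times.
before = per_scan_exposure(100, 500, NEGLIGENT)
after = single_violation_exposure(100, NEGLIGENT)
print(f"per-scan theory: ${before:,}")
print(f"single-violation theory: ${after:,}")
```

Under these assumed numbers the same conduct drops from an eight-figure exposure to a six-figure one, yet the private right of action itself is untouched.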
It discussed both large language models and large multimodal models. In this case “multimodal” is used in a way that I normally DON’T use it, namely to refer to the different modes in which humans interact (text, images, sounds, videos). Of course, I gravitated to a discussion in which an image of a person’s face was one of the modes.
In this post I will look at LMMs…and I will also look at LMMs. There’s a difference. And a ton of power when LMMs and LMMs work together for the common good.
When Google announced its Gemini series of AI models, it made a big deal about how they were “natively multimodal.” Instead of having different modules tacked on to give the appearance of multimodality, they were apparently trained from the start to be able to handle text, images, audio, video, and more.
Other AI models are starting to function in a TRULY multimodal way, rather than using separate models to handle the different modes.
So now that we know that LMMs are large multimodal models, we need to…
…um, wait a minute…
Introducing the Large Medical Model (LMM)
It turns out that the health people have a DIFFERENT definition of the acronym LMM. Rather than using it to refer to a large multimodal model, they refer to a large MEDICAL model.
Our first of a kind Large Medical Model or LMM for short is a type of machine learning model that is specifically designed for healthcare and medical purposes. It is trained on a large dataset of medical records, claims, and other healthcare information including ICD, CPT, RxNorm, Claim Approvals/Denials, price and cost information, etc.
I don’t think I’m stepping out on a limb if I state that medical records cannot be classified as “natural” language. So the GenHealth.AI model is trained specifically on those attributes found in medical records, and not on people hemming and hawing and asking what a Pekingese dog looks like.
But there is still more work to do.
What about the LMM that is also an LMM?
Unless I’m missing something, the Large Medical Model described above is designed to work with only one mode of data, textual data.
But what if the Large Medical Model were also a Large Multimodal Model?
Rather than converting a medical professional’s voice notes to text, the LMM-LMM would work directly with the voice data. This could lead to increased accuracy: compare the tone of voice of an offhand comment “This doesn’t look good” with the tone of voice of a shocked comment “This doesn’t look good.” They appear the same when reduced to text format, but the original voice data conveys significant differences.
Rather than just using the textual codes associated with an X-ray, the LMM-LMM would read the X-ray itself. If the image model has adequate training, it will again pick up subtleties in the X-ray data that are not present when the data is reduced to a single medical code.
In short, the LMM-LMM (large medical model-large multimodal model) would accept ALL the medical outputs: text, voice, image, video, biometric readings, and everything else. And the LMM-LMM would deal with all of it natively, increasing the speed and accuracy of healthcare by removing the need to convert everything to textual codes.
A tall order, but imagine how healthcare would be revolutionized if you didn’t have to convert everything into text format to get things done. And if you could use the actual image, video, audio, or other data rather than someone’s textual summation of it.
Obviously you’d need a ton of training data to develop an LMM-LMM that could perform all these tasks. And you’d have to obtain the training data in a way that conforms to privacy requirements: in this case protected health information (PHI) requirements such as HIPAA requirements.
But if someone successfully pulls this off, the benefits are enormous.
You’ve come a long way, baby.
Robert Young (“Marcus Welby”) and Jane Wyatt (“Margaret Anderson” on a different show). By ABC Television. Uploaded by We hope at en.wikipedia; eBay item, photo information. Transferred from en.wikipedia by SreeBot. Public Domain, https://commons.wikimedia.org/w/index.php?curid=16472486.
The Digital Trust & Safety Partnership (DTSP) consists of “leading technology companies,” including Apple, Google, Meta (parent of Facebook, Instagram, and WhatsApp), Microsoft (and its LinkedIn subsidiary), TikTok, and others.
DTSP appreciates and shares Ofcom’s view that there is no one-size-fits-all approach to trust and safety and to protecting people online. We agree that size is not the only factor that should be considered, and our assessment methodology, the Safe Framework, uses a tailoring framework that combines objective measures of organizational size and scale for the product or service in scope of assessment, as well as risk factors.
We’ll get to the “Safe Framework” later. DTSP continues:
Overly prescriptive codes may have unintended effects: Although there is significant overlap between the content of the DTSP Best Practices Framework and the proposed Illegal Content Codes of Practice, the level of prescription in the codes, their status as a safe harbor, and the burden of documenting alternative approaches will discourage services from using other measures that might be more effective. Our framework allows companies to use whatever combination of practices most effectively fulfills their overarching commitments to product development, governance, enforcement, improvement, and transparency. This helps ensure that our practices can evolve in the face of new risks and new technologies.
But remember that the UK’s neighbors in the EU recently prescribed that USB-C cables are the way to go. This not only forced DTSP member Apple to abandon the Lightning cable worldwide, but it affects Google and others because there will be no efforts to come up with better cables. Who wants to fight the bureaucratic battle with Brussels? Or alternatively we will have the advanced “world” versions of cables and the deprecated “EU” standards-compliant cables.
So forget Ofcom’s so-called overbearing approach and just adopt the Safe Framework. Big tech will take care of everything, including all those age assurance issues.
Incorporating each characteristic comes with trade-offs, and there is no one-size-fits-all solution. Highly accurate age assurance methods may depend on collection of new personal data such as facial imagery or government-issued ID. Some methods that may be economical may have the consequence of creating inequities among the user base. And each service and even feature may present a different risk profile for younger users; for example, features that are designed to facilitate users meeting in real life pose a very different set of risks than services that provide access to different types of content….
Instead of a single approach, we acknowledge that appropriate age assurance will vary among services, based on an assessment of the risks and benefits of a given context. A single service may also use different approaches for different aspects or features of the service, taking a multi-layered approach.
Before you can fully understand the difference between personally identifiable information (PII) and protected health information (PHI), you need to understand the difference between biometrics and…biometrics. (You know sometimes words have two meanings.)
Designed by Google Gemini.
The definitions of biometrics
To address the difference between biometrics and biometrics, I’ll refer to something I wrote over two years ago, in late 2021. In that post, I quoted two paragraphs from the International Biometric Society that illustrated the difference.
Since the IBS has altered these paragraphs in the intervening years, I will quote from the latest version.
The terms “Biometrics” and “Biometry” have been used since early in the 20th century to refer to the field of development of statistical and mathematical methods applicable to data analysis problems in the biological sciences.
Statistical methods for the analysis of data from agricultural field experiments to compare the yields of different varieties of wheat, for the analysis of data from human clinical trials evaluating the relative effectiveness of competing therapies for disease, or for the analysis of data from environmental studies on the effects of air or water pollution on the appearance of human disease in a region or country are all examples of problems that would fall under the umbrella of “Biometrics” as the term has been historically used….
The term “Biometrics” has also been used to refer to the field of technology devoted to the identification of individuals using biological traits, such as those based on retinal or iris scanning, fingerprints, or face recognition. Neither the journal “Biometrics” nor the International Biometric Society is engaged in research, marketing, or reporting related to this technology. Likewise, the editors and staff of the journal are not knowledgeable in this area.
In brief, what I call “broad biometrics” refers to analyzing biological sciences data, ranging from crop yields to heart rates. Contrast this with what I call “narrow biometrics,” which (usually) refers only to human beings, and only to those characteristics that identify human beings, such as the ridges on a fingerprint.
The definition of “personally identifiable information” (PII)
Information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other information that is linked or linkable to a specific individual.
Note the key words “alone or when combined.” The ten digits “909 867 5309” are not sufficient to identify an individual alone, but can identify someone when combined with information from another source, such as a telephone book.
What types of information can be combined to identify a person? The U.S. Department of Defense’s Privacy, Civil Liberties, and Freedom of Information Directorate provides multifarious examples of PII, including:
Social Security Number.
Passport number.
Driver’s license number.
Taxpayer identification number.
Patient identification number.
Financial account number.
Credit card number.
Personal address.
Personal telephone number.
Photographic image of a face.
X-rays.
Fingerprints.
Retina scan.
Voice signature.
Facial geometry.
Date of birth.
Place of birth.
Race.
Religion.
Geographical indicators.
Employment information.
Medical information.
Education information.
Financial information.
Now you may ask yourself, “How can I identify someone by a non-unique birthdate? A lot of people were born on the same day!”
But the combination of information is powerful, as researchers discovered in a 2015 study cited by the New York Times.
In the study, titled “Unique in the Shopping Mall: On the Reidentifiability of Credit Card Metadata,” a group of data scientists analyzed credit card transactions made by 1.1 million people in 10,000 stores over a three-month period. The data set contained details including the date of each transaction, amount charged and name of the store.
Although the information had been “anonymized” by removing personal details like names and account numbers, the uniqueness of people’s behavior made it easy to single them out.
In fact, knowing just four random pieces of information was enough to reidentify 90 percent of the shoppers as unique individuals and to uncover their records, researchers calculated. And that uniqueness of behavior — or “unicity,” as the researchers termed it — combined with publicly available information, like Instagram or Twitter posts, could make it possible to reidentify people’s records by name.
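Here is a toy Python sketch of how a unicity-style measurement could be computed. It is not the researchers’ code or method. The `traces` structure, the simplified (store, day) points, and the `unicity` function name are all assumptions made for illustration.

```python
import random
from typing import Dict, Set, Tuple

Point = Tuple[str, str]  # (store, day): a hypothetical, simplified transaction


def unicity(traces: Dict[str, Set[Point]], k: int, seed: int = 0) -> float:
    """Fraction of people singled out by k randomly chosen points of their own."""
    rng = random.Random(seed)
    eligible = [person for person, pts in traces.items() if len(pts) >= k]
    if not eligible:
        return 0.0
    unique = 0
    for person in eligible:
        # Pretend an outsider learned k of this person's transactions.
        known = set(rng.sample(sorted(traces[person]), k))
        # Count everyone whose "anonymized" trace contains all k points.
        matches = [other for other, pts in traces.items() if known <= pts]
        if len(matches) == 1:
            unique += 1
    return unique / len(eligible)


# Made-up data: no names anywhere, just pseudonyms and shopping behavior.
traces = {
    "A": {("bakery", "mon"), ("cafe", "tue"), ("gym", "wed")},
    "B": {("bakery", "mon"), ("cafe", "tue"), ("cinema", "thu")},
    "C": {("bakery", "mon"), ("bookshop", "fri"), ("gym", "wed")},
}
print(unicity(traces, k=1), unicity(traces, k=3))
```

In this toy data set every pseudonym shares at least one transaction with someone else, yet knowing all three points singles out each person. That is the same effect the study reports at a much larger scale.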
Now biometrics only form part of the multifarious list of data cited above, but clearly biometric data can be combined with other data to identify someone. An easy example is taking security camera footage of the face of a person walking into a store, and combining that data with the same face taken from a database of driver’s license holders. In some jurisdictions, some entities are legally permitted to combine this data, while others are legally prohibited from doing so. (A few do it anyway. But I digress.)
Because narrow biometric data used for identification, such as fingerprint ridges, can be combined with other data to personally identify an individual, organizations that process biometric data must undertake strict safeguards to protect that data. If personally identifiable information (PII) is not adequately guarded, people could be subject to fraud and other harms.
The definition of “protected health information” (PHI)
Protected Health Information. The Privacy Rule protects all “individually identifiable health information” held or transmitted by a covered entity or its business associate, in any form or media, whether electronic, paper, or oral. The Privacy Rule calls this information “protected health information (PHI).”
“Individually identifiable health information” is information, including demographic data, that relates to:
the individual’s past, present or future physical or mental health or condition,
the provision of health care to the individual, or
the past, present, or future payment for the provision of health care to the individual,
and that identifies the individual or for which there is a reasonable basis to believe it can be used to identify the individual. Individually identifiable health information includes many common identifiers (e.g., name, address, birth date, Social Security Number).
The Privacy Rule excludes from protected health information employment records that a covered entity maintains in its capacity as an employer and education and certain other records subject to, or defined in, the Family Educational Rights and Privacy Act, 20 U.S.C. §1232g.
Now there’s obviously an overlap between personally identifiable information (PII) and protected health information (PHI). For example, names, dates of birth, and Social Security Numbers fall into both categories. But I want to highlight two things that are explicitly mentioned as PHI that aren’t usually cited as PII.
Physical or mental health data. This could include information that a medical professional captures from a patient, including biometric (broad biometric) information such as heart rate or blood pressure.
Health care provided to an individual. This includes not only written information such as prescriptions, but also oral information (“take two aspirin and call my chatbot in the morning”). Yes, chatbot. Deal with it. Dr. Marcus Welby and his staff retired a long time ago.
Because broad biometric data used for analysis, such as heart rates, can be combined with other data to personally identify an individual, organizations that process biometric data must undertake strict safeguards to protect that data. If protected health information (PHI) is not adequately guarded, people could be subject to fraud and other harms.
Simple, isn’t it?
Actually, the parallels between identity/biometrics and healthcare have fascinated me for decades, since the dedicated hardware to capture identity/biometric data is often similar to the dedicated hardware to capture health data. And now that we’re moving away from dedicated hardware to multi-purpose hardware such as smartphones, the parallels are even more fascinating.
My decision making process relies on extensive data analysis and aligning with the company’s strategic objectives. It’s devoid of personal bias ensuring unbiased and strategic choices that prioritize the organization’s best interests.
Mika was brought to my attention by accomplished product marketer/artist Danuta (Dana) Debogorska. (She’s appeared in the Bredemarket blog before, though not by name.) Dana is also Polish (but not Colombian) and clearly takes pride in the artificial intelligence accomplishments of this Polish-headquartered company. You can read her LinkedIn post to see her thoughts, one of which was as follows:
Data is the new oxygen, and we all know that we need clean data to innovate and sustain business models.
There’s a reference to oxygen again, but it’s certainly appropriate. Just as people cannot survive without oxygen, Generative AI cannot survive without data.
But the need for data predates AI models. From 2017:
Reliance Industries Chairman Mukesh Ambani said India is poised to grow…but to make that happen the country’s telecoms and IT industry would need to play a foundational role and create the necessary digital infrastructure.
Calling data the “oxygen” of the digital economy, Ambani said the telecom industry had the urgent task of empowering 1.3 billion Indians with the tools needed to flourish in the digital marketplace.
Of course, the presence or absence of data alone is not enough. As Debogorska notes, we don’t just need any data; we need CLEAN data, without error and without bias. Dirty data is like carbon monoxide, and as you know carbon monoxide is harmful…well, most of the time.
That’s been the challenge not only with artificial intelligence, but with ALL aspects of data gathering.
The all-male board of directors of a fertilizer company in 1960. Fair use. From the New York Times.
In all of these cases, someone (Amazon, Enron’s shareholders, or NIST) asked questions about the cleanliness of the data, and then set out to answer those questions.
In the case of Amazon’s recruitment tool and the company Enron, the answers caused Amazon to abandon the tool and Enron to abandon its existence.
Despite the entreaties of so-called privacy advocates (who prefer the privacy nightmare of physical driver’s licenses to the privacy-preserving features of mobile driver’s licenses), we have not abandoned facial recognition, but we’re definitely monitoring it in a statistical (not an anecdotal) sense.
The cleanliness of the data will continue to be the challenge as we apply artificial intelligence to new applications.
Why did I mention the “future implementation” of the UK Online Safety Act? Because the passage of the UK Online Safety Act is just the FIRST step in a long process. Ofcom still has to figure out how to implement the Act.
Ofcom started to work on this on November 9, but it’s going to take many months to finalize—I mean finalise things. This is the UK Online Safety Act, after all.
This is the first of four major consultations that Ofcom, as regulator of the new Online Safety Act, will publish as part of our work to establish the new regulations over the next 18 months.
It focuses on our proposals for how internet services that enable the sharing of user-generated content (‘user-to-user services’) and search services should approach their new duties relating to illegal content.
On November 9 Ofcom published a slew of summary and detailed documents. Here’s a brief excerpt from the overview.
Mae’r ddogfen hon yn rhoi crynodeb lefel uchel o bob pennod o’n hymgynghoriad ar niwed anghyfreithlon i helpu rhanddeiliaid i ddarllen a defnyddio ein dogfen ymgynghori. Mae manylion llawn ein cynigion a’r sail resymegol sylfaenol, yn ogystal â chwestiynau ymgynghori manwl, wedi’u nodi yn y ddogfen lawn. Dyma’r cyntaf o nifer o ymgyngoriadau y byddwn yn eu cyhoeddi o dan y Ddeddf Diogelwch Ar-lein. Mae ein strategaeth a’n map rheoleiddio llawn ar gael ar ein gwefan.
Oops, I seem to have quoted from the Welsh version. Maybe you’ll have better luck reading the English version.
This document sets out a high-level summary of each chapter of our illegal harms consultation to help stakeholders navigate and engage with our consultation document. The full detail of our proposals and the underlying rationale, as well as detailed consultation questions, are set out in the full document. This is the first of several consultations we will be publishing under the Online Safety Act. Our full regulatory roadmap and strategy is available on our website.
And if you need help telling your firm’s UK Online Safety Act story, Bredemarket can help. (Unless the final content needs to be in Welsh.)