LMM vs. LMM (Acronyms Are Funner)

Do you recall my October 2023 post “LLM vs. LMM (Acronyms Are Fun)”?

It discussed both large language models and large multimodal models. In this case “multimodal” is used in a way that I normally DON’T use it, namely to refer to the different modes in which humans interact (text, images, sounds, videos). Of course, I gravitated to a discussion in which an image of a person’s face was one of the modes.

Document processing with GPT-4V. The model’s mistake is highlighted in red. From https://huyenchip.com/2023/10/10/multimodal.html.

In this post I will look at LMMs…and I will also look at LMMs. There’s a difference. And a ton of power when LMMs and LMMs work together for the common good.

Revisiting the Large Multimodal Model (LMM)

Since I wrote that piece last year, large multimodal models have continued to be discussed. Harry Guinness wrote a piece on them for Zapier just this March.

When Google announced its Gemini series of AI models, it made a big deal about how they were “natively multimodal.” Instead of having different modules tacked on to give the appearance of multimodality, they were apparently trained from the start to be able to handle text, images, audio, video, and more. 

Other AI models are starting to function in a TRULY multimodal way, rather than using separate models to handle the different modes.

So now that we know that LLMs are large multimodal models, we need to…

…um, wait a minute…

Introducing the Large Medical Model (LMM)

It turns out that the health people have a DIFFERENT definition of the acronym LMM. Rather than using it to refer to a large multimodal model, they refer to a large MEDICAL model.

One company using this definition is GenHealth.AI, and as you can probably guess, its model is trained for medical purposes.

Our first of a kind Large Medical Model or LMM for short is a type of machine learning model that is specifically designed for healthcare and medical purposes. It is trained on a large dataset of medical records, claims, and other healthcare information including ICD, CPT, RxNorm, Claim Approvals/Denials, price and cost information, etc.

I don’t think I’m stepping out on a limb if I state that medical records cannot be classified as “natural” language. So the GenHealth.AI model is trained specifically on those attributes found in medical records, and not on people hemming and hawing and asking what a Pekingese dog looks like.
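
To make the contrast concrete, here is a minimal sketch of the two kinds of training signal: the coded record a large MEDICAL model is trained on versus the natural-language sentence an ordinary LLM would see. The claim structure, field names, RxNorm placeholder, and dollar amount are all illustrative assumptions, not GenHealth.AI’s actual schema.

```python
# Hypothetical claim record built from medical codes, not natural language.
coded_claim = {
    "icd10": "E11.9",              # diagnosis: type 2 diabetes mellitus without complications
    "cpt": "99213",                # procedure: established-patient office visit
    "rxnorm": "<RXCUI>",           # placeholder for the prescribed drug's RxNorm identifier
    "claim_status": "approved",
    "allowed_amount_usd": 92.40,   # made-up price figure for illustration
}

# The same encounter as the free text an ordinary LLM would train on.
natural_language_note = (
    "Established patient seen in clinic for follow-up of type 2 diabetes; "
    "metformin refilled; claim was approved."
)

print(coded_claim)
print(natural_language_note)
```

Same encounter, two very different inputs: one is a bundle of ICD, CPT, and RxNorm attributes, the other is the hemming-and-hawing kind of text.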

But there is still more work to do.

What about the LMM that is also an LMM?

Unless I’m missing something, the Large Medical Model described above is designed to work with only one mode of data: text.

But what if the Large Medical Model were also a Large Multimodal Model?

By Piotr Bodzek, MD – uploaded from http://www.ginbytom.slam.katowice.pl/25.html with author permission, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=372117
  • Rather than converting a medical professional’s voice notes to text, the LMM-LMM would work directly with the voice data. This could lead to increased accuracy: compare the tone of voice of an offhand comment “This doesn’t look good” with the tone of voice of a shocked comment “This doesn’t look good.” They appear the same when reduced to text format, but the original voice data conveys significant differences.
  • Rather than just using the textual codes associated with an X-ray, the LMM-LMM would read the X-ray itself. If the image model has adequate training, it will again pick up subtleties in the X-ray data that are not present when the data is reduced to a single medical code.
  • In short, the LMM-LMM (large medical model-large multimodal model) would accept ALL the medical outputs: text, voice, image, video, biometric readings, and everything else. And the LMM-LMM would deal with all of it natively, increasing the speed and accuracy of healthcare by removing the need to convert everything to textual codes. (A rough sketch of this idea follows this list.)
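
What might that interface look like? Below is a minimal sketch, in Python, of one way an LMM-LMM could accept an encounter’s data in its native forms. Everything here is hypothetical: the MedicalEncounter structure, the MultimodalMedicalModel class, its predict() method, and the file names are invented for illustration, not any real product’s API.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class MedicalEncounter:
    """One patient encounter, carrying each modality in its native form."""
    clinician_note: str                                  # free text (today's lone input)
    voice_memo: Optional[Path] = None                    # raw audio, tone of voice preserved
    imaging: list = field(default_factory=list)          # X-rays, scans, etc., as image files
    biometrics: dict = field(default_factory=dict)       # e.g. heart rate readings


class MultimodalMedicalModel:
    """Stand-in for a model trained on all modalities at once (not a real model)."""

    def predict(self, encounter: MedicalEncounter) -> dict:
        # A real model would fuse the modalities here; this stub just
        # reports which ones it received.
        modalities = ["text"]
        if encounter.voice_memo:
            modalities.append("audio")
        if encounter.imaging:
            modalities.append("imaging")
        if encounter.biometrics:
            modalities.append("biometrics")
        return {"modalities_used": modalities, "assessment": "not a real inference"}


encounter = MedicalEncounter(
    clinician_note="This doesn't look good.",
    voice_memo=Path("rounds_memo.wav"),        # fictional file name
    imaging=[Path("chest_xray_pa.png")],       # fictional file name
    biometrics={"heart_rate_bpm": 112.0},
)

print(MultimodalMedicalModel().predict(encounter))
```

The point of the design is that the voice memo and the X-ray travel to the model as they are; nothing is flattened into a transcript or a billing code before inference.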

A tall order, but imagine how healthcare would be revolutionized if you didn’t have to convert everything into text format to get things done. And if you could use the actual image, video, audio, or other data rather than someone’s textual summation of it.

Obviously you’d need a ton of training data to develop an LMM-LMM that could perform all these tasks. And you’d have to obtain that training data in a way that conforms to privacy requirements: in this case, protected health information (PHI) rules such as those imposed by HIPAA.

But if someone successfully pulls this off, the benefits would be enormous.

You’ve come a long way, baby.

Robert Young (“Marcus Welby”) and Jane Wyatt (“Margaret Anderson” on a different show). By ABC Television, uploaded by We hope at en.wikipedia (eBay item photo), transferred from en.wikipedia by SreeBot, Public Domain, https://commons.wikimedia.org/w/index.php?curid=16472486.
