How to Build a Next-Gen LMM Application Using the Human Mind as a Model
AI multi-agent architecture inspired by System 2 human cognition
Hey folks, in this week’s newsletter, I’m discussing my own first interactions with GPT-4V(ision), and how we might start developing next-gen LMM apps.
Change is coming. Rapidly.
With the advancement of AI capabilities and the transition from LLMs to LMMs, the question isn't if our professional landscape will be altered, but when and how.
I recently encountered the wonders of GPT-4V(ision) and its uncanny accuracy—truly a glimpse into the future.
Plus, I'll shed light on emerging trends in AI multi-agent architecture inspired by System 2 human cognition (Kahneman), highlighting intriguing developments like Microsoft’s AutoGen framework.
As we stand on the brink of AI evolution, what do you envision for the future?
Let’s dive in!
This newsletter is sponsored by What Works Digital

I recently started offering Generative AI Briefings, Training and Coaching for executives and those needing to build a Generative AI strategy for their business.
If that’s of interest to you, head on over to my website and take a look at my services today. Availability is tight, but I still have a few slots left.
“AI won’t put you out of business, but a competitor using AI might”
I have an impending sense of dread that white-collar, knowledge-worker office jobs are going to be gutted by AI in the final quarter of this decade, that is, from 2027 onwards.
I know it’s a bold, not to mention scary, statement to make, but as I’ve written in the past, AI is evolving fast, and its capabilities are growing exponentially each year.
Of course, I could be wrong.
2027 is still some way off, and nobody can accurately predict the impact of AI in roughly five years’ time.
However, what we definitely do know is that the AI we are using today is the worst it’s ever going to be, and it’s already pretty good.
“…the AI we are using today is the worst it’s ever going to be, and it’s already pretty good.”
The naysayers will point to the fact that after every major technological innovation, society has bounced back and created new jobs never conceived of beforehand.
And that’s true. So far …
But we’ve never created an ‘intelligence engine’ before; until now, the only way to produce more intelligence has been to raise the human birth rate.
Non-human intelligence is soon going to become a commodity, like the corn, soya, or pork bellies traded on commodity exchanges.
And soon, it might perform even better than human intelligence. It already does for some tasks.
Even if we don’t get to AGI, which OpenAI’s leadership has claimed is likely “within the next decade”, we are still going to have extremely powerful AI.
Take, for example, GPT-4V(ision), which I received access to just this morning, as I was writing this week’s newsletter.
I took a quick snap of my dog, blissed out after her morning walk, lying in her bed next to my desk and pasted it into the ChatGPT prompt to test it out.
“In the photo, there's a white, curly-haired dog lying on its back in a soft, plush dog bed. The dog appears relaxed and has its tongue slightly sticking out. Beside the dog, within the bed, there's a purple spiky ball, likely a toy. The dog bed is placed on a light-colored wooden floor. The overall scene suggests a comfortable, restful moment for the dog.”
—ChatGPT Vision
The level of accuracy is phenomenal!
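If you want to experiment with this capability programmatically rather than through the ChatGPT interface, here’s a minimal sketch in Python. A caveat: at the time of writing, GPT-4V is only available inside ChatGPT, so the model name ("gpt-4-vision-preview") and the image-payload shape below are assumptions about how OpenAI exposes vision through its chat completions API; check the current docs before relying on them.

```python
import base64

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode a local photo as a base64 data URL so it can be sent inline.
with open("dog.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model name; verify against current docs
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what you see in this photo."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```

Sending the photo as a base64 data URL keeps the example self-contained; a hosted image URL would work just as well.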
Of course, I then proceeded to upload half a dozen more photos, asking ChatGPT to explain why each image was funny.
And it succeeded.
Every. Single. Time.
I wrote about GPT-4V(ision) industry use cases last week, but until you have tried it out on your own personal photos, you don’t get the full sense of what this technology can do.
And its capabilities are staggering.
This should be front-page news the world over, just as ChatGPT was last year.
Building Next-Gen LMM Apps
Which leads me on to other recent developments.
Whilst Large Multimodal Models (LMMs) like GPT-4V that can ‘see’ will bring in a raft of new use cases for AI, there are other things we can do to improve the next generation of LLM/LMM-based apps.
LMMs like the GPT-4V model underpinning ChatGPT have already advanced hugely compared to models from a year ago (remember how amazing the first version of ChatGPT seemed!), but some AI experts, such as Professor Gary Marcus, believe they are an evolutionary dead end (the trilobite of the AI era) that will never get us to AGI.
Prof. Marcus has argued that LLMs have too many flaws: they are not grounded in facts, and their ‘hallucinations’ make them untrustworthy in many situations.
Whilst part of what he argues is certainly true, many of his criticisms over the years have been overcome as more sophisticated models have been released.
I’m confident AI researchers will continue to find ways to suppress hallucinations, if not fully, then at least to an acceptable operational level, in the not-too-distant future.
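To make “acceptable operational level” concrete, one common mitigation pattern is to ground answers in supplied source material and add a verification pass before anything reaches the user. Here’s a minimal sketch; `call_llm` is a hypothetical helper standing in for whichever model API you use, not a real library function.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to your LLM of choice
    and return the text of its reply."""
    raise NotImplementedError  # wire up to your model API of choice

def grounded_answer(question: str, source_text: str) -> str:
    # First pass: draft an answer that may only use the supplied source.
    draft = call_llm(
        "Answer the question using ONLY the source below.\n"
        f"Source:\n{source_text}\n\nQuestion: {question}"
    )
    # Second pass: ask the model to verify its own draft against the source.
    verdict = call_llm(
        f"Source:\n{source_text}\n\nDraft answer:\n{draft}\n\n"
        "Is every claim in the draft supported by the source? "
        "Reply with the single word SUPPORTED or UNSUPPORTED."
    )
    if "UNSUPPORTED" in verdict.upper():
        return "I couldn't find a well-supported answer in the source material."
    return draft
```

It isn’t bulletproof, since the verifier can itself be wrong, but a cheap second pass like this tends to catch many unsupported claims.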
The human mind as a model for AI
When building next-gen LMM apps, there are also parallels with the way the human mind works.
For example, I’ve mentioned in previous newsletters the book "Thinking, Fast and Slow" (Amazon affiliate link) by psychologist Daniel Kahneman, in which he presents his research to help us understand, in an abstract rather than physical sense, what really goes on inside our heads.
The book introduces two mental systems, simply named System 1 and System 2, which drive our thoughts and decision-making processes.
Here’s a quick recap: System 1 is the fast, automatic, intuitive mode that handles snap judgements with little effort, while System 2 is the slow, deliberate, effortful mode we engage for careful reasoning.
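To show how that maps onto the multi-agent architectures that frameworks like Microsoft’s AutoGen orchestrate, here’s a minimal, framework-free sketch: a fast, cheap model plays System 1 and answers directly when it’s confident, and a slower, more capable model is engaged as System 2 only when the first pass flags the question as hard. As before, `call_llm` and the model names are hypothetical placeholders.

```python
def call_llm(prompt: str, model: str) -> str:
    """Hypothetical helper: send a prompt to the named model
    and return the text of its reply."""
    raise NotImplementedError  # wire up to your model API of choice

def answer(question: str) -> str:
    # System 1: a fast, cheap model takes the first pass and rates
    # its own confidence, mirroring intuitive, low-effort thinking.
    first_pass = call_llm(
        "Answer briefly, then on a new line write CONFIDENCE: HIGH or CONFIDENCE: LOW.\n\n"
        + question,
        model="fast-cheap-model",  # placeholder model name
    )
    if "CONFIDENCE: HIGH" in first_pass:
        return first_pass.split("CONFIDENCE:")[0].strip()

    # System 2: escalate hard questions to a slower, more capable model
    # for deliberate, step-by-step reasoning.
    return call_llm(
        "Think through this step by step, then give a final answer.\n\n" + question,
        model="slow-capable-model",  # placeholder model name
    )
```

The design choice mirrors Kahneman’s point: most inputs can be handled quickly and cheaply, and you only pay for deliberate reasoning when it’s actually needed.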