It appears as though the times of me laughing at the misfortune of AI's attempts to generate images and text have come to an end. From here on out, they'll be laughing at me, specifically at pictures of me. Why pictures? Because, well, AI can see now. They have eyeballs, AI-balls if you will. With the latest release of GPT-4’s Vision mode, it can look inside pictures and tell you exactly what’s happening.
We've seen examples of people using GPT-4 Vision to create everything from AI Football commentators, who, by simply looking at a frame of video every so often, can predict and narrate what is happening and what’s most likely about to happen. There’s even an AI David Attenborough that can narrate your life in real-time. It's good, really good. It’s a superpower, honestly.
So, what did we decide to do with this new superpower? Make it watch Hallmark Christmas Movies. Obviously. I mean, if an AI can capture the play-by-play of what's happening inside a Hallmark Movie, then it can understand pretty much anything about humans. This is fine; everything is fine.
We fed intermittent frames from Hallmark Christmas Movies to GPT-4 Vision, and it sent back transcripts of what it saw. We then take the transcribed text and convert it into audible speech with the OpenAI Text to Speech API. We end up with an MP3 containing the commentary of what the AI saw, which is then synced back up to the original movie for a play-by-play explanation of what is happening in the movie.
You can get access to this new feature inside of ChatGPT Plus, but if you want to have a little more fun and have GPT-4 Vision narrate any short video, say a commercial, then just grab your OpenAI Key and head over to this experimental Colab notebook I set up. You can even modify the tone in which it writes the transcripts. The current tone is Mitch Hedberg <3.
As marketers, we can use it for even more powerful purposes. Detecting sentiment and brand perception inside a video is giving us insight only once dreamt about. Did someone just post a TikTok about your Christmas sausage company? Was it good, bad, ugly? Well, in a matter of seconds, GPT-4 Vision can look inside and tell us how we were perceived. Better yet, it can generate a clever and contextual draft for a comment that I can use to reply. Even better yet, it can write a positive and insightful blog around that TikTok to highlight our brand's presence in the video and explain why others should care.
While the idea of AI dissecting Hallmark movies is as amusing as licking a frozen metal pole, the real potential of GPT-4 Vision is no joke. So, whether you're cuddling up for a Hallmark Christmas movie marathon or pondering the endless potential of AI, remember: GPT-4 Vision is the gift that's just started giving.