Today, I’d like to talk about the beauty of artificial intelligence (AI) and the ability to use AI-generated images. Some people are branding these as deepfakes, but other companies are using them to great effect to communicate around the world at low cost and high quality, even with famous stars like David Beckham. I got to thinking about this because I did a shoot with Marcus Ahmad here in Bristol. When we got there, there was a moment where everyone was worried about whether they should be wearing masks. Had it been just him and me, we could perhaps have worked around that, but an event or a shoot with multiple people would have been impossible, because social distancing wouldn’t have allowed us all to be in the studio together.
COVID has created a problem that I think AI is addressing. I’ve spent some time trying to solve a problem of my own: what if I want to show the faces behind products and clients, but I can’t actually send a film crew? I’m a big fan of video, and I’m a big fan of having people on camera, but we’ve all used up our archive photography. We’re getting out slowly but surely, yet social distancing makes it very difficult to do group shots or press briefings.
Why WPP uses Synthesia
WPP, the multinational agency founded by Sir Martin Sorrell, turned to Synthesia, a London company founded in 2017 by researchers and entrepreneurs from UCL, Stanford, and Cambridge. WPP has created corporate training videos for its tens of thousands of employees worldwide, but instead of using presenters speaking each individual language, it used Synthesia to create an AI-based presenter. What’s interesting is that this isn’t CGI: the presenters aren’t built from computer graphics. Instead, pictures of real people’s faces are uploaded to the Synthesia platform and then animated. WPP Chief Technology Officer Stephan Pretorius, quoted in WIRED, said, "With Synthesia, we can have avatars that are diverse and speak your name and your agency and in your language, and the whole thing can cost $100,000." The internal training program covers English, Spanish, and Mandarin, and the aim was to send 20 five-minute modules to 50,000 employees for less than the price of a movie ticket each: about $2 per head.
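As a rough check on those numbers, the Python snippet below works through the per-head arithmetic. It assumes, as the quote implies but doesn’t state outright, that the $100,000 figure covers all 20 modules for all 50,000 employees; the variable names are purely illustrative.

```python
# Figures quoted above for WPP's Synthesia training rollout.
# Assumption: the $100,000 covers every module for every employee.
total_cost_usd = 100_000
employees = 50_000
modules = 20
minutes_per_module = 5

cost_per_employee = total_cost_usd / employees           # whole programme, per head
cost_per_module_per_employee = cost_per_employee / modules
minutes_per_employee = modules * minutes_per_module      # total training time per head
cost_per_minute_watched = cost_per_employee / minutes_per_employee

print(f"Per employee: ${cost_per_employee:.2f}")                        # Per employee: $2.00
print(f"Per module, per employee: ${cost_per_module_per_employee:.2f}") # Per module, per employee: $0.10
print(f"Per training minute, per employee: ${cost_per_minute_watched:.2f}")  # Per training minute, per employee: $0.02
```

On that reading, each employee gets 100 minutes of localized training for roughly two cents a minute.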
On Synthesia, you upload a picture of the person you’d like to be your spokesperson, along with the script, and the platform turns the face into an AI avatar. The quality of the avatar is amazing, both in the facial expressions and in the lip synchronization. It isn’t 100% accurate, but it’s very close to the real thing. Since most viewers pay more attention to the audio than to the lip sync, this creates an amazing opportunity to turn people, in effect, into avatars that speak to an audience. That may not sound so exciting in itself, but the avatar can speak in up to 34 languages. You can have one person or several people say the same script, with the lip movements and facial expressions changing to match the language being spoken.
If you watch any of the videos on their website, you won’t feel as though you’re watching a motionless avatar; it really looks as though someone is speaking to you. Part of the appeal is the technology, but part is practicality: it’s challenging to work with a whole film crew these days, partly because people can’t necessarily travel. They may not all be allowed in the same room, and the spokesperson may need to wear a mask. So the technology is coming to the fore because COVID has limited our freedom to interact with each other.
Some Chinese apps already let users try on makeup using an avatar: you can paint lipstick on it and choose different skin, eye, and hair colors, and so on. Synthesia goes a step further by animating the face. As Victor Riparbelli, the CEO and co-founder of Synthesia, puts it, "We’re saying let’s remove the camera from the equation." There are now many occasions when communications programs can’t go ahead simply because camera crews aren’t allowed near the talent, so this is an opportunity to keep communicating and get around the restrictions being imposed on people.
Narrated video presentations can also be built from slide decks: you present and narrate over a PowerPoint or a screen recording. Synthesia lets you do the same thing with an avatar. You can embed your avatar in an infographic, a slide presentation, or an infomercial, which makes it great for training.
How Malaria No More got David Beckham to speak in 9 languages
Last year, David Beckham appeared in a public service announcement about malaria. Beckham probably isn’t known for his linguistic ability, but in the video he speaks nine languages, including Hindi, Arabic, and Kinyarwanda (spoken in Rwanda). Millions of people who would otherwise have needed subtitles or a voiceover could understand him directly. A representative from Malaria No More said the Synthesia videos exceeded their expectations: they were able to harness immersive AI video to break new ground and, by speaking to people in their own languages, reach millions more people globally. Synthesia is also working with Reuters on an automated, presenter-led video report that uses match data to autogenerate a news bulletin, which in turn supplies the script for the AI avatar. In other words, once the data has come in from the pitch, the process is AI from end to end. Why bother? According to a widely cited statistic, viewers retain 95% of a message when they watch it in a video, compared with 10% when they read it as text. That’s why airline safety announcements are videos rather than a booklet.
Synthesia ran a test with Realeyes, a platform that uses computer vision to track viewers’ attention and emotion while they watch videos. They found that viewers connected more with AI-generated content presented in their own language than with conventionally dubbed content. In other words, an AI avatar speaking the local language got a better recall rate than a real person with a dubbed-over voice. According to Synthesia, there was a 175% increase, which is more than two and a half times the impact. They also found a marked difference between younger and older viewers: the young had no reservations about what they were watching and were almost entirely convinced by the avatars, whereas older viewers were a little more cynical. Synthesia’s research also suggests that video with AI content can increase engagement by up to 1,200%, which is pretty massive. If you think about David Beckham speaking Kinyarwanda to the people of Rwanda, that’s a pretty astonishing event. No wonder people found it memorable.
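Percentage increases are easy to misread, so here is a small Python sketch that converts them into multiples of a baseline. It assumes each figure is a relative increase over the comparison video, which the source doesn’t spell out, and it only does the arithmetic; it doesn’t reproduce Synthesia’s or Realeyes’ methodology.

```python
def multiple_of_baseline(percent_increase: float) -> float:
    """Return how many times the baseline a given percentage increase represents.

    A 100% increase doubles the baseline, so the multiple is 1 + pct / 100.
    """
    return 1 + percent_increase / 100


# 100% is a reference point; 175% and 1,200% are the figures quoted above.
for pct in (100, 175, 1200):
    print(f"{pct}% increase -> {multiple_of_baseline(pct):.2f}x the baseline")
# 100% increase -> 2.00x the baseline
# 175% increase -> 2.75x the baseline
# 1200% increase -> 13.00x the baseline
```

If the 175% figure instead meant engagement reached 175% *of* the baseline, the multiple would be 1.75x, so it’s worth checking which reading a vendor intends.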
Synthesia for videos and Rosebud for images
Now, not everyone wants to make animated videos all the time. Luckily, there’s a company called Rosebud that makes glossy images for e-commerce. You can either upload your own photograph or use one of theirs: there are some 25,000 model photos of people who never existed, which is 25,000 unique faces. You can also swap different elements, and what makes it better is that the changes are seamless. If you change the hair, it changes the light on the forehead; if you change the eyes, it adjusts the position of the nose, and so on. They’ve even recently launched a service that lets you put clothes on a mannequin.
Young designers who design clothes on a computer can simply use this mannequin service to dress a virtual mannequin in their outfits. Imagine the impact on product launches and production. You would only make products to order, so you wouldn’t hold any stock. You’d just be designing, without any real models coming into a studio to try things on, and if people bought the designs, you could then make them. That’s an amazing reduction in manufacturing lead times. And from an environmental point of view, it means you’re not making things that people don’t need.
Rosebud has a self-service app that needs no photoshoot; you just drag and drop the images, and it costs $19.95 a month. Imagine how much money you’d save by not having to do photoshoots. With virtual fashion models, complete with full faces and full-body poses, you can run a campaign for $250. In Singapore, you wouldn’t get a crew out for that amount of money. They also offer what they call a white-glove service at $1,000 per campaign, with a one-week turnaround, and you can ask them to help with your demographic targeting. The results are astounding, especially considering that most people now watch product launches, PR events, and product displays online and on their phones.
These AI tools from Synthesia and Rosebud really cater to a company’s needs. Lisha Li, the CEO of Rosebud, says that the company can help small brands with limited resources produce more powerful portfolios of images, and that’s absolutely true. Algorithms can now help you build your portfolio without the expense and delay of hiring models. The agency CAA has even signed Lil Miquela, an AI-created avatar who is now an Instagram influencer with 2 million followers, so you get the idea. AI is here, it’s very powerful, and it’s something everyone can use for good.
Cover Photo from Synthesia