OpenAI and Google Brain are working on new text-to-image AI models, and they look insanely powerful. Check it out:
These models transform a sentence into an image. Give them a prompt like:
A transparent sculpture of a duck made out of glass. The sculpture is in front of a painting of a landscape.
It's a bit intense, right? There are many more examples showcasing how powerful these models are, especially at generating photo-realistic images, drawings, and oil paintings.
If you want more examples, here's a quick promo video by OpenAI that shows what their model, DALL-E 2, is capable of. The one by Google Brain looks at least as powerful.
For those of you keen on understanding what's going on behind the scenes, here are some neat videos, the model websites, and the research papers.
Of course, access to these models isn't public yet, and there's always the possibility that they cherry-picked perfect examples that overhype the capabilities. Still, my personal experience playing with OpenAI's other models and following progress in the research papers suggests otherwise. I'm constantly impressed by the speed at which AI innovations are advancing.
How could these models be put to good use? That's a question I've been asking myself a lot recently, and the answer isn't obvious. I get how other machine learning models can be very useful.
For example, image classification and image segmentation models can be used to detect defects in solar panels, as shown in another post on Development Hackers.
But text to image models?
Maybe the first step in doing good is not doing evil. Unfortunately, these text-to-image models make it much easier to produce fake images, and potentially fake news.
Because these models are trained on web-scraped images and data, they're also very susceptible to biases and social stereotypes. For example, asking one of these models to show you photos of criminals could surface racist stereotypes absorbed from the internet...
There's a lot of research on the subject if you want to look into it. Also a lot of drama. It's very interesting.
Furthermore, the datasets used to train these models also include some amount of pornographic imagery... which can lead to unexpected outcomes.
Are there ways to use these models for positive impacts?
Maybe they could be used to illustrate concepts very easily.
For example, I've used the language learning app Duolingo a lot. It's great, but it's always making you say weird sentences like:
_"Las tortugas comen arroz"_ --> "The turtles eat rice"
Auto-generating an illustration of turtles eating rice would help the words stick in my memory and speed up learning. I think e-learning apps could really benefit from image generation.
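To make the idea concrete, here's a minimal sketch of how an e-learning app might turn a vocabulary sentence into a text-to-image prompt. Everything here is hypothetical: the helper function, the illustration style, and the commented-out API call are my own assumptions, since none of these models are publicly accessible yet.

```python
# Build an image-generation prompt from a Duolingo-style sentence.
# The function name and the style string are illustrative choices, not
# part of any real app or API.

def illustration_prompt(sentence_en: str,
                        style: str = "colorful children's book illustration") -> str:
    """Turn an English vocabulary sentence into a text-to-image prompt."""
    # Drop the trailing period and lowercase so the sentence slots
    # naturally into the prompt template.
    return f"A {style} of {sentence_en.rstrip('.').lower()}"

prompt = illustration_prompt("The turtles eat rice")
print(prompt)  # -> A colorful children's book illustration of the turtles eat rice

# Hypothetical call to some text-to-image service (pseudocode, no such
# public endpoint exists at the time of writing):
# image = text_to_image_api.generate(prompt=prompt, size="512x512")
```

The app would then cache the generated image alongside the sentence, so each vocabulary card gets its illustration only once.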
Or maybe it could help spread a message. A climate change blog could ask for a "Dramatic painting of a koala fleeing a burning forest in Australia" as the banner of an article on forest fires.
I'm improvising here. I'd love to hear your thoughts on positive or negative impacts of text-to-image AI models.
I think there's no reason to let only Silicon Valley techies use these new technologies to optimise ads.
- What do you think?
- Can we use this for something really useful?