Roatan is an incredible place for scuba diving, with some of the best coral reefs in the Western Hemisphere. The reef terrain is complex and highly three-dimensional, with deep canyons, caverns, and swim-through passages, and there is a wealth of both marine life and coral species. Here’s what it was like:
Coincidentally, I got access to DALL-E 2 while I was there last week. I started wondering: could I use DALL-E 2 to create a fake vacation? Or, more ethically, could I use it to recreate actual events from my vacation that I couldn’t get a good photo of? I started using the interface to generate some synthetic vacation photos, and indeed a good portion of them were good enough to pass for the real thing.
I decided to test the quality of these synthetic images by mixing them in a Facebook post with real images. The post started like this:
Users scrolled through the photos one by one, in a slide reel on the web interface or in a column on the mobile interface. There were 22 real photos, four synthetic ones, and a final image that revealed the experiment. The synthetic ones all came after the real ones.
A staggering 83% of respondents (19 out of 23) ultimately missed the fact that there was something different about the DALL-E images. This worked despite the DALL-E watermark in the lower right corner (included to comply with OpenAI’s access rules) and the fact that the synthetic images were grainier and had a different aspect ratio than the real ones. My group of friends is also a relatively sophisticated audience, well aware of the existence of DALL-E and deepfakes.
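As a trivial sanity check on that figure, here is a sketch that recomputes the miss rate from the numbers stated above (the counts come straight from the post; nothing else is assumed):

```python
# Numbers reported in the post.
respondents = 23   # friends who answered the survey
missed = 19        # respondents who didn't notice the DALL-E images

miss_rate = missed / respondents
print(f"{miss_rate:.0%}")  # → 83%
```

Rounded to the nearest percent, 19/23 does indeed come out to the 83% quoted above.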
Here’s what I think is happening:
- Our brains are partly on autopilot when we browse social media, and examining images closely requires extra mental work. A great blog post I read a while ago pointed out that text generated by GPT-3 already passes the Turing test for readers who are skimming and not paying much attention. I believe something similar is happening here: we flip through image galleries at a rate of one every couple of seconds, and even with close attention it is genuinely hard to spot the subtle marks of deepfakery in that time.
- With a harder version of this Turing test, where real and fake images of the same content are presented side by side and people are told that one of them is fake, the fakes would likely be much easier to detect.
- A Facebook post shared between friends is a low-stakes, high-trust environment, so people aren’t in a headspace to scrutinize the images. If you look closely at some of the DALL-E images, you can spot obvious telltale signs of fakery: there are no dive hoses in the fake diver photo, and the fins blend together. (I instructed the model to use pink fins to match the ones Helen wears in real life, and was glad that worked well.)
- As one Facebook commenter pointed out, underwater environments are alien enough that most people don’t have good subconscious priors for detecting fakes there. That said, many avid divers were fooled too.
- We are not quite prepared to detect deepfakes yet, but this will come over time. I remember seeing The Fifth Element (1997) in the theater as a child and being mesmerized by the CGI of Future Manhattan, but now it looks just like something a college student would do in an introductory computer animation class. The difference is that our brains are now trained to detect the telltale signs of computer animation. However, deepfakes will improve over time, just like computer animation.
- That said, DALL-E 2 is really good, good enough to produce usable stock images for a wide range of applications. It still struggles with faces, and with complex scenes where the prompt attaches several adjectives to different nouns, but with some human curation it can produce photorealistic images or stylized art that is usable most of the time. These networks appear to be doing a variant of “guessing the teacher’s password” rather than really understanding what they are drawing, and over time these failures could be addressed with more 3D training data, so that the networks start learning the actual shape and functional connectivity of objects instead of what a range of 2D views of them look like.
While I am concerned about the effect of deepfakes on, say, politics, the last few years in America have shown that you don’t need deepfakes to fool people; all you need is a blatant lie that people want to believe, and if you repeat it enough you can convince millions of your lie despite overwhelming logic and evidence to the contrary.
Meanwhile, the frontier for expanding people’s visual creative abilities is wide open and I’m excited to see what the next few years will bring. I think generative imaging systems won’t make artists obsolete any more than Photoshop did; they are simply another tool for amplifying the connection between imagination and a physical manifestation of it, and imagination is still the rare commodity that good artists possess.