Because AI is often used as an umbrella term (AI > machine learning > deep learning > generative deep learning > generative adversarial networks), the domain of generative AI is a bit of a jungle. This article tries to shed some light on a few key concepts and contemporary evolutions in the field of generative AI, using inspiring best practices to illustrate them.
Since the invention of GANs (Generative Adversarial Networks) by Ian Goodfellow and colleagues in 2014, the possibilities for creating AI-generated (synthetic) media have grown enormously. New ways to make, manipulate and transform imagery appear constantly, at growing levels of sophistication. GANs function by letting two networks, a Generator and a Discriminator, compete against each other. The task of the Generator is to generate fakes, based on a given dataset, to try to fool the Discriminator. This second network’s objective is to spot the fakes and distinguish them from the real images of the provided dataset. In effect, the two networks train each other to get better at their tasks, resulting in better output. Deep fakes, as shown in the examples below, are a well-known application making use of the GAN architecture.
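To make the Generator-versus-Discriminator game a bit more concrete, here is a minimal sketch in plain Python. It is a toy, not the way deep fakes are actually built: the "images" are single numbers drawn around 4.0, the Generator is just a line (a * z + b), and the Discriminator is a single logistic unit. All names, the target value 4.0 and the learning rate are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def generate(z, g):
    """Generator: turn a noise value z into a fake sample."""
    a, b = g
    return a * z + b


def discriminate(x, d):
    """Discriminator: estimated probability that sample x is real."""
    w, c = d
    return sigmoid(w * x + c)


def d_loss(real, fake, d):
    """Low when real samples score near 1 and fakes score near 0."""
    loss = -sum(math.log(discriminate(x, d)) for x in real)
    loss -= sum(math.log(1.0 - discriminate(x, d)) for x in fake)
    return loss / (len(real) + len(fake))


def g_loss(noise, g, d):
    """Low when the Discriminator mistakes the fakes for real samples."""
    return -sum(math.log(discriminate(generate(z, g), d)) for z in noise) / len(noise)


def d_step(real, fake, d, lr):
    """One gradient-descent step on the Discriminator's loss."""
    w, c = d
    grad_w = grad_c = 0.0
    for x in real:  # gradient of -log(sigmoid(t)) w.r.t. t is sigmoid(t) - 1
        err = discriminate(x, d) - 1.0
        grad_w += err * x
        grad_c += err
    for x in fake:  # gradient of -log(1 - sigmoid(t)) w.r.t. t is sigmoid(t)
        err = discriminate(x, d)
        grad_w += err * x
        grad_c += err
    n = len(real) + len(fake)
    return (w - lr * grad_w / n, c - lr * grad_c / n)


def g_step(noise, g, d, lr):
    """One gradient-descent step on the Generator's loss (Discriminator frozen)."""
    a, b = g
    w, _ = d
    grad_a = grad_b = 0.0
    for z in noise:
        err = (discriminate(generate(z, g), d) - 1.0) * w
        grad_a += err * z
        grad_b += err
    n = len(noise)
    return (a - lr * grad_a / n, b - lr * grad_b / n)


def train(steps=200, batch=32, lr=0.05, seed=0):
    """Alternate Discriminator and Generator updates on toy 1-D data."""
    rng = random.Random(seed)
    g, d = (1.0, 0.0), (0.0, 0.0)
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]   # the "dataset"
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [generate(z, g) for z in noise]
        d = d_step(real, fake, d, lr)   # Discriminator learns to spot fakes
        g = g_step(noise, g, d, lr)     # Generator learns to fool it
    return g, d
```

Calling train() alternates the two updates described above: first the Discriminator gets a little better at separating real from fake, then the Generator adjusts itself to score higher with that Discriminator. Real GANs follow the same alternation, but with deep networks and batches of images, and they are known to be much harder to stabilise than this toy suggests.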
Highly recommended is the article ‘Design in the age of synthetic realities’ by Andy Polaine, as he focuses on innovative possibilities for creators and storytellers.
Jordan Peele, US, 2017
This deep fake video caught the public’s attention a few years ago, as it offers a chilling preview of what the future of fake news could look like. Peele’s production company made this through a combination of Adobe After Effects and the face-swapping tool FakeApp (Vincent, 2018). Needless to say, deep fake technology is feared by many, as it creates the possibility to generate ‘alternative facts’ in unprecedented ways.
You can read more about this video on The Verge.
Francesca Panetta & Halsey Burgund, US, 2019
This film installation takes an interesting spin on a key event in 20th century history: what if the Apollo 11 mission had gone wrong? It features a deep fake version of President Nixon (developed at MIT Center for Advanced Virtuality) who delivers the emergency speech that was prepared in case of disaster. The work, which premiered at IDFA DocLab in 2019, invites us to reflect on the ways new technologies can twist and transform the truth.
Listen to the Voices of VR podcast with the creators behind it.
Ctrl Shift Face, unknown, 2019
One YouTuber who does a remarkable job at creating deepfakes is Ctrl Shift Face. He uses the technology to create a humorous take on cinema history. In this example we see Jim Carrey taking the place of Jack Nicholson in one of his most iconic roles.
If you go to the YouTube channel of Ctrl Shift Face, you can browse through his already impressive deepfake portfolio. Recommendations are ‘Bill Hader channels Tom Cruise’ and ‘Terminator learns how to smile’.
Trey Parker, Matt Stone, Peter Serafinowicz, US, 2020
This weekly satire show by South Park creators Trey Parker & Matt Stone is the first recurring production that uses deep fakes as its core premise. For this first episode, it takes on the topic of deep fakes itself. The creators use the popular open-source algorithm DeepFaceLab to create deep fakes of Donald Trump, Al Gore, Jared Kushner and Mark Zuckerberg (Hao, 2020). This technology feels like the right fit in the hands of the makers of South Park in these turbulent times, against a backdrop of a contested US presidential election and a raging pandemic.
You can read more about this show on MIT Technology Review.
Terence Broad, UK, 2016
The objective of Terence Broad’s dissertation project for his research master’s in Creative Computing at Goldsmiths is to let artificial neural networks (autoencoders) reconstruct films by training them to rebuild every individual frame and resequence these frames (Broad, 2016). When the result was uploaded to Vimeo, Warner Bros. issued a DMCA takedown notice, treating it as a case of copyright infringement. Warner didn’t see the difference between the synthetic simulation and the real Blade Runner, a film about AI being indistinguishable from humans (Romano, 2016). An interesting example of life imitating art.
Vive Studios, KR, 2020
If you think we’re still years away from creating synthetic avatars of our deceased loved ones (such as in that Black Mirror episode ‘Be Right Back’), think again. This documentary, broadcast by the Korean network MBC, “reunites” a grieving mother with a virtual version of her deceased daughter Nayeon in VR. This virtual daughter was created using motion capture, photogrammetry, voice recognition and a basic AI for conversation skills (Hayden, 2020). One can only imagine the possibilities in the not-so-distant future, for example with the use of GANs for more realism.
You can find out more about the production process on Road to VR.
Another well-known application of generative AI, neural style transfer is the technique of recomposing one image in the style of another. Two inputs, a content image (e.g. a photo of a cup of tea) and a style image (e.g. a Van Gogh painting), are analyzed by a convolutional neural network. This network then creates an output image that mirrors the content of the content image, but in the style of the style image (Kogan, n.d.). The output image for our two examples would be: a cup of tea, but in the style of the Van Gogh painting. Neural style transfer was first demonstrated in the 2015 paper ‘A neural algorithm of artistic style’ by Gatys, Ecker, and Bethge (Kogan, n.d.).
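The two ingredients can be sketched as two loss terms, following the idea in Gatys, Ecker and Bethge: a content loss that compares feature maps position by position, and a style loss that compares Gram matrices (channel-to-channel correlations) and therefore ignores where things are in the image. In the paper the features come from a pretrained convolutional network; the hand-written numbers and the function names below are stand-ins for illustration, and the normalisation constants are left out.

```python
def gram_matrix(features):
    """features: list of channels, each a flat list of activations.

    Entry (i, j) is the dot product of channel i and channel j, which
    captures which features fire together, regardless of position.
    """
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in features]
            for ci in features]


def content_loss(generated, content):
    """Squared difference between feature maps, position by position."""
    return sum((g - c) ** 2
               for g_ch, c_ch in zip(generated, content)
               for g, c in zip(g_ch, c_ch))


def style_loss(generated, style):
    """Squared difference between the Gram matrices of the feature maps."""
    gram_g, gram_s = gram_matrix(generated), gram_matrix(style)
    return sum((x - y) ** 2
               for row_g, row_s in zip(gram_g, gram_s)
               for x, y in zip(row_g, row_s))


def total_loss(generated, content, style, alpha=1.0, beta=1e-3):
    """Weighted sum that the output image is optimised against.

    alpha and beta are illustrative weights, not values from the paper.
    """
    return alpha * content_loss(generated, content) + beta * style_loss(generated, style)


# Toy feature maps: 2 channels, 4 spatial positions each.
features = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
# The same activations with the spatial positions reversed.
shuffled = [[4.0, 3.0, 2.0, 1.0], [1.0, 0.0, 1.0, 0.0]]

print(style_loss(features, shuffled))    # Gram matrices match, so 0.0
print(content_loss(features, shuffled))  # positions differ, so 24.0
```

The last two lines show why the split works: rearranging the positions leaves the style term untouched while the content term changes, so an image can match one picture’s layout and another picture’s textures at the same time.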
When we take this technology a few steps further, and expand it through different media, it becomes really interesting. Gene Kogan (n.d.):
“One could imagine a program resynthesizing The Star Spangled Banner as heavy metal or bossa nova, or rewriting Harry Potter in the frantic tone of Edgar Allen Poe. Although so far it has only been demonstrated convincingly with images, there is much effort underway at developing it in the video, audio, and text domains. The capability is prized for its far-reaching applications, as well as the insights it could potentially provide into our perception of style.”
As we will see in the following cases, the efforts of applying style transfer in video are indeed promising, as Kogan points out. In these examples, style transfer is blended with other technologies, transforming the input imagery, at times pushing the technology to its limits. The fact that these cases are both music videos should come as no surprise, as this domain has often been a testing ground for experiments with new technologies.
Mike Burakoff & Hallie Cooper-Novack, US, 2017
Although style transfer feels a bit passé to some, this video makes it look fresh again (Vincent, 2017). Made with the custom-built software Glooby, ‘When You Die’ is beautiful, psychedelic, and ambitious. The music video captures a unique vision of life and death through a rollercoaster of visual metaphors. It has voyages through space, a hospital scene exploding into a mural of flowers, and a grinning wife melting into a canyon made of meat and faces. It’s a nightmare, but a gentle one, worth watching over and over again (Ryan, 2017).
Weirdcore, UK, 2018
Combining different technologies such as photogrammetry, point-clouds, ASCII and style transfer (Davies, 2018), this cutting-edge video by Weirdcore (who also did a fantastic job with the colour sequence in Brian Welsh’s Beats) looks like a virtual reality collapsing into a black hole. The buildings and streets take on unreal virtual appearances: moving, oscillating, and flickering with various textures, shapes, and colors. Weirdcore: “It’s style transfer using Transfusion.AI over a Cornish photogrammetry collage. The original animation actually looked … low-end, but using several style transfer composites/layers in various ways really made a difference.” (Pangburn, 2018).
Latent space is a fundamental concept of deep learning. Basically, it’s a representation of compressed data. Latent space representations are used to transform more complex forms of raw data (e.g. images, video) into simpler representations which are more convenient to process and analyze. Similar data points will tend to cluster together, so patterns in data can be more easily discovered. The interesting part is that we can interpolate data in the latent space, and use the model’s decoder to generate “new” data samples (Tiu, 2020). Simply put: we can discover “hidden” space between imagery, and generate new variations. A puppy and a space shuttle might seem like two very different things to us, but by exploring the space between the data points making up the latent space representations of the two, we can morph between a puppy and a space shuttle, and generate a range of blends. You can see it as otherwise “hidden” information becoming visible. If this is not clear yet, the following examples will illustrate this concept, and explore latent space in an interesting way. As you will probably notice, the concept is closely linked to our understanding of memory.
If you want to learn more about the concept of latent space, we recommend reading the article ‘Understanding Latent Space in Machine Learning’ by Ekin Tiu on Medium.
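The puppy-to-space-shuttle morph described above boils down to walking in a straight line between two latent points and decoding every stop along the way. The sketch below shows that walk in plain Python. The decoder here is a made-up stand-in (four fixed weights rows and a sigmoid), and the two latent points are invented numbers; in a real project both would come from a trained model.

```python
import math

# Hypothetical decoder weights: each row turns a 2-D latent point into one pixel.
DECODER_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 1.0]]


def decode(z):
    """Toy stand-in for a trained decoder: latent point -> 4 pixel values in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, z))))
            for row in DECODER_WEIGHTS]


def lerp(z_a, z_b, t):
    """Linear interpolation: t=0 gives z_a, t=1 gives z_b."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolate(z_a, z_b, steps):
    """Evenly spaced latent points from z_a to z_b (steps must be at least 2)."""
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]


# Invented latent codes standing in for "puppy" and "space shuttle".
z_puppy = [2.0, -1.0]
z_shuttle = [-2.0, 3.0]

# Decode every point on the path to get the in-between "images".
frames = [decode(z) for z in interpolate(z_puppy, z_shuttle, steps=5)]
for frame in frames:
    print([round(p, 2) for p in frame])
```

Played back in order, the decoded frames form the morph. The in-between frames are the “hidden” space the text refers to: they correspond to no image in the dataset, only to points the model can still decode.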
Memo Akten, TR, 2018
An excerpt from ‘Deep Meditations: A brief history of almost everything in 60 minutes’, a multichannel video installation, a journey told through the imagination of a deep artificial neural network. “What does love and faith look like? Or ritual? Worship? … Could we teach a machine about these very abstract, subjectively human concepts? As they have no clearly defined, objective visual representations, an artificial neural network is instead trained on our subjective experiences of them, specifically, on what the keepers of our collective consciousness think they look like, archived by our new overseers in the cloud. Hundreds of thousands of images were scraped (i.e. autonomously downloaded by a script) from flickr, tagged with these words to train the neural network. The images seen in the final work are generated from scratch from the fragments of memories in the depths of the neural network. The soundtrack is generated by another artificial neural network trained on hours of religious and spiritual chants, prayers and rituals, scraped from Youtube.” (The Mega Super Awesome Visuals Company, Memo Akten, 2018).
See the interpolation at 01:42 between a sunset and a flower? The result is an eerily surreal, painterly image. We recognize some elements, yet at the same time the output image looks completely alien. Discover more about the technical background of this project in Memo Akten’s article on Medium (Akten, 2018).
Refik Anadol Studio, US, 2019
This is not your brain melting; it’s an excerpt of Refik Anadol’s installation ‘Latent History’.
“Latent History is a time and space exploration into Stockholm’s past, and ultimately present, using the deployment of machine learning algorithms trained on datasets from both archival and contemporary photographs. Through the exploration of photographic memories from the past 150 years, this exhibition aims to investigate and re-imagine collective memory, hidden layers of history, and the consciousness of a city that otherwise might remain unseen. (…) At the microlevel, a digital photograph is not just a photograph, but a compilation of pixels that assemble a holistic image. This work endeavors to scale this relationship to all photographs taken of Stockholm. Would the resulting image accurately reflect the intricacies of a complex city whose character is dependent upon the interlocking relationships of the people, places, and memories that shape it? The advent of machine intelligence has allowed photography to create this historically impossible output. Able to synthesize millions of images, map their connections, and generate new understandings, we are now able to composite the discrete memories, experiences, and events of Stockholm into a new form.” (Refik Anadol Studio, 2019).
Mario Klingemann, DE, 2018
This interactive installation produces real-time digital portraits of viewers. It analyses biometric face markers and information on pose and hand movements. Then, it presents a painterly image based on everything it has previously seen. For Klingemann, audiences are an interesting source of data: they are inputs that lend unpredictability and risk. The installation is constantly learning, assimilating the data of everyone who looks into this unusual mirror. Each new portrait draws on the machine’s accumulated knowledge; each face it produces contains something of those who went before. It is a reflection of how the machine views its observer (Onkaos, 2019), using input from its “memory banks”.
AI Told Me, unknown, 2019
This timelapse of a neural network with the neurons switching off one by one is a haunting experiment in deep learning. A programmer (name unknown) generated a woman’s face using a GAN, and then made the network slowly “forget” what this face looked like, by gradually shutting off individual neurons. At first, it seems like the generated face is aging. Lines appear under her eyes and around the edges of her face, hair thins and fades. After a few seconds, her skin turns a greenish hue, and features begin to wash away. Within sixty seconds, the face is completely decomposed, leaving nothing but a white and brown smudge (Cole, 2019).
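How the unnamed programmer implemented this is not documented here, but the general idea of “shutting off neurons” can be sketched under one assumption: a switched-off neuron simply outputs zero. The toy network below (random weights, one hidden layer, all names invented for illustration) is fed the same input again and again while its hidden neurons are disabled one at a time, and every intermediate output is kept as a “frame”.

```python
import math
import random


def make_generator(n_in, n_hidden, n_out, seed=0):
    """Random weights for a tiny one-hidden-layer network (a toy, not a real GAN)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, w2


def forward(z, w1, w2, alive):
    """Run the network; hidden neurons whose flag in `alive` is False output 0."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, z))) if on else 0.0
              for row, on in zip(w1, alive)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def forget(z, w1, w2, seed=1):
    """Switch hidden neurons off one by one, in random order, keeping every output."""
    rng = random.Random(seed)
    alive = [True] * len(w1)
    order = list(range(len(w1)))
    rng.shuffle(order)
    frames = [forward(z, w1, w2, alive)]      # the intact "face"
    for idx in order:
        alive[idx] = False
        frames.append(forward(z, w1, w2, alive))
    return frames


def drift(frames):
    """Euclidean distance of every frame from the first, intact one."""
    first = frames[0]
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(f, first))) for f in frames]


w1, w2 = make_generator(n_in=3, n_hidden=8, n_out=4)
frames = forget([0.5, -0.2, 0.9], w1, w2)
print([round(d, 2) for d in drift(frames)])
```

The first frame is the untouched output and the last one, with every hidden neuron off, is all zeros: the toy equivalent of the smudge at the end of the video. The frames in between are what the timelapse shows.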
Can a machine (help to) generate new kinds of narratives, and bring innovative approaches in storytelling? The earliest attempts to theorize about algorithmic editing can be traced back to essays of Vertov and Eisenstein from 1929. These roots, situated in Soviet montage theory, were further developed through experimentation with schema in avant-garde cinema from the late 1960s and early 1970s. With the introduction of the computer, it became possible to create more complex editing schema. Database cinema, introduced by Lev Manovich in ‘The Language of New Media’ (2001), is a new media form that uses the computer’s ability to manipulate, analyze, organize and arrange multimedia data. It explores how the computer accesses its database, through algorithms (Enns, 2018). Manovich (who also coined the term ‘algorithmic editing’) asks the following question:
“How can our new abilities to store vast amounts of data, to automatically classify, index, link, search and instantly retrieve it, lead to new kinds of narratives?”
According to Manovich, the computer age brought with it a new cultural algorithm:
reality -> media -> data -> database (Enns, 2018)
In ‘A Brief History of Algorithmic Editing’, Clint Enns (2018) suggests expanding this cultural algorithm to:
reality -> media -> data -> database -> algorithmic editing -> new forms of narrative
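As a very small illustration of that chain, the sketch below treats a handful of clips as media, their durations and tags as data, a list as the database, and a single selection rule as the editing algorithm. The clip names, tags and the rule itself (match a theme, shortest clips first, stop when a time budget is full) are all invented for this example and are not taken from Manovich or Enns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    name: str
    duration: float      # seconds
    tags: frozenset      # descriptive metadata: the "data" about the media


# The "database": hypothetical clips with hand-written metadata.
DATABASE = [
    Clip("harbour_dawn", 4.0, frozenset({"water", "calm"})),
    Clip("street_rush", 3.0, frozenset({"city", "fast"})),
    Clip("river_ice", 5.0, frozenset({"water", "cold"})),
    Clip("rain_window", 2.0, frozenset({"water", "city"})),
]


def edit(database, theme, max_seconds):
    """A toy editing algorithm: query by tag, order by length, fill a time budget."""
    matches = sorted((c for c in database if theme in c.tags),
                     key=lambda c: c.duration)
    timeline, total = [], 0.0
    for clip in matches:
        if total + clip.duration > max_seconds:
            break
        timeline.append(clip)
        total += clip.duration
    return timeline


cut = edit(DATABASE, theme="water", max_seconds=10.0)
print([clip.name for clip in cut])  # ['rain_window', 'harbour_dawn']
```

Changing the rule (a different theme, a different ordering, a different budget) produces a different film from the same database, which is the sense in which the algorithm, rather than the footage alone, shapes the narrative.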
The following examples question in their own, unique way how algorithmic editing can lead to new forms of narration.
Matt Pearson, UK, 2013
“This video responds to Alex Rutterford, who hand-animated ‘Ganz Graf’. He claimed it was impossible for an algorithm to make a music video: “Everyone asks: how long did it take you? How did you do it?” (…) I’d love to say: “I just wrote a computer algorithm, and it did it all.” It doesn’t exist, it’s fool’s gold thinking (…) software can make intelligent decisions about pace and animation.” I hope this proves him wrong. ‘PRISMS’ is fully algorithmic. No cuts, just one continuous generative animation. All decisions (camera work, movements, …) are made by my system’s interpretation of the audio track. I created the system and then curated its output or: I just wrote a computer algorithm, and it did it all.” – Matt Pearson (in: 65daysofstatic, 2013).
Piotr Winiewicz, Dawid Górny, RNDR, DK & NL, 2019
Can a machine edit a documentary? That’s the question the installation ‘Reflector’ is trying to answer from both a technical and philosophical point of view (Dam, 2019). The autonomous AI engine Kaspar arranges and edits the film in continuous real time. Winiewicz and Górny are using their installation to question the potential role of machines in documentary and arthouse film production (Makropol, 2019). Johan Knattrup Jensen from Makropol, the XR company that developed Kaspar, hopes that it will show what makes a film personal: “How does it communicate emotion through rhythm, timing and pacing? I think that reflecting ourselves in an AI will make us more aware of what it means to be human.” (Dam, 2019).
As the field of generative AI is evolving at a fast pace, we can expect more inspiring cases to come. The truly interesting practices are those where the creative intelligence of the human counterpart is augmented by co-creating with generative AI. In that sense, AI stands for Augmented Intelligence (Philips, 2020). Or, in the words of Sayjel Vijay Patel (2020): “we must harness the true potential of AI: the sensitivity of designer intuition and the brute force of machine intelligence, combined.”
Here at AIDD, we keep an eye out.
65daysofstatic. (2013). 65daysofstatic – PRISMS (Official Video). Vimeo. Consulted on 03/10/2020 via https://vimeo.com/75299268
Akten, M. (2018). Deep meditations: a meaningful exploration of inner self, a controlled navigation of latent space. Medium. Consulted on 02/09/2020 via https://medium.com/@memoakten/deep-meditations-meaningful-exploration-of-ones-inner-self-576aab2f3894
Broad, T. (2016). Autoencoding Blade Runner. Medium. Consulted on 05/11/2019 via https://medium.com/@terencebroad/autoencoding-blade-runner-88941213abbe
Bye, K. (2019). #846 DocLab: Deep Fake of a Synthesized Nixon Speech that Never Happened | Voices of VR Podcast. Voices of VR. Consulted on 06/09/2020 via https://voicesofvr.com/846-doclab-deep-fake-of-a-synthesized-nixon-speech-that-never-happened
Cole, S. (2019). Watching AI Slowly Forget a Human Face Is Incredibly Creepy. VICE. Consulted on 24/09/2020 via https://www.vice.com/en/article/evym4m/ai-told-me-human-face-neural-networks
Ctrl Shift Face. (n.d.). Ctrl Shift Face. YouTube. Consulted on 08/09/2020 via https://www.youtube.com/channel/UCKpH0CKltc73e4wh0_pgL3g
Dam, F. (2019). How can a machine help us make art? Danish Film Institute. Consulted on 25/09/2020 via https://www.dfi.dk/en/english/news/how-can-machine-help-us-make-art
Davies, S. (2018). Decoding the bonkers new Aphex Twin video ‘T69 Collapse’. Dazed. Consulted on 21/09/2020 via https://www.dazeddigital.com/music/article/40944/1/decoding-the-bonkers-new-aphex-twin-video-t69-collapse
Enns, C. (2019). A Brief History of Algorithmic Editing – Jan Bot. Medium. Consulted on 03/10/2020 via https://medium.com/janbot/a-brief-history-of-algorithmic-editing-732c3e19884b
Ergopix.com. (2020). 23 – Mario Klingemann. Images.ch. Consulted on 18/09/2020 via https://www.images.ch/en/festival-en/program/artists/27-mario-klingemann-2/
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A Neural Algorithm of Artistic Style. arXiv.org. Consulted on 20/09/2020 via https://arxiv.org/abs/1508.06576
Goodfellow, I. J. (2014). Generative Adversarial Networks. arXiv.org. Consulted on 22/05/2020 via https://arxiv.org/abs/1406.2661
Hao, K. (2020). The creators of South Park have a new weekly deepfake satire show. MIT Technology Review. Consulted on 17/11/2020 via https://www.technologyreview.com/2020/10/28/1011336/ai-deepfake-satire-from-south-park-creators/
Hayden, S. (2020). Mother Meets Recreation of Her Deceased Child in VR. Road to VR. Consulted on 19/09/2020 via https://www.roadtovr.com/mother-meets-recreation-of-deceased-child-in-vr/
IMDB. (2013). “Black Mirror” Be Right Back (TV Episode 2013). IMDb. Consulted on 15/09/2020 via https://www.imdb.com/title/tt2290780/
Kogan, G. (n.d.). Style transfer. Machine Learning for Artists. Consulted on 20/09/2020 via https://ml4a.github.io/ml4a/style_transfer/
Makropol. (2019). REFLECTOR. Vimeo. Consulted on 25/09/2020 via https://vimeo.com/369546090
Manovich, L. (2001). The Language of New Media. MIT Press.
MIT News. (2020, July 20). Tackling the misinformation epidemic with “In Event of Moon Disaster”. MIT News | Massachusetts Institute of Technology. Consulted on 16/09/2020 via https://news.mit.edu/2020/mit-tackles-misinformation-in-event-of-moon-disaster-0720
Oberon Amsterdam. (2019). In Event of Moon Disaster. IDFA. Consulted on 06/09/2020 via https://www.idfa.nl/en/film/29cebac4-4364-49d6-bf7a-479d16c04c02/in-event-of-moon-disaster
Onkaos. (2019). Uncanny Mirror by Mario Klingemann. Vimeo. Consulted on 18/09/2020 via https://vimeo.com/336559940
Pangburn, D. J. (2018). How Aphex Twin’s “T69 Collapse” video used a neural network for hallucinatory visuals. Fast Company. Consulted on 18/10/2019 via https://www.fastcompany.com/90216189/how-aphex-twins-t69-collapse-video-used-a-neural-network-for-hallucinatory-visuals
Patel, S. V. (2020). Designers need Augmented Intelligence not Black Box AI. Medium. Consulted on 10/05/2020 via https://towardsdatascience.com/augmented-intelligence-for-sustainable-design-and-architecture-2f96a2fac95e
Philips, M. (2020). AI and Design: why AI is your creative partner – UX Collective. Medium. Consulted on 05/05/2020 via https://uxdesign.cc/ai-and-design-ai-is-your-creative-partner-cb035b8ef107
Polaine, A. (2019). Design in the age of synthetic realities – Design Voices. Medium. Consulted on 05/11/2019 via https://medium.com/design-voices/design-in-the-age-of-synthetic-realities-d00215a78580
Refik Anadol Studio. (2019). Latent History – Refik Anadol. Refik Anadol. Consulted on 18/09/2020 via https://refikanadol.com/works/latent-history/
Romano, A. (2016). A guy trained a machine to “watch” Blade Runner. Then things got seriously sci-fi. Vox. Consulted on 14/11/2019 via https://www.vox.com/2016/6/1/11787262/blade-runner-neural-network-encoding
Ryan, A. R. (2017). MGMT makes a beautiful nightmare aesthetic for “When You Die”. LemonWire. Consulted on 21/09/2020 via https://lemonwire.com/2017/12/15/mgmt-makes-beautiful-nightmare-aesthetic-die/
The Mega Super Awesome Visuals Company, Memo Akten. (2018). Deep Meditations: A brief history of almost everything. Consulted on 02/09/2020 via https://www.memo.tv/works/deep-meditations/
Tiu, E. (2020). Understanding Latent Space in Machine Learning – Towards Data Science. Medium. Consulted on 29/08/2020 via https://towardsdatascience.com/understanding-latent-space-in-machine-learning-de5a7c687d8d
Vincent, J. (2017). MGMT’s new video makes the AI art of style transfer look cool again. The Verge. Consulted on 20/09/2020 via https://www.theverge.com/2017/12/13/16772636/mgmt-when-you-die-video-style-transfer-ai
Vincent, J. (2018). Watch Jordan Peele use AI to make Barack Obama deliver a PSA about fake news. The Verge. Consulted on 07/11/2019 via https://www.theverge.com/tldr/2018/4/17/17247334/ai-fake-news-video-barack-obama-jordan-peele-buzzfeed
Weirdcore. (2018). APHEX TWIN – COLLAPSE VIDEO. Weirdcore.tv. Consulted on 21/09/2020 via http://weirdcore.tv/2018/08/07/aphex-twin-collapse-video/