Identity and Movement: Motion Transfer With Neural Networks

Deep fake dancing and breaking technology.

I’ve been talking with the StratoFyzika team about how identity resides in the body for their residency in CounterPulse’s Combustible program. We’re researching surveillance technology, machine learning, and dance for a performance in spring 2020. The research is conceptual as well as practical, and what follows are my notes on building a first prototype of an AI for the performance.

A common conception of identity might include one’s values, likes, dislikes, various personality traits. “The things that make me who I am.” How does your body make you who you are? Conversely, how does who you are make your body? What are the indelible traces of your identity in your body? Our bodies are our interfaces with reality. There is nothing we do or experience that is not mediated by the body, even in virtual or online reality.

These are not entirely philosophical or abstract questions. Surveillance technologies track us based on our gait, facial geometry, or voices. They are recognizing the traces of our identities in our bodies. Meanwhile, every aspect of our online presences can be faked using AI, as Jordan Peele notably demonstrated in what is now a quaint-seeming deep fake. These technologies are leveraged as instruments of control, whether for influencing consumers to buy products, or for enforcing fascist hegemony.

One of my dance teachers was fond of quoting Martha Graham, saying “the body never lies” as an invocation of the inherent authenticity of dance and its ability to find and speak truth. To a degree, surveillance and deep fake technologies are exploiting our notion that the body never lies. Peele’s deep fake plays on our trust that if we saw Obama’s mouth moving, speaking those words in a voice that sounds like his, then he said those words. But these technologies also easily fall down when confronted with physical, bodily reality. Surveillance is easily defeated by makeup, or taping a picture of people holding umbrellas to your shirt. Indeed, early explorations for this project quickly discarded an idea for building a vocabulary of anti-surveillance movements, simply because it was so easy to confound gait detection that it became uninteresting.

The conception of the never-lying body seems simultaneously naive and like a potent vector for intervening in these systems of control. At the very least, dance can provide a perspective for complicating our assumptions about the body, truth, reality, and technology; for finding the edges and spaces between.

As a first step in investigating this space for our project, I worked with Daria and Hen to create a deep fake dance. Here’s how it works: Hen video records herself moving around so that a computer can build a visual model of her body. Meanwhile, Daria video records herself performing a dance so that a computer can build a structural model of her movement. Then, the computer applies the structure and movement from Daria to Hen’s visual representation. The result is a video of what appears to be Hen performing a dance she never performed.

This approach is based on Chan, Ginosar, Zhou, and Efros’ paper Everybody Dance Now (2019). As an initial test, it leaves a lot of room for improvement. Notably, the shadows in Hen’s target video are enough to throw off the pose estimation. My slapdash removal of the shadows caused the fuzzy black artifacts in the synthesized output.
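To make the hand-off between the two recordings concrete, here is a minimal sketch of one small piece of such a pipeline: rescaling and repositioning Daria's pose keypoints so they fit where Hen stands and how tall she appears in her own footage. It is a simplified scale-and-translate step under stated assumptions, not the paper's exact procedure and not the code used for this prototype. The `BodyFrame` type, the `measure` and `retarget` helpers, the keypoint ordering, and all pixel values are hypothetical. Pose estimation itself, and the network that renders Hen's image from the retargeted stick figure, are not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixels; y grows downward


@dataclass
class BodyFrame:
    """Hypothetical summary of where a dancer stands and how tall they appear."""
    ankle: Point   # reference point on the floor
    height: float  # head-to-ankle distance in pixels


def measure(pose: List[Point], head: int = 0, ankle: int = 1) -> BodyFrame:
    """Derive a BodyFrame from one pose. The keypoint indices are assumptions."""
    _, head_y = pose[head]
    ankle_x, ankle_y = pose[ankle]
    return BodyFrame(ankle=(ankle_x, ankle_y), height=abs(ankle_y - head_y))


def retarget(pose: List[Point], source: BodyFrame, target: BodyFrame) -> List[Point]:
    """Scale a source pose about its ankle point, then move it to the target's."""
    scale = target.height / source.height
    sx, sy = source.ankle
    tx, ty = target.ankle
    return [((x - sx) * scale + tx, (y - sy) * scale + ty) for x, y in pose]


# Made-up keypoints for one frame of the source dance: head, ankle, hand.
daria_pose = [(100.0, 50.0), (100.0, 250.0), (160.0, 150.0)]
daria = measure(daria_pose)

# Made-up measurements of the target dancer in her own video.
hen = BodyFrame(ankle=(300.0, 400.0), height=100.0)

# The retargeted stick figure is what a trained generator would render as Hen.
stick_figure = retarget(daria_pose, daria, hen)
print(stick_figure)
```

Scaling about the ankle keeps the feet planted on the target's floor line, which is why the sketch uses it as the anchor point rather than the center of the body.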

This video demonstrates an alarming potential for not only showing a person saying something they did not say, but doing something they did not do. But it also finds the edges of the technology, the errors that describe an open space of questions and possibility. Svetlana Boym describes technology like this as nostalgic or “off-modern”:

To err is human, says a Roman proverb. In the advanced technological lingo the space of humanity itself is relegated to the margin of error. Technology, we are told, is wholly trustworthy, were it not for the human factor. We seem to have gone full circle: to be human means to err. Yet, this margin of error is our margin of freedom. It’s a choice beyond the multiple choices programmed for us, an interaction excluded from computerized interactivity. The error is a chance encounter between us and the machines in which we surprise each other. The art of computer erring is neither high tech nor low tech. Rather it’s broken-tech. It cheats both on technological progress and on technological obsolescence. And any amateur artist can afford it. Art’s new technology is a broken technology.

– Svetlana Boym, Nostalgic Technology: Notes for an Off-modern Manifesto, 2006

This broken technology provides a paradoxical perspective on deep fakes and motion tracking, at once butting up against the limits of the technology, problematizing the technology, and opening up a space for creative exploration. We find Boym’s chance encounter with the machine, in which we are surprised simultaneously by its fragility, its consequences, and the potentials for repurposing it in unintended ways.

To further explore this space, we’re stepping forward with another set of questions and investigations. Can Daria and Hen perform live deep fakes, coercing new performances from each other’s images? Can the source structures be algorithmically modified to produce choreographies that are collaborations between the machine and the dancers? Can the technology be developed to the point where it could fool a viewer, like a Turing test for dance?

Cheap, Easy Video Player Cluster

Play many synchronized videos on the cheap using Raspberry Pi.

Raspberry Pis make great video players. They’re cheap, and can play back high-definition video smoothly. If you need to play multiple synchronized videos, however, they don’t work so well on their own. Without something to keep them in sync, each one plays slightly slower or faster than the others until the drift becomes visible.

This set of tools allows the Pis to talk to each other over a network using a simple peer-to-peer protocol to automatically synchronize, including hot-swapping new Pis in and out; gives you a control panel you can use on your phone or laptop; makes sure they start and stop automatically when they get or lose power; and allows you to quickly set up a whole bunch of Pis all at once so that you don’t have to configure each one by hand.
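As a rough illustration of what keeping players in sync involves, here is a minimal sketch of a drift check one player might run when it hears another player's playback position. It is not the protocol these tools actually use. The `decide` function, the `SyncDecision` type, the notion of a "leader", the latency estimate, and the 40 ms tolerance are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SyncDecision:
    drift: float    # seconds; positive means this player is ahead of the leader
    action: str     # "none" or "seek"
    seek_to: float  # position to jump to when action is "seek"


def decide(local_pos: float, leader_pos: float, latency: float,
           duration: float, tolerance: float = 0.04) -> SyncDecision:
    """Compare local playback position with the leader's reported position.

    All arguments are in seconds. `latency` is an assumed estimate of how long
    the leader's message took to arrive. `duration` is the length of the
    looping video, so positions wrap around at the loop point.
    """
    # Where the leader should be by now, allowing for the loop point.
    expected = (leader_pos + latency) % duration
    drift = local_pos - expected
    # Measure drift the short way around the loop.
    if drift > duration / 2:
        drift -= duration
    elif drift < -duration / 2:
        drift += duration
    if abs(drift) <= tolerance:
        return SyncDecision(drift, "none", local_pos)
    return SyncDecision(drift, "seek", expected)


# Within tolerance: leave playback alone.
print(decide(local_pos=10.00, leader_pos=10.01, latency=0.005, duration=60.0))

# Two seconds ahead: jump back to where the leader is.
print(decide(local_pos=12.0, leader_pos=10.0, latency=0.0, duration=60.0))
```

A tolerance of roughly one frame avoids constant seeking, since every seek is itself a visible hiccup. A real player might also nudge playback speed for small drifts instead of seeking.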

The main software is called node-omxplayer-sync, and node-omxplayer-sync-devops will help you get it running on 1 or 100 Pis. They’re not really intended for simple installation (yet). You’ll need to be familiar with the Pi command line, and some experience with Node.js wouldn’t hurt.

The images above are of Surabhi Saraf’s Remedies, which used this tool to display two four-channel looping videos.

Remedies, Saraf
a four-player network

WebUI for Cinder

Control Cinder applications from your web browser.

This library makes it easy to control Cinder with a web browser. It’s designed especially for live performance, and works well in tandem with other interfaces. For instance, I used it alongside an Ohm RGB control surface to control visuals for GRAINS. The interface is easy to set up, and includes a JavaScript client library for binding browser interface elements to Cinder variables. See the GitHub project for more details.

the browser interface
the main Cinder window