A filmmaking robot

Abstract

This robot makes short films based on its visual experience. Its eyes travel about the city on buses while the body sits in a gallery. The eyes collect snippets of video, and transmit them to the body when their buses come within range of a CafeNet wireless internet node. The robot body splits the video into individual frames and analyses each one, obtaining twenty numbers reflecting the arrangement of colour, shape and detail within the frame. These numbers are treated as coordinates in a twenty-dimensional space, in which distance is somewhat related to visual difference. For twelve hours a day the robot traces a zigzagging path through this space. This path passes through a series of images, which become a video sequence. Visitors to the gallery can see this video, called variously the robot's "dream" or "stream of consciousness". At the end of the day the robot looks over its day's work and joins the best parts together as a finished film. The robot uses neural networks and heuristic rules to choose waypoints for its daily dream, but the finished film is mainly selected for the smoothness of its movement through the space. The robot will remember everything it sees until it has five million images in its mind, after which it will replace its least favourite images with new ones. In addition to getting images from the eyes, the robot creates false memories by combining and manipulating well-liked and overused images. These notes are incomplete.

Background

Some experimental filmmakers are tempted to see their footage as a changing field of colour, and not as an indication of objects moving about in space. This is quite hard to do, and can result in many wasted years of studied dis-observation. I used to be such a filmmaker, until I realised that a machine would be better suited to the task.

In 2003 I managed to get some money from the Arts Council to make a filmmaking robot. I worked on it between January and July 2004, and it was shown in the New Zealand Film Archive's gallery as part of Telecom Prospect 2004. Since August 16 it has lived in the Citylink machine room. Citylink and Stagecoach are sponsors of the robot, which depends on them for communication and motion, respectively.

Although it is called a robot, the machine avoids movement as much as possible. Its body sits perfectly still, while its eyes ride around Wellington City on Stagecoach buses. The eyes connect to the body whenever they can via CafeNet, a collection of public wireless internet access points. The rest of the time they collect video for transmission when a chance arises.

Selected films by the robot

The July 29, August 16 and September 26 movies are available via Google Video. More will be released as I get round to uploading them. The video quality is unfortunately not great, but I lack the bandwidth to directly host the files.

If video is too much for your computer, you might prefer to look at this collection of stills.

Twenty dimensions

Illustration: the robot's experience, shown using Plato's cave analogy.

The robot lives in a cave and only sees the shadows of images of the real world. The shadows look like numbers.

The filmmaking robot looks at each image and reduces it to twenty numbers. Each number represents something like the average lightness of a region, the average level of red, blue and green, the vibrancy of colours, the density of vertical and horizontal lines, or the degree of detail. This is all the robot ever knows of the image.
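To make the idea concrete, here is a minimal sketch of that kind of reduction. It is not the robot's actual code: the function name describe, the image format (rows of (r, g, b) tuples, 0 to 255) and the exact formulas are all assumptions, and it produces seven numbers for the whole frame rather than the robot's twenty.

```python
def describe(image):
    """Reduce an image (rows of (r, g, b) tuples, 0-255) to a short list of numbers.

    Hypothetical measures, loosely following the ones named in the text.
    """
    height, width = len(image), len(image[0])
    count = height * width
    pixels = [p for row in image for p in row]

    # Lightness of each pixel, taken here as the plain mean of its channels.
    light = [[sum(p) / 3 for p in row] for row in image]

    mean_light = sum(sum(row) for row in light) / count
    mean_red = sum(p[0] for p in pixels) / count
    mean_green = sum(p[1] for p in pixels) / count
    mean_blue = sum(p[2] for p in pixels) / count

    # Vibrancy: average gap between the strongest and weakest channel.
    vibrancy = sum(max(p) - min(p) for p in pixels) / count

    # Vertical lines show up as lightness changes between horizontal neighbours,
    # horizontal lines as changes between vertical neighbours.
    vertical = sum(
        abs(row[x + 1] - row[x]) for row in light for x in range(width - 1)
    ) / max(1, height * (width - 1))
    horizontal = sum(
        abs(light[y + 1][x] - light[y][x])
        for y in range(height - 1)
        for x in range(width)
    ) / max(1, (height - 1) * width)

    return [mean_light, mean_red, mean_green, mean_blue,
            vibrancy, vertical, horizontal]
```

Computing the same measures over several regions of the frame, rather than the whole frame at once, would be one way to arrive at a longer list of numbers.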

These numbers can be treated as coordinates in a twenty-dimensional space. Any RGB image will map to some point in that space, and images mapping to points close together in this space are likely to be similar, but are not necessarily so. The robot creates video sequences by hopping between images close together in the space.
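The hopping can be sketched roughly as follows. This is an illustration under assumptions, not the robot's method: the names trace_path, memory, start and waypoint are hypothetical, and the strategy (slide a target point along the straight line to the waypoint and take the nearest remembered image at each step) is only one simple way to do it.

```python
import math


def trace_path(memory, start, waypoint, steps=10):
    """Return indices of remembered images lying near the straight line
    from memory[start] to memory[waypoint].

    memory is a list of coordinate tuples, one per remembered image.
    """
    path = [start]
    for i in range(1, steps + 1):
        t = i / steps
        # Point a fraction t of the way from the start to the waypoint.
        target = [s + t * (w - s)
                  for s, w in zip(memory[start], memory[waypoint])]
        # Nearest remembered image to that point (Euclidean distance).
        nearest = min(range(len(memory)),
                      key=lambda j: math.dist(memory[j], target))
        if nearest != path[-1]:  # do not repeat the same frame twice in a row
            path.append(nearest)
    return path
```

For example, with memory = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (1.5, 5)], calling trace_path(memory, 0, 3, steps=3) visits images 0, 1, 2 and 3 and never touches the outlying fifth image. The same function works unchanged on twenty-number tuples.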

Aesthetics

The robot only judges some of the images it uses in its dreaming. These are the waypoints, or targets, between which it traces lines. The other pictures are selected because they lie on the path to the next waypoint. When it makes an aesthetic judgement, it takes the twenty numbers pertaining to the image and feeds them to its five judgemental faculties, which are:

Fine art. A neural network is trained to like a selection of fine art images found on the web. Most are impressionist paintings from ibiblio's web museum, but there is some contemporary and New Zealand art.
David Hall. David divided 100 of the robot's images into three sets, depending on whether they were good, bad, or neither. A network was trained to like the good ones and hate the bad ones.
Enthusiast. Each day a network starts from scratch and tries to learn from and reinforce the judgement of the other two.
Away from mean. This heuristic prefers images on the edge of the space. Images in the centre will get included anyway, as paths traced between remote waypoints will cross the centre.
Away from recents. This heuristic dislikes waypoints similar to ones recently used.
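
The two heuristics in this list are simple enough to sketch. What follows is a guess at their general shape, not the robot's code: the function names, the use of Euclidean distance, and the way the scores are weighted and added are all assumptions, and the three neural networks are left out entirely.

```python
import math


def away_from_mean(candidate, mean):
    """Higher for images further from the centre of the space."""
    return math.dist(candidate, mean)


def away_from_recents(candidate, recents):
    """Higher for images unlike any recently used waypoint."""
    if not recents:
        return 0.0
    return min(math.dist(candidate, r) for r in recents)


def choose_waypoint(memory, recents, weight_mean=1.0, weight_recent=1.0):
    """Return the index of the best-scoring image in memory.

    memory and recents are lists of coordinate tuples. The weights are
    hypothetical knobs; the source does not say how faculties are combined.
    """
    dims = len(memory[0])
    mean = [sum(p[d] for p in memory) / len(memory) for d in range(dims)]

    def score(point):
        return (weight_mean * away_from_mean(point, mean)
                + weight_recent * away_from_recents(point, recents))

    return max(range(len(memory)), key=lambda i: score(memory[i]))
```

With memory = [(0, 0), (1, 0), (-1, 0), (10, 0), (-10, 0)] and the image at (10, 0) recently used, the sketch picks the image at (-10, 0): it is as far from the centre as its rival but much further from the recent waypoint.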

The final films are selected out of this dream partly according to the softness and steadiness of change in their angular momentum, and partly by applying the same neural networks.
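
These notes do not say how that steadiness is measured. One plausible reading, sketched here purely as an assumption, is to look at how sharply the path turns from one frame to the next and to prefer the stretch of dream with the gentlest turns. The names turning_angles and smoothest_window are hypothetical, and the neural network part of the selection is omitted.

```python
import math


def turning_angles(points):
    """Angle in radians between each pair of successive steps along a path."""
    angles = []
    for a, b, c in zip(points, points[1:], points[2:]):
        u = [q - p for p, q in zip(a, b)]
        v = [q - p for p, q in zip(b, c)]
        nu, nv = math.hypot(*u), math.hypot(*v)
        if nu == 0 or nv == 0:
            angles.append(0.0)  # a repeated frame counts as no turn
            continue
        cos = sum(x * y for x, y in zip(u, v)) / (nu * nv)
        # Clamp to guard against rounding just outside [-1, 1].
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    return angles


def smoothest_window(points, length):
    """Start index of the run of `length` frames with the lowest mean turn."""
    best_start, best_cost = 0, float("inf")
    for start in range(len(points) - length + 1):
        angles = turning_angles(points[start:start + length])
        cost = sum(angles) / len(angles) if angles else 0.0
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start
```

A path that zigzags and then runs straight, such as [(0, 0), (1, 1), (0, 2), (1, 3), (2, 3), (3, 3), (4, 3), (5, 3)], gives a four-frame window starting at index 3, where the zigzag has ended.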

Confabulation

The robot develops false memories of things it has not seen. It dwells upon the images it thinks are good or unique, and combines and distorts these to create new images. It reproduces the good images to improve the overall quality of its memory, and the unique ones to fill up sparsely populated patches in its memory space. Images in these holes tend to be chosen whenever the robot's mind wanders that way, resulting in overused motifs. By making imperfect copies of these images, the robot tries to fill holes in its memory space, allowing freer drift of attention.
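
As a rough illustration of an imperfect copy, the sketch below blends two images and adds a little random noise to each channel. The effects the robot actually applies are not listed here, so the function name confabulate, the parameters mix and jitter, and the blend itself are assumptions. Images are taken to be equal-sized rows of (r, g, b) tuples, 0 to 255.

```python
import random


def confabulate(image_a, image_b, mix=0.5, jitter=8, rng=None):
    """Blend two equal-sized images and perturb the result slightly.

    mix is the share taken from image_a; jitter is the largest random
    shift applied to any channel. Both are hypothetical parameters.
    """
    rng = rng or random.Random()
    result = []
    for row_a, row_b in zip(image_a, image_b):
        row = []
        for pixel_a, pixel_b in zip(row_a, row_b):
            pixel = []
            for ca, cb in zip(pixel_a, pixel_b):
                value = mix * ca + (1 - mix) * cb + rng.uniform(-jitter, jitter)
                # Keep every channel a valid 0-255 integer.
                pixel.append(max(0, min(255, round(value))))
            row.append(tuple(pixel))
        result.append(row)
    return result
```

The new image would then be reduced to its own set of numbers like any other, so a blend of two neighbours in a sparse patch should land somewhere between them and help fill the gap.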

Credits

Credits are attached to each finished movie and are fully explained on another page.

Frequently Asked Questions

Why does it go back and forth (or in and out)?

The robot doesn't know which way is forward. It forgets the original order of its memories, and arranges them according to its own views of similarity and progression. This sometimes coincides with a natural sequence, and the robot will follow it as a path of low resistance, usually skipping frames along the way. When the natural sequence diverges too far from the robot's vision, the skipped frames can look better than those forward.

As the robot's memory fills, this will become less common. Currently the natural sequences form strings through the robot's mind, whereas eventually they should be more like a cloud.

Why does it sometimes use funny colours?

Some of the robot's memories are false. These memories are generated from effects applied to combinations of favourite or overused images. Most of the effects are subtle, and the false memories blend in with the real ones, but some are quite extreme. The extreme ones tend to be overused in their own right, because they occupy new space in the robot's world-view. When they are more common, they will be used less and reproduce less rapidly.

It doesn't have arms and legs, so what makes it a robot?

Nothing especially. In the beginning I was intending to make a self-propelled robot, but that quickly seemed like a stupid idea. If it wandered about on its own, it would have to avoid being run over, stuck, or stolen -- until its batteries ran out. Being parasitic of motion allows it to concentrate on making films.

Where are the cameras on the buses?

Above the driver's head, beside the green sign, on buses like this.

Why did David Hall get to train the robot, and who is he anyway?

He offered to help. I didn't want to train it myself, in case I was tempted to give it data I knew it could learn from. I think it took David about twenty minutes to select the training images. He is a musician and cartoonist, living in Wellington, New Zealand.