Using Generative AI to Envision a Sustainable Near-Future

AWS came to Hard Work Party with a loose brief: tell a story involving both sustainability and AWS's new generative AI tools for the Sustainability Showcase at the 2023 AWS re:Invent conference in Las Vegas. Hard Work Party's approach was to use generative AI in real time, in a way that would start meaningful conversations, explore the boundaries of what is achievable, and engage deeply with sustainability issues. Rather than chasing unattainable utopian ideals, the piece uses the medium to envision a realistic present in which different choices were made with the same resources. Alongside AWS and Stable Diffusion, TouchDesigner served as the scripting, compositing, and presentation layer, orchestrating and integrating output from a ComfyUI backend. We spoke with Hard Work Party's Noah Norman to gain further insights.

 

Derivative: Can you give us a quick idea of the design brief for this project?

Noah Norman: AWS came to Hard Work Party with a loose brief: in the context of their Sustainability Showcase at the ’23 Vegas AWS re:Invent conference, they wanted to tell a story that involved both sustainability and their new work with creating generative AI tools on AWS. It was that open-ended.

We had a small footprint on the physical show floor but a high-traffic spot in a thoroughfare where many folks would pass by and many more would gather to meet and await sessions in the conference halls nearby. We knew there would be no staff to explain the piece, so it was important that it be somewhat self-explanatory, albeit with some mystery left over to draw guests in. All in all it was a pretty blank slate.

Derivative: What inspired the theme of using generative AI to envision sustainable transformations for the AWS conference?

Noah Norman: The most direct answer is that gen AI and sustainability feel like two of the most salient issues of our time, and AWS is right in the middle of both of them. AWS have customers that are making a huge impact in both spaces, both as innovators and as operations with lots of room to improve. These are the two things AWS and their customers are talking about most right now.

The brief AWS came in with was open-ended but they were nonetheless looking for a solution in the narrow intersection of a complex Venn diagram.

The right idea would use generative AI generatively (like, in real-time), would serve as a conversation starter, would push the limits of what’s possible, and would meaningfully engage with the topic of sustainability.

But from previous work with gen AI in production, I knew that presenting raw AI outputs directly to the public, uncurated and unsupervised, is not feasible for a few different reasons.

The brief explanation is that there is no model that outputs “great” images (defined however you will) more than a low percentage of the time, and when you make the outputs huge and present them to the public as we did with this installation, the expectation is that some very high percentage of the images onscreen are going to be “great”. I describe this as a problem of models having a low “hit rate”, but at root it’s a mismatch of expectations. I expanded on this a bit in this blog post about generative AI in production if you’re interested. 

Over the course of a few projects, one of the strategies I developed to deal with the inconsistency in the output of generative models - the low hit rate - is what I call the “Hieronymus Bosch method”, or the “mural method”.

This was inspired by the early outpainting work of Herndon Dryhurst Studio - explorations in which they used DALL·E 1 to create high-resolution artworks with decent visual coherence through a process they don’t totally explain. I recognized in that work that while individual patches of the image - single outputs from the model at, in their case, 512x512 - might be themselves incoherent or uninteresting or even ugly, the gestalt effect of the larger canvas is interesting and evocative and thus inured to the “failure rate” of the model.

Originally I pitched the idea to AWS of having an ever-evolving mural, one where a melting patch-by-patch replacement is constantly playing out, and one where the overall canvas isn’t necessarily coherent, themed, or representational at any given time. TouchDesigner user Scottie Fox got some compelling results from a similar process when inpainting patches on cube map textures a while back.

Unfortunately with realistic, representational content, presented on a traditional flat canvas, this turned out to be illegible most of the time and not very temporally dynamic, and thus uninteresting.

As soon as I built out a system that could do that kind of work, though, I realized that it was perfectly suited to a before-and-after ‘overpainting’ sort of transformation - one where a single scene fluidly reconfigures itself into another coherent picture. That was the “aha” moment where the whole idea coalesced. 

In terms of the choice of subject matter, it might be tempting to use this tool to say “what if we threw infinite money at the problems caused by climate change?” or “what if things were just better?”, and to use it to literally paint over tough or even intractable problems with a pollyannaish utopian vision, but of course that’s completely unhelpful.

Instead, I thought “what if we used this medium to envision a realistic present day where we made different choices with the same resources?” 

The tool we built begins with a high-resolution pre-generated AI image of a less-than-ideal present-day scenario, then, over the course of about 1-2 minutes, sort of flows into a believable, more sustainable alternative, in a generative (and stochastic) real-time process. 

This was presented on a large high-resolution LED screen in a way where the image was constantly appearing to fluidly ‘melt’ before the viewer’s eyes into a different image, slowly and inexorably, and in a way that never repeated over the 6 days the installation was up.

For that we built out dozens of transformation scenarios like:

A typical American gas station ➝ A biophilic charging oasis with shaded seating, lush landscaping, and a cafe

A typical American 'stroad' ➝ A solar-harvesting protected bike lane adjacent to light rail, making a walkable community

A dense monoculture industrial greenhouse ➝ A vertical aquaponics facility growing a variety of crops

A diesel bus, idling in a cloud of smoke ➝ A sleek electric bus charging wirelessly through its bus stop

A series of flat, barren urban rooftops ➝ A connected series of lush green roofs shaded by solar pergolas

A dusty city undergoing desertification ➝ A slightly greener and much less dusty city using greywater capture and strategic irrigation

A festive table laden with meat-heavy dishes ➝ A sustainable feast of beautiful seasonal vegetables and grains

A deserted open-pit coal mine ➝ A park and solar power production demonstration facility 

This idea chimed beautifully with the brief. We were now, as a team, discussing the practicalities and details of specific climate solutions — how they’d look and feel — and how they’d sit in the same place as less-than-optimal situations we have today, and we were using generative AI to help picture that transformation, literally.

Derivative: Can you elaborate on how TouchDesigner was integrated into the project workflow?

Noah Norman: For this project TouchDesigner was used as the scripting, compositing, and presentation layer, orchestrating and consuming the output from a ComfyUI backend.

ComfyUI, for those who aren’t familiar, is a node-based interface to a community-supported library of Stable Diffusion inference tools. 

ComfyUI can run as a headless inference engine — it hosts a webserver with an endpoint that expects a JSON payload. That JSON object describes a ComfyUI network (called a ‘workflow’) in its entirety, including node layout, patching, and arguments.

So I was creating workflows in Comfy, exporting their JSON representation, bringing those into TouchDesigner, and modifying the contents of those objects in order to invoke different diffusion inference processes depending on the scene we were running.
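As a rough illustration of that export-then-modify loop, here is a minimal Python sketch. It assumes ComfyUI's API-format export (a dictionary of node ids, each with a `class_type` and `inputs`) and its default local `/prompt` endpoint. The workflow fragment, the node ids, and the helper names `patch_workflow` and `build_queue_request` are hypothetical and are not taken from the project.

```python
import copy
import json
import urllib.request

# Hypothetical fragment of an exported workflow: node id -> class and inputs.
EXPORTED_WORKFLOW = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 30, "cfg": 7.0}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "placeholder"}},
}


def patch_workflow(workflow, overrides):
    """Return a copy of the workflow with selected node inputs replaced.

    overrides maps node id -> {input name: new value}. Unknown node ids or
    input names raise KeyError so that typos fail loudly instead of silently.
    """
    patched = copy.deepcopy(workflow)
    for node_id, new_inputs in overrides.items():
        inputs = patched[node_id]["inputs"]
        for name, value in new_inputs.items():
            if name not in inputs:
                raise KeyError(f"node {node_id} has no input {name!r}")
            inputs[name] = value
    return patched


def build_queue_request(workflow, host="127.0.0.1:8188"):
    """Build (but do not send) the HTTP request that queues a workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would queue the job on a running server. The sketch stops short of that so it can be read and run without one.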

In show mode, the TouchDesigner engine would do a spiral walk of slightly-overlapping 1-megapixel regions of the initial image (the “start plate”), randomly varying the aspect ratio of the cropped area by selecting among the resolutions used in training the SDXL model. It would take that AOI crop and a dynamically-generated bitmap mask, generally a fuzzy oval inset in the crop frame at the same resolution, and sling the two over to ComfyUI with the JSON workflow description, then await the inference result.
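The crop selection described above could be sketched roughly as follows. This is not the project's code: the grid layout, the 512-pixel stride, and the centre-outward direction of the spiral are all assumptions made for illustration, and the resolution list is the set commonly cited for SDXL training buckets near one megapixel.

```python
import random

# Commonly cited SDXL training resolutions (width, height), each about 1 MP.
SDXL_RESOLUTIONS = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216),
    (1344, 768), (768, 1344), (1536, 640), (640, 1536),
]


def spiral_cells(cols, rows):
    """Yield every (col, row) of a grid once, spiralling outward from the centre."""
    x, y = cols // 2, rows // 2
    dx, dy = 1, 0
    step, seen, total = 1, 0, cols * rows
    while seen < total:
        for _ in range(2):                  # two legs per step length
            for _ in range(step):
                if 0 <= x < cols and 0 <= y < rows:
                    yield x, y
                    seen += 1
                    if seen == total:
                        return
                x, y = x + dx, y + dy
            dx, dy = -dy, dx                # turn 90 degrees
        step += 1


def crop_plan(canvas_w, canvas_h, stride=512, rng=None):
    """Return one (left, top, width, height) crop per grid cell, in spiral order."""
    rng = rng or random.Random()
    cols = -(-canvas_w // stride)           # ceiling division
    rows = -(-canvas_h // stride)
    crops = []
    for col, row in spiral_cells(cols, rows):
        w, h = rng.choice(SDXL_RESOLUTIONS)
        w, h = min(w, canvas_w), min(h, canvas_h)
        cx = col * stride + stride // 2
        cy = row * stride + stride // 2
        # Centre the crop on the cell, then clamp it inside the canvas.
        left = max(0, min(cx - w // 2, canvas_w - w))
        top = max(0, min(cy - h // 2, canvas_h - h))
        crops.append((left, top, w, h))
    return crops
```

Because the stride is smaller than the shortest crop side, neighbouring crops overlap, which leaves each inpaint some surrounding context to match.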

TouchDesigner then re-composited the resulting inpainted image back into the main canvas, using the original mask to reject any leftover diffusion artifacts outside the inpainted area, then repeated the process until the spiral walk had covered the canvas either 1 or 2 times, depending on the composition. 
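A minimal sketch of the mask and re-composite step follows, in plain Python on single-channel float images so it runs without any imaging library. The `inset` and `feather` values are hypothetical, and the real system did this work on GPU textures inside TouchDesigner rather than in nested lists.

```python
def oval_mask(width, height, inset=0.15, feather=0.25):
    """Build a soft elliptical mask as rows of floats in [0, 1].

    1.0 means "take the inpainted pixel", 0.0 means "keep the canvas".
    inset shrinks the ellipse away from the crop edge; feather sets the
    width of the soft falloff. Both are fractions of the ellipse radius.
    """
    mask = []
    for y in range(height):
        ny = 2 * (y + 0.5) / height - 1     # -1..1 across the crop
        row = []
        for x in range(width):
            nx = 2 * (x + 0.5) / width - 1
            r = (nx * nx + ny * ny) ** 0.5 / (1 - inset)   # 1.0 at the ellipse edge
            t = (1 - r) / feather           # above 1 in the core, below 0 outside
            row.append(max(0.0, min(1.0, t)))
        mask.append(row)
    return mask


def composite(canvas, patch, mask, left, top):
    """Blend patch into canvas in place: out = canvas * (1 - m) + patch * m."""
    for y, (patch_row, mask_row) in enumerate(zip(patch, mask)):
        canvas_row = canvas[top + y]
        for x, (p, m) in enumerate(zip(patch_row, mask_row)):
            canvas_row[left + x] = canvas_row[left + x] * (1 - m) + p * m
```

Since the mask is zero along the crop border, anything the model changed outside the oval is discarded, and the canvas stays untouched where neighbouring patches meet.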

The presentation composition was running from a separate TouchDesigner process doing some melty transitions at a steady 60FPS. That way I didn’t have to worry too much about blocking calls for disk I/O, hitting the API, etc. - the back-and-forth to ComfyUI was happening in a non-realtime context while the presentation layer was locked at 60, constantly doing a very fluid A/B ping-pong as the heartbeat of images came over from the inference and scripting.
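The decoupling between the slow inference side and the 60FPS presentation side can be pictured with a small sketch. This is a plain-Python stand-in, not the TouchDesigner network used in the piece: the class name `PingPong`, the single pending slot, and the linear mix are assumptions made for illustration.

```python
class PingPong:
    """Hold an A image and a B image and crossfade from A to B every frame.

    The slow side calls push() whenever a new composite arrives. The render
    loop calls tick() once per display frame and never waits on the slow side.
    """

    def __init__(self, first_image, fade_frames=420):
        self.a = first_image        # image being faded from
        self.b = first_image        # image being faded to
        self.pending = None         # newest image waiting for the next fade
        self.fade_frames = fade_frames
        self.frame = fade_frames    # start fully settled on b

    def push(self, image):
        """Store the latest result; it replaces any image still waiting."""
        self.pending = image

    def tick(self):
        """Advance one display frame and return (from_image, to_image, mix)."""
        if self.frame >= self.fade_frames and self.pending is not None:
            # Current fade is done: B becomes the new A, pending becomes B.
            self.a, self.b, self.pending = self.b, self.pending, None
            self.frame = 0
        if self.frame < self.fade_frames:
            self.frame += 1
        return self.a, self.b, self.frame / self.fade_frames
```

The default of 420 frames is simply seven seconds at 60FPS, chosen here to match the roughly 1/7 FPS rate at which new images arrive. The fade length actually used in the installation is not stated.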

I rarely take the choice of adding an additional instance of TouchDesigner lightly - there’s some additional complexity in dev and ops that comes with that - but it was nice to separate concerns and leave the inference engine / state machine out of the optimization equation. It’s really running at an effective 1/7 FPS so that felt like a natural choice.

When a scene finished its transformation, the system would ‘melt’ the whole thing into a new randomly-chosen start plate and start over again.

Derivative: What specific features or capabilities of TouchDesigner made it an ideal tool for this project?

Noah Norman: This was one project where my usual answer about TouchDesigner’s rapid prototyping and iteration capabilities is not the top reason it was the right tool for the job. In this case, the reason TouchDesigner was so powerful was the ability to finely introspect textures, pixel by pixel, all the way through a pipeline. 

There were a lot of gremlins to work out in alignment, mask tuning, inpaint location selection, color space, and the scripting of how the system moves through an image, and I can’t imagine doing that stuff with another tool. TouchDesigner makes it so easy to stand up debug views, do analysis, bug hunt, find seams, spin off test cases, and do it without the ceaseless stop/start and context switching that would be required in almost any other context.

I suppose it would have been feasible to do something similar in, e.g., pure Python, but it would have been hateful (or at least well outside my experience) to do the kind of debugging that was required in a context where so many things hinged on pixel-peeping and loupe-style scanning across moving images for hours on end.

Derivative: Did you collaborate with other vendors/producers for the project and if so were there any takeaways from that experience?

Noah Norman: Not really! This one was all in-house.

Derivative: Can you explain the role TouchDesigner played in integrating the diffusion inference techniques such as fine-tuned models, custom LoRAs, and layered ControlNets?

Noah Norman: In the process of getting the system to produce legible, semantically consistent, visually interesting results, we found that this was far from a one-size-fits-all thing. While it’s true that in some examples, people generate fantastic AI images using a simple single-model inference workflow and prompt engineering alone, it’s a well-kept secret that many images presented as ‘AI-generated’ are the result of using AI-powered workflows a little more like we’re used to using TouchDesigner, or Photoshop. Each pixel of the image might have been generated by AI, but often in many passes, using many different techniques, models, and control methods, and composited using masking, layering, and cut-and-paste techniques along the way.

While the general process we used for transforming images was consistent from scene to scene, we found that to get great results, each one required a different workflow. That meant different underlying base models, different parameters, LoRAs (sometimes custom LoRAs to influence style or to introduce concepts that aren’t present in the base models), and, in some cases, ControlNets to get more compositional consistency as the image is gradually replaced.

In the development process of these looks and workflows, TouchDesigner made it possible to build up a look-development GUI, showing me exactly what data I was passing across to ComfyUI, what the input textures to the workflows were, what the result looked like coming back, and how it fit into the context of the larger image.

Again this is a story of TouchDesigner being great at having no significant distinction between the presentation state and the development / debug state of the application.

Derivative: Were there any particular challenges you faced while using TouchDesigner for this project, and how did you overcome them?

Noah Norman: There were a bunch. Two big ones come to mind:

First, the interop between TouchDesigner and ComfyUI via the JSON API payload definitely presented a few challenges. Currently the tooling in Comfy is a little better but at the time the only method for invoking an inference workflow from outside Comfy was to take a JSON object that described the entire state of the workflow, essentially all the nodes, their connections, and their settings, and to pass that object to the webserver endpoint running on Comfy to queue that inference.

I never went so far as to write my own script to generate these JSON workflow descriptions, so I was using the UI part of ComfyUI to iterate on designs, then using its built-in tool to dump out JSON descriptions of the state of the workflows. I’d then grab those objects in TouchDesigner and inject new values where appropriate to run inference with different settings, on the fly. 

Without going too deep into what is a fairly dry and annoying process, I knew at the outset that I absolutely did not want to be dealing with runtime type errors when chopping up thousands of lines of JSON with hand-coded inputs of all sorts. To solve that, I used pydantic to define a number of classes related to the sorts of data entities I was dealing with - abstractions related to base models, LoRAs, samplers, VAE, etc., and relied on pydantic to give me helpful linter hints in the IDE, run validation, and throw errors when I made data-entry mistakes or was mishandling some object I had defined.
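To give a sense of what "define your terms, then validate" looks like, here is a small sketch. It swaps pydantic for the standard library's `dataclasses` with hand-written checks so that it runs with no extra dependencies; pydantic expresses the same constraints declaratively and with better error messages. The class names, fields, and limits are hypothetical and are not the ones used in the project.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LoRA:
    filename: str
    strength: float = 1.0

    def __post_init__(self):
        if not isinstance(self.filename, str) or not self.filename:
            raise ValueError(f"filename must be a non-empty string, got {self.filename!r}")
        if isinstance(self.strength, bool) or not isinstance(self.strength, (int, float)):
            raise ValueError(f"strength must be a number, got {self.strength!r}")


@dataclass(frozen=True)
class SamplerSettings:
    steps: int
    cfg: float
    sampler_name: str = "euler"

    def __post_init__(self):
        if isinstance(self.steps, bool) or not isinstance(self.steps, int) or self.steps < 1:
            raise ValueError(f"steps must be a positive int, got {self.steps!r}")
        if isinstance(self.cfg, bool) or not isinstance(self.cfg, (int, float)) or self.cfg < 0:
            raise ValueError(f"cfg must be a non-negative number, got {self.cfg!r}")

    def as_inputs(self):
        """Return the values in the shape a sampler node's inputs expect."""
        return {"steps": self.steps, "cfg": float(self.cfg), "sampler_name": self.sampler_name}


@dataclass(frozen=True)
class SceneRecipe:
    checkpoint: str
    sampler: SamplerSettings
    loras: Tuple[LoRA, ...] = ()

    def __post_init__(self):
        if not all(isinstance(item, LoRA) for item in self.loras):
            raise ValueError("loras must contain only LoRA instances")
```

A data-entry slip such as `SamplerSettings(steps="30", cfg=7)` now fails when the recipe is built, not deep inside an inference call several seconds later.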

Ever since then I’ve used type hints at minimum and pydantic wherever possible in all my TouchDesigner projects. This has resulted in many headaches avoided, and it’s been helpful to be forced to define your terms before passing around data, especially in a duck-typed language like Python.

I’d really recommend anyone using Python in general to make more use of typing, even in a context like TouchDesigner, where implementing it can have a little more friction.

The second bit of friction involved getting textures passed between ComfyUI and TouchDesigner as fast as possible. I’m frequently pushing for work to have a more meditative, contemplative pace, and this piece was no different. 

On paper, the client was on board with that idea, but as we progressed in the development process, we found that the small scale of pre-renders as viewed on the client’s computer screens wasn’t helping tell the story of how the thing would feel on a large screen in-person. This is a common problem with installation development, and for a larger project I’d usually try to get in the same place as the client and put a VR headset on them. That can result in a better experience of the effect of the piece at scale, and I find it can sell through the vision of a calmer, quieter piece.

For budget and timing reasons, in-person VR pre-viz wasn’t possible for this project, and providing comps of the finished piece with human-scale reference and eyelines wasn’t totally making the case, so I got a few requests for these transformations to progress more quickly as the changes played out.

That would have been easy to accomplish by simply using more of the patch in each inpaint. I was already using 1MP patches in SDXL models, areas that were an appreciable fraction of the final screen resolution. But the more of the patch used for a given inpaint, the less is available as context for the model to provide visual continuity in the finished composite, and I wanted to make sure images stayed coherent and didn’t have notable seams at inpaint intersections.

Right at the time this project was in active development, the SDXL Turbo and Lightning models became available. Doing inference on them is much faster (as implied by their names), but their usability is inferior in multiple ways, importantly in the inability to raise classifier-free guidance to the required level, or to use certain flavors of ControlNet or negative prompting, depending on the method.

So I was left trying to juice our inpaint ‘lap time’ with SDXL models as much as possible.

In addition to torch tricks, half-precision, pruned models, parameter tuning, and a few very deep-cut techniques introduced to the inference pipeline, one major thing for inference time-saving was using the Script TOP on the TouchDesigner side to place the inpaint patch and alpha mask into shared memory before picking them up with a custom shared memory node on the ComfyUI side, and then doing the reverse coming back. 

This meant that I didn’t need a “safety delay” when reaching for images written to disk, where previously I was waiting to ensure a file had completed writing before passing the data off between the two processes. That shaved almost two seconds off the round trip for each inpaint patch, which was a significant and needed speedup.
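The shared-memory handoff can be sketched with Python's standard `multiprocessing.shared_memory` module. This shows only the general idea: the header layout (width, height, channels as three unsigned 32-bit integers) and the function names are hypothetical, and neither the Script TOP code nor the custom ComfyUI node from the project is reproduced here.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical header layout: width, height, channels as little-endian uint32.
HEADER = struct.Struct("<III")


def write_patch(width, height, channels, pixels, name=None):
    """Create a shared-memory block holding a header plus raw pixel bytes.

    Returns the SharedMemory handle. The writer keeps it open until the
    reader is done, then calls close() and unlink() on it.
    """
    if len(pixels) != width * height * channels:
        raise ValueError("pixel buffer does not match the stated dimensions")
    shm = shared_memory.SharedMemory(name=name, create=True, size=HEADER.size + len(pixels))
    HEADER.pack_into(shm.buf, 0, width, height, channels)
    shm.buf[HEADER.size:HEADER.size + len(pixels)] = pixels
    return shm


def read_patch(name):
    """Attach to an existing block by name and copy the image out of it."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        width, height, channels = HEADER.unpack_from(shm.buf, 0)
        count = width * height * channels
        pixels = bytes(shm.buf[HEADER.size:HEADER.size + count])
    finally:
        shm.close()
    return width, height, channels, pixels
```

The reader takes the byte count from the header and not from the block size, because some platforms round a shared-memory block up to a whole number of pages. Once the writer has returned, the data is complete in memory, which is why no waiting period is needed the way it is with a file still being flushed to disk.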

Derivative: Can you share any insights or best practices for other artists and developers who want to use TouchDesigner for similar projects?

Noah Norman: Any time you’re using something external as a “black box”, be sure to build up your tooling / wrappers around it one piece at a time, and make sure you do it in a way where you can validate that you’re sending what you think you’re sending and what the tool expects, and that you’re getting back what you think you should be getting.

You can lose days at really unfortunate times to some seriously subtle errors if you don’t have your ducks in a row in that regard.

Derivative: What role do you believe art and technology play in driving conversations and actions around climate change and sustainability?

Noah Norman: I’ve always believed in getting to the capital-T Truth of why a client or a patron wants to work with me. Often the answer is simple: they think the stuff I make is interesting, and they want to participate in making something interesting. “Interesting” here can mean cool, new, thought-provoking, beautiful, etc., and the specifics of that are vitally important, but I’m intentionally simplifying here. 

When something is interesting, it can draw attention away from the shouting world around it, and when you can get attention, you can ask questions, or prompt people to ask their own questions. 

So I think new media art has the potential to get attention - to get people to engage with issues, and in particular, at the moment, gen AI can really have that effect.

It’s really easy to fall into a trap, though, of using gen AI to just make pretty pictures of beautiful scenarios that, while believable on their own, represent an impossible outcome from the present. Like you can just paint over an urban highway with advanced light rail but in reality, at least in most countries, a project like that would take many folks’ life work and three acts of god to accomplish.

It’s also really easy to let these tools reinforce biases — frequently sustainable development is new development, and new development has to be sold to many stakeholders. The images generated to sell through new development projects have a certain aesthetic that, regardless of whether the project includes affordable housing or makes room for existing communities, can look like gentrification.

Since the data set used to train the available models is the world of images that have been made so far, it can be challenging to generate images of sustainable scenarios in the built environment without them looking like displacement or spaces for the rich. 

So it’s not enough to just sprinkle some ‘AI pixie dust’ on a project and theme it “sustainability” any more than it would make sense to take any sculpture made out of LEDs and say it’s “about sustainability.”

You really need to make sure that the form and the function are one and the same. If you can turn it around enough, you’ll often find there’s some way that the expression of the thing reinforces the message, and if you can find that, you can actually get viewers to think about sustainability or any other message you’re concerned with.

Derivative: What do you hope audiences take away from experiencing the Sustainability Showcase?

Noah Norman: For people who experienced the piece in person or online, I really hope that the basic framing of Epoch Optimizer (that was its punny name) felt natural and salient: sustainability doesn’t have to be sacrifice, doesn’t have to be expensive, and doesn’t have to exist only in the future — it’s just a matter of making different choices with the same resources, and the result can be clearly preferable to the choices we’ve been making so far.

Derivative: Can you describe a moment during this project that was particularly rewarding or enlightening for you and your team?

Noah Norman: I really didn’t expect to have as many deep conversations with guests about specific climate solutions as I did while standing in front of the piece at the conference, and it felt like the best possible validation of the concept.

Just as we hoped, guests would see something eye-catching, slow down, square up to the screen, and quiet themselves, locking in for a while as they watched a transformation take shape. When they had questions, they were usually first about the tech - what kind of AI, how does it work, etc. - but quickly the conversations segued to walkability, or the carbon impact of our diets, or transportation policy, water policy, etc. - all the real issues at the heart of the story. I had one attendee say “I never thought I’d be talking about stroads at this conference - I gotta text my urban planning group back home and tell them we had this conversation.”

Derivative: How do you envision the future of interactive media art in promoting societal change and awareness?

Noah Norman: This project succeeded because it presented guests with detailed images of sustainability scenarios in a way anyone could understand. It’s one thing to use sustainability jargon about best practices, or to point to success stories in planned communities, new projects, and far-off places, but the whiz-bang tech element of this, and the shiny physical presentation, coupled with the slow pace, were enough to engage the audience and get them to sit with these transitions long enough to see them as equally possible outcomes from the same conditions.

Interactive media art can be effective in injecting novelty into ideas that otherwise wouldn’t reach the viewer. A lot of people are inured to messages about things like climate change, or social issues, in a way where there’s almost nothing you can do in a more common medium - on the web or on a small screen - that’s going to hit home.

And while it’s really common to try to summon awe in these projects, whether it’s through scale or repetition or bright lights and loud noises, I think it’s not just the awe that you can create with new media that’s useful - it’s getting people to quiet down enough to hear what you’re trying to say.

That’s harder than it’s ever been but if you can do it, you can get people to engage with ideas they might not anywhere else.

Derivative: What projects or directions are you excited about, and what can we expect to see from you in the near future?

Noah Norman: Right now I’m working on a roomscale gaming platform startup! It’s called Third Wave Arcade. Projection on three walls and the floor, markerless motion capture for controls, multiplayer arcade-style gaming. We’re making game prototypes, building out a dev SDK, having fundraising conversations, and developing a UX language for what is a completely unexplored medium. 

Concurrently, we’re telling the story of the build process in real-time on YouTube. We’re working on the first video now, and it’s going to be about making games in TouchDesigner actually. Smash that subscribe button to get it when it drops!!
 

Follow Hard Work Party Web | Instagram
Also from Hard Work Party on the Derivative Showcase, read Elsewhere Sound Space: Twitch's First Space Cult