Tutorial

Hugging Face x TouchDesigner for Beginners

I built a TouchDesigner TOX that gives you one-click access to the entire Hugging Face cloud model catalogue. FLUX, SDXL, Sora 2, Google Veo 3.1, LTX-Video, Wan 2.2, Qwen-Image, HunyuanVideo — all from inside your patch, all from a single API key you paste in once.

Drop the TOX into any .toe, paste your HF token, and you have text-to-image, image-to-image, text-to-video and image-to-video generation wired straight into TouchDesigner. No local ML install, no VRAM, no model downloads. The provider runs the model; you get the frame back.
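For a sense of what a single request looks like underneath, here is a minimal sketch using the huggingface_hub Python client. It is not the TOX's actual source: the function name generate_image is made up for illustration, and the model id and provider are example values that may not match what the TOX selects.

```python
import io


def generate_image(prompt, token, model="black-forest-labs/FLUX.1-schnell", provider="fal-ai"):
    """Request one still from HF Inference Providers and return it as PNG bytes.

    Hypothetical helper: the name, defaults, and PNG round-trip are
    assumptions for this sketch, not the TOX's real code.
    """
    # Imported lazily so this file still loads where huggingface_hub is not installed.
    from huggingface_hub import InferenceClient

    client = InferenceClient(provider=provider, api_key=token)
    image = client.text_to_image(prompt, model=model)  # returns a PIL image

    # Serialise to PNG so the bytes can be written to disk or handed to another process.
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()
```

In a patch you would typically write those bytes to a file and point a Movie File In TOP at it; the TOX handles that step for you.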

Why this matters

Up until now, if you wanted diffusion models inside TD you had two options. Run them locally on a beefy GPU — which means CoreML or Diffusers with a pile of setup, limited to what fits in your VRAM, and painful to keep current with the latest releases. Or pay for a single-model service like Daydream Scope or StreamDiffusionTD — which are great at their one thing but lock you into one model family and one vendor.

hf_infer is the third option. HF's Inference Providers route a single API key to dozens of state-of-the-art models across fal-ai, Replicate, Together, Nscale, WaveSpeed and HF's own inference infra. Pay per call, switch models in a dropdown, keep one token. That token persists across every project on your machine — same pattern Dotsimulate's tools use, because it's the right pattern.
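One common way to make a token persist across projects is to keep it in a per-user config folder rather than inside the .toe. The sketch below shows that idea only; the folder name hf_infer, the file name config.json, and the JSON key are assumptions, not the locations the TOX actually uses.

```python
import json
import os
import sys
from pathlib import Path


def config_dir(app_name="hf_infer", platform=sys.platform, env=os.environ, home=None):
    """Return a conventional per-user config folder for the given platform."""
    home = Path(home) if home else Path.home()
    if platform.startswith("win"):
        base = Path(env.get("APPDATA", home / "AppData" / "Roaming"))
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        # Linux and other Unix-likes: follow the XDG convention.
        base = Path(env.get("XDG_CONFIG_HOME", home / ".config"))
    return base / app_name


def save_token(token, path):
    """Write the token to disk, creating parent folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"hf_token": token}))
    os.chmod(path, 0o600)  # best effort: owner-only on Unix, limited effect on Windows


def load_token(path):
    """Return the stored token, or None if nothing has been saved yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("hf_token")
```

Usage would look like save_token(token, config_dir() / "config.json") once, then load_token(config_dir() / "config.json") from any project on the same machine.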

What's in the TOX

Two inputs, two outputs, twelve parameter pages, and a preview HUD that tells you what's happening at a glance.

  • TOP input for image-to-image and image-to-video (optional).

  • DAT input for prompts (optional — falls back to a parameter).

  • Clean TOP output for the generated still.

  • Looping video TOP output that auto-plays the latest mp4.

  • On-TOX preview HUD with status, model/provider/latency line, a live log tail, and a braille-pattern spinner during inference.

Models included (38 presets curated and tested)

Every entry in the dropdown is confirmed to work through HF Inference Providers. Broken model/provider combos and models that consistently time out are documented and excluded — no guesswork, no burning credits on dead ends.

Text-to-image

  • Fast / distilled: Z-Image-Turbo, ERNIE-Image-Turbo, FLUX.1-schnell

  • Quality: FLUX.1-dev, FLUX.1-Krea-dev (photoreal), Qwen-Image-2512, Qwen-Image, ERNIE-Image, GLM-Image

  • SD baselines: SD 3.5 Large, SD 3.5 Medium, SDXL 1.0 Base

Image-to-image / edit

  • FLUX.1-Kontext-dev (best general-purpose prompt-driven edit)

  • FLUX.2-dev, FLUX.2-Klein 9B, FLUX.2-Klein 4B

  • Kontext Relighting LoRA v3 (re-light any scene)

  • FireRed Image Edit 1.1

Text-to-video

  • Fast: LTX-Video 0.9.7 distilled, Google Veo 3.1 Fast (proprietary)

  • Quality: OpenAI Sora 2 (proprietary), LTX-Video 0.9.7 dev, Wan 2.2 T2V 14B, Wan 2.2 TI2V 5B, HunyuanVideo 1.5, HunyuanVideo, Mochi 1 preview, CogVideoX-5B

Image-to-video

  • Fast: Google Veo 3.1 Fast i2v (proprietary)

  • Quality: OpenAI Sora 2 i2v (proprietary), LTX-2, LTX-Video, Wan 2.2 I2V 14B, Wan 2.1 I2V 14B 720p, HunyuanVideo I2V, Stable Video Diffusion XT

Plus a Custom option where you paste any HF repo id to use a model not in the catalogue.

Features you won't find in a weekend build

  • Thread-safe by construction. All HTTP happens on a worker thread, never on TD's cook thread. No "Cannot reference operators from other threads" warnings, no crashes under continuous-mode abuse.

  • Three trigger modes. Manual pulse. On-change (debounced — re-runs when the prompt or input TOP changes). Continuous (fires at a fixed rate, paired with a distilled model for near-live output).

  • Preset save/load. Snapshot your current Task, Model, Provider, prompt, dimensions, seed, scheduler, strength — recall with one pulse.

  • User callbacks. Stamp out a starter callbacks DAT with one press. Hook Ongenerate, Onresult, Onerror, Onprogress, Oncancel events into your own Python to drive anything downstream.

  • Retry with fallback. One click to cycle the provider when fal-ai returns an SDK-incompatible response. Known-broken models from the Qwen-Image-Edit family come back to life via Replicate.

  • Outputs saved automatically. Videos land in <your tox folder>/outputs/hf_video_TIMESTAMP.mp4. Images saved alongside when Saveimages is on. No hunting for files.

  • Cross-platform. macOS, Windows, Linux. Config lives in the right per-OS place.
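The thread-safety and provider-fallback ideas from the list above can be sketched together in plain Python. This is an illustration under assumptions, not the TOX's code: InferenceWorker and run_with_fallback are hypothetical names, and the fallback here runs automatically where the TOX exposes it as a click. The point is that the worker thread only ever touches a queue, and the main thread is the only place that would touch operators.

```python
import queue
import threading


def run_with_fallback(request_fn, providers):
    """Try each provider in order; return (provider, result) from the first success."""
    errors = []
    for provider in providers:
        try:
            return provider, request_fn(provider)
        except Exception as exc:  # a real build would catch narrower error types
            errors.append(f"{provider}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


class InferenceWorker:
    """Runs requests off the main thread and hands results back through a queue."""

    def __init__(self):
        self.results = queue.Queue()

    def submit(self, request_fn, providers):
        """Start a background request; request_fn(provider) does the blocking HTTP call."""
        def job():
            try:
                self.results.put(("ok", run_with_fallback(request_fn, providers)))
            except Exception as exc:
                self.results.put(("error", str(exc)))

        thread = threading.Thread(target=job, daemon=True)
        thread.start()
        return thread

    def poll(self):
        """Call from the main thread (for example once per frame); returns finished items."""
        items = []
        while True:
            try:
                items.append(self.results.get_nowait())
            except queue.Empty:
                return items
```

Inside TouchDesigner, poll() would be called from something that runs on the main thread, such as an Execute DAT's frame callback, and only there would results be written into operators.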

Honest caveats, up front

This is not a replacement for StreamDiffusionTD or a Daydream-style live pipeline. Those run a persistent diffusion pipeline with cross-frame latent state and amortise model warm-up across frames — that's how they hit 20–60 FPS. HF Inference Providers are a request/response API. Realistic ceiling even with distilled models is 0.3–1 Hz per stream. If you need true real-time video diffusion, keep StreamDiffusionTD for the live feed and use hf_infer for one-shot generation across a much wider library.
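To see why a request/response API caps the update rate, consider a fixed-rate trigger that refuses to fire while a call is still in flight. The class below is a hypothetical sketch of that gating logic, not the TOX's Continuous mode: whatever rate you ask for, the effective rate can never exceed one over the round-trip latency.

```python
class ContinuousTrigger:
    """Fires at most rate_hz times per second, and never while a request is pending."""

    def __init__(self, rate_hz):
        self.interval = 1.0 / rate_hz
        self.next_fire = 0.0
        self.in_flight = False

    def should_fire(self, now):
        """Return True if a new request may start at time `now` (in seconds)."""
        if self.in_flight or now < self.next_fire:
            return False
        self.in_flight = True
        self.next_fire = now + self.interval
        return True

    def done(self):
        """Mark the pending request as finished (call on success or on error)."""
        self.in_flight = False
```

With a 1 Hz target and a 2.5 second round trip, the second request cannot start before 2.5 seconds have passed, so the stream settles at about 0.4 Hz.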

You pay per call. A fast text-to-image run is roughly $0.005–0.02. A video call can be $0.10 or more depending on the model and duration. Your HF account's free tier covers casual exploration; regular use needs credit on file. The TOX tracks Requestcount and Errorcount on a Status page, so you can see at a glance how many calls you've made and estimate what you've spent.
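A quick back-of-envelope calculation helps before leaving a continuous trigger running. The helpers below are a hypothetical sketch that uses the rough per-call prices quoted above; actual billing depends on the provider and model.

```python
def calls_per_session(rate_hz, minutes):
    """Number of requests made at a steady rate over a session."""
    return round(rate_hz * minutes * 60)


def estimate_spend(request_count, price_low, price_high):
    """Return a (low, high) cost range in dollars for a given number of calls."""
    return request_count * price_low, request_count * price_high


# One hour of text-to-image at 1 Hz, priced at $0.005 to $0.02 per call.
calls = calls_per_session(rate_hz=1.0, minutes=60)
low, high = estimate_spend(calls, 0.005, 0.02)
print(f"{calls} calls, roughly ${low:.2f} to ${high:.2f}")
```

That hour comes to 3,600 calls, or roughly $18 to $72, which is why Continuous mode is best kept for short bursts. The same function works with the Requestcount value from the Status page.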

What you get with your pledge

The .tox file itself — drop it in any .toe, ready to use. The full Python extension source. A written spec (DOCS.md) covering every parameter, every task, every troubleshooting path. An example .toe showing text-to-image, image-to-image, and live video uses side by side. Access to updates as new models land on Inference Providers and the catalogue grows.

 
