Community Post

WIP: Live StreamDiffusion “skin” on a Kinect depth point‑cloud (with GLSL MAT splats)

Intro

I've been experimenting with StreamDiffusionTD inside TouchDesigner as a way to texture a live depth point‑cloud captured with a Kinect sensor.  Instead of pre‑rendering frames, the StreamDiffusion operator generates new diffusion images on the fly and these images are mapped directly onto the point‑cloud.  The aim is to *dress* every point in a 3‑D depth capture with AI‑generated textures while keeping everything running in real time.  To achieve this I built a custom instancing setup and a GLSL material that renders each point as a soft Gaussian splat.

System & Tools

  • TouchDesigner build: 2023.11600 (Windows)
  • Sensor: Kinect v2 (depth resolution 512 × 424)
  • GPU: GeForce RTX series (a 3000‑ or 4000‑class card is recommended; StreamDiffusion requires an NVIDIA GPU and performs best on high‑end hardware)

Network Overview:

 

Depth capture & UV generation

The Kinect TOP provides a depth map, which is converted to CHOP channels with a TOP to CHOP.  In the resulting CHOP, the r, g, and b channels carry each point's x, y, and z position.  UV coordinates for each point are derived from the 2‑D grid dimensions (the `uGridDim` uniform), so every point samples the correct texel of the diffusion texture.
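As a sanity check on the UV math, here is a pure‑Python sketch of what the vertex shader does per instance (the grid dimensions match the Kinect v2 depth resolution; the function name is just for illustration):

```python
# Sketch of the per-point UV derivation the vertex shader performs
# from gl_InstanceID and uGridDim (assumed 512 x 424 for Kinect v2).
GRID_W, GRID_H = 512, 424

def instance_uv(instance_id: int) -> tuple:
    """Map a flat instance index to a 0-1 UV at the texel center."""
    col = instance_id % GRID_W
    row = instance_id // GRID_W
    u = (col + 0.5) / GRID_W
    v = (row + 0.5) / GRID_H
    return (u, v)

# First point samples the first texel center; index GRID_W wraps to row 1.
first = instance_uv(0)
second_row = instance_uv(GRID_W)
```

Sampling at texel centers (the `+ 0.5`) avoids bleeding between neighboring texels when the diffusion texture is filtered.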

StreamDiffusion texture

A small cache/switch chain (sd_select → sd_cache → sd_switch → null_skin) ensures that the StreamDiffusion TOP always outputs a valid frame, avoiding the magenta frames that appear when no new diffusion image is available.
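The switching logic amounts to "hold the last good frame". A minimal pure‑Python sketch of the idea (the `None` check stands in for however the network detects that the StreamDiffusion TOP has no fresh, valid texture this cook; class and method names are illustrative):

```python
class FrameHolder:
    """Sketch of the sd_cache/sd_switch behavior: always output a valid frame."""

    def __init__(self):
        self.cached = None  # last known-good diffusion frame

    def cook(self, new_frame):
        # A frame is "valid" here if it is not None; in the network the
        # switch instead tests whether the StreamDiffusion TOP produced
        # a fresh texture this frame.
        if new_frame is not None:
            self.cached = new_frame   # update the cache (sd_cache)
        return self.cached            # always output the cached frame (sd_switch)

holder = FrameHolder()
holder.cook("frame_1")    # fresh frame: cached and passed through
out = holder.cook(None)   # no new diffusion image: last good frame is reused
```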

Instancing

A single‑point SOP in geo1 is instanced across all points in null1_position. Instancing attributes (P, uv, etc.) are sent to a custom GLSL material.

GLSL MAT

The GLSL material is responsible for rendering each point as a soft circular splat. It uses TouchDesigner helper functions to compute projection and passes per‑instance UVs into the fragment shader.

Core GLSL code

The MAT's shader consists of two stages: a vertex shader that computes per‑instance UVs from `gl_InstanceID`, and a fragment shader that samples the diffusion texture and applies a Gaussian falloff.  The fragment shader looks like this:
```glsl
// Fragment shader (glsl2_pixel)
uniform sampler2D skinTex; // live StreamDiffusion texture
uniform float sigma;       // Gaussian falloff width

in vec2 vUV;               // per-instance UV from the vertex shader
out vec4 fragColor;

void main() {
    TDCheckDiscard();                     // standard TouchDesigner early discard
    vec2 p = TDPointCoord() - vec2(0.5);  // point-sprite coords, centered on the splat
    float g = exp(-dot(p, p) / (2.0 * sigma * sigma)); // Gaussian falloff
    vec4 tex = texture(skinTex, vUV);
    fragColor = TDOutputSwizzle(vec4(tex.rgb, g));     // TD-recommended output swizzle
}
```
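The Gaussian falloff is easy to sanity‑check numerically. A pure‑Python mirror of the alpha computation (the `sigma` default is just an example value, not the one used in the network):

```python
import math

def splat_alpha(x: float, y: float, sigma: float = 0.15) -> float:
    """Mirror of the fragment shader's alpha: (x, y) are 0-1
    point-sprite coords, as returned by TDPointCoord()."""
    px, py = x - 0.5, y - 0.5  # center the coords, as in the shader
    return math.exp(-(px * px + py * py) / (2.0 * sigma * sigma))

center = splat_alpha(0.5, 0.5)  # alpha is 1.0 at the splat center
edge = splat_alpha(1.0, 0.5)    # alpha falls off toward the sprite edge
```

Smaller `sigma` gives a tighter, harder‑edged splat; larger values blur neighboring splats together.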

 

Why use `TDPointCoord()`? The TouchDesigner documentation notes that point sprites should call `TDPointCoord()` instead of `gl_PointCoord` to obtain 0–1 texture coordinates; `gl_PointCoord` can return vertically flipped coordinates.  Also remember to write to `gl_PointSize` in the vertex shader when rendering point sprites.

What works so far

  • Live AI “skin”: StreamDiffusion continually generates new frames that are mapped onto the Kinect point‑cloud in real time, creating a living “skin” over the depth capture.
  • Stable textures: A simple cache/switch network ensures the diffusion TOP always outputs a valid frame, preventing the magenta warning frames that appear when a texture is invalid.
  • UV mapping & Gaussian falloff: UVs derived from the depth grid map 1 : 1 onto the diffusion image.  The Gaussian function in the fragment shader produces soft, circular splats that blend smoothly when points overlap.

Areas for improvement

  • Visual quality: The look of the point‑cloud depends heavily on the per‑point size and falloff width.  The `uPointSize` and `sigma` uniforms control these parameters; experimenting with different values can produce denser or softer results.
  • Performance: With ~64 k points the system runs around 10–15 fps.  Possible optimizations include lowering the depth resolution, down‑sampling before instancing, or rewriting the pipeline using compute shaders to offload work from the CPU.  StreamDiffusion itself benefits from high‑end GPUs.
  • Depth shading: Currently all splats are the same size and color regardless of distance.  Adding depth‑dependent adjustments (for example, scaling `gl_PointSize` or modulating color based on depth) could give stronger depth cues.
  • Input variability: Real‑time diffusion works best when the input has sufficient texture and structure for the model to latch onto.  The StreamDiffusion article explains that if your input is just lines or contours, the output may look like abstract fragments; adding noise and texture to the input helps the model generate richer images.  Applying pre‑processing (blur and noise) to the Kinect depth map before feeding it into the diffusion pipeline may improve results.
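For the depth‑shading idea, one simple scheme is to shrink `gl_PointSize` inversely with distance so splats keep a roughly constant world‑space footprint. A Python sketch of that mapping (the base size, reference depth, and clamp values are hypothetical parameters, not from the network):

```python
def depth_scaled_point_size(depth_m: float,
                            base_size_px: float = 12.0,
                            ref_depth_m: float = 1.0,
                            min_px: float = 1.0,
                            max_px: float = 48.0) -> float:
    """Scale a splat's pixel size inversely with depth (perspective-like),
    clamped to a sane range. Would map to gl_PointSize in the vertex shader."""
    size = base_size_px * ref_depth_m / max(depth_m, 1e-6)
    return min(max(size, min_px), max_px)

near = depth_scaled_point_size(0.5)  # closer points get larger splats
far = depth_scaled_point_size(4.0)   # distant points shrink
```

The clamps matter in practice: very near points can otherwise produce enormous sprites that tank fill‑rate.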

Using the component

To try this yourself:
  • Load your Kinect depth point‑cloud: Use the Kinect TOP and convert it to CHOP channels with a TOP to CHOP.  Make sure the channels correspond to position (`r`, `g`, `b`) and compute UVs based on `uGridDim`.
  • Swap in the StreamDiffusion texture: Replace `skinTex` in the GLSL material with your live StreamDiffusion TOP.  The cache/switch network ensures a valid frame each cook.
  • Adjust uniforms: Use the Vectors and Samplers pages of the GLSL MAT to modify `uGridDim`, `sigma`, `uPointSize`, and assign the correct texture sampler.  `uGridDim` should match the Kinect depth resolution (512 × 424 for Kinect v2); `sigma` controls the splat falloff; `uPointSize` sets the splat size in pixels.
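When tuning `sigma`, it can help to think in terms of the splat's visible width rather than `sigma` itself. For the Gaussian used in the shader, the full width at half maximum (FWHM) relates to sigma by FWHM = 2·√(2 ln 2)·σ ≈ 2.355·σ, in 0–1 point‑sprite coordinates. A small helper, as a sketch (the function name is illustrative):

```python
import math

def sigma_for_fwhm(fwhm: float) -> float:
    """Convert a desired splat full-width-at-half-maximum (in 0-1
    point-sprite coords) to the sigma uniform used by the shader."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

# A splat whose alpha drops to half at +/- 0.25 from center (FWHM = 0.5):
sigma = sigma_for_fwhm(0.5)
half = math.exp(-(0.25 ** 2) / (2.0 * sigma * sigma))  # 0.5 by construction
```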

I’m just starting to find my footing with point clouds and depth data, and I’d love to connect for feedback, knowledge‑sharing, or collaborative ideas, particularly around performance, compute‑shader techniques, three‑point theatrical lighting, or other depth‑aware shading strategies.
