Hitting bottleneck controlling close to 4,000 LEDs !

Postby lucasm » Wed Oct 15, 2014 1:59 am

Hey everyone,

I've been working on a modular LED video panel system for several months, and despite being close to the end I've hit a rather large roadblock. I'm at a loss as to what to do next.
The setup works great up to maybe 1,000 LEDs; beyond that the slowdown gets quite noticeable.

In Short:

I'm trying to find a way to either optimize sending 11,520 bytes every frame (roughly 690 kbytes a second) over serial, or switch to another protocol or approach altogether.
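A quick sanity check of those figures, as a minimal sketch. The 3,840 LED count comes from later in the thread; the 60 fps target is an assumption, chosen because 11,520 x 60 matches the quoted ~690 kbytes a second.

```python
# Back-of-the-envelope data rate for the LED wall described above.
NUM_LEDS = 3840        # 11,520 bytes / 3 bytes per LED
BYTES_PER_LED = 3      # one byte each for r, g, b
TARGET_FPS = 60        # assumed target frame rate

bytes_per_frame = NUM_LEDS * BYTES_PER_LED
bytes_per_second = bytes_per_frame * TARGET_FPS

print(bytes_per_frame)   # 11520
print(bytes_per_second)  # 691200
```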

I've read through the forums in search of other similar issues and have only found this:
http://derivative.ca/Forum/viewtopic.php?f=4&t=6075
A bit more detail on the setup:

Hardware:

- Windows 8
- Teensy 3.1 (Arduino-based microcontroller with full-speed USB serial communication)
- OctoWS2811 (high-speed LED library for the Teensy)
- Adafruit NeoPixel strips

My setup:

1) A TouchDesigner instance manages the UI and creates the animations. It sends the animation data, pre-mapped in a table (3 rows, r/g/b; the number of columns equals the number of addressable LEDs), out through a Touch Out DAT.

2) A second TouchDesigner instance receives the table DAT and focuses solely on formatting that table data into several compact byte strings, which are sent using Python and the serial.sendBytes() command.

A side note here: I have to split the LED bytes into chunks or packets of 255 or fewer due to a built-in limit that sendBytes has. Is there a way around this limit? I've successfully sent entire byte strings using Processing and it's considerably faster (35-45 fps).
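The chunking described in this side note can be sketched as follows. This is a minimal illustration, not the poster's code: `split_packets` and its parameters are hypothetical names, the 255-byte ceiling is taken from the post, and handing each packet to sendBytes() is left out.

```python
def split_packets(payload, max_packet=255, bytes_per_led=3):
    """Split a flat byte payload into packets no longer than max_packet.

    The packet length is rounded down to a multiple of bytes_per_led so
    that no LED's r/g/b triple straddles two packets.
    """
    size = (max_packet // bytes_per_led) * bytes_per_led
    return [payload[i:i + size] for i in range(0, len(payload), size)]


frame = bytes(11520)  # one blank frame for 3,840 LEDs
packets = split_packets(frame)
print(len(packets), len(packets[0]), len(packets[-1]))  # 46 255 45
```

Keeping packets aligned to whole LEDs means the receiving side never has to stitch a colour back together across two reads.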

3) The byte string is received over the serial COM port by the Teensy. At this point it's pre-mapped, and each group of 3 values is applied to the LEDs more or less 1:1.

Python Code:

import time

def receive(dat, rowIndex, message, bytes):
   # Runs inside TouchDesigner, which supplies the globals `me` and `op`.
   print("asd")  # leftover debug print; it runs on every call
   thisTeensysName = me.name.split("_")[0]
   panelEnabled = op("serialCom_enabler")["%s_enableLED" % thisTeensysName]

   if panelEnabled == 1:

      n = op("%s_dataEnd" % thisTeensysName)
      serialConnectorName = "%s_serialConnector" % thisTeensysName
      serialToggleVal = op("serialCom_enabler")["serialToggle"]

      packetSize = 255

      # When this is set to 2, the time spent in each stage is printed (ms).
      debugCharLength = 0

      if debugCharLength == 2:
         millis0 = int(round(time.time() * 1000))

      executeArray = []  # finished packets
      subExArray = []    # packet currently being filled

      numSamps = n.numSamples

      ch_r = n['r']
      ch_g = n['g']
      ch_b = n['b']

      if debugCharLength == 2:
         millis1 = int(round(time.time() * 1000))
         print("      Initialization Stuff TIME: %i" % (millis1 - millis0))

      # Append r, g, b for each LED, starting a new packet before the
      # current one would outgrow packetSize.
      for i in range(numSamps):
         if (len(subExArray) + 5) >= packetSize:
            executeArray.append(subExArray)
            subExArray = []
         subExArray.append(int(ch_r[i]))
         subExArray.append(int(ch_g[i]))
         subExArray.append(int(ch_b[i]))
      executeArray.append(subExArray)

      if debugCharLength == 2:
         millis2 = int(round(time.time() * 1000))
         print("      Building Exec Arrays TIME: %i" % (millis2 - millis1))

      if int(serialToggleVal) == 1:
         serialConnector = op(serialConnectorName)
         for subArray in executeArray:
            # Unpack the list into arguments instead of building a
            # string for exec(); the resulting call is the same.
            serialConnector.sendBytes(*subArray)

      if debugCharLength == 2:
         millis3 = int(round(time.time() * 1000))
         print("               Serial Send TIME: %i" % (millis3 - millis2))
   return
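For reference, the packing step of the callback above can be tried outside TouchDesigner with a sketch like this one. It is an illustration under assumptions: plain Python lists stand in for the `r`, `g`, `b` channels, `interleave_rgb` is a hypothetical helper, and the clamping to 0-255 is an addition that the posted code does not have.

```python
def interleave_rgb(r, g, b):
    """Return one flat bytes object ordered r0, g0, b0, r1, g1, b1, ..."""
    def to_byte(value):
        # Truncate like int() in the posted loop, then clamp (added safeguard).
        return max(0, min(255, int(value)))

    return bytes(to_byte(v) for rgb in zip(r, g, b) for v in rgb)


frame = interleave_rgb([255, 0, 10.7], [0, 128, 20.2], [0, 0, 300])
print(list(frame))  # [255, 0, 0, 0, 128, 0, 10, 20, 255]
```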




My next course of action is trying out a couple of FadeCandys, using 1 per panel instead of 8 panels per Teensy.

I'm not even sure this will remove the bottleneck, since just as many bytes will need to be written, just to different devices. But these devices are highly optimized, so it's worth a shot I guess.

Any ideas, thoughts, or criticisms are hugely appreciated! I really want to get this part ironed out.
lucasm
 
Posts: 223
Joined: Sat Apr 28, 2012 7:55 pm
Location: Dallas, TX

Postby malcolm » Wed Oct 15, 2014 10:40 am

Have you measured the performance without print statements? Those can be quite heavy. We'll try to take a look at this though.
malcolm
Staff
 
Posts: 4203
Joined: Tue Nov 13, 2007 1:11 am

Postby lucasm » Wed Oct 15, 2014 1:11 pm

Hi Malcolm,

Yep, I always check both when testing. Even with them off though, writing a lot via serial like this gets the OP's cook time into the 20, 30, and sometimes even 40 ms range.
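To put those cook times in perspective, here is a small sketch of the frame-rate ceiling each one implies. It assumes, as a simplification, that this one cook is the only cost in the frame, so the real rate can only be lower. `max_fps` is a hypothetical helper.

```python
def max_fps(cook_ms):
    """Upper bound on frame rate if one cook takes cook_ms milliseconds."""
    return 1000.0 / cook_ms


for cook_ms in (20, 30, 40):
    print(f"{cook_ms} ms cook -> at most {max_fps(cook_ms):.1f} fps")
# 20 ms cook -> at most 50.0 fps
# 30 ms cook -> at most 33.3 fps
# 40 ms cook -> at most 25.0 fps
```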

Postby lucasm » Wed Oct 15, 2014 3:03 pm

I just ordered a Pixel Pusher board as well as a few FadeCandys to try out, as I've read some really great things about driving lots of LEDs with both, although I'm leaning heavily towards Pixel Pusher at the moment.
Does anyone have any positive (or not so great) experience with either?

From the research I've done:

Pixel Pusher:

- Communicates via Ethernet

- Can control up to 3,840 pixels per board (480 on each of 8 channels, which fits my setup exactly) @ 60 Hz. This claim is from their website; however, they are quite negative about NeoPixels being slow, so I'd have to do some testing to find out whether this refresh rate holds for NeoPixels too.

- Has support for receiving a TOP's video stream via the Spout protocol in Processing (TouchDesigner has direct support for this, wooo!). This means, though, that the pre-mapped LED data I generate in Touch for the custom panels would have to be remapped to a TOP in such a way that it looks mapped again when applied to the LEDs. Not impossible, but it would take some noodling.

- Potentially even better: it has support for Art-Net via a Java Art-Net bridge app, which allows Touch to talk to the Pixel Pusher/LEDs without going through Processing (going through the Java bridge app instead).

Fade Candy:

- Communicates via USB

- Can control up to 512 pixels per board via 8 channels

- Would require one FadeCandy microcontroller per panel, and each panel to consist of 8 "strips" making up 480 pixels rather than 1 "strip" of 480 pixels. This would void a lot of the work I've done, though :/

- Would have to configure RGB data to go from Touch -> Processing (FadeCandy library) -> FadeCandy board -> LEDs

- I would need 16 USB connections to the computer to control the 16-panel setup at its largest configuration... that might cause a world of problems of its own. Thoughts?
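The board counts implied by the two lists above can be worked out with a short sketch. The per-board limits, panel size and panel count are the figures quoted in this post. Dividing FadeCandy's 512 pixels evenly across its 8 channels is an assumption, and the Pixel Pusher count is based on pixel totals only, ignoring wiring constraints.

```python
import math

PANELS = 16
PIXELS_PER_PANEL = 480

PIXEL_PUSHER_MAX = 3840   # per board: 8 channels x 480 pixels
FADECANDY_MAX = 512       # per board, over 8 channels
FADECANDY_CHANNELS = 8

total_pixels = PANELS * PIXELS_PER_PANEL
pixel_pusher_boards = math.ceil(total_pixels / PIXEL_PUSHER_MAX)
fadecandy_boards = PANELS  # one board per panel, as described above
fadecandy_per_channel = FADECANDY_MAX // FADECANDY_CHANNELS  # assumed even split
strip_length = PIXELS_PER_PANEL // FADECANDY_CHANNELS

print(total_pixels)           # 7680
print(pixel_pusher_boards)    # 2
print(fadecandy_boards)       # 16
print(strip_length, "<=", fadecandy_per_channel)  # 60 <= 64
```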




I'm going to keep this thread going with findings and results, as I've had trouble finding much info on this topic. If anyone else out there has anything to throw in please do!

Postby FM64 » Fri Oct 17, 2014 12:33 pm

Hi,
I'm stuck with a very similar problem: I want to control around 2,000 LEDs and I can't find an efficient way to deal with this huge amount of data.
I'm using 3 Art-Net controllers with 6 outputs each, and each output is assigned to a DMX universe. There are about 120 LEDs on each controller output.

My workflow is organized as follows:

[Generate a TOP to represent the ~120 LEDs of one output (resolution 120x1)] ==> [convert the TOP into a CHOP] ==> [reorder the channels (G0 R0 B0 G1 R1 B1 etc.)] ==> [send the channels via Art-Net through a DMX Out CHOP]
This chain is repeated 18 times (once per output).
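The reorder step in that chain amounts to interleaving the three colour channels in G, R, B order. A minimal sketch in plain Python, with lists standing in for CHOP channels and `to_grb_channels` as a hypothetical name, also confirms that one output fits in a single 512-channel DMX universe:

```python
DMX_UNIVERSE_SIZE = 512  # channels available in one DMX universe


def to_grb_channels(r, g, b):
    """Interleave per-LED values as G0 R0 B0 G1 R1 B1 ... for one output."""
    channels = []
    for red, green, blue in zip(r, g, b):
        channels.extend((green, red, blue))
    return channels


print(to_grb_channels([10, 11], [20, 21], [30, 31]))  # [20, 10, 30, 21, 11, 31]

leds_per_output = 120
print(leds_per_output * 3 <= DMX_UNIVERSE_SIZE)  # True (360 channels)
```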

I spent a lot of time trying many solutions to optimize the workflow, but none worked. I inspected the performance monitor to find out which OPs are slow to cook, and it appears that the problem mainly comes from the TOP to CHOPs, the Reorder CHOPs and the DMX Out CHOPs.

I'm a bit desperate. I don't know how to rethink the process, and even when the fps drops to 15, neither my GPU nor my CPU seems to be overloaded (I checked the CPU/GPU load monitor while running TD).

Any ideas ?
FM64
 
Posts: 28
Joined: Thu Nov 28, 2013 5:44 pm

Postby lucasm » Fri Oct 17, 2014 10:52 pm

Yikes,

I can't speak for the DMX Out / Art-Net speeds as I haven't been able to get that to work anyway, but you SHOULD be able to get better speeds for the other OPs.

Right now in my setup, I have a huge Line SOP that represents my LED locations (each point a physical location), and I use that in conjunction with a TOP to CHOP to sample specific coordinates on the TOP.

These values are passed through a few other CHOPs performing some math, then that CHOP is passed into a CHOP to TOP, generating a very wide, 1-pixel-tall image like you have (3,840 x 1).

Going from CHOP to TOP, my cook time never goes above .5 ms for 3 channels at 3,840 samples, and the rest of the network related to converting the data around is equally quick.
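The idea of sampling an image at each LED's location can be sketched outside TouchDesigner like this. It is only an illustration of the lookup, not of how the TOP to CHOP does it: the image is a nested list of rows, LED positions are normalized (u, v) pairs, (0, 0) is assumed to map to the first row and column, and `sample_leds` is a hypothetical name.

```python
def sample_leds(image, led_uvs):
    """Return the pixel nearest to each LED's normalized (u, v) position."""
    height = len(image)
    width = len(image[0])
    samples = []
    for u, v in led_uvs:
        x = min(width - 1, int(u * width))    # clamp so u == 1.0 stays in range
        y = min(height - 1, int(v * height))
        samples.append(image[y][x])
    return samples


image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
print(sample_leds(image, [(0.0, 0.0), (0.75, 0.0), (1.0, 1.0)]))
# [(255, 0, 0), (0, 255, 0), (255, 255, 255)]
```

The result is one colour per LED in strip order, which is the shape of data the wide 1-pixel-tall image carries.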

You said you are repeating that chain 18 times. I think you might get better speeds if you combine the data for as much of that chain as possible and split it up again at the end.

With CHOPs you can use Trim to select a certain span of your CHOP. TOPs you can selectively crop, etc. Do this as close to your DMX Outs as possible, as I think one OP can process a lot of data internally better than a lot of OPs can each process less data... I think.
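The "combine first, split late" suggestion looks roughly like this in plain Python. It is a sketch only: the list comprehension stands in for whatever shared processing the chain does, and `split_late` is a hypothetical helper playing the role of the per-output Trim.

```python
def split_late(samples, outputs, per_output):
    """Slice one long processed buffer into equal per-output spans."""
    if len(samples) != outputs * per_output:
        raise ValueError("buffer length does not match outputs * per_output")
    return [samples[i * per_output:(i + 1) * per_output] for i in range(outputs)]


OUTPUTS = 18
LEDS_PER_OUTPUT = 120

# One pass over all 2,160 samples (stand-in for the shared processing)...
all_leds = [min(255, v * 2) for v in range(OUTPUTS * LEDS_PER_OUTPUT)]

# ...then split only at the very end, right before sending.
spans = split_late(all_leds, OUTPUTS, LEDS_PER_OUTPUT)
print(len(spans), len(spans[0]))  # 18 120
```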

Maybe someone else can chime in on this?

Postby lucasm » Fri Oct 17, 2014 11:07 pm

As far as my findings go with my project, I've definitely decided on going with Pixel Pusher.

I got mine in the mail today. It's really optimized for communicating with large numbers of LEDs and works smoothly for the most part.

Good News:

Pixel Pusher's Processing sketch for Spout (Touch has a Spout Out TOP) works incredibly well. With cook times of around .5 ms in TouchDesigner and way above 30 fps in Processing, the whole thing is really robust.
This is good news for scalability.

The Bad News:

According to Jas @ heroicrobotics, NeoPixels are extremely slow (comparatively) and perform badly with their device. They claim that the driver is about an order of magnitude slower than anything else they support, and that depending on when you order them, the LEDs will actually behave differently, sometimes not working at all, because of manufacturing differences.

I confirmed this with the 4 panels I've built so far: 1 of them bugged out in a totally different way than the other 3, and they were all built within weeks of each other.

Anyway,
to further complicate my own situation at least, the 60-LED-per-meter flavor of the strip that works well (APA102) is hardly sold anywhere, so fun times! I get to redesign quite a lot of 3D-printed parts.
:cry:

I'm sure I'll have more to report on this soon.

Postby lucasm » Sat Oct 18, 2014 5:55 am

Well I'm happy to say I've made some progress in the frame rate department:

TL;DR:
I managed to get my fps to 25 using Touch's Spout Out -> Processing Spout In -> Processing Serial Out.

Here's a video showing it all working:
http://youtu.be/UEz1eDLUXdE

I was fooling around with Pixel Pusher, and ended up tearing the DLL and code for Spout out of the Processing sketch they had been incorporated into, and repurposed them to receive frames from Touch over Spout but send that data out over serial to the LEDs.

Since Processing is able to batch-send the entire set of data at once (or at least it appears to from a higher level), the fps was a solid 25 the whole time. Not quite 30, but it looks pretty good to the eye.

That's with Touch and Processing moving 3,840 pixels' worth of data, too. I'll post some more info on the how for those who are interested soon. Going to clean up some code first.

Postby lucasm » Sun Oct 19, 2014 6:31 am

Hey everyone,

I've put together a collection of attachments with everything you might need to get rolling with a large number of LEDs.

1) Touch example (barebones)

2) Processing middle man sketch

3) Arduino / Teensy 3.1 sketch

If you're driving 3,840 LEDs you can expect roughly 25 fps. The fewer you use, the faster the speed you can expect, provided you configure the Processing and Arduino sketches accordingly!
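As a very rough guide to how the frame rate might scale with fewer LEDs, here is a sketch that assumes the link is purely throughput-bound, i.e. that the bytes per second achieved at 3,840 LEDs and 25 fps stay constant. That is an assumption, not a measurement; per-frame overhead and the LEDs' own refresh time are ignored, and `estimated_fps` is a hypothetical helper.

```python
BYTES_PER_LED = 3
MEASURED_LEDS = 3840
MEASURED_FPS = 25

# Throughput implied by the reported result: 288,000 bytes per second.
throughput = MEASURED_LEDS * BYTES_PER_LED * MEASURED_FPS


def estimated_fps(num_leds):
    """Estimate fps for num_leds, assuming constant bytes-per-second throughput."""
    return throughput / (num_leds * BYTES_PER_LED)


print(throughput)           # 288000
print(estimated_fps(1920))  # 50.0
```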

I'm sure these aren't anywhere near perfect, but it works and it's quite robust in my experience. Hope it helps!
Attachments
touch_to_serialArduino_11_octows8211Lib.zip
Arduino sketch to use with a Teensy 3.1 and the OctoWS2811 library; receives serial frame data and applies it directly to the LEDs with a gamma fix only.
(1.56 KiB) Downloaded 272 times
ReceiveFrames_R3_03.zip
Processing sketch to receive frames from Touch over Spout and send them via serial to the Teensy / Arduino.
(34.33 KiB) Downloaded 269 times
simpleLedMappingAndDrivingWorkflow.toe
Touch TOE showing a simple example of an LED mapping/driving workflow.
(20.04 KiB) Downloaded 371 times

Postby FM64 » Sun Oct 19, 2014 2:40 pm

Wooooow,
thanks a lot for sharing this stuff, it helps me so much! The way you deal with the LED mapping is so much more effective and smart than mine; I'll redo the whole project :).
Thanks to your advice I finally managed to drive all the LEDs @ 45-50 fps, but I think I could optimize even more. Like in your example, I'm using a Shuffle CHOP to split all the samples in order to send them to the DMX Out CHOPs, but I must use a Reorder CHOP for each DMX output (18x) to put the channels in the right order (G0 R0 B0 G1 R1 B1...).
Each Reorder CHOP takes 0.35 ms to cook this task. Isn't that a bit long for such a simple operation?
If I convert the data to a DAT to reorder the channels, it's much quicker, but then the conversion back to CHOP is soooooooo slow.
I'm wondering about a solution to directly split the samples in the right order, or maybe use a CPlusPlus CHOP to write an optimized reorder...
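To gauge how much the reorder step costs overall, here is a small sketch that adds up the quoted cook times. It assumes all 18 Reorder CHOPs cook one after another every frame, which is an assumption about the network rather than something stated above.

```python
REORDER_COOK_MS = 0.35   # per Reorder CHOP, as quoted above
OUTPUTS = 18

total_ms = REORDER_COOK_MS * OUTPUTS
print(f"all reorders: {total_ms:.1f} ms per frame")  # all reorders: 6.3 ms per frame

for fps in (45, 50):
    budget_ms = 1000 / fps
    print(f"frame budget at {fps} fps: {budget_ms:.1f} ms")
# frame budget at 45 fps: 22.2 ms
# frame budget at 50 fps: 20.0 ms
```

Under that assumption the reorders take up a sizeable slice of each frame, which is why a single-pass reorder looks attractive.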