Frame Buffer Sharing Performance

Hello Everyone!

Before I put in the work to do it myself, I was curious whether anyone has profiled the various methods of sharing the frame buffer of a TOP between processes on the same physical hardware. Below are the options I am considering, and I would like to hear any experiences people have had with them. One caveat: I plan to have 100+ TOPs sharing their data out, but each TOP will be pretty low resolution (i.e. less than SD), so I place a premium on low-overhead solutions.

DirectX Out - It feels a bit inefficient to share the data through DirectX given that it originates in OpenGL. But perhaps this is a zero-copy solution?

Syphon/Spout - I’ve found this relatively unreliable on Windows in production when lots of textures are being shared.

SharedMem - Goes through CPU memory, so that’s a non-starter.

Custom C++ TOP with CUDA - Transfer the data to my process via CUDA IPC.
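For a rough sense of scale when comparing these options, here is a back-of-the-envelope estimate of the raw pixel traffic. It is only a sketch under assumed numbers: 640x480 RGBA8 at 60 fps is my stand-in for "less than SD", and the function name is made up for illustration.

```python
def shared_bandwidth_bytes_per_sec(num_tops, width, height,
                                   bytes_per_pixel=4, fps=60):
    """Raw pixel bytes per second if every TOP is copied once per frame.

    All defaults are assumptions (RGBA8, 60 fps), not measured values.
    """
    bytes_per_frame = width * height * bytes_per_pixel
    return num_tops * bytes_per_frame * fps


if __name__ == "__main__":
    # Hypothetical workload: 100 TOPs at 640x480.
    total = shared_bandwidth_bytes_per_sec(100, 640, 480)
    print(f"{total / 1e9:.2f} GB/s")  # prints "7.37 GB/s"
```

This counts a single copy per frame per TOP; any method that copies more than once (for example GPU to CPU and back) multiplies the figure accordingly.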

Thanks,
Matt

Spout also uses DirectX, just through a more standard API. By default it is limited to 10 textures, but you can raise that limit by setting a registry value.

What GPU/range? Nvidia added a set of copy functions to OpenGL, all of which do very fast copies, including, in theory, peer-to-peer PCIe copies. (I wonder if their new NVLink stuff will work with the same API? :nerd: )

Bruce

@bwheaton we are using Nvidia cards, but we can build a custom computer with whatever the task needs. Is there a reference you can point me to on the copy functions? Do they work inter-process?

Thanks!
Matt

khronos.org/registry/OpenGL … _image.txt