I compiled the OpenCV 4.2 beta, because its dnn module can now partly run on CUDA.
I used the Kinect body index for background subtraction (the quality is not great, but there was little light when I tested) and added a background image.
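For anyone wondering what the body index plus background step can look like, here is a minimal CPU sketch, not my actual TOP code. It assumes the Kinect v2 SDK convention where a body index value of 255 means "no body", and that the color image, body index and background are already at the same resolution. The function name `compositeOverBackground` and the constant `kNoBody` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: Kinect v2 SDK convention, 255 marks "no body" in the body index frame.
constexpr std::uint8_t kNoBody = 255;

// Keeps the color pixels that belong to a body and fills everything else
// with the matching pixel of the background image.
// `color` and `background` are interleaved buffers with `channels` bytes per pixel;
// `bodyIndex` holds one byte per pixel at the same resolution.
std::vector<std::uint8_t> compositeOverBackground(
    const std::vector<std::uint8_t>& color,
    const std::vector<std::uint8_t>& bodyIndex,
    const std::vector<std::uint8_t>& background,
    std::size_t channels)
{
    assert(channels > 0);
    assert(color.size() == bodyIndex.size() * channels);
    assert(background.size() == color.size());

    // Start from the background, then copy body pixels on top of it.
    std::vector<std::uint8_t> out(background);
    for (std::size_t p = 0; p < bodyIndex.size(); ++p) {
        if (bodyIndex[p] != kNoBody) {
            for (std::size_t c = 0; c < channels; ++c) {
                out[p * channels + c] = color[p * channels + c];
            }
        }
    }
    return out;
}
```

The mask is binary here, so the edges will be as rough as the body index itself.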
With my GTX 1060 I get 3-5 FPS at 1280x720, depending on the pretrained model.
I hit 10-14 FPS at 320x180.
I added an option to drop frames to increase FPS.
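The frame dropping idea can be sketched roughly like this. This is an illustration under my own assumptions, not the exact code of the TOP: the class name `FrameDropper` and the `keepEvery` parameter are hypothetical, and it simply runs inference on one frame out of every `keepEvery` so the previous result can be reused in between.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decides which incoming frames are sent to inference.
class FrameDropper {
public:
    // keepEvery = 1 processes every frame, 3 processes one frame out of three.
    explicit FrameDropper(std::uint32_t keepEvery) : keepEvery_(keepEvery) {
        assert(keepEvery >= 1);
    }

    // Call once per incoming frame.
    // Returns true when this frame should be processed, false when it is dropped.
    bool shouldProcess() {
        const bool process = (counter_ == 0);
        counter_ = (counter_ + 1) % keepEvery_;
        return process;
    }

private:
    std::uint32_t keepEvery_;
    std::uint32_t counter_ = 0;
};
```

In a cook callback you would call `shouldProcess()` once per frame and, when it returns false, output the last processed result again instead of running the network.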
I prefer using a C++ TOP because it is faster than Python, even though it relies on external dependencies. What's more, it's only one TOP, with no need for a Python script.
And OpenCV dnn with CUDA gives great results.