GPU to CPU fastest transfer

Hi, I have a project that will use a variety of tools, including Notch. I am looking at various hardware solutions, but part of what I will need to do, irrespective of hardware, is create images on the GPU. Because I need to send almost all of the pixels over sACN or Art-Net, I have to download the texture data from the GPU to the CPU before it goes out over the network. I have a lot of hardware options and a good budget, but testing on even some high-end systems shows that this process is slow and varies depending on software and hardware. Does anyone have experience with various GPUs and CPUs and getting large amounts of data from the GPU to the CPU? I know that GPU memory speed and motherboard bus bandwidth play a role, but so do hardware implementations that differ between motherboards, chipsets and CPUs (AMD, AMD HEDT, Intel Core series and Intel Xeon), and also GPU architecture (Quadro, RTX and AMD).

If anyone has advice on the fastest hardware for this particular purpose, let me know. For reference, I have tried a 1080 Ti with a Core i7 on PCIe 3.0 at x8 and x16, and a 2080 Ti with a Threadripper on PCIe 4.0 at x16. The fastest I can get is 25 fps. I get this with either setup but not faster, and I would like to run the project at 50 fps for the video parts (obviously Art-Net is going to top out at 44 fps). Various software solutions give me the same results. I have tested with Art-Net in Notch and various media servers.

Any advice is greatly appreciated.


You may get better bandwidth using the pro-level cards (Quadro etc.) instead of the GeForce cards. How much data are you talking about here?

However, the main problem with downloading from GPU to CPU is latency and sync, not bandwidth (unless you’re really talking about a lot of data), and that is primarily a software issue, not a hardware one. So the more important question is: what software are you using to send that GPU data over Art-Net? (Notch doesn’t support video → Art-Net directly, so I’m presuming not Notch :) )

I am trying a bunch of different things. At the moment disguise is a target, but I am also running tests with my own software written in C++/OpenGL, mainly to test hardware capabilities. I am trying to download a huge amount of data: a 32216 * 960 texture on the GPU to pixel data on the CPU. To understand the bottleneck I am using a ring of pixel buffers on the GPU to get asynchronous transfers. But it does not seem to matter what software I use; getting this much data off the GPU to the CPU with the hardware I am testing is slow. What exactly about Quadro cards do you think allows for a faster transfer? I am really looking for the specification or hardware feature that makes this faster.

In D3D the most efficient way to get an async transfer is to copy to a staging buffer (a ring buffer of staging buffers) which you then speculatively lock to test when the download is complete to avoid any syncs. GL has comparable methods.
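To make the ring idea concrete, here is a rough sketch of just the bookkeeping, modelled in plain C++ with no D3D or GL calls at all. Everything in it is hypothetical: `StagingSlot`, `ReadbackRing` and `simulate` are illustration names, and the "GPU copy" is faked as finishing a fixed number of frames after it is issued. In real code the `tryRead` check would be the non-blocking lock/map or fence query, and the latency would not be constant.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one staging buffer (D3D) or pixel buffer object (GL).
struct StagingSlot {
    bool inFlight;      // a copy was issued and has not been read back yet
    int  readyAtFrame;  // frame on which the simulated copy completes
    int  sourceFrame;   // stands in for the pixel data
};

class ReadbackRing {
public:
    // gpuLatencyFrames is an assumed, fixed copy latency used only by this model.
    ReadbackRing(std::size_t slotCount, int gpuLatencyFrames)
        : slots_(slotCount, StagingSlot{false, 0, 0}), latency_(gpuLatencyFrames) {}

    // Issue an async copy of `frame` into the next slot.
    // Returns false (frame skipped) if that slot is still in flight.
    bool issueCopy(int frame) {
        StagingSlot& slot = slots_[writeIndex_];
        if (slot.inFlight) return false;
        slot.inFlight = true;
        slot.readyAtFrame = frame + latency_;
        slot.sourceFrame = frame;
        writeIndex_ = (writeIndex_ + 1) % slots_.size();
        return true;
    }

    // Non-blocking poll of the oldest slot: never waits for the copy.
    bool tryRead(int currentFrame, int& sourceFrameOut) {
        StagingSlot& slot = slots_[readIndex_];
        if (!slot.inFlight || currentFrame < slot.readyAtFrame) return false;
        sourceFrameOut = slot.sourceFrame;
        slot.inFlight = false;
        readIndex_ = (readIndex_ + 1) % slots_.size();
        return true;
    }

private:
    std::vector<StagingSlot> slots_;
    int latency_;
    std::size_t writeIndex_ = 0;
    std::size_t readIndex_ = 0;
};

// Runs the frame loop and returns the source frames received, in order.
std::vector<int> simulate(std::size_t slotCount, int gpuLatencyFrames, int frameCount) {
    ReadbackRing ring(slotCount, gpuLatencyFrames);
    std::vector<int> received;
    for (int frame = 0; frame < frameCount; ++frame) {
        int done = 0;
        if (ring.tryRead(frame, done)) received.push_back(done);  // poll first
        ring.issueCopy(frame);                                    // then queue this frame
    }
    return received;
}
```

In this model, `simulate(3, 2, 6)` receives frames 0 to 3, each two frames late and with nothing skipped, while `simulate(2, 3, 6)` only receives frames 0 and 1 because the ring is shorter than the copy latency. The point is that the ring needs more slots than the download takes in frames, otherwise you either stall or skip frames.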

But 32216 x 960 is a lot of data. At 50 fps and 4 bytes per pixel you are looking at roughly 6.2 GB/s to download, which is going to be troublesome for anything really. That seems like a lot of data to send via Art-Net too, and to render in the first place at 50 Hz…
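For anyone who wants to plug in their own numbers, the arithmetic is simple enough to sketch. The function names here are just made up for illustration, and the figures assume an uncompressed download of every pixel on every frame.

```cpp
#include <cstdint>

// Bytes in one uncompressed frame.
constexpr std::uint64_t readbackBytesPerFrame(std::uint64_t width,
                                              std::uint64_t height,
                                              std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Bytes per second needed to download every frame at the given rate.
constexpr std::uint64_t readbackBytesPerSecond(std::uint64_t width,
                                               std::uint64_t height,
                                               std::uint64_t bytesPerPixel,
                                               std::uint64_t fps) {
    return readbackBytesPerFrame(width, height, bytesPerPixel) * fps;
}
```

With 32216 x 960 at 4 bytes per pixel, one frame is 123,709,440 bytes, so 50 fps needs 6,185,472,000 bytes per second. The 25 fps reported above works out to about 3.1 GB/s actually being downloaded.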
To improve the download, it’s worth looking at GPU-side compression before the download (and then decompressing it to send out). This is something we do with NotchLC: compressing the data using compute shaders to, say, 1/8 the size, then downloading that smaller size. It depends on what you’re trying to render, but some simple schemes like RLE could reduce the data size a lot.
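As a rough illustration of the RLE idea only, here is a minimal CPU-side sketch in C++. It is not how NotchLC works, and in practice the encode step would have to run on the GPU (for example in a compute shader) for it to shrink the download. The `Run` struct and function names are hypothetical, and pixels are assumed to be packed into 32-bit values.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One run: a 32-bit packed pixel value and how many times it repeats.
struct Run {
    std::uint32_t pixel;
    std::uint32_t count;
};

// Collapse consecutive identical pixels into (pixel, count) runs.
std::vector<Run> rleEncode(const std::vector<std::uint32_t>& pixels) {
    std::vector<Run> runs;
    for (std::uint32_t p : pixels) {
        if (!runs.empty() && runs.back().pixel == p) {
            ++runs.back().count;          // extend the current run
        } else {
            runs.push_back(Run{p, 1});    // start a new run
        }
    }
    return runs;
}

// Expand runs back into the original pixel sequence.
std::vector<std::uint32_t> rleDecode(const std::vector<Run>& runs) {
    std::vector<std::uint32_t> pixels;
    for (const Run& r : runs) {
        pixels.insert(pixels.end(), static_cast<std::size_t>(r.count), r.pixel);
    }
    return pixels;
}
```

This only pays off when the content has long flat areas: each run costs 8 bytes here, so an image where every neighbouring pixel differs would come out twice the size. That is the "depends on what you're trying to render" part.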