Ventuz's unique approach to scaling across multiple graphics cards (GPUs) can be used to increase the resolution rendered by a single machine.
This approach differs from what is usually done for games, in both the problem it addresses and the solution.
Games usually improve rendering speed on a single output.
Ventuz improves the performance per machine and per output by using multiple GPUs for multiple outputs.
Every GPU renders only the content for the outputs connected to it. With this method, Ventuz performance increases linearly with the number of GPUs. Traditional multi GPU approaches like NVLink or CrossFire do not scale well, even when using only two GPUs.
By scaling over multiple GPUs in one machine, Ventuz increases the rendering resolution over multiple outputs, exceeding 16k resolutions on a single machine.
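The per-GPU workload behind this scaling can be illustrated with a minimal sketch. The `Output` type, the example wall layout, and the use of pixel count as a workload measure are assumptions made for illustration; none of them are part of a Ventuz API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    """Hypothetical description of one display output."""
    gpu: int      # index of the GPU the output is connected to
    width: int
    height: int


def pixels_per_gpu(outputs):
    """Sum the pixels each GPU has to render for its own outputs."""
    load = defaultdict(int)
    for out in outputs:
        load[out.gpu] += out.width * out.height
    return dict(load)


# Assumed example: four GPUs, each driving four UHD outputs.
wall = [Output(gpu=g, width=3840, height=2160) for g in range(4) for _ in range(4)]
load = pixels_per_gpu(wall)
total = sum(load.values())
```

In this assumed layout every GPU renders the same number of pixels no matter how many GPUs are installed, while the total resolution grows with each GPU that is added.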
This also makes it possible to render completely different content on each GPU.
For offscreen devices like SDI, NDI, etc., it is possible to use an additional GPU that renders these outputs separately.
Beyond what can be rendered on a single machine, Ventuz can synchronize a cluster of machines to scale even further.
To enable Ventuz to render across multiple GPUs, a few steps are needed:
When multiple GPUs are installed in the machine, they appear as individual devices in the Ventuz Device Configuration. Adding outputs of a GPU puts that GPU to use. Using outputs from multiple GPUs automatically enables multi GPU rendering.
For non-GPU outputs such as SDI, NDI, Off-Screen, etc., the Render on GPU setting can be used to activate a specific GPU for rendering.
A major advantage of using a single machine with multiple GPUs instead of multiple machines is that no additional steps are needed to synchronize different Ventuz Runtimes. However, when using multiple GPUs in one machine, a synchronization board (such as the AMD S400 or Nvidia Quadro Sync) is required to synchronize the GPUs at the hardware level and achieve a smooth, stable, and synchronized result.
Output devices in the Device Configuration have a Render on GPU property. This property allows the user to select the physical GPU on which the content for that output is rendered.
GPU outputs are naturally rendered on the GPU to which the output belongs, and this is what the default setting does. It is rarely necessary to render a GPU output on a different GPU. Doing so actually lowers performance, because the rendered frames need to be transferred to the GPU that outputs the content.
For non-GPU outputs such as SDI, NDI, Off-Screen, etc., the Render on GPU property can be used to designate a specific GPU to render that output. For example, one GPU can be used for Director and its preview, while a second GPU renders an SDI output exclusively. For non-GPU outputs, the default setting renders on GPU 0.
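The default behavior described above can be summarized in a short sketch. The `OutputDevice` type and the `resolve_render_gpu` function are hypothetical names used only to model the rules in this section; they do not correspond to an actual Ventuz interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutputDevice:
    """Hypothetical model of an output in the Device Configuration."""
    name: str
    owner_gpu: Optional[int]             # None for SDI, NDI, Off-Screen, etc.
    render_on_gpu: Optional[int] = None  # None stands for the default setting


def resolve_render_gpu(device):
    """Return (gpu, needs_transfer) following the defaults described above."""
    if device.render_on_gpu is not None:
        gpu = device.render_on_gpu   # explicit user choice
    elif device.owner_gpu is not None:
        gpu = device.owner_gpu       # GPU outputs default to their own GPU
    else:
        gpu = 0                      # non-GPU outputs default to GPU 0
    # A GPU output rendered elsewhere needs its frames copied to the owning GPU.
    needs_transfer = device.owner_gpu is not None and gpu != device.owner_gpu
    return gpu, needs_transfer
```

For example, `resolve_render_gpu(OutputDevice("SDI", None, render_on_gpu=1))` assigns the SDI output to the second GPU without any transfer, whereas moving a display output away from its own GPU sets `needs_transfer`, which corresponds to the performance cost mentioned above.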
Note that Off-Screen Shared Surfaces used for preview in Director need to be rendered on the same GPU where the Director application is shown.
The Preview Window can also be rendered on a specific GPU. The default setting renders the window on the GPU on which it is currently shown. Note: to ensure the best possible performance, the Preview Window only shows content that is rendered on the same GPU as the Preview Window itself.
In Ventuz 7, it is not necessary to create a display group using Nvidia Mosaic or AMD Eyefinity to output across multiple GPU outputs, as was required in earlier versions of Ventuz. Display groups created with these GPU features still work if this is the desired workflow. Creating a display group for the outputs of one GPU can increase stability by reducing the number of render queues that Ventuz has to handle.
Spanning a display group across multiple GPUs will not enable Ventuz multi GPU rendering. The graphics driver presents each display group to Ventuz as a single output, so Ventuz sees and processes it as one output. A display group that spans multiple GPUs therefore uses the rendering performance of only one GPU, while the driver distributes the image across the GPUs; this only increases the number of outputs. This is a result of how the GPU drivers handle these features.
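The effect of display groups on the number of rendering GPUs can be modeled in a few lines. This is a simplified sketch: the function name, the data layout, and the choice of the lowest GPU index as the rendering GPU of a group are assumptions, since the driver decides which GPU actually renders a group.

```python
def rendering_gpus(physical_outputs, display_groups=()):
    """Return the set of GPUs that render, given outputs and display groups.

    physical_outputs: dict mapping output name -> GPU index
    display_groups:   iterable of lists of output names merged by the driver
    """
    grouped = set()
    gpus = set()
    for group in display_groups:
        grouped.update(group)
        # The group is one logical output, so only one GPU renders it.
        # Picking the lowest index is an assumption of this sketch.
        gpus.add(min(physical_outputs[name] for name in group))
    for name, gpu in physical_outputs.items():
        if name not in grouped:
            gpus.add(gpu)  # ungrouped outputs render on their own GPU
    return gpus


# Assumed example: two GPUs with two outputs each.
outputs = {"A": 0, "B": 0, "C": 1, "D": 1}
```

With no groups, or with one group per GPU (`[["A", "B"], ["C", "D"]]`), both GPUs render. With a single group spanning all four outputs, only one GPU remains in the result.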
To be able to see a resource on an output that is connected to a GPU, it has to be uploaded to that GPU. In general, the Ventuz Runtime render engine distributes everything needed to render the pixels to the GPUs.
For live video inputs and some movie clips, it may be necessary to manually restrict which GPUs the assets are uploaded to, according to where they will be displayed. Restricting which GPUs receive the uploaded assets can increase the performance of the system. This is done with what we refer to as GPU Masks:
Video input devices in the Device Configuration have a Use on GPUs property.
By default, each frame of every video input used in the scene is uploaded to every GPU that is used in that machine. Uploading the frames from the input device to the GPU can be time consuming, especially for higher resolutions and frame rates, and when multiple videos or inputs are used simultaneously.
Ventuz cannot automatically determine where a video input will be shown, so by default it keeps every input ready for display on all GPUs.
In some setups it may be known that a video input will only be visible on certain outputs and not on others. With Use on GPUs, Ventuz does not send these inputs to GPUs that will not use them, which optimizes the performance of the machine. Selecting one or more GPUs here provides this input only to the selected GPUs. When these settings are changed in an existing presentation, performance testing should take place.
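A rough way to estimate the saving is to count frame uploads per second with and without a restriction. The sketch below is illustrative only; the dictionary keys `fps` and `use_on_gpus` and the function name are hypothetical and simply mirror the property described above.

```python
def uploads_per_second(inputs, active_gpus):
    """Count how many frame uploads per second the given inputs cause."""
    total = 0
    for video in inputs:
        mask = video.get("use_on_gpus")
        if mask:
            # Restricted: upload only to the selected GPUs that are in use.
            targets = set(mask) & set(active_gpus)
        else:
            # Default: every frame goes to every GPU in use.
            targets = set(active_gpus)
        total += video["fps"] * len(targets)
    return total


# Assumed example: four GPUs in use and two 60 fps inputs.
active = [0, 1, 2, 3]
unrestricted = [{"fps": 60}, {"fps": 60}]
restricted = [
    {"fps": 60, "use_on_gpus": {0}},
    {"fps": 60, "use_on_gpus": {2, 3}},
]
```

In this example the unrestricted inputs cause 480 uploads per second, while the restricted ones cause 180. Real savings depend on resolution and hardware, so the numbers only show the trend.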
Movie Nodes have a property called GpuMask. Similar to the Video Inputs, this allows a movie to be restricted to certain GPUs.
Software decoded movies, which are decoded on the CPU, have the advantage that each frame bypasses the GPU memory. In a scenario where multiple movies are played back in the scene but will, by design, only be visible on the outputs of a certain GPU, pushing all movie frames to all GPUs is unnecessary and can become a bottleneck, especially for high resolution and high frame rate videos.
For hardware decoded movies, which are decoded on the GPU itself, using the GpuMask prevents a movie from being decoded on GPUs that do not display it. Letting each GPU decode only what is needed leaves more performance for other tasks.
For high resolution video walls with high resolution video playback, it is common to pre-split the movie into parts that match the area rendered by each GPU. Setting the GPU Masks accordingly can provide a massive improvement in performance.
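The relation between pre-split movie parts and GPU Masks can be sketched as a rectangle overlap test. The bitmask encoding (bit i stands for GPU i), the rectangle format, and the function name are assumptions of this example; the actual value format of the GpuMask property is not specified here.

```python
def gpu_mask_for_tile(tile, gpu_areas):
    """Return a bitmask with bit i set if the tile overlaps the area of GPU i.

    tile and each area are (x, y, width, height) rectangles in wall pixels.
    """
    tx, ty, tw, th = tile
    mask = 0
    for gpu, (ax, ay, aw, ah) in enumerate(gpu_areas):
        overlaps = tx < ax + aw and ax < tx + tw and ty < ay + ah and ay < ty + th
        if overlaps:
            mask |= 1 << gpu
    return mask


# Assumed wall: two GPUs side by side, each rendering 7680x2160 pixels.
areas = [(0, 0, 7680, 2160), (7680, 0, 7680, 2160)]
left_tile = (0, 0, 7680, 2160)
right_tile = (7680, 0, 7680, 2160)
full_movie = (0, 0, 15360, 2160)
```

A movie covering the whole wall overlaps both areas and yields the mask `0b11`, so both GPUs would have to handle it. The two pre-split tiles yield `0b01` and `0b10`, so each GPU handles only its own part.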
Post processing effects such as Layer Effects may show the same problems that exist in Cluster Rendering. The solutions used in Cluster Rendering can also be applied in Multi GPU Rendering.
One issue can come from Layers that have effects or filters such as blurs. A blur may produce visible artifacts at the borders between discrete outputs on different GPUs. Read more about it on the Cluster page, including a list of affected effects.
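Why a blur can break at an output border is easy to reproduce in one dimension. The following sketch uses a simple box blur with edge clamping as a stand-in for a real effect; it is not how Ventuz implements Layer Effects. It compares blurring a full row of pixels with blurring two halves independently, as two GPUs without access to each other's pixels would.

```python
def box_blur(row, radius):
    """Blur a 1D row of values, clamping samples at the row's edges."""
    n = len(row)
    out = []
    for i in range(n):
        window = [row[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


# A hard edge located exactly at the border between two outputs.
row = [0, 0, 0, 0, 10, 10, 10, 10]

whole = box_blur(row, 1)                             # blurred as one image
split = box_blur(row[:4], 1) + box_blur(row[4:], 1)  # blurred per output
```

In `whole`, the edge is smoothed across the border. In `split`, each half only sees its own pixels, so the edge stays hard and a visible seam remains at the border.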
Ventuz enumerates GPUs according to the PCI slot enumeration on the motherboard; this order is not related to how Windows enumerates them. Ventuz does this to keep the overall system stable. If Windows or the GPU driver settings are changed, using the logical addresses avoids re-ordering issues.
The outputs per GPU are enumerated starting with the one closest to the motherboard.
GPUs and outputs both start with 0.
The enumeration is visible in the Device Configuration.
Additionally, GPUs and outputs can be selected and deselected with the Live Option Label Outputs while the Runtime is running.
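The idea of a stable, zero-based enumeration that ignores the operating system's order can be sketched as follows. The PCI address strings, the adapter names, and the assumption that sorting by PCI address reflects the slot order are all hypothetical; this only illustrates the principle and is not Ventuz's actual enumeration code.

```python
def pci_key(address):
    """Turn 'domain:bus:device.function' (hex) into a sortable tuple."""
    domain, bus, rest = address.split(":")
    device, function = rest.split(".")
    return tuple(int(part, 16) for part in (domain, bus, device, function))


def enumerate_gpus(adapters):
    """Assign zero-based GPU indices in PCI order, ignoring the given order."""
    ordered = sorted(adapters, key=lambda a: pci_key(a["pci"]))
    return {adapter["name"]: index for index, adapter in enumerate(ordered)}


# Hypothetical adapters, listed in the order the operating system reports them.
adapters = [
    {"name": "card-b", "pci": "0000:b3:00.0"},
    {"name": "card-a", "pci": "0000:17:00.0"},
    {"name": "card-c", "pci": "0000:65:00.0"},
]
```

Because the indices depend only on the addresses, reordering the input list, as might happen after a driver or Windows settings change, yields the same numbering.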
To use more than one GPU with Ventuz, an appropriate license is required. Please check the edition comparison table on our homepage: https://www.ventuz.com/editions/