Scene Performance


Introduction

Everyone who builds scenes using real-time graphics software like Ventuz should not only be concerned about the design but also about the scene performance.
This section of the manual provides guidelines for getting the best performance from your Ventuz scene and for avoiding performance bottlenecks and memory shortages. The first part covers general recommendations. The remaining sections deal with specific Nodes which may be critical for scene performance.

If Ventuz is to render a scene at a frame-rate of, for example, 60 fps (fps = frames per second), every frame has only 1/60 second, or about 16.7 milliseconds, for validation of the scene tree and rendering. A simple scene may take only 1 millisecond to validate and render. But as the size and complexity of a scene grow, the overall time per frame grows as well. If the time needed for validation and rendering exceeds this budget, a frame-rate of 60 fps is no longer possible. The exact level of complexity at which a scene can no longer hold the requested frame-rate also depends on the hardware (especially CPU and graphics card). But better hardware should always be the last resort for fixing bad scene performance.
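The per-frame time budget follows directly from the target frame-rate. The following minimal sketch (plain Python, not Ventuz code; the function names are made up for illustration) computes the budget and checks whether measured validation and render times fit into it. Note that 1/60 second is closer to 16.7 than to 16 milliseconds.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame-rate."""
    return 1000.0 / fps


def fits_budget(validate_ms: float, render_ms: float, fps: float) -> bool:
    """True if validation plus rendering fits into a single frame."""
    return validate_ms + render_ms <= frame_budget_ms(fps)


if __name__ == "__main__":
    for fps in (25, 30, 50, 60):
        print(f"{fps} fps -> {frame_budget_ms(fps):.2f} ms per frame")
    # A scene needing 10 ms validation and 8 ms rendering misses 60 fps
    # but still fits comfortably into a 50 fps frame (20 ms).
    print(fits_budget(10.0, 8.0, 60))
    print(fits_budget(10.0, 8.0, 50))
```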

On 32-bit operating systems Ventuz is limited to 1.38 GB memory usage! Use Ventuz x64 on 64-bit operating systems to overcome this limitation.


Simplifying the Scene Structure

As a rule of thumb the scene with best performance is an empty scene. This means that parts of a scene which are not visible should not be rendered. These subtrees should be blocked if the reason to not render them is only temporary in nature or deleted if they are never rendered.

A scene should also be built with as few nodes as possible. The number of nodes can be minimized by exploiting the fact that Ventuz is rendering using a State Engine. A Color node for example affects all nodes in its subtree, therefore multiple geometries can be assigned that color with only a single node.

The following basic examples show two different possibilities to build scene trees in the Hierarchy Editor which result in exactly the same rendering output, but are different in terms of performance.


The first hierarchy applies a separate Color and Texture node to each geometry. But this is not beneficial because both Color and Texture are the same for the geometries. The better way to build such a scene is shown below.
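The difference in node count between the two approaches can be illustrated with a toy tree model. This is only a sketch in plain Python: the `Node` class and both builder functions are hypothetical and do not correspond to any Ventuz API. It assumes that a state node (Color, Texture) affects everything in its subtree, as described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a hierarchy node; not a Ventuz class."""
    kind: str
    children: list[Node] = field(default_factory=list)


def count_nodes(node: Node) -> int:
    """Total number of nodes in the tree rooted at `node`."""
    return 1 + sum(count_nodes(child) for child in node.children)


def per_geometry_states(geometries: list[str]) -> Node:
    """One Color -> Texture chain for every single geometry."""
    return Node("Root", [
        Node("Color", [Node("Texture", [Node(g)])]) for g in geometries
    ])


def shared_states(geometries: list[str]) -> Node:
    """A single Color -> Texture chain shared by all geometries."""
    return Node("Root", [
        Node("Color", [Node("Texture", [Node(g) for g in geometries])])
    ])


if __name__ == "__main__":
    geometries = ["Cube", "Sphere", "Cylinder"]
    print(count_nodes(per_geometry_states(geometries)))  # 10 nodes
    print(count_nodes(shared_states(geometries)))        # 6 nodes
```

With three geometries the shared variant needs 6 nodes instead of 10, and the gap widens with every additional geometry.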


The same principle applies to Content nodes which provide values or resources to Hierarchy nodes. If an Axis node receives values from a Mover and the Axis is copied normally, the Mover is copied as well. But if the parameters of the second Mover remain the same, it is redundant, because the second Axis could receive its values from the first Mover:


In such cases do not use the regular copy [CTRL+C, CTRL+V] but the so-called reference copy, as described in section Moving and Copying, to decrease the time it takes to build the hierarchy. This will save memory and reduce validation time.

Node-Specific Performance Optimization

This is a set of recommendations for improving performance by using or avoiding certain nodes, typically along the lines of "minimize the use of node XYZ". This does not mean you should never use that node! For example, a RenderTarget uses a fair amount of resources. However, many effects and functions are simply not achievable without it. By all means use such nodes, merely be aware of their impact and use them wisely!


Textures and Images

There are some rules that should be considered when working with a lot of images and textures. Keep in mind that the graphics card has only a limited amount of memory which is not only used for textures but also for geometries. The textures are stored in an uncompressed format in the graphics memory. A texture with the dimension 1024 x 1024 pixels and 24 bits color depth requires about 3 MB memory on the graphics card even if it originally was a .jpeg file with a size of 200 KB.

Use textures as small as possible. Typically it does not make sense to use textures with dimensions of 2048 x 2048 if the render output only has a resolution of 1280 x 720. Use as few textures as possible. Many texture swaps in every frame have a bad impact on the performance. To avoid this, combine many small textures into one big texture and use the Mapping Node to access the required sections during rendering.
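When several small images are packed into one big texture, each image occupies a rectangular section that is addressed in normalized texture coordinates. The sketch below shows only that arithmetic in plain Python. The names `UVRect` and `atlas_uv` are made up for this example, and how the resulting values are entered into the Mapping Node is not covered here.

```python
from typing import NamedTuple


class UVRect(NamedTuple):
    """Normalized texture coordinates (0..1) of a section of a big texture."""
    u0: float
    v0: float
    u1: float
    v1: float


def atlas_uv(x: int, y: int, width: int, height: int,
             atlas_width: int, atlas_height: int) -> UVRect:
    """Convert a pixel rectangle inside the big texture to normalized coordinates."""
    return UVRect(
        x / atlas_width,
        y / atlas_height,
        (x + width) / atlas_width,
        (y + height) / atlas_height,
    )


if __name__ == "__main__":
    # Four 256 x 256 images packed into one 512 x 512 texture;
    # this is the section holding the image in the top-right quarter.
    print(atlas_uv(256, 0, 256, 256, 512, 512))
```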

When working with gradient textures where the gradient is 1-dimensional (texture color changes only in X or Y direction) it is recommended to use textures that are only 1 pixel high or wide (e.g. 512 x 1 or 1 x 512 pixels). A texture in such dimension contains all necessary gradient information. The texture is automatically stretched if it is applied to a geometry and the memory saving is enormous compared to a 512 x 512 texture.
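The memory figures quoted in this section can be reproduced with a simple calculation: width times height times bytes per pixel. The sketch below is plain Python and assumes an uncompressed texture without mipmaps or padding; the actual footprint on a given graphics card may be higher.

```python
MIB = 1024 * 1024


def texture_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Uncompressed size of a texture in bytes (no mipmaps, no padding assumed)."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    # 1024 x 1024 at 24 bits: 3 MiB, regardless of the size of the source file.
    print(texture_bytes(1024, 1024) / MIB)

    # 1-dimensional gradient: 512 x 1 instead of 512 x 512.
    strip = texture_bytes(512, 1)
    square = texture_bytes(512, 512)
    print(strip, square, square // strip)
```

Under these assumptions the 512 x 1 gradient needs 1536 bytes, a factor of 512 less than the 512 x 512 version.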

Use the DDS Loader Node to load textures and the Advanced Image Loader Node to load images. These Nodes load faster and require less memory than the Image Loader Node. The Image Loader always keeps an additional copy of the original image in memory and not only the texture which will be used by the graphics card.

Geometries and 3D Models

The primitives (Cube, Circle …) and imported 3D models should be treated in a similar way as textures. The geometries should have as few polygons as possible. Use the lowest tessellation for the primitives that still matches the visual requirements. The input properties of the primitives should not be animated if possible. The change of any input property results in a recreation of the geometry. This recreation is very CPU intensive. To animate the size of a primitive always use the Axis Node. Objects created in any modeling software should be real-time capable. Keep this in mind from the beginning of modeling. A high level of detail will reduce the scene performance.

Text Rendering

Do not treat or use Text like you would in a text editing software package. Rendering 3D text and holding it in memory, ready for any real-time changes, is a vastly different proposition from using it in a 2D text document. Add to this the fact that complete character sets can have tens of thousands of characters if loaded in their entirety, and your scene will have its hands full. A single typeface fully loaded into Ventuz might require, in the worst case, hundreds of megabytes of memory.

Both types of Ventuz Text load resources which are stored in the graphics card memory. The Texture Text loads textures and the Mesh Text loads geometries into memory. To reduce the memory consumption of your scene, use as few different fonts as possible, and always load only those character sets of a font which are really needed. Fonts like Arial, Tahoma and Times New Roman have about 2000 characters (without any East Asian, Arabic, etc. characters). It is not recommended to load the complete font if only Latin characters will ever be displayed. These are about 200 characters; the other 1800 characters would only be a waste of memory.
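One way to find out which characters a scene really needs is to collect the distinct characters from all texts that may ever be displayed. The following is a rough sketch in plain Python with made-up function names. It does not show how character sets are selected in Ventuz, and it assumes the full list of possible texts is known in advance.

```python
def required_characters(texts: list[str]) -> list[str]:
    """Distinct characters used by the given texts, sorted by code point."""
    line_breaks = {"\n", "\r"}
    return sorted(set("".join(texts)) - line_breaks)


def unused_fraction(loaded: int, needed: int) -> float:
    """Share of loaded characters that is never displayed."""
    return 1.0 - needed / loaded


if __name__ == "__main__":
    headlines = ["Goal!", "Half-time", "Final score 2:1"]
    chars = required_characters(headlines)
    print(len(chars), "".join(chars))

    # The figures from the text: about 200 Latin characters out of about 2000.
    print(f"{unused_fraction(2000, 200):.0%} of the loaded characters are unused")
```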

Note that e.g. Arial and Arial Bold are different fonts and increase the memory usage if both are used in a Ventuz scene. The same applies for font resolutions: in most cases it does not make any sense to use the same font in e.g. Medium and High resolution. Note that Mesh Text has a lower memory usage than the same font rendered as Texture text.

Texture Text is rendered faster than Mesh Text, even if the Mesh Text is not extruded. The excessive use of the Scroll Text and Text Effect nodes should be avoided because these render every single character separately and not the complete text at once.

Spread Node

There are two very important points to consider when using Spread nodes. Ignoring them may cause critical performance penalties. The first point concerns the tree in the Hierarchy Editor. The Spread node duplicates every subtree that is linked to it. To avoid unnecessary duplication of nodes, the subtree should be reduced as far as possible. The following screenshots show an incorrect and a correct use of the Spread node.


In the first version not only the Cube is duplicated but also the Color and Texture node. This is unnecessary because the Spread node does not affect the Color and Texture node. All duplicated copies remain the same. This way every copy of the Cube node creates two unnecessary nodes which only consume memory and reduce the scene performance. The correct use of the Spread is shown below. Only the Cube is duplicated because Color and Texture are applied to any copy of the Cube anyway.


The second important point for the use of the Spread is the adjustment of the Max and Count properties. The property Max determines the overall number of possible copies. This property is not linkable, so its value can only be set manually by the user. The reason is that the internal data structures of the Spread node cannot be calculated in real-time for a high number of copies.
The property Count defines the number of visible copies. Its value must be smaller than or equal to the Max value. Unlike Max, the property Count can be linked and animated. To achieve the best performance, the value for Max should be as low as possible. If Count is needed in the range between 10 and 20, then Max should be set to 20.
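Both points can be put into numbers with a small model. This is a sketch in plain Python with hypothetical function names, not Ventuz code. It assumes that every copy duplicates all nodes of the linked subtree, and it models the Count limit as a simple clamp.

```python
from typing import Iterable


def spread_node_total(copies: int, subtree_nodes: int, shared_nodes: int = 0) -> int:
    """Nodes in memory: nodes placed above the Spread exist once, the subtree once per copy."""
    return shared_nodes + copies * subtree_nodes


def smallest_max(count_values: Iterable[int]) -> int:
    """Lowest Max that still covers every Count value the scene will use."""
    return max(count_values)


def clamp_count(count: int, max_copies: int) -> int:
    """Count can never exceed Max (and never drop below zero)."""
    return max(0, min(count, max_copies))


if __name__ == "__main__":
    # Color + Texture + Cube below the Spread, 100 copies.
    print(spread_node_total(copies=100, subtree_nodes=3))
    # Only the Cube below the Spread, Color and Texture above it.
    print(spread_node_total(copies=100, subtree_nodes=1, shared_nodes=2))
    # Count animated between 10 and 20.
    print(smallest_max(range(10, 21)))
```

In this model, moving Color and Texture above the Spread reduces 300 nodes to 102 for 100 copies.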

Movie/Video Nodes

Playing movies in Ventuz should always be considered as a possible performance issue. This is because the playback of a movie requires CPU activity at two stages. In the first stage the CPU is used to decode the movie (if it exists in a compressed format like MPEG or WMV). In the second stage the CPU is used to copy the decoded movie frame data to the DirectDraw texture which is applied to a geometry.

If there is a frequent need of movies in Ventuz scenes, a Dual-CPU or a Dual-Core-CPU hardware is highly recommended. The benefit is that the workload can be distributed between CPUs (CPU-Cores). In such cases one CPU will be responsible for the decoding process and the other CPU for the data transfer to the DirectDraw texture. In addition, consider using less CPU-intensive video codecs. Decoding a WMV, for example, is comparatively CPU-intensive, whereas decoding an MPEG1 is not.

Every movie and video reserves texture memory in a multiple of the movie/video resolution.

Arrange Node

The excessive use of Arrange nodes should be avoided if possible. The Arrange node depends on the bounding boxes of subordinated nodes and therefore its use can be CPU intensive. Avoid the animation of subordinated Nodes. The animation changes the position or size of the bounding boxes and forces the Arrange node to recalculate the position of the arranged nodes.

Axis Node

Avoid the use of Axis nodes which keep the default parameters. Such Axis nodes do not perform any transformation: their internal matrix is an identity matrix, which has no effect when combined with other transformations.
Do not use the Axis for grouping subtrees; use the Group node instead, which does not affect the performance at all.
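That an identity matrix contributes nothing to a transformation can be checked with a few lines of plain Python. The sketch uses ordinary 4 x 4 matrix multiplication with a made-up translation matrix. It says nothing about how Ventuz stores or multiplies its matrices internally.

```python
Matrix = list[list[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices of the same size."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# 4 x 4 identity matrix, standing in for an Axis with default parameters.
IDENTITY: Matrix = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Example transformation: a translation by (2, 3, 4).
TRANSLATE: Matrix = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    # The multiplication costs time but leaves the transformation unchanged.
    print(mat_mul(IDENTITY, TRANSLATE) == TRANSLATE)
    print(mat_mul(TRANSLATE, IDENTITY) == TRANSLATE)
```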

Image Transforms/Filters Nodes

Do not animate the input properties of the Image Transforms and Filters nodes. Most of these nodes are not real-time capable!

Light Nodes

The speed for the calculation of the shading depends on the type of light. Directional Light provides the fastest shading. The Spot Light is second. And the Point Light provides the slowest shading.

RenderTarget Node

Avoid excessive use of RenderTarget nodes because many rendertarget changes per frame are very time consuming. Rendertarget sizes should be kept as small as possible. Do not forget that rendertargets consume memory on the graphics card. A lot of high resolution rendertargets can reduce your scene performance drastically. The maximum number of rendertargets is limited and depends on their resolution and on the amount of memory on your graphics card. Try to use rendertarget sharing to save resources (see the Render to Texture section in the Design chapter)!

BlurTexture Nodes

Avoid excessive use of the BlurTexture nodes. The process of blurring needs many calculation passes and therefore causes a lot of workload for the graphics card.

Output Node

The Output node is used inside Hierarchy Containers to enable the linkage of further Hierarchy nodes to this Container node (on the upper Container level). Note that multiple linked Output duplicates increase the render load if a subtree is linked to such a Hierarchy Container. This kind of scene setup is used for Glow or Mirror containers. Even if it seems that there is only a single subtree bound to the Hierarchy Container, every linked Output duplicate adds a further instance of this subtree to the render load and may decrease the render performance depending on the size of the subtree.

Script Node

The usage of too many Script nodes (more than roughly 50 to 100) in a scene has two disadvantages. The first disadvantage is a long loading time of such a scene. This is caused by the fact that every script is compiled to a single assembly in the first validation after scene loading. The second disadvantage affects the scene performance. Every script causes a context switch between the assemblies which is relatively slow. This results in a long validation time per frame. To avoid these problems, scripts should be combined in a single Script node where possible. Script nodes should only be used if the desired functionality cannot be achieved by combining existing Toolbox Nodes.

General Tips and Tricks