Configuration Editor:
Designer:
The performance statistics contain a lot of information. In many cases displaying it all can be a bit overwhelming and requires a lot of screen real-estate. So individual sections can be enabled and disabled and with the Frame Count the amount of visible frames can be adjusted.
The display is updated every frame. By pressing the pause key, the update of all sections can be paused.
Here is an explanation of these sections.
This gives a detailed breakdown of what happens on the main CPU thread and the GPU. If SDI boards or Webcams are in use, it will also display the timing of the Video IO thread.
The percentages are averaged over the last 64 frames. The colored bars show in more detail which part of Ventuz is performing worst. By hovering over the bars, a tool-tip explains what the bars stand for. The vertical white lines represent frames. The display is configured to show two frames, assuming you are most interested in performance when things are bad.
Please note that the performance statistics are about frames, not fields. So when running at 1080i50, expect a framerate of 25Hz, and the bars show information about rendering two fields. This is especially interesting for the video engine, as it does most of its work per frame and not per field.
In general:
By default, bars for two frames are displayed. If the scene runs slower than two frames, the bars are truncated. To display more or less then two frames, the number of frames can be changed with the FrameCount setting in the performance pull-down menu.
Clock is the internal cluster-synchronized clock of Ventuz. It counts video frames, or fields when in interlaced mode. This line also shows the Cluster clock converted into seconds, minutes, hours and days. Then follows the Cluster ID of this machine, and the Machine ID from which the clock is taken (in a cluster).
The last few numbers count the frames that were skipped or duplicated because of cluster synchronization, and the reason for the last skipping or duplication.
Framerate is the frame-rate and frame duration, accumulated over the last 64 frames (or fields). Again, in interlaced mode this is the rate of full frames with two fields. It shows the actual measured frame-rate and the expected frame-rate. This information is displayed again, converted into milliseconds.
When tweaking performance, think in milliseconds, not fps. You can easily do math in milliseconds, like 3ms + 7ms = 10ms. You can't do that with FPS. 333fps + 143 fps = 100 fps? That's correct, the same numbers as above, but it just doesn't make sense. Also, getting from 40 fps to 50 fps is 5ms, getting from 50 to 60 is 3.3 ms. Squeezing a millisecond is always the same, a difference in fps is essentially meaningless.
Active Bindings (P+E) shows the number of validated bindings of each frame. The first number shows property bindings and the second one shows event bindings.
CPU Load Tells the total CPU load by the Ventuz process. The Performance Timing bars above only capture the main thread and the video thread. The most important other threads are:
The Scenes count behind the CPU Load simply indicates how many scenes are currently loaded into memory.
CPU Memory shows the percentage of physical memory Ventuz takes up, and how much memory is swapped on disk. The values come directly from the operating system, so they include memory that is allocated indirectly, for instance by DirectX.
GPU Memory shows the percentage of GPU memory Ventuz uses. The values are counted manually by Ventuz whenever it creates or destroys a resource. Since the driver may have some overhead that is not knowable to Ventuz, the actual amount of GPU memory used may be larger.
For instance, the driver may need to round up the width of a texture to the next power of two, so a 640 pixel wide texture gets rounded to 1024, a substantial difference. Ventuz can not know this and reports the incorrect amount. The Clock shows the current core frequency as well as the base frequency.
Note that the GPU Clock field shows the frequency of the core not that of the memory. It can go beyond 100% when overclocking the GPU.
Further information on the GPU can be found as well: The manufacturer as well as the model name, its core number and its core group granularity is diplayed right below the memory usage bars.
GPU: Indices is the number of Vertices drawn ignoring reuse. Each triangle contributes 3 indices, each line only 2. DrawCalls are the number of times something is drawn. With font rendering, multiple draw calls can be combined to a batch. The number of Batches is the most important indicator on how much time is spend in the graphics driver. Another trick used by font rendering to reduce the number of batches is Instancing.
In the Community Edition of Ventuz you will find an additional Node Limit in this list. It indicates the currently rendered Primary Nodes and their maximum allowed number.
Cores:Number of shader cores are reported by vendor specific API. This number may be incorrect for some architectures and is only used to give a rough estimate on how powerful a GPU is.
Group Granularity: Number of shader threads that are guaranteed to run in lockstep. Usable for shader optimizations.
Tex: Memory used in textures, excluding color and depth buffers.
Color/Depth: Memory used for rendertargets and depth buffers.
Geo: Memory used for vertices and indices of meshes.
Buffers: Memory used for buffers. Buffers are used for instance in the particle system.
Gfx-State: Each material is represented as a gfx-state when it is actually drawn. Each frame the states are set and possibly need an update. Updating is especially expensive if the shader-signature has changed and the exact shader variant has to be determined. this gets even more expensive if the shader is new and needs to be compiled. You should make sure your Signature values are near 0 and Compiles are always 0.
Dynamic Geometry: Ventuz will do some FastUploads for internal purposes. When a geometry is animated by binding it, this will issue a SlowUpload when a value changes. How slow such an update is depends on how complex the vertex format is: Most vertex formats will undergo no conversion or a QuickVertexConvert. Sometimes a SlowVertexConvert has to be done, this should be avoided on animated geometries.
Particles: Shows how much particles, either as a sprite or mesh, are calculated every frame. High numbers can impact the performance of your scene drastically. Generally speaking meshes are impacting performance more then sprites.
Creation Calls: Are an exceptionally expensive part of the DirectX API and should only happen during scene loading.
Shader: Shows the currently cached and compiled shader number.
Persistent Resources: are resources that accumulate and are never deleted until Ventuz is quit. VertexFormats and VertexConverters are cheap internal resources that should stay below 25. Shaders is the number of different shaders that have been used since Ventuz started.
The Video Phase will give you detailed information about the health of all current outputs.
1: The star indicates which output is used as timing ressource for the Ventuz clock. A SDI output has highest priority, followed by a GPU output (except Preview window). Shared Surfaces or NDI has the lowest priority. The Preview Window is only used, if there is no other output.
The first letter code indicates what Ventuz has set:
Everything after that will show the output mode:
The following situations will always be "dwm" and should not be considered a problem:
2: Shows the current framerate the Output is using in Hertz.
3: Currently rendered resolution of that output.
4: The orange bar shows the shift of the video phase in percentage compared to the output with 0% shift. The values afterwards represent the shift in whole frames and partial numbers.
Queues are represented in different colors depending on there sync status:
Note: For some devices like GPU outs in fullscreen exclusive or specific SDI boards, this indication might not be correct. 5: Output device name and output number / output name
Composition: Shows the currently rendered Composition.
Screens: Currently rendered Screen(s).
Outputs: Used GPU's for the current rendering.
Render: Shows the current number of render steps more complex scenes will show a higher number.
These resolution numbers will be bound to your Render Setup when using only the Runtime, previewing the "Design Preview or enabling "Screens and Outputs without Scaling". Only the other previews inside the Designer are dynamically changing it to save performance!
Blitslow/Blitfast:
This is a list of all Audio and Video Devices and their states.
The states mean:
The bar shows how many buffers are allocated and if they are full. This is only shown when Ventuz manually manages buffers, like for SDI streams. For an output stream, the buffers should be as full as possible, and when Ventuz is too slow the buffers are drained, a drop is only visible when the queue gets empty. For input streams, the buffers should be as empty as possible, when Ventuz is too slow the queue fills up so that Ventuz can process the images later on. Input frames are only dropped if the queue overflows.
Next is a list of audio cross-points. It shows the amount of delay that is associated with the cross-point. Delay is needed for windows audio devices or asynchronous inputs.
For each enabled movie related node, this shows information about the movie. This is especially important since most of the CPU load done by movies is not captured in the performance timing bars.
For each movie, the state, position, lead, buffering and file-name is shown.
The states are:
When multiple scenes are load into Ventuz, all the inactive scenes have still their audio player initalized. This may waste lots of GPU and CPU memory for the frame-buffers. Movies from inactive scenes have gray text instead of white. Ventuz will automatically take care to put movie clips from inactive scenes into pause mode, so that at least no effort is wasted in decoding frames that are never displayed.
The lead works like this: When the clip is not in async mode, and Ventuz runs in a cluster, it can happen that seeking to a position is not good enough because when the seek is complete, time has passed and the video is already too late. To avoid this problem, always pause the video, seek, wait a bit and then un-pause. If this is not done, Ventuz will notice the problem and automatically seek to a position slightly in the future, like when shooting skeet. A lead of 0 means no problem, 1 means that Ventuz considers leading but hasn't done it already, and higher numbers indicate increasing amounts of lead-time.
The buffers show how many buffers Ventuz has allocated and how many frames in the future are available. If these buffers run dry, increase the number of buffers for that movie. At the end of each line you can see the exact Cluster Clock when each Movie Started. Like the lead this is mainly useful to check syncronization of movies in a cluster.
In the Ventuz Designer, one can jump directly to a movie node by right-clicking on the name of a movie. A message is displayed to hint this:
Here the top movie is in an inactive scene (therefore it is displayed in gray, by right-clicking on it one can see where the node is hidden.
When a camera tracking system is used, debug information can be shown. Ventuz will display the view and projection matrix, plus the K1 and K2 values and lens center shift in x and y for the lens distortion, all received from the tracking system.
The layer engine will magically convert your layering into an optimized set of shaders. Each shader call can blend up to 8 layers together, if conditions are right.
To check how well the engine does in combining effects into single shaders, you can look ad the layer processing.
Each line represents:
The log will reference textures and rendertargets with !#00, where 00 is replaced by a number assigned to the texture / rendertarget. Blits reference their sources by referencing to other lines in the log, using the notation @000. In the example you can see how a scene is rendered to color buffer !#06 and depth buffer !#07, then the color buffer is resolved to !#08 and the whole thing is referenced as line @003 in the first blit.
Some effect will create only a small shader snipped in one of the blits, like the grayscale and the color correct effect. Other effects, like the blur, will create multiple shaders, even changing depending on parameters. In the example above, the two downsample and the two blur blits are generated by a blur effect
Layer Blending will create Blend.XXXX blits, with XXXX being the blend-mode. The last blit line shows how multiple blends and other effects are merged into a single blit, the two nops being leftovers from the merge process.
As a rule of thumb: the less blits the better, and everything with blur in the name and the downsample box are especially expensive.
Shows the messages coming from the Log Node.
The watches show all currently watched properties in the scene tree. For more information see the Watches Editor page.
Use this option to identify which output is using which GPU and which connector.
GPU's are counted as an array starting with GPU 0 and connector 0. For example a scene rendered on GPU 0 using the third connector would read GPU 0:2.
These can also differ depending on what they are currently rendering. Preview windows will always show the "Preview" flag.
Detailed Timing Header
TIMING : 00:01:29:42 0005382 : +007 Drift, 258 drops, 257 multi frame drops, 4 drop clusters, 0 Resoft, 0 Resyncs, 3 Disjoint, sr Flip-Queue Histogram : 0:0 1:10 2:6 3:0 4:0 5:0 6:0 7:0 unchanged:4475 noinfo:2
Flip-Queue Histogram:
Analyses the queues of all outputs, and every frame it logs the most empty swap-queue. This allows to detect performance issues before a frame visibly drops: Especially if you configured a few extra buffers, you can see the queue decreasing in length without getting empty. This is an indication how "safe" you are running.
When running in the designer with render window and physical outputs, the render window will not be included.
Interpreting the Histogram:
If you see the slot "1" going up, you are having visible drops or are very near to dropping. Increases in higher slots should not be visible drops but temporary situations where rendering was slow but still some frames queued up for display, so no visible drop. By adding more extra-buffers things should shift to the safe side, at the cost of increased latency / lag. A good situation would be by adding two extra buffers, seeing the "1" and "2" slots at zero and the "3" slot showing some increases. You are running fine.
Detailed Timing Events
In addition to the header, the detailed timing will log events, basically whenever something went wrong.
Logs entries start with a timestamp:
WED-14:14 00:00:00:31 0000031
Day of week, Hour:Minute of the day, cluster-clock as timecode (HH:MM:SS:FF), Cluster-clock total.
The number in brackets is the number of frames that should have passed. +1 is normal, +3 indicates that 2 frames were dropped.
Unfortunately, the GPU measurement is sometimes incorrect. Especially when the GPU isn't doing much, it might stall in the middle of the measurement, displaying a high GPU utilization while it is actually doing little. Enable the Stall GPU option: this will negatively effect frame-rate, but correctly tell whether CPU or GPU is the bottleneck.
To see spikes in performance better, enable the History Mode with the drop-down box. Pressing the pause key on keyboard will pause and continue the update of the bars. When the history mode is on and pause mode is on, use the page-up and page-down keys to scroll. Pressing the control key will scroll faster.
The numbers before the lines are the last three digits of the cluster clock
When analyzing performance, the first thing to notice is whether it is the CPU or GPU that is limiting Ventuz performance.
Unfortunately, the GPU measurement is sometimes incorrect. Especially when the GPU isn't doing much, it might stall in the middle of the measurement, displaying a high GPU utilization while it is actually doing little. Enable the Stall GPU option: this will negatively effect frame-rate, but correctly tell whether CPU or GPU is the bottleneck.
By default, the display uses accumulated bars. When this is disabled, the bars show the exact timing as it happens on the CPU or GPU, and not just accumulated values. Also, one can see the various periods where the CPU is possibly waiting for the GPU (blue "idle" bars). In this mode, one can also see how the bars are stacked: ContentValidation and HierarchyValidation are stacked on top of Validation, which shows the total time including what is stacked on top.
Such detailed performance measurement is not for free, and especially the "Stall GPU" option slows things down. When the performance monitor is completely disabled, or the timing section is disabled, no data is recorded and therefore impact to performance is negligible. You can see the difference by switching the timing section ("Performance" in the menu) on and off while keeping the system information on.
More information about how to interpret this below.
If working in Designer with RenderSetups of high resolution in Machine or Cluster Preview modes, the Performance Statistics rendering might get very tiny. To scale it up to the preview window size, enable this option.
These statistics are gathered in a separate thread so not to harm rendering performance directly, but there can be side effects that effect system performance that are hard to pin down. These stats will also show the current sync board status if one is configured and detected.
The sync board status is only available for NVIDIA Sync boards!
Clears the history of the Detailed frame time debugging.
Use this button when the rendering gets out of sync, to force Ventuz trying to resync everything.
Define where and what kind of information should be displayed.
Show on all Outputs:
Show only in preview window:
Show on all outputs, only relevant:
The performance monitor allows high performance logging. This is divided in two sections: Frame logging is replaced every frame, and can be used for watching values as they changed. This will look like the other performance statistics.
The console will work like a normal log file where messages printed will scroll endlessly. This is useful to capture events and state changes.
Use the Log Node or scripts to do frame and console logging.
The strings [sb:10/100], [mb:10/100], [lb:10/100] will paint a small, medium or long bar which will be filled for 10 of 100 parts (10 %). This bar looks like the Memory bars in the System Information section. The numbers will be printed as written, and the bar-length will be calculated by the division. So 33/100 will yield approximately the same bar-length as 1/3, but display different numbers.
To add text inside the bar, put it after the numbers : [lb:10/100MB]. A special mode allows number from -100 to 0 be displayed as decibel : [lb:-12/dB].
[! will print text brighter until ] is encountered, line in dim [! bright] dim.
The text-color can be changed with [tc:rgb] and [tc:rrggbb], where r, g and b must be replaced by web-style hex-colors. Again, the text in the brackets will be effected. Example: white [tc:ff0000 red] white.
When logging, c# will convert \n to linefeed (ASCII 10) and \f to formfeed (ASCII 12). Linefeeds will work normally, no \r carriage return (ASCII 13) is needed. Formfeeds will insert a half-sized linefeed. This is used between the sections of the performance statistics.
The codes \\, \[ and \] are parsed by the logging system and can be used for escaping. Other uses of the backslash will print the backslash normally. So in a c# script, one can write \\[ to escape a [.
When getting clever with markup, it might be a good idea to check in the render window and the web interface if both sides interpret the markup correctly.
When using resolutions larger than HD, it becomes noticeable that the console logging area is limited to 238 x 96 characters. Frame logging is limited to 8095 characters total.
For frame-logging use Ventuz.Kernel.Tools.PerfMon.PrintFrame . This information will be overwritten every frame.
For console-style logging use Ventuz.Kernel.Tools.PerfMon.PrintConsole.
All this logging is high-performance and when the performance monitor is disabled, the only cost of this is the formatting of the string.
This explains how to add custom colored-bars to the performance timing.
First, a token must be created as a static member:
public static Ventuz.Kernel.Tools.PerfMon.Token Script1 = new Ventuz.Kernel.Tools.PerfMon.Token("Script1", 0xff8000);
The token has a name and a color. The color is given as hex web-color in RRGGBB format.
To measure something, use the token:
using(Script1.MeasureCpu) { System.Threading.Thread.Sleep(3); }
Here is a complete example:
using System; using Ventuz.Kernel; using Ventuz.Kernel.Tools; public class Script : ScriptBase, System.IDisposable { private bool changed; public Script() { } public virtual void Dispose() { } // Token for Performance Timing public static PerfMon.Token Script1 = new PerfMon.Token("Script1", 0xff8000); public override void Validate() { changed = true; // Logging that will be overwritten every frame PerfMon.PrintFrame("Hello World\n"); // Logging that will spam the console PerfMon.PrintConsole("LogLogLog\n"); // wait a few milliseconds, and add a bar to timing using(Script1.MeasureCpu) { System.Threading.Thread.Sleep(3); } } public bool OnEnable(int arg) { Ventuz.Kernel.Tools.PerfMon.ConsoleLogEnabled = true; Ventuz.Kernel.Tools.PerfMon.FrameLogEnabled = true; return false; } public bool OnDisable(int arg) { Ventuz.Kernel.Tools.PerfMon.ConsoleLogEnabled = false; Ventuz.Kernel.Tools.PerfMon.FrameLogEnabled = false; return false; } public override bool Generate() { if (changed) { changed = false; return true; } return false; } }
To make this example work, add an integer input to the C# script node and connect it to the cluster-clock output of the SystemID node. This will update the script every frame, although the value of cluster-clock is ignored.
Assume a computer with a 4 core processor with hyper-threading. That makes 8 logical threads.
If a Ventuz scene takes 60% on the Main Thread, and no other time-consuming tasks are done (movie replay, heavy load on the DirectX driver), the CPU percentage should show ca. 7%, as the CPU is able to run 8 threads at the same time, and one of these threads is at 60% load.
Now, hyper-threading is not perfect. One can expect a 50% performance boost through hyper-threading at best, making it feel more like a 5-6 core than an 8 core machine. Still the CPU percentage display will consider all threads the same, as the actual performance boost is highly depending on the application.
The movie playback does not show on the Main Thread. But it shows in the following way:
When playing low frame-rate video (24 or 30 fps) on full frame-rate outputs (50 or 60 fps), the GPU processing is not performed every frame. Turning on the History option for the performance timing reveals this: The GPU processing is done only every second or third frame. Be aware that while the average GPU load may be below 100%, individual frames may spike beyond the 100% mark.
In the AV Inputs/Outputs section, the delay of the audio crosspoints is displayed. This delay is inserted automatically by Ventuz to have some buffer when audio devices do not deliver data completely synchronous. Lets consider an example where an audio crosspoint goes from SDI-In to SDI-Out for a refresh rate of 59.94 Hz (Interlaced). The delay will be quite larget. What's going on here?
So for shortest delays:
Red(ish): Validation
Green(ish): Rendering
Some shaders get special colors:
Blue(ish): Idle
The Idle markers are only shown when Accumulated Bars is switched off.
Yellow(ish): Postprocessing
Cyan(ish): Resource Generation
Video Process:
Other