Real-time show efficiency best practices

Hello,

I recently did a show that consisted of several scenes, many of which used real-time FX paired with an Azure Kinect to create interactive visuals with a performer. While the show went well and the client was happy (hooray!), the show ran at about 70% speed, judging by the BPM of the audio file (only perceptible when the Kinect skeleton is active). I’m wondering what the best practices are for managing resource use in this scenario?

I’m particularly confused about the resource load of inactive nodes/scenes. My GPU (RTX 3080 Laptop) can handle any of these scenes with Kinect tracking individually just fine. But even when there is no output from certain nodes/scenes/layers, they still have a serious effect on GPU load, making my system unable to handle the particular scene it would otherwise blaze through. Is there some way around this? I have also tried things like pre-rendering scenes to .mp4 and playing them back on image planes, and reducing intensive settings like field density and optical flow.

I would upload a dfx, but unfortunately I always get errors when trying to upload on this site. Not sure what that’s about. Here’s a link to a Google Drive folder where you can download the dfx and assets. The show is arranged so that the nodes progress chronologically in a clockwise manner around the root node. I originally had it separated into layers, but noticed this did not offer a performance boost when layers were deactivated (re: inactive scenes mentioned above), and layer animation gave me more trouble than the organizational benefit was worth.
https://drive.google.com/drive/folders/1TICmyZE0J9T7eUHRXRamCExeMvfkLRhs?usp=sharing

1 Like

+1

I too have noticed that “inactive” layers actually are active from a performance standpoint, or at least that seems to be the case with various scenes. Perhaps it’s a memory issue?

It would be wonderful if someone from the Notch team could document this in the software manual, or point us in that direction, for anyone reviewing this thread down the road…

I too have noticed that “inactive” layers actually are active from a performance standpoint, or at least that seems to be the case with various scenes. Perhaps it’s a memory issue?

They are not active. There’s an existing thread that covers this false hypothesis in great detail already: Notch Block ~8ms cook time with empty layer selected

I think @dsessumes712 may be hinting at resource load (memory) rather than cook times per frame. (Of course, please correct me if I’m wrong.)

Do inactive layers persist in memory when the dfx is loaded, or are they loaded/unloaded into RAM dynamically as an operator switches between them?

1 Like

I think this is what I’m getting at, if I understand you correctly: both inactive layers and nodes that have been made inactive, whether through the “Active” slider set to 0 or other means, such as turning emission to 0 on a Particle Primitive Emitter. I guess what I’m looking for is an animation-friendly “full off” property for both layers and nodes that eliminates, or nearly eliminates, their impact on GPU load, since nodes made inactive through the methods above still seem to have a significant resource requirement.

To get this out of the way first… BUILDER IS NOT A LIVE PERFORMANCE TOOL.
Notch has been engineered so that editor usability is the focus while in Builder, not performance - and definitely not smoothness of performance.

The main reason Standalone and Block exports exist is to provide much better run-time performance for Notch projects - by removing all of the UI, editability, undo stack, copy/paste and all those other niceties that are not needed for playback. Various parts of the scene are flattened & baked down. Shaders are precompiled and cached. It uses less memory, and it’s able to manage memory better because there are constraints on what can happen at any time during execution. It has a more linear, constrained relationship with time. It warms caches. It’s engineered for performance & smoothness, not for editability.

For example, when you hide a node in Builder, Builder doesn’t know if you’re about to unhide the node again - so it needs to keep any memory used for it around for a bit to see what happens. Block/Standalone exports on the other hand can pretend hidden nodes never existed in the first place and just give that memory up.
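As a rough mental model, the difference is something like this (a conceptual Python sketch of the two policies - not Notch code, and all names and sizes are made up):

```python
# Conceptual sketch only - NOT Notch internals. It just contrasts the two
# policies: Builder parks a hidden node's memory in case you unhide it,
# while a Standalone/Block export can give the memory back immediately.

class NodeResources:
    """Stand-in for whatever memory a node's effect is currently using."""
    def __init__(self, name, megabytes):
        self.name = name
        self.megabytes = megabytes
        self.resident = True

    def release(self):
        self.resident = False
        print(f"freed {self.megabytes} MB used by {self.name}")


def hide_in_builder(node, recently_hidden):
    # Builder can't know whether you'll unhide the node a moment later,
    # so it keeps the memory around for a while instead of freeing it.
    recently_hidden.append(node)


def hide_in_export(node):
    # An export can act as if the hidden node never existed
    # and return the memory straight away.
    node.release()


fluid = NodeResources("Fluid simulation", 1200)   # made-up size
hide_in_export(fluid)                             # memory comes back now
```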

Now that’s out of the way, a bit about memory usage…
Memory for imported resources is allocated up front and resources remain resident in memory (except for streams/video).
Most nodes use negligible amounts of memory in themselves as they’re just parameters, but their actions may use memory. For example, a deformer on an object requires memory to process; particles and fields need memory to store simulation data. This can run to gigabytes of memory as things can be so huge. Usually the root node for the simulation is responsible for the memory; e.g. the Particle Root stores all the particles, Emitter nodes just tell them how to appear. So hiding a Particle Emitter won’t change memory usage, but hiding the Particle Root will.

Memory is allocated for particles and other effects on demand from pools. Some is given back immediately (e.g. temporary buffers during post fx or deformers); some is kept around, e.g. simulations. Simulation memory is only given up when the simulation ends, its parent layer ends, or you switch layer so it’s no longer visible (and there are different rules in Builder vs in an export, as discussed above).
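If it helps, the last two paragraphs boil down to something like this (a plain-Python sketch of the idea, not how Notch is actually implemented; all sizes are invented):

```python
# Conceptual sketch of pooled GPU memory - not Notch internals.
# Temporary buffers (post-fx, deformers) are returned within the frame;
# simulation memory stays checked out until the simulation or its layer ends.

class GpuPool:
    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.in_use_mb = 0

    def acquire(self, mb):
        if self.in_use_mb + mb > self.capacity_mb:
            raise MemoryError("pool exhausted - frame rate will suffer")
        self.in_use_mb += mb
        return mb

    def release(self, mb):
        self.in_use_mb -= mb


pool = GpuPool(capacity_mb=8000)            # pretend 8 GB of VRAM

# The Particle Root owns the big allocation; emitters are just parameters.
particle_root_buffer = pool.acquire(1500)   # persists across frames

# A post-fx scratch buffer lives for one frame only.
scratch = pool.acquire(200)
pool.release(scratch)                       # given back immediately

# The simulation buffer is only returned when the simulation ends,
# its layer ends, or you switch to another layer.
pool.release(particle_root_buffer)
```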

It’s worth understanding the difference between the ways in which nodes can be made inactive / ended.

  • By switching layer or ending the parent layer
  • By hiding the node
  • By cropping the node’s timeline bar
  • Via the “Active” slider
  • By “other means” (setting emission to 0, etc.)

If you switch layer or the parent layer ends, all memory used by nodes in the layer will be released. If a layer is inactive or out of time range, none of its nodes will be executed or evaluated. Using multiple layers and switching between those layers is a very efficient way to manage performance and memory.

If you hide the node, it’s like the node doesn’t exist anymore. This enables the system to clean up any resources used by the node. In Builder this is not done immediately for reasons listed above, but in standalone/blocks it is.

Cropping the time bar is the most efficient way of controlling node activity over time. If the time bar has reached its end, in Standalone/Block the system knows that the effect is over and can chuck it away.

The Active slider, and “other means”, will stop the effect from executing but they won’t give its memory back. That’s because they’re just parameters - you could switch them back on again and need the effect back, so we can’t chuck it away. That said, generally if a node is never started (because its Active slider is always 0) it never consumes memory in the first place.
Layer Precomps are an exception as they have an option to release memory when they’re inactive.
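To summarise the rules above as a quick cheat-sheet (simplified, and expressed as throwaway Python rather than anything official; the values describe Standalone/Block behaviour, since Builder may hold memory a little longer as explained earlier):

```python
# Simplified cheat-sheet of the deactivation methods described above.
# Each entry: does it stop the effect executing, and does it release
# the effect's memory (in a Standalone/Block export)?

deactivation_methods = {
    "switch layer / parent layer ends": {"stops_executing": True, "releases_memory": True},
    "hide the node":                    {"stops_executing": True, "releases_memory": True},
    "crop the timeline bar":            {"stops_executing": True, "releases_memory": True},
    "'Active' slider set to 0":         {"stops_executing": True, "releases_memory": False},
    "other means (emission = 0, etc.)": {"stops_executing": True, "releases_memory": False},
}

for method, behaviour in deactivation_methods.items():
    print(f"{method:34s} {behaviour}")

# Exceptions: Layer Precomps can optionally release memory while inactive,
# and a node whose 'Active' slider has always been 0 never allocates at all.
```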

The best way to manage lots of effects / elements is to split them into separate layers, and crop time bars of the nodes and layers (if you’re working on a linear timeline) or switch between those layers as needed (if you’re working dynamically or in a media server).

If you’re working in a media server host, the media server usually tells Notch its layer has ended at appropriate times (e.g. when switching track / disabling the layer).

Hope that helps!

4 Likes

Wow, this is a great breakdown and explanation of the inner workings of the software. I understand why blocks and standalone would be best for show use. I’ll keep all of this in mind moving forward.

However, I’m not sure memory (as far as RAM is concerned) is the issue. I have 32 GB on my system, and during the show I rarely reach 70% usage. What seems to be happening is that the frame rate gets quite low when GPU usage hits 100%. That’s an obvious result, but my concern is that it hits 100% GPU much more quickly when there are lots of inactive nodes (by any means, including being hidden, layers, or timeline crop) than if I am running just that specific effect and its active nodes alone. I’m wondering if there is any way to mitigate this, and especially if that method can be animated. Unfortunately I’m an independent artist and can’t always afford to upgrade for a month just for personal projects.

Thanks again for your help so far. It has been very enlightening.

Dalton Sessumes

What matters is VRAM, not main memory btw.
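If you want to sanity-check that during a show, watch VRAM rather than task-manager RAM - for example with something like this (assumes an NVIDIA card with nvidia-smi available on the PATH):

```python
# Quick VRAM usage check via nvidia-smi (NVIDIA GPUs only).
import subprocess

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print("VRAM used / total:", out)   # e.g. "7012 MiB, 16384 MiB"
```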

Regarding particle systems: to minimise memory consumption and resources in general, minimise the number of Particle Root nodes - particularly if you have large particle counts. (For small particle counts, don’t worry about it.)

The particle system is very flexible and it’s often possible to combine multiple particle effects into a single particle root. You can put multiple emitters under a single particle root, and particles will be allocated to them on demand: if an emitter is disabled/made inactive/emission set to 0 it stops grabbing particles, allowing another one to grab them instead. This makes it considerably more efficient than using multiple Particle Roots for this purpose.

You can combine entirely different particle effects under one particle root. If you want a certain particle renderer to only render the particles of certain emitters, or a particle effector to only effect certain emitters, all of the nodes have “Rendered Emitters” / “Affected Emitters” inputs to filter them. Or, if you want a renderer to only render particles from a single emitter, parent the renderer directly to the emitter.

Think of a Particle Root as just a “particle container”.
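The on-demand behaviour is roughly this (a plain Python sketch of the idea, not Notch’s actual particle system; counts and names are made up):

```python
# Conceptual sketch: one shared "particle container" (the Particle Root)
# with several emitters grabbing particles from it on demand.

class ParticleRoot:
    def __init__(self, max_particles):
        self.max_particles = max_particles
        self.in_use = 0

    def grab(self, count):
        # An emitter can only take what's left in the shared budget.
        granted = min(count, self.max_particles - self.in_use)
        self.in_use += granted
        return granted


class Emitter:
    def __init__(self, root, rate, active=True):
        self.root, self.rate, self.active = root, rate, active

    def emit(self):
        # A disabled emitter (or one with emission at 0) stops grabbing
        # particles, leaving the shared budget free for the other emitters.
        return self.root.grab(self.rate) if self.active else 0


root = ParticleRoot(max_particles=100_000)
sparks = Emitter(root, rate=60_000)
smoke = Emitter(root, rate=60_000)

print(sparks.emit())   # 60000
print(smoke.emit())    # 40000 - only what's left in the shared container
```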

Note that it’s also important to minimise the number of particles set on the Particle Root. If you have a Particle Root with 1 million particles and you only ever use 100 of them, you’re still allocating for 1 million. Check the particle metrics and reduce the number on the Particle Root accordingly. (Affectors/renderers generally do reduce workloads by compacting the set of particles they work on, but there’s still a computation overhead if you over-size the container.)
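A quick back-of-envelope on why over-sizing hurts (the bytes-per-particle figure below is an assumption purely to show the scale of the difference, not a Notch number):

```python
# Back-of-envelope only: per-particle size varies with the effect,
# so 64 bytes is an assumed figure, not something taken from Notch.
BYTES_PER_PARTICLE = 64

def particle_buffer_mb(max_particles):
    return max_particles * BYTES_PER_PARTICLE / 1e6

print(particle_buffer_mb(1_000_000))  # ~64 MB allocated up front
print(particle_buffer_mb(100))        # ~0.006 MB if the root matches actual usage
```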

3 Likes

Hi! Coming back a bit later, but this is a very interesting read.

I am also wondering whether the particle simulation runs only on the CPU or the GPU, or if there’s an option to choose somewhere?

Also, to what extent would a better CPU improve capability, since I see the benchmark is focused on the GPU?

Thank you,

Particles only run on the GPU - Notch is primarily GPU bound, and the CPU is rarely a factor for performance. That’s why we don’t bother including CPUs in the benchmarks.

If you’re looking at upgrading your PC, pretty much any modern CPU will do just fine. The GPU is where you’ll want to spend for the best you can reasonably afford.

– Ryan

1 Like

Another question on this topic:

Is there a way to calculate the necessary performance if you’re creating a show that will be running on a different machine? Maybe a reference to Notchmarks could be useful, if that’s possible to implement in a meaningful way.

Like, for example, I create a scene and could get a readout from Notch that says something like what the minimum Notchmark would be to run it at a stable 30 fps.

I get that there are a lot of variables on different machines, but maybe it could be useful when working with known hardware, like Disguise machines for instance.

If you go to Window > Performance, you can compare the local system’s performance with a target system based on its Notchmarks.

It’s an estimate, so you shouldn’t take it as gospel, but it will give you a rough idea of the performance.
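If you want a back-of-envelope number out of that comparison, one rough approach is to assume frame cost scales inversely with the Notchmark score - that’s an assumption for ballparking only, not an official formula:

```python
# Rough ballpark only: assumes frame time scales inversely with Notchmark
# score, which is a simplification, not an official Notch formula.

def estimated_fps_on_target(local_frame_ms, local_notchmarks, target_notchmarks):
    target_frame_ms = local_frame_ms * local_notchmarks / target_notchmarks
    return 1000.0 / target_frame_ms

# Example with placeholder numbers: a scene running at 25 ms/frame locally.
print(estimated_fps_on_target(local_frame_ms=25.0,
                              local_notchmarks=120,
                              target_notchmarks=200))   # ~66.7 fps estimate
```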

– Ryan

1 Like