Profiling

Performance Monitor

NovaPhy ships an opt-in runtime performance monitor that captures aggregate per-phase timings from the C++ stepping pipeline and can export Chrome / Perfetto traces.

The monitor is a standalone utility: it does not appear on any solver's public API. Instead, you wrap each solver.step(...) call in a "with monitor.scoped():" block. This mirrors Newton's newton.utils.event_scope / EventTracer pattern: solvers stay parameter-free, and instrumentation has zero overhead when no scope is active.

Basic Usage

import novaphy

# builder is assumed to be a model builder that has already been populated.
model = builder.finalize()
solver = novaphy.solvers.SolverSemiImplicit(model, novaphy.solvers.SolverSemiImplicit.Config())
state = model.state()
control = model.control()
collision_pipeline = novaphy.CollisionPipeline(model)
contacts = collision_pipeline.contacts()

monitor = novaphy.PerformanceMonitor()
monitor.enabled = True

for _ in range(120):
    with monitor.scoped():
        solver.step(state, state, control, contacts, 1.0 / 120.0)

# Print phase statistics
slowest = sorted(monitor.phase_stats(), key=lambda s: s.avg_ms, reverse=True)
for stat in slowest[:5]:
    print(f"{stat.name}: avg={stat.avg_ms:.2f}ms, max={stat.max_ms:.2f}ms")

# Print per-frame metrics
for metric in monitor.last_frame_metrics():
    print(f"{metric.name}: {metric.value}")
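The ranking in the snippet above can be wrapped in a small helper that also reports each phase's share of the summed average time. This is a sketch that relies only on the name, avg_ms, and max_ms attributes used above; FakeStat and the phase names are hypothetical stand-ins so the example runs without NovaPhy, and the share is only meaningful if the reported phases do not nest inside one another.

```python
from dataclasses import dataclass


@dataclass
class FakeStat:
    # Stand-in exposing only the attributes used in the snippet above.
    name: str
    avg_ms: float
    max_ms: float


def slowest_phases(stats, top=5):
    """Return (name, avg_ms, max_ms, share) rows for the `top` slowest phases."""
    stats = list(stats)
    total = sum(s.avg_ms for s in stats)
    ranked = sorted(stats, key=lambda s: s.avg_ms, reverse=True)[:top]
    return [
        (s.name, s.avg_ms, s.max_ms, s.avg_ms / total if total > 0 else 0.0)
        for s in ranked
    ]


# Illustrative numbers only; real values come from monitor.phase_stats().
example = [
    FakeStat("collision", avg_ms=1.0, max_ms=2.0),
    FakeStat("integrate", avg_ms=3.0, max_ms=4.0),
]
for name, avg_ms, max_ms, share in slowest_phases(example):
    print(f"{name}: avg={avg_ms:.2f}ms, max={max_ms:.2f}ms, share={share:.0%}")
```

With a real monitor, the call would be slowest_phases(monitor.phase_stats()).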

Trace Export

For detailed per-frame analysis, enable trace mode and export to Chrome / Perfetto format:

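import os

monitor.trace_enabled = True

for _ in range(60):
    with monitor.scoped():
        solver.step(state, state, control, contacts, 1.0 / 120.0)

# Assumption: the output directory may not be created automatically.
os.makedirs("build", exist_ok=True)
monitor.write_trace_json("build/novaphy_trace.json")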

Open the JSON file in:

Perfetto UI at https://ui.perfetto.dev (recommended)
Chrome, at chrome://tracing
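An exported trace can also be inspected from a script. The sketch below totals the duration of complete ("X") events per name, following the Chrome Trace Event Format, in which durations are given in microseconds. Which event types and names NovaPhy actually emits is an assumption here; if the file uses begin/end ("B"/"E") pairs instead, this helper reports nothing for them. The sample events are made up.

```python
import json
from collections import defaultdict


def summarize_trace(trace):
    """Total duration in ms per event name, counting complete ("X") events."""
    # The Trace Event Format allows either a bare list of events or an
    # object that holds the list under "traceEvents".
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for event in events:
        if event.get("ph") == "X":
            totals[event["name"]] += event.get("dur", 0) / 1000.0  # us -> ms
    return dict(totals)


# Made-up sample; a real file would be read with json.load(open(path)).
sample = json.loads("""
{"traceEvents": [
  {"name": "integrate", "ph": "X", "ts": 0,    "dur": 1500, "pid": 1, "tid": 1},
  {"name": "collision", "ph": "X", "ts": 1500, "dur": 250,  "pid": 1, "tid": 1},
  {"name": "integrate", "ph": "X", "ts": 2000, "dur": 500,  "pid": 1, "tid": 1},
  {"name": "process_name", "ph": "M", "pid": 1, "args": {"name": "demo"}}
]}
""")

for name, total_ms in sorted(summarize_trace(sample).items()):
    print(f"{name}: {total_ms:.2f} ms")
```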

Profiling Demo

NovaPhy includes a dedicated profiling demo:

# Rigid body profiling
python python/demos/demo_performance_monitor.py --scene rigid

# Fluid profiling
python python/demos/demo_performance_monitor.py --scene fluid --measured-steps 60

Best Practices

Disable visualization when profiling

Polyscope and Python-side rendering are NOT included in engine timings. Disable visualization when diagnosing engine performance.

Use aggregate stats first to find the hot subsystem.
Use trace export for short, focused captures (60-120 steps).
Profile with representative workloads (body count, collision density).
Compare before / after when optimizing.
The monitor is thread-local: instantiate one per worker thread when running parallel rollouts.
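For the before / after comparison, one simple approach is to snapshot the average time per phase for each run and diff the two snapshots. The helper below is a sketch that works on plain dictionaries mapping phase name to milliseconds; compare_runs is a hypothetical name, and the numbers are illustrative.

```python
def compare_runs(before, after):
    """Map phase name -> (before_ms, after_ms, relative change or None)."""
    report = {}
    for name in sorted(set(before) | set(after)):
        b = before.get(name)
        a = after.get(name)
        # A relative change is only defined when both runs have the phase
        # and the baseline is non-zero.
        change = (a - b) / b if a is not None and b else None
        report[name] = (b, a, change)
    return report


# Illustrative numbers only.
before = {"collision": 2.0, "integrate": 1.0}
after = {"collision": 1.0, "integrate": 1.0, "broadphase": 0.5}

for name, (b, a, change) in compare_runs(before, after).items():
    delta = "n/a" if change is None else f"{change:+.0%}"
    print(f"{name}: before={b} after={a} change={delta}")
```

With a real monitor, each snapshot could be built as {s.name: s.avg_ms for s in monitor.phase_stats()}, using a fresh monitor per run so the two sets of statistics stay separate.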