Profiling¶
Performance Monitor¶
NovaPhy ships an opt-in runtime performance monitor that captures aggregate per-phase timings from the C++ stepping pipeline and can export Chrome / Perfetto traces.
The monitor is a standalone utility: it does not appear on any
solver's public API. Instead, you wrap each step in a scope,
with monitor.scoped(): solver.step(...). This mirrors Newton's
newton.utils.event_scope / EventTracer pattern: solver signatures take
no profiling parameters, and instrumentation has zero overhead when no
scope is active.
Basic Usage¶
import novaphy
# builder is assumed to be an already-populated model builder
model = builder.finalize()
solver = novaphy.solvers.SolverSemiImplicit(model, novaphy.solvers.SolverSemiImplicit.Config())
state = model.state()
control = model.control()
collision_pipeline = novaphy.CollisionPipeline(model)
contacts = collision_pipeline.contacts()
monitor = novaphy.PerformanceMonitor()
monitor.enabled = True
for _ in range(120):
    with monitor.scoped():
        solver.step(state, state, control, contacts, 1.0 / 120.0)
# Print phase statistics, slowest first
slowest = sorted(monitor.phase_stats(), key=lambda s: s.avg_ms, reverse=True)
for stat in slowest[:5]:
    print(f"{stat.name}: avg={stat.avg_ms:.2f}ms, max={stat.max_ms:.2f}ms")
# Print per-frame metrics
for metric in monitor.last_frame_metrics():
    print(f"{metric.name}: {metric.value}")
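When looking for the hot subsystem, it can help to see each phase's share of the total alongside its absolute time. The sketch below is one way to do that, assuming only the name, avg_ms, and max_ms fields used above. PhaseStat here is a hypothetical stand-in so the example runs without NovaPhy, and the phase names in the sample are made up. The shares are only meaningful if phases do not nest or overlap, which this page does not confirm.

```python
from dataclasses import dataclass


@dataclass
class PhaseStat:
    """Hypothetical stand-in exposing the fields used in the example above."""
    name: str
    avg_ms: float
    max_ms: float


def phase_shares(stats):
    """Return (name, avg_ms, share) tuples sorted by avg_ms, largest first.

    share is the phase's fraction of the summed avg_ms across all phases.
    """
    total = sum(s.avg_ms for s in stats)
    ranked = sorted(stats, key=lambda s: s.avg_ms, reverse=True)
    return [
        (s.name, s.avg_ms, s.avg_ms / total if total > 0 else 0.0)
        for s in ranked
    ]


# Made-up sample values, for illustration only.
sample = [PhaseStat("broadphase", 0.5, 0.9), PhaseStat("solve", 1.5, 2.0)]
for name, avg_ms, share in phase_shares(sample):
    print(f"{name:<12} avg={avg_ms:.2f}ms  share={share:.0%}")
```

With a real monitor, the same function could presumably be called as phase_shares(monitor.phase_stats()).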
Trace Export¶
For detailed per-frame analysis, enable trace mode and export to Chrome / Perfetto format:
monitor.trace_enabled = True
for _ in range(60):
    with monitor.scoped():
        solver.step(state, state, control, contacts, 1.0 / 120.0)
monitor.write_trace_json("build/novaphy_trace.json")
Open the JSON file in:
- Perfetto UI (recommended)
- Chrome: chrome://tracing
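An exported trace can also be inspected from a script. The sketch below assumes the file follows the Chrome Trace Event Format, in which the JSON is either a bare list of events or an object with a "traceEvents" list, and complete events carry "ph": "X" with "dur" in microseconds. Whether NovaPhy emits complete events or begin/end pairs is not stated on this page, so treat this as a starting point. The event names in the sample are made up.

```python
import json
from collections import defaultdict


def summarize_trace(trace):
    """Sum the durations (in ms) of complete ('X') events per event name."""
    # The format allows either a bare event list or {"traceEvents": [...]}.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals_ms = defaultdict(float)
    for event in events:
        if event.get("ph") == "X":
            totals_ms[event["name"]] += event.get("dur", 0) / 1000.0  # us -> ms
    return dict(totals_ms)


# Tiny hand-written sample in the assumed format (not real NovaPhy output).
sample = json.loads("""
{"traceEvents": [
  {"name": "a", "ph": "X", "ts": 0,    "dur": 1000, "pid": 1, "tid": 1},
  {"name": "b", "ph": "X", "ts": 1000, "dur": 500,  "pid": 1, "tid": 1},
  {"name": "a", "ph": "X", "ts": 1500, "dur": 2000, "pid": 1, "tid": 1},
  {"name": "meta", "ph": "M", "pid": 1, "tid": 1}
]}
""")

print(summarize_trace(sample))

# For a real capture, load the exported file instead:
#   with open("build/novaphy_trace.json") as f:
#       print(summarize_trace(json.load(f)))
```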
Profiling Demo¶
NovaPhy includes a dedicated profiling demo:
# Rigid body profiling
python python/demos/demo_performance_monitor.py --scene rigid
# Fluid profiling
python python/demos/demo_performance_monitor.py --scene fluid --measured-steps 60
Best Practices¶
Disable visualization when profiling
Polyscope and Python-side rendering are not included in engine timings. Disable visualization when diagnosing engine performance.
- Use aggregate stats first to find the hot subsystem.
- Use trace export for short, focused captures (60-120 steps).
- Profile with representative workloads (body count, collision density).
- Compare before / after when optimizing.
- The monitor is thread-local: instantiate one per worker thread when running parallel rollouts.
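For the last point, one way to give each worker thread its own monitor is to keep it in threading.local storage and create it lazily. This is a minimal sketch: thread_monitor and FakeMonitor are hypothetical names, and FakeMonitor stands in for novaphy.PerformanceMonitor so the example runs without NovaPhy installed.

```python
import threading

_local = threading.local()  # each thread sees its own attributes on this object


def thread_monitor(factory):
    """Return the calling thread's monitor, creating it on first use."""
    monitor = getattr(_local, "monitor", None)
    if monitor is None:
        monitor = factory()
        _local.monitor = monitor
    return monitor


class FakeMonitor:
    """Hypothetical stand-in for novaphy.PerformanceMonitor."""

    def __init__(self):
        self.enabled = True


results = {}


def worker(index):
    # Repeated calls on the same thread return the same instance.
    first = thread_monitor(FakeMonitor)
    second = thread_monitor(FakeMonitor)
    results[index] = (first, second)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In a real rollout worker, the factory would presumably be novaphy.PerformanceMonitor, and each thread would wrap its own solver.step(...) calls in thread_monitor(...).scoped().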