Profiling
The uipc.profile package provides tools to benchmark and profile UIPC simulations. It has two modules:
| Module | Purpose | How it works |
|---|---|---|
| `uipc.profile` | Benchmark: wall-clock timer statistics | Runs in-process, collects `SimulationStats` |
| `uipc.profile.nsight` | GPU profile: kernel-level metrics | Launches a subprocess under Nsight Compute (`ncu`) |
Both accept a World as the primary input. Build your scene, create an engine and world, then pass the world to the profiler.
Setup
```python
from uipc import Scene, Engine, World
from uipc.geometry import tetmesh, ground, label_surface
from uipc.constitution import AffineBodyConstitution

# Build scene
scene = Scene(Scene.default_config())
abd = AffineBodyConstitution()
scene.contact_tabular().default_model(0.5, 1e9)
de = scene.contact_tabular().default_element()
# ... add geometries, constitutions, etc. ...

# Create engine and world
engine = Engine('cuda', 'my_workspace')
world = World(engine)
world.init(scene)
```
Benchmarking (uipc.profile)
Simple: profile.run()
```python
from uipc import profile

result = profile.run(world, num_frames=10, name='baseline', output_dir='bench')
print(result['summary'])
# Example output:
# Scene: baseline | Frames: 10 | Wall time: 2.345s | Avg: 234.5ms/frame
```
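The summary line is simple arithmetic over the result fields: the per-frame average in the sample output is `wall_time / num_frames`, converted to milliseconds. A minimal sketch that reproduces that format; `format_summary` is a hypothetical helper, not part of `uipc.profile`, and the formatting rules are inferred from the sample output only.

```python
def format_summary(name: str, num_frames: int, wall_time: float) -> str:
    """Hypothetical helper mirroring the sample summary line (format is assumed)."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    avg_ms = wall_time / num_frames * 1000.0  # seconds per frame -> milliseconds
    return (f"Scene: {name} | Frames: {num_frames} | "
            f"Wall time: {wall_time:.3f}s | Avg: {avg_ms:.1f}ms/frame")


print(format_summary('baseline', 10, 2.345))
```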
The result dict contains:
| Key | Description |
|---|---|
| `name` | Label |
| `num_frames` | Frames benchmarked |
| `wall_time` | Total wall-clock seconds |
| `stats` | `SimulationStats` instance (for plotting, reports) |
| `timer_frames` | Raw per-frame timer tree (list of dicts) |
| `summary` | Human-readable summary string |
| `workspace` | Path to the engine workspace |
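`timer_frames` holds one raw timer tree per frame, which is handy for quick ad-hoc analysis without going through `SimulationStats`. The node schema is not documented here, so the sketch below assumes each node is a dict with `name`, `duration` (seconds) and a `children` list; adjust the keys to whatever your `timer_frames` actually contain. It sums each timer's duration across all frames (timers sharing a name at different depths are merged).

```python
from collections import defaultdict


def total_time_per_timer(timer_frames):
    """Sum durations per timer name across all frames.

    Assumes (hypothetically) that every node looks like
    {'name': str, 'duration': float, 'children': [...]}.
    """
    totals = defaultdict(float)

    def visit(node):
        totals[node['name']] += node.get('duration', 0.0)
        for child in node.get('children', []):
            visit(child)

    for frame in timer_frames:  # one root dict per frame
        visit(frame)
    return dict(totals)


# Synthetic data in the assumed schema (not real UIPC output)
frames = [
    {'name': 'Frame', 'duration': 0.25, 'children': [
        {'name': 'Collision', 'duration': 0.10, 'children': []},
        {'name': 'Solve', 'duration': 0.12, 'children': []},
    ]},
    {'name': 'Frame', 'duration': 0.22, 'children': [
        {'name': 'Collision', 'duration': 0.08, 'children': []},
        {'name': 'Solve', 'duration': 0.11, 'children': []},
    ]},
]
totals = total_time_per_timer(frames)
for name, secs in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {secs * 1000:.1f} ms")
```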
Flexible: profile.session()
Use the context manager to mix warmup and profiling:
```python
from uipc import profile

with profile.session(world, name='test', output_dir='bench') as s:
    s.advance(50)  # warmup 50 frames (no stats collected)
    s.profile(10)  # benchmark 10 frames (stats collected)

# The result is available once the with block exits
print(s.result['summary'])
```
The `advance()` and `profile()` calls are deferred: they record what to do, and no work happens until the `with` block exits. You can chain any sequence:
```python
with profile.session(world, name='phased') as s:
    s.advance(10)  # warmup phase 1
    s.profile(5)   # measure phase 1
    s.advance(20)  # warmup phase 2
    s.profile(5)   # measure phase 2
```
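One way to picture the deferred design: the session only records a plan of `(kind, frames)` steps, and `__exit__` replays that plan against the world, timing only the `profile` steps. The sketch below is a hypothetical stand-in, not the `uipc.profile` implementation; `FakeWorld` replaces a real `World` so the example runs without a GPU.

```python
import time


class FakeWorld:
    """Stand-in for uipc.World: only tracks the current frame index."""

    def __init__(self):
        self._frame = 0

    def advance(self):
        self._frame += 1

    def frame(self):
        return self._frame


class DeferredSession:
    """Hypothetical sketch of a deferred advance/profile session."""

    def __init__(self, world):
        self.world = world
        self.plan = []      # recorded (kind, num_frames) steps
        self.result = None

    def advance(self, n):
        self.plan.append(('advance', n))  # record only; no work yet

    def profile(self, n):
        self.plan.append(('profile', n))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False    # skip the plan if the block raised; re-raise
        profiled, wall = [], 0.0
        for kind, n in self.plan:
            for _ in range(n):
                start = time.perf_counter()
                self.world.advance()
                if kind == 'profile':  # only profile steps are measured
                    wall += time.perf_counter() - start
                    profiled.append(self.world.frame())
        self.result = {'profiled_frames': profiled, 'wall_time': wall}
        return False


world = FakeWorld()
with DeferredSession(world) as s:
    s.advance(10)
    s.profile(5)
    s.advance(20)
    s.profile(5)
    assert world.frame() == 0        # nothing has run inside the block
print(world.frame())                 # 40
print(s.result['profiled_frames'])   # [11, 12, 13, 14, 15, 36, 37, 38, 39, 40]
```

Deferring the work keeps the `with` body declarative, so the whole warmup/measure schedule is known before any frame is simulated.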
Comparing Results
```python
from uipc import profile

md = profile.compare('bench/baseline', 'bench/optimized', output_dir='comparison')
print(md)
```
This produces a Markdown report with wall-clock deltas and per-timer comparisons.
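At its core, such a comparison is a per-timer delta between two runs. A minimal sketch of that computation and its Markdown rendering, assuming each run has already been reduced to a `{timer_name: seconds}` dict; `compare_timers` is hypothetical and its table layout is not necessarily what `profile.compare()` emits.

```python
def compare_timers(before, after):
    """Render a Markdown table of per-timer deltas (hypothetical layout)."""
    lines = ["| Timer | Before (ms) | After (ms) | Delta |",
             "|---|---|---|---|"]
    for name in sorted(set(before) | set(after)):
        b = before.get(name, 0.0) * 1000.0
        a = after.get(name, 0.0) * 1000.0
        # Relative change; undefined when the timer only exists in `after`
        delta = f"{(a - b) / b * 100.0:+.1f}%" if b > 0 else "n/a"
        lines.append(f"| {name} | {b:.1f} | {a:.1f} | {delta} |")
    return "\n".join(lines)


baseline = {'Collision': 0.180, 'Solve': 0.230}
optimized = {'Collision': 0.090, 'Solve': 0.240}
print(compare_timers(baseline, optimized))
```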
Generating Reports
The benchmark result includes a SimulationStats object with visualization tools. See Performance Statistics for details on:
- `stats.summary_report()`: comprehensive Markdown + SVG report
- `stats.profiler_heatmap()`: sunburst chart
- `stats.plot()`: per-frame charts
- `stats.to_markdown()`: Markdown tables
GPU Profiling (uipc.profile.nsight)
This module wraps the Nsight Compute CLI (`ncu`) to collect kernel-level GPU metrics such as duration, SM utilization, occupancy, and register usage.
Prerequisites
- Nsight Compute must be installed. Set `NCU_PATH` or ensure `ncu` is on your `PATH`.
- Use `--ncu-set full` for duration metrics (requires admin/root on some systems).
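The lookup described in the first prerequisite can be sketched with the standard library. The resolution order here (`NCU_PATH` first, then `PATH`) and treating `NCU_PATH` as the path to the executable are assumptions about how `uipc.profile.nsight` locates the binary; `find_ncu` is a hypothetical helper.

```python
import os
import shutil


def find_ncu():
    """Return a path to the ncu executable, or None if it cannot be found.

    Assumed resolution order: NCU_PATH (path to the binary), then PATH.
    """
    explicit = os.environ.get('NCU_PATH')
    if explicit and os.path.isfile(explicit):
        return explicit
    return shutil.which('ncu')  # None when ncu is not on PATH


ncu = find_ncu()
print(ncu or "ncu not found: install Nsight Compute or set NCU_PATH")
```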
Simple: nsight.run()
```python
from uipc.profile import nsight

result = nsight.run(world, num_frames=3, name='my_scene',
                    output_dir='ncu_results', ncu_set='default')
```
Flexible: nsight.session()
```python
from uipc.profile import nsight

with nsight.session(world, name='my_scene') as s:
    s.profile(10)  # profiles 10 frames starting from the current world.frame()

# Results are available once the with block exits
print(s.result['report_md'])    # path to the Markdown report
print(s.result['report_json'])  # path to the JSON report
```
How It Works
Since ncu instruments an entire process, the profiler:
- Saves the scene to a temporary JSON file via `SceneIO`.
- If a `World` is passed, calls `world.dump()` to checkpoint the simulation state.
- Generates a Python subprocess script that loads the scene, recovers the world state (if applicable), and runs the simulation.
- Executes the script under `ncu`.
- Parses the resulting `.ncu-rep` → CSV → Markdown/JSON reports.
When a `World` at frame N is passed, the subprocess uses `world.recover(N)` to jump directly to frame N, so earlier frames are not replayed.
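The `ncu` steps of that pipeline can be sketched as two command lines: one that runs the generated script under `ncu` and exports a `.ncu-rep`, and one that converts the report to CSV. The flags used (`--set`, `--export`, `--force-overwrite`, `--import`, `--csv`, `--page details`) are standard Nsight Compute CLI options, but whether `uipc.profile.nsight` passes exactly these is an assumption; `build_ncu_commands` and the `replay.py` script name are hypothetical.

```python
import os
import sys


def build_ncu_commands(ncu, script_path, output_dir, name, ncu_set='default'):
    """Return (profile_cmd, csv_cmd) argument lists; nothing is executed here."""
    report_base = os.path.join(output_dir, name)  # ncu appends .ncu-rep
    profile_cmd = [
        ncu,
        '--set', ncu_set,             # metric set, e.g. 'default' or 'full'
        '--export', report_base,      # writes <report_base>.ncu-rep
        '--force-overwrite',          # replace a report from a previous run
        sys.executable, script_path,  # the generated replay script
    ]
    csv_cmd = [
        ncu,
        '--import', report_base + '.ncu-rep',
        '--csv', '--page', 'details',  # per-kernel metrics as CSV on stdout
    ]
    return profile_cmd, csv_cmd


prof_cmd, csv_cmd = build_ncu_commands('ncu', 'replay.py', 'ncu_results', 'my_scene')
print(' '.join(prof_cmd))
print(' '.join(csv_cmd))
# To actually run them (requires Nsight Compute and a GPU):
#   subprocess.run(prof_cmd, check=True)
#   csv_text = subprocess.run(csv_cmd, check=True, capture_output=True, text=True).stdout
```

Running the simulation in a separate process keeps `ncu`'s instrumentation away from the caller's own Python process, which is why the scene and checkpoint are written to disk first.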
Output Files
```text
<output_dir>/
  <name>_report.md     # kernel hotspot table (Markdown)
  <name>_report.json   # structured metrics with source_hints
  <name>.ncu-rep       # binary (for Nsight Compute GUI)
  <name>.csv           # raw CSV (already parsed into reports)
```
Reading the Reports
The JSON report contains per-kernel entries like:
```json
{
  "name": "StacklessBVH::calcExtNodeSplitMetrics",
  "launches": 3,
  "total_duration_ns": 123456.0,
  "registers_per_thread": 40,
  "avg_sm_pct": 100.0,
  "avg_occupancy_pct": 48.0,
  "source_hint": "src/backends/cuda/collision_detection/",
  "optimization_hints": ["Low occupancy - consider reducing registers"]
}
```
Key fields:
| Field | Meaning |
|---|---|
| `name` | Shortened kernel name (extracted from the muda lambda) |
| `total_duration_ns` | Total GPU time across all launches (nanoseconds) |
| `registers_per_thread` | Register pressure |
| `avg_occupancy_pct` | Achieved occupancy (higher is better) |
| `source_hint` | Directory where the CUDA source likely lives |
| `optimization_hints` | Auto-generated suggestions |
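A short sketch for triaging the JSON report: sort kernels by `total_duration_ns` and print each kernel's share of total GPU time next to its register count and occupancy. The top-level layout of `<name>_report.json` is not shown above, so `load_kernels` assumes either a bare list of kernel entries or a dict with a `kernels` list (a hypothetical key). The 50% occupancy threshold is an arbitrary choice, and the second kernel in the sample data is made up for illustration.

```python
import json


def load_kernels(path):
    """Load kernel entries; the top-level layout handled here is assumed."""
    with open(path) as f:
        data = json.load(f)
    return data if isinstance(data, list) else data.get('kernels', [])


def top_kernels(kernels, n=10, low_occupancy_pct=50.0):
    """Format the n most expensive kernels, flagging low occupancy."""
    total = sum(k['total_duration_ns'] for k in kernels) or 1.0  # avoid /0
    ranked = sorted(kernels, key=lambda e: e['total_duration_ns'], reverse=True)
    rows = []
    for k in ranked[:n]:
        share = 100.0 * k['total_duration_ns'] / total
        flag = ' (low occupancy)' if k['avg_occupancy_pct'] < low_occupancy_pct else ''
        rows.append(f"{share:5.1f}%  {k['name']}  regs={k['registers_per_thread']}  "
                    f"occ={k['avg_occupancy_pct']:.0f}%{flag}")
    return rows


# Synthetic entries in the per-kernel format shown above (values are illustrative)
kernels = [
    {'name': 'StacklessBVH::calcExtNodeSplitMetrics', 'total_duration_ns': 123456.0,
     'registers_per_thread': 40, 'avg_occupancy_pct': 48.0},
    {'name': 'hypothetical::other_kernel', 'total_duration_ns': 41152.0,
     'registers_per_thread': 32, 'avg_occupancy_pct': 90.0},
]
for row in top_kernels(kernels):
    print(row)
```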
Scene Shortcut
Both profile.run()/profile.session() and nsight.run()/nsight.session() also accept a Scene as a convenience shortcut. A temporary Engine + World is created internally:
```python
from uipc import profile
from uipc.profile import nsight

# These work, but passing a World is preferred:
result = profile.run(scene, num_frames=10, name='test', backend='none')

with nsight.session(scene, name='test') as s:
    s.advance(50)
    s.profile(10)
```
CLI
The CLI provides quick access to benchmarking the scene assets hosted on HuggingFace (`MuGdxy/uipc-assets`):
```shell
# List available scenes
python -m uipc.cli.benchmark list

# Run a benchmark
python -m uipc.cli.benchmark run --scene cube_ground --frames 10 --output bench

# Profile with Nsight Compute
python -m uipc.cli.benchmark profile --scene cube_ground --frames 3 --output ncu_results

# Analyze a benchmark result
python -m uipc.cli.benchmark analyze bench/cube_ground --ncu-csv ncu_results/cube_ground.csv

# Compare before/after
python -m uipc.cli.benchmark compare bench/baseline bench/optimized --output comparison
```
Full Example
```python
from uipc import Scene, Engine, World, profile
from uipc.assets import load
from uipc.profile import nsight

# 1. Build scene and world
scene = Scene(Scene.default_config())
load('cube_ground', scene)
engine = Engine('cuda', 'workspace')
world = World(engine)
world.init(scene)

# 2. Benchmark baseline
result = profile.run(world, num_frames=20, name='baseline', output_dir='bench')
print(result['summary'])

# Generate full stats report
result['stats'].summary_report(
    output_dir='bench/baseline/report',
    workspace=engine.workspace(),
)

# 3. Profile with Nsight Compute
with nsight.session(world, name='cube_ground', output_dir='ncu') as s:
    s.profile(5)

# 4. Read reports, identify bottlenecks, optimize code, rebuild

# 5. Re-benchmark after optimization
#    (after rebuilding, run this step in a new Python process so the
#    rebuilt native module is the one that gets loaded)
engine2 = Engine('cuda', 'workspace2')
world2 = World(engine2)
world2.init(scene)
after = profile.run(world2, num_frames=20, name='optimized', output_dir='bench')

# 6. Compare
md = profile.compare('bench/baseline', 'bench/optimized', output_dir='comparison')
print(md)
```