Frequently Asked Questions

Which --mask_type should I pick?

The default is mematte (Memory Efficient Matting, ViT-B Composition-1k). It produces the cleanest hair and silhouette edges of the three model-based options and is the right starting point.

  • mematte — default. Good edges, moderate VRAM.
  • vit — HuggingFace ViTMatte, tiled. Higher VRAM cost, useful if MEMatte underperforms on a specific frame.
  • sam3 — raw SAM3 mask, no alpha refinement. Fastest; use only when you want SAM3's coarse output and will refine downstream.
  • user — read pre-computed mattes from --mattes_folder. Use when you have alpha mattes from a different tool and only want the spline JSON from Rotobot Next.

When do I need --depth_folder, and what --z_threshold should I use?

Supply --depth_folder whenever you want depth-aware occlusion gating — typically multi-person scenes where one person is in front of another, or anywhere a body part crosses another body. The depth folder must contain one EXR per input frame, in sorted 1:1 order.
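
The 1:1 requirement can be checked before a long run starts. The following is a minimal sketch, not part of Rotobot Next: the helper name pair_frames_with_depth and the extension list (including .jpeg) are assumptions made for illustration, and it only verifies that the two sorted listings have the same length.

```python
from pathlib import Path

# Assumed input extensions, based on the formats this FAQ lists.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".exr"}


def pair_frames_with_depth(image_folder, depth_folder):
    """Pair each input frame with the depth EXR at the same sorted position."""
    frames = sorted(
        p for p in Path(image_folder).iterdir()
        if p.suffix.lower() in IMAGE_EXTS
    )
    depths = sorted(
        p for p in Path(depth_folder).iterdir()
        if p.suffix.lower() == ".exr"
    )
    # A count mismatch means the sorted 1:1 pairing cannot hold.
    if len(frames) != len(depths):
        raise ValueError(
            f"{len(frames)} input frames but {len(depths)} depth EXRs; "
            "the folders must match 1:1"
        )
    return list(zip(frames, depths))
```

Printing the first and last pair is a quick way to spot an off-by-one between the plate and the depth sequence.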

The default --z_threshold 0.025 (metres at the limb midline, scaled per-segment) is the result of a five-value wedge run on 4K clips at [0.2, 0.1, 0.075, 0.05, 0.025]. Findings:

  • 0.025 and 0.05 — no artefacts versus the no-depth baseline; the recommended range.
  • 0.17 (the historical default) — leaves visible artefacts; do not use.
  • Tighter thresholds also run roughly 20% faster at 4K because rays terminate earlier.
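
To repeat a wedge like this on your own footage, one option is to generate one command line per threshold. This is a sketch under assumptions: the executable name rotobot_next is taken from the worker name used elsewhere in this FAQ, base_args stands in for whatever input arguments your invocation needs (this FAQ does not name them), and the z_<value> output sub-folder naming is purely illustrative.

```python
def wedge_commands(base_args, depth_folder, output_root,
                   thresholds=(0.05, 0.025)):
    """Build one argument list per --z_threshold value.

    base_args is a caller-supplied placeholder for the input arguments;
    only flags mentioned in this FAQ are added here.
    """
    commands = []
    for z in thresholds:
        commands.append([
            "rotobot_next", *base_args,          # assumed executable name
            "--depth_folder", str(depth_folder),
            "--z_threshold", str(z),
            # Hypothetical layout: one output folder per wedge value.
            "--output_folder", f"{output_root}/z_{z}",
        ])
    return commands
```

Each returned list can be passed to subprocess.run one at a time, which keeps every wedge value in its own output folder for side-by-side comparison.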

How are licenses counted?

Per process. Each rotobot_next worker process consumes one seat, so running two workers in parallel on the same machine consumes two seats. See the Relay Server page for details.

Where does output land if I omit --output_folder?

In ./output/<image_folder_name>/ relative to the current working directory. The JSON file plus any requested sidecar folders (mattes/, debug/, trimaps/, sam3_masks/, filled_shapes/) are written under that root.
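
For scripts that need to locate results afterwards, the default location can be reconstructed as below. This is a sketch, assuming <image_folder_name> means the last path component of the input folder; the helper name default_output_root is hypothetical and not part of the tool.

```python
from pathlib import Path

# Sidecar folders named in this FAQ; each exists only if it was requested.
SIDECAR_FOLDERS = ("mattes", "debug", "trimaps", "sam3_masks", "filled_shapes")


def default_output_root(image_folder, cwd="."):
    """Return <cwd>/output/<image_folder_name> for a given input folder."""
    # Assumption: the folder name is the final component of the input path.
    return Path(cwd) / "output" / Path(image_folder).name


def existing_sidecars(output_root):
    """List the sidecar folders that are actually present under the root."""
    root = Path(output_root)
    return [name for name in SIDECAR_FOLDERS if (root / name).is_dir()]
```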

Can I run multiple workers on one GPU?

Yes, depending on input resolution and available VRAM:

  • HD (1080p) / 1440p — two workers fit comfortably on a 24 GB+ GPU (each worker takes around 14–15 GB).
  • 4K / UHD — single worker. Two workers will OOM the ViTMatte / MEMatte tile pass on a 24 GB card and run unreliably on 48 GB.

Most of the batch scripts in visualisation/ use a flock-based claim pattern to share a queue between worker processes safely.
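
The general idea of a flock-based claim can be sketched as follows. This is not the code from the visualisation/ scripts: the lock-file layout, the .done marker, and the function names are all assumptions for illustration. It relies on fcntl, so it applies to Linux and macOS only.

```python
import fcntl
import os
from pathlib import Path


def try_claim(job_name, lock_dir):
    """Return an fd holding an exclusive lock on the job, or None."""
    lock_dir = Path(lock_dir)
    lock_dir.mkdir(parents=True, exist_ok=True)
    fd = os.open(lock_dir / f"{job_name}.lock", os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Non-blocking: fail immediately if another worker holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # Hypothetical marker file: skip jobs an earlier worker already finished.
    if (lock_dir / f"{job_name}.done").exists():
        os.close(fd)
        return None
    return fd


def claim_next(job_names, lock_dir):
    """Claim the first job that is neither locked nor finished."""
    for name in job_names:
        fd = try_claim(name, lock_dir)
        if fd is not None:
            return name, fd
    return None


def finish(job_name, lock_dir, fd):
    """Mark the job done, then release the lock by closing the fd."""
    (Path(lock_dir) / f"{job_name}.done").touch()
    os.close(fd)
```

Each worker loops on claim_next, processes the returned job while keeping the fd open, and calls finish afterwards. If a worker crashes, the kernel drops its lock and the job becomes claimable again.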

What input formats does Rotobot Next accept?

JPG, PNG, and EXR sequences. EXR input goes through OpenColorIO; the bundle ships the ACES 1.0.3 config so ACEScg-linear EXRs are colour-managed correctly by default. Set OCIO=/path/to/config.ocio to point at a different config.

Why was my first run slow on a cold machine?

The bundled binary materialises the SAM3, SAM3D-Body, ViTMatte / MEMatte, and MoGE-2 weights on first call (about 30–60 seconds on a typical NVMe). Subsequent runs in the same process tree reuse cached weights and start instantly.

--help itself returns in well under a second — it skips model loading entirely.

What environment variables does the binary read?

TOKGAN_RELAY_URL
    License relay URL. Default http://0.0.0.0:6349. See the Relay Server page.
OCIO
    Path to an OpenColorIO config file. Set it only when you need colour management other than the bundled ACES 1.0.3 config, which covers most VFX workflows.
SAM3D_DETECTOR_PATH
    Override the bundled vitdet detector folder.
SAM3D_SEGMENTOR_PATH
    Override the bundled SAM3 segmentor folder.
SAM3D_FOV_PATH
    Override the bundled MoGE-2 FOV estimator folder.
SAM3D_MHR_PATH
    Override the bundled Momentum Human Rig assets folder.
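
When launching workers from a script, these variables can be set per process instead of globally. A minimal sketch, assuming a hypothetical helper named worker_env; it only builds the environment mapping and sets nothing unless you pass a value.

```python
import os


def worker_env(relay_url=None, ocio_config=None, base_env=None):
    """Return a copy of the environment with optional overrides applied."""
    env = dict(os.environ if base_env is None else base_env)
    if relay_url is not None:
        env["TOKGAN_RELAY_URL"] = relay_url
    if ocio_config is not None:
        env["OCIO"] = str(ocio_config)
    return env
```

The result can be passed as the env argument of subprocess.run, for example subprocess.run(["rotobot_next", "--help"], env=worker_env(ocio_config="/path/to/config.ocio")), assuming the executable is on PATH under that name.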