Rotobot Next CLI Reference
Rotobot Next is a command-line tool that produces per-person spline JSON and, optionally, raster sidecars from a folder of frames. A network license relay must be reachable while the binary runs; see the Relay Server page for setup and the FAQ for common questions.
CLI Arguments
usage: rotobot_next [-h] --image_folder IMAGE_FOLDER
                    [--output_folder OUTPUT_FOLDER]
                    [--depth_folder DEPTH_FOLDER] [--auto_depth]
                    [--auto_depth_window AUTO_DEPTH_WINDOW]
                    [--z_threshold Z_THRESHOLD]
                    [--mask_type {sam3,vit,mematte,user}]
                    [--mattes_folder MATTES_FOLDER] [--mattes_output]
                    [--debug_output] [--filled_shapes_per_person_output]
                    [--output_trimap] [--output_sam3_mask]
                    [--sam3_mask_combine {best_iou,union_top2,union}]
                    [--vitmatte_tile_size VITMATTE_TILE_SIZE]
                    [--vitmatte_tile_overlap VITMATTE_TILE_OVERLAP]
                    [--mematte_max_tokens MEMATTE_MAX_TOKENS]
                    [--bbox_thresh BBOX_THRESH] [--use_mask]
                    [--checkpoint_path CHECKPOINT_PATH]
                    [--detector_name DETECTOR_NAME]
                    [--segmentor_name SEGMENTOR_NAME] [--fov_name FOV_NAME]
                    [--detector_path DETECTOR_PATH]
                    [--segmentor_path SEGMENTOR_PATH] [--fov_path FOV_PATH]
                    [--mhr_path MHR_PATH]
                    [--mematte_config_path MEMATTE_CONFIG_PATH]
                    [--mematte_checkpoint_path MEMATTE_CHECKPOINT_PATH]

rotobot_next: produce per-person spline JSON and optional raster sidecars
(mattes, trimaps, debug overlays) from a folder of frames, with optional
depth-aware occlusion gating.

options:
  -h, --help            show this help message and exit

input / output:
  --image_folder IMAGE_FOLDER
                        Folder of input frames (JPG/PNG/EXR; sorted
                        alphabetically).
  --output_folder OUTPUT_FOLDER
                        Output folder for JSON + sidecars (default:
                        ./output/<image_folder_name>).
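
The two input/output rules above can be illustrated with a short sketch. It derives the default output folder from the image folder name and shows what plain alphabetical ordering does to unpadded frame numbers. `default_output_folder` is a hypothetical helper, not part of rotobot_next, and the sketch assumes "sorted alphabetically" means an ordinary lexicographic sort of file names.

```python
from pathlib import PurePosixPath


def default_output_folder(image_folder: str) -> PurePosixPath:
    """Mirror the documented default: ./output/<image_folder_name>."""
    # PurePosixPath drops a trailing slash, so "./frames/" still yields "frames".
    return PurePosixPath("output") / PurePosixPath(image_folder).name


# Lexicographic order compares character by character, so "10" sorts before "2".
unpadded = sorted(["frame_2.png", "frame_10.png", "frame_1.png"])
# Zero-padded names sort in true frame order.
padded = sorted(["frame_0002.png", "frame_0010.png", "frame_0001.png"])

print(default_output_folder("./shots/sh010/frames"))
print(unpadded)
print(padded)
```

If the assumption about lexicographic sorting holds, zero-padding frame numbers is the simplest way to keep frames in shot order.
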
depth-aware occlusion gating:
  --depth_folder DEPTH_FOLDER
                        Folder of metric-depth EXR files (one per input frame,
                        sorted 1:1). When supplied, enables depth-aware
                        occlusion gating; when omitted, the pipeline runs in
                        pure-RGB mode.
  --auto_depth          Run a Depth-Anything-3 (DA3METRIC-LARGE) prepass over
                        --image_folder before matting. Writes one EXR per
                        frame to <output_folder>/auto_depth/ and then enables
                        depth-aware occlusion gating against that folder. Not
                        meant to be combined with --depth_folder: if both are
                        given, the explicit folder always wins.
  --auto_depth_window AUTO_DEPTH_WINDOW
                        Temporal window (frames) for the --auto_depth prepass.
                        Default 3 matches the standalone DA3 CLI; the centre
                        frame of each window is written to disk.
  --z_threshold Z_THRESHOLD
                        Metric depth tolerance (m) at a limb's midline, scaled
                        per-segment by SEGMENT_DEPTH_MULT. Only used with
                        --depth_folder. Default 0.025. Values of 0.025 and
                        0.05 produce no artefacts vs. the no-depth baseline;
                        the previous 0.17 default left visible artefacts. A
                        tighter threshold also runs ~20% faster at 4K (rays
                        terminate earlier).
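
Because --depth_folder expects one EXR per input frame, matched 1:1 after sorting, a quick pre-flight check can catch a missing or extra depth file before a long run. The sketch below is a minimal illustration, not the tool's own validation: `pair_frames_with_depth` is a hypothetical helper, the extension set is an assumption based on the "JPG/PNG/EXR" note, and the pairing is assumed to be purely positional after a lexicographic sort.

```python
from pathlib import PurePath

# Assumption: extensions taken from the "JPG/PNG/EXR" note in the help text.
FRAME_EXTS = {".jpg", ".jpeg", ".png", ".exr"}


def pair_frames_with_depth(frame_names, depth_names):
    """Pair sorted frames with sorted depth EXRs, failing on a count mismatch."""
    frames = sorted(n for n in frame_names if PurePath(n).suffix.lower() in FRAME_EXTS)
    depths = sorted(n for n in depth_names if PurePath(n).suffix.lower() == ".exr")
    if len(frames) != len(depths):
        raise ValueError(f"{len(frames)} frames but {len(depths)} depth files")
    # Pairing is purely positional: the i-th frame gets the i-th depth file.
    return list(zip(frames, depths))


# File names here are made up; in practice pass os.listdir() of each folder.
for frame, depth in pair_frames_with_depth(
    ["f_0002.png", "f_0001.png", "notes.txt"],
    ["d_0002.exr", "d_0001.exr"],
):
    print(frame, "<->", depth)
```
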
matting:
  --mask_type {sam3,vit,mematte,user}
                        Matte source: mematte (default; Memory Efficient
                        Matting, ViT-B, Composition-1k), vit (Hugging Face
                        ViTMatte, tiled), sam3 (raw SAM3 mask), or user (read
                        pre-computed mattes from --mattes_folder).
  --mattes_folder MATTES_FOLDER
                        Folder of pre-computed matte images. Only used with
                        --mask_type user.

output sidecars (off by default):
  --mattes_output       Write final alpha mattes as JPEGs under
                        <output_folder>/mattes/.
  --debug_output        Write pixel-based debug visualisations under
                        <output_folder>/debug/.
  --filled_shapes_per_person_output
                        Write per-person filled-shape rasters under
                        <output_folder>/filled_shapes/.
  --output_trimap       Save the per-person trimap fed to ViTMatte as JPEG
                        under <output_folder>/trimaps/ (0=bg, 128=unknown band
                        between SAM3 erode/dilate, 255=fg). One JPG per person
                        per frame. Pair with --output_sam3_mask to diagnose
                        whether bad alpha edges come from the trimap or from
                        the matting model itself.
  --output_sam3_mask    Save the greyscale SAM3 mask (max-of-BGR over the
                        cleaned SAM3 matte, before thresholding) as JPEG under
                        <output_folder>/sam3_masks/. This is the source the
                        trimap is built from.
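
To make the trimap encoding concrete (0=bg, 128=unknown band between erode and dilate, 255=fg), here is a minimal pure-Python sketch of the general erode/dilate technique. It is not the tool's implementation: the square structuring element, the radius of 1, the border handling, and the function names are all assumptions chosen for illustration, and the input is taken to be an already thresholded 0/1 mask.

```python
def _morph(mask, radius, reduce_fn):
    """Apply min (erode) or max (dilate) over a square window around each pixel."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # The window is clipped at the image border (an assumption).
            window = [
                mask[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = reduce_fn(window)
    return out


def make_trimap(mask, radius=1):
    """Build a 0/128/255 trimap from a binary (0/1) mask."""
    eroded = _morph(mask, radius, min)   # confident foreground core
    dilated = _morph(mask, radius, max)  # foreground plus a safety margin
    return [
        [255 if e else (128 if d else 0) for e, d in zip(erow, drow)]
        for erow, drow in zip(eroded, dilated)
    ]


# 7x7 mask with a 3x3 foreground square in the middle.
mask = [[1 if 2 <= y <= 4 and 2 <= x <= 4 else 0 for x in range(7)] for y in range(7)]
for row in make_trimap(mask):
    print(" ".join(f"{v:3d}" for v in row))
```

Pixels that survive erosion become 255, pixels reached only by dilation become 128, and everything else is 0. The matting model then only has to resolve alpha inside the 128 band, which is why a trimap sidecar is useful when diagnosing edge problems.
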
tuning (defaults are good):
  --sam3_mask_combine {best_iou,union_top2,union}
                        How to combine SAM3's three candidate masks (subpart /
                        part / whole). union_top2 (default): OR of the top two
                        by IoU; best trade-off between coverage and strays.
                        best_iou: single highest-IoU candidate; fewer
                        hallucinations, may drop head/hair pixels. union: OR
                        of all three (legacy; broader coverage, more strays).
                        Connected-component gating runs after the combine.
  --vitmatte_tile_size VITMATTE_TILE_SIZE
                        Edge length (px) of each square ViTMatte tile. Larger
                        tiles see more global context per inference but use
                        more VRAM. Default 1024.
  --vitmatte_tile_overlap VITMATTE_TILE_OVERLAP
                        Pixel overlap between adjacent ViTMatte tiles. Default
                        128.
  --mematte_max_tokens MEMATTE_MAX_TOKENS
                        MEMatte backbone token budget. At 4K with topk=0.25
                        the router picks ~8000 tokens so this cap rarely
                        fires. Default 24000.
  --bbox_thresh BBOX_THRESH
                        Bounding box detection threshold. Default 0.8.
  --use_mask            Use mask-conditioned prediction (segmentation mask
                        auto-generated from bbox).
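
The three --sam3_mask_combine modes can be read as a selection over candidates ranked by IoU, followed by a pixel-wise OR. The sketch below illustrates that reading on tiny flattened masks. `combine_candidates` is a hypothetical name, the IoU values are assumed to be one score per candidate, and the connected-component gating that runs afterwards is not modelled.

```python
def combine_candidates(masks, ious, mode="union_top2"):
    """Combine candidate masks (flat lists of 0/1) according to the chosen mode."""
    # Rank candidate indices from highest to lowest IoU score.
    order = sorted(range(len(masks)), key=lambda i: ious[i], reverse=True)
    if mode == "best_iou":
        keep = order[:1]      # single highest-IoU candidate
    elif mode == "union_top2":
        keep = order[:2]      # OR of the two highest-IoU candidates
    elif mode == "union":
        keep = order          # OR of every candidate
    else:
        raise ValueError(f"unknown mode: {mode}")
    n_pixels = len(masks[0])
    return [any(masks[i][p] for i in keep) for p in range(n_pixels)]


# Made-up candidates over four pixels: subpart, part, whole.
masks = [[0, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0]]
ious = [0.6, 0.9, 0.8]
for mode in ("best_iou", "union_top2", "union"):
    print(mode, combine_candidates(masks, ious, mode))
```

In this toy case the low-scoring first candidate contributes a stray pixel only under union, which matches the help text's description of union as broader but noisier.
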
model paths (advanced; defaults are baked into the Docker image):
  --checkpoint_path CHECKPOINT_PATH
                        Path to SAM 3D Body model checkpoint.
  --detector_name DETECTOR_NAME
                        Human detection model name (default: vitdet).
  --segmentor_name SEGMENTOR_NAME
                        Human segmentation model name (default: sam3).
  --fov_name FOV_NAME   FOV estimation model name (default: moge2).
  --detector_path DETECTOR_PATH
                        Human detection model folder (or set
                        SAM3D_DETECTOR_PATH).
  --segmentor_path SEGMENTOR_PATH
                        Human segmentation model folder (or set
                        SAM3D_SEGMENTOR_PATH).
  --fov_path FOV_PATH   FOV estimation model folder (or set SAM3D_FOV_PATH).
  --mhr_path MHR_PATH   MoHR/assets folder (or set SAM3D_MHR_PATH).
  --mematte_config_path MEMATTE_CONFIG_PATH
                        Path to MEMatte LazyConfig file. Default: ViT-B
                        Composition-1k.
  --mematte_checkpoint_path MEMATTE_CHECKPOINT_PATH
                        Path to MEMatte .pth checkpoint. Default: ViT-B
                        Composition-1k weights.

Examples:
  # Pure-RGB pass, defaults (mematte alpha, union_top2 SAM3 combine).
  rotobot_next --image_folder ./frames --output_folder ./out

  # Depth-aware pass with all sidecars for debugging.
  rotobot_next \
    --image_folder ./frames \
    --depth_folder ./frames_depth \
    --output_folder ./out \
    --mattes_output --debug_output \
    --output_trimap --output_sam3_mask \
    --filled_shapes_per_person_output

Environment variables (used when the matching --*_path flag is empty):
  SAM3D_DETECTOR_PATH   human detection model folder
  SAM3D_SEGMENTOR_PATH  human segmentation model folder
  SAM3D_FOV_PATH        FOV estimation model folder
  SAM3D_MHR_PATH        MoHR / assets folder
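
The flag-then-environment precedence described above can be sketched in a few lines. This is an illustration of the documented rule only: `resolve_model_path` and `ENV_FALLBACKS` are hypothetical names, and what the binary does when both the flag and the variable are empty is not documented here, so the sketch simply returns an empty string in that case.

```python
import os

# Flag name -> environment variable, as listed in the help text.
ENV_FALLBACKS = {
    "detector_path": "SAM3D_DETECTOR_PATH",
    "segmentor_path": "SAM3D_SEGMENTOR_PATH",
    "fov_path": "SAM3D_FOV_PATH",
    "mhr_path": "SAM3D_MHR_PATH",
}


def resolve_model_path(flag_name, flag_value, environ=None):
    """Return the flag value if set, else the matching environment variable."""
    environ = os.environ if environ is None else environ
    if flag_value:  # a non-empty flag takes priority
        return flag_value
    # Assumption: an unset variable is treated the same as an empty one.
    return environ.get(ENV_FALLBACKS[flag_name], "")


# Example with an explicit mapping instead of the real process environment.
env = {"SAM3D_FOV_PATH": "/models/fov"}
print(resolve_model_path("fov_path", "", env))
print(resolve_model_path("fov_path", "/custom/fov", env))
```
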
Dependencies & Open Source
The following libraries are packaged and utilized within the Rotobot Next engine:
CPython: License, Source Repository
CUDA: License, Source Repository
cuDNN: License, Source Repository
Depth Anything 3: License, Source Repository
Detectron2: License, Source Repository
DINOv3: License, Source Repository
ftfy (fixes text for you), Python: License, Source Repository
HuggingFace Transformers: License, Source Repository
iopath: License, Source Repository
MEMatte: License, Source Repository
Momentum Human Rig: License, Source Repository
MoGe: License, Source Repository
NumPy: License, Source Repository
OpenColorIO: License, Source Repository
OpenImageIO: License, Source Repository
OpenCV: License, Source Repository
PyInstaller: License, Source Repository
PyTorch: License, Source Repository
regex (Python): License, Source Repository
SAM3 (Meta): License, Source Repository
SAM 3D Body: License, Source Repository
SciPy: License, Source Repository
timm: License, Source Repository
tqdm: License, Source Repository
ViTMatte: License, Source Repository