Video Reframe

Auto-crop horizontal video to vertical (9:16), square (1:1), or portrait (4:5) with AI active-speaker framing. TikTok-ready in one call: each shot gets one crop locked on the speaker, and the framing changes only at shot boundaries.

Video

Target Aspect Ratio

Transcript (for multi-speaker tracking)

Best for

• Reframing 16:9 podcast, interview, or documentary footage to 9:16 for TikTok, Reels, and Shorts
• Tracking a specific subject (host, product, screen) when content has multiple focal points
• Producing 1:1 Instagram squares from horizontal masters with active-speaker tracking
• Closing the long-form-to-shorts loop when chained with video-trim and captions

When to use

• After video-trim: trim picks the moment, reframe picks the framing
• Before captions: reframe first so caption text fits the target canvas (for example 9:16)
• Standalone, for already-edited horizontal clips that need vertical versions

Tips

✓ Supply a diarized transcript for interviews and panels. It drives per-shot active-speaker framing far more reliably than vision alone.
✓ Shots with no clear single subject (screen-shares, wide establishing shots) letterbox by design. That is the safe choice, not a miss.
✓ There are no pan, zoom, or smoothing knobs to tune. The camera director is deterministic lock-and-cut: it locks one static crop per shot and changes framing only at a cut.

Recipes using this pipeline

Long video → multiple captioned clips, in one command

Slice any long video into N captioned shorter clips. Transcribe once, auto-pick the moments with AI, trim + caption each in parallel. Optionally reformat to vertical (9:16) for TikTok / Reels / Shorts, square (1:1) for Instagram, or portrait (4:5) — caption anchor auto-adjusts.

6 pipelines · CLI

Frequently Asked Questions

What aspect ratios are supported?

9:16 (TikTok/Reels/Shorts), 1:1 (Instagram square), 4:5 (Instagram portrait), and 16:9 (passthrough — no reframing applied).

What if my source is already vertical?

If the source already matches the target aspect ratio, the pipeline detects this and returns the source as-is with meta.skipped=already_target_aspect.
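
As a rough illustration of this kind of check, the sketch below compares the source aspect ratio with the target and reports a skip. The function names, the 1% tolerance, and the shape of the returned dictionary are assumptions made for the example. Only the meta.skipped value comes from the answer above.

```python
from math import isclose


def already_target_aspect(src_w: int, src_h: int,
                          target_w: int, target_h: int,
                          rel_tol: float = 0.01) -> bool:
    """True when the source aspect ratio matches the target.

    rel_tol is an assumed tolerance that absorbs off-by-a-few-pixels
    encodes; the real pipeline's threshold is not documented here.
    """
    return isclose(src_w / src_h, target_w / target_h, rel_tol=rel_tol)


def reframe_or_skip(src_w: int, src_h: int, target_w: int, target_h: int) -> dict:
    """Hypothetical result shape: report a skip instead of re-encoding."""
    if already_target_aspect(src_w, src_h, target_w, target_h):
        return {"meta": {"skipped": "already_target_aspect"}}
    return {"meta": {}}
```

With these assumptions, a 1080x1920 source sent to 9:16 is reported as skipped, while a 1920x1080 source is not.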

How does it pick who to follow?

Audio-first. If you supply a diarized transcript it frames the active speaker per shot; otherwise a per-shot vision label plus a CV face detector decide. Shots with no clear single subject — screen-shares, wide establishing shots — letterbox to show the whole frame rather than guessing.
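
One plausible way to turn a diarized transcript into a per-shot speaker choice is to pick whoever talks longest within each shot. The sketch below does that. The most-talk-time rule, the function name, and the tuple layouts are assumptions for illustration, not the pipeline's documented method.

```python
from collections import defaultdict


def active_speaker_per_shot(shots, segments):
    """Pick the speaker with the most talk time inside each shot.

    shots:    list of (start, end) times in seconds
    segments: list of (start, end, speaker) diarized transcript entries
    Returns one speaker label per shot, or None when nobody speaks.
    """
    result = []
    for shot_start, shot_end in shots:
        talk_time = defaultdict(float)
        for seg_start, seg_end, speaker in segments:
            # Length of the intersection of the shot and the segment.
            overlap = min(shot_end, seg_end) - max(shot_start, seg_start)
            if overlap > 0:
                talk_time[speaker] += overlap
        result.append(max(talk_time, key=talk_time.get) if talk_time else None)
    return result


shots = [(0, 5), (5, 12), (12, 15)]
segments = [(0, 4.5, "host"), (4.5, 11, "guest"), (11, 12, "host")]
speakers = active_speaker_per_shot(shots, segments)
```

A shot that comes back as None has no speech in the transcript, which is the case where the vision label and face detector, or a letterbox, would have to decide.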

Does it ever pan or zoom around?

No. Each shot gets one static crop held perfectly still; the camera only ever cuts at a shot boundary. This makes the jittery-pan / hunting-camera artifact class structurally impossible.
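
The geometry of a single static crop can be sketched as follows: take the largest window of the target aspect ratio that fits in the source, centre it horizontally on the subject, and clamp it to the frame. The Crop type, the function name, and the idea of a single subject x-coordinate per shot are assumptions for the example, not the pipeline's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crop:
    x: int
    y: int
    width: int
    height: int


def static_crop(src_w: int, src_h: int,
                target_w: int, target_h: int,
                subject_cx: int) -> Crop:
    """Largest target-aspect window inside the source frame.

    The window is centred horizontally on subject_cx (in pixels),
    clamped to the frame edges, and centred vertically.
    """
    # Cross-multiply to compare src_w/src_h with target_w/target_h
    # without floating point.
    if src_w * target_h >= src_h * target_w:
        # Source is wider than the target: keep full height, trim width.
        crop_h = src_h
        crop_w = src_h * target_w // target_h
    else:
        # Source is narrower than the target: keep full width, trim height.
        crop_w = src_w
        crop_h = src_w * target_h // target_w
    x = subject_cx - crop_w // 2
    x = max(0, min(x, src_w - crop_w))  # keep the window inside the frame
    y = (src_h - crop_h) // 2
    return Crop(x, y, crop_w, crop_h)


# One crop per shot, computed once and held for the whole shot.
centred = static_crop(1920, 1080, 9, 16, subject_cx=960)
```

Because the crop is computed once per shot and never updated mid-shot, nothing in this scheme can drift or jitter between cuts.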

What does it cost?

2 credits for clips up to 1 minute, 3.5 credits for clips of 1 to 2 minutes, and 5 credits for anything longer.
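
For budgeting, the tiers can be written as a small helper. This is a sketch: the function name is hypothetical, and it assumes the boundaries are inclusive, so a clip of exactly 1 minute costs 2 credits and a clip of exactly 2 minutes costs 3.5.

```python
def reframe_credits(duration_seconds: float) -> float:
    """Credits for one reframe call, following the tiers above.

    Assumes inclusive upper bounds at 60 s and 120 s.
    """
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds <= 60:
        return 2.0
    if duration_seconds <= 120:
        return 3.5
    return 5.0


# Estimated total for a batch of clips (durations in seconds).
batch_total = sum(reframe_credits(d) for d in [45, 90, 600])
```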

Can I chain it with video-trim?

Yes. The typical chain is video-trim (length) → video-reframe (aspect) → captions (burn-in). Most users will want all three for a finished short.