Long video → multiple captioned clips, in one command
Slice any long video into N captioned shorter clips. Transcribe once, auto-pick the moments with AI, trim + caption each in parallel. Optionally reformat to vertical (9:16) for TikTok / Reels / Shorts, square (1:1) for Instagram, or portrait (4:5) — caption anchor auto-adjusts.
Got a long video and want a dozen shorter ones out of it? Transcribe the whole thing once, let AI pick the best moments, then trim and caption each clip. Pass --reformat to crop to vertical for TikTok or Reels; leave it off and the source aspect is preserved for YouTube cards or Instagram feeds.
Two ways, same result
Either paste the prompt into your AI agent, or run a single command in your terminal — both invoke the same recipe.
Run the pipe2 clip-factory recipe — one long video → N captioned, watermarked clips. Picks moments automatically.
Dispatch: pipe2 recipe run clip-factory --input <video-url-or-local-path> --reformat 9:16
Tune the picker with --highlights-count N and --highlights-style "the funniest moments". Power-user override: --clips path/to/clips.json (JSON array of {"context","start_sec","end_sec"}).
Returns a JSON array of clip URLs.
Requires the pipe2 CLI and PIPE2_TOKEN; fetches the recipe from GitHub.
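The --clips override takes a JSON array of {"context","start_sec","end_sec"} objects. A minimal Python sketch for producing one is shown here; the function name build_clips_json and the validation rules (non-negative start, end after start) are assumptions for illustration, not part of the pipe2 CLI. The sample window is taken from the demo output further down.

```python
import json


def build_clips_json(clips):
    """Validate manual clip windows and serialize them as a JSON array.

    Hypothetical helper: only the three field names come from the recipe
    text; the checks below are an assumption about what a sane window is.
    """
    rows = []
    for clip in clips:
        start = float(clip["start_sec"])
        end = float(clip["end_sec"])
        if start < 0 or end <= start:
            raise ValueError(f"bad window: {start}..{end}")
        rows.append(
            {"context": str(clip["context"]), "start_sec": start, "end_sec": end}
        )
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    manual = [
        {
            "context": "Peter explains OpenClaw's core differentiator",
            "start_sec": 95.24,
            "end_sec": 127.64,
        }
    ]
    # Save this text as clips.json and pass it with --clips path/to/clips.json
    print(build_clips_json(manual))
```

Validating before the run keeps a malformed window from being discovered only after transcription has already been paid for.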
pipe2 recipe run clip-factory --input https://www.youtube.com/watch?v=4uzGDAoNOZc --reformat 9:16
6 pipelines transform one input into one output
What each step takes in, and what it produces. The label after each step (text, json, or video) shows the output kind.
Estimated at recipe defaults. Final cost may vary — metered models bill by actual token usage, and optional steps add to the total when their inputs are supplied.
- transcription 0.8 cr
ElevenLabs Scribe transcribes the full source once — cached on the source hash, so re-runs and every clip after the first are free. The trim reads these words to find each moment.
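Caching on the source hash can be sketched as follows. This is an illustration under stated assumptions: SHA-256 as the hash, an in-memory dict as the store, and the class name TranscriptCache are all hypothetical, and the recipe text does not say which hash or storage the service uses.

```python
import hashlib


class TranscriptCache:
    """Memoize a transcription function on the hash of the source bytes."""

    def __init__(self, transcribe):
        self._transcribe = transcribe  # callable: bytes -> transcript
        self._store = {}               # assumption: in-memory store

    def get(self, source_bytes):
        # Identical bytes always produce the same key, so a re-run on the
        # same source never calls the (billed) transcriber a second time.
        key = hashlib.sha256(source_bytes).hexdigest()
        if key not in self._store:
            self._store[key] = self._transcribe(source_bytes)
        return self._store[key]
```

Keying on content rather than on the URL or filename means a renamed or re-uploaded copy of the same file still hits the cache.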
input source
OpenClaw Creator — Why 80% of Apps Will Disappear
www.youtube.com/watch?v=4uzGDAoNOZc
Demo previews are five clips cut from this interview. Drop in any long-form video to make your own.
text
00:00:00,180 Today, I'm sitting down with Peter Steinberger, the creator of OpenClaw, the open source personal AI agent that has completely taken over the internet.
00:00:09,060 The GitHub repo exploded to over 160,000 stars practically overnight.
00:00:14,130 The community has built countless projects, like Malt Book, where bots talk among themselves.
00:00:19,560 And now, the bots are even renting humans to do tasks in the real world.
00:00:24,220 In our conversation, we discuss his aha moment, his contrarian development philosophies, and what this means for builders in 2026.
00:00:32,740 Let's dive in.
00:00:38,980 So good to see you, man.
00:00:39,960 Hey, what's up?
00:00:40,760 Um, so you've made something people want.
…
-
Reads the transcript and picks N editorial moments — the auto-pick path. Skipped when --clips supplies a manual JSON list.
json
[
  {"context": "Peter explains OpenClaw's core differentiator: local execution gives it access to everything the user can do, unlike cloud-based AI.", "desired_seconds": 32, "start_sec": 95.24000000000001, "end_sec": 127.64},
  {"context": "The vivid moment Peter realized OpenClaw's creative problem-solving: it autonomously transcribed a voice message using ffmpeg and OpenAI's API without being explicitly programmed to do so.", "desired_seconds": 106, "start_sec": 515.182, "end_sec": 621.252},
  {"context": "Peter's contrarian prediction: 80% of apps disappear because personal AI agents manage data and tasks more naturally than purpose-built applications.", "desired_seconds": 55, "start_sec": 641.792, "end_sec": 696.312},
  {"context": "Peter's philosophy on building: minimize friction by using Unix tools and CLIs instead of inventing new abstractions, letting the model handle creative problem-solving.", "desired_seconds": 54, "start_sec": 1178.264, "end_sec": 1232.584},
  {"context": "Peter articulates why swarm intelligence mirrors human society: individuals alone can't build iPhones or go to space, but groups specializing together achieve anything.", "desired_seconds": 46, "start_sec": 258, "end_sec": 303.5}
]
- video-trim 0.1 cr
Per clip: a deterministic SRT slice plus ffmpeg cut to the window that highlights picked, snapped to sentence boundaries. Returns the windowed transcript, rebased to the clip's own timeline, for the captions step.
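A rough sketch of the slice-and-rebase idea, under assumptions: cues are (start_sec, end_sec, text) tuples sorted by start, each cue is treated as one sentence, and the window snaps outward to the cues it touches. The function names are hypothetical, and the cue end times in the sample are assumed to equal the next cue's start, since the transcript above only shows start times.

```python
# Sample cues from the demo transcript (end times are an assumption).
CUES = [
    (9.06, 14.13, "The GitHub repo exploded to over 160,000 stars practically overnight."),
    (14.13, 19.56, "The community has built countless projects, like Malt Book, where bots talk among themselves."),
    (19.56, 24.22, "And now, the bots are even renting humans to do tasks in the real world."),
]


def snap_window(cues, start_sec, end_sec):
    """Widen a window to the boundaries of the cues it overlaps."""
    touched = [c for c in cues if c[1] > start_sec and c[0] < end_sec]
    if not touched:
        raise ValueError("window does not overlap any cue")
    return touched[0][0], touched[-1][1]


def slice_and_rebase(cues, start_sec, end_sec):
    """Return the cues inside the snapped window, with times relative to it."""
    start, end = snap_window(cues, start_sec, end_sec)
    return [
        (round(s - start, 3), round(e - start, 3), text)
        for s, e, text in cues
        if s >= start and e <= end
    ]


if __name__ == "__main__":
    for cue in slice_and_rebase(CUES, 10.0, 18.0):
        print(cue)
```

Rebasing matters because the captions step sees only the trimmed clip, whose timeline starts at zero rather than at the original offset.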
video
-
Per clip (only when --reformat is set): reframes to the requested aspect ratio with the lock-and-cut camera director — it frames the active speaker in every shot (from the windowed transcript + CV faces) and cuts cleanly at shot boundaries, never drifting or panning. Skipped by default — the source's native aspect is preserved.
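The geometry behind one locked shot can be sketched as below. This covers only the crop arithmetic, not the director itself: the function name crop_window, the face_cx input (horizontal face center in pixels), and the even-dimension rounding are assumptions for illustration.

```python
def crop_window(src_w, src_h, aspect, face_cx):
    """Largest crop of the given aspect (e.g. "9:16"), centered on a face.

    Returns (crop_w, crop_h, x, y). The window is fixed for the whole
    shot, which is what keeps the frame from drifting or panning.
    """
    aw, ah = (int(part) for part in aspect.split(":"))
    if src_w * ah >= src_h * aw:
        # Source is wider than the target: keep full height, trim width.
        crop_h = src_h
        crop_w = src_h * aw // ah
    else:
        # Source is taller than the target: keep full width, trim height.
        crop_w = src_w
        crop_h = src_w * ah // aw
    # Many video encoders expect even dimensions.
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2
    # Center horizontally on the face, clamped so the crop stays in frame.
    x = int(round(face_cx - crop_w / 2))
    x = max(0, min(x, src_w - crop_w))
    y = (src_h - crop_h) // 2
    return crop_w, crop_h, x, y


if __name__ == "__main__":
    print(crop_window(1920, 1080, "9:16", 960))
```

For a 1920x1080 source and 9:16, the crop is 606 pixels wide, so a speaker near either edge gets a window pushed against that edge instead of one that runs off the frame.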
video
- captions 0.2 cr
Per clip: burns the windowed transcript onto the clip in your chosen preset, at the anchor set by --position.
video
- watermark 0.5 cr
Overlays the Pipe2 logo (or your own --watermark-url) at the top-left corner so the clip carries attribution across reposts. Pass --no-watermark to skip.
video
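The overall shape of the recipe (transcribe once, then run the per-clip steps in parallel) can be sketched like this. The names fan_out, transcribe, and per_clip are hypothetical stand-ins; the real recipe's internals are not shown in the text above, and a thread pool is just one way to express the fan-out.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(source, clips, transcribe, per_clip, max_workers=4):
    """Transcribe the source once, then process every clip in parallel.

    transcribe: callable(source) -> transcript
    per_clip:   callable(source, transcript, clip) -> clip URL
                (stands in for trim, optional reformat, captions, watermark)
    """
    transcript = transcribe(source)  # shared by every clip
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so the output list
        # lines up with the clips list.
        return list(pool.map(lambda clip: per_clip(source, transcript, clip), clips))
```

Because each clip depends only on the source and the shared transcript, the per-clip work has no ordering constraints between clips.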
5 clips from one episode
One run of the recipe above, fanned out: every clip trimmed to its own moment, reframed vertical, and captioned.