tutorial · 1 min read

Long video → multiple captioned clips, in one command

Slice any long video into N shorter captioned clips. Transcribe once, let AI pick the moments, then trim and caption each clip in parallel. Optionally reformat to vertical (9:16) for TikTok, Reels, or Shorts, square (1:1) for Instagram, or portrait (4:5); the caption anchor adjusts automatically.

Got a long video and want a dozen shorter ones out of it? Transcribe the whole thing once, let AI pick the best moments, then trim and caption each clip. Pass --reformat to reframe to vertical for TikTok or Reels; leave it off and the source aspect ratio is preserved for YouTube cards or Instagram feeds.
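How large is the reframed window? The following is a minimal sketch of the geometry only. It assumes the output is the largest window of the target aspect ratio that fits inside the source frame. Deciding where that window sits in each shot is the reframe step's camera director's job, and this sketch does not model it. `crop_box` is a hypothetical helper, not part of pipe2.

```python
import math
from fractions import Fraction


def crop_box(src_w: int, src_h: int, aspect: str) -> tuple[int, int]:
    """Largest (width, height) window with the target aspect that fits in the source.

    Hypothetical helper for illustration: it sizes the window but does not
    decide where to place it in the frame.
    """
    num, den = (int(part) for part in aspect.split(":"))
    target = Fraction(num, den)  # e.g. "9:16" -> 9/16
    if Fraction(src_w, src_h) > target:
        # Source is wider than the target: keep full height, trim the sides.
        h = src_h
        w = math.floor(h * target)
    else:
        # Source is as tall or taller: keep full width, trim top and bottom.
        w = src_w
        h = math.floor(w / target)
    # Round down to even sizes, which common encoders (e.g. H.264 with 4:2:0 chroma) require.
    return w - w % 2, h - h % 2


for ratio in ("9:16", "1:1", "4:5"):
    print(ratio, crop_box(1920, 1080, ratio))
```

For a 1920×1080 source, 9:16 gives a 606×1080 window (the exact width of 607.5 rounded down to an even number), 1:1 gives 1080×1080, and 4:5 gives 864×1080.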

Run it

Two ways, same result

Either paste the prompt into your AI agent, or run a single command in your terminal — both invoke the same recipe.

est. cost ≈ 1.6 credits · needs PIPE2_TOKEN · runs 6 pipelines
prompt · paste into agent
Claude Code, Codex, or any shell agent; conventions are in AGENT.md
Run the pipe2 clip-factory recipe — one long video → N captioned, watermarked clips. Picks moments automatically.

Dispatch: pipe2 recipe run clip-factory --input <video-url-or-local-path> --reformat 9:16

Tune the picker with --highlights-count N and --highlights-style "the funniest moments". Power-user override: --clips path/to/clips.json (JSON array of {"context","start_sec","end_sec"}).
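If you pass --clips, the file is simply the JSON array described above. Here is a minimal sketch for writing one by hand. It assumes the recipe accepts second offsets as plain JSON numbers. `build_clips` is a hypothetical helper, and its sanity check (no negative start, start before end) is our own and not a documented recipe rule.

```python
import json


def build_clips(moments):
    """Turn (context, start_sec, end_sec) tuples into the JSON array --clips expects."""
    clips = []
    for context, start, end in moments:
        # Our own sanity check: a window must start at or after 0 and end after it starts.
        if not 0 <= start < end:
            raise ValueError(f"bad window for {context!r}: {start} -> {end}")
        clips.append({"context": context, "start_sec": float(start), "end_sec": float(end)})
    return json.dumps(clips, indent=2)


# Two windows taken from the highlights output shown in the walkthrough.
print(build_clips([
    ("Local execution as OpenClaw's differentiator", 95.24, 127.64),
    ("Why 80% of apps disappear", 641.792, 696.312),
]))
```

Save the printed array as clips.json and pass --clips clips.json to the run command. The highlights step is then skipped.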

Returns a JSON array of clip URLs.
The agent reports the hosted clip URLs when the chain completes.
command · run in shell
Needs pipe2 CLI + PIPE2_TOKEN; fetches the recipe from GitHub
pipe2 recipe run clip-factory --input https://www.youtube.com/watch?v=4uzGDAoNOZc --reformat 9:16
Prints the hosted clip URLs to stdout when the chain finishes.
Walkthrough

6 pipelines turn one long video into N captioned clips

What each step takes in and what it produces. The label under each step (text, json, video) shows its output kind.

total ≈ 1.6 credits · 6 pipeline steps

Estimated at recipe defaults. Final cost may vary — metered models bill by actual token usage, and optional steps add to the total when their inputs are supplied.
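The headline figure is the sum of the four step prices that run by default in the walkthrough below: 0.8 + 0.1 + 0.2 + 0.5 = 1.6 credits. The following is a minimal sketch of that arithmetic. Like the estimate, it assumes each step bills once. It does not model whether per-clip steps bill once per clip, or how metered token usage shifts the final cost. The prices are copied from the step list. The names are hypothetical.

```python
# Step prices in credits, as listed in the walkthrough.
DEFAULT_STEPS = {"transcription": 0.8, "video-trim": 0.1, "captions": 0.2, "watermark": 0.5}
# Conditional steps and their listed surcharges.
CONDITIONAL_STEPS = {"highlights": 10.0, "video-reframe": 2.0}


def estimate_credits(extra_steps=()):
    """Sum the default step prices plus any conditional steps that will run."""
    total = sum(DEFAULT_STEPS.values())
    total += sum(CONDITIONAL_STEPS[step] for step in extra_steps)
    return round(total, 2)  # round away floating-point noise from summing decimals


print(estimate_credits())                   # 1.6
print(estimate_credits({"video-reframe"}))  # 3.6, e.g. when --reformat 9:16 is set
```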

  1. transcription 0.8 cr

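    ElevenLabs Scribe transcribes the full source once, and every clip reuses that transcript. The result is cached on the source hash, so re-running the recipe on the same video costs nothing for this step. The highlights and trim steps read these timestamped words to pick and cut each moment.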

    Input source: "OpenClaw Creator — Why 80% of Apps Will Disappear" (www.youtube.com/watch?v=4uzGDAoNOZc). The demo previews are five clips cut from this interview; drop in any long-form video to make your own.
    text
    00:00:00,180  Today, I'm sitting down with Peter Steinberger, the creator of OpenClaw, the open source personal AI agent that has completely taken over the internet.
    00:00:09,060  The GitHub repo exploded to over 160,000 stars practically overnight.
    00:00:14,130  The community has built countless projects, like Malt Book, where bots talk among themselves.
    00:00:19,560  And now, the bots are even renting humans to do tasks in the real world.
    00:00:24,220  In our conversation, we discuss his aha moment, his contrarian development philosophies, and what this means for builders in 2026.
    00:00:32,740  Let's dive in.
    00:00:38,980  So good to see you, man.
    00:00:39,960  Hey, what's up?
    00:00:40,760  Um, so you've made something people want.
    …
  2. highlights +10 cr only without --clips

    Reads the transcript and picks N editorial moments. This is the auto-pick path: set N with --highlights-count and steer the choice with --highlights-style. Skipped when --clips supplies a manual JSON list.

    json
    [
      {"context":"Peter explains OpenClaw's core differentiator: local execution gives it access to everything the user can do, unlike cloud-based AI.","desired_seconds":32,"start_sec":95.24000000000001,"end_sec":127.64},
      {"context":"The vivid moment Peter realized OpenClaw's creative problem-solving: it autonomously transcribed a voice message using ffmpeg and OpenAI's API without being explicitly programmed to do so.","desired_seconds":106,"start_sec":515.182,"end_sec":621.252},
      {"context":"Peter's contrarian prediction: 80% of apps disappear because personal AI agents manage data and tasks more naturally than purpose-built applications.","desired_seconds":55,"start_sec":641.792,"end_sec":696.312},
      {"context":"Peter's philosophy on building: minimize friction by using Unix tools and CLIs instead of inventing new abstractions, letting the model handle creative problem-solving.","desired_seconds":54,"start_sec":1178.264,"end_sec":1232.584},
      {"context":"Peter articulates why swarm intelligence mirrors human society: individuals alone can't build iPhones or go to space, but groups specializing together achieve anything.","desired_seconds":46,"start_sec":258,"end_sec":303.5}
    ]
  3. video-trim 0.1 cr

    Per clip: deterministically slices the SRT and cuts the video with ffmpeg to the window that highlights picked (or that --clips supplied), snapped to sentence boundaries. Returns the windowed transcript, rebased to the clip's own timeline, for the captions step.

    video
  4. video-reframe +2 cr only if --reformat

    Per clip (only when --reformat is set): reframes to the requested aspect ratio with the lock-and-cut camera director. It frames the active speaker in every shot, using the windowed transcript plus computer-vision face detection, and cuts cleanly at shot boundaries instead of drifting or panning. Skipped by default, so the source's native aspect ratio is preserved.

    video
  5. captions 0.2 cr

    Per clip: burns the windowed transcript into the video as captions in your chosen preset, at the anchor set by --position.

    video
  6. watermark 0.5 cr

    Overlays the Pipe2 logo (or your own --watermark-url) at the top-left corner so the clip carries attribution across reposts. Pass --no-watermark to skip.

    video
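Step 1 caches the transcript on the source hash, so a second run over the same file never pays for transcription again. The following is a minimal local sketch of that idea. It assumes the key is a SHA-256 of the file bytes and that the store is a directory of `.srt` files. pipe2's real cache lives on its side and may be keyed differently. `cached_transcript` and its `transcribe` callback are hypothetical.

```python
import hashlib
from pathlib import Path


def source_hash(path, chunk_size=1 << 20):
    """SHA-256 of the file contents, read in 1 MiB chunks so large videos stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_transcript(path, transcribe, cache_dir=Path(".transcript-cache")):
    """Return the transcript for this exact file, calling `transcribe` only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / f"{source_hash(path)}.srt"
    if entry.exists():
        return entry.read_text(encoding="utf-8")  # hit: no paid call
    text = transcribe(path)                       # miss: the one paid speech-to-text call
    entry.write_text(text, encoding="utf-8")
    return text
```

Because the key is derived from the file's content, renaming or moving the video still hits the cache. Re-encoding it changes the bytes, so it does not.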
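Step 3's slice-and-rebase can be sketched in a few lines. The sketch assumes the transcript is a list of timed cues with one sentence per cue, as in the sample under step 1. Under that assumption, snapping to sentence boundaries reduces to widening the window to the cues it overlaps. `Cue`, `snap_window`, `rebase`, `to_srt`, and `ffmpeg_cut_args` are hypothetical names. The ffmpeg arguments describe a generic re-encoding cut, not the recipe's actual command.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds on the source timeline
    end: float
    text: str


def snap_window(cues, start_sec, end_sec):
    """Widen [start_sec, end_sec) to the boundaries of every cue it overlaps."""
    hit = [c for c in cues if c.end > start_sec and c.start < end_sec]
    if not hit:
        raise ValueError("window overlaps no transcript cues")
    return min(c.start for c in hit), max(c.end for c in hit), hit


def rebase(cues, clip_start):
    """Shift cue times so the clip's first frame is t = 0."""
    return [Cue(c.start - clip_start, c.end - clip_start, c.text) for c in cues]


def srt_time(t):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(t * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def to_srt(cues):
    """Serialize cues as numbered SRT blocks."""
    return "\n".join(
        f"{i}\n{srt_time(c.start)} --> {srt_time(c.end)}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    )


def ffmpeg_cut_args(src, out, start, end):
    """argv for a re-encoding cut of [start, end) seconds; seeking before -i keeps it fast."""
    return ["ffmpeg", "-ss", f"{start:.3f}", "-i", src, "-t", f"{end - start:.3f}",
            "-c:v", "libx264", "-c:a", "aac", out]


# Cue end times are assumed to be the next cue's start, since the sample shows starts only.
cues = [
    Cue(0.18, 9.06, "Today, I'm sitting down with Peter Steinberger, the creator of OpenClaw."),
    Cue(9.06, 14.13, "The GitHub repo exploded to over 160,000 stars practically overnight."),
    Cue(14.13, 19.56, "The community has built countless projects, like Malt Book."),
]
start, end, window = snap_window(cues, 10.0, 15.0)  # widens to 9.06..19.56
print(to_srt(rebase(window, start)))
print(ffmpeg_cut_args("source.mp4", "clip-01.mp4", start, end))
```

In the recipe, the snapped window is what gets cut. The rebased transcript is what the reframe and captions steps read for that clip.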


Stuck on a chain that won't compose? Drop into our Discord and paste the JSON output of the failing step.