Pipelines
Pipe2.ai pipelines chain multiple AI models into a single automated workflow. Each pipeline takes simple inputs (photos, videos, audio, or text) and produces finished content: images, videos, audio, or a combination.
How pipelines work
- Upload your input — a photo, video, audio file, or text
- Choose a pipeline — each pipeline defines what AI models run and in what order
- Get your output — images, videos, or audio depending on the pipeline
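Each pipeline's credit cost is listed in the table under Available pipelines, as a flat amount, a range, or a base amount plus a per-minute rate. Credits are reserved when you start a run, then confirmed if the run succeeds or refunded if it fails.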
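The reserve, confirm, and refund lifecycle can be sketched as follows. This is a minimal illustration only: `CreditLedger` and `run_pipeline` are hypothetical names invented for this example and are not part of any Pipe2.ai API, and the in-memory balance stands in for whatever accounting the service actually uses.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class CreditLedger:
    """Hypothetical in-memory ledger (an assumption, not Pipe2.ai's API)."""

    def __init__(self, balance: float) -> None:
        self.balance = balance  # credits available to spend
        self.reserved = 0.0     # credits held for runs still in progress

    def reserve(self, cost: float) -> None:
        # Hold the credits when a run starts.
        if cost > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= cost
        self.reserved += cost

    def confirm(self, cost: float) -> None:
        # The run succeeded: the held credits are spent for good.
        self.reserved -= cost

    def refund(self, cost: float) -> None:
        # The run failed: the held credits return to the balance.
        self.reserved -= cost
        self.balance += cost


def run_pipeline(ledger: CreditLedger, cost: float, job: Callable[[], T]) -> T:
    """Reserve credits, run the job, then confirm on success or refund on failure."""
    ledger.reserve(cost)
    try:
        result = job()
    except Exception:
        ledger.refund(cost)
        raise
    ledger.confirm(cost)
    return result
```

With a starting balance of 20 credits, a successful 8-credit run leaves 12 available, while a failed run leaves the balance unchanged once the refund is applied.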
Available pipelines
| Pipeline | Description | Category | Models | Credits |
|---|---|---|---|---|
| Video Generator | Generate AI videos from a text prompt, image, or reference clip | video | Veo 3.1, Seedance 2 Pro, Seedance 2 Fast | 12–96 |
| Image Generator | Generate AI images from text or reference photos | image | Imagen 4 Fast, Gemini 3.1 Flash, GPT Image 2 | 0.5–7 |
| Audio Generator | Turn text into natural speech with dozens of voices, 70+ languages, and custom style directions for tone, accent, and pacing | audio | Gemini 2.5 TTS | 0.6–1.2 |
| Music Generator | Generate original background music that matches any mood, genre, and length you describe | audio | Eleven Music v1, Lyria 3 Pro | 49 |
| Video Editor | Edit existing videos with AI | video | Grok Imagine | 8 |
| Image Editor | Edit any image with AI | image | Grok Imagine | 0.5–1.5 |
| Text Card | Create professional text overlays and title cards with typography, colors, and animation chosen automatically | image | Gemini 3.1 Flash | 1 |
| Image Motion | Turn any still image into a cinematic video clip with smooth camera movement, zoom, and transitions | video | — | 0.5 |
| Script Writer | Turn any topic into a complete video production blueprint — narration, visual plan per segment, audio direction, and a step-by-step shot list | text | — | 1 |
| Video Reel | Combine multiple video clips into one seamless video with AI-picked transitions, narration, and background music | video | — | 1 |
| Video Trim | Trim a video deterministically to an explicit [start, end] window: the SRT transcript is sliced to that window and the source video is cut with ffmpeg to match | video | — | 0.1 |
| Image Search | Search museum archives and stock photo libraries with AI-targeted queries that curate the best results from five sources | image | — | 0.5 |
| Footage Search | Search stock video libraries for b-roll, establishing shots, and background footage | video | — | 0.5 |
| YouTube Cover | Generate scroll-stopping YouTube thumbnails from your video title — bold composition, expressive focal point, and room for overlay text | image | Gemini 3.1 Flash | 1.5 |
| Product Shots | Turn one product photo into a full marketplace-ready shoot — white background, lifestyle, gradient studio, in-use, and outdoor variants | image | Gemini 3.1 Flash | 1.5/item |
| Transcription | Transcribe any video or audio file to text | audio | ElevenLabs Scribe | from 0.8 + 0.08/min |
| Watermark | Brand any video output with your logo | video | — | 0.5 |
| Captions | Burn styled captions into any video from a transcript or SRT | video | — | from 0.2 + 0.02/min |
| Highlights | Read a transcript, return N editorial picks — the most quotable moments for clipping, chapter generation, or social posts | text | Claude Sonnet 4.6 | 10 |
| Video Reframe | Auto-crop horizontal video to vertical with AI active-speaker framing | video | — | from 2 |
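
For the duration-based rows, a rough estimate can be worked out from the two numbers in the Credits column. The sketch below assumes that "from 0.8 + 0.08/min" means a base amount plus a per-minute rate multiplied by the media length; the table does not say how partial minutes are billed or whether other factors apply, so treat the result as an approximation. `RATES` and `estimate_credits` are hypothetical names used only for this example.

```python
from decimal import Decimal

# (base, per-minute) pairs copied from the Credits column of the table.
# Reading them as "base + rate * minutes" is an assumption.
RATES = {
    "Transcription": (Decimal("0.8"), Decimal("0.08")),
    "Captions": (Decimal("0.2"), Decimal("0.02")),
}


def estimate_credits(pipeline: str, minutes: float) -> Decimal:
    """Estimate the credit cost of a run from the media length in minutes."""
    base, per_minute = RATES[pipeline]
    # Convert via str so values like 2.5 do not pick up float rounding noise.
    return base + per_minute * Decimal(str(minutes))


if __name__ == "__main__":
    # A 10-minute file: 0.8 + 0.08 * 10 for Transcription, 0.2 + 0.02 * 10 for Captions.
    print(estimate_credits("Transcription", 10))
    print(estimate_credits("Captions", 10))
```

Under this reading, transcribing a 10-minute file comes to about 1.6 credits and captioning it to about 0.4 credits.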
Inputs and outputs
Each pipeline defines its own input schema and output format. Common patterns:
- Inputs: photos, text scripts, audio files, style/provider configuration
- Outputs: generated images, animated videos, audio files, or multiple assets per run