Long-to-Shorts Engine (Whisper + Gemini)

Last Updated: 2/14/2026 · Read time: 1 min
#Repurposing #Short-Form Video #AI Editing #Social Scheduling

A repeatable SOP to mine the best moments from long-form videos, cut them cleanly with transcript timestamps, and keep a daily posting cadence across platforms—without hiring an editor.

Who Is This For?

Creators, Editors, Social Media Teams, Agencies, Founders

What Problem Does It Solve?

Challenge

  • Manual clipping takes 2-6 hours per long video.

  • Cuts often land mid-word and feel unprofessional.

  • Posting consistency breaks when you get busy.

Solution

  • AI mines 3-6 clips automatically and produces ready-to-post shorts.

  • Word-level timestamps enable cleaner cut points with subtle pre/post-roll.

  • Auto-schedule one clip per day for consecutive days to maintain cadence.

What You'll Achieve with This Toolkit

Convert one long video into a week of shorts that look native to each platform, while protecting your original resolution and reducing editing time dramatically.

Cleaner Cuts that Feel Human

Word-accurate timestamps enable cuts that avoid mid-word glitches, improving perceived quality and watch time.

Daily Cadence Without Burnout

Consecutive-day scheduling turns a single production session into multiple days of growth.

How It Works

1. Upload Long Video
2. Extract Audio
3. Whisper Word-Timestamp Transcript
4. Gemini Clip Mining & Metadata
5. FFmpeg Cut/Crop Shorts
6. Schedule 1/Day Multi-Platform Posting

Step 1: Collect the Source Video

Start with a single long-form video (podcast, webinar, interview, talk). Work from the final master file where possible: heavily compressed copies can degrade the audio and, with it, transcription accuracy.

A long-form video ready for repurposing

Why this tool:

Selected for its single-token workflow that can accept a source video and later reuse the same integration for processing and publishing, reducing operational complexity.

Upload-Post

3.5 · Freemium · EN

Unified Social Media API to auto-publish videos, images, and posts across 10+ networks

Step 2: Extract Audio for Accurate Transcription

Extract a clean audio track from the video before transcription. This improves ASR stability and makes downstream clip timestamps more reliable.
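As a minimal sketch of this step, the snippet below builds and runs an FFmpeg command that drops the video stream and writes a mono WAV. The function names, the file names, and the 16 kHz sample rate are assumptions for illustration, not settings prescribed by this SOP.

```python
import subprocess
from pathlib import Path


def build_extract_audio_cmd(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an FFmpeg command that writes a mono PCM WAV from the video's audio."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", src,                 # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to mono
        "-ar", str(sample_rate),   # resample (16 kHz is an assumed default)
        "-c:a", "pcm_s16le",       # uncompressed 16-bit PCM
        dst,
    ]


def extract_audio(src: str, dst: str) -> Path:
    """Run FFmpeg; raises CalledProcessError if FFmpeg exits with an error."""
    subprocess.run(build_extract_audio_cmd(src, dst), check=True)
    return Path(dst)
```

Usage would look like `extract_audio("talk.mp4", "talk.wav")`. For long recordings an uncompressed WAV can get large, so a compressed codec may be preferable if the transcription service limits upload size.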

Audio waveform extracted from video

Why this tool:

Chosen for its deterministic media processing: the same command yields the same audio track every time, so transcript timestamps stay on the source video's timeline and line up with later cut operations.

FFmpeg

4.9 · Free · EN

FFmpeg - The Universal AI Media Processing Engine

Step 3: Transcribe with Word-Level Timestamps

Run Whisper transcription and keep timestamps granular enough to avoid mid-word cuts. Store the transcript with timing so clip boundaries can be derived from what people actually said.
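One way this step could look, assuming the official `openai` Python package and an `OPENAI_API_KEY` in the environment. The request asks `whisper-1` for `verbose_json` output with word granularity. The helper names (`transcribe_words`, `normalize_words`, `save_transcript`) and the stored JSON layout are hypothetical choices for this sketch.

```python
import json


def transcribe_words(audio_path: str) -> list[dict]:
    """Call the hosted Whisper endpoint and return word-level timings."""
    from openai import OpenAI  # imported lazily so the helpers below work offline

    client = OpenAI()
    with open(audio_path, "rb") as f:
        resp = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            response_format="verbose_json",     # needed for timestamps
            timestamp_granularities=["word"],   # per-word start/end times
        )
    return normalize_words(resp.model_dump())


def normalize_words(payload: dict) -> list[dict]:
    """Keep only well-formed words, sorted by start time, as plain dicts."""
    words = []
    for w in payload.get("words") or []:
        start, end = float(w["start"]), float(w["end"])
        if end >= start:
            words.append({"word": w["word"].strip(), "start": start, "end": end})
    return sorted(words, key=lambda w: w["start"])


def save_transcript(words: list[dict], path: str) -> None:
    """Store the timed transcript so later steps do not need to re-transcribe."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"words": words}, f, ensure_ascii=False, indent=2)
```

Saving the word list to disk keeps the paid transcription call out of the loop while you iterate on clip selection and cutting.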

Transcript with timestamps per segment/word

Why this tool:

Selected for its proven ASR quality and support for word-level timestamps, which is the key to professional-feeling cuts.

OpenAI Whisper (whisper-1)

4.7 · Paid · EN

Speech-to-text API for word-timestamp subtitles and automation-ready transcripts

Step 4: Mine 3-6 High-Retention Moments

Use Gemini to analyze the transcript and propose 3-6 short segments (15-60 seconds) with hook-first structure. Generate per-clip titles/descriptions so publishing isn't blocked on copywriting.
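The sketch below covers the parts of this step that do not depend on a particular SDK: building a prompt from the timed transcript and validating the model's reply against the 15-60 second and 3-6 clip guidelines. The prompt wording, the JSON shape, and the function names are assumptions. The actual call to Gemini (or another LLM) is deliberately left out.

```python
import json

MIN_LEN, MAX_LEN = 15.0, 60.0   # seconds, from the 15-60 s guideline
MAX_CLIPS = 6                   # upper end of the 3-6 clip range


def build_prompt(words: list[dict]) -> str:
    """Render the transcript with timestamps and ask for a JSON clip plan."""
    lines = [f"[{w['start']:.2f}] {w['word']}" for w in words]
    return (
        "Pick 3-6 self-contained moments of 15-60 seconds that open with a hook.\n"
        'Reply with JSON only: [{"start": s, "end": s, '
        '"title": "...", "description": "..."}]\n\n'
        + "\n".join(lines)
    )


def parse_clip_plan(raw: str, video_duration: float) -> list[dict]:
    """Validate the model's reply; drop clips that break length or bounds rules."""
    clips = []
    for c in json.loads(raw):
        start, end = float(c["start"]), float(c["end"])
        in_bounds = 0 <= start < end <= video_duration
        if in_bounds and MIN_LEN <= end - start <= MAX_LEN:
            clips.append({
                "start": start,
                "end": end,
                "title": str(c.get("title", "")).strip(),
                "description": str(c.get("description", "")).strip(),
            })
    # Keep the model's own ordering and cap the plan at MAX_CLIPS.
    return clips[:MAX_CLIPS]
```

Validating the reply before cutting matters because a model can return timestamps past the end of the video or clips outside the target length.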

AI-selected clip timestamps and titles

Why this tool:

Chosen for its multimodal/video understanding capabilities and strong transcript reasoning, making clip selection more signal-driven than manual guessing.

Gemini

4.8 · Freemium · EN

Automate Workflows Across Google Workspace

Step 5: Cut, Crop, and Export Platform-Ready Shorts

Use FFmpeg to cut clips by exact timestamps, then crop/pad intelligently for 9:16 outputs while preserving source resolution when possible. Add subtle pre/post-roll to avoid abrupt starts.
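A rough sketch of the cut logic, assuming the word list is sorted by start time and the source is landscape. `snap_to_words` moves the proposed boundaries onto word edges and adds pre/post-roll without running into the neighbouring words. `build_cut_cmd` assembles an FFmpeg command with a centred 9:16 crop. The roll values, the function names, and the `libx264`/`aac` encoders are illustrative assumptions.

```python
def snap_to_words(start, end, words, duration, pre_roll=0.15, post_roll=0.25):
    """Snap a proposed (start, end) to word boundaries and add padding."""
    # Indices of every word that overlaps the proposed range.
    inside = [i for i, w in enumerate(words) if w["end"] > start and w["start"] < end]
    if not inside:
        raise ValueError("no words fall inside the requested range")
    first, last = inside[0], inside[-1]

    # Padding may extend into silence, but not into the neighbouring words.
    prev_end = words[first - 1]["end"] if first > 0 else 0.0
    next_start = words[last + 1]["start"] if last + 1 < len(words) else duration

    cut_in = min(words[first]["start"], max(prev_end, words[first]["start"] - pre_roll))
    cut_out = max(words[last]["end"], min(next_start, words[last]["end"] + post_roll))
    return round(cut_in, 3), round(cut_out, 3)


def build_cut_cmd(src, dst, cut_in, cut_out, vf="crop=ih*9/16:ih"):
    """Build an FFmpeg command that cuts one clip and crops it to 9:16."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{cut_in:.3f}",            # seek to the clip start
        "-i", src,
        "-t", f"{cut_out - cut_in:.3f}",   # clip duration
        "-vf", vf,                         # centred crop to a 9:16 frame
        "-c:v", "libx264",                 # re-encode so the cut is frame-accurate
        "-c:a", "aac",
        dst,
    ]
```

Re-encoding rather than stream-copying is a trade-off: it costs processing time, but the cut does not have to land on a keyframe.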

Short-form clip export settings 9:16

Why this tool:

Selected for GPU-accelerated FFmpeg processing plus a job/status model, which makes batch cutting reliable without maintaining your own video servers.

FFmpeg

4.9 · Free · EN

FFmpeg - The Universal AI Media Processing Engine

Step 6: Schedule One Clip Per Day

Schedule each short to publish on consecutive days (e.g., 3 clips = next 3 days, 6 clips = next 6 days). Keep a consistent posting time per timezone to train audience expectations.
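The consecutive-day rule can be sketched as a small helper that returns one timezone-aware publish time per clip. The function name and the example date and time are hypothetical. How the resulting timestamps are passed to the publishing API is not shown, since that depends on the integration you use.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def consecutive_day_schedule(
    n_clips: int,
    first_day: date,
    post_time: time,
    tz: tzinfo = timezone.utc,
) -> list[datetime]:
    """Return one timezone-aware publish time per clip, on consecutive days."""
    return [
        datetime.combine(first_day + timedelta(days=i), post_time, tzinfo=tz)
        for i in range(n_clips)
    ]


# Example: 3 clips, posted at 09:00 UTC starting 15 Feb 2026.
# A zoneinfo.ZoneInfo object can be passed as tz for a local posting time.
slots = consecutive_day_schedule(3, date(2026, 2, 15), time(9, 0))
for slot in slots:
    print(slot.isoformat())
```

Building each slot from a calendar date plus a wall-clock time, rather than adding 24 hours to the previous slot, keeps the posting time constant across daylight-saving changes when a regional timezone is used.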

Content calendar with consecutive-day scheduling

Why this tool:

Chosen because it combines multi-platform posting and scheduling in one integration, preventing the "log into 3 apps" bottleneck.

Upload-Post

3.5 · Freemium · EN

Unified Social Media API to auto-publish videos, images, and posts across 10+ networks

Frequently Asked Questions

Most common video formats work; the SOP supports both vertical and horizontal inputs, then outputs platform-ready 9:16 shorts using crop/pad logic.
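The crop/pad decision could be sketched as follows, assuming you already know the source width and height (for example from ffprobe). The function name and the `mode` parameter are hypothetical. The returned strings are FFmpeg filter expressions intended for the `-vf` option.

```python
from fractions import Fraction

TARGET = Fraction(9, 16)  # width:height of a vertical short


def vertical_filter(width: int, height: int, mode: str = "crop") -> str:
    """Pick an FFmpeg video filter that turns the source frame into 9:16."""
    aspect = Fraction(width, height)
    if aspect == TARGET:
        return "null"  # FFmpeg's pass-through filter: already 9:16
    if aspect > TARGET:
        # Source is wider than 9:16 (e.g. 16:9 landscape).
        if mode == "crop":
            return "crop=ih*9/16:ih"            # keep full height, trim the sides
        return "pad=iw:iw*16/9:0:(oh-ih)/2"     # keep full width, add bars top/bottom
    # Source is narrower than 9:16.
    if mode == "crop":
        return "crop=iw:iw*16/9"                # keep full width, trim top/bottom
    return "pad=ih*9/16:ih:(ow-iw)/2:0"         # keep full height, add bars on the sides
```

Cropping fills the frame but can cut off-centre speakers; padding preserves the whole picture at the cost of bars, so the better choice depends on the footage.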

Each long video typically yields 3-6 clips, depending on its length and the number of high-signal moments the transcript contains.

Costs usually come from transcription minutes (Whisper), AI analysis (Gemini), and video processing/publishing volume (FFmpeg + scheduling).

Clip selection quality depends on audio clarity and speaker structure; fast scene changes and noisy audio can reduce transcript accuracy and therefore highlight selection.

You can use any LLM that can consume transcripts and output timestamps + titles; the SOP stays the same as long as the model can rank moments and produce structured clip plans.

If your publishing API supports additional networks, you can extend the final step to post to them without changing the upstream transcription and clip-mining logic.