Automated video cropping platform
AIClipper.video
An automated serverless SaaS platform converting long-form YouTube videos into viral vertical Shorts, optimized for mixed-language regional audiences.
Outcome
Delivered a fully automated media rendering pipeline that transcribes, isolates viral hooks, and crops landscape videos into portrait formats.
Problem
Content creators spend hours manually transcribing and cropping videos for vertical formats, struggle with inaccurate transcription of local slang, and face high compute bills.
Approach
Split video processing into a serverless pipeline orchestrated by AWS Step Functions, with audio extraction, transcription, AI curation, and rendering each running as a separate step.
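A minimal sketch of how the four stages could be chained as a Step Functions state machine, written as a Python dict in Amazon States Language. The state names, the Lambda ARNs, and the retry settings are assumptions for illustration; the project's actual definition is not shown here.

```python
import json

# Hypothetical state names, one per stage named in the approach.
STAGES = ["ExtractAudio", "Transcribe", "CurateClips", "RenderClips"]


def build_state_machine(lambda_arns: dict) -> dict:
    """Chain one Task state per stage, in order, ending after the last one."""
    states = {}
    for i, name in enumerate(STAGES):
        state = {
            "Type": "Task",
            "Resource": lambda_arns[name],
            # Assumed retry policy: retry a failed task twice with backoff.
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 5,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }
            ],
        }
        if i + 1 < len(STAGES):
            state["Next"] = STAGES[i + 1]
        else:
            state["End"] = True
        states[name] = state
    return {
        "Comment": "Sketch of a clip pipeline (assumed layout)",
        "StartAt": STAGES[0],
        "States": states,
    }


if __name__ == "__main__":
    # Placeholder ARNs; real ones would come from the deployment.
    arns = {
        name: f"arn:aws:lambda:us-east-1:123456789012:function:{name}"
        for name in STAGES
    }
    print(json.dumps(build_state_machine(arns), indent=2))
```

Because each stage is its own Task state, memory and timeout can be tuned per function, and a failure in rendering does not force transcription to run again.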
Architecture
A pnpm monorepo containing a Next.js 15 client authenticated by AWS Cognito, and a backend built on Serverless Framework with AWS Lambda, Step Functions, DynamoDB, Python, Groq Whisper API, DeepSeek curation model, and FFmpeg.
Result
Reduced video editing workflow time from 30 minutes to under 2 minutes per clip.
Lessons learned
Running compute-heavy FFmpeg tasks as independent steps in an AWS Step Functions workflow avoids the timeout errors of a single monolithic Lambda and allows each step's resources to be scaled separately.
Constraints
- Serverless execution limits on Lambda (each invocation is capped at 15 minutes, which constrains heavy processing).
- High-accuracy transcription of mixed Indonesian, English, and slang expressions.
- Interactive subtitle synchronization.
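One way to stay inside the Lambda time limit is to check the remaining execution time before starting each render and hand unfinished clips back to the workflow. The sketch below assumes a hypothetical `render_one` callable and a rough per-clip estimate; only `context.get_remaining_time_in_millis()` is part of the Lambda Python runtime.

```python
def render_within_budget(clips, context, render_one,
                         est_ms_per_clip=60_000, safety_ms=10_000):
    """Render clips until the remaining Lambda time is too short for another.

    `render_one`, `est_ms_per_clip`, and `safety_ms` are illustrative
    assumptions, not values taken from the project.
    """
    done = []
    pending = list(clips)
    while pending:
        remaining = context.get_remaining_time_in_millis()
        if remaining < est_ms_per_clip + safety_ms:
            break  # leave the rest for a follow-up invocation
        done.append(render_one(pending.pop(0)))
    return {"rendered": done, "remaining": pending}
```

The returned `remaining` list could be passed to a later state so the workflow schedules another invocation instead of timing out mid-render.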
Technical decisions
- Used Groq Whisper API for lightning-fast word-level timestamped transcriptions.
- Selected DeepSeek via BytePlus/ModelArk to analyze transcript timelines and identify viral segments.
- Used Python (yt-dlp) and FFmpeg to segment and crop the original video stream efficiently.
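The segment-and-crop step can be illustrated with a small helper that builds an FFmpeg command for one clip. This is a sketch under assumptions: it uses a fixed center crop to 9:16 and a 1080x1920 output, while the source does not say how the crop region is chosen. The function names are hypothetical; the `crop` and `scale` filters and the `-ss`, `-t`, `-i`, `-vf`, and `-c:a` options are standard FFmpeg.

```python
def portrait_crop_filter(src_w, src_h, out_w=1080, out_h=1920):
    """Return an FFmpeg filter string for a centered portrait crop."""
    crop_w = int(src_h * out_w / out_h)
    crop_w -= crop_w % 2  # keep the width even for common encoders
    if crop_w > src_w:
        raise ValueError("source is too narrow for a full-height portrait crop")
    x = (src_w - crop_w) // 2  # center the crop window horizontally
    return f"crop={crop_w}:{src_h}:{x}:0,scale={out_w}:{out_h}"


def build_clip_command(src, dst, start, end, src_w, src_h):
    """Build the argument list for cutting [start, end] and cropping it."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",       # seek to the clip start
        "-t", f"{end - start:.3f}",  # read only the clip duration
        "-i", src,
        "-vf", portrait_crop_filter(src_w, src_h),
        "-c:a", "aac",
        dst,
    ]


if __name__ == "__main__":
    print(" ".join(build_clip_command("in.mp4", "out.mp4", 12.5, 42.0, 1920, 1080)))
```

The list can be passed to `subprocess.run(cmd, check=True)` in an environment where the `ffmpeg` binary is available, for example through a Lambda layer.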
Key features
- AI-driven viral hook segment detection.
- Word-level timestamped dynamic subtitles.
- Fully serverless orchestration pipeline.
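Word-level timestamps make dynamic subtitles a matter of grouping words into short cues. The sketch below assumes each word arrives as a dict with `word`, `start`, and `end` keys (times in seconds); the grouping thresholds are illustrative, not the project's values.

```python
def _to_cue(words):
    """Collapse a run of words into one subtitle cue."""
    return {
        "start": words[0]["start"],
        "end": words[-1]["end"],
        "text": " ".join(w["word"].strip() for w in words),
    }


def group_words(words, max_words=4, max_gap=0.6):
    """Group timestamped words into cues.

    A new cue starts when the current one is full or when the pause
    before the next word exceeds `max_gap` seconds.
    """
    cues = []
    current = []
    for w in words:
        if current and (
            len(current) >= max_words
            or w["start"] - current[-1]["end"] > max_gap
        ):
            cues.append(_to_cue(current))
            current = []
        current.append(w)
    if current:
        cues.append(_to_cue(current))
    return cues


if __name__ == "__main__":
    sample = [
        {"word": "gue", "start": 0.0, "end": 0.2},
        {"word": "literally", "start": 0.25, "end": 0.7},
        {"word": "kaget", "start": 0.75, "end": 1.1},
        {"word": "banget", "start": 2.0, "end": 2.4},
    ]
    for cue in group_words(sample, max_words=3):
        print(cue)
```

Short cues keep the on-screen text in step with speech, and the same cue list can drive either burned-in captions or a client-side overlay.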