FRAMEnest
AI footage organizer
The brief
A festival film studio (Lengi) was drowning in 1–5 TB of raw footage per shoot — editors spent days just finding the right clips before they could even start cutting.
I built FRAMEnest, a desktop app that uses AI to auto-tag every clip by content, mood, and shot type. Scrub through 1.4 TB in minutes instead of days.
The starting point
Lengi Productions, a documentary and festival film studio, was sitting on roughly 1.4 TB of footage from a single shoot. Their editors were spending the first three days of every project just finding the clips they needed — opening proxy files, scrolling through shot logs, pinging the DP for what was on which card. Multiply that across a year of shoots and they were losing weeks of paid editor time to a problem that had nothing to do with editing.
They didn't need a faster machine. They needed a way to ask their footage questions in plain English: "wide shot, golden hour, two people laughing." That's what FRAMEnest does.
How it works
FRAMEnest is a desktop app that sits on the editor's workstation. Point it at a folder and it does three things, all locally:
- Frame extraction — uses bundled FFmpeg to pull a frame every N seconds from every clip in the folder (sketched after this list). Includes RED .R3D, ProRes, H.264/H.265, MOV, MP4, MXF.
- AI tagging pipeline — each frame goes through a multimodal vision model that returns shot type (close-up, wide, medium), lighting (natural, golden hour, fluorescent, low-key), mood (calm, energetic, tense), and free-form content tags. Adjacent near-identical frames get clustered so a 90-minute interview only sends ~50 frames to the model instead of 5,400.
- Searchable index — embeddings + a local SQLite full-text index. Editors search in natural language and get matching clips ranked by relevance, with thumbnails, timecode, and "open in Premiere/DaVinci" actions; the second sketch below shows the full-text half.
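To make the extraction step concrete, here's a minimal sketch, assuming ffmpeg-static and Node's child_process; extractFrames, the five-second interval, and the JPEG settings are illustrative choices, not FRAMEnest's actual defaults.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import path from "node:path";
import ffmpegPath from "ffmpeg-static";

const run = promisify(execFile);

// Pull one frame every `intervalSeconds` from a clip into outDir as JPEGs.
async function extractFrames(clip: string, outDir: string, intervalSeconds = 5): Promise<void> {
  const pattern = path.join(outDir, `${path.parse(clip).name}-%05d.jpg`);
  await run(ffmpegPath as string, [
    "-i", clip,
    "-vf", `fps=1/${intervalSeconds}`, // one frame per N seconds
    "-q:v", "3",                       // reasonable JPEG quality for tagging
    pattern,
  ]);
}
```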
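And the full-text half of the index, sketched with better-sqlite3 and SQLite's built-in FTS5; the schema, file name, and query here are illustrative, and the embedding-based ranking sits alongside this, not inside it.

```typescript
import Database from "better-sqlite3";

const db = new Database("framenest.db"); // hypothetical index file

// One row per tagged frame; FTS5 gives tokenized full-text search over the tags.
db.exec("CREATE VIRTUAL TABLE IF NOT EXISTS frames USING fts5(clip_path, timecode, tags)");

const insert = db.prepare("INSERT INTO frames (clip_path, timecode, tags) VALUES (?, ?, ?)");
insert.run("/footage/A003_C012.mov", "00:14:32:10", "wide shot golden hour two people laughing");

// bm25() is FTS5's built-in relevance score; lower is better, so ascending order.
const search = db.prepare(
  "SELECT clip_path, timecode FROM frames WHERE frames MATCH ? ORDER BY bm25(frames) LIMIT 20"
);
console.log(search.all("wide AND golden")); // ranked matching frames
```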
How it was built
The shell is Electron + React + Tailwind, deliberately. Tauri would have produced a smaller binary, but every dev a future Lengi might hire can read the Electron codebase. The frame extractor shells out to ffmpeg-static. The local index uses better-sqlite3. AI tagging started on the OpenAI Vision API for v1, then moved to a local CLIP-style model for embeddings + a small captioning model for tags so footage never leaves the editor's machine — non-negotiable for festival pre-release material.
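Those local embeddings also power the near-duplicate clustering mentioned above. A minimal sketch of the idea, assuming each extracted frame already has an embedding vector from the local model; cosine, representativeFrames, and the 0.97 cutoff are illustrative stand-ins, not the shipped implementation.

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Keep only frames that differ enough from the last kept frame, so a static
// interview sends a handful of representatives to the tagger instead of thousands.
function representativeFrames(embeddings: number[][], threshold = 0.97): number[] {
  const kept: number[] = [];
  for (let i = 0; i < embeddings.length; i++) {
    if (kept.length === 0 || cosine(embeddings[kept[kept.length - 1]], embeddings[i]) < threshold) {
      kept.push(i);
    }
  }
  return kept; // indices of frames worth sending to the model
}
```

Consecutive frames of a locked-off interview embed almost identically, so the walk keeps the first frame and skips ahead until the scene actually changes, which is what collapses 5,400 frames down to roughly 50.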
The cross-platform build is the boring miracle: GitHub Actions runners sign macOS bundles with the studio's Apple Developer cert, notarize them on Apple's servers, and produce a Windows .exe with Authenticode signing in parallel. git tag v0.4.2 produces installers for both platforms in 18 minutes.
What surprised me
The hardest week wasn't the AI pipeline — it was RED's R3D codec licensing. RED's SDK is per-app, with a 2–6 week procurement window, and FFmpeg can't read R3D natively. I built v1 against the studio's ProRes proxies (which they generate anyway for editing) and shipped R3D support as a v1.1 feature once the SDK arrived. Lesson learned: ask about codecs in the discovery call.
The other unexpected eater of time was GPU memory management. Editors run Premiere, DaVinci, and Chrome simultaneously, leaving 2–4 GB of VRAM for everything else. FRAMEnest streams frames through the model rather than batching, detects available VRAM at startup, downgrades to a smaller model variant when needed, and falls back to CPU when memory is tight. A four-hour CPU job that finishes is much better than a 90-minute GPU job that crashes and loses progress.
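A minimal sketch of that startup decision; probeFreeVramMB is a hypothetical stand-in for whatever VRAM probe the platform provides (querying the GPU runtime, or shelling out to a tool like nvidia-smi), and the model names and cutoffs are illustrative.

```typescript
// Stubbed probe so the selection logic below is runnable on its own; in practice
// this would query the GPU runtime for actual free memory.
function probeFreeVramMB(): number {
  return 3000; // pretend the editor's other apps left ~3 GB free
}

type Device = "gpu" | "cpu";
type ModelTier = { name: string; minVramMB: number };

// Illustrative tiers; the real model variants and VRAM budgets would differ.
const TIERS: ModelTier[] = [
  { name: "tagger-large", minVramMB: 6000 },
  { name: "tagger-small", minVramMB: 2500 },
];

function pickModel(freeVramMB: number): { model: string; device: Device } {
  for (const tier of TIERS) {
    if (freeVramMB >= tier.minVramMB) return { model: tier.name, device: "gpu" };
  }
  // No tier fits: run the smallest model on CPU. A slow job that finishes beats
  // a fast one that crashes out of memory and loses progress.
  return { model: TIERS[TIERS.length - 1].name, device: "cpu" };
}

console.log(pickModel(probeFreeVramMB())); // → { model: "tagger-small", device: "gpu" }
```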
The result
Lengi's editors went from spending three days on clip discovery to roughly forty minutes. The studio uses FRAMEnest on every project now. The AI cost is ~$8 per terabyte tagged thanks to the embedding-based clustering, down from ~$200 with the naive "send every frame" approach.
What this work signals
If you're running a creative studio sitting on terabytes of unsearchable footage, the technology to fix that exists today, costs a fraction of one editor-week, and runs on hardware your team already owns. FRAMEnest is a custom build for Lengi, but the architecture maps cleanly to any media-heavy studio — broadcast, agency, in-house brand teams. If that's you, say hi.
What I built
- Electron + GitHub Actions for cross-platform binary builds (macOS + Windows)
- Frame extraction + AI tagging pipeline (multimodal vision models)
- Searchable library with shot-type, lighting, and emotion filters
- Plays nice with RED R3D + standard codecs
- Local-first — never uploads footage anywhere
Building something like FRAMEnest?
I take on a small number of projects each quarter. Let's talk about yours.
Start a project →