
Video & Media Production

Skills for the expensive parts of media work: transcript-first editing, motion graphics, timeline assembly, and NLE (non-linear editor) control. Build these after the simpler media primitives are working.

Read this page like a triage menu. Find the skill that removes a repeated explanation, a fragile manual step, or a quality bar the agent keeps missing. Then copy the setup prompt and let your agent adapt it to your tools, files, accounts, and standards.

Install the primitive only when you can name the workflow it will improve.

3 skills

How to use this category.

If you are building this lane from scratch, start with Radio Edit. Otherwise, skip directly to the skill that matches today's bottleneck. The prompt is the starting point; the installed skill should reflect your real workflow.

Radio Edit

Creates a transcript-driven "radio edit" — a rough cut where the spoken narrative is fixed before any visuals are touched. Working from a timestamped transcript, the agent identifies false starts, repeated takes, filler, tangents, and flubbed lines; chooses the best take of each repeated section; and produces both a human-reviewable paper edit (what's kept, what's cut, why) and a timeline file (FCXML/EDL) your editing software imports directly, with all cuts already placed.
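The mechanical half of those edit decisions can be written as rules over the word-level transcript. Below is a minimal Python sketch. It assumes each word arrives as text plus start and end times in seconds. The `FILLERS` set, the 0.75 s dead-air threshold, and the names `find_mechanical_cuts`, `keep_ranges`, and `paper_edit` are illustrative and not part of any existing tool. False starts, best-take choices, and tangents still need the agent's judgment. This sketch covers only filler words, immediate word repeats, and dead air.

```python
from dataclasses import dataclass

# Illustrative filler list; tune it per speaker and language.
FILLERS = {"um", "uh", "erm", "ah"}


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


@dataclass
class Cut:
    start: float
    end: float
    removed: str
    reason: str


def _norm(text):
    return text.lower().strip(".,!?")


def find_mechanical_cuts(words, max_gap=0.75):
    """Flag filler words, immediate word repeats ("the the"), and dead air."""
    cuts = []
    for i, w in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else None
        if _norm(w.text) in FILLERS:
            cuts.append(Cut(w.start, w.end, w.text, "filler"))
        elif nxt is not None and _norm(w.text) == _norm(nxt.text):
            # Cut the first occurrence and keep the second, cleaner one.
            cuts.append(Cut(w.start, w.end, w.text, "stutter"))
        if nxt is not None and nxt.start - w.end > max_gap:
            gap = nxt.start - w.end
            cuts.append(Cut(w.end, nxt.start, "", f"dead air ({gap:.2f}s)"))
    return sorted(cuts, key=lambda c: c.start)


def keep_ranges(cuts, start, end):
    """Complement of the cuts within [start, end]: the material that survives."""
    keeps, cursor = [], start
    for c in sorted(cuts, key=lambda c: c.start):
        if c.start > cursor:
            keeps.append((cursor, c.start))
        cursor = max(cursor, c.end)
    if cursor < end:
        keeps.append((cursor, end))
    return keeps


def paper_edit(cuts):
    """One reviewable line per cut: time range, what was removed, and why."""
    return [f"{c.start:7.2f}-{c.end:7.2f}  CUT {c.removed!r:<10} {c.reason}" for c in cuts]
```

The `paper_edit` lines are what you review. `keep_ranges` is what a timeline export would consume once you approve them.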

Why build it

The first hours of editing talking-head footage are mechanical: find the good takes, cut the failures, tighten the flow. That's exactly the work a transcript-literate agent does well — and the paper edit means you review its editorial choices before opening your editing app. Going from raw recording to a cuts-placed timeline without scrubbing footage by hand changes the economics of producing video.

What you need

Media Transcription (word-level timestamps are essential) · An NLE that imports FCXML or EDL (DaVinci Resolve, Premiere, Final Cut)

Full setup prompt

<prompt>
  <task>
    Create a new skill for my AI coding agent called "radio-edit", stored wherever my
harness loads skills from.

The skill's job: produce a transcript-driven rough cut of talking-head footage — fix
the spoken flow first, before any visual work.

This depends on my media-transcription skill: input is a video plus its word-level
timestamped transcript.

Before writing it, interview me for: my editing software (for the right timeline
format — FCXML or EDL), how aggressive cuts should be by default (tight vs.
conversational), and whether anything must always be cut (profanity, specific
phrases, names).

The skill must include: (1) trigger conditions — when I ask for a rough cut, paper
edit, or cleaned-up edit of a recording; (2) edit-decision rules: detect false starts,
repeated takes (keep the best, with reasoning), filler, dead air, and tangents;
(3) a paper edit document for my review: every cut with timecodes, what was removed,
and why — delivered BEFORE the timeline file; (4) timeline export in my NLE's format
with cuts placed, including a small handle of frames on each cut for finesse;
(5) a revision loop: I mark up the paper edit, you regenerate the timeline.

After writing it, test it on a short recording (under 5 minutes) end to end, including
importing the timeline into my editor.
  </task>
</prompt>
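The timeline-export step in that prompt can be sketched as a CMX 3600-style EDL writer. The sketch rests on several assumptions:

- The frame rate is a constant integer, non-drop-frame (25 fps by default).
- Source timecode starts at `source_start_frames`, which is 0 here. Many cameras record a different start, so check yours.
- `handle_frames` is read as padding around each kept range. That is one interpretation of "a small handle".

The reel name `AX` and the `* FROM CLIP NAME:` comment are conventions many NLEs use to relink media, so check what your editor expects. The function names are hypothetical.

```python
def to_timecode(frames, fps):
    """Non-drop-frame HH:MM:SS:FF for a frame count at an integer frame rate."""
    secs, ff = divmod(frames, fps)
    return f"{secs // 3600:02d}:{secs // 60 % 60:02d}:{secs % 60:02d}:{ff:02d}"


def build_edl(title, keeps, clip_name, fps=25, handle_frames=2, reel="AX",
              source_start_frames=0):
    """CMX 3600-style EDL: one cut event per kept (start_s, end_s) range."""
    lines = [f"TITLE: {title}", "FCM: NON-DROP FRAME", ""]
    record = 3600 * fps  # record side starts at 01:00:00:00, a common convention
    for n, (start_s, end_s) in enumerate(keeps, start=1):
        # Pad each kept range by a few frames so cuts don't clip word edges.
        src_in = max(0, round(start_s * fps) - handle_frames) + source_start_frames
        src_out = round(end_s * fps) + handle_frames + source_start_frames
        rec_out = record + (src_out - src_in)
        lines.append(
            f"{n:03d}  {reel:<8} AA/V  C        "
            f"{to_timecode(src_in, fps)} {to_timecode(src_out, fps)} "
            f"{to_timecode(record, fps)} {to_timecode(rec_out, fps)}"
        )
        lines.append(f"* FROM CLIP NAME: {clip_name}")
        lines.append("")
        record = rec_out  # events butt together on the record side
    return "\n".join(lines)
```

Feeding it the keep ranges from an approved paper edit produces an .edl file to import. The revision loop is just another call with the revised ranges.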

B-Roll Pipeline

An end-to-end automated pipeline that turns a finished talking-head video into one with animated motion graphics. A scout agent analyzes the transcript and selects the moments that deserve a graphic, applying density and spacing rules so the video isn't wallpapered. A builder agent then generates animated graphic components in code (Remotion, a React-based video framework, is the proven stack) against a strict shared visual contract, so every graphic matches. The clips are rendered and composited onto the source video at the right timestamps, with platform-appropriate titling. One skill orchestrates the whole flow and tracks pipeline state so a multi-hour job can resume after interruption.
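The scout's density and spacing rules are simple enough to enforce in code once the agent has scored candidate moments. This sketch assumes:

- Each candidate carries an in/out time, a concept, and a `score` assigned by the scout.
- The budget is `per_minute` graphics per minute of runtime.
- `min_gap_s` is measured from the end of one graphic to the start of the next.

The field names, the defaults, and the greedy highest-score-first strategy are illustrative choices. They are not part of Remotion or any existing tool.

```python
import json
import math
from dataclasses import asdict, dataclass


@dataclass
class Candidate:
    t_in: float     # seconds
    t_out: float
    concept: str
    score: float    # scout's judgment of how much the moment deserves a graphic


def select_moments(candidates, duration_s, per_minute=1.5, min_gap_s=20.0):
    """Greedy pick by score, respecting a total budget and a minimum gap."""
    budget = math.floor(per_minute * duration_s / 60)
    chosen = []
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        if len(chosen) >= budget:
            break
        clear_of_all = all(
            c.t_in >= o.t_out + min_gap_s or c.t_out + min_gap_s <= o.t_in
            for o in chosen
        )
        if clear_of_all:
            chosen.append(c)
    return sorted(chosen, key=lambda c: c.t_in)


def write_manifest(chosen):
    """The manifest the builder consumes, in timeline order."""
    return json.dumps([asdict(c) for c in chosen], indent=2)
```

Greedy selection can miss a slightly better combination of moments. Showing the rejected candidates alongside the manifest makes that easy to catch in review.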

Why build it

This is the most complicated skill in the library and the strongest proof of the skills thesis: motion graphics work that would cost an editor days gets done in one supervised pipeline run. It teaches three advanced patterns at once:

- Subagent decomposition: the scout selects, the builder builds, and neither does the other's job.
- Contract-first generation: the shared visual API is what keeps fifty generated graphics consistent.
- Resumable state: long pipelines must survive interruption.

Build the earlier media skills first; build this one when you're ready for the payoff.
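Resumable state mostly comes down to recording each completed step in a file that is written atomically. This minimal sketch assumes four hypothetical stage names and a dict of stage handlers. A real run would also checkpoint finer steps, such as each builder batch and each rendered clip, but the pattern is the same.

```python
import json
import os
import tempfile
from pathlib import Path

STAGES = ["scout", "build", "render", "composite"]  # hypothetical stage names


def load_state(path):
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())
    return {"completed": []}


def save_state(path, state):
    # Write to a temp file in the same directory, then atomically replace,
    # so a crash mid-write never leaves a half-written state file.
    p = Path(path)
    fd, tmp = tempfile.mkstemp(dir=p.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, p)


def run_pipeline(state_path, handlers):
    """Run stages in order, skipping any that an earlier run already finished."""
    state = load_state(state_path)
    for stage in STAGES:
        if stage in state["completed"]:
            continue
        handlers[stage]()  # an exception here leaves earlier stages recorded
        state["completed"].append(stage)
        save_state(state_path, state)
    return state
```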

What you need

Media Transcription · Node.js with Remotion · ffmpeg for compositing · Real patience for the initial build — this one is a project, and worth it

Full setup prompt

<prompt>
  <task>
    Create a skill (plus two subagents) for my AI coding agent called "broll-pipeline",
stored wherever my harness loads skills from.

The job: an end-to-end pipeline that takes a finished talking-head video plus its
timestamped transcript and produces animated motion-graphic overlays composited onto
the video at the right moments.

Architecture — three pieces:
1. A SCOUT subagent: reads the chaptered transcript and selects which moments deserve
   a graphic, enforcing density and spacing rules (a target of graphics-per-minute and
   a minimum gap between them), and writes a manifest: timestamp in/out, concept, and
   the data or text each graphic should show.
2. A BUILDER subagent: takes 2–3 manifest entries at a time and generates Remotion
   (React video) components for them against a SHARED VISUAL CONTRACT — one TypeScript
   file defining the palette, typography, animation primitives, and layout components
   every graphic must use. The contract is what keeps all graphics consistent.
3. The ORCHESTRATOR skill: runs scout → builder batches → render each clip → composite
   clips onto the source video at manifest timestamps with ffmpeg → final output. It
   keeps a pipeline state file so a long run can resume from any stage after
   interruption.

Before building, interview me for: my brand palette and typography for the visual
contract, my target graphic density, and output specs (resolution, platforms).

Build it in stages and verify each before moving on: contract first, then one
hand-written reference graphic we approve together, then the scout, then the builder
(validated against the contract), then rendering and compositing. Test the full
pipeline on a short video (2–3 minutes) before any real footage.
  </task>
</prompt>
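The compositing step can be expressed as one ffmpeg invocation that chains an `overlay` filter per graphic. This sketch assumes each rendered clip has an alpha channel (for example, a ProRes 4444 or transparent WebM render; check Remotion's docs for transparent output). It also assumes each clip starts at its own time zero. `setpts` shifts each clip to its manifest timestamp, and `enable='between(t,...)'` limits when the clip is drawn. The function only builds the argument list, and the name `overlay_command` is hypothetical.

```python
def overlay_command(source, clips, output):
    """Build an ffmpeg argv that overlays each (path, start_s, end_s) clip on source."""
    if not clips:
        raise ValueError("nothing to composite")
    args = ["ffmpeg", "-y", "-i", source]
    for path, _, _ in clips:
        args += ["-i", path]
    filters, prev = [], "[0:v]"
    for i, (_, start, end) in enumerate(clips, start=1):
        # Shift input i so its first frame lands at `start` on the main timeline.
        filters.append(f"[{i}:v]setpts=PTS-STARTPTS+{start}/TB[g{i}]")
        # Draw it only between start and end, chained onto the previous result.
        filters.append(f"{prev}[g{i}]overlay=enable='between(t,{start},{end})'[v{i}]")
        prev = f"[v{i}]"
    return args + [
        "-filter_complex", ";".join(filters),
        "-map", prev,      # final composited video
        "-map", "0:a?",    # original audio, if present
        "-c:a", "copy",
        output,
    ]
```

Pass the list straight to `subprocess.run(cmd, check=True)`. Because no shell is involved, the quotes inside the filter string reach ffmpeg's filtergraph parser intact.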

AI Editing Assistant (NLE Integration)

Connects your agent directly to your video editing software (DaVinci Resolve is the proven target — it has a Python scripting API) so the agent operates inside the editor: analyzing transcripts, removing silences, extracting subclips, making editorial decisions, and building timelines programmatically in your real project rather than handing you files to import. Where Radio Edit produces a timeline file, this skill manipulates the editor live.
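The connection procedure and the duplicate-first safety rule might look like this in Python. The following calls are described in the scripting README that ships with Resolve:

- `DaVinciResolveScript.scriptapp("Resolve")`
- `GetProjectManager`, `GetCurrentProject`, `GetCurrentTimeline`
- `GetName`, `DuplicateTimeline`, `SetCurrentTimeline`

Availability varies by version, so check the README in your install. The same README covers environment setup (`RESOLVE_SCRIPT_API`, `RESOLVE_SCRIPT_LIB`, `PYTHONPATH`). The wrapper function names and the " (agent copy)" suffix are hypothetical.

```python
def connect_resolve():
    """Return the current Resolve project, or fail with an actionable message."""
    try:
        import DaVinciResolveScript as dvr  # ships with Resolve's scripting Modules
    except ImportError as exc:
        raise RuntimeError(
            "DaVinciResolveScript not importable: set RESOLVE_SCRIPT_API, "
            "RESOLVE_SCRIPT_LIB and PYTHONPATH per Resolve's scripting README"
        ) from exc
    resolve = dvr.scriptapp("Resolve")
    if resolve is None:
        raise RuntimeError(
            "Could not reach Resolve: is it running, and does this edition "
            "allow external scripting?"
        )
    project = resolve.GetProjectManager().GetCurrentProject()
    if project is None:
        raise RuntimeError("Resolve is running but no project is open")
    return project


def duplicate_current_timeline(project, suffix=" (agent copy)"):
    """Hard safety rule: duplicate the current timeline and switch to the copy."""
    original = project.GetCurrentTimeline()
    if original is None:
        raise RuntimeError("No current timeline to duplicate")
    copy = original.DuplicateTimeline(original.GetName() + suffix)
    if not copy or not project.SetCurrentTimeline(copy):
        # Never fall back to editing the original.
        raise RuntimeError("Duplicate failed; refusing to touch the original timeline")
    return copy
```

Each failure mode gets its own message, so the agent can tell you what to fix instead of retrying blindly.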

Why build it

This is the deepest level of media automation — agent as editing assistant rather than file generator. "Cut the silences out of this interview," "pull every clip where she mentions pricing," "build a rough timeline of the best moments" become sentences you say instead of sessions you spend. It also teaches the general pattern of driving any scriptable desktop application from an agent, which extends far beyond video.

What you need

DaVinci Resolve (free version includes the scripting API) or another scriptable NLE · Media Transcription · Comfort letting an agent operate real software (the skill must work on duplicated timelines, never originals)

Full setup prompt

<prompt>
  <task>
    Create a new skill for my AI coding agent called "nle-assistant", stored wherever my
harness loads skills from.

The skill's job: operate my video editing software directly through its scripting API
to do transcript-driven editing — silence removal, subclip extraction, and timeline
building — inside my real projects.

Before writing it, check what's available: I use DaVinci Resolve (its Python scripting
API ships with the app). Verify you can connect to a running instance and read a
project before building anything else.

The skill must include: (1) trigger conditions — editing requests that should happen
inside the editor: remove silences, extract clips matching a description, build a
rough timeline from footage; (2) the connection procedure and its failure modes (app
not running, project not open); (3) a hard safety rule: ALWAYS duplicate the timeline
and work on the copy — never modify an original timeline or delete media; (4) core
operations, each verified individually: import media, read/mark clips, cut at
timecodes, assemble timelines from a transcript-derived edit list; (5) integration
with my media-transcription skill so transcripts drive the edits.

After writing it, test against a throwaway project: duplicate a timeline, remove
silences from one clip, and show me the result in the app before touching anything
real.
  </task>
</prompt>
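Silence removal can be driven from the transcript rather than from audio levels. Merge word timings into speech spans, split wherever a pause exceeds a threshold, and append only those spans to a new timeline. This sketch assumes word timings in seconds, a constant frame rate, and source frames counted from 0.

The Resolve calls used here come from Resolve's scripting README: `GetMediaPool`, `CreateEmptyTimeline`, `SetCurrentTimeline`, and `AppendToTimeline` with `mediaPoolItem`/`startFrame`/`endFrame` dicts. Check in your version whether `endFrame` is inclusive. The thresholds and function names are illustrative.

```python
def speech_spans(words, fps, min_silence_s=0.6, pad_s=0.1):
    """Merge (start_s, end_s) word timings into frame spans, splitting at long pauses."""
    spans = []
    for start, end in words:
        if spans and start - spans[-1][1] <= min_silence_s:
            spans[-1][1] = end  # short pause: keep it inside the current span
        else:
            spans.append([start, end])
    # Pad each span slightly. With the defaults (0.6 s > 2 * 0.1 s),
    # padded spans cannot overlap.
    return [(max(0, round((s - pad_s) * fps)), round((e + pad_s) * fps)) for s, e in spans]


def build_silence_cut_timeline(project, item, spans, name):
    """Create a NEW timeline containing only the speech spans of one media pool item."""
    media_pool = project.GetMediaPool()
    timeline = media_pool.CreateEmptyTimeline(name)
    if timeline is None:
        raise RuntimeError(f"Could not create timeline {name!r} (name already in use?)")
    project.SetCurrentTimeline(timeline)  # AppendToTimeline targets the current timeline
    infos = [{"mediaPoolItem": item, "startFrame": s, "endFrame": e} for s, e in spans]
    if not media_pool.AppendToTimeline(infos):
        raise RuntimeError("AppendToTimeline returned nothing; check frame ranges")
    return timeline
```

Because the edit lands on a fresh timeline, the original timeline stays untouched. That satisfies the same safety rule as duplicating.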
