SRT vs VTT vs ASS: Choosing the Right Subtitle Format

A practical comparison of SRT, VTT, and ASS subtitle formats — what each supports, what you lose in conversion, and when format choice actually matters.

Choosing a subtitle format is not about which one is "best." It is about which one carries the features your delivery target requires — and nothing more. Each format represents a different trade-off between compatibility, capability, and complexity.

This guide breaks down the three most common text-based subtitle formats: SRT, VTT, and ASS. When you understand what each one actually stores, the choice usually makes itself.

Feature Comparison

| Feature | SRT | VTT | ASS |
| --- | --- | --- | --- |
| Timestamp format | HH:MM:SS,mmm | HH:MM:SS.mmm | H:MM:SS.cc (centiseconds) |
| Timing precision | Milliseconds | Milliseconds | Centiseconds (10 ms) |
| Bold/italic | Partial (<b>, <i> tags) | Yes (CSS + tags) | Yes (override tags) |
| Font specification | No | Yes (CSS) | Yes (style definitions) |
| Color | No | Yes (CSS) | Yes (override tags, hex BGR) |
| Vertical positioning | No | Yes (line cue setting) | Yes (absolute coordinates) |
| Horizontal positioning | No | Yes (position cue setting) | Yes (absolute coordinates) |
| Region/box definitions | No | Yes (REGION blocks) | No (but has collision detection) |
| Animation/effects | No | No | Yes (transforms, fades, moves) |
| Karaoke timing | No | No | Yes (per-syllable \k tags) |
| Multiple styles | No | Partial (::cue CSS) | Yes (named style definitions) |
| Sequence numbers | Required | Optional (cue identifiers) | No (index in section) |
| Multi-language | No | No | No (but multiple tracks possible) |
| Player support | Universal | Web browsers, modern players | VLC, mpv, Aegisub ecosystem |
| File size | Small | Small-Medium | Medium-Large |

SRT: The Universal Default

SubRip Subtitle (SRT) has been the de facto subtitle format since the early 2000s. Its structure is minimal:

1
00:00:05,200 --> 00:00:08,100
This is the first subtitle.

2
00:00:09,000 --> 00:00:12,500
This is the second subtitle.

Each cue is a sequential number, a start/end timestamp pair with millisecond precision, the text content, and a blank-line separator. That is the entire format; SRT has no formal specification beyond this convention.
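Because the structure is this small, a parser fits in a few lines. The following is a minimal sketch, not a production parser: the `Cue` class and `parse_srt` function are names made up for illustration, and the code assumes well-formed input with `\n` line endings and no byte-order mark.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:05,200 --> 00:00:08,100".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

@dataclass
class Cue:  # hypothetical container, not part of any library
    index: int
    start_ms: int
    end_ms: int
    text: str

def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the four timestamp fields to integer milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def parse_srt(data: str) -> list[Cue]:
    cues = []
    # Cue blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        match = TIMING.match(lines[1].strip())
        if match is None:
            continue  # skip malformed blocks rather than guessing
        g = match.groups()
        cues.append(
            Cue(int(lines[0].strip()), to_ms(*g[:4]), to_ms(*g[4:]), "\n".join(lines[2:]))
        )
    return cues

SAMPLE = """1
00:00:05,200 --> 00:00:08,100
This is the first subtitle.

2
00:00:09,000 --> 00:00:12,500
This is the second subtitle.
"""

cues = parse_srt(SAMPLE)
```

Storing times as integer milliseconds avoids floating-point rounding when the cues are later written out in another format.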

Use SRT when:

  • You need maximum player compatibility
  • The delivery target is YouTube, Vimeo, or a generic media player
  • Your subtitles are plain text with no positioning requirements
  • You are building a subtitle pipeline and need a lowest-common-denominator format
  • The content is short-form where millisecond drift is negligible

Limitations to understand:

  • No positioning. Every subtitle renders at the player's default location (typically bottom-center). If you need a subtitle at the top of screen to avoid obscuring on-screen text, SRT cannot express this.
  • No styling metadata. Some players interpret <b>, <i>, and <u> HTML tags in SRT files, but this is non-standard behavior. There is no guarantee.
  • No font control. The viewer's player determines the font, size, and color.

SRT's strength is its weakness: it is so simple that every player can parse it, but so limited that it can only carry text and timing.

VTT: The Web Standard

WebVTT (Web Video Text Tracks) is the W3C format for timed text in HTML5, published as a Candidate Recommendation. Structurally, it resembles SRT with significant additions:

WEBVTT

STYLE
::cue(v[voice="Speaker"]) {
  color: yellow;
  font-style: italic;
}

00:00:05.200 --> 00:00:08.100 position:10% align:start
<v Speaker>This is positioned and styled.</v>

00:00:09.000 --> 00:00:12.500 line:0
This appears at the top of the screen.

Use VTT when:

  • The delivery target is an HTML5 <video> element with <track> children
  • You are packaging for HLS or DASH streaming (both carry WebVTT; DASH and newer HLS deployments can also carry TTML/IMSC)
  • You need basic positioning (top/bottom, left/right alignment)
  • You need CSS-based styling that the browser will render
  • The platform is web-first (e-learning, web players, progressive web apps)

Limitations to understand:

  • Browser rendering varies. The same VTT file will look different in Chrome, Safari, and Firefox. CSS ::cue support is inconsistent across browsers and versions.
  • Region support is specified but poorly implemented. VTT REGION blocks define named areas of the screen, but real-world browser support is incomplete.
  • No animation or effects. VTT can position and style text, but it cannot animate it.
  • Legacy player support is weaker than SRT. Older media players and some smart TV firmware do not recognize VTT.

VTT occupies the middle ground: more capable than SRT, broadly supported on the web, but not rich enough for creative subtitle work.
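The cue settings are what separate a VTT timing line from an SRT one, so a converter has to split them off before it can decide what to keep. A small sketch of that split follows; `parse_timing_line` is a hypothetical helper name, and the code assumes settings are space-separated `name:value` pairs as in the example above.

```python
def parse_timing_line(line: str) -> tuple[str, str, dict[str, str]]:
    """Split a WebVTT timing line into start, end, and a dict of cue settings."""
    start, _, rest = line.partition("-->")
    end, *settings = rest.split()
    # Each setting looks like "position:10%" or "line:0".
    parsed = dict(s.split(":", 1) for s in settings if ":" in s)
    return start.strip(), end, parsed

start, end, settings = parse_timing_line(
    "00:00:05.200 --> 00:00:08.100 position:10% align:start"
)
```

When the target is SRT, everything in the returned dictionary is what gets dropped.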

ASS: Maximum Creative Control

Advanced SubStation Alpha (ASS) is the format of choice for anime fansubbing, karaoke, and any scenario requiring precise visual control:

[Script Info]
Title: Example
ScriptType: v4.00+
PlayResX: 1920
PlayResY: 1080

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, ...
Style: Default,Arial,48,&H00FFFFFF,...

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:05.20,0:00:08.10,Default,,0,0,0,,This is a subtitle.
Dialogue: 0,0:00:09.00,0:00:12.50,Default,,0,0,0,,{\pos(960,200)}This is at the top.
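Reading an event line means pairing the values with the field names from the Format line. One detail is easy to get wrong: the Text field comes last and may itself contain commas, as in `\pos(960,200)`. The sketch below shows one way to handle that; `parse_dialogue` and `ass_time_to_cs` are illustrative names, and the Format string is copied from the example above rather than read from a file.

```python
EVENT_FORMAT = "Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text"

def parse_dialogue(line: str, fmt: str = EVENT_FORMAT) -> dict[str, str]:
    """Map a 'Dialogue:' line onto the field names of the Format line."""
    fields = [name.strip() for name in fmt.split(",")]
    kind, _, payload = line.partition(":")
    if kind.strip() != "Dialogue":
        raise ValueError(f"not a Dialogue line: {line!r}")
    # Cap the split so commas inside the Text field are preserved.
    values = payload.lstrip().split(",", len(fields) - 1)
    return dict(zip(fields, values))

def ass_time_to_cs(stamp: str) -> int:
    """Convert H:MM:SS.cc to integer centiseconds."""
    hours, minutes, seconds = stamp.split(":")
    whole, centis = seconds.split(".")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 100 + int(centis)

event = parse_dialogue(
    r"Dialogue: 0,0:00:09.00,0:00:12.50,Default,,0,0,0,,{\pos(960,200)}This is at the top."
)
```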

Use ASS when:

  • You need pixel-precise positioning (absolute X,Y coordinates on a defined canvas)
  • The project requires animation (fades, moves, rotations, scaling)
  • You are creating karaoke subtitles with per-syllable timing
  • You need multiple named styles (dialog, signs, songs, narrator)
  • Your playback environment supports ASS (VLC, mpv, libass-based players)

Limitations to understand:

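  • Centisecond precision (10ms). ASS timestamps use H:MM:SS.cc, which is coarser than SRT and VTT. At 29.97fps, one frame lasts about 33.367ms, which is not a whole number of centiseconds, so a rounded timestamp can land on the wrong side of a frame boundary. How often this happens depends on the rounding rule and on how the player maps timestamps to frames; with naive nearest-centisecond rounding it can affect a third or more of frame-aligned timestamps.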
  • No browser support. HTML5 <video> does not natively support ASS. Web playback requires JavaScript libraries like libass.js or JASSUB.
  • No streaming platform acceptance. Netflix, YouTube, Disney+, and other platforms do not accept ASS for delivery.
  • Complex syntax. Override tags like {\an8\pos(960,100)\fad(200,0)\1c&H00FFFF&} are powerful but error-prone to write and edit.
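The centisecond limitation is easiest to see with exact integer arithmetic. The sketch below rests on two assumptions that are not from this article: the frame rate is exactly 30000/1001 fps, and the player shows a cue on the first frame whose start time is at or after the cue's start timestamp. All function names are made up for illustration.

```python
FPS_NUM, FPS_DEN = 30000, 1001  # assumed: 29.97 fps as an exact ratio

def frame_start_cs_nearest(frame: int) -> int:
    """Frame start time rounded half-up to the nearest centisecond."""
    # frame * FPS_DEN * 100 / FPS_NUM is the start time in centiseconds.
    return (2 * frame * FPS_DEN * 100 + FPS_NUM) // (2 * FPS_NUM)

def frame_start_cs_floor(frame: int) -> int:
    """Frame start time truncated to a whole centisecond."""
    return (frame * FPS_DEN * 100) // FPS_NUM

def first_frame_at_or_after(cs: int) -> int:
    """Integer ceiling of cs divided by the frame duration (assumed player model)."""
    return -((-cs * FPS_NUM) // (FPS_DEN * 100))

def count_shifted(round_fn, frames: int = 300) -> int:
    """Count frames whose rounded start time resolves to a different frame."""
    return sum(first_frame_at_or_after(round_fn(n)) != n for n in range(frames))

shifted_nearest = count_shifted(frame_start_cs_nearest)
shifted_floor = count_shifted(frame_start_cs_floor)
```

Under these particular assumptions, nearest-centisecond rounding shifts 150 of the first 300 frames by one frame, while truncation shifts none. A different player model or rounding rule gives a different share, so any single ratio should be read as model-dependent.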

ASS is the most expressive subtitle format available, but its ecosystem is narrow. It is the right choice for creative work during production and the wrong choice for delivery to most platforms.

What You Lose in Conversion

Every format conversion is potentially lossy. Here is what survives each direction:

| Conversion | What survives | What is lost |
| --- | --- | --- |
| SRT to VTT | Text, timing | Nothing (VTT is a superset for what SRT carries) |
| VTT to SRT | Text, timing | Positioning, cue settings, CSS styling, regions |
| SRT to ASS | Text, timing (rounded to centiseconds) | Sub-centisecond timing precision; otherwise nothing (ASS is a superset) |
| ASS to SRT | Text, timing (centiseconds map exactly to milliseconds) | Styles, positioning, animations, effects, karaoke tags |
| VTT to ASS | Text, timing (rounded to centiseconds), basic positioning (approximate) | CSS styling, regions (different model) |
| ASS to VTT | Text, timing, basic positioning (approximate) | Override tags, animations, exact fonts, per-syllable karaoke |

The pattern: converting to a simpler format always loses data. Converting to a more complex format preserves everything the simpler format had (apart from millisecond-to-centisecond rounding when the target is ASS), but cannot add what was never there.
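The SRT to VTT direction shows how little work a lossless conversion needs: add the `WEBVTT` header and swap the comma in each timestamp for a period. This is a minimal sketch with a made-up function name; it assumes well-formed SRT input and does not guard against cue text that happens to look like a timing line.

```python
import re

SRT_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)

def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT; sequence numbers carry over as cue identifiers."""
    out = ["WEBVTT", ""]
    for line in srt_text.strip().splitlines():
        # Only timing lines match; numbers, text, and blank lines pass through.
        out.append(SRT_TIMING.sub(r"\1.\2 --> \3.\4", line))
    return "\n".join(out) + "\n"

vtt = srt_to_vtt("1\n00:00:05,200 --> 00:00:08,100\nThis is the first subtitle.\n")
```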

This is why format choice matters most at the point of creation. If you author in SRT and later need ASS-style positioning, you have to add it manually. If you author in ASS and later need SRT, the conversion is automatic — you just lose the features SRT cannot carry.
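Going the other way, from ASS to SRT, is mostly a matter of throwing things away. The sketch below strips override blocks and converts a centisecond timestamp to SRT notation. The function names are illustrative, and mapping the soft break `\n` to a space is an assumption about typical wrap settings rather than a rule for every renderer.

```python
import re

OVERRIDE_BLOCK = re.compile(r"\{[^}]*\}")  # e.g. {\an8\pos(960,100)}

def ass_text_to_plain(text: str) -> str:
    """Drop override tags and translate ASS line-break escapes."""
    text = OVERRIDE_BLOCK.sub("", text)
    text = text.replace(r"\N", "\n")  # hard line break
    text = text.replace(r"\n", " ")   # soft break, assumed to render as a space
    return text.replace(r"\h", " ")   # hard space

def cs_to_srt_time(cs: int) -> str:
    """Format integer centiseconds as HH:MM:SS,mmm."""
    ms = cs * 10  # exact: every centisecond value is a whole number of milliseconds
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"

plain = ass_text_to_plain(r"{\pos(960,200)}This is at the top.")
```

The position that `\pos` carried is simply gone after this step, which is the loss the table above describes.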

When Format Choice Does Not Matter

For a surprising number of workflows, the format choice is irrelevant:

  • Plain dialog subtitles with no styling or positioning: SRT, VTT, and ASS all carry the same information. Pick whichever your delivery target prefers.
  • Short-form content (under 10 minutes): Millisecond drift at 29.97fps is negligible for short durations. Any format works.
  • Single-platform delivery: If you are only delivering to YouTube, use SRT. If only to HLS, use VTT. Do not over-engineer.

Format choice matters when you need features that only certain formats carry, or when you are building a pipeline that serves multiple delivery targets. In the latter case, store your canonical version in a richer interchange format (like IMSC) that preserves everything your targets need, and export to SRT/VTT/ASS as needed.

Converting Between Formats

The subtitle converter handles direct conversion between all pairs. Some commonly used conversions:

  • SRT to VTT — Web deployment from universal source files
  • VTT to SRT — Maximum compatibility from web-native source
  • ASS to SRT — Stripping styled subs down for platform delivery
  • SRT to ASS — Starting point for adding styles and effects

All conversions run through frame-number-canonical computation via the timecodes engine, so timing precision is maintained regardless of the source and target formats' native timestamp resolutions.
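To illustrate the general idea of frame-canonical timing, here is a hedged sketch. It is not the converter's actual code: the function names, the fixed 30000/1001 frame rate, and the rule that a cue appears on the first frame starting at or after its timestamp are all assumptions made for the example.

```python
FPS_NUM, FPS_DEN = 30000, 1001  # assumed frame rate: 29.97 fps as an exact ratio

def units_to_frame(value: int, units_per_second: int) -> int:
    """First frame starting at or after the timestamp (integer ceiling)."""
    return -((-value * FPS_NUM) // (FPS_DEN * units_per_second))

def frame_to_units(frame: int, units_per_second: int) -> int:
    """Frame start time truncated to the target resolution (1000 = ms, 100 = cs)."""
    return (frame * FPS_DEN * units_per_second) // FPS_NUM

# An SRT timestamp of 00:00:05,200 resolves to a frame number first...
frame = units_to_frame(5200, 1000)
# ...and each target format is then written from that frame, not from the source text.
as_centiseconds = frame_to_units(frame, 100)    # for ASS
as_milliseconds = frame_to_units(frame, 1000)   # for SRT or VTT
```

With truncation on output and a ceiling on input, both the millisecond and the centisecond value resolve back to the same frame under the assumed player model, even though the two timestamps differ as text.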

The Bottom Line

| If you need... | Use |
| --- | --- |
| Maximum compatibility | SRT |
| Web/streaming delivery | VTT |
| Creative styling and effects | ASS |
| Archival and multi-target delivery | IMSC |

Pick the simplest format that carries the features your workflow requires. Convert between them freely, but understand what each conversion loses. And when in doubt, start with the richest format you can — it is always easier to strip features out than to add them back.