March 10, 2026

SRT vs VTT vs ASS: Choosing the Right Subtitle Format

A practical comparison of SRT, VTT, and ASS subtitle formats — what each supports, what you lose in conversion, and when format choice actually matters.

Tags: subtitles, srt, vtt, ass, comparison, formats

Choosing a subtitle format is not about which one is "best." It is about which one carries the features your delivery target requires — and nothing more. Each format represents a different trade-off between compatibility, capability, and complexity.

This guide breaks down the three most common text-based subtitle formats: SRT, VTT, and ASS. When you understand what each one actually stores, the choice usually makes itself.

Feature Comparison

Feature	SRT	VTT	ASS
Timestamp format	`HH:MM:SS,mmm`	`HH:MM:SS.mmm` (hours optional)	`H:MM:SS.cc` (centiseconds)
Timing precision	Milliseconds	Milliseconds	Centiseconds (10ms)
Bold/italic	Partial (`<b>`, `<i>` tags)	Yes (CSS + tags)	Yes (override tags)
Font specification	No	Yes (CSS)	Yes (style definitions)
Color	No	Yes (CSS)	Yes (override tags, hex BGR)
Vertical positioning	No	Yes (line cue setting)	Yes (absolute coordinates)
Horizontal positioning	No	Yes (position cue setting)	Yes (absolute coordinates)
Region/box definitions	No	Yes (REGION blocks)	No (but has collision detection)
Animation/effects	No	No	Yes (transforms, fades, moves)
Karaoke timing	No	Partial (inline cue timestamps)	Yes (per-syllable `\k` tags)
Multiple styles	No	Partial (::cue CSS)	Yes (named style definitions)
Sequence numbers	Required	Optional (cue identifiers)	No (index in section)
Multi-language	No	No	No (but multiple tracks possible)
Player support	Universal	Web browsers, modern players	VLC, mpv, Aegisub ecosystem
File size	Small	Small-Medium	Medium-Large

SRT: The Universal Default

SubRip Subtitle (SRT) has been the de facto subtitle format since the early 2000s. Its structure is minimal:

1
00:00:05,200 --> 00:00:08,100
This is the first subtitle.

2
00:00:09,000 --> 00:00:12,500
This is the second subtitle.

Each block holds a sequence number, start and end timestamps with millisecond precision, the text content, and a blank line as separator. That is the entire format; SRT has never had a formal specification.
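The block layout described above is simple enough to parse in a few lines. The following is a minimal sketch, not a full parser: the names `Cue` and `parse_srt` are made up for illustration, and it assumes well-formed input with two-digit hours and no byte-order mark.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:05,200 --> 00:00:08,100".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:  # hypothetical container, not part of any library
    index: int
    start_ms: int
    end_ms: int
    text: str


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the four timestamp fields to total milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(data: str) -> list[Cue]:
    """Split SRT text into cues; blocks that do not fit the layout are skipped."""
    cues = []
    normalized = data.replace("\r\n", "\n").strip()
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = TIMING.match(lines[1].strip())
        if match is None:
            continue
        g = match.groups()
        cues.append(
            Cue(int(lines[0]), to_ms(*g[:4]), to_ms(*g[4:]), "\n".join(lines[2:]))
        )
    return cues
```

Keeping times as integer milliseconds makes later comparisons and conversions straightforward, since no floating-point rounding is involved.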

Use SRT when:

You need maximum player compatibility
The delivery target is YouTube, Vimeo, or a generic media player
Your subtitles are plain text with no positioning requirements
You are building a subtitle pipeline and need a lowest-common-denominator format
The content is short-form where millisecond drift is negligible

Limitations to understand:

No positioning. Every subtitle renders at the player's default location (typically bottom-center). If you need a subtitle at the top of screen to avoid obscuring on-screen text, SRT cannot express this.
No styling metadata. Some players interpret <b>, <i>, and <u> HTML tags in SRT files, but this is non-standard behavior. There is no guarantee.
No font control. The viewer's player determines the font, size, and color.

SRT's strength is its weakness: it is so simple that every player can parse it, but so limited that it can only carry text and timing.

VTT: The Web Standard

WebVTT (Web Video Text Tracks) is the W3C format for timed text tracks in HTML5, loaded through the <track> element. Structurally, it resembles SRT with significant additions:

WEBVTT

STYLE
::cue(.speaker) {
  color: yellow;
  font-style: italic;
}

00:00:05.200 --> 00:00:08.100 position:10% align:start
<v.speaker Speaker>This is positioned and styled.</v>

00:00:09.000 --> 00:00:12.500 line:0
This appears at the top of the screen.

Use VTT when:

The delivery target is an HTML5 <video> element with <track> children
You are packaging for HLS or DASH streaming (HLS carries subtitles as segmented WebVTT; DASH supports WebVTT alongside TTML-based formats such as IMSC)
You need basic positioning (top/bottom, left/right alignment)
You need CSS-based styling that the browser will render
The platform is web-first (e-learning, web players, progressive web apps)

Limitations to understand:

Browser rendering varies. The same VTT file will look different in Chrome, Safari, and Firefox. CSS ::cue support is inconsistent across browsers and versions.
Region support is specified but poorly implemented. VTT REGION blocks define named areas of the screen, but real-world browser support is incomplete.
No animation or effects. VTT can position and style text, but it cannot animate it.
Legacy player support is weaker than SRT. Older media players and some smart TV firmware do not recognize VTT.

VTT occupies the middle ground: more capable than SRT, broadly supported on the web, but not rich enough for creative subtitle work.
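Because VTT keeps SRT's block structure, a plain SRT file can be turned into a valid VTT file with very little work. The sketch below assumes well-formed SRT input; the function name `srt_to_vtt` is illustrative. It adds the `WEBVTT` header and swaps the comma in each timing line for a dot. The SRT sequence numbers are left in place, where VTT reads them as cue identifiers.

```python
import re

# An SRT timing line; the only syntactic change needed is "," -> "." in each timestamp.
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Return WebVTT text for plain SRT input (no styling or positioning added)."""
    out = ["WEBVTT", ""]
    for line in srt_text.replace("\r\n", "\n").strip().split("\n"):
        # Non-timing lines (numbers, text, blanks) pass through unchanged.
        out.append(SRT_TIMING.sub(r"\1.\2 --> \3.\4", line))
    return "\n".join(out) + "\n"
```

A production converter would also need to handle cue text that contains the literal string `-->`, which VTT does not allow inside a cue.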

ASS: Maximum Creative Control

Advanced SubStation Alpha (ASS) is the format of choice for anime fansubbing, karaoke, and any scenario requiring precise visual control:

[Script Info]
Title: Example
ScriptType: v4.00+
PlayResX: 1920
PlayResY: 1080

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, ...
Style: Default,Arial,48,&H00FFFFFF,...

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:05.20,0:00:08.10,Default,,0,0,0,,This is a subtitle.
Dialogue: 0,0:00:09.00,0:00:12.50,Default,,0,0,0,,{\pos(960,200)}This is at the top.

Use ASS when:

You need pixel-precise positioning (absolute X,Y coordinates on a defined canvas)
The project requires animation (fades, moves, rotations, scaling)
You are creating karaoke subtitles with per-syllable timing
You need multiple named styles (dialog, signs, songs, narrator)
Your playback environment supports ASS (VLC, mpv, libass-based players)

Limitations to understand:

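Centisecond precision (10ms). ASS timestamps use H:MM:SS.cc, which is coarser than SRT and VTT. At 29.97fps, one frame lasts about 33.367ms, so frame boundaries rarely fall on a 10ms step. A timestamp rounded to the nearest centisecond can therefore land just before the intended frame boundary and show the cue one frame early. How often this happens depends on how the converter rounds.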
No browser support. HTML5 <video> does not natively support ASS. Web playback requires JavaScript libraries like libass.js or JASSUB.
No streaming platform acceptance. Netflix, YouTube, Disney+, and other platforms do not accept ASS for delivery.
Complex syntax. Override tags like {\an8\pos(960,100)\fad(200,0)\1c&H00FFFF&} are powerful but error-prone to write and edit.

ASS is the most expressive subtitle format available, but its ecosystem is narrow. It is the right choice for creative work during production and the wrong choice for delivery to most platforms.
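The centisecond limitation can be checked with a few lines of exact arithmetic. This sketch assumes a 29.97fps source (30000/1001) and a converter that rounds each frame's start time to the nearest centisecond. The helper names are illustrative. It reports which frames end up mapped to a different frame after rounding.

```python
import math
from fractions import Fraction

FPS = Fraction(30000, 1001)  # 29.97 fps expressed exactly


def frame_start_ms(frame: int) -> Fraction:
    """Exact start time of a frame, in milliseconds."""
    return Fraction(frame * 1000) / FPS


def round_to_centiseconds(ms: Fraction) -> int:
    """Round to the nearest centisecond; the result is returned in milliseconds."""
    return round(ms / 10) * 10


def frame_at(ms: int) -> int:
    """Index of the frame on screen at a given millisecond timestamp."""
    return math.floor(Fraction(ms) * FPS / 1000)


def shifted_frames(count: int) -> list[int]:
    """Frames whose rounded start time no longer falls inside the same frame."""
    return [
        f
        for f in range(count)
        if frame_at(round_to_centiseconds(frame_start_ms(f))) != f
    ]


print(shifted_frames(6))  # [1, 3, 4]
```

Frame 1 starts at about 33.37ms and rounds to 30ms, which is still inside frame 0. Frame 2 starts at about 66.73ms and rounds up to 70ms, so it survives. A different rounding rule (always up, for example) changes which frames are affected.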

What You Lose in Conversion

Every format conversion is potentially lossy. Here is what survives each direction:

Conversion	What survives	What is lost
SRT to VTT	Text, timing	Nothing (VTT is a superset for what SRT carries)
VTT to SRT	Text, timing	Positioning, cue settings, CSS styling, regions
SRT to ASS	Text, timing (rounded to centiseconds)	Sub-centisecond timing precision
ASS to SRT	Text, timing (exact; centiseconds map directly onto milliseconds)	Styles, positioning, animations, effects, karaoke tags
VTT to ASS	Text, timing (rounded to centiseconds), basic positioning (approximate)	CSS styling, regions (different model), sub-centisecond timing precision
ASS to VTT	Text, timing, basic positioning (approximate)	Override tags, animations, exact fonts, per-syllable karaoke

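The pattern: converting to a simpler format always loses data. Converting to a more feature-rich format preserves the text and features the simpler format had, but cannot add what was never there. The one exception is timing: ASS's centisecond timestamps round millisecond values on the way in.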

This is why format choice matters most at the point of creation. If you author in SRT and later need ASS-style positioning, you have to add it manually. If you author in ASS and later need SRT, the conversion is automatic — you just lose the features SRT cannot carry.
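To make the lossy direction concrete, here is a minimal sketch of an ASS-to-SRT conversion. It assumes the 10-field Events format shown in the ASS example earlier, takes only the `Dialogue:` lines as input, and uses illustrative function names. Everything inside `{...}` override blocks is simply discarded, which is where the positioning, effects, and karaoke data are lost.

```python
import re

# Override blocks such as {\pos(960,200)} or {\an8\fad(200,0)}.
OVERRIDE_BLOCK = re.compile(r"\{[^}]*\}")


def ass_time_to_srt(stamp: str) -> str:
    """Convert H:MM:SS.cc to HH:MM:SS,mmm (exact: one centisecond is 10 ms)."""
    hours, minutes, rest = stamp.split(":")
    seconds, centis = rest.split(".")
    return f"{int(hours):02d}:{minutes}:{seconds},{int(centis) * 10:03d}"


def dialogue_to_srt(dialogue_lines: list[str]) -> str:
    """Build SRT text from ASS 'Dialogue:' lines, dropping all override tags."""
    blocks = []
    for number, line in enumerate(dialogue_lines, start=1):
        # Text is the tenth field and may itself contain commas, so split at most 9 times.
        fields = line.split(":", 1)[1].lstrip().split(",", 9)
        start, end, text = fields[1], fields[2], fields[9]
        # Strip override blocks and turn ASS hard line breaks (\N) into newlines.
        text = OVERRIDE_BLOCK.sub("", text).replace("\\N", "\n")
        blocks.append(
            f"{number}\n{ass_time_to_srt(start)} --> {ass_time_to_srt(end)}\n{text}"
        )
    return "\n\n".join(blocks) + "\n"
```

The reverse direction has no equivalent shortcut: nothing in the resulting SRT records that the second cue was meant to sit at the top of the frame.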

When Format Choice Does Not Matter

For a surprising number of workflows, the format choice is irrelevant:

Plain dialog subtitles with no styling or positioning: SRT, VTT, and ASS all carry the same information. Pick whichever your delivery target prefers.
Short-form content (under 10 minutes): Millisecond drift at 29.97fps is negligible for short durations. Any format works.
Single-platform delivery: If you are only delivering to YouTube, use SRT. If only to HLS, use VTT. Do not over-engineer.

Format choice matters when you need features that only certain formats carry, or when you are building a pipeline that serves multiple delivery targets. In the latter case, store your canonical version in a format rich enough to hold everything your targets need (such as IMSC, the W3C's TTML-based profile), and export to SRT/VTT/ASS as needed.

Converting Between Formats

The subtitle converter handles direct conversion between all pairs. Some commonly used conversions:

SRT to VTT — Web deployment from universal source files
VTT to SRT — Maximum compatibility from web-native source
ASS to SRT — Stripping styled subs down for platform delivery
SRT to ASS — Starting point for adding styles and effects

All conversions run through frame-number-canonical computation via the timecodes engine, so timing precision is maintained regardless of the source and target formats' native timestamp resolutions.
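The engine's internals are not shown here, so the following is only a sketch of how a frame-number-canonical conversion could work, under the assumption that the frame rate is known. All function names are hypothetical. A millisecond timestamp is first resolved to the frame it falls in, and the ASS timestamp is then chosen as the first centisecond at or after that frame's start, which keeps it inside the same frame for any frame rate below 100fps.

```python
import math
from fractions import Fraction

NTSC = Fraction(30000, 1001)  # 29.97 fps expressed exactly


def ms_to_frame(ms: int, fps: Fraction) -> int:
    """Frame on screen at the given millisecond timestamp."""
    return math.floor(Fraction(ms) * fps / 1000)


def frame_to_centiseconds(frame: int, fps: Fraction) -> int:
    """Smallest centisecond count that is not earlier than the frame's exact start."""
    return math.ceil(Fraction(frame) * 100 / fps)


def format_ass(cs: int) -> str:
    """Format a centisecond count as an ASS H:MM:SS.cc timestamp."""
    hours, rem = divmod(cs, 360000)
    minutes, rem = divmod(rem, 6000)
    seconds, centis = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{centis:02d}"


def srt_ms_to_ass(ms: int, fps: Fraction) -> str:
    """Convert via the frame number so the result stays on the same frame."""
    return format_ass(frame_to_centiseconds(ms_to_frame(ms, fps), fps))


print(srt_ms_to_ass(5200, NTSC))  # 0:00:05.18
```

The output timestamp differs from the input (5.200s becomes 5.18s) because it is snapped to the start of frame 155, but both values display on the same frame, which is the property that matters for playback.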

The Bottom Line

If you need...	Use
Maximum compatibility	SRT
Web/streaming delivery	VTT
Creative styling and effects	ASS
Archival and multi-target delivery	IMSC

Pick the simplest format that carries the features your workflow requires. Convert between them freely, but understand what each conversion loses. And when in doubt, start with the richest format you can — it is always easier to strip features out than to add them back.