Why Your 29.97 Subtitle Sync Drifts (And How to Fix It)
The math behind 29.97fps subtitle drift, why millisecond-based tools lose sync over long programs, and how frame-number-canonical computation eliminates the problem.
You have a two-hour program at 29.97fps. You load the subtitles into your player. The first few cues are fine. By the midpoint, things look slightly off. By the last act, subtitles are arriving a visible beat late — or early. You did not touch the timing. The file validated clean. What happened?
The answer is baked into the number 29.97 itself.
29.97 Is Not 29.97
When NTSC color television was standardized in 1953, the framerate was shifted from exactly 30 frames per second to 30000/1001 frames per second. This works out to approximately 29.97002997... fps — an infinitely repeating decimal.
This is not a rounding convention. The exact framerate is the fraction 30000/1001. Every tool that handles NTSC content must use this fraction, not the approximation 29.97.
One frame at 30000/1001 fps lasts exactly:
1001 / 30000 = 0.033366666... seconds = 33.366666... milliseconds
That trailing 666... is the source of every sync problem in millisecond-based subtitle tools.
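To see where that repeating remainder comes from, here is a minimal sketch in plain JavaScript that keeps the frame time as an integer quotient plus a remainder instead of a decimal. The names `RATE` and `frameStartMs` are illustrative only and do not come from any tool mentioned in this article.

```javascript
// NTSC rate as an exact fraction: 30000 frames every 1001 seconds.
const RATE = { num: 30000, den: 1001 };

// Exact start time of a frame in milliseconds, expressed as
// whole + remainder / RATE.num. Only integer arithmetic is used,
// so nothing is rounded along the way.
function frameStartMs(frame) {
  const scaled = frame * RATE.den * 1000;       // frame * 1001 * 1000
  const remainder = scaled % RATE.num;          // in units of 1/30000 ms
  const whole = (scaled - remainder) / RATE.num;
  return { whole, remainder, exact: remainder === 0 };
}

console.log(frameStartMs(1));   // whole: 33, remainder: 11000 (11000/30000 = 0.3666... ms)
console.log(frameStartMs(30));  // whole: 1001, remainder: 0
```

Because 1001 and 30 share no common factor, the remainder is zero only when the frame number is a multiple of 30. Every other frame boundary falls between two millisecond ticks.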
The Millisecond Truncation Problem
SRT, VTT, and SBV all store timestamps with millisecond precision: three decimal places of seconds. Every timestamp is therefore rounded or truncated to a whole millisecond.
One frame at 29.97fps is 33.3667ms. Stored as an SRT timestamp, it becomes either 33ms or 34ms. Neither is correct. The error on a single frame is small, about 0.367ms or 0.633ms. But this error is not random. It is systematic, and whenever a tool builds later timestamps out of earlier rounded values instead of recomputing each one from its frame number, it accumulates.
Here is the math for a two-hour program:
Total frames in 2 hours at 29.97fps:
2 * 3600 * 30000/1001 = 215,784.216 frames
(rounded: 215,784 frames)
Exact duration of 215,784 frames:
215,784 * 1001/30000 = 7,199.993... seconds
Duration as stored in ms-based format:
Each frame boundary rounded to nearest ms
Cumulative rounding over 215,784 frames
In practice, because rounding errors partially cancel over successive frames, accumulated millisecond rounding over a two-hour program routinely produces drift of 2-4 frames — enough to be visible to viewers and enough to fail broadcast QC.
The theoretical worst case illustrates why. Suppose a tool advances its clock by a truncated 33ms per frame instead of the exact 33.3667ms, so that every frame loses about 0.3667ms. Over 215,784 frames:
Theoretical worst-case drift = 0.3667ms * 215,784 / 1000 = ~79.1 seconds
Real files never hit this bound because rounding errors alternate direction, but the arithmetic shows how much error budget exists in the system. Even a small fraction of this worst case is enough to push subtitles out of sync.
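The difference between bounded and accumulating error can be shown with a short sketch. It compares two hypothetical strategies, neither taken from a specific tool: one recomputes each timestamp from its frame number, the other adds a truncated 33ms step for every frame.

```javascript
const FRAMES = 215784;  // two hours at 30000/1001 fps, rounded down

// Exact position of a frame in milliseconds: frame * 1001 / 30.
const exactMs = (frame) => (frame * 1001) / 30;

// Strategy A: recompute every timestamp from its frame number, then round.
// The error never exceeds half a millisecond, however long the program is.
const recomputedMs = (frame) => Math.round(exactMs(frame));

// Strategy B: advance a running clock by a truncated 33 ms per frame
// (equivalent to adding 33 once per frame). Each step loses
// 1001/30 - 33 = 0.3666... ms, and the loss piles up.
const accumulatedMs = (frame) => frame * Math.floor(1001 / 30);

const errorA = Math.abs(recomputedMs(FRAMES) - exactMs(FRAMES));
const errorB = Math.abs(accumulatedMs(FRAMES) - exactMs(FRAMES));

console.log(errorA.toFixed(1)); // 0.2 (ms)
console.log(errorB.toFixed(1)); // 79120.8 (ms)
```

Strategy B ends up about 79 seconds short over two hours, while strategy A stays within half a millisecond. Real tools usually sit somewhere in between, depending on how often they rebuild timestamps from rounded values.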
A Concrete Example
Consider frame 150,000 in a 29.97fps timeline. Its exact time position:
150,000 * 1001 / 30000 = 5,005.000 seconds exactly
That one happens to land cleanly. Now consider frame 150,001:
150,001 * 1001 / 30000 = 5,005.033366... seconds
= 5,005 seconds, 33.367ms
Stored in SRT: 01:23:25,033 (truncated). The real position is 33.367ms into that second. The stored position is 33.000ms. That is a 0.367ms error on this single cue.
Now consider what happens when a tool reads that SRT timestamp back and tries to determine what frame it belongs to:
5005.033 seconds * 30000/1001 = 150,000.989...
Floor → frame 150,000
The round-trip lost one frame. Frame 150,001 was written to SRT and read back as frame 150,000. This is not a bug in the tool — it is an inherent limitation of millisecond storage for non-integer framerates.
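The round trip can be reproduced in a few lines. This is a sketch, not the behaviour of any particular tool: `frameToMs` and `msToFrame` are illustrative names, and the writer is assumed to truncate to whole milliseconds as in the example above.

```javascript
// Frame -> stored milliseconds, truncated as in the SRT example.
const frameToMs = (frame) => Math.floor((frame * 1001) / 30);

// Stored milliseconds -> frame, with the reader's rounding policy passed in.
const msToFrame = (ms, policy) => policy((ms * 30) / 1001);

const stored = frameToMs(150001);               // 5005033
const floored = msToFrame(stored, Math.floor);  // 150000: one frame early
const nearest = msToFrame(stored, Math.round);  // 150001: recovered

// Count how many of the first 30 frames survive truncate-then-floor.
let survivors = 0;
for (let frame = 0; frame < 30; frame++) {
  if (msToFrame(frameToMs(frame), Math.floor) === frame) survivors++;
}

console.log(stored, floored, nearest, survivors); // 5005033 150000 150001 1
```

With truncation on write and floor on read, only the frame that lands on an exact millisecond survives. Rounding to the nearest frame recovers the original here, but only because the reader knows the exact rate and the stored value is still within half a frame of the true position. That dependency on outside knowledge is why the frame number, not the millisecond value, has to be the source of truth.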
Drop-Frame Timecode: A Related But Different Problem
Drop-frame timecode is often confused with this drift issue, but it solves a different problem.
At exactly 30fps, timecode 01:00:00:00 corresponds to wall-clock time 1:00:00.000, exactly one hour. At 29.97fps, only 107,892 frames have elapsed after one hour of wall-clock time, so a non-drop timecode counter reads 00:59:56:12. The timecode is "slow" by 3.6 seconds per hour relative to the wall clock.
Drop-frame timecode compensates by skipping frame numbers 0 and 1 at the start of each minute, except every tenth minute. Only the labels are skipped; no frames of video are discarded. This keeps the timecode display close to wall-clock time: the residual error is about 2.6 frames, roughly 86ms, per day.
Drop-frame solves the display problem — making timecode readouts match wall clocks and program durations. It does not solve the computation problem of millisecond rounding. A drop-frame timecode is still a label for a frame number, and converting that label to milliseconds still introduces the same truncation error.
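As an illustration of the labelling rule, the following sketch converts a frame number to both a non-drop and a drop-frame label at 29.97fps. The function names are illustrative, and the arithmetic is the commonly used drop-frame counting scheme rather than code taken from any tool mentioned here.

```javascript
const pad = (n) => String(n).padStart(2, "0");

// Turn a count of 30-per-second label slots into HH:MM:SS<sep>FF.
function formatLabel(slots, sep) {
  const ff = slots % 30;
  const ss = Math.floor(slots / 30) % 60;
  const mm = Math.floor(slots / 1800) % 60;
  const hh = Math.floor(slots / 108000);
  return `${pad(hh)}:${pad(mm)}:${pad(ss)}${sep}${pad(ff)}`;
}

// Non-drop: every frame simply takes the next label.
const toNonDrop = (frame) => formatLabel(frame, ":");

// Drop-frame: labels ;00 and ;01 are skipped at the start of every minute
// except minutes 00, 10, 20, 30, 40 and 50. No video frames are discarded.
function toDropFrame(frame) {
  const FRAMES_PER_10_MIN = 17982;  // 10 * 1800 - 9 * 2
  const FRAMES_PER_MIN = 1798;      // 1800 - 2, for minutes that skip labels
  const blocks = Math.floor(frame / FRAMES_PER_10_MIN);
  const rest = frame % FRAMES_PER_10_MIN;
  let skipped = 18 * blocks;        // 9 skipping minutes per 10-minute block
  if (rest >= 1800) {
    skipped += 2 * (1 + Math.floor((rest - 1800) / FRAMES_PER_MIN));
  }
  return formatLabel(frame + skipped, ";");
}

console.log(toNonDrop(107892));   // 00:59:56:12
console.log(toDropFrame(107892)); // 01:00:00;00
```

Both labels refer to the same frame, number 107,892. Only the readout differs, which is why drop-frame has no effect on millisecond rounding.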
The Solution: Frame-Number-Canonical Computation
The fix is architectural, not algorithmic. Stop treating milliseconds as a source of truth.
The correct pipeline:
- Ingest: Parse the timestamp (milliseconds, timecode, whatever). Immediately convert to a frame number at the known framerate, using the exact fraction (30000/1001, not 29.97).
- Compute: All operations — shifting, scaling, splitting, merging — happen in integer frame space. No floating point. No rounding.
- Export: Convert frame numbers back to the target format's timestamp representation. If the target is SRT, derive milliseconds from the frame number. If the target is SCC, derive SMPTE timecode.
This is what the timecodes engine does internally. The Timecode class stores a frame number as its canonical value. When you construct a Timecode from milliseconds, it immediately resolves to a frame number using the exact framerate fraction. When you read .milliseconds or .timecode back out, those values are derived from the frame number — not stored independently.
import { Timecode } from 'timecodes';
// Construct from milliseconds — internally resolves to frame number
const tc = new Timecode({
  inputValue: 5005033,
  valueType: 'milliseconds',
  framerate: '29.97'
});
console.log(tc.frameNumber); // Exact frame number
console.log(tc.timecode); // Derived from frame number
console.log(tc.milliseconds); // Derived from frame number
Because the frame number is an integer, there is no accumulation. Frame 150,001 is always frame 150,001, regardless of how many conversions it passes through.
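For readers not using that library, the same ingest, compute, export pipeline can be sketched in plain JavaScript. Everything below is an assumption made for illustration: the helper names, the `{ num, den }` rate object, and the choice to round to the nearest frame on ingest. It is not the `timecodes` implementation.

```javascript
const NTSC = { num: 30000, den: 1001 };  // exact rate, never 29.97

// Ingest: "HH:MM:SS,mmm" -> integer frame number (nearest frame).
function srtToFrame(stamp, rate) {
  const [hh, mm, ss, ms] = stamp.split(/[:,]/).map(Number);
  const totalMs = ((hh * 60 + mm) * 60 + ss) * 1000 + ms;
  return Math.round((totalMs * rate.num) / (rate.den * 1000));
}

// Compute: shifting a cue is plain integer addition in frame space.
const shiftFrames = (frame, delta) => frame + delta;

// Export: integer frame number -> "HH:MM:SS,mmm", derived only at the end.
function frameToSrt(frame, rate) {
  const totalMs = Math.round((frame * rate.den * 1000) / rate.num);
  const totalSec = Math.floor(totalMs / 1000);
  const p = (n, width) => String(n).padStart(width, "0");
  const hh = Math.floor(totalSec / 3600);
  const mm = Math.floor(totalSec / 60) % 60;
  return `${p(hh, 2)}:${p(mm, 2)}:${p(totalSec % 60, 2)},${p(totalMs % 1000, 3)}`;
}

const frame = srtToFrame("01:23:25,033", NTSC);  // 150001
const shifted = shiftFrames(frame, 48);          // 150049
console.log(frame, frameToSrt(shifted, NTSC));   // 150001 01:23:26,635
```

The millisecond value exists only at the two edges of the pipeline. A cue can be shifted, exported, and re-ingested any number of times and still resolve to the same frame, provided the rate is known and both edges use the same rounding policy.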
How to Detect Drift in Your Subtitles
If you suspect your existing subtitle files have drifted, here is a practical check:
- Determine the exact frame count of your program. Your NLE or media info tool will report this.
- Convert that frame count to a timecode at 29.97fps (drop-frame if applicable).
- Look at the last subtitle's out-cue timestamp in your SRT/VTT file.
- Compare. If the last subtitle's out-cue is more than one frame's duration (about 33.4ms) away from where it should be relative to the program duration, you have accumulated drift.
You can test this directly in the playground, which lets you convert between frame numbers, timecodes, and milliseconds at any framerate and see the exact values.
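The comparison in the last step can also be scripted. The sketch below assumes you already know, from your NLE, the frame number on which the final cue should end. `driftInFrames`, `hasDrifted`, and the sample values are hypothetical.

```javascript
const MS_PER_FRAME = 1001 / 30;  // one frame at 30000/1001 fps, in ms

// Parse "HH:MM:SS,mmm" (SRT) or "HH:MM:SS.mmm" (VTT) into milliseconds.
function parseStampMs(stamp) {
  const [hh, mm, ss, ms] = stamp.split(/[:,.]/).map(Number);
  return ((hh * 60 + mm) * 60 + ss) * 1000 + ms;
}

// Signed distance, in frames, between where the cue is stored and where
// the known frame number says it should be. Positive means the cue is late.
function driftInFrames(expectedFrame, storedStamp) {
  const expectedMs = (expectedFrame * 1001) / 30;
  return (parseStampMs(storedStamp) - expectedMs) / MS_PER_FRAME;
}

// More than one frame away counts as drift, per the check above.
const hasDrifted = (expectedFrame, storedStamp) =>
  Math.abs(driftInFrames(expectedFrame, storedStamp)) > 1;

console.log(hasDrifted(215700, "01:59:57,190")); // false: cue sits on its frame
console.log(hasDrifted(215700, "01:59:57,290")); // true: about 3 frames late
```

Running the same comparison on a few cues spread across the program, not just the last one, shows whether the offset grows over time (drift) or stays constant (a fixed offset that a simple shift would fix).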
Preventing Drift in Your Workflow
The rules are straightforward:
- Never store milliseconds as a source of truth. If your subtitle database or project file stores timing as milliseconds, you have a drift vector. Store frame numbers and framerate instead.
- Never multiply milliseconds by a ratio to change framerates. Converting 29.97fps subtitles to 25fps requires going through frame numbers, not multiplying timestamps by 25/29.97.
- Always know your framerate. A millisecond value without framerate context is ambiguous. The same millisecond value maps to different frame numbers at different framerates.
- Validate after every conversion. When you export from your subtitle tool, check the last cue against the known program duration.
These are not suggestions — they are requirements for frame-accurate work. The subtitle tool enforces framerate context on every operation precisely because a subtitle file without a framerate is an incomplete document.
The Takeaway
29.97fps drift is not a mystery bug. It is a predictable, mathematical consequence of storing a non-terminating, repeating decimal frame duration in a fixed-precision decimal format. The solution has been known in broadcast engineering for decades: work in frame numbers, not milliseconds. Every tool in the chain needs to respect this, from ingest through delivery.
If your subtitles drift, the problem is not your content. The problem is your tools.