YTT - Building a Self-Hosted YouTube Transcript Translator
Published on Feb 9, 2026
Most online transcript tools are riddled with ads, require sign-ups, or send your data to third-party servers. I wanted something simpler: paste a YouTube URL, get a translated transcript, and keep everything on my own hardware. That’s how YTT (YouTube Transcript Translator) was born — a self-hosted web application that fetches YouTube transcripts and translates them using LibreTranslate, a privacy-respecting open-source translation engine.
This post walks through the architecture, the single-container deployment strategy, the key technical decisions, and the challenges encountered along the way.
Table of Contents
- Architecture at a Glance
- The Single Container Pattern
- Key Features
- Frontend: SvelteKit 5 with Carbon Design System
- Backend: FastAPI + yt-dlp
- Technical Challenges
- What’s Next
- Wrapping Up
Architecture at a Glance
YTT follows an all-in-one container approach: a single Docker image serves both the SvelteKit frontend and the FastAPI backend. The translation work is offloaded to a separate LibreTranslate container on the same Docker network.
Tech Stack
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | SvelteKit 5 (Svelte 5 runes) | SPA with client-side routing |
| UI Library | Carbon Components Svelte v0.99.1 | IBM’s design system |
| Backend | FastAPI (Python 3.11) | REST API + static file serving |
| Transcripts | yt-dlp | YouTube subtitle extraction |
| Translation | LibreTranslate | Self-hosted machine translation |
| Data Storage | JSON file | History persistence (no database) |
| Deployment | Docker (multi-stage build) | Single container |
The Single Container Pattern
The most interesting architectural decision is serving both frontend and backend from a single container. This pattern is inspired by how many production apps work: the API server also serves the frontend’s static assets.
Multi-Stage Dockerfile
The build process uses two stages. The first stage compiles the SvelteKit app into static HTML/JS/CSS. The second stage sets up the Python runtime and copies those static files in.
```dockerfile
# Stage 1: Build frontend with Node.js
FROM node:20-alpine AS frontend-builder
WORKDIR /app/frontend
COPY frontend/package.json frontend/pnpm-lock.yaml ./
RUN npm install -g pnpm && pnpm install --frozen-lockfile
COPY frontend/ ./
RUN pnpm run build

# Stage 2: Python runtime with built frontend
FROM python:3.11-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY backend/ .

# Copy built frontend from stage 1
COPY --from=frontend-builder /app/frontend/build ./static
```
The result is a single image that includes everything: Python, FastAPI, yt-dlp, and the pre-built SvelteKit static files.
The SPAStaticFiles Router
The trickiest part of this pattern is handling client-side routing. When a user navigates to /history in the browser, the SvelteKit router handles it client-side. But if they refresh the page or access /history directly, the request hits the server — and there’s no history directory in the static files.
The solution is a custom static file handler that falls back to index.html for any path that looks like a route (no file extension):
```python
from fastapi.staticfiles import StaticFiles
from starlette.exceptions import HTTPException as StarletteHTTPException


class SPAStaticFiles(StaticFiles):
    async def get_response(self, path: str, scope):
        try:
            response = await super().get_response(path, scope)
            return response
        except StarletteHTTPException as ex:
            if ex.status_code == 404:
                # Routes (no file extension) -> serve index.html
                if not path or "." not in path.split("/")[-1]:
                    return await super().get_response("index.html", scope)
            # Real missing files and non-404 errors propagate unchanged
            raise ex


# Mount as the last handler (catch-all); `app` is the FastAPI instance
app.mount("/", SPAStaticFiles(directory="static", html=True), name="spa")
```
A subtle but important detail: the handler catches StarletteHTTPException, not FastAPI’s HTTPException. StaticFiles comes from Starlette and raises Starlette’s exception class. FastAPI’s HTTPException is a subclass of it, so an except clause written against the FastAPI class never matches the 404 and the fallback never runs. This bug took some time to track down.
Same-Origin API
Because both the frontend and /api/* endpoints are served from the same origin on port 8000, there are no CORS issues in production. The frontend just uses relative paths like /api/translate and everything works. In development, the SvelteKit dev server proxies API calls to the backend.
Key Features
YouTube Transcript Fetching
The core workflow starts with pasting a YouTube URL. The backend extracts the video ID, then uses yt-dlp to download the available subtitles in VTT format. The raw VTT output is messy — it contains timing metadata, HTML color tags, and duplicate lines from overlapping subtitle segments. A cleaning pipeline strips all of this:
- Strip VTT headers and timestamps: remove WEBVTT, timing lines, and position markers
- Remove HTML tags: auto-generated subtitles contain <c> color tags
- Deduplicate lines: overlapping subtitle timings produce repeated text
- Paragraph merging: join single-sentence lines into readable paragraphs using punctuation-based heuristics
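The first three steps above can be sketched as a small function. This is a minimal illustration, not YTT's actual code: the name `clean_vtt`, the two regular expressions, and the choice to drop only consecutive duplicates are all assumptions.

```python
import re

# Cue timing line, e.g. "00:00:00.160 --> 00:00:02.389 align:start position:0%"
TIMING_LINE = re.compile(r"^\d{2}:\d{2}(?::\d{2})?\.\d{3}\s+-->\s+")
# Any inline tag, including <c> wrappers and inline word timestamps like <00:00:00.480>
TAG = re.compile(r"<[^>]+>")


def clean_vtt(raw: str) -> list[str]:
    """Return caption lines with headers, timings, tags, and repeats removed."""
    lines: list[str] = []
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and the header block
        if not line or line.startswith(("WEBVTT", "Kind:", "Language:", "NOTE")):
            continue
        # Skip timing lines (position markers sit on the same line)
        if TIMING_LINE.match(line):
            continue
        text = TAG.sub("", line).strip()
        if not text:
            continue
        # Overlapping cues repeat the previous line verbatim; keep one copy
        if lines and lines[-1] == text:
            continue
        lines.append(text)
    return lines
```

Only consecutive duplicates are dropped here, on the assumption that a phrase repeated later in the video is legitimate speech rather than a subtitle overlap. Numeric cue identifiers, which some VTT files include, are not handled.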
Automatic Translation
When the source language differs from the target language, YTT automatically translates the transcript via LibreTranslate. Large texts are split into chunks of roughly 5000 characters at paragraph boundaries, translated one chunk at a time, then joined back together. This works around LibreTranslate’s practical text length limits while preserving paragraph structure.
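A rough sketch of the chunking step, assuming paragraphs are separated by blank lines. The names `split_into_chunks` and `translate_long_text` are hypothetical, and `translate_chunk` stands in for whatever function sends one chunk to LibreTranslate.

```python
from typing import Callable


def split_into_chunks(text: str, max_chars: int = 5000) -> list[str]:
    """Group whole paragraphs into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for paragraph in text.split("\n\n"):
        # Account for the blank-line separator when the chunk already has content
        added = len(paragraph) + (2 if current else 0)
        if current and current_len + added > max_chars:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
            added = len(paragraph)
        current.append(paragraph)
        current_len += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def translate_long_text(
    text: str,
    translate_chunk: Callable[[str], str],
    max_chars: int = 5000,
) -> str:
    """Translate each chunk separately and rejoin with paragraph breaks."""
    chunks = split_into_chunks(text, max_chars)
    return "\n\n".join(translate_chunk(chunk) for chunk in chunks)
```

A single paragraph longer than `max_chars` is passed through as one oversized chunk in this sketch; splitting it further (for example at sentence boundaries) would need extra handling.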
Side-by-Side View with Paragraph Linking
The /view/[id] page displays the original and translated text in parallel scrollable panels. Hovering over a paragraph highlights the corresponding paragraph in the other panel, and scrolling one panel synchronizes the other. On desktop this uses percentage-based scroll synchronization; on mobile it switches to paragraph-index-based snapping.
Multiple Input Modes
Beyond YouTube URLs, YTT supports direct text input and file uploads (txt, srt, vtt, pdf, docx, json, yaml). All three modes are accessible via tabs on the main page.
Responsive Design
The app uses Carbon’s rail SideNav pattern — a 48px collapsed sidebar with icons that expands to 256px on hover. On mobile (< 1056px), it collapses entirely and is accessible via a hamburger menu. The history page switches from a full DataTable on desktop to a card-based layout on mobile.
Theme Support
Five Carbon themes are available (white, g10, g80, g90, g100), switchable from the settings page with live preview and persistence.
Frontend: SvelteKit 5 with Carbon Design System
Svelte 5 Runes
The entire frontend uses Svelte 5’s rune patterns ($state, $derived, $effect, $props). Here’s how the root layout manages navigation and theme state:
```svelte
<script lang="ts">
  import { Header, SideNav, SideNavItems, SideNavLink, Theme } from "carbon-components-svelte";
  import Language from "carbon-icons-svelte/lib/Language.svelte";
  import RecentlyViewed from "carbon-icons-svelte/lib/RecentlyViewed.svelte";
  import Settings from "carbon-icons-svelte/lib/Settings.svelte";
  import { loadSettings, settings } from '$lib/stores/settings.js';

  let { children } = $props();
  let isSideNavOpen = $state(false);
  let innerWidth = $state(0);
  let currentTheme = $state('white');

  // Sync theme with persisted settings
  $effect(() => {
    if ($settings.theme) {
      currentTheme = $settings.theme;
    }
  });

  // Auto-open rail navigation on desktop
  $effect(() => {
    if (innerWidth >= 1056) {
      isSideNavOpen = true;
    }
  });
</script>

<svelte:window bind:innerWidth />

<Theme bind:theme={currentTheme} persist persistKey="ytt-theme">
  <Header company="YTT" platformName="YouTube Transcript Translator" bind:isSideNavOpen />
  <SideNav bind:isOpen={isSideNavOpen} rail>
    <SideNavItems>
      <SideNavLink href="/" text="Translator" icon={Language} />
      <SideNavLink href="/history" text="History" icon={RecentlyViewed} />
      <SideNavLink href="/settings" text="Settings" icon={Settings} />
    </SideNavItems>
  </SideNav>
  <main class="main-content">
    {@render children()}
  </main>
</Theme>
```
SideNav Theme Fix
The SideNav doesn’t properly apply theme colors in dark modes out of the box. The fix requires manual :global() style overrides matching each theme:
```css
:global([theme="g90"]) :global(.bx--side-nav),
:global([theme="g100"]) :global(.bx--side-nav) {
  background-color: #262626;
  color: #f4f4f4;
}
```
The overall CSS strategy is to write as little custom CSS as possible, limiting it to app-specific features like fullscreen mode and paragraph linking, and let Carbon’s component library handle everything else.
Backend: FastAPI + yt-dlp
API Structure
The backend is organized into four API routers, all mounted under /api, plus a top-level /health endpoint:
| Endpoint | Method | Purpose |
|---|---|---|
| /api/youtube/fetch | POST | Fetch and translate a YouTube transcript |
| /api/youtube/info | POST | Get video metadata and available subtitles |
| /api/translate | POST | Translate text or uploaded files |
| /api/translate/detect | POST | Detect input language |
| /api/languages | GET | List supported translation languages |
| /api/history | GET | List saved translations (paginated) |
| /api/history/{id} | GET/DELETE | Get or delete a specific translation |
| /api/settings | GET/PUT | Read or update application settings |
| /api/settings/export | GET | Export settings as JSON |
| /api/settings/import | POST | Import settings from JSON |
| /health | GET | Health check with build version info |
No Database Required
History is stored in a simple JSON file (data/history.json). For a single-user self-hosted tool, this is more than sufficient and avoids the complexity of managing a database. Each translation entry includes the video metadata, original text, translated text, timestamps, and language pair.
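A JSON-file store of this kind can be very small. The sketch below is one possible shape, not YTT's implementation: `load_history` and `save_history` are hypothetical names, and the write-to-temp-then-rename step is an added assumption to avoid a half-written file if the process stops mid-save.

```python
import json
import os
import tempfile
from pathlib import Path


def load_history(path: Path) -> list[dict]:
    """Read all entries, treating a missing file as an empty history."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_history(path: Path, entries: list[dict]) -> None:
    """Write entries to a temp file in the same directory, then swap it in."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(entries, tmp, ensure_ascii=False, indent=2)
    # os.replace swaps the file in a single step on the same filesystem
    os.replace(tmp_name, path)
```

Every save rewrites the whole file, which is fine for a single user's history but is the main reason this approach stops scaling once entries number in the many thousands.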
Transcript Caching
If you fetch the same video with the same language pair, YTT returns the cached result from history instead of re-downloading and re-translating. This saves time and avoids unnecessary calls to LibreTranslate.
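The cache check amounts to a lookup keyed on the video and the language pair. A minimal sketch, assuming each history entry is a dict with `video_id`, `source_lang`, and `target_lang` fields (these field names and `find_cached` are illustrative, not taken from the project):

```python
from typing import Optional


def find_cached(
    entries: list[dict],
    video_id: str,
    source_lang: str,
    target_lang: str,
) -> Optional[dict]:
    """Return the first history entry matching the video and language pair."""
    wanted = (video_id, source_lang, target_lang)
    for entry in entries:
        key = (entry.get("video_id"), entry.get("source_lang"), entry.get("target_lang"))
        if key == wanted:
            return entry
    return None
```

On a hit the stored entry is returned as-is; on a miss the caller would fall through to the normal fetch-and-translate path and append the new result to history.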
Technical Challenges
SPA Routing with FastAPI
The initial SPAStaticFiles implementation caught fastapi.HTTPException to detect 404s. This silently failed: the 404 raised by Starlette’s StaticFiles is a starlette.exceptions.HTTPException, and FastAPI’s HTTPException is a subclass of that class, so the except clause never matched. Switching to StarletteHTTPException (imported from starlette.exceptions) fixed the issue, allowing both direct URL access and page refresh to work correctly on all routes.
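The likely mechanism can be reproduced without either framework. The sketch below uses two stand-in classes (hypothetical, not imported from Starlette or FastAPI) that mirror the relationship in which fastapi.HTTPException subclasses starlette.exceptions.HTTPException. An except clause naming the subclass does not catch an instance of the parent class.

```python
class StarletteHTTPException(Exception):
    """Stand-in for starlette.exceptions.HTTPException."""

    def __init__(self, status_code: int):
        super().__init__(status_code)
        self.status_code = status_code


class FastAPIHTTPException(StarletteHTTPException):
    """Stand-in for fastapi.HTTPException, a subclass of the Starlette class."""


def serve_with_fallback(caught_type: type) -> str:
    """Simulate a static file lookup that raises the parent-class 404."""
    try:
        raise StarletteHTTPException(404)
    except caught_type:
        # Fallback path: serve the SPA entry point
        return "index.html"
```

Calling `serve_with_fallback(StarletteHTTPException)` reaches the fallback, while `serve_with_fallback(FastAPIHTTPException)` lets the 404 propagate, which matches the behavior described above.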
VTT Subtitle Cleaning
YouTube’s auto-generated VTT subtitles have several quirks: overlapping timing windows produce duplicate lines, <c> tags wrap individual words for karaoke-style highlighting, and paragraph boundaries don’t exist in the raw output. The cleaning pipeline handles deduplication, HTML stripping, and heuristic paragraph detection (splitting on sentences ending with ., !, ?, :, or ;).
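One way to express the paragraph heuristic, assuming the cleaned transcript is already a list of caption lines. The function name and the `sentences_per_paragraph` threshold are hypothetical; the post does not say how many sentences YTT groups together.

```python
SENTENCE_END = (".", "!", "?", ":", ";")


def merge_paragraphs(lines: list[str], sentences_per_paragraph: int = 4) -> list[str]:
    """Join caption lines, closing a paragraph after N sentence-ending lines."""
    paragraphs: list[str] = []
    current: list[str] = []
    sentence_count = 0
    for line in lines:
        current.append(line)
        # A line ending in terminal punctuation counts as one finished sentence
        if line.endswith(SENTENCE_END):
            sentence_count += 1
            if sentence_count >= sentences_per_paragraph:
                paragraphs.append(" ".join(current))
                current, sentence_count = [], 0
    # Keep any trailing text that never reached the threshold
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Auto-generated captions often lack punctuation entirely, in which case this sketch returns a single long paragraph; a length-based fallback would be needed for those transcripts.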
What’s Next
Several features are planned for future development:
Timed Transcript Playback — Preserve VTT timestamps and add playback controls with synchronized paragraph highlighting and click-to-jump navigation. Export as SRT/VTT subtitle files.
LLM Integration — Add “Clean Text” and “Summarize” buttons powered by LLMs (Claude, OpenAI, Ollama). Remove filler words, generate summaries, and support multiple providers with cost estimation.
Enhanced Translation — Multiple translation providers (DeepL, OpenAI), quality scoring, custom glossaries, and translation memory.
Wrapping Up
YTT demonstrates that a practical self-hosted tool doesn’t need to be complex. A single Docker container running FastAPI with a SvelteKit SPA, communicating with LibreTranslate over a Docker network, covers the full workflow from YouTube URL to translated transcript. The single-container pattern keeps deployment simple — one image, one compose file, one deploy script.
The project is running in production at ytt.rappedoos.com and the source is available for anyone looking to set up their own instance or adapt the single-container SvelteKit + FastAPI pattern for their own projects.