YTT - Building a Self-Hosted YouTube Transcript Translator

Published on Feb 9, 2026

Most online transcript tools are riddled with ads, require sign-ups, or send your data to third-party servers. I wanted something simpler: paste a YouTube URL, get a translated transcript, and keep everything on my own hardware. That’s how YTT (YouTube Transcript Translator) was born — a self-hosted web application that fetches YouTube transcripts and translates them using LibreTranslate, a privacy-respecting open-source translation engine.

This post walks through the architecture, the single-container deployment strategy, the key technical decisions, and the challenges encountered along the way.

Architecture at a Glance

YTT follows an all-in-one container approach: a single Docker image serves both the SvelteKit frontend and the FastAPI backend. The translation work is offloaded to a separate LibreTranslate container on the same Docker network.

Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Frontend | SvelteKit 5 (Svelte 5 runes) | SPA with client-side routing |
| UI Library | Carbon Components Svelte v0.99.1 | IBM’s design system |
| Backend | FastAPI (Python 3.11) | REST API + static file serving |
| Transcripts | yt-dlp | YouTube subtitle extraction |
| Translation | LibreTranslate | Self-hosted machine translation |
| Data Storage | JSON file | History persistence (no database) |
| Deployment | Docker (multi-stage build) | Single container |

The Single Container Pattern

The most interesting architectural decision is serving both frontend and backend from a single container. The pattern mirrors a common production setup in which the API server also serves the frontend’s static assets.

Multi-Stage Dockerfile

The build process uses two stages. The first stage compiles the SvelteKit app into static HTML/JS/CSS. The second stage sets up the Python runtime and copies those static files in.

# Stage 1: Build frontend with Node.js
FROM node:20-alpine AS frontend-builder
WORKDIR /app/frontend
COPY frontend/package.json frontend/pnpm-lock.yaml ./
RUN npm install -g pnpm && pnpm install --frozen-lockfile
COPY frontend/ ./
RUN pnpm run build

# Stage 2: Python runtime with built frontend
FROM python:3.11-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY backend/ .

# Copy built frontend from stage 1
COPY --from=frontend-builder /app/frontend/build ./static

The result is a single image that includes everything: Python, FastAPI, yt-dlp, and the pre-built SvelteKit static files.

The SPAStaticFiles Router

The trickiest part of this pattern is handling client-side routing. When a user navigates to /history in the browser, the SvelteKit router handles it client-side. But if they refresh the page or access /history directly, the request hits the server — and there’s no history directory in the static files.

The solution is a custom static file handler that falls back to index.html for any path that looks like a route (no file extension):

from fastapi.staticfiles import StaticFiles
from starlette.exceptions import HTTPException as StarletteHTTPException

class SPAStaticFiles(StaticFiles):
    """Static file handler that falls back to index.html for SPA routes."""

    async def get_response(self, path: str, scope):
        try:
            return await super().get_response(path, scope)
        except StarletteHTTPException as ex:
            # Only extension-less paths (client-side routes) fall back to index.html;
            # a missing asset such as /app.js still returns a real 404.
            is_route = not path or "." not in path.split("/")[-1]
            if ex.status_code == 404 and is_route:
                return await super().get_response("index.html", scope)
            raise

# Mount as the last handler (catch-all); `app` is the FastAPI instance
app.mount("/", SPAStaticFiles(directory="static", html=True), name="spa")

Warning: a subtle but important detail is that this catches StarletteHTTPException, not FastAPI’s HTTPException. StaticFiles raises Starlette’s exception class, and FastAPI’s HTTPException is a subclass of it, so an except clause that names the FastAPI class never matches the 404 and the fallback never runs. This bug took some time to track down.
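
The mechanism can be reproduced without either framework. In the sketch below, BaseHTTPError and DerivedHTTPError are hypothetical stand-ins for Starlette’s HTTPException and FastAPI’s subclass of it. A handler that names the subclass never sees an exception raised as the base class, which is what happens when StaticFiles reports a missing file.

```python
class BaseHTTPError(Exception):
    """Stand-in for starlette.exceptions.HTTPException (hypothetical name)."""

    def __init__(self, status_code: int):
        super().__init__(status_code)
        self.status_code = status_code


class DerivedHTTPError(BaseHTTPError):
    """Stand-in for fastapi.HTTPException, which subclasses the Starlette class."""


def serve_missing_file() -> None:
    # The static file handler raises the *base* class for a missing file.
    raise BaseHTTPError(404)


def fallback_with(handler_type: type) -> str:
    """Report whether an except clause naming handler_type catches the 404."""
    try:
        try:
            serve_missing_file()
        except handler_type:
            return "index.html"
    except BaseHTTPError:
        # The inner clause did not match, so the 404 escaped.
        return "unhandled 404"
    return "file"


print(fallback_with(DerivedHTTPError))  # unhandled 404
print(fallback_with(BaseHTTPError))     # index.html
```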

Same-Origin API

Because both the frontend and /api/* endpoints are served from the same origin on port 8000, there are no CORS issues in production. The frontend just uses relative paths like /api/translate and everything works. In development, the SvelteKit dev server proxies API calls to the backend.
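
To see why relative paths are enough, the sketch below resolves a root-relative API path against a few page URLs, the same way a browser would. The page URLs are made-up examples; only port 8000 and /api/translate come from the setup described here.

```python
from urllib.parse import urljoin

# Hypothetical page URLs served by the single container on port 8000.
PAGE_URLS = [
    "http://localhost:8000/",
    "http://localhost:8000/history",
    "http://localhost:8000/view/abc123",
]


def resolve_api(page_url: str, api_path: str = "/api/translate") -> str:
    """Resolve a root-relative API path against the current page URL."""
    return urljoin(page_url, api_path)


for page in PAGE_URLS:
    # Every page resolves to the same origin, so no CORS preflight is involved.
    print(page, "->", resolve_api(page))
```

Whatever client-side route the user is on, the request goes to the same scheme, host, and port that served the page.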


Key Features

YouTube Transcript Fetching

The core workflow starts with pasting a YouTube URL. The backend extracts the video ID, then uses yt-dlp to download the available subtitles in VTT format. The raw VTT output is messy — it contains timing metadata, HTML color tags, and duplicate lines from overlapping subtitle segments. A cleaning pipeline strips all of this:

  1. Strip VTT headers and timestamps — remove WEBVTT, timing lines, and position markers
  2. Remove HTML tags — auto-generated subtitles contain <c> color tags
  3. Deduplicate lines — overlapping subtitle timings produce repeated text
  4. Paragraph merging — join single-sentence lines into readable paragraphs using punctuation-based heuristics
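
The first three steps can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the project’s actual code: the function name clean_vtt, the regular expressions, and the sample input are all hypothetical, and deduplication here only drops consecutive repeats.

```python
import re

# Cue timing lines such as "00:00:01.000 --> 00:00:03.000 align:start".
TIMING_LINE = re.compile(r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")
# Any angle-bracket tag, including <c>...</c> and inline <00:00:01.500> stamps.
TAG = re.compile(r"<[^>]+>")
HEADER_PREFIXES = ("WEBVTT", "Kind:", "Language:", "NOTE")


def clean_vtt(vtt: str) -> list[str]:
    """Return de-duplicated text lines from raw VTT content."""
    lines: list[str] = []
    for raw in vtt.splitlines():
        line = raw.strip()
        if not line or line.startswith(HEADER_PREFIXES):
            continue  # blank lines and file headers
        if TIMING_LINE.match(line):
            continue  # timestamps and position markers
        text = TAG.sub("", line).strip()
        if not text or (lines and lines[-1] == text):
            continue  # empty after stripping, or a repeat of the previous line
        lines.append(text)
    return lines


SAMPLE = """WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:02.000 align:start position:0%
hello<00:00:00.500><c> world</c>

00:00:02.000 --> 00:00:02.010 align:start position:0%
hello world

00:00:02.010 --> 00:00:04.000 align:start position:0%
hello world
this<00:00:02.500><c> is</c><00:00:03.000><c> a</c><00:00:03.500><c> test</c>
"""

print(clean_vtt(SAMPLE))  # ['hello world', 'this is a test']
```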

Automatic Translation

When the source language differs from the target language, YTT automatically translates the transcript via LibreTranslate. Large texts are split at paragraph boundaries into chunks of roughly 5,000 characters, translated individually, then joined back together. This stays within LibreTranslate’s practical text length limits while preserving paragraph structure.
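
A minimal sketch of that chunking step follows. It assumes paragraphs are separated by blank lines and takes the translation call as a plain function argument; chunk_paragraphs and translate_long_text are hypothetical names, not the project’s API. A single paragraph longer than the limit is kept whole rather than split mid-sentence.

```python
from typing import Callable


def chunk_paragraphs(text: str, max_chars: int = 5000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in text.split("\n\n"):
        # The separator costs two characters once the chunk is non-empty.
        added = len(para) + (2 if current else 0)
        if current and size + added > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
            added = len(para)
        current.append(para)
        size += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def translate_long_text(
    text: str, translate: Callable[[str], str], max_chars: int = 5000
) -> str:
    """Translate chunk by chunk and rejoin with the original paragraph breaks."""
    return "\n\n".join(translate(chunk) for chunk in chunk_paragraphs(text, max_chars))


demo = "\n\n".join(["a" * 10] * 5)
print([len(c) for c in chunk_paragraphs(demo, max_chars=25)])  # [22, 22, 10]
```

Because chunks are cut only at paragraph breaks, joining the translated chunks with the same separator restores the original paragraph count.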

Side-by-Side View with Paragraph Linking

The /view/[id] page displays the original and translated text in parallel scrollable panels. Hovering over a paragraph highlights the corresponding paragraph in the other panel, and scrolling one panel synchronizes the other. On desktop this uses percentage-based scroll synchronization; on mobile it switches to paragraph-index-based snapping.
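
The desktop mode comes down to a small piece of arithmetic, shown here in Python for consistency with the backend examples, although the real logic runs in the browser. The function name is hypothetical and the parameters mirror the DOM properties scrollTop, scrollHeight, and clientHeight.

```python
def synced_scroll_top(
    src_top: float,
    src_scroll_height: float,
    src_client_height: float,
    dst_scroll_height: float,
    dst_client_height: float,
) -> float:
    """Map the source panel's scroll position to the other panel by percentage."""
    src_range = src_scroll_height - src_client_height
    dst_range = dst_scroll_height - dst_client_height
    if src_range <= 0 or dst_range <= 0:
        return 0.0  # one of the panels cannot scroll
    ratio = min(max(src_top / src_range, 0.0), 1.0)
    return ratio * dst_range


# Source panel is halfway down; the longer target panel lands halfway too.
print(synced_scroll_top(500, 2000, 1000, 4000, 1000))  # 1500.0
```

Percentage mapping drifts when a translated paragraph is much longer or shorter than its original, which is one plausible reason to snap by paragraph index on small screens.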

Multiple Input Modes

Beyond YouTube URLs, YTT supports direct text input and file uploads (txt, srt, vtt, pdf, docx, json, yaml). All three modes are accessible via tabs on the main page.

Responsive Design

The app uses Carbon’s rail SideNav pattern — a 48px collapsed sidebar with icons that expands to 256px on hover. On mobile (< 1056px), it collapses entirely and is accessible via a hamburger menu. The history page switches from a full DataTable on desktop to a card-based layout on mobile.

Theme Support

Five Carbon themes are available (white, g10, g80, g90, g100), switchable from the settings page with live preview and persistence.


Frontend: SvelteKit 5 with Carbon Design System

Svelte 5 Runes

The entire frontend uses Svelte 5’s rune patterns ($state, $derived, $effect, $props). Here’s how the root layout manages navigation and theme state:

<script lang="ts">
  import { Header, SideNav, SideNavItems, SideNavLink, Theme } from "carbon-components-svelte";
  import Language from "carbon-icons-svelte/lib/Language.svelte";
  import RecentlyViewed from "carbon-icons-svelte/lib/RecentlyViewed.svelte";
  import Settings from "carbon-icons-svelte/lib/Settings.svelte";
  import { loadSettings, settings } from '$lib/stores/settings.js';

  let { children } = $props();
  let isSideNavOpen = $state(false);
  let innerWidth = $state(0);
  let currentTheme = $state('white');

  // Sync theme with persisted settings
  $effect(() => {
    if ($settings.theme) {
      currentTheme = $settings.theme;
    }
  });

  // Auto-open rail navigation on desktop
  $effect(() => {
    if (innerWidth >= 1056) {
      isSideNavOpen = true;
    }
  });
</script>

<svelte:window bind:innerWidth />

<Theme bind:theme={currentTheme} persist persistKey="ytt-theme">
  <Header company="YTT" platformName="YouTube Transcript Translator" bind:isSideNavOpen />

  <SideNav bind:isOpen={isSideNavOpen} rail>
    <SideNavItems>
      <SideNavLink href="/" text="Translator" icon={Language} />
      <SideNavLink href="/history" text="History" icon={RecentlyViewed} />
      <SideNavLink href="/settings" text="Settings" icon={Settings} />
    </SideNavItems>
  </SideNav>

  <main class="main-content">
    {@render children()}
  </main>
</Theme>

SideNav Theme Fix

The SideNav doesn’t properly apply theme colors in dark modes out of the box. The fix requires manual :global() style overrides matching each theme:

:global([theme="g90"]) :global(.bx--side-nav),
:global([theme="g100"]) :global(.bx--side-nav) {
  background-color: #262626;
  color: #f4f4f4;
}

The overall CSS strategy is to write as little custom CSS as possible — only for app-specific features like fullscreen mode and paragraph linking — and let Carbon’s component library handle everything else.


Backend: FastAPI + yt-dlp

API Structure

The backend is organized into four API routers mounted under /api, plus a /health endpoint outside that prefix:

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/youtube/fetch | POST | Fetch and translate a YouTube transcript |
| /api/youtube/info | POST | Get video metadata and available subtitles |
| /api/translate | POST | Translate text or uploaded files |
| /api/translate/detect | POST | Detect input language |
| /api/languages | GET | List supported translation languages |
| /api/history | GET | List saved translations (paginated) |
| /api/history/{id} | GET/DELETE | Get or delete a specific translation |
| /api/settings | GET/PUT | Read or update application settings |
| /api/settings/export | GET | Export settings as JSON |
| /api/settings/import | POST | Import settings from JSON |
| /health | GET | Health check with build version info |

No Database Required

History is stored in a simple JSON file (data/history.json). For a single-user self-hosted tool, this is more than sufficient and avoids the complexity of managing a database. Each translation entry includes the video metadata, original text, translated text, timestamps, and language pair.
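
A file-backed store of this kind fits in a few functions. The sketch below is an assumption about how it could look, not the project’s implementation: the function names and entry fields are hypothetical, and writing through a temporary file followed by os.replace is an added safeguard so an interrupted write does not leave a truncated history file.

```python
import json
import os
import tempfile
from pathlib import Path


def load_history(path: Path) -> list[dict]:
    """Return all saved entries, or an empty list if the file does not exist."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def save_history(path: Path, entries: list[dict]) -> None:
    """Write entries to a temp file, then swap it into place in one step."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(entries, fh, ensure_ascii=False, indent=2)
    os.replace(tmp_name, path)


def add_entry(path: Path, entry: dict) -> list[dict]:
    """Append one translation entry and persist the full list."""
    entries = load_history(path)
    entries.append(entry)
    save_history(path, entries)
    return entries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        history_file = Path(tmp) / "data" / "history.json"
        # Field names below are illustrative only.
        add_entry(history_file, {"video_id": "abc123", "source_lang": "en",
                                 "target_lang": "nl", "original": "hello",
                                 "translated": "hallo"})
        print(len(load_history(history_file)))  # 1
```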

Transcript Caching

If you fetch the same video with the same language pair, YTT returns the cached result from history instead of re-downloading and re-translating. This saves time and avoids unnecessary calls to LibreTranslate.
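
The lookup can be as simple as a scan over the history list. In this sketch the cache key is the video ID plus the language pair; the field names video_id, source_lang, and target_lang are assumptions for illustration, as is the idea that newer entries sit at the end of the list.

```python
from typing import Optional


def find_cached(
    history: list[dict], video_id: str, source_lang: str, target_lang: str
) -> Optional[dict]:
    """Return the most recent entry matching the video and language pair."""
    wanted = (video_id, source_lang, target_lang)
    for entry in reversed(history):  # newest first, assuming append order
        key = (entry.get("video_id"), entry.get("source_lang"), entry.get("target_lang"))
        if key == wanted:
            return entry
    return None


history = [
    {"video_id": "abc123", "source_lang": "en", "target_lang": "nl", "translated": "hallo"},
    {"video_id": "abc123", "source_lang": "en", "target_lang": "de", "translated": "hallo (de)"},
]

hit = find_cached(history, "abc123", "en", "nl")
miss = find_cached(history, "abc123", "en", "fr")
print(hit["translated"] if hit else None)  # hallo
print(miss)                                # None
```

A linear scan is fine for a single-user history file; a different target language for the same video counts as a miss and triggers a fresh translation.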


Technical Challenges

SPA Routing with FastAPI

The initial SPAStaticFiles implementation caught FastAPI’s HTTPException to detect 404s. This silently failed: StaticFiles raises Starlette’s HTTPException, and FastAPI’s class is a subclass of it, so the except clause never matched. Switching to StarletteHTTPException (imported from starlette.exceptions) fixed the issue, and both direct URL access and page refresh now work on every route.

VTT Subtitle Cleaning

YouTube’s auto-generated VTT subtitles have several quirks: overlapping timing windows produce duplicate lines, <c> tags wrap individual words for karaoke-style highlighting, and paragraph boundaries don’t exist in the raw output. The cleaning pipeline handles deduplication, HTML stripping, and heuristic paragraph detection (splitting on sentences ending with ., !, ?, :, or ;).
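
One way to express the paragraph heuristic is to count sentence-ending lines and close a paragraph after a fixed number of them. This is a sketch under assumptions: merge_paragraphs and the sentences_per_paragraph threshold are hypothetical, and only the set of terminating characters comes from the description above.

```python
SENTENCE_END = (".", "!", "?", ":", ";")


def merge_paragraphs(lines: list[str], sentences_per_paragraph: int = 3) -> list[str]:
    """Join cleaned subtitle lines into paragraphs at sentence boundaries."""
    paragraphs: list[str] = []
    current: list[str] = []
    sentences = 0
    for line in lines:
        current.append(line)
        if line.endswith(SENTENCE_END):
            sentences += 1
            if sentences >= sentences_per_paragraph:
                paragraphs.append(" ".join(current))
                current, sentences = [], 0
    if current:
        # Trailing text without enough sentence endings still forms a paragraph.
        paragraphs.append(" ".join(current))
    return paragraphs


lines = ["Hello there.", "This is", "a test!", "Is it working?", "Yes:", "mostly"]
for paragraph in merge_paragraphs(lines):
    print(paragraph)
# Hello there. This is a test! Is it working?
# Yes: mostly
```

Auto-generated captions often lack punctuation entirely, in which case a heuristic like this produces one long paragraph and a length-based fallback would be needed.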


What’s Next

Several features are planned for future development:

  1. Timed Transcript Playback — Preserve VTT timestamps and add playback controls with synchronized paragraph highlighting and click-to-jump navigation. Export as SRT/VTT subtitle files.

  2. LLM Integration — Add “Clean Text” and “Summarize” buttons powered by LLMs (Claude, OpenAI, Ollama). Remove filler words, generate summaries, and support multiple providers with cost estimation.

  3. Enhanced Translation — Multiple translation providers (DeepL, OpenAI), quality scoring, custom glossaries, and translation memory.


Wrapping Up

YTT demonstrates that a practical self-hosted tool doesn’t need to be complex. A single Docker container running FastAPI with a SvelteKit SPA, communicating with LibreTranslate over a Docker network, covers the full workflow from YouTube URL to translated transcript. The single-container pattern keeps deployment simple — one image, one compose file, one deploy script.

The project is running in production at ytt.rappedoos.com and the source is available for anyone looking to set up their own instance or adapt the single-container SvelteKit + FastAPI pattern for their own projects.