docs: add Colab notebook for full-talk transcription (notebooks/colab_full_transcribe.ipynb)

GPU (T4) cells: ffmpeg + uv → anonymous clone → uv sync (engine + gpu) → detect → audio upload → full transcription with large-v3-turbo → transcript.txt download. (Colab cannot reach the internal gateway, so it is transcription-only; correction runs on-prem.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
chore(omc): hotpaths (beam-size/correct/COLAB)
2026-06-09 07:33:54 +09:00 · 2026-06-09 07:29:37 +09:00 · 2026-06-09 07:29:37 +09:00 · 2026-06-09 07:09:51 +09:00 · 2026-06-09 07:09:51 +09:00 · 2026-06-08 23:20:01 +09:00
44 changed files with 5983 additions and 13 deletions
@@ -0,0 +1,24 @@
+# luke_scribe example settings. Copy with: cp .env.example .env  (env prefix: SCRIBE_)
+
+# Models (hybrid by default; may be unified to turbo only, depending on the P1 bench results)
+SCRIBE_MODEL_REALTIME=large-v3-turbo
+SCRIBE_MODEL_BATCH=large-v3
+
+# Device: auto|cpu|cuda|cuda:0. Chosen automatically; can be forced
+SCRIBE_DEVICE=auto
+# SCRIBE_COMPUTE_TYPE=int8        # leave unset to derive from compute capability/VRAM
+# SCRIBE_WORKERS=1                # leave unset to derive automatically
+
+SCRIBE_LANGUAGE=ko
+
+# Hard input limits (exceeding either returns 413)
+SCRIBE_MAX_DURATION_S=14400       # 4h
+SCRIBE_MAX_SIZE_BYTES=2147483648  # 2GB
+
+# Retention (P2+)
+SCRIBE_RETENTION_DAYS=7
+# SCRIBE_REDIS_URL=redis://localhost:6379/0
+# SCRIBE_API_KEYS=["key1","key2"]
+
+# Tunnel (P5): none|cloudflare|ngrok
+SCRIBE_TUNNEL=none
@@ -21,8 +21,12 @@ venv/
 # Models / data / scratch
 *.log
 models/
-samples/*.wav
-samples/*.mp4
+samples/**/*.wav
+samples/**/*.flac
+samples/**/*.mp3
+samples/**/*.m4a
+samples/**/*.mp4
+samples/**/*.mov

 # ─── OS / editor ──────────────────────────────────────────
 .DS_Store
@@ -1,21 +1,37 @@
 {
  "version": "1.0.0",
-  "lastScanned": 1780794206309,
+  "lastScanned": 1780919472386,
  "projectRoot": "/root/luke_scribe",
  "techStack": {
    "languages": [
-      "Python"
+      {
+        "name": "Python",
+        "version": null,
+        "confidence": "high",
+        "markers": [
+          "pyproject.toml"
+        ]
+      }
    ],
    "frameworks": [
-      "FastAPI · faster-whisper/CTranslate2 · Redis/RQ(no-fork) · pydantic v2 · ffmpeg · Silero VAD"
+      {
+        "name": "fastapi",
+        "version": null,
+        "category": "backend"
+      },
+      {
+        "name": "pytest",
+        "version": null,
+        "category": "testing"
+      }
    ],
-    "packageManager": "uv",
-    "runtime": "Python 3.11+"
+    "packageManager": null,
+    "runtime": null
  },
  "build": {
    "buildCommand": null,
-    "testCommand": null,
-    "lintCommand": null,
+    "testCommand": "export PATH=\"$HOME/.local/bin:$HOME/.cargo/bin:$PATH\"\necho \"=== ruff ===\"; uv run ruff check src/ tests/ && echo clean\necho \"=== pytest ===\"; uv run pytest -q 2>&1 | tail -2\necho \"=== --correct path (no config → graceful error) ===\"\nuv run luke-scribe transcribe /tmp/jfk.flac --model tiny --language en --correct 2>&1 | tail -4; echo \"exit=${PIPESTATUS[0]}\"",
+    "lintCommand": "ruff check",
    "devCommand": null,
    "scripts": {}
  },
@@ -29,9 +45,10 @@
    "isMonorepo": false,
    "workspaces": [],
    "mainDirectories": [
-      "src/luke_scribe (계획, 미생성)"
+      "src",
+      "tests"
    ],
-    "gitBranches": "main"
+    "gitBranches": null
  },
  "customNotes": [
    {
@@ -51,10 +68,284 @@
      "source": "manual",
      "category": "env",
      "content": "git 원격=자체호스팅 Gitea https://git.lukehemmin.com (openresty, HTTPS/443 전용, SSH 미노출). 인증=PAT를 ~/.git-credentials에 저장(global helper store, username lukehemmin) — 검증완료, VS Code askpass 없이 push 됨. ⚠️ 저장소 익명 읽기 허용 상태(내부/비공개 의도면 Gitea에서 Private 점검)."
+    },
+    {
+      "timestamp": 1780812476362,
+      "source": "manual",
+      "category": "status",
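+      "content": "P1 progress (2026-06-07): ✅ detect (capability tiers T0~T3, 1050 → explicit downgrade to T0_CPU) · ✅ transcribe (faster-whisper verified on CPU: JFK 11 s clip transcribed accurately, model_used printed) · 10 unit tests pass. Code now exists (no longer 0%). Remaining: word-timestamp/format output options and optional Silero VAD, a measured VRAM probe (to replace the static estimate), bench (needs a labeled KO+EN sample set), Colab validation of the upper tiers (T2/T3), P2 (API + Redis/RQ). Branch feat/p1-core."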
+    },
+    {
+      "timestamp": 1780926195887,
+      "source": "manual",
+      "category": "finding",
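+      "content": "Verified finding (2026-06-07): the open-vocabulary fix for transliterated terms in mixed KO+EN speech is text post-correction by the internal GPT-4o. It restores English terms that faster-whisper (turbo) mangled into Korean transliteration, using context and knowledge, without registering hotwords. Evidence (90-second slice of the EmbeddingGemma talk): 인베딩 점마→Embedding Gemma, 재미나이→Gemini, 점마→Gemma, 랭기징→Language, 구글 포 디벨로퍼스→Google for Developers (5/5; ordinary Korean preserved). Gateway = OpenAI-compatible (baseURL http://192.168.0.123:8080/v1, model copilot-gpt-4o, API key required and not stored in memory; localhost:8080 is a tunnel on the user's machine, so it is unreachable from the sandbox) → the call stays in-house, so external egress is 0 (privacy OK). Implication: hotwords only catch registered terms and are insufficient; LLM contextual correction also covers 'unknown terms'. Caveats: (1) 'Embedding Gemma' comes back with a space (official: EmbeddingGemma) → rules/glossary normalization is needed alongside, (2) only terms the LLM knows or can infer are covered; brand-new coinages get a confidence flag → human review, (3) there is only one sample, so over-correction needs further verification, (4) the gateway path is unstable (401→timeout→reset) → retries are needed (reflected in the script). A small context window is worked around with chunking plus a running glossary. PoC = scripts/llm_correct.py → to be promoted to postprocess/llm.py (confidence-gated, chunked, backend=internal, audit log) plus a transcribe --correct flag."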
+    }
+  ],
+  "directoryMap": {
+    "samples": {
+      "path": "samples",
+      "purpose": null,
+      "fileCount": 1,
+      "lastAccessed": 1780919472362,
+      "keyFiles": [
+        "README.md"
+      ]
+    },
+    "src": {
+      "path": "src",
+      "purpose": "Source code",
+      "fileCount": 0,
+      "lastAccessed": 1780919472371,
+      "keyFiles": []
+    },
+    "tests": {
+      "path": "tests",
+      "purpose": "Test files",
+      "fileCount": 2,
+      "lastAccessed": 1780919472373,
+      "keyFiles": [
+        "test_device_manager.py",
+        "test_engine_audio.py"
+      ]
+    }
+  },
+  "hotPaths": [
+    {
+      "path": "src/luke_scribe/cli.py",
+      "accessCount": 8,
+      "lastAccessed": 1780957705972,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/config.py",
+      "accessCount": 5,
+      "lastAccessed": 1780957473801,
+      "type": "file"
+    },
+    {
+      "path": "scripts/llm_correct.py",
+      "accessCount": 4,
+      "lastAccessed": 1780925584647,
+      "type": "file"
+    },
+    {
+      "path": "pyproject.toml",
+      "accessCount": 4,
+      "lastAccessed": 1780928043613,
+      "type": "file"
+    },
+    {
+      "path": "README.md",
+      "accessCount": 3,
+      "lastAccessed": 1780812417055,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/postprocess/llm.py",
+      "accessCount": 3,
+      "lastAccessed": 1780956524689,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/api/routes/transcribe.py",
+      "accessCount": 3,
+      "lastAccessed": 1780956549345,
+      "type": "file"
+    },
+    {
+      "path": "tests/test_postprocess.py",
+      "accessCount": 2,
+      "lastAccessed": 1780956556589,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/__init__.py",
+      "accessCount": 1,
+      "lastAccessed": 1780804261889,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/devices/__init__.py",
+      "accessCount": 1,
+      "lastAccessed": 1780804263611,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/devices/profile.py",
+      "accessCount": 1,
+      "lastAccessed": 1780804266795,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/devices/vram_probe.py",
+      "accessCount": 1,
+      "lastAccessed": 1780804273484,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/devices/manager.py",
+      "accessCount": 1,
+      "lastAccessed": 1780804300531,
+      "type": "file"
+    },
+    {
+      "path": "run.sh",
+      "accessCount": 1,
+      "lastAccessed": 1780804312249,
+      "type": "file"
+    },
+    {
+      "path": ".env.example",
+      "accessCount": 1,
+      "lastAccessed": 1780804316978,
+      "type": "file"
+    },
+    {
+      "path": "tests/test_device_manager.py",
+      "accessCount": 1,
+      "lastAccessed": 1780804449331,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/engine/__init__.py",
+      "accessCount": 1,
+      "lastAccessed": 1780812252757,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/engine/model_registry.py",
+      "accessCount": 1,
+      "lastAccessed": 1780812254912,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/engine/faster_whisper_engine.py",
+      "accessCount": 1,
+      "lastAccessed": 1780812261152,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/audio/__init__.py",
+      "accessCount": 1,
+      "lastAccessed": 1780812262920,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/audio/ingest.py",
+      "accessCount": 1,
+      "lastAccessed": 1780812299865,
+      "type": "file"
+    },
+    {
+      "path": "tests/test_engine_audio.py",
+      "accessCount": 1,
+      "lastAccessed": 1780812413312,
+      "type": "file"
+    },
+    {
+      "path": "samples/README.md",
+      "accessCount": 1,
+      "lastAccessed": 1780812722445,
+      "type": "file"
+    },
+    {
+      "path": "samples/ko_en/manifest.jsonl.example",
+      "accessCount": 1,
+      "lastAccessed": 1780812854083,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/results/__init__.py",
+      "accessCount": 1,
+      "lastAccessed": 1780927886298,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/results/formats.py",
+      "accessCount": 1,
+      "lastAccessed": 1780927892282,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/postprocess/__init__.py",
+      "accessCount": 1,
+      "lastAccessed": 1780927894092,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/postprocess/rules.py",
+      "accessCount": 1,
+      "lastAccessed": 1780927897308,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/api/__init__.py",
+      "accessCount": 1,
+      "lastAccessed": 1780927952439,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/api/schemas.py",
+      "accessCount": 1,
+      "lastAccessed": 1780927953308,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/api/engine_pool.py",
+      "accessCount": 1,
+      "lastAccessed": 1780927954191,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/api/deps.py",
+      "accessCount": 1,
+      "lastAccessed": 1780927955218,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/api/app.py",
+      "accessCount": 1,
+      "lastAccessed": 1780927956175,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/api/routes/__init__.py",
+      "accessCount": 1,
+      "lastAccessed": 1780927957095,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/connectivity/__init__.py",
+      "accessCount": 1,
+      "lastAccessed": 1780927962648,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/connectivity/tunnel.py",
+      "accessCount": 1,
+      "lastAccessed": 1780927971385,
+      "type": "file"
+    },
+    {
+      "path": "tests/test_formats.py",
+      "accessCount": 1,
+      "lastAccessed": 1780928016400,
+      "type": "file"
+    },
+    {
+      "path": "tests/test_api.py",
+      "accessCount": 1,
+      "lastAccessed": 1780928028187,
+      "type": "file"
+    },
+    {
+      "path": "COLAB.md",
+      "accessCount": 1,
+      "lastAccessed": 1780957731994,
+      "type": "file"
    }
  ],
-  "directoryMap": {},
-  "hotPaths": [],
  "userDirectives": [
    {
      "timestamp": 1780801958149,
@@ -0,0 +1,79 @@
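+# Colab / GPU full-transcription guide
+
+Transcribe a **full talk quickly** (with optional correction) in a GPU environment (Colab T4/A100 or an on-prem GPU).
+A CPU (dev box) is too slow for a full talk (turbo at roughly RTF 5×) and is not recommended; run it here instead.
+On a GPU (T4), turbo runs at roughly 0.1~0.3× real time → **a 37-minute talk takes a few minutes** (about 4 to 11 minutes at those rates).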
+
+---
+
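+## A) Google Colab: transcription only
+
+> Colab is an external cloud, so it **cannot reach the internal LLM gateway (192.168.0.123)** → `--correct` (correction) is unavailable; **transcription only**.
+> Runtime → Change runtime type → select **GPU (T4)**.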
+
+```python
+# 1) System dependencies + uv
+!apt-get -qq update && apt-get -qq install -y ffmpeg
+!curl -LsSf https://astral.sh/uv/install.sh | sh
+import os; os.environ["PATH"] = "/root/.local/bin:" + os.environ["PATH"]
+
+# 2) Code (the repository allows anonymous reads)
+!git clone -b feat/p1-core https://git.lukehemmin.com/lukehemmin/luke_scribe.git
+%cd luke_scribe
+
+# 3) Dependencies (engine + GPU CUDA runtime)
+!uv sync --extra engine --extra gpu
+
+# 4) Check that the GPU is detected (T3 keeps turbo and large-v3 resident together)
+!uv run luke-scribe detect
+
+# 5) Upload the audio (or mount Drive)
+from google.colab import files
+AUDIO = list(files.upload().keys())[0]
+
+# 6) Full transcription (large-v3-turbo); use --model large-v3 for higher accuracy
+!uv run luke-scribe transcribe "$AUDIO" --model large-v3-turbo --language ko --timestamps | tee transcript.txt
+```
+
+### Exposing Colab as an API
+```python
+# Get a public cloudflared URL → curl it from outside
+!uv sync --extra engine --extra gpu --extra api
+import os
+os.environ["SCRIBE_API_KEYS"] = '["colab-test"]'
+!nohup uv run luke-scribe serve --host 0.0.0.0 --port 8000 --tunnel cloudflare > serve.log 2>&1 &
+import time; time.sleep(8); print(open("serve.log").read())   # look for the public *.trycloudflare.com URL
+```
+
+---
+
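+## B) On-prem GPU: transcription + internal LLM correction (full pipeline)
+
+On a GPU machine inside the internal network (able to reach the gateway at 192.168.0.123), you can go **all the way to restoring transliterated terms to English** in one pass: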
+
+```bash
+git clone -b feat/p1-core https://git.lukehemmin.com/lukehemmin/luke_scribe.git && cd luke_scribe
+uv sync --extra engine --extra gpu
+
+export SCRIBE_LLM_BASE_URL=http://192.168.0.123:8080/v1
+export SCRIBE_LLM_API_KEY=<internal key>     # mind your shell history
+export SCRIBE_LLM_MODEL=copilot-gpt-4o
+export SCRIBE_LLM_MAX_CHARS=3000             # match the internal LLM's context window (~8k→1500/~16k→3000/~30k→6000)
+
+# Transcription + chunked correction in a single command
+uv run luke-scribe transcribe talk.m4a --model large-v3-turbo --language ko --correct | tee transcript.txt
+```
+
+Via the API:
+```bash
+uv run luke-scribe serve                     # use the X-API-Key it prints
+curl -H "X-API-Key: <key>" -F file=@talk.m4a -F model=large-v3-turbo -F correct=true \
+     http://localhost:8000/v1/transcribe
+```
+
+---
+
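+## Notes
+- Correction splits a long transcript into `SCRIBE_LLM_MAX_CHARS`-sized chunks and carries a **running glossary** across them (to cope with a small context window).
+- A weak GPU (1050/2GB) cannot fit even turbo, so it is automatically downgraded to **CPU (T0)**; check the tier with `detect`.
+- Audio files are not in the repository (`.gitignore`); use a Colab upload/Drive or a local path on-prem.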
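The chunking mentioned in the notes can be sketched as follows. This is only an illustration under assumptions: `split_chunks` is a hypothetical helper, the sentence-boundary regex is a guess at a reasonable split rule, and the running glossary is not modeled. The actual `postprocess/llm.py` is not shown in this diff and may work differently.

```python
import re


def split_chunks(text: str, max_chars: int = 3000) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?。])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single over-long sentence is hard-split so no chunk exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_chunks("가나다. 라마바. 사아자.", max_chars=9):
        print(repr(chunk))
```

Keeping sentences whole gives the LLM complete context per request, which matters when the correction depends on surrounding words.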
@@ -0,0 +1,26 @@
+# luke_scribe
+
+Internal **local STT transcription API**, built on faster-whisper (CTranslate2) and hardware-adaptive.
+A single `Job` abstraction handles both batch (file/video) and real-time (WebSocket) work.
+
+> Design single source of truth (SoT): [`.omc/plans/consensus-luke-scribe-stt-api.md`](.omc/plans/consensus-luke-scribe-stt-api.md),
+> [`.omc/specs/deep-interview-luke-scribe-stt-api.md`](.omc/specs/deep-interview-luke-scribe-stt-api.md)
+
+## Status
+- Design complete (ambiguity ~5%) · P1 implementation in progress (greenfield).
+
+## Quick start (development)
+```bash
+uv sync                                            # core dependencies
+uv run luke-scribe detect                          # hardware detection → capability tier/precision/worker count
+uv sync --extra engine                             # engine (faster-whisper)
+uv run luke-scribe transcribe FILE --model tiny    # one-shot transcription
+```
+
+## CLI
+| Command | Description | Status |
+|------|------|------|
+| `detect` | Hardware detection, capability tier (T0~T3), precision, worker count | ✅ P1 |
+| `transcribe <file>` | One-shot file transcription (faster-whisper, CPU/GPU) | ✅ P1 |
+| `bench` | turbo vs large-v3 domain benchmark (gate) | ⏳ P1 (needs a sample set) |
+| `serve` | API server | ⏳ P2 |
@@ -0,0 +1,130 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
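+    "# luke_scribe: full-talk transcription on Colab\n",
+    "\n",
+    "Transcribes a full talk in **a few minutes** on a GPU (T4).\n",
+    "\n",
+    "**First:** Runtime → Change runtime type → set the hardware accelerator to **GPU**.\n",
+    "\n",
+    "> ⚠️ Colab is external, so it **cannot reach the internal LLM gateway (192.168.0.123)** → correction (`--correct`) is unavailable; **transcription only**. For correction, run on a GPU inside the internal network (section B of `COLAB.md` in the repo).\n"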
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 0) Check the GPU (if there is none, set the runtime type to GPU)\n",
+    "!nvidia-smi -L || echo \"No GPU found → switch the runtime type to GPU\"\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 1) System dependencies + uv\n",
+    "!apt-get -qq update && apt-get -qq install -y ffmpeg\n",
+    "!curl -LsSf https://astral.sh/uv/install.sh | sh\n",
+    "import os\n",
+    "os.environ['PATH'] = '/root/.local/bin:' + os.environ['PATH']\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 2) Fetch the code (the repository allows anonymous reads)\n",
+    "!git clone -b feat/p1-core https://git.lukehemmin.com/lukehemmin/luke_scribe.git\n",
+    "%cd luke_scribe\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 3) Dependencies (engine + GPU CUDA runtime); takes a few minutes\n",
+    "!uv sync --extra engine --extra gpu\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 4) Check the hardware tier (T3 = turbo and large-v3 resident together)\n",
+    "!uv run luke-scribe detect\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 5) Upload the talk audio (m4a/mp3/wav/mp4 …)\n",
+    "from google.colab import files\n",
+    "AUDIO = list(files.upload().keys())[0]\n",
+    "print('uploaded:', AUDIO)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 6) Full transcription (large-v3-turbo; use --model large-v3 for higher accuracy)\n",
+    "!uv run luke-scribe transcribe \"$AUDIO\" --model large-v3-turbo --language ko --timestamps | tee transcript.txt\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 7) Download the transcript\n",
+    "from google.colab import files\n",
+    "files.download('transcript.txt')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
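+    "## Notes\n",
+    "- **Model**: `large-v3-turbo` (fast) ↔ `large-v3` (accurate). If `detect` reports T0 (CPU), the GPU is too weak and transcription will be slow.\n",
+    "- **Correction (transliteration → English)**: not possible on Colab (gateway unreachable). Use `--correct` + `SCRIBE_LLM_*` on a GPU inside the internal network (section B of `COLAB.md`).\n",
+    "- **Speed**: T4 turbo ≈ 0.1~0.3× real time → a 37-minute talk takes a few minutes.\n"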
+   ]
+  }
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "provenance": [],
+   "gpuType": "T4"
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
@@ -0,0 +1,40 @@
+[project]
+name = "luke-scribe"
+version = "0.1.0"
+description = "Internal local STT transcription API (faster-whisper, hardware-adaptive)"
+requires-python = ">=3.11"
+dependencies = [
+    "pydantic>=2.7",
+    "pydantic-settings>=2.3",
+    "typer>=0.12",
+    "rich>=13.7",
+    "psutil>=5.9",
+    "nvidia-ml-py>=12.535",
+    "huggingface-hub>=0.24",
+]
+
+[project.optional-dependencies]
+# Engine: installed in the transcribe/bench increment (uv sync --extra engine)
+engine = ["faster-whisper>=1.0.3", "av>=11"]
+# GPU CUDA runtime (for faster-whisper GPU inference)
+gpu = ["nvidia-cublas-cu12", "nvidia-cudnn-cu12"]
+# Test API (synchronous), used by serve
+api = ["fastapi>=0.110", "uvicorn[standard]>=0.29", "python-multipart>=0.0.9"]
+# P2 async queue (deferred)
+queue = ["redis>=5.0", "rq>=1.16"]
+# P5 options
+diarize = ["pyannote.audio>=3.1"]
+llm = ["openai>=1.30"]
+
+[project.scripts]
+luke-scribe = "luke_scribe.cli:main"
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[tool.hatch.build.targets.wheel]
+packages = ["src/luke_scribe"]
+
+[dependency-groups]
+dev = ["pytest>=8.2", "ruff>=0.5", "httpx>=0.27"]
@@ -0,0 +1,5 @@
+#!/usr/bin/env bash
+# Dev/Colab launcher: pure Python without Docker (plan §3.10d).
+set -euo pipefail
+cd "$(dirname "$0")"
+exec uv run luke-scribe "$@"
@@ -0,0 +1,39 @@
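+# samples/: bench dataset (KO+EN domain)
+
+Input for the `bench` gate (**R-WER and entity preservation rate** of turbo vs large-v3) and for verifying
+mixed-language accuracy (AC-4). Only with this data can the last ~5% of design ambiguity
+(hybrid → single-model decision) be settled by measurement.
+
+## What is needed
+1. **Audio/video clips**: wav/flac/mp3/m4a/mp4 and so on (the engine decodes them with ffmpeg). 5~60 seconds recommended; you can start with **5~20 clips**.
+2. **Ground-truth transcript**: the correct Korean text for each clip. **Keep English technical terms in English** (e.g. `vLLM`, `API`, `Kubernetes`).
+3. (Optional) **Domain entity list**: used to measure the entity preservation rate.
+
+## Layout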
+```
+samples/ko_en/
+  clips/0001.wav
+  clips/0002.wav
+  manifest.jsonl     # clip ↔ ground-truth mapping (one clip per line)
+  entities.txt       # (optional) one domain term per line
+```
+
+`manifest.jsonl` example:
+```jsonl
+{"audio": "clips/0001.wav", "text": "그 API 서빙할 때 vLLM 쓰면 성능 대박이야", "lang": "ko"}
+{"audio": "clips/0002.wav", "text": "FastAPI로 엔드포인트 만들고 Kubernetes에 배포했어", "lang": "ko"}
+```
+
+`entities.txt` example (optional):
+```
+vLLM
+FastAPI
+Kubernetes
+CTranslate2
+GPU
+```
+
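+## Caveats
+- Audio/video files are **excluded from commits** via `.gitignore` (size, privacy). Only `manifest.jsonl`, `entities.txt`, and this README are tracked.
+- The entity preservation rate is computed against **the English spelling in the ground-truth text**, so spell terms exactly.
+- Once `bench` is implemented it will consume this layout as-is: `uv run luke-scribe bench --samples samples/ko_en/`.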
@@ -0,0 +1,2 @@
+# Put audio/video clips in this folder (e.g. 0001.wav, 0001.mp3, 0001.mp4).
+# The media files themselves are excluded from commits via .gitignore (size/privacy). Only the manifest is tracked.
@@ -0,0 +1,2 @@
+{"audio": "clips/0001.wav", "text": "그 API 서빙할 때 vLLM 쓰면 성능 대박이야", "lang": "ko"}
+{"audio": "clips/0002.wav", "text": "FastAPI로 엔드포인트 만들고 Kubernetes에 배포했어", "lang": "ko"}
@@ -0,0 +1,82 @@
+#!/usr/bin/env python3
+"""STT post-processing PoC: restore transliterated English technical terms via the internal LLM (OpenAI-compatible).
+
+Run in an environment that can reach the gateway:
+    export SCRIBE_LLM_BASE_URL=http://localhost:8080/v1
+    export SCRIBE_LLM_API_KEY=<internal key>
+    export SCRIBE_LLM_MODEL=copilot-gpt-4o
+    python3 scripts/llm_correct.py              # demo with the built-in sample
+    python3 scripts/llm_correct.py < my.txt     # correct an arbitrary transcript
+
+No external dependencies (urllib). To evolve into postprocess/llm.py (confidence-gated, chunks/running glossary).
+"""
+from __future__ import annotations
+
+import json
+import os
+import sys
+import time
+import urllib.error
+import urllib.request
+
+SYSTEM = (
+    "너는 한국어 STT 전사 후처리기다. 한국어 음성에 섞여 나온 영어 기술용어·고유명사가 "
+    "발음대로 한글로 음차되어 잘못 적힌 부분을 문맥과 지식으로 원래 영어 표기로 복원하라. "
+    "일반 한국어는 그대로 두고, 확실하지 않으면 바꾸지 마라. 설명 없이 교정된 전사문만 출력하라."
+)
+
+# Real transcript mangled by turbo (EmbeddingGemma talk), used for the built-in demo
+SAMPLE = (
+    "그래서 오늘 준비한 내용은 기본적으로 인베딩 점마에 대해서 설명을 드릴 텐데요. "
+    "여러분들이 알고 계시는 랭기징 모델이 정말 사람이 생각하는 것처럼 하는데 "
+    "그 다음에 구글에 런칭한 오픈모델입니다. 인베딩 점마 라는 것을 소개를 해드릴 예정입니다. "
+    "그리고 어 재미나이 하고 이제 점마하고 두 가지가 있는데요. "
+    "구글 포 디벨로퍼스 사이트에 가시면 제가 올린 포스트도 보실 수 있는데."
+)
+
+
+def correct(text: str) -> str:
+    base = os.environ.get("SCRIBE_LLM_BASE_URL", "http://localhost:8080/v1").rstrip("/")
+    key = os.environ.get("SCRIBE_LLM_API_KEY", "")
+    model = os.environ.get("SCRIBE_LLM_MODEL", "copilot-gpt-4o")
+    payload = {
+        "model": model,
+        "temperature": 0,
+        "messages": [
+            {"role": "system", "content": SYSTEM},
+            {"role": "user", "content": text},
+        ],
+    }
+    req = urllib.request.Request(
+        base + "/chat/completions",
+        data=json.dumps(payload).encode(),
+        headers={"Content-Type": "application/json", "Authorization": "Bearer " + key},
+    )
+    retries = 4
+    for attempt in range(1, retries + 1):
+        try:
+            with urllib.request.urlopen(req, timeout=90) as resp:
+                return json.loads(resp.read())["choices"][0]["message"]["content"]
+        except urllib.error.HTTPError:
+            raise  # a real HTTP response (401/400, etc.); retrying is pointless
+        except (urllib.error.URLError, OSError) as exc:  # transient: connection reset/timeout, etc.
+            if attempt == retries:
+                raise
+            print(f"  [retry {attempt}/{retries - 1}] {type(exc).__name__} → retrying", file=sys.stderr)
+            time.sleep(1.5 * attempt)
+    raise RuntimeError("unreachable")
+
+
+def main() -> None:
+    src = (sys.stdin.read().strip() if not sys.stdin.isatty() else "") or SAMPLE
+    print("=== original ===\n" + src + "\n\n=== corrected ===")
+    try:
+        print(correct(src))
+    except urllib.error.HTTPError as exc:
+        sys.exit(f"HTTP {exc.code}: {exc.read().decode()[:300]}")
+    except Exception as exc:  # noqa: BLE001
+        sys.exit(f"{type(exc).__name__}: {exc}")
+
+
+if __name__ == "__main__":
+    main()
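The project notes say the LLM returns "Embedding Gemma" with a space where the official spelling is EmbeddingGemma, and that rules/glossary normalization should run alongside the LLM pass. A minimal sketch of such a rule step follows; `GLOSSARY` and `normalize_terms` are hypothetical names, and the real `postprocess/rules.py` is not shown in this diff and may differ.

```python
import re

# Hypothetical glossary: spaced or differently cased variant -> canonical spelling.
GLOSSARY = {
    "Embedding Gemma": "EmbeddingGemma",
}


def normalize_terms(text: str, glossary: dict[str, str] = GLOSSARY) -> str:
    """Rewrite known variants to their canonical spelling, tolerating case and extra spaces."""
    for variant, canonical in glossary.items():
        # Allow any run of whitespace between the words of the variant.
        pattern = re.compile(r"\s+".join(map(re.escape, variant.split())), re.IGNORECASE)
        # A function replacement avoids interpreting backslashes in the canonical form.
        text = pattern.sub(lambda _match, c=canonical: c, text)
    return text


if __name__ == "__main__":
    print(normalize_terms("오늘은 Embedding Gemma 와 embedding  gemma 를 소개합니다"))
```

Running a deterministic pass like this after the LLM keeps spelling stable even when the model's output varies between requests.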
@@ -0,0 +1,3 @@
+"""luke_scribe: internal local STT transcription API (faster-whisper, hardware-adaptive)."""
+
+__version__ = "0.1.0"
@@ -0,0 +1 @@
+"""HTTP API (FastAPI): synchronous test API. Async queue/real-time are P2/P3."""
@@ -0,0 +1,24 @@
+"""FastAPI app factory."""
+from __future__ import annotations
+
+import contextlib
+import logging
+from collections.abc import AsyncIterator
+
+from fastapi import FastAPI
+
+from .deps import ensure_keys
+from .routes.transcribe import router
+
+logger = logging.getLogger("luke_scribe.api")
+
+
+def create_app() -> FastAPI:
+    @contextlib.asynccontextmanager
+    async def lifespan(_app: FastAPI) -> AsyncIterator[None]:
+        logger.info("luke_scribe API ready · X-API-Key=%s", ensure_keys()[0])
+        yield
+
+    app = FastAPI(title="luke_scribe", version="0.1.0", lifespan=lifespan)
+    app.include_router(router)
+    return app
@@ -0,0 +1,26 @@
+"""Auth: X-API-Key (spec §3.8). If no key is configured, one ephemeral key is generated at startup and enforced."""
+from __future__ import annotations
+
+import secrets
+
+from fastapi import Header, HTTPException, status
+
+from ..config import settings
+
+_ephemeral_key: str | None = None
+
+
+def ensure_keys() -> list[str]:
+    """List of valid keys. If none are configured, generate an ephemeral key once and return it (the app prints it)."""
+    global _ephemeral_key
+    if settings.api_keys:
+        return settings.api_keys
+    if _ephemeral_key is None:
+        _ephemeral_key = "sk-luke-" + secrets.token_urlsafe(24)
+    return [_ephemeral_key]
+
+
+def require_api_key(x_api_key: str | None = Header(default=None)) -> str:
+    if x_api_key not in ensure_keys():
+        raise HTTPException(status.HTTP_401_UNAUTHORIZED, "invalid or missing X-API-Key")
+    return x_api_key
@@ -0,0 +1,27 @@
+"""Process-level engine cache: load each model once and reuse it (spec §3.5).
+
+Transcription is serialized with `transcribe_lock` (single GPU/CPU, test grade). Assumes a single uvicorn worker.
+"""
+from __future__ import annotations
+
+import threading
+
+from ..engine.faster_whisper_engine import FasterWhisperEngine
+
+_engines: dict[tuple[str, str, str], FasterWhisperEngine] = {}
+_cache_lock = threading.Lock()
+transcribe_lock = threading.Lock()
+
+
+def get_engine(
+    model: str, device: str, compute_type: str, cache_dir: str | None = None
+) -> FasterWhisperEngine:
+    key = (model, device, compute_type)
+    eng = _engines.get(key)
+    if eng is None:
+        with _cache_lock:
+            eng = _engines.get(key)
+            if eng is None:
+                eng = FasterWhisperEngine(model, device, compute_type, cache_dir)
+                _engines[key] = eng
+    return eng
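`get_engine` uses double-checked locking: a lock-free lookup first, then a second lookup under the lock before constructing. The self-contained sketch below demonstrates the same pattern with a stand-in factory, since loading a real model is out of scope here. `LoadOnceCache` and `demo` are hypothetical names used only for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class LoadOnceCache:
    """Build each keyed item at most once, even under concurrent first access."""

    def __init__(self, factory):
        self._factory = factory
        self._items = {}
        self._lock = threading.Lock()

    def get(self, key):
        item = self._items.get(key)          # fast path, no lock
        if item is None:
            with self._lock:
                item = self._items.get(key)  # re-check: another thread may have built it
                if item is None:
                    item = self._factory(key)
                    self._items[key] = item
        return item


def demo(n_threads: int = 8) -> tuple[int, int]:
    """Return (factory call count, number of distinct objects handed out)."""
    calls = []

    def slow_factory(key):
        calls.append(key)   # records each construction
        time.sleep(0.05)    # stand-in for an expensive model load
        return object()

    cache = LoadOnceCache(slow_factory)
    key = ("tiny", "cpu", "int8")
    with ThreadPoolExecutor(n_threads) as pool:
        results = list(pool.map(lambda _: cache.get(key), range(n_threads)))
    return len(calls), len({id(r) for r in results})


if __name__ == "__main__":
    print(demo())
```

Note that the cache key in the diff is `(model, device, compute_type)` and does not include `cache_dir`, so a later call with a different `cache_dir` reuses the first engine.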
@@ -0,0 +1,124 @@
+"""Routes: /health, /v1/system, /v1/models, POST /v1/transcribe (synchronous)."""
+from __future__ import annotations
+
+import contextlib
+import os
+import tempfile
+
+from fastapi import APIRouter, Depends, File, Form, HTTPException, UploadFile, status
+from fastapi.responses import PlainTextResponse
+
+from ...audio.ingest import probe_media
+from ...config import settings
+from ...devices import DeviceManager
+from ...postprocess import llm as llm_correct
+from ...postprocess import rules
+from ...results import formats
+from ..deps import require_api_key
+from ..engine_pool import get_engine, transcribe_lock
+
+router = APIRouter()
+
+
+@router.get("/health")
+def health() -> dict[str, str]:
+    return {"status": "ok"}
+
+
+@router.get("/v1/system")
+def system():  # noqa: ANN201  # serializes DeviceProfile (pydantic)
+    return DeviceManager.detect()
+
+
+@router.get("/v1/models")
+def models() -> dict:
+    profile = DeviceManager.detect()
+    return {
+        "tier": profile.tier.value,
+        "served": profile.served_models,
+        "realtime": settings.model_realtime,
+        "batch": settings.model_batch,
+    }
+
+
+@router.post("/v1/transcribe")
+def transcribe_ep(  # noqa: PLR0913  # many request options (spec options schema)
+    file: UploadFile = File(...),
+    language: str | None = Form(None),
+    model: str | None = Form(None),
+    device: str = Form("auto"),
+    vad: bool = Form(True),
+    word_timestamps: bool = Form(False),
+    correct: bool = Form(False),
+    response_format: str = Form("json"),
+    _api_key: str = Depends(require_api_key),
+):
+    suffix = os.path.splitext(file.filename or "")[1] or ".bin"
+    fd, tmp = tempfile.mkstemp(prefix="luke_up_", suffix=suffix)
+    try:
+        with os.fdopen(fd, "wb") as out:
+            while chunk := file.file.read(1 << 20):
+                out.write(chunk)
+
+        info = probe_media(tmp)
+        if info.duration_s > settings.max_duration_s or info.size_bytes > settings.max_size_bytes:
+            raise HTTPException(
+                status.HTTP_413_CONTENT_TOO_LARGE,
+                f"{info.duration_s:.0f}s/{info.size_bytes}B "
+                f"exceeds {settings.max_duration_s}s/{settings.max_size_bytes}B",
+            )
+
+        profile = DeviceManager.detect(force_device=(None if device == "auto" else device))
+        dev = "cpu" if profile.kind == "cpu" else "cuda"
+        model_name = model or settings.model_realtime
+        lang = language or settings.language
+
+        engine = get_engine(model_name, dev, profile.compute_type, settings.model_cache_dir)
+        with transcribe_lock:
+            segments, tinfo = engine.transcribe(
+                tmp, language=lang, word_timestamps=word_timestamps, vad=vad
+            )
+            seg_list = [
+                {"start": float(s.start), "end": float(s.end), "text": s.text.strip()}
+                for s in segments
+            ]
+
+        text = " ".join(s["text"] for s in seg_list).strip()
+        corrected = False
+        if correct:
+            try:
+                text = rules.normalize(
+                    llm_correct.correct(
+                        text,
+                        base_url=settings.llm_base_url,
+                        api_key=settings.llm_api_key,
+                        model=settings.llm_model,
+                        max_chars=settings.llm_max_chars,
+                    )
+                )
+                corrected = True
+            except llm_correct.LLMNotConfigured as exc:
+                raise HTTPException(status.HTTP_400_BAD_REQUEST, f"correct=true but {exc}") from exc
+            except Exception as exc:  # noqa: BLE001
+                raise HTTPException(
+                    status.HTTP_502_BAD_GATEWAY, f"LLM correction failed: {exc}"
+                ) from exc
+
+        if response_format == "txt":
+            return PlainTextResponse(text)
+        if response_format == "srt":
+            return PlainTextResponse(formats.to_srt(seg_list))
+        if response_format == "vtt":
+            return PlainTextResponse(formats.to_vtt(seg_list))
+        return {
+            "text": text,
+            "segments": seg_list,
+            "language": getattr(tinfo, "language", None),
+            "model_used": model_name,
+            "corrected": corrected,
+            "duration_s": info.duration_s,
+        }
+    finally:
+        with contextlib.suppress(OSError):
+            os.remove(tmp)  # privacy: delete the temp file on every exit path
+        file.file.close()
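The `srt` response format relies on `formats.to_srt`, which is not part of the lines shown. As a rough sketch of what converting the `seg_list` dictionaries to SubRip text involves (the function names below are hypothetical and the project's implementation may differ):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SubRip timestamp HH:MM:SS,mmm."""
    ms_total = max(0, round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt_sketch(segments: list[dict]) -> str:
    """Render segments ({'start', 'end', 'text'}) as numbered SubRip cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    return "\n".join(blocks)  # blank line between cues


if __name__ == "__main__":
    print(to_srt_sketch([{"start": 0.0, "end": 1.25, "text": "hello"}]))
```

WebVTT differs mainly in the `WEBVTT` header line and a period instead of a comma before the milliseconds.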
@@ -0,0 +1,19 @@
+"""API response schemas."""
+from __future__ import annotations
+
+from pydantic import BaseModel
+
+
+class Segment(BaseModel):
+    start: float
+    end: float
+    text: str
+
+
+class TranscribeResult(BaseModel):
+    text: str
+    segments: list[Segment]
+    language: str | None = None
+    model_used: str
+    corrected: bool = False
+    duration_s: float = 0.0
@@ -0,0 +1,4 @@
+"""Audio/video input: ingest (probe, limits), VAD (spec §4-4)."""
+from .ingest import MediaInfo, probe_media
+
+__all__ = ["MediaInfo", "probe_media"]
@@ -0,0 +1,41 @@
+"""Media input: duration/size probe + limit checks (spec §4-4, AC-7).
+
+The caller maps a limit violation to 413 (P2). Actual decoding is done by the engine (faster-whisper/PyAV).
+"""
+from __future__ import annotations
+
+import json
+import os
+import shutil
+import subprocess
+from dataclasses import dataclass
+
+
+@dataclass
+class MediaInfo:
+    path: str
+    duration_s: float
+    size_bytes: int
+
+
+def probe_media(path: str) -> MediaInfo:
+    if not os.path.exists(path):
+        raise FileNotFoundError(path)
+    return MediaInfo(path=path, duration_s=_ffprobe_duration(path), size_bytes=os.path.getsize(path))
+
+
+def _ffprobe_duration(path: str) -> float:
+    ffprobe = shutil.which("ffprobe")
+    if not ffprobe:
+        return 0.0  # ffprobe missing: duration unknown, so the duration limit cannot be enforced
+    try:
+        out = subprocess.run(
+            [ffprobe, "-v", "error", "-show_entries", "format=duration", "-of", "json", path],
+            capture_output=True,
+            text=True,
+            timeout=30,
+            check=True,
+        ).stdout
+        return float(json.loads(out).get("format", {}).get("duration") or 0.0)
+    except Exception:
+        return 0.0
@@ -0,0 +1,193 @@
+"""CLI (typer): `detect`, `transcribe`, and `serve` are implemented; `bench` is still a stub. Spec §deployment."""
+from __future__ import annotations
+
+import typer
+from rich.console import Console
+from rich.table import Table
+
+from .devices import DeviceManager
+
+app = typer.Typer(add_completion=False, help="luke_scribe: local STT transcription (hardware-adaptive)")
+console = Console()
+
+
+@app.command()
+def detect(
+    device: str = typer.Option("auto", help="auto|cpu|cuda"),
+    compute_type: str = typer.Option(None, "--compute-type", help="force compute_type (float16|int8|int8_float16)"),
+    workers: int = typer.Option(None, help="override the worker count"),
+) -> None:
+    """Detect hardware, then derive capability tier (T0-T3), precision, and worker count (AC-2/3; static estimate until measured)."""
+    profile = DeviceManager.detect(
+        force_device=(None if device == "auto" else device),
+        force_compute_type=compute_type,
+        workers_override=workers,
+    )
+    table = Table(title="luke_scribe · device profile", show_header=False, title_style="bold cyan")
+    table.add_row("device", f"{profile.kind}  ({profile.name})")
+    if profile.compute_capability:
+        table.add_row("compute capability", profile.compute_capability)
+    if profile.vram_total_mb:
+        table.add_row("VRAM (free/total)", f"{profile.vram_free_mb} / {profile.vram_total_mb} MB")
+    table.add_row("RAM", f"{profile.ram_total_mb} MB")
+    table.add_row("disk free", f"{profile.disk_free_mb} MB")
+    table.add_row("compute_type", profile.compute_type)
+    table.add_row("capability tier", f"[bold]{profile.tier.value}[/]")
+    table.add_row("max workers", str(profile.max_workers))
+    for lane, model in profile.served_models.items():
+        table.add_row(f"served · {lane}", model)
+    table.add_row("measured", "yes" if profile.measured else "no (static estimate)")
+    console.print(table)
+    for note in profile.notes:
+        console.print(f"  • {note}", style="yellow")
+
+
+def _todo(name: str, hint: str = "") -> None:
+    console.print(f"[yellow]'{name}' is not implemented yet (P1 in progress).[/] {hint}")
+    raise typer.Exit(code=1)
+
+
+@app.command()
+def transcribe(
+    file: str = typer.Argument(..., help="audio/video file"),
+    model: str = typer.Option(None, help="model override (default: realtime model). tiny|base|large-v3|large-v3-turbo"),
+    language: str = typer.Option(None, help="language (default: configured value); 'auto' is allowed"),
+    device: str = typer.Option("auto", help="auto|cpu|cuda"),
+    word_timestamps: bool = typer.Option(False, "--word-timestamps"),
+    vad: bool = typer.Option(True, "--vad/--no-vad", help="filter out silence (VAD)"),
+    beam_size: int = typer.Option(None, "--beam-size", help="decoding beam size (1-2 recommended on CPU for speed)"),
+    correct: bool = typer.Option(False, "--correct", help="in-house LLM correction (requires SCRIBE_LLM_* settings)"),
+    timestamps: bool = typer.Option(False, "--timestamps", help="print segment [start-end] times"),
+) -> None:
+    """One-shot file transcription (faster-whisper, automatic CPU/GPU selection; part of AC-4)."""
+    from .config import settings
+
+    try:
+        from .audio.ingest import probe_media
+        from .engine.faster_whisper_engine import FasterWhisperEngine
+    except ImportError as exc:
+        console.print(f"[red]Engine not installed:[/] {exc}\n→ run `uv sync --extra engine` and try again.")
+        raise typer.Exit(code=1) from exc
+
+    try:
+        info = probe_media(file)
+    except FileNotFoundError:
+        console.print(f"[red]File not found:[/] {file}")
+        raise typer.Exit(code=1) from None
+
+    if info.duration_s > settings.max_duration_s or info.size_bytes > settings.max_size_bytes:
+        console.print(
+            f"[red]Input limit exceeded (413):[/] {info.duration_s:.0f}s / {info.size_bytes}B "
+            f"(limit {settings.max_duration_s}s / {settings.max_size_bytes}B)"
+        )
+        raise typer.Exit(code=1)
+
+    profile = DeviceManager.detect(force_device=(None if device == "auto" else device))
+    dev = "cpu" if profile.kind == "cpu" else "cuda"
+    model_name = model or settings.model_realtime
+    lang = language or settings.language
+    console.print(
+        f"[dim]model={model_name} device={dev} compute={profile.compute_type} "
+        f"lang={lang} dur={info.duration_s:.1f}s[/]"
+    )
+
+    engine = FasterWhisperEngine(model_name, dev, profile.compute_type, cache_dir=settings.model_cache_dir)
+    segments, tinfo = engine.transcribe(
+        file, language=lang, word_timestamps=word_timestamps, vad=vad,
+        beam_size=(beam_size or settings.beam_size),
+    )
+
+    seg_list = []
+    for seg in segments:
+        seg_list.append({"start": seg.start, "end": seg.end, "text": seg.text.strip()})
+        if not correct:  # stream output as decoded (with --correct, collect everything and print once)
+            if timestamps:
+                console.print(f"[cyan][{seg.start:6.2f}–{seg.end:6.2f}][/] {seg.text.strip()}")
+            else:
+                console.print(seg.text.strip())
+
+    if correct:
+        from .postprocess import llm as llm_correct
+        from .postprocess import rules
+
+        text = " ".join(s["text"] for s in seg_list).strip()
+        try:
+            text = rules.normalize(
+                llm_correct.correct(
+                    text,
+                    base_url=settings.llm_base_url,
+                    api_key=settings.llm_api_key,
+                    model=settings.llm_model,
+                    max_chars=settings.llm_max_chars,
+                )
+            )
+        except llm_correct.LLMNotConfigured as exc:
+            console.print(f"[red]--correct:[/] {exc}")
+            raise typer.Exit(code=1) from exc
+        console.print(text)
+
+    detected = getattr(tinfo, "language", None)
+    console.print(
+        f"[green]✓ {len(seg_list)} segments · detected_lang={detected} · "
+        f"model_used={model_name} · corrected={correct}[/]"
+    )
+
+
+@app.command()
+def bench(samples: str = typer.Option(None, help="directory of labeled KO+EN samples")) -> None:
+    """Domain benchmark gate: turbo vs large-v3 (once a sample set is available)."""
+    _todo("bench", "→ needs a labeled set under samples/")
+
+
+@app.command()
+def serve(
+    host: str = typer.Option(None, help="bind host (default: configured value)"),
+    port: int = typer.Option(None, help="bind port (default: configured value)"),
+    tunnel: str = typer.Option("none", help="none|cloudflare (public exposure, e.g. from Colab)"),
+) -> None:
+    """Test API server (synchronous transcribe + opt-in correction). Part of AC-1/11/12."""
+    from .config import settings
+
+    try:
+        import uvicorn
+
+        from .api.app import create_app
+        from .api.deps import ensure_keys
+    except ImportError as exc:
+        console.print(f"[red]API dependencies not installed:[/] {exc}\n→ `uv sync --extra api --extra engine`")
+        raise typer.Exit(code=1) from exc
+
+    bind_host = host or settings.host
+    bind_port = port or settings.port
+    key = ensure_keys()[0]
+    console.print(
+        f"[green]luke_scribe API[/] → http://{bind_host}:{bind_port}   "
+        f"(X-API-Key: [bold]{key}[/])"
+    )
+
+    proc = None
+    if tunnel == "cloudflare":
+        try:
+            from .connectivity.tunnel import start_cloudflared
+
+            proc, public = start_cloudflared(bind_port)
+            console.print(
+                f"[green]public:[/] {public}" if public
+                else "[yellow]No cloudflared URL received (continuing).[/]"
+            )
+        except Exception as exc:  # noqa: BLE001
+            console.print(f"[yellow]Tunnel failed (ignored): {exc}[/]")
+
+    try:
+        uvicorn.run(create_app(), host=bind_host, port=bind_port, workers=1, log_level="info")
+    finally:
+        if proc is not None:
+            proc.terminate()
+
+
+def main() -> None:
+    app()
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,51 @@
+"""Runtime settings: overridden via env (`SCRIBE_*`) or `.env`. Spec §config."""
+from __future__ import annotations
+
+from pydantic_settings import BaseSettings, SettingsConfigDict
+
+
+class Settings(BaseSettings):
+    model_config = SettingsConfigDict(env_prefix="SCRIBE_", env_file=".env", extra="ignore")
+
+    # Models (per-path defaults, hybrid; may be unified on turbo alone depending on P1 bench results)
+    model_realtime: str = "large-v3-turbo"
+    model_batch: str = "large-v3"
+
+    # Device (auto|cpu|cuda|cuda:0): picked automatically by the Device Manager, can be forced
+    device: str = "auto"
+    compute_type: str | None = None      # None = automatic (based on cc/VRAM)
+    workers: int | None = None           # None = computed automatically
+    beam_size: int = 5                    # decoding beam (1-2 recommended on CPU for speed, 5 on GPU)
+
+    # Language (default ko, overridable per request)
+    language: str = "ko"
+
+    # Hard input limits (413 when exceeded)
+    max_duration_s: int = 4 * 3600       # 4h
+    max_size_bytes: int = 2 * 1024 * 1024 * 1024  # 2GB
+
+    # Retention / queue / auth (P2+)
+    retention_days: int = 7
+    redis_url: str | None = None
+    api_keys: list[str] = []
+
+    # Tunnel (P5)
+    tunnel: str = "none"                 # none|cloudflare|ngrok
+
+    # Model cache directory (None = HF default)
+    model_cache_dir: str | None = None
+
+    # API server (synchronous test API)
+    host: str = "127.0.0.1"
+    port: int = 8000
+
+    # LLM correction (opt-in; in-house/local OpenAI-compatible backend)
+    llm_enabled: bool = False
+    llm_base_url: str | None = None      # e.g. http://192.168.0.123:8080/v1 (allowlist = this endpoint only)
+    llm_api_key: str | None = None       # injected only via env SCRIBE_LLM_API_KEY
+    llm_model: str = "copilot-gpt-4o"
+    # Correction chunk size (characters): tune to the in-house LLM context window (e.g. ~8k window → 1500, ~16k → 3000, ~30k → 6000)
+    llm_max_chars: int = 3000
+
+
+settings = Settings()
@@ -0,0 +1 @@
+"""External exposure for environments without a public IP, such as Colab (spec §8). MVP: cloudflared quick tunnel."""
@@ -0,0 +1,63 @@
+"""cloudflared quick tunnel (spec §8). Downloads the binary into a cache if it is missing. Best-effort.
+
+Called by `serve --tunnel cloudflare`; issues a public https://<rand>.trycloudflare.com URL (no account needed).
+"""
+from __future__ import annotations
+
+import os
+import platform
+import re
+import shutil
+import stat
+import subprocess
+import time
+import urllib.request
+
+_RELEASE = "https://github.com/cloudflare/cloudflared/releases/latest/download"
+_ASSETS = {
+    ("Linux", "x86_64"): "cloudflared-linux-amd64",
+    ("Linux", "aarch64"): "cloudflared-linux-arm64",
+}
+_URL_RE = re.compile(r"https://[-a-z0-9]+\.trycloudflare\.com")
+
+
+def ensure_cloudflared() -> str:
+    found = shutil.which("cloudflared")
+    if found:
+        return found
+    cache = os.path.expanduser("~/.cache/luke_scribe")
+    os.makedirs(cache, exist_ok=True)
+    path = os.path.join(cache, "cloudflared")
+    if os.path.exists(path):
+        return path
+    asset = _ASSETS.get((platform.system(), platform.machine()))
+    if not asset:
+        raise RuntimeError(
+            f"cloudflared auto-install is not supported on {platform.system()}/{platform.machine()}; "
+            "install it manually and put it on PATH."
+        )
+    urllib.request.urlretrieve(f"{_RELEASE}/{asset}", path)  # noqa: S310
+    os.chmod(path, os.stat(path).st_mode | stat.S_IEXEC)
+    return path
+
+
+def start_cloudflared(port: int, timeout: float = 30.0) -> tuple[subprocess.Popen, str | None]:
+    """Start the tunnel process and return (proc, public_url). If no URL is seen, url is None (the process keeps running)."""
+    binp = ensure_cloudflared()
+    proc = subprocess.Popen(  # noqa: S603
+        [binp, "tunnel", "--no-autoupdate", "--url", f"http://localhost:{port}"],
+        stdout=subprocess.PIPE,
+        stderr=subprocess.STDOUT,
+        text=True,
+    )
+    deadline = time.time() + timeout
+    while time.time() < deadline:
+        line = proc.stdout.readline() if proc.stdout else ""
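+        if not line:
+            if proc.poll() is not None:
+                break
+            time.sleep(0.1)  # EOF while still running: back off instead of spinning; the empty line matches nothing below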
+        m = _URL_RE.search(line)
+        if m:
+            return proc, m.group(0)
+    return proc, None
@@ -0,0 +1,5 @@
+"""Device Manager: detect GPU/CPU, then derive capability tier, precision, and worker count (spec §6, plan §3.6)."""
+from .manager import DeviceManager
+from .profile import CapabilityTier, DeviceProfile
+
+__all__ = ["DeviceManager", "DeviceProfile", "CapabilityTier"]
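@@ -0,0 +1,125 @@
+"""DeviceManager: detection, then precision / capability tier / worker count (plan §3.6, AC-2/3).
+
+Currently a static estimate (conservative constants). Follow-up: replace with a one-time model-load measurement at boot (`measured=True`).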
+"""
+from __future__ import annotations
+
+import os
+
+from .profile import HEADROOM, MODEL_FOOTPRINT_MB, CapabilityTier, DeviceProfile
+from .vram_probe import GpuInfo, probe_disk_free_mb, probe_gpus, probe_ram_mb
+
+TURBO = "large-v3-turbo"
+V3 = "large-v3"
+
+
+def _select_compute_type(cc: tuple[int, int], free_mb: int) -> str:
+    """Automatic precision selection (plan §3.6)."""
+    major = cc[0]
+    if major >= 7:  # Volta+: fp16 is efficient
+        return "float16" if free_mb >= 12000 else "int8_float16"
+    if major == 6:  # Pascal (e.g. GTX 1050): fp16 is inefficient, so use int8
+        return "int8"
+    return "int8"  # older or unknown compute capability (the probe falls back to (0, 0))
+
+
+def _fits(model: str, ct: str, free_mb: int) -> bool:
+    fp = MODEL_FOOTPRINT_MB.get((model, ct))
+    return fp is not None and fp * HEADROOM <= free_mb
+
+
+def _both_fit(ct: str, free_mb: int) -> bool:
+    a = MODEL_FOOTPRINT_MB.get((TURBO, ct))
+    b = MODEL_FOOTPRINT_MB.get((V3, ct))
+    return a is not None and b is not None and (a + b) * HEADROOM <= free_mb
+
+
+def _cpu_workers(override: int | None) -> int:
+    return override or max(1, (os.cpu_count() or 2) // 4)
+
+
+def _cpu_profile(
+    *, name: str, ram: int, disk: int, override: int | None,
+    gpu: GpuInfo | None = None, notes: list[str] | None = None,
+) -> DeviceProfile:
+    return DeviceProfile(
+        kind="cpu",
+        name=name,
+        compute_capability=(f"{gpu.compute_capability[0]}.{gpu.compute_capability[1]}" if gpu else None),
+        vram_total_mb=(gpu.vram_total_mb if gpu else 0),
+        vram_free_mb=(gpu.vram_free_mb if gpu else 0),
+        ram_total_mb=ram,
+        disk_free_mb=disk,
+        compute_type="int8",
+        tier=CapabilityTier.T0_CPU,
+        max_workers=_cpu_workers(override),
+        served_models={"realtime": f"{TURBO}@cpu", "batch": f"{TURBO}@cpu"},
+        notes=(notes or []) + ["large-v3 on GPU unavailable (CPU path)"],
+    )
+
+
+class DeviceManager:
+    @staticmethod
+    def detect(
+        force_device: str | None = None,
+        force_compute_type: str | None = None,
+        workers_override: int | None = None,
+    ) -> DeviceProfile:
+        ram = probe_ram_mb()
+        disk = probe_disk_free_mb(".")
+        gpus = probe_gpus()
+
+        # Forced CPU, or no GPU found: T0
+        if force_device == "cpu" or not gpus:
+            note = (
+                "GPU detected but --device cpu was forced" if (force_device == "cpu" and gpus)
+                else "no GPU detected, using CPU"
+            )
+            return _cpu_profile(name="CPU", ram=ram, disk=disk, override=workers_override, notes=[note])
+
+        gpu = gpus[0]
+        cc = gpu.compute_capability
+        ct = force_compute_type or _select_compute_type(cc, gpu.vram_free_mb)
+
+        # Even turbo does not fit on the GPU: fall back to CPU (T0)
+        if not _fits(TURBO, ct, gpu.vram_free_mb):
+            need = int(MODEL_FOOTPRINT_MB.get((TURBO, ct), 0) * HEADROOM)  # .get: a forced compute_type may not be in the table
+            return _cpu_profile(
+                name=f"CPU (GPU={gpu.name}: insufficient VRAM)", ram=ram, disk=disk,
+                override=workers_override, gpu=gpu,
+                notes=[f"{gpu.name} free {gpu.vram_free_mb}MB < turbo {need}MB(헤드룸 포함) → CPU 강등(T0)"],
+            )
+
+        # turbo fits on the GPU; the tier now depends on whether large-v3 also fits
+        notes: list[str] = []
+        if not _fits(V3, ct, gpu.vram_free_mb):
+            tier = CapabilityTier.T1_TURBO_GPU
+            served = {"realtime": f"{TURBO}@cuda", "batch": f"{TURBO}@cuda"}
+            notes.append("large-v3 unavailable → batch also uses turbo")
+        elif not _both_fit(ct, gpu.vram_free_mb):
+            tier = CapabilityTier.T2_SWAP
+            served = {"realtime": f"{TURBO}@cuda", "batch": f"{V3}@cuda (swap)"}
+            notes.append("turbo and large-v3 cannot be co-resident → load/unload per call")
+        else:
+            tier = CapabilityTier.T3_CORESIDENT
+            served = {"realtime": f"{TURBO}@cuda", "batch": f"{V3}@cuda"}
+
+        # workers = floor((free - reserve) / per_worker); reserve = headroom for the resident model
+        per_worker = MODEL_FOOTPRINT_MB[(TURBO, ct)]
+        reserve = int(per_worker * (HEADROOM - 1.0))
+        est = max(1, (gpu.vram_free_mb - reserve) // per_worker)
+
+        return DeviceProfile(
+            kind="cuda",
+            name=gpu.name,
+            compute_capability=f"{cc[0]}.{cc[1]}",
+            vram_total_mb=gpu.vram_total_mb,
+            vram_free_mb=gpu.vram_free_mb,
+            ram_total_mb=ram,
+            disk_free_mb=disk,
+            compute_type=ct,
+            tier=tier,
+            max_workers=workers_override or est,
+            served_models=served,
+            notes=notes,
+        )
@@ -0,0 +1,46 @@
+"""DeviceProfile model, capability tiers, and conservative model VRAM constants (plan §3.6)."""
+from __future__ import annotations
+
+from enum import Enum
+
+from pydantic import BaseModel, Field
+
+
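+class CapabilityTier(str, Enum):
+    """Assigned automatically (by boot-time measurement once available); the tier decides which models can be served (no silent downgrade)."""
+
+    T0_CPU = "T0_CPU"            # turbo does not fit on the GPU, or no GPU: turbo@CPU
+    T1_TURBO_GPU = "T1_TURBO_GPU"  # turbo fits on the GPU, large-v3 does not (batch also uses turbo)
+    T2_SWAP = "T2_SWAP"            # large-v3 fits, but not alongside turbo: load/unload
+    T3_CORESIDENT = "T3_CORESIDENT"  # turbo and large-v3 can be loaded together
+
+
+# Conservative default constants (MB): fallback until measured. Plan §3.6.
+# (To be replaced by an actual load measurement at boot: vram_probe --probe-load)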
+MODEL_FOOTPRINT_MB: dict[tuple[str, str], int] = {
+    ("large-v3", "float16"): 10000,
+    ("large-v3", "int8_float16"): 5500,
+    ("large-v3", "int8"): 3500,
+    ("large-v3-turbo", "float16"): 4000,
+    ("large-v3-turbo", "int8_float16"): 2400,
+    ("large-v3-turbo", "int8"): 1800,
+}
+HEADROOM = 1.3  # load headroom multiplier
+
+
+class DeviceProfile(BaseModel):
+    """Detection result plus derived values. Exposed as-is by /v1/system and detect."""
+
+    kind: str                              # "cuda" | "cpu"
+    name: str
+    compute_capability: str | None = None
+    vram_total_mb: int = 0
+    vram_free_mb: int = 0
+    ram_total_mb: int = 0
+    disk_free_mb: int = 0
+    compute_type: str
+    tier: CapabilityTier
+    max_workers: int = 1
+    served_models: dict[str, str] = Field(default_factory=dict)  # {"realtime":..., "batch":...}
+    measured: bool = False                 # True = measured with a model load, False = static estimate
+    notes: list[str] = Field(default_factory=list)
@@ -0,0 +1,72 @@
+"""Hardware probing: GPU (NVML), RAM, disk. Returns empty results gracefully when a dependency or GPU is missing."""
+from __future__ import annotations
+
+import shutil
+from dataclasses import dataclass
+
+
+@dataclass
+class GpuInfo:
+    index: int
+    name: str
+    compute_capability: tuple[int, int]
+    vram_total_mb: int
+    vram_free_mb: int
+
+
+def probe_gpus() -> list[GpuInfo]:
+    """Probe the GPU list, VRAM, and compute capability via NVML. Returns [] if unavailable."""
+    try:
+        import pynvml  # nvidia-ml-py
+    except ImportError:
+        return []
+    try:
+        pynvml.nvmlInit()
+    except Exception:
+        return []
+
+    gpus: list[GpuInfo] = []
+    try:
+        for i in range(pynvml.nvmlDeviceGetCount()):
+            h = pynvml.nvmlDeviceGetHandleByIndex(i)
+            name = pynvml.nvmlDeviceGetName(h)
+            if isinstance(name, bytes):
+                name = name.decode()
+            mem = pynvml.nvmlDeviceGetMemoryInfo(h)
+            try:
+                major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(h)
+            except Exception:
+                major, minor = (0, 0)
+            gpus.append(
+                GpuInfo(
+                    index=i,
+                    name=name,
+                    compute_capability=(major, minor),
+                    vram_total_mb=int(mem.total // (1024 * 1024)),
+                    vram_free_mb=int(mem.free // (1024 * 1024)),
+                )
+            )
+    except Exception:
+        return []
+    finally:
+        try:
+            pynvml.nvmlShutdown()
+        except Exception:
+            pass
+    return gpus
+
+
+def probe_ram_mb() -> int:
+    try:
+        import psutil
+
+        return int(psutil.virtual_memory().total // (1024 * 1024))
+    except Exception:
+        return 0
+
+
+def probe_disk_free_mb(path: str = ".") -> int:
+    try:
+        return int(shutil.disk_usage(path).free // (1024 * 1024))
+    except Exception:
+        return 0
@@ -0,0 +1,5 @@
+"""Inference engine: a single faster-whisper (CTranslate2) engine behind a thin abstraction (plan §3 D3)."""
+from .faster_whisper_engine import FasterWhisperEngine
+from .model_registry import resolve_model
+
+__all__ = ["FasterWhisperEngine", "resolve_model"]
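@@ -0,0 +1,55 @@
+"""faster-whisper (CTranslate2) engine wrapper (spec §2 / plan §4-3).
+
+faster-whisper decodes internally with PyAV, so a file path (audio or video) is passed through as-is.
+segments is a generator; the caller consumes it and can use that for progress/cancellation checks (P2).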
+"""
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any
+
+from .model_registry import resolve_model
+
+if TYPE_CHECKING:
+    from collections.abc import Iterable
+
+
+class FasterWhisperEngine:
+    def __init__(
+        self,
+        model_name: str,
+        device: str,
+        compute_type: str,
+        cache_dir: str | None = None,
+    ) -> None:
+        from faster_whisper import WhisperModel
+
+        self.model_name = model_name
+        self.device = device
+        self.compute_type = compute_type
+        self.model = WhisperModel(
+            resolve_model(model_name),
+            device=device,
+            compute_type=compute_type,
+            download_root=cache_dir,
+        )
+
+    def transcribe(
+        self,
+        audio: str,
+        *,
+        language: str | None = "ko",
+        word_timestamps: bool = False,
+        vad: bool = True,
+        hotwords: list[str] | None = None,
+        initial_prompt: str | None = None,
+        beam_size: int = 5,
+    ) -> tuple[Iterable[Any], Any]:
+        return self.model.transcribe(
+            audio,
+            language=(None if language in (None, "auto") else language),
+            word_timestamps=word_timestamps,
+            vad_filter=vad,
+            hotwords=(" ".join(hotwords) if hotwords else None),
+            initial_prompt=initial_prompt,
+            beam_size=beam_size,
+        )
@@ -0,0 +1,16 @@
+"""Logical model name → faster-whisper (CT2) identifier (plan §4-3).
+
+Standard sizes (tiny/base/small/medium/large-v3) pass through unchanged.
+turbo variants map to a verified CT2 conversion repo.
+"""
+from __future__ import annotations
+
+_MODEL_IDS: dict[str, str] = {
+    "large-v3-turbo": "deepdml/faster-whisper-large-v3-turbo-ct2",
+    "turbo": "deepdml/faster-whisper-large-v3-turbo-ct2",
+    "large-v3": "large-v3",
+}
+
+
+def resolve_model(name: str) -> str:
+    return _MODEL_IDS.get(name, name)
@@ -0,0 +1 @@
+"""Post-processing: glossary/rules + (opt-in) LLM correction + confidence (spec §7)."""
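@@ -0,0 +1,138 @@
+"""LLM correction (spec §7 stage 3 / §3.8): restores transliterated English terms using context and knowledge.
+
+Small-context-window handling (in-house GPT-4o < 30k tokens): a long transcript is **split into chunks at sentence boundaries**,
+each chunk is corrected in order, and the **English spellings already settled (running glossary)** are passed to the next chunk,
+so terminology stays consistent across the whole talk without a large window.
+
+OpenAI-compatible backend (in-house/local). **opt-in** (request correct=true) · **allowlist** (configured base_url only) ·
+**audit log** (one summary line per call). Transient failures (connection reset/timeout) are retried.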
+"""
+from __future__ import annotations
+
+import json
+import logging
+import re
+import time
+import urllib.error
+import urllib.request
+
+logger = logging.getLogger("luke_scribe.postprocess.llm")
+
+SYSTEM = (
+    "너는 한국어 STT 전사 후처리기다. 한국어 음성에 섞여 나온 영어 기술용어·고유명사가 "
+    "발음대로 한글로 음차되어 잘못 적힌 부분을 문맥과 지식으로 원래 영어 표기로 복원하라. "
+    "일반 한국어는 그대로 두고, 확실하지 않으면 바꾸지 마라. 설명 없이 교정된 전사문만 출력하라."
+)
+
+_SENT_RE = re.compile(r"(?<=[.!?。…\n])\s+")          # sentence boundary
+_TERM_RE = re.compile(r"[A-Za-z][A-Za-z0-9.+/#-]{1,}")  # English tokens for the running glossary
+_GLOSSARY_CAP = 60
+
+
+class LLMNotConfigured(RuntimeError):
+    """llm_base_url / llm_api_key is not set."""
+
+
+def _chunk(text: str, max_chars: int) -> list[str]:
+    """Pack sentences into chunks of at most max_chars. An oversized single sentence is hard-split by character."""
+    if len(text) <= max_chars:
+        return [text]
+    packed: list[str] = []
+    cur = ""
+    for part in _SENT_RE.split(text):
+        if not part:
+            continue
+        if cur and len(cur) + len(part) + 1 > max_chars:
+            packed.append(cur)
+            cur = part
+        else:
+            cur = f"{cur} {part}" if cur else part
+    if cur:
+        packed.append(cur)
+    out: list[str] = []
+    for c in packed:  # safety net: hard-split by character if a single sentence is too long
+        if len(c) > max_chars:
+            out.extend(c[i : i + max_chars] for i in range(0, len(c), max_chars))
+        else:
+            out.append(c)
+    return out
+
+
+def _terms(text: str) -> list[str]:
+    seen: dict[str, None] = {}
+    for m in _TERM_RE.finditer(text):
+        seen.setdefault(m.group(0), None)
+    return list(seen)
+
+
+def _request(
+    messages: list[dict],
+    *,
+    url: str,
+    api_key: str,
+    model: str,
+    retries: int,
+    timeout: float,
+) -> str:
+    payload = {"model": model, "temperature": 0, "messages": messages}
+    req = urllib.request.Request(
+        url,
+        data=json.dumps(payload).encode(),
+        headers={"Content-Type": "application/json", "Authorization": "Bearer " + api_key},
+    )
+    for attempt in range(1, retries + 1):
+        try:
+            with urllib.request.urlopen(req, timeout=timeout) as resp:
+                return json.loads(resp.read())["choices"][0]["message"]["content"]
+        except urllib.error.HTTPError:
+            raise  # a real HTTP response (e.g. 401/4xx): retrying is pointless
+        except (urllib.error.URLError, OSError):  # transient
+            if attempt == retries:
+                raise
+            time.sleep(1.0 * attempt)
+    raise RuntimeError("unreachable")
+
+
+def correct(
+    text: str,
+    *,
+    base_url: str | None,
+    api_key: str | None,
+    model: str = "copilot-gpt-4o",
+    max_chars: int = 3000,
+    retries: int = 4,
+    timeout: float = 90.0,
+) -> str:
+    """Restore transliterated English terms. Splits the text into chunks of max_chars (for small context windows)."""
+    if not base_url or not api_key:
+        raise LLMNotConfigured("llm_base_url/llm_api_key not set: correct requires SCRIBE_LLM_*")
+    url = base_url.rstrip("/") + "/chat/completions"
+    chunks = _chunk(text, max_chars)
+    logger.info(
+        "llm-correct egress endpoint=%s model=%s chars=%d chunks=%d",
+        url, model, len(text), len(chunks),
+    )
+    glossary: dict[str, None] = {}
+    out: list[str] = []
+    for chunk in chunks:
+        system = SYSTEM
+        if glossary:
+            system += (
+                "\n이미 이 전사에서 확정된 영문 표기: "
+                + ", ".join(glossary)
+                + ". 같은/유사 용어는 이 표기로 통일하라."
+            )
+        corrected = _request(
+            [{"role": "system", "content": system}, {"role": "user", "content": chunk}],
+            url=url,
+            api_key=api_key,
+            model=model,
+            retries=retries,
+            timeout=timeout,
+        )
+        out.append(corrected)
+        for term in _terms(corrected):
+            glossary.setdefault(term, None)
+        if len(glossary) > _GLOSSARY_CAP:
+            glossary = dict(list(glossary.items())[-_GLOSSARY_CAP:])
+    return " ".join(out).strip()
@@ -0,0 +1,18 @@
+"""Deterministic normalization (spec §7 stage 2). Applied after LLM restoration to fix the exact spelling.
+
+Finding: the LLM restores 'Embedding Gemma'; rules normalize it to the official spelling 'EmbeddingGemma'.
+"""
+from __future__ import annotations
+
+# Built-in default map (extensible via config/glossary)
+DEFAULT_RULES: dict[str, str] = {
+    "Embedding Gemma": "EmbeddingGemma",
+    "embedding gemma": "EmbeddingGemma",
+    "Google for developers": "Google for Developers",
+}
+
+
+def normalize(text: str, extra: dict[str, str] | None = None) -> str:
+    for src, dst in {**DEFAULT_RULES, **(extra or {})}.items():
+        text = text.replace(src, dst)
+    return text
@@ -0,0 +1 @@
+"""Result formats and retention (spec §4). MVP: output formats (txt/srt/vtt)."""
@@ -0,0 +1,45 @@
+"""Segments → txt/srt/vtt conversion (spec §4, AC-9). A segment is dict{start,end,text}."""
+from __future__ import annotations
+
+from collections.abc import Sequence
+
+Segment = dict  # {"start": float, "end": float, "text": str}
+
+
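+def _hms(t: float) -> tuple[int, int, int, int]:
+    # Round to whole milliseconds once, then split with divmod so a carry
+    # propagates through seconds, minutes, and hours (59.9996 s becomes
+    # 00:01:00.000 rather than 00:00:60.000).
+    total_ms = int(round(max(0.0, t) * 1000))
+    total_s, ms = divmod(total_ms, 1000)
+    h, rem = divmod(total_s, 3600)
+    m, s = divmod(rem, 60)
+    return h, m, s, ms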
+
+
+def _ts_srt(t: float) -> str:
+    h, m, s, ms = _hms(t)
+    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
+
+
+def _ts_vtt(t: float) -> str:
+    h, m, s, ms = _hms(t)
+    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"
+
+
+def to_txt(segments: Sequence[Segment]) -> str:
+    return "\n".join(s["text"].strip() for s in segments)
+
+
+def to_srt(segments: Sequence[Segment]) -> str:
+    out: list[str] = []
+    for i, s in enumerate(segments, 1):
+        out += [str(i), f"{_ts_srt(s['start'])} --> {_ts_srt(s['end'])}", s["text"].strip(), ""]
+    return "\n".join(out).strip() + "\n"
+
+
+def to_vtt(segments: Sequence[Segment]) -> str:
+    out: list[str] = ["WEBVTT", ""]
+    for s in segments:
+        out += [f"{_ts_vtt(s['start'])} --> {_ts_vtt(s['end'])}", s["text"].strip(), ""]
+    return "\n".join(out).strip() + "\n"
@@ -0,0 +1,86 @@
+"""API tests via FastAPI TestClient. The engine is monkeypatched with a fake to avoid loading a model."""
+from __future__ import annotations
+
+from types import SimpleNamespace
+
+import pytest
+from fastapi.testclient import TestClient
+
+import luke_scribe.api.routes.transcribe as route
+from luke_scribe.api.app import create_app
+from luke_scribe.config import settings
+
+
+class _FakeSeg:
+    def __init__(self, start: float, end: float, text: str) -> None:
+        self.start = start
+        self.end = end
+        self.text = text
+
+
+class _FakeEngine:
+    def transcribe(self, _audio, **_kw):
+        return [_FakeSeg(0.0, 1.0, "안녕 vLLM"), _FakeSeg(1.0, 2.0, "두번째")], SimpleNamespace(
+            language="ko"
+        )
+
+
+@pytest.fixture
+def client(monkeypatch):
+    monkeypatch.setattr(route, "get_engine", lambda *a, **k: _FakeEngine())
+    monkeypatch.setattr(
+        route, "probe_media", lambda p: SimpleNamespace(path=p, duration_s=2.0, size_bytes=1234)
+    )
+    monkeypatch.setattr(settings, "api_keys", ["testkey"])
+    return TestClient(create_app())
+
+
+def _files():
+    return {"file": ("a.wav", b"RIFF0000WAVE", "audio/wav")}
+
+
+def test_health(client):
+    assert client.get("/health").json() == {"status": "ok"}
+
+
+def test_requires_key(client):
+    assert client.post("/v1/transcribe", files=_files()).status_code == 401
+
+
+def test_transcribe_ok(client):
+    r = client.post(
+        "/v1/transcribe", files=_files(), headers={"X-API-Key": "testkey"}, data={"language": "ko"}
+    )
+    assert r.status_code == 200
+    body = r.json()
+    assert body["segments"][0]["text"] == "안녕 vLLM"
+    assert body["model_used"]
+    assert body["corrected"] is False
+
+
+def test_413(client, monkeypatch):
+    monkeypatch.setattr(
+        route, "probe_media", lambda p: SimpleNamespace(path=p, duration_s=999999, size_bytes=1)
+    )
+    r = client.post("/v1/transcribe", files=_files(), headers={"X-API-Key": "testkey"})
+    assert r.status_code == 413
+
+
+def test_srt_format(client):
+    r = client.post(
+        "/v1/transcribe",
+        files=_files(),
+        headers={"X-API-Key": "testkey"},
+        data={"response_format": "srt"},
+    )
+    assert r.status_code == 200
+    assert "00:00:00,000 --> 00:00:01,000" in r.text
+
+
+def test_correct_path(client, monkeypatch):
+    monkeypatch.setattr(route.llm_correct, "correct", lambda text, **k: text + " [보정]")
+    r = client.post(
+        "/v1/transcribe", files=_files(), headers={"X-API-Key": "testkey"}, data={"correct": "true"}
+    )
+    assert r.status_code == 200
+    assert r.json()["corrected"] is True
@@ -0,0 +1,79 @@
+"""Decision logic for Device Manager capability tier / precision / overrides (plan §8 unit).
+
+Real hardware only exercises T0, so T1-T3 are verified with synthetic VRAM values.
+"""
+from __future__ import annotations
+
+from luke_scribe.devices import manager as m
+from luke_scribe.devices.manager import DeviceManager
+from luke_scribe.devices.profile import CapabilityTier
+from luke_scribe.devices.vram_probe import GpuInfo
+
+
+def _patch(monkeypatch, gpus: list[GpuInfo]) -> None:
+    monkeypatch.setattr(m, "probe_gpus", lambda: gpus)
+    monkeypatch.setattr(m, "probe_ram_mb", lambda: 16000)
+    monkeypatch.setattr(m, "probe_disk_free_mb", lambda path=".": 100000)
+
+
+def _gpu(cc: tuple[int, int], free: int, name: str = "TestGPU") -> GpuInfo:
+    return GpuInfo(0, name, cc, free + 100, free)
+
+
+def test_no_gpu_is_t0_cpu(monkeypatch):
+    _patch(monkeypatch, [])
+    p = DeviceManager.detect()
+    assert p.kind == "cpu"
+    assert p.tier == CapabilityTier.T0_CPU
+    assert p.compute_type == "int8"
+
+
+def test_weak_pascal_downgrades_to_cpu(monkeypatch):
+    # GTX 1050: cc6.1, free 1990 → not enough for turbo (int8, 2340 MB headroom) → downgrade to CPU
+    _patch(monkeypatch, [_gpu((6, 1), 1990, "GTX 1050")])
+    p = DeviceManager.detect()
+    assert p.tier == CapabilityTier.T0_CPU
+    assert p.kind == "cpu"
+    assert p.vram_free_mb == 1990  # GPU info is preserved (transparency)
+    assert any("강등" in n for n in p.notes)
+
+
+def test_t1_turbo_only(monkeypatch):
+    # cc7.5, free 6000 → int8_float16; turbo fits, large-v3 does not
+    _patch(monkeypatch, [_gpu((7, 5), 6000)])
+    p = DeviceManager.detect()
+    assert p.tier == CapabilityTier.T1_TURBO_GPU
+    assert p.compute_type == "int8_float16"
+    assert p.served_models["batch"].startswith("large-v3-turbo")
+
+
+def test_t2_swap(monkeypatch):
+    # cc7.5, free 16000 → float16; turbo and large-v3 each fit alone, but not co-resident
+    _patch(monkeypatch, [_gpu((7, 5), 16000)])
+    p = DeviceManager.detect()
+    assert p.tier == CapabilityTier.T2_SWAP
+    assert p.compute_type == "float16"
+    assert "swap" in p.served_models["batch"]
+
+
+def test_t3_coresident(monkeypatch):
+    # A100-class: cc8.0, free 40000 → float16; turbo + large-v3 co-resident
+    _patch(monkeypatch, [_gpu((8, 0), 40000, "A100")])
+    p = DeviceManager.detect()
+    assert p.tier == CapabilityTier.T3_CORESIDENT
+    assert p.compute_type == "float16"
+    assert p.served_models["batch"] == "large-v3@cuda"
+    assert p.max_workers >= 1
+
+
+def test_force_cpu_override(monkeypatch):
+    _patch(monkeypatch, [_gpu((8, 0), 40000)])
+    p = DeviceManager.detect(force_device="cpu")
+    assert p.tier == CapabilityTier.T0_CPU
+    assert p.kind == "cpu"
+
+
+def test_workers_override(monkeypatch):
+    _patch(monkeypatch, [_gpu((8, 0), 40000)])
+    p = DeviceManager.detect(workers_override=3)
+    assert p.max_workers == 3
@@ -0,0 +1,23 @@
+"""Lightweight unit tests for engine.model_registry / audio.ingest (no model load required)."""
+from __future__ import annotations
+
+import pytest
+
+from luke_scribe.audio.ingest import probe_media
+from luke_scribe.engine.model_registry import resolve_model
+
+
+def test_resolve_model_turbo_maps_to_ct2_repo():
+    expected = "deepdml/faster-whisper-large-v3-turbo-ct2"
+    assert resolve_model("large-v3-turbo") == expected
+    assert resolve_model("turbo") == expected
+
+
+def test_resolve_model_standard_passthrough():
+    assert resolve_model("tiny") == "tiny"
+    assert resolve_model("large-v3") == "large-v3"
+
+
+def test_probe_media_missing_raises():
+    with pytest.raises(FileNotFoundError):
+        probe_media("/no/such/file.wav")
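The two `resolve_model` tests above pin down an alias-or-passthrough behavior. A minimal sketch that satisfies them, assuming a plain dictionary lookup; `resolve_model_sketch` and `_ALIASES` are hypothetical names, and the real `model_registry` may be implemented differently:

```python
TURBO_CT2_REPO = "deepdml/faster-whisper-large-v3-turbo-ct2"

# Hypothetical alias table: only the two names exercised by the tests.
_ALIASES = {
    "turbo": TURBO_CT2_REPO,
    "large-v3-turbo": TURBO_CT2_REPO,
}


def resolve_model_sketch(name: str) -> str:
    """Map turbo aliases to the CT2 repo; pass every other name through."""
    return _ALIASES.get(name, name)
```

Standard model names such as `tiny` or `large-v3` are returned unchanged, which leaves their resolution to the downstream loader.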
@@ -0,0 +1,25 @@
+"""results.formats — txt/srt/vtt."""
+from __future__ import annotations
+
+from luke_scribe.results import formats
+
+SEGS = [
+    {"start": 0.0, "end": 1.5, "text": "안녕 world"},
+    {"start": 1.5, "end": 3.0, "text": "두번째"},
+]
+
+
+def test_txt():
+    assert formats.to_txt(SEGS) == "안녕 world\n두번째"
+
+
+def test_srt():
+    out = formats.to_srt(SEGS)
+    assert "1\n00:00:00,000 --> 00:00:01,500\n안녕 world" in out
+    assert "2\n00:00:01,500 --> 00:00:03,000\n두번째" in out
+
+
+def test_vtt():
+    out = formats.to_vtt(SEGS)
+    assert out.startswith("WEBVTT")
+    assert "00:00:00.000 --> 00:00:01.500" in out
@@ -0,0 +1,59 @@
+"""postprocess.rules / postprocess.llm (urllib monkeypatch)."""
+from __future__ import annotations
+
+import json
+
+import pytest
+
+from luke_scribe.postprocess import llm, rules
+
+
+def test_rules_normalize():
+    assert rules.normalize("구글 Embedding Gemma 소개") == "구글 EmbeddingGemma 소개"
+    assert rules.normalize("그대로") == "그대로"
+
+
+def test_llm_not_configured():
+    with pytest.raises(llm.LLMNotConfigured):
+        llm.correct("x", base_url=None, api_key=None)
+
+
+class _FakeResp:
+    def __init__(self, payload: dict) -> None:
+        self._p = payload
+
+    def read(self) -> bytes:
+        return json.dumps(self._p).encode()
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, *_a):
+        return False
+
+
+def test_llm_correct_monkeypatched(monkeypatch):
+    def fake_urlopen(_req, timeout=90):  # noqa: ARG001
+        return _FakeResp({"choices": [{"message": {"content": "EmbeddingGemma 복원됨"}}]})
+
+    monkeypatch.setattr(llm.urllib.request, "urlopen", fake_urlopen)
+    out = llm.correct("인베딩 점마", base_url="http://x/v1", api_key="k", model="m")
+    assert out == "EmbeddingGemma 복원됨"
+
+
+def test_llm_chunking_and_glossary(monkeypatch):
+    """Long input → chunk splitting + running glossary (for small context windows)."""
+    calls: list[list[dict]] = []
+
+    def fake_request(messages, **_kw):
+        calls.append(messages)
+        return messages[1]["content"]  # echo the chunk unchanged
+
+    monkeypatch.setattr(llm, "_request", fake_request)
+    long_text = ". ".join(f"문장{i} EmbeddingGemma 설명" for i in range(400))
+    out = llm.correct(long_text, base_url="http://x/v1", api_key="k", max_chars=200)
+
+    assert len(calls) > 1  # was split
+    assert "EmbeddingGemma" in out  # was reassembled
+    # From the 2nd chunk on, previously confirmed English spellings are injected into the system message
+    assert any("확정된 영문 표기" in m[0]["content"] for m in calls[1:])
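The chunking test above exercises sentence-boundary splitting plus a running glossary. A minimal sketch of that idea, assuming a greedy packer and a simple regex for English terms; `split_chunks`, `correct_chunked` and the `correct_one` callback are hypothetical names, and the real `llm.correct` may differ in prompts, term extraction and splitting rules:

```python
import re
from typing import Callable


def split_chunks(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sent in sentences:
        # Safety net: an oversized single sentence is cut by character count.
        pieces = [sent[i:i + max_chars] for i in range(0, len(sent), max_chars)] or [sent]
        for piece in pieces:
            if current and len(current) + 1 + len(piece) > max_chars:
                chunks.append(current)
                current = piece
            else:
                current = f"{current} {piece}" if current else piece
    if current:
        chunks.append(current)
    return chunks


def correct_chunked(
    text: str,
    correct_one: Callable[[str, list[str]], str],
    max_chars: int = 3000,
) -> str:
    """Correct chunk by chunk, passing earlier English terms to later chunks."""
    glossary: list[str] = []
    out: list[str] = []
    for chunk in split_chunks(text, max_chars):
        fixed = correct_one(chunk, list(glossary))
        # Assumed heuristic: any ASCII word of 3+ characters counts as a term.
        for term in re.findall(r"[A-Za-z][A-Za-z0-9.+-]{2,}", fixed):
            if term not in glossary:
                glossary.append(term)
        out.append(fixed)
    return " ".join(out)
```

Passing the glossary forward lets each small request see the spellings already settled in earlier chunks, which is how terminology can stay consistent without a large context window.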
Author	SHA1	Message	Date
lukehemmin	a5e6d56568	docs: add Colab notebook for full-talk transcription (notebooks/colab_full_transcribe.ipynb) GPU (T4) cells: ffmpeg+uv → anonymous clone → uv sync (engine+gpu) → detect → audio upload → full transcription with large-v3-turbo → transcript.txt download. (Colab cannot reach the in-house gate, so it is transcription-only; correction runs on-prem.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:33:54 +09:00
lukehemmin	cd2f807557	chore(omc): hotpaths (beam-size/correct/COLAB) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:29:37 +09:00
lukehemmin	7a8cc12cb3	feat(cli): --beam-size + --correct; add COLAB.md GPU full-transcribe guide - transcribe: --beam-size (CPU speed), --correct (chunked correction via the in-house LLM, SCRIBE_LLM_*), config.beam_size (1-2 recommended on CPU). With correction, the full transcript is collected first and printed in one go. - COLAB.md: guide for Colab (transcription-only; gate unreachable) + on-prem GPU (full transcription+correction pipeline). 23 tests pass, ruff clean. Verified that --correct fails gracefully when the LLM is not configured. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:29:37 +09:00
lukehemmin	1a91060c43	chore(omc): hotpaths (chunked correction) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:09:51 +09:00
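lukehemmin	b721ca6419	feat(api): chunk LLM correction for small context windows (+running glossary) To fit the in-house GPT-4o context (<30k), long transcripts are split into chunks at sentence boundaries, and the English terms from each corrected chunk are passed as a 'running glossary' to the next chunk's system message → terminology stays consistent across the whole talk without a large window. config.llm_max_chars (default 3000; ~8k window→1500 / ~16k→3000 / ~30k→6000). Safety net: an oversized single sentence is force-split by character count. 23 tests pass (including chunk splitting / glossary injection), ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:09:51 +09:00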
lukehemmin	1ea96c36c8	chore(omc): record GPT-4o correction finding + P2 API progress (hotpaths) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 23:20:01 +09:00
lukehemmin	8f6f8969fd	feat(api): sync test API (serve) + opt-in LLM correction + cloudflared tunnel - api/: FastAPI app, X-API-Key auth (temporary key when unset), load-once engine pool (+transcribe lock), POST /v1/transcribe (multipart, synchronous), /health, /v1/system, /v1/models. Uploaded temp files are deleted in finally (privacy). - postprocess/: llm.correct (promoted from scripts/llm_correct.py; opt-in, allowlist, audit log, retries) + rules.normalize (normalizes EmbeddingGemma etc.). - results/formats.py: txt/srt/vtt. connectivity/tunnel.py: cloudflared quick tunnel (Colab). - cli serve: single-worker uvicorn + --tunnel cloudflare; config llm_* fields; pyproject api/queue extras split (+python-multipart, dev httpx). Verification: 22 unit tests (API TestClient, formats, postprocess) + live-server e2e (/health, auth 401, real transcription (JFK), SRT, temp-file deletion). Korean quality requires turbo/large-v3 (tiny degenerates on Korean). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 23:20:01 +09:00
lukehemmin	480a36edfe	chore: scaffold samples/ko_en/ (clips/ + manifest template) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 15:14:25 +09:00
lukehemmin	45690371c3	docs: add samples/ bench dataset spec (KO+EN) + broaden audio gitignore Document the exact format for the KO+EN labeled clips that the bench gate needs (manifest.jsonl + ground-truth text + optional entities). Ignore audio/video under samples/** while keeping manifests tracked. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 15:12:20 +09:00
lukehemmin	518c03174a	chore(omc): record P1 progress note (engine+transcribe) + hotpaths Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 15:08:07 +09:00
lukehemmin	73380bebf9	feat(p1): faster-whisper engine + audio ingest + transcribe (CPU verified) - engine/: FasterWhisperEngine wrapper + model_registry (turbo→CT2 repo) - audio/ingest.py: ffprobe duration/size probe + 413 limit hook - cli transcribe: device-auto, model override, 413 guard, model_used output - 3 unit tests (resolve_model, probe_media); README updated. Verification (CPU): JFK 11s clip → accurate transcription, detected_lang=en. 10 tests pass, ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 15:07:41 +09:00
lukehemmin	d75d60671e	chore(omc): seed build commands + hotpaths from P1 scaffolding Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 12:56:07 +09:00
lukehemmin	5d2604105b	feat(p1): scaffolding + Device Manager / VRAM probe + CLI detect - pyproject (uv, src layout) + extras: engine/gpu/api/diarize/llm - config.py (pydantic-settings, SCRIBE_ env) - devices/: vram_probe (NVML/psutil/disk) + DeviceManager → capability tier T0–T3, precision by cc/VRAM, worker estimate (plan §3.6, AC-2/3) - cli.py (typer): detect (implemented) + transcribe/bench/serve (stubs) - run.sh, .env.example, README Verified on GTX 1050/2GB: detect → T0_CPU (turbo doesn't fit → explicit downgrade, fail-explicit). Overrides (--device/--workers) work. 7 unit tests cover T0–T3 + overrides via synthetic VRAM. ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 12:56:07 +09:00
				`@@ -0,0 +1 @@`
				`"""HTTP API (FastAPI): synchronous test API. Async queue/real-time are P2/P3."""`
				`@@ -0,0 +1 @@`
				`"""External exposure: for environments without a public IP, such as Colab (spec §8). MVP: cloudflared quick tunnel."""`
				`@@ -0,0 +1 @@`
				`"""Post-processing: glossary/rules + (opt-in) LLM correction + confidence (spec §7)."""`
				`@@ -0,0 +1 @@`
				`"""Result formats and retention (spec §4). MVP: output formats (txt/srt/vtt)."""`