docs: add Colab notebook for full-talk transcription (notebooks/colab_full_transcribe.ipynb)

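GPU (T4) cells: ffmpeg + uv → anonymous clone → uv sync (engine + gpu) → detect → audio upload → full transcription with large-v3-turbo → transcript.txt download. (Colab cannot reach the internal LLM gateway, so it is transcription-only; correction runs on-prem.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>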
chore(omc): hotpaths (beam-size/correct/COLAB)
2026-06-09 07:33:54 +09:00 · 2026-06-09 07:29:37 +09:00 · 2026-06-09 07:29:37 +09:00 · 2026-06-09 07:09:51 +09:00 · 2026-06-09 07:09:51 +09:00
8 changed files with 398 additions and 59 deletions
@@ -30,7 +30,7 @@
  },
  "build": {
    "buildCommand": null,
-    "testCommand": "export PATH=\"$HOME/.local/bin:$HOME/.cargo/bin:$PATH\"\nclip=\"samples/ko_en/clips/GDG 인천 - EmbeddingGemma 200% 활용하기 - 지주영.m4a\"\nffmpeg -nostdin -ss 70 -t 12 -i \"$clip\" -ac 1 -ar 16000 -y /tmp/api_smoke.wav 2>/dev/null\nls -l /tmp/api_smoke.wav\necho \"=== pytest recheck (after 413 fix) ===\"; uv run pytest -q 2>&1 | tail -3",
+    "testCommand": "export PATH=\"$HOME/.local/bin:$HOME/.cargo/bin:$PATH\"\necho \"=== ruff ===\"; uv run ruff check src/ tests/ && echo clean\necho \"=== pytest ===\"; uv run pytest -q 2>&1 | tail -2\necho \"=== --correct path (no config → graceful error) ===\"\nuv run luke-scribe transcribe /tmp/jfk.flac --model tiny --language en --correct 2>&1 | tail -4; echo \"exit=${PIPESTATUS[0]}\"",
    "lintCommand": "ruff check",
    "devCommand": null,
    "scripts": {}
@@ -112,15 +112,21 @@
  },
  "hotPaths": [
    {
-      "path": "scripts/llm_correct.py",
-      "accessCount": 4,
-      "lastAccessed": 1780925584647,
+      "path": "src/luke_scribe/cli.py",
+      "accessCount": 8,
+      "lastAccessed": 1780957705972,
      "type": "file"
    },
    {
-      "path": "src/luke_scribe/cli.py",
+      "path": "src/luke_scribe/config.py",
+      "accessCount": 5,
+      "lastAccessed": 1780957473801,
+      "type": "file"
+    },
+    {
+      "path": "scripts/llm_correct.py",
      "accessCount": 4,
-      "lastAccessed": 1780927984393,
+      "lastAccessed": 1780925584647,
      "type": "file"
    },
    {
@@ -136,15 +142,21 @@
      "type": "file"
    },
    {
-      "path": "src/luke_scribe/config.py",
+      "path": "src/luke_scribe/postprocess/llm.py",
      "accessCount": 3,
-      "lastAccessed": 1780927884587,
+      "lastAccessed": 1780956524689,
      "type": "file"
    },
    {
      "path": "src/luke_scribe/api/routes/transcribe.py",
+      "accessCount": 3,
+      "lastAccessed": 1780956549345,
+      "type": "file"
+    },
+    {
+      "path": "tests/test_postprocess.py",
      "accessCount": 2,
-      "lastAccessed": 1780928097713,
+      "lastAccessed": 1780956556589,
      "type": "file"
    },
    {
@@ -267,12 +279,6 @@
      "lastAccessed": 1780927897308,
      "type": "file"
    },
-    {
-      "path": "src/luke_scribe/postprocess/llm.py",
-      "accessCount": 1,
-      "lastAccessed": 1780927908123,
-      "type": "file"
-    },
    {
      "path": "src/luke_scribe/api/__init__.py",
      "accessCount": 1,
@@ -327,17 +333,17 @@
      "lastAccessed": 1780928016400,
      "type": "file"
    },
-    {
-      "path": "tests/test_postprocess.py",
-      "accessCount": 1,
-      "lastAccessed": 1780928018944,
-      "type": "file"
-    },
    {
      "path": "tests/test_api.py",
      "accessCount": 1,
      "lastAccessed": 1780928028187,
      "type": "file"
+    },
+    {
+      "path": "COLAB.md",
+      "accessCount": 1,
+      "lastAccessed": 1780957731994,
+      "type": "file"
    }
  ],
  "userDirectives": [
@@ -0,0 +1,79 @@
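+# Colab / GPU full-transcription guide
+
+Transcribe a **full talk quickly** (with optional correction) in a GPU environment (Colab T4/A100 or an on-prem GPU).
+On CPU (the dev box) a full talk is slow (turbo runs at roughly RTF 5×) and is not recommended; run it here instead.
+On a GPU (T4), turbo runs at roughly 0.1-0.3× real time, so **a 37-minute talk takes a few minutes**.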
+
+---
+
+## A) Google Colab: transcription only
+
+> Colab is an external cloud, so it **cannot reach the internal LLM gateway (192.168.0.123)** → no `--correct` (correction), **transcription only**.
+> Runtime → Change runtime type → select **GPU (T4)**.
+
+```python
+# 1) System dependencies + uv
+!apt-get -qq update && apt-get -qq install -y ffmpeg
+!curl -LsSf https://astral.sh/uv/install.sh | sh
+import os; os.environ["PATH"] = "/root/.local/bin:" + os.environ["PATH"]
+
+# 2) Code (the repository allows anonymous read)
+!git clone -b feat/p1-core https://git.lukehemmin.com/lukehemmin/luke_scribe.git
+%cd luke_scribe
+
+# 3) Dependencies (engine + GPU CUDA runtime)
+!uv sync --extra engine --extra gpu
+
+# 4) Check GPU detection (tier T3 keeps turbo and large-v3 resident together)
+!uv run luke-scribe detect
+
+# 5) Upload audio (or mount Drive)
+from google.colab import files
+AUDIO = list(files.upload().keys())[0]
+
+# 6) Full transcription (large-v3-turbo); for higher accuracy use --model large-v3
+!uv run luke-scribe transcribe "$AUDIO" --model large-v3-turbo --language ko --timestamps | tee transcript.txt
+```
+
+### Exposing Colab externally as an API
+```python
+# Get a public cloudflared URL, then curl it from outside
+!uv sync --extra engine --extra gpu --extra api
+import os
+os.environ["SCRIBE_API_KEYS"] = '["colab-test"]'
+!nohup uv run luke-scribe serve --host 0.0.0.0 --port 8000 --tunnel cloudflare > serve.log 2>&1 &
+import time; time.sleep(8); print(open("serve.log").read())   # check the public *.trycloudflare.com URL
+```
+
+---
+
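+## B) On-prem GPU: transcription + internal LLM correction (full pipeline)
+
+On a GPU machine inside the internal network (able to reach the gateway at 192.168.0.123), a single run covers everything **up to restoring transliterated terms to their English spelling**: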
+
+```bash
+git clone -b feat/p1-core https://git.lukehemmin.com/lukehemmin/luke_scribe.git && cd luke_scribe
+uv sync --extra engine --extra gpu
+
+export SCRIBE_LLM_BASE_URL=http://192.168.0.123:8080/v1
+export SCRIBE_LLM_API_KEY=<internal key>     # mind your shell history
+export SCRIBE_LLM_MODEL=copilot-gpt-4o
+export SCRIBE_LLM_MAX_CHARS=3000             # match the internal LLM's context window (~8k→1500 / ~16k→3000 / ~30k→6000)
+
+# Transcription + chunked correction in one command
+uv run luke-scribe transcribe talk.m4a --model large-v3-turbo --language ko --correct | tee transcript.txt
+```
+
+Via the API:
+```bash
+uv run luke-scribe serve                     # use the printed X-API-Key
+curl -H "X-API-Key: <key>" -F file=@talk.m4a -F model=large-v3-turbo -F correct=true \
+     http://localhost:8000/v1/transcribe
+```
+
+---
+
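+## Notes
+- Correction splits a long transcript into `SCRIBE_LLM_MAX_CHARS`-sized chunks and carries a **running glossary** between them (to cope with small context windows).
+- A weak GPU (e.g. 1050 / 2 GB) cannot fit even turbo and is automatically demoted to **CPU (T0)**; check the tier with `detect`.
+- Audio files are not in the repository (`.gitignore`); use a Colab upload/Drive or a local path on-prem.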
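The speed figures quoted in the guide above reduce to real-time-factor (RTF) arithmetic: processing time is roughly audio length times RTF. A minimal sketch, assuming the guide's rough numbers (0.1-0.3× for turbo on a T4, about 5× on CPU); `wall_minutes` is a hypothetical helper for illustration, not part of luke_scribe:

```python
def wall_minutes(audio_minutes: float, rtf: float) -> float:
    """Estimated processing time: audio duration times the real-time factor."""
    return audio_minutes * rtf


TALK_MIN = 37  # talk length quoted in the guide

# Rough figures from the guide, not measurements.
gpu_best = wall_minutes(TALK_MIN, 0.1)
gpu_worst = wall_minutes(TALK_MIN, 0.3)
cpu = wall_minutes(TALK_MIN, 5.0)

print(f"T4:  {gpu_best:.1f} to {gpu_worst:.1f} min")
print(f"CPU: {cpu:.0f} min (about {cpu / 60:.1f} h)")
```

Under these assumptions the same talk takes on the order of minutes on a T4 and about three hours on CPU, which is why the guide steers full talks to a GPU.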
@@ -0,0 +1,130 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
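+    "# luke_scribe: full-talk transcription on Colab\n",
+    "\n",
+    "Transcribes a full talk in **minutes** on a GPU (T4).\n",
+    "\n",
+    "**First:** Runtime → Change runtime type → set the hardware accelerator to **GPU**.\n",
+    "\n",
+    "> ⚠️ Colab is external, so it **cannot reach the internal LLM gateway (192.168.0.123)** → no correction (`--correct`), **transcription only**. For correction, use a GPU inside the internal network (section B of `COLAB.md` in the repo).\n"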
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 0) Check the GPU (if missing, switch the runtime type to GPU)\n",
+    "!nvidia-smi -L || echo \"No GPU → switch the runtime type to GPU\"\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 1) System dependencies + uv\n",
+    "!apt-get -qq update && apt-get -qq install -y ffmpeg\n",
+    "!curl -LsSf https://astral.sh/uv/install.sh | sh\n",
+    "import os\n",
+    "os.environ['PATH'] = '/root/.local/bin:' + os.environ['PATH']\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 2) Fetch the code (the repository allows anonymous read)\n",
+    "!git clone -b feat/p1-core https://git.lukehemmin.com/lukehemmin/luke_scribe.git\n",
+    "%cd luke_scribe\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 3) Dependencies (engine + GPU CUDA runtime); takes a few minutes\n",
+    "!uv sync --extra engine --extra gpu\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 4) Check the hardware tier (T3 = turbo and large-v3 resident together)\n",
+    "!uv run luke-scribe detect\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 5) Upload the talk audio (m4a/mp3/wav/mp4 …)\n",
+    "from google.colab import files\n",
+    "AUDIO = list(files.upload().keys())[0]\n",
+    "print('Uploaded:', AUDIO)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 6) Full transcription (large-v3-turbo; for higher accuracy use --model large-v3)\n",
+    "!uv run luke-scribe transcribe \"$AUDIO\" --model large-v3-turbo --language ko --timestamps | tee transcript.txt\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# 7) Download the transcript\n",
+    "from google.colab import files\n",
+    "files.download('transcript.txt')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
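+    "## Notes\n",
+    "- **Model**: `large-v3-turbo` (fast) ↔ `large-v3` (accurate). If `detect` reports T0 (CPU), the GPU is too weak (slow).\n",
+    "- **Correction (transliteration → English)**: not possible on Colab (gateway unreachable). Use `--correct` + `SCRIBE_LLM_*` on a GPU inside the internal network (section B of `COLAB.md`).\n",
+    "- **Speed**: T4 turbo ≈ 0.1-0.3× real time → a 37-minute talk takes a few minutes.\n"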
+   ]
+  }
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "provenance": [],
+   "gpuType": "T4"
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
@@ -93,6 +93,7 @@ def transcribe_ep(  # noqa: PLR0913 — 요청 옵션 다수(스펙 options 스
                        base_url=settings.llm_base_url,
                        api_key=settings.llm_api_key,
                        model=settings.llm_model,
+                        max_chars=settings.llm_max_chars,
                    )
                )
                corrected = True
@@ -55,6 +55,8 @@ def transcribe(
    device: str = typer.Option("auto", help="auto|cpu|cuda"),
    word_timestamps: bool = typer.Option(False, "--word-timestamps"),
    vad: bool = typer.Option(True, "--vad/--no-vad", help="Strip silence"),
+    beam_size: int = typer.Option(None, "--beam-size", help="Decoding beam (1-2 recommended on CPU for speed)"),
+    correct: bool = typer.Option(False, "--correct", help="Internal LLM correction (requires SCRIBE_LLM_* settings)"),
    timestamps: bool = typer.Option(False, "--timestamps", help="Show segment [start–end]"),
 ) -> None:
    """One-shot file transcription (faster-whisper, automatic CPU/GPU, part of AC-4)."""
@@ -90,17 +92,45 @@ def transcribe(
    )

    engine = FasterWhisperEngine(model_name, dev, profile.compute_type, cache_dir=settings.model_cache_dir)
-    segments, tinfo = engine.transcribe(file, language=lang, word_timestamps=word_timestamps, vad=vad)
+    segments, tinfo = engine.transcribe(
+        file, language=lang, word_timestamps=word_timestamps, vad=vad,
+        beam_size=(beam_size or settings.beam_size),
+    )

-    count = 0
+    seg_list = []
    for seg in segments:
-        count += 1
+        seg_list.append({"start": seg.start, "end": seg.end, "text": seg.text.strip()})
+        if not correct:  # stream output (with correction, collect everything and print once)
            if timestamps:
                console.print(f"[cyan][{seg.start:6.2f}–{seg.end:6.2f}][/] {seg.text.strip()}")
            else:
                console.print(seg.text.strip())
+
+    if correct:
+        from .postprocess import llm as llm_correct
+        from .postprocess import rules
+
+        text = " ".join(s["text"] for s in seg_list).strip()
+        try:
+            text = rules.normalize(
+                llm_correct.correct(
+                    text,
+                    base_url=settings.llm_base_url,
+                    api_key=settings.llm_api_key,
+                    model=settings.llm_model,
+                    max_chars=settings.llm_max_chars,
+                )
+            )
+        except llm_correct.LLMNotConfigured as exc:
+            console.print(f"[red]--correct:[/] {exc}")
+            raise typer.Exit(code=1) from exc
+        console.print(text)
+
    detected = getattr(tinfo, "language", None)
-    console.print(f"[green]✓ {count} segments · detected_lang={detected} · model_used={model_name}[/]")
+    console.print(
+        f"[green]✓ {len(seg_list)} segments · detected_lang={detected} · "
+        f"model_used={model_name} · corrected={correct}[/]"
+    )


@app.command()
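The `beam_size or settings.beam_size` expression in the hunk above relies on Python truthiness, so it is worth seeing which inputs fall back to the configured default. A minimal sketch, assuming the default of 5 from `Settings`; `resolve_beam_size` is a hypothetical name used only for illustration:

```python
from typing import Optional

DEFAULT_BEAM_SIZE = 5  # mirrors the `beam_size: int = 5` default in Settings


def resolve_beam_size(cli_value: Optional[int], default: int = DEFAULT_BEAM_SIZE) -> int:
    """Same expression as the CLI: any falsy flag value falls back to the default."""
    return cli_value or default


print(resolve_beam_size(None))  # flag omitted
print(resolve_beam_size(1))     # explicit small beam, e.g. on CPU
print(resolve_beam_size(0))     # 0 is falsy, so it also falls back
```

One consequence of using `or` instead of an explicit `is None` check is that `--beam-size 0` is silently treated as "not set" instead of being rejected.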
@@ -15,6 +15,7 @@ class Settings(BaseSettings):
    device: str = "auto"
    compute_type: str | None = None      # None = auto (based on cc/VRAM)
    workers: int | None = None           # None = computed automatically
+    beam_size: int = 5                    # decoding beam (1-2 recommended on CPU for speed; 5 on GPU)

    # Language (default ko, overridable per request)
    language: str = "ko"
@@ -43,6 +44,8 @@ class Settings(BaseSettings):
    llm_base_url: str | None = None      # e.g. http://192.168.0.123:8080/v1 (allowlist = this endpoint only)
    llm_api_key: str | None = None       # injected only via env SCRIBE_LLM_API_KEY
    llm_model: str = "copilot-gpt-4o"
+    # Correction chunk size (characters); tune to the internal LLM's context window (e.g. ~8k window→1500, ~16k→3000, ~30k→6000)
+    llm_max_chars: int = 3000


 settings = Settings()
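The comment on `llm_max_chars` gives three reference points for sizing chunks against the context window. A minimal sketch of turning those points into a lookup, assuming they are meant as conservative steps and not an exact formula; `SUGGESTED` and `suggest_max_chars` are hypothetical and do not exist in the repository:

```python
import bisect

# (context window in tokens, suggested max_chars) pairs quoted in the config comment
SUGGESTED = [(8_000, 1500), (16_000, 3000), (30_000, 6000)]


def suggest_max_chars(context_tokens: int) -> int:
    """Pick the suggestion for the largest listed window that fits context_tokens."""
    windows = [w for w, _ in SUGGESTED]
    idx = bisect.bisect_right(windows, context_tokens) - 1
    # Below the smallest listed window, fall back to the smallest suggestion.
    return SUGGESTED[max(idx, 0)][1]


for tokens in (8_000, 20_000, 128_000):
    print(tokens, "->", suggest_max_chars(tokens))
```

The chosen value would then be exported as `SCRIBE_LLM_MAX_CHARS`. Windows smaller than 8k are not covered by the comment, so the fallback here is only a guess and may still be too large for them.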
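@@ -1,13 +1,17 @@
 """LLM correction (spec §7 stage 3 / §3.8): restores transliterated English terms using context and knowledge.

-OpenAI-compatible backend (internal/local). **opt-in** (called only when the request sets correct=true), **allowlist** (configured
-base_url only), **audit log** (one line per call). Retries transient failures (connection reset/timeout).
-Chunking long input and a running glossary are TODO; the MVP makes a single call (enough for short clips).
+Small context windows (internal GPT-4o < 30k tokens): a long transcript is **split into chunks at sentence boundaries**,
+each chunk is corrected in order, and the **English spellings already settled (the running glossary)** are passed on to the next chunk →
+terms stay consistent across the whole talk without a large window.
+
+OpenAI-compatible backend (internal/local). **opt-in** (request correct=true) · **allowlist** (configured base_url only) ·
+**audit log** (one summary line per call). Retries transient failures (connection reset/timeout).
 """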
 from __future__ import annotations

 import json
 import logging
+import re
 import time
 import urllib.error
 import urllib.request
@@ -20,47 +24,115 @@ SYSTEM = (
    "일반 한국어는 그대로 두고, 확실하지 않으면 바꾸지 마라. 설명 없이 교정된 전사문만 출력하라."
 )

+_SENT_RE = re.compile(r"(?<=[.!?。…\n])\s+")          # sentence boundary
+_TERM_RE = re.compile(r"[A-Za-z][A-Za-z0-9.+/#-]{1,}")  # English tokens for the running glossary
+_GLOSSARY_CAP = 60
+

 class LLMNotConfigured(RuntimeError):
    """llm_base_url / llm_api_key not configured."""


+def _chunk(text: str, max_chars: int) -> list[str]:
+    """Pack sentences into chunks of at most max_chars; an oversized single sentence is force-split by character."""
+    if len(text) <= max_chars:
+        return [text]
+    packed: list[str] = []
+    cur = ""
+    for part in _SENT_RE.split(text):
+        if not part:
+            continue
+        if cur and len(cur) + len(part) + 1 > max_chars:
+            packed.append(cur)
+            cur = part
+        else:
+            cur = f"{cur} {part}" if cur else part
+    if cur:
+        packed.append(cur)
+    out: list[str] = []
+    for c in packed:  # safety net: force-split by character when a single sentence is too long
+        if len(c) > max_chars:
+            out.extend(c[i : i + max_chars] for i in range(0, len(c), max_chars))
+        else:
+            out.append(c)
+    return out
+
+
+def _terms(text: str) -> list[str]:
+    seen: dict[str, None] = {}
+    for m in _TERM_RE.finditer(text):
+        seen.setdefault(m.group(0), None)
+    return list(seen)
+
+
+def _request(
+    messages: list[dict],
+    *,
+    url: str,
+    api_key: str,
+    model: str,
+    retries: int,
+    timeout: float,
+) -> str:
+    payload = {"model": model, "temperature": 0, "messages": messages}
+    req = urllib.request.Request(
+        url,
+        data=json.dumps(payload).encode(),
+        headers={"Content-Type": "application/json", "Authorization": "Bearer " + api_key},
+    )
+    for attempt in range(1, retries + 1):
+        try:
+            with urllib.request.urlopen(req, timeout=timeout) as resp:
+                return json.loads(resp.read())["choices"][0]["message"]["content"]
+        except urllib.error.HTTPError:
+            raise  # real HTTP response (401/4xx): retrying is pointless
+        except (urllib.error.URLError, OSError):  # transient
+            if attempt == retries:
+                raise
+            time.sleep(1.0 * attempt)
+    raise RuntimeError("unreachable")
+
+
 def correct(
    text: str,
    *,
    base_url: str | None,
    api_key: str | None,
    model: str = "copilot-gpt-4o",
+    max_chars: int = 3000,
    retries: int = 4,
    timeout: float = 90.0,
 ) -> str:
+    """Restore transliterated English terms. Splits into chunks by max_chars (small context window support)."""
    if not base_url or not api_key:
-        raise LLMNotConfigured("llm_base_url/llm_api_key 미설정 — correct를 쓰려면 SCRIBE_LLM_* 설정 필요")
+        raise LLMNotConfigured("llm_base_url/llm_api_key 미설정 — correct에 SCRIBE_LLM_* 필요")
    url = base_url.rstrip("/") + "/chat/completions"
-    payload = {
-        "model": model,
-        "temperature": 0,
-        "messages": [
-            {"role": "system", "content": SYSTEM},
-            {"role": "user", "content": text},
-        ],
-    }
-    req = urllib.request.Request(
-        url,
-        data=json.dumps(payload).encode(),
-        headers={"Content-Type": "application/json", "Authorization": "Bearer " + api_key},
+    chunks = _chunk(text, max_chars)
+    logger.info(
+        "llm-correct egress endpoint=%s model=%s chars=%d chunks=%d",
+        url, model, len(text), len(chunks),
    )
-    # audit log (allowlist = configured endpoint, one line per call)
-    logger.info("llm-correct egress endpoint=%s model=%s chars=%d", url, model, len(text))
-    for attempt in range(1, retries + 1):
-        try:
-            with urllib.request.urlopen(req, timeout=timeout) as resp:
-                data = json.loads(resp.read())
-            return data["choices"][0]["message"]["content"]
-        except urllib.error.HTTPError:
-            raise  # real HTTP response (401/4xx): retrying is pointless
-        except (urllib.error.URLError, OSError):  # transient, e.g. connection reset/timeout
-            if attempt == retries:
-                raise
-            time.sleep(1.0 * attempt)
-    raise RuntimeError("unreachable")
+    glossary: dict[str, None] = {}
+    out: list[str] = []
+    for chunk in chunks:
+        system = SYSTEM
+        if glossary:
+            system += (
+                "\n이미 이 전사에서 확정된 영문 표기: "
+                + ", ".join(glossary)
+                + ". 같은/유사 용어는 이 표기로 통일하라."
+            )
+        corrected = _request(
+            [{"role": "system", "content": system}, {"role": "user", "content": chunk}],
+            url=url,
+            api_key=api_key,
+            model=model,
+            retries=retries,
+            timeout=timeout,
+        )
+        out.append(corrected)
+        for term in _terms(corrected):
+            glossary.setdefault(term, None)
+        if len(glossary) > _GLOSSARY_CAP:
+            glossary = dict(list(glossary.items())[-_GLOSSARY_CAP:])
+    return " ".join(out).strip()
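The chunk-then-carry-glossary flow in the hunk above can be exercised without any network access. The following is a simplified, standalone sketch of the same idea, not the module itself: the sentence regex is reduced to ASCII punctuation, `fake_fix` is a hypothetical stand-in for the LLM call, and `correct_all` also returns the glossary each call saw so the hand-off is visible:

```python
import re

SENT_RE = re.compile(r"(?<=[.!?\n])\s+")               # sentence boundary (simplified)
TERM_RE = re.compile(r"[A-Za-z][A-Za-z0-9.+/#-]{1,}")  # English-looking tokens


def chunk(text: str, max_chars: int) -> list[str]:
    """Greedily pack sentences into chunks of at most max_chars characters."""
    packed, cur = [], ""
    for part in SENT_RE.split(text):
        if not part:
            continue
        if cur and len(cur) + len(part) + 1 > max_chars:
            packed.append(cur)
            cur = part
        else:
            cur = f"{cur} {part}" if cur else part
    if cur:
        packed.append(cur)
    # Safety net: hard-split any single sentence longer than max_chars.
    return [c[i:i + max_chars] for c in packed for i in range(0, len(c), max_chars)]


def correct_all(text, max_chars, fix, cap=60):
    """Correct chunk by chunk, handing the terms seen so far to each later call."""
    glossary: dict[str, None] = {}  # a dict keeps insertion order and dedupes
    out, seen_by_call = [], []
    for piece in chunk(text, max_chars):
        seen_by_call.append(list(glossary))
        fixed = fix(piece, list(glossary))
        out.append(fixed)
        for m in TERM_RE.finditer(fixed):
            glossary.setdefault(m.group(0), None)
        if len(glossary) > cap:  # keep only the most recent `cap` terms
            glossary = dict(list(glossary.items())[-cap:])
    return " ".join(out), seen_by_call


def fake_fix(piece, glossary):
    # Hypothetical stand-in for the LLM: restores one transliterated term.
    return piece.replace("임베딩젬마", "EmbeddingGemma")


text = "임베딩젬마 소개입니다. 모델은 작습니다. 임베딩젬마 데모를 봅니다."
result, seen = correct_all(text, max_chars=20, fix=fake_fix)
print(result)
print(seen)
```

With `max_chars=20` each sentence lands in its own chunk, so the first call sees an empty glossary and the later calls see `EmbeddingGemma`. Because the term pattern's character class includes `.`, a term at the very end of a sentence is captured with its trailing period and enters the glossary as a separate entry.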
@@ -39,3 +39,21 @@ def test_llm_correct_monkeypatched(monkeypatch):
    monkeypatch.setattr(llm.urllib.request, "urlopen", fake_urlopen)
    out = llm.correct("인베딩 점마", base_url="http://x/v1", api_key="k", model="m")
    assert out == "EmbeddingGemma 복원됨"
+
+
+def test_llm_chunking_and_glossary(monkeypatch):
+    """Long input → chunk splitting + running glossary (small context window support)."""
+    calls: list[list[dict]] = []
+
+    def fake_request(messages, **_kw):
+        calls.append(messages)
+        return messages[1]["content"]  # echo the chunk unchanged
+
+    monkeypatch.setattr(llm, "_request", fake_request)
+    long_text = ". ".join(f"문장{i} EmbeddingGemma 설명" for i in range(400))
+    out = llm.correct(long_text, base_url="http://x/v1", api_key="k", max_chars=200)
+
+    assert len(calls) > 1  # was split
+    assert "EmbeddingGemma" in out  # reassembled
+    # From the 2nd chunk on, the previously settled English spellings are injected into system
+    assert any("확정된 영문 표기" in m[0]["content"] for m in calls[1:])
Author	SHA1	Message	Date
lukehemmin	a5e6d56568	docs: add Colab notebook for full-talk transcription (notebooks/colab_full_transcribe.ipynb) GPU (T4) cells: ffmpeg + uv → anonymous clone → uv sync (engine + gpu) → detect → audio upload → full transcription with large-v3-turbo → transcript.txt download. (Colab cannot reach the internal LLM gateway, so it is transcription-only; correction runs on-prem.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:33:54 +09:00
lukehemmin	cd2f807557	chore(omc): hotpaths (beam-size/correct/COLAB) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:29:37 +09:00
lukehemmin	7a8cc12cb3	feat(cli): --beam-size + --correct; add COLAB.md GPU full-transcribe guide - transcribe: --beam-size (CPU speed), --correct (chunked internal-LLM correction, SCRIBE_LLM_*), config.beam_size (1-2 recommended on CPU). With correction, output is collected in full and printed once. - COLAB.md: guide for Colab (transcription only, gateway unreachable) + on-prem GPU (full transcription + correction pipeline). 23 tests pass, ruff clean. Verified the graceful error when --correct is not configured. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:29:37 +09:00
lukehemmin	1a91060c43	chore(omc): hotpaths (chunked correction) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:09:51 +09:00
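lukehemmin	b721ca6419	feat(api): chunk LLM correction for small context windows (+running glossary) To fit the internal GPT-4o context (<30k), long transcripts are split into chunks at sentence boundaries, and the English terms from each corrected chunk are passed as a 'running glossary' into the next chunk's system prompt → terms stay consistent across the whole talk without a large window. config.llm_max_chars (default 3000; ~8k window→1500 / ~16k→3000 / ~30k→6000). Safety net: an oversized single sentence is force-split by character. 23 tests pass (including chunk splitting / glossary injection), ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:09:51 +09:00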