docs: add Colab notebook for full-talk transcription (notebooks/colab_full_transcribe.ipynb)

GPU (T4) cells: ffmpeg + uv → anonymous clone → uv sync (engine + gpu) → detect → audio upload → full transcription with large-v3-turbo → transcript.txt download. (Colab cannot reach the internal gateway, so it is transcription-only; correction runs on-prem.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
chore(omc): hotpaths (beam-size/correct/COLAB)
2026-06-09 07:33:54 +09:00 · 2026-06-09 07:29:37 +09:00 · 2026-06-09 07:29:37 +09:00 · 2026-06-09 07:09:51 +09:00 · 2026-06-09 07:09:51 +09:00
8 changed files with 398 additions and 59 deletions
@@ -30,7 +30,7 @@
  },
  "build": {
    "buildCommand": null,
-    "testCommand": "export PATH=\"$HOME/.local/bin:$HOME/.cargo/bin:$PATH\"\nclip=\"samples/ko_en/clips/GDG 인천 - EmbeddingGemma 200% 활용하기 - 지주영.m4a\"\nffmpeg -nostdin -ss 70 -t 12 -i \"$clip\" -ac 1 -ar 16000 -y /tmp/api_smoke.wav 2>/dev/null\nls -l /tmp/api_smoke.wav\necho \"=== pytest 재확인(413 수정 후) ===\"; uv run pytest -q 2>&1 | tail -3",
+    "testCommand": "export PATH=\"$HOME/.local/bin:$HOME/.cargo/bin:$PATH\"\necho \"=== ruff ===\"; uv run ruff check src/ tests/ && echo clean\necho \"=== pytest ===\"; uv run pytest -q 2>&1 | tail -2\necho \"=== --correct 경로(설정 없음 → 우아한 에러) ===\"\nuv run luke-scribe transcribe /tmp/jfk.flac --model tiny --language en --correct 2>&1 | tail -4; echo \"exit=${PIPESTATUS[0]}\"",
    "lintCommand": "ruff check",
    "devCommand": null,
    "scripts": {}
@@ -112,15 +112,21 @@
  },
  "hotPaths": [
    {
-      "path": "scripts/llm_correct.py",
+      "path": "src/luke_scribe/cli.py",
-      "accessCount": 4,
+      "accessCount": 8,
-      "lastAccessed": 1780925584647,
+      "lastAccessed": 1780957705972,
      "type": "file"
    },
    {
-      "path": "src/luke_scribe/cli.py",
+      "path": "src/luke_scribe/config.py",
      "accessCount": 5,
      "lastAccessed": 1780957473801,
      "type": "file"
    },
    {
      "path": "scripts/llm_correct.py",
      "accessCount": 4,
-      "lastAccessed": 1780927984393,
+      "lastAccessed": 1780925584647,
      "type": "file"
    },
    {
@@ -136,15 +142,21 @@
      "type": "file"
    },
    {
-      "path": "src/luke_scribe/config.py",
+      "path": "src/luke_scribe/postprocess/llm.py",
      "accessCount": 3,
-      "lastAccessed": 1780927884587,
+      "lastAccessed": 1780956524689,
      "type": "file"
    },
    {
      "path": "src/luke_scribe/api/routes/transcribe.py",
      "accessCount": 3,
      "lastAccessed": 1780956549345,
      "type": "file"
    },
    {
      "path": "tests/test_postprocess.py",
      "accessCount": 2,
-      "lastAccessed": 1780928097713,
+      "lastAccessed": 1780956556589,
      "type": "file"
    },
    {
@@ -267,12 +279,6 @@
      "lastAccessed": 1780927897308,
      "type": "file"
    },
    {
      "path": "src/luke_scribe/postprocess/llm.py",
      "accessCount": 1,
      "lastAccessed": 1780927908123,
      "type": "file"
    },
    {
      "path": "src/luke_scribe/api/__init__.py",
      "accessCount": 1,
@@ -327,17 +333,17 @@
      "lastAccessed": 1780928016400,
      "type": "file"
    },
    {
      "path": "tests/test_postprocess.py",
      "accessCount": 1,
      "lastAccessed": 1780928018944,
      "type": "file"
    },
    {
      "path": "tests/test_api.py",
      "accessCount": 1,
      "lastAccessed": 1780928028187,
      "type": "file"
    },
    {
      "path": "COLAB.md",
      "accessCount": 1,
      "lastAccessed": 1780957731994,
      "type": "file"
    }
  ],
  "userDirectives": [
@@ -0,0 +1,79 @@
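 # Colab / GPU Full-Transcription Guide
 Transcribe a **full talk quickly** (with optional correction) on a GPU: Colab T4/A100 or an on-prem GPU.
 On a CPU (dev box) a full talk is slow (turbo runs at an RTF of roughly 5, i.e. about 5× the audio length), so it is not recommended; run it here instead.
 On a GPU (T4), turbo runs at roughly 0.1–0.3× real time, so **a 37-minute talk takes roughly 4 to 11 minutes**.
 ---
 ## A) Google Colab: transcription only
 > Colab is an external cloud, so it **cannot reach the internal LLM gateway (192.168.0.123)** → `--correct` (correction) is unavailable, **transcription only**.
 > Runtime → Change runtime type → select **GPU (T4)**.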
 ```python
 # 1) System dependencies + uv
 !apt-get -qq update && apt-get -qq install -y ffmpeg
 !curl -LsSf https://astral.sh/uv/install.sh | sh
 import os; os.environ["PATH"] = "/root/.local/bin:" + os.environ["PATH"]
 # 2) Code (the repository allows anonymous read)
 !git clone -b feat/p1-core https://git.lukehemmin.com/lukehemmin/luke_scribe.git
 %cd luke_scribe
 # 3) Dependencies (engine + GPU CUDA runtime)
 !uv sync --extra engine --extra gpu
 # 4) Check that the GPU is detected (at tier T3, turbo and large-v3 stay resident together)
 !uv run luke-scribe detect
 # 5) Upload audio (or mount Drive)
 from google.colab import files
 AUDIO = list(files.upload().keys())[0]
 # 6) Full transcription (large-v3-turbo); use --model large-v3 for higher accuracy
 !uv run luke-scribe transcribe "$AUDIO" --model large-v3-turbo --language ko --timestamps | tee transcript.txt
 ```
 ### Exposing Colab externally as an API
 ```python
 # Get a public cloudflared URL → call it with curl from outside
 !uv sync --extra engine --extra gpu --extra api
 import os
 os.environ["SCRIBE_API_KEYS"] = '["colab-test"]'
 !nohup uv run luke-scribe serve --host 0.0.0.0 --port 8000 --tunnel cloudflare > serve.log 2>&1 &
 import time; time.sleep(8); print(open("serve.log").read())   # check the public *.trycloudflare.com URL
 ```
 ---
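 ## B) On-prem GPU: transcription + internal LLM correction (full pipeline)
 On a GPU machine inside the internal network (gateway 192.168.0.123 reachable), one command goes all the way **through restoring transliterated terms to English**: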
 ```bash
 git clone -b feat/p1-core https://git.lukehemmin.com/lukehemmin/luke_scribe.git && cd luke_scribe
 uv sync --extra engine --extra gpu
 export SCRIBE_LLM_BASE_URL=http://192.168.0.123:8080/v1
 export SCRIBE_LLM_API_KEY=<internal key>     # mind your shell history
 export SCRIBE_LLM_MODEL=copilot-gpt-4o
 export SCRIBE_LLM_MAX_CHARS=3000             # match the internal LLM's context window (~8k→1500/~16k→3000/~30k→6000)
 # Transcription + chunked correction in one command
 uv run luke-scribe transcribe talk.m4a --model large-v3-turbo --language ko --correct | tee transcript.txt
 ```
 Via the API:
 ```bash
 uv run luke-scribe serve                     # use the X-API-Key it prints
 curl -H "X-API-Key: <key>" -F file=@talk.m4a -F model=large-v3-turbo -F correct=true \
     http://localhost:8000/v1/transcribe
 ```
 ---
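 ## Notes
 - Correction splits a long transcript into chunks of `SCRIBE_LLM_MAX_CHARS` characters and carries a **running glossary** between them (to cope with small context windows).
 - A weak GPU (1050/2GB) cannot fit even turbo and is automatically downgraded to **CPU (T0)**; check the tier with `detect`.
 - Audio files are not in the repository (`.gitignore`); use a Colab upload/Drive, or a local path on-prem.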
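 The speed figures in the guide above are real-time factors (RTF): wall-clock time is the audio length multiplied by the RTF. The following is a minimal sketch of that arithmetic, not part of the repository; `estimate_minutes` is a hypothetical helper, and the RTF values are the rough ones quoted in the guide rather than measurements.

 ```python
 def estimate_minutes(audio_minutes: float, rtf: float) -> float:
     """Wall-clock minutes = audio length x real-time factor (RTF)."""
     if audio_minutes < 0 or rtf <= 0:
         raise ValueError("audio_minutes must be >= 0 and rtf must be > 0")
     return audio_minutes * rtf


 if __name__ == "__main__":
     talk = 37.0  # minutes, the talk length quoted in the guide
     # RTF values below are the guide's rough figures, not benchmarks.
     for label, rtf in [
         ("T4 turbo, fast end", 0.1),
         ("T4 turbo, slow end", 0.3),
         ("CPU turbo", 5.0),
     ]:
         print(f"{label}: ~{estimate_minutes(talk, rtf):.1f} min")
 ```

 Under these assumptions the same talk takes minutes on a T4 and about three hours on the CPU dev box, which is why the guide steers full talks to a GPU.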
@@ -0,0 +1,130 @@
 {
 "cells": [
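  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# luke_scribe: Colab full-talk transcription\n",
    "\n",
    "Transcribes a full talk in **minutes** on a GPU (T4).\n",
    "\n",
    "**First:** Runtime → Change runtime type → select **GPU** as the hardware accelerator.\n",
    "\n",
    "> ⚠️ Colab is external, so it **cannot reach the internal LLM gateway (192.168.0.123)** → correction (`--correct`) is unavailable, **transcription only**. For correction, use a GPU on the internal network (section B of `COLAB.md` in the repo).\n"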
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 0) Check the GPU (if there is none, switch the runtime type to GPU)\n",
    "!nvidia-smi -L || echo \"No GPU → switch the runtime type to GPU\"\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 1) System dependencies + uv\n",
    "!apt-get -qq update && apt-get -qq install -y ffmpeg\n",
    "!curl -LsSf https://astral.sh/uv/install.sh | sh\n",
    "import os\n",
    "os.environ['PATH'] = '/root/.local/bin:' + os.environ['PATH']\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 2) Fetch the code (the repository allows anonymous read)\n",
    "!git clone -b feat/p1-core https://git.lukehemmin.com/lukehemmin/luke_scribe.git\n",
    "%cd luke_scribe\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 3) Dependencies (engine + GPU CUDA runtime); takes a few minutes\n",
    "!uv sync --extra engine --extra gpu\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 4) Check the hardware tier (T3 = turbo and large-v3 resident together)\n",
    "!uv run luke-scribe detect\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 5) Upload the talk audio (m4a/mp3/wav/mp4 …)\n",
    "from google.colab import files\n",
    "AUDIO = list(files.upload().keys())[0]\n",
    "print('Uploaded:', AUDIO)\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 6) Full transcription (large-v3-turbo; use --model large-v3 for higher accuracy)\n",
    "!uv run luke-scribe transcribe \"$AUDIO\" --model large-v3-turbo --language ko --timestamps | tee transcript.txt\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# 7) Download the transcript\n",
    "from google.colab import files\n",
    "files.download('transcript.txt')\n"
   ]
  },
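  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notes\n",
    "- **Model**: `large-v3-turbo` (fast) ↔ `large-v3` (accurate). If `detect` reports T0 (CPU), the GPU is too weak and transcription will be slow.\n",
    "- **Correction (transliteration → English)**: not possible on Colab (gateway unreachable). Use `--correct` + `SCRIBE_LLM_*` on a GPU inside the internal network (section B of `COLAB.md`).\n",
    "- **Speed**: T4 turbo ≈ 0.1–0.3× real time → a 37-minute talk takes a few minutes.\n"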
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "provenance": [],
   "gpuType": "T4"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
 }
@@ -93,6 +93,7 @@ def transcribe_ep(  # noqa: PLR0913 — 요청 옵션 다수(스펙 options 스
                        base_url=settings.llm_base_url,
                        api_key=settings.llm_api_key,
                        model=settings.llm_model,
                        max_chars=settings.llm_max_chars,
                    )
                )
                corrected = True
@@ -55,6 +55,8 @@ def transcribe(
    device: str = typer.Option("auto", help="auto|cpu|cuda"),
    word_timestamps: bool = typer.Option(False, "--word-timestamps"),
    vad: bool = typer.Option(True, "--vad/--no-vad", help="remove silence"),
    beam_size: int = typer.Option(None, "--beam-size", help="decoding beam (1-2 recommended on CPU for speed)"),
    correct: bool = typer.Option(False, "--correct", help="internal LLM correction (requires SCRIBE_LLM_* settings)"),
    timestamps: bool = typer.Option(False, "--timestamps", help="show segment [start–end]"),
 ) -> None:
    """One-shot file transcription (faster-whisper, automatic CPU/GPU, part of AC-4)."""
@@ -90,17 +92,45 @@ def transcribe(
    )
    engine = FasterWhisperEngine(model_name, dev, profile.compute_type, cache_dir=settings.model_cache_dir)
-    segments, tinfo = engine.transcribe(file, language=lang, word_timestamps=word_timestamps, vad=vad)
-    count = 0
+    segments, tinfo = engine.transcribe(
+        file, language=lang, word_timestamps=word_timestamps, vad=vad,
+        beam_size=(beam_size or settings.beam_size),
+    )
+    seg_list = []
    for seg in segments:
-        count += 1
-        if timestamps:
-            console.print(f"[cyan][{seg.start:6.2f}–{seg.end:6.2f}][/] {seg.text.strip()}")
-        else:
-            console.print(seg.text.strip())
+        seg_list.append({"start": seg.start, "end": seg.end, "text": seg.text.strip()})
+        if not correct:  # stream output; with --correct, collect everything and print once
+            if timestamps:
+                console.print(f"[cyan][{seg.start:6.2f}–{seg.end:6.2f}][/] {seg.text.strip()}")
+            else:
+                console.print(seg.text.strip())
+    if correct:
+        from .postprocess import llm as llm_correct
+        from .postprocess import rules
+        text = " ".join(s["text"] for s in seg_list).strip()
+        try:
+            text = rules.normalize(
+                llm_correct.correct(
+                    text,
+                    base_url=settings.llm_base_url,
+                    api_key=settings.llm_api_key,
+                    model=settings.llm_model,
+                    max_chars=settings.llm_max_chars,
+                )
+            )
+        except llm_correct.LLMNotConfigured as exc:
+            console.print(f"[red]--correct:[/] {exc}")
+            raise typer.Exit(code=1) from exc
+        console.print(text)
    detected = getattr(tinfo, "language", None)
-    console.print(f"[green]✓ {count} segments · detected_lang={detected} · model_used={model_name}[/]")
+    console.print(
+        f"[green]✓ {len(seg_list)} segments · detected_lang={detected} · "
+        f"model_used={model_name} · corrected={correct}[/]"
+    )
@app.command()
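The output logic in the `transcribe` hunk above has two modes: without `--correct` each segment is printed as it arrives, and with `--correct` the segments are collected, joined, and printed once after correction. The sketch below isolates that control flow so it can be read and tested on its own. `Segment`, `render`, and the `corrector` callback are hypothetical stand-ins (the real command prints through `console` and calls `llm_correct.correct`), and the timestamp format uses a plain hyphen.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Segment:
    """Hypothetical stand-in for an engine segment."""
    start: float
    end: float
    text: str


def render(
    segments: Iterable[Segment],
    *,
    correct: bool,
    timestamps: bool,
    corrector: Callable[[str], str] = lambda t: t,
) -> list[str]:
    """Return the lines that would be printed, in order."""
    lines: list[str] = []
    collected: list[str] = []
    for seg in segments:
        text = seg.text.strip()
        collected.append(text)
        if not correct:  # streaming mode: one line per segment
            if timestamps:
                lines.append(f"[{seg.start:6.2f}-{seg.end:6.2f}] {text}")
            else:
                lines.append(text)
    if correct:  # batch mode: correct the joined text, print it once
        lines.append(corrector(" ".join(collected).strip()))
    return lines


if __name__ == "__main__":
    segs = [Segment(0.0, 1.5, " hello "), Segment(1.5, 3.0, "world")]
    print(render(segs, correct=False, timestamps=True))
    print(render(segs, correct=True, timestamps=True, corrector=str.upper))
```

One consequence visible in the sketch, and in the hunk as shown: when `correct` is set, the `timestamps` flag has no effect, because the corrected text is emitted as a single block.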
@@ -15,6 +15,7 @@ class Settings(BaseSettings):
    device: str = "auto"
    compute_type: str | None = None      # None = auto (based on cc/VRAM)
    workers: int | None = None           # None = computed automatically
    beam_size: int = 5                    # decoding beam (1-2 recommended on CPU for speed, 5 on GPU)
    # Language (default ko, overridable per request)
    language: str = "ko"
@@ -43,6 +44,8 @@ class Settings(BaseSettings):
    llm_base_url: str | None = None      # e.g. http://192.168.0.123:8080/v1 (allowlist = this endpoint only)
    llm_api_key: str | None = None       # injected only via env SCRIBE_LLM_API_KEY
    llm_model: str = "copilot-gpt-4o"
    # Correction chunk size (characters): tune to the internal LLM's context window (e.g. ~8k window→1500, ~16k→3000, ~30k→6000)
    llm_max_chars: int = 3000
 settings = Settings()
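The comment on `llm_max_chars` gives three reference points for choosing a chunk size from the context window. A minimal sketch of one way to apply them follows. `suggest_max_chars` and `_TIERS` are hypothetical names, not part of the repository, and choosing the largest documented tier that fits the window is an assumption of this sketch.

```python
import bisect

# (approximate context window in tokens, suggested SCRIBE_LLM_MAX_CHARS),
# taken from the comment in config.py above.
_TIERS = [(8_000, 1500), (16_000, 3000), (30_000, 6000)]


def suggest_max_chars(context_tokens: int) -> int:
    """Return the chunk size of the largest documented tier that fits the window."""
    windows = [window for window, _ in _TIERS]
    idx = bisect.bisect_right(windows, context_tokens) - 1
    if idx < 0:
        raise ValueError(
            f"context window {context_tokens} is below the smallest documented tier"
        )
    return _TIERS[idx][1]


if __name__ == "__main__":
    for window in (8_000, 20_000, 128_000):
        print(window, "->", suggest_max_chars(window))
```

Rounding down to a documented tier is the conservative choice: it never suggests a chunk larger than a size the comment lists for a window at least that big.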
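@@ -1,13 +1,17 @@
 """LLM correction (spec §7 stage 3 / §3.8): restore transliterated English terms from context and domain knowledge.
-OpenAI-compatible backend (internal/local). **opt-in** (called only when the request sets correct=true), **allowlist** (only the configured
-base_url), **audit log** (one line per call). Retries on transient errors (connection reset/timeout).
-Chunking long input and a running glossary are TODO; the MVP makes a single call (enough for short clips).
+Small-context-window support (internal GPT-4o < 30k tokens): long transcripts are **split into chunks at sentence boundaries**,
+and each chunk is corrected in turn while the **English spellings already settled (running glossary)** are passed to the next chunk →
+term consistency across the whole talk without a large window.
+OpenAI-compatible backend (internal/local). **opt-in** (request correct=true) · **allowlist** (configured base_url only) ·
+**audit log** (one summary line per call). Retries on transient errors (connection reset/timeout).
 """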
 from __future__ import annotations
 import json
 import logging
 import re
 import time
 import urllib.error
 import urllib.request
@@ -20,47 +24,115 @@ SYSTEM = (
    "일반 한국어는 그대로 두고, 확실하지 않으면 바꾸지 마라. 설명 없이 교정된 전사문만 출력하라."
 )
 _SENT_RE = re.compile(r"(?<=[.!?。…\n])\s+")          # sentence boundary
 _TERM_RE = re.compile(r"[A-Za-z][A-Za-z0-9.+/#-]{1,}")  # English tokens for the running glossary
 _GLOSSARY_CAP = 60
 class LLMNotConfigured(RuntimeError):
    """llm_base_url / llm_api_key not set."""
 def _chunk(text: str, max_chars: int) -> list[str]:
    """Pack sentences into chunks of at most max_chars. An oversized single sentence is force-split by character."""
    if len(text) <= max_chars:
        return [text]
    packed: list[str] = []
    cur = ""
    for part in _SENT_RE.split(text):
        if not part:
            continue
        if cur and len(cur) + len(part) + 1 > max_chars:
            packed.append(cur)
            cur = part
        else:
            cur = f"{cur} {part}" if cur else part
    if cur:
        packed.append(cur)
    out: list[str] = []
    for c in packed:  # safety net: force-split by character if a single sentence is too long
        if len(c) > max_chars:
            out.extend(c[i : i + max_chars] for i in range(0, len(c), max_chars))
        else:
            out.append(c)
    return out
 def _terms(text: str) -> list[str]:
    seen: dict[str, None] = {}
    for m in _TERM_RE.finditer(text):
        seen.setdefault(m.group(0), None)
    return list(seen)
 def _request(
    messages: list[dict],
    *,
    url: str,
    api_key: str,
    model: str,
    retries: int,
    timeout: float,
 ) -> str:
    payload = {"model": model, "temperature": 0, "messages": messages}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": "Bearer " + api_key},
    )
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return json.loads(resp.read())["choices"][0]["message"]["content"]
        except urllib.error.HTTPError:
            raise  # a real HTTP response (401/4xx); retrying is pointless
        except (urllib.error.URLError, OSError):  # transient
            if attempt == retries:
                raise
            time.sleep(1.0 * attempt)
    raise RuntimeError("unreachable")
 def correct(
    text: str,
    *,
    base_url: str | None,
    api_key: str | None,
    model: str = "copilot-gpt-4o",
    max_chars: int = 3000,
    retries: int = 4,
    timeout: float = 90.0,
 ) -> str:
    """Restore transliterated English terms. Splits into chunks of max_chars (small-context-window support)."""
    if not base_url or not api_key:
-        raise LLMNotConfigured("llm_base_url/llm_api_key not set: SCRIBE_LLM_* must be configured to use correct")
+        raise LLMNotConfigured("llm_base_url/llm_api_key not set: correct requires SCRIBE_LLM_*")
    url = base_url.rstrip("/") + "/chat/completions"
-    payload = {
-        "model": model,
-        "temperature": 0,
-        "messages": [
-            {"role": "system", "content": SYSTEM},
-            {"role": "user", "content": text},
-        ],
-    }
-    req = urllib.request.Request(
-        url,
-        data=json.dumps(payload).encode(),
-        headers={"Content-Type": "application/json", "Authorization": "Bearer " + api_key},
-    )
-    # audit log (allowlist = configured endpoint, one line per call)
-    logger.info("llm-correct egress endpoint=%s model=%s chars=%d", url, model, len(text))
-    for attempt in range(1, retries + 1):
-        try:
-            with urllib.request.urlopen(req, timeout=timeout) as resp:
-                data = json.loads(resp.read())
-            return data["choices"][0]["message"]["content"]
-        except urllib.error.HTTPError:
-            raise  # a real HTTP response (401/4xx); retrying is pointless
-        except (urllib.error.URLError, OSError):  # transient, e.g. connection reset/timeout
-            if attempt == retries:
-                raise
-            time.sleep(1.0 * attempt)
-    raise RuntimeError("unreachable")
+    chunks = _chunk(text, max_chars)
+    logger.info(
+        "llm-correct egress endpoint=%s model=%s chars=%d chunks=%d",
+        url, model, len(text), len(chunks),
+    )
+    glossary: dict[str, None] = {}
+    out: list[str] = []
+    for chunk in chunks:
+        system = SYSTEM
+        if glossary:
+            system += (
+                "\n이미 이 전사에서 확정된 영문 표기: "
+                + ", ".join(glossary)
+                + ". 같은/유사 용어는 이 표기로 통일하라."
+            )
+        corrected = _request(
+            [{"role": "system", "content": system}, {"role": "user", "content": chunk}],
+            url=url,
+            api_key=api_key,
+            model=model,
+            retries=retries,
+            timeout=timeout,
+        )
+        out.append(corrected)
+        for term in _terms(corrected):
+            glossary.setdefault(term, None)
+        if len(glossary) > _GLOSSARY_CAP:
+            glossary = dict(list(glossary.items())[-_GLOSSARY_CAP:])
+    return " ".join(out).strip()
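The running glossary in `correct` relies on two details: `_TERM_RE` picks Latin-script tokens out of each corrected chunk, and once the glossary exceeds `_GLOSSARY_CAP` only the most recently added terms are kept. The sketch below reproduces just that bookkeeping. The regex is copied from the diff above. `update_glossary` is a hypothetical helper, the cap is shrunk from 60 to 3 so the trimming is visible, and the sample sentences are made up.

```python
from __future__ import annotations

import re

TERM_RE = re.compile(r"[A-Za-z][A-Za-z0-9.+/#-]{1,}")  # copied from the diff above
GLOSSARY_CAP = 3  # the diff uses 60; tiny here so the trimming is visible


def update_glossary(
    glossary: dict[str, None], corrected: str, cap: int = GLOSSARY_CAP
) -> dict[str, None]:
    """Add the Latin-script terms of one corrected chunk, keeping the newest `cap`."""
    for match in TERM_RE.finditer(corrected):
        glossary.setdefault(match.group(0), None)  # a dict keeps insertion order
    if len(glossary) > cap:
        glossary = dict(list(glossary.items())[-cap:])
    return glossary


if __name__ == "__main__":
    g: dict[str, None] = {}
    g = update_glossary(g, "EmbeddingGemma는 PyTorch로 돌립니다")
    print(list(g))
    g = update_glossary(g, "ONNX와 gpt-4o도 지원합니다")
    print(list(g))  # the oldest term has been dropped
    print(TERM_RE.findall("gpt-4o."))  # '.' is in the character class
```

Two behaviours are worth knowing when reading the glossary in the logs. A term directly followed by a period is captured with the period, because `.` is in the character class. A term that was already in the glossary keeps its original position, so reuse does not protect it from being trimmed.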
@@ -39,3 +39,21 @@ def test_llm_correct_monkeypatched(monkeypatch):
    monkeypatch.setattr(llm.urllib.request, "urlopen", fake_urlopen)
    out = llm.correct("인베딩 점마", base_url="http://x/v1", api_key="k", model="m")
    assert out == "EmbeddingGemma 복원됨"
 def test_llm_chunking_and_glossary(monkeypatch):
    """Long input → chunk splitting + running glossary (small-context-window support)."""
    calls: list[list[dict]] = []
    def fake_request(messages, **_kw):
        calls.append(messages)
        return messages[1]["content"]  # echo the chunk unchanged
    monkeypatch.setattr(llm, "_request", fake_request)
    long_text = ". ".join(f"문장{i} EmbeddingGemma 설명" for i in range(400))
    out = llm.correct(long_text, base_url="http://x/v1", api_key="k", max_chars=200)
    assert len(calls) > 1  # it was split
    assert "EmbeddingGemma" in out  # it was reassembled
    # from the 2nd chunk on, the previously settled English spellings are injected into the system prompt
    assert any("확정된 영문 표기" in m[0]["content"] for m in calls[1:])
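The test above confirms that splitting happens and that the glossary is injected, but not that every chunk respects `max_chars` or that no text is lost. The sketch below shows one way to check those two properties for any chunker with the signature `(text, max_chars) -> list[str]`. `check_chunker` and `fixed_width` are hypothetical. `fixed_width` is only a stand-in so the example runs on its own; in the repository one would presumably pass `llm._chunk` instead.

```python
from typing import Callable


def check_chunker(
    chunker: Callable[[str, int], list[str]], text: str, max_chars: int
) -> int:
    """Assert the size and content invariants; return the number of chunks."""
    chunks = chunker(text, max_chars)
    assert chunks, "chunker returned nothing"
    assert all(len(c) <= max_chars for c in chunks), "a chunk exceeds max_chars"

    def squash(s: str) -> str:
        # Compare with whitespace removed, since a chunker may drop or
        # normalise the whitespace at chunk boundaries.
        return "".join(s.split())

    assert squash("".join(chunks)) == squash(text), "content was lost or reordered"
    return len(chunks)


def fixed_width(text: str, max_chars: int) -> list[str]:
    """Stand-in chunker: cut every max_chars characters."""
    return [text[i : i + max_chars] for i in range(0, len(text), max_chars)] or [text]


if __name__ == "__main__":
    sample = ". ".join(f"sentence{i} EmbeddingGemma" for i in range(50))
    print("chunks:", check_chunker(fixed_width, sample, 200))
```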
Author	SHA1	Message	Date
lukehemmin	a5e6d56568	docs: add Colab notebook for full-talk transcription (notebooks/colab_full_transcribe.ipynb) GPU (T4) cells: ffmpeg + uv → anonymous clone → uv sync (engine + gpu) → detect → audio upload → full transcription with large-v3-turbo → transcript.txt download. (Colab cannot reach the internal gateway, so it is transcription-only; correction runs on-prem.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:33:54 +09:00
lukehemmin	cd2f807557	chore(omc): hotpaths (beam-size/correct/COLAB) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:29:37 +09:00
lukehemmin	7a8cc12cb3	feat(cli): --beam-size + --correct; add COLAB.md GPU full-transcribe guide - transcribe: --beam-size (CPU speed), --correct (chunked correction via the internal LLM, SCRIBE_LLM_*), config.beam_size (1-2 recommended on CPU). With correction, all output is collected and printed once. - COLAB.md: guide for Colab (transcription only, gateway unreachable) + on-prem GPU (full transcription + correction pipeline). 23 tests pass, ruff clean. Verified the graceful error when --correct is not configured. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:29:37 +09:00
lukehemmin	1a91060c43	chore(omc): hotpaths (chunked correction) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:09:51 +09:00
lukehemmin	b721ca6419	feat(api): chunk LLM correction for small context windows (+running glossary) To fit the internal GPT-4o context (<30k), long transcripts are split into chunks at sentence boundaries, and the English terms from each corrected chunk are passed to the next chunk's system prompt as a 'running glossary' → term consistency across the whole talk without a large window. config.llm_max_chars (default 3000; ~8k window→1500/~16k→3000/~30k→6000). Safety net: an oversized single sentence is force-split by character. 23 tests pass (including chunk splitting/glossary injection), ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 07:09:51 +09:00