chore(omc): record P1 progress note (engine+transcribe) + hotpaths

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
feat(p1): faster-whisper engine + audio ingest + transcribe (CPU verified)
2026-06-07 15:08:07 +09:00 · 2026-06-07 15:07:41 +09:00 · 2026-06-07 12:56:07 +09:00 · 2026-06-07 12:56:07 +09:00
20 changed files with 4703 additions and 5 deletions
@@ -0,0 +1,24 @@
+# luke_scribe example settings. Copy with: cp .env.example .env  (env prefix: SCRIBE_)
+
+# Models (hybrid by default; may be unified to turbo only, depending on the P1 bench results)
+SCRIBE_MODEL_REALTIME=large-v3-turbo
+SCRIBE_MODEL_BATCH=large-v3
+
+# Device: auto|cpu|cuda|cuda:0 (auto-detected; can be forced)
+SCRIBE_DEVICE=auto
+# SCRIBE_COMPUTE_TYPE=int8        # leave unset to auto-select from cc/VRAM
+# SCRIBE_WORKERS=1                # leave unset to auto-estimate
+
+SCRIBE_LANGUAGE=ko
+
+# Hard input limits (exceeding them → 413)
+SCRIBE_MAX_DURATION_S=14400       # 4h
+SCRIBE_MAX_SIZE_BYTES=2147483648  # 2GB
+
+# Retention (P2+)
+SCRIBE_RETENTION_DAYS=7
+# SCRIBE_REDIS_URL=redis://localhost:6379/0
+# SCRIBE_API_KEYS=["key1","key2"]
+
+# Tunnel (P5): none|cloudflare|ngrok
+SCRIBE_TUNNEL=none
@@ -13,10 +13,10 @@
    "runtime": "Python 3.11+"
  },
  "build": {
-    "buildCommand": null,
-    "testCommand": null,
-    "lintCommand": null,
-    "devCommand": null,
+    "buildCommand": "uv sync",
+    "testCommand": "export PATH=\"$HOME/.local/bin:$HOME/.cargo/bin:$PATH\"\nuv run pytest -q 2>&1 | tail -8\necho \"=== ruff ===\"; uv run ruff check src/ tests/ && echo \"clean\"",
+    "lintCommand": "uv run ruff check src/ tests/",
+    "devCommand": "uv run luke-scribe detect",
    "scripts": {}
  },
  "conventions": {
@@ -51,10 +51,125 @@
      "source": "manual",
      "category": "env",
      "content": "git remote = self-hosted Gitea https://git.lukehemmin.com (openresty, HTTPS/443 only, SSH not exposed). Auth = PAT stored in ~/.git-credentials (global helper store, username lukehemmin); verified, push works without VS Code askpass. ⚠️ The repository currently allows anonymous reads (if it is meant to be internal/private, check the Private setting in Gitea)."
+    },
+    {
+      "timestamp": 1780812476362,
+      "source": "manual",
+      "category": "status",
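+      "content": "P1 progress (2026-06-07): ✅ detect (capability tiers T0~T3; 1050 → explicit downgrade to T0_CPU) · ✅ transcribe (faster-whisper verified on CPU: JFK 11s clip transcribed accurately, model_used printed) · 10 unit tests pass. Code now exists (no longer 0%). Remaining: word-ts/format output options, making Silero VAD optional, measured VRAM probe (replacing the static estimate), bench (needs a labeled KO+EN sample set), Colab verification of the upper tiers (T2/T3), P2 (API + Redis/RQ). Branch feat/p1-core."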
    }
  ],
  "directoryMap": {},
-  "hotPaths": [],
+  "hotPaths": [
+    {
+      "path": "README.md",
+      "accessCount": 3,
+      "lastAccessed": 1780812417055,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/cli.py",
+      "accessCount": 2,
+      "lastAccessed": 1780812315014,
+      "type": "file"
+    },
+    {
+      "path": "pyproject.toml",
+      "accessCount": 1,
+      "lastAccessed": 1780804235420,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/__init__.py",
+      "accessCount": 1,
+      "lastAccessed": 1780804261889,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/config.py",
+      "accessCount": 1,
+      "lastAccessed": 1780804262703,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/devices/__init__.py",
+      "accessCount": 1,
+      "lastAccessed": 1780804263611,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/devices/profile.py",
+      "accessCount": 1,
+      "lastAccessed": 1780804266795,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/devices/vram_probe.py",
+      "accessCount": 1,
+      "lastAccessed": 1780804273484,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/devices/manager.py",
+      "accessCount": 1,
+      "lastAccessed": 1780804300531,
+      "type": "file"
+    },
+    {
+      "path": "run.sh",
+      "accessCount": 1,
+      "lastAccessed": 1780804312249,
+      "type": "file"
+    },
+    {
+      "path": ".env.example",
+      "accessCount": 1,
+      "lastAccessed": 1780804316978,
+      "type": "file"
+    },
+    {
+      "path": "tests/test_device_manager.py",
+      "accessCount": 1,
+      "lastAccessed": 1780804449331,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/engine/__init__.py",
+      "accessCount": 1,
+      "lastAccessed": 1780812252757,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/engine/model_registry.py",
+      "accessCount": 1,
+      "lastAccessed": 1780812254912,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/engine/faster_whisper_engine.py",
+      "accessCount": 1,
+      "lastAccessed": 1780812261152,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/audio/__init__.py",
+      "accessCount": 1,
+      "lastAccessed": 1780812262920,
+      "type": "file"
+    },
+    {
+      "path": "src/luke_scribe/audio/ingest.py",
+      "accessCount": 1,
+      "lastAccessed": 1780812299865,
+      "type": "file"
+    },
+    {
+      "path": "tests/test_engine_audio.py",
+      "accessCount": 1,
+      "lastAccessed": 1780812413312,
+      "type": "file"
+    }
+  ],
  "userDirectives": [
    {
      "timestamp": 1780801958149,
@@ -0,0 +1,26 @@
+# luke_scribe
+
+Internal **local STT transcription API**, built on faster-whisper (CTranslate2) and hardware-adaptive.
+A single `Job` abstraction handles both batch (file/video) and real-time (WebSocket) transcription.
+
+> Single source of truth (SoT) for the design: [`.omc/plans/consensus-luke-scribe-stt-api.md`](.omc/plans/consensus-luke-scribe-stt-api.md),
+> [`.omc/specs/deep-interview-luke-scribe-stt-api.md`](.omc/specs/deep-interview-luke-scribe-stt-api.md)
+
+## Status
+- Design complete (ambiguity ~5%) · P1 implementation in progress (greenfield).
+
+## Quick start (development)
+```bash
+uv sync                                            # core dependencies
+uv run luke-scribe detect                          # detect hardware → capability tier / precision / worker count
+uv sync --extra engine                             # engine (faster-whisper)
+uv run luke-scribe transcribe FILE --model tiny    # one-shot transcription
+```
+
+## CLI
+| Command | Description | Status |
+|------|------|------|
+| `detect` | Hardware detection: capability tier (T0~T3), precision, worker count | ✅ P1 |
+| `transcribe <file>` | One-shot file transcription (faster-whisper, CPU/GPU) | ✅ P1 |
+| `bench` | turbo vs large-v3 domain benchmark (gate) | ⏳ P1 (sample set needed) |
+| `serve` | API server | ⏳ P2 |
@@ -0,0 +1,38 @@
+[project]
+name = "luke-scribe"
+version = "0.1.0"
+description = "Internal local STT transcription API (faster-whisper, hardware-adaptive)"
+requires-python = ">=3.11"
+dependencies = [
+    "pydantic>=2.7",
+    "pydantic-settings>=2.3",
+    "typer>=0.12",
+    "rich>=13.7",
+    "psutil>=5.9",
+    "nvidia-ml-py>=12.535",
+    "huggingface-hub>=0.24",
+]
+
+[project.optional-dependencies]
+# Engine: installed in the transcribe/bench increment (uv sync --extra engine)
+engine = ["faster-whisper>=1.0.3", "av>=11"]
+# GPU CUDA runtime (for faster-whisper GPU inference)
+gpu = ["nvidia-cublas-cu12", "nvidia-cudnn-cu12"]
+# P2 API + Queue
+api = ["fastapi>=0.110", "uvicorn[standard]>=0.29", "redis>=5.0", "rq>=1.16"]
+# P5 options
+diarize = ["pyannote.audio>=3.1"]
+llm = ["openai>=1.30"]
+
+[project.scripts]
+luke-scribe = "luke_scribe.cli:main"
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[tool.hatch.build.targets.wheel]
+packages = ["src/luke_scribe"]
+
+[dependency-groups]
+dev = ["pytest>=8.2", "ruff>=0.5"]
@@ -0,0 +1,5 @@
+#!/usr/bin/env bash
+# Dev/Colab run wrapper: pure Python, no Docker (plan §3.10d).
+set -euo pipefail
+cd "$(dirname "$0")"
+exec uv run luke-scribe "$@"
@@ -0,0 +1,3 @@
+"""luke_scribe: internal local STT transcription API (faster-whisper, hardware-adaptive)."""
+
+__version__ = "0.1.0"
@@ -0,0 +1,4 @@
+"""Audio/video input: ingest (probe, limits) and VAD (spec §4-4)."""
+from .ingest import MediaInfo, probe_media
+
+__all__ = ["MediaInfo", "probe_media"]
@@ -0,0 +1,41 @@
+"""Media input: duration/size probe plus limit checks (spec §4-4, AC-7).
+
+Callers map a limit violation to 413 (P2). Actual decoding is done by the engine (faster-whisper/PyAV).
+"""
+from __future__ import annotations
+
+import json
+import os
+import shutil
+import subprocess
+from dataclasses import dataclass
+
+
+@dataclass
+class MediaInfo:
+    path: str
+    duration_s: float
+    size_bytes: int
+
+
+def probe_media(path: str) -> MediaInfo:
+    if not os.path.exists(path):
+        raise FileNotFoundError(path)
+    return MediaInfo(path=path, duration_s=_ffprobe_duration(path), size_bytes=os.path.getsize(path))
+
+
+def _ffprobe_duration(path: str) -> float:
+    ffprobe = shutil.which("ffprobe")
+    if not ffprobe:
+        return 0.0  # ffprobe not installed: duration unknown, so callers cannot enforce the duration cap
+    try:
+        out = subprocess.run(
+            [ffprobe, "-v", "error", "-show_entries", "format=duration", "-of", "json", path],
+            capture_output=True,
+            text=True,
+            timeout=30,
+            check=True,
+        ).stdout
+        return float(json.loads(out).get("format", {}).get("duration") or 0.0)
+    except Exception:
+        return 0.0
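
The probe above returns `0.0` both when ffprobe is missing and when its output cannot be parsed, so a caller cannot tell "unknown" from "zero seconds". A minimal sketch of one alternative follows, assuming the JSON shape produced by `-show_entries format=duration -of json`. The names `parse_ffprobe_duration` and `exceeds_limits` are hypothetical and are not part of this change.

```python
from __future__ import annotations

import json


def parse_ffprobe_duration(stdout: str) -> float | None:
    """Return the duration in seconds, or None when it is absent or unparsable."""
    try:
        raw = json.loads(stdout).get("format", {}).get("duration")
        return float(raw) if raw is not None else None
    except (ValueError, TypeError, AttributeError):
        # Invalid JSON, a non-dict payload, or a non-numeric duration string.
        return None


def exceeds_limits(
    duration_s: float | None,
    size_bytes: int,
    max_duration_s: int,
    max_size_bytes: int,
) -> bool:
    """Hypothetical limit check: an unknown duration is skipped, the size cap still applies."""
    too_long = duration_s is not None and duration_s > max_duration_s
    return too_long or size_bytes > max_size_bytes
```

Whether an unknown duration should pass or be rejected is a policy choice; this sketch keeps the permissive behaviour of the code above and only makes it explicit.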
@@ -0,0 +1,123 @@
+"""CLI (typer). `detect`/`transcribe` (implemented) + bench/serve (stubs). Spec §Deployment."""
+from __future__ import annotations
+
+import typer
+from rich.console import Console
+from rich.table import Table
+
+from .devices import DeviceManager
+
+app = typer.Typer(add_completion=False, help="luke_scribe: local STT transcription (hardware-adaptive)")
+console = Console()
+
+
+@app.command()
+def detect(
+    device: str = typer.Option("auto", help="auto|cpu|cuda"),
+    compute_type: str = typer.Option(None, "--compute-type", help="Force compute_type (float16|int8|int8_float16)"),
+    workers: int = typer.Option(None, help="Override the worker count"),
+) -> None:
+    """Detect hardware → estimate capability tier (T0~T3) / precision / worker count (AC-2/3; static estimate until measured)."""
+    profile = DeviceManager.detect(
+        force_device=(None if device == "auto" else device),
+        force_compute_type=compute_type,
+        workers_override=workers,
+    )
+    table = Table(title="luke_scribe · device profile", show_header=False, title_style="bold cyan")
+    table.add_row("device", f"{profile.kind}  ({profile.name})")
+    if profile.compute_capability:
+        table.add_row("compute capability", profile.compute_capability)
+    if profile.vram_total_mb:
+        table.add_row("VRAM (free/total)", f"{profile.vram_free_mb} / {profile.vram_total_mb} MB")
+    table.add_row("RAM", f"{profile.ram_total_mb} MB")
+    table.add_row("disk free", f"{profile.disk_free_mb} MB")
+    table.add_row("compute_type", profile.compute_type)
+    table.add_row("capability tier", f"[bold]{profile.tier.value}[/]")
+    table.add_row("max workers", str(profile.max_workers))
+    for lane, model in profile.served_models.items():
+        table.add_row(f"served · {lane}", model)
+    table.add_row("measured", "yes" if profile.measured else "no (static estimate)")
+    console.print(table)
+    for note in profile.notes:
+        console.print(f"  • {note}", style="yellow")
+
+
+def _todo(name: str, hint: str = "") -> None:
+    console.print(f"[yellow]'{name}' is not implemented yet (P1 in progress).[/] {hint}")
+    raise typer.Exit(code=1)
+
+
+@app.command()
+def transcribe(
+    file: str = typer.Argument(..., help="Audio/video file"),
+    model: str = typer.Option(None, help="Model override (default: the realtime model). tiny|base|large-v3|large-v3-turbo"),
+    language: str = typer.Option(None, help="Language (default: configured value). 'auto' is allowed"),
+    device: str = typer.Option("auto", help="auto|cpu|cuda"),
+    word_timestamps: bool = typer.Option(False, "--word-timestamps"),
+    vad: bool = typer.Option(True, "--vad/--no-vad", help="Filter out silence"),
+    timestamps: bool = typer.Option(False, "--timestamps", help="Show segment [start–end] times"),
+) -> None:
+    """One-shot file transcription (faster-whisper, automatic CPU/GPU; part of AC-4)."""
+    from .config import settings
+
+    try:
+        from .audio.ingest import probe_media
+        from .engine.faster_whisper_engine import FasterWhisperEngine
+    except ImportError as exc:
+        console.print(f"[red]Engine not installed:[/] {exc}\n→ Run `uv sync --extra engine` and try again.")
+        raise typer.Exit(code=1) from exc
+
+    try:
+        info = probe_media(file)
+    except FileNotFoundError:
+        console.print(f"[red]File not found:[/] {file}")
+        raise typer.Exit(code=1) from None
+
+    if info.duration_s > settings.max_duration_s or info.size_bytes > settings.max_size_bytes:
+        console.print(
+            f"[red]Input limit exceeded (413):[/] {info.duration_s:.0f}s / {info.size_bytes}B "
+            f"(limits {settings.max_duration_s}s / {settings.max_size_bytes}B)"
+        )
+        raise typer.Exit(code=1)
+
+    profile = DeviceManager.detect(force_device=(None if device == "auto" else device))
+    dev = "cpu" if profile.kind == "cpu" else "cuda"
+    model_name = model or settings.model_realtime
+    lang = language or settings.language
+    console.print(
+        f"[dim]model={model_name} device={dev} compute={profile.compute_type} "
+        f"lang={lang} dur={info.duration_s:.1f}s[/]"
+    )
+
+    engine = FasterWhisperEngine(model_name, dev, profile.compute_type, cache_dir=settings.model_cache_dir)
+    segments, tinfo = engine.transcribe(file, language=lang, word_timestamps=word_timestamps, vad=vad)
+
+    count = 0
+    for seg in segments:
+        count += 1
+        if timestamps:
+            console.print(f"[cyan][{seg.start:6.2f}–{seg.end:6.2f}][/] {seg.text.strip()}")
+        else:
+            console.print(seg.text.strip())
+    detected = getattr(tinfo, "language", None)
+    console.print(f"[green]✓ {count} segments · detected_lang={detected} · model_used={model_name}[/]")
+
+
+@app.command()
+def bench(samples: str = typer.Option(None, help="Directory of labeled KO+EN samples")) -> None:
+    """turbo vs large-v3 domain benchmark gate (once a sample set is available)."""
+    _todo("bench", "→ needs a labeled set under samples/")
+
+
+@app.command()
+def serve() -> None:
+    """API server (P2)."""
+    _todo("serve", "→ P2 (FastAPI + Redis/RQ)")
+
+
+def main() -> None:
+    app()
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,38 @@
+"""Runtime settings, overridable via env (`SCRIBE_*`) or `.env`. Spec §config."""
+from __future__ import annotations
+
+from pydantic_settings import BaseSettings, SettingsConfigDict
+
+
+class Settings(BaseSettings):
+    model_config = SettingsConfigDict(env_prefix="SCRIBE_", env_file=".env", extra="ignore")
+
+    # Models (per-path defaults, hybrid; may be unified to turbo only, depending on the P1 bench results)
+    model_realtime: str = "large-v3-turbo"
+    model_batch: str = "large-v3"
+
+    # Device (auto|cpu|cuda|cuda:0): auto-estimated by the Device Manager; can be forced
+    device: str = "auto"
+    compute_type: str | None = None      # None = auto (based on cc/VRAM)
+    workers: int | None = None           # None = auto-estimated
+
+    # Language (default ko; overridable per request)
+    language: str = "ko"
+
+    # Hard input limits (exceeding them → 413)
+    max_duration_s: int = 4 * 3600       # 4h
+    max_size_bytes: int = 2 * 1024 * 1024 * 1024  # 2GB
+
+    # Retention/queue/auth (P2+)
+    retention_days: int = 7
+    redis_url: str | None = None
+    api_keys: list[str] = []
+
+    # Tunnel (P5)
+    tunnel: str = "none"                 # none|cloudflare|ngrok
+
+    # Model cache directory (None = Hugging Face default)
+    model_cache_dir: str | None = None
+
+
+settings = Settings()
@@ -0,0 +1,5 @@
+"""Device Manager: GPU/CPU detection → capability tier / precision / worker-count estimation (spec §6, plan §3.6)."""
+from .manager import DeviceManager
+from .profile import CapabilityTier, DeviceProfile
+
+__all__ = ["DeviceManager", "DeviceProfile", "CapabilityTier"]
@@ -0,0 +1,125 @@
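+"""DeviceManager: detection → precision / capability tier / worker-count estimation (plan §3.6, AC-2/3).
+
+Currently a static estimate (conservative constants). Follow-up: replace with a measured one-time model load at boot (`measured=True`).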
+"""
+from __future__ import annotations
+
+import os
+
+from .profile import HEADROOM, MODEL_FOOTPRINT_MB, CapabilityTier, DeviceProfile
+from .vram_probe import GpuInfo, probe_disk_free_mb, probe_gpus, probe_ram_mb
+
+TURBO = "large-v3-turbo"
+V3 = "large-v3"
+
+
+def _select_compute_type(cc: tuple[int, int], free_mb: int) -> str:
+    """Automatic precision selection (plan §3.6)."""
+    major = cc[0]
+    if major >= 7:  # Volta+: fp16 is efficient
+        return "float16" if free_mb >= 12000 else "int8_float16"
+    if major == 6:  # Pascal (e.g. GTX 1050): fp16 is inefficient → int8
+        return "int8"
+    return "int8"
+
+
+def _fits(model: str, ct: str, free_mb: int) -> bool:
+    fp = MODEL_FOOTPRINT_MB.get((model, ct))
+    return fp is not None and fp * HEADROOM <= free_mb
+
+
+def _both_fit(ct: str, free_mb: int) -> bool:
+    a = MODEL_FOOTPRINT_MB.get((TURBO, ct))
+    b = MODEL_FOOTPRINT_MB.get((V3, ct))
+    return a is not None and b is not None and (a + b) * HEADROOM <= free_mb
+
+
+def _cpu_workers(override: int | None) -> int:
+    return override or max(1, (os.cpu_count() or 2) // 4)
+
+
+def _cpu_profile(
+    *, name: str, ram: int, disk: int, override: int | None,
+    gpu: GpuInfo | None = None, notes: list[str] | None = None,
+) -> DeviceProfile:
+    return DeviceProfile(
+        kind="cpu",
+        name=name,
+        compute_capability=(f"{gpu.compute_capability[0]}.{gpu.compute_capability[1]}" if gpu else None),
+        vram_total_mb=(gpu.vram_total_mb if gpu else 0),
+        vram_free_mb=(gpu.vram_free_mb if gpu else 0),
+        ram_total_mb=ram,
+        disk_free_mb=disk,
+        compute_type="int8",
+        tier=CapabilityTier.T0_CPU,
+        max_workers=_cpu_workers(override),
+        served_models={"realtime": f"{TURBO}@cpu", "batch": f"{TURBO}@cpu"},
+        notes=(notes or []) + ["large-v3 not served on GPU (CPU path)"],
+    )
+
+
+class DeviceManager:
+    @staticmethod
+    def detect(
+        force_device: str | None = None,
+        force_compute_type: str | None = None,
+        workers_override: int | None = None,
+    ) -> DeviceProfile:
+        ram = probe_ram_mb()
+        disk = probe_disk_free_mb(".")
+        gpus = probe_gpus()
+
+        # Forced CPU or no GPU → T0
+        if force_device == "cpu" or not gpus:
+            note = (
+                "GPU detected, but --device cpu was forced" if (force_device == "cpu" and gpus)
+                else "No GPU detected → CPU"
+            )
+            return _cpu_profile(name="CPU", ram=ram, disk=disk, override=workers_override, notes=[note])
+
+        gpu = gpus[0]
+        cc = gpu.compute_capability
+        ct = force_compute_type or _select_compute_type(cc, gpu.vram_free_mb)
+
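+        # If even turbo does not fit on the GPU → downgrade to CPU (T0)
+        if not _fits(TURBO, ct, gpu.vram_free_mb):
+            need = int(MODEL_FOOTPRINT_MB.get((TURBO, ct), 0) * HEADROOM)  # .get(): a forced compute_type outside the table must not raise KeyError (need is then 0)
+            return _cpu_profile(
+                name=f"CPU (GPU={gpu.name}: insufficient VRAM)", ram=ram, disk=disk,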
+                override=workers_override, gpu=gpu,
+                notes=[f"{gpu.name} free {gpu.vram_free_mb}MB < turbo {need}MB(헤드룸 포함) → CPU 강등(T0)"],
+            )
+
+        # turbo fits on the GPU → branch the tier on whether large-v3 also fits
+        notes: list[str] = []
+        if not _fits(V3, ct, gpu.vram_free_mb):
+            tier = CapabilityTier.T1_TURBO_GPU
+            served = {"realtime": f"{TURBO}@cuda", "batch": f"{TURBO}@cuda"}
+            notes.append("large-v3 not served → batch also uses turbo")
+        elif not _both_fit(ct, gpu.vram_free_mb):
+            tier = CapabilityTier.T2_SWAP
+            served = {"realtime": f"{TURBO}@cuda", "batch": f"{V3}@cuda (swap)"}
+            notes.append("turbo and large-v3 cannot be co-resident → load/unload per call")
+        else:
+            tier = CapabilityTier.T3_CORESIDENT
+            served = {"realtime": f"{TURBO}@cuda", "batch": f"{V3}@cuda"}
+
+        # Worker count = floor((free - reserve) / per_worker); reserve = headroom for the resident model
+        per_worker = MODEL_FOOTPRINT_MB[(TURBO, ct)]
+        reserve = int(per_worker * (HEADROOM - 1.0))
+        est = max(1, (gpu.vram_free_mb - reserve) // per_worker)
+
+        return DeviceProfile(
+            kind="cuda",
+            name=gpu.name,
+            compute_capability=f"{cc[0]}.{cc[1]}",
+            vram_total_mb=gpu.vram_total_mb,
+            vram_free_mb=gpu.vram_free_mb,
+            ram_total_mb=ram,
+            disk_free_mb=disk,
+            compute_type=ct,
+            tier=tier,
+            max_workers=workers_override or est,
+            served_models=served,
+            notes=notes,
+        )
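
The tier thresholds and the worker estimate in `DeviceManager.detect` reduce to a few comparisons against the footprint table. The sketch below restates that arithmetic in isolation so the numbers used in the unit tests can be checked by hand. The constants are copied from `devices/profile.py` in this change; `classify` and `estimate_workers` are hypothetical helper names, not functions in the repository.

```python
from __future__ import annotations

# Static estimates in MB, copied from devices/profile.py in this change.
MODEL_FOOTPRINT_MB = {
    ("large-v3", "float16"): 10000,
    ("large-v3", "int8_float16"): 5500,
    ("large-v3", "int8"): 3500,
    ("large-v3-turbo", "float16"): 4000,
    ("large-v3-turbo", "int8_float16"): 2400,
    ("large-v3-turbo", "int8"): 1800,
}
HEADROOM = 1.3


def classify(ct: str, free_mb: int) -> str:
    """Hypothetical helper: apply the same thresholds and return only the tier name."""
    turbo = MODEL_FOOTPRINT_MB[("large-v3-turbo", ct)]
    v3 = MODEL_FOOTPRINT_MB[("large-v3", ct)]
    if turbo * HEADROOM > free_mb:
        return "T0_CPU"          # even turbo does not fit
    if v3 * HEADROOM > free_mb:
        return "T1_TURBO_GPU"    # turbo fits, large-v3 does not
    if (turbo + v3) * HEADROOM > free_mb:
        return "T2_SWAP"         # each fits alone, not together
    return "T3_CORESIDENT"


def estimate_workers(ct: str, free_mb: int) -> int:
    """Hypothetical helper: floor((free - reserve) / per_worker), at least 1."""
    per_worker = MODEL_FOOTPRINT_MB[("large-v3-turbo", ct)]
    reserve = int(per_worker * (HEADROOM - 1.0))
    return max(1, (free_mb - reserve) // per_worker)
```

For example, with `float16` and 16000 MB free, turbo needs 5200 MB and large-v3 needs 13000 MB with headroom, but the pair needs 18200 MB, which is why that case lands in `T2_SWAP`.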
@@ -0,0 +1,46 @@
+"""DeviceProfile model + capability tiers + conservative per-model VRAM constants (plan §3.6)."""
+from __future__ import annotations
+
+from enum import Enum
+
+from pydantic import BaseModel, Field
+
+
+class CapabilityTier(str, Enum):
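+    """Assigned automatically from the boot-time hardware probe; the tier decides which models can be served (never a silent downgrade)."""
+
+    T0_CPU = "T0_CPU"            # turbo does not fit on the GPU, or no GPU → turbo@CPU
+    T1_TURBO_GPU = "T1_TURBO_GPU"  # turbo fits on the GPU, large-v3 does not (batch also uses turbo)
+    T2_SWAP = "T2_SWAP"            # large-v3 fits, but not co-resident with turbo → load/unload
+    T3_CORESIDENT = "T3_CORESIDENT"  # turbo + large-v3 can be loaded at the same time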
+
+
+# Conservative default constants (MB), used as a fallback before measurement. Plan §3.6.
+# (To be replaced by an actual load measurement at boot: vram_probe --probe-load)
+MODEL_FOOTPRINT_MB: dict[tuple[str, str], int] = {
+    ("large-v3", "float16"): 10000,
+    ("large-v3", "int8_float16"): 5500,
+    ("large-v3", "int8"): 3500,
+    ("large-v3-turbo", "float16"): 4000,
+    ("large-v3-turbo", "int8_float16"): 2400,
+    ("large-v3-turbo", "int8"): 1800,
+}
+HEADROOM = 1.3  # load headroom multiplier
+
+
+class DeviceProfile(BaseModel):
+    """Detection results + derived values. Exposed as-is by /v1/system and detect."""
+
+    kind: str                              # "cuda" | "cpu"
+    name: str
+    compute_capability: str | None = None
+    vram_total_mb: int = 0
+    vram_free_mb: int = 0
+    ram_total_mb: int = 0
+    disk_free_mb: int = 0
+    compute_type: str
+    tier: CapabilityTier
+    max_workers: int = 1
+    served_models: dict[str, str] = Field(default_factory=dict)  # {"realtime":..., "batch":...}
+    measured: bool = False                 # True = measured with a model load, False = static estimate
+    notes: list[str] = Field(default_factory=list)
@@ -0,0 +1,72 @@
+"""Hardware probing: GPU (NVML) / RAM / disk. Degrades gracefully to empty results when a dependency or GPU is missing."""
+from __future__ import annotations
+
+import shutil
+from dataclasses import dataclass
+
+
+@dataclass
+class GpuInfo:
+    index: int
+    name: str
+    compute_capability: tuple[int, int]
+    vram_total_mb: int
+    vram_free_mb: int
+
+
+def probe_gpus() -> list[GpuInfo]:
+    """Probe the GPU list, VRAM, and compute capability via NVML. Returns [] if unavailable."""
+    try:
+        import pynvml  # nvidia-ml-py
+    except ImportError:
+        return []
+    try:
+        pynvml.nvmlInit()
+    except Exception:
+        return []
+
+    gpus: list[GpuInfo] = []
+    try:
+        for i in range(pynvml.nvmlDeviceGetCount()):
+            h = pynvml.nvmlDeviceGetHandleByIndex(i)
+            name = pynvml.nvmlDeviceGetName(h)
+            if isinstance(name, bytes):
+                name = name.decode()
+            mem = pynvml.nvmlDeviceGetMemoryInfo(h)
+            try:
+                major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(h)
+            except Exception:
+                major, minor = (0, 0)
+            gpus.append(
+                GpuInfo(
+                    index=i,
+                    name=name,
+                    compute_capability=(major, minor),
+                    vram_total_mb=int(mem.total // (1024 * 1024)),
+                    vram_free_mb=int(mem.free // (1024 * 1024)),
+                )
+            )
+    except Exception:
+        return []
+    finally:
+        try:
+            pynvml.nvmlShutdown()
+        except Exception:
+            pass
+    return gpus
+
+
+def probe_ram_mb() -> int:
+    try:
+        import psutil
+
+        return int(psutil.virtual_memory().total // (1024 * 1024))
+    except Exception:
+        return 0
+
+
+def probe_disk_free_mb(path: str = ".") -> int:
+    try:
+        return int(shutil.disk_usage(path).free // (1024 * 1024))
+    except Exception:
+        return 0
@@ -0,0 +1,5 @@
+"""Inference engine: a single faster-whisper (CTranslate2) engine behind a thin abstraction (plan §3 D3)."""
+from .faster_whisper_engine import FasterWhisperEngine
+from .model_registry import resolve_model
+
+__all__ = ["FasterWhisperEngine", "resolve_model"]
@@ -0,0 +1,55 @@
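+"""faster-whisper (CTranslate2) engine wrapper (spec §2 / plan §4-3).
+
+faster-whisper decodes with PyAV internally, so a file path (audio or video) is passed through as-is.
+segments is a generator: the caller consumes it, which allows progress/cancellation checks (P2).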
+"""
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any
+
+from .model_registry import resolve_model
+
+if TYPE_CHECKING:
+    from collections.abc import Iterable
+
+
+class FasterWhisperEngine:
+    def __init__(
+        self,
+        model_name: str,
+        device: str,
+        compute_type: str,
+        cache_dir: str | None = None,
+    ) -> None:
+        from faster_whisper import WhisperModel
+
+        self.model_name = model_name
+        self.device = device
+        self.compute_type = compute_type
+        self.model = WhisperModel(
+            resolve_model(model_name),
+            device=device,
+            compute_type=compute_type,
+            download_root=cache_dir,
+        )
+
+    def transcribe(
+        self,
+        audio: str,
+        *,
+        language: str | None = "ko",
+        word_timestamps: bool = False,
+        vad: bool = True,
+        hotwords: list[str] | None = None,
+        initial_prompt: str | None = None,
+        beam_size: int = 5,
+    ) -> tuple[Iterable[Any], Any]:
+        return self.model.transcribe(
+            audio,
+            language=(None if language in (None, "auto") else language),
+            word_timestamps=word_timestamps,
+            vad_filter=vad,
+            hotwords=(" ".join(hotwords) if hotwords else None),
+            initial_prompt=initial_prompt,
+            beam_size=beam_size,
+        )
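
Because `transcribe` returns a lazy generator of segments, progress reporting and cancellation can be layered on at the point of consumption. The following is a minimal sketch of that idea, not the planned P2 implementation. `Segment` is a stand-in dataclass with the `start`/`end`/`text` fields the CLI reads, and `consume` and `is_cancelled` are hypothetical names.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass


@dataclass
class Segment:
    """Stand-in for an engine segment (only the fields used below)."""
    start: float
    end: float
    text: str


def consume(
    segments: Iterable[Segment],
    duration_s: float,
    is_cancelled: Callable[[], bool],
) -> Iterator[tuple[float, str]]:
    """Yield (progress, text) per segment; stop pulling segments once cancelled."""
    for seg in segments:
        if is_cancelled():
            break  # remaining segments are left unconsumed
        # Progress is the segment end time relative to the probed duration.
        progress = min(1.0, seg.end / duration_s) if duration_s > 0 else 0.0
        yield progress, seg.text.strip()
```

A caller could pass the `duration_s` from `probe_media` and a cancellation flag owned by the job. Note that with an unknown duration (`0.0`) this sketch reports progress as `0.0` throughout.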
@@ -0,0 +1,16 @@
+"""Logical model name → faster-whisper (CT2) identifier (plan §4-3).
+
+Standard sizes (tiny/base/small/medium/large-v3) pass through unchanged.
+turbo variants map to a verified CT2 conversion repo.
+"""
+from __future__ import annotations
+
+_MODEL_IDS: dict[str, str] = {
+    "large-v3-turbo": "deepdml/faster-whisper-large-v3-turbo-ct2",
+    "turbo": "deepdml/faster-whisper-large-v3-turbo-ct2",
+    "large-v3": "large-v3",
+}
+
+
+def resolve_model(name: str) -> str:
+    return _MODEL_IDS.get(name, name)
@@ -0,0 +1,79 @@
+"""Device Manager decision logic: capability tier / precision / overrides (plan §8 unit).
+
+Real hardware only exercises T0, so T1~T3 are verified with synthetic VRAM values.
+"""
+from __future__ import annotations
+
+from luke_scribe.devices import manager as m
+from luke_scribe.devices.manager import DeviceManager
+from luke_scribe.devices.profile import CapabilityTier
+from luke_scribe.devices.vram_probe import GpuInfo
+
+
+def _patch(monkeypatch, gpus: list[GpuInfo]) -> None:
+    monkeypatch.setattr(m, "probe_gpus", lambda: gpus)
+    monkeypatch.setattr(m, "probe_ram_mb", lambda: 16000)
+    monkeypatch.setattr(m, "probe_disk_free_mb", lambda path=".": 100000)
+
+
+def _gpu(cc: tuple[int, int], free: int, name: str = "TestGPU") -> GpuInfo:
+    return GpuInfo(0, name, cc, free + 100, free)
+
+
+def test_no_gpu_is_t0_cpu(monkeypatch):
+    _patch(monkeypatch, [])
+    p = DeviceManager.detect()
+    assert p.kind == "cpu"
+    assert p.tier == CapabilityTier.T0_CPU
+    assert p.compute_type == "int8"
+
+
+def test_weak_pascal_downgrades_to_cpu(monkeypatch):
+    # GTX 1050: cc6.1, free 1990 → short of turbo (int8, 2340MB with headroom) → CPU downgrade
+    _patch(monkeypatch, [_gpu((6, 1), 1990, "GTX 1050")])
+    p = DeviceManager.detect()
+    assert p.tier == CapabilityTier.T0_CPU
+    assert p.kind == "cpu"
+    assert p.vram_free_mb == 1990  # GPU info is preserved (transparency)
+    assert any("강등" in n for n in p.notes)
+
+
+def test_t1_turbo_only(monkeypatch):
+    # cc7.5, free 6000 → int8_float16; turbo fits, large-v3 does not
+    _patch(monkeypatch, [_gpu((7, 5), 6000)])
+    p = DeviceManager.detect()
+    assert p.tier == CapabilityTier.T1_TURBO_GPU
+    assert p.compute_type == "int8_float16"
+    assert p.served_models["batch"].startswith("large-v3-turbo")
+
+
+def test_t2_swap(monkeypatch):
+    # cc7.5, free 16000 → float16; turbo and large-v3 each fit, but not co-resident
+    _patch(monkeypatch, [_gpu((7, 5), 16000)])
+    p = DeviceManager.detect()
+    assert p.tier == CapabilityTier.T2_SWAP
+    assert p.compute_type == "float16"
+    assert "swap" in p.served_models["batch"]
+
+
+def test_t3_coresident(monkeypatch):
+    # A100-class: cc8.0, free 40000 → float16; turbo + large-v3 co-resident
+    _patch(monkeypatch, [_gpu((8, 0), 40000, "A100")])
+    p = DeviceManager.detect()
+    assert p.tier == CapabilityTier.T3_CORESIDENT
+    assert p.compute_type == "float16"
+    assert p.served_models["batch"] == "large-v3@cuda"
+    assert p.max_workers >= 1
+
+
+def test_force_cpu_override(monkeypatch):
+    _patch(monkeypatch, [_gpu((8, 0), 40000)])
+    p = DeviceManager.detect(force_device="cpu")
+    assert p.tier == CapabilityTier.T0_CPU
+    assert p.kind == "cpu"
+
+
+def test_workers_override(monkeypatch):
+    _patch(monkeypatch, [_gpu((8, 0), 40000)])
+    p = DeviceManager.detect(workers_override=3)
+    assert p.max_workers == 3
@@ -0,0 +1,23 @@
+"""Lightweight unit tests for engine.model_registry / audio.ingest (no model load needed)."""
+from __future__ import annotations
+
+import pytest
+
+from luke_scribe.audio.ingest import probe_media
+from luke_scribe.engine.model_registry import resolve_model
+
+
+def test_resolve_model_turbo_maps_to_ct2_repo():
+    expected = "deepdml/faster-whisper-large-v3-turbo-ct2"
+    assert resolve_model("large-v3-turbo") == expected
+    assert resolve_model("turbo") == expected
+
+
+def test_resolve_model_standard_passthrough():
+    assert resolve_model("tiny") == "tiny"
+    assert resolve_model("large-v3") == "large-v3"
+
+
+def test_probe_media_missing_raises():
+    with pytest.raises(FileNotFoundError):
+        probe_media("/no/such/file.wav")
Author	SHA1	Message	Date
lukehemmin	518c03174a	chore(omc): record P1 progress note (engine+transcribe) + hotpaths Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 15:08:07 +09:00
lukehemmin	73380bebf9	feat(p1): faster-whisper engine + audio ingest + transcribe (CPU verified) - engine/: FasterWhisperEngine wrapper + model_registry (turbo→CT2 repo) - audio/ingest.py: ffprobe duration/size probe + 413 limit hook - cli transcribe: device-auto, model override, 413 guard, model_used output - 3 unit tests (resolve_model, probe_media); README updated. Verified (CPU): JFK 11s clip → accurate transcription, detected_lang=en. 10 tests pass, ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 15:07:41 +09:00
lukehemmin	d75d60671e	chore(omc): seed build commands + hotpaths from P1 scaffolding Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 12:56:07 +09:00
lukehemmin	5d2604105b	feat(p1): scaffolding + Device Manager / VRAM probe + CLI detect - pyproject (uv, src layout) + extras: engine/gpu/api/diarize/llm - config.py (pydantic-settings, SCRIBE_ env) - devices/: vram_probe (NVML/psutil/disk) + DeviceManager → capability tier T0–T3, precision by cc/VRAM, worker estimate (plan §3.6, AC-2/3) - cli.py (typer): detect (implemented) + transcribe/bench/serve (stubs) - run.sh, .env.example, README Verified on GTX 1050/2GB: detect → T0_CPU (turbo doesn't fit → explicit downgrade, fail-explicit). Overrides (--device/--workers) work. 7 unit tests cover T0–T3 + overrides via synthetic VRAM. ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 12:56:07 +09:00