
lukehemmin b721ca6419 feat(api): chunk LLM correction for small context windows (+running glossary)

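To fit the in-house GPT-4o context window (<30k), split long transcripts into
chunks at sentence boundaries, and pass the English terms from each corrected
chunk to the next chunk's system prompt as a 'running glossary' →
terminology stays consistent across a whole lecture without a large window.
config.llm_max_chars (default 3000; ~8k window → 1500 / ~16k → 3000 / ~30k → 6000).
Safety net: a single oversized sentence is force-split by character count.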

23 tests pass (including chunk splitting and glossary injection), ruff clean.
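
The chunking and running-glossary flow described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in src/luke_scribe: the function names (`split_chunks`, `update_glossary`, `build_system_prompt`, `correct_transcript`), the regexes, and the `correct_chunk` callable that stands in for the GPT-4o call are all assumptions. Only the `llm_max_chars` default of 3000 comes from the message itself.

```python
import re
from typing import Callable

# Mirrors the commit's config.llm_max_chars default; everything else is hypothetical.
LLM_MAX_CHARS = 3000

# Sentence boundary: whitespace that follows ., !, ? or a CJK full stop.
_SENTENCE_END = re.compile(r"(?<=[.!?。])\s+")
# Rough "English term" pattern: ASCII words of 2+ chars, allowing digits and hyphens.
_EN_TERM = re.compile(r"[A-Za-z][A-Za-z0-9\-]+")


def split_chunks(text: str, max_chars: int = LLM_MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        # Safety net: a single oversized sentence is cut at fixed character offsets.
        if len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(
                sentence[i:i + max_chars] for i in range(0, len(sentence), max_chars)
            )
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def update_glossary(glossary: list[str], corrected: str) -> list[str]:
    """Append English terms found in a corrected chunk, keeping first-seen order."""
    seen = set(glossary)
    merged = list(glossary)
    for term in _EN_TERM.findall(corrected):
        if term not in seen:
            seen.add(term)
            merged.append(term)
    return merged


def build_system_prompt(base: str, glossary: list[str]) -> str:
    """Attach the running glossary to the system prompt for the next chunk."""
    if not glossary:
        return base
    return f"{base}\nKnown terms (keep spelling consistent): {', '.join(glossary)}"


def correct_transcript(
    text: str,
    correct_chunk: Callable[[str, str], str],  # (system_prompt, chunk) -> corrected text
    max_chars: int = LLM_MAX_CHARS,
    base_prompt: str = "Correct this transcript chunk.",
) -> str:
    """Correct chunk by chunk, carrying the glossary forward between calls."""
    glossary: list[str] = []
    parts: list[str] = []
    for chunk in split_chunks(text, max_chars):
        corrected = correct_chunk(build_system_prompt(base_prompt, glossary), chunk)
        parts.append(corrected)
        glossary = update_glossary(glossary, corrected)
    return " ".join(parts)
```

Because each call sees only one chunk plus a short term list, the prompt size stays bounded by `max_chars` no matter how long the lecture is. A real glossary would likely need a size cap as well, which this sketch omits.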

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-09 07:09:51 +09:00

.claude

chore: initial commit — planning docs and omc project context

2026-06-07 10:08:17 +09:00

.omc

chore(omc): record GPT-4o correction finding + P2 API progress (hotpaths)

2026-06-08 23:20:01 +09:00

samples

chore: scaffold samples/ko_en/ (clips/ + manifest template)

2026-06-07 15:14:25 +09:00

scripts

feat(api): sync test API (serve) + opt-in LLM correction + cloudflared tunnel

2026-06-08 23:20:01 +09:00

src/luke_scribe

feat(api): chunk LLM correction for small context windows (+running glossary)

2026-06-09 07:09:51 +09:00

tests

feat(api): chunk LLM correction for small context windows (+running glossary)

2026-06-09 07:09:51 +09:00

.env.example

feat(p1): scaffolding + Device Manager / VRAM probe + CLI detect

2026-06-07 12:56:07 +09:00

.gitignore

docs: add samples/ bench dataset spec (KO+EN) + broaden audio gitignore

2026-06-07 15:12:20 +09:00

opencode.json

chore: initial commit — planning docs and omc project context

2026-06-07 10:08:17 +09:00

pyproject.toml

feat(api): sync test API (serve) + opt-in LLM correction + cloudflared tunnel

2026-06-08 23:20:01 +09:00

README.md

feat(p1): faster-whisper engine + audio ingest + transcribe (CPU verified)

2026-06-07 15:07:41 +09:00

run.sh

feat(p1): scaffolding + Device Manager / VRAM probe + CLI detect

2026-06-07 12:56:07 +09:00

uv.lock

feat(api): sync test API (serve) + opt-in LLM correction + cloudflared tunnel

2026-06-08 23:20:01 +09:00

README.md

luke_scribe

Internal local STT transcription API, built on faster-whisper (CTranslate2) and hardware-adaptive. A single Job abstraction handles both batch (file/video) and real-time (WebSocket) workloads.

Design single source of truth (SoT): .omc/plans/consensus-luke-scribe-stt-api.md, .omc/specs/deep-interview-luke-scribe-stt-api.md

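Status

Design complete (ambiguity ~5%) · P1 implementation in progress (greenfield).

Quick start (development)

uv sync                                            # core dependencies
uv run luke-scribe detect                          # hardware detection → capability tier / precision / worker count
uv sync --extra engine                             # engine (faster-whisper)
uv run luke-scribe transcribe FILE --model tiny    # one-shot transcription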

CLI

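Command	Description	Status
`detect`	Hardware detection: capability tier (T0~T3), precision, worker count	✅ P1
`transcribe <file>`	One-shot file transcription (faster-whisper, CPU/GPU)	✅ P1
`bench`	turbo vs large-v3 domain benchmark (gate)	⏳ P1 (sample set required)
`serve`	API server	⏳ P2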