luke_scribe

lukehemmin/luke_scribe

Fork 0

Commit Graph


Author

SHA1

Message

Date

lukehemmin

b721ca6419

feat(api): chunk LLM correction for small context windows (+running glossary)

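To fit the in-house GPT-4o context (<30k), split long transcripts into chunks
at sentence boundaries, and pass the English terms from each chunk's correction
as a 'running glossary' into the next chunk's system message → terminology stays
consistent across the whole lecture without a large window. config.llm_max_chars
(default 3000; ~8k window→1500 / ~16k→3000 / ~30k→6000). An oversized single
sentence falls back to a forced character-level split as a safety net.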
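
The chunking and running-glossary flow described above could look roughly like the sketch below. It is illustrative only and not the code from this commit: `split_into_chunks`, `correct_with_glossary`, the `correct_chunk` callback, and both regular expressions are hypothetical names and choices, and the real sentence-boundary and term-extraction rules may differ.

```python
import re
from typing import Callable, List

# Assumed sentence boundary: whitespace following ., !, ? or an ideographic full stop.
_SENTENCE_END = re.compile(r"(?<=[.!?。])\s+")
# Assumed definition of an "English term": a run of ASCII letters, digits, or hyphens.
_ENGLISH_TERM = re.compile(r"[A-Za-z][A-Za-z0-9\-]+")


def split_into_chunks(text: str, max_chars: int = 3000) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        # Safety net: a single sentence longer than the limit is cut by characters.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def correct_with_glossary(
    text: str,
    correct_chunk: Callable[[str, List[str]], str],
    max_chars: int = 3000,
) -> str:
    """Correct chunk by chunk, feeding terms seen so far into the next call."""
    glossary: List[str] = []
    corrected: List[str] = []
    for chunk in split_into_chunks(text, max_chars):
        # correct_chunk stands in for the LLM call; it receives the glossary
        # that would be placed in the system message for this chunk.
        fixed = correct_chunk(chunk, list(glossary))
        for term in _ENGLISH_TERM.findall(fixed):
            if term not in glossary:
                glossary.append(term)
        corrected.append(fixed)
    return " ".join(corrected)
```

With this shape, the LLM client stays outside the chunking logic, so the split and glossary injection can be unit-tested with a stub callback.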

23 tests pass (including chunk splitting and glossary injection), ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-09 07:09:51 +09:00

lukehemmin

8f6f8969fd

feat(api): sync test API (serve) + opt-in LLM correction + cloudflared tunnel

- api/: FastAPI app, X-API-Key auth (temporary key if unset), load-once engine pool
  (+transcribe lock), POST /v1/transcribe (multipart, synchronous), /health, /v1/system,
  /v1/models. Uploaded temp files are deleted in a finally block (privacy).
- postprocess/: llm.correct (promoted from scripts/llm_correct.py; opt-in, allowlist,
  audit log, retries) + rules.normalize (normalizes terms such as EmbeddingGemma).
- results/formats.py: txt/srt/vtt. connectivity/tunnel.py: cloudflared quick tunnel(Colab).
- cli serve: single-worker uvicorn + --tunnel cloudflare; config llm_* fields;
  pyproject api/queue extras split (+python-multipart, dev httpx).
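
As an illustration of the srt output mentioned for results/formats.py, a minimal sketch of SRT rendering follows. The function names `srt_timestamp` and `to_srt` and the `(start, end, text)` segment tuple are assumptions made for this example, not the module's actual interface.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms_total = max(0, round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```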

Verification: 22 unit tests (API TestClient, formats, postprocess) + live-server e2e
(/health, auth 401, real transcription (JFK), SRT, temp-file deletion). Korean quality
requires turbo/large-v3 (tiny degenerates on Korean).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-08 23:20:01 +09:00

2 Commits