luke_scribe/.omc/artifacts/ask/gemini-20260603-095739.md
lukehemmin fbe13dddcc chore: initial commit — planning docs and omc project context
Greenfield setup for luke_scribe (local STT transcription API). No source
code yet; this captures the completed design phase so teammates can ramp
through oh-my-claudecode.

Includes:
- .omc/plans/consensus-luke-scribe-stt-api.md — consensus impl plan v2.2
- .omc/specs/deep-interview-luke-scribe-stt-api.md — deep-interview spec
- .omc/artifacts/ask/{codex,gemini}-*.md — external review (CCG)
- .omc/project-memory.json — omc project memory
- opencode.json, .claude/settings.json — shared tooling config
- .gitignore — excludes ephemeral omc state/session logs and local settings

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 10:08:17 +09:00


Here is a prioritized, read-only review of the STT API design and documentation, analyzing both the Specification and the Consensus Plan.
### 1. REST + WebSocket API Design
The single `Job` abstraction and `options` payload are pragmatic, but the design has several structural and ergonomic gaps:
* **REST Gaps:**
    * **Webhooks vs. Polling:** For 4-hour video transcriptions, polling `GET /v1/jobs/{id}` as the only completion signal is wasteful and fragile. The `options` schema should accept an optional `webhook_url`.
    * **Idempotency & Pagination:** `POST /v1/jobs` for a 2GB file needs an `Idempotency-Key` header so that a retry after a transient network drop does not start a duplicate job. In addition, `GET /v1/jobs` has no pagination (cursor or offset), so its response grows without bound as jobs accumulate under heavy usage.
    * **Result TTL:** The 7-day retention policy is noted, but the API response on day 8 is undefined. It should explicitly be `410 Gone`, not `404 Not Found`.
* **WebSocket (WS) Protocol:**
    * **Auth Handshake:** The browser `WebSocket` API cannot set custom headers such as `X-API-Key` on the handshake request. The docs must specify passing the key via a query parameter (`?api_key=...`) or within the first JSON message payload.
    * **Codec Negotiation & Backpressure:** The WS schema lacks an explicit audio format declaration (e.g., sample rate, PCM16 vs. Opus). Additionally, while `429` handles REST queue overflow, WS backpressure is undefined; the protocol should define a signal such as `{"type": "error", "reason": "buffer_full"}`.
    * **Reconnection:** There is no mechanism for a client to resume a dropped WS session without losing the `LocalAgreement` context buffer. A `session_id` is required for mid-stream resumption.
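
The REST gaps above (idempotent job creation, cursor pagination, and `410 Gone` after retention) can be sketched as a small in-memory store. This is a minimal illustration, not the project's implementation: `JobStore`, its method names, the integer cursor, and the returned status codes are all assumptions made for the example, and a real service would persist this state.

```python
import itertools

RETENTION_SECONDS = 7 * 24 * 3600  # the 7-day retention policy noted in the review


class JobStore:
    """Hypothetical in-memory job store used only to illustrate the semantics."""

    def __init__(self):
        self.jobs = {}          # job_id -> record
        self.by_idem_key = {}   # Idempotency-Key value -> job_id
        self.expired = set()    # ids purged after the retention window
        self._ids = itertools.count(1)

    def create(self, idem_key, options, now):
        """POST /v1/jobs: a retry with the same key returns the original job."""
        existing = self.by_idem_key.get(idem_key)
        if existing in self.jobs:
            return 200, self.jobs[existing]
        job_id = next(self._ids)
        record = {"id": job_id, "options": options, "created_at": now}
        self.jobs[job_id] = record
        self.by_idem_key[idem_key] = job_id
        return 201, record

    def get(self, job_id, now):
        """GET /v1/jobs/{id}: 410 for expired jobs, 404 for ids never issued."""
        if job_id in self.expired:
            return 410, None
        record = self.jobs.get(job_id)
        if record is None:
            return 404, None
        if now - record["created_at"] > RETENTION_SECONDS:
            del self.jobs[job_id]
            self.expired.add(job_id)
            return 410, None
        return 200, record

    def list_jobs(self, after=0, limit=2):
        """GET /v1/jobs: return one page plus a cursor for the next page."""
        ids = sorted(i for i in self.jobs if i > after)
        page = ids[:limit]
        next_cursor = page[-1] if len(ids) > limit else None
        return [self.jobs[i] for i in page], next_cursor
```

Remembering expired ids separately is what lets the store distinguish "this job existed and was purged" from "this id was never issued", which is the distinction the `410` versus `404` point asks for.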
### 2. Documentation Clarity & Spec/Plan Contradictions
There is significant drift between the Spec and the Consensus Plan. An engineer implementing this will face contradictions that pose a deployment risk:
* **VRAM Sizing Drift (Critical):** The Spec estimates `large-v3 fp16` at ~6GB VRAM. The Plan correctly overrides this to **10GB** to account for conservative headroom and sequence length. The Spec must be updated to avoid engineers undersizing GPU instances.
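* **Queue Architecture:** The Spec loosely suggests "RQ/Celery". The Plan definitively locks in **RQ `SimpleWorker` (no-fork)** because RQ's default worker forks a child process per job and Celery's default prefork pool forks its workers, and a CUDA context initialized before an `os.fork()` is not usable in the child. An engineer who follows the Spec and picks Celery or the default RQ worker risks CUDA initialization failures as soon as a job touches the GPU.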
* **Worker Model:** The Spec implies a standard web-worker pool. The Plan enforces a strict "load-once per worker process" architecture to avoid VRAM fragmentation. This constraint must be elevated in the Spec.
### 3. Alternative Approaches
* **Queue Backend (Redis vs. SQLite):** While Redis/RQ is durable, it bloats the Docker and Colab footprint. **Alternative:** Since this is a local-first API running on a single box, using `taskiq` or `huey` backed by SQLite/file-system eliminates the Redis container entirely while maintaining durability.
* **Realtime Streaming:** Implementing custom `LocalAgreement-2` over `faster-whisper` (as planned) is notoriously brittle for edge cases (e.g., mid-word VAD slicing). **Alternative:** Adopt the C++ `whisper.cpp` streaming server natively via bindings, which handles VAD, context windowing, and memory stability much more efficiently than a custom Python implementation.
* **Model Weights Distribution:** Baking weights into the Docker image inflates the image size, and downloading them synchronously on boot can cause startup timeouts. **Alternative:** Use an init-container or volume mount for weights.
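
To make the SQLite-backed queue alternative concrete, here is a minimal sketch using only Python's standard `sqlite3` module. It is not how `taskiq` or `huey` are implemented; the table layout and the function names (`open_queue`, `enqueue`, `claim_next`, `finish`) are assumptions chosen for illustration.

```python
import json
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) the queue. Pass a file path for on-disk durability."""
    # isolation_level=None gives autocommit mode, so transactions are explicit.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued')"
    )
    return conn


def enqueue(conn, payload):
    """Store a job as JSON and return its id."""
    cur = conn.execute(
        "INSERT INTO jobs (payload) VALUES (?)", (json.dumps(payload),)
    )
    return cur.lastrowid


def claim_next(conn):
    """Mark the oldest queued job as running and return (id, payload), or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs"
            " WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],)
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else (row[0], json.loads(row[1]))


def finish(conn, job_id):
    """Mark a claimed job as done."""
    conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

Because the select and the status update happen inside one `BEGIN IMMEDIATE` transaction, two workers sharing the same database file cannot claim the same job. A production queue would also need retries and recovery of jobs left in `running` after a crash, which this sketch omits.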
### 4. Edge-Case Usability
* **First-Run Download Penalty:** A `large-v3` model can take minutes to download. A REST request that hits the API during a cold boot is likely to time out. The API needs a `status: "downloading_model"` state.
* **Colab URL Rotation:** `cloudflared` Quick Tunnels are ephemeral: each tunnel gets a random URL that changes whenever the tunnel restarts. If a client is polling a 4-hour job and the tunnel drops, the client loses its route to the job and the job is effectively orphaned. The CLI should enforce ngrok auth-tokens or webhooks for long-running batch jobs.
* **Multi-language Auto-detect:** Passing `"auto"` language to the `turbo` model on a mixed KO/EN clip often results in the model locking onto English and hallucinating Korean phonetics. The `options` schema should support a prioritized language hint array, not just `"auto"`.
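
One way a prioritized language hint array could be applied is to restrict the detected language to the hinted set and break ties by hint order. The sketch below assumes the caller already has per-language detection probabilities from the model's language-identification pass; `pick_language` and its parameters are hypothetical names, not part of the current `options` schema.

```python
def pick_language(detected_probs, hints):
    """Choose a language code, preferring the caller's prioritized hints.

    detected_probs: mapping of language code -> detection probability.
    hints: language codes in priority order, e.g. ["ko", "en"]; may be empty.
    """
    if not hints:
        # No hints supplied: behave like plain "auto" and take the top guess.
        return max(detected_probs, key=detected_probs.get)
    # Rank hinted languages by probability; earlier hints win ties.
    candidates = [
        (detected_probs.get(lang, 0.0), -rank, lang)
        for rank, lang in enumerate(hints)
    ]
    _, _, best = max(candidates)
    return best
```

For example, with probabilities `{"en": 0.55, "ko": 0.40}` and hints `["ko", "ja"]`, the function returns `"ko"` because `"en"` is not among the allowed languages. This constrains a single clip-level decision only; it does not by itself handle code-switching within a clip.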