NLP & Speech · live

Audio Intelligence

Speech → transcription, sentiment, keywords, and speaking rate in a single NLP pipeline.

Speech → 4 structured outputs in one pass

The problem

Upload an audio clip to most services and you get back only a transcript. Adding sentiment, keywords, and speaking rate to the same request is mechanically easy, but the pieces are rarely shipped together. This project is the one-pass version.

Who this is for

Anyone evaluating NLP pipeline glue, or candidates for a voice-product role who want a small but complete example.

Architecture

faster-whisper (tiny): Transcription with word-level timestamps, also available on its own at POST /api/transcribe.
DistilBERT sentiment: Sentence-level positive / negative scoring over the transcript.
Keyword extractor: Salient terms surfaced from the transcript for skim-friendly summaries.
Speaking-rate calculator: Tokens / second derived from word timestamps.
FastAPI + Next.js: POST /api/analyze runs the full pipeline; UI shows transcript and the three analyses side by side.
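The speaking-rate calculator needs nothing beyond the word timestamps Whisper already returns. A minimal sketch, assuming each word arrives as text plus start and end times in seconds, in chronological order, and assuming the rate is taken over the span from the first word's start to the last word's end; the `Word` type and `speaking_rate` function are hypothetical names, not the project's code:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Word:
    """One transcribed word with start/end times in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float


def speaking_rate(words: Sequence[Word]) -> float:
    """Return tokens per second over the spoken span.

    The span runs from the first word's start to the last word's end, so
    silence before the first word and after the last one is ignored.
    Returns 0.0 when there are no words or the span has zero length.
    """
    if not words:
        return 0.0
    duration = words[-1].end - words[0].start
    if duration <= 0:
        return 0.0
    return len(words) / duration


if __name__ == "__main__":
    sample = [
        Word("hello", 0.5, 0.9),
        Word("there", 1.0, 1.4),
        Word("world", 1.5, 2.5),
    ]
    rate = speaking_rate(sample)
    print(f"{rate:.2f} tokens/s, {rate * 60:.0f} tokens/min")
```

For the sample, the spoken span is 2.0 s for three words, so the rate is 1.5 tokens per second (90 per minute). Whether long pauses between sentences should count toward the duration is a design choice this sketch leaves open.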

Request / data flow

01 Audio uploaded → Whisper produces transcript + word timestamps.
02 Transcript chunked by sentence → DistilBERT scores each.
03 Keyword extractor pulls salient terms; speaking rate computed from timestamps.
04 Structured response with all four blocks returned in one shot.
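Steps 02 and 03 work on the transcript text alone. The sketch below assumes a naive regex sentence splitter and a frequency-based keyword extractor with a small stopword list; the source does not say which keyword method the project uses, so treat that part as a stand-in. The sentiment model is injected as a callable, and `make_distilbert_scorer` shows one way to build it with the Hugging Face `transformers` pipeline, assuming the public `distilbert-base-uncased-finetuned-sst-2-english` checkpoint. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import Any, Callable, Dict, List

Scorer = Callable[[str], Dict[str, Any]]

# Small illustrative stopword list; a real extractor would use a fuller one.
STOPWORDS = {
    "a", "an", "and", "the", "is", "are", "was", "were", "to", "of", "in",
    "on", "for", "it", "this", "that", "with", "as", "at", "be", "but", "or",
    "i", "we", "you", "they", "so", "very",
}


def split_sentences(transcript: str) -> List[str]:
    """Naive chunker: split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [p for p in parts if p]


def extract_keywords(transcript: str, top_k: int = 5) -> List[str]:
    """Return the top_k most frequent non-stopword terms longer than 2 chars.

    Counter.most_common orders equal counts by first occurrence, so ties
    keep transcript order.
    """
    tokens = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(top_k)]


def score_sentences(transcript: str, scorer: Scorer) -> List[Dict[str, Any]]:
    """Score each sentence independently and keep the sentence text alongside."""
    return [{"sentence": s, **scorer(s)} for s in split_sentences(transcript)]


def make_distilbert_scorer() -> Scorer:
    """Build a scorer from the transformers sentiment-analysis pipeline."""
    from transformers import pipeline  # deferred: large optional dependency

    clf = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    def score(sentence: str) -> Dict[str, Any]:
        # The pipeline returns a list with one {"label", "score"} dict per input.
        result = clf(sentence)[0]
        return {"label": result["label"], "score": float(result["score"])}

    return score
```

A toy scorer exercises the same path without downloading a model, for example `score_sentences(text, lambda s: {"label": "NEGATIVE" if "bad" in s else "POSITIVE"})`. For the text "The demo works well. Latency was bad on the laptop! Latency matters.", `extract_keywords(text, top_k=3)` returns `["latency", "demo", "works"]`.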

Key decisions

Whisper tiny instead of larger variants.

Why: Latency on a homelab CPU matters more than the last point of word error rate (WER) for a demo; tiny still produces transcripts good enough for sentiment analysis.
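With faster-whisper, loading the tiny model for CPU inference and collecting word timestamps could look like the sketch below. The `int8` compute type is an assumption (a common choice on CPU, not stated in the source), and `load_model`, `flatten_words`, and `transcribe_words` are hypothetical helper names. The library import is deferred into `load_model` so the other helpers can run against stand-in objects.

```python
from typing import Any, Dict, Iterable, List


def load_model(size: str = "tiny") -> Any:
    """Load a faster-whisper model for CPU inference (int8 is an assumption)."""
    from faster_whisper import WhisperModel  # deferred: heavy optional dependency

    return WhisperModel(size, device="cpu", compute_type="int8")


def flatten_words(segments: Iterable[Any]) -> List[Dict[str, Any]]:
    """Collect word-level timestamps from faster-whisper segments.

    Each segment exposes `.words` (populated when word_timestamps=True),
    and each word has `.word`, `.start` and `.end` attributes.
    """
    words: List[Dict[str, Any]] = []
    for segment in segments:
        for w in segment.words or []:
            # Whisper word strings usually carry a leading space; strip it.
            words.append({"text": w.word.strip(), "start": w.start, "end": w.end})
    return words


def transcribe_words(model: Any, audio_path: str) -> Dict[str, Any]:
    """Transcribe a file and return the transcript plus per-word timestamps."""
    # transcribe() returns a lazy generator of segments; iterating it decodes.
    segments, _info = model.transcribe(audio_path, word_timestamps=True)
    words = flatten_words(segments)
    transcript = " ".join(w["text"] for w in words)
    return {"transcript": transcript, "words": words}
```

Typical use would be `transcribe_words(load_model("tiny"), "clip.wav")`. Swapping `"tiny"` for `"base"` or `"small"` is a one-line change if accuracy later matters more than latency.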

One analyze endpoint that returns everything.

Why: Multiple round-trips would make the UI more complex without changing what the user sees.
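The single-endpoint design amounts to one function that runs every stage and packs the four blocks into one payload. A minimal sketch, assuming each stage is passed in as a callable (so real models or fakes can be plugged in) and assuming the block names below; `analyze` and the key names are hypothetical, not taken from the project:

```python
from typing import Any, Callable, Dict, List

Words = List[Dict[str, Any]]


def analyze(
    audio_path: str,
    transcribe: Callable[[str], Dict[str, Any]],
    score_sentiment: Callable[[str], List[Dict[str, Any]]],
    extract_keywords: Callable[[str], List[str]],
    speaking_rate: Callable[[Words], float],
) -> Dict[str, Any]:
    """Run every stage once and return all four blocks in a single payload.

    `transcribe` must return {"transcript": str, "words": [...]} where each
    word dict has "text", "start" and "end" keys (times in seconds).
    """
    asr = transcribe(audio_path)
    transcript: str = asr["transcript"]
    words: Words = asr["words"]

    # Text-side stages reuse the one transcript; no second upload or request.
    return {
        "transcript": {"text": transcript, "words": words},
        "sentiment": score_sentiment(transcript),
        "keywords": extract_keywords(transcript),
        "speaking_rate": {"tokens_per_second": round(speaking_rate(words), 2)},
    }
```

Inside the FastAPI app, a POST /api/analyze handler would write the uploaded file to a temporary path, call `analyze` with the real stages, and return the dict, which FastAPI serializes to JSON; the UI then renders all four panels from that one response.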

Stack

faster-whisper · DistilBERT · NLP · Sentiment · FastAPI · Next.js

If I rebuilt it

- Add speaker diarization so per-speaker sentiment is meaningful in multi-voice clips.
- Stream the transcript word by word as Whisper emits it instead of waiting for the full transcription to finish.
