voice.AMD reaches feature parity with Python. - #1390 (@toubatbrian)
fix(agents): support constructing AgentSession with no arguments - #1410 (@u9g)
AMD: cancel the pre-baked HUMAN/short_greeting silence timer when a final STT transcript arrives inside the short-speech window, replacing it with a long_speech timer anchored at speechEndedAt + MACHINE_SILENCE_THRESHOLD_MS so the LLM verdict gets the final word. Mirrors the python fix in livekit/agents#5637. - #1390 (@toubatbrian)
Port AMD improvements from python livekit/agents#5584. voice.AMD now exposes the previously hard-coded timing thresholds (humanSpeechThresholdMs, humanSilenceThresholdMs, machineSilenceThresholdMs) and the classification prompt as constructor options, defers to the LLM (instead of forcing a HUMAN verdict) when a transcript is already available after a short greeting, and accepts a participantIdentity hint plus a suppressCompatibilityWarning flag. The classifier now offers two LLM tools — save_prediction and postpone_termination (capped at 3 extensions × 10s) — letting the model request more audio when the transcript is ambiguous; if the model returns plain JSON instead of tool calls, AMD falls back to the previous content-parsing path. AMD also logs a one-shot warning when the resolved LLM is not in the bundled EVALUATED_LLM_MODELS list. - #1368 (@toubatbrian)
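To illustrate the new constructor options named in the entry above, here is a minimal sketch of an options object. The option names (humanSpeechThresholdMs, humanSilenceThresholdMs, machineSilenceThresholdMs, participantIdentity, suppressCompatibilityWarning) come from the changelog; the interface itself and the prompt field name are local illustrations, not the library's actual types.

```typescript
// Hypothetical shape of the voice.AMD constructor options described above.
// This is a sketch for illustration only, not the library's exported type.
interface AMDOptionsSketch {
  humanSpeechThresholdMs?: number;
  humanSilenceThresholdMs?: number;
  machineSilenceThresholdMs?: number;
  classificationPrompt?: string; // assumed name for the prompt option
  participantIdentity?: string;
  suppressCompatibilityWarning?: boolean;
}

// Example: loosen the silence thresholds and pin the callee identity.
const amdOptions: AMDOptionsSketch = {
  humanSpeechThresholdMs: 1500,
  machineSilenceThresholdMs: 2500,
  participantIdentity: 'callee',
  suppressCompatibilityWarning: true,
};
```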
fix(inference): make inference.LLM compatible with openai >= 6.36.0 - ()
Add comments to agent side and inference side fallback adapters - #1398 (@tmshapland)
refactor(agents): replace uuid with crypto.randomUUID - #1392 (@benasher44)
Add support for the new inworld-tts-2 Inworld TTS model. - #1396 (@toubatbrian)
Adds inworld/inworld-tts-2 to the InworldModels union exported from @livekit/agents/inference so the model is selectable when using the LiveKit Inference Gateway TTS client. Also updates the TTSModels type from @livekit/agents-plugin-inworld to 'inworld-tts-2' | 'inworld-tts-1.5-max' and changes TTSOptions.model to TTSModels | string, mirroring the Python plugin so callers get autocomplete for the curated model names while still being able to pass any custom model id. Ports https://github.com/livekit/agents/pull/5646 from livekit/agents.
Port the barge-in cooldown / backchannelBoundary interruption window from Python (livekit/agents#5269). When the agent starts speaking, VAD-based interruption now stays active for a configurable cooldown (default 1000 ms) before being disabled, allowing the user to quickly correct themselves at the start of the agent's turn. When the agent finishes speaking, transcripts whose end time falls within the trailing cooldown (default 3500 ms) are released as normal user input instead of being held, surfacing premature answers to the agent's last sentence. The cooldown is configured via turnHandling.interruption.backchannelBoundary (a single number applies to both sides; pass [start, end] to configure them separately, or null to disable). - #1366 (@toubatbrian)
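The backchannelBoundary setting above accepts three value shapes. The following is a minimal sketch of how such a value might normalize into a start/end cooldown pair, based only on the behavior described in the entry (a single number applies to both sides, [start, end] configures them separately, null disables, defaults of 1000 ms and 3500 ms). The function name and return shape are assumptions, not the library's API.

```typescript
// Sketch: normalize a backchannelBoundary-style value into explicit cooldowns.
type BackchannelBoundary = number | [number, number] | null | undefined;

const DEFAULT_BOUNDARY = { start: 1000, end: 3500 }; // defaults from the entry above

function normalizeBackchannelBoundary(
  value: BackchannelBoundary,
): { start: number; end: number } {
  if (value === undefined) return { ...DEFAULT_BOUNDARY }; // unset: use defaults
  if (value === null) return { start: 0, end: 0 }; // disabled on both sides
  if (typeof value === 'number') return { start: value, end: value }; // both sides
  const [start, end] = value; // configured separately
  return { start, end };
}
```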
feat(stt): add FakeSTT test harness for FallbackAdapter - #1288 (@drain-zine)
Harden RecorderIO teardown by fencing writes before channel closure and stopping the forward task first, preventing repeated closed WritableStream write errors on disconnect. Also centralize writable-stream closed error detection in utils and add regression tests. - #1378 (@toubatbrian)
Add voice.AvatarSession base class and port the asymmetric-detach warning from the Python TranscriptSynchronizer. The new base class registers aclose as a job shutdown callback and warns when an avatar session is started after AgentSession.start() has already wired an audio output. The transcript synchronizer now tracks _audioAttached / _textAttached via onAttached / onDetached and logs a one-shot warning when audio or text is detached asymmetrically (covering external avatars and manual session.output.audio / .transcription replacement). Existing avatar plugins (anam, bey, lemonslice, trugen) now inherit from voice.AvatarSession and call super.start(agentSession, room) first. - #1280 (@toubatbrian)
fix(inference): drop streamed assistant text from tool call chunks - #1359 (@Genmin)
fix(inference): update tts event name and drop unknown type warning - #1354 (@chenghao-mou)
Port the liveavatar plugin from the Python livekit-agents repo, including the new videoQuality parameter from livekit/agents#5552. - #1324 (@toubatbrian)
The new @livekit/agents-plugin-liveavatar package adds a LiveAvatar AvatarSession that mirrors the Python plugin: it brings up a LiveAvatar streaming session, opens the realtime websocket, captures the agent's audio output through a queue-based AudioOutput, resamples to 24 kHz mono, and forwards base64-encoded chunks (~600 ms first chunk, ~1 s subsequent) to the LiveAvatar service. Inbound websocket events drive playback start/finish notifications back into the AgentSession.
Also exports voice.AudioOutput (and its companion AudioOutputCapabilities / PlaybackFinishedEvent / PlaybackStartedEvent types) from @livekit/agents so plugin authors can subclass the abstract audio sink.
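The chunk sizes mentioned in the liveavatar entry above (~600 ms first chunk, ~1 s subsequent, at 24 kHz mono) can be sketched as back-of-envelope math. The helper below is purely illustrative of the arithmetic, assuming 16-bit PCM; the plugin's real chunking code may differ.

```typescript
// Sketch: byte size of each forwarded audio chunk for 24 kHz mono 16-bit PCM,
// using the approximate durations described above (~600 ms first, ~1 s after).
const SAMPLE_RATE = 24_000;
const BYTES_PER_SAMPLE = 2; // 16-bit mono

function chunkSizeBytes(chunkIndex: number): number {
  const durationMs = chunkIndex === 0 ? 600 : 1000;
  return (SAMPLE_RATE * durationMs * BYTES_PER_SAMPLE) / 1000;
}
```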
feat(telemetry): expose provider request ids on STT/TTS/LLM spans for debugging - #1319 (@toubatbrian)
Adds the lk.provider_request_ids (string[], deduped) span attribute to the user_turn (STT), tts_request_run (TTS), and llm_request_run (LLM) spans so users can correlate traces with the provider's server-side logs.
emit agent handoffs under conversation_item_added - #1347 (@tinalenguyen)
feat(room-io): add jsonFormat option on RoomOutputOptions for timed transcription output. When enabled, each chunk published on the lk.transcription datastream topic is a JSON object with text, and start_time/end_time when the chunk is a TimedString. Ported from livekit/agents#5472. - #1305 (@toubatbrian)
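The JSON chunk shape described above can be sketched as a small builder. The field names text, start_time, and end_time come from the entry; the helper function itself is hypothetical, not the room-io implementation.

```typescript
// Sketch: build one JSON chunk for the lk.transcription datastream topic
// when jsonFormat is enabled. start_time/end_time are only present when the
// source chunk is a TimedString.
function transcriptionChunk(text: string, startTime?: number, endTime?: number): string {
  const chunk: Record<string, unknown> = { text };
  if (startTime !== undefined) chunk.start_time = startTime;
  if (endTime !== undefined) chunk.end_time = endTime;
  return JSON.stringify(chunk);
}
```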
Port livekit/agents#5511 + #5532: - #1304 (@toubatbrian)
Adds lk.playback_started RPC support to DataStreamAudioOutput via a new waitPlaybackStart constructor option (default false). When true, the playbackStarted event is deferred until the remote avatar worker invokes the lk.playback_started RPC instead of firing eagerly on the first captured frame. Bases the SegmentSynchronizerImpl start time on onPlaybackStarted: startWallTime and startFuture are now set when the audio output reports playback start (chained automatically through SyncedAudioOutput.onPlaybackStarted), rather than when the first audio frame is pushed. Combined with the close-path fallback from #5532, this keeps the synchronizer correct for both eager (room) and deferred (avatar RPC) playback timing. Note: only the consumer side (the agent registering the RPC handler and surfacing the event) is included; agents-js does not have an AvatarRunner / DataStreamAudioReceiver, so the producer-side notifyPlaybackStarted is skipped.
Gracefully handle unknown inference TTS event type - #1333 (@toubatbrian)
chore(deps): update @livekit/rtc-node to 0.13.27 - #1331 (@toubatbrian)
fix lockfile - #1340 (@toubatbrian)
support new realtime model capability for native transcript synchronization, set to true for phonic - #1329 (@tinalenguyen)
feat: Resume false interruption feature - #1320 (@toubatbrian)
Support TTS timestamp modelOptions (cartesia.add_timestamps, elevenlabs.sync_alignment, inworld.timestamp_type) and forward the gateway's output_timestamps WebSocket events as TimedString word/character timings attached to the next synthesized audio frame. Ported from livekit/agents#5534. - #1311 (@toubatbrian)
fix(voice): await initRecording() to prevent OTEL trace loss in short sessions - #1300 (@moyounishimself)
support LIVEKIT_AGENT_NAME env var - #1332 (@toubatbrian)
fix(deps): update dependency uuid to v14 [security] - #1313 (@renovate)
feat(metrics): add playbackLatency metric on assistant ChatMessages - #1323 (@toubatbrian)
replace sentence tokenizer with word tokenizer for python parity - #1312 (@tinalenguyen)
add preserveFunctionCallHistory option to AgentTask and TaskGroup and use function call history in Phonic plugin - #1285 (@qionghuang6)
Add Deepgram flux-general-multi STTv2 model support with multi-language detection. Introduces a new languageHint option for biasing the model toward specific languages (only used by flux-general-multi), and adds a new sourceLanguages field on SpeechData that carries all detected languages sorted by prevalence. For multi-language detection, the dominant language is set on language while sourceLanguages retains the full list. - #1275 (@toubatbrian)
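The dominant-language behavior described above (sourceLanguages sorted by prevalence, language set to the dominant one) can be sketched as a pure function. The types and the 'en' fallback below are local illustrations, not the plugin's actual code.

```typescript
// Sketch: populate a SpeechData-like record from a prevalence-sorted list of
// detected languages, mirroring the behavior described in the entry above.
interface SpeechDataSketch {
  language: string;        // dominant language
  sourceLanguages?: string[]; // all detected languages, sorted by prevalence
}

function withDetectedLanguages(detectedByPrevalence: string[]): SpeechDataSketch {
  return {
    language: detectedByPrevalence[0] ?? 'en', // 'en' fallback is an assumption
    sourceLanguages: detectedByPrevalence,
  };
}
```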
fix(voice): don't commit unplayed LLM response to chat context when interruption happens before any text is synchronized - #1270 (@u9g)
feat(stt): add diarization capabilities and speaker_id support - #1267 (@toubatbrian)
feat(voice): add PreemptiveGenerationOptions for fine-grained control - #1265 (@toubatbrian)
feat: add dedent tagged template literal helper - #1259 (@u9g)
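A dedent tagged-template helper of the kind added in the entry above typically strips the common leading indentation from every line. The following is an illustrative reimplementation, not the helper shipped in #1259.

```typescript
// Sketch: a minimal dedent tagged template literal. Interpolated values are
// stitched back in, the smallest indent across non-blank lines is removed,
// and leading/trailing blank lines are trimmed.
function dedent(strings: TemplateStringsArray, ...values: unknown[]): string {
  const raw = strings.reduce(
    (acc, s, i) => acc + s + (i < values.length ? String(values[i]) : ''),
    '',
  );
  const lines = raw.split('\n');
  const indents = lines
    .filter((l) => l.trim().length > 0)
    .map((l) => l.match(/^\s*/)![0].length);
  const minIndent = indents.length ? Math.min(...indents) : 0;
  return lines.map((l) => l.slice(minIndent)).join('\n').trim();
}
```

Usage: `` dedent`\n    first\n      second\n` `` yields `'first\n  second'`, preserving relative indentation while dropping the common prefix.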
fix(inference): accept numeric STT error codes - #1231 (@Maples7)
feat: add UserData generic to JobProcess, JobContext, and defineAgent - #1250 (@u9g)
Update all ws usage to use the same version - #1239 (@lukasIO)
feat(inference): handle preflight_transcript events in inference STT plugin - #1228 (@adrian-cowham)
fix: add required parameter to getJobContext(), matching Python SDK's get_job_context(required=False) pattern. Removes noisy warn-level log during evals/tests. - #1253 (@u9g)
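The required-flag pattern described above (mirroring Python's get_job_context(required=False)) can be sketched as follows. Everything here is a local illustration; the real getJobContext lives in @livekit/agents and its signature may differ.

```typescript
// Sketch: an accessor that throws when required (the default) but quietly
// returns undefined when required=false, e.g. during evals/tests.
type JobContextSketch = { jobId: string };

let currentJob: JobContextSketch | undefined; // set while a job is running

function getJobContextSketch(required = true): JobContextSketch | undefined {
  if (!currentJob && required) {
    throw new Error('getJobContext() called outside of a job');
  }
  return currentJob;
}
```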
feat(voice): add answering machine detection - #1215 (@chenghao-mou)
fix(voice): allow awaiting speech handles from inside function tools; make SpeechHandle awaitable - #1266 (@u9g)
feat(inference): introduce XAIModels type and enhance LLMModels with reasoning support - #1241 (@russellmartin-livekit)
Use ThrowsPromise helper across agent package - #1249 (@lukasIO)
fix: avoid retrying aborted LLM requests during shutdown - #1247 (@tobiplancraft)
Add get_framework_info request/response support - #1223 (@toubatbrian)
update readme with community link - #1225 (@tinalenguyen)
refactor _updateSession in phonic and base realtimesession class - #1224 (@tinalenguyen)
expose serviceTier in CompletionUsage from OpenAI Responses API - #1205 (@piyush-gambhir)
Fix extra_headers being sent in OpenAI request body instead of as HTTP headers in inference LLM - #1216 (@smorimoto)
remove rt session say logic and add phonic logic for resetting ws conn - #1177 (@toubatbrian)
fix(tts): unblock FallbackAdapter when primary provider fails silently - #1218 (@mrniket)
Reuse STT Pipeline Across Agent Handoff - #1177 (@toubatbrian)
fix(agents): release initMutex after warming to restore pool concurrency - #1214 (@drain-zine)
fix: pass queueSizeMs from RoomOutputOptions through to AudioSource - #1207 (@cxyangs)
Add prompt_cache_retention option to inference - #1212 (@s-hamdananwar)
(inference): add debug metadata headers to inference requests - #1208 (@adrian-cowham)
Explicitly close AudioResampler instances to free up resources - #1210 (@lukasIO)
fix: Address 6 bugs from Detail scan (March 25) - #1182 (@toubatbrian)
fix: address 5 Detail scan bugs from March 11 (reconnect, mutex leak, playout, ordering, retryability) - #1188 (@toubatbrian)
fix(voice): reset VAD on premature STT EOT & guard empty recorder frames - #1181 (@toubatbrian)
fix: Include session usage in reports and emit usage updates - #1161 (@toubatbrian)
Handle unhandled rejection from fire-and-forget run() in SupervisedProc - #1158 (@Raysharr)
fix: add idle timeouts to TTS stream reads to prevent agent stuck in speaking state - #1174 (@toubatbrian)
Guard WritableStream close in RoomIO teardown to prevent ERR_INVALID_STATE when writer is already closed or errored during concurrent speech interruption - #1172 (@Raysharr)
fix(IPC): graceful handling when channel closes during inference - #1168 (@toubatbrian)
Add chatCtx and ChatMessage support to AgentSession.generateReply - #1170 (@toubatbrian)
fix: handle unhandled 'error' event on FfmpegCommand in audio.ts - #1173 (@enriqueespaillat-gyde)
Ensure delay doesn't reject with undefined - #1152 (@lukasIO)
Action-aware history summarization - #1146 (@toubatbrian)
fix: Align inference TTS provider options - #1160 (@toubatbrian)
Support Image Input for OpenAI realtime model - #1094 (@toubatbrian)
Fix hanging process when participant disconnects during init - #1087 (@lukasIO)
Standardize LanguageCode handling - #1120 (@toubatbrian)
Bun and deno runtime stream release fixes - #1135 (@lukasIO)
Prevent mainTask hang when speech handle is interrupted after authorization - #1126 (@enriqueespaillat-gyde)
fix: handle channel close errors in safeSend during shutdown - #1110 (@haroldfabla2-hue)
Skip speech handles that are already interrupted when processing queue - #1090 (@lukasIO)
Use cgroup-aware CPU monitoring inside Docker containers - #1099 (@toubatbrian)
Add GPT-5.4 to inference OpenAIModels type - #1105 (@Topherhindman)
Add AEC warmup functionality to AgentSession and AgentActivity - #1091 (@toubatbrian)
Ensure input stream is only tee'd when it's actually being used - #1088 (@lukasIO)
Support gateway Inworld model options - #1102 (@toubatbrian)
fix: prevent shutdown hang when speech is active during disconnect - #1100 (@toubatbrian)
Add TaskGroup feature - #1072 (@toubatbrian)
Change logger to use error serializer - #1063 (@qionghuang6)
Implement AgentTask feature - #1045 (@toubatbrian)
add openai responses api llm - #958 (@tinalenguyen)
Ensure registered plugin versions stay up to date - #1064 (@lukasIO)
feat: Create MultiInputStream API primitive - #1036 (@toubatbrian)
Add comprehensive user span instrumentations - #1027 (@toubatbrian)
Add phonic realtime model - #1062 (@toubatbrian)
fix: dev command now correctly defaults to debug log level - #1020 (@toubatbrian)
Implement tts aligned transcripts - #990 (@toubatbrian)
increase AudioMixer default timeout in background audio player - #1021 (@toubatbrian)
Implement health check - #996 (@andrewnitu)
Change the health check from always returning healthy to returning the status of the following two criteria:
fix(tokenize): correct capture group reference in website regex - #1004 (@IlyaShelestov)
update livekit inference model to match latest - #993 (@davidzhao)
preserve thought_signature across parallel tool calls for Gemini 3+ for inference gateway - #1000 (@toubatbrian)
Make agent state transition fixes and add interim transcript interruption support - #992 (@toubatbrian)
fix: handle VAD stream closed error during agent handover - #997 (@toubatbrian)
Fixes a bug in StreamAdapter where endInput() could be called on an already-closed VAD stream during agent handover, causing an unrecoverable stt_error. This affected non-streaming STTs (like OpenAI STT) that use the StreamAdapter wrapper. Adds an isStreamClosedError() utility function for consistent error handling.
Add support for noiseCancellation frameProcessors - #966 (@lukasIO)
refine timestamps in spans and recording alignment - #982 (@toubatbrian)
Add aligned transcript support with word-level timing for STT plugins - #984 (@toubatbrian)
Add tests for existing agent implementations in examples - #978 (@toubatbrian)
Add advanced test utilities for test framework - #976 (@toubatbrian)
Add connector participant kind to defaults - #973 (@lukasIO)
Supports initial set of testing utilities in agent framework - #965 (@toubatbrian)
Support extra content in inference llm for provider-specific metadata - #967 (@toubatbrian)
Implemented FallbackAdapter for LLM - #916 (@KrishnaShuk)
Fix queue closure in LLMStream, STTStream, TTSStream - #954 (@toubatbrian)
fix(google): handle late-arriving toolCalls in Gemini realtime API - #937 (@kirsten-emak)
When using the Gemini realtime API, tool calls could occasionally arrive after turnComplete, causing them to be lost or trigger errors. This fix keeps the functionChannel open after turnComplete to catch late-arriving tool calls, and adds a closed property to StreamChannel to track channel state.
No code changes required for consumers.
Await the prewarm function - #919 (@KrishnaShuk)
Fix flaky IPC test EPIPE error - #941 (@toubatbrian)
Send all log levels to cloud observability regardless of terminal log level - #942 (@toubatbrian)
Fix supervisor process crashes when child process dies unexpectedly - #935 (@Hormold)
inherit execArgv when forking TypeScript child processes - #948 (@toubatbrian)
fix realtime function call timestamps - #946 (@toubatbrian)
Fork files with cjs extension when running cjs file - #892 (@budde377)
fix(agents): return to listening state for Gemini realtime API thinking-only turns - #936 (@kirsten-emak)
Fix voice interruption transcript spill, add ConnectionPool for inference websockets, and log TTS websocket pool misses. - #910 (@toubatbrian)
Support thinking sound inside background audio player - #915 (@toubatbrian)
Support multi-context WebSocket connection for elevenlabs TTS - #912 (@toubatbrian)
Fix improper resource cleanup inside AgentActivity by not closing global STT / TTS / VAD components - #893 (@toubatbrian)
Improve TTS resource cleanup - #893 (@toubatbrian)
Rename pushedDurationMs to pushedDuration (was actually in seconds) - #876 (@toubatbrian)
Fix improper resource cleanup inside AgentActivity by not closing global STT / TTS / VAD components - #891 (@toubatbrian)
Add Session Connection Options and Fix Blocking Speech from High-latency LLM Generation - #880 (@toubatbrian)
Add session shutdown API - #866 (@toubatbrian)
Add traces for session.say and session.generateReply - #882 (@toubatbrian)
Fix error spam during stream cleanup. Gracefully handle edge cases when detaching audio streams that were never initialized. - #884 (@Hormold)
Add RecorderIO for stereo audio recording - #876 (@toubatbrian)
Support transcripts & traces upload to livekit cloud observability - #863 (@toubatbrian)
Fixed memory leaks in AgentActivity and AgentSession - #875 (@jessebond2)
Support otel traces upload to livekit cloud observability - #867 (@toubatbrian)
Support logging integration to livekit cloud observability - #873 (@toubatbrian)
Fix race condition where STT/TTS processing could throw "Queue is closed" error when a participant disconnects. These events are now logged as warnings instead of errors. - #861 (@Devesh36)
Fix TTS with proper error handling logics from expected shutdown / interruptions - #859 (@toubatbrian)
create a new error object on timeout to have a correct stacktrace - #853 (@simllll)
Fix memory leak of inference gateway STT provider - #858 (@toubatbrian)
bump openai to 6.x - #813 (@toubatbrian)
Emit away events for User - #801 (@paulheinrichs-jb)
Support openai half-duplex mode (audio in -> text out -> custom TTS model) - #814 (@toubatbrian)
Support strict tool schema for OpenAI-compatible models - #816 (@toubatbrian)
Add preemptive generation - #798 (@toubatbrian)
Rename Worker to AgentServer - #713 (@Shubhrakanti)
Fix race condition causing "Writer is not bound to a WritableStream" error in Silero VAD - #786 (@toubatbrian)
Support Zod V4 tool schema with backward compatibility for V3 - #792 (@toubatbrian)
Add utility to play local audio file to livekit - #788 (@toubatbrian)
Add BackgroundAudio support - #789 (@toubatbrian)
Expose EOUMetrics type - #776 (@toubatbrian)
dist - #777 (@toubatbrian)