Cloud EOT with local fallback; console mode for local brokers
@livekit/agents@1.4.7
Patch Changes
- feat(core): audio end-of-turn detection with cloud → local fallback (AGT-2520) - #1719 (@chenghao-mou)
  - New inference.TurnDetector: WebSocket cloud EOT transport (version: 'v1', model name turn-detector-v1) with automatic fallback to the local native model (version: 'v1-mini', model name turn-detector-v1-mini) via @livekit/local-inference. Auto-selects 'v1' when LIVEKIT_REMOTE_EOT_URL is set, 'v1-mini' otherwise. The version is the constructor knob; telemetry/billing report the full model name via detector.model.
  - The local EOT model runs in the shared inference process (the same InferenceProcExecutor the text turn detector uses), loaded once per worker host (~138 MB) instead of in every job worker. The runner is registered by default when the native binding is available, so the inference process spawns on worker startup; on platforms where the binding can't load, local EOT degrades to a positive-default prediction and the worker still starts. (This is a JS-specific divergence from Python, which keeps EOT in-process and relies on forkserver COW sharing.)
  - No prewarm helpers: EOT auto-warms in the inference process; the in-process silero VAD lazy-loads on first stream. (The inference.prewarm* helpers added during development were removed before release.)
  - New inference.VAD (local-only streaming VAD via @livekit/local-inference). AgentSession now auto-provisions a bundled silero VAD when vad is omitted (isDefault=true). Pass vad: null to opt out.
  - livekit-plugins-silero is deprecated; pass vad: null to opt out of the bundled default, or use inference.VAD({ model: 'silero', ... }) to customise.
  - livekit-plugins-livekit turn detector is deprecated in favor of inference.TurnDetector.
  - Endpointing defaults are now detector-aware: when the resolved turn detector is a streaming ("audio model") detector (the bundled default), unset endpointing keys fall back to tighter defaults (minDelay: 300, maxDelay: 2500) instead of the legacy 500/3000. Non-streaming modes (vad/stt/manual/realtime_llm, or turnDetection: null) keep the legacy defaults. Explicit user keys are tracked as sparse overrides and re-resolved per agent activity, so different agents in one session can use different detectors and runtime updateOptions changes survive handoffs.
  - New EOTInferenceMetrics and EOTModelUsage; new telemetry span attributes (lk.eou.source, lk.eou.from_cache, lk.eou.detection_delay); new eot_prediction event forwarded over remote sessions.
  - Requires @livekit/protocol >= 1.46.5 (exposes the AgentInference message namespace used by the cloud transport, including the server-provided SessionCreated default thresholds).
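
The detector-aware endpointing defaults in the entry above can be sketched as a small resolver. This is an illustration only, not the SDK's implementation: the names `EndpointingOptions`, `EndpointingOverrides`, and `resolveEndpointing` are hypothetical, and only the numeric defaults (300/2500 and 500/3000) come from the release notes.

```typescript
// Hypothetical names throughout; this is not the @livekit/agents API.
interface EndpointingOptions {
  minDelay: number;
  maxDelay: number;
}

// Defaults quoted in the release notes.
const STREAMING_DEFAULTS: EndpointingOptions = { minDelay: 300, maxDelay: 2500 };
const LEGACY_DEFAULTS: EndpointingOptions = { minDelay: 500, maxDelay: 3000 };

// Only the keys the user set explicitly are stored, so the remaining keys
// can be re-resolved whenever the active detector changes (e.g. on handoff).
type EndpointingOverrides = Partial<EndpointingOptions>;

function resolveEndpointing(
  overrides: EndpointingOverrides,
  isStreamingDetector: boolean,
): EndpointingOptions {
  const base = isStreamingDetector ? STREAMING_DEFAULTS : LEGACY_DEFAULTS;
  return {
    minDelay: overrides.minDelay ?? base.minDelay,
    maxDelay: overrides.maxDelay ?? base.maxDelay,
  };
}

// The same sparse override applied to two agents with different detectors.
const userOverrides: EndpointingOverrides = { maxDelay: 4000 };
const streamingAgent = resolveEndpointing(userOverrides, true);
const vadAgent = resolveEndpointing(userOverrides, false);
```

Because the overrides stay sparse, the explicit `maxDelay` carries over to both agents while `minDelay` follows whichever detector each agent resolves to.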
- Add TcpAudioInput/TcpAudioOutput for console-mode sessions, porting the Python tcp_console audio IO: inbound audio_input frames are resampled from the 48 kHz wire rate to the 24 kHz agent rate and fed to the STT pipeline, while the agent's TTS frames are resampled back up and streamed as audio_output messages. The output drives the flush/clear playout handshake, blocking the agent turn until the broker reports audio_playback_finished (or reporting an interruption when the buffer is cleared). SessionHost now accepts optional audio IO and routes inbound audio_input/audio_playback_finished messages to it. - #1694 (@toubatbrian)
- Add a console CLI subcommand and in-process console runner, the final piece that lets a local broker (e.g. the LiveKit CLI lk session daemon) drive a Node agent over TCP. runConsole loads the agent, opens a TcpSessionTransport to --connect-addr, sets up an AgentsConsole singleton, and runs the agent entrypoint in-process (mirroring Python's _run_tcp_console, which uses JobExecutorType.THREAD to share the console singleton with the AgentSession). AgentSession now wires its SessionHost from the AgentsConsole singleton when console mode is active, and JobContext gained fake-job support (isFakeJob, no-op connect/deleteRoom/recording) so a console job without a backing LiveKit room behaves correctly. Audio IO is attached by default (voice mode); a text-mode driver disables it at runtime via an update_io request. - #1706 (@toubatbrian)
- Console mode parity fixes for the lk agent console flow: run registered inference runners (e.g. the livekit turn detector) in a supervised child process instead of failing, and write --record output (audio.ogg + session_report.json) to a local console-recordings/session-<timestamp>/ directory like Python, instead of a temp dir with no report. - #1706 (@toubatbrian)
- feat(eot): emit agent backchannel opportunity events (AGT-2520) - #1719 (@chenghao-mou)

  The multimodal EOT model now returns a backchannel probability alongside the end-of-turn probability. The turn detector compares it to a server-provided threshold and, when it clears, surfaces an internal backchannel opportunity (a window where the agent could say a short "mm-hmm" while the user still holds the floor) to AgentActivity.

  - inference.TurnDetector gains a backchannelThreshold option (and updateOptions({ backchannelThreshold })); ThresholdOptions.lookupBackchannel() resolves server-provided defaults layered with user overrides, mirroring the existing EOT threshold resolution.
  - Backchannel thresholds are server-driven and cloud-only: they are disabled when the gateway sends none, after a cloud → local fallback (the mini model produces no backchannel probability), and for any non-positive threshold.
  - Internal only: AgentActivity.onAgentBackchannelOpportunity is a no-op with a TODO; the event is not surfaced as a public AgentSession event (absent from the AgentEvent union, AgentSessionEventTypes, and package exports), treated the same way as the internal EOT prediction plumbing.
  - Requires @livekit/protocol >= 1.46.8 (adds EotPrediction.backchannelProbability and SessionCreated.defaultBackchannelThresholds/defaultBackchannelThreshold).
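
The backchannel threshold rules in the entry above can be sketched as follows. This is one reading of the notes, not the real ThresholdOptions.lookupBackchannel(): `BackchannelContext`, `resolveBackchannelThreshold`, and `isBackchannelOpportunity` are hypothetical names, and two details are assumptions (a user override is ignored when the gateway sends no default, and a probability equal to the threshold counts as clearing it).

```typescript
// Hypothetical types and helpers; the SDK's actual signatures are not shown in these notes.
interface BackchannelContext {
  serverDefault?: number; // threshold provided by the gateway, if any
  userOverride?: number; // backchannelThreshold supplied by the user
  fellBackToLocal: boolean; // true after a cloud → local fallback
}

// Returns the effective threshold, or undefined when backchannel is disabled.
function resolveBackchannelThreshold(ctx: BackchannelContext): number | undefined {
  // The local mini model produces no backchannel probability.
  if (ctx.fellBackToLocal) return undefined;
  // Assumption: with no server-provided default the feature stays off.
  if (ctx.serverDefault === undefined) return undefined;
  // User override layered on top of the server default.
  const threshold = ctx.userOverride ?? ctx.serverDefault;
  // Any non-positive threshold disables the feature.
  return threshold > 0 ? threshold : undefined;
}

function isBackchannelOpportunity(
  probability: number | undefined,
  ctx: BackchannelContext,
): boolean {
  const threshold = resolveBackchannelThreshold(ctx);
  // Assumption: ">=" counts as clearing the threshold.
  return threshold !== undefined && probability !== undefined && probability >= threshold;
}

const cloudSession: BackchannelContext = { serverDefault: 0.6, fellBackToLocal: false };
const opportunity = isBackchannelOpportunity(0.7, cloudSession);
```

Returning `undefined` for every disabled case keeps the three conditions from the notes (no gateway default, fallback, non-positive value) on a single code path for callers.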
- Fix DataStream avatars (Anam, Bey, D-ID, LemonSlice, Runway, Tavus, Trugen) stalling the conversation on user interruption when paired with the OpenAI Realtime API. - #1795 (@smorimoto)

  DataStreamAudioOutput parsed the lk.playback_finished RPC payload with a compile-time-only as PlaybackFinishedEvent cast. The LiveKit avatar protocol serializes that payload with snake_case keys (playback_position, synchronized_transcript), as confirmed against Anam's live engine, which emits {"playback_position": 2.0, "interrupted": true, "synchronized_transcript": null}, so the camelCase playbackPosition read back undefined. Math.floor(undefined * 1000) then evaluated to NaN, which JSON.stringify serializes as null in conversation.item.truncate; the OpenAI Realtime API rejected the truncate with an invalid_type error and the interrupted turn could not recover. DataStreamAudioOutput now normalizes the wire payload (snake_case primary, camelCase fallback), which also restores the previously-dropped synchronizedTranscript on interrupted turns. As defense-in-depth, the realtime truncate path now clamps a non-finite audioEndMs to a valid non-negative integer in both AgentActivity and the OpenAI plugin so a malformed or absent playback position can never again serialize as null.
- Fix orphaned WebSocket leak in connectWs: when the connection timeout fires, the socket is now terminated so it cannot connect and linger without an owner. Also fixes a hang where a normal (code 1000) close during the handshake left the promise unsettled; it now rejects on any close before the socket opens. Uses APITimeoutError instead of APIConnectionError for clearer retry semantics. - #1788 (@chenghao-mou)
- Avoid clearing newer participant entrypoint tasks after quick reconnects. - #1723 (@rosetta-livekit-bot)
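
The payload normalization and clamping described in the DataStream avatar fix above can be sketched like this. It is an illustration under assumptions, not the patched source: `normalizePlaybackFinished` and `clampAudioEndMs` are hypothetical names, the event shape is inferred from the keys quoted in the notes, and clamping non-finite values to 0 is one possible choice of "valid non-negative integer".

```typescript
// Shape inferred from the keys quoted in the release notes.
interface PlaybackFinishedEvent {
  playbackPosition: number; // seconds
  interrupted: boolean;
  synchronizedTranscript: string | null;
}

// Read the wire payload at runtime instead of trusting a compile-time cast:
// snake_case keys first, camelCase as a fallback.
function normalizePlaybackFinished(raw: string): PlaybackFinishedEvent {
  const data = JSON.parse(raw) as Record<string, unknown>;
  const position = data['playback_position'] ?? data['playbackPosition'];
  const transcript = data['synchronized_transcript'] ?? data['synchronizedTranscript'];
  return {
    playbackPosition: typeof position === 'number' ? position : NaN,
    interrupted: data['interrupted'] === true,
    synchronizedTranscript: typeof transcript === 'string' ? transcript : null,
  };
}

// Defense in depth: never let NaN or a negative value reach the truncate
// request, because JSON.stringify turns NaN into null.
function clampAudioEndMs(playbackPositionSec: number): number {
  const ms = Math.floor(playbackPositionSec * 1000);
  return Number.isFinite(ms) ? Math.max(0, ms) : 0; // assumption: fall back to 0
}

// Payload quoted in the release notes (Anam's live engine).
const event = normalizePlaybackFinished(
  '{"playback_position": 2.0, "interrupted": true, "synchronized_transcript": null}',
);
const audioEndMs = clampAudioEndMs(event.playbackPosition);
```

Keeping the clamp separate from the parser means the truncate path stays safe even if some other caller hands it an unvalidated position.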
- Remove the cartesia/ink-2-latest alias from the Cartesia inference STT model type hints. The alias still works at runtime; dated and -latest Cartesia snapshot aliases are no longer surfaced in the SDK types. - #1792 (@adrian-cowham)
- SupervisedProc.initialize() now fails fast, racing the first IPC message against child exit and the initialize timeout, instead of hanging forever when the child process dies before responding (e.g. an inference runner whose model files are missing). Callers that previously deadlocked (worker startup, console mode) now get an actionable error. - #1706 (@toubatbrian)
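
The fail-fast race in the SupervisedProc.initialize() entry above can be sketched with Promise.race. This is a generic illustration, not the SupervisedProc code: `initializeWithFailFast` and its parameters are hypothetical, and the child process is simulated with plain promises.

```typescript
// Hypothetical helper: settle as soon as the first message arrives, the
// child exits, or the timeout elapses, whichever happens first.
async function initializeWithFailFast(
  firstMessage: Promise<string>,
  childExit: Promise<number | null>,
  timeoutMs: number,
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`initialize timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  // An early exit becomes a rejection instead of leaving the caller waiting.
  const exited = childExit.then((code) => {
    throw new Error(`child exited before initializing (code ${code})`);
  });
  try {
    return await Promise.race([firstMessage, exited, timeout]);
  } finally {
    // Always release the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}

// Simulated child that dies with exit code 1 before sending any message.
const neverResponds = new Promise<string>(() => {});
const diesImmediately = Promise.resolve(1);
const outcome = initializeWithFailFast(neverResponds, diesImmediately, 5000)
  .then(() => 'initialized')
  .catch((err: Error) => err.message);
```

Without the `exited` branch, the only way out of a dead child would be the timeout; with it, the caller gets an error naming the exit code as soon as the child dies.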
Fetched June 17, 2026
