Add AssemblyAI inference STT parameters for agent context, Voice Focus, Voice Focus threshold, and streaming mode. - #1852 (@rosetta-livekit-bot)
fix(eot): restore inference prediction timeout - #1853 (@rosetta-livekit-bot)
Sanitize turn detector for session reports: the OTLP attribute serializer now honors toJSON(), and the audio turn detector exposes a credential-free config snapshot. - #1847 (@chenghao-mou)
Add assemblyai/universal-3-5-pro to the inference STT model type hints. - #1840 (@rosetta-livekit-bot)
feat(core): audio end-of-turn detection with cloud → local fallback (AGT-2520) - #1719 (@chenghao-mou)
inference.TurnDetector: WebSocket cloud EOT transport (version: 'v1', model name turn-detector-v1) with automatic fallback to the local native model (version: 'v1-mini', model name turn-detector-v1-mini) via @livekit/local-inference. Auto-selects 'v1' when LIVEKIT_REMOTE_EOT_URL is set, 'v1-mini' otherwise. The version is the constructor knob; telemetry/billing report the full model name via detector.model.
InferenceProcExecutor the text turn detector uses), loaded once per worker host (~138 MB) instead of in every job worker. The runner is registered by default when the native binding is available, so the inference process spawns on worker startup; on platforms where the binding can't load, local EOT degrades to a positive-default prediction and the worker still starts. (This is a JS-specific divergence from Python, which keeps EOT in-process and relies on forkserver COW sharing.)
inference.prewarm* helpers added during development were removed before release.)
inference.VAD (local-only streaming VAD via @livekit/local-inference).
AgentSession now auto-provisions a bundled silero VAD when vad is omitted (isDefault=true). Pass vad: null to opt out.
livekit-plugins-silero is deprecated; pass vad: null to opt out of the bundled default, or use inference.VAD({ model: 'silero', ... }) to customise.
livekit-plugins-livekit turn detector is deprecated in favor of inference.TurnDetector.
minDelay: 300, maxDelay: 2500) instead of the legacy 500/3000. Non-streaming modes (vad/stt/manual/realtime_llm, or turnDetection: null) keep the legacy defaults.
Explicit user keys are tracked as sparse overrides and re-resolved per agent activity, so different agents in one session can use different detectors and runtime updateOptions changes survive handoffs.
EOTInferenceMetrics and EOTModelUsage; new telemetry span attributes (lk.eou.source, lk.eou.from_cache, lk.eou.detection_delay); new eot_prediction event forwarded over remote sessions.
@livekit/protocol >= 1.46.5 (exposes the AgentInference message namespace used by the cloud transport, including the server-provided SessionCreated default thresholds).
Add TcpAudioInput/TcpAudioOutput for console-mode sessions, porting the Python tcp_console audio IO: inbound audio_input frames are resampled from the 48 kHz wire rate to the 24 kHz agent rate and fed to the STT pipeline, while the agent's TTS frames are resampled back up and streamed as audio_output messages. The output drives the flush/clear playout handshake, blocking the agent turn until the broker reports audio_playback_finished (or reporting an interruption when the buffer is cleared). SessionHost now accepts optional audio IO and routes inbound audio_input/audio_playback_finished messages to it. - #1694 (@toubatbrian)
Add a console CLI subcommand and in-process console runner, the final piece that lets a local broker (e.g. the LiveKit CLI lk session daemon) drive a Node agent over TCP. runConsole loads the agent, opens a TcpSessionTransport to --connect-addr, sets up an AgentsConsole singleton, and runs the agent entrypoint in-process (mirroring python's _run_tcp_console, which uses JobExecutorType.THREAD to share the console singleton with the AgentSession). AgentSession now wires its SessionHost from the AgentsConsole singleton when console mode is active, and JobContext gained fake-job support (isFakeJob, no-op connect/deleteRoom/recording) so a console job without a backing LiveKit room behaves correctly. Audio IO is attached by default (voice mode); a text-mode driver disables it at runtime via an update_io request. - #1706 (@toubatbrian)
Console mode parity fixes for the lk agent console flow: run registered inference runners (e.g. the livekit turn detector) in a supervised child process instead of failing, and write --record output (audio.ogg + session_report.json) to a local console-recordings/session-<timestamp>/ directory like python, instead of a temp dir with no report. - #1706 (@toubatbrian)
feat(eot): emit agent backchannel opportunity events (AGT-2520) - #1719 (@chenghao-mou)
The multimodal EOT model now returns a backchannel probability alongside the end-of-turn probability. The turn detector compares it to a server-provided threshold and, when it clears, surfaces an internal backchannel opportunity (a window where the agent could say a short "mm-hmm" while the user still holds the floor) to AgentActivity.
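The comparison described above can be sketched as follows, with the threshold resolved from server-provided defaults layered under user overrides. This is a minimal sketch under stated assumptions: the helper names `resolveThreshold` and `isBackchannelOpportunity`, the `"backchannel"` key, the `0.5` fallback, and treating "clears" as `>=` are all hypothetical and are not the SDK's `ThresholdOptions` API.

```typescript
// Hypothetical shapes: server defaults arrive once per session, user
// overrides are sparse and hold only the keys the user set explicitly.
type ThresholdMap = Record<string, number | undefined>;

// Resolve a threshold: explicit user override first, then the server
// default, then a local fallback. Illustrative, not ThresholdOptions itself.
function resolveThreshold(
  key: string,
  serverDefaults: ThresholdMap,
  userOverrides: ThresholdMap,
  fallback: number,
): number {
  const user = userOverrides[key];
  if (user !== undefined) return user;
  const server = serverDefaults[key];
  if (server !== undefined) return server;
  return fallback;
}

// Assumption: "clears the threshold" means probability >= threshold.
function isBackchannelOpportunity(probability: number, threshold: number): boolean {
  return Number.isFinite(probability) && probability >= threshold;
}

const serverDefaults: ThresholdMap = { backchannel: 0.6 };
const userOverrides: ThresholdMap = {}; // no explicit override set
const threshold = resolveThreshold("backchannel", serverDefaults, userOverrides, 0.5);
console.log(threshold, isBackchannelOpportunity(0.72, threshold)); // 0.6 true
```

Keeping overrides sparse means an unset key always falls through to whatever default the server sent most recently.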
inference.TurnDetector gains a backchannelThreshold option (and updateOptions({ backchannelThreshold })); ThresholdOptions.lookupBackchannel() resolves server-provided defaults layered with user overrides, mirroring the existing EOT threshold resolution.
AgentActivity.onAgentBackchannelOpportunity is a no-op with a TODO; the event is not surfaced as a public AgentSession event (absent from the AgentEvent union, AgentSessionEventTypes, and package exports), treated the same way as the internal EOT prediction plumbing.
@livekit/protocol >= 1.46.8 (adds EotPrediction.backchannelProbability and SessionCreated.defaultBackchannelThresholds / defaultBackchannelThreshold).
Fix DataStream avatars (Anam, Bey, D-ID, LemonSlice, Runway, Tavus, Trugen) stalling the conversation on user interruption when paired with the OpenAI Realtime API. - #1795 (@smorimoto)
DataStreamAudioOutput parsed the lk.playback_finished RPC payload with a compile-time-only
as PlaybackFinishedEvent cast. The LiveKit avatar protocol serializes that payload with
snake_case keys (playback_position, synchronized_transcript) — confirmed against Anam's
live engine, which emits {"playback_position": 2.0, "interrupted": true, "synchronized_transcript": null}
— so the camelCase playbackPosition read back undefined. That became
Math.floor(undefined * 1000) === NaN, which JSON.stringify serializes as null in
conversation.item.truncate; the OpenAI Realtime API then rejected the truncate with an
invalid_type error and the interrupted turn could not recover.
DataStreamAudioOutput now normalizes the wire payload (snake_case primary, camelCase
fallback), which also restores the previously-dropped synchronizedTranscript on interrupted
turns. As defense-in-depth, the realtime truncate path now clamps a non-finite audioEndMs to
a valid non-negative integer in both AgentActivity and the OpenAI plugin so a malformed or
absent playback position can never again serialize as null.
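A minimal sketch of the normalization and clamp described above. The helper names `normalizePlaybackFinished` and `clampAudioEndMs` are hypothetical, and clamping non-finite or negative values to `0` is an assumption (the entry only says "a valid non-negative integer"). The second log line reproduces the original failure: `NaN` serializes as `null`.

```typescript
// Agent-side shape (camelCase), as named in the entry.
interface PlaybackFinishedEvent {
  playbackPosition: number; // seconds
  interrupted: boolean;
  synchronizedTranscript: string | null;
}

// Hypothetical normalizer: snake_case wire keys first, camelCase as fallback.
function normalizePlaybackFinished(raw: unknown): PlaybackFinishedEvent {
  const obj = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  const position = obj["playback_position"] ?? obj["playbackPosition"];
  const transcript = obj["synchronized_transcript"] ?? obj["synchronizedTranscript"];
  return {
    playbackPosition: typeof position === "number" ? position : Number.NaN,
    interrupted: obj["interrupted"] === true,
    synchronizedTranscript: typeof transcript === "string" ? transcript : null,
  };
}

// Defense in depth: never hand a non-finite value to conversation.item.truncate.
// Assumption: non-finite or negative positions clamp to 0.
function clampAudioEndMs(positionSeconds: number): number {
  const ms = Math.floor(positionSeconds * 1000);
  return Number.isFinite(ms) && ms > 0 ? ms : 0;
}

const wire = '{"playback_position": 2.0, "interrupted": true, "synchronized_transcript": null}';
const event = normalizePlaybackFinished(JSON.parse(wire));
console.log(clampAudioEndMs(event.playbackPosition)); // 2000
console.log(JSON.stringify({ audio_end_ms: Number.NaN })); // {"audio_end_ms":null}
console.log(JSON.stringify({ audio_end_ms: clampAudioEndMs(Number.NaN) })); // {"audio_end_ms":0}
```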
Fix orphaned WebSocket leak in connectWs: when the connection timeout fires, the socket is now terminated so it cannot connect and linger without an owner. Also fixes a hang where a normal (code 1000) close during the handshake left the promise unsettled — it now rejects on any close before the socket opens. Uses APITimeoutError instead of APIConnectionError for clearer retry semantics. - #1788 (@chenghao-mou)
Avoid clearing newer participant entrypoint tasks after quick reconnects. - #1723 (@rosetta-livekit-bot)
Remove the cartesia/ink-2-latest alias from the Cartesia inference STT model type hints. The alias still works at runtime; dated and -latest Cartesia snapshot aliases are no longer surfaced in the SDK types. - #1792 (@adrian-cowham)
SupervisedProc.initialize() now fails fast — racing the first IPC message against child exit and the initialize timeout — instead of hanging forever when the child process dies before responding (e.g. an inference runner whose model files are missing). Callers that previously deadlocked (worker startup, console mode) now get an actionable error. - #1706 (@toubatbrian)
Add the agent participant SID as an X-LiveKit-Agent-Id header on inference requests, alongside the existing room and job ID headers, when running inside a job context. - #1687 (@adrian-cowham)
Defer AMD listening until the participant audio track is subscribed, and for SIP participants until sip.callStatus is active, so ringback and early media no longer consume the no-speech budget. After AMD settles on a machine verdict with interruptOnMachine, skip the normal auto-reply triggered by user-turn completion so it no longer races with — and interrupts — the caller's own generateReply (e.g. leaving a voicemail). - #1639 (@rosetta-livekit-bot)
Complete the AMD verdict-emission port: add waitUntilFinished and maxEndpointingDelayMs options and gate emission on both post-speech silence and end-of-turn (machine/uncertain verdicts wait for the turn detector or a fallback backstop; a confident human releases on silence alone). Settle no_speech_timeout as uncertain instead of machine-unavailable. Treat the classifier LLM's tool calls as authoritative — no longer resurrect a verdict by parsing free-text content emitted alongside an uncertain/postpone tool call.
Wire AMD into the recognition-hook layer the way the Python framework does: AgentActivity now drives AMD via onUserSpeechStarted(), onUserSpeechEnded(silenceDurationMs), and onTranscript(text, source) from its VAD/STT hooks, instead of AMD snooping the derived UserStateChanged/UserInputTranscribed session events. This gives AMD the VAD's real silenceDuration directly, so post-speech timers and reported delays are anchored on the true speech-end time rather than skewed by VAD/event latency.
Port the AMD classification prompt verbatim from the Python framework — restoring the task description, category definitions (machine-vm = leaving a message IS possible; machine-unavailable = NOT possible), and the few-shot examples that steer borderline cases (hours-of-operation → uncertain, "press 1" → machine-ivr, call-screening → machine-ivr) — and pass the raw transcript as the user message so it matches the prompt's Input:/Output: pattern.
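The verdict-emission gate from the first paragraph of this entry can be sketched as a pure predicate. This is a simplified sketch under assumptions: `canEmitVerdict`, `EmissionState`, and `backstopMs` are hypothetical names, any `"human"` verdict is treated as "confident", and the backstop is assumed to be measured from the VAD's speech-end time. The entry's `maxEndpointingDelayMs` and `waitUntilFinished` options are not modelled here.

```typescript
type AmdVerdict = "human" | "machine-ivr" | "machine-vm" | "machine-unavailable" | "uncertain";

interface EmissionState {
  verdict: AmdVerdict;
  silenceSinceSpeechEndMs: number; // measured from the VAD's speech-end time
  requiredSilenceMs: number;       // post-speech silence required before emitting
  endOfTurn: boolean;              // turn detector has reported end of turn
  backstopMs: number;              // hypothetical fallback if end of turn never arrives
}

// Hypothetical gate: every verdict needs post-speech silence; a confident
// human verdict needs nothing else, while machine/uncertain verdicts also
// wait for end of turn or for the backstop to expire.
function canEmitVerdict(s: EmissionState): boolean {
  if (s.silenceSinceSpeechEndMs < s.requiredSilenceMs) return false;
  if (s.verdict === "human") return true;
  return s.endOfTurn || s.silenceSinceSpeechEndMs >= s.backstopMs;
}

const base = { silenceSinceSpeechEndMs: 1500, requiredSilenceMs: 1000, endOfTurn: false, backstopMs: 3000 };
console.log(canEmitVerdict({ ...base, verdict: "human" }));      // true
console.log(canEmitVerdict({ ...base, verdict: "machine-vm" })); // false
console.log(canEmitVerdict({ ...base, verdict: "machine-vm", endOfTurn: true })); // true
```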
feat(voice/avatar): add avatar join waiting and cleanup participant on close - #1594 (@rosetta-livekit-bot)
Adaptive interruption detection now omits the threshold from session.create unless the user explicitly overrides it, letting the gateway apply its fetched default (surfaced via default_threshold on session.created). The HTTP transport has been dropped — detection always connects over WebSocket and always requires LiveKit credentials, and its base URL now defaults from LIVEKIT_INFERENCE_URL instead of LIVEKIT_REMOTE_EOT_URL. Inference requests also send an X-LiveKit-Worker-Token header when LIVEKIT_WORKER_TOKEN is set (hosted agents); a token supplied via the --worker-token CLI flag is now re-exported into the environment so forked job subprocesses inherit it and include the header. The X-LiveKit-Agent-Id header is now only attached once the room is connected to avoid leaking an unset local-participant SID. The interruption WebSocket is now closed deterministically on stream teardown (including error and cancel paths) instead of only on graceful completion — previously an orphaned socket leaked per session/activity and accumulated for the worker's lifetime. Mid-session threshold/duration changes via updateOptions now reconnect the WebSocket in place rather than closing it and letting the next send error the stream — so option changes no longer consume a failover retry (previously enough updates in a session could exhaust the retry budget and stop interruption detection). - #1785 (@chenghao-mou)
Bound AgentSession close during job shutdown so shutdown callbacks still run. - #1638 (@rosetta-livekit-bot)
Rate-limit IPC high-memory warnings and include process context in memory logs. - #1717 (@rosetta-livekit-bot)
Clamp the STT-derived lastSpeakingTime to the current wall-clock time. When the STT stream's clock diverged from the activity's input epoch (e.g. a reused STT pipeline after an agent handoff), the transcript endTime could map to a timestamp minutes in the future, causing the end-of-turn bounce task to sleep that long before committing the user turn — the agent appeared to go silent mid-call even though LLM preemptive generation kept running. - #1782 (@toubatbrian)
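A minimal sketch of the clamp, assuming (hypothetically) that the transcript `endTime` is in seconds relative to an input epoch expressed in wall-clock milliseconds. The function name and the explicit `nowMs` parameter are illustrative; the point is only that `Math.min` with the current time keeps the value from landing in the future.

```typescript
// Hypothetical mapping: the STT transcript endTime is in seconds relative to
// the activity's input epoch (wall-clock ms). Names are illustrative.
function lastSpeakingTimeFromStt(
  inputEpochMs: number,
  transcriptEndSec: number,
  nowMs: number = Date.now(),
): number {
  const mapped = inputEpochMs + transcriptEndSec * 1000;
  // A diverged STT clock must never place the last speaking time in the future,
  // otherwise a delay computed from it can stretch to minutes.
  return Math.min(mapped, nowMs);
}

const now = 1_000_000;
console.log(lastSpeakingTimeFromStt(990_000, 5, now));   // 995000
console.log(lastSpeakingTimeFromStt(990_000, 300, now)); // 1000000 (clamped to now)
```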
increase memory warning threshold - #1778 (@davidzhao)
Support FlushSentinel in voice LLM nodes to flush audio and text output per segment. - #1710 (@rosetta-livekit-bot)
Support granular recording options in AgentSession.start. The record option now accepts boolean | RecordingOptions ({ audio, traces, logs, transcript }); a boolean maps to all-on/all-off and a partial object merges onto all-on, so omitted keys default to true. Each category independently gates audio capture, trace export, log export, and transcript upload, mirroring the Python SDK and matching the documented granular form. - #1702 (@anzemur)
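The boolean-or-partial rule above can be sketched as a small resolver. This is an illustration under an assumption: `resolveRecordingOptions` is a hypothetical helper, not the SDK's internal function, and it only mirrors the documented merge semantics (boolean sets every category; omitted keys in an object default to `true`).

```typescript
interface RecordingOptions {
  audio: boolean;
  traces: boolean;
  logs: boolean;
  transcript: boolean;
}

// Hypothetical resolver for `record: boolean | Partial<RecordingOptions>`:
// a boolean sets every category, a partial object merges onto all-on.
function resolveRecordingOptions(record: boolean | Partial<RecordingOptions>): RecordingOptions {
  if (typeof record === "boolean") {
    return { audio: record, traces: record, logs: record, transcript: record };
  }
  return {
    audio: record.audio ?? true,
    traces: record.traces ?? true,
    logs: record.logs ?? true,
    transcript: record.transcript ?? true,
  };
}

// Keep traces, logs and transcript, but skip audio capture.
const resolved = resolveRecordingOptions({ audio: false });
console.log(resolved.audio, resolved.traces); // false true
```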
Guard inference agent ID header lookup until the room is connected. - #1700 (@rosetta-livekit-bot)
Preserve OpenAI Responses assistant message phase metadata across follow-up requests. - #1720 (@rosetta-livekit-bot)
Close active RecorderIO during job session-end cleanup before generating the session report. - #1682 (@rosetta-livekit-bot)
Align AgentSession.start recording with the Python SDK's primary-session behavior. The primary/secondary designation now happens in start() before initRecording, so a demoted secondary session never configures cloud recording. A non-primary session whose record argument was not explicitly given now silently disables its recording (instead of throwing); it still throws only when record was passed explicitly, matching Python's record_is_given semantics. - #1704 (@toubatbrian)
Restrict STT pipeline reuse during handoff to agents using the default sttNode. - #1605 (@rosetta-livekit-bot)
fix(voice): scope forwardAudio's playback-started listener to its own segment - #1786 (@chenghao-mou)
When a speech is interrupted, the scheduling loop immediately authorizes the next
speech, so the new segment's forwardAudio registers its playback_started
listener on the shared audio output while the interrupted segment is still
emitting events during teardown. The stray event resolved the new segment's
firstFrameFut before its first frame was captured, which skipped resampler
creation and pushed an unresampled frame straight to the AudioSource
(RtcError: sample_rate and num_channels don't match) and corrupted playback
bookkeeping. The listener now only resolves firstFrameFut after the segment has
captured its own first frame.
Add TcpSessionTransport, a SessionTransport that frames protobuf session messages over a raw TCP socket (4-byte big-endian length prefix, 1 MiB cap, TCP_NODELAY), mirroring the Python implementation. Also handle the updateIo session request in SessionHost, toggling input/output audio and transcription. This is the transport plumbing that lets a local broker (e.g. the LiveKit CLI session daemon) drive a Node agent over TCP. - #1693 (@toubatbrian)
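The wire framing above (4-byte big-endian length prefix, 1 MiB cap) can be sketched independently of sockets and protobuf. This is a minimal sketch, not the SDK's `TcpSessionTransport`: `encodeFrame` and `FrameDecoder` are hypothetical names, and payloads are plain `Uint8Array`s standing in for serialized protobuf messages. The decoder buffers partial reads, since a TCP chunk can split or merge frames.

```typescript
const MAX_FRAME_BYTES = 1024 * 1024; // 1 MiB cap, as in the entry
const HEADER_BYTES = 4;

// Prefix a payload with its length as a 4-byte big-endian unsigned integer.
function encodeFrame(payload: Uint8Array): Uint8Array {
  if (payload.length > MAX_FRAME_BYTES) {
    throw new Error(`frame too large: ${payload.length} bytes`);
  }
  const out = new Uint8Array(HEADER_BYTES + payload.length);
  new DataView(out.buffer).setUint32(0, payload.length, false); // false = big-endian
  out.set(payload, HEADER_BYTES);
  return out;
}

// Buffers raw socket chunks and returns every complete frame they contain.
class FrameDecoder {
  private buf = new Uint8Array(0);

  push(chunk: Uint8Array): Uint8Array[] {
    const merged = new Uint8Array(this.buf.length + chunk.length);
    merged.set(this.buf, 0);
    merged.set(chunk, this.buf.length);
    this.buf = merged;

    const frames: Uint8Array[] = [];
    while (this.buf.length >= HEADER_BYTES) {
      const view = new DataView(this.buf.buffer, this.buf.byteOffset, this.buf.byteLength);
      const len = view.getUint32(0, false);
      if (len > MAX_FRAME_BYTES) throw new Error(`frame too large: ${len} bytes`);
      if (this.buf.length < HEADER_BYTES + len) break; // rest of frame not here yet
      frames.push(this.buf.slice(HEADER_BYTES, HEADER_BYTES + len));
      this.buf = this.buf.slice(HEADER_BYTES + len);
    }
    return frames;
  }
}

const frame = encodeFrame(new TextEncoder().encode("hello"));
const decoder = new FrameDecoder();
console.log(decoder.push(frame.slice(0, 3)).length); // 0 (header incomplete)
const [msg] = decoder.push(frame.slice(3));
console.log(new TextDecoder().decode(msg)); // hello
```

In Node, the TCP_NODELAY part corresponds to calling `setNoDelay(true)` on the `net.Socket`, which disables Nagle's algorithm so small session messages are not delayed.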
fix(voice): emit the wrapper error (with recoverable) on session error events instead of the inner error - #1787 (@u9g)
Fix AgentActivity.generateReply defaulting toolChoice to 'none' on a child AgentSession spawned inside a tool. The previous check relied on AsyncLocalStorage, which leaks the parent function-call context into the child session and caused the framework to drop legitimate tool calls emitted by the child agent (e.g. the supervisor's connect_to_caller invocation in WarmTransferTask). The check now uses per-task info, matching the Python implementation. - #1458 (@rosetta-livekit-bot)
Block user turn exceeded callbacks while an agent handoff is starting. - #1614 (@rosetta-livekit-bot)
fix: repair leaked chat-template tokens in function call args - #1604 (@rosetta-livekit-bot)
Fix interrupt race that could leak unplayed transcript text. - #1573 (@rosetta-livekit-bot)
Wire internal debug messages through remote sessions. - #1645 (@rosetta-livekit-bot)
subscribe to tracks published after connect with AUDIO_ONLY/VIDEO_ONLY - #1629 (@toubatbrian)
chore(worker): update worker warnings - #1571 (@rosetta-livekit-bot)
fix(inference): stop mislabeling barge-in handler errors as parse failures - #1619 (@chenghao-mou)
The interruption WebSocket handler wrapped both wsMessageSchema.parse and handleMessage in one try, so a handler throw (e.g. a late bargein_detected prediction enqueued after the readable side was errored/closed) was logged as "Failed to parse WebSocket message" with the real error discarded. Parse and handler errors are now caught separately and log the actual error, and the late barge-in event is dropped quietly (desiredSize === null) instead of throwing into a dead stream.
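A minimal sketch of the split error handling and the quiet drop. `onWsMessage`, `enqueueIfOpen`, and `EventSink` are hypothetical names; `EventSink` is a small stand-in for a stream controller so the example does not depend on a particular streams implementation, and `JSON.parse` plus a caller-supplied `parse` function stand in for the real `wsMessageSchema.parse`.

```typescript
// Stand-in for a stream controller; the entry drops late events when
// desiredSize is null.
interface EventSink<T> {
  readonly desiredSize: number | null;
  enqueue(event: T): void;
}

type Logger = (message: string, err: unknown) => void;

// Hypothetical handler: parsing and dispatch are guarded separately, so a
// handler failure is no longer reported as a parse failure.
function onWsMessage<T>(
  raw: string,
  parse: (data: unknown) => T,
  handle: (msg: T) => void,
  log: Logger,
): void {
  let msg: T;
  try {
    msg = parse(JSON.parse(raw));
  } catch (err) {
    log("Failed to parse WebSocket message", err);
    return;
  }
  try {
    handle(msg);
  } catch (err) {
    log("Failed to handle WebSocket message", err);
  }
}

// Late events after the readable side is gone are dropped instead of thrown.
function enqueueIfOpen<T>(sink: EventSink<T>, event: T): boolean {
  if (sink.desiredSize === null) return false;
  sink.enqueue(event);
  return true;
}

const logged: string[] = [];
const log: Logger = (message, err) => {
  logged.push(`${message}: ${String(err)}`);
};
onWsMessage('{"type":"bargein_detected"}', (d) => d, () => { throw new Error("stream closed"); }, log);
console.log(logged[0]); // Failed to handle WebSocket message: Error: stream closed
```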
update rtc sdk to 0.13.29 - #1652 (@davidzhao)
fix(llm): convert per-turn instructions on the first turn for Google provider format - #1589 (@rosetta-livekit-bot)
fix(realtime): process all messages in multi-message realtime generations - #1628 (@tinalenguyen)
Reorders audio/text forwarding setup inside processOneMessage to match the
Python source order (audio first, then text), and tightens the playout-await
guard so playoutPromise is only awaited when not interrupted. This fixes a
case where the second message in a multi-message realtime response (e.g.
gpt-realtime-2 preambles) could be dropped.
Also stamps assistant ChatMessage.createdAt with startedSpeakingAt (the
first frame's playback start) instead of defaulting to Date.now() at
end-of-generation. This preserves correct user/assistant ordering in
ChatContext when user transcription items land during agent playout.
feat(realtime): support multi-message generation per response - #1555 (@rosetta-livekit-bot)
Prevent recorder close from hanging during encode cleanup and clamp recorder frame splits to valid frame bounds. - #1684 (@rosetta-livekit-bot)
Remove the ttsPronunciationMap Agent option (and the TTSPronunciationMap type). Use the general tts_text_transforms / replace text transform for pre-TTS pronunciation replacements instead. - #1620 (@u9g)
fix: reset user turn tracker when clearing user turn - #1615 (@rosetta-livekit-bot)
fix(voice): make ParticipantAudioOutput.pause() actually gate audio (port _playback_enabled + synchronizer pause) - #1579 (@toubatbrian)
Replace discarded input audio with silence for STT and realtime model streams. - #1601 (@rosetta-livekit-bot)
fix: make non-transient 4xx API status errors non-retryable - #1597 (@rosetta-livekit-bot)
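A retry predicate in the spirit of this fix might look like the sketch below. The entry does not list which 4xx codes count as transient, so treating 408 (Request Timeout) and 429 (Too Many Requests) as retryable is an assumption here, as is the helper name `isRetryableStatus`.

```typescript
// Assumption: treating 408 and 429 as the transient 4xx codes is illustrative,
// not the SDK's exact list.
const TRANSIENT_4XX = new Set<number>([408, 429]);

function isRetryableStatus(status: number): boolean {
  if (status >= 500 && status < 600) return true; // server-side failures
  if (status >= 400 && status < 500) return TRANSIENT_4XX.has(status);
  return false; // not an error status
}

console.log([400, 401, 404, 408, 429, 503].map((s) => isRetryableStatus(s)));
// [ false, false, false, true, true, true ]
```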
feat: allow updating dynamic endpointing alpha on active sessions - #1634 (@rosetta-livekit-bot)
feat(google): add Vertex AI Model Garden LLM integration - #1606 (@rosetta-livekit-bot)
Add Soniox STT support and surface per-run source and target language segments on STT speech data. - #1602 (@rosetta-livekit-bot)
fix(llm): sort function tools to keep tool order invariant. - #1641 (@rosetta-livekit-bot)
Update download-files deprecation message - #1621 (@rosetta-livekit-bot)
Use STT transcript timestamps for last speaking time when VAD is unavailable or misses speech. - #1603 (@rosetta-livekit-bot)
fix(voice): surface tool-argument validation errors to the LLM instead of returning a generic "internal error" - #1606 (@rosetta-livekit-bot)
When an LLM-generated tool call failed JSON parsing or Zod schema validation, the framework returned "An internal error occurred" to the LLM, which left the model with no way to correct itself — causing it to loop on the same invalid call. Argument-validation failures are now wrapped in a ToolError whose message includes the tool name and the validator's diagnostic, so the LLM can fix its arguments.
Behavior is unchanged for exceptions thrown from inside a tool's execute: regular Errors are still masked as "An internal error occurred" to avoid leaking server-side details, and ToolError continues to be the supported way to forward a custom message to the LLM.
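The behavior described in this entry can be sketched as follows. The local `ToolError` class is a stand-in for the SDK's, `parseToolArgs` and `toolOutputForError` are hypothetical helpers, and a hand-written validator replaces the Zod schema the entry mentions so the example stays dependency-free.

```typescript
// Local stand-in for the SDK's ToolError.
class ToolError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ToolError";
    Object.setPrototypeOf(this, ToolError.prototype);
  }
}

type Validator<T> = (value: unknown) => T; // throws with a diagnostic on failure

// Hypothetical: wrap JSON and validation failures so the LLM sees the tool
// name and the diagnostic, and can correct its arguments.
function parseToolArgs<T>(toolName: string, rawArgs: string, validate: Validator<T>): T {
  let json: unknown;
  try {
    json = JSON.parse(rawArgs);
  } catch (err) {
    throw new ToolError(`Invalid JSON arguments for tool "${toolName}": ${(err as Error).message}`);
  }
  try {
    return validate(json);
  } catch (err) {
    throw new ToolError(`Invalid arguments for tool "${toolName}": ${(err as Error).message}`);
  }
}

// What goes back to the LLM: ToolError messages pass through, anything else is masked.
function toolOutputForError(err: unknown): string {
  return err instanceof ToolError ? err.message : "An internal error occurred";
}

const validateWeather: Validator<{ city: string }> = (v) => {
  if (typeof v !== "object" || v === null || typeof (v as { city?: unknown }).city !== "string") {
    throw new Error("expected { city: string }");
  }
  return v as { city: string };
};

try {
  parseToolArgs("get_weather", '{"city": 42}', validateWeather);
} catch (err) {
  console.log(toolOutputForError(err)); // Invalid arguments for tool "get_weather": expected { city: string }
}
```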
Make ToolOptions.abortSignal required. The framework always provides an AbortSignal to tool execution, so the field is no longer optional. Tool authors can rely on abortSignal always being defined and drop defensive if (abortSignal) checks. - #1678 (@toubatbrian)
Reset active VAD streams on flush so STT end-of-speech can recover without recreating streams. STT end-of-speech now preserves the VAD-owned lastSpeakingTime instead of overwriting it, keeping the end-of-turn "no new speech" check reliable when VAD is active. - #1574 (@rosetta-livekit-bot)
Add beta WarmTransferTask workflow for SIP-based human handoffs. - #1458 (@rosetta-livekit-bot)
Add avatar join and playback latency metrics. - #1537 (@rosetta-livekit-bot)
fix(generation): preserve LLM-supplied call_id instead of overwriting with item id - #1524 (@toubatbrian)
Add support for Rime time scale factor options on arcana, coda, and mistv3. - #1557 (@rosetta-livekit-bot)
fix(voice): cancel realtime generation when speech is interrupted - #1503 (@rosetta-livekit-bot)
Fix playback flush and speech interruption races - #1518 (@toubatbrian)
fix(telemetry): export observability logs from logger instances captured before OTEL setup. - #1562 (@Cay-Zhang)
Add VAD-driven finalization for Speechmatics inference STT. - #1526 (@rosetta-livekit-bot)
fix(voice): allow true interruptions during backchannel boundary cooldown - #1565 (@rosetta-livekit-bot)
Add user turn limit options for interrupting long user speech. - #1535 (@rosetta-livekit-bot)
Improve audio discard checks - #1504 (@rosetta-livekit-bot)
Add dynamic endpointing for voice turn handling. - #1475 (@rosetta-livekit-bot)
fix(stt): reflect active child in FallbackAdapter model/provider - #1515 (@julien-lottie)
audio_recognition.refreshUserTurnSttAttributes reads these on every
STT event to stamp gen_ai.request.model / gen_ai.provider.name
on the user_turn span. With static wrapper labels, every span
reported FallbackAdapter / livekit regardless of which provider
actually transcribed, so a mid-turn failover was invisible in
traces. Track the elected child from both the streaming and
recognize paths and surface its identifiers.
Add beta workflow InstructionParts exports. - #1500 (@rosetta-livekit-bot)
Add updateOptions support to inference LLM for live model swaps. - #1527 (@rosetta-livekit-bot)
fix audio resampler memory leak. - #1453 (@KrishnaShuk)
feat(agents): add modality-aware Instructions with audio/text variants - #1484 (@toubatbrian)
Introduce a new Instructions class for system prompts that adapt to the
user's input modality. The pipeline now applies the matching variant before
each LLM turn based on SpeechHandle.inputDetails.modality, and
AgentSession.generateReply() and AgentSession.run() expose an
inputModality option. Instructions.tpl supports JS-native prompt
composition while preserving audio/text variants.
brianyin/agt-2866-delete-room-on-session-close - #1501 (@toubatbrian)
fix(agents): await realtime auto tool replies in RunResult - #1490 (@rosetta-livekit-bot)
Add support for the Rime Coda TTS model. - #1523 (@rosetta-livekit-bot)
feat(agents): add Speechmatics inference STT model options. - #1507 (@rosetta-livekit-bot)
feat(agents): add livekit-agents download-files command for Docker layer caching - #1511 (@davidzhao)
Adds a standalone CLI (npx livekit-agents download-files) that discovers installed
@livekit/agents-plugin-* packages and downloads their asset files without loading
the user's agent code.
fix(barge-in): suppress session-level barge-in errors. - #1513 (@rosetta-livekit-bot)
fix: do not republish background audio tracks after reconnect - #1487 (@davidzhao)
Fail download-files when plugin downloads fail - #1481 (@toubatbrian)
chore(amd): update default models and drop null support - #1476 (@chenghao-mou)
Add TTS pronunciation customization support to agents, Google Gemini TTS, and Sarvam TTS. - #1473 (@rosetta-livekit-bot)
chore(amd): add default amd prediction log - #1496 (@chenghao-mou)
Add TTS text transforms with built-in markdown/emoji filtering, streaming replacement, and custom callable transform support. - #1477 (@rosetta-livekit-bot)
Expose AgentSessionOptions.ttsReadIdleTimeout and AgentSessionOptions.forwardAudioIdleTimeout to configure the two pipeline stall guards in performTTSInference and performAudioForwarding. Useful for custom LLM/TTS backends whose first-token latency can legitimately exceed the previous 10s default. Defaults remain 10 seconds, preserving existing behavior. - #1461 (@s-hamdananwar)
Make default user turn span start times explicit. - #1456 (@rosetta-livekit-bot)
Prevent voice pipeline scheduling from hanging when a pipeline task crashes after a speech handle is already marked done. - #1423 (@u9g)
fix(google): abort pending realtime sends during reconnect - #1415 (@u9g)
feat(inference): propagate STT extra to SpeechData.metadata - #1389 (@toubatbrian)
The inference STT plugin now plumbs the gateway's per-transcript extra field
onto SpeechData.metadata, exposing provider-specific signals (e.g. Inworld
voice profile, xAI speech_final) to consumers.
fix(worker): use available CPU cores for numIdleProcesses in production - #1449 (@KrishnaShuk)
fix(transcription): rstrip punctuation from interim segments - #1447 (@KrishnaShuk)
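A minimal sketch of right-stripping punctuation from an interim segment. The entry does not specify the punctuation set, so the ASCII characters in `TRAILING_PUNCTUATION` (plus trailing whitespace) are an assumption, and `rstripInterimPunctuation` is a hypothetical name.

```typescript
// Assumption: the punctuation set is illustrative, not the SDK's exact list.
const TRAILING_PUNCTUATION = /[\s.,!?;:]+$/;

// Strip trailing punctuation (and whitespace) from an interim transcript segment.
function rstripInterimPunctuation(text: string): string {
  return text.replace(TRAILING_PUNCTUATION, "");
}

console.log(rstripInterimPunctuation("so I was thinking,")); // so I was thinking
console.log(rstripInterimPunctuation("version 3.5")); // version 3.5
```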
Emit agent configuration updates in OTLP session logs. - #1434 (@rosetta-livekit-bot)
fix(agents): persist user turn start across VAD bursts - #1457 (@rosetta-livekit-bot)
Support OpenAI Realtime Whisper STT - #1429 (@toubatbrian)
voice.AMD reaches feature completion. - #1390 (@toubatbrian)
fix(agents): support constructing AgentSession with no arguments - #1410 (@u9g)
AMD: cancel the pre-baked HUMAN/short_greeting silence timer when a final STT transcript arrives inside the short-speech window, replacing it with a long_speech timer anchored at speechEndedAt + MACHINE_SILENCE_THRESHOLD_MS so the LLM verdict gets the final word. Mirrors the python fix in livekit/agents#5637. - #1390 (@toubatbrian)
Port AMD improvements from python livekit/agents#5584. voice.AMD now exposes the previously hard-coded timing thresholds (humanSpeechThresholdMs, humanSilenceThresholdMs, machineSilenceThresholdMs) and the classification prompt as constructor options, defers to the LLM (instead of forcing a HUMAN verdict) when a transcript is already available after a short greeting, and accepts a participantIdentity hint plus a suppressCompatibilityWarning flag. The classifier now offers two LLM tools — save_prediction and postpone_termination (capped at 3 extensions × 10s) — letting the model request more audio when the transcript is ambiguous; if the model returns plain JSON instead of tool calls, AMD falls back to the previous content-parsing path. AMD also logs a one-shot warning when the resolved LLM is not in the bundled EVALUATED_LLM_MODELS list. - #1368 (@toubatbrian)
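The postpone_termination cap (3 extensions × 10 s) can be sketched as a small budget tracker. `PostponeBudget` and its `request()` method are hypothetical names used only to illustrate the cap; they are not part of `voice.AMD`.

```typescript
// Hypothetical tracker for postpone_termination tool calls: at most
// 3 extensions of 10 s each, as described in the entry.
class PostponeBudget {
  private used = 0;
  private readonly maxExtensions: number;
  private readonly extensionMs: number;

  constructor(maxExtensions = 3, extensionMs = 10_000) {
    this.maxExtensions = maxExtensions;
    this.extensionMs = extensionMs;
  }

  // Extra listening time granted for this request; 0 once the cap is hit.
  request(): number {
    if (this.used >= this.maxExtensions) return 0;
    this.used += 1;
    return this.extensionMs;
  }
}

const budget = new PostponeBudget();
const grants = [budget.request(), budget.request(), budget.request(), budget.request()];
console.log(grants); // [ 10000, 10000, 10000, 0 ]
```

With these defaults the classifier can extend listening by at most 30 s in total before it must commit to a verdict.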
fix(inference): make inference.LLM compatible with openai >= 6.36.0 - #1411 (@u9g)
Add comments to agent-side and inference-side fallback adapters - #1398 (@tmshapland)
refactor(agents): replace uuid with crypto.randomUUID - #1392 (@benasher44)
Add support for the new inworld-tts-2 Inworld TTS model. - #1396 (@toubatbrian)
Adds inworld/inworld-tts-2 to the InworldModels union exported from
@livekit/agents/inference so the model is selectable when using the
LiveKit Inference Gateway TTS client.
TTSModels type from @livekit/agents-plugin-inworld
('inworld-tts-2' | 'inworld-tts-1.5-max') and updates TTSOptions.model
to TTSModels | string, mirroring the Python plugin so callers get
autocomplete for the curated model names while still being able to pass
any custom model id.
Ports https://github.com/livekit/agents/pull/5646 from livekit/agents.