@elevenlabs/client@1.8.1
Patch Changes
- a9dcb56: Fix iOS Safari dropping the first message's audio on WebSocket voice sessions.
iOS Safari blocks `HTMLAudioElement` autoplay (including elements fed by `MediaStreamDestination`) when the underlying `AudioContext` wasn't started synchronously inside a user gesture, and additionally needs the audio element to have an explicit `play()` call against a non-empty playback graph. The web setup chain awaits `getUserMedia`, the connection handshake, and the audio worklet load before `MediaDeviceOutput` would have created its `AudioContext`, so by then the gesture is already consumed. The first agent message would arrive into a suspended context and never play; subsequent messages worked because the mic capture had reactivated iOS's audio session by then.

The fix has two parts:
- On import, the web entry point installs capture-phase `touchstart`/`touchend`/`click` listeners on `document`. The first user interaction creates and unlocks an `AudioContext` (silent `BufferSource.start(0)` + `resume()`) and stashes it for the next session. The stash auto-discards after 30s if no session starts. Capture-phase is needed because the convai widget awaits a terms-modal promise between the user's tap and `Conversation.startSession`, which would otherwise consume the gesture before any session code runs.
- After the worklet is wired up, `MediaDeviceOutput` on iOS posts ~100ms of silence to the worklet and explicitly calls `audioElement.play()` to prime the MediaStream → HTMLAudioElement pipeline.
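The first part of the fix can be illustrated with a minimal sketch. This is not the package's actual code: `installGestureUnlock`, `ListenerTarget`, `ContextLike`, and `take` are hypothetical names, and the one-shot listener removal and the `close()` call on discard are assumptions. The event target and context factory are injected so the sketch runs outside a browser.

```typescript
// Structural stand-ins for `document` and `AudioContext`, so the sketch
// type-checks without DOM typings. All names here are hypothetical.
interface ListenerTarget {
  addEventListener(type: string, listener: () => void, options?: { capture?: boolean }): void;
  removeEventListener(type: string, listener: () => void, options?: { capture?: boolean }): void;
}
interface SourceLike {
  buffer: unknown;
  connect(destination: unknown): unknown;
  start(when?: number): void;
}
interface ContextLike {
  readonly sampleRate: number;
  readonly destination: unknown;
  createBuffer(channels: number, length: number, sampleRate: number): unknown;
  createBufferSource(): SourceLike;
  resume(): Promise<void>;
  close(): Promise<void>;
}

const UNLOCK_EVENTS = ["touchstart", "touchend", "click"] as const;

function installGestureUnlock(
  target: ListenerTarget,
  createContext: () => ContextLike,
  ttlMs = 30_000,
): { take(): ContextLike | null } {
  let stash: ContextLike | null = null;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const onGesture = () => {
    // Assumption: one-shot. Listeners are removed after the first interaction.
    for (const type of UNLOCK_EVENTS) {
      target.removeEventListener(type, onGesture, { capture: true });
    }
    // Create the context synchronously inside the gesture handler,
    // play a one-frame silent buffer, and resume.
    const ctx = createContext();
    const source = ctx.createBufferSource();
    source.buffer = ctx.createBuffer(1, 1, ctx.sampleRate);
    source.connect(ctx.destination);
    source.start(0);
    void ctx.resume();
    stash = ctx;
    // Discard the stash if no session claims it within ttlMs.
    timer = setTimeout(() => {
      if (stash) void stash.close(); // assumption: discard closes the context
      stash = null;
    }, ttlMs);
  };

  // Capture phase: the handler runs before the event reaches its target's own
  // listeners, so it sees the tap before any handler that awaits something.
  for (const type of UNLOCK_EVENTS) {
    target.addEventListener(type, onGesture, { capture: true });
  }

  return {
    // Hand the unlocked context to the next session, at most once.
    take() {
      if (timer !== undefined) clearTimeout(timer);
      const ctx = stash;
      stash = null;
      return ctx;
    },
  };
}
```

In a browser, a caller would presumably pass `document` and `() => new AudioContext()`; session setup would then call `take()` and fall back to creating a context if it returns `null`.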
Non-iOS is unchanged: it still lazily creates the context with the requested sample-rate constraint and does not register the document listeners.
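The platform split and the priming step can be sketched together. Again this is a sketch under stated assumptions rather than the package's implementation: `acquireContext`, `primeOutput`, `silenceFrames`, the `{ type, buffer }` worklet message shape, and the fallback when nothing was stashed are all hypothetical.

```typescript
// Structural stand-ins for MessagePort and HTMLAudioElement so the sketch
// type-checks without DOM typings. All names here are hypothetical.
interface WorkletPortLike {
  postMessage(message: { type: string; buffer: Float32Array }): void;
}
interface PlayableLike {
  play(): Promise<void>;
}

// Number of sample frames covering `ms` milliseconds at `sampleRate` Hz.
function silenceFrames(sampleRate: number, ms = 100): number {
  return Math.round((sampleRate * ms) / 1000);
}

// iOS: prefer the context that was unlocked during the user gesture.
// Elsewhere: create the context lazily with the requested sample rate.
function acquireContext<C>(
  isIos: boolean,
  takeStashed: () => C | null,
  create: (sampleRate: number) => C,
  sampleRate: number,
): C {
  const stashed = isIos ? takeStashed() : null;
  // Assumption: with nothing stashed, fall back to a freshly created context.
  return stashed ?? create(sampleRate);
}

// iOS only, after the worklet is wired up: push ~100 ms of silence through
// the worklet so the playback graph is non-empty, then call play() explicitly.
async function primeOutput(
  port: WorkletPortLike,
  element: PlayableLike,
  sampleRate: number,
): Promise<void> {
  // A new Float32Array is zero-filled, which is silence.
  // The message shape is an assumption, not the package's worklet protocol.
  port.postMessage({ type: "buffer", buffer: new Float32Array(silenceFrames(sampleRate)) });
  await element.play();
}
```

At 48 kHz, 100 ms corresponds to 4800 frames. Keeping the non-iOS branch free of any stash lookup mirrors the note that other platforms do not register the document listeners.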
WebRTC voice sessions are unaffected by this change.

