Skip to content

How to stream text responses

This guide shows you how to stream tokens to the chat UI as they arrive from the model.

A StreamingChatAction differs from a ChatAction in one way: generate_response is a generator that yields string chunks instead of returning a single string. The chat component pushes each chunk to the browser over Server-Sent Events and appends it to the assistant bubble in real time.

Install the OpenAI SDK

pip install openai

Set OPENAI_API_KEY (and optionally OPENAI_BASE_URL).

Define a streaming action

Subclass StreamingChatAction and yield each delta the model emits.

OpenAI streaming chat

from collections.abc import Iterator

from openai import OpenAI
from pydantic import Field

import vizro.models as vm
from vizro import Vizro
from vizro_experimental.chat import Chat, Message, StreamingChatAction


class OpenAIStreamingChat(StreamingChatAction):
    model: str = Field(default="gpt-4.1-nano")

    def generate_response(self, messages: list[Message]) -> Iterator[str]:
        client = OpenAI()
        api_messages = [{"role": m["role"], "content": m["content"]} for m in messages]
        response = client.responses.create(
            model=self.model,
            input=api_messages,
            instructions="Be polite and creative.",
            store=False,
            stream=True,
        )
        for event in response:
            if event.type == "response.output_text.delta":
                yield event.delta


page = vm.Page(
    title="Streaming chat",
    components=[Chat(actions=OpenAIStreamingChat())],
)

Vizro().build(vm.Dashboard(pages=[page])).run()

Streaming chat

For a non-streaming version of the same pattern (full reply in one shot), see Use a real LLM.

When to use streaming

Reach for StreamingChatAction when the model can stream and the response is text-only. Use the synchronous ChatAction when:

  • The backend returns the full reply in one call (no streaming API).
  • You need to return a Dash component instead of text. See Render Dash components. Streaming only supports text chunks.

What's next