Streaming LLM responses in React Native with expo/fetch
One of the most impactful UX improvements you can make to an AI chat app is streaming. Instead of waiting 5–10 seconds for a complete response, users see tokens appear in real-time — just like ChatGPT.
Why expo/fetch instead of the built-in fetch?
React Native's built-in fetch doesn't fully support the ReadableStream API on iOS and Android. Streaming responses would either block until completion or fail silently. expo/fetch solves this — it's a WinterCG-compliant implementation that gives you proper ReadableStream support on all platforms.
```ts
// Already included with Expo SDK 52+ — no extra install needed
import { fetch } from "expo/fetch";
```
That's the only change needed. Everything else works exactly like the web Fetch API.
Setting up the stream
```ts
import { fetch } from "expo/fetch";

async function streamMessage(
  prompt: string,
  onToken: (token: string) => void,
  onComplete: () => void,
  signal?: AbortSignal
) {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o",
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
    signal,
  });

  const reader = response.body?.getReader();
  if (!reader) return;

  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) {
      onComplete();
      break;
    }

    // Accumulate text and only parse complete lines, since a single
    // SSE event can be split across network chunks.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line for next read

    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const data = line.slice("data: ".length);
      if (data === "[DONE]") continue;
      try {
        const json = JSON.parse(data);
        const token = json.choices[0]?.delta?.content;
        if (token) onToken(token);
      } catch {
        // ignore malformed events
      }
    }
  }
}
```
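The line-parsing step is independent of the network code, so it can be factored into a small pure helper that is easy to unit-test. A minimal sketch (the helper name `extractTokens` is illustrative, not from the source):

```typescript
// Hypothetical helper: extracts content tokens from complete SSE lines
// in the OpenAI streaming format ("data: {...}" per line).
function extractTokens(sseText: string): string[] {
  const tokens: string[] = [];
  for (const line of sseText.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice("data: ".length);
    if (data === "[DONE]") continue;
    try {
      const json = JSON.parse(data);
      const token = json.choices?.[0]?.delta?.content;
      if (typeof token === "string") tokens.push(token);
    } catch {
      // incomplete or non-JSON event, skip it
    }
  }
  return tokens;
}
```

Keeping the parser pure means the read loop only has to worry about buffering bytes.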
State management
The key to smooth streaming is accumulating tokens with functional state updates, so each token appends to the latest state even when updates arrive faster than renders:
```ts
const [messages, setMessages] = useState<Message[]>([]);
const [isStreaming, setIsStreaming] = useState(false);

const sendMessage = async (text: string) => {
  // Add user message
  setMessages((prev) => [...prev, { role: "user", content: text }]);

  // Add empty assistant message placeholder
  const assistantId = Date.now().toString();
  setMessages((prev) => [
    ...prev,
    { id: assistantId, role: "assistant", content: "" },
  ]);

  setIsStreaming(true);
  await streamMessage(
    text,
    (token) => {
      setMessages((prev) =>
        prev.map((m) =>
          m.id === assistantId ? { ...m, content: m.content + token } : m
        )
      );
    },
    () => setIsStreaming(false)
  );
};
```
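The per-token updater is just an immutable map over the message list. Pulled out as a plain function (hypothetical name `appendToken`, mirroring the `setMessages` callback above), the logic can be verified without React:

```typescript
interface Message {
  id?: string;
  role: "user" | "assistant";
  content: string;
}

// Hypothetical pure version of the setMessages updater: append a token
// to the message with the given id, returning a new array.
function appendToken(messages: Message[], id: string, token: string): Message[] {
  return messages.map((m) =>
    m.id === id ? { ...m, content: m.content + token } : m
  );
}
```

Because the function returns a fresh array and a fresh target object, React's reference-equality checks pick up the change on every token.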
Auto-scroll
Auto-scroll to bottom as tokens arrive:
```ts
const flatListRef = useRef<FlatList>(null);

useEffect(() => {
  if (isStreaming) {
    flatListRef.current?.scrollToEnd({ animated: false });
  }
}, [messages, isStreaming]);
```
Cancellation
Always implement cancellation so users can tap "stop" mid-stream:
```ts
const abortControllerRef = useRef<AbortController | null>(null);

const startStream = async (text: string) => {
  abortControllerRef.current = new AbortController();
  await streamMessage(
    text,
    onToken,
    onComplete,
    abortControllerRef.current.signal
  );
};

const stopStreaming = () => {
  abortControllerRef.current?.abort();
  setIsStreaming(false);
};
```
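The signal mechanics are plain web-standard AbortController, the same object expo/fetch consumes. A minimal sketch with no network involved:

```typescript
// Standard AbortController mechanics: consumers observe cancellation
// either by polling signal.aborted or by listening for the abort event.
const controller = new AbortController();

let cancelled = false;
controller.signal.addEventListener("abort", () => {
  cancelled = true;
});

controller.abort();
```

One caveat: when the controller aborts, the pending `await` inside `startStream` rejects with an `AbortError`, so production code should wrap the call in try/catch and treat that error as a normal stop rather than a failure.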
Blinking cursor while streaming
```ts
import { useEffect, useRef } from "react";
import { Animated } from "react-native";

function StreamingCursor({ visible }: { visible: boolean }) {
  const opacity = useRef(new Animated.Value(1)).current;

  useEffect(() => {
    if (!visible) return;
    const anim = Animated.loop(
      Animated.sequence([
        Animated.timing(opacity, { toValue: 0, duration: 500, useNativeDriver: true }),
        Animated.timing(opacity, { toValue: 1, duration: 500, useNativeDriver: true }),
      ])
    );
    anim.start();
    return () => anim.stop();
  }, [visible]);

  if (!visible) return null;
  return <Animated.Text style={{ opacity }}>▌</Animated.Text>;
}
```
Supabase Edge Function proxy
If you're routing through a Supabase Edge Function (recommended for keeping API keys server-side), the same pattern works:
```ts
const response = await fetch(
  `${process.env.EXPO_PUBLIC_SUPABASE_URL}/functions/v1/chat-proxy`,
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${session.access_token}`,
    },
    body: JSON.stringify({ conversationId, messages }),
    signal,
  }
);
```
The edge function streams SSE back, and expo/fetch handles it natively on device.
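On the server side, the core of such a proxy is forwarding the upstream SSE body unchanged: the ReadableStream is handed straight to a new Response, so tokens flow to the client as soon as the model emits them. A minimal sketch (the function name `proxySse` and exact header set are illustrative, not from the source):

```typescript
// Hypothetical pass-through: return the upstream SSE stream to the
// client without buffering. In a Supabase Edge Function, the handler
// would return this Response.
function proxySse(upstream: Response): Response {
  return new Response(upstream.body, {
    status: upstream.status,
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}
```

On the device, expo/fetch reads this response with the same `getReader()` loop shown earlier.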
Conclusion
The switch from fetch to import { fetch } from 'expo/fetch' is a one-line change that unlocks proper streaming on iOS and Android. Combined with functional setState updates for token accumulation and an AbortController for cancellation, you get a smooth, production-ready streaming experience that feels native.