Streaming LLM responses in React Native with expo/fetch
One of the most impactful UX improvements you can make to an AI chat app is streaming. Instead of waiting 5–10 seconds for a complete response, users see tokens appear in real-time — just like ChatGPT.
Why expo/fetch instead of the built-in fetch?
React Native's built-in fetch doesn't fully support the ReadableStream API on iOS and Android. Streaming responses would either block until completion or fail silently. expo/fetch solves this — it's a WinterCG-compliant implementation that gives you proper ReadableStream support on all platforms.
```ts
// Already included with Expo SDK 52+ — no extra install needed
import { fetch } from "expo/fetch";
```
That's the only change needed. Everything else works exactly like the web Fetch API.
Setting up the stream
```ts
import { fetch } from "expo/fetch";

async function streamMessage(
  prompt: string,
  onToken: (token: string) => void,
  onComplete: () => void,
  signal?: AbortSignal
) {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o",
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
    signal,
  });

  const reader = response.body?.getReader();
  if (!reader) return;

  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) {
      onComplete();
      break;
    }

    // Accumulate text and only parse complete lines, since a single
    // SSE event can be split across network chunks.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line for next read

    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const data = line.slice("data: ".length);
      if (data === "[DONE]") continue;
      try {
        const json = JSON.parse(data);
        const token = json.choices[0]?.delta?.content;
        if (token) onToken(token);
      } catch {
        // ignore malformed events
      }
    }
  }
}
```
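The line-parsing step is independent of the network code, so it can be factored into a small pure helper that is easy to unit-test. A minimal sketch (the helper name `extractTokens` is illustrative, not from the source):

```typescript
// Hypothetical helper: extracts content tokens from complete SSE lines
// in the OpenAI streaming format ("data: {...}" per line).
function extractTokens(sseText: string): string[] {
  const tokens: string[] = [];
  for (const line of sseText.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice("data: ".length);
    if (data === "[DONE]") continue;
    try {
      const json = JSON.parse(data);
      const token = json.choices?.[0]?.delta?.content;
      if (typeof token === "string") tokens.push(token);
    } catch {
      // incomplete or non-JSON event, skip it
    }
  }
  return tokens;
}
```

Keeping the parser pure means the read loop only has to worry about buffering bytes.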
State management
The key to smooth streaming is accumulating tokens with functional state updates, so each token appends to the latest state even when updates arrive faster than renders:
```ts
const [messages, setMessages] = useState<Message[]>([]);
const [isStreaming, setIsStreaming] = useState(false);

const sendMessage = async (text: string) => {
  // Add user message
  setMessages((prev) => [...prev, { role: "user", content: text }]);

  // Add empty assistant message placeholder
  const assistantId = Date.now().toString();
  setMessages((prev) => [
    ...prev,
    { id: assistantId, role: "assistant", content: "" },
  ]);

  setIsStreaming(true);
  await streamMessage(
    text,
    (token) => {
      setMessages((prev) =>
        prev.map((m) =>
          m.id === assistantId ? { ...m, content: m.content + token } : m
        )
      );
    },
    () => setIsStreaming(false)
  );
};
```
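The per-token updater is just an immutable map over the message list. Pulled out as a plain function (hypothetical name `appendToken`, mirroring the `setMessages` callback above), the logic can be verified without React:

```typescript
interface Message {
  id?: string;
  role: "user" | "assistant";
  content: string;
}

// Hypothetical pure version of the setMessages updater: append a token
// to the message with the given id, returning a new array.
function appendToken(messages: Message[], id: string, token: string): Message[] {
  return messages.map((m) =>
    m.id === id ? { ...m, content: m.content + token } : m
  );
}
```

Because the function returns a fresh array and a fresh target object, React's reference-equality checks pick up the change on every token.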
Auto-scroll
Auto-scroll to bottom as tokens arrive:
```ts
const flatListRef = useRef<FlatList>(null);

useEffect(() => {
  if (isStreaming) {
    flatListRef.current?.scrollToEnd({ animated: false });
  }
}, [messages, isStreaming]);
```
Cancellation
Always implement cancellation so users can tap "stop" mid-stream:
```ts
const abortControllerRef = useRef<AbortController | null>(null);

const startStream = async (text: string) => {
  abortControllerRef.current = new AbortController();
  await streamMessage(
    text,
    onToken,
    onComplete,
    abortControllerRef.current.signal
  );
};

const stopStreaming = () => {
  abortControllerRef.current?.abort();
  setIsStreaming(false);
};
```
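The signal mechanics are plain web-standard AbortController, the same object expo/fetch consumes. A minimal sketch with no network involved:

```typescript
// Standard AbortController mechanics: consumers observe cancellation
// either by polling signal.aborted or by listening for the abort event.
const controller = new AbortController();

let cancelled = false;
controller.signal.addEventListener("abort", () => {
  cancelled = true;
});

controller.abort();
```

One caveat: when the controller aborts, the pending `await` inside `startStream` rejects with an `AbortError`, so production code should wrap the call in try/catch and treat that error as a normal stop rather than a failure.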
Blinking cursor while streaming
```ts
import { useEffect, useRef } from "react";
import { Animated } from "react-native";

function StreamingCursor({ visible }: { visible: boolean }) {
  const opacity = useRef(new Animated.Value(1)).current;

  useEffect(() => {
    if (!visible) return;
    const anim = Animated.loop(
      Animated.sequence([
        Animated.timing(opacity, { toValue: 0, duration: 500, useNativeDriver: true }),
        Animated.timing(opacity, { toValue: 1, duration: 500, useNativeDriver: true }),
      ])
    );
    anim.start();
    return () => anim.stop();
  }, [visible]);

  if (!visible) return null;
  return <Animated.Text style={{ opacity }}>▌</Animated.Text>;
}
```
Supabase Edge Function proxy
If you're routing through a Supabase Edge Function (recommended for keeping API keys server-side), the same pattern works:
```ts
const response = await fetch(
  `${process.env.EXPO_PUBLIC_SUPABASE_URL}/functions/v1/chat-proxy`,
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${session.access_token}`,
    },
    body: JSON.stringify({ conversationId, messages }),
    signal,
  }
);
```
The edge function streams SSE back, and expo/fetch handles it natively on device.
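On the server side, the core of such a proxy is forwarding the upstream SSE body unchanged: the ReadableStream is handed straight to a new Response, so tokens flow to the client as soon as the model emits them. A minimal sketch (the function name `proxySse` and exact header set are illustrative, not from the source):

```typescript
// Hypothetical pass-through: return the upstream SSE stream to the
// client without buffering. In a Supabase Edge Function, the handler
// would return this Response.
function proxySse(upstream: Response): Response {
  return new Response(upstream.body, {
    status: upstream.status,
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}
```

On the device, expo/fetch reads this response with the same `getReader()` loop shown earlier.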
Conclusion
The switch from fetch to import { fetch } from 'expo/fetch' is a one-line change that unlocks proper streaming on iOS and Android. Combined with functional setState updates for token accumulation and an AbortController for cancellation, you get a smooth, production-ready streaming experience that feels native.