When people first design a streaming interface, the natural question is:
If TCP already provides a reliable ordered byte stream, and HTTP/2 or HTTP/3 already supports long-lived multiplexed streams, why do we still need application-layer streaming design?
Because the transport layer only promises to move bytes. Real systems care about higher-level facts: which file do these bytes belong to? Is this chunk number 17? Can this operation be retried? What if the same chunk arrives twice? Can a download resume after disconnecting halfway through? Who slows down when the consumer cannot keep up?
Those are not transport-layer decisions.
This post uses two small objects throughout:
- upload side: upload a
1GBfile as4MBchunks. - download side: stream LLM tokens, log lines, or media fragments from a server to a client.
The directions are opposite, but the application-layer problem is similar: a byte stream must be split into meaningful messages, messages must be attached to state, and state must be acknowledged, resumed, and cancelled.
Figure 1: The transport layer moves bytes; the application layer adds boundaries, state, and meaning
Start With A Tiny Example
Suppose a client needs to upload a 1GB file and the network may disconnect halfway through. The simplest design is to put the whole file in one HTTP request body:
| |
This can work on a perfect network, but its system semantics are fragile:
- if the connection breaks at
740MB, how much did the server save? - should the client retry the full
1GB, or continue from740MB? - if the server receives the second half again, how does it know this is a retry, not a new file?
- should integrity be checked with one whole-file checksum, or per chunk?
A practical application-layer design usually turns the upload into a stateful session:
| |
Now the application layer has introduced objects the transport layer cannot know about:
| Object | Purpose |
|---|---|
upload_id | group multiple requests into one upload session |
chunk_index / byte range | create retryable byte boundaries |
checksum | distinguish reliable delivery from content correctness |
complete | commit temporary state into a final object |
| idempotency key | make repeated requests safe |
This is the core of upload-side streaming: split one large object into independently confirmable smaller objects, while still committing them as one consistent business object.
Download Streaming Is Not Just Upload In Reverse
Download streaming has a different shape. On the upload side, the client usually owns the complete input. On the download side, the server may generate output while the request is still running.
A typical example is an LLM token stream:
| |
This can be carried by HTTP chunked responses, Server-Sent Events, WebSocket, or HTTP/2 streams. But no matter which transport API is used, the application layer still has to define:
- where does each event begin and end?
- what do
delta,error, anddonemean? - can the client resume from
index=37after reconnecting? - how does the server send heartbeats so intermediaries do not treat the connection as idle?
- when the user cancels, how does the server stop backend computation instead of merely closing a socket?
So the download-side problem is not “keep writing to the socket.” It is expose a continuously produced computation as a consumable, terminable, observable event sequence.
sequenceDiagram
participant Client
participant API as Application API
participant Worker as Producer
Client->>API: start stream request
API->>Worker: create job(request_id)
Worker-->>API: delta #0
API-->>Client: event: delta, id: 0
Worker-->>API: delta #1
API-->>Client: event: delta, id: 1
Client-->>API: cancel / disconnect
API->>Worker: abort(request_id)
Worker-->>API: stopped
The important thing in this diagram is not the arrows. It is the request_id and event numbering. Without them, the system can only say “the connection closed.” With them, it can say which business task stopped, how much output was delivered, and whether backend work should be aborted.
What The Transport Layer Already Does
To be fair, the transport layer already solves hard problems.
TCP provides:
- ordered byte delivery: the receiver sees bytes in the order the sender wrote them.
- reliable retransmission: lost packets are resent.
- congestion control: the sender adapts to network conditions.
- flow control: the receiver window prevents the sender from overflowing receive buffers.
QUIC / HTTP/3 moves parts of this machinery into user space and improves connection migration and head-of-line blocking behavior under multiplexing. HTTP/2 can also carry multiple streams over one connection.
So the application layer should not reimplement packet retransmission, congestion control, or byte ordering. That is transport-layer work.
But the transport abstraction has a clear boundary: it sees connections, packets, streams, and byte offsets. It does not see these facts:
| Transport layer knows | Application layer knows |
|---|---|
| byte offset | which chunk or event this is |
| connection closed | user cancelled, network failed, or server completed |
| bytes delivered | content passed business validation |
| receiver window full | UI is slow, disk is slow, or downstream service is slow |
| stream id | which file, task, or session this stream belongs to |
So the sharper version of “isn’t the transport layer enough?” is:
If the system only needs to move bytes reliably, the transport layer is enough. If it needs to advance business state reliably, the application layer needs a design.
What The Application Layer Must Design
Application-layer streaming design has six recurring dimensions.
1. Message Boundaries
TCP is a byte stream. It does not preserve the message boundaries from application writes. If you call write() three times, the peer may receive the bytes in one read(). If you call write() once, the peer may receive it across multiple read() calls.
The application layer therefore needs framing:
| |
Or it can reuse existing formats:
- HTTP multipart
- SSE
event:/data:frames - WebSocket message frames
- gRPC streaming messages
- custom length-prefixed frames
Boundaries are not formatting trivia. They are the unit of state management. Without a boundary, there is no “chunk 17 succeeded” or “event 38 was consumed.”
2. Idempotency And Retry
The hardest part of network failure is not that a request failed. It is that the client often does not know whether the server processed it.
For an upload chunk:
| |
The client should retry. But if retrying makes the server append chunk 17 twice, the file is corrupted. The application layer needs to make the operation idempotent:
| |
When the server sees the same key, it can return “already received” instead of writing the data again. TCP cannot infer this. TCP knows whether bytes entered a connection; it does not know whether this request is a replay of a business action.
3. Progress And Resume
A common resumable-upload protocol looks like this:
| |
Download streaming can have a similar mechanism:
| |
But the download side has an extra constraint: does the server retain historical events? If events are generated in memory and never stored, reconnecting cannot precisely resume. The server may need to regenerate, or state explicitly that the stream is not resumable. That is part of the application contract.
4. Backpressure
The transport layer has flow control, but application-layer backpressure is still needed because “receive buffer is full” is not the only kind of slowness.
On the download side:
- browser JavaScript may process events slowly.
- the client may write to disk slowly.
- the UI may only need refreshes every 50ms, not every token immediately.
- a downstream consumer may be slow while the socket buffer is still fine.
On the upload side:
- server disk writes may be slow.
- the server may scan, decode, or index data while receiving it.
- object storage or a database may be the bottleneck.
The application layer can express backpressure in several ways:
- limit the number of in-flight chunks.
- return
429/503withRetry-After. - adjust chunk size or concurrency based on ACK latency.
- merge, sample, or batch-flush download events.
Transport flow control protects connection buffers. Application backpressure protects the business processing pipeline.
5. Completion, Error, And Cancellation
A streaming protocol should make “end” a first-class event.
Download streams should distinguish at least:
| |
If the protocol only relies on socket close, the client cannot tell whether the close means normal completion, network failure, server crash, or user cancellation. Uploads have the same problem: complete upload is an explicit commit point. Without it, the server cannot reliably distinguish “still uploading” from “abandoned.”
6. Observability And Cleanup
Streaming connections are often long-lived, cross multiple components, and fail in several ways. The application layer needs stable identities:
| |
These ids connect logs, metrics, retries, cancellation, and cleanup jobs. If a download stream disconnects, the API layer needs to find the backend worker and stop computation. If an upload session has not completed after 24 hours, a cleanup job needs to delete temporary chunks.
Without application-layer identity, the system can only clean up connections. With it, the system can clean up business resources.
The Shared Pattern Behind Upload And Download
Compressed into one model, upload and download streaming share the same state machine:
stateDiagram-v2
[*] --> Created
Created --> Streaming: first chunk/event
Streaming --> Streaming: ack + next unit
Streaming --> Paused: temporary failure
Paused --> Streaming: resume
Streaming --> Completed: done/complete
Streaming --> Failed: unrecoverable error
Streaming --> Cancelled: client/server abort
Paused --> Cancelled: timeout cleanup
Completed --> [*]
Failed --> [*]
Cancelled --> [*]
The difference is what “unit” means:
| Direction | Unit | Progress checkpoint | Resume condition |
|---|---|---|---|
| Upload | chunk / byte range | server persisted or validated the chunk | server can list received chunks |
| Download | event / token / frame | client stored the last event id, or the protocol has explicit ACKs | server can replay history, or recompute |
This state machine is also a checklist for evaluating a streaming design:
- can a stream be explicitly created?
- does every unit have a boundary and identifier?
- is success or progress acknowledged per unit, or only at the end?
- after failure, can both sides know the last consistent point?
- does cancellation release backend resources?
- is completion separate from connection close?
Choosing SSE, WebSocket, gRPC, Or A Custom Protocol
Application-layer design does not mean inventing a new protocol. Most systems should reuse mature carriers.
| Option | Good fit | Watch out |
|---|---|---|
| HTTP chunked response | simple download streams, continuous server output | carries bytes but does not define event semantics |
| SSE | server-to-browser text events, LLM token streams | one-way; has id and reconnect support, but awkward for binary |
| WebSocket | bidirectional low-latency messages, collaborative editing, realtime control | you define message types, reconnect, heartbeat, and auth refresh |
| gRPC streaming | typed service-to-service streaming with schemas | browser support is less direct; backend-oriented ecosystem |
| resumable upload protocol | large file uploads and weak-network recovery | the core is session, chunk, checksum, and commit |
The real design question is not “which transport API should we use?” It is:
What is the business unit? Where is the acknowledgment point? Where can recovery resume? How are duplicate messages handled? How are completion and cancellation expressed?
The transport choice carries those answers.
A Practical Design Template
When designing a streaming interface, start by filling this table.
| Question | Upload-side example | Download-side example |
|---|---|---|
| stream identity | upload_id | stream_id / request_id |
| unit boundary | chunk_index + byte_range | event_id + event_type |
| ordering | chunk index is monotonic; chunks may upload concurrently and be sorted later | event id is monotonic |
| integrity | per-chunk checksum + final checksum | event schema validation, optional checksum |
| idempotency | upload_id:chunk_index:checksum | dedupe by event_id, or resume cursor |
| ack / progress | chunk persisted | last event id stored, or explicit app-level ACK |
| resume | query received chunks | replay from last event id |
| completion | explicit complete + commit | explicit done event |
| cancellation | abort upload session and delete temporary chunks | abort backend job |
| cleanup | TTL cleanup for unfinished sessions | stop producer after disconnect or keep short replay buffer |
If this table cannot be filled, the system probably does not have a streaming design yet. It is only sending data in pieces.
Summary
The transport layer answers “how do bytes cross the network?” The application layer answers “how do these bytes advance business state?”
The upload side needs sessions, chunks, checksums, idempotency, commit, and cleanup. The download side needs events, done/error/cancel semantics, heartbeats, resume cursors, and producer lifecycle. Both rely on the transport layer for reliable, ordered, controlled byte movement, but neither can ask the transport layer to guess business meaning.
So the core of streaming is not “use a long connection.”
More precisely, streaming is a state-machine design: split a large object or long-running process into named, confirmable, recoverable, cancellable steps, and give every step clear application-layer semantics.