Streaming Design: Why The Application Layer Still Matters


When people first design a streaming interface, the natural question is:

If TCP already provides a reliable ordered byte stream, and HTTP/2 or HTTP/3 already supports long-lived multiplexed streams, why do we still need application-layer streaming design?

Because the transport layer only promises to move bytes. Real systems care about higher-level facts: which file do these bytes belong to? Is this chunk number 17? Can this operation be retried? What if the same chunk arrives twice? Can a download resume after disconnecting halfway through? Who slows down when the consumer cannot keep up?

Those are not transport-layer decisions.

This post uses two small running examples throughout:

  • upload side: upload a 1GB file as 4MB chunks.
  • download side: stream LLM tokens, log lines, or media fragments from a server to a client.

The directions are opposite, but the application-layer problem is similar: a byte stream must be split into meaningful messages, messages must be attached to state, and state must be acknowledged, resumed, and cancelled.

Figure 1: The transport layer moves bytes; the application layer adds boundaries, state, and meaning


Start With A Tiny Example

Suppose a client needs to upload a 1GB file and the network may disconnect halfway through. The simplest design is to put the whole file in one HTTP request body:

POST /upload

[1GB raw bytes...]

This can work on a perfect network, but its system semantics are fragile:

  • if the connection breaks at 740MB, how much did the server save?
  • should the client retry the full 1GB, or continue from 740MB?
  • if the server receives the second half again, how does it know this is a retry, not a new file?
  • should integrity be checked with one whole-file checksum, or per chunk?

A practical application-layer design usually turns the upload into a stateful session:

create upload session
  -> upload_id = u_123

PUT /uploads/u_123/chunks/0000  bytes 0..4MB-1   checksum=a1
PUT /uploads/u_123/chunks/0001  bytes 4MB..8MB-1 checksum=b2
PUT /uploads/u_123/chunks/0002  bytes 8MB..12MB-1 checksum=c3

complete upload u_123
  -> verify chunk list + total checksum
  -> commit object

Now the application layer has introduced objects the transport layer cannot know about:

| Object | Purpose |
| --- | --- |
| upload_id | group multiple requests into one upload session |
| chunk_index / byte range | create retryable byte boundaries |
| checksum | distinguish reliable delivery from content correctness |
| complete | commit temporary state into a final object |
| idempotency key | make repeated requests safe |

This is the core of upload-side streaming: split one large object into independently confirmable smaller objects, while still committing them as one consistent business object.
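As a rough illustration of how the client might derive those chunk boundaries, here is a minimal sketch. The names `ChunkPlan`, `plan_chunks`, and `chunk_checksum` are hypothetical, SHA-256 is an assumed checksum choice, and "1GB" / "4MB" are read as binary units (GiB / MiB); none of these come from a specific upload API.

```python
import hashlib
from dataclasses import dataclass

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB, assumed reading of "4MB" in the example


@dataclass(frozen=True)
class ChunkPlan:
    index: int
    start: int  # first byte offset, inclusive
    end: int    # last byte offset, inclusive


def plan_chunks(total_size: int, chunk_size: int = CHUNK_SIZE) -> list[ChunkPlan]:
    """Split [0, total_size) into fixed-size ranges; the last one may be shorter."""
    plans = []
    for index, start in enumerate(range(0, total_size, chunk_size)):
        end = min(start + chunk_size, total_size) - 1
        plans.append(ChunkPlan(index, start, end))
    return plans


def chunk_checksum(data: bytes) -> str:
    """Per-chunk checksum; SHA-256 is an assumption, not a requirement."""
    return hashlib.sha256(data).hexdigest()
```

Under these assumptions a 1 GiB file yields 256 chunks, and each `(index, start, end)` triple is a unit the client can retry on its own.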

Download Streaming Is Not Just Upload In Reverse

Download streaming has a different shape. On the upload side, the client usually owns the complete input. On the download side, the server may generate output while the request is still running.

A typical example is an LLM token stream:

event: delta
data: {"index":0,"content":"stream"}

event: delta
data: {"index":1,"content":"ing"}

event: done
data: {}

This can be carried by HTTP chunked responses, Server-Sent Events, WebSocket, or HTTP/2 streams. But no matter which transport API is used, the application layer still has to define:

  • where does each event begin and end?
  • what do delta, error, and done mean?
  • can the client resume from index=37 after reconnecting?
  • how does the server send heartbeats so intermediaries do not treat the connection as idle?
  • when the user cancels, how does the server stop backend computation instead of merely closing a socket?

So the download-side problem is not “keep writing to the socket.” It is to expose a continuously produced computation as a consumable, terminable, observable event sequence.
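If the carrier is Server-Sent Events, the event boundaries and heartbeats listed above map onto the SSE wire format fairly directly. The sketch below is one possible encoding; `format_sse` and `heartbeat` are hypothetical helper names, and using the event index as the SSE `id` is an assumption about how a resume cursor could be exposed.

```python
import json
from typing import Optional


def format_sse(event: str, data: dict, event_id: Optional[int] = None) -> str:
    """Encode one event as an SSE frame; a blank line terminates the frame."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")  # lets a client report where it stopped
    lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")  # json.dumps output stays on one line
    return "\n".join(lines) + "\n\n"


def heartbeat() -> str:
    """An SSE comment line: ignored by clients, but keeps the connection active."""
    return ": keep-alive\n\n"
```

For example, `format_sse("delta", {"index": 0, "content": "stream"}, event_id=0)` produces an `id`, `event`, and `data` line followed by the blank line that closes the event.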

sequenceDiagram
    participant Client
    participant API as Application API
    participant Worker as Producer

    Client->>API: start stream request
    API->>Worker: create job(request_id)
    Worker-->>API: delta #0
    API-->>Client: event: delta, id: 0
    Worker-->>API: delta #1
    API-->>Client: event: delta, id: 1
    Client-->>API: cancel / disconnect
    API->>Worker: abort(request_id)
    Worker-->>API: stopped

The important thing in this diagram is not the arrows. It is the request_id and event numbering. Without them, the system can only say “the connection closed.” With them, it can say which business task stopped, how much output was delivered, and whether backend work should be aborted.
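The abort path in the diagram can be sketched with a registry keyed by request_id. This is only an in-process illustration: `JobRegistry` and `produce` are hypothetical names, and a real system would more likely signal a worker across a process or network boundary.

```python
import threading


class JobRegistry:
    """Maps request_id to a cancellation flag so a disconnect can stop real work."""

    def __init__(self):
        self._cancel_flags = {}
        self._lock = threading.Lock()

    def register(self, request_id: str) -> threading.Event:
        flag = threading.Event()
        with self._lock:
            self._cancel_flags[request_id] = flag
        return flag

    def abort(self, request_id: str) -> bool:
        """Signal the job to stop; returns False if the id is unknown or already aborted."""
        with self._lock:
            flag = self._cancel_flags.pop(request_id, None)
        if flag is None:
            return False
        flag.set()
        return True


def produce(tokens, cancelled: threading.Event) -> list:
    """Stand-in producer: checks the flag before emitting each unit."""
    delivered = []
    for token in tokens:
        if cancelled.is_set():
            break
        delivered.append(token)
    return delivered
```

The point of the sketch is that cancellation targets a named job, not a socket, so the API layer can call `abort("req_abc")` whether the trigger was an explicit cancel or a dropped connection.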

What The Transport Layer Already Does

To be fair, the transport layer already solves hard problems.

TCP provides:

  • ordered byte delivery: the receiver sees bytes in the order the sender wrote them.
  • reliable retransmission: lost packets are resent.
  • congestion control: the sender adapts to network conditions.
  • flow control: the receiver window prevents the sender from overflowing receive buffers.

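QUIC, the transport under HTTP/3, reimplements much of this machinery over UDP, usually in user space. It adds connection migration and delivers each stream independently, so a lost packet stalls only the streams whose data it carried. HTTP/2 also multiplexes multiple streams over one connection, but because it runs over TCP, a single lost packet can stall every stream on that connection.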

So the application layer should not reimplement packet retransmission, congestion control, or byte ordering. That is transport-layer work.

But the transport abstraction has a clear boundary: it sees connections, packets, streams, and byte offsets. It does not see these facts:

| Transport layer knows | Application layer knows |
| --- | --- |
| byte offset | which chunk or event this is |
| connection closed | user cancelled, network failed, or server completed |
| bytes delivered | content passed business validation |
| receiver window full | UI is slow, disk is slow, or downstream service is slow |
| stream id | which file, task, or session this stream belongs to |

So the sharper version of “isn’t the transport layer enough?” is:

If the system only needs to move bytes reliably, the transport layer is enough. If it needs to advance business state reliably, the application layer needs a design.

What The Application Layer Must Design

Application-layer streaming design has six recurring dimensions.

1. Message Boundaries

TCP is a byte stream. It does not preserve the message boundaries from application writes. If you call write() three times, the peer may receive the bytes in one read(). If you call write() once, the peer may receive it across multiple read() calls.

The application layer therefore needs framing:

[length=1048576][chunk bytes...][checksum]
[length=932144 ][chunk bytes...][checksum]

Or it can reuse existing formats:

  • HTTP multipart
  • SSE event: / data: frames
  • WebSocket message frames
  • gRPC streaming messages
  • custom length-prefixed frames

Boundaries are not formatting trivia. They are the unit of state management. Without a boundary, there is no “chunk 17 succeeded” or “event 38 was consumed.”
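A custom length-prefixed framing can be sketched as follows. The 4-byte big-endian header is an assumed layout, `encode_frame` and `FrameDecoder` are hypothetical names, and the trailing checksum from the example above is left out to keep the sketch short.

```python
import struct

HEADER = struct.Struct(">I")  # assumed layout: 4-byte big-endian payload length


def encode_frame(payload: bytes) -> bytes:
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Reassembles frames from arbitrary read() slices of a byte stream."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data: bytes) -> list[bytes]:
        """Add newly read bytes and return every frame that is now complete."""
        self._buffer.extend(data)
        frames = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer, 0)
            frame_end = HEADER.size + length
            if len(self._buffer) < frame_end:
                break  # payload not fully received yet; wait for more bytes
            frames.append(bytes(self._buffer[HEADER.size:frame_end]))
            del self._buffer[:frame_end]
        return frames
```

The decoder returns whole frames no matter how the reads were sliced: a read that ends mid-header or mid-payload simply yields nothing until the rest arrives.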

2. Idempotency And Retry

The hardest part of network failure is not that a request failed. It is that the client often does not know whether the server processed it.

For an upload chunk:

client -> server: PUT chunk 17
server: write chunk 17 ok
network: response lost
client: timeout

The client should retry. But if retrying makes the server append chunk 17 twice, the file is corrupted. The application layer needs to make the operation idempotent:

PUT /uploads/u_123/chunks/17
Idempotency-Key: u_123:17:sha256:...

When the server sees the same key, it can return “already received” instead of writing the data again. TCP cannot infer this. TCP tracks which bytes were sent and acknowledged on a connection; it does not know whether this request is a replay of a business action.
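On the server side, the duplicate check might look like the sketch below. `ChunkStore` is a hypothetical in-memory stand-in for real storage, and keying on upload id, chunk index, and a SHA-256 checksum is an assumption modeled on the key shown above.

```python
import hashlib


class ChunkStore:
    """In-memory stand-in for chunk storage with idempotent writes."""

    def __init__(self):
        self._chunks = {}  # (upload_id, chunk_index) -> (checksum, data)

    def put(self, upload_id: str, chunk_index: int, data: bytes) -> str:
        checksum = hashlib.sha256(data).hexdigest()
        key = (upload_id, chunk_index)
        existing = self._chunks.get(key)
        if existing is None:
            self._chunks[key] = (checksum, data)
            return "stored"
        if existing[0] == checksum:
            return "already_received"  # a retry of the same chunk: safe no-op
        # Same slot, different bytes: this is a conflict, not a retry.
        raise ValueError(f"chunk {chunk_index} of {upload_id} conflicts with stored data")
```

Writing by key rather than appending is what makes the retry safe: sending chunk 17 twice leaves exactly one copy of chunk 17.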

3. Progress And Resume

A common resumable-upload protocol looks like this:

client: which chunks do you have for upload u_123?
server: 0..184 are committed, 185 is missing
client: continue from chunk 185

Download streaming can have a similar mechanism:

client: resume stream s_456 from event id 37
server: replay 38..latest if retained, then continue live stream

But the download side has an extra constraint: does the server retain historical events? If events are generated in memory and never stored, reconnecting cannot precisely resume. The server may need to regenerate, or state explicitly that the stream is not resumable. That is part of the application contract.
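One way to make that retention contract explicit is a bounded replay buffer that either replays the missed events or reports that it no longer can. This is a sketch; `ReplayBuffer` and its method names are hypothetical, and the capacity would be a tuning decision.

```python
from collections import deque


class ReplayBuffer:
    """Keeps the most recent events so a reconnecting client can catch up."""

    def __init__(self, capacity: int):
        self._events = deque(maxlen=capacity)  # oldest events are evicted first
        self._next_id = 0

    def append(self, payload) -> int:
        event_id = self._next_id
        self._events.append((event_id, payload))
        self._next_id += 1
        return event_id

    def replay_after(self, last_event_id: int):
        """Return events newer than last_event_id, or None if some were evicted."""
        first_wanted = last_event_id + 1
        if first_wanted >= self._next_id:
            return []  # the client is already up to date
        oldest_retained = self._events[0][0]
        if first_wanted < oldest_retained:
            return None  # gap: the stream cannot be resumed precisely
        return [(i, p) for i, p in self._events if i >= first_wanted]
```

Returning `None` instead of a partial list forces the caller to pick a documented fallback, such as regenerating the output or telling the client the stream is not resumable.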

4. Backpressure

The transport layer has flow control, but application-layer backpressure is still needed because “receive buffer is full” is not the only kind of slowness.

On the download side:

  • browser JavaScript may process events slowly.
  • the client may write to disk slowly.
  • the UI may only need refreshes every 50ms, not every token immediately.
  • a downstream consumer may be slow while the socket buffer is still fine.

On the upload side:

  • server disk writes may be slow.
  • the server may scan, decode, or index data while receiving it.
  • object storage or a database may be the bottleneck.

The application layer can express backpressure in several ways:

  • limit the number of in-flight chunks.
  • return 429 / 503 with Retry-After.
  • adjust chunk size or concurrency based on ACK latency.
  • merge, sample, or batch-flush download events.

Transport flow control protects connection buffers. Application backpressure protects the business processing pipeline.
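The first option, limiting in-flight chunks, can be sketched with a semaphore. `upload_all` is a hypothetical helper, and `send_chunk` stands for whatever coroutine actually performs the request; both are assumptions for illustration.

```python
import asyncio


async def upload_all(chunks, send_chunk, max_in_flight: int = 4):
    """Send every chunk, but never run more than max_in_flight sends at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def send_one(index, data):
        async with semaphore:  # waits here while max_in_flight sends are running
            return await send_chunk(index, data)

    # gather preserves input order, so results line up with chunk indexes.
    return await asyncio.gather(
        *(send_one(index, data) for index, data in enumerate(chunks))
    )
```

A slow server then slows the client automatically: new chunks start only as earlier ones are acknowledged, regardless of how much socket buffer is free.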

5. Completion, Error, And Cancellation

A streaming protocol should make “end” a first-class event.

Download streams should distinguish at least:

event: done
data: {"finish_reason":"stop"}

event: error
data: {"code":"quota_exceeded","message":"..."}

event: cancelled
data: {"by":"client"}

If the protocol only relies on socket close, the client cannot tell whether the close means normal completion, network failure, server crash, or user cancellation. Uploads have the same problem: complete upload is an explicit commit point. Without it, the server cannot reliably distinguish “still uploading” from “abandoned.”
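On the client, this distinction can be reduced to a small check over the event names received before the connection closed. The sketch assumes the three terminal event names shown above; `classify_stream_end` and the outcome labels are hypothetical.

```python
# Terminal event name -> outcome label (labels are illustrative assumptions).
TERMINAL_EVENTS = {
    "done": "completed",
    "error": "failed",
    "cancelled": "cancelled",
}


def classify_stream_end(event_names) -> str:
    """Decide how a stream ended from the events seen before the socket closed."""
    for name in event_names:
        if name in TERMINAL_EVENTS:
            return TERMINAL_EVENTS[name]
    # The socket closed without any terminal event: treat it as a transport
    # failure, which is the only case where resuming or retrying makes sense.
    return "interrupted"
```

Only the `"interrupted"` outcome should trigger an automatic reconnect; the other three are answers, not failures of delivery.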

6. Observability And Cleanup

Streaming connections are often long-lived, cross multiple components, and fail in several ways. The application layer needs stable identities:

request_id = req_abc
upload_id  = up_123
stream_id  = s_456
chunk_id   = 17
event_id   = 38

These ids connect logs, metrics, retries, cancellation, and cleanup jobs. If a download stream disconnects, the API layer needs to find the backend worker and stop computation. If an upload session has not completed after 24 hours, a cleanup job needs to delete temporary chunks.

Without application-layer identity, the system can only clean up connections. With it, the system can clean up business resources.
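The 24-hour cleanup rule can be sketched as a pure function over session records. The record shape (`created_at`, `completed`) and the name `find_expired_uploads` are assumptions for illustration; a real job would read these from a database and then delete the temporary chunks.

```python
UPLOAD_TTL_SECONDS = 24 * 3600  # the 24-hour limit from the example above


def find_expired_uploads(sessions: dict, now: float,
                         ttl_seconds: int = UPLOAD_TTL_SECONDS) -> list:
    """Return upload_ids of sessions that never completed within the TTL."""
    return sorted(
        upload_id
        for upload_id, session in sessions.items()
        if not session["completed"] and now - session["created_at"] > ttl_seconds
    )
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the rule easy to test and to replay against logs.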

The Shared Pattern Behind Upload And Download

Compressed into one model, upload and download streaming share the same state machine:

stateDiagram-v2
    [*] --> Created
    Created --> Streaming: first chunk/event
    Streaming --> Streaming: ack + next unit
    Streaming --> Paused: temporary failure
    Paused --> Streaming: resume
    Streaming --> Completed: done/complete
    Streaming --> Failed: unrecoverable error
    Streaming --> Cancelled: client/server abort
    Paused --> Cancelled: timeout cleanup
    Completed --> [*]
    Failed --> [*]
    Cancelled --> [*]

The difference is what “unit” means:

| Direction | Unit | Progress checkpoint | Resume condition |
| --- | --- | --- | --- |
| Upload | chunk / byte range | server persisted or validated the chunk | server can list received chunks |
| Download | event / token / frame | client stored the last event id, or the protocol has explicit ACKs | server can replay history, or recompute |

This state machine is also a checklist for evaluating a streaming design:

  • can a stream be explicitly created?
  • does every unit have a boundary and identifier?
  • is success or progress acknowledged per unit, or only at the end?
  • after failure, can both sides know the last consistent point?
  • does cancellation release backend resources?
  • is completion separate from connection close?

Choosing SSE, WebSocket, gRPC, Or A Custom Protocol

Application-layer design does not mean inventing a new protocol. Most systems should reuse mature carriers.

| Option | Good fit | Watch out |
| --- | --- | --- |
| HTTP chunked response | simple download streams, continuous server output | carries bytes but does not define event semantics |
| SSE | server-to-browser text events, LLM token streams | one-way; has id and reconnect support, but awkward for binary |
| WebSocket | bidirectional low-latency messages, collaborative editing, realtime control | you define message types, reconnect, heartbeat, and auth refresh |
| gRPC streaming | typed service-to-service streaming with schemas | browser support is less direct; backend-oriented ecosystem |
| resumable upload protocol | large file uploads and weak-network recovery | the core is session, chunk, checksum, and commit |

The real design question is not “which transport API should we use?” It is:

What is the business unit? Where is the acknowledgment point? Where can recovery resume? How are duplicate messages handled? How are completion and cancellation expressed?

The transport choice carries those answers.

A Practical Design Template

When designing a streaming interface, start by filling this table.

| Question | Upload-side example | Download-side example |
| --- | --- | --- |
| stream identity | upload_id | stream_id / request_id |
| unit boundary | chunk_index + byte_range | event_id + event_type |
| ordering | chunk index is monotonic; chunks may upload concurrently and be sorted later | event id is monotonic |
| integrity | per-chunk checksum + final checksum | event schema validation, optional checksum |
| idempotency | upload_id:chunk_index:checksum | dedupe by event_id, or resume cursor |
| ack / progress | chunk persisted | last event id stored, or explicit app-level ACK |
| resume | query received chunks | replay from last event id |
| completion | explicit complete + commit | explicit done event |
| cancellation | abort upload session and delete temporary chunks | abort backend job |
| cleanup | TTL cleanup for unfinished sessions | stop producer after disconnect or keep short replay buffer |

If this table cannot be filled, the system probably does not have a streaming design yet. It is only sending data in pieces.

Summary

The transport layer answers “how do bytes cross the network?” The application layer answers “how do these bytes advance business state?”

The upload side needs sessions, chunks, checksums, idempotency, commit, and cleanup. The download side needs events, done/error/cancel semantics, heartbeats, resume cursors, and producer lifecycle. Both rely on the transport layer for reliable, ordered, controlled byte movement, but neither can ask the transport layer to guess business meaning.

So the core of streaming is not “use a long connection.”

More precisely, streaming is a state-machine design: split a large object or long-running process into named, confirmable, recoverable, cancellable steps, and give every step clear application-layer semantics.
