With the Compression Streams API, browsers provide a built-in way to compress and decompress data without having to rely on third-party libraries or native implementations compiled to JavaScript or WASM. Currently, only DEFLATE-based formats (deflate and deflate-raw) and GZIP are supported. The API operates exclusively on readable and writable streams from the Streams API.

As part of implementing reading of ZIP files in the browser, I needed a way to process compressed data present in an ArrayBuffer.

To decompress it with a DecompressionStream, the data has to be provided via a ReadableStream. There are several ways to obtain one, for example:

  1. You can manually implement a source for the stream.
  2. You can obtain a stream from a Response instance provided by the Fetch API via the body read-only property.
  3. In Firefox (and Node.js), you can use ReadableStream.from to generate such a stream from several sources such as arrays, arrays of promises, etc. Unfortunately, this is currently experimental and not available in any other browser. (A brief sketch of the second and third approaches follows this list.)
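
For completeness, here is a rough sketch of the second and third approaches (not the one used in this article). The variable compressedData is a hypothetical stand-in for the buffer holding the compressed bytes, and the cast for ReadableStream.from works around the experimental API being absent from TypeScript's DOM typings:

// A hypothetical ArrayBuffer holding the compressed bytes.
declare const compressedData: ArrayBuffer;

// Approach 2: wrap the buffer in a Response and use its body stream.
// `body` is only null for bodyless responses, so the assertion is safe here.
const streamFromResponse: ReadableStream<Uint8Array> =
  new Response(compressedData).body!;

// Approach 3: ReadableStream.from (experimental; the cast works around
// missing TypeScript DOM typings).
const streamFromIterable = (ReadableStream as any)
  .from([new Uint8Array(compressedData)]) as ReadableStream<Uint8Array>;

A sketch of obtaining a ReadableStream from a Response instance or via the experimental ReadableStream.from.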

In my implementation, only the first approach is applicable. The second approach could be useful if you are implementing fully streaming decompression from a remote source. Please note that the ZIP format has some gotchas around that: for example, the size of the uncompressed data may be unknown if the archive was created in a streaming fashion. This is the case if a command of the form cat to_compress.txt | zip > test.zip was used.

All code listings below are in TypeScript.

Manually Implementing a Source

The underlying source of a ReadableStream consists of the following:

  1. start: This method is called by the stream upon construction.
  2. pull: This method is called to request new data when needed. It can be called repeatedly as long as the stream's internal queue still has space left.
  3. cancel: This method is called if the stream was cancelled, for example if a reader of the stream cannot continue processing anymore.

There are further attributes to customise the behaviour of the stream, and all methods mentioned above are optional. Since the underlying data here is a simple ArrayBuffer, neither extra setup during stream construction nor cleanup during cancellation is needed. The type attribute will be addressed later.

The stream passes a stream controller to the start and pull methods which allows providing data to the stream (via enqueue) and managing the stream itself (via close and error).
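
To make the shape of an underlying source more concrete, here is a minimal skeleton with all three methods. The name exampleSource and the method bodies are illustrative only:

// A skeleton of an underlying source; the bodies are illustrative only.
const exampleSource: UnderlyingSource<ArrayBuffer> = {
  start(controller): void {
    // One-time setup when the stream is constructed, e.g. opening a resource.
  },
  pull(controller): void {
    // Provide data via controller.enqueue(...), finish via controller.close()
    // or report a failure via controller.error(...).
  },
  cancel(reason): void {
    // Release resources if the consumer stops reading early.
  },
};

A skeleton of an underlying source with the optional start, pull and cancel methods.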

A simple approach is a pull implementation that provides the full ArrayBuffer at once and then closes the stream (i.e., indicates that no more data will be provided):

// The ArrayBuffer containing the compressed data.
const data: ArrayBuffer = /* ... */;

const readableStream = new ReadableStream<ArrayBuffer>({
  pull(controller): void {
    controller.enqueue(data);
    controller.close();
  },
});

The implementation of a ReadableStream providing data from an ArrayBuffer.

According to the specification of DecompressionStream, the chunks of data provided to it (in our case by calling controller.enqueue once for a single chunk containing all the data) must be a BufferSource, i.e., an ArrayBuffer or an ArrayBufferView such as a Uint8Array [1][2][3].
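
For example, a variant of the source above that enqueues a Uint8Array view over the same buffer instead of the ArrayBuffer itself would satisfy this requirement equally well (a sketch, reusing the data buffer from the previous listing):

const readableStreamOfViews = new ReadableStream<Uint8Array>({
  pull(controller): void {
    // A typed-array view over the buffer is also a valid BufferSource chunk.
    controller.enqueue(new Uint8Array(data));
    controller.close();
  },
});

A variant of the source that enqueues a Uint8Array view instead of the ArrayBuffer.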

Decompressing the Data

With the data source ready, we can construct a DecompressionStream. This is a TransformStream: it exposes a WritableStream to which the compressed data is written and a ReadableStream from which the decompressed data can be read.

In my case, I only have raw DEFLATE-compressed data without the Zlib header and checksum. Thus, I need to select the deflate-raw format instead of the deflate format:

const decompressionStream = 
  new DecompressionStream('deflate-raw');

The construction of a DecompressionStream. Other possible parameters are deflate and gzip.

Each ReadableStream provides a utility method pipeThrough to apply such a TransformStream and obtain another ReadableStream which provides the processed data. It performs the necessary plumbing: the original stream is piped into the TransformStream's WritableStream, and the TransformStream's ReadableStream, from which the transformed data can be read, is returned:

const uncompressedStream = readableStream
  .pipeThrough<Uint8Array>(decompressionStream);

Using pipeThrough on a stream containing compressed data to obtain a transformed stream of uncompressed data.

Unfortunately, DecompressionStream is typed as a GenericTransformStream in TypeScript, which uses any as the underlying type for its ReadableStream<T> and WritableStream<T>.

According to the specification of DecompressionStream, it produces chunks of type Uint8Array [1]. Thus, it is safe to provide Uint8Array as the generic argument to pipeThrough to ensure that the resulting stream is typed correctly.
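
Alternatively, the same typing could be achieved with a type assertion on the result. This is only an illustration of an equivalent alternative to the listing above, not something to run in addition to it, since a stream can only be piped once:

const uncompressedStreamAsserted = readableStream
  .pipeThrough(decompressionStream) as ReadableStream<Uint8Array>;

Asserting the chunk type of the resulting stream instead of passing a generic argument.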

Working with the Uncompressed Stream of Data

In most browsers, one can use the recently added async iteration on ReadableStream with for await. Unfortunately, Safari has yet to support this, and as that is the main browser where the decompression will be used, I needed to implement reading the data myself.
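
Where async iteration is available, the reading logic shown below could be replaced by a loop along these lines. This is only a sketch: expectedSize stands for the known size of the decompressed data (as in the listing further down), and the types require TypeScript's DOM.AsyncIterable lib:

// Sketch: async iteration over the uncompressed stream (not supported in
// Safari at the time of writing).
declare const expectedSize: number; // Known from the ZIP metadata.
const collected = new Uint8Array(expectedSize);
let offset = 0;
for await (const chunk of uncompressedStream) {
  collected.set(chunk, offset);
  offset += chunk.length;
}

A sketch of reading the decompressed data with for await where async iteration is available.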

An exclusive stream reader can be constructed using the getReader method on a ReadableStream. Please note that this reader locks the stream and prevents creating other readers for the same stream until its lock is released.

On this reader, the read method can be used to obtain data from the stream until the stream is done providing data. As I know the size of the decompressed data from metadata, I can pre-allocate a buffer which can hold all of the resulting data.

const uncompressedReader = uncompressedStream.getReader();

const expectedSize = /* ... */;
const uncompressedData = new Uint8Array(expectedSize);
let uncompressedOffset = 0;

let decompressionDone = false;
while (!decompressionDone) {
  const readResult = await uncompressedReader.read();
  if (readResult.done) {
    decompressionDone = true;
  }

  // The chunk will be undefined if `readResult.done` is true.
  const chunk = readResult.value;
  if (chunk) {
    uncompressedData.set(chunk, uncompressedOffset);
    uncompressedOffset += chunk.length;
  }
}

// This should be in a `finally` clause, but is omitted here
// to make the core part of this code listing more readable.
uncompressedReader.releaseLock();

const resultArrayBuffer = uncompressedData.buffer;

Using a reader to obtain the uncompressed data and storing it in a buffer.

Each read call returns a chunk of the data, which is written into the result buffer using set. This method stores the elements of an array-like object (for example, another Uint8Array) in a TypedArray such as our Uint8Array result buffer. An optional offset can be provided to copy the elements to a different position in the target array.

To ensure that all decompressed data is received, the loop continues until the stream completes. The reader signals this completion by setting done to true in the return value of the read call and returning undefined as the value.

Alternative Approaches and Possible Improvements

There are several alternative approaches and possible improvements for the solution above:

  1. Using the type: 'bytes' attribute on the source when creating the ReadableStream could allow for less copying of data, combined with other features of byte streams. As my performance requirements are low (occasionally a 10-20 MB ZIP file needs to be processed), I have skipped this in my implementation.
  2. Instead of providing a pull implementation on the source, a start implementation could be provided which just enqueues the full buffer and closes the stream (see the sketch after this list). I noticed this improvement only while writing this post and will probably adjust my implementation accordingly.
  3. If a consumer or transformer down the pipeline cannot hold the resulting data in memory, it could be useful to provide only parts of the source and use the built-in back-pressure mechanisms to regulate the flow of data.
  4. An alternative approach is to create a WritableStream with a custom sink (which works similarly to a custom source for a ReadableStream) and pass it to the pipeTo method on the ReadableStream containing the decompressed data. This would also allow further control of the data flow, if needed.
  5. Use autoAllocateChunkSize on the stream source to customise the buffer size for type: 'bytes' streams. This would allow the stream to pre-allocate its internal buffers to the size of the source buffer and reduce allocations.
  6. Use getReader({ mode: 'byob' }) to get a stream reader which accepts a target buffer/view for its read method, which could allow the initial source (or some intermediate TransformStream) to write directly into that buffer. Together with type: 'bytes' sources and autoAllocateChunkSize, this would reduce the number of copy operations needed.
  7. The provided metadata may be incorrect, so the code using the reader will need to account for less or more data being provided by the resulting stream.
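
To illustrate the second point, a start-based source could look like the following sketch, which reuses the data buffer from the listing above; the variable name readableStreamViaStart is only illustrative:

// Sketch: enqueue the whole buffer once during start instead of waiting for
// the first pull.
const readableStreamViaStart = new ReadableStream<ArrayBuffer>({
  start(controller): void {
    controller.enqueue(data);
    controller.close();
  },
});

A variant of the source that provides the full buffer in start instead of pull.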

As my implementation runs on the client side only and is more or less a proof of concept, I skipped most of these improvements but wrote them down so I can reference them in the future for other use cases. Further investigating the built-in features could be really interesting, though.

See Also

In this section you can find the sources used for this article and other interesting links.

Footnotes

[1] https://compression.spec.whatwg.org/#decompression-stream

[2] https://webidl.spec.whatwg.org/#BufferSource

[3] https://webidl.spec.whatwg.org/#ArrayBufferView
