Using DecompressionStream with an ArrayBuffer

Even though the Compression Streams API provides a built-in way to decompress data, it is not straightforward to use with non-streamed data. In this article, I describe a possible approach to decompress data directly from an ArrayBuffer and further ideas on how to use it more effectively.

With the Compression Streams API, browsers provide ways to compress and decompress data without having to rely on third-party implementations or native implementations compiled to JavaScript or WASM. Currently, only the DEFLATE and GZIP formats are supported. This API only operates on readable and writable streams from the Streams API.
As part of implementing reading of ZIP files in the browser, I needed a way to process compressed data present in an ArrayBuffer.
To be able to decompress it with the DecompressionStream, the data has to be provided by a ReadableStream. There are several ways to do that, for example:
- You can manually implement a source for the stream.
- You can obtain a stream from a Response instance provided by the Fetch API via the body read-only property.
- In Firefox (and Node.js), you can use ReadableStream.from to generate such a stream from several sources like arrays, arrays of promises, etc. Unfortunately, this is currently experimental and not available in any other browser.
In my implementation, only the first approach is applicable. The second approach could be useful if you are implementing a fully streaming decompression from a remote source. Please note that the ZIP format has some gotchas around that; for example, the size of the uncompressed data may be unknown if the archive was created in a streaming fashion. This is the case if a command in the form of cat to_compress.txt | zip > test.zip was used.
All code listings below are in TypeScript.
Manually Implementing a Source
The underlying source of a ReadableStream consists of the following:
- start: This method is called by the stream upon construction.
- pull: This method is called to request new data if needed. This can be called if the stream's internal queues still have space left.
- cancel: This method is called if the stream was cancelled, for example if a reader of the stream cannot continue processing anymore.
There are other attributes to further customise the behaviour of the stream. All methods mentioned above are optional. The source is a simple ArrayBuffer, thus neither extra setup during stream construction nor during cancellation is needed. The type attribute will be addressed later.
The stream passes a stream controller to the start and pull methods, which allows providing data to the stream (via enqueue) and managing the stream itself (via close and error).
A simple approach is to use a pull implementation which provides the full ArrayBuffer at once and then closes the stream (i.e., indicates that no more data will be provided by the stream):
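A minimal sketch of such a source could look like the following, assuming the compressed bytes are available in a variable named compressedData (the helper name createSourceStream is made up for this example):

```typescript
// Wrap an ArrayBuffer in a ReadableStream that emits it as a single chunk.
function createSourceStream(compressedData: ArrayBuffer): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    pull(controller) {
      // Provide the full buffer at once and close the stream,
      // signalling that no further data will follow.
      controller.enqueue(new Uint8Array(compressedData));
      controller.close();
    },
  });
}
```

Enqueueing a Uint8Array view keeps the stream typed as ReadableStream<Uint8Array>; enqueueing the ArrayBuffer itself would also satisfy the BufferSource requirement described below.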
According to the specification of DecompressionStream, the chunks of data provided to it (in our case by calling controller.enqueue once for a single chunk containing all the data) must be a BufferSource, i.e., an ArrayBuffer or one of the concrete TypedArray instances like Uint8Array [1][2][3].
Decompressing the Data
With the data source ready, we can construct a DecompressionStream. This is a TransformStream which accepts the compressed data via its WritableStream side and provides the processed data via its ReadableStream side.
In my case, I only have raw DEFLATE data without the headers or checksums of the Zlib format. Thus, I need to select the deflate-raw format instead of the deflate format:
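Constructing the stream for raw DEFLATE data is a single call:

```typescript
// 'deflate-raw' expects bare DEFLATE data without the Zlib header and checksum;
// 'deflate' (Zlib) and 'gzip' are the other supported formats.
const decompressionStream = new DecompressionStream('deflate-raw');
```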
Each ReadableStream provides a utility method pipeThrough to apply such a TransformStream and return another ReadableStream which provides the processed data. This method performs the necessary plumbing to make the data written to the TransformStream's WritableStream available to read via the returned ReadableStream:
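Using the source sketched above, the piping could look roughly like this:

```typescript
// Pipe the compressed chunks through the DecompressionStream;
// the returned stream yields the decompressed data.
const decompressedStream = createSourceStream(compressedData)
  .pipeThrough(new DecompressionStream('deflate-raw'));
```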
Unfortunately, the DecompressionStream is typed as a GenericTransformStream in TypeScript, which uses any as the underlying type for the ReadableStream<T> and WritableStream<T>.
According to the specification of DecompressionStream, it produces chunks of Uint8Array type [1]. Thus, it is safe to provide a generic argument of type Uint8Array to the pipeThrough method to ensure that the resulting stream is typed correctly.
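With the explicit generic argument, the piping from the previous sketch becomes:

```typescript
// The specification guarantees Uint8Array chunks, so the explicit generic
// argument turns the result into a ReadableStream<Uint8Array>.
const decompressedStream = createSourceStream(compressedData)
  .pipeThrough<Uint8Array>(new DecompressionStream('deflate-raw'));
```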
Working with the Uncompressed Stream of Data
For most browsers, one can use the recently released async iteration on ReadableStream with for await. Unfortunately, Safari has yet to support this. As this is the main browser where the decompression will be used, I needed to implement reading the data myself.
There is a way to construct an exclusive stream reader using the getReader method on a ReadableStream. Please note that this reader locks the stream and prevents creating other readers for the same stream until the reader is closed.
On this reader, the read method can be used to obtain data from the stream until the stream is done providing data. As I know the size of the decompressed data from metadata, I can pre-allocate a buffer which can hold all of the resulting data.
Each read call will return a chunk of the data, which is written into the result buffer by using set. This method supports storing elements of an array-like object (for example Uint8Array instances) in a TypedArray (which Uint8Array is a subclass of). An optional offset can be provided to copy to different positions in the array.
To ensure that all decompressed data is received, the code loops until the stream completes. The reader signals this completion by setting done to true in the return value of the read call and returning a value of undefined.
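Putting these pieces together, a reading loop along these lines should work; the uncompressedSize parameter stands in for the size taken from the ZIP metadata:

```typescript
// Drain a decompressed stream into a pre-allocated buffer of known size.
async function readAll(
  stream: ReadableStream<Uint8Array>,
  uncompressedSize: number,
): Promise<Uint8Array> {
  const result = new Uint8Array(uncompressedSize);
  const reader = stream.getReader();
  let offset = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) {
      // The stream is exhausted; value is undefined at this point.
      break;
    }
    // Copy the chunk into the result buffer at the current position.
    result.set(value, offset);
    offset += value.length;
  }
  return result;
}
```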
Alternative Approaches and Possible Improvements
There are several alternative approaches and possible improvements for the solution above:
- Using the type: 'bytes' attribute as part of the source when creating the ReadableStream could allow for less copying of data combined with other features of that class. As my performance requirements are low (occasionally a 10-20 MB ZIP file needs to be processed), I have skipped this in my implementation.
- Instead of providing a pull implementation on the source, a start implementation could be provided which just queues the full buffer instead (see the sketch after this list). I noticed this improvement only while writing this post and I will probably adjust my implementation with it.
- If a consumer or transformer down the pipeline cannot hold the resulting data in memory, it could be useful to provide only parts of the source and use the built-in back-pressure mechanisms to regulate the flow of data.
- An alternative approach is to create a WritableStream with a custom sink (which works similarly to a custom source for a ReadableStream) and pass it to the pipeTo method on the ReadableStream containing the decompressed data. This would also allow for further control of the data flow, if needed.
- Use autoAllocateChunkSize on the stream source to customise the buffer size for type: 'bytes' streams. This would allow the streams to pre-allocate the internal buffers to the size of the source buffer and reduce allocations.
- Use getReader({ mode: 'byob' }) to get a stream reader which accepts a target buffer / view for its read method, which could allow the initial source (or some intermediate TransformStream) to directly write to that buffer. Together with using type: 'bytes' sources and autoAllocateChunkSize, this would reduce the amount of copy operations needed.
- The provided metadata may be incorrect, so the code using the reader will need to account for less or more data being provided by the resulting stream.
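As an illustration of the start-based variant mentioned in the second item above, a sketch could look like this (the helper name createSourceStreamEager is made up for this example):

```typescript
// Enqueue the whole buffer during construction instead of waiting for a pull.
function createSourceStreamEager(compressedData: ArrayBuffer): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    start(controller) {
      controller.enqueue(new Uint8Array(compressedData));
      controller.close();
    },
  });
}
```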
As my implementation is running on the client-side only and it's more-or-less a proof of concept, I skipped most of the improvements but wrote them down so I can reference them in the future for other use-cases. Further investigating the built-in features could be really interesting, though.
See Also
In this section you can find the sources used for this article and other interesting links.
- https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream
  - Firefox-only and Node-only conversion to ReadableStream: https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#convert_an_iterator_or_async_iterator_to_a_stream and https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/from_static
  - Obtaining a Reader from a ReadableStream: https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/getReader
  - Async iteration over stream chunks, which is currently not supported in Safari: https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#async_iteration_of_a_stream_using_for_await...of
- https://web.dev/articles/streams
- https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultReader
  - Reading chunks from a ReadableStream via a ReadableStreamDefaultReader: https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultReader/read
- https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray
  - Setting multiple elements of a TypedArray from another array-like object: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/set
- https://developer.mozilla.org/en-US/docs/Web/API/DecompressionStream
- https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array
- https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer
Footnotes
[1] https://compression.spec.whatwg.org/#decompression-stream
[2] https://webidl.spec.whatwg.org/#BufferSource
[3] https://webidl.spec.whatwg.org/#ArrayBufferView