| 1 | +# ZFP codec |
| 2 | + |
| 3 | +Defines an `array -> bytes` codec that compresses chunks using the [zfp](https://github.com/LLNL/zfp) algorithm. |
| 4 | + |
| 5 | +## Codec name |
| 6 | + |
| 7 | +The value of the `name` member in the codec object MUST be `zfp`. |
| 8 | + |
| 9 | +## Configuration parameters |
| 10 | + |
| 11 | +The configuration of this codec matches the compression modes defined at <https://zfp.readthedocs.io/en/latest/modes.html#expert-mode>. |
| 12 | +Refer to that page for usage information. |
| 13 | + |
| 14 | +The codec has one parameter which is always required: |
| 15 | +- `mode` (string). |
| 16 | + |
| 17 | +The other required parameters are dependent on the `mode`: |
| 18 | +- `"mode": "reversible"` |
| 19 | +- `"mode": "expert"` |
| 20 | + - `minbits` (unsigned integer) |
| 21 | + - `maxbits` (unsigned integer) |
| 22 | + - `maxprec` (unsigned integer) |
| 23 | + - `minexp` (signed integer) |
| 24 | +- `"mode": "fixed_accuracy"` |
| 25 | + - `tolerance` (number) |
| 26 | +- `"mode": "fixed_rate"` |
| 27 | + - `rate` (number) |
| 28 | +- `"mode": "fixed_precision"` |
| 29 | + - `precision` (unsigned integer) |
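The mode-dependent requirements above can be checked mechanically. The following is a minimal sketch, not part of the specification: the names `REQUIRED_BY_MODE` and `missing_parameters` are hypothetical, and the check covers only the presence of parameters, not their types or ranges.

```python
# Required parameters per mode, transcribed from the list above.
REQUIRED_BY_MODE = {
    "reversible": (),
    "expert": ("minbits", "maxbits", "maxprec", "minexp"),
    "fixed_accuracy": ("tolerance",),
    "fixed_rate": ("rate",),
    "fixed_precision": ("precision",),
}


def missing_parameters(configuration):
    """Return the required parameter names absent from `configuration`.

    Raises ValueError if `mode` is missing or is not one of the listed modes.
    """
    mode = configuration.get("mode")
    if mode not in REQUIRED_BY_MODE:
        raise ValueError(f"unknown or missing mode: {mode!r}")
    return [name for name in REQUIRED_BY_MODE[mode] if name not in configuration]
```

An empty result means every parameter required by the chosen mode is present.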
| 30 | + |
| 31 | +## Example |
| 32 | + |
| 33 | +For example, the array metadata below specifies that the array contains `zfp` compressed chunks using `fixed_accuracy` mode with a tolerance of 0.05: |
| 34 | + |
| 35 | +```json |
| 36 | +{ |
| 37 | + "codecs": [{ |
| 38 | + "name": "zfp", |
| 39 | + "configuration": { |
| 40 | + "mode": "fixed_accuracy", |
| 41 | + "tolerance": 0.05 |
| 42 | + } |
| 43 | + }] |
| 44 | +} |
| 45 | +``` |
| 46 | + |
| 47 | +More examples can be viewed in the [examples](./examples/) subdirectory. |
| 48 | + |
| 49 | +## Supported Chunk Shapes |
| 50 | + |
| 51 | +`zfp` natively supports only 1-, 2-, 3-, and 4-dimensional arrays. |
| 52 | +Chunk shapes are mapped to `zfp` field sizes according to the [`zfp_field_Nd`](https://zfp.readthedocs.io/en/release0.5.5/high-level-api.html#array-metadata) APIs as follows: |
| 53 | + - 1D: `[nx]` |
| 54 | + - 2D: `[ny, nx]` |
| 55 | + - 3D: `[nz, ny, nx]` |
| 56 | + - 4D: `[nw, nz, ny, nx]` |
| 57 | + |
| 58 | +The chunk of a zero-dimensional Zarr array is interpreted as a 1D `zfp` field with `nx = 1`. |
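The mapping from chunk shape to `zfp` field sizes can be sketched as follows. This is an illustrative helper only: the name `zfp_field_sizes` is hypothetical, and it returns a plain dictionary rather than calling the `zfp` library.

```python
def zfp_field_sizes(chunk_shape):
    """Map a Zarr chunk shape to named zfp field sizes.

    The last chunk dimension maps to nx, the one before it to ny, and so on,
    following the 1D to 4D list above. A zero-dimensional chunk becomes a
    1D field with nx = 1.
    """
    shape = list(chunk_shape)
    if len(shape) == 0:
        shape = [1]
    if len(shape) > 4:
        raise ValueError("zfp supports at most four dimensions")
    names = ("nx", "ny", "nz", "nw")
    # Pair names with sizes starting from the last (innermost) dimension.
    return {name: size for name, size in zip(names, reversed(shape))}
```

For instance, a chunk of shape `[5, 7, 9]` yields `nz = 5`, `ny = 7`, and `nx = 9`.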
| 59 | + |
| 60 | +Chunks with more than four dimensions are not supported directly by this codec. |
| 61 | +However, higher-dimensional arrays could be supported by preceding this codec with a [`np.squeeze`](https://numpy.org/doc/stable/reference/generated/numpy.squeeze.html) inspired array-to-array codec that collapses singleton dimensions (dimensions of size 1), provided the resulting dimensionality is four or fewer. |
| 62 | +For example, a chunk with shape `[4, 1, 3, 1, 2, 1]` would be squeezed to a `zfp` field size of `[nz, ny, nx] = [4, 3, 2]`. |
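The shape transformation performed by such a squeeze step can be sketched in plain Python. The function name `squeeze_shape` is hypothetical and this is not a definition of any existing codec; it only computes the resulting shape.

```python
def squeeze_shape(chunk_shape):
    """Drop singleton dimensions, as np.squeeze does for array shapes.

    Raises ValueError if more than four dimensions remain, since the result
    could not be passed to zfp. A chunk made only of singleton dimensions
    squeezes to an empty (zero-dimensional) shape.
    """
    squeezed = [size for size in chunk_shape if size != 1]
    if len(squeezed) > 4:
        raise ValueError("more than four non-singleton dimensions remain")
    return squeezed
```

With the shape from the example above, `squeeze_shape([4, 1, 3, 1, 2, 1])` returns `[4, 3, 2]`.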
| 63 | + |
| 64 | +These rules apply to the inner chunk shape if this codec is used as the array-to-bytes codec within the `sharding_indexed` codec. |
| 65 | + |
| 66 | +## Supported Data Types |
| 67 | + |
| 68 | +- `int32`, `uint32`, `int64`, `uint64`, `float32`, `float64` |
| 69 | + |
| 70 | +Implementations may support lower-precision data types (e.g. `float16`, `bfloat16`, `int4`, etc.) through promotion / casting to the above data types. |
| 71 | + |
| 72 | +Implementations may support additional data types that could be interpreted or promoted to the above data types (e.g. `datetime64` -> `int64`). |
| 73 | + |
| 74 | +## Format and algorithm |
| 75 | + |
| 76 | +This format is tightly coupled to the [`zfp` C library](https://zfp.readthedocs.io/en/latest/). |
| 77 | + |
| 78 | +### Compression |
| 79 | + |
| 80 | +1. Lower-precision data types must first be promoted to a 32-bit data type: |
| 81 | + - Floating point data types (e.g. `float16`, `bfloat16`, etc.) can be supported by casting to `float32` in the normal way. |
| 82 | + - Integer data types must be promoted to `int32` in accordance with the [`zfp_promote_*`](https://zfp.readthedocs.io/en/release0.5.5/low-level-api.html#utility-functions) functions: |
| 83 | + - `int` with `N` bits: `int32_value = (int32_t)intN_value << (31 - N)` |
| 84 | + - `uint` with `N` bits: `int32_value = ((int32_t)uintN_value - (1<<(N-1))) << (31 - N)` |
| 85 | +2. The uncompressed data is represented as a contiguous array in a [`zfp_field`](https://zfp.readthedocs.io/en/release0.5.5/high-level-api.html#c.zfp_field) and compressed with `zfp_compress`: |
| 86 | + - The field sizes are set in accordance with the rules described in [Supported Chunk Shapes](#supported-chunk-shapes). |
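The integer promotion formulas in step 1 can be written out as a small sketch. The function names are hypothetical. Python integers are unbounded, so the code assumes the input already lies within the range of an `N`-bit integer, in which case the result fits in `int32`.

```python
def promote_int_to_int32(value, bits):
    """Promote a signed `bits`-wide integer: value << (31 - bits)."""
    return value << (31 - bits)


def promote_uint_to_int32(value, bits):
    """Promote an unsigned `bits`-wide integer.

    The value is first re-centred around zero by subtracting 2**(bits - 1),
    then shifted left by (31 - bits).
    """
    return (value - (1 << (bits - 1))) << (31 - bits)
```

For 8-bit inputs the shift is 23 bits, so both the `int8` value `-128` and the `uint8` value `0` map to `-1073741824`.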
| 87 | + |
| 88 | +### Decompression |
| 89 | + |
| 90 | +1. The data is decompressed into a contiguous array with `zfp_decompress`. |
| 91 | +2. Lower-precision data types are restored through demotion: |
| 92 | + - Floating point data types (e.g. `float16`, `bfloat16`, etc.) can be supported by casting from `float32` in the normal way. |
| 93 | + - Integer data types must be demoted from `int32` in accordance with the appropriate [`zfp_demote_*`](https://zfp.readthedocs.io/en/release0.5.5/low-level-api.html#utility-functions) functions: |
| 94 | + - `int` with `N` bits: `intN_value = (intN_t)clamp(int32_value >> (31 - N), -(1<<(N-1)), (1<<(N-1)) - 1)` |
| 95 | + - `uint` with `N` bits: `uintN_value = (uintN_t)clamp((int32_value >> (31 - N)) + (1<<(N-1)), 0, (1<<N) - 1)` |
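The integer demotion step can be sketched in the same way. The function names are hypothetical. Signed results are clamped to the representable range of an `N`-bit signed integer, `[-2^(N-1), 2^(N-1) - 1]`, and unsigned results to `[0, 2^N - 1]`. Python's `>>` is an arithmetic shift for negative integers, which is the behaviour this sketch relies on.

```python
def clamp(value, low, high):
    """Restrict `value` to the closed interval [low, high]."""
    return max(low, min(value, high))


def demote_int32_to_int(value, bits):
    """Demote an int32 value to a signed `bits`-wide integer."""
    half = 1 << (bits - 1)
    return clamp(value >> (31 - bits), -half, half - 1)


def demote_int32_to_uint(value, bits):
    """Demote an int32 value to an unsigned `bits`-wide integer."""
    half = 1 << (bits - 1)
    return clamp((value >> (31 - bits)) + half, 0, (1 << bits) - 1)
```

Values that lossy decompression has pushed outside the original range are saturated rather than wrapped; for example, the largest `int32` value demotes to `127` for `int8`.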
| 96 | + |
| 97 | +## Differences from `numcodecs.zfpy` |
| 98 | + |
| 99 | +- `mode` is a string rather than an integer. |
| 100 | +- `mode` supports the `reversible` and `expert` modes. |
| 101 | +- Lower-precision integer and floating point data types are supported. |
| 102 | +- A header is not written with [`zfp_write_header`](https://zfp.readthedocs.io/en/release0.5.5/high-level-api.html#c.zfp_write_header). |
| 103 | + - This header is redundant given the information in the codec configuration. |
| 104 | + |
| 105 | +> [!NOTE] |
| 106 | +> An earlier version of the `zfp` codec in the [`zarrs`](https://github.com/LDeakin/zarrs) Rust crate included an optional [`write_header` parameter](https://docs.rs/zarrs_metadata/0.3.7/zarrs_metadata/v3/array/codec/zfp/struct.ZfpCodecConfigurationV1.html) in the codec configuration. |
| 107 | +> This has since been removed in favor of better separating `zfp` and `zfpy`. |
| 108 | + |
| 109 | +## Change log |
| 110 | + |
| 111 | +No changes yet. |
| 112 | + |
| 113 | +## Current maintainers |
| 114 | + |
| 115 | +* [Lachlan Deakin](https://github.com/LDeakin) |