Commit dd57397

add zfp codec (#8)
* add `zarrs.zfp` codec * fix: make all properties required * fix: datetime64 to int64 * add: lower-precision data type support * fix: add supported chunk shapes section * chore: rename to `zfp` * fix: change language on chunk shape support * fix: fixup rename to zfp * fix: adjust rules for higher-dimensional arrays * fix: reword higher dimensional support * fix: typo * fix: clarify mapping of zarr shapes to zfp shapes and general cleanup * remove direct higher dimensional support, use squeeze instead * reword
1 parent a0b7932 commit dd57397

File tree

7 files changed

+246
-0
lines changed

codecs/zarrs.zfp/README.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
1+
# ZFP codec
2+
3+
Defines an `array -> bytes` codec that compresses chunks using the [zfp](https://github.com/LLNL/zfp) algorithm.
4+
5+
## Codec name
6+
7+
The value of the `name` member in the codec object MUST be `zfp`.
8+
9+
## Configuration parameters
10+
11+
The configuration of this codec matches the compression modes defined at <https://zfp.readthedocs.io/en/latest/modes.html#expert-mode>.
12+
Refer to that page for usage information.
13+
14+
The codec has one parameter which is always required:
15+
- `mode` (string).
16+
17+
The other required parameters are dependent on the `mode`:
18+
- `"mode": "reversible"` (no additional parameters)
19+
- `"mode": "expert"`
20+
    - `minbits` (unsigned integer)
21+
    - `maxbits` (unsigned integer)
22+
    - `maxprec` (unsigned integer)
23+
    - `minexp` (signed integer)
24+
- `"mode": "fixed_accuracy"`
25+
    - `tolerance` (number)
26+
- `"mode": "fixed_rate"`
27+
    - `rate` (number)
28+
- `"mode": "fixed_precision"`
29+
    - `precision` (unsigned integer)
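
The mode-dependent requirements listed above can be checked mechanically. The following is a minimal sketch, not part of the specification: the names `REQUIRED_BY_MODE` and `check_configuration` are hypothetical, and only the presence of keys is checked, not their types or ranges.

```python
# Hypothetical helper: parameters required in addition to "mode", per mode.
REQUIRED_BY_MODE = {
    "reversible": set(),
    "expert": {"minbits", "maxbits", "maxprec", "minexp"},
    "fixed_accuracy": {"tolerance"},
    "fixed_rate": {"rate"},
    "fixed_precision": {"precision"},
}


def check_configuration(configuration: dict) -> None:
    """Raise ValueError unless the keys match the requirements for the mode."""
    mode = configuration.get("mode")
    if mode not in REQUIRED_BY_MODE:
        raise ValueError(f"unknown or missing mode: {mode!r}")
    expected = REQUIRED_BY_MODE[mode] | {"mode"}
    actual = set(configuration)
    if actual != expected:
        missing = sorted(expected - actual)
        extra = sorted(actual - expected)
        raise ValueError(f"mode {mode!r}: missing {missing}, unexpected {extra}")


# Passes silently: every required key is present and nothing else.
check_configuration({"mode": "fixed_accuracy", "tolerance": 0.05})
```

Rejecting unexpected keys mirrors a strict reading of the parameter list; an implementation could reasonably choose to be more lenient.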
30+
31+
## Example
32+
33+
For example, the array metadata below specifies that the array contains `zfp`-compressed chunks using `fixed_accuracy` mode with a tolerance of 0.05:
34+
35+
```json
36+
{
37+
"codecs": [{
38+
"name": "zfp",
39+
"configuration": {
40+
"mode": "fixed_accuracy",
41+
"tolerance": 0.05
42+
}
43+
}]
44+
}
45+
```
46+
47+
More examples can be viewed in the [examples](./examples/) subdirectory.
48+
49+
## Supported Chunk Shapes
50+
51+
`zfp` natively supports only 1-, 2-, 3-, and 4-dimensional arrays.
52+
Chunk shapes are mapped to `zfp` field sizes according to the [`zfp_field_Nd`](https://zfp.readthedocs.io/en/release0.5.5/high-level-api.html#array-metadata) APIs as follows:
53+
- 1D: `[nx]`
54+
- 2D: `[ny, nx]`
55+
- 3D: `[nz, ny, nx]`
56+
- 4D: `[nw, nz, ny, nx]`
57+
58+
The chunk of a zero-dimensional Zarr array is interpreted as a 1D `zfp` field with `nx = 1`.
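
As an illustration of the mapping above, the sketch below converts a Zarr chunk shape into named `zfp` field sizes. The function name `zfp_field_sizes` and the dictionary return type are hypothetical choices made for this example; they are not part of the codec or of the `zfp` API.

```python
def zfp_field_sizes(chunk_shape):
    """Map a Zarr chunk shape to zfp field sizes keyed by nx, ny, nz, nw.

    The last chunk axis maps to nx, the second-to-last to ny, and so on,
    following the 1D-4D list above.
    """
    shape = tuple(chunk_shape)
    if len(shape) == 0:
        # A zero-dimensional chunk is treated as a 1D field with nx = 1.
        shape = (1,)
    if len(shape) > 4:
        raise ValueError("zfp supports at most four dimensions")
    names = ("nx", "ny", "nz", "nw")
    return {names[i]: size for i, size in enumerate(reversed(shape))}


print(zfp_field_sizes((5, 6, 7)))  # {'nx': 7, 'ny': 6, 'nz': 5}
print(zfp_field_sizes(()))         # {'nx': 1}
```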
59+
60+
Chunks with more than four dimensions are not supported directly by this codec.
61+
However, higher-dimensional arrays could be supported by preceding this codec with a [`np.squeeze`](https://numpy.org/doc/stable/reference/generated/numpy.squeeze.html)-inspired array-to-array codec that collapses singleton dimensions (dimensions of size 1), provided the resulting dimensionality is four or fewer.
62+
For example, a chunk with shape `[4, 1, 3, 1, 2, 1]` would be squeezed to a `zfp` field size of `[nz, ny, nx] = [4, 3, 2]`.
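
The shape arithmetic of such a squeeze step can be sketched in a few lines. This is only an illustration of the shape transformation, not an implementation of an array-to-array codec; the name `squeeze_shape` is hypothetical.

```python
def squeeze_shape(chunk_shape):
    """Drop singleton dimensions and check that zfp can handle the result."""
    squeezed = tuple(size for size in chunk_shape if size != 1)
    if len(squeezed) > 4:
        raise ValueError(
            f"shape {tuple(chunk_shape)} still has {len(squeezed)} "
            "non-singleton dimensions; zfp supports at most four"
        )
    return squeezed


print(squeeze_shape([4, 1, 3, 1, 2, 1]))  # (4, 3, 2)
```

A chunk consisting only of singleton dimensions squeezes to an empty shape, which would then fall under the zero-dimensional rule (`nx = 1`).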
63+
64+
These rules apply to the inner chunk shape if this codec is used as the array-to-bytes codec within the `sharding_indexed` codec.
65+
66+
## Supported Data Types
67+
68+
- `int32`, `uint32`, `int64`, `uint64`, `float32`, `float64`
69+
70+
Implementations may support lower-precision data types (e.g. `float16`, `bfloat16`, `int4`, etc.) through promotion / casting to the above data types.
71+
72+
Implementations may support additional data types that could be interpreted or promoted to the above data types (e.g. `datetime64` -> `int64`).
73+
74+
## Format and algorithm
75+
76+
This format is tightly coupled to the [`zfp` C library](https://zfp.readthedocs.io/en/latest/).
77+
78+
### Compression
79+
80+
1. Lower-precision data types must first be promoted to 32-bit:
81+
- Floating point data types (e.g. `float16`, `bfloat16`, etc.) can be supported by casting to `float32` in the normal way.
82+
- Integer data types must be promoted to `int32` in accordance with the [`zfp_promote_*`](https://zfp.readthedocs.io/en/release0.5.5/low-level-api.html#utility-functions) functions:
83+
- `int` with `N` bits: `int32_value = (int32_t)intN_value << (31 - N)`
84+
- `uint` with `N` bits: `int32_value = ((int32_t)uintN_value - (1<<(N-1))) << (31 - N)`
85+
2. The uncompressed data is represented as a contiguous array in a [`zfp_field`](https://zfp.readthedocs.io/en/release0.5.5/high-level-api.html#c.zfp_field) and compressed with `zfp_compress`:
86+
- The field sizes are set in accordance with the rules described in [Supported Chunk Shapes](#supported-chunk-shapes).
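
The integer promotion formulas in step 1 can be modelled with plain Python integers. This is a sketch under stated assumptions: `promote_int` and `promote_uint` are hypothetical names, Python integers stand in for C `int32_t` values, and `bits` is assumed to lie between 1 and 31 so that every result fits in a signed 32-bit integer.

```python
def promote_int(value: int, bits: int) -> int:
    """Promote a signed `bits`-wide integer: value << (31 - bits)."""
    if not 1 <= bits <= 31:
        raise ValueError("bits must be in 1..31")
    if not -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1:
        raise ValueError("value out of range for the given width")
    return value << (31 - bits)


def promote_uint(value: int, bits: int) -> int:
    """Promote an unsigned `bits`-wide integer: re-centre on zero, then shift."""
    if not 1 <= bits <= 31:
        raise ValueError("bits must be in 1..31")
    if not 0 <= value <= (1 << bits) - 1:
        raise ValueError("value out of range for the given width")
    return (value - (1 << (bits - 1))) << (31 - bits)


print(promote_int(-128, 8))  # -1073741824, i.e. -(2**30)
print(promote_uint(128, 8))  # 0: the midpoint of the unsigned range maps to zero
```

Both functions map the full input range into `[-2**30, 2**30)`, leaving one bit of headroom below the `int32` limits.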
87+
88+
### Decompression
89+
90+
1. The data is decompressed into a contiguous array with `zfp_decompress`.
91+
2. Lower-precision data types are restored through demotion:
92+
- Floating point data types (e.g. `float16`, `bfloat16`, etc.) can be supported by casting from `float32` in the normal way.
93+
- Integer data types must be demoted from `int32` in accordance with the appropriate [`zfp_demote_*`](https://zfp.readthedocs.io/en/release0.5.5/low-level-api.html#utility-functions) functions:
94+
- `int` with `N` bits: `intN_value = (intN_t)clamp(int32_value >> (31 - N), -(1<<(N-1)), (1<<(N-1)) - 1)`
95+
- `uint` with `N` bits: `uintN_value = (uintN_t)clamp((int32_value >> (31 - N)) + (1<<(N-1)), 0, (1<<N) - 1)`
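
The integer demotion rules can be modelled the same way. In this sketch, `clamp`, `demote_int`, and `demote_uint` are hypothetical names, and Python's `>>` on negative integers (which rounds toward negative infinity) stands in for an arithmetic right shift on `int32_t`. The lower clamp bound for signed types is taken to be `-(1<<(N-1))`, the minimum of an `N`-bit signed integer.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Restrict value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def demote_int(value: int, bits: int) -> int:
    """Demote an int32 value to a signed `bits`-wide integer."""
    return clamp(value >> (31 - bits), -(1 << (bits - 1)), (1 << (bits - 1)) - 1)


def demote_uint(value: int, bits: int) -> int:
    """Demote an int32 value to an unsigned `bits`-wide integer."""
    shifted = (value >> (31 - bits)) + (1 << (bits - 1))
    return clamp(shifted, 0, (1 << bits) - 1)


print(demote_int(-(1 << 30), 8))     # -128
print(demote_uint(0, 8))             # 128
print(demote_int((1 << 31) - 1, 8))  # 127: out-of-range values are clamped
```

The clamping matters because lossy modes can reconstruct `int32` values slightly outside the range produced by promotion.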
96+
97+
## Differences from `numcodecs.zfpy`
98+
99+
- `mode` is a string rather than an integer.
100+
- `mode` supports the `reversible` and `expert` modes.
101+
- Lower-precision integer and floating point data types are supported.
102+
- A header is not written with [`zfp_write_header`](https://zfp.readthedocs.io/en/release0.5.5/high-level-api.html#c.zfp_write_header).
103+
- This header is redundant given the information in the codec configuration.
104+
105+
> [!NOTE]
106+
> An earlier version of the `zfp` codec in the [`zarrs`](https://github.com/LDeakin/zarrs) Rust crate included an optional [`write_header` parameter](https://docs.rs/zarrs_metadata/0.3.7/zarrs_metadata/v3/array/codec/zfp/struct.ZfpCodecConfigurationV1.html) in the codec configuration.
107+
> This has since been removed in favor of better separating `zfp` and `zfpy`.
108+
109+
## Change log
110+
111+
No changes yet.
112+
113+
## Current maintainers
114+
115+
* [Lachlan Deakin](https://github.com/LDeakin)
Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
{
2+
"name": "zfp",
3+
"configuration": {
4+
"mode": "expert",
5+
"minbits": 1,
6+
"maxbits": 13,
7+
"maxprec": 19,
8+
"minexp": -2
9+
}
10+
}
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
{
2+
"name": "zfp",
3+
"configuration": {
4+
"mode": "fixed_accuracy",
5+
"tolerance": 0.05
6+
}
7+
}
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
{
2+
"name": "zfp",
3+
"configuration": {
4+
"mode": "fixed_precision",
5+
"precision": 19
6+
}
7+
}
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
{
2+
"name": "zfp",
3+
"configuration": {
4+
"mode": "fixed_rate",
5+
"rate": 10.5
6+
}
7+
}
Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
{
2+
"name": "zfp",
3+
"configuration": {
4+
"mode": "reversible"
5+
}
6+
}

codecs/zarrs.zfp/schema.json

Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
1+
{
2+
"$schema": "https://json-schema.org/draft/2020-12/schema",
3+
"oneOf": [
4+
{
5+
"type": "object",
6+
"properties": {
7+
"name": {
8+
"const": "zfp"
9+
},
10+
"configuration": {
11+
"oneOf": [
12+
{
13+
"type": "object",
14+
"properties": {
15+
"mode": {
16+
"const": "reversible"
17+
}
18+
},
19+
"required": ["mode"],
20+
"additionalProperties": false
21+
},
22+
{
23+
"type": "object",
24+
"properties": {
25+
"mode": {
26+
"const": "expert"
27+
},
28+
"minbits": {
29+
"type": "integer",
30+
"minimum": 0
31+
},
32+
"maxbits": {
33+
"type": "integer",
34+
"minimum": 0
35+
},
36+
"maxprec": {
37+
"type": "integer",
38+
"minimum": 0
39+
},
40+
"minexp": {
41+
"type": "integer"
42+
}
43+
},
44+
"required": ["mode", "minbits", "maxbits", "maxprec", "minexp"],
45+
"additionalProperties": false
46+
},
47+
{
48+
"type": "object",
49+
"properties": {
50+
"mode": {
51+
"const": "fixed_accuracy"
52+
},
53+
"tolerance": {
54+
"type": "number"
55+
}
56+
},
57+
"required": ["mode", "tolerance"],
58+
"additionalProperties": false
59+
},
60+
{
61+
"type": "object",
62+
"properties": {
63+
"mode": {
64+
"const": "fixed_rate"
65+
},
66+
"rate": {
67+
"type": "number"
68+
}
69+
},
70+
"required": ["mode", "rate"],
71+
"additionalProperties": false
72+
},
73+
{
74+
"type": "object",
75+
"properties": {
76+
"mode": {
77+
"const": "fixed_precision"
78+
},
79+
"precision": {
80+
"type": "integer",
81+
"minimum": 0
82+
}
83+
},
84+
"required": ["mode", "precision"],
85+
"additionalProperties": false
86+
}
87+
]
88+
}
89+
},
90+
"required": ["name", "configuration"],
91+
"additionalProperties": false
92+
}
93+
]
94+
}
