Commit 052377b

feat: add bitround codec (#26)
* feat: add `bitround` codec
* feat: add `bitround` sample data
* fix: strip unnecessary attributes
1 parent 0164eeb commit 052377b

7 files changed: 232 additions, 0 deletions

codecs/bitround/README.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
# bitround codec

Defines an `array -> array` codec to bit-round floating-point numbers and integers.

## Codec name

The value of the `name` member in the codec object MUST be `bitround` or a recognised alias.

### Aliases
#### `numcodecs.bitround` (Deprecated)

Implementations may accept `numcodecs.bitround` as an alias for this codec.
However, it is considered deprecated and `numcodecs.bitround` SHOULD NOT be used to store new data.

## Configuration parameters

### `keepbits` (Required)

An integer specifying the number of bits to keep after rounding. Must be non-negative.

For floating-point data types, this specifies the number of bits of the mantissa to retain.

For integer data types, this specifies the number of bits to retain from the most significant set bit.

## Example

For example, the array metadata below specifies that the array contains bitrounded chunks:

```json
{
  "codecs": [{
    "name": "bitround",
    "configuration": {
      "keepbits": 10
    }
  }, { "name": "bytes", "configuration": { "endian": "little" } }]
}
```

## Format and algorithm

This is an `array -> array` codec that reduces the precision of numeric data to improve compressibility.

### Floating-point data types

For floating-point values, the codec rounds the mantissa to the specified number of bits (`keepbits`). This operation:
- Preserves the sign and exponent
- Rounds the mantissa, keeping only `keepbits` bits
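
The floating-point steps above can be sketched in a few lines of Python. This is an illustrative sketch for `float32` only, not a reference implementation: the function name `bitround_float32` is hypothetical, round-half-to-even is an assumption (it is consistent with the sample data in this commit), and leaving NaN and infinity untouched is likewise an assumption.

```python
import struct

MANTISSA_BITS = 23  # explicit mantissa bits in an IEEE 754 float32


def bitround_float32(value: float, keepbits: int) -> float:
    """Round a float32 mantissa to `keepbits` bits (hypothetical helper)."""
    if not 0 <= keepbits <= MANTISSA_BITS:
        raise ValueError("keepbits must be between 0 and 23 for float32")
    # Reinterpret the float32 value as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    drop = MANTISSA_BITS - keepbits  # number of low mantissa bits to discard
    # Assumption: NaN and infinity (all-ones exponent) pass through unchanged.
    is_nan_or_inf = ((bits >> MANTISSA_BITS) & 0xFF) == 0xFF
    if drop > 0 and not is_nan_or_inf:
        # Add just under half of the last kept bit, plus that bit itself,
        # so that exact ties round towards an even kept mantissa.
        bits += ((1 << (drop - 1)) - 1) + ((bits >> drop) & 1)
        # Clear the discarded bits.
        bits &= ~((1 << drop) - 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for original in (0.1, 1.2, 12.3, 123.4, 1234.5):
    print(original, "->", bitround_float32(original, keepbits=3))
```

With `keepbits=3` these inputs round to 0.1015625, 1.25, 12.0, 120.0 and 1280.0, matching the float32 sample data. The sign and exponent fields are never masked, but in this sketch a round-up from an all-ones kept mantissa carries into the exponent (1.96875 becomes 2.0).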

### Integer data types

For integer values, the codec rounds from the most significant set bit. This operation:
- Identifies the most significant bit that is set
- Keeps `keepbits` bits starting from that position
- Rounds the remaining lower-order bits to zero
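
A minimal sketch of the integer steps, limited to unsigned values, might look as follows. The name `bitround_uint` and its `width` parameter are hypothetical, and `keepbits >= 1` is assumed here. Two behaviours are inferred from the `uint8` sample data in this commit rather than stated in the text above: ties round to even (208 becomes 192), and a round-up that would overflow the type is clamped (255 becomes 224).

```python
def bitround_uint(value: int, keepbits: int, width: int = 8) -> int:
    """Round an unsigned integer to `keepbits` significant bits (sketch)."""
    if keepbits < 1:
        raise ValueError("this sketch assumes keepbits >= 1")
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    # bit_length() is the position of the most significant set bit, plus one.
    drop = value.bit_length() - keepbits
    if drop <= 0:
        return value  # already has at most `keepbits` significant bits
    # Round to nearest, with ties going to an even kept value.
    rounded = value + ((1 << (drop - 1)) - 1) + ((value >> drop) & 1)
    rounded &= ~((1 << drop) - 1)
    if rounded >= (1 << width):
        # Assumption: clamp to the largest representable value that has
        # exactly `keepbits` leading one bits.
        rounded = ((1 << keepbits) - 1) << (width - keepbits)
    return rounded


for original in (10, 11, 100, 123, 208, 209, 255):
    print(original, "->", bitround_uint(original, keepbits=3))
```

Signed integers are left out of the sketch because the text above does not say how negative values are handled.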

### Effect on compression

By reducing precision, the `bitround` codec creates repeated patterns in the binary representation of the data, which may improve the compression ratio when used with subsequent bytes-to-bytes compression codecs (such as `gzip`, `zstd`, or `blosc`).
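
The effect can be illustrated with a small standard-library experiment. This is a sketch under stated assumptions: `zlib` stands in for a bytes-to-bytes codec, the input is synthetic uniform random `float32` data, and `round_pattern` is a hypothetical helper that rounds finite bit patterns with ties to even. Real data will behave differently.

```python
import random
import struct
import zlib

KEEPBITS = 10
DROP = 23 - KEEPBITS  # float32 mantissa bits discarded by rounding


def round_pattern(bits: int) -> int:
    """Round one finite float32 bit pattern to KEEPBITS mantissa bits."""
    bits += ((1 << (DROP - 1)) - 1) + ((bits >> DROP) & 1)
    return bits & ~((1 << DROP) - 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
values = [rng.uniform(0.0, 100.0) for _ in range(10_000)]

# Little-endian float32 bytes, as the `bytes` codec would emit them.
raw = struct.pack(f"<{len(values)}f", *values)
patterns = struct.unpack(f"<{len(values)}I", raw)
rounded = struct.pack(f"<{len(values)}I", *(round_pattern(p) for p in patterns))

raw_size = len(zlib.compress(raw))
rounded_size = len(zlib.compress(rounded))
print(f"zlib: {raw_size} bytes unrounded, {rounded_size} bytes bit-rounded")
```

Because every rounded value ends in 13 zero bits, the compressor sees far more repetition in the rounded buffer than in the original one.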

## Supported data types

### Floating-point types

- `float16`, `float32`, `float64`
- `bfloat16`
- `complex_float16`, `complex_float32`, `complex_float64`
- `complex_bfloat16`
- `complex64`, `complex128`

### Integer types

- `int8`, `int16`, `int32`, `int64`
- `uint8`, `uint16`, `uint32`, `uint64`
- `numpy.timedelta64`, `numpy.datetime64` (encoded equivalently to `int64`)

### Other types

Implementations may support other data types that are interpretable as an integer or floating-point representation, or a composition of such primitives.

## Compatibility

This codec is compatible with `numcodecs.bitround` for floating-point data types.
Integer data types are not supported by the `numcodecs.bitround` codec.

## Sample Data

Sample Zarr arrays encoded with the `bitround` codec can be found in the [sample_data](./sample_data) directory.

## Change log

No changes yet.

## Current maintainers

* Lachlan Deakin ([@LDeakin](https://github.com/LDeakin))
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
# `bitround` Codec Sample Data

This directory contains sample Zarr arrays encoded with the `bitround` codec on `float32` and `uint8` data.

Both arrays were encoded with the following `bitround` configuration, which retains 3 bits of precision:

```json
"name": "bitround",
"configuration": {
  "keepbits": 3
}
```

### `bitround_float32.zarr`

| Original    | Rounded     | Original (0b)                        | Rounded (0b)                         |
|-------------|-------------|--------------------------------------|--------------------------------------|
| 0.000000    | 0.000000    | `0_00000000_00000000000000000000000` | `0_00000000_00000000000000000000000` |
| 0.100000    | 0.101562    | `0_01111011_10011001100110011001101` | `0_01111011_10100000000000000000000` |
| 1.200000    | 1.250000    | `0_01111111_00110011001100110011010` | `0_01111111_01000000000000000000000` |
| 12.300000   | 12.000000   | `0_10000010_10001001100110011001101` | `0_10000010_10000000000000000000000` |
| 123.400002  | 120.000000  | `0_10000101_11101101100110011001101` | `0_10000101_11100000000000000000000` |
| 1234.500000 | 1280.000000 | `0_10001001_00110100101000000000000` | `0_10001001_01000000000000000000000` |
| NaN         | NaN         | `0_11111111_10000000000000000000000` | `0_11111111_10000000000000000000000` |
| inf         | inf         | `0_11111111_00000000000000000000000` | `0_11111111_00000000000000000000000` |
| -inf        | -inf        | `1_11111111_00000000000000000000000` | `1_11111111_00000000000000000000000` |
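
The binary columns use a `sign_exponent_mantissa` layout. As a quick way to check a row, the sketch below turns such a string back into a `float32` value; `float32_from_bits` is a hypothetical helper written for this illustration.

```python
import struct


def float32_from_bits(pattern: str) -> float:
    """Decode a `sign_exponent_mantissa` bit string into a float32 value."""
    bits = int(pattern.replace("_", ""), 2)
    # Pack as a big-endian 32-bit integer and reinterpret it as a float32.
    return struct.unpack(">f", struct.pack(">I", bits))[0]


print(float32_from_bits("0_01111011_10100000000000000000000"))  # 0.1015625
```

The table prints values with six decimal places, which is why this row shows 0.101562.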

### `bitround_uint8.zarr`

| Original | Rounded | Original (0b) | Rounded (0b) |
|----------|---------|---------------|--------------|
| 0        | 0       | `00000000`    | `00000000`   |
| 1        | 1       | `00000001`    | `00000001`   |
| 10       | 10      | `00001010`    | `00001010`   |
| 11       | 12      | `00001011`    | `00001100`   |
| 100      | 96      | `01100100`    | `01100000`   |
| 123      | 128     | `01111011`    | `10000000`   |
| 200      | 192     | `11001000`    | `11000000`   |
| 208      | 192     | `11010000`    | `11000000`   |
| 209      | 224     | `11010001`    | `11100000`   |
| 255      | 224     | `11111111`    | `11100000`   |
36 Bytes
Binary file not shown.
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
{
  "zarr_format": 3,
  "node_type": "array",
  "shape": [
    9
  ],
  "data_type": "float32",
  "chunk_grid": {
    "name": "regular",
    "configuration": {
      "chunk_shape": [
        9
      ]
    }
  },
  "chunk_key_encoding": {
    "name": "default",
    "configuration": {
      "separator": "/"
    }
  },
  "fill_value": 0.0,
  "codecs": [
    {
      "name": "bitround",
      "configuration": {
        "keepbits": 3
      }
    },
    {
      "name": "bytes",
      "configuration": {
        "endian": "little"
      }
    }
  ]
}
10 Bytes
Binary file not shown.
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
{
  "zarr_format": 3,
  "node_type": "array",
  "shape": [
    10
  ],
  "data_type": "uint8",
  "chunk_grid": {
    "name": "regular",
    "configuration": {
      "chunk_shape": [
        10
      ]
    }
  },
  "chunk_key_encoding": {
    "name": "default",
    "configuration": {
      "separator": "/"
    }
  },
  "fill_value": 0,
  "codecs": [
    {
      "name": "bitround",
      "configuration": {
        "keepbits": 3
      }
    },
    {
      "name": "bytes",
      "configuration": {
        "endian": "little"
      }
    }
  ]
}

codecs/bitround/schema.json

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "name": {
      "const": "bitround"
    },
    "configuration": {
      "type": "object",
      "properties": {
        "keepbits": {
          "type": "integer",
          "minimum": 0
        }
      },
      "required": ["keepbits"],
      "additionalProperties": false
    }
  },
  "required": ["name", "configuration"],
  "additionalProperties": false
}
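
To make the constraints in this schema concrete, here is a hand-written Python check that mirrors them without a JSON Schema library. It is only a sketch: `bitround_codec_errors` is a hypothetical name, and the check is slightly stricter than the schema in one respect, since it rejects a float such as `3.0` that JSON Schema's `integer` type would accept. Like the schema, it matches only the name `bitround` and not the deprecated alias.

```python
def bitround_codec_errors(codec: object) -> list[str]:
    """Return the ways `codec` violates the schema above (empty if valid)."""
    if not isinstance(codec, dict):
        return ["codec must be a JSON object"]
    errors = []
    # "required" and "additionalProperties": false at the top level.
    for key in ("name", "configuration"):
        if key not in codec:
            errors.append(f"missing required member: {key}")
    for key in sorted(set(codec) - {"name", "configuration"}):
        errors.append(f"unexpected member: {key}")
    # "const": "bitround"
    if "name" in codec and codec["name"] != "bitround":
        errors.append("name must be 'bitround'")
    if "configuration" in codec:
        config = codec["configuration"]
        if not isinstance(config, dict):
            errors.append("configuration must be a JSON object")
            return errors
        for key in sorted(set(config) - {"keepbits"}):
            errors.append(f"unexpected configuration member: {key}")
        if "keepbits" not in config:
            errors.append("missing required member: keepbits")
        else:
            keepbits = config["keepbits"]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(keepbits, bool) or not isinstance(keepbits, int):
                errors.append("keepbits must be an integer")
            elif keepbits < 0:
                errors.append("keepbits must be >= 0")
    return errors


print(bitround_codec_errors({"name": "bitround", "configuration": {"keepbits": 3}}))
```

A valid codec object produces an empty list, and each violation adds one message.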
