fastmachinelearning · maltanar · Nov 20, 2025 · Feb 11, 2025 · Mar 13, 2025 · Mar 13, 2025
diff --git a/README.md b/README.md
@@ -11,7 +11,7 @@
 <img align="left" src="https://xilinx.github.io/finn/img/TFC_1W2A.onnx.png" alt="QONNX example" style="margin-right: 20px" width="200"/>
 
 
-QONNX (Quantized ONNX) introduces several custom operators -- [`IntQuant`](docs/qonnx-custom-ops/intquant_op.md), [`FloatQuant`](docs/qonnx-custom-ops/floatquant_op.md), [`BipolarQuant`](docs/qonnx-custom-ops/bipolar_quant_op.md), and [`Trunc`](docs/qonnx-custom-ops/trunc_op.md) -- in order to represent arbitrary-precision integer and minifloat quantization in ONNX. This enables:
+QONNX (Quantized ONNX) introduces several [custom operators](docs/qonnx-custom-ops/overview.md) -- `IntQuant`, `FloatQuant`, `BipolarQuant`, and `Trunc` -- in order to represent arbitrary-precision integer and minifloat quantization in ONNX. This enables:
 * Representation of binary, ternary, 3-bit, 4-bit, 6-bit or any other integer/fixed-point quantization.
 * Representation of minifloat quantization with configurable exponent and mantissa bits.
 * Quantization is an operator itself, and can be applied to any parameter or layer input.
@@ -29,9 +29,7 @@ This repository contains a set of Python utilities to work with QONNX models, in
 
 ### Operator definitions
 
-* [Quant](docs/qonnx-custom-ops/quant_op.md) for 2-to-arbitrary-bit quantization, with scaling and zero-point
-* [BipolarQuant](docs/qonnx-custom-ops/bipolar_quant_op.md)  for 1-bit (bipolar) quantization, with scaling and zero-point
-* [Trunc](docs/qonnx-custom-ops/trunc_op.md) for truncating to a specified number of bits, with scaling and zero-point
+Please see the [custom operator overview](docs/qonnx-custom-ops/overview.md) table for more details.
 
 ### Installation
 

diff --git a/docs/qonnx-custom-ops/bipolar_quant_op.md → docs/qonnx-custom-ops/bipolarquant_v1.md b/docs/qonnx-custom-ops/bipolar_quant_op.md → docs/qonnx-custom-ops/bipolarquant_v1.md
@@ -5,7 +5,7 @@ Additionally, takes one float as input, which define the scaling.
 
 #### Version
 
-This operator is not part of the ONNX standard and is not currently versioned.
+The description of this operator in this document corresponds to `qonnx.custom_op.general` opset version 1.
 
 #### Attributes
 

diff --git a/docs/qonnx-custom-ops/floatquant_op.md → docs/qonnx-custom-ops/floatquant_v1.md b/docs/qonnx-custom-ops/floatquant_op.md → docs/qonnx-custom-ops/floatquant_v1.md
@@ -16,7 +16,7 @@ special (symbolic) values. This makes it nontrivial to infer the maximum represe
 
 #### Version
 
-This operator is not part of the ONNX standard and is not currently versioned.
+The description of this operator in this document corresponds to `qonnx.custom_op.general` opset version 1.
 
 #### Attributes
 

diff --git a/docs/qonnx-custom-ops/intquant_op.md → docs/qonnx-custom-ops/intquant_v1.md b/docs/qonnx-custom-ops/intquant_op.md → docs/qonnx-custom-ops/intquant_v1.md
@@ -9,11 +9,11 @@ rounding_mode defines how quantized values are rounded.
 
 Notes:
 * This operator was previously named `Quant` but is renamed to `IntQuant` to distinguish it from `FloatQuant`. For a transition period, qonnx will transparently handle `Quant` as `IntQuant` for backwards compatibility reasons, but only `IntQuant` should be used for new models.
-* This operator does not work for binary or bipolar quantization, for this purpose the simpler BipolarQuant node exists.
+* This operator does not work for binary or bipolar quantization; the simpler `BipolarQuant` node exists for this purpose.
 
 #### Version
 
-This operator is not part of the ONNX standard and is not currently versioned.
+The description of this operator in this document corresponds to `qonnx.custom_op.general` opset version 1.
 
 #### Attributes
 

diff --git a/docs/qonnx-custom-ops/overview.md b/docs/qonnx-custom-ops/overview.md
@@ -0,0 +1,13 @@
+## Operator Schemas
+
+This file lists the QONNX custom operators, similar to `Operators.md` for the ONNX standard.
+It is manually updated, since QONNX custom operators are relatively few in number.
+
+### qonnx.custom_op.general
+
+|**Operator**|**Since version**|
+|-|-|
+|<a href="bipolarquant_v1.md">BipolarQuant</a>|<a href="bipolarquant_v1.md">1</a>|
+|<a href="floatquant_v1.md">FloatQuant</a>|<a href="floatquant_v1.md">1</a>|
+|<a href="intquant_v1.md">IntQuant</a>|<a href="intquant_v1.md">1</a>|
+|<a href="trunc_v2.md">Trunc</a>|<a href="trunc_v2.md">2</a>, <a href="trunc_v1.md">1</a>|
diff --git a/docs/qonnx-custom-ops/trunc_op.md → docs/qonnx-custom-ops/trunc_v1.md b/docs/qonnx-custom-ops/trunc_op.md → docs/qonnx-custom-ops/trunc_v1.md
@@ -6,7 +6,7 @@ The attribute rounding_mode defines how truncated values are rounded.
 
 #### Version
 
-This operator is not part of the ONNX standard and is not currently versioned.
+The description of this operator in this document corresponds to `qonnx.custom_op.general` opset version 1.
 
 #### Attributes
 

diff --git a/docs/qonnx-custom-ops/trunc_v2.md b/docs/qonnx-custom-ops/trunc_v2.md
@@ -0,0 +1,144 @@
+### <a name="Trunc"></a><a name="trunc">**Trunc**</a>
+
+Truncates the values of one input data (Tensor<T>) at a specified bitwidth and produces one output data (Tensor<T>).
+Additionally, takes five tensors as input, which define the input scale, zero-point, input bit-width, output scale and output bit-width of the quantization.
+The attribute rounding_mode defines how truncated values are rounded.
+
+#### Version
+
+This operator is not part of the ONNX standard.
+The description of this operator in this document corresponds to `qonnx.custom_op.general` opset version 2.
+
+#### Attributes
+
+<dl>
+<dt><tt>rounding_mode</tt> : string (default is "FLOOR")</dt>
+<dd>Defines how rounding should be applied during truncation. Currently available modes are: "ROUND", "CEIL" and "FLOOR". Here "ROUND" implies a round-to-even operation. Lowercase variants for the rounding mode string are also supported: "round", "ceil", "floor".</dd>
+<dt><tt>signed</tt> : int (default is 1)</dt>
+<dd>Defines if the quantization includes a sign bit. E.g. at 8b unsigned=[0, 255] vs signed=[-128, 127].</dd>
+<dt><tt>narrow</tt> : int (default is 0)</dt>
+<dd>Defines if the value range should be interpreted as narrow, when signed=1. E.g. at 8b regular=[-128, 127] vs narrow=[-127, 127].</dd>
+</dl>
+
+#### Inputs
+
+<dl>
+<dt><tt>X</tt> (differentiable) : tensor(float32)</dt>
+<dd>input tensor to truncate</dd>
+<dt><tt>scale</tt> : float32</dt>
+<dd>The scale factor at the input of the truncation</dd>
+<dt><tt>zeropt</tt> : float32</dt>
+<dd>The zero-point at the input of the truncation</dd>
+<dt><tt>in_bitwidth</tt> : int32</dt>
+<dd>The number of bits used at the input of the truncation</dd>
+<dt><tt>out_scale</tt> : float32</dt>
+<dd>The scale factor of the output of the truncation</dd>
+<dt><tt>out_bitwidth</tt> : int32</dt>
+<dd>The number of bits used at the output of the truncation</dd>
+</dl>
+
+
+#### Outputs
+
+<dl>
+<dt><tt>Y</tt> (differentiable) : tensor(float32)</dt>
+<dd>Output tensor</dd>
+</dl>
+
+
+#### Examples
+<details>
+<summary>Trunc</summary>
+
+```python
+from onnx import helper
+import numpy as np
+
+# Define node settings and input
+x = np.random.randn(100).astype(np.float32)*10.
+scale = np.array(1.)
+zeropt = np.array(0.)
+in_bitwidth = np.array(10)
+# Illustrative value: assumes out_scale = scale * 2**(in_bitwidth - out_bitwidth)
+out_scale = np.array(64.)
+out_bitwidth = np.array(4)
+rounding_mode = "ROUND"
+signed = 1
+narrow = 0
+
+# Create node
+node = helper.make_node(
+    'Trunc',
+    domain='qonnx.custom_op.general',
+    inputs=['x', 'scale', 'zeropt', 'in_bitwidth', 'out_scale', 'out_bitwidth'],
+    outputs=['y'],
+    rounding_mode=rounding_mode,
+    signed=signed,
+    narrow=narrow,
+)
+
+# Execute the same settings with the reference implementation (trunc)
+# See the sample implementation for more details on trunc.
+output_ref = trunc(x, scale, zeropt, in_bitwidth, narrow, signed, out_scale, out_bitwidth, rounding_mode)
+
+# Execute node and compare (expect is the ONNX-style test helper, not defined here)
+expect(node, inputs=[x, scale, zeropt, in_bitwidth, out_scale, out_bitwidth], outputs=[output_ref], name='test_trunc')
+
+```
+
+</details>
+
+
+#### Sample Implementation
+
+<details>
+<summary>Trunc</summary>
+
+```python
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import numpy as np
+
+def trunc(inp_tensor, scale, zeropt, input_bit_width, narrow, signed, output_scale, output_bit_width, rounding_mode):
+
+    # Scaling
+    y = inp_tensor / scale
+    y = y + zeropt
+    # Rounding
+    y = np.round(y)
+    # Rescale
+    trunc_scale = 2 ** np.round(
+        np.log2(output_scale / scale)
+    )  # Trunc scale should be a power-of-two - ensure that is the case
+    y = y / trunc_scale
+
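+    # Clamping (min_int and max_int are helpers, not shown here, that return the
+    # integer range bounds for the given signedness, narrow flag and bit width)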
+    min_int_val = min_int(signed, narrow, output_bit_width)
+    max_int_val = max_int(signed, narrow, output_bit_width)
+    y = np.where(y > max_int_val, max_int_val.astype(y.dtype), y)
+    y = np.where(y < min_int_val, min_int_val.astype(y.dtype), y)
+    # To int (truncate)
+    rounding_fx = resolve_rounding_mode(rounding_mode)
+    y = rounding_fx(y)
+
+    # Rescale
+    output_zeropt = zeropt / trunc_scale  # Rescale zero-point
+    y = y - output_zeropt
+    y = y * output_scale
+
+    return y
+
+def resolve_rounding_mode(mode_string):
+    """Resolve the rounding mode string of Quant and Trunc ops
+    to the corresponding numpy functions."""
+    # Normalize case so that lowercase variants ("round", "ceil", "floor") also resolve
+    normalized_mode = mode_string.upper()
+    if normalized_mode == "ROUND":
+        return np.round
+    elif normalized_mode == "CEIL":
+        return np.ceil
+    elif normalized_mode == "FLOOR":
+        return np.floor
+    else:
+        raise ValueError(f"Could not resolve rounding mode called: {mode_string}")
+
+```
+
+</details>
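
Note on the sample implementation in `trunc_v2.md`: `trunc` calls `min_int` and `max_int`, which the document does not define. A minimal sketch of what such helpers could look like follows, derived only from the ranges quoted in the `signed` and `narrow` attribute descriptions. The bodies are assumptions for illustration and ignore `narrow` when `signed=0`; the library's own helpers may behave differently.

```python
import numpy as np


def min_int(signed, narrow, bit_width):
    """Smallest representable integer (illustrative helper, not the library's code)."""
    if not signed:
        # Unsigned ranges start at zero; narrow is ignored in this sketch
        return np.asarray(0, dtype=np.int64)
    low = -(2 ** (np.asarray(bit_width, dtype=np.int64) - 1))
    # Narrow range drops the most negative value, e.g. [-127, 127] at 8 bits
    return low + 1 if narrow else low


def max_int(signed, narrow, bit_width):
    """Largest representable integer (illustrative helper, not the library's code)."""
    bits = np.asarray(bit_width, dtype=np.int64)
    if signed:
        return 2 ** (bits - 1) - 1
    return 2 ** bits - 1


# Clamp a few values to the signed 8-bit range, as the clamping step in trunc does
values = np.array([-200.0, 3.0, 300.0])
clamped = np.clip(values, min_int(1, 0, 8), max_int(1, 0, 8))
print(clamped)
```

Both helpers return NumPy values, so the `.astype(y.dtype)` calls in the clamping step of `trunc` work on their results.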
diff --git a/src/qonnx/core/execute_custom_node.py b/src/qonnx/core/execute_custom_node.py
@@ -27,10 +27,9 @@
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 import qonnx.custom_op.registry as registry
-from qonnx.util.basic import get_preferred_onnx_opset
 
 
-def execute_custom_node(node, context, graph, onnx_opset_version=get_preferred_onnx_opset()):
+def execute_custom_node(node, context, graph, onnx_opset_version):
     """Call custom implementation to execute a single custom node.
     Input/output provided via context."""
     op_type = node.op_type

diff --git a/src/qonnx/core/onnx_exec.py b/src/qonnx/core/onnx_exec.py
@@ -36,15 +36,10 @@
 import qonnx.analysis.topology as ta
 import qonnx.core.execute_custom_node as ex_cu_node
 from qonnx.custom_op.registry import is_custom_op
-from qonnx.util.basic import (
-    get_preferred_onnx_opset,
-    get_sanitize_quant_tensors,
-    qonnx_make_model,
-    sanitize_quant_values,
-)
+from qonnx.util.basic import get_preferred_qonnx_opset, get_sanitize_quant_tensors, qonnx_make_model, sanitize_quant_values
 
 
-def execute_node(node, context, graph, return_full_exec_context=False, opset_version=get_preferred_onnx_opset()):
+def execute_node(node, context, graph, opset_version, return_full_exec_context=False):
     """Executes a single node by using onnxruntime or with a custom function.
 
     Input/output provided via context."""
@@ -158,7 +153,7 @@ def execute_onnx(model, input_dict, return_full_exec_context=False, start_node=N
     model_exec_mode = model.get_metadata_prop("exec_mode")
     if (model_exec_mode is None) or (model_exec_mode == ""):
         # extract opset version for node-by-node execution
-        opset_version = model.model.opset_import[0].version
+        opset_imports = model.get_opset_imports()
         # execute the model node by node
         # we can simply walk down the list since the ONNX spec guarantees that it is
         # topologically sorted
@@ -176,7 +171,11 @@ def execute_onnx(model, input_dict, return_full_exec_context=False, start_node=N
             if get_sanitize_quant_tensors() != 0:
                 # round input values to match quantization annotation
                 execution_context = sanitize_quant_values(model, node.input, execution_context)
-            execute_node(node, execution_context, graph, return_full_exec_context, opset_version)
+            if node.domain in opset_imports:
+                opset_version = opset_imports[node.domain]
+            else:
+                opset_version = get_preferred_qonnx_opset()
+            execute_node(node, execution_context, graph, opset_version, return_full_exec_context)
             if get_sanitize_quant_tensors() != 0:
                 # round output values to quantization annotation
                 execution_context = sanitize_quant_values(model, node.output, execution_context)
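
Note on the per-node opset lookup in `execute_onnx`: the new branch picks the version imported for the node's domain and falls back to a preferred default otherwise. The sketch below reproduces that logic in isolation. It assumes `get_opset_imports()` yields a plain dict mapping domain strings to versions (as the indexing in the diff suggests); the dict contents and the `FALLBACK_VERSION` value are made up for illustration.

```python
def resolve_opset_version(node_domain, opset_imports, fallback_version):
    """Return the opset version imported for node_domain, else the fallback."""
    return opset_imports.get(node_domain, fallback_version)


# Hypothetical imports: the default ONNX domain ("") plus one custom domain
opset_imports = {"": 11, "qonnx.custom_op.general": 2}
# Stand-in for get_preferred_qonnx_opset(); the real value may differ
FALLBACK_VERSION = 1

imported = resolve_opset_version("qonnx.custom_op.general", opset_imports, FALLBACK_VERSION)
missing = resolve_opset_version("qonnx.custom_op.channels_last", opset_imports, FALLBACK_VERSION)
print(imported, missing)
```

A node whose domain is absent from the model's opset imports therefore runs with the fallback version rather than raising an error.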

diff --git a/src/qonnx/custom_op/base.py b/src/qonnx/custom_op/base.py
@@ -30,15 +30,35 @@
 import onnx.numpy_helper as np_helper
 from abc import ABC, abstractmethod
 
-from qonnx.util.basic import get_by_name, get_preferred_onnx_opset
+from qonnx.util.basic import get_by_name, get_preferred_qonnx_opset
 
 
 class CustomOp(ABC):
     """CustomOp class all custom op nodes are based on. Contains different functions
     every custom node should have. Some as abstract methods, these have to be
-    filled when writing a new custom op node."""
+    filled when writing a new custom op node.
 
-    def __init__(self, onnx_node, onnx_opset_version=get_preferred_onnx_opset()):
+    Opset Version Support:
+        CustomOp classes use "since version" semantics matching ONNX operators.
+        Version is determined by the class name using _vN suffix convention:
+
+        - No suffix (e.g., IntQuant): Version 1 (default)
+        - _vN suffix (e.g., IntQuant_v2): Version N
+
+        The registry automatically selects the highest version <= requested opset.
+
+        Example:
+            class IntQuant(CustomOp):
+                pass  # Version 1 (no suffix)
+
+            class IntQuant_v2(CustomOp):
+                pass  # Version 2, covers opset v2-v3 (if no v3 exists)
+
+            class IntQuant_v4(CustomOp):
+                pass  # Version 4, covers opset v4+
+    """
+
+    def __init__(self, onnx_node, onnx_opset_version=get_preferred_qonnx_opset()):
         super().__init__()
         self.onnx_node = onnx_node
         self.onnx_opset_version = onnx_opset_version
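
Note on the "since version" semantics described in the `CustomOp` docstring: the selection rule (highest version less than or equal to the requested opset, with the version parsed from a `_vN` class-name suffix) can be sketched as below. This is a standalone illustration operating on class names only; `split_versioned_name` and `select_class_name` are hypothetical helpers, not the registry's actual functions.

```python
import re

# Matches names such as "IntQuant_v2" and captures the base name and version
_VERSION_SUFFIX = re.compile(r"^(?P<name>.+?)_v(?P<version>\d+)$")


def split_versioned_name(class_name):
    """Split a class name into (op_type, version); no suffix means version 1."""
    match = _VERSION_SUFFIX.match(class_name)
    if match is None:
        return class_name, 1
    return match.group("name"), int(match.group("version"))


def select_class_name(op_type, requested_opset, class_names):
    """Pick the class with the highest version <= requested_opset."""
    candidates = {}
    for class_name in class_names:
        name, version = split_versioned_name(class_name)
        if name == op_type and version <= requested_opset:
            candidates[version] = class_name
    if not candidates:
        raise KeyError(f"No {op_type} implementation for opset {requested_opset}")
    return candidates[max(candidates)]


int_quant_names = ["IntQuant", "IntQuant_v2", "IntQuant_v4"]
print(select_class_name("IntQuant", 3, int_quant_names))
```

With the class names from the docstring example, opset 3 resolves to `IntQuant_v2` because no `_v3` class exists, which matches the "covers opset v2-v3" comment.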

diff --git a/src/qonnx/custom_op/channels_last/__init__.py b/src/qonnx/custom_op/channels_last/__init__.py
@@ -1,11 +1,17 @@
 # Importing registers CustomOps in qonnx.custom_op.channels_last domain
-from qonnx.custom_op.channels_last.batch_normalization import BatchNormalization
-from qonnx.custom_op.channels_last.conv import Conv
-from qonnx.custom_op.channels_last.max_pool import MaxPool
+from qonnx.custom_op.channels_last.batch_normalization import (
+    BatchNormalization_v1,
+    BatchNormalization_v9,
+    BatchNormalization_v14,
+)
+from qonnx.custom_op.channels_last.conv import Conv_v1
+from qonnx.custom_op.channels_last.max_pool import MaxPool_v1, MaxPool_v10
 
-# Legacy dictionary for backward compatibility
-custom_op = {
-    "Conv": Conv,
-    "MaxPool": MaxPool,
-    "BatchNormalization": BatchNormalization,
-}
+__all__ = [
+    "Conv_v1",
+    "MaxPool_v1",
+    "MaxPool_v10",
+    "BatchNormalization_v1",
+    "BatchNormalization_v9",
+    "BatchNormalization_v14",
+]
diff --git a/src/qonnx/custom_op/channels_last/batch_normalization.py b/src/qonnx/custom_op/channels_last/batch_normalization.py
@@ -32,7 +32,7 @@
 from qonnx.custom_op.channels_last.base_wrapped_op import ChannelsLastWrappedOp
 
 
-class BatchNormalization(ChannelsLastWrappedOp):
+class BatchNormalization_v1(ChannelsLastWrappedOp):
     def get_nodeattr_types(self):
         """Returns a dict of permitted attributes for node, where:
         ret_dict[attribute_name] = (dtype, require, default_value, <allowed_values>)
@@ -133,3 +133,13 @@ def verify_node(self):
             )
 
         return info_messages
+
+
+class BatchNormalization_v9(BatchNormalization_v1):
+    # no relevant changes for channels-last wrapper
+    pass
+
+
+class BatchNormalization_v14(BatchNormalization_v9):
+    # no relevant changes for channels-last wrapper
+    pass
diff --git a/src/qonnx/custom_op/channels_last/conv.py b/src/qonnx/custom_op/channels_last/conv.py
@@ -33,7 +33,7 @@
 from qonnx.custom_op.general.im2col import compute_conv_output_dim
 
 
-class Conv(ChannelsLastWrappedOp):
+class Conv_v1(ChannelsLastWrappedOp):
     def get_nodeattr_types(self):
         """Returns a dict of permitted attributes for node, where:
         ret_dict[attribute_name] = (dtype, require, default_value, <allowed_values>)

diff --git a/src/qonnx/custom_op/channels_last/max_pool.py b/src/qonnx/custom_op/channels_last/max_pool.py
@@ -33,7 +33,7 @@
 from qonnx.custom_op.general.maxpoolnhwc import compute_pool_output_dim
 
 
-class MaxPool(ChannelsLastWrappedOp):
+class MaxPool_v1(ChannelsLastWrappedOp):
     def get_nodeattr_types(self):
         """Returns a dict of permitted attributes for node, where:
         ret_dict[attribute_name] = (dtype, require, default_value, <allowed_values>)
@@ -171,3 +171,8 @@ def verify_node(self):
             )
 
         return info_messages
+
+
+class MaxPool_v10(MaxPool_v1):
+    # no relevant changes for channels-last wrapper
+    pass