Original file line number Diff line number Diff line change
Expand Up @@ -16,13 +16,13 @@ accuracy and higher performance, and better results than post-training quantizat
enables you to set the minimum acceptable accuracy value for your optimized model, determining
the optimization efficiency.

With a few lines of code, you can apply NNCF compression to a PyTorch or TensorFlow training
With a few lines of code, you can apply NNCF compression to a PyTorch training
script. Once the model is optimized, you may convert it to the
:doc:`OpenVINO IR format <../../documentation/openvino-ir-format>`, getting even better
inference results with OpenVINO Runtime. To optimize your model, you will need:

* A PyTorch or TensorFlow floating-point model.
* A training pipeline set up in the original framework (PyTorch or TensorFlow).
* A PyTorch floating-point model.
* A training pipeline set up in the PyTorch framework.
* Training and validation datasets.
* A `JSON configuration file <https://github.com/openvinotoolkit/nncf/blob/develop/docs/ConfigFile.md>`__
specifying which compression methods to use.
Expand All @@ -45,9 +45,8 @@ quantization errors part of the overall training loss and tries to minimize thei

To learn more, see:

* guide on quantization for :doc:`PyTorch and TensorFlow <./compressing-models-during-training/quantization-aware-training>`.
* guide on quantization for :doc:`PyTorch <./compressing-models-during-training/quantization-aware-training>`.
* Jupyter notebook on `Quantization Aware Training with NNCF and PyTorch <https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/pytorch-quantization-aware-training>`__.
* Jupyter notebook on `Quantization Aware Training with NNCF and TensorFlow <https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/tensorflow-quantization-aware-training>`__.
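How quantization error enters training can be illustrated with a small, framework-free sketch. It assumes symmetric per-tensor 8-bit quantization. The helper name ``fake_quantize`` and the sample weights are hypothetical and are not part of the NNCF API. During QAT, a comparable quantize-then-dequantize step runs inside the forward pass, so the training loss reflects the rounding error.

```python
def fake_quantize(values, num_bits=8):
    """Quantize to signed integers and immediately dequantize (hypothetical helper)."""
    q_max = 2 ** (num_bits - 1) - 1      # 127 for 8 bits
    q_min = -(2 ** (num_bits - 1))       # -128 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    dequantized = []
    for v in values:
        # Round to the nearest integer level and clamp to the representable range.
        q = min(max(round(v / scale), q_min), q_max)
        dequantized.append(q * scale)
    return dequantized, scale


# Illustrative weights, not taken from any real model.
weights = [0.82, -1.37, 0.05, 2.54, -0.61]
approx, scale = fake_quantize(weights)
errors = [abs(w - a) for w, a in zip(weights, approx)]
mean_squared_error = sum(e * e for e in errors) / len(errors)
print(f"scale={scale:.5f}, mean squared quantization error={mean_squared_error:.3e}")
```

Each value is off by at most half a quantization step (``scale / 2``). Fine-tuning with this operation in place lets the weights settle where that rounding costs the least accuracy.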


Filter pruning
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,13 +46,6 @@ In this step, NNCF-related imports are added in the beginning of the training sc
:language: python
:fragment: [imports]

.. tab-item:: TensorFlow 2
:sync: tensorflow-2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [imports]

2. Create NNCF configuration
++++++++++++++++++++++++++++

Expand All @@ -68,13 +61,6 @@ of optimization methods (`"compression"` section).
:language: python
:fragment: [nncf_congig]

.. tab-item:: TensorFlow 2
:sync: tensorflow-2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [nncf_congig]

Here is a brief description of the required parameters of the Filter Pruning method. For a full description refer to the
`GitHub <https://github.com/openvinotoolkit/nncf/blob/develop/docs/usage/training_time_compression/other_algorithms/Pruning.md>`__ page.
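As a rough illustration of such a configuration, the sketch below builds it as a plain Python dictionary and serializes it to JSON. The keys follow the NNCF configuration schema for filter pruning as far as we know. The input shape and the pruning values are illustrative assumptions, not recommendations.

```python
import json

# Illustrative values only; tune them for your own model and accuracy budget.
nncf_config_dict = {
    # Input shape that NNCF uses to trace the model (assumed: one 224x224 RGB image).
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {
        "algorithm": "filter_pruning",
        "pruning_init": 0.1,           # pruning level applied at the start
        "params": {
            "pruning_target": 0.4,     # pruning level to reach by the end
            "pruning_steps": 15,       # epochs over which the level is increased
        },
    },
}

config_json = json.dumps(nncf_config_dict, indent=2)
print(config_json)

# With NNCF installed, the dictionary would typically be turned into a config object
# with nncf.NNCFConfig.from_dict(nncf_config_dict).
```

Keeping the configuration as a dictionary makes it easy to store it alongside the training script or to dump it to a ``.json`` file.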

Expand Down Expand Up @@ -103,13 +89,6 @@ optimization.
:language: python
:fragment: [wrap_model]

.. tab-item:: TensorFlow 2
:sync: tensorflow-2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [wrap_model]

4. Fine-tune the model
++++++++++++++++++++++

Expand All @@ -126,14 +105,6 @@ of the original model.
:language: python
:fragment: [tune_model]

.. tab-item:: TensorFlow 2
:sync: tensorflow-2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [tune_model]


5. Multi-GPU distributed training
+++++++++++++++++++++++++++++++++

Expand All @@ -149,18 +120,11 @@ fine-tuning that will inform optimization methods to do some adjustments to func
:language: python
:fragment: [distributed]

.. tab-item:: TensorFlow 2
:sync: tensorflow-2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [distributed]

6. Export pruned model
+++++++++++++++++++++++++

When fine-tuning finishes, the pruned model can be exported to the corresponding format for further inference: ONNX in
the case of PyTorch and frozen graph - for TensorFlow 2.
the case of PyTorch.

.. tab-set::

Expand All @@ -171,14 +135,6 @@ the case of PyTorch and frozen graph - for TensorFlow 2.
:language: python
:fragment: [export]

.. tab-item:: TensorFlow 2
:sync: tensorflow-2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [export]


These are the basic steps for applying the Filter Pruning method from NNCF. In some cases, however, you need to save and load model
checkpoints during training. Because NNCF wraps the original model in its own object, it provides an API for these needs.

Expand All @@ -197,14 +153,6 @@ To save model checkpoint use the following API:
:language: python
:fragment: [save_checkpoint]

.. tab-item:: TensorFlow 2
:sync: tensorflow-2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [save_checkpoint]


8. (Optional) Restore from checkpoint
+++++++++++++++++++++++++++++++++++++

Expand All @@ -219,20 +167,13 @@ To restore the model from checkpoint you should use the following API:
:language: python
:fragment: [load_checkpoint]

.. tab-item:: TensorFlow 2
:sync: tensorflow-2

.. doxygensnippet:: docs/optimization_guide/nncf/code/pruning_tf.py
:language: python
:fragment: [load_checkpoint]

For more details, see the following `documentation <https://github.com/openvinotoolkit/nncf/blob/develop/docs/usage/training_time_compression/other_algorithms/Pruning.md>`__.

Deploying pruned model
######################

The pruned model requires an extra step that should be done to get performance improvement. This step involves removal of the
zero filters from the model. This is done at the model conversion step using :doc:`model conversion API <../../model-preparation>` tool when model is converted from the framework representation (ONNX, TensorFlow, etc.) to OpenVINO Intermediate Representation.
zero filters from the model. This is done at the model conversion step using :doc:`model conversion API <../../model-preparation>` tool when model is converted from the framework representation (ONNX, etc.) to OpenVINO Intermediate Representation.

* To remove zero filters from the pruned model add the following parameter to the model conversion command: ``transform=Pruning``

Expand All @@ -244,6 +185,3 @@ Examples
####################

* `PyTorch Image Classification example <https://github.com/openvinotoolkit/nncf/blob/develop/examples/torch/classification>`__

* `TensorFlow Image Classification example <https://github.com/openvinotoolkit/nncf/tree/develop/examples/tensorflow/classification>`__

Original file line number Diff line number Diff line change
Expand Up @@ -12,11 +12,6 @@ knowledgeable in Python programming and familiar with the training code for the

Steps required to apply QAT to the model:

.. note::
Currently, NNCF for TensorFlow supports the optimization of models created using the Keras
`Sequential API <https://www.tensorflow.org/guide/keras/sequential_model>`__ or
`Functional API <https://www.tensorflow.org/guide/keras/functional>`__.

1. Apply Post Training Quantization to the Model
#################################################

Expand All @@ -31,13 +26,6 @@ Quantize the model using the :doc:`Post-Training Quantization <../quantizing-mod
:language: python
:fragment: [quantize]

.. tab-item:: TensorFlow 2
:sync: tensorflow-2

.. doxygensnippet:: docs/optimization_guide/nncf/code/qat_tf.py
:language: python
:fragment: [quantize]

2. Fine-tune the Model
#######################

Expand All @@ -56,13 +44,6 @@ forward and backward passes.
:language: python
:fragment: [tune_model]

.. tab-item:: TensorFlow 2
:sync: tensorflow-2

.. doxygensnippet:: docs/optimization_guide/nncf/code/qat_tf.py
:language: python
:fragment: [tune_model]

.. note::
The weight precision changes to INT8 only after the model is converted to OpenVINO
Intermediate Representation. You can expect a reduction in the model footprint only for
Expand All @@ -85,13 +66,6 @@ To save a model checkpoint, use the following API:
:language: python
:fragment: [save_checkpoint]

.. tab-item:: TensorFlow 2
:sync: tensorflow-2

.. doxygensnippet:: docs/optimization_guide/nncf/code/qat_tf.py
:language: python
:fragment: [save_checkpoint]

4. (Optional) Restore from Checkpoint
######################################

Expand All @@ -106,13 +80,6 @@ To restore the model from checkpoint, use the following API:
:language: python
:fragment: [load_checkpoint]

.. tab-item:: TensorFlow 2
:sync: tensorflow-2

.. doxygensnippet:: docs/optimization_guide/nncf/code/qat_tf.py
:language: python
:fragment: [load_checkpoint]

Deploying the Quantized Model
##############################

Expand All @@ -128,18 +95,10 @@ any additional steps.
:language: python
:fragment: [inference]

.. tab-item:: TensorFlow 2
:sync: tensorflow-2

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_tensorflow.py
:language: python
:fragment: [inference]

For more details, see the corresponding :doc:`documentation <../../running-inference>`.

Examples
#########

* `Quantization-aware Training of Resnet18 PyTorch Model <https://github.com/openvinotoolkit/nncf/tree/develop/examples/quantization_aware_training/torch/resnet18>`__
* `Quantization-aware Training of STFPM PyTorch Model <https://github.com/openvinotoolkit/nncf/tree/develop/examples/quantization_aware_training/torch/anomalib>`__
* `Quantization-aware Training of MobileNet v2 TensorFlow Model <https://github.com/openvinotoolkit/nncf/tree/develop/examples/quantization_aware_training/tensorflow/mobilenet_v2>`__
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ flows:

.. note

NNCF offers a Python API for compressing PyTorch, TensorFlow 2.x, ONNX, and OpenVINO IR
NNCF offers a Python API for compressing PyTorch, ONNX, and OpenVINO IR
model formats. OpenVINO IR offers the most comprehensive support.


Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ Basic Quantization Flow
Introduction
####################

The basic quantization flow is the simplest way to apply 8-bit quantization to the model. It is available for models in the following frameworks: OpenVINO, PyTorch, TensorFlow 2.x, and ONNX. The basic quantization flow is based on the following steps:
The basic quantization flow is the simplest way to apply 8-bit quantization to the model. It is available for models in the following frameworks: OpenVINO, PyTorch, and ONNX. The basic quantization flow is based on the following steps:

* Set up an environment and install dependencies.
* Prepare a representative **calibration dataset**, for example of 300 samples, that is used to estimate the quantization parameters of the activations within the model.
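
Preparing the calibration data can be sketched without any framework. The example assumes that each dataset item is an ``(input, label)`` pair and that only the input is needed for calibration. The synthetic ``data_source`` and the subset size of 300 are placeholders for a real validation loader.

```python
import random

random.seed(0)

# Stand-in for a validation loader: each item is an (input, label) pair.
data_source = [
    ([random.random() for _ in range(4)], random.randint(0, 9))
    for _ in range(1000)
]


def transform_fn(data_item):
    """Keep only the model input; labels are not needed for calibration."""
    inputs, _label = data_item
    return inputs


subset_size = 300
calibration_inputs = [transform_fn(item) for item in data_source[:subset_size]]
print(len(calibration_inputs), len(calibration_inputs[0]))

# With NNCF installed, the same pair would typically be wrapped as
# nncf.Dataset(data_source, transform_fn) and passed to nncf.quantize(...).
```

The transformation function keeps the data loader itself unchanged, so the same loader can serve both validation and calibration.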
Expand Down Expand Up @@ -56,13 +56,6 @@ The transformation function is a function that takes a sample from the dataset a
:language: python
:fragment: [dataset]

.. tab-item:: TensorFlow
:sync: tensorflow

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_tensorflow.py
:language: python
:fragment: [dataset]

.. tab-item:: TorchFX
:sync: torch_fx

Expand Down Expand Up @@ -102,13 +95,6 @@ See the `example section <#examples-of-how-to-apply-nncf-post-training-quantizat
:language: python
:fragment: [quantization]

.. tab-item:: TensorFlow
:sync: tensorflow

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_tensorflow.py
:language: python
:fragment: [quantization]

.. tab-item:: TorchFX
:sync: torch_fx

Expand Down Expand Up @@ -142,13 +128,6 @@ If you have not already installed OpenVINO developer tools, install it with ``pi
:language: python
:fragment: [inference]

.. tab-item:: TensorFlow
:sync: tensorflow

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_tensorflow.py
:language: python
:fragment: [inference]

TorchFX models can utilize OpenVINO optimizations using `torch.compile(..., backend="openvino") <https://docs.openvino.ai/2025/openvino-workflow/torch-compile.html>`__ functionality:

.. tab-set::
Expand Down Expand Up @@ -242,5 +221,4 @@ Examples of how to apply NNCF post-training quantization:
* `Post-Training Quantization of MobileNet v2 PyTorch Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/torch/mobilenet_v2>`__
* `Post-Training Quantization of SSD PyTorch Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/torch/ssd300_vgg16>`__
* `Post-Training Quantization of MobileNet v2 ONNX Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/onnx/mobilenet_v2>`__
* `Post-Training Quantization of MobileNet v2 TensorFlow Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/tensorflow/mobilenet_v2>`__

82 changes: 0 additions & 82 deletions docs/optimization_guide/nncf/code/pruning_tf.py

This file was deleted.
