Skip to content

Conversation

@shaneahmed
Copy link
Member

@shaneahmed shaneahmed commented Oct 22, 2025

🚀 Summary

This PR introduces an updated, DeepFeatureExtractor engine, to the TIAToolbox framework. It enables extraction of deep CNN features from whole slide images (WSIs) or image patches for downstream tasks such as clustering, visualization, or training other models. The PR also includes a command-line interface (CLI) for this engine, along with comprehensive tests.


✨ Key Features

  • New Engine: DeepFeatureExtractor

    • Extracts intermediate CNN features from WSIs and patches.
    • Outputs features and coordinates in Zarr or dict format.
    • Supports memory-aware caching for large-scale processing.
  • CLI Integration

    • Adds deep-feature-extractor command to TIAToolbox CLI.
    • Supports input/output paths, model selection, batch size, device, and more.
  • Unit Tests

    • Covers patch-based and WSI-based inference.
    • Tests multi-GPU support and CLI functionality.
    • Validates compatibility with CNNBackbone and TimmBackbone models.
  • Codebase Integration

    • Registers the engine and CLI in __init__.py files.
    • Updates CLI and model registries to include the new extractor.

@shaneahmed shaneahmed self-assigned this Oct 22, 2025
@shaneahmed shaneahmed added this to the Release v2.0.0 milestone Oct 22, 2025
@shaneahmed shaneahmed added the enhancement New feature or request label Oct 22, 2025
@codecov
Copy link

codecov bot commented Oct 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.92%. Comparing base (c535eab) to head (5c07200).

Additional details and impacted files
@@                    Coverage Diff                     @@
##           dev-define-engines-abc     #963      +/-   ##
==========================================================
+ Coverage                   94.72%   94.92%   +0.20%     
==========================================================
  Files                          73       75       +2     
  Lines                        9234     9344     +110     
  Branches                     1208     1214       +6     
==========================================================
+ Hits                         8747     8870     +123     
+ Misses                        452      439      -13     
  Partials                       35       35              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Results are inconsistent as the model is redefined on a different device.
@Jiaqi-Lv Jiaqi-Lv requested a review from Copilot November 12, 2025 12:17
Copilot finished reviewing on behalf of Jiaqi-Lv November 12, 2025 12:20
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces the DeepFeatureExtractor engine to TIAToolbox, enabling extraction of deep CNN feature representations from whole slide images (WSIs) and image patches for downstream tasks like clustering and visualization.

Key Changes

  • New DeepFeatureExtractor class extending SemanticSegmentor for feature extraction from WSIs and patches
  • CLI integration with deep-feature-extractor command supporting various configuration options
  • Comprehensive test suite covering patch-based inference, WSI processing, multi-GPU support, and CLI functionality

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
tiatoolbox/models/engine/deep_feature_extractor.py Core engine implementation with memory-aware caching and Zarr output support
tiatoolbox/models/engine/init.py Registration of new deep_feature_extractor module
tiatoolbox/models/init.py Export of DeepFeatureExtractor class
tiatoolbox/cli/deep_feature_extractor.py CLI interface for the feature extractor with parameter handling
tiatoolbox/cli/init.py Registration of deep_feature_extractor CLI command
tiatoolbox/models/engine/semantic_segmentor.py Fixed docstring to correctly reference SemanticSegmentor instead of PatchPredictor
tests/engines/test_feature_extractor.py Comprehensive tests for patch/WSI inference, multi-GPU, and CLI
tests/engines/test_semantic_segmentor.py Updated test docstring for clarity

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

# Default Memory threshold percentage is 80.
memory_threshold = kwargs.get("memory_threshold", 80)
vm = psutil.virtual_memory()
keys = ["probabilities", "coordinates"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
keys = ["probabilities", "coordinates"]
keys = ["features", "coordinates"]

Copy link
Collaborator

@measty measty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found a few issues that would need to be addressed before merging

)

raw_predictions["coordinates"] = da.concatenate(coordinates, axis=0)
raw_predictions["probabilities"] = da.concatenate(probabilities, axis=0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

various references to probabilities throughout this function, but feature extractor isnt doing anything with probabilities so it's a bit of a misleading name.

Either change it to something more appropriate in here, or maybe change it in the base SemanticSegmentor to something more generic?

"""
super().__init__(
model=model,
batch_size=batch_size,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when a string is passed as a model, it should really go look in the lists of architectures available for TIMMBackbone and CNNBackbone (in architecture.vanilla.py) and build the appropriate model. At the moment it looks in the models defined in pretrainedModel.yaml which is patch prediction/segmentation models. And those don't work in here.

We may even want to consolidate these, so that there is a single unified way of getting models, whatever type they are.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, the feature extractor should work with both TimmBackbone("efficientnet_b0", pretrained=True) and "efficientnet_b0" options.

save_dir: os.PathLike | Path | None = None,
overwrite: bool = False,
output_type: str = "dict",
**kwargs: Unpack[SemanticSegmentorRunParams],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has some usability issues. As a user (and especially trying to put myself into the shoes of someone relatively new to tiatoolbox) I would just want to specify a model, a patch size, a resolution, some data to run it on, and somewhere to save. It should be intuitive how to provide those.

In practice, some of those things are hidden away - i either have to go off into IOSegmentorConfig and figure out how to make one of those, or hunt around in SemanticSegmentorRunParams to find out from these how to specify them

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly specify input_resolutions and patch_input_shape instead of wrapping it into SemanticSegmentorRunParams.

runner = CliRunner()
models_wsi_result = runner.invoke(
cli.main,
[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a correct test.

what this is actually doing:
As default model is fcn-tissue-mask, it is running that within the DeepFeatureExtractor engine and saving the patches that come out of that (the probability maps for tissue/non-tissue) in a wierd zarr array that has shape (512*num_patches, 512, 2)

In practice, the deep-feature-extractor cli isnt useable because of the issue i pointed out in one of my other comments with providing model as string; you cant pass name of a valid CNNBackbone or TimmBackbone in there.

"""Deep Feature Extraction Engine for Digital Pathology.
This module defines the `DeepFeatureExtractor` class, which extends
`SemanticSegmentor` to extract intermediate CNN feature representations
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't just CNN features (most foundation models are transformer)



class DeepFeatureExtractor(SemanticSegmentor):
r"""Generic CNN-based feature extractor for digital pathology images.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove CNN-based

Example:
--------
>>> from tiatoolbox.models.engine.deep_feature_extractor import DeepFeatureExtractor
>>> extractor = DeepFeatureExtractor(model="resnet50-kather100k")
Copy link
Collaborator

@measty measty Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example wouldn't work as resnet50-kather100k isnt a feature extraction model (and wouldnt run as patch_output_shape isnt defined in it's ioconfig)

"""Deep Feature Extraction Engine for Digital Pathology.
This module defines the `DeepFeatureExtractor` class, which extends
`SemanticSegmentor` to extract intermediate CNN feature representations
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`SemanticSegmentor` to extract intermediate CNN feature representations
`SemanticSegmentor` to extract intermediate feature representations



class DeepFeatureExtractor(SemanticSegmentor):
r"""Generic CNN-based feature extractor for digital pathology images.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
r"""Generic CNN-based feature extractor for digital pathology images.
r"""Generic deep feature extractor for digital pathology images.

Whether to overwrite existing output files. Default is False.
output_type (str):
Desired output format. Must be "zarr" or "dict".
**kwargs (SemanticSegmentorRunParams):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expand the docstring for all the options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants