-
Notifications
You must be signed in to change notification settings - Fork 102
🆕 Define DeepFeatureExtractor
#963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev-define-engines-abc
Are you sure you want to change the base?
🆕 Define DeepFeatureExtractor
#963
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## dev-define-engines-abc #963 +/- ##
==========================================================
+ Coverage 94.72% 94.92% +0.20%
==========================================================
Files 73 75 +2
Lines 9234 9344 +110
Branches 1208 1214 +6
==========================================================
+ Hits 8747 8870 +123
+ Misses 452 439 -13
Partials 35 35 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
… into dev-define-DeepFeatureExtractor
Results are inconsistent as the model is redefined on a different device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces the DeepFeatureExtractor engine to TIAToolbox, enabling extraction of deep CNN feature representations from whole slide images (WSIs) and image patches for downstream tasks like clustering and visualization.
Key Changes
- New
DeepFeatureExtractorclass extendingSemanticSegmentorfor feature extraction from WSIs and patches - CLI integration with
deep-feature-extractorcommand supporting various configuration options - Comprehensive test suite covering patch-based inference, WSI processing, multi-GPU support, and CLI functionality
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| tiatoolbox/models/engine/deep_feature_extractor.py | Core engine implementation with memory-aware caching and Zarr output support |
| tiatoolbox/models/engine/init.py | Registration of new deep_feature_extractor module |
| tiatoolbox/models/init.py | Export of DeepFeatureExtractor class |
| tiatoolbox/cli/deep_feature_extractor.py | CLI interface for the feature extractor with parameter handling |
| tiatoolbox/cli/init.py | Registration of deep_feature_extractor CLI command |
| tiatoolbox/models/engine/semantic_segmentor.py | Fixed docstring to correctly reference SemanticSegmentor instead of PatchPredictor |
| tests/engines/test_feature_extractor.py | Comprehensive tests for patch/WSI inference, multi-GPU, and CLI |
| tests/engines/test_semantic_segmentor.py | Updated test docstring for clarity |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Default Memory threshold percentage is 80. | ||
| memory_threshold = kwargs.get("memory_threshold", 80) | ||
| vm = psutil.virtual_memory() | ||
| keys = ["probabilities", "coordinates"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| keys = ["probabilities", "coordinates"] | |
| keys = ["features", "coordinates"] |
Using features in inference will require major change in the base class.
measty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've found a few issues that would need to be addressed before merging
| ) | ||
|
|
||
| raw_predictions["coordinates"] = da.concatenate(coordinates, axis=0) | ||
| raw_predictions["probabilities"] = da.concatenate(probabilities, axis=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
various references to probabilities throughout this function, but feature extractor isnt doing anything with probabilities so it's a bit of a misleading name.
Either change it to something more appropriate in here, or maybe change it in the base SemanticSegmentor to something more generic?
| """ | ||
| super().__init__( | ||
| model=model, | ||
| batch_size=batch_size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when a string is passed as a model, it should really go look in the lists of architectures available for TIMMBackbone and CNNBackbone (in architecture.vanilla.py) and build the appropriate model. At the moment it looks in the models defined in pretrainedModel.yaml which is patch prediction/segmentation models. And those don't work in here.
We may even want to consolidate these, so that there is a single unified way of getting models, whatever type they are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, the feature extractor should work with both TimmBackbone("efficientnet_b0", pretrained=True) and "efficientnet_b0" options.
| save_dir: os.PathLike | Path | None = None, | ||
| overwrite: bool = False, | ||
| output_type: str = "dict", | ||
| **kwargs: Unpack[SemanticSegmentorRunParams], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has some usability issues. As a user (and especially trying to put myself into the shoes of someone relatively new to tiatoolbox) I would just want to specify a model, a patch size, a resolution, some data to run it on, and somewhere to save. It should be intuitive how to provide those.
In practice, some of those things are hidden away - i either have to go off into IOSegmentorConfig and figure out how to make one of those, or hunt around in SemanticSegmentorRunParams to find out from these how to specify them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicitly specify input_resolutions and patch_input_shape instead of wrapping it into SemanticSegmentorRunParams.
| runner = CliRunner() | ||
| models_wsi_result = runner.invoke( | ||
| cli.main, | ||
| [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a correct test.
what this is actually doing:
As default model is fcn-tissue-mask, it is running that within the DeepFeatureExtractor engine and saving the patches that come out of that (the probability maps for tissue/non-tissue) in a wierd zarr array that has shape (512*num_patches, 512, 2)
In practice, the deep-feature-extractor cli isnt useable because of the issue i pointed out in one of my other comments with providing model as string; you cant pass name of a valid CNNBackbone or TimmBackbone in there.
| """Deep Feature Extraction Engine for Digital Pathology. | ||
| This module defines the `DeepFeatureExtractor` class, which extends | ||
| `SemanticSegmentor` to extract intermediate CNN feature representations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't just CNN features (most foundation models are transformer)
|
|
||
|
|
||
| class DeepFeatureExtractor(SemanticSegmentor): | ||
| r"""Generic CNN-based feature extractor for digital pathology images. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove CNN-based
| Example: | ||
| -------- | ||
| >>> from tiatoolbox.models.engine.deep_feature_extractor import DeepFeatureExtractor | ||
| >>> extractor = DeepFeatureExtractor(model="resnet50-kather100k") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example wouldn't work as resnet50-kather100k isnt a feature extraction model (and wouldnt run as patch_output_shape isnt defined in it's ioconfig)
| """Deep Feature Extraction Engine for Digital Pathology. | ||
| This module defines the `DeepFeatureExtractor` class, which extends | ||
| `SemanticSegmentor` to extract intermediate CNN feature representations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `SemanticSegmentor` to extract intermediate CNN feature representations | |
| `SemanticSegmentor` to extract intermediate feature representations |
|
|
||
|
|
||
| class DeepFeatureExtractor(SemanticSegmentor): | ||
| r"""Generic CNN-based feature extractor for digital pathology images. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| r"""Generic CNN-based feature extractor for digital pathology images. | |
| r"""Generic deep feature extractor for digital pathology images. |
| Whether to overwrite existing output files. Default is False. | ||
| output_type (str): | ||
| Desired output format. Must be "zarr" or "dict". | ||
| **kwargs (SemanticSegmentorRunParams): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expand the docstring for all the options.
🚀 Summary
This PR introduces an updated,
DeepFeatureExtractorengine, to the TIAToolbox framework. It enables extraction of deep CNN features from whole slide images (WSIs) or image patches for downstream tasks such as clustering, visualization, or training other models. The PR also includes a command-line interface (CLI) for this engine, along with comprehensive tests.✨ Key Features
New Engine:
DeepFeatureExtractorCLI Integration
deep-feature-extractorcommand to TIAToolbox CLI.Unit Tests
Codebase Integration
__init__.pyfiles.