[FEATURE] Model Request... Add CSATv2 to timm model zoo #2622

@gusdlf93

Description

Is your feature request related to a problem? Please describe.
I would like to contribute a new high-resolution ImageNet-1K model, CSATv2, to the timm model zoo.

Right now, CSATv2 is available as a Hugging Face model and I’ve integrated it locally into timm for evaluation, but it is not part of the official timm registry.
I can only report the metrics from my own validate.py runs, and I’m not sure if they fully match the “official” timm evaluation environment (flags, batch size, data loader settings, etc.).

It would be helpful to have CSATv2 available as a timm model with a standardized evaluation setup, so that its accuracy/speed trade-offs are directly comparable to existing timm architectures at 512×512 resolution.

Describe the solution you'd like
I’d like to add CSATv2 as a new timm model:

The core idea of CSATv2 is to perform lightweight frequency-domain compression before the main backbone, discarding redundant spatial information and reducing downstream computation. This allows very high throughput at 512×512 while maintaining competitive accuracy.
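To make the registration part concrete, here is a minimal sketch of what I have in mind, assuming timm's standard `register_model` decorator; `CSATv2` stands in for the model class from my local port and `csatv2` is a placeholder variant name, neither of which exists in timm today:

```python
import torch
from timm.models import register_model

# Placeholder module: the actual PR would port the CSATv2 architecture
# from the Hugging Face release instead of this stub.
class CSATv2(torch.nn.Module):
    def __init__(self, num_classes: int = 1000, **kwargs):
        super().__init__()
        self.head = torch.nn.LazyLinear(num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x.flatten(1))

@register_model
def csatv2(pretrained: bool = False, **kwargs) -> torch.nn.Module:
    model = CSATv2(**kwargs)
    if pretrained:
        # Weights would be loaded from the Hugging Face checkpoint here.
        raise NotImplementedError('pretrained loading not wired up in this sketch')
    return model
```

Once registered, `timm.create_model('csatv2')` and the stock validate.py / benchmark.py scripts would pick the model up automatically.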

Summary of current results from my local validate.py runs:

[Screenshot of validate.py JSON summary: 80.02% Top-1 / 94.9% Top-5 on ImageNet-1K at 512×512, 11.1M parameters]

Throughput (measured on my hardware):
2,800 images/second on a single NVIDIA A6000 GPU at 512×512 input resolution.
Batch size and the exact benchmarking script can be adjusted to match timm's preferred protocol; a hypothetical invocation is sketched below.
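On the benchmarking side, I assume benchmark.py is the standard timm throughput script; something like the following once the model is registered (`csatv2` is still a placeholder name, and the batch size is just my guess):

```bash
# Hypothetical run; the flags are stock benchmark.py options.
python benchmark.py --model csatv2 --bench inference \
    --img-size 512 --batch-size 256 --amp --channels-last
```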

Additional context
The main motivation of CSATv2 is to push high-resolution inference speed while keeping ImageNet-1K accuracy in a reasonable range. By restricting information in the frequency domain early in the pipeline, the backbone operates on a more compact representation (see the sketch after the list below), which results in:

- 11.1M parameters
- 80.02% Top-1 / 94.9% Top-5 accuracy on ImageNet-1K @ 512×512
- 2,800 img/s on a single A6000 GPU (512×512 inputs)
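For illustration only, the sketch below shows the general kind of frequency-domain restriction described above, written as FFT-based spectral pooling in PyTorch. This is my rough reading of the idea for discussion purposes, not the actual CSATv2 stem; the class name and the reduction factor are made up:

```python
import torch
import torch.nn as nn

class SpectralPoolStem(nn.Module):
    """Illustrative low-pass stem (an assumption, not the real CSATv2 code):
    crop the 2D spectrum to its low-frequency block so the backbone operates
    on a smaller, less redundant spatial representation."""

    def __init__(self, scale: int = 2):
        super().__init__()
        self.scale = scale  # spatial reduction factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        h, w = H // self.scale, W // self.scale
        f = torch.fft.rfft2(x, norm='ortho')        # (B, C, H, W//2 + 1)
        # Keep the low positive and negative vertical frequencies and the
        # low horizontal frequencies; discard everything else.
        top = f[..., : h // 2, : w // 2 + 1]
        bot = f[..., -(h - h // 2):, : w // 2 + 1]
        f_small = torch.cat([top, bot], dim=-2)     # (B, C, h, w//2 + 1)
        return torch.fft.irfft2(f_small, s=(h, w), norm='ortho')

# A 512x512 input becomes its 256x256 low-frequency approximation:
# SpectralPoolStem()(torch.randn(1, 3, 512, 512)).shape == (1, 3, 256, 256)
```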

Currently, my validate.py setup only outputs the JSON-style summary shown above.
If there is a standard timm evaluation environment or recommended set of flags (e.g., exact --batch-size, --workers, --amp, --channels-last, etc.) for both accuracy and throughput benchmarks, please let me know.
I will happily re-run the evaluation under your recommended settings and update the numbers accordingly in the PR.
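Concretely, I would expect the accuracy run to look something like the following; the flag values are placeholders pending your recommendation, and `csatv2` assumes the model has been registered as sketched above:

```bash
# Placeholder settings; the flags themselves are stock validate.py options.
python validate.py /path/to/imagenet --model csatv2 --img-size 512 \
    --batch-size 256 --workers 8 --amp --channels-last
```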
