[FEATURE] Model Request... Add CSATv2 to timm model zoo #2622

@gusdlf93

Description

Is your feature request related to a problem? Please describe.
I would like to contribute a new high-resolution ImageNet-1K model, CSATv2, to the timm model zoo.

Right now, CSATv2 is available as a Hugging Face model and I’ve integrated it locally into timm for evaluation, but it is not part of the official timm registry.
I can only report the metrics from my own validate.py runs, and I’m not sure if they fully match the “official” timm evaluation environment (flags, batch size, data loader settings, etc.).

It would be helpful to have CSATv2 available as a timm model with a standardized evaluation setup, so that its accuracy/speed trade-offs are directly comparable to existing timm architectures at 512×512 resolution.

Describe the solution you'd like
I’d like to add CSATv2 as a new timm model:

The core idea of CSATv2 is to perform lightweight frequency-domain compression before the main backbone, discarding redundant spatial information and reducing downstream computation. This allows very high throughput at 512×512 while maintaining competitive accuracy.
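To make the registration part concrete, here is a minimal sketch of what I have in mind, assuming timm's standard `register_model` decorator; `CSATv2` stands in for the model class from my local port and `csatv2` is a placeholder variant name, neither of which exists in timm today:

```python
import torch
from timm.models import register_model

# Placeholder module: the actual PR would port the CSATv2 architecture
# from the Hugging Face release instead of this stub.
class CSATv2(torch.nn.Module):
    def __init__(self, num_classes: int = 1000, **kwargs):
        super().__init__()
        self.head = torch.nn.LazyLinear(num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x.flatten(1))

@register_model
def csatv2(pretrained: bool = False, **kwargs) -> torch.nn.Module:
    model = CSATv2(**kwargs)
    if pretrained:
        # Weights would be loaded from the Hugging Face checkpoint here.
        raise NotImplementedError('pretrained loading not wired up in this sketch')
    return model
```

Once registered, `timm.create_model('csatv2')` and the stock validate.py / benchmark.py scripts would pick the model up automatically.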

Summary of current results from my local validate.py runs:

[Screenshot of validate.py JSON summary: 80.02% Top-1 / 94.9% Top-5 on ImageNet-1K at 512×512, 11.1M parameters]

Throughput (measured on my hardware):
2,800 images/second on a single NVIDIA A6000 GPU at 512×512 input resolution.
Batch size and the exact benchmarking script can be adjusted to match timm's preferred protocol; a hypothetical invocation is sketched below.
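On the benchmarking side, I assume benchmark.py is the standard timm throughput script; something like the following once the model is registered (`csatv2` is still a placeholder name, and the batch size is just my guess):

```bash
# Hypothetical run; the flags are stock benchmark.py options.
python benchmark.py --model csatv2 --bench inference \
    --img-size 512 --batch-size 256 --amp --channels-last
```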

Additional context
The main motivation of CSATv2 is to push high-resolution inference speed while keeping ImageNet-1K accuracy in a reasonable range. By restricting information in the frequency domain early in the pipeline, the backbone operates on a more compact representation (see the sketch after the list below), which results in:

- 11.1M parameters
- 80.02% Top-1 / 94.9% Top-5 accuracy on ImageNet-1K @ 512×512
- 2,800 img/s on a single A6000 GPU (512×512 inputs)
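For illustration only, the sketch below shows the general kind of frequency-domain restriction described above, written as FFT-based spectral pooling in PyTorch. This is my rough reading of the idea for discussion purposes, not the actual CSATv2 stem; the class name and the reduction factor are made up:

```python
import torch
import torch.nn as nn

class SpectralPoolStem(nn.Module):
    """Illustrative low-pass stem (an assumption, not the real CSATv2 code):
    crop the 2D spectrum to its low-frequency block so the backbone operates
    on a smaller, less redundant spatial representation."""

    def __init__(self, scale: int = 2):
        super().__init__()
        self.scale = scale  # spatial reduction factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        h, w = H // self.scale, W // self.scale
        f = torch.fft.rfft2(x, norm='ortho')        # (B, C, H, W//2 + 1)
        # Keep the low positive and negative vertical frequencies and the
        # low horizontal frequencies; discard everything else.
        top = f[..., : h // 2, : w // 2 + 1]
        bot = f[..., -(h - h // 2):, : w // 2 + 1]
        f_small = torch.cat([top, bot], dim=-2)     # (B, C, h, w//2 + 1)
        return torch.fft.irfft2(f_small, s=(h, w), norm='ortho')

# A 512x512 input becomes its 256x256 low-frequency approximation:
# SpectralPoolStem()(torch.randn(1, 3, 512, 512)).shape == (1, 3, 256, 256)
```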

Currently, my validate.py setup only outputs the JSON-style summary shown above.
If there is a standard timm evaluation environment or recommended set of flags (e.g., exact --batch-size, --workers, --amp, --channels-last, etc.) for both accuracy and throughput benchmarks, please let me know.
I will happily re-run the evaluation under your recommended settings and update the numbers accordingly in the PR.
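Concretely, I would expect the accuracy run to look something like the following; the flag values are placeholders pending your recommendation, and `csatv2` assumes the model has been registered as sketched above:

```bash
# Placeholder settings; the flags themselves are stock validate.py options.
python validate.py /path/to/imagenet --model csatv2 --img-size 512 \
    --batch-size 256 --workers 8 --amp --channels-last
```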
