README.md: 10 additions & 8 deletions
@@ -453,7 +453,7 @@ model = RegionViT(
     dim = (64, 128, 256, 512),      # tuple of size 4, indicating dimension at each stage
     depth = (2, 2, 8, 2),           # depth of the region to local transformer at each stage
     window_size = 7,                # window size, which should be either 7 or 14
-    num_classes = 1000,             # number of output lcasses
+    num_classes = 1000,             # number of output classes
     tokenize_local_3_conv = False,  # whether to use a 3 layer convolution to encode the local tokens from the image. the paper uses this for the smaller models, but uses only 1 conv (set to False) for the larger models
     use_peg = False,                # whether to use positional generating module. they used this for object detection for a boost in performance
 )
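For context, here is a minimal sketch of how the corrected snippet fits into a full forward pass. The `from vit_pytorch.regionvit import RegionViT` import path and the 224x224 input size are assumptions based on this repository's usual conventions and should be checked against the current README.

```python
import torch
from vit_pytorch.regionvit import RegionViT  # import path assumed from the package layout

model = RegionViT(
    dim = (64, 128, 256, 512),      # dimension at each of the 4 stages
    depth = (2, 2, 8, 2),           # depth of the region-to-local transformer at each stage
    window_size = 7,                # window size, either 7 or 14
    num_classes = 1000,             # number of output classes
    tokenize_local_3_conv = False,  # 3-layer conv tokenizer (used for the smaller models in the paper)
    use_peg = False,                # positional generating module, used for object detection
)

img = torch.randn(1, 3, 224, 224)   # assumed input resolution
pred = model(img)                    # (1, 1000) class logits
```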
@@ -496,6 +496,8 @@ pred = nest(img) # (1, 1000)
 
 A new <a href="https://arxiv.org/abs/2111.06377">Kaiming He paper</a> proposes a simple autoencoder scheme where the vision transformer attends to a set of unmasked patches, and a smaller decoder tries to reconstruct the masked pixel values.
 
+<a href="https://www.youtube.com/watch?v=LKixq2S2Pz8">DeepReader quick paper review</a>
+
 You can use it with the following code
 
 ```python
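The diff context cuts off at the opening of that code block. As a hedged sketch of the masked-autoencoder usage the surrounding prose describes, the `ViT`/`MAE` wrapper is typically combined as below; the argument names follow the library's conventions but are assumptions here and should be verified against the current API.

```python
import torch
from vit_pytorch import ViT, MAE

# the vision transformer that will attend to the unmasked patches
v = ViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 8,
    mlp_dim = 2048
)

mae = MAE(
    encoder = v,           # encoder operating on unmasked patches
    masking_ratio = 0.75,  # the paper recommends masking 75% of the patches
    decoder_dim = 512,     # smaller decoder, as described in the paper
    decoder_depth = 6
)

images = torch.randn(8, 3, 256, 256)
loss = mae(images)         # reconstruction loss on the masked pixel values
loss.backward()
```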
@@ -809,13 +811,13 @@ Coming from computer vision and new to transformers? Here are some resources tha
 ## Citations
 ```bibtex
 @article{hassani2021escaping,
-    title = {Escaping the Big Data Paradigm with Compact Transformers},
-    author = {Ali Hassani and Steven Walton and Nikhil Shah and Abulikemu Abuduweili and Jiachen Li and Humphrey Shi},
-    year = 2021,
-    url = {https://arxiv.org/abs/2104.05704},
-    eprint = {2104.05704},
-    archiveprefix = {arXiv},
-    primaryclass = {cs.CV}
+    title = {Escaping the Big Data Paradigm with Compact Transformers},
+    author = {Ali Hassani and Steven Walton and Nikhil Shah and Abulikemu Abuduweili and Jiachen Li and Humphrey Shi},