Training data-efficient image transformers & distillation through attention

Abstract

1. Introduction