Backlog
Why?
Vision Transformers (ViTs) have shown strong results on computer vision tasks, and efficient variants such as SwiftFormer are well suited to deployment on compact devices. They offer:
- Superior accuracy across various tasks compared to traditional CNNs.
- Comparable throughput to CNNs on certain hardware platforms, making them viable for edge computing.
What?
The goal is to assess the performance of efficient Vision Transformers, including SwiftFormer, on edge accelerators such as the IMX500, focusing on:
- Benchmarking their efficiency, accuracy, and power consumption.
- Verifying how well the IMX500 currently supports such models.
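As a starting point for the benchmarking item above, here is a minimal host-side sketch of the three metrics (latency/throughput, accuracy, energy). It is a sketch under stated assumptions, not the IMX500 workflow: `benchmark`, `energy_per_inference_mj`, and `top1_accuracy` are hypothetical helper names, the inference step is any zero-argument Python callable, and average power is assumed to come from an external measurement. On-device numbers would have to be collected through the accelerator's own tooling.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def benchmark(infer: Callable[[], object], warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Time a zero-argument inference callable and summarize its latency."""
    if runs < 2:
        raise ValueError("runs must be at least 2")
    # Warm-up iterations are discarded so one-time setup costs do not skew results.
    for _ in range(warmup):
        infer()
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    mean_ms = statistics.fmean(latencies_ms)
    # Nearest-rank style index for the 95th percentile of the sorted samples.
    p95_index = min(runs - 1, int(round(0.95 * (runs - 1))))
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "throughput_fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def energy_per_inference_mj(avg_power_mw: float, mean_latency_ms: float) -> float:
    """Energy in millijoules: power in mW multiplied by time in seconds."""
    return avg_power_mw * (mean_latency_ms / 1000.0)


def top1_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


if __name__ == "__main__":
    # Placeholder workload standing in for a real model call (assumption).
    stats = benchmark(lambda: sum(range(10_000)))
    print(stats)
    # Assumed 200 mW average power reading, for illustration only.
    print(energy_per_inference_mj(avg_power_mw=200.0, mean_latency_ms=stats["mean_ms"]))
```

Reporting median and 95th-percentile latency alongside the mean makes it easier to compare a ViT with a CNN baseline when occasional slow runs occur.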
References
- MobileViT: Light-Weight, General-Purpose, and Mobile-Friendly Vision Transformer -- presentation of the article