Today we are delighted to announce that our suite of tools now includes Transformers!🤖
We are also showcasing our new Import Model feature, which demonstrates how you can import the latest models and deploy them to the edge!
This is exciting because transformers were previously an area reserved for deep learning on large-scale datasets, producing models too large to deploy at the edge.
What are Transformers?
Transformers are a powerful and efficient technique for processing data, pioneered in large-scale NLP applications. They are based on the idea of using a set of self-attention layers that learn to focus on the most relevant features in the data, which makes them well suited to natural language processing tasks and vision systems.[1]
Transformers can perform tasks similar to Recurrent Neural Networks: attention mechanisms let the model focus on specific parts of the input, allowing it to capture longer-term dependencies in the data.
Self-attention works by assigning a weight to each part of the input sequence and using those weights to determine which parts of the sequence should be given more importance when making predictions.
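To make the weighting idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. The shapes, random projection matrices, and function names are illustrative assumptions, not part of any Edge Impulse API:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: each row becomes a probability distribution.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (seq_len, seq_len) pairwise relevance
    weights = softmax(scores, axis=-1)       # each row sums to 1: "how much to attend"
    return weights @ v, weights              # weighted mix of values, plus the weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` tells you how strongly one timestep attends to every other timestep, which is exactly what lets the model pick out the most relevant parts of the input.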
Why is this important?
There has been growing interest in Transformers and their applications at the edge. Most recently we noted that Georgi Gerganov from ViewRay, Inc. has posted about the “growing interest for doing efficient transformer model inference on-device (i.e. at the edge).” Georgi is well known for his impressive whisper.cpp and llama.cpp projects, which bring Whisper and LLaMA inference to edge devices and are published on GitHub at https://github.com/ggerganov
Transformers have become an indispensable staple in the modern deep learning stack[2], but they have typically been challenging to deploy to edge devices—until today!
How can I get started?
The key building block of a transformer is the Keras MultiHeadAttention layer, which we now support as part of a recent release. You can add this layer through Expert mode and use it in place of your existing layers.
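As a rough sketch of what that Expert-mode code can look like, here is a tiny Keras model built around `MultiHeadAttention`. The input shape (64 timesteps of 3 sensor channels), head count, and output classes are hypothetical placeholders for your own project's settings:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical time-series input: 64 timesteps, 3 sensor channels.
inputs = keras.Input(shape=(64, 3))
# Self-attention: query, key, and value all come from the same sequence.
x = layers.MultiHeadAttention(num_heads=4, key_dim=32)(inputs, inputs)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(2, activation="softmax")(x)  # e.g. two classes
model = keras.Model(inputs, outputs)
```

Passing the same tensor as both query and value is what makes this *self*-attention; cross-attention would attend from one sequence to another.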
You now have access to the Timeseries Transformer model, as first described in Timeseries classification with a Transformer model by Theodoros Ntakouris, based on material from Attention Is All You Need arXiv:1706.03762v5 [cs.CL] [3].
We have run preliminary testing with the Timeseries classification with a Transformer model and are very excited to see what you do with Transformers: it is a research-led topic that pushes the boundaries of the state of the art.
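For orientation, a transformer encoder block for time series in the spirit of that tutorial combines self-attention with a small feed-forward stage, normalization, and residual connections. The hyperparameters below are illustrative assumptions; consult the tutorial itself for the exact architecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0.0):
    # Self-attention sub-block with residual connection.
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(inputs, inputs)
    x = layers.Dropout(dropout)(x)
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    res = x + inputs

    # Position-wise feed-forward sub-block (1x1 convolutions), second residual.
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(res)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    return x + res

# Hypothetical input: 64 timesteps, 3 channels; output keeps the same shape,
# so blocks can be stacked.
inputs = keras.Input(shape=(64, 3))
x = transformer_encoder(inputs, head_size=32, num_heads=2, ff_dim=4)
```

Because the block preserves the input shape, several of these encoders can be stacked before a pooling layer and classification head.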
Introducing Import Model!

To get the model deployed quickly, we recommend using our all-new Import Model feature with a TFLite version of the Timeseries classification with a Transformer model, which we have included in our documentation as an example.
We do expect some challenges to arise as people explore Transformers and our new Import Model feature. Please post any questions you have and any projects you create on our forum, or tag @EdgeImpulse on our social media channels!
If you are a researcher and wish to reference Edge Impulse in your papers please see: Edge Impulse Partners with Harvard SEAS to Release Academic Paper
Further Reading on Transformers:
[1] Dive into Deep Learning (2022). https://d2l.ai/chapter_attention-mechanisms-and-transformers/self-attention-and-positional-encoding.html
[2] Tay, Y., Dehghani, M., Abnar, S., Shen, Y., Bahri, D., Pham, P., ... & Metzler, D. (2020). Long range arena: A benchmark for efficient transformers. arXiv preprint arXiv:2011.04006.
[3] Attention Is All You Need. arXiv:1706.03762v5 [cs.CL]
[4] Transferring Knowledge on Time Series with the Transformer. https://wandb.ai/covid/covid_forecast/reports/Transferring-Knowledge-on-Time-Series-with-the-Transformer--VmlldzoxNDEzOTk