Are Transformers All That We Needed? - Live Podcast (virtual)
Guests:
Tanya Dadasheva, Co-Founder & CEO, Nekko AI
Roman Shaposhnik, Co-Founder & CTO, Nekko AI
Host:
Greg Chase, Community Organizer at AIFoundry.org
The leak of Meta’s Llama large language model and the subsequent open-source innovation around it are popularly considered the spark of the current hype cycle around generative AI.
However, we believe the true seed of the current round of AI innovation is the publication of Google's seminal 2017 research paper, "Attention Is All You Need." The simplified transformer architecture it describes allows for more efficient training while producing higher-quality results. By sharing their findings, Google unlocked an entirely new generation of large language models that underpin this first delivery of AI to the masses.
The Attention Paper deserves even more attention because its architecture is useful in many AI applications beyond natural language processing (NLP), such as audio, image generation, time-series analysis, and multi-modal use cases. Its simplicity also enables innovation in specialized hardware such as ASICs, with recently taped-out chips claiming up to 100x the performance of NVIDIA GPUs.
In addition to improving the training of large language models, the transformer architecture has allowed for innovation in inference engines, such as Llama.cpp. If the industry standardizes on a transformer kernel, this can be the basis for further innovation. It is in the industry’s broader interest that this standard kernel be collectively developed as open-source software.
We are going to review the original paper, debate whether transformers are THE architecture we need, discuss what other innovations the architecture has inspired, and share our view of the industry going forward.