LLM Era · 2017 · demo planned
Transformer
Attention replaces recurrence
Self-attention made long-range context easier to model and made training far more parallelizable.
Roadmap node · demo on the way
Read the wiki entry for Transformer
The interactive demo for this model is on the build list. The live linear regression demo is the template the rest will follow. The wiki entry already covers the full concept, its history, and where this fits in the chain.
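Until the demo lands, the self-attention idea summarized above can be sketched in a few lines. This is a minimal illustration, not the planned demo: it assumes a single head, no masking, and no learned projections (each token vector serves as its own query, key, and value). The function names `softmax` and `self_attention` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption for illustration: queries, keys, and values are the raw
    inputs themselves (no learned W_q, W_k, W_v), single head, no mask.
    Returns (outputs, attention_weights).
    """
    d = len(x[0])
    scale = math.sqrt(d)
    outputs, all_weights = [], []
    # Each token's row is computed independently of the others, which is
    # why this step parallelizes, unlike a recurrent update.
    for q in x:
        # Score this token against every token, near or far, in one step.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        weights = softmax(scores)
        # Output is the attention-weighted average of all token vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` sums to 1 and says how much one token draws on every other token, regardless of distance in the sequence. A real Transformer layer adds learned projections, multiple heads, and positional information on top of this core step.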