LLM Era · 2017 · demo planned

Transformer

Attention replaces recurrence

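Self-attention lets each position attend directly to every other position, which makes long-range context easier to model. Because no step waits on the previous one, as it does with recurrence, training is also much more parallel.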
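
To make the idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not a full Transformer layer: queries, keys, and values are all the raw token vectors (a real model would use learned projection matrices and multiple heads), and the names `softmax`, `self_attention`, and `tokens` are hypothetical.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption for illustration: queries = keys = values = x
    (no learned projections, single head).
    Returns (outputs, weights).
    """
    d = len(x[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    # Each query is handled independently of the others, so this loop
    # could be replaced by one matrix multiplication over all positions.
    for q in x:
        # Similarity of this position to every position, near or far.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)]
        )
    return outputs, weights


# Three toy 2-dimensional token vectors (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` sums to 1 and says how much one position draws from every other position. Distance in the sequence plays no role in the score, which is why long-range context is no harder to reach than a neighboring token.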

Roadmap node · demo on the way

Read the wiki entry for Transformer

The interactive demo for this model is on the build list. The live linear regression demo is the template the rest will follow. The wiki entry already covers the full concept, the history, and where this model fits in the chain.