LLM Era · 2022 · demo planned

RLHF

Align models to human preference

Reinforcement Learning from Human Feedback (RLHF) reshaped pretrained language models into instruction-following assistants by fine-tuning them to maximize a reward model learned from human preference judgments.
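
To make "learned human preferences" concrete, here is a minimal sketch of the reward-modelling step only, assuming a pairwise (Bradley-Terry style) preference loss and a linear reward over two hand-made features. The feature names, the toy pairs in `PAIRS`, and the helper functions are all hypothetical, chosen for illustration; a real system would score responses with a neural network and then run a separate policy-optimization stage that this sketch does not cover.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reward(weights, features):
    # Linear reward model: a stand-in for a learned neural scorer.
    return sum(w * f for w, f in zip(weights, features))


def preference_loss(weights, chosen, rejected):
    # Pairwise loss: -log sigmoid(r(chosen) - r(rejected)).
    margin = reward(weights, chosen) - reward(weights, rejected)
    return -math.log(sigmoid(margin))


def train(pairs, dim, lr=0.5, steps=200):
    # Plain stochastic gradient descent on the pairwise loss.
    weights = [0.0] * dim
    for _ in range(steps):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient of the loss w.r.t. weights is
            # -(1 - sigmoid(margin)) * (chosen - rejected).
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights


# Hypothetical toy data: each response is [helpfulness, rudeness],
# and each pair is (response the labeler preferred, response rejected).
PAIRS = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.3, 0.9]),
    ([0.9, 0.2], [0.4, 0.3]),
]

weights = train(PAIRS, dim=2)
```

After training on these pairs, the weight on the first feature ends up positive and the weight on the second negative, so the model scores preferred responses above rejected ones. In RLHF this learned score is what the language model is subsequently optimized to increase.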

Roadmap node · demo on the way

Read the wiki entry for RLHF

The interactive demo for this model is on the build list; the live linear regression demo is the template the rest will follow. The wiki entry already covers the full concept, its history, and where it fits in the chain.