LLM Era · 2022 · demo planned

RLHF

Align models to human preferences

Reinforcement Learning from Human Feedback reshaped pretrained language models into instruction-following assistants by optimizing against learned human preferences.
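As a rough illustration of what "learned human preferences" can mean in practice, here is a minimal sketch of a pairwise (Bradley-Terry style) preference loss, a formulation commonly used to train RLHF reward models. This page does not say which formulation the wiki entry or the planned demo uses, so treat it as an assumption. The function name `preference_loss` and the scalar reward inputs are hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the human-preferred response wins.

    Assumes a Bradley-Terry model:
        P(chosen beats rejected) = sigmoid(reward_chosen - reward_rejected)
    Both rewards are hypothetical scalar scores from a reward model.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch on the sign
    # of the margin so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The loss is small when the reward model agrees with the human label...
    print(preference_loss(2.0, 0.0))
    # ...and larger when it ranks the rejected response higher.
    print(preference_loss(0.0, 2.0))
```

In a setup like this, a reward model trained to minimize the loss over many labeled comparisons supplies the signal that the language model is later optimized against.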

Roadmap node · demo on the way

Read the wiki entry for RLHF

The interactive demo for this model is on the build list; the live linear regression demo is the template the rest will follow. The wiki entry already covers the full concept, its history, and where it fits in the chain.
