LLM Era · 2022 · demo planned
RLHF
Align models to human preference
Reinforcement Learning from Human Feedback (RLHF) reshaped pretrained language models into instruction-following assistants by fine-tuning them against a reward model learned from human preference comparisons.
Roadmap node · demo on the way
Read the wiki entry for RLHF
The interactive demo for RLHF is on the build list; the live linear regression demo is the template the rest will follow. The wiki entry already covers the full concept, its history, and where RLHF fits in the chain.
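As a rough illustration of what "learned human preferences" can mean in practice, the sketch below shows one common formulation: a pairwise (Bradley-Terry) loss that pushes a reward model to score the human-preferred response above the rejected one. The function name `preference_loss` and the scalar reward inputs are assumptions made for this example, not part of the planned demo.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the chosen response beats the rejected one.

    Hypothetical helper: the two rewards stand in for the scalar outputs of a
    reward model on a pair of responses that a human annotator has ranked.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)), written in a form that avoids overflow
    # for large positive or negative margins.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The loss is small when the reward model agrees with the human ranking
    # and large when it disagrees.
    print(preference_loss(2.0, 0.0))
    print(preference_loss(0.0, 2.0))
```

In a full RLHF pipeline, a reward model trained with a loss of this kind would typically then serve as the optimization target for a reinforcement learning step on the language model itself; that step is not shown here.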