Sample-efficient and uncertainty-aware RLHF

Barna Pásztor

Active Learning, Large Language Models (LLMs), Preference Learning, Reinforcement Learning, Uncertainty Quantification

Reinforcement learning from human feedback (RLHF) relies on reward models trained from preference data. Collecting high-quality preference labels is expensive, however, and reward models trained on limited data can be unreliable. Our two projects address these connected problems from complementary angles.

ActiveUltraFeedback introduces a modular active-learning pipeline that uses uncertainty-aware reward estimates to select informative response pairs for labelling, reducing the preference data needed to reach strong downstream performance by 3-6x.
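To make the selection idea concrete, here is a minimal sketch of one uncertainty-aware acquisition rule. It is not the ActiveUltraFeedback implementation; the reward-model ensemble, the Bradley-Terry preference model, and all names are assumptions. Given each ensemble member's reward estimates for a prompt's candidate responses, it picks the pair whose preference label the ensemble disagrees on most.

```python
import numpy as np

def select_most_uncertain_pair(reward_samples: np.ndarray) -> tuple[int, int]:
    """Pick the response pair whose preference label the ensemble is
    least sure about.

    reward_samples: array of shape (n_ensemble, n_responses) holding
    each ensemble member's reward estimate for every candidate
    response to one prompt.
    """
    n = reward_samples.shape[1]
    best_pair, best_score = (0, 1), -1.0
    for i in range(n):
        for j in range(i + 1, n):
            # Bradley-Terry probability that response i beats response j,
            # computed separately under each ensemble member's rewards.
            p = 1.0 / (1.0 + np.exp(-(reward_samples[:, i] - reward_samples[:, j])))
            # Ensemble disagreement on the win probability; higher means
            # labelling this pair is more informative.
            score = float(p.var())
            if score > best_score:
                best_pair, best_score = (i, j), score
    return best_pair

# Example: 4 ensemble members, 5 candidate responses to one prompt.
rng = np.random.default_rng(0)
rewards = rng.normal(size=(4, 5))
print(select_most_uncertain_pair(rewards))
```

Disagreement over the win probability is only one possible acquisition score; information-gain or variance-based criteria fit the same interface.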

RewardUQ complements this by studying how to measure and compare uncertainty in reward models, providing a unified framework for evaluating both predictive accuracy and calibration.
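One standard calibration diagnostic for pairwise preference predictions is expected calibration error. The sketch below is an illustrative metric rather than RewardUQ's API; the function name and the equal-width binning scheme are assumptions.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Expected calibration error of pairwise preference predictions.

    probs:  model's predicted probability that response A is preferred.
    labels: 1.0 if annotators actually preferred A, else 0.0.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Assign each prediction to an equal-width confidence bin; clip so
    # that a prediction of exactly 1.0 falls into the last bin.
    bin_idx = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():
            # |mean confidence - empirical win rate|, weighted by bin mass.
            ece += in_bin.mean() * abs(probs[in_bin].mean() - labels[in_bin].mean())
    return float(ece)

# Example: a well-calibrated predictor's ECE should be near zero.
rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)
y = (rng.uniform(size=10_000) < p).astype(float)
print(expected_calibration_error(p, y))
```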

Together, these works show how uncertainty can be used both as a practical signal for efficient preference data generation and as a diagnostic tool for reliable reward modelling.

Project Page.