Evaluation is catching up to agentic behavior: reliability, not just likability.
arXiv
Research Desk
arXiv Weekly: RLHF evaluation is shifting from preference to reliability
New work suggests preference tuning must be paired with robustness metrics if models are to operate safely as agents.