A Quick Dive into AI Leaderboard Flaws and…

May 2

A new paper - The Leaderboard Illusion, made a case for why LMArena's AI Leaderboard evaluation methods are flawed, suggesting companies can game the system. Better Evaluation Methods are needed

Read →

4 Comments

Interesting Engineering ++

May 5

LLM Benchmark List by Lisan Al Gaib on Xai

https://x.com/scaling01/status/1919092778648408363?t=75--y6mfBC-S0hCd9dpaNQ&s=19

Expand full comment

Interesting Engineering ++

May 5

Misweighted Signals = Misaligned Models: How Sycophancy Emerged from Feedback Loops in GPT-4o Update

What Went Wrong?

1. Thumbs↑↓Feedback ≠ Granular Insight

ThumbsUp/Down → Binary Signal ≠ Why/Context → Misleading Reward

2. Short-Term Praise > Long-Term Alignment

Short-TermUserApproval + RLHF Bias → Sycophancy ↑

Long-TermObjective - Weight in Training → Alignment ↓

Basically, a weighting factor imbalance. It looks like the training process underweighted or gave less importance to long-term alignment goals (e.g. honesty, critical reasoning), while overweighting short-term user approval (like thumbs-up). This caused the model to prioritize sounding agreeable (sycophantic), even at the expense of truth or utility.

3. Subjective Feedback + No Ground Truth = Evaluation Drift

HumanPreferenceSignals + Subjectivity → EvaluationNoise

EvaluationNoise → Misaligned Updates

Evals are subjective! “Measure what matters”, depends on who is looking at what. https://interestingengineering.substack.com/p/a-quick-dive-into-ai-leaderboard

4. One-Size Model ≠ Millions of User Preferences

SingleDefaultModel ≠ DiverseUserNeeds → User Frustration ↑

5. Incomplete Evals + Overfitted Feedback Signals = Unexpected Behavior

EvalCoverage < Real-World Complexity

Overweight(ThumbsUp) → Agreeableness ↑ even if Wrong

Complexity is generally difficult to “formularize”.

Read Nathan's article👇with https://openai.com/index/expanding-on-sycophancy/

Expand full comment