How Good Can Linear Models Be for Time-Series Forecasting? Paper • 2606.27282 • Published 6 days ago • 7
CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies Paper • 2606.16613 • Published 16 days ago • 9
Mitigating Reward Hacking in RLHF via Advantage Sign Robustness Paper • 2604.02986 • Published Apr 3 • 3
Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests Paper • 2606.07379 • Published 26 days ago • 5
HakushoBench: A Japanese Chart and Table VQA Benchmark from Governmental White Papers Paper • 2606.01132 • Published May 31 • 6