Trimming the Long-Tail of Visual World Modeling Evaluation Paper • 2606.24256 • Published 8 days ago • 35
GBC: Gradient-Based Connections for Optimizing Multi-Agent Systems Paper • 2606.28187 • Published 5 days ago • 11
PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems Paper • 2606.22388 • Published 10 days ago • 96
Building Social World Models with Large Language Models Paper • 2606.11482 • Published 22 days ago • 2
BenSyc: Benchmarking Conversational Sycophancy and Human Alignment in LLMs for Bengali Contexts Paper • 2606.10061 • Published 23 days ago
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints Paper • 2606.05622 • Published 27 days ago • 44
Advancing Creative Physical Intelligence in Large Multimodal Models Paper • 2605.26396 • Published May 25 • 21
RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably Paper • 2605.15514 • Published May 15
GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions Paper • 2605.15764 • Published May 15 • 4
Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance Paper • 2605.15012 • Published May 14 • 4
RouteProfile: Elucidating the Design Space of LLM Profiles for Routing Paper • 2605.00180 • Published Apr 30 • 30
Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation Paper • 2605.12975 • Published May 13 • 9
The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes Paper • 2605.11182 • Published May 11 • 5
Who Prices Cognitive Labor in the Age of Agents? Compute-Anchored Wages Paper • 2605.05558 • Published May 8 • 3
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing Paper • 2605.02910 • Published May 6 • 23
Agentic AI Systems Should Be Designed as Marginal Token Allocators Paper • 2605.01214 • Published May 2 • 4
Active Prompting with Chain-of-Thought for Large Language Models Paper • 2302.12246 • Published Feb 23, 2023
LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling Paper • 2604.11748 • Published Apr 15 • 14