Papers
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
Datasets (16)
benchflow/env0-experiment-trajectories
Updated • 6.67k
benchflow/env0-qwen35-9b-mobile300-prime-sft
Viewer • Updated • 300 • 31
benchflow/env0-qwen35-9b-full2003-prime-sft
Preview • Updated • 40
benchflow/env0-qwen35-9b-full1703-prime-sft
Viewer • Updated • 1.7k • 42
benchflow/env0-prime-sft-smoke10-arrow
Viewer • Updated • 10 • 32
benchflow/env0-prime-sft-smoke10
Viewer • Updated • 10 • 30
benchflow/skillsbench
Updated • 4.31k • 6
benchflow/skillsbench-leaderboard
Updated • 12.4k • 1
benchflow/benchmarks
Updated • 48
benchflow/skillsbench-research-artifacts
Updated • 35