SLM Agent Leaderboard

> Evaluating Small Language Models (<10B parameters)
> Task completion within a 20-step budget
> Execution on constrained hardware

Rank	Model Identifier	Size (Params)	Success %	Avg Steps	Observed Behavior

Status: Active

* Lower Avg Steps = higher efficiency
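The Success % and Avg Steps columns can be derived from per-episode results. The sketch below assumes each episode yields a hypothetical `(solved, steps_used)` pair and that Avg Steps is averaged over all episodes, with a failed run counted at its full budget; the page does not state which convention it uses, so treat this as an illustration only.

```python
from statistics import mean


def summarize(results: list[tuple[bool, int]]) -> tuple[float, float]:
    """Return (success %, avg steps) from hypothetical (solved, steps_used) pairs.

    Assumption: avg steps is taken over ALL episodes, so a failed run
    contributes its full step budget. The leaderboard may differ.
    """
    if not results:
        raise ValueError("need at least one episode")
    # True counts as 1 and False as 0 when summed.
    success_pct = 100 * sum(solved for solved, _ in results) / len(results)
    avg_steps = mean(steps for _, steps in results)
    return success_pct, avg_steps


# Example: 30 episodes, half solved in 5 steps, half failing at the 20-step cap.
episodes = [(True, 5)] * 15 + [(False, 20)] * 15
print(summarize(episodes))
```

Averaging only over successful episodes instead would reward a model that solves few tasks quickly, which is one reason to fix the convention explicitly.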

Metric_01

Evaluates the ability to complete tasks within a hard cap of 20 steps.
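A hard step cap means the harness stops the episode after the 20th action, solved or not. The minimal sketch below illustrates that loop; the `Env` protocol, `run_episode`, and the toy environment are hypothetical stand-ins (not the TextWorld API), and "done" is assumed to mean the task was solved.

```python
from typing import Callable, Protocol

MAX_STEPS = 20  # hard cap stated on this page


class Env(Protocol):
    """Hypothetical environment interface (not the real TextWorld API)."""

    def reset(self) -> str: ...
    def step(self, action: str) -> tuple[str, bool]: ...


def run_episode(env: Env, policy: Callable[[str], str],
                max_steps: int = MAX_STEPS) -> tuple[bool, int]:
    """Run one episode and return (solved, steps_used); never exceeds max_steps."""
    obs = env.reset()
    for step in range(1, max_steps + 1):
        action = policy(obs)          # the model picks the next action
        obs, done = env.step(action)  # assumption: done == task solved
        if done:
            return True, step
    return False, max_steps           # budget exhausted without success


class ToyEnv:
    """Toy stand-in: the task is solved once the agent says 'open door'."""

    def reset(self) -> str:
        return "You are in a room. There is a door."

    def step(self, action: str) -> tuple[str, bool]:
        if action == "open door":
            return "The door opens.", True
        return "Nothing happens.", False


print(run_episode(ToyEnv(), lambda obs: "open door"))  # solved on step 1
print(run_episode(ToyEnv(), lambda obs: "look"))       # hits the 20-step cap
```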

Metric_02

Tests whether the model adheres to tool interfaces or hallucinates invalid parameters.
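One way to check tool-interface adherence is to validate each call against a declared schema and flag unknown tools, missing arguments, and invented parameters. This is a minimal sketch assuming a hypothetical JSON call format `{"tool": ..., "args": {...}}` and a hypothetical `TOOLS` registry; the leaderboard's actual interface is not described on this page.

```python
import json

# Hypothetical tool registry: tool name -> {parameter name: expected type}.
TOOLS = {
    "go": {"direction": str},
    "take": {"item": str},
    "open": {"target": str},
}


def validate_tool_call(raw: str) -> tuple[bool, str]:
    """Return (is_valid, reason) for a JSON tool call emitted by the model."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(call, dict):
        return False, "call must be a JSON object"
    name = call.get("tool")
    args = call.get("args", {})
    if not isinstance(name, str) or name not in TOOLS:
        return False, f"unknown tool: {name!r}"
    if not isinstance(args, dict):
        return False, "args must be an object"
    schema = TOOLS[name]
    extra = set(args) - set(schema)
    if extra:  # parameters the tool does not accept, i.e. hallucinated
        return False, f"hallucinated parameter(s): {sorted(extra)}"
    missing = set(schema) - set(args)
    if missing:
        return False, f"missing parameter(s): {sorted(missing)}"
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            return False, f"{key} should be {expected.__name__}"
    return True, "ok"


print(validate_tool_call('{"tool": "go", "args": {"direction": "north"}}'))
print(validate_tool_call('{"tool": "go", "args": {"direction": "north", "speed": 3}}'))
```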

Metric_03

Analyzes error recovery and loop detection over short action sequences.
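Loop detection in a short episode can be as simple as checking whether the most recent actions form a small cycle repeated several times (e.g. `look, look, look` or `go north, go south` back and forth). The sketch below shows one such heuristic; `detect_loop` and its `max_period`/`repeats` thresholds are hypothetical choices, not the leaderboard's documented method.

```python
def detect_loop(actions: list[str], max_period: int = 3, repeats: int = 3) -> bool:
    """Return True if the tail of `actions` is a cycle of length <= max_period
    repeated `repeats` times in a row (a hypothetical heuristic)."""
    for period in range(1, max_period + 1):
        window = period * repeats
        if len(actions) < window:
            continue  # not enough history to see this many repeats
        tail = actions[-window:]
        cycle = tail[:period]
        # Every position in the tail must match the candidate cycle.
        if all(tail[i] == cycle[i % period] for i in range(window)):
            return True
    return False


print(detect_loop(["look", "look", "look"]))
print(detect_loop(["go north", "go south"] * 3))
print(detect_loop(["look", "take key", "open door"]))
```

A harness could use such a check to count how often a model gets stuck, or to see whether it breaks the cycle on its own after an error.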

Metric_04

Benchmarked on constrained, low-VRAM local consumer hardware.

Environment: TextWorld

Sample Size: 30 episodes per model

Constraint: 20-step budget

Validation: Strict regex matching
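"Strict" regex matching plausibly means the model's whole reply must match the action grammar, so any extra chatter invalidates the step. The sketch below assumes a hypothetical, deliberately small command grammar (`ACTION_RE`) and uses `re.fullmatch`; the leaderboard's real patterns are not published here.

```python
import re
from typing import Optional

# Hypothetical action grammar: a fixed verb, optionally followed by lowercase words.
ACTION_RE = re.compile(
    r"(look|inventory|go (?:north|south|east|west)"
    r"|(?:take|drop|open) [a-z]+(?: [a-z]+)*)"
)


def parse_action(output: str) -> Optional[str]:
    """Return the action if the entire stripped output matches, else None."""
    m = ACTION_RE.fullmatch(output.strip())
    return m.group(0) if m else None


print(parse_action("take rusty key"))             # accepted
print(parse_action("I think I should go north"))  # rejected: extra text
```

Using `re.search` instead would extract `go north` from a chatty reply; `fullmatch` is the stricter choice and penalizes models that wrap actions in commentary.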

Feel free to ping us if you want to discuss further or have questions.

Open Issue ->
