01 · Result

Banking77 · Official

Small models, serious accuracy.

Seed AutoArch just hit 94.42% on the official Banking77 test set — under a strict full-train protocol, no leakage — at ~68.4 MiB and ~225 ms on CPU. That is the shape humanoid robotics actually needs on-board.


Built for teams shipping

Tesla Optimus · Boston Dynamics · Figure · 1X · Agility Robotics · Apptronik

02 · The Frontier

Pick the shape you ship.

The Seed AutoArch frontier is a family of models, not a single checkpoint. Slide between top accuracy, balanced, and most-efficient to see the tradeoffs on the Banking77 benchmark.

[Figure: Pareto frontier on the Banking77 test set. Legend: Seed AutoArch × Competitor SOTA.]

Mode 01 / 03

Seed AutoArch · High-Accuracy

When the ceiling matters. Push Banking77 quality higher than the published Test SOTA while staying small enough to run on CPU without apology.

Accuracy: 94.42% (+0.59pp over Test SOTA)

Inference: ~225 ms end-to-end

Footprint: ~68.4 MiB (502,170 parameters, heads only)

Efficiency: Pareto · 20–30× smaller · 50–100× faster on CPU

Efficiency figures are relative to SPACE, the current main SOTA. Balanced is an illustrative Pareto midpoint; actual deployments are tuned to your target.

03 · Benchmark

The receipts.

Every row below is a published result or a SeedFrontier-measured number on the same official Banking77 test set, with matching protocol. No cherry-picked splits. No hand-waving.

Rank | Model / Method | Accuracy | vs SOTA | Params | Runtime | Efficiency
1 | SPACE (Current Main SOTA) | 94.94% | +1.11pp | Hundreds of millions | ~1–2 GB · 20–100 ms GPU | baseline
2 | Seed AutoArch · High-Accuracy (SeedFrontier) | 94.42% | +0.59pp | 502,170 heads-only | ~68.4 MiB · ~225 ms | ~20–30× smaller · ~50–100× faster CPU
3 | Test SOTA (CUD) (Published baseline) | 93.83% | baseline | ~355M+ | ~1.4 GB · 15–50 ms GPU | -
4 | Seed AutoArch · Efficient (SeedFrontier) | 91.46% | −2.37pp | 54,000 | Very small · 0.11 ms | ~10,000×+ smaller · ~100–500× faster
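The "vs SOTA" column can be reproduced from the accuracy column alone. The sketch below assumes the deltas are plain percentage-point differences against the Test SOTA (CUD) row at 93.83%; the dictionary and function names are illustrative and not part of any SeedFrontier tooling.

```python
# Accuracies (%) copied from the benchmark rows above.
ACCURACY = {
    "SPACE": 94.94,
    "Seed AutoArch High-Accuracy": 94.42,
    "Test SOTA (CUD)": 93.83,
    "Seed AutoArch Efficient": 91.46,
}

# Assumption: the "vs SOTA" column is measured against this row.
BASELINE = "Test SOTA (CUD)"


def delta_pp(model: str) -> float:
    """Percentage-point gap between a model and the baseline row."""
    return round(ACCURACY[model] - ACCURACY[BASELINE], 2)


for name in ACCURACY:
    print(f"{name}: {delta_pp(name):+.2f}pp")
```

Under that assumption the computed gaps match the published column: +1.11pp, +0.59pp, and −2.37pp.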

04 · Robotics

Humanoids don’t need bigger models.

They need the right shape: small, fast, and accurate enough to run on-robot without apology. A 70B-parameter model does not fit in a battery-powered torso. A 68 MiB model that runs in 225 ms does.

Compute is trapped on-robot

Humanoid platforms run on embedded accelerators with thermal and power budgets measured in watts. Cloud offload is not an option when control loops run at kilohertz rates.

Latency is standing vs. falling

Balance, locomotion, and manipulation policies run anywhere from 30 Hz to 1 kHz. Every millisecond of inference is a direct physical constraint on what the robot can do.
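To make the rate-to-latency link concrete, a control rate translates directly into a per-cycle time budget. This is a minimal sketch that assumes the whole cycle is available for inference, which real control loops will not grant; the function names are hypothetical.

```python
def cycle_budget_ms(rate_hz: float) -> float:
    """Time available per control cycle, in milliseconds."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return 1000.0 / rate_hz


def fits_in_cycle(inference_ms: float, rate_hz: float) -> bool:
    """True if one inference completes within a single control cycle."""
    return inference_ms <= cycle_budget_ms(rate_hz)


# The two ends of the 30 Hz to 1 kHz range quoted above.
for rate in (30, 1000):
    print(f"{rate} Hz -> {cycle_budget_ms(rate):.2f} ms per cycle")
```

Under this simplification, the 0.11 ms runtime listed for Seed AutoArch · Efficient fits inside a 1 kHz cycle (1 ms), while a ~225 ms inference exceeds a 30 Hz cycle (about 33 ms).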

Battery budgets punish bloat

Every joule spent on inference is a joule not spent on actuators. Smaller, faster models extend runtime and make concurrent on-robot skills possible.

05 · The control stack

Every layer is a deployment constraint.

A humanoid runs many models at once, at wildly different frequencies and sizes. Large general models cannot fill these layers. Small, specialized, production-shaped models can.

Layer | Frequency | Size budget
Balance and whole-body control | 500–1000 Hz | under 10 MB
Joint-level control | 1000+ Hz | under 1 MB
Locomotion policy | 50–200 Hz | 1–50 MB
Manipulation policy | 30–100 Hz | 50–500 MB
Visual perception | 30–60 Hz | 20–200 MB
Task planning (VLM) | 1–5 Hz | 2–7B params
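The size budgets above can be treated as a simple lookup. The sketch below is illustrative only: it assumes each budget's upper bound acts as an inclusive cap, that "MB" means decimal megabytes, and it skips the task-planning layer because its budget is stated in parameters rather than bytes. All names are hypothetical.

```python
from typing import List, Optional, Tuple

# (layer name, upper size bound in MB); None where the budget is given in parameters.
STACK: List[Tuple[str, Optional[float]]] = [
    ("Balance and whole-body control", 10),
    ("Joint-level control", 1),
    ("Locomotion policy", 50),
    ("Manipulation policy", 500),
    ("Visual perception", 200),
    ("Task planning (VLM)", None),
]


def mib_to_mb(mib: float) -> float:
    """Convert mebibytes (2**20 bytes) to decimal megabytes (10**6 bytes)."""
    return mib * 1024**2 / 1e6


def layers_within_budget(size_mb: float) -> List[str]:
    """Layers whose upper size bound is at least the given model size."""
    return [name for name, cap in STACK if cap is not None and size_mb <= cap]


print(layers_within_budget(mib_to_mb(68.4)))
```

With these assumptions, a ~68.4 MiB model (about 71.7 MB) falls within the manipulation and visual perception budgets, and a model has to be much smaller to fit the control layers.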

Tesla Optimus · Boston Dynamics · Figure · 1X · Agility Robotics · Apptronik · Sanctuary AI · Physical Intelligence

06 · Get in touch

Ship the right shape.

If your platform has a thermal budget, a battery, and a control loop, the shape of your models is the shape of your product. Let’s talk about what Seed can build for you.


Talk to us about

On-robot perception and control models
Custom frontier tuning for your hardware target
Banking77-class benchmarks for your domain
Deployment profiles under strict latency budgets