Why Banking77 is hard
Noisy labels
Intent data is messy. Banking77 does not give you a clean synthetic path to good-looking results.
SeedFrontier.ai
Banking77 breakthrough
Intent classification result
Noisy labels. 77 intents. Real overlap. SeedFrontier just hit 94.42% on the official Banking77 test set, up 0.59 percentage points over baseline, under a strict full-train protocol with no leakage.
~225 ms. ~68 MiB. Small enough to run on-robot, alongside the controllers that keep a humanoid standing. This is the shape humanoid robotics actually needs.
94.42%
official Banking77 test accuracy
Measured on the official test set
+0.59pp
over baseline
Meaningful lift under the same evaluation frame
~225 ms
inference
Runtime kept visible instead of hand-waved away
~68 MiB
model footprint
Compact enough to matter in deployment discussions
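A footprint figure is easier to reason about when it is converted into a rough parameter budget. The sketch below assumes, hypothetically, that the ~68 MiB is dominated by weights. Under that assumption, bytes divided by bytes-per-parameter gives an upper bound on model size at a few common precisions. The page does not state a precision, so the sketch does not say which row applies.

```python
# Back-of-envelope: what a ~68 MiB footprint could imply about model size.
# Assumption (hypothetical): the footprint is dominated by weights, so
# bytes / bytes-per-parameter gives an upper bound on parameter count.
# Runtime buffers, activations, and tokenizer files are ignored.

MIB = 2 ** 20  # 1 MiB = 1,048,576 bytes (binary prefix, not 1,000,000)


def implied_params(footprint_mib: float, bytes_per_param: float) -> float:
    """Upper bound on parameters if the entire footprint were weights."""
    return footprint_mib * MIB / bytes_per_param


if __name__ == "__main__":
    footprint = 68.0
    print(f"{footprint:.0f} MiB = {footprint * MIB:,.0f} bytes")
    for dtype, width in (("fp32", 4), ("fp16", 2), ("int8", 1)):
        millions = implied_params(footprint, width) / 1e6
        print(f"{dtype}: at most ~{millions:.1f}M parameters")
```

68 MiB is 71,303,168 bytes. That works out to at most about 17.8M parameters at fp32, 35.7M at fp16, or 71.3M at int8.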
Result snapshot
Accuracy: 94.42% (official Banking77 test score)
Delta: +0.59pp (improvement over baseline)
Runtime: ~225 ms (inference)
Footprint: ~68 MiB (model size)
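A little arithmetic can sanity-check the headline numbers. The sketch below assumes the official Banking77 test split of 3,080 queries (77 intents with 40 queries each). The correct-answer counts it derives are inferred from the rounded percentages and are not reported figures.

```python
# Back-of-envelope check on the headline accuracy numbers.
# Assumption: the official Banking77 test split has 3,080 queries
# (77 intents x 40 queries each). Counts below are inferred from the
# rounded percentages, not reported by SeedFrontier.

TEST_SIZE = 3080


def summarize(acc_pct: float, delta_pp: float, n: int = TEST_SIZE) -> dict:
    """Derive baseline accuracy, correct counts, and relative error reduction."""
    base_pct = acc_pct - delta_pp
    err_new = 100.0 - acc_pct
    err_base = 100.0 - base_pct
    return {
        "baseline_pct": round(base_pct, 2),
        "correct_new": round(acc_pct / 100.0 * n),
        "correct_base": round(base_pct / 100.0 * n),
        "rel_error_reduction_pct": round(100.0 * (err_base - err_new) / err_base, 1),
    }


if __name__ == "__main__":
    print(summarize(94.42, 0.59))
```

With 3,080 queries, each query is worth about 0.032pp. Under that assumption, +0.59pp means a baseline of about 93.83% and roughly 18 more correct answers (about 2,908 versus 2,890). It also means about a 9.6% relative reduction in errors, from ~6.17% to ~5.58%.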
Why Banking77 is hard
Banking77 spans 77 intents: a large inventory with plenty of room for confusion across semantically adjacent classes.
The hard part is not just class count. It is the genuine semantic overlap between user intents.
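One way to make "genuine semantic overlap" concrete is to count which label pairs a classifier confuses most often. The sketch below counts misclassifications per unordered pair of labels, so confusing A for B and B for A add up. The toy labels and predictions are hypothetical placeholders in the style of Banking77 intent names. They are not results from this model.

```python
from collections import Counter


def top_confused_pairs(y_true, y_pred, k=5):
    """Count misclassifications per unordered label pair, most frequent first."""
    pairs = Counter(
        tuple(sorted((t, p))) for t, p in zip(y_true, y_pred) if t != p
    )
    return pairs.most_common(k)


if __name__ == "__main__":
    # Toy, hypothetical predictions; label strings are illustrative only.
    y_true = ["card_arrival", "card_arrival", "card_delivery_estimate",
              "top_up_failed", "top_up_failed"]
    y_pred = ["card_delivery_estimate", "card_arrival", "card_arrival",
              "top_up_reverted", "top_up_failed"]
    print(top_confused_pairs(y_true, y_pred))
    # [(('card_arrival', 'card_delivery_estimate'), 2), (('top_up_failed', 'top_up_reverted'), 1)]
```

Pairs that dominate this list on real predictions are where adjacent intents blur together, and where label noise is most likely to hide.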
The headline
The point of this page is not to shout “state of the art” in the abstract. It is to show a result that is specific, measurable, and anchored to a known benchmark with visible operating characteristics.
The result: 94.42% on the official Banking77 test set under a strict full-train protocol with no leakage, at roughly 225 milliseconds of inference latency and a footprint of around 68 MiB.
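The page does not say how ~225 ms was measured (hardware, batch size, or whether it is per query). The sketch below shows one common approach and assumes per-query, batch-size-1 wall-clock timing. It warms up first, times each call with `time.perf_counter`, and reports the median and 95th percentile. `predict` is a hypothetical stand-in for whatever model is being timed.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def measure_latency_ms(predict: Callable[[str], object],
                       queries: Sequence[str],
                       warmup: int = 5) -> dict:
    """Per-query wall-clock latency (batch size 1), as median and p95."""
    if not queries:
        raise ValueError("need at least one query to time")
    # Warm-up calls absorb one-off costs (lazy init, caches) before timing.
    for q in queries[:warmup]:
        predict(q)
    samples = []
    for q in queries:
        start = time.perf_counter()
        predict(q)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 95th percentile.
    p95 = samples[math.ceil(0.95 * len(samples)) - 1]
    return {"n": len(samples), "median_ms": statistics.median(samples), "p95_ms": p95}


if __name__ == "__main__":
    def dummy_predict(text: str) -> int:
        # Hypothetical stand-in for a real classifier: returns a fake label id.
        return sum(map(ord, text)) % 77

    queries = ["where is my card?", "my top-up failed", "how do I change my PIN?"] * 20
    print(measure_latency_ms(dummy_predict, queries))
```

Reporting a tail percentile next to the median matters on-robot, because a slow outlier is what breaks a deadline.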
Protocol
The evaluation discipline should be obvious. That is what separates a real benchmark statement from a dressed-up experiment.
The headline number is the real external score, not a cherry-picked split or internal holdout.
The result comes from a strict full-train protocol on the official training split, not from ad hoc experimentation.
Credibility is explicit: no test-set contamination, no hidden shortcuts, no soft benchmark framing.
Accuracy is paired with inference and memory footprint so the result reads like a deployable system.
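There are several ways to check for "no leakage". The sketch below shows one of them and assumes only that the train and test queries are available as strings. It normalizes each query and flags any test query whose normalized text also appears in training. This is not SeedFrontier's described procedure, since the page does not detail one. It also only catches exact or near-verbatim duplicates. Paraphrases need fuzzier matching, and hyperparameters must also be chosen without looking at the test split.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Case-fold, unify Unicode forms, strip punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def exact_overlap(train_texts, test_texts):
    """Test queries whose normalized text also appears in the training data."""
    seen = {normalize(t) for t in train_texts}
    return [t for t in test_texts if normalize(t) in seen]


if __name__ == "__main__":
    train = ["How do I top up my card?", "Where is my card?"]
    test = ["where is my card", "Why was my top-up declined?"]
    print(exact_overlap(train, test))  # ['where is my card']
```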
Why this matters
Banking77 is not easy. That is exactly why this result is worth publishing.
Robotics
Tesla Optimus. Boston Dynamics. Figure. 1X. Agility Robotics. Every humanoid and legged platform being built right now is compute-starved. A giant cloud-scale model does not fit in a robot torso powered by a battery. A ~68 MiB model running in ~225 ms does.
That size and latency profile is no coincidence. It is the frontier these platforms actually live on, and it is where SeedFrontier is built to operate.
Humanoid platforms run on embedded accelerators with thermal and power budgets measured in watts. Cloud offload is not an option when control loops run at kilohertz rates.
Balance, locomotion, and manipulation policies run anywhere from 30 Hz to 1 kHz. Every millisecond of inference is a direct physical constraint on what the robot can actually do.
Every joule spent on inference is a joule not spent on actuators. Smaller, faster models extend operating time on a charge and make concurrent on-robot skills possible in the first place.
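The latency and energy points above reduce to simple arithmetic. The sketch below uses the ~225 ms headline latency and a hypothetical 10 W draw during inference, since the page gives no power figure. One inference spans `latency × rate` control cycles and costs `E = P × t` joules.

```python
# Hypothetical budget arithmetic for a ~225 ms on-robot model.
# Assumptions (illustrative only): inference draws a constant POWER_W watts
# while it runs; loop rates come from the 30 Hz to 1 kHz range quoted above.

LATENCY_S = 0.225  # ~225 ms per inference, from the headline
POWER_W = 10.0     # hypothetical accelerator draw during inference


def cycles_spanned(latency_s: float, rate_hz: float) -> float:
    """How many control-loop periods elapse during one inference."""
    return latency_s * rate_hz


def energy_j(power_w: float, latency_s: float) -> float:
    """Energy per inference: E = P * t (joules = watts * seconds)."""
    return power_w * latency_s


if __name__ == "__main__":
    for rate in (30.0, 1000.0):
        print(f"{rate:>6.0f} Hz loop: period {1000 / rate:.1f} ms, "
              f"one inference spans {cycles_spanned(LATENCY_S, rate):.2f} cycles")
    e = energy_j(POWER_W, LATENCY_S)
    print(f"energy per inference: {e:.2f} J, ~{3600 / e:.0f} inferences per Wh")
```

Under these assumptions, one inference spans 6.75 cycles of a 30 Hz loop and 225 cycles of a 1 kHz loop. It also costs about 2.25 J, or roughly 1,600 inferences per watt-hour. A model with this profile therefore belongs in a slower, asynchronous layer rather than inside the balance loop itself.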
The on-robot control stack
A humanoid runs many models at once, at wildly different frequencies and sizes. Large general models cannot fill these layers. Small, specialized, production-shaped models can.
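A multi-rate stack like this can be sketched as a list of layers, each with a tick rate and a measured latency. A layer keeps up only if its latency fits inside its period. All layer names and numbers below are hypothetical illustrations, not measurements from any platform named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One model in a hypothetical on-robot stack."""
    name: str
    rate_hz: float     # how often the layer must produce an output
    latency_ms: float  # worst-case inference time (hypothetical)

    @property
    def period_ms(self) -> float:
        return 1000.0 / self.rate_hz

    def fits_deadline(self) -> bool:
        """True if one inference finishes before the next tick is due."""
        return self.latency_ms <= self.period_ms


# Illustrative numbers only; none of these come from a real robot.
STACK = [
    Layer("joint_torque_control", 1000.0, 0.3),
    Layer("balance_policy", 200.0, 2.0),
    Layer("manipulation_policy", 30.0, 15.0),
    Layer("intent_classifier", 2.0, 225.0),
]


def report(stack):
    for layer in stack:
        status = "ok" if layer.fits_deadline() else "MISSES DEADLINE"
        print(f"{layer.name:<22} {layer.rate_hz:>7.1f} Hz  "
              f"budget {layer.period_ms:7.1f} ms  "
              f"latency {layer.latency_ms:6.1f} ms  {status}")


if __name__ == "__main__":
    report(STACK)
    # The same 225 ms model scheduled at 30 Hz would not keep up.
    print(Layer("intent_classifier@30Hz", 30.0, 225.0).fits_deadline())  # False
```

This check ignores layers competing for the same accelerator. A real scheduler would also have to account for shared compute.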
Who this is for
If your platform has a thermal budget, a battery, and a control loop, the shape of your models is the shape of your product. Small, fast, accurate models are not a nice-to-have for humanoid robotics; they are the only thing that fits.
Humanoids do not need bigger models. They need the right model: small, fast, and accurate enough to run on-robot without apology. SeedFrontier delivers that shape.
Coming next
Banking77 is a proof point. The challenge is real, the benchmark is official, the gain is measurable, and the runtime profile stays visible, which is exactly the shape a real deployment needs.
The real target is humanoid robotics, where small, fast, specialized models are not optional. The deeper write-up will connect the Banking77 result to on-robot deployment and what this means for the teams building the next generation of humanoid platforms.