Pumping machine health numbers.

In the world of machine learning, proving a model's worth means more than just clever code—it means measurable results. The MLCommons MLPerf Tiny Inference suite sets a standard for just that, allowing developers to benchmark the performance of their AI acceleration approach against meaningful, representative challenges. At Literal Labs, we embraced this benchmark to showcase the performance of our anomaly detection model.

But while the MLPerf Tiny Inference suite relies on a dataset that is popular in research, real-world datasets offer another way to measure an AI model.

From toys to machinery

The anomaly detection task in MLPerf’s Tiny Inference benchmark uses the ToyADMOS dataset, which is well-suited for controlled experiments. The "toy" isn’t just in the name—it’s in the source. The ToyADMOS dataset is built on sounds captured from various children’s toys, providing a playful starting point for researchers.

But while anomalies in toy sounds help validate AI approaches, they were not gathered in an industrial setting. Enter the MIMII dataset, created by Hitachi. It’s a dataset born of necessity: real, gritty, commercial machinery at work. It contains sounds from pumps, fans, valves, and slide rails recorded in situ, representing the challenges of predictive maintenance as they unfold on factory floors and in pump rooms.

This is not an academic exercise. This is where AI meets industry—where detecting anomalies can mean the difference between downtime and smooth operation, where efficiency saves money, and foresight prevents breakdowns.

Here’s how Literal Labs' anomaly detection AI model, trained logically, stacks up against the performance of neural network approaches when applied to the MIMII dataset:

Model	Accuracy	F1 score
Neural Network	0.9267	0.8499
Literal Labs	0.9733	0.9693
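
For readers less familiar with the two metrics in the table, here is a minimal sketch of how accuracy and F1 score are derived from a binary anomaly detector's predictions. It is not our evaluation code: the labels below are made-up illustrative values rather than MIMII data, the helper names are hypothetical, and it assumes the anomalous class is treated as the positive class.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # anomalous clips correctly flagged as anomalous
    fp: int  # normal clips wrongly flagged as anomalous
    fn: int  # anomalous clips that were missed
    tn: int  # normal clips correctly passed as normal


def count_outcomes(y_true, y_pred):
    """Tally the confusion matrix, with 1 = anomalous and 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return ConfusionCounts(tp, fp, fn, tn)


def accuracy(c):
    """Share of all clips that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f1_score(c):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# Illustrative labels only (hypothetical, not taken from MIMII).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

counts = count_outcomes(y_true, y_pred)
print(f"accuracy = {accuracy(counts):.4f}")
print(f"F1 score = {f1_score(counts):.4f}")
```

Because normal recordings usually far outnumber anomalous ones, accuracy alone can flatter a model that misses faults. F1 score penalises both missed anomalies and false alarms, which is why the two numbers are reported side by side.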

As with our MLPerf anomaly results, our models don’t just compete, they lead, delivering higher accuracy and a higher F1 score where it matters most: at the edge and in real-world scenarios. For this comparison, we used a standard 3-layer neural network with 69,888 float parameters. While adding further layers is an option, a 3-layer network strikes the right commercial balance for the MIMII dataset: simple enough to avoid overfitting and preserve accuracy, yet complex enough to capture non-linear patterns in the data. Going beyond three layers risks diminishing returns, increased computational load, and overfitting without significant gains.
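
To put the parameter count in context, the sketch below shows how the parameters of a fully connected network are typically counted and what 69,888 of them would occupy in memory. It rests on assumptions: the layer sizes are purely illustrative and are not the architecture used in this comparison, and the footprint assumes each float parameter is stored as 4 bytes (32-bit), which the comparison above does not specify.

```python
def dense_param_count(layer_sizes, bias=True):
    """Count weights (and optional biases) in a fully connected network.

    layer_sizes lists the width of the input followed by each layer,
    so four entries describe a network with three weight layers.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out          # weight matrix
        if bias:
            total += n_out             # one bias per output unit
    return total


def footprint_bytes(n_params, bytes_per_param=4):
    """Memory needed to store the parameters (assumes 32-bit floats)."""
    return n_params * bytes_per_param


# Hypothetical layer sizes, chosen only to illustrate the arithmetic.
example_sizes = [64, 32, 16, 1]
print("example parameter count:", dense_param_count(example_sizes))

# The parameter count reported for the neural network in this comparison.
REPORTED_PARAMS = 69_888
size = footprint_bytes(REPORTED_PARAMS)
print(f"{REPORTED_PARAMS} parameters ~ {size} bytes ({size / 1024:.1f} KiB)")
```

Under the 32-bit assumption, parameter storage alone comes to a few hundred kibibytes, a meaningful share of the memory available on many edge devices.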

Download the whitepaper

Curious how these results compare to the MLPerf Tiny benchmark? Follow the link to dive deeper into our results. You’ll also have access to our whitepaper, where we break down our methodology, techniques, and performance across a key anomaly detection dataset.

