By
Alain Blancquart - EvoChip CEO
May 12, 2025
In the exhilarating sprint toward artificial intelligence supremacy, one number is starting to matter more than all the rest: inferences per second. It’s a cold, clinical phrase—but beneath it lies a deeply human concern: how to make powerful AI smarter, faster, and vastly more energy-efficient. Especially on devices that live outside the walls of cloud data centers.
While AI is increasingly ubiquitous—powering everything from voice assistants and translation apps to self-driving vehicles and medical diagnostics—the cost of running these brainy systems is soaring. Behind every ChatGPT response, photo-enhancing filter, or facial recognition scan is a mountain of computation. And that computation burns energy. A lot of it.
Why efficiency now matters more than ever
This isn't just a problem of big tech bills. It’s a challenge of sustainability, access, and innovation.
AI models today are astonishingly capable, but they are often heavy, power-hungry beasts. They live most comfortably in vast, climate-controlled data centers stacked with high-end processors and cooled by powerful fans. But not everyone can—or should—rely on such infrastructure.
Consider the life-saving potential of AI running on mobile devices in rural clinics, or autonomous drones surveying wildfire zones, or language apps translating dialects in war zones. These aren’t scenarios where power is unlimited, or cooling is easy. In these places, resource constraints aren't technical hiccups—they're hard limits. Improving efficiency—how much useful work an AI can do for every watt of energy—is not a luxury. It’s a necessity.
The AI boom’s dirty secret
Let’s start with a hard truth: AI has a growing environmental footprint. Estimates vary, but researchers say that a single query to a large model like GPT-4 can consume as much energy as an oven running for a second. It sounds trivial, until you realize we’re making billions of these queries every day. Multiply that out, and you’re looking at gigawatt-hours of energy—enough to power small cities.
Even tech companies are beginning to sound the alarm. Executives from OpenAI, Google, and Microsoft have all acknowledged that AI’s energy costs are unsustainable unless we find ways to do more with less.
The little devices that could (if we let them)
While much of the AI conversation focuses on flashy new cloud chips or the energy use of data centers, the efficiency story matters even more on the edge—the billions of phones, cameras, wearables, robots, and appliances around us. These edge devices often operate on batteries or modest power budgets. They can’t offload every AI task to the cloud due to latency, connectivity, or privacy concerns. For them, the equation is simple: If AI can’t be efficient, it can’t be local.
Companies and researchers are responding. Engineers have developed dual-AI systems where a lightweight “scout” model handles most tasks and only calls in the big guns when truly needed. On edge devices like the NVIDIA Jetson Nano, this approach has reduced energy consumption by over 85% while keeping accuracy nearly intact.
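The scout-plus-heavyweight routing described above can be sketched as a confidence-threshold cascade: the cheap model answers whenever it is sure, and the expensive model runs only on the hard cases. The toy models, the 0.9 threshold, and the numbers below are illustrative assumptions, not any vendor's actual implementation:

```python
# Sketch of a two-model cascade: a cheap "scout" model answers when
# confident, and only escalates to an expensive model otherwise.
# Both models and the 0.9 threshold are illustrative stand-ins.

def scout_model(x):
    """Cheap model: returns (prediction, confidence). Toy stand-in."""
    pred = "cat" if x % 2 == 0 else "dog"
    conf = 0.95 if x % 3 else 0.5   # unsure on every third input
    return pred, conf

def big_model(x):
    """Expensive, accurate model. Toy stand-in."""
    return "cat" if x % 2 == 0 else "dog"

def cascade(x, threshold=0.9):
    """Route to the big model only when the scout is unsure."""
    pred, conf = scout_model(x)
    if conf >= threshold:
        return pred, "scout"
    return big_model(x), "big"

if __name__ == "__main__":
    routes = [cascade(i)[1] for i in range(100)]
    print("escalated:", routes.count("big"), "of", len(routes))
```

In this toy run only about a third of inputs reach the big model; in a real deployment the energy savings scale with how often the scout's confidence clears the threshold.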
Other innovations include shrinking the AI models themselves through techniques like pruning (removing unneeded connections) or quantization (simplifying the math). It’s the computational equivalent of fitting a grand piano into a studio apartment—and making it play just as well.
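Quantization, for instance, can be sketched in a few lines: store the weights as 8-bit integers plus a single floating-point scale, shrinking the footprint roughly 4x versus 32-bit floats at the cost of a small rounding error. The symmetric scheme below is a simplified illustration, not any particular library's method:

```python
# Illustrative 8-bit symmetric quantization of a weight vector:
# keep int8 values plus one float scale, then reconstruct
# approximate floats on the way back. A sketch of the idea only.

def quantize(weights):
    """Map floats into [-127, 127] using one symmetric scale."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 + scale."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 1.0, -0.07]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {max_err:.4f}")
```

The grand-piano analogy holds surprisingly well: the reconstruction error stays within about half a quantization step, which for many networks is small enough that accuracy barely moves.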
A new kind of arms race
We’ve long measured AI by how big it could get: more data, more layers, more compute. But the new frontier is how small and efficient it can become while staying smart.
It’s a shift with profound implications. For one, it levels the playing field. If AI can run well on a device costing a few dollars or on a solar-powered instrument, it becomes accessible to schools, nonprofits, startups, and developing countries—not just tech giants. It also offers a lifeline to the planet. Making AI models 10x more efficient doesn’t just save money; it cuts energy demand and carbon emissions, and deployed at scale those savings become monumental. And let’s not forget: efficiency means speed. An AI that runs twice as efficiently often runs twice as fast. That’s not just nice—it’s critical in time-sensitive applications like autonomous vehicles, disaster response, and battlefield intelligence.
The path ahead: Rethink, redesign, reinvent
To meet the growing appetite for intelligent tools without breaking our power grids or our budgets, we’ll need more than clever algorithms. We need a mindset shift—from brute-force intelligence to elegant, efficient AI.
That means:
• Designing new chips that integrate memory and compute to avoid power-hungry data transfers.
• Optimizing models not just for accuracy, but for speed and energy.
• Benchmarking AI not just by what it can do, but how cleanly it can do it.
Some researchers are calling for the introduction of “energy labels” on AI services—like nutrition facts for software. Others are exploring new metrics like “inferences per watt” or “inferences per dollar per watt.” The idea is simple: If we care about what AI can do, we should care just as much about what it costs the world to do it.
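A metric like “inferences per watt” is nothing exotic: throughput divided by power draw. The comparison below uses made-up figures purely to illustrate how such a label might let buyers compare a cloud GPU against an edge accelerator, not measured benchmarks:

```python
# Toy "energy label" comparison by inferences per watt.
# All figures are invented for illustration, not real measurements.

def inferences_per_watt(inferences_per_sec, watts):
    """Useful work per unit of power: higher is cleaner."""
    return inferences_per_sec / watts

cloud_gpu = inferences_per_watt(2000, 400)  # fast but power-hungry
edge_chip = inferences_per_watt(50, 5)      # slow but frugal

print(f"cloud GPU: {cloud_gpu} inf/s per watt")
print(f"edge chip: {edge_chip} inf/s per watt")
```

In this invented example the raw-throughput winner loses the efficiency comparison by a factor of two, which is exactly the kind of trade-off an energy label would surface.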
The bottom line
AI is no longer just a technological frontier—it’s an environmental and social one. The next generation of breakthroughs won’t just be smarter. They’ll be leaner, faster, and dramatically more efficient. Because the real revolution in artificial intelligence won’t come from building models that can think like humans. It will come from teaching them to think like engineers—and do more with less.