Most of us judge a computer or smartphone by its raw speed. In the AI hardware world, this speed is usually measured in “TOPS” (Trillions of Operations Per Second). For years, the rule was simple: higher TOPS meant a better chip.
But as AI evolves from just reading text to actually “seeing” and understanding the physical world—thanks to Vision Large Language Models (Vision LLMs)—that golden rule is falling apart.
And because our chips were designed around that golden rule, they are now struggling, and engineers are reinventing them to meet the demands of edge AI.
For the last decade, AI chips in our phones and cameras were built to do one thing perfectly: identify objects. Think of this like a highly efficient factory assembly line. Every piece of data goes through the exact same steps in the exact same order. The faster the conveyor belt moves (the higher the TOPS), the better the chip performs.
The Chaos of “Seeing” and “Thinking”.
Vision LLMs do not just recognize a stop sign; they look at a busy intersection, track moving cars, understand the weather, and decide what a self-driving car should do next.
This complex reasoning ruins the predictable assembly line. The AI is now constantly bouncing between processing video, retrieving memories, and running complex logic. The factory is no longer an assembly line—it is a chaotic custom workshop trying to build cars, drones, and bicycles all at the same time.
This causes three massive traffic jams inside the chip:
— The Memory Bottleneck: The AI has so much context to juggle that it runs out of fast, on-chip workspace. It has to constantly fetch data from slow, external memory, which creates lag and drains battery life.
— The Attention Overload: The longer an AI looks at something (its “context window”), the more memory it eats up. Eventually, data movement, not arithmetic, becomes the real speed limit, regardless of how fast the processor is (see the back-of-the-envelope sketch below).
— Wasted Workers: Because the tasks are so irregular, parts of the chip sit completely idle waiting for other parts to finish their jobs.
You can add all the raw speed (TOPS) you want, but if the memory is jammed and the workers are waiting, the chip is still slow.
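To put rough numbers on that, here is a back-of-the-envelope sketch in Python. Every figure in it is an assumption chosen for illustration (a 40-TOPS accelerator, 50 GB/s of external memory bandwidth, a small 24-layer Vision LLM running in fp16), not the spec of any real chip; the point is the ratio between the time spent computing and the time spent moving the ever-growing attention cache.

```python
# Back-of-the-envelope: why extra TOPS stops helping once attention is
# memory-bound. All numbers are illustrative assumptions, not chip specs.

BYTES_PER_VALUE = 2    # fp16
TOPS = 40e12           # assumed raw compute: 40 trillion ops/second
DRAM_BW = 50e9         # assumed external memory bandwidth: 50 GB/s

# Assumed shape of a small on-device Vision LLM.
LAYERS, HEADS, HEAD_DIM = 24, 16, 64
HIDDEN = HEADS * HEAD_DIM  # 1024

def kv_cache_bytes(context_len):
    """Size of the attention cache: a K and a V tensor per layer, per token."""
    return 2 * LAYERS * context_len * HIDDEN * BYTES_PER_VALUE

for ctx in (1_000, 8_000, 32_000, 128_000):
    cache = kv_cache_bytes(ctx)
    mem_time = cache / DRAM_BW            # whole cache re-read per new token
    ops = 2 * cache / BYTES_PER_VALUE     # ~2 ops per cached value (rough proxy)
    compute_time = ops / TOPS
    print(f"context {ctx:>7,} tokens: cache {cache / 1e6:8.1f} MB | "
          f"move {mem_time * 1e3:7.2f} ms vs compute {compute_time * 1e3:6.3f} ms")
```

Under these assumptions, moving the cache takes roughly 800 times longer than the matching arithmetic at every context length, and the cache itself balloons from megabytes to gigabytes as the window grows. That is the sense in which data movement, not TOPS, sets the speed limit.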
The Packet Processing Fix.
To solve this, hardware designers are throwing out the assembly line and rethinking how data moves. Instead of forcing the chip to process massive layers of data all at once, new architectures (like Expedera’s “Origin”) break the work down into tiny, smart “packets.”
To understand why this is a game-changer, let us walk through a real-world example: a self-driving car’s AI trying to process a pedestrian stepping off a curb.
The Old Way (Layer-by-Layer Architecture): The traditional chip processes the entire street scene one massive step at a time. First, it analyzes every single pixel in the frame for basic shapes (Layer 1). Because this is so much data, it runs out of space and has to shove all that information into slow, external memory. Next, it hauls all that data back out to identify the pedestrian (Layer 2), then stores it again. Finally, it pulls the data back a third time to predict the pedestrian’s movement (Layer 3). Most of the chip’s time and energy is wasted just moving giant chunks of data back and forth to storage.
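To see how much of the budget those round trips consume, here is a toy Python model of the layer-by-layer flow. The frame size, on-chip workspace, and layer count are all illustrative assumptions, not measurements from any real chip; the only point is that every layer pays the external-memory toll twice.

```python
# A toy model of the layer-by-layer flow described above. The frame size,
# workspace size, and layer count are illustrative assumptions.

FRAME_MB = 8.0    # assumed activations for one full street-scene frame
SRAM_MB = 2.0     # assumed on-chip workspace: too small to hold a full layer

def run_layer_by_layer(num_layers=3, activation_mb=FRAME_MB):
    """Each layer works on the WHOLE frame, so its input and output both
    round-trip through slow external DRAM."""
    assert activation_mb > SRAM_MB   # the spill is unavoidable
    dram_traffic = 0.0
    for layer in range(1, num_layers + 1):
        dram_traffic += activation_mb   # fetch previous layer's result back in
        # ... compute the layer: shapes, then pedestrian, then trajectory ...
        dram_traffic += activation_mb   # spill this layer's full result out again
        print(f"Layer {layer}: {2 * activation_mb:.0f} MB moved over the DRAM bus")
    return dram_traffic

print(f"Total external traffic: {run_layer_by_layer():.0f} MB for one frame")
```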
The New Way (Expedera’s Origin): Instead of processing the whole scene layer-by-layer, the new architecture creates “packets.” Think of a packet like a mini delivery drone carrying a specific, bite-sized task—in this case, just the data relating to the pedestrian. Instead of waiting in a long, rigid line, this drone flies straight through the chip. It hits the vision block, then the reasoning block, and finally the action block, all in one smooth trip without ever stopping to store data in external memory. Once the job is done, the drone drops its data and gets out of the way.
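And here is the same toy model restructured around packets. The stage functions and packet size are hypothetical stand-ins, not Expedera’s actual blocks or scheduler; what matters is that each packet streams through all three stages while staying in on-chip buffers, so the external-traffic counter never moves.

```python
# The same toy model restructured around packets. The stage functions and
# packet size are hypothetical stand-ins, not Expedera's actual design.

PACKET_MB = 0.5   # assumed packet size: small enough to live in on-chip SRAM

def vision(p):    return {**p, "object": "pedestrian"}
def reasoning(p): return {**p, "intent": "stepping off the curb"}
def action(p):    return {**p, "decision": "brake"}

PIPELINE = (vision, reasoning, action)

def run_packets(num_packets=16):
    """Each packet makes one trip through all three stages, staying in
    on-chip buffers the whole way, so the DRAM counter never moves."""
    dram_traffic = 0.0
    for i in range(num_packets):
        packet = {"id": i, "size_mb": PACKET_MB}
        for stage in PIPELINE:   # vision -> reasoning -> action, no stops
            packet = stage(packet)
        # result consumed; the packet's on-chip buffer is freed immediately
    return dram_traffic

print(f"Total external traffic: {run_packets():.0f} MB for one frame")
```

In this toy comparison, the same frame costs 48 MB of external traffic the old way and none the new way. A real chip will never hit zero, since weights and raw inputs still come from somewhere, but the direction of the saving is the whole argument.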

This clears the traffic jam. It keeps the chip fully utilized, prevents data from piling up, and drastically reduces the need to fetch data from slow, external memory.

The Bottom Line.
As we put smarter, seeing AI into everything from smart glasses to medical devices, the ultimate question is no longer “How fast is the chip?” The real question is “How smart is the traffic control inside the chip?” The future of edge computing is not about raw muscle—it is about extreme efficiency.