Monday, December 23, 2024
Embedded Technologies in Supercomputing

By Pramit Kumar

Embedded Supercomputing

Gone are the days when supercomputing meant big, rack-based systems in an air-conditioned room. Today, embedded processors, FPGAs and GPUs can run AI and machine learning workloads, enabling new kinds of local decision-making in embedded systems.

Embedded computing technology has evolved well past the point where complete system functionality on a single chip is remarkable. Today, the levels of compute performance and parallel processing on an IC mean that what were once supercomputing-class capabilities can now be implemented in chip-level solutions.

While supercomputing has become a generalized term, what system developers are really interested in is crafting artificial intelligence, machine learning and neural networks using today’s embedded processing. Supplying the technology for these efforts are the makers of leading-edge embedded processors, FPGAs and GPUs. In these tasks, GPUs are used for “general-purpose computing on GPUs,” a technique known as GPGPU computing.
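To make the GPGPU idea concrete, here is a minimal sketch, assuming the PyTorch library (the article names no specific framework), of offloading ordinary array arithmetic to a GPU when one is available:

import torch

# Use the GPU if present; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two large vectors allocated in device memory.
a = torch.rand(1_000_000, device=device)
b = torch.rand(1_000_000, device=device)

# This elementwise expression runs as a parallel kernel on the GPU.
c = a * b + 0.5

print(c.sum().item(), "computed on", device)

The same pattern scales from data-center GPUs down to embedded GPU modules, which is what makes GPGPU attractive for on-device machine learning.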

With all that in mind, embedded processor, GPU and FPGA companies have rolled out a variety of solutions over the last 12 months, aimed at performing AI, machine learning and other advanced computing functions for several demanding embedded system application segments.

FPGAs Take AI Focus

Back in March, FPGA vendor Xilinx announced plans to launch a new FPGA product category it calls the adaptive compute acceleration platform (ACAP). Following up on that, in October the company unveiled Versal, the first of its ACAP implementations. Versal ACAPs combine scalar processing engines, adaptable hardware engines and intelligent engines with advanced memory and interfacing technologies to provide heterogeneous acceleration for any application. Even more importantly, according to Xilinx, the Versal ACAP’s hardware and software can be programmed and optimized by software developers, data scientists and hardware developers alike. This is enabled by a host of tools, software, libraries, IP, middleware and frameworks that support industry-standard design flows.

Built on TSMC’s 7-nm FinFET process technology, the Versal portfolio combines software programmability with domain-specific hardware acceleration and adaptability. The portfolio includes six series of devices architected to deliver scalability and AI inference capabilities for a host of applications across different markets, from cloud to networking to wireless communications to edge computing and endpoints.

The portfolio includes the Versal Prime series, Premium series and HBM series, which are designed to deliver high performance, connectivity, bandwidth, and integration for the most demanding applications. It also includes the AI Core series, AI Edge series and AI RF series, which feature the AI Engine. The AI Engine is a new hardware block designed to address the emerging need for low-latency AI inference for a wide variety of applications and also supports advanced DSP implementations for applications like wireless and radar.

The AI Engine is tightly coupled with the Versal Adaptable Hardware Engines to enable whole-application acceleration, meaning that both the hardware and software can be tuned for maximum performance and efficiency. The portfolio debuts with the Versal Prime series, delivering broad applicability across multiple markets, and the Versal AI Core series, delivering an estimated 8x AI inference performance boost compared to industry-leading GPUs, according to Xilinx.

Low-Power AI Solution

Following the AI trend, back in May Lattice Semiconductor unveiled Lattice sensAI, a technology stack that combines modular hardware kits, neural network IP cores, software tools, reference designs and custom design services. In September the company unveiled expanded features of the sensAI stack, aimed at developers building flexible machine learning inferencing into consumer and industrial IoT applications. Building on the ultra-low-power (1 mW to 1 W) focus of the sensAI stack, Lattice released new IP cores, reference designs, demos and hardware development kits that provide scalable performance and power for always-on, on-device AI applications.

Embedded system developers can build a variety of solutions enabled by sensAI. They can build stand-alone, always-on integrated solutions based on iCE40 UltraPlus/ECP5 FPGAs, with latency, security and form factor benefits. Alternatively, they can use the iCE40 UltraPlus as an always-on processor that detects key phrases or objects and wakes up a high-performance AP SoC/ASIC for further analytics only when required, reducing overall system power consumption. Finally, they can use the scalable performance/power benefits of the ECP5 for neural network acceleration, along with its I/O flexibility to seamlessly interface with on-board legacy devices, including sensors and low-end MCUs, for system control.
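The wake-up pattern described above can be sketched in a few lines of hypothetical host-side Python; capture_frame(), detect_keyphrase() and run_full_analytics() are illustrative placeholders, not Lattice APIs:

import random
import time

def capture_frame():
    # Stand-in audio source; a real system would read a microphone buffer.
    return {"energy": random.random()}

def detect_keyphrase(frame):
    # Stand-in for the tiny always-on iCE40 UltraPlus detector (milliwatt class).
    return frame["energy"] > 0.95

def run_full_analytics(frame):
    # Stand-in for the high-performance AP SoC/ASIC woken only on demand.
    print("Waking application processor for full analysis:", frame)

for _ in range(1000):
    frame = capture_frame()
    if detect_keyphrase(frame):       # cheap check runs continuously
        run_full_analytics(frame)     # expensive stage runs only when triggered
    time.sleep(0.01)                  # frame period

The point of the split is power: the cheap detector runs continuously while the expensive stage, and the silicon behind it, stays asleep almost all of the time.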

Updates to the sensAI stack include a new CNN (convolutional neural network) Compact Accelerator IP core for improved accuracy on iCE40 UltraPlus FPGAs and an enhanced CNN Accelerator IP core for improved performance on ECP5 FPGAs. Software tools include an updated neural network compiler with improved ease of use and both Caffe and TensorFlow support for iCE40 UltraPlus FPGAs.
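As a rough illustration of the kind of model such a compiler consumes, here is a tiny CNN defined with TensorFlow’s Keras API; the layer sizes are illustrative and not validated against Lattice’s IP cores:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),                 # small grayscale input
    tf.keras.layers.Conv2D(8, 3, activation="relu"),   # compact convolution
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),    # e.g. keyword vs. other
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()

After training, the saved network, rather than the Python code, is what a flow like the sensAI compiler would map onto the FPGA’s accelerator IP.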
