Overview

NeuPro-S™ is a low-power AI processor architecture for on-device deep learning inference, imaging and computer vision workloads.

While NeuPro-S provides a self-contained, specialized AI processor, it also supports heterogeneous co-processing with custom AI engines, allowing further customer differentiation and coverage of specific application needs. This lets it fit a broad range of end markets, including IoT, smartphones, surveillance, automotive, robotics, medical and industrial.

NeuPro-S builds on CEVA’s industry-leading position and experience in deep neural networks for computer vision applications. Dozens of customers are already deploying the CEVA-XM4, CEVA-XM6 and NeuPro vision platforms, along with the CDNN Compiler, in consumer, surveillance and ADAS products.

This new AI processor architecture offers performance options ranging from 2 tera operations per second (TOPS) up to 12.5 TOPS per core, and it is fully scalable to more than 100 TOPS in multi-core instantiations. NeuPro-S was designed to meet the most stringent safety compliance standards and comes with a complete complementary software stack, including CDNN, CEVA-CV, the CEVA-SLAM SDK and wide-angle imaging algorithms.
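The per-core and multi-core figures can be related through simple peak-throughput arithmetic. The sketch below is an illustration only. It assumes that each MAC counts as two operations (one multiply, one add), that every MAC unit fires on every cycle and that multi-core throughput scales linearly. The clock frequencies are hypothetical and are not published NeuPro-S specifications, and the helper names (`peak_tops`, `implied_clock_ghz`, `cores_needed`) are made up for this example.

```python
import math

# Assumption: one multiply-accumulate is counted as two operations.
OPS_PER_MAC = 2


def peak_tops(mac_units: int, clock_ghz: float) -> float:
    """Theoretical peak TOPS if every MAC unit completes one MAC per cycle."""
    # mac_units * ops/MAC * (clock_ghz * 1e9 cycles/s) / 1e12 ops per tera-op
    return mac_units * OPS_PER_MAC * clock_ghz / 1000.0


def implied_clock_ghz(mac_units: int, tops: float) -> float:
    """Clock rate (GHz) needed to reach a given peak TOPS figure."""
    return tops * 1000.0 / (mac_units * OPS_PER_MAC)


def cores_needed(target_tops: float, per_core_tops: float) -> int:
    """Cores required for a target, assuming ideal linear multi-core scaling."""
    return math.ceil(target_tops / per_core_tops)


if __name__ == "__main__":
    # Under these assumptions, 12.5 TOPS from 4,096 MACs implies a clock of about 1.53 GHz.
    print(f"{implied_clock_ghz(4096, 12.5):.3f} GHz")
    # Peak for 1,024-, 2,048- and 4,096-MAC configurations at a hypothetical 1 GHz clock.
    for macs in (1024, 2048, 4096):
        print(macs, peak_tops(macs, 1.0))
    # Eight 12.5-TOPS cores reach 100 TOPS only if scaling were perfectly linear.
    print(cores_needed(100, 12.5))
```

These are upper bounds. Sustained throughput on real networks also depends on MAC utilization and memory bandwidth.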

Benefits

The NeuPro AI processor family was designed to lower the high barriers to entry into the AI space, in terms of both architecture and software, by providing an optimized, cost-effective standard AI platform that can serve a multitude of AI-based workloads and applications.

Self-contained, unified imaging, computer vision and AI processor in a single architecture
Unique architecture with 4,096 native 8x8-bit MACs, enabling up to 12.5 TOPS for a single core and 100+ TOPS in multi-core instantiations
System-aware architecture, optimized for memory bandwidth, power and performance efficiency

Main Features

  • The NeuPro-S AI processor consists of the NeuPro-S Engine and the CEVA-XM Vision DSP
    • NeuPro-S Engine - Specialized engines for convolution, activation and pooling layers, as well as weight decompression
    • CEVA-XM6 - Fully programmable vector DSP for complementary NN functions and simultaneous processing of computer vision, imaging and customer-extension workloads
    • Supports a mix of 8-bit and 16-bit quantization, enabling real-time tradeoffs between precision and performance
  • Supports a multi-level memory hierarchy that enables multi-core scalability
  • Optimized DDR bandwidth through weight compression and exploitation of network sparsity
  • Advanced hardware DMA controllers for parallel processing and minimal system overhead
  • The NeuPro-S AI processor architecture includes the following processor options:
    • NPS1000 includes 1024 8x8-bit MAC units
    • NPS2000 includes 2048 8x8-bit MAC units
    • NPS4000 includes 4096 8x8-bit MAC units
  • Supports heterogeneous scalability and co-processing with custom AI engines to enable further customer differentiation
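To illustrate the precision side of the 8-bit versus 16-bit quantization tradeoff, here is a minimal sketch of generic symmetric per-tensor linear quantization. It is a textbook scheme chosen for illustration, not the quantization method used by NeuPro-S or the CDNN compiler, and the function names and random test weights are hypothetical. The worst-case rounding error is half a quantization step, so moving from 8 bits (127 positive levels) to 16 bits (32,767 levels) shrinks that bound by a factor of about 258, at the cost of twice the storage and bandwidth per weight.

```python
import random


def quantize_symmetric(values, bits):
    """Map floats to signed integers with one shared scale (symmetric, per-tensor)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 32767 for 16-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest level and clamp to the symmetric range [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [x * scale for x in q]


def max_abs_error(values, bits):
    """Largest absolute reconstruction error after a quantize/dequantize round trip."""
    q, scale = quantize_symmetric(values, bits)
    return max(abs(a - b) for a, b in zip(values, dequantize(q, scale)))


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical weights, drawn from a normal distribution for illustration.
    weights = [random.gauss(0.0, 0.5) for _ in range(1000)]
    for bits in (8, 16):
        print(f"{bits}-bit max error: {max_abs_error(weights, bits):.2e}")
```

A mixed-precision scheme would apply the 16-bit path only to layers whose accuracy is sensitive to the coarser 8-bit step and keep the rest at 8 bits for throughput.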

Microprocessor Report

Ceva and Synopsys Spin More TOPS

IP Vendors Roll Out Multicore Deep-Learning Accelerators, by Mike Demler (October 7, 2019).

The rapid adoption of machine learning is driving IP vendors to compete by scaling up performance in each new generation of licensable deep-learning accelerators (DLAs).
Last year, DLAs integrating up to 4,096 multiply-accumulate (MAC) units per core set the standard for licensable inference engines.
But advanced driver assistance systems (ADASs) and other high-performance edge devices demand even greater performance, so with their latest products, Ceva and Synopsys aim to shatter that mark.