The growing complexity of imaging pipelines continues to surprise us. Far from the simplified view of a camera connecting directly to an AI stage for analysis, a series of computer vision (CV) transformations is now required to condition images (possibly from multiple cameras) before they are ready for AI processing. This need is especially apparent in surveillance camera applications.
Figure 1: Drone technology in agriculture (Source: Ceva)
Surveillance camera demand
Surveillance camera use continues to grow for general applications in security and cost management. Home safety is an obvious example. More generally, protecting small and large stores from the recent trend in flash robberies is growing in importance, both for business safety and to minimize costs and inconvenience for consumers. Drones mounted with multi-spectral cameras (color vision fused with infrared vision, for example) can use their wide-ranging mobility to seek out early signs of wildfire, alerting fire crews to suppress spot fires before they run out of control. Similarly, camera-enabled drones can survey crops to monitor irrigation, fertilization and pest management.
All these applications require high-quality yet compact vision solutions, commonly across multiple cameras and with minimal power budgets. Before AI takes over, important CV conditioning functions must pre-process camera images and videos to meet the latest requirements for surveillance imaging quality, through transformations which cannot be managed efficiently by either a CPU or an NPU.
An interesting opportunity with technical challenges
The market for surveillance cameras is proven and growing. Already 100 million homes use personal smart home surveillance cameras, and over a billion surveillance cameras are in use worldwide. Further, the global surveillance camera market is forecast to grow at a CAGR of 16.8% from 2022 to 2029. Much of this growth is attributed to increasing tech content in these systems, especially in smart devices for remote monitoring.
“Smart” is the AI part, but before getting to AI, these systems must be able to generate unified and rectified views of a scene on which the AI can run. What does this mean? Some cameras use a fisheye lens, able to monitor through 360° but producing highly distorted images. Others use multiple cameras to produce 360° (or wide-angle) views, but then those images must be stitched together seamlessly. You may have seen something like this in recent car models, which can generate an all-around view that seems to be taken from a camera above the car but is in fact built from front, rear and side-view cameras.
Another obvious requirement is support for digital pan, tilt and zoom (PTZ). This is not so hard in an advanced traffic surveillance camera, but it is not a given within the budgets we want for consumer surveillance cameras or drones. Yet we need PTZ just as much, if not more, for interactive support in such applications: to zoom in on a potentially suspicious area or to look around for other possible problems.
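To make the idea concrete, digital PTZ is at heart a crop-and-resample operation. The sketch below shows a minimal version in plain NumPy; the function name, normalized pan/tilt parameters and nearest-neighbor sampling are illustrative assumptions, not any particular camera API. A production pipeline would use bilinear or better interpolation.

```python
import numpy as np

def digital_ptz(frame: np.ndarray, pan: float, tilt: float, zoom: float) -> np.ndarray:
    """Digitally pan/tilt/zoom a single-channel frame.

    pan, tilt: normalized center of interest in [0, 1] (x, y).
    zoom: magnification factor >= 1 (1.0 = full frame).
    Returns a frame of the original size, cropped around the
    requested center and upscaled with nearest-neighbor sampling.
    """
    h, w = frame.shape[:2]
    crop_h, crop_w = int(round(h / zoom)), int(round(w / zoom))
    # Clamp the crop window so it stays fully inside the frame.
    cy = min(max(int(tilt * h), crop_h // 2), h - (crop_h - crop_h // 2))
    cx = min(max(int(pan * w), crop_w // 2), w - (crop_w - crop_w // 2))
    top, left = cy - crop_h // 2, cx - crop_w // 2
    crop = frame[top:top + crop_h, left:left + crop_w]
    # Nearest-neighbor upsample back to the original resolution.
    rows = np.arange(h) * crop_h // h
    cols = np.arange(w) * crop_w // w
    return crop[rows][:, cols]
```

At zoom factor 1.0 the function returns the frame unchanged; at 2.0 it returns the central quarter of the image, magnified to full size.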
These capabilities require additional computer vision processing steps which don’t fit neatly into front-end camera image processing or back-end AI. For the high-quality real-time surveillance imaging essential to accurate detection and minimizing false alarms, these steps demand DSP-class accuracy and throughput.
Figure 2: Dewarping on a mobile device (Source: Ceva)
The Ceva-SensPro2 CV DSP
By way of example, the fisheye corrections mentioned above commonly depend on trigonometric transformations across the entire image field, a task well outside the capabilities of the central CPU inside such surveillance imaging systems. This is a classic CV problem, requiring heavy parallelism in processing across wide vector words, together with support for non-linear functions such as trig or hyperbolic trig. The Ceva-SensPro2 DSP family provides exactly that support in an embedded vision DSP, scalable from 128 to 1024 INT8 MACs (also configurable for INT16 and floating point). It is organized as six highly configurable cores, able to offer significant parallelism where needed for multi-camera applications. It can even serve front-end AI needs up to a few TOPS, for example in basic object detection, reducing the load on a subsequent AI engine.
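As an illustration of why such trigonometric transformations stress a scalar CPU, the sketch below builds a rectilinear-to-fisheye sampling map in vectorized NumPy, assuming a simple equidistant lens model (r = f·θ) and a 180° lens whose image circle is inscribed in the frame. All names and parameters here are illustrative assumptions, not the SensPro2 library API: the point is that every output pixel needs its own trig evaluation, which is only practical with wide vector hardware.

```python
import numpy as np

def fisheye_remap(out_w, out_h, fov_deg, fish_w, fish_h):
    """Build a sampling map from rectilinear output pixels to
    equidistant-fisheye source pixels (r = f * theta model).

    fov_deg: horizontal field of view of the rectilinear output.
    Returns float arrays (map_x, map_y) of shape (out_h, out_w)
    giving, for each output pixel, the fisheye pixel to sample.
    """
    # Rectilinear camera: focal length implied by the desired FOV.
    f_rect = (out_w / 2) / np.tan(np.radians(fov_deg) / 2)
    x = np.arange(out_w) - (out_w - 1) / 2
    y = np.arange(out_h) - (out_h - 1) / 2
    xv, yv = np.meshgrid(x, y)
    r_rect = np.hypot(xv, yv)
    theta = np.arctan2(r_rect, f_rect)  # angle off the optical axis
    # Equidistant fisheye: image radius proportional to theta.
    f_fish = (min(fish_w, fish_h) / 2) / (np.pi / 2)  # 180-degree lens
    r_fish = f_fish * theta
    scale = np.divide(r_fish, r_rect,
                      out=np.zeros_like(r_fish), where=r_rect > 0)
    map_x = xv * scale + (fish_w - 1) / 2
    map_y = yv * scale + (fish_h - 1) / 2
    return map_x, map_y
```

A dewarper would then gather fisheye pixels through this map (with interpolation) to produce the corrected view; on a vector DSP the `arctan2`, `hypot` and divide steps run across whole rows of pixels at once.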
Ceva-SensPro2 is supported by a very rich set of application libraries: across CV in multiple conditioning methods, in SLAM for navigation (in drones and robots, for example) and in conventional neural network AI methods. These are not only math extensions. For example, the CV support of course includes filters and color conversion operations, but also direct support for feature detection and image warp correction through multiple standard algorithms. For PTZ operations, the library offers image transformation functions alongside math and vector processing functions. For AI support, the neural network library functions include tensor manipulation, convolution, pooling, activation and recurrence. All of these run in an open-source framework which ports to SensPro2.
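To give a flavor of what such neural network primitives compute, here is a minimal NumPy sketch of "valid" convolution (cross-correlation, as most NN libraries define it), ReLU activation and 2×2 max pooling. These are naive reference versions for illustration only, not the SensPro2 library implementations, which are vectorized for the DSP's wide MAC arrays.

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 2-D 'valid' convolution (no padding, stride 1)."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product of the kernel with one image patch.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    """Rectified linear activation: clamp negatives to zero."""
    return np.maximum(x, 0)

def max_pool2x2(x):
    """2x2 max pooling; trims odd edges before pooling."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```

Chaining these (convolve, activate, pool) is the basic building block of the detection networks the article mentions; the DSP's job is to keep hundreds of these multiply-accumulates running per cycle.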
Do you really need to add another core to support your pipeline?
This is an obvious question, worthy of a definitive answer. Conceptually it should be clear that functions like de-warping and image stitching will run faster on a dedicated CV engine than on a general-purpose CPU. More telling is customer adoption. Novatek, a Taiwanese public company, announced that they built SensPro2 into their NT98530 chip in support of surveillance applications. They use it for 360° de-warping, 4Mx4M camera stitching, video stabilization, AI scene detection to guide ISP auto-adjustment, RGB plus IR sensor fusion and stereo depth support, all for 4K imaging at 60 frames per second, depending heavily on SensPro2 and its SDK.
There are other, unannounced customers. Clearly, they all concluded that the complexity of the pipeline tasks they had to support demanded high-performance CV conditioning in the pipeline, which should be fairly obvious from Novatek’s range of requirements.
If you have similar needs, give us a call or check out our product page.