Introduction
OpenCV could be considered the go-to computer vision framework out there. That does not mean it is the best choice for production deployment, but it certainly is the most flexible solution to start R&D with. Since computer vision has largely moved towards deep learning based solutions with a bit of (classical) pre-processing, it is no surprise that OpenCV provides some DNN capabilities. OpenCV has had the dnn module
for quite a while now. There is also the OpenCV model zoo, which provides pre-trained models under the Apache 2.0 license and therefore allows commercial deployment.
Thoughts on Deep Neural Networks with OpenCV
Before outlining the DNN module and showing some examples, it makes sense to present some more personal views on advantages/disadvantages and potential use cases. NB: everything on this page covers the inference side only. There is no point in even trying to train a neural network within some weird OpenCV DNN-based pipeline - it is not even clear whether this is actually possible, nor how much time it would take to develop.
A key consideration when choosing any solution to deploy a neural network is that it needs to work within certain resource constraints and, more generally speaking, it needs to be fast enough to be useful (latency and throughput).
It does not matter which framework ends up being chosen if it is too slow and resource hungry to be of use.
Generally speaking, the DNN module comes with some advantages and disadvantages, which result in certain “when to use” and “when not to use” scenarios:
Advantages
- easy to use with off-the-shelf models if supported
- good (enough) performance on CPUs
- well integrated into the computer vision framework most people start with (learning, prototyping, etc.)
- can be integrated into pipelines generated/optimized by OpenCV’s Graph API (G-API).
- Android seems to be supported
- optimized for batch size = 1
Disadvantages
- main focus on CPU deployment (yes, there are some integrated GPU/FPGA/NPU and a CUDA backend but that seems not to be the main focus)
- slow
- lack of flexibility
- restricted to a small subset of models
When to use
- first attempts to familiarize ourselves with deep neural networks
- just to see what they are capable of (small amount of source code and no training)
- prototyping and deployment on SBCs (e.g. Raspberry Pi)
- initial evaluation/proof of concept if a model is available via OpenCV’s model zoo
- non-frequent inference
- any deployment case where latency is not that important and not a constant stream is processed
- a pipeline might be called 100 times per hour and it doesn’t matter whether end-to-end inference takes 100 ms or so (this is a market segment CPU manufacturers are aiming at)
When definitely not to use
- deployment of custom model architectures (conversion may fail for layers not supported by OpenCV)
- deployment of DNNs on NVIDIA GPUs -> use the Triton Inference Server or build something custom and really optimized (e.g. using ONNX).
- deployment in scenarios that contain active learning/continuous (self) learning
The DNN module
OpenCV’s DNN module, or rather the (exposed) API, is not very spectacular. Unless we want to write our own module, which would be compiled into OpenCV, there is not much to see or to know (see “How it works”). Anyone who works with e.g. TensorRT, ONNX or PyTorch on the training side (who else remembers TensorFlow 1.x? ;)) might have problems adapting to OpenCV’s DNN module due to the extreme simplicity of the exposed API.
Supported Frameworks
There are a number of frameworks (models exported from them, to be precise) that are supported by OpenCV’s DNN module. These are:
- Caffe
- Darknet
- DLDT (OpenVINO)
- ONNX (commonly used to import PyTorch models)
- TensorFlow
- Torch (yes, Torch and not PyTorch)
ONNX and Torch models are self-contained, meaning weights and model description are available in one file, whereas the other formats keep the model weights and the DNN description in two separate files.
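As a minimal sketch of what this difference looks like in practice (the file names are placeholders, not files shipped with OpenCV):

```cpp
#include <opencv2/dnn.hpp>

// self-contained format: one file holds both the graph and the weights
cv::dnn::Net onnx_net  = cv::dnn::readNetFromONNX("model.onnx");

// two-file format: e.g. Caffe splits the model description and the weights
cv::dnn::Net caffe_net = cv::dnn::readNetFromCaffe("deploy.prototxt", "weights.caffemodel");

// cv::dnn::readNet picks the importer based on the file extension(s)
cv::dnn::Net any_net   = cv::dnn::readNet("model.onnx");
```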
Backends and Targets
OpenCV supports various backends and compute targets within the DNN module. Moreover, there is a backend under development which utilizes the [Ficus programming language](https://github.com/vpisarev/ficus).
Not all backends support all compute targets. Quantization support seems to be somewhat limited but functional in some rudimentary way. Choose the compute target according to your needs; the default is the CPU. Which one is the fastest depends mainly on the model architecture, as architectures are designed or generated (neural architecture search) for different deployment cases, and therefore a model optimized for CPU usage might be slower when deployed on a GPU - especially end to end (full pipeline).
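As a rough sketch, selecting a backend/target pair looks like this (the CUDA combination only works if OpenCV was built with CUDA support; the model file name is a placeholder):

```cpp
#include <opencv2/dnn.hpp>

cv::dnn::Net net = cv::dnn::readNet("model.onnx");

// defaults: the built-in OpenCV backend running on the CPU
net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

// alternatively, if OpenCV was compiled with CUDA support:
// net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
// net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
```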
How it works
The general source code structure of any model used looks a bit like this:
- a model is read/imported using `cv::dnn::readNet` or a framework-specific version (e.g. `cv::dnn::readNetFromONNX`)
- (optional) a backend and target are set (`net.setPreferableBackend`, `net.setPreferableTarget`)
- an image (`cv::Mat`) is prepared for the model using `cv::dnn::blobFromImage`. This takes care of changing the dimensions of an image from “HWC” to “NCHW” to make it compliant with the common format expected by models, and also handles rescaling, changing the color space (BGR2RGB) and normalizing the image as model input.
- the generated “blob” is set as model input (`net.setInput`) and a forward pass is run (`net.forward`)
- the output generated (of type `cv::Mat`) is then post-processed according to whatever a model requires as post-processing
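Putting these steps together, a minimal end-to-end sketch could look like the following. It assumes a classification model in ONNX format with a 224x224 RGB input scaled to [0, 1]; the file names, input size and normalization are placeholders that depend on the actual model.

```cpp
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>

int main() {
    // 1) read/import the model (here: a hypothetical ONNX classifier)
    cv::dnn::Net net = cv::dnn::readNet("model.onnx");

    // 2) (optional) backend and target, defaults shown
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

    // 3) prepare the input: HWC BGR image -> NCHW float blob,
    //    resized to 224x224, scaled to [0, 1], channels swapped to RGB
    cv::Mat image = cv::imread("input.jpg");
    cv::Mat blob = cv::dnn::blobFromImage(image, 1.0 / 255.0, cv::Size(224, 224),
                                          cv::Scalar(), /*swapRB=*/true, /*crop=*/false);

    // 4) set the blob as input and run a forward pass
    net.setInput(blob);
    cv::Mat output = net.forward();

    // 5) model specific post-processing; for a classifier: pick the highest score
    cv::Point class_id;
    double score;
    cv::minMaxLoc(output.reshape(1, 1), nullptr, &score, nullptr, &class_id);
    std::cout << "class " << class_id.x << " score " << score << std::endl;
    return 0;
}
```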
Since the API of the DNN module does not expose too much and some functions are model dependent, there is little use in providing an in-depth description of them. Having a look at example models in the model zoo should be more than enough to answer any questions that could arise, especially since pre- and post-processing are well exposed in these simple examples, as are some model specific classes, sometimes called higher level APIs (e.g. `cv::FaceDetectorYN`, `cv::dnn::TextRecognitionModel`). More examples are shown in the documentation.
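As an example of such a higher level API, a face detection sketch with `cv::FaceDetectorYN` could look like this. It assumes the YuNet ONNX model from the OpenCV model zoo has been downloaded locally (the file names are placeholders):

```cpp
#include <opencv2/objdetect.hpp>   // cv::FaceDetectorYN lives in the objdetect module
#include <opencv2/imgcodecs.hpp>

int main() {
    cv::Mat image = cv::imread("people.jpg");

    // pre- and post-processing are hidden behind the class
    auto detector = cv::FaceDetectorYN::create("face_detection_yunet.onnx", "",
                                               image.size());

    cv::Mat faces;  // one row per detected face: bounding box, landmarks and score
    detector->detect(image, faces);
    return 0;
}
```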