Introduction
OpenCV could be considered the go-to computer vision framework out there. That does not mean it is the best choice for production deployment, but it certainly is the most flexible solution to start R&D with. Since computer vision has largely moved towards deep learning based solutions with a bit of (classical) pre-processing, it is no surprise that OpenCV provides some DNN capabilities. OpenCV has had the dnn module
for quite a while now. There is also the OpenCV model zoo, which provides pre-trained models under the Apache 2.0 license and therefore allows commercial deployment.
Thoughts on Deep Neural Networks with OpenCV
Before outlining the DNN module and showing some examples, it makes sense to present some more personal views on advantages/disadvantages and potential use cases. NB: everything on this page covers the inference side only. There is no point in even trying to train a neural network within some weird OpenCV DNN-based pipeline - it is not even clear whether this is actually possible, nor how much time it would take to develop.
A key consideration when choosing any solution to deploy a neural network is that it needs to work within certain resource constraints and, more generally speaking, it needs to be fast enough to be useful (latency and throughput).
It does not matter which framework ends up being chosen if it is too slow and resource hungry to be of use.
Generally speaking, the DNN module comes with some advantages and disadvantages, which result in certain “when to use” and “when not to use” scenarios:
Advantages
- easy to use with off-the-shelf models if supported
- good (enough) performance on CPUs
- well integrated into the computer vision framework most people start with (learning, prototyping, etc.)
- can be integrated into pipelines generated/optimized by OpenCV’s Graph API (G-API).
- Android seems to be supported
- optimized for batch size = 1
Disadvantages
- main focus on CPU deployment (yes, there are some integrated GPU/FPGA/NPU and a CUDA backend but that seems not to be the main focus)
- slow
- lack of flexibility
- restricted to a small subset of models
When to use
- first attempts to familiarize ourselves with deep neural networks
- just to see what they are capable of (small amount of source code and no training)
- prototyping and deployment on SBCs (e.g. Raspberry Pi)
- initial evaluation/proof of concept if a model is available via OpenCV’s model zoo
- non-frequent inference
- any deployment case where latency is not that important and not a constant stream is processed
- a pipeline might be called 100 times per hour and it doesn’t matter whether end-to-end inference takes 100 ms or so (this is a market segment CPU manufacturers are aiming at)
When definitely not to use
- deployment of custom model architectures (conversion may fail for layers not supported by OpenCV)
- deployment of DNNs on NVIDIA GPUs -> use the Triton Inference Server or build something custom and really optimized (e.g. using ONNX).
- deployment in scenarios that contain active learning/continuous (self) learning
The DNN module
OpenCV’s DNN module, or rather the (exposed) API, is not very spectacular. Unless we want to write our own module, which would be compiled into OpenCV, there is not much to see or to know (see “How it works”). Anyone who works with e.g. TensorRT, ONNX or PyTorch on the training side (who else remembers TensorFlow 1.x? ;)) might have problems adapting to OpenCV’s DNN module due to the extreme simplicity of the exposed API.
Supported Frameworks
There are a number of frameworks (models exported from them, to be precise) that are supported by OpenCV’s DNN module. These are:
- Caffe
- Darknet
- DLDT (OpenVINO)
- ONNX (commonly used to import PyTorch models)
- TensorFlow
- Torch (yes, Torch and not PyTorch)
ONNX and Torch models are self-contained, meaning weights and model description are available in one file, whereas the other formats keep the model weights and the DNN description in two separate files.
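As a minimal sketch of what this difference looks like in practice (the file names are placeholders, not files shipped with OpenCV):

```cpp
#include <opencv2/dnn.hpp>

// self-contained format: one file holds both the graph and the weights
cv::dnn::Net onnx_net  = cv::dnn::readNetFromONNX("model.onnx");

// two-file format: e.g. Caffe splits the model description and the weights
cv::dnn::Net caffe_net = cv::dnn::readNetFromCaffe("deploy.prototxt", "weights.caffemodel");

// cv::dnn::readNet picks the importer based on the file extension(s)
cv::dnn::Net any_net   = cv::dnn::readNet("model.onnx");
```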
Backends and Targets
OpenCV supports various backends and compute targets within the DNN module. Moreover, there is a backend under development which utilizes the [Ficus programming language](https://github.com/vpisarev/ficus).
Not all backends support all compute targets. Quantization support seems to be somewhat limited but functional in some rudimentary way. Choose the compute target according to your needs; the default is the CPU. Which one is the fastest depends mainly on the model architecture, as architectures are designed or generated (neural architecture search) for different deployment cases, and therefore a model optimized for CPU usage might be slower when deployed on a GPU - especially end to end (full pipeline).
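As a rough sketch, selecting a backend/target pair looks like this (the CUDA combination only works if OpenCV was built with CUDA support; the model file name is a placeholder):

```cpp
#include <opencv2/dnn.hpp>

cv::dnn::Net net = cv::dnn::readNet("model.onnx");

// defaults: the built-in OpenCV backend running on the CPU
net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

// alternatively, if OpenCV was compiled with CUDA support:
// net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
// net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
```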
How it works
The general source code structure of any model used looks a bit like this:
- a model is read/imported using `cv::dnn::readNet` or a framework-specific version (e.g. `cv::dnn::readNetFromONNX`)
- (optional) a backend and target are set (`net.setPreferableBackend`, `net.setPreferableTarget`)
- an image (`cv::Mat`) is prepared for the model using `cv::dnn::blobFromImage`. This takes care of changing the dimensions of an image from “HWC” to “NCHW” to make it compliant with the common format expected by models, and also handles rescaling, changing the color space (BGR2RGB) and normalizing the image as model input.
- the generated “blob” is set as model input (`net.setInput`) and a forward pass is run (`net.forward`)
- the output generated (of type `cv::Mat`) is then post-processed according to whatever a model requires as post-processing
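Putting these steps together, a minimal end-to-end sketch could look like the following. It assumes a classification model in ONNX format with a 224x224 RGB input scaled to [0, 1]; the file names, input size and normalization are placeholders that depend on the actual model.

```cpp
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>

int main() {
    // 1) read/import the model (here: a hypothetical ONNX classifier)
    cv::dnn::Net net = cv::dnn::readNet("model.onnx");

    // 2) (optional) backend and target, defaults shown
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

    // 3) prepare the input: HWC BGR image -> NCHW float blob,
    //    resized to 224x224, scaled to [0, 1], channels swapped to RGB
    cv::Mat image = cv::imread("input.jpg");
    cv::Mat blob = cv::dnn::blobFromImage(image, 1.0 / 255.0, cv::Size(224, 224),
                                          cv::Scalar(), /*swapRB=*/true, /*crop=*/false);

    // 4) set the blob as input and run a forward pass
    net.setInput(blob);
    cv::Mat output = net.forward();

    // 5) model specific post-processing; for a classifier: pick the highest score
    cv::Point class_id;
    double score;
    cv::minMaxLoc(output.reshape(1, 1), nullptr, &score, nullptr, &class_id);
    std::cout << "class " << class_id.x << " score " << score << std::endl;
    return 0;
}
```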
Since the API of the DNN module does not expose too much and some functions are model dependent, there is little use in providing an in-depth description of them. Having a look at example models in the model zoo should be more than enough to answer any questions that could arise, especially since pre- and post-processing are well exposed in these simple examples, as are some model specific classes, sometimes called higher level APIs (e.g. `cv::FaceDetectorYN`, `cv::dnn::TextRecognitionModel`). More examples are shown in the documentation.
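As an example of such a higher level API, a face detection sketch with `cv::FaceDetectorYN` could look like this. It assumes the YuNet ONNX model from the OpenCV model zoo has been downloaded locally (the file names are placeholders):

```cpp
#include <opencv2/objdetect.hpp>   // cv::FaceDetectorYN lives in the objdetect module
#include <opencv2/imgcodecs.hpp>

int main() {
    cv::Mat image = cv::imread("people.jpg");

    // pre- and post-processing are hidden behind the class
    auto detector = cv::FaceDetectorYN::create("face_detection_yunet.onnx", "",
                                               image.size());

    cv::Mat faces;  // one row per detected face: bounding box, landmarks and score
    detector->detect(image, faces);
    return 0;
}
```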