degirum
PySDK - DeGirum AI Inference Software Development Kit
Version 0.2.1
Copyright DeGirum Corp. 2022
The DeGirum AI Inference Software Development Kit, PySDK, is a Python package which provides APIs to perform inference on ML models.
PySDK consists of the following components:
- PySDK library, which includes the following essential modules:
  - `degirum.zoo_manager` module - AI model zoo management class
  - `degirum.model` module - AI model management classes
  - `degirum.postprocessor` module - AI inference result handling classes
- The `CoreClient` extension module implementing APIs to work with DeGirum AI accelerator cards installed locally
- The `aiclient` extension module implementing APIs to work with DeGirum AI servers running on remote systems
- The `degirum.server` module - a helper module that allows launching an AI server on local AI hardware to be used by remote clients
The PySDK package provides the necessary tools to support the following use cases:
- Running AI inferences on the local system with DeGirum AI accelerator hardware installed on that system
- Deploying a system with installed DeGirum AI accelerator hardware as an AI server
- Connecting to one or several AI servers and performing AI inferences on those servers remotely
Installation
Basic Installation of PySDK Python Package
It is recommended to install PySDK in a virtual environment. Please see PySDK Supported Configurations page for a list of supported system configurations.
To install DeGirum PySDK from the DeGirum index server, use the following command:

```
python -m pip install degirum --extra-index-url https://degirum.github.io/simple
```

To force reinstallation of the most recent PySDK version without reinstalling all dependencies, use the following command:

```
python -m pip install degirum --upgrade --no-deps --force-reinstall --extra-index-url https://degirum.github.io/simple
```
Kernel Driver Installation
Linux Kernel Driver Installation
Hosts that have a DeGirum Orca card installed need a kernel driver to enable its functionality. The driver is distributed as a source package via the DeGirum APT repository and is built automatically after download.

Add the following line to `/etc/apt/sources.list` to register the DeGirum APT repository:

```
deb-src https://degirum.github.io/apt-repo ORCA main
```

Download the DeGirum public key by running the following command:

```
wget -O - -q http://degirum.github.io/apt-repo/DeGirum.gpg.key | sudo apt-key add -
```

Update package information from the configured sources and download the prerequisites by running the following commands:

```
sudo apt update
sudo apt install dpkg-dev debhelper
```

Finally, download and build the DeGirum Linux driver package by running the following command:

```
sudo apt-get --build source orca-driver
```
Kernel Driver Installation for Other Operating Systems
The current version of the DeGirum software package provides a kernel driver only for Linux. Kernel driver support for other operating systems is under development. This document will be updated when kernel drivers for other operating systems are released.
Quick Start
Note: This quick start guide covers the Local inference use case when you run AI inferences on the local host with DeGirum AI accelerator hardware installed on this local host as an option. See "System Configuration for Specific Use Cases" section below for more use cases.
To start working with PySDK, you import the degirum package:

```python
import degirum as dg
```
The main PySDK entry point is the `degirum.connect_model_zoo` function, which creates and returns a `degirum.zoo_manager.ZooManager` zoo manager object:

```python
zoo = dg.connect_model_zoo()
```
By default, zoo manager automatically connects to DeGirum public model zoo located in the cloud, and you have free access to all AI models from this public model zoo.
To see the list of all available AI models, use the `degirum.zoo_manager.ZooManager.list_models` method. It returns a list of strings, where each string is a model name:

```python
model_list = zoo.list_models()
print(model_list)
```
If you want to perform AI inference using some model, you need to load it using the `degirum.zoo_manager.ZooManager.load_model` method. You provide the model name as a method argument. The model name should be one of the model names returned by the `list_models()` method, for example:

```python
model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1")
```
The `load_model()` method returns a `degirum.model.Model` object, which can be used to perform AI inferences.
Before performing AI inferences, you may want to adjust some model parameters. The model class has a list of parameters which can be modified during runtime. These model parameters affect how the input data is pre-processed and how the inference results are post-processed. All model parameters have reasonable default values, so in the beginning you may skip this step.
Some commonly used model parameters are:

Property Name | Description | Possible Values
---|---|---
`degirum.model.Model.image_backend` | image processing package to be used | `"auto"`, `"pil"`, or `"opencv"`; `"auto"` tries PIL first
`degirum.model.Model.input_pad_method` | how the input image will be padded when resized | `"stretch"` or `"letterbox"`
`degirum.model.Model.output_confidence_threshold` | confidence threshold to reject results with low scores | Float value in [0..1] range
`degirum.model.Model.output_nms_threshold` | rejection threshold for non-max suppression | Float value in [0..1] range
`degirum.model.Model.overlay_color` | color to draw AI results | Tuple in (R,G,B) format
`degirum.model.Model.overlay_font_scale` | font scale to print AI results | Float value
`degirum.model.Model.overlay_show_labels` | True to show class labels when drawing AI results | True/False
`degirum.model.Model.overlay_show_probabilities` | True to show class probabilities when drawing AI results | True/False
For the complete list of model parameters see section "Model Parameters" below.
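For example, a minimal sketch of adjusting a few of these properties on a loaded model before running inferences (the chosen values are arbitrary illustrations, not recommendations):

```python
model.image_backend = "opencv"            # use OpenCV for pre-processing and overlay drawing
model.input_pad_method = "letterbox"      # preserve aspect ratio when resizing the input image
model.output_confidence_threshold = 0.5   # reject results with scores below 0.5
model.overlay_show_probabilities = True   # print class probabilities on the overlay image
```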
Now you are ready to perform AI inference. To do inference you either invoke degirum.model.Model.predict
method
or simply call the model, supplying the input image as an argument. The inference result is returned.
A model may accept input images in various formats:
- as a string containing the file name of the image file on the local file system:

    ```python
    result = model("./images/TwoCats.jpg")
    ```

- as a string containing the URL of the image file:

    ```python
    result = model("https://degirum.github.io/images/samples/TwoCats.jpg")
    ```

- as a PIL image object:

    ```python
    from PIL import Image
    image = Image.open("./images/TwoCats.jpg")
    result = model(image)
    ```

- as a numpy array (for example, returned by OpenCV):

    ```python
    import cv2
    image = cv2.imread("./images/TwoCats.jpg")
    model.input_numpy_colorspace = "BGR"  # set colorspace to match the OpenCV-produced numpy array
    result = model(image)
    ```
The result object returned by the model (an object derived from degirum.postprocessor.InferenceResults
class)
contains the following information:
- numeric inference results
- graphical inference results
- original image
Numeric results can be accessed by degirum.postprocessor.InferenceResults.results
property. This property returns a
list of result dictionaries, one dictionary per detected object or class. The format of this dictionary is model
dependent. For example, to iterate over all classification model inference results you may do this:
```python
for r in result.results:
    print(f"Detected class: {r['label']} with probability {r['score']}")
```
Tip: if you just print your inference result object, all the numeric results will be pretty-printed in YAML format:

```python
print(result)
```
Graphical results can be accessed via the `degirum.postprocessor.InferenceResults.image_overlay` property. This property returns a graphical object containing the original image with all inference results drawn over it. The graphical object type depends on the graphical package specified by the model `image_backend` property (if you omit it, PIL is used when installed, otherwise OpenCV). Once you get this object, you may display it, print it, or save it to a file using the graphical package of your choice. For example, for PIL:

```python
result.image_overlay.save("./images/TwoCatsResults.jpg")
result.image_overlay.show()
```
And the original image can be accessed via the `degirum.postprocessor.InferenceResults.image` property, which returns a graphical object whose type again depends on the graphical package specified by the model `image_backend`.
System Configuration for Specific Use Cases
The PySDK package can be used in the following use cases:
- Local inference: running AI inferences on the local host with DeGirum AI accelerator hardware installed on this local host as an option
- AI Server inference: running AI inferences on remote AI server host with DeGirum AI accelerator hardware installed on that remote host
- AI Server hosting: configuring the host with DeGirum AI accelerator hardware as AI server to be used for remote AI inferences
The following sections provide step-by-step instructions on how to set up the system for each particular use case.
Configuration for Local Inference
- Install the PySDK package as described in the "Basic Installation of PySDK Python Package" section above.
- If your system is equipped with DeGirum AI accelerator hardware, install the kernel driver as described in the "Kernel Driver Installation" section above.

    Note: If your system is not equipped with any AI accelerator hardware, the set of models available for local inference will be limited to CPU models only.

- If you plan to operate with the DeGirum public model zoo, follow the instructions described in the "Quick Start" section above.
- If you plan to operate with a locally deployed model, you need to provide the full path to the model JSON file when calling `connect_model_zoo`:

    ```python
    zoo = dg.connect_model_zoo("full/path/to/model.json")
    ```

    Refer to the "Model Zoo Manager" section below for the explanation of how to open a model zoo with a particular local model.

- If you plan to operate with a private model zoo, you need to provide the zoo URL and zoo access token when calling `connect_model_zoo`:

    ```python
    zoo = dg.connect_model_zoo("https://my.private.zoo", token="@my#secret%token1234")
    ```
Configuration for AI Server Inference
- Install the PySDK package as described in the "Basic Installation of PySDK Python Package" section above.
- Follow the instructions described in the "Quick Start" section above, with the exception that when calling `connect_model_zoo` you need to provide the hostname or IP address of the AI server you want to use for AI inference:

    ```python
    zoo = dg.connect_model_zoo("192.168.0.118")
    ```
Configuration for AI Server Hosting
- Install the kernel driver as described in "Kernel Driver Installation" section above.
- Follow instructions provided in "Configuring and Launching AI Server" section below.
Model Zoo Manager
A model zoo is a collection of AI models.
Depending on the deployment location, there are several types of model zoos supported by PySDK:
- cloud model zoo: deployed on the cloud resource (for example GitHub repository);
- AI server model zoo: deployed on the DeGirum AI server host;
- local model zoo: deployed on the local file system of PySDK installation host.
You connect to the model zoo by calling degirum.connect_model_zoo
function. You supply the model zoo
identification string (called URL string) as a function parameter. The function returns the
Model Zoo manager object which allows you to access a collection of models from that model zoo.
The type of the model zoo is defined by the URL string which you pass as a parameter to the
degirum.connect_model_zoo
function. The following cases are supported:
- `None` or no URL specified. In this case, the connection to the DeGirum public cloud model zoo is established.
- A string which starts with the `"https://"` prefix. Such a URL string is treated as the URL of a private cloud model zoo. In this scenario, the connection to that model zoo is established using the security token defined by the `token` parameter.
- A string which defines an Internet host name or IP address. Such a URL string is treated as the address of a remote DeGirum AI server host serving an AI server model zoo (refer to `degirum.server` for more info). In this case, an attempt to connect to that AI server is made. In the case of an unsuccessful connection, an exception is raised.
- A string which defines a local file path to a `".json"` file. Such a URL string is treated as a single-file local model zoo. The `".json"` file must be a valid DeGirum model configuration file, otherwise an exception is raised.

    Note: this option is mostly used for testing/debugging new models during model development, which are not yet released in any model zoo.
In the case of the cloud model zoo, if for whatever reason a connection to that cloud zoo fails, it is not considered as an error, and no exception is raised. Instead the Model Zoo manager object continues to work with the cached content of that cloud zoo, which is stored in the cache subdirectory associated with that cloud zoo. If the very first connection to the cloud zoo fails and there is no cached content, such zoo is treated as an empty zoo with no models. In the case of successful connection, the list of models is downloaded from the cloud zoo and placed in the cache subdirectory. This cache mechanism allows you to work with cloud zoos in offline mode: when online, PySDK downloads to the cache all models which you used for inference, then, going offline, you continue using models, downloaded when online (see section "Loading AI Models" for details on loading models for inference).
Cache subdirectories are maintained per each cloud zoo URL. Cache subdirectories' root location is operating system specific:
- For Windows it is `%APPDATA%/degirum`
- For Linux it is `~/.local/share/degirum`
The Model Zoo manager object allows you to perform the following activities:
- list and search models available in the connected model zoo;
- create AI model handling objects to perform AI inferences;
- request various AI model parameters;
Listing and Searching AI Models
AI model is represented in a model zoo by a set of files stored in the model subdirectory which is unique for each model. Each model subdirectory contains the following model files:
Model File | Description
---|---
`<model name>.json` | JSON file containing all model parameters. The name of this file is the name of the model. This file is mandatory.
`<model name>.n2x` | DeGirum Orca binary file containing the model. This file is mandatory for DeGirum Orca models.
`<model name>.tflite` | TensorFlow Lite binary file containing the model. This file is mandatory for TFLite models.
`<class dictionary>.json` | JSON file containing class labels for classification or detection models. This file is optional.
To obtain the list of available AI models, you may use degirum.zoo_manager.ZooManager.list_models
method.
This method accepts arguments which specify the model filtering criteria. All the arguments are optional.
If a certain argument is omitted, then the corresponding filtering criterion is not applied.
The following filters are available:
Argument | Description | Possible Values
---|---|---
`model_family` | Model family name filter. Used as a search substring in the model name | Any valid substring like `"yolo"`, `"mobilenet"`
`device` | Inference device filter: a string or a list of strings of device names | `"orca"`: DeGirum Orca device, `"cpu"`: host CPU, `"edgetpu"`: Google EdgeTPU device
`precision` | Model calculation precision filter: a string or a list of strings of model precision labels | `"quant"`: quantized model, `"float"`: floating-point model
`pruned` | Model density filter: a string or a list of strings of model density labels | `"dense"`: dense model, `"pruned"`: sparse/pruned model
`runtime` | Runtime agent type filter: a string or a list of strings of runtime agent types | `"n2x"`: DeGirum N2X runtime, `"tflite"`: Google TFLite runtime
The method returns a list of model name strings. These model name strings are to be used later when you load AI models for inference.
When you work with the cloud zoo but you are offline, you still can list models. In this case, the model list will be taken from the cache subdirectory associated with that cloud model zoo.
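For example, a sketch of combining the filter arguments listed above to narrow the search (the filter values are illustrative):

```python
# list only quantized MobileNet-family models compiled for DeGirum Orca devices
orca_models = zoo.list_models(model_family="mobilenet", device="orca", precision="quant")
print(orca_models)
```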
The degirum.zoo_manager.ZooManager.list_models
method returns the list of models, which was requested at the time
you connected to a model zoo by calling degirum.connect_model_zoo
. This list of models is then stored inside the
Model Zoo manager object, so subsequent calls to list_models
method would quickly return the model list without
connecting to a remote model zoo. If you suspect that the remote model zoo contents changed, then to update the model
list you need to create another instance of Zoo Manager object by calling degirum.connect_model_zoo
.
Loading AI Models
Once you obtained the AI model name string, you may load this model for inference by calling
degirum.zoo_manager.ZooManager.load_model
method and supplying the model name string as its argument.
For example:
```python
model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1")
```

If a model with the supplied name string is found, the `load_model()` method returns a model handling object of the `degirum.model.Model` class. Otherwise, it throws an exception.
If you load the model from the cloud model zoo, this model will be downloaded first and stored in the cache subdirectory associated with that cloud model zoo. If the model already exists in the cache, it will be loaded from that cache but only if the cached model checksum matches the model checksum in the cloud zoo. If checksums do not match, the model from the cloud zoo will be downloaded again into the cache. When offline, you still can work with models stored in the cache. But when you try to load a model, which is not in the cache, you will get an exception.
Note: The Model Zoo manager does not provide any explicit method to download a model from the cloud zoo: the model is downloaded automatically when possible, if the model is not in the cache or the cached model checksum does not match the model checksum in the cloud zoo. However, the `degirum.server` module provides the `degirum.server.download_models` function to explicitly download the whole cloud model zoo to a local directory (see more in the "Configuring and Launching AI Server" section below).
If you load the model from the AI server model zoo, the command to load the model will be sent to the AI server: the connected AI server will handle all model loading actions remotely.
Note: The AI server lists the models that it serves, and you can only load those models: the job of managing the remote AI server model zoo is not handled by the Model Zoo manager class and should be done in a different way. Please refer to the "Configuring and Launching AI Server" section for details.

If you load the model from the local model zoo, it will be loaded from the model file referenced by the local model zoo URL.
Note: single-model local model zoo is intended to be used for testing/debugging new models during model development. It assumes that you have all necessary tools for model creation and are able to create and compile new models. This document does not provide details of this development process.
Running AI Model Inference
Once you loaded an AI model and obtained model handling object, you can start doing AI inferences on your model.
The following methods of degirum.model.Model
class are available to perform AI inferences:
- `degirum.model.Model.predict` and `degirum.model.Model.__call__` to run prediction of a single data frame
- `degirum.model.Model.predict_batch` to run predictions of a batch of frames
- `degirum.model.Model.predict_dir` to run predictions of multiple files in a directory
The `predict()` and `__call__` methods behave exactly the same way (actually, `__call__` just calls `predict()`). They accept a single argument - the input data frame - perform AI inference on that data frame, and return the inference result: an object derived from the `degirum.postprocessor.InferenceResults` superclass.
The batch prediction methods, `predict_batch()` and `predict_dir()`, perform predictions of multiple frames in a pipelined manner, which is more efficient than just calling the `predict()` method in a loop. These methods are described in detail in the "Batch Inferences" section below.
Input Data Handling
PySDK model prediction methods support different types of input data. An exact input type depends on the model to be used. The following input data types are supported:
- image input data
- audio input data
- raw tensor input data
The input data object you supply to model predict methods also depends on the number of inputs the model has. If the model has single data input, then the data objects you pass to model predict methods are single objects. If the model has multiple data inputs, then the data objects you pass to model predict methods are lists of objects: one object per corresponding input.
The number and the type of inputs of the model are described by the InputType
property of the ModelInfo
class
returned by degirum.model.Model.model_info
property (see section "Model Info" for details about model
info properties). The InputType
property returns the list of input data types, one type per model input.
So the number of model inputs can be deduced by evaluating the length of the list returned by the InputType
property.
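For instance, a minimal sketch of deducing the number of model inputs this way:

```python
info = model.model_info            # get a copy of the model attributes
print(info.InputType)              # e.g. ["Image"] for a single-input image model
num_inputs = len(info.InputType)   # one input type entry per model input
print(f"Model has {num_inputs} input(s)")
```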
The following sections describe details of input data handling for various model input types.
Image Input Data Handling
When dealing with model inputs of image type (InputType
is equal to "Image"
), the PySDK model prediction
methods accept a wide variety of input data frame types:
- the input frame can be the name of a file with frame data;
- it can be an HTTP URL pointing to a file with frame data;
- it can be a numpy array with frame data;
- it can be a PIL `Image` object;
- it can be a `bytes` object containing raw frame data.
An AI model requires particular input tensor dimensions and data type which, in most of the cases, does not match the dimensions of the input frame. In this case, PySDK performs automatic conversion of the input frame to the format compatible with AI model input tensor, performing all the necessary conversions such as resizing, padding, colorspace conversion, and data type conversion.
PySDK performs input frame transformations using one of the two graphical packages (called backends): PIL or OpenCV.
The backend is selected by degirum.model.Model.image_backend
property. By default it is set to auto
, meaning
that PIL backend will be used first, and if it is not installed, then OpenCV backend will be used.
You may explicitly select which backend to use by assigning either "pil"
or "opencv"
to
degirum.model.Model.image_backend
property.
Note: In case of OpenCV backend, you cannot pass PIL Image objects to model predict methods.
If your input frame is stored in a file on the local filesystem, or is accessible via the HTTP protocol, pass the filename string or URL string directly to the model predict methods: PySDK will (down)load the file, decode it, and convert it to the model input tensor format. The set of supported graphical file formats is defined solely by the graphical backend library you selected, PIL or OpenCV - PySDK does not perform any decoding of its own.
Sometimes, image conversion to AI model input tensor format requires image resizing. This resizing can be done in two possible ways:
- preserving the aspect ratio;
- not preserving the aspect ratio.
You can control the way of image resizing by degirum.model.Model.input_pad_method
property, which has two possible
values: "stretch"
or "letterbox"
. When you select "stretch"
method, the input image is resized exactly to the
AI model input tensor dimensions, possibly changing the aspect ratio. When you select "letterbox"
method (default way),
the aspect ratio is preserved. The voids which can appear on the image sides are filled with the color specified by
degirum.model.Model.input_letterbox_fill_color
property (black by default).
You can specify the resize algorithm in the `degirum.model.Model.input_resize_method` property, which may have the following values: `"nearest"`, `"bilinear"`, `"area"`, `"bicubic"`, or `"lanczos"`. These values specify various interpolation algorithms used for resizing.
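A short sketch of configuring these pre-processing properties on a loaded model (the chosen values are only an illustration, not recommendations):

```python
model.input_pad_method = "letterbox"                 # preserve the aspect ratio when resizing
model.input_letterbox_fill_color = (114, 114, 114)   # gray padding instead of the default black
model.input_resize_method = "bilinear"               # interpolation algorithm used for resizing
```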
In case your input frames are stored in numpy arrays, you may need to tell PySDK the order of colors in those numpy arrays: RGB or BGR. This order is called the colorspace. By default, PySDK treats numpy arrays as having the RGB colorspace. So if your numpy arrays use the RGB color order, then no additional action is needed from your side. But if your numpy arrays have the color order opposite to the default, then you need to change the `degirum.model.Model.input_numpy_colorspace` property.
Note: If a model has multiple image inputs, the PySDK applies the same input_***
image properties as discussed above
for every image input of a model.
Audio Input Data Handling
When dealing with model inputs of audio type (`InputType` is equal to `"Audio"`), PySDK does not perform any conversions of the input data: it expects a 1-D numpy array with audio waveform samples of the proper size and with the proper sampling rate. The waveform size should be equal to the `InputWaveformSize` model info property. The waveform sampling rate should be equal to the `InputSamplingRate` model info property. And finally, the data element type should be equal to the data type specified by the `InputRawDataType` model info property.
All aforementioned model info properties are the properties of the ModelInfo
class returned by
degirum.model.Model.model_info
property (see section "Model Info" for details).
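A minimal sketch of preparing an audio frame for such a model, assuming an audio-input model is already loaded into `model` and that the pre-processing attributes are per-input lists as described in the "Model Info" section:

```python
import numpy as np

info = model.model_info
waveform_size = info.InputWaveformSize[0]  # expected number of samples for the first input
# dtype selection is a simplified illustration; match it to the InputRawDataType value
dtype = np.int16 if info.InputRawDataType[0] == "DG_INT16" else np.float32

waveform = np.zeros(waveform_size, dtype=dtype)  # placeholder silence; fill with real samples
result = model(waveform)
```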
Tensor Input Data Handling
When dealing with model inputs of raw tensor type (InputType
is equal to "Tensor"
), PySDK expects that you
provide a 4-D numpy array of proper dimensions.
The dimensions of that array should match model input dimensions as specified by the following model info properties:
- `InputN` for dimension 0,
- `InputH` for dimension 1,
- `InputW` for dimension 2,
- `InputC` for dimension 3.
The data element type should be equal to the data type specified by the InputRawDataType
model info property
(see section "Model Info" for details).
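For example, a sketch of building a correctly shaped dummy tensor from the model info, assuming a tensor-input model is loaded into `model` (the zero data is only a placeholder, and per-input list indexing follows the "Model Info" note):

```python
import numpy as np

info = model.model_info
shape = (info.InputN[0], info.InputH[0], info.InputW[0], info.InputC[0])  # NHWC dimensions
# dtype selection is a simplified illustration; match it to the InputRawDataType value
dtype = np.uint8 if info.InputRawDataType[0] == "DG_UINT8" else np.float32

tensor = np.zeros(shape, dtype=dtype)  # dummy tensor of the proper shape and element type
result = model(tensor)
```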
Inference Results
All model predict methods return result objects derived from degirum.postprocessor.InferenceResults
class.
Particular class types of result objects depend on the AI model type: classification, object detection, pose detection
etc. But from the user point of view, they deliver identical functionality.
The result object contains the following data:

- `degirum.postprocessor.InferenceResults.image` property keeps the original image;
- `degirum.postprocessor.InferenceResults.image_overlay` property keeps the original image with inference results drawn on top; the type of such drawing is model-dependent:
    - for classification models, the list of class labels with probabilities is printed below the original image;
    - for object detection models, bounding boxes of detected objects are drawn on the original image;
    - for pose detection models, detected keypoints and keypoint connections are drawn on the original image;
- `degirum.postprocessor.InferenceResults.results` property keeps a list of numeric results (follow the property link for a detailed explanation of all result formats);
- `degirum.postprocessor.InferenceResults.image_model` property keeps the binary array with image data converted to the AI model input specifications. This property is assigned only if you set the `degirum.model.Model.save_model_image` model property before performing predictions.
The results
property is what you typically use for programmatic access to inference results. The type of results
is
always a list of dictionaries, but the format of those dictionaries is model-dependent.
Also, if the result contains coordinates of objects, all such coordinates are recalculated from the model coordinates
back to coordinates on the original image, so you can use them directly.
The image_overlay
property is very handy for debugging and troubleshooting. It allows you to quickly assess the correctness
of the inference results in graphical form.
There are result properties which affect how the overlay image is drawn:
- `degirum.postprocessor.InferenceResults.overlay_alpha`: transparency value (alpha-blend weight) for all overlay details;
- `degirum.postprocessor.InferenceResults.overlay_font_scale`: font scaling factor for overlay text;
- `degirum.postprocessor.InferenceResults.overlay_line_width`: line width in pixels for overlay lines;
- `degirum.postprocessor.InferenceResults.overlay_color`: RGB color tuple for drawing all overlay details;
- `degirum.postprocessor.InferenceResults.overlay_show_labels`: flag to enable drawing class labels of detected objects;
- `degirum.postprocessor.InferenceResults.overlay_show_probabilities`: flag to enable drawing probabilities of detected objects;
- `degirum.postprocessor.InferenceResults.overlay_fill_color`: RGB color tuple for filling voids.
When each individual result object is created, all these overlay properties (except `overlay_fill_color`) are assigned the values of the similarly named properties taken from the model object (see the "Model Parameters" section below for the list of model properties). This allows you to assign overlay property values only once and apply them to all consecutive results. But if you want to play with an individual result, you may reassign any of the overlay properties and then re-read the `image_overlay` property. Each time you read `image_overlay`, it returns a new image object freshly drawn according to the current values of the overlay properties.

Note: `overlay_fill_color` is assigned the value of `degirum.model.Model.input_letterbox_fill_color`.
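For instance, a sketch of tweaking overlay properties on a single result before re-reading the overlay image:

```python
result.overlay_show_probabilities = True  # show scores for this particular result
result.overlay_alpha = 1.0                # draw overlay details fully opaque
result.overlay_line_width = 2             # use thinner lines for this overlay
annotated = result.image_overlay          # freshly drawn overlay with the new settings
```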
Batch Inferences
If you need to process multiple frames using the same model and the same settings, the most efficient way to do so is to use the batch prediction methods of the `degirum.model.Model` class:
- `degirum.model.Model.predict_batch` method to run predictions on a list of frames;
- `degirum.model.Model.predict_dir` method to run predictions on files in a directory.
Both methods perform predictions of multiple frames in a pipelined manner, which is more efficient than just calling the `predict()` method in a loop.

Both methods return a generator object, so you can iterate over inference results. This allows you to directly use the result of batch prediction methods in for-loops, for example:

```python
for result in model.predict_batch(['image1.jpg', 'image2.jpg']):
    print(result)
```
Note: Since batch prediction methods return generator object, simple assignment of batch prediction method result to some variable does not start any inference. Only iterating over that generator object does.
The `predict_batch` method accepts a single parameter: an iterator object, for example, a list. You populate your iterator object with the same type of data you pass to the regular `predict()` method, i.e. input image path strings, input image URL strings, numpy arrays, or PIL Image objects (in case of the PIL image backend).
The `predict_dir` method accepts a filepath to a directory containing graphical files for inference. You may supply the optional `extensions` parameter passing the list of file extensions to process.
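A short usage sketch of both batch methods; the file names, directory path, and extension list values are illustrative assumptions:

```python
# run pipelined inference on an explicit list of image files
for result in model.predict_batch(["cat.jpg", "dog.jpg", "bird.jpg"]):
    print(result)

# run pipelined inference on graphical files in a directory, limited to the given extensions
for result in model.predict_dir("./images", extensions=[".jpg", ".png"]):
    print(result)
```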
Model Parameters
The model behavior can be controlled with various Model
class properties, which define model parameters.
They can be divided into the following categories:
- parameters, which control how to handle input frames;
- parameters, which control the inference;
- parameters, which control how to display inference results;
- parameters, which control model run-time behavior and provide access to model information
The following table provides complete summary of Model
class properties arranged by categories.
Property Name | Description | Possible Values | Default Value
---|---|---|---
Input Handling Parameters | | |
`image_backend` | package to be used for image processing | `"auto"`, `"pil"`, or `"opencv"`; `"auto"` tries PIL first | `"auto"`
`input_letterbox_fill_color` | image fill color in case of "letterbox" padding | 3-element tuple of RGB color | `(0,0,0)`
`input_numpy_colorspace` | colorspace for numpy arrays | `"RGB"` or `"BGR"` | `"RGB"`
`input_pad_method` | how input image will be padded when resized | `"stretch"` or `"letterbox"` | `"letterbox"`
`input_resize_method` | interpolation algorithm for image resizing | `"nearest"`, `"bilinear"`, `"area"`, `"bicubic"`, `"lanczos"` | `"bilinear"`
`save_model_image` | flag to enable/disable saving of model input image in inference results | Boolean value | `False`
Inference Parameters | | |
`output_confidence_threshold` | confidence threshold to reject results with low scores | Float value in [0..1] range | `0.1`
`output_max_detections` | maximum number of objects to report for detection models | Integer value | `20`
`output_max_detections_per_class` | maximum number of objects to report for each class for detection models | Integer value | `100`
`output_max_classes_per_detection` | maximum number of classes to report for detection models | Integer value | `30`
`output_nms_threshold` | rejection threshold for non-max suppression | Float value in [0..1] range | `0.6`
`output_pose_threshold` | rejection threshold for pose detection models | Float value in [0..1] range | `0.8`
`output_postprocess_type` | inference result post-processing type. You may set it to `'None'` to bypass post-processing | String | Model-dependent
`output_top_k` | number of classes with the biggest scores to report for classification models. If `0`, report all classes above the confidence threshold | Integer value | `0`
`output_use_regular_nms` | use regular (per-class) NMS algorithm as opposed to global (class-ignoring) NMS algorithm for detection models | Boolean value | `False`
Display Parameters | | |
`overlay_alpha` | transparency value (alpha-blend weight) for all overlay details | Float value in [0..1] range | `0.5`
`overlay_color` | color for drawing all overlay details | 3-element tuple of RGB color | `(255,255,128)`
`overlay_font_scale` | font scaling factor for overlay text | Positive float value | `1.0`
`overlay_line_width` | line width in pixels for overlay lines | | `3`
`overlay_show_labels` | flag to enable drawing class labels of detected objects | Boolean value | `True`
`overlay_show_probabilities` | flag to enable drawing probabilities of detected objects | Boolean value | `False`
Control and Information Parameters | | |
`devices_available` | list of inference device indices which can be used for model inference (read-only) | List of integer values | N/A
`devices_selected` | list of inference device indices selected for model inference | List of integer values | Equal to `devices_available`
`label_dictionary` | model class label dictionary (read-only) | Dictionary | N/A
`measure_time` | flag to enable measuring and collecting inference time statistics | Boolean value | `False`
`model_info` | model information object to provide read-only access to model parameters (read-only) | `ModelParams` object | N/A
`non_blocking_batch_predict` | flag to control the blocking behavior of the `predict_batch()` method | Boolean value | `False`
Model Info
AI models have a lot of static attributes defining various model features and characteristics. Unlike model properties, these attributes in most cases cannot be changed: they come with the model.
To access all model attributes, you may query the read-only model property `degirum.model.Model.model_info`.

Note: A new deep copy of the model info class is created each time you read this property, so any changes made to that copy will not affect model behavior.
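For example, a small sketch of inspecting a few of the attributes listed in the table below:

```python
info = model.model_info                    # fresh copy of the model attributes
print(info.DeviceType, info.RuntimeAgent)  # device type and runtime agent the model targets
print(info.ModelPath)                      # path to the model JSON file
```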
Model attributes are divided into the following categories:
- Device-related attributes
- Pre-processing-related attributes
- Inference-related attributes
- Post-processing-related attributes
The following table provides a complete summary of model attributes arranged by categories.
The Attribute Name column contains the name of the ModelInfo
class member returned by the model_info
property.
Note: Each attribute in the Pre-Processing-Related Attributes group is a list of values, one per model input.
Attribute Name | Description | Possible Values
---|---|---
Device-Related Attributes | |
`DeviceType` | Device type to be used for AI inference of this model | `"ORCA"`: DeGirum Orca, `"EDGETPU"`: Google EdgeTPU, `"CPU"`: host CPU
`RuntimeAgent` | Type of runtime to be used for AI inference of this model | `"N2X"`: DeGirum NNExpress runtime, `"TFLITE"`: Google TFLite runtime
Pre-Processing-Related Attributes | |
`InputType` | Model input type | List of the following strings: `"Image"`: image input type, `"Audio"`: audio input type, `"Tensor"`: raw tensor input type
`InputN` | Input frame dimension size | `1`; other sizes to be supported
`InputH` | Input height dimension size | Integer number
`InputW` | Input width dimension size | Integer number
`InputC` | Input color dimension size | Integer number
`InputQuantEn` | Enable input frame quantization flag (set for quantized models) | Boolean value
`InputRawDataType` | Data element type for audio or tensor inputs | List of the following strings: `"DG_UINT8"`: 8-bit unsigned integer, `"DG_INT16"`: 16-bit signed integer, `"DG_FLT"`: 32-bit floating point
`InputTensorLayout` | Input tensor shape and layout | List of the following strings: `"NHWC"`: 4-D tensor frame-height-width-color; more layouts to be supported
`InputColorSpace` | Input image colorspace (sequence of colors in C dimension) | List of the following strings: `"RGB"`, `"BGR"`
`InputImgNormEn` | Enable global normalization of input image flag | List of boolean values
`InputImgNormCoeff` | Normalization factor for input image global normalization | List of float values
`InputImgMean` | Mean value for per-channel image normalization | List of 3-element arrays of float values
`InputImgStd` | StDev value for per-channel image normalization | List of 3-element arrays of float values
`InputQuantOffset` | Quantization offset for input image quantization | List of float values
`InputQuantScale` | Quantization scale for input image quantization | List of float values
`InputWaveformSize` | Input waveform size in samples for audio input types | List of positive integer values
`InputSamplingRate` | Input waveform sampling rate in Hz for audio input types | List of positive float values
Inference-Related Attributes | |
`ModelPath` | Path to the model JSON file | String with filepath
`ModelInputN` | Model frame dimension size | `1`; other sizes to be supported
`ModelInputH` | Model height dimension size | Integer number
`ModelInputW` | Model width dimension size | Integer number
`ModelInputC` | Model color dimension size | Integer number
`ModelQuantEn` | Enable input frame quantization flag (set for quantized models) | Boolean value
Post-Processing-Related Attributes | |
`OutputNumClasses` | Number of classes the model detects | Integer value
`OutputSoftmaxEn` | Enable softmax step in post-processing flag | Boolean value
`OutputClassIDAdjustment` | Class ID adjustment: number subtracted from the class ID reported by the model | Integer value
`OutputPostprocessType` | Post-processing type | `"Classification"`, `"Detection"`, `"DetectionYolo"`, `"PoseDetection"`, `"FaceDetect"`, `"Segmentation"`, `"BodyPix"`, `"Python"`; other types to be supported
`OutputConfThreshold` | Confidence threshold to reject results with low scores | Float value in [0..1] range
`OutputNMSThreshold` | Rejection threshold for non-max suppression | Float value in [0..1] range
`OutputTopK` | Number of classes with the biggest scores to report for classification models | Integer number
`MaxDetections` | Maximum number of objects to report for detection models | Integer number
`MaxDetectionsPerClass` | Maximum number of objects to report for each class for detection models | Integer number
`MaxClassesPerDetection` | Maximum number of classes to report for detection models | Integer number
`UseRegularNMS` | Use regular (per-class) NMS algorithm as opposed to global (class-ignoring) NMS algorithm for detection models | Boolean value
Inference Advanced Topics
Selecting Devices for Inference
Every AI model in a model zoo is designed to work on particular hardware: either on AI accelerator hardware such as DeGirum Orca, or on the host computer CPU. Imagine the situation when the host computer is equipped with multiple hardware devices of a given type, and you run multiple inferences of a model designed for this device type. In this case, by default, all available hardware devices of this type will be used for this model's inferences. This guarantees top inference performance in the case of a single model running on all available devices.
To get the information about available devices you query degirum.model.Model.devices_available
property.
It returns the list of device indices of all available devices of the type this model is designed for.
Those indices are zero-based, so if your host computer has a single device of a given type, the returned list
would contain single zero element: [0]
. In case of two devices it will be [0, 1]
and so on.
In certain cases you may want to limit the model inference to a particular subset of available devices. For example, you have two devices and you want to run concurrent inference of two models. In the default case both devices would be used for both model inferences, causing the models to be reloaded to the devices each time you run the inference of another model. Even though model loading for DeGirum Orca devices is extremely fast, it still may cause performance degradation. In this case you may want to run the first model inference only on the first device, and the second model inference only on the second device.

To do so you need to assign the `degirum.model.Model.devices_selected` property of each model object to contain the list of device indices you want your model to run on. In our example you need to assign the list `[0]` to the `devices_selected` property of the first model object, and the list `[1]` to the second model object.

In general, the list you assign to the `devices_selected` property should contain only indices that occur in the list returned by the `devices_available` property.
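A sketch of the two-model scenario described above, assuming `model1` and `model2` are already loaded from the zoo:

```python
print(model1.devices_available)  # e.g. [0, 1] when two devices of the model's type are present

model1.devices_selected = [0]    # run the first model only on device #0
model2.devices_selected = [1]    # run the second model only on device #1
```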
Handling Multiple Streams of Frames
The Model class interface has a method, degirum.model.Model.predict_batch
, which can run multiple predictions
on a sequence of frames. In order to deliver the sequence of frames to the predict_batch
you implement
an iterable object, which returns your frames one-by-one. One example of iterable object is a regular Python
list, another example is a function, which yields frame data using yield
statement. Then you pass such iterable
object as an argument to the predict_batch
method. In turn, the predict_batch
method returns a generator object,
which yields prediction results using yield
statement.
All the inference magic with pipelining sequential inferences, asynchronously retrieving inference results,
supporting various inference devices, and AI server vs. local operation modes happens inside the implementation
of predict_batch
method. All you need to do is to wrap your sequence of frame data in an iterable object, pass this
object to predict_batch
, and iterate over the generator object returned by predict_batch
using either
for
-loop or by repeatedly calling next()
built-in function on this generator object.
The following example runs the inference on an infinite sequence of frames captured from the camera:
```python
import cv2  # OpenCV

stream = cv2.VideoCapture(0)  # open video stream from local camera #0

def source():  # define generator function, which yields frames from the camera
    while True:
        ret, frame = stream.read()
        yield frame

for result in model.predict_batch(source()):  # iterate over inference results
    cv2.imshow("AI camera", result.image_overlay)  # show overlay (assumes OpenCV image backend)
    cv2.waitKey(1)  # give the GUI a chance to refresh the window
```
But what if you need to run multiple concurrent inferences of multiple asynchronous data streams with different frame
rates? The simple approach when you combine two generators in one loop either using zip()
built-in function or by
manually calling next()
built-in function for every generator in a loop body will not work effectively.
Non-working example 1. Using the `zip()` built-in function:

```python
batch1 = model1.predict_batch(source1())  # generator object for the first model
batch2 = model2.predict_batch(source2())  # generator object for the second model
for result1, result2 in zip(batch1, batch2):
    pass  # process result1 and result2
```
Non-working example 2. Using the `next()` built-in function:

```python
batch1 = model1.predict_batch(source1())  # generator object for the first model
batch2 = model2.predict_batch(source2())  # generator object for the second model
while True:
    result1 = next(batch1)
    result2 = next(batch2)
    # process result1 and result2
```
The reason is that the Python runtime has a Global Interpreter Lock (GIL), which allows running only one thread at a time, blocking the execution of other threads. So if the currently running thread is itself blocked by waiting for the next frame or waiting for the next inference result, all other threads are blocked as well.
For example, if the frame rate of source1()
is slower than the frame rate of source2()
and assuming that the
model inference frame rates are higher than the corresponding source frame rates, then the code above will
spend most of the time waiting for the next frame from `source1()`, not letting frames from `source2()` be retrieved, so `model2` will not get enough frames and will idle, losing performance.
Another example is when the inference latency of model1
is higher than the inference queue depth expressed in time
(this is the product of the inference queue depth expressed in frames and the single frame inference time).
In this case when the model1
inference queue is full, but inference result is not ready yet, the code above will
block on waiting for that inference result inside next(batch1)
preventing any operations with model2
.
To get around such blocks, a special non-blocking mode of the batch predict operation is implemented. You turn on this mode by assigning `True` to the `degirum.model.Model.non_blocking_batch_predict` property.
When non-blocking mode is enabled, the generator object returned by predict_batch()
method accepts None
from the input iterable object. This allows you to design non-blocking frame data source iterators: when no data
is available, such iterator just yields None
without waiting for the next frame. If None
is returned from the
input iterator, the model predict step is simply skipped for this iteration.
Also, in non-blocking mode, when no inference results are available in the result queue at some iteration, the generator yields a `None` result. This allows the code which operates with another model to continue execution.
In order to operate in non-blocking mode you need to modify your code in the following way (see the sketch after this list):

- Modify the frame data source iterator to return `None` if no frame is available yet, instead of waiting for the next frame.
- Modify the inference loop body to deal with `None` results by simply skipping them.
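A minimal sketch of such a non-blocking loop for two models; `poll_frame1()` and `poll_frame2()` are hypothetical non-blocking helpers that return a frame when one is available and `None` otherwise:

```python
import time

model1.non_blocking_batch_predict = True
model2.non_blocking_batch_predict = True

def source1():  # non-blocking frame source for the first model
    while True:
        yield poll_frame1()  # a frame, or None when nothing is available yet

def source2():  # non-blocking frame source for the second model
    while True:
        yield poll_frame2()  # a frame, or None when nothing is available yet

batch1 = model1.predict_batch(source1())
batch2 = model2.predict_batch(source2())
while True:
    result1 = next(batch1)  # may be None when no result is ready yet
    result2 = next(batch2)  # may be None when no result is ready yet
    if result1 is not None:
        pass  # process result1
    if result2 is not None:
        pass  # process result2
    time.sleep(0.001)  # avoid busy-spinning when both sources are idle
```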
Measure Inference Timing
The degirum.model.Model
class has a facility to measure and collect model inference time information.
To enable inference time collection assign True
to degirum.model.Model.measure_time
property.
When inference timing collection is enabled, the durations of individual steps for each frame prediction are accumulated in internal statistic accumulators.
To reset time statistic accumulators you use degirum.model.Model.reset_time_stats
method.
To retrieve time statistic accumulators you use degirum.model.Model.time_stats
method.
This method returns a dictionary with time statistic objects. Each time statistic object accumulates time statistics
for particular inference step over all frame predictions happened since the timing collection was enabled or reset.
The statistics includes minimum, maximum, average, and count. Inference steps correspond to dictionary keys.
The following dictionary keys are supported:
Key | Description
---|---
`FrameTotalDuration_ms` | Frame total inference duration from the moment when you invoke the predict method to the moment when inference results are returned
`PythonPreprocessDuration_ms` | Duration of the Python pre-processing step including data loading time and data conversion time
`CorePreprocessDuration_ms` | Duration of the low-level pre-processing step
`CoreInferenceDuration_ms` | Duration of the actual AI inference step on AI inference hardware
`CoreLoadResultDuration_ms` | Duration of the data movement step from AI inference hardware
`CorePostprocessDuration_ms` | Duration of the low-level post-processing step
Note: In batch prediction mode many inference phases are pipelined so the pre- and post-processing steps of one frame may be executed in parallel with the AI inference step of another frame. Therefore actual frame rate may be higher than the frame rate calculated by
FrameTotalDuration_ms
statistic.
Note:
PythonPreprocessDuration_ms
statistic includes data loading time and data conversion time. This can give very different results for different ways of loading input frame data. For example, if you provide image URLs for inference, then thePythonPreprocessDuration_ms
will include image downloading time, which can be much higher compared with the case when you provide the image as numpy array, which does not require any downloading.
The following example shows how to use time statistics collection interface.
It assumes that the model
variable is the model created by load_model()
.
```python
model.measure_time = True  # enable accumulation of time statistics

# perform batch prediction
for result in model.predict_batch(source()):
    # process result
    pass

stats = model.time_stats()  # query time statistics dictionary

# pretty-print frame total inference duration statistics
print(stats["FrameTotalDuration_ms"])

# print average duration of AI inference step
print(stats["CoreInferenceDuration_ms"].avg)

model.reset_time_stats()  # reset time statistics accumulators

# perform one more batch prediction
for result in model.predict_batch(source()):
    # process result
    pass

stats = model.time_stats()  # re-query statistics collected after the reset

# print maximum duration of Python pre-processing step
print(stats["PythonPreprocessDuration_ms"].max)
```
Configuring and Launching AI Server
PySDK can be used to configure and launch DeGirum AI server on hosts equipped with DeGirum Orca AI accelerator card(s). This allows you to run AI inferences on this AI server host initiated from remote clients.
To run PySDK as a server on a host, perform the following steps on the host:
- Create or select a user name to be used for all the following configuration steps. This user should have administrative rights on this host. The user name `ai-user` is used in the instructions below, but it can be changed to any other user name of your choice.
- For convenience of future maintenance we recommend installing PySDK into a virtual environment, such as Miniconda.
- Make sure you have activated your Python virtual environment with the appropriate Python version and that PySDK is installed into this virtual environment.
- Create a directory for the AI server model zoo, and change your current working directory to this directory. For example:

    ```sh
    mkdir /home/ai-user/zoo
    cd /home/ai-user/zoo
    ```

- Download all models from the DeGirum public model zoo into the current working directory by executing the following command:

    ```sh
    python3 -c "from degirum import server; server.download_models('.')"
    ```

- Start the DeGirum AI server process by executing the following command:

    ```sh
    python3 -m degirum.server --zoo /home/ai-user/zoo
    ```
The AI server is up and will run until you press ENTER
in the same terminal where you started it.
By default, the AI server listens on TCP port 8778. If you want to change the TCP port, pass the `--port` command line argument when launching the server, for example:

```sh
python3 -m degirum.server --zoo /home/ai-user/zoo --port 8780
```
Starting AI Server as Linux System Service
It is convenient to automate the process of AI server launch so that it will be started automatically on each system startup. On Linux-based hosts, you can achieve this by defining and configuring a system service, which will handle AI server startup.
Please perform the following steps to create, configure, and start system service:
- Create the configuration file named `degirum.service` in the `/etc/systemd/system` directory. You will need administrative rights to create this file. You can use the following template as an example:

    ```sh
    [Unit]
    Description=DeGirum AI Service

    [Service]
    # You may want to adjust the working directory:
    WorkingDirectory=/home/ai-user/
    # You may want to adjust the path to your Python executable and --zoo model zoo path.
    # Also you may specify server TCP port other than default 8778 by adding --port <port> argument.
    ExecStart=/home/ai-user/miniconda3/bin/python -m degirum.server --zoo /home/ai-user/zoo
    Restart=always
    # You may want to adjust the restart time interval:
    RestartSec=10
    SyslogIdentifier=degirum-ai-server
    # You may want to change the user name under which this service will run.
    # This user should have rights to access the model zoo directory
    User=ai-user

    [Install]
    WantedBy=multi-user.target
    ```
- Start the system service by executing the following command:

    ```sh
    sudo systemctl start degirum.service
    ```

- Check the system service status by executing the following command:

    ```sh
    sudo systemctl status degirum.service
    ```

    If the status is "Active", it means that the configuration is good and the service is up and running.

- Then enable the service for automatic startup by executing the following command:

    ```sh
    sudo systemctl enable degirum.service
    ```
Connecting to AI Server from Client Side
Now your AI server is up and running and you may connect to it from Python scripts using PySDK.
To do so, you pass the AI server network hostname or its IP address to the `degirum.connect_model_zoo` PySDK function:

```python
import degirum as dg
model_zoo = dg.connect_model_zoo(host_address)
```
If you run your PySDK script on the same host as the AI server, you may use the "localhost"
string as a network
hostname.
In local Linux networks with a standard mDNS configuration, the network hostname is a concatenation of the local hostname as returned by the `hostname` command and the `.local` suffix. For example, if the `hostname` command returns `ai-host`, then the network hostname will be `ai-host.local`.
Updating AI Server Model Zoo
If you need to update the AI server model zoo, perform the following steps:

- Shut down the AI server:
    - If you started your AI server yourself from the command line, just press `ENTER` in the terminal where you started the server.
    - If you started your AI server as a system service, execute the following command:

        ```sh
        sudo systemctl stop degirum.service
        ```

- Manage your model zoo directory:
    - Add new models by downloading them from the cloud zoo the way described in the beginning of this chapter.
    - Remove models by deleting model subdirectories.
- Start the AI server again:
    - For a manual start, refer to the beginning of this chapter.
    - If you want to start it as a service, execute the following command:

        ```sh
        sudo systemctl start degirum.service
        ```
degirum.connect_model_zoo
Connect to the model zoo of your choice.
This is the main PySDK entry point: you start your work with PySDK by calling this function.
The Model Zoo manager class instance is created and returned as a result of this call. Model Zoo manager object allows you to connect to a model zoo of your choice and access a collection of models in that model zoo.
The following model zoo types are supported:
- DeGirum cloud model zoo;
- DeGirum AI server-based model zoo;
- single-file model zoo.
The type of the model zoo is defined by the URL string which you pass as a parameter (see below).
The Model Zoo manager object allows you to perform the following activities:
- list and search models available in the connected model zoo;
- download models from a cloud model zoo;
- create AI model handling objects to perform AI inferences;
- request various AI model parameters.
Parameters:
- `zoo_url`: URL string which defines the model zoo to manage
- `token`: optional security token string to be passed to the cloud model zoo for authentication and authorization
The `zoo_url` parameter can be one of the following varieties:

- `None` or not specified. In this case the connection to the DeGirum public cloud model zoo is established.
- A string which starts with the `"https://"` prefix. Such a URL string is treated as the URL of a private cloud model zoo. In this case an attempt to connect to that model zoo is made with the security token defined by the `token` parameter. In the case of an unsuccessful connection, an exception is raised.
- A string which defines an Internet host name or IP address. Such a URL string is treated as the address of a remote DeGirum AI server host (refer to `degirum.server` for more info). In this case an attempt to connect to that AI server is made. In the case of an unsuccessful connection, an exception is raised.
- A string which defines a local file path to a `".json"` file. Such a URL string is treated as a single-file model zoo. That `".json"` file must be a valid DeGirum model configuration file, otherwise an exception is raised. Note: this option is mostly used for testing/debugging new models during model development, which are not yet released in any model zoo.
Once you have created a Model Zoo manager object, you may use the following methods:

- `degirum.zoo_manager.ZooManager.list_models` to list and search models available in the model zoo;
- `degirum.zoo_manager.ZooManager.load_model` to create a `degirum.model.Model` model handling object to be used for AI inferences;
- `degirum.zoo_manager.ZooManager.model_info` to request model parameters.