degirum

PySDK - DeGirum AI Inference Software Development Kit

Version 0.2.1

Copyright DeGirum Corp. 2022

The DeGirum AI Inference Software Development Kit, PySDK, is a Python package which provides APIs to perform inference on ML models.

PySDK consists of the following components:

  1. The PySDK library, which includes the following essential modules:
     • the CoreClient extension module, implementing APIs to work with DeGirum AI accelerator cards installed locally;
     • the aiclient extension module, implementing APIs to work with DeGirum AI servers running on remote systems;
     • the degirum.server module - a helper module that allows launching an AI server on local AI hardware to be used by remote clients.

The PySDK package provides the necessary tools to support the following use cases:

  • Performing AI inferences on the local system using DeGirum AI accelerator hardware installed on the local system
  • Deploying a system with installed DeGirum AI accelerator hardware as an AI server
  • Connecting to one or several AI servers and performing AI inferences on those servers remotely

Installation

Basic Installation of PySDK Python Package

It is recommended to install PySDK in a virtual environment. Please see the PySDK Supported Configurations page for a list of supported system configurations.

To install DeGirum PySDK from the DeGirum index server, use the following command:

python -m pip install degirum --extra-index-url https://degirum.github.io/simple

To force reinstall the most recent PySDK version without reinstalling all dependencies, use the following command:

python -m pip install degirum --upgrade --no-deps --force-reinstall --extra-index-url https://degirum.github.io/simple

Kernel Driver Installation

Linux Kernel Driver Installation

Hosts that have a DeGirum Orca card installed need a driver to enable its functionality. The driver is distributed as a source package via the DeGirum APT repository. It can be built automatically after download.

  1. Add the following line to /etc/apt/sources.list to register the DeGirum APT repository:

    deb-src https://degirum.github.io/apt-repo ORCA main
    
  2. Download DeGirum public key by running the following command:

    wget -O - -q http://degirum.github.io/apt-repo/DeGirum.gpg.key | sudo apt-key add -
    
  3. Update package information from configured sources, and download prerequisites by running the following commands:

    sudo apt update
    sudo apt install dpkg-dev debhelper
    
  4. Finally, download and build DeGirum Linux driver package by running the following command:

    sudo apt-get --build source orca-driver
    

Kernel Driver Installation for Other Operating Systems

The current version of the DeGirum software package supports only the Linux kernel driver. Kernel driver support for other operating systems is under development. This document will be updated when kernel drivers for other operating systems are released.

Quick Start

Note: This quick start guide covers the Local inference use case when you run AI inferences on the local host with DeGirum AI accelerator hardware installed on this local host as an option. See "System Configuration for Specific Use Cases" section below for more use cases.

To start working with PySDK, you import the degirum package:

import degirum as dg

The main PySDK entry point is the degirum.connect_model_zoo function, which creates and returns a degirum.zoo_manager.ZooManager zoo manager object:

zoo = dg.connect_model_zoo()

By default, the zoo manager automatically connects to the DeGirum public model zoo located in the cloud, and you have free access to all AI models from this public model zoo.

To see the list of all available AI models, use degirum.zoo_manager.ZooManager.list_models method. It returns a list of strings, where each string is a model name:

model_list = zoo.list_models()
print(model_list)

If you want to perform AI inference using some model, you need to load it using the degirum.zoo_manager.ZooManager.load_model method, providing the model name as the method argument. The model name should be one of the model names returned by the list_models() method, for example:

model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1")

The load_model() method returns a degirum.model.Model object, which can be used to perform AI inferences.

Before performing AI inferences, you may want to adjust some model parameters. The Model class has a list of parameters which can be modified at runtime. These model parameters affect how the input data is pre-processed and how the inference results are post-processed. All model parameters have reasonable default values, so you may skip this step at first.

Some useful model parameters are:

| Property Name | Description | Possible Values |
| --- | --- | --- |
| degirum.model.Model.image_backend | image processing package to be used | "auto", "pil", or "opencv"; "auto" tries PIL first |
| degirum.model.Model.input_pad_method | how input image will be padded when resized | "stretch" or "letterbox" |
| degirum.model.Model.output_confidence_threshold | confidence threshold to reject results with low scores | Float value in [0..1] range |
| degirum.model.Model.output_nms_threshold | rejection threshold for non-max suppression | Float value in [0..1] range |
| degirum.model.Model.overlay_color | color to draw AI results | Tuple in (R,G,B) format |
| degirum.model.Model.overlay_font_scale | font scale to print AI results | Float value |
| degirum.model.Model.overlay_show_labels | True to show class labels when drawing AI results | True/False |
| degirum.model.Model.overlay_show_probabilities | True to show class probabilities when drawing AI results | True/False |

For the complete list of model parameters see section "Model Parameters" below.
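
For example, a few of these properties can be adjusted right after the model is loaded. The property names below come from the table above; the particular values are just illustrative:

model.image_backend = "opencv"             # force the OpenCV image processing backend
model.input_pad_method = "letterbox"       # preserve aspect ratio when resizing
model.output_confidence_threshold = 0.5    # reject results with scores below 0.5
model.overlay_color = (255, 0, 0)          # draw AI results in red
model.overlay_show_probabilities = True    # print class probabilities on the overlay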

Now you are ready to perform AI inference. To do inference you either invoke degirum.model.Model.predict method or simply call the model, supplying the input image as an argument. The inference result is returned.

A model may accept input images in various formats:

  • as a string containing the file name of the image file on the local file system:
result = model("./images/TwoCats.jpg")
  • as a string containing URL of the image file:
result = model("https://degirum.github.io/images/samples/TwoCats.jpg")
  • as a PIL image object:
from PIL import Image
image = Image.open("./images/TwoCats.jpg")
result = model(image)
  • as a numpy array (for example, returned by OpenCV):
import cv2
image = cv2.imread("./images/TwoCats.jpg")
model.input_numpy_colorspace = "BGR" # set colorspace to match OpenCV-produced numpy array
result = model(image)

The result object returned by the model (an object derived from degirum.postprocessor.InferenceResults class) contains the following information:

  • numeric inference results
  • graphical inference results
  • original image

Numeric results can be accessed by degirum.postprocessor.InferenceResults.results property. This property returns a list of result dictionaries, one dictionary per detected object or class. The format of this dictionary is model dependent. For example, to iterate over all classification model inference results you may do this:

for r in result.results:
   print(f"Detected class: {r['label']} with probability {r['score']}")

Tip: if you just print your inference result object, all the numeric results will be pretty-printed in YAML format:

print(result)

Graphical results can be accessed by the degirum.postprocessor.InferenceResults.image_overlay property. This property returns a graphical object containing the original image with all inference results drawn over it. The graphical object type depends on the graphical package specified for the model image_backend (if you leave it set to "auto", PIL is used when installed, otherwise OpenCV). Once you get this object, you may display it, print it, or save it to a file using the graphical package of your choice. For example, for PIL:

result.image_overlay.save("./images/TwoCatsResults.jpg")
result.image_overlay.show()

And the original image can be accessed by the degirum.postprocessor.InferenceResults.image property, which returns a graphical object whose type again depends on the graphical package specified for the model image_backend.

System Configuration for Specific Use Cases

The PySDK package can be used in the following use cases:

  • Local inference: running AI inferences on the local host with DeGirum AI accelerator hardware installed on this local host as an option
  • AI Server inference: running AI inferences on remote AI server host with DeGirum AI accelerator hardware installed on that remote host
  • AI Server hosting: configuring the host with DeGirum AI accelerator hardware as AI server to be used for remote AI inferences

The following sections provide step-by-step instructions on how to set up the system for each particular use case.

Configuration for Local Inference

  1. Install PySDK package as described in "Basic Installation of PySDK Python Package" section above.
  2. If your system is equipped with DeGirum AI accelerator hardware, install the kernel driver as described in "Kernel Driver Installation" section above.

    Note: If your system is not equipped with any AI accelerator hardware, the set of models available for local inference will be limited to CPU models only.

  3. If you plan to operate with DeGirum public model zoo, follow instructions described in "Quick Start" section above.

  4. If you plan to operate with a locally deployed model, you need to provide the full path to the model JSON file when calling connect_model_zoo:

    zoo = dg.connect_model_zoo("full/path/to/model.json")
    

    Refer to "Model Zoo Manager" section below for the explanation how to open model zoo with particular local model.

  5. If you plan to operate with a private model zoo, you need to provide the zoo URL and the zoo access token when calling connect_model_zoo:

    zoo = dg.connect_model_zoo("https://my.private.zoo", token = "@my#secret%token1234")
    

Configuration for AI Server Inference

  1. Install PySDK package as described in "Basic Installation of PySDK Python Package" section above.
  2. Follow the instructions described in the "Quick Start" section above, with the exception that when calling connect_model_zoo you need to provide the hostname or IP address of the AI server you want to use for AI inference:

    zoo = dg.connect_model_zoo("192.168.0.118")

Configuration for AI Server Hosting

  1. Install the kernel driver as described in "Kernel Driver Installation" section above.
  2. Follow instructions provided in "Configuring and Launching AI Server" section below.

Model Zoo Manager

A model zoo is a collection of AI models.

Depending on the deployment location, there are several types of model zoos supported by PySDK:

  • cloud model zoo: deployed on the cloud resource (for example GitHub repository);
  • AI server model zoo: deployed on the DeGirum AI server host;
  • local model zoo: deployed on the local file system of PySDK installation host.

You connect to the model zoo by calling degirum.connect_model_zoo function. You supply the model zoo identification string (called URL string) as a function parameter. The function returns the Model Zoo manager object which allows you to access a collection of models from that model zoo.

The type of the model zoo is defined by the URL string which you pass as a parameter to the degirum.connect_model_zoo function. The following cases are supported:

  1. None or not specified URL. In this case, the connection to the DeGirum public cloud model zoo is established.
  2. A string which starts with the "https://" prefix. Such a URL string is treated as the URL of a private cloud model zoo. In this scenario, the connection to that model zoo is established using the security token defined by the token parameter.
  3. A string which defines an Internet host name or IP address. Such a URL string is treated as the address of a remote DeGirum AI server host serving an AI server model zoo (refer to degirum.server for more info). In this case, an attempt to connect to that AI server is made; in the case of an unsuccessful connection, an exception is raised.
  4. A string which defines a local file path to a ".json" file. Such a URL string is treated as a single-file local model zoo. The ".json" file must be a valid DeGirum model configuration file, otherwise an exception is raised.

Note: this option is mostly used for testing and debugging new models during model development, before they are released in any model zoo.
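
To illustrate, the following snippets sketch each of these connection options; the zoo URL, token, server address, and JSON file path are the same placeholder values used in the configuration examples above:

zoo = dg.connect_model_zoo()                                              # case 1: DeGirum public cloud model zoo
zoo = dg.connect_model_zoo("https://my.private.zoo", token="@my#secret%token1234")  # case 2: private cloud model zoo
zoo = dg.connect_model_zoo("192.168.0.118")                               # case 3: AI server model zoo on a remote host
zoo = dg.connect_model_zoo("full/path/to/model.json")                     # case 4: single-file local model zoo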

In the case of a cloud model zoo, if for whatever reason a connection to that cloud zoo fails, it is not considered an error, and no exception is raised. Instead, the Model Zoo manager object continues to work with the cached content of that cloud zoo, which is stored in the cache subdirectory associated with that cloud zoo. If the very first connection to the cloud zoo fails and there is no cached content, such a zoo is treated as an empty zoo with no models. In the case of a successful connection, the list of models is downloaded from the cloud zoo and placed in the cache subdirectory. This cache mechanism allows you to work with cloud zoos in offline mode: when online, PySDK downloads to the cache all models which you use for inference; then, going offline, you can continue using the models downloaded while online (see section "Loading AI Models" for details on loading models for inference).

A separate cache subdirectory is maintained for each cloud zoo URL. The root location of the cache subdirectories is operating-system specific:

  • For Windows it is %APPDATA%/degirum
  • For Linux it is ~/.local/share/degirum

The Model Zoo manager object allows you to perform the following activities:

  • list and search models available in the connected model zoo;
  • create AI model handling objects to perform AI inferences;
  • request various AI model parameters;

Listing and Searching AI Models

An AI model is represented in a model zoo by a set of files stored in a model subdirectory, which is unique for each model. Each model subdirectory contains the following model files:

| Model File | Description |
| --- | --- |
| <model name>.json | JSON file containing all model parameters. The name of this file is the name of the model. This file is mandatory. |
| <model name>.n2x | DeGirum Orca binary file containing the model. This file is mandatory for DeGirum Orca models. |
| <model name>.tflite | TensorFlow Lite binary file containing the model. This file is mandatory for TFLite models. |
| <class dictionary>.json | JSON file containing class labels for classification or detection models. This file is optional. |

To obtain the list of available AI models, you may use degirum.zoo_manager.ZooManager.list_models method. This method accepts arguments which specify the model filtering criteria. All the arguments are optional. If a certain argument is omitted, then the corresponding filtering criterion is not applied. The following filters are available:

| Argument | Description | Possible Values |
| --- | --- | --- |
| model_family | Model family name filter, used as a search substring in the model name | Any valid substring like "yolo", "mobilenet" |
| device | Inference device filter: a string or a list of strings of device names | "orca": DeGirum Orca device; "cpu": host CPU; "edgetpu": Google EdgeTPU device |
| precision | Model calculation precision filter: a string or a list of strings of model precision labels | "quant": quantized model; "float": floating point model |
| pruned | Model density filter: a string or a list of strings of model density labels | "dense": dense model; "pruned": sparse/pruned model |
| runtime | Runtime agent type filter: a string or a list of strings of runtime agent types | "n2x": DeGirum N2X runtime; "tflite": Google TFLite runtime |

The method returns a list of model name strings. These model name strings are to be used later when you load AI models for inference.
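
For example, several filters can be combined in one call; the filter values below are taken from the table above and are just an illustration:

# list quantized models from the "mobilenet" family which run on DeGirum Orca devices
orca_models = zoo.list_models(model_family="mobilenet", device="orca", precision="quant")
print(orca_models)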

When you work with a cloud zoo while offline, you can still list models. In this case, the model list will be taken from the cache subdirectory associated with that cloud model zoo.

The degirum.zoo_manager.ZooManager.list_models method returns the list of models, which was requested at the time you connected to a model zoo by calling degirum.connect_model_zoo. This list of models is then stored inside the Model Zoo manager object, so subsequent calls to list_models method would quickly return the model list without connecting to a remote model zoo. If you suspect that the remote model zoo contents changed, then to update the model list you need to create another instance of Zoo Manager object by calling degirum.connect_model_zoo.

Loading AI Models

Once you have obtained the AI model name string, you may load this model for inference by calling the degirum.zoo_manager.ZooManager.load_model method and supplying the model name string as its argument. For example:

model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1")

If a model with the supplied name string is found, the load_model() method returns a model handling object of the degirum.model.Model class. Otherwise, it throws an exception.

If you load the model from a cloud model zoo, the model will be downloaded first and stored in the cache subdirectory associated with that cloud model zoo. If the model already exists in the cache, it will be loaded from that cache, but only if the cached model checksum matches the model checksum in the cloud zoo. If the checksums do not match, the model will be downloaded from the cloud zoo into the cache again. When offline, you can still work with models stored in the cache, but if you try to load a model which is not in the cache, you will get an exception.

Note: Model Zoo manager does not provide any explicit method to download a model from the cloud zoo: the model is downloaded automatically when possible, if the model is not in cache or the cached model checksum does not match the model checksum in the cloud zoo. However, the degirum.server module provides degirum.server.download_models function to explicitly download the whole cloud model zoo to the local directory (see more in "Configuring and Launching AI Server" section below)

If you load the model from the AI server model zoo, the command to load the model will be sent to the AI server: the connected AI server will handle all model loading actions remotely.

Note: The AI server lists the models that it serves, and you can only load those models: the job of managing the remote AI server model zoo is not handled by the Model Zoo manager class and should be done in a different way. Please refer to the "Configuring and Launching AI Server" section for details.

If you load the model from a local model zoo, it will be loaded from the model file referred to by the local model zoo URL.

Note: a single-model local model zoo is intended to be used for testing and debugging new models during model development. It assumes that you have all the necessary tools for model creation and are able to create and compile new models. This document does not provide details of this development process.

Running AI Model Inference

Once you have loaded an AI model and obtained a model handling object, you can start running AI inferences on your model. The following methods of the degirum.model.Model class are available to perform AI inferences: predict(), __call__, predict_batch(), and predict_dir().

The predict() and __call__ methods behave exactly the same way (actually, __call__ just calls predict()). They accept a single argument - the input data frame - perform AI inference on that data frame, and return the inference result: an object derived from the degirum.postprocessor.InferenceResults superclass.

The batch prediction methods, predict_batch() and predict_dir(), perform predictions of multiple frames in a pipelined manner, which is more efficient than just calling the predict() method in a loop. These methods are described in detail in the "Batch Inferences" section below.

Input Data Handling

PySDK model prediction methods support different types of input data. The exact input type depends on the model to be used. The following input data types are supported:

  • image input data
  • audio input data
  • raw tensor input data

The input data object you supply to the model predict methods also depends on the number of inputs the model has. If the model has a single data input, then the data object you pass to the model predict methods is a single object. If the model has multiple data inputs, then the data object you pass is a list of objects: one object per corresponding input.

The number and the type of inputs of the model are described by the InputType property of the ModelInfo class returned by degirum.model.Model.model_info property (see section "Model Info" for details about model info properties). The InputType property returns the list of input data types, one type per model input. So the number of model inputs can be deduced by evaluating the length of the list returned by the InputType property.
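
A minimal sketch of such a check, assuming model is a model object loaded as shown in the Quick Start section:

info = model.model_info           # query the model information object
input_types = info.InputType      # list of input types, one entry per model input
print(f"The model has {len(input_types)} input(s) of type(s) {input_types}")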

The following sections describe details of input data handling for various model input types.

Image Input Data Handling

When dealing with model inputs of image type (InputType is equal to "Image"), the PySDK model prediction methods accept a wide variety of input data frame types:

  • the input frame can be the name of a file with frame data;
  • it can be the HTTP URL pointing to a file with frame data;
  • it can be a numpy array with frame data;
  • it can be a PIL Image object;
  • it can be a bytes object containing raw frame data.

An AI model requires particular input tensor dimensions and data type which, in most cases, do not match the dimensions of the input frame. In this case, PySDK performs automatic conversion of the input frame to the format compatible with the AI model input tensor, performing all the necessary conversions such as resizing, padding, colorspace conversion, and data type conversion.

PySDK performs input frame transformations using one of two graphical packages (called backends): PIL or OpenCV. The backend is selected by the degirum.model.Model.image_backend property. By default it is set to "auto", meaning that the PIL backend will be used first, and if it is not installed, then the OpenCV backend will be used. You may explicitly select which backend to use by assigning either "pil" or "opencv" to the degirum.model.Model.image_backend property.

Note: In the case of the OpenCV backend, you cannot pass PIL Image objects to model predict methods.

If your input frame is stored in a file on the local filesystem, or is accessible via the HTTP protocol, pass the filename string or URL string directly to the model predict methods: PySDK will (down-)load the file, decode it, and convert it to the model input tensor format. The set of supported graphical file formats is defined solely by the graphical backend library you selected, PIL or OpenCV - PySDK does not perform any decoding of its own.

Sometimes, image conversion to AI model input tensor format requires image resizing. This resizing can be done in two possible ways:

  • preserving the aspect ratio;
  • not preserving the aspect ratio.

You can control the way the image is resized with the degirum.model.Model.input_pad_method property, which has two possible values: "stretch" or "letterbox". When you select the "stretch" method, the input image is resized exactly to the AI model input tensor dimensions, possibly changing the aspect ratio. When you select the "letterbox" method (the default), the aspect ratio is preserved, and the voids which can appear on the image sides are filled with the color specified by the degirum.model.Model.input_letterbox_fill_color property (black by default).

You can specify the resize algorithm in the degirum.model.Model.input_resize_method property, which may have the following values: "nearest", "bilinear", "area", "bicubic", or "lanczos". These values specify various interpolation algorithms used for resizing.

In case your input frames are stored in numpy arrays, you may need to tell PySDK the order of colors in those numpy arrays: RGB or BGR. This order is called the colorspace. By default, PySDK treats numpy arrays as having the RGB colorspace, so if your numpy arrays are already in RGB order, no additional action is needed. But if your numpy arrays have the color order opposite to the default, you need to change the degirum.model.Model.input_numpy_colorspace property.

Note: If a model has multiple image inputs, PySDK applies the same input_* image properties discussed above to every image input of the model.

Audio Input Data Handling

When dealing with model inputs of audio type (InputType is equal to "Audio"), PySDK does not perform any conversions of the input data: it expects a 1-D numpy array with audio waveform samples of the proper size and with the proper sampling rate. The waveform size should be equal to the InputWaveformSize model info property. The waveform sampling rate should be equal to the InputSamplingRate model info property. And finally, the data element type should be equal to the data type specified by the InputRawDataType model info property. All aforementioned model info properties are properties of the ModelInfo class returned by the degirum.model.Model.model_info property (see section "Model Info" for details).
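
The following sketch prepares a dummy waveform of the proper size and data type; audio_model is assumed to be a previously loaded audio model, and the numpy dtype is assumed to correspond to its InputRawDataType (for example, np.int16 for "DG_INT16"):

import numpy as np

info = audio_model.model_info                       # audio model info; per-input attributes are lists
waveform_size = info.InputWaveformSize[0]           # required number of samples for the first input
# a real application would capture audio sampled at info.InputSamplingRate[0] Hz;
# here a silent waveform of the required size is used just to illustrate the call
waveform = np.zeros(waveform_size, dtype=np.int16)  # dtype must match info.InputRawDataType
result = audio_model(waveform)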

Tensor Input Data Handling

When dealing with model inputs of raw tensor type (InputType is equal to "Tensor"), PySDK expects you to provide a 4-D numpy array of proper dimensions. The dimensions of that array should match the model input dimensions as specified by the following model info properties:

  • InputN for dimension 0,
  • InputH for dimension 1,
  • InputW for dimension 2,
  • InputC for dimension 3.

The data element type should be equal to the data type specified by the InputRawDataType model info property (see section "Model Info" for details).
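
A minimal sketch, assuming tensor_model is a previously loaded model with a raw tensor input and a floating-point InputRawDataType ("DG_FLT"):

import numpy as np

info = tensor_model.model_info   # per-input attributes are lists, so the first input is indexed
shape = (info.InputN[0], info.InputH[0], info.InputW[0], info.InputC[0])
tensor = np.zeros(shape, dtype=np.float32)  # dummy tensor; dtype must match info.InputRawDataType
result = tensor_model(tensor)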

Inference Results

All model predict methods return result objects derived from the degirum.postprocessor.InferenceResults class. The particular class type of a result object depends on the AI model type: classification, object detection, pose detection, etc. But from the user's point of view, they deliver identical functionality.

A result object contains the following data: the numeric inference results (the results property), the graphical results (the image_overlay property), and the original image (the image property).

The results property is what you typically use for programmatic access to inference results. The type of results is always a list of dictionaries, but the format of those dictionaries is model-dependent. Also, if the result contains coordinates of objects, all such coordinates are recalculated from the model coordinates back to the coordinates on the original image, so you can use them directly.

The image_overlay property is very handy for debugging and troubleshooting. It allows you to quickly assess the correctness of the inference results in graphical form.

There are also result properties which affect how the overlay image is drawn; they mirror the similarly named overlay_* model properties listed in the "Model Parameters" section below.

When each individual result object is created, all these overlay properties (except overlay_fill_color) are assigned the values of the similarly named properties taken from the model object (see the "Model Parameters" section below for the list of model properties). This allows assigning overlay property values only once and applying them to all consecutive results. But if you want to adjust an individual result, you may reassign any of the overlay properties and then re-read the image_overlay property. Each time you read image_overlay, it returns a new image object freshly drawn according to the current values of the overlay properties.

Note" overlay_fill_color is assigned with degirum.model.Model.input_letterbox_fill_color.

Batch Inferences

If you need to process multiple frames using the same model and the same settings, the most efficient way to do so is to use the batch prediction methods of the degirum.model.Model class: degirum.model.Model.predict_batch and degirum.model.Model.predict_dir.

Both methods perform predictions of multiple frames in a pipelined manner, which is more efficient than just calling the predict() method in a loop.

Both methods return a generator object, so you can iterate over inference results. This allows you to directly use the result of batch prediction methods in for-loops, for example:

for result in model.predict_batch(['image1.jpg','image2.jpg']):
   print(result)

Note: Since batch prediction methods return a generator object, simply assigning the result of a batch prediction method to some variable does not start any inference. Only iterating over that generator object does.

The predict_batch method accepts a single parameter: an iterable object, for example, a list. You populate your iterable object with the same type of data you pass to the regular predict(), i.e. input image path strings, input image URL strings, numpy arrays, or PIL Image objects (in case of the PIL image backend).

The predict_dir method accepts a filepath to a directory containing graphical files for inference. You may supply the optional extensions parameter with a list of file extensions to process.
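
A minimal predict_dir sketch; the directory path and the extension spelling are placeholder assumptions:

# run inference on all files with the given extensions found in the directory
for result in model.predict_dir("./images", extensions=[".jpg", ".png"]):
   print(result)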

Model Parameters

The model behavior can be controlled with various Model class properties, which define model parameters. They can be divided into the following categories:

  • parameters that control how to handle input frames;
  • parameters that control the inference;
  • parameters that control how to display inference results;
  • parameters that control model run-time behavior and provide access to model information.

The following table provides a complete summary of Model class properties arranged by category.

| Property Name | Description | Possible Values | Default Value |
| --- | --- | --- | --- |
| Input Handling Parameters | | | |
| image_backend | package to be used for image processing | "auto", "pil", or "opencv"; "auto" tries PIL first | "auto" |
| input_letterbox_fill_color | image fill color in case of 'letterbox' padding | 3-element tuple of RGB color | (0,0,0) |
| input_numpy_colorspace | colorspace for numpy arrays | "RGB" or "BGR" | "RGB" |
| input_pad_method | how input image will be padded when resized | "stretch" or "letterbox" | "letterbox" |
| input_resize_method | interpolation algorithm for image resizing | "nearest", "bilinear", "area", "bicubic", "lanczos" | "bilinear" |
| save_model_image | flag to enable/disable saving of model input image in inference results | Boolean value | False |
| Inference Parameters | | | |
| output_confidence_threshold | confidence threshold to reject results with low scores | Float value in [0..1] range | 0.1 |
| output_max_detections | maximum number of objects to report for detection models | Integer value | 20 |
| output_max_detections_per_class | maximum number of objects to report for each class for detection models | Integer value | 100 |
| output_max_classes_per_detection | maximum number of classes to report for detection models | Integer value | 30 |
| output_nms_threshold | rejection threshold for non-max suppression | Float value in [0..1] range | 0.6 |
| output_pose_threshold | rejection threshold for pose detection models | Float value in [0..1] range | 0.8 |
| output_postprocess_type | inference result post-processing type; you may set it to 'None' to bypass post-processing | String | Model-dependent |
| output_top_k | number of classes with biggest scores to report for classification models; if 0, report all classes above the confidence threshold | Integer value | 0 |
| output_use_regular_nms | use regular (per-class) NMS algorithm as opposed to global (class-ignoring) NMS algorithm for detection models | Boolean value | False |
| Display Parameters | | | |
| overlay_alpha | transparency value (alpha-blend weight) for all overlay details | Float value in [0..1] range | 0.5 |
| overlay_color | color for drawing all overlay details | 3-element tuple of RGB color | (255,255,128) |
| overlay_font_scale | font scaling factor for overlay text | Positive float value | 1.0 |
| overlay_line_width | line width in pixels for overlay lines | Integer value | 3 |
| overlay_show_labels | flag to enable drawing class labels of detected objects | Boolean value | True |
| overlay_show_probabilities | flag to enable drawing probabilities of detected objects | Boolean value | False |
| Control and Information Parameters | | | |
| devices_available | list of inference device indices which can be used for model inference (read-only) | List of integer values | N/A |
| devices_selected | list of inference device indices selected for model inference | List of integer values | Equal to devices_available |
| label_dictionary | model class label dictionary (read-only) | Dictionary | N/A |
| measure_time | flag to enable measuring and collecting inference time statistics | Boolean value | False |
| model_info | model information object to provide read-only access to model parameters (read-only) | ModelParams object | N/A |
| non_blocking_batch_predict | flag to control the blocking behavior of predict_batch() method | Boolean value | False |

Model Info

AI models have a lot of static attributes defining various model features and characteristics. Unlike model properties, these attributes in most cases cannot be changed: they come with the model.

To access all model attributes, you may query the read-only model property degirum.model.Model.model_info.

Note: A new deep copy of the model info class is created each time you read this property, so any changes made to that copy will not affect model behavior.

Model attributes are divided into the following categories:

  • Device-related attributes
  • Pre-processing-related attributes
  • Inference-related attributes
  • Post-processing-related attributes

The following table provides a complete summary of model attributes arranged by categories. The Attribute Name column contains the name of the ModelInfo class member returned by the model_info property.

Note: Each attribute in the Pre-Processing-Related Attributes group is a list of values, one per model input.

| Attribute Name | Description | Possible Values |
| --- | --- | --- |
| Device-Related Attributes | | |
| DeviceType | Device type to be used for AI inference of this model | "ORCA": DeGirum Orca, "EDGETPU": Google EdgeTPU, "CPU": host CPU |
| RuntimeAgent | Type of runtime to be used for AI inference of this model | "N2X": DeGirum NNExpress runtime, "TFLITE": Google TFLite runtime |
| Pre-Processing-Related Attributes | | |
| InputType | Model input type | List of the following strings: "Image": image input type, "Audio": audio input type, "Tensor": raw tensor input type |
| InputN | Input frame dimension size | 1 (other sizes to be supported) |
| InputH | Input height dimension size | Integer number |
| InputW | Input width dimension size | Integer number |
| InputC | Input color dimension size | Integer number |
| InputQuantEn | Enable input frame quantization flag (set for quantized models) | Boolean value |
| InputRawDataType | Data element type for audio or tensor inputs | List of the following strings: "DG_UINT8": 8-bit unsigned integer, "DG_INT16": 16-bit signed integer, "DG_FLT": 32-bit floating point |
| InputTensorLayout | Input tensor shape and layout | List of the following strings: "NHWC": 4-D tensor frame-height-width-color (more layouts to be supported) |
| InputColorSpace | Input image colorspace (sequence of colors in C dimension) | List of the following strings: "RGB", "BGR" |
| InputImgNormEn | Enable global normalization of input image flag | List of boolean values |
| InputImgNormCoeff | Normalization factor for input image global normalization | List of float values |
| InputImgMean | Mean value for per-channel image normalization | List of 3-element arrays of float values |
| InputImgStd | StDev value for per-channel image normalization | List of 3-element arrays of float values |
| InputQuantOffset | Quantization offset for input image quantization | List of float values |
| InputQuantScale | Quantization scale for input image quantization | List of float values |
| InputWaveformSize | Input waveform size in samples for audio input types | List of positive integer values |
| InputSamplingRate | Input waveform sampling rate in Hz for audio input types | List of positive float values |
| Inference-Related Attributes | | |
| ModelPath | Path to the model JSON file | String with filepath |
| ModelInputN | Model frame dimension size | 1 (other sizes to be supported) |
| ModelInputH | Model height dimension size | Integer number |
| ModelInputW | Model width dimension size | Integer number |
| ModelInputC | Model color dimension size | Integer number |
| ModelQuantEn | Enable input frame quantization flag (set for quantized models) | Boolean value |
| Post-Processing-Related Attributes | | |
| OutputNumClasses | Number of classes model detects | Integer value |
| OutputSoftmaxEn | Enable softmax step in post-processing flag | Boolean value |
| OutputClassIDAdjustment | Class ID adjustment: number subtracted from the class ID reported by the model | Integer value |
| OutputPostprocessType | Post-processing type | "Classification", "Detection", "DetectionYolo", "PoseDetection", "FaceDetect", "Segmentation", "BodyPix", "Python" (other types to be supported) |
| OutputConfThreshold | Confidence threshold to reject results with low scores | Float value in [0..1] range |
| OutputNMSThreshold | Rejection threshold for non-max suppression | Float value in [0..1] range |
| OutputTopK | Number of classes with biggest scores to report for classification models | Integer number |
| MaxDetections | Maximum number of objects to report for detection models | Integer number |
| MaxDetectionsPerClass | Maximum number of objects to report for each class for detection models | Integer number |
| MaxClassesPerDetection | Maximum number of classes to report for detection models | Integer number |
| UseRegularNMS | Use regular (per-class) NMS algorithm as opposed to global (class-ignoring) NMS algorithm for detection models | Boolean value |
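
As an illustration, a few of these attributes can be printed as follows. The attribute names are taken from the table above; per-input attributes from the pre-processing group are lists, so the first input is indexed:

info = model.model_info
print(info.DeviceType)                  # e.g. "ORCA"
print(info.RuntimeAgent)                # e.g. "N2X"
print(info.InputH[0], info.InputW[0])   # input height and width of the first model input
print(info.OutputNumClasses)            # number of classes the model detects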

Inference Advanced Topics

Selecting Devices for Inference

Every AI model in a model zoo is designed to work on particular hardware: either on AI accelerator hardware such as DeGirum Orca, or on the host computer CPU. Imagine the situation when the host computer is equipped with multiple hardware devices of a given type, and you run multiple inferences of a model designed for this device type. In this case, by default, all available hardware devices of this type will be used for these model inferences. This guarantees top inference performance in the case of a single model running on all available devices.

To get information about the available devices, you query the degirum.model.Model.devices_available property. It returns the list of device indices of all available devices of the type this model is designed for. These indices are zero-based, so if your host computer has a single device of a given type, the returned list contains a single zero element: [0]. In the case of two devices it will be [0, 1], and so on.

In certain cases you may want to limit the model inference to a particular subset of available devices. For example, you have two devices and you want to run concurrent inferences of two models. In the default case, both devices would be used for both model inferences, causing the models to be reloaded to the devices each time you run an inference of the other model. Even though model loading for DeGirum Orca devices is extremely fast, it still may cause performance degradation. In this case you may want to run the first model inference only on the first device, and the second model inference only on the second device. To do so, you assign the degirum.model.Model.devices_selected property of each model object to contain the list of device indices you want your model to run on: in our example, you assign the list [0] to the devices_selected property of the first model object, and the list [1] to that of the second model object.

In general, the list you assign to the devices_selected property should contain only indices that occur in the list returned by the devices_available property.
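
A minimal sketch of the two-model scenario described above; the model names are placeholders, and a host with two devices of the matching type is assumed:

model1 = zoo.load_model("first_model_name")    # placeholder model names
model2 = zoo.load_model("second_model_name")
print(model1.devices_available)                # e.g. [0, 1] on a two-device host
model1.devices_selected = [0]                  # run the first model only on device 0
model2.devices_selected = [1]                  # run the second model only on device 1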

Handling Multiple Streams of Frames

The Model class interface has a method, degirum.model.Model.predict_batch, which can run multiple predictions on a sequence of frames. In order to deliver the sequence of frames to predict_batch, you implement an iterable object which returns your frames one-by-one. One example of an iterable object is a regular Python list; another example is a generator function which yields frame data using the yield statement. You then pass such an iterable object as an argument to the predict_batch method. In turn, the predict_batch method returns a generator object which yields prediction results using the yield statement.

All the inference magic with pipelining sequential inferences, asynchronously retrieving inference results, supporting various inference devices, and AI server vs. local operation modes happens inside the implementation of predict_batch method. All you need to do is to wrap your sequence of frame data in an iterable object, pass this object to predict_batch, and iterate over the generator object returned by predict_batch using either for-loop or by repeatedly calling next() built-in function on this generator object.

The following example runs the inference on an infinite sequence of frames captured from the camera:

import cv2 # OpenCV
stream = cv2.VideoCapture(0) # open video stream from local camera #0

def source(): # define iterator function, which returns frames from camera
   while True:
      ret, frame = stream.read()
      yield frame   

for result in model.predict_batch(source()): # iterate over inference results
   cv2.imshow("AI camera", result.image_overlay) # process result (assumes OpenCV image backend)
   cv2.waitKey(1) # give OpenCV a chance to refresh the display window

But what if you need to run multiple concurrent inferences of multiple asynchronous data streams with different frame rates? The simple approach of combining two generators in one loop, either by using the zip() built-in function or by manually calling the next() built-in function for each generator in the loop body, will not work effectively.

Non-working example 1. Using zip() built-in function:

batch1 = model1.predict_batch(source1()) # generator object for the first model
batch2 = model2.predict_batch(source2()) # generator object for the second model
for result1, result2 in zip(batch1, batch2):
   ... # process result1 and result2

Non-working example 2. Using next() built-in function:

batch1 = model1.predict_batch(source1()) # generator object for the first model
batch2 = model2.predict_batch(source2()) # generator object for the second model
while True:
   result1 = next(batch1)
   result2 = next(batch2)
   # process result1 and result2

The reason is that the Python runtime has the Global Interpreter Lock (GIL), which allows running only one thread at a time, blocking the execution of other threads. So if the currently running thread is itself blocked by waiting for the next frame or waiting for the next inference result, all other threads are blocked as well.

For example, if the frame rate of source1() is slower than the frame rate of source2(), and assuming that the model inference frame rates are higher than the corresponding source frame rates, then the code above will spend most of the time waiting for the next frame from source1(), not letting frames from source2() be retrieved, so model2 will not get enough frames and will idle, losing performance.

Another example is when the inference latency of model1 is higher than the inference queue depth expressed in time (this is the product of the inference queue depth expressed in frames and the single frame inference time). In this case, when the model1 inference queue is full but the inference result is not ready yet, the code above will block waiting for that inference result inside next(batch1), preventing any operations with model2.

To get around such blocking, a special non-blocking mode of batch predict operation is implemented. You turn on this mode by assigning True to the degirum.model.Model.non_blocking_batch_predict property.

When non-blocking mode is enabled, the generator object returned by predict_batch() method accepts None from the input iterable object. This allows you to design non-blocking frame data source iterators: when no data is available, such iterator just yields None without waiting for the next frame. If None is returned from the input iterator, the model predict step is simply skipped for this iteration.

Also, in non-blocking mode, when no inference results are available in the result queue at some iteration, the generator yields a None result. This allows the code which operates with another model to continue execution.

In order to operate in non-blocking mode, you need to modify your code in the following way:

  1. Modify the frame data source iterator to return None if no frame is available yet, instead of waiting for the next frame.
  2. Modify the inference loop body to deal with None results by simply skipping them, as illustrated in the sketch below.
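
The following sketch illustrates the non-blocking pattern for two models. The frame source functions are hypothetical non-blocking iterators which yield None when no frame is available yet:

model1.non_blocking_batch_predict = True   # enable non-blocking mode for both models
model2.non_blocking_batch_predict = True

batch1 = model1.predict_batch(non_blocking_source1())  # hypothetical non-blocking frame sources
batch2 = model2.predict_batch(non_blocking_source2())

while True:
   result1 = next(batch1)
   result2 = next(batch2)
   if result1 is not None:
      pass  # process result1
   if result2 is not None:
      pass  # process result2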

Measure Inference Timing

The degirum.model.Model class has a facility to measure and collect model inference time information. To enable inference time collection, assign True to the degirum.model.Model.measure_time property.

When inference timing collection is enabled, the durations of individual steps for each frame prediction are accumulated in internal statistic accumulators.

To reset time statistic accumulators you use degirum.model.Model.reset_time_stats method.

To retrieve the time statistic accumulators you use the degirum.model.Model.time_stats method. This method returns a dictionary with time statistic objects. Each time statistic object accumulates time statistics for a particular inference step over all frame predictions that happened since the timing collection was enabled or reset. The statistics include the minimum, maximum, average, and count. Inference steps correspond to dictionary keys. The following dictionary keys are supported:

| Key | Description |
| --- | --- |
| FrameTotalDuration_ms | Frame total inference duration from the moment when you invoke the predict method to the moment when inference results are returned |
| PythonPreprocessDuration_ms | Duration of the Python pre-processing step, including data loading time and data conversion time |
| CorePreprocessDuration_ms | Duration of the low-level pre-processing step |
| CoreInferenceDuration_ms | Duration of the actual AI inference step on the AI inference hardware |
| CoreLoadResultDuration_ms | Duration of the data movement step from the AI inference hardware |
| CorePostprocessDuration_ms | Duration of the low-level post-processing step |

Note: In batch prediction mode, many inference phases are pipelined, so the pre- and post-processing steps of one frame may be executed in parallel with the AI inference step of another frame. Therefore, the actual frame rate may be higher than the frame rate calculated from the FrameTotalDuration_ms statistic.

Note: The PythonPreprocessDuration_ms statistic includes data loading time and data conversion time. This can give very different results for different ways of loading input frame data. For example, if you provide image URLs for inference, then PythonPreprocessDuration_ms will include the image downloading time, which can be much higher compared with the case when you provide the image as a numpy array, which does not require any downloading.

The following example shows how to use the time statistics collection interface. It assumes that the model variable is the model created by load_model().

model.measure_time = True # enable accumulation of time statistics 

# perform batch prediction
for result in model.predict_batch(source()):
   # process result
   pass

stats = model.time_stats() # query time statistics dictionary

# pretty-print frame total inference duration statistics
print(stats["FrameTotalDuration_ms"])

# print average duration of AI inference step
print(stats["CoreInferenceDuration_ms"].avg) 

model.reset_time_stats() # reset time statistics accumulators

# perform one more batch prediction
for result in model.predict_batch(source()):
   # process result
   pass

# re-query the statistics dictionary and print maximum duration of Python pre-processing step
stats = model.time_stats()
print(stats["PythonPreprocessDuration_ms"].max)

Configuring and Launching AI Server

PySDK can be used to configure and launch DeGirum AI server on hosts equipped with DeGirum Orca AI accelerator card(s). This allows you to run AI inferences on this AI server host initiated from remote clients.

To run PySDK as a server on a host, perform the following steps on the host:

  • Create or select a user name to be used for all the following configuration steps. This user should have administrative rights on this host. The user name ai-user is used in the instructions below, but it can be changed to any other user name of your choice.
  • For convenience of future maintenance, we recommend installing PySDK into a virtual environment, such as Miniconda.
  • Make sure you have activated your Python virtual environment with the appropriate Python version and that PySDK is installed into this virtual environment.
  • Create a directory for the AI server model zoo, and change your current working directory to this directory. For example:

    mkdir /home/ai-user/zoo
    cd /home/ai-user/zoo

  • Download all models from the DeGirum public model zoo into the current working directory by executing the following command:

    python3 -c "from degirum import server; server.download_models('.')"

  • Start the DeGirum AI server process by executing the following command:

    python3 -m degirum.server --zoo /home/ai-user/zoo

The AI server is up and will run until you press ENTER in the same terminal where you started it.

By default, the AI server listens on TCP port 8778. If you want to change the TCP port, pass the --port command line argument when launching the server, for example:

python3 -m degirum.server --zoo /home/ai-user/zoo --port 8780

Starting AI Server as Linux System Service

It is convenient to automate the process of AI server launch so that it will be started automatically on each system startup. On Linux-based hosts, you can achieve this by defining and configuring a system service, which will handle AI server startup.

Please perform the following steps to create, configure, and start system service:

  • Create the configuration file named degirum.service in the /etc/systemd/system directory. You will need administrative rights to create this file. You can use the following template as an example:

    [Unit]
    Description=DeGirum AI Service

    [Service]
    # You may want to adjust the working directory:
    WorkingDirectory=/home/ai-user/
    # You may want to adjust the path to your Python executable and the --zoo model zoo path.
    # Also you may specify a server TCP port other than the default 8778 by adding the --port <port> argument.
    ExecStart=/home/ai-user/miniconda3/bin/python -m degirum.server --zoo /home/ai-user/zoo
    Restart=always
    # You may want to adjust the restart time interval:
    RestartSec=10
    SyslogIdentifier=degirum-ai-server
    # You may want to change the user name under which this service will run.
    # This user should have rights to access the model zoo directory.
    User=ai-user

    [Install]
    WantedBy=multi-user.target
  • Start the system service by executing the following command:

    sudo systemctl start degirum.service

  • Check the system service status by executing the following command:

    sudo systemctl status degirum.service

    If the status is "Active", it means that the configuration is good and the service is up and running.

  • Then enable the service for automatic startup by executing the following command:

    sudo systemctl enable degirum.service

Connecting to AI Server from Client Side

Now your AI server is up and running, and you may connect to it from Python scripts using PySDK. To do so, you pass the AI server network hostname or its IP address to the degirum.connect_model_zoo PySDK function:

import degirum as dg
model_zoo = dg.connect_model_zoo(host_address)

If you run your PySDK script on the same host as the AI server, you may use the "localhost" string as a network hostname.

In local Linux networks with standard mDNS configuration, the network hostname is a concatenation of the local hostname (as returned by the hostname command) and the .local suffix. For example, if the hostname command returns ai-host, then the network hostname will be ai-host.local.

Updating AI Server Model Zoo

If you need to update the AI server model zoo, perform the following steps:

  • Shut down the AI server:
    • If you started your AI server yourself from the command line, just press ENTER in the terminal where you started the server.
    • If you started your AI server as a system service, execute the following command:

      sudo systemctl stop degirum.service

  • Manage your model zoo directory:
    • Add new models by downloading them from the cloud zoo as described in the beginning of this chapter.
    • Remove models by deleting their model subdirectories.
  • Start the AI server again:
    • For a manual start, refer to the beginning of this chapter.
    • To start it as a system service, execute the following command:

      sudo systemctl start degirum.service

def connect_model_zoo(*args, **kwargs)

Connect to the model zoo of your choice.

This is the main PySDK entry point: you start your work with PySDK by calling this function.

The Model Zoo manager class instance is created and returned as a result of this call. Model Zoo manager object allows you to connect to a model zoo of your choice and access a collection of models in that model zoo.

The following model zoo types are supported:

  • DeGirum cloud model zoo;
  • DeGirum AI server-based model zoo;
  • single-file model zoo.

The type of the model zoo is defined by the URL string which you pass as a parameter (see below).

The Model Zoo manager object allows you to perform the following activities:

  • list and search models available in the connected model zoo;
  • download models from a cloud model zoo;
  • create AI model handling objects to perform AI inferences;
  • request various AI model parameters.

Parameters:

  • zoo_url: URL string, which defines the model zoo to manage
  • token: optional security token string to be passed to the cloud model zoo for authentication and authorization

The zoo_url parameter can be one of the following:

  1. None or not specified. In this case, the connection to the DeGirum public cloud model zoo is established.
  2. A string which starts with the "https://" prefix. Such a URL string is treated as the URL of a private cloud model zoo. In this case, an attempt to connect to that model zoo is made with the security token defined by the token parameter. In the case of an unsuccessful connection, an exception is raised.
  3. A string which defines an Internet host name or IP address. Such a URL string is treated as the address of a remote DeGirum AI server host (refer to degirum.server for more info). In this case, an attempt to connect to that AI server is made. In the case of an unsuccessful connection, an exception is raised.
  4. A string which defines a local file path to a ".json" file. Such a URL string is treated as a single-file model zoo. That ".json" file must be a valid DeGirum model configuration file, otherwise an exception is raised. Note: this option is mostly used for testing and debugging new models during model development, before they are released in any model zoo.

Once you have created a Model Zoo manager object, you may use the following methods: