Link Search Menu Expand Document

Get started with ORT for C#

Contents

Install the Nuget Packages with the .NET CLI

dotnet add package Microsoft.ML.OnnxRuntime --version 1.2.0
dotnet add package System.Numerics.Tensors --version 0.1.0

Import the libraries

using Microsoft.ML.OnnxRuntime;
using System.Numerics.Tensors;

Create method for inference

This is an Azure Function example that uses ORT with C# for inference on an NLP model created with SciKit Learn.

 public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
            ILogger log, ExecutionContext context)
        {
            log.LogInformation("C# HTTP trigger function processed a request.");

            string review = req.Query["review"];

            string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
            dynamic data = JsonConvert.DeserializeObject(requestBody);
            review = review ?? data?.review;

            // Get path to model to create inference session.
            var modelPath = "./model.onnx";

            // create input tensor (nlp example)
            var inputTensor = new DenseTensor<string>(new string[] { review }, new int[] { 1, 1 });

            // Create input data for session.
            var input = new List<NamedOnnxValue> { NamedOnnxValue.CreateFromTensor<string>("input", inputTensor) };

            // Create an InferenceSession from the Model Path.
            var session = new InferenceSession(modelPath);

            // Run session and send input data in to get inference output. Call ToList then get the Last item. Then use the AsEnumerable extension method to return the Value result as an Enumerable of NamedOnnxValue.
            var output = session.Run(input).ToList().Last().AsEnumerable<NamedOnnxValue>();

            // From the Enumerable output create the inferenceResult by getting the First value and using the AsDictionary extension method of the NamedOnnxValue.
            var inferenceResult = output.First().AsDictionary<string, float>();

            // Return the inference result as json.
            return new JsonResult(inferenceResult);
        }

Reuse input/output tensor buffers

In some scenarios, you may want to reuse input/output tensors. This often happens when you want to chain 2 models (ie. feed one’s output as input to another), or want to accelerate inference speed during multiple inference runs.

Chaining: Feed model A’s output(s) as input(s) to model B

InferenceSession session1, session2;  // let's say 2 sessions are initialized

Tensor<float> t1;  // let's say data is fed into the Tensor objects
var inputs1 = new List<NamedOnnxValue>()
              {
                  NamedOnnxValue.CreateFromTensor<float>("name1", t1)
              };
// session1 inference
using (var outputs1 = session1.Run(inputs1))
{
    // get intermediate value
    var input2 = outputs1.First();
    
    // modify the name of the ONNX value
    input2.Name = "name2";

    // create input list for session2
    var inputs2 = new List<NamedOnnxValue>() { input2 };

    // session2 inference
    using (var results = session2.Run(inputs2))
    {
        // manipulate the results
    }
}

Multiple inference runs with fixed sized input(s) and output(s)

If the model have fixed sized inputs and outputs of numeric tensors, you can use “FixedBufferOnnxValue” to accelerate the inference speed. By using “FixedBufferOnnxValue”, the container objects only need to be allocated/disposed one time during multiple InferenceSession.Run() calls. This avoids some overhead which may be beneficial for smaller models where the time is noticeable in the overall running time.

An example can be found at TestReusingFixedBufferOnnxValueNonStringTypeMultiInferences():

Running on GPU (Optional)

If using the GPU package, simply use the appropriate SessionOptions when creating an InferenceSession.

int gpuDeviceId = 0; // The GPU device ID to execute on
var session = new InferenceSession("model.onnx", SessionOptions.MakeSessionOptionWithCudaProvider(gpuDeviceId));

ONNX Runtime C# API

The ONNX runtime provides a C# .NET binding for running inference on ONNX models in any of the .NET standard platforms.

Supported Versions

.NET standard 1.1

Builds

Artifact Description Supported Platforms
Microsoft.ML.OnnxRuntime CPU (Release) Windows, Linux, Mac, X64, X86 (Windows-only), ARM64 (Windows-only)…more details: compatibility
Microsoft.ML.OnnxRuntime.Gpu GPU - CUDA (Release) Windows, Linux, Mac, X64…more details: compatibility
Microsoft.ML.OnnxRuntime.DirectML GPU - DirectML (Release) Windows 10 1709+
ort-nightly CPU, GPU (Dev) Same as Release versions

API Reference

C# API Reference

Samples

See Tutorials: Basics - C#

Learn More