TensorRtSharp

TensorRtSharp is a complete C# wrapper for NVIDIA TensorRT, enabling .NET developers to leverage high-performance GPU inference without leaving the C# ecosystem.

English | 简体中文

🚀 Features

  • ✅ Complete API Coverage - Full support for TensorRT core features, including model building, inference execution, and dynamic shapes
  • ✅ Type Safety - Strong type system with compile-time error checking
  • ✅ Automatic Resource Management - RAII and Dispose-pattern-based resource management prevents memory leaks
  • ✅ Cross-Platform - Supports Windows and Linux; .NET 5.0-10.0, .NET Core 3.1, .NET Framework 4.7.1+
  • ✅ High Performance - Async execution with CUDA streams, multi-context parallel inference
  • ✅ Ready to Use - NuGet packages include all dependencies, no complex configuration required

📦 Installation

Via NuGet

# Install the API package
dotnet add package JYPPX.TensorRT.CSharp.API

# Install the runtime package (choose based on your CUDA version)
# For CUDA 12.x
dotnet add package JYPPX.TensorRT.CSharp.API.runtime.win-x64.cuda12

# For CUDA 11.x
dotnet add package JYPPX.TensorRT.CSharp.API.runtime.win-x64.cuda11

System Requirements

Requirement    Description
OS             Windows 10+, Linux (Ubuntu 18.04+)
.NET           .NET 5.0-10.0, .NET Core 3.1, .NET Framework 4.7.1+
GPU            NVIDIA GPU (supports CUDA 11.x or 12.x)
Dependencies   NVIDIA TensorRT 10.x, CUDA Runtime

โš ๏ธ Important: TensorRtSharp 3.0 is based on TensorRT 10.x and does not support TensorRT 8.x or 9.x.

Configuration

After installing the NuGet packages, configure your system PATH to include:

  • CUDA bin directory (e.g., C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin)
  • TensorRT lib directory (e.g., C:\TensorRT-10.13.0.35\lib)
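If editing the machine-wide PATH is not convenient, the same directories can be prepended to the process PATH at startup, before the first native call. A minimal sketch, reusing the example paths above (adjust them to your installation); the ";" separator is Windows-specific:

using System;

class NativeLibrarySetup
{
    // Call once at startup, before any TensorRtSharp type is used.
    public static void PrependNativeDirsToPath()
    {
        // Example paths from the section above; adjust to your installation.
        string cudaBin = @"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin";
        string trtLib = @"C:\TensorRT-10.13.0.35\lib";

        string path = Environment.GetEnvironmentVariable("PATH") ?? "";
        Environment.SetEnvironmentVariable("PATH", cudaBin + ";" + trtLib + ";" + path);
    }
}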

🎯 Quick Start

Basic Inference Example

using System.IO;
using JYPPX.TensorRtSharp.Cuda;
using JYPPX.TensorRtSharp.Nvinfer;

// Load the serialized engine and deserialize it
byte[] engineData = File.ReadAllBytes("model.engine");
Runtime runtime = new Runtime();

using CudaEngine engine = runtime.deserializeCudaEngineByBlob(engineData, (ulong)engineData.Length);
using ExecutionContext context = engine.createExecutionContext();
using CudaStream stream = new CudaStream();
using Cuda1DMemory<float> input = new Cuda1DMemory<float>(3 * 640 * 640);
using Cuda1DMemory<float> output = new Cuda1DMemory<float>(1000);

// Bind tensor addresses
context.setInputTensorAddress("images", input.get());
context.setOutputTensorAddress("output", output.get());

// Prepare input data
float[] inputData = PreprocessImage("image.jpg");
input.copyFromHost(inputData);

// Execute inference
context.executeV3(stream);
stream.Synchronize();

// Get results
float[] outputData = new float[1000];
output.copyToHost(outputData);
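The PreprocessImage helper above is application code, not part of the library. A hypothetical sketch using SixLabors.ImageSharp (an assumption; any imaging library works) that resizes to 640x640 and emits CHW-ordered floats scaled to [0, 1]:

using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;

static float[] PreprocessImage(string path)
{
    using Image<Rgb24> image = Image.Load<Rgb24>(path);
    image.Mutate(ctx => ctx.Resize(640, 640));

    // CHW layout: all R values, then all G, then all B, scaled to [0, 1]
    float[] data = new float[3 * 640 * 640];
    for (int y = 0; y < 640; y++)
    {
        for (int x = 0; x < 640; x++)
        {
            Rgb24 p = image[x, y];
            int i = y * 640 + x;
            data[i] = p.R / 255f;
            data[640 * 640 + i] = p.G / 255f;
            data[2 * 640 * 640 + i] = p.B / 255f;
        }
    }
    return data;
}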

Model Building (ONNX → Engine)

using Builder builder = new Builder();
using NetworkDefinition network = builder.createNetworkV2(TrtNetworkDefinitionCreationFlag.kEXPLICIT_BATCH);
using BuilderConfig config = builder.createBuilderConfig();
using OnnxParser parser = new OnnxParser(network);

// Parse ONNX model
parser.parseFromFile("model.onnx", verbosity: 2);

// Enable FP16
config.setFlag(TrtBuilderFlag.kFP16);

// Build and serialize
using HostMemory serialized = builder.buildSerializedNetwork(network, config);

// Save to file
File.WriteAllBytes("model.engine", serialized.getByteData());
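kFP16 only pays off on GPUs with fast half-precision support, so it is worth guarding the flag with the platform query shown under Builder below; a minimal sketch reusing the builder and config from this example:

// Enable FP16 only when the GPU has fast half-precision support
if (builder.platformHasFastFp16())
{
    config.setFlag(TrtBuilderFlag.kFP16);
}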

📚 Documentation

For complete documentation, please visit the project repository.

๐Ÿ—๏ธ Architecture

TensorRtSharp uses a clear three-layer architecture:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚    High-Level API Layer                โ”‚
โ”‚  Runtime, Builder, CudaEngine, Context  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                    โ†•
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚    Resource Management Layer           โ”‚
โ”‚  DisposableTrtObject, RAII pattern     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                    โ†•
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚    P/Invoke Interop Layer              โ”‚
โ”‚  NativeMethodsTensorRt*, NativeCuda*    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

💡 Core Classes

Runtime

Entry point for TensorRT inference, responsible for deserializing engine files.

Runtime runtime = new Runtime();
using CudaEngine engine = runtime.deserializeCudaEngineByBlob(data, size);
runtime.setMaxThreads(4);

Builder

Builds TensorRT engines from ONNX models.

Builder builder = new Builder();
bool hasFP16 = builder.platformHasFastFp16();
using NetworkDefinition network = builder.createNetworkV2(flags);

CudaEngine

Core inference object containing the optimized model computation graph.

string inputName = engine.getIOTensorName(0);
Dims inputShape = engine.getTensorShape(inputName);
using ExecutionContext context = engine.createExecutionContext();
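To inspect a model, every IO tensor can be enumerated by index. A sketch assuming the wrapper mirrors TensorRT's getNbIOTensors (an assumption; verify the exact method name in this wrapper):

using System;

// List every IO tensor by name; getNbIOTensors is assumed to mirror
// the TensorRT C++ API and may be named differently here.
for (int i = 0; i < engine.getNbIOTensors(); i++)
{
    string name = engine.getIOTensorName(i);
    Console.WriteLine($"IO tensor {i}: {name}");
}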

ExecutionContext

Manages the execution environment for a single inference.

context.setInputTensorAddress("images", inputPtr);
context.setOutputTensorAddress("output", outputPtr);
context.executeV3(stream);
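The feature list also mentions multi-context parallel inference. A minimal sketch of that pattern, assuming the engine from the Quick Start example and hypothetical input arrays batchA and batchB; each context gets its own stream and device buffers:

using System.Threading.Tasks;

// Each task creates its own context, stream, and buffers from the
// shared engine, so two inferences can run concurrently.
Task RunAsync(float[] batch) => Task.Run(() =>
{
    using ExecutionContext ctx = engine.createExecutionContext();
    using CudaStream s = new CudaStream();
    using Cuda1DMemory<float> inBuf = new Cuda1DMemory<float>(3 * 640 * 640);
    using Cuda1DMemory<float> outBuf = new Cuda1DMemory<float>(1000);

    ctx.setInputTensorAddress("images", inBuf.get());
    ctx.setOutputTensorAddress("output", outBuf.get());
    inBuf.copyFromHost(batch);
    ctx.executeV3(s);
    s.Synchronize();
});

await Task.WhenAll(RunAsync(batchA), RunAsync(batchB));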

📊 Performance

TensorRT typically provides:

  • 📈 2-10x inference speedup (compared to the original framework)
  • 💾 50%+ memory reduction (through precision optimization and layer fusion)
  • ⚡ Sub-millisecond latency (meeting real-time application requirements)
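These figures are model- and hardware-dependent; a quick way to measure latency on your own model is to time executeV3 with a Stopwatch after a few warm-up runs, reusing the context and stream from the Quick Start example:

using System;
using System.Diagnostics;

// Warm up so CUDA/TensorRT initialization does not skew the numbers
for (int i = 0; i < 10; i++) { context.executeV3(stream); stream.Synchronize(); }

// Time 100 inferences and report the mean latency
const int iters = 100;
var sw = Stopwatch.StartNew();
for (int i = 0; i < iters; i++) { context.executeV3(stream); stream.Synchronize(); }
sw.Stop();
Console.WriteLine($"Mean latency: {sw.Elapsed.TotalMilliseconds / iters:F2} ms");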

🆚 Comparison

Feature            TensorRtSharp   ML.NET          ONNX Runtime
Language           C#              C#              C++/Python
Performance        Native          Medium          Native
TensorRT Support   ✅ Complete     ❌              ⚠️ Limited
Custom Operators   ✅              ⚠️ Difficult    ✅
Dynamic Shapes     ✅              ⚠️ Limited      ✅
Multi-GPU          ✅              ⚠️ Limited      ✅

๐Ÿ› Troubleshooting

Issue: Unable to load DLL

Error: Unable to load DLL 'TensorRT-C-API'

Solution:

  1. Verify Runtime NuGet package is installed
  2. Check PATH includes TensorRT lib and CUDA bin directories
  3. Confirm TensorRT version is 10.x

Issue: SEHException

Error: System.Runtime.InteropServices.SEHException

Solution:

  1. Confirm TensorRT version is 10.x
  2. Check CUDA version compatibility
  3. Regenerate Engine file on current device

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.

👨‍💻 Author

Guojin Yan

📮 Support

  • 📧 GitHub Issues: Submit an issue
  • 💬 QQ Group: Join 945057948 for faster responses

🙏 Acknowledgments

[QQ group QR code]


Made with ❤️ by the .NET community
