简体中文 | English
TensorRtSharp is a complete C# wrapper for NVIDIA TensorRT, enabling .NET developers to leverage high-performance GPU inference without leaving the C# ecosystem.
- ✅ Complete API Coverage - Full support for TensorRT core features including model building, inference execution, and dynamic shapes
- ✅ Type Safety - Strong type system with compile-time error checking
- ✅ Automatic Resource Management - RAII and Dispose pattern-based resource management prevents memory leaks
- ✅ Cross-Platform - Supports Windows and Linux on .NET 5.0-10.0, .NET Core 3.1, and .NET Framework 4.7.1+
- ✅ High Performance - Async execution with CUDA streams and multi-context parallel inference
- ✅ Ready to Use - NuGet packages include all dependencies, no complex configuration required
```bash
# Install the API package
dotnet add package JYPPX.TensorRT.CSharp.API

# Install the runtime package (choose based on your CUDA version)
# For CUDA 12.x
dotnet add package JYPPX.TensorRT.CSharp.API.runtime.win-x64.cuda12

# For CUDA 11.x
dotnet add package JYPPX.TensorRT.CSharp.API.runtime.win-x64.cuda11
```

| Requirement | Description |
|---|---|
| OS | Windows 10+, Linux (Ubuntu 18.04+) |
| .NET | .NET 5.0-10.0, .NET Core 3.1, .NET Framework 4.7.1+ |
| GPU | NVIDIA GPU (supports CUDA 11.x or 12.x) |
| Dependencies | NVIDIA TensorRT 10.x, CUDA Runtime |
⚠️ Important: TensorRtSharp 3.0 is based on TensorRT 10.x and does not support TensorRT 8.x or 9.x.
After installing the NuGet packages, configure your system PATH to include:
- CUDA `bin` directory (e.g., `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin`)
- TensorRT `lib` directory (e.g., `C:\TensorRT-10.13.0.35\lib`)
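If you prefer not to edit the machine-wide PATH, you can prepend the directories for the current process before the first TensorRT call. This is only a sketch; the directories below are examples and must match your local CUDA and TensorRT installation:

```csharp
using System;

static class NativePathSetup
{
    // Prepend CUDA and TensorRT directories to the process PATH so the native
    // DLLs can be resolved. Call once at startup, before creating any Runtime,
    // Builder, or CUDA objects (the native libraries load on first P/Invoke).
    public static void Apply()
    {
        string[] nativeDirs =
        {
            @"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin", // example CUDA bin
            @"C:\TensorRT-10.13.0.35\lib"                                    // example TensorRT lib
        };

        string path = Environment.GetEnvironmentVariable("PATH") ?? string.Empty;

        // Affects only the current process, not the machine-wide PATH.
        Environment.SetEnvironmentVariable("PATH", string.Join(";", nativeDirs) + ";" + path);
    }
}
```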
```csharp
using System.IO;
using JYPPX.TensorRtSharp.Cuda;
using JYPPX.TensorRtSharp.Nvinfer;

// Load the engine
byte[] engineData = File.ReadAllBytes("model.engine");
Runtime runtime = new Runtime();
using CudaEngine engine = runtime.deserializeCudaEngineByBlob(engineData, (ulong)engineData.Length);
using ExecutionContext context = engine.createExecutionContext();
using CudaStream stream = new CudaStream();
using Cuda1DMemory<float> input = new Cuda1DMemory<float>(3 * 640 * 640);
using Cuda1DMemory<float> output = new Cuda1DMemory<float>(1000);

// Bind tensor addresses
context.setInputTensorAddress("images", input.get());
context.setOutputTensorAddress("output", output.get());

// Prepare input data
float[] inputData = PreprocessImage("image.jpg");
input.copyFromHost(inputData);

// Execute inference
context.executeV3(stream);
stream.Synchronize();

// Get results
float[] outputData = new float[1000];
output.copyToHost(outputData);
```
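`PreprocessImage` above is a user-supplied placeholder, not part of the library. A minimal sketch using System.Drawing (Windows; on .NET 6+ it requires the System.Drawing.Common package), assuming a 3x640x640 CHW input normalized to [0, 1]; real preprocessing (mean/std, letterboxing, BGR vs. RGB) depends on how the model was exported:

```csharp
using System.Drawing;

// Hypothetical helper: resize to 640x640 and convert to planar CHW floats in [0, 1].
static float[] PreprocessImage(string path)
{
    const int width = 640, height = 640;
    float[] chw = new float[3 * height * width];

    using (var original = new Bitmap(path))
    using (var resized = new Bitmap(original, new Size(width, height)))
    {
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                Color px = resized.GetPixel(x, y);
                int offset = y * width + x;
                chw[0 * height * width + offset] = px.R / 255f; // R plane
                chw[1 * height * width + offset] = px.G / 255f; // G plane
                chw[2 * height * width + offset] = px.B / 255f; // B plane
            }
        }
    }
    return chw;
}
```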
The following example builds a TensorRT engine from an ONNX model and saves it to disk:

```csharp
using System.IO;
using JYPPX.TensorRtSharp.Nvinfer;

using Builder builder = new Builder();
using NetworkDefinition network = builder.createNetworkV2(TrtNetworkDefinitionCreationFlag.kEXPLICIT_BATCH);
using BuilderConfig config = builder.createBuilderConfig();
using OnnxParser parser = new OnnxParser(network);

// Parse ONNX model
parser.parseFromFile("model.onnx", verbosity: 2);

// Enable FP16
config.setFlag(TrtBuilderFlag.kFP16);

// Build and serialize
using HostMemory serialized = builder.buildSerializedNetwork(network, config);

// Save to file
File.WriteAllBytes("model.engine", serialized.getByteData());
```
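A common pattern is to combine the two snippets above: build and cache the engine on the first run, then deserialize the cached file afterwards. `LoadOrBuildEngine` below is a hypothetical helper that reuses only the calls already shown; file names are examples:

```csharp
using System.IO;
using JYPPX.TensorRtSharp.Nvinfer;

// Hypothetical helper: returns serialized engine bytes, building from ONNX only when
// no cached engine exists. Engine files are device- and version-specific, so delete
// the cache after driver/TensorRT upgrades or when moving to a different GPU.
static byte[] LoadOrBuildEngine(string onnxPath, string enginePath)
{
    if (File.Exists(enginePath))
        return File.ReadAllBytes(enginePath);

    using Builder builder = new Builder();
    using NetworkDefinition network = builder.createNetworkV2(TrtNetworkDefinitionCreationFlag.kEXPLICIT_BATCH);
    using BuilderConfig config = builder.createBuilderConfig();
    using OnnxParser parser = new OnnxParser(network);

    parser.parseFromFile(onnxPath, verbosity: 2);

    // Enable FP16 only when the GPU reports fast FP16 support.
    if (builder.platformHasFastFp16())
        config.setFlag(TrtBuilderFlag.kFP16);

    using HostMemory serialized = builder.buildSerializedNetwork(network, config);
    byte[] engineData = serialized.getByteData();
    File.WriteAllBytes(enginePath, engineData);
    return engineData;
}
```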
For complete documentation, please visit:
TensorRtSharp uses a clear three-layer architecture:
```
┌───────────────────────────────────────────┐
│           High-Level API Layer            │
│   Runtime, Builder, CudaEngine, Context   │
└───────────────────────────────────────────┘
                     ↓
┌───────────────────────────────────────────┐
│         Resource Management Layer         │
│    DisposableTrtObject, RAII pattern      │
└───────────────────────────────────────────┘
                     ↓
┌───────────────────────────────────────────┐
│          P/Invoke Interop Layer           │
│   NativeMethodsTensorRt*, NativeCuda*     │
└───────────────────────────────────────────┘
```
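The resource-management layer wraps every native TensorRT handle in a disposable object. The snippet below is a simplified illustration of that pattern, not the actual DisposableTrtObject source; it only shows how the Dispose/finalizer pair keeps native handles from leaking:

```csharp
using System;

// Simplified illustration of an RAII-style wrapper around a native handle.
// The real DisposableTrtObject in TensorRtSharp may differ in detail.
public abstract class NativeHandleWrapper : IDisposable
{
    protected IntPtr handle;   // pointer returned by the native TensorRT C API
    private bool disposed;

    // Each concrete wrapper knows how to destroy its own native object.
    protected abstract void ReleaseNativeHandle(IntPtr handle);

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this); // finalizer no longer needed
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;
        if (handle != IntPtr.Zero)
        {
            ReleaseNativeHandle(handle);
            handle = IntPtr.Zero;
        }
        disposed = true;
    }

    // Safety net: frees the native object even if Dispose() was never called.
    ~NativeHandleWrapper() => Dispose(false);
}
```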
Entry point for TensorRT inference, responsible for deserializing engine files.
```csharp
Runtime runtime = new Runtime();
using CudaEngine engine = runtime.deserializeCudaEngineByBlob(data, size);
runtime.setMaxThreads(4);
```

Builds TensorRT engines from ONNX models.
```csharp
Builder builder = new Builder();
bool hasFP16 = builder.platformHasFastFp16();
using NetworkDefinition network = builder.createNetworkV2(flags);
```

Core inference object containing the optimized model computation graph.
```csharp
string inputName = engine.getIOTensorName(0);
Dims inputShape = engine.getTensorShape(inputName);
using ExecutionContext context = engine.createExecutionContext();
```

Manages the execution environment for a single inference.
```csharp
context.setInputTensorAddress("images", inputPtr);
context.setOutputTensorAddress("output", outputPtr);
context.executeV3(stream);
```
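Each ExecutionContext owns its own binding state, so one engine can serve several contexts in parallel, each on its own CUDA stream with its own buffers. Below is a sketch of that pattern reusing the engine from the quick start and only the calls shown above; tensor names, sizes, and the `RunOnOwnContext` helper are illustrative:

```csharp
using System.Threading.Tasks;
using JYPPX.TensorRtSharp.Cuda;
using JYPPX.TensorRtSharp.Nvinfer;

// Hypothetical helper: runs one inference on a dedicated context/stream pair.
// Several of these can run concurrently against the same CudaEngine.
static float[] RunOnOwnContext(CudaEngine engine, float[] inputData)
{
    using ExecutionContext context = engine.createExecutionContext();
    using CudaStream stream = new CudaStream();
    using Cuda1DMemory<float> input = new Cuda1DMemory<float>(3 * 640 * 640);
    using Cuda1DMemory<float> output = new Cuda1DMemory<float>(1000);

    context.setInputTensorAddress("images", input.get());
    context.setOutputTensorAddress("output", output.get());

    input.copyFromHost(inputData);
    context.executeV3(stream);
    stream.Synchronize();

    float[] result = new float[1000];
    output.copyToHost(result);
    return result;
}

// Two requests processed in parallel against the same engine.
float[] batch0 = PreprocessImage("image0.jpg"); // example inputs
float[] batch1 = PreprocessImage("image1.jpg");
float[][] results = await Task.WhenAll(
    Task.Run(() => RunOnOwnContext(engine, batch0)),
    Task.Run(() => RunOnOwnContext(engine, batch1)));
```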
TensorRT typically provides:

- 🚀 2-10x inference speedup (compared to native frameworks)
- 💾 50%+ memory reduction (through precision optimization and layer fusion)
- ⚡ Sub-millisecond latency (meeting real-time application requirements)
| Feature | TensorRtSharp | ML.NET | ONNX Runtime |
|---|---|---|---|
| Language | C# | C# | C++/Python |
| Performance | Native | Medium | Native |
| TensorRT Support | ✅ Complete | ❌ | |
| Custom Operators | ✅ | ❌ | |
| Dynamic Shapes | ✅ | ❌ | |
| Multi-GPU | ✅ | ❌ | |
Error: Unable to load DLL 'TensorRT-C-API'
Solution:
- Verify that the runtime NuGet package is installed
- Check that PATH includes the TensorRT `lib` and CUDA `bin` directories
- Confirm that the TensorRT version is 10.x
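To narrow down which of these is the cause, you can scan the current PATH for the native libraries at startup. This is a diagnostic sketch; the DLL names are examples for Windows with TensorRT 10 / CUDA 12 and may differ on your installation:

```csharp
using System;
using System.IO;
using System.Linq;

// Reports which PATH directories (if any) contain the native libraries that
// TensorRtSharp loads through P/Invoke. DLL names below are examples and may
// differ between TensorRT/CUDA releases.
string[] requiredDlls = { "TensorRT-C-API.dll", "nvinfer_10.dll", "cudart64_12.dll" };
string[] pathDirs = (Environment.GetEnvironmentVariable("PATH") ?? "")
    .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

foreach (string dll in requiredDlls)
{
    string found = pathDirs.FirstOrDefault(dir =>
    {
        try { return File.Exists(Path.Combine(dir, dll)); }
        catch { return false; } // ignore malformed PATH entries
    });
    Console.WriteLine(found != null ? $"{dll}: found in {found}" : $"{dll}: NOT found on PATH");
}
```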
Error: System.Runtime.InteropServices.SEHException
Solution:
- Confirm that the TensorRT version is 10.x
- Check CUDA version compatibility
- Regenerate the engine file on the current device (engine files are not portable across GPUs or TensorRT versions)
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the Apache-2.0 License - see the LICENSE file for details.
Guojin Yan
- GitHub Issues: Submit an issue
- QQ Group: Join 945057948 for faster responses
- NVIDIA TensorRT - High-performance deep learning inference optimizer
- .NET community - For the amazing developer platform
Made with ❤️ by the .NET community

