This document is for Debian-based systems / macOS.
For Windows, follow the official TorchServe documentation.
- Install TorchServe
- Serve a Model
- Logging and Monitoring
- Metrics
- Convert to Script Mode
- Serve YOLOv5 Model
- Common Bugs
- References
- Clone TorchServe repository
$ git clone https://github.com/pytorch/serve.git
Install all dependencies:
Note: For Conda, Python 3.8 is required to run TorchServe.
- Change to the serve directory:
$ cd serve
- For CPU:
$ python ./ts_scripts/install_dependencies.py
- For GPU with CUDA 10.2 (options are cu92, cu101, cu102, cu111):
$ python ./ts_scripts/install_dependencies.py --cuda=cu102
- Install torchserve, torch-model-archiver and torch-workflow-archiver
For Conda
$ conda install torchserve torch-model-archiver torch-workflow-archiver -c pytorch
For Pip
$ pip install torchserve torch-model-archiver torch-workflow-archiver
If this command raises a warning that these scripts are not on PATH, it looks like the following:
WARNING: The script torchserve is installed in '/home/user/.local/bin' which is not on PATH.
You must add the directory containing these scripts to PATH. For example:
export PATH=$PATH:/home/user/.local/bin
To install TorchServe for development, see here.
$ cd ..
$ mkdir model_store
$ wget https://download.pytorch.org/models/densenet161-8d451a50.pth
Note: A PyTorch model has two modes: eager mode and script mode. To learn more about modes in PyTorch, click here.
$ torch-model-archiver --model-name densenet161 --version 1.0 --model-file ./serve/examples/image_classifier/densenet_161/model.py --serialized-file densenet161-8d451a50.pth --export-path model_store --extra-files ./serve/examples/image_classifier/index_to_name.json --handler image_classifier
See here for more information about the arguments.
- The handler file is extremely important and is specified by --handler.
- The handler file controls preprocessing the data, passes it through the model, and gets the predictions.
- You can create your own handler (a minimal sketch follows below). See How to write a handler file?.
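As a rough illustration (not the handler used later in this tutorial), here is a minimal custom handler sketch; the class name MyImageHandler and the transform are assumptions:
import io
import torch
from PIL import Image
from torchvision import transforms
from ts.torch_handler.base_handler import BaseHandler

class MyImageHandler(BaseHandler):
    # TorchServe calls initialize() once per worker (BaseHandler loads the model),
    # then preprocess -> inference -> postprocess for every request batch.
    transform = transforms.Compose([transforms.Resize(256),
                                    transforms.CenterCrop(224),
                                    transforms.ToTensor()])

    def preprocess(self, data):
        # Each request carries the raw payload under "data" or "body".
        images = []
        for row in data:
            image_bytes = row.get("data") or row.get("body")
            image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
            images.append(self.transform(image))
        return torch.stack(images)

    def postprocess(self, output):
        # Return one result per request in the batch.
        return output.argmax(dim=1).tolist()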
1.5.1 Configure properties:
The default ports are: Inference REST API - 8080, Management REST API - 8081, Metrics REST API - 8082, Inference gRPC API - 7070, Management gRPC API - 7071.
In my case, port 7070 is already used by AnyDesk, so I have to configure the gRPC port:
$ echo "grpc_inference_port=8888" > config.propertiesTorch Serve will load configurations from file config.properties where you run tochserve.
For more, see Advanced Configuration.
1.5.2 Start TorchServe:
$ torchserve --start --ncs --model-store model_store --models densenet161.mar
After you execute the torchserve command above, TorchServe runs on your host, listening for inference requests.
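To quickly verify that the server is up, you can hit the ping endpoint (a small Python sketch, assuming the default inference port 8080):
import requests

# GET /ping returns {"status": "Healthy"} when TorchServe is running.
print(requests.get("http://127.0.0.1:8080/ping").json())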
1.6.1 Using gRPC APIs through a Python client
- Install gRPC Python dependencies:
$ pip install -U grpcio protobuf grpcio-tools
- Change to the serve directory:
$ cd serve
- Generate the inference client using the proto files:
$ python -m grpc_tools.protoc --proto_path=frontend/server/src/main/resources/proto/ --python_out=ts_scripts --grpc_python_out=ts_scripts frontend/server/src/main/resources/proto/inference.proto frontend/server/src/main/resources/proto/management.proto
This command will create 4 Python files: inference_pb2.py, inference_pb2_grpc.py, management_pb2.py, management_pb2_grpc.py, generated by grpc_tools.protoc from the proto files.
- Run inference using the sample gRPC Python client.
Remember to start TorchServe before running this command. Change the inference port from 7070 to 8888 if you set grpc_inference_port=8888 in line 10 of torchserve_grpc_client.py.
$ python ts_scripts/torchserve_grpc_client.py infer densenet161 examples/image_classifier/kitten.jpg
The client takes three arguments:
- 1 -> API name [infer, register, unregister]
- 2 -> model name
- 3 -> model input for prediction
1.6.2 Using REST APIs
- Download an image to test the model server (you can also use your own data).
$ curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
- One way is to use the terminal to get a prediction:
$ curl http://127.0.0.1:8080/predictions/densenet161 -T kitten_small.jpg
- The other way is to implement a Python script that gets a prediction from a URL (a possible sketch of such a script follows below):
$ python send_request.py url
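The repository's send_request.py is not reproduced here; a possible sketch, assuming it simply downloads the image and forwards the bytes to the densenet161 endpoint, could look like this:
import sys
import requests

url = sys.argv[1]                                   # image URL passed on the command line
image_bytes = requests.get(url).content             # download the image
# Forward the raw bytes to the inference API, just like the curl -T call above.
resp = requests.post("http://127.0.0.1:8080/predictions/densenet161", data=image_bytes)
print(resp.json())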
1.7.1 Register a model
$ python ts_scripts/torchserve_grpc_client.py register squeezenet1_1
With the example client code we can only use certain pretrained models.
Create a .mar file for the second model, store it in the model_store folder, and set several workers for the second model.
Example: if we have a squeezenet1_1.mar in model_store, use the following command:
$ curl -v -X POST "http://localhost:8081/models?initial_workers=1&url=squeezenet1_1.mar"
Or you can download a pretrained model from TorchServe:
$ curl -v -X POST "http://localhost:8081/models?initial_workers=1&url=https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar"
If you want to register a new version of the model (for example, version 1.1):
$ curl -v -X POST "http://localhost:8081/models?initial_workers=1&url=squeezenet1_1.mar/1.1"
A Python equivalent of these register calls is sketched after the parameter list below.
url: Load a model archive. Supports the following locations:
a local model archive (.mar); the file must be in the model_store folder (and not in a subfolder).
a URI using the HTTP(s) protocol. TorchServe can download .mar files from the Internet.
model_name: Name of a mar model file.
handler: Make sure that the given handler is in the PYTHONPATH. Format: module_name: method_name
runtime: The runtime for the model custom service code. Default PYTHON
batch_size: The inference batch size. Default 1
max_batch_delay: This is the maximum batch delay time TorchServe waits to receive batch_size number of requests.
Default 100 ms
initial_workers: The number of initial workers to create.
TorchServe will not run inference when initial_workers=0. Default 0
synchronous: Whether or not the creation of worker is synchronous. Default false
response_timeout: Timeout, in seconds; the maximum time the model's backend workers have to process a request.
TorchServe returns Error 500 if there is no response. Default 120 s
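For reference, a Python sketch equivalent to the curl register calls above, using the requests library against the default management port 8081 (synchronous=true just makes the call wait until the worker is ready):
import requests

resp = requests.post(
    "http://localhost:8081/models",
    params={
        "url": "squeezenet1_1.mar",   # .mar file already present in model_store
        "initial_workers": 1,
        "synchronous": "true",        # wait until the worker has been created
    },
)
print(resp.status_code, resp.json())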
1.7.2 Cancel a model registration.
$ python ts_scripts/torchserve_grpc_client.py unregister squeezenet1_1
$ curl -X DELETE http://localhost:8081/models/squeezenet1_1/1.0
Use torchserve --stop if the server is running, then run this command:
$ torchserve --start --ncs --model-store model_store --models model1 model2
--start Start the model-server
--stop Stop the model-server
--ts-config TS_CONFIG
Configuration file for model server
--no-config-snapshots, --ncs
Prevents the server from storing config snapshot files.
--models Load models. There are some options:
standalone: default: N/A, No models are loaded at start up.
all: Load all models present in model_store.
model1.mar, model2.mar: Load models in the specified MAR files from model_store.
model1=model1.mar, model2=model2.mar: Load models with the specified names and MAR files from model_store.
TorchServe currently provides the following types of logs
- Access logs
- TorchServe logs
- When you load TorchServe with a model and run inference against the server, the following logs are collected into access_log.log:
2018-10-15 13:56:18,976 [INFO ] BackendWorker-9000 ACCESS_LOG - /127.0.0.1:64003 "POST /predictions/resnet-18 HTTP/1.1" 200 118
- The above log tells us that a successful POST call to /predictions/resnet-18 was made by remote host 127.0.0.1:64003, and it took 118 ms to complete the request.
- These logs collect all the logs from TorchServe and from the backend workers (the custom model code).
The following logs are collected into ts_log.log (we load 2 workers for model my_model_name):
2021-07-02 13:04:51,816 [DEBUG] W-9000-my_model_name_0.1 org.pytorch.serve.wlm.WorkerThread - W-9000-my_model_name_0.1 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2021-07-02 13:04:51,829 [INFO ] W-9001-my_model_name_0.1 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1864
2021-07-02 13:04:51,829 [DEBUG] W-9001-my_model_name_0.1 org.pytorch.serve.wlm.WorkerThread - W-9001-my_model_name_0.1 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2021-07-02 13:05:58,061 [INFO ] W-9000-my_model_name_0.1 org.pytorch.serve.wlm.WorkerThread - Backend response time: 8467
2021-07-02 13:05:58,181 [DEBUG] W-9000-my_model_name_0.1 org.pytorch.serve.job.Job - Waiting time ns: 122486975, Backend time ns: 8589969943
- To debug your Python code or a failed model load, we can check the logs in model_log.log. Example:
2021-06-29 22:55:24,443 [INFO ] W-9003-my_model_name_1_0.1-stdout MODEL_LOG - File "/home/congdao/Desktop/TorchServe-REST/yolov5_torchserve_v2/yolov5/torchserve_handler.py", line 4, in <module>
2021-06-29 22:55:24,444 [INFO ] W-9003-my_model_name_1_0.1-stdout MODEL_LOG - from utils2.datasets import letterbox
2021-06-29 22:55:24,447 [INFO ] W-9003-my_model_name_1_0.1-stdout MODEL_LOG - ModuleNotFoundError: No module named 'utils2'
- If you want to fix bugs in the model, you only need to look at the model_log.log file. If you want to check connections from requests to the server, look at access_log.log.
- To debug a Python script, we can use logging.info, and these logs will be shown in the model_log.log file (see the sketch below).
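A hedged sketch of using logging.info inside handler code (the handler class and message are illustrative, not from the repo); anything logged this way ends up in model_log.log:
import logging
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)

class MyHandler(BaseHandler):
    def preprocess(self, data):
        # This message is captured by the worker's stdout/stderr and written to model_log.log.
        logger.info("preprocess: received %d request(s)", len(data))
        return super().preprocess(data)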
Types of metrics
- System metrics - log_directory/ts_metrics.log
- Custom metrics - log_directory/model_metrics.log
TorchServe emits metrics to log files by default. To enable JSON formatting for metrics, change the following line in log4j.properties:
log4j.appender.ts_metrics.layout = org.pytorch.serve.util.logging.JSONLayout
For custom metrics we can create dimension object(s), add generic metrics, add time-based metrics, add size-based metrics, add percentage-based metrics, and add counter-based metrics.
See here for more information about custom metrics.
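As a rough sketch of the counter- and time-based operations listed above (the metric names are illustrative; the metrics object comes from the request context that TorchServe passes to the handler):
import time

def handle(data, context):
    metrics = context.metrics                      # metrics object provided by TorchServe
    start = time.time()
    # ... preprocessing + inference would go here ...
    metrics.add_counter("InferenceRequestCount", len(data))                       # counter-based metric
    metrics.add_time("HandlerRoundTrip", round((time.time() - start) * 1000, 2))  # time-based metric, ms
    return ["ok"] * len(data)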
A PyTorch model has two modes: Eager Mode and Script Mode.
In short, Eager Mode is typically used for training the model, and Script Mode is used for production. Script Mode helps with model portability and gives better performance. See the image below.
We can check which mode a model is in by printing model.graph. Only a scripted model has a graph; an eager model will raise an error (see the example below).
See here for more information about Script Mode.
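For example (using a torchvision model purely as an illustration):
import torch
import torchvision

eager_model = torchvision.models.resnet18()
scripted_model = torch.jit.script(eager_model)

print(scripted_model.graph)    # a scripted model exposes its TorchScript IR graph
# print(eager_model.graph)     # AttributeError: a plain nn.Module has no 'graph' attribute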
- In particular, when we save a model with torch.save, we save the model in eager mode.
- As we did before, to archive a model in eager mode we need to create a Python file containing the model architecture and pass it through the --model-file argument.
- To archive a custom model, we need to save the model in script mode using one of two methods: torch.jit.trace or torch.jit.script.
The way to convert from eager mode to script mode with torch.jit.trace:
- Take an existing eager-mode model and provide example inputs.
- The tracer runs the function, recording the tensor operations performed, and turns the recording into a ScriptModule.
- You can reuse eager model code.
- Control flow and data structures are ignored (that means it doesn't work well with if statements or for loops). A minimal example follows below.
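A minimal tracing sketch (resnet18 and the file name are just examples):
import torch
import torchvision

model = torchvision.models.resnet18().eval()
example_input = torch.randn(1, 3, 224, 224)

# The tracer runs the model once on the example input and records the tensor operations.
traced = torch.jit.trace(model, example_input)
traced.save("resnet18_traced.pt")   # this serialized file can be passed to torch-model-archiver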
The way to convert with torch.jit.script is to pass an instance of your model to torch.jit.script() (a minimal example follows after this list):
- Control flow is preserved.
- print statements can be used for debugging.
- Remove the script() call to debug as a standard PyTorch module.
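A minimal scripting sketch (MyModule is a toy model used only to show that control flow survives):
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def forward(self, x):
        # Data-dependent control flow like this is preserved by torch.jit.script.
        if x.sum() > 0:
            return x * 2
        return x - 1

scripted = torch.jit.script(MyModule())
print(scripted.graph)                    # inspect the compiled TorchScript IR
scripted.save("my_module_scripted.pt")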
Try to use torch.jit.trace and torch.jit.script here.
I am going to test with the pretrained yolov5s model.
Don't forget to install TorchServe first.
- Clone this repo to get the YOLOv5 weights, index_to_name.json, and torchserve_handler.py.
$ git clone https://github.com/congdaoduy298/TorchServe.git
$ cd TorchServe/yolov5_torchserve
$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt
$ cd ..
$ python3 yolov5/export.py --weights yolov5s.pt --img 640 --batch 1
We can see that a yolov5s.torchscript.pt file has been created in the yolov5_torchserve folder.
Create model_store to store the archived model.
$ mkdir model_store
$ torch-model-archiver --model-name yolov5s --version 0.1 --serialized-file yolov5s.torchscript.pt --export-path model_store --handler torchserve_handler.py --extra-files index_to_name.json -f
- The handler file is extremely important and is specified by --handler.
- The handler file controls preprocessing the data, passes it through the model, and gets the predictions.
- You can create your own handler. See How to write a handler file?.
Now yolov5s.mar will be exported into the model_store folder.
Before we start TorchServe, add the yolov5 path to PYTHONPATH. This lets us import all the code in yolov5 (in this case, I use the letterbox function from utils.datasets).
$ export PYTHONPATH=$PYTHONPATH:/[YOUR PATH TO YOLOV5]
Example:
$ export PYTHONPATH=$PYTHONPATH:/home/user/Desktop/TorchServe/yolov5_torchserve/yolov5
Start TorchServe and get predictions just like in the steps before, for example:
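A hedged request sketch against the newly registered yolov5s model (reusing the kitten_small.jpg image downloaded earlier; the default inference port 8080 is assumed):
import requests

with open("kitten_small.jpg", "rb") as f:
    resp = requests.post("http://127.0.0.1:8080/predictions/yolov5s", data=f)
print(resp.json())   # detection output produced by torchserve_handler.py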
Configure TorchServe gRPC listening ports
The inference gRPC API is listening on port 7070, and the management gRPC API is listening on port 7071 by default.
To configure different ports, use the following properties:
grpc_inference_port: Inference gRPC API binding port. Default: 7070
grpc_management_port: management gRPC API binding port. Default: 7071
Here are a couple of examples:
grpc_inference_port=8888
grpc_management_port=9999
Save these to config.properties in the directory where you run torchserve.
This happens when the server is busy and the workers already have tasks to process.
Solution: increase the job queue size.
In config.properties, add job_queue_size=num_jobs.
Example: job_queue_size=100 (allow at most 100 jobs in the queue when the server is busy).
=> Maximum number of tasks at a given time = default_workers_per_model + job_queue_size
This means the response timed out. The way to solve this problem:
Solution: set a bigger value for the response timeout.
Example: default_response_timeout=180 (default is 120 s)
TorchServe, a tool for deploying PyTorch models.
TorchScript and PyTorch JIT | Deep Dive.


