tensorrt处理动态shape - 文举的博客

tensorrt处理动态shape
2022-03-10

使用tensorrt时如何处理动态shape

1、onnx的动态输出

若要使用tensorrt的动态shape支持，那么在导出onnx时就应该指定动态shape的dimension。详见下面的代码中的dynamic_axes。

# Input to the model
x = torch.randn(batch_size, 1, 224, 224, requires_grad=True)
torch_out = torch_model(x)

# Export the model
torch.onnx.export(torch_model,               # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  "super_resolution.onnx",   # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=10,          # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['input'],   # the model's input names
                  output_names = ['output'], # the model's output names
                  dynamic_axes={'input' : {0 : 'batch_size'},    # variable length axes
                                'output' : {0 : 'batch_size'}})

如果不在输出onnx时指定动态shape的维度，那么在使用tensorrt的动态shape支持时，可能会报错，注意，这里是可能会报错。

因为如果某个layer中使用的hard code的shape参数，导出成onnx时没有声明动态shape，则tensorrt在将onnx 解析成engine时，会出现错误。详见这个回答：https://forums.developer.nvidia.com/t/working-with-dynamic-shape-example/112738/3。

2、tensorrt对动态shape的支持演进

tensorrt发展很快。早期的版本中，tensorrt有两种运行模式，explicit batch 和implicit batch。 implicit batch模式下，默认每个tensor都有第0维都是batch，其他的维度是完全一样并且是固定不变的。

explicit batch 模式下，tensor所有的维度都需要明确指定并且是动态的。这是tensorrt为了支持动态shape开发的模式。

当前，implicit batch已经处于过期状态。官方只推荐explicit batch模式。不过，由于历史遗留原因，这个名字中虽然有batch，但实际上所有的维度都是可变的，需要留心。

3、支持动态shape引入的概念

onnx导出时虽然指定了动态的dimension。但是，使用tensorrt转化成engine时，最好还是给一些提示，指定这些维度的取值范围，这可以帮助tensorrt生成性能更好的engine。这就是optimization profile的作用。如下：

An optimization profile describes a range of dimensions for each network input and the dimensions that the auto-tuner will use for optimization.

举例来讲，一个optimization profile可能是如下所示：

最小的shape： [3,100,200]

最大的shape： [3,200,300]

优化用shape： [3,150,250]

这对应到trtexec中的–minShapes 、 –maxShapes 和 –optShapes三个参数。

深入一点，一个engine可以支持多个optimization profile。在实际使用时需要指定使用哪个profile。profile是按照0，1，2…的顺序进行编码。

4、使用示例

首先，导出成onnx时使用dynamic。

def export_onnx(model,image_shape,onnx_path, batch_size=1):
    x,y=image_shape
    img = torch.zeros((batch_size, 3, x, y))
    dynamic_onnx=True
    if dynamic_onnx:
        dynamic_ax = {'input_1' : {2 : 'image_height',3:'image_wdith'},   
                                'output_1' : {2 : 'image_height',3:'image_wdith'}}
        torch.onnx.export(model, (img), onnx_path, 
           input_names=["input_1"], output_names=["output_1"], verbose=False, opset_version=11,dynamic_axes=dynamic_ax)
    else:
        torch.onnx.export(model, (img), onnx_path, 
           input_names=["input_1"], output_names=["output_1"], verbose=False, opset_version=11
    )

第二，构建engine时添加profile

def build_engine(onnx_path, using_half,engine_file,dynamic_input=True):
    trt.init_libnvinfer_plugins(None, '')
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_batch_size = 1
        config = builder.create_builder_config()
        config.max_workspace_size = GiB(1)
        if using_half:
            config.set_flag(trt.BuilderFlag.FP16)
        # Load the Onnx model and parse it in order to populate the TensorRT network.
        with open(onnx_path, 'rb') as model:
            if not parser.parse(model.read()):
                print ('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print (parser.get_error(error))
                return None
        ##增加部分
        if dynamic_input:
            profile = builder.create_optimization_profile();
            profile.set_shape("input_1", (1,3,512,512), (1,3,1600,1600), (1,3,1024,1024)) 
            config.add_optimization_profile(profile)
        #加上一个sigmoid层
        previous_output = network.get_output(0)
        network.unmark_output(previous_output)
        sigmoid_layer=network.add_activation(previous_output,trt.ActivationType.SIGMOID)
        network.mark_output(sigmoid_layer.get_output(0))
        return builder.build_engine(network, config) 

第三，使用时指定实际的shape和使用的profile。

def profile_trt(engine, imagepath,batch_size):
    assert(engine is not None)  
    
    input_image,input_shape=preprocess_image(imagepath)

    segment_inputs, segment_outputs, segment_bindings = allocate_buffers(engine, True,input_shape)
    
    stream = cuda.Stream()    
    with engine.create_execution_context() as context:
        context.active_optimization_profile = 0#增加部分
        origin_inputshape=context.get_binding_shape(0)
        #增加部分
        if (origin_inputshape[-1]==-1):
            origin_inputshape[-2],origin_inputshape[-1]=(input_shape)
            context.set_binding_shape(0,(origin_inputshape))
        input_img_array = np.array([input_image] * batch_size)
        img = torch.from_numpy(input_img_array).float().numpy()
        segment_inputs[0].host = img
        [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in segment_inputs]#Copy from the Python buffer src to the device pointer dest (an int or a DeviceAllocation) asynchronously,
        stream.synchronize()#Wait for all activity on this stream to cease, then return.
       
        context.execute_async(bindings=segment_bindings, stream_handle=stream.handle)#Asynchronously execute inference on a batch. 
        stream.synchronize(）
        [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in segment_outputs]#Copy from the device pointer src (an int or a DeviceAllocation) to the Python buffer dest asynchronously
        stream.synchronize()
        results = np.array(segment_outputs[0].host).reshape(batch_size, input_shape[0],input_shape[1])    
    return results.transpose(1,2,0)

另外一个简单使用tensorrt 动态shape支持的方法是使用torch2trt_dynamic。当使用情景比较简单时，可以尝试。

Refs:

https://zhuanlan.zhihu.com/p/299845547

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#opt_profiles

原创文章，转载请注明出处，否则拒绝转载！
本文链接：抬头看浏览器地址栏

上篇： tensorrt基本使用

下篇： CUDA中矩阵相乘的stride技巧

tensorrt处理动态shape
2022-03-10

1、onnx的动态输出

2、tensorrt对动态shape的支持演进

3、支持动态shape引入的概念

4、使用示例

栏目分类

最新文章

tensorrt处理动态shape 2022-03-10

1、onnx的动态输出

2、tensorrt对动态shape的支持演进

3、支持动态shape引入的概念

4、使用示例

栏目分类

最新文章

tensorrt处理动态shape
2022-03-10