Deploying a C++ Model with TorchScript + libtorch

The code for this article can be found at the Baidu Cloud link (password: t6pu).

With TorchScript and libtorch, a Python-based PyTorch model can be quickly converted into a serialized representation that can be loaded and executed from C++, with no dependency on Python.

Deploying a model in C++ takes two steps:

  1. Convert the PyTorch model to a TorchScript model and serialize it to a file on disk.
  2. Use libtorch to deserialize the TorchScript model file and run inference with the API libtorch provides.

Converting an existing model to TorchScript

TorchScript can be viewed as an intermediate representation of a PyTorch model; a model in TorchScript form can be read directly in C++. Since version 1.0, PyTorch has supported building serialized models with TorchScript. TorchScript offers two approaches: Tracing and Annotation.

Converting to a TorchScript model via Tracing

An example:

```python
import torch
import cv2
from src.loftr import LoFTR, default_cfg

# 1. Load the model
matcher = LoFTR(config=default_cfg)
model = torch.load("/home/jiajie/3d_reco/LoFTR_2021/LoFTR-master/weights/indoor_ds.ckpt")['state_dict']
matcher.load_state_dict(model, strict=True)
matcher = matcher.eval().cuda()

# 2. Load the data
img0_pth = "/home/jiajie/3d_reco/LoFTR_pretrained_cpp/models_script/test_imgs/indoor0.JPG"
img1_pth = "/home/jiajie/3d_reco/LoFTR_pretrained_cpp/models_script/test_imgs/indoor1.JPG"
img0_raw = cv2.imread(img0_pth, cv2.IMREAD_GRAYSCALE)
img1_raw = cv2.imread(img1_pth, cv2.IMREAD_GRAYSCALE)
# indoor
img0_raw = cv2.resize(img0_raw, (640, 480))
img1_raw = cv2.resize(img1_raw, (640, 480))

img0 = torch.from_numpy(img0_raw)[None][None].cuda() / 255.
img1 = torch.from_numpy(img1_raw)[None][None].cuda() / 255.
batch = {'image0': img0, 'image1': img1}

# 3. Inference with LoFTR: trace the module instead of calling matcher(batch)
with torch.no_grad():
    # matcher(batch)
    traced_script_module = torch.jit.trace(matcher, batch, strict=False)

# 4. Save the serialized module
traced_script_module.save('/home/jiajie/3d_reco/LoFTR_pretrained_cpp/models_script/weights/indoor_ds.pt')
```

The only difference between the flow above and Python-based inference is that

```python
matcher(batch)
```

becomes

```python
traced_script_module = torch.jit.trace(matcher, batch, strict=False)
```

Here `matcher` is the defined PyTorch model, `batch` is the example input, and `strict=False` allows the model's output to be a dictionary.
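To see what `strict=False` enables, here is a minimal self-contained sketch (the toy module and names are mine, not from the original code) of tracing a model whose forward returns a dict:

```python
import torch

class DictOut(torch.nn.Module):
    def forward(self, x: torch.Tensor):
        # dicts are mutable containers, so tracing them requires strict=False
        return {"double": x * 2, "sum": x.sum()}

m = DictOut().eval()
traced = torch.jit.trace(m, torch.rand(2), strict=False)  # errors without strict=False
print(traced(torch.rand(2)))
```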

The use of torch.jit.trace depends on the example input: it records only the branches of the network that this particular input exercises. So when the network takes different branches for different inputs, torch.jit.trace should not be used to serialize the model.
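A minimal sketch of this pitfall (the toy function is mine): the trace bakes in the branch taken by the example input, so the traced function silently returns the wrong result for inputs that should take the other branch:

```python
import torch

def f(x):
    if x.sum() > 0:
        return x * 2
    return torch.zeros_like(x)

# Traced with a positive input, so only the first branch is recorded
# (PyTorch warns with a TracerWarning about data-dependent control flow).
traced = torch.jit.trace(f, torch.ones(3))
print(traced(torch.ones(3)))   # tensor([2., 2., 2.]) -- correct
print(traced(-torch.ones(3)))  # tensor([-2., -2., -2.]) -- f would return zeros
```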

Because torch.jit.trace depends so heavily on the example input, in actual deployment the input dimensions must match the dimensions used when converting the model. For example, the input image size above is $640 \times 480$; if the C++ code changes the input image size:

```cpp
cv::Mat img0 = cv::imread(image_name_vec[0], cv::IMREAD_GRAYSCALE);
cv::Mat img1 = cv::imread(image_name_vec[1], cv::IMREAD_GRAYSCALE);
cv::resize(img0, img0, cv::Size(800, 800));
cv::resize(img1, img1, cv::Size(800, 800));
```

the following error occurs:

```
RuntimeError: shape '[1, 60, 80, 60, 80]' is invalid for input of size 100000000
```

Having to fix the input size is inconvenient for real deployment, for example when images arrive in different sizes, or when you want to enlarge the input to make full use of the GPU's compute. In such cases the TorchScript model file can be generated in another way.

Converting to a TorchScript model via Annotation

Annotation preserves all branches of the network completely, and supports dynamically sized inputs (subject, of course, to the actual network architecture).

With Annotation, the model is redefined in the Torch Script language, converted into a ScriptModule, and saved.
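A quick sketch of the dynamic-shape benefit (the toy module is mine, not from the post): a scripted module accepts inputs of different spatial sizes without re-conversion:

```python
import torch
import torch.nn.functional as F

class Pool(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.adaptive_avg_pool2d(x, (1, 1))

sm = torch.jit.script(Pool())
print(sm(torch.rand(1, 1, 480, 640)).shape)  # torch.Size([1, 1, 1, 1])
print(sm(torch.rand(1, 1, 800, 800)).shape)  # same module, different input size
```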

The official example:

```python
import torch

class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

my_module = MyModule(10, 20)
sm = torch.jit.script(my_module)
```

The example above can actually be regarded as written in Torch Script rather than Python. Since Torch Script is a subset of the Python language, the difference is not visible in this example, but some Python data types are not supported in Torch Script, and some types are used differently in the two languages. See the jit language reference for details.
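For instance (a minimal sketch of my own, not from the post), non-Tensor arguments and return values must carry explicit annotations, since Torch Script assumes `torch.Tensor` by default:

```python
import torch
from typing import Dict, List

class Stats(torch.nn.Module):
    def forward(self, xs: List[torch.Tensor]) -> Dict[str, torch.Tensor]:
        # without the annotations above, Torch Script would assume torch.Tensor
        total = torch.zeros(1)
        for x in xs:
            total = total + x.sum()
        return {"total": total}

sm = torch.jit.script(Stats())
print(sm([torch.ones(2), torch.ones(3)]))  # {'total': tensor([5.])}
```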

Converting a model via Annotation requires attention to a few points:

  1. Make sure the entire PyTorch model pipeline consists purely of Python code and PyTorch operators; avoid patterns such as processing data with numpy and then converting it back to a tensor.
  2. In Torch Script the default data type is `torch.Tensor`; data of any other type must be annotated explicitly. For example, in a module class:
```python
class LocalFeatureTransformer(nn.Module):
    """A Local Feature Transformer (LoFTR) module."""

    def __init__(self, config):
        super(LocalFeatureTransformer, self).__init__()

        self.config = config
        self.d_model = config['d_model']
        self.nhead = config['nhead']
        self.layer_names = config['layer_names']
        encoder_layer = LoFTREncoderLayer(config['d_model'], config['nhead'], config['attention'])
        self.layers = nn.ModuleList([copy.deepcopy(encoder_layer) for _ in range(len(self.layer_names))])
        self._reset_parameters()

    def _reset_parameters(self):
        for p in self.parameters():
            if p.dim() > 1:
                nn.init.xavier_uniform_(p)

    def forward(self, feat0, feat1, mask0=None, mask1=None):
        """
        Args:
            feat0 (torch.Tensor): [N, L, C]
            feat1 (torch.Tensor): [N, S, C]
            mask0 (torch.Tensor): [N, L] (optional)
            mask1 (torch.Tensor): [N, S] (optional)
        """

        # assert self.d_model == feat0.size(2), "the feature number of src and transformer must be equal"
        for layer, name in zip(self.layers, self.layer_names):
            if name == 'self':
                feat0 = layer(feat0, feat0, mask0, mask0)
                feat1 = layer(feat1, feat1, mask1, mask1)
            elif name == 'cross':
                feat0 = layer(feat0, feat1, mask0, mask1)
                feat1 = layer(feat1, feat0, mask1, mask0)
            else:
                raise KeyError

        return feat0, feat1
```

The Torch Script annotated version looks like this:

```python
from typing import Optional, Tuple

class LocalFeatureTransformer(nn.Module):
    """A Local Feature Transformer (LoFTR) module."""

    def __init__(self, config):
        super(LocalFeatureTransformer, self).__init__()

        self.config = config
        self.d_model = config['d_model']
        self.nhead = config['nhead']
        self.layer_names = config['layer_names']
        encoder_layer = LoFTREncoderLayer(config['d_model'], config['nhead'], config['attention'])
        self.layers = nn.ModuleList([copy.deepcopy(encoder_layer) for _ in range(len(self.layer_names))])
        self._reset_parameters()

    def _reset_parameters(self):
        for p in self.parameters():
            if p.dim() > 1:
                nn.init.xavier_uniform_(p)

    def forward(self, feat0, feat1, mask0: Optional[torch.Tensor], mask1: Optional[torch.Tensor]) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Args:
            feat0 (torch.Tensor): [N, L, C]
            feat1 (torch.Tensor): [N, S, C]
            mask0 (torch.Tensor): [N, L] (optional)
            mask1 (torch.Tensor): [N, S] (optional)
        """

        assert self.d_model == feat0.size(2), "the feature number of src and transformer must be equal"

        for idx, v in enumerate(self.layers):
            name = self.layer_names[idx]
            layer = v
            if name == 'self':
                feat0 = layer(feat0, feat0, mask0, mask0)
                feat1 = layer(feat1, feat1, mask1, mask1)
            elif name == 'cross':
                feat0 = layer(feat0, feat1, mask0, mask1)
                feat1 = layer(feat1, feat0, mask1, mask0)
            else:
                raise KeyError

        return feat0, feat1
```

Note the data types in the `def forward()` signature: everything that is not a plain `torch.Tensor` is explicitly annotated, including the return type.

  3. Torch Script does not support variadic positional arguments (`*args`) or keyword arguments (`**kwargs`). The workaround depends on the specific case. For example, the `rearrange` function
```python
def rearrange(tensor, pattern, **axes_lengths):
    ...
```

takes the keyword arguments `axes_lengths`, which support user-defined dimension expansion; this makes the function unsupported by Torch Script. The fix: replace

```python
feat_c0 = rearrange(self.pos_encoding(feat_c0), 'n c h w -> n (h w) c')
```

with the equivalent (a shorter `permute`-based alternative is sketched after this list):

```python
feat_c0 = self.pos_encoding(feat_c0)
feat_c0 = torch.transpose(feat_c0, 1, 3)
feat_c0 = torch.transpose(feat_c0, 1, 2)
feat_c0 = torch.reshape(feat_c0, (feat_c0.shape[0], feat_c0.shape[1] * feat_c0.shape[2], feat_c0.shape[3]))
```
  4. Torch Script's support for iteration is also weak:
```python
for layer, name in zip(self.layers, self.layer_names):
    if name == 'self':
        feat0 = layer(feat0, feat0, mask0, mask0)
        feat1 = layer(feat1, feat1, mask1, mask1)
    elif name == 'cross':
        feat0 = layer(feat0, feat1, mask0, mask1)
        feat1 = layer(feat1, feat0, mask1, mask0)
    else:
        raise KeyError
```

becomes:

```python
for idx, v in enumerate(self.layers):
    name = self.layer_names[idx]
    layer = v
    if name == 'self':
        feat0 = layer(feat0, feat0, mask0, mask0)
        feat1 = layer(feat1, feat1, mask1, mask1)
    elif name == 'cross':
        feat0 = layer(feat0, feat1, mask0, mask1)
        feat1 = layer(feat1, feat0, mask1, mask0)
    else:
        raise KeyError
```
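As mentioned in point 3, the `rearrange` call can also be replaced by a single `permute` + `reshape`, which Torch Script handles as well. A sketch with a dummy tensor standing in for `feat_c0`:

```python
import torch

feat_c0 = torch.rand(2, 256, 60, 80)  # n c h w
# 'n c h w -> n (h w) c' without einops; permute and reshape are scriptable
feat_c0 = feat_c0.permute(0, 2, 3, 1).reshape(feat_c0.size(0), -1, feat_c0.size(1))
print(feat_c0.shape)  # torch.Size([2, 4800, 256])
```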

For other errors and types that Torch Script does not support, see another developer's blog post, which covers them in great detail.

Using Script inside Tracing

Since Tracing cannot handle networks with data-dependent branches, the branching parts can be decorated explicitly with `@torch.jit.script` so that all branches are recorded. The official example:

```python
import torch

@torch.jit.script
def foo(x, y):
    if x.max() > y.max():
        r = x
    else:
        r = y
    return r


def bar(x, y, z):
    return foo(x, y) + z

traced_bar = torch.jit.trace(bar, (torch.rand(3), torch.rand(3), torch.rand(3)))
```

Using Tracing inside Script

Conversely, tracing can be used inside a script module to generate submodules. For layers that rely on Python features a script module does not support, the relevant layers can be wrapped and recorded with tracing, leaving the other layers untouched. Example usage:

```python
import torch
import torchvision

class MyScriptModule(torch.nn.Module):
    def __init__(self):
        super(MyScriptModule, self).__init__()
        self.means = torch.nn.Parameter(torch.tensor([103.939, 116.779, 123.68])
                                        .resize_(1, 3, 1, 1))
        self.resnet = torch.jit.trace(torchvision.models.resnet18(),
                                      torch.rand(1, 3, 224, 224))

    def forward(self, input):
        return self.resnet(input - self.means)

my_script_module = torch.jit.script(MyScriptModule())
```

After defining the TorchScript model, save it:

```python
torch.jit.save(sm, torch_script_path)
```
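Before moving on to C++, a quick round-trip sanity check in Python is cheap (a minimal sketch with a toy module of my own): reload the file with `torch.jit.load` and compare against the eager model:

```python
import torch

class Scale(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2.0

m = Scale().eval()
torch.jit.save(torch.jit.script(m), "scale.pt")

# reload and check that the scripted module matches the eager one
loaded = torch.jit.load("scale.pt")
x = torch.rand(4)
print(torch.allclose(loaded(x), m(x)))  # True
```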

Loading the serialized model with libtorch

CMakeLists:

```cmake
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(loftr)

if (NOT CMAKE_BUILD_TYPE)
    message(STATUS "No build type selected, default to Release")
    set(CMAKE_BUILD_TYPE "Release")
endif()

set(CMAKE_PREFIX_PATH ${CMAKE_CURRENT_SOURCE_DIR}/dependences/libtorch-shared-with-deps-1.8.1+cu102/libtorch)

find_package(Torch REQUIRED)
find_package(OpenCV REQUIRED)

add_executable(loftr loftr.cpp)
target_link_libraries(loftr ${OpenCV_LIBS} ${TORCH_LIBRARIES})
include_directories(${OpenCV_INCLUDE_DIRS})

file(COPY
  ${CMAKE_CURRENT_SOURCE_DIR}/models_script/weights/indoor_ds.pt
  DESTINATION ${CMAKE_BINARY_DIR}
)
```

An example:

```cpp
#include <torch/script.h>
#include <opencv2/opencv.hpp>

void infer(std::vector<cv::String> image_name_vec, std::string executable_dir)
{
    std::cout << "start loftr match..." << std::endl;
    torch::manual_seed(1);
    torch::autograd::GradMode::set_enabled(false);

    torch::Device device(torch::kCPU);

    if (torch::cuda::is_available()) {
        std::cout << "CUDA is available! Running on GPU." << std::endl;
        device = torch::Device(torch::kCUDA);
    }
    // load the model
    auto module_path = executable_dir + "/" + "indoor_ds.pt";
    torch::jit::script::Module loftr = torch::jit::load(module_path);
    loftr.eval();
    loftr.to(device);
    // load the data: read images
    cv::Mat img0 = cv::imread(image_name_vec[0], cv::IMREAD_GRAYSCALE);
    cv::Mat img1 = cv::imread(image_name_vec[1], cv::IMREAD_GRAYSCALE);

    // convert to tensor (mat2tensor is a helper defined elsewhere in the project)
    torch::Tensor image0 = mat2tensor(img0).to(device);
    torch::Tensor image1 = mat2tensor(img1).to(device);

    // feed the loftr network
    torch::Dict<std::string, torch::Tensor> output;
    torch::Dict<std::string, torch::Tensor> input;

    input.insert("image0", image0);
    input.insert("image1", image1);
    // run inference (toTensorDict, also a project helper, converts the
    // returned IValue into a Dict of tensors)
    output = toTensorDict(loftr.forward({input}));

    at::Tensor mkpts0_f = output.at("mkpts0_f"); // N*2
    at::Tensor mkpts1_f = output.at("mkpts1_f"); // N*2
    at::Tensor mconf = output.at("mconf");       // N*1 match confidence
}
```

The full implementation is available at the Baidu Cloud link at the top of this article.
