The code accompanying this article can be found at the Baidu Cloud link (password: t6pu).
With TorchScript and libtorch, a Python-based PyTorch model can be converted into a serialized representation that C++ can load and execute, with no dependency on Python.
Deploying a model in C++ takes two steps:
- Convert the PyTorch model to a TorchScript model and serialize it to a file on disk.
- Use libtorch to deserialize the TorchScript model file and run inference through the libtorch API.
TorchScript can be viewed as an intermediate representation of a PyTorch model; a model in TorchScript form can be read directly from C++. TorchScript is available for building serialized models in PyTorch 1.0 and later, and it offers two approaches: Tracing and Annotation.
An example:
```python
import torch
import cv2
from src.loftr import LoFTR, default_cfg

# 1. Load the model
matcher = LoFTR(config=default_cfg)
model = torch.load("/home/jiajie/3d_reco/LoFTR_2021/LoFTR-master/weights/indoor_ds.ckpt")['state_dict']
matcher.load_state_dict(model, strict=True)
matcher = matcher.eval().cuda()

# 2. Load the data
img0_pth = "/home/jiajie/3d_reco/LoFTR_pretrained_cpp/models_script/test_imgs/indoor0.JPG"
img1_pth = "/home/jiajie/3d_reco/LoFTR_pretrained_cpp/models_script/test_imgs/indoor1.JPG"
img0_raw = cv2.imread(img0_pth, cv2.IMREAD_GRAYSCALE)
img1_raw = cv2.imread(img1_pth, cv2.IMREAD_GRAYSCALE)
# indoor
img0_raw = cv2.resize(img0_raw, (640, 480))
img1_raw = cv2.resize(img1_raw, (640, 480))
img0 = torch.from_numpy(img0_raw)[None][None].cuda() / 255.
img1 = torch.from_numpy(img1_raw)[None][None].cuda() / 255.
batch = {'image0': img0, 'image1': img1}

# 3. Inference: trace the model with LoFTR on the example input
with torch.no_grad():
    # matcher(batch)
    traced_script_module = torch.jit.trace(matcher, batch, strict=False)

# 4. Save the serialized model
traced_script_module.save('/home/jiajie/3d_reco/LoFTR_pretrained_cpp/models_script/weights/indoor_ds.pt')
```
The only place this flow differs from ordinary Python-based inference is that the call `matcher(batch)` is replaced with
```python
traced_script_module = torch.jit.trace(matcher, batch, strict=False)
```
Here `matcher` is the PyTorch model defined above, `batch` is the input data, and `strict=False` allows the model's output to be a dictionary.
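As a minimal sketch of what `strict=False` does (the `DictModule` below is a hypothetical illustration, not part of LoFTR): tracing a module whose forward returns a dict fails in strict mode but succeeds with `strict=False`.

```python
import torch

class DictModule(torch.nn.Module):
    def forward(self, x):
        # dict output: strict tracing rejects mutable containers like this
        return {'double': x * 2, 'square': x * x}

m = DictModule().eval()
example = torch.rand(4)
traced = torch.jit.trace(m, example, strict=False)  # strict=True would raise here
print(traced(torch.ones(4)))
```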
`torch.jit.trace` is driven by the example input: it records only the network branches that this particular input exercises. When the network takes different branches depending on its input, `torch.jit.trace` should therefore not be used to serialize the model.
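A small sketch of the pitfall (the function `f` is an illustration, not from LoFTR): the branch chosen by the example input is baked into the trace, and the other branch is silently lost.

```python
import torch

def f(x):
    if x.sum() > 0:
        return x * 2
    return x - 1

traced_f = torch.jit.trace(f, torch.ones(3))  # positive example: records only the x * 2 branch
print(traced_f(-torch.ones(3)))               # tensor([-2., -2., -2.]): the x - 1 branch is gone
```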
Because `torch.jit.trace` is so tightly bound to its example input, the input dimensions used at deployment must match those used when converting the model. For example, the input image size above is $640 \times 480$; if the C++ code changes the input size:
```cpp
cv::Mat img0 = cv::imread(image_name_vec[0], cv::IMREAD_GRAYSCALE);
cv::Mat img1 = cv::imread(image_name_vec[1], cv::IMREAD_GRAYSCALE);
cv::resize(img0, img0, cv::Size(800, 800));
cv::resize(img1, img1, cv::Size(800, 800));
```
the following error occurs:
```
RuntimeError: shape '[1, 60, 80, 60, 80]' is invalid for input of size 100000000
```
The error comes from shapes baked into the trace: at LoFTR's 1/8 coarse resolution, a $640 \times 480$ input gives an $80 \times 60$ grid (hence the recorded shape `[1, 60, 80, 60, 80]`), while an $800 \times 800$ input gives a $100 \times 100$ grid whose confidence volume has $100^4 = 100000000$ elements, which no longer fits. Having to fix the input size is a real obstacle in deployment, for example when images arrive at different sizes, or when a larger input is needed to make full use of the GPU. In such cases the TorchScript model file can be generated in a different way.
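As a sketch of the difference (the `Flatten` module here is an illustration, not part of LoFTR): a scripted module keeps shape arithmetic symbolic, so inputs of different sizes both work.

```python
import torch

class Flatten(torch.nn.Module):
    def forward(self, x):
        # shapes are read at run time, not frozen into the graph
        return x.reshape([x.shape[0], x.shape[1], x.shape[2] * x.shape[3]])

scripted = torch.jit.script(Flatten())
print(scripted(torch.rand(1, 1, 480, 640)).shape)  # torch.Size([1, 1, 307200])
print(scripted(torch.rand(1, 1, 800, 800)).shape)  # torch.Size([1, 1, 640000])
```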
Annotation preserves all of the network's branches and supports dynamically sized inputs (subject, of course, to the network structure). With Annotation the model is rewritten in the Torch Script language, and the original module is converted to a ScriptModule and saved.
The official example:
```python
class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

my_module = MyModule(10, 20)
sm = torch.jit.script(my_module)
```
The example above can be thought of as written in Torch Script rather than Python; since Torch Script is a subset of Python, no difference is visible here. However, some Python data types are unsupported in Torch Script, and some types are used differently in the two languages; see the jit_language_reference link for details. In practice:
- Make sure the whole model pipeline consists entirely of Python code and PyTorch operators; avoid patterns such as processing data with `numpy` and then converting back to a `tensor`.
- In Torch Script the default data type is `torch.Tensor`; data of any other type must be annotated explicitly. For example, in one of LoFTR's module classes:
```python
class LocalFeatureTransformer(nn.Module):
    """A Local Feature Transformer (LoFTR) module."""

    def __init__(self, config):
        super(LocalFeatureTransformer, self).__init__()
        self.config = config
        self.d_model = config['d_model']
        self.nhead = config['nhead']
        self.layer_names = config['layer_names']
        encoder_layer = LoFTREncoderLayer(config['d_model'], config['nhead'], config['attention'])
        self.layers = nn.ModuleList([copy.deepcopy(encoder_layer) for _ in range(len(self.layer_names))])
        self._reset_parameters()

    def _reset_parameters(self):
        for p in self.parameters():
            if p.dim() > 1:
                nn.init.xavier_uniform_(p)

    def forward(self, feat0, feat1, mask0=None, mask1=None):
        """
        Args:
            feat0 (torch.Tensor): [N, L, C]
            feat1 (torch.Tensor): [N, S, C]
            mask0 (torch.Tensor): [N, L] (optional)
            mask1 (torch.Tensor): [N, S] (optional)
        """
        # assert self.d_model == feat0.size(2), "the feature number of src and transformer must be equal"
        for layer, name in zip(self.layers, self.layer_names):
            if name == 'self':
                feat0 = layer(feat0, feat0, mask0, mask0)
                feat1 = layer(feat1, feat1, mask1, mask1)
            elif name == 'cross':
                feat0 = layer(feat0, feat1, mask0, mask1)
                feat1 = layer(feat1, feat0, mask1, mask0)
            else:
                raise KeyError
        return feat0, feat1
```
Annotated for Torch Script, it becomes:
```python
from typing import Optional, Tuple

class LocalFeatureTransformer(nn.Module):
    """A Local Feature Transformer (LoFTR) module."""

    def __init__(self, config):
        super(LocalFeatureTransformer, self).__init__()
        self.config = config
        self.d_model = config['d_model']
        self.nhead = config['nhead']
        self.layer_names = config['layer_names']
        encoder_layer = LoFTREncoderLayer(config['d_model'], config['nhead'], config['attention'])
        self.layers = nn.ModuleList([copy.deepcopy(encoder_layer) for _ in range(len(self.layer_names))])
        self._reset_parameters()

    def _reset_parameters(self):
        for p in self.parameters():
            if p.dim() > 1:
                nn.init.xavier_uniform_(p)

    def forward(self, feat0, feat1, mask0: Optional[torch.Tensor], mask1: Optional[torch.Tensor]) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Args:
            feat0 (torch.Tensor): [N, L, C]
            feat1 (torch.Tensor): [N, S, C]
            mask0 (torch.Tensor): [N, L] (optional)
            mask1 (torch.Tensor): [N, S] (optional)
        """
        assert self.d_model == feat0.size(2), "the feature number of src and transformer must be equal"
        for idx, v in enumerate(self.layers):
            name = self.layer_names[idx]
            layer = v
            if name == 'self':
                feat0 = layer(feat0, feat0, mask0, mask0)
                feat1 = layer(feat1, feat1, mask1, mask1)
            elif name == 'cross':
                feat0 = layer(feat0, feat1, mask0, mask1)
                feat1 = layer(feat1, feat0, mask1, mask0)
            else:
                raise KeyError
        return feat0, feat1
```
Note the data types in `def forward()`: everything that is not a `torch.Tensor` is explicitly annotated, including the return type.
Torch Script does not support variadic positional (`*`) or keyword (`**`) arguments; the fix depends on the case at hand. For example, the `rearrange` function
```python
def rearrange(tensor, pattern, **axes_lengths):
    ...
```
takes the keyword arguments `axes_lengths` to support user-defined axis sizes, which makes the function unusable from Torch Script. The workaround: the original call
```python
feat_c0 = rearrange(self.pos_encoding(feat_c0), 'n c h w -> n (h w) c')
```
is equivalent to (and can be replaced by):
```python
feat_c0 = self.pos_encoding(feat_c0)
feat_c0 = torch.transpose(feat_c0, 1, 3)
feat_c0 = torch.transpose(feat_c0, 1, 2)
feat_c0 = torch.reshape(feat_c0, (feat_c0.shape[0], feat_c0.shape[1] * feat_c0.shape[2], feat_c0.shape[3]))
```
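A quick sanity check of the equivalence (a sketch assuming `einops` is installed; the shapes are arbitrary):

```python
import torch
from einops import rearrange

x = torch.randn(2, 8, 4, 5)       # n c h w
ref = rearrange(x, 'n c h w -> n (h w) c')
out = torch.transpose(x, 1, 3)    # n w h c
out = torch.transpose(out, 1, 2)  # n h w c
out = torch.reshape(out, (out.shape[0], out.shape[1] * out.shape[2], out.shape[3]))
assert torch.equal(ref, out)      # identical result, but scriptable
```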
Torch Script's support for iteration is also limited:
```python
for layer, name in zip(self.layers, self.layer_names):
    if name == 'self':
        feat0 = layer(feat0, feat0, mask0, mask0)
        feat1 = layer(feat1, feat1, mask1, mask1)
    elif name == 'cross':
        feat0 = layer(feat0, feat1, mask0, mask1)
        feat1 = layer(feat1, feat0, mask1, mask0)
    else:
        raise KeyError
```
has to be rewritten as:
```python
for idx, v in enumerate(self.layers):
    name = self.layer_names[idx]
    layer = v
    if name == 'self':
        feat0 = layer(feat0, feat0, mask0, mask0)
        feat1 = layer(feat1, feat1, mask1, mask1)
    elif name == 'cross':
        feat0 = layer(feat0, feat1, mask0, mask1)
        feat1 = layer(feat1, feat0, mask1, mask0)
    else:
        raise KeyError
```
For other errors and the types Torch Script does not support, see the blog post linked here, which covers them in great detail.
Since Tracing cannot cope with multi-branch networks on its own, the branching code can be decorated explicitly with `@torch.jit.script` so that all branches are recorded. The official example:
```python
import torch

@torch.jit.script
def foo(x, y):
    if x.max() > y.max():
        r = x
    else:
        r = y
    return r

def bar(x, y, z):
    return foo(x, y) + z

traced_bar = torch.jit.trace(bar, (torch.rand(3), torch.rand(3), torch.rand(3)))
```
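A quick check (an illustrative sketch, not from the official docs) that both branches of `foo` survive inside the trace:

```python
x_big = torch.full((3,), 10.0)
y_small = torch.zeros(3)
z = torch.zeros(3)
print(torch.equal(traced_bar(x_big, y_small, z), x_big))  # True: takes the x branch
print(torch.equal(traced_bar(y_small, x_big, z), x_big))  # True: takes the y branch
```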
The reverse also works: tracing can generate submodules inside a script module. Layers that rely on Python features a script module does not support can be wrapped and recorded with trace, while the remaining layers stay untouched. Example:
```python
import torch
import torchvision

class MyScriptModule(torch.nn.Module):
    def __init__(self):
        super(MyScriptModule, self).__init__()
        self.means = torch.nn.Parameter(torch.tensor([103.939, 116.779, 123.68])
                                        .resize_(1, 3, 1, 1))
        self.resnet = torch.jit.trace(torchvision.models.resnet18(),
                                      torch.rand(1, 3, 224, 224))

    def forward(self, input):
        return self.resnet(input - self.means)

my_script_module = torch.jit.script(MyScriptModule())
```
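A short usage sketch (illustrative): the scripted wrapper runs like any other module, with the traced resnet embedded as a submodule.

```python
dummy = torch.rand(1, 3, 224, 224)
out = my_script_module(dummy)  # mean subtraction, then the traced resnet18
print(out.shape)               # torch.Size([1, 1000])
```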
Once the TorchScript model is defined, save it:
```python
torch.jit.save(sm, torch_script_path)
```
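Before moving to C++ it is worth a quick smoke test in Python: reload the serialized file and run it once (a sketch; `example_input` stands for whatever the model expects):

```python
reloaded = torch.jit.load(torch_script_path)
reloaded.eval()
# out = reloaded(example_input)  # hypothetical input; output should match the eager model
```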
The CMakeLists.txt for the C++ side:
```cmake
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(loftr)

if (NOT CMAKE_BUILD_TYPE)
    message(STATUS "No build type selected, default to Release")
    set(CMAKE_BUILD_TYPE "Release")
endif()

set(CMAKE_PREFIX_PATH ${CMAKE_CURRENT_SOURCE_DIR}/dependences/libtorch-shared-with-deps-1.8.1+cu102/libtorch)

find_package(Torch REQUIRED)
find_package(OpenCV REQUIRED)

add_executable(loftr loftr.cpp)
target_link_libraries(loftr ${OpenCV_LIBS} ${TORCH_LIBRARIES})
include_directories(${OpenCV_INCLUDE_DIRS})

file(COPY
    ${CMAKE_CURRENT_SOURCE_DIR}/models_script/weights/indoor_ds.pt
    DESTINATION ${CMAKE_BINARY_DIR}
)
```
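Building follows the usual CMake flow (a note, assuming the directory layout above): `mkdir build && cd build && cmake .. && make`. The `file(COPY ...)` stanza copies the serialized model into the build directory next to the executable, which is why the `infer` function below can load `indoor_ds.pt` from `executable_dir`.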
An example of loading the model and running inference in C++:
```cpp
void infer(std::vector<cv::String> image_name_vec, std::string executable_dir)
{
    std::cout << "start loftr match..." << std::endl;
    torch::manual_seed(1);
    torch::autograd::GradMode::set_enabled(false);
    torch::Device device(torch::kCPU);
    if (torch::cuda::is_available()) {
        std::cout << "CUDA is available! Running on GPU." << std::endl;
        device = torch::Device(torch::kCUDA);
    }

    // load the model
    auto module_path = executable_dir + "/" + "indoor_ds.pt";
    torch::jit::script::Module loftr = torch::jit::load(module_path);
    loftr.eval();
    loftr.to(device);

    // load the data: read images
    cv::Mat img0 = cv::imread(image_name_vec[0], cv::IMREAD_GRAYSCALE);
    cv::Mat img1 = cv::imread(image_name_vec[1], cv::IMREAD_GRAYSCALE);
    // convert to tensors
    torch::Tensor image0 = mat2tensor(img0).to(device);
    torch::Tensor image1 = mat2tensor(img1).to(device);

    // feed the loftr network
    torch::Dict<std::string, Tensor> output;
    torch::Dict<std::string, Tensor> input;
    input.insert("image0", image0);
    input.insert("image1", image1);

    // run inference
    output = toTensorDict(loftr.forward({input}));
    at::Tensor mkpts0_f = output.at("mkpts0_f");  // N*2 matched keypoints in image 0
    at::Tensor mkpts1_f = output.at("mkpts1_f");  // N*2 matched keypoints in image 1
    at::Tensor mconf    = output.at("mconf");     // N*1 match confidence
}
```
Here `mat2tensor` and `toTensorDict` are small helpers defined in the full code (they convert a `cv::Mat` to an input tensor and unpack the returned `IValue` into a `torch::Dict`, respectively). The full implementation is available at the Baidu Cloud link at the top of this article.