PyTorch Official Faster R-CNN Source Code Walkthrough (1): Feature Extraction


The example code from the official PyTorch documentation is as follows:

import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
# For training
images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
boxes[:, :, 2:4] = boxes[:, :, 0:2] + boxes[:, :, 2:4]
labels = torch.randint(1, 91, (4, 11))
images = list(image for image in images)
targets = []
for i in range(len(images)):
    d = {'boxes': boxes[i], 'labels': labels[i]}
    targets.append(d)
output = model(images, targets)
# For inference
model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x)

# optionally, if you want to export the model to ONNX:
torch.onnx.export(model, x, "faster_rcnn.onnx", opset_version = 11)
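
One detail in the training part of the example is easy to miss. The line `boxes[:, :, 2:4] = boxes[:, :, 0:2] + boxes[:, :, 2:4]` adds the first two random numbers of each box to the last two, so every box satisfies x2 >= x1 and y2 >= y1, which is what the corner format `[x1, y1, x2, y2]` requires. The same arithmetic in plain Python, as a minimal sketch (the helper name `xywh_to_xyxy` is hypothetical and not part of torchvision):

```python
import random


def xywh_to_xyxy(box):
    """Convert [x, y, w, h] to corner format [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


random.seed(0)
# Four random numbers in [0, 1), like one row of torch.rand(4, 11, 4).
raw = [random.random() for _ in range(4)]
box = xywh_to_xyxy(raw)

# w and h are non-negative, so the second corner never precedes the first.
assert box[2] >= box[0] and box[3] >= box[1]
```

Without this step, a random tensor would often contain boxes whose second corner lies above or to the left of the first.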

The rest of this post walks through the example code in detail.


First, initialize the Faster R-CNN model.

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

As the function name shows, this is a Faster R-CNN with a ResNet-50-FPN backbone. Next, step into the function with the debugger.

def fasterrcnn_resnet50_fpn(pretrained=False, progress=True,
                            num_classes=91, pretrained_backbone=True, trainable_backbone_layers=3, **kwargs):
    """
    Constructs a Faster R-CNN model with a ResNet-50-FPN backbone.

    The input to the model is expected to be a list of tensors, each of shape ``[C, H, W]``, one for each
    image, and should be in ``0-1`` range. Different images can have different sizes.

    The behavior of the model changes depending if it is in training or evaluation mode.

    During training, the model expects both the input tensors, as well as a targets (list of dictionary),
    containing:
        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with values of ``x``
          between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H``
        - labels (``Int64Tensor[N]``): the class label for each ground-truth box

    The model returns a ``Dict[Tensor]`` during training, containing the classification and regression
    losses for both the RPN and the R-CNN.

    During inference, the model requires only the input tensors, and returns the post-processed
    predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as
    follows:
        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with values of ``x``
          between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H``
        - labels (``Int64Tensor[N]``): the predicted labels for each image
        - scores (``Tensor[N]``): the scores or each prediction

    Faster R-CNN is exportable to ONNX for a fixed batch size with inputs images of fixed size.

    Arguments:
        pretrained (bool): If True, returns a model pre-trained on COCO train2017
        progress (bool): If True, displays a progress bar of the download to stderr
        pretrained_backbone (bool): If True, returns a model with backbone pre-trained on Imagenet
        num_classes (int): number of output classes of the model (including the background)
        trainable_backbone_layers (int): number of trainable (not frozen) resnet layers starting from final block.
            Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.
	"""
    # Check that trainable_backbone_layers is within the valid range.
    assert trainable_backbone_layers <= 5 and trainable_backbone_layers >= 0
    # dont freeze any layers if pretrained model or backbone is not used
    if not (pretrained or pretrained_backbone):
        trainable_backbone_layers = 5
    if pretrained:
        # no need to download the backbone if pretrained is set
        pretrained_backbone = False
    # Build the ResNet-FPN backbone.
    backbone = resnet_fpn_backbone('resnet50', pretrained_backbone, trainable_layers=trainable_backbone_layers)
    # Build the Faster R-CNN model on top of the backbone.
    model = FasterRCNN(backbone, num_classes, **kwargs)
    if pretrained:
        # Download the pretrained weights (state dict) for the full model.
        state_dict = load_state_dict_from_url(model_urls['fasterrcnn_resnet50_fpn_coco'],
                                              progress=progress)
        # Load the weights into the model.
        model.load_state_dict(state_dict)
    return model  # Return the model.
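
The interaction between `pretrained`, `pretrained_backbone`, and `trainable_backbone_layers` is easier to see in isolation. The following is a minimal sketch that copies only the three checks at the top of the function body; the helper name `resolve_backbone_options` is hypothetical and does not exist in torchvision.

```python
def resolve_backbone_options(pretrained, pretrained_backbone, trainable_backbone_layers):
    """Return the (pretrained_backbone, trainable_layers) pair passed on to the backbone builder."""
    assert 0 <= trainable_backbone_layers <= 5
    # Nothing is pretrained, so there are no weights worth freezing.
    if not (pretrained or pretrained_backbone):
        trainable_backbone_layers = 5
    # The full-model checkpoint already contains the backbone weights.
    if pretrained:
        pretrained_backbone = False
    return pretrained_backbone, trainable_backbone_layers


print(resolve_backbone_options(True, True, 3))    # (False, 3)
print(resolve_backbone_options(False, True, 3))   # (True, 3)
print(resolve_backbone_options(False, False, 3))  # (False, 5)
```

For the call in the example, `pretrained=True` with the defaults, the backbone is built without downloading ImageNet weights and with 3 trainable layers; the COCO checkpoint is loaded afterwards.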

Step into the code that builds the ResNet-FPN backbone.

def resnet_fpn_backbone(
    backbone_name,
    pretrained,
    norm_layer=misc_nn_ops.FrozenBatchNorm2d,
    trainable_layers=3,
    returned_layers=None,
    extra_blocks=None
):
    """
    Constructs a specified ResNet backbone with FPN on top. Freezes the specified number of layers in the backbone.
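
The docstring says the function freezes a specified number of layers, and the earlier docstring describes `trainable_backbone_layers` as counting layers "starting from final block". One way such a selection could work is sketched below in plain Python. This is an illustration under stated assumptions, not the library code: the stage names `conv1` and `layer1` to `layer4` are assumed to follow ResNet naming, and `split_parameters` is a hypothetical helper.

```python
# Stage names assumed for this sketch, ordered from the final block backwards.
STAGES_LAST_TO_FIRST = ["layer4", "layer3", "layer2", "layer1", "conv1"]


def split_parameters(param_names, trainable_layers):
    """Split parameter names into (trainable, frozen) lists."""
    assert 0 <= trainable_layers <= 5
    layers_to_train = STAGES_LAST_TO_FIRST[:trainable_layers]
    trainable, frozen = [], []
    for name in param_names:
        # A parameter stays trainable only if it belongs to a selected stage.
        if any(name.startswith(layer) for layer in layers_to_train):
            trainable.append(name)
        else:
            frozen.append(name)
    return trainable, frozen


names = ["conv1.weight", "layer1.0.conv1.weight", "layer2.0.conv1.weight",
         "layer3.0.conv1.weight", "layer4.0.conv1.weight"]
trainable, frozen = split_parameters(names, trainable_layers=3)
print(trainable)  # ['layer2.0.conv1.weight', 'layer3.0.conv1.weight', 'layer4.0.conv1.weight']
print(frozen)     # ['conv1.weight', 'layer1.0.conv1.weight']
```

In PyTorch, freezing a parameter in the `frozen` group would typically mean calling `requires_grad_(False)` on it, so that it receives no gradient updates during training.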

Original article by ItWorker. If you repost it, please credit the source: https://blog.ytso.com/292644.html
