(Did not get it running) Using VisualBERT on a multimodal Twitter dataset

2020-04-20

Uncategorized


I never got the image feature extraction code to run, so I could not reproduce the results.

Paper: VisualBERT: A Simple and Performant Baseline for Vision and Language

Paper URL: https://arxiv.org/abs/1908.03557

Repository: https://github.com/uclanlp/visualbert

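Image feature extraction

First attempt (failed)

Installing Detectron

Detectron does provide a Docker image, but I could not get it installed, so I used a Caffe2 Docker image and installed Detectron on top of it. The details are in my previous post.

Downloading the pretrained weights

The "Extract image features on your own" section of the repository README says the weights used are 35861858. My guess is that this refers to:

Model: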

wget https://dl.fbaipublicfiles.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl

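Model info: 3ae556bf3de044a56eb3ecb66fea5cda model_final.pkl 491 MB

Corresponding config file:

configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml

I first download the model into the root of the code directory, so that it gets mounted together with the code when the Docker container starts. The yaml file belongs to the Detectron repository, which is already cloned inside the Docker image, so it is already available there.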

Feature extraction script

At the moment there is only a script for NLVR2: visualbert/utils/get_image_features/extract_features_nlvr.py

It should work with little or no modification.

Usage:

#SET = train/dev/test1
cd visualbert/utils/get_image_features
CUDA_VISIBLE_DEVICES=0 python extract_features_nlvr.py --cfg XXX.yaml --wts XXX.pkl --min_bboxes 150 --max_bboxes 150 --feat_name gpu_0/fc6 --output_dir X_NLVR --image-ext png X_NLVR_IMAGE/SET --no_id --one_giant_file X_NLVR/features_SET_150.th

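A quick look at the script's structure: main is the entry point, and it obtains the list of images through recurse_find_image. The images are then processed one by one. The one_giant_file argument appears to write all results into a single large file at the end; I did not think I needed it, so it can be left out.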
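To make the first step concrete, here is a rough sketch of what a recursive image lookup of this kind can look like. This is not the repository's recurse_find_image; the name find_images and its parameters are my own, hypothetical choices, and the real function may behave differently.

```python
import os


def find_images(root, image_ext="jpg"):
    """Recursively collect paths under `root` that end with the given extension.

    Hypothetical stand-in for the repository's recurse_find_image.
    """
    suffix = "." + image_ext.lower().lstrip(".")
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare case-insensitively so that .JPG and .jpg both match
            if name.lower().endswith(suffix):
                found.append(os.path.join(dirpath, name))
    # Sort so that the processing order is reproducible between runs
    return sorted(found)
```

Calling find_images("/root/images", "jpg") would then give the list that the per-image loop iterates over.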

Running

Start the Docker image that was just built and set up, mounting the data directory, the code and model directory, and the output directory.

mkdir /tmp/sarcasm_image2
docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it my_detectron:v1.0 /bin/bash

Once inside the container, change to the directory with the feature extraction code

cd /root/code/utils/get_image_features/

Install the dependencies

pip install torch tqdm pycocotools

Run

python extract_features_nlvr.py --cfg /root/detectron/configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --wts /root/code/model_final.pkl --min_bboxes 150 --max_bboxes 150 --feat_name gpu_0/fc6 --output_dir /root/features/ --image-ext jpg /root/images/

There were too many errors to get it running, so I rewrote the script

import os

import cv2
import numpy as np
from tqdm import tqdm
from caffe2.python import workspace

from detectron.core.config import cfg, merge_cfg_from_file
from detectron.utils.boxes import nms  # used for per-class NMS in get_detections_from_im
import detectron.core.test as infer_engine
import detectron.core.test_engine as model_engine
import detectron.utils.c2 as c2_utils


image_path = '/root/images/'
out_path = '/root/features/'
weight_path ='/root/code/model_final.pkl'
config_path = '/root/detectron/configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml'


workspace.GlobalInit(['caffe2', '--caffe2_log_level=0'])




def get_detections_from_im(cfg, model, im, image_id, feat_blob_name,
                            MIN_BOXES, MAX_BOXES, conf_thresh=0.2, bboxes=None):
    with c2_utils.NamedCudaScope(0):
        scores, cls_boxes, im_scale = infer_engine.im_detect_bbox(model, 
                                                                im,
                                                                cfg.TEST.SCALE,
                                                                cfg.TEST.MAX_SIZE,
                                                                boxes=bboxes)
        box_features = workspace.FetchBlob(feat_blob_name)
        cls_prob = workspace.FetchBlob("gpu_0/cls_prob")
        rois = workspace.FetchBlob("gpu_0/rois")
        max_conf = np.zeros((rois.shape[0]))
        # unscale back to raw image space
        cls_boxes = rois[:, 1:5] / im_scale

        for cls_ind in range(1, cls_prob.shape[1]):
            cls_scores = scores[:, cls_ind]
            dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])).astype(np.float32)
            keep = np.array(nms(dets, cfg.TEST.NMS))
            max_conf[keep] = np.where(cls_scores[keep] > max_conf[keep], cls_scores[keep], max_conf[keep])

        keep_boxes = np.where(max_conf >= conf_thresh)[0]
        if len(keep_boxes) < MIN_BOXES:
            keep_boxes = np.argsort(max_conf)[::-1][:MIN_BOXES]
        elif len(keep_boxes) > MAX_BOXES:
            keep_boxes = np.argsort(max_conf)[::-1][:MAX_BOXES]
        objects = np.argmax(cls_prob[keep_boxes], axis=1)

    return box_features[keep_boxes], max_conf[keep_boxes], cls_boxes[keep_boxes]

im_list = os.listdir(image_path)
print(len(im_list))
merge_cfg_from_file(config_path)
cfg.NUM_GPUS = 1
model = model_engine.initialize_model_from_cfg(weight_path)
for i, im_name in enumerate(tqdm(im_list)):
    im_base_name = os.path.basename(im_name)
    image_id = im_base_name.split(".")[0]
    bbox = None
    im = cv2.imread(os.path.join(image_path, im_name))
    outfile = os.path.join(out_path, image_id+".npz")
    box_features, max_conf, cls_boxes = get_detections_from_im(
        cfg, model, im, image_id, 'gpu_0/fc6', 150, 150
    )
    np.savez(outfile, box_features=box_features, max_conf=max_conf, cls_boxes=cls_boxes)
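The box selection inside get_detections_from_im can be looked at in isolation from Caffe2, which makes its behaviour easier to check. Below is a minimal pure-Python sketch of that step only. select_boxes is a hypothetical name, and ties between equal confidences may be ordered differently from np.argsort.

```python
def select_boxes(max_conf, conf_thresh=0.2, min_boxes=150, max_boxes=150):
    """Return the indices of the boxes to keep.

    Mirrors the keep_boxes logic of the script: keep boxes above the
    threshold, but fall back to the top-scoring boxes when that yields
    fewer than min_boxes or more than max_boxes.
    """
    # Box indices sorted by descending confidence
    order = sorted(range(len(max_conf)), key=lambda i: max_conf[i], reverse=True)
    keep = [i for i, conf in enumerate(max_conf) if conf >= conf_thresh]
    if len(keep) < min_boxes:
        keep = order[:min_boxes]
    elif len(keep) > max_boxes:
        keep = order[:max_boxes]
    return keep
```

With min_boxes and max_boxes both set to 150, as in the command line used here, every image with at least 150 ROIs ends up with exactly 150 boxes, regardless of the threshold.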

The rewritten script still does not run; model = model_engine.initialize_model_from_cfg(weight_path) raises:

24635
WARNING:root:[====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
Traceback (most recent call last):
  File "get_f.py", line 67, in <module>
    model = model_engine.initialize_model_from_cfg(weight_path)
  File "/root/detectron/detectron/core/test_engine.py", line 327, in initialize_model_from_cfg
    model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)
  File "/root/detectron/detectron/modeling/model_builder.py", line 124, in create
    return get_func(model_type_func)(model)
  File "/root/detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
    freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY
  File "/root/detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
    optim.build_data_parallel_model(model, _single_gpu_build_func)
  File "/root/detectron/detectron/modeling/optimizer.py", line 54, in build_data_parallel_model
    single_gpu_build_func(model)
  File "/root/detectron/detectron/modeling/model_builder.py", line 169, in _single_gpu_build_func
    blob_conv, dim_conv, spatial_scale_conv = add_conv_body_func(model)
  File "/root/detectron/detectron/modeling/FPN.py", line 63, in add_fpn_ResNet101_conv5_body
    model, ResNet.add_ResNet101_conv5_body, fpn_level_info_ResNet101_conv5
  File "/root/detectron/detectron/modeling/FPN.py", line 104, in add_fpn_onto_conv_body
    conv_body_func(model)
  File "/root/detectron/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
    return add_ResNet_convX_body(model, (3, 4, 23, 3))
  File "/root/detectron/detectron/modeling/ResNet.py", line 99, in add_ResNet_convX_body
    p, dim_in = globals()[cfg.RESNETS.STEM_FUNC](model, 'data')
  File "/root/detectron/detectron/modeling/ResNet.py", line 253, in basic_bn_stem
    p = model.AffineChannel(p, 'res_conv1_bn', dim=dim, inplace=True)
  File "/root/detectron/detectron/modeling/detector.py", line 103, in AffineChannel
    return self.net.AffineChannel([blob_in, scale, bias], blob_in)
  File "/usr/local/caffe2/python/core.py", line 1958, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method AffineChannel is not a registered operator. Did you mean: []
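Since the failure is about an operator that the Caffe2 build does not know, it can save time to check for the operator directly instead of building the whole model first. The sketch below assumes that workspace.RegisteredOperators() is available in the installed Caffe2 and returns the operator names; missing_operators is a hypothetical helper of my own.

```python
def missing_operators(required, registered):
    """Return the names in `required` that do not appear in `registered`."""
    registered = set(registered)
    return [name for name in required if name not in registered]


if __name__ == "__main__":
    try:
        from caffe2.python import workspace
    except ImportError:
        print("caffe2 is not importable in this environment")
    else:
        # Assumption: RegisteredOperators() lists the operator names of this build
        ops = workspace.RegisteredOperators()
        print("missing:", missing_operators(["AffineChannel"], ops))
```

An empty list would mean the operator exists and the problem lies elsewhere; a list containing AffineChannel would point at the Caffe2 build itself.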

Second attempt (failed)

Following the hint in https://github.com/facebookresearch/Detectron/issues/756, I managed to build the image from Detectron's own Dockerfile

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it detectron:c2-cuda9-cudnn7 /bin/bash

pip install torch tqdm pycocotools
cd /root/
# git clone https://github.com/facebookresearch/Detectron.git
cd /root/code/utils/get_image_features/

python get_f.py --cfg /Detectron/configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --wts /root/code/model_final.pkl --min_bboxes 150 --max_bboxes 150 --feat_name gpu_0/fc6 --output_dir /root/features/ --image-ext jpg /root/images/

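Still no luck; the error is the same as the last one in the first attempt.

Third attempt (failed)

I built the Docker image from an earlier version of Detectron. It still did not work, and new problems appeared.

Fourth attempt

This time the plan was to use a caffe2 image and install Detectron manually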

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it caffe2/caffe2:snapshot-py2-cuda9.0-cudnn7-ubuntu16.04 /bin/bash

Running the tests showed that even Caffe2's own tests do not pass in this image (test command: python -m caffe2.python.operator_test.relu_op_test)
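Because every image has to go through the same check, it can be wrapped in a small helper that also guards against a test that never returns. This is only a sketch for Python 3 (subprocess.run does not exist in Python 2); run_smoke_test and the default timeout are hypothetical choices of mine.

```python
import subprocess
import sys


def run_smoke_test(cmd, timeout_s=600):
    """Run `cmd` and report 'passed', 'failed', or 'timeout'."""
    try:
        # subprocess.run kills the child process when the timeout expires
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "passed" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    status = run_smoke_test(
        [sys.executable, "-m", "caffe2.python.operator_test.relu_op_test"]
    )
    print(status)
```

Anything other than "passed" suggests the image is not worth installing Detectron into.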

Fifth attempt

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it zhang19941219/caffe2_py2_cuda90_cudnn7 /bin/bash

This image does not pass the caffe2 test either

Sixth attempt

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it caffe2ai/caffe2:c2v0.8.1.cuda8.cudnn7.ubuntu16.04 /bin/bash

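This time the test ran without reporting an error, but it still did not finish, apparently because there was not enough memory.

I then followed the installation guide

Install cocoapi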

git clone https://github.com/cocodataset/cocoapi.git

Dependency: pip install cython

Then I hit a baffling matplotlib installation failure. Installing matplotlib manually with pip did not help either.

The actual error comes from easy_install

  code = compile(script, filename, 'exec')
File "/tmp/easy_install-21D8yi/matplotlib-3.2.1/setup.py", line 139
  raise IOError(f"Failed to download jquery-ui.  Please download "

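Fix: the Makefile turns out to be very simple, it just runs setup.py, so I commented out matplotlib in the dependencies and the installation went through. The line the traceback points at is an f-string, which Python versions before 3.6 cannot parse, so matplotlib 3.2.1 most likely does not support the Python in this image.

Install detectron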

git clone https://github.com/facebookresearch/detectron
cd detectron/
pip install -r requirements.txt

Test whether the installation worked

python detectron/tests/test_spatial_narrow_as_op.py

It fails with:

Traceback (most recent call last):
  File "detectron/tests/test_spatial_narrow_as_op.py", line 88, in <module>
    c2_utils.import_detectron_ops()
  File "/root/detectron/detectron/utils/c2.py", line 43, in import_detectron_ops
    detectron_ops_lib = envu.get_detectron_ops_lib()
  File "/root/detectron/detectron/utils/env.py", line 75, in get_detectron_ops_lib
    raise Exception('Detectron ops lib not found')
Exception: Detectron ops lib not found

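A quick search suggested this is related to PyTorch, so I installed it with pip install torch, and that did resolve the error (I installed 1.4.0).

The test then seemed to hang, though: no error, but it never finished, while using 26.5 GB of RAM (I have 32 GB in total) and less than 300 MB of GPU memory.

Install dependencies: pip install tqdm pycocotools

Run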

cd /root/code/utils/get_image_features

python get_f.py --cfg /root/detectron/configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --wts /root/code/model_final.pkl --min_bboxes 150 --max_bboxes 150 --feat_name gpu_0/fc6 --output_dir /root/features/ --image-ext jpg /root/images/

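Still the same error: AttributeError: Method AffineChannel is not a registered operator. Did you mean: []

From some searching, the cause may be that Detectron's make ops step was never run, but that step complains that it cannot find the Caffe2 CMake configuration: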

root@d785e85b8cfa:~/detectron# make ops
mkdir -p build && cd build && cmake .. && make -j24
CMake Error at CMakeLists.txt:8 (find_package):
  By not providing "FindCaffe2.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Caffe2", but
  CMake did not find one.

  Could not find a package configuration file provided by "Caffe2" with any
  of the following names:

    Caffe2Config.cmake
    caffe2-config.cmake

  Add the installation prefix of "Caffe2" to CMAKE_PREFIX_PATH or set
  "Caffe2_DIR" to a directory containing one of the above files.  If "Caffe2"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!
See also "/root/detectron/build/CMakeFiles/CMakeOutput.log".
Makefile:13: recipe for target 'ops' failed
make: *** [ops] Error 1

The official Dockerfile sets this to

But when I looked, I could not find either of the two files

root@d785e85b8cfa:~/detectron# ls /usr/local/caffe2/
CMakeFiles   __init__.pyc  contrib  cuda_rtc  distributed  image  mpi        perfkernels  python  sgd         utils
__init__.py  binaries      core     db        experiments  mkl    operators  proto        queue   transforms  video

This may be because the image I am using differs from the official one.
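Rather than inspecting directories one at a time, the two file names from the CMake message can be searched for across a few likely prefixes. A minimal sketch follows; find_caffe2_config_dirs is a hypothetical helper, and the search roots in the usage example are guesses for this image.

```python
import os

# File names taken from the CMake error message
CONFIG_NAMES = ("Caffe2Config.cmake", "caffe2-config.cmake")


def find_caffe2_config_dirs(roots):
    """Return the directories under `roots` that contain a Caffe2 CMake config."""
    hits = []
    for root in roots:
        # os.walk yields nothing for a root that does not exist
        for dirpath, _dirnames, filenames in os.walk(root):
            if any(name in filenames for name in CONFIG_NAMES):
                hits.append(dirpath)
    return hits


if __name__ == "__main__":
    for path in find_caffe2_config_dirs(["/usr/local", "/caffe2"]):
        print(path)
```

If a directory is found, it is the value the CMake message asks for in Caffe2_DIR. An empty result suggests that the Caffe2 in this image was never installed in a form that CMake can discover.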

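Looking at the Detectron repository again, it says that when this fails, the likely cause is that make install was never run for Caffe2 (https://github.com/facebookresearch/Detectron/blob/master/INSTALL.md#cmake-cannot-find-caffe2)

I went to /caffe2 in this image and tried make. First, there is no install target, only all. Second, the build fails with: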

[  9%] Building NVCC (Device) object third_party/gloo/gloo/CMakeFiles/gloo_cuda.dir/nccl/gloo_cuda_generated_nccl.cu.o
nvcc fatal   : Unsupported gpu architecture 'compute_75'
CMake Error at gloo_cuda_generated_nccl.cu.o.cmake:203 (message):
  Error generating
  /caffe2/build/third_party/gloo/gloo/CMakeFiles/gloo_cuda.dir/nccl/./gloo_cuda_generated_nccl.cu.o

Reference: https://blog.csdn.net/xunan003/article/details/90696412

The article says that because the 2080 has compute capability 7.5, CUDA 10 is required, which is why I cannot compile it here
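The relation between GPU generation and toolkit version can be written down as a small lookup, which makes it easy to rule out an image before pulling it. The table below is deliberately partial and covers only a few compute capabilities; nvcc_supports is a hypothetical helper name.

```python
# Minimum CUDA toolkit version whose nvcc accepts each compute capability.
# Partial table: only a few architectures are listed.
MIN_CUDA_FOR_CC = {
    (6, 0): (8, 0),   # Pascal
    (6, 1): (8, 0),   # Pascal
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing, e.g. the RTX 2080
}


def nvcc_supports(compute_capability, cuda_version):
    """Return True if a toolkit of `cuda_version` can target `compute_capability`.

    Both arguments are (major, minor) tuples.
    """
    if compute_capability not in MIN_CUDA_FOR_CC:
        raise KeyError("compute capability not in table: %r" % (compute_capability,))
    return cuda_version >= MIN_CUDA_FOR_CC[compute_capability]
```

For example, nvcc_supports((7, 5), (9, 0)) is False, which matches the compute_75 error from nvcc above, while nvcc_supports((7, 5), (10, 1)) is True.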

Seventh attempt

Use a CUDA 10.1 + cuDNN 7 image

docker run --gpus all --rm -it nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 /bin/bash

Install caffe2

apt-get update
apt-get install -y --no-install-recommends \
      build-essential \
      git \
      libgoogle-glog-dev \
      libgtest-dev \
      libiomp-dev \
      libleveldb-dev \
      liblmdb-dev \
      libopencv-dev \
      libopenmpi-dev \
      libsnappy-dev \
      libprotobuf-dev \
      openmpi-bin \
      openmpi-doc \
      protobuf-compiler \
      python-dev \
      python-pip
pip install setuptools==40.7
pip install \
      future \
      numpy==1.16\
      protobuf \
      typing \
      hypothesis==4
apt-get install -y --no-install-recommends \
      libgflags-dev \
      cmake

git clone https://github.com/facebookarchive/caffe2.git
cd caffe2
git checkout v0.8.1
git submodule update --init --recursive

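This step fails: the repository https://github.com/RLovelett/eigen/ no longer exists. I tried https://github.com/libigl/eigen as a replacement, but I am not sure whether it is an acceptable substitute.

https://github.com/NervanaSystems/nervanagpu is gone as well; I used https://github.com/VisionSystemsInc/nervanagpu instead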

cd third_party/
git clone https://github.com/libigl/eigen
git clone https://github.com/VisionSystemsInc/nervanagpu

Install cocoapi

git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
pip install cython
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple matplotlib
#pip install tqdm pycocotools
make install

Install detectron

git clone https://github.com/facebookresearch/detectron
cd detectron

Building the ops fails

CMake Error at CMakeLists.txt:8 (find_package):
  By not providing "FindCaffe2.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Caffe2", but
  CMake did not find one.

  Could not find a package configuration file provided by "Caffe2" with any
  of the following names:

    Caffe2Config.cmake
    caffe2-config.cmake

  Add the installation prefix of "Caffe2" to CMAKE_PREFIX_PATH or set
  "Caffe2_DIR" to a directory containing one of the above files.  If "Caffe2"
  provides a separate development package or SDK, be sure it has been
  installed.

Eighth attempt

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it housebw/detectron /bin/bash

After entering the container, install the dependency

pip install tqdm

Still the very first error:

root@c9cef23fd48e:~/code/utils/get_image_features# python get_f.py
24635
WARNING:root:[====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
Traceback (most recent call last):
  File "get_f.py", line 67, in <module>
    model = model_engine.initialize_model_from_cfg(weight_path)
  File "/detectron/detectron/core/test_engine.py", line 328, in initialize_model_from_cfg
    model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)
  File "/detectron/detectron/modeling/model_builder.py", line 124, in create
    return get_func(model_type_func)(model)
  File "/detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
    freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY
  File "/detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
    optim.build_data_parallel_model(model, _single_gpu_build_func)
  File "/detectron/detectron/modeling/optimizer.py", line 54, in build_data_parallel_model
    single_gpu_build_func(model)
  File "/detectron/detectron/modeling/model_builder.py", line 169, in _single_gpu_build_func
    blob_conv, dim_conv, spatial_scale_conv = add_conv_body_func(model)
  File "/detectron/detectron/modeling/FPN.py", line 63, in add_fpn_ResNet101_conv5_body
    model, ResNet.add_ResNet101_conv5_body, fpn_level_info_ResNet101_conv5
  File "/detectron/detectron/modeling/FPN.py", line 104, in add_fpn_onto_conv_body
    conv_body_func(model)
  File "/detectron/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
    return add_ResNet_convX_body(model, (3, 4, 23, 3))
  File "/detectron/detectron/modeling/ResNet.py", line 99, in add_ResNet_convX_body
    p, dim_in = globals()[cfg.RESNETS.STEM_FUNC](model, 'data')
  File "/detectron/detectron/modeling/ResNet.py", line 253, in basic_bn_stem
    p = model.AffineChannel(p, 'res_conv1_bn', dim=dim, inplace=True)
  File "/detectron/detectron/modeling/detector.py", line 103, in AffineChannel
    return self.net.AffineChannel([blob_in, scale, bias], blob_in)
  File "/usr/local/caffe2_build/caffe2/python/core.py", line 2040, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method AffineChannel is not a registered operator. Did you mean: []

On top of that, the caffe2 test does not pass in this image either. The error:

gc=device_type: 1, dc=[, device_type: 1], engine=u'CUDNN') produces unreliable results: Falsified on the first call but did not on a subsequent one

Ran 1 test in 11.002s

FAILED (errors=1)

Ninth attempt

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it kracwarlock/detectron:1.1-cu100 /bin/bash

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it yidliu/detectron:maskrcnn /bin/bash

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it wxmao/detectron:v01 /bin/bash

None of the images above pass the caffe2 test:

python -m caffe2.python.operator_test.relu_op_test

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it caffe2/caffe2:snapshot-py2-cuda9.0-cudnn7-ubuntu16.04 /bin/bash
