
(Did not get it running) Using VisualBERT on a multimodal Twitter dataset

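Contents
  1. Image feature extraction
    1. First attempt (failed)
    2. Second attempt (failed)
    3. Third attempt (failed)
    4. Fourth attempt
    5. Fifth attempt
    6. Sixth attempt
    7. Seventh attempt
      1. Install caffe2
    8. Eighth attempt
    9. Ninth attempt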

I never got the image feature extraction code to run, so the reproduction did not succeed.

Paper: VisualBERT: A Simple and Performant Baseline for Vision and Language

Paper URL: https://arxiv.org/abs/1908.03557

Repository: https://github.com/uclanlp/visualbert

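Image feature extraction

First attempt (failed)

  1. Install Detectron

    Detectron ships a Docker image of its own, but I could not get it to build, so I started from a Caffe2 Docker image and installed Detectron on top of it. The details are in my previous post.

  2. Download the pretrained weights

    The "Extract image features on your own" section of the repository README says that the weights used are 35861858. My guess is that this refers to:

    Model:
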
    wget https://dl.fbaipublicfiles.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl

    Model info: 3ae556bf3de044a56eb3ecb66fea5cda model_final.pkl 491 MB
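    The 32-character hex string above looks like an MD5 digest of model_final.pkl. That reading is an assumption, since the post does not name the hash algorithm. Under that assumption, a minimal sketch for checking the download before mounting it into the container could look like this (md5_of_file and is_intact are hypothetical helper names):

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Value quoted in the post; assumed (not confirmed) to be an MD5 checksum.
EXPECTED = "3ae556bf3de044a56eb3ecb66fea5cda"


def is_intact(path, expected=EXPECTED):
    """True when the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

    Reading in chunks keeps memory use flat, which matters for a file of roughly 491 MB.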

    The corresponding config file:

    configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml

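    First download the model into the root of the code directory, so that it can be mounted together with the code when the Docker container is started. The yaml file belongs to the Detectron repository, which is already cloned inside the Docker image, so nothing needs to be done for it.

  3. Feature extraction script

    At the moment there is only a script for NLVR2: visualbert/utils/get_image_features/extract_features_nlvr.py

    It should be usable with little modification.

    Usage:
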
    #SET = train/dev/test1
    cd visualbert/utils/get_image_features
    CUDA_VISIBLE_DEVICES=0 python extract_features_nlvr.py --cfg XXX.yaml --wts XXX.pkl --min_bboxes 150 --max_bboxes 150 --feat_name gpu_0/fc6 --output_dir X_NLVR --image-ext png X_NLVR_IMAGE/SET --no_id --one_giant_file X_NLVR/features_SET_150.th

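A quick look at the structure of this script: main is the entry point, and recurse_find_image collects the list of images. The images are then processed one by one. The one_giant_file argument appears to write all results into a single large file at the end. I did not think I needed it, so it can be left out.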
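To make the first step concrete, here is a rough sketch of what collecting the image list recursively can look like. It is not the repository's recurse_find_image; find_images and image_id are hypothetical names, and the extension list is an assumption.

```python
import os


def find_images(root, extensions=(".jpg", ".png")):
    """Recursively collect image paths under root, sorted for a stable order."""
    wanted = tuple(ext.lower() for ext in extensions)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(wanted):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def image_id(path):
    """File name without its extension, usable as the key for saved features."""
    return os.path.splitext(os.path.basename(path))[0]
```

Using os.path.splitext keeps file names that contain extra dots intact, whereas splitting on the first "." would cut them short.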

  4. Run

    Start the Docker image that was just built and set up, mounting the data directory, the code and model directory, and the output directory.

    mkdir /tmp/sarcasm_image2
    docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it my_detectron:v1.0 /bin/bash

    Once inside the container, change to the directory that holds the feature extraction code:

    cd /root/code/utils/get_image_features/

Install the dependencies:

pip install torch tqdm pycocotools

Run:

python extract_features_nlvr.py --cfg /root/detectron/configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --wts /root/code/model_final.pkl --min_bboxes 150 --max_bboxes 150 --feat_name gpu_0/fc6 --output_dir /root/features/ --image-ext jpg /root/images/

There were too many errors for it to run, so I rewrote the script:

import cv2
from caffe2.python import workspace
import caffe2
import os
from tqdm import tqdm
from detectron.core.config import cfg
import detectron.core.test_engine as model_engine
import detectron.utils.c2 as c2_utils
import detectron.core.test as infer_engine
import numpy as np
from detectron.core.config import merge_cfg_from_file
# nms is called below but was never imported; Detectron provides it in utils.boxes.
from detectron.utils.boxes import nms


image_path = '/root/images/'
out_path = '/root/features/'
weight_path = '/root/code/model_final.pkl'
config_path = '/root/detectron/configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml'


workspace.GlobalInit(['caffe2', '--caffe2_log_level=0'])


def get_detections_from_im(cfg, model, im, image_id, feat_blob_name,
                           MIN_BOXES, MAX_BOXES, conf_thresh=0.2, bboxes=None):
    with c2_utils.NamedCudaScope(0):
        scores, cls_boxes, im_scale = infer_engine.im_detect_bbox(model,
                                                                  im,
                                                                  cfg.TEST.SCALE,
                                                                  cfg.TEST.MAX_SIZE,
                                                                  boxes=bboxes)
        # Read the region features and class probabilities left in the workspace.
        box_features = workspace.FetchBlob(feat_blob_name)
        cls_prob = workspace.FetchBlob("gpu_0/cls_prob")
        rois = workspace.FetchBlob("gpu_0/rois")
        max_conf = np.zeros((rois.shape[0]))
        # unscale back to raw image space
        cls_boxes = rois[:, 1:5] / im_scale

        # Per-class NMS; remember the best surviving score for every box.
        for cls_ind in range(1, cls_prob.shape[1]):
            cls_scores = scores[:, cls_ind]
            dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])).astype(np.float32)
            keep = np.array(nms(dets, cfg.TEST.NMS))
            max_conf[keep] = np.where(cls_scores[keep] > max_conf[keep], cls_scores[keep], max_conf[keep])

        # Keep confident boxes, clamped to between MIN_BOXES and MAX_BOXES.
        keep_boxes = np.where(max_conf >= conf_thresh)[0]
        if len(keep_boxes) < MIN_BOXES:
            keep_boxes = np.argsort(max_conf)[::-1][:MIN_BOXES]
        elif len(keep_boxes) > MAX_BOXES:
            keep_boxes = np.argsort(max_conf)[::-1][:MAX_BOXES]
        objects = np.argmax(cls_prob[keep_boxes], axis=1)  # computed but not saved

    return box_features[keep_boxes], max_conf[keep_boxes], cls_boxes[keep_boxes]


im_list = os.listdir(image_path)
print(len(im_list))
merge_cfg_from_file(config_path)
cfg.NUM_GPUS = 1
model = model_engine.initialize_model_from_cfg(weight_path)
for i, im_name in enumerate(tqdm(im_list)):
    im_base_name = os.path.basename(im_name)
    image_id = im_base_name.split(".")[0]
    bbox = None
    im = cv2.imread(os.path.join(image_path, im_name))
    # One .npz file per image, named after the image id.
    outfile = os.path.join(out_path, image_id + ".npz")
    box_features, max_conf, cls_boxes = get_detections_from_im(
        cfg, model, im, image_id, 'gpu_0/fc6', 150, 150
    )
    np.savez(outfile, box_features=box_features, max_conf=max_conf, cls_boxes=cls_boxes)
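
The box selection at the end of get_detections_from_im is easy to misread, so here is a dependency-free sketch of the same rule. It uses plain Python lists instead of NumPy, select_boxes is a hypothetical name, and the ordering of tied scores may differ from np.argsort.

```python
def select_boxes(max_conf, conf_thresh=0.2, min_boxes=150, max_boxes=150):
    """Return indices of the boxes to keep, mirroring the thresholding rule.

    Boxes at or above conf_thresh are kept in their original order; if that
    yields too few or too many, fall back to the top-scoring boxes instead.
    """
    keep = [i for i, conf in enumerate(max_conf) if conf >= conf_thresh]
    by_score = sorted(range(len(max_conf)), key=lambda i: max_conf[i], reverse=True)
    if len(keep) < min_boxes:
        keep = by_score[:min_boxes]
    elif len(keep) > max_boxes:
        keep = by_score[:max_boxes]
    return keep


scores = [0.9, 0.1, 0.5, 0.05]
print(select_boxes(scores, min_boxes=1, max_boxes=2))  # [0, 2]
print(select_boxes(scores, min_boxes=3, max_boxes=3))  # [0, 2, 1]
```

With min_boxes and max_boxes both set to 150, as in the script, the result is the 150 highest-scoring boxes whenever the number passing the threshold is not exactly 150.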

The rewritten script still does not run. The line model = model_engine.initialize_model_from_cfg(weight_path) fails with:

24635
WARNING:root:[====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
Traceback (most recent call last):
  File "get_f.py", line 67, in <module>
    model = model_engine.initialize_model_from_cfg(weight_path)
  File "/root/detectron/detectron/core/test_engine.py", line 327, in initialize_model_from_cfg
    model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)
  File "/root/detectron/detectron/modeling/model_builder.py", line 124, in create
    return get_func(model_type_func)(model)
  File "/root/detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
    freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY
  File "/root/detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
    optim.build_data_parallel_model(model, _single_gpu_build_func)
  File "/root/detectron/detectron/modeling/optimizer.py", line 54, in build_data_parallel_model
    single_gpu_build_func(model)
  File "/root/detectron/detectron/modeling/model_builder.py", line 169, in _single_gpu_build_func
    blob_conv, dim_conv, spatial_scale_conv = add_conv_body_func(model)
  File "/root/detectron/detectron/modeling/FPN.py", line 63, in add_fpn_ResNet101_conv5_body
    model, ResNet.add_ResNet101_conv5_body, fpn_level_info_ResNet101_conv5
  File "/root/detectron/detectron/modeling/FPN.py", line 104, in add_fpn_onto_conv_body
    conv_body_func(model)
  File "/root/detectron/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
    return add_ResNet_convX_body(model, (3, 4, 23, 3))
  File "/root/detectron/detectron/modeling/ResNet.py", line 99, in add_ResNet_convX_body
    p, dim_in = globals()[cfg.RESNETS.STEM_FUNC](model, 'data')
  File "/root/detectron/detectron/modeling/ResNet.py", line 253, in basic_bn_stem
    p = model.AffineChannel(p, 'res_conv1_bn', dim=dim, inplace=True)
  File "/root/detectron/detectron/modeling/detector.py", line 103, in AffineChannel
    return self.net.AffineChannel([blob_in, scale, bias], blob_in)
  File "/usr/local/caffe2/python/core.py", line 1958, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method AffineChannel is not a registered operator. Did you mean: []
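
The empty suggestion list in "Did you mean: []" hints that Caffe2 found no operator with a similar name at all, so it may be worth checking what the Caffe2 build in the image actually registers before touching Detectron. The sketch below assumes that caffe2.python.workspace exposes RegisteredOperators(); missing_operators and caffe2_registered_operators are hypothetical helper names.

```python
def missing_operators(required, registered):
    """Return the required operator names that are absent from registered."""
    available = set(registered)
    return [name for name in required if name not in available]


def caffe2_registered_operators():
    """Ask Caffe2 for its operator list; returns None if Caffe2 is not importable.

    Assumption: caffe2.python.workspace exposes RegisteredOperators().
    """
    try:
        from caffe2.python import workspace
    except ImportError:
        return None
    return workspace.RegisteredOperators()


if __name__ == "__main__":
    ops = caffe2_registered_operators()
    if ops is None:
        print("caffe2 is not importable in this interpreter")
    else:
        print("registered operators: %d" % len(ops))
        print("missing: %s" % missing_operators(["AffineChannel", "Relu"], ops))
```

If common operators such as Relu were missing as well, that would point at the Caffe2 build rather than at Detectron.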

Second attempt (failed)

Following the hints in https://github.com/facebookresearch/Detectron/issues/756, I managed to build the image from Detectron's own Dockerfile:

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it detectron:c2-cuda9-cudnn7 /bin/bash

Inside the container:

pip install torch tqdm pycocotools
cd /root/
# git clone https://github.com/facebookresearch/Detectron.git
cd /root/code/utils/get_image_features/

python get_f.py --cfg /Detectron/configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --wts /root/code/model_final.pkl --min_bboxes 150 --max_bboxes 150 --feat_name gpu_0/fc6 --output_dir /root/features/ --image-ext jpg /root/images/

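Still no luck: the same error as at the end of the first attempt.

Third attempt (failed)

I built the Docker image from an earlier version of Detectron. It still did not work, and new problems appeared on top.

Fourth attempt

This time the plan was to use a Caffe2 image and install Detectron by hand.
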
docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it caffe2/caffe2:snapshot-py2-cuda9.0-cudnn7-ubuntu16.04 /bin/bash

Running the test showed that even the Caffe2 tests do not pass in this image (test command: python -m caffe2.python.operator_test.relu_op_test).
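
Since the same smoke test is repeated for every image, a small wrapper that runs a module with python -m and reports success can save some typing. This is a sketch for Python 3 (subprocess.run and its timeout argument are not available on Python 2); module_runs_cleanly is a hypothetical name and the 600-second timeout is an arbitrary choice.

```python
import subprocess
import sys


def module_runs_cleanly(module, python=sys.executable, timeout=600):
    """Run `python -m module` and report whether it exited with status 0."""
    try:
        completed = subprocess.run(
            [python, "-m", module],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # a hang counts as a failure
    return completed.returncode == 0
```

Inside a container that has a Python 3 interpreter, module_runs_cleanly("caffe2.python.operator_test.relu_op_test") would run the test command above. The timeout also covers the case where the test never finishes.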

Fifth attempt

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it zhang19941219/caffe2_py2_cuda90_cudnn7 /bin/bash

This image does not pass the Caffe2 test either.

Sixth attempt

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it caffe2ai/caffe2:c2v0.8.1.cuda8.cudnn7.ubuntu16.04 /bin/bash

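This time the test ran without errors, but it still did not seem to finish, apparently because memory ran out.

I then followed the installation guide:

  1. Install cocoapi

    git clone https://github.com/cocodataset/cocoapi.git

    Dependency: pip install cython

    After that I ran into an inexplicable matplotlib installation failure. Installing matplotlib manually with pip did not help.

The actual error comes from easy_install:
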
  code = compile(script, filename, 'exec')
File "/tmp/easy_install-21D8yi/matplotlib-3.2.1/setup.py", line 139
raise IOError(f"Failed to download jquery-ui. Please download "

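Fix: the traceback points at an f-string in the setup.py of matplotlib 3.2.1, which a Python 2 interpreter cannot parse, so easy_install was apparently pulling in a Python 3 only release. The Makefile turned out to be very simple (it just runs setup.py), so I commented out matplotlib in the list of dependencies and the install went through.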
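
An alternative to commenting the dependency out is to install a matplotlib release that still supports the interpreter in use before running the cocoapi install. The sketch below only picks the requirement string; matplotlib_requirement is a hypothetical helper, and it relies on matplotlib 3.0 being the first release without Python 2 support.

```python
import sys


def matplotlib_requirement(version_info=sys.version_info):
    """Pick a matplotlib requirement string that the interpreter can install.

    matplotlib 3.x is Python 3 only; the 2.2 series is the last one that
    supports Python 2.7.
    """
    if version_info[0] < 3:
        return "matplotlib<3.0"
    return "matplotlib"
```

The returned string can be passed to pip, for example pip install "matplotlib<3.0" on Python 2.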

  2. Install detectron

git clone https://github.com/facebookresearch/detectron
cd detectron/
pip install -r requirements.txt

Check whether the installation works:

python detectron/tests/test_spatial_narrow_as_op.py

It fails with:

Traceback (most recent call last):
File "detectron/tests/test_spatial_narrow_as_op.py", line 88, in <module>
c2_utils.import_detectron_ops()
File "/root/detectron/detectron/utils/c2.py", line 43, in import_detectron_ops
detectron_ops_lib = envu.get_detectron_ops_lib()
File "/root/detectron/detectron/utils/env.py", line 75, in get_detectron_ops_lib
raise Exception('Detectron ops lib not found')
Exception: Detectron ops lib not found

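A search suggested that this is related to PyTorch, so I installed it with pip install torch, and that did make the error go away (I got version 1.4.0).

However, the test then seemed to hang: no error, but it never finished, while using 26.5 GB of RAM (out of my 32 GB) and less than 300 MB of GPU memory.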
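
One possible reading of why pip install torch silenced "Detectron ops lib not found": if Detectron looks for its operator library by searching Python-related directories for a shared object, a PyTorch wheel that ships such a file would satisfy the search, even though that library was built against a different Caffe2 than the one in the image. The library file name and the sub-directories below are assumptions from memory, not taken from the post; detectron/utils/env.py in your checkout has the real values. find_ops_lib is a hypothetical helper that lists every candidate so you can see which one would be picked up.

```python
import os
import sys

# Assumed file name and sub-directories; check detectron/utils/env.py in your
# checkout for the values it really uses.
OPS_LIB_NAME = "libcaffe2_detectron_ops_gpu.so"
SUBDIRS = ("lib", os.path.join("torch", "lib"))


def find_ops_lib(prefixes=None, subdirs=SUBDIRS, name=OPS_LIB_NAME):
    """Return every existing prefix/subdir/name path, in search order."""
    if prefixes is None:
        prefixes = [sys.prefix, sys.exec_prefix] + list(sys.path)
    hits = []
    for prefix in prefixes:
        for subdir in subdirs:
            candidate = os.path.join(prefix, subdir, name)
            if os.path.exists(candidate) and candidate not in hits:
                hits.append(candidate)
    return hits
```

If the only hit lives under a torch/lib directory, the library comes from the PyTorch wheel rather than from the Caffe2 build in the image, which might be related to the hang described above.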

Install the dependencies: pip install tqdm pycocotools

Run:

cd /root/code/utils/get_image_features

python get_f.py --cfg /root/detectron/configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --wts /root/code/model_final.pkl --min_bboxes 150 --max_bboxes 150 --feat_name gpu_0/fc6 --output_dir /root/features/ --image-ext jpg /root/images/

Still the same error: AttributeError: Method AffineChannel is not a registered operator. Did you mean: []

From what I could find, the likely cause is that make ops was never run for Detectron. That step, however, fails because CMake cannot find the Caffe2 package configuration:

root@d785e85b8cfa:~/detectron# make ops
mkdir -p build && cd build && cmake .. && make -j24
CMake Error at CMakeLists.txt:8 (find_package):
By not providing "FindCaffe2.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "Caffe2", but
CMake did not find one.

Could not find a package configuration file provided by "Caffe2" with any
of the following names:

Caffe2Config.cmake
caffe2-config.cmake

Add the installation prefix of "Caffe2" to CMAKE_PREFIX_PATH or set
"Caffe2_DIR" to a directory containing one of the above files. If "Caffe2"
provides a separate development package or SDK, be sure it has been
installed.


-- Configuring incomplete, errors occurred!
See also "/root/detectron/build/CMakeFiles/CMakeOutput.log".
Makefile:13: recipe for target 'ops' failed
make: *** [ops] Error 1

For reference, the official Dockerfile sets Caffe2_DIR for this, but when I looked I could not find either of those two files:

root@d785e85b8cfa:~/detectron# ls /usr/local/caffe2/
CMakeFiles __init__.pyc contrib cuda_rtc distributed image mpi perfkernels python sgd utils
__init__.py binaries core db experiments mkl operators proto queue transforms video

This is probably because the image I used differs from the official one.
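
Rather than inspecting one directory by hand, the whole image can be searched for the two file names that the CMake message lists. A minimal sketch, where find_caffe2_cmake_dirs is a hypothetical helper and the search roots are up to you:

```python
import os

# The two file names come from the CMake error message above.
CONFIG_NAMES = ("Caffe2Config.cmake", "caffe2-config.cmake")


def find_caffe2_cmake_dirs(roots):
    """Walk each root and return directories holding a Caffe2 CMake config file."""
    matches = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if any(name in filenames for name in CONFIG_NAMES):
                matches.append(dirpath)
    return matches
```

Any directory this returns is a candidate for Caffe2_DIR. An empty result means the image has no installed Caffe2 CMake package, which matches what the ls output shows.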

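The Detectron repository also notes that this can happen when make install was never run for Caffe2 (https://github.com/facebookresearch/Detectron/blob/master/INSTALL.md#cmake-cannot-find-caffe2).

I went to /caffe2 in this image and tried to build it there. First, there is no install target, only all; second, the build fails:
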
[  9%] Building NVCC (Device) object third_party/gloo/gloo/CMakeFiles/gloo_cuda.dir/nccl/gloo_cuda_generated_nccl.cu.o
nvcc fatal : Unsupported gpu architecture 'compute_75'
CMake Error at gloo_cuda_generated_nccl.cu.o.cmake:203 (message):
Error generating
/caffe2/build/third_party/gloo/gloo/CMakeFiles/gloo_cuda.dir/nccl/./gloo_cuda_generated_nccl.cu.o

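Reference: https://blog.csdn.net/xunan003/article/details/90696412

According to that article, the 2080 has compute capability 7.5 and therefore needs CUDA 10, which is why the build cannot succeed in this image.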
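
The rule behind that nvcc error can be written down as a small lookup. The table is my summary of the CUDA releases that first accepted each architecture and covers only a few entries, so treat it as a sketch and check the toolkit release notes for your card; nvcc_supports is a hypothetical helper.

```python
# Minimum CUDA toolkit release whose nvcc accepts each compute capability.
MIN_CUDA_FOR_CAPABILITY = {
    (6, 0): (8, 0),
    (6, 1): (8, 0),
    (7, 0): (9, 0),
    (7, 5): (10, 0),
}


def nvcc_supports(cuda_version, capability):
    """True if a toolkit of cuda_version can target the given capability."""
    minimum = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if minimum is None:
        raise KeyError("capability %r is not in the table" % (capability,))
    return cuda_version >= minimum
```

For the CUDA 8 image used in this attempt, nvcc_supports((8, 0), (7, 5)) is False, which is consistent with "Unsupported gpu architecture 'compute_75'".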

Seventh attempt

Use a CUDA 10.1 + cuDNN 7 image:

docker run --gpus all --rm -it nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04 /bin/bash

Install caffe2

apt-get update
apt-get install -y --no-install-recommends \
    build-essential \
    git \
    libgoogle-glog-dev \
    libgtest-dev \
    libiomp-dev \
    libleveldb-dev \
    liblmdb-dev \
    libopencv-dev \
    libopenmpi-dev \
    libsnappy-dev \
    libprotobuf-dev \
    openmpi-bin \
    openmpi-doc \
    protobuf-compiler \
    python-dev \
    python-pip
pip install setuptools==40.7
pip install \
    future \
    numpy==1.16 \
    protobuf \
    typing \
    hypothesis==4
apt-get install -y --no-install-recommends \
    libgflags-dev \
    cmake

git clone https://github.com/facebookarchive/caffe2.git
cd caffe2
git checkout v0.8.1
git submodule update --init --recursive

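This step fails: the https://github.com/RLovelett/eigen/ repository no longer exists. I tried https://github.com/libigl/eigen as a replacement, but I am not sure whether it is a valid substitute.

https://github.com/NervanaSystems/nervanagpu is gone as well, so I used https://github.com/VisionSystemsInc/nervanagpu instead.
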
cd third_party/
git clone https://github.com/libigl/eigen
git clone https://github.com/VisionSystemsInc/nervanagpu
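
Instead of cloning the replacements by hand, the dead URLs could also be rewritten in .gitmodules so that git fetches the submodules itself. This is a sketch only: it assumes the old URLs appear on url = lines of caffe2's .gitmodules, and rewrite_submodule_urls is a hypothetical helper.

```python
def rewrite_submodule_urls(gitmodules_text, replacements):
    """Return .gitmodules text with dead URLs swapped for their replacements."""
    out = []
    for line in gitmodules_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "url":
            url = value.strip()
            # Compare without a trailing slash or ".git" suffix.
            bare = url.rstrip("/")
            if bare.endswith(".git"):
                bare = bare[:-4]
            if bare in replacements:
                line = key + sep + " " + replacements[bare]
        out.append(line)
    return "\n".join(out) + "\n"


# Old URL -> replacement URL, both taken from the text above.
REPLACEMENTS = {
    "https://github.com/RLovelett/eigen": "https://github.com/libigl/eigen",
    "https://github.com/NervanaSystems/nervanagpu": "https://github.com/VisionSystemsInc/nervanagpu",
}
```

After writing the result back, git submodule sync followed by git submodule update --init --recursive picks up the new URLs. Either way, a submodule is pinned to a specific commit, so a replacement repository only works if it contains that commit.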

Install cocoapi

git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
pip install cython
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple matplotlib
#pip install tqdm pycocotools
make install

Install detectron

git clone https://github.com/facebookresearch/detectron
cd detectron

Building the ops fails:

CMake Error at CMakeLists.txt:8 (find_package):
By not providing "FindCaffe2.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "Caffe2", but
CMake did not find one.

Could not find a package configuration file provided by "Caffe2" with any
of the following names:

Caffe2Config.cmake
caffe2-config.cmake

Add the installation prefix of "Caffe2" to CMAKE_PREFIX_PATH or set
"Caffe2_DIR" to a directory containing one of the above files. If "Caffe2"
provides a separate development package or SDK, be sure it has been
installed.

Eighth attempt

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it housebw/detectron /bin/bash

Inside the container, install the dependency:

pip install tqdm

Still the very first error:

root@c9cef23fd48e:~/code/utils/get_image_features# python get_f.py
24635
WARNING:root:[====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
Traceback (most recent call last):
  File "get_f.py", line 67, in <module>
    model = model_engine.initialize_model_from_cfg(weight_path)
  File "/detectron/detectron/core/test_engine.py", line 328, in initialize_model_from_cfg
    model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)
  File "/detectron/detectron/modeling/model_builder.py", line 124, in create
    return get_func(model_type_func)(model)
  File "/detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
    freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY
  File "/detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
    optim.build_data_parallel_model(model, _single_gpu_build_func)
  File "/detectron/detectron/modeling/optimizer.py", line 54, in build_data_parallel_model
    single_gpu_build_func(model)
  File "/detectron/detectron/modeling/model_builder.py", line 169, in _single_gpu_build_func
    blob_conv, dim_conv, spatial_scale_conv = add_conv_body_func(model)
  File "/detectron/detectron/modeling/FPN.py", line 63, in add_fpn_ResNet101_conv5_body
    model, ResNet.add_ResNet101_conv5_body, fpn_level_info_ResNet101_conv5
  File "/detectron/detectron/modeling/FPN.py", line 104, in add_fpn_onto_conv_body
    conv_body_func(model)
  File "/detectron/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
    return add_ResNet_convX_body(model, (3, 4, 23, 3))
  File "/detectron/detectron/modeling/ResNet.py", line 99, in add_ResNet_convX_body
    p, dim_in = globals()[cfg.RESNETS.STEM_FUNC](model, 'data')
  File "/detectron/detectron/modeling/ResNet.py", line 253, in basic_bn_stem
    p = model.AffineChannel(p, 'res_conv1_bn', dim=dim, inplace=True)
  File "/detectron/detectron/modeling/detector.py", line 103, in AffineChannel
    return self.net.AffineChannel([blob_in, scale, bias], blob_in)
  File "/usr/local/caffe2_build/caffe2/python/core.py", line 2040, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method AffineChannel is not a registered operator. Did you mean: []

The Caffe2 test does not pass in this image either. The error:

gc=device_type: 1, dc=[, device_type: 1], engine=u'CUDNN') produces unreliable results: Falsified on the first call but did not on a subsequent one


Ran 1 test in 11.002s

FAILED (errors=1)

Ninth attempt

docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it kracwarlock/detectron:1.1-cu100 /bin/bash
docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it yidliu/detectron:maskrcnn /bin/bash
docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it wxmao/detectron:v01 /bin/bash

None of the images above passes the Caffe2 test:

python -m caffe2.python.operator_test.relu_op_test
docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/visualbert/visualbert/:/root/code --rm -it caffe2/caffe2:snapshot-py2-cuda9.0-cudnn7-ubuntu16.04 /bin/bash
