My Blog

Using VL-BERT on a Multimodal Twitter Dataset

Contents
  1. Preprocessing the image data
  2. Training
    1. Setting up the environment
    2. Initialization
    3. Training
    4. Problems
    5. Testing

VL-BERT code: https://github.com/jackroos/VL-BERT/

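Preprocessing the image data

VL-BERT uses a Fast R-CNN detector to extract the objects in each image.

I use the ResNet-101 Faster R-CNN implementation provided by https://github.com/open-mmlab/mmdetection/.

Download the model and prepare the entry-point script: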

mkdir mycode
cd mycode
wget https://open-mmlab.oss-cn-beijing.aliyuncs.com/mmdetection/models/faster_rcnn_r101_fpn_1x_20181129-d1468807.pth

Create main.py with the following content:

import json
import os

import mmcv
from mmcv.runner import load_checkpoint
from mmdet.apis import inference_detector
from mmdet.models import build_detector
from tqdm import tqdm

cfg = mmcv.Config.fromfile('configs/faster_rcnn_r101_fpn_1x.py')
cfg.model.pretrained = None  # weights come from the checkpoint loaded below

# Build the detector and load the downloaded checkpoint
model = build_detector(cfg.model, test_cfg=cfg.test_cfg)
_ = load_checkpoint(model, '/root/code/faster_rcnn_r101_fpn_1x_20181129-d1468807.pth')

image_list = os.listdir('/root/images')
# Run detection on every .jpg image, one at a time
for name in tqdm(image_list):
    if not name.endswith('.jpg'):
        continue
    img = mmcv.imread('/root/images/%s' % name)
    result = inference_detector(model, img, cfg)
    # result holds one array per class; flatten all rows into a single list
    rl = []
    for r in result:
        if r.shape[0] > 0:
            for x in r:
                rl.append(x.tolist())
    # Save the boxes as JSON, named after the part of the file name before the first dot
    with open('/root/features/%s.boxs' % name.split('.')[0], 'w') as f:
        f.write(json.dumps(rl))
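
The script above writes one JSON file per image. As a minimal sketch of how such a file could be read back, the helper below loads the rows and keeps only confident detections. It assumes each row has the form [x1, y1, x2, y2, score], which is how mmdetection normally lays out bounding-box results; the function name `load_boxes` and the 0.5 threshold are illustrative choices, not part of VL-BERT or mmdetection.

```python
import json


def load_boxes(path, min_score=0.5):
    """Load a .boxs file and keep detections scoring at least min_score.

    Assumption: each row is [x1, y1, x2, y2, score].
    """
    with open(path) as f:
        rows = json.load(f)
    kept = [row for row in rows if row[4] >= min_score]
    # Put the highest-confidence boxes first
    kept.sort(key=lambda row: row[4], reverse=True)
    return kept
```

Sorting by score makes it easy to cap the number of boxes per image later, if the downstream loader needs a fixed maximum.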

Start the Docker container:

docker pull vistart/mmdetection:v0.6.0
docker run --gpus all -v /home/sxw/jupyter_workspace/Data/sarcasm/dataset_image/:/root/images:ro -v /tmp/sarcasm_image2:/root/features -v /home/sxw/jupyter_workspace/mutil-model/mycode:/root/code:ro --rm -it vistart/mmdetection:v0.6.0 /bin/bash

Once inside the container:

pip3 install pycocotools tqdm
cd mmdetection/
cp /root/code/main.py .
python3 main.py
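
After the run finishes, it can be worth checking that every image actually produced an output file. The following is a rough sketch of such a check; `missing_features` is a hypothetical helper name, and it mirrors the naming rule used in main.py (the part of the file name before the first dot, plus `.boxs`).

```python
import os


def missing_features(image_dir, feature_dir):
    """Return the .jpg names in image_dir that have no matching .boxs file."""
    missing = []
    for name in sorted(os.listdir(image_dir)):
        if not name.endswith('.jpg'):
            continue
        # main.py names each output after the text before the first dot
        stem = name.split('.')[0]
        if not os.path.exists(os.path.join(feature_dir, stem + '.boxs')):
            missing.append(name)
    return missing
```

Inside the container this would be called as `missing_features('/root/images', '/root/features')`; an empty list means every image was processed.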

Training

Setting up the environment

pip install torch==1.1.0

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

cd ../
git clone https://github.com/sxwxs/VL-BERT.git
cd VL-BERT
pip install Cython
pip install -r requirements.txt
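
Before moving on, a quick import check can confirm that the packages installed above are visible to the Python interpreter. This is only a sketch: `check_environment` is a made-up helper, and the default module names `torch` and `apex` are taken from the install steps above.

```python
import importlib


def check_environment(modules=('torch', 'apex')):
    """Map each module name to its version string, or None if it cannot be imported."""
    report = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        # Not every package exposes __version__, so fall back to a placeholder
        report[name] = getattr(mod, '__version__', 'unknown')
    return report
```

Calling `check_environment()` should report a version starting with 1.1.0 for `torch` if the pinned install succeeded, and a non-None entry for `apex`.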

Initialization

./scripts/init.sh

Download the corresponding pretrained models:

https://github.com/jackroos/VL-BERT/blob/master/model/pretrained_model/PREPARE_PRETRAINED_MODELS.md

Training

./scripts/nondist_run.sh twitter/train_end2end.py cfgs/twitter/base_4x16G_fp32.yaml ./out/ 2> err.log

Problems

I ran into a bunch of problems.

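RuntimeError: CUDA error: an illegal memory access was encountered

CUDA kernels launch asynchronously, so the stack trace reported with this error can point at the wrong operation. Rerunning with CUDA_LAUNCH_BLOCKING=1 forces synchronous launches, so the trace points at the call that actually failed:

CUDA_LAUNCH_BLOCKING=1 ./scripts/nondist_run.sh twitter/train_end2end.py cfgs/twitter/base_4x16G_fp32.yaml ./out/ 2> err.log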
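
The same rerun can also be driven from Python, which is convenient when the debugging run is part of a larger script. The sketch below uses only the standard library; `run_blocking` is a hypothetical helper name. The variable has to be in the environment before the child process initializes CUDA, which is why it is passed through `env` rather than set afterwards.

```python
import os
import subprocess


def run_blocking(cmd):
    """Run cmd with CUDA_LAUNCH_BLOCKING=1 added to the current environment."""
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING='1')
    return subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
```

For the training command, the call would be `run_blocking(['./scripts/nondist_run.sh', 'twitter/train_end2end.py', 'cfgs/twitter/base_4x16G_fp32.yaml', './out/'])`, after which the captured `stderr` attribute of the result can be written to err.log.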

Testing

python twitter/test.py  --cfg cfgs/twitter/base_4x16G_fp32.yaml --ckpt out/output/vl-bert/twitter/base_4x16G_fp32/train2014+val2014_train/vl-bert_base_res101_vqa-0004.model --gpus 0 --result-path result_output --result-name test1 2> terr.log
