My Blog

Reproducing Text + Vision Models

Contents
  1. LXMERT
  2. vilbert-multi-task

LXMERT

Paper: LXMERT: Learning Cross-Modality Encoder Representations from Transformers

https://arxiv.org/abs/1908.07490

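First, preprocess the images to extract their features: run object detection to obtain the ROIs (regions of interest) and the corresponding image features.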

Full post (using LXMERT on the multimodal Twitter dataset): https://codeplot.top/2020/04/11/%E5%9C%A8-multimodal-Twitter-dataset-%E4%B8%8A%E4%BD%BF%E7%94%A8-LXMERT/
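
To make the output of the preprocessing step concrete, here is a minimal sketch of how the detector results for one image might be packed: one bounding box and one feature vector per ROI. The names `ImageRegions`, `normalize_boxes`, and `pack_regions` are hypothetical and not taken from the LXMERT code, and scaling box coordinates to [0, 1] by the image size is an assumption about what the model expects. The detector itself is not shown, and the toy feature vectors are much shorter than real ones.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ImageRegions:
    """Detector output for one image (hypothetical container)."""
    boxes: List[List[float]]     # one [x1, y1, x2, y2] per ROI, scaled to [0, 1]
    features: List[List[float]]  # one feature vector per ROI


def normalize_boxes(boxes: Sequence[Sequence[float]],
                    img_w: float, img_h: float) -> List[List[float]]:
    """Scale pixel-coordinate boxes by image width and height (assumed convention)."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image width and height must be positive")
    return [[x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]
            for x1, y1, x2, y2 in boxes]


def pack_regions(boxes: Sequence[Sequence[float]],
                 features: Sequence[Sequence[float]],
                 img_w: float, img_h: float) -> ImageRegions:
    """Pair each ROI box with its feature vector."""
    if len(boxes) != len(features):
        raise ValueError("need exactly one feature vector per ROI")
    return ImageRegions(
        boxes=normalize_boxes(boxes, img_w, img_h),
        features=[list(f) for f in features],
    )


# Toy example: two ROIs in a 100x200 image, with 2-dimensional features.
regions = pack_regions(
    boxes=[[0, 0, 50, 100], [50, 50, 100, 200]],
    features=[[0.1, 0.2], [0.3, 0.4]],
    img_w=100,
    img_h=200,
)
print(regions.boxes)
```

Keeping boxes and features in one record per image makes it harder for the two lists to fall out of step when they are cached to disk and loaded again for training.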

vilbert-multi-task

Paper: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

https://arxiv.org/abs/1908.02265

Full post (using vilbert-multi-task on the multimodal Twitter dataset): https://codeplot.top/2020/04/05/%E5%9C%A8-multimodal-Twitter-dataset-%E4%B8%8A%E4%BD%BF%E7%94%A8-vilbert-multi-task/
