1.1.4 The History of Natural Language Processing
- Article “Giving GPT-3 a Turing Test”
https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html
1.3.6 Attention Mechanisms
- Open-source project: text attention heat-map visualization (Text-Attention-Heatmap-Visualization)
https://github.com/jiesutd/Text-Attention-Heatmap-Visualization
1.3.8 Multimodal Learning
- Paper “Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books”
https://arxiv.org/abs/1506.06724
1.4.2 Choosing the Batch Size
- Paper “Revisiting Small Batch Training for Deep Neural Networks”
https://arxiv.org/abs/1804.07612
1.4.3 The Dataset Imbalance Problem
- Paper “Focal Loss for Dense Object Detection” (see the sketch after this subsection's references)
https://arxiv.org/abs/1708.02002
- Paper “A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection”
https://arxiv.org/abs/1607.07155
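A minimal PyTorch sketch of the binary focal loss from “Focal Loss for Dense Object Detection”; gamma=2.0 and alpha=0.25 follow the paper's defaults, while the function name and interface are illustrative assumptions rather than the book's code:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss; logits and targets are float tensors of the same shape, targets in {0, 1}."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()       # (1 - p_t)^gamma down-weights easy examples
```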
1.5.4 Pretrained Models and Data Security
- Paper “Extracting Training Data from Large Language Models”
https://arxiv.org/abs/2012.07805
2.1.3 Using the pip Package Manager and Python Virtual Environments
- Official Python documentation: virtual environments
https://docs.python.org/zh-cn/3/tutorial/venv.html
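The standard-library venv module can also be driven from Python code; a minimal sketch (the directory name nlp-env is an illustrative assumption, and `python -m venv nlp-env` on the command line is equivalent):

```python
import venv

# Create ./nlp-env with its own interpreter and pip; activate it afterwards with
# "source nlp-env/bin/activate" (Linux/macOS) or "nlp-env\Scripts\activate" (Windows).
venv.create("nlp-env", with_pip=True)
```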
2.1.5 Installing Commonly Used Python NLP Libraries
- Paper “PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation”
https://arxiv.org/abs/1906.11455
2.3.5 Text Normalization
- BERT-KPE
https://github.com/thunlp/BERT-KPE/blob/master/preprocess/prepro_utils.py
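A minimal text-normalization sketch in plain Python; Unicode NFKC normalization, case folding, and whitespace cleanup are illustrative steps, not the BERT-KPE preprocessing code itself:

```python
import re
import unicodedata

def normalize_text(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)  # unify full-width and half-width forms
    text = text.lower()                         # case folding for Latin characters
    text = re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace
    return text

print(normalize_text("Ｈｅｌｌｏ，  ＮＬＰ  世界！"))  # -> "hello, nlp 世界!"
```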
2.5.1 Calling C/C++ Code via ctypes
- Official Python documentation: ctypes
https://docs.python.org/zh-cn/3.8/library/ctypes.html
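A minimal ctypes sketch, calling sqrt from the system C math library; library lookup via ctypes.util.find_library varies by platform, so this is an illustrative example rather than the book's code:

```python
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m") or "msvcrt")  # libm on Unix, msvcrt on Windows
libm.sqrt.argtypes = [ctypes.c_double]  # declare the C signature so arguments convert correctly
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # 1.4142135623730951
```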
3.2.1 Advantages of PyTorch
- Data comparing the numbers of papers using PyTorch and TensorFlow at several top NLP conferences in recent years
http://horace.io/pytorch-vs-tensorflow/
4.2.6 Using the Transformer Model in torch.nn
- Paper “Attention Is All You Need”
https://arxiv.org/abs/1706.03762
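A minimal sketch of torch.nn.Transformer with its default hyperparameters; input shapes follow the library's default convention of (sequence length, batch size, d_model):

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6)
src = torch.rand(10, 32, 512)  # (source length, batch size, d_model)
tgt = torch.rand(20, 32, 512)  # (target length, batch size, d_model)
out = model(src, tgt)          # -> (20, 32, 512)
print(out.shape)
```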
4.3.6 Using the LogSoftmax Function
- The Softmax and LogSoftmax implementations in PyTorch (C++ code)
https://github.com/pytorch/pytorch/blob/v1.6.0/aten/src/ATen/native/SoftMax.cpp
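A minimal sketch showing nn.LogSoftmax paired with nn.NLLLoss, which together give the same result as nn.CrossEntropyLoss applied to raw logits:

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 10)            # a batch of 4 examples, 10 classes
targets = torch.tensor([1, 0, 4, 9])   # gold class indices

log_probs = nn.LogSoftmax(dim=1)(logits)
loss = nn.NLLLoss()(log_probs, targets)

print(loss, nn.CrossEntropyLoss()(logits, targets))  # the two losses match
```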
4.5.2 Using the Adam Optimizer
- Paper “Adam: A Method for Stochastic Optimization”
https://arxiv.org/abs/1412.6980
- Paper “On the Convergence of Adam and Beyond”
https://arxiv.org/abs/1904.09237
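A minimal sketch of torch.optim.Adam on a toy model; the lr, betas, and eps values shown are the library defaults, and amsgrad=True switches to the variant analyzed in “On the Convergence of Adam and Beyond”:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8, amsgrad=False)

loss = model(torch.randn(16, 128)).sum()  # a dummy forward pass
optimizer.zero_grad()
loss.backward()
optimizer.step()
```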
4.5.3 Using the AdamW Optimizer
- Paper “Decoupled Weight Decay Regularization”
https://arxiv.org/abs/1711.05101
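AdamW applies weight decay in the decoupled form proposed in “Decoupled Weight Decay Regularization” rather than folding it into the gradient as L2 regularization; a minimal sketch, where 0.01 is the library's default weight_decay:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 2)
# Decay is applied directly to the weights at each step, separately from the Adam update.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
```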
4.9.2 Using TensorBoard with PyTorch
- Introduction to using TensorBoard in the official PyTorch documentation
https://pytorch.org/docs/master/tensorboard.html#torch-utils-tensorboard
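A minimal sketch of logging training curves to TensorBoard from PyTorch; the log directory runs/demo and the tag name are illustrative assumptions:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/demo")        # logs are written under ./runs/demo
for step in range(100):
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()

# View the curves with:  tensorboard --logdir runs
```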
6.3.4 Using pkuseg
- The tags.txt file provided in the open-source project pkuseg-python
https://github.com/lancopku/pkuseg-python/blob/master/tags.txt
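A minimal pkuseg sketch; postag=True returns (word, tag) pairs using the tag set documented in the tags.txt file above, and the example sentence and output are illustrative:

```python
import pkuseg

seg = pkuseg.pkuseg(postag=True)   # load the default model with part-of-speech tagging
print(seg.cut("我爱北京天安门"))     # e.g. [('我', 'r'), ('爱', 'v'), ('北京', 'ns'), ('天安门', 'ns')]
```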
9.1.1 Background
- Paper “Generating Sequences With Recurrent Neural Networks”
https://arxiv.org/abs/1308.0850
- Paper “Recurrent Continuous Translation Models”
https://www.aclweb.org/anthology/D13-1176/
- Paper “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation”
https://arxiv.org/abs/1406.1078
- Paper “Sequence to Sequence Learning with Neural Networks”
https://arxiv.org/abs/1409.3215
9.2 Implementing a Seq2seq Model with PyTorch
- Open-source project PyTorch-Seq2seq
https://github.com/bentrevett/pytorch-seq2seq
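A minimal GRU-based encoder-decoder sketch in the spirit of the PyTorch-Seq2seq tutorials; the vocabulary sizes, dimensions, and single decoding step are illustrative assumptions, and attention is omitted:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim)

    def forward(self, src):                       # src: (src_len, batch)
        _, hidden = self.rnn(self.embedding(src))
        return hidden                             # (1, batch, hid_dim), the "context"

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, token, hidden):             # token: (1, batch)
        output, hidden = self.rnn(self.embedding(token), hidden)
        return self.out(output.squeeze(0)), hidden

encoder, decoder = Encoder(vocab_size=5000), Decoder(vocab_size=6000)
src = torch.randint(0, 5000, (12, 8))             # (src_len, batch)
hidden = encoder(src)
bos = torch.zeros(1, 8, dtype=torch.long)         # assume index 0 is the <bos> token
logits, hidden = decoder(bos, hidden)             # one decoding step
print(logits.shape)                               # torch.Size([8, 6000])
```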
10.1.1 First Applied in Computer Vision
- Article “Attention and Memory in Deep Learning and NLP”
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
- Paper “Recurrent Models of Visual Attention”
http://arxiv.org/abs/1406.6247
10.4.2 Work Related to Self-Attention
- Paper “Long Short-Term Memory-Networks for Machine Reading”
https://www.aclweb.org/anthology/D16-1053/
- Paper “A Structured Self-Attentive Sentence Embedding”
https://arxiv.org/abs/1703.03130
- Paper “A Deep Reinforced Model for Abstractive Summarization”
https://arxiv.org/abs/1705.04304
10.6 Multi-hop Attention
- Paper “Memory Networks”
https://arxiv.org/abs/1410.3916
- Paper “End-To-End Memory Networks”
https://arxiv.org/abs/1503.08895
- Paper “Multihop Attention Networks for Question Answer Matching”
https://dl.acm.org/doi/10.1145/3209978.3210009
10.7 Soft Attention and Hard Attention
- Paper “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”
https://arxiv.org/abs/1502.03044
10.8 Full Attention and Sparse Attention
- Paper “Generating Long Sequences with Sparse Transformers”
https://arxiv.org/abs/1904.10509
11.1 Background
- Paper “Attention Is All You Need”
https://arxiv.org/abs/1706.03762
- Paper “Convolutional Sequence to Sequence Learning”
https://arxiv.org/abs/1705.03122
11.2.1 Background
- Paper “Convolutional Neural Networks for Sentence Classification”
https://arxiv.org/abs/1408.5882
11.3.4 Using Positional Encoding
- Complete code for positional encoding
https://github.com/jalammar/jalammar.github.io/blob/master/notebookes/transformer/transformer_positional_encoding_graph.ipynb
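A minimal sketch of the sinusoidal positional encoding used by the Transformer, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); the function name and interface are illustrative assumptions:

```python
import math
import torch

def positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    pe = torch.zeros(max_len, d_model)
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)       # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                         * (-math.log(10000.0) / d_model))                 # 1 / 10000^(2i/d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe                                     # (max_len, d_model)

print(positional_encoding(50, 512).shape)         # torch.Size([50, 512])
```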
11.4 Improvements to the Transformer
- Paper “Generating Long Sequences with Sparse Transformers”
https://arxiv.org/abs/1904.10509
- Paper “Local Self-Attention over Long Text for Efficient Document Retrieval”
https://arxiv.org/abs/2005.04908
12.1.3 The Development of Pretraining in Natural Language Processing
- Paper “Deep Contextualized Word Representations”
https://arxiv.org/abs/1802.05365
12.3 The GPT Model
- Paper “Improving Language Understanding by Generative Pre-Training”
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
12.3.6 GPT-2 and GPT-3
- Paper “Language Models are Unsupervised Multitask Learners”
https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf
- Paper “Language Models are Few-Shot Learners”
https://arxiv.org/abs/2005.14165
- Article “Giving GPT-3 a Turing Test”
https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html
12.4 The BERT Model
- Paper “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”
https://arxiv.org/abs/1810.04805
- Paper “Cloze procedure: A new tool for measuring readability”
https://journals.sagepub.com/doi/10.1177/107769905303000401
- Paper “RoBERTa: A Robustly Optimized BERT Pretraining Approach”
https://arxiv.org/abs/1907.11692
- Paper “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations”
https://arxiv.org/abs/1909.11942
14.1.1 Experiment Objectives and Dataset Overview
- Paper “Neural Chinese Address Parsing”
https://www.aclweb.org/anthology/N19-1346/
14.4.4 Training the Model
- The evaluation script conlleval.pl included in neural-chinese-address-parsing
https://github.com/leodotnet/neural-chinese-address-parsing/blob/master/conlleval.pl
15.7.2 Evaluating the Model
- GPT2-Chinese
https://github.com/Morizeyao/GPT2-Chinese/blob/master/generate.py