1.1.4 The History of Natural Language Processing
- Article “Giving GPT-3 a Turing Test”
https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html
1.3.6 Attention Mechanisms
- Open-source project: text attention heat-map visualization (Text-Attention-Heatmap-Visualization)
https://github.com/jiesutd/Text-Attention-Heatmap-Visualization
1.3.8 Multimodal Learning
- Paper “Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books”
https://arxiv.org/abs/1506.06724
1.4.2 Choosing the Batch Size
- Paper “Revisiting Small Batch Training for Deep Neural Networks”
https://arxiv.org/abs/1804.07612
1.4.3 The Dataset Imbalance Problem
- Paper “Focal Loss for Dense Object Detection” (see the sketch after this subsection's references)
https://arxiv.org/abs/1708.02002
- Paper “A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection”
https://arxiv.org/abs/1607.07155
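A minimal PyTorch sketch of the binary focal loss from “Focal Loss for Dense Object Detection”; gamma=2.0 and alpha=0.25 follow the paper's defaults, while the function name and interface are illustrative assumptions rather than the book's code:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss; logits and targets are float tensors of the same shape, targets in {0, 1}."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()       # (1 - p_t)^gamma down-weights easy examples
```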
1.5.4 Pretrained Models and Data Security
- Paper “Extracting Training Data from Large Language Models”
https://arxiv.org/abs/2012.07805
2.1.3 Using the pip Package Manager and Python Virtual Environments
- Official Python documentation: virtual environments
https://docs.python.org/zh-cn/3/tutorial/venv.html
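The standard-library venv module can also be driven from Python code; a minimal sketch (the directory name nlp-env is an illustrative assumption, and `python -m venv nlp-env` on the command line is equivalent):

```python
import venv

# Create ./nlp-env with its own interpreter and pip; activate it afterwards with
# "source nlp-env/bin/activate" (Linux/macOS) or "nlp-env\Scripts\activate" (Windows).
venv.create("nlp-env", with_pip=True)
```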
2.1.5 Installing Commonly Used Python NLP Libraries
- Paper “PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation”
https://arxiv.org/abs/1906.11455
2.3.5 Text Normalization
- BERT-KPE
https://github.com/thunlp/BERT-KPE/blob/master/preprocess/prepro_utils.py
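A minimal text-normalization sketch in plain Python; Unicode NFKC normalization, case folding, and whitespace cleanup are illustrative steps, not the BERT-KPE preprocessing code itself:

```python
import re
import unicodedata

def normalize_text(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)  # unify full-width and half-width forms
    text = text.lower()                         # case folding for Latin characters
    text = re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace
    return text

print(normalize_text("Ｈｅｌｌｏ，  ＮＬＰ  世界！"))  # -> "hello, nlp 世界!"
```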
2.5.1 Calling C/C++ Code via ctypes
- Official Python documentation: ctypes
https://docs.python.org/zh-cn/3.8/library/ctypes.html
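A minimal ctypes sketch, calling sqrt from the system C math library; library lookup via ctypes.util.find_library varies by platform, so this is an illustrative example rather than the book's code:

```python
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m") or "msvcrt")  # libm on Unix, msvcrt on Windows
libm.sqrt.argtypes = [ctypes.c_double]  # declare the C signature so arguments convert correctly
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # 1.4142135623730951
```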
3.2.1 Advantages of PyTorch
- Data comparing the numbers of papers using PyTorch and TensorFlow at several top NLP conferences in recent years
http://horace.io/pytorch-vs-tensorflow/
4.2.6 Using the Transformer Model in torch.nn
- Paper “Attention Is All You Need”
https://arxiv.org/abs/1706.03762
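A minimal sketch of torch.nn.Transformer with its default hyperparameters; input shapes follow the library's default convention of (sequence length, batch size, d_model):

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6)
src = torch.rand(10, 32, 512)  # (source length, batch size, d_model)
tgt = torch.rand(20, 32, 512)  # (target length, batch size, d_model)
out = model(src, tgt)          # -> (20, 32, 512)
print(out.shape)
```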
4.3.6 Using the LogSoftmax Function
- The Softmax and LogSoftmax implementations in PyTorch (C++ code)
https://github.com/pytorch/pytorch/blob/v1.6.0/aten/src/ATen/native/SoftMax.cpp
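A minimal sketch showing nn.LogSoftmax paired with nn.NLLLoss, which together give the same result as nn.CrossEntropyLoss applied to raw logits:

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 10)            # a batch of 4 examples, 10 classes
targets = torch.tensor([1, 0, 4, 9])   # gold class indices

log_probs = nn.LogSoftmax(dim=1)(logits)
loss = nn.NLLLoss()(log_probs, targets)

print(loss, nn.CrossEntropyLoss()(logits, targets))  # the two losses match
```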
4.5.2 Using the Adam Optimizer
- Paper “Adam: A Method for Stochastic Optimization”
https://arxiv.org/abs/1412.6980
- Paper “On the Convergence of Adam and Beyond”
https://arxiv.org/abs/1904.09237
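A minimal sketch of torch.optim.Adam on a toy model; the lr, betas, and eps values shown are the library defaults, and amsgrad=True switches to the variant analyzed in “On the Convergence of Adam and Beyond”:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8, amsgrad=False)

loss = model(torch.randn(16, 128)).sum()  # a dummy forward pass
optimizer.zero_grad()
loss.backward()
optimizer.step()
```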
4.5.3 Using the AdamW Optimizer
- Paper “Decoupled Weight Decay Regularization”
https://arxiv.org/abs/1711.05101
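AdamW applies weight decay in the decoupled form proposed in “Decoupled Weight Decay Regularization” rather than folding it into the gradient as L2 regularization; a minimal sketch, where 0.01 is the library's default weight_decay:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 2)
# Decay is applied directly to the weights at each step, separately from the Adam update.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
```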
4.9.2 Using TensorBoard with PyTorch
- Introduction to using TensorBoard in the official PyTorch documentation
https://pytorch.org/docs/master/tensorboard.html#torch-utils-tensorboard
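A minimal sketch of logging training curves to TensorBoard from PyTorch; the log directory runs/demo and the tag name are illustrative assumptions:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/demo")        # logs are written under ./runs/demo
for step in range(100):
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()

# View the curves with:  tensorboard --logdir runs
```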
6.3.4 Using pkuseg
- The tags.txt file provided in the open-source project pkuseg-python
https://github.com/lancopku/pkuseg-python/blob/master/tags.txt
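A minimal pkuseg sketch; postag=True returns (word, tag) pairs using the tag set documented in the tags.txt file above, and the example sentence and output are illustrative:

```python
import pkuseg

seg = pkuseg.pkuseg(postag=True)   # load the default model with part-of-speech tagging
print(seg.cut("我爱北京天安门"))     # e.g. [('我', 'r'), ('爱', 'v'), ('北京', 'ns'), ('天安门', 'ns')]
```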
9.1.1 Background
- Paper “Generating Sequences With Recurrent Neural Networks”
https://arxiv.org/abs/1308.0850
- Paper “Recurrent Continuous Translation Models”
https://www.aclweb.org/anthology/D13-1176/
- Paper “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation”
https://arxiv.org/abs/1406.1078
- Paper “Sequence to Sequence Learning with Neural Networks”
https://arxiv.org/abs/1409.3215
9.2 Implementing a Seq2seq Model with PyTorch
- Open-source project PyTorch-Seq2seq
https://github.com/bentrevett/pytorch-seq2seq
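A minimal GRU-based encoder-decoder sketch in the spirit of the PyTorch-Seq2seq tutorials; the vocabulary sizes, dimensions, and single decoding step are illustrative assumptions, and attention is omitted:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim)

    def forward(self, src):                       # src: (src_len, batch)
        _, hidden = self.rnn(self.embedding(src))
        return hidden                             # (1, batch, hid_dim), the "context"

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, token, hidden):             # token: (1, batch)
        output, hidden = self.rnn(self.embedding(token), hidden)
        return self.out(output.squeeze(0)), hidden

encoder, decoder = Encoder(vocab_size=5000), Decoder(vocab_size=6000)
src = torch.randint(0, 5000, (12, 8))             # (src_len, batch)
hidden = encoder(src)
bos = torch.zeros(1, 8, dtype=torch.long)         # assume index 0 is the <bos> token
logits, hidden = decoder(bos, hidden)             # one decoding step
print(logits.shape)                               # torch.Size([8, 6000])
```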
10.1.1 First Applied in Computer Vision
- Article “Attention and Memory in Deep Learning and NLP”
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
- Paper “Recurrent Models of Visual Attention”
http://arxiv.org/abs/1406.6247
10.4.2 Work Related to Self-Attention
- Paper “Long Short-Term Memory-Networks for Machine Reading”
https://www.aclweb.org/anthology/D16-1053/
- Paper “A Structured Self-Attentive Sentence Embedding”
https://arxiv.org/abs/1703.03130
- Paper “A Deep Reinforced Model for Abstractive Summarization”
https://arxiv.org/abs/1705.04304
10.6 Multi-hop Attention
- Paper “Memory Networks”
https://arxiv.org/abs/1410.3916
- Paper “End-To-End Memory Networks”
https://arxiv.org/abs/1503.08895
- Paper “Multihop Attention Networks for Question Answer Matching”
https://dl.acm.org/doi/10.1145/3209978.3210009
10.7 Soft Attention and Hard Attention
- Paper “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”
https://arxiv.org/abs/1502.03044
10.8 Full Attention and Sparse Attention
- Paper “Generating Long Sequences with Sparse Transformers”
https://arxiv.org/abs/1904.10509
11.1 Background
- Paper “Attention Is All You Need”
https://arxiv.org/abs/1706.03762
- Paper “Convolutional Sequence to Sequence Learning”
https://arxiv.org/abs/1705.03122
11.2.1 Background
- Paper “Convolutional Neural Networks for Sentence Classification”
https://arxiv.org/abs/1408.5882
11.3.4 Using Positional Encoding
- Complete code for positional encoding
https://github.com/jalammar/jalammar.github.io/blob/master/notebookes/transformer/transformer_positional_encoding_graph.ipynb
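A minimal sketch of the sinusoidal positional encoding used by the Transformer, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); the function name and interface are illustrative assumptions:

```python
import math
import torch

def positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    pe = torch.zeros(max_len, d_model)
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)       # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                         * (-math.log(10000.0) / d_model))                 # 1 / 10000^(2i/d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe                                     # (max_len, d_model)

print(positional_encoding(50, 512).shape)         # torch.Size([50, 512])
```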
11.4 Improvements to the Transformer
- Paper “Generating Long Sequences with Sparse Transformers”
https://arxiv.org/abs/1904.10509
- Paper “Local Self-Attention over Long Text for Efficient Document Retrieval”
https://arxiv.org/abs/2005.04908
12.1.3 The Development of Pretraining in Natural Language Processing
- Paper “Deep Contextualized Word Representations”
https://arxiv.org/abs/1802.05365
12.3 The GPT Model
- Paper “Improving Language Understanding by Generative Pre-Training”
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
12.3.6 GPT-2 and GPT-3
- Paper “Language Models are Unsupervised Multitask Learners”
https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf
- Paper “Language Models are Few-Shot Learners”
https://arxiv.org/abs/2005.14165
- Article “Giving GPT-3 a Turing Test”
https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html
12.4 The BERT Model
- Paper “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”
https://arxiv.org/abs/1810.04805
- Paper “Cloze procedure: A new tool for measuring readability”
https://journals.sagepub.com/doi/10.1177/107769905303000401
- Paper “RoBERTa: A Robustly Optimized BERT Pretraining Approach”
https://arxiv.org/abs/1907.11692
- Paper “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations”
https://arxiv.org/abs/1909.11942
14.1.1 Experiment Objectives and Dataset Overview
- Paper “Neural Chinese Address Parsing”
https://www.aclweb.org/anthology/N19-1346/
14.4.4 Training the Model
- The evaluation script conlleval.pl included in neural-chinese-address-parsing
https://github.com/leodotnet/neural-chinese-address-parsing/blob/master/conlleval.pl
15.7.2 Evaluating the Model
- GPT2-Chinese
https://github.com/Morizeyao/GPT2-Chinese/blob/master/generate.py