Chapter 14 Epilogue

Author: Matthias Aßenmacher

Since this project was realized in a limited time frame and accounted for about one third of the ECTS points which should be achieved during one semester, it is obvious that this booklet cannot provide exhaustive coverage of the vast research field of Natural Language Processing.

Furthermore, this area of research is moving very rapidly at the moment, which means that certain architectures, improvements or ideas had not even been published yet when we sat down and came up with the chapter topics in February 2020. Thus, this epilogue tries to put the content of this booklet into context and relate it to what is currently happening. We will focus on three main aspects:

  • New influential (or even state-of-the-art) architectures
  • Improvements and work on the attention mechanism and Transformers
  • Work on proper evaluation and interpretability

14.1 New influential architectures

In Chapter 7: “Transfer Learning for NLP I” and Chapter 9: “Transfer Learning for NLP II”, some of the most important models for sequential transfer learning were presented. We chose to restrict ourselves to this type of model, since we considered it the most important starting point for understanding the overall concept. Nevertheless, other influential architectures should also be mentioned:

  • ELECTRA (Clark et al. (2020)) is an architecture with an interesting pre-training objective that the interested reader might want to have a look at: its text encoder is pre-trained as a discriminator rather than as a generator.
  • Google’s T5 (Raffel et al. (2019)), already briefly mentioned in Chapter 9, does not fit into the category of sequential transfer learning but rather belongs to the multi-task learning models (cf. Fig. 7.1 in Chapter 7), since it is trained on multiple tasks at once. This is possible because the entire input and output are transformed to strings, which essentially converts every task to a seq-to-seq task.
  • In May 2020, OpenAI’s GPT-3 (Brown et al. (2020)) shook the NLP community and triggered a lot of subsequent research. As already mentioned in Chapter 9, this model is by far bigger than every previous model and puts a special focus on few-shot learning.
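
The text-to-text idea behind T5 described in the list above can be illustrated with a small sketch. The function `to_text_to_text` and its task names are hypothetical and only serve this illustration. The string prefixes follow the style of the examples shown by Raffel et al. (2019), but the exact formats used for a given model should be checked against its documentation.

```python
def to_text_to_text(task, fields, target):
    """Cast one labeled example to an (input string, output string) pair.

    Hypothetical helper: the task names and dictionary keys are assumptions
    made for this sketch, not part of any library.
    """
    if task == "translation":
        source = f"translate English to German: {fields['text']}"
    elif task == "acceptability":
        source = f"cola sentence: {fields['sentence']}"
    elif task == "similarity":
        source = (
            f"stsb sentence1: {fields['sentence1']} "
            f"sentence2: {fields['sentence2']}"
        )
    elif task == "summarization":
        source = f"summarize: {fields['text']}"
    else:
        raise ValueError(f"unknown task: {task}")
    # Even numeric targets (e.g. a similarity score) become plain strings,
    # so every task has the same string-in, string-out interface.
    return source, str(target)


examples = [
    to_text_to_text("translation", {"text": "That is good."}, "Das ist gut."),
    to_text_to_text(
        "acceptability",
        {"sentence": "The course is jumping well."},
        "not acceptable",
    ),
    to_text_to_text(
        "similarity",
        {
            "sentence1": "The rhino grazed on the grass.",
            "sentence2": "A rhino is grazing in a field.",
        },
        3.8,
    ),
]

for source, target in examples:
    print(repr(source), "->", repr(target))
```

Since classification, regression and translation examples all end up as pairs of strings, a single seq-to-seq model can be trained on a mixture of them without task-specific output layers.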

14.2 Improvements of the Self-Attention mechanism

Recently, a lot of effort has been put into improving Self-Attention, mostly by reducing its computational cost and thus enabling models to process longer sequences. One interesting article has already been discussed at the end of Chapter 8, while another interesting piece of work has been published recently by Wang et al. (2020): the so-called Linformer, a form of Self-Attention with linear complexity.
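
To make the cost reduction more tangible, the following is a minimal sketch of the core idea behind the Linformer: keys and values are projected along the sequence dimension from length n down to a fixed length k, so that the attention score matrix has shape n × k instead of n × n. The sketch is a strong simplification under stated assumptions: it uses a single head, omits the learned query, key and value weight matrices, and replaces the learned projection matrices (here called `e` and `f`) by random ones. All function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def full_attention(q, k, v):
    """Standard scaled dot-product attention: the score matrix is n x n."""
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v), scores


def linformer_attention(q, k, v, e, f):
    """Attention with keys/values projected from n to k_dim positions."""
    d = len(q[0])
    k_proj = matmul(e, k)  # (k_dim x n) @ (n x d) -> k_dim x d
    v_proj = matmul(f, v)  # (k_dim x n) @ (n x d) -> k_dim x d
    scores = matmul(q, transpose(k_proj))  # n x k_dim instead of n x n
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scores), v_proj), scores


rng = random.Random(0)
n, d, k_dim = 8, 4, 2  # sequence length, model dimension, projected length
q, k, v = (random_matrix(n, d, rng) for _ in range(3))
# Assumption: random stand-ins for projections that would normally be learned.
e = random_matrix(k_dim, n, rng)
f = random_matrix(k_dim, n, rng)

full_out, full_scores = full_attention(q, k, v)
lin_out, lin_scores = linformer_attention(q, k, v, e, f)
print("full score matrix:", len(full_scores), "x", len(full_scores[0]))
print("projected score matrix:", len(lin_scores), "x", len(lin_scores[0]))
```

With k held fixed, the number of attention scores grows linearly in the sequence length n, while the output keeps the same n × d shape as in standard attention.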

14.3 Evaluation and Interpretability

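While “traditional” benchmarks and benchmark collections have been discussed in Chapter 11, there is a lot of ongoing research on proper evaluation and interpretability of NLP models. Here are just two examples of impressive work which was published very recently:

  • CheckList (Ribeiro et al. (2020)), an approach to behavioral testing of NLP models that goes beyond accuracy
  • The Language Interpretability Tool (Tenney et al. (2020)), which offers extensible, interactive visualizations and analysis for NLP models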

We hope that this little outlook can adequately round off this nice piece of academic work created by extremely motivated students, and we hope that you enjoyed reading it.


Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.”

Clark, Kevin, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. “ELECTRA: Pre-Training Text Encoders as Discriminators Rather Than Generators.” In International Conference on Learning Representations.

Raffel, Colin, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.” arXiv Preprint arXiv:1910.10683.

Ribeiro, Marco Tulio, Tongshuang Wu, Carlos Guestrin, and Sameer Singh. 2020. “Beyond Accuracy: Behavioral Testing of NLP Models with CheckList.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 4902–12. Online: Association for Computational Linguistics.

Tenney, Ian, James Wexler, Jasmijn Bastings, Tolga Bolukbasi, Andy Coenen, Sebastian Gehrmann, Ellen Jiang, et al. 2020. “The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models.” arXiv Preprint arXiv:2008.05122.

Wang, Sinong, Belinda Z. Li, Madian Khabsa, Han Fang, and Hao Ma. 2020. “Linformer: Self-Attention with Linear Complexity.”