gitmyhub

NLP_ability

Python ★ 7.5k updated 3y ago

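A summary and organization of the knowledge a natural language processing (NLP) engineer needs to build up, including interview questions, fundamentals of all kinds, engineering skills, and more, to strengthen core competitiveness.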

Chinese-language reference guide covering NLP theory and engineering, Transformer architecture, BERT variants, word embeddings, knowledge distillation, and interview prep notes for practitioners.

Python, setup: easy, complexity 1/5

This repository is a Chinese-language collection of articles and notes for people working in or studying Natural Language Processing, the field of software that deals with understanding and generating human language. The maintainer assembled it from their own work experience, daily research notes, and paper summaries, with the goal of building up a reference that covers both theory and practical engineering skill.

The content is organized by topic. A large section covers the Transformer architecture, which underpins most modern language AI systems. That section includes a set of common interview questions with detailed answers, explanations of how the encoder and decoder components work, and discussions of design choices such as position encoding approaches and normalization strategies. A separate section focuses on BERT, a well-known language model, and its variants such as RoBERTa, XLNet, ALBERT, and UniLM.
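As an illustration of one of the position encoding approaches mentioned above, the following is a minimal sketch of the fixed sinusoidal encoding from the original Transformer design. It is not taken from the repository; the function name and the pure-Python list representation are assumptions made for this example, and learned or relative encodings would look different.

```python
import math


def sinusoidal_position_encoding(num_positions, d_model):
    """Build a num_positions x d_model table of sinusoidal encodings.

    Hypothetical helper for illustration only. Even columns hold sines and
    odd columns hold cosines, with wavelengths growing geometrically.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even for this sketch")
    table = []
    for pos in range(num_positions):
        row = []
        # i walks over the even column indices 0, 2, 4, ...
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))  # column i
            row.append(math.cos(angle))  # column i + 1
        table.append(row)
    return table


if __name__ == "__main__":
    table = sinusoidal_position_encoding(num_positions=4, d_model=8)
    for pos, row in enumerate(table):
        print(pos, [round(v, 3) for v in row])
```

Each row is added to the token embedding at that position, so the model can tell positions apart without any learned parameters.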

Word embeddings are covered in another section, with multiple articles on Word2Vec that address its two training approaches, the optimization methods, negative sampling, and parameter selection. FastText and GloVe also have dedicated articles. There is a section on knowledge distillation, the technique of training a smaller, faster model to mimic a larger one, which includes articles on TinyBERT, PKD-BERT, and BERT-of-Theseus.
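To make the idea of a small model mimicking a large one concrete, here is a minimal sketch of the generic soft-label distillation loss: the student is penalized by the KL divergence between the teacher's and the student's temperature-softened output distributions. It is not taken from the repository, and it is not the specific objective of TinyBERT, PKD-BERT, or BERT-of-Theseus, which add other mechanisms. The function names and the default temperature are assumptions for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper for illustration. The T**2 factor is the usual
    scaling that keeps gradient magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    print(distillation_loss([4.0, 1.0, 0.5], teacher))  # student matches teacher
    print(distillation_loss([0.5, 1.0, 4.0], teacher))  # student disagrees
```

In practice this term is usually combined with the ordinary cross-entropy loss on the true labels, weighted by a mixing coefficient.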

Other sections address text classification, text similarity and matching, named entity recognition, and multimodal models that combine language with other types of data. The articles are written in Chinese and are primarily aimed at practitioners preparing for NLP engineering interviews or looking to deepen their understanding of specific techniques. The repository does not appear to include runnable code for most topics; it functions as a structured reading and reference guide.

Where it fits