[std::regex/Boost.Regex-c++]
[hyperscan-c++/python], suited to matching a large number of regular expressions at once; x86 only
[QRegExp-c++]
[re-python]
[PCRE/PCRE++-perl/c++]
[google/re2-c++/go/python], suited to matching a large number of regular expressions
comparison
Chinese Lexical Analysis with Deep Bi-GRU-CRF Network -baidu, arxiv2018
[thulac]
[baidu/lac]
HIT-SCIR/ltp
spacy
stanza
[hanlp]
Improving Machine Reading Comprehension with Single-choice Decision and Transfer Learning -tencent, arxiv2020
DUMA: Reading Comprehension with Transposition Thinking -huawei, arxiv2020
DCMN+: Dual co-matching network for multi-choice reading comprehension -cloudwalk, AAAI2020
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations -google, ICLR2020
Dual co-matching network for multi-choice reading comprehension -cloudwalk, arxiv2019
Option comparison network for multiple-choice reading comprehension -tencent, arxiv2019
Neural Machine Reading Comprehension: Methods and Trends -S Liu, AppliedSciences2019
Applying deep learning to answer selection: A study and an open task -IBM, ASRU2015
DREAM
RACE
[SQuAD2.0]
[ARC]
[CoQA]
A survey on deep learning for named entity recognition -TKDE2020
OntoNotes Release 4.0/5.0
MSRA, Word segmentation and named entity recognition
Weibo NER, Named Entity Recognition for Chinese Social Media with Jointly Trained Embeddings
People's Daily (人民日报)
BosonNLP_NER_6C, bosonnlp
CCKS2017/2018/2019/2020 electronic medical record entity annotation
WikiANN/PAN-X
XGLUE
CLUENER2020
baidu/ERNIE
baidu/lac
HIT-SCIR/ltp
spacy
stanza
Tencent UER
CLUEPretrainedModels
Chinese-BERT-wwm
google
Efficient Second-Order TreeCRF for Neural Dependency Parsing -SoochowUniversity, ACL2020, code
Deep Biaffine Attention for Neural Dependency Parsing -Stanford, ICLR2017
baidu/DDParser
HIT-SCIR/ltp
spacy
stanza
A survey on non-autoregressive generation for neural machine translation and beyond -msra, PAMI2023, linker
Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation -tsinghua, ACL2023, code
Directed Acyclic Transformer for Non-Autoregressive Machine Translation -bytedance, ICML2022
Hierarchical Context Tagging for Utterance Rewriting -tencent, AAAI2022
Text generation with text-editing models -NAACL2022
EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start -google, EMNLP2022, code
LEWIS: Levenshtein Editing for Unsupervised Text Style Transfer -ACL2021
LayoutReader: Pre-training of Text and Layout for Reading Order Detection -EMNLP2021, code&dataset
SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition -microsoft, AAAI2023
FastCorrect2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition -microsoft, EMNLP2021, code
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition -microsoft, NeurIPS2021, code
FELIX: Flexible Text Editing Through Tagging and Insertion -google, EMNLP2020, code
Seq2Edits: Sequence transduction using span-level edit operations -google, EMNLP2020
Spelling Error Correction with Soft-Masked BERT -bytedance, arxiv2020
SpellGCN: Incorporating Phonological and Visual Similarities into Language Models for Chinese Spelling Check -alibaba, ACL2020
Encode, Tag, Realize: High-Precision Text Editing -google, EMNLP2019
Levenshtein Transformer -facebook, NIPS2019
Unified Language Model Pre-training for Natural Language Understanding and Generation -NIPS2019
A spelling correction model for end-to-end speech recognition -google, ICASSP2019
Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition -alibaba, arxiv2019
CLUECorpus2020, with Google's original Chinese vocabulary
CLUECorpus2020
brightmart
People's Daily (人民日报), 1998 edition
People's Daily (人民日报), 2014 edition
CLUE benchmark
google, Natural Questions: a Benchmark for Question Answering Research
CMRC (HIT & iFLYTEK), DRCD
CLUE benchmark
THUCTC, an open-source text classification dataset from Tsinghua University
YEDDA