yflv-yanxia/nlp

nlp

RE

[std::regex/Boost.Regex-c++]
[hyperscan-c++/python], suited to matching a large number of regular expressions, x86 only
[QRegExp-c++]
[re-python]
[PCRE/PCRE++-perl/c++]
[google/re2-c++/go/python], suited to matching a large number of regular expressions
comparison
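
Several entries above are noted as handling "a large number of regular expressions". As a rough illustration of that use case, here is a minimal sketch using Python's standard `re` module (listed above as re-python). The pattern names, the patterns themselves, and the `find_all` helper are hypothetical and exist only for this example. This is not how hyperscan or re2 work internally; it only shows the idea of scanning a text once against many patterns.

```python
import re

# Hypothetical patterns, for illustration only.
# Order matters: Python's alternation tries alternatives left to right,
# so the more specific "date" must come before the generic "number".
PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.-]+",
    "date": r"\d{4}-\d{2}-\d{2}",
    "number": r"\d+",
}

# Join every pattern into one alternation of named groups,
# so the text is scanned in a single pass.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in PATTERNS.items())
)


def find_all(text):
    """Return (pattern_name, matched_text) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in COMBINED.finditer(text)]


if __name__ == "__main__":
    print(find_all("mail a@b.com on 2020-01-02, id 42"))
```

With a backtracking engine such as `re`, this approach tends to slow down as the pattern set grows, which is presumably why the list singles out hyperscan and re2 for large pattern sets.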

LAC

Chinese Lexical Analysis with Deep Bi-GRU-CRF Network -baidu, arxiv2018

pretrained models

[thulac]
[baidu/lac]
HIT-SCIR/ltp
spacy
stanza
[hanlp]

Machine Reading Comprehension

Improving Machine Reading Comprehension with Single-choice Decision and Transfer Learning -tencent, arxiv2020
DUMA: Reading Comprehension with Transposition Thinking -huawei, arxiv2020
DCMN+: Dual co-matching network for multi-choice reading comprehension -cloudwalk, AAAI2020
Albert: A lite bert for self-supervised learning of language representations -google, ICLR2020
Dual co-matching network for multi-choice reading comprehension -cloudwalk, arxiv2019
Option comparison network for multiple-choice reading comprehension -tencent, arxiv2019
Neural Machine Reading Comprehension: Methods and Trends -S Liu, AppliedSciences2019
Applying deep learning to answer selection: A study and an open task -IBM, ASRU2015

database

DREAM
RACE
[SQuAD2.0]
[ARC]
[CoQA]

NER

A survey on deep learning for named entity recognition -TKDE2020

database

Ontonotes release 4.0/5.0
MSRA, Word segmentation and named entity recognition
Weibo NER, recognition for Chinese social media with jointly trained embeddings
People's Daily
BosonNLP_NER_6C, bosonnlp
CCKS2017/2018/2019/2020 electronic medical record entity annotation
WikiANN/PAN-X
XGLUE
CLUENER2020

pretrained models

baidu/ERNIE
baidu/lac
HIT-SCIR/ltp
spacy
stanza
Tencent UER
CLUEPretrainedModels
Chinese-BERT-wwm
google

Dependency Parsing

Efficient Second-Order TreeCRF for Neural Dependency Parsing -SoochowUniversity, ACL2020, code
Deep Biaffine Attention for Neural Dependency Parsing -Stanford, ICLR2017

database

pretrained models

baidu/DDParser
HIT-SCIR/ltp
spacy
stanza

Post Editing

A survey on non-autoregressive generation for neural machine translation and beyond -msra, PAMI2023, linker
Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation -tsinghua, ACL2023, code
Directed Acyclic Transformer for Non-Autoregressive Machine Translation -bytedance, ICML2022
Hierarchical Context Tagging for Utterance Rewriting -tencent, AAAI2022
Text generation with text-editing models -NAACL2022
EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start -google, EMNLP2022, code
LEWIS: Levenshtein Editing for Unsupervised Text Style Transfer -ACL2021
LayoutReader: Pre-training of Text and Layout for Reading Order Detection -EMNLP2021, code&dataset
Softcorrect: Error correction with soft detection for automatic speech recognition -microsoft, AAAI2023
FastCorrect2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition -microsoft, EMNLP2021, code
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition -microsoft, NeurIPS2021, code
FELIX: Flexible Text Editing Through Tagging and Insertion -google, EMNLP2020, code
Seq2Edits: Sequence transduction using span-level edit operations -google, EMNLP2020
Spelling Error Correction with Soft-Masked BERT -bytedance, arxiv2020
SpellGCN: Incorporating Phonological and Visual Similarities into Language Models for Chinese Spelling Check -alibaba, ACL2020
Encode, Tag, Realize: High-Precision Text Editing -google, EMNLP2019
Levenshtein transformer -facebook, NIPS2019
Unified Language Model Pre-training for Natural Language Understanding and Generation -NIPS2019
A spelling correction model for end-to-end speech recognition -google, ICASSP 2019
Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition -alibaba, arxiv2019

dict

CLUECorpus2020, Google's original Chinese vocabulary

pretrain

CLUECorpus2020
brightmart
People's Daily, 1998 edition
People's Daily, 2014 edition

CLUE

Chinese Medical Information Processing Challenge Leaderboard (CBLUE), database

QA

english

CLUE benchmark google, Natural Questions: a Benchmark for Question Answering Research

chinese

HIT and iFLYTEK CMRC, DRCD

cls

CLUE benchmark, THUCTC (open-source text classification dataset from Tsinghua University)

labeling tools

YEDDA
