- Evolution Strategies as a Scalable Alternative to Reinforcement Learning [arXiv]
- Controllable Text Generation [arXiv]
- Neural Episodic Control [arXiv]
- A Structured Self-attentive Sentence Embedding [arXiv]
- Multi-step Reinforcement Learning: A Unifying Algorithm [arXiv]
- Neural Machine Translation and Sequence-to-sequence Models: A Tutorial [arXiv]
- Large-Scale Evolution of Image Classifiers [arXiv]
- FeUdal Networks for Hierarchical Reinforcement Learning [arXiv]
- Evolving Deep Neural Networks [arXiv]
- The Shattered Gradients Problem: If resnets are the answer, then what is the question? [arXiv]
- Neural Map: Structured Memory for Deep Reinforcement Learning [arXiv]
- Bridging the Gap Between Value and Policy Based Reinforcement Learning [arXiv]
- Deep Voice: Real-time Neural Text-to-Speech [arXiv]
- Beating the World's Best at Super Smash Bros. with Deep Reinforcement Learning [arXiv]
- The Game Imitation: Deep Supervised Convolutional Networks for Quick Video Game AI [arXiv]
- Learning to Parse and Translate Improves Neural Machine Translation [arXiv]
- All-but-the-Top: Simple and Effective Postprocessing for Word Representations [arXiv]
- Deep Learning with Dynamic Computation Graphs [arXiv]
- Skip Connections as Effective Symmetry-Breaking [arXiv]
- Semi-Supervised QA with Generative Domain-Adaptive Nets [arXiv]
- Wasserstein GAN [arXiv]
- Deep Reinforcement Learning: An Overview [arXiv]
- DyNet: The Dynamic Neural Network Toolkit [arXiv]
- DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker [arXiv]
- NIPS 2016 Tutorial: Generative Adversarial Networks [arXiv]
- A recurrent neural network without chaos [arXiv]
- Language Modeling with Gated Convolutional Networks [arXiv]
- How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs [arXiv]
- Improving Neural Language Models with a Continuous Cache [arXiv]
- DeepMind Lab [arXiv]
- Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning [arXiv]
- Overcoming catastrophic forgetting in neural networks [arXiv]
- Image-to-Image Translation with Conditional Adversarial Networks [arXiv]
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer [OpenReview]
- Learning to reinforcement learn [arXiv]
- A Way out of the Odyssey: Analyzing and Combining Recent Insights for LSTMs [arXiv]
- Adversarial Training Methods for Semi-Supervised Text Classification [arXiv]
- Importance Sampling with Unequal Support [arXiv]
- Quasi-Recurrent Neural Networks [arXiv]
- Capacity and Learnability in Recurrent Neural Networks [OpenReview]
- Unrolled Generative Adversarial Networks [OpenReview]
- Deep Information Propagation [OpenReview]
- Structured Attention Networks [OpenReview]
- Incremental Sequence Learning [arXiv]
- b-GAN: Unified Framework of Generative Adversarial Networks [OpenReview]
- A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks [OpenReview]
- Categorical Reparameterization with Gumbel-Softmax [arXiv]
- Lip Reading Sentences in the Wild [arXiv]
Reinforcement Learning
- Learning to reinforcement learn [arXiv]
- A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models [arXiv]
- The Predictron: End-To-End Learning and Planning [OpenReview]
- Third-Person Imitation Learning [OpenReview]
- Generalizing Skills with Semi-Supervised Reinforcement Learning [OpenReview]
- Sample Efficient Actor-Critic with Experience Replay [OpenReview]
- Reinforcement Learning with Unsupervised Auxiliary Tasks [arXiv]
- Neural Architecture Search with Reinforcement Learning [OpenReview]
- Towards Information-Seeking Agents [OpenReview]
- Multi-Agent Cooperation and the Emergence of (Natural) Language [OpenReview]
- Improving Policy Gradient by Exploring Under-appreciated Rewards [OpenReview]
- Stochastic Neural Networks for Hierarchical Reinforcement Learning [OpenReview]
- Tuning Recurrent Neural Networks with Reinforcement Learning [OpenReview]
- RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning [arXiv]
- Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning [OpenReview]
- Learning to Perform Physics Experiments via Deep Reinforcement Learning [OpenReview]
- Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU [OpenReview]
- Learning to Compose Words into Sentences with Reinforcement Learning [OpenReview]
- Deep Reinforcement Learning for Accelerating the Convergence Rate [OpenReview]
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning [arXiv]
- Learning to Navigate in Complex Environments [arXiv]
- Unsupervised Perceptual Rewards for Imitation Learning [OpenReview]
- Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic [OpenReview]
Machine Translation & Dialog
- Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation [arXiv]
- Neural Machine Translation with Reconstruction [arXiv]
- Iterative Refinement for Machine Translation [OpenReview]
- A Convolutional Encoder Model for Neural Machine Translation [arXiv]
- Improving Neural Language Models with a Continuous Cache [OpenReview]
- Vocabulary Selection Strategies for Neural Machine Translation [OpenReview]
- Towards an automatic Turing test: Learning to evaluate dialogue responses [OpenReview]
- Dialogue Learning With Human-in-the-Loop [OpenReview]
- Batch Policy Gradient Methods for Improving Neural Conversation Models [OpenReview]
- Learning through Dialogue Interactions [OpenReview]
- Dual Learning for Machine Translation [arXiv]
- Unsupervised Pretraining for Sequence to Sequence Learning [arXiv]
- Understanding deep learning requires rethinking generalization [arXiv]
- Neural Machine Translation in Linear Time [arXiv]
- Professor Forcing: A New Algorithm for Training Recurrent Networks [arXiv]
- Learning to Protect Communications with Adversarial Neural Cryptography [arXiv]
- Can Active Memory Replace Attention? [arXiv]
- Using Fast Weights to Attend to the Recent Past [arXiv]
- Fully Character-Level Neural Machine Translation without Explicit Segmentation [arXiv]
- Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models [arXiv]
- Video Pixel Networks [arXiv]
- Connecting Generative Adversarial Networks and Actor-Critic Methods [arXiv]
- Learning to Translate in Real-time with Neural Machine Translation [arXiv]
- Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search [arXiv]
- Pointer Sentinel Mixture Models [arXiv]
- Towards Deep Symbolic Reinforcement Learning [arXiv]
- HyperNetworks [arXiv]
- Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation [arXiv]
- Safe and Efficient Off-Policy Reinforcement Learning [arXiv]
- Playing FPS Games with Deep Reinforcement Learning [arXiv]
- SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient [arXiv]
- Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks [arXiv]
- Energy-based Generative Adversarial Network [arXiv]
- Stealing Machine Learning Models via Prediction APIs [arXiv]
- Semi-Supervised Classification with Graph Convolutional Networks [arXiv]
- WaveNet: A Generative Model For Raw Audio [arXiv]
- Hierarchical Multiscale Recurrent Neural Networks [arXiv]
- End-to-End Reinforcement Learning of Dialogue Agents for Information Access [arXiv]
- Deep Neural Networks for YouTube Recommendations [paper]
- Machine Comprehension Using Match-LSTM and Answer Pointer [arXiv]
- Stacked Approximated Regression Machine: A Simple Deep Learning Approach [arXiv]
- Decoupled Neural Interfaces using Synthetic Gradients [arXiv]
- WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia [arXiv]
- Temporal Attention Model for Neural Machine Translation [arXiv]
- Residual Networks of Residual Networks: Multilevel Residual Networks [arXiv]
- Learning Online Alignments with Continuous Rewards Policy Gradient [arXiv]
- An Actor-Critic Algorithm for Sequence Prediction [arXiv]
- Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner [arXiv]
- Recurrent Neural Machine Translation [arXiv]
- MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition [arXiv]
- Layer Normalization [arXiv]
- Neural Machine Translation with Recurrent Attention Modeling [arXiv]
- Neural Semantic Encoders [arXiv]
- Attention-over-Attention Neural Networks for Reading Comprehension [arXiv]
- sk_p: a neural program corrector for MOOCs [arXiv]
- Recurrent Highway Networks [arXiv]
- Bag of Tricks for Efficient Text Classification [arXiv]
- Context-Dependent Word Representation for Neural Machine Translation [arXiv]
- Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes [arXiv]
- Sequence-to-Sequence Learning as Beam-Search Optimization [arXiv]
- Policy Networks with Two-Stage Training for Dialogue Systems [arXiv]
- Towards an integration of deep learning and neuroscience [arXiv]
- On Multiplicative Integration with Recurrent Neural Networks [arXiv]
- Online and Offline Handwritten Chinese Character Recognition [arXiv]
- Tutorial on Variational Autoencoders [arXiv]
- Concrete Problems in AI Safety [arXiv]
- Deep Reinforcement Learning Discovers Internal Models [arXiv]
- SQuAD: 100,000+ Questions for Machine Comprehension of Text [arXiv]
- Conditional Image Generation with PixelCNN Decoders [arXiv]
- Model-Free Episodic Control [arXiv]
- Improved Techniques for Training GANs [arXiv]
- Memory-Efficient Backpropagation Through Time [arXiv]
- InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets [arXiv]
- Zero-Resource Translation with Multi-Lingual Neural Machine Translation [arXiv]
- Key-Value Memory Networks for Directly Reading Documents [arXiv]
- Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation [arXiv]
- Learning to learn by gradient descent by gradient descent [arXiv]
- Learning Language Games through Interaction [arXiv]
- Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations [arXiv]
- Smart Reply: Automated Response Suggestion for Email [arXiv]
- Virtual Adversarial Training for Semi-Supervised Text Classification [arXiv]
- Deep Reinforcement Learning for Dialogue Generation [arXiv]
- Very Deep Convolutional Networks for Natural Language Processing [arXiv]
- Neural Net Models for Open-Domain Discourse Coherence [arXiv]
- Neural Architectures for Fine-grained Entity Type Classification [arXiv]
- Gated-Attention Readers for Text Comprehension [arXiv]
- End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning [arXiv]
- Iterative Alternating Neural Attention for Machine Reading [arXiv]
- Memory-enhanced Decoder for Neural Machine Translation [arXiv]
- Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation [arXiv]
- Conversational Contextual Cues: The Case of Personalization and History for Response Ranking [arXiv]
- Adversarially Learned Inference [arXiv]
- Neural Network Translation Models for Grammatical Error Correction [arXiv]
- Hierarchical Memory Networks [arXiv]
- Deep API Learning [arXiv]
- Wide Residual Networks [arXiv]
- TensorFlow: A system for large-scale machine learning [arXiv]
- Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention [arXiv]
- Aspect Level Sentiment Classification with Deep Memory Network [arXiv]
- FractalNet: Ultra-Deep Neural Networks without Residuals [arXiv]
- Learning End-to-End Goal-Oriented Dialog [arXiv]
- One-shot Learning with Memory-Augmented Neural Networks [arXiv]
- Deep Learning without Poor Local Minima [arXiv]
- AVEC 2016 - Depression, Mood, and Emotion Recognition Workshop and Challenge [arXiv]
- Data Programming: Creating Large Training Sets, Quickly [arXiv]
- Deeply-Fused Nets [arXiv]
- Deep Portfolio Theory [arXiv]
- Unsupervised Learning for Physical Interaction through Video Prediction [arXiv]
- Movie Description [arXiv]
- Higher Order Recurrent Neural Networks [arXiv]
- Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition [arXiv]
- Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation [arXiv]
- The IBM 2016 English Conversational Telephone Speech Recognition System [arXiv]
- Dialog-based Language Learning [arXiv]
- Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss [arXiv]
- Sentence-Level Grammatical Error Identification as Sequence-to-Sequence Correction [arXiv]
- A Network-based End-to-End Trainable Task-oriented Dialogue System [arXiv]
- Visual Storytelling [arXiv]
- Improving the Robustness of Deep Neural Networks via Stability Training [arXiv]
- Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex [arXiv]
- Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention [arXiv]
- Sentence Level Recurrent Topic Model: Letting Topics Speak for Themselves [arXiv]
- Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models [arXiv]
- Building Machines That Learn and Think Like People [arXiv]
- A Semisupervised Approach for Language Identification based on Ladder Networks [arXiv]
- Deep Networks with Stochastic Depth [arXiv]
- PHOCNet: A Deep Convolutional Neural Network for Word Spotting in Handwritten Documents [arXiv]
- Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning [arXiv]
- A Fast Unified Model for Parsing and Sentence Understanding [arXiv]
- Latent Predictor Networks for Code Generation [arXiv]
- Attend, Infer, Repeat: Fast Scene Understanding with Generative Models [arXiv]
- Recurrent Batch Normalization [arXiv]
- Neural Language Correction with Character-Based Attention [arXiv]
- Incorporating Copying Mechanism in Sequence-to-Sequence Learning [arXiv]
- How NOT To Evaluate Your Dialogue System [arXiv]
- Adaptive Computation Time for Recurrent Neural Networks [arXiv]
- A guide to convolution arithmetic for deep learning [arXiv]
- Colorful Image Colorization [arXiv]
- Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles [arXiv]
- Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus [arXiv]
- A Persona-Based Neural Conversation Model [arXiv]
- A Character-level Decoder without Explicit Segmentation for Neural Machine Translation [arXiv]
- Multi-Task Cross-Lingual Sequence Tagging from Scratch [arXiv]
- Neural Variational Inference for Text Processing [arXiv]
- Recurrent Dropout without Memory Loss [arXiv]
- One-Shot Generalization in Deep Generative Models [arXiv]
- Recursive Recurrent Nets with Attention Modeling for OCR in the Wild [arXiv]
- A New Method to Visualize Deep Neural Networks [arXiv]
- Neural Architectures for Named Entity Recognition [arXiv]
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF [arXiv]
- Character-based Neural Machine Translation [arXiv]
- Learning Word Segmentation Representations to Improve Named Entity Recognition for Chinese Social Media [arXiv]
- Architectural Complexity Measures of Recurrent Neural Networks [arXiv]
- Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks [arXiv]
- Recurrent Neural Network Grammars [arXiv]
- Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations [arXiv]
- Contextual LSTM (CLSTM) models for Large scale NLP tasks [arXiv]
- Sequence-to-Sequence RNNs for Text Summarization [arXiv]
- Extraction of Salient Sentences from Labelled Documents [arXiv]
- Learning Distributed Representations of Sentences from Unlabelled Data [arXiv]
- Benefits of depth in neural networks [arXiv]
- Associative Long Short-Term Memory [arXiv]
- Generating images with recurrent adversarial networks [arXiv]
- Exploring the Limits of Language Modeling [arXiv]
- Swivel: Improving Embeddings by Noticing What’s Missing [arXiv]
- WebNav: A New Large-Scale Task for Natural Language based Sequential Decision Making [arXiv]
- Efficient Character-level Document Classification by Combining Convolution and Recurrent Layers [arXiv]
- BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 [arXiv]
- Learning Discriminative Features via Label Consistent Neural Network [arXiv]
- Pixel Recurrent Neural Networks [arXiv]
- Bitwise Neural Networks [arXiv]
- Long Short-Term Memory-Networks for Machine Reading [arXiv]
- Coverage-based Neural Machine Translation [arXiv]
- Understanding Deep Convolutional Networks [arXiv]
- Training Recurrent Neural Networks by Diffusion [arXiv]
- Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures [arXiv]
- Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism [arXiv]
- Recurrent Memory Network for Language Modeling [arXiv]
- Language to Logical Form with Neural Attention [arXiv]
- Learning to Compose Neural Networks for Question Answering [arXiv]
- The Inevitability of Probability: Probabilistic Inference in Generic Neural Networks Trained with Non-Probabilistic Feedback [arXiv]
- COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images [arXiv]
- Survey on the attention based RNN model and its applications in computer vision [arXiv]
NLP
- Strategies for Training Large Vocabulary Neural Language Models [arXiv]
- Multilingual Language Processing From Bytes [arXiv]
- Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews [arXiv]
- Target-Dependent Sentiment Classification with Long Short Term Memory [arXiv]
- Reading Text in the Wild with Convolutional Neural Networks [arXiv]
Vision
- Deep Residual Learning for Image Recognition [arXiv]
- Rethinking the Inception Architecture for Computer Vision [arXiv]
- Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks [arXiv]
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin [arXiv]
NLP
- Deep Reinforcement Learning with a Natural Language Action Space [arXiv]
- Sequence Level Training with Recurrent Neural Networks [arXiv]
- Teaching Machines to Read and Comprehend [arXiv]
- Semi-supervised Sequence Learning [arXiv]
- Multi-task Sequence to Sequence Learning [arXiv]
- Alternative structures for character-level RNNs [arXiv]
- Larger-Context Language Modeling [arXiv]
- A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding [arXiv]
- Towards Universal Paraphrastic Sentence Embeddings [arXiv]
- BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies [arXiv]
- Natural Language Understanding with Distributed Representation [arXiv]
- sense2vec - A Fast and Accurate Method for Word Sense Disambiguation In Neural Word Embeddings [arXiv]
- LSTM-based Deep Learning Models for non-factoid answer selection [arXiv]
Programs
- Neural Random-Access Machines [arXiv]
- Neural Programmer: Inducing Latent Programs with Gradient Descent [arXiv]
- Neural Programmer-Interpreters [arXiv]
- Learning Simple Algorithms from Examples [arXiv]
- Neural GPUs Learn Algorithms [arXiv]
- On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models [arXiv]
Vision
- ReSeg: A Recurrent Neural Network for Object Segmentation [arXiv]
- Deconstructing the Ladder Network Architecture [arXiv]
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [arXiv]
General
- Towards Principled Unsupervised Learning [arXiv]
- Dynamic Capacity Networks [arXiv]
- Generating Sentences from a Continuous Space [arXiv]
- Net2Net: Accelerating Learning via Knowledge Transfer [arXiv]
- A Roadmap towards Machine Intelligence [arXiv]
- Session-based Recommendations with Recurrent Neural Networks [arXiv]
- Regularizing RNNs by Stabilizing Activations [arXiv]
- A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification [arXiv]
- Attention with Intention for a Neural Network Conversation Model [arXiv]
- Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network [arXiv]
- A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas [arXiv]
- A Primer on Neural Network Models for Natural Language Processing [arXiv]
- A Diversity-Promoting Objective Function for Neural Conversation Models [arXiv]
- Character-level Convolutional Networks for Text Classification [arXiv]
- A Neural Attention Model for Abstractive Sentence Summarization [arXiv]
- Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games [arXiv]
- Neural Machine Translation of Rare Words with Subword Units [arXiv]
- Listen, Attend and Spell [arXiv]
- Character-Aware Neural Language Models [arXiv]
- Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs [arXiv]
- Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation [arXiv]
- Effective Approaches to Attention-based Neural Machine Translation [arXiv]
- Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models [arXiv]
- Semi-Supervised Learning with Ladder Networks [arXiv]
- Document Embedding with Paragraph Vectors [arXiv]
- Training Very Deep Networks [arXiv]
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning [arXiv]
- A Neural Network Approach to Context-Sensitive Generation of Conversational Responses [arXiv]
- A Neural Conversational Model [arXiv]
- Skip-Thought Vectors [arXiv]
- Pointer Networks [arXiv]
- Spatial Transformer Networks [arXiv]
- Tree-structured composition in neural networks without tree-structured architectures [arXiv]
- Visualizing and Understanding Neural Models in NLP [arXiv]
- Learning to Transduce with Unbounded Memory [arXiv]
- Ask Me Anything: Dynamic Memory Networks for Natural Language Processing [arXiv]
- Deep Knowledge Tracing [arXiv]
- ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks [arXiv]
- Reinforcement Learning Neural Turing Machines [arXiv]
- Correlational Neural Networks [arXiv]
- Distilling the Knowledge in a Neural Network [arXiv]
- End-To-End Memory Networks [arXiv]
- Neural Responding Machine for Short-Text Conversation [arXiv]
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift [arXiv]
- Text Understanding from Scratch [arXiv]
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention [arXiv]
- Learning Longer Memory in Recurrent Neural Networks [arXiv]
- Neural Turing Machines [arXiv]
- Grammar as a Foreign Language [arXiv]
- On Using Very Large Target Vocabulary for Neural Machine Translation [arXiv]
- Effective Use of Word Order for Text Categorization with Convolutional Neural Networks [arXiv]
- Multiple Object Recognition with Visual Attention [arXiv]
- Sequence to Sequence Learning with Neural Networks [arXiv]
- Neural Machine Translation by Jointly Learning to Align and Translate [arXiv]
- On the Properties of Neural Machine Translation: Encoder-Decoder Approaches [arXiv]
- Recurrent Neural Network Regularization [arXiv]
- Very Deep Convolutional Networks for Large-Scale Image Recognition [arXiv]
- Going Deeper with Convolutions [arXiv]
- Convolutional Neural Networks for Sentence Classification [arXiv]
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation [arXiv]
- Recurrent Models of Visual Attention [arXiv]
- Generative Adversarial Networks [arXiv]
- A Convolutional Neural Network for Modelling Sentences [arXiv]
- Visualizing and Understanding Convolutional Networks [arXiv]
- DeViSE: A Deep Visual-Semantic Embedding Model [pub]
- Maxout Networks [arXiv]
- Exploiting Similarities among Languages for Machine Translation [arXiv]
- Efficient Estimation of Word Representations in Vector Space [arXiv]
- Natural Language Processing (almost) from Scratch [arXiv]