The curse of memory refers to the difficulty of learning long-term memory with recurrent models. Although recurrent models benefit from low inference costs, this curse restricts their effectiveness on tasks involving long sequences. In this paper, we study the curse-of-memory phenomenon for nonlinear RNNs. It is shown that simply adding nonlinear activations such as hardtanh and tanh does not relax the curse. Using a stable reparameterisation, such as the exp parameterisation or the softplus parameterisation, relaxes the curse of memory and achieves stable approximation of long-term memory.
Curse of memory in linear RNNs
Let the (continuous-time) linear RNN be

$$\frac{\mathrm{d}h_t}{\mathrm{d}t} = W h_t + U x_t, \qquad \hat{y}_t = c^\top h_t.$$

Its output can be written as $\hat{y}_t = \int_0^{\infty} \rho(s)\, x_{t-s}\, \mathrm{d}s$ with memory function $\rho(s) = c^\top e^{W s} U$; for a stable $W$ (eigenvalues with negative real part), $\rho$ decays exponentially.
*Figure: exponential decaying memory can be stably approximated, while polynomial decaying memory cannot be stably approximated.*
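As a quick numerical illustration (not code from this repo; the dimensions, random initialisation, and use of numpy/scipy are arbitrary choices), one can compute the memory function of a randomly initialised stable linear RNN and observe its exponential decay:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d = 16  # hidden dimension (arbitrary choice)

# Random recurrent matrix, shifted so that all eigenvalues have negative real part (stable).
W = rng.normal(size=(d, d)) / np.sqrt(d)
W -= (np.max(np.linalg.eigvals(W).real) + 0.5) * np.eye(d)

U = rng.normal(size=(d, 1))
c = rng.normal(size=(d, 1))

# Memory function rho(t) = c^T exp(Wt) U: influence of an input at time 0 on the output at time t.
ts = np.linspace(0.0, 20.0, 201)
rho = np.array([(c.T @ expm(W * t) @ U).item() for t in ts])

# Exponential decay: log|rho| falls off roughly linearly in t.
print(np.log(np.abs(rho[::50]) + 1e-300))
```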
Curse of memory in nonlinear RNNs
Next, we stay with the polynomial decaying memory task. We show that commonly-used activations (hardtanh and tanh) do not by themselves relax the difficulty of approximating polynomial decaying memory.
*Figure: hardtanh and tanh activations on the polynomial decaying memory task.*
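For concreteness, here is a minimal sketch of such a target (our own illustration rather than the repo's data pipeline; the kernel exponent, sequence length, and batch size are arbitrary): the label is a causal convolution of the input with a polynomially decaying kernel, so predicting it requires retaining information over long horizons.

```python
import numpy as np

rng = np.random.default_rng(0)
T, batch = 128, 32                 # sequence length and batch size (illustrative)

# Polynomially decaying memory kernel rho(s) ~ (1 + s)^(-1.1).
s = np.arange(T)
rho = (1.0 + s) ** -1.1

x = rng.normal(size=(batch, T))

# Target y_t = sum_{s <= t} rho(s) * x_{t-s}: a linear functional with long (polynomial) memory.
y = np.stack([np.convolve(xi, rho)[:T] for xi in x])

print(x.shape, y.shape)            # (32, 128) (32, 128)
```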
Proper parameterisation enables stable approximation for long memory
We call parameterisations that enable stable approximation of long-term memory *stable parameterisations*.
Parameterisation | Exponential decay | Polynomial decay |
---|---|---|
Diagonal RNN | Stable | Unstable |
Vanilla RNN | Stable | Unstable |
State-space model | Stable | Unstable |
Linear Recurrent Unit | Stable | Unstable |
Stable Reparameterisation | Stable | Stable |
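As a code-level sketch of the last row of the table (an illustration only, not the repo's implementation; the module name, diagonal recurrence, and Euler discretisation are assumptions), the recurrent eigenvalues are trained through a map that lands in the stable region for every parameter value, for example $\lambda = -e^{m}$ (exp parameterisation) or $\lambda = -\mathrm{softplus}(m)$ (softplus parameterisation):

```python
import torch
import torch.nn.functional as F
from torch import nn


class StableDiagonalRNN(nn.Module):
    """Diagonal linear recurrence whose eigenvalues are kept stable by reparameterisation.

    Sketch only: lambda = -exp(m) ("exp") or lambda = -softplus(m) ("softplus"),
    so Re(lambda) < 0 for every value of the trainable parameter m.
    """

    def __init__(self, d_hidden: int, param: str = "exp", dt: float = 0.1):
        super().__init__()
        self.m = nn.Parameter(torch.randn(d_hidden))                   # free (unconstrained) parameter
        self.u = nn.Parameter(torch.randn(d_hidden))                   # input weights
        self.c = nn.Parameter(torch.randn(d_hidden) / d_hidden**0.5)   # readout
        self.param, self.dt = param, dt

    def eigenvalues(self) -> torch.Tensor:
        if self.param == "exp":
            return -torch.exp(self.m)      # exp parameterisation
        return -F.softplus(self.m)         # softplus parameterisation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time). Explicit-Euler step of dh/dt = lam * h + u * x_t.
        lam = self.eigenvalues()
        h = x.new_zeros(x.shape[0], lam.shape[0])
        outputs = []
        for t in range(x.shape[1]):
            h = h + self.dt * (lam * h + self.u * x[:, t, None])
            outputs.append(h @ self.c)
        return torch.stack(outputs, dim=1)  # (batch, time)


# Usage sketch: y = StableDiagonalRNN(64, param="softplus")(torch.randn(8, 128))
```

Because the reparameterisation always maps onto the stable region, gradient updates cannot push the recurrence into instability.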
*Figure: vanilla RNN vs. stable parameterisation.*
Vanilla RNNs place the nonlinear activation inside the recurrence.

Discrete-time case: $h_{k+1} = \sigma(W h_k + U x_k + b), \quad \hat{y}_k = c^\top h_k$.

Continuous-time case: $\frac{\mathrm{d}h_t}{\mathrm{d}t} = \sigma(W h_t + U x_t + b), \quad \hat{y}_t = c^\top h_t$.

State-space models refer to linear RNNs with layer-wise nonlinear activations: the recurrence stays linear and the nonlinearity is applied to the readout.

Discrete-time case: $h_{k+1} = W h_k + U x_k + b, \quad \hat{y}_k = \sigma(c^\top h_k)$.

Continuous-time case: $\frac{\mathrm{d}h_t}{\mathrm{d}t} = W h_t + U x_t + b, \quad \hat{y}_t = \sigma(c^\top h_t)$.
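To make the distinction concrete, here is a minimal PyTorch-style sketch (our own illustration, not code from this repo) of a single discrete-time update for each model:

```python
import torch


def rnn_step(h, x, W, U, b):
    # Vanilla RNN: the nonlinearity sits inside the recurrence,
    # h_{k+1} = sigma(W h_k + U x_k + b).
    return torch.tanh(h @ W.T + x @ U.T + b)


def ssm_step(h, x, W, U, b):
    # State-space model: the recurrence stays linear,
    # h_{k+1} = W h_k + U x_k + b.
    return h @ W.T + x @ U.T + b


def ssm_readout(h, c):
    # The nonlinearity is applied layer-wise, outside the recurrence.
    return torch.tanh(h @ c)
```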
```bash
# clone project
git clone https://github.com/radarFudan/Curse-of-memory
cd Curse-of-memory

conda create -n CoM python=3.10
conda activate CoM

pip install -r requirements.txt
```
```bibtex
@inproceedings{wang2024inverse,
  title={Inverse Approximation Theory for Nonlinear Recurrent Neural Networks},
  author={Shida Wang and Zhong Li and Qianxiao Li},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=yC2waD70Vj}
}

@inproceedings{li2021on,
  title={On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis},
  author={Zhong Li and Jiequn Han and Weinan E and Qianxiao Li},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=8Sqhl-nF50}
}

@inproceedings{wang2023statespace,
  title={State-space models with layer-wise nonlinearity are universal approximators with exponential decaying memory},
  author={Shida Wang and Beichen Xue},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=i0OmcF14Kf}
}

@misc{wang2023stablessm,
  title={StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization},
  author={Shida Wang and Qianxiao Li},
  year={2023},
  eprint={2311.14495},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@article{JML-2-1,
  author={Haotian Jiang and Qianxiao Li and Zhong Li and Shida Wang},
  title={A Brief Survey on the Approximation Theory for Sequence Modelling},
  journal={Journal of Machine Learning},
  year={2023},
  volume={2},
  number={1},
  pages={1--30},
  abstract={We survey current developments in the approximation theory of sequence modelling in machine learning. Particular emphasis is placed on classifying existing results for various model architectures through the lens of classical approximation paradigms, and the insights one can gain from these results. We also outline some future research directions towards building a theory of sequence modelling.},
  issn={2790-2048},
  doi={https://doi.org/10.4208/jml.221221},
  url={http://global-sci.org/intro/article_detail/jml/21511.html}
}
```