# RNNs as State-space Systems

It’s fantastic how you can often use concepts from one field to investigate ideas in another area and improve your understanding of both areas. That’s one of the things I enjoy most.

We’ve just started studying state-space models in 3F2 Systems and Control (a third-year Engineering course at Cambridge). It’s reminded me strongly of recurrent neural networks (RNNs). Look at the first sentence of the handout:

‘The essence of a dynamical system is its memory, i.e. the present output, $y(t)$, depends on past inputs, $u(\tau)$, for $\tau \leq t$.’

We are also given that:

- Three sets of variables define a dynamical system: the inputs, the state variables and the outputs. This is the state-space representation of the system.
- The state depends only on previous states and on inputs up to and including the input for that timestep.
- The State Property: all you need to know about the past up to $t_0$ is $x(t_0)$. That is, the state summarises the effect on the future of inputs and states prior to $t_0$.
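To see the State Property in action, here is a minimal sketch using a made-up scalar linear system $x_{k+1} = a x_k + b u_k$, $y_k = c x_k + d u_k$ (the coefficients `A`, `B`, `C`, `D` and the helper names are hypothetical, chosen only for illustration). We run the system once from the start, and once more where we stop at $t_0$, throw away everything except the state, and restart from it. The outputs from $t_0$ onwards should agree.

```python
import random

# Hypothetical scalar linear system, chosen only for illustration:
#   x_{k+1} = A*x_k + B*u_k,   y_k = C*x_k + D*u_k
A, B, C, D = 0.9, 0.5, 2.0, 0.1

def step(x, u):
    """Return (next state, current output) for one time step."""
    return A * x + B * u, C * x + D * u

def simulate(x0, inputs):
    """Run the system from state x0; return (list of outputs, final state)."""
    x, ys = x0, []
    for u in inputs:
        x, y = step(x, u)
        ys.append(y)
    return ys, x

random.seed(0)
u = [random.uniform(-1, 1) for _ in range(20)]
t0 = 8

# Full run from k = 0.
y_full, _ = simulate(0.0, u)

# Run up to t0, keep only the state x(t0), then restart from it
# using only the inputs from t0 onwards.
_, x_t0 = simulate(0.0, u[:t0])
y_restart, _ = simulate(x_t0, u[t0:])

# The future outputs match: x(t0) summarised the whole past.
print(all(abs(p - q) < 1e-12 for p, q in zip(y_full[t0:], y_restart)))  # True
```

The same experiment works for any system written in state-space form, including an RNN whose hidden state is saved and reloaded.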

You can see that RNNs, with their hidden states, fit these descriptions perfectly, so RNNs are examples of dynamical systems. More specifically, the standard form for discrete-time state-space models is:

$$\underline{x}_{k+1} = \underline{f}(\underline{x}_k, \underline{u}_k, k)$$
$$\underline{y}_k = \underline{g}(\underline{x}_k, \underline{u}_k, k)$$

and the equations for RNNs are:

$$h_i = \sigma(W_{hh}h_{i-1} + W_{hx}x_i + b_h)$$
$$y_i = W_{yh}h_i$$

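which are already in standard form (up to a relabelling of the time index): the hidden state $h$ plays the role of the state, the RNN input $x_i$ plays the role of $\underline{u}$, and the state update is $\underline{f}(\underline{x}, \underline{u}) = \sigma(W_{hh}\underline{x} + W_{hx}\underline{u} + b_h)$, with no explicit dependence on $k$. Note the clash of notation: in control, $x$ is the state, whereas in the RNN equations it is the input.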
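As a sanity check, here is a minimal sketch that writes an RNN cell as a pair of functions `f` and `g` and runs it through a generic state-space simulation loop. Assumptions: $\sigma = \tanh$, the layer sizes and random weights are made up, and the state at step $k$ is taken to be the previous hidden state $h_{k-1}$ with input $u_k = x_k$, so the output map returns $W_{yh}h_k$. The function names are hypothetical, not from any library.

```python
import numpy as np

# Hypothetical sizes and random weights, for illustration only.
n_in, n_hidden, n_out = 3, 4, 2
rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
W_hx = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_yh = rng.normal(scale=0.5, size=(n_out, n_hidden))
b_h = np.zeros(n_hidden)

def f(state, u, k):
    """State update: new hidden state from previous hidden state and input.
    Assumes sigma = tanh; k is unused because the RNN is time-invariant."""
    return np.tanh(W_hh @ state + W_hx @ u + b_h)

def g(state, u, k):
    """Output map: y_k = W_yh h_k, where h_k = f(state, u, k)."""
    return W_yh @ f(state, u, k)

def simulate(f, g, x0, inputs):
    """Generic discrete-time state-space loop; returns (outputs, final state)."""
    x, ys = x0, []
    for k, u in enumerate(inputs):
        ys.append(g(x, u, k))
        x = f(x, u, k)
    return np.array(ys), x

inputs = rng.normal(size=(5, n_in))
ys, h_final = simulate(f, g, np.zeros(n_hidden), inputs)

# The same computation written as the usual RNN loop, for comparison.
h = np.zeros(n_hidden)
for x_i in inputs:
    h = np.tanh(W_hh @ h + W_hx @ x_i + b_h)
print(np.allclose(ys[-1], W_yh @ h), np.allclose(h_final, h))  # True True
```

Nothing in `simulate` knows it is running a neural network; it only sees `f`, `g` and a state vector, which is exactly what makes the control-theory view applicable.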

It would be interesting to consider how we’d use the control framework to analyse RNNs. Can we model backpropagation as part of the system, or can we only (easily) analyse RNNs with a specific set of weights?

#### — Relevant info —

1. The standard form for a continuous-time state-space dynamical model, $\mathcal{S}$, is:

   $$\underline{\dot{x}}(t) = \underline{f}(\underline{x}(t), \underline{u}(t), t)$$
   $$\underline{y}(t) = \underline{g}(\underline{x}(t), \underline{u}(t), t)$$

   Note that it comprises only first-order ODEs.

2. How to choose the state vector: choose the output and its derivatives, e.g. $y, \dot{y}, \dots$, up to but not including the highest derivative. The highest derivative (e.g. $\ddot{y}$) can then be written in terms of the state vector and the input, which lets us write $\underline{\dot{x}}(t)$ in terms of $\underline{x}(t)$ and $\underline{u}(t)$.
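As a worked example of this recipe, take a hypothetical mass-spring-damper $m\ddot{y} + c\dot{y} + ky = u$ (not from the handout). Choosing $\underline{x} = [y, \dot{y}]^T$, i.e. everything except the highest derivative $\ddot{y}$, gives two first-order ODEs, $\dot{x}_1 = x_2$ and $\dot{x}_2 = (u - c x_2 - k x_1)/m$, with output $y = x_1$. The sketch below codes this up and integrates it with forward Euler, a simple integrator picked here for illustration; the parameter values are made up.

```python
# Hypothetical example system: m*y'' + c*y' + k*y = u(t).
m, c, k = 1.0, 0.8, 4.0

def f(x, u, t):
    """State derivative for x = [y, y_dot]: only first-order ODEs."""
    y, y_dot = x
    y_ddot = (u - c * y_dot - k * y) / m  # highest derivative, from the state and input
    return [y_dot, y_ddot]

def g(x, u, t):
    """Output map: the output is simply the first state variable."""
    return x[0]

def simulate_euler(x0, u_of_t, dt, n_steps):
    """Forward-Euler integration of x_dot = f(x, u, t); returns the outputs."""
    x, t, ys = list(x0), 0.0, []
    for _ in range(n_steps):
        u = u_of_t(t)
        ys.append(g(x, u, t))
        dx = f(x, u, t)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        t += dt
    return ys

# Unit step input from rest: y should settle near u/k = 0.25.
ys = simulate_euler([0.0, 0.0], lambda t: 1.0, dt=1e-3, n_steps=30_000)
print(round(ys[-1], 3))  # 0.25
```

The structure mirrors the discrete-time case: once the model is in the form $\underline{\dot{x}} = \underline{f}(\underline{x}, \underline{u}, t)$, a simulator only needs `f` and `g`.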