Models¶
There are three major types of neural network models, each defined primarily by
the loss function that the model attempts to optimize. While other types of
models are certainly possible, theanets only tries to handle the common cases
with built-in model classes. If you want to define a new type of model, see
Custom Models.
To describe the predefined models, we assume that a neural network has some set of parameters \(\theta\). In the feedforward pass, the network computes some function of an input vector \(x \in \mathbb{R}^n\) using these parameters; we represent this feedforward function using the notation \(y = F_\theta(x)\).
Autoencoder¶
An autoencoder takes an array of \(m\) arbitrary data vectors
\(X \in \mathbb{R}^{m \times n}\) as input, transforms it in some way, and then
attempts to recreate the original input as the output of the network.
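To evaluate the loss for an autoencoder, only the input data is required. The default autoencoder model computes the loss using the mean squared error between the network’s output and the input:
\[\mathcal{L}(X, \theta) = \frac{1}{m} \sum_{i=1}^m \left\| F_\theta(x_i) - x_i \right\|_2^2\]
Here \(x_i\) denotes the \(i\)-th row of \(X\).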
Autoencoders simply try to adjust their model parameters \(\theta\) to minimize this squared error between the true inputs and the values that the network produces.
In theory this could be trivial—if, for example, \(F_\theta(x) = x\)—but in practice this doesn’t actually happen very often. In addition, a regularizer \(R(X, \theta)\) can be added to the overall loss for the model to prevent this sort of trivial solution.
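As a rough illustration of the autoencoder loss, here is a minimal pure-Python sketch. It is not theanets code: `autoencoder_mse`, `identity`, and `halve` are hypothetical names, and averaging over samples only (rather than also over the \(n\) components) is an assumption about the normalization.

```python
def autoencoder_mse(forward, inputs):
    """Average, over samples, of the squared L2 distance between F(x) and x."""
    total = 0.0
    for x in inputs:
        y = forward(x)
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(inputs)


def identity(x):
    """The trivial solution F(x) = x."""
    return list(x)


def halve(x):
    """A lossy stand-in for a real network (hypothetical)."""
    return [0.5 * v for v in x]


# Two 3-dimensional samples, so m = 2 and n = 3.
X = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]]

identity_loss = autoencoder_mse(identity, X)  # zero: the input is reproduced exactly
halve_loss = autoencoder_mse(halve, X)        # positive: the reconstruction is imperfect
```

The identity function reaches zero loss without learning anything about the data, which is the trivial solution a regularizer is meant to discourage.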
To create an autoencoder in theanets, just create an instance of the
appropriate network subclass:
net = theanets.Autoencoder()
Of course you’ll also need to specify which types of layers you’d like in your model; this is discussed in Specifying Layers.
Regression¶
A regression model is much like an autoencoder. Like an autoencoder, a
regression model takes as input an array of arbitrary data
\(X \in \mathbb{R}^{m \times n}\). However, at training time, a regression
model also requires an array of expected target outputs
\(Y \in \mathbb{R}^{m \times o}\). Like an autoencoder, the error between the
network’s output and the target is computed using the mean squared error:
\[\mathcal{L}(X, Y, \theta) = \frac{1}{m} \sum_{i=1}^m \left\| F_\theta(x_i) - y_i \right\|_2^2\]
The difference here is that instead of trying to produce the input, the regression model is trying to match the target output.
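To make the shapes concrete, the sketch below computes a regression loss for a toy linear map from \(n = 2\) inputs to \(o = 1\) output. This is plain Python, not theanets code; `linear_forward` and `regression_mse` are hypothetical helpers, and the loss is averaged over samples only, which is an assumption.

```python
def linear_forward(weights, bias, x):
    """Map an n-vector x to an o-vector: y[j] = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def regression_mse(forward, inputs, targets):
    """Average, over samples, of the squared L2 distance between F(x) and y."""
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must contain the same number of samples")
    total = 0.0
    for x, y in zip(inputs, targets):
        out = forward(x)
        total += sum((o - t) ** 2 for o, t in zip(out, y))
    return total / len(inputs)


# m = 3 samples, n = 2 input variables, o = 1 target variable.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Y = [[3.0], [7.0], [12.0]]

W = [[1.0, 1.0]]  # shape (o, n)
b = [0.0]         # shape (o,)

loss = regression_mse(lambda x: linear_forward(W, b, x), X, Y)
```

Only the third sample is predicted incorrectly (11 instead of 12), so the loss is one third.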
To create a regression model in theanets, just invoke the constructor:
net = theanets.Regressor()
Again, you’ll need to specify which types of layers you’d like in your model; this is discussed in Specifying Layers.
Classification¶
A classification model takes as input some piece of data that you want to
classify (e.g., the pixels of an image, word counts from a document, etc.) and
outputs a probability distribution over available labels.
At training time, this type of model requires an array of input data \(X \in \mathbb{R}^{m \times n}\) and a corresponding set of integer labels \(Y \in \{1,\dots,k\}^m\). The error is then computed as the cross-entropy between the network output and the true target labels:
\[\mathcal{L}(X, Y, \theta) = -\frac{1}{m} \sum_{i=1}^m \sum_{j=1}^k \delta_{j,y_i} \log F_\theta(x_i)_j\]
where \(\delta_{a,b}\) is the Kronecker delta, which is 1 if \(a=b\) and 0 otherwise.
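The cross-entropy described above can be sketched in a few lines of plain Python. Two things here are assumptions rather than statements about theanets: the output distribution is produced with a softmax, and labels are 0-based indices (the text above counts labels from 1). Because the Kronecker delta keeps only the term for the true label, the inner sum reduces to a single lookup.

```python
import math


def softmax(scores):
    """Turn k raw scores into a probability distribution over k labels."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, labels):
    """Mean negative log-probability that the model assigns to the true label."""
    total = 0.0
    for probs, label in zip(probabilities, labels):
        total -= math.log(probs[label])  # the delta selects only the true label
    return total / len(labels)


# m = 2 samples, k = 3 labels; labels are 0-based in this sketch.
scores = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
labels = [0, 2]

probs = [softmax(row) for row in scores]
loss = cross_entropy(probs, labels)
```

The second sample has uniform scores, so it contributes \(\log 3\) to the sum regardless of its label.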
To create a classifier model in theanets, invoke its constructor:
net = theanets.Classifier()
As with the other models, you’ll need to specify which types of layers you’d like in your model; this is discussed in Specifying Layers.
Recurrent Models¶
The three predefined models described above also exist in recurrent
formulations. In recurrent networks, time is an explicit part of the model. In
theanets, if you wish to include recurrent layers in your model, you must use a
model class from the theanets.recurrent module; this is because recurrent
models require input and output data matrices with an additional dimension to
represent time. In general,
- the data shapes required for a recurrent layer are all one dimension larger than the corresponding shapes for a feedforward network,
- the extra dimension represents time, and
- the extra dimension is located on:
  - the first (0) axis in theanets versions through 0.6, or
  - the second (1) axis in theanets versions 0.7 and up.
Warning
Starting with release 0.7.0 of theanets, recurrent models have changed the
expected axis ordering for data arrays! The axis ordering before version 0.7.0
was (time, batch, variables), and the axis ordering starting in the 0.7.0
release is (batch, time, variables).
The new ordering is more consistent with other models in theanets. Starting in
the 0.7 release, the first axis (index 0) of data arrays for all model types
represents the examples in a batch, and the last axis (index -1) represents the
input variables. For recurrent models, the axis in the middle of a batch
(index 1) represents time.
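If you have data stored in the old (time, batch, variables) ordering, converting it amounts to swapping the first two axes. The sketch below does this for nested Python lists; `time_major_to_batch_major` is a hypothetical helper, not part of theanets.

```python
def time_major_to_batch_major(data):
    """Swap the first two axes: (time, batch, variables) -> (batch, time, variables)."""
    # zip(*data) groups together the entries that share a batch index.
    return [list(sequence) for sequence in zip(*data)]


# Old ordering: t = 3 time steps, m = 2 sequences, n = 1 variable.
old = [
    [[0.0], [10.0]],
    [[1.0], [11.0]],
    [[2.0], [12.0]],
]

new = time_major_to_batch_major(old)
# new[i][j] is now time step j of sequence i.
```

For NumPy arrays the same conversion is `array.transpose(1, 0, 2)` or `numpy.swapaxes(array, 0, 1)`.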
Note
In recurrent models, the batch size is currently required to be greater than one. If you wish to run a recurrent model on a single sample, just create a batch with two copies of the same sample.
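The workaround in the note can be sketched as follows, assuming the (batch, time, variables) ordering used from version 0.7 on. `batch_of_two` is a hypothetical helper, not a theanets function.

```python
def batch_of_two(sample):
    """Wrap one (time, variables) sequence as a (2, time, variables) batch."""
    # Copy each time step so the two batch entries do not share list objects.
    return [[list(step) for step in sample] for _ in range(2)]


# A single sequence with t = 3 time steps and n = 2 variables.
sample = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

batch = batch_of_two(sample)
```

Both entries of the batch are identical, so either entry of the model's output corresponds to the original sample.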
Autoencoding¶
A recurrent autoencoder, just like its feedforward counterpart, takes as input
a single array of data \(X \in \mathbb{R}^{m \times t \times n}\) and attempts
to recreate the same data at the output, under a squared-error loss.
To create a model of this type, just invoke its constructor:
net = theanets.recurrent.Autoencoder()
Regression¶
A recurrent regression model is also just like its feedforward counterpart. It
requires two inputs at training time: an array of input data
\(X \in \mathbb{R}^{m \times t \times n}\) and a corresponding array of output
data \(Y \in \mathbb{R}^{m \times t \times o}\). Like the feedforward
regression models, the recurrent version attempts to produce the target outputs
under a squared-error loss.
To create a model of this type, just invoke its constructor:
net = theanets.recurrent.Regressor()
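Shape mismatches are an easy mistake with three-dimensional data, so it can help to check arrays before training. The sketch below validates nested Python lists against the \(m \times t \times n\) and \(m \times t \times o\) shapes described above; `shape_of` and `check_recurrent_regression` are hypothetical helpers, not part of theanets.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list, e.g. (m, t, n)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(shape)


def check_recurrent_regression(inputs, targets):
    """Check that inputs are (m, t, n) and targets are (m, t, o)."""
    x_shape, y_shape = shape_of(inputs), shape_of(targets)
    if len(x_shape) != 3 or len(y_shape) != 3:
        raise ValueError("expected 3 axes: (batch, time, variables)")
    if x_shape[:2] != y_shape[:2]:
        raise ValueError("inputs and targets must agree on batch and time axes")
    return x_shape, y_shape


# m = 2 sequences, t = 3 time steps, n = 2 inputs, o = 1 target.
X = [[[0.0, 0.0] for _ in range(3)] for _ in range(2)]
Y = [[[0.0] for _ in range(3)] for _ in range(2)]

shapes = check_recurrent_regression(X, Y)
```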
Classification¶
A recurrent classification model is like a feedforward classifier in that it
takes as input some piece of data that you want to classify (e.g., the pixels
of an image, word counts from a document, etc.) and outputs a probability
distribution over available labels. Computing the error for this type of model
requires an input dataset \(X \in \mathbb{R}^{m \times t \times n}\) and a
corresponding set of integer labels \(Y \in \mathbb{Z}^{m \times t}\) (one
label per time step of each sequence, in the batch-first ordering used from
version 0.7 on); the error is then computed as the cross-entropy between the
network output and the target labels.
To create a model of this type, just invoke its constructor:
net = theanets.recurrent.Classifier()
Custom Models¶
To create a custom model, just define a new subclass of theanets.Network.
For instance, the feedforward autoencoder model is defined basically like this:
class Autoencoder(theanets.Network):
    def __init__(self, layers=(), loss='mse', weighted=False):
        super(Autoencoder, self).__init__(
            layers=layers, loss=loss, weighted=weighted)
Essentially this model just defines a default loss on top of the functionality
in theanets.Network for creating and managing layers and loss functions,
training the model, making predictions, and so on.
By defining a custom model class, you can also implement whatever helper functionality you think will be useful for your task. With the programming power of Python, the sky’s the limit!
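As a rough sketch of what such helper functionality might look like, the example below adds one method to a custom model. So that it runs without theanets installed, it subclasses a tiny stand-in `Network` class that only stores its arguments. That stand-in, the `BottleneckAutoencoder` name, the `bottleneck_size` helper, and the use of plain integers as layer sizes are all assumptions made for illustration. With the real library you would subclass theanets.Network instead.

```python
class Network:
    """Hypothetical stand-in for theanets.Network; it only stores its arguments."""

    def __init__(self, layers=(), loss='mse', weighted=False):
        self.layers = tuple(layers)
        self.loss = loss
        self.weighted = weighted


class BottleneckAutoencoder(Network):
    """Custom model with a default loss and one task-specific helper."""

    def __init__(self, layers=(), loss='mse', weighted=False):
        super().__init__(layers=layers, loss=loss, weighted=weighted)

    def bottleneck_size(self):
        """Size of the narrowest layer, assuming layers are given as integers."""
        return min(self.layers)


net = BottleneckAutoencoder(layers=(784, 64, 784))
narrowest = net.bottleneck_size()
```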