The Rule of 78
In 1986, McClelland and Rumelhart published their seminal paper on the learning
of past tenses of English verbs. It played a large part in the renewal of
academic interest in neuron-like computing and machine learning. I'm taking a
class on modelling for computational linguistics this term and we've been using
JavaNNS to play around with some neural networks. McClelland and Rumelhart
provides a scaled-down example of their experiment, so we used that as a
practice exercise. They call their example the rule of 78. It can be described
as a basic rewriting rule, e.g. (2 4 7) -> (2 4 8), (1 6 8) -> (1 6 7),
etc. The point is that the rules are regular in the sense that it is only the
last digit that gets changed in the output (7 turns into 8 and vice versa). This
is just like regular verbs – the base form remains intact and there is a
systematic change to the suffix. However, they also introduce an exception to
the rule: (1 4 7) -> (1 4 7). This can be seen as an irregular verb, e.g. hit
-> hit.
In order to get this data set into JavaNNS, we need to encode it somehow, into sequences of binary numbers. McClelland and Rumelhart use a simple feed-forward network with 8 inputs and 8 outputs, so we'll basically do the same thing. They don't mention how they encoded the patterns into binary sequences, but its quite easy to figure out how it can be done. I'll spoil it in the table below:
| Pattern | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| (1 4 7) | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| (2 4 7) | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| (2 4 8) | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
Et cetera. I wrote a small Ruby script for creating pattern files with these data for JavaNNS, which is available here. It's very raw but it should be understandable. It produces .pat files which can be opened directly in JavaNNS and then used for training the network. I won't go into the details of McClelland and Rumelhart's training scheme, however I'd highly recommend reading the article for the rest of the story. We more or less replicated their results, using Backpropagation as the learning function in JavaNNS.
