I post this as a challenge for anyone willing. Although there are no prizes, if you experiment with and refine the idea, you can witness its potential. The idea is as abstract as it can be; I leave its improvement to whoever accepts. If you can implement something within the described boundaries, that's okay. If you can build something with modifications that point at flaws in the idea, that's better! If you can derive something new that makes up for the flaws of this theory, that's the best! Only one will win!
Among the values mentioned below, the 1st, 2nd, 3rd, 4th and 7th (the connection and destruction potentials and thresholds, and the activation function) are important, while the others are optional and exist to improve the network's functionality. Anyone can add new values and define their purpose, and those values can have any impact on the network. It is better to classify the values: positive values (those that improve the network), negative values (those that hamper the network), and neutral values (negative values that nevertheless help the network).
How the neuron behaves and works:
All the values that the neuron has (anyone can add new values and define them; a code sketch follows the list):
- Connection potential: How likely is the neuron to build a new connection? The higher the potential, the more easily it builds new connections.
- Connection Threshold: The threshold that must be crossed for the creation of new connections.
- Destruction potential: How likely is the neuron to destroy its old connections? More accurately, a measure of how much a neuron wants to be left alone.
- Destruction Threshold: The threshold that must be reached to destroy any connection.
- Division Threshold(+ve values): The threshold for a neuron to divide itself into multiple neurons.
- Connection Weights(0 values): Each path can have its own weight that can be passed on to the receiving neuron.
- Activation Function: Each neuron can have its own activation function, or all neurons can share one. The activation function takes in inputs and produces an output signal. The input can be anything: the sum of connection weights, the path activation threshold, etc. It is up to the creator to decide how to define the function, and it is perfectly valid for each neuron to have a unique one. The creator should provide a set of such functions from which the neurons may choose.
- Acceptance Threshold (0 value): This threshold describes how much a neuron wants to accept any input signal. We can interpret this in any way we like: if an input signal crosses this threshold, add it to the activation function; or if an input signal is near this threshold, amplify it to a certain level; or the neuron may sum the input signals from previous connections and check whether the new signal is comparable to them, and if it is, we say it passes the threshold and that signal alone becomes the input for the activation function.
- Connection Limit (-ve value): Limits how many connections can be made.
- Minimum connection limit (+ve value): The minimum number of connections the neuron must keep.
- Death Threshold(0 value): If this threshold is passed, the neuron kills itself and gets rid of all of its connections and values.
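To make the list concrete, here is a minimal sketch of how these values might be grouped in code. All field names, default limits and the 0-to-1 random initialization are my own assumptions; the post leaves them open.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Neuron:
    connection_potential: float    # eagerness to form new paths
    connection_threshold: float    # must be crossed to create a path
    destruction_potential: float   # eagerness to drop old paths
    destruction_threshold: float   # must be reached to destroy a path
    division_threshold: float      # +ve: crossing it splits the neuron
    acceptance_threshold: float    # 0: gate on incoming signals
    death_threshold: float         # 0: crossing it kills the neuron
    connection_limit: int = 10     # -ve: cap on connections
    min_connections: int = 1       # +ve: floor on connections
    paths: list = field(default_factory=list)   # outgoing connections

def random_neuron() -> Neuron:
    """Birth state: every potential and threshold starts random."""
    r = random.random
    return Neuron(r(), r(), r(), r(), r(), r(), r())
```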
How does the network work?
We will have an array of neurons all initialized to random values for all potentials and thresholds.
[N1, N2, N3, N4, N5, N6, N7, N8, N9, N10]
We then randomly choose some number of input and output neurons and build random connections between the neurons such that the input neurons have no incoming connections from other neurons at all. Each connection is a Path, and each Path has an activation threshold: if the input signal to the path doesn't fulfill its activation threshold, the signal isn't passed on to the next neuron.
The output signal from the activation function of the sending neuron is compared with the activation threshold of every path it is connected to, and the signal may be transmitted through all those paths whose activation thresholds are fulfilled.
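As a sketch of that rule (the `Path` class, per-path weight, and the receiver's `inbox` are my own stand-ins, not part of the original description):

```python
from dataclasses import dataclass
from typing import List

class Receiver:
    """Stand-in receiving neuron: collects signals for later use."""
    def __init__(self):
        self.inbox = []

@dataclass
class Path:
    target: Receiver               # the receiving neuron
    activation_threshold: float
    weight: float = 1.0            # optional per-path connection weight

def propagate(output_signal: float, paths: List[Path]) -> List[Path]:
    """Send the sender's output down every path whose activation
    threshold is crossed; return the paths that fired."""
    fired = [p for p in paths if output_signal >= p.activation_threshold]
    for p in fired:
        p.target.inbox.append(output_signal * p.weight)
    return fired
```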
For the sake of example:
Let us take just a three-neuron model and suppose there is just one input neuron and one output neuron; the third neuron is in between. Now we begin with the random connections. The simplest connection made here would be N1 -> N2 -> N3, so let's suppose that this is the random connection that was made. What is so special here? The special thing is that N1 can make a new connection directly to N3, N3 can make one back to N2 which ends up again in N3, N1 can make two new connections with N2, and N2 can make three new connections with N3. The number of possible connections here is theoretically endless. With just three neurons, we can build a very complex network. The drawback? We have to deal with a lot of problems.
Let's train this network. We will teach it to recognize the color of a pixel, as the input neuron can only take in the value of one pixel at a time. Say the network is in its random connection state (Birth state) with one connection per neuron, i.e. N1->N2->N3. We first have to define how the output neuron's output signal is interpreted. We need to set up some kind of interpreter for the output signals generated by the output neuron; that interpreter is responsible for giving the network feedback about what the output was. We will just take Black, White, Blue, Orange and Red for this example, and say the following output ranges correspond to those colors (a sketch of such an interpreter follows the ranges):
0.02 - 0.68 -> Black
0.776 - 0.990 -> White
1.001 - 1.0079 -> Blue
1.34 - 1.77 -> Orange
3.01 - 3.5 -> Red
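Here is a minimal sketch of that interpreter, using the ranges above; how the feedback signal is fed back into the network is left open by the post.

```python
RANGES = [
    ((0.02, 0.68), "Black"),
    ((0.776, 0.990), "White"),
    ((1.001, 1.0079), "Blue"),
    ((1.34, 1.77), "Orange"),
    ((3.01, 3.5), "Red"),
]

def interpret(output_signal: float):
    """Map the output neuron's signal to a color, or None if it
    falls outside every range (meaning the network needs fixing)."""
    for (low, high), color in RANGES:
        if low <= output_signal <= high:
            return color
    return None

print(interpret(0.5))    # Black
print(interpret(5.6))    # None (the way-off 5.6 example later on)
```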
The output signals can never be perfect integers; the rounding up or down of floats into integers affects the output. Our first input pixel is Black (0,0,0). We may choose to feed this input in any way we like. For the sake of the example, we will choose a+b+c from (a,b,c) as the function that calculates our input value from the pixel values.
Do note that different colors may add up to the same value, so a function with fewer collisions, like a + b*c or (a+b) * (b+c) - a, would be better. Division would be bad, as the numbers can be 0 as well.
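A quick demonstration of the collision problem and one of the suggested alternatives:

```python
naive = lambda a, b, c: a + b + c
better = lambda a, b, c: (a + b) * (b + c) - a   # one suggested form

# Two different colors, same naive input value:
print(naive(255, 0, 0), naive(0, 255, 0))    # 255 255  -> collision
print(better(255, 0, 0), better(0, 255, 0))  # -255 65025 -> distinct
```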
Now that we have an input, N1 will take it in.
We also need to define how the network functions as we continue. Let's call the first stage of the network The Propagation Stage. In this stage no new connections are formed or destroyed, no new neurons are added, and nothing happens beyond making sure the value gets to the output neuron. N1 takes in the input value; there are no other inputs for this neuron (and there shouldn't be, as it is the input neuron, i.e. one unique input). N1 is now in the Input State as it uses its activation function to produce its output signal. N1 then checks whether it has any connections. If it does, N1 enters the Search State; otherwise it enters the Criminal State. In the Search State, N1 looks through its list of paths to see if any activation threshold is crossed. If one is, N1 sends the signal through that path to the receiving neuron. If multiple paths have their activation thresholds crossed, each path receives the signal; if none do, N1 enters the Criminal State. Why is it called that?
Because neurons in this state break the law of The Propagation Stage: the neuron makes a desperate attempt to form a new path just to get rid of its output signal.
If N1 -> N2 didn't exist among those random connections, N1 would have formed a new connection with N2 or N3 anyway. Finally, N1 enters the Calm State by getting rid of the output signal.
We can introduce a Penalty system where the criminal neuron goes through changes that might impact the network positively or negatively. The path sends the signal to the receiving neuron. Since the receiving neuron isn't an input neuron, it enters the Listen State, where it waits for all signals to arrive before entering the Input State and computing its activation function.
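Here is a minimal sketch of the states described so far. The `Neuron` class, the random choice of target in the Criminal State, and the fallback when no path fires are my own assumptions; the post only names the states.

```python
import enum
import random

class State(enum.Enum):
    LISTEN = "listen"      # non-input neuron waits for incoming signals
    INPUT = "input"        # run the activation function
    SEARCH = "search"      # look for paths whose thresholds are crossed
    CRIMINAL = "criminal"  # no usable path: illegally build one
    CALM = "calm"          # output handed off and discarded

class Neuron:
    def __init__(self, activation):
        self.activation = activation   # this neuron's activation function
        self.paths = []                # (threshold, target_neuron) pairs
        self.inbox = []                # signals gathered in Listen State
        self.state = State.LISTEN

def fire(neuron, signal, network):
    neuron.state = State.INPUT
    out = neuron.activation(signal)
    neuron.state = State.SEARCH
    fired = [(t, n) for t, n in neuron.paths if out >= t]
    if not fired:
        # Criminal State: break the Propagation Stage's law and form a
        # new path in a desperate attempt to get rid of the signal.
        neuron.state = State.CRIMINAL
        target = random.choice(network)
        neuron.paths.append((out, target))
        fired = [(out, target)]
    for _, target in fired:
        target.inbox.append(out)       # receiver sits in Listen State
    neuron.state = State.CALM          # get rid of the output signal
    return out
```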
Note: Our activation function is undefined here but say N1 sent a value of 2.3 to N2.
After N2 leaves the Listen State, it goes through the same states as N1 and sends a signal to N3 (say N2 sent a value of 1.11). N3 then goes through the same states and produces its output signal, which is interpreted by some program that knows what the output should be and gives feedback to the network. Say N3 produced a value of 5.6, which is way off the range for Black, so the network needs to improve itself. This time we start from N3 and move backwards, fixing things as we go. We could update the various values, the activation thresholds and so on. Let's call this stage the Fixing Stage. As the network moves backwards, it attempts to fix itself. Now what? Is the network sure to recognize the color Black this time? No. So what do we do? We send the same input through N1 and see how well it did this time. We need to introduce something like an error margin: based on the most recent error, the network adjusts itself. This is exactly like backpropagation in static networks, but that isn't the ideal thing to do with this type of model, especially when there are hundreds of neurons and thousands of connections between them.
Implementing backpropagation here shouldn't be impossible, just extremely hard; I will explain why later. After the Fixing Stage, the network enters the Arrangement and Creation Stage, which should be entered after every output while training but only rarely while working. In this stage each neuron gets to build new connections with no penalty. It can destroy connections, change its internal values, swap out its activation function, introduce new variables and thresholds, add new neurons, etc. The network goes neuron by neuron and allows each to do all of those things.
Now, after all those changes, say the network looks something like this:
N1 -> N2 (two parallel paths)
N2 -> N3
N2 -> N4
N3 -> N2
N4 -> N3
The network already looks complex. N1 has two paths to N2, N2 has one path to N4 and one to N3, N3 has one path back to N2, and N4 has one connection to N3. We can put a Restrictive condition in place, where new paths take values for their activation thresholds based on the most recent error. Say our most recent error was 1.3 and N1 wants to make a connection with N2: the new path takes a value less than or equal to (<=) 1.3. That value is subtracted from 1.3, and the next new path does the same, but this time its maximum value has to be <= (1.3 - the subtracted value), and so on; see the sketch below.
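A minimal sketch of that budgeting rule, assuming each threshold is drawn uniformly at random from what remains of the error (the post doesn't say how the value is picked):

```python
import random

def new_path_thresholds(n_paths: int, recent_error: float):
    """Restrictive condition: each new path's activation threshold is
    drawn from whatever remains of the most recent error."""
    budget = recent_error
    thresholds = []
    for _ in range(n_paths):
        taken = random.uniform(0.0, budget)   # value taken is <= budget
        thresholds.append(taken)
        budget -= taken                       # next path gets the rest
    return thresholds

print(new_path_thresholds(3, 1.3))   # e.g. [0.84..., 0.31..., 0.09...]
```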
Now we have another problem at hand: how does the network work now? Let's break it down. N1 receives the input and goes through all the states. Say the new connection to N2 wasn't activated, so the signal arrives over the old connection and N2 goes through all the states as well. But what is going on up there? N3 also has a connection to N2? Here we introduce the concept of One-Way Paths, which differ from the Normal Paths we have been using so far. The path from N3 -> N2 is a One-Way Path. Why? Because N2 needs input from N3, but N3 needs input from N2 in the first place, which is why the only way for this path to be activated is if N3 activates it.
Note: This Different path type solution is not the best out there and neither are the others I thought about. I will discuss the others later.
The COMPUTATIONALLY EXPENSIVE and MEMORY HOGGING dilemma is already obvious, isn't it? We can introduce new path types as needed, each with new properties. Now N2 has a path to N3 and one to N4; what do we do? Where should we go? If the two paths had different activation thresholds and only one were crossed, things would be easier; but suppose both thresholds are crossed. How do we handle this? We first go to one of them at random. Say we choose N4: then it is straightforward. N4 goes through all the states, then on to N3, which will either produce an output or go back to N2 along the other path and eventually return to N3 in some way. But what if we chose N3 rather than N4? We go to N3 and see that it needs input from N4. Is that a One-Way Path? No, because N4 doesn't need input from N3, so N3 NEEDS the input from N4. Hence we go to N4, which needs an input from N2, but N2 just finished all of its calculations. How do we solve this? It's pretty simple: just broadcast the output from N2 to all receiving neurons (even those whose activation thresholds weren't crossed), a sort of cache. Each path can temporarily store the broadcast data and provide it to its neuron. N4 can use that, do everything it needs to do, and hand its result to N3.
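A sketch of that broadcast, with the cache living on the path; the `inbox` attribute is again my own stand-in:

```python
class Path:
    def __init__(self, target, activation_threshold):
        self.target = target
        self.activation_threshold = activation_threshold
        self.cache = None                    # temporarily stored broadcast

def broadcast(output_signal, paths):
    """Send to every path: cache on all of them, deliver only on the
    ones whose activation threshold is crossed. A dependent neuron
    like N4 can later read the cached value instead of re-running N2."""
    for p in paths:
        p.cache = output_signal              # cached regardless of threshold
        if output_signal >= p.activation_threshold:
            p.target.inbox.append(output_signal)
```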
See the problems? With just 4 neurons it has already gotten this complicated, and I haven't even started on the problems. Building a network using this architecture will need an entire team and a lot of designing, planning and problem solving. Already the amount of computational resources needed is abnormal compared to the size of the network, and the network can keep forming more paths and make it even more complicated. This is why I said that using something like backpropagation will be extremely hard: you have to come up with an entirely new algorithm.
How does the network learn though?
What we are basically doing here is forcing the network to associate those output values with those input values. We are forcing it to associate an input of 0 (0+0+0) with the range 0.02 - 0.68, which is interpreted as a color. The network learns about the world through numbers and numbers alone. What about other colors? Black could be N1 to N2 to N3 (after the Fixing Stage); White could be N1 to N2 (via the other path) to N3 to N2 to N3; Red could be N1 to N2 to N3 to N2 to N4 to N3; and for Orange we could append another N2 to N3 hop. Each of these describes a possible neuron firing pattern for a different color.
This means the network takes specific paths for specific inputs. We will get into the similarities with real-life neurons in a bit. The power and potential are clear, but our computational power is lacking (I am on a not-so-good device, so maybe I am wrong).
Problems:
As can already be seen, completely new algorithms will be necessary for different things. Since the architecture is new and nobody has built one so far, this is just a ground layer for making an actual network. Creators have all the freedom they need: they can add new values, add their own interpretations, add new path types and their own functions for various features, make the neuron even more complicated, and even derive a completely new way of making the neuron work. As mentioned repeatedly, memory and computation are the major problems. We need memory to store the paths, neuron states, network state, cache, connection directions and so on. The list is long.
Another problem that will haunt anyone who dares build a network from this is the problem of lone groups and closed paths. Let's look at a new example: N1 to N2 to N3 to N4 to N2 to N3 to N4 to ... See? That is a closed path right there. Another example, for the next problem: N1 to N4 to N5 to N6 to ... such that you never encounter N2 and N3. That is a lone group: the neurons N2 and N3 have isolated themselves, forming no connections and degrading the network. Both of these problems affect the network as a whole in a bad way. One neuron can even end up building a path from itself to itself which, again, is bad.
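Both failure modes can at least be detected with ordinary graph traversals. A hedged sketch, assuming the network is summarized as an adjacency list mapping a neuron id to the ids it connects to:

```python
def has_closed_path(adj, start):
    """True if a signal leaving `start` can loop forever (closed path)."""
    seen, on_walk = set(), set()
    def dfs(node):
        if node in on_walk:
            return True                      # came back to this walk
        if node in seen:
            return False
        seen.add(node)
        on_walk.add(node)
        found = any(dfs(nxt) for nxt in adj.get(node, []))
        on_walk.discard(node)
        return found
    return dfs(start)

def lone_group(adj, input_ids):
    """Neurons unreachable from every input neuron (a lone group)."""
    reached, frontier = set(input_ids), list(input_ids)
    while frontier:
        for nxt in adj.get(frontier.pop(), []):
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return set(adj) - reached

adj = {"N1": ["N4"], "N4": ["N5"], "N5": ["N4"], "N2": [], "N3": []}
print(has_closed_path(adj, "N1"))   # True: N4 -> N5 -> N4 -> ...
print(lone_group(adj, ["N1"]))      # {'N2', 'N3'}: the lone group
```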
Using more variables (as long as they are +ve or 0 values) will make the network better but also slower. This already sounds too complicated to implement, and I cannot see anyone staying sane after trying to build one. The network will be very slow unless you have computational monsters, and it is far too unpredictable for any decent graphing of what each neuron does or will do without a complicated system in place to track the structure.
Merits:
There are a lot of merits to this model. It is dynamic, i.e. it keeps learning even while it is working. Even with few neurons, we can make exceedingly complex networks; the sheer number of possible combinations from just a few neurons is jaw-dropping. Any such network should be very adaptive to changes in its input as long as a proper output set is defined and feedback is provided with each mistake. For example, we could take the color-identifying network above and make it do something else as well, say: given the ASCII value of a lowercase letter, produce the value of its uppercase letter as the output. The network just has to build new connections and expand accordingly while still retaining its ability to identify colors. We still need to define output value ranges and have some kind of interpreter that knows what each output value means. A creator has all the freedom to define new values, add new interpretations and much more. What's more awesome is that this network builds its hidden layer by itself: all it needs are input and output neurons with random connections.
Some Solutions:
To be honest, I don't really have solid solutions to the problems above; those need experiments to be solved. I do have a few partial solutions, though they aren't worth much. Limiting the number of connections per neuron is a good option. Using few -ve values, somewhat more 0 values and comparably more +ve values should do the trick; a ratio of something like (-ve):0:(+ve) = 1:5:4.9 or similar. Too many -ve values will hurt the network badly, while too many +ve values will make it unbearably slow (as if it isn't slow already). As mentioned, a fixed path fires for a specific input, and if an input value is provided often enough, associating that input directly with its output is the better choice. The cost is more memory, but that is a worthwhile sacrifice when we can predict which inputs will be provided most often; see the sketch below.
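A minimal sketch of that memory-for-speed trade; `run_network` stands in for the full propagation, and the cutoff of 100 sightings is an arbitrary assumption of mine:

```python
from collections import Counter

class HotInputCache:
    """Associate frequent inputs directly with their outputs,
    bypassing the network entirely once they are common enough."""
    def __init__(self, run_network, cutoff=100):
        self.run_network = run_network
        self.counts = Counter()
        self.table = {}                 # the extra memory we pay for
        self.cutoff = cutoff

    def __call__(self, value):
        if value in self.table:
            return self.table[value]    # direct association, no firing
        out = self.run_network(value)
        self.counts[value] += 1
        if self.counts[value] >= self.cutoff:
            self.table[value] = out     # pin this input -> output pair
        return out
```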
Remember the Different path type solution? The following is an even worse one.
If only we could run one neuron per process. That is asking too much, but if we could run at least five neurons on separate threads in different processes, we could make the network faster, at the cost of more resources and added complexity. It is near impossible to go neuron by neuron in this scenario until we reach the output neuron: one neuron might depend on the output of another neuron, which in turn might depend on the current neuron. This is a dependency loop. The Different path type solution tries to solve this specific problem, but we lose a lot of flexibility and need much more memory, which outweighs the merits it brings.
There is another solution to the problem: run each input neuron in its own process or thread. Each time a new branch is detected, i.e. multiple paths activate at once, each path gets its own thread or process; a sketch follows. Using this in conjunction with the Different path type solution we get some improvements. And if we simply say that an input arriving before the neuron fires is accepted and anything later is not, we don't even need the Different path type solution, but we lose a lot of flexibility.
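Here is a hedged sketch of "one thread per activated branch", reusing the `Neuron` shape from the earlier state-machine sketch. It assumes no closed paths (a cycle would recurse forever) and glosses over all the shared-state care a real implementation would need:

```python
import threading

def traverse(neuron, signal, results):
    """Walk the network, spawning one thread per activated branch.
    A neuron with no fired paths is treated as an output neuron."""
    out = neuron.activation(signal)
    fired = [(t, n) for t, n in neuron.paths if out >= t]
    if not fired:
        results.append(out)            # reached an output: record it
        return
    threads = [threading.Thread(target=traverse, args=(n, out, results))
               for _, n in fired]      # one thread per activated path
    for th in threads:
        th.start()
    for th in threads:
        th.join()
```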
Also note that the Arrangement and Creation Stage only occurs during training. Why? Well, nobody is stopping you from allowing it while the network is running, but think of the resources it would eat through in an instant.
This is all I can think of right now.
Some Notes:
What is the most interesting thing about this model? The fact that it is an attempt to closely resemble a biological neuron. Remember that I mentioned that certain paths activate for a given input? A biological neural network is very similar. When you smell something, that activates a bunch of neural pathways in your brain: some pathways help you identify the smell while others help pinpoint its location. The output of the identification network can be fed into another network which tries to remember something based on that smell. This goes very deep into the biology of specific pathways and how memory is formed and retrieved, but in short the model is very similar, i.e. a complex enough network built from this model could be used to study actual biological brains. The costs would be huge, though. And imagine running millions of neurons together: we won't be able to manage thousands, let alone tens of thousands.
Another interesting thing is the structure of the network that forms. Using just the first four values mentioned at the top, you can see some pretty wild stuff. As the initialization is completely random, you don't know what those values are, but you can put some mechanism in place to track your network over time, and in some cases you can make a wild prediction from the random values. A network where the majority of neurons have high Connection Potential, high Destruction Threshold, low Connection Threshold and low Destruction Potential will grow very complex yet slow. Similarly, a network where the majority of neurons have low Connection Potential, low Destruction Threshold, high Connection Threshold and high Destruction Potential will stay very simple; it may be fast, but it will be very dumb. The ideal ratio between the different types of neurons will need experimentation.
We can make other kinds of guesses using other variables as well!
Final Note for those who didn't understand
Let me describe the network architecture in the simplest way I can. Think of the architecture as an extravagant way of implementing a hashmap: the network is the hashing function which generates a hash value, and the interpreter associates that hash value with something.
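The analogy in miniature, using the pixel function from earlier as a stand-in for the whole network:

```python
def network_as_hash(pixel):            # stand-in for the entire network
    a, b, c = pixel
    return (a + b) * (b + c) - a       # the "hash" is whatever signal emerges

interpreter = {network_as_hash((0, 0, 0)): "Black"}   # learned association
print(interpreter[network_as_hash((0, 0, 0))])        # Black
```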
In a more non-technical way, you may take a city roadmap as an example. You are a signal that starts at your house which is the input neuron here. You take one path that leads to another that leads to another and so on. One path can take you to a hospital but another might take you to a school. Multiple paths can lead to a junction and then lead to multiple destinations. Multiple paths can take you to the same place. This is exactly the principle.
One path associates with one thing while another path associates with something else. Take just two neurons, A and B. Here are some possibilities: A to B = one output; A to B through a different path with a different activation threshold = another output. Each new path with a different activation threshold can associate with one thing. In theory, with infinite resources, just two neurons should be able to associate with an infinite number of things.