Dr. Peter Henningsen
peter@alifegames.com
This paper is structured in five parts: Parts 1 and 2 introduce a class of
dynamic systems called SONkANNs - self-organizing networks (k times iterated) of attractor neural networks. The usefulness of these systems is derived in two different ways:
Part 1 analyzes the brain as a dynamic system and searches for the simplest way to capture the essence of the workings of the brain in a toy realm.
Part 2 searches for the simplest system that can exhibit complicity or superemergence in a computer.
We consider it significant that these two arguments, based on neuroscience and dynamic systems theory, independently lead to the same system architecture. The class of systems we single out for further investigation are networks of Hopfield nets with the following properties:
The basic nets are configured for 2D-symbol recognition.
The network of Hopfield nets is governed by a regulatory network that can implement learning and shift attentional focus.
The network of networks is connected to an outside world through sensory-input/action-output channels.
We call such networks SOHNs (Self-Organizing Hopfield Networks).
Part 3 consists of a technical analysis of SOHNs, and Part 4 concentrates on two interesting properties they have:
Semantics: Based on the existential connection to a valid outside world, and propagated through similarities between the symbol sets of neighboring Hopfield nets, the SOHNs can develop semantic content.
Brainwaves: The workings of the regulatory network can give rise to brainwaves which in suitable semantic networks can perform emergent information processing on a higher organizational level.
In Part 5 we suggest a method to investigate the space of SOHNs for outstanding cases of brainwave processing in semantic networks that can be considered cases of superemergence or consciousness. Our solution is to use the SOHNs as brains for actors in a virtual world that ``users'' can interact with as a computer game. If users are given an easy way to design/modify/train SOHNs, which they can then use to compete with their peers, we expect that a lot of ingenuity will be applied to find the SOHNs that act in the most intelligent manner. If the space of SOHNs allows for consciousness to emerge, we expect that this approach will eventually find systems that actually do so.
When analyzing a complex system, the first step is to check whether the system can sensibly be decomposed into subsystems. Subsystems are defined as parts of the system whose dynamics is dominated by their internal dynamics, with the interaction between subsystems being a secondary influence. In many situations such subsystems are clearly visible and suggest themselves - for instance, the human body has its organs as subsystems, and the organs themselves have cells as subsystems. The brain has many subsystems which have different anatomical names, and whose function in the overall system of the brain is known to some extent. One such subsystem is the neocortex, a thin layer on the outside of the brain that is much more prominent in humans than in apes or other animals. Parts of it are used for processing sensory input, and a large part of the neocortex is called ``association cortex'', and is used for thinking and decision making. The neocortex is organized in six distinct layers parallel to the surface of the brain, and into columns that run perpendicular through these layers. We will argue that it makes sense to regard the part of every column that is in one layer as a subsystem of the neocortex. According to this view, the neocortex is made up of (6 * (number of columns)) subsystems.
Even if the neocortex were just an undifferentiated mass of neurons with random connections, there is a statistical likelihood that some subsets of neurons would have many more connections amongst themselves than with the outside. Such subsets, which naturally emerge from randomness, could usefully be analyzed as subsystems. However, in the actual neocortex neurons are much more likely to have connections to other neurons that are close by than to those that are distant, which means that there will be many more subsystems than in a randomly connected mass of neurons, and these subsystems will consist of neighbors. Since layers and columns visually structure the neocortex into subsets of neighboring neurons, it makes sense to assume that these subsets can actually be regarded as subsystems.
There are other reasons as well. It is well known that the information content of the human genome is not large enough to specify the layout of the brain. The wiring of the brain cannot be such that every connection between two neurons is specified in the genome - rather, large parts of the actual wiring must emerge from processes that are simpler to describe than the wiring itself. An ideal candidate is a pair of simpler processes: one that specifies the wiring of a subsystem, and another that determines where to build such subsystems. Another reason can be derived from the experience humans have gained when creating complex systems themselves. It is very inefficient to build every little piece from scratch and independently of the others. To effectively create a complex system, humans have learned to construct it out of subsystems, preferably subsystems that can be used repeatedly. This is the principle of reductionism that stands at the core of natural science; it is the root idea that has facilitated the industrial system of production, and it is the idea behind object-oriented programming. Effective deployment of subsystems has allowed humanity to create systems of staggering complexity - a trick that nature learned through evolution a long time ago.
Finally, neuroscientific research shows that the decomposition of the neocortex into subsystems can effectively explain many observed phenomena. Amit[1994] makes the case that there must be modules (Hebbian cell-assemblies) in the neocortex that can reverberate in one of several possible attractor states to keep content in active memory. His results are based on experiments performed by Miyashita et al. on monkeys, and his paper mentions several other authors who interpret the neocortex in terms of subsystems and attractors to explain observed phenomena.
Anderson and Sutton[1995] argue that a ``network of networks''-model provides a plausible means of describing different levels of system organization within the brain. They show that this model can give rise to some of the higher level processes evident in cognition. Their basic approach is the closest to ours we could find in the literature, but their implementation is quite different from ours.
For more information on relevant literature, refer to the excellent survey article by Gerhard Werner[2000], which gives an overview of current neuroscientific research, including research that uses the ``dynamic systems''-paradigm.
Some subsystems in the neocortex, particularly those that are dedicated to the classification of sensory inputs or to the organization of motor outputs, are probably dominated by their input - in neural net parlance these would be feedforward networks, and we do not concern ourselves with such subsystems here. We are interested in subsystems whose overall dynamics are dominated by their internal dynamics, with the input playing a secondary role. The attractors in these subsystems must be either point attractors, cyclic attractors, or strange attractors. We will now consider the usefulness of these three types of attractors for an information processing organ such as the brain.
Point attractors: A system with point attractors only will, if left undisturbed, converge towards one of them and remain there. This can be considered as a classification: Whatever the system's initial state, it will transform itself into one of a finite number of states. If the system operates with input and output, it can classify its input and generate output that depends primarily upon the attractor it is close to. This could be very useful in basic information processing.
Periodic attractors: A system that converges towards a periodic attractor will cycle through a series of states. Its output could obviously be used for timing functions, implementing an internal clock, or serving to coordinate and control other subsystems of the brain. Making use of cyclic output in basic information processing appears to be more complicated than with point attractors.
Strange attractors: A system that converges towards a strange attractor will move through comparatively large parts (but not all) of its state space in a pattern that appears random at first sight, but that in fact is predictable and has detailed internal structure. Its output can be used as pseudorandom noise (which is very useful to have in information processing systems), or it can be used to coordinate and control other subsystems of the brain. Making use of strange output in basic information processing seems to be very complicated, though not impossible.
Since we do not try to replicate the functioning of the brain, but to re-implement its most fundamental dynamics in mathematical models that run on computers, it does not make sense to use neural nets to get timers and pseudorandom noise - we can get those much more easily on the computer. We also do not intend to use cyclic or strange attractors in our models for basic information processing for now - it makes sense to start with the simplest case, the point attractor, and to introduce more complicated attractors if and when they are needed. It will certainly be easier to build SONofANNs that do something sensible with point attractors only than if we used all kinds of attractors freely - and this restriction may also apply to evolution to some degree, which would imply that point attractors are also much more commonly used in the brain than the other types (not counting instances where the other types are used for noise generation and clocks).
Right now it is impossible to empirically determine what kind of attractors an isolated column in the neocortex has. For one thing, we cannot measure the state of enough of the neurons simultaneously, and for another, in a live brain the dynamics of the column is always driven by input from the outside. Even if the system only has point attractors, the input will destabilize these attractors every so often, thus making the system travel through a series of attractors. Any kind of data one obtains from a live brain will for this reason always look very complicated.
Wherever complex systems have arisen in the world, they appear to be made up of levels of emergence. For instance, from quantum mechanics emerge atoms. The interactions between atoms can be described by rules that are much simpler than if we applied the laws of quantum mechanics to them directly, and so atoms become the basic elements of chemistry, which captures many useful aspects of the world one level up from quantum mechanics. Replicators, in turn, are an emergent feature of chemistry, and they give rise to the biological sciences that describe life phenomena, one level up from chemistry.
Since the human brain is a very complex system that exhibits very interesting emergent features such as conceptual thinking and language, it is quite likely that there are levels of emergence within it. For while it is possible that conceptual thinking and language may emerge directly from the interaction of neurons, the two levels appear too alien to each other, like life emerging directly from quantum mechanics. I think there is no way we could ever hope to explain the emergence of life directly from the level of quantum mechanics, and I don't expect we will ever find a way to explain the emergence of conceptual thinking and language from neuron interactions. However, I think we may well be able to explain the emergence of attractor dynamics from neuron dynamics, and the emergence of conceptual thinking and language from attractor dynamics can at least be envisioned.
Picture this: The parts of the brain that are dominated by feedback dynamics, particularly the association cortex, form a network of networks... each constituent network with its own internal attractors, all sending output to each other that depends primarily on their current attractor, and flipping each other from one basin of attraction into another in a dance-like fashion. This system has the attractors of subsystems as its basic elements, just as atoms are the basic elements of chemistry, and we speak of dynamics in attractor space to refer to this. Such a dynamics in attractor space can emerge in any brain that contains areas dominated by feedback dynamics as opposed to input/output dynamics. I expect it to be present in many animals - in all vertebrates, for instance. Attractors can have semantic content (see section 4) and exist on time scales that are commensurate with the time scales on which animals have to act. Because of this they are much better suited to high-level information processing and action selection than raw neuron firings, which explains why evolution has led to their emergence.
In humans the dynamics in attractor space operates in a large enough system, with enough attractors to flip around, that it becomes possible for self-organization to become operative on this level and lead to the emergence of features one level up from the dynamics in attractor space. I find it likely that on this level we will find phenomena such as conceptual thinking and language - new and tenuous phenomena in the history of evolution. Most of our mental processes work on the level of the dynamics in attractor space - for instance, when we use our intuition to come up with new ideas, or wrestle with a difficult decision, these processes are unconscious. Why? I suggest because they are non-conceptual, operating at the level of attractor dynamics. This dynamics can be actively bent towards a specific problem, and we can consciously feed it new data or restate the precise terms of the problem; we can even be aware of the work being done in our brain (by feeling a tension and becoming tired)... but we cannot become conscious of the content of the work being done until the result suddenly pops into consciousness, fully formed.
I believe that a very good case can be made that there are numerous subsystems in the brain, particularly in the neocortex, and a good case that the dynamics of many of these subsystems is dominated by point attractors. But it is a pure leap of faith to assume that these subsystems interact with each other by output/input that depends primarily on their current attractor (or, stated more precisely, the basin of attraction they are currently in). Also, I don't see how investigations of the brain with current technology could be used to test the hypothesis that there is a dynamics in attractor space mediating between neuron dynamics and high level phenomena such as conceptual thinking and language. Instead, I intend to implement systems on computers that have a similar dynamics in attractor space, and to see what kind of phenomena will emerge from this dynamics. If phenomena emerge that are similar to high level phenomena in the brain, this would be an indication that the picture I have been painting may well capture an important aspect of reality.
The brain has the capacity to adapt to situations it encounters repeatedly - this process is called learning and is one of the features that makes having a brain such an advantage. Dynamics in attractor space as we have envisioned it so far does not implement learning, so we have to add that feature if we want to accurately re-implement the basic dynamics of the brain. The obvious candidate for this is some version of Hebbian learning, where the link from subsystem A to subsystem B gets strengthened when the attractor in A has contributed to flipping the attractor in B. This will have the effect that pathways in the brain that are used often become easier to use and more likely to be used over time.
Now imagine a herbivore peacefully grazing, the activity it engages in most often - its brain will be well entrenched in a set of well connected attractors. If the herbivore's brain dynamics in attractor space is implemented as we have envisioned it so far, it will be very hard indeed for new sensory input to destabilize the reigning attractors and flip the system into a new global state. But just that is needed when a predator appears, and fast! When the predator appears, the herbivore will switch its attention from grazing to predator evasion in a flash - this attention switching mechanism is part of the basic functioning of the brain, and we need to implement some kind of attentional system in our model too.
While learning can be implemented through local mechanisms, the attentional system is global in nature: A small change in a few subsystems concerned with sensory input must be able to destabilize attractors all over the entire system, and activation must be able to quickly spread from the subsystems that are concerned with the new situation. There is no one obvious way to implement this, so we will have to experiment with different possibilities. In particular, it seems overly restrictive to give the power to trigger a switch in attention to input subsystems only, and the whole system is more elegant if, instead of simply destabilizing all attractors in the system, we allow selective destabilization of some subsystems. We are thus led to extend our model by introducing a regulatory network, which takes input from the entire system (its main input being information about attractors that have been flipped), and which can output globally to anywhere in the system. Its output will be
1. An action that destabilizes selected attractors;
2. Temporary strengthening of links in order to facilitate the spreading of activation from subsystems upon which attention is focused;
3. Permanent small changes in link strength to implement learning.
A SONofANNs is a network of attractor neural networks with such a regulatory network attached to it.
In the brain, the rapid destabilization of some attractors and activation of others is almost certainly done through chaotic processes. Freeman has identified such processes in his extensive research into the olfactory bulb (see e.g. Skarda and Freeman[1987]). A SONofANNs with its regulatory network active will exhibit similar highly complex behavior. While the brain must use subsystems with strange attractors to achieve the rapid large scale shifting of state that accompanies a shifting of attentional focus, we can choose simpler mechanisms when implementing such processes in a computer. For instance, the destabilization of an attractor can simply be effected by injection of pseudorandom noise.
Section 2 builds on the book ``THE COLLAPSE OF CHAOS'' by Jack Cohen and Ian Stewart, who bring a refreshing new perspective to bear on the science of complexity. They complement the standard paradigm of reductionism that explains the behavior of a system through the interaction of its parts with a view that takes into account constraints imposed on a system by its interaction with other systems. Similar complementary views of the process of self-organization were developed by Ilya Prigogine, who pursued the bottom-up, reductionist approach, and Hermann Haken, who focused on how global attractor states constrain the possible dynamics of subsystems. Cohen and Stewart make a good case that the reductionist approach to complex systems is one-sided, and try to map out the territory from which a paradigm complementary to reductionism might emerge. They introduce some useful new terms, such as features and complicity, for which we give a short introduction here.
From complex systems emerge features, which are simple and significant. A feature can be an attractor the system is in, or a statistical average such as pressure or temperature in a gas - a hurricane would be a feature of the weather system, and yellow would be a feature of a large enough mass of sulphur atoms. When we switch levels of description, such as from quantum mechanics to chemistry or from chemistry to biology, we would ideally use the emergent features of the lower-level system as the elementary constituents of the higher-level system.
Complicity happens when two or more complex systems interact with each other through their emergent features. Cohen and Stewart develop this concept in depth using the example of the biological evolution that has happened on Earth. Giving detailed reasons we cannot repeat here, they make a very good case that evolution is not just variation in DNA-space and selection acting on the resulting creatures, but that DNA-space and creature-space both have their own dynamics dominated by attractors, and that the dynamics of the two spaces interact largely through their emergent features. They go on to argue that it is no coincidence that such amazing features have emerged from the evolutionary process, but that it can be attributed to the fact that they emerge from a system that does not just possess complexity, but goes one level beyond that into the realm of complicity. They demonstrate that complicity has the potential to lead to super-emergence, and cite consciousness as another example of super-emergence, implying that there must be complicity at work in the brain. While they do not work out in detail how complicity might be implemented in the brain, they make a very useful argument that the brain can be understood as being essentially a feature-detector.
When I finished reading ``The Collapse of Chaos'', I was so convinced of the usefulness of the ideas presented in the book that I decided to spend some more time with them to absorb them better. The best way to do this, I decided, was to try and find a simple toy realm where it might be possible to see complicity in action, and play with super-emergence. I did not think of the brain when I set out to do this, and was very astonished when I arrived at an architecture very similar to the one I had developed earlier based on my explorations of neuroscience. This is how I set out to create complicity from scratch:
The first task is to find toy realms that can be made to interact by using their emergent features as input/output. According to the ideas put forward in ``The Collapse of Chaos'', the specific choice of toy realm should not really be important, as long as we get the dynamics right. After some pondering, I settled on the Hopfield net as a suitable toy realm - this is a recurrent neural network commonly used in signal recognition, such as optical pattern recognition. When used this way, the Hopfield net operates on a rectangular grid where each cell can be either ``on'' (value +1, represented as black square) or ``off'' (value -1, represented as white square). One can choose the patterns to be recognized arbitrarily (as long as there are not too many), and then simply construct the connection matrix of the neural net from them. Each pattern becomes an attractor of the net thus constructed, and when the net is presented with a test pattern and allowed to iterate over it, it will converge to one of its attractor patterns. This is handy when trying to create complicity, because while the attractors are emergent features of the system, we can actually start with them and choose attractors we can easily recognize. Another great advantage of the (optical pattern recognition) Hopfield net is that it can be well represented on a computer screen, and a human observer can easily observe the dynamics and gain an understanding of it.
So the first step we take on the road to constructing complicity is to take several Hopfield nets and use a representation of the attractor one net is close to as an input for another. This representation can either be the current state of the net (assuming the state is close to an attractor the state could be said to represent that attractor), or the closest attractor itself. I tend towards the second alternative for several reasons, in particular:
1. This makes it possible to suppress the output of spurious attractors that arise in a Hopfield net, and that would make understanding of the dynamics more difficult.
2. I have elsewhere developed an argument that a necessary condition for an entity to have free will is that its brain must be a quantum effect amplifier. Switching from one output attractor to another in response to the smallest possible change in the system makes our model a better quantum effect amplifier and improves the chances that AI developed on this basis, when implemented in a quantum computer, will be able to develop free will.
When we connect several Hopfield nets by sending their current attractor as output to other nets in the neighborhood, all we get is one larger system that will follow its own dynamics, and that depending on its initial state will converge towards a point, cyclic, or strange attractor. Nothing but this attractor will ever emerge from it, and that surely cannot be called superemergence. So how can we enrich the system so that it may become possible for it to give rise to simple, yet amazing features? While there may be other approaches that have not occurred to me, the only thing I can think of is to give the system an input and an output. We understand systems with input and output well because we are such systems ourselves. In a truly closed system that we could only observe, how would we be able to say that an emergent feature is not only simple, but that it is significant? Can there even be significance without interaction? I don't think so. So this leads us to a network of Hopfield nets, interacting by sending output to each other that codes for their current attractor, and with some nets receiving input from the environment, and some giving output to it.
Immediately we face another obstacle on the road to complicity: The global dynamics of the network of Hopfield nets will have attractors, some of which will sit at the bottom of deep basins of attraction. How can an input change such an attractor (as it surely must be able to do, for sitting in one basin of attraction cannot be superemergence)? If we make inputs so strong that they simply flood the entire network with their signals, that would wipe out the attractor dynamics and defeat our purpose. What we need is selectivity and reactivity - a way to destabilize stubborn attractors in response to small changes of input, while most such changes should leave the overall state of the system unchanged. Without that type of dynamics there is no hope for achieving super-emergence, and as far as I can see there is no way to get this type of selectivity and reactivity in our current architecture by fiddling around with parameters - we need to add a regulatory network to the network of Hopfield nets. This regulatory network must be able to destabilize attractors in some situations, but must leave well enough alone most times.
The development of the architecture thus far is derived purely from the intention to build complicity from scratch, but at this point it becomes obvious that we are constructing something rather brain-like, and it makes sense to take some inspiration from what is known about the brain to improve our chances of getting to complicity. The regulatory network we just introduced bears a resemblance to an attentional subsystem in the brain, and as such it should not only destabilize attractors, but also facilitate fast spreading of activation from Hopfield nets that are the focus of attention. This is the SOHN architecture (Self-Organizing Hopfield Nets, a special case of the SONofANNs). If we want to see superemergence at work in a SOHN, we will have to mount an extensive search through the space of parameters describing possible SOHNs, and this will probably be simplified if we use some type of Hebbian learning as outlined in section 1. Now we have arrived at the same architecture that was developed there, except that we have specialized the attractor neural networks to Hopfield nets.
If we want to test large numbers of SOHNs for the presence of superemergence, it would help if we knew what we are looking for specifically. ``Simple features of great significance'' is a somewhat nebulous goal to shoot for. Since our toy realm has turned out to look a lot like the brain, and an essential function of the brain is feature detection, we can focus our search by looking for SOHNs that perform feature detection. Cohen and Stewart write (The Collapse of Chaos, p. 425): ``... feature detectors are themselves features, so a generalized feature detector will be self-referential. We give the label ``consciousness'' to our feature-detection system: We become conscious of a feature of the world when our brain detects it. Therefore consciousness is self-referential - that is, we are conscious that we possess consciousness... What needs explaining is not the self-referential nature of consciousness, but how it can take a huge quantity of partially structured sensory data and extract important features - the same features that ``natural'' interactions see.'' While I think that the first and more primitive feature detection systems we create will only be able to detect a few features of their world, and will not be self-referential, I believe that Cohen and Stewart are on to something with their understanding of consciousness as feature detection, and that if we could get feature detection to emerge in a computer, we would have made a significant step towards true artificial intelligence.
In order to be able to do feature detection, a SOHN must be situated in a world that makes sense - meaning that a human who would take note of the data that the SOHN receives as input would be able to consistently extract features from these data. We will discuss the choice of such a world in Section 5. It is quite possible, however, that the SOHN architecture does not yet have sufficient complicity to allow for the emergence of feature detection. We therefore must map out a path on which we can proceed to generalize the structure of the SOHN in order to achieve greater complicity. The obvious path for this is recursive modularization, which can be applied equally well to SONofANNs, and which probably is implemented in the brain to some degree. We define recursively:
SON1ANNs := SONofANNs (or, more specifically, a SOHN)
SONkANNs := SONofSONk-1ANNs, i.e. a self-organizing network with SONk-1ANNs as its elementary constituents.
If we cannot get feature detection in a SOHN, we will investigate the space
of SON2ANNs, with Hopfield nets as the most basic ANNs. A simple
SON2ANNs is shown in the diagram below, which depicts 3 interacting SOHNs, each of which consists of a 4 x 4 array of Hopfield nets.
Initially, we would have evolved one SOHN, which would have its input and output in its top row. Then, when we clone this SOHN, one of them gets the input deactivated, the other the output. At first, both SOHNs would have the same attractor patterns and dynamics, but soon they would be allowed to diverge. When a third SOHN is introduced as in the diagram, it would make sense to make the interaction between the two original SOHNs weaker, and interactions with the new unit stronger, so as to force an interaction of the input and the output SOHN through the intermediary. Later more SOHNs could be added.
If we go to this level of complexity we must decide how the SOHNs should interact. I think the simplest way to connect two SOHNs is to let their elementary modules have strong connections with modules in other SOHNs that have the same placement in the layout of SOHNs (such as the 3 blackened modules in the diagram above), have weak connections with modules close by, and no connections with modules far away. This would have the advantage that SOHNs can be cloned and introduced into the SON2ANNs without disrupting the behavior patterns that have already been evolved. Later the dynamics within the new SOHN and its connections to other SOHNs can be modified to give access to new modes of feature detection and action selection.
The amazing emergence from the process of evolution may have come about because the two underlying spaces, DNA-space and creature-space, each have a very strong internal dynamics, and the two dynamics are very different from each other - it is plainly impossible to find a straightforward mapping between genes that code for proteins and creature features such as wings, eyes, and beaks. We should therefore make sure that the dynamics in different SOHNs become quite distinct. An important aspect of this is choosing the symbols that define the attractors of the Hopfield networks.
Initially we should take a cue from the brain, where neurons are most likely to be connected with other neurons that are close by, and connect the Hopfield nets in our SOHNs most strongly with their neighbors. If we do that, and neighboring Hopfield nets have similar attractor patterns, we will get an associative network. The recursive modular structure we introduce when we go to the SONkANNs introduces a hierarchical structure into our global network: The reigning attractors in the higher level networks will constrain and dominate the dynamics in the lower level networks. This coexistence of an associative network and a hierarchical network in the same network structure is the dual network that Ben Goertzel[1996] has identified as essential for cognition. He has initiated a major project to actually build a true artificial intelligence based on the dual network paradigm.
We describe the state of a Hopfield net by a column vector x with N entries, one per neuron, each of which is either +1 or -1. To store p chosen attractor patterns x_1, ..., x_p, the connection matrix is constructed as

W = x_1 x_1^T + x_2 x_2^T + ... + x_p x_p^T - p I

where the term -p I zeroes the diagonal, so that no neuron feeds back directly on itself.
When the Hopfield net is running, it updates itself according to the basic formula

x_j <- sgn( sum_k W_jk x_k ),

applied to one randomly chosen neuron j at a time. The Hopfield net can be analyzed with the energy function

E(x) = -(1/2) x^T W x,

which never increases under this update rule, so the net must settle into one of the local minima of E - its attractors.
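To make this concrete, here is a minimal sketch of the construction, update rule, and energy function in Python; the function names are illustrative, not part of any specification:

```python
import numpy as np

def make_hopfield(patterns):
    """Construct W = x_1 x_1^T + ... + x_p x_p^T - p*I from the p chosen
    attractor patterns (each a +/-1 vector of length N)."""
    N = patterns[0].size
    W = np.zeros((N, N))
    for x in patterns:
        W += np.outer(x, x)
    return W - len(patterns) * np.eye(N)   # -p*I zeroes the diagonal

def update(W, x, steps=1000, rng=None):
    """Asynchronous updating: repeatedly pick a random neuron j and set
    x_j = sgn(sum_k W_jk x_k)."""
    if rng is None:
        rng = np.random.default_rng()
    x = x.copy()
    for _ in range(steps):
        j = rng.integers(x.size)
        x[j] = 1 if W[j] @ x >= 0 else -1
    return x

def energy(W, x):
    """E(x) = -1/2 x^T W x; never increases under asynchronous updates."""
    return -0.5 * x @ W @ x
```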
Unfortunately, there are normally more attractors in the network than the ones we have chosen. If x is an attractor, then -x is also an attractor. When we translate our state vectors into patterns on a grid, this means that for every attractor pattern, the reversed pattern will also be an attractor. One way of dealing with this is to always choose attractors in pairs: A pattern and its reversed pattern. However, we may not always want to do this. Even more bothersome are mixtures of attractors: From every combination of an odd number of attractors x_i1, ..., x_i(2n+1) we can construct a new attractor s according to

s = sgn( x_i1 + x_i2 + ... + x_i(2n+1) ),

taking the sign componentwise.
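A small sketch of how these mixture attractors could be enumerated for inspection, assuming the patterns are stored as numpy vectors of +1/-1 values (only mixtures of three patterns are checked here):

```python
from itertools import combinations, product
import numpy as np

def spurious_mixtures(patterns):
    """Enumerate mixture attractors s = sgn(+/-x_a +/- x_b +/- x_c) built
    from any three design patterns, and return those that are not (up to
    sign) design patterns themselves. For +/-1 vectors a sum of three odd
    values is never zero, so no tie-breaking is needed."""
    designs = {tuple(sign * x) for x in patterns for sign in (1, -1)}
    found = []
    for a, b, c in combinations(patterns, 3):
        for sa, sb, sc in product((1, -1), repeat=3):
            s = np.sign(sa * a + sb * b + sc * c).astype(int)
            if tuple(s) not in designs:
                found.append(s)
    return found
```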
There are three sensible ways in the context of a SOHN to deal with spurious attractors, and I consider it best to let the user decide which of these methods to choose. When initially choosing a set of attractor patterns, and also when pausing between runs to reorganize the network, the user will be presented with the design attractor patterns and all the spurious attractor patterns these give rise to, and the user will be able to choose between these alternatives:
1. Treat the spurious attractor as a regular attractor, meaning it will then become one of the possible outputs of the net.
2. Give instructions to the regulatory network that whenever the net gets close to this attractor, it should be kicked towards another specifically chosen regular attractor.
3. Give instructions to the regulatory network that whenever the net gets close to this attractor, it should be destabilized, without any bias as to where it will go next.
Choosing option 3 turns the basin of attraction of the destabilized attractor into a region that (provided the destabilization is done just right) is placed on the edge of the basins of attraction of all the attractors in the net. Choosing option 2 means that the attractor towards which the net is kicked has two basins of attraction: Its own and the one of the unwanted attractor that gets destabilized. These two basins can be disconnected and far away from each other. Both of these are interesting and potentially powerful situations that cannot be readily achieved in an ANN without a regulatory net. However, they can also be problematic - in particular, understanding the dynamics of systems where attractors have several disjoint basins of attraction will be difficult. Therefore, option 2 should normally only be chosen when the unwanted spurious attractor is very close to a regular one, so that two adjacent basins of attraction will be merged.
If we want to combine Hopfield nets into a SOHN, we must choose their input/output connections. In subsection 2.2 we gave reasons why we want to use the current attractor of the net as its output. Ideally, we would like to use the attractor whose basin of attraction contains the current state of the system. However, this would be very expensive to compute, so we just choose the regular attractor the current state is closest to in Hamming distance (the number of cells in which two patterns differ) as the output, naming this the current attractor. This elegantly takes care of spurious attractors, which can never become output this way, unless they have been specifically elevated to the status of regular attractor by the user.
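A sketch of this output rule; the helper below simply scans the user-designated regular attractors for the one nearest in Hamming distance:

```python
import numpy as np

def current_attractor(state, regular_attractors):
    """Return the regular attractor closest to the current state in Hamming
    distance. Spurious attractors can never become output this way unless
    the user has promoted them to regular status."""
    return min(regular_attractors, key=lambda a: int(np.sum(a != state)))
```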
Regular processing in a SOHN (as opposed to attentive processing, which is covered later) is done as follows: Let M be the number of neurons that are updated inside each Hopfield net per second according to the process described in subsection 3.1. With N(t) a real-valued function that is an output of the regulatory network, (int)(M * N(t)) is the number of link transmissions per second from one Hopfield net to all the nets it is connected to. To describe how link transmissions proceed, we must first define link strength: For every pair (I, J) of Hopfield nets there is a real-valued link strength LS_IJ that determines how strongly output from I influences J.
In a computer program that shows the SOHN and its environment in real time, it is not known in advance how much time will be taken up by one simulation tick - one tick includes one complete update of all SOHNs, the execution of all actions that have been chosen by actors in the world, including SOHNs, the update of the world to the new status resulting from these actions, and the display of the new state of the SOHNs and their world to the screen. Let dt be the time that was used by the previous simulation tick, measured in seconds - then we first do M * dt updates of each Hopfield net according to its internal dynamics. We then compute the current attractor for each net, and store the results so we can use them in the transmission step, which comes next: Every Hopfield net does M * N(t) * dt link transmissions of its current attractor to the nets it is connected to. Since the updates of each net according to its internal dynamics will stabilize the current attractors, and the link transmissions will often destabilize them, N(t) can be used to tune the amount of attractor flipping that will occur in the nets, with a higher N leading to more flipping. Thus the most basic function of the regulatory network is to change N(t) over time to keep a ``good'' level of activity going in the SOHN.
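Assuming a minimal HNet container (an illustration, not the paper's specification; the receive rule in particular is an invented stand-in, since the text does not fix how a transmission perturbs the receiving net), one tick of regular processing could look like this, reusing update and current_attractor from the earlier sketches:

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass(eq=False)
class HNet:
    """Minimal container for one Hopfield net inside a SOHN (illustrative)."""
    W: np.ndarray                      # connection matrix
    state: np.ndarray                  # current +/-1 state vector
    regular_attractors: list           # patterns eligible as output
    links: dict = field(default_factory=dict)   # neighbor HNet -> LS_IJ

    def receive(self, attractor, strength):
        """One link transmission: copy a strength-sized fraction of the
        transmitted attractor's cells into this net's state. Returns True
        if this flipped the net's current attractor. (Assumed rule.)"""
        before = current_attractor(self.state, self.regular_attractors)
        mask = np.random.random(self.state.size) < strength
        self.state[mask] = attractor[mask]
        after = current_attractor(self.state, self.regular_attractors)
        return not np.array_equal(before, after)

def regular_processing_tick(nets, M, N_t, dt, rng):
    """One simulation tick of regular processing as described above."""
    # 1. Internal dynamics: M*dt asynchronous updates stabilize each net
    #    toward the attractor whose basin it currently occupies.
    for net in nets:
        net.state = update(net.W, net.state, steps=int(M * dt), rng=rng)
    # 2. Snapshot current attractors so every transmission in this tick
    #    uses the same values.
    out = {net: current_attractor(net.state, net.regular_attractors)
           for net in nets}
    # 3. (int)(M * N_t * dt) link transmissions per net; a higher N_t means
    #    more attractor flipping, so the regulatory network tunes the level
    #    of activity by adjusting N_t over time.
    for net in nets:
        for _ in range(int(M * N_t * dt)):
            for neighbor, LS in net.links.items():
                neighbor.receive(out[net], LS)
```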
Regular processing steadily goes on in the background, settling the SOHN into its current global attractor, but much of the work of the SOHNs is done during attentive processing. The regulatory network initiates attentive processing by giving one or more nets I_k attention in the real-valued amount A_Ik. Normally, the regulatory network will do this because attractors in the I_k's have flipped - in particular, whenever an attractor in an input net flips, the regulatory net will give this input net a large amount of attention. However, in advanced SOHNs attention may also be given to nets whose attractors have not flipped, e.g. to nets that maintain a memory of a previous input.
Attentive processing proceeds according to the following steps:
1. All nets I with attention send their current attractor as output to all nets J they are connected to, with strength LS_IJ * A_I.
2. We determine whether any attractors have flipped because of this transmission. If not, attentive processing ends. Otherwise, we
3. Pass attention on from the parents to the newly flipped nets. If net I has participated in flipping attractors in nets J_1, ..., J_m, then net J_k (with k in {1, ..., m}) gets the following amount of attention:

A_Jk = ( LS_IJk / (LS_IJ1 + ... + LS_IJm) ) * A_I,

so that net I's attention is divided among the nets it helped flip, in proportion to link strength.
In the attentive process, attention spreads through the SOHN from net to net, with the amount of attention present only being reduced when a net with attention does not participate in flipping another. In that case, what attention it had is lost. In most circumstances, an injection of attention into the SOHN will result in a brainwave: A chain reaction of attractors flipping each other, with the attractors that flip during one phase of the above iteration arranged in a line, and the line moving across the SOHN like a wave. If we choose link strengths that are stronger in the direction from the input areas to the output areas than vice versa, then these brainwaves will tend to travel towards the output areas, allowing the SOHN to react rapidly to changed circumstances in its environment. Without a mechanism such as brainwaves, the reaction times of SOHNs would deteriorate proportionally to their size, and become unacceptable in large SOHNs.
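A sketch of the whole attentive loop, again using the illustrative HNet container from above; the division of a parent's attention among the nets it flipped follows the attention-splitting formula in step 3:

```python
def attentive_processing(attention):
    """Spread attention through the SOHN. `attention` maps nets to
    real-valued attention amounts A_I; the resulting chain reaction of
    attractor flips is the brainwave."""
    while attention:
        flipped_by = {}   # parent net I -> list of (flipped child J, LS_IJ)
        for I, A_I in attention.items():
            out = current_attractor(I.state, I.regular_attractors)
            # Step 1: transmit the current attractor with strength LS_IJ * A_I.
            for J, LS in I.links.items():
                if J.receive(out, LS * A_I):
                    flipped_by.setdefault(I, []).append((J, LS))
        # Step 2: if nothing flipped, attentive processing ends and the
        # remaining attention is lost.
        if not flipped_by:
            break
        # Step 3: pass attention from parents to the newly flipped nets,
        # divided in proportion to link strength, so that a parent's
        # attention is conserved across the nets it flipped.
        new_attention = {}
        for I, children in flipped_by.items():
            total = sum(LS for _, LS in children)
            for J, LS in children:
                new_attention[J] = (new_attention.get(J, 0.0)
                                    + (LS / total) * attention[I])
        attention = new_attention
```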
While the brainwave triggered by an injection of attention into the SOHN happens during one simulation tick, the user will have the option to pause the simulation whenever a brainwave happens, and watch it spread over the SOHN, flipping attractors. Since brainwaves are a very important, and possibly dominant part of processing in the SOHN, repeated observation of this process will likely be essential to fine-tuning the network. I envision the ultimate function of the regulatory net to be orchestrating the brainwaves, so that their interaction may become the basis for superemergence.
Whether a brainwave can spread, or comes to an abrupt halt, depends largely upon the degree of similarity (i.e. the Hamming distance) between the attractors in neighboring nets. If there is enough similarity between the attractors of linked nets, we can speak of a semantic network: Some attractor patterns will have a meaning because they are input patterns the SOHN receives from its environment, and others have a meaning because they are output patterns that determine what the actor who has the SOHN as its brain will do next to change its world. As long as the SOHN basically works as an associative network, with attractor patterns activating similar patterns in neighboring Hopfield nets, the meaning that is naturally present in the attractors of input and output modules will tend to spread through the network. If the attractor patterns are chosen carefully, they can propagate semantic content through large parts of the network, but they can also be chosen so as to break semantic connections and dissipate meaning. This is up to the user to decide, and I expect the sculpting of attractor patterns to be of paramount importance on the quest for superemergence. Here is an example of a possible set of attractor patterns for a Hopfield net:
These 4 attractors have 20 ``on''-cells each, none of which overlap. The Hamming distance between any two of them is 40. The spurious attractors that arise as mixtures of these 4 patterns are all identical to the empty pattern (every cell is ``off''), and those that arise as mixtures of reversed patterns are the full pattern (every cell is ``on'').
A neighboring Hopfield network (with strong connections to the network with the attractors above) could have the following attractors:
These 3 attractors have 40 ``on''-cells each, combining the ``on''-cells of the first and second, second and third, and third and fourth attractors of the first net respectively. The Hamming distances between the first and second, and between the second and third attractors are 40 each, while the distance between the first and third is 80. More interesting, however, are the distances to the attractors in the first net. For instance, the first attractor in the second net has a distance of 20 to both the first and second attractors of the first net, but of 60 to the other two attractors. When these nets give their current attractor as input to each other, there is an interesting propagation of semantic content - e.g., pattern1/net1 will only activate pattern1/net2, while pattern2/net1 will activate pattern1/net2 and pattern2/net2. Considering the Hamming distances between the attractors in the two nets, we could say that 61/81 of the semantic content of pattern1/net1 gets propagated to pattern1/net2 (that is, 1 - 20/81 on a grid of 81 cells), and we could try to define semantic content for all patterns in the net this way, spreading out from the input and output patterns. However, this is a linear operation in a highly nonlinear system which can only be a crude approximation, valid over short distances.
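These two pattern sets are easy to verify in code. The sketch below assumes a 9 x 9 grid of 81 cells (suggested by the 61/81 fraction above, though the text does not state the grid size explicitly):

```python
import numpy as np

def hamming(a, b):
    return int(np.sum(a != b))

N = 81   # assumed 9 x 9 grid
net1 = []
for i in range(4):
    x = -np.ones(N, dtype=int)
    x[20 * i : 20 * (i + 1)] = 1           # 20 non-overlapping "on" cells
    net1.append(x)

# Any two of the 4 patterns differ in 2 * 20 = 40 cells.
assert all(hamming(a, b) == 40
           for i, a in enumerate(net1) for b in net1[i + 1:])

# Odd mixtures of disjoint patterns collapse to the empty pattern.
assert np.all(np.sign(net1[0] + net1[1] + net1[2]) == -1)

# Second net: unions of consecutive pairs of the first net's "on" cells.
net2 = []
for i in range(3):
    y = -np.ones(N, dtype=int)
    y[20 * i : 20 * (i + 2)] = 1           # 40 "on" cells
    net2.append(y)

assert hamming(net2[0], net2[2]) == 80     # first vs. third attractor
assert hamming(net2[0], net1[0]) == 20     # close: contains all of net1[0]
assert hamming(net2[0], net1[2]) == 60     # far: disjoint "on" cells
```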
It is easy to envision a situation in which semantic connections between adjacent networks get broken: If the set of attractor patterns in the first network consists of ``on''-columns only, and in the second network of ``on''-rows only, there is no correlation between them, no attractor pattern in one net will favor any pattern in the other net over any other, and meaning cannot propagate directly between these two nets. If such nets are adjacent to each other in the network, it is probably best to have no connection between them. However, if they are some distance from each other, it is possible to construct an intermediary net with attractors consisting of ``on''-cells that are part row, part column, so that the ``column''-attractors will favor specific attractors in the intermediary net, and these in turn will favor specific ``row''-attractors. In fact, the intermediary net can be chosen so as to favor an arbitrary one-to-one mapping between the column and row patterns. This implies that the same attractor patterns can have different, possibly contradictory ``meaning'' in different parts of the net: Semantic content is local in nature.
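As a sketch of how such an intermediary could look (again assuming a hypothetical 9 x 9 grid, and choosing half-column/half-row patterns as just one of many possible constructions):

```python
import numpy as np

def grid_pattern(cells):
    """9 x 9 grid flattened to 81 cells; the listed (row, col) cells are
    'on', the rest 'off'."""
    x = -np.ones(81, dtype=int)
    for r, c in cells:
        x[9 * r + c] = 1
    return x

def hamming(a, b):
    return int(np.sum(a != b))

cols = [grid_pattern([(r, i) for r in range(9)]) for i in range(9)]
rows = [grid_pattern([(i, c) for c in range(9)]) for i in range(9)]

def intermediary(i, j):
    """Half-column / half-row pattern linking column pattern i to row
    pattern j."""
    return grid_pattern([(r, i) for r in range(5)]
                        + [(j, c) for c in range(5)])

# The intermediary is nearest (in Hamming distance) to exactly one column
# pattern and one row pattern, so it can mediate an arbitrary one-to-one
# mapping between the two otherwise uncorrelated symbol sets.
m = intermediary(3, 7)
assert min(range(9), key=lambda k: hamming(m, cols[k])) == 3
assert min(range(9), key=lambda k: hamming(m, rows[k])) == 7
```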
If we envision the two nets whose attractor patterns we have depicted as parts of a SOHN, with the first net receiving input from the environment, the second receiving it from the first for further processing, the semantic connection between them has interesting consequences: Depending on the state of the second network, it will react to some attractor changes in the first net, but not to others - and by feeding back into the first net, it will make it more likely for two attractors to become dominant than the two others. This mirrors the well known fact from psychology that the content of our perception is determined not only by sensory input, but also by expectations and the content of our working memory.
While we keep our Hopfield nets as square patterns, we can define neighbor any way we want - for instance, we can initially connect each net with its six neighbors in the arrangement shown below. This geometry seems well suited to the smooth spreading of brainwaves. Other arrangements are possible, including arrangements where some nets have (many) more neighbors than others.
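One simple way to realize such a six-neighbor arrangement is a hexagonal packing in axial coordinates; this layout is an illustrative assumption, not a prescription from the text:

```python
# Each Hopfield net keeps its internal square grid, but the nets themselves
# sit in a hex packing at axial coordinates (q, r), giving each net up to
# six equidistant neighbors.
HEX_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def hex_neighbors(q, r, nets):
    """Return the nets adjacent to position (q, r); `nets` maps axial
    coordinates to HNet instances."""
    return [nets[(q + dq, r + dr)] for dq, dr in HEX_DIRS
            if (q + dq, r + dr) in nets]
```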
Imagine that most nets are connected to all their neighbors, but that the exceptions, when they occur, are often such that there is a line through the SOHN that is not crossed by connections, and that has dissimilar attractor patterns on its two sides. Let us call a neighborhood of nets that is bounded on two sides by such lines (one of which may also be the outside border of the SOHN) a tube - then a brainwave could impinge on the openings of several tubes, enter some of these, undergo different transformations of its semantic content while passing through, then emerge at the other ends of the tubes (which may be close together or not) with different semantic content. Some very fancy information processing could be done this way. Of course, as soon as we have a large SOHN and are a good distance from both input and output areas, it will not be possible to get semantic content by simply transmitting it from those areas. In such areas, semantic content will have to emerge from the overall dynamics of the SOHN. I have a hunch that a large part of the arcane art of building a SOHN (or SONkANNs) that will manifest superemergence will consist of choosing network connectivity and attractor patterns that allow for the emergence of new and unplanned semantic content in regions of the net far from the input and output areas. If such content were stable over large areas of the SOHN, it could be compared to language, and its manipulation by the SOHN to conceptual thinking.
The world the SOHNs act in must be complex enough to contain emergent features which are not directly contained in the sensory data we feed the SOHN, but which can be extracted from those data. We cannot know beforehand at what level of complexity this can happen, so just as we developed a whole continuum of brain architectures, we must also develop a continuum of worlds, starting rather simple and growing in complexity. Then we must evaluate the behavior of different brains in these worlds, trying to sample the entire space of possible SOHNs.
Obviously, this is a gargantuan task. Would two years of work by a dozen graduate students be enough to determine whether this approach will work or not? Only if between them they have enough luck, inspiration, and genius to find attractor patterns, connectivity, and a regulatory network that yield a feature-detecting SOHN. My guess is that really good SOHNs are few and hard to find, and that it will take a lot of time to evolve one that does significant feature detection. There are only two ways that can be done: by a rich corporation or individual, or by a mass of volunteers. I prefer the second alternative, but I don't think that volunteers can be found if the world and the tasks the SOHNs are solving are boring. Volunteering for this project must be fun. The world must be a game.
While I would very much like to invent an original game that breaks with the established genres of computer gaming, every such attempt is very risky and likely to result in a boring gaming experience. Even if the game is cool, its unfamiliarity would imply a steep learning curve, which would deter most people. The existing game types are established because they are fun to play for many people, and they represent a wealth of accumulated experience with what works and what doesn't. Therefore, and rather reluctantly, I intend to put the SOHNs into game worlds that are modelled on existing games, particularly from the genre of strategy games.
In a standard computer game, the user directly controls his actors by telling them where to go and what to do. In our virtual world, actors will be controlled by SOHNs (hereafter referred to as ``brains''), and the user will not be able to give direct commands to his actors, but will instead have the ability to influence their brains. The internal state of their brains will be accessible in real-time on-screen, and the user will be able to ``cast spells'' on brain modules. These spells will have effects such as:
1. Sending the current attractor of a selected net as output to another selected net;
2. Stabilizing or destabilizing the attractor in a selected net;
3. Giving attention to a selected net or set of nets (likely triggering a brainwave).
When a user casts a spell on an actor's brain, he will usually try to influence the workings of the brain so as to make the actor take a specific action. However, all spells also have a secondary effect in that they change the brain structure to make this action more likely to occur in similar situations in the future. This is done in the following manner:
1. Whenever the current attractor of a net is sent as output to another net, the link from the first to the second net gets strengthened, or if such a link does not exist yet, it is created. Link strengths can never be directly weakened by spell actions, but overall link strength is kept at the same level by automatically reducing other link strengths when some are increased by spell actions or Hebbian learning (see the sketch after this list).
2. When the attractor in net J is destabilized, we want to make this more likely to happen again in similar situations in the future. There are two ways to do this: If the user knows that he wants to destabilize net J because the attractor in net K has just flipped, he can make the connection directly - if he does not supply such information, the program will identify the last m attractors that have flipped, and increase their chances of causing destabilization in net J, weighted by the inverse of the elapsed time.
3. Giving attention to a selected net will in most circumstances trigger a brainwave, and thus is the strongest form of intervention the user can make. It should be handled with care - in a single-user game, the game should be paused, and the user prompted to specify a recent input to the regulatory network that should become more likely to cause this shift of attention in the future.
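A minimal sketch of the first of these side effects, reusing the illustrative HNet container from the earlier sketches: one outgoing link is strengthened (or created), and the rest are renormalized so the total outgoing link strength is conserved.

```python
def strengthen_link(src, dst, delta):
    """Spell side effect 1 (sketch): strengthen or create the link
    src -> dst, then rescale src's outgoing links so that the total
    outgoing link strength stays constant - links are never weakened
    directly, only by this automatic renormalization."""
    total_before = sum(src.links.values())
    src.links[dst] = src.links.get(dst, 0.0) + delta
    if total_before > 0:                 # nothing to conserve on first link
        scale = total_before / sum(src.links.values())
        for target in src.links:
            src.links[target] *= scale
```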
By interacting with his actors through casting spells on their brains, the user will train his actors to become more efficient - but equally important, the user will spend many hours intently observing the way the brain reacts to changing inputs, and will thus gain deep insight into the intricacies of its functioning. This will allow the user to incrementally design better brains by changing attractor patterns, adding new modules, changing connectivity, etc. The program should have utilities that make it easy to redesign brains in this manner. The game experience should consist of these steps:
1. Play the game, casting brain spells, trying to achieve a good score;
2. Re-design brains, and go to step 1 until satisfied, then:
3. Compete online with other users who have gone through the same process.
This will have to be coordinated through a web site that posts a ranking of the users. The best brains will always be posted and available for download and further modification by anyone. The team developing the software must stay on top of this process, and whenever a plateau is reached and brains improve only marginally (and only then), the virtual world must be updated and made more complex, so that there are more features for the brains to detect. Our long term goal is to have a 3D-rendered world, where the entities get the rendered scene as input. Brains that can function in such a world could be downloaded into robots, who could then act effectively in the real world.
There are many possible ways to implement this vision. The essential points to aim for are:
1. Users can introduce new brain architectures into the evolutionary arena;
2. Successful brains will spread through the user community.
This is an evolutionary process leading to more effective brains, but the source of variation is not chance, but user intelligence. The users become intimately familiar with the workings of the brains, gain an intuitive understanding of the processes involved, and then introduce ``mutations'' based on the insight they have gained. Selection of successful contenders happens very fast in the competitive arena of internet play. Such an evolutionary process should work orders of magnitude faster than the pure-chance based evolutionary process that gave rise to humans. If this evolutionary process is carried on long enough on ever faster computer networks, there is no reason why this process should not eventually lead to genuine consciousness in a computer. Except, maybe, that such a thing might be impossible... but even then we would have learned something of value.
Like I said, I am not happy with implementing this project in a world that is modelled on a strategy game. Classic strategy games focus on greedy resource exploitation, and fights with neighbors. While it may be argued that this is very human, it may be prudent to create machine consciousness only if it is of a more advanced type, friendly or at least non-disruptive towards its environment and its neighbors (such as us humans). After all, if we manage to build machines that are genuinely intelligent, then such machines, who will be potentially immortal and able to sleep for thousands of years, will find it easy to spread throughout the galaxy and to shape it in their image. Their mindset will be of immense importance, and we may want them to be less competitive than humans.
On the other hand, if we try to create conscious agents through an evolutionary process, then they must compete with each other all the way, with the weaker ones being denied offspring, so the survivors will of necessity be highly competitive. This is a problem.