Hackademia Retreat: Sub-Project Darwinian Neurodynamics: Video of Nao in the Jolly Jumper

Darwinian Neurodynamics

Chrisantha Fernando and Alex Churchill are the principal investigators in the Darwinian Neurodynamics sub-project, which develops EROS (Evolutionary Robot Operating System) on a range of 3D-printed, Arduino-based robot platforms and the Nao humanoid robot, including the one that Berit and Chris are making. DN algorithms may serve as methods for autonomous epistemic exploration, supporting curiosity and creativity, in a range of sub-projects during the week.

This page contains notes on cutting-edge developments in our thinking about Darwinian cognitive architectures.

Video of Nao in the Jolly Jumper: Experiment 1.

The video below shows our first experiment with the Nao in the Jolly Jumper, evolving the parameters of an actor molecule. The fitness is the sum of the first derivative of the accelerometer values in the rostro-caudal (up-down) axis, i.e. the amount of bounciness. Dynamic movement primitive (DMP) atoms are used to encode a range of movements. The coupling constants and the internal parameters of the DMPs are evolved using a microbial GA with a population size of 10. The whole experiment runs for about 800 evaluations, each lasting about 8 seconds. The fitness graph is also shown.
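As a rough illustration of the search loop, here is a minimal microbial GA (Harvey's steady-state tournament scheme) with a population of 10 and an 800-evaluation budget. Everything about the genome is an assumption: GENOME_LEN, REC_PROB, MUT_SIGMA and toy_trial are hypothetical stand-ins, since the real fitness came from an ~8 s trial on the Nao. The bounciness function sums the absolute first differences of the z-accelerometer trace. Taking the absolute value is our assumption, because a plain sum of first differences telescopes to the last sample minus the first.

```python
import math
import random

POP_SIZE = 10        # population size used in Experiment 1
N_EVALUATIONS = 800  # roughly the length of the run
GENOME_LEN = 12      # hypothetical number of evolvable DMP/coupling parameters
REC_PROB = 0.5       # hypothetical probability of copying each gene from winner to loser
MUT_SIGMA = 0.1      # hypothetical Gaussian mutation step


def bounciness(z_acc):
    """Sum of absolute first differences of the z (up-down) accelerometer trace."""
    return sum(abs(b - a) for a, b in zip(z_acc, z_acc[1:]))


def toy_trial(genome, steps=80):
    """Hypothetical stand-in for an 8 s robot trial: returns a z-accelerometer trace."""
    freq = abs(genome[0]) + 0.1
    amp = sum(abs(g) for g in genome[1:]) / len(genome)
    return [amp * math.sin(freq * t) for t in range(steps)]


def evaluate(genome):
    return bounciness(toy_trial(genome))


def microbial_ga(seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
    evaluations = 0
    best = -math.inf
    while evaluations < N_EVALUATIONS:
        # Pick two distinct individuals and (re-)evaluate both: 2 evaluations per tournament.
        i, j = rng.sample(range(POP_SIZE), 2)
        fi, fj = evaluate(pop[i]), evaluate(pop[j])
        evaluations += 2
        best = max(best, fi, fj)
        winner, loser = (i, j) if fi >= fj else (j, i)
        # The loser is "infected" with some of the winner's genes, then mutated.
        for k in range(GENOME_LEN):
            if rng.random() < REC_PROB:
                pop[loser][k] = pop[winner][k]
            pop[loser][k] += rng.gauss(0.0, MUT_SIGMA)
    return pop, best
```

Calling microbial_ga() returns the final population and the best fitness seen. Because each tournament costs two evaluations, an 800-evaluation budget corresponds to about 400 tournaments. Re-evaluating both contestants every time suits a noisy physical robot.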

Halfway through, the cushion is removed and the Nao has to readapt to the hanging condition, where its feet can no longer touch the ground. It successfully adapts to this new condition using a different motor strategy. We did not test whether it had forgotten the original solution by putting the cushion back, because I am sure it will have forgotten it: the archive is not being used.

An early description of the architecture can be found here and here, but a paper with a full description is expected shortly in Frontiers in Cognitive Science. We are also giving a talk and a demo at Living Machines 2013 here, where Terrence Deacon is also speaking.

The actor molecule whose parameters we evolve is shown below. Its topology is held fixed, and the sensor and motor identities are not allowed to mutate either. The DMG parameters (coupling constants, a, b, time constants, and the weights and properties of the radial basis functions) are allowed to mutate, as are the weights in the linear transforms of the two foot reflexes, which move the knee and ankle as a linear function of the sum of the 4 force sensors on the feet.
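Here is a minimal sketch of the evolvable parts of this fixed-topology molecule. It assumes a simplified Ijspeert-style DMP: a point attractor with gains a and b, a time constant tau, and a normalised RBF forcing term driven by a decaying phase variable. The linear foot reflex is included alongside it. The class names, default values and the constant coupling input are illustrative assumptions, not the EROS code, and the Nao joint names are only examples. The mutation operators perturb numeric parameters and never touch the motor or joint identities, which mirrors the constraint described above.

```python
import math
import random
from dataclasses import dataclass, field, replace


@dataclass
class DMPAtom:
    """Simplified Ijspeert-style DMP for one joint (defaults are illustrative)."""
    motor_id: str                 # fixed: not subject to mutation
    a: float = 25.0               # gain on the damped-spring term
    b: float = 6.25               # b = a / 4 gives critical damping
    tau: float = 1.0              # time constant
    goal: float = 0.0             # fixed attractor position
    centers: list = field(default_factory=lambda: [1.0, 0.6, 0.3])
    widths: list = field(default_factory=lambda: [10.0, 10.0, 10.0])
    weights: list = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def forcing(self, x):
        """Normalised RBF forcing term f(x), gated by the phase x so it fades out."""
        psi = [math.exp(-h * (x - c) ** 2) for c, h in zip(self.centers, self.widths)]
        total = sum(psi) or 1e-10
        return x * sum(w * p for w, p in zip(self.weights, psi)) / total

    def rollout(self, y0, steps=200, dt=0.01, alpha_x=4.0, coupling=0.0):
        """Semi-implicit Euler integration; coupling stands in for (assumed) input from other atoms."""
        y, z, x, ys = y0, 0.0, 1.0, []
        for _ in range(steps):
            dz = (self.a * (self.b * (self.goal - y) - z) + self.forcing(x) + coupling) / self.tau
            z += dz * dt
            y += z / self.tau * dt
            x += -alpha_x * x / self.tau * dt  # canonical system: phase decays from 1 toward 0
            ys.append(y)
        return ys


@dataclass
class FootReflex:
    joint_id: str          # fixed: knee or ankle, not mutated
    weight: float = 0.0    # evolvable
    bias: float = 0.0      # evolvable

    def command(self, fsr_values):
        """Joint command as a linear function of the summed foot force-sensor readings."""
        return self.weight * sum(fsr_values) + self.bias


def mutate_dmp(atom, rng=None, sigma=0.05):
    """Perturb numeric parameters only; the motor identity (topology) is kept."""
    if rng is None:
        rng = random.Random()
    return replace(
        atom,
        a=atom.a + rng.gauss(0, sigma),
        b=atom.b + rng.gauss(0, sigma),
        tau=max(0.05, atom.tau + rng.gauss(0, sigma)),
        goal=atom.goal + rng.gauss(0, sigma),
        centers=[c + rng.gauss(0, sigma) for c in atom.centers],
        widths=[max(0.1, h + rng.gauss(0, sigma)) for h in atom.widths],
        weights=[w + rng.gauss(0, sigma) for w in atom.weights],
    )


def mutate_reflex(reflex, rng=None, sigma=0.05):
    """Perturb the reflex weight and bias; the joint identity is kept."""
    if rng is None:
        rng = random.Random()
    return replace(reflex, weight=reflex.weight + rng.gauss(0, sigma),
                   bias=reflex.bias + rng.gauss(0, sigma))
```

With zero forcing weights, a DMPAtom simply relaxes to its goal. Evolving the RBF weights and the goal reshapes the trajectory without changing which joint the atom drives.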

The next experiment is to try another fitness function which punishes the electric current used by the robot, so that it tries to maximise "bounce per ounce", as Esther Thelen puts it in this wonderful paper here, which is what inspired us to apply Darwinian neurodynamics explicitly to child development. Thelen used the latest models of exploration and selection dynamics in the brain available at the time, i.e. Edelman's work on Neural Darwinism, and tended to be anti-cognitivist. Our approach tries to unify the dynamical systems and cognitivist positions and to extend the work to include the accumulation of adaptation. The above results do not yet show the accumulation of adaptation.

Compare this with a child who has optimised their performance in the Jolly Jumper here.

Experiment 2: Including accelerometers in the DMG molecule.

In the above experiment, the z accelerometer reading is not available to the controller in real time. It is used only by the game molecule to calculate the fitness of a controller. Each of the sensor atoms in the large DMG molecule now contains the z accelerometer [143] as its second sensor. A linear transform (green atom) combines the two sensor values linearly and transmits the result to the motor DMG atoms (red). In this way the robot can now use the real-time z accelerometer values if they are useful for performance.
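The linear transform (green) atom can be sketched as a weight matrix applied to the two sensor values, with one row per motor DMG atom. The function names, the bias term and how the outputs are consumed downstream (for example as a goal offset or coupling input) are assumptions made for illustration.

```python
def linear_transform(inputs, weights, biases):
    """Each output is a weighted sum of the inputs plus a bias (one row per motor atom)."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def sensor_atom_outputs(joint_sensor, z_acc, weights, biases):
    """Combine a joint sensor reading with the real-time z accelerometer value (sensor [143] in the text)."""
    return linear_transform([joint_sensor, z_acc], weights, biases)
```

If a row gives the accelerometer a weight of zero, that motor atom ignores it. Evolution can therefore discover whether the real-time z signal is useful without any change to the molecule's topology.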

Observations during the run (10th July 2013, afternoon):

1. 40 Evaluations: Two strategies have been discovered: (a) maximally flex the knees and ankles and just passively swing; (b) oscillate the right knee, pushing rhythmically on the cushion (not clear if this uses the FSRs). There seems to be no tendency for 'circular reactions' learned on one side to be transferred to the other side. Whilst the robot passively swings in the air, there are some knee movements that seem to be accelerometer-dependent, i.e. oscillating in rhythm with the overall swinging of the robot.

2. 70 Evaluations: Both legs are used in a somewhat unsynchronized fashion, alternately pushing off. The total independence of the strategies in the two legs seems biologically implausible, but I have no idea how much bilateral skill transfer there is in infants. What kinds of molecular variation operator would permit bilateral skill transfer? [Copying parameters between atoms would produce transfer, as would copying the weights in a linear transform molecule.]

3. 100 Evaluations: Perhaps (wishfully) slightly more coordinated kick-off action between the two legs, with both knees extending at the same time sometimes. Yes, I'm not imagining it, coordination between knees is definitely there.

4. 200 Evaluations: There does not seem to be any great improvement in coordination over the last 100 generations. I'm concerned that with the cushion the Nao is too close to the floor, so that there isn't much room for it to actually bounce.

5. 230 Evaluations: Discovered a left-leg bouncing method which looks altogether more elegant than the half-cocked two-leg strategies. Let's hope that one survives! It seems to have spread through the population now, and by 260 evaluations the fitness has risen to around 10, from less than 8 at around 200 evaluations, so this left-foot hopping method works much better than trying to coordinate the two legs. [In a sense this is a kind of freezing, but bilateral freezing: the right leg is now stuck in a flexed position so it doesn't get in the way.]

6. 290 Evaluations: It sounds like the left ankle is getting quite tired. I think this means it automatically reduces its stiffness, so actor molecules that use that ankle may no longer do so well. In combination with a routine that reduces force when a joint is tired, this perhaps drives the evolution of other actor molecules that use other joints. There is no need to explicitly code energy saving into the fitness function for actors if joints become tired and work less well when overheated.

7. 300 Evaluations: Tuning of the single left leg hopping strategy is taking place. There are micro changes in timing and fixed attractor position on the left leg knee and ankle joints to match the details of the jolly jumper spring and the cushion.

8. 380 Evaluations: Not much change in strategy or fitness. The population seems to have converged, with only minor tuning of the parameters. Fitness remains around 10.0. Perhaps this is a local optimum that would require a behavioural insight to break out of? For more radical changes to be possible, perhaps we need to add recombinatorial complexity, i.e. add material from the archive, or use variation operators that modify the topology of the system. This should definitely be a next step to avoid getting stuck on such local optima. Perhaps the more complex operators could be tried ONLY during an impasse of this sort, when no fitness improvement is being made.
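Two of the operator ideas raised in these observations can be sketched together. The first is the bilateral-transfer operator suggested at 70 evaluations, which copies one atom's parameters onto its mirror-image atom. The second is the suggestion at 380 evaluations to try more radical operators only during an impasse. Everything here is hypothetical:

* A molecule is assumed to be a dict from joint name to that atom's numeric parameter dict.
* MIRROR pairs left and right atoms, with Nao joint names used only as example keys.
* The 50-evaluation window and the 0.1 minimum gain are illustrative values.

```python
import copy
import random

# Hypothetical left/right pairing of leg atoms.
MIRROR = {
    "LKneePitch": "RKneePitch", "RKneePitch": "LKneePitch",
    "LAnklePitch": "RAnklePitch", "RAnklePitch": "LAnklePitch",
}


def bilateral_copy(molecule, rng=None):
    """Copy one randomly chosen atom's parameters onto its mirror-image atom.
    Returns a new molecule; the parent is left unchanged."""
    if rng is None:
        rng = random.Random()
    child = copy.deepcopy(molecule)
    source = rng.choice(sorted(child))
    target = MIRROR.get(source)
    if target in child:
        child[target] = copy.deepcopy(child[source])
    return child


def gaussian_mutation(molecule, rng=None, sigma=0.05):
    """Ordinary parametric mutation: perturb every numeric parameter slightly."""
    if rng is None:
        rng = random.Random()
    return {atom: {k: v + rng.gauss(0.0, sigma) for k, v in params.items()}
            for atom, params in molecule.items()}


def at_impasse(best_history, window=50, min_gain=0.1):
    """True if the best fitness improved by less than min_gain over the last window evaluations."""
    if len(best_history) <= window:
        return False
    return best_history[-1] - best_history[-1 - window] < min_gain


def vary(molecule, best_history, rng=None):
    """Use cheap parametric mutation normally; try the structural operator only at an impasse."""
    operator = bilateral_copy if at_impasse(best_history) else gaussian_mutation
    return operator(molecule, rng)
```

In a real run the structural slot would also hold topology-changing and archive-recombination operators. Bilateral copying is just the one these observations make concrete.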

Experiment 3: Including accelerometers in the DMG molecule + Punishment for Electric Current Used.

The fitness function is modified to divide the amount of bounciness by the sum of the electric current used by the 4 active joints (knees and ankles). This is basically Thelen's notion of "bounce per ounce". It is a simple multi-objective function, and we are not using any sophisticated methods to maintain diversity along a Pareto front. I wonder whether explicitly punishing the electric current used will produce more elegant and efficient-looking behaviours.
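Here is a sketch of the modified fitness, assuming per-timestep current readings for the four knee and ankle joints. Using absolute current values, and adding a small eps to avoid dividing by zero, are our assumptions. Bounciness is taken as the sum of absolute first differences of the z-accelerometer trace.

```python
def bounciness(z_acc):
    """Sum of absolute first differences of the z (up-down) accelerometer trace."""
    return sum(abs(b - a) for a, b in zip(z_acc, z_acc[1:]))


def bounce_per_ounce(z_acc, currents, eps=1e-6):
    """Bounciness divided by total current drawn.

    currents: one tuple per timestep with the current of the 4 active joints
    (knees and ankles); eps is an assumed guard against division by zero."""
    total_current = sum(sum(abs(c) for c in step) for step in currents)
    return bounciness(z_acc) / (total_current + eps)
```

Because the two objectives are collapsed into a single ratio, a strategy can score well either by bouncing more or by drawing less current. A Pareto-based method would keep both kinds of solution apart instead.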

Further ideas:

The same kind of one-legged jumping behaviour arose. OK, I'm bored of that behaviour now; it was fun at the beginning of the week, but now it's boring. I'm quite brain-dead and need to sleep and think about how to get some accumulation of adaptation, or, as Alex prefers, transfer learning... Right, slept! A few information-theoretic measures have been brought to my attention, e.g. sensorimotor mutual information (SMMI), which may be an interesting selection criterion for good games; see:

http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003111

and the preceding paper

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002236#pcbi.1002236.s003

for the methods used. A Hidden Markov Network (HMN) is used to encode the activation logic between atoms, which is interesting. They use a table of probabilities to encode the logic, which is costly in terms of genome length, but it is the most general encoding. We might wish to try it as a means of encoding the activation between atoms. It is a probabilistic encoding which may aid evolvability, i.e. rare paths can come into existence while the standard path still works. It also brings the system closer to the path evolution algorithm here.

Probabilistic activation of atoms.

An HMN is a general encoding for determining I/O functions with multiple inputs and outputs; it's like a Boolean network but with probabilities. It's a very general formulation. It might be something we want to consider implementing later, to obtain probabilistic actor molecules which may be more evolvable.
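A minimal probabilistic gate in the spirit of the HMN gates described above shows why the table encoding is general but costly. The table has one row per input bit pattern, and each row is a probability distribution over output bit patterns. A gate with n_in inputs and n_out outputs therefore needs 2^n_in x 2^n_out table entries. The class name and the random initialisation are illustrative; this is not the code from the cited papers.

```python
import random


class ProbabilisticGate:
    """HMN-style gate: row i of the table is a distribution over output patterns
    for input pattern i (bits read as a binary number)."""

    def __init__(self, n_in, n_out, rng=None):
        self.n_in, self.n_out = n_in, n_out
        self.rng = rng if rng is not None else random.Random()
        self.table = []
        for _ in range(2 ** n_in):
            row = [self.rng.random() for _ in range(2 ** n_out)]
            total = sum(row)
            self.table.append([p / total for p in row])

    def __call__(self, inputs):
        """Sample an output bit pattern for the given input bits."""
        row = self.table[int("".join(str(b) for b in inputs), 2)]
        idx = self.rng.choices(range(2 ** self.n_out), weights=row)[0]
        return [int(bit) for bit in format(idx, f"0{self.n_out}b")]
```

A deterministic Boolean gate is the special case where every row contains a single 1.0. Mutating a row away from that leaves the usual path dominant while occasionally opening a rare alternative path.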

Accumulation of adaptation

SMMI (Sensorimotor mutual information) as a fitness measure over games

To work this out in a continuous system, the simplest way is to discretize the system artificially by binning in time and magnitude. That's what people do... e.g. Ay et al. 2008, in a Santa Fe working paper, describe here how to calculate a simpler version of sensory mutual information for single channels of autonomous robots. Given a Markovian system, the mutual information between successive timesteps is equal to the predictive information. The authors claim that maximizing this measure results in robots that are both explorative and have predictable future events. White Gaussian noise is assumed (unrealistically). They split the sensor stream into 30 bins and calculated the probabilities p(x), p(y) and p(x(t+1), x(t)) from frequencies obtained by sampling over a long run. They evaluated the MI for each of the sensor channels separately! To do this for vectors x and y, it is necessary to have multidimensional bins, probably of larger sizes, and then we run into the curse of dimensionality.
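Here is a single-channel version of this estimate. It bins the signal into 30 equal-width bins and computes the mutual information between successive timesteps from frequency counts. Equal-width binning over the observed range and the use of bits (log base 2) are our choices. This is a plug-in estimator, and it will be biased upwards for short runs.

```python
import math
from collections import Counter


def discretize(signal, n_bins=30):
    """Map each sample to one of n_bins equal-width bins over the observed range."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((s - lo) / width), n_bins - 1) for s in signal]


def predictive_information(signal, n_bins=30):
    """Estimate I(X_t; X_{t+1}) in bits from frequency counts of binned values."""
    bins = discretize(signal, n_bins)
    pairs = list(zip(bins, bins[1:]))
    n = len(pairs)
    joint = Counter(pairs)
    past = Counter(a for a, _ in pairs)
    future = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((past[a] / n) * (future[b] / n)))
    return mi
```

A constant signal gives zero, and a perfectly alternating two-level signal gives close to one bit. A vector version would replace the scalar bins with tuples of bins, and the number of occupied cells would then grow exponentially with the number of channels.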

Perceptual interestingness (unsupervised learning rate) as a fitness measure over games [POWERPLAY]

This relates to Schmidhuber's compressibility. A good game is one that results in interesting perceptual input. If one wants to get into this line of investigation one should really read Schmidhuber's stuff.

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0023534
http://arxiv.org/pdf/1210.8385v1.pdf

PowerPlay may be important for task description methods. The introduction to PowerPlay implies a rich task description: the internal states of SLIM RNN neurons define the goal! These SLIM RNNs have a halting neuron. They are universal dynamical function approximators like CTRNNs, so I think one can think of them as CTRNNs with a threshold halting neuron. A new task is learned for which the network can reach the SAME internal goal state in the new environment, AND reproduce the same internal goal state in all previous environments! It's like saying that the CTRNN's 10th neuron must be above 0.5 when the CTRNN halts. Wow. How internal. If the SLIM RNN can't halt in the new environment, then random weight changes (mutations) are made.

Algorithm

The algorithm is simple. Take the current network s and mutate it to s+1. Make a new task k. See whether the mutated network s+1 can solve the new task k AND all previous tasks (by having the same internal state at halting). If it can, store task k and the internal state of network s+1 in the archive.
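Below is a toy version of the loop as summarised above, under heavy simplifying assumptions:

* The "network" is a single layer of threshold units rather than a SLIM RNN.
* The last unit plays the role of the halting neuron.
* A "task" is just a random input environment.
* The internal goal state is the sign pattern of the goal units.

This is not Schmidhuber's implementation. It only shows the acceptance rule: solve the new task AND reproduce the stored states on all previous tasks.

```python
import random

N_IN, N_GOAL = 4, 3  # hypothetical sizes: input dimension, number of goal neurons


def run(net, env):
    """Return (halted, goal_state) for one environment.
    Rows 0..N_GOAL-1 are goal neurons; the last row is the halting neuron."""
    acts = [sum(w * x for w, x in zip(row, env)) for row in net]
    halted = acts[-1] > 0
    goal_state = tuple(a > 0 for a in acts[:N_GOAL])
    return halted, goal_state


def mutate(net, rng, sigma=0.3):
    """Random weight changes applied to every connection."""
    return [[w + rng.gauss(0, sigma) for w in row] for row in net]


def powerplay_like(n_tasks=5, max_tries=2000, seed=0):
    rng = random.Random(seed)
    net = [[rng.gauss(0, 1) for _ in range(N_IN)] for _ in range(N_GOAL + 1)]
    archive = []  # (environment, goal_state) pairs already solved
    for _ in range(max_tries):
        if len(archive) >= n_tasks:
            break
        candidate = mutate(net, rng)                           # s -> s+1
        new_env = [rng.uniform(-1, 1) for _ in range(N_IN)]    # new task k
        halted, state = run(candidate, new_env)
        keeps_old = all(run(candidate, env) == (True, s) for env, s in archive)
        if halted and keeps_old:
            net = candidate
            archive.append((new_env, state))                   # store task k and its internal state
    return net, archive
```

By construction, the final network still halts in every archived environment with the stored goal state. Nothing in the loop asks whether those goal states are interesting, which is the worry raised below.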

A compression version is also used, where the task is to compress the data. In this case the fitness criterion is that, after mutation of the network, the internal states can be produced using smaller weights or fewer connections. This apparently amounts to a better unsupervised learning ability for the classification. But I'm not sure the classification is interesting to begin with. What if a trivial set of internal states/classifications is learned at the beginning? E.g., what if s classifies all pictures with a black pixel on the bottom right as 0 and all those without as 1? Then an infinite number of pictures could be classified thus, and the new network would evolve just to consider the pixel on the bottom right and ignore everything else. Eventually no simpler, more compressed network would be possible. Without the compression criterion, new tasks could always be solved.
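The compression variant's acceptance rule can be sketched as a comparison of complexity measures. The measure counts non-negligible weights first and breaks ties by total absolute weight. That ordering is an assumption about how "fewer connections or smaller weights" might be combined. The solves_all argument is a hypothetical callback that checks every stored task.

```python
def complexity(net, tol=1e-3):
    """(number of non-negligible weights, sum of |w|); tuples compare lexicographically."""
    flat = [w for row in net for w in row]
    return (sum(1 for w in flat if abs(w) > tol), sum(abs(w) for w in flat))


def accept_compression(old_net, new_net, solves_all):
    """Accept a mutation only if it still solves every stored task AND is simpler."""
    return solves_all(new_net) and complexity(new_net) < complexity(old_net)
```

Once the network cannot be pruned any further while still solving its tasks, no mutation passes this test. This is the dead end described above for the trivial bottom-right-pixel classifier.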

There is absolutely no demonstration in Schmidhuber's paper that in the fovea task the behaviours learned are subjectively interesting!!!!

http://arxiv.org/pdf/1210.8385v1.pdf

What is interesting is that he uses a hill-climbing strategy with fitness = new task performance + old task performances (to promote accumulation of adaptation and prevent forgetting) + compression as a combined criterion, compression being a simplification of the network capable of solving the tasks. In our work we don't have multiple tasks yet, and we don't require a solution to be good at all previous tasks. I don't think that's a sensible approach for open-endedness, because there are too many dependencies. The solution to French shouldn't also need to do unicycling; unicycling may require quite a different controller. I do like the idea of tasks being defined as certain internal states of the RNN, because this is a very powerful and open-ended method of specifying situated and embodied goals, BUT I think a measure of INTERESTING internal states must be added, possibly constrained by one of the information-theoretic measures of Tononi et al., to make those internal states informationally interesting about the sensorimotor states??

Implication for EROS: set up a game molecule containing a SLIM RNN atom (or an equivalent CTRNN with a fixed halting time). The game molecule gives a high fitness to the actor molecule if a subset of the neurons in the CTRNN are in a given GOAL state at the end of a fixed-time trial. Choose CTRNN + GOAL state pairs (i.e. dynamical game definitions) for which PROGRESS is being made in achieving the goal state over actor generations. Do not require that a new actor molecule be able to satisfy all previous CTRNN + GOAL pairs. Mutate CTRNNs and GOALs. The problem I see is that additional constraints on the fitness of CTRNNs and GOALs will perhaps be needed to prevent trivial CTRNNs and GOALs from evolving that achieve progress in some trivial way, e.g. by classifying the pixel second from the right, then the one third from the right, etc.
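Here is a sketch of the proposed CTRNN game, built from three assumed pieces. The CTRNN is a standard Euler-integrated network driven by the sensorimotor stream for a fixed number of steps. The game fitness is the negative squared distance of the chosen neurons from their GOAL values. Progress is measured as the average fitness gain per actor generation over a recent window. All names, the step size and the window length are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ctrnn_run(weights, taus, biases, input_seq, dt=0.05):
    """Euler-integrate tau_i dy_i/dt = -y_i + sum_j w[j][i] * sigma(y_j + theta_j) + I_i
    for a fixed-time trial (one input vector per step) and return the final outputs."""
    n = len(taus)
    y = [0.0] * n
    for inputs in input_seq:
        outs = [sigmoid(y[j] + biases[j]) for j in range(n)]
        y = [y[i] + dt / taus[i] * (-y[i] + sum(weights[j][i] * outs[j] for j in range(n)) + inputs[i])
             for i in range(n)]
    return [sigmoid(y[j] + biases[j]) for j in range(n)]


def game_fitness(final_outputs, goal_neurons, goal_values):
    """Higher is better: negative squared distance of the goal neurons from the GOAL state."""
    return -sum((final_outputs[k] - g) ** 2 for k, g in zip(goal_neurons, goal_values))


def game_progress(fitness_by_generation, window=10):
    """Mean fitness gain per actor generation over the last `window` generations."""
    if len(fitness_by_generation) <= window:
        return 0.0
    return (fitness_by_generation[-1] - fitness_by_generation[-1 - window]) / window
```

Games with positive progress would be kept and mutated. Games whose progress has stalled at zero would be candidates for replacement. Nothing here yet rules out the trivial games mentioned above.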

To solve this problem, another kind of game atom may be a more traditional method for unsupervised learning, with the quality of unsupervised learning being the fitness measure. E.g. the game may be a k-means classifier. Actor molecules that result in classifier improvement are selected for, i.e. a good k-means classifier is one with the greatest rate of reduction of the cost function over, perhaps, the last 100 data points. For k = 2 this means that behaviour which distinguishes maximally between two classes of event, i.e. divides the observed data into two distinct clusters, is favoured. Wait: the k-means cost function does not reward MAXIMALLY SEPARATING the two classes, so instead we should divide the actual energy of the classification by the energy obtained if the class labels were swapped. If the two centroids are close to each other, this ratio will be near one, but if the centroids are far apart it will approach zero (I think). Another method may be to use a PCA game, in which the task is to do things that tend to maximize the variance of the sensorimotor inputs captured by the minimum number of dimensions.
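The swapped-label ratio can be checked numerically. The code below runs plain Lloyd's k-means with k = 2 and computes the ratio of the actual assignment energy to the energy with the labels swapped. For two well-separated clusters the ratio comes out close to zero, as guessed above. The function names are ours. The rate-of-improvement part of the game (over the last 100 data points) and the PCA variant are not shown.

```python
import random


def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearer centroid."""
    return [0 if sqdist(p, centroids[0]) <= sqdist(p, centroids[1]) else 1 for p in points]


def kmeans2(points, iters=20, seed=0):
    """Plain Lloyd's k-means with k = 2; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, 2)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, assign(points, centroids)


def separation_ratio(points, centroids, labels):
    """Energy of the actual assignment divided by the energy with labels swapped.
    Near 1 when the centroids nearly coincide; approaches 0 for well-separated clusters."""
    actual = sum(sqdist(p, centroids[lab]) for p, lab in zip(points, labels))
    swapped = sum(sqdist(p, centroids[1 - lab]) for p, lab in zip(points, labels))
    return actual / swapped if swapped > 0 else 1.0
```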

References (Reading List)

Object Action Complexes: http://en.wikipedia.org/wiki/Object_Action_Complex [This seems like a very promising evolvable representation of complex action that we should consider. How is it related to our notion of action molecules?] OACs are, roughly, state-action-prediction triplets, with a device that tries to make predictions more accurate, tests how well the action satisfies a goal, and records how well its prediction is doing. It seems closely related to XCS/TGNG. It is useful to know which existing concepts inspired them, since similar concepts inspire our own molecular representations. We did not know about STRIPS or the situation/event calculi, but will certainly look at these concepts now, because they might inform how our molecules should be organised (or not). See...

http://mirror.umd.edu/roswiki/attachments/Events(2f)CoTeSys(2d)ROS(2d)School/tamim.pdf
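Here is a toy rendering of the reading given above: a state-action-prediction triplet plus a record of how reliable the prediction has been. It is a rough sketch under our own assumptions (dict states, a tolerance for judging a prediction correct, and an exponential running average for reliability). It is not the formal OAC definition from the literature, and the goal-testing part is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, float]


@dataclass
class OAC:
    """Toy Object-Action Complex: an action paired with a prediction of its effect,
    plus a running record of how reliable that prediction has been."""
    name: str
    action: Callable[[State], State]   # what executing the action does (simulated here)
    predict: Callable[[State], State]  # predicted resulting state
    reliability: float = 0.5           # running estimate of prediction success
    rate: float = 0.1                  # hypothetical update rate

    def execute(self, state: State, tol: float = 0.1) -> State:
        """Run the action, compare the outcome with the prediction, update reliability."""
        predicted = self.predict(state)
        outcome = self.action(state)
        hit = all(abs(outcome[k] - predicted[k]) <= tol for k in predicted)
        self.reliability += self.rate * ((1.0 if hit else 0.0) - self.reliability)
        return outcome
```

In molecule terms, the predict function plays a role similar to a forward model attached to an actor. The reliability record is the kind of statistic a game molecule could select on.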

OAC Relation to Existing Concepts and Representations

OACs combine three elements:

The object- (and situation-) oriented concept of affordance (Gibson 1950; Duchon, Warren and Kaelbling 1998; Stoytchev 2005; Fitzpatrick et al. 2003; Sahin et al. 2007; Grupen et al. 2007; Gorniak and Roy 2007);

The representational and computational efficiency for planning and execution monitoring (the original Frame Problem) of STRIPS rules (Fikes and Nilsson 1971; Shahaf and Amir 2006; Shahaf, Chang and Amir 2006; Kuipers et al. 2006; Modayil and Kuipers 2007);

The logical clarity of the situation/event calculus (McCarthy and Hayes 1969; Kowalski and Sergot 1986; Reiter 2001; Poole 1993; Richardson and Domingos 2006; Steedman 2002).

Let's consider each of the above influences in reverse order.

The situation and event calculi are temporal logics. I had only heard of them once before, in a theory group meeting at QMUL that I attended by accident, but they did seem very interesting at the time. It didn't click until now that there could be a relation between them and our actor and game molecules. So what are these temporal logics all about? Can they help with modularity, task decomposition, action sequencing, action recombination, search, and the accumulation of action adaptations? These are the critical questions to be asked and answered.

Kowalski, Robert, and Fariba Sadri. "Reconciling the event calculus with the situation calculus." The Journal of Logic Programming 31.1 (1997): 39-58.

Wednesday, 10 July 2013
