The iREACH model reproduces the developmental evolution of various kinematic and dynamical aspects of infant reaching: the evolution of submovements and corrective movements, the progressive regularization of the speed profile toward a bell-shaped pattern, and several phenomena related to Bernstein's degrees-of-freedom problem. Importantly, all these phenomena are reproduced by the same model, which therefore furnishes an integrated interpretation of the developmental mechanisms underlying them. The iREACH model is unique in that it analyzes reaching at different stages of development (covering 40 simulated months from its onset), allowing it to reproduce and account for the features of reaching at different developmental stages, not only at the end of the acquisition process. The model integrates the core hypotheses of three motor control theories: (a) reinforcement learning (RL) theory; (b) the equilibrium point hypothesis (EPH); (c) the minimum variance theory.
Arm proprioception and Object Position (Array) - Ten 21×21 2D neural maps encoding the arm posture, the speed of the joints, and the hand-target movement on the basis of population codes. The first five maps encode the shoulder and elbow angles together with their angular speeds in a combined fashion. The other five maps encode the shoulder and elbow angles together with the angular vectorial hand-target distance.
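As a sketch of how a pair of joint angles might be population-coded on one such 21×21 map, the following assumes Gaussian tuning curves over a regular grid of preferred values; the map size matches the text, but the tuning width, value ranges, and function name are illustrative assumptions:

```python
import numpy as np

def population_code_2d(angle1, angle2, size=21, sigma=0.08,
                       ranges=((-1.0, 1.0), (-1.0, 1.0))):
    """Encode two joint angles as a Gaussian activation bump on a size x size map.
    sigma and ranges are illustrative values, not the model's parameters."""
    g1 = np.linspace(*ranges[0], size)   # preferred values along dimension 1
    g2 = np.linspace(*ranges[1], size)   # preferred values along dimension 2
    X1, X2 = np.meshgrid(g1, g2, indexing="ij")
    return np.exp(-((X1 - angle1) ** 2 + (X2 - angle2) ** 2) / (2 * sigma ** 2))

m = population_code_2d(0.2, -0.3)
print(m.shape)  # (21, 21); peak activation at the unit closest to (0.2, -0.3)
```

Each map unit responds most strongly when the encoded pair of values is close to its preferred pair, so a single arm state activates a localized bump of units rather than a single neuron.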
Arm Joint Torques (Vector) - The target shoulder and elbow torques computed by the Muscle Model, which drive the simulated robotic arms used in the tests.
Desired EPs (vector) - The actor outputs the equilibrium points (EPs) of the shoulder and elbow angles as a two-unit vector, to which exploratory noise is then added.
Actual EPs (vector) - The desired EPs are perturbed by high-level noise, computed as first-order filtered noise, to allow controlled exploration. These noisy EPs are the actual EPs sent to the Muscle Model for action.
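A minimal sketch of such first-order filtered exploration noise added to the desired EPs; the filter coefficient, noise scale, and variable names are illustrative assumptions, not the model's parameters:

```python
import numpy as np

def filtered_noise(n_steps, tau=0.9, scale=0.05, rng=None):
    """First-order low-pass filtered Gaussian noise, one channel per EP.
    tau and scale are illustrative values chosen for this sketch."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = np.zeros(2)                       # (shoulder EP noise, elbow EP noise)
    out = np.empty((n_steps, 2))
    for t in range(n_steps):
        # each sample mixes the previous noise with a fresh Gaussian draw,
        # producing smooth, temporally correlated perturbations
        n = tau * n + (1 - tau) * rng.normal(0.0, scale, size=2)
        out[t] = n
    return out

noise = filtered_noise(200)
desired_eps = np.array([0.4, -0.1])   # actor output (example values)
actual_eps = desired_eps + noise[-1]  # noisy EPs sent to the Muscle Model
```

The low-pass filtering makes successive perturbations correlated in time, so exploration bends whole movements rather than adding independent jitter at each step.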
Torques (vector) - To simulate signal-dependent noise, the torques issued to the arm are perturbed by a disturbance whose amplitude depends linearly on the absolute value of the signal generating the muscle torques.
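Such signal-dependent (multiplicative) noise can be sketched as below, with the disturbance's standard deviation scaling linearly with the absolute torque; the gain k is an illustrative assumption:

```python
import numpy as np

def noisy_torque(torque, k=0.1, rng=None):
    """Add signal-dependent noise: the disturbance's standard deviation is
    k * |torque|, so larger commands are noisier. k is an illustrative gain."""
    rng = np.random.default_rng() if rng is None else rng
    torque = np.asarray(torque, dtype=float)
    return torque + rng.normal(0.0, k * np.abs(torque))

# a zero torque stays exactly zero; a large torque gets a larger disturbance
tau = noisy_torque([0.0, 2.0], rng=np.random.default_rng(1))
```

This is the mechanism that creates the speed-accuracy trade-off mentioned above: fast movements require large torques, which carry proportionally larger disturbances.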
In this diagram, thin arrows represent information flows, whereas bold arrows represent all-to-all connection weights trained on the basis of the RL algorithm. The dashed line represents the critic’s TD-error learning signal.
The diagram shows the actor-critic model, formed by the "actor" and "critic" components which learn on the basis of the temporal difference learning rule (TD). Both components receive information about arm posture, the speed of joints, and the hand-target distance. This information is encoded in 2D neural maps on the basis of population codes.
Before being sent to the arm, the actor's output signals are modified by two sources of noise. This noise generation is important both to encourage exploration and to create a trade-off between the RL drive to generate fast movements and the need for slow movements that improve accuracy.
Figure 2: Key Hypotheses and Target Phenomena
The integration of reinforcement learning (RL), the equilibrium point hypothesis (EP), and the minimum variance theory (MVT; linked by the inner circle) leads the model to generate a developmental trajectory that allows it to reproduce and predict a range of empirical data (outer circle), most of which (reported in bold) have not been addressed by previous models.
Submodules
Actor-Critic RL Module - The simulated arm used in the model is controlled by an actor-critic RL model that mimics infants' trial-and-error learning processes. The model has two main components, the "actor" and the "critic", and learns on the basis of the temporal difference (TD) learning rule. Both components receive information about the arm posture, the speed of the joints, and the hand-target distance. The critic uses two successive state evaluations and the reward to compute the reward prediction error.
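The TD learning step described above can be sketched as follows, using linear function approximation over the population-coded state; the feature size, learning rates, discount factor, and the way the actor exploits the exploratory noise are illustrative assumptions:

```python
import numpy as np

# Minimal actor-critic TD(0) sketch with linear function approximation.
# Sizes, learning rates, and discount factor are illustrative values.
n_features, n_actions = 10, 2
w_critic = np.zeros(n_features)              # critic: V(x) = w_critic . x
w_actor = np.zeros((n_actions, n_features))  # actor: EPs(x) = w_actor @ x
alpha_c, alpha_a, gamma = 0.1, 0.05, 0.99

def td_step(x_t, x_next, reward, noise):
    """One learning step: the critic evaluates two successive states, and the
    resulting TD (reward prediction) error trains both critic and actor."""
    global w_critic, w_actor
    v_t, v_next = w_critic @ x_t, w_critic @ x_next
    delta = reward + gamma * v_next - v_t              # reward prediction error
    w_critic += alpha_c * delta * x_t                  # critic update
    w_actor += alpha_a * delta * np.outer(noise, x_t)  # reinforce explored deviation
    return delta
```

When the TD error is positive, the actor's weights move toward the noisy action it just tried; when negative, they move away, so exploration plus the critic's evaluation gradually shapes the policy.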
Exploratory Noise Module - Exploratory Noise allows the model to randomly perturb the movements to evaluate the consequences of the perturbation and improve actions accordingly.
Muscle Model - The muscle spring-damper model computes the appropriate muscle torques from the actual EPs and the current joint angles and joint velocities. Before the muscle torques are relayed to the arm, they are perturbed by a second, signal-dependent source of noise (in line with the MVT).
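A sketch of the spring-damper torque computation implied by the EP hypothesis (before the signal-dependent noise is applied): each joint is pulled toward its equilibrium point by a virtual spring and slowed by a damping term. The stiffness K and damping B are illustrative values, not the model's fitted parameters:

```python
import numpy as np

def ep_torques(theta, theta_dot, eps, K=10.0, B=2.0):
    """Spring-damper (EP-hypothesis) torques: stiffness K pulls each joint
    angle toward its equilibrium point, damping B opposes joint velocity.
    K and B are illustrative values for this sketch."""
    theta, theta_dot, eps = map(np.asarray, (theta, theta_dot, eps))
    return -K * (theta - eps) - B * theta_dot

# joint 1 is away from its EP (spring acts); joint 2 is at its EP (only damping)
tau = ep_torques(theta=[0.5, 1.0], theta_dot=[0.0, 0.2], eps=[0.6, 1.0])
```

Under this scheme the controller never computes torques directly: shifting the EPs is enough, and the spring-damper dynamics generate the movement toward them.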
The extensive literature on reaching behaviour still lacks a unified model describing the development of motor control processes. This model builds such a computational framework from three key hypotheses: (a) reinforcement learning, which drives acquisition through trial and error and without which no learning would occur; (b) the equilibrium point hypothesis, which posits that motor control is achieved by specifying desired EPs; (c) the minimum variance theory, through which end-of-movement precision is achieved. To capture the development of reaching in infants, the model makes thoughtful use of reinforcement learning, equilibrium points, and controlled noise, which together drive the progressive refinement of reaching behaviour. The model first learns to make coarse movements by using equilibrium points corresponding to the target position, and then fine-tunes the final portion of the movement to reach more accurately. As learning continues across development, movements become more efficient and the speed profile approaches the bell shape found in adults. The model reproduces the experimental data from studies of the development of reaching in infants and also makes meaningful predictions that lay the ground for future research.