



Model: iREACH, a developmental model of reaching (Caligiore, Parisi, Baldassarre)

Collator: Jasmine Berry
Created: March 23, 2015
Last modified by: Jimmy Bonaiuto
Last modified: April 02, 2016
Tags: grasping, reaching, computational, reinforcement learning, minimum variance theory, Equilibrium Points
Brief Description

The iREACH model reproduces the evolution during development of various kinematic and dynamical aspects of infant reaching, the evolution of submovements and corrective movements, the progressive regularization of the speed profile toward a bell-shaped pattern, and several phenomena related to Bernstein's degrees-of-freedom problem. Importantly, all these phenomena are reproduced by the same model, which therefore furnishes an integrated interpretation of the developmental mechanisms underlying them.
The iREACH model is unique in that it analyzes reaching at different stages of development (covering 40 simulated months from its onset), allowing it to reproduce and account for the features of reaching at different developmental stages, not only at the end of the acquisition process.
The model integrates the core hypotheses of three motor control theories: (a) reinforcement learning (RL) theory; (b) the equilibrium point hypothesis (EPH); (c) the minimum variance theory (MVT).

Public:  YES
Architecture
Inputs
  • Arm proprioception and Object Position (Array) - Ten 21×21 2D neural maps encoding the arm posture, the joint speeds, and the hand-target distance on the basis of population codes. The first five maps encode the shoulder and elbow angles together with their angular speeds in a combined fashion; the other five maps encode the shoulder and elbow angles together with the vectorial hand-target distance.
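For illustration, the following is a minimal sketch (not the original implementation) of how a pair of continuous variables, such as the shoulder and elbow angles, could be encoded as a Gaussian population code over one 21×21 map; the value ranges and the bump width sigma are assumptions chosen for the example.

import numpy as np

def population_map(x, y, x_range, y_range, size=21, sigma=0.1):
    """Return a (size, size) activation map with a Gaussian bump centred on (x, y)."""
    # Normalise both encoded variables to [0, 1]
    xn = (x - x_range[0]) / (x_range[1] - x_range[0])
    yn = (y - y_range[0]) / (y_range[1] - y_range[0])
    # Preferred values of the map units, laid out on a regular grid
    grid = np.linspace(0.0, 1.0, size)
    gx = np.exp(-((grid - xn) ** 2) / (2 * sigma ** 2))
    gy = np.exp(-((grid - yn) ** 2) / (2 * sigma ** 2))
    return np.outer(gy, gx)  # rows follow y, columns follow x

# Example: one map encoding a shoulder angle of 0.8 rad and an elbow angle of 1.6 rad
shoulder_elbow_map = population_map(0.8, 1.6, x_range=(0.0, 3.14), y_range=(0.0, 3.14))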
Outputs
  • Arm Joint Torques (Vector) - Target shoulder and elbow torques computed by the Muscle Model and used by the simulated robotic arm in the tests.
States
  • Desired EPs (vector) - The actor outputs the equilibrium points (EPs) of the shoulder and elbow angles as a two-unit vector. Exploratory noise is then added to this vector.
  • Actual EPs (vector) - The desired EPs are altered by high-level noise, computed as first-order filtered noise, to allow controlled exploration. These noisy EPs are the actual EPs sent to the Muscle Model for action.
  • Torques (vector) - To simulate the signal-dependent noise in the model, the torques issued to the arm are affected by a disturbance whose amplitude depends linearly on the (absolute) signal generating the muscle torques.
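The two noise sources listed above can be summarized in the following minimal sketch; the filter constant, noise scales, and numerical values are illustrative assumptions rather than the model's actual parameters.

import numpy as np

rng = np.random.default_rng(0)

def filtered_exploratory_noise(prev_noise, tau=0.9, scale=0.05):
    """High-level exploratory noise: first-order (low-pass) filtered Gaussian noise."""
    return tau * prev_noise + (1.0 - tau) * rng.normal(0.0, scale, size=prev_noise.shape)

def add_signal_dependent_noise(torques, coeff=0.1):
    """Disturbance whose amplitude grows linearly with the absolute torque signal (MVT)."""
    return torques + coeff * np.abs(torques) * rng.normal(size=torques.shape)

# One control step: desired EPs (actor output) -> actual EPs -> noisy torques
desired_eps = np.array([0.6, 1.2])                 # shoulder and elbow equilibrium points (rad)
eps_noise = filtered_exploratory_noise(np.zeros(2))
actual_eps = desired_eps + eps_noise               # EPs actually sent to the Muscle Model
raw_torques = np.array([0.8, -0.3])                # torques returned by the Muscle Model
noisy_torques = add_signal_dependent_noise(raw_torques)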
Diagrams
Figure 1: Architecture of iREACH

In this diagram, thin arrows represent information flows, whereas bold arrows represent all-to-all connection weights trained on the basis of the RL algorithm. The dashed line represents the critic's TD-error learning signal.

The diagram shows the actor-critic model, formed by the "actor" and "critic" components, which learn on the basis of the temporal-difference (TD) learning rule. Both components receive information about the arm posture, the speed of the joints, and the hand-target distance. This information is encoded in 2D neural maps on the basis of population codes.

Before being sent to the arm, the output signals of the actor are modified by two sources of noise. This noise generation is important both to encourage exploration and to create a trade-off between the RL drive to generate fast movements and the need for slow movements to improve accuracy.
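As a rough illustration of this actor-critic scheme, the sketch below assumes linear read-out weights over the flattened population-code input and a standard TD(0) update; the learning rates, discount factor, and exact update form are assumptions made for the example, not the paper's equations.

import numpy as np

n_inputs, n_outputs = 10 * 21 * 21, 2          # ten 21x21 input maps, two EPs out
W_actor = np.zeros((n_outputs, n_inputs))      # actor weights (EPs of shoulder and elbow)
w_critic = np.zeros(n_inputs)                  # critic weights (state-value estimate)
alpha_actor, alpha_critic, gamma = 1e-4, 1e-3, 0.99

def td_update(x, x_next, reward, exploration_noise):
    """One temporal-difference step: update the critic's evaluation and reinforce the actor."""
    global W_actor, w_critic
    value, value_next = w_critic @ x, w_critic @ x_next
    td_error = reward + gamma * value_next - value      # critic's reward-prediction error
    w_critic += alpha_critic * td_error * x             # move the evaluation toward the target
    # Reinforce the actor in the direction of the exploratory perturbation it just produced
    W_actor += alpha_actor * td_error * np.outer(exploration_noise, x)
    return td_error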

Figure 2: Key Hypotheses and Target Phenomena

The integration of reinforcement learning (RL), the equilibrium point hypothesis (EPH), and the minimum variance theory (MVT), linked by the inner circle, leads the model to generate a developmental trajectory that reproduces and predicts several empirical findings (outer circle), most of which (reported in bold) have not been addressed by previous models.

 
Submodules
  • Actor-Critic RL Module - The simulated arm is controlled by an actor-critic RL model that mimics the trial-and-error learning processes of infants. The model has two main components, the "actor" and the "critic", and learns on the basis of the temporal-difference learning rule. Both components receive information about the arm posture, the speed of the joints, and the hand-target distance. The critic uses two successive state evaluations and the reward to compute the reward prediction error.
  • Exploratory Noise Module - Exploratory Noise allows the model to randomly perturb the movements to evaluate the consequences of the perturbation and improve actions accordingly.
  • Muscle Model - The muscle spring-damper model computes the muscle torques from the actual EPs and the current joint angles and joint velocities. Before the muscle torques are relayed to the arm, a second, signal-dependent noise source (as posited by the MVT) is added, as sketched below.
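A minimal sketch of a spring-damper muscle model of this kind is given below: the torques pull each joint toward its actual EP while damping the joint velocity. The stiffness and damping values are illustrative assumptions, not the model's parameters.

import numpy as np

def muscle_torques(actual_eps, joint_angles, joint_velocities, stiffness=30.0, damping=3.0):
    """Spring-damper torques driving the shoulder and elbow toward their equilibrium points."""
    return stiffness * (actual_eps - joint_angles) - damping * joint_velocities

# Example: both joints below their EPs and currently at rest
torques = muscle_torques(np.array([0.6, 1.2]),   # actual EPs (rad)
                         np.array([0.4, 1.0]),   # current joint angles (rad)
                         np.array([0.0, 0.0]))   # current joint angular velocities (rad/s)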
Narrative
Related Models
Related BOPs
References
Discussion