Humans maintain a body image of themselves, which plays a central role in controlling bodily movement, planning action, recognising and naming actions performed by others, and requesting or executing commands. This paper uses experiments with autonomous humanoid robots to explore how such a body image could form. The robots play a situated, embodied language game called the Action Game, in which they ask each other to perform bodily actions. They start without any prior inventory of names, without categories for visually recognising the body movements of others, and without knowing the relation between visual images of motor behaviours carried out by others and their own motor behaviours. Through diagnostic and repair strategies carried out within the context of action games, they progressively self-organise an effective lexicon as well as bi-directional mappings between the visual and motor domains. The agents thus establish and continuously adapt networks linking perception, body representation, action and language.