Complex system recovery by process programming redundancy
Title:
Complex system recovery by process programming redundancy
Author:
Vokorokos Liberios
Published in:
Acta Montanistica Slovaca
Pagination:
Volume 5 (2000), no. 4, pages 383-386
Year:
2000
Abstract:
This paper presents the recovery of a fault-tolerant control system. The starting point is a parallel computer system with distributed memory and message-passing communication. The system consists of processor elements, communication lines and switches. At least one application process runs on each processor of the parallel system. Processes execute in parallel and sequentially, communicating with each other over the communication lines to carry out one task. Several tasks can run on the parallel system. Processes are mapped to the processor elements.

The applied fault-tolerance method works at the level of processor elements, communication lines, switches and processes, using software and hardware redundancy. The purpose of recovery in a fault-tolerant parallel system is to restore and maintain the system's fault tolerance after a fault has occurred. Fault tolerance itself is provided by the applied method.

The paper describes how the system functions after a fault. Faults in different parts of the parallel system differ in importance. Consider a faulty processor, line or switch. The most important case is a processor fault. The processes allocated to that processor then have to be moved to another processor, recovered and initialised again. It is usually assumed that the processor's memory content is lost or inaccessible after the fault. All communication lines routed through this processor must be removed and redirected.

The recovery procedure itself is known. The open questions are how recovery is controlled at the level of the processor kernel, and by whom. Control can be either centralised or decentralised. A further question is how many copies of each process are enough for sufficient fault tolerance. With active and passive process copies, this depends on the required level of security. One passive copy of a process is sufficient if we assume that faults do not occur on two processors occupied by the same process at the same time, or during system recovery.
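The processor-fault case described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the names `Process`, `System`, `recover` and `_spare`, the centralised control, and the "lowest-numbered healthy processor" placement rule are all assumptions made for illustration. It also assumes, as the abstract does, that only one processor fails at a time and that the active and passive copies of a process never share a processor.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    active: int   # id of the processor running the active copy
    passive: int  # id of the processor holding the passive copy


@dataclass
class System:
    processors: set[int]
    processes: list[Process] = field(default_factory=list)

    def _spare(self, exclude: int) -> int:
        """Pick the lowest-numbered healthy processor other than `exclude`
        (a hypothetical placement rule, chosen only for simplicity)."""
        candidates = sorted(self.processors - {exclude})
        if not candidates:
            raise RuntimeError("no healthy processor left for a passive copy")
        return candidates[0]

    def recover(self, failed: int) -> None:
        """Recover from the fault of one processor; its memory is assumed lost."""
        self.processors.discard(failed)
        for p in self.processes:
            if p.active == failed:
                # Promote the passive copy, then re-create redundancy elsewhere.
                p.active = p.passive
                p.passive = self._spare(exclude=p.active)
            elif p.passive == failed:
                # Only the backup was lost: place a new passive copy.
                p.passive = self._spare(exclude=p.active)
```

For example, with processors 0, 1 and 2 and a process whose active copy is on 0 and passive copy on 1, calling `recover(0)` promotes the copy on processor 1 and places a new passive copy on processor 2. Redirecting communication lines and decentralised control are deliberately left out of this sketch.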
Publisher:
Technical University of Kosice, the Faculty of Mining, Ecolo