1999 Distrib. Syst. Engng. 6 95-102 doi: 10.1088/0967-1846/6/3/301
Abstract. A group membership failure (in short, a group failure) occurs when one of the group members crashes. A group failure detection protocol must inform all non-crashed members of the group that the group, as an entity, has crashed. Ideally, such a protocol should be live (if a process crashes, the group failure is eventually detected) and safe (if a group failure is claimed, at least one process has actually crashed).
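To make the two properties concrete, here is a minimal Python sketch of one common way to detect group failures: a heartbeat/timeout scheme. It is an illustrative assumption, not the protocol analysed in the paper; the class `GroupFailureDetector` and its `timeout` parameter are hypothetical. Liveness holds because a crashed member stops sending heartbeats, so its silence eventually exceeds `timeout`. Safety can be violated when a live member's heartbeat is delayed beyond `timeout`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GroupFailureDetector:
    """Hypothetical heartbeat/timeout group failure detector (illustration only)."""

    members: set[str]
    timeout: float  # hypothetical duration parameter: longest tolerated silence
    last_heard: dict[str, float] = field(default_factory=dict)

    def start(self, now: float) -> None:
        # Treat every member as freshly heard from when monitoring begins.
        for m in self.members:
            self.last_heard[m] = now

    def on_heartbeat(self, member: str, now: float) -> None:
        # Heartbeats from processes outside the group are ignored.
        if member in self.members:
            self.last_heard[member] = now

    def group_failed(self, now: float) -> bool:
        # A group failure is claimed as soon as ANY member has been silent
        # for longer than `timeout`. A crashed member never refreshes its
        # entry, so it is eventually reported (liveness); a merely slow
        # member can also be reported (a wrong suspicion, i.e. a safety violation).
        return any(now - t > self.timeout for t in self.last_heard.values())
```

For example, with `timeout=2.0`, if member `p3` sends its last heartbeat at time 1.0 and then crashes, `group_failed` first returns `True` at any time later than 3.0.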
In unreliable asynchronous distributed systems, a process cannot obtain an accurate view of the system state. Consequently, no group failure detection protocol can be both safe and live in every run of such a system.
This paper analyses a group failure detection protocol whose design naturally ensures liveness. We show that, by appropriately tuning some of its duration-related parameters, safety can be guaranteed with a probability as close to one as desired. The analysis shows that, in real distributed systems, failure detection can be achieved with a negligible probability of wrong suspicions.
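The following sketch illustrates how tuning a duration parameter can push the probability of a wrong suspicion as low as desired. The delay model is an assumption made only for this example; the abstract does not state the paper's model. Heartbeat delays are taken to be independent and exponentially distributed with mean `mean_delay`. Under that assumption, one check wrongly suspects a given live member with probability exp(-T / mean_delay) for a timeout T. With `n_members` members, a wrong group-failure claim in one check therefore has probability 1 - (1 - exp(-T / mean_delay))^n. Inverting this gives the smallest T that keeps the probability at or below a target `epsilon`. The function names are hypothetical.

```python
import math


def false_suspicion_probability(timeout: float, mean_delay: float, n_members: int) -> float:
    """P(at least one live member is wrongly suspected in one check).

    Assumes (hypothetically) i.i.d. exponential heartbeat delays with mean `mean_delay`.
    """
    p_late = math.exp(-timeout / mean_delay)  # one member's heartbeat exceeds the timeout
    # 1 - (1 - p_late)^n, computed with log1p/expm1 to stay accurate for tiny p_late.
    return -math.expm1(n_members * math.log1p(-p_late))


def min_timeout(epsilon: float, mean_delay: float, n_members: int) -> float:
    """Smallest timeout T with false_suspicion_probability(T, ...) <= epsilon."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    # Per-member lateness budget: 1 - (1 - epsilon)^(1/n).
    per_member = -math.expm1(math.log1p(-epsilon) / n_members)
    # Solve exp(-T / mean_delay) = per_member for T.
    return -mean_delay * math.log(per_member)
```

For instance, `min_timeout(1e-6, 0.01, 10)` gives the timeout that keeps the chance of a wrong group-failure claim at about one in a million per check for ten members with a 10 ms mean delay. The cost of a larger timeout is slower detection of genuine crashes. This sketch shows that trade-off between safety probability and detection latency; it does not reproduce the paper's analysis.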
Print publication: Issue 3 (September 1999)