Shivakant Mishra
Assistant Professor
Department of Computer Science
University of Wyoming, P.O. Box 3682
Laramie, WY 82071-3682, USA.
Email: mishra@cs.uwyo.edu
Phone: (307)766-4086
Fax: (307)766-4036

------------------------------------------

DESIGN and IMPLEMENTATION Of Dependable Distributed Services on NEXT GENERATION INTERNET

A white paper for the Workshop on Research Directions for the Next Generation Internet.

Shivakant Mishra

With the increasing use of computers in everyday life, particularly for constructing critical applications, the need for highly available, dependable, and real-time responsive software is growing. In particular, as internet access becomes common, people are increasingly using the internet to construct applications, many of which need availability, dependability, and real-time responsiveness. An important research issue for the next generation internet is therefore the construction of suitable abstractions that provide these properties to an application.

So far, constructing highly available, dependable, and real-time responsive software on a wide-area communication network such as the internet has been extremely complicated. The key reason for this complexity is the large communication delays in today's internet, which result in unacceptably low performance. As a result, the majority of research efforts in the construction of such software have centered on either local-area networks or dedicated networks. We believe that the next generation internet provides properties that can simplify the construction of such software.

GROUP COMMUNICATION SERVICES

Experience has shown that the construction of highly available, dependable, and real-time responsive software is simplified by the use of suitable system-level, fault-tolerant group communication services that provide consistent information to a group of cooperating computing components [1,2,3,6]. A group communication service is a set of fault-tolerant network communication protocols that enable replicated application processes to maintain a consistent replicated state despite random communication delays and communication or processor failures. These protocols include an atomic multicast protocol, a group membership protocol, and a clock synchronization protocol.

Group communication services provide several attractive features that simplify the design and implementation of highly available, dependable, and real-time responsive software. As a result, these services have been used successfully in constructing several important applications that need high availability, dependability, or real-time responsiveness. However, their applicability in constructing such applications on a wide-area network has so far remained limited.

In this white paper, we propose two research directions that will enable the design and implementation of group communication services in the next generation internet, and hence the construction of highly available, dependable, and real-time responsive software on it.

HIGH-SPEED COMMUNICATION

The key reason for the limited applicability of group communication services in today's internet is large communication delays. Because of these delays, an atomic multicast protocol, which ensures that messages are delivered to all group members reliably and in a consistent order, performs extremely poorly.
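
To make the consistent-order requirement concrete, the following is a minimal sketch of one well-known way to obtain a total order: a fixed sequencer combined with a hold-back queue at each member. It is an illustration only, not the protocol of any system cited in this paper. The names Sequencer, Member, stamp, receive, and demo are hypothetical, and failures, retransmission, and membership changes are ignored. In such a scheme every message travels first to the sequencer and then to the members, so communication delay adds directly to delivery latency.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sequencer:
    """Hypothetical fixed sequencer: stamps each message with the next global number."""
    next_seq: int = 0

    def stamp(self, payload: str) -> tuple[int, str]:
        stamped = (self.next_seq, payload)
        self.next_seq += 1
        return stamped


@dataclass
class Member:
    """Group member that delivers stamped messages strictly in sequence order."""
    expected: int = 0
    held_back: dict[int, str] = field(default_factory=dict)
    delivered: list[str] = field(default_factory=list)

    def receive(self, stamped: tuple[int, str]) -> None:
        seq, payload = stamped
        if seq < self.expected or seq in self.held_back:
            return  # duplicate: already delivered or already buffered
        self.held_back[seq] = payload
        # Deliver every message that is now contiguous with what was delivered.
        while self.expected in self.held_back:
            self.delivered.append(self.held_back.pop(self.expected))
            self.expected += 1


def demo() -> tuple[list[str], list[str]]:
    sequencer = Sequencer()
    stamped = [sequencer.stamp(p) for p in ("debit", "credit", "audit")]
    a, b = Member(), Member()
    # The network may hand the same messages to each member in a different order.
    for message in stamped:
        a.receive(message)
    for message in reversed(stamped):
        b.receive(message)
    return a.delivered, b.delivered


if __name__ == "__main__":
    first, second = demo()
    print(first)
    print(second)
```

Both members in demo() end up with the same delivery sequence even though the second one receives the messages in reverse order. The hold-back queue is what enforces this: a message is buffered until every lower-numbered message has been delivered.
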
Group membership protocols, which ensure a consistent system-wide view of correct processors, typically require several rounds of message exchange. Once again, because of large communication delays in the internet, these protocols perform poorly. Finally, to provide real-time properties to applications, communication delays must be small.

The next generation internet is expected to be 100 to 1000 times faster than today's internet. Hence, the performance problems associated with group communication services are expected to diminish. This gives rise to several interesting research issues in the design and implementation of a group communication service. The main focus of these issues is how to exploit the increase in communication speed to design and implement group communication services that provide all the important consistency properties useful in the construction of highly available, dependable, or real-time responsive applications while, at the same time, delivering high performance.

HETEROGENEITY

Another important research issue in the design and implementation of group communication services in the next generation internet is heterogeneity. The computing environment increasingly used for constructing applications consists of hardware and software components that differ in their interfaces, execution speeds, and architectures. This type of heterogeneous distributed computing environment will become even more common as the next generation internet is used to construct applications. Hence, the ability to operate correctly and efficiently in a heterogeneous computing environment is an important property that any distributed service implemented in the next generation internet must provide. Heterogeneity, interoperability, extensibility, dependability, and availability are the key requirements of a group communication service.
To address these issues in object-oriented programming, the Object Management Group (OMG) created an open standard called the Common Object Request Broker Architecture (CORBA) [7]. CORBA is a set of standardized specifications that provide heterogeneity, interoperability, and extensibility. It provides the basic mechanisms for remote invocation through the Object Request Broker (ORB), as well as a set of services for object management. CORBA has become extremely popular in the software industry worldwide. However, neither the ORB nor the existing services provide any support for high availability or dependability. As a result, applications built using CORBA are not highly available or dependable, and this gap needs to be addressed. Recently, two different approaches have been proposed to address it: (1) modifying and extending the ORB with a group communication service [5], and (2) providing group communication as a service on top of the ORB [4]. Both approaches are promising and need further research.

REFERENCES

[1] Y. Amir, L.E. Moser, P.M. Melliar-Smith, D.A. Agarwal, and P. Ciarfella. `The Totem Single Ring Ordering and Membership Protocol'. ACM Transactions on Computer Systems, 13(4):311-342, November 1995.
[2] K. Birman, A. Schiper, and P. Stephenson. `Lightweight Causal and Atomic Group Multicast'. ACM Transactions on Computer Systems, 9(3):272-314, August 1991.
[3] F. Cristian. `Understanding Fault-Tolerant Distributed Systems'. Communications of the ACM, 34(2):56-78, February 1991.
[4] P.A. Felber, B. Garbinato, and R. Guerraoui. `The Design of a CORBA Group Communication Service'. Proceedings of the 15th Symposium on Reliable Distributed Systems, Niagara-on-the-Lake, Canada, October 1996.
[5] S. Maffeis. `Adding Group Communication and Fault Tolerance to CORBA'.
Proceedings of the 1995 USENIX Conference on Object Oriented Technologies, Monterey, CA, June 1995.
[6] S. Mishra, L. Peterson, and R. Schlichting. `Consul: A Communication Substrate for Fault-tolerant Distributed Programs'. Distributed Systems Engineering, 1(2):87-103, December 1993.
[7] OMG. `The Common Object Request Broker: Architecture and Specification'. OMG 1995.