Integrated Reliability Analysis Tool

Submitted by: Mike Sjulin, Cindy Phillips, Elmer Collins, Bob Hutchinson
Sandia National Laboratories
M. S. 0451
P. O. Box 5800
Albuquerque, NM 87185-0451
(505) 844-5012 voice (Mike Sjulin)
(505) 844-9641 fax
mrsjuli@sandia.gov

----------------------------------------------------------------------------

As the size and complexity of communication networks grow, designers of these networks are increasingly dependent on tools that aid in the design. Significant factors that influence the design of these networks include functionality, performance, reliability, and cost. A designer must balance these factors to arrive at a sound design. Current network design tools focus on functionality and performance. Through these tools, the design engineer gets no direct indication of reliability or long-term cost.

The proposal is to develop an automated tool that assists the network designer in evaluating the system reliability of a communications network. This tool will be integrated into an existing network design package. Primitive data structures of the existing tool will be transformed into a reliability model. This reliability model, in concert with a reliability data base, will be used to evaluate the expected system reliability based on the topology and the content of the network.

The automated tool is envisioned to consist of three parts: a translator, a reliability data base, and an analysis engine.

The translator will be capable of extracting information about network topology, component attributes, and communication protocols directly from the design tool. This information, together with a precise definition of system success entered by the user, will be used to generate a reliability representation of the network.
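
As a rough illustration of what the translator's output might look like, the following Python sketch builds a minimal graph-style reliability representation from topology records. The record layout, the names Link, ReliabilityModel, and translate, and the sample component types are all assumptions made for illustration; the proposal does not specify the design tool's data structures or the form of the reliability representation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """A direct communication channel between two nodes (hypothetical layout)."""
    a: str
    b: str
    component: str  # key that could later be looked up in the reliability data base


@dataclass
class ReliabilityModel:
    """Graph-style reliability representation: nodes, links, and a success rule."""
    nodes: set = field(default_factory=set)
    links: list = field(default_factory=list)
    success: str = "all-terminal"  # stands in for the user-entered success definition


def translate(records, success="all-terminal"):
    """Turn (node_a, node_b, component_type) records into a ReliabilityModel.

    `records` stands in for whatever the design tool exports; its shape is
    an assumption, not the real format of any design package.
    """
    model = ReliabilityModel(success=success)
    for a, b, component in records:
        if a == b:
            continue  # a self-loop never affects connectivity
        model.nodes.update((a, b))
        model.links.append(Link(a, b, component))
    return model


# Hypothetical export from a design tool: three devices joined in a ring.
records = [
    ("router1", "switch1", "fiber"),
    ("switch1", "gateway1", "copper"),
    ("gateway1", "router1", "fiber"),
]
model = translate(records)
print(sorted(model.nodes))
print(len(model.links))
```

Keeping the component type on each link, rather than a failure probability, is one possible design choice: it would let the failure data be supplied later, either from the reliability data base or by the user.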

The reliability data base will contain field failure information, the number of failures per million hours of operational usage, and the applicable probability distribution function for most commercially available communication network components. Users will have the option to use this data or enter their own failure data. If a component is not in the data base, the user will be given guidance in determining a reliability metric, i.e., Mean Time Between Failures (MTBF), for that component.

The analysis engine uses the reliability representation from the translator and the reliability data base to evaluate the model and provide a reliability assessment of the system. A communication network can be abstracted as a graph in which the nodes represent hubs or termination points (routers, gateways, switches, users, etc.) and an edge exists between two nodes if there is a direct communication channel between them.

The analysis engine is envisioned to be developed in stages. The first stage will consider all-terminal reliability. The input is a graph in which the nodes are assumed to be completely reliable and each edge e is assigned a value p_e, the probability that edge e will fail in a given time interval. Given that all edges fail independently with their given probabilities, the challenge is to compute the probability that viable (unfailed) paths exist between every pair of nodes at the end of the time interval. The goal will be to provide the network designer with real-time reliability data for interactive design.

The second stage will add channel capacity and node failure. A multi-commodity flow model will be incorporated into a Monte Carlo simulation to estimate the reliability versus channel capacity distribution.

Analysis results should be integrated into the existing network design tool display. This will provide the user with an integrated design tool. Reliability data, along with repair cost data, can be used to estimate long-term system cost.
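
The first-stage all-terminal computation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the proposed analysis engine: the function names, the three-node example graph, and the value p_e = 0.1 are invented for illustration. The helper failure_probability additionally assumes a constant failure rate (an exponential distribution), which is only one of the distributions the data base might record. Exact enumeration is shown alongside a simple Monte Carlo estimate of the same quantity.

```python
import itertools
import math
import random


def failure_probability(failures_per_million_hours, hours):
    """p_e over `hours`, ASSUMING a constant failure rate (exponential model)."""
    return 1.0 - math.exp(-failures_per_million_hours * hours / 1e6)


def connected(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is connected."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacency = {n: [] for n in nodes}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)


def exact_all_terminal(nodes, fail_prob):
    """Sum the probability of every edge-survival pattern that stays connected.

    fail_prob maps each edge (a, b) to p_e, its failure probability.
    Cost grows as 2**len(fail_prob), so this only suits small graphs.
    """
    edges = list(fail_prob)
    total = 0.0
    for alive in itertools.product((False, True), repeat=len(edges)):
        prob = 1.0
        for edge, up in zip(edges, alive):
            prob *= (1.0 - fail_prob[edge]) if up else fail_prob[edge]
        if connected(nodes, [e for e, up in zip(edges, alive) if up]):
            total += prob
    return total


def monte_carlo_all_terminal(nodes, fail_prob, trials=20000, seed=0):
    """Estimate the same quantity by sampling independent edge failures."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        alive = [e for e, p in fail_prob.items() if rng.random() >= p]
        hits += connected(nodes, alive)
    return hits / trials


# Illustrative three-node ring; p_e = 0.1 on every edge is an assumed value.
nodes = {"A", "B", "C"}
fail_prob = {("A", "B"): 0.1, ("B", "C"): 0.1, ("A", "C"): 0.1}
exact = exact_all_terminal(nodes, fail_prob)
estimate = monte_carlo_all_terminal(nodes, fail_prob)
print(round(exact, 3))  # 0.972: all three edges up, or exactly one edge down
```

For the ring, the network stays connected when at most one edge fails, so the exact value is 0.9^3 + 3(0.9^2)(0.1) = 0.972. Because enumeration doubles in cost with every added edge, and exact all-terminal reliability is known to be computationally hard in general, a sampling estimate of this kind may be the more practical route toward interactive response on larger graphs.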