Application Utilization Of A National Gigabit Network, Getting End-to-End Performance

Thomas J. Pratt
Member of Technical Staff
Sandia National Laboratories
Email: tjpratt@sandia.gov
Phone: 505-844-6725
FAX: 505-844-2067

Steven A. Gossage
Distinguished Member of Technical Staff
Sandia National Laboratories
Email: sagossa@sandia.gov
Phone: 505-844-6291
FAX: 505-844-2067

To effectively utilize the Next Generation Internet, high performance toolsets need to be developed. For an application to effectively utilize a national gigabit network, the application and the network must cooperate more intimately than the current application-network-application interface allows. A one gigabit national network has the potential of having over 12.5 megabytes of application data in flight. The addition of network switching queues increases the amount of in-flight data further. The use of virtual circuits combined with parallel communication strategies will create network traffic patterns that are not typical of current data networks. The emergence of new applications will require quality of service guarantees that will challenge existing Internet network management policies and network usage assumptions. The network monitoring tools used today are not able to meet current needs; at gigabit speeds their functionality will need to be radically improved just to maintain the current level of network management responsiveness.

Applications currently do not understand their communication needs. While this is acceptable for applications that do not require large amounts of network resources, applications that do require large amounts of resources need to understand the impact their network utilization has on the network.
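The 12.5 megabyte in-flight figure quoted earlier is a bandwidth-delay product. The paper does not state the delay it assumes; a 100 millisecond round trip is the value that reproduces 12.5 megabytes on a one gigabit per second link, so the sketch below uses it as a labeled assumption. The function name and the queueing-delay parameter are illustrative only.

```python
def in_flight_bytes(link_bits_per_s, round_trip_s, queue_delay_s=0.0):
    """Bandwidth-delay product: bytes that can be in transit at once.

    queue_delay_s models extra time spent in switch queues, which adds
    to the data held in the network.
    """
    total_delay_s = round_trip_s + queue_delay_s
    return link_bits_per_s * total_delay_s / 8  # 8 bits per byte


if __name__ == "__main__":
    GIGABIT = 1e9        # one gigabit per second
    ASSUMED_RTT = 0.100  # assumption: 100 ms round trip, not from the paper

    base = in_flight_bytes(GIGABIT, ASSUMED_RTT)
    # Hypothetical 10 ms of queueing, chosen only to show the trend.
    queued = in_flight_bytes(GIGABIT, ASSUMED_RTT, queue_delay_s=0.010)

    print(f"no queueing:  {base / 1e6:.2f} MB in flight")
    print(f"with queues:  {queued / 1e6:.2f} MB in flight")
```

Any delay added by switching queues increases the total, which is why an application that sizes its buffers from the propagation delay alone can still fall short of filling the path.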
Currently an application that requires a large amount of network resources will resort to protocols that have a lower guarantee of delivery in an effort to extract higher network performance. As the network becomes congested, traffic from these applications can be the first to be dropped.

The DOE scientific computing platforms are on a performance path to deliver 100 teraflops. Current production machines operate at around 300 gigaflops, roughly 1/300 of the 100 teraflops. The largest machine now coming into operation is a 1-2 teraflop machine. Performance is projected to double every two years. This path will quickly result in the need for gigabit per second access to these machines. Access to these computing resources will be required by several nationally dispersed sites.

We will discuss the network needs of the scientific applications as they now exist and extrapolate those needs as the performance of the DOE scientific computing platforms increases. We will also discuss the need for the network to provide a reliable data path for these applications while protecting itself from being overrun by misbehaving or bandwidth-voracious applications.

Our experience has been that while bandwidth is the primary topic of discussion in these national network forums, the feature that actually allows an application to exploit the available bandwidth is the quality of service (QOS) guarantees that the network can provide. The NGI must make a concerted effort to provide the highest level of QOS possible within its networks. It must also present dynamic QOS information to the applications needing that information.

One of our major activities is enabling scientific computing applications to take advantage of high performance networks. We are creating a seamless distributed supercomputing environment for these scientific computing applications. These efforts involve our partners at ORNL and ESNET.
Activities have included TCP/IP protocol acceleration, high speed application-based flow control, hardware traffic shaping, characterization of wide area network capability, and end-to-end network performance monitoring. We are continuing to empower the middleware, the software that bridges the scientific application to the network, by making it more capable of interacting intelligently with the wide area network. These activities are being developed to take advantage of what the current network infrastructure can do and to enable the middleware's performance to match that of its physical interface, 67 megabytes per second. These efforts are based on networking standards and specifications that are already in place but are not widely available in the field today. There is an expectation that the next generation network will implement these standards, thereby allowing the middleware code to quickly take advantage of these new features.

Over the years, Sandia National Laboratories has aggressively sought insights into the requirements of these types of services. As an early user of satellite communications and high speed telecommunications, we developed insight into the transmission of voice and data over long distances. We have in-depth knowledge of operating networks with various error profiles. We have experience dealing with networks that have a large delay-bandwidth product.

Sandia had early programmatic requirements for low latency, high bandwidth wide area networking. In 1992, we became the first user of a Federal Telephone System dedicated DS3 telecommunication circuit. The circuit, which linked supercomputer users at our site in Livermore, CA, to the supercomputing site in Albuquerque, NM, allowed Sandia to centralize supercomputing resources at a single site.
That original circuit evolved into an ATM circuit that now passes voice, data, and video traffic between the sites while still serving its original purpose of providing a reliable high speed connection for supercomputing access.

We have experience passing high quality video across a multi-user national ATM network. In 1995, at Supercomputing 95, we ran video applications across DOD's AAI network. These remote video experiments included transporting virtual reality images and CAD/CAM design visualization images. More traditional uses of remote video, including video conferencing and CATV extension, were also demonstrated.

Our current activities include providing connectivity to the ASCI Red computing platform. The Red platform is the first teraflop computing platform within the ASCI environment. The effort to create a balanced computing and communication environment is a major driver in the design of the communication network surrounding the machine.