Workshop on Research Directions for the Next Generation Internet

Exploiting IPv6 "Jumbo Payloads" Over Synchronous Optical Networks

W. Marcus Miller
Lawrence Livermore National Laboratory, L-560
Livermore, CA 94551
Voice: (510) 424-4147
FAX: (510) 422-6287
E-mail: marcusm@llnl.gov

March 24, 1997

Motivation

To meet future science-based stockpile stewardship goals and
objectives through the Department of Energy Accelerated Strategic
Computing Initiative (ASCI) program, large-scale modeling,
simulation, visualization, and computation-intensive applications
must be run through an alliance of the major DOE laboratories. ASCI
will require advances in the speed, security, and functionality of
both wide and local area networks. Sharing computational resources
and massive amounts of data to accomplish ASCI objectives will
require extremely high inter-laboratory network bandwidth. Because
tera-scale computing will generate single datasets averaging a
terabyte or more in size, the network connecting the laboratories
will need throughput several orders of magnitude higher than the
existing links provide.

We believe the recent introduction of the IPv6 protocol [1] affords
an excellent opportunity to leverage the existing DOE laboratories'
network infrastructure while exploiting some of the new performance
capabilities of IPv6. In particular, the "Jumbo Payload" option
carried in the "Hop-by-Hop" options header provides a mechanism for
transporting extremely large packets without the need for
fragmentation and reassembly.

Recent research conducted over gigabit networks indicates that even
highly optimized and so-called "zero copy" IP protocol stacks cannot
drive packets at the available bandwidth across a mixture of payload
sizes [2]. Processor memory copy speeds, network interface memory
buffer (mbuf) copy overhead [5], poor protocol layer interfaces [3],
and receive livelock [6] are major bottlenecks in many conventional
IP protocol stacks. Moreover, results reported in [4] indicate that
TCP/IP performance over congested Asynchronous Transfer Mode (ATM)
networks is poor because (1) the TCP rate control mechanism does not
control traffic burstiness sufficiently to avoid congestion-induced
cell losses in wide area networks, and (2) the cell loss is primarily
due to large TCP segment sizes combined with ATM switch buffers that
are too small. TCP performance over a 2.4 Gb/s ATM wide area network
in the MAGIC testbed was shown to be heavily dependent on segment
size: larger segment sizes yielded correspondingly higher throughput.

Our contention is that transmitting large contiguous payloads affords
an opportunity to hide the latency associated with protocol overheads
and to amortize these costs across the network. The migration of
large blocks of data will be prevalent in the ASCI environment. Bulk
data transfers will require gigabyte transfers; examples include
downloading visualization data from a rendering engine, and the
transfer of large datasets by a distributed storage system such as
the High Performance Storage System (HPSS) [7] between compute nodes
and disk or tape Network Attached Peripherals (NAPs) [8]. The
transmission of large payloads also has direct application in the
design and optimization of network file systems such as NFS and AFS.
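
The amortization argument can be illustrated with simple arithmetic.
The following is a minimal sketch, not a measurement: it counts how
many packets a one-terabyte dataset needs at several segment sizes and
what fraction of the transmitted bytes are headers. The function names
and the segment sizes are our own illustrative choices, TCP options
are ignored, and header bytes are only a proxy for the per-packet
processing costs discussed above, which this sketch does not model.

```python
IPV6_HEADER = 40        # fixed IPv6 header, in bytes
TCP_HEADER = 20         # TCP header without options (assumption)
HOP_BY_HOP_JUMBO = 8    # Hop-by-Hop header holding the Jumbo Payload option
MAX_PAYLOAD_16BIT = 65535  # largest value of the 16-bit payload length field

DATASET_BYTES = 10**12  # one terabyte, the dataset size cited in the text


def packets_needed(dataset_bytes, segment_bytes):
    """Number of packets needed to carry the dataset (ceiling division)."""
    return -(-dataset_bytes // segment_bytes)


def header_bytes(segment_bytes):
    """Header bytes per packet; jumbo packets add the Hop-by-Hop header."""
    needs_jumbo = TCP_HEADER + segment_bytes > MAX_PAYLOAD_16BIT
    return IPV6_HEADER + TCP_HEADER + (HOP_BY_HOP_JUMBO if needs_jumbo else 0)


def overhead_fraction(segment_bytes):
    """Fraction of each packet's bytes that are headers rather than data."""
    hdr = header_bytes(segment_bytes)
    return hdr / (hdr + segment_bytes)


# Illustrative segment sizes only: 1440 bytes, the largest non-jumbo
# TCP segment, 1 MiB, and 64 MiB.
for segment in (1440, 65515, 2**20, 2**26):
    print(segment,
          packets_needed(DATASET_BYTES, segment),
          f"{overhead_fraction(segment):.6%}")
```

Any fixed cost paid once per packet (interrupts, header processing,
buffer management) scales with the packet count in the second column,
which is why the count, more than the header bytes, motivates larger
payloads.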

The transmission of IP over SONET will enable us to exploit large,
variable packet sizes, to circumvent the 53-byte cell size imposed by
ATM networks, and to avoid segmentation and reassembly at the ATM
Adaptation Layer.

IPv6 Extensions and Jumbo Payloads

Unlike in the existing IPv4 protocol, IPv6 options are placed in
separate extension headers located between the IPv6 header and the
transport-layer header in a packet. Most IPv6 extension headers are
not examined or processed by any router along a packet's delivery
path; they are processed only when the packet arrives at its final
destination. This is a major improvement in router performance for
packets that carry options, because in IPv4 the presence of any
option requires the router to examine all of them. In contrast to
IPv4 options, IPv6 extension headers can be of arbitrary length, and
the total amount of options carried in a packet is not limited to 40
bytes.

One of the extension headers defined in IPv6 is the "Hop-by-Hop"
options header which, unlike most extension headers, is examined by
every node along the path. It can carry a "Jumbo Payload" option
whose 32-bit length field allows packets of up to 4 gigabytes to be
transported over links that support Maximum Transmission Units (MTUs)
greater than 64K.

Research Issues

We propose to investigate the transmission of "ultra" large data sets
over native SONET networks by exploiting the IPv6 Jumbo Payload
option. We are primarily interested in the feasibility of deploying
IPv6 as a bearer service for local and wide area networks in support
of high performance computing. We will investigate the tradeoffs
among payload size, throughput, and protocol overhead. We expect to
investigate specific problem areas such as switch fabric and router
buffer sizes and the switch delays associated with large buffers. We
intend to develop and deploy a prototype IPv6 stack that is optimized
to reduce copy and protocol overheads.
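
To make the Jumbo Payload option concrete, the following minimal
sketch builds the two headers that begin a jumbo packet. It is an
illustration under stated assumptions, not the prototype stack: the
function names are hypothetical, the addresses come from the
documentation prefix, and the field layout follows our reading of the
IPv6 specification [1] (option type 194, option data length 4, and a
zero Payload Length in the IPv6 header).

```python
import ipaddress
import struct

HOP_BY_HOP = 0             # Next Header value for the Hop-by-Hop Options header
JUMBO_OPTION_TYPE = 0xC2   # Jumbo Payload option type (194)
JUMBO_OPTION_LEN = 4       # option data is one 32-bit length field
NEXT_HEADER_TCP = 6


def build_ipv6_header(src, dst, hop_limit=64):
    """40-byte IPv6 header for a jumbo packet (hypothetical helper).

    Payload Length is zero because the real length travels in the
    Jumbo Payload option; priority and flow label are left at zero.
    """
    first_word = 6 << 28   # version 6 in the top four bits
    return struct.pack("!IHBB16s16s", first_word, 0, HOP_BY_HOP, hop_limit,
                       ipaddress.IPv6Address(src).packed,
                       ipaddress.IPv6Address(dst).packed)


def build_jumbo_hop_by_hop(next_header, upper_layer_len):
    """8-byte Hop-by-Hop Options header carrying the Jumbo Payload option.

    The jumbo length counts everything after the IPv6 header, so it
    includes these 8 bytes plus the upper-layer data.
    """
    jumbo_len = 8 + upper_layer_len
    if not 65535 < jumbo_len <= 0xFFFFFFFF:
        raise ValueError("jumbo length must exceed 65535 and fit in 32 bits")
    # Hdr Ext Len is 0: the header is exactly one 8-octet unit long.
    return struct.pack("!BBBBI", next_header, 0,
                       JUMBO_OPTION_TYPE, JUMBO_OPTION_LEN, jumbo_len)


def parse_jumbo_length(hop_by_hop):
    """Return the 32-bit jumbo length from a header built above."""
    _, _, opt_type, opt_len, jumbo_len = struct.unpack("!BBBBI", hop_by_hop[:8])
    if opt_type != JUMBO_OPTION_TYPE or opt_len != JUMBO_OPTION_LEN:
        raise ValueError("first option is not a Jumbo Payload option")
    return jumbo_len


# Example: headers for a packet carrying 1 MiB of upper-layer data.
ipv6 = build_ipv6_header("2001:db8::1", "2001:db8::2")
hbh = build_jumbo_hop_by_hop(NEXT_HEADER_TCP, 2**20)
print(len(ipv6), len(hbh), parse_jumbo_length(hbh))
```

The sketch stops at header construction; sending such a packet also
requires a link and a protocol stack that accept MTUs beyond 64K,
which is one of the questions the proposed work would examine.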

As required by ASCI performance criteria, we may elect to develop or
deploy Quality of Service (QoS) protocols over IPv6 in support of our
throughput objectives. The IPv6 flow information fields provide a
mechanism to identify and prioritize a data flow based on a flow
label and a corresponding priority field. The semantics of these
fields are considered experimental and are currently the subject of
debate in the Internet Engineering Task Force (IETF) IPv6 working
group.

References

[1] S. Deering and R. Hinden, Internet Protocol, Version 6 (IPv6)
    Specification, RFC 1883, December 1995.
[2] J. Kay and J. Pasquale, Profiling and Reducing Processing
    Overheads in TCP/IP, IEEE/ACM Transactions on Networking,
    Vol. 4, No. 6, Dec. 1996.
[3] M. B. Abbott and L. L. Peterson, Increasing Network Throughput
    by Integrating Protocol Layers, IEEE/ACM Transactions on
    Networking, Vol. 1, No. 5, Oct. 1993.
[4] B. J. Ewy, et al., TCP/ATM Experiences in the MAGIC Testbed,
    http://www.ukans.magic.net/tcp/overview.html, 1994.
[5] B. Tierney, et al., System Issues in Implementing High Speed
    Distributed Parallel Storage Systems, USENIX High Speed
    Networking Symposium, August 1994.
[6] J. C. Mogul, Eliminating Receive Livelock in an Interrupt-driven
    Kernel, USENIX Technical Conference, Jan. 22-26, 1996.
[7] H. Hullen and R. Watson, The High Performance Storage System,
    Supercomputing, Nov. 1993.
[8] R. D. Van Meter III, A Brief Survey of Current Work on Network
    Attached Peripherals, http://www.isi.edu, Nov. 1995.

W. Marcus Miller, Ph.D.
Lawrence Livermore National Laboratory
Mail Stop L-560
7000 East Avenue
Livermore, CA 94550
Tel: (510) 424-4147
FAX: (510) 423-6374
e-mail: marcusm@llnl.gov