Paper Title: TCP Connection Establishment Protocol Extension for Multiple Server Load-Sharing

Authors:

Yuli Zhou
Senior Technical Staff Member
AT&T Labs Research
600 Mountain Ave.
Murray Hill, New Jersey 07974
Telephone: (908) 582-7815
Fax: (908) 582-6160
Email: zhou@research.att.com

Doug Blewett
Distinguished Member of Technical Staff
AT&T Labs Research
600 Mountain Ave.
Murray Hill, New Jersey 07974
Telephone: (908) 582-6496
Fax: (908) 582-6160
Email: blewett@research.att.com

1. Introduction

The fundamental architecture of the Internet, namely the TCP/IP protocol suite, has remained the same since its origin as a modest network connecting a handful of research and educational institutions. Hosts are connected via IP routers, which forward datagrams independently of one another on a hop-by-hop basis. TCP connections are maintained solely by the two communicating hosts, unknown to the routers in the network. Consequently, there is very little signaling capability for setting up and managing connections compared with SS7, used in the North American telephone system, which provides 800 numbers and multi-line hunt groups, both critical to the operation of large-scale services.

This paper addresses the problem of how to build a distributed server site having a single, public IP address. We first discuss why current solutions, especially DNS round-robin, are not completely satisfactory. We then outline a proposal for a simple extension to TCP's connection establishment protocol as a solution to the remaining problems.

2.
Problem and Current Solutions

The demand on most popular Internet sites has long outgrown the capacity of even the largest single-machine SMP server. There are currently three ways to cope with this problem. The first, not much of a solution at all, lets the user choose from a list of alternative servers. It is often seen at high-demand ftp sites, such as those from which we download the most recent copy of the Netscape or Microsoft Internet browser. The second, and most widely deployed, solution is DNS round-robin [1]. The third, much less widely deployed, uses a front-end router modified to capture and redirect TCP/UDP sessions, such as Cisco's LocalDirector [2] and the TCP router [4] implemented by the authors.

DNS round-robin uses a feature in newer implementations of the DNS that rotates the list of IP addresses associated with a host name, so that each time a client queries the DNS for a host's IP address, it gets a different one in round-robin fashion. Since DNS is only an Internet application and not an integral part of the IP infrastructure, clients that use raw IP addresses can bypass this mechanism completely. In addition, the resulting mapping may be cached at various levels, e.g., by older DNS servers that do not implement round-robin, or by the Netscape browser. In the worst case, a client may repeatedly contact a failed server while other servers are available.

A session redirector solves the aforementioned problems, but at the cost of complicating the router. Specifically, IP routers are designed to be stateless, i.e., each packet is forwarded independently, whereas a session redirector is not: it has to record each active session and redirect all packets in that session to the selected server. This increases the router's resource requirements and lowers its performance at high connection rates. It also makes the router potentially less robust.

3.
Another Proposal

While experimenting with the TCP router [4], we have thought at length about the following idea: when a client requests a connection to a public IP address, a special redirection agent, implemented either in the router or on a special host in the server cluster, can intercept the request and contact a real server on the client's behalf, then tell the client that a connection has been established with another server. The agent should be involved only in setting up the connection; afterwards the connection is maintained solely by the communicating end-points, as per the original design of TCP/IP. This relieves the router from keeping connection state information, but requires some modification to the client's TCP/IP code. Specifically, the semantics of "connect" needs to reflect that the connection is established with a substitute server having a different IP address.

3.1 The Redirection Agent

TCP establishes a connection through its so-called "three-way handshake" [3]: typically, a client wishing to connect to a server sends the server a SYN segment carrying its initial sequence number SN. The server responds with its own SYN, carrying its initial sequence number and an ACK sequence number of SN+1. Finally, the client ACKs the server's SYN, thus establishing the connection. In the following we detail the changes necessary to incorporate the redirection mechanism.

3.1.A Redirection by the Router

This diagram shows the first scenario, where redirection is done by the router. All servers are connected to a class C network 199.97.247.0, and .100 is the public IP address. A '*' indicates that modification to current software is necessary.

    [.101]----|
              |-----[router*]-----{ network }-----[client*]
    [.102]----|  .100
       .      |
       :      |199.97.247.0

The new protocol acts as follows:

- Client sends SYN to the public IP address 199.97.247.100.

- Router intercepts SYN, rewrites the destination address and forwards it to a server (say .101).

- Server (.101) sends SYN/ACK to client.
  As the segment passes through the router, the router looks up a table and matches the segment with the earlier SYN; it then changes the source address back to .100, adds a TCP option indicating the real server, i.e. .101, and forwards the segment to the client.

- Client and server .101 then carry on a normal conversation, with the router functioning just as an IP router.

The redirection agent in the router does have to keep the SYN, both to match the SYN/ACK and for potential retransmissions. However, it does not need to process packets other than SYNs.

3.1.B Redirection by a Special Host

In this scenario, the router is unmodified, and all connection requests go to a special redirection host X, which has the public IP address.

    [ X* ]----|
     .100     |
              |
    [.101]----|
              |----[router]----{ network }----[client*]
    [.102]----|
       .      |
       :      |199.97.247.0

- Client sends SYN to X.

- X stores the SYN, and sends its own SYN to a server (say .101). The new SYN duplicates the original port numbers, but acquires a new sequence number SN, unique on X over a long time interval (using a 32-bit wrap-around counter).

- Server (.101) sends SYN/ACK to X; the ACK sequence number is SN+1. X finds the earlier SYN through SN, from which it assembles the correct SYN/ACK and sends it back to the client, with a TCP option indicating the real server (.101).

- Client and server then converse directly; host X is no longer involved.

3.2 Client Modification

The client's TCP code must be able to process the new option in the SYN/ACK returned from the redirection agent. The client needs to ACK the real server and, on completion of the TCP three-way handshake, modify its socket to send/receive data to/from the real server.

5. References

[1] T. Brisco, DNS Support for Load Balancing. Internet RFC 1794, Apr 1995. http://www.internic.net/rfc/rfc1794.txt.

[2] Cisco Systems, Scaling the World Wide Web. White paper, Feb 1996. http://cio.cisco.com/warp/public/751/advtg/swww_wp.htm.

[3] Richard Stevens, TCP/IP Illustrated, Volume 1: The Protocols.
    Addison-Wesley Professional Computing Series. Addison-Wesley Publishing Company, Reading, Massachusetts, 1995.

[4] Yuli Zhou, Doug Blewett, Scalable Internet Servers based on Dynamic Routing of TCP Streams. AT&T Labs Research Memo, Nov 1996.
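
As an illustration of the router scenario of Section 3.1.A and the client change of Section 3.2, the exchange can be modeled as a small Python simulation. This is only a sketch under stated assumptions, not an implementation of the proposal: the Segment record, the real_server field standing in for the new TCP option, the round-robin choice of server, and the client address 10.0.0.7 are all hypothetical, since the text fixes neither an option format nor a server selection policy. Timeouts and eviction of the pending-SYN table are omitted.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple

PUBLIC_IP = "199.97.247.100"
SERVERS = ["199.97.247.101", "199.97.247.102"]


@dataclass(frozen=True)
class Segment:
    """Toy model of a TCP segment (hypothetical, for illustration only)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    flags: str                         # "SYN", "SYN/ACK" or "ACK"
    seq: int
    ack: Optional[int] = None
    real_server: Optional[str] = None  # stands in for the new TCP option


class RedirectingRouter:
    """Router that touches only SYN and SYN/ACK segments (Section 3.1.A)."""

    def __init__(self, public_ip: str, servers: List[str]) -> None:
        self.public_ip = public_ip
        self.servers = list(servers)
        self.next_server = 0
        # Pending SYNs: (client ip, client port, service port) -> chosen server.
        self.pending: Dict[Tuple[str, int, int], str] = {}

    def forward(self, seg: Segment) -> Segment:
        if seg.flags == "SYN" and seg.dst_ip == self.public_ip:
            key = (seg.src_ip, seg.src_port, seg.dst_port)
            server = self.pending.get(key)
            if server is None:
                # Assumption: round-robin; the text only says "say .101".
                server = self.servers[self.next_server % len(self.servers)]
                self.next_server += 1
                self.pending[key] = server
            # A retransmitted SYN reuses the server chosen earlier.
            return replace(seg, dst_ip=server)
        if seg.flags == "SYN/ACK":
            key = (seg.dst_ip, seg.dst_port, seg.src_port)
            if self.pending.get(key) == seg.src_ip:
                # Restore the public source address and name the real server.
                return replace(seg, src_ip=self.public_ip,
                               real_server=seg.src_ip)
        # Everything else is forwarded unchanged, as by a plain IP router.
        return seg


def client_accept(syn: Segment, syn_ack: Segment) -> Tuple[str, Segment]:
    """Client side (Section 3.2): read the option and ACK the real server."""
    if syn_ack.flags != "SYN/ACK" or syn_ack.ack != syn.seq + 1:
        raise ValueError("segment does not acknowledge our SYN")
    # Without the option, fall back to the address the SYN/ACK came from.
    peer = syn_ack.real_server or syn_ack.src_ip
    final_ack = Segment(syn.src_ip, syn.src_port, peer, syn.dst_port,
                        "ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return peer, final_ack


if __name__ == "__main__":
    router = RedirectingRouter(PUBLIC_IP, SERVERS)
    syn = Segment("10.0.0.7", 40000, PUBLIC_IP, 80, "SYN", seq=1000)
    to_server = router.forward(syn)
    # The chosen server answers the client directly; the router sees the reply.
    reply = Segment(to_server.dst_ip, 80, "10.0.0.7", 40000,
                    "SYN/ACK", seq=5000, ack=1001)
    peer, final_ack = client_accept(syn, router.forward(reply))
    print("client now talks to", peer)
```

In this model the router's table holds one entry per pending SYN rather than per active session, which is the state reduction the proposal aims for; the final ACK and all later segments are addressed to the real server and pass through forward() unchanged.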