Spanning the Next Generation Internet

Robert Armstrong                Ann Gentile
rob@ca.sandia.gov               gentile@ca.sandia.gov
voice: 510-294-2470             voice: 510-294-3614
FAX:   510-294-3422             FAX:   510-294-1004

Distributed Information Systems
7011 East Ave
Livermore, CA 94550

-----------------------------------------------------------------

The ability to use resources on the Next Generation Internet effectively will depend heavily upon the ability to reach and manage those resources in a scalable manner. The Next Generation Internet envisions thousands of machines cooperating in calculations and database retrieval. Spanning software operations across an assembly of this size is impractical without tools that scale securely with the size of the machine aggregate.

Multiple systems linked by high-capacity networks, with latencies as low as the speed of light allows, will be best utilized by code that can span these networks in logarithmic time. Code that spans systems of this sort must be scalable. Constructing such code, whether for applications, debugging, or maintenance, is an unsolved problem. Approaches to solving it include intelligent agents, meta-systems, multicasting, or combinations of these.

For the computing resources on the Next Generation Internet to be shared effectively, scalable code must be employed for controlling user processes as well as for general system administration tasks (e.g., creation of user accounts, broadcasting status information, system set-up, monitoring of system status). Although tools exist to accomplish some of these tasks, they are not scalable, they rely on relatively weak security, and they have not been designed to survive the loss of participating machines.

The Next Generation Internet is envisioned to connect thousands of independent computers that can be harnessed for a single application.
To harness the power of this architecture, it will be necessary to ferret out application anomalies and operating system failures when the only indication of a problem is that the application stalls. For example, consider a user's parallel code, consisting of thousands of workers scattered across the network, that stalls because of an unknown problem with a single worker. Parallel computations are commonly structured so that if one worker is held up, it eventually stalls the entire application. There is no a priori way of determining which worker is the culprit, and a linear querying scheme would be prohibitively slow. A straightforward way to find the culprit is to query each machine to determine which worker processes are not blocked in a receive state. The query code must be carried to all of the worker machines and run with each local machine's debugger, simultaneously and scalably. Then, if a culprit is found, a debugger window can be attached to the user's X display.

These concerns are not restricted to high-performance computing. Retrieval of medical information from thousands of participating servers requires the same sort of response. The case of medical information also illustrates the need for security: medical resources must be protected, results must be reliable, and privacy is critical.

Traditionally, such a query would be run on each machine from a single host in a non-scalable loop, resulting in a completion time that would likely be prohibitive. For example, for a thousand networked machines a non-scalable loop will take 16 minutes to span all of the machines, but only 10 seconds are required to do this in a scalable, logarithmic fashion. (Note that this ignores propagation delays in both cases.)

Meta-systems can be tested as part of the Next Generation Internet initiative. Mechanisms that have potential for spanning the Next Generation Internet include CORBA, DCE, Legion, Globus, Daisy, and Lilith.
These systems focus on sharing processor cycles. The Next Generation Internet needs to allow the sharing of both unique resources and commodity resources. A basic commodity of a network is processor cycles. Without the ability to share code and harness distant resources, the Next Generation Internet will have high-speed connections to idle cycles.
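
The thousand-machine timing estimate and the stalled-worker query described earlier can be illustrated with a short Python sketch. It is not taken from any of the systems named above. It assumes a cost of about one second per machine contact (inferred from the 16-minute figure, not stated in the text) and ignores propagation delays. The function names and the "recv" state label are hypothetical, and the tree query is simulated sequentially in one process rather than run concurrently on real machines.

```python
def linear_span_seconds(machines, seconds_per_contact=1.0):
    """A single host contacts every machine in turn (the non-scalable loop)."""
    return machines * seconds_per_contact


def doubling_rounds(machines):
    """Rounds needed if every machine already reached contacts one new machine
    per round, so the reached set doubles. The originating host is counted
    among the machines (an assumption of this sketch)."""
    reached, rounds = 1, 0
    while reached < machines:
        reached *= 2
        rounds += 1
    return rounds


def tree_span_seconds(machines, seconds_per_contact=1.0):
    """Logarithmic spanning time under the doubling model above."""
    return doubling_rounds(machines) * seconds_per_contact


def collect_not_blocked(states, node=0):
    """Simulated tree-structured query over workers laid out as a binary tree
    (children of node i are 2i+1 and 2i+2). Each node reports itself if it is
    NOT blocked in a receive, plus whatever its children report. In a real
    deployment the children would be queried concurrently."""
    if node >= len(states):
        return []
    suspects = [] if states[node] == "recv" else [node]
    for child in (2 * node + 1, 2 * node + 2):
        suspects += collect_not_blocked(states, child)
    return suspects


if __name__ == "__main__":
    n = 1000
    print(f"{linear_span_seconds(n) / 60:.1f} minutes")  # 16.7 minutes
    print(f"{tree_span_seconds(n):.0f} seconds")         # 10 seconds

    states = ["recv"] * n       # every worker blocked in a receive...
    states[637] = "running"     # ...except one hypothetical culprit
    print(collect_not_blocked(states))                   # [637]
```

Under these assumptions the linear loop costs 1000 contacts in sequence, while the doubling scheme needs only 10 rounds because 2^10 = 1024 is the first power of two to reach 1000.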