Name: David A. Lifka
Title: Senior Systems Programmer
Affiliation: Cornell Theory Center
Postal address: Frank H. T. Rhodes Hall, Ithaca, NY 14850
Email: lifka@tc.cornell.edu
Phone: 607-254-6621
Fax: 607-254-8888

The Advanced Resource Management System (ARMS)

At the Cornell Theory Center (CTC) we have been developing an advanced resource management and scheduling system for distributed, heterogeneous computational, visualization, network, and data acquisition resources. This system, called ARMS, deterministically schedules applications on these resources so that network connectivity and quality of service are ensured. ARMS is a technology enabler in that it gives applications seamless, cohesive access to these distributed resources. This will allow researchers in all fields to focus on their areas of expertise without being concerned with the logistics of running their applications or experiments on geographically distributed, heterogeneous resources. ARMS has been under development for over a year and is already showing signs of enormous potential. An ARMS-like system will be essential for the Next Generation Internet (NGI), which needs to provide secure, reliable access to the growing number of distributed resources that our country's researchers require.

ARMS includes a two-level intelligent scheduling system that uses an Informix distributed database. The database maintains status and availability information, which is then used to schedule jobs with specific computing, synthetic environment, software, data, and networking resource requirements. ARMS is built on key components from the EASY-IBM LoadLeveler application programming interface (API) project (http://www.tc.cornell.edu/Software/EASY-LL). Through a minimal set of API calls, all the necessary information about local and remote computational resources and user job requirements can be queried reliably.
This API is being extended to work with other major scheduling systems such as Platform Computing's LSF.

ARMS provides secure, survivable resource management and job scheduling. Local resource monitoring agents are being developed that will relay status and availability information to the ARMS database using the Ensemble system developed at Cornell University (http://simon.cs.cornell.edu/Info/Projects/Ensemble). Ensemble is a reliable networking protocol that guarantees both the arrival of information and its ordered arrival. Ordering matters: if a resource manager sends a message that a node is up and then a message that the same node is down, and the "node is up" message arrives after the "node is down" message, a scheduler might try to start a job on a node that is currently unavailable.

ARMS relies on DCE/DFS for cross-realm authentication. Each site that participates in an ARMS distributed system will administer its own local DCE cell, which will allow cross-cell authentication with the other partner cells. CTC is working closely with staff members at Lawrence Livermore National Laboratory, who have been working extensively on multiple DCE cell deployment and on related administration issues.

Because of the size and heterogeneous nature of the ARMS infrastructure, distributed system monitoring and navigation tools will be extremely important for both systems administrators and users. IBM and CTC are developing Web-based Java tools for locating resources and monitoring their status and availability. Current work includes developing a Java-based hierarchical view of the ARMS resources, along with intuitive ways of browsing the hierarchy and presenting information of interest.

To test the usability of the ARMS system for the development of distributed programming environments, we have been working on several prototype applications and collaborative tools layered on top of ARMS.
CTC has developed a secure Web-based scheduler interface that provides ARMS users with a common interface for submitting jobs to the ARMS scheduler and for browsing and editing DFS filespace, no matter where the jobs reside. The "Smart Make" facility, another interface under development, will allow users to submit source code to the ARMS scheduler, which in turn will locate the necessary software and computing resources for the code, build the correct executable, and store metadata on the executable pertaining to its resource requirements.

CTC is already a leader in the area of Web-based education through its Virtual Workshop. This application is particularly demanding of network responsiveness. ARMS will enable advance reservation of networking resources so that during a Virtual Workshop or class session users will be able to interact as they would in a physical meeting or classroom.

ARMS is being tested in several testbeds. Development work is being done primarily at CTC on its 512-node IBM SP system and SGI Power Onyx systems. As code matures, it is deployed to a campus-based distributed system with ATM network connectivity. In addition, mature ARMS code has been tested on a national testbed including CTC, Pennsylvania State University, the University of California at Los Angeles, and the University of Maryland. The geographical distance between these sites has proven to be an excellent test of ARMS' capabilities.

The reliable and deterministic service that ARMS will be able to guarantee will make it an essential part of the NGI. It will provide a mechanism to predict and control network traffic, a feature that is missing in the current Internet. Working closely with key distributed application developers will ensure the ultimate usability of the ARMS system. It is extremely important that we closely track the evolving requirements of future NGI users so that ARMS will meet them.
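
Illustration: ordered delivery of node status messages. The node up/down scenario described earlier for the resource monitoring agents can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the Ensemble API or ARMS code: the names StatusMessage, OrderedReceiver, and apply_in_arrival_order, the per-sender sequence number, and the node name "node17" are all hypothetical and introduced purely for illustration. It contrasts a receiver that trusts arrival order with one that buffers messages and delivers them in sending order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StatusMessage:
    seq: int    # hypothetical per-sender sequence number
    node: str   # hypothetical node name
    up: bool    # True = "node is up", False = "node is down"


def apply_in_arrival_order(messages):
    """Naive receiver: the last message to arrive wins."""
    state = {}
    for msg in messages:
        state[msg.node] = msg.up
    return state


@dataclass
class OrderedReceiver:
    """Buffers messages and applies them strictly in sequence order."""
    next_seq: int = 0
    pending: dict = field(default_factory=dict)
    state: dict = field(default_factory=dict)

    def receive(self, msg):
        self.pending[msg.seq] = msg
        # Deliver every consecutive message that is now available.
        while self.next_seq in self.pending:
            ready = self.pending.pop(self.next_seq)
            self.state[ready.node] = ready.up
            self.next_seq += 1


# The resource manager sends "up" and then "down" for the same node.
sent = [StatusMessage(0, "node17", True), StatusMessage(1, "node17", False)]
# The network delivers the two messages in the opposite order.
arrived = list(reversed(sent))

naive_state = apply_in_arrival_order(arrived)

receiver = OrderedReceiver()
for msg in arrived:
    receiver.receive(msg)
ordered_state = receiver.state

print("arrival order:", naive_state)
print("sending order:", ordered_state)
```

In this sketch the naive receiver ends up believing the node is up, which is exactly the case in which a scheduler might start a job on an unavailable node, while the ordered receiver holds the early "down" message until the "up" message that preceded it has been applied. A real ordered-delivery layer must also handle lost messages and multiple senders, which this sketch leaves out.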