White Paper for CRA Workshop on Research Directions for the Next Generation Internet
May 13-14, 1997, Vienna, Virginia

Performance Monitoring for the Next Generation Internet

Will E. Leland
Director, Computer Communications Research Group
Room MCC 1A-228B
Bellcore
445 South St.
Morristown NJ 07960-6438
(201) 829-4376
Fax: (201) 829-2504
wel@bellcore.com

High performance and high reliability are essential to the Next Generation Internet, and achieving and maintaining them requires a coordinated suite of measurement and monitoring methods. "Performance" in this new world must include not just traffic delivery properties, such as delay or loss, but all facets of performance across a range of time horizons. Performance measurement is needed to address a variety of questions, such as:

-- The user and application "black box" view of Internet service: Is this application receiving adequate performance? Is this provider sufficiently secure and reliable to use for an intended task? Is a performance problem arising from my local environment, the remote service, or some particular transit component within the Internet?

-- Site and organization verification of service: Are problems developing that need to be brought to the attention of my Internet provider? Am I, as the funder for some grades of service, actually getting what was promised? What is the source of a problem my users are reporting?

-- Provider assurance of service: What bottlenecks threaten my current service offerings? Which components can be forecast to need future expansion? Am I, in fact, offering acceptable levels of performance? Is my service under stress, whether due to malicious attack, component failure, or focused loads? Are my users seeing failures due to factors outside my network?
Measurement tools and techniques are essential, but they are not sufficient in themselves to realize the potential benefits of the Next Generation Internet. What is also required is performance history, analysis, and visualization across a wide range of time scales and across different scopes. We use "performance monitoring" to denote the application of a coherent set of appropriate measurement techniques across the network and across time. The conceptual framework encompasses end-to-end external views (the user-level perception of service), detailed internal views used by an administrative domain, and broader-scale views of the principal behavioral components of interconnected networks or of the Internet at large. These views may have different visibility scopes -- a given network may not wish its internal details to be visible to its end users or to other networks -- but they need to be based on coherent metrics and techniques, with tools for integrating and correlating the views at different levels.

The Next Generation Internet, with its demand for high performance in all these dimensions, is the forerunner of a new information infrastructure that must operate in a regime unfamiliar both to traditional telecommunications (such as telephony) and to the existing, successfully commercialized Internet. This new regime comprises high performance, high penetration, and high criticality: it must combine new levels of transport performance with new models of service while providing telephony-like levels of reliability and consistency. The new service models are evolving to support cost-effective, high-quality service. Early steps are already being taken to offer some degree of service differentiation in the Internet and in Community of Interest Networks (whether overlaid on the Internet or developed as distinct extranets).
These steps, however, face severe limitations, including the lack of adequate techniques for monitoring the performance actually achieved by different grades of service. The impact on basic best-effort service of devoting resources to premium services is unknown, and it is hard to determine without a consistent historical archive of monitoring data.

From the perspective of the NGI, high-performance networking entails not only devoting transport resources to its network but also devising appropriate monitoring tools. Operational tools are required to support effective use of those resources; user-level and application-level tools are needed to verify quality of service and to detect and address lapses in performance. The NGI must go beyond the promise of a few grades of service (offering "better best-effort" transport) to consider many aspects of high performance as seen by users and member organizations: not just loss, delay, and throughput (essential as those factors are), but broader questions such as security, reliability, cost allocation, and responsiveness to user requests and problems.

The new performance levels sought by the NGI entail throughput, latency, and loss rates not achieved in the current Internet. End users require assurance that they are receiving these levels, which in turn requires a means to verify the performance actually delivered. Participating sites and organizations require a clear definition of service, independent means of confirming that the service is indeed being delivered, knowledge of the quality of offerings by potential or current providers, and automatic identification and tracking of problems (to pursue immediate and longer-term remedies effectively).
Providers and network operators need more detailed knowledge in several dimensions: the performance actually seen by their various classes of users; the contributions to this performance by components of the network (both those directly affecting transport, e.g., packet losses, and those involving essential higher-level services, such as DNS); the internal factors that contribute to current performance and future bottlenecks (such as route stability and router and server loads); the correlation of factors (such as the status of internal components when site X detects a problem, or the effect on routing when link Y is down); history (for trend analysis, engineering, and planning); and identification of external problems (whether in other transport providers, in external services that are critical to their users, or from unacceptable user behavior).

The evolution of the Internet from its origins in the research community to its present role is only the first step on a path along which it will rapidly become the nation's critical information infrastructure. To reach the next step, the Next Generation Internet will both sustain research in the needed performance monitoring techniques and depend directly on those techniques to support the performance required by its new applications. The NGI's exploration of these techniques will in turn provide essential tools for the networks of the 21st century.