Research Directions for the Next Generation Internet:
Emergent Knowledge on the Internet

Marianna Kantor, Deputy Group Leader
Computer Research and Applications Group (CIC-3)

Norman Johnson, Technical Staff Member
Fluid Dynamics (T-3)

Los Alamos National Laboratory, MS B265
Los Alamos, NM, 87545
email: nlj@lanl.gov, mkantor@lanl.gov
phone: (505) 667-9094, (505) 665-8310
fax: (505) 665-5220

Abstract:
---------
A central future function of the Internet will be to process information and extract knowledge based on the interaction between the user and the underlying data infrastructure. We argue that a knowledge-capturing and decision-making capability will evolve that combines the advantages of human understanding of complex information with the ability of computer networks to process and relate vast amounts of data. The current information processing methodologies are neither robust nor sophisticated enough to support the complexity of today's business, research, and government activities. There is a clear and compelling societal need for effective and efficient information processing that supersedes the traditional methods. We claim that the "Emergent Knowledge" capability of dynamic knowledge representation will address fundamental societal needs in the areas of national security, scientific research, education, economics, and decision and policy making. Given the importance of this capability, a significant opportunity arises for the Next Generation Internet effort to develop an information processing methodology and establish appropriate standards for intelligent information processing on the Internet while contributing to pre-competitive research activities.

Perspective on Communication and Computing:
-------------------------------------------
Recently, we have witnessed a global shift of paradigms in computer and network architectures.
Fifteen years ago computation and communication were by and large separate fields, with computation being developed around centralized architectures. The following years witnessed a change from centralized to distributed processing, with dramatic consequences for the industrial landscape: companies once all-powerful and associated with centralized computing had to review their strategies to survive in a world where local area networks were fast replacing mainframes. Predominant world computing resources are presently organized around networks of computers. Even though distributed, these architectures are still locally centralized, with a clear distinction between communication and computation.

The explosion of the Internet is propelling another dramatic change, where computation is done on massively distributed networks, blurring the distinction between processing, storage, and communications. Data is now distributed over vast, heterogeneous, and remote sites. Software is developed, acquired, and updated from diverse origins and by independent agents. Computations by these interacting software agents are increasingly focused on relating distributed data, rather than on the traditional processing of data locally. Communication is now an essential aspect of processing. This new highly networked environment is causing a technological phase transition, demanding new tools, new infrastructures, new methods, and new theories. The next major paradigm shift will be the integration of users into the dynamic data processing.

Addressing a Societal, Industrial and Research Need:
----------------------------------------------------
Despite the increased access to data and dramatic increases in computing power, there are large classes of problems that are not currently tractable by existing human or computing resources, but are of significant societal importance.
These classes of problems have the dual components of (1) data that is of a diverse and complex nature and (2) excessively large amounts of data. Humans are excellent processors of complex and contradictory data, easily following intuitive leaps to desirable solutions, but only for limited amounts of data. Conversely, computers easily process large amounts of data, but only for homogeneous data of low complexity. Unfortunately, in our modern complex world, most of the problems confronting society encompass both complexity and volume. For example, socio-technical problems require a variety of experts who have differing approaches to the problem definition and have difficulty establishing even a common vocabulary.

For human decision making, current methodologies of analysis (the ubiquitous committees, for example) do not make best use of diversity of expertise or knowledge, because these in and of themselves hamper the collective decision making process in the attempt to form a consensus. Collective human decision making is typically cumbersome and slow, and often fails to obtain critical conclusions that are obvious in retrospect.

Computational tools are limited by the diversity of the data. For example, the majority of data mining techniques in knowledge discovery are based on regressions, thus lacking the ability to analyze heterogeneous and non-numerical information. Similarly, expert systems require representation of knowledge as simple rules. These systems have limited flexibility for learning or adaptation once the rules are established, and their complexity is limited by the understanding of the "expert" creating the system.

Evolution of Emergent Knowledge Systems:
----------------------------------------
The above observations lead directly to an alternative analysis and decision making capability that combines the strengths of both human and computer networks.
The traditional process of knowledge association and development through spoken communication and printed media is being supplemented, and, in some areas, replaced by the linking of the extensive "knowledge" on the Internet or within intranets. A premier example of a highly interconnected data resource is the now famous Los Alamos Archives, a comprehensive repository of publications in Physics with about 300,000 users per week within the US. Despite the simple nature of the internal links within the Archive, it still has significant usefulness within the Physics community.

One can envision a more sophisticated linking structure, overlaying the same primary data, that reflects the usage of the data by experts. Such a secondary structure would capture dynamic trends of usage and begin to create an "Emergent Knowledge" structure that could serve a variety of purposes: (1) improved searching capability by dynamically linking prior search concepts with data, (2) a better collective knowledge resource by associating related concepts for improved use by non-experts, (3) identifying new trends of interest by capturing recent activity in user access, and (4) improving cross-disciplinary communication by dynamically associating related concepts.

In a similar manner, once the Emergent Knowledge capability is developed, corporate, research, and government efforts can use the technique for advice and decision making. Here, we envision a growth of primary data, as experts add to the base knowledge, with a dynamically changing secondary knowledge structure that reflects the collective understanding and trends of the community within the intranet. The Emergent Knowledge structure can serve either as an aid to capturing a current understanding of the issues or as a mechanism to determine trends in markets or research.

An Example:
-----------
A simple experiment was done by a research group in Belgium that illustrates the concept of Emergent Knowledge (http://pespmc1.vub.ac.be/papers/SelfOrganWWW.html). The group created 150 websites, each identified by a word from the 150 most common words in English. Under the word, a list of 10 randomly chosen words from the list of 150 words was displayed. Upon entering one of the 150 sites, the user was asked to pick the word from the list that is most closely associated with the header word. Once a word was chosen, the order of the list was recalculated based on the frequency of selection, the user was taken to a new site corresponding to the selected word, and the process was repeated. The researchers found that the lists stabilized to a fixed order after about 4000 selections in a site.

The resulting ordered lists determined a common semantics despite the heterogeneity of users. This simple ordering task would be easy for a single individual, but the result would be of little utility because of semantic differences between individuals. Conversely, the task would be difficult for a committee of experts to achieve, although the result would be more useful. The network solution achieved a result representing collective knowledge, but with minimal instruction and effort for the collective group of individuals. This example captures the essence of developing an Emergent Knowledge system that combines the advantages of both human and computer networks to quickly solve a syntactically complex problem. From this example, one can imagine a host of previously challenging, if not intractable, problems that could be addressed once the methodology is developed.

Reason for NGI Involvement:
---------------------------
Emergent Knowledge System formation on the Internet and intranets will evolve in capability with or without a coordinated effort to understand or develop the field, as a consequence of the driving forces of the commercialization and socialization of the Internet.
But without the application of science to the process, we miss the opportunity to participate as scientists in its formation and development and to facilitate the process for the betterment of our society. The commercialization of the capability spans such a broad diversity of applications that pre-competitive research support would allow the maximum application of the methodology, without control by a single sector of the industry. Furthermore, the application of the Internet is sufficiently broad that standards are needed for ease of implementation across the Internet community. For example, we envision fundamental changes in browser technology as a result of the concepts presented here. Support within the NGI framework would enable a proactive research effort that would contribute to both basic research and broad commercialization, with the ultimate contribution to the Nation and society.
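
As an illustration of the experiment described under "An Example", the list-reordering mechanism can be sketched in a few lines of Python. This is only a minimal sketch of the steps as summarized in this paper, not the Belgian group's actual implementation. The names WordSite, build_sites, and browse are hypothetical, as is the tie-breaking rule (words with equal counts keep their previous relative order) and the stand-in for the human user's choice.

```python
import random
from collections import Counter


class WordSite:
    """One 'site': a header word and a list of candidate associated words."""

    def __init__(self, header, candidates):
        self.header = header
        self.candidates = list(candidates)
        self.counts = Counter()  # how often each candidate has been selected

    def select(self, word):
        """Record a user's selection and reorder the list by selection frequency."""
        if word not in self.candidates:
            raise ValueError(f"{word!r} is not listed under {self.header!r}")
        self.counts[word] += 1
        # list.sort is stable, so ties keep their previous relative order
        # (an assumption; the paper does not say how ties were resolved).
        self.candidates.sort(key=lambda w: -self.counts[w])
        return word


def build_sites(vocabulary, list_size, rng):
    """Create one site per word, each showing list_size randomly chosen words."""
    return {
        header: WordSite(header, rng.sample(vocabulary, list_size))
        for header in vocabulary
    }


def browse(sites, start, choose, steps):
    """Follow selections from site to site; choose stands in for the human user."""
    current = start
    for _ in range(steps):
        site = sites[current]
        current = site.select(choose(site.header, site.candidates))
    return current


if __name__ == "__main__":
    rng = random.Random(0)
    # A toy vocabulary; the experiment used the 150 most common English words
    # and lists of 10 words per site.
    words = ["time", "year", "day", "work", "life", "hand", "world", "home"]
    sites = build_sites(words, list_size=3, rng=rng)
    # Hypothetical user model: picks a listed word at random.
    browse(sites, "time", lambda header, listed: rng.choice(listed), steps=200)
    for site in sites.values():
        print(site.header, "->", site.candidates)
```

With a random chooser, as in the demo above, the resulting orders carry no meaning; the point of the experiment is that the choices of real users, aggregated through the selection counts, drive each list toward a stable order.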