David M. Meyer Voice: +1.541.346.1747 Director, Advanced Network Technology Center SkyPager: +1.888.691.7491 Office of University Computing Cellular: +1.541.954.1103 Computing Center FAX: +1.541.346.4397 University of Oregon Internet: meyer@antc.uoregon.edu 1225 Kincaid URL: www.antc.uoregon.edu Eugene, OR 97403 PGP keyID: 0xF61DF0B1 PGP fingerprint: 94 27 13 DF 97 8E 4B 4B 84 02 82 81 73 EA 37 00 -------- Scalable Approaches to IP Multicast David Meyer University of Oregon Advanced Network Technology Center Many common Internet applications exhibit point-to-multipoint or "multicast" behavior. The world wide web's data distribution and caching models, soft real-time applications such as video conferencing and application sharing, and USENET News (NNTP) are examples of common applications which assume a point-to-multipoint distribution model. These applications have historically relied on unicast technologies to implement point-to-multipoint or multipoint-to-multipoint communication topologies. In particular, multipoint communication in the Internet had been and is still is (in many cases) implemented by replicating unicast flows, one for each receiver. Replicating unicast flows is clearly neither efficient nor scalable; the problem is exacerbated in the for inter-domain environment, where resources are particularly scarce. Over the past few years, however, efficient IP layer multicasting has been recognized as one of the essential architectural features required in an Internet that can scale to very large sizes. In recent years we have seen the first multicast routing and forwarding protocols designed and deployed on the Internet [DVMRP, PIMARCH, CBT]. While these protocols have been relatively successful in small scale deployment [MBONE], However, these protocols have exhibited various deficiencies when scaling to the Internet sizes while still providing adequate policy control for network service providers. The seminal work on multicast routing is contained in Steve Deering's thesis [DEERING89]. Deering outlined a mechanism for building shortest path multicast distribution trees (SPT) using Reverse Path Forwarding (RPF). For each (S,G) pair, the method builds a distribution tree rooted at the source such that each receiver is on the shortest path back to the source (the notation (S,G) represents a source,group pair). When the path between a sender and receiver is symmetrical, RPF computes the shortest path tree. The method is data-driven, data is flooded down all the branches of the distribution tree. Nodes that have no downstream receivers send "prune" packets upstream (toward the source) to prune branches of the tree which have no receivers. This protocol has become known as as Dense-Mode [PIM-DM], and is most useful when group members are densely clustered in some part of the topology. To address the case of sparsely distributed group members, Minimal Shared-Tree (MST) distribution algorithms have been introduced. Shared distribution trees have their origin as approximations of Steiner Minimal Trees, and use a variation on Center-based trees [WALL80] as their basis. Current shared tree multicast distribution algorithms include Core Based Trees [CBT] and Protocol Independent Multicasting Sparse Mode [PIM-SM]. The root of the shared tree is often called a Core or Rendezvous Point (RP). Shortest Path and Shared Tree algorithms represent trade-offs [WEI93] at three fundamental points in the design space: - Delay Shared tree algorithms have worse delays for large groups, since no known RP placement can produce shortest paths. Shared tree algorithms also don't handle dynamic group membership as well as shortest path tree, since optimal RP placement is a function of group membership distribution. In summary, SPT multicast routing algorithms like DVMRP or MOSPF [MOSPF] have the worst case delay is bounded by the round-trip time (RTT) from the receiver to the sender, whereas shared tree algorithms like PIM-SM and CBT have worst case delay bounded by twice the RTT from receiver to RP/core (assuming symmetrical unicast routing from sender to receiver). - Traffic Concentration Traffic concentration is a well known problem for shared tree protocols. For some important classes of topologies, shared tree and source trees yield the same delay characteristics. - State information Shortest path trees require much more state information. Shared tree approaches only require a single tree, used by all, while the shortest path trees are relative to each site as source. It is possible that shortest path trees could require a (S,G) pair for every active sender or receiver in the Internet. Today's multi-provider Internet reveals a fourth issue in the traditional design space: the ability to express and implement inter-provider policies. However, unlike current inter-domain unicast routing protocols (which have a rich and well developed policy model), neither of the two classes of algorithms described are adaptable in a straight forward way to the policy oriented multi-provider environment found in todays Internet. A simple example illustrates the problem: Consider three providers, A, B, and C, that have connections to a shared-media exchange point. Assume that connectivity is non-transitive due to some policy (the common case, since bi-lateral agreements are a very common form of peering agreement). That is, A and B are peers, B and C are peers, but A and C are not peers. Now, consider a source S covered by a prefix P, where P belongs to a customer of A (i.e., P is advertised by A). Now, multicast packets forwarded by A's border router will be correctly accepted by B's border router, since it sees the RPF interface for P to be the shared-media of the exchange. Likewise, C's border router will reject the packets forwarded by A's border router because, by definition, C's border router does not have A's routes through its interface on the exchange (so packets sourced "inside" A fail the RPF check in C's border router). In this example above, RPF is a powerful enough mechanism to inform C that it should not accept packets sourced in P from A over the exchange. However, consider the common case in which P multi-homed to both A and B. C now sees a route for P from B through its interface on the exchange. Without some form of multi-provider cooperation and/or packet filtering (or a more sophisticated RPF mechanism), C could accept multicast packets sourced by S from A across the shared media exchange, even though A and C have not entered into any agreement on the exchange. The situation described above is caused by the overloading of the semantics of unicast route (as described above), and the reliance on the RPF check as a policy mechanism. This presentation will focus on the multicast capabilities of today's Internet, the short term prospectus for multicast, and what kind of multicast capabilities we can expect a next-generation Internet such as I2. References ========== [CBT] A. Ballardie, et. al., "Core Based Trees (CBT) Multicast -- Protocol Specification --", draft-ietf-idmr-cbt-spec-06.txt, September, 1996. [DEERING89] Steve Deering, "Scalable Multicast Routing Protocol", Ph.D. Thesis, Stanford University, 1989. [DVMRP] T. Pusateri, "Distance Vector Multicast Routing Protocol", draft-ietf-idmr-dvmrp-v3-03, September, 1996. [MBONE] "The Multicast FAQ", http://www.mbone.com/mbone/mbone.faq.html [MOSPF] J. Moy, "Multicast Extensions to OSPF", RFC 1584, March, 1994. [PIMARCH] D. Estrin, et. al., "Protocol Independent Multicast Sparse Mode (PIM-SM): Motivation and Architecture", draft-ietf-idmr-pim-arch-04.ps , October, 1996. [PIM-SM] D. Estrin, et. al., "Protocol Independent Multicast Sparse Mode (PIM-SM): Protocol Specification", draft-ietf-idmr-PIM-SM-spec-09.ps, October, 1996. [WALL80] David Wall, "Mechanisms for Broadcast and Selective Broadcast", Ph.D Thesis, Stanford University, 1980. [WEI93] Liming Wei and Deborah Estrin, "A Comparison of Multicast Trees and Algorithms", USC Technical Report 93-560.