Contents :
Distributing Streaming Media Content Using Cooperative Networking Venkata N. Padmanabhan Helen J. Wang Philip A. Chou Microsoft Research Kunwadee Sripanidkulchai Carnegie Mellon University April 2002 Technical Report MSR-TR-2002-37 Microsoft Research Microsoft Corporation One Microsoft Way Redmond WA 98052 Distributing Streaming Media Content Using Cooperative Networking Venkata N. Padmanabhan Helen J. Wang Philip A. Chou Microsoft Research padmanab helenw pachou @microsoft.com Kunwadee Sripanidkulchai Carnegie Mellon University kunwadee@cs.cmu.edu ABSTRACT In this paper we discuss the problem of distributing streaming media content both live and on-demand to a large number of hosts in a scalable way. Our work is set in the context of the traditional client-server framework. Specifically we consider the problem that arises when the server is overwhelmed by the volume of requests from its clients. As a solution we propose Cooperative Networking (CoopNet) where clients cooperate to distribute content thereby alleviating the load on the server. We discuss the proposed solution in some detail pointing out the interesting research issues that arise and present a preliminary evaluation using traces gathered at a busy news site during the flash crowd that occurred on September 11 2001. 1. INTRODUCTION There has been much work in recent years on the topic of content distribution. This work has largely fallen into two categories: (a) infrastructure-based content distribution and (b) peer-to-peer content distribution. An infrastructure-based content distribution network (CDN) (e.g. Akamai) complements the server in the traditional client-server framework. It employs a dedicated set of machines to store and distribute content to clients on behalf of the server. The dedicated infrastructure including machines and networks links is engineered to provide a high level of performance guarantees. On the other hand peer-to-peer content distribution relies on clients to host content and distribute it to other clients. The P2P model replaces rather than complements the clientserver framework. Typically there is no central server that holds content. Examples of P2P content distribution systems include Napster and Gnutella. In this paper we discuss Cooperative Networking (CoopNet) an approach to content distribution that combines aspects of infrastructure-based and peer-to-peer content distribution. Our focus is on distributing streaming media content both live and on-demand. Like infrastructure-based content distribution we seek to complement rather than replace the traditional client-server framework. Specifically we consider the problem that arises when the server is overwhelmed by the volume of requests from its clients. For instance a news site may be overwhelmed because of a large "flash crowd" caused This is an expanded version of a paper appearing in ACM NOSSDAV 2002 17 . For more information please visit the CoopNet project Web page at http://www.research.microsoft.com/ padmanab/projects/CoopNet/. by an event of widespread interest such as a sports event or an earthquake. A home computer that is webcasting a birthday party live to friends and family might be overwhelmed even by a small number of clients because of its limited network bandwidth. In fact the large volume of data and the relatively high bandwidth requirement associated with streaming media content increases the likelihood of the server being overwhelmed in general. Server overload can cause significant degradation in the quality of the streaming media content received by clients. CoopNet addresses this problem by having clients cooperate with each other to distribute content thereby alleviating the load on the server. In the case of on-demand content clients cache audio/video clips that they viewed in the recent past. During a period of overload the server redirects new clients to other clients that had downloaded the content previously. In the case of live streaming the clients form a distribution tree rooted at the server. Clients that receive streaming content from the server in turn stream it out to one or more of their peers. The key distinction between CoopNet and pure P2P systems like Gnutella is that CoopNet complements rather than replaces the client-server framework of the Web. There is still a server that hosts content and (directly) serves it to clients. CoopNet is only invoked when the server is unable to handle the load imposed by clients. The presence of a central server simplifies the task of locating content. In contrast searching for content in a pure P2P system entails an often more expensive distributed search 21 22 25 . Individual clients may only participate in CoopNet for a short period of time say just a few minutes which is in contrast to the much longer participation times reported for systems such as Napster and Gnutella 24 . For instance in the case of live streaming a client may tune in for a few minutes during which time it may be willing to help distribute the content. Once the client tunes out it may no longer be willing to participate in CoopNet. This calls for a content distribution mechanism that is robust against interruptions caused by the frequent joining and leaving of individual peers. To address this problem CoopNet employs multiple description coding (MDC). The streaming media content whether live or on-demand is divided into multiple sub-streams using MDC and each sub-stream is delivered to the requesting client via a different peer. This improves robustness and also helps balance load amongst peers. The rest of this paper is organized as follows. In Section 2 we discuss related work. In Section 3 we discuss the operation of CoopNet for live and on-demand content and present an outline of multiple description coding. In Section 4 we use traces from the flash crowd that occurred on September 11 2001 to evaluate how well CoopNet would have performed for live and on-demand content. We present our conclusions in Section 5. tion 3.3) which enables much quicker repairs. It is hard for us to do a specific comparison with Allcast and vTrails in the absence of published information. 3. COOPERATIVE NETWORKING (COOPNET) 2. RELATED WORK As noted in Section 1 two areas of related work are infrastructure-based CDNs and peer-to-peer systems. Infrastructurebased CDNs such as Akamai employ a dedicated network of thousands of machines in distributed locations often with leased links inter-connecting them to serve content on behalf of servers. When a client request arrives (be it for streaming media or other content) the CDN redirects the client to a nearby replica server. The main limitation of infrastructurebased CDNs is that their cost and scale is only appropriate for large commercial sites such as CNN and MSNBC. A second issue is that it is unclear how such a CDN would fare in the face of a large flash crowd that causes a simultaneous spike in traffic at many or all of the sites hosted by the CDN. Peer-to-peer systems such as Napster and Gnutella depend on little or no dedicated infrastructure1 . There is however the implicit assumption that the individual peers participate for a significant length of time (for instance 24 reports a median session duration of about an hour both for Napster and for Gnutella). In contrast CoopNet seeks to operate in a highly dynamic situation such as a flash crowd where an individual client may only participate for a few minutes. The disruption that this might cause is especially challenging for streaming media compared to static file downloads which is the primary focus of Napster and Gnutella. The short lifetime of the individual nodes poses a challenge to distributed search schemes such as CAN 21 Chord 25 Pastry 22 and Tapestry 30 . Work on application-level multicast (e.g. ALMI 18 End System Multicast 3 Scattercast 2 ) is directly relevant to the live streaming aspect of CoopNet. CoopNet could benefit from the efficient tree construction algorithms developed in previous work. Our focus here however is on using real traces to evaluate the efficacy of CoopNet. Thus we view our work as complementing existing work on application-level multicast. We also consider the on-demand streaming case which does not quite fit in the application-level multicast framework. Existing work on distributed streaming (e.g. 13 ) is also directly relevant to CoopNet. A key distinction of our work is that we focus on the distruption and packet loss caused by node arrivals and departures which is likely to be significant in a highly dynamic environment. Using traces from the September 11 flash crowd we are able to evaluate this issue in a realistic setting. Systems such as SpreadIt 5 Allcast 32 and vTrails 34 are perhaps closest in spirit to our work. Like CoopNet they attempt to deliver streaming content using a peer-to-peer approach. SpreadIt differs from CoopNet is a couple of ways. First it uses only a single distribution tree and hence is vulnerable to disruptions due to node departures. Second the tree management algorithm is such that the nodes orphaned by the departure of their parent might be bounced around between multiple potential parents before settling on a new parent. In contrast CoopNet uses a centralized protocol (Sec1 Napster has central servers but these only hold indices not content. In this section we present the details of CoopNet as it applies to the distribution of streaming media content. We first consider the live streaming case where we discuss and analyze multiple description coding (MDC) and distribution tree management. We then turn to the on-demand streaming case. 3.1 Live Streaming Live streaming refers to the synchronized distribution of streaming media content to one or more clients. (The content itself may either be truly live or pre-recorded.) Therefore multicast is a natural paradigm for distributing such content. Since IP multicast is not widely deployed especially at the inter-domain level CoopNet uses application-level multicast instead. A distribution tree rooted at the server is formed with clients as its members. Each node in the tree transmits the received stream to each of its children using unicast. The outdegree of each node is constrained by the available outgoing bandwidth at the node. In general the degree of the root node (i.e. the server) is likely to be much larger than that of the other nodes because the server is likely to have a much higher bandwidth than the individual client nodes. One issue is that the peers in CoopNet are far from being dedicated servers. Their ability and willingness to participate in CoopNet may fluctuate with time. For instance a client's participation may terminate when the user tunes out of the live stream. In fact even while the user is tuned in to the live stream CoopNet-related activity on his/her machine may be scaled down or stopped immediately when the user initiates other unrelated network communication. Machines can also crash or become disconnected from the network. With a single distribution tree the departure or reduced availability of a node has a severe impact on its descendants. The descendants may receive no stream at all until the tree has been repaired. This is especially problematic because node arrivals and departures may be quite frequent in flash crowd situations. To reduce the disruption caused by node departures we advocate having multiple distribution trees spanning a given set of nodes and transmitting a different MDC description down each tree. This would diminish the chances of a node losing the entire stream (even temporarily) because of the departure of another node. We discuss this further in Section 3.2. The distribution trees need to be constantly maintained as new clients join and existing ones leave. In Section 3.3 we advocate a centralized approach to tree management which exploits the availability of a resourceful server node coupled with client cooperation to greatly simplify the problem. 3.2 Multiple Description Coding (MDC) Multiple description coding is a method of encoding the audio and/or video signal into M 1 separate streams or descriptions such that any subset of these descriptions can be received and decoded into a signal with distortion (with respect to the original signal) commensurate with the number of descriptions received that is the more descriptions received the lower the distortion (i.e. the higher the quality) of base layer description (stream) 1 ... ... Pkt 1 Pkt 2 ... Pkt 1 Pkt 2 ... Pkt 1 Pkt 2 ... ... ... encoder decoder encoder decoder description (stream) 2 description (stream) M ... ... Pkt M Pkt M Pkt M ... ... GOF i-1 GOF i GOF i+1 Figure 1: (a) Multiple description coding. (b) Layered coding. D(R0 ) Distortion D(R1 ) D(R2 ) D(RM ) R1 R2 R3 RM-1 ... .. RM ... Bits Figure 3: Construction of MDC streams from packetized GOFs. sulting in distortion D(Rm ) where 0 R0 R1 RM and consequently D(R0 ) D(R1 ) D(RM ). Thus all M packets are equally important only the number of received packets determines the reconstruction quality of the GOF. Further the expected distortion is M m 0 p(m)D (Rm ) where p(m) is the probability that m out of M packets are received. Given p(m) and the operational distortion-rate function D(R) this expected distortion can be minimized using a simple procedure that adjusts the rate points R1 . . . RM subject to a constraint on the packet length 4 20 11 12 . By sending the mth packet in each GOF to the mth description the entire audio and/or video signal is represented by M descriptions where each description is a sequence of packets transmitted at rate 1 packet per GOF as illustrated in Figure 3. It is a very simple matter to generate these optimized M descriptions on the fly assuming that the signal is already coded with a layered codec. R0 Bit stream ... ... ... ... ... RS (M 1) (M 2) (M 3) ... Packet 1 Packet 2 Packet 3 Packet 4 Packet M (M M ) code Figure 2: Priority encoded packetization of a group of frames (GOF). Any m out of M packets can recover the initial Rm bits of the bit stream for the GOF. the reconstructed signal. This differs from layered coding2 in that in MDC every subset of descriptions must be decodable whereas in layered coding only a nested sequence of subsets must be decodable as illustrated in Figure 1. For this extra flexibility MDC incurs a modest performance penalty relative to layered coding which in turn incurs a slight performance penalty relative to single description coding. A simple MDC system for video might be the following. The original video picture sequence is demultiplexed into M subsequences by putting every M th picture m + iM i 0 1 2 . . . into the mth subsequence m 1 . . . M . The subsequences are independently encoded to form the M descriptions. Any subset of these M descriptions can be decoded and the pictures can be remultiplexed to reconstruct a video sequence whose frame rate is essentially proportional to the number of descriptions received. More sophisticated forms of multiple description coding have been investigated over the years some highlights are 26 27 28 6 . For an overview see 7 . A particularly efficient and practical system is based on layered audio or video coding 19 10 Reed-Solomon coding 29 priority encoded transmission 1 and optimized bit allocation 4 20 11 12 . In such a system the audio and/or video signal is partitioned into groups of frames (GOFs) each group having duration T 1 second or so. Each GOF is then independently encoded error protected and packetized into M packets as shown in Figure 2. If any m M packets are received then the initial Rm bits of the bit stream for the GOF can be recovered re2 Layered coding is also known as embedded progressive or scalable coding. 3.2.1 CoopNet Analysis: Quality During Multiple Failures Let us consider how multiple description coding achieves robustness in CoopNet. Suppose that the server encodes its AV signal into M descriptions as described above and transmits the descriptions down M different distribution trees each rooted at the server. Each of the distribution trees conveys its description to all N destination hosts. Ordinarily all N destination hosts receive all M descriptions. However if any of the destination hosts fail (or leave the session) then all of the hosts that are descendents of the failed hosts in the mth distribution tree will not receive the mth description. The number of descriptions that a particular host will receive depends on its location in each tree relative to the failed hosts. Specifically a host n will receive the mth description if none of its ancestors in the mth tree fail. This happens with probability (1 - )An where An is the number of the host's ancestors and is the probability that a host fails (assuming independent failures). If hosts are placed at random sites in each tree then the unconditional probability that any given host will receive its mth description is the average An N (1/N ) N across all hosts in the tree. Thus n 1 (1 - ) the number of descriptions that a particular host will receive is randomly distributed according to a Binomial(M N ) distrim bution i.e. p(m) M N (1 - N )M -m . Hence for large M m the fraction of descriptions received is approximately Gaussian with mean N and variance N (1 - N ). This can be seen in Figure 4 which shows (in bars) the distribution p(m) for various values of M 2 4 8 16 and N 10 1000 100000. In the figure to compute N we assumed balanced binary trees with N nodes and probability of host failure 1%. Note that as N grows large performance slowly degrades because the depth of the tree (and hence 1 - N ) grows like log2 N . The distribution p(m) can be used to optimize the multiple description code by choosing the rate points R0 R1 . . . RM 40 M 2 N 10 40 M 4 N 10 40 M 8 N 10 40 M 16 N 10 40 M 2 N 10 40 M 4 N 10 40 M 8 N 10 40 M 16 N 10 0 40 0 M 2 N 1000 2 0 40 0 M 4 N 1000 4 0 40 0 M 8 N 1000 8 0 40 0 M 16 N 1000 16 0 40 0 M 2 N 100 2 0 40 0 M 4 N 100 4 0 40 0 M 8 N 100 8 0 40 0 M 16 N 100 16 0 40 0 2 M 2 N 100000 0 40 0 0 4 0 40 M 4 M 8 N 100000 N 100000 8 0 40 0 16 M 16 N 100000 0 40 0 M 2 N 1000 2 0 40 0 M 4 N 1000 4 0 40 0 M 8 N 1000 8 0 40 0 M 16 N 1000 16 0 0 2 0 0 4 0 0 8 0 0 16 0 0 2 0 0 4 0 0 8 0 0 16 Figure 4: SNR in dB (line) and probabililty distribution (bars) as a function of the number of descriptions received when the probability of host failure is 1%. to minimize the expected distortion M m 0 p(m)D (Rm ) subject to a packet length constraint. Figure 4 shows (in lines) the quality associated with each p(m) measured as SNR in dB i.e. 10 log10 ( 2 /D(Rm )) as a function of the number of received descriptions m 0 1 . . . M . In the figure to compute the rate points R0 R1 . . . RM we assumed an operational distortion-rate function D(R) 2 2-2R which is asymptotically typical for any source with variance 2 where R is expressed in bits per symbol and we assumed a packet length constraint given as R 8. Figure 5: SNR in dB (line) and probabililty distribution (bars) as a function of the number of descriptions received during the failure of a single host. decreases. However it is also the case that the expected number of hosts that receive fewer than 50% of the descriptions decreases resulting in an increase in quality on average. Further as N increases for fixed M performance becomes nearly perfect since N 1 - log2 N/N which goes to 1. However for large N it becomes increasingly difficult to repair the trees before a second failure occurs. 3.2.3 Further Analyses 3.2.2 CoopNet Analysis: Quality During Single Failure The time it takes to repair the trees is called the repair time. If of the hosts fail during each repair time then the average length of time that a host participates in the session is 1/ repair times. When the number of hosts is small compared to 1/ then many repair times may pass between single failures. In this case most of the time all hosts receive all descriptions and quality is excellent. Degradation occurs only when a single host fails. Thus it may be preferable to optimize the MDC system by minimizing the distortion expected during the repair interval in which the single host fails rather than minimizing the expected distortion over all time. To analyze this case suppose that a single host fails randomly. A remaining host n will not receive the mth description if the failed host is an ancestor of host n in the mth tree. This happens with probability An /(N - 1) where An is the number of ancestors of host n. Since hosts are place at random sites in each tree the unconditional probability that any given host will receive its mth description is the average N (1/N ) N n 1 (1 - An /(N - 1)). Thus the number of descriptions that a particular host will receive is randomly distributed according to a Binomial(M N ) distribution. Equivalently the expected number of hosts that receive m descriptions during the failure is (N - 1)p(m) where m p(m) M N (1 - N )M -m . This distribution can be used to m optimize the multiple description code for the failure of a single host. Figure 5 illustrates this distribution and the corresponding optimized quality as a function of the number of descriptions received for M 2 4 8 16 and N 10 100 1000. Note that as M increases for fixed N the distribution again becomes Gaussian. One implication of this is that the expected number of hosts that receive 100% of the descriptions These same analyses can be extended to d-ary trees. It is not difficult to see that for d 2 a d-ary trees with N log2 d N nodes has the same height and hence the same performance as a binary tree with only N nodes. Thus when each node has a large out-degree i.e. when each host has a large uplink bandwidth much larger populations can be handled. Interestingly the analysis also applies when d 1. So if each host can devote only as much uplink bandwidth as its downlink video bandwidth (which is typically the case for modem users) then the descriptions can still be distributed peer-to-peer by arranging the hosts in a chain like a bucket brigade. It can be shown that when the order of the hosts in the chain is random and independent for each description then for a single failure the number of hosts receiving m out of M descriptions is binomially distributed with parameters M and N where N (N + 1)/2N . Although this holds for any N it is most suitable for smaller N . For larger N it may not be possible to repair the chains before other failures occur. In fact as N goes to infinity the probability that any host receives any descriptions goes to zero. In this section we have proposed optimizing the MDC system to the unconditional distribution p(m) derived by averaging over trees and hosts. Given any set of trees however the distribution of the number of received descriptions varies widely across the set of hosts as a function of their upstream connectivity. By optimizing the MDC system to the unconditional distribution p(m) we are not minimizing the expected distortion for any given host but rather minimizing the sum of the expected distortions across all hosts or equivalently minimizing the expected sum of the distortions over all hosts. 3.3 Tree Management We now discuss the problem of constructing and maintaining the distribution trees in the face of frequent node arrivals and departures. There are many (sometimes conflicting) goals
- Rating :
- Search Skype/AIM!
- File Type : .pdf
- Page size : 185 x 240
- Length : 13 pages
- File Size: 579.1 kb
- Virus Tested : No
- Verified : 2013-06-06
- Source: www.research.microsoft.com
INFO HASH : 61f1c1065c1cbb8a23425baaaf4677bf5aae3ccb
blog comments powered by Disqus

Download now