Wednesday, April 6, 2011

What Did the 80th IETF Meeting Mean to HD Voice, HD Video, and Unified Communications?

IETF has been making Internet standards - called unpretentiously "Request For Comments" or "RFCs" but nevertheless working quite well – for exactly 25 years now. Happy birthday, IETF! May the next 25 be equally exciting! This anniversary also means that the Internet has matured, and I could feel it in the discussions at the 80th IETF meeting (IETF 80) last week. Prague was a beautiful venue for the event and the meeting hotel Hilton Prague provided excellent facilities.
I do not attend most IETF meeting – that would be quite difficult with my busy schedule and with three IETF meetings on three different continents happening every year. The last one I attended (IETF 74) was in San Francisco two years ago, and I was very excited to find out what had changed in 2 years. The very strong Polycom team included Mary Barnes, a long-time IETFer, Stephen Botzko, who also covers ITU-T for Polycom, Mark Duckworth, and me.

From the 133 or so session I counted in the program I attended the ones that were related to voice and video (over IP, of course, since it is all about the Internet), Unified Communications, and related technologies. I could recognize three main discussion topics: connecting IP communication islands, enabling UC applications on the network, and handling of multiple media streams.

Connecting IP Communications Islands

IETF recognizes that there are still islands of IP communication and the vision of IP networks replacing the Public Switched Telephone Network (PSTN) is far from being fulfilled. This led to the VIPR activities (VIPR stands for "Verification Involving PSTN Reachability") that leverage nothing else but the PSTN to allow more voice to flow over IP and never cross PSTN. Since IP communication islands do not trust each other, the VIPR idea is to use a basic phone call to verify the destination is what it claims to be. On more generic level, the mechanism can be used to extend trust established in one network (e.g. PSTN) to another network (e.g. IP) but the VIPR working group seems to be focusing on the narrow and practical application of connecting voice over IP islands without PSTN gateways. VIPR is very important to HD voice because it enables direct end-to-end HD voice connections. PSTN gateways on the other hand always take the voice quality down to "toll quality" (3.4kHz, G.711), even if handsets and conference servers support HD voice. VIPR can be used for video, and in fact is even more beneficial for video, since PSTN does not support video at all. Once PSTN is used to verify the destination, all subsequent calls between source and destination can be completed over IP. Again, the quality is not limited by any gateway, only by available bandwidth, and HD video can flow freely end-to-end.

Another interesting discussion that reminded us – who mostly live in the IP world - about the existence of PSTN was about Q.850 error codes generated by switches in the PSTN network. I remember the discussions about mapping these error codes to SIP error codes from previous IETF meetings but it turns out these mappings do not work well because some of the Q.850 have no equivalent in SIP and inventing new error codes only complicates SIP and confuses SIP servers. So the proposal on the table is to update RFC 3326 "The Reason Header Field in SIP" to transport the original Q.850 codes. Well, as they say, when mapping does not work, it is best to encapsulate and let network elements decide what to do with the information inside.

And then there was the discussion about lost Quality of Service (a.k.a. DSCP) settings when IP traffic passes through a service provider IP network. Since SPs do not really like any of their customers telling them how to prioritize traffic in their network, they basically resets the QoS values in the IP packets coming from the customer LANs to something they can use in the SP networks. The problem is that the destination IP LAN may want to honor the original QoS but the packets coming in from the SP do not have the real DSCP values. This leads to all sorts of creative ideas how to pass more granular information about the type of application end-to-end. One proposal in the MMUSIC working group was to update RFC 4598, so that the session description (Session Description Protocol, or SDP) has more detailed description of the application (for example telepresence, desktop video, personal video, web collaboration, etc.), so that the destination LAN knows what priority to assign to the traffic in that session.

Finally, there is always the topic of end-to-end security. Many obscure mechanisms defined in some RFC and being use in some application somewhere in the world turned out to have problems with the relatively new ICE firewall traversal mechanism. ICE stands for Interactive Connectivity Establishment, and is finally an RFC (RFC 5245 to be precise). As a result, there were numerous presentations at IETF 80 – mostly in the MMUSIC working group - about things that do not work with ICE: some fax scenarios, the relatively new DCCP protocol (which was all the rage back at IETF 74), simulcast streaming scenarios, and media aggregating scenarios. At this point, however, no one is considering changing ICE and the pretty universal response to such contributions was "sorry but we cannot help you". Another security issue was discussed in the XMPP group, where the consensus was that Transport Layer Security, or TLS, was not working at all on the interface between XMPP servers, that is, in XMPP federation scenarios. The proposal in discussion was to use DNS SEC to verify SRV records and define rules how authorizations and permissions are handled across domains. The simplest explanation is that if you have Gmail with 1000 domains and WebEx with another 1000 domains, trying to establish XMPP federations among all domains would drive the number of connections towards a million, which leads to scalability and performance issues. Instead, the folks in the XMPP group want to establish one connection between Google and WebEx and have all domains use it. There is also a certain time pressure to solve the secure XMPP federation issue because the US government is considering implementation of XMPP federation - if the security issues are fixed.

Enabling UC Applications

The second big topic at IETF 80 was standards around UC functions. While there are many different opinions where UC will happen – in a soft client, in the mobile phone, in the browser, etc. - at IETF 80, the attention was on "UC in the browser". RTC Web, or Real Time Communication on the World Wide Web, was feverishly discussed, starting with the Birds of Feather event on Tuesday (from the saying "Birds of a feather flock together", or an informal discussion group) and concluding with the first RTC Web working group meeting (although the group has not officially been established yet) on Friday. Web browsers today use incompatible plug-ins to communicate with each other and there is no interoperability across browser vendors. RTC Web's vision is a standard that allows real-time communication functionality (voice, video, and some associated data) to be exchanged across web browsers. The architecture is still fluid but it is clear that a group of companies, including Google and to a certain extent Skype, is interested in standardizing the media stream across browsers. As for using standard signaling (which basically means including SIP stack in the browser), there is no consensus, as browser vendors seem more comfortable with their own proprietary HTTP-based signaling, and promise gateways to SIP networks. I guess I understand why Google wants to do it all in the browser (this is the environment they can control more or less) but I am still puzzled by Skype's position – maybe they are willing to give up the Skype client for a strong play in the infrastructure. I am mostly interested in standards, so SIP in the browser sounds more interesting than proprietary protocols that then require gateways to connect to standards-based networks.

Another interesting discussion around UC took place in the XMPP working group, and was about interaction between XMPP and SIP clients. Since UC in its basic form is presence, IM, voice, and video, and since XMPP is used for instant messaging (IM) and presence while SIP is used for voice and video over IP, I would say this particular discussion was really about UC (outside the browser, though). The Nokia team presented a proposal for interworking between XMPP and SIP based on a dual-stack (SIXPAC). We at Polycom love dual-stack implementations – with most new video endpoints and even some high-end business media phones supporting SIP, H.323 and even XMPP while multipoint conferencing servers support many protocol stacks simultaneously. The proposed XMPP-SIP dual stack would make gateways between SIP and XMPP unnecessary, and would allow for richer user experience on the dual-stack client. The key benefit I see is if SIP URIs that are used to place a SIP call can be automatically resolved (by some server) into Jabber Identifiers (JIDs) used in XMPP, and vice versa. This would for example allow the user to check presence, start an IM session, and then seamlessly escalate it to voice and video.

Multiple Media Streams

The third big topic at IETF 80 was handling of multiple streams, or as the IETFers say, multiple m lines in the session description. Applications for multiple streams range from multi-codec telepresence systems to video walls in situation rooms to just sending multiple media streams for redundancy. The most important of all is of course the CLUE activity. CLUE stands for "ControLling mUltiple streams for tElepresence" (I know – picking random letters to create a catchy group name is an art form that I do not appreciate enough), and the CLUE working group had its first meeting at IETF 80. Mary Barnes moderated the session. With about 80 people in the room, the discussion covered symmetric and asymmetric point-to-point and multipoint use cases. Encouraging was that many people in the room had read the use case description and 20+ people agreed to review and contribute. Later the group spent quite a lot of time going over assumptions (about 8 in total) and requirements (about 12 originally but one was later dropped). It became clear that CLUE will only focus on handling multiple media streams and leave architecture, signaling, etc. to other IETF groups. Most questions were about definition of terms such as "stream", "source", "endpoint", "middle box", "asymmetric", "heterogeneous", and "synchronization". The CLUE group will continue discussions at a virtual meeting in May and possibly a face-to-face meeting in June. Polycom's Mary Barnes in a chair of the CLUE working group, and will keep me updated on the progress.

CLUE is very important because the video industry needs a consensus how to handle telepresence and other multi-stream applications. Since Polycom has announced support for the Telepresence Interoperability Protocol (TIP), I frequently get asked how CLUE relates to TIP. The short answer is "It does not". CLUE's charter is to develop standards for describing the spatial relationships between multiple audio and video streams and negotiation methods for SIP-based systems. This new work is completely separate from TIP and will support many use cases that TIP does not. The work was originally proposed by the IMTC Telepresence activity group (which Polycom also co-chairs), and was chartered by the IETF early this year. Participating companies include Polycom, Cisco, HP, Huawei, ZTE, and others.

Note that ITU-T Study Group 16 is also active in that area but has a different charter, which includes multiple-stream signaling for H.32x systems, the creation of services (like accessibility), establishing requirements for telepresence media coding, telepresence control systems, and media quality recommendations for telepresence systems. ITU-T is planning to harmonize the H.32x multi-stream signaling with CLUE but, more importantly, the above mentioned companies are participating in both IETF and ITU-T, which is the best way to make sure the standards do not contradict. As far as the future of TIP goes, it is an interim solution for vendors to interoperate with Cisco telepresence systems. We will have to see how long it lasts in the market place - certainly once solutions are deployed they tend to stay for a while. As usual, various vendors will provide their own migration paths to the standard, for example, Polycom will continue to support TIP as long as it is necessary for our customers, and gradually migrate telepresence systems to CLUE - once the work in the CLUE working group is completed and the standard is ready.

Back to the broader topic of handling multiple media streams. The topic came up in several presentations in the MMUSIC group. For example, Huawei proposed transmitting 3D video via multiple streams. Since the 3D image can be created through two 2D images – one for each eye - simulcast (i.e. sending Left and Right views in separate streams) can be used. Other options include frame packing (combine Left and Right views into a single stream) and 2D+auxiliary (synthesize Left and Right views from 2D video using auxiliary data such as depth and parallax maps). The draft introduces a new SDP attribute called "Parallax-Info" with parameters "position" and "parallax". While some IETFers expressed concerns about breaking the Real Time Protocol (RTP), there are interesting elements in the draft and I will keep following it.


IETF 80 was a very productive meeting and a great gathering of technical experts from around the world. They all brought new ideas and very different areas of expertise: networking, voice, video, web, mobile, etc. The meeting provided an excellent opportunity for discussions in and outside the conference rooms. A lot of topics that I know from IETF 74 progressed quite well but did not disappear, just led to additional areas that require standardization.

My takeaway from the meeting is that standardization work is like a marathon – it requires patience and persistence to get to the final line. The Prague Marathon on Saturday was therefore a fitting metaphor and a great way to conclude a very productive, well-attended, and amazingly versatile IETF 80. (IETF participants were not required to run the marathon!)

1 comment:

  1. Fantastic summary. Thank you!

    I am familiar with several of the underlying concepts in here, but VIPR was new to me, and I find it particularly interesting.