Video Networker: March 2010

Friday, March 19, 2010

3rd HD Communications Summit Discusses HD Voice in Fixed and Wireless Networks

I was invited to represent Polycom and speak at the third HD Communications Summit, which took place in Paris, France on February 12, 2010. The first Summit (May 2009) and the second (September 2009) were both held in New York City, and Polycom’s CTO Jeff Rodman attended those events. The Paris Summit was therefore the first one in Europe, and it was hosted by Orange in their Innovation Garden. The Summit gathered voice industry professionals from across Europe and the United States to discuss the state of HD Voice deployments and future plans. I have always looked at HD Voice from an enterprise VOIP perspective, so the Summit was a unique opportunity to see HD Voice through the eyes of a service provider.

Why HD Voice?

HD Voice technology has been around for a long time. I traced Polycom’s first HD Voice implementation (Siren 7, a 7 kHz codec) to the VSX video endpoint in 2000, while the first Polycom IP phone with HD Voice was the SoundPoint 650 in 2006. But only recently has competition from alternative means of communication, such as Email and IM, led service providers to pay serious attention to HD Voice. For example, Voice now brings in 75% of mobile service providers’ revenue, yet it is rarely discussed at industry events such as the Mobile World Congress, where the focus is on apps for mobile devices.

The voice industry was too concerned about not competing with legacy voice, and it managed to preserve the same voice quality level (dubbed ‘toll quality’) for decades. Today, however, other communication tools compete for users’ attention. Email, Web, and Instant Messaging are so widely adopted that people often use Voice only when they have no other way to reach someone. Mediocre voice quality leads to misunderstandings, while accuracy is very important in the complex environment we all live in. HD Voice reduces or eliminates misunderstandings and has been proven to reduce fatigue and increase productivity.

I knew that Orange had about 190 million customers, 130 million of them mobile. However, I knew little about Orange’s involvement in HD Voice and wanted to learn why they are so passionate about it. It turns out that Orange conducted customer surveys showing that 72% of customers wanted HD Voice, 40% would make more or longer calls with HD Voice, and 50% would switch VOIP operators to get improved voice quality. Orange therefore sees HD Voice as a competitive differentiator and has a strategy for delivering it to both residential and wireless customers.

Residential HD Voice

On the residential side, Orange has installed 600,000 HD Voice devices (LivePhone) since 2006. LivePhone uses DECT on the cordless interface and plugs into the Livebox home gateway, which connects via DSL to the IP network. Livebox supports two wideband voice codecs: G.722 for broadband HD Voice, and AMR-Wideband (AMR-WB, equivalent to G.722.2) for HD Voice to mobile handsets. The LivePhone/Livebox combo already accounts for 8.5% of the installed VOIP base in France, and it made up about 20% of total DECT sales in 2009. Orange’s goal is to have more than 1 million HD Voice residential customers by 2012.

Around the year 2000, advances in IP technology led many industry experts to believe that DECT would quickly disappear and that Wi-Fi would deliver IP to the handset. Instead, I see DECT base stations being integrated with DSL routers, with Voice over IP supported in the DSL routers rather than in the handsets. This approach puts the more complex IP protocols in the base station/router and keeps the handset simple and inexpensive, which is very important in the price-sensitive residential market. The disadvantage is that every new IP application, for example IM, must be mapped onto a set of DECT commands, and a 100% mapping is rarely possible.

I have not been involved in DECT since a DECT/GSM interworking project in 1996, so it was interesting to hear about CAT-iq, the shiny new thing in the DECT space. CAT-iq is a joint specification of the DECT Forum and the Home Gateway Initiative (HGI), and it describes the integration of DECT and IP for broadband home connectivity. DECT has traditionally used the narrowband voice codec G.726 at 32 kilobits per second. CAT-iq adds the wideband voice codec G.722 at 64 kilobits per second, which in effect makes every CAT-iq handset an HD Voice handset. Most CAT-iq implementations will be DECT/DSL combos. However, the integration of DECT and cable modems is already defined by CableLabs’ DECT SIP Specification, and products are in development.

HD Voice in Mobile Networks

On the wireless network side, Orange has successfully deployed HD Voice in Moldova and is planning deployments in the UK, Spain, and Belgium in the second half of 2010. Their goal is to have 75% of mobile handsets support AMR-WB by 2012. Based on lessons learned from previous deployments of new technology, Orange wants to upgrade all network elements to HD Voice before rolling out the service to wireless customers. Adding an HD Voice codec to mobile handsets is just the first step. Transmitting higher frequencies also lets more noise through, so the handset needs better noise suppression. The acoustics of the mobile device are very important, too. Orange tested 20 ‘HD Voice handsets’ and found Mean Opinion Scores (MOS) ranging from 2 (really bad quality) to 4 (really good quality). Improving acoustics requires changes to the handset’s microphones, casing, and speakers.

Note that there are several ways to get HD Voice into mobile handsets. A few handsets, such as the Nokia 6720 and the Sony Ericsson Elm/Hazel, support HD Voice for voice calls. Several new ones were introduced at the Mobile World Congress, which I could not attend. Mobile devices from Google, HTC, and BlackBerry have audio players that can play HD Voice quality audio, but they do not support it on voice calls. In addition, mobile handsets with Wi-Fi support could run an HD Voice soft client. Nokia has a mobile VOIP client that runs on some Nokia devices, and there are soft client implementations for Windows Mobile.

Orange’s goal is to deploy HD Voice without upgrading the wireless links, which traditionally carry AMR-Narrowband voice at 12.2 kilobits per second per voice channel. They therefore selected the AMR-WB mode at 12.65 kilobits per second, which keeps the wireless network bandwidth virtually unchanged. In the core of the wireless network, announcement and voice mail servers must be upgraded to support HD Voice. The network connecting mobile devices and servers must also support QoS parameters for HD Voice.
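The "virtually unchanged" claim is easy to check with a little arithmetic. The sketch below lists the nine AMR-WB speech mode rates (as defined in 3GPP TS 26.171) and picks the one closest to the 12.2 kbps AMR-NB channel rate. The helper names are hypothetical, and the "closest rate" rule is only an illustration, not Orange's actual selection procedure.

```python
# AMR-WB speech codec mode bit rates in kbit/s (3GPP TS 26.171).
AMR_WB_MODES_KBPS = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]

# Highest AMR narrowband mode: the rate traditionally carried per voice channel.
AMR_NB_12_2_KBPS = 12.2


def closest_mode(target_kbps, modes):
    """Return the codec mode whose bit rate is nearest to target_kbps (hypothetical helper)."""
    return min(modes, key=lambda rate: abs(rate - target_kbps))


def relative_increase_pct(old_kbps, new_kbps):
    """Percentage change in per-channel bit rate when switching codecs."""
    return (new_kbps - old_kbps) / old_kbps * 100.0


if __name__ == "__main__":
    mode = closest_mode(AMR_NB_12_2_KBPS, AMR_WB_MODES_KBPS)
    print(f"Closest AMR-WB mode: {mode} kbps")
    print(f"Increase over AMR-NB 12.2: {relative_increase_pct(AMR_NB_12_2_KBPS, mode):.1f}%")
```

With these figures, the 12.65 kbps mode adds only about 3.7% per voice channel. The next lower mode, 8.85 kbps, would save bandwidth but would lower wideband quality.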

Orange is concerned that if even one of these elements (handsets, network, or servers) does not support HD Voice, users will be disappointed by the overall experience.

Connecting the HD Islands

The deployment of HD Voice has created islands that use different codecs. VOIP applications use the G.722 codec. Mobile service providers use AMR-WB in GSM networks and 4GV-WB in CDMA networks. Microsoft and Skype push their own proprietary voice codecs. Unfortunately, it does not look like one codec is going to win; my post ‘How Many Codecs does Unified Communication Really Need?’ addresses this issue. Transcoding among different HD Voice formats will therefore be needed for the foreseeable future to connect the HD Voice islands.
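To make the island problem concrete, here is a minimal sketch of the decision a gateway faces when two HD Voice domains meet. If the two sides share a wideband codec, the call can pass through unchanged. Otherwise it must be transcoded. The island names and codec lists are simplified, hypothetical examples built from the codecs mentioned above. Real codec negotiation (for example, SDP offer/answer in SIP) is considerably more involved.

```python
from typing import Optional, Tuple

# Hypothetical, simplified view of the wideband codecs each "HD island" supports,
# listed in order of preference.
ISLANDS = {
    "voip": ["G.722"],
    "gsm_mobile": ["AMR-WB"],
    "cdma_mobile": ["4GV-WB"],
    "skype": ["SILK"],
    "hd_interconnect": ["G.722", "AMR-WB"],  # hypothetical multi-codec gateway
}


def plan_call(caller: str, callee: str) -> Tuple[str, Optional[str]]:
    """Decide how an HD call between two islands can be carried.

    Returns ("passthrough", codec) when both sides share a wideband codec
    (the caller's preference order breaks ties), otherwise ("transcode", None).
    """
    callee_codecs = set(ISLANDS[callee])
    for codec in ISLANDS[caller]:
        if codec in callee_codecs:
            return ("passthrough", codec)
    return ("transcode", None)


if __name__ == "__main__":
    for a, b in [("voip", "gsm_mobile"), ("gsm_mobile", "hd_interconnect"), ("voip", "skype")]:
        print(a, "->", b, plan_call(a, b))
```

An element that supports several wideband codecs, like the hypothetical `hd_interconnect` entry, can avoid transcoding for more pairs of islands. That is the argument for multi-codec support when no single codec wins.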

Service provider Voxbone offers an HD Voice interconnect service based on the G.722 codec. The service includes transcoding from G.711, G.729, and SILK. Because the adoption of URI dialing has been slow, Voxbone uses E.164 dialing. They got the new country code +883 from the ITU (similar to +1 for the USA and +49 for Germany) and now offer their iNum interconnect service in 49 countries. Since they follow the traditional service provider origination-termination model, mobile and other service providers can easily connect to Voxbone’s iNum service.

Nobody at the HD Communications Summit seemed to like transcoding, and Orange encouraged other markets and networks to adopt AMR-WB to avoid it. The issue is that AMR-WB is not royalty-free: it includes IPR from Nokia, Ericsson, FT, and VoiceAge. Adopting it in other market segments would therefore increase the cost of doing business. AMR-WB is also optimized for transmitting voice over low-bandwidth wireless links. Wired networks have moved past that and are heading toward much higher (full-band audio) quality and the transmission of music and natural sounds.

Wired networks will move very quickly from 7 kHz to 14 kHz to 20 kHz audio, and Polycom is already shipping products that support these options. We strongly believe that a new era of royalty-free licensing has dawned. That is why we offer a royalty-free license for all voice codecs that include Polycom IPR: Siren 7, Siren 14, Siren 22, and the corresponding G.722.1, G.722.1 Annex C, and G.719. In fact, we cover all of these technologies in a single licensing agreement that makes the whole licensing process a breeze. G.719 is a joint development of Polycom and Ericsson, so we worked with our partner to make sure G.719 is completely royalty-free, which today is the only way to ensure wide codec adoption.
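The 7, 14, and 20 kHz steps map directly to sampling rates. By the Nyquist criterion, a codec must sample at least twice as fast as the highest audio frequency it carries. The sketch below picks the smallest of a few common sampling rates that satisfies this. The rate list is an assumption chosen for illustration. The results match the sampling rates used by G.722 (16 kHz), G.722.1 Annex C (32 kHz), and G.719 (48 kHz).

```python
# Common audio sampling rates in Hz (an assumed list, not exhaustive).
COMMON_SAMPLE_RATES_HZ = [8_000, 16_000, 32_000, 48_000]


def min_sample_rate_hz(audio_bandwidth_hz: int) -> int:
    """Smallest common sampling rate that is at least twice the audio bandwidth (Nyquist)."""
    nyquist = 2 * audio_bandwidth_hz
    for rate in COMMON_SAMPLE_RATES_HZ:
        if rate >= nyquist:
            return rate
    raise ValueError(f"No common sampling rate covers {audio_bandwidth_hz} Hz of bandwidth")


if __name__ == "__main__":
    for bw in (7_000, 14_000, 20_000):
        print(f"{bw / 1000:.0f} kHz audio -> sample at {min_sample_rate_hz(bw) / 1000:.0f} kHz")
```

The step from 7 kHz to 20 kHz audio triples the number of samples per second that a codec has to compress.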

The Future of HD Voice

The key argument of my presentation ‘Visions of the HD Future’ at the Summit was that wireless voice networks will follow the same pattern as wired voice networks, with some delay because bandwidth grows more slowly on wireless links. They will gradually move from wideband codecs such as AMR-WB to super-wideband codecs, and ultimately to the full-band quality (20 kHz) available in Siren 22 and G.719. At that point, HD Voice will become what I call ‘Ultra High Definition Audio’ and will be used for much more than just talking. Many vertical applications, for example in healthcare or in the arts and humanities, have been using Polycom full-band audio technology in wired networks. Adding mobility to such applications is definitely something our industry should strive for. Have a look at the slides and let me know if you have comments or questions!

In summary, the third HD Communications Summit was a great opportunity to meet some old friends in the voice industry and some new people, to check deployment progress, and to compare notes on where the voice industry is going. I liked the healthy balance between fixed and wireless networks in all discussions. I wish presenters had used more audio and video files to support their points; I was definitely in the minority playing multiple video and audio clips during my presentation.

The next HD Communications Summit will be held on May 12, 2010 at the Computer History Museum in Silicon Valley, California. The location is almost within walking distance of my office, so I will definitely attend, and I hope to meet some of you there!

Wednesday, March 17, 2010

Will the US Stimulus Package Lead to Wider Video Adoption?

When I first heard about the priorities of the US stimulus package (officially the American Recovery and Reinvestment Act, or ARRA), I was very hopeful that it would drastically improve the broadband infrastructure and pave the way for wider adoption of video communication across the United States. Video, and to a lesser extent wideband audio, requires quite a lot of bandwidth combined with some quality of service guarantees. For example, packet loss should not exceed 5% and jitter should not exceed 40 ms. Latency … well, latency is negotiable, but a truly interactive real-time conversation calls for a maximum of 250 ms end-to-end.
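The rough thresholds quoted above (at most 5% packet loss, 40 ms jitter, and 250 ms end-to-end latency) can be turned into a simple check. This is a sketch that uses the figures from this post as defaults. The `PathStats` structure and the `qos_problems` function are hypothetical names, and real video systems adapt far more gracefully than a pass/fail test.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathStats:
    """Measured characteristics of a network path (hypothetical structure)."""
    packet_loss_pct: float
    jitter_ms: float
    latency_ms: float  # end-to-end


def qos_problems(stats: PathStats,
                 max_loss_pct: float = 5.0,
                 max_jitter_ms: float = 40.0,
                 max_latency_ms: float = 250.0) -> List[str]:
    """Return the thresholds this path violates; an empty list means the path is OK."""
    problems = []
    if stats.packet_loss_pct > max_loss_pct:
        problems.append("packet loss")
    if stats.jitter_ms > max_jitter_ms:
        problems.append("jitter")
    if stats.latency_ms > max_latency_ms:
        problems.append("latency")
    return problems


if __name__ == "__main__":
    # Made-up numbers for illustration, e.g. a high-latency satellite path.
    satellite = PathStats(packet_loss_pct=1.0, jitter_ms=30.0, latency_ms=600.0)
    print(qos_problems(satellite))  # ['latency']
```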

I live in a large city that is part of a metropolitan area of 5 to 6 million people, and I can choose among cable, xDSL, FTTH, etc. for high-speed access to the IP network. In addition, Polycom’s office is not far away. Once I connect to the corporate network, I can use much faster and more predictable links to reach other Polycom offices around the world. But what if I lived in a remote rural area? What if I could only get a modem or satellite connection, or connect through the packet service of a mobile network? I would not be able to use video communication, at least not at a quality level that makes it useful, and even wideband audio would be a challenge.

A huge part of the US population cannot use video communication because their broadband access network just does not support this application. The stimulus money spent on broadband initiatives should improve the situation. Wouldn’t it be great to let patients in remote locations see the best specialists over video? And to let rural schools connect to world-class educational institutions such as the Manhattan School of Music, so that music can be taught over advanced audio-video technology?

But how does the stimulus package apply to broadband access? The National Telecommunications and Information Administration (NTIA) established the Broadband Technology Opportunities Program (BTOP). BTOP makes grants available for deploying broadband infrastructure in ‘unserved’ and ‘underserved’ areas of the United States, for enhancing broadband capacity at public computer centers, and for promoting sustainable broadband adoption projects. The Rural Utilities Service (RUS) has a program called the Broadband Initiatives Program (BIP), which extends loans, grants, and loan/grant combinations to facilitate broadband deployment in rural areas. Whenever NTIA or RUS announces a Notice of Funds Availability (NOFA), there is a lot of excitement in the market.

I am less interested in the logistics of fund distribution than concerned about the ‘broadband service’ definition used in all NOFA documents. It originates from the Federal Communications Commission (FCC) and stipulates that ‘broadband service’ is anything above 768 kilobits per second downstream (i.e., from service provider to user) and 200 kilobits per second upstream (i.e., from user to service provider). Real-time video requires symmetric bandwidth, although video systems adjust the audio and video quality level to the available bandwidth in each direction. At the minimum ‘broadband service’ level defined above, the user could see acceptable video quality coming from the far end but could only send low-quality video to the far end.
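To see why an asymmetric minimum matters for a symmetric application, the sketch below mimics a video endpoint that chooses a call rate independently for each direction from a ladder of rates. The ladder values are hypothetical and do not come from any particular product. The link figures are the FCC minimums of 768 kbps downstream and 200 kbps upstream. For simplicity, the sketch assumes the whole link is available to the call.

```python
from typing import List

# Hypothetical ladder of call rates (kbit/s) a video endpoint might step through.
RATE_LADDER_KBPS: List[int] = [64, 128, 256, 384, 512, 768, 1024]


def pick_rate(available_kbps: int, ladder: List[int] = RATE_LADDER_KBPS) -> int:
    """Highest ladder rate that fits in the available bandwidth (0 if none fits)."""
    fitting = [rate for rate in ladder if rate <= available_kbps]
    return max(fitting) if fitting else 0


if __name__ == "__main__":
    downstream_kbps, upstream_kbps = 768, 200  # FCC minimum 'broadband service'
    print("receive at", pick_rate(downstream_kbps), "kbps")  # receive at 768 kbps
    print("send at", pick_rate(upstream_kbps), "kbps")       # send at 128 kbps
```

The same connection that can receive a 768 kbps stream can only send at the 128 kbps step. This is the lopsided experience described above.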

I understand that when the broadband service definition was discussed at the FCC, the wireline companies wanted higher limits, in line with what cable and xDSL technology can support. The wireless companies wanted far lower limits, like the ones adopted, so that they could play in broadband access as well. The FCC decided to set the bar low enough for everyone to play, while allowing competition in offering higher speeds. There is a fair amount of skepticism that this model will get us to speeds higher than the defined minimums. Several organizations, including Internet2, proposed a two-tier approach: a lower broadband service limit for households and a higher limit for institutions and organizations. However, the FCC’s final definition did not recognize that broadband for institutions is different from broadband for end users.

At Polycom, we take network bandwidth limitations very seriously, and have been working on new compression techniques that reduce bandwidth usage for video communication. This resulted in the implementation of the H.264 High Profile which I described in detail in my previous post. And while we can now compress Standard Definition video to about 128 kilobits per second, the additional bit rate necessary for good quality audio and the IP protocol overhead still does not allow us to fit into the very thin 200-kilobits-per-second pipe. Don’t forget that bandwidth is not the only requirement for real-time video; latency, jitter and packet loss are very important and none of these parameters is explicitly required or defined in any NTIA or RUS documents.
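A back-of-the-envelope calculation shows why 128 kbps of video does not fit into a 200 kbps upstream once audio and protocol overhead are added. The sketch relies on several illustrative assumptions, not Polycom's actual settings:

- G.722 audio at 64 kbps, sent in 20 ms packets.
- Video packets with 1,200-byte payloads.
- 40 bytes of IPv4/UDP/RTP headers per packet (20 + 8 + 12).

Link-layer overhead is ignored, so the real figure would be higher still.

```python
# Per-packet header overhead: IPv4 (20 bytes) + UDP (8) + RTP (12), no options or link layer.
HEADER_BYTES = 20 + 8 + 12


def stream_bps(payload_bps: float, packets_per_second: float) -> float:
    """Bit rate on the wire: media payload plus per-packet IP/UDP/RTP headers."""
    return payload_bps + packets_per_second * HEADER_BYTES * 8


def video_pps(payload_bps: float, payload_bytes_per_packet: int) -> float:
    """Packets per second needed to carry a video stream in fixed-size payloads."""
    return payload_bps / (payload_bytes_per_packet * 8)


if __name__ == "__main__":
    audio = stream_bps(64_000, packets_per_second=50)        # G.722, one packet every 20 ms
    video = stream_bps(128_000, video_pps(128_000, 1_200))   # assumed 1,200-byte payloads
    total = audio + video
    print(f"audio {audio / 1000:.1f} kbps + video {video / 1000:.1f} kbps = {total / 1000:.1f} kbps")
    print("fits in 200 kbps upstream:", total <= 200_000)
```

Under these assumptions, the call needs about 212 kbps upstream. Audio headers alone add 16 kbps, because small, frequent voice packets carry proportionally more overhead than large video packets.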

Tuesday, March 9, 2010

H.264 High Profile: The Next Big Thing in Visual Communications / Telepresence

High Definition (HD) video led to a rapid and total transformation of the visual communication market. It made visual communication much more attractive, and demand for mass deployment in organizations of any kind and size increased. The dilemma of CIOs today is how to meet user demand for HD communication while not breaking the bank for network upgrades.

Now that video systems support HD up to 1080p quality and the infrastructure is scalable and robust enough to support large HD deployments, network bandwidth remains the last limiting factor to mass deployment of HD video across organizations. Most CIOs are still not comfortable letting 1+ megabit per second HD video calls flood their IP networks. Timing is therefore perfect for a new compression technology breakthrough that dramatically decreases the bandwidth required to run HD and high-quality video.

H.264 is a well-established and widely implemented standard for video compression, yet visual communication applications today use its much simpler and less efficient Baseline Profile. H.264, however, offers more sophisticated profiles, and the High Profile delivers the most efficient compression, in many cases reducing the network bandwidth for a video call by up to 50%. Polycom’s announcement of H.264 High Profile support across its video solution is therefore exactly what the market needs right now. This technology breakthrough drastically reduces the network resources needed to video-enable an organization. It also helps CIOs meet budget challenges and power more visual communication with fewer resources, thus limiting or avoiding costly network upgrades.
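To see what "up to 50%" means for a CIO's network, the sketch counts how many simultaneous calls fit on a link before and after a bandwidth reduction. The 10 Mbps link and the 1 Mbps per HD call are hypothetical round numbers, and 50% is the best case quoted above. Actual savings vary with content, resolution, and implementation.

```python
def concurrent_calls(link_kbps: int, per_call_kbps: int) -> int:
    """How many calls of a given rate fit on a link (ignoring other traffic)."""
    return link_kbps // per_call_kbps


def reduced_rate_kbps(per_call_kbps: int, reduction_pct: int) -> int:
    """Per-call rate after a percentage bandwidth reduction (integer kbit/s)."""
    return per_call_kbps * (100 - reduction_pct) // 100


if __name__ == "__main__":
    link = 10_000      # hypothetical 10 Mbps WAN link
    baseline = 1_000   # hypothetical HD call rate with Baseline Profile
    high = reduced_rate_kbps(baseline, 50)  # best-case "up to 50%" saving
    print("Baseline Profile calls:", concurrent_calls(link, baseline))  # 10
    print("High Profile calls:", concurrent_calls(link, high))          # 20
```

At the best-case saving, the same link carries twice as many HD calls. A smaller saving, say 30%, would raise the count from 10 to 14 calls.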

In my view, the shift from Baseline Profile to High Profile is bigger and more important than the previous video technology breakthrough, the much-heralded shift from H.263 to H.264 in 2003. The performance gains of High Profile are consistent across the full bandwidth spectrum. The incremental gain of H.264 over H.263, by contrast, was limited to lower bandwidths, below 512 kilobits per second. As a result, new High Definition systems benefit the most from High Profile, and this new technology will accelerate the adoption of HD communication across organizations.

I have received a lot of questions about the H.264 High Profile: How is the High Profile different from other H.264 profiles? What is the impact of this new capability on the visual communication market? How will customers benefit from it? How will this technology help CIOs roll out video across organizations? How does High Profile interact with other video functions? What is its role in the Polycom Open Collaboration Network?

Answering these questions online would have resulted in a very long blog post, so I put together a white paper that looks at High Profile from both business and technology perspectives. I called it “H.264 High Profile: The Next Big Thing in Visual Communications”. Let me know what you think about it.