Monday, December 20, 2010

The Hottest Topics in 2010

The year 2010 is coming to an end, which gives me a great opportunity to look back and summarize the major communications market developments of the past 12 months. Unified Communications was the most important topic in 2010, and was frequently discussed at industry events (see links below), in online forums, and in papers.

While most of the technology discussions were around better ways to compress video – for example, via H.264 High Profile and Scalable Video Coding – business discussions focused mainly on cloud-based UC services that service providers are starting to explore. Unresolved interoperability issues in the area of Telepresence and Unified Communications were discussed at meetings of IMTC, UCIF, and standardization organizations while HD Voice continued to penetrate new market segments.

New Video Technologies

One of the hottest new technology developments in 2010 was the implementation of H.264 High Profile, which allows for substantial network bandwidth savings when transmitting HD or SD video over the network. To describe the technology and its benefits, I wrote the white paper “H.264 High Profile: The Next Big Thing in Visual Communications” in April, and since the technology was moving so fast, I had to update the paper in June.

By October there was so much interest around High Profile in the service provider community that my colleague Ian Irvine and I put together an additional paper “The Opportunity for Service Providers to Grow Business with Polycom High Profile” which looks at the High Profile technology from a service provider perspective.

Another hot technology topic in 2010 was H.264 Scalable Video Coding, and in November I wrote the paper “More Scale at Lower Cost with Scalable Video Coding” that discusses the SVC benefits but also the interoperability challenges surrounding this new technology.

Cloud-based UC Services

While VOIP service providers have been around for a while, UC services provided by service providers are quite a new phenomenon. In 2010, I had the pleasure of meeting several service providers – SimpleSignal is one example – that have created amazing services using UC technology.

The Broadband World Forum in October was a great opportunity to invite service providers and UC vendors for a panel discussion around UC. I chaired the session “Unified Communication in the Cloud” which examined which UC functions bring real value to enterprises and how hosted or managed service providers can help deliver this UC functionality. Andreas Arrigoni, Head of Collaboration Services at Swisscom, and David Gilbert, President of SimpleSignal, provided perspectives on the service provider role in rolling out UC services.

The UC vendor community was represented by Thierry Batut, Sales Leader for Unified Communications at IBM Europe, George Paolini, VP Business Solutions at Avaya, Glynn Jones, VP at Polycom EMEA, and David Noguer Bau, Head of SP Multiplay Marketing EMEA at Juniper Networks. The 90-minute session was an excellent mix of presentations and panel discussion, and is a model I would love to replicate at other industry events in the future.

Cloud services are definitely the new frontier for Unified Communication, and I summarized my thoughts about voice and video collaboration services in the cloud in a blog post in September. Then in October BroadSoft announced its new BroadCloud service, and I spent some time describing how Polycom and BroadSoft partnered over the years – in fact, since 2002 – to first make hosted VOIP robust and easy to deploy, and now expand to video services in the BroadCloud. Read the entire story in the new white paper “From Hosted Voice to Cloud-Based Unified Communication Services”.


In my blog post in May, I made the argument for standards and interoperability, and for the need for additional test and certification work in the Unified Communications Interoperability Forum (UCIF). Then I had the chance to present on “How UC and Telepresence Are Changing Video Protocols and Interoperability Forever” at the Wainhouse Research Collaboration Summit in June. Interoperability – and more specifically telepresence interoperability – was the most important topic at IMTC SuperOp!, which was followed by a lot of face-to-face and virtual discussions throughout 2010. The most recent one was a roundtable with telepresence vendors organized by Howard Lichtman from the Human Productivity Lab.

UC interoperability work started in the UCIF and the forum’s first face-to-face meeting took place in October. The true value UCIF brings to the table is the ability to create test specifications, verify, and certify vendor compliance and interoperability. This will finally create an independent seal of approval that is very much missing in UC environments today, and which customers are calling for before committing to Unified Communications.

HD Voice Everywhere

In 2010, HD Voice finally moved from on-premise installations in the enterprise to scalable services provided by tier-one service providers. The third HD Communications Summit in February was the first European HD Voice event and was hosted by Orange. The Summit gathered voice industry professionals from across Europe and the United States to discuss the state of HD Voice deployments and future plans. It was a great opportunity to meet some old friends in the voice industry and some new people, to check deployment progress, and compare notes on where the voice industry is going.

Happy Holidays

I would like to thank all Video Networker followers for their continuous support and great feedback in 2010. Very best wishes to you and your families in this holiday season!

Thursday, October 14, 2010

Inside the Unified Communications Interoperability Forum (UCIF)

First UCIF Face-to-Face Meeting

UCIF members met for the first time face-to-face during IT EXPO in Los Angeles last week. I had four presentations at IT EXPO and was in town, so I had the opportunity to meet key people and get an overview of the activities in UCIF. Polycom is a founding member of the forum, and actively participates in working groups along with Microsoft, Logitech/LifeSize, and other member companies.

36 people gathered in the LA Convention Center for a series of meetings on Tuesday and Wednesday, while up to 22 people joined online. Recruitment of additional members is ongoing, so if your company would like to join, let me know.

UCIF Working Groups

The meeting included sessions of the active UCIF groups. Each UCIF group starts as a Study Group (similar to a BOF in IETF), which studies a certain issue and develops a proposal for a charter. Once the charter is approved by the UCIF board, the group becomes a Task Group. On a high level, UCIF has three WGs: Technical WG, Test & Certification WG, and Marketing WG. Within the Technical WG, there are currently three Task Groups (USB Audio Task Group, Webcam Task Group, and H.264 Profile Task Group), and three Study Groups (Voice Study Group, Instant Messaging and Presence Study Group, and Provisioning Study Group).

I was able to attend a few of the groups’ sessions, and came away with the following understanding of where UCIF is going. Based on the trend towards simplifying the infrastructure for video (and preparing it for cloud deployments), Scalable Video Coding is the focus of the H.264 Profile Task Group. The group will create an H.264 SVC profile to ensure that video encoders and decoders interoperate. The group will not address SVC description in SDP, transport via RTP, etc., since there is already work on these topics at IETF.

SVC moves the complexity from the infrastructure to the edges of the network, i.e. to the endpoints. As video soft client proliferation is expected to lead to mass deployment of video, solutions are needed to free the computer CPU from the video processing work. Current SVC implementations run on high-end computers and consume a large share of their processing power, which impacts other applications; therefore, looking for ways to free the CPU is a worthwhile task.

The UCIF Webcam Task Group is focusing on compressing video in the webcam, and defining the interface between the webcam and the video soft client application. Since webcams are usually connected to the computer via USB, the USB Video Class Specification V1.1 can be used as a baseline, but H.264 SVC configuration requires the exchange of additional parameters. The specification must also allow for multiplexing video streams on a single USB interface while keeping the interface simple for the client.

Provisioning of endpoints is critical in multi-vendor environments where the configuration server may come from one vendor while the endpoint comes from another. Most recently, the SIP Forum set out to define a profile and recommendations for User Agent configuration. The result of this work was a contribution in IETF that describes a mechanism for server discovery using existing standards. The SIP Forum, however, did not address the configuration data model/schema. At its meeting, the UCIF Provisioning Study Group discussed the gap between its charter and the work done in the SIP Forum. There does not seem to be a good reason for UCIF to define yet another discovery mechanism for SIP User Agents when both SIP Forum and IETF have defined mechanisms. Focusing on the format for the configuration data (data model/schema) makes sense, and so does creating a test and certification program around the provisioning interface. Several previous attempts at IETF to define a schema can be used as a starting point for the UCIF Provisioning SG.

The UCIF Provisioning Group is still a Study Group, but a charter is almost done and nothing should stand in the way of the group becoming a Task Group very soon. Polycom is obviously very interested in this work, since Polycom endpoints – telephones, video endpoints, etc. – are deployed with dozens of servers. Wouldn’t it be nice to have a standard way of configuring endpoints, no matter what environment the endpoints are deployed in? Another example of the importance of interoperability is the deployment of the CMA 5000 management application in mixed video endpoint environments. Today, management applications have to support multiple configuration methods to support endpoints from Polycom, Tandberg, etc., which is very inefficient and limits scale. The UCIF Provisioning group discussed the use of service announcement via DNS SRV and Bonjour (previously called ‘Rendezvous’). While discovering the provisioning service is important, I think defining the configuration format – possibly an XML file – and the transport mechanism – I vote for HTTPS – are important requirements for interoperability.
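To make the idea of a common configuration format a bit more concrete, here is a minimal sketch of how an endpoint might read an XML configuration file fetched over HTTPS. Everything specific in it is an assumption for illustration only: the element names, the `/provisioning/` URL layout, and the helper functions `parse_endpoint_config` and `provisioning_url` are hypothetical and do not come from any UCIF, SIP Forum, or IETF specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document. The element names are invented for
# illustration and are not taken from any published schema.
SAMPLE_CONFIG = """\
<endpoint-config version="1">
  <sip>
    <registrar>sip.example.com</registrar>
    <transport>tls</transport>
  </sip>
  <video>
    <codec>H.264</codec>
    <max-bitrate-kbps>1024</max-bitrate-kbps>
  </video>
</endpoint-config>
"""


def parse_endpoint_config(xml_text):
    """Flatten the two-level XML document into a {'section.key': value} dict."""
    root = ET.fromstring(xml_text)
    settings = {}
    for section in root:
        for item in section:
            settings[f"{section.tag}.{item.tag}"] = (item.text or "").strip()
    return settings


def provisioning_url(server_host, endpoint_id):
    """Build the HTTPS URL an endpoint would fetch (the path layout is assumed)."""
    return f"https://{server_host}/provisioning/{endpoint_id}.xml"


settings = parse_endpoint_config(SAMPLE_CONFIG)
print(settings["sip.registrar"], settings["video.codec"])
```

The point of the sketch is that once the format is agreed on, an endpoint from any vendor needs only one parser like this, regardless of which vendor's server produced the file.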

The Test and Certification Work Group is probably the most important group of all. All working groups are required to create test plans along with creating interop specifications. When a group finishes work, it sends the interop specification and the test plan to the Test and Certification Group, which is responsible for testing and for certifying the vendors that pass the test. This promises a more structured approach to testing than other test venues such as SuperOp and SIPit.

UCIF and Other Industry Organizations

I had the chance to represent UCIF in several industry panels, and the question that comes up a lot is how UCIF relates to other industry forums, for example, the International Multimedia Telecommunications Consortium (IMTC) and the SIP Forum.

Marc Robins, President of the SIP Forum, and Richard Shockey, chair of SIP Forum’s board, attended the UCIF meeting in L.A., and identified areas of possible cooperation. Founded as a nonprofit in 2000, the SIP Forum includes many of the same companies that participate in UCIF, and focuses on SIP Trunking (based on the SIP Connect specification, now in V1.1), User Agent Configuration, and Fax-Over-IP. UCIF is currently not looking at fax, although surprisingly fax is getting a second wind with mandatory HIPAA requirements that do not allow sending medical test results over email. The work in the SIP Forum’s User Agent Configuration group led to the definition of a mechanism for finding configuration servers but stops short of defining the actual format of configuration files. The UCIF Provisioning Group could take that work to the next level, define formats, create test plans, and certify vendors that comply. The same applies to the SIP trunking work in the SIP Forum: the UCIF Voice Study Group could take the SIP Connect specification, create a test plan, and certify vendors complying with it. I think that if the group focuses on SIP trunking (or more generally on SIP interop), it should not be called the Voice Group. While SIP trunks today are defined and used for voice, they will support video once SPs start interconnecting video IP-PBXs through SIP trunks.

With regard to IMTC, I see interest in telepresence interoperability in both IMTC and UCIF. Standardization bodies ITU-T and IETF are also working on the subject. Whatever happens in this area, I expect that any activities in UCIF will include test plans and certification, which are not in scope for IMTC and standards organizations. Obviously, the challenge with interop testing among telepresence systems is the size of these systems and the difficulty of moving them. This leads to the requirement for a testing infrastructure that connects UCIF members over the Internet in a test environment that allows for continuous testing.


The true value UCIF brings to the table is the ability to create test specifications, verify, and certify vendor compliance and interoperability. This will finally create an independent seal of approval that is very much missing in UC environments today, and which customers are calling for before committing to Unified Communications.

Wednesday, September 1, 2010

Voice and Video Collaboration Services in the Cloud

Cloud computing is defined as ‘virtualization of computing assets delivered on demand over the IP network’. It promises availability and scalability for applications ranging from storage to collaboration. Clouds are particularly popular because the concept is easier to grasp than previous attempts to define similar services through Application Service Providers (ASPs) and Software as a Service (SaaS).

The Cloud is a more general concept that includes not only SaaS but also storage, platform, and infrastructure as a service. Clouds are in a better position to deliver on the promise of interactivity. While slow networks in the past made the user experience with ASPs quite negative, the better networks available today allow for fast response times and increased interactivity.

Analysts are excited about cloud services and see above-average growth; while the average IT market growth is expected to be 4% per year until 2013, IT cloud services are expected to grow 25% per year over the same period. The current bidding war between HP and Dell for cloud storage technology provider 3PAR is a great example of how hot this market segment has become.

Evolution of Service Architectures

In the legacy approach, each enterprise application runs on separate server(s) that reside in one of the enterprise’s offices. This leads to inefficient use of space and energy (power-up and cooling), and to a substandard user experience. In the next stage of the evolution, all servers were collocated in a data center where they can share space and power. In the third stage, services are provided by the Cloud, which can be within the enterprise (‘private cloud’) or outside the enterprise (‘public cloud’).

The term ‘virtualization’ is often used in relation to Clouds, and it is important to clarify virtualization’s role. Virtualization saves money by increasing server utilization, i.e. reducing the number of servers (hardware) necessary to support applications. Virtualization can be used in a traditional data center or in the Cloud. In both environments, virtualization reduces the hardware necessary to run enterprise applications. It has very strong positive financial and environmental impact.

Unified Collaboration

Unified collaboration combines a variety of communication tools – voice, video, email, presence, IM, etc. – into a seamless user experience, and into workflows, through a single user interface. Since UC installations are far past pilot deployments and are now being rolled out across large organizations, scalability is an important requirement.

Global teams span the entire world, and different time zones do not allow everyone on the team to participate live in all collaboration sessions. Audio, video, and shared content must therefore be stored and streamed. This leads to the requirement for efficient and scalable storage.

Accessibility of UC applications has two sides. First, users should be able to access them from anywhere, not just the office but also from remote locations. Second, any device should provide access, including computers, telephones, appliances such as personal and group video systems, and even immersive telepresence systems.

To meet these UC requirements, UC architectures must follow IT architectures towards Clouds.

Bandwidth Requirements

UC applications, such as voice and video, require higher quality of service (QOS) than applications such as email, scheduling, or management. QOS is defined by bandwidth, latency, packet loss, and jitter. And while there are mechanisms in place to combat packet loss and jitter, bandwidth remains the most important resource necessary to support voice and video collaboration applications.

If multiple systems have to be connected in a multi-point conference, the traffic quickly grows, and may overwhelm the Cloud. Cloud throughput is critical for successful deployment of voice and video collaboration applications. Recent advances in video compression technology, in particular Polycom’s implementation of the H.264 High Profile for real-time video, allow for ‘thinner’ connections between premise and Cloud without sacrificing the quality of experience.

In general, voice transmission requires less bandwidth than video. Even the highest audio quality does not require more than 128 kbps per channel and this is usually not an issue for the interface between customer premise and the cloud service provider.
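As a rough illustration of how multi-point traffic adds up, the sketch below estimates the load at a central conference server in the Cloud. Only the 128 kbps audio ceiling comes from the text above; the video rate used in the example (1024 kbps) is a hypothetical figure, and the model assumes that every participant sends one stream up and receives one mixed stream back, which is a simplification.

```python
AUDIO_KBPS_MAX = 128  # upper bound per audio channel, as stated in the text


def conference_load_kbps(participants, video_kbps, audio_kbps=AUDIO_KBPS_MAX):
    """Return (ingress, egress) in kbps at a central conference server.

    Assumes each participant sends one audio+video stream and receives
    one mixed audio+video stream of the same rate.
    """
    per_participant = video_kbps + audio_kbps
    ingress = participants * per_participant
    egress = participants * per_participant
    return ingress, egress


def audio_share(video_kbps, audio_kbps=AUDIO_KBPS_MAX):
    """Fraction of one participant's stream that is audio."""
    return audio_kbps / (video_kbps + audio_kbps)


# Hypothetical example: 10 participants, 1024 kbps video each.
ingress, egress = conference_load_kbps(10, 1024)
print(ingress, egress, round(audio_share(1024), 3))
```

Under these assumptions audio accounts for about a ninth of each stream, which is why video compression gains matter far more for the premise-to-Cloud link than audio does.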

Security Requirements

Numerous surveys of CIOs and IT administrators have shown that security is the leading concern around deploying Cloud services. With data applications, hackers try to capture and copy the customer data. With real-time collaboration applications, such as voice and video, hackers try to redirect and record voice and video calls.

There is currently a fairly robust security framework for user authentication, authorization, and media encryption – both in SIP and H.323 environments – that can be deployed to prevent interception of voice and video calls at the interface between customer and SP. However, this security framework has to be reevaluated and extended to cover new security threats that come with new cloud service use cases.

Many industry experts believe that cloud services will lead to improved security due to the centralization of data and the increase in security-focused resources. SPs are able to devote resources to solving security issues that many customers cannot afford to solve themselves.

Availability Requirements

In the cloud services scenario, a lot of the infrastructure functionality that today resides on customer premise will be moved to the Cloud, and become a shared resource among enterprises. The availability of these resources is of paramount importance to the success of voice and video services in the Cloud.

One successfully deployed approach to increased availability (and scale) is the use of a redundant resource management application that controls a pool of multipoint conferencing resources in the network.

To make a pool of conference servers behave as one huge conference server, the resource management application tracks incoming calls, routes them to the appropriate resource (for instance, based on available server resources but also on available bandwidth to the location of the server), and automatically creates cascading links if a conference overflows to another server. If the conference is prescheduled, the application server can select a conference server that has sufficient resources to handle the number of participants at the required video quality (bandwidth). Overflow situations are probable with ad-hoc conferences where participants spontaneously join without any upfront reservation of resources.

The resource management application runs on two servers to ensure 100% redundancy and auto-failover. It is designed to provide uninterrupted service by routing calls around failed or busy conference servers. It also allows administrators to “busy out” media servers for maintenance activities while still providing an ‘always available’ experience from the Cloud user’s point of view. The system can gradually grow from small deployments of 1-2 conference servers to large deployments with many geographically dispersed conference servers based on the dynamic needs of growing organizations. System administrators can monitor daily usage and plan the expansion as necessary. This approach also provides a centralized mechanism to deploy a front-end application to control and monitor conferencing activities across all conference servers.

The resource management application also serves as a load balancer in this scenario, that is, it distributes the conference load over a group of servers, ensuring that one server is not oversubscribed while another is underutilized. The larger the resource pool, the more efficient the load balancing function is, a feature that is very important to Cloud service providers, who can offer conference services globally by using the resource management application and placing conference servers in central points of their networks. More approaches to increased availability and scale are discussed in the paper ‘Polycom UC Intelligent Core: Scalable Infrastructure for Distributed Video’.
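The routing behavior described above can be sketched in a few lines. This is a toy model under stated assumptions, not the logic of any actual product: the class names `ConferenceServer` and `ResourceManager`, the port-based capacity model, and the "most free ports wins" rule are all hypothetical, and bandwidth to the server location is ignored.

```python
from dataclasses import dataclass


@dataclass
class ConferenceServer:
    name: str
    capacity: int            # maximum number of participants (ports)
    in_use: int = 0
    available: bool = True   # False when failed or "busied out" for maintenance

    @property
    def free(self):
        # A failed or busied-out server offers no capacity.
        return self.capacity - self.in_use if self.available else 0


class ResourceManager:
    def __init__(self, servers):
        self.servers = list(servers)
        self.conferences = {}  # conference id -> names of servers hosting it

    def route_call(self, conference_id):
        """Place one participant and return the name of the chosen server."""
        hosts = self.conferences.setdefault(conference_id, [])
        # Prefer a server that already hosts this conference and has room.
        for server in self.servers:
            if server.name in hosts and server.free > 0:
                server.in_use += 1
                return server.name
        # Otherwise pick the usable server with the most free ports.
        candidates = [s for s in self.servers if s.free > 0]
        if not candidates:
            raise RuntimeError("no conference resources available")
        target = max(candidates, key=lambda s: s.free)
        target.in_use += 1
        # If hosts is non-empty here, the conference has overflowed and a
        # real system would set up a cascading link between the servers.
        hosts.append(target.name)
        return target.name


pool = [ConferenceServer("A", capacity=2), ConferenceServer("B", capacity=3)]
manager = ResourceManager(pool)
placements = [manager.route_call("conf-1") for _ in range(4)]
print(placements, manager.conferences["conf-1"])
```

In this run the first three callers land on the larger server and the fourth overflows to the other one, which is the point where a cascading link would be created. Setting `available = False` on a server models the "busy out" case: new calls are simply routed around it.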

Bringing Collaboration and Clouds Closer

The trend towards cloud services is driving both technology and business model changes.

On the technology side, UC technology providers have to make significant changes in the architecture for voice and video applications to better align with the architecture of Clouds. Reducing the complexity in the infrastructure and pushing it to the endpoints is a viable approach although the impact on the user experience through complex endpoint implementation is still being evaluated.

Cloud service providers have to meet challenges of their own. Clouds today are designed with data processing in mind, and throughput (bandwidth to and from the Cloud) and Quality of Service (latency, for example) are not at the level required for real-time interaction. Cloud providers therefore will need to increase throughput for real-time applications, and develop new service pricing to accommodate the specifics of real-time collaboration.

Monday, May 31, 2010

Why Standards? Why Interoperability?

The Importance of Standards

Standards and interoperability have always been the foundation of the video conferencing industry, and for many people in this business the need for standards and interoperability is self-explanatory. Using standards to connect systems from different vendors into a seamless network is the best way to assure end-to-end audio and video quality and protect customer investments. However, we frequently forget that – as video becomes an intrinsic part of all kinds of communication tools – the overwhelming majority of new users do not quite appreciate the importance of standards. As a result, questions about what standards are and why we need them have been popping up much more often lately. Instead of answering the questions around standards and interoperability in separate emails, I decided to write a detailed, and maybe somewhat lengthy, but hopefully balanced and useful overview of this topic.

A technical standard is an established norm or requirement, usually a formal document that establishes uniform engineering or technical criteria, methods, processes and practices. Technical standards have been important in all areas of our lives. They make sure, for example, that electrical appliances support 120V in the USA and 230V in Germany, and that when you plug a toaster into the outlet, it just works.

Another demonstration of the power of standards can be found in the railway system. Most of the railways in the world are built to a standard gauge that allows engines and cars from different vendors to use the same rail. The decision of the Royal Commission in 1845 to adopt George Stephenson’s gauge (4 feet 8 1⁄2 inches, or 1435 mm) as a standard gauge stimulated commerce because different railway systems could be seamlessly inter-connected.

In the area of communications technology, analog telephone standards to this day allow billions of PSTN users to connect. With the migration to digital technology, new standards for voice (and later video and other communications) had to be agreed on to enable systems and their users across the world to communicate.

How do Communications Standards Emerge?

Standards are usually created by consensus, that is, a number of vendors meet at an industry standardization organization, and over a period of time (which may be months or years) work out a specification that is acceptable to all parties involved. Difficulties emerge when participating vendors already have proprietary implementations and are trying to model the standard after what they already have. The motivation is mostly financial: redesigning existing products to meet a standard is expensive. When implementing a standard, vendors may also be giving up some intellectual property and competitive differentiation. Negotiation of standards is therefore a balancing act between vendors’ interests and the interest of the community/industry.

Sometimes the standardization process stalls, and other means are used to set standards. Governments may get tired of waiting for an agreement in a particular industry, pick a standard, and allow only sales of products complying with it. Governments are especially concerned with security standards, and there are many examples of government-set standards for communication security.

Markets sometimes enforce de-facto standards, when a player in a certain market segment has such a large market share that its specification becomes the standard for everybody else who wants to connect to the market leader. This creates a lot of problems in emerging markets where market shares change rapidly, and companies that are rising fast today are losing to the “next big thing” tomorrow. Standards are designed for the long run while proprietary implementations may come and go really fast.

Skype and Google

Today is no different from any other day in human history, and the battle between standards and proprietary implementations continues. Skype is getting a lot of attention with its hundreds of millions of users (a small portion of them active, but nevertheless), and analysts and consultants frequently ask me “Is Skype not too big to ignore?” and “Shouldn’t everybody make sure they can connect to Skype?” Yet another group of analysts and consultants is very excited about Google’s recent buying spree in voice and video technology. Google’s acquisition of GIPS and On2, and last week’s announcement about making the On2 VP8 video codec open source, led to another set of questions about the impact of Google’s move on the reigning champion – H.264 – and the alternative candidate for HTML5 codec called Ogg Theora.

In general, there are two ways to promote a proprietary codec: claim that it has better quality than standard codecs and claim that it is “clean” from intellectual property rights (IPR) claims. Google tried both arguments, positioning VP8 as both higher quality than H.264 and as fully royalty free (as compared to the licensable H.264). Unfortunately, neither argument is easy to support. Independent comparisons showed that VP8 quality is lower than H.264 Main and High Profile, and somewhat comparable with H.264 Baseline Profile. H.264 is a toolbox that includes many options to improve codec performance and its capabilities have not been exhausted. A recent proof point is Polycom’s first and only implementation of H.264 High Profile for real-time communication that led to dramatic reduction of network bandwidth for video calls (white paper).

The IPR situation is also not as clear as Google wishes us to believe because VP8 uses a lot of the mechanisms in H.264 and other codecs, probably covered by someone’s IPR. If you dig deeper into any video and audio compression technology, you will at some point find similarities, for example, in the areas of frame prediction and encoding, so the IPR situation is never completely clear.

By the way, “open” only means that Google discloses the specs and allows others to implement them. Licensing is, however, only a small portion of the implementation cost, and redesigning products to add yet another open codec is an expensive proposition for any vendor.

Gateways and Chips

The introduction of proprietary codecs decreases interoperability in the industry. People sometimes dismiss the issue by saying “Just put a gateway between the two networks to translate between the different audio and video formats”. This sounds simple but the impact on both the network and the user experience is dramatic. Gateways dramatically decrease scalability and become network bottlenecks, while the cost goes up since gateways, especially for video, require powerful and expensive hardware. Did I mention that the user experience suffers because gateways introduce additional delay (which reduces interactivity) and degrade audio/video quality?

So the real goal is to achieve interoperability, and standards are the means to do that without using gateways. But some more technically inclined folks could say “Why don’t you support multiple codecs in every endpoint and server, and just select one of them based on who you are talking to?” This is a great idea and in fact works to a certain extent. For example, Polycom video endpoints today support at least three video codecs (H.264, obviously, but also H.263 and H.261 for backward compatibility) and several audio codecs (G.719, G.722, G.711 …). You can of course add a few more codecs, but very quickly you will reach a level of complexity in the codec negotiation process that makes the whole call setup a nightmare.
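The basic idea of selecting a codec based on who you are talking to can be sketched as follows. This is a deliberately simplified illustration, not how SIP or H.323 capability exchange actually works: the function `negotiate` is hypothetical, and the preference order of the lists is an assumption (only the codec names come from the text above).

```python
def negotiate(local_prefs, remote_caps):
    """Return the first codec, in local preference order, that the remote
    side also supports, or None if there is no common codec."""
    remote = set(remote_caps)
    for codec in local_prefs:
        if codec in remote:
            return codec
    return None


# Codec names are from the text; the ordering is assumed for illustration.
VIDEO_PREFS = ["H.264", "H.263", "H.261"]
AUDIO_PREFS = ["G.719", "G.722", "G.711"]

print(negotiate(VIDEO_PREFS, ["H.261", "H.263"]))  # older remote endpoint
print(negotiate(AUDIO_PREFS, ["G.711"]))
print(negotiate(VIDEO_PREFS, ["VP8"]))             # no common video codec
```

Even this toy version shows where the pain comes from: every additional codec adds another entry that both sides must advertise, rank, and test against every other implementation, and a call with no common entry needs a gateway or fails.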

Also, codecs are most efficiently implemented in hardware (chips), and adding more codecs to a chip is not a trivial matter; it requires stable specs and a business case that extends over a long period of time. Adding capabilities to chips increases the price, no matter if you use that codec or not. The worst case scenario for a chip vendor is to spend the effort and add support for a proprietary codec in a chip just to find out that the vendor owning the proprietary codec is not around anymore or has decided to move on to something else. The benefit of established standards is that there is already a substantial investment in hardware and software to support them. Therefore, while encouraging technology innovation, we at Polycom always highlight the need to support and cherish established industry standards that provide the foundation for universal connectivity around the world.

Is “Proprietary” Good or Bad?

There are a lot of good reasons to avoid proprietary implementations, and vendors that care about the industry as a whole collaborate and cooperate with other vendors for the common good. A recent example I can think of was when Polycom submitted Siren 22 (proprietary audio codec) to ITU-T for standardization. Siren 22 is a great codec with outstanding quality used by musicians for live music performance over IP networks. Ericsson submitted an alternative proposal and Polycom worked with Ericsson to combine the two codecs into one superior codec that was accepted as the ITU-T G.719 standard (full story). This takes us to another benefit of a standard: it has been evaluated and discussed by a wider industry audience, which has concluded that the standard is necessary to satisfy a particular need. The process also guarantees that there is no functional or application overlap with other standards.

Some readers may be confused by the term “proprietary” used with a negative connotation. Proprietary and patented technologies are often advertised as a guarantee that one product is better than another. We at Polycom are also proud of a lot of proprietary technologies that make our phones sound better and our video endpoints provide crisper pictures. There is nothing wrong with improving the quality and the user experience through proprietary means. “Proprietary” gets in the way when it hinders communication with other devices and servers in the network, and when it only allows communication within the proprietary system. Such closed systems create islands of communication that do not connect well with the outside world.

How can a proprietary implementation become a standard? Vendors can submit their proprietary implementation to a standardization organization such as ITU-T or IETF, and argue for the benefits of the particular specification with other vendors, scientists, independent experts from the research community, and government. Depending on how crowded this part of the market is, discussions may conclude fast or drag on for years. The main complaint of vendors who opt to go to market with proprietary implementations is that they need a fast Time To Market (TTM) to capture a business opportunity, while the standardization process takes time. It is again a discussion about the balance between personal gain and community benefit. Business opportunities come and go but the need for stability and continuity in the communication market remains. While getting a standard approved takes substantial effort and patience, standards are still the only way to assure stability, backward compatibility, and customer investment protection across the industry.

The Wisdom of Standards

It required a lot of effort and hard work to create the standards we have today, and dropping them in favor of a myriad of proprietary implementations (“open” or not) seriously undermines interoperability efforts in the industry.

There are so many examples of companies who tried and were partially successful with proprietary implementations but later realized the wisdom of standards. For years, PBX vendors like Avaya, Cisco, and Siemens have marketed proprietary systems that only provided enhanced functionality internally or when connected to another system from the same vendor. Once connected to a system from another vendor, functionality went down to just the basics. If you have monitored this market, you have seen how over the past few years all vendors moved to SIP-based systems which, while having proprietary extensions, provide a high level of interoperability. In another example, Adobe introduced the proprietary On2 VP6 video codec into Flash and ran with it until Flash Version 8. Then in Version 9 they yielded to the pressure from partners and customers, and added support for the H.264 standard.

Beyond Standards and Towards True Interoperability

Does standards compliance guarantee full interoperability? Standards have a few mandatory functions and a wide range of optional functions. Vendors who only support mandatory functions have basic interoperability, while advanced interoperability requires vendors to coordinate which options they support. Many people asked me about the recent announcement of the Unified Communications Interoperability Forum (UCIF), of which Polycom is a founding member with a board seat. Similar to a judge who interprets the law, UCIF will interpret standards in the Unified Communications space and come up with specifications and guidelines on how to make UC standards work 100%. The foundation is already laid by the existence of standards for voice, video, presence, IM, etc. UCIF members will together make sure these communication tools work across networks, and provide advanced functionality through a seamless user interface.

In summation, the discussion about standards vs. proprietary is really about fierce competition versus a collaborative “the rising tide lifts all boats” approach. There are plenty of areas where vendors can compete (user interfaces, audio/video capture and playback, compression profiles) but there are also areas where working jointly with existing standards and towards emerging standards drives growth of the communications market, and prosperity for the entire industry.

Monday, May 3, 2010

Science Discovery and Advanced Networking 1.5 Miles Below the Earth's Surface

The Spring 2010 Internet2 Conference was superb! I have witnessed the increase in quality and diversity of the Internet2 conferences over the years, and hope that I have contributed to these changes, too. The fact is that Internet2 events are larger and more diverse, and that they now include not only educational and research institutes but also government and health organizations, as well as growing international attendance. Polycom’s participation in these events has also increased over time. In addition to numerous presentations I have given (links are below in the ‘Speaking Engagements’ section), we have done amazing demos, including the TPX three-screen telepresence system that we built at the Fall 2009 Internet2 Conference, as part of the telepresence interoperability effort. The spring event last week gathered 700 participants and was another excellent opportunity to experience collaboration tools with video capabilities, including Polycom CMA Desktop and PVX soft clients, while Polycom HDX equipment was used in many sessions to connect remote participants from all around the world.

But nothing can compare to the astonishing video and audio quality used to connect LIVE both the former Homestake gold mine near Lead, South Dakota and the office of the Governor of South Dakota to the conference hotel, the Crystal Gateway Marriott in Arlington, Virginia. All attendees gathered in the big ballroom for the general session "Science Discovery and Advanced Networking 1.5 Miles Below the Earth's Surface" which focused on the plans to convert the Homestake mine in South Dakota into a Deep Underground Science and Engineering Lab (DUSEL), where physicists, biologists and geologists could research fundamental questions about matter, energy, life and the Earth.

It looks like every kind of scientific research would benefit from the underground lab. For example, geologists want to study the rocks and figure out why there is no more gold in the mine, while physicists want to study neutrinos and dark matter, and hide from the cosmic radiation that seems to screw up a lot of the experiments. Whatever they end up doing in this lab, it will result in a lot of data that has to be transported to research institutes around the world over a very fast network. And since getting in and out of the mine is not easy, advanced voice and video communication is needed for scientists underground to stay in touch with their peers on the surface. The general session gave a preview of what Polycom audio-video technology can do in the tough mine environment characterized by dust, water, and wide temperature variation.

The mine itself is up to 8,000 feet or 2,438 meters deep (and therefore the deepest in North America) but most of the work today is done at 4,850 feet / 1,478 meters underground, and that’s exactly where the Polycom HDX 8000 system was installed. Optical fiber goes to the surface, and connects to South Dakota’s Research, Education and Economic Development network (REED), which supports two 10 gigabit/second waves and links the state’s six public universities. REED also connects with the Great Plains regional research and education network at Kansas City, which peers with Internet2. Internet2 links with the Mid Atlantic regional network, which had a 1 gigabit per second link to the conference site in Arlington. Pretty much the same network (except the underground part) was used to connect the second remote participant in the session: the Governor of South Dakota, Michael Rounds. The original plan to have him in the mine was scrapped because of safety concerns, and another Polycom HDX 8000 system connected the governor’s office to Arlington.

I have seen many demos of Polycom technology over good networks. The Polycom corporate IP network is designed for audio and video and provides very good quality. BUT nothing I have seen compares to the perfect network used during the general session last week. Not a single packet was lost and the delay was just not there, so that the interaction among on-site and remote participants was flawless. The HDX 8000 systems worked at High Definition 1080p video quality and full-band (22 kHz) audio quality over connections of 6 megabits per second. On one hand, the audience could see, hear, and almost smell the thick air in the deep mine. On the other hand, the pristine quality delivered a fully immersive experience, and made everyone in Arlington feel ‘in the mine’. It felt surreal to be so close and so far away at the same time. 700 conference attendees joined me in that experience.

It is impossible to capture the immersive experience during the session but I will try to at least give my blog readers some feeling of the event.

I took a picture of the Governor of South Dakota Michael Rounds speaking about the creation of an underground science lab in the Homestake mine. I also shot a short video of this part of the session.

When Kevin Lesko, DUSEL Principal Investigator, spoke from the Homestake mine, I took a still picture and shot a short video of him, too.

The Q&A part of the session used a split screen to allow conference attendees to see both the Governor and the team underground at the same time, and engage in live discussion. Here is a picture and a video clip from that part of the session.

The Governor and the team in the Homestake mine answered numerous questions from the audience, and the interaction across distances was spectacular. At the conclusion of the session, the President and CEO of Internet2, Doug Van Houweling, thanked all contributors to the session. He thanked Polycom for providing the video equipment for this incredible discussion that highlighted both the advances of audio-video technology and the enormous capabilities of the Internet2 network.

Throughout the 75-minute session, the audio and video quality was impressive. Several attendees came to me after the session to share their surprise and excitement about the immersive experience. Most of them wanted to know how to make their own video conferencing systems deliver similar quality, which of course led to a discussion about the recent advances in audio and video technology including compression, cameras, microphones, and networking.

I am sure several of my blog followers attended the session "Science Discovery and Advanced Networking 1.5 Miles Below the Earth's Surface", and I would love to get their comments.

Friday, March 19, 2010

3rd HD Communications Summit Discusses HD Voice in Fixed and Wireless Networks

I was invited to represent Polycom and speak at the third HD Communications Summit which took place in Paris, France on February 12, 2010. Both the first Summit (May 2009) and the second one (September 2009) were in New York City, and Polycom’s CTO Jeff Rodman attended. The HD Communications Summit in Paris was therefore the first European event and was hosted by Orange in their Innovation Garden. The Summit gathered voice industry professionals from across Europe and the United States to discuss the state of HD Voice deployments and future plans. Since I have always looked at HD Voice from an enterprise VOIP perspective, the Summit was a unique opportunity to see HD Voice through the eyes of a service provider.

Why HD Voice?

HD Voice technology has been around for a long time. I traced Polycom’s first HD Voice implementation (Siren 7, a 7 kHz codec) to the VSX video endpoint in 2000, while the first Polycom IP phone with HD Voice was SoundPoint 650 in 2006. But only recently has competition from alternative means of communication, such as Email and IM, led service providers to pay serious attention to HD Voice. For example, Voice now brings in 75% of mobile service providers’ revenue but it is rarely discussed at industry events such as the Mobile World Congress where the focus is on apps for mobile devices.

The voice industry was too concerned about not competing with legacy voice, and managed to preserve the same voice quality level (dubbed ‘toll quality’) for decades. However, other communication tools are competing for users’ attention today, with Email, Web, and Instant Messaging so widely adopted that people often use Voice only if they do not have other options to reach someone. Mediocre voice quality leads to misunderstandings, while accuracy is very important in the complex environment we all live in. HD Voice reduces or eliminates misunderstanding, and has been proven to reduce fatigue and increase productivity.

I knew that Orange had about 190 million customers, of which 130 million are mobile customers, but I knew little about Orange’s involvement in HD Voice and wanted to learn why they are so passionate about it. It turns out Orange conducted customer surveys which showed that 72% of customers wanted HD voice, 40% would make more or longer calls with HD voice, and 50% would switch VOIP operator to get improved voice quality. Orange, therefore, sees HD Voice as a competitive differentiator and has a strategy to deliver HD Voice to both residential and wireless customers.

Residential HD Voice

On the residential side, Orange has installed 600,000 HD Voice devices (LivePhone) since 2006. LivePhone uses DECT on the cordless interface, and plugs into the Livebox home gateway, which connects via DSL to the IP network. Livebox supports two wideband voice codecs: G.722 for broadband HD Voice and AMR-Wide Band (AMR-WB, equivalent to G.722.2) for HD Voice to mobile handsets. The LivePhone/Livebox combo is already 8.5% of the installed VOIP base in France and about 20% of the total DECT sales in 2009. Orange’s goal is to have more than 1 million HD Voice residential customers by 2012.

Around the year 2000, the advances of IP technology led many industry experts to believe that DECT would quickly disappear and Wi-Fi would deliver IP to the handset. Instead, I see DECT base stations being integrated with DSL routers, and support of Voice over IP in the DSL routers, not in the handsets. This approach allows implementing the more complex IP protocols in the base station/router while keeping the handset simple and inexpensive, which is very important in the price-sensitive residential market. The disadvantage of this approach is that every new IP application, for example IM, must be mapped into a DECT set of commands, and 100% mapping is rarely possible.

I have not been involved in DECT since a DECT/GSM interworking project in 1996, so it was interesting to hear about CAT-iq, the shiny new thing in the DECT space. CAT-iq is a joint specification of the DECT Forum and the Home Gateway Initiative (HGI), and describes the integration of DECT and IP for broadband home connectivity. In addition to the narrow-band voice codec G.726 at 32 kilobits per second, traditionally used in DECT, CAT-iq includes the wideband voice codec G.722 at 64 kilobits per second, in effect making every CAT-iq handset an HD Voice handset. Most of the CAT-iq implementations will be in DECT/DSL combos but integration of DECT and cable modem is already defined by CableLabs’ DECT SIP Specification, and products are in development.

HD Voice in Mobile Networks

On the wireless network side, Orange has successfully deployed HD Voice in Moldova, and is planning deployments in the UK, Spain, and Belgium in 2H’2010. Their goal is to have 75% of mobile handsets support AMR-WB by 2012. Based on lessons learned from previous new technology deployments, Orange wants to upgrade all network elements to HD Voice before rolling out the service to wireless customers. Adding an HD voice codec to mobile handsets is just the first step. Since noise increases when you transmit higher frequencies, the handset must have better noise suppression. The acoustics of the mobile device are very important, too. Orange tested 20 ‘HD Voice handsets’ and found Mean Opinion Scores (MOS) ranging from 2 (really bad quality) to 4 (really good quality). Improving acoustics requires touching microphones, casing, and speakers of the handset.

Note that there are several ways to get HD Voice into mobile handsets. A few handsets such as Nokia 6720 and Sony-Ericsson Elm/Hazel support HD Voice for voice calls (several new ones were introduced at the Mobile World Congress which I could not attend), while mobile devices from Google, HTC, and Blackberry have audio players that can play HD Voice quality but do not support it on voice calls. In addition, mobile handsets with Wi-Fi support could run an HD Voice soft client. Nokia has a mobile VOIP client that runs on some Nokia mobile devices and there are soft client implementations for Windows Mobile OS.

Orange’s goal is to deploy HD Voice without upgrading the wireless links which traditionally carry AMR-Narrow Band voice at 12.2 kilobits per second (per voice channel). They, therefore, selected the AMR-WB option at 12.65 kilobits per second to keep the wireless network bandwidth virtually unchanged. In the core of the wireless network, announcement and voice mail servers must be upgraded to support HD Voice, while the network connecting mobile devices and servers must support QOS parameters for HD Voice.
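
A quick back-of-the-envelope check shows what “virtually unchanged” means here. The two bit rates are the ones quoted above; the helper name `relative_increase_pct` is just an illustrative choice, and the calculation looks only at the codec bit rate, ignoring any framing or channel-coding differences on the radio link.

```python
AMR_NB_KBPS = 12.2    # narrowband rate quoted in the post
AMR_WB_KBPS = 12.65   # wideband mode Orange selected, per the post


def relative_increase_pct(old_kbps, new_kbps):
    """Percentage increase in bit rate when moving from old_kbps to new_kbps."""
    return (new_kbps - old_kbps) / old_kbps * 100.0


if __name__ == "__main__":
    pct = relative_increase_pct(AMR_NB_KBPS, AMR_WB_KBPS)
    print(f"AMR-WB needs about {pct:.1f}% more bit rate than AMR-NB")
```

The increase works out to a bit under 4% per voice channel, which is why the wideband codec can ride on the existing wireless links.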

Orange is concerned that if even one of these elements (handsets, network, or servers) does not support HD Voice, users will be disappointed by the overall experience.

Connecting the HD Islands

The deployment of HD Voice led to the creation of islands using different codecs. VOIP applications use the G.722 codec. Mobile service providers use AMR-WB in GSM networks and 4GV-WB in CDMA networks. Microsoft and Skype push their own proprietary voice codecs. Unfortunately, it does not look like one codec is going to win (you can also read my post ‘How Many Codecs does Unified Communication Really Need?’ that addresses this issue), and transcoding among different HD Voice formats will be needed in the foreseeable future to connect HD Voice islands.

Service provider Voxbone offers an HD Voice interconnect service based on the G.722 codec; the service includes transcoding from G.711, G.729, and SILK. The adoption of URI dialing has been slow and Voxbone is therefore using E.164 dialing. They got a new country code +883 from ITU (similar to +1 for USA and +49 for Germany), and now offer their iNum interconnect service in 49 countries. Since they follow the traditional SP origination-termination model, mobile and other service providers can easily connect to Voxbone’s iNum service.

Nobody seemed to like transcoding at the HD Communications Summit, and Orange encouraged other markets/networks to adopt AMR-WB and avoid transcoding. The issue is that AMR-WB is not royalty free (it includes IPR from Nokia, Ericsson, FT, and VoiceAge), and adopting it in other market segments would increase the cost of doing business. AMR-WB is also optimized for transmitting voice over low-bandwidth wireless links, while wired networks have moved past that and are heading towards much higher (full-band audio) quality and towards transmission of music and natural sounds.

Wired networks will move very quickly from 7 kHz to 14 kHz to 20 kHz audio; Polycom is already shipping products that support these options. We strongly believe that a new era of royalty-free licensing has dawned on us, and that is why we have a royalty-free license for all voice codecs that include Polycom IPR: Siren 7, Siren 14, Siren 22, and the corresponding G.722.1, G.722.1 Annex C, and G.719. In fact, we cover all of these technologies in a single licensing agreement that makes the whole licensing process a breeze. Since G.719 is a joint development of Polycom and Ericsson, we worked with our partner to make sure G.719 is completely royalty-free, which today is the only way to assure wide codec adoption.

The Future of HD Voice

The key argument of my presentation ‘Visions of the HD Future’ at the Summit was that wireless voice networks will follow the same pattern as wired voice networks, with some delay due to the slower increase of bandwidth on wireless links, and will gradually move from wideband codecs such as AMR-WB to super-wideband codecs to the ultimate full-band quality (20 kHz) available in Siren 22 and G.719. At that point, HD voice will become what I call ‘Ultra High Definition Audio’ and will be used for much more than just talking. There are many vertical applications, for example in healthcare or in the arts and humanities space, that have been using Polycom full-band audio technology in wired networks. Adding mobility to such applications is definitely something our industry should strive for. Have a look at the slides and let me know if you have comments or questions!

In summation, the third HD Communications Summit was a great opportunity to meet some old friends in the voice industry and some new people, to check deployment progress, and compare notes on where the voice industry is going. I liked the healthy balance between fixed and wireless networks in all discussions. I wish presenters would use more audio and video files to support their points; I was in a definite minority, playing multiple video and audio clips during my presentation.

The next HD Communications Summit will be held on May 12, 2010 in the Computer History Museum in Silicon Valley, California. The location is almost walking distance from my office, so I will definitely attend, and I hope to meet some of you there!

Wednesday, March 17, 2010

Will the US Stimulus Package Lead to Wider Video Adoption?

When I first heard about the priorities of the US stimulus package (the official name is American Recovery and Reinvestment Act, or ARRA), I was very hopeful that it would drastically improve the broadband infrastructure and pave the way for wider adoption of video communication across the United States. Video, and to a lesser extent wideband audio, requires quite a lot of bandwidth combined with some quality of service requirements. For example, packet loss should not be more than 5%, jitter should not be more than 40ms, and latency … well, latency is negotiable but a really nice real-time interaction calls for a maximum of 250ms end-to-end.
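
The three thresholds above are easy to turn into a simple link check. The sketch below is only an illustration: the limits are the ones quoted in this post, while the `LinkStats` class, the `qos_violations` function, and the sample measurements are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

# Limits quoted in the post.
MAX_PACKET_LOSS_PCT = 5.0
MAX_JITTER_MS = 40.0
MAX_LATENCY_MS = 250.0


@dataclass
class LinkStats:
    """Measured characteristics of a network path (hypothetical structure)."""
    packet_loss_pct: float
    jitter_ms: float
    latency_ms: float


def qos_violations(stats):
    """Return the names of the limits that the measured link exceeds."""
    problems = []
    if stats.packet_loss_pct > MAX_PACKET_LOSS_PCT:
        problems.append("packet loss")
    if stats.jitter_ms > MAX_JITTER_MS:
        problems.append("jitter")
    if stats.latency_ms > MAX_LATENCY_MS:
        problems.append("latency")
    return problems


if __name__ == "__main__":
    # Made-up numbers standing in for a high-latency rural connection.
    rural_link = LinkStats(packet_loss_pct=2.0, jitter_ms=35.0, latency_ms=600.0)
    print(qos_violations(rural_link))
```

An empty list means the link meets all three limits; anything else names the parameter that would hurt real-time video.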

I live in a large city which is part of a large metropolitan area of 5-6 million people, and I do have choices among cable, xDSL, FTTH, etc. to get high-speed access to the IP network. In addition, Polycom’s office is not far away and once I connect to the corporate network, I can use much faster and more predictable links to connect to other Polycom offices around the world. But what if I lived in a remote rural area? What if I could only get a modem or satellite connection, or connect through the packet service of a mobile network? I would not be able to use video communication, at least not at a quality level that makes it useful, and even wideband audio would be a challenge.

A huge part of the US population cannot use video communication because the broadband access network just does not support this application, and the stimulus money spent on broadband initiatives should improve the situation. Wouldn’t it be great to allow patients at remote locations to access the best specialists over video, and rural schools to connect to world-class education institutions such as the Manhattan School of Music and teach music over advanced audio-video technology?

But how does the stimulus package apply to broadband access? The National Telecommunications and Information Administration (NTIA) established the Broadband Technology Opportunities Program (BTOP) which makes available grants for deploying broadband infrastructure in ‘unserved’ and ‘underserved’ areas in the United States, enhancing broadband capacity at public computer centers, and promoting sustainable broadband adoption projects. The Rural Utilities Service (RUS) has a program called BIP (Broadband Initiatives Program); it extends loans, grants, and loan/grant combinations to facilitate broadband deployment in rural areas. When NTIA or RUS announce a Notice of Funds Availability (NOFA), there is a lot of excitement in the market.

I am actually less interested in the logistics of fund distribution and more concerned about the ‘broadband service’ definition used in all NOFA documents. It originates from the Federal Communications Commission (FCC) and stipulates that ‘broadband service’ is everything above 768 kilobits per second downstream (i.e. from service provider to user) and 200 kilobits per second upstream (i.e. from user to service provider). Real-time video requires symmetric bandwidth, although video systems would adjust the audio and video quality level depending on the available bandwidth in each direction. At the minimum ‘broadband service’ level defined above, the user could see acceptable video quality coming from the far-end but would be able to only send low-quality video to the far-end.

I understand that when the broadband service definition was discussed at the FCC, the wire-line companies wanted higher limits, in line with what cable and xDSL technology can support, while wireless companies wanted far lower limits, like the ones adopted, so that they can play in broadband access as well. The FCC decided to set the bar low enough for everyone to be able to play but allow competition in offering higher speeds. There is a fair amount of skepticism that this model will get us to higher speeds than the defined minimums. Several organizations including Internet2 proposed a two-tier approach with a lower broadband service limit set for households and a higher limit set for institutions/organizations; however, the FCC’s final definition did not recognize that broadband for institutions is different from broadband for end users.

At Polycom, we take network bandwidth limitations very seriously, and have been working on new compression techniques that reduce bandwidth usage for video communication. This resulted in the implementation of the H.264 High Profile which I described in detail in my previous post. And while we can now compress Standard Definition video to about 128 kilobits per second, the additional bit rate necessary for good quality audio and the IP protocol overhead still do not allow us to fit into the very thin 200-kilobits-per-second pipe. Don’t forget that bandwidth is not the only requirement for real-time video; latency, jitter and packet loss are very important and none of these parameters is explicitly required or defined in any NTIA or RUS documents.
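
To illustrate why 128 kilobits per second of video does not fit into a 200-kilobits-per-second upstream, here is a rough budget in Python. The video rate and the upstream limit come from this post; the audio rate (48 kilobits per second) and the packet overhead (20%) are placeholder assumptions for the sake of the arithmetic, not measured Polycom figures.

```python
VIDEO_KBPS = 128            # SD video with H.264 High Profile, from the post
UPSTREAM_LIMIT_KBPS = 200   # minimum 'broadband service' upstream, from the post

# Assumptions for illustration only (not from the post):
ASSUMED_AUDIO_KBPS = 48     # a good-quality wideband audio stream
ASSUMED_OVERHEAD_PCT = 20   # IP/UDP/RTP headers as a share of the payload


def total_kbps(video_kbps, audio_kbps, overhead_pct):
    """Media payload plus protocol overhead, in kilobits per second."""
    payload = video_kbps + audio_kbps
    return payload * (100 + overhead_pct) / 100


def fits(total, limit_kbps):
    """True if the call's total bit rate fits within the link limit."""
    return total <= limit_kbps


if __name__ == "__main__":
    total = total_kbps(VIDEO_KBPS, ASSUMED_AUDIO_KBPS, ASSUMED_OVERHEAD_PCT)
    print(f"Total upstream needed: {total:.1f} kbps")
    print("Fits in 200 kbps:", fits(total, UPSTREAM_LIMIT_KBPS))
```

Under these assumed numbers the call needs a little over 200 kilobits per second, so it misses the minimum upstream rate even before latency, jitter, and packet loss are considered.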

Tuesday, March 9, 2010

H.264 High Profile: The Next Big Thing in Visual Communications / Telepresence

High Definition (HD) video led to a rapid and total transformation of the visual communication market. It made visual communication much more attractive, and demand for mass deployment in organizations of any kind and size increased. The dilemma of CIOs today is how to meet user demand for HD communication while not breaking the bank for network upgrades.

Now that video systems support HD up to 1080p quality and the infrastructure is scalable and robust enough to support large HD deployments, network bandwidth remains the last limiting factor to mass deployment of HD video across organizations. Most CIOs are still not comfortable letting 1+ megabit per second HD video calls flood their IP networks. Timing is therefore perfect for a new compression technology breakthrough that dramatically decreases the bandwidth required to run HD and high-quality video.

While H.264 is a well-established and widely implemented standard for video compression, the much simpler and less efficient Baseline Profile is used for visual communication applications today. H.264 however offers more sophisticated profiles, and the High Profile delivers the most efficient compression, in many cases reducing the network bandwidth for a video call by up to 50%. Polycom’s announcement about the support of H.264 High Profile across its video solutions is therefore exactly what the market needs right now. This technology breakthrough not only enables a drastic reduction of the network resources necessary to video-enable organizations but also allows CIOs to meet budget challenges and power more visual communication with fewer resources, thus limiting or avoiding costly network upgrades.
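
A simple capacity estimate shows what “up to 50%” can mean for a CIO. In the sketch below, the 1 megabit per second HD call and the 50% best-case saving are taken from this post; the 10 megabits per second link budget and the function names are assumptions made up for the example, and the calculation ignores audio and protocol overhead.

```python
BASELINE_CALL_KBPS = 1_000   # "1+ megabit per second" HD call, from the post
BEST_CASE_SAVING_PCT = 50    # "up to 50%" reduction, from the post
ASSUMED_LINK_KBPS = 10_000   # hypothetical bandwidth reserved for video


def high_profile_rate(baseline_kbps, saving_pct):
    """Per-call bit rate after applying a percentage saving."""
    return baseline_kbps * (100 - saving_pct) // 100


def calls_on_link(link_kbps, per_call_kbps):
    """Number of simultaneous calls that fit on the link."""
    return link_kbps // per_call_kbps


if __name__ == "__main__":
    hp_rate = high_profile_rate(BASELINE_CALL_KBPS, BEST_CASE_SAVING_PCT)
    print("Baseline Profile calls:", calls_on_link(ASSUMED_LINK_KBPS, BASELINE_CALL_KBPS))
    print("High Profile calls (best case):", calls_on_link(ASSUMED_LINK_KBPS, hp_rate))
```

In the best case the same link carries twice as many HD calls; real savings depend on content and resolution, so this is an upper bound rather than a guarantee.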

In my view, the shift from Baseline Profile to High Profile is bigger and more important than the previous video technology breakthrough—the much-heralded shift from H.263 to H.264 in 2003. The gains in performance for High Profile are consistent across the full bandwidth spectrum, while the incremental gain for H.264 over H.263 was limited to the lower bandwidths, below 512 kilobits per second. As a result, new High Definition systems benefit the most from High Profile, and this new technology will accelerate the adoption of HD communication across organizations.

I have received a lot of questions about the H.264 High Profile: How is the High Profile different from other H.264 profiles? What is the impact of this new capability on the visual communication market? How will customers benefit from it? How will this technology help CIOs roll out video across organizations? How does High Profile interact with other video functions? What is its role in the Polycom Open Collaboration Network?

Answering these questions online would have resulted in a very long blog post, so I put together a white paper that looks at High Profile from both business and technology perspectives. I called it “H.264 High Profile: The Next Big Thing in Visual Communications”. Let me know what you think about it.