Thursday, December 17, 2009

Qumu Customer Advisory Council 2009

I had the pleasure of attending Qumu’s Customer Advisory Council in late November 2009. The event was co-located with the Streaming Media West conference and included a select group of Qumu customers and partners. I will try to summarize the key issues discussed at this meeting.

The inflection point of enterprise video streaming seems very close. The original application for enterprise streaming was broadcasting live events, for example, executive presentations or live corporate training sessions; similar to watching TV at home, this application is about highly produced video media that is consumed by a large number of people. Now companies are experimenting with applications that go beyond live event replacement, for example, using video to capture information, sharing video content with other employees, and video communication. Streaming video is becoming an enterprise communication tool that puts more importance on the content’s relevance to the job than on the production quality of the video. Similar to video conferencing, streaming video competes for mind share and for space in the employee’s workflow. Corporate users have 5 applications and a browser but no time to learn new applications, so the question remains: “How do we bring video, both streaming and real-time, to every employee in the company?”

On the business side, the current recession leads to capital expenditure constraints, and enterprise customers look for alternative business models such as term licenses and managed services. At the same time, large enterprises have their own teams of developers who want open Application Programming Interfaces (APIs) so that they can customize streaming and content management solutions.

Several customer presentations described how companies use enterprise streaming and content management. Some use a resolution of 320x240 pixels – less than the Common Intermediate Format (CIF) used in old video conferencing systems. The quality is sufficient for “talking head” applications but, if content/slides are shared, this resolution does not provide enough detail. Other enterprises use 400x300 pixels with the option to switch to a higher resolution (512x364 was mentioned) when slides are shared.

Since many of the same companies use the latest generation of telepresence systems that provide HD 720p (1280x720 pixels) and HD 1080p (1920x1080 pixels) video, they are looking for ways to connect telepresence and video streaming by bridging the quality gap. Some bridges between telepresence/video conferencing and content management/streaming already exist; for example, there is a close integration of Qumu Video Control Center (VCC) and the Polycom RSS 4000 recorder. The initial implementation allows VCC to access RSS 4000 periodically (the polling interval is set by the administrator) and retrieve recorded videoconference calls. The next level of integration is based on a new Discovery Service that allows VCC to find calls that are being recorded on RSS 4000 even before the recording is complete. The benefit of this new function is that live content can be discovered automatically, appear on the portal, and be streamed on the fly without any advance scheduling.
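The polling-based integration can be sketched as a simple loop (the function and data names here are hypothetical illustrations, not the actual RSS 4000 or VCC API):

```python
# Sketch of the polling integration: at a configurable interval, VCC asks
# the recorder for completed calls and publishes them to the portal. The
# Discovery Service removes this delay by exposing calls still in progress.
# (Hypothetical function names; not the real product API.)
import time

def poll_recorder(fetch_completed_calls, publish, interval_s=300, cycles=1):
    """Every `interval_s` seconds, pull completed recordings and publish
    them. With pure polling, a recording only appears on the portal after
    the next polling cycle following its completion."""
    for _ in range(cycles):
        for call in fetch_completed_calls():
            publish(call)
        time.sleep(interval_s)

# Demo with a stubbed recorder and an in-memory "portal":
published = []
poll_recorder(lambda: ["board-meeting.mp4"], published.append,
              interval_s=0, cycles=1)
print(published)   # -> ['board-meeting.mp4']
```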

In its latest version, VCC is getting a new content repository user interface, which allows easier posting/uploading of Employee Generated Content (EGC), similar to YouTube. But why use VCC instead of YouTube? Enterprises look for secure deployments, password-based access, a defined approval workflow (what can be posted), real-time reporting (who watches what content), and customization (colors and logo). VCC meets these requirements today, while additional work is planned around the approval workflow. The issue with approval workflows is not technical but organizational. Companies have different policies and processes for handling content posting. While reviewing a document before posting is relatively easy and fast, watching hours of video is time-consuming and inefficient. A creative approach is to allow posting in general and review only videos that have been watched by a certain number of people (let’s say 20). Automatic indexing of the video file is an important function because it allows fast search and fast forwarding to the relevant section. While existing speech recognition technology is not good enough for full transcripts, it is great for indexing video files.
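The “review after N views” idea can be captured in a few lines. This is a minimal sketch with a hypothetical data model, not Qumu’s actual workflow:

```python
# Sketch of the "review after N views" approach: employee videos are
# published immediately, and a video is queued for human review only once
# enough people have watched it (threshold of 20, per the example above).
# (Hypothetical data model, not an actual VCC API.)

REVIEW_THRESHOLD = 20

class VideoPortal:
    def __init__(self):
        self.views = {}          # video id -> view count
        self.review_queue = []   # videos awaiting human review

    def record_view(self, video_id):
        self.views[video_id] = self.views.get(video_id, 0) + 1
        if self.views[video_id] == REVIEW_THRESHOLD:
            # Popular enough to justify a reviewer's time.
            self.review_queue.append(video_id)

portal = VideoPortal()
for _ in range(20):
    portal.record_view("town-hall-q4")
print(portal.review_queue)   # -> ['town-hall-q4']
```

The design choice is that reviewer effort is spent only on content people actually watch, instead of gating every upload.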

Qumu works very closely with Microsoft and delivers an exceptional experience based on the MS Silverlight platform. The VCC user interface now includes a “carousel” that allows you to select programs (that is, individual recordings – Qumu uses broadcasting terminology) from a list, while the “jukebox” allows you to search for a video in your library (it includes all videos that you have access to). The updated media player, based on Silverlight, includes layouts displaying multiple live video streams and content (slides).

Microsoft has moved the streaming function from Windows Media Server (WMS) to MS Internet Information Services (IIS), starting with IIS version 7. This move includes a radical change in the streaming technology. While WMS uses the Real Time Streaming Protocol (RTSP), IIS adds a new technology called “smooth streaming”. Video content is encoded at multiple bit rates and split into 2-second fragments. The fragments are transmitted via the Hypertext Transfer Protocol (HTTP), and players consume the highest bit rate the network can support. The obvious benefit is that HTTP can traverse firewalls, but I learned about several additional benefits. For example, WAN optimization servers have large HTTP caches that can be used efficiently for video. The alternative – creating a separate cache for RTSP video – requires changes in existing networks. Another great benefit is that IIS is not Windows Media specific, and smooth streaming can be used for H.264 and other video formats. Like any technology, smooth streaming has some shortcomings; for example, it supports only unicast, that is, point-to-point connections. Unicast is great for video-on-demand, where a relatively small number of users watch the same video. Live event broadcasts, however, require streaming the same video to possibly thousands of users, and are served better by multicast technology. Multicast is supported in Windows Media Server, and this server will continue to be used for broadcasting live events. Streaming and content management applications such as Qumu VCC will continue to support WMS until the new “smooth” technology is proven and widely deployed.
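The client side of this scheme can be illustrated with a short sketch. The bit-rate ladder and URL layout below are hypothetical illustrations, not Microsoft’s actual smooth streaming API:

```python
# Illustrative sketch of adaptive HTTP streaming from the player's side:
# the same 2-second fragment exists at several bit rates, and the player
# requests the highest rate the measured bandwidth can sustain.
# (Hypothetical rates and URL scheme, not the real IIS interface.)

AVAILABLE_RATES_KBPS = [350, 700, 1500, 2500]  # encoded bit-rate ladder

def pick_rate(measured_kbps, rates=AVAILABLE_RATES_KBPS, headroom=0.8):
    """Choose the highest encoded rate that fits within the measured
    bandwidth, keeping some headroom to avoid rebuffering."""
    budget = measured_kbps * headroom
    candidates = [r for r in rates if r <= budget]
    return candidates[-1] if candidates else rates[0]

def fragment_url(stream, rate_kbps, index):
    # Each 2-second fragment is an ordinary HTTP resource, which is why
    # any HTTP cache (e.g. in a WAN optimizer) can serve it.
    return f"http://example.com/{stream}/{rate_kbps}kbps/fragment{index}.ismv"

# Example: the network currently delivers about 2 Mbps.
rate = pick_rate(2000)                      # -> 1500
url = fragment_url("keynote", rate, 42)
```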

The use of multiple Content Delivery Networks (CDNs), both internal and external, is becoming the norm for enterprise customers. Traditionally, internal CDNs, built behind the corporate firewall, were used to distribute content to company-internal users, while external CDNs on the public Internet were used to distribute content to users outside the company. Qumu’s new Software as a Service (SaaS) capabilities allow enterprises to reach both internal and external audiences by using connections to Akamai’s or AT&T’s Internet-based CDNs as publishing points alongside internal CDNs. SaaS therefore allows customers to implement scalable video streaming without building out their own CDNs behind the firewall. Note that even though SaaS uses external CDNs for video distribution, video content creation remains behind the firewall. For example, company-internal videoconference sessions can be recorded internally and then streamed to internal and external parties.

The mix of internal and external users leads to a fundamental security question: how do you provide appropriate video content access to employees, partners, and other external users? User authentication is critical, and user data is typically spread over dozens of databases in the enterprise. Fortunately, most databases support the Lightweight Directory Access Protocol (LDAP) and can be accessed by LDAP-enabled applications such as Qumu VCC. Since LDAP configurations in enterprises are becoming very complex, applications have to support numerous LDAP servers as well as Nested LDAP Groups, that is, groups that include sub-groups of entries.
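Resolving nested group membership boils down to a recursive walk. The sketch below models the directory as a plain dict; a real deployment would of course query the LDAP server itself (e.g. via the python-ldap or ldap3 libraries), and the group names here are hypothetical:

```python
# Sketch of resolving Nested LDAP Groups: a user has access if they are a
# direct member of a group or a member of any sub-group, recursively.
# (Directory modeled as a dict for illustration only.)

directory = {
    "cn=video-viewers": ["cn=engineering", "uid=alice"],
    "cn=engineering":   ["cn=media-team", "uid=bob"],
    "cn=media-team":    ["uid=carol"],
}

def is_member(user, group, directory, seen=None):
    """Recursively check group membership, guarding against group cycles."""
    seen = seen or set()
    if group in seen:            # a mis-configured group may contain itself
        return False
    seen.add(group)
    for entry in directory.get(group, []):
        if entry == user:
            return True
        if entry in directory and is_member(user, entry, directory, seen):
            return True
    return False

# carol is two groups deep but still resolves as a viewer:
print(is_member("uid=carol", "cn=video-viewers", directory))  # -> True
```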

In summary, Qumu’s Customer Advisory Council provided a great overview of the requirements for enterprise video streaming and a series of valuable customer perspectives. I am looking forward to CAC 2010!

Monday, November 9, 2009

Hardware Architecture for Conference Servers

The ATCA Summit (October 27-29, 2009) was a rare opportunity to think about and discuss the importance of hardware for our industry. As the communication industry becomes more software-driven, major industry events have focused on applications and solutions, and I rarely see good in-depth sessions on hardware. The ATCA Summit provided a refreshing new angle on communication technology.

First of all, ATCA stands for Advanced Telecom Computing Architecture and is a standard developed by the PICMG – a group of hardware vendors with a great track record of defining solid hardware architectures: PCI, Compact PCI, MicroTCA, and ATCA. The ATCA Summit is the annual meeting of the ATCA community, or ecosystem, which includes vendors making chassis, blades, fans, power supplies, and other components that can be used as a tool kit to build a server quickly. Time-to-market is definitely an important reason companies turn to ATCA instead of developing their own hardware, but equally important is that this telecom-grade (carrier-grade) hardware architecture provides very high scalability, redundancy, and reliability.

So, how does ATCA relate to visual communications? As visual communication becomes more pervasive and business critical, both service providers (offering video services) and large enterprises (running their own video networks) start asking for more scalability and reliability in the video infrastructure. The core of the video infrastructure is the conference server (MCU), and the hardware architecture used in that network element has a direct impact on the ability to support large video networks. HD video compression is very resource-intensive: raw HD video is about 1.5 gigabits per second, and modern H.264 compression technology can get it down to under 1 megabit per second. This 1500-fold compression requires powerful chips (DSPs) that generate a lot of heat; therefore, the conference server hardware must provide efficient power (think AC and DC power supplies) and cooling (think fans). But even in compressed form, video still uses a lot of network bandwidth, and the conference server is the place where all video streams converge. Therefore, conference servers must have high input and output capabilities (think Gigabit Ethernet). Finally, some sort of blade architecture is required to allow for scalability, and server performance heavily depends on the way these blades are connected. The server component that connects the blades is referred to as the ‘backplane’, although it does not need to be physically in the back of the server. The ATCA architecture was built from the ground up to meet these requirements. It was created with telecom applications in mind and therefore has high input/output throughput, great power management and cooling, and many mechanisms for high reliability.
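The ~1500-fold figure is easy to verify with back-of-the-envelope arithmetic (assuming uncompressed 24-bit RGB at 1080p30; chroma-subsampled YUV 4:2:0 would halve the raw rate):

```python
# Back-of-the-envelope check of the ~1500x compression claim:
# raw 1080p30 video versus a ~1 Mbps H.264 stream.
width, height = 1920, 1080
bits_per_pixel = 24        # uncompressed RGB; YUV 4:2:0 would be 12
fps = 30

raw_bps = width * height * bits_per_pixel * fps
print(raw_bps / 1e9)       # about 1.49 Gbps of raw video

compressed_bps = 1e6       # roughly 1 Mbps after H.264 compression
print(raw_bps / compressed_bps)   # compression ratio of roughly 1500x
```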

The highlight of the ATCA Summit is always the Best of Show award. This year, the Polycom RMX 4000 won Best of Show for infrastructure product, and I had the pleasure of receiving the award. I posted a picture from the award ceremony. Subsequently, I presented in the session ‘The Users Talk Back’ and addressed the unique hardware functions in RMX 4000 that led to this award.

So, why did Polycom RMX 4000 win? I think it is mostly the elegant engineering design and pragmatic decisions about how to leverage a standard hardware architecture to achieve unprecedented reliability. It starts with a high-throughput, low-overhead backplane (which we call a ‘fabric switch’) that allows video to flow freely across blades. This allows conferences to use resources from any of the blades. To illustrate the importance of this point, let’s briefly compare RMX 4000 to the Tandberg MSE 8000, which combines 9 blades into a chassis but does not have a high-throughput backplane. Since video cannot flow freely among blades in MSE 8000, conferences are restricted to the resources available on just one of the 9 blades. For example, if blade 1 supports 20 ports but 15 of them are already in use, you can only create a 5-party conference on that blade. If you need to start a 6-party conference, you cannot use blade 1 and have to look for another blade – let’s say blade 2 – that has 6 free ports. The 5 ports on blade 1 will stay idle until there is a conference of 5 or fewer participants. In fact, the ‘flat capacity’ software that runs on top of this hardware leads to even worse resource utilization on MSE 8000, but this article is about hardware, so I am not going into that subject. The bottom line is that, with RMX 4000, you can connect 5 participants to one blade and the sixth participant to another blade, without even noticing.

Additional reliability can be gained by using DC power and full power supply redundancy. Direct Current (DC) power is used internally in all electronic equipment. However, power comes as Alternating Current (AC) over the power grid because AC power loss over long distances is lower than DC power loss. Once power reaches the data center, it makes sense to convert it once to DC and feed it to all servers, which is why service providers and large enterprises running their own data centers like DC power. The alternative approach – providing AC power to each server and having each server convert it to DC – results in high conversion power loss, is basically a waste of energy, and should only be used if DC power is not available. RMX 4000 supports both AC and DC power, but I am much more excited about the new DC power option. Each DC power supply provides 1.5 kW and can power the entire RMX 4000. Best practice is to connect one DC power supply to the data center’s main power line and the second one to the battery array. Data centers have huge battery arrays that can keep them running even if the primary power line is down for hours or even days.

Reliability issues may arise from mixing media (video/audio) and signaling/management traffic, and therefore RMX 4000 completely separates these two types of traffic internally. This architectural approach also benefits security, since attacks against servers are usually about getting control of the signaling to manipulate the media. By clearly separating the two, RMX 4000 makes hijacking the server from outside impossible. Note that hijacking of voice conference servers is a major problem for voice service providers, a topic I have written about before. As visual communication becomes more pervasive and business critical, similar issues can be expected in this space as well, and RMX 4000 is designed for that more dangerous future.

Finally, if a component in the conference server fails, it is critical that it can be replaced without disconnecting all calls and shutting down the server, thus preserving server-level reliability. All critical components in RMX 4000 are therefore hot swappable. This includes the four media blades (in the front of the chassis, hosting the video processing DSPs), the RTM LAN modules (on the back of the chassis, connecting to the IP network), the RTM ISDN modules (also on the back, connecting to the ISDN network), the power supplies, and the fans. Each of these components can be removed and replaced with a new one while the RMX 4000 server is running.


I will discuss the topics of network-based redundancy and reliability in a separate article. Stay tuned!

Sunday, October 25, 2009


The results from the telepresence interoperability demo were discussed on October 7 in the session “Telepresence Interoperability is Here!” Bob Dixon used visual and sound effects (including love songs and Hollywood-style explosions) to explain interoperability to people who are less involved in the topic. His presentation inspired me to write about telepresence interoperability for a less technical, more general audience. (I hope that my series of blog posts achieved that.) Bob highlighted that this was not only the first multi-vendor telepresence interoperability demo but also the first time systems on Internet2, the Commodity Internet, and Polycom’s, Tandberg’s, and IBM’s networks successfully connected.

Gabe connected through an HDX video endpoint to RSS 2000 and played back some key parts of the recording from the interoperability demos on October 6. I was pleasantly surprised by how much information the RSS 2000 captured during the demos. I later found out that Robbie had created a special layout using the MLA application on RMX 2000, and this layout allowed us to see multiple sites in the recording.

Robbie (over video from Ohio State) commented that connecting the telepresence systems was the easier part while modifying the layouts turned out to be more difficult. He was initially surprised when RMX/MLA automatically associated video rooms 451, 452, and 453 at Ohio State into a telepresence system but then used this automation mechanism throughout the interoperability tests.

Jim talked about the need to improve usability.

Gabe talked about monitoring the resources on RMX 2000 during the tests and reported that it never used more than 50% of its resources.

I talked mainly about the challenges to telepresence interoperability (as described in Part 2) and about the need to port some of the unique video functions developed in H.323 into SIP, the protocol used in Unified Communications.

Bill (over video from IBM) explained that his team had been testing video interoperability for a year. The results are used for deployment decisions within IBM but also for external communication. IBM is interested in more interoperability among vendors.

During the Q&A session, John Chapman spontaneously joined the panel to answer questions about the demo call to Doha and about the modifications of their telepresence rooms to make them feel more like classrooms.

The Q&A session ran over time, and a number of attendees stayed afterward to continue the discussion with the panelists.

There was a consensus in the room that the telepresence interoperability demo was successful and very impressive. This success proves that standards and interoperability are alive and can connect systems from different vendors running on different networks. The series of tests was also a great teamwork experience in which experts from several independent, sometimes competing, organizations collaborated towards a common goal.

Back to beginning of the article ...

Friday, October 23, 2009


The demo on October 6 was the first immersive telepresence demo at Internet2. Note that Cisco showed their CTS 1000 telepresence system at the previous Internet2 conference; however, that system has only one screen and feels more like an HD video conferencing system than an immersive telepresence system. Also, the Cisco demo was on stage and far away from viewers, while the TPX demo was available for every conference attendee to experience.

The following multi-codec systems participated in the telepresence interoperability demo:
- Polycom TPX HD 306 three-screen system in the Chula Vista Room, Hyatt Regency Hotel
- Polycom TPX HD 306 three-screen system located in Andover, Massachusetts
- LifeSize Room 100 three-screen system located at OARnet in Columbus, Ohio
- Polycom RPX 200 at iFormata in Dayton, Ohio
- Polycom RPX 400 at IBM Research in Armonk, NY
- Tandberg T3 three-screen system located in Lisbon, Portugal (the afternoon demos were too late for Rui, and Bill connected a T3 system in New York instead)

The systems were connected either to the Polycom RMX 2000 located at Ohio State University in Columbus, Ohio, or to the Tandberg Telepresence Server at IBM Research in Yorktown Heights, NY.

As for the setup in Chula Vista, TPX comes with 6 chairs, and an additional 30 chairs were arranged in several rows behind the system. There was enough space for people to stand in the back of the room.

I can only share my experience sitting in the TPX system in the Chula Vista Room. I am sure other participants in the demo have experienced it a little differently. I was tweeting on the step-by-step progress throughout the demos.

The final test plan included both continuous presence scenarios and voice-switching scenarios. Voice switching is a mechanism widely used in video conferencing: the conference server detects who is speaking, waits 2-3 seconds to make sure it is not just noise or a brief comment, and then starts distributing video from that site to all other sites. The twist – when telepresence systems are involved – is that not just one but all 2, 3, or 4 screens belonging to the ‘speaking’ site must be distributed to all other sites. Voice-switched tests worked very well; sites appeared as expected.
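The hold-off behavior described above can be sketched in a few lines. This is a simplified model with a hypothetical API, not the actual RMX switching logic:

```python
# Minimal sketch of voice-switching hold-off: the server switches the
# broadcast site only after the same site has been the loudest speaker
# continuously for a few seconds (2.5 s here), filtering out noise and
# brief interjections. (Hypothetical API for illustration.)

HOLD_OFF_SECONDS = 2.5

class VoiceSwitcher:
    def __init__(self):
        self.active_site = None      # site whose screens everyone sees
        self.candidate = None        # site currently loudest
        self.candidate_since = None  # when it became loudest

    def on_audio_sample(self, loudest_site, now):
        """Called periodically with the currently loudest site and a
        timestamp in seconds; returns the site being broadcast."""
        if loudest_site != self.candidate:
            self.candidate = loudest_site
            self.candidate_since = now
        elif (self.candidate != self.active_site
              and now - self.candidate_since >= HOLD_OFF_SECONDS):
            # Sustained speech: switch all of this site's screens live.
            self.active_site = self.candidate
        return self.active_site

sw = VoiceSwitcher()
sw.on_audio_sample("Lisbon", 0.0)         # loudest, but not yet switched
sw.on_audio_sample("Lisbon", 1.0)         # still inside the hold-off
print(sw.on_audio_sample("Lisbon", 3.0))  # -> Lisbon (sustained for 3 s)
```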

Continuous presence – another technology used in video conferencing – allows the conference server to build customized screen layouts for each site. The layout can be manipulated by management applications; for example, RMX Manager and MLA manipulate the layouts on RMX 2000.

TPX performed flawlessly. On October 5, most calls were at 2 Mbps per screen due to some bottlenecks when crossing networks. This issue was later resolved, and on October 6 TPX connected at 4 Mbps per screen (a total of 12 Mbps). TPX was using the new Polycom EagleEye 1080 HD cameras that support 1080p @ 30fps and 720p @ 60fps. We used 720p @ 60fps, which provides additional motion smoothness.

About quality: The quality of multipoint telepresence calls on RMX 2000 was excellent. A video recorded in Chula Vista is posted online. In a few test cases, we connected the TPX systems directly to TTPS, and the quality decreased noticeably.

About reliability: In addition to the failure during the first test (described in Part 5), TTPS failed during the morning demo on October 6 (I was tweeting throughout the demo and have the exact time documented). RMX 2000 performed flawlessly.

About layouts: Since TTPS is advertised as a customized solution for multipoint telepresence, I expected that it would handle telepresence layouts exceptionally well. Throughout the demos, Robbie Nobel used the MLA application to control RMX 2000 while Bill Rippon controlled TTPS. In summary, RMX 2000 handled telepresence layouts better than TTPS. The video shows a layout created by RMX 2000; the T3 system is connected to RMX through TTPS. In comparison, when the telepresence systems were connected directly to TTPS, even the best layout was a patchwork covering a small portion of the TPX screens. I understand that, due to the built-in automation in TTPS, the user has limited ability to influence the layouts. While MLA includes layout automation, it also allows the user to modify layouts and select the best layout for the conference.

About capacity: TTPS is a 16-port box and each codec takes a port, so it can connect a maximum of five 3-screen systems or four 4-screen systems. Bill therefore could not connect all available systems to TTPS – the server simply ran out of ports. In comparison, RMX 2000 had 160 resources, and each HD connection took 4 resources, so RMX 2000 could connect a maximum of 40 HD codecs, i.e., thirteen 3-screen systems or ten 4-screen systems. RMX therefore never ran out of capacity during the demo.
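The capacity comparison above is simple integer arithmetic:

```python
# Capacity figures from the text, worked out explicitly.
ttps_ports = 16                                # one port per codec on TTPS
print(ttps_ports // 3, ttps_ports // 4)        # 5 three-screen, 4 four-screen

rmx_resources = 160
resources_per_hd_codec = 4
rmx_codecs = rmx_resources // resources_per_hd_codec   # 40 HD codecs
print(rmx_codecs // 3, rmx_codecs // 4)        # 13 three-screen, 10 four-screen
```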

The morning and lunch interoperability demos were recorded on a Polycom RSS 2000 recorder.

We ran three interoperability demos during the morning, lunch, and afternoon conference breaks. In addition, we managed to squeeze in two additional demos that highlighted topics relevant to Internet2 and the education community. In the first one, we connected the TPX in Chula Vista to the Polycom RPX 218 system at Georgetown University in Doha, Qatar on the Arabian Peninsula, and had a very invigorating discussion about the way Georgetown uses telepresence technology for teaching and learning. John Chapman from Georgetown University and Ardoth Hassler from the National Science Foundation joined us in the Chula Vista room. If you are interested in that topic, check out the joint Georgetown-Polycom presentation from the spring ’09 Internet2 conference. The discussion later turned to using telepresence technology for grant proposal review panels.

Another interesting demo was meeting Scott Stevens from Juniper Networks over telepresence and discussing how Juniper’s policy management engine interacts with Polycom video infrastructure to provide a high quality of experience for telepresence.

Throughout all interoperability and other demos, the Internet2 network performed flawlessly – we did not notice any packet loss, and jitter was very low.

Stay tuned for Part 8 with summary of the test and demo results …


Bringing a TPX to San Antonio required a lot of preparation. We had to find a room in the conference hotel Hyatt Regency that had enough space for the system and for additional chairs for conference attendees to see the demo.

Another important consideration for the room selection was proximity to the loading dock. The TPX comes in 7 large crates, and we did not want to move them all over the hotel. The truck also had to fit Hyatt’s loading dock.

It was critical to have the IP network on site up and running before the TPX system could be tested. Usually a lot of the work and cost is related to bringing a high-speed network connection to the telepresence system. This was not an issue at the Internet2 conference, since I2 brings 10 Gbps to each conference site. We needed only about 12 Mbps (approximately 0.1% of that) for TPX.

Timing was critical too. The Polycom installation team had to do the installation on the weekend, so that everything would work on Monday morning. The room that we identified was Chula Vista on lobby level. It was close to the loading dock and had enough space. The only issue was that the room was booked for another event on Wednesday, so TPX had to be dismantled on Tuesday, right after the last interoperability demo finished at 4:30pm.

Stay tuned for Part 7 about the telepresence interoperability demo at the Internet2 Conference on October 6, 2009 …

Thursday, October 22, 2009


At this point, the team was comfortable with the functionality of the RMX, TPX, and Room 100. Adding another infrastructure component – the Tandberg Telepresence Server – to the test bed increased complexity, but that was a risk we had to take in order to evaluate T3’s capabilities. It was also my first opportunity to see TTPS in action, and I was curious to find out what it could do. I knew that TTPS was a 16-port MCU and that it had some additional capabilities to support multi-screen telepresence systems. But I still did not understand what functionality differentiated it from a standard MCU.

The team’s first experiences were not great. The Tandberg Telepresence Server crashed during the first test in which it participated. It also had problems in what is called ‘Room Switched Continuous Telepresence’ mode: when a T3 site was on the TPX full screen and someone in the LifeSize Room 100 started talking, LifeSize was not shown full screen on TPX but remained in a small preview window at the bottom of the screen, with the border around it flashing red. We saw this behavior again during the interoperability demos on October 6.

However, as we worked with it, we found that cascading to TTPS from an RMX 2000 worked quite well. Gabe or Robbie configured TTPS as a three-screen telepresence system on RMX, while Bill configured RMX as a three-screen telepresence system on TTPS. And with every test, interoperability got better…

Stay tuned for Part 6 about the logistics around bringing a telepresence system to an industry event …


Then things started happening very fast. Tests were scheduled every week, sometimes twice a week, throughout September. Gabe and Robbie learned how to use the Multipoint Layout Application (MLA) that controls telepresence layouts on RMX 2000 and found out that if you name the codecs sequentially, e.g. Room451, Room452, Room453, RMX/MLA automatically recognizes that these codecs belong to the same multi-codec telepresence system.

The only setback was that we could not find a way around the ‘filmstrip’ generated by the Tandberg T3. It did not matter whether you connected Rui’s T3 directly to TPX or to Room 100 (point-to-point calls) or whether you connected T3 to RMX 2000; T3 always sent a ‘filmstrip’ to third-party systems. The only advice we got was that we needed a Tandberg Telepresence Server (TTPS) to reconstruct the original three images. Leveraging endpoints to sell infrastructure is not a new idea but, with all due respect to Tandberg, forcing customers to buy a Tandberg Telepresence Server just to get the original images generated by each of the three codecs in T3 is borderline proprietary, no matter if they use H.323 signaling or not.

In my blog post I have already argued that a standard conference server (MCU) can handle telepresence calls and that there is no need for a separate Telepresence Server. I looked at the comments following the post, and two of them (from Ulli and from Jorg) call for more products similar to the Tandberg Telepresence Server from other vendors. Now that I have some experience with TTPS, I am trying to imagine what would happen if Polycom and LifeSize decided to follow Tandberg’s example and develop TTPS-like servers – let’s call them Polycom Telepresence Server (PTPS) and LifeSize Telepresence Server (LSTPS). In this version of the future, the only way for telepresence systems from Polycom, LifeSize, and Tandberg to talk would be to cascade the corresponding Telepresence Servers. Calls would go TPX-PTPS-TTPS-T3 or TPX-PTPS-LSTPS-LS Room 100, i.e., we are looking at double transcoding plus endless manual configuration of cascading links. I really believe this separate-server approach represents a step backward on the road to interoperability.

Since we had no access to TTPS, Bob Dixon asked Bill Rippon from IBM Research if they could help. I have known Bill since January 2003. At the time, he was testing SIP telephones for a deployment at IBM Palisades Executive Briefing Center and hotel. I was product manager for SIP telephones at Siemens, and naturally very interested in getting the phones certified… Anyway, it was great to hear from Bill again. It turned out Bill had access not only to TTPS but also to an impressive collection of telepresence and other video systems, including Polycom’s largest telepresence system, a 4-screen RPX 400 in Armonk, NY.

Stay tuned for Part 5 about the Tandberg Telepresence Server …

Wednesday, October 21, 2009


I hoped that summer ’09 would be quieter than the extremely busy spring conference season, and I had great plans to write new white papers. But on June 29, Bob Dixon asked me if Polycom could take the lead and bring a telepresence system to the Internet2 meeting in San Antonio. He needed a real telepresence system on site to run live telepresence interoperability demos. I agreed in principle but asked for time to check if we could pull it off logistically. Installing any of the larger Polycom Real Presence (RPX) systems was out of the question – RPX comes with walls, floor, and ceiling, and it was not feasible to install an RPX for just 2 days of demos. The 3-screen TPX system was much more appropriate. I will discuss logistics in more detail in Part 6.

While I was gathering support for the idea within Polycom, Bob Dixon, Gabe Moulton, and Robbie Nobel (Gabe and Robbie are with Ohio State University) started tests with the LifeSize Room 100 systems and the RMX 2000 they had at OARnet. But they needed a TPX system similar to the one that would be installed in San Antonio. The best candidate was the North Church TPX in the Polycom office in Andover, Massachusetts, and I started looking for ways to support the test out of the Andover office.

In the meantime, Bob continued looking for other participants in the interoperability demo. Teliris declined participation. That was understandable since they could only connect through a gateway, with all the negative consequences of using a gateway.

Cisco has been making efforts to position itself as a standards-compliant vendor in the Internet2 community, and promised to show up for the test; they even talked about specific plans to upgrade their OEM gateway from RadVision to beta software that would allow better interoperability. However, when the tests were about to start in late August, they suddenly withdrew. I guess at this point they had made the decision to acquire Tandberg, and this had an impact on their plans for RadVision.

Tandberg seemed uncertain whether to participate. Initially, they expressed interest but, in the end, they opted not to participate. Given Tandberg’s history of actively championing interoperability, their decision not to participate in this forum seems inexplicable. Some have speculated that the decision was colored by the ongoing acquisition talks with Cisco. That may or may not be true, but it will be interesting to observe whether Tandberg’s enthusiasm for standards compliance dampens once the Cisco acquisition is finalized.

Anyway, we did not get any direct support from Tandberg, and we really needed access to a T3 room to expand the tests. That is when the Megaconference email distribution list came in handy. The list is a great tool for finding video resources worldwide, so on August 19, I sent a note asking for people interested in telepresence interoperability. Rui Ribeiro from FCCN in Portugal responded enthusiastically. He had a T3 system in Lisbon and wanted to participate. Due to the 5-hour time difference from the East Coast, including Lisbon in the tests meant testing only in the morning, which is a busy time for both people and telepresence rooms … but we needed Rui.

We scheduled the first three-way test – with Polycom, LifeSize, and Tandberg systems – for the first week of September. Everyone was available and rooms were booked, but it was not meant to happen. On the morning of the test day, my colleague Mark Duckworth, who was scheduled to support the test out of the TPX room in Andover, had a motorcycle accident and ended up in the hospital. The team was in shock and had to reschedule the test for the following week. Mark is doing well, and participated in the interoperability tests between doctors’ visits.

Stay tuned for Part 4 about the telepresence interoperability tests in summer 2009 …

Monday, October 19, 2009


The challenges around telepresence interoperability are related to both logistics and technology. Logistics are probably the bigger problem. Vendors usually conduct interoperability tests by gathering at an interoperability event, bringing their equipment to a meeting location, and running test plans with each other. This is the way IMTC manages H.323 interoperability tests and also the way SIPit manages tests based on the SIP protocol. While developers can pack their new video codec in a suitcase and travel to the meeting site, multi-codec telepresence systems are large and difficult to transport. A full-blown telepresence system comes on a large truck and takes substantial time to build – usually a day or more. Therefore, bringing telepresence systems to interoperability test events is out of the question.

An alternative way to test interoperability is for vendors to purchase each other’s equipment and run tests in their own labs. While this is an acceptable approach for $10K video codecs, it is difficult to replicate with telepresence systems that cost upwards of $200K. One could ask, “Why don’t you just connect the different systems through the Internet for tests?” The issue is that telepresence systems today run on fairly isolated segments of the IP network – mostly to guarantee quality but also due to security concerns – and connecting these systems to the Internet is not trivial. It requires rerouting network traffic and the use of video border proxies to traverse firewalls.

The technology challenges require a more detailed explanation. Vendors like HP and Teliris run closed proprietary telepresence networks, and their telepresence systems cannot talk directly to other vendors’ systems. There are of course gateways that can be used for external connectivity, but gateways mean transcoding, i.e., decreased quality, limited capacity, and reduced reliability of end-to-end communication. For those not familiar with the term ‘transcoding’, it is basically translation from one video format into another. Telepresence systems send and receive HD video at 2-10 megabits per second (Mbps) for each screen/codec in the system, and all that information has to go through the gateway and be translated into a format that standards-based systems can understand.

Some telepresence vendors state that they support standards such as H.323 or SIP (Session Initiation Protocol). However, standards compliance is not black-and-white, and telepresence systems can support standards and still not interoperate well with other vendors’ systems. When Cisco introduced its three-screen CTS 3000, they made the primary video codec multiplex three video streams – its own and the two captured by the other two codecs – into a single stream that traversed the IP network to the destination’s primary codec. Third-party codecs cannot understand the multiplexed bit stream, and that is basically why you cannot connect a Polycom, LifeSize, or Tandberg telepresence system to Cisco CTS. Note that Cisco uses SIP for signaling and therefore claims standards compliance; however, the net result is that third-party systems cannot connect. If you decide to spend more money and buy a gateway from Cisco, you can connect to third-party systems, but at a decreased video and audio quality that is far from the telepresence promise of immersive communication and replacement of face-to-face meetings. The discrepancy between the ‘standards compliance’ claim and the reality that its systems just do not talk to any other vendor’s has haunted Cisco since they entered the video market.

When Tandberg introduced its three-screen telepresence system T3, they made another technological decision that impacts interoperability. T3 combines the video streams from three codecs (one per screen) into one stream, and any non-Tandberg system that connects to T3 receives what we call a ‘filmstrip’, i.e., three small images next to each other. The ‘filmstrip’ covers maybe one-third of one screen (or one-ninth of the total screen real estate of a three-screen system). So, yes, you can connect to T3, but you lose the immersive, face-to-face feeling that is expected of a telepresence system. Note that T3 uses standard H.323 signaling to communicate with other systems, so it is standards-compliant; however, the result is that if you want to see the three images from T3 on full screens, you have to add an expensive Tandberg Telepresence Server (TTPS). I will discuss TTPS in more detail in parts 4 and 5.

To come back to my original point, due to a range of logistical and technological issues, establishing telepresence interoperability is quite a feat that requires serious vendor commitment and a lot of work across the industry.

Stay tuned for Part 3 about the organizational issues around telepresence interoperability testing …


On October 6, 2009, Bob Dixon from OARnet moderated a successful telepresence interoperability demonstration at the Fall Internet2 meeting in San Antonio, Texas. It included systems from Polycom, LifeSize, and Tandberg, and the short version of the story is in the joint press release. While the memories from this event are still very fresh, I would like to spend some time and reflect on the long journey that led to this success.

First of all, why is telepresence interoperability so important?

The video industry is built on interoperability among systems from different vendors, and customers enjoy the ability to mix and match elements from Polycom, Tandberg, LifeSize, RadVision and other vendors in their video networks. As a result, video networks today rarely have equipment from only one vendor. It was therefore natural for the video community to strive for interoperability among multi-screen/multi-codec telepresence systems.

Most experts and visionaries in our industry subscribe to the idea that visual communication will become as pervasive as telephony is today, and it has been widely recognized that the success of the good old Public Switched Telephone Network (PSTN) is based on vendors adhering to standards. Lack of interoperability, on the other hand, leads to inefficient network implementations: media gateways that transcode (translate) the digital audio and video information from one format to another, thus increasing delay and decreasing quality. While gateways exist in voice networks, e.g., between the PSTN and Voice over IP networks, their impact on delay and quality is far smaller than the impact of video gateways. Therefore, interoperability of video systems – telepresence and others – is even more important than interoperability of voice systems.

The International Multimedia Teleconferencing Consortium (IMTC) has traditionally driven interoperability based on the H.323 protocol. At the IMTC meeting in November’08, the issue came up in three of the sessions, and there were heated discussions about how to tackle telepresence interoperability. The conclusion was that IMTC had expertise in signaling protocols (H.323) but not in the issues around multi-codec systems.

In February’09, fellow blogger John Bartlett wrote on NoJitter about the need for interoperability to enable business-to-business (B2B) telepresence and I replied on Video Networker, basically saying that proprietary mechanisms used in some telepresence systems create obstacles to interoperability.

In April’09, Bob Dixon from Ohio State and OARnet invited all telepresence vendors to the session ‘Telepresence Perspectives and Interoperability’ at the Spring Internet2 conference. He chaired the session and, in conclusion, challenged all participating vendors to demonstrate interoperability of generally available products at the next Internet2 event. All vendors but HP were present. Initially, everyone agreed that this was a great idea. Using Internet2 to connect all systems would allow vendors to test without buying each other’s expensive telepresence systems. Bandwidth would not be an issue since Internet2 has so much of it. And since the interoperability effort would be driven by an independent third party, i.e., Bob Dixon, there would be no competitive fighting.

In June’09, I participated in the session ‘Interoperability: Separating Myth from Reality’ at the meeting of the Interactive Multimedia & Collaborative Communications Alliance (IMCCA) during InfoComm in Orlando, Florida, and telepresence interoperability was on top of the agenda.

During InfoComm, Tandberg demonstrated a connection between their T3 telepresence system and a Polycom RPX telepresence system through the Tandberg Telepresence Server. The problem with such demos is always that you do not know how much of it is real and how much is what we call ‘smoke and mirrors’. For those not familiar with this term, ‘smoke and mirrors’ refers to demos that are put together by modifying products and using extra wires, duct tape, glue and other high-tech tools just to make it work for the duration of the demo. The main question I had around this demo was why a separate product like the Tandberg Telepresence Server was necessary. Couldn’t we just use a standard MCU with some additional layout control to achieve the same or even better results? To answer these questions, we needed an independent interoperability test. Ohio State, OARnet, and Internet2 would be the perfect vehicle for such a test; they are independent and have a great reputation in the industry.

Stay tuned for Part 2 about the challenges to telepresence interoperability …

Thursday, October 1, 2009

Cisco to Acquire Tandberg

Cisco announced today that it will acquire Tandberg, and this will have a significant impact on the video communications market. It will reduce competition and limit customers’ choices, especially in the telepresence space. It will also hurt RadVision, which currently fills the gap in Cisco’s video infrastructure portfolio.

I am, however, more concerned about the standards compliance that has been a pillar of the video communication industry for years. Tandberg and Polycom have worked together in international standardization bodies such as ITU-T and in industry consortiums such as IMTC to define standard mechanisms for video systems to communicate.

Cisco, on the other hand, is less interested in standards, and considers proprietary extensions a way to gain competitive advantage. The concern of the video communication industry right now should be that the combined company will be so heavily dominated by Cisco that standards will become the last priority, far behind integrating Tandberg products with Cisco Call Manager and WebEx.

Telling is the fact that both Tandberg and Cisco declined to participate in interoperability events over the last few months.

Wednesday, September 30, 2009

How to Manage Quality of Experience for Video?

Video calls require much higher network bandwidth than voice calls; they therefore put more strain on IP networks and can overwhelm routers and switches to the point that they start losing packets. Video calls also tend to last longer than voice calls (the average length of a voice call is about 3 minutes); therefore, the probability that the network will experience performance degradation during a video call is higher. In addition to voice-related quality issues such as echo and noise, video struggles with freezes, artifacts, pixelation, etc.

So what can we do to guarantee a high-quality user experience on video calls? This is an important question for organizations deploying on-premise video today. But due to the increased complexity of video networks, many organizations turn their video networks over to managed service providers, and for them, measuring and controlling the quality of experience (QOE) is even more important. It allows managed SPs to identify and fix problems before the user calls the SP’s help desk, which directly impacts the SP’s bottom line.

Everyone who has used video long enough has encountered quality degradation at some point. Packet loss, jitter, and latency fluctuate depending on what else is being transmitted over the IP network. Quality of Service (QOS) mechanisms, such as DiffServ, help transmit real-time (video and voice) packets faster, but even good QOS in the network does not necessarily mean that the user experience is good. QOE goes beyond just fixing network QOS; it also depends on the endpoints’ capability to compensate for network imperfections (through jitter buffers and packet recovery mechanisms), remove acoustic artifacts (like echo and noise), and combat image artifacts (like freezes and pixelation).
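To make the QOS-vs-QOE distinction concrete, here is a rough sketch of how a quality score can be computed from network metrics, loosely following the R-factor approach of the ITU-T G.107 E-model. The coefficients and thresholds below are illustrative only, not the values any particular product uses.

```python
def mos_estimate(latency_ms: float, jitter_ms: float, loss_pct: float) -> float:
    """Rough MOS (1-5) estimate from network metrics.

    Illustrative coefficients, loosely following the ITU-T G.107
    E-model R-factor approach; real QOE agents use calibrated models.
    """
    # Jitter amplifies perceived delay (jitter buffers add latency).
    eff_latency = latency_ms + 2 * jitter_ms + 10.0
    r = 93.2  # R-factor of a near-perfect narrowband call
    if eff_latency < 160:
        r -= eff_latency / 40.0
    else:
        r -= (eff_latency - 120) / 10.0  # steeper penalty past the knee
    r -= 2.5 * loss_pct                  # packet loss penalty
    r = max(0.0, min(100.0, r))
    # Standard R-to-MOS mapping.
    return 1.0 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

print(round(mos_estimate(latency_ms=40, jitter_ms=5, loss_pct=0.0), 2))
print(round(mos_estimate(latency_ms=250, jitter_ms=40, loss_pct=3.0), 2))
```

A real QOE agent would feed measured loss, jitter, and latency into a calibrated version of such a formula and report the resulting score to the management tool.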

To monitor user experience, we can ask users to fill out a survey after every call. Skype, for example, solicits user feedback at the end of a call, but how often do you fill out the form? And what if you are using a video endpoint with a remote control?

For longer video calls, it would actually be better if users reported issues immediately when they happen, i.e., during the call. In practice, however, few users report problems while on a call. And even if they do, chances are that no one is available to investigate the issue immediately. In theory, the user could jot down the time when the problem happened and later ask the video network administrator to check if something happened in the IP network at that time. In reality, however, pressed by action items and back-to-back meetings, we just move on. As a result, problems do not get fixed and come back again and again.

Since we cannot rely on users to report quality issues, we have to embed intelligence in the network itself to measure QOE and either make changes automatically to fix the problem (that would be nirvana for every network manager) or at least create a meaningful report identifying the problem area.

This technology exists today and has already been deployed in some Voice over IP networks. Most deployments use probes: small boxes distributed across the network that inspect RTP streams. Probes identify quality issues and report them to an aggregation tool that then generates reports for the network administrator. Integrating the probe’s functionality into endpoints makes the reports even more precise. For example, Polycom phones ship today with an embedded QOE agent that reports to QOE management tools.
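To illustrate the kind of inspection such a probe performs, here is a minimal sketch that parses the fixed RTP header (RFC 3550) and estimates packet loss from sequence-number gaps. A production probe would also compute jitter from the timestamps and handle reordering and 16-bit sequence wraparound, which this sketch ignores.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RFC 3550) into the fields
    a probe needs to track a media stream."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHLL", packet[:12])
    return {
        "version": b0 >> 6,            # should be 2
        "payload_type": b1 & 0x7F,     # identifies the codec
        "sequence": seq,               # used for loss detection
        "timestamp": ts,               # used for jitter calculation
        "ssrc": ssrc,                  # identifies the stream
    }

def count_lost(sequences: list) -> int:
    """Estimate lost packets from observed sequence numbers
    (ignoring reordering and wraparound for simplicity)."""
    expected = sequences[-1] - sequences[0] + 1
    return expected - len(sequences)

# A fabricated RTP packet: version 2, dynamic payload type 96, seq 1000.
pkt = struct.pack("!BBHLL", 0x80, 96, 1000, 123456, 0xDEADBEEF)
print(parse_rtp_header(pkt)["sequence"])        # 1000
print(count_lost([1000, 1001, 1003, 1004]))     # 1 (seq 1002 missing)
```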

Originally developed for voice, QOE agents are getting more sophisticated, and now include some video capabilities. They can be used in video endpoints and multi-codec telepresence systems to monitor and report user experience. While this is currently not a priority for on-premise video deployments, QOE may become an important issue if more managed video services become available, as we all hope.

What can we expect to happen in this area in the future? The algorithms for calculating the impact of network issues on QOE will improve. Having the QOE agent embedded in the endpoint allows the endpoint’s application to submit additional quality information, e.g. echo parameters and noise level, to QOE management tools. This would give the tools more data points and lead to more precise identification of problems.

Since call success rate has a direct impact on the quality of the user experience, one can expand the definition of QOE and use the same approach for monitoring call success rates. For example, the endpoint’s application can feed information about call success, failure, and the reason for failure into the QOE agent, and the agent can report that to the QOE reporting tool, which will detect lower-than-normal call success rates and alert the network administrator. Call success rate can also be derived from Call Detail Records (CDRs) generated by the call control engine in the network; therefore, an alternative approach is to correlate the data from the CDRs with QOE reports from endpoints to identify issues.
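As a sketch of the CDR-based approach, the snippet below computes a call success rate from a list of records and flags rates below a baseline. The `disposition` field and the 95% threshold are illustrative assumptions, not any vendor's actual CDR schema.

```python
from collections import Counter

def call_success_rate(cdrs: list) -> float:
    """Compute the fraction of completed calls from CDR dicts.
    The 'disposition' field is a hypothetical CDR attribute."""
    counts = Counter(cdr["disposition"] for cdr in cdrs)
    total = sum(counts.values())
    return counts.get("completed", 0) / total if total else None

def below_baseline(rate: float, baseline: float = 0.95) -> bool:
    """Flag for the administrator when the rate drops below baseline."""
    return rate is not None and rate < baseline

# 18 completed calls and 2 failures in the reporting window.
cdrs = [{"disposition": "completed"}] * 18 + [{"disposition": "failed"}] * 2
rate = call_success_rate(cdrs)
print(rate, below_baseline(rate))  # 0.9 True
```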

While few organizations deploy QOE tools today, we see increased interest among managed service providers, who see value in any technology that allows them to avoid the dreaded help desk call. In the classic support scenario, a user complaint about bad call quality leads to finger pointing among the voice SP, the IP network SP, and the organization’s IT department. Without proper tools, it is virtually impossible to identify the source of the problem. QOE reporting tools allow administrators to identify the source of the problem and are very valuable in distributed VOIP deployments.

In summation, QOE tools are new and still have a lot of room for improvement. However, the concept itself has been proven for voice and looks promising for video. In the future, look for wider support of QOE agents in voice and video products, and for wider deployment of QOE management tools, especially by managed service providers.

Monday, August 31, 2009

Is Flat Better than Flexible? The Curious Story of Resource Management in Conference Servers

When I joined the video communication industry in 2006, I learned that there was a huge argument in the industry about the best way to manage resources in a conference server (MCU). Three years later, the controversy continues and is a great topic for ‘Video Networker’.

Let’s start with the basics! Video endpoints connect to the conference server to join multi-point calls. The server has a number of blades, and each blade has a number of Digital Signal Processors (DSPs) that process digital video. The ultimate flexibility for video users requires the server to transcode among video formats and to customize the Continuous Presence layout for each user. This flexibility costs a fair amount of resources, and the cost depends heavily on the quality of the processed video. Higher-quality video means more information to process and requires more resources in the server. Not surprisingly, a conference server can handle a smaller number of very high-quality video connections (like HD 1080p), a larger number of high-quality connections (like HD 720p), an even larger number of medium-quality connections (like SD), and a huge number of low-quality video connections (like CIF). HD stands for High Definition, SD for Standard Definition, and CIF for the lower-quality Common Intermediate Format.

Having spent many years in the communications industry, I found this perfectly logical. Every server is more scalable when it has less work to do per user. In the case of a conference server, users connect at different quality levels depending on the capabilities of their endpoints and the available network bandwidth. The conference server allocates resources to handle new users dynamically, until it runs out of resources and starts rejecting calls. In 2006, this was the way servers from Polycom, Tandberg and RadVision behaved, and there was not even a name for that behavior because it was natural.
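The dynamic behavior described above can be sketched as a simple admission loop: each call consumes DSP capacity proportional to its video quality, and calls are rejected once the pool is exhausted. The cost units and capacity below are purely illustrative, not real product numbers.

```python
# Illustrative DSP cost units per connection quality; real numbers vary by product.
COST = {"1080p": 8, "720p": 4, "SD": 2, "CIF": 1}

class FlexibleMCU:
    """Sketch of 'flexible resource management': each call consumes DSP
    capacity proportional to its quality until the pool runs out."""

    def __init__(self, capacity_units: int):
        self.free = capacity_units

    def admit(self, quality: str) -> bool:
        cost = COST[quality]
        if cost > self.free:
            return False       # out of resources: reject the call
        self.free -= cost      # dynamically allocate DSP resources
        return True

mcu = FlexibleMCU(capacity_units=16)
print([mcu.admit(q) for q in ["1080p", "720p", "SD", "CIF", "1080p"]])
# [True, True, True, True, False] -- the second 1080p call is rejected
```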

Increased scalability was achieved in two ways. First, the video switching mode allowed the server to avoid creating Continuous Presence screens. Only video from the loudest speaker was distributed to everybody else – a very simple and scalable approach that reduced flexibility and was totally inappropriate for many conferencing scenarios. A major limitation of video switching is that all sites must have exactly the same capabilities (bit rate, resolution and frames per second), i.e., the conference server looks for a common denominator. One old video endpoint that can only support CIF resolution at 15fps takes the entire conference – including standard-definition and high-definition video endpoints – down to CIF at 15fps. The second major drawback of video switching is that it only allows users to see ‘the loudest site’ on full screen. While it is nice to see the speaker on full screen, I feel very uncomfortable not seeing the body language of everybody else on the call. This limits interactivity and negatively impacts the collaboration experience.

The second approach to scalability was ‘Conference on a Port’. The administrator of the conference server could select one Continuous Presence layout for the entire conference, and all participants who joined received this layout. Again, the limited flexibility results in less work for the conference server (per user) and in increased scalability.

Back in 2006, I was in fact quite surprised to hear that a substantial number of people in the industry were excited by a new concept pushed by Codian and known as ‘a port is a port’ or ‘flat capacity’, which basically keeps the number of connections the conference server supports constant, no matter whether the connection is HD, SD, or CIF. The proponents of this approach highlighted the simplicity of counting ports on servers. They also emphasized that, with the ‘flexible resource management’ approach, conference server administrators did not know for sure how many users the server could support. It is better, they said, to always have 20 ports than to have between 10 and 100 ports depending on connection types. Customers, they argued, should feel more comfortable buying a fixed number of ports.
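The difference between the two philosophies is easy to see with illustrative numbers: under ‘flat capacity’ every port is provisioned for the worst case, so the port count is constant, while under ‘flexible resource management’ the count depends on the connection mix. The unit costs below are hypothetical.

```python
# Hypothetical server with 80 DSP units; costs per quality are illustrative.
UNITS = 80
COST = {"1080p": 8, "720p": 4, "SD": 2, "CIF": 1}

# 'Flat capacity': every port is provisioned for the worst case (HD 1080p),
# so the port count stays the same regardless of what actually connects.
flat_ports = UNITS // COST["1080p"]

# 'Flexible resource management': capacity depends on the connection mix.
flexible_ports = {quality: UNITS // cost for quality, cost in COST.items()}

print(flat_ports)      # 10 ports, whether the calls are HD or CIF
print(flexible_ports)  # {'1080p': 10, '720p': 20, 'SD': 40, 'CIF': 80}
```

The flat model never exceeds 10 ports even when all callers are CIF, which is exactly the wasted capacity argument made against it above.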

So we had two competing philosophies in the market: ‘flexible resource management’ vs. ‘flat capacity’. The discussion went back and forth, with urban legends coming from the ‘flat’ camp that new DSPs are somehow designed to perform better with ‘flat capacity’, and that there is so much performance in newer DSPs that you can afford to assign a lot of resources to a connection, no matter what quality it is. To the first argument: I know DSPs, and they are designed to be a shared resource. Obviously, it is simpler to assign an HD-capable DSP to a connection and let it process whatever quality comes in; it requires more sophisticated resource management to dynamically assign parts of DSPs to less demanding connections and full DSPs to HD connections. To the second argument: it is true that DSP performance increases, but the complexity of handling HD is an order of magnitude higher than SD. Arguments for wasting resources sound hollow for conference servers that can cost $200,000 and up.

Anyway, rational argumentation did not help resolve the discrepancies between ‘flat’ and ‘flexible’, and this resulted in new products that support both modes and allow the administrator to switch between them. For example, the Polycom RMX 2000 easily switches between ‘flexible resource management’ and ‘fixed resource’ (‘flat capacity’) modes – the change does not even require restarting the server.

But now that Pandora’s box is open and everyone has an opinion on conference server resource management, there are a lot of new ideas for modes that make the server more efficient for certain applications. On the low end, desktop video is becoming popular and poses a new set of requirements for conference servers, so it is feasible to create a mode of operation dedicated to desktop video deployments. HD is less of an issue for desktop video, but scalability is very important when entire organizations become video-enabled.

On the high end, multi-screen telepresence applications demand more performance per system from the conference server, while the multiple video streams (one inbound and one outbound for each screen) must be associated and treated as a bundle. Some vendors, like Tandberg, decided to develop a completely separate product (the Telepresence Server) to handle multi-point calls among multi-screen telepresence systems. I think this approach is an overreaction – some may say overkill. There are indeed some specific layouts that must be handled differently in a multi-screen telepresence environment, but that does not justify putting a separate (and very expensive) server in the network just to handle telepresence calls. I think the approach where the standard conference server has a mode for multi-screen telepresence calls is much more sound from both business and technical perspectives – the main benefit is that you can still use the remaining resources on the server for regular calls among single-screen systems. This is also in accord with the maximum-utilization philosophy driving ‘flexible resource management’.

As for the ‘separate telepresence server’ camp, it is not a coincidence that the same team that introduced ‘flat capacity’ is now pitching ‘separate telepresence server’. I see no innovation in limiting flexibility to achieve simplicity - true innovation is simplifying while keeping the flexibility intact.

In summation, conference servers are still the core of visual communication. In the past they had one application: video conferencing. Today, they have to handle video rooms, multi-screen telepresence systems, and desktop video. It is not surprising, therefore, that conference servers evolve and become more versatile. Adding new modes for resource management is a very pragmatic approach to satisfying requirements from new video applications, especially if switching among modes is fast and easy. Developing additional applications – which can run on general purpose computers and communicate with the conference server – is another valid approach. Developing separate servers for each application – room video, multi-screen telepresence, desktop video – is not scalable: it fills the network with hardware that is redundant in the bad way. As visual communication becomes mainstream and changes both our personal and professional lives, new sets of requirements to the conference server will emerge and the flexibility of the server platform to accommodate these requirements will decide whether conferencing servers will continue to be the heart of the video network or not.

Sunday, May 17, 2009

How will the migration from IPv4 to IPv6 impact visual communication?

All information in the Internet and in private intranets is carried in packets. The packet format was defined in the 1980s and described in the Internet Protocol specification (IPv4, IETF RFC 791). When IPv4 was designed, no one really expected that the Internet would become so pervasive, and using 32 bits to address network elements seemed reasonable. The maximum size of the IP packet was set to 65,535 bytes, which was more than enough for any application at the time. Since the organizations initially using the Internet trusted each other, security was not an important requirement for IPv4, and the protocol itself did not provide any security mechanisms.

In the 1990s, the rapid growth of the Internet led to the first discussions about the design limitations of the IPv4 protocol. The industry was mostly concerned about the small address space, and the discussion led to the definition of a new packet protocol (IPv6, IETF RFC 1883, later RFC 2460) that uses 128-bit addresses. However, changing the underlying networking protocol means high cost to service providers, and they did not rush into implementing IPv6. Instead, service providers used Network Address Translation (NAT) and later double NAT as workarounds to overcome the address space shortage. NATs directly impact real-time communication – including visual communication – because they hide the real IP address of the destination, and a video system on the Internet cannot just call a video system behind the corporate NAT. Business-to-business calls must go through multiple NATs, and this frequently leads to call failures. Another fundamental problem with NATs is that they change the IP address field in the IP packet, which leads to incorrect checksums and encryption failures, i.e., NATs break end-to-end security in IP networks.

So why has the migration to IPv6 become such a hot topic over the last few months? I wrote about the discussions at the 74th IETF meeting, and there were additional discussions, presentations and panels about the urgent need to migrate to IPv6 at the FutureNet conference.

While corporate networks can continue to use IPv4 addresses and NATs for decades, service providers do need unique IP addresses for the home routers, laptops and other mobile devices their customers are using. The pool of available IPv4 addresses is being depleted very fast; according to the Internet Assigned Numbers Authority (IANA), the last full block of IP addresses will be assigned in about 2.5 years, i.e., at the end of 2011. The address shortage is bad in Europe and very bad in Asia, where China is adding something like 80 million Internet users a year. It is human psychology to ignore things that are far in the future, but 2011 is so close and so real that everyone has started panicking and looking at IPv6 as the savior of the Internet.

Although the migration to IPv6 is driven by the address shortage, IPv6 brings many new functions that will have an impact on real-time applications such as voice and video over IP. Since there will be enough IPv6 addresses for everyone and everything, NATs can be completely removed, and real-time applications would work much better on the Internet. Some organizations believe that NATs’ ability to hide the IP addresses of internal servers and devices provides security, and they push for having NATs in IPv6 networks. Security experts have repeatedly stated that NATs do not improve security, because a hacker can scan a small IPv4 subnet – subnets usually have just 256 IP addresses each – within seconds, even if it is behind a NAT. Scanning IPv6 subnets, in comparison, is futile because these subnets are so large that it would take years to find anything in them. Removing NATs would allow end-to-end security protocols such as IPsec to efficiently secure communication in IP networks.
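The back-of-the-envelope numbers behind this argument are striking. Assuming a hypothetical scanner that probes one million addresses per second:

```python
# Why scanning an IPv6 subnet is futile: back-of-the-envelope arithmetic.
# Assumption: an attacker probing one million addresses per second.
probes_per_second = 1_000_000

ipv4_subnet = 2 ** 8    # a typical IPv4 /24 subnet: 256 addresses
ipv6_subnet = 2 ** 64   # a standard IPv6 /64 subnet

ipv4_seconds = ipv4_subnet / probes_per_second
ipv6_years = ipv6_subnet / probes_per_second / (3600 * 24 * 365)

print(f"IPv4 /24: {ipv4_seconds:.6f} seconds to scan")
print(f"IPv6 /64: about {ipv6_years:,.0f} years to scan")
```

At that rate the IPv4 subnet falls in a fraction of a second, while the IPv6 subnet takes hundreds of thousands of years, which is why address-hiding adds essentially nothing in IPv6.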

Quality of Service (QoS) mechanisms developed for IPv4 can be further used with IPv6. The new header structure in IPv6 allows faster header parsing which leads to faster packet forwarding in routers. The impact on real-time communication is positive: voice and video packets will move faster through the IP network.

The new packet structure in IPv6 allows for larger packets with jumbo payloads between 65,535 bytes and 4 billion bytes. This would allow sending more video information in a single packet instead of splitting it into multiple packets. This should benefit visual communications, especially as video quality increases and video packets get larger. The way IPv6 handles packets leads to another security improvement. Many security problems in IPv4 are related to packet fragmentation, which happens if a packet has to be sent through a slower link. The router splits the packet into multiple fragments and sends them as separate IP packets. The receiver must recognize the fragmentation, collect all pieces, and put the original packet back together. IPv6 does not allow packet fragmentation by intermediaries/routers: a router must instead drop packets that are too large and send an ICMPv6 ‘Packet Too Big’ message to the source. The source then reduces the packet size so that it can go across the network in one piece.
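The resulting feedback loop can be sketched as a small simulation; the MTU values and function names below are hypothetical, and a real stack reacts to actual ICMPv6 ‘Packet Too Big’ messages rather than return values:

```python
# Illustrative simulation of IPv6 Path MTU discovery: routers never
# fragment; a router in front of a smaller-MTU link drops oversized
# packets and reports the link MTU, so the source shrinks its packets
# until they fit end-to-end.

def send_along_path(packet_size, link_mtus):
    """Return (delivered, reported_mtu_or_None)."""
    for mtu in link_mtus:
        if packet_size > mtu:
            # Router drops the packet and reports the link MTU back,
            # standing in for an ICMPv6 'Packet Too Big' message.
            return False, mtu
    return True, None

def discover_path_mtu(initial_size, link_mtus):
    size = initial_size
    while True:
        delivered, reported_mtu = send_along_path(size, link_mtus)
        if delivered:
            return size
        size = reported_mtu   # shrink to the reported MTU and retry

path = [9000, 1500, 1280]             # per-hop link MTUs (hypothetical)
print(discover_path_mtu(9000, path))  # source settles on 1280-byte packets
```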

Note that just supporting the new IPv6 headers in networking equipment is only a part of supporting IPv6. Several other protocols have been enhanced to support IPv6:
- Internet Control Message Protocol (ICMP) v6 (RFC 4443) and the additional SEcure Neighbor Discovery (SEND, RFC 3971)
- Dynamic Host Configuration Protocol (DHCP) for IPv6 (RFC 3315)
- Domain Name System (DNS) for IPv6 (RFC 4472)
- Open Shortest Path First (OSPF) routing protocol for IPv6 (RFC 5340)
- Mobility Support in IPv6 (RFC 3775)

Wednesday, May 6, 2009

How many codecs does unified communication really need?

There are hundreds of video, voice and audio codecs out there. In June 2007, Brendon Mills from Ripcode claimed he had counted 250 audio codecs and about 733 video codecs. While his count may be a little exaggerated to support the business case for transcoding, there are definitely too many codecs in the marketplace, and most of them are only used in one particular closed application.

We distinguish between speech codecs that are designed to work well with human speech (not with music and natural noises) and audio codecs that are designed to work well with all sorts of audio: music, speech, natural noises, and mixed content. Since speech is a subset of audio, I prefer using the term ‘audio codecs’ in general conversations about audio technology.

Some codecs are standards, for example, the G. series of audio codecs and H. series of video codecs. Other codecs are proprietary, for example, On2 VP6 for video and Polycom Siren 22 for audio. The differences among codecs are mainly in the encoding techniques, supported bit rates, audio frequency spectrum (for audio codecs), or supported resolutions and frame rates (for video codecs).

With so many codec choices, we are at a point where the complexity of handling (‘supporting’) numerous codecs in communication equipment creates more problems than the benefits we get from a codec’s better performance in one particular application. There are at least three main problems with supporting many codecs in communication equipment. The first and biggest problem is interoperability. Yes, there are ‘capability exchange’ procedures in H.323, SIP and other protocols – these are used to negotiate a common codec that can be used on both ends of the communication link – but these procedures create complexity, delay call setup, and lead to a lot of errors when the codec parameters do not match 100%. Second, supporting multiple codecs means maintaining their algorithms and code tables in the device memory, which leads to memory management issues. Third, many codecs today require licensing from individual companies or consortia who own the intellectual property rights. That is both an administrative and a financial burden.
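The core of a capability exchange can be sketched as a simple preference-ordered intersection; the codec lists are illustrative, and real SDP offer/answer also has to match codec parameters (sample rates, payload formats), which is where many of the mismatch errors come from:

```python
# Sketch of the 'capability exchange' idea: each endpoint advertises
# its codecs in preference order, and the offerer's first codec that
# the answerer also supports wins.

def negotiate(offer, answer):
    supported = set(answer)
    for codec in offer:          # honor the offerer's preference order
        if codec in supported:
            return codec
    return None                  # no common codec: the call fails

endpoint_a = ["G.719", "G.722.1", "G.711"]   # illustrative lists
endpoint_b = ["G.722.1", "G.711"]
print(negotiate(endpoint_a, endpoint_b))     # G.722.1
```

Every additional codec an endpoint carries enlarges these lists and multiplies the parameter combinations that have to match on both ends, which is exactly the complexity argument made above.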

These are three good reasons to look for simplification of the codec landscape. The only reason not to simplify is backward compatibility, that is, interoperability with older systems that support these codecs. For example, new video systems ship with high quality H.264 video codecs but still support the old and inefficient H.261 and H.263 video compression standards to interwork with the installed base of video systems in the network.

Most of the audio and video codecs emerged in the last few years, especially with the advances of video streaming. The longer it takes for the industry to converge around fewer universal codecs, the more interoperability problems with the installed base we will face in the future. This makes codec convergence an urgent issue.

Let’s look at audio and ask the fundamental question: ‘How many audio codecs do we as an industry really need to fulfill the dream of Unified Communication (UC)?’

The answer is driven by the types of packet (IP) networks that we have today and will have in the future. With Gigabit Ethernet finding wide adoption in Local Area Networks (LANs) and access networks, and with fast Wide Area Networks (WANs) based on optical networks, bit rate for audio is not a problem anymore. With the state of audio encoding technology today, great audio quality can be delivered at 128 kbps per channel, or 256 kbps for stereo. Most enterprises and high-speed Internet Service Providers (ISPs) have IP networks that are fast enough to carry good quality audio.
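Note that the 128 kbps figure is the codec payload; the on-the-wire rate is somewhat higher once RTP, UDP and IP headers are added. A rough sketch, assuming 20 ms packets over IPv4 (both assumptions are mine, chosen for illustration):

```python
# Rough on-the-wire bandwidth for a 128 kbps audio stream carried over
# RTP/UDP/IPv4, assuming 20 ms audio frames per packet (illustrative).
codec_bps = 128_000
frame_ms = 20
packets_per_second = 1000 // frame_ms                  # 50 packets/s
payload_bytes = codec_bps // 8 // packets_per_second   # 320 bytes/packet

header_bytes = 12 + 8 + 20     # RTP + UDP + IPv4 headers (no options)
wire_bps = (payload_bytes + header_bytes) * 8 * packets_per_second
print(f"{wire_bps / 1000:.0f} kbps on the wire")       # 144 kbps
```

Even with header overhead, the stream stays well within what enterprise LANs and high-speed access links can carry.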

High quality audio is critical for business communication (it is a major component in creating an immersive telepresence experience) and for Arts and Humanities applications (the Manhattan School of Music is a good example). The new ITU-T G.719 codec competes with the MPEG AAC codecs for this space. As argued in the white paper ‘G.719 – The First ITU-T Standard for Full-Band Audio’, the low complexity and small footprint of G.719 make it more suitable for UC applications that require high quality audio. Its bit rates range from 32 to 128 kbps (per channel), which makes it a great choice for even relatively slow fixed networks.

At the same time, there are packet networks that have very little bandwidth; for example, mobile networks still offer relatively low bit rates. General Packet Radio Service (GPRS) – a packet-oriented mobile data service available to users of the Global System for Mobile Communications (GSM) – is widely deployed today in the so-called 2G networks. GPRS today uses three timeslots with a maximum bit rate of 24 kbps. However, the application-layer Forward Error Correction (FEC) mechanisms lead to a much smaller usable bit rate of about 18 kbps. The evolution of GPRS known as 2.5G supports a better bit rate, up to a theoretical maximum of 140.8 kbps, though typical rates are closer to 56 kbps – barely enough to run high-quality audio. Such ‘bad’ networks require an efficient low-bandwidth audio codec that provides higher quality than the ‘good old PSTN’ (G.711 codec). There are several good wideband audio codecs that provide substantially higher quality than PSTN and that can operate within the mere 18 kbps packet connection to mobile devices. AMR-WB and Skype’s new SILK codec come to mind and are possible candidates to address this need.

In the area of video compression, market forces and the desire to avoid resource-intensive video transcoding have led to the wide adoption of H.264, not only for real-time communication (telepresence, video conferencing, even video telephony) but also in the video streaming market – with Adobe’s adoption of H.264 in its Flash media player. I see the trend towards H.264 in other video-related markets such as digital signage and video surveillance.

In summation, UC is about connecting now-separate communication networks into one, and providing a new converged communication experience to users. To avoid loss of audio and video quality due to transcoding gateways, the industry has to converge around a few audio and video codecs that provide great quality in ‘good’ and ‘bad’ networks and that have low complexity and a small footprint to fit in systems from immersive telepresence to mobile phones. It is time to have an unbiased professional discussion about which codecs will take us best to the UC future we all dream of.

Monday, May 4, 2009

Summary of the Internet2 Meeting in Arlington, Virginia, April 27-29, 2009

Internet2 is a not-for-profit high-speed networking organization. Members are 200+ U.S. universities, 70 corporations, 45 government agencies, laboratories and other institutions of higher learning. Internet2 maintains relationships with over 50 international partner organizations, such as TERENA in Europe. Internet2 has working groups that focus on network performance and on middleware that is shared across educational institutions to develop applications. For example, InCommon focuses on user authentication in federated environments, Shibboleth – on web single sign-on and federations, Grouper – on a group management toolkit, and perfSONAR – on performance monitoring.

Internet2 members meet twice a year. The spring 2009 meeting took place in Arlington, Virginia, just outside Washington, D.C., and gathered about 640 participants from 280 organizations. 96 of the participants were from corporate members. Here is a short video from the Internet2 reception on Monday, April 27:

Polycom has been a member of Internet2 for years, and has contributed equipment and sponsored events. Six HDX systems were used in Arlington to connect remote participants, e.g. from Kenya and Ecuador. I have been involved in Internet2 since 2007, and have presented at several meetings. At the event this week, my two presentations addressed telepresence. The first one was on Tuesday – I was part of a large panel of vendors in the telepresence industry. I shot a short video during the preparation of the telepresence panel.

One thing I do not like about vendor panels is that folks tend to jump into product pitches and competitive fighting. In telepresence panels there is also the tendency to define telepresence in a way that matches the vendor’s own products, and to exclude everything else. Instead, I focused on the broader definition of telepresence and the different levels of interoperability that we should look at as an industry. In my view, telepresence is an experience (as if you are in the same room with the people on the other side) that can be achieved with different screen sizes, codecs, and audio technologies. For example, people using Polycom RPX may consider Cisco CTS not immersive enough. Properly positioned and with the right background, a single-screen HDX system can provide a more immersive experience than a three-screen system on a multipoint call. All speakers seemed to agree that the remote control is not part of the telepresence experience. Some insisted that the cameras have to be fixed, but I do not really agree with that. If a three-screen system is connected to a three-screen system, the cameras have one angle. If you connect the same three-screen system to a two-screen system, changing the camera angle could deliver a better experience for the remote (two-screen) site. So in my view, moving cameras are OK as long as the movement happens automatically and the user is not involved in the process.

Signaling level interoperability is important as we have systems that use H.323, SIP, and proprietary protocols in the market. But using the same signaling does not mean interoperability. There is no standard for transmitting spatial information, e.g., where the screens are located, which audio channel is on what side, and what is the camera angle. While video interoperability is easier due to the wide adoption of the H.264 standard, audio interoperability is still a problem. There are several competing wideband and full-band speech and audio codecs that are incompatible, so systems from different vendors today negotiate down to the low-quality common denominator which does not support a telepresence experience. I got a lot of positive feedback after the panel; Internet2 attendees are much more interested in balanced analysis of the interoperability issues than in products. Presentation slides are posted on the session page. The session was streamed and the recording should be available for viewing in a few days.

My second telepresence presentation was a joint session with John Chapman from Georgetown University. John Chapman described the history of Georgetown’s remote campus in Qatar and the attempts to connect it back to the main campus in Washington D.C. via video conferencing and collaboration tools. He then described the decision process that led to the selection and installation of two Polycom Real Presence Experience (RPX) systems.

My presentation provided an overview of the existing telepresence options from Polycom (different sizes of RPX 400 series, RPX 200 series and the TPX system) that can meet the requirements for immersive interaction of up to 28 people per site. I focused on the technologies used in creating the telepresence experience – monitors, cameras, microphones, speakers, and furniture. Then I talked about the new functionality in the recently released TPX/RPX V2.0 and about the differences between the Video Network Operations Center (VNOC) service and the newly announced Assisted Operations Service (AOS). Using video clips proved very effective in this presentation and made it very interactive. Presentation slides will be available for viewing at the session web page.

Now a couple of highlights from the meeting …

The IETF chair Russ Housley talked about successful protocols – this seems to be a recurring theme at IETF, as you can see in my summary of the last IETF meeting. Russ focused on the main challenges for the Internet: increasing demand for bandwidth, the need to reduce power consumption in network elements, creating protocols that run well on battery-powered mobile devices, support of new applications (like video streaming and real-time video communication), and, finally, the nearly empty pool of IPv4 addresses and the urgent need to migrate to IPv6. Russ Housley also called for more academic researchers to become involved in IETF. This only reinforces my observation that IETF has been taken over by vendors, and researchers are now in the minority; see my summary of the last (74th) IETF Meeting here.

I know Ken Klingenstein from previous Internet2 and TERENA meetings. His primary focus is federated identities, and he presented about a successful implementation of federation on a national level in Switzerland. I talked to him during the break. The InCommon group wanted to create a mechanism for user authentication in federated environments. They looked at Kerberos, SIP Digest Authentication, etc., but none fit the federated environment. InCommon therefore developed a mechanism that replicates web HTTP authentication. For example, when the user agent sends an INVITE, the SIP server challenges it with a message and points at an authentication server that is recognized by this SIP server. The user agent connects over HTTP to the authentication service (which can be anything, e.g., Kerberos, NTLM, or Digest), gets authenticated, and then sends its authenticated information (name, organization, location, email address, phone number, etc., combined in a SAML assertion) to the destination. They need a standard mechanism to send the SAML assertion to the destination – in a SIP message or out-of-band (through another protocol). In Switzerland, SWITCH created an ID card with that information, and the destination user agent displays this ID card to the user, who decides whether and how to respond, e.g., accept the call. This authentication mechanism is very important for video endpoints that connect to a federation. Endpoints today support Digest authentication in pure SIP environments or NTLM in Microsoft environments, while H.235 is not widely implemented in H.323 environments. As stated above, a universal method for authentication is required in federated environments.
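The flow Ken described can be sketched as a toy simulation; all names, URLs and message shapes below are invented for illustration only and do not come from any InCommon specification:

```python
# Toy simulation of the federated-authentication flow: the SIP server
# challenges an unauthenticated INVITE and points at an auth service;
# the user agent authenticates there, obtains a SAML-style assertion,
# and resends the INVITE carrying the assertion.

AUTH_SERVICE_URL = "https://idp.example.edu/auth"   # hypothetical IdP

def sip_server(invite):
    if "saml_assertion" not in invite:
        # Challenge the caller and point at a trusted auth service.
        return {"status": 401, "auth_service": AUTH_SERVICE_URL}
    return {"status": 200, "caller": invite["saml_assertion"]["name"]}

def authenticate(user, url):
    # Stand-in for the HTTP round trip to Kerberos/NTLM/Digest.
    return {"name": user, "org": "example.edu", "issuer": url}

invite = {"from": "alice"}
response = sip_server(invite)
if response["status"] == 401:
    invite["saml_assertion"] = authenticate("alice", response["auth_service"])
    response = sip_server(invite)
print(response)   # {'status': 200, 'caller': 'alice'}
```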

During the general session on Wednesday morning, there was also a demo of a psychiatrist using a single-screen ‘telepresence’ system from Cisco to connect to a veteran and discuss possible mental problems. On one hand, I am glad that Cisco is using its large marketing budget to popularize telepresence – this helps grow the video market as a whole. On the other hand, the whole demo implied that only Cisco provides this technology, and I addressed the issue in my presentation on Wednesday afternoon. The HD 1080p technology used in the demo is now available from Polycom and other vendors. The presentation introducing the demo referred to hundreds of installed video systems but failed to mention that the interoperability between Cisco telepresence and other video systems is so bad that it cannot be used for tele-psychiatry or any other application demanding high video quality. The demo itself was not scripted very well – the veteran did not seem to have any problems, and the psychiatrist did not seem to know how to use the system. The camera at the remote location was looking at a room corner and did not provide any telepresence experience (it looked like a typical video conferencing setup).

I attended a meeting of the Emerging National Regional Education Networks (NREN) group. The One Laptop Per Child (OLPC) program has distributed millions of laptops to children in developing countries. These laptops do not have much memory (256MB) and the CPU is not very fast (433MHz). Their only network interface is Wi-Fi.

Ohio University decided to test what kind of video can be enabled on these laptops, so that children can participate in virtual classes. The laptops can decode H.263 video (not H.264), and the team therefore installed the VLC media player over the IP network and used it to decode streaming video in H.263 format. An MCU converts H.264 video into H.263. The streaming protocol between the MCU and the laptop is the Real Time Streaming Protocol (RTSP). Here is how it looks. To allow feedback (questions from the children to the presenters), they use the Pidgin chat client, which talks to different chat services: AIM, Google Talk, Yahoo!, MSN, etc. Children can watch the streaming video and switch to the Pidgin application to send questions over chat.

In summation, the Internet2 meeting in Arlington was very well organized and attended. It provided great opportunities to discuss how education, government and healthcare institutions use video to improve their services.

Monday, April 27, 2009

Telepresence Interoperability Discussion at Internet2 Meeting

A last-minute change in the Internet2 meeting program led to my participation in the panel 'Telepresence Perspectives and Interoperability'

The speaker selection will result in some interesting discussions.

This session will be recorded and streamed, so please click on the streaming icon to watch the session.

As a result of this session, a series of telepresence interoperability tests were organized in summer 2009, and the results were presented at the Internet2 conference in October 2009. Read the full story here:

Monday, April 20, 2009

New white paper "G.719: The First ITU-T Standard for Full-Band Audio"

Conferencing systems are increasingly used for more elaborate presentations, often including music and sound effects. While speech remains the primary means for communication, content sharing is becoming more important and now includes presentation slides with embedded music and video files. In today’s multimedia presentations, playback of high-quality audio (and video) from DVDs and PCs is becoming a common practice; therefore, both the encoder and decoder must be able to handle this input, transmit the audio across the network, and play it back in sound quality that is true to the original.

New communications and telepresence systems provide High Definition (HD) video and audio quality to the user, and require a corresponding quality of media delivery to fully create the immersive experience. While most people focus on the improved video quality, telepresence experts and users point out that the superior audio is what makes the interaction smooth and natural. In fact, picture quality degradation has a much lower impact on the user experience than degradation of the audio. Since telepresence rooms can seat several dozen people, advanced fidelity and multichannel capabilities are required that allow users to acoustically locate the speaker in the remote room. Unlike conventional teleconference settings, even side conversations and noises have to be transmitted accurately to assure interactivity and a fully immersive experience.

Audio codecs for use in telecommunications face more severe constraints than general-purpose media codecs. Much of this comes from the need for standardized, interoperable algorithms that deliver high sound quality at low latency, while operating with low computational and memory loads to facilitate incorporation in communication devices that span the range from extremely portable, low-cost devices to high-end immersive room systems. In addition, they must have proven performance, and be supported by an international system that assures that they will continue to be openly available worldwide.

Audio codecs that are optimized for the special needs of telecommunications have traditionally been introduced and proven starting at the low end of the audio spectrum. However, as media demands increase in telecommunications, the International Telecommunication Union (ITU-T) has identified the need for a telecommunications codec that supports full human auditory bandwidth, that is, all sounds that a human can hear. This has led to the development and standardization of the G.719 audio codec...

The new white paper "G.719 - The First ITU-T Standard for Full-Band Audio" is available here:

Wednesday, April 15, 2009

The Art of Teleworking

My new white paper "The Art of Teleworking" is now available online at
Comments are welcome.

Summary of the 74th IETF Meeting in San Francisco, March 23-27, 2009

The Internet Engineering Task Force meets three times a year (fall, spring, and summer) in different parts of the world to discuss standards (called Request For Comments or RFCs) for the Internet. These meetings are the place to discuss everything related to the Internet Protocol (IP), the User Datagram Protocol (UDP), the Transmission Control Protocol (TCP), Session Initiation Protocol (SIP), etc.

If you have not been to IETF meetings, here are two impressions of working group sessions:

This was the second IETF meeting I attended (the first one was in 1997), and it was quite fascinating to observe the changes. First, many of the 100 or so IETF working groups are now running out of work items. IETF seems to be losing 5 working groups per meeting, or 15 per year. If this trend continues, IETF could disappear by 2015, someone commented. At the meeting in San Francisco, most groups finished early because they ran out of agenda items.

Another change is in the number of participants representing vendors as compared to folks from education and research. It looks like a lot of vendors have flooded IETF over the years and - some say - made it slower, more competitive, and less efficient. You can look at the list of attendees and make up your own mind.

The key topic of the meeting was IPv4-to-IPv6 migration. Service providers are really running out of IPv4 addresses – especially in Asia Pacific – and there was a sense of urgency to help. The Internet Architecture Board (IAB) created a document with their thoughts on the issue. The firewall folks discussed what functions to put in an IPv6-to-IPv6 firewall. There was a BOF (initial discussion of a new topic) on sharing IPv4 addresses among multiple users – this is to temporarily alleviate the pain of ISPs that are running out of IP addresses. Migration to IPv6 is important for Voice over IP and Video over IP products (basically the entire Polycom product portfolio) because they all have to support IPv6 and run in a dual-stack (IPv4 and IPv6) mode for the transitional period, which can span many years. Note that IPv6 support is not trivial. In addition to supporting the new IP packet header, endpoints have to also support a version of the Dynamic Host Configuration Protocol (DHCP) that supports IPv6, a special specification that describes how the Domain Name System (DNS) will support IPv6, etc.
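During that dual-stack transitional period, hosts commonly represent IPv4 peers as IPv4-mapped IPv6 addresses (the ::ffff:0:0/96 range), so that a single IPv6 socket can serve both address families. A small sketch with Python’s standard ipaddress module:

```python
# IPv4-mapped IPv6 addresses: how a dual-stack endpoint can carry an
# IPv4 peer address inside IPv6 data structures (RFC 4291 mapping).
import ipaddress

v4 = ipaddress.IPv4Address("192.0.2.10")     # documentation-range address
mapped = ipaddress.IPv6Address(f"::ffff:{v4}")

print(mapped)                  # the compressed IPv6 form of the mapping
print(mapped.ipv4_mapped)      # prints 192.0.2.10
```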

Another new thing at IETF is the work on new transport protocols that enhance UDP and TCP. The Datagram Congestion Control Protocol (DCCP) is like UDP but with congestion control. While adding congestion control to the datagram transport protocol is not a bad technical idea, the business implications are huge. It looks like today even the two existing transport protocols (UDP and TCP) are one too many, and applications migrate from TCP to UDP because of its simplicity. Even signaling for real-time applications, which is the best fit for TCP, is frequently transported over UDP. There is also an effort to specify Transport Layer Security (TLS) over UDP.

Video and telepresence systems – such as Polycom RPX, TPX, and HDX – use UDP for transport of real-time traffic (voice and video packets). Migrating to the DCCP protocol may make sense in the future if the congestion control mechanisms in DCCP are supported end-to-end. This is not the case today.

The Secure Transport Control Protocol (STCP) is another new transport protocol, a better version of TCP. I am not sure why STCP is better than just running Transport Layer Security (TLS) on top of TCP, but the big question is whether there is space for additional transport protocols (beyond UDP and TCP). Video systems today use TLS and TCP for secure transport of signaling messages during call setup and call tear-down. STCP will therefore have no impact on video equipment, since TLS over TCP and TLS over UDP are doing a beautiful job securing the communication.

DIAMETER was originally developed as an authentication protocol (to replace RADIUS) but has now been adopted by many service providers, and IETF is exploring other uses, for example, negotiating Quality of Service (QoS) and even management of Network Address Translation (NAT) functions in carrier-to-carrier firewalls. Video applications require a lot of network resources (bandwidth, low latency and jitter, and low packet loss), and communicating QoS requirements from a video application such as Polycom CMA 5000 to a policy engine controlling the routers and switches in the IP network is a great idea. A standard solution based on DIAMETER would help interoperability among video vendors and IP networking equipment vendors.

As I already wrote in the summary of the International SIP Conference, SIP is getting too complex – with 140 RFCs and hundreds of Internet drafts. IETF understands that the complexity of SIP is a problem and wants to create a base SIP specification that includes only the key functionality. The problem is that different people within IETF want to include different sets of RFCs (subsets of the 140 SIP-related RFCs) in the base specification. Polycom has implemented a great number of SIP RFCs in its voice and video products and is indeed interested in a simpler version of SIP that will ensure robust interoperability across the industry and include the basic call features that everyone uses, not the fancy ones that are rarely used.

I attended the meetings of the two IETF working groups that discuss conferencing: Centralized Conferencing (XCON) and Media Server Control. While XCON is focused on conference call setup, the Media Server Control group defines a protocol between a Media Resource Broker (MRB) and a Media Server (MS), which is useful when you have a conferencing application controlling one or more conference servers (MCUs). When completed, this standard could allow the Polycom Distributed Management Application to control non-Polycom MCUs, for example, Scopia MCUs from RadVision.

IETF is obviously getting into a mature phase and did what I would call a ‘tribute to MPLS’ in Hollywood Oscar-ceremony style. They tried to portray Multi-Protocol Label Switching as a great success of IETF standardization, but many in the audience pointed out that MPLS interoperability among vendors and among operators just was not there.