Monday, August 31, 2009

Is Flat Better than Flexible? The Curious Story of Resource Management in Conference Servers

When I joined the video communication industry in 2006, I learned that there was a heated debate about the best way to manage resources in a conference server (MCU). Three years later, the controversy continues, and it is a great topic for ‘Video Networker’.

Let’s start with the basics! Video endpoints connect to the conference server to join multi-point calls. The server has a number of blades, and each blade has a number of Digital Signal Processors (DSPs) that process digital video. The ultimate flexibility for video users requires the server to transcode among video formats and to customize the Continuous Presence layout for each user. This flexibility consumes a fair amount of resources, and the amount depends heavily on the quality of the processed video. Higher quality video means more information to process and requires more resources in the server. Not surprisingly, a conference server can handle a smaller number of very high quality video connections (like HD 1080p), a larger number of high quality connections (like HD 720p), an even larger number of medium quality connections (like SD), and a huge number of low quality video connections (like CIF). HD stands for High Definition, SD for Standard Definition, and CIF for the lower quality Common Intermediate Format.

Since I had spent many years in the communications industry, this made perfect sense to me. Every server scales better when it has less work to do per user. In the case of a conference server, users connect at different quality levels, depending on the capabilities of their endpoints and the available network bandwidth. The conference server dynamically allocates resources to each new user until it runs out of resources and starts rejecting calls. In 2006, this was how servers from Polycom, Tandberg and RadVision behaved, and there was not even a name for that behavior because it was so natural.
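To make this ‘natural’ behavior concrete, here is a minimal Python sketch of an admission check in which each connection consumes resources according to its quality, and calls are rejected only once the pool is exhausted. The `FlexibleMCU` class, the `COST_UNITS` table and its numbers are hypothetical illustrations, not figures from any vendor; they were picked so that the same 100-unit server holds either 10 HD 1080p connections or 100 CIF connections.

```python
from dataclasses import dataclass, field

# Hypothetical cost of one connection, in abstract "resource units".
# These numbers are illustrative assumptions, not vendor figures.
COST_UNITS = {"HD1080p": 10, "HD720p": 5, "SD": 2, "CIF": 1}


@dataclass
class FlexibleMCU:
    """Pooled resources; each call consumes units according to its quality."""
    capacity_units: int
    used_units: int = 0
    connections: list = field(default_factory=list)

    def max_connections(self, quality: str) -> int:
        # How many calls of a single quality an empty server could hold.
        return self.capacity_units // COST_UNITS[quality]

    def admit(self, quality: str) -> bool:
        cost = COST_UNITS[quality]
        if self.used_units + cost > self.capacity_units:
            return False  # out of resources: reject the call
        self.used_units += cost
        self.connections.append(quality)
        return True


mcu = FlexibleMCU(capacity_units=100)
print(mcu.max_connections("HD1080p"), mcu.max_connections("CIF"))  # 10 100
```

The point of the sketch is that capacity is not a fixed number of ports: it is whatever mix of connections fits into the pool.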

Increased scalability was achieved in two ways. First, the video switching mode allowed the server to avoid creating Continuous Presence screens. Only video from the loudest speaker was distributed to everybody else: a very simple and scalable approach, but one that reduced flexibility and was totally inappropriate for many conferencing scenarios. A major limitation of video switching is that all sites must have exactly the same capabilities (bit rate, resolution and frames per second), i.e., the conference server has to find the lowest common denominator. One old video endpoint that can only support CIF resolution at 15fps drags the entire conference, including standard definition and high definition video endpoints, down to CIF at 15fps. The second major drawback of video switching is that it only allows users to see ‘the loudest site’ on full screen. While it is nice to see the speaker on full screen, I feel very uncomfortable not seeing the body language of everybody else on the call. This limits interactivity and negatively impacts the collaboration experience.
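The ‘common denominator’ rule can be sketched in a few lines of Python. This is a simplification that assumes each endpoint advertises a single mode and takes the minimum of every parameter independently; real capability negotiation selects among the modes the endpoints actually advertise. The `Capability` type, the `common_denominator` function and the bit rates are hypothetical.

```python
from typing import NamedTuple, Sequence


class Capability(NamedTuple):
    width: int
    height: int
    fps: int
    bitrate_kbps: int


def common_denominator(sites: Sequence[Capability]) -> Capability:
    """Mode that every site can receive when the server only switches video."""
    # Simplification: take the minimum of each parameter independently.
    return Capability(
        width=min(s.width for s in sites),
        height=min(s.height for s in sites),
        fps=min(s.fps for s in sites),
        bitrate_kbps=min(s.bitrate_kbps for s in sites),
    )


# Bit rates below are hypothetical examples.
sites = [
    Capability(1280, 720, 30, 1920),  # HD 720p endpoint
    Capability(720, 480, 30, 768),    # SD endpoint
    Capability(352, 288, 15, 384),    # old CIF endpoint at 15fps
]
print(common_denominator(sites))
# Capability(width=352, height=288, fps=15, bitrate_kbps=384)
```

A single CIF endpoint is enough to pull every other site down to CIF at 15fps, which is exactly the drawback described above.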

The second approach to scalability was ‘Conference on a Port’. The administrator of the conference server could select one Continuous Presence layout for the entire conference, and all participants who joined received this layout. Again, the limited flexibility results in less work for the conference server (per user) and in increased scalability.

Back in 2006, I was in fact quite surprised to hear that a substantial number of people in the industry were excited by a new concept pushed by Codian and known as ‘a port is a port’ or ‘flat capacity’, which keeps the number of connections that the conference server supports constant, no matter whether the connection is HD, SD, or CIF. The proponents of this approach highlighted the simplicity of counting ports on servers. They also emphasized that, with the ‘flexible resource management’ approach, conference server administrators did not know for sure how many users the server could support. It is better, they said, to always have 20 ports than to have between 10 and 100 ports depending on connection types. Customers, they argued, should feel more comfortable buying a fixed number of ports.

So we had two competing philosophies in the market: ‘flexible resource management’ vs. ‘flat capacity’. The discussion went back and forth, with urban legends coming from the ‘flat’ camp: that new DSPs are somehow designed to perform better with ‘flat capacity’, and that newer DSPs have so much performance that you can afford to assign a lot of resources to a connection, no matter what its quality is. As for the first argument, I know DSPs, and they are designed to be a shared resource. Obviously, it is simpler to assign an HD-capable DSP to a connection and let it process whatever quality comes in. It requires more sophisticated resource management to dynamically assign parts of DSPs to less demanding connections and full DSPs to HD connections. As for the second argument, it is true that DSP performance keeps increasing, but handling HD is an order of magnitude more complex than handling SD. Arguments for wasting resources sound hollow for conference servers that can cost $200,000 and up.
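The difference between giving every connection a whole HD-capable DSP and giving it only the part of a DSP it needs can be sketched in Python. The per-quality DSP shares below are hypothetical, and first-fit placement is just one simple way of packing connections onto DSPs; it is not meant to describe how any particular product allocates resources.

```python
from fractions import Fraction
from typing import Sequence

# Hypothetical fraction of one DSP needed by each connection type.
# Fractions keep the arithmetic exact (five 1/5 shares fill a DSP exactly).
DSP_SHARE = {"HD720p": Fraction(1), "SD": Fraction(1, 2), "CIF": Fraction(1, 5)}


def admit_flat(num_dsps: int, calls: Sequence[str]) -> int:
    """'Flat capacity': one whole HD-capable DSP per call, whatever its quality."""
    return min(len(calls), num_dsps)


def admit_flexible(num_dsps: int, calls: Sequence[str]) -> int:
    """'Flexible': first-fit each call onto the first DSP with enough room left."""
    free = [Fraction(1)] * num_dsps  # remaining capacity of each DSP
    admitted = 0
    for quality in calls:
        need = DSP_SHARE[quality]
        for i, room in enumerate(free):
            if room >= need:
                free[i] = room - need
                admitted += 1
                break
        # A call that fits on no DSP is rejected; later, smaller calls may still fit.
    return admitted


for quality, count in [("HD720p", 30), ("SD", 60), ("CIF", 150)]:
    calls = [quality] * count
    print(quality, admit_flat(20, calls), admit_flexible(20, calls))
# HD720p 20 20
# SD 20 40
# CIF 20 100
```

Under these assumed shares, 20 DSPs always give 20 ports in the flat scheme, while the flexible scheme gives 20 HD 720p, 40 SD, or 100 CIF connections; the price is the extra bookkeeping in `admit_flexible`.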

Anyway, rational argument did not resolve the disagreement between ‘flat’ and ‘flexible’, and this resulted in new products that support both modes and allow the administrator to switch between them. For example, Polycom RMX 2000 easily switches between ‘flexible resource management’ and ‘fixed resource’ (‘flat capacity’) modes; the change does not even require restarting the server.

But now that Pandora’s box has been opened and everyone has an opinion on conference server resource management, there are a lot of new ideas for modes that make the server more efficient for certain applications. On the low end, desktop video is becoming popular and poses a new set of requirements for conference servers, so it is reasonable to create a mode of operation dedicated to desktop video deployments. HD is less of an issue for desktop video, but scalability is very important when entire organizations become video-enabled.

On the high end, multi-screen telepresence applications demand more performance per system from the conference server, and the multiple video streams (one inbound and one outbound for each screen) must be associated and treated as a bundle. Some vendors, like Tandberg, decided to develop a completely separate product (Telepresence Server) to handle multi-point calls among multi-screen telepresence systems. I think this approach is an overreaction; some may even call it overkill. There are indeed some specific layouts that must be handled differently in a multi-screen telepresence environment, but that does not justify putting a separate (and very expensive) server in the network just to handle telepresence calls. I think the approach where the standard conference server has a mode for multi-screen telepresence calls is much more sound from both business and technical perspectives. The main benefit is that you can still use the remaining resources on the server for regular calls among single-screen systems. This is also in accord with the maximum utilization philosophy driving ‘flexible resource management’.
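Treating the streams of a multi-screen system as a bundle, while leaving the rest of the server for regular calls, can be sketched as an all-or-nothing admission on a shared pool. `ResourcePool`, `admit_bundle` and the per-screen cost of 5 units are hypothetical; each cost is assumed to cover both the inbound and the outbound stream of one screen.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ResourcePool:
    capacity_units: int
    used_units: int = 0

    def admit_bundle(self, stream_costs: Sequence[int]) -> bool:
        """Admit all screens of one system together, or none of them."""
        total = sum(stream_costs)
        if self.used_units + total > self.capacity_units:
            return False  # never admit only some screens of a telepresence room
        self.used_units += total
        return True

    def remaining(self) -> int:
        return self.capacity_units - self.used_units


pool = ResourcePool(capacity_units=100)
three_screen = [5, 5, 5]  # hypothetical cost per screen (inbound + outbound)
pool.admit_bundle(three_screen)
pool.admit_bundle(three_screen)
print(pool.remaining())  # 70 units left for single-screen calls
```

A single-screen call is simply a bundle of one, so the same pool serves both kinds of systems instead of splitting them across separate servers.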

As for the ‘separate telepresence server’ camp, it is not a coincidence that the same team that introduced ‘flat capacity’ is now pitching the ‘separate telepresence server’. I see no innovation in limiting flexibility to achieve simplicity; true innovation is simplifying while keeping flexibility intact.

In summary, conference servers are still the core of visual communication. In the past, they had one application: video conferencing. Today, they have to handle video rooms, multi-screen telepresence systems, and desktop video. It is not surprising, therefore, that conference servers are evolving and becoming more versatile. Adding new resource management modes is a very pragmatic approach to satisfying the requirements of new video applications, especially if switching among modes is fast and easy. Developing additional applications that can run on general purpose computers and communicate with the conference server is another valid approach. Developing separate servers for each application (room video, multi-screen telepresence, desktop video) does not scale: it fills the network with hardware that is redundant in the bad way. As visual communication becomes mainstream and changes both our personal and professional lives, new sets of requirements for the conference server will emerge, and the flexibility of the server platform to accommodate them will decide whether conference servers continue to be the heart of the video network.