Monday, November 9, 2009

Hardware Architecture for Conference Servers

The ATCA Summit http://www.advancedtcasummit.com/ (October 27-29, 2009) was a rare opportunity to think about and discuss the importance of hardware for our industry. As the communication industry becomes more software-driven, major industry events have focused on applications and solutions, and I rarely see good in-depth sessions on hardware. The ATCA Summit provided a refreshing new angle on communication technology.

First of all, ATCA stands for Advanced Telecom Computing Architecture and is a standard developed by the PICMG – a group of hardware vendors with a great track record for defining solid hardware architectures: PCI, CompactPCI, MicroTCA, and ATCA. The ATCA Summit is the annual meeting of the ATCA community, or ecosystem, which includes vendors making chassis, blades, fans, power supplies, and other components that can be used as a tool kit to build a server quickly. Time-to-market is definitely an important reason companies turn to ATCA instead of developing their own hardware, but equally important is that this telecom-grade (carrier-grade) hardware architecture provides very high scalability, redundancy, and reliability.

So, how does ATCA relate to visual communications? As visual communication becomes more pervasive and business critical, both service providers (offering video services) and large enterprises (running their own video networks) start asking for more scalability and reliability in the video infrastructure. The core of the video infrastructure is the conference server (MCU), and the hardware architecture used in that network element has a direct impact on the ability to support large video networks. HD video compression is very resource-intensive: raw HD video is about 1.5 gigabits per second, and modern H.264 compression technology can get it down to under 1 megabit per second. This 1500-fold compression requires powerful chips (DSPs) that generate a lot of heat; therefore, the conference server hardware must provide efficient power (think AC and DC power supplies) and cooling (think fans). But even in compressed form, video still uses a lot of network bandwidth, and the conference server is the place where all video streams converge. Therefore, conference servers must have high input and output capabilities (think Gigabit Ethernet). Finally, some sort of blade architecture is required to allow for scalability, and server performance depends heavily on the way these blades are connected. The server component that connects the blades is referred to as the ‘backplane’, although it does not need to be physically in the back of the server. The ATCA architecture was built from the ground up to meet these requirements. It was created with telecom applications in mind and therefore has high input/output, great power management and cooling, and a lot of mechanisms for high reliability.
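The compression figures above can be checked with a few lines of arithmetic. This is only a sketch: the 1.5 Gbps and 1 Mbps values come from the text, while the 1080p, 30 fps, 24 bits-per-pixel format used to derive a raw bitrate is an assumption chosen for illustration, and the function names are hypothetical.

```python
# Figures quoted in the text.
RAW_HD_BPS = 1_500_000_000      # "about 1.5 gigabits per second"
COMPRESSED_BPS = 1_000_000      # "under 1 megabit per second" (upper bound)


def raw_bitrate(width, height, fps, bits_per_pixel):
    """Uncompressed video bitrate in bits per second."""
    return width * height * fps * bits_per_pixel


def compression_ratio(raw_bps, compressed_bps):
    """How many times smaller the compressed stream is."""
    return raw_bps / compressed_bps


# Assumed format (not stated in the text): 1920x1080, 30 fps, 24 bits per pixel.
assumed_raw = raw_bitrate(1920, 1080, 30, 24)

print(f"assumed raw 1080p30 bitrate: {assumed_raw / 1e9:.2f} Gbps")
print(f"compression ratio: {compression_ratio(RAW_HD_BPS, COMPRESSED_BPS):.0f}x")
```

Under those assumptions the raw bitrate lands just under 1.5 Gbps, which is consistent with the round number used above.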

The highlight of the ATCA Summit is always the Best of Show award. This year, Polycom RMX 4000 won Best of Show for infrastructure product, and I had the pleasure of receiving the award. I posted a picture from the award ceremony here http://www.flickr.com/photos/20518315@N00/4080968072/. Subsequently, I presented in the session ‘The Users Talk Back’, and addressed the unique hardware functions in RMX 4000 that led to this award (http://www.flickr.com/photos/20518315@N00/4059010146/).

So, why did Polycom RMX 4000 win? I think it is mostly elegant engineering design and pragmatic decisions about how to leverage standard hardware architecture to achieve unprecedented reliability. It starts with a high-throughput, low-overhead backplane (which we call ‘fabric switch’) that allows free flow of video across blades. This allows conferences to use resources from any of the blades. To illustrate the importance of this point, let’s briefly compare RMX 4000 to Tandberg MSE 8000, which combines 9 blades into a chassis but does not have a high-throughput backplane. Since video cannot flow freely among blades in MSE 8000, conferences are restricted to the resources available on just one of the 9 blades. For example, if blade 1 supports 20 ports but 15 of them are already in use, you can only create a 5-party conference on that blade. If you need to start a 6-party conference, you cannot use blade 1, and have to look for another blade – let’s say blade 2 – that has 6 free ports. The 5 ports on blade 1 will stay idle until there is a conference of 5 or fewer participants. In fact, the ‘flat capacity’ software that is running on top of this hardware leads to even worse resource utilization on MSE 8000, but this article is about hardware, so I am not going into that subject (it is discussed in detail here http://videonetworker.blogspot.com/2009/08/curious-story-of-resource-management-in.html). The bottom line is that, with RMX 4000, you will be able to connect 5 participants to one blade and connect the sixth participant to another blade, without even noticing it.
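The difference between the two approaches can be sketched as two placement functions. This is a toy model written for this article, not the actual resource manager of either product: the function names are hypothetical, and each blade is reduced to a single count of free ports.

```python
def place_on_single_blade(free_ports, parties):
    """Per-blade model: the whole conference must fit on one blade.

    Returns the index of the first blade with enough free ports, or None.
    """
    for index, free in enumerate(free_ports):
        if free >= parties:
            return index
    return None


def place_across_blades(free_ports, parties):
    """Pooled model: a conference may span blades (assumes a shared fabric).

    Returns {blade_index: ports_taken}, or None if the chassis is full.
    """
    if sum(free_ports) < parties:
        return None
    allocation = {}
    remaining = parties
    for index, free in enumerate(free_ports):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            allocation[index] = take
            remaining -= take
    return allocation


# Example from the text: blade 1 has 5 free ports, blade 2 has 6.
free = [5, 6]
print(place_on_single_blade(free, 6))   # index of the blade that fits all 6
print(place_across_blades(free, 6))     # ports taken per blade
```

With two blades that each have only 5 free ports, the per-blade model cannot place a 6-party conference at all, while the pooled model still can. That is the stranded-capacity problem described above.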

Additional reliability can be gained by using DC power and full power supply redundancy. Direct Current (DC) power is used internally in all electronics equipment. However, power comes as Alternating Current (AC) over the power grid because AC power loss over long distances is lower than DC power loss. Once power reaches the data center, it makes sense to convert it once to DC and feed it to all servers, and that is why service providers and large enterprises running their own data centers like DC power. The alternative approach – providing AC power to each server and having each server convert it to DC – results in high conversion power loss, is basically a waste of energy, and should only be used if DC power is not available. RMX 4000 supports both AC and DC power, but I am much more excited about the new DC power option. Each DC power supply is rated at 1.5 kW and can power the entire RMX 4000. Best practice is to connect one DC power supply to the data center’s main power line and connect the second one to the battery array. Data centers have huge battery arrays that can keep them running even if the primary power line is down for hours or even days.

Reliability issues may arise from mixing media (video/audio) and signaling/management traffic, and therefore RMX 4000 completely separates these two types of traffic internally. This architectural approach also benefits security, since attacks against servers are usually about getting control of the signaling to manipulate the media. By clearly separating the two, RMX 4000 makes hijacking the server from outside impossible. Note that hijacking of voice conference servers is a major problem for voice service providers (I wrote about that here http://videonetworker.blogspot.com/2009/04/conferencing-service-providers-meet-at.html). As visual communication becomes more pervasive and business critical, similar issues can be expected in this space as well, and RMX 4000 is designed for that more dangerous future.

Finally, if a component in the conference server fails, it is critical that it can be replaced without disconnecting all calls and shutting down the server, thus preserving server-level reliability. All critical components in RMX 4000 are therefore hot swappable. This includes the four media blades (they are in the front of the chassis and host the video processing DSPs), RTM LAN modules (they are on the back of the chassis and connect to the IP network) and RTM ISDN modules (also on the back, connect to the ISDN network), power supplies, and fans. Each of these components can be removed and replaced with a new one while the RMX 4000 server is running.


I will discuss the topics of network-based redundancy and reliability in a separate article. Stay tuned!

Sunday, October 25, 2009

PART 8: ‘TELEPRESENCE INTEROPERABILITY IS HERE!’

The results from the telepresence interoperability demo were discussed on October 7 in the session “Telepresence Interoperability is Here!” http://events.internet2.edu/2009/fall-mm/agenda.cfm?go=session&id=10000758&event=980. Bob Dixon used visual and sound effects (including love songs and Hollywood-style explosions) to explain interoperability to people who are less involved in the topic. His presentation inspired me to write about telepresence interoperability for a less technical and more general audience. (I hope that my series of blog posts achieved that.) Bob highlighted that this was not only the first multi-vendor telepresence interoperability but also the first time systems on Internet2, Commodity Internet, Polycom’s, Tandberg’s, and IBM’s networks successfully connected.

Gabe connected through an HDX video endpoint to RSS 2000 and played back some key parts of the recording from the interoperability demos on October 6 (http://www.flickr.com/photos/20518315@N00/4015164486/). I was pleasantly surprised by how much information the RSS 2000 captured during the demos. I later found out that Robbie had created a special layout using the MLA application on RMX 2000, and this layout allowed us to see multiple sites in the recording.

Robbie (over video from Ohio State) commented that connecting the telepresence systems was the easier part while modifying the layouts turned out to be more difficult. He was initially surprised when RMX/MLA automatically associated video rooms 451, 452, and 453 at Ohio State into a telepresence system but then used this automation mechanism throughout the interoperability tests.

Jim talked about the need to improve usability.

Gabe talked about monitoring the resources on RMX 2000 during the tests and reported that it never used more than 50% of its resources.

I talked mainly about the challenges to telepresence interoperability (as described in Part 2) and about the need to port some of the unique video functions developed in H.323 into SIP, which is the protocol used in Unified Communications.

Bill (over video from IBM) explained that his team has been testing video interoperability for a year. The results are used for deployment decisions within IBM but also for external communication. IBM is interested in more interoperability among vendors.

During the Q&A session, John Chapman spontaneously joined the panel to answer questions about the demo call to Doha and about the modifications of their telepresence rooms to make them feel more like classrooms.

The Q&A session ran over time, and a number of attendees stayed afterward to talk with the panelists.

There was a consensus in the room that the telepresence interoperability demo was successful and very impressive. This success proves that standards and interoperability are alive and can connect systems from different vendors running on different networks. The series of tests was also a great teamwork experience in which experts from several independent, sometimes competing, organizations collaborated towards a common goal.

Back to the beginning of the article ... http://videonetworker.blogspot.com/2009/10/telepresence-interoperability.html

Friday, October 23, 2009

PART 7: TELEPRESENCE INTEROPERABILITY DEMO

The demo on October 6 was the first immersive telepresence demo at Internet2. Note that Cisco showed their CTS 1000 telepresence system at the previous Internet2 conference; however, this system has only one screen, and feels more like an HD video conferencing system than an immersive telepresence system. Also, the Cisco demo was on stage and far away from viewers while the TPX demo was available for everyone at the conference to experience.

The following multi-codec systems participated in the telepresence interoperability demo:
- Polycom TPX HD 306 three-screen system in the Chula Vista Room, Hyatt Regency Hotel
- Polycom TPX HD 306 three-screen system located in Andover, Massachusetts
- LifeSize Room 100 three-screen system located at OARnet in Columbus, Ohio
- Polycom RPX 200 at iFormata in Dayton, Ohio
- Polycom RPX 400 at IBM Research in Armonk, NY
- Tandberg T3 three-screen system located in Lisbon, Portugal (the afternoon demos were too late in the day for Rui, so Bill connected a T3 system in New York instead)

The systems were connected either to the Polycom RMX 2000 located at Ohio State University in Columbus, Ohio, or to the Tandberg Telepresence Server at IBM Research in Yorktown Heights, NY.

As for the setup in Chula Vista, TPX comes with 6 chairs, and an additional 30 chairs were arranged in several rows behind the system. There was enough space for people to stand in the back of the room. (http://www.flickr.com/photos/20518315@N00/4014401487/)

I can only share my experience sitting in the TPX system in the Chula Vista Room. I am sure other participants in the demo have experienced it a little differently. I was tweeting on the step-by-step progress throughout the demos.

The final test plan included both continuous presence scenarios and voice switching scenarios. Voice switching is a mechanism widely used in video conferencing; the conference server detects who speaks, waits for 2-3 seconds to make sure it is not just noise or a brief comment, and then starts distributing video from that site to all other sites. The twist - when telepresence systems are involved - is that not only one but all 2, 3, or 4 screens that belong to the ‘speaking’ site must be distributed to all other sites. Voice switched tests worked very well; sites were appearing as expected.
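The voice switching logic described above can be sketched as a small state machine. This is a simplified illustration, not the implementation in any conference server: the class and method names are hypothetical, the 2-second hold time is just one value in the 2-3 second range mentioned, and the caller is assumed to supply the currently loudest site.

```python
class VoiceSwitcher:
    """Toy model of voice-switched video for multi-screen sites."""

    def __init__(self, screens_per_site, hold_seconds=2.0):
        self.screens_per_site = screens_per_site  # e.g. {"A": 3, "B": 1}
        self.hold_seconds = hold_seconds          # assumed hold time
        self.active_site = None
        self.candidate = None
        self.candidate_since = None

    def observe(self, loudest_site, now):
        """Feed the currently loudest site; return the site being shown."""
        if loudest_site is None or loudest_site == self.active_site:
            # Nobody new is speaking: drop any pending candidate.
            self.candidate = None
            self.candidate_since = None
        elif loudest_site != self.candidate:
            # A new speaker appeared: start the hold timer.
            self.candidate = loudest_site
            self.candidate_since = now
        elif now - self.candidate_since >= self.hold_seconds:
            # The candidate kept speaking long enough: switch.
            self.active_site = loudest_site
            self.candidate = None
            self.candidate_since = None
        return self.active_site

    def streams_to_distribute(self):
        """All screens of the active site go out, not just one."""
        if self.active_site is None:
            return []
        count = self.screens_per_site[self.active_site]
        return [(self.active_site, screen) for screen in range(count)]


switcher = VoiceSwitcher({"A": 3, "B": 1})
for t in (0.0, 1.0, 2.0):
    switcher.observe("A", t)
print(switcher.streams_to_distribute())
```

A brief interruption from another site resets nothing visible: the hold timer for that site starts, but the active site stays on screen unless the new speaker keeps talking for the full hold time.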

Continuous presence – also a technology used in video conferencing – allows the conference server to build customized screen layouts for each site. The layout can be manipulated by management applications, e.g. RMX Manager and MLA manipulate the layouts in RMX 2000. (http://www.flickr.com/photos/20518315@N00/4014401683/)

TPX performed flawlessly. On October 5, most calls were at 2 Mbps per screen due to some bottlenecks when crossing networks. This issue was later resolved, and on October 6 TPX connected at 4 Mbps per screen (a total of 12 Mbps). TPX was using the new Polycom EagleEye 1080 HD cameras that support 1080p @ 30fps and 720p @ 60fps. We used 720p @ 60fps, which provides additional motion smoothness.

About quality: The quality of multipoint telepresence calls on RMX 2000 was excellent. A video recorded in Chula Vista is posted at http://www.youtube.com/watch?v=XpfNmJtAtVg. In a few test cases, we connected the TPX systems directly to TTPS, and the quality decreased noticeably.

About reliability: In addition to the failure during the first test (described in Part 5), TTPS failed during the morning demo on October 6 (I was tweeting throughout the demo and have the exact time documented here http://twitter.com/StefanKara/status/4633989195). RMX 2000 performed flawlessly.

About layouts: Since TTPS is advertised as a customized solution for multipoint telepresence, I expected that it would handle telepresence layouts exceptionally well. Throughout the demos, Robbie Nobel used the MLA application to control RMX 2000 while Bill Rippon controlled TTPS. In summary, RMX 2000 handled telepresence layouts better than TTPS. The video http://www.youtube.com/watch?v=XpfNmJtAtVg shows a layout created by RMX 2000 – the T3 system is connected to RMX through TTPS. In comparison, when the telepresence systems were connected directly to TTPS, even the best layout was a patchwork covering a small portion of the TPX screens. (http://www.flickr.com/photos/20518315@N00/4014401367/) I understand that due to the built-in automation in TTPS, the user has limited capability to influence the layouts. While MLA includes layout automation, it does allow the user to modify layouts and select the best layout for the conference.

About capacity: TTPS is a 16-port box and each codec takes a port, so it can connect a maximum of five 3-screen systems or four 4-screen systems. Bill therefore could not connect all available systems on TTPS – the server just ran out of ports. In comparison, RMX 2000 had 160 resources and each HD connection took 4 resources, so RMX 2000 could connect a maximum of 40 HD codecs, i.e., thirteen 3-screen systems or ten 4-screen systems. RMX therefore never ran out of capacity during the demo.
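The capacity numbers above follow from integer division. The sketch below reproduces them using only the figures in the text; the function and constant names are made up for illustration.

```python
def max_systems(codec_slots, screens_per_system):
    """Whole multi-screen systems that fit, at one codec slot per screen."""
    return codec_slots // screens_per_system


TTPS_PORTS = 16                 # one port per codec
RMX_RESOURCES = 160
RESOURCES_PER_HD_CODEC = 4

rmx_hd_codecs = RMX_RESOURCES // RESOURCES_PER_HD_CODEC

for name, slots in (("TTPS", TTPS_PORTS), ("RMX 2000", rmx_hd_codecs)):
    print(name, slots, "codecs:",
          max_systems(slots, 3), "three-screen or",
          max_systems(slots, 4), "four-screen systems")
```

Note that a three-screen system leaves one slot unused on both servers (16 = 5 x 3 + 1 and 40 = 13 x 3 + 1).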

The morning and lunch interoperability demos were recorded on a Polycom RSS 2000 recorder @ IP address 198.109.240.221.

We ran three interoperability demos during the morning, lunch, and afternoon conference breaks. In addition, we managed to squeeze in two additional demos that highlighted topics relevant to Internet2 and the education community. In the first one, we connected the TPX in Chula Vista to the Polycom RPX 218 system at Georgetown University in Doha, Qatar on the Arabian Peninsula, and had a very invigorating discussion about the way Georgetown uses telepresence technology for teaching and learning. John Chapman from Georgetown University and Ardoth Hassler from the National Science Foundation joined us in the Chula Vista room. If you are interested in that topic, check out the joint Georgetown-Polycom presentation at the spring’09 Internet2 conference http://events.internet2.edu/2009/spring-mm/sessionDetails.cfm?session=10000467&event=909. The discussion later went into using telepresence technology for grant proposal review panels.

Another interesting demo was meeting Scott Stevens from Juniper Networks over telepresence and discussing with him how Juniper’s policy management engine interacts with Polycom video infrastructure to provide a high quality of experience for telepresence.

Throughout all interoperability and other demos, the Internet2 network performed flawlessly – we did not notice any packet loss and jitter was very low.

Stay tuned for Part 8 with a summary of the test and demo results … http://videonetworker.blogspot.com/2009/10/part-8-telepresence-interoperability-is.html

PART 6: TELEPRESENCE INTEROPERABILITY LOGISTICS

Bringing a TPX to San Antonio required a lot of preparation. We had to find a room in the conference hotel Hyatt Regency that had enough space for the system and for additional chairs for conference attendees to see the demo.

Another important consideration for the room selection was how close it was to the loading dock. The TPX comes in 7 large crates and we did not want to move them all over the hotel. The truck also had to fit Hyatt’s loading dock.

It was critical to have the IP network on site up and running before the TPX system could be tested. Usually a lot of the work and cost is related to bringing a high-speed network connection to the telepresence system. This was not an issue at the Internet2 conference since I2 brings 10 Gbps to each conference site. We needed only about 12 Mbps (or approximately 0.1% of that) for TPX.
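The bandwidth share is easy to verify. The sketch below assumes the 12 Mbps figure is three screens at 4 Mbps each, as reported for the October 6 demo in Part 7; the helper name is hypothetical.

```python
MBPS = 1_000_000
GBPS = 1_000_000_000


def link_share_percent(used_bps, link_bps):
    """Percentage of a link consumed by a given bitrate."""
    return 100 * used_bps / link_bps


tpx_bps = 3 * 4 * MBPS                       # three screens at 4 Mbps each
share = link_share_percent(tpx_bps, 10 * GBPS)
print(f"TPX uses {tpx_bps / MBPS:.0f} Mbps, {share:.2f}% of the 10 Gbps link")
```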

Timing was critical too. The Polycom installation team had to do the installation on the weekend, so that everything would work on Monday morning. The room that we identified was Chula Vista on the lobby level. It was close to the loading dock and had enough space. The only issue was that the room was booked for another event on Wednesday, so TPX had to be dismantled on Tuesday, right after the last interoperability demo finished at 4:30pm.

Stay tuned for Part 7 about the telepresence interoperability demo at the Internet2 Conference on October 6, 2009 … http://videonetworker.blogspot.com/2009/10/part-7-telepresence-interoperability.html

Thursday, October 22, 2009

TELEPRESENCE INTEROPERABILITY PART 5: TANDBERG TELEPRESENCE SERVER

At this point, the team was comfortable with the functionality of the RMX, TPX, and Room 100. Adding another infrastructure component – the Tandberg Telepresence Server – to the test bed increased complexity, but that was a risk we had to take in order to evaluate T3’s capabilities. It was also my first opportunity to see TTPS in action, and I was curious to find out what it could do. I knew that TTPS was a 16-port MCU, and that it had some additional capabilities to support multi-screen telepresence systems. But I still did not understand what functionality differentiated it from a standard MCU.

The team’s first experiences were not great. The Tandberg Telepresence Server crashed during the first test in which it participated. It also had problems in what is called 'Room Switched Continuous Telepresence' mode: when a T3 site was on TPX full screen and someone in the LifeSize Room 100 started talking, LifeSize was not shown on full screen on TPX but remained in a small preview window on the bottom of the screen and the border around it was flashing in red. We saw this behavior again during the interoperability demos on October 6 http://www.youtube.com/watch?v=GCwUWfgw9ig.

However, as we worked with it, we found that cascading to TTPS from an RMX 2000 worked quite well. Gabe or Robbie configured TTPS as a three-screen telepresence system on RMX, while Bill configured RMX as a three-screen telepresence system on TTPS. And with every test, interoperability got better…

Stay tuned for Part 6 about the logistics around bringing a telepresence system to an industry event … http://videonetworker.blogspot.com/2009/10/part-6-telepresence-interoperability.html