Sunday, May 17, 2009

How will the migration from IPv4 to IPv6 impact visual communication?

All information in the Internet and in private intranets is carried in packets. The packet format was defined in the 1980s and described in the Internet Protocol specification (also referred to as IPv4, IETF RFC 791, http://www.ietf.org/rfc/rfc0791.txt?number=791). When IPv4 was designed, no one really expected that the Internet would become so pervasive, and using 32 bits to address network elements seemed reasonable. The maximum size of the IP packet was set to 65,535 bytes, which was more than enough for any application at the time. Since the organizations initially using the Internet trusted each other, security was not an important requirement for IPv4, and the protocol itself did not provide any security mechanisms.

In the 1990s, the rapid growth of the Internet led to the first discussions about the design limitations of the IPv4 protocol. The industry was mostly concerned about the small address space, and the discussion led to the definition of a new packet protocol (IPv6, IETF RFC 1883 and later RFC 2460, http://www.ietf.org/rfc/rfc2460.txt?number=2460) that uses 128-bit addresses. However, changing the underlying networking protocol means high costs for service providers, and they did not rush into implementing IPv6. Instead, service providers used Network Address Translation (NAT) and later double-NAT as workarounds to overcome the address space shortage. NATs directly impact real-time communication – including visual communication – because they hide the real IP address of the destination: a video system on the Internet cannot simply call a video system behind the corporate NAT. Business-to-business calls must go through multiple NATs, and this frequently leads to call failures. Another fundamental problem with NATs is that they change the IP address field in the IP packet, which leads to incorrect checksums and encryption failures; in other words, NATs break end-to-end security in IP networks.

So why has the migration to IPv6 become such a hot topic over the last few months? I wrote about the discussions at the 74th IETF meeting http://videonetworker.blogspot.com/2009/04/summary-of-74th-ietf-meeting-in-san.html, and there were additional discussions, presentations and panels about the urgent need to migrate to IPv6 at the FutureNet conference http://www.futurenetexpo.com/attend/conf_at_a_glance.html.

While corporate networks can continue to use IPv4 addresses and NATs for decades, service providers do need unique IP addresses for the home routers, laptops and other mobile devices their customers are using. The pool of available IPv4 addresses is being depleted very fast, and according to the Internet Assigned Numbers Authority (IANA), the last full block of IP addresses will be assigned in about 2.5 years, i.e., at the end of 2011. The address shortage is bad in Europe and very bad in Asia, where China is adding something like 80 million Internet users a year. It is human psychology to ignore things that are far in the future, but 2011 is so close and so real that everyone has started panicking and looking at IPv6 as the savior of the Internet.

Although the migration to IPv6 is driven by the address shortage, IPv6 brings many new functions that will have an impact on real-time applications such as voice and video over IP. Since there will be enough IPv6 addresses for everyone and everything, NATs can be completely removed, and real-time applications would work much better on the Internet. Some organizations believe that NATs’ ability to hide the IP addresses of internal servers and devices provides security, and they push for having NATs in IPv6 networks. Security experts have repeatedly stated that NATs do not improve security, because a hacker can scan the small IPv4 subnets – they usually have just 256 IP addresses each – within seconds, even if they are behind a NAT. Scanning IPv6 subnets, in comparison, is futile because these subnets are so large that it would take years to find anything in them. Removing NATs would allow end-to-end security protocols such as IPsec to efficiently secure communication in IP networks.
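
To put the scanning argument in perspective, here is a back-of-the-envelope comparison of the two address spaces (a rough illustrative sketch; the probe rate of one million addresses per second is an arbitrary assumption):

```python
# Rough comparison of brute-force address scanning in IPv4 vs. IPv6 subnets.
# The probe rate (one million probes per second) is an arbitrary assumption.

PROBES_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 3600 * 24 * 365

ipv4_subnet = 2 ** (32 - 24)   # a typical IPv4 /24 subnet: 256 addresses
ipv6_subnet = 2 ** (128 - 64)  # a standard IPv6 /64 subnet: ~1.8e19 addresses

print(f"IPv4 /24 scan time: {ipv4_subnet / PROBES_PER_SECOND:.6f} seconds")
print(f"IPv6 /64 scan time: {ipv6_subnet / PROBES_PER_SECOND / SECONDS_PER_YEAR:,.0f} years")
```

Even at that generous probe rate, sweeping a single /64 would take hundreds of thousands of years, while a /24 falls in a fraction of a second.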

Quality of Service (QoS) mechanisms developed for IPv4 can continue to be used with IPv6. The new header structure in IPv6 allows faster header parsing, which leads to faster packet forwarding in routers. The impact on real-time communication is positive: voice and video packets will move faster through the IP network.
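
One reason parsing is faster is that the base IPv6 header is a fixed 40 bytes, with options moved into separate extension headers. A minimal sketch of building such a header with Python's standard library (the addresses come from the RFC 3849 documentation prefix and the field values are illustrative):

```python
# The IPv6 base header is always 40 bytes: version/traffic class/flow label (4 bytes),
# payload length (2), next header (1), hop limit (1), source and destination (16 + 16).

import struct
import ipaddress

src = ipaddress.IPv6Address("2001:db8::1").packed   # documentation addresses (RFC 3849)
dst = ipaddress.IPv6Address("2001:db8::2").packed

version_tc_flow = 6 << 28   # version 6, traffic class and flow label left at zero
payload_length = 1200       # illustrative payload size in bytes
next_header = 17            # UDP
hop_limit = 64

header = struct.pack("!IHBB", version_tc_flow, payload_length, next_header, hop_limit) + src + dst
print(len(header))          # -> 40, regardless of options or payload
```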

The new packet structure in IPv6 allows for larger packets with jumbo payloads (‘jumbograms’) between 65,535 and about 4 billion bytes. This would allow sending more video information in a single packet, instead of splitting it across multiple packets. It should benefit visual communications, especially as video quality increases and video packets get larger. The way IPv6 handles oversized packets leads to another security improvement. Many security problems in IPv4 are related to packet fragmentation, which happens when a packet has to be sent over a link with a smaller maximum transmission unit (MTU). The router splits the packet into multiple fragments and sends them as separate IP packets. The receiver must recognize the fragmentation, collect all the pieces, and put the original packet back together. IPv6 does not allow packet fragmentation by intermediate routers; instead, a router must drop a packet that is too large and send an ICMPv6 Packet Too Big message back to the source. The source then reduces the packet size so that it can cross the network in one piece.
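
A minimal sketch of that sender-side behavior (a pure simulation rather than real socket code; the MTU values are made-up examples):

```python
# Simplified simulation of IPv6 path MTU discovery: routers never fragment,
# so the source shrinks its packets whenever it receives Packet Too Big.

def send_over_path(packet_size, path_mtus):
    """Return the packet size that finally traverses the whole path."""
    while True:
        for link_mtu in path_mtus:                 # the packet travels hop by hop
            if packet_size > link_mtu:
                # This router drops the packet and reports its link MTU in an
                # ICMPv6 Packet Too Big message; the source adjusts and retries.
                print(f"Packet Too Big: reducing from {packet_size} to {link_mtu} bytes")
                packet_size = link_mtu
                break
        else:
            return packet_size                     # crossed every link unfragmented

# Example: a 9000-byte packet across links with MTUs of 9000, 1500 and 1280 bytes.
print("Final packet size:", send_over_path(9000, [9000, 1500, 1280]))
```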

Note that supporting the new IPv6 headers in networking equipment is only one part of supporting IPv6. Several other protocols have been enhanced to support IPv6 (a small application-level sketch follows the list):
- Internet Control Message Protocol (ICMP) v6 (RFC 4443, http://www.ietf.org/rfc/rfc4443.txt?number=4443) and the additional SEcure Neighbor Discovery (SEND, RFC 3971, http://www.ietf.org/rfc/rfc3971.txt?number=3971)
- Dynamic Host Configuration Protocol (DHCP) for IPv6 (RFC 3315, http://www.ietf.org/rfc/rfc3315.txt?number=3315)
- Domain Name System (DNS) for IPv6 (RFC 4472, http://www.ietf.org/rfc/rfc4472.txt)
- Open Shortest Path First (OSPF) routing protocol for IPv6 (RFC 5340, http://www.ietf.org/rfc/rfc5340.txt?number=5340)
- Mobility Support in IPv6 (RFC 3775, http://www.ietf.org/rfc/rfc3775.txt?number=3775)
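
And beyond the protocol stack, applications themselves have to be able to resolve AAAA records and open IPv6 sockets. A minimal sketch using Python's standard socket module (the hostname is a placeholder):

```python
import socket

host = "example.com"   # placeholder hostname

# Ask the resolver for IPv6 (AAAA) records only.
try:
    infos = socket.getaddrinfo(host, 443, socket.AF_INET6, socket.SOCK_STREAM)
except socket.gaierror:
    infos = []

if not infos:
    print(f"{host} has no AAAA record; falling back to IPv4")
else:
    family, socktype, proto, _, sockaddr = infos[0]
    print(f"Connecting to {sockaddr[0]} over IPv6")
    with socket.socket(family, socktype, proto) as s:
        s.connect(sockaddr)
```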

Wednesday, May 6, 2009

How many codecs does unified communication really need?

There are hundreds of video, voice and audio codecs out there. In June 2007, Brendon Mills from Ripcode claimed he had counted 250 audio codecs and about 733 video codecs. While his count may be a little exaggerated to support the business case for transcoding, there are definitely too many codecs in the marketplace, and most of them are only used in one particular closed application.

We distinguish between speech codecs that are designed to work well with human speech (not with music and natural noises) and audio codecs that are designed to work well with all sorts of audio: music, speech, natural noises, and mixed content. Since speech is a subset of audio, I prefer using the term ‘audio codecs’ in general conversations about audio technology.

Some codecs are standards, for example, the ITU-T G-series audio codecs and H-series video codecs. Other codecs are proprietary, for example, On2 VP6 for video and Polycom Siren 22 for audio. The differences among codecs are mainly in the encoding techniques, supported bit rates, audio frequency spectrum (for audio codecs), or supported resolutions and frame rates (for video codecs).

With so many codec choices, we are at a point where the complexity of handling (‘supporting’) numerous codecs in communication equipment creates more problems than benefits from a codec’s better performance in one particular application. There are at least three main problems with supporting many codecs in communication equipment. The first and biggest problem is interoperability. Yes, there are ‘capability exchange’ procedures in H.323, SIP and other protocols – these are used to negotiate a common codec that can be used on both ends of the communication link – but these procedures create complexity, delay call setup, and lead to a lot of errors when the codec parameters do not match 100%. Second, supporting multiple codecs means maintaining their algorithms and code tables in the device memory, which leads to memory management issues. Third, many codecs today require licensing from the individual companies or consortia that own the intellectual property rights. That is both an administrative and a financial burden.
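
At its core, the negotiation boils down to finding a codec both ends support. Here is a minimal sketch of the offer/answer intersection (the codec lists are illustrative; real SDP negotiation also has to match each codec's parameters, which is where many of the failures occur):

```python
# Minimal sketch of offer/answer codec negotiation: the answerer picks the first
# codec in the offer that it also supports. The codec lists are illustrative.

def negotiate(offered, supported):
    """Return the first mutually supported codec, or None if there is none."""
    for codec in offered:        # offer order expresses the caller's preference
        if codec in supported:
            return codec
    return None                  # no common codec: the call fails or needs transcoding

offer = ["G.719", "G.722.1", "G.711"]
answer = negotiate(offer, supported=["G.722.1", "G.711"])
print("Negotiated codec:", answer)   # -> G.722.1
```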

These are three good reasons to look for simplification of the codec landscape. The only reason not to simplify is backward compatibility, that is, interoperability with older systems that support these codecs. For example, new video systems ship with high-quality H.264 video codecs but still support the old and inefficient H.261 and H.263 video compression standards to interwork with the installed base of video systems in the network.

Most of the audio and video codecs emerged in the last few years, especially with the advances of video streaming. The longer it takes for the industry to converge around fewer universal codecs, the more interoperability problems with the installed base we will face in the future. This makes codec convergence an urgent issue.

Let’s look at audio and ask the fundamental question: ‘How many audio codecs do we as an industry really need to fulfill the dream of Unified Communication (UC)?’

The answer is driven by the types of packet (IP) networks that we have today and will have in the future. With Gigabit Ethernet finding wide adoption in Local Area Networks (LANs) and access networks, and with fast Wide Area Networks (WANs) based on optical networks, bit rate for audio is not a problem anymore. With the state of audio encoding technology today, great audio quality can be delivered at 128 kbps per channel, or 256 kbps for stereo. Most enterprises and high-speed Internet Service Providers (ISPs) have IP networks that are fast enough to carry good-quality audio.
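
For a rough sense of what such a stream costs on the wire, here is a back-of-the-envelope calculation for a 128 kbps channel carried over RTP/UDP/IPv4 (the 20 ms packetization interval is an assumption used only for illustration):

```python
# Back-of-the-envelope bandwidth for a 128 kbps audio channel over RTP/UDP/IPv4.
# The 20 ms packetization interval is an assumption used only for illustration.

CODEC_BITRATE = 128_000      # bits per second
FRAME_MS = 20                # packetization interval
HEADER_BYTES = 12 + 8 + 20   # RTP + UDP + IPv4 headers

packets_per_second = 1000 / FRAME_MS                # 50 packets per second
payload_bits = CODEC_BITRATE / packets_per_second   # 2560 bits = 320 bytes per packet
total_bits = payload_bits + HEADER_BYTES * 8

print(f"Payload per packet: {payload_bits / 8:.0f} bytes")
print(f"Bitrate on the wire: {total_bits * packets_per_second / 1000:.0f} kbps")   # ~144 kbps
```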

High-quality audio is critical for business communication (it is a major component in creating an immersive telepresence experience) and for Arts and Humanities applications (the Manhattan School of Music is a good example: http://www.polycom.com/global/documents/whitepapers/music_performance_and_instruction_over_highspeed_networks.pdf). The new ITU-T G.719 codec competes with the MPEG AAC codecs for this space. As argued in the white paper ‘G.719 – The First ITU-T Standard for Full-Band Audio’ (http://www.polycom.com/global/documents/whitepapers/g719-the-first-itut-standard-for-full-band-audio.pdf), the low complexity and small footprint of G.719 make it more suitable for UC applications that require high-quality audio. Its bit rates range from 32 to 128 kbps per channel, which makes it a great choice even for relatively slow fixed networks.

At the same time, there are packet networks that have very little bandwidth; for example, mobile networks still offer relatively low bit rates. General Packet Radio Service (GPRS) – a packet-oriented mobile data service available to users of the Global System for Mobile Communications (GSM) – is widely deployed today in the so-called 2G networks. GPRS today uses three timeslots with a maximum bit rate of 24 kbps. However, application-layer Forward Error Correction (FEC) mechanisms leave a much smaller usable bit rate of about 18 kbps. The evolution of GPRS known as 2.5G supports a better bit rate, up to a theoretical maximum of 140.8 kbps, though typical rates are closer to 56 kbps – barely enough to run high-quality audio. Such ‘bad’ networks require an efficient low-bandwidth audio codec that provides higher quality than the ‘good old PSTN’ (the G.711 codec). There are several good wideband audio codecs that provide substantially higher quality than PSTN and can operate within the mere 18 kbps packet connection to mobile devices. AMR-WB (http://en.wikipedia.org/wiki/AMR-WB) and Skype’s new SILK codec (http://www.wirevolution.com/2009/01/13/skypes-new-super-wideband-codec/) come to mind and are possible candidates to address this need.
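
The ~18 kbps figure is consistent with a simple budget calculation; here is a sketch (the 3/4 FEC code rate and the codec operating points below are assumptions chosen only for illustration):

```python
# Illustrative budget calculation for audio on a GPRS link.
# The 3/4 FEC code rate and the codec operating points below are assumptions.

GPRS_BITRATE = 24_000   # three timeslots, bits per second
FEC_CODE_RATE = 0.75    # assumed application-layer FEC rate

usable = GPRS_BITRATE * FEC_CODE_RATE
print(f"Usable bit rate: {usable / 1000:.0f} kbps")   # -> 18 kbps

for codec, bitrate in [("AMR-WB, 12.65 kbps mode", 12_650), ("SILK, wideband", 16_000)]:
    print(f"{codec}: {'fits' if bitrate <= usable else 'does not fit'}")
```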

In the area of video compression, market forces and the desire to avoid resource-intensive video transcoding have led to the wide adoption of H.264, not only for real-time communication (telepresence, video conferencing, even video telephony) but also in the video streaming market – with Adobe’s adoption of H.264 in its Flash media player. I see the same trend toward H.264 in other video-related markets such as digital signage and video surveillance.

In summation, UC is about connecting today’s separate communication networks into one, and providing a new converged communication experience to users. To avoid loss of audio and video quality in transcoding gateways, the industry has to converge around a few audio and video codecs that provide great quality in both ‘good’ and ‘bad’ networks and that have low enough complexity and a small enough footprint to fit in systems from immersive telepresence to mobile phones. It is time to have an unbiased, professional discussion about which codecs will take us best to the UC future we all dream of.

Monday, May 4, 2009

Summary of the Internet2 Meeting in Arlington, Virginia, April 27-29, 2009

Internet2 (http://www.internet2.edu/) is a not-for-profit high-speed networking organization. Its members include 200+ U.S. universities, 70 corporations, 45 government agencies, laboratories and other institutions of higher learning. Internet2 maintains relationships with over 50 international partner organizations, such as TERENA in Europe. Internet2 has working groups that focus on network performance and on middleware that is shared across educational institutions to develop applications. For example, InCommon focuses on user authentication in federated environments, Shibboleth on web single sign-on and federations, Grouper on a group-management toolkit, and perfSONAR on performance monitoring.

Internet2 members meet twice a year. The spring 2009 meeting took place in Arlington, Virginia, just outside Washington, D.C., and gathered about 640 participants from 280 organizations; 96 of the participants were from corporate members. Here is a short video from the Internet2 reception on Monday, April 27: http://www.youtube.com/watch?v=C13uAKg7omQ.

Polycom has been a member of Internet2 for years, and has contributed equipment and sponsored events. Six HDX systems were used in Arlington to connect remote participants, e.g., from Kenya and Ecuador. I have been involved in Internet2 since 2007 and have presented at several meetings. At the event this week, my two presentations addressed telepresence. The first one was on Tuesday – I was part of a large panel of vendors in the telepresence industry: http://events.internet2.edu/2009/spring-mm/agenda.cfm?go=session&id=10000509&event=909. I shot a short video during the preparation of the telepresence panel: http://www.youtube.com/watch?v=NvYwF-HqdWo.

One thing I do not like about vendor panels is that folks tend to jump into product pitches and competitive fighting. In telepresence panels there is also a tendency to define telepresence in a way that matches the vendor’s own products and excludes everything else. Instead, I focused on the broader definition of telepresence and the different levels of interoperability that we should look at as an industry. In my view, telepresence is an experience (as if you are in the same room with the people on the other side) that can be achieved with different screen sizes, codecs, and audio technologies. For example, people using Polycom RPX may consider Cisco CTS not immersive enough. Properly positioned and with the right background, a single-screen HDX system can provide a more immersive experience than a three-screen system on a multipoint call. All speakers seemed to agree that the remote control is not part of the telepresence experience. Some insisted that the cameras have to be fixed, but I do not really agree with that. If a three-screen system is connected to a three-screen system, the cameras have one angle. If you connect the same three-screen system to a two-screen system, changing the camera angle could deliver a better experience for the remote (two-screen) site. So in my view, moving cameras are OK as long as the movement happens automatically and the user is not involved in the process.

Signaling-level interoperability is important, as we have systems that use H.323, SIP, and proprietary protocols in the market. But using the same signaling does not mean interoperability. There is no standard for transmitting spatial information, e.g., where the screens are located, which audio channel is on which side, and what the camera angle is. While video interoperability is easier due to the wide adoption of the H.264 standard, audio interoperability is still a problem. There are several competing wideband and full-band speech and audio codecs that are incompatible, so systems from different vendors today negotiate down to the lowest-quality common denominator, which does not support a telepresence experience. I got a lot of positive feedback after the panel; Internet2 attendees are much more interested in a balanced analysis of the interoperability issues than in products. Presentation slides are posted on the session page. The session was streamed, and the recording should be available for viewing in a few days.
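
To make the spatial-information gap concrete, here is a hypothetical sketch of the kind of per-site layout metadata a telepresence call would need to exchange. No standard defines this today, so every name and field below is invented purely for illustration:

```python
# Hypothetical per-site spatial layout for a telepresence call.
# No standard defines this today; all field names are invented for illustration.

from dataclasses import dataclass

@dataclass
class ScreenPosition:
    index: int               # 0 = leftmost screen as seen by the local participants
    camera_angle_deg: float  # horizontal camera angle relative to the room axis
    audio_channel: str       # e.g. "left", "center", "right"

site_layout = [
    ScreenPosition(index=0, camera_angle_deg=-25.0, audio_channel="left"),
    ScreenPosition(index=1, camera_angle_deg=0.0,   audio_channel="center"),
    ScreenPosition(index=2, camera_angle_deg=25.0,  audio_channel="right"),
]

# A two-screen site receiving this description could decide how to map the three
# incoming streams and audio channels onto its own screens and speakers.
print(f"{len(site_layout)} screens, camera angles:",
      [s.camera_angle_deg for s in site_layout])
```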

My second telepresence presentation was a joint session with John Chapman from Georgetown University: http://events.internet2.edu/2009/spring-mm/sessionDetails.cfm?session=10000467&event=909. John described the history of Georgetown’s remote campus in Qatar and the attempts to connect it back to the main campus in Washington, D.C. via video conferencing and collaboration tools. He then described the decision process that led to the selection and installation of two Polycom Real Presence Experience (RPX) systems.

My presentation provided an overview of the existing telepresence options from Polycom (different sizes of the RPX 400 series, RPX 200 series and the TPX system) that can meet the requirements for immersive interaction of up to 28 people per site. I focused on the technologies used in creating the telepresence experience – monitors, cameras, microphones, speakers, and furniture. Then I talked about the new functionality in the recently released TPX/RPX V2.0 and about the differences between the Video Network Operations Center (VNOC) service and the newly announced Assisted Operations Service (AOS). Using video clips proved very effective in this presentation and made it very interactive. Presentation slides will be available for viewing on the session web page.

Now a couple of highlights from the meeting …

The IETF chair Russ Housley talked about successful protocols (http://events.internet2.edu/speakers/speakers.php?go=people&id=2546) – this seems to be a recurring theme at IETF, as you can see in my summary of the last IETF meeting. Russ focused on the main challenges for the Internet: increasing demand for bandwidth, the need to reduce power consumption in network elements, creating protocols that run well on battery-powered mobile devices, support for new applications (like video streaming and real-time video communication), and, finally, the nearly empty pool of IPv4 addresses and the urgent need to migrate to IPv6. Russ also called for more academic researchers to become involved in IETF. This only reinforces my observation that IETF has been taken over by vendors, and researchers are now in the minority; see my summary of the last (74th) IETF meeting here: http://videonetworker.blogspot.com/2009/04/summary-of-74th-ietf-meeting-in-san.html.

I know Ken Klingenstein from previous Internet2 and TERENA meetings. His primary focus is federated identities, and he presented on a successful implementation of federation at the national level in Switzerland: http://events.internet2.edu/2009/spring-mm/agenda.cfm?go=session&id=10000483&event=909. I talked to him during the break. The InCommon group (http://www.incommonfederation.org/about.cfm) wanted to create a mechanism for user authentication in federated environments. They looked at Kerberos, SIP Digest Authentication, etc., but none fit the federated environment. InCommon therefore developed a mechanism that replicates web HTTP authentication. For example, when the user agent sends an INVITE, the SIP server challenges it with a message that points to an authentication server recognized by this SIP server. The user agent connects over HTTP to the authentication service (which can be anything, e.g., Kerberos, NTLM, or Digest), gets authenticated, and then sends its authenticated information (name, organization, location, email address, phone number, etc., combined in a SAML assertion) to the destination. They need a standard mechanism to send the SAML assertion to the destination – in a SIP message or out-of-band (through another protocol). In Switzerland, SWITCH created an ID card with that information, and the destination user agent displays this ID card to the user, who decides whether and how to respond, e.g., accept the call. This authentication mechanism is very important for video endpoints that connect to a federation. Endpoints today support digest authentication in pure SIP environments or NTLM in Microsoft environments, while H.235 is not widely implemented in H.323 environments. As stated above, a universal method for authentication is required in federated environments.
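
Here is a toy sketch of that call flow as I understood it from the conversation; all function names, URLs and message fields are invented for illustration and do not come from the InCommon specification:

```python
# Toy simulation of the federated call flow described above.
# All names, URLs and message fields are invented for illustration.

def authenticate_over_http(auth_url):
    """Stand-in for the real HTTP authentication step (Kerberos, NTLM, Digest, ...)."""
    return {"name": "Alice Example", "organization": "Example University",
            "email": "alice@example.edu"}   # a SAML-assertion-like blob of identity data

def sip_server_handle_invite(invite):
    if "assertion" not in invite:
        # Challenge the caller and point it to an authentication service the server trusts.
        return {"status": 401, "auth_service": "https://auth.example.edu"}
    return {"status": 200, "caller_id_card": invite["assertion"]}

invite = {"to": "sip:bob@example.org"}
response = sip_server_handle_invite(invite)
if response["status"] == 401:
    assertion = authenticate_over_http(response["auth_service"])
    response = sip_server_handle_invite({**invite, "assertion": assertion})
print(response)   # the callee can display the ID card and decide how to respond
```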

During the general session on Wednesday morning, there was also a demo of a psychiatrist using a single-screen ‘telepresence’ system from Cisco to connect with a veteran and discuss possible mental health problems. On one hand, I am glad that Cisco is using its large marketing budget to popularize telepresence – this helps grow the video market as a whole. On the other hand, the whole demo implied that only Cisco provides this technology, and I addressed that in my presentation on Wednesday afternoon. The HD 1080p technology used in the demo is now available from Polycom and other vendors. The presentation introducing the demo referred to hundreds of installed video systems but failed to mention that the interoperability between Cisco telepresence and other video systems is so poor that it cannot be used for tele-psychiatry or any other application demanding high video quality. The demo itself was not scripted very well – the veteran did not seem to have any problems, and the psychiatrist did not seem to know how to use the system. The camera at the remote location was pointed at a corner of the room and did not provide any telepresence experience (it looked like a typical video conferencing setup).

I attended a meeting of the Emerging National Research and Education Networks (NREN) group. The One Laptop Per Child (OLPC) program has distributed millions of laptops to children in developing countries (http://laptop.org/). These laptops do not have much memory (256 MB), and the CPU is not very fast (433 MHz). Their only network interface is Wi-Fi.

Ohio University decided to test what kind of video can be enabled on these laptops, so that children can participate in virtual classes. The laptops can decode H.263 video (not H.264), so the team installed the VLC media player over the network and used it to decode streaming video in H.263 format. An MCU converts the H.264 video into H.263. The streaming protocol between the MCU and the laptop is the Real Time Streaming Protocol (RTSP, http://www.ietf.org/rfc/rfc2326.txt). Here is how it looks: http://www.youtube.com/watch?v=vjh14l-60Pc. To allow feedback (questions from the children to the presenters), they use the Pidgin chat client (http://www.pidgin.im/), which talks to different chat services: AIM, Google Talk, Yahoo!, MSN, etc. Children can watch the streaming video and switch to the Pidgin application to send questions over chat.

In summation, the Internet2 meeting in Arlington was very well organized and attended. It provided great opportunities to discuss how education, government and healthcare institutions use video to improve their services.