There are hundreds of video, voice and audio codecs out there. In June 2007, Brendon Mills from Ripcode claimed he had counted 250 audio codecs and about 733 video codecs. While his count may be a little exaggerated to support the business case for transcoding, there are definitely too many codecs in the market place, and most of them are only used in one particular closed application.
We distinguish between speech codecs that are designed to work well with human speech (not with music and natural noises) and audio codecs that are designed to work well with all sorts of audio: music, speech, natural noises, and mixed content. Since speech is a subset of audio, I prefer using the term ‘audio codecs’ in general conversations about audio technology.
Some codecs are standards, for example, the G. series of audio codecs and H. series of video codecs. Other codecs are proprietary, for example, On2 VP6 for video and Polycom Siren 22 for audio. The differences among codecs are mainly in the encoding techniques, supported bit rates, audio frequency spectrum (for audio codecs), or supported resolutions and frame rates (for video codecs).
With so many codec choices, we are at a point where the complexity of handling (‘supporting’) numerous codecs in communication equipment creates more problems than the benefits we get from a codec’s better performance in one particular application. There are at least three main problems with supporting many codecs in communication equipment. The first and biggest problem is interoperability. Yes, there are ‘capability exchange’ procedures in H.323, SIP and other protocols – these are used to negotiate a common codec that can be used on both ends of the communication link – but these procedures create complexity, delay call setup, and lead to a lot of errors when the codec parameters do not match 100%. Second, supporting multiple codecs means maintaining their algorithms and code tables in the device memory, which leads to memory management issues. Third, many codecs today require licensing from individual companies or consortia who own the intellectual property rights. That is both an administrative and a financial burden.
These are three good reasons to look for simplification of the codec landscape. The only reason not to simplify is backward compatibility, that is, interoperability with older systems that support these codecs. For example, new video systems ship with high quality H.264 video codecs but still support the old and inefficient H.261 and H.263 video compressions to interwork with installed base of video systems in the network.
Most of the audio and video codecs emerged in the last few year, especially with the advances of video streaming. The longer it takes for the industry to converge around fewer universal codecs, the more interoperability problems with the installed base will we face in the future. This makes codec convergence an urgent issue.
Let’s look at audio and ask the fundamental question ‘How many audio codecs do we as industry really need to fulfill the dream of Unified Communication (UC)?
The answer is driven by the types of packet (IP) networks that we have today and will have in the future. With Gigabit Ethernet finding wide adoption in Local Area Networks (LANs) and access networks and with fast Wide Area Networks (WANs) based on optical networks, bit rate for audio is not a problem anymore. With the state of audio encoding technology today, great audio quality can be delivered over 128kbps per channel, or 256kbps for stereo. Most enterprises and high-speed Internet Service Providers (ISPs) have IP networks that are fast enough to carry good quality audio.
High quality audio is critical for business communication (it is a major components in creating an immersive telepresence experience) and for Arts and Humanities applications (the Manhattan School of Music is a good example http://www.polycom.com/global/documents/whitepapers/music_performance_and_instruction_over_highspeed_networks.pdf). The new ITU-T G.719 codec competes with the MPEG AAC codecs for this space. As argued in the white paper ‘G.719 – The First ITU-T Standard for Full-Band Audio’ http://www.polycom.com/global/documents/whitepapers/g719-the-first-itut-standard-for-full-band-audio.pdf ), the low complexity and small footprint of G.719 makes it more suitable for UC applications that require high quality audio. Its bit rates range from 32 to 128 kbps (per channel) which makes it a great choice for even relatively slow fixed networks.
At the same time, there are packet networks that have very little bandwidth; for example, mobile networks still offer relatively low bit rates. General Packet Radio Service (GPRS) - a packet oriented mobile data service available to users of Global System for Mobile Communications (GSM) – is widely deployed today in the so called 2G networks. GPRS today uses three timeslots with maximum bit rate of 24kbps. However, the application layer Forward Error Correction (FEC) mechanisms lead to much smaller bit rates of about 18kbps. The evolution of GPRS known as 2.5G supports a better bit rate up to a theoretical maximum of 140.8kbps, though typical rates are closer to 56kbps – barely enough to run high-quality audio. Such ‘bad’ networks require efficient low bandwidth audio codec that provides higher quality than the ‘good old PSTN’ (G.711 codec). There are several good wideband audio codecs that provide substantially higher quality than PSTN and that can operate within the mere 18kbps packet connection to mobile devices. AMR-WB http://en.wikipedia.org/wiki/AMR-WB and Skype’s new SILK codec http://www.wirevolution.com/2009/01/13/skypes-new-super-wideband-codec/ come to mind and are possible candidates to address this need.
In the area of video compression, market forces and desire to avoid resource-intensive video transcoding led to the wide adoption of H.264 not only for real-time communication (telepresence, video conferencing, even video telephony) but also in the video streaming market – with Adobe’s adoption of H.264 in its Flash media player. I see the trend towards H.264 in other video-related markets such as digital signage and video surveillance.
In summation, UC is about connecting now separate communication networks into one, and providing a new converged communication experience to users. To avoid loss of audio and video quality due to transcoding gateways, the industry has to converge around few audio and video codecs that provide great quality in ‘good’ and ‘bad’ networks and that have low complexity and small footprint to fit in systems from immersive telepresence to mobile phones. It is time to have an unbiased professional discussion which codecs will take us best to the UC future we all dream of.