Wednesday, September 30, 2009

How to Manage Quality of Experience for Video?

Video calls require much higher network bandwidth than voice calls; they therefore put more strain on IP networks and could overwhelm routers and switches to the point that they start losing packets. Video calls also tend to last longer than voice calls (the average length of a voice call is about 3 minutes); therefore, the probability that the network will experience performance degradation during a video call is higher. In addition to voice-related quality issues such as echo and noise, video struggles with freezes, artifacts, pixelation, etc.

So what can we do to guarantee a high-quality user experience on video calls? This is an important question for organizations deploying on-premise video today. But due to the increased complexity of video networks, many organizations turn their video networks over to managed service providers, and for them, measuring and controlling the quality of experience (QOE) is even more important. It allows managed SPs to identify and fix problems before the user calls the SP's help desk; this impacts the SP's bottom line directly.

Everyone who has used video long enough has encountered quality degradation at some point. Packet loss, jitter, and latency fluctuate depending on what else is being transmitted over the IP network. Quality of Service (QOS) mechanisms, such as DiffServ, help transmit real-time (video and voice) packets faster, but even good QOS in the network does not necessarily mean that the user experience is good. QOE goes beyond just fixing network QOS; it also depends on the endpoints' capability to compensate for network imperfections (through jitter buffers and packet recovery mechanisms), remove acoustic artifacts (like echo and noise), and combat image artifacts (like freezes and pixelation).

To monitor user experience, we can ask users to fill out a survey after every call. Skype, for example, solicits user feedback at the end of a call, but how often do you fill out the form? And what if you are using a video endpoint with a remote control?

For longer video calls, it would actually be better if users reported an issue immediately when it happens, i.e. during the call. In practice, however, few users report problems while on a call. And even if they do, chances are that no one is available to investigate the issue immediately. In theory, the user could jot down the time when the problem happened and later ask the video network administrator to check if something happened in the IP network at that time. In reality, however, pressed by action items and back-to-back meetings, we just move on. As a result, problems do not get fixed and come back again and again.

Since we cannot rely on the users to report quality issues, we have to embed intelligence in the network itself to measure QOE and either make changes automatically to fix the problem (that would be nirvana for every network manager) or at least create a meaningful report identifying the problem area.

This technology exists today and has already been deployed in some Voice over IP networks. Most deployments use probes - small boxes distributed all over the network that inspect RTP streams. Probes identify quality issues and report them to an aggregation tool that then generates reports for the network administrator. Integrating the probe's functionality into endpoints makes the reports even more precise. For example, Polycom phones ship today with an embedded QOE agent that reports to QOE management tools.
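To make the probe's job more concrete, here is a minimal sketch of the kind of per-stream calculation a probe or embedded agent might perform on an RTP stream. It uses the interarrival jitter estimator from the RTP specification (RFC 3550) and a simple sequence-number count for packet loss. The RtpPacket class and the stream_stats function are hypothetical names invented for this illustration, not part of any vendor's product; the sketch also assumes packets are processed in arrival order, that sequence numbers do not wrap around, and that the stream uses the 90 kHz clock typical for video.

```python
from dataclasses import dataclass


@dataclass
class RtpPacket:
    """Hypothetical, simplified view of the RTP header fields a probe needs."""
    seq: int            # RTP sequence number
    rtp_timestamp: int  # RTP timestamp, in clock-rate units
    arrival: float      # arrival time at the probe, in seconds


def stream_stats(packets, clock_rate=90000):
    """Return (jitter_ms, loss_fraction) for one RTP stream.

    Assumes packets are listed in arrival order and that the 16-bit
    sequence number does not wrap within the list.
    """
    if not packets:
        return 0.0, 0.0

    jitter = 0.0  # running estimate, in clock-rate units
    prev = None
    for pkt in packets:
        if prev is not None:
            # Change in transit time between two consecutive packets.
            d = ((pkt.arrival - prev.arrival) * clock_rate
                 - (pkt.rtp_timestamp - prev.rtp_timestamp))
            # RFC 3550 smoothing: move 1/16 of the way toward |d|.
            jitter += (abs(d) - jitter) / 16.0
        prev = pkt

    # Loss: sequence numbers we expected to see minus those we saw.
    seqs = {p.seq for p in packets}
    expected = max(seqs) - min(seqs) + 1
    lost = expected - len(seqs)

    return jitter / clock_rate * 1000.0, lost / expected
```

A real probe would also handle sequence-number wraparound, reordering, and multiple simultaneous streams, and would feed these numbers into a quality model rather than report them raw.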

Originally developed for voice, QOE agents are getting more sophisticated, and now include some video capabilities. They can be used in video endpoints and multi-codec telepresence systems to monitor and report user experience. While this is currently not a priority for on-premise video deployments, QOE may become an important issue if more managed video services become available, as we all hope.

What can we expect to happen in this area in the future? The algorithms for calculating the impact of network issues on QOE will improve. Having the QOE agent embedded in the endpoint allows the endpoint’s application to submit additional quality information, e.g. echo parameters and noise level, to QOE management tools. This would give the tools more data points and lead to more precise identification of problems.

Since call success rate has a direct impact on the quality of the user experience, one can expand the definition of QOE and use the same approach for monitoring call success rates. For example, the endpoint's application can feed information about call success, failure, and reason for failure into the QOE agent, and the agent can report that to the QOE reporting tool, which will detect lower-than-normal call success rates and alert the network administrator. Call success rate can also be derived from Call Detail Records (CDRs) generated by the call control engine in the network; therefore, an alternative approach is to correlate the data from the CDRs with QOE reports from endpoints to identify issues.
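As a rough illustration of the call success monitoring described above, the sketch below keeps a sliding window of recent call outcomes and flags when the success rate drops noticeably below a baseline. The class name CallSuccessMonitor, the window size, the baseline, and the tolerance are all assumptions made up for this example; an actual QOE reporting tool would choose its own thresholds and would likely learn the "normal" rate from history.

```python
from collections import deque


class CallSuccessMonitor:
    """Hypothetical sliding-window monitor for call success rates."""

    def __init__(self, window=100, baseline=0.98, tolerance=0.05):
        # Only the most recent `window` call outcomes are kept.
        self.results = deque(maxlen=window)
        self.baseline = baseline    # assumed "normal" success rate
        self.tolerance = tolerance  # allowed drop before alerting

    def record(self, success, reason=None):
        """Store one call outcome as reported by an endpoint or a CDR."""
        self.results.append((success, reason))

    def success_rate(self):
        """Fraction of successful calls in the window, or None if empty."""
        if not self.results:
            return None
        return sum(1 for ok, _ in self.results if ok) / len(self.results)

    def should_alert(self):
        """True when the rate falls below baseline minus tolerance."""
        rate = self.success_rate()
        return rate is not None and rate < self.baseline - self.tolerance

    def failure_reasons(self):
        """Count failures in the window by reported reason."""
        counts = {}
        for ok, reason in self.results:
            if not ok:
                counts[reason] = counts.get(reason, 0) + 1
        return counts
```

The failure-reason counts are what would make such a report meaningful to the administrator: a sudden spike in one reason points at a specific problem area rather than just signaling that something is wrong.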

While few organizations deploy QOE tools today, we see increased interest among managed service providers, who see value in any technology that allows them to avoid the dreaded help desk call. In the classic support scenario, a user complaint about bad call quality leads to finger pointing among the voice SP, the IP network SP, and the organization's IT department. Without proper tools, it is virtually impossible to identify the source of the problem. QOE reporting tools allow administrators to identify the source of the problem and are very valuable in distributed VOIP deployments.

In summary, QOE tools are new and still have a lot of room for improvement. However, the concept itself has been proven for voice and looks promising for video. In the future, look for wider support of QOE agents in voice and video products, and for wider deployment of QOE management tools, especially by managed service providers.