Solving poor Netflix performance

One of the great things about SamKnows One is that it allows us to investigate complex technical issues in a quick and intuitive manner.

Last week we received a message from an unhappy user in Lithuania who was suffering from frequent Netflix buffering issues despite paying for a 100Mbps broadband connection. We took a look and soon realised that the problem extended beyond this one user and, once again, demonstrated that speed tests alone do not tell the whole story.

This article documents our investigation and its findings.

Background

The user — let’s call him Fred — ran a series of web-based speed tests that consistently showed good results that were near to the advertised rate of 100Mbps. Fred was also lucky enough to have a SamKnows Whitebox installed at home, running our SamKnows automated speed tests and Netflix tests frequently. The Whitebox connects to the router and so only measures the quality of service provided by the ISP, removing any in-home influences, such as cross-traffic, from the data we collect. Our Netflix test measures the performance to the real Netflix content servers, using the same CDN redirection logic that the real Netflix application uses.

Figure 1: Source SamKnows One

Figure 1 above shows very clearly that Fred was indeed getting great speeds to the nearest dedicated speed test server (located in Riga, Latvia), but that his download speeds from Netflix were terrible!

Before we go any further, it’s time for a quick primer on how Netflix delivers video content to users…

How Netflix delivers content

Netflix operates a large, global, content distribution network (CDN) to deliver video content to end users. These CDN caches are called “Open Connect Appliances” (OCAs) in Netflix’s parlance. These may be installed inside the ISP’s network, if the ISP is large enough and wants to install them, or they will be hosted by Netflix at a few dozen major international peering locations.

When a user starts up the Netflix app and asks to play a video, an API on Netflix’s side looks at the source IP address of the user and attempts to determine the most appropriate set of OCAs that could serve the user’s request. At the very least, this takes into account the geographical location of the user and the ISP that the user is using. This should result in the user’s requests always being served from a nearby Netflix OCA with plenty of bandwidth — but we will see how this can go wrong…

A far more detailed description of how Netflix delivers content to end users can be found here.

Finding a correlation

The first thing we do when investigating a performance issue is to try to determine whether the issue is observed for just one user or if it’s a wider issue, and if so, what the correlating factor is. In this case, it is very clear to see that the issue is correlated to the ISP Fred uses — Telia Lithuania. We used data from Whiteboxes deployed in Lithuania as the basis of the results below.

Figure 2: Source SamKnows One

Figure 2 above shows a scatter plot of download speed test results to our dedicated speed test servers in Latvia, which are nice and stable, whilst the Netflix download speed results are very erratic.

Figure 3: Source SamKnows One

Here is where things get very interesting: measurements to Netflix where content is delivered from 45.57.74.0/24 appear to perform very well, whilst performance is very poor when 45.57.75.0/24 is used. Figure 3 summarises this clearly.
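To make the per-prefix comparison concrete, here is a minimal sketch of how test results could be grouped by the /24 prefix of the serving Netflix address. It is not the SamKnows One implementation: the function names and the sample values are made up for illustration, and only the two prefixes come from our data.

```python
import ipaddress
from collections import defaultdict
from statistics import median


def prefix_of(ip: str, prefix_len: int = 24) -> str:
    """Return the covering IPv4 prefix (e.g. '45.57.74.0/24') for an address."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def median_speed_by_prefix(results):
    """Group (server_ip, mbps) pairs by /24 prefix and take the median of each group."""
    groups = defaultdict(list)
    for server_ip, mbps in results:
        groups[prefix_of(server_ip)].append(mbps)
    return {prefix: median(speeds) for prefix, speeds in groups.items()}


# Hypothetical sample values for illustration only; not real measurements.
sample = [
    ("45.57.74.10", 94.0),
    ("45.57.74.22", 91.5),
    ("45.57.75.7", 12.0),
    ("45.57.75.31", 8.5),
    ("45.57.75.7", 20.0),
]

summary = median_speed_by_prefix(sample)
for prefix, mbps in sorted(summary.items()):
    print(f"{prefix}: median {mbps} Mbps")
```

The median is used rather than the mean so that a handful of outlying tests does not hide a consistent gap between the two prefixes.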

Both 45.57.74.0/24 and 45.57.75.0/24 are Netflix prefixes that appear to be announced out of Frankfurt. Some traceroutes from one of our servers in Frankfurt confirmed that the servers were certainly in Frankfurt (round-trip latency was 0.6ms).

However, this tells us nothing about the path that traffic from Telia Lithuania takes to these servers. To understand this, we enabled some traceroutes on Telia Lithuania Whiteboxes to these Netflix addresses. What we saw was the following:

Here is the smoking gun: traffic to 45.57.75.0/24 is taking a completely different path from traffic to 45.57.74.0/24.

Figure 4 shows traffic to 45.57.74.0/24 going directly to Frankfurt and back, resulting in good Netflix performance.

Figure 4

However, Figure 5 shows how traffic to 45.57.75.0/24 is leaving Lithuania, going all the way to Washington DC in the USA (po300.es02.was001.ix.nflxvideo.net), and then back to Frankfurt.

Figure 5

This adds more than 90ms of latency compared with the direct path to Frankfurt. The extra distance not only adds latency, but also means that our traffic traverses many more intermediate networks, increasing the likelihood of hitting congestion at some point.
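A detour like the one via Washington DC can, in principle, be flagged automatically from traceroute hop names. The sketch below is only an illustration. It assumes that the label before ".ix.nflxvideo.net" starts with a three-letter site code followed by digits, which we infer from the single "was001" hostname above rather than from any documented naming scheme. The function names, the placeholder first hop and the expected site code "fra" are all hypothetical.

```python
import re

# Assumption: the label before ".ix.nflxvideo.net" is a three-letter site code
# followed by digits, as in "was001". This is inferred from one hostname only.
SITE_LABEL = re.compile(r"\.([a-z]{3})\d+\.ix\.nflxvideo\.net$")


def site_codes(hop_hostnames):
    """Return the site codes found in hop hostnames, in path order."""
    codes = []
    for name in hop_hostnames:
        match = SITE_LABEL.search(name.lower())
        if match:
            codes.append(match.group(1))
    return codes


def unexpected_sites(hop_hostnames, expected):
    """Return site codes seen on the path that are not in the expected set."""
    return [code for code in site_codes(hop_hostnames) if code not in expected]


hops = [
    "gw.isp.example",                      # hypothetical placeholder hop
    "po300.es02.was001.ix.nflxvideo.net",  # hostname observed in our traceroute
]

# "fra" as the expected code for Frankfurt is an assumption for this sketch.
detours = unexpected_sites(hops, expected={"fra"})
print(detours)
```

Hostname-based inference is only a hint, since reverse DNS can be missing or stale, so a check like this would complement rather than replace the latency figures from the traceroute itself.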

Conclusion

We have been able to identify a routing issue between Telia Lithuania and Netflix’s CDN, which means that Netflix traffic is sometimes being routed via Washington DC for Lithuanian users. This is certainly the cause of the underperformance observed on Telia Lithuania broadband connections.

This is just one example of the kind of performance issues that we help ISPs, regulators, and end-users investigate on a regular basis using SamKnows One and our suite of measurements. It is also another demonstration of why speed tests to nearby test servers alone are insufficient to characterise quality of experience.

On a happier note, we have offered to share our findings and supporting data with Telia Lithuania, so hopefully poor Fred can get back to watching Netflix interruption-free in the very near future.