Spotlight: Bufferbloat

In the latest edition of Spotlight, we take on ‘bufferbloat’, a little-known but common cause of high latency that can badly affect video streaming, online gaming, and teleconferencing. This issue of the magazine and the accompanying podcast explain how bufferbloat happens, what it means for your broadband connection, and how to beat it.

Ten years ago, a computer scientist stood in front of a room of Google engineers and told them about a huge problem that he’d discovered on his broadband connection – a problem that meant it could take 10 seconds or more to download a single web page.

A decade on and that problem – dubbed bufferbloat – still exists. Bufferbloat can effectively cripple your connection, ruining online games, breaking up video calls and generally slowing everything to a crawl. And all because lots of data buffers, most notably the ones in your broadband router, become clogged with data, meaning it can take ages for important stuff to trickle through...

The 'the internet is slow today' problem

It’s April 2011, and the ‘father of the internet’, Vint Cerf, has gathered his Google colleagues together to listen to a man who has identified a big problem with broadband connections.

“It may have a severe impact on our ability to deliver timely services,” Cerf warns a crowded conference room of Google engineers. “We all understand Google is all about low latency. Anything that gets in the way of latency gets under [Google co-founder] Larry Page’s skin, and that isn’t where any of us want to be,” he adds, only half-jokingly.

Cerf hands the floor over to Jim Gettys, a man with an impressive CV. He edited the HTTP specification, helped develop Linux for handheld devices (a forerunner of Google’s Android operating system) and was vice president of the One Laptop Per Child project, which aimed to bring laptops to the world’s poorest children. At that time, he was working at Bell Labs on videoconferencing. In short, this guy knows his stuff.

Gettys takes to the floor and starts explaining a problem he’s noticed on his broadband connection. He was uploading a sizeable database of files on his home connection, when he noticed that his internet connection was stumbling. He used a tool called SmokePing to get a clearer view of what was happening. It was reporting serious packet loss and latencies in excess of one second – on a connection that normally has a latency of around 10ms.

He shows the Googlers a graph of the latency on his connection, which looks like a rollercoaster ride, with occasional steep dips. “The only time the latency is low is because trying to surf the web or read mail was so painful, I occasionally suspended it [the upload] just so I could do something else,” Gettys says, explaining those dips back to normality. “This,” Gettys proclaims, “is the ‘daddy, the internet is slow today’ problem.”

Gettys explains how he continued to investigate the problem, using a variety of network-monitoring tools and leaning on the expertise of his colleagues until he identified the culprit: bufferbloat.

There are data buffers hidden everywhere, Gettys goes on to explain. In operating systems, in laptops, in phones and, most notably, in the broadband routers we use to connect to the internet. “I’ve got really paranoid,” Gettys tells the Google staff. “I think bufferbloat is almost everywhere.”

When those buffers are filled by, say, a large file that you’re uploading to the internet, bad things start to happen. Your typical broadband router has no real idea what it’s uploading; it just knows there’s a big chunk of data stuck in its buffer and tries to work through it. Anything else that comes along in the meantime – such as a request to visit a website or a Skype call – joins the back of the queue or goes missing in action.

So, when Gettys is still midway through his large file upload, the reason his latency has shot through the roof – the reason it’s taking ages to do something as simple as visit a web page – is that the tiny packet of data requesting that website is struggling to get past the bloated buffer, much like sewers become blocked by those enormous fatbergs.

Consequently, it doesn’t matter if you have a 10Mbps connection, a 100Mbps connection or even a gigabit (1,000Mbps) connection – you’re still going to suffer from stutter and slowdown once those buffers fill up.
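To put rough numbers on that queue: the wait a newly arrived packet faces is approximately the amount of data already buffered ahead of it divided by the speed at which the link drains it. The short Python sketch below works that out. The 256KB backlog and the 1Mbps and 20Mbps uplink speeds are illustrative assumptions, not figures from Gettys’s talk or from any measurement.

```python
def queueing_delay_ms(queued_bytes, uplink_bps):
    """Time for a link to drain `queued_bytes` ahead of a new packet, in milliseconds."""
    if uplink_bps <= 0:
        raise ValueError("uplink_bps must be positive")
    return queued_bytes * 8 / uplink_bps * 1000


# Hypothetical backlog: 256KB of upload data already sitting in the router's buffer.
BACKLOG_BYTES = 256 * 1024

slow_uplink_delay = queueing_delay_ms(BACKLOG_BYTES, 1_000_000)   # assumed 1Mbps uplink
fast_uplink_delay = queueing_delay_ms(BACKLOG_BYTES, 20_000_000)  # assumed 20Mbps uplink

print(f"1Mbps uplink:  {slow_uplink_delay:.0f}ms before the web request is even sent")
print(f"20Mbps uplink: {fast_uplink_delay:.0f}ms before the web request is even sent")
```

Under these assumptions, the slower uplink keeps the web request waiting for around two seconds. A faster link clears the same backlog sooner, but the wait never drops to zero while the buffer stays full.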

And here’s the kicker: a decade after Gettys stood in front of a room full of Google engineers and warned them of this problem, it still persists today.

The Bufferbloat effect

SamKnows has been measuring the effects of bufferbloat – or latency under load – almost since the day that Gettys stood up in front of the Google engineers and outlined the problem. Even 10 years on, with a lot of remedial work done to try to resolve the problem, the SamKnows data shows that bufferbloat continues to bog down even today’s ‘ultrafast’ connections.

Extensive tests carried out on US broadband networks show just how debilitating bufferbloat can be. Average latency on a 40Mbps DSL connection is 26.9ms – less than a quarter of the time it takes to blink. On a 300Mbps fibre connection, the latency is reduced even further to 19.95ms.

However, if you flood the uplink with data in the same way as you might when uploading photos to a cloud service or a video to YouTube, the problems start to emerge. The average latency under upstream load jumps to 280.93ms on that 40Mbps DSL connection, while slower DSL connections can see latency rise beyond one second. Even that 300Mbps connection sees a hike to 48.31ms.
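The size of the penalty is easier to see with the arithmetic written out. This small Python sketch uses only the four averages quoted above; the function name and the rounding are our own choices for illustration.

```python
# Average latencies quoted above, in milliseconds: (idle, under upstream load).
measurements = {
    "40Mbps DSL": (26.9, 280.93),
    "300Mbps": (19.95, 48.31),
}


def bufferbloat_penalty(idle_ms, loaded_ms):
    """Return (extra latency in ms, how many times higher than idle)."""
    return round(loaded_ms - idle_ms, 2), round(loaded_ms / idle_ms, 1)


for name, (idle_ms, loaded_ms) in measurements.items():
    extra_ms, ratio = bufferbloat_penalty(idle_ms, loaded_ms)
    print(f"{name}: +{extra_ms}ms under load, {ratio}x the idle latency")
```

On these figures, the DSL connection gains roughly a quarter of a second of delay under load, around ten times its idle latency, while the 300Mbps connection's latency rises to roughly two and a half times its idle figure.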

High latency is disastrous for many online activities. Online gaming will start to stutter and lag when you get latency beyond 50 or 100ms – some game servers will simply refuse to admit players who have triple-figure latency because they’ll ruin the game for everyone else. The performance of voice and video calls will be dragged down. And, as Jim Gettys noticed all those years ago, even something as simple as visiting a new web page can take ages, because that tiny packet of data sent from your computer to request the new page is stuck in a long queue.

As SamKnows founder Sam Crawford explains, services such as Netflix have designed ways to work around problems such as bufferbloat by giving viewers some headroom if the connection is suddenly flooded. “Netflix is fairly well publicised to pre-buffer about three to four minutes of video,” he says. “That’s great for user experience as it means if your connection goes down for even three or four minutes, or you have a real speed problem, then, frankly, you probably won’t even notice.”

However, not even Netflix is completely impervious to bufferbloat performance problems. “If you’re suffering from bufferbloat whilst trying to skip to another part of the video, that new part in the video you’re skipping to is unlikely to be in your three-to-four-minute Netflix buffer,” Crawford explains. “So, what this means is the Netflix app has to go all the way out to the internet and request that content again, so it’s like starting afresh. Skipping content and starting new playback of titles is where services such as Netflix, Apple TV and Amazon Prime Video really suffer as a consequence of bufferbloat.”

The real latency problem

Jason Livingood, vice president of technology policy & standards at Comcast, says that the latency measurement people have been conditioned to focus on is “idle latency” – the latency on the connection when it’s not being used. However, it’s latency under load – or “working latency” – that has a much bigger impact on a person’s day-to-day internet experience.

In his words, “idle latency is useful to show the difference between major types of access networks, such as wired vs. satellite, but working latency is what users care about day-to-day.”

Livingood says that “users get frustrated” when frames start dropping on video calls or “the audio becomes problematic, and that’s because that working latency suddenly goes beyond some envelope of acceptableness”.

Alas, the vast majority of internet speed tests focus purely on idle latency, so even if users have a problem with working latency, the speed tests are highly unlikely to reveal it. It’s a problem that’s largely hidden from view.

How we test for Bufferbloat

SamKnows tests for two different types of latency: idle latency and latency under load.

Idle latency is the test you see performed on most regular internet speed test sites, where we ping a test server when the connection is otherwise unused to see how fast it responds in milliseconds. This could be described as best-case scenario latency, because nothing else is getting in the way.

The latency under load test is performed while we’re running our download/upload speed test. Our speed test uses multiple, parallel TCP connections that are downloading data from, or uploading data to, the internet. They’re effectively flooding the connection with test data for a short period of time. Whilst those speed tests are running, we also run a UDP ping out to a set of test servers on the internet. These pings are made once every 500 milliseconds during the speed test, so that we get a good range of samples while the connection is under heavy load. Latency under load is measured independently on the upstream and the downstream, because upstream load has a bigger effect on latency. That’s how we can accurately measure the impact that bufferbloat has on a connection’s performance.
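For the curious, the probing half of such a test can be sketched in a few lines of Python. This is a minimal illustration of a UDP ping sent every 500 milliseconds, not SamKnows’s actual implementation: the function names are hypothetical, and a throwaway echo server on the local machine stands in for a real test server. The sketch generates no load of its own; in a real test the probes would run while a speed test saturates the connection.

```python
import socket
import threading
import time


def start_udp_echo_server(host="127.0.0.1"):
    """Start a local UDP echo server on a free port (a stand-in for a real test server)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind((host, 0))  # port 0 asks the OS for any free port

    def serve():
        while True:
            try:
                data, sender = server.recvfrom(2048)
                server.sendto(data, sender)
            except OSError:
                break  # socket closed, stop echoing

    threading.Thread(target=serve, daemon=True).start()
    return server


def udp_ping_samples(address, count=10, interval_s=0.5, timeout_s=1.0):
    """Send one UDP probe every `interval_s` seconds and time each reply.

    Returns round-trip times in milliseconds; a lost or late probe is recorded as None.
    """
    samples = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        for seq in range(count):
            payload = seq.to_bytes(4, "big")
            started = time.perf_counter()
            sock.sendto(payload, address)
            try:
                reply, _ = sock.recvfrom(2048)
                rtt_ms = (time.perf_counter() - started) * 1000
                samples.append(rtt_ms if reply == payload else None)
            except socket.timeout:
                samples.append(None)
            # Wait out the rest of the interval before sending the next probe.
            elapsed = time.perf_counter() - started
            time.sleep(max(0.0, interval_s - elapsed))
    return samples


if __name__ == "__main__":
    server = start_udp_echo_server()
    rtts = udp_ping_samples(server.getsockname(), count=4)
    answered = [rtt for rtt in rtts if rtt is not None]
    print(f"{len(answered)}/{len(rtts)} probes answered")
```

Running the same probe once on an idle connection and once during a large upload, then comparing the two sets of samples, gives a rough home-made view of latency under load.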

254ms. The difference between idle latency and latency under load on a 40Mbps DSL connection

Beating Bufferbloat

Perhaps the biggest problem with bufferbloat, from a consumer’s point of view, is that it isn’t all that easy to beat. Sure, you can buy a faster broadband connection, but as we’ve shown, that doesn’t make the problem go away entirely.

Some routers do come equipped with smart queue management (SQM) that should eradicate the bufferbloat problem. Instead of putting all of your internet traffic in a single queue, irrespective of what that traffic is, SQM creates a separate queue for each ‘flow’. So, for example, a hefty Dropbox upload might form one flow, your email might form another, and so on.

SQM then examines the size of each queue and prioritises those that have no or small queues, meaning easy-to-deal-with requests such as fetching a new web page aren’t trapped at the back of one long queue, waiting for that massive Dropbox upload to clear. (Note that SQM is different from quality of service (QoS), which will only prioritise certain, user-defined traffic flows.)
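The difference between one shared queue and a queue per flow can be shown with a toy example in Python. This is a deliberately simplified sketch, not how any particular router implements SQM: the simple round-robin between flows is our assumption, standing in for whatever scheduling a real implementation uses, and the flow names and packet counts are made up.

```python
from collections import deque


def fifo_order(packets):
    """Single shared queue: packets leave in the order they arrived."""
    return list(packets)


def per_flow_order(packets):
    """Toy per-flow scheduler: one queue per flow, served round-robin."""
    queues = {}
    for flow, seq in packets:
        queues.setdefault(flow, deque()).append((flow, seq))
    order = []
    while queues:
        # Take one packet from each flow in turn, dropping flows that run empty.
        for flow in list(queues):
            order.append(queues[flow].popleft())
            if not queues[flow]:
                del queues[flow]
    return order


# Hypothetical arrivals: 50 packets of a big upload, then a single web request.
packets = [("upload", i) for i in range(50)] + [("web", 0)]

fifo_position = fifo_order(packets).index(("web", 0)) + 1
sqm_position = per_flow_order(packets).index(("web", 0)) + 1

print(f"Single queue: web request is sent {fifo_position}th")
print(f"Per-flow queues: web request is sent in position {sqm_position}")
```

With a single queue the web request waits behind all 50 upload packets; with one queue per flow it goes out second, straight after the first upload packet.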

The bad news is that the routers supplied by most broadband providers don’t currently support SQM. “I don’t think any of the major internet service providers, at least here in the UK, support SQM out of the box.”

However, there is hope on the horizon. There are reports that some major broadband providers are starting to add SQM to the routers they supply to customers, which should dramatically reduce the impact of bufferbloat. That will likely spark a chain reaction as other providers do likewise, meaning that we should gradually see this decade-old problem being eliminated as customers either receive router firmware updates or new models with the technology built in.

In the longer term, the solution could be a more fundamental reshaping of the internet, moving towards a system where latency is given much higher priority than it is today. This “Low Latency, Low Loss, Scalable throughput” architecture is called L4S for short, and it could have a huge impact on the types of services that can be delivered online.

Comcast’s Jason Livingood says to think about the amount of time it takes to open a photo on your iPhone and a photo stored in Apple’s iCloud service. “What if they were the same amount of time?” he says, talking about the potential, near-instantaneous latency of L4S. “That would fundamentally alter the way developers thought about how they develop apps and the things that are possible. It would make this network much more like a distributed computer.”

However, L4S is still many years away; right now it’s little more than a blueprint for how the internet might look in years to come. Hopefully, by then, bufferbloat will be a distant memory. Hopefully.

Listen now

Find out more about Bufferbloat and how SamKnows tests for it on the latest edition of our podcast with Sam Crawford.
