At SamKnows, our purpose is to collect accurate internet measurement data and provide a source of truth for how the internet is really performing. That’s why we do what we do! We’re truth seekers. We measure with as much validity as possible, and pool data together to create large comparable datasets, which can then be used as a common language between government regulators, ISPs, academics, and content providers to optimize and improve internet performance for everyone.
To achieve this, we've built all of our products, services and tests on the foundation of our rigorous testing methodology. But what actually is our methodology? It’s not a simple question to answer: it covers everything from where we measure, how we measure, what we measure, what we don’t measure, how many measurements we take, and much more...
We’re always aiming for precise, trustworthy, and fair results. To earn that trust, we are open and transparent about our testing methodology. Here's a peek into how we measure internet performance.
Collecting clean data
Testing at the router
A typical speed test runs from a browser, over the user's home network to the router, and then out to a test server on the internet. The problem with this kind of approach is that it introduces unknown variables into the results. Was the test conducted over Wi-Fi in an area of the home with poor signal strength? Was it run from an old browser, or on an under-powered computer that limits the achievable speed? We test at the edge of the home network: at the router, which is connected directly to the internet. This eliminates any in-home factors and ensures we measure the service provided to the home and nothing else. It creates a consistent and fair test environment, which is essential for large-scale comparative studies.
Scheduled tests throughout the day
A standard speed test produces a very self-selecting sample: people only really run one when they think they have a problem! Our approach is to use scheduled tests throughout the day, so that we capture the whole picture of a household's or area's internet performance. Our test schedules can be configured to suit each customer's requirements, but a standard schedule will run our test suite every hour, 7 days a week, 365 days a year. This generates a large sample of data from which to draw strong conclusions. Our latency test operates continuously in the background, with typically 2,000 samples taken every hour. This provides a very granular picture of latency performance and allows us to understand how latency varies when the connection is under load.
Our proprietary test scheduler uses the Large-Scale Measurement of Broadband Performance (LMAP) protocol. If all of SamKnows’ test agents ran their tests at the same time, the extra load on the ISP’s network could negatively affect the test results. So our LMAP controller staggers the tests across the one-hour window.
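The internals of the LMAP controller aren't published, but the staggering idea can be sketched with a simple hash-based offset: each agent hashes its own ID to a stable position inside the hour, spreading the fleet's tests evenly without any per-test coordination. The agent IDs and function names below are illustrative, not SamKnows' actual implementation.

```python
import hashlib

HOUR_SECONDS = 3600

def stagger_offset(agent_id: str, window_seconds: int = HOUR_SECONDS) -> int:
    """Derive a stable per-agent start offset within the test window by
    hashing the agent ID, so tests spread evenly across the hour."""
    digest = hashlib.sha256(agent_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % window_seconds

# Each agent gets its own, repeatable slot somewhere in the hour:
offset = stagger_offset("agent-001")
assert 0 <= offset < HOUR_SECONDS
```

Because the offset is derived from the agent ID rather than drawn at random each hour, the same agent always tests at the same point in the window, which also keeps hour-to-hour results comparable.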
Only measuring when the connection is idle
If someone were streaming a 4K movie while running a speed test, it would completely distort the results. Our test agents can detect 'cross traffic' in the home – other devices inside the home also using the internet. If cross traffic is detected, the test is deferred for 30 seconds. If cross traffic persists for more than five minutes, the test is skipped. This ensures that our measurements are an accurate representation of the service provided to the home and don’t disrupt the customer's internet experience.
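The defer-and-skip logic described above can be sketched as a simple loop. Here `is_busy` and `run_test` are hypothetical callbacks standing in for the agent's real cross-traffic detector and test runner:

```python
import time

DEFER_SECONDS = 30       # re-check interval while cross traffic is present
MAX_WAIT_SECONDS = 5 * 60  # give up after five minutes of sustained traffic

def run_when_idle(is_busy, run_test, sleep=time.sleep):
    """Defer the test in 30-second steps while cross traffic is detected;
    skip it entirely if the line stays busy for five minutes."""
    waited = 0
    while is_busy():
        if waited >= MAX_WAIT_SECONDS:
            return "skipped"
        sleep(DEFER_SECONDS)
        waited += DEFER_SECONDS
    run_test()
    return "ran"
```

The `sleep` parameter is injected only so the behaviour is easy to exercise without real waiting; a production agent would, of course, block in real time.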
The amount of personally identifiable information stored by SamKnows is limited to: name, email, and address. All metadata and personally identifiable information is encrypted both at rest and in transit on SamKnows servers. It is never stored on, or transmitted to or from, the test agent.
The SamKnows Internet Measurement Platform is relied upon by ISPs, internet companies, governments, consumers and academics worldwide. It is therefore imperative that the highest possible levels of security are maintained! Every aspect of the platform is designed with security in mind.
Testing real internet experience
Our methodology when it comes to tests is to look beyond speed. As broadband connections get faster and faster, latency or lag is becoming the primary limiting factor. Testing latency to the nearest test server doesn’t tell you anything about real-world internet performance. That’s why we’ve developed hundreds of QoE tests for popular applications and services on the internet.
Quality of Service measurements
Speed tests have long been used to identify and assess the performance of networks. Poor QoS performance affects everything on the connection and is generally caused by broadband provider under-performance. But QoS measurements alone only capture part of the internet experience.
Quality of Experience measurements
QoE gives a much fuller and more realistic picture of the internet performance a user is experiencing, because these tests exercise the real services people are using. SamKnows specializes in these types of tests and can provide real-time data on how well video conferencing, online games, social media networks, and streaming services, to name a few, are performing on operators’ networks. This level of data means that real network problems affecting large populations of internet users – for example, congested Netflix caches inside an ISP's network – can be easily found and fixed. This leads to an improved service for all consumers.
Neutral and fair test targets
All speed tests need an endpoint to test against. These are called test servers. A typical browser-based speed test uses the nearest test server, which is usually quite nearby physically, typically inside your ISP's network ('on-net'), and optimized for speed so that it returns the best result possible. The problem with the nearest-test-server approach is that it’s not very representative of how well the internet performs in the real world – no one is streaming Netflix from their nearest test server.
Off-net test servers
For our regulatory studies, our methodology is to use only standardized ‘off-net’ measurement test servers. This means they are not installed inside an ISP's network; instead they sit outside it, at a neutral location. This ensures that every ISP's performance is measured under the same conditions and avoids artificially biasing results for any one ISP over another. It also means we measure the complete service that the ISP is providing, up to and including a common handover point to the wider internet. SamKnows test servers must also meet a minimum hardware and connectivity specification. This ensures that all test servers used have acceptable connectivity and are not of a standard that would harm data collection or lead to inaccurate results.
The goal is to create a realistic and fair dataset from a small sample of homes that can be used to draw conclusions about performance at a larger scale – and far more reliably than crowdsourced data can.
Each Whitebox runs a lot of tests, giving a very accurate estimate of a single household's performance. To make sure that this generalizes to the population at large, we need to make sure that a large enough number of households are sampled in each of the main broadband packages on offer in a country.
We decide on an acceptable margin of error, and create a sample plan to deliver that precision. The greater the variation in product performance, the more samples we need for an accurate overall result. Once we have enough households to get a fair and accurate population estimate, it's wasteful for us to send out any more Whiteboxes than necessary. This delicate balancing act is overseen by our data analysts and project managers.
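As a rough illustration of that balancing act, the classic normal-approximation formula relates the spread of a package's speeds to the number of households needed for a given margin of error. This is a textbook sketch with made-up figures, not SamKnows' actual sampling plan:

```python
import math

def required_sample_size(stddev: float, margin_of_error: float,
                         confidence_z: float = 1.96) -> int:
    """Households needed so the mean speed estimate falls within
    ±margin_of_error at the given confidence level (normal approximation,
    n = (z * sigma / E)^2, rounded up)."""
    n = (confidence_z * stddev / margin_of_error) ** 2
    return math.ceil(n)

# A package whose speeds vary with a 20 Mbps standard deviation,
# measured to within ±2 Mbps at 95% confidence:
print(required_sample_size(20, 2))  # → 385
```

Note how the sample size grows with the square of the variation: halving the margin of error, or doubling the spread, quadruples the number of Whiteboxes needed – which is exactly why more variable products demand larger panels.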
Geographic distribution of test agents
While geographic quotas aren't always explicitly built into the sampling plan (each geographic region multiplies the needed sample size!), we make sure that our recruitment captures the hard-to-reach parts of the countries that we work in.
Our more mature studies utilize statistical techniques to make sure that results are reported in a way that fairly represents the make-up of each package's subscriber base.
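One common technique of this kind is post-stratification: weighting each package's results by its share of the subscriber base rather than its share of the panel. A minimal sketch – the package names, speeds, and shares below are entirely hypothetical:

```python
def weighted_mean(results, subscriber_share):
    """Re-weight per-package average speeds by each package's share of
    the subscriber base, so over- or under-sampled packages don't skew
    the headline figure."""
    assert abs(sum(subscriber_share.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(results[pkg] * subscriber_share[pkg] for pkg in results)

# Hypothetical figures: two packages, each with half the subscriber base,
# regardless of how many Whiteboxes each has in the panel.
speeds = {"100M": 95.0, "40M": 38.0}
share = {"100M": 0.5, "40M": 0.5}
print(weighted_mean(speeds, share))  # → 66.5
```

The panel's raw average would drift toward whichever package happened to be over-recruited; weighting by subscriber share removes that bias from the reported figure.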
Cleaning up the data
Our data analysts pull together all of the telemetry data collected by the Whitebox to flag up any anomalous readings. We don't trust any single data point in isolation. We apply several layers of cross-checking to make sure that every measurement is assigned to the right broadband package in reporting. This is a lot of work, but it’s essential to building a consensus around how to interpret study results.
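SamKnows hasn't published its exact cross-checks, but a common robust building block for flagging anomalous readings is the median absolute deviation (MAD), which a handful of bad points can't inflate the way they would a standard deviation. A minimal sketch with made-up download readings:

```python
import statistics

def flag_anomalies(samples, threshold=3.5):
    """Flag readings far from the median, scaled by the median absolute
    deviation (MAD). The 1.4826 factor makes the MAD comparable to a
    standard deviation for normally distributed data."""
    med = statistics.median(samples)
    mad = statistics.median(abs(x - med) for x in samples)
    if mad == 0:
        return []  # no spread at all: nothing stands out
    return [x for x in samples if abs(x - med) / (1.4826 * mad) > threshold]

# 9.5 Mbps download readings with one implausible spike:
print(flag_anomalies([9.4, 9.6, 9.5, 9.3, 9.7, 48.0]))  # → [48.0]
```

A mean-and-standard-deviation check on the same data would be dragged upward by the 48.0 Mbps spike itself; the median-based version isolates it cleanly, which is why robust statistics suit this kind of telemetry screening.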
This very focused data cleaning is backed up by regular high-level monitoring of the data reported by our QoS and QoE tests. Our tests run across the entire world on millions of devices, so our test developers routinely make small adjustments that reflect the local peculiarities of application traffic in each country. We choose the most appropriate data to report whenever an adjustment happens in the middle of a measurement period.
Data analysis methodology
Our data analysts use the programming language R as an end-to-end tool. We use the same computing environment for everything: from interacting with our Google BigQuery data storage infrastructure through to making the final reports with R Markdown. This means that the methodology used to arrive at results is flexible, transparent, and repeatable.
Internet performance is getting better every day, and the foundation for anyone trying to improve it further is an accurate and reliable way to measure the difference they make. SamKnows can’t improve global internet performance alone! But our customers and the industry at large can rely on our data to help make the internet better for everyone.