As more ISPs integrate SamKnows Agents into their routers, we find ourselves dealing with a huge amount of performance data. Of course, this is nothing new to us: we’ve been measuring internet performance on behalf of governments and ISPs for over 10 years. What is new is FaultFinder, our system that helps customers discover anomalies within their networks.
In the past, our customers used our Analytics platform, or exported the data into third-party analytics tools, to find anomalies. But their success was limited by the imagination of the person operating the platform. If you don’t ask the right question, a lot can be happening on your network that you miss.
While there are machine-learning platforms that look for anomalies, these tend to be generic and ignorant of the complexities of an ISP’s network or of how the internet works. We had long dreamed of having the time to build a service that could both understand an ISP’s network and automatically identify anomalies in real time. In 2020, when the whole world shut down, the globetrotting SamKnows team were grounded. So we set about building our dream anomaly-detection product, and started sketching the design for what would become FaultFinder.
Finding faults that impact customer experience
The quality of your customers’ internet experience relies upon literally billions of interconnections that make up the incredibly diverse and wonderfully complicated internet. ISPs often have excellent visibility of the health of individual bits of equipment within their own network, but it is the interplay of all of these components working together that forms user experience.
FaultFinder monitors network performance from the perspective of the end user, via a test agent embedded in your customers’ CPE. It builds a dynamic model of normal performance for thousands of different scenarios, constantly learning and re-learning to adapt to trend shifts, and compares real-time measurements against these models. When an anomaly is detected, we can trace its root cause within your infrastructure or the wider internet, and measure the fault’s impact both on performance and on the number of customers affected.
The beauty of FaultFinder is that, apart from a little fine-tuning, it’s essentially plug and play. It needs no setup, adapts automatically to changes in your infrastructure, and lets you schedule maintenance windows so that planned work doesn’t trigger false anomalies.
FaultFinder automatically learns normal performance and builds a dynamic model that constantly re-learns to adapt to trend shifts.
FaultFinder AI monitors a live stream of your performance data 24/7, automatically detecting when things are not behaving as expected.
FaultFinder AI’s algorithms understand how problems cascade and pinpoint the root cause in a single notification, rather than triggering a flood of alerts.
Network faults have a knock-on effect. FaultFinder shows you how many households are affected and how much the fault is degrading your customers’ experience. This contextual information helps you prioritize the most urgent faults to fix.
FaultFinder visualizes the anomalies it finds in a chart, like the one pictured above. The blue line indicates ‘normal’ performance, based on previous observations. The light blue band around the blue line marks the normal trend range: any measurement that falls outside it is recognized by FaultFinder as an anomaly, shown as a pink line straying outside the threshold. Whenever deviations from the model occur, FaultFinder groups together the anomalies that share a common root cause and generates a single, insightful alert.
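As a rough illustration of the chart’s baseline and band, the sketch below flags any point that falls outside a rolling mean ± k standard deviations of the samples preceding it. This is a deliberately simple stand-in, not FaultFinder’s model: the function name `flag_anomalies`, the 24-sample window, and `k = 3.0` are hypothetical choices made for the example.

```python
from statistics import mean, stdev


def flag_anomalies(values, window=24, k=3.0):
    """Return indices of values outside a rolling mean +/- k * stdev band.

    Hypothetical illustration only. The baseline for each point is built
    from the `window` samples before it, so the band shifts as the trend
    shifts.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]   # previous observations only
        centre = mean(history)           # the 'normal' line
        spread = stdev(history)          # width of the normal band
        lower, upper = centre - k * spread, centre + k * spread
        if not (lower <= values[i] <= upper):
            anomalies.append(i)          # point strays outside the band
    return anomalies
```

Because each point is judged only against the samples before it, the band keeps re-learning as new data arrives, loosely mirroring the behavior described above.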
To better isolate the specific anomaly, you can adjust chart settings in a few ways:
- Adjusting the date range
- Switching between affected metrics
- Switching between measurement result, success rate, sample size, and agent count
Enhance FaultFinder with network metadata
The more information you give FaultFinder, the better it gets. Adding network metadata to your test agents, such as the telephone exchange each unit uses or the CMTS inside that exchange, provides more context for accurately tracing a fault to its root cause and measuring its effects.
Increasing the number of test agents in your panel will also improve the accuracy of FaultFinder and its assumptions about the impact of a fault.
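To show how shared metadata can collapse many alerts into one, here is a minimal sketch assuming each agent is tagged with hypothetical `exchange` and `cmts` fields (CMTS names assumed unique across exchanges). It reports the broadest network element behind which at least `min_share` of the agents are anomalous, along with how many agents are affected. The grouping rule and the function name `likely_root_causes` are illustrative assumptions, not FaultFinder’s actual algorithm.

```python
from collections import defaultdict


def likely_root_causes(agents, anomalous, levels=("exchange", "cmts"), min_share=0.8):
    """Group anomalous agents under shared network elements (hypothetical rule).

    `agents` maps agent id -> metadata dict; `anomalous` is a set of agent ids.
    Levels are checked broadest first. An element is reported when at least
    `min_share` of the agents behind it are anomalous; agents already explained
    by a broader element are not reported again, so one fault gives one alert.
    """
    explained = set()
    alerts = []
    for level in levels:
        # Collect which agents sit behind each element at this level.
        members = defaultdict(set)
        for agent_id, meta in agents.items():
            members[meta[level]].add(agent_id)
        for element, ids in sorted(members.items()):
            affected = ids & anomalous
            new = affected - explained
            if new and len(affected) / len(ids) >= min_share:
                # (level, element, number of affected agents)
                alerts.append((level, element, len(new)))
                explained |= new
    return alerts
```

Agents not covered by any qualifying element are simply not reported in this sketch.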
Types of Anomalies
These are some of the kinds of anomalies FaultFinder can identify:
- Sample size: if the sample size drops, tests are failing to run or to reach the agents at all, which indicates a problem.
- Success rate: the tests run, but a certain percentage report back as failed.
- Measurement value: the tests still run and succeed, but the measured value or rate is very different. For example, if download speed drops from 200 Mbps to 10 Mbps, the cause may be congestion.
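The three kinds of anomaly can be told apart by comparing a window of results against a baseline. The sketch below is hypothetical: the `Window` fields, the `classify` function, and the fixed thresholds stand in for whatever learned baselines FaultFinder actually uses.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Aggregated results for one metric over one time window (hypothetical)."""
    sample_size: int  # tests that ran and reported back
    succeeded: int    # of those, tests that did not fail
    value: float      # typical measured value, e.g. download speed in Mbps


def classify(current, baseline, size_drop=0.5, fail_rise=0.1, value_change=0.5):
    """Label which kinds of anomaly `current` shows relative to `baseline`.

    Thresholds are illustrative fixed fractions, not learned values.
    """
    labels = []
    # Sample size: far fewer results than usual means tests are not running
    # or cannot reach the agents.
    if current.sample_size < baseline.sample_size * (1 - size_drop):
        labels.append("sample size")
    # Success rate: tests run, but a larger share report back as failed.
    current_rate = current.succeeded / current.sample_size if current.sample_size else 0.0
    baseline_rate = baseline.succeeded / baseline.sample_size
    if baseline_rate - current_rate > fail_rise:
        labels.append("success rate")
    # Measurement value: tests succeed, but the measured value moves a lot.
    if abs(current.value - baseline.value) > value_change * baseline.value:
        labels.append("measurement value")
    return labels
```

For example, against a baseline of 1,000 samples, 990 successes and 200 Mbps, a window of 995 samples, 985 successes and 10 Mbps returns `["measurement value"]`, matching the congestion case above.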
Get in touch to find out more
- Save time setting up network performance alerts
- Discover faults before customers start complaining
- Trace faults back to the root cause
- Understand the impact of faults on customer experience