
Data API

The Data API provides read-only access to raw and aggregated metric results, similar to those you might find in SamKnows One’s charting, as well as to the raw test results collected from our various measurement agents.

It can be used in two primary ways: through the SamKnows One UI, where you can download CSVs of results, or through the API directly, for clients who wish to integrate our data into their internal platforms or automated systems.

Please read the general API documentation here before continuing.

SamKnows One UI

Within SamKnows One you have a request builder, links to this documentation, and the ability to manage the Data API authentication key for your organisation.

The request builder allows you to select which type of data you wish to export, then use the normal SamKnows One chart builder to produce either downloadable CSVs or example requests for using the API directly, outside of the SamKnows One UI.

Types of data

We have two different types of data you can export using the Data API: metric data and test data.

  • Metric data is very similar to the data you would get from a CSV export in SamKnows One's charting functionality. It uses our specially derived metrics, and with it you can use much of the power of SamKnows One, including aggregation (giving you averages, standard deviations, percentiles and confidence interval values) as well as functionality such as prefiltering, normalisation, splitting data into multiple series and advanced filtering. You can view this data either as unaggregated (raw) results or in aggregated form.
  • Test data is the detailed raw, unfiltered and unprocessed output from the tests that execute on our various measurement agents (Whitebox, CPE, Mobile and Web). This data can be useful for deep dives and is significantly more detailed, but also a lot more complex to use.

We also have various metadata endpoints to aid your use of the API such as the various metrics and tests accessible to you, and the various pieces of metadata you can filter or split data by.

API & endpoints

Development

Developer documentation is available below. URLs for production and development purposes will be provided upon commencement of a project. Authentication tokens are managed through the SamKnows One UI.
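
For clients integrating directly, every request carries your token in an Authorization header alongside an Accept header. A minimal sketch in Python using the requests library (the base URL and token below are placeholders, not real values):

        import requests

        # Placeholders: the real base URL is provided at project commencement and
        # the token is managed through the SamKnows One UI.
        BASE_URL = "https://data-api.example.samknows.com"
        TOKEN = "your-data-api-token"

        session = requests.Session()
        session.headers.update({"Authorization": TOKEN, "Accept": "application/json"})

        # Any endpoint can then be called relative to the base URL, for example:
        response = session.get(f"{BASE_URL}/metrics")
        response.raise_for_status()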

Performance and capacity

We actively manage capacity planning to ensure that the API can handle as many simultaneous connections/sessions as required by our clients.

Basic analytics queries (including via the Data API and in the SamKnows One web interface) for small date ranges and small data sets will complete within a few hundred milliseconds in almost all situations. Spikes in load can occasionally cause higher response times, but this is mitigated by careful monitoring and extensive capacity planning. Query times depend on the number of measurement agents reporting and the date range being requested.

Logging

All analytics queries are logged internally by SamKnows for auditing and analytics purposes.

Overview

Data endpoints

  • Metrics Data Endpoint (JSON) [POST /metric_data] - Fetch data for specific metrics
  • Metrics Data Endpoint (CSV) [POST /metric_data.csv] - Generate a CSV for the above endpoint and return a link
  • Test Data Endpoint (JSON) [POST /test_data] - Fetch raw test data
  • Test Data Endpoint (CSV) [POST /test_data.csv] - Generate a CSV for the above endpoint and return a link

Metadata endpoints

  • Metrics endpoint [GET /metrics] - Get a list of all metrics
  • Tests endpoint [GET /tests] - Get a list of all tests
  • Splittables endpoint [GET /splits] - Get a list of everything you can split by when looking at metric data
  • Filterables endpoint [GET /filters] - Get a list of everything you can filter by, and links for auto-complete values when looking at metric data

Metric data endpoint

Request [POST /metric_data]

+ Request (application/json)

          + Headers

                  Authorization: {token}
                  Accept: application/json

          + Body

                  {
                      "metric": "httpget",
                      "chartType": "aggregate",
                      "aggregation": "hourly",
                      "normalised": true,
                      "prefilter": false,
                      "splits": [
                          "package"
                      ],
                      "filters": [
                          {
                              "id": "package",
                              "filterType": "1",
                              "filterValues": [
                                  "6737",
                                  "6659"
                              ]
                          }
                      ],
                      "time": {
                          "from": "2017-05-04 23:00",
                          "to": "2017-05-07 04:00"
                      }
                  }

| Field | Description | Type | Example | Required? |
|---|---|---|---|---|
| metric | The key for the metric you wish to fetch. All the available metrics are detailed on the metrics endpoint. | string | `dnsFail` | Yes |
| chartType | The chart type, `raw` or `aggregate`. | string | `aggregate` | Yes |
| aggregation | If you selected an aggregate chart type, what to aggregate by | string | `hourly` | Required for `aggregate` |
| normalised | Toggle for enabling normalisation, see below for more information. | boolean | `true` | Yes |
| prefilter | Toggle for enabling prefiltering, see below for more information. | boolean | `true` | Yes |
| splits | Array of pieces of metadata to split by | list | `["package","isp"]` | No |
| filters | Filters are detailed in more depth below, please see there for more information | list of objects | `[{"id": "package","filterType": "1","filterValues": ["6737"]}]` | No |
| time | The local time range to return data from | object consisting of to/from strings | `{"from": "2017-05-04 15:00","to": "2017-05-07 03:00"}` | No |

All fields are documented in more depth below.
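
For clients calling the API directly, the request above can be issued with any HTTP client. A minimal sketch using Python's requests library (the base URL and token are placeholders; real URLs are provided when your project commences):

        import requests

        BASE_URL = "https://data-api.example.samknows.com"  # placeholder
        TOKEN = "your-data-api-token"

        payload = {
            "metric": "httpget",
            "chartType": "aggregate",
            "aggregation": "hourly",
            "normalised": True,
            "prefilter": False,
            "splits": ["package"],
            "time": {"from": "2017-05-04 23:00", "to": "2017-05-07 04:00"},
        }

        response = requests.post(
            f"{BASE_URL}/metric_data",
            json=payload,
            headers={"Authorization": TOKEN, "Accept": "application/json"},
        )
        response.raise_for_status()
        body = response.json()
        print(body["code"], len(body["data"][0]["metricData"]), "data points")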

"code": "OK",

"message": "Request successful",

"data": [

{

"metricData": [

{

"metricValue": 23031.16,

"dtime": "2018-01-03T00:00:00",

"stdDev": 5718.58,

"sampleCount": 159,

"unitCount": 152,

"ci90": 594.41,

"ci95": 909.12,

"ci99": 1194.75,

"min": 10296,

"max": 43483,

"percentiles": {

"99": 38968,

"1": 12036,

"25": 18734,

"20": 18734,

"5": 14526,

"90": 30505,

"50": 22667,

"95": 33040,

"75": 26641,

"80": 26641,

"10": 16054

},

},

{

"metricValue": 22738.24,

"dtime": "2018-01-03T01:00:00",

"stdDev": 5526.6,

"sampleCount": 168,

"unitCount": 164,

"ci90": 553.04,

"ci95": 845.85,

"ci99": 1111.6,

"min": 10097,

"max": 39356,

"percentiles": {

"99": 38159,

"1": 11250,

"25": 19283,

"20": 19283,

"5": 14559,

"90": 29712,

"50": 22090,

"95": 32495,

"75": 26193,

"80": 26193,

"10": 15496

},

}

]

}

]

}</code></pre><p>Within the metric data blob you will see the following key-value pairs:</p><table class="table"><thead><tr><th>Field</th><th>Description</th><th>Example</th></tr></thead><tbody><tr><td>metricValue</td><td>The primary metric value, it is usually an average for most metrics</td><td><code>74768.46</code></td></tr><tr><td>dtime</td><td>The datetime representation of the beginning of the period represented by this data point (time series aggregates only)</td><td><code>2018-01-03T05:00:00</code> or <code>2018-01-03</code></td></tr><tr><td>aggregatedNumber</td><td>When aggregating by something that's not time e.g. hour of day, this is the value of the group</td><td><code>1</code></td></tr><tr><td>stdDev</td><td>Population Standard Deviation</td><td><code>9620041.6</code></td></tr><tr><td>sampleCount</td><td>Sample count (of tests)</td><td><code>1016</code></td></tr><tr><td>agentCount</td><td>Number of measurement agents that reported contributing results</td><td><code>663</code></td></tr><tr><td>ci90</td><td>90th Confidence Interval</td><td><code>478783.08</code></td></tr><tr><td>ci95</td><td>95th Confidence Interval</td><td><code>732278.46</code></td></tr><tr><td>ci99</td><td>99th Confidence Interval</td><td><code>962348.39</code></td></tr><tr><td>min</td><td>Minimum</td><td><code>0</code></td></tr><tr><td>max</td><td>Maximum</td><td><code>117136375</code></td></tr><tr><td>median</td><td>Median</td><td><code>4671160</code></td></tr><tr><td>percentiles</td><td>Array of common percentile values</td><td><code>"99": 38159, "1": 11250, "25": 19283, "5": 14559, "90": 29712, "50": 22090, "95": 32495, "75": 26193, "10": 15496</code></td></tr><tr><td>{split}</td><td>Columns containing the values of data when it has been split</td><td><code>38x10</code></td></tr></tbody></table><p>The units vary from metric to metric, and this data is provided on the metrics

The units vary from metric to metric, and this data is provided on the metrics endpoint; for example, the `httpget` and `httppost` metrics that produce download/upload speeds are measured in bytes per second.
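
Where a metric is reported in bytes per second, converting to megabits per second is simple arithmetic (multiply by 8, divide by 1,000,000). A small sketch, assuming the value is a throughput in bytes per second:

        def bytes_per_sec_to_mbps(value: float) -> float:
            """Convert a throughput in bytes per second to megabits per second."""
            return value * 8 / 1_000_000

        # e.g. the raw metricValue 27426340 from the raw-data example below
        print(round(bytes_per_sec_to_mbps(27426340), 1))  # 219.4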

"code": "OK",

"message": "Request successful",

"data": [

{

"metricData": [

{

"metricValue": 27426340,

"dtime": "2018-01-08T00:00:20",

"agent": "06a68b00-2c35-11a7-b11a-0aa47a6ababd",

"unit": "827829"

},

{

"metricValue": 27427038,

"dtime": "2018-01-08T00:00:38",

"agent": "06a68b00-2c35-11a7-b11a-0aa47a6ababd",

"unit": "827829"

}

]

}

]

}</code></pre><p>Within the <code>metric_data</code> blob you will see the following key value pairs:</p><table class="table"><thead><tr><th>Field</th><th>Description</th><th>Example</th></tr></thead><tbody><tr><td>metricValue</td><td>The raw metric value. See Metric Types</td><td><code>74768.46</code></td></tr><tr><td>dtime</td><td>The datetime the test ended</td><td><code>2018-01-03T00:00:00</code></td></tr><tr><td>agent</td><td>The agent ID of the reporting agent</td><td><code>06a68b00-2c35-11a7-b11a-0aa47a6ababd</code></td></tr><tr><td>unit</td><td>The unit ID of the reporting agent</td><td><code>827829</code></td></tr><tr><td>{split}</td><td>Columns containing the values of data when it has been split</td><td><code>38x10</code></td></tr></tbody></table><h4 id="metrics">Metrics</h4><p>We have a huge variety of metrics available extracting useful data from our test

Metrics

We have a huge variety of metrics available, extracting useful data from our test results. Some metrics are only available when aggregated. Some metrics also have additional filters and splits available (such as httpget and httppost, which you can filter and split by thread count or IP version). All the available metrics are detailed on the metrics endpoint.

Normalisation

Normalisation functionality is available when retrieving aggregate time series data. This is useful when plotting results by hour, but the test schedule does not run tests every hour across every agent. Without normalisation, users will often see a saw-toothing effect, with some hours having measurement agents with much better results than other hours simply due to agent distribution. Normalisation forces our analytics engine to look at the underlying test schedule for the metric in question, and to normalise results across the blocks of hours that the tests should run over. This results in an even number of measurement agents and results being represented for all hours of the day, thus removing the spikes in the charts.

Filtering

You can filter the outputted data in a number of ways:

  • Date / Time
  • Prefilters
  • Custom defined metadata
  • Standardised metadata

All filters except date and time ranges can be inclusive (include only those specified) or exclusive (exclude those specified). Date and time ranges are always inclusive (only the range you specify).

Date time

You must specify at least the date range over which you wish to retrieve data.

Datetimes used in to/from fields can be described in the following formats; all dates and times refer to measurement agents' local time. They will be used exactly as stated, with no rounding, and the stated times would therefore be included.

  • YYYY-MM-DD HH:MM:SS (e.g. 2018-01-01 10:15:30)
  • YYYY-MM-DD HH:MM (e.g. 2018-01-01 10:00)

Prefilters

Our prefilter functionality uses a database of manually specified prefilters that preclude metric results that are extremely unlikely or impossible, in addition to utilising our machine learning predictive algorithms and models in order to try and filter out data which may be considered to be unreliable or 'bad' data.

Standardised metadata

There are a number of fields that are always available to filter by. For routers & Whiteboxes they are:

| Field | Description |
|---|---|
| target_set | The SamKnows or client-specified target groups |
| package | The specified package/product the agent is on |
| country | The ability to filter down to results from a particular country |
| target | A specific target / test node |
| agent | Filtering down test results by Agent ID |
| unit | Filtering down test results by Unit ID |
| isp | The unit's associated ISP |
| base | The device model |

For mobile they are:

| Field | Description |
|---|---|
| model | Phone model |
| geo_country | The country we identify the device as being in |
| manufacturer | The device manufacturer |
| connection_type | Wi-Fi or Cellular, whichever connection type was used during the test |
| cellular_technology | The current cellular technology available during the test run, e.g. 3G / 4G |
| carrier_name | The carrier name |
| operating_system_version | The version of the OS (Android or iOS version) |

Chart types

The valid chart types are:

  • raw - This will give you a row/object per data point
  • aggregate - This will allow you to do aggregation, most commonly aggregating by an element of time

Aggregations

If you have specified `aggregate` as your chart type then you may also specify an aggregation.

The following aggregations are available that aggregate by time in chronological order:

  • `hourly` - e.g. 2018-01-01 22:00 (until 22:59), 2018-01-01 23:00 (until 23:59), 2018-01-02 00:00 (until 00:59), 2018-01-02 01:00 (until 01:59)
  • `two_hourly` - e.g. 2018-01-01 00:00 (until 01:59), 2018-01-01 02:00 (until 03:59), 2018-01-01 04:00 (until 05:59)
  • `four_hourly` - e.g. 2018-01-01 00:00 (until 03:59), 2018-01-01 04:00 (until 07:59), 2018-01-01 08:00 (until 11:59)
  • `six_hourly` - e.g. 2018-01-01 00:00 (until 05:59), 2018-01-01 06:00 (until 11:59), 2018-01-01 12:00 (until 17:59), 2018-01-01 18:00 (until 23:59)
  • `daily` - e.g. 05/01/18, 04/01/18, 03/01/18
  • `weekly` - e.g. 2018-01-01, 2018-01-08, 2018-01-15
  • `monthly` - e.g. 2018-01-01, 2018-02-01, 2018-03-01
  • `quarterly` - e.g. 2018-01-01, 2018-04-01, 2018-07-01, 2018-10-01

The following aggregations are available that use an element of time (e.g. the date or the hour of the day):

  • `hour_of_day` - e.g. 00:00 to 00:59; 01:00 to 01:59
  • `day_of_week` - e.g. 1 for Monday, 2 for Tuesday, 7 for Sunday
  • `day_of_month` - e.g. 1 for the 1st, 5 for the 5th

The following other aggregations are available:

  • `total` - This aggregates everything in a split group into a single object/row

Splits

Splits allow you to, when aggregating, separate out data into different data series. For example, if you are doing an `hour_of_day` aggregation and split by `package` you might see:

  • Time: 4pm - 5pm, Package: 38x10, Average: 37.4
  • Time: 4pm - 5pm, Package: 76x25, Average: 75.4
  • Time: 5pm - 6pm, Package: 38x10, Average: 37.1
  • Time: 5pm - 6pm, Package: 76x25, Average: 74.1
  • Time: 6pm - 7pm, Package: 38x10, Average: 36.2
  • Time: 6pm - 7pm, Package: 76x25, Average: 74.2
  • Time: 7pm - 8pm, Package: 38x10, Average: 36.3
  • Time: 7pm - 8pm, Package: 76x25, Average: 74.3
  • Time: 8pm - 9pm, Package: 38x10, Average: 37.1
  • Time: 8pm - 9pm, Package: 76x25, Average: 75.1

You can find out the metadata you can split by from the split metadata endpoint. An example request combining an hour_of_day aggregation with a package split is sketched below.
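
As an illustrative sketch (the metric and time range here are arbitrary), such a request body might look like this:

        {
            "metric": "httpget",
            "chartType": "aggregate",
            "aggregation": "hour_of_day",
            "normalised": true,
            "prefilter": false,
            "splits": [
                "package"
            ],
            "time": {
                "from": "2018-01-01 00:00",
                "to": "2018-01-31 23:59"
            }
        }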

Test data endpoint

Request [POST /test_data]

+ Request (application/json)

          + Headers

                  Authorization: {token}
                  Accept: application/json

          + Body

                  {
                      "test": "httpget",
                      "date": {
                          "from": "2017-05-04",
                          "to": "2017-05-07"
                      }
                  }

| Field | Description | Example |
|---|---|---|
| test | This is the key that refers to the test you wish to retrieve data on | `dns` |
| date | The date range you wish to retrieve data for (UTC) | `"from": "2018-01-01","to": "2018-01-05"` |

You can find the value for the panel id in SamKnows One. This essentially corresponds to your data source / data subscription. Other fields are documented in more depth below.
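
A direct call to this endpoint follows the same pattern as the metric data endpoint. A minimal sketch using Python's requests library (the base URL and token are placeholders):

        import requests

        BASE_URL = "https://data-api.example.samknows.com"  # placeholder
        TOKEN = "your-data-api-token"

        payload = {
            "test": "httpget",
            "date": {"from": "2017-05-04", "to": "2017-05-07"},
        }

        response = requests.post(
            f"{BASE_URL}/test_data",
            json=payload,
            headers={"Authorization": TOKEN, "Accept": "application/json"},
        )
        response.raise_for_status()
        rows = response.json()["data"]
        print(f"{len(rows)} test results returned")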

Response 200

{
        "code": "OK",
        "message": "Request successful",
        "data": [
          {
                "unit_id": 32477,
                "agent_id": "91f7f827-2c33-11e7-b97c-0cc47a6beaccc",
                "dtime": "2018-01-03 12:23:30.000",
                "base": "skwb8",
                "ddate": "2018-01-03",
                "target": "n11-the1.samknows.com",
                "address": "77.75.107.173",
                "fetch_time": 10000503,
                "bytes_total": 205720256,
                "bytes_sec": 20570991,
                "bytes_sec_interval": 20570991,
                "warmup_time": 5000238,
                "warmup_bytes": 96839378,
                "sequence": 0,
                "threads": 1,
                "tcp_retransmissions": 36,
                "successes": 1,
                "failures": 0,
                "ip_version": 4,
                "target_group": "Off-net"
          }
        ]
      }

Within the data objects you will see all the various columns and values we store. These values vary from test to test; descriptions, and the tests available to you, are provided on the tests endpoint.

You will see the following fields across many tests, so they are not documented per test:

| Field | Description | Example |
|---|---|---|
| base | The device model | `skwb8` |
| ddate | The date the test took place on | `2018-01-01` |
| dtime | The datetime the test ended | `2018-01-03 12:23:30.000` |
| agent_id | The agent ID of the reporting agent | `06a68b00-2c35-11a7-b11a-0aa47a6ababd` |
| unit_id | The unit ID of the reporting agent | `827829` |
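
If you are post-processing these results, one optional approach is to load the data array into a pandas DataFrame. A sketch, assuming rows holds the parsed "data" array from a /test_data response:

        import pandas as pd

        # rows = response.json()["data"]  (the parsed "data" array from /test_data)
        df = pd.DataFrame(rows)
        df["dtime"] = pd.to_datetime(df["dtime"])  # parse the test end time
        # e.g. average download throughput per target, in bytes per second
        print(df.groupby("target")["bytes_sec"].mean())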

Date range filtering

Date and time ranges are always inclusive (only the range you specify), and you must specify a date range. Data formats: dates used in to/from fields can be described in any of the following formats and must always be UTC. When used as a from, the value will be rounded down (e.g. 2018-01 would include from 2018-01-01 00:00:00); when used as a to, it will be rounded up (e.g. 2018-04 would include up to 2018-04-30 23:59:59). The stated dates/days would therefore be included. A short example using the relative formats appears after the list below.

  • YYYY-MM-DD (e.g. 2018-01-01)
  • YYYYMMDD (e.g. 20180101)
  • YYYY-MM (e.g. 2018-01)
  • YYYY (e.g. 2018)
  • +x days (e.g. +5 days meaning 5 days in the future)
  • -x days (e.g. -5 days meaning 5 days ago)
  • last|next|this DAY (e.g. last Monday)
  • now
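
For example (a sketch using the relative formats above), a rolling request for the last week of data could specify:

        "date": {
            "from": "-7 days",
            "to": "now"
        }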

Metadata endpoints

Request [GET /metrics]

This will get an array of the metrics (for metric data) available to you and their units.

+ Request (application/json)

          + Headers

              Authorization: {token}
              Accept: application/json

Response 200

{
        "code": "OK",
        "message": "Request successful",
        "data": [
          {
            "key": "httpget",
            "unit": "mbps"
          },
          {
            "key": "httpget_retrans",
            "unit": "count"
          },
          {
            "key": "jitterDown",
            "unit": "ms"
          },
          {
            "key": "dnsResponse",
            "unit": "ms"
          },
          {
            "key": "dnsFail",
            "unit": "%"
          },
          {
            "key": "webgetAvg",
            "unit": "s"
          }
        ]
      }
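
A small sketch of how a client might turn this response into a key-to-unit lookup (base URL and token are placeholders, as before):

        import requests

        BASE_URL = "https://data-api.example.samknows.com"  # placeholder
        TOKEN = "your-data-api-token"

        response = requests.get(
            f"{BASE_URL}/metrics",
            headers={"Authorization": TOKEN, "Accept": "application/json"},
        )
        response.raise_for_status()
        units_by_metric = {m["key"]: m["unit"] for m in response.json()["data"]}
        print(units_by_metric.get("httpget"))  # e.g. "mbps"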

Request [GET /tests]

This will get an array of the tests (for test data) available to you, per measurement agent type.

+ Request (application/json)

          + Headers

              Authorization: {token}
              Accept: application/json

Response 200 (for units)

{
        "code": "OK",
        "message": "Request successful",
        "data": [
          "httpget",
          "httppost",
          "dns",
          "netflix",
          "ping",
          "jitter",
          "latency",
          "webget",
          "youtube",
          "network_usage",
          "traceroute"
        ]
      }

Response 200 (for mobile)

{
        "code": "OK",
        "message": "Request successful",
        "data": [
          "mobile_download",
          "mobile_latency",
          "mobile_upload",
          "mobile_www",
          "mobile_youtube"
        ]
      }

CSVs

Examples throughout this document reference the JSON format, but by appending .csv to the endpoint you can receive raw CSV output:

  • Metrics Data Endpoint (CSV) [POST /metric_data.csv] - Generate a CSV for the above endpoint and return a link
  • Test Data Endpoint (CSV) [POST /test_data.csv] - Generate a CSV for the above endpoint and return a link

If you would prefer, we can upload the CSV to Amazon S3 and you can download it from there. To do so, use one of the following two endpoints:

  • Metrics Data Download Endpoint [POST /metric_data_download] - Generate a CSV of metric data and upload it to S3
  • Test Data Download Endpoint [POST /test_data_download] - Generate a CSV of raw test data and upload it to S3
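
As a sketch of the download flow (the exact response shape of these endpoints is not documented here, so the field holding the S3 link is an assumption; adjust it to the response you actually receive):

        import requests

        BASE_URL = "https://data-api.example.samknows.com"  # placeholder
        TOKEN = "your-data-api-token"

        payload = {"test": "httpget", "date": {"from": "2017-05-04", "to": "2017-05-07"}}
        response = requests.post(
            f"{BASE_URL}/test_data_download",
            json=payload,
            headers={"Authorization": TOKEN, "Accept": "application/json"},
        )
        response.raise_for_status()
        # Assumption: the JSON response includes a link to the generated CSV on S3.
        csv_url = response.json()["data"]["url"]  # hypothetical field name

        with requests.get(csv_url, stream=True) as csv_response:
            csv_response.raise_for_status()
            with open("test_data.csv", "wb") as f:
                for chunk in csv_response.iter_content(chunk_size=65536):
                    f.write(chunk)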