Driftnet API

A comprehensive RESTful JSON API.


Internet Scans


Overview

Driftnet's Internet Scan data contains reports gathered by visiting open internet services using their IP and port.

Searching by IP

Internet scan data is returned by the scan/protocols endpoint. The simplest way to search internet scan data is by IP address.

curl -s -H 'Authorization: Bearer <your-api-token>' \
     'https://api.driftnet.io/v1/scan/protocols?ip=8.8.8.8' \
  | jq . \
  | less -S
{
  "page": 0,
  "pages": 22,
  "result_count": 2174,
  "results": [
    {
      "date": "2019-05-13",
      "id": "hJgzWy25TfuNfhvgMzs8Tw",
      "items": [
        {
          "context": "",
          "is_metadata": true,
          "type": "ip",
          "value": "8.8.8.8"
        },
        {
          "context": "",
          "is_metadata": true,
          "type": "port-tcp",
          "value": "443"
        },
        ...

In this example, there are 2174 results for IP address 8.8.8.8. The API returns results in batches of 100, so there are 22 pages in total. The newest results are returned first. To retrieve another page of results, use the page= parameter. Page numbering starts at zero.
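
As a rough sketch of the paging arithmetic, the following Python derives the page count from result_count and builds one request URL per page. The helper names page_count and page_urls are illustrative and not part of the Driftnet API; the batch size of 100 is taken from the text above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.driftnet.io/v1/scan/protocols"
PAGE_SIZE = 100  # batch size stated in the text above


def page_count(result_count, page_size=PAGE_SIZE):
    """Number of pages needed to hold result_count results."""
    return -(-result_count // page_size)  # ceiling division on integers


def page_urls(params, result_count):
    """Yield one request URL per page; page numbering starts at zero."""
    for page in range(page_count(result_count)):
        yield BASE_URL + "?" + urlencode({**params, "page": page})


urls = list(page_urls({"ip": "8.8.8.8"}, 2174))
print(len(urls))  # number of pages for the example above
print(urls[-1])   # URL of the last page
```

In practice the pages field of the first response already tells you how many pages there are, so the arithmetic is only a cross-check.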

Each result takes the form of a report. A report has a date stamp, a unique ID, and a collection of items. Each item contains:

  • type: The type of data being displayed, e.g. ip or host.
  • context: The context in which the value was seen, e.g. cert-dns-name for an ip or host seen inside an X.509 certificate.
  • value: The actual data value.
  • is_metadata: true if the data came from inside the collection system (e.g. an enrichment), false if it was collected from the external environment.
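
To make the item structure concrete, here is a minimal Python sketch that pulls values out of a report by type and, optionally, by context. The function name values_of is hypothetical, and the sample report contains only the two items visible in the example response above.

```python
def values_of(report, item_type, context=None):
    """Return the values of all items with the given type (and context, if given)."""
    return [
        item["value"]
        for item in report["items"]
        if item["type"] == item_type
        and (context is None or item["context"] == context)
    ]


# Only the two items shown in the example response; the rest is elided there.
report = {
    "date": "2019-05-13",
    "id": "hJgzWy25TfuNfhvgMzs8Tw",
    "items": [
        {"context": "", "is_metadata": True, "type": "ip", "value": "8.8.8.8"},
        {"context": "", "is_metadata": True, "type": "port-tcp", "value": "443"},
    ],
}

print(values_of(report, "ip"))
print(values_of(report, "port-tcp"))
```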

The ip parameter also accepts a CIDR range. Setting ip=8.8.8.0/24 searches the entire /24 address range, i.e. all addresses from 8.8.8.0 to 8.8.8.255 inclusive.
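
If you want to check which addresses a CIDR range covers before you query it, Python's standard ipaddress module can do the arithmetic locally. This is only a client-side sanity check; the helper name cidr_bounds is illustrative.

```python
import ipaddress


def cidr_bounds(cidr):
    """Return (first address, last address, address count) for a CIDR range."""
    network = ipaddress.ip_network(cidr)
    return str(network[0]), str(network[-1]), network.num_addresses


print(cidr_bounds("8.8.8.0/24"))
```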

Including Indirect IPs

By default, a search by IP only matches results where the searched IP is the one that was scanned. You might also want to match items of type ip in any context, i.e. to include results that reference the searched IP address in some other way (usually through an X.509 certificate). To get these results, add the indirect=true qualifier:

curl -s -H 'Authorization: Bearer <your-api-token>' \
     'https://api.driftnet.io/v1/scan/protocols?ip=8.8.8.8&indirect=true' \
  | jq . \
  | less -S
{
  "page": 0,
  "pages": 90,
  "result_count": 8922,
  "results": [
    {
      "date": "2019-05-13",
      "id": "AR1NAVUQSMuy7NMQweNPgw",
      "items": [
        {
          "context": "",
          "is_metadata": true,
          "type": "ip",
          "value": "45.160.122.135"
        },
        {
          "context": "",
          "is_metadata": true,
          "type": "port-tcp",
          "value": "853"
        },
        ...

With indirect=true set, the results also include services that present a certificate valid for IP 8.8.8.8 but are not themselves hosted on 8.8.8.8.
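
When working through indirect results, it can help to separate direct hits from indirect ones on the client side. The sketch below assumes that the scanned address is the metadata item of type ip with an empty context, as in the example responses; that is an inference from the examples, not a documented guarantee. The function names are hypothetical, and the check only makes sense for single-address searches.

```python
def scanned_ip(report):
    """Return the scanned IP: assumed to be the metadata 'ip' item with empty context."""
    for item in report["items"]:
        if item["type"] == "ip" and item["context"] == "" and item["is_metadata"]:
            return item["value"]
    return None


def is_indirect(report, searched_ip):
    """True if the report was collected from an address other than searched_ip."""
    return scanned_ip(report) != searched_ip


# The first items of the two example responses shown in this section.
direct = {"id": "hJgzWy25TfuNfhvgMzs8Tw", "items": [
    {"context": "", "is_metadata": True, "type": "ip", "value": "8.8.8.8"},
]}
indirect = {"id": "AR1NAVUQSMuy7NMQweNPgw", "items": [
    {"context": "", "is_metadata": True, "type": "ip", "value": "45.160.122.135"},
]}

print(is_indirect(direct, "8.8.8.8"), is_indirect(indirect, "8.8.8.8"))
```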

Field Searches

You can search internet scan data by any type field. For instance, to find all results with a server-banner containing cherrypy:

curl -s -H 'Authorization: Bearer <your-api-token>' \
     'https://api.driftnet.io/v1/scan/protocols?field=server-banner:cherrypy' \
  | jq . \
  | less -S

When using a field= search like this, the API will tokenize the search term and match it (case-insensitively) anywhere in the value.

You can search for any type field that you discover in the UI. Some other commonly-searched type fields:

  • http-header: To search within returned HTTP headers.
  • title: To search for the HTML title from a surveyed page.
  • host: To search for a hostname, seen anywhere. Hostname matches are right-anchored, so field=host:example.com will match foo.bar.example.com, etc. As a special case, to search for a host field within a URL, also set host_in_url=true.
  • issuer, subject: TLS certificate issuer/subject fields.
  • url: To search for a URL, seen anywhere. URL searches also support host queries, so field=url:example.com will match https://foo.bar.example.com/abc/def, etc.
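
To illustrate what right-anchored hostname matching means, here is a small local sketch. It assumes that matching happens on whole DNS labels and ignores case; the text above does not state either detail, so treat this as an approximation of the behaviour rather than the API's actual implementation. The function name is hypothetical.

```python
def right_anchored_match(query, hostname):
    """True if hostname ends with the labels of query (assumed label-wise, case-insensitive)."""
    query_labels = query.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    return (
        len(host_labels) >= len(query_labels)
        and host_labels[-len(query_labels):] == query_labels
    )


print(right_anchored_match("example.com", "foo.bar.example.com"))
print(right_anchored_match("example.com", "example.com.other.net"))
```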

If you don't know which type field to look in, you can use the query= parameter to omit it entirely.

curl -s -H 'Authorization: Bearer <your-api-token>' \
     'https://api.driftnet.io/v1/scan/protocols?query=cherrypy' \
  | jq . \
  | less -S

The query= parameter is slow, so avoid using it routinely. Prefer field= or keyword= whenever you know which type field to search.

If you would like to search for your query term as a prefix, set prefix=true on the API call.

To get a degree of sloppy matching, use the slop= parameter. For instance, setting slop=1 would allow a query for university london to match university of london.

Keyword Searches

If you know exactly what you are looking for, you can get more precision using the keyword= parameter.

curl -s -H 'Authorization: Bearer <your-api-token>' \
     --get --data-urlencode 'keyword=server-banner:CherryPy/3.2.5' \
     'https://api.driftnet.io/v1/scan/protocols' \
  | jq . \
  | less -S

(This slightly different call syntax persuades curl to URL-encode the / character for us.)
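
If you build request URLs in a script rather than with curl, the same encoding concern applies. A minimal Python sketch, using the standard urllib.parse.urlencode to percent-encode the : and / characters; the helper name keyword_url is illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.driftnet.io/v1/scan/protocols"


def keyword_url(item_type, value):
    """Build a keyword= search URL with the type:value pair safely URL-encoded."""
    return BASE_URL + "?" + urlencode({"keyword": f"{item_type}:{value}"})


print(keyword_url("server-banner", "CherryPy/3.2.5"))
```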

Filtering

To time-filter the results, use the from= and to= parameters. These accept dates in the format YYYY-MM-DD.
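
A rolling window such as "the last three days" can be turned into from= and to= values with a few lines of Python. This is a sketch only: the helper name date_window is hypothetical, and whether the API treats the to= bound as inclusive is not stated here.

```python
from datetime import date, timedelta


def date_window(days, today=None):
    """Return from/to parameters (YYYY-MM-DD) covering the last `days` days."""
    end = today or date.today()
    start = end - timedelta(days=days)
    return {"from": start.isoformat(), "to": end.isoformat()}


print(date_window(3, today=date(2019, 5, 13)))
```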

To filter on any arbitrary type, use filter=type:value, for instance filter=port-tcp:443 to restrict to TCP port 443.

Combining Search Parameters

The API allows us to combine several of these features in a single call.

If we wanted to find scan results including TLS certificates issued to the University of Oxford, seen in the last three days, only on TCP port 4443, and only where the hardware was tagged fortinet, we could call:

curl -s -H 'Authorization: Bearer <your-api-token>' \
     --get --data-urlencode 'field=subject:university oxford' --data-urlencode 'slop=1' \
     --data-urlencode 'keyword=product-tag:fortinet' \
     --data-urlencode 'from=2019-05-10' \
     --data-urlencode 'filter=port-tcp:4443' \
     'https://api.driftnet.io/v1/scan/protocols' \
  | jq . \
  | less -S

Logically, a boolean AND is applied between search types, and an OR is applied between repeated filter= parameters. So, if we also wanted port 8443 in the above query, we could add --data-urlencode 'filter=port-tcp:8443'.
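
The same combined query, including the second filter, can be assembled in Python. The sketch passes the parameters as a list of pairs so that filter can appear twice; a dictionary would keep only one of them. Note that urlencode writes the space in the search term as +, which decodes back to a space in a query string.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.driftnet.io/v1/scan/protocols"

# A list of pairs (not a dict) so that the filter parameter can repeat.
params = [
    ("field", "subject:university oxford"),
    ("slop", "1"),
    ("keyword", "product-tag:fortinet"),
    ("from", "2019-05-10"),
    ("filter", "port-tcp:4443"),
    ("filter", "port-tcp:8443"),  # ORed with the filter above
]

url = BASE_URL + "?" + urlencode(params)
print(url)
```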

Entity Enrichment

Sometimes it can be useful to see the organization associated with the scan results. The API can, at query time, add this information. To enable this feature, set enrich_entity=true:

curl -s -H 'Authorization: Bearer <your-api-token>' \
     'https://api.driftnet.io/v1/scan/protocols?ip=8.8.8.8&enrich_entity=true' \
  | jq . \
  | less -S
{
  "page": 0,
  "pages": 19,
  "result_count": 1818,
  "results": [
    {
      "date": "2019-05-13",
      "id": "Zx3-NF_BRqGTQpPlRd0rcw",
      "items": [
        {
          "context": "",
          "is_metadata": true,
          "type": "ip",
          "value": "8.8.8.8"
        },
        {
          "context": "ip",
          "is_metadata": true,
          "type": "entity",
          "value": "Google LLC"
        },
        ...

Most-Recent Results

Driftnet stores results as a time series. Often, you only want to know the most recent result for an {ip, port} pair. Set most_recent=true, and voilà.

curl -s -H 'Authorization: Bearer <your-api-token>' \
     'https://api.driftnet.io/v1/scan/protocols?ip=8.8.8.8&most_recent=true' \
  | jq . \
  | less -S
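
To see what "most recent per {ip, port} pair" means, here is a client-side sketch of the same reduction over a list of reports. It assumes that the scanned endpoint is given by the metadata items of type ip and port-tcp with an empty context, as in the example responses; the function names and the sample reports (including their IDs) are made up for illustration.

```python
def scanned_endpoint(report):
    """Return (ip, port) from the metadata items with empty context (an assumption)."""
    found = {}
    for item in report["items"]:
        if item["is_metadata"] and item["context"] == "" and item["type"] in ("ip", "port-tcp"):
            found.setdefault(item["type"], item["value"])
    return found.get("ip"), found.get("port-tcp")


def most_recent_per_pair(reports):
    """Keep only the newest report for each (ip, port) pair."""
    latest = {}
    for report in reports:
        key = scanned_endpoint(report)
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if key not in latest or report["date"] > latest[key]["date"]:
            latest[key] = report
    return list(latest.values())


def make_report(report_id, day, ip, port):
    """Build a minimal made-up report for the demonstration below."""
    return {"id": report_id, "date": day, "items": [
        {"context": "", "is_metadata": True, "type": "ip", "value": ip},
        {"context": "", "is_metadata": True, "type": "port-tcp", "value": port},
    ]}


reports = [
    make_report("sample-1", "2019-05-11", "8.8.8.8", "443"),
    make_report("sample-2", "2019-05-13", "8.8.8.8", "443"),
    make_report("sample-3", "2019-05-12", "8.8.8.8", "853"),
]
print([r["id"] for r in most_recent_per_pair(reports)])
```

Letting the API do this with most_recent=true is preferable, since it avoids downloading the full time series.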

Hosts in URLs

If you search for example.com, do you also want to match URLs of the form scheme://sub.example.com/path/to/somewhere? It depends on your use-case. By default, this feature is off, but you can enable it by setting host_in_url=true:

curl -s -H 'Authorization: Bearer <your-api-token>' \
     'https://api.driftnet.io/v1/scan/domains?field=host:google.com&host_in_url=true' \
  | jq . \
  | less -S

Summarization

Often, you want to get a quick rollup summary of a particular field. The API enables this with the summarize= parameter. This call will get all scan results including TLS certificates issued to the University of Oxford, and summarize the ports they were seen on:

curl -s -H 'Authorization: Bearer <your-api-token>' \
     --get --data-urlencode 'field=subject:university oxford' --data-urlencode 'slop=1' \
     --data-urlencode 'summarize=port-tcp' \
     'https://api.driftnet.io/v1/scan/protocols' \
  | jq . \
  | less -S
{
  "summary": {
    "other": 0,
    "values": {
      "1000": 10,
      "10000": 11,
      "10443": 64,
      ...
      "943": 21,
      "9443": 9,
      "993": 10
    }

The values object in the return contains the extracted values, together with their counts.

You might want to restrict the summary to particular contexts, or to exclude particular contexts (e.g. to summarize only HTTP headers, or to exclude a particular HTTP header). Use the summary_context= and summary_nocontext= parameters for this.

By default, the summary is limited to a maximum of 100 values; if there are more unique values than this, then the total count of non-summarized values is placed in other. You can increase the maximum number of values in the summary using the summary_limit= parameter, up to a ceiling of 10,000 values per call.
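
The relationship between values, other and the limit can be sketched locally. The code below assumes that the summary keeps the most frequent values and that other is the sum of the counts of the values left out; the text above does not spell out either point, so this is an illustration of the idea rather than the API's exact behaviour. The function name and the sample data are made up.

```python
from collections import Counter


def summarize(values, limit=100):
    """Keep the `limit` most frequent values; fold the remaining counts into 'other'."""
    counts = Counter(values)
    kept = dict(counts.most_common(limit))
    other = sum(counts.values()) - sum(kept.values())
    return {"summary": {"other": other, "values": kept}}


# Made-up port observations, purely for illustration.
observed = ["443"] * 3 + ["8443"] * 2 + ["80"]
print(summarize(observed, limit=2))
```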

If you need more than 10,000 values, you can raise the limit at the cost of additional API quota usage. Add enable_expensive_call=true to your request, and you will be able to set a summary_limit= of up to 1,000,000. However, for each block of 10,000 results returned, one unit of API quota will be consumed. Take care when setting this parameter!
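
A quick way to estimate the quota cost of an expensive call is to count blocks of 10,000. The sketch below assumes that a partially filled block is charged as a whole unit, which the text above does not confirm; the function name is hypothetical.

```python
BLOCK_SIZE = 10_000  # results per unit of API quota, as stated above


def quota_units(values_returned, block_size=BLOCK_SIZE):
    """Estimated quota units, assuming partial blocks are rounded up."""
    return -(-values_returned // block_size)  # ceiling division on integers


print(quota_units(1_000_000))
print(quota_units(25_000))
```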

Prioritization

You can ask Driftnet to schedule protocol-level collection for a particular IP/port pair by using the scan/protocols/prioritize endpoint.

curl -s -H 'Authorization: Bearer <your-api-token>' \
     'https://api.driftnet.io/v1/scan/protocols/prioritize?ip=8.8.8.8&port=443' \
  | jq .
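
If you generate prioritization requests from a script, it may be worth validating the inputs before spending an API call. A minimal sketch: the address and port checks are client-side assumptions of ours, not documented API behaviour, and the helper name prioritize_url is illustrative.

```python
import ipaddress
from urllib.parse import urlencode

PRIORITIZE_URL = "https://api.driftnet.io/v1/scan/protocols/prioritize"


def prioritize_url(ip, port):
    """Build a prioritize URL after basic local validation of ip and port."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return PRIORITIZE_URL + "?" + urlencode({"ip": ip, "port": port})


print(prioritize_url("8.8.8.8", 443))
```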