Retrieve datasets using BriteData API

Overview

BriteCore’s data lake, BriteLake, is a centralized repository of data sourced from BriteCore’s distributed systems (BriteSuite).

BriteLake is built in phases and organized into zones, creating a streamlined pipeline for data ingestion, transformation, and consumption.

Ingestion

The ingestion phase creates the “bronze” layer: it consumes data from the BriteSuite distributed databases and HTTP APIs, then stores the results in AWS S3, a scalable cloud object store, using Parquet format for table extracts and JSON format for API responses.

The bronze layer groups datasets by prefixing each dataset name with the layer name followed by the BriteSuite product name, for instance, bronze__britecore_classic__policies.

AWS EMR ingests the data from the data sources overnight after other nightly data-producing processes are complete.

Transformation

The transformation phase creates the “silver” layer, using the “bronze” layer as its source. This stage performs data cleansing, harmonization, and preparation, then stores the datasets in S3 following a similar convention, prefixing the dataset names with “silver__”. The BriteSuite product name is dropped here because the data comes from multiple sources, for instance, silver__risks.
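
The naming convention is simple enough to express in code. The helpers below are illustrative only, not part of BriteData:

# Illustrative helpers for the BriteLake dataset naming convention.
def bronze_dataset_id(product: str, table: str) -> str:
    # Bronze datasets carry the layer name and the BriteSuite product name.
    return f"bronze__{product}__{table}"

def silver_dataset_id(name: str) -> str:
    # Silver datasets drop the product name, since the data is merged from multiple sources.
    return f"silver__{name}"

assert bronze_dataset_id("britecore_classic", "policies") == "bronze__britecore_classic__policies"
assert silver_dataset_id("risks") == "silver__risks"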

Consumption

The custom reports in BriteCore can consume the data available in BriteLake. These reports execute AWS Athena queries that save the report file in an S3 bucket. In BriteCore, you can find the files in the Attachments section of the Reports module.

The Data API provides endpoints to list and download the datasets directly in CSV format.

Technical overview

BriteDataETL consists primarily of two components: BriteDataAirflow and BriteDataAPI. BriteDataAirflow orchestrates the ETL jobs that build BriteLake. BriteDataAPI provides access to BriteLake, both internally and externally, via JSON:API-compliant REST APIs.

Figure 1 illustrates a high-level overview of the BriteData ETL process.

Figure 1: BriteData ETL process overview.

Below are the broad steps involved in a BriteData ETL process:

  1. A trigger arrives via an HTTP call to the Airflow Webserver.
  2. Apache Airflow defines the workflow for dataset jobs (see the sketch after this list).
  3. Spark extracts raw data from the data sources (RDS, APIs, etc.) and stores artifacts in an S3 bucket.
  4. AWS Glue provides the data dictionary for every available dataset.
  5. AWS Athena queries the datasets, using the metadata from Glue.
  6. Data is exposed to the end user via API Gateway.
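
To make the orchestration step concrete, here is a minimal sketch of what a per-dataset Airflow DAG could look like. The DAG ID, schedule, script name, and bucket path are all hypothetical; BriteDataAirflow's actual DAGs are internal:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical sketch of a per-dataset DAG; names and commands are illustrative.
with DAG(
    dag_id="bronze__britecore_classic__policies",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",  # runs overnight, after nightly data producers finish
    catchup=False,
) as dag:
    # Spark extracts the raw table and writes Parquet into the bronze zone of S3.
    extract = BashOperator(
        task_id="spark_extract",
        bash_command=(
            "spark-submit extract_policies.py "
            "--output s3://<britelake-bucket>/bronze/britecore_classic/policies/"
        ),
    )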

This tutorial guides you through retrieving datasets through the BriteData API.

Use BriteData API endpoints

Step 1: Get a security token

You will need to request an ID and Secret to use OAuth 2.0.

For more information, refer to How do I get started?
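
As a sketch, the OAuth 2.0 client credentials flow in Python might look like the following. The token endpoint here is a placeholder, not a documented BriteCore URL; use the endpoint from the getting-started guide:

import requests

# Placeholder token endpoint; see "How do I get started?" for the real URL.
TOKEN_URL = "https://<your-site>/oauth2/token"

response = requests.post(
    TOKEN_URL,
    data={"grant_type": "client_credentials"},
    auth=("<client_id>", "<client_secret>"),  # the ID and Secret you requested
)
response.raise_for_status()
access_token = response.json()["access_token"]

# Reuse this header on subsequent BriteData API calls.
headers = {"Authorization": f"Bearer {access_token}"}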

Step 2: Retrieve a list of datasets from product lines, policies, billing, and claims

The listDataset endpoint lists the available datasets. You can paginate by querying the /links/next URL to request the next batch of datasets; the sketch after the sample response below shows one way to walk the pages.

Sample request

curl --location -g --request GET '{{url}}/api/data/datasets?page[cursor]=eyJsYXN0RXZhbHVhdGVkS2V5Ijp7IkhBU0hfS0VZIjp7InMiOiJ0LjA0MjY0YWY0N2FiNDQyY2ZiNWRmZDgzM2Q0YzYyNTM4In0sIlJBTkdFX0tFWSI6eyJzIjoiYnJpdGVsYWtlX19icm9uemVfX2JyaXRlY2xhaW1zX190cmFuc2FjdGlvbl9zdWJfdHlwZXMifX0sImV4cGlyYXRpb24iOnsic2Vjb25kcyI6MTYzMjU3NTA5MSwibmFub3MiOjk3NzAwMDAwMH0sInBhZ2VTaXplIjoxMDB9'

Sample response

{
    "data": [
        {
            "type": "dataset",
            "attributes": {
                "product": "briteclaims",
                "name": "audit_logs",
                "layer": "bronze"
            },
            "id": "bronze__briteclaims__audit_logs",
            "links": {
                "self": "/data/datasets/bronze__briteclaims__audit_logs"
            }
        },
        {
            "type": "dataset",
            "attributes": {
                "product": "briteclaims",
                "name": "claim_assignments",
                "layer": "bronze"
            },
            "id": "bronze__briteclaims__claim_assignments",
            "links": {
                "self": "/data/datasets/bronze__briteclaims__claim_assignments"
            }
        },
        {
            "type": "dataset",
            "attributes": {
                "product": "briteclaims",
                "name": "claim_event_types",
                "layer": "bronze"
            },
            "id": "bronze__briteclaims__claim_event_types",
            "links": {
                "self": "/data/datasets/bronze__briteclaims__claim_event_types"
            }
        },
        {
            "type": "dataset",
            "attributes": {
                "product": "briteclaims",
                "name": "claim_events",
                "layer": "bronze"
            },
            "id": "bronze__briteclaims__claim_events",
            "links": {
                "self": "/data/datasets/bronze__briteclaims__claim_events"
            }
        },
        {
            "type": "dataset",
            "attributes": {
                "product": "briteclaims",
                "name": "claim_formats",
                "layer": "bronze"
            },
            "id": "bronze__briteclaims__claim_formats",
            "links": {
                "self": "/data/datasets/bronze__briteclaims__claim_formats"
            }
        }
    ],
    "links": {
        "next": "/data/datasets?page[cursor]=hash_to_next_page"
    }
}
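
To collect the full list, follow the links/next cursor until no next link is returned. Below is a minimal Python sketch that reuses the headers from Step 1; the base URL is a placeholder standing in for {{url}}/api above:

import requests

BASE_URL = "https://<your-site>/api"  # placeholder standing in for {{url}}/api

datasets = []
url = f"{BASE_URL}/data/datasets"
while url:
    response = requests.get(url, headers=headers)  # headers from Step 1
    response.raise_for_status()
    body = response.json()
    datasets.extend(body["data"])
    # links/next holds the cursor for the next page, when one exists.
    next_link = body.get("links", {}).get("next")
    url = f"{BASE_URL}/{next_link.lstrip('/')}" if next_link else None

print(f"Found {len(datasets)} datasets")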

Step 3: Download a dataset

The downloadDataset endpoint returns a dataset's metadata from BriteLake based on its ID, including a time-limited download_url for the CSV file.

Sample request

curl --location -g --request GET '{{baseUrl}}/data/datasets/:id'

Sample response

{
    "data": {
        "type": "dataset",
        "attributes": {
            "product": "britecore_classic",
            "layer": "bronze",
            "expires_on": "2021-11-02T05:56:19.230208",
            "download_url": "https://sc-540351047480-pp-seer6rbghnq-athenaoutputbucket-122rqfblo7989.s3.amazonaws.com/1d873da9-582a-4036-8515-1c5708ec43df.csv?AWSAccessKeyId=ASIAX3T3OU44NJSZL2FV&Signature=4Ym%2BnxgHw43ixYhJDkmvm9pARNc%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEH0aCXVzLWVhc3QtMSJHMEUCIQCyn0bkt5xg5hQWQ7j17lpaDb%2B9sYdWrQX2XbiXBG61xwIgKLO%2Fu8kYennHFTR469kLcr8MjsYRZauZLgd546YhCQoqxAIIJhACGgw1NDAzNTEwNDc0ODAiDNwVbv2Djq8sQpcX7SqhAp6vpN1nKlq%2F349yYrk2MlrK845tIl2UNL0OQ7sHgB0qwA%2BMWtJLO27qo%2Bt3YHRzFU4tAzKivK5LAOPVz3UoOblYksh3tC7LFAdCftDcepIkdQdE8O%2FyIMRwLXFKnod6bXKdq8nkEutmc6mTRRJ1NhqFyJHTur%2BWf%2F0%2BPM0IGEnS9LWe6Nx5fRR2W5IgaNPSId3YJpMr5NEgPWrvc%2FMZ0fxiDFdFhxMRTDmpBO%2BlRv6AeYpbW7qyFw2x1THAXvj27PZ7I1s2Gn5H6moP8%2Bh1tMWjPwyzH0AHigEyXs63En5k5PQyG7eAEErtLIqbA4FAbJ5xtd74tNRP4iIdXwmM7TUYqRcAsKhQK34VgByw%2FMGYlZs1SvVBqt0O2MA%2BjPoEQsgw6omDjAY6mgGPsHWOHWeTMpn1YV8xnyPjdz6449dZiBbwNwbBnGTRQj08%2FtesyMYxr55yef1yqJ9MrU6i8x4zRexbWi33SWj8CI2F0nl4UutC5%2BamEWhL0loqZPrGQQxjC2p3o1c7Dwq9pMdPYSsdiMScmEC71L2XciwBLORp7a0YHe94WvPJ0gNPsuv60kykdXKOsqJpshGkeByef%2FS2l9sy&Expires=1635832579",
            "name": "policy_type_items",
            "active_since": "2021-11-02T04:56:19.230208"
        },
        "id": "bronze__britecore_classic__policy_type_items",
        "links": {
            "self": "/data/datasets/bronze__britecore_classic__policy_type_items"
        }
    },
    "links": {
        "self": "/data/datasets/bronze__britecore_classic__policy_type_items"
    },
    "jsonapi": {
        "version": "1.0"
    }
}
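
As a Python sketch, reusing BASE_URL and headers from the earlier steps, you can fetch the metadata and pull out the pre-signed URL:

import requests

dataset_id = "bronze__britecore_classic__policy_type_items"
response = requests.get(f"{BASE_URL}/data/datasets/{dataset_id}", headers=headers)
response.raise_for_status()

attributes = response.json()["data"]["attributes"]
download_url = attributes["download_url"]  # pre-signed S3 URL for the CSV
expires_on = attributes["expires_on"]      # the URL stops working after this time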

Step 4: Retrieve downloaded data

To retrieve the data, request the pre-signed S3 URL returned in the download_url attribute to stream directly from BriteLake. The URL already includes the required query parameters:

  • AWSAccessKeyId: <key>
  • Signature: <sig>
  • x-amz-security-token: <token>
  • Expires: <time>

Sample request

curl --location --request GET 'https://client.s3.amazonaws.com/<id>.csv?AWSAccessKeyId=%3Ckey%3E&Signature=%3Csig%3E&x-amz-security-token=%3Ctoken%3E&Expires=%3Ctime%3E'
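
The same request in Python might look like the sketch below; because the URL is pre-signed, no Authorization header is needed:

import requests

# download_url comes from Step 3 and is already signed, so no extra auth is required.
with requests.get(download_url, stream=True) as response:
    response.raise_for_status()
    with open("policy_type_items.csv", "wb") as f:
        # Stream the CSV to disk in chunks rather than loading it all into memory.
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)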

Step 5: Import data into your reporting tool

You can stream the raw data in CSV format and import it into your reporting tool.
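
For example, with pandas you could load the CSV saved in Step 4 straight into a DataFrame; this is only a sketch, not a requirement of any particular tool:

import pandas as pd

# Load the CSV from Step 4 (download_url also works while the signature is valid).
df = pd.read_csv("policy_type_items.csv")
print(df.head())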