Data sets overview

Why data sets?

Databases can contain large amounts of data. Queries can take milliseconds to many hours depending on their complexity. While a query is running, it can significantly slow down; sometimes, new queries are inhibited from being run. This becomes particularly relevant to BriteCore with regard to running reports. Our clients run a multitude of reports that require extensive data manipulation.

To make the reporting process less intensive and time-consuming, we generate nightly views of data. These specialized views are used to build reports more effectively. Data sets are flat files (such as .csv) that are used by a Python library called Pandas.

Most of our reporting comes from data sets (or data frames/data caches) and not directly from querying the database.

What are data sets?

Data sets are data structures designed for reporting and analysis; think of them as reporting materialized views.

The data sets include more meaningful groupings of information, achieved with a date (similar to a snapshot) from which they were rendered. This date is often referred to internally as the as of date.

Any modern BriteCore report you run will use an as of date, which dictates the archive used. Though you might not specify an as of date when running a report, a meaningful one is chosen. For example, if the report is a date range, the as of date is the date after the to date in the range.

We provide some mechanisms for exploring the metadata for all of the available data sources as data sets.

You can export data sets from Report List in the Reports menu for further analysis. All data sets are stored in .csv format.

Types of data sets in BriteCore

There are two types of data sets available in BriteCore:

  • Non-prepared data sets: Non-prepared data sets are based on raw SQL queries that pull directly from the transactional database. These data sets act as a staging area for further processing.
  • Prepared data sets: Prepared data sets are based on non-prepared data sets. These are based on SQL queries that can include merging and other logic. Prepared data sets don’t have repeating column names among them and they all use revisionId as the merge key. Prepared data sets are the preferred first data source for reports and are divided into:
    • Facts: Facts are also referred to as measures. This is data that is aggregated, summarized, or subtotaled. BriteData treats the facts data set as the base data set for a report. If you don’t select a fact data set for your report, BriteData will use policy_state as the default. The following are examples of data sets that contain facts:
      • accounting
      • claim_payments
      • claims
      • commission_accounting
      • commission_payments
      • files
      • item_changes
      • item_range
      • item_state
      • item_transactions
      • policy_changes
      • policy_range
      • policy_state
      • premium_records
      • quotes
      • return_premiums
      • written_premium
    • Dimensions: Dimensions provide the context for the facts or measurements so when querying data, the dimensions serve as filters or groupings. The following are examples of data sets that contain dimensions:
      • additional_interests
      • agencies
      • credit_reports
      • dates_to_remember
      • items
      • lines
      • mortgagees
      • policies
      • policy_types
      • policyholders
      • primary_policyholders
      • properties
      • property_item_rating_details
      • Revisions
    • Factless: Factless data sets are used to join dimensional data but don’t contain any measures or facts. An example of a factless data set is the Credit Reports data set.


Figure 1: The Extract, Transform, Load (ETL) process producing data sets (DF) as the output: