Creating Datasets

A dataset represents a dataset in your data warehouse. Rakam automatically creates datasets from your dbt models, sources, and seeds. You can also semantically define your datasets as SQL using our view resource in a semantic way.

A dataset has three important properties:

dimension lets you drill down into the dataset, it can be used to filter query results. It can be either a column in your table or a SQL expression that processes a single row and returns a value.

measure is a field that uses a SQL aggregate function, such as count, sum, or average. Measures can be used to aggregate dimensions. For example, measures for a Sales model might include total items sold (a count), total sales price (a sum), and the average sales price (an average).

relation defines a join in between the datasets. Rakam automatically generates SQL join logic to bring in all fields from another dataset correctly then the user analyzes a specific dataset.

Rakam makes use of dbt as the data modeling language by extending dbt for metric definition. Your dbt projects can be integrated to Rakam via GIT and we automatically synchronize the dbt models, seeds, and sources. However; dbt doesn't know about business metrics (aka. measures) so we have an additional field that will be used inside yml files.

Integrate dbt models

You can map your columns as dimensions and measures under the meta property of columns as follows:

models:
 - name: customers
   columns:
     - name: country_code
       meta:
         rakam.dimension:
           type: string
     - name: city
       meta:
         rakam.dimension:
           type: string
     - name: total_customers
       description: total number of customers defined as count(*) in sql
       meta:
         rakam.measure:
    	    aggregation: sum

In addition to column mapping, you can also create custom measures and dimensions by defining them under meta of the model:

models:
  - name: customers
    meta:
      rakam:
         measures:
            total_rows:
               aggregation: count
         dimensions:
            full_location:
	            sql: CONCAT({{TABLE}}.country, {{TABLE}}.city)   
	            type: string

You can see the full list of properties that you can use under model.meta and column.meta here.

Mapping sources as models

In case you want to create models that point to tables in your database, you can make use of dbt's source properties as follows:

sources:
  - name: raw_events
    tables:
      - name: pageview
        meta:
          rakam:
             measures:
                total_pageviews:
                aggregation: count
        columns:
          - name: url
            meta:
              rakam.dimension:
                type: string

If you're analyzing the time-series data, you can also define mappings so that Rakam understands your data in a better way and lets you access specific features such as funnel and retention:

models:
  - name: events
    meta:
      rakam:
	      mappings:
	        event_timestamp: event_occurred_at
	        userId: user_id

Integrating seeds into your models

Exposing your seeds to the end-users

Since seeds are usually used to enrich the data, we don't create models from seeds by default. However; you can expose them to the end-users if you enable the seeds to have appeared in dbt docs:

seeds:
  - name: countries
    docs:
      show: true
    meta:
      rakam:
        measures:
        #
    columns:
      - name: iso_code
        meta:
          rakam.dimension:
      - name: user_friendly_name
        meta:
          rakam.dimension: 
            type: string
models:
  - name: pageviews
    columns:
      - name: country
        tests:
          - not_null
          - relationships:
              to: ref('countries')
              field: iso_code
              rakam: # Includes info about the relationship
                   join: left_join 
                   type: many_to_many

Rakam automatically creates the relation from pageview to a hidden countries model so that the users can drill down by user_friendly_name when they're analyzing pageviews.

Supported tests

We make use of your dbt test definitions in order to understand your data in a better way. Here is the list of column tests we currently support:

  • unique enables Rakam to be able to calculate symmetric aggregates as there must be a unique key in a model.
  • not_null hides the is not set filter to the user in the user interface.
  • accepted_values makes Rakam suggests the values to the end-user in the user interface. It disables the automatic suggestion in favor of performance.
  • relationships automatically creates the relations between the models. We extend it with join and type properties in order to resolve the relation.

You can create a file all_events.sql and put it under either analysis/ or models/ directory. We automatically discover and create the virtual models on our end.

Full reference of properties

models:
  - name: customers
    description: List of customers
    meta:
     rakam:
      label: Customers
      hidden: false
      measures:
        total_rows:
          aggregation: count
          dimension | column | sql: reference
          filters: 
             // filtered measure
      dimensions:
        full_location:
          column | sql: reference
          type: string
      category: User Segments
      relations:
        events:
          to: ref('user_id')
      aggregates:
        total_customers:
           measures: ['total_rows']
           dimensions: ['full_location']
      always_filters:
        - {dimension: 'is_active', value: true}
    columns:
       country:
         description: The country the user signed up from
         meta:
           rakam.dimension:
	      type: string
	      label: User Country
	      category: Location
	      hidden: false
	      primary: false
	      name: location_country
       created_at:
         meta:
           rakam.dimension:
	      type: timestamp
	      timeframes: [day, week, month, year] # only available for timestamp columns
       total_sales:
         meta:
           rakam.measure:
               aggregation: sum