Semantic Datasets

If you need to create datasets programmatically, you can use our views. Since dbt has no equivalent concept, we extended it to cover the following use cases:

1. Creating datasets automatically from your database tables

Use this if you would rather generate datasets from your tables automatically than write dimension and measure definitions in yml files.

2. Product Analytics

Use this if you want to automatically create a dataset for each of your event types, with a dimension for each of your event properties.

3. Building installable recipes

If you're building recipes that will be installed by other users, you can configure your recipe variables in rakam_project.yml, let people select their schema and table, and use the var function inside your jinja2 files to build datasets dynamically.

Creating your first semantic dataset

We use Jinja in order to create semantic datasets. Create a file called models/example.jinja2 as follows:

{% for i in range(2) %}
  {{view(name='data'~i, sql="select "~i)}}
{% endfor %}

The example above creates two datasets called data0 and data1, defined as select 0 and select 1, because range(2) yields 0 and 1.
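The same loop can iterate over a list of names instead of a numeric range, which is closer to the product analytics use case. The following is a minimal sketch: the events table, its event_type column, and the two event names are hypothetical placeholders, and only the name and sql properties of view are used.

```jinja
{% for event_type in ['pageview', 'signup'] %}
  {{view(
      name=event_type ~ '_event',
      sql="select * from events where event_type = '" ~ event_type ~ "'"
    )
    }}
{% endfor %}
```

Under those assumptions, this would define one dataset per event type, named pageview_event and signup_event.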

A more realistic example can create a dataset from one of your tables as follows:

{{view(
    name='customer_attributes', 
    measures={'total_customers': {'aggregation': 'count'}},
    dimensions: 
  )
  }}

view-specific properties

In addition to the Model properties, view also supports the following properties:

sql:

You can define the sql for the dataset as follows:

{{view(
    name='customer_attributes', 
    sql="select 1",
  )
  }}

Keep in mind that this SQL does not have access to dbt's Jinja context and can't be materialized. It is intended only for defining datasets semantically.

extends:

If you're creating datasets inheriting a dbt source or model, you can reference the parent model as follows:

{{view(
    name='pageview_event', 
    extends= source('product', 'all_events'),  # it also supports ref('model_name')
  )
  }}

Rakam automatically merges the parent's dimensions, measures, and relations into the current dataset.

Context variables

Since Rakam compiles the Jinja files, the context variables are different from the dbt context. Here is the full list of available functions:

{{ import('../lib/file.yml') }}

Returns the content of the file at the given path. If you're importing yml files, you need to use the fromyaml filter as follows:

{% set dimensions = import('../lib/dimensions.yml') | fromyml %}

{{ view(name = 'my_model', dimensions = dimensions) }}

{{ ref('model_name') }}

Returns the dataset name of the dbt model. It's useful if you want to reference other datasets in jinja2 files.

{{ source('source_name') }}

Returns the dataset name of the dbt source. It's useful if you want to reference other datasets in jinja2 files.

var(string) : any

Returns the value of the project variable.
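As an illustration of how var fits the installable recipe use case, the sketch below builds a dataset's SQL from two project variables. The variable names schema and table are hypothetical: it assumes they are declared in rakam_project.yml and that both resolve to strings.

```jinja
{{view(
    name='customer_attributes',
    sql="select * from " ~ var('schema') ~ "." ~ var('table')
  )
  }}
```

With this approach the jinja2 file stays the same for every installation, and only the variable values chosen by the user change.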