Semantic Datasets
If you need to create datasets programmatically, you can use our views. Since dbt doesn't have this concept, we extended it to cover the following use cases:
1. Creating datasets automatically from your database tables
Use this if you would rather generate datasets from your tables automatically than write dimension & measure definitions in yml files.
2. Product Analytics
Use this if you want to automatically create a dataset for each of your event types and a dimension for each of your event properties.
3. Building installable recipes
If you're building recipes that will be installed by other users, you can configure your recipe variables in rakam_project.yml, let people select their schema & table, and use the var function inside your jinja2 files to build datasets dynamically.
Creating your first semantic dataset
We use Jinja to create semantic datasets. Create a file called models/example.jinja2 as follows:
{% for i in range(2) %}
{{view(name='data'~i, sql="select "~i)}}
{% endfor %}
The example above creates 2 datasets called data0 and data1 with the definitions select 0 and select 1, because range(2) yields 0 and 1.
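The same loop pattern extends to the product analytics use case. The sketch below assumes a hypothetical events table with an event_type column and a hand-written list of event types. None of these names come from Rakam itself, and only the name and sql arguments of view are used.

```jinja
{# Hypothetical list of event types; replace it with your own. #}
{% set event_types = ['pageview', 'signup'] %}
{% for event_type in event_types %}
{# Assumes a table named events with an event_type column. #}
{{ view(
  name='event_' ~ event_type,
  sql="select * from events where event_type = '" ~ event_type ~ "'"
) }}
{% endfor %}
```

This would define two datasets, event_pageview and event_signup, each filtered to a single event type.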
A more realistic example can create a dataset from one of your tables as follows:
{{view(
name='customer_attributes',
measures={'total_customers': {'aggregation': 'count'}},
dimensions:
)
}}
view Specific Properties
In addition to the Model properties, view also supports the following properties:
sql:
You can define the sql for the dataset as follows:
{{view(
name='customer_attributes',
sql="select 1",
)
}}
Keep in mind that this SQL context does not support dbt's Jinja context and can't be materialized. It's intended to be used for semantically defining the datasets.
extends:
If you're creating datasets that inherit from a dbt source or model, you can reference the parent model as follows:
{{view(
name='pageview_event',
extends= source('product', 'all_events'), # it also supports ref('model_name')
)
}}
Rakam automatically merges the parent's dimensions, measures, and relations into the current dataset.
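As a sketch of how inheritance might be combined with the measures property used earlier, the example below extends a hypothetical dbt model named orders and adds one count measure on top of whatever the parent defines. The model, dataset, and measure names are assumptions for illustration only.

```jinja
{# 'orders' is a hypothetical dbt model name; 'orders_summary' and
   'total_orders' are illustrative names, not part of Rakam. #}
{{ view(
  name='orders_summary',
  extends=ref('orders'),
  measures={'total_orders': {'aggregation': 'count'}}
) }}
```

Given the merge behaviour described above, orders_summary would be expected to expose the parent's dimensions, measures, and relations alongside total_orders.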
Context variables
Since Rakam compiles the Jinja files, the context variables are different from the dbt context. Here is the full list of available functions:
{{ import('../lib/file.yml') }}
Returns the content of the file at the imported path. If you're importing yml files, you need to use the fromyaml filter as follows:
{% set dimensions = import('../lib/dimensions.yml') | fromyml %}
{{ view(name = 'my_model', dimensions = dimensions) }}
{{ ref('model_name') }}
Returns the dataset name of the dbt model. It's useful if you want to reference other datasets in jinja2 files.
{{ source('source_name') }}
Returns the dataset name of the dbt source. It's useful if you want to reference other datasets in jinja2 files.
var(string) : any
Returns the value of the project variable.
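For the installable-recipe use case, var can be combined with string concatenation to build the SQL from values the user selects. This sketch assumes two hypothetical project variables, schema and table, declared in rakam_project.yml and resolving to plain strings. The variable and dataset names are not defined by Rakam.

```jinja
{# 'schema' and 'table' are hypothetical variable names. #}
{% set target = var('schema') ~ '.' ~ var('table') %}
{{ view(name='recipe_events', sql="select * from " ~ target) }}
```

Keeping the table reference in a single variable means the recipe author only has to change one line if the way the target is resolved changes.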