Creating Datasets
A dataset represents a dataset in your data warehouse. Rakam automatically creates datasets from your dbt models
, sources
, and seeds
. You can also semantically define your datasets as SQL using our view
resource in a semantic way.
A dataset has three important properties:
dimension lets you drill down into the dataset, it can be used to filter query results. It can be either a column in your table or a SQL expression that processes a single row and returns a value.
measure is a field that uses a SQL aggregate function, such as count
, sum
, or average
. Measures can be used to aggregate dimensions. For example, measures for a Sales
model might include total items sold (a count), total sales price (a sum), and the average sales price (an average).
relation defines a join in between the datasets. Rakam automatically generates SQL join logic to bring in all fields from another dataset correctly then the user analyzes a specific dataset.
Rakam makes use of dbt as the data modeling language by extending dbt for metric definition. Your dbt projects can be integrated to Rakam via GIT and we automatically synchronize the dbt models
, seeds
, and sources
. However; dbt doesn't know about business metrics (aka. measures
) so we have an additional field that will be used inside yml
files.
Integrate dbt models
You can map your columns as dimensions and measures under the meta
property of columns
as follows:
models:
- name: customers
columns:
- name: country_code
meta:
rakam.dimension:
type: string
- name: city
meta:
rakam.dimension:
type: string
- name: total_customers
description: total number of customers defined as count(*) in sql
meta:
rakam.measure:
aggregation: sum
In addition to column mapping, you can also create custom measures and dimensions by defining them under meta
of the model:
models:
- name: customers
meta:
rakam:
measures:
total_rows:
aggregation: count
dimensions:
full_location:
sql: CONCAT({{TABLE}}.country, {{TABLE}}.city)
type: string
You can see the full list of properties that you can use under model.meta
and column.meta
here.
Mapping sources as models
In case you want to create models that point to tables in your database, you can make use of dbt's source properties as follows:
sources:
- name: raw_events
tables:
- name: pageview
meta:
rakam:
measures:
total_pageviews:
aggregation: count
columns:
- name: url
meta:
rakam.dimension:
type: string
If you're analyzing the time-series data, you can also define mappings
so that Rakam understands your data in a better way and lets you access specific features such as funnel and retention:
models:
- name: events
meta:
rakam:
mappings:
event_timestamp: event_occurred_at
userId: user_id
Integrating seeds into your models
Exposing your seeds to the end-users
Since seeds are usually used to enrich the data, we don't create models from seeds
by default. However; you can expose them to the end-users if you enable the seeds
to have appeared in dbt docs:
seeds:
- name: countries
docs:
show: true
meta:
rakam:
measures:
#
columns:
- name: iso_code
meta:
rakam.dimension:
- name: user_friendly_name
meta:
rakam.dimension:
type: string
models:
- name: pageviews
columns:
- name: country
tests:
- not_null
- relationships:
to: ref('countries')
field: iso_code
rakam: # Includes info about the relationship
join: left_join
type: many_to_many
Rakam automatically creates the relation from pageview
to a hidden countries
model so that the users can drill down by user_friendly_name
when they're analyzing pageviews
.
Supported tests
We make use of your dbt test definitions in order to understand your data in a better way. Here is the list of column tests we currently support:
unique
enables Rakam to be able to calculate symmetric aggregates as there must be a unique key in a model.not_null
hides theis not set
filter to the user in the user interface.accepted_values
makes Rakam suggests the values to the end-user in the user interface. It disables the automatic suggestion in favor of performance.relationships
automatically creates the relations between the models. We extend it withjoin
andtype
properties in order to resolve the relation.
You can create a file all_events.sql
and put it under either analysis/
or models/
directory. We automatically discover and create the virtual models on our end.
Full reference of properties
models: - name: customers description: List of customers meta: rakam: label: Customers hidden: false measures: total_rows: aggregation: count dimension | column | sql: reference filters: // filtered measure dimensions: full_location: column | sql: reference type: string category: User Segments relations: events: to: ref('user_id') aggregates: total_customers: measures: ['total_rows'] dimensions: ['full_location'] always_filters: - {dimension: 'is_active', value: true} columns: country: description: The country the user signed up from meta: rakam.dimension: type: string label: User Country category: Location hidden: false primary: false name: location_country created_at: meta: rakam.dimension: type: timestamp timeframes: [day, week, month, year] # only available for timestamp columns total_sales: meta: rakam.measure: aggregation: sum
Updated over 3 years ago