Analyze Data via SQL

In Snowflake, Rakam makes use of one table called EVENTS in order to insert all the event data. This table has a properties column that includes all the event properties. Here is the schema:


_time	It represents the event timestamp in UTC format. We convert all the timestamp values to UTC for convenience.	timestamptz
server_time	It represents the time when this event is inserted to Snowflake. Its timezone depends on the Snowflake Snowpipe server's timezone.	timestamptz
event_type	The event type that you use sending the event to Rakam.	string
properties	It contains all the event properties that you sent to Rakam. The field that starts with `_` are sent from SDK which are enrichment fields.	variant
project	The project of the write_key used sending the event.	string
_id	A unique identifier of the event inserted via Snowpipe. It's used for deduplication.	string

The data is available in Snowflake in less than 30 minutes. You also have all the data in JSON format in your AWS S3 bucket for further analysis. If you want to analyze the data with SQL queries, you can use Snowflake but it's not actually suitable for extracting raw data so we have a backup in S3 in case you want to use Python & R analyzing the event data.

We run a scheduled task every hour that deduplicates the events because if you're collecting the data from mobile devices, the SDKs can try to send the same event more than once since the network may not be reliable.

Depending on the queries that you're running on Rakam, you can change the clustering key of the EVENTS table. By default, the partition key is LINEAR(EVENT_TYPE, CAST(_TIME AS DATE)) which means that your queries will be running much faster if you include event_type and _time predicate in your WHERE clause.

Sessions

_session_id parameter is the epoch timestamp of the beginning of sessions. If you need to calculate the unique sessions you can use the following expression:

select count(distinct CONCAT(properties:_device_id, properties:_session_id)) from events

When _device_id is combined with _session_id, it represents the unique session id for the users.