In Snowflake, Rakam uses a single table called EVENTS to store all the event data. This table has a properties column that includes all the event properties. Here is the schema:
It represents the event timestamp in UTC format. We convert all the timestamp values to UTC for convenience.
It represents the time when this event was inserted into Snowflake. Its time zone depends on the time zone of the Snowflake Snowpipe server.
The event type that you use when sending the event to Rakam.
It contains all the event properties that you sent to Rakam. The field that starts with
The project of the write_key used when sending the event.
A unique identifier of the event inserted via Snowpipe. It's used for deduplication.
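As a rough illustration of how the properties column can be read, the sketch below uses Snowflake's path syntax for semi-structured data. It assumes properties is stored as a VARIANT column and relies on the _time and EVENT_TYPE columns that appear in the default clustering key; the event type 'pageview' and the date are hypothetical placeholders.

```sql
-- Pull a handful of events and extract one property from the properties column.
-- Assumption: properties is a VARIANT, so ':' path access is available.
SELECT
    event_type,
    _time,
    properties:_device_id::string AS device_id  -- cast the VARIANT value to text
FROM events
WHERE event_type = 'pageview'   -- hypothetical event type
  AND _time >= '2024-01-01'     -- hypothetical date; _time values are in UTC
LIMIT 10;
```

Casting with ::string avoids getting back a quoted VARIANT value when the property is text.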
The data is available in Snowflake in less than 30 minutes. You also have all the data in JSON format in your AWS S3 bucket for further analysis. If you want to analyze the data with SQL queries, you can use Snowflake, but it is not well suited to extracting raw data, so we keep a backup in S3 in case you want to analyze the event data with Python or R.
We run a scheduled task every hour that deduplicates the events. This is needed because, if you're collecting data from mobile devices, the SDKs may try to send the same event more than once when the network is unreliable.
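If you want to check whether duplicates are still present between deduplication runs, a query along the following lines could help. This is only a sketch: the identifier column name id is an assumption (substitute the actual name of the unique identifier column), and the one-day window is arbitrary.

```sql
-- List event identifiers that occur more than once in the last day.
-- Assumption: the unique identifier column is named id (hypothetical name).
SELECT
    id,
    COUNT(*) AS copies
FROM events
WHERE _time >= DATEADD(day, -1, SYSDATE())  -- SYSDATE() returns the current UTC timestamp
GROUP BY id
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

Events that arrived within the last hour may legitimately show up here, since the scheduled task may not have processed them yet.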
Depending on the queries that you're running on Rakam, you can change the clustering key of the EVENTS table. By default, the clustering key is LINEAR(EVENT_TYPE, CAST(_TIME AS DATE)), which means that your queries will run much faster if they include a _time predicate.
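To make this concrete, the sketch below first shows a query that filters on both expressions of the default clustering key, which gives Snowflake the chance to skip micro-partitions, and then shows how a clustering key can be changed with ALTER TABLE. The event type, the date range, and the alternative key are hypothetical; whether a different key helps depends on your own query patterns.

```sql
-- Filter on event_type and _time so the default clustering key can be used for pruning.
SELECT COUNT(*) AS event_count
FROM events
WHERE event_type = 'pageview'    -- hypothetical event type
  AND _time >= '2024-01-01'      -- hypothetical date range
  AND _time <  '2024-02-01';

-- Hypothetical alternative: cluster by date only, e.g. if most queries span all event types.
ALTER TABLE events CLUSTER BY (CAST(_time AS DATE));
```

Changing the clustering key causes Snowflake to recluster the table in the background, which consumes credits, so it is worth testing on a representative workload first.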
The _session_id property is the epoch timestamp of the beginning of the session. If you need to count unique sessions, you can use the following expression:
select count(distinct CONCAT(properties:_device_id, properties:_session_id)) from events
Because _session_id is only a start timestamp, two devices can share the same value. Combined with _device_id, it uniquely identifies a user session.
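The same expression can be grouped by day to get a session trend. A minimal sketch, reusing the expression above and adding a _time predicate; the date range is a hypothetical placeholder:

```sql
-- Count unique sessions per UTC day over one week.
SELECT
    CAST(_time AS DATE) AS event_date,
    COUNT(DISTINCT CONCAT(properties:_device_id, properties:_session_id)) AS sessions
FROM events
WHERE _time >= '2024-01-01'   -- hypothetical date range
  AND _time <  '2024-01-08'
GROUP BY 1
ORDER BY 1;
```

Note that a session spanning midnight UTC is counted once on each day it has events, so the daily figures can add up to slightly more than the overall distinct count.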