Getting started

Rakam is an analytics platform that lets you build your own analytics service.

Features / Goals

Rakam is a modular analytics platform that gives you a set of features to create your own analytics service.

Typical workflow of using Rakam: set up a cluster with the modules you need, collect events from your applications and third-party services, and analyze them with modules such as funnel, retention and the event explorer.

All these features come in a single package: you just specify which modules you want to use in a configuration file (config.properties) and Rakam does the rest for you.
We also provide cloud deployment tools for scaling your Rakam cluster easily.
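
For example, a minimal config.properties might look like the sketch below. This is illustrative only — the module flag names are assumptions, so check the configuration reference for the exact property names:

# a minimal sketch -- property names here are illustrative assumptions
store.adapter=postgresql
plugin.user.enabled=true
user.funnel-analysis.enabled=true
user.retention-analysis.enabled=true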

If your event dataset fits on a single server, we recommend the PostgreSQL backend. Rakam will collect all your events in row-oriented format in a PostgreSQL node. All the features provided by Rakam are supported in the PostgreSQL deployment type.

However, Rakam is designed to be highly scalable and to handle heavy workloads. You can configure it to send events to a distributed commit log such as Apache Kafka or Amazon Kinesis in serialized Apache Avro format, process the data with PrestoDB workers, and store it in a distributed filesystem in a columnar format.
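
As a sketch, the distributed deployment might be configured along these lines; every property name below is an assumption for illustration only, so consult the deployment documentation for the real keys:

# illustrative only -- consult the deployment docs for exact keys
event.store=kafka
kafka.nodes=10.0.0.1,10.0.0.2
presto.address=http://10.0.0.3:8080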

An event is an immutable action defined by your specific use case. Every event has a collection name and a set of properties. If you run a website, each page view is an event: you might use pageview as the collection name, with the URL, client location data and so on as properties. If you are an IoT company, every sensor reading coming from your devices is an event. Events should have an actor and a timestamp of the occurrence.

Rakam provides multiple methods for collecting events: high-level methods such as trackers, client libraries and importers, as well as a RESTful API, webhook support and schedulers for collecting events from multiple sources. You can embed trackers in your client applications, send events from your favorite programming language via the client libraries or directly through the RESTful API, and integrate your third-party services via webhooks or schedulers. We aim to make Rakam your data hub, so you should be able to collect data from everywhere into Rakam.

You don't need to define collection schemas; Rakam automatically handles schema evolution of your events. One caveat: when an existing property has a registered type (say, ip is an integer), Rakam tries to cast incoming string values to that type, and if the cast fails the field is ignored.

Rakam checks the fields of each event; if they already exist and the values match the existing schema, the event is sent to the storage backend exactly as you sent it.
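
For example (field names and values here are illustrative), suppose the first event in a collection registers page_duration as an integer:

{
    "collection": "pageView",
    "properties": {
        "url": "http://mysite.com/blog-post",
        "page_duration": 5
    }
}

A later event that sends "page_duration": "7" will have the string cast to the integer 7, while one that sends "page_duration": "fast" will have that field ignored because the cast fails.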

Trackers

Currently, we have a Javascript tracker for websites and an Android SDK for Android applications, and we're actively working on an iOS tracker. Trackers are the preferred way to collect data because they're easy to integrate and automatically collect most of the data you need. For example, the Javascript tracker automatically collects data about the client machine, keeps track of user ids between sessions, and provides ways to collect platform-related data such as the time a user spends on a page.
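
As a minimal sketch, embedding the Javascript tracker might look like the following; the method names assume a rakam-js style API and the write key is a placeholder, so check the tracker documentation for the exact calls:

// a sketch assuming a rakam-js style API; the init/logEvent names
// and the write key are assumptions, not a definitive reference
rakam.init('YOUR_WRITE_KEY');
rakam.logEvent('pageView', {
    url: window.location.href,
    referrer: document.referrer
});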

RESTful API

If you want to send events directly from your applications, you can use the client libraries. They're essentially wrappers around the RESTful API.

The client libraries are currently in beta; they're automatically generated with Swagger and provide classes and methods for all the endpoints of your Rakam API. You can also use the RESTful API directly to send data to Rakam.

The RESTful API powers the trackers and client libraries, and you can always use it directly. Here is an example event:

{
    "collection": "pageView",
    "properties": {
        "user_agent": "Firefox",
        "locale": "tr-TR",
        "url": "http://mysite.com/blog-post",
        "referrer": "http://google.com/?q=term",
        "ip": "186.45.56.33",
        "platform": "web",
        "page_duration": 5,
        "session_id": "c88d7bad-af01-433f-bef1-dc107cee4334",
        "_user": "[email protected]",
        "_time": 1480457838042
    }
}

If the _time attribute is not set, Rakam automatically attaches the current timestamp to the event. If you know which user performed the event, you should also set the _user attribute to that user's id. These attributes are used by the funnel, retention and event explorer modules.
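
As an illustration, you could post that event with a plain fetch call. The /event/collect path is an assumption about the endpoint, and a real deployment will also require a project write key, so verify both against your API documentation:

// a sketch; the endpoint path is an assumption and authentication
// (a project write key) is omitted -- see your API reference
fetch('http://RAKAM_API/event/collect', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
        collection: 'pageView',
        properties: {
            url: 'http://mysite.com/blog-post',
            _user: '[email protected]',
            _time: Date.now()   // omit to let Rakam attach the current time
        }
    })
});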

If you want to disable schema evolution for security reasons, you can set disable_dynamic_schema=true in your config once you have created the schemas of your event collections. This is recommended if you're running Rakam in production and the API is open to clients.
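
That is, once your collections are in place, add the following line to config.properties:

disable_dynamic_schema=true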

Webhook

If you want to integrate a third-party service that supports webhooks with Rakam, you may use our webhook support. For example, Stripe sends payment and order data via webhooks, and Mailgun sends mail data (unsubscribe, click, open, etc.). Webhook support works as follows: you define an identifier such as mailgun_mails and write Javascript code that transforms the request body and headers into an event. When we receive a request at [RAKAM_API]/webhook/collect/mailgun_mails, we invoke your Javascript code; if it returns JSON data, we add it as a new event.
Webhook support is available in Rakam BI and we provide templates for common services such as Mailgun and Stripe.
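
As a sketch, a transform for the mailgun_mails identifier might look like the code below. The (body, headers) signature, the returned event shape, and the Mailgun field names are assumptions for illustration:

// a sketch of a webhook transform; the signature and return
// contract are assumptions about Rakam's JS integration
var transform = function(body, headers) {
    if (!body.event) {
        return null;   // not an event notification; nothing to collect
    }
    return {
        collection: 'mailgun_mails',
        properties: {
            event: body.event,              // e.g. "clicked", "opened"
            recipient: body.recipient,
            _time: body.timestamp * 1000    // Mailgun sends Unix seconds
        }
    };
};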

Importers

You can import CSV, JSON or Avro files directly into your Rakam project. If you're already using an analytics service such as Mixpanel and want to move your data to Rakam, we have importers that fetch the raw event data and send it to Rakam. Currently we have a Mixpanel integration; you can find the documentation here.
We also have a Twitter task that collects tweet data from Twitter in real time and sends it to Rakam continuously; we use it internally for our integration tests.