Core Concept

The goal of Rakam is to make it easy to create analytics services tailored to your needs. There are analytics SaaS providers that do a great job, but you often need to use at least a few of them at once, and that has a few disadvantages: you have to share your data with several third-party applications and pay each of them separately, even though they usually run similar infrastructures built on different technologies. Another common problem is that these analytics services often specialize in one area (web analytics, mobile analytics, real-time analytics, customer analytics, etc.), so they may not solve your problem completely. We want to develop a modular and extensible analytics platform that you can use to create your own custom analytics solution easily.

We provide various ways to collect your events with the Collection API. Currently you can use the client libraries for various platforms, send events directly in JSON format, or write a module that consumes events from other data sources. Rakam can also take care of schema evolution for you: it automatically alters the schema at runtime when it encounters new fields. Depending on the deployment type, the event dataset is stored in one database (or a few, if you wish), and the Analysis API uses SQL to analyze it. With it you can implement almost any analytics feature, such as funnel and retention analyses. We also provide materialized query tables for caching, and event flows and continuous query tables for real-time event processing.
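For a rough idea of what an Analysis API query can look like, here is a minimal standard-SQL sketch that computes daily unique users; the pageview collection and its user_id, _time, and referrer fields are hypothetical examples, not Rakam defaults.

```sql
-- Hedged sketch: daily unique users for a hypothetical "pageview" collection.
-- The collection name and fields (user_id, _time, referrer) are assumptions.
SELECT date_trunc('day', _time) AS day,
       count(DISTINCT user_id)  AS unique_users
FROM pageview
WHERE referrer LIKE '%google%'
GROUP BY 1
ORDER BY 1;
```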

Currently, we provide three different solutions for different use cases: PostgreSQL; Kafka & PrestoDB & (a distributed file system or S3); and Amazon Kinesis & S3 & Redshift. You can use one of these deployment types or extend Rakam by developing deployment modules for your specific needs. We're also evaluating other solutions such as Elasticsearch, InfluxDB and Pinot.

If your data is small enough to fit on a single node, we suggest the PostgreSQL deployment type because it's the easiest to set up and the most feature-complete. You can also tune your PostgreSQL database by creating indexes or by installing a columnar-storage FDW extension such as cstore_fdw, so think twice before reaching for the more complex deployment types. Setup Guide
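As a sketch of that kind of tuning, the statements below create an index on a hot column and move an archive table into columnar storage via cstore_fdw; the table and column names are hypothetical, and the exact schema depends on your collections.

```sql
-- Hypothetical tuning example for the PostgreSQL deployment type.
-- Table and column names are illustrative, not Rakam defaults.
CREATE INDEX pageview_time_idx ON pageview (_time);

-- Columnar storage for large, rarely updated event data via cstore_fdw.
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

CREATE FOREIGN TABLE pageview_archive (
    user_id  text,
    _time    timestamp,
    referrer text
) SERVER cstore_server
  OPTIONS (compression 'pglz');
```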

The other deployment type is (Kafka or Kinesis) & PrestoDB & (a distributed file system or S3), and it consists of open-source components that you can install in your own cluster and maintain yourself. If you're dealing with big data and already maintain your own cluster, this is the deployment type for you. Events are sent directly to a distributed commit log such as Apache Kafka or Amazon Kinesis. Then we process the data in small batches using PrestoDB, a distributed SQL query engine: PrestoDB fetches data from the distributed commit log and saves it to the distributed file system of your choice (HDFS, S3, etc.) in a columnar format. You can then execute SQL queries on that dataset. Setup Guide
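Once the events land in columnar storage, analysis queries run through PrestoDB. The sketch below assumes the dataset is exposed through Presto's Hive connector as hive.events.pageview; the catalog, schema, and table names are hypothetical.

```sql
-- Hedged sketch of querying the columnar event dataset with PrestoDB.
-- hive.events.pageview is a hypothetical catalog.schema.table.
SELECT date_trunc('day', _time) AS day,
       count(*)                 AS events
FROM hive.events.pageview
WHERE _time >= date '2015-01-01'
GROUP BY 1
ORDER BY 1;
```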