Streaming SQL Engine Comparison

· 3 min read

Our Goal: User-Facing Analytics

This is User-facing analytics, not BI Data Warehouse Engine like Trino, Dremio, BigQuery.

When it comes to user-facing analytics, you need a database that supports sub-second query responses, near real-time updates and high QPS (concurrent Queries-per-second).

In other words, while you may be able to wait a couple of minutes for results to come back for your business intelligence (BI) report to share with internal stakeholders, your end users don’t want to wait more than a few seconds to see results and take action in your platform. Hence, you need a data stack that can keep up with the required pace.

User-facing Analytics vs. BI Data Warehouse Engine

Customer-facing analytics is when you present data to your end-users so they can gain insights from it and take action. It can increase the real and perceived value of your product by presenting metrics in a digestible and immediate way. For example, real-time user-facing analytics is critical for things like live trading applications, health monitoring tools, or even delivery apps that show exactly how many minutes it’ll take to get your food.

Internal BI, on the other hand, is all about presenting business information to data analysts or managers for company use. So, in the delivery app example, internal BI would show metrics like number of daily users, best-performing restaurants, and average delivery time.

Table Comparison

Internal BIUser-facing Analytics
LatencyIt’s OK if it takes minutes to load depending on the size of dataShould load in milliseconds, irrespective of the size of data
FreshnessIt’s OK if the data is couple minutes oldOften needs to show realtime data
Concurrencydozens or hundreds depending on how big your company isthousands to million depending on how many user using the product
Query ComplexityGenerally more complexSimpler, but still not achievable with OLTP such as PostgreSQL
Query PatternMore ad-hoc query, you don’t really knowAlready defined as product requirement
BI Tool SupportSupport most of BI tool like Looker, Tableau, Metabase, etcLimited support

Examples

  • BI data warehouses: Redshift, Snowflake, Google BigQuery, Trino, SparkSQL
  • Real-time user-facing analytics: Rockset, ClickHouse, Cassandra, Druid, and Pinot.

Typical Setup

PostgreSQL -> Realtime Database -> S3

PostgreSQL -> S3 -> BI Data Warehouse

Combine

Candidate

Okay, so we are clear on what class of technology we want to compare. Let’s move with our candidate.

  • ClickHouse
  • Druid
  • Pinot
  • RisingWave
  • StarRock

What We Look for(or Defining Our Metrics)

What we look:

  • Popularity:
  • Freshness / Ingestion Time from PostgreSQL to X
  • Latency / Time from X to our User
  • Aggregation performance

What I’m not so sure:

  • integration with BI tool
  • JOIN performance

What we not look so much:

  • integration with dbt

Discussion

Popularity

Source: Google Trend

TechCompanies
ClickHouseCloudFlare, Microsoft, OpenSea, GraphQL
DruidAirBnB, Alibaba, Cisco, Salesforce, Shopify, Verizon
PinotLinkedln, Zoho, Uber, Microsoft, Walmart, NVIDIA
RockSetKlarna, Meta, Allianz, Sequoia

Freshness / Ingestion Time from PostgreSQL to X

Latency / Time from X to our User

Conclusion

References: