R2 SQL - R2 SQL now supports approximate aggregation functions — Cloudflare Changelog

R2 SQL now supports five approximate aggregation functions for fast analysis of large datasets. These functions trade minor precision for improved performance on high-cardinality data. New functions

APPROX_PERCENTILE_CONT(column, percentile) — Returns the approximate value at a given percentile (0.0 to 1.0). Works on integer and decimal columns. APPROX_PERCENTILE_CONT_WITH_WEIGHT(column, weight, percentile) — Weighted percentile calculation where each row contributes proportionally to its weight column value. APPROX_MEDIAN(column) — Returns the approximate median. Equivalent to APPROX_PERCENTILE_CONT(column, 0.5). APPROX_DISTINCT(column) — Returns the approximate number of distinct values. Works on any column type. APPROX_TOP_K(column, k) — Returns the k most frequent values with their counts as a JSON array.

All functions support WHERE filters. All except APPROX_TOP_K support GROUP BY. Examples -- Percentile analysis on revenue dataSELECT approx_percentile_cont(total_amount, 0.25), approx_percentile_cont(total_amount, 0.5), approx_percentile_cont(total_amount, 0.75)FROM my_namespace.sales_data -- Median per departmentSELECT department, approx_median(total_amount)FROM my_namespace.sales_dataGROUP BY department -- Approximate distinct customers by regionSELECT region, approx_distinct(customer_id)FROM my_namespace.sales_dataGROUP BY region -- Top 5 most frequent departmentsSELECT approx_top_k(department, 5)FROM my_namespace.sales_data -- Combine approximate and standard aggregationsSELECT COUNT(*), AVG(total_amount), approx_percentile_cont(total_amount, 0.5), approx_distinct(customer_id)FROM my_namespace.sales_dataWHERE region = 'North' For the full syntax and additional examples, refer to the SQL reference.