ClickHouse Performance and Feature Improvements

Performance Improvements:

Allow read-in-order optimization and primary-key pruning with Nullable CAST target types for monotonic conversions
Allow index pruning and filter pushdown when comparing integral columns with float literals
Added SLRU cache for Parquet metadata to improve read performance
Support swapping sides of ANTI, SEMI and FULL joins based on optimizer statistics
Optimized granules skipping for pointInPolygon and fixed index analysis issues
Improved levenshteinDistance function performance
Optimized batch decimal type conversions by avoiding per-element function calls
Iceberg tables now support asynchronous metadata prefetching and cached metadata usage
S3Queue ordered mode uses ListObjectsV2 StartAfter to reduce ListObjects calls
Lowered memory usage for insert deduplication in sync mode
Use arch-specific cache line size instead of hardcoded 64-byte value
Optimized text index dictionary reading and analysis
Sped up LZ4 decompression of 16-byte blocks on ARM
Refactored tokenization to high-performance interface with SIMD support
Improved text index analysis for queries with combined conditions
Improved performance of queries with constant expressions generating large arrays/maps
Fixed key condition analysis for DateTime64 primary keys compared with integer constants
Setting optimize_syntax_fuse_functions enabled by default
Optimized avgWeighted aggregate function with local accumulators (~27% improvement for Nullable inputs)
Improved performance and reduced memory usage for parallel window functions and arrayFold workloads
Improved sorted merges performance
Optimized INTERSECT ALL and EXCEPT ALL
Added read_in_order_use_virtual_row optimization support for reverse-order reads
Reduced cache contention in RIGHT and FULL JOINs
Optimized PrefetchingHelper::calcPrefetchLookAhead with integer arithmetic
Reduced Keeper memory consumption by replacing absl::flat_hash_set with CompactChildrenSet (KeeperMemNode reduced from 144 to 128 bytes)
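
Illustration for the SLRU cache item above: a minimal Python sketch of a segmented LRU (SLRU) eviction policy. This is not ClickHouse's implementation; the class name SLRUCache, its method names, and the segment sizes are hypothetical and chosen only to show the general technique. New entries land in a probationary segment, and only entries that are hit again move to a protected segment, so a single large scan is less likely to push out frequently reused metadata.

```python
from collections import OrderedDict


class SLRUCache:
    """Hypothetical segmented LRU cache (illustrative only)."""

    def __init__(self, probation_size, protected_size):
        self.probation_size = probation_size
        self.protected_size = protected_size
        # Both segments keep the least recently used key first.
        self.probation = OrderedDict()
        self.protected = OrderedDict()

    def get(self, key):
        if key in self.protected:
            # Hit in the protected segment: refresh recency.
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probation:
            # Second access: promote from probation to protected.
            value = self.probation.pop(key)
            self._insert_protected(key, value)
            return value
        return None

    def put(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
        elif key in self.probation:
            self.probation[key] = value
            self.probation.move_to_end(key)
        else:
            # Unknown keys always start in the probationary segment.
            self._insert_probation(key, value)

    def _insert_probation(self, key, value):
        self.probation[key] = value
        self.probation.move_to_end(key)
        while len(self.probation) > self.probation_size:
            # Evict the least recently used probationary entry.
            self.probation.popitem(last=False)

    def _insert_protected(self, key, value):
        self.protected[key] = value
        while len(self.protected) > self.protected_size:
            # Demote the coldest protected entry instead of dropping it.
            old_key, old_value = self.protected.popitem(last=False)
            self._insert_probation(old_key, old_value)
```

In this sketch an entry evicted from the protected segment gets a second chance in the probationary segment rather than being discarded outright, which is the usual SLRU behavior.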

Feature Improvements:

Aggregate projections now correctly supported in views
Support OUTER to INNER join conversion optimization with join_use_nulls
Improved subcolumns reading with correct sizes calculation
Separate jemalloc arenas for mark, uncompressed and page caches to avoid memory fragmentation
Tables with DELETE TTL rules can now use vertical merge algorithm
Apply data skipping indexes during distributed index analysis
Secondary index marks prewarmed when prewarm_mark_cache setting enabled
Reduced locking during access control
Compound AND conditions in row policies and PREWHERE now decomposed for sorting-key atoms extraction
Reduced lock contention in MergeTreeBackgroundExecutor
Fixed excessive memory usage (~514 MiB) during format auto-detection for non-Arrow data
Parse GeoParquet files with different Geo types in same column
Introduced tokensForLikePattern SQL function for LIKE pattern tokenization
Added {_schema_hash} placeholder for S3 table engine
SymbolIndex, addressToSymbol, system.symbols, buildId now work on macOS
system.stack_trace table now works on macOS
Added per-server LDAP config option <follow_referrals> to control referral chasing
Track data skipping indices used in query execution via skip_indices column in query_log
ACCESS_DENIED hints no longer reveal column names unless user can show all required columns
Added dedicated cleanup thread for MergeTree to prevent cleanup delays
Reload cluster config if IPs of local server's hostname changed
Allow optimize_aggregators_of_group_by_keys to correctly optimize in GROUPING SETS queries
Keeper-bench: report errors in metrics and generate JSON metrics file
Added ROLE clause to CREATE USER
Setting internal_replication can now be configured for Replicated database clusters
New setting allow_nullable_tuple_in_extracted_subcolumns controls Tuple subcolumns behavior
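
Illustration for the item above on decomposing compound AND conditions: a minimal Python sketch of how a nested conjunction can be flattened into individual atoms, keeping those that reference sorting-key columns. It does not reflect ClickHouse internals; the types Atom and And, the helper names, and the example column names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    """A single comparison such as `ts >= 100` (hypothetical type)."""
    column: str
    op: str
    value: object


@dataclass(frozen=True)
class And:
    """A binary conjunction of two sub-expressions (hypothetical type)."""
    left: "Expr"
    right: "Expr"


Expr = Union[Atom, And]


def flatten_and(expr):
    """Return the atoms of a nested AND tree in left-to-right order."""
    if isinstance(expr, And):
        return flatten_and(expr.left) + flatten_and(expr.right)
    return [expr]


def sorting_key_atoms(expr, sorting_key):
    """Keep only the atoms whose column is part of the sorting key."""
    return [
        atom
        for atom in flatten_and(expr)
        if isinstance(atom, Atom) and atom.column in sorting_key
    ]


# Example: a row-policy-like condition combined with a query filter.
condition = And(
    Atom("tenant_id", "=", 7),
    And(Atom("ts", ">=", 100), Atom("comment", "!=", "")),
)
usable = sorting_key_atoms(condition, ["tenant_id", "ts"])
```

Here `usable` holds the two atoms on `tenant_id` and `ts`, the kind of per-column conditions that could be used for key analysis, while the atom on `comment` is left for ordinary filtering.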
