by dengolius on 7/25/25, 7:35 AM with 1 comments
The Problem with Prometheus
At scale, Prometheus started showing limitations, especially for large enterprise customers like Pinterest. The main issues were:
- High resource consumption: Prometheus used a lot of CPU and memory, leading to frequent out-of-memory (OOM) crashes.
- Long recovery times: After a crash, Prometheus needed a long time to recover, sometimes failing altogether.
- Limited query performance: Large queries would often fail or be very slow.
The Solution: VictoriaMetrics
TiDB switched to VictoriaMetrics and saw significant improvements:
- Better resource utilization: CPU and memory usage dropped significantly, eliminating OOM crashes.
- Improved query performance: Large queries that previously failed in Prometheus now run efficiently in VictoriaMetrics.
- Lower costs: Reduced resource consumption and better storage efficiency led to lower operational costs.