Thanos Query Http Request Query Error Rate High
critical

Description Thanos Query {{$labels.job}} is failing to handle {{$value | humanize}}% of "query" requests.
Query for 5m
>>>
	
				
					(sum by (job) (
				
			
				
					
					
						rate
					
				
			
				
					(
				
			
				
					
				
			
				
					{code=~"5..", job=~".*thanos-query.*", handler="query"}[5m]))/  sum by (job) (
				
			
				
					
					
						rate
					
				
			
				
					(
				
			
				
					
				
			
				
					{job=~".*thanos-query.*", handler="query"}[5m]))) * 100 > 5
				
			
    
Query Explanation

The rule calculates, for each job, the percentage of HTTP 5xx responses among all query requests over the last 5 minutes ( (sum(rate(http_requests_total{code=~"5..",handler="query",job=~".*thanos-query.*"}[5m])) / sum(rate(http_requests_total{handler="query",job=~".*thanos-query.*"}[5m]))) * 100 ) and triggers when this error rate exceeds 5 %. In other words, if any Thanos‑Query instance reports that more than five percent of its query‑handler requests are failing, the alert fires.

Get Alert
Download
Copy to Clipboard