Prometheus target missing with warmup time
critical

Description Allow a job 10 minutes to start up before alerting that it is down.
Query
sum by (instance, job) ((up == 0) * on (instance) group_left(__name__) (node_time_seconds - node_boot_time_seconds > 600))

Query Explanation

This alert triggers when a Prometheus scrape target is reported as down (up == 0) and the node hosting that target has been up for more than 10 minutes (node_time_seconds - node_boot_time_seconds > 600). The two conditions are joined on the instance label, so the alert does not fire immediately after a node (and its services) starts up; targets get a 10-minute warm-up period first.
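
As a rough illustration, the query could be wrapped in a Prometheus alerting rule along the following lines. This is a minimal sketch, not the rule shipped with this page: the group name, alert name, `for: 0m` duration, and annotation wording are assumptions, while the expression and the critical severity are taken from the entry above. It also assumes node_time_seconds and node_boot_time_seconds are scraped from node_exporter under the same instance label as the monitored target.

```yaml
groups:
  - name: prometheus-self-monitoring          # hypothetical group name
    rules:
      - alert: PrometheusTargetMissingWithWarmupTime   # hypothetical alert name
        # Down targets (up == 0) joined on instance with nodes whose uptime exceeds 600 s.
        expr: sum by (instance, job) ((up == 0) * on (instance) group_left(__name__) (node_time_seconds - node_boot_time_seconds > 600))
        for: 0m                                # assumption: fire as soon as the expression returns a series
        labels:
          severity: critical
        annotations:
          summary: "Prometheus target missing with warmup time (instance {{ $labels.instance }})"
          description: "Target {{ $labels.job }} is down and its node has been up for more than 10 minutes."
```

To change the warm-up period, adjust the 600-second threshold in the expression; a `for:` clause would instead delay the alert after the condition first becomes true.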
