Prometheus target missing with warmup time
critical

Description Allow a job time to start up (10 minutes) before alerting that it's down.

Query

>>>
sum by (instance, job) ((up == 0) * on (instance) group_left(__name__) (node_time_seconds - node_boot_time_seconds > 600))

Query Explanation

This alert triggers when a Prometheus scrape target is reported as down (up == 0) and the node hosting that target, matched on the instance label, has been up for more than 10 minutes (node_time_seconds - node_boot_time_seconds > 600). This prevents the alert from firing immediately after a node (and its services) starts up, allowing for a warm-up period.
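
As a rough illustration, the query could be placed in a Prometheus rule file along these lines. This is a minimal sketch, not the exact rule distributed by the source: the group name, the alert name, the `for: 0m` duration, and the summary annotation are assumptions made for the example. Only the expression, the severity, and the description text come from this page.

```yaml
groups:
  - name: prometheus-self-monitoring        # hypothetical group name
    rules:
      - alert: PrometheusTargetMissingWithWarmupTime   # hypothetical alert name
        # Down targets, restricted to instances whose node uptime exceeds 600 s.
        expr: sum by (instance, job) ((up == 0) * on (instance) group_left(__name__) (node_time_seconds - node_boot_time_seconds > 600))
        for: 0m                              # assumption: fire as soon as the expression returns a series
        labels:
          severity: critical
        annotations:
          summary: "Prometheus target missing with warmup time (instance {{ $labels.instance }})"   # assumed wording
          description: "Allow a job time to start up (10 minutes) before alerting that it's down."
```

The warm-up delay here comes from the uptime comparison inside the expression rather than from the `for` clause, so the rule stays silent for the first 10 minutes after a node boots even with `for: 0m`. If the file is saved as, say, `rules.yml` (a hypothetical name), `promtool check rules rules.yml` can be used to validate its syntax before loading it.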

Source Awesome Prometheus Alerts
