CloudNativePG
Created 2/3/2024

Updated 4/14/2025

Revision 4

Prometheus Operator example

A specific PostgreSQL cluster can be monitored through the Prometheus Operator's PodMonitor resource. The operator can automatically create a PodMonitor that targets the Cluster when .spec.monitoring.enablePodMonitor is set to true in the Cluster resource (default: false).

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
  namespace: test
spec:
  instances: 3

  storage:
    size: 1Gi

  monitoring:
    enablePodMonitor: true
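
If enablePodMonitor is left at its default of false, a PodMonitor can also be written by hand. The manifest below is a minimal sketch, not the operator's generated output. It assumes that the instance pods carry the label cnpg.io/cluster set to the cluster name and that the metrics exporter is exposed on a container port named metrics; check both against your deployment before relying on it.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cluster-example
  namespace: test
spec:
  selector:
    matchLabels:
      # Assumption: label applied by the operator to every instance pod
      cnpg.io/cluster: cluster-example
  podMetricsEndpoints:
    # Assumption: name of the exporter port on the instance pods
    - port: metrics
```

Your Prometheus instance may also require specific labels on the PodMonitor itself (through its podMonitorSelector) before it picks the resource up.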

User-defined metrics

Users can define custom metrics by referencing an existing ConfigMap or Secret in the Cluster definition, under .spec.monitoring.customQueriesConfigMap or .spec.monitoring.customQueriesSecret, as in the following example:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
  namespace: test
spec:
  instances: 3

  storage:
    size: 1Gi

  monitoring:
    customQueriesConfigMap:
      - name: example-monitoring
        key: custom-queries
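
The Secret variant follows the same shape. The sketch below assumes a Secret named example-monitoring-secret (a hypothetical name, not taken from the original) that stores the queries under the key custom-queries, in the same namespace as the Cluster.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
  namespace: test
spec:
  instances: 3

  storage:
    size: 1Gi

  monitoring:
    customQueriesSecret:
      # Hypothetical Secret name; the key holds the YAML query definitions
      - name: example-monitoring-secret
        key: custom-queries
```

A Secret is the better fit when the query text itself should not be readable by everyone who can view ConfigMaps in the namespace.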

The following ConfigMap, referenced by the Cluster example above, contains a single custom query:

apiVersion: v1
kind: ConfigMap
metadata:
  name: example-monitoring
  namespace: test
  labels:
    cnpg.io/reload: ""
data:
  custom-queries: |
    pg_replication:
      query: "SELECT CASE WHEN NOT pg_is_in_recovery()
              THEN 0
              ELSE GREATEST (0,
                EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())))
              END AS lag,
              pg_is_in_recovery() AS in_recovery,
              EXISTS (TABLE pg_stat_wal_receiver) AS is_wal_receiver_up,
              (SELECT count(*) FROM pg_stat_replication) AS streaming_replicas"

      metrics:
        - lag:
            usage: "GAUGE"
            description: "Replication lag behind primary in seconds"
        - in_recovery:
            usage: "GAUGE"
            description: "Whether the instance is in recovery"
        - is_wal_receiver_up:
            usage: "GAUGE"
            description: "Whether the instance wal_receiver is up"
        - streaming_replicas:
            usage: "GAUGE"
            description: "Number of streaming replicas connected to the instance"
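
Once exported, these columns can drive alerts. The metric cnpg_pg_replication_lag in the Used Metrics list corresponds to the lag column of the pg_replication query above. The PrometheusRule below is a sketch only: the rule name, the alert name, the 300-second threshold, and the 5-minute hold time are all illustrative assumptions, not values recommended by CloudNativePG.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  # Hypothetical resource name
  name: cluster-example-replication
  namespace: test
spec:
  groups:
    - name: cnpg-replication
      rules:
        # Hypothetical alert; tune the threshold and duration to your workload
        - alert: PostgresReplicationLagHigh
          expr: cnpg_pg_replication_lag > 300
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Replica is more than 300 seconds behind the primary"
```

As with a PodMonitor, the Prometheus instance only loads the rule if the resource matches its ruleSelector.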

A list of basic monitoring queries can be found in the default-monitoring.yaml file that is already installed in your CloudNativePG deployment (see "Default set of metrics").

Used Metrics (41)

cnpg_pg_replication_streaming_replicas
cnpg_pg_replication_is_wal_receiver_up
cnpg_pg_replication_lag
cnpg_pg_stat_replication_write_lag_seconds
cnpg_pg_stat_replication_flush_lag_seconds
cnpg_pg_stat_replication_replay_lag_seconds
kubelet_volume_stats_available_bytes
kubelet_volume_stats_capacity_bytes
kubelet_volume_stats_inodes_used
kubelet_volume_stats_inodes
cnpg_pg_postmaster_start_time
cnpg_pg_stat_database_xact_commit
cnpg_pg_stat_database_xact_rollback
node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
kube_pod_container_resource_requests
container_memory_working_set_bytes
wal
kubelet_volume_stats_used_bytes
tbs
volume
kube_pod_spec_volumes_persistentvolumeclaims_info
cnpg_collector_last_available_backup_timestamp
cnpg_backends_total
cnpg_pg_settings_setting
cnpg_pg_replication_in_recovery
timestamp
cnpg_pg_stat_archiver_seconds_since_last_archival
cnpg_collector_postgres_version
kube_pod_status_ready
controller_runtime_reconcile_total
cnpg_pg_database_size_bytes
cnpg_collector_first_recoverability_point
kube_pod_container_status_ready
min
kube_pod_info
label_topology_kubernetes_io_zone
kube_node_labels
cnpg_pg_database_xid_age
cnpg_backends_max_tx_duration_seconds
cnpg_pg_stat_database_deadlocks
cnpg_backends_waiting_total