CloudNativePG
448,310

Created 2/3/2024
Updated 4/14/2025
Revision 4
Categories: Databases
Grafana Version: >=10.3.3
Datasources: Prometheus

Refer to the Monitoring documentation for instructions on enabling monitoring for your cluster and exporting custom metrics.

Source Code: GitHub

Prometheus Operator example

A specific PostgreSQL cluster can be monitored using the Prometheus Operator's PodMonitor resource. The CloudNativePG operator can automatically create a PodMonitor that correctly targets a Cluster when you set .spec.monitoring.enablePodMonitor to true in the Cluster resource itself (default: false).

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
  namespace: test
spec:
  instances: 3

  storage:
    size: 1Gi

  monitoring:
    enablePodMonitor: true
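The same manifest can also be generated programmatically, for example in a CI pipeline. The sketch below is a hypothetical Python helper (`cluster_manifest` is an illustrative name, not part of CloudNativePG) that builds the Cluster above as a plain dictionary and prints it as JSON, which `kubectl apply -f -` accepts as well as YAML. Leaving `enable_pod_monitor` at its default mirrors the operator's own default of false.

```python
import json


def cluster_manifest(name, namespace, instances=3, storage_size="1Gi",
                     enable_pod_monitor=False):
    """Build a CloudNativePG Cluster manifest as a plain dict.

    Hypothetical helper for illustration; the field names follow the
    Cluster example above.
    """
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "instances": instances,
            "storage": {"size": storage_size},
            # Mirrors .spec.monitoring.enablePodMonitor (operator default: false)
            "monitoring": {"enablePodMonitor": enable_pod_monitor},
        },
    }


if __name__ == "__main__":
    manifest = cluster_manifest("cluster-example", "test", enable_pod_monitor=True)
    # JSON output can be piped into `kubectl apply -f -`
    print(json.dumps(manifest, indent=2))
```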

User-defined metrics

Users can define custom metrics by referencing a ConfigMap or Secret in the Cluster definition, under the .spec.monitoring.customQueriesConfigMap or .spec.monitoring.customQueriesSecret section, as in the following example:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
  namespace: test
spec:
  instances: 3

  storage:
    size: 1Gi

  monitoring:
    customQueriesConfigMap:
      - name: example-monitoring
        key: custom-queries
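Both selectors take a list of name/key pairs, so custom queries can be spread across several ConfigMaps and Secrets. The hypothetical helper below (`monitoring_stanza` is an illustrative name, not a CloudNativePG API) assembles the .spec.monitoring section from such pairs and omits a selector when its list is empty. The Secret name `example-monitoring-secret` in the usage lines is an assumption.

```python
def monitoring_stanza(configmaps=(), secrets=()):
    """Build the .spec.monitoring section that references custom query sources.

    `configmaps` and `secrets` are iterables of (name, key) tuples.
    Empty selectors are left out so the stanza stays minimal.
    """
    stanza = {}
    if configmaps:
        stanza["customQueriesConfigMap"] = [
            {"name": name, "key": key} for name, key in configmaps
        ]
    if secrets:
        stanza["customQueriesSecret"] = [
            {"name": name, "key": key} for name, key in secrets
        ]
    return stanza


# The ConfigMap reference from the example above, plus a hypothetical Secret
spec_monitoring = monitoring_stanza(
    configmaps=[("example-monitoring", "custom-queries")],
    secrets=[("example-monitoring-secret", "custom-queries")],
)
print(spec_monitoring)
```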

The following ConfigMap, referenced by the Cluster example above, contains a single custom query:

apiVersion: v1
kind: ConfigMap
metadata:
  name: example-monitoring
  namespace: test
  labels:
    cnpg.io/reload: ""
data:
  custom-queries: |
    pg_replication:
      query: "SELECT CASE WHEN NOT pg_is_in_recovery()
              THEN 0
              ELSE GREATEST (0,
                EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())))
              END AS lag,
              pg_is_in_recovery() AS in_recovery,
              EXISTS (TABLE pg_stat_wal_receiver) AS is_wal_receiver_up,
              (SELECT count(*) FROM pg_stat_replication) AS streaming_replicas"

      metrics:
        - lag:
            usage: "GAUGE"
            description: "Replication lag behind primary in seconds"
        - in_recovery:
            usage: "GAUGE"
            description: "Whether the instance is in recovery"
        - is_wal_receiver_up:
            usage: "GAUGE"
            description: "Whether the instance wal_receiver is up"
        - streaming_replicas:
            usage: "GAUGE"
            description: "Number of streaming replicas connected to the instance"
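To make the semantics of the lag column concrete, the Python sketch below mirrors the CASE expression on the client side: an instance that is not in recovery (the primary) always reports 0, and a standby reports the seconds elapsed since the last replayed transaction, clamped at 0. Passing None for the replay timestamp models pg_last_xact_replay_timestamp() returning NULL; PostgreSQL's GREATEST ignores NULL arguments, so the result is 0. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def replication_lag_seconds(in_recovery, last_replay, now):
    """Client-side mirror of the `lag` expression in the pg_replication query.

    in_recovery: result of pg_is_in_recovery()
    last_replay: result of pg_last_xact_replay_timestamp(), or None for NULL
    now:         result of now()
    """
    if not in_recovery:
        return 0.0  # primary: the CASE branch returns 0
    if last_replay is None:
        return 0.0  # GREATEST(0, NULL) ignores the NULL and yields 0
    return max(0.0, (now - last_replay).total_seconds())


now = datetime(2025, 4, 14, 12, 0, 0, tzinfo=timezone.utc)
print(replication_lag_seconds(False, None, now))                        # 0.0
print(replication_lag_seconds(True, now - timedelta(seconds=42), now))  # 42.0
```

Because the value is measured from the last replayed transaction, it keeps growing on a fully caught-up standby while the primary receives no writes.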

A list of basic monitoring queries can be found in the default-monitoring.yaml file that is already installed in your CloudNativePG deployment (see "Default set of metrics").
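Judging by the names in the list of metrics used by this dashboard (for example cnpg_pg_replication_lag from the query above, or cnpg_pg_database_size_bytes from the default set), each exported metric is named with a cnpg_ prefix, then the query name, then the column name. The sketch below derives those names from a query definition written as a Python dict that mirrors the YAML structure. It is a simplification under stated assumptions: it only handles GAUGE and COUNTER columns, skips LABEL columns (which become labels on the series rather than metrics of their own), and does not model other usage types such as HISTOGRAM. The function name is hypothetical.

```python
def exported_metric_names(queries, prefix="cnpg"):
    """Derive exporter metric names from custom query definitions.

    `queries` mirrors the YAML structure: query name -> {"query": ..., "metrics": [...]},
    where each metrics entry is a single-key dict {column: {"usage": ..., ...}}.
    Only GAUGE and COUNTER columns produce names here; LABEL columns become
    series labels instead, and other usage types are not modeled.
    """
    names = []
    for query_name, definition in queries.items():
        for entry in definition["metrics"]:
            for column, spec in entry.items():
                if spec["usage"] in ("GAUGE", "COUNTER"):
                    names.append(f"{prefix}_{query_name}_{column}")
    return names


# The pg_replication query from the ConfigMap above (SQL text omitted)
pg_replication = {
    "pg_replication": {
        "query": "...",
        "metrics": [
            {"lag": {"usage": "GAUGE"}},
            {"in_recovery": {"usage": "GAUGE"}},
            {"is_wal_receiver_up": {"usage": "GAUGE"}},
            {"streaming_replicas": {"usage": "GAUGE"}},
        ],
    }
}

for name in exported_metric_names(pg_replication):
    print(name)
```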


Used Metrics (41)

  • cnpg_pg_replication_streaming_replicas

  • cnpg_pg_replication_is_wal_receiver_up

  • cnpg_pg_replication_lag

  • cnpg_pg_stat_replication_write_lag_seconds

  • cnpg_pg_stat_replication_flush_lag_seconds

  • cnpg_pg_stat_replication_replay_lag_seconds

  • kubelet_volume_stats_available_bytes

  • kubelet_volume_stats_capacity_bytes

  • kubelet_volume_stats_inodes_used

  • kubelet_volume_stats_inodes

  • cnpg_pg_postmaster_start_time

  • cnpg_pg_stat_database_xact_commit

  • cnpg_pg_stat_database_xact_rollback

  • node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate

  • kube_pod_container_resource_requests

  • container_memory_working_set_bytes

  • wal

  • kubelet_volume_stats_used_bytes

  • tbs

  • volume

  • kube_pod_spec_volumes_persistentvolumeclaims_info

  • cnpg_collector_last_available_backup_timestamp

  • cnpg_backends_total

  • cnpg_pg_settings_setting

  • cnpg_pg_replication_in_recovery

  • timestamp

  • cnpg_pg_stat_archiver_seconds_since_last_archival

  • cnpg_collector_postgres_version

  • kube_pod_status_ready

  • controller_runtime_reconcile_total

  • cnpg_pg_database_size_bytes

  • cnpg_collector_first_recoverability_point

  • kube_pod_container_status_ready

  • min

  • kube_pod_info

  • label_topology_kubernetes_io_zone

  • kube_node_labels

  • cnpg_pg_database_xid_age

  • cnpg_backends_max_tx_duration_seconds

  • cnpg_pg_stat_database_deadlocks

  • cnpg_backends_waiting_total