Ceph - Cluster
Description
This dashboard is targeted at service managers or teams that manage more than one Ceph instance. It shows all the stats per cluster and makes it easy to switch between clusters. It uses the native Ceph Prometheus module (ceph_exporter is not needed) for Ceph stats and Node Exporter for node stats.
Requisites
- Ceph 12.2 Luminous or Ceph 13.2 Mimic (Note that some of the stats are only reported by Mimic instances)
- Node Exporter for node metrics
Setup
- Enable the Ceph Prometheus module on every cluster: `ceph mgr module enable prometheus`
- Allow traffic through port `9283` on the machines running the ceph mgr
- To ensure that you don't lose metrics between mgr fail-overs, add all the mgrs to the target section in Prometheus (depending on the Prometheus scrape interval, some metrics could still be lost during a failover)
- In the Prometheus configuration, create a new `cluster` label when defining the Ceph targets and also the node targets, like:
`ceph-targets`:

    {
      "targets": [ "mycluster-mgr-1:9283", "mycluster-mgr-2:9283", "mycluster-mgr-3:9283" ],
      "labels": {
        "cluster": "mycluster"
      }
    }

`node-exporter-targets`:

    {
      "targets": [ "mycluster-node-1:9100", "mycluster-node-2:9100", "mycluster-node-3:9100" ],
      "labels": {
        "cluster": "mycluster"
      }
    }
- In case you don't use the above label nomenclature, adapt the `cluster` variable in Grafana accordingly (see the scrape configuration sketch below)
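For reference, a minimal Prometheus scrape configuration that applies the `cluster` label to both target groups could look like the sketch below. The job names and host names are example assumptions, not requirements:

```yaml
# prometheus.yml (sketch) -- job names and hosts are examples only
scrape_configs:
  # Scrape every ceph mgr so metrics survive a mgr failover; the shared
  # "cluster" label is what the dashboard's cluster variable matches on.
  - job_name: ceph
    static_configs:
      - targets: ['mycluster-mgr-1:9283', 'mycluster-mgr-2:9283', 'mycluster-mgr-3:9283']
        labels:
          cluster: mycluster
  # Node Exporter targets carry the same label so node panels can be
  # filtered by the same cluster variable.
  - job_name: node
    static_configs:
      - targets: ['mycluster-node-1:9100', 'mycluster-node-2:9100', 'mycluster-node-3:9100']
        labels:
          cluster: mycluster
```

With the label in place, the Grafana `cluster` variable can be populated with a query such as `label_values(ceph_health_status, cluster)`.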
Used Metrics (53)
- ceph_health_status
- ceph_osd_op_w_in_bytes
- ceph_osd_op_r_out_bytes
- ceph_cluster_total_bytes
- ceph_cluster_total_used_bytes
- ceph_cluster_total_objects
- ALERTS
- ceph_osd_op_w
- ceph_osd_op_r
- ceph_mon_num_sessions
- ceph_mon_quorum_status
- ceph_osd_up
- ceph_osd_in
- ceph_osd_numpg
- ceph_osd_apply_latency_ms
- ceph_osd_commit_latency_ms
- ceph_osd_op_w_latency_sum
- ceph_osd_op_w_latency_count
- ceph_osd_op_r_latency_sum
- ceph_osd_op_r_latency_count
- ceph_pool_bytes_used
- name
- ceph_pool_metadata
- ceph_pool_raw_bytes_used
- ceph_pool_objects
- ceph_pool_quota_bytes
- ceph_pool_quota_objects
- ceph_bluestore_commit_lat_count
- ceph_filestore_journal_latency_count
- ceph_pg_active
- ceph_pg_clean
- ceph_pg_peering
- ceph_pg_degraded
- ceph_pg_stale
- ceph_unclean_pgs
- ceph_pg_undersized
- ceph_pg_incomplete
- ceph_pg_forced_backfill
- ceph_pg_inconsistent
- ceph_pg_forced_recovery
- ceph_pg_creating
- ceph_pg_wait_backfill
- ceph_pg_deep
- ceph_pg_scrubbing
- ceph_pg_recovering
- ceph_pg_repair
- ceph_pg_down
- ceph_pg_peered
- ceph_pg_backfill
- ceph_pg_remapped
- ceph_pg_backfill_toofull
- ceph_osd_recovery_ops
- interval
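The `*_latency_sum` / `*_latency_count` entries above are cumulative sum/count pairs, so average latencies are derived by dividing their rates. A minimal sketch of that pattern as a Prometheus recording rule (the group and record names here are hypothetical, not part of the dashboard):

```yaml
# Hypothetical recording-rule file: shows how a sum/count pair from the
# list above combines into an average OSD write latency per cluster.
groups:
  - name: ceph-derived                       # example group name
    rules:
      - record: ceph:osd_op_w_latency:avg5m  # hypothetical record name
        expr: |
          sum by (cluster) (rate(ceph_osd_op_w_latency_sum[5m]))
            / sum by (cluster) (rate(ceph_osd_op_w_latency_count[5m]))
```

The read-side pair (`ceph_osd_op_r_latency_sum` / `ceph_osd_op_r_latency_count`) follows the same pattern.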