If you are running a Ceph cluster for several years you will come to the point where the number of placement groups (PGs) no longer fits to the size of your cluster. Maybe you increased the number of OSDs several times, maybe you misjudged the growth of a pool years ago.
Too few PGs per OSD can lead to drawbacks regarding performance, recovery time and uneven data distribution. The latter can be examined with the command ceph osd df. It shows the usage, weight, variance and number of PGs for each OSD as well the min/max variance and the standard deviation of your OSD usage.
Increasing the number of the PGs can lead to a more even data distribution,
but there are several warnings in blogs or mailings lists, especially for production environments. Doubling the PGs (e.g. from 1024 to 2048) may bring down your cluster for some minutes, because creating, activating and peering the new PGs may strongly influence your client’s traffic.
Regarding this warnings, we decided to increase the PGs in slices of 128. Between each slice we waited until all PGs peered successfully, which needed only a few seconds and did not influence the client traffic at all.
Increasing the number of PGs is done with two simple commands:
$ ceph osd set <pool> pg_num <int> $ ceph osd set <pool> pgp_num <int>
Increasing pg_num creates new PGs, but the data rebalancing and backfilling will only start after increasing pgp_num (the number of placement groups for placement), too. pg_num and pgp_num should always have the same value. Increasing your PGs usually comes with a huge amount of backfilling which should not be a problem for a well configured cluster.