ceph osd commit latency increase over time, until restart

* ceph osd commit latency increase over time, until restart
       [not found] ` <395511117.2665.1548405853447.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
@ 2019-01-25  9:14   ` Alexandre DERUMIER
       [not found]     ` <387140705.12275.1548407699184.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Alexandre DERUMIER @ 2019-01-25  9:14 UTC (permalink / raw)
  To: ceph-users, ceph-devel

Hi, 

I have a strange behaviour of my osd, on multiple clusters, 

All cluster are running mimic 13.2.1,bluestore, with ssd or nvme drivers, 
workload is rbd only, with qemu-kvm vms running with librbd + snapshot/rbd export-diff/snapshotdelete each day for backup 

When the osd are refreshly started, the commit latency is between 0,5-1ms. 

But overtime, this latency increase slowly (maybe around 1ms by day), until reaching crazy 
values like 20-200ms. 

Some example graphs:

http://odisoweb1.odiso.net/osdlatency1.png
http://odisoweb1.odiso.net/osdlatency2.png

All osds have this behaviour, in all clusters. 

The latency of physical disks is ok. (Clusters are far to be full loaded) 

And if I restart the osd, the latency come back to 0,5-1ms. 

That's remember me old tcmalloc bug, but maybe could it be a bluestore memory bug ? 

Any Hints for counters/logs to check ? 

Regards, 

Alexandre 

^ permalink raw reply	[flat|nested] 42+ messages in thread