linux-bcache.vger.kernel.org archive mirror
* Large latency with bcache for Ceph OSD (new mail thread)
@ 2021-03-02 10:20 Norman.Kern
  2021-03-02 13:20 ` Coly Li
  0 siblings, 1 reply; 6+ messages in thread
From: Norman.Kern @ 2021-03-02 10:20 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache, linux-block

Sorry for creating a new mail thread (the original one is so long...)


I ran the test again and got more information:

root@WXS0089:~# cat /sys/block/bcache0/bcache/dirty_data
0.0k
root@WXS0089:~# lsblk /dev/sda
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda         8:0    0 447.1G  0 disk
`-bcache0 252:0    0  10.9T  0 disk
root@WXS0089:~# cat /sys/block/sda/bcache/priority_stats
Unused:         1%
Clean:          29%
Dirty:          70%
Metadata:       0%
Average:        49
Sectors per Q:  29184768
Quantiles:      [1 2 3 5 6 8 9 11 13 14 16 19 21 23 26 29 32 36 39 43 48 53 59 65 73 83 94 109 129 156 203]
root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/internal/gc_after_writeback
1
You have new mail in /var/mail/root
root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/cache_available_percent
28

I read the source code and found that gc should be woken up during writeback once the cache usage exceeds 50% (cache_available_percent below 50; here it is 28, i.e. 72% used), but it doesn't seem to work correctly.



* Re: Large latency with bcache for Ceph OSD (new mail thread)
  2021-03-02 10:20 Large latency with bcache for Ceph OSD (new mail thread) Norman.Kern
@ 2021-03-02 13:20 ` Coly Li
  2021-03-03  3:25   ` Norman.Kern
  2021-03-05  9:00   ` Norman.Kern
  0 siblings, 2 replies; 6+ messages in thread
From: Coly Li @ 2021-03-02 13:20 UTC (permalink / raw)
  To: Norman.Kern; +Cc: linux-bcache, linux-block

On 3/2/21 6:20 PM, Norman.Kern wrote:
> Sorry for creating a new mail thread (the original one is so long...)
> 
> 
> I ran the test again and got more information:
> 
> root@WXS0089:~# cat /sys/block/bcache0/bcache/dirty_data
> 0.0k
> root@WXS0089:~# lsblk /dev/sda
> NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> sda         8:0    0 447.1G  0 disk
> `-bcache0 252:0    0  10.9T  0 disk
> root@WXS0089:~# cat /sys/block/sda/bcache/priority_stats
> Unused:         1%
> Clean:          29%
> Dirty:          70%
> Metadata:       0%
> Average:        49
> Sectors per Q:  29184768
> Quantiles:      [1 2 3 5 6 8 9 11 13 14 16 19 21 23 26 29 32 36 39 43 48 53 59 65 73 83 94 109 129 156 203]
> root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/internal/gc_after_writeback
> 1
> You have new mail in /var/mail/root
> root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/cache_available_percent
> 28
> 
> I read the source code and found that gc should be woken up during writeback once the cache usage exceeds 50% (cache_available_percent below 50; here it is 28, i.e. 72% used), but it doesn't seem to work correctly.
> 

If gc_after_writeback is enabled and the cache usage then exceeds 50%, the
BCH_DO_AUTO_GC flag is set in c->gc_after_writeback. The gc thread is then
woken up forcibly once writeback completes.

So the auto gc after writeback is triggered when all of the following hold
(a sysfs checklist for conditions 1-4 is sketched below):
1. the bcache device is in writeback mode
2. gc_after_writeback is set to 1
3. after 2) is done, the cache usage exceeds the 50% threshold
4. the writeback rate is set to the maximum rate while the bcache device is
idle (no regular I/O requests)
5. after writeback completes, the gc thread is woken up
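
A quick way to check conditions 1-4 from user space (a sketch; bcache0 and
the cache set UUID below are the ones from your transcript, adjust to your
setup):

  # 1) the active cache mode is shown in brackets, e.g. "[writeback]"
  cat /sys/block/bcache0/bcache/cache_mode
  # 2) auto gc after writeback is enabled
  cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/internal/gc_after_writeback
  # 3) usage > 50% is the same as cache_available_percent < 50
  cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/cache_available_percent
  # 4) this jumps to a very large value while the cache set is idle
  cat /sys/block/bcache0/bcache/writeback_rate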

But /sys/block/bcache0/bcache/dirty_data being 0.0k doesn't mean writeback
has completed. The writeback thread may still be making a final pass
through all the btree keys even after all the dirty data has been flushed.
Therefore you should check whether the writeback thread is still active
before concluding that writeback is complete.
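
For example, something like this (a sketch; I assume the writeback kthread
is still named "bcache_writeback", as it is created in
drivers/md/bcache/writeback.c, adjust if your kernel differs):

  # the kthread lives as long as the backing device is attached;
  # STAT R or D means it is still busy (e.g. walking the btree keys),
  # S means it is idle and writeback has really settled
  ps -eo pid,stat,comm | grep '[b]cache_writeback'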

BTW, could you try a Linux v5.8+ kernel and see how things behave?

Thanks.

Coly Li


* Re: Large latency with bcache for Ceph OSD (new mail thread)
  2021-03-02 13:20 ` Coly Li
@ 2021-03-03  3:25   ` Norman.Kern
  2021-03-05  9:00   ` Norman.Kern
  1 sibling, 0 replies; 6+ messages in thread
From: Norman.Kern @ 2021-03-03  3:25 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache, linux-block


On 2021/3/2 9:20 PM, Coly Li wrote:
> On 3/2/21 6:20 PM, Norman.Kern wrote:
>> Sorry for creating a new mail thread (the original one is so long...)
>>
>>
>> I ran the test again and got more information:
>>
>> root@WXS0089:~# cat /sys/block/bcache0/bcache/dirty_data
>> 0.0k
>> root@WXS0089:~# lsblk /dev/sda
>> NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>> sda         8:0    0 447.1G  0 disk
>> `-bcache0 252:0    0  10.9T  0 disk
>> root@WXS0089:~# cat /sys/block/sda/bcache/priority_stats
>> Unused:         1%
>> Clean:          29%
>> Dirty:          70%
>> Metadata:       0%
>> Average:        49
>> Sectors per Q:  29184768
>> Quantiles:      [1 2 3 5 6 8 9 11 13 14 16 19 21 23 26 29 32 36 39 43 48 53 59 65 73 83 94 109 129 156 203]
>> root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/internal/gc_after_writeback
>> 1
>> You have new mail in /var/mail/root
>> root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/cache_available_percent
>> 28
>>
>> I read the source code and found that gc should be woken up during writeback once the cache usage exceeds 50% (cache_available_percent below 50; here it is 28, i.e. 72% used), but it doesn't seem to work correctly.
>>
> If gc_after_writeback is enabled and the cache usage then exceeds 50%, the
> BCH_DO_AUTO_GC flag is set in c->gc_after_writeback. The gc thread is then
> woken up forcibly once writeback completes.
>
> So the auto gc after writeback is triggered when all of the following hold:
> 1. the bcache device is in writeback mode
> 2. gc_after_writeback is set to 1
> 3. after 2) is done, the cache usage exceeds the 50% threshold
> 4. the writeback rate is set to the maximum rate while the bcache device is
> idle (no regular I/O requests)
> 5. after writeback completes, the gc thread is woken up
>
> But /sys/block/bcache0/bcache/dirty_data being 0.0k doesn't mean writeback
> has completed. The writeback thread may still be making a final pass
> through all the btree keys even after all the dirty data has been flushed.
> Therefore you should check whether the writeback thread is still active
> before concluding that writeback is complete.
>
> BTW, could you try a Linux v5.8+ kernel and see how things behave?
I stopped all read/write tests for almost a whole day, and iostat showed no
IOPS. When I echoed 1 to trigger_gc, cache_available_percent went from 29
to 100 within seconds.
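
For reference, the manual sequence was just the following (assuming the
trigger_gc knob lives under internal/, with the cache set UUID from the
transcript above):

  # force a gc run by hand, then re-read the available percentage
  echo 1 > /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/internal/trigger_gc
  cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/cache_available_percent
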
I will test on 5.8.0-44.

>
> Thanks.
>
> Coly Li


* Re: Large latency with bcache for Ceph OSD (new mail thread)
  2021-03-02 13:20 ` Coly Li
  2021-03-03  3:25   ` Norman.Kern
@ 2021-03-05  9:00   ` Norman.Kern
  2021-03-05 10:03     ` Coly Li
  1 sibling, 1 reply; 6+ messages in thread
From: Norman.Kern @ 2021-03-05  9:00 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache, linux-block


On 2021/3/2 9:20 PM, Coly Li wrote:
> On 3/2/21 6:20 PM, Norman.Kern wrote:
>> Sorry for creating a new mail thread (the original one is so long...)
>>
>>
>> I ran the test again and got more information:
>>
>> root@WXS0089:~# cat /sys/block/bcache0/bcache/dirty_data
>> 0.0k
>> root@WXS0089:~# lsblk /dev/sda
>> NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>> sda         8:0    0 447.1G  0 disk
>> `-bcache0 252:0    0  10.9T  0 disk
>> root@WXS0089:~# cat /sys/block/sda/bcache/priority_stats
>> Unused:         1%
>> Clean:          29%
>> Dirty:          70%
>> Metadata:       0%
>> Average:        49
>> Sectors per Q:  29184768
>> Quantiles:      [1 2 3 5 6 8 9 11 13 14 16 19 21 23 26 29 32 36 39 43 48 53 59 65 73 83 94 109 129 156 203]
>> root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/internal/gc_after_writeback
>> 1
>> You have new mail in /var/mail/root
>> root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/cache_available_percent
>> 28
>>
>> I read the source code and found that gc should be woken up during writeback once the cache usage exceeds 50% (cache_available_percent below 50; here it is 28, i.e. 72% used), but it doesn't seem to work correctly.
>>
> If gc_after_writeback is enabled and the cache usage then exceeds 50%, the
> BCH_DO_AUTO_GC flag is set in c->gc_after_writeback. The gc thread is then
> woken up forcibly once writeback completes.
>
> So the auto gc after writeback is triggered when all of the following hold:
> 1. the bcache device is in writeback mode
> 2. gc_after_writeback is set to 1
> 3. after 2) is done, the cache usage exceeds the 50% threshold
> 4. the writeback rate is set to the maximum rate while the bcache device is
> idle (no regular I/O requests)
> 5. after writeback completes, the gc thread is woken up
>
> But /sys/block/bcache0/bcache/dirty_data being 0.0k doesn't mean writeback
> has completed. The writeback thread may still be making a final pass
> through all the btree keys even after all the dirty data has been flushed.
> Therefore you should check whether the writeback thread is still active
> before concluding that writeback is complete.
>
> BTW, could you try a Linux v5.8+ kernel and see how things behave?

I have tested on 5.8.x, but it doesn't help. I tested the same configuration on another server (480G SSD + 8T HDD),

but the problem can't be reproduced there, which really confuses me. I will compare the configs and try to find the differences.

Thanks.

Norman

>
> Thanks.
>
> Coly Li


* Re: Large latency with bcache for Ceph OSD (new mail thread)
  2021-03-05  9:00   ` Norman.Kern
@ 2021-03-05 10:03     ` Coly Li
  2021-03-08  5:47       ` Norman.Kern
  0 siblings, 1 reply; 6+ messages in thread
From: Coly Li @ 2021-03-05 10:03 UTC (permalink / raw)
  To: Norman.Kern; +Cc: linux-bcache, linux-block

On 3/5/21 5:00 PM, Norman.Kern wrote:
> 
> On 2021/3/2 9:20 PM, Coly Li wrote:
>> On 3/2/21 6:20 PM, Norman.Kern wrote:
>>> Sorry for creating a new mail thread (the original one is so long...)
>>>
>>>
>>> I ran the test again and got more information:
>>>
>>> root@WXS0089:~# cat /sys/block/bcache0/bcache/dirty_data
>>> 0.0k
>>> root@WXS0089:~# lsblk /dev/sda
>>> NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>>> sda         8:0    0 447.1G  0 disk
>>> `-bcache0 252:0    0  10.9T  0 disk
>>> root@WXS0089:~# cat /sys/block/sda/bcache/priority_stats
>>> Unused:         1%
>>> Clean:          29%
>>> Dirty:          70%
>>> Metadata:       0%
>>> Average:        49
>>> Sectors per Q:  29184768
>>> Quantiles:      [1 2 3 5 6 8 9 11 13 14 16 19 21 23 26 29 32 36 39 43 48 53 59 65 73 83 94 109 129 156 203]
>>> root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/internal/gc_after_writeback
>>> 1
>>> You have new mail in /var/mail/root
>>> root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/cache_available_percent
>>> 28
>>>
>>> I read the source code and found that gc should be woken up during writeback once the cache usage exceeds 50% (cache_available_percent below 50; here it is 28, i.e. 72% used), but it doesn't seem to work correctly.
>>>
>> If gc_after_writeback is enabled and the cache usage then exceeds 50%, the
>> BCH_DO_AUTO_GC flag is set in c->gc_after_writeback. The gc thread is then
>> woken up forcibly once writeback completes.
>>
>> So the auto gc after writeback is triggered when all of the following hold:
>> 1. the bcache device is in writeback mode
>> 2. gc_after_writeback is set to 1
>> 3. after 2) is done, the cache usage exceeds the 50% threshold
>> 4. the writeback rate is set to the maximum rate while the bcache device is
>> idle (no regular I/O requests)
>> 5. after writeback completes, the gc thread is woken up
>>
>> But /sys/block/bcache0/bcache/dirty_data being 0.0k doesn't mean writeback
>> has completed. The writeback thread may still be making a final pass
>> through all the btree keys even after all the dirty data has been flushed.
>> Therefore you should check whether the writeback thread is still active
>> before concluding that writeback is complete.
>>
>> BTW, could you try a Linux v5.8+ kernel and see how things behave?
> 
> I have tested on 5.8.x, but it doesn't help. I tested the same configuration on another server (480G SSD + 8T HDD),
> 

What do you mean by "doesn't help"? Do you mean the forced gc does not
trigger, or something else?

> but the problem can't be reproduced there, which really confuses me. I will compare the configs and try to find the differences.

Which behavior does not reproduce?

Thanks.

Coly Li


* Re: Large latency with bcache for Ceph OSD (new mail thread)
  2021-03-05 10:03     ` Coly Li
@ 2021-03-08  5:47       ` Norman.Kern
  0 siblings, 0 replies; 6+ messages in thread
From: Norman.Kern @ 2021-03-08  5:47 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache, linux-block


On 2021/3/5 6:03 PM, Coly Li wrote:
> On 3/5/21 5:00 PM, Norman.Kern wrote:
>> On 2021/3/2 9:20 PM, Coly Li wrote:
>>> On 3/2/21 6:20 PM, Norman.Kern wrote:
>>>> Sorry for creating a new mail thread (the original one is so long...)
>>>>
>>>>
>>>> I ran the test again and got more information:
>>>>
>>>> root@WXS0089:~# cat /sys/block/bcache0/bcache/dirty_data
>>>> 0.0k
>>>> root@WXS0089:~# lsblk /dev/sda
>>>> NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>>>> sda         8:0    0 447.1G  0 disk
>>>> `-bcache0 252:0    0  10.9T  0 disk
>>>> root@WXS0089:~# cat /sys/block/sda/bcache/priority_stats
>>>> Unused:         1%
>>>> Clean:          29%
>>>> Dirty:          70%
>>>> Metadata:       0%
>>>> Average:        49
>>>> Sectors per Q:  29184768
>>>> Quantiles:      [1 2 3 5 6 8 9 11 13 14 16 19 21 23 26 29 32 36 39 43 48 53 59 65 73 83 94 109 129 156 203]
>>>> root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/internal/gc_after_writeback
>>>> 1
>>>> You have new mail in /var/mail/root
>>>> root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/cache_available_percent
>>>> 28
>>>>
>>>> I read the source code and found that gc should be woken up during writeback once the cache usage exceeds 50% (cache_available_percent below 50; here it is 28, i.e. 72% used), but it doesn't seem to work correctly.
>>>>
>>> If gc_after_writeback is enabled and the cache usage then exceeds 50%, the
>>> BCH_DO_AUTO_GC flag is set in c->gc_after_writeback. The gc thread is then
>>> woken up forcibly once writeback completes.
>>>
>>> So the auto gc after writeback is triggered when all of the following hold:
>>> 1. the bcache device is in writeback mode
>>> 2. gc_after_writeback is set to 1
>>> 3. after 2) is done, the cache usage exceeds the 50% threshold
>>> 4. the writeback rate is set to the maximum rate while the bcache device is
>>> idle (no regular I/O requests)
>>> 5. after writeback completes, the gc thread is woken up
>>>
>>> But /sys/block/bcache0/bcache/dirty_data being 0.0k doesn't mean writeback
>>> has completed. The writeback thread may still be making a final pass
>>> through all the btree keys even after all the dirty data has been flushed.
>>> Therefore you should check whether the writeback thread is still active
>>> before concluding that writeback is complete.
>>>
>>> BTW, could you try a Linux v5.8+ kernel and see how things behave?
>> I have tested on 5.8.x, but it doesn't help. I tested the same configuration on another server (480G SSD + 8T HDD),
>>
> What do you mean by "doesn't help"? Do you mean the forced gc does not
> trigger, or something else?
cache_available_percent didn't return to 100 automatically for a very long time after all I/O was done. I had to echo 1 to trigger_gc to make it recover.
>
>> but the problem can't be reproduced there, which really confuses me. I will compare the configs and try to find the differences.
> Which behavior does not reproduce?
The problem of cache_available_percent not recovering automatically.
>
> Thanks.
>
> Coly Li

