* Large latency with bcache for Ceph OSD
@ 2021-02-18  7:56 Norman.Kern
  2021-02-21 23:48 ` Norman.Kern
  0 siblings, 1 reply; 11+ messages in thread
From: Norman.Kern @ 2021-02-18  7:56 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, colyli, axboe

Hi guys,

I am testing Ceph with bcache. I found that some I/O issued with O_SYNC is
written back to the HDD, which causes large latency on the HDD. I traced the
I/O with iosnoop:

./iosnoop -Q -ts -d '8,192'

Tracing block I/O for 1 seconds (buffered)...
STARTs          ENDs            COMM         PID    TYPE DEV    BLOCK        BYTES     LATms

1809296.292350  1809296.319052  tp_osd_tp    22191  R    8,192  4578940240   16384     26.70
1809296.292330  1809296.320974  tp_osd_tp    22191  R    8,192  4577938704   16384     28.64
1809296.292614  1809296.323292  tp_osd_tp    22191  R    8,192  4600404304   16384     30.68
1809296.292353  1809296.325300  tp_osd_tp    22191  R    8,192  4578343088   16384     32.95
1809296.292340  1809296.328013  tp_osd_tp    22191  R    8,192  4578055472   16384     35.67
1809296.292606  1809296.330518  tp_osd_tp    22191  R    8,192  4578581648   16384     37.91
1809295.169266  1809296.334041  bstore_kv_fi 17266  WS   8,192  4244996360   4096    1164.78
1809296.292618  1809296.336349  tp_osd_tp    22191  R    8,192  4602631760   16384     43.73
1809296.292618  1809296.338812  tp_osd_tp    22191  R    8,192  4602632976   16384     46.19
1809296.030103  1809296.342780  tp_osd_tp    22180  WS   8,192  4741276048   131072   312.68
1809296.292347  1809296.345045  tp_osd_tp    22191  R    8,192  4609037872   16384     52.70
1809296.292620  1809296.345109  tp_osd_tp    22191  R    8,192  4609037904   16384     52.49
1809296.292612  1809296.347251  tp_osd_tp    22191  R    8,192  4578937616   16384     54.64
1809296.292621  1809296.351136  tp_osd_tp    22191  R    8,192  4612654992   16384     58.51
1809296.292341  1809296.353428  tp_osd_tp    22191  R    8,192  4578220656   16384     61.09
1809296.292342  1809296.353864  tp_osd_tp    22191  R    8,192  4578220880   16384     61.52
1809295.167650  1809296.358510  bstore_kv_fi 17266  WS   8,192  4923695960   4096    1190.86
1809296.292347  1809296.361885  tp_osd_tp    22191  R    8,192  4607437136   16384     69.54
1809296.029363  1809296.367313  tp_osd_tp    22180  WS   8,192  4739824400   98304    337.95
1809296.292349  1809296.370245  tp_osd_tp    22191  R    8,192  4591379888   16384     77.90
1809296.292348  1809296.376273  tp_osd_tp    22191  R    8,192  4591289552   16384     83.92
1809296.292353  1809296.378659  tp_osd_tp    22191  R    8,192  4578248656   16384     86.31
1809296.292619  1809296.384835  tp_osd_tp    22191  R    8,192  4617494160   65536     92.22
1809295.165451  1809296.393715  bstore_kv_fi 17266  WS   8,192  1355703120   4096    1228.26
1809295.168595  1809296.401560  bstore_kv_fi 17266  WS   8,192  1122200      4096    1232.96
1809295.165221  1809296.408018  bstore_kv_fi 17266  WS   8,192  960656       4096    1242.80
1809295.166737  1809296.411505  bstore_kv_fi 17266  WS   8,192  57682504     4096    1244.77
1809296.292352  1809296.418123  tp_osd_tp    22191  R    8,192  4579459056   32768    125.77

I'm confused: why must writes with O_SYNC be written back to the backing
storage device? And after I had used bcache for a while, the latency
increased a lot (the SSD is not very busy). Are there any best practices
for configuration?



* Re: Large latency with bcache for Ceph OSD
  2021-02-18  7:56 Large latency with bcache for Ceph OSD Norman.Kern
@ 2021-02-21 23:48 ` Norman.Kern
  2021-02-24  8:52   ` Coly Li
  0 siblings, 1 reply; 11+ messages in thread
From: Norman.Kern @ 2021-02-21 23:48 UTC (permalink / raw)
  To: linux-bcache, colyli; +Cc: linux-block, axboe

Ping.

I'm still confused about how bcache handles sync I/O: why must sync I/O be
written back to the backing device rather than being persisted in the cache?
It causes significant latency.

@Coly, could you help explain why bcache handles O_SYNC like this?


On 2021/2/18 3:56 PM, Norman.Kern wrote:
> [snipped]


* Re: Large latency with bcache for Ceph OSD
  2021-02-21 23:48 ` Norman.Kern
@ 2021-02-24  8:52   ` Coly Li
  2021-02-25  2:22     ` Norman.Kern
  2021-02-25  2:23     ` Norman.Kern
  0 siblings, 2 replies; 11+ messages in thread
From: Coly Li @ 2021-02-24  8:52 UTC (permalink / raw)
  To: Norman.Kern; +Cc: linux-block, axboe, linux-bcache

On 2/22/21 7:48 AM, Norman.Kern wrote:
> Ping.
> 
> I'm still confused about how bcache handles sync I/O: why must sync I/O be
> written back to the backing device rather than being persisted in the cache?
> It causes significant latency.
> 
> @Coly, could you help explain why bcache handles O_SYNC like this?
> 
> 

Hmm, normally we won't observe application I/Os being issued to the backing
device except in the following cases:
- the I/O bypasses the cache because of SSD congestion
- the request is a sequential I/O
- dirty buckets exceed the cutoff threshold
- the cache is in writethrough mode

Did you set the write/read congestion thresholds to 0?
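
For reference, these are the sysfs knobs I mean — a quick sketch, where
<cset-uuid> is just a placeholder for your own cache set UUID:

  # writing 0 disables bypassing the cache on congestion
  cat /sys/fs/bcache/<cset-uuid>/congested_read_threshold_us
  cat /sys/fs/bcache/<cset-uuid>/congested_write_threshold_us
  echo 0 > /sys/fs/bcache/<cset-uuid>/congested_read_threshold_us
  echo 0 > /sys/fs/bcache/<cset-uuid>/congested_write_threshold_us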

Coly Li

> On 2021/2/18 3:56 PM, Norman.Kern wrote:
>> [snipped]



* Re: Large latency with bcache for Ceph OSD
  2021-02-24  8:52   ` Coly Li
@ 2021-02-25  2:22     ` Norman.Kern
  2021-02-25  2:23     ` Norman.Kern
  1 sibling, 0 replies; 11+ messages in thread
From: Norman.Kern @ 2021-02-25  2:22 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-block, axboe, linux-bcache


On 2021/2/24 4:52 PM, Coly Li wrote:
> On 2/22/21 7:48 AM, Norman.Kern wrote:
>> Ping.
>>
>> I'm still confused about how bcache handles sync I/O: why must sync I/O be
>> written back to the backing device rather than being persisted in the cache?
>> It causes significant latency.
>>
>> @Coly, could you help explain why bcache handles O_SYNC like this?
>>
>>
> Hmm, normally we won't observe application I/Os being issued to the backing
> device except in the following cases:
> - the I/O bypasses the cache because of SSD congestion
> - the request is a sequential I/O
> - dirty buckets exceed the cutoff threshold
> - the cache is in writethrough mode
>
> Did you set the write/read congestion thresholds to 0?

Thanks for your reply.

I have set the thresholds to zero. Here are all my configs:

#make-bcache -C -b 4m -w 4k --discard --cache_replacement_policy=lru /dev/sdm
#make-bcache -B --writeback -w 4KiB /dev/sdn --wipe-bcache
congested_read_threshold_us = 0
congested_write_threshold_us = 0

# I tried to set sequential_cutoff to 0, but it didn't solve it.

sequential_cutoff = 4194304
writeback_percent = 40
cache_mode = writeback
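
(For completeness: the tunables above were applied via sysfs, roughly as in
this sketch — the bcache0 path is from my setup and <cset-uuid> is a
placeholder for the cache set UUID:)

  echo writeback > /sys/block/bcache0/bcache/cache_mode
  echo 40 > /sys/block/bcache0/bcache/writeback_percent
  echo 4194304 > /sys/block/bcache0/bcache/sequential_cutoff
  echo 0 > /sys/fs/bcache/<cset-uuid>/congested_read_threshold_us
  echo 0 > /sys/fs/bcache/<cset-uuid>/congested_write_threshold_us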

I recreated the cluster, ran it for hours and reproduced the problem. Then I checked the cache status:

root@WXS0106:/root/perf-tools# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache_available_percent
29
root@WXS0106:/root/perf-tools# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/internal/cutoff_writeback_sync
70
So 'dirty buckets exceed the cutoff threshold' caused the problem. Is my config wrong, or is there some other reason?

>
> Coly Li
>
>> On 2021/2/18 3:56 PM, Norman.Kern wrote:
>>> [snipped]


* Re: Large latency with bcache for Ceph OSD
  2021-02-24  8:52   ` Coly Li
  2021-02-25  2:22     ` Norman.Kern
@ 2021-02-25  2:23     ` Norman.Kern
  2021-02-25 13:00       ` Norman.Kern
  1 sibling, 1 reply; 11+ messages in thread
From: Norman.Kern @ 2021-02-25  2:23 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-block, axboe, linux-bcache


On 2021/2/24 4:52 PM, Coly Li wrote:
> On 2/22/21 7:48 AM, Norman.Kern wrote:
>> Ping.
>>
>> I'm still confused about how bcache handles sync I/O: why must sync I/O be
>> written back to the backing device rather than being persisted in the cache?
>> It causes significant latency.
>>
>> @Coly, could you help explain why bcache handles O_SYNC like this?
>>
>>
> Hmm, normally we won't observe application I/Os being issued to the backing
> device except in the following cases:
> - the I/O bypasses the cache because of SSD congestion
> - the request is a sequential I/O
> - dirty buckets exceed the cutoff threshold
> - the cache is in writethrough mode
>
> Did you set the write/read congestion thresholds to 0?

Thanks for your reply.

I have set the thresholds to zero. Here are all my configs:

#make-bcache -C -b 4m -w 4k --discard --cache_replacement_policy=lru /dev/sdm
#make-bcache -B --writeback -w 4KiB /dev/sdn --wipe-bcache
congested_read_threshold_us = 0
congested_write_threshold_us = 0

# I tried to set sequential_cutoff to 0, but it didn't solve it.

sequential_cutoff = 4194304
writeback_percent = 40
cache_mode = writeback

I recreated the cluster, ran it for hours and reproduced the problem. Then I checked the cache status:

root@WXS0106:/root/perf-tools# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache_available_percent
29
root@WXS0106:/root/perf-tools# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/internal/cutoff_writeback_sync
70
Did 'dirty buckets exceed the cutoff threshold' cause the problem? Are my configs wrong, or is there some other reason?

>
> Coly Li
>
>> On 2021/2/18 3:56 PM, Norman.Kern wrote:
>>> [snipped]


* Re: Large latency with bcache for Ceph OSD
  2021-02-25  2:23     ` Norman.Kern
@ 2021-02-25 13:00       ` Norman.Kern
  2021-02-25 14:44         ` Coly Li
  0 siblings, 1 reply; 11+ messages in thread
From: Norman.Kern @ 2021-02-25 13:00 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-block, axboe, linux-bcache

I ran a test:

- Stop writing and wait for the dirty data to be written back

$ lsblk
NAME                                                                                                   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdf                                                                                                      8:80   0   7.3T  0 disk
└─bcache0                                                                                              252:0    0   7.3T  0 disk
  └─ceph--32a481f9--313c--417e--aaf7--bdd74515fd86-osd--data--2f670929--3c8a--45dd--bcef--c60ce3ee08e1 253:1    0   7.3T  0 lvm 
sdd                                                                                                      8:48   0   7.3T  0 disk
sdb                                                                                                      8:16   0   7.3T  0 disk
sdk                                                                                                      8:160  0 893.8G  0 disk
└─bcache0                                                                                              252:0    0   7.3T  0 disk
  └─ceph--32a481f9--313c--417e--aaf7--bdd74515fd86-osd--data--2f670929--3c8a--45dd--bcef--c60ce3ee08e1 253:1    0   7.3T  0 lvm 
$ cat /sys/block/bcache0/bcache/dirty_data
0.0k

root@WXS0106:~# bcache-super-show /dev/sdf
sb.magic                ok
sb.first_sector         8 [match]
sb.csum                 71DA9CA968B4A625 [match]
sb.version              1 [backing device]

dev.label               (empty)
dev.uuid                d07dc435-129d-477d-8378-a6af75199852
dev.sectors_per_block   8
dev.sectors_per_bucket  1024
dev.data.first_sector   16
dev.data.cache_mode     1 [writeback]
dev.data.cache_state    1 [clean]
cset.uuid               d87713c6-2e76-4a09-8517-d48306468659

- check the available cache

# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache_available_percent
27


As the documentation describes:

cache_available_percent
    Percentage of cache device which doesn’t contain dirty data, and could potentially be used for writeback. This doesn’t mean this space isn’t used for clean cached data; the unused statistic (in priority_stats) is typically much lower.
When all dirty data has been written back, why is cache_available_percent not 100?

And when I start write I/O again, the new writes don't replace the clean cached data (does it think the cache is dirty now?), so the HDD sees large latency:

./bin/iosnoop -Q -d '8,80'

<...>        73338  WS   8,80     3513701472   4096     217.69
<...>        73338  WS   8,80     3513759360   4096     448.80
<...>        73338  WS   8,80     3562211912   4096     511.69
<...>        73335  WS   8,80     3562212528   4096     505.08
<...>        73339  WS   8,80     3562213376   4096     501.19
<...>        73336  WS   8,80     3562213992   4096     511.16
<...>        73343  WS   8,80     3562214016   4096     511.74
<...>        73340  WS   8,80     3562214128   4096     512.95
<...>        73329  WS   8,80     3562214208   4096     510.48
<...>        73338  WS   8,80     3562214600   4096     518.64
<...>        73341  WS   8,80     3562214632   4096     519.09
<...>        73342  WS   8,80     3562214664   4096     518.28
<...>        73336  WS   8,80     3562214688   4096     519.27
<...>        73343  WS   8,80     3562214736   4096     528.31
<...>        73339  WS   8,80     3562214784   4096     530.13

On 2021/2/25 10:23 AM, Norman.Kern wrote:
> [snipped]


* Re: Large latency with bcache for Ceph OSD
  2021-02-25 13:00       ` Norman.Kern
@ 2021-02-25 14:44         ` Coly Li
  2021-02-26  8:57           ` Norman.Kern
  0 siblings, 1 reply; 11+ messages in thread
From: Coly Li @ 2021-02-25 14:44 UTC (permalink / raw)
  To: Norman.Kern; +Cc: linux-block, axboe, linux-bcache

On 2/25/21 9:00 PM, Norman.Kern wrote:
> I ran a test:

BTW, what are the versions of your kernel and bcache-tools, and which
distribution are you running?

> 
> - Stop writing and wait for the dirty data to be written back
> 
> $ lsblk
> NAME                                                                                                   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> sdf                                                                                                      8:80   0   7.3T  0 disk
> └─bcache0                                                                                              252:0    0   7.3T  0 disk
>   └─ceph--32a481f9--313c--417e--aaf7--bdd74515fd86-osd--data--2f670929--3c8a--45dd--bcef--c60ce3ee08e1 253:1    0   7.3T  0 lvm 
> sdd                                                                                                      8:48   0   7.3T  0 disk
> sdb                                                                                                      8:16   0   7.3T  0 disk
> sdk                                                                                                      8:160  0 893.8G  0 disk
> └─bcache0                                                                                              252:0    0   7.3T  0 disk
>   └─ceph--32a481f9--313c--417e--aaf7--bdd74515fd86-osd--data--2f670929--3c8a--45dd--bcef--c60ce3ee08e1 253:1    0   7.3T  0 lvm 
> $ cat /sys/block/bcache0/bcache/dirty_data
> 0.0k
> 
> root@WXS0106:~# bcache-super-show /dev/sdf
> sb.magic                ok
> sb.first_sector         8 [match]
> sb.csum                 71DA9CA968B4A625 [match]
> sb.version              1 [backing device]
> 
> dev.label               (empty)
> dev.uuid                d07dc435-129d-477d-8378-a6af75199852
> dev.sectors_per_block   8
> dev.sectors_per_bucket  1024
> dev.data.first_sector   16
> dev.data.cache_mode     1 [writeback]
> dev.data.cache_state    1 [clean]
> cset.uuid               d87713c6-2e76-4a09-8517-d48306468659
> 
> - check the available cache
> 
> # cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache_available_percent
> 27
> 

What is the content of
/sys/fs/bcache/<cache-set-uuid>/cache0/priority_stats? Can you paste it
here too.

There are no dirty blocks, but the cache still occupies about 73% of the
buckets. If you are using a 5.8+ kernel, a gc is probably desired.

You may try to trigger a gc by writing to
/sys/fs/bcache/<cache-set-uuid>/internal/trigger_gc
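
For example, something like this (the cache-set UUID is taken from your
earlier paste; adjust it if yours differs):

  cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache0/priority_stats
  echo 1 > /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/internal/trigger_gc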


> 
> As the documentation describes:
> 
> cache_available_percent
>     Percentage of cache device which doesn’t contain dirty data, and could potentially be used for writeback. This doesn’t mean this space isn’t used for clean cached data; the unused statistic (in priority_stats) is typically much lower.
> When all dirty data has been written back, why is cache_available_percent not 100?
> 
> And when I start write I/O again, the new writes don't replace the clean cached data (does it think the cache is dirty now?), so the HDD sees large latency:
> 
> ./bin/iosnoop -Q -d '8,80'
> 
> <...>        73338  WS   8,80     3513701472   4096     217.69
> <...>        73338  WS   8,80     3513759360   4096     448.80
> <...>        73338  WS   8,80     3562211912   4096     511.69
> <...>        73335  WS   8,80     3562212528   4096     505.08
> <...>        73339  WS   8,80     3562213376   4096     501.19
> <...>        73336  WS   8,80     3562213992   4096     511.16
> <...>        73343  WS   8,80     3562214016   4096     511.74
> <...>        73340  WS   8,80     3562214128   4096     512.95
> <...>        73329  WS   8,80     3562214208   4096     510.48
> <...>        73338  WS   8,80     3562214600   4096     518.64
> <...>        73341  WS   8,80     3562214632   4096     519.09
> <...>        73342  WS   8,80     3562214664   4096     518.28
> <...>        73336  WS   8,80     3562214688   4096     519.27
> <...>        73343  WS   8,80     3562214736   4096     528.31
> <...>        73339  WS   8,80     3562214784   4096     530.13
> 

I am just wondering why the gc thread does not run...


Thanks.

Coly Li



* Re: Large latency with bcache for Ceph OSD
  2021-02-25 14:44         ` Coly Li
@ 2021-02-26  8:57           ` Norman.Kern
  2021-02-26  9:54             ` Coly Li
  0 siblings, 1 reply; 11+ messages in thread
From: Norman.Kern @ 2021-02-26  8:57 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-block, axboe, linux-bcache


On 2021/2/25 10:44 PM, Coly Li wrote:
> On 2/25/21 9:00 PM, Norman.Kern wrote:
>> I ran a test:
> BTW, what are the versions of your kernel and bcache-tools, and which
> distribution are you running?
root@WXS0106:~# uname -a
Linux WXS0106 5.4.0-58-generic #64~18.04.1-Ubuntu SMP Wed Dec 9 17:11:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@WXS0106:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=xenial
root@WXS0106:~# dpkg -l bcache-tools
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                  Version                 Architecture            Description
+++-=====================================-=======================-=======================-================================================================================
ii  bcache-tools                          1.0.8-2                 amd64                   bcache userspace tools

>
>> [snipped]
> What is the content of
> /sys/fs/bcache/<cache-set-uuid>/cache0/priority_stats? Can you paste it
> here too.
I forgot to get the info before I triggered gc... I think I can reproduce the problem; when I do, I will collect the information.
>
> There are no dirty blocks, but the cache still occupies about 73% of the
> buckets. If you are using a 5.8+ kernel, a gc is probably desired.
>
> You may try to trigger a gc by writing to
> /sys/fs/bcache/<cache-set-uuid>/internal/trigger_gc
>
When all the dirty data had been written back, I triggered gc and the space was reclaimed:

root@WXS0106:~# cat /sys/block/bcache0/bcache/cache/cache_available_percent
30

root@WXS0106:~# echo 1 > /sys/block/bcache0/bcache/cache/internal/trigger_gc
root@WXS0106:~# cat /sys/block/bcache0/bcache/cache/cache_available_percent
97

Why must I trigger gc manually? Isn't that a default action of the bcache gc thread? And I found it only works when all dirty data has been written back.

>> [snipped]
> I am just wondering why the gc thread does not run...
>
>
> Thanks.
>
> Coly Li
>


* Re: Large latency with bcache for Ceph OSD
  2021-02-26  8:57           ` Norman.Kern
@ 2021-02-26  9:54             ` Coly Li
  2021-03-02  2:03               ` Norman.Kern
  2021-03-02  5:30               ` Norman.Kern
  0 siblings, 2 replies; 11+ messages in thread
From: Coly Li @ 2021-02-26  9:54 UTC (permalink / raw)
  To: Norman.Kern; +Cc: linux-block, axboe, linux-bcache

On 2/26/21 4:57 PM, Norman.Kern wrote:
>
[snipped]
>> You may try to trigger a gc by writing to
>> /sys/fs/bcache/<cache-set-uuid>/internal/trigger_gc
>>
> When all the dirty data had been written back, I triggered gc and the space was reclaimed:
> 
> root@WXS0106:~# cat /sys/block/bcache0/bcache/cache/cache_available_percent
> 30
> 
> root@WXS0106:~# echo 1 > /sys/block/bcache0/bcache/cache/internal/trigger_gc
> root@WXS0106:~# cat /sys/block/bcache0/bcache/cache/cache_available_percent
> 97
> 
> Why must I trigger gc manually? Isn't that a default action of the bcache gc thread? And I found it only works when all dirty data has been written back.
> 

1, GC is automatically triggered after some amount of data has been
consumed. I guess it simply hasn't reached that point yet in your situation.

2, Because gc shrinks all cached clean data, which is very unfriendly for
read-intensive workloads. Therefore gc_after_writeback defaults to 0; when
this sysfs file is set to 1, a gc is triggered after writeback completes.
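
In other words, something like this sketch (<cset-uuid> is a placeholder
for your own cache set UUID):

  # opt in to an automatic gc once writeback finishes
  echo 1 > /sys/fs/bcache/<cset-uuid>/internal/gc_after_writeback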

Coly Li






* Re: Large latency with bcache for Ceph OSD
  2021-02-26  9:54             ` Coly Li
@ 2021-03-02  2:03               ` Norman.Kern
  2021-03-02  5:30               ` Norman.Kern
  1 sibling, 0 replies; 11+ messages in thread
From: Norman.Kern @ 2021-03-02  2:03 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-block, axboe, linux-bcache


On 2021/2/26 5:54 PM, Coly Li wrote:
> On 2/26/21 4:57 PM, Norman.Kern wrote:
> [snipped]
>>> You may try to trigger a gc by writing to
>>> /sys/fs/bcache/<cache-set-uuid>/internal/trigger_gc
>>>
>> When all the dirty data had been written back, I triggered gc and the space was reclaimed:
>>
>> root@WXS0106:~# cat /sys/block/bcache0/bcache/cache/cache_available_percent
>> 30
>>
>> root@WXS0106:~# echo 1 > /sys/block/bcache0/bcache/cache/internal/trigger_gc
>> root@WXS0106:~# cat /sys/block/bcache0/bcache/cache/cache_available_percent
>> 97
>>
>> Why must I trigger gc manually? Isn't that a default action of the bcache gc thread? And I found it only works when all dirty data has been written back.
>>
> 1, GC is automatically triggered after some amount of data has been
> consumed. I guess it simply hasn't reached that point yet in your situation.
>
> 2, Because gc shrinks all cached clean data, which is very unfriendly for
> read-intensive workloads. Therefore gc_after_writeback defaults to 0; when
> this sysfs file is set to 1, a gc is triggered after writeback completes.

I ran another test and got more information:

root@WXS0089:~# cat /sys/block/bcache0/bcache/dirty_data
0.0k
root@WXS0089:~# lsblk /dev/sda
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda         8:0    0 447.1G  0 disk
`-bcache0 252:0    0  10.9T  0 disk
root@WXS0089:~# cat /sys/block/sda/bcache/priority_stats
Unused:         1%
Clean:          29%
Dirty:          70%
Metadata:       0%
Average:        49
Sectors per Q:  29184768
Quantiles:      [1 2 3 5 6 8 9 11 13 14 16 19 21 23 26 29 32 36 39 43 48 53 59 65 73 83 94 109 129 156 203]
root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/internal/gc_after_writeback
1
root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/cache_available_percent
28

I read the source code and found that if cache_available_percent > 50, it should wake up gc while doing writeback, but it doesn't seem to work right.

>
> Coly Li
>
>
>
>


* Re: Large latency with bcache for Ceph OSD
  2021-02-26  9:54             ` Coly Li
  2021-03-02  2:03               ` Norman.Kern
@ 2021-03-02  5:30               ` Norman.Kern
  1 sibling, 0 replies; 11+ messages in thread
From: Norman.Kern @ 2021-03-02  5:30 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-block, axboe, linux-bcache


On 2021/2/26 5:54 PM, Coly Li wrote:
> On 2/26/21 4:57 PM, Norman.Kern wrote:
> [snipped]
>>> You may try to trigger a gc by writing to
>>> /sys/fs/bcache/<cache-set-uuid>/internal/trigger_gc
>>>
>> When all the dirty data had been written back, I triggered gc and the space was reclaimed:
>>
>> root@WXS0106:~# cat /sys/block/bcache0/bcache/cache/cache_available_percent
>> 30
>>
>> root@WXS0106:~# echo 1 > /sys/block/bcache0/bcache/cache/internal/trigger_gc
>> root@WXS0106:~# cat /sys/block/bcache0/bcache/cache/cache_available_percent
>> 97
>>
>> Why must I trigger gc manually? Isn't that a default action of the bcache gc thread? And I found it only works when all dirty data has been written back.
>>
> 1, GC is automatically triggered after some amount of data has been
> consumed. I guess it simply hasn't reached that point yet in your situation.
>
> 2, Because gc shrinks all cached clean data, which is very unfriendly for
> read-intensive workloads. Therefore gc_after_writeback defaults to 0; when
> this sysfs file is set to 1, a gc is triggered after writeback completes.

I ran another test and got more information:

root@WXS0089:~# cat /sys/block/bcache0/bcache/dirty_data
0.0k
root@WXS0089:~# lsblk /dev/sda
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda         8:0    0 447.1G  0 disk
`-bcache0 252:0    0  10.9T  0 disk
root@WXS0089:~# cat /sys/block/sda/bcache/priority_stats
Unused:         1%
Clean:          29%
Dirty:          70%
Metadata:       0%
Average:        49
Sectors per Q:  29184768
Quantiles:      [1 2 3 5 6 8 9 11 13 14 16 19 21 23 26 29 32 36 39 43 48 53 59 65 73 83 94 109 129 156 203]
root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/internal/gc_after_writeback
1
root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/cache_available_percent
28

I read the source code and found that if cache_available_percent > 50, it should wake up gc while doing writeback, but it doesn't seem to work right.

>
> Coly Li
>
>
>
>



Thread overview: 11+ messages
2021-02-18  7:56 Large latency with bcache for Ceph OSD Norman.Kern
2021-02-21 23:48 ` Norman.Kern
2021-02-24  8:52   ` Coly Li
2021-02-25  2:22     ` Norman.Kern
2021-02-25  2:23     ` Norman.Kern
2021-02-25 13:00       ` Norman.Kern
2021-02-25 14:44         ` Coly Li
2021-02-26  8:57           ` Norman.Kern
2021-02-26  9:54             ` Coly Li
2021-03-02  2:03               ` Norman.Kern
2021-03-02  5:30               ` Norman.Kern
