* Large latency with bcache for Ceph OSD
@ 2021-02-18  7:56 Norman.Kern
  2021-02-21 23:48 ` Norman.Kern
  0 siblings, 1 reply; 11+ messages in thread
From: Norman.Kern @ 2021-02-18 7:56 UTC (permalink / raw)
To: linux-bcache; +Cc: linux-block, colyli, axboe

Hi guys,

I am testing Ceph with bcache, and I found some I/O issued with O_SYNC being written to the backing HDD, which caused large latency on the HDD. I traced the I/O with iosnoop:

./iosnoop -Q -ts -d '8,192'

Tracing block I/O for 1 seconds (buffered)...
STARTs          ENDs            COMM          PID    TYPE  DEV    BLOCK       BYTES   LATms
1809296.292350  1809296.319052  tp_osd_tp     22191  R     8,192  4578940240  16384   26.70
1809296.292330  1809296.320974  tp_osd_tp     22191  R     8,192  4577938704  16384   28.64
1809296.292614  1809296.323292  tp_osd_tp     22191  R     8,192  4600404304  16384   30.68
1809296.292353  1809296.325300  tp_osd_tp     22191  R     8,192  4578343088  16384   32.95
1809296.292340  1809296.328013  tp_osd_tp     22191  R     8,192  4578055472  16384   35.67
1809296.292606  1809296.330518  tp_osd_tp     22191  R     8,192  4578581648  16384   37.91
1809295.169266  1809296.334041  bstore_kv_fi  17266  WS    8,192  4244996360  4096    1164.78
1809296.292618  1809296.336349  tp_osd_tp     22191  R     8,192  4602631760  16384   43.73
1809296.292618  1809296.338812  tp_osd_tp     22191  R     8,192  4602632976  16384   46.19
1809296.030103  1809296.342780  tp_osd_tp     22180  WS    8,192  4741276048  131072  312.68
1809296.292347  1809296.345045  tp_osd_tp     22191  R     8,192  4609037872  16384   52.70
1809296.292620  1809296.345109  tp_osd_tp     22191  R     8,192  4609037904  16384   52.49
1809296.292612  1809296.347251  tp_osd_tp     22191  R     8,192  4578937616  16384   54.64
1809296.292621  1809296.351136  tp_osd_tp     22191  R     8,192  4612654992  16384   58.51
1809296.292341  1809296.353428  tp_osd_tp     22191  R     8,192  4578220656  16384   61.09
1809296.292342  1809296.353864  tp_osd_tp     22191  R     8,192  4578220880  16384   61.52
1809295.167650  1809296.358510  bstore_kv_fi  17266  WS    8,192  4923695960  4096    1190.86
1809296.292347  1809296.361885  tp_osd_tp     22191  R     8,192  4607437136  16384   69.54
1809296.029363  1809296.367313  tp_osd_tp     22180  WS    8,192  4739824400  98304   337.95
1809296.292349  1809296.370245  tp_osd_tp     22191  R     8,192  4591379888  16384   77.90
1809296.292348  1809296.376273  tp_osd_tp     22191  R     8,192  4591289552  16384   83.92
1809296.292353  1809296.378659  tp_osd_tp     22191  R     8,192  4578248656  16384   86.31
1809296.292619  1809296.384835  tp_osd_tp     22191  R     8,192  4617494160  65536   92.22
1809295.165451  1809296.393715  bstore_kv_fi  17266  WS    8,192  1355703120  4096    1228.26
1809295.168595  1809296.401560  bstore_kv_fi  17266  WS    8,192  1122200     4096    1232.96
1809295.165221  1809296.408018  bstore_kv_fi  17266  WS    8,192  960656      4096    1242.80
1809295.166737  1809296.411505  bstore_kv_fi  17266  WS    8,192  57682504    4096    1244.77
1809296.292352  1809296.418123  tp_osd_tp     22191  R     8,192  4579459056  32768   125.77

I'm confused: why must writes issued with O_SYNC go to the backing storage device? And after I had used bcache for a while, the latency increased a lot (the SSD is not very busy). Are there any best practices on configuration?

^ permalink raw reply	[flat|nested] 11+ messages in thread
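A quick way to see whether I/O like this is bypassing the cache or simply missing it is bcache's per-device statistics; a minimal sketch, assuming the cached device shows up as /dev/bcache0 (adjust the device name for your setup):

# totals since the device was registered: bypassed bytes and bypass/hit/miss counters
grep . /sys/block/bcache0/bcache/stats_total/{bypassed,cache_bypass_hits,cache_bypass_misses,cache_hit_ratio}

# the same counters over the last five minutes, to correlate with a running workload
grep . /sys/block/bcache0/bcache/stats_five_minute/{bypassed,cache_hits,cache_misses}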
* Re: Large latency with bcache for Ceph OSD
  2021-02-18  7:56 Large latency with bcache for Ceph OSD Norman.Kern
@ 2021-02-21 23:48 ` Norman.Kern
  2021-02-24  8:52 ` Coly Li
  0 siblings, 1 reply; 11+ messages in thread
From: Norman.Kern @ 2021-02-21 23:48 UTC (permalink / raw)
To: linux-bcache, colyli; +Cc: linux-block, axboe

Ping.

I'm confused about SYNC I/O on bcache: why must SYNC I/O be written back to the backing device when the cache itself is persistent? It causes significant latency.

@Coly, can you help me by explaining why bcache handles O_SYNC like this?

On 2021/2/18 下午3:56, Norman.Kern wrote:
> Hi guys,
>
> I am testing Ceph with bcache, and I found some I/O issued with O_SYNC being written to the backing HDD, which caused large latency on the HDD. I traced the I/O with iosnoop:
>
> ./iosnoop -Q -ts -d '8,192'
>
> [iosnoop trace snipped]
>
> I'm confused: why must writes issued with O_SYNC go to the backing storage device? And after I had used bcache for a while, the latency increased a lot (the SSD is not very busy). Are there any best practices on configuration?

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Large latency with bcache for Ceph OSD
  2021-02-21 23:48 ` Norman.Kern
@ 2021-02-24  8:52 ` Coly Li
  2021-02-25  2:22 ` Norman.Kern
  2021-02-25  2:23 ` Norman.Kern
  0 siblings, 2 replies; 11+ messages in thread
From: Coly Li @ 2021-02-24 8:52 UTC (permalink / raw)
To: Norman.Kern; +Cc: linux-block, axboe, linux-bcache

On 2/22/21 7:48 AM, Norman.Kern wrote:
> Ping.
>
> I'm confused about SYNC I/O on bcache: why must SYNC I/O be written back to the backing device when the cache itself is persistent? It causes significant latency.
>
> @Coly, can you help me by explaining why bcache handles O_SYNC like this?
>

Hmm, normally we won't observe the application issuing I/Os directly to the backing device except for,
- I/O bypass due to SSD congestion
- sequential I/O requests
- dirty buckets exceeding the cutoff threshold
- writethrough mode

Did you set the write/read congestion thresholds to 0?

Coly Li

> On 2021/2/18 下午3:56, Norman.Kern wrote:
>> Hi guys,
>>
>> I am testing Ceph with bcache, and I found some I/O issued with O_SYNC being written to the backing HDD, which caused large latency on the HDD. I traced the I/O with iosnoop:
>>
>> ./iosnoop -Q -ts -d '8,192'
>>
>> [iosnoop trace snipped]
>>
>> I'm confused: why must writes issued with O_SYNC go to the backing storage device? And after I had used bcache for a while, the latency increased a lot (the SSD is not very busy). Are there any best practices on configuration?

^ permalink raw reply	[flat|nested] 11+ messages in thread
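For reference, the bypass-related knobs listed above can be inspected and relaxed through sysfs; a sketch only, assuming a single cache set and a backing device exposed as bcache0:

# the cache set directory is named after cset.uuid (see bcache-super-show)
CSET=$(echo /sys/fs/bcache/*-*-*-*-*)

# 0 disables bypassing the cache when the SSD looks congested
echo 0 > "$CSET/congested_read_threshold_us"
echo 0 > "$CSET/congested_write_threshold_us"

# 0 disables the sequential-I/O bypass heuristic (the default cutoff is 4 MiB)
echo 0 > /sys/block/bcache0/bcache/sequential_cutoff

# confirm writeback mode is active
cat /sys/block/bcache0/bcache/cache_mode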
* Re: Large latency with bcache for Ceph OSD
  2021-02-24  8:52 ` Coly Li
@ 2021-02-25  2:22 ` Norman.Kern
  0 siblings, 0 replies; 11+ messages in thread
From: Norman.Kern @ 2021-02-25 2:22 UTC (permalink / raw)
To: Coly Li; +Cc: linux-block, axboe, linux-bcache

On 2021/2/24 下午4:52, Coly Li wrote:
> On 2/22/21 7:48 AM, Norman.Kern wrote:
>> Ping.
>> [earlier question snipped]
>
> Hmm, normally we won't observe the application issuing I/Os directly to the backing device except for,
> - I/O bypass due to SSD congestion
> - sequential I/O requests
> - dirty buckets exceeding the cutoff threshold
> - writethrough mode
>
> Did you set the write/read congestion thresholds to 0?

Thanks for your reply.

I have set the thresholds to zero. All of my configs:

#make-bcache -C -b 4m -w 4k --discard --cache_replacement_policy=lru /dev/sdm
#make-bcache -B --writeback -w 4KiB /dev/sdn --wipe-bcache
congested_read_threshold_us = 0
congested_write_threshold_us = 0
# I tried to set sequential_cutoff to 0, but it didn't solve it.
sequential_cutoff = 4194304
writeback_percent = 40
cache_mode = writeback

I rebuilt the cluster, ran it for hours and reproduced the problem. I checked the cache status:

root@WXS0106:/root/perf-tools# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache_available_percent
29
root@WXS0106:/root/perf-tools# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/internal/cutoff_writeback_sync
70

So 'dirty buckets exceeds the cutoff threshold' caused the problem. Is my config wrong, or is there some other reason?

> Coly Li
>
>> [original report and iosnoop trace snipped]

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Large latency with bcache for Ceph OSD
  2021-02-24  8:52 ` Coly Li
  2021-02-25  2:22 ` Norman.Kern
@ 2021-02-25  2:23 ` Norman.Kern
  2021-02-25 13:00 ` Norman.Kern
  1 sibling, 1 reply; 11+ messages in thread
From: Norman.Kern @ 2021-02-25 2:23 UTC (permalink / raw)
To: Coly Li; +Cc: linux-block, axboe, linux-bcache

On 2021/2/24 下午4:52, Coly Li wrote:
> [snipped]
> Did you set the write/read congestion thresholds to 0?

Thanks for your reply.

I have set the thresholds to zero. All of my configs:

#make-bcache -C -b 4m -w 4k --discard --cache_replacement_policy=lru /dev/sdm
#make-bcache -B --writeback -w 4KiB /dev/sdn --wipe-bcache
congested_read_threshold_us = 0
congested_write_threshold_us = 0
# I tried to set sequential_cutoff to 0, but it didn't solve it.
sequential_cutoff = 4194304
writeback_percent = 40
cache_mode = writeback

I rebuilt the cluster, ran it for hours and reproduced the problem. I checked the cache status:

root@WXS0106:/root/perf-tools# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache_available_percent
29
root@WXS0106:/root/perf-tools# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/internal/cutoff_writeback_sync
70

Did 'dirty buckets exceeds the cutoff threshold' cause the problem? Are my configs wrong, or is there some other reason?

> Coly Li
>
>> [original report and iosnoop trace snipped]

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Large latency with bcache for Ceph OSD
  2021-02-25  2:23 ` Norman.Kern
@ 2021-02-25 13:00 ` Norman.Kern
  2021-02-25 14:44 ` Coly Li
  0 siblings, 1 reply; 11+ messages in thread
From: Norman.Kern @ 2021-02-25 13:00 UTC (permalink / raw)
To: Coly Li; +Cc: linux-block, axboe, linux-bcache

I made a test:

- Stop writing and wait for the dirty data to be written back

$ lsblk
NAME                                                                                                    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdf                                                                                                       8:80   0   7.3T  0 disk
└─bcache0                                                                                               252:0    0   7.3T  0 disk
  └─ceph--32a481f9--313c--417e--aaf7--bdd74515fd86-osd--data--2f670929--3c8a--45dd--bcef--c60ce3ee08e1 253:1    0   7.3T  0 lvm
sdd                                                                                                       8:48   0   7.3T  0 disk
sdb                                                                                                       8:16   0   7.3T  0 disk
sdk                                                                                                       8:160  0 893.8G  0 disk
└─bcache0                                                                                               252:0    0   7.3T  0 disk
  └─ceph--32a481f9--313c--417e--aaf7--bdd74515fd86-osd--data--2f670929--3c8a--45dd--bcef--c60ce3ee08e1 253:1    0   7.3T  0 lvm

$ cat /sys/block/bcache0/bcache/dirty_data
0.0k

root@WXS0106:~# bcache-super-show /dev/sdf
sb.magic                ok
sb.first_sector         8 [match]
sb.csum                 71DA9CA968B4A625 [match]
sb.version              1 [backing device]

dev.label               (empty)
dev.uuid                d07dc435-129d-477d-8378-a6af75199852
dev.sectors_per_block   8
dev.sectors_per_bucket  1024
dev.data.first_sector   16
dev.data.cache_mode     1 [writeback]
dev.data.cache_state    1 [clean]

cset.uuid               d87713c6-2e76-4a09-8517-d48306468659

- Check the available cache

# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache_available_percent
27

As the doc describes:

cache_available_percent
    Percentage of cache device which doesn't contain dirty data, and could potentially be used for writeback. This doesn't mean this space isn't used for clean cached data; the unused statistic (in priority_stats) is typically much lower.

When all dirty data has been written back, why is cache_available_percent not 100?

And when I restart the write I/O, the new writes don't replace the clean cache (does it consider the cache dirty now?), so they hit the HDD with large latency:

./bin/iosnoop -Q -d '8,80'

<...>  73338  WS  8,80  3513701472  4096  217.69
<...>  73338  WS  8,80  3513759360  4096  448.80
<...>  73338  WS  8,80  3562211912  4096  511.69
<...>  73335  WS  8,80  3562212528  4096  505.08
<...>  73339  WS  8,80  3562213376  4096  501.19
<...>  73336  WS  8,80  3562213992  4096  511.16
<...>  73343  WS  8,80  3562214016  4096  511.74
<...>  73340  WS  8,80  3562214128  4096  512.95
<...>  73329  WS  8,80  3562214208  4096  510.48
<...>  73338  WS  8,80  3562214600  4096  518.64
<...>  73341  WS  8,80  3562214632  4096  519.09
<...>  73342  WS  8,80  3562214664  4096  518.28
<...>  73336  WS  8,80  3562214688  4096  519.27
<...>  73343  WS  8,80  3562214736  4096  528.31
<...>  73339  WS  8,80  3562214784  4096  530.13

On 2021/2/25 上午10:23, Norman.Kern wrote:
> On 2021/2/24 下午4:52, Coly Li wrote:
>> [snipped]
>> Did you set the write/read congestion thresholds to 0?
> Thanks for your reply.
>
> [config listing and cache status snipped]
>
> Did 'dirty buckets exceeds the cutoff threshold' cause the problem? Are my configs wrong, or is there some other reason?

^ permalink raw reply	[flat|nested] 11+ messages in thread
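One way to confirm that writeback has really drained before reading cache_available_percent is the per-device writeback state; a small sketch, with the device name assumed to be bcache0:

# remaining dirty data, plus the writeback controller's rate/target/dirty view
cat /sys/block/bcache0/bcache/dirty_data
cat /sys/block/bcache0/bcache/writeback_rate_debug

# the backing device state should report "clean" once all dirty data is flushed
cat /sys/block/bcache0/bcache/state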
* Re: Large latency with bcache for Ceph OSD
  2021-02-25 13:00 ` Norman.Kern
@ 2021-02-25 14:44 ` Coly Li
  2021-02-26  8:57 ` Norman.Kern
  0 siblings, 1 reply; 11+ messages in thread
From: Coly Li @ 2021-02-25 14:44 UTC (permalink / raw)
To: Norman.Kern; +Cc: linux-block, axboe, linux-bcache

On 2/25/21 9:00 PM, Norman.Kern wrote:
> I made a test:

BTW, what is the version of your kernel and your bcache-tools, and which distribution are you running?

> - Stop writing and wait for the dirty data to be written back
>
> [lsblk, dirty_data and bcache-super-show output snipped]
>
> - Check the available cache
>
> # cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache_available_percent
> 27
>

What is the content of /sys/fs/bcache/<cache-set-uuid>/cache0/priority_stats? Can you paste it here too?

There are no dirty blocks, but the cache still occupies 73% of the buckets. If you are using a 5.8+ kernel, then a gc is probably desired.

You may try to trigger a gc by writing to /sys/fs/bcache/<cache-set-uuid>/internal/trigger_gc

> When all dirty data has been written back, why is cache_available_percent not 100?
>
> And when I restart the write I/O, the new writes don't replace the clean cache (does it consider the cache dirty now?), so they hit the HDD with large latency:
>
> ./bin/iosnoop -Q -d '8,80'
>
> [iosnoop trace snipped]
>

I am just wondering why the gc thread does not run...

Thanks.

Coly Li

^ permalink raw reply	[flat|nested] 11+ messages in thread
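Spelled out, the two sysfs files referred to above; a sketch using the cache-set UUID from this setup (cache0 is the first, and here only, cache device in the set):

CSET=/sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659

# bucket breakdown: Unused / Clean / Dirty / Metadata percentages
cat "$CSET/cache0/priority_stats"

# force a garbage-collection pass, then re-check how many buckets became available
echo 1 > "$CSET/internal/trigger_gc"
cat "$CSET/cache_available_percent"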
* Re: Large latency with bcache for Ceph OSD
  2021-02-25 14:44 ` Coly Li
@ 2021-02-26  8:57 ` Norman.Kern
  2021-02-26  9:54 ` Coly Li
  0 siblings, 1 reply; 11+ messages in thread
From: Norman.Kern @ 2021-02-26 8:57 UTC (permalink / raw)
To: Coly Li; +Cc: linux-block, axboe, linux-bcache

On 2021/2/25 下午10:44, Coly Li wrote:
> On 2/25/21 9:00 PM, Norman.Kern wrote:
>> I made a test:
> BTW, what is the version of your kernel and your bcache-tools, and which distribution are you running?

root@WXS0106:~# uname -a
Linux WXS0106 5.4.0-58-generic #64~18.04.1-Ubuntu SMP Wed Dec 9 17:11:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

root@WXS0106:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=xenial

root@WXS0106:~# dpkg -l bcache-tools
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version   Architecture  Description
+++-==============-=========-=============-==============================
ii  bcache-tools   1.0.8-2   amd64         bcache userspace tools

>> [lsblk, dirty_data and bcache-super-show output snipped]
>>
>> - Check the available cache
>>
>> # cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache_available_percent
>> 27
>>
> What is the content of /sys/fs/bcache/<cache-set-uuid>/cache0/priority_stats? Can you paste it here too?

I forgot to get that info before I triggered gc... I think I can reproduce the problem; when I do, I will collect it.

> There are no dirty blocks, but the cache still occupies 73% of the buckets. If you are using a 5.8+ kernel, then a gc is probably desired.
>
> You may try to trigger a gc by writing to /sys/fs/bcache/<cache-set-uuid>/internal/trigger_gc
>

When all the dirty data had been written back, I triggered gc and the space was reclaimed:

root@WXS0106:~# cat /sys/block/bcache0/bcache/cache/cache_available_percent
30
root@WXS0106:~# echo 1 > /sys/block/bcache0/bcache/cache/internal/trigger_gc
root@WXS0106:~# cat /sys/block/bcache0/bcache/cache/cache_available_percent
97

Why must I trigger gc manually? Isn't it a default action of the bcache gc thread? And I found it only works after all dirty data has been written back.

>> [remaining quote and iosnoop trace snipped]
>
> I am just wondering why the gc thread does not run...
>
> Thanks.
>
> Coly Li

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Large latency with bcache for Ceph OSD
  2021-02-26  8:57 ` Norman.Kern
@ 2021-02-26  9:54 ` Coly Li
  2021-03-02  2:03 ` Norman.Kern
  2021-03-02  5:30 ` Norman.Kern
  0 siblings, 2 replies; 11+ messages in thread
From: Coly Li @ 2021-02-26 9:54 UTC (permalink / raw)
To: Norman.Kern; +Cc: linux-block, axboe, linux-bcache

On 2/26/21 4:57 PM, Norman.Kern wrote:
> [snipped]
>> You may try to trigger a gc by writing to
>> /sys/fs/bcache/<cache-set-uuid>/internal/trigger_gc
>>
> When all the dirty data had been written back, I triggered gc and the space was reclaimed:
>
> root@WXS0106:~# cat /sys/block/bcache0/bcache/cache/cache_available_percent
> 30
>
> root@WXS0106:~# echo 1 > /sys/block/bcache0/bcache/cache/internal/trigger_gc
> root@WXS0106:~# cat /sys/block/bcache0/bcache/cache/cache_available_percent
> 97
>
> Why must I trigger gc manually? Isn't it a default action of the bcache gc thread? And I found it only works after all dirty data has been written back.
>

1, GC is automatically triggered after some amount of data has been consumed. I guess it simply was not time for it yet in your situation.

2, Because gc shrinks all cached clean data, which is very unfriendly for read-intensive workloads, gc_after_writeback defaults to 0. When this sysfs file is set to 1, a gc will be triggered after writeback completes.

Coly Li

^ permalink raw reply	[flat|nested] 11+ messages in thread
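A sketch of turning on the behaviour described above; note that it trades read-cache warmth (cached clean data) for buckets being reclaimed as soon as writeback finishes. The cache-set UUID here is the one from this setup:

CSET=/sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659

# 1 = run gc automatically once writeback has finished; the default is 0
echo 1 > "$CSET/internal/gc_after_writeback"
cat "$CSET/internal/gc_after_writeback"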
* Re: Large latency with bcache for Ceph OSD
  2021-02-26  9:54 ` Coly Li
@ 2021-03-02  2:03 ` Norman.Kern
  0 siblings, 0 replies; 11+ messages in thread
From: Norman.Kern @ 2021-03-02 2:03 UTC (permalink / raw)
To: Coly Li; +Cc: linux-block, axboe, linux-bcache

On 2021/2/26 下午5:54, Coly Li wrote:
> On 2/26/21 4:57 PM, Norman.Kern wrote:
> [snipped]
> 1, GC is automatically triggered after some amount of data has been consumed. I guess it simply was not time for it yet in your situation.
>
> 2, Because gc shrinks all cached clean data, which is very unfriendly for read-intensive workloads, gc_after_writeback defaults to 0. When this sysfs file is set to 1, a gc will be triggered after writeback completes.

I made another test and got more information:

root@WXS0089:~# cat /sys/block/bcache0/bcache/dirty_data
0.0k

root@WXS0089:~# lsblk /dev/sda
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 447.1G  0 disk
`-bcache0   252:0    0  10.9T  0 disk

root@WXS0089:~# cat /sys/block/sda/bcache/priority_stats
Unused:         1%
Clean:          29%
Dirty:          70%
Metadata:       0%
Average:        49
Sectors per Q:  29184768
Quantiles:      [1 2 3 5 6 8 9 11 13 14 16 19 21 23 26 29 32 36 39 43 48 53 59 65 73 83 94 109 129 156 203]

root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/internal/gc_after_writeback
1

root@WXS0089:~# cat /sys/fs/bcache/066319e1-8680-4b5b-adb8-49596319154b/cache_available_percent
28

I read the source code and found that when more than 50% of the cache is in use, gc should be woken up while doing writeback, but it does not seem to work correctly.

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Large latency with bcache for Ceph OSD
  2021-02-26  9:54 ` Coly Li
  2021-03-02  2:03 ` Norman.Kern
@ 2021-03-02  5:30 ` Norman.Kern
  1 sibling, 0 replies; 11+ messages in thread
From: Norman.Kern @ 2021-03-02 5:30 UTC (permalink / raw)
To: Coly Li; +Cc: linux-block, axboe, linux-bcache

[Re-sent message; the body is identical to the 2021-03-02 2:03 message above.]

^ permalink raw reply	[flat|nested] 11+ messages in thread
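As a stopgap while the automatic gc behaviour is being investigated, gc can also be kicked from userspace whenever available buckets run low; a rough workaround sketch only, where the 50% threshold and 60-second interval are arbitrary choices, not recommendations from this thread:

#!/bin/bash
# Periodically trigger bcache gc on every cache set whose available buckets drop too low.
while sleep 60; do
    for cset in /sys/fs/bcache/*-*-*-*-*; do
        [ -e "$cset/cache_available_percent" ] || continue
        avail=$(cat "$cset/cache_available_percent")
        if [ "$avail" -lt 50 ]; then
            echo 1 > "$cset/internal/trigger_gc"
        fi
    done
done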
end of thread, newest: ~2021-03-02 23:52 UTC

Thread overview: 11+ messages
2021-02-18  7:56 Large latency with bcache for Ceph OSD Norman.Kern
2021-02-21 23:48 ` Norman.Kern
2021-02-24  8:52 ` Coly Li
2021-02-25  2:22 ` Norman.Kern
2021-02-25  2:23 ` Norman.Kern
2021-02-25 13:00 ` Norman.Kern
2021-02-25 14:44 ` Coly Li
2021-02-26  8:57 ` Norman.Kern
2021-02-26  9:54 ` Coly Li
2021-03-02  2:03 ` Norman.Kern
2021-03-02  5:30 ` Norman.Kern