From: "Norman.Kern" <norman.kern@gmx.com>
To: Coly Li <colyli@suse.de>
Cc: linux-block@vger.kernel.org, axboe@kernel.dk,
linux-bcache@vger.kernel.org
Subject: Re: Large latency with bcache for Ceph OSD
Date: Thu, 25 Feb 2021 21:00:43 +0800
Message-ID: <cfe2746f-18a7-a768-ea72-901793a3133e@gmx.com>
In-Reply-To: <07bcb6c8-21e1-11de-d1f0-ffd417bd36ff@gmx.com>
I ran a test:
- Stop writing and wait for the dirty data to be written back:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdf 8:80 0 7.3T 0 disk
└─bcache0 252:0 0 7.3T 0 disk
└─ceph--32a481f9--313c--417e--aaf7--bdd74515fd86-osd--data--2f670929--3c8a--45dd--bcef--c60ce3ee08e1 253:1 0 7.3T 0 lvm
sdd 8:48 0 7.3T 0 disk
sdb 8:16 0 7.3T 0 disk
sdk 8:160 0 893.8G 0 disk
└─bcache0 252:0 0 7.3T 0 disk
└─ceph--32a481f9--313c--417e--aaf7--bdd74515fd86-osd--data--2f670929--3c8a--45dd--bcef--c60ce3ee08e1 253:1 0 7.3T 0 lvm
$ cat /sys/block/bcache0/bcache/dirty_data
0.0k
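(Side note, in case anyone wants to reproduce this: I believe setting
writeback_percent to 0 lets the writeback thread flush at full speed, so the
dirty data drains faster. A sketch of what I mean, not necessarily the only
way:

# echo 0 > /sys/block/bcache0/bcache/writeback_percent
# cat /sys/block/bcache0/bcache/dirty_data    # repeat until it reads 0.0k

and restore the previous writeback_percent afterwards.)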
root@WXS0106:~# bcache-super-show /dev/sdf
sb.magic ok
sb.first_sector 8 [match]
sb.csum 71DA9CA968B4A625 [match]
sb.version 1 [backing device]
dev.label (empty)
dev.uuid d07dc435-129d-477d-8378-a6af75199852
dev.sectors_per_block 8
dev.sectors_per_bucket 1024
dev.data.first_sector 16
dev.data.cache_mode 1 [writeback]
dev.data.cache_state 1 [clean]
cset.uuid d87713c6-2e76-4a09-8517-d48306468659
- Check the available cache:
# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache_available_percent
27
As the documentation describes:
cache_available_percent
Percentage of cache device which doesn’t contain dirty data, and could potentially be used for writeback. This doesn’t mean this space isn’t used for clean cached data; the unused statistic (in priority_stats) is typically much lower.
When all the dirty data has been written back, why is cache_available_percent not 100?
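Since the doc points at priority_stats, the bucket breakdown can be checked
there as well. A sketch, assuming the usual per-cache sysfs layout:

# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache0/priority_stats

which should report how the buckets are split between unused, clean and dirty.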
And when I started the write I/O again, the new writes did not replace the clean cached data (does bcache consider the cache dirty now?), which caused large latency on the HDD:
./bin/iosnoop -Q -d '8,80'
<...> 73338 WS 8,80 3513701472 4096 217.69
<...> 73338 WS 8,80 3513759360 4096 448.80
<...> 73338 WS 8,80 3562211912 4096 511.69
<...> 73335 WS 8,80 3562212528 4096 505.08
<...> 73339 WS 8,80 3562213376 4096 501.19
<...> 73336 WS 8,80 3562213992 4096 511.16
<...> 73343 WS 8,80 3562214016 4096 511.74
<...> 73340 WS 8,80 3562214128 4096 512.95
<...> 73329 WS 8,80 3562214208 4096 510.48
<...> 73338 WS 8,80 3562214600 4096 518.64
<...> 73341 WS 8,80 3562214632 4096 519.09
<...> 73342 WS 8,80 3562214664 4096 518.28
<...> 73336 WS 8,80 3562214688 4096 519.27
<...> 73343 WS 8,80 3562214736 4096 528.31
<...> 73339 WS 8,80 3562214784 4096 530.13
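If bcache really is bypassing the cache here, I'd expect the bypass counters
to climb at the same time. A sketch of what I'd check, using the documented
bcache stats directories:

# cat /sys/block/bcache0/bcache/stats_five_minute/bypassed
# cat /sys/block/bcache0/bcache/stats_five_minute/cache_bypass_hits
# cat /sys/block/bcache0/bcache/stats_five_minute/cache_bypass_misses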
On 2021/2/25 10:23 AM, Norman.Kern wrote:
> On 2021/2/24 4:52 PM, Coly Li wrote:
>> On 2/22/21 7:48 AM, Norman.Kern wrote:
>>> Ping.
>>>
>>> I'm confused about SYNC I/O on bcache: why must SYNC I/O be written back
>>> to the backing device even though the cache is persistent? It can cause
>>> significant latency.
>>>
>>> @Coly, can you help explain why bcache handles O_SYNC like this?
>>>
>>>
>> Hmm, normally we won't observe the application issuing I/Os to the backing
>> device except for:
>> - I/O bypassing the cache due to SSD congestion
>> - Sequential I/O requests
>> - Dirty buckets exceeding the cutoff threshold
>> - Writethrough mode
>>
>> Did you set the write/read congestion thresholds to 0?
> Thanks for your reply.
>
> I have set the thresholds to zero; all my configs:
>
> # make-bcache -C -b 4m -w 4k --discard --cache_replacement_policy=lru /dev/sdm
> # make-bcache -B --writeback -w 4KiB /dev/sdn --wipe-bcache
> congested_read_threshold_us = 0
> congested_write_threshold_us = 0
>
> # I also tried setting sequential_cutoff to 0, but it didn't help.
>
> sequential_cutoff = 4194304
> writeback_percent = 40
> cache_mode = writeback
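>
> For completeness, the values above are applied through sysfs after the
> devices are registered, roughly like this (a sketch, using my cache set UUID):
>
> # echo 0 > /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/congested_read_threshold_us
> # echo 0 > /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/congested_write_threshold_us
> # echo 4194304 > /sys/block/bcache0/bcache/sequential_cutoff
> # echo 40 > /sys/block/bcache0/bcache/writeback_percent
> # echo writeback > /sys/block/bcache0/bcache/cache_mode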
>
> I rebuilt the cluster, ran it for hours, and reproduced the problem. Then I checked the cache status:
>
> root@WXS0106:/root/perf-tools# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache_available_percent
> 29
> root@WXS0106:/root/perf-tools# cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/internal/cutoff_writeback_sync
> 70
> Did 'dirty buckets exceeds the cutoff threshold' cause the problem? Is my config wrong, or is there another reason?
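> Putting the two numbers together: cache_available_percent = 29 means about
> 100 - 29 = 71% of the buckets are in use, which is already above
> cutoff_writeback_sync = 70, so if I read it right, sync writes would bypass
> the cache. A quick sketch of that check:
>
> # avail=$(cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/cache_available_percent)
> # cutoff=$(cat /sys/fs/bcache/d87713c6-2e76-4a09-8517-d48306468659/internal/cutoff_writeback_sync)
> # echo "$(( 100 - avail ))% in use vs cutoff ${cutoff}%"
> 71% in use vs cutoff 70%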
>
>> Coly Li
>>
>>> On 2021/2/18 3:56 PM, Norman.Kern wrote:
>>>> Hi guys,
>>>>
>>>> I am testing Ceph with bcache. I found some O_SYNC I/O being written back
>>>> to the HDD, which caused large latency on the HDD. I traced the I/O with
>>>> iosnoop:
>>>>
>>>> ./iosnoop -Q -ts -d '8,192'
>>>>
>>>> Tracing block I/O for 1 seconds (buffered)...
>>>> STARTs ENDs COMM PID TYPE DEV BLOCK BYTES LATms
>>>>
>>>> 1809296.292350 1809296.319052 tp_osd_tp 22191 R 8,192 4578940240 16384 26.70
>>>> 1809296.292330 1809296.320974 tp_osd_tp 22191 R 8,192 4577938704 16384 28.64
>>>> 1809296.292614 1809296.323292 tp_osd_tp 22191 R 8,192 4600404304 16384 30.68
>>>> 1809296.292353 1809296.325300 tp_osd_tp 22191 R 8,192 4578343088 16384 32.95
>>>> 1809296.292340 1809296.328013 tp_osd_tp 22191 R 8,192 4578055472 16384 35.67
>>>> 1809296.292606 1809296.330518 tp_osd_tp 22191 R 8,192 4578581648 16384 37.91
>>>> 1809295.169266 1809296.334041 bstore_kv_fi 17266 WS 8,192 4244996360 4096 1164.78
>>>> 1809296.292618 1809296.336349 tp_osd_tp 22191 R 8,192 4602631760 16384 43.73
>>>> 1809296.292618 1809296.338812 tp_osd_tp 22191 R 8,192 4602632976 16384 46.19
>>>> 1809296.030103 1809296.342780 tp_osd_tp 22180 WS 8,192 4741276048 131072 312.68
>>>> 1809296.292347 1809296.345045 tp_osd_tp 22191 R 8,192 4609037872 16384 52.70
>>>> 1809296.292620 1809296.345109 tp_osd_tp 22191 R 8,192 4609037904 16384 52.49
>>>> 1809296.292612 1809296.347251 tp_osd_tp 22191 R 8,192 4578937616 16384 54.64
>>>> 1809296.292621 1809296.351136 tp_osd_tp 22191 R 8,192 4612654992 16384 58.51
>>>> 1809296.292341 1809296.353428 tp_osd_tp 22191 R 8,192 4578220656 16384 61.09
>>>> 1809296.292342 1809296.353864 tp_osd_tp 22191 R 8,192 4578220880 16384 61.52
>>>> 1809295.167650 1809296.358510 bstore_kv_fi 17266 WS 8,192 4923695960 4096 1190.86
>>>> 1809296.292347 1809296.361885 tp_osd_tp 22191 R 8,192 4607437136 16384 69.54
>>>> 1809296.029363 1809296.367313 tp_osd_tp 22180 WS 8,192 4739824400 98304 337.95
>>>> 1809296.292349 1809296.370245 tp_osd_tp 22191 R 8,192 4591379888 16384 77.90
>>>> 1809296.292348 1809296.376273 tp_osd_tp 22191 R 8,192 4591289552 16384 83.92
>>>> 1809296.292353 1809296.378659 tp_osd_tp 22191 R 8,192 4578248656 16384 86.31
>>>> 1809296.292619 1809296.384835 tp_osd_tp 22191 R 8,192 4617494160 65536 92.22
>>>> 1809295.165451 1809296.393715 bstore_kv_fi 17266 WS 8,192 1355703120 4096 1228.26
>>>> 1809295.168595 1809296.401560 bstore_kv_fi 17266 WS 8,192 1122200 4096 1232.96
>>>> 1809295.165221 1809296.408018 bstore_kv_fi 17266 WS 8,192 960656 4096 1242.80
>>>> 1809295.166737 1809296.411505 bstore_kv_fi 17266 WS 8,192 57682504 4096 1244.77
>>>> 1809296.292352 1809296.418123 tp_osd_tp 22191 R 8,192 4579459056 32768 125.77
>>>>
>>>> I'm confused: why must writes with O_SYNC be written back to the backend
>>>> storage device? And after I had used bcache for a while, the latency
>>>> increased a lot (the SSD is not very busy). Are there any best practices
>>>> for the configuration?
>>>>