* Issues about the merge_bvec_fn callback in 3.10 series
[not found] <S1732749AbfE3EBS/20190530040119Z+834@vger.kernel.org>
@ 2019-08-21 11:42 ` Jianchao Wang
2019-08-22 1:41 ` Jianchao Wang
2019-08-23 1:04 ` NeilBrown
0 siblings, 2 replies; 4+ messages in thread
From: Jianchao Wang @ 2019-08-21 11:42 UTC (permalink / raw)
To: linux-block, linux-raid
Hi all,
This is a question about older kernel versions.
We are using a 3.10 series kernel in production, and we have encountered the following issue.
When adding a page to a bio, .merge_bvec_fn is invoked all the way down the stack,
and bio->bi_rw is saved into bvec_merge_data.bi_rw, as in the following code:
__bio_add_page
---
	if (q->merge_bvec_fn) {
		struct bvec_merge_data bvm = {
			.bi_bdev = bio->bi_bdev,
			.bi_sector = bio->bi_iter.bi_sector,
			.bi_size = bio->bi_iter.bi_size,
			.bi_rw = bio->bi_rw,
		};

		/*
		 * merge_bvec_fn() returns number of bytes it can accept
		 * at this offset
		 */
		if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len) {
			bvec->bv_page = NULL;
			bvec->bv_len = 0;
			bvec->bv_offset = 0;
			return 0;
		}
	}
---
However, bio->bi_rw has not been set at that point (it is set by submit_bio),
so it is always zero.
We have a raid5 array, and raid5_mergeable_bvec therefore always treats writes as reads,
so every write bio ends up limited to the stripe chunk size, which is not expected and
degrades performance. This is the code:
raid5_mergeable_bvec
---
	if ((bvm->bi_rw & 1) == WRITE)
		return biovec->bv_len; /* always allow writes to be mergeable */

	if (mddev->new_chunk_sectors < mddev->chunk_sectors)
		chunk_sectors = mddev->new_chunk_sectors;
	max = (chunk_sectors - ((sector & (chunk_sectors - 1)) + bio_sectors)) << 9;
	if (max < 0) max = 0;
	if (max <= biovec->bv_len && bio_sectors == 0)
		return biovec->bv_len;
	else
		return max;
---
I have checked
v3.10.108
v3.18.140
v4.1.49
but there seems to be no fix for this.
It seems this was only fixed by commit
8ae126660fddbeebb9251a174e6fa45b6ad8f932
("block: kill merge_bvec_fn() completely").
Would anyone please give some suggestions on this?
Any comments are welcome.
Thanks in advance
Jianchao
* Re: Issues about the merge_bvec_fn callback in 3.10 series
2019-08-21 11:42 ` Issues about the merge_bvec_fn callback in 3.10 series Jianchao Wang
@ 2019-08-22 1:41 ` Jianchao Wang
2019-08-23 1:04 ` NeilBrown
1 sibling, 0 replies; 4+ messages in thread
From: Jianchao Wang @ 2019-08-22 1:41 UTC (permalink / raw)
To: linux-block, linux-raid; +Cc: axboe, neilb, songliubraving
Would anyone please comment on this?
Should we discard the merge_bvec_fn for raid5 and backport the bio split code instead?
Thanks in advance.
Jianchao
* Re: Issues about the merge_bvec_fn callback in 3.10 series
2019-08-21 11:42 ` Issues about the merge_bvec_fn callback in 3.10 series Jianchao Wang
2019-08-22 1:41 ` Jianchao Wang
@ 2019-08-23 1:04 ` NeilBrown
2019-08-23 1:31 ` Jianchao Wang
1 sibling, 1 reply; 4+ messages in thread
From: NeilBrown @ 2019-08-23 1:04 UTC (permalink / raw)
To: Jianchao Wang, linux-block, linux-raid
On Wed, Aug 21 2019, Jianchao Wang wrote:
> However, bio->bi_rw has not been set at that point (it is set by submit_bio),
> so it is always zero.
Yeah, that's a problem.
> Would anyone please give some suggestions on this?
One option would be to make sure that ->bi_rw is set before
bio_add_page is called.
There are about 80 callers, so that isn't trivial, but you might not care
about several of them.
You could backport the 'kill merge_bvec_fn' patch if you like, but I
wouldn't. The chance of introducing a bug is much higher.
NeilBrown
* Re: Issues about the merge_bvec_fn callback in 3.10 series
2019-08-23 1:04 ` NeilBrown
@ 2019-08-23 1:31 ` Jianchao Wang
0 siblings, 0 replies; 4+ messages in thread
From: Jianchao Wang @ 2019-08-23 1:31 UTC (permalink / raw)
To: NeilBrown, linux-block, linux-raid
Hi Neil
Thanks so much for your suggestion.
On 2019/8/23 9:04, NeilBrown wrote:
> One option would be to make sure that ->bi_rw is set before
> bio_add_page is called.
> There are about 80 calls, so that isn't trivial, but you might not care
> about several of them.
>
> You could backport the 'kill merge_bvec_fn' patch if you like, but I
> wouldn't. The change of introducing a bug is much higher.
>
I have killed raid5_mergeable_bvec and backported the patches that
make chunk_aligned_read able to split bios on its own.
That way I only need to modify the raid5 code and don't have to touch
other parts of the system, especially the block core.
It seems to work well so far.
Thanks again
Jianchao