* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
       [not found] <20210813060510.3545109-1-guoqing.jiang@linux.dev>
@ 2021-08-13  7:49 ` Christoph Hellwig
  2021-08-13  8:38   ` Guoqing Jiang
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2021-08-13  7:49 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: song, linux-raid, jens, linux-block

On Fri, Aug 13, 2021 at 02:05:10PM +0800, Guoqing Jiang wrote:
> From: Guoqing Jiang <jiangguoqing@kylinos.cn>
> 
> We can't allow a bio with more than BIO_MAX_VECS sectors, otherwise the
> call trace below is triggered because we may allocate an oversized
> write-behind bio later.
> 
> [ 8.097936] bvec_alloc+0x90/0xc0
> [ 8.098934] bio_alloc_bioset+0x1b3/0x260
> [ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]

Which bio_alloc_bioset is this?  The one in alloc_behind_master_bio?

In which case I think you want to limit the reduction of max_sectors
to just the write behind case, and clearly document what is going on.

In general the size of a bio only depends on the number of vectors, not
the total I/O size.  But alloc_behind_master_bio allocates new backing
pages using order 0 allocations, so in this exceptional case the total
size does actually matter.

While we're at it: this huge memory allocation looks really deadlock
prone.
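
For reference, the allocation pattern in question looks roughly like this (a
simplified sketch of alloc_behind_master_bio with error handling trimmed, not
the exact source):

	/*
	 * The write-behind path copies the whole payload into freshly
	 * allocated order-0 pages, one bvec per page, so the number of bvecs
	 * (and GFP_NOIO page allocations) scales with the total I/O size
	 * rather than with the vector count of the original bio.
	 */
	int size = bio->bi_iter.bi_size;
	int vcnt = DIV_ROUND_UP(size, PAGE_SIZE);	/* must stay <= BIO_MAX_VECS */
	struct bio *behind_bio = bio_alloc_bioset(GFP_NOIO, vcnt, &mddev->bio_set);

	while (size) {
		int len = min_t(int, PAGE_SIZE, size);
		struct page *page = alloc_page(GFP_NOIO);	/* one order-0 page per chunk */

		bio_add_page(behind_bio, page, len, 0);
		size -= len;
	}
	bio_copy_data(behind_bio, bio);	/* copy the payload into the new pages */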

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
  2021-08-13  7:49 ` [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors Christoph Hellwig
@ 2021-08-13  8:38   ` Guoqing Jiang
  2021-08-14  7:55     ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Guoqing Jiang @ 2021-08-13  8:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: song, linux-raid, jens, linux-block



On 8/13/21 3:49 PM, Christoph Hellwig wrote:
> On Fri, Aug 13, 2021 at 02:05:10PM +0800, Guoqing Jiang wrote:
>> From: Guoqing Jiang <jiangguoqing@kylinos.cn>
>>
>> We can't allow a bio with more than BIO_MAX_VECS sectors, otherwise the
>> call trace below is triggered because we may allocate an oversized
>> write-behind bio later.
>>
>> [ 8.097936] bvec_alloc+0x90/0xc0
>> [ 8.098934] bio_alloc_bioset+0x1b3/0x260
>> [ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]
> Which bio_alloc_bioset is this?  The one in alloc_behind_master_bio?

Yes, it should be the one since bio_clone_fast calls bio_alloc_bioset 
with 0 iovecs.

> In which case I think you want to limit the reduction of max_sectors
> to just the write behind case, and clearly document what is going on.

Ok, thanks.

> In general the size of a bio only depends on the number of vectors, not
> the total I/O size.  But alloc_behind_master_bio allocates new backing
> pages using order 0 allocations, so in this exceptional case the total
> size does actually matter.
>
> While we're at it: this huge memory allocation looks really deadlock
> prone.

Hmm, let me think more about it, or could you share your thought? 😉

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
  2021-08-13  8:38   ` Guoqing Jiang
@ 2021-08-14  7:55     ` Christoph Hellwig
  2021-08-14  8:57       ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2021-08-14  7:55 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Christoph Hellwig, song, linux-raid, jens, linux-block

On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote:
> 
> Ok, thanks.
> 
> > In general the size of a bio only depends on the number of vectors, not
> > the total I/O size.  But alloc_behind_master_bio allocates new backing
> > pages using order 0 allocations, so in this exceptional case the total
> > size does actually matter.
> > 
> > While we're at it: this huge memory allocation looks really deadlock
> > prone.
> 
> Hmm, let me think more about it, or could you share your thought? 😉

Well, you'd need a mempool which can fit the max payload of a bio,
that is BIO_MAX_VECS pages.
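
Something along these lines, as a rough sketch (the pool field name and where
it gets set up are assumptions, e.g. a per-array page pool reserved at setup
time via mempool_create_page_pool()):

	/*
	 * Illustrative only: reserve BIO_MAX_VECS order-0 pages up front so
	 * that one write-behind bio can always make forward progress under
	 * memory pressure.  conf->behind_page_pool is a made-up name here.
	 */
	conf->behind_page_pool = mempool_create_page_pool(BIO_MAX_VECS, 0);
	if (!conf->behind_page_pool)
		return -ENOMEM;

	/* ... and alloc_behind_master_bio() would then take its pages from it: */
	page = mempool_alloc(conf->behind_page_pool, GFP_NOIO);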

FYI, this is what I'd do instead of this patch for now.  We don't really
need a vector per sector, just per page.  So this limits the I/O
size a little less.

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3c44c4bb40fc..5b27d995302e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 		goto retry_write;
 	}
 
+	/*
+	 * When using a bitmap, we may call alloc_behind_master_bio below.
+	 * alloc_behind_master_bio allocates a copy of the data payload a page
+	 * at a time and thus needs a new bio that can fit the whole payload
+	 * this bio in page sized chunks.
+	 */
+	if (bitmap)
+		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
+
 	if (max_sectors < bio_sectors(bio)) {
 		struct bio *split = bio_split(bio, max_sectors,
 					      GFP_NOIO, &conf->bio_split);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
  2021-08-14  7:55     ` Christoph Hellwig
@ 2021-08-14  8:57       ` Ming Lei
  2021-08-16  6:27         ` Guoqing Jiang
  2021-08-16  9:37         ` Christoph Hellwig
  0 siblings, 2 replies; 12+ messages in thread
From: Ming Lei @ 2021-08-14  8:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, jens, linux-block

On Sat, Aug 14, 2021 at 08:55:21AM +0100, Christoph Hellwig wrote:
> On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote:
> > 
> > Ok, thanks.
> > 
> > > In general the size of a bio only depends on the number of vectors, not
> > > the total I/O size.  But alloc_behind_master_bio allocates new backing
> > > pages using order 0 allocations, so in this exceptional case the total
> > > size does actually matter.
> > > 
> > > While we're at it: this huge memory allocation looks really deadlock
> > > prone.
> > 
> > Hmm, let me think more about it, or could you share your thought? 😉
> 
> Well, you'd need a mempool which can fit the max payload of a bio,
> that is BIO_MAX_VECS pages.
> 
> FYI, this is what I'd do instead of this patch for now.  We don't really
> need a vector per sector, just per page.  So this limits the I/O
> size a little less.
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 3c44c4bb40fc..5b27d995302e 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>  		goto retry_write;
>  	}
>  
> +	/*
> +	 * When using a bitmap, we may call alloc_behind_master_bio below.
> +	 * alloc_behind_master_bio allocates a copy of the data payload a page
> +	 * at a time and thus needs a new bio that can fit the whole payload
> +	 * this bio in page sized chunks.
> +	 */
> +	if (bitmap)
> +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);

s/PAGE_SIZE/PAGE_SECTORS

> +
>  	if (max_sectors < bio_sectors(bio)) {
>  		struct bio *split = bio_split(bio, max_sectors,
>  					      GFP_NOIO, &conf->bio_split);
> 

Here the limit is max single-page vectors, and the above way may not work,
such as:

0 ~ 254: each bvec's length is 512
255: bvec's length is 8192

the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
still may need 257 single-page bvecs (one per 512-byte bvec plus at least two
pages spanned by the 8192-byte one), which can't be allocated via
bio_alloc_bioset().

One solution is to add a queue limit of max_single_page_bvec, and let
blk_queue_split() handle it.



Thanks,
Ming


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
  2021-08-14  8:57       ` Ming Lei
@ 2021-08-16  6:27         ` Guoqing Jiang
  2021-08-16  7:13           ` Ming Lei
  2021-08-16  9:37         ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Guoqing Jiang @ 2021-08-16  6:27 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig; +Cc: song, linux-raid, jens, linux-block

Hi Ming and Christoph,

On 8/14/21 4:57 PM, Ming Lei wrote:
> On Sat, Aug 14, 2021 at 08:55:21AM +0100, Christoph Hellwig wrote:
>> On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote:
>>> Ok, thanks.
>>>
>>>> In general the size of a bio only depends on the number of vectors, not
>>>> the total I/O size.  But alloc_behind_master_bio allocates new backing
>>>> pages using order 0 allocations, so in this exceptional case the total
> > > > size does actually matter.
>>>>
>>>> While we're at it: this huge memory allocation looks really deadlock
>>>> prone.
> > > Hmm, let me think more about it, or could you share your thought? 😉
>> Well, you'd need a mempool which can fit the max payload of a bio,
>> that is BIO_MAX_VECS pages.

IIUC, the behind bio is allocated from a bio_set (mddev->bio_set) which is
set up in md_run by calling bioset_init, so the mempool (bvec_pool) of this
bio_set is created by biovec_init_pool, which uses the global biovec slabs.
Do we really need another mempool? Or is there no potential deadlock for
this case?
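
For reference, md_run sets this up with something like the following (quoting
from memory, so the exact arguments may differ):

	/* md_run(): the bio_set whose bvec mempool backs the behind bio */
	bioset_init(&mddev->bio_set, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);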

>> FYI, this is what I'd do instead of this patch for now.  We don't really
>> need a vector per sector, just per page.  So this limits the I/O
>> size a little less.
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 3c44c4bb40fc..5b27d995302e 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>   		goto retry_write;
>>   	}
>>   
>> +	/*
>> +	 * When using a bitmap, we may call alloc_behind_master_bio below.
>> +	 * alloc_behind_master_bio allocates a copy of the data payload a page
>> +	 * at a time and thus needs a new bio that can fit the whole payload
>> +	 * this bio in page sized chunks.
>> +	 */

Thanks for the above, I will copy it accordingly. I will check whether
WriteMostly is set first, then check both the flag and the bitmap.
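
Roughly along these lines, just a sketch of the idea rather than the final
patch:

	/*
	 * Sketch: only cap max_sectors when a write-behind bio may actually
	 * be allocated, i.e. a bitmap exists and some rdev is marked
	 * WriteMostly.
	 */
	bool write_behind = false;

	if (rdev && test_bit(WriteMostly, &rdev->flags))
		write_behind = true;

	/* ... */

	if (write_behind && bitmap)
		max_sectors = min_t(int, max_sectors,
				    BIO_MAX_VECS * (PAGE_SIZE >> 9));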

>> +	if (bitmap)
>> +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
> s/PAGE_SIZE/PAGE_SECTORS

Agree.

>> +
>>   	if (max_sectors < bio_sectors(bio)) {
>>   		struct bio *split = bio_split(bio, max_sectors,
>>   					      GFP_NOIO, &conf->bio_split);
> Here the limit is max single-page vectors, and the above way may not work,
> such as:
>
> 0 ~ 254: each bvec's length is 512
> 255: bvec's length is 8192
>
> the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().

Thanks for looking deeper! I guess it is because of how vcnt is calculated.

> One solution is to add queue limit of max_single_page_bvec, and let
> blk_queue_split() handle it.

The path (blk_queue_split -> blk_bio_segment_split -> bvec_split_segs)
respects the max_segments limit. Do you mean introducing a
max_single_page_bvec limit, then performing a similar check as for
max_segments?

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
  2021-08-16  6:27         ` Guoqing Jiang
@ 2021-08-16  7:13           ` Ming Lei
  0 siblings, 0 replies; 12+ messages in thread
From: Ming Lei @ 2021-08-16  7:13 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Christoph Hellwig, song, linux-raid, jens, linux-block

On Mon, Aug 16, 2021 at 02:27:48PM +0800, Guoqing Jiang wrote:
> Hi Ming and Christoph,
> 
> On 8/14/21 4:57 PM, Ming Lei wrote:
> > On Sat, Aug 14, 2021 at 08:55:21AM +0100, Christoph Hellwig wrote:
> > > On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote:
> > > > Ok, thanks.
> > > > 
> > > > > In general the size of a bio only depends on the number of vectors, not
> > > > > the total I/O size.  But alloc_behind_master_bio allocates new backing
> > > > > pages using order 0 allocations, so in this exceptional case the total
> > > > > size does actually matter.
> > > > > 
> > > > > While we're at it: this huge memory allocation looks really deadlock
> > > > > prone.
> > > > Hmm, let me think more about it, or could you share your thought? 😉
> > > Well, you'd need a mempool which can fit the max payload of a bio,
> > > that is BIO_MAX_VECS pages.
> 
> IIUC, the behind bio is allocated from a bio_set (mddev->bio_set) which is
> set up in md_run by calling bioset_init, so the mempool (bvec_pool) of this
> bio_set is created by biovec_init_pool, which uses the global biovec slabs.
> Do we really need another mempool? Or is there no potential deadlock for
> this case?
> 
> > > FYI, this is what I'd do instead of this patch for now.  We don't really
> > > need a vector per sector, just per page.  So this limits the I/O
> > > size a little less.
> > > 
> > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > index 3c44c4bb40fc..5b27d995302e 100644
> > > --- a/drivers/md/raid1.c
> > > +++ b/drivers/md/raid1.c
> > > @@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> > >   		goto retry_write;
> > >   	}
> > > +	/*
> > > +	 * When using a bitmap, we may call alloc_behind_master_bio below.
> > > +	 * alloc_behind_master_bio allocates a copy of the data payload a page
> > > +	 * at a time and thus needs a new bio that can fit the whole payload
> > > +	 * this bio in page sized chunks.
> > > +	 */
> 
> Thanks for the above, I will copy it accordingly. I will check whether
> WriteMostly is set first, then check both the flag and the bitmap.
> 
> > > +	if (bitmap)
> > > +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
> > s/PAGE_SIZE/PAGE_SECTORS
> 
> Agree.
> 
> > > +
> > >   	if (max_sectors < bio_sectors(bio)) {
> > >   		struct bio *split = bio_split(bio, max_sectors,
> > >   					      GFP_NOIO, &conf->bio_split);
> > Here the limit is max single-page vectors, and the above way may not work,
> > such as:
> > 
> > 0 ~ 254: each bvec's length is 512
> > 255: bvec's length is 8192
> > 
> > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
> 
> Thanks for looking deeper! I guess it is because of how vcnt is calculated.
> 
> > One solution is to add queue limit of max_single_page_bvec, and let
> > blk_queue_split() handle it.
> 
> The path (blk_queue_split -> blk_bio_segment_split -> bvec_split_segs)
> respects the max_segments limit. Do you mean introducing a
> max_single_page_bvec limit, then performing a similar check as for
> max_segments?

Yes, then the bio is guaranteed not to reach the max single-page bvec limit,
just like what __blk_queue_bounce() does.


thanks,
Ming


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
  2021-08-14  8:57       ` Ming Lei
  2021-08-16  6:27         ` Guoqing Jiang
@ 2021-08-16  9:37         ` Christoph Hellwig
  2021-08-16 11:40           ` Ming Lei
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2021-08-16  9:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Guoqing Jiang, song, linux-raid, jens, linux-block

On Sat, Aug 14, 2021 at 04:57:06PM +0800, Ming Lei wrote:
> > +	if (bitmap)
> > +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
> 
> s/PAGE_SIZE/PAGE_SECTORS

Yeah, max_sectors is in sector units, I messed that up.

> 
> > +
> >  	if (max_sectors < bio_sectors(bio)) {
> >  		struct bio *split = bio_split(bio, max_sectors,
> >  					      GFP_NOIO, &conf->bio_split);
> > 
> 
> Here the limit is max single-page vectors, and the above way may not work,
> such as:
> 
> 0 ~ 254: each bvec's length is 512
> 255: bvec's length is 8192
> 
> the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().

Yes, we still need the rounding magic that alloc_behind_master_bio uses
here.

> One solution is to add queue limit of max_single_page_bvec, and let
> blk_queue_split() handle it.

I'd rather not bloat the core with this.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
  2021-08-16  9:37         ` Christoph Hellwig
@ 2021-08-16 11:40           ` Ming Lei
  2021-08-17  5:06             ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2021-08-16 11:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, jens, linux-block

On Mon, Aug 16, 2021 at 10:37:54AM +0100, Christoph Hellwig wrote:
> On Sat, Aug 14, 2021 at 04:57:06PM +0800, Ming Lei wrote:
> > > +	if (bitmap)
> > > +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
> > 
> > s/PAGE_SIZE/PAGE_SECTORS
> 
> Yeah, max_sectors is in sector units, I messed that up.
> 
> > 
> > > +
> > >  	if (max_sectors < bio_sectors(bio)) {
> > >  		struct bio *split = bio_split(bio, max_sectors,
> > >  					      GFP_NOIO, &conf->bio_split);
> > > 
> > 
> > Here the limit is max single-page vectors, and the above way may not work,
> > such as:
> > 
> > 0 ~ 254: each bvec's length is 512
> > 255: bvec's length is 8192
> > 
> > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
> 
> Yes, we still need the rounding magic that alloc_behind_master_bio uses
> here.

But it is wrong to use max sectors to limit number of bvecs(segments), isn't it?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
  2021-08-16 11:40           ` Ming Lei
@ 2021-08-17  5:06             ` Christoph Hellwig
  2021-08-17 12:32               ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2021-08-17  5:06 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Guoqing Jiang, song, linux-raid, jens, linux-block

On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote:
> > > 
> > > 0 ~ 254: each bvec's length is 512
> > > 255: bvec's length is 8192
> > > 
> > > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> > > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
> > 
> > Yes, we still need the rounding magic that alloc_behind_master_bio uses
> > here.
> 
> But it is wrong to use max sectors to limit number of bvecs(segments), isn't it?

The raid1 write-behind code cares about the size of a bio it can reach by
adding order 0 pages to it.  The bvecs are part of that and I think the
calculation in the patch documents that as well.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
  2021-08-17  5:06             ` Christoph Hellwig
@ 2021-08-17 12:32               ` Ming Lei
  2021-09-24 15:34                 ` Jens Stutte (Archiv)
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2021-08-17 12:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, jens, linux-block

On Tue, Aug 17, 2021 at 06:06:12AM +0100, Christoph Hellwig wrote:
> On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote:
> > > > 
> > > > 0 ~ 254: each bvec's length is 512
> > > > 255: bvec's length is 8192
> > > > 
> > > > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> > > > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
> > > 
> > > Yes, we still need the rounding magic that alloc_behind_master_bio uses
> > > here.
> > 
> > But it is wrong to use max sectors to limit number of bvecs(segments), isn't it?
> 
> > The raid1 write-behind code cares about the size of a bio it can reach by
> > adding order 0 pages to it.  The bvecs are part of that and I think the
> > calculation in the patch documents that as well.

Thinking about it further, your and Guoqing's patches are correct & enough,
since bio_copy_data() just copies the byte (sector) stream from the fs bio to
the write-behind bio.
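
In other words, the behind bio's bvec count depends only on the total byte
size; for the 271-sector example above, roughly:

	/*
	 * The copy needs only DIV_ROUND_UP(138752, PAGE_SIZE) = 34 order-0
	 * pages (with 4K pages), far below BIO_MAX_VECS, no matter how the
	 * original bio's bvecs are laid out.
	 */
	vcnt = DIV_ROUND_UP(bio->bi_iter.bi_size, PAGE_SIZE);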


Thanks,
Ming


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
  2021-08-17 12:32               ` Ming Lei
@ 2021-09-24 15:34                 ` Jens Stutte (Archiv)
  2021-09-25 23:02                   ` Guoqing Jiang
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Stutte (Archiv) @ 2021-09-24 15:34 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, linux-block

[-- Attachment #1: Type: text/plain, Size: 3628 bytes --]

Hi all,

I just had the occasion to test the new patch as it landed in Arch Linux
5.14.7. Unfortunately it does not work for me. Attached you can find a
modification that works for me, though I am not really sure why
write_behind does not seem to be set to true in my configuration. If there
is any more data I can provide to help you investigate, please let me
know.

Thanks for any clues,

Jens

My configuration:

[root@vdr jens]# mdadm --detail -v /dev/md0
/dev/md0:
            Version : 1.2
      Creation Time : Fri Dec 26 09:50:53 2014
         Raid Level : raid1
         Array Size : 1953381440 (1862.89 GiB 2000.26 GB)
      Used Dev Size : 1953381440 (1862.89 GiB 2000.26 GB)
       Raid Devices : 2
      Total Devices : 2
        Persistence : Superblock is persistent

      Intent Bitmap : Internal

        Update Time : Fri Sep 24 17:30:51 2021
              State : active
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0

Consistency Policy : bitmap

               Name : vdr:0  (local to host vdr)
               UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
             Events : 32805

     Number   Major   Minor   RaidDevice State
        2       8       33        0      active sync   /dev/sdc1
        3       8       17        1      active sync   /dev/sdb1

[root@vdr jens]# mdadm -X /dev/sdb1
         Filename : /dev/sdb1
            Magic : 6d746962
          Version : 4
             UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
           Events : 32804
   Events Cleared : 32804
            State : OK
        Chunksize : 64 MB
           Daemon : 5s flush period
       Write Mode : Allow write behind, max 4096
        Sync Size : 1953381440 (1862.89 GiB 2000.26 GB)
           Bitmap : 29807 bits (chunks), 3 dirty (0.0%)

[root@vdr jens]# mdadm -X /dev/sdc1
         Filename : /dev/sdc1
            Magic : 6d746962
          Version : 4
             UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
           Events : 32804
   Events Cleared : 32804
            State : OK
        Chunksize : 64 MB
           Daemon : 5s flush period
       Write Mode : Allow write behind, max 4096
        Sync Size : 1953381440 (1862.89 GiB 2000.26 GB)
           Bitmap : 29807 bits (chunks), 3 dirty (0.0%)

Am 17.08.21 um 14:32 schrieb Ming Lei:
> On Tue, Aug 17, 2021 at 06:06:12AM +0100, Christoph Hellwig wrote:
>> On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote:
>>>>> 0 ~ 254: each bvec's length is 512
>>>>> 255: bvec's length is 8192
>>>>>
>>>>> the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
>>>>> still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
>>>> Yes, we still need the rounding magic that alloc_behind_master_bio uses
>>>> here.
>>> But it is wrong to use max sectors to limit number of bvecs(segments), isn't it?
>> The raid1 write-behind code cares about the size of a bio it can reach by
>> adding order 0 pages to it.  The bvecs are part of that and I think the
>> calculation in the patch documents that as well.
> Thinking about it further, your and Guoqing's patches are correct & enough,
> since bio_copy_data() just copies the byte (sector) stream from the fs bio to
> the write-behind bio.
>
>
> Thanks,
> Ming
>

[-- Attachment #2: raid1.patch --]
[-- Type: text/x-patch, Size: 2245 bytes --]

diff --unified --recursive --text archlinux-linux/drivers/md/raid1.c archlinux-linux-diff/drivers/md/raid1.c
--- archlinux-linux/drivers/md/raid1.c	2021-09-24 14:37:15.347771866 +0200
+++ archlinux-linux-diff/drivers/md/raid1.c	2021-09-24 14:40:02.443978319 +0200
@@ -1501,7 +1501,7 @@
 			 * Not if there are too many, or cannot
 			 * allocate memory, or a reader on WriteMostly
 			 * is waiting for behind writes to flush */
-			if (bitmap &&
+			if (bitmap && write_behind &&
 			    (atomic_read(&bitmap->behind_writes)
 			     < mddev->bitmap_info.max_write_behind) &&
 			    !waitqueue_active(&bitmap->behind_wait)) {
diff --unified --recursive --text archlinux-linux/drivers/md/raid1.c archlinux-linux-changed/drivers/md/raid1.c
--- archlinux-linux/drivers/md/raid1.c	2021-09-24 15:43:22.842680119 +0200
+++ archlinux-linux-changed/drivers/md/raid1.c	2021-09-24 15:43:59.426142955 +0200
@@ -1329,7 +1329,6 @@
 	struct raid1_plug_cb *plug = NULL;
 	int first_clone;
 	int max_sectors;
-	bool write_behind = false;
 
 	if (mddev_is_clustered(mddev) &&
 	     md_cluster_ops->area_resyncing(mddev, WRITE,
@@ -1383,14 +1382,6 @@
 	for (i = 0;  i < disks; i++) {
 		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
 
-		/*
-		 * The write-behind io is only attempted on drives marked as
-		 * write-mostly, which means we could allocate write behind
-		 * bio later.
-		 */
-		if (rdev && test_bit(WriteMostly, &rdev->flags))
-			write_behind = true;
-
 		if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
 			atomic_inc(&rdev->nr_pending);
 			blocked_rdev = rdev;
@@ -1470,7 +1461,7 @@
 	 * at a time and thus needs a new bio that can fit the whole payload
 	 * this bio in page sized chunks.
 	 */
-	if (write_behind && bitmap)
+	if (bitmap)
 		max_sectors = min_t(int, max_sectors,
 				    BIO_MAX_VECS * (PAGE_SIZE >> 9));
 	if (max_sectors < bio_sectors(bio)) {
@@ -1501,7 +1492,7 @@
 			 * Not if there are too many, or cannot
 			 * allocate memory, or a reader on WriteMostly
 			 * is waiting for behind writes to flush */
-			if (bitmap &&
+			if (bitmap && 
 			    (atomic_read(&bitmap->behind_writes)
 			     < mddev->bitmap_info.max_write_behind) &&
 			    !waitqueue_active(&bitmap->behind_wait)) {

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
  2021-09-24 15:34                 ` Jens Stutte (Archiv)
@ 2021-09-25 23:02                   ` Guoqing Jiang
  0 siblings, 0 replies; 12+ messages in thread
From: Guoqing Jiang @ 2021-09-25 23:02 UTC (permalink / raw)
  To: Jens Stutte (Archiv), Ming Lei, Christoph Hellwig
  Cc: song, linux-raid, linux-block



On 9/24/21 11:34 PM, Jens Stutte (Archiv) wrote:
> Hi all,
>
> I just had the occasion to test the new patch as it landed in Arch Linux
> 5.14.7. Unfortunately it does not work for me. Attached you can find a
> modification that works for me, though I am not really sure why
> write_behind does not seem to be set to true in my configuration. If there
> is any more data I can provide to help you investigate, please let me
> know.

Thanks for the report!  As commented in bugzilla [1], this is because
write-behind IO still happens even without a write-mostly device. I will
send a new patch after you confirm it works.


1. https://bugzilla.kernel.org/show_bug.cgi?id=213181

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 12+ messages in thread

Thread overview: 12+ messages
     [not found] <20210813060510.3545109-1-guoqing.jiang@linux.dev>
2021-08-13  7:49 ` [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors Christoph Hellwig
2021-08-13  8:38   ` Guoqing Jiang
2021-08-14  7:55     ` Christoph Hellwig
2021-08-14  8:57       ` Ming Lei
2021-08-16  6:27         ` Guoqing Jiang
2021-08-16  7:13           ` Ming Lei
2021-08-16  9:37         ` Christoph Hellwig
2021-08-16 11:40           ` Ming Lei
2021-08-17  5:06             ` Christoph Hellwig
2021-08-17 12:32               ` Ming Lei
2021-09-24 15:34                 ` Jens Stutte (Archiv)
2021-09-25 23:02                   ` Guoqing Jiang
