* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
From: Christoph Hellwig @ 2021-08-13  7:49 UTC
To: Guoqing Jiang; +Cc: song, linux-raid, jens, linux-block

On Fri, Aug 13, 2021 at 02:05:10PM +0800, Guoqing Jiang wrote:
> From: Guoqing Jiang <jiangguoqing@kylinos.cn>
>
> We can't split bio with more than BIO_MAX_VECS sectors, otherwise the
> below call trace was triggered because we could allocate oversized
> write behind bio later.
>
> [    8.097936]  bvec_alloc+0x90/0xc0
> [    8.098934]  bio_alloc_bioset+0x1b3/0x260
> [    8.099959]  raid1_make_request+0x9ce/0xc50 [raid1]

Which bio_alloc_bioset is this?  The one in alloc_behind_master_bio?

In which case I think you want to limit the reduction of max_sectors
to just the write behind case, and clearly document what is going on.

In general the size of a bio only depends on the number of vectors, not
the total I/O size.  But alloc_behind_master_bio allocates new backing
pages using order 0 allocations, so in this exceptional case the total
size does actually matter.

While we're at it: this huge memory allocation looks really deadlock
prone.

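The function under discussion boils down to roughly the following — a simplified sketch of alloc_behind_master_bio() from drivers/md/raid1.c around v5.14, with the discard special case and some error handling trimmed; the point is that the clone is sized by total payload and backed by one order-0 page per PAGE_SIZE chunk:

/*
 * Simplified sketch of alloc_behind_master_bio() (~v5.14); not the exact
 * mainline code.  The clone's vector count comes from the total payload,
 * which is why the total I/O size (not just the source bvec count)
 * matters for the write-behind path.
 */
static struct bio *alloc_behind_master_bio(struct r1bio *r1_bio,
					   struct bio *bio)
{
	int size = bio->bi_iter.bi_size;
	/* one single-page bio_vec per PAGE_SIZE of payload, rounded up */
	unsigned vcnt = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
	struct bio *behind_bio;
	int i = 0;

	/* bvec_alloc() WARNs and fails if vcnt > BIO_MAX_VECS -- the trace above */
	behind_bio = bio_alloc_bioset(GFP_NOIO, vcnt, &r1_bio->mddev->bio_set);
	if (!behind_bio)
		return NULL;

	while (i < vcnt && size) {
		int len = min_t(int, PAGE_SIZE, size);
		struct page *page = alloc_page(GFP_NOIO);	/* order-0 page */

		if (unlikely(!page)) {
			bio_free_pages(behind_bio);
			bio_put(behind_bio);
			return NULL;
		}
		bio_add_page(behind_bio, page, len, 0);
		size -= len;
		i++;
	}

	/* copy the payload from the original bio into the new pages */
	bio_copy_data(behind_bio, bio);

	r1_bio->behind_master_bio = behind_bio;
	set_bit(R1BIO_BehindIO, &r1_bio->state);
	return behind_bio;
}
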
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
From: Guoqing Jiang @ 2021-08-13  8:38 UTC
To: Christoph Hellwig; +Cc: song, linux-raid, jens, linux-block

On 8/13/21 3:49 PM, Christoph Hellwig wrote:
> On Fri, Aug 13, 2021 at 02:05:10PM +0800, Guoqing Jiang wrote:
>> From: Guoqing Jiang <jiangguoqing@kylinos.cn>
>>
>> We can't split bio with more than BIO_MAX_VECS sectors, otherwise the
>> below call trace was triggered because we could allocate oversized
>> write behind bio later.
>>
>> [    8.097936]  bvec_alloc+0x90/0xc0
>> [    8.098934]  bio_alloc_bioset+0x1b3/0x260
>> [    8.099959]  raid1_make_request+0x9ce/0xc50 [raid1]
> Which bio_alloc_bioset is this?  The one in alloc_behind_master_bio?

Yes, it should be the one since bio_clone_fast calls bio_alloc_bioset
with 0 iovecs.

> In which case I think you want to limit the reduction of max_sectors
> to just the write behind case, and clearly document what is going on.

Ok, thanks.

> In general the size of a bio only depends on the number of vectors, not
> the total I/O size.  But alloc_behind_master_bio allocates new backing
> pages using order 0 allocations, so in this exceptional case the total
> size does actually matter.
>
> While we're at it: this huge memory allocation looks really deadlock
> prone.

Hmm, let me think more about it, or could you share your thought? 😉

Thanks,
Guoqing

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
From: Christoph Hellwig @ 2021-08-14  7:55 UTC
To: Guoqing Jiang; +Cc: Christoph Hellwig, song, linux-raid, jens, linux-block

On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote:
>
> Ok, thanks.
>
> > In general the size of a bio only depends on the number of vectors, not
> > the total I/O size.  But alloc_behind_master_bio allocates new backing
> > pages using order 0 allocations, so in this exceptional case the total
> > size does actually matter.
> >
> > While we're at it: this huge memory allocation looks really deadlock
> > prone.
>
> Hmm, let me think more about it, or could you share your thought? 😉

Well, you'd need a mempool which can fit the max payload of a bio,
that is BIO_MAX_VECS pages.

FYI, this is what I'd do instead of this patch for now.  We don't really
need a vector per sector, just per page.  So this limits the I/O
size a little less.

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3c44c4bb40fc..5b27d995302e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 		goto retry_write;
 	}
 
+	/*
+	 * When using a bitmap, we may call alloc_behind_master_bio below.
+	 * alloc_behind_master_bio allocates a copy of the data payload a page
+	 * at a time and thus needs a new bio that can fit the whole payload
+	 * this bio in page sized chunks.
+	 */
+	if (bitmap)
+		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
+
 	if (max_sectors < bio_sectors(bio)) {
 		struct bio *split = bio_split(bio, max_sectors,
 					      GFP_NOIO, &conf->bio_split);

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
From: Ming Lei @ 2021-08-14  8:57 UTC
To: Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, jens, linux-block

On Sat, Aug 14, 2021 at 08:55:21AM +0100, Christoph Hellwig wrote:
> On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote:
> >
> > Ok, thanks.
> >
> > > In general the size of a bio only depends on the number of vectors, not
> > > the total I/O size.  But alloc_behind_master_bio allocates new backing
> > > pages using order 0 allocations, so in this exceptional case the total
> > > size does actually matter.
> > >
> > > While we're at it: this huge memory allocation looks really deadlock
> > > prone.
> >
> > Hmm, let me think more about it, or could you share your thought? 😉
>
> Well, you'd need a mempool which can fit the max payload of a bio,
> that is BIO_MAX_VECS pages.
>
> FYI, this is what I'd do instead of this patch for now.  We don't really
> need a vector per sector, just per page.  So this limits the I/O
> size a little less.
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 3c44c4bb40fc..5b27d995302e 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>  		goto retry_write;
>  	}
>  
> +	/*
> +	 * When using a bitmap, we may call alloc_behind_master_bio below.
> +	 * alloc_behind_master_bio allocates a copy of the data payload a page
> +	 * at a time and thus needs a new bio that can fit the whole payload
> +	 * this bio in page sized chunks.
> +	 */
> +	if (bitmap)
> +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);

s/PAGE_SIZE/PAGE_SECTORS

> +
>  	if (max_sectors < bio_sectors(bio)) {
>  		struct bio *split = bio_split(bio, max_sectors,
>  					      GFP_NOIO, &conf->bio_split);
> 

Here the limit is max single-page vectors, and the above way may not work,
such as:

0 ~ 254: each bvec's length is 512
255: bvec's length is 8192

the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().

One solution is to add queue limit of max_single_page_bvec, and let
blk_queue_split() handle it.

Thanks,
Ming

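With the s/PAGE_SIZE/PAGE_SECTORS/ correction applied, the clamp would read roughly as follows (a sketch only, assuming PAGE_SECTORS == PAGE_SIZE >> 9; this is the form that later shows up in the patch attached near the end of the thread). Whether the sector clamp alone is sufficient is exactly what the rest of the discussion settles:

	/*
	 * Sketch with the unit fix: max_sectors is in 512-byte sectors, so
	 * the cap is BIO_MAX_VECS pages expressed in sectors
	 * (256 * 8 = 2048 sectors with 4 KiB pages).
	 */
	if (bitmap)
		max_sectors = min_t(int, max_sectors,
				    BIO_MAX_VECS * (PAGE_SIZE >> 9));
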
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
From: Guoqing Jiang @ 2021-08-16  6:27 UTC
To: Ming Lei, Christoph Hellwig; +Cc: song, linux-raid, jens, linux-block

Hi Ming and Christoph,

On 8/14/21 4:57 PM, Ming Lei wrote:
> On Sat, Aug 14, 2021 at 08:55:21AM +0100, Christoph Hellwig wrote:
>> On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote:
>>> Ok, thanks.
>>>
>>>> In general the size of a bio only depends on the number of vectors, not
>>>> the total I/O size.  But alloc_behind_master_bio allocates new backing
>>>> pages using order 0 allocations, so in this exceptional case the total
>>>> size does actually matter.
>>>>
>>>> While we're at it: this huge memory allocation looks really deadlock
>>>> prone.
>>> Hmm, let me think more about it, or could you share your thought? 😉
>> Well, you'd need a mempool which can fit the max payload of a bio,
>> that is BIO_MAX_VECS pages.

IIUC, the behind bio is allocated from the bio_set (mddev->bio_set) which is
set up in md_run by calling bioset_init, so the mempool (bvec_pool) of this
bio_set is created by biovec_init_pool, which uses the global biovec slabs.
Do we really need another mempool? Or is there no potential deadlock for
this case?

>> FYI, this is what I'd do instead of this patch for now.  We don't really
>> need a vector per sector, just per page.  So this limits the I/O
>> size a little less.
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 3c44c4bb40fc..5b27d995302e 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>  		goto retry_write;
>>  	}
>>  
>> +	/*
>> +	 * When using a bitmap, we may call alloc_behind_master_bio below.
>> +	 * alloc_behind_master_bio allocates a copy of the data payload a page
>> +	 * at a time and thus needs a new bio that can fit the whole payload
>> +	 * this bio in page sized chunks.
>> +	 */

Thanks for the above, will copy it accordingly. I will check if WriteMostly
is set before, then check both the flag and bitmap.

>> +	if (bitmap)
>> +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
> s/PAGE_SIZE/PAGE_SECTORS

Agree.

>> +
>>  	if (max_sectors < bio_sectors(bio)) {
>>  		struct bio *split = bio_split(bio, max_sectors,
>>  					      GFP_NOIO, &conf->bio_split);
> Here the limit is max single-page vectors, and the above way may not work,
> such as:
>
> 0 ~ 254: each bvec's length is 512
> 255: bvec's length is 8192
>
> the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().

Thanks for looking deeper! I guess it is because of how vcnt is calculated.

> One solution is to add queue limit of max_single_page_bvec, and let
> blk_queue_split() handle it.

The path (blk_queue_split -> blk_bio_segment_split -> bvec_split_segs)
respects the max_segments limit. Do you mean introduce a max_single_page_bvec
limit, then perform a similar check as for max_segments?

Thanks,
Guoqing

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
From: Ming Lei @ 2021-08-16  7:13 UTC
To: Guoqing Jiang; +Cc: Christoph Hellwig, song, linux-raid, jens, linux-block

On Mon, Aug 16, 2021 at 02:27:48PM +0800, Guoqing Jiang wrote:
> Hi Ming and Christoph,
>
> On 8/14/21 4:57 PM, Ming Lei wrote:
> > On Sat, Aug 14, 2021 at 08:55:21AM +0100, Christoph Hellwig wrote:
> > > On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote:
> > > > Ok, thanks.
> > > >
> > > > > In general the size of a bio only depends on the number of vectors, not
> > > > > the total I/O size.  But alloc_behind_master_bio allocates new backing
> > > > > pages using order 0 allocations, so in this exceptional case the total
> > > > > size does actually matter.
> > > > >
> > > > > While we're at it: this huge memory allocation looks really deadlock
> > > > > prone.
> > > > Hmm, let me think more about it, or could you share your thought? 😉
> > > Well, you'd need a mempool which can fit the max payload of a bio,
> > > that is BIO_MAX_VECS pages.
>
> IIUC, the behind bio is allocated from the bio_set (mddev->bio_set) which is
> set up in md_run by calling bioset_init, so the mempool (bvec_pool) of this
> bio_set is created by biovec_init_pool, which uses the global biovec slabs.
> Do we really need another mempool? Or is there no potential deadlock for
> this case?
>
> > > FYI, this is what I'd do instead of this patch for now.  We don't really
> > > need a vector per sector, just per page.  So this limits the I/O
> > > size a little less.
> > >
> > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > index 3c44c4bb40fc..5b27d995302e 100644
> > > --- a/drivers/md/raid1.c
> > > +++ b/drivers/md/raid1.c
> > > @@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> > >  		goto retry_write;
> > >  	}
> > >  
> > > +	/*
> > > +	 * When using a bitmap, we may call alloc_behind_master_bio below.
> > > +	 * alloc_behind_master_bio allocates a copy of the data payload a page
> > > +	 * at a time and thus needs a new bio that can fit the whole payload
> > > +	 * this bio in page sized chunks.
> > > +	 */
>
> Thanks for the above, will copy it accordingly. I will check if WriteMostly
> is set before, then check both the flag and bitmap.
>
> > > +	if (bitmap)
> > > +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
> > s/PAGE_SIZE/PAGE_SECTORS
>
> Agree.
>
> > > +
> > >  	if (max_sectors < bio_sectors(bio)) {
> > >  		struct bio *split = bio_split(bio, max_sectors,
> > >  					      GFP_NOIO, &conf->bio_split);
> > Here the limit is max single-page vectors, and the above way may not work,
> > such as:
> >
> > 0 ~ 254: each bvec's length is 512
> > 255: bvec's length is 8192
> >
> > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
>
> Thanks for looking deeper! I guess it is because of how vcnt is calculated.
>
> > One solution is to add queue limit of max_single_page_bvec, and let
> > blk_queue_split() handle it.
>
> The path (blk_queue_split -> blk_bio_segment_split -> bvec_split_segs)
> respects the max_segments limit. Do you mean introduce a max_single_page_bvec
> limit, then perform a similar check as for max_segments?

Yes, then the bio is guaranteed to not reach the max single-page bvec limit,
just like what __blk_queue_bounce() does.

thanks,
Ming

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
From: Christoph Hellwig @ 2021-08-16  9:37 UTC
To: Ming Lei; +Cc: Christoph Hellwig, Guoqing Jiang, song, linux-raid, jens, linux-block

On Sat, Aug 14, 2021 at 04:57:06PM +0800, Ming Lei wrote:
> > +	if (bitmap)
> > +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
>
> s/PAGE_SIZE/PAGE_SECTORS

Yeah, max_sectors is in sector units, I messed that up.

> > +
> >  	if (max_sectors < bio_sectors(bio)) {
> >  		struct bio *split = bio_split(bio, max_sectors,
> >  					      GFP_NOIO, &conf->bio_split);
> > 
>
> Here the limit is max single-page vectors, and the above way may not work,
> such as:
>
> 0 ~ 254: each bvec's length is 512
> 255: bvec's length is 8192
>
> the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().

Yes, we still need the rounding magic that alloc_behind_master_bio uses
here.

> One solution is to add queue limit of max_single_page_bvec, and let
> blk_queue_split() handle it.

I'd rather not bloat the core with this.

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
From: Ming Lei @ 2021-08-16 11:40 UTC
To: Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, jens, linux-block

On Mon, Aug 16, 2021 at 10:37:54AM +0100, Christoph Hellwig wrote:
> On Sat, Aug 14, 2021 at 04:57:06PM +0800, Ming Lei wrote:
> > > +	if (bitmap)
> > > +		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
> >
> > s/PAGE_SIZE/PAGE_SECTORS
>
> Yeah, max_sectors is in sector units, I messed that up.
>
> > > +
> > >  	if (max_sectors < bio_sectors(bio)) {
> > >  		struct bio *split = bio_split(bio, max_sectors,
> > >  					      GFP_NOIO, &conf->bio_split);
> > > 
> >
> > Here the limit is max single-page vectors, and the above way may not work,
> > such as:
> >
> > 0 ~ 254: each bvec's length is 512
> > 255: bvec's length is 8192
> >
> > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
>
> Yes, we still need the rounding magic that alloc_behind_master_bio uses
> here.

But it is wrong to use max sectors to limit the number of bvecs (segments),
isn't it?

Thanks,
Ming

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
From: Christoph Hellwig @ 2021-08-17  5:06 UTC
To: Ming Lei; +Cc: Christoph Hellwig, Guoqing Jiang, song, linux-raid, jens, linux-block

On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote:
> > >
> > > 0 ~ 254: each bvec's length is 512
> > > 255: bvec's length is 8192
> > >
> > > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> > > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
> >
> > Yes, we still need the rounding magic that alloc_behind_master_bio uses
> > here.
>
> But it is wrong to use max sectors to limit the number of bvecs (segments),
> isn't it?

The raid1 write behind code cares about the size of a bio it can reach by
adding order 0 pages to it.  The bvecs are part of that and I think the
calculation in the patch documents that as well.

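Put in numbers, the bound works out as follows (illustrative only, assuming 4 KiB pages, BIO_MAX_VECS == 256, and the alloc_behind_master_bio sketch quoted earlier in the thread):

/*
 * Worked check, not code from the patch:
 *
 *   max_sectors <= BIO_MAX_VECS * (PAGE_SIZE >> 9)  = 256 * 8 = 2048 sectors
 *   payload      = max_sectors * 512                <= 1 MiB
 *   vcnt         = DIV_ROUND_UP(payload, PAGE_SIZE) <= 256 = BIO_MAX_VECS
 *
 * so the behind clone built from order-0 pages always fits in one bio
 * allocated via bio_alloc_bioset(), regardless of how the original
 * bio's bvecs are laid out.
 */
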
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
From: Ming Lei @ 2021-08-17 12:32 UTC
To: Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, jens, linux-block

On Tue, Aug 17, 2021 at 06:06:12AM +0100, Christoph Hellwig wrote:
> On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote:
> > > >
> > > > 0 ~ 254: each bvec's length is 512
> > > > 255: bvec's length is 8192
> > > >
> > > > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> > > > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
> > >
> > > Yes, we still need the rounding magic that alloc_behind_master_bio uses
> > > here.
> >
> > But it is wrong to use max sectors to limit the number of bvecs (segments),
> > isn't it?
>
> The raid1 write behind code cares about the size of a bio it can reach by
> adding order 0 pages to it.  The bvecs are part of that and I think the
> calculation in the patch documents that as well.

Thinking about it further, your and Guoqing's patches are correct & enough,
since bio_copy_data() just copies the byte (sector) stream from the fs bio
to the write-behind bio.

Thanks,
Ming

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
From: Jens Stutte (Archiv) @ 2021-09-24 15:34 UTC
To: Ming Lei, Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, linux-block

[-- Attachment #1: Type: text/plain, Size: 3628 bytes --]

Hi all,

I just had the occasion to test the new patch as it landed in Arch Linux
5.14.7. Unfortunately it does not work for me. Attached you can find a
modification that works for me, though I am not really sure why
write_behind seems not to be set to true on my configuration. If there
is any more data I can provide to help you investigate, please let me
know.

Thanks for any clues,

Jens

My configuration:

[root@vdr jens]# mdadm --detail -v /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Fri Dec 26 09:50:53 2014
        Raid Level : raid1
        Array Size : 1953381440 (1862.89 GiB 2000.26 GB)
     Used Dev Size : 1953381440 (1862.89 GiB 2000.26 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Fri Sep 24 17:30:51 2021
             State : active
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : vdr:0  (local to host vdr)
              UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
            Events : 32805

    Number   Major   Minor   RaidDevice State
       2       8       33        0      active sync   /dev/sdc1
       3       8       17        1      active sync   /dev/sdb1

[root@vdr jens]# mdadm -X /dev/sdb1
        Filename : /dev/sdb1
           Magic : 6d746962
         Version : 4
            UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
          Events : 32804
  Events Cleared : 32804
           State : OK
       Chunksize : 64 MB
          Daemon : 5s flush period
      Write Mode : Allow write behind, max 4096
       Sync Size : 1953381440 (1862.89 GiB 2000.26 GB)
          Bitmap : 29807 bits (chunks), 3 dirty (0.0%)

[root@vdr jens]# mdadm -X /dev/sdc1
        Filename : /dev/sdc1
           Magic : 6d746962
         Version : 4
            UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
          Events : 32804
  Events Cleared : 32804
           State : OK
       Chunksize : 64 MB
          Daemon : 5s flush period
      Write Mode : Allow write behind, max 4096
       Sync Size : 1953381440 (1862.89 GiB 2000.26 GB)
          Bitmap : 29807 bits (chunks), 3 dirty (0.0%)

On 17.08.21 14:32, Ming Lei wrote:
> On Tue, Aug 17, 2021 at 06:06:12AM +0100, Christoph Hellwig wrote:
>> On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote:
>>>>> 0 ~ 254: each bvec's length is 512
>>>>> 255: bvec's length is 8192
>>>>>
>>>>> the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
>>>>> still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
>>>> Yes, we still need the rounding magic that alloc_behind_master_bio uses
>>>> here.
>>> But it is wrong to use max sectors to limit the number of bvecs (segments),
>>> isn't it?
>> The raid1 write behind code cares about the size of a bio it can reach by
>> adding order 0 pages to it.  The bvecs are part of that and I think the
>> calculation in the patch documents that as well.
> Thinking about it further, your and Guoqing's patches are correct & enough,
> since bio_copy_data() just copies the byte (sector) stream from the fs bio
> to the write-behind bio.
>
> Thanks,
> Ming
>

[-- Attachment #2: raid1.patch --]
[-- Type: text/x-patch, Size: 2245 bytes --]

diff --unified --recursive --text archlinux-linux/drivers/md/raid1.c archlinux-linux-diff/drivers/md/raid1.c
--- archlinux-linux/drivers/md/raid1.c	2021-09-24 14:37:15.347771866 +0200
+++ archlinux-linux-diff/drivers/md/raid1.c	2021-09-24 14:40:02.443978319 +0200
@@ -1501,7 +1501,7 @@
 		 * Not if there are too many, or cannot
 		 * allocate memory, or a reader on WriteMostly
 		 * is waiting for behind writes to flush */
-		if (bitmap &&
+		if (bitmap && write_behind &&
 		    (atomic_read(&bitmap->behind_writes)
 		     < mddev->bitmap_info.max_write_behind) &&
 		    !waitqueue_active(&bitmap->behind_wait)) {
diff --unified --recursive --text archlinux-linux/drivers/md/raid1.c archlinux-linux-changed/drivers/md/raid1.c
--- archlinux-linux/drivers/md/raid1.c	2021-09-24 15:43:22.842680119 +0200
+++ archlinux-linux-changed/drivers/md/raid1.c	2021-09-24 15:43:59.426142955 +0200
@@ -1329,7 +1329,6 @@
 	struct raid1_plug_cb *plug = NULL;
 	int first_clone;
 	int max_sectors;
-	bool write_behind = false;
 
 	if (mddev_is_clustered(mddev) &&
 	     md_cluster_ops->area_resyncing(mddev, WRITE,
@@ -1383,14 +1382,6 @@
 	for (i = 0;  i < disks; i++) {
 		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
 
-		/*
-		 * The write-behind io is only attempted on drives marked as
-		 * write-mostly, which means we could allocate write behind
-		 * bio later.
-		 */
-		if (rdev && test_bit(WriteMostly, &rdev->flags))
-			write_behind = true;
-
 		if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
 			atomic_inc(&rdev->nr_pending);
 			blocked_rdev = rdev;
@@ -1470,7 +1461,7 @@
 	 * at a time and thus needs a new bio that can fit the whole payload
 	 * this bio in page sized chunks.
 	 */
-	if (write_behind && bitmap)
+	if (bitmap)
 		max_sectors = min_t(int, max_sectors,
 				    BIO_MAX_VECS * (PAGE_SIZE >> 9));
 
 	if (max_sectors < bio_sectors(bio)) {
@@ -1501,7 +1492,7 @@
 		 * Not if there are too many, or cannot
 		 * allocate memory, or a reader on WriteMostly
 		 * is waiting for behind writes to flush */
-		if (bitmap &&
+		if (bitmap &&
 		    (atomic_read(&bitmap->behind_writes)
 		     < mddev->bitmap_info.max_write_behind) &&
 		    !waitqueue_active(&bitmap->behind_wait)) {

* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
From: Guoqing Jiang @ 2021-09-25 23:02 UTC
To: Jens Stutte (Archiv), Ming Lei, Christoph Hellwig; +Cc: song, linux-raid, linux-block

On 9/24/21 11:34 PM, Jens Stutte (Archiv) wrote:
> Hi all,
>
> I just had the occasion to test the new patch as it landed in Arch Linux
> 5.14.7. Unfortunately it does not work for me. Attached you can find a
> modification that works for me, though I am not really sure why
> write_behind seems not to be set to true on my configuration. If there
> is any more data I can provide to help you investigate, please let me
> know.

Thanks for the report! As commented in bugzilla [1], this is because
write-behind IO still happens even without a write-mostly device. I will
send a new patch after you confirm it works.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=213181

Thanks,
Guoqing