* [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors @ 2021-08-13 6:05 Guoqing Jiang 2021-08-13 7:49 ` Christoph Hellwig ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Guoqing Jiang @ 2021-08-13 6:05 UTC (permalink / raw) To: song; +Cc: linux-raid, jens From: Guoqing Jiang <jiangguoqing@kylinos.cn> We can't split bio with more than BIO_MAX_VECS sectors, otherwise the below call trace was triggered because we could allocate oversized write behind bio later. [ 8.097936] bvec_alloc+0x90/0xc0 [ 8.098934] bio_alloc_bioset+0x1b3/0x260 [ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1] [ 8.100988] ? __bio_clone_fast+0xa8/0xe0 [ 8.102008] md_handle_request+0x158/0x1d0 [md_mod] [ 8.103050] md_submit_bio+0xcd/0x110 [md_mod] [ 8.104084] submit_bio_noacct+0x139/0x530 [ 8.105127] submit_bio+0x78/0x1d0 [ 8.106163] ext4_io_submit+0x48/0x60 [ext4] [ 8.107242] ext4_writepages+0x652/0x1170 [ext4] [ 8.108300] ? do_writepages+0x41/0x100 [ 8.109338] ? __ext4_mark_inode_dirty+0x240/0x240 [ext4] [ 8.110406] do_writepages+0x41/0x100 [ 8.111450] __filemap_fdatawrite_range+0xc5/0x100 [ 8.112513] file_write_and_wait_range+0x61/0xb0 [ 8.113564] ext4_sync_file+0x73/0x370 [ext4] [ 8.114607] __x64_sys_fsync+0x33/0x60 [ 8.115635] do_syscall_64+0x33/0x40 [ 8.116670] entry_SYSCALL_64_after_hwframe+0x44/0xae [1]. https://bugs.archlinux.org/task/70992 Reported-by: Jens Stutte <jens@chianterastutte.eu> Tested-by: Jens Stutte <jens@chianterastutte.eu> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> --- Note: this depends on commit 018eca456c4b4dca56aaf1ec27f309c74d0fe246 in linux-block for-next branch. 
drivers/md/raid1.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 3c44c4bb40fc..ab21abc056b8 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1454,6 +1454,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, goto retry_write; } + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS); if (max_sectors < bio_sectors(bio)) { struct bio *split = bio_split(bio, max_sectors, GFP_NOIO, &conf->bio_split); -- 2.25.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors 2021-08-13 6:05 [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors Guoqing Jiang @ 2021-08-13 7:49 ` Christoph Hellwig 2021-08-13 8:38 ` Guoqing Jiang 2021-08-13 9:27 ` kernel test robot 2021-08-13 10:12 ` kernel test robot 2 siblings, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2021-08-13 7:49 UTC (permalink / raw) To: Guoqing Jiang; +Cc: song, linux-raid, jens, linux-block On Fri, Aug 13, 2021 at 02:05:10PM +0800, Guoqing Jiang wrote: > From: Guoqing Jiang <jiangguoqing@kylinos.cn> > > We can't split bio with more than BIO_MAX_VECS sectors, otherwise the > below call trace was triggered because we could allocate oversized > write behind bio later. > > [ 8.097936] bvec_alloc+0x90/0xc0 > [ 8.098934] bio_alloc_bioset+0x1b3/0x260 > [ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1] Which bio_alloc_bioset is this? The one in alloc_behind_master_bio? In which case I think you want to limit the reduction of max_sectors to just the write behind case, and clearly document what is going on. In general the size of a bio only depends on the number of vectors, not the total I/O size. But alloc_behind_master_bio allocates new backing pages using order 0 allocations, so in this exceptional case the total size does actually matter. While we're at it: this huge memory allocation looks really deadlock prone. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors 2021-08-13 7:49 ` Christoph Hellwig @ 2021-08-13 8:38 ` Guoqing Jiang 2021-08-14 7:55 ` Christoph Hellwig 0 siblings, 1 reply; 17+ messages in thread From: Guoqing Jiang @ 2021-08-13 8:38 UTC (permalink / raw) To: Christoph Hellwig; +Cc: song, linux-raid, jens, linux-block On 8/13/21 3:49 PM, Christoph Hellwig wrote: > On Fri, Aug 13, 2021 at 02:05:10PM +0800, Guoqing Jiang wrote: >> From: Guoqing Jiang <jiangguoqing@kylinos.cn> >> >> We can't split bio with more than BIO_MAX_VECS sectors, otherwise the >> below call trace was triggered because we could allocate oversized >> write behind bio later. >> >> [ 8.097936] bvec_alloc+0x90/0xc0 >> [ 8.098934] bio_alloc_bioset+0x1b3/0x260 >> [ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1] > Which bio_alloc_bioset is this? The one in alloc_behind_master_bio? Yes, it should be the one since bio_clone_fast calls bio_alloc_bioset with 0 iovecs. > In which case I think you want to limit the reduction of max_sectors > to just the write behind case, and clearly document what is going on. Ok, thanks. > In general the size of a bio only depends on the number of vectors, not > the total I/O size. But alloc_behind_master_bio allocates new backing > pages using order 0 allocations, so in this exceptional case the total > size oes actually matter. > > While we're at it: this huge memory allocation looks really deadlock > prone. Hmm, let me think more about it, or could you share your thought? 😉 Thanks, Guoqing ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors 2021-08-13 8:38 ` Guoqing Jiang @ 2021-08-14 7:55 ` Christoph Hellwig 2021-08-14 8:57 ` Ming Lei 0 siblings, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2021-08-14 7:55 UTC (permalink / raw) To: Guoqing Jiang; +Cc: Christoph Hellwig, song, linux-raid, jens, linux-block On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote: > > Ok, thanks. > > > In general the size of a bio only depends on the number of vectors, not > > the total I/O size. But alloc_behind_master_bio allocates new backing > > pages using order 0 allocations, so in this exceptional case the total > > size oes actually matter. > > > > While we're at it: this huge memory allocation looks really deadlock > > prone. > > Hmm, let me think more about it, or could you share your thought? 😉 Well, you'd need a mempool which can fit the max payload of a bio, that is BIO_MAX_VECS pages. FYI, this is what I'd do instead of this patch for now. We don't really need a vector per sector, just per page. So this limits the I/O size a little less. diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 3c44c4bb40fc..5b27d995302e 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, goto retry_write; } + /* + * When using a bitmap, we may call alloc_behind_master_bio below. + * alloc_behind_master_bio allocates a copy of the data payload a page + * at a time and thus needs a new bio that can fit the whole payload + * this bio in page sized chunks. + */ + if (bitmap) + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE); + if (max_sectors < bio_sectors(bio)) { struct bio *split = bio_split(bio, max_sectors, GFP_NOIO, &conf->bio_split); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors 2021-08-14 7:55 ` Christoph Hellwig @ 2021-08-14 8:57 ` Ming Lei 2021-08-16 6:27 ` Guoqing Jiang 2021-08-16 9:37 ` Christoph Hellwig 0 siblings, 2 replies; 17+ messages in thread From: Ming Lei @ 2021-08-14 8:57 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, jens, linux-block On Sat, Aug 14, 2021 at 08:55:21AM +0100, Christoph Hellwig wrote: > On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote: > > > > Ok, thanks. > > > > > In general the size of a bio only depends on the number of vectors, not > > > the total I/O size. But alloc_behind_master_bio allocates new backing > > > pages using order 0 allocations, so in this exceptional case the total > > > size oes actually matter. > > > > > > While we're at it: this huge memory allocation looks really deadlock > > > prone. > > > > Hmm, let me think more about it, or could you share your thought? ???? > > Well, you'd need a mempool which can fit the max payload of a bio, > that is BIO_MAX_VECS pages. > > FYI, this is what I'd do instead of this patch for now. We don't really > need a vetor per sector, just per page. So this limits the I/O > size a little less. > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 3c44c4bb40fc..5b27d995302e 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, > goto retry_write; > } > > + /* > + * When using a bitmap, we may call alloc_behind_master_bio below. > + * alloc_behind_master_bio allocates a copy of the data payload a page > + * at a time and thus needs a new bio that can fit the whole payload > + * this bio in page sized chunks. 
> + */ > + if (bitmap) > + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE); s/PAGE_SIZE/PAGE_SECTORS > + > if (max_sectors < bio_sectors(bio)) { > struct bio *split = bio_split(bio, max_sectors, > GFP_NOIO, &conf->bio_split); > Here the limit is max single-page vectors, and the above way may not work, such as: 0 ~ 254: each bvec's length is 512 255: bvec's length is 8192 the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it still may need 257 bvecs, which can't be allocated via bio_alloc_bioset(). One solution is to add queue limit of max_single_page_bvec, and let blk_queue_split() handle it. Thanks, Ming ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors 2021-08-14 8:57 ` Ming Lei @ 2021-08-16 6:27 ` Guoqing Jiang 2021-08-16 7:13 ` Ming Lei 2021-08-16 9:37 ` Christoph Hellwig 1 sibling, 1 reply; 17+ messages in thread From: Guoqing Jiang @ 2021-08-16 6:27 UTC (permalink / raw) To: Ming Lei, Christoph Hellwig; +Cc: song, linux-raid, jens, linux-block Hi Ming and Christoph, On 8/14/21 4:57 PM, Ming Lei wrote: > On Sat, Aug 14, 2021 at 08:55:21AM +0100, Christoph Hellwig wrote: >> On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote: >>> Ok, thanks. >>> >>>> In general the size of a bio only depends on the number of vectors, not >>>> the total I/O size. But alloc_behind_master_bio allocates new backing >>>> pages using order 0 allocations, so in this exceptional case the total >>>> size oes actually matter. >>>> >>>> While we're at it: this huge memory allocation looks really deadlock >>>> prone. >>> Hmm, let me think more about it, or could you share your thought? 😉 >> Well, you'd need a mempool which can fit the max payload of a bio, >> that is BIO_MAX_VECS pages. IIUC, the behind bio is allocated from bio_set (mddev->bio_set) which is allocated in md_run by calling bioset_init, so the mempool (bvec_pool) of this bio_set is created by biovec_init_pool which uses global biovec slabs. Do we really need another mempool? Or perhaps there is no potential deadlock in this case. >> FYI, this is what I'd do instead of this patch for now. We don't really >> need a vetor per sector, just per page. So this limits the I/O >> size a little less.
>> + * alloc_behind_master_bio allocates a copy of the data payload a page >> + * at a time and thus needs a new bio that can fit the whole payload >> + * this bio in page sized chunks. >> + */ Thanks for the above, will copy it accordingly. I will check if WriteMostly is set before, then check both the flag and bitmap. >> + if (bitmap) >> + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE); > s/PAGE_SIZE/PAGE_SECTORS Agree. >> + >> if (max_sectors < bio_sectors(bio)) { >> struct bio *split = bio_split(bio, max_sectors, >> GFP_NOIO, &conf->bio_split); > Here the limit is max single-page vectors, and the above way may not work, > such as: > > 0 ~ 254: each bvec's length is 512 > 255: bvec's length is 8192 > > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset(). Thanks for the deeper look! I guess it is because of how vcnt is calculated. > One solution is to add queue limit of max_single_page_bvec, and let > blk_queue_split() handle it. The path (blk_queue_split -> blk_bio_segment_split -> bvec_split_segs) respects the max_segments limit. Do you mean introducing a max_single_page_bvec limit, and then performing a similar check as for max_segments? Thanks, Guoqing ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors 2021-08-16 6:27 ` Guoqing Jiang @ 2021-08-16 7:13 ` Ming Lei 0 siblings, 0 replies; 17+ messages in thread From: Ming Lei @ 2021-08-16 7:13 UTC (permalink / raw) To: Guoqing Jiang; +Cc: Christoph Hellwig, song, linux-raid, jens, linux-block On Mon, Aug 16, 2021 at 02:27:48PM +0800, Guoqing Jiang wrote: > Hi Ming and Christoph, > > On 8/14/21 4:57 PM, Ming Lei wrote: > > On Sat, Aug 14, 2021 at 08:55:21AM +0100, Christoph Hellwig wrote: > > > On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote: > > > > Ok, thanks. > > > > > > > > > In general the size of a bio only depends on the number of vectors, not > > > > > the total I/O size. But alloc_behind_master_bio allocates new backing > > > > > pages using order 0 allocations, so in this exceptional case the total > > > > > size oes actually matter. > > > > > > > > > > While we're at it: this huge memory allocation looks really deadlock > > > > > prone. > > > > Hmm, let me think more about it, or could you share your thought? ???? > > > Well, you'd need a mempool which can fit the max payload of a bio, > > > that is BIO_MAX_VECS pages. > > IIUC, the behind bio is allocated from bio_set (mddev->bio_set) which is > allocated in md_run by > call bioset_init, so the mempool (bvec_pool) of this bio_set is created by > biovec_init_pool which > uses global biovec slabs. Do we really need another mempool? Or, there is no > potential deadlock > for this case. > > > > FYI, this is what I'd do instead of this patch for now. We don't really > > > need a vetor per sector, just per page. So this limits the I/O > > > size a little less. 
> > > > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > > > index 3c44c4bb40fc..5b27d995302e 100644 > > > --- a/drivers/md/raid1.c > > > +++ b/drivers/md/raid1.c > > > @@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, > > > goto retry_write; > > > } > > > + /* > > > + * When using a bitmap, we may call alloc_behind_master_bio below. > > > + * alloc_behind_master_bio allocates a copy of the data payload a page > > > + * at a time and thus needs a new bio that can fit the whole payload > > > + * this bio in page sized chunks. > > > + */ > > Thanks for the above, will copy it accordingly. I will check if WriteMostly > is set before, then check both > the flag and bitmap. > > > > + if (bitmap) > > > + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE); > > s/PAGE_SIZE/PAGE_SECTORS > > Agree. > > > > + > > > if (max_sectors < bio_sectors(bio)) { > > > struct bio *split = bio_split(bio, max_sectors, > > > GFP_NOIO, &conf->bio_split); > > Here the limit is max single-page vectors, and the above way may not work, > > such as:ust splitted and not > > > > 0 ~ 254: each bvec's length is 512 > > 255: bvec's length is 8192 > > > > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it > > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset(). > > Thanks for deeper looking! I guess it is because how vcnt is calculated. > > > One solution is to add queue limit of max_single_page_bvec, and let > > blk_queue_split() handle it. > > The path (blk_queue_split -> blk_bio_segment_split -> bvec_split_segs) which > respects max_segments > of limit. Do you mean introduce max_single_page_bvec to limit? Then perform > similar checking as for > max_segment. Yes, then the bio is guaranteed to not reach max single-page bvec limit, just like what __blk_queue_bounce() does. thanks, Ming ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors 2021-08-14 8:57 ` Ming Lei 2021-08-16 6:27 ` Guoqing Jiang @ 2021-08-16 9:37 ` Christoph Hellwig 2021-08-16 11:40 ` Ming Lei 1 sibling, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2021-08-16 9:37 UTC (permalink / raw) To: Ming Lei Cc: Christoph Hellwig, Guoqing Jiang, song, linux-raid, jens, linux-block On Sat, Aug 14, 2021 at 04:57:06PM +0800, Ming Lei wrote: > > + if (bitmap) > > + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE); > > s/PAGE_SIZE/PAGE_SECTORS Yeah, max_sectors is in size units, I messed that up. > > > + > > if (max_sectors < bio_sectors(bio)) { > > struct bio *split = bio_split(bio, max_sectors, > > GFP_NOIO, &conf->bio_split); > > > > Here the limit is max single-page vectors, and the above way may not work, > such as: > > 0 ~ 254: each bvec's length is 512 > 255: bvec's length is 8192 > > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset(). Yes, we still need the rounding magic that alloc_behind_master_bio uses here. > One solution is to add queue limit of max_single_page_bvec, and let > blk_queue_split() handle it. I'd rather not bloat the core with this. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors 2021-08-16 9:37 ` Christoph Hellwig @ 2021-08-16 11:40 ` Ming Lei 2021-08-17 5:06 ` Christoph Hellwig 0 siblings, 1 reply; 17+ messages in thread From: Ming Lei @ 2021-08-16 11:40 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, jens, linux-block On Mon, Aug 16, 2021 at 10:37:54AM +0100, Christoph Hellwig wrote: > On Sat, Aug 14, 2021 at 04:57:06PM +0800, Ming Lei wrote: > > > + if (bitmap) > > > + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE); > > > > s/PAGE_SIZE/PAGE_SECTORS > > Yeah, max_sectors is in size units, I messed that up. > > > > > > + > > > if (max_sectors < bio_sectors(bio)) { > > > struct bio *split = bio_split(bio, max_sectors, > > > GFP_NOIO, &conf->bio_split); > > > > > > > Here the limit is max single-page vectors, and the above way may not work, > > such as: > > > > 0 ~ 254: each bvec's length is 512 > > 255: bvec's length is 8192 > > > > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it > > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset(). > > Yes, we still need the rounding magic that alloc_behind_master_bio uses > here. But it is wrong to use max sectors to limit number of bvecs(segments), isn't it? Thanks, Ming ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors 2021-08-16 11:40 ` Ming Lei @ 2021-08-17 5:06 ` Christoph Hellwig 2021-08-17 12:32 ` Ming Lei 0 siblings, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2021-08-17 5:06 UTC (permalink / raw) To: Ming Lei Cc: Christoph Hellwig, Guoqing Jiang, song, linux-raid, jens, linux-block On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote: > > > > > > 0 ~ 254: each bvec's length is 512 > > > 255: bvec's length is 8192 > > > > > > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it > > > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset(). > > > > Yes, we still need the rounding magic that alloc_behind_master_bio uses > > here. > > But it is wrong to use max sectors to limit number of bvecs(segments), isn't it? The raid1 write behind code cares about the size of a bio it can reach by adding order 0 pages to it. The bvecs are part of that and I think the calculation in the patch documents that as well. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors 2021-08-17 5:06 ` Christoph Hellwig @ 2021-08-17 12:32 ` Ming Lei 2021-09-24 15:34 ` Jens Stutte (Archiv) 0 siblings, 1 reply; 17+ messages in thread From: Ming Lei @ 2021-08-17 12:32 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, jens, linux-block On Tue, Aug 17, 2021 at 06:06:12AM +0100, Christoph Hellwig wrote: > On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote: > > > > > > > > 0 ~ 254: each bvec's length is 512 > > > > 255: bvec's length is 8192 > > > > > > > > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it > > > > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset(). > > > > > > Yes, we still need the rounding magic that alloc_behind_master_bio uses > > > here. > > > > But it is wrong to use max sectors to limit number of bvecs(segments), isn't it? > > The raid1 write behind code cares about the size ofa bio it can reach by > adding order 0 pages to it. The bvecs are part of that and I think the > calculation in the patch documents that a well. Thinking about it further, your and Guoqing's patches are correct and sufficient, since bio_copy_data() just copies the byte (sector) stream from the fs bio to the write-behind bio. Thanks, Ming ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors 2021-08-17 12:32 ` Ming Lei @ 2021-09-24 15:34 ` Jens Stutte (Archiv) 2021-09-25 23:02 ` Guoqing Jiang 0 siblings, 1 reply; 17+ messages in thread From: Jens Stutte (Archiv) @ 2021-09-24 15:34 UTC (permalink / raw) To: Ming Lei, Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, linux-block [-- Attachment #1: Type: text/plain, Size: 3628 bytes --] Hi all, I just had the occasion to test the new patch as landed in arch linux 5.14.7. Unfortunately it does not work for me. Attached you can find a modification that works for me, though I am not really sure why write_behind seems not to be set to true on my configuration. If there is any more data I can provide to help you to investigate, please let me know. Thanks for any clues, Jens My configuration: [root@vdr jens]# mdadm --detail -v /dev/md0 /dev/md0: Version : 1.2 Creation Time : Fri Dec 26 09:50:53 2014 Raid Level : raid1 Array Size : 1953381440 (1862.89 GiB 2000.26 GB) Used Dev Size : 1953381440 (1862.89 GiB 2000.26 GB) Raid Devices : 2 Total Devices : 2 Persistence : Superblock is persistent Intent Bitmap : Internal Update Time : Fri Sep 24 17:30:51 2021 State : active Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Consistency Policy : bitmap Name : vdr:0 (local to host vdr) UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f Events : 32805 Number Major Minor RaidDevice State 2 8 33 0 active sync /dev/sdc1 3 8 17 1 active sync /dev/sdb1 [root@vdr jens]# mdadm -X /dev/sdb1 Filename : /dev/sdb1 Magic : 6d746962 Version : 4 UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f Events : 32804 Events Cleared : 32804 State : OK Chunksize : 64 MB Daemon : 5s flush period Write Mode : Allow write behind, max 4096 Sync Size : 1953381440 (1862.89 GiB 2000.26 GB) Bitmap : 29807 bits (chunks), 3 dirty (0.0%) [root@vdr jens]# mdadm -X /dev/sdc1 Filename : /dev/sdc1 Magic : 6d746962 Version : 4 UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f 
Events : 32804 Events Cleared : 32804 State : OK Chunksize : 64 MB Daemon : 5s flush period Write Mode : Allow write behind, max 4096 Sync Size : 1953381440 (1862.89 GiB 2000.26 GB) Bitmap : 29807 bits (chunks), 3 dirty (0.0%) On 17.08.21 at 14:32, Ming Lei wrote: > On Tue, Aug 17, 2021 at 06:06:12AM +0100, Christoph Hellwig wrote: >> On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote: >>>>> 0 ~ 254: each bvec's length is 512 >>>>> 255: bvec's length is 8192 >>>>> >>>>> the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it >>>>> still may need 257 bvecs, which can't be allocated via bio_alloc_bioset(). >>>> Yes, we still need the rounding magic that alloc_behind_master_bio uses >>>> here. >>> But it is wrong to use max sectors to limit number of bvecs(segments), isn't it? >> The raid1 write behind code cares about the size ofa bio it can reach by >> adding order 0 pages to it. The bvecs are part of that and I think the >> calculation in the patch documents that a well. > Thinking of further, your and Guoqing's patch are correct & enough since > bio_copy_data() just copies bytes(sectors) stream from fs bio to the > write behind bio.
> > > Thanks, > Ming > [-- Attachment #2: raid1.patch --] [-- Type: text/x-patch, Size: 2245 bytes --] diff --unified --recursive --text archlinux-linux/drivers/md/raid1.c archlinux-linux-diff/drivers/md/raid1.c --- archlinux-linux/drivers/md/raid1.c 2021-09-24 14:37:15.347771866 +0200 +++ archlinux-linux-diff/drivers/md/raid1.c 2021-09-24 14:40:02.443978319 +0200 @@ -1501,7 +1501,7 @@ * Not if there are too many, or cannot * allocate memory, or a reader on WriteMostly * is waiting for behind writes to flush */ - if (bitmap && + if (bitmap && write_behind && (atomic_read(&bitmap->behind_writes) < mddev->bitmap_info.max_write_behind) && !waitqueue_active(&bitmap->behind_wait)) { diff --unified --recursive --text archlinux-linux/drivers/md/raid1.c archlinux-linux-changed/drivers/md/raid1.c --- archlinux-linux/drivers/md/raid1.c 2021-09-24 15:43:22.842680119 +0200 +++ archlinux-linux-changed/drivers/md/raid1.c 2021-09-24 15:43:59.426142955 +0200 @@ -1329,7 +1329,6 @@ struct raid1_plug_cb *plug = NULL; int first_clone; int max_sectors; - bool write_behind = false; if (mddev_is_clustered(mddev) && md_cluster_ops->area_resyncing(mddev, WRITE, @@ -1383,14 +1382,6 @@ for (i = 0; i < disks; i++) { struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev); - /* - * The write-behind io is only attempted on drives marked as - * write-mostly, which means we could allocate write behind - * bio later. - */ - if (rdev && test_bit(WriteMostly, &rdev->flags)) - write_behind = true; - if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) { atomic_inc(&rdev->nr_pending); blocked_rdev = rdev; @@ -1470,7 +1461,7 @@ * at a time and thus needs a new bio that can fit the whole payload * this bio in page sized chunks. 
*/ - if (write_behind && bitmap) + if (bitmap) max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * (PAGE_SIZE >> 9)); if (max_sectors < bio_sectors(bio)) { @@ -1501,7 +1492,7 @@ * Not if there are too many, or cannot * allocate memory, or a reader on WriteMostly * is waiting for behind writes to flush */ - if (bitmap && + if (bitmap && (atomic_read(&bitmap->behind_writes) < mddev->bitmap_info.max_write_behind) && !waitqueue_active(&bitmap->behind_wait)) { ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors 2021-09-24 15:34 ` Jens Stutte (Archiv) @ 2021-09-25 23:02 ` Guoqing Jiang 0 siblings, 0 replies; 17+ messages in thread From: Guoqing Jiang @ 2021-09-25 23:02 UTC (permalink / raw) To: Jens Stutte (Archiv), Ming Lei, Christoph Hellwig Cc: song, linux-raid, linux-block On 9/24/21 11:34 PM, Jens Stutte (Archiv) wrote: > Hi all, > > I just had the occasion to test the new patch as landed in arch linux > 5.14.7. Unfortunately it does not work for me. Attached you can find a > modification that works for me, though I am not really sure why > write_behind seems not to be set to true on my configuration. If there > is any more data I can provide to help you to investigate, please let > me know. Thanks for the report! As commented in bugzilla, this is because write-behind IO still happens even without a write-mostly device. I will send a new patch after you confirm it works. 1. https://bugzilla.kernel.org/show_bug.cgi?id=213181 Thanks, Guoqing ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors 2021-08-13 6:05 [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors Guoqing Jiang @ 2021-08-13 9:27 ` kernel test robot 2021-08-13 9:27 ` kernel test robot 2021-08-13 10:12 ` kernel test robot 2 siblings, 0 replies; 17+ messages in thread From: kernel test robot @ 2021-08-13 9:27 UTC (permalink / raw) To: Guoqing Jiang, song; +Cc: clang-built-linux, kbuild-all, linux-raid, jens [-- Attachment #1: Type: text/plain, Size: 10658 bytes --] Hi Guoqing, Thank you for the patch! Yet something to improve: [auto build test ERROR on song-md/md-next] [also build test ERROR on v5.14-rc5 next-20210812] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-bio-doesn-t-have-more-than-BIO_MAX_VECS-sectors/20210813-140810 base: git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next config: hexagon-randconfig-r001-20210813 (attached as .config) compiler: clang version 12.0.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/29b7720a83de1deea0d8ecfafe0db46146636b15 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-bio-doesn-t-have-more-than-BIO_MAX_VECS-sectors/20210813-140810 git checkout 29b7720a83de1deea0d8ecfafe0db46146636b15 # save the attached .config to linux build tree mkdir build_dir COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir ARCH=hexagon SHELL=/bin/bash drivers/md/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): >> 
drivers/md/raid1.c:1459:55: error: use of undeclared identifier 'PAGE_SECTORS' max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS); ^ >> drivers/md/raid1.c:1459:55: error: use of undeclared identifier 'PAGE_SECTORS' 2 errors generated. vim +/PAGE_SECTORS +1459 drivers/md/raid1.c 1320 1321 static void raid1_write_request(struct mddev *mddev, struct bio *bio, 1322 int max_write_sectors) 1323 { 1324 struct r1conf *conf = mddev->private; 1325 struct r1bio *r1_bio; 1326 int i, disks; 1327 struct bitmap *bitmap = mddev->bitmap; 1328 unsigned long flags; 1329 struct md_rdev *blocked_rdev; 1330 struct blk_plug_cb *cb; 1331 struct raid1_plug_cb *plug = NULL; 1332 int first_clone; 1333 int max_sectors; 1334 1335 if (mddev_is_clustered(mddev) && 1336 md_cluster_ops->area_resyncing(mddev, WRITE, 1337 bio->bi_iter.bi_sector, bio_end_sector(bio))) { 1338 1339 DEFINE_WAIT(w); 1340 for (;;) { 1341 prepare_to_wait(&conf->wait_barrier, 1342 &w, TASK_IDLE); 1343 if (!md_cluster_ops->area_resyncing(mddev, WRITE, 1344 bio->bi_iter.bi_sector, 1345 bio_end_sector(bio))) 1346 break; 1347 schedule(); 1348 } 1349 finish_wait(&conf->wait_barrier, &w); 1350 } 1351 1352 /* 1353 * Register the new request and wait if the reconstruction 1354 * thread has put up a bar for new requests. 1355 * Continue immediately if no resync is active currently. 1356 */ 1357 wait_barrier(conf, bio->bi_iter.bi_sector); 1358 1359 r1_bio = alloc_r1bio(mddev, bio); 1360 r1_bio->sectors = max_write_sectors; 1361 1362 if (conf->pending_count >= max_queued_requests) { 1363 md_wakeup_thread(mddev->thread); 1364 raid1_log(mddev, "wait queued"); 1365 wait_event(conf->wait_barrier, 1366 conf->pending_count < max_queued_requests); 1367 } 1368 /* first select target devices under rcu_lock and 1369 * inc refcount on their rdev. 
Record them by setting 1370 * bios[x] to bio 1371 * If there are known/acknowledged bad blocks on any device on 1372 * which we have seen a write error, we want to avoid writing those 1373 * blocks. 1374 * This potentially requires several writes to write around 1375 * the bad blocks. Each set of writes gets it's own r1bio 1376 * with a set of bios attached. 1377 */ 1378 1379 disks = conf->raid_disks * 2; 1380 retry_write: 1381 blocked_rdev = NULL; 1382 rcu_read_lock(); 1383 max_sectors = r1_bio->sectors; 1384 for (i = 0; i < disks; i++) { 1385 struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev); 1386 if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) { 1387 atomic_inc(&rdev->nr_pending); 1388 blocked_rdev = rdev; 1389 break; 1390 } 1391 r1_bio->bios[i] = NULL; 1392 if (!rdev || test_bit(Faulty, &rdev->flags)) { 1393 if (i < conf->raid_disks) 1394 set_bit(R1BIO_Degraded, &r1_bio->state); 1395 continue; 1396 } 1397 1398 atomic_inc(&rdev->nr_pending); 1399 if (test_bit(WriteErrorSeen, &rdev->flags)) { 1400 sector_t first_bad; 1401 int bad_sectors; 1402 int is_bad; 1403 1404 is_bad = is_badblock(rdev, r1_bio->sector, max_sectors, 1405 &first_bad, &bad_sectors); 1406 if (is_bad < 0) { 1407 /* mustn't write here until the bad block is 1408 * acknowledged*/ 1409 set_bit(BlockedBadBlocks, &rdev->flags); 1410 blocked_rdev = rdev; 1411 break; 1412 } 1413 if (is_bad && first_bad <= r1_bio->sector) { 1414 /* Cannot write here at all */ 1415 bad_sectors -= (r1_bio->sector - first_bad); 1416 if (bad_sectors < max_sectors) 1417 /* mustn't write more than bad_sectors 1418 * to other devices yet 1419 */ 1420 max_sectors = bad_sectors; 1421 rdev_dec_pending(rdev, mddev); 1422 /* We don't set R1BIO_Degraded as that 1423 * only applies if the disk is 1424 * missing, so it might be re-added, 1425 * and we want to know to recover this 1426 * chunk. 
1427 * In this case the device is here, 1428 * and the fact that this chunk is not 1429 * in-sync is recorded in the bad 1430 * block log 1431 */ 1432 continue; 1433 } 1434 if (is_bad) { 1435 int good_sectors = first_bad - r1_bio->sector; 1436 if (good_sectors < max_sectors) 1437 max_sectors = good_sectors; 1438 } 1439 } 1440 r1_bio->bios[i] = bio; 1441 } 1442 rcu_read_unlock(); 1443 1444 if (unlikely(blocked_rdev)) { 1445 /* Wait for this device to become unblocked */ 1446 int j; 1447 1448 for (j = 0; j < i; j++) 1449 if (r1_bio->bios[j]) 1450 rdev_dec_pending(conf->mirrors[j].rdev, mddev); 1451 r1_bio->state = 0; 1452 allow_barrier(conf, bio->bi_iter.bi_sector); 1453 raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk); 1454 md_wait_for_blocked_rdev(blocked_rdev, mddev); 1455 wait_barrier(conf, bio->bi_iter.bi_sector); 1456 goto retry_write; 1457 } 1458 > 1459 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS); 1460 if (max_sectors < bio_sectors(bio)) { 1461 struct bio *split = bio_split(bio, max_sectors, 1462 GFP_NOIO, &conf->bio_split); 1463 bio_chain(split, bio); 1464 submit_bio_noacct(bio); 1465 bio = split; 1466 r1_bio->master_bio = bio; 1467 r1_bio->sectors = max_sectors; 1468 } 1469 1470 if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue)) 1471 r1_bio->start_time = bio_start_io_acct(bio); 1472 atomic_set(&r1_bio->remaining, 1); 1473 atomic_set(&r1_bio->behind_remaining, 0); 1474 1475 first_clone = 1; 1476 1477 for (i = 0; i < disks; i++) { 1478 struct bio *mbio = NULL; 1479 struct md_rdev *rdev = conf->mirrors[i].rdev; 1480 if (!r1_bio->bios[i]) 1481 continue; 1482 1483 if (first_clone) { 1484 /* do behind I/O ? 
1485 * Not if there are too many, or cannot 1486 * allocate memory, or a reader on WriteMostly 1487 * is waiting for behind writes to flush */ 1488 if (bitmap && 1489 (atomic_read(&bitmap->behind_writes) 1490 < mddev->bitmap_info.max_write_behind) && 1491 !waitqueue_active(&bitmap->behind_wait)) { 1492 alloc_behind_master_bio(r1_bio, bio); 1493 } 1494 1495 md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors, 1496 test_bit(R1BIO_BehindIO, &r1_bio->state)); 1497 first_clone = 0; 1498 } 1499 1500 if (r1_bio->behind_master_bio) 1501 mbio = bio_clone_fast(r1_bio->behind_master_bio, 1502 GFP_NOIO, &mddev->bio_set); 1503 else 1504 mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set); 1505 1506 if (r1_bio->behind_master_bio) { 1507 if (test_bit(CollisionCheck, &rdev->flags)) 1508 wait_for_serialization(rdev, r1_bio); 1509 if (test_bit(WriteMostly, &rdev->flags)) 1510 atomic_inc(&r1_bio->behind_remaining); 1511 } else if (mddev->serialize_policy) 1512 wait_for_serialization(rdev, r1_bio); 1513 1514 r1_bio->bios[i] = mbio; 1515 1516 mbio->bi_iter.bi_sector = (r1_bio->sector + 1517 conf->mirrors[i].rdev->data_offset); 1518 bio_set_dev(mbio, conf->mirrors[i].rdev->bdev); 1519 mbio->bi_end_io = raid1_end_write_request; 1520 mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA)); 1521 if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) && 1522 !test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) && 1523 conf->raid_disks - mddev->degraded > 1) 1524 mbio->bi_opf |= MD_FAILFAST; 1525 mbio->bi_private = r1_bio; 1526 1527 atomic_inc(&r1_bio->remaining); 1528 1529 if (mddev->gendisk) 1530 trace_block_bio_remap(mbio, disk_devt(mddev->gendisk), 1531 r1_bio->sector); 1532 /* flush_pending_writes() needs access to the rdev so...*/ 1533 mbio->bi_bdev = (void *)conf->mirrors[i].rdev; 1534 1535 cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug)); 1536 if (cb) 1537 plug = container_of(cb, struct raid1_plug_cb, cb); 1538 else 1539 plug = NULL; 1540 if (plug) { 
1541 bio_list_add(&plug->pending, mbio); 1542 plug->pending_cnt++; 1543 } else { 1544 spin_lock_irqsave(&conf->device_lock, flags); 1545 bio_list_add(&conf->pending_bio_list, mbio); 1546 conf->pending_count++; 1547 spin_unlock_irqrestore(&conf->device_lock, flags); 1548 md_wakeup_thread(mddev->thread); 1549 } 1550 } 1551 1552 r1_bio_write_done(r1_bio); 1553 1554 /* In case raid1d snuck in to freeze_array */ 1555 wake_up(&conf->wait_barrier); 1556 } 1557 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 28124 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
  2021-08-13  6:05 [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors Guoqing Jiang
@ 2021-08-13 10:12 ` kernel test robot
  2021-08-13  9:27 ` kernel test robot
  2021-08-13 10:12 ` kernel test robot
  2 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-08-13 10:12 UTC (permalink / raw)
To: Guoqing Jiang, song; +Cc: kbuild-all, linux-raid, jens

[-- Attachment #1: Type: text/plain, Size: 13412 bytes --]

Hi Guoqing,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on song-md/md-next]
[also build test ERROR on v5.14-rc5 next-20210812]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-bio-doesn-t-have-more-than-BIO_MAX_VECS-sectors/20210813-140810
base:   git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: nds32-randconfig-r035-20210813 (attached as .config)
compiler: nds32le-linux-gcc (GCC) 10.3.0
reproduce (this is a W=1 build):

        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/29b7720a83de1deea0d8ecfafe0db46146636b15
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-bio-doesn-t-have-more-than-BIO_MAX_VECS-sectors/20210813-140810
        git checkout 29b7720a83de1deea0d8ecfafe0db46146636b15
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross O=build_dir ARCH=nds32 SHELL=/bin/bash drivers/md/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

In file included from
include/linux/kernel.h:15, from include/asm-generic/bug.h:20, from ./arch/nds32/include/generated/asm/bug.h:1, from include/linux/bug.h:5, from include/linux/mmdebug.h:5, from include/linux/gfp.h:5, from include/linux/slab.h:15, from drivers/md/raid1.c:26: drivers/md/raid1.c: In function 'raid1_write_request': >> drivers/md/raid1.c:1459:55: error: 'PAGE_SECTORS' undeclared (first use in this function); did you mean 'PAGE_MEMORY'? 1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS); | ^~~~~~~~~~~~ include/linux/minmax.h:20:39: note: in definition of macro '__typecheck' 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^ include/linux/minmax.h:36:24: note: in expansion of macro '__safe_cmp' 36 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~ include/linux/minmax.h:104:27: note: in expansion of macro '__careful_cmp' 104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <) | ^~~~~~~~~~~~~ drivers/md/raid1.c:1459:16: note: in expansion of macro 'min_t' 1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS); | ^~~~~ drivers/md/raid1.c:1459:55: note: each undeclared identifier is reported only once for each function it appears in 1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS); | ^~~~~~~~~~~~ include/linux/minmax.h:20:39: note: in definition of macro '__typecheck' 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^ include/linux/minmax.h:36:24: note: in expansion of macro '__safe_cmp' 36 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~ include/linux/minmax.h:104:27: note: in expansion of macro '__careful_cmp' 104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <) | ^~~~~~~~~~~~~ drivers/md/raid1.c:1459:16: note: in expansion of macro 'min_t' 1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS); | ^~~~~ >> include/linux/minmax.h:36:2: error: first argument to '__builtin_choose_expr' not a constant 36 | 
__builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~~~~~~~~~~~~ include/linux/minmax.h:104:27: note: in expansion of macro '__careful_cmp' 104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <) | ^~~~~~~~~~~~~ drivers/md/raid1.c:1459:16: note: in expansion of macro 'min_t' 1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS); | ^~~~~ vim +1459 drivers/md/raid1.c 1320 1321 static void raid1_write_request(struct mddev *mddev, struct bio *bio, 1322 int max_write_sectors) 1323 { 1324 struct r1conf *conf = mddev->private; 1325 struct r1bio *r1_bio; 1326 int i, disks; 1327 struct bitmap *bitmap = mddev->bitmap; 1328 unsigned long flags; 1329 struct md_rdev *blocked_rdev; 1330 struct blk_plug_cb *cb; 1331 struct raid1_plug_cb *plug = NULL; 1332 int first_clone; 1333 int max_sectors; 1334 1335 if (mddev_is_clustered(mddev) && 1336 md_cluster_ops->area_resyncing(mddev, WRITE, 1337 bio->bi_iter.bi_sector, bio_end_sector(bio))) { 1338 1339 DEFINE_WAIT(w); 1340 for (;;) { 1341 prepare_to_wait(&conf->wait_barrier, 1342 &w, TASK_IDLE); 1343 if (!md_cluster_ops->area_resyncing(mddev, WRITE, 1344 bio->bi_iter.bi_sector, 1345 bio_end_sector(bio))) 1346 break; 1347 schedule(); 1348 } 1349 finish_wait(&conf->wait_barrier, &w); 1350 } 1351 1352 /* 1353 * Register the new request and wait if the reconstruction 1354 * thread has put up a bar for new requests. 1355 * Continue immediately if no resync is active currently. 1356 */ 1357 wait_barrier(conf, bio->bi_iter.bi_sector); 1358 1359 r1_bio = alloc_r1bio(mddev, bio); 1360 r1_bio->sectors = max_write_sectors; 1361 1362 if (conf->pending_count >= max_queued_requests) { 1363 md_wakeup_thread(mddev->thread); 1364 raid1_log(mddev, "wait queued"); 1365 wait_event(conf->wait_barrier, 1366 conf->pending_count < max_queued_requests); 1367 } 1368 /* first select target devices under rcu_lock and 1369 * inc refcount on their rdev. 
Record them by setting 1370 * bios[x] to bio 1371 * If there are known/acknowledged bad blocks on any device on 1372 * which we have seen a write error, we want to avoid writing those 1373 * blocks. 1374 * This potentially requires several writes to write around 1375 * the bad blocks. Each set of writes gets it's own r1bio 1376 * with a set of bios attached. 1377 */ 1378 1379 disks = conf->raid_disks * 2; 1380 retry_write: 1381 blocked_rdev = NULL; 1382 rcu_read_lock(); 1383 max_sectors = r1_bio->sectors; 1384 for (i = 0; i < disks; i++) { 1385 struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev); 1386 if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) { 1387 atomic_inc(&rdev->nr_pending); 1388 blocked_rdev = rdev; 1389 break; 1390 } 1391 r1_bio->bios[i] = NULL; 1392 if (!rdev || test_bit(Faulty, &rdev->flags)) { 1393 if (i < conf->raid_disks) 1394 set_bit(R1BIO_Degraded, &r1_bio->state); 1395 continue; 1396 } 1397 1398 atomic_inc(&rdev->nr_pending); 1399 if (test_bit(WriteErrorSeen, &rdev->flags)) { 1400 sector_t first_bad; 1401 int bad_sectors; 1402 int is_bad; 1403 1404 is_bad = is_badblock(rdev, r1_bio->sector, max_sectors, 1405 &first_bad, &bad_sectors); 1406 if (is_bad < 0) { 1407 /* mustn't write here until the bad block is 1408 * acknowledged*/ 1409 set_bit(BlockedBadBlocks, &rdev->flags); 1410 blocked_rdev = rdev; 1411 break; 1412 } 1413 if (is_bad && first_bad <= r1_bio->sector) { 1414 /* Cannot write here at all */ 1415 bad_sectors -= (r1_bio->sector - first_bad); 1416 if (bad_sectors < max_sectors) 1417 /* mustn't write more than bad_sectors 1418 * to other devices yet 1419 */ 1420 max_sectors = bad_sectors; 1421 rdev_dec_pending(rdev, mddev); 1422 /* We don't set R1BIO_Degraded as that 1423 * only applies if the disk is 1424 * missing, so it might be re-added, 1425 * and we want to know to recover this 1426 * chunk. 
1427 * In this case the device is here, 1428 * and the fact that this chunk is not 1429 * in-sync is recorded in the bad 1430 * block log 1431 */ 1432 continue; 1433 } 1434 if (is_bad) { 1435 int good_sectors = first_bad - r1_bio->sector; 1436 if (good_sectors < max_sectors) 1437 max_sectors = good_sectors; 1438 } 1439 } 1440 r1_bio->bios[i] = bio; 1441 } 1442 rcu_read_unlock(); 1443 1444 if (unlikely(blocked_rdev)) { 1445 /* Wait for this device to become unblocked */ 1446 int j; 1447 1448 for (j = 0; j < i; j++) 1449 if (r1_bio->bios[j]) 1450 rdev_dec_pending(conf->mirrors[j].rdev, mddev); 1451 r1_bio->state = 0; 1452 allow_barrier(conf, bio->bi_iter.bi_sector); 1453 raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk); 1454 md_wait_for_blocked_rdev(blocked_rdev, mddev); 1455 wait_barrier(conf, bio->bi_iter.bi_sector); 1456 goto retry_write; 1457 } 1458 > 1459 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS); 1460 if (max_sectors < bio_sectors(bio)) { 1461 struct bio *split = bio_split(bio, max_sectors, 1462 GFP_NOIO, &conf->bio_split); 1463 bio_chain(split, bio); 1464 submit_bio_noacct(bio); 1465 bio = split; 1466 r1_bio->master_bio = bio; 1467 r1_bio->sectors = max_sectors; 1468 } 1469 1470 if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue)) 1471 r1_bio->start_time = bio_start_io_acct(bio); 1472 atomic_set(&r1_bio->remaining, 1); 1473 atomic_set(&r1_bio->behind_remaining, 0); 1474 1475 first_clone = 1; 1476 1477 for (i = 0; i < disks; i++) { 1478 struct bio *mbio = NULL; 1479 struct md_rdev *rdev = conf->mirrors[i].rdev; 1480 if (!r1_bio->bios[i]) 1481 continue; 1482 1483 if (first_clone) { 1484 /* do behind I/O ? 
1485 * Not if there are too many, or cannot 1486 * allocate memory, or a reader on WriteMostly 1487 * is waiting for behind writes to flush */ 1488 if (bitmap && 1489 (atomic_read(&bitmap->behind_writes) 1490 < mddev->bitmap_info.max_write_behind) && 1491 !waitqueue_active(&bitmap->behind_wait)) { 1492 alloc_behind_master_bio(r1_bio, bio); 1493 } 1494 1495 md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors, 1496 test_bit(R1BIO_BehindIO, &r1_bio->state)); 1497 first_clone = 0; 1498 } 1499 1500 if (r1_bio->behind_master_bio) 1501 mbio = bio_clone_fast(r1_bio->behind_master_bio, 1502 GFP_NOIO, &mddev->bio_set); 1503 else 1504 mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set); 1505 1506 if (r1_bio->behind_master_bio) { 1507 if (test_bit(CollisionCheck, &rdev->flags)) 1508 wait_for_serialization(rdev, r1_bio); 1509 if (test_bit(WriteMostly, &rdev->flags)) 1510 atomic_inc(&r1_bio->behind_remaining); 1511 } else if (mddev->serialize_policy) 1512 wait_for_serialization(rdev, r1_bio); 1513 1514 r1_bio->bios[i] = mbio; 1515 1516 mbio->bi_iter.bi_sector = (r1_bio->sector + 1517 conf->mirrors[i].rdev->data_offset); 1518 bio_set_dev(mbio, conf->mirrors[i].rdev->bdev); 1519 mbio->bi_end_io = raid1_end_write_request; 1520 mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA)); 1521 if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) && 1522 !test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) && 1523 conf->raid_disks - mddev->degraded > 1) 1524 mbio->bi_opf |= MD_FAILFAST; 1525 mbio->bi_private = r1_bio; 1526 1527 atomic_inc(&r1_bio->remaining); 1528 1529 if (mddev->gendisk) 1530 trace_block_bio_remap(mbio, disk_devt(mddev->gendisk), 1531 r1_bio->sector); 1532 /* flush_pending_writes() needs access to the rdev so...*/ 1533 mbio->bi_bdev = (void *)conf->mirrors[i].rdev; 1534 1535 cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug)); 1536 if (cb) 1537 plug = container_of(cb, struct raid1_plug_cb, cb); 1538 else 1539 plug = NULL; 1540 if (plug) { 
1541 bio_list_add(&plug->pending, mbio); 1542 plug->pending_cnt++; 1543 } else { 1544 spin_lock_irqsave(&conf->device_lock, flags); 1545 bio_list_add(&conf->pending_bio_list, mbio); 1546 conf->pending_count++; 1547 spin_unlock_irqrestore(&conf->device_lock, flags); 1548 md_wakeup_thread(mddev->thread); 1549 } 1550 } 1551 1552 r1_bio_write_done(r1_bio); 1553 1554 /* In case raid1d snuck in to freeze_array */ 1555 wake_up(&conf->wait_barrier); 1556 } 1557 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 30071 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors @ 2021-08-13 10:12 ` kernel test robot 0 siblings, 0 replies; 17+ messages in thread From: kernel test robot @ 2021-08-13 10:12 UTC (permalink / raw) To: kbuild-all [-- Attachment #1: Type: text/plain, Size: 13738 bytes --] Hi Guoqing, Thank you for the patch! Yet something to improve: [auto build test ERROR on song-md/md-next] [also build test ERROR on v5.14-rc5 next-20210812] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-bio-doesn-t-have-more-than-BIO_MAX_VECS-sectors/20210813-140810 base: git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next config: nds32-randconfig-r035-20210813 (attached as .config) compiler: nds32le-linux-gcc (GCC) 10.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/29b7720a83de1deea0d8ecfafe0db46146636b15 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-bio-doesn-t-have-more-than-BIO_MAX_VECS-sectors/20210813-140810 git checkout 29b7720a83de1deea0d8ecfafe0db46146636b15 # save the attached .config to linux build tree mkdir build_dir COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross O=build_dir ARCH=nds32 SHELL=/bin/bash drivers/md/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): In file included from include/linux/kernel.h:15, from include/asm-generic/bug.h:20, from ./arch/nds32/include/generated/asm/bug.h:1, from include/linux/bug.h:5, from include/linux/mmdebug.h:5, from include/linux/gfp.h:5, from 
include/linux/slab.h:15, from drivers/md/raid1.c:26: drivers/md/raid1.c: In function 'raid1_write_request': >> drivers/md/raid1.c:1459:55: error: 'PAGE_SECTORS' undeclared (first use in this function); did you mean 'PAGE_MEMORY'? 1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS); | ^~~~~~~~~~~~ include/linux/minmax.h:20:39: note: in definition of macro '__typecheck' 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^ include/linux/minmax.h:36:24: note: in expansion of macro '__safe_cmp' 36 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~ include/linux/minmax.h:104:27: note: in expansion of macro '__careful_cmp' 104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <) | ^~~~~~~~~~~~~ drivers/md/raid1.c:1459:16: note: in expansion of macro 'min_t' 1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS); | ^~~~~ drivers/md/raid1.c:1459:55: note: each undeclared identifier is reported only once for each function it appears in 1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS); | ^~~~~~~~~~~~ include/linux/minmax.h:20:39: note: in definition of macro '__typecheck' 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^ include/linux/minmax.h:36:24: note: in expansion of macro '__safe_cmp' 36 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~ include/linux/minmax.h:104:27: note: in expansion of macro '__careful_cmp' 104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <) | ^~~~~~~~~~~~~ drivers/md/raid1.c:1459:16: note: in expansion of macro 'min_t' 1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS); | ^~~~~ >> include/linux/minmax.h:36:2: error: first argument to '__builtin_choose_expr' not a constant 36 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~~~~~~~~~~~~ include/linux/minmax.h:104:27: note: in expansion of macro '__careful_cmp' 104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <) | ^~~~~~~~~~~~~ 
   drivers/md/raid1.c:1459:16: note: in expansion of macro 'min_t'
    1459 |  max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
         |                ^~~~~

vim +1459 drivers/md/raid1.c

  1320	
  1321	static void raid1_write_request(struct mddev *mddev, struct bio *bio,
  1322					int max_write_sectors)
  1323	{
  1324		struct r1conf *conf = mddev->private;
  1325		struct r1bio *r1_bio;
  1326		int i, disks;
  1327		struct bitmap *bitmap = mddev->bitmap;
  1328		unsigned long flags;
  1329		struct md_rdev *blocked_rdev;
  1330		struct blk_plug_cb *cb;
  1331		struct raid1_plug_cb *plug = NULL;
  1332		int first_clone;
  1333		int max_sectors;
  1334	
  1335		if (mddev_is_clustered(mddev) &&
  1336		     md_cluster_ops->area_resyncing(mddev, WRITE,
  1337			     bio->bi_iter.bi_sector, bio_end_sector(bio))) {
  1338	
  1339			DEFINE_WAIT(w);
  1340			for (;;) {
  1341				prepare_to_wait(&conf->wait_barrier,
  1342						&w, TASK_IDLE);
  1343				if (!md_cluster_ops->area_resyncing(mddev, WRITE,
  1344								 bio->bi_iter.bi_sector,
  1345								 bio_end_sector(bio)))
  1346					break;
  1347				schedule();
  1348			}
  1349			finish_wait(&conf->wait_barrier, &w);
  1350		}
  1351	
  1352		/*
  1353		 * Register the new request and wait if the reconstruction
  1354		 * thread has put up a bar for new requests.
  1355		 * Continue immediately if no resync is active currently.
  1356		 */
  1357		wait_barrier(conf, bio->bi_iter.bi_sector);
  1358	
  1359		r1_bio = alloc_r1bio(mddev, bio);
  1360		r1_bio->sectors = max_write_sectors;
  1361	
  1362		if (conf->pending_count >= max_queued_requests) {
  1363			md_wakeup_thread(mddev->thread);
  1364			raid1_log(mddev, "wait queued");
  1365			wait_event(conf->wait_barrier,
  1366			 	   conf->pending_count < max_queued_requests);
  1367		}
  1368		/* first select target devices under rcu_lock and
  1369		 * inc refcount on their rdev. Record them by setting
  1370		 * bios[x] to bio
  1371		 * If there are known/acknowledged bad blocks on any device on
  1372		 * which we have seen a write error, we want to avoid writing those
  1373		 * blocks.
  1374		 * This potentially requires several writes to write around
  1375		 * the bad blocks. Each set of writes gets it's own r1bio
  1376		 * with a set of bios attached.
  1377		 */
  1378	
  1379		disks = conf->raid_disks * 2;
  1380	 retry_write:
  1381		blocked_rdev = NULL;
  1382		rcu_read_lock();
  1383		max_sectors = r1_bio->sectors;
  1384		for (i = 0; i < disks; i++) {
  1385			struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
  1386			if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
  1387				atomic_inc(&rdev->nr_pending);
  1388				blocked_rdev = rdev;
  1389				break;
  1390			}
  1391			r1_bio->bios[i] = NULL;
  1392			if (!rdev || test_bit(Faulty, &rdev->flags)) {
  1393				if (i < conf->raid_disks)
  1394					set_bit(R1BIO_Degraded, &r1_bio->state);
  1395				continue;
  1396			}
  1397	
  1398			atomic_inc(&rdev->nr_pending);
  1399			if (test_bit(WriteErrorSeen, &rdev->flags)) {
  1400				sector_t first_bad;
  1401				int bad_sectors;
  1402				int is_bad;
  1403	
  1404				is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
  1405						     &first_bad, &bad_sectors);
  1406				if (is_bad < 0) {
  1407					/* mustn't write here until the bad block is
  1408					 * acknowledged*/
  1409					set_bit(BlockedBadBlocks, &rdev->flags);
  1410					blocked_rdev = rdev;
  1411					break;
  1412				}
  1413				if (is_bad && first_bad <= r1_bio->sector) {
  1414					/* Cannot write here at all */
  1415					bad_sectors -= (r1_bio->sector - first_bad);
  1416					if (bad_sectors < max_sectors)
  1417						/* mustn't write more than bad_sectors
  1418						 * to other devices yet
  1419						 */
  1420						max_sectors = bad_sectors;
  1421					rdev_dec_pending(rdev, mddev);
  1422					/* We don't set R1BIO_Degraded as that
  1423					 * only applies if the disk is
  1424					 * missing, so it might be re-added,
  1425					 * and we want to know to recover this
  1426					 * chunk.
  1427					 * In this case the device is here,
  1428					 * and the fact that this chunk is not
  1429					 * in-sync is recorded in the bad
  1430					 * block log
  1431					 */
  1432					continue;
  1433				}
  1434				if (is_bad) {
  1435					int good_sectors = first_bad - r1_bio->sector;
  1436					if (good_sectors < max_sectors)
  1437						max_sectors = good_sectors;
  1438				}
  1439			}
  1440			r1_bio->bios[i] = bio;
  1441		}
  1442		rcu_read_unlock();
  1443	
  1444		if (unlikely(blocked_rdev)) {
  1445			/* Wait for this device to become unblocked */
  1446			int j;
  1447	
  1448			for (j = 0; j < i; j++)
  1449				if (r1_bio->bios[j])
  1450					rdev_dec_pending(conf->mirrors[j].rdev, mddev);
  1451			r1_bio->state = 0;
  1452			allow_barrier(conf, bio->bi_iter.bi_sector);
  1453			raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
  1454			md_wait_for_blocked_rdev(blocked_rdev, mddev);
  1455			wait_barrier(conf, bio->bi_iter.bi_sector);
  1456			goto retry_write;
  1457		}
  1458	
> 1459		max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
  1460		if (max_sectors < bio_sectors(bio)) {
  1461			struct bio *split = bio_split(bio, max_sectors,
  1462						      GFP_NOIO, &conf->bio_split);
  1463			bio_chain(split, bio);
  1464			submit_bio_noacct(bio);
  1465			bio = split;
  1466			r1_bio->master_bio = bio;
  1467			r1_bio->sectors = max_sectors;
  1468		}
  1469	
  1470		if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
  1471			r1_bio->start_time = bio_start_io_acct(bio);
  1472		atomic_set(&r1_bio->remaining, 1);
  1473		atomic_set(&r1_bio->behind_remaining, 0);
  1474	
  1475		first_clone = 1;
  1476	
  1477		for (i = 0; i < disks; i++) {
  1478			struct bio *mbio = NULL;
  1479			struct md_rdev *rdev = conf->mirrors[i].rdev;
  1480			if (!r1_bio->bios[i])
  1481				continue;
  1482	
  1483		 	if (first_clone) {
  1484			 	/* do behind I/O ?
  1485			 	 * Not if there are too many, or cannot
  1486				 * allocate memory, or a reader on WriteMostly
  1487				 * is waiting for behind writes to flush */
  1488			 	if (bitmap &&
  1489			 	    (atomic_read(&bitmap->behind_writes)
  1490			 	     < mddev->bitmap_info.max_write_behind) &&
  1491			 	    !waitqueue_active(&bitmap->behind_wait)) {
  1492			 		alloc_behind_master_bio(r1_bio, bio);
  1493			 	}
  1494	
  1495			 	md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors,
  1496			 			     test_bit(R1BIO_BehindIO, &r1_bio->state));
  1497			 	first_clone = 0;
  1498		 	}
  1499	
  1500			if (r1_bio->behind_master_bio)
  1501				mbio = bio_clone_fast(r1_bio->behind_master_bio,
  1502						      GFP_NOIO, &mddev->bio_set);
  1503			else
  1504				mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set);
  1505	
  1506			if (r1_bio->behind_master_bio) {
  1507				if (test_bit(CollisionCheck, &rdev->flags))
  1508					wait_for_serialization(rdev, r1_bio);
  1509				if (test_bit(WriteMostly, &rdev->flags))
  1510					atomic_inc(&r1_bio->behind_remaining);
  1511			} else if (mddev->serialize_policy)
  1512				wait_for_serialization(rdev, r1_bio);
  1513	
  1514			r1_bio->bios[i] = mbio;
  1515	
  1516			mbio->bi_iter.bi_sector	= (r1_bio->sector +
  1517					   conf->mirrors[i].rdev->data_offset);
  1518			bio_set_dev(mbio, conf->mirrors[i].rdev->bdev);
  1519			mbio->bi_end_io	= raid1_end_write_request;
  1520			mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
  1521			if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) &&
  1522			    !test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) &&
  1523			    conf->raid_disks - mddev->degraded > 1)
  1524				mbio->bi_opf |= MD_FAILFAST;
  1525			mbio->bi_private = r1_bio;
  1526	
  1527			atomic_inc(&r1_bio->remaining);
  1528	
  1529			if (mddev->gendisk)
  1530				trace_block_bio_remap(mbio, disk_devt(mddev->gendisk),
  1531						      r1_bio->sector);
  1532			/* flush_pending_writes() needs access to the rdev so...*/
  1533			mbio->bi_bdev = (void *)conf->mirrors[i].rdev;
  1534	
  1535			cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
  1536			if (cb)
  1537				plug = container_of(cb, struct raid1_plug_cb, cb);
  1538			else
  1539				plug = NULL;
  1540			if (plug) {
  1541				bio_list_add(&plug->pending, mbio);
  1542				plug->pending_cnt++;
  1543			} else {
  1544				spin_lock_irqsave(&conf->device_lock, flags);
  1545				bio_list_add(&conf->pending_bio_list, mbio);
  1546				conf->pending_count++;
  1547				spin_unlock_irqrestore(&conf->device_lock, flags);
  1548				md_wakeup_thread(mddev->thread);
  1549			}
  1550		}
  1551	
  1552		r1_bio_write_done(r1_bio);
  1553	
  1554		/* In case raid1d snuck in to freeze_array */
  1555		wake_up(&conf->wait_barrier);
  1556	}
  1557	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 30071 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread
end of thread, other threads:[~2021-09-25 23:02 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-13  6:05 [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors Guoqing Jiang
2021-08-13  7:49 ` Christoph Hellwig
2021-08-13  8:38 ` Guoqing Jiang
2021-08-14  7:55 ` Christoph Hellwig
2021-08-14  8:57 ` Ming Lei
2021-08-16  6:27 ` Guoqing Jiang
2021-08-16  7:13 ` Ming Lei
2021-08-16  9:37 ` Christoph Hellwig
2021-08-16 11:40 ` Ming Lei
2021-08-17  5:06 ` Christoph Hellwig
2021-08-17 12:32 ` Ming Lei
2021-09-24 15:34 ` Jens Stutte (Archiv)
2021-09-25 23:02 ` Guoqing Jiang
2021-08-13  9:27 ` kernel test robot
2021-08-13  9:27 ` kernel test robot
2021-08-13 10:12 ` kernel test robot
2021-08-13 10:12 ` kernel test robot