* [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
@ 2021-08-13 6:05 Guoqing Jiang
2021-08-13 7:49 ` Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Guoqing Jiang @ 2021-08-13 6:05 UTC (permalink / raw)
To: song; +Cc: linux-raid, jens
From: Guoqing Jiang <jiangguoqing@kylinos.cn>
We can't split a bio with more than BIO_MAX_VECS sectors, otherwise the
call trace below is triggered because an oversized write-behind bio
could be allocated later.
[ 8.097936] bvec_alloc+0x90/0xc0
[ 8.098934] bio_alloc_bioset+0x1b3/0x260
[ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]
[ 8.100988] ? __bio_clone_fast+0xa8/0xe0
[ 8.102008] md_handle_request+0x158/0x1d0 [md_mod]
[ 8.103050] md_submit_bio+0xcd/0x110 [md_mod]
[ 8.104084] submit_bio_noacct+0x139/0x530
[ 8.105127] submit_bio+0x78/0x1d0
[ 8.106163] ext4_io_submit+0x48/0x60 [ext4]
[ 8.107242] ext4_writepages+0x652/0x1170 [ext4]
[ 8.108300] ? do_writepages+0x41/0x100
[ 8.109338] ? __ext4_mark_inode_dirty+0x240/0x240 [ext4]
[ 8.110406] do_writepages+0x41/0x100
[ 8.111450] __filemap_fdatawrite_range+0xc5/0x100
[ 8.112513] file_write_and_wait_range+0x61/0xb0
[ 8.113564] ext4_sync_file+0x73/0x370 [ext4]
[ 8.114607] __x64_sys_fsync+0x33/0x60
[ 8.115635] do_syscall_64+0x33/0x40
[ 8.116670] entry_SYSCALL_64_after_hwframe+0x44/0xae
[1]. https://bugs.archlinux.org/task/70992
Reported-by: Jens Stutte <jens@chianterastutte.eu>
Tested-by: Jens Stutte <jens@chianterastutte.eu>
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
---
Note: this depends on commit 018eca456c4b4dca56aaf1ec27f309c74d0fe246
in linux-block for-next branch.
drivers/md/raid1.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3c44c4bb40fc..ab21abc056b8 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1454,6 +1454,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
goto retry_write;
}
+ max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
if (max_sectors < bio_sectors(bio)) {
struct bio *split = bio_split(bio, max_sectors,
GFP_NOIO, &conf->bio_split);
--
2.25.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-08-13 6:05 [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors Guoqing Jiang
@ 2021-08-13 7:49 ` Christoph Hellwig
2021-08-13 8:38 ` Guoqing Jiang
2021-08-13 9:27 ` kernel test robot
2021-08-13 10:12 ` kernel test robot
2 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2021-08-13 7:49 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: song, linux-raid, jens, linux-block
On Fri, Aug 13, 2021 at 02:05:10PM +0800, Guoqing Jiang wrote:
> From: Guoqing Jiang <jiangguoqing@kylinos.cn>
>
> We can't split a bio with more than BIO_MAX_VECS sectors, otherwise the
> call trace below is triggered because an oversized write-behind bio
> could be allocated later.
>
> [ 8.097936] bvec_alloc+0x90/0xc0
> [ 8.098934] bio_alloc_bioset+0x1b3/0x260
> [ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]
Which bio_alloc_bioset is this? The one in alloc_behind_master_bio?
In which case I think you want to limit the reduction of max_sectors
to just the write behind case, and clearly document what is going on.
In general the size of a bio only depends on the number of vectors, not
the total I/O size. But alloc_behind_master_bio allocates new backing
pages using order 0 allocations, so in this exceptional case the total
size does actually matter.
While we're at it: this huge memory allocation looks really deadlock
prone.
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-08-13 7:49 ` Christoph Hellwig
@ 2021-08-13 8:38 ` Guoqing Jiang
2021-08-14 7:55 ` Christoph Hellwig
0 siblings, 1 reply; 17+ messages in thread
From: Guoqing Jiang @ 2021-08-13 8:38 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: song, linux-raid, jens, linux-block
On 8/13/21 3:49 PM, Christoph Hellwig wrote:
> On Fri, Aug 13, 2021 at 02:05:10PM +0800, Guoqing Jiang wrote:
>> From: Guoqing Jiang <jiangguoqing@kylinos.cn>
>>
>> We can't split a bio with more than BIO_MAX_VECS sectors, otherwise the
>> call trace below is triggered because an oversized write-behind bio
>> could be allocated later.
>>
>> [ 8.097936] bvec_alloc+0x90/0xc0
>> [ 8.098934] bio_alloc_bioset+0x1b3/0x260
>> [ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]
> Which bio_alloc_bioset is this? The one in alloc_behind_master_bio?
Yes, it should be the one since bio_clone_fast calls bio_alloc_bioset
with 0 iovecs.
> In which case I think you want to limit the reduction of max_sectors
> to just the write behind case, and clearly document what is going on.
Ok, thanks.
> In general the size of a bio only depends on the number of vectors, not
> the total I/O size. But alloc_behind_master_bio allocates new backing
> pages using order 0 allocations, so in this exceptional case the total
> size does actually matter.
>
> While we're at it: this huge memory allocation looks really deadlock
> prone.
Hmm, let me think more about it, or could you share your thought? 😉
Thanks,
Guoqing
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-08-13 6:05 [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors Guoqing Jiang
@ 2021-08-13 9:27 ` kernel test robot
2021-08-13 9:27 ` kernel test robot
2021-08-13 10:12 ` kernel test robot
2 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-08-13 9:27 UTC (permalink / raw)
To: Guoqing Jiang, song; +Cc: clang-built-linux, kbuild-all, linux-raid, jens
[-- Attachment #1: Type: text/plain, Size: 10658 bytes --]
Hi Guoqing,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on song-md/md-next]
[also build test ERROR on v5.14-rc5 next-20210812]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-bio-doesn-t-have-more-than-BIO_MAX_VECS-sectors/20210813-140810
base: git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: hexagon-randconfig-r001-20210813 (attached as .config)
compiler: clang version 12.0.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/29b7720a83de1deea0d8ecfafe0db46146636b15
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-bio-doesn-t-have-more-than-BIO_MAX_VECS-sectors/20210813-140810
git checkout 29b7720a83de1deea0d8ecfafe0db46146636b15
# save the attached .config to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir ARCH=hexagon SHELL=/bin/bash drivers/md/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> drivers/md/raid1.c:1459:55: error: use of undeclared identifier 'PAGE_SECTORS'
max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
^
>> drivers/md/raid1.c:1459:55: error: use of undeclared identifier 'PAGE_SECTORS'
2 errors generated.
vim +/PAGE_SECTORS +1459 drivers/md/raid1.c
1320
1321 static void raid1_write_request(struct mddev *mddev, struct bio *bio,
1322 int max_write_sectors)
1323 {
1324 struct r1conf *conf = mddev->private;
1325 struct r1bio *r1_bio;
1326 int i, disks;
1327 struct bitmap *bitmap = mddev->bitmap;
1328 unsigned long flags;
1329 struct md_rdev *blocked_rdev;
1330 struct blk_plug_cb *cb;
1331 struct raid1_plug_cb *plug = NULL;
1332 int first_clone;
1333 int max_sectors;
1334
1335 if (mddev_is_clustered(mddev) &&
1336 md_cluster_ops->area_resyncing(mddev, WRITE,
1337 bio->bi_iter.bi_sector, bio_end_sector(bio))) {
1338
1339 DEFINE_WAIT(w);
1340 for (;;) {
1341 prepare_to_wait(&conf->wait_barrier,
1342 &w, TASK_IDLE);
1343 if (!md_cluster_ops->area_resyncing(mddev, WRITE,
1344 bio->bi_iter.bi_sector,
1345 bio_end_sector(bio)))
1346 break;
1347 schedule();
1348 }
1349 finish_wait(&conf->wait_barrier, &w);
1350 }
1351
1352 /*
1353 * Register the new request and wait if the reconstruction
1354 * thread has put up a bar for new requests.
1355 * Continue immediately if no resync is active currently.
1356 */
1357 wait_barrier(conf, bio->bi_iter.bi_sector);
1358
1359 r1_bio = alloc_r1bio(mddev, bio);
1360 r1_bio->sectors = max_write_sectors;
1361
1362 if (conf->pending_count >= max_queued_requests) {
1363 md_wakeup_thread(mddev->thread);
1364 raid1_log(mddev, "wait queued");
1365 wait_event(conf->wait_barrier,
1366 conf->pending_count < max_queued_requests);
1367 }
1368 /* first select target devices under rcu_lock and
1369 * inc refcount on their rdev. Record them by setting
1370 * bios[x] to bio
1371 * If there are known/acknowledged bad blocks on any device on
1372 * which we have seen a write error, we want to avoid writing those
1373 * blocks.
1374 * This potentially requires several writes to write around
1375 * the bad blocks. Each set of writes gets it's own r1bio
1376 * with a set of bios attached.
1377 */
1378
1379 disks = conf->raid_disks * 2;
1380 retry_write:
1381 blocked_rdev = NULL;
1382 rcu_read_lock();
1383 max_sectors = r1_bio->sectors;
1384 for (i = 0; i < disks; i++) {
1385 struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
1386 if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
1387 atomic_inc(&rdev->nr_pending);
1388 blocked_rdev = rdev;
1389 break;
1390 }
1391 r1_bio->bios[i] = NULL;
1392 if (!rdev || test_bit(Faulty, &rdev->flags)) {
1393 if (i < conf->raid_disks)
1394 set_bit(R1BIO_Degraded, &r1_bio->state);
1395 continue;
1396 }
1397
1398 atomic_inc(&rdev->nr_pending);
1399 if (test_bit(WriteErrorSeen, &rdev->flags)) {
1400 sector_t first_bad;
1401 int bad_sectors;
1402 int is_bad;
1403
1404 is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
1405 &first_bad, &bad_sectors);
1406 if (is_bad < 0) {
1407 /* mustn't write here until the bad block is
1408 * acknowledged*/
1409 set_bit(BlockedBadBlocks, &rdev->flags);
1410 blocked_rdev = rdev;
1411 break;
1412 }
1413 if (is_bad && first_bad <= r1_bio->sector) {
1414 /* Cannot write here at all */
1415 bad_sectors -= (r1_bio->sector - first_bad);
1416 if (bad_sectors < max_sectors)
1417 /* mustn't write more than bad_sectors
1418 * to other devices yet
1419 */
1420 max_sectors = bad_sectors;
1421 rdev_dec_pending(rdev, mddev);
1422 /* We don't set R1BIO_Degraded as that
1423 * only applies if the disk is
1424 * missing, so it might be re-added,
1425 * and we want to know to recover this
1426 * chunk.
1427 * In this case the device is here,
1428 * and the fact that this chunk is not
1429 * in-sync is recorded in the bad
1430 * block log
1431 */
1432 continue;
1433 }
1434 if (is_bad) {
1435 int good_sectors = first_bad - r1_bio->sector;
1436 if (good_sectors < max_sectors)
1437 max_sectors = good_sectors;
1438 }
1439 }
1440 r1_bio->bios[i] = bio;
1441 }
1442 rcu_read_unlock();
1443
1444 if (unlikely(blocked_rdev)) {
1445 /* Wait for this device to become unblocked */
1446 int j;
1447
1448 for (j = 0; j < i; j++)
1449 if (r1_bio->bios[j])
1450 rdev_dec_pending(conf->mirrors[j].rdev, mddev);
1451 r1_bio->state = 0;
1452 allow_barrier(conf, bio->bi_iter.bi_sector);
1453 raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
1454 md_wait_for_blocked_rdev(blocked_rdev, mddev);
1455 wait_barrier(conf, bio->bi_iter.bi_sector);
1456 goto retry_write;
1457 }
1458
> 1459 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
1460 if (max_sectors < bio_sectors(bio)) {
1461 struct bio *split = bio_split(bio, max_sectors,
1462 GFP_NOIO, &conf->bio_split);
1463 bio_chain(split, bio);
1464 submit_bio_noacct(bio);
1465 bio = split;
1466 r1_bio->master_bio = bio;
1467 r1_bio->sectors = max_sectors;
1468 }
1469
1470 if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
1471 r1_bio->start_time = bio_start_io_acct(bio);
1472 atomic_set(&r1_bio->remaining, 1);
1473 atomic_set(&r1_bio->behind_remaining, 0);
1474
1475 first_clone = 1;
1476
1477 for (i = 0; i < disks; i++) {
1478 struct bio *mbio = NULL;
1479 struct md_rdev *rdev = conf->mirrors[i].rdev;
1480 if (!r1_bio->bios[i])
1481 continue;
1482
1483 if (first_clone) {
1484 /* do behind I/O ?
1485 * Not if there are too many, or cannot
1486 * allocate memory, or a reader on WriteMostly
1487 * is waiting for behind writes to flush */
1488 if (bitmap &&
1489 (atomic_read(&bitmap->behind_writes)
1490 < mddev->bitmap_info.max_write_behind) &&
1491 !waitqueue_active(&bitmap->behind_wait)) {
1492 alloc_behind_master_bio(r1_bio, bio);
1493 }
1494
1495 md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors,
1496 test_bit(R1BIO_BehindIO, &r1_bio->state));
1497 first_clone = 0;
1498 }
1499
1500 if (r1_bio->behind_master_bio)
1501 mbio = bio_clone_fast(r1_bio->behind_master_bio,
1502 GFP_NOIO, &mddev->bio_set);
1503 else
1504 mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set);
1505
1506 if (r1_bio->behind_master_bio) {
1507 if (test_bit(CollisionCheck, &rdev->flags))
1508 wait_for_serialization(rdev, r1_bio);
1509 if (test_bit(WriteMostly, &rdev->flags))
1510 atomic_inc(&r1_bio->behind_remaining);
1511 } else if (mddev->serialize_policy)
1512 wait_for_serialization(rdev, r1_bio);
1513
1514 r1_bio->bios[i] = mbio;
1515
1516 mbio->bi_iter.bi_sector = (r1_bio->sector +
1517 conf->mirrors[i].rdev->data_offset);
1518 bio_set_dev(mbio, conf->mirrors[i].rdev->bdev);
1519 mbio->bi_end_io = raid1_end_write_request;
1520 mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
1521 if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) &&
1522 !test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) &&
1523 conf->raid_disks - mddev->degraded > 1)
1524 mbio->bi_opf |= MD_FAILFAST;
1525 mbio->bi_private = r1_bio;
1526
1527 atomic_inc(&r1_bio->remaining);
1528
1529 if (mddev->gendisk)
1530 trace_block_bio_remap(mbio, disk_devt(mddev->gendisk),
1531 r1_bio->sector);
1532 /* flush_pending_writes() needs access to the rdev so...*/
1533 mbio->bi_bdev = (void *)conf->mirrors[i].rdev;
1534
1535 cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
1536 if (cb)
1537 plug = container_of(cb, struct raid1_plug_cb, cb);
1538 else
1539 plug = NULL;
1540 if (plug) {
1541 bio_list_add(&plug->pending, mbio);
1542 plug->pending_cnt++;
1543 } else {
1544 spin_lock_irqsave(&conf->device_lock, flags);
1545 bio_list_add(&conf->pending_bio_list, mbio);
1546 conf->pending_count++;
1547 spin_unlock_irqrestore(&conf->device_lock, flags);
1548 md_wakeup_thread(mddev->thread);
1549 }
1550 }
1551
1552 r1_bio_write_done(r1_bio);
1553
1554 /* In case raid1d snuck in to freeze_array */
1555 wake_up(&conf->wait_barrier);
1556 }
1557
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28124 bytes --]
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-08-13 6:05 [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors Guoqing Jiang
@ 2021-08-13 10:12 ` kernel test robot
2021-08-13 9:27 ` kernel test robot
2021-08-13 10:12 ` kernel test robot
2 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-08-13 10:12 UTC (permalink / raw)
To: Guoqing Jiang, song; +Cc: kbuild-all, linux-raid, jens
[-- Attachment #1: Type: text/plain, Size: 13412 bytes --]
Hi Guoqing,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on song-md/md-next]
[also build test ERROR on v5.14-rc5 next-20210812]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-bio-doesn-t-have-more-than-BIO_MAX_VECS-sectors/20210813-140810
base: git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: nds32-randconfig-r035-20210813 (attached as .config)
compiler: nds32le-linux-gcc (GCC) 10.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/29b7720a83de1deea0d8ecfafe0db46146636b15
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-bio-doesn-t-have-more-than-BIO_MAX_VECS-sectors/20210813-140810
git checkout 29b7720a83de1deea0d8ecfafe0db46146636b15
# save the attached .config to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross O=build_dir ARCH=nds32 SHELL=/bin/bash drivers/md/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
In file included from include/linux/kernel.h:15,
from include/asm-generic/bug.h:20,
from ./arch/nds32/include/generated/asm/bug.h:1,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/gfp.h:5,
from include/linux/slab.h:15,
from drivers/md/raid1.c:26:
drivers/md/raid1.c: In function 'raid1_write_request':
>> drivers/md/raid1.c:1459:55: error: 'PAGE_SECTORS' undeclared (first use in this function); did you mean 'PAGE_MEMORY'?
1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~~~~~~~~
include/linux/minmax.h:20:39: note: in definition of macro '__typecheck'
20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
| ^
include/linux/minmax.h:36:24: note: in expansion of macro '__safe_cmp'
36 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~
include/linux/minmax.h:104:27: note: in expansion of macro '__careful_cmp'
104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
| ^~~~~~~~~~~~~
drivers/md/raid1.c:1459:16: note: in expansion of macro 'min_t'
1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~
drivers/md/raid1.c:1459:55: note: each undeclared identifier is reported only once for each function it appears in
1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~~~~~~~~
include/linux/minmax.h:20:39: note: in definition of macro '__typecheck'
20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
| ^
include/linux/minmax.h:36:24: note: in expansion of macro '__safe_cmp'
36 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~
include/linux/minmax.h:104:27: note: in expansion of macro '__careful_cmp'
104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
| ^~~~~~~~~~~~~
drivers/md/raid1.c:1459:16: note: in expansion of macro 'min_t'
1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~
>> include/linux/minmax.h:36:2: error: first argument to '__builtin_choose_expr' not a constant
36 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/minmax.h:104:27: note: in expansion of macro '__careful_cmp'
104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
| ^~~~~~~~~~~~~
drivers/md/raid1.c:1459:16: note: in expansion of macro 'min_t'
1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~
vim +1459 drivers/md/raid1.c
1320
1321 static void raid1_write_request(struct mddev *mddev, struct bio *bio,
1322 int max_write_sectors)
1323 {
1324 struct r1conf *conf = mddev->private;
1325 struct r1bio *r1_bio;
1326 int i, disks;
1327 struct bitmap *bitmap = mddev->bitmap;
1328 unsigned long flags;
1329 struct md_rdev *blocked_rdev;
1330 struct blk_plug_cb *cb;
1331 struct raid1_plug_cb *plug = NULL;
1332 int first_clone;
1333 int max_sectors;
1334
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
@ 2021-08-13 10:12 ` kernel test robot
0 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-08-13 10:12 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 13738 bytes --]
Hi Guoqing,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on song-md/md-next]
[also build test ERROR on v5.14-rc5 next-20210812]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-bio-doesn-t-have-more-than-BIO_MAX_VECS-sectors/20210813-140810
base: git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: nds32-randconfig-r035-20210813 (attached as .config)
compiler: nds32le-linux-gcc (GCC) 10.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/29b7720a83de1deea0d8ecfafe0db46146636b15
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-bio-doesn-t-have-more-than-BIO_MAX_VECS-sectors/20210813-140810
git checkout 29b7720a83de1deea0d8ecfafe0db46146636b15
# save the attached .config to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross O=build_dir ARCH=nds32 SHELL=/bin/bash drivers/md/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
In file included from include/linux/kernel.h:15,
from include/asm-generic/bug.h:20,
from ./arch/nds32/include/generated/asm/bug.h:1,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/gfp.h:5,
from include/linux/slab.h:15,
from drivers/md/raid1.c:26:
drivers/md/raid1.c: In function 'raid1_write_request':
>> drivers/md/raid1.c:1459:55: error: 'PAGE_SECTORS' undeclared (first use in this function); did you mean 'PAGE_MEMORY'?
1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~~~~~~~~
include/linux/minmax.h:20:39: note: in definition of macro '__typecheck'
20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
| ^
include/linux/minmax.h:36:24: note: in expansion of macro '__safe_cmp'
36 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~
include/linux/minmax.h:104:27: note: in expansion of macro '__careful_cmp'
104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
| ^~~~~~~~~~~~~
drivers/md/raid1.c:1459:16: note: in expansion of macro 'min_t'
1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~
drivers/md/raid1.c:1459:55: note: each undeclared identifier is reported only once for each function it appears in
1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~~~~~~~~
include/linux/minmax.h:20:39: note: in definition of macro '__typecheck'
20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
| ^
include/linux/minmax.h:36:24: note: in expansion of macro '__safe_cmp'
36 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~
include/linux/minmax.h:104:27: note: in expansion of macro '__careful_cmp'
104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
| ^~~~~~~~~~~~~
drivers/md/raid1.c:1459:16: note: in expansion of macro 'min_t'
1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~
>> include/linux/minmax.h:36:2: error: first argument to '__builtin_choose_expr' not a constant
36 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/minmax.h:104:27: note: in expansion of macro '__careful_cmp'
104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
| ^~~~~~~~~~~~~
drivers/md/raid1.c:1459:16: note: in expansion of macro 'min_t'
1459 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~
vim +1459 drivers/md/raid1.c
1320
1321 static void raid1_write_request(struct mddev *mddev, struct bio *bio,
1322 int max_write_sectors)
1323 {
1324 struct r1conf *conf = mddev->private;
1325 struct r1bio *r1_bio;
1326 int i, disks;
1327 struct bitmap *bitmap = mddev->bitmap;
1328 unsigned long flags;
1329 struct md_rdev *blocked_rdev;
1330 struct blk_plug_cb *cb;
1331 struct raid1_plug_cb *plug = NULL;
1332 int first_clone;
1333 int max_sectors;
1334
1335 if (mddev_is_clustered(mddev) &&
1336 md_cluster_ops->area_resyncing(mddev, WRITE,
1337 bio->bi_iter.bi_sector, bio_end_sector(bio))) {
1338
1339 DEFINE_WAIT(w);
1340 for (;;) {
1341 prepare_to_wait(&conf->wait_barrier,
1342 &w, TASK_IDLE);
1343 if (!md_cluster_ops->area_resyncing(mddev, WRITE,
1344 bio->bi_iter.bi_sector,
1345 bio_end_sector(bio)))
1346 break;
1347 schedule();
1348 }
1349 finish_wait(&conf->wait_barrier, &w);
1350 }
1351
1352 /*
1353 * Register the new request and wait if the reconstruction
1354 * thread has put up a bar for new requests.
1355 * Continue immediately if no resync is active currently.
1356 */
1357 wait_barrier(conf, bio->bi_iter.bi_sector);
1358
1359 r1_bio = alloc_r1bio(mddev, bio);
1360 r1_bio->sectors = max_write_sectors;
1361
1362 if (conf->pending_count >= max_queued_requests) {
1363 md_wakeup_thread(mddev->thread);
1364 raid1_log(mddev, "wait queued");
1365 wait_event(conf->wait_barrier,
1366 conf->pending_count < max_queued_requests);
1367 }
1368 /* first select target devices under rcu_lock and
1369 * inc refcount on their rdev. Record them by setting
1370 * bios[x] to bio
1371 * If there are known/acknowledged bad blocks on any device on
1372 * which we have seen a write error, we want to avoid writing those
1373 * blocks.
1374 * This potentially requires several writes to write around
1375 * the bad blocks. Each set of writes gets it's own r1bio
1376 * with a set of bios attached.
1377 */
1378
1379 disks = conf->raid_disks * 2;
1380 retry_write:
1381 blocked_rdev = NULL;
1382 rcu_read_lock();
1383 max_sectors = r1_bio->sectors;
1384 for (i = 0; i < disks; i++) {
1385 struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
1386 if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
1387 atomic_inc(&rdev->nr_pending);
1388 blocked_rdev = rdev;
1389 break;
1390 }
1391 r1_bio->bios[i] = NULL;
1392 if (!rdev || test_bit(Faulty, &rdev->flags)) {
1393 if (i < conf->raid_disks)
1394 set_bit(R1BIO_Degraded, &r1_bio->state);
1395 continue;
1396 }
1397
1398 atomic_inc(&rdev->nr_pending);
1399 if (test_bit(WriteErrorSeen, &rdev->flags)) {
1400 sector_t first_bad;
1401 int bad_sectors;
1402 int is_bad;
1403
1404 is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
1405 &first_bad, &bad_sectors);
1406 if (is_bad < 0) {
1407 /* mustn't write here until the bad block is
1408 * acknowledged*/
1409 set_bit(BlockedBadBlocks, &rdev->flags);
1410 blocked_rdev = rdev;
1411 break;
1412 }
1413 if (is_bad && first_bad <= r1_bio->sector) {
1414 /* Cannot write here at all */
1415 bad_sectors -= (r1_bio->sector - first_bad);
1416 if (bad_sectors < max_sectors)
1417 /* mustn't write more than bad_sectors
1418 * to other devices yet
1419 */
1420 max_sectors = bad_sectors;
1421 rdev_dec_pending(rdev, mddev);
1422 /* We don't set R1BIO_Degraded as that
1423 * only applies if the disk is
1424 * missing, so it might be re-added,
1425 * and we want to know to recover this
1426 * chunk.
1427 * In this case the device is here,
1428 * and the fact that this chunk is not
1429 * in-sync is recorded in the bad
1430 * block log
1431 */
1432 continue;
1433 }
1434 if (is_bad) {
1435 int good_sectors = first_bad - r1_bio->sector;
1436 if (good_sectors < max_sectors)
1437 max_sectors = good_sectors;
1438 }
1439 }
1440 r1_bio->bios[i] = bio;
1441 }
1442 rcu_read_unlock();
1443
1444 if (unlikely(blocked_rdev)) {
1445 /* Wait for this device to become unblocked */
1446 int j;
1447
1448 for (j = 0; j < i; j++)
1449 if (r1_bio->bios[j])
1450 rdev_dec_pending(conf->mirrors[j].rdev, mddev);
1451 r1_bio->state = 0;
1452 allow_barrier(conf, bio->bi_iter.bi_sector);
1453 raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
1454 md_wait_for_blocked_rdev(blocked_rdev, mddev);
1455 wait_barrier(conf, bio->bi_iter.bi_sector);
1456 goto retry_write;
1457 }
1458
> 1459 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
1460 if (max_sectors < bio_sectors(bio)) {
1461 struct bio *split = bio_split(bio, max_sectors,
1462 GFP_NOIO, &conf->bio_split);
1463 bio_chain(split, bio);
1464 submit_bio_noacct(bio);
1465 bio = split;
1466 r1_bio->master_bio = bio;
1467 r1_bio->sectors = max_sectors;
1468 }
1469
1470 if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
1471 r1_bio->start_time = bio_start_io_acct(bio);
1472 atomic_set(&r1_bio->remaining, 1);
1473 atomic_set(&r1_bio->behind_remaining, 0);
1474
1475 first_clone = 1;
1476
1477 for (i = 0; i < disks; i++) {
1478 struct bio *mbio = NULL;
1479 struct md_rdev *rdev = conf->mirrors[i].rdev;
1480 if (!r1_bio->bios[i])
1481 continue;
1482
1483 if (first_clone) {
1484 /* do behind I/O ?
1485 * Not if there are too many, or cannot
1486 * allocate memory, or a reader on WriteMostly
1487 * is waiting for behind writes to flush */
1488 if (bitmap &&
1489 (atomic_read(&bitmap->behind_writes)
1490 < mddev->bitmap_info.max_write_behind) &&
1491 !waitqueue_active(&bitmap->behind_wait)) {
1492 alloc_behind_master_bio(r1_bio, bio);
1493 }
1494
1495 md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors,
1496 test_bit(R1BIO_BehindIO, &r1_bio->state));
1497 first_clone = 0;
1498 }
1499
1500 if (r1_bio->behind_master_bio)
1501 mbio = bio_clone_fast(r1_bio->behind_master_bio,
1502 GFP_NOIO, &mddev->bio_set);
1503 else
1504 mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set);
1505
1506 if (r1_bio->behind_master_bio) {
1507 if (test_bit(CollisionCheck, &rdev->flags))
1508 wait_for_serialization(rdev, r1_bio);
1509 if (test_bit(WriteMostly, &rdev->flags))
1510 atomic_inc(&r1_bio->behind_remaining);
1511 } else if (mddev->serialize_policy)
1512 wait_for_serialization(rdev, r1_bio);
1513
1514 r1_bio->bios[i] = mbio;
1515
1516 mbio->bi_iter.bi_sector = (r1_bio->sector +
1517 conf->mirrors[i].rdev->data_offset);
1518 bio_set_dev(mbio, conf->mirrors[i].rdev->bdev);
1519 mbio->bi_end_io = raid1_end_write_request;
1520 mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
1521 if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) &&
1522 !test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) &&
1523 conf->raid_disks - mddev->degraded > 1)
1524 mbio->bi_opf |= MD_FAILFAST;
1525 mbio->bi_private = r1_bio;
1526
1527 atomic_inc(&r1_bio->remaining);
1528
1529 if (mddev->gendisk)
1530 trace_block_bio_remap(mbio, disk_devt(mddev->gendisk),
1531 r1_bio->sector);
1532 /* flush_pending_writes() needs access to the rdev so...*/
1533 mbio->bi_bdev = (void *)conf->mirrors[i].rdev;
1534
1535 cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
1536 if (cb)
1537 plug = container_of(cb, struct raid1_plug_cb, cb);
1538 else
1539 plug = NULL;
1540 if (plug) {
1541 bio_list_add(&plug->pending, mbio);
1542 plug->pending_cnt++;
1543 } else {
1544 spin_lock_irqsave(&conf->device_lock, flags);
1545 bio_list_add(&conf->pending_bio_list, mbio);
1546 conf->pending_count++;
1547 spin_unlock_irqrestore(&conf->device_lock, flags);
1548 md_wakeup_thread(mddev->thread);
1549 }
1550 }
1551
1552 r1_bio_write_done(r1_bio);
1553
1554 /* In case raid1d snuck in to freeze_array */
1555 wake_up(&conf->wait_barrier);
1556 }
1557
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 30071 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-08-13 8:38 ` Guoqing Jiang
@ 2021-08-14 7:55 ` Christoph Hellwig
2021-08-14 8:57 ` Ming Lei
0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2021-08-14 7:55 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: Christoph Hellwig, song, linux-raid, jens, linux-block
On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote:
>
> Ok, thanks.
>
> > In general the size of a bio only depends on the number of vectors, not
> > the total I/O size. But alloc_behind_master_bio allocates new backing
> > pages using order 0 allocations, so in this exceptional case the total
> > size does actually matter.
> >
> > While we're at it: this huge memory allocation looks really deadlock
> > prone.
>
> Hmm, let me think more about it, or could you share your thought?
Well, you'd need a mempool which can fit the max payload of a bio,
that is BIO_MAX_VECS pages.
FYI, this is what I'd do instead of this patch for now. We don't really
need a vector per sector, just per page. So this limits the I/O
size a little less.
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3c44c4bb40fc..5b27d995302e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
goto retry_write;
}
+ /*
+ * When using a bitmap, we may call alloc_behind_master_bio below.
+ * alloc_behind_master_bio allocates a copy of the data payload a page
+ * at a time and thus needs a new bio that can fit the whole payload
+ * this bio in page sized chunks.
+ */
+ if (bitmap)
+ max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
+
if (max_sectors < bio_sectors(bio)) {
struct bio *split = bio_split(bio, max_sectors,
GFP_NOIO, &conf->bio_split);
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-08-14 7:55 ` Christoph Hellwig
@ 2021-08-14 8:57 ` Ming Lei
2021-08-16 6:27 ` Guoqing Jiang
2021-08-16 9:37 ` Christoph Hellwig
0 siblings, 2 replies; 17+ messages in thread
From: Ming Lei @ 2021-08-14 8:57 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, jens, linux-block
On Sat, Aug 14, 2021 at 08:55:21AM +0100, Christoph Hellwig wrote:
> On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote:
> >
> > Ok, thanks.
> >
> > > In general the size of a bio only depends on the number of vectors, not
> > > the total I/O size. But alloc_behind_master_bio allocates new backing
> > > pages using order 0 allocations, so in this exceptional case the total
> > > size does actually matter.
> > >
> > > While we're at it: this huge memory allocation looks really deadlock
> > > prone.
> >
> > Hmm, let me think more about it, or could you share your thought?
>
> Well, you'd need a mempool which can fit the max payload of a bio,
> that is BIO_MAX_VECS pages.
>
> FYI, this is what I'd do instead of this patch for now. We don't really
> need a vector per sector, just per page. So this limits the I/O
> size a little less.
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 3c44c4bb40fc..5b27d995302e 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> goto retry_write;
> }
>
> + /*
> + * When using a bitmap, we may call alloc_behind_master_bio below.
> + * alloc_behind_master_bio allocates a copy of the data payload a page
> + * at a time and thus needs a new bio that can fit the whole payload
> + * this bio in page sized chunks.
> + */
> + if (bitmap)
> + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
s/PAGE_SIZE/PAGE_SECTORS
> +
> if (max_sectors < bio_sectors(bio)) {
> struct bio *split = bio_split(bio, max_sectors,
> GFP_NOIO, &conf->bio_split);
>
Here the limit is max single-page vectors, and the above way may not work,
such as:
0 ~ 254: each bvec's length is 512
255: bvec's length is 8192
the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
One solution is to add a queue limit of max_single_page_bvec, and let
blk_queue_split() handle it.
Thanks,
Ming
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-08-14 8:57 ` Ming Lei
@ 2021-08-16 6:27 ` Guoqing Jiang
2021-08-16 7:13 ` Ming Lei
2021-08-16 9:37 ` Christoph Hellwig
1 sibling, 1 reply; 17+ messages in thread
From: Guoqing Jiang @ 2021-08-16 6:27 UTC (permalink / raw)
To: Ming Lei, Christoph Hellwig; +Cc: song, linux-raid, jens, linux-block
Hi Ming and Christoph,
On 8/14/21 4:57 PM, Ming Lei wrote:
> On Sat, Aug 14, 2021 at 08:55:21AM +0100, Christoph Hellwig wrote:
>> On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote:
>>> Ok, thanks.
>>>
>>>> In general the size of a bio only depends on the number of vectors, not
>>>> the total I/O size. But alloc_behind_master_bio allocates new backing
>>>> pages using order 0 allocations, so in this exceptional case the total
>>>> size does actually matter.
>>>>
>>>> While we're at it: this huge memory allocation looks really deadlock
>>>> prone.
>>> Hmm, let me think more about it, or could you share your thought?
>> Well, you'd need a mempool which can fit the max payload of a bio,
>> that is BIO_MAX_VECS pages.
IIUC, the behind bio is allocated from a bio_set (mddev->bio_set) which is
initialized in md_run by calling bioset_init, so the mempool (bvec_pool) of
this bio_set is created by biovec_init_pool, which uses the global biovec
slabs. Do we really need another mempool? Or is there no potential deadlock
for this case?
>> FYI, this is what I'd do instead of this patch for now. We don't really
>> need a vector per sector, just per page. So this limits the I/O
>> size a little less.
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 3c44c4bb40fc..5b27d995302e 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>> goto retry_write;
>> }
>>
>> + /*
>> + * When using a bitmap, we may call alloc_behind_master_bio below.
>> + * alloc_behind_master_bio allocates a copy of the data payload a page
>> + * at a time and thus needs a new bio that can fit the whole payload
>> + * this bio in page sized chunks.
>> + */
Thanks for the above, I will copy it accordingly. I will check whether
WriteMostly is set first, and then check both the flag and the bitmap.
>> + if (bitmap)
>> + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
> s/PAGE_SIZE/PAGE_SECTORS
Agree.
>> +
>> if (max_sectors < bio_sectors(bio)) {
>> struct bio *split = bio_split(bio, max_sectors,
>> GFP_NOIO, &conf->bio_split);
> Here the limit is max single-page vectors, and the above way may not work,
> such as:
>
> 0 ~ 254: each bvec's length is 512
> 255: bvec's length is 8192
>
> the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
Thanks for looking deeper! I guess it is because of how vcnt is calculated.
> One solution is to add queue limit of max_single_page_bvec, and let
> blk_queue_split() handle it.
The path (blk_queue_split -> blk_bio_segment_split -> bvec_split_segs)
respects the max_segments limit. Do you mean introduce max_single_page_bvec
as a new limit, then perform a similar check as for max_segments?
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-08-16 6:27 ` Guoqing Jiang
@ 2021-08-16 7:13 ` Ming Lei
0 siblings, 0 replies; 17+ messages in thread
From: Ming Lei @ 2021-08-16 7:13 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: Christoph Hellwig, song, linux-raid, jens, linux-block
On Mon, Aug 16, 2021 at 02:27:48PM +0800, Guoqing Jiang wrote:
> Hi Ming and Christoph,
>
> On 8/14/21 4:57 PM, Ming Lei wrote:
> > On Sat, Aug 14, 2021 at 08:55:21AM +0100, Christoph Hellwig wrote:
> > > On Fri, Aug 13, 2021 at 04:38:59PM +0800, Guoqing Jiang wrote:
> > > > Ok, thanks.
> > > >
> > > > > In general the size of a bio only depends on the number of vectors, not
> > > > > the total I/O size. But alloc_behind_master_bio allocates new backing
> > > > > pages using order 0 allocations, so in this exceptional case the total
> > > > > size does actually matter.
> > > > >
> > > > > While we're at it: this huge memory allocation looks really deadlock
> > > > > prone.
> > > > Hmm, let me think more about it, or could you share your thought?
> > > Well, you'd need a mempool which can fit the max payload of a bio,
> > > that is BIO_MAX_VECS pages.
>
> IIUC, the behind bio is allocated from a bio_set (mddev->bio_set) which is
> initialized in md_run by calling bioset_init, so the mempool (bvec_pool) of
> this bio_set is created by biovec_init_pool, which uses the global biovec
> slabs. Do we really need another mempool? Or is there no potential deadlock
> for this case?
>
> > > FYI, this is what I'd do instead of this patch for now. We don't really
> > > need a vector per sector, just per page. So this limits the I/O
> > > size a little less.
> > >
> > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > index 3c44c4bb40fc..5b27d995302e 100644
> > > --- a/drivers/md/raid1.c
> > > +++ b/drivers/md/raid1.c
> > > @@ -1454,6 +1454,15 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> > > goto retry_write;
> > > }
> > > + /*
> > > + * When using a bitmap, we may call alloc_behind_master_bio below.
> > > + * alloc_behind_master_bio allocates a copy of the data payload a page
> > > + * at a time and thus needs a new bio that can fit the whole payload
> > > + * this bio in page sized chunks.
> > > + */
>
> Thanks for the above, I will copy it accordingly. I will check whether
> WriteMostly is set first, and then check both the flag and the bitmap.
>
> > > + if (bitmap)
> > > + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
> > s/PAGE_SIZE/PAGE_SECTORS
>
> Agree.
>
> > > +
> > > if (max_sectors < bio_sectors(bio)) {
> > > struct bio *split = bio_split(bio, max_sectors,
> > > GFP_NOIO, &conf->bio_split);
> > Here the limit is max single-page vectors, and the above way may not work,
> > such as:
> >
> > 0 ~ 254: each bvec's length is 512
> > 255: bvec's length is 8192
> >
> > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
>
> Thanks for looking deeper! I guess it is because of how vcnt is calculated.
>
> > One solution is to add queue limit of max_single_page_bvec, and let
> > blk_queue_split() handle it.
>
> The path (blk_queue_split -> blk_bio_segment_split -> bvec_split_segs)
> respects the max_segments limit. Do you mean introduce max_single_page_bvec
> as a new limit, then perform a similar check as for max_segments?
Yes, then the bio is guaranteed not to reach the max single-page bvec limit,
just like what __blk_queue_bounce() does.
thanks,
Ming
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-08-14 8:57 ` Ming Lei
2021-08-16 6:27 ` Guoqing Jiang
@ 2021-08-16 9:37 ` Christoph Hellwig
2021-08-16 11:40 ` Ming Lei
1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2021-08-16 9:37 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, Guoqing Jiang, song, linux-raid, jens, linux-block
On Sat, Aug 14, 2021 at 04:57:06PM +0800, Ming Lei wrote:
> > + if (bitmap)
> > + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
>
> s/PAGE_SIZE/PAGE_SECTORS
Yeah, max_sectors is in size units, I messed that up.
>
> > +
> > if (max_sectors < bio_sectors(bio)) {
> > struct bio *split = bio_split(bio, max_sectors,
> > GFP_NOIO, &conf->bio_split);
> >
>
> Here the limit is max single-page vectors, and the above way may not work,
> such as:
>
> 0 ~ 254: each bvec's length is 512
> 255: bvec's length is 8192
>
> the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
Yes, we still need the rounding magic that alloc_behind_master_bio uses
here.
> One solution is to add queue limit of max_single_page_bvec, and let
> blk_queue_split() handle it.
I'd rather not bloat the core with this.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-08-16 9:37 ` Christoph Hellwig
@ 2021-08-16 11:40 ` Ming Lei
2021-08-17 5:06 ` Christoph Hellwig
0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2021-08-16 11:40 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, jens, linux-block
On Mon, Aug 16, 2021 at 10:37:54AM +0100, Christoph Hellwig wrote:
> On Sat, Aug 14, 2021 at 04:57:06PM +0800, Ming Lei wrote:
> > > + if (bitmap)
> > > + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SIZE);
> >
> > s/PAGE_SIZE/PAGE_SECTORS
>
> Yeah, max_sectors is in size units, I messed that up.
>
> >
> > > +
> > > if (max_sectors < bio_sectors(bio)) {
> > > struct bio *split = bio_split(bio, max_sectors,
> > > GFP_NOIO, &conf->bio_split);
> > >
> >
> > Here the limit is max single-page vectors, and the above way may not work,
> > such as:
> >
> > 0 ~ 254: each bvec's length is 512
> > 255: bvec's length is 8192
> >
> > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
>
> Yes, we still need the rounding magic that alloc_behind_master_bio uses
> here.
But it is wrong to use max sectors to limit the number of bvecs (segments), isn't it?
Thanks,
Ming
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-08-16 11:40 ` Ming Lei
@ 2021-08-17 5:06 ` Christoph Hellwig
2021-08-17 12:32 ` Ming Lei
0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2021-08-17 5:06 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, Guoqing Jiang, song, linux-raid, jens, linux-block
On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote:
> > >
> > > 0 ~ 254: each bvec's length is 512
> > > 255: bvec's length is 8192
> > >
> > > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> > > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
> >
> > Yes, we still need the rounding magic that alloc_behind_master_bio uses
> > here.
>
> But it is wrong to use max sectors to limit the number of bvecs (segments), isn't it?
The raid1 write-behind code cares about the size a bio can reach by
adding order-0 pages to it. The bvecs are part of that, and I think the
calculation in the patch documents that as well.
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-08-17 5:06 ` Christoph Hellwig
@ 2021-08-17 12:32 ` Ming Lei
2021-09-24 15:34 ` Jens Stutte (Archiv)
0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2021-08-17 12:32 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, jens, linux-block
On Tue, Aug 17, 2021 at 06:06:12AM +0100, Christoph Hellwig wrote:
> On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote:
> > > >
> > > > 0 ~ 254: each bvec's length is 512
> > > > 255: bvec's length is 8192
> > > >
> > > > the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
> > > > still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
> > >
> > > Yes, we still need the rounding magic that alloc_behind_master_bio uses
> > > here.
> >
> > But it is wrong to use max sectors to limit the number of bvecs (segments), isn't it?
>
> The raid1 write-behind code cares about the size a bio can reach by
> adding order-0 pages to it. The bvecs are part of that, and I think the
> calculation in the patch documents that as well.
Thinking about it further, your and Guoqing's patches are correct and
sufficient, since bio_copy_data() just copies a byte (sector) stream from
the fs bio to the write-behind bio.
Thanks,
Ming
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-08-17 12:32 ` Ming Lei
@ 2021-09-24 15:34 ` Jens Stutte (Archiv)
2021-09-25 23:02 ` Guoqing Jiang
0 siblings, 1 reply; 17+ messages in thread
From: Jens Stutte (Archiv) @ 2021-09-24 15:34 UTC (permalink / raw)
To: Ming Lei, Christoph Hellwig; +Cc: Guoqing Jiang, song, linux-raid, linux-block
[-- Attachment #1: Type: text/plain, Size: 3628 bytes --]
Hi all,
I just had the occasion to test the new patch as it landed in Arch Linux
5.14.7. Unfortunately, it does not work for me. Attached you can find a
modification that works for me, though I am not really sure why
write_behind does not seem to be set to true in my configuration. If
there is any more data I can provide to help you investigate, please let
me know.
Thanks for any clues,
Jens
My configuration:
[root@vdr jens]# mdadm --detail -v /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Dec 26 09:50:53 2014
Raid Level : raid1
Array Size : 1953381440 (1862.89 GiB 2000.26 GB)
Used Dev Size : 1953381440 (1862.89 GiB 2000.26 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Sep 24 17:30:51 2021
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
Name : vdr:0 (local to host vdr)
UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
Events : 32805
Number Major Minor RaidDevice State
2 8 33 0 active sync /dev/sdc1
3 8 17 1 active sync /dev/sdb1
[root@vdr jens]# mdadm -X /dev/sdb1
Filename : /dev/sdb1
Magic : 6d746962
Version : 4
UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
Events : 32804
Events Cleared : 32804
State : OK
Chunksize : 64 MB
Daemon : 5s flush period
Write Mode : Allow write behind, max 4096
Sync Size : 1953381440 (1862.89 GiB 2000.26 GB)
Bitmap : 29807 bits (chunks), 3 dirty (0.0%)
[root@vdr jens]# mdadm -X /dev/sdc1
Filename : /dev/sdc1
Magic : 6d746962
Version : 4
UUID : 5532ffda:ccbc790f:b50c4959:8f0fd43f
Events : 32804
Events Cleared : 32804
State : OK
Chunksize : 64 MB
Daemon : 5s flush period
Write Mode : Allow write behind, max 4096
Sync Size : 1953381440 (1862.89 GiB 2000.26 GB)
Bitmap : 29807 bits (chunks), 3 dirty (0.0%)
On 17.08.21 at 14:32, Ming Lei wrote:
> On Tue, Aug 17, 2021 at 06:06:12AM +0100, Christoph Hellwig wrote:
>> On Mon, Aug 16, 2021 at 07:40:48PM +0800, Ming Lei wrote:
>>>>> 0 ~ 254: each bvec's length is 512
>>>>> 255: bvec's length is 8192
>>>>>
>>>>> the total length is just 512*255 + 8192 = 138752 bytes = 271 sectors, but it
>>>>> still may need 257 bvecs, which can't be allocated via bio_alloc_bioset().
>>>> Yes, we still need the rounding magic that alloc_behind_master_bio uses
>>>> here.
>>> But it is wrong to use max sectors to limit the number of bvecs (segments), isn't it?
>> The raid1 write-behind code cares about the size a bio can reach by
>> adding order-0 pages to it. The bvecs are part of that, and I think the
>> calculation in the patch documents that as well.
> Thinking about it further, your and Guoqing's patches are correct and
> sufficient, since bio_copy_data() just copies a byte (sector) stream from
> the fs bio to the write-behind bio.
>
>
> Thanks,
> Ming
>
[-- Attachment #2: raid1.patch --]
[-- Type: text/x-patch, Size: 2245 bytes --]
diff --unified --recursive --text archlinux-linux/drivers/md/raid1.c archlinux-linux-diff/drivers/md/raid1.c
--- archlinux-linux/drivers/md/raid1.c 2021-09-24 14:37:15.347771866 +0200
+++ archlinux-linux-diff/drivers/md/raid1.c 2021-09-24 14:40:02.443978319 +0200
@@ -1501,7 +1501,7 @@
* Not if there are too many, or cannot
* allocate memory, or a reader on WriteMostly
* is waiting for behind writes to flush */
- if (bitmap &&
+ if (bitmap && write_behind &&
(atomic_read(&bitmap->behind_writes)
< mddev->bitmap_info.max_write_behind) &&
!waitqueue_active(&bitmap->behind_wait)) {
diff --unified --recursive --text archlinux-linux/drivers/md/raid1.c archlinux-linux-changed/drivers/md/raid1.c
--- archlinux-linux/drivers/md/raid1.c 2021-09-24 15:43:22.842680119 +0200
+++ archlinux-linux-changed/drivers/md/raid1.c 2021-09-24 15:43:59.426142955 +0200
@@ -1329,7 +1329,6 @@
struct raid1_plug_cb *plug = NULL;
int first_clone;
int max_sectors;
- bool write_behind = false;
if (mddev_is_clustered(mddev) &&
md_cluster_ops->area_resyncing(mddev, WRITE,
@@ -1383,14 +1382,6 @@
for (i = 0; i < disks; i++) {
struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
- /*
- * The write-behind io is only attempted on drives marked as
- * write-mostly, which means we could allocate write behind
- * bio later.
- */
- if (rdev && test_bit(WriteMostly, &rdev->flags))
- write_behind = true;
-
if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
atomic_inc(&rdev->nr_pending);
blocked_rdev = rdev;
@@ -1470,7 +1461,7 @@
* at a time and thus needs a new bio that can fit the whole payload
* this bio in page sized chunks.
*/
- if (write_behind && bitmap)
+ if (bitmap)
max_sectors = min_t(int, max_sectors,
BIO_MAX_VECS * (PAGE_SIZE >> 9));
if (max_sectors < bio_sectors(bio)) {
@@ -1501,7 +1492,7 @@
* Not if there are too many, or cannot
* allocate memory, or a reader on WriteMostly
* is waiting for behind writes to flush */
- if (bitmap &&
+ if (bitmap &&
(atomic_read(&bitmap->behind_writes)
< mddev->bitmap_info.max_write_behind) &&
!waitqueue_active(&bitmap->behind_wait)) {
* Re: [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors
2021-09-24 15:34 ` Jens Stutte (Archiv)
@ 2021-09-25 23:02 ` Guoqing Jiang
0 siblings, 0 replies; 17+ messages in thread
From: Guoqing Jiang @ 2021-09-25 23:02 UTC (permalink / raw)
To: Jens Stutte (Archiv), Ming Lei, Christoph Hellwig
Cc: song, linux-raid, linux-block
On 9/24/21 11:34 PM, Jens Stutte (Archiv) wrote:
> Hi all,
>
> I just had the occasion to test the new patch as landed in arch linux
> 5.14.7. Unfortunately it does not work for me. Attached you can find a
> modification that works for me, though I am not really sure why
> write_behind seems not to be set to true on my configuration. If there
> is any more data I can provide to help you to investigate, please let
> me know.
Thanks for the report! As commented in bugzilla [1], this is because
write-behind IO still happens even without a write-mostly device. I will
send a new patch after you confirm it works.
1. https://bugzilla.kernel.org/show_bug.cgi?id=213181
Thanks,
Guoqing
end of thread, other threads:[~2021-09-25 23:02 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-13 6:05 [PATCH] raid1: ensure bio doesn't have more than BIO_MAX_VECS sectors Guoqing Jiang
2021-08-13 7:49 ` Christoph Hellwig
2021-08-13 8:38 ` Guoqing Jiang
2021-08-14 7:55 ` Christoph Hellwig
2021-08-14 8:57 ` Ming Lei
2021-08-16 6:27 ` Guoqing Jiang
2021-08-16 7:13 ` Ming Lei
2021-08-16 9:37 ` Christoph Hellwig
2021-08-16 11:40 ` Ming Lei
2021-08-17 5:06 ` Christoph Hellwig
2021-08-17 12:32 ` Ming Lei
2021-09-24 15:34 ` Jens Stutte (Archiv)
2021-09-25 23:02 ` Guoqing Jiang
2021-08-13 9:27 ` kernel test robot
2021-08-13 9:27 ` kernel test robot
2021-08-13 10:12 ` kernel test robot
2021-08-13 10:12 ` kernel test robot