* [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors
@ 2021-08-18 7:37 Guoqing Jiang
2021-08-18 7:57 ` Guoqing Jiang
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Guoqing Jiang @ 2021-08-18 7:37 UTC (permalink / raw)
To: axboe; +Cc: linux-block, linux-raid
From: Guoqing Jiang <jiangguoqing@kylinos.cn>
We can't split write behind bio with more than BIO_MAX_VECS sectors,
otherwise the below call trace was triggered because we could allocate
oversized write behind bio later.
[ 8.097936] bvec_alloc+0x90/0xc0
[ 8.098934] bio_alloc_bioset+0x1b3/0x260
[ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]
[ 8.100988] ? __bio_clone_fast+0xa8/0xe0
[ 8.102008] md_handle_request+0x158/0x1d0 [md_mod]
[ 8.103050] md_submit_bio+0xcd/0x110 [md_mod]
[ 8.104084] submit_bio_noacct+0x139/0x530
[ 8.105127] submit_bio+0x78/0x1d0
[ 8.106163] ext4_io_submit+0x48/0x60 [ext4]
[ 8.107242] ext4_writepages+0x652/0x1170 [ext4]
[ 8.108300] ? do_writepages+0x41/0x100
[ 8.109338] ? __ext4_mark_inode_dirty+0x240/0x240 [ext4]
[ 8.110406] do_writepages+0x41/0x100
[ 8.111450] __filemap_fdatawrite_range+0xc5/0x100
[ 8.112513] file_write_and_wait_range+0x61/0xb0
[ 8.113564] ext4_sync_file+0x73/0x370 [ext4]
[ 8.114607] __x64_sys_fsync+0x33/0x60
[ 8.115635] do_syscall_64+0x33/0x40
[ 8.116670] entry_SYSCALL_64_after_hwframe+0x44/0xae
Thanks for the comment from Christoph.
[1]. https://bugs.archlinux.org/task/70992
Cc: Song Liu <song@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Jens Stutte <jens@chianterastutte.eu>
Tested-by: Jens Stutte <jens@chianterastutte.eu>
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
---
Hi Jens,
Could you consider apply this to block tree? Since it depends
on commit 018eca456c4b4dca56aaf1ec27f309c74d0fe246 in block
tree for-next branch, otherwise lkp would complain about compile
issue again.
Thanks,
Guoqing
V2 change:
1. add checking for write-behind case and relevant comments from Christoph.
drivers/md/raid1.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3c44c4bb40fc..e8c8e6bb0439 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1329,6 +1329,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
struct raid1_plug_cb *plug = NULL;
int first_clone;
int max_sectors;
+ bool write_behind = false;
if (mddev_is_clustered(mddev) &&
md_cluster_ops->area_resyncing(mddev, WRITE,
@@ -1381,6 +1382,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
max_sectors = r1_bio->sectors;
for (i = 0; i < disks; i++) {
struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
+
+ if (test_bit(WriteMostly, &mirror->rdev->flags))
+ write_behind = true;
+
if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
atomic_inc(&rdev->nr_pending);
blocked_rdev = rdev;
@@ -1454,6 +1459,14 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
goto retry_write;
}
+ /*
+ * When using a bitmap, we may call alloc_behind_master_bio below.
+ * alloc_behind_master_bio allocates a copy of the data payload a page
+ * at a time and thus needs a new bio that can fit the whole payload
+ * this bio in page sized chunks.
+ */
+ if (write_behind && bitmap)
+ max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
if (max_sectors < bio_sectors(bio)) {
struct bio *split = bio_split(bio, max_sectors,
GFP_NOIO, &conf->bio_split);
--
2.25.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors
2021-08-18 7:37 [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors Guoqing Jiang
@ 2021-08-18 7:57 ` Guoqing Jiang
2021-08-18 11:03 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Guoqing Jiang @ 2021-08-18 7:57 UTC (permalink / raw)
To: axboe; +Cc: linux-block, linux-raid, Song Liu, Christoph Hellwig
Missed to cc Song and Christoph, now actually do it ...
On 8/18/21 3:37 PM, Guoqing Jiang wrote:
> From: Guoqing Jiang <jiangguoqing@kylinos.cn>
>
> We can't split write behind bio with more than BIO_MAX_VECS sectors,
> otherwise the below call trace was triggered because we could allocate
> oversized write behind bio later.
>
> [ 8.097936] bvec_alloc+0x90/0xc0
> [ 8.098934] bio_alloc_bioset+0x1b3/0x260
> [ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]
> [ 8.100988] ? __bio_clone_fast+0xa8/0xe0
> [ 8.102008] md_handle_request+0x158/0x1d0 [md_mod]
> [ 8.103050] md_submit_bio+0xcd/0x110 [md_mod]
> [ 8.104084] submit_bio_noacct+0x139/0x530
> [ 8.105127] submit_bio+0x78/0x1d0
> [ 8.106163] ext4_io_submit+0x48/0x60 [ext4]
> [ 8.107242] ext4_writepages+0x652/0x1170 [ext4]
> [ 8.108300] ? do_writepages+0x41/0x100
> [ 8.109338] ? __ext4_mark_inode_dirty+0x240/0x240 [ext4]
> [ 8.110406] do_writepages+0x41/0x100
> [ 8.111450] __filemap_fdatawrite_range+0xc5/0x100
> [ 8.112513] file_write_and_wait_range+0x61/0xb0
> [ 8.113564] ext4_sync_file+0x73/0x370 [ext4]
> [ 8.114607] __x64_sys_fsync+0x33/0x60
> [ 8.115635] do_syscall_64+0x33/0x40
> [ 8.116670] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Thanks for the comment from Christoph.
>
> [1]. https://bugs.archlinux.org/task/70992
>
> Cc: Song Liu <song@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Reported-by: Jens Stutte <jens@chianterastutte.eu>
> Tested-by: Jens Stutte <jens@chianterastutte.eu>
> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
> ---
> Hi Jens,
>
> Could you consider apply this to block tree? Since it depends
> on commit 018eca456c4b4dca56aaf1ec27f309c74d0fe246 in block
> tree for-next branch, otherwise lkp would complain about compile
> issue again.
>
> Thanks,
> Guoqing
>
> V2 change:
> 1. add checking for write-behind case and relevant comments from Christoph.
>
> drivers/md/raid1.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 3c44c4bb40fc..e8c8e6bb0439 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1329,6 +1329,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> struct raid1_plug_cb *plug = NULL;
> int first_clone;
> int max_sectors;
> + bool write_behind = false;
>
> if (mddev_is_clustered(mddev) &&
> md_cluster_ops->area_resyncing(mddev, WRITE,
> @@ -1381,6 +1382,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> max_sectors = r1_bio->sectors;
> for (i = 0; i < disks; i++) {
> struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
> +
> + if (test_bit(WriteMostly, &mirror->rdev->flags))
> + write_behind = true;
> +
> if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
> atomic_inc(&rdev->nr_pending);
> blocked_rdev = rdev;
> @@ -1454,6 +1459,14 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> goto retry_write;
> }
>
> + /*
> + * When using a bitmap, we may call alloc_behind_master_bio below.
> + * alloc_behind_master_bio allocates a copy of the data payload a page
> + * at a time and thus needs a new bio that can fit the whole payload
> + * this bio in page sized chunks.
> + */
> + if (write_behind && bitmap)
> + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
> if (max_sectors < bio_sectors(bio)) {
> struct bio *split = bio_split(bio, max_sectors,
> GFP_NOIO, &conf->bio_split);
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors
2021-08-18 7:37 [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors Guoqing Jiang
@ 2021-08-18 11:03 ` kernel test robot
2021-08-18 11:03 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2021-08-18 11:03 UTC (permalink / raw)
To: Guoqing Jiang, axboe; +Cc: kbuild-all, linux-block, linux-raid
[-- Attachment #1: Type: text/plain, Size: 13405 bytes --]
Hi Guoqing,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on song-md/md-next]
[also build test ERROR on v5.14-rc6 next-20210818]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
base: git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: arc-randconfig-r043-20210816 (attached as .config)
compiler: arc-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/abf22557456363eb6fd1d1d09332f5261d61796c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
git checkout abf22557456363eb6fd1d1d09332f5261d61796c
# save the attached .config to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash drivers/md/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
drivers/md/raid1.c: In function 'raid1_write_request':
>> drivers/md/raid1.c:1388:44: error: 'mirror' undeclared (first use in this function); did you mean 'md_error'?
1388 | if (test_bit(WriteMostly, &mirror->rdev->flags))
| ^~~~~~
| md_error
drivers/md/raid1.c:1388:44: note: each undeclared identifier is reported only once for each function it appears in
In file included from include/linux/kernel.h:16,
from include/linux/list.h:9,
from include/linux/preempt.h:11,
from include/linux/spinlock.h:51,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:6,
from include/linux/slab.h:15,
from drivers/md/raid1.c:26:
drivers/md/raid1.c:1471:70: error: 'PAGE_SECTORS' undeclared (first use in this function)
1471 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~~~~~~~~
include/linux/minmax.h:20:46: note: in definition of macro '__typecheck'
20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
| ^
include/linux/minmax.h:36:31: note: in expansion of macro '__safe_cmp'
36 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~
include/linux/minmax.h:104:33: note: in expansion of macro '__careful_cmp'
104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
| ^~~~~~~~~~~~~
drivers/md/raid1.c:1471:31: note: in expansion of macro 'min_t'
1471 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~
include/linux/minmax.h:36:9: error: first argument to '__builtin_choose_expr' not a constant
36 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/minmax.h:104:33: note: in expansion of macro '__careful_cmp'
104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
| ^~~~~~~~~~~~~
drivers/md/raid1.c:1471:31: note: in expansion of macro 'min_t'
1471 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~
vim +1388 drivers/md/raid1.c
1320
1321 static void raid1_write_request(struct mddev *mddev, struct bio *bio,
1322 int max_write_sectors)
1323 {
1324 struct r1conf *conf = mddev->private;
1325 struct r1bio *r1_bio;
1326 int i, disks;
1327 struct bitmap *bitmap = mddev->bitmap;
1328 unsigned long flags;
1329 struct md_rdev *blocked_rdev;
1330 struct blk_plug_cb *cb;
1331 struct raid1_plug_cb *plug = NULL;
1332 int first_clone;
1333 int max_sectors;
1334 bool write_behind = false;
1335
1336 if (mddev_is_clustered(mddev) &&
1337 md_cluster_ops->area_resyncing(mddev, WRITE,
1338 bio->bi_iter.bi_sector, bio_end_sector(bio))) {
1339
1340 DEFINE_WAIT(w);
1341 for (;;) {
1342 prepare_to_wait(&conf->wait_barrier,
1343 &w, TASK_IDLE);
1344 if (!md_cluster_ops->area_resyncing(mddev, WRITE,
1345 bio->bi_iter.bi_sector,
1346 bio_end_sector(bio)))
1347 break;
1348 schedule();
1349 }
1350 finish_wait(&conf->wait_barrier, &w);
1351 }
1352
1353 /*
1354 * Register the new request and wait if the reconstruction
1355 * thread has put up a bar for new requests.
1356 * Continue immediately if no resync is active currently.
1357 */
1358 wait_barrier(conf, bio->bi_iter.bi_sector);
1359
1360 r1_bio = alloc_r1bio(mddev, bio);
1361 r1_bio->sectors = max_write_sectors;
1362
1363 if (conf->pending_count >= max_queued_requests) {
1364 md_wakeup_thread(mddev->thread);
1365 raid1_log(mddev, "wait queued");
1366 wait_event(conf->wait_barrier,
1367 conf->pending_count < max_queued_requests);
1368 }
1369 /* first select target devices under rcu_lock and
1370 * inc refcount on their rdev. Record them by setting
1371 * bios[x] to bio
1372 * If there are known/acknowledged bad blocks on any device on
1373 * which we have seen a write error, we want to avoid writing those
1374 * blocks.
1375 * This potentially requires several writes to write around
1376 * the bad blocks. Each set of writes gets it's own r1bio
1377 * with a set of bios attached.
1378 */
1379
1380 disks = conf->raid_disks * 2;
1381 retry_write:
1382 blocked_rdev = NULL;
1383 rcu_read_lock();
1384 max_sectors = r1_bio->sectors;
1385 for (i = 0; i < disks; i++) {
1386 struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
1387
> 1388 if (test_bit(WriteMostly, &mirror->rdev->flags))
1389 write_behind = true;
1390
1391 if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
1392 atomic_inc(&rdev->nr_pending);
1393 blocked_rdev = rdev;
1394 break;
1395 }
1396 r1_bio->bios[i] = NULL;
1397 if (!rdev || test_bit(Faulty, &rdev->flags)) {
1398 if (i < conf->raid_disks)
1399 set_bit(R1BIO_Degraded, &r1_bio->state);
1400 continue;
1401 }
1402
1403 atomic_inc(&rdev->nr_pending);
1404 if (test_bit(WriteErrorSeen, &rdev->flags)) {
1405 sector_t first_bad;
1406 int bad_sectors;
1407 int is_bad;
1408
1409 is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
1410 &first_bad, &bad_sectors);
1411 if (is_bad < 0) {
1412 /* mustn't write here until the bad block is
1413 * acknowledged*/
1414 set_bit(BlockedBadBlocks, &rdev->flags);
1415 blocked_rdev = rdev;
1416 break;
1417 }
1418 if (is_bad && first_bad <= r1_bio->sector) {
1419 /* Cannot write here at all */
1420 bad_sectors -= (r1_bio->sector - first_bad);
1421 if (bad_sectors < max_sectors)
1422 /* mustn't write more than bad_sectors
1423 * to other devices yet
1424 */
1425 max_sectors = bad_sectors;
1426 rdev_dec_pending(rdev, mddev);
1427 /* We don't set R1BIO_Degraded as that
1428 * only applies if the disk is
1429 * missing, so it might be re-added,
1430 * and we want to know to recover this
1431 * chunk.
1432 * In this case the device is here,
1433 * and the fact that this chunk is not
1434 * in-sync is recorded in the bad
1435 * block log
1436 */
1437 continue;
1438 }
1439 if (is_bad) {
1440 int good_sectors = first_bad - r1_bio->sector;
1441 if (good_sectors < max_sectors)
1442 max_sectors = good_sectors;
1443 }
1444 }
1445 r1_bio->bios[i] = bio;
1446 }
1447 rcu_read_unlock();
1448
1449 if (unlikely(blocked_rdev)) {
1450 /* Wait for this device to become unblocked */
1451 int j;
1452
1453 for (j = 0; j < i; j++)
1454 if (r1_bio->bios[j])
1455 rdev_dec_pending(conf->mirrors[j].rdev, mddev);
1456 r1_bio->state = 0;
1457 allow_barrier(conf, bio->bi_iter.bi_sector);
1458 raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
1459 md_wait_for_blocked_rdev(blocked_rdev, mddev);
1460 wait_barrier(conf, bio->bi_iter.bi_sector);
1461 goto retry_write;
1462 }
1463
1464 /*
1465 * When using a bitmap, we may call alloc_behind_master_bio below.
1466 * alloc_behind_master_bio allocates a copy of the data payload a page
1467 * at a time and thus needs a new bio that can fit the whole payload
1468 * this bio in page sized chunks.
1469 */
1470 if (write_behind && bitmap)
1471 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
1472 if (max_sectors < bio_sectors(bio)) {
1473 struct bio *split = bio_split(bio, max_sectors,
1474 GFP_NOIO, &conf->bio_split);
1475 bio_chain(split, bio);
1476 submit_bio_noacct(bio);
1477 bio = split;
1478 r1_bio->master_bio = bio;
1479 r1_bio->sectors = max_sectors;
1480 }
1481
1482 if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
1483 r1_bio->start_time = bio_start_io_acct(bio);
1484 atomic_set(&r1_bio->remaining, 1);
1485 atomic_set(&r1_bio->behind_remaining, 0);
1486
1487 first_clone = 1;
1488
1489 for (i = 0; i < disks; i++) {
1490 struct bio *mbio = NULL;
1491 struct md_rdev *rdev = conf->mirrors[i].rdev;
1492 if (!r1_bio->bios[i])
1493 continue;
1494
1495 if (first_clone) {
1496 /* do behind I/O ?
1497 * Not if there are too many, or cannot
1498 * allocate memory, or a reader on WriteMostly
1499 * is waiting for behind writes to flush */
1500 if (bitmap &&
1501 (atomic_read(&bitmap->behind_writes)
1502 < mddev->bitmap_info.max_write_behind) &&
1503 !waitqueue_active(&bitmap->behind_wait)) {
1504 alloc_behind_master_bio(r1_bio, bio);
1505 }
1506
1507 md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors,
1508 test_bit(R1BIO_BehindIO, &r1_bio->state));
1509 first_clone = 0;
1510 }
1511
1512 if (r1_bio->behind_master_bio)
1513 mbio = bio_clone_fast(r1_bio->behind_master_bio,
1514 GFP_NOIO, &mddev->bio_set);
1515 else
1516 mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set);
1517
1518 if (r1_bio->behind_master_bio) {
1519 if (test_bit(CollisionCheck, &rdev->flags))
1520 wait_for_serialization(rdev, r1_bio);
1521 if (test_bit(WriteMostly, &rdev->flags))
1522 atomic_inc(&r1_bio->behind_remaining);
1523 } else if (mddev->serialize_policy)
1524 wait_for_serialization(rdev, r1_bio);
1525
1526 r1_bio->bios[i] = mbio;
1527
1528 mbio->bi_iter.bi_sector = (r1_bio->sector +
1529 conf->mirrors[i].rdev->data_offset);
1530 bio_set_dev(mbio, conf->mirrors[i].rdev->bdev);
1531 mbio->bi_end_io = raid1_end_write_request;
1532 mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
1533 if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) &&
1534 !test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) &&
1535 conf->raid_disks - mddev->degraded > 1)
1536 mbio->bi_opf |= MD_FAILFAST;
1537 mbio->bi_private = r1_bio;
1538
1539 atomic_inc(&r1_bio->remaining);
1540
1541 if (mddev->gendisk)
1542 trace_block_bio_remap(mbio, disk_devt(mddev->gendisk),
1543 r1_bio->sector);
1544 /* flush_pending_writes() needs access to the rdev so...*/
1545 mbio->bi_bdev = (void *)conf->mirrors[i].rdev;
1546
1547 cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
1548 if (cb)
1549 plug = container_of(cb, struct raid1_plug_cb, cb);
1550 else
1551 plug = NULL;
1552 if (plug) {
1553 bio_list_add(&plug->pending, mbio);
1554 plug->pending_cnt++;
1555 } else {
1556 spin_lock_irqsave(&conf->device_lock, flags);
1557 bio_list_add(&conf->pending_bio_list, mbio);
1558 conf->pending_count++;
1559 spin_unlock_irqrestore(&conf->device_lock, flags);
1560 md_wakeup_thread(mddev->thread);
1561 }
1562 }
1563
1564 r1_bio_write_done(r1_bio);
1565
1566 /* In case raid1d snuck in to freeze_array */
1567 wake_up(&conf->wait_barrier);
1568 }
1569
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32880 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors
@ 2021-08-18 11:03 ` kernel test robot
0 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2021-08-18 11:03 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 13733 bytes --]
Hi Guoqing,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on song-md/md-next]
[also build test ERROR on v5.14-rc6 next-20210818]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
base: git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: arc-randconfig-r043-20210816 (attached as .config)
compiler: arc-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/abf22557456363eb6fd1d1d09332f5261d61796c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
git checkout abf22557456363eb6fd1d1d09332f5261d61796c
# save the attached .config to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash drivers/md/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
drivers/md/raid1.c: In function 'raid1_write_request':
>> drivers/md/raid1.c:1388:44: error: 'mirror' undeclared (first use in this function); did you mean 'md_error'?
1388 | if (test_bit(WriteMostly, &mirror->rdev->flags))
| ^~~~~~
| md_error
drivers/md/raid1.c:1388:44: note: each undeclared identifier is reported only once for each function it appears in
In file included from include/linux/kernel.h:16,
from include/linux/list.h:9,
from include/linux/preempt.h:11,
from include/linux/spinlock.h:51,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:6,
from include/linux/slab.h:15,
from drivers/md/raid1.c:26:
drivers/md/raid1.c:1471:70: error: 'PAGE_SECTORS' undeclared (first use in this function)
1471 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~~~~~~~~
include/linux/minmax.h:20:46: note: in definition of macro '__typecheck'
20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
| ^
include/linux/minmax.h:36:31: note: in expansion of macro '__safe_cmp'
36 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~
include/linux/minmax.h:104:33: note: in expansion of macro '__careful_cmp'
104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
| ^~~~~~~~~~~~~
drivers/md/raid1.c:1471:31: note: in expansion of macro 'min_t'
1471 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~
include/linux/minmax.h:36:9: error: first argument to '__builtin_choose_expr' not a constant
36 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/minmax.h:104:33: note: in expansion of macro '__careful_cmp'
104 | #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
| ^~~~~~~~~~~~~
drivers/md/raid1.c:1471:31: note: in expansion of macro 'min_t'
1471 | max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
| ^~~~~
vim +1388 drivers/md/raid1.c
1320
1321 static void raid1_write_request(struct mddev *mddev, struct bio *bio,
1322 int max_write_sectors)
1323 {
1324 struct r1conf *conf = mddev->private;
1325 struct r1bio *r1_bio;
1326 int i, disks;
1327 struct bitmap *bitmap = mddev->bitmap;
1328 unsigned long flags;
1329 struct md_rdev *blocked_rdev;
1330 struct blk_plug_cb *cb;
1331 struct raid1_plug_cb *plug = NULL;
1332 int first_clone;
1333 int max_sectors;
1334 bool write_behind = false;
1335
1336 if (mddev_is_clustered(mddev) &&
1337 md_cluster_ops->area_resyncing(mddev, WRITE,
1338 bio->bi_iter.bi_sector, bio_end_sector(bio))) {
1339
1340 DEFINE_WAIT(w);
1341 for (;;) {
1342 prepare_to_wait(&conf->wait_barrier,
1343 &w, TASK_IDLE);
1344 if (!md_cluster_ops->area_resyncing(mddev, WRITE,
1345 bio->bi_iter.bi_sector,
1346 bio_end_sector(bio)))
1347 break;
1348 schedule();
1349 }
1350 finish_wait(&conf->wait_barrier, &w);
1351 }
1352
1353 /*
1354 * Register the new request and wait if the reconstruction
1355 * thread has put up a bar for new requests.
1356 * Continue immediately if no resync is active currently.
1357 */
1358 wait_barrier(conf, bio->bi_iter.bi_sector);
1359
1360 r1_bio = alloc_r1bio(mddev, bio);
1361 r1_bio->sectors = max_write_sectors;
1362
1363 if (conf->pending_count >= max_queued_requests) {
1364 md_wakeup_thread(mddev->thread);
1365 raid1_log(mddev, "wait queued");
1366 wait_event(conf->wait_barrier,
1367 conf->pending_count < max_queued_requests);
1368 }
1369 /* first select target devices under rcu_lock and
1370 * inc refcount on their rdev. Record them by setting
1371 * bios[x] to bio
1372 * If there are known/acknowledged bad blocks on any device on
1373 * which we have seen a write error, we want to avoid writing those
1374 * blocks.
1375 * This potentially requires several writes to write around
1376 * the bad blocks. Each set of writes gets it's own r1bio
1377 * with a set of bios attached.
1378 */
1379
1380 disks = conf->raid_disks * 2;
1381 retry_write:
1382 blocked_rdev = NULL;
1383 rcu_read_lock();
1384 max_sectors = r1_bio->sectors;
1385 for (i = 0; i < disks; i++) {
1386 struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
1387
> 1388 if (test_bit(WriteMostly, &mirror->rdev->flags))
1389 write_behind = true;
1390
1391 if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
1392 atomic_inc(&rdev->nr_pending);
1393 blocked_rdev = rdev;
1394 break;
1395 }
1396 r1_bio->bios[i] = NULL;
1397 if (!rdev || test_bit(Faulty, &rdev->flags)) {
1398 if (i < conf->raid_disks)
1399 set_bit(R1BIO_Degraded, &r1_bio->state);
1400 continue;
1401 }
1402
1403 atomic_inc(&rdev->nr_pending);
1404 if (test_bit(WriteErrorSeen, &rdev->flags)) {
1405 sector_t first_bad;
1406 int bad_sectors;
1407 int is_bad;
1408
1409 is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
1410 &first_bad, &bad_sectors);
1411 if (is_bad < 0) {
1412 /* mustn't write here until the bad block is
1413 * acknowledged*/
1414 set_bit(BlockedBadBlocks, &rdev->flags);
1415 blocked_rdev = rdev;
1416 break;
1417 }
1418 if (is_bad && first_bad <= r1_bio->sector) {
1419 /* Cannot write here at all */
1420 bad_sectors -= (r1_bio->sector - first_bad);
1421 if (bad_sectors < max_sectors)
1422 /* mustn't write more than bad_sectors
1423 * to other devices yet
1424 */
1425 max_sectors = bad_sectors;
1426 rdev_dec_pending(rdev, mddev);
1427 /* We don't set R1BIO_Degraded as that
1428 * only applies if the disk is
1429 * missing, so it might be re-added,
1430 * and we want to know to recover this
1431 * chunk.
1432 * In this case the device is here,
1433 * and the fact that this chunk is not
1434 * in-sync is recorded in the bad
1435 * block log
1436 */
1437 continue;
1438 }
1439 if (is_bad) {
1440 int good_sectors = first_bad - r1_bio->sector;
1441 if (good_sectors < max_sectors)
1442 max_sectors = good_sectors;
1443 }
1444 }
1445 r1_bio->bios[i] = bio;
1446 }
1447 rcu_read_unlock();
1448
1449 if (unlikely(blocked_rdev)) {
1450 /* Wait for this device to become unblocked */
1451 int j;
1452
1453 for (j = 0; j < i; j++)
1454 if (r1_bio->bios[j])
1455 rdev_dec_pending(conf->mirrors[j].rdev, mddev);
1456 r1_bio->state = 0;
1457 allow_barrier(conf, bio->bi_iter.bi_sector);
1458 raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
1459 md_wait_for_blocked_rdev(blocked_rdev, mddev);
1460 wait_barrier(conf, bio->bi_iter.bi_sector);
1461 goto retry_write;
1462 }
1463
1464 /*
1465 * When using a bitmap, we may call alloc_behind_master_bio below.
1466 * alloc_behind_master_bio allocates a copy of the data payload a page
1467 * at a time and thus needs a new bio that can fit the whole payload
1468 * this bio in page sized chunks.
1469 */
1470 if (write_behind && bitmap)
1471 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
1472 if (max_sectors < bio_sectors(bio)) {
1473 struct bio *split = bio_split(bio, max_sectors,
1474 GFP_NOIO, &conf->bio_split);
1475 bio_chain(split, bio);
1476 submit_bio_noacct(bio);
1477 bio = split;
1478 r1_bio->master_bio = bio;
1479 r1_bio->sectors = max_sectors;
1480 }
1481
1482 if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
1483 r1_bio->start_time = bio_start_io_acct(bio);
1484 atomic_set(&r1_bio->remaining, 1);
1485 atomic_set(&r1_bio->behind_remaining, 0);
1486
1487 first_clone = 1;
1488
1489 for (i = 0; i < disks; i++) {
1490 struct bio *mbio = NULL;
1491 struct md_rdev *rdev = conf->mirrors[i].rdev;
1492 if (!r1_bio->bios[i])
1493 continue;
1494
1495 if (first_clone) {
1496 /* do behind I/O ?
1497 * Not if there are too many, or cannot
1498 * allocate memory, or a reader on WriteMostly
1499 * is waiting for behind writes to flush */
1500 if (bitmap &&
1501 (atomic_read(&bitmap->behind_writes)
1502 < mddev->bitmap_info.max_write_behind) &&
1503 !waitqueue_active(&bitmap->behind_wait)) {
1504 alloc_behind_master_bio(r1_bio, bio);
1505 }
1506
1507 md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors,
1508 test_bit(R1BIO_BehindIO, &r1_bio->state));
1509 first_clone = 0;
1510 }
1511
1512 if (r1_bio->behind_master_bio)
1513 mbio = bio_clone_fast(r1_bio->behind_master_bio,
1514 GFP_NOIO, &mddev->bio_set);
1515 else
1516 mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set);
1517
1518 if (r1_bio->behind_master_bio) {
1519 if (test_bit(CollisionCheck, &rdev->flags))
1520 wait_for_serialization(rdev, r1_bio);
1521 if (test_bit(WriteMostly, &rdev->flags))
1522 atomic_inc(&r1_bio->behind_remaining);
1523 } else if (mddev->serialize_policy)
1524 wait_for_serialization(rdev, r1_bio);
1525
1526 r1_bio->bios[i] = mbio;
1527
1528 mbio->bi_iter.bi_sector = (r1_bio->sector +
1529 conf->mirrors[i].rdev->data_offset);
1530 bio_set_dev(mbio, conf->mirrors[i].rdev->bdev);
1531 mbio->bi_end_io = raid1_end_write_request;
1532 mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
1533 if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) &&
1534 !test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) &&
1535 conf->raid_disks - mddev->degraded > 1)
1536 mbio->bi_opf |= MD_FAILFAST;
1537 mbio->bi_private = r1_bio;
1538
1539 atomic_inc(&r1_bio->remaining);
1540
1541 if (mddev->gendisk)
1542 trace_block_bio_remap(mbio, disk_devt(mddev->gendisk),
1543 r1_bio->sector);
1544 /* flush_pending_writes() needs access to the rdev so...*/
1545 mbio->bi_bdev = (void *)conf->mirrors[i].rdev;
1546
1547 cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
1548 if (cb)
1549 plug = container_of(cb, struct raid1_plug_cb, cb);
1550 else
1551 plug = NULL;
1552 if (plug) {
1553 bio_list_add(&plug->pending, mbio);
1554 plug->pending_cnt++;
1555 } else {
1556 spin_lock_irqsave(&conf->device_lock, flags);
1557 bio_list_add(&conf->pending_bio_list, mbio);
1558 conf->pending_count++;
1559 spin_unlock_irqrestore(&conf->device_lock, flags);
1560 md_wakeup_thread(mddev->thread);
1561 }
1562 }
1563
1564 r1_bio_write_done(r1_bio);
1565
1566 /* In case raid1d snuck in to freeze_array */
1567 wake_up(&conf->wait_barrier);
1568 }
1569
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 32880 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors
2021-08-18 7:37 [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors Guoqing Jiang
@ 2021-08-18 12:02 ` kernel test robot
2021-08-18 11:03 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2021-08-18 12:02 UTC (permalink / raw)
To: Guoqing Jiang, axboe
Cc: clang-built-linux, kbuild-all, linux-block, linux-raid
[-- Attachment #1: Type: text/plain, Size: 11419 bytes --]
Hi Guoqing,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on song-md/md-next]
[also build test ERROR on v5.14-rc6 next-20210818]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
base: git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: x86_64-randconfig-a011-20210816 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project d2b574a4dea5b718e4386bf2e26af0126e5978ce)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/abf22557456363eb6fd1d1d09332f5261d61796c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
git checkout abf22557456363eb6fd1d1d09332f5261d61796c
# save the attached .config to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/md/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> drivers/md/raid1.c:1388:30: error: use of undeclared identifier 'mirror'
if (test_bit(WriteMostly, &mirror->rdev->flags))
^
drivers/md/raid1.c:1471:56: error: use of undeclared identifier 'PAGE_SECTORS'
max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
^
drivers/md/raid1.c:1471:56: error: use of undeclared identifier 'PAGE_SECTORS'
3 errors generated.
vim +/mirror +1388 drivers/md/raid1.c
1320
1321 static void raid1_write_request(struct mddev *mddev, struct bio *bio,
1322 int max_write_sectors)
1323 {
1324 struct r1conf *conf = mddev->private;
1325 struct r1bio *r1_bio;
1326 int i, disks;
1327 struct bitmap *bitmap = mddev->bitmap;
1328 unsigned long flags;
1329 struct md_rdev *blocked_rdev;
1330 struct blk_plug_cb *cb;
1331 struct raid1_plug_cb *plug = NULL;
1332 int first_clone;
1333 int max_sectors;
1334 bool write_behind = false;
1335
1336 if (mddev_is_clustered(mddev) &&
1337 md_cluster_ops->area_resyncing(mddev, WRITE,
1338 bio->bi_iter.bi_sector, bio_end_sector(bio))) {
1339
1340 DEFINE_WAIT(w);
1341 for (;;) {
1342 prepare_to_wait(&conf->wait_barrier,
1343 &w, TASK_IDLE);
1344 if (!md_cluster_ops->area_resyncing(mddev, WRITE,
1345 bio->bi_iter.bi_sector,
1346 bio_end_sector(bio)))
1347 break;
1348 schedule();
1349 }
1350 finish_wait(&conf->wait_barrier, &w);
1351 }
1352
1353 /*
1354 * Register the new request and wait if the reconstruction
1355 * thread has put up a bar for new requests.
1356 * Continue immediately if no resync is active currently.
1357 */
1358 wait_barrier(conf, bio->bi_iter.bi_sector);
1359
1360 r1_bio = alloc_r1bio(mddev, bio);
1361 r1_bio->sectors = max_write_sectors;
1362
1363 if (conf->pending_count >= max_queued_requests) {
1364 md_wakeup_thread(mddev->thread);
1365 raid1_log(mddev, "wait queued");
1366 wait_event(conf->wait_barrier,
1367 conf->pending_count < max_queued_requests);
1368 }
1369 /* first select target devices under rcu_lock and
1370 * inc refcount on their rdev. Record them by setting
1371 * bios[x] to bio
1372 * If there are known/acknowledged bad blocks on any device on
1373 * which we have seen a write error, we want to avoid writing those
1374 * blocks.
1375 * This potentially requires several writes to write around
1376 * the bad blocks. Each set of writes gets it's own r1bio
1377 * with a set of bios attached.
1378 */
1379
1380 disks = conf->raid_disks * 2;
1381 retry_write:
1382 blocked_rdev = NULL;
1383 rcu_read_lock();
1384 max_sectors = r1_bio->sectors;
1385 for (i = 0; i < disks; i++) {
1386 struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
1387
> 1388 if (test_bit(WriteMostly, &mirror->rdev->flags))
1389 write_behind = true;
1390
1391 if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
1392 atomic_inc(&rdev->nr_pending);
1393 blocked_rdev = rdev;
1394 break;
1395 }
1396 r1_bio->bios[i] = NULL;
1397 if (!rdev || test_bit(Faulty, &rdev->flags)) {
1398 if (i < conf->raid_disks)
1399 set_bit(R1BIO_Degraded, &r1_bio->state);
1400 continue;
1401 }
1402
1403 atomic_inc(&rdev->nr_pending);
1404 if (test_bit(WriteErrorSeen, &rdev->flags)) {
1405 sector_t first_bad;
1406 int bad_sectors;
1407 int is_bad;
1408
1409 is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
1410 &first_bad, &bad_sectors);
1411 if (is_bad < 0) {
1412 /* mustn't write here until the bad block is
1413 * acknowledged*/
1414 set_bit(BlockedBadBlocks, &rdev->flags);
1415 blocked_rdev = rdev;
1416 break;
1417 }
1418 if (is_bad && first_bad <= r1_bio->sector) {
1419 /* Cannot write here at all */
1420 bad_sectors -= (r1_bio->sector - first_bad);
1421 if (bad_sectors < max_sectors)
1422 /* mustn't write more than bad_sectors
1423 * to other devices yet
1424 */
1425 max_sectors = bad_sectors;
1426 rdev_dec_pending(rdev, mddev);
1427 /* We don't set R1BIO_Degraded as that
1428 * only applies if the disk is
1429 * missing, so it might be re-added,
1430 * and we want to know to recover this
1431 * chunk.
1432 * In this case the device is here,
1433 * and the fact that this chunk is not
1434 * in-sync is recorded in the bad
1435 * block log
1436 */
1437 continue;
1438 }
1439 if (is_bad) {
1440 int good_sectors = first_bad - r1_bio->sector;
1441 if (good_sectors < max_sectors)
1442 max_sectors = good_sectors;
1443 }
1444 }
1445 r1_bio->bios[i] = bio;
1446 }
1447 rcu_read_unlock();
1448
1449 if (unlikely(blocked_rdev)) {
1450 /* Wait for this device to become unblocked */
1451 int j;
1452
1453 for (j = 0; j < i; j++)
1454 if (r1_bio->bios[j])
1455 rdev_dec_pending(conf->mirrors[j].rdev, mddev);
1456 r1_bio->state = 0;
1457 allow_barrier(conf, bio->bi_iter.bi_sector);
1458 raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
1459 md_wait_for_blocked_rdev(blocked_rdev, mddev);
1460 wait_barrier(conf, bio->bi_iter.bi_sector);
1461 goto retry_write;
1462 }
1463
1464 /*
1465 * When using a bitmap, we may call alloc_behind_master_bio below.
1466 * alloc_behind_master_bio allocates a copy of the data payload a page
1467 * at a time and thus needs a new bio that can fit the whole payload
1468 * this bio in page sized chunks.
1469 */
1470 if (write_behind && bitmap)
1471 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
1472 if (max_sectors < bio_sectors(bio)) {
1473 struct bio *split = bio_split(bio, max_sectors,
1474 GFP_NOIO, &conf->bio_split);
1475 bio_chain(split, bio);
1476 submit_bio_noacct(bio);
1477 bio = split;
1478 r1_bio->master_bio = bio;
1479 r1_bio->sectors = max_sectors;
1480 }
1481
1482 if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
1483 r1_bio->start_time = bio_start_io_acct(bio);
1484 atomic_set(&r1_bio->remaining, 1);
1485 atomic_set(&r1_bio->behind_remaining, 0);
1486
1487 first_clone = 1;
1488
1489 for (i = 0; i < disks; i++) {
1490 struct bio *mbio = NULL;
1491 struct md_rdev *rdev = conf->mirrors[i].rdev;
1492 if (!r1_bio->bios[i])
1493 continue;
1494
1495 if (first_clone) {
1496 /* do behind I/O ?
1497 * Not if there are too many, or cannot
1498 * allocate memory, or a reader on WriteMostly
1499 * is waiting for behind writes to flush */
1500 if (bitmap &&
1501 (atomic_read(&bitmap->behind_writes)
1502 < mddev->bitmap_info.max_write_behind) &&
1503 !waitqueue_active(&bitmap->behind_wait)) {
1504 alloc_behind_master_bio(r1_bio, bio);
1505 }
1506
1507 md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors,
1508 test_bit(R1BIO_BehindIO, &r1_bio->state));
1509 first_clone = 0;
1510 }
1511
1512 if (r1_bio->behind_master_bio)
1513 mbio = bio_clone_fast(r1_bio->behind_master_bio,
1514 GFP_NOIO, &mddev->bio_set);
1515 else
1516 mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set);
1517
1518 if (r1_bio->behind_master_bio) {
1519 if (test_bit(CollisionCheck, &rdev->flags))
1520 wait_for_serialization(rdev, r1_bio);
1521 if (test_bit(WriteMostly, &rdev->flags))
1522 atomic_inc(&r1_bio->behind_remaining);
1523 } else if (mddev->serialize_policy)
1524 wait_for_serialization(rdev, r1_bio);
1525
1526 r1_bio->bios[i] = mbio;
1527
1528 mbio->bi_iter.bi_sector = (r1_bio->sector +
1529 conf->mirrors[i].rdev->data_offset);
1530 bio_set_dev(mbio, conf->mirrors[i].rdev->bdev);
1531 mbio->bi_end_io = raid1_end_write_request;
1532 mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
1533 if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) &&
1534 !test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) &&
1535 conf->raid_disks - mddev->degraded > 1)
1536 mbio->bi_opf |= MD_FAILFAST;
1537 mbio->bi_private = r1_bio;
1538
1539 atomic_inc(&r1_bio->remaining);
1540
1541 if (mddev->gendisk)
1542 trace_block_bio_remap(mbio, disk_devt(mddev->gendisk),
1543 r1_bio->sector);
1544 /* flush_pending_writes() needs access to the rdev so...*/
1545 mbio->bi_bdev = (void *)conf->mirrors[i].rdev;
1546
1547 cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
1548 if (cb)
1549 plug = container_of(cb, struct raid1_plug_cb, cb);
1550 else
1551 plug = NULL;
1552 if (plug) {
1553 bio_list_add(&plug->pending, mbio);
1554 plug->pending_cnt++;
1555 } else {
1556 spin_lock_irqsave(&conf->device_lock, flags);
1557 bio_list_add(&conf->pending_bio_list, mbio);
1558 conf->pending_count++;
1559 spin_unlock_irqrestore(&conf->device_lock, flags);
1560 md_wakeup_thread(mddev->thread);
1561 }
1562 }
1563
1564 r1_bio_write_done(r1_bio);
1565
1566 /* In case raid1d snuck in to freeze_array */
1567 wake_up(&conf->wait_barrier);
1568 }
1569
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33731 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors
@ 2021-08-18 12:02 ` kernel test robot
0 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2021-08-18 12:02 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 11717 bytes --]
Hi Guoqing,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on song-md/md-next]
[also build test ERROR on v5.14-rc6 next-20210818]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
base: git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: x86_64-randconfig-a011-20210816 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project d2b574a4dea5b718e4386bf2e26af0126e5978ce)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/abf22557456363eb6fd1d1d09332f5261d61796c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Guoqing-Jiang/raid1-ensure-write-behind-bio-has-less-than-BIO_MAX_VECS-sectors/20210818-154106
git checkout abf22557456363eb6fd1d1d09332f5261d61796c
# save the attached .config to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/md/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> drivers/md/raid1.c:1388:30: error: use of undeclared identifier 'mirror'
if (test_bit(WriteMostly, &mirror->rdev->flags))
^
drivers/md/raid1.c:1471:56: error: use of undeclared identifier 'PAGE_SECTORS'
max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
^
drivers/md/raid1.c:1471:56: error: use of undeclared identifier 'PAGE_SECTORS'
3 errors generated.
vim +/mirror +1388 drivers/md/raid1.c
1320
1321 static void raid1_write_request(struct mddev *mddev, struct bio *bio,
1322 int max_write_sectors)
1323 {
1324 struct r1conf *conf = mddev->private;
1325 struct r1bio *r1_bio;
1326 int i, disks;
1327 struct bitmap *bitmap = mddev->bitmap;
1328 unsigned long flags;
1329 struct md_rdev *blocked_rdev;
1330 struct blk_plug_cb *cb;
1331 struct raid1_plug_cb *plug = NULL;
1332 int first_clone;
1333 int max_sectors;
1334 bool write_behind = false;
1335
1336 if (mddev_is_clustered(mddev) &&
1337 md_cluster_ops->area_resyncing(mddev, WRITE,
1338 bio->bi_iter.bi_sector, bio_end_sector(bio))) {
1339
1340 DEFINE_WAIT(w);
1341 for (;;) {
1342 prepare_to_wait(&conf->wait_barrier,
1343 &w, TASK_IDLE);
1344 if (!md_cluster_ops->area_resyncing(mddev, WRITE,
1345 bio->bi_iter.bi_sector,
1346 bio_end_sector(bio)))
1347 break;
1348 schedule();
1349 }
1350 finish_wait(&conf->wait_barrier, &w);
1351 }
1352
1353 /*
1354 * Register the new request and wait if the reconstruction
1355 * thread has put up a bar for new requests.
1356 * Continue immediately if no resync is active currently.
1357 */
1358 wait_barrier(conf, bio->bi_iter.bi_sector);
1359
1360 r1_bio = alloc_r1bio(mddev, bio);
1361 r1_bio->sectors = max_write_sectors;
1362
1363 if (conf->pending_count >= max_queued_requests) {
1364 md_wakeup_thread(mddev->thread);
1365 raid1_log(mddev, "wait queued");
1366 wait_event(conf->wait_barrier,
1367 conf->pending_count < max_queued_requests);
1368 }
1369 /* first select target devices under rcu_lock and
1370 * inc refcount on their rdev. Record them by setting
1371 * bios[x] to bio
1372 * If there are known/acknowledged bad blocks on any device on
1373 * which we have seen a write error, we want to avoid writing those
1374 * blocks.
1375 * This potentially requires several writes to write around
1376 * the bad blocks. Each set of writes gets it's own r1bio
1377 * with a set of bios attached.
1378 */
1379
1380 disks = conf->raid_disks * 2;
1381 retry_write:
1382 blocked_rdev = NULL;
1383 rcu_read_lock();
1384 max_sectors = r1_bio->sectors;
1385 for (i = 0; i < disks; i++) {
1386 struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
1387
> 1388 if (test_bit(WriteMostly, &mirror->rdev->flags))
1389 write_behind = true;
1390
1391 if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
1392 atomic_inc(&rdev->nr_pending);
1393 blocked_rdev = rdev;
1394 break;
1395 }
1396 r1_bio->bios[i] = NULL;
1397 if (!rdev || test_bit(Faulty, &rdev->flags)) {
1398 if (i < conf->raid_disks)
1399 set_bit(R1BIO_Degraded, &r1_bio->state);
1400 continue;
1401 }
1402
1403 atomic_inc(&rdev->nr_pending);
1404 if (test_bit(WriteErrorSeen, &rdev->flags)) {
1405 sector_t first_bad;
1406 int bad_sectors;
1407 int is_bad;
1408
1409 is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
1410 &first_bad, &bad_sectors);
1411 if (is_bad < 0) {
1412 /* mustn't write here until the bad block is
1413 * acknowledged*/
1414 set_bit(BlockedBadBlocks, &rdev->flags);
1415 blocked_rdev = rdev;
1416 break;
1417 }
1418 if (is_bad && first_bad <= r1_bio->sector) {
1419 /* Cannot write here at all */
1420 bad_sectors -= (r1_bio->sector - first_bad);
1421 if (bad_sectors < max_sectors)
1422 /* mustn't write more than bad_sectors
1423 * to other devices yet
1424 */
1425 max_sectors = bad_sectors;
1426 rdev_dec_pending(rdev, mddev);
1427 /* We don't set R1BIO_Degraded as that
1428 * only applies if the disk is
1429 * missing, so it might be re-added,
1430 * and we want to know to recover this
1431 * chunk.
1432 * In this case the device is here,
1433 * and the fact that this chunk is not
1434 * in-sync is recorded in the bad
1435 * block log
1436 */
1437 continue;
1438 }
1439 if (is_bad) {
1440 int good_sectors = first_bad - r1_bio->sector;
1441 if (good_sectors < max_sectors)
1442 max_sectors = good_sectors;
1443 }
1444 }
1445 r1_bio->bios[i] = bio;
1446 }
1447 rcu_read_unlock();
1448
1449 if (unlikely(blocked_rdev)) {
1450 /* Wait for this device to become unblocked */
1451 int j;
1452
1453 for (j = 0; j < i; j++)
1454 if (r1_bio->bios[j])
1455 rdev_dec_pending(conf->mirrors[j].rdev, mddev);
1456 r1_bio->state = 0;
1457 allow_barrier(conf, bio->bi_iter.bi_sector);
1458 raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
1459 md_wait_for_blocked_rdev(blocked_rdev, mddev);
1460 wait_barrier(conf, bio->bi_iter.bi_sector);
1461 goto retry_write;
1462 }
1463
1464 /*
1465 * When using a bitmap, we may call alloc_behind_master_bio below.
1466 * alloc_behind_master_bio allocates a copy of the data payload a page
1467 * at a time and thus needs a new bio that can fit the whole payload
1468 * this bio in page sized chunks.
1469 */
1470 if (write_behind && bitmap)
1471 max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
1472 if (max_sectors < bio_sectors(bio)) {
1473 struct bio *split = bio_split(bio, max_sectors,
1474 GFP_NOIO, &conf->bio_split);
1475 bio_chain(split, bio);
1476 submit_bio_noacct(bio);
1477 bio = split;
1478 r1_bio->master_bio = bio;
1479 r1_bio->sectors = max_sectors;
1480 }
1481
1482 if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
1483 r1_bio->start_time = bio_start_io_acct(bio);
1484 atomic_set(&r1_bio->remaining, 1);
1485 atomic_set(&r1_bio->behind_remaining, 0);
1486
1487 first_clone = 1;
1488
1489 for (i = 0; i < disks; i++) {
1490 struct bio *mbio = NULL;
1491 struct md_rdev *rdev = conf->mirrors[i].rdev;
1492 if (!r1_bio->bios[i])
1493 continue;
1494
1495 if (first_clone) {
1496 /* do behind I/O ?
1497 * Not if there are too many, or cannot
1498 * allocate memory, or a reader on WriteMostly
1499 * is waiting for behind writes to flush */
1500 if (bitmap &&
1501 (atomic_read(&bitmap->behind_writes)
1502 < mddev->bitmap_info.max_write_behind) &&
1503 !waitqueue_active(&bitmap->behind_wait)) {
1504 alloc_behind_master_bio(r1_bio, bio);
1505 }
1506
1507 md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors,
1508 test_bit(R1BIO_BehindIO, &r1_bio->state));
1509 first_clone = 0;
1510 }
1511
1512 if (r1_bio->behind_master_bio)
1513 mbio = bio_clone_fast(r1_bio->behind_master_bio,
1514 GFP_NOIO, &mddev->bio_set);
1515 else
1516 mbio = bio_clone_fast(bio, GFP_NOIO, &mddev->bio_set);
1517
1518 if (r1_bio->behind_master_bio) {
1519 if (test_bit(CollisionCheck, &rdev->flags))
1520 wait_for_serialization(rdev, r1_bio);
1521 if (test_bit(WriteMostly, &rdev->flags))
1522 atomic_inc(&r1_bio->behind_remaining);
1523 } else if (mddev->serialize_policy)
1524 wait_for_serialization(rdev, r1_bio);
1525
1526 r1_bio->bios[i] = mbio;
1527
1528 mbio->bi_iter.bi_sector = (r1_bio->sector +
1529 conf->mirrors[i].rdev->data_offset);
1530 bio_set_dev(mbio, conf->mirrors[i].rdev->bdev);
1531 mbio->bi_end_io = raid1_end_write_request;
1532 mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
1533 if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) &&
1534 !test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) &&
1535 conf->raid_disks - mddev->degraded > 1)
1536 mbio->bi_opf |= MD_FAILFAST;
1537 mbio->bi_private = r1_bio;
1538
1539 atomic_inc(&r1_bio->remaining);
1540
1541 if (mddev->gendisk)
1542 trace_block_bio_remap(mbio, disk_devt(mddev->gendisk),
1543 r1_bio->sector);
1544 /* flush_pending_writes() needs access to the rdev so...*/
1545 mbio->bi_bdev = (void *)conf->mirrors[i].rdev;
1546
1547 cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
1548 if (cb)
1549 plug = container_of(cb, struct raid1_plug_cb, cb);
1550 else
1551 plug = NULL;
1552 if (plug) {
1553 bio_list_add(&plug->pending, mbio);
1554 plug->pending_cnt++;
1555 } else {
1556 spin_lock_irqsave(&conf->device_lock, flags);
1557 bio_list_add(&conf->pending_bio_list, mbio);
1558 conf->pending_count++;
1559 spin_unlock_irqrestore(&conf->device_lock, flags);
1560 md_wakeup_thread(mddev->thread);
1561 }
1562 }
1563
1564 r1_bio_write_done(r1_bio);
1565
1566 /* In case raid1d snuck in to freeze_array */
1567 wake_up(&conf->wait_barrier);
1568 }
1569
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33731 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors
2021-08-18 7:37 [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors Guoqing Jiang
` (2 preceding siblings ...)
2021-08-18 12:02 ` kernel test robot
@ 2021-08-19 8:55 ` Christoph Hellwig
2021-08-20 8:19 ` Guoqing Jiang
3 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2021-08-19 8:55 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: axboe, linux-block, linux-raid
On Wed, Aug 18, 2021 at 03:37:38PM +0800, Guoqing Jiang wrote:
> for (i = 0; i < disks; i++) {
> struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
> +
> + if (test_bit(WriteMostly, &mirror->rdev->flags))
> + write_behind = true;
How does this condition relate to the ones used for actually calling
alloc_behind_master_bio? It looks related, but as someone not familiar
with the code I can't really verify if this is correct, so a comment
explaining it might be useful.
> + /*
> + * When using a bitmap, we may call alloc_behind_master_bio below.
> + * alloc_behind_master_bio allocates a copy of the data payload a page
> + * at a time and thus needs a new bio that can fit the whole payload
> + * this bio in page sized chunks.
> + */
> + if (write_behind && bitmap)
> + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
Overly long line here.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors
2021-08-19 8:55 ` Christoph Hellwig
@ 2021-08-20 8:19 ` Guoqing Jiang
2021-08-23 6:50 ` Christoph Hellwig
0 siblings, 1 reply; 9+ messages in thread
From: Guoqing Jiang @ 2021-08-20 8:19 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: axboe, linux-block, linux-raid
On 8/19/21 4:55 PM, Christoph Hellwig wrote:
> On Wed, Aug 18, 2021 at 03:37:38PM +0800, Guoqing Jiang wrote:
>> for (i = 0; i < disks; i++) {
>> struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
>> +
>> + if (test_bit(WriteMostly, &mirror->rdev->flags))
>> + write_behind = true;
> How does this condition relate to the ones used for actually calling
> alloc_behind_master_bio? It looks related, but as someone not familiar
> with the code I can't really verify if this is correct, so a comment
> explaining it might be useful.
How about this?
+ /*
+ * The write-behind io is only attempted on drives marked as
+ * write-mostly, which means we will allocate write behind
+ * bio later.
+ */
if (test_bit(WriteMostly, &mirror->rdev->flags))
write_behind = true;
>> + /*
>> + * When using a bitmap, we may call alloc_behind_master_bio below.
>> + * alloc_behind_master_bio allocates a copy of the data payload a page
>> + * at a time and thus needs a new bio that can fit the whole payload
>> + * this bio in page sized chunks.
>> + */
>> + if (write_behind && bitmap)
>> + max_sectors = min_t(int, max_sectors, BIO_MAX_VECS * PAGE_SECTORS);
> Overly long line here.
I can change it given you still prefer the limitation is 80 characters.
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors
2021-08-20 8:19 ` Guoqing Jiang
@ 2021-08-23 6:50 ` Christoph Hellwig
0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2021-08-23 6:50 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: Christoph Hellwig, axboe, linux-block, linux-raid
On Fri, Aug 20, 2021 at 04:19:22PM +0800, Guoqing Jiang wrote:
> How about this?
>
> +???????????????????????????? /*
> +?????????????????????????????? * The write-behind io is only attempted on drives marked as
> +?????????????????????????????? * write-mostly, which means we will allocate write behind
> +?????????????????????????????? * bio later.
> +?????????????????????????????? */
> ?????????????????????????????? if (test_bit(WriteMostly, &mirror->rdev->flags))
> ?????????????????????????????????????????????? write_behind = true;
Except for the weird formatting of the whitespaces this looks good.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2021-08-23 6:51 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-18 7:37 [PATCH V2] raid1: ensure write behind bio has less than BIO_MAX_VECS sectors Guoqing Jiang
2021-08-18 7:57 ` Guoqing Jiang
2021-08-18 11:03 ` kernel test robot
2021-08-18 11:03 ` kernel test robot
2021-08-18 12:02 ` kernel test robot
2021-08-18 12:02 ` kernel test robot
2021-08-19 8:55 ` Christoph Hellwig
2021-08-20 8:19 ` Guoqing Jiang
2021-08-23 6:50 ` Christoph Hellwig
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.