From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.2 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0DAD9C4361B for ; Fri, 18 Dec 2020 02:13:53 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 62EF3233F9 for ; Fri, 18 Dec 2020 02:13:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 62EF3233F9 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8FDDD6B005C; Thu, 17 Dec 2020 21:13:51 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 8ADC76B005D; Thu, 17 Dec 2020 21:13:51 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7C3B66B0068; Thu, 17 Dec 2020 21:13:51 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0032.hostedemail.com [216.40.44.32]) by kanga.kvack.org (Postfix) with ESMTP id 666D66B005C for ; Thu, 17 Dec 2020 21:13:51 -0500 (EST) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 183F3181AC9CC for ; Fri, 18 Dec 2020 02:13:51 +0000 (UTC) X-FDA: 77604782262.09.mom10_50037d327439 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin09.hostedemail.com (Postfix) with ESMTP id F08B0180AD817 for ; Fri, 18 Dec 2020 02:13:50 +0000 (UTC) X-HE-Tag: mom10_50037d327439 X-Filterd-Recvd-Size: 9758 Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf04.hostedemail.com (Postfix) with ESMTP for ; Fri, 18 Dec 2020 02:13:49 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.78,428,1599494400"; d="scan'208";a="102691740" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 18 Dec 2020 10:13:47 +0800 Received: from G08CNEXMBPEKD05.g08.fujitsu.local (unknown [10.167.33.204]) by cn.fujitsu.com (Postfix) with ESMTP id DBE494CE6013; Fri, 18 Dec 2020 10:13:45 +0800 (CST) Received: from irides.mr (10.167.225.141) by G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 18 Dec 2020 10:13:45 +0800 Subject: Re: [RFC PATCH v3 8/9] md: Implement ->corrupted_range() To: "Darrick J. Wong" CC: , , , , , , , , , , , , , Theodore Ts'o References: <20201215121414.253660-1-ruansy.fnst@cn.fujitsu.com> <20201215121414.253660-9-ruansy.fnst@cn.fujitsu.com> <20201215205102.GB6918@magnolia> From: Ruan Shiyang Message-ID: Date: Fri, 18 Dec 2020 10:11:54 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.1 MIME-Version: 1.0 In-Reply-To: <20201215205102.GB6918@magnolia> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Language: en-US X-Originating-IP: [10.167.225.141] X-ClientProxiedBy: G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.205) To G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) X-yoursite-MailScanner-ID: DBE494CE6013.AEDC9 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 2020/12/16 =E4=B8=8A=E5=8D=884:51, Darrick J. Wong wrote: > On Tue, Dec 15, 2020 at 08:14:13PM +0800, Shiyang Ruan wrote: >> With the support of ->rmap(), it is possible to obtain the superblock = on >> a mapped device. >> >> If a pmem device is used as one target of mapped device, we cannot >> obtain its superblock directly. With the help of SYSFS, the mapped >> device can be found on the target devices. So, we iterate the >> bdev->bd_holder_disks to obtain its mapped device. >> >> Signed-off-by: Shiyang Ruan >> --- >> drivers/md/dm.c | 66 +++++++++++++++++++++++++++++++++++++++++= ++ >> drivers/nvdimm/pmem.c | 9 ++++-- >> fs/block_dev.c | 21 ++++++++++++++ >> include/linux/genhd.h | 7 +++++ >> 4 files changed, 100 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/md/dm.c b/drivers/md/dm.c >> index 4e0cbfe3f14d..9da1f9322735 100644 >> --- a/drivers/md/dm.c >> +++ b/drivers/md/dm.c >> @@ -507,6 +507,71 @@ static int dm_blk_report_zones(struct gendisk *di= sk, sector_t sector, >> #define dm_blk_report_zones NULL >> #endif /* CONFIG_BLK_DEV_ZONED */ >> =20 >> +struct dm_blk_corrupt { >> + struct block_device *bdev; >> + sector_t offset; >> +}; >> + >> +static int dm_blk_corrupt_fn(struct dm_target *ti, struct dm_dev *dev= , >> + sector_t start, sector_t len, void *data) >> +{ >> + struct dm_blk_corrupt *bc =3D data; >> + >> + return bc->bdev =3D=3D (void *)dev->bdev && >> + (start <=3D bc->offset && bc->offset < start + len); >> +} >> + >> +static int dm_blk_corrupted_range(struct gendisk *disk, >> + struct block_device *target_bdev, >> + loff_t target_offset, size_t len, void *data) >> +{ >> + struct mapped_device *md =3D disk->private_data; >> + struct block_device *md_bdev =3D md->bdev; >> + struct dm_table *map; >> + struct dm_target *ti; >> + struct super_block *sb; >> + int srcu_idx, i, rc =3D 0; >> + bool found =3D false; >> + sector_t disk_sec, target_sec =3D to_sector(target_offset); >> + >> + map =3D dm_get_live_table(md, &srcu_idx); >> + if (!map) >> + return -ENODEV; >> + >> + for (i =3D 0; i < dm_table_get_num_targets(map); i++) { >> + ti =3D dm_table_get_target(map, i); >> + if (ti->type->iterate_devices && ti->type->rmap) { >> + struct dm_blk_corrupt bc =3D {target_bdev, target_sec}; >> + >> + found =3D ti->type->iterate_devices(ti, dm_blk_corrupt_fn, &bc); >> + if (!found) >> + continue; >> + disk_sec =3D ti->type->rmap(ti, target_sec); >=20 > What happens if the dm device has multiple reverse mappings because the > physical storage is being shared at multiple LBAs? (e.g. a > deduplication target) I thought that the dm device knows the mapping relationship, and it can=20 be done by implementation of ->rmap() in each target. Did I understand=20 it wrong? >=20 >> + break; >> + } >> + } >> + >> + if (!found) { >> + rc =3D -ENODEV; >> + goto out; >> + } >> + >> + sb =3D get_super(md_bdev); >> + if (!sb) { >> + rc =3D bd_disk_holder_corrupted_range(md_bdev, to_bytes(disk_sec), = len, data); >> + goto out; >> + } else if (sb->s_op->corrupted_range) { >> + loff_t off =3D to_bytes(disk_sec - get_start_sect(md_bdev)); >> + >> + rc =3D sb->s_op->corrupted_range(sb, md_bdev, off, len, data); >=20 > This "call bd_disk_holder_corrupted_range or sb->s_op->corrupted_range" > logic appears twice; should it be refactored into a common helper? >=20 > Or, should the superblock dispatch part move to > bd_disk_holder_corrupted_range? bd_disk_holder_corrupted_range() requires SYSFS configuration. I=20 introduce it to handle those block devices that can not obtain=20 superblock by `get_super()`. Usually, if we create filesystem directly on a pmem device, or make some=20 partitions at first, we can use `get_super()` to get the superblock. In=20 other case, such as creating a LVM on pmem device, `get_super()` does=20 not work. So, I think refactoring it into a common helper looks better. -- Thanks, Ruan Shiyang. >=20 >> + } >> + drop_super(sb); >> + >> +out: >> + dm_put_live_table(md, srcu_idx); >> + return rc; >> +} >> + >> static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx, >> struct block_device **bdev) >> { >> @@ -3084,6 +3149,7 @@ static const struct block_device_operations dm_b= lk_dops =3D { >> .getgeo =3D dm_blk_getgeo, >> .report_zones =3D dm_blk_report_zones, >> .pr_ops =3D &dm_pr_ops, >> + .corrupted_range =3D dm_blk_corrupted_range, >> .owner =3D THIS_MODULE >> }; >> =20 >> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c >> index 4688bff19c20..e8cfaf860149 100644 >> --- a/drivers/nvdimm/pmem.c >> +++ b/drivers/nvdimm/pmem.c >> @@ -267,11 +267,14 @@ static int pmem_corrupted_range(struct gendisk *= disk, struct block_device *bdev, >> =20 >> bdev_offset =3D (disk_sector - get_start_sect(bdev)) << SECTOR_SHIF= T; >> sb =3D get_super(bdev); >> - if (sb && sb->s_op->corrupted_range) { >> + if (!sb) { >> + rc =3D bd_disk_holder_corrupted_range(bdev, bdev_offset, len, data)= ; >> + goto out; >> + } else if (sb->s_op->corrupted_range) >> rc =3D sb->s_op->corrupted_range(sb, bdev, bdev_offset, len, data)= ; >> - drop_super(sb); >=20 > This is out of scope for this patch(set) but do you think that the scsi > disk driver should intercept media errors from sense data and call > ->corrupted_range too? ISTR Ted muttering that one of his employers ha= d > a patchset to do more with sense data than the upstream kernel currentl= y > does... >=20 >> - } >> + drop_super(sb); >> =20 >> +out: >> bdput(bdev); >> return rc; >> } >> diff --git a/fs/block_dev.c b/fs/block_dev.c >> index 9e84b1928b94..d3e6bddb8041 100644 >> --- a/fs/block_dev.c >> +++ b/fs/block_dev.c >> @@ -1171,6 +1171,27 @@ struct bd_holder_disk { >> int refcnt; >> }; >> =20 >> +int bd_disk_holder_corrupted_range(struct block_device *bdev, loff_t = off, size_t len, void *data) >> +{ >> + struct bd_holder_disk *holder; >> + struct gendisk *disk; >> + int rc =3D 0; >> + >> + if (list_empty(&(bdev->bd_holder_disks))) >> + return -ENODEV; >> + >> + list_for_each_entry(holder, &bdev->bd_holder_disks, list) { >> + disk =3D holder->disk; >> + if (disk->fops->corrupted_range) { >> + rc =3D disk->fops->corrupted_range(disk, bdev, off, len, data); >> + if (rc !=3D -ENODEV) >> + break; >> + } >> + } >> + return rc; >> +} >> +EXPORT_SYMBOL_GPL(bd_disk_holder_corrupted_range); >> + >> static struct bd_holder_disk *bd_find_holder_disk(struct block_devic= e *bdev, >> struct gendisk *disk) >> { >> diff --git a/include/linux/genhd.h b/include/linux/genhd.h >> index ed06209008b8..fba247b852fa 100644 >> --- a/include/linux/genhd.h >> +++ b/include/linux/genhd.h >> @@ -382,9 +382,16 @@ int blkdev_ioctl(struct block_device *, fmode_t, = unsigned, unsigned long); >> long compat_blkdev_ioctl(struct file *, unsigned, unsigned long); >> =20 >> #ifdef CONFIG_SYSFS >> +int bd_disk_holder_corrupted_range(struct block_device *bdev, loff_t = off, >> + size_t len, void *data); >> int bd_link_disk_holder(struct block_device *bdev, struct gendisk *d= isk); >> void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk= *disk); >> #else >> +int bd_disk_holder_corrupted_range(struct block_device *bdev, loff_t = off, >> + size_t len, void *data) >> +{ >> + return 0; >> +} >> static inline int bd_link_disk_holder(struct block_device *bdev, >> struct gendisk *disk) >> { >> --=20 >> 2.29.2 >> >> >> >=20 >=20