From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.4 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C956AC433DB for ; Wed, 3 Feb 2021 01:51:47 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 308B064F75 for ; Wed, 3 Feb 2021 01:51:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 308B064F75 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9DCB06B006E; Tue, 2 Feb 2021 20:51:46 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 964D26B0070; Tue, 2 Feb 2021 20:51:46 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 82E4A6B0071; Tue, 2 Feb 2021 20:51:46 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0078.hostedemail.com [216.40.44.78]) by kanga.kvack.org (Postfix) with ESMTP id 65F0A6B006E for ; Tue, 2 Feb 2021 20:51:46 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 2A6B2181AEF1D for ; Wed, 3 Feb 2021 01:51:46 +0000 (UTC) X-FDA: 77775280212.14.D91CCC8 Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by imf09.hostedemail.com (Postfix) with ESMTP id 2E52760001AF for ; Wed, 3 Feb 2021 01:51:43 +0000 (UTC) X-IronPort-AV: E=Sophos;i="5.79,396,1602518400"; d="scan'208";a="104124130" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 03 Feb 2021 09:51:42 +0800 Received: from G08CNEXMBPEKD05.g08.fujitsu.local (unknown [10.167.33.204]) by cn.fujitsu.com (Postfix) with ESMTP id C1F3D4CE6D74; Wed, 3 Feb 2021 09:51:38 +0800 (CST) Received: from irides.mr (10.167.225.141) by G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Wed, 3 Feb 2021 09:51:39 +0800 Subject: Re: [PATCH RESEND v2 08/10] md: Implement ->corrupted_range() To: "Darrick J. Wong" CC: , , , , , , , , , , , , , , References: <20210129062757.1594130-1-ruansy.fnst@cn.fujitsu.com> <20210129062757.1594130-9-ruansy.fnst@cn.fujitsu.com> <20210202031711.GJ7193@magnolia> From: Ruan Shiyang Message-ID: <8742625e-8ae7-47a8-fd62-18c201c45a33@cn.fujitsu.com> Date: Wed, 3 Feb 2021 09:51:37 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.0 MIME-Version: 1.0 In-Reply-To: <20210202031711.GJ7193@magnolia> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Language: en-US X-Originating-IP: [10.167.225.141] X-ClientProxiedBy: G08CNEXCHPEKD04.g08.fujitsu.local (10.167.33.200) To G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) X-yoursite-MailScanner-ID: C1F3D4CE6D74.A03BC X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 2E52760001AF X-Stat-Signature: 9xzkj1818kusrji9qjtqb3zap9kajarm Received-SPF: none (cn.fujitsu.com>: No applicable sender policy available) receiver=imf09; identity=mailfrom; envelope-from=""; helo=heian.cn.fujitsu.com; client-ip=183.91.158.132 X-HE-DKIM-Result: none/none X-HE-Tag: 1612317103-867486 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 2021/2/2 =E4=B8=8A=E5=8D=8811:17, Darrick J. Wong wrote: > On Fri, Jan 29, 2021 at 02:27:55PM +0800, Shiyang Ruan wrote: >> With the support of ->rmap(), it is possible to obtain the superblock = on >> a mapped device. >> >> If a pmem device is used as one target of mapped device, we cannot >> obtain its superblock directly. With the help of SYSFS, the mapped >> device can be found on the target devices. So, we iterate the >> bdev->bd_holder_disks to obtain its mapped device. >> >> Signed-off-by: Shiyang Ruan >> --- >> drivers/md/dm.c | 61 +++++++++++++++++++++++++++++++++++++++++= ++ >> drivers/nvdimm/pmem.c | 11 +++----- >> fs/block_dev.c | 42 ++++++++++++++++++++++++++++- >=20 > I feel like this ^^^ part that implements the generic ability for a blo= ck > device with a bad sector to notify whatever's holding onto it (fs, othe= r > block device) should be in patch 2. That's generic block layer code, > and it's hard to tell (when you're looking at patch 2) what the bare > function declaration in it is really supposed to do. >=20 > Also, this patch is still difficult to review because it mixes device > mapper, nvdimm, and block layer changes! OK. I'll split this to make it looks simple. >=20 >> include/linux/genhd.h | 2 ++ >> 4 files changed, 107 insertions(+), 9 deletions(-) >> >> diff --git a/drivers/md/dm.c b/drivers/md/dm.c >> index 7bac564f3faa..31b0c340b695 100644 >> --- a/drivers/md/dm.c >> +++ b/drivers/md/dm.c >> @@ -507,6 +507,66 @@ static int dm_blk_report_zones(struct gendisk *di= sk, sector_t sector, >> #define dm_blk_report_zones NULL >> #endif /* CONFIG_BLK_DEV_ZONED */ >> =20 >> +struct corrupted_hit_info { >> + struct block_device *bdev; >> + sector_t offset; >> +}; >> + >> +static int dm_blk_corrupted_hit(struct dm_target *ti, struct dm_dev *= dev, >> + sector_t start, sector_t count, void *data) >> +{ >> + struct corrupted_hit_info *bc =3D data; >> + >> + return bc->bdev =3D=3D (void *)dev->bdev && >> + (start <=3D bc->offset && bc->offset < start + count); >> + >> +} >> + >> +struct corrupted_do_info { >> + size_t length; >> + void *data; >> +}; >> + >> +static int dm_blk_corrupted_do(struct dm_target *ti, struct block_dev= ice *bdev, >> + sector_t disk_sect, void *data) >> +{ >> + struct corrupted_do_info *bc =3D data; >> + loff_t disk_off =3D to_bytes(disk_sect); >> + loff_t bdev_off =3D to_bytes(disk_sect - get_start_sect(bdev)); >> + >> + return bd_corrupted_range(bdev, disk_off, bdev_off, bc->length, bc->= data); >> +} >> + >> +static int dm_blk_corrupted_range(struct gendisk *disk, >> + struct block_device *target_bdev, >> + loff_t target_offset, size_t len, void *data) >> +{ >> + struct mapped_device *md =3D disk->private_data; >> + struct dm_table *map; >> + struct dm_target *ti; >> + sector_t target_sect =3D to_sector(target_offset); >> + struct corrupted_hit_info hi =3D {target_bdev, target_sect}; >> + struct corrupted_do_info di =3D {len, data}; >> + int srcu_idx, i, rc =3D -ENODEV; >> + >> + map =3D dm_get_live_table(md, &srcu_idx); >> + if (!map) >> + return rc; >> + >> + for (i =3D 0; i < dm_table_get_num_targets(map); i++) { >> + ti =3D dm_table_get_target(map, i); >> + if (!(ti->type->iterate_devices && ti->type->rmap)) >> + continue; >> + if (!ti->type->iterate_devices(ti, dm_blk_corrupted_hit, &hi)) >> + continue; >> + >> + rc =3D ti->type->rmap(ti, target_sect, dm_blk_corrupted_do, &di); >=20 > Why is it necessary to call ->iterate_devices here? ->iterate_devices() here is to find out which target is the pmem device=20 which is corrupted now. Then call ->rmap() on this target. Other=20 targets will be ignored. >=20 > If you pass the target_bdev, offset, and length to the dm-target's > ->rmap function, it should be able to work backwards through its mappin= g > logic to come up with all the LBA ranges of the mapped_device that > are affected, and then it can call bd_corrupted_range on each of those > reverse mappings. >=20 > It would be helpful to have the changes to dm-linear.c in this patch > too, since that's the only real implementation at this point. >=20 >> + break; >> + } >> + >> + dm_put_live_table(md, srcu_idx); >> + return rc; >> +} >> + >> static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx, >> struct block_device **bdev) >> { >> @@ -3062,6 +3122,7 @@ static const struct block_device_operations dm_b= lk_dops =3D { >> .getgeo =3D dm_blk_getgeo, >> .report_zones =3D dm_blk_report_zones, >> .pr_ops =3D &dm_pr_ops, >> + .corrupted_range =3D dm_blk_corrupted_range, >> .owner =3D THIS_MODULE >> }; >> =20 >> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c >> index 501959947d48..3d9f4ccbbd9e 100644 >> --- a/drivers/nvdimm/pmem.c >> +++ b/drivers/nvdimm/pmem.c >> @@ -256,21 +256,16 @@ static int pmem_rw_page(struct block_device *bde= v, sector_t sector, >> static int pmem_corrupted_range(struct gendisk *disk, struct block_d= evice *bdev, >> loff_t disk_offset, size_t len, void *data) >> { >> - struct super_block *sb; >> loff_t bdev_offset; >> sector_t disk_sector =3D disk_offset >> SECTOR_SHIFT; >> - int rc =3D 0; >> + int rc =3D -ENODEV; >> =20 >> bdev =3D bdget_disk_sector(disk, disk_sector); >> if (!bdev) >> - return -ENODEV; >> + return rc; >> =20 >> bdev_offset =3D (disk_sector - get_start_sect(bdev)) << SECTOR_SHIF= T; >> - sb =3D get_super(bdev); >> - if (sb && sb->s_op->corrupted_range) { >> - rc =3D sb->s_op->corrupted_range(sb, bdev, bdev_offset, len, data); >> - drop_super(sb); >> - } >> + rc =3D bd_corrupted_range(bdev, bdev_offset, bdev_offset, len, data)= ; >> =20 >> bdput(bdev); >> return rc; >> diff --git a/fs/block_dev.c b/fs/block_dev.c >> index 3b8963e228a1..3cc2b2911e3a 100644 >> --- a/fs/block_dev.c >> +++ b/fs/block_dev.c >> @@ -1079,6 +1079,27 @@ struct bd_holder_disk { >> int refcnt; >> }; >> =20 >> +static int bd_disk_holder_corrupted_range(struct block_device *bdev, = loff_t off, >> + size_t len, void *data) >> +{ >> + struct bd_holder_disk *holder; >> + struct gendisk *disk; >> + int rc =3D 0; >> + >> + if (list_empty(&(bdev->bd_holder_disks))) >> + return -ENODEV; >> + >> + list_for_each_entry(holder, &bdev->bd_holder_disks, list) { >> + disk =3D holder->disk; >> + if (disk->fops->corrupted_range) { >> + rc =3D disk->fops->corrupted_range(disk, bdev, off, len, data); >> + if (rc !=3D -ENODEV) >> + break; >> + } >> + } >> + return rc; >> +} >> + >> static struct bd_holder_disk *bd_find_holder_disk(struct block_devic= e *bdev, >> struct gendisk *disk) >> { >> @@ -1212,7 +1233,26 @@ void bd_unlink_disk_holder(struct block_device = *bdev, struct gendisk *disk) >> mutex_unlock(&bdev->bd_mutex); >> } >> EXPORT_SYMBOL_GPL(bd_unlink_disk_holder); >> -#endif >> +#endif /* CONFIG_SYSFS */ >> + >> +int bd_corrupted_range(struct block_device *bdev, loff_t disk_off, >> + loff_t bdev_off, size_t len, void *data) >> +{ >> + struct super_block *sb =3D get_super(bdev); >> + int rc =3D -EOPNOTSUPP; >> + >> + if (!sb) { >> +#ifdef CONFIG_SYSFS >> + rc =3D bd_disk_holder_corrupted_range(bdev, disk_off, len, data); >> +#endif /* CONFIG_SYSFS */ >=20 > Normal kernel convention is that you'd provide a empty shell for the > CONFIG_SYSFS=3Dn case, e.g. >=20 > #ifdef CONFIG_SYSFS > int bd_corrupted_range(...) { > /* real code */ > } > #else > static inline bd_corrupted_range(...) { return -EOPNOTSUPP; } > #endif >=20 > so that you don't have preprocessor directives making this function > choppy. I'll fix it. -- Thanks, Ruan Shiyang. >=20 > --D >=20 >> + return rc; >> + } else if (sb->s_op->corrupted_range) >> + rc =3D sb->s_op->corrupted_range(sb, bdev, bdev_off, len, data); >> + drop_super(sb); >> + >> + return rc; >> +} >> +EXPORT_SYMBOL(bd_corrupted_range); >> =20 >> static void __blkdev_put(struct block_device *bdev, fmode_t mode, in= t for_part); >> =20 >> diff --git a/include/linux/genhd.h b/include/linux/genhd.h >> index 4da480798955..996f91b08d48 100644 >> --- a/include/linux/genhd.h >> +++ b/include/linux/genhd.h >> @@ -315,6 +315,8 @@ void unregister_blkdev(unsigned int major, const c= har *name); >> bool bdev_check_media_change(struct block_device *bdev); >> int __invalidate_device(struct block_device *bdev, bool kill_dirty); >> void set_capacity(struct gendisk *disk, sector_t size); >> +int bd_corrupted_range(struct block_device *bdev, loff_t disk_off, >> + loff_t bdev_off, size_t len, void *data); >> =20 >> /* for drivers/char/raw.c: */ >> int blkdev_ioctl(struct block_device *, fmode_t, unsigned, unsigned = long); >> --=20 >> 2.30.0 >> >> >> >=20 >=20