linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ruan Shiyang <ruansy.fnst@cn.fujitsu.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: <linux-kernel@vger.kernel.org>, <linux-xfs@vger.kernel.org>,
	<linux-nvdimm@lists.01.org>, <linux-mm@kvack.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-raid@vger.kernel.org>,
	<dan.j.williams@intel.com>, <david@fromorbit.com>, <hch@lst.de>,
	<song@kernel.org>, <rgoldwyn@suse.de>, <qi.fuli@fujitsu.com>,
	<y-goto@fujitsu.com>, Theodore Ts'o <tytso@mit.edu>
Subject: Re: [RFC PATCH v3 8/9] md: Implement ->corrupted_range()
Date: Fri, 18 Dec 2020 10:11:54 +0800	[thread overview]
Message-ID: <cc48c42d-d8af-c915-5aef-17b7d4853c3c@cn.fujitsu.com> (raw)
In-Reply-To: <20201215205102.GB6918@magnolia>



On 2020/12/16 上午4:51, Darrick J. Wong wrote:
> On Tue, Dec 15, 2020 at 08:14:13PM +0800, Shiyang Ruan wrote:
>> With the support of ->rmap(), it is possible to obtain the superblock on
>> a mapped device.
>>
>> If a pmem device is used as one target of mapped device, we cannot
>> obtain its superblock directly.  With the help of SYSFS, the mapped
>> device can be found on the target devices.  So, we iterate the
>> bdev->bd_holder_disks to obtain its mapped device.
>>
>> Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
>> ---
>>   drivers/md/dm.c       | 66 +++++++++++++++++++++++++++++++++++++++++++
>>   drivers/nvdimm/pmem.c |  9 ++++--
>>   fs/block_dev.c        | 21 ++++++++++++++
>>   include/linux/genhd.h |  7 +++++
>>   4 files changed, 100 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index 4e0cbfe3f14d..9da1f9322735 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -507,6 +507,71 @@ static int dm_blk_report_zones(struct gendisk *disk, sector_t sector,
>>   #define dm_blk_report_zones		NULL
>>   #endif /* CONFIG_BLK_DEV_ZONED */
>>   
>> +struct dm_blk_corrupt {
>> +	struct block_device *bdev;
>> +	sector_t offset;
>> +};
>> +
>> +static int dm_blk_corrupt_fn(struct dm_target *ti, struct dm_dev *dev,
>> +				sector_t start, sector_t len, void *data)
>> +{
>> +	struct dm_blk_corrupt *bc = data;
>> +
>> +	return bc->bdev == (void *)dev->bdev &&
>> +			(start <= bc->offset && bc->offset < start + len);
>> +}
>> +
>> +static int dm_blk_corrupted_range(struct gendisk *disk,
>> +				  struct block_device *target_bdev,
>> +				  loff_t target_offset, size_t len, void *data)
>> +{
>> +	struct mapped_device *md = disk->private_data;
>> +	struct block_device *md_bdev = md->bdev;
>> +	struct dm_table *map;
>> +	struct dm_target *ti;
>> +	struct super_block *sb;
>> +	int srcu_idx, i, rc = 0;
>> +	bool found = false;
>> +	sector_t disk_sec, target_sec = to_sector(target_offset);
>> +
>> +	map = dm_get_live_table(md, &srcu_idx);
>> +	if (!map)
>> +		return -ENODEV;
>> +
>> +	for (i = 0; i < dm_table_get_num_targets(map); i++) {
>> +		ti = dm_table_get_target(map, i);
>> +		if (ti->type->iterate_devices && ti->type->rmap) {
>> +			struct dm_blk_corrupt bc = {target_bdev, target_sec};
>> +
>> +			found = ti->type->iterate_devices(ti, dm_blk_corrupt_fn, &bc);
>> +			if (!found)
>> +				continue;
>> +			disk_sec = ti->type->rmap(ti, target_sec);
> 
> What happens if the dm device has multiple reverse mappings because the
> physical storage is being shared at multiple LBAs?  (e.g. a
> deduplication target)

I thought that the dm device knows the mapping relationship, and it can 
be done by implementation of ->rmap() in each target.  Did I understand 
it wrong?

> 
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (!found) {
>> +		rc = -ENODEV;
>> +		goto out;
>> +	}
>> +
>> +	sb = get_super(md_bdev);
>> +	if (!sb) {
>> +		rc = bd_disk_holder_corrupted_range(md_bdev, to_bytes(disk_sec), len, data);
>> +		goto out;
>> +	} else if (sb->s_op->corrupted_range) {
>> +		loff_t off = to_bytes(disk_sec - get_start_sect(md_bdev));
>> +
>> +		rc = sb->s_op->corrupted_range(sb, md_bdev, off, len, data);
> 
> This "call bd_disk_holder_corrupted_range or sb->s_op->corrupted_range"
> logic appears twice; should it be refactored into a common helper?
> 
> Or, should the superblock dispatch part move to
> bd_disk_holder_corrupted_range?

bd_disk_holder_corrupted_range() requires SYSFS configuration.  I 
introduce it to handle those block devices that can not obtain 
superblock by `get_super()`.

Usually, if we create filesystem directly on a pmem device, or make some 
partitions at first, we can use `get_super()` to get the superblock.  In 
other case, such as creating a LVM on pmem device, `get_super()` does 
not work.

So, I think refactoring it into a common helper looks better.


--
Thanks,
Ruan Shiyang.

> 
>> +	}
>> +	drop_super(sb);
>> +
>> +out:
>> +	dm_put_live_table(md, srcu_idx);
>> +	return rc;
>> +}
>> +
>>   static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
>>   			    struct block_device **bdev)
>>   {
>> @@ -3084,6 +3149,7 @@ static const struct block_device_operations dm_blk_dops = {
>>   	.getgeo = dm_blk_getgeo,
>>   	.report_zones = dm_blk_report_zones,
>>   	.pr_ops = &dm_pr_ops,
>> +	.corrupted_range = dm_blk_corrupted_range,
>>   	.owner = THIS_MODULE
>>   };
>>   
>> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
>> index 4688bff19c20..e8cfaf860149 100644
>> --- a/drivers/nvdimm/pmem.c
>> +++ b/drivers/nvdimm/pmem.c
>> @@ -267,11 +267,14 @@ static int pmem_corrupted_range(struct gendisk *disk, struct block_device *bdev,
>>   
>>   	bdev_offset = (disk_sector - get_start_sect(bdev)) << SECTOR_SHIFT;
>>   	sb = get_super(bdev);
>> -	if (sb && sb->s_op->corrupted_range) {
>> +	if (!sb) {
>> +		rc = bd_disk_holder_corrupted_range(bdev, bdev_offset, len, data);
>> +		goto out;
>> +	} else if (sb->s_op->corrupted_range)
>>   		rc = sb->s_op->corrupted_range(sb, bdev, bdev_offset, len, data);
>> -		drop_super(sb);
> 
> This is out of scope for this patch(set) but do you think that the scsi
> disk driver should intercept media errors from sense data and call
> ->corrupted_range too?  ISTR Ted muttering that one of his employers had
> a patchset to do more with sense data than the upstream kernel currently
> does...
> 
>> -	}
>> +	drop_super(sb);
>>   
>> +out:
>>   	bdput(bdev);
>>   	return rc;
>>   }
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 9e84b1928b94..d3e6bddb8041 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -1171,6 +1171,27 @@ struct bd_holder_disk {
>>   	int			refcnt;
>>   };
>>   
>> +int bd_disk_holder_corrupted_range(struct block_device *bdev, loff_t off, size_t len, void *data)
>> +{
>> +	struct bd_holder_disk *holder;
>> +	struct gendisk *disk;
>> +	int rc = 0;
>> +
>> +	if (list_empty(&(bdev->bd_holder_disks)))
>> +		return -ENODEV;
>> +
>> +	list_for_each_entry(holder, &bdev->bd_holder_disks, list) {
>> +		disk = holder->disk;
>> +		if (disk->fops->corrupted_range) {
>> +			rc = disk->fops->corrupted_range(disk, bdev, off, len, data);
>> +			if (rc != -ENODEV)
>> +				break;
>> +		}
>> +	}
>> +	return rc;
>> +}
>> +EXPORT_SYMBOL_GPL(bd_disk_holder_corrupted_range);
>> +
>>   static struct bd_holder_disk *bd_find_holder_disk(struct block_device *bdev,
>>   						  struct gendisk *disk)
>>   {
>> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
>> index ed06209008b8..fba247b852fa 100644
>> --- a/include/linux/genhd.h
>> +++ b/include/linux/genhd.h
>> @@ -382,9 +382,16 @@ int blkdev_ioctl(struct block_device *, fmode_t, unsigned, unsigned long);
>>   long compat_blkdev_ioctl(struct file *, unsigned, unsigned long);
>>   
>>   #ifdef CONFIG_SYSFS
>> +int bd_disk_holder_corrupted_range(struct block_device *bdev, loff_t off,
>> +				   size_t len, void *data);
>>   int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk);
>>   void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk);
>>   #else
>> +int bd_disk_holder_corrupted_range(struct block_device *bdev, loff_t off,
>> +				   size_t len, void *data)
>> +{
>> +	return 0;
>> +}
>>   static inline int bd_link_disk_holder(struct block_device *bdev,
>>   				      struct gendisk *disk)
>>   {
>> -- 
>> 2.29.2
>>
>>
>>
> 
> 




  parent reply	other threads:[~2020-12-18  2:13 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-15 12:14 [RFC PATCH v3 0/9] fsdax: introduce fs query to support reflink Shiyang Ruan
2020-12-15 12:14 ` [RFC PATCH v3 1/9] pagemap: Introduce ->memory_failure() Shiyang Ruan
2020-12-15 12:14 ` [RFC PATCH v3 2/9] blk: Introduce ->corrupted_range() for block device Shiyang Ruan
2020-12-15 12:14 ` [RFC PATCH v3 3/9] fs: Introduce ->corrupted_range() for superblock Shiyang Ruan
2020-12-15 12:14 ` [RFC PATCH v3 4/9] mm, fsdax: Refactor memory-failure handler for dax mapping Shiyang Ruan
2020-12-16 21:26   ` Dave Chinner
2020-12-18  1:48     ` Ruan Shiyang
2020-12-15 12:14 ` [RFC PATCH v3 5/9] mm, pmem: Implement ->memory_failure() in pmem driver Shiyang Ruan
2020-12-15 12:14 ` [RFC PATCH v3 6/9] pmem: Implement ->corrupted_range() for " Shiyang Ruan
2020-12-15 12:14 ` [RFC PATCH v3 7/9] dm: Introduce ->rmap() to find bdev offset Shiyang Ruan
2020-12-15 12:14 ` [RFC PATCH v3 8/9] md: Implement ->corrupted_range() Shiyang Ruan
2020-12-15 20:51   ` Darrick J. Wong
2020-12-15 23:28     ` Dave Chinner
2020-12-18  2:11     ` Ruan Shiyang [this message]
2021-01-04 23:34       ` Darrick J. Wong
2021-01-08  9:52         ` Ruan Shiyang
2021-01-08 19:05           ` Darrick J. Wong
2020-12-16  5:43   ` Jane Chu
2020-12-18  1:50     ` Ruan Shiyang
2020-12-15 12:14 ` [RFC PATCH v3 9/9] xfs: Implement ->corrupted_range() for XFS Shiyang Ruan
2020-12-15 20:40   ` Darrick J. Wong
2020-12-18  2:31     ` Ruan Shiyang
2020-12-16 20:55 ` [RFC PATCH v3 0/9] fsdax: introduce fs query to support reflink Jane Chu
2020-12-18  2:44   ` Ruan Shiyang
2020-12-18  3:49     ` Darrick J. Wong
2020-12-18  9:13       ` Ruan Shiyang
2021-01-08 18:14         ` Jane Chu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cc48c42d-d8af-c915-5aef-17b7d4853c3c@cn.fujitsu.com \
    --to=ruansy.fnst@cn.fujitsu.com \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=qi.fuli@fujitsu.com \
    --cc=rgoldwyn@suse.de \
    --cc=song@kernel.org \
    --cc=tytso@mit.edu \
    --cc=y-goto@fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).