All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dm: Fix report zone remapping
@ 2018-10-05 10:34 Damien Le Moal
  2018-10-05 18:29   ` Mike Snitzer
  0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2018-10-05 10:34 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer; +Cc: stable

If dm-linear or dm-flakey have targets on top of a partition of a zoned
block device, remapping of the start sector and write pointer position
of the zones reported by a report zones BIO must be modified to not
only account for the target table entry mapping, but also to account
for the partition first sector. This start sector must be substracted
to the start sector of all zones reported. The write pointer position
of sequential zones must also be reduced by this offset.

Since there is no easy way to access the underlying bdev of the target
table entry from the dm_target or dm_target_io pointers, modify the
interface of the function dm_remap_zone_report() to allow a target to
pass the partition block device starting sector (offset).

Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: <stable@vger.kernel.org>
---
 drivers/md/dm-flakey.c        |  3 ++-
 drivers/md/dm-linear.c        |  3 ++-
 drivers/md/dm.c               | 16 ++++++++++++----
 include/linux/device-mapper.h |  2 +-
 4 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 21d126a5078c..c68c2e9cde6b 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -381,7 +381,8 @@ static int flakey_end_io(struct dm_target *ti, struct bio *bio,
 		return DM_ENDIO_DONE;
 
 	if (bio_op(bio) == REQ_OP_ZONE_REPORT) {
-		dm_remap_zone_report(ti, bio, fc->start);
+		dm_remap_zone_report(ti, bio, get_start_sect(fc->dev->bdev),
+				     fc->start);
 		return DM_ENDIO_DONE;
 	}
 
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index d10964d41fd7..f8cd367abbf6 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -108,7 +108,8 @@ static int linear_end_io(struct dm_target *ti, struct bio *bio,
 	struct linear_c *lc = ti->private;
 
 	if (!*error && bio_op(bio) == REQ_OP_ZONE_REPORT)
-		dm_remap_zone_report(ti, bio, lc->start);
+		dm_remap_zone_report(ti, bio, get_start_sect(lc->dev->bdev),
+				     lc->start);
 
 	return DM_ENDIO_DONE;
 }
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 20f7e4ef5342..ffd8544eedd8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1162,7 +1162,8 @@ EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
  * REQ_OP_ZONE_REPORT bio to remap the zone descriptors obtained
  * from the target device mapping to the dm device.
  */
-void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start)
+void dm_remap_zone_report(struct dm_target *ti, struct bio *bio,
+			  sector_t dev_offset, sector_t start)
 {
 #ifdef CONFIG_BLK_DEV_ZONED
 	struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone);
@@ -1195,18 +1196,25 @@ void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start)
 		/* Set zones start sector */
 		while (hdr->nr_zones && ofst < bvec.bv_len) {
 			zone = addr + ofst;
+			zone->start -= dev_offset;
 			if (zone->start >= start + ti->len) {
 				hdr->nr_zones = 0;
 				break;
 			}
 			zone->start = zone->start + ti->begin - start;
 			if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL) {
-				if (zone->cond == BLK_ZONE_COND_FULL)
+				switch (zone->cond) {
+				case BLK_ZONE_COND_FULL:
 					zone->wp = zone->start + zone->len;
-				else if (zone->cond == BLK_ZONE_COND_EMPTY)
+					break;
+				case BLK_ZONE_COND_EMPTY:
 					zone->wp = zone->start;
-				else
+					break;
+				default:
+					zone->wp -= dev_offset;
 					zone->wp = zone->wp + ti->begin - start;
+					break;
+				}
 			}
 			ofst += sizeof(struct blk_zone);
 			hdr->nr_zones--;
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 6fb0808e87c8..22c1924c0ee8 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -421,7 +421,7 @@ int dm_suspended(struct dm_target *ti);
 int dm_noflush_suspending(struct dm_target *ti);
 void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors);
 void dm_remap_zone_report(struct dm_target *ti, struct bio *bio,
-			  sector_t start);
+			  sector_t dev_offset, sector_t start);
 union map_info *dm_get_rq_mapinfo(struct request *rq);
 
 struct queue_limits *dm_get_queue_limits(struct mapped_device *md);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: dm: Fix report zone remapping
  2018-10-05 10:34 [PATCH] dm: Fix report zone remapping Damien Le Moal
@ 2018-10-05 18:29   ` Mike Snitzer
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2018-10-05 18:29 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: dm-devel, linux-block

On Fri, Oct 05 2018 at  6:34am -0400,
Damien Le Moal <damien.lemoal@wdc.com> wrote:

> If dm-linear or dm-flakey have targets on top of a partition of a zoned
> block device, remapping of the start sector and write pointer position
> of the zones reported by a report zones BIO must be modified to not
> only account for the target table entry mapping, but also to account
> for the partition first sector. This start sector must be substracted
> to the start sector of all zones reported. The write pointer position
> of sequential zones must also be reduced by this offset.
> 
> Since there is no easy way to access the underlying bdev of the target
> table entry from the dm_target or dm_target_io pointers, modify the
> interface of the function dm_remap_zone_report() to allow a target to
> pass the partition block device starting sector (offset).
> 
> Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: <stable@vger.kernel.org>

[dropping the actual stable cc from the mail, please don't cc stable in
the mail header, could be that git-send-email just pulled it in but...]

This needs a different fix.  There should be absolutely no need to pass
in a start offset for a device that is already supposed to be described
within the 'struct bio' passed to dm_remap_zone_report().

Why can't you just use the bio->bi_bdev ?  Err, since commit
74d46992e0d9d, I mean bio->bi_disk to get the start sector?

You should be able to get it from:

struct hd_struct *part = disk_get_part(bio->bi_disk, bio->bi_partno);
part->start_sect;
disk_put_part(part);

or:

struct block_device *bdev = bdget_disk(bio->bi_disk, bio->bi_partno);
get_start_sect(fc->dev->bdev);
bdput(bdev);

Mike


> ---
>  drivers/md/dm-flakey.c        |  3 ++-
>  drivers/md/dm-linear.c        |  3 ++-
>  drivers/md/dm.c               | 16 ++++++++++++----
>  include/linux/device-mapper.h |  2 +-
>  4 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
> index 21d126a5078c..c68c2e9cde6b 100644
> --- a/drivers/md/dm-flakey.c
> +++ b/drivers/md/dm-flakey.c
> @@ -381,7 +381,8 @@ static int flakey_end_io(struct dm_target *ti, struct bio *bio,
>  		return DM_ENDIO_DONE;
>  
>  	if (bio_op(bio) == REQ_OP_ZONE_REPORT) {
> -		dm_remap_zone_report(ti, bio, fc->start);
> +		dm_remap_zone_report(ti, bio, get_start_sect(fc->dev->bdev),
> +				     fc->start);
>  		return DM_ENDIO_DONE;
>  	}
>  
> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> index d10964d41fd7..f8cd367abbf6 100644
> --- a/drivers/md/dm-linear.c
> +++ b/drivers/md/dm-linear.c
> @@ -108,7 +108,8 @@ static int linear_end_io(struct dm_target *ti, struct bio *bio,
>  	struct linear_c *lc = ti->private;
>  
>  	if (!*error && bio_op(bio) == REQ_OP_ZONE_REPORT)
> -		dm_remap_zone_report(ti, bio, lc->start);
> +		dm_remap_zone_report(ti, bio, get_start_sect(lc->dev->bdev),
> +				     lc->start);
>  
>  	return DM_ENDIO_DONE;
>  }
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 20f7e4ef5342..ffd8544eedd8 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1162,7 +1162,8 @@ EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
>   * REQ_OP_ZONE_REPORT bio to remap the zone descriptors obtained
>   * from the target device mapping to the dm device.
>   */
> -void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start)
> +void dm_remap_zone_report(struct dm_target *ti, struct bio *bio,
> +			  sector_t dev_offset, sector_t start)
>  {
>  #ifdef CONFIG_BLK_DEV_ZONED
>  	struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone);
> @@ -1195,18 +1196,25 @@ void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start)
>  		/* Set zones start sector */
>  		while (hdr->nr_zones && ofst < bvec.bv_len) {
>  			zone = addr + ofst;
> +			zone->start -= dev_offset;
>  			if (zone->start >= start + ti->len) {
>  				hdr->nr_zones = 0;
>  				break;
>  			}
>  			zone->start = zone->start + ti->begin - start;
>  			if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL) {
> -				if (zone->cond == BLK_ZONE_COND_FULL)
> +				switch (zone->cond) {
> +				case BLK_ZONE_COND_FULL:
>  					zone->wp = zone->start + zone->len;
> -				else if (zone->cond == BLK_ZONE_COND_EMPTY)
> +					break;
> +				case BLK_ZONE_COND_EMPTY:
>  					zone->wp = zone->start;
> -				else
> +					break;
> +				default:
> +					zone->wp -= dev_offset;
>  					zone->wp = zone->wp + ti->begin - start;
> +					break;
> +				}
>  			}
>  			ofst += sizeof(struct blk_zone);
>  			hdr->nr_zones--;
> diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
> index 6fb0808e87c8..22c1924c0ee8 100644
> --- a/include/linux/device-mapper.h
> +++ b/include/linux/device-mapper.h
> @@ -421,7 +421,7 @@ int dm_suspended(struct dm_target *ti);
>  int dm_noflush_suspending(struct dm_target *ti);
>  void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors);
>  void dm_remap_zone_report(struct dm_target *ti, struct bio *bio,
> -			  sector_t start);
> +			  sector_t dev_offset, sector_t start);
>  union map_info *dm_get_rq_mapinfo(struct request *rq);
>  
>  struct queue_limits *dm_get_queue_limits(struct mapped_device *md);
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: dm: Fix report zone remapping
@ 2018-10-05 18:29   ` Mike Snitzer
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2018-10-05 18:29 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-block, dm-devel

On Fri, Oct 05 2018 at  6:34am -0400,
Damien Le Moal <damien.lemoal@wdc.com> wrote:

> If dm-linear or dm-flakey have targets on top of a partition of a zoned
> block device, remapping of the start sector and write pointer position
> of the zones reported by a report zones BIO must be modified to not
> only account for the target table entry mapping, but also to account
> for the partition first sector. This start sector must be substracted
> to the start sector of all zones reported. The write pointer position
> of sequential zones must also be reduced by this offset.
> 
> Since there is no easy way to access the underlying bdev of the target
> table entry from the dm_target or dm_target_io pointers, modify the
> interface of the function dm_remap_zone_report() to allow a target to
> pass the partition block device starting sector (offset).
> 
> Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: <stable@vger.kernel.org>

[dropping the actual stable cc from the mail, please don't cc stable in
the mail header, could be that git-send-email just pulled it in but...]

This needs a different fix.  There should be absolutely no need to pass
in a start offset for a device that is already supposed to be described
within the 'struct bio' passed to dm_remap_zone_report().

Why can't you just use the bio->bi_bdev ?  Err, since commit
74d46992e0d9d, I mean bio->bi_disk to get the start sector?

You should be able to get it from:

struct hd_struct *part = disk_get_part(bio->bi_disk, bio->bi_partno);
part->start_sect;
disk_put_part(part);

or:

struct block_device *bdev = bdget_disk(bio->bi_disk, bio->bi_partno);
get_start_sect(fc->dev->bdev);
bdput(bdev);

Mike


> ---
>  drivers/md/dm-flakey.c        |  3 ++-
>  drivers/md/dm-linear.c        |  3 ++-
>  drivers/md/dm.c               | 16 ++++++++++++----
>  include/linux/device-mapper.h |  2 +-
>  4 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
> index 21d126a5078c..c68c2e9cde6b 100644
> --- a/drivers/md/dm-flakey.c
> +++ b/drivers/md/dm-flakey.c
> @@ -381,7 +381,8 @@ static int flakey_end_io(struct dm_target *ti, struct bio *bio,
>  		return DM_ENDIO_DONE;
>  
>  	if (bio_op(bio) == REQ_OP_ZONE_REPORT) {
> -		dm_remap_zone_report(ti, bio, fc->start);
> +		dm_remap_zone_report(ti, bio, get_start_sect(fc->dev->bdev),
> +				     fc->start);
>  		return DM_ENDIO_DONE;
>  	}
>  
> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> index d10964d41fd7..f8cd367abbf6 100644
> --- a/drivers/md/dm-linear.c
> +++ b/drivers/md/dm-linear.c
> @@ -108,7 +108,8 @@ static int linear_end_io(struct dm_target *ti, struct bio *bio,
>  	struct linear_c *lc = ti->private;
>  
>  	if (!*error && bio_op(bio) == REQ_OP_ZONE_REPORT)
> -		dm_remap_zone_report(ti, bio, lc->start);
> +		dm_remap_zone_report(ti, bio, get_start_sect(lc->dev->bdev),
> +				     lc->start);
>  
>  	return DM_ENDIO_DONE;
>  }
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 20f7e4ef5342..ffd8544eedd8 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1162,7 +1162,8 @@ EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
>   * REQ_OP_ZONE_REPORT bio to remap the zone descriptors obtained
>   * from the target device mapping to the dm device.
>   */
> -void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start)
> +void dm_remap_zone_report(struct dm_target *ti, struct bio *bio,
> +			  sector_t dev_offset, sector_t start)
>  {
>  #ifdef CONFIG_BLK_DEV_ZONED
>  	struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone);
> @@ -1195,18 +1196,25 @@ void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start)
>  		/* Set zones start sector */
>  		while (hdr->nr_zones && ofst < bvec.bv_len) {
>  			zone = addr + ofst;
> +			zone->start -= dev_offset;
>  			if (zone->start >= start + ti->len) {
>  				hdr->nr_zones = 0;
>  				break;
>  			}
>  			zone->start = zone->start + ti->begin - start;
>  			if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL) {
> -				if (zone->cond == BLK_ZONE_COND_FULL)
> +				switch (zone->cond) {
> +				case BLK_ZONE_COND_FULL:
>  					zone->wp = zone->start + zone->len;
> -				else if (zone->cond == BLK_ZONE_COND_EMPTY)
> +					break;
> +				case BLK_ZONE_COND_EMPTY:
>  					zone->wp = zone->start;
> -				else
> +					break;
> +				default:
> +					zone->wp -= dev_offset;
>  					zone->wp = zone->wp + ti->begin - start;
> +					break;
> +				}
>  			}
>  			ofst += sizeof(struct blk_zone);
>  			hdr->nr_zones--;
> diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
> index 6fb0808e87c8..22c1924c0ee8 100644
> --- a/include/linux/device-mapper.h
> +++ b/include/linux/device-mapper.h
> @@ -421,7 +421,7 @@ int dm_suspended(struct dm_target *ti);
>  int dm_noflush_suspending(struct dm_target *ti);
>  void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors);
>  void dm_remap_zone_report(struct dm_target *ti, struct bio *bio,
> -			  sector_t start);
> +			  sector_t dev_offset, sector_t start);
>  union map_info *dm_get_rq_mapinfo(struct request *rq);
>  
>  struct queue_limits *dm_get_queue_limits(struct mapped_device *md);
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: dm: Fix report zone remapping
  2018-10-05 18:29   ` Mike Snitzer
@ 2018-10-06  1:24     ` Damien Le Moal
  -1 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2018-10-06  1:24 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, linux-block

Mike,=0A=
=0A=
On 2018/10/06 3:29, Mike Snitzer wrote:=0A=
> On Fri, Oct 05 2018 at  6:34am -0400,=0A=
> Damien Le Moal <damien.lemoal@wdc.com> wrote:=0A=
> =0A=
>> If dm-linear or dm-flakey have targets on top of a partition of a zoned=
=0A=
>> block device, remapping of the start sector and write pointer position=
=0A=
>> of the zones reported by a report zones BIO must be modified to not=0A=
>> only account for the target table entry mapping, but also to account=0A=
>> for the partition first sector. This start sector must be substracted=0A=
>> to the start sector of all zones reported. The write pointer position=0A=
>> of sequential zones must also be reduced by this offset.=0A=
>>=0A=
>> Since there is no easy way to access the underlying bdev of the target=
=0A=
>> table entry from the dm_target or dm_target_io pointers, modify the=0A=
>> interface of the function dm_remap_zone_report() to allow a target to=0A=
>> pass the partition block device starting sector (offset).=0A=
>>=0A=
>> Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")=0A=
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>=0A=
>> Cc: <stable@vger.kernel.org>=0A=
> =0A=
> [dropping the actual stable cc from the mail, please don't cc stable in=
=0A=
> the mail header, could be that git-send-email just pulled it in but...]=
=0A=
=0A=
My apologies about that. It was indeed git-send-email. Will fix that.=0A=
=0A=
> This needs a different fix.  There should be absolutely no need to pass=
=0A=
> in a start offset for a device that is already supposed to be described=
=0A=
> within the 'struct bio' passed to dm_remap_zone_report().=0A=
> =0A=
> Why can't you just use the bio->bi_bdev ?  Err, since commit=0A=
> 74d46992e0d9d, I mean bio->bi_disk to get the start sector?=0A=
> =0A=
> You should be able to get it from:=0A=
> =0A=
> struct hd_struct *part =3D disk_get_part(bio->bi_disk, bio->bi_partno);=
=0A=
> part->start_sect;=0A=
> disk_put_part(part);=0A=
> =0A=
> or:=0A=
> =0A=
> struct block_device *bdev =3D bdget_disk(bio->bi_disk, bio->bi_partno);=
=0A=
> get_start_sect(fc->dev->bdev);=0A=
> bdput(bdev);=0A=
=0A=
I am afraid this does not work. The original bio disk is of course /dev/dm-=
xx so=0A=
not the partition (partno =3D=3D 0), and the mapped bio passed as argument =
to the=0A=
remap zone function (clone of the orig bio) is pointing to the entire under=
lying=0A=
disk, not the partition disk used in dmsetup. bi_partno is 0 for both BIOs.=
=0A=
Which seems weird, __bio_clone_fast() does copy the disk and part number as=
 is.=0A=
The disk is of course correctly changed from the orig dm disk to the actual=
=0A=
physical disk in the clone, but the partition number stays the same. In the=
 end,=0A=
only the target has the correct bdev (disk+partno) information it seems.=0A=
=0A=
It looks like the partno gets lost, which does not seem correct. I must be=
=0A=
missing something, but I fail to see what it is. Any suggestion ?=0A=
=0A=
=0A=
> =0A=
> Mike=0A=
> =0A=
> =0A=
>> ---=0A=
>>  drivers/md/dm-flakey.c        |  3 ++-=0A=
>>  drivers/md/dm-linear.c        |  3 ++-=0A=
>>  drivers/md/dm.c               | 16 ++++++++++++----=0A=
>>  include/linux/device-mapper.h |  2 +-=0A=
>>  4 files changed, 17 insertions(+), 7 deletions(-)=0A=
>>=0A=
>> diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c=0A=
>> index 21d126a5078c..c68c2e9cde6b 100644=0A=
>> --- a/drivers/md/dm-flakey.c=0A=
>> +++ b/drivers/md/dm-flakey.c=0A=
>> @@ -381,7 +381,8 @@ static int flakey_end_io(struct dm_target *ti, struc=
t bio *bio,=0A=
>>  		return DM_ENDIO_DONE;=0A=
>>  =0A=
>>  	if (bio_op(bio) =3D=3D REQ_OP_ZONE_REPORT) {=0A=
>> -		dm_remap_zone_report(ti, bio, fc->start);=0A=
>> +		dm_remap_zone_report(ti, bio, get_start_sect(fc->dev->bdev),=0A=
>> +				     fc->start);=0A=
>>  		return DM_ENDIO_DONE;=0A=
>>  	}=0A=
>>  =0A=
>> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c=0A=
>> index d10964d41fd7..f8cd367abbf6 100644=0A=
>> --- a/drivers/md/dm-linear.c=0A=
>> +++ b/drivers/md/dm-linear.c=0A=
>> @@ -108,7 +108,8 @@ static int linear_end_io(struct dm_target *ti, struc=
t bio *bio,=0A=
>>  	struct linear_c *lc =3D ti->private;=0A=
>>  =0A=
>>  	if (!*error && bio_op(bio) =3D=3D REQ_OP_ZONE_REPORT)=0A=
>> -		dm_remap_zone_report(ti, bio, lc->start);=0A=
>> +		dm_remap_zone_report(ti, bio, get_start_sect(lc->dev->bdev),=0A=
>> +				     lc->start);=0A=
>>  =0A=
>>  	return DM_ENDIO_DONE;=0A=
>>  }=0A=
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c=0A=
>> index 20f7e4ef5342..ffd8544eedd8 100644=0A=
>> --- a/drivers/md/dm.c=0A=
>> +++ b/drivers/md/dm.c=0A=
>> @@ -1162,7 +1162,8 @@ EXPORT_SYMBOL_GPL(dm_accept_partial_bio);=0A=
>>   * REQ_OP_ZONE_REPORT bio to remap the zone descriptors obtained=0A=
>>   * from the target device mapping to the dm device.=0A=
>>   */=0A=
>> -void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector=
_t start)=0A=
>> +void dm_remap_zone_report(struct dm_target *ti, struct bio *bio,=0A=
>> +			  sector_t dev_offset, sector_t start)=0A=
>>  {=0A=
>>  #ifdef CONFIG_BLK_DEV_ZONED=0A=
>>  	struct dm_target_io *tio =3D container_of(bio, struct dm_target_io, cl=
one);=0A=
>> @@ -1195,18 +1196,25 @@ void dm_remap_zone_report(struct dm_target *ti, =
struct bio *bio, sector_t start)=0A=
>>  		/* Set zones start sector */=0A=
>>  		while (hdr->nr_zones && ofst < bvec.bv_len) {=0A=
>>  			zone =3D addr + ofst;=0A=
>> +			zone->start -=3D dev_offset;=0A=
>>  			if (zone->start >=3D start + ti->len) {=0A=
>>  				hdr->nr_zones =3D 0;=0A=
>>  				break;=0A=
>>  			}=0A=
>>  			zone->start =3D zone->start + ti->begin - start;=0A=
>>  			if (zone->type !=3D BLK_ZONE_TYPE_CONVENTIONAL) {=0A=
>> -				if (zone->cond =3D=3D BLK_ZONE_COND_FULL)=0A=
>> +				switch (zone->cond) {=0A=
>> +				case BLK_ZONE_COND_FULL:=0A=
>>  					zone->wp =3D zone->start + zone->len;=0A=
>> -				else if (zone->cond =3D=3D BLK_ZONE_COND_EMPTY)=0A=
>> +					break;=0A=
>> +				case BLK_ZONE_COND_EMPTY:=0A=
>>  					zone->wp =3D zone->start;=0A=
>> -				else=0A=
>> +					break;=0A=
>> +				default:=0A=
>> +					zone->wp -=3D dev_offset;=0A=
>>  					zone->wp =3D zone->wp + ti->begin - start;=0A=
>> +					break;=0A=
>> +				}=0A=
>>  			}=0A=
>>  			ofst +=3D sizeof(struct blk_zone);=0A=
>>  			hdr->nr_zones--;=0A=
>> diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper=
.h=0A=
>> index 6fb0808e87c8..22c1924c0ee8 100644=0A=
>> --- a/include/linux/device-mapper.h=0A=
>> +++ b/include/linux/device-mapper.h=0A=
>> @@ -421,7 +421,7 @@ int dm_suspended(struct dm_target *ti);=0A=
>>  int dm_noflush_suspending(struct dm_target *ti);=0A=
>>  void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors);=0A=
>>  void dm_remap_zone_report(struct dm_target *ti, struct bio *bio,=0A=
>> -			  sector_t start);=0A=
>> +			  sector_t dev_offset, sector_t start);=0A=
>>  union map_info *dm_get_rq_mapinfo(struct request *rq);=0A=
>>  =0A=
>>  struct queue_limits *dm_get_queue_limits(struct mapped_device *md);=0A=
>> -- =0A=
>> 2.17.1=0A=
>>=0A=
> =0A=
=0A=
=0A=
-- =0A=
Damien Le Moal=0A=
Western Digital Research=0A=

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: dm: Fix report zone remapping
@ 2018-10-06  1:24     ` Damien Le Moal
  0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2018-10-06  1:24 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-block, dm-devel

Mike,

On 2018/10/06 3:29, Mike Snitzer wrote:
> On Fri, Oct 05 2018 at  6:34am -0400,
> Damien Le Moal <damien.lemoal@wdc.com> wrote:
> 
>> If dm-linear or dm-flakey have targets on top of a partition of a zoned
>> block device, remapping of the start sector and write pointer position
>> of the zones reported by a report zones BIO must be modified to not
>> only account for the target table entry mapping, but also to account
>> for the partition first sector. This start sector must be substracted
>> to the start sector of all zones reported. The write pointer position
>> of sequential zones must also be reduced by this offset.
>>
>> Since there is no easy way to access the underlying bdev of the target
>> table entry from the dm_target or dm_target_io pointers, modify the
>> interface of the function dm_remap_zone_report() to allow a target to
>> pass the partition block device starting sector (offset).
>>
>> Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> Cc: <stable@vger.kernel.org>
> 
> [dropping the actual stable cc from the mail, please don't cc stable in
> the mail header, could be that git-send-email just pulled it in but...]

My apologies about that. It was indeed git-send-email. Will fix that.

> This needs a different fix.  There should be absolutely no need to pass
> in a start offset for a device that is already supposed to be described
> within the 'struct bio' passed to dm_remap_zone_report().
> 
> Why can't you just use the bio->bi_bdev ?  Err, since commit
> 74d46992e0d9d, I mean bio->bi_disk to get the start sector?
> 
> You should be able to get it from:
> 
> struct hd_struct *part = disk_get_part(bio->bi_disk, bio->bi_partno);
> part->start_sect;
> disk_put_part(part);
> 
> or:
> 
> struct block_device *bdev = bdget_disk(bio->bi_disk, bio->bi_partno);
> get_start_sect(fc->dev->bdev);
> bdput(bdev);

I am afraid this does not work. The original bio disk is of course /dev/dm-xx so
not the partition (partno == 0), and the mapped bio passed as argument to the
remap zone function (clone of the orig bio) is pointing to the entire underlying
disk, not the partition disk used in dmsetup. bi_partno is 0 for both BIOs.
Which seems weird, __bio_clone_fast() does copy the disk and part number as is.
The disk is of course correctly changed from the orig dm disk to the actual
physical disk in the clone, but the partition number stays the same. In the end,
only the target has the correct bdev (disk+partno) information it seems.

It looks like the partno gets lost, which does not seem correct. I must be
missing something, but I fail to see what it is. Any suggestion ?


> 
> Mike
> 
> 
>> ---
>>  drivers/md/dm-flakey.c        |  3 ++-
>>  drivers/md/dm-linear.c        |  3 ++-
>>  drivers/md/dm.c               | 16 ++++++++++++----
>>  include/linux/device-mapper.h |  2 +-
>>  4 files changed, 17 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
>> index 21d126a5078c..c68c2e9cde6b 100644
>> --- a/drivers/md/dm-flakey.c
>> +++ b/drivers/md/dm-flakey.c
>> @@ -381,7 +381,8 @@ static int flakey_end_io(struct dm_target *ti, struct bio *bio,
>>  		return DM_ENDIO_DONE;
>>  
>>  	if (bio_op(bio) == REQ_OP_ZONE_REPORT) {
>> -		dm_remap_zone_report(ti, bio, fc->start);
>> +		dm_remap_zone_report(ti, bio, get_start_sect(fc->dev->bdev),
>> +				     fc->start);
>>  		return DM_ENDIO_DONE;
>>  	}
>>  
>> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
>> index d10964d41fd7..f8cd367abbf6 100644
>> --- a/drivers/md/dm-linear.c
>> +++ b/drivers/md/dm-linear.c
>> @@ -108,7 +108,8 @@ static int linear_end_io(struct dm_target *ti, struct bio *bio,
>>  	struct linear_c *lc = ti->private;
>>  
>>  	if (!*error && bio_op(bio) == REQ_OP_ZONE_REPORT)
>> -		dm_remap_zone_report(ti, bio, lc->start);
>> +		dm_remap_zone_report(ti, bio, get_start_sect(lc->dev->bdev),
>> +				     lc->start);
>>  
>>  	return DM_ENDIO_DONE;
>>  }
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index 20f7e4ef5342..ffd8544eedd8 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -1162,7 +1162,8 @@ EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
>>   * REQ_OP_ZONE_REPORT bio to remap the zone descriptors obtained
>>   * from the target device mapping to the dm device.
>>   */
>> -void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start)
>> +void dm_remap_zone_report(struct dm_target *ti, struct bio *bio,
>> +			  sector_t dev_offset, sector_t start)
>>  {
>>  #ifdef CONFIG_BLK_DEV_ZONED
>>  	struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone);
>> @@ -1195,18 +1196,25 @@ void dm_remap_zone_report(struct dm_target *ti, struct bio *bio, sector_t start)
>>  		/* Set zones start sector */
>>  		while (hdr->nr_zones && ofst < bvec.bv_len) {
>>  			zone = addr + ofst;
>> +			zone->start -= dev_offset;
>>  			if (zone->start >= start + ti->len) {
>>  				hdr->nr_zones = 0;
>>  				break;
>>  			}
>>  			zone->start = zone->start + ti->begin - start;
>>  			if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL) {
>> -				if (zone->cond == BLK_ZONE_COND_FULL)
>> +				switch (zone->cond) {
>> +				case BLK_ZONE_COND_FULL:
>>  					zone->wp = zone->start + zone->len;
>> -				else if (zone->cond == BLK_ZONE_COND_EMPTY)
>> +					break;
>> +				case BLK_ZONE_COND_EMPTY:
>>  					zone->wp = zone->start;
>> -				else
>> +					break;
>> +				default:
>> +					zone->wp -= dev_offset;
>>  					zone->wp = zone->wp + ti->begin - start;
>> +					break;
>> +				}
>>  			}
>>  			ofst += sizeof(struct blk_zone);
>>  			hdr->nr_zones--;
>> diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
>> index 6fb0808e87c8..22c1924c0ee8 100644
>> --- a/include/linux/device-mapper.h
>> +++ b/include/linux/device-mapper.h
>> @@ -421,7 +421,7 @@ int dm_suspended(struct dm_target *ti);
>>  int dm_noflush_suspending(struct dm_target *ti);
>>  void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors);
>>  void dm_remap_zone_report(struct dm_target *ti, struct bio *bio,
>> -			  sector_t start);
>> +			  sector_t dev_offset, sector_t start);
>>  union map_info *dm_get_rq_mapinfo(struct request *rq);
>>  
>>  struct queue_limits *dm_get_queue_limits(struct mapped_device *md);
>> -- 
>> 2.17.1
>>
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: dm: Fix report zone remapping
  2018-10-06  1:24     ` Damien Le Moal
@ 2018-10-06  3:49       ` Mike Snitzer
  -1 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2018-10-06  3:49 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: dm-devel, linux-block

On Fri, Oct 05 2018 at  9:24pm -0400,
Damien Le Moal <Damien.LeMoal@wdc.com> wrote:

> Mike,
> 
> On 2018/10/06 3:29, Mike Snitzer wrote:
> > On Fri, Oct 05 2018 at  6:34am -0400,
> > Damien Le Moal <damien.lemoal@wdc.com> wrote:
> > 
> >> If dm-linear or dm-flakey have targets on top of a partition of a zoned
> >> block device, remapping of the start sector and write pointer position
> >> of the zones reported by a report zones BIO must be modified to not
> >> only account for the target table entry mapping, but also to account
> >> for the partition first sector. This start sector must be substracted
> >> to the start sector of all zones reported. The write pointer position
> >> of sequential zones must also be reduced by this offset.
> >>
> >> Since there is no easy way to access the underlying bdev of the target
> >> table entry from the dm_target or dm_target_io pointers, modify the
> >> interface of the function dm_remap_zone_report() to allow a target to
> >> pass the partition block device starting sector (offset).
> >>
> >> Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")
> >> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> >> Cc: <stable@vger.kernel.org>
> > 
> > [dropping the actual stable cc from the mail, please don't cc stable in
> > the mail header, could be that git-send-email just pulled it in but...]
> 
> My apologies about that. It was indeed git-send-email. Will fix that.
> 
> > This needs a different fix.  There should be absolutely no need to pass
> > in a start offset for a device that is already supposed to be described
> > within the 'struct bio' passed to dm_remap_zone_report().
> > 
> > Why can't you just use the bio->bi_bdev ?  Err, since commit
> > 74d46992e0d9d, I mean bio->bi_disk to get the start sector?
> > 
> > You should be able to get it from:
> > 
> > struct hd_struct *part = disk_get_part(bio->bi_disk, bio->bi_partno);
> > part->start_sect;
> > disk_put_part(part);
> > 
> > or:
> > 
> > struct block_device *bdev = bdget_disk(bio->bi_disk, bio->bi_partno);
> > get_start_sect(fc->dev->bdev);
> > bdput(bdev);
> 
> I am afraid this does not work. The original bio disk is of course /dev/dm-xx so
> not the partition (partno == 0), and the mapped bio passed as argument to the
> remap zone function (clone of the orig bio) is pointing to the entire underlying
> disk, not the partition disk used in dmsetup. bi_partno is 0 for both BIOs.
> Which seems weird, __bio_clone_fast() does copy the disk and part number as is.
> The disk is of course correctly changed from the orig dm disk to the actual
> physical disk in the clone, but the partition number stays the same. In the end,
> only the target has the correct bdev (disk+partno) information it seems.

OK, that is a bug that must to be found then.

> It looks like the partno gets lost, which does not seem correct. I must be
> missing something, but I fail to see what it is. Any suggestion ?

Not at the moment but I'll have a closer look early next week.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: dm: Fix report zone remapping
@ 2018-10-06  3:49       ` Mike Snitzer
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2018-10-06  3:49 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-block, dm-devel

On Fri, Oct 05 2018 at  9:24pm -0400,
Damien Le Moal <Damien.LeMoal@wdc.com> wrote:

> Mike,
> 
> On 2018/10/06 3:29, Mike Snitzer wrote:
> > On Fri, Oct 05 2018 at  6:34am -0400,
> > Damien Le Moal <damien.lemoal@wdc.com> wrote:
> > 
> >> If dm-linear or dm-flakey have targets on top of a partition of a zoned
> >> block device, remapping of the start sector and write pointer position
> >> of the zones reported by a report zones BIO must be modified to not
> >> only account for the target table entry mapping, but also to account
> >> for the partition first sector. This start sector must be substracted
> >> to the start sector of all zones reported. The write pointer position
> >> of sequential zones must also be reduced by this offset.
> >>
> >> Since there is no easy way to access the underlying bdev of the target
> >> table entry from the dm_target or dm_target_io pointers, modify the
> >> interface of the function dm_remap_zone_report() to allow a target to
> >> pass the partition block device starting sector (offset).
> >>
> >> Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")
> >> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> >> Cc: <stable@vger.kernel.org>
> > 
> > [dropping the actual stable cc from the mail, please don't cc stable in
> > the mail header, could be that git-send-email just pulled it in but...]
> 
> My apologies about that. It was indeed git-send-email. Will fix that.
> 
> > This needs a different fix.  There should be absolutely no need to pass
> > in a start offset for a device that is already supposed to be described
> > within the 'struct bio' passed to dm_remap_zone_report().
> > 
> > Why can't you just use the bio->bi_bdev ?  Err, since commit
> > 74d46992e0d9d, I mean bio->bi_disk to get the start sector?
> > 
> > You should be able to get it from:
> > 
> > struct hd_struct *part = disk_get_part(bio->bi_disk, bio->bi_partno);
> > part->start_sect;
> > disk_put_part(part);
> > 
> > or:
> > 
> > struct block_device *bdev = bdget_disk(bio->bi_disk, bio->bi_partno);
> > get_start_sect(fc->dev->bdev);
> > bdput(bdev);
> 
> I am afraid this does not work. The original bio disk is of course /dev/dm-xx so
> not the partition (partno == 0), and the mapped bio passed as argument to the
> remap zone function (clone of the orig bio) is pointing to the entire underlying
> disk, not the partition disk used in dmsetup. bi_partno is 0 for both BIOs.
> Which seems weird, __bio_clone_fast() does copy the disk and part number as is.
> The disk is of course correctly changed from the orig dm disk to the actual
> physical disk in the clone, but the partition number stays the same. In the end,
> only the target has the correct bdev (disk+partno) information it seems.

OK, that is a bug that must to be found then.

> It looks like the partno gets lost, which does not seem correct. I must be
> missing something, but I fail to see what it is. Any suggestion ?

Not at the moment but I'll have a closer look early next week.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: dm: Fix report zone remapping
  2018-10-06  3:49       ` Mike Snitzer
@ 2018-10-06  4:03         ` Damien Le Moal
  -1 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2018-10-06  4:03 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, linux-block

Mike,=0A=
=0A=
On 2018/10/06 12:49, Mike Snitzer wrote:=0A=
> On Fri, Oct 05 2018 at  9:24pm -0400,=0A=
> Damien Le Moal <Damien.LeMoal@wdc.com> wrote:=0A=
> =0A=
>> Mike,=0A=
>>=0A=
>> On 2018/10/06 3:29, Mike Snitzer wrote:=0A=
>>> On Fri, Oct 05 2018 at  6:34am -0400,=0A=
>>> Damien Le Moal <damien.lemoal@wdc.com> wrote:=0A=
>>>=0A=
>>>> If dm-linear or dm-flakey have targets on top of a partition of a zone=
d=0A=
>>>> block device, remapping of the start sector and write pointer position=
=0A=
>>>> of the zones reported by a report zones BIO must be modified to not=0A=
>>>> only account for the target table entry mapping, but also to account=
=0A=
>>>> for the partition first sector. This start sector must be substracted=
=0A=
>>>> to the start sector of all zones reported. The write pointer position=
=0A=
>>>> of sequential zones must also be reduced by this offset.=0A=
>>>>=0A=
>>>> Since there is no easy way to access the underlying bdev of the target=
=0A=
>>>> table entry from the dm_target or dm_target_io pointers, modify the=0A=
>>>> interface of the function dm_remap_zone_report() to allow a target to=
=0A=
>>>> pass the partition block device starting sector (offset).=0A=
>>>>=0A=
>>>> Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")=0A=
>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>=0A=
>>>> Cc: <stable@vger.kernel.org>=0A=
>>>=0A=
>>> [dropping the actual stable cc from the mail, please don't cc stable in=
=0A=
>>> the mail header, could be that git-send-email just pulled it in but...]=
=0A=
>>=0A=
>> My apologies about that. It was indeed git-send-email. Will fix that.=0A=
>>=0A=
>>> This needs a different fix.  There should be absolutely no need to pass=
=0A=
>>> in a start offset for a device that is already supposed to be described=
=0A=
>>> within the 'struct bio' passed to dm_remap_zone_report().=0A=
>>>=0A=
>>> Why can't you just use the bio->bi_bdev ?  Err, since commit=0A=
>>> 74d46992e0d9d, I mean bio->bi_disk to get the start sector?=0A=
>>>=0A=
>>> You should be able to get it from:=0A=
>>>=0A=
>>> struct hd_struct *part =3D disk_get_part(bio->bi_disk, bio->bi_partno);=
=0A=
>>> part->start_sect;=0A=
>>> disk_put_part(part);=0A=
>>>=0A=
>>> or:=0A=
>>>=0A=
>>> struct block_device *bdev =3D bdget_disk(bio->bi_disk, bio->bi_partno);=
=0A=
>>> get_start_sect(fc->dev->bdev);=0A=
>>> bdput(bdev);=0A=
>>=0A=
>> I am afraid this does not work. The original bio disk is of course /dev/=
dm-xx so=0A=
>> not the partition (partno =3D=3D 0), and the mapped bio passed as argume=
nt to the=0A=
>> remap zone function (clone of the orig bio) is pointing to the entire un=
derlying=0A=
>> disk, not the partition disk used in dmsetup. bi_partno is 0 for both BI=
Os.=0A=
>> Which seems weird, __bio_clone_fast() does copy the disk and part number=
 as is.=0A=
>> The disk is of course correctly changed from the orig dm disk to the act=
ual=0A=
>> physical disk in the clone, but the partition number stays the same. In =
the end,=0A=
>> only the target has the correct bdev (disk+partno) information it seems.=
=0A=
> =0A=
> OK, that is a bug that must to be found then.=0A=
=0A=
Nope. No bug. I figured it out: after the bio remapping in the target (whic=
h=0A=
calls bio_set_dev() to indicate the partition bdev target for the bio), the=
=0A=
block core will remap that bio to the container bdev (the entire disk) whic=
h is=0A=
partition 0. So the partition number is lost when the bio completes and zon=
e=0A=
remap cannot get the starting physical sector for the adjustment of the zon=
e=0A=
start and wp position.=0A=
=0A=
>> It looks like the partno gets lost, which does not seem correct. I must =
be=0A=
>> missing something, but I fail to see what it is. Any suggestion ?=0A=
> =0A=
> Not at the moment but I'll have a closer look early next week.=0A=
=0A=
Now, I see 2 solutions:=0A=
* have the target add to the "start" argument passed to dm_remap_zone_repor=
t()=0A=
the offset (start sector) of the block device. The calculation match what n=
eeds=0A=
to be done, so it works. It is a little ugly though as the target has to be=
=0A=
"partition aware" in a sense.=0A=
* Add a "struct dm_dev *dev" argument to dm_remap_zone_report(). The device=
=0A=
start sector is simply "get_start_sect(dev->bdev)" with that.=0A=
=0A=
I prefer the second solution... What do you think ?=0A=
=0A=
Note that I have a zoned block device series in the work which improves zon=
e=0A=
report and simplifies all of this remapping business in dm. But the current=
 code=0A=
still needs fixing.=0A=
=0A=
Best regards.=0A=
=0A=
-- =0A=
Damien Le Moal=0A=
Western Digital Research=0A=

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: dm: Fix report zone remapping
@ 2018-10-06  4:03         ` Damien Le Moal
  0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2018-10-06  4:03 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-block, dm-devel

Mike,

On 2018/10/06 12:49, Mike Snitzer wrote:
> On Fri, Oct 05 2018 at  9:24pm -0400,
> Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
> 
>> Mike,
>>
>> On 2018/10/06 3:29, Mike Snitzer wrote:
>>> On Fri, Oct 05 2018 at  6:34am -0400,
>>> Damien Le Moal <damien.lemoal@wdc.com> wrote:
>>>
>>>> If dm-linear or dm-flakey have targets on top of a partition of a zoned
>>>> block device, remapping of the start sector and write pointer position
>>>> of the zones reported by a report zones BIO must be modified to not
>>>> only account for the target table entry mapping, but also to account
>>>> for the partition first sector. This start sector must be substracted
>>>> to the start sector of all zones reported. The write pointer position
>>>> of sequential zones must also be reduced by this offset.
>>>>
>>>> Since there is no easy way to access the underlying bdev of the target
>>>> table entry from the dm_target or dm_target_io pointers, modify the
>>>> interface of the function dm_remap_zone_report() to allow a target to
>>>> pass the partition block device starting sector (offset).
>>>>
>>>> Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")
>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>> Cc: <stable@vger.kernel.org>
>>>
>>> [dropping the actual stable cc from the mail, please don't cc stable in
>>> the mail header, could be that git-send-email just pulled it in but...]
>>
>> My apologies about that. It was indeed git-send-email. Will fix that.
>>
>>> This needs a different fix.  There should be absolutely no need to pass
>>> in a start offset for a device that is already supposed to be described
>>> within the 'struct bio' passed to dm_remap_zone_report().
>>>
>>> Why can't you just use the bio->bi_bdev ?  Err, since commit
>>> 74d46992e0d9d, I mean bio->bi_disk to get the start sector?
>>>
>>> You should be able to get it from:
>>>
>>> struct hd_struct *part = disk_get_part(bio->bi_disk, bio->bi_partno);
>>> part->start_sect;
>>> disk_put_part(part);
>>>
>>> or:
>>>
>>> struct block_device *bdev = bdget_disk(bio->bi_disk, bio->bi_partno);
>>> get_start_sect(fc->dev->bdev);
>>> bdput(bdev);
>>
>> I am afraid this does not work. The original bio disk is of course /dev/dm-xx so
>> not the partition (partno == 0), and the mapped bio passed as argument to the
>> remap zone function (clone of the orig bio) is pointing to the entire underlying
>> disk, not the partition disk used in dmsetup. bi_partno is 0 for both BIOs.
>> Which seems weird, __bio_clone_fast() does copy the disk and part number as is.
>> The disk is of course correctly changed from the orig dm disk to the actual
>> physical disk in the clone, but the partition number stays the same. In the end,
>> only the target has the correct bdev (disk+partno) information it seems.
> 
> OK, that is a bug that must to be found then.

Nope. No bug. I figured it out: after the bio remapping in the target (which
calls bio_set_dev() to indicate the partition bdev target for the bio), the
block core will remap that bio to the container bdev (the entire disk) which is
partition 0. So the partition number is lost when the bio completes and zone
remap cannot get the starting physical sector for the adjustment of the zone
start and wp position.

>> It looks like the partno gets lost, which does not seem correct. I must be
>> missing something, but I fail to see what it is. Any suggestion ?
> 
> Not at the moment but I'll have a closer look early next week.

Now, I see 2 solutions:
* have the target add to the "start" argument passed to dm_remap_zone_report()
the offset (start sector) of the block device. The calculation match what needs
to be done, so it works. It is a little ugly though as the target has to be
"partition aware" in a sense.
* Add a "struct dm_dev *dev" argument to dm_remap_zone_report(). The device
start sector is simply "get_start_sect(dev->bdev)" with that.

I prefer the second solution... What do you think ?

Note that I have a zoned block device series in the work which improves zone
report and simplifies all of this remapping business in dm. But the current code
still needs fixing.

Best regards.

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: dm: Fix report zone remapping
  2018-10-06  4:03         ` Damien Le Moal
@ 2018-10-06  5:32           ` Mike Snitzer
  -1 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2018-10-06  5:32 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: dm-devel, linux-block

On Sat, Oct 06 2018 at 12:03am -0400,
Damien Le Moal <Damien.LeMoal@wdc.com> wrote:

> Mike,
> 
> On 2018/10/06 12:49, Mike Snitzer wrote:
> > On Fri, Oct 05 2018 at  9:24pm -0400,
> > Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
> > 
> >> Mike,
> >>
> >> On 2018/10/06 3:29, Mike Snitzer wrote:
> >>> On Fri, Oct 05 2018 at  6:34am -0400,
> >>> Damien Le Moal <damien.lemoal@wdc.com> wrote:
> >>>
> >>>> If dm-linear or dm-flakey have targets on top of a partition of a zoned
> >>>> block device, remapping of the start sector and write pointer position
> >>>> of the zones reported by a report zones BIO must be modified to not
> >>>> only account for the target table entry mapping, but also to account
> >>>> for the partition first sector. This start sector must be substracted
> >>>> to the start sector of all zones reported. The write pointer position
> >>>> of sequential zones must also be reduced by this offset.
> >>>>
> >>>> Since there is no easy way to access the underlying bdev of the target
> >>>> table entry from the dm_target or dm_target_io pointers, modify the
> >>>> interface of the function dm_remap_zone_report() to allow a target to
> >>>> pass the partition block device starting sector (offset).
> >>>>
> >>>> Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")
> >>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> >>>> Cc: <stable@vger.kernel.org>
> >>>
> >>> [dropping the actual stable cc from the mail, please don't cc stable in
> >>> the mail header, could be that git-send-email just pulled it in but...]
> >>
> >> My apologies about that. It was indeed git-send-email. Will fix that.
> >>
> >>> This needs a different fix.  There should be absolutely no need to pass
> >>> in a start offset for a device that is already supposed to be described
> >>> within the 'struct bio' passed to dm_remap_zone_report().
> >>>
> >>> Why can't you just use the bio->bi_bdev ?  Err, since commit
> >>> 74d46992e0d9d, I mean bio->bi_disk to get the start sector?
> >>>
> >>> You should be able to get it from:
> >>>
> >>> struct hd_struct *part = disk_get_part(bio->bi_disk, bio->bi_partno);
> >>> part->start_sect;
> >>> disk_put_part(part);
> >>>
> >>> or:
> >>>
> >>> struct block_device *bdev = bdget_disk(bio->bi_disk, bio->bi_partno);
> >>> get_start_sect(fc->dev->bdev);
> >>> bdput(bdev);
> >>
> >> I am afraid this does not work. The original bio disk is of course /dev/dm-xx so
> >> not the partition (partno == 0), and the mapped bio passed as argument to the
> >> remap zone function (clone of the orig bio) is pointing to the entire underlying
> >> disk, not the partition disk used in dmsetup. bi_partno is 0 for both BIOs.
> >> Which seems weird, __bio_clone_fast() does copy the disk and part number as is.
> >> The disk is of course correctly changed from the orig dm disk to the actual
> >> physical disk in the clone, but the partition number stays the same. In the end,
> >> only the target has the correct bdev (disk+partno) information it seems.
> > 
> > OK, that is a bug that must to be found then.
> 
> Nope. No bug. I figured it out: after the bio remapping in the target (which
> calls bio_set_dev() to indicate the partition bdev target for the bio), the
> block core will remap that bio to the container bdev (the entire disk) which is
> partition 0.

I just fully explored the bdev lookup aspect of the DM code (thinking
somehow we were dropping the partno on the floor).

That aspect all looks fine, so I didn't send an email detailing my
findings.

> So the partition number is lost when the bio completes and zone
> remap cannot get the starting physical sector for the adjustment of the zone
> start and wp position.

I didn't think to look at block core's partition remapping having a
side-effect of losing the partition info.  Nasty.  But even if that is
the case, shouldn't the bio have been updated to reflect the proper
offset into the device?

> >> It looks like the partno gets lost, which does not seem correct. I must be
> >> missing something, but I fail to see what it is. Any suggestion ?
> > 
> > Not at the moment but I'll have a closer look early next week.
> 
> Now, I see 2 solutions:
> * have the target add to the "start" argument passed to dm_remap_zone_report()
> the offset (start sector) of the block device. The calculation match what needs
> to be done, so it works. It is a little ugly though as the target has to be
> "partition aware" in a sense.
> * Add a "struct dm_dev *dev" argument to dm_remap_zone_report(). The device
> start sector is simply "get_start_sect(dev->bdev)" with that.
> 
> I prefer the second solution... What do you think ?

I think: I hate block core's partition code! ;)

The 2nd option seems better, but I still need to get to the bottom of
_why_ the completing bio isn't properly describing the IO that was
issued.  It cannot have it both ways.  If it strips the partition info
(and just maps IO relative to front of entire disk) then the bio should
to be updated reflect this.

Too late here now, will look tomorrow.

> Note that I have a zoned block device series in the work which improves zone
> report and simplifies all of this remapping business in dm. But the current code
> still needs fixing.

OK.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: dm: Fix report zone remapping
@ 2018-10-06  5:32           ` Mike Snitzer
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2018-10-06  5:32 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-block, dm-devel

On Sat, Oct 06 2018 at 12:03am -0400,
Damien Le Moal <Damien.LeMoal@wdc.com> wrote:

> Mike,
> 
> On 2018/10/06 12:49, Mike Snitzer wrote:
> > On Fri, Oct 05 2018 at  9:24pm -0400,
> > Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
> > 
> >> Mike,
> >>
> >> On 2018/10/06 3:29, Mike Snitzer wrote:
> >>> On Fri, Oct 05 2018 at  6:34am -0400,
> >>> Damien Le Moal <damien.lemoal@wdc.com> wrote:
> >>>
> >>>> If dm-linear or dm-flakey have targets on top of a partition of a zoned
> >>>> block device, remapping of the start sector and write pointer position
> >>>> of the zones reported by a report zones BIO must be modified to not
> >>>> only account for the target table entry mapping, but also to account
> >>>> for the partition first sector. This start sector must be substracted
> >>>> to the start sector of all zones reported. The write pointer position
> >>>> of sequential zones must also be reduced by this offset.
> >>>>
> >>>> Since there is no easy way to access the underlying bdev of the target
> >>>> table entry from the dm_target or dm_target_io pointers, modify the
> >>>> interface of the function dm_remap_zone_report() to allow a target to
> >>>> pass the partition block device starting sector (offset).
> >>>>
> >>>> Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")
> >>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> >>>> Cc: <stable@vger.kernel.org>
> >>>
> >>> [dropping the actual stable cc from the mail, please don't cc stable in
> >>> the mail header, could be that git-send-email just pulled it in but...]
> >>
> >> My apologies about that. It was indeed git-send-email. Will fix that.
> >>
> >>> This needs a different fix.  There should be absolutely no need to pass
> >>> in a start offset for a device that is already supposed to be described
> >>> within the 'struct bio' passed to dm_remap_zone_report().
> >>>
> >>> Why can't you just use the bio->bi_bdev ?  Err, since commit
> >>> 74d46992e0d9d, I mean bio->bi_disk to get the start sector?
> >>>
> >>> You should be able to get it from:
> >>>
> >>> struct hd_struct *part = disk_get_part(bio->bi_disk, bio->bi_partno);
> >>> part->start_sect;
> >>> disk_put_part(part);
> >>>
> >>> or:
> >>>
> >>> struct block_device *bdev = bdget_disk(bio->bi_disk, bio->bi_partno);
> >>> get_start_sect(fc->dev->bdev);
> >>> bdput(bdev);
> >>
> >> I am afraid this does not work. The original bio disk is of course /dev/dm-xx so
> >> not the partition (partno == 0), and the mapped bio passed as argument to the
> >> remap zone function (clone of the orig bio) is pointing to the entire underlying
> >> disk, not the partition disk used in dmsetup. bi_partno is 0 for both BIOs.
> >> Which seems weird, __bio_clone_fast() does copy the disk and part number as is.
> >> The disk is of course correctly changed from the orig dm disk to the actual
> >> physical disk in the clone, but the partition number stays the same. In the end,
> >> only the target has the correct bdev (disk+partno) information it seems.
> > 
> > OK, that is a bug that must to be found then.
> 
> Nope. No bug. I figured it out: after the bio remapping in the target (which
> calls bio_set_dev() to indicate the partition bdev target for the bio), the
> block core will remap that bio to the container bdev (the entire disk) which is
> partition 0.

I just fully explored the bdev lookup aspect of the DM code (thinking
somehow we were dropping the partno on the floor).

That aspect all looks fine, so I didn't send an email detailing my
findings.

> So the partition number is lost when the bio completes and zone
> remap cannot get the starting physical sector for the adjustment of the zone
> start and wp position.

I didn't think to look at block core's partition remapping having a
side-effect of losing the partition info.  Nasty.  But even if that is
the case, shouldn't the bio have been updated to reflect the proper
offset into the device?

> >> It looks like the partno gets lost, which does not seem correct. I must be
> >> missing something, but I fail to see what it is. Any suggestion ?
> > 
> > Not at the moment but I'll have a closer look early next week.
> 
> Now, I see 2 solutions:
> * have the target add to the "start" argument passed to dm_remap_zone_report()
> the offset (start sector) of the block device. The calculation match what needs
> to be done, so it works. It is a little ugly though as the target has to be
> "partition aware" in a sense.
> * Add a "struct dm_dev *dev" argument to dm_remap_zone_report(). The device
> start sector is simply "get_start_sect(dev->bdev)" with that.
> 
> I prefer the second solution... What do you think ?

I think: I hate block core's partition code! ;)

The 2nd option seems better, but I still need to get to the bottom of
_why_ the completing bio isn't properly describing the IO that was
issued.  It cannot have it both ways.  If it strips the partition info
(and just maps IO relative to front of entire disk) then the bio should
to be updated reflect this.

Too late here now, will look tomorrow.

> Note that I have a zoned block device series in the work which improves zone
> report and simplifies all of this remapping business in dm. But the current code
> still needs fixing.

OK.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: dm: Fix report zone remapping
  2018-10-06  5:32           ` Mike Snitzer
@ 2018-10-06  7:37             ` Damien Le Moal
  -1 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2018-10-06  7:37 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, linux-block

On 2018/10/06 14:32, Mike Snitzer wrote:=0A=
> On Sat, Oct 06 2018 at 12:03am -0400,=0A=
> Damien Le Moal <Damien.LeMoal@wdc.com> wrote:=0A=
> =0A=
>> Mike,=0A=
>>=0A=
>> On 2018/10/06 12:49, Mike Snitzer wrote:=0A=
>>> On Fri, Oct 05 2018 at  9:24pm -0400,=0A=
>>> Damien Le Moal <Damien.LeMoal@wdc.com> wrote:=0A=
>>>=0A=
>>>> Mike,=0A=
>>>>=0A=
>>>> On 2018/10/06 3:29, Mike Snitzer wrote:=0A=
>>>>> On Fri, Oct 05 2018 at  6:34am -0400,=0A=
>>>>> Damien Le Moal <damien.lemoal@wdc.com> wrote:=0A=
>>>>>=0A=
>>>>>> If dm-linear or dm-flakey have targets on top of a partition of a zo=
ned=0A=
>>>>>> block device, remapping of the start sector and write pointer positi=
on=0A=
>>>>>> of the zones reported by a report zones BIO must be modified to not=
=0A=
>>>>>> only account for the target table entry mapping, but also to account=
=0A=
>>>>>> for the partition first sector. This start sector must be substracte=
d=0A=
>>>>>> to the start sector of all zones reported. The write pointer positio=
n=0A=
>>>>>> of sequential zones must also be reduced by this offset.=0A=
>>>>>>=0A=
>>>>>> Since there is no easy way to access the underlying bdev of the targ=
et=0A=
>>>>>> table entry from the dm_target or dm_target_io pointers, modify the=
=0A=
>>>>>> interface of the function dm_remap_zone_report() to allow a target t=
o=0A=
>>>>>> pass the partition block device starting sector (offset).=0A=
>>>>>>=0A=
>>>>>> Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")=0A=
>>>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>=0A=
>>>>>> Cc: <stable@vger.kernel.org>=0A=
>>>>>=0A=
>>>>> [dropping the actual stable cc from the mail, please don't cc stable =
in=0A=
>>>>> the mail header, could be that git-send-email just pulled it in but..=
.]=0A=
>>>>=0A=
>>>> My apologies about that. It was indeed git-send-email. Will fix that.=
=0A=
>>>>=0A=
>>>>> This needs a different fix.  There should be absolutely no need to pa=
ss=0A=
>>>>> in a start offset for a device that is already supposed to be describ=
ed=0A=
>>>>> within the 'struct bio' passed to dm_remap_zone_report().=0A=
>>>>>=0A=
>>>>> Why can't you just use the bio->bi_bdev ?  Err, since commit=0A=
>>>>> 74d46992e0d9d, I mean bio->bi_disk to get the start sector?=0A=
>>>>>=0A=
>>>>> You should be able to get it from:=0A=
>>>>>=0A=
>>>>> struct hd_struct *part =3D disk_get_part(bio->bi_disk, bio->bi_partno=
);=0A=
>>>>> part->start_sect;=0A=
>>>>> disk_put_part(part);=0A=
>>>>>=0A=
>>>>> or:=0A=
>>>>>=0A=
>>>>> struct block_device *bdev =3D bdget_disk(bio->bi_disk, bio->bi_partno=
);=0A=
>>>>> get_start_sect(fc->dev->bdev);=0A=
>>>>> bdput(bdev);=0A=
>>>>=0A=
>>>> I am afraid this does not work. The original bio disk is of course /de=
v/dm-xx so=0A=
>>>> not the partition (partno =3D=3D 0), and the mapped bio passed as argu=
ment to the=0A=
>>>> remap zone function (clone of the orig bio) is pointing to the entire =
underlying=0A=
>>>> disk, not the partition disk used in dmsetup. bi_partno is 0 for both =
BIOs.=0A=
>>>> Which seems weird, __bio_clone_fast() does copy the disk and part numb=
er as is.=0A=
>>>> The disk is of course correctly changed from the orig dm disk to the a=
ctual=0A=
>>>> physical disk in the clone, but the partition number stays the same. I=
n the end,=0A=
>>>> only the target has the correct bdev (disk+partno) information it seem=
s.=0A=
>>>=0A=
>>> OK, that is a bug that must to be found then.=0A=
>>=0A=
>> Nope. No bug. I figured it out: after the bio remapping in the target (w=
hich=0A=
>> calls bio_set_dev() to indicate the partition bdev target for the bio), =
the=0A=
>> block core will remap that bio to the container bdev (the entire disk) w=
hich is=0A=
>> partition 0.=0A=
> =0A=
> I just fully explored the bdev lookup aspect of the DM code (thinking=0A=
> somehow we were dropping the partno on the floor).=0A=
> =0A=
> That aspect all looks fine, so I didn't send an email detailing my=0A=
> findings.=0A=
> =0A=
>> So the partition number is lost when the bio completes and zone=0A=
>> remap cannot get the starting physical sector for the adjustment of the =
zone=0A=
>> start and wp position.=0A=
> =0A=
> I didn't think to look at block core's partition remapping having a=0A=
> side-effect of losing the partition info.  Nasty.  But even if that is=0A=
> the case, shouldn't the bio have been updated to reflect the proper=0A=
> offset into the device?=0A=
> =0A=
>>>> It looks like the partno gets lost, which does not seem correct. I mus=
t be=0A=
>>>> missing something, but I fail to see what it is. Any suggestion ?=0A=
>>>=0A=
>>> Not at the moment but I'll have a closer look early next week.=0A=
>>=0A=
>> Now, I see 2 solutions:=0A=
>> * have the target add to the "start" argument passed to dm_remap_zone_re=
port()=0A=
>> the offset (start sector) of the block device. The calculation match wha=
t needs=0A=
>> to be done, so it works. It is a little ugly though as the target has to=
 be=0A=
>> "partition aware" in a sense.=0A=
>> * Add a "struct dm_dev *dev" argument to dm_remap_zone_report(). The dev=
ice=0A=
>> start sector is simply "get_start_sect(dev->bdev)" with that.=0A=
>>=0A=
>> I prefer the second solution... What do you think ?=0A=
> =0A=
> I think: I hate block core's partition code! ;)=0A=
> =0A=
> The 2nd option seems better, but I still need to get to the bottom of=0A=
> _why_ the completing bio isn't properly describing the IO that was=0A=
> issued.  It cannot have it both ways.  If it strips the partition info=0A=
> (and just maps IO relative to front of entire disk) then the bio should=
=0A=
> to be updated reflect this.=0A=
> =0A=
> Too late here now, will look tomorrow.=0A=
=0A=
I sent a v2 using the dm_dev before reading this email...=0A=
=0A=
On completion, the bio sector will be the sector when issued + completed se=
ctors=0A=
and the BIO size will indicate the remaining bytes to do, which is always 0=
 for=0A=
zone report BIOs. So looking only at the completed "bio" passed as argument=
, the=0A=
actual sector that was used is lost because the BIO original length is lost=
 too.=0A=
=0A=
Thinking of it now, we do not need an additional argument. Since a zone rep=
ort=0A=
bio clone is ALWAYS the entire original BIO (zone reports are never split),=
=0A=
report_bio is storing the original length of the request. Using that, we ca=
n=0A=
find out the actual sector used for execution, and accounting for the targe=
t=0A=
start offset in the device and its mapping, we can do this to get the parti=
tion=0A=
offset (bio is the clone, report_bio is the original):=0A=
=0A=
dev_offset =3D bio->bi_iter.bi_sector + ti->begin=0A=
		- (start + bio_end_sector(report_bio));=0A=
=0A=
Which is super not obvious and will need to be documented with a nice comme=
nt. I=0A=
just tested, and everything works fine with this. If you are OK with this=
=0A=
change, I will send a v3.=0A=
=0A=
Best regards.=0A=
=0A=
=0A=
-- =0A=
Damien Le Moal=0A=
Western Digital Research=0A=

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: dm: Fix report zone remapping
@ 2018-10-06  7:37             ` Damien Le Moal
  0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2018-10-06  7:37 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-block, dm-devel

On 2018/10/06 14:32, Mike Snitzer wrote:
> On Sat, Oct 06 2018 at 12:03am -0400,
> Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
> 
>> Mike,
>>
>> On 2018/10/06 12:49, Mike Snitzer wrote:
>>> On Fri, Oct 05 2018 at  9:24pm -0400,
>>> Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
>>>
>>>> Mike,
>>>>
>>>> On 2018/10/06 3:29, Mike Snitzer wrote:
>>>>> On Fri, Oct 05 2018 at  6:34am -0400,
>>>>> Damien Le Moal <damien.lemoal@wdc.com> wrote:
>>>>>
>>>>>> If dm-linear or dm-flakey have targets on top of a partition of a zoned
>>>>>> block device, remapping of the start sector and write pointer position
>>>>>> of the zones reported by a report zones BIO must be modified to not
>>>>>> only account for the target table entry mapping, but also to account
>>>>>> for the partition first sector. This start sector must be substracted
>>>>>> to the start sector of all zones reported. The write pointer position
>>>>>> of sequential zones must also be reduced by this offset.
>>>>>>
>>>>>> Since there is no easy way to access the underlying bdev of the target
>>>>>> table entry from the dm_target or dm_target_io pointers, modify the
>>>>>> interface of the function dm_remap_zone_report() to allow a target to
>>>>>> pass the partition block device starting sector (offset).
>>>>>>
>>>>>> Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")
>>>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>>>> Cc: <stable@vger.kernel.org>
>>>>>
>>>>> [dropping the actual stable cc from the mail, please don't cc stable in
>>>>> the mail header, could be that git-send-email just pulled it in but...]
>>>>
>>>> My apologies about that. It was indeed git-send-email. Will fix that.
>>>>
>>>>> This needs a different fix.  There should be absolutely no need to pass
>>>>> in a start offset for a device that is already supposed to be described
>>>>> within the 'struct bio' passed to dm_remap_zone_report().
>>>>>
>>>>> Why can't you just use the bio->bi_bdev ?  Err, since commit
>>>>> 74d46992e0d9d, I mean bio->bi_disk to get the start sector?
>>>>>
>>>>> You should be able to get it from:
>>>>>
>>>>> struct hd_struct *part = disk_get_part(bio->bi_disk, bio->bi_partno);
>>>>> part->start_sect;
>>>>> disk_put_part(part);
>>>>>
>>>>> or:
>>>>>
>>>>> struct block_device *bdev = bdget_disk(bio->bi_disk, bio->bi_partno);
>>>>> get_start_sect(fc->dev->bdev);
>>>>> bdput(bdev);
>>>>
>>>> I am afraid this does not work. The original bio disk is of course /dev/dm-xx so
>>>> not the partition (partno == 0), and the mapped bio passed as argument to the
>>>> remap zone function (clone of the orig bio) is pointing to the entire underlying
>>>> disk, not the partition disk used in dmsetup. bi_partno is 0 for both BIOs.
>>>> Which seems weird, __bio_clone_fast() does copy the disk and part number as is.
>>>> The disk is of course correctly changed from the orig dm disk to the actual
>>>> physical disk in the clone, but the partition number stays the same. In the end,
>>>> only the target has the correct bdev (disk+partno) information it seems.
>>>
>>> OK, that is a bug that must to be found then.
>>
>> Nope. No bug. I figured it out: after the bio remapping in the target (which
>> calls bio_set_dev() to indicate the partition bdev target for the bio), the
>> block core will remap that bio to the container bdev (the entire disk) which is
>> partition 0.
> 
> I just fully explored the bdev lookup aspect of the DM code (thinking
> somehow we were dropping the partno on the floor).
> 
> That aspect all looks fine, so I didn't send an email detailing my
> findings.
> 
>> So the partition number is lost when the bio completes and zone
>> remap cannot get the starting physical sector for the adjustment of the zone
>> start and wp position.
> 
> I didn't think to look at block core's partition remapping having a
> side-effect of losing the partition info.  Nasty.  But even if that is
> the case, shouldn't the bio have been updated to reflect the proper
> offset into the device?
> 
>>>> It looks like the partno gets lost, which does not seem correct. I must be
>>>> missing something, but I fail to see what it is. Any suggestion ?
>>>
>>> Not at the moment but I'll have a closer look early next week.
>>
>> Now, I see 2 solutions:
>> * have the target add to the "start" argument passed to dm_remap_zone_report()
>> the offset (start sector) of the block device. The calculation match what needs
>> to be done, so it works. It is a little ugly though as the target has to be
>> "partition aware" in a sense.
>> * Add a "struct dm_dev *dev" argument to dm_remap_zone_report(). The device
>> start sector is simply "get_start_sect(dev->bdev)" with that.
>>
>> I prefer the second solution... What do you think ?
> 
> I think: I hate block core's partition code! ;)
> 
> The 2nd option seems better, but I still need to get to the bottom of
> _why_ the completing bio isn't properly describing the IO that was
> issued.  It cannot have it both ways.  If it strips the partition info
> (and just maps IO relative to front of entire disk) then the bio should
> to be updated reflect this.
> 
> Too late here now, will look tomorrow.

I sent a v2 using the dm_dev before reading this email...

On completion, the bio sector will be the sector when issued + completed sectors
and the BIO size will indicate the remaining bytes to do, which is always 0 for
zone report BIOs. So looking only at the completed "bio" passed as argument, the
actual sector that was used is lost because the BIO original length is lost too.

Thinking of it now, we do not need an additional argument. Since a zone report
bio clone is ALWAYS the entire original BIO (zone reports are never split),
report_bio is storing the original length of the request. Using that, we can
find out the actual sector used for execution, and accounting for the target
start offset in the device and its mapping, we can do this to get the partition
offset (bio is the clone, report_bio is the original):

dev_offset = bio->bi_iter.bi_sector + ti->begin
		- (start + bio_end_sector(report_bio));

Which is super not obvious and will need to be documented with a nice comment. I
just tested, and everything works fine with this. If you are OK with this
change, I will send a v3.

Best regards.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: dm: Fix report zone remapping
  2018-10-06  7:37             ` Damien Le Moal
  (?)
@ 2018-10-08 16:25             ` Mike Snitzer
  -1 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2018-10-08 16:25 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-block, dm-devel

On Sat, Oct 06 2018 at  3:37am -0400,
Damien Le Moal <Damien.LeMoal@wdc.com> wrote:

> On 2018/10/06 14:32, Mike Snitzer wrote:
> > On Sat, Oct 06 2018 at 12:03am -0400,
> > Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
> > 
> >> Mike,
> >>
> >> On 2018/10/06 12:49, Mike Snitzer wrote:
> >>> On Fri, Oct 05 2018 at  9:24pm -0400,
> >>> Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
> >>>
> >>>> Mike,
> >>>>
> >>>> On 2018/10/06 3:29, Mike Snitzer wrote:
> >>>>> On Fri, Oct 05 2018 at  6:34am -0400,
> >>>>> Damien Le Moal <damien.lemoal@wdc.com> wrote:
> >>>>>
> >>>>>> If dm-linear or dm-flakey have targets on top of a partition of a zoned
> >>>>>> block device, remapping of the start sector and write pointer position
> >>>>>> of the zones reported by a report zones BIO must be modified to not
> >>>>>> only account for the target table entry mapping, but also to account
> >>>>>> for the partition first sector. This start sector must be substracted
> >>>>>> to the start sector of all zones reported. The write pointer position
> >>>>>> of sequential zones must also be reduced by this offset.
> >>>>>>
> >>>>>> Since there is no easy way to access the underlying bdev of the target
> >>>>>> table entry from the dm_target or dm_target_io pointers, modify the
> >>>>>> interface of the function dm_remap_zone_report() to allow a target to
> >>>>>> pass the partition block device starting sector (offset).
> >>>>>>
> >>>>>> Fixes: 10999307c14e ("dm: introduce dm_remap_zone_report()")
> >>>>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> >>>>>> Cc: <stable@vger.kernel.org>
> >>>>>
> >>>>> [dropping the actual stable cc from the mail, please don't cc stable in
> >>>>> the mail header, could be that git-send-email just pulled it in but...]
> >>>>
> >>>> My apologies about that. It was indeed git-send-email. Will fix that.
> >>>>
> >>>>> This needs a different fix.  There should be absolutely no need to pass
> >>>>> in a start offset for a device that is already supposed to be described
> >>>>> within the 'struct bio' passed to dm_remap_zone_report().
> >>>>>
> >>>>> Why can't you just use the bio->bi_bdev ?  Err, since commit
> >>>>> 74d46992e0d9d, I mean bio->bi_disk to get the start sector?
> >>>>>
> >>>>> You should be able to get it from:
> >>>>>
> >>>>> struct hd_struct *part = disk_get_part(bio->bi_disk, bio->bi_partno);
> >>>>> part->start_sect;
> >>>>> disk_put_part(part);
> >>>>>
> >>>>> or:
> >>>>>
> >>>>> struct block_device *bdev = bdget_disk(bio->bi_disk, bio->bi_partno);
> >>>>> get_start_sect(fc->dev->bdev);
> >>>>> bdput(bdev);
> >>>>
> >>>> I am afraid this does not work. The original bio disk is of course /dev/dm-xx so
> >>>> not the partition (partno == 0), and the mapped bio passed as argument to the
> >>>> remap zone function (clone of the orig bio) is pointing to the entire underlying
> >>>> disk, not the partition disk used in dmsetup. bi_partno is 0 for both BIOs.
> >>>> Which seems weird, __bio_clone_fast() does copy the disk and part number as is.
> >>>> The disk is of course correctly changed from the orig dm disk to the actual
> >>>> physical disk in the clone, but the partition number stays the same. In the end,
> >>>> only the target has the correct bdev (disk+partno) information it seems.
> >>>
> >>> OK, that is a bug that must to be found then.
> >>
> >> Nope. No bug. I figured it out: after the bio remapping in the target (which
> >> calls bio_set_dev() to indicate the partition bdev target for the bio), the
> >> block core will remap that bio to the container bdev (the entire disk) which is
> >> partition 0.
> > 
> > I just fully explored the bdev lookup aspect of the DM code (thinking
> > somehow we were dropping the partno on the floor).
> > 
> > That aspect all looks fine, so I didn't send an email detailing my
> > findings.
> > 
> >> So the partition number is lost when the bio completes and zone
> >> remap cannot get the starting physical sector for the adjustment of the zone
> >> start and wp position.
> > 
> > I didn't think to look at block core's partition remapping having a
> > side-effect of losing the partition info.  Nasty.  But even if that is
> > the case, shouldn't the bio have been updated to reflect the proper
> > offset into the device?
> > 
> >>>> It looks like the partno gets lost, which does not seem correct. I must be
> >>>> missing something, but I fail to see what it is. Any suggestion ?
> >>>
> >>> Not at the moment but I'll have a closer look early next week.
> >>
> >> Now, I see 2 solutions:
> >> * have the target add to the "start" argument passed to dm_remap_zone_report()
> >> the offset (start sector) of the block device. The calculation match what needs
> >> to be done, so it works. It is a little ugly though as the target has to be
> >> "partition aware" in a sense.
> >> * Add a "struct dm_dev *dev" argument to dm_remap_zone_report(). The device
> >> start sector is simply "get_start_sect(dev->bdev)" with that.
> >>
> >> I prefer the second solution... What do you think ?
> > 
> > I think: I hate block core's partition code! ;)
> > 
> > The 2nd option seems better, but I still need to get to the bottom of
> > _why_ the completing bio isn't properly describing the IO that was
> > issued.  It cannot have it both ways.  If it strips the partition info
> > (and just maps IO relative to front of entire disk) then the bio should
> > to be updated reflect this.
> > 
> > Too late here now, will look tomorrow.
> 
> I sent a v2 using the dm_dev before reading this email...
> 
> On completion, the bio sector will be the sector when issued + completed sectors
> and the BIO size will indicate the remaining bytes to do, which is always 0 for
> zone report BIOs. So looking only at the completed "bio" passed as argument, the
> actual sector that was used is lost because the BIO original length is lost too.

Makes sense.  Rare to want the full bio description after it has gone
through the beast.  Usually accounting is done on each subset (split) of
the original bio as each completes, no?

> Thinking of it now, we do not need an additional argument. Since a zone report
> bio clone is ALWAYS the entire original BIO (zone reports are never split),
> report_bio is storing the original length of the request. Using that, we can
> find out the actual sector used for execution, and accounting for the target
> start offset in the device and its mapping, we can do this to get the partition
> offset (bio is the clone, report_bio is the original):
> 
> dev_offset = bio->bi_iter.bi_sector + ti->begin
> 		- (start + bio_end_sector(report_bio));
> 
> Which is super not obvious and will need to be documented with a nice comment. I
> just tested, and everything works fine with this. If you are OK with this
> change, I will send a v3.

OK, please do send v3 if you're happy with it.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-10-08 23:38 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-05 10:34 [PATCH] dm: Fix report zone remapping Damien Le Moal
2018-10-05 18:29 ` Mike Snitzer
2018-10-05 18:29   ` Mike Snitzer
2018-10-06  1:24   ` Damien Le Moal
2018-10-06  1:24     ` Damien Le Moal
2018-10-06  3:49     ` Mike Snitzer
2018-10-06  3:49       ` Mike Snitzer
2018-10-06  4:03       ` Damien Le Moal
2018-10-06  4:03         ` Damien Le Moal
2018-10-06  5:32         ` Mike Snitzer
2018-10-06  5:32           ` Mike Snitzer
2018-10-06  7:37           ` Damien Le Moal
2018-10-06  7:37             ` Damien Le Moal
2018-10-08 16:25             ` Mike Snitzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.