* [PATCH RFC 00/12] dm-zoned: multi-device support
@ 2020-05-22 15:38 Hannes Reinecke
  2020-05-22 15:38 ` [PATCH 01/12] dm-zoned: add debugging message for reading superblocks Hannes Reinecke
                   ` (11 more replies)
  0 siblings, 12 replies; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-22 15:38 UTC (permalink / raw)
  To: Damien LeMoal; +Cc: dm-devel, Mike Snitzer

Hi all,

At the risk of boring you to death, here's yet another RFC to update
dm-zoned. As it has seen only light testing and still has some areas
that need improvement, I'd consider it RFC material.
I'm putting it out now to get early feedback and to have it ready for
the next merge window.

So, this patchset:
- Converts the zone array to an xarray for better scalability
- Separates shared structures out into per-device structures
- Lifts the restriction to 2 devices to handle an arbitrary number
  of drives.

With this patchset I'm seeing write performance increase from an
average of 150MB/s (with 2 drives) to 200MB/s (with 3 drives).
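The round-robin reclaim balancing mentioned in the patch list below can be pictured with a minimal userspace sketch; the structure and names here are hypothetical, not the actual dm-zoned code:

```c
/* Hypothetical sketch of a round-robin reclaim balancer: pick the next
 * idle device, starting just after the last device chosen, so reclaim
 * work is spread evenly across all drives. */
#include <assert.h>

struct dev_ctx {
	int busy;	/* reclaim already running on this device */
};

/* Return the index of the next idle device, or -1 if all are busy. */
int pick_reclaim_dev(struct dev_ctx *devs, int nr_devs, int last)
{
	for (int i = 1; i <= nr_devs; i++) {
		int idx = (last + i) % nr_devs;

		if (!devs[idx].busy)
			return idx;
	}
	return -1;
}
```

The point of starting at `last + 1` is that a device that just ran reclaim is considered again only after every other idle device has had a turn.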

Hannes Reinecke (12):
  dm-zoned: add debugging message for reading superblocks
  dm-zoned: convert to xarray
  dm-zoned: use on-stack superblock for tertiary devices
  dm-zoned: secondary superblock must reside on the same device as the
    primary superblock
  dm-zoned: add device pointer to struct dm_zone
  dm-zoned: add metadata pointer to struct dmz_dev
  dm-zoned: add a 'reserved' zone flag
  dm-zoned: move random and sequential zones into struct dmz_dev
  dm-zoned: improve logging messages for reclaim
  dm-zoned: support arbitrary number of devices
  dm-zoned: round-robin load balancer for reclaiming zones
  dm-zoned: per-device reclaim

 drivers/md/dm-zoned-metadata.c | 430 ++++++++++++++++++++++++-----------------
 drivers/md/dm-zoned-reclaim.c  |  85 ++++----
 drivers/md/dm-zoned-target.c   | 172 ++++++++++-------
 drivers/md/dm-zoned.h          |  70 ++++---
 4 files changed, 454 insertions(+), 303 deletions(-)

-- 
2.16.4

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 01/12] dm-zoned: add debugging message for reading superblocks
  2020-05-22 15:38 [PATCH RFC 00/12] dm-zoned: multi-device support Hannes Reinecke
@ 2020-05-22 15:38 ` Hannes Reinecke
  2020-05-25  1:54   ` Damien Le Moal
  2020-05-22 15:38 ` [PATCH 02/12] dm-zoned: convert to xarray Hannes Reinecke
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-22 15:38 UTC (permalink / raw)
  To: Damien LeMoal; +Cc: dm-devel, Mike Snitzer

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 4a2e351365c5..b0d3ed4ac56a 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1105,6 +1105,9 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
  */
 static int dmz_read_sb(struct dmz_metadata *zmd, unsigned int set)
 {
+	DMDEBUG("(%s): read superblock set %d dev %s block %llu",
+		zmd->devname, set, zmd->sb[set].dev->name,
+		zmd->sb[set].block);
 	return dmz_rdwr_block(zmd->sb[set].dev, REQ_OP_READ,
 			      zmd->sb[set].block, zmd->sb[set].mblk->page);
 }
-- 
2.16.4


* [PATCH 02/12] dm-zoned: convert to xarray
  2020-05-22 15:38 [PATCH RFC 00/12] dm-zoned: multi-device support Hannes Reinecke
  2020-05-22 15:38 ` [PATCH 01/12] dm-zoned: add debugging message for reading superblocks Hannes Reinecke
@ 2020-05-22 15:38 ` Hannes Reinecke
  2020-05-25  2:01   ` Damien Le Moal
  2020-05-22 15:38 ` [PATCH 03/12] dm-zoned: use on-stack superblock for tertiary devices Hannes Reinecke
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-22 15:38 UTC (permalink / raw)
  To: Damien LeMoal; +Cc: dm-devel, Mike Snitzer

The zones array is getting really large, and large arrays
tend to wreak havoc with CPU caches.
So convert it to an xarray to be more cache-friendly.
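The change boils down to replacing one huge contiguous array with per-zone allocations found by id. A minimal userspace sketch of that lookup pattern, mimicking (not using) the kernel's xa_insert()/xa_load() semantics with a fixed-size pointer table:

```c
/* Userspace mock of the sparse-lookup pattern: zones are individually
 * allocated objects found by id, and a missing id yields NULL instead
 * of a stale array slot. Kernel code uses xa_load()/xa_insert(). */
#include <assert.h>
#include <stddef.h>

#define MAX_ZONES 16

struct dm_zone { unsigned int id; };

struct dm_zone *zone_table[MAX_ZONES];

/* Like xa_insert(): fails if the slot is already occupied. */
int zone_insert(unsigned int id, struct dm_zone *zone)
{
	if (id >= MAX_ZONES || zone_table[id])
		return -1;
	zone_table[id] = zone;
	return 0;
}

/* Like xa_load(): returns NULL for ids that were never inserted. */
struct dm_zone *zone_get(unsigned int id)
{
	return id < MAX_ZONES ? zone_table[id] : NULL;
}
```

The important behavioural difference is that a lookup can now return NULL for an absent id, which is why the hunks below add NULL checks after every dmz_get() call.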

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 98 +++++++++++++++++++++++++++++++-----------
 1 file changed, 73 insertions(+), 25 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index b0d3ed4ac56a..3da6702bb1ae 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -172,7 +172,7 @@ struct dmz_metadata {
 	unsigned int		nr_chunks;
 
 	/* Zone information array */
-	struct dm_zone		*zones;
+	struct xarray		zones;
 
 	struct dmz_sb		sb[3];
 	unsigned int		mblk_primary;
@@ -327,6 +327,11 @@ unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
 	return atomic_read(&zmd->unmap_nr_seq);
 }
 
+static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
+{
+	return xa_load(&zmd->zones, zone_id);
+}
+
 const char *dmz_metadata_label(struct dmz_metadata *zmd)
 {
 	return (const char *)zmd->label;
@@ -1121,6 +1126,7 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
 {
 	unsigned int zone_nr_blocks = zmd->zone_nr_blocks;
 	struct dmz_mblock *mblk;
+	unsigned int zone_id = zmd->sb[0].zone->id;
 	int i;
 
 	/* Allocate a block */
@@ -1133,17 +1139,16 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
 
 	/* Bad first super block: search for the second one */
 	zmd->sb[1].block = zmd->sb[0].block + zone_nr_blocks;
-	zmd->sb[1].zone = zmd->sb[0].zone + 1;
+	zmd->sb[1].zone = xa_load(&zmd->zones, zone_id + 1);
 	zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
-	for (i = 0; i < zmd->nr_rnd_zones - 1; i++) {
+	for (i = 1; i < zmd->nr_rnd_zones; i++) {
 		if (dmz_read_sb(zmd, 1) != 0)
 			break;
-		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC) {
-			zmd->sb[1].zone += i;
+		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC)
 			return 0;
-		}
 		zmd->sb[1].block += zone_nr_blocks;
-		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone + i);
+		zmd->sb[1].zone = dmz_get(zmd, zone_id + i);
+		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
 	}
 
 	dmz_free_mblock(zmd, mblk);
@@ -1259,8 +1264,12 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 	/* Read and check secondary super block */
 	if (ret == 0) {
 		sb_good[0] = true;
-		if (!zmd->sb[1].zone)
-			zmd->sb[1].zone = zmd->sb[0].zone + zmd->nr_meta_zones;
+		if (!zmd->sb[1].zone) {
+			unsigned int zone_id =
+				zmd->sb[0].zone->id + zmd->nr_meta_zones;
+
+			zmd->sb[1].zone = dmz_get(zmd, zone_id);
+		}
 		zmd->sb[1].block = dmz_start_block(zmd, zmd->sb[1].zone);
 		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
 		ret = dmz_get_sb(zmd, 1);
@@ -1341,7 +1350,12 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
 	struct dmz_metadata *zmd = data;
 	struct dmz_dev *dev = zmd->nr_devs > 1 ? &zmd->dev[1] : &zmd->dev[0];
 	int idx = num + dev->zone_offset;
-	struct dm_zone *zone = &zmd->zones[idx];
+	struct dm_zone *zone = kzalloc(sizeof(struct dm_zone), GFP_KERNEL);
+
+	if (!zone)
+		return -ENOMEM;
+	if (xa_insert(&zmd->zones, idx, zone, GFP_KERNEL))
+		return -EBUSY;
 
 	if (blkz->len != zmd->zone_nr_sectors) {
 		if (zmd->sb_version > 1) {
@@ -1397,14 +1411,18 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
 	return 0;
 }
 
-static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
+static int dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
 {
 	int idx;
 	sector_t zone_offset = 0;
 
 	for(idx = 0; idx < dev->nr_zones; idx++) {
-		struct dm_zone *zone = &zmd->zones[idx];
-
+		struct dm_zone *zone =
+			kzalloc(sizeof(struct dm_zone), GFP_KERNEL);
+		if (!zone)
+			return -ENOMEM;
+		if (xa_insert(&zmd->zones, idx, zone, GFP_KERNEL) < 0)
+			return -EBUSY;
 		INIT_LIST_HEAD(&zone->link);
 		atomic_set(&zone->refcount, 0);
 		zone->id = idx;
@@ -1420,6 +1438,7 @@ static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
 		}
 		zone_offset += zmd->zone_nr_sectors;
 	}
+	return 0;
 }
 
 /*
@@ -1427,8 +1446,15 @@ static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
  */
 static void dmz_drop_zones(struct dmz_metadata *zmd)
 {
-	kfree(zmd->zones);
-	zmd->zones = NULL;
+	int idx;
+
+	for(idx = 0; idx < zmd->nr_zones; idx++) {
+		struct dm_zone *zone = xa_load(&zmd->zones, idx);
+
+		kfree(zone);
+		xa_erase(&zmd->zones, idx);
+	}
+	xa_destroy(&zmd->zones);
 }
 
 /*
@@ -1460,20 +1486,25 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
 		DMERR("(%s): No zones found", zmd->devname);
 		return -ENXIO;
 	}
-	zmd->zones = kcalloc(zmd->nr_zones, sizeof(struct dm_zone), GFP_KERNEL);
-	if (!zmd->zones)
-		return -ENOMEM;
+	xa_init(&zmd->zones);
 
 	DMDEBUG("(%s): Using %zu B for zone information",
 		zmd->devname, sizeof(struct dm_zone) * zmd->nr_zones);
 
 	if (zmd->nr_devs > 1) {
-		dmz_emulate_zones(zmd, &zmd->dev[0]);
+		ret = dmz_emulate_zones(zmd, &zmd->dev[0]);
+		if (ret < 0) {
+			DMDEBUG("(%s): Failed to emulate zones, error %d",
+				zmd->devname, ret);
+			dmz_drop_zones(zmd);
+			return ret;
+		}
+
 		/*
 		 * Primary superblock zone is always at zone 0 when multiple
 		 * drives are present.
 		 */
-		zmd->sb[0].zone = &zmd->zones[0];
+		zmd->sb[0].zone = dmz_get(zmd, 0);
 
 		zoned_dev = &zmd->dev[1];
 	}
@@ -1576,11 +1607,6 @@ static int dmz_handle_seq_write_err(struct dmz_metadata *zmd,
 	return 0;
 }
 
-static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
-{
-	return &zmd->zones[zone_id];
-}
-
 /*
  * Reset a zone write pointer.
  */
@@ -1662,6 +1688,11 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 		}
 
 		dzone = dmz_get(zmd, dzone_id);
+		if (!dzone) {
+			dmz_zmd_err(zmd, "Chunk %u mapping: data zone %u not present",
+				    chunk, dzone_id);
+			return -EIO;
+		}
 		set_bit(DMZ_DATA, &dzone->flags);
 		dzone->chunk = chunk;
 		dmz_get_zone_weight(zmd, dzone);
@@ -1685,6 +1716,11 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 		}
 
 		bzone = dmz_get(zmd, bzone_id);
+		if (!bzone) {
+			dmz_zmd_err(zmd, "Chunk %u mapping: buffer zone %u not present",
+				    chunk, bzone_id);
+			return -EIO;
+		}
 		if (!dmz_is_rnd(bzone) && !dmz_is_cache(bzone)) {
 			dmz_zmd_err(zmd, "Chunk %u mapping: invalid buffer zone %u",
 				    chunk, bzone_id);
@@ -1715,6 +1751,8 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 	 */
 	for (i = 0; i < zmd->nr_zones; i++) {
 		dzone = dmz_get(zmd, i);
+		if (!dzone)
+			continue;
 		if (dmz_is_meta(dzone))
 			continue;
 		if (dmz_is_offline(dzone))
@@ -1977,6 +2015,10 @@ struct dm_zone *dmz_get_chunk_mapping(struct dmz_metadata *zmd, unsigned int chu
 	} else {
 		/* The chunk is already mapped: get the mapping zone */
 		dzone = dmz_get(zmd, dzone_id);
+		if (!dzone) {
+			dzone = ERR_PTR(-EIO);
+			goto out;
+		}
 		if (dzone->chunk != chunk) {
 			dzone = ERR_PTR(-EIO);
 			goto out;
@@ -2794,6 +2836,12 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
 	/* Set metadata zones starting from sb_zone */
 	for (i = 0; i < zmd->nr_meta_zones << 1; i++) {
 		zone = dmz_get(zmd, zmd->sb[0].zone->id + i);
+		if (!zone) {
+			dmz_zmd_err(zmd,
+				    "metadata zone %u not present", i);
+			ret = -ENXIO;
+			goto err;
+		}
 		if (!dmz_is_rnd(zone) && !dmz_is_cache(zone)) {
 			dmz_zmd_err(zmd,
 				    "metadata zone %d is not random", i);
-- 
2.16.4


* [PATCH 03/12] dm-zoned: use on-stack superblock for tertiary devices
  2020-05-22 15:38 [PATCH RFC 00/12] dm-zoned: multi-device support Hannes Reinecke
  2020-05-22 15:38 ` [PATCH 01/12] dm-zoned: add debugging message for reading superblocks Hannes Reinecke
  2020-05-22 15:38 ` [PATCH 02/12] dm-zoned: convert to xarray Hannes Reinecke
@ 2020-05-22 15:38 ` Hannes Reinecke
  2020-05-25  2:09   ` Damien Le Moal
  2020-05-22 15:38 ` [PATCH 04/12] dm-zoned: secondary superblock must reside on the same device as the primary superblock Hannes Reinecke
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-22 15:38 UTC (permalink / raw)
  To: Damien LeMoal; +Cc: dm-devel, Mike Snitzer

Checking the tertiary superblock just consists of validating UUIDs,
CRCs, and the generation number; it has no contents which would
be required during actual operation.
So we should use an on-stack superblock and avoid having to store
it together with the 'real' superblocks.
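The validate-and-discard idea can be sketched in a few lines; the field names and magic value below are illustrative, not the real on-disk format:

```c
/* Sketch of "validate and throw away": the tertiary check only reads a
 * few fields, so the superblock copy can live on the stack and nothing
 * needs to be retained once the check returns. */
#include <assert.h>
#include <stdint.h>

#define SB_MAGIC 0x5a4f4e45u	/* hypothetical magic value */

struct sb { uint32_t magic; uint32_t version; };

int check_tertiary_sb(const struct sb *sb)
{
	if (sb->magic != SB_MAGIC)
		return -1;
	if (sb->version < 2)	/* tertiary SBs need metadata v2+ */
		return -1;
	return 0;
}

int probe_tertiary(uint32_t magic, uint32_t version)
{
	struct sb sb = { .magic = magic, .version = version };	/* on stack */

	return check_tertiary_sb(&sb);	/* nothing stored afterwards */
}
```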

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 98 +++++++++++++++++++++++-------------------
 1 file changed, 53 insertions(+), 45 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 3da6702bb1ae..b70a988fa771 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -174,7 +174,7 @@ struct dmz_metadata {
 	/* Zone information array */
 	struct xarray		zones;
 
-	struct dmz_sb		sb[3];
+	struct dmz_sb		sb[2];
 	unsigned int		mblk_primary;
 	unsigned int		sb_version;
 	u64			sb_gen;
@@ -995,10 +995,11 @@ int dmz_flush_metadata(struct dmz_metadata *zmd)
 /*
  * Check super block.
  */
-static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
+static int dmz_check_sb(struct dmz_metadata *zmd, struct dmz_sb *dsb,
+			bool tertiary)
 {
-	struct dmz_super *sb = zmd->sb[set].sb;
-	struct dmz_dev *dev = zmd->sb[set].dev;
+	struct dmz_super *sb = dsb->sb;
+	struct dmz_dev *dev = dsb->dev;
 	unsigned int nr_meta_zones, nr_data_zones;
 	u32 crc, stored_crc;
 	u64 gen;
@@ -1015,7 +1016,7 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
 			    DMZ_META_VER, zmd->sb_version);
 		return -EINVAL;
 	}
-	if ((zmd->sb_version < 1) && (set == 2)) {
+	if ((zmd->sb_version < 1) && tertiary) {
 		dmz_dev_err(dev, "Tertiary superblocks are not supported");
 		return -EINVAL;
 	}
@@ -1059,7 +1060,7 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
 			return -ENXIO;
 		}
 
-		if (set == 2) {
+		if (tertiary) {
 			/*
 			 * Generation number should be 0, but it doesn't
 			 * really matter if it isn't.
@@ -1108,13 +1109,13 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
 /*
  * Read the first or second super block from disk.
  */
-static int dmz_read_sb(struct dmz_metadata *zmd, unsigned int set)
+static int dmz_read_sb(struct dmz_metadata *zmd, struct dmz_sb *sb, int set)
 {
 	DMDEBUG("(%s): read superblock set %d dev %s block %llu",
 		zmd->devname, set, zmd->sb[set].dev->name,
 		zmd->sb[set].block);
-	return dmz_rdwr_block(zmd->sb[set].dev, REQ_OP_READ,
-			      zmd->sb[set].block, zmd->sb[set].mblk->page);
+	return dmz_rdwr_block(sb->dev, REQ_OP_READ,
+			      sb->block, sb->mblk->page);
 }
 
 /*
@@ -1142,7 +1143,7 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
 	zmd->sb[1].zone = xa_load(&zmd->zones, zone_id + 1);
 	zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
 	for (i = 1; i < zmd->nr_rnd_zones; i++) {
-		if (dmz_read_sb(zmd, 1) != 0)
+		if (dmz_read_sb(zmd, &zmd->sb[1], 1) != 0)
 			break;
 		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC)
 			return 0;
@@ -1160,9 +1161,9 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
 }
 
 /*
- * Read the first or second super block from disk.
+ * Read a super block from disk.
  */
-static int dmz_get_sb(struct dmz_metadata *zmd, unsigned int set)
+static int dmz_get_sb(struct dmz_metadata *zmd, struct dmz_sb *sb, int set)
 {
 	struct dmz_mblock *mblk;
 	int ret;
@@ -1172,14 +1173,14 @@ static int dmz_get_sb(struct dmz_metadata *zmd, unsigned int set)
 	if (!mblk)
 		return -ENOMEM;
 
-	zmd->sb[set].mblk = mblk;
-	zmd->sb[set].sb = mblk->data;
+	sb->mblk = mblk;
+	sb->sb = mblk->data;
 
 	/* Read super block */
-	ret = dmz_read_sb(zmd, set);
+	ret = dmz_read_sb(zmd, sb, set);
 	if (ret) {
 		dmz_free_mblock(zmd, mblk);
-		zmd->sb[set].mblk = NULL;
+		sb->mblk = NULL;
 		return ret;
 	}
 
@@ -1253,13 +1254,13 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 	/* Read and check the primary super block */
 	zmd->sb[0].block = dmz_start_block(zmd, zmd->sb[0].zone);
 	zmd->sb[0].dev = dmz_zone_to_dev(zmd, zmd->sb[0].zone);
-	ret = dmz_get_sb(zmd, 0);
+	ret = dmz_get_sb(zmd, &zmd->sb[0], 0);
 	if (ret) {
 		dmz_dev_err(zmd->sb[0].dev, "Read primary super block failed");
 		return ret;
 	}
 
-	ret = dmz_check_sb(zmd, 0);
+	ret = dmz_check_sb(zmd, &zmd->sb[0], false);
 
 	/* Read and check secondary super block */
 	if (ret == 0) {
@@ -1272,7 +1273,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 		}
 		zmd->sb[1].block = dmz_start_block(zmd, zmd->sb[1].zone);
 		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
-		ret = dmz_get_sb(zmd, 1);
+		ret = dmz_get_sb(zmd, &zmd->sb[1], 1);
 	} else
 		ret = dmz_lookup_secondary_sb(zmd);
 
@@ -1281,7 +1282,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 		return ret;
 	}
 
-	ret = dmz_check_sb(zmd, 1);
+	ret = dmz_check_sb(zmd, &zmd->sb[1], false);
 	if (ret == 0)
 		sb_good[1] = true;
 
@@ -1326,18 +1327,32 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 		      "Using super block %u (gen %llu)",
 		      zmd->mblk_primary, zmd->sb_gen);
 
-	if ((zmd->sb_version > 1) && zmd->sb[2].zone) {
-		zmd->sb[2].block = dmz_start_block(zmd, zmd->sb[2].zone);
-		zmd->sb[2].dev = dmz_zone_to_dev(zmd, zmd->sb[2].zone);
-		ret = dmz_get_sb(zmd, 2);
-		if (ret) {
-			dmz_dev_err(zmd->sb[2].dev,
-				    "Read tertiary super block failed");
-			return ret;
+	if (zmd->sb_version > 1) {
+		int i;
+
+		for (i = 1; i < zmd->nr_devs; i++) {
+			struct dmz_sb sb;
+
+			sb.block = 0;
+			sb.zone = dmz_get(zmd, zmd->dev[i].zone_offset);
+			sb.dev = &zmd->dev[i];
+			if (!dmz_is_meta(sb.zone)) {
+				dmz_dev_err(sb.dev,
+					    "Tertiary super block zone %u not marked as metadata zone",
+					    sb.zone->id);
+				return -EINVAL;
+			}
+			ret = dmz_get_sb(zmd, &sb, i + 1);
+			if (ret) {
+				dmz_dev_err(sb.dev,
+					    "Read tertiary super block failed");
+				return ret;
+			}
+			ret = dmz_check_sb(zmd, &sb, true);
+			dmz_free_mblock(zmd, sb.mblk);
+			if (ret == -EINVAL)
+				return ret;
 		}
-		ret = dmz_check_sb(zmd, 2);
-		if (ret == -EINVAL)
-			return ret;
 	}
 	return 0;
 }
@@ -1402,12 +1417,15 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
 				zmd->sb[0].zone = zone;
 			}
 		}
-		if (zmd->nr_devs > 1 && !zmd->sb[2].zone) {
-			/* Tertiary superblock zone */
-			zmd->sb[2].zone = zone;
+		if (zmd->nr_devs > 1 && num == 0) {
+			/*
+			 * Tertiary superblock zones are always at the
+			 * start of the zoned devices, so mark them
+			 * as metadata zone.
+			 */
+			set_bit(DMZ_META, &zone->flags);
 		}
 	}
-
 	return 0;
 }
 
@@ -2850,16 +2868,6 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
 		}
 		set_bit(DMZ_META, &zone->flags);
 	}
-	if (zmd->sb[2].zone) {
-		zone = dmz_get(zmd, zmd->sb[2].zone->id);
-		if (!zone) {
-			dmz_zmd_err(zmd,
-				    "Tertiary metadata zone not present");
-			ret = -ENXIO;
-			goto err;
-		}
-		set_bit(DMZ_META, &zone->flags);
-	}
 	/* Load mapping table */
 	ret = dmz_load_mapping(zmd);
 	if (ret)
-- 
2.16.4


* [PATCH 04/12] dm-zoned: secondary superblock must reside on the same device as the primary superblock
  2020-05-22 15:38 [PATCH RFC 00/12] dm-zoned: multi-device support Hannes Reinecke
                   ` (2 preceding siblings ...)
  2020-05-22 15:38 ` [PATCH 03/12] dm-zoned: use on-stack superblock for tertiary devices Hannes Reinecke
@ 2020-05-22 15:38 ` Hannes Reinecke
  2020-05-25  2:10   ` Damien Le Moal
  2020-05-22 15:38 ` [PATCH 05/12] dm-zoned: add device pointer to struct dm_zone Hannes Reinecke
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-22 15:38 UTC (permalink / raw)
  To: Damien LeMoal; +Cc: dm-devel, Mike Snitzer

The secondary superblock must reside on the same device as the
primary superblock, so there's no need to re-calculate the device.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index b70a988fa771..fdae4e0228e7 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1141,7 +1141,7 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
 	/* Bad first super block: search for the second one */
 	zmd->sb[1].block = zmd->sb[0].block + zone_nr_blocks;
 	zmd->sb[1].zone = xa_load(&zmd->zones, zone_id + 1);
-	zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
+	zmd->sb[1].dev = zmd->sb[0].dev;
 	for (i = 1; i < zmd->nr_rnd_zones; i++) {
 		if (dmz_read_sb(zmd, &zmd->sb[1], 1) != 0)
 			break;
@@ -1149,7 +1149,6 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
 			return 0;
 		zmd->sb[1].block += zone_nr_blocks;
 		zmd->sb[1].zone = dmz_get(zmd, zone_id + i);
-		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
 	}
 
 	dmz_free_mblock(zmd, mblk);
@@ -1272,7 +1271,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 			zmd->sb[1].zone = dmz_get(zmd, zone_id);
 		}
 		zmd->sb[1].block = dmz_start_block(zmd, zmd->sb[1].zone);
-		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
+		zmd->sb[1].dev = zmd->sb[0].dev;
 		ret = dmz_get_sb(zmd, &zmd->sb[1], 1);
 	} else
 		ret = dmz_lookup_secondary_sb(zmd);
-- 
2.16.4


* [PATCH 05/12] dm-zoned: add device pointer to struct dm_zone
  2020-05-22 15:38 [PATCH RFC 00/12] dm-zoned: multi-device support Hannes Reinecke
                   ` (3 preceding siblings ...)
  2020-05-22 15:38 ` [PATCH 04/12] dm-zoned: secondary superblock must reside on the same device as the primary superblock Hannes Reinecke
@ 2020-05-22 15:38 ` Hannes Reinecke
  2020-05-25  2:15   ` Damien Le Moal
  2020-05-22 15:38 ` [PATCH 06/12] dm-zoned: add metadata pointer to struct dmz_dev Hannes Reinecke
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-22 15:38 UTC (permalink / raw)
  To: Damien LeMoal; +Cc: dm-devel, Mike Snitzer

Add a pointer to the containing device to struct dm_zone and
kill dmz_zone_to_dev().
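The resulting pattern can be shown with trimmed-down illustrative structs: instead of recomputing the owning device from the zone id and per-device offsets, each zone carries a pointer set once at init time.

```c
/* Sketch of the back-pointer change: the device is cached in the zone,
 * so helpers no longer need the metadata to find it. */
#include <assert.h>

struct dmz_dev { unsigned int zone_offset; };

struct dm_zone {
	unsigned int id;
	struct dmz_dev *dev;	/* set once when the zone is created */
};

/* Relative zone id within its device, as in dmz_dev_zone_id(). */
unsigned int dev_zone_id(const struct dm_zone *zone)
{
	return zone->id - zone->dev->zone_offset;
}
```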

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 47 ++++++++++++------------------------------
 drivers/md/dm-zoned-reclaim.c  | 18 +++++++---------
 drivers/md/dm-zoned-target.c   |  7 +++----
 drivers/md/dm-zoned.h          |  4 +++-
 4 files changed, 26 insertions(+), 50 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index fdae4e0228e7..7b6e7404f1e8 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -229,16 +229,10 @@ struct dmz_metadata {
  */
 static unsigned int dmz_dev_zone_id(struct dmz_metadata *zmd, struct dm_zone *zone)
 {
-	unsigned int zone_id;
-
 	if (WARN_ON(!zone))
 		return 0;
 
-	zone_id = zone->id;
-	if (zmd->nr_devs > 1 &&
-	    (zone_id >= zmd->dev[1].zone_offset))
-		zone_id -= zmd->dev[1].zone_offset;
-	return zone_id;
+	return zone->id - zone->dev->zone_offset;
 }
 
 sector_t dmz_start_sect(struct dmz_metadata *zmd, struct dm_zone *zone)
@@ -255,18 +249,6 @@ sector_t dmz_start_block(struct dmz_metadata *zmd, struct dm_zone *zone)
 	return (sector_t)zone_id << zmd->zone_nr_blocks_shift;
 }
 
-struct dmz_dev *dmz_zone_to_dev(struct dmz_metadata *zmd, struct dm_zone *zone)
-{
-	if (WARN_ON(!zone))
-		return &zmd->dev[0];
-
-	if (zmd->nr_devs > 1 &&
-	    zone->id >= zmd->dev[1].zone_offset)
-		return &zmd->dev[1];
-
-	return &zmd->dev[0];
-}
-
 unsigned int dmz_zone_nr_blocks(struct dmz_metadata *zmd)
 {
 	return zmd->zone_nr_blocks;
@@ -1252,7 +1234,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 
 	/* Read and check the primary super block */
 	zmd->sb[0].block = dmz_start_block(zmd, zmd->sb[0].zone);
-	zmd->sb[0].dev = dmz_zone_to_dev(zmd, zmd->sb[0].zone);
+	zmd->sb[0].dev = zmd->sb[0].zone->dev;
 	ret = dmz_get_sb(zmd, &zmd->sb[0], 0);
 	if (ret) {
 		dmz_dev_err(zmd->sb[0].dev, "Read primary super block failed");
@@ -1383,6 +1365,7 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
 
 	INIT_LIST_HEAD(&zone->link);
 	atomic_set(&zone->refcount, 0);
+	zone->dev = dev;
 	zone->id = idx;
 	zone->chunk = DMZ_MAP_UNMAPPED;
 
@@ -1442,6 +1425,7 @@ static int dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
 			return -EBUSY;
 		INIT_LIST_HEAD(&zone->link);
 		atomic_set(&zone->refcount, 0);
+		zone->dev = dev;
 		zone->id = idx;
 		zone->chunk = DMZ_MAP_UNMAPPED;
 		set_bit(DMZ_CACHE, &zone->flags);
@@ -1567,11 +1551,10 @@ static int dmz_update_zone_cb(struct blk_zone *blkz, unsigned int idx,
  */
 static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 {
-	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
 	unsigned int noio_flag;
 	int ret;
 
-	if (dev->flags & DMZ_BDEV_REGULAR)
+	if (zone->dev->flags & DMZ_BDEV_REGULAR)
 		return 0;
 
 	/*
@@ -1581,16 +1564,16 @@ static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 	 * GFP_NOIO was specified.
 	 */
 	noio_flag = memalloc_noio_save();
-	ret = blkdev_report_zones(dev->bdev, dmz_start_sect(zmd, zone), 1,
+	ret = blkdev_report_zones(zone->dev->bdev, dmz_start_sect(zmd, zone), 1,
 				  dmz_update_zone_cb, zone);
 	memalloc_noio_restore(noio_flag);
 
 	if (ret == 0)
 		ret = -EIO;
 	if (ret < 0) {
-		dmz_dev_err(dev, "Get zone %u report failed",
+		dmz_dev_err(zone->dev, "Get zone %u report failed",
 			    zone->id);
-		dmz_check_bdev(dev);
+		dmz_check_bdev(zone->dev);
 		return ret;
 	}
 
@@ -1604,7 +1587,6 @@ static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 static int dmz_handle_seq_write_err(struct dmz_metadata *zmd,
 				    struct dm_zone *zone)
 {
-	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
 	unsigned int wp = 0;
 	int ret;
 
@@ -1613,7 +1595,8 @@ static int dmz_handle_seq_write_err(struct dmz_metadata *zmd,
 	if (ret)
 		return ret;
 
-	dmz_dev_warn(dev, "Processing zone %u write error (zone wp %u/%u)",
+	dmz_dev_warn(zone->dev,
+		     "Processing zone %u write error (zone wp %u/%u)",
 		     zone->id, zone->wp_block, wp);
 
 	if (zone->wp_block < wp) {
@@ -1641,13 +1624,11 @@ static int dmz_reset_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 		return 0;
 
 	if (!dmz_is_empty(zone) || dmz_seq_write_err(zone)) {
-		struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
-
-		ret = blkdev_zone_mgmt(dev->bdev, REQ_OP_ZONE_RESET,
+		ret = blkdev_zone_mgmt(zone->dev->bdev, REQ_OP_ZONE_RESET,
 				       dmz_start_sect(zmd, zone),
 				       zmd->zone_nr_sectors, GFP_NOIO);
 		if (ret) {
-			dmz_dev_err(dev, "Reset zone %u failed %d",
+			dmz_dev_err(zone->dev, "Reset zone %u failed %d",
 				    zone->id, ret);
 			return ret;
 		}
@@ -2201,9 +2182,7 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
 		goto again;
 	}
 	if (dmz_is_meta(zone)) {
-		struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
-
-		dmz_dev_warn(dev, "Zone %u has metadata", zone->id);
+		dmz_zmd_warn(zmd, "Zone %u has metadata", zone->id);
 		zone = NULL;
 		goto again;
 	}
diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index 571bc1d41bab..d1a72b42dea2 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -58,7 +58,6 @@ static int dmz_reclaim_align_wp(struct dmz_reclaim *zrc, struct dm_zone *zone,
 				sector_t block)
 {
 	struct dmz_metadata *zmd = zrc->metadata;
-	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
 	sector_t wp_block = zone->wp_block;
 	unsigned int nr_blocks;
 	int ret;
@@ -74,15 +73,15 @@ static int dmz_reclaim_align_wp(struct dmz_reclaim *zrc, struct dm_zone *zone,
 	 * pointer and the requested position.
 	 */
 	nr_blocks = block - wp_block;
-	ret = blkdev_issue_zeroout(dev->bdev,
+	ret = blkdev_issue_zeroout(zone->dev->bdev,
 				   dmz_start_sect(zmd, zone) + dmz_blk2sect(wp_block),
 				   dmz_blk2sect(nr_blocks), GFP_NOIO, 0);
 	if (ret) {
-		dmz_dev_err(dev,
+		dmz_dev_err(zone->dev,
 			    "Align zone %u wp %llu to %llu (wp+%u) blocks failed %d",
 			    zone->id, (unsigned long long)wp_block,
 			    (unsigned long long)block, nr_blocks, ret);
-		dmz_check_bdev(dev);
+		dmz_check_bdev(zone->dev);
 		return ret;
 	}
 
@@ -116,7 +115,6 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
 			    struct dm_zone *src_zone, struct dm_zone *dst_zone)
 {
 	struct dmz_metadata *zmd = zrc->metadata;
-	struct dmz_dev *src_dev, *dst_dev;
 	struct dm_io_region src, dst;
 	sector_t block = 0, end_block;
 	sector_t nr_blocks;
@@ -130,17 +128,15 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
 	else
 		end_block = dmz_zone_nr_blocks(zmd);
 	src_zone_block = dmz_start_block(zmd, src_zone);
-	src_dev = dmz_zone_to_dev(zmd, src_zone);
 	dst_zone_block = dmz_start_block(zmd, dst_zone);
-	dst_dev = dmz_zone_to_dev(zmd, dst_zone);
 
 	if (dmz_is_seq(dst_zone))
 		set_bit(DM_KCOPYD_WRITE_SEQ, &flags);
 
 	while (block < end_block) {
-		if (src_dev->flags & DMZ_BDEV_DYING)
+		if (src_zone->dev->flags & DMZ_BDEV_DYING)
 			return -EIO;
-		if (dst_dev->flags & DMZ_BDEV_DYING)
+		if (dst_zone->dev->flags & DMZ_BDEV_DYING)
 			return -EIO;
 
 		if (dmz_reclaim_should_terminate(src_zone))
@@ -163,11 +159,11 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
 				return ret;
 		}
 
-		src.bdev = src_dev->bdev;
+		src.bdev = src_zone->dev->bdev;
 		src.sector = dmz_blk2sect(src_zone_block + block);
 		src.count = dmz_blk2sect(nr_blocks);
 
-		dst.bdev = dst_dev->bdev;
+		dst.bdev = dst_zone->dev->bdev;
 		dst.sector = dmz_blk2sect(dst_zone_block + block);
 		dst.count = src.count;
 
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index 2770e293a97b..bca9a611b8dd 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -123,18 +123,17 @@ static int dmz_submit_bio(struct dmz_target *dmz, struct dm_zone *zone,
 {
 	struct dmz_bioctx *bioctx =
 		dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
-	struct dmz_dev *dev = dmz_zone_to_dev(dmz->metadata, zone);
 	struct bio *clone;
 
-	if (dev->flags & DMZ_BDEV_DYING)
+	if (zone->dev->flags & DMZ_BDEV_DYING)
 		return -EIO;
 
 	clone = bio_clone_fast(bio, GFP_NOIO, &dmz->bio_set);
 	if (!clone)
 		return -ENOMEM;
 
-	bio_set_dev(clone, dev->bdev);
-	bioctx->dev = dev;
+	bio_set_dev(clone, zone->dev->bdev);
+	bioctx->dev = zone->dev;
 	clone->bi_iter.bi_sector =
 		dmz_start_sect(dmz->metadata, zone) + dmz_blk2sect(chunk_block);
 	clone->bi_iter.bi_size = dmz_blk2sect(nr_blocks) << SECTOR_SHIFT;
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index 8083607b9535..356b436425e4 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -80,6 +80,9 @@ struct dm_zone {
 	/* For listing the zone depending on its state */
 	struct list_head	link;
 
+	/* Device containing this zone */
+	struct dmz_dev		*dev;
+
 	/* Zone type and state */
 	unsigned long		flags;
 
@@ -188,7 +191,6 @@ const char *dmz_metadata_label(struct dmz_metadata *zmd);
 sector_t dmz_start_sect(struct dmz_metadata *zmd, struct dm_zone *zone);
 sector_t dmz_start_block(struct dmz_metadata *zmd, struct dm_zone *zone);
 unsigned int dmz_nr_chunks(struct dmz_metadata *zmd);
-struct dmz_dev *dmz_zone_to_dev(struct dmz_metadata *zmd, struct dm_zone *zone);
 
 bool dmz_check_dev(struct dmz_metadata *zmd);
 bool dmz_dev_is_dying(struct dmz_metadata *zmd);
-- 
2.16.4


* [PATCH 06/12] dm-zoned: add metadata pointer to struct dmz_dev
  2020-05-22 15:38 [PATCH RFC 00/12] dm-zoned: multi-device support Hannes Reinecke
                   ` (4 preceding siblings ...)
  2020-05-22 15:38 ` [PATCH 05/12] dm-zoned: add device pointer to struct dm_zone Hannes Reinecke
@ 2020-05-22 15:38 ` Hannes Reinecke
  2020-05-25  2:17   ` Damien Le Moal
  2020-05-22 15:38 ` [PATCH 07/12] dm-zoned: add a 'reserved' zone flag Hannes Reinecke
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-22 15:38 UTC (permalink / raw)
  To: Damien LeMoal; +Cc: dm-devel, Mike Snitzer

Add a metadata pointer to struct dmz_dev and pass the device as the
callback data argument to blkdev_report_zones() instead of the
metadata itself.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 14 +++++++++-----
 drivers/md/dm-zoned.h          |  7 ++++---
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 7b6e7404f1e8..73479b4c8bca 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1343,8 +1343,8 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
  */
 static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
 {
-	struct dmz_metadata *zmd = data;
-	struct dmz_dev *dev = zmd->nr_devs > 1 ? &zmd->dev[1] : &zmd->dev[0];
+	struct dmz_dev *dev = data;
+	struct dmz_metadata *zmd = dev->metadata;
 	int idx = num + dev->zone_offset;
 	struct dm_zone *zone = kzalloc(sizeof(struct dm_zone), GFP_KERNEL);
 
@@ -1480,8 +1480,12 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
 
 	/* Allocate zone array */
 	zmd->nr_zones = 0;
-	for (i = 0; i < zmd->nr_devs; i++)
-		zmd->nr_zones += zmd->dev[i].nr_zones;
+	for (i = 0; i < zmd->nr_devs; i++) {
+		struct dmz_dev *dev = &zmd->dev[i];
+
+		dev->metadata = zmd;
+		zmd->nr_zones += dev->nr_zones;
+	}
 
 	if (!zmd->nr_zones) {
 		DMERR("(%s): No zones found", zmd->devname);
@@ -1516,7 +1520,7 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
 	 * first randomly writable zone.
 	 */
 	ret = blkdev_report_zones(zoned_dev->bdev, 0, BLK_ALL_ZONES,
-				  dmz_init_zone, zmd);
+				  dmz_init_zone, zoned_dev);
 	if (ret < 0) {
 		DMDEBUG("(%s): Failed to report zones, error %d",
 			zmd->devname, ret);
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index 356b436425e4..dab701893b67 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -45,11 +45,15 @@
 #define dmz_bio_block(bio)	dmz_sect2blk((bio)->bi_iter.bi_sector)
 #define dmz_bio_blocks(bio)	dmz_sect2blk(bio_sectors(bio))
 
+struct dmz_metadata;
+struct dmz_reclaim;
+
 /*
  * Zoned block device information.
  */
 struct dmz_dev {
 	struct block_device	*bdev;
+	struct dmz_metadata	*metadata;
 
 	char			name[BDEVNAME_SIZE];
 	uuid_t			uuid;
@@ -168,9 +172,6 @@ enum {
 #define dmz_dev_debug(dev, format, args...)	\
 	DMDEBUG("(%s): " format, (dev)->name, ## args)
 
-struct dmz_metadata;
-struct dmz_reclaim;
-
 /*
  * Functions defined in dm-zoned-metadata.c
  */
-- 
2.16.4


* [PATCH 07/12] dm-zoned: add a 'reserved' zone flag
  2020-05-22 15:38 [PATCH RFC 00/12] dm-zoned: multi-device support Hannes Reinecke
                   ` (5 preceding siblings ...)
  2020-05-22 15:38 ` [PATCH 06/12] dm-zoned: add metadata pointer to struct dmz_dev Hannes Reinecke
@ 2020-05-22 15:38 ` Hannes Reinecke
  2020-05-25  2:18   ` Damien Le Moal
  2020-05-22 15:38 ` [PATCH 08/12] dm-zoned: move random and sequential zones into struct dmz_dev Hannes Reinecke
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-22 15:38 UTC (permalink / raw)
  To: Damien LeMoal; +Cc: dm-devel, Mike Snitzer

Instead of comparing the reserved zone counters in dmz_free_zone(),
mark the zone as 'reserved' during allocation. This simplifies
dmz_free_zone().

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 4 ++--
 drivers/md/dm-zoned.h          | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 73479b4c8bca..1b9da698a812 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1783,6 +1783,7 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 			atomic_inc(&zmd->unmap_nr_rnd);
 		} else if (atomic_read(&zmd->nr_reserved_seq_zones) < zmd->nr_reserved_seq) {
 			list_add_tail(&dzone->link, &zmd->reserved_seq_zones_list);
+			set_bit(DMZ_RESERVED, &dzone->flags);
 			atomic_inc(&zmd->nr_reserved_seq_zones);
 			zmd->nr_seq--;
 		} else {
@@ -2210,8 +2211,7 @@ void dmz_free_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 	} else if (dmz_is_rnd(zone)) {
 		list_add_tail(&zone->link, &zmd->unmap_rnd_list);
 		atomic_inc(&zmd->unmap_nr_rnd);
-	} else if (atomic_read(&zmd->nr_reserved_seq_zones) <
-		   zmd->nr_reserved_seq) {
+	} else if (dmz_is_reserved(zone)) {
 		list_add_tail(&zone->link, &zmd->reserved_seq_zones_list);
 		atomic_inc(&zmd->nr_reserved_seq_zones);
 	} else {
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index dab701893b67..983f5b5e9fa0 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -130,6 +130,7 @@ enum {
 	DMZ_META,
 	DMZ_DATA,
 	DMZ_BUF,
+	DMZ_RESERVED,
 
 	/* Zone internal state */
 	DMZ_RECLAIM,
@@ -147,6 +148,7 @@ enum {
 #define dmz_is_offline(z)	test_bit(DMZ_OFFLINE, &(z)->flags)
 #define dmz_is_readonly(z)	test_bit(DMZ_READ_ONLY, &(z)->flags)
 #define dmz_in_reclaim(z)	test_bit(DMZ_RECLAIM, &(z)->flags)
+#define dmz_is_reserved(z)	test_bit(DMZ_RESERVED, &(z)->flags)
 #define dmz_seq_write_err(z)	test_bit(DMZ_SEQ_WRITE_ERR, &(z)->flags)
 #define dmz_reclaim_should_terminate(z) \
 				test_bit(DMZ_RECLAIM_TERMINATE, &(z)->flags)
-- 
2.16.4


* [PATCH 08/12] dm-zoned: move random and sequential zones into struct dmz_dev
  2020-05-22 15:38 [PATCH RFC 00/12] dm-zoned: multi-device support Hannes Reinecke
                   ` (6 preceding siblings ...)
  2020-05-22 15:38 ` [PATCH 07/12] dm-zoned: add a 'reserved' zone flag Hannes Reinecke
@ 2020-05-22 15:38 ` Hannes Reinecke
  2020-05-25  2:27   ` Damien Le Moal
  2020-05-22 15:38 ` [PATCH 09/12] dm-zoned: improve logging messages for reclaim Hannes Reinecke
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-22 15:38 UTC (permalink / raw)
  To: Damien LeMoal; +Cc: dm-devel, Mike Snitzer

Random and sequential zones should be part of their respective
device structure to make arbitration between devices possible.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 143 +++++++++++++++++++++++++----------------
 drivers/md/dm-zoned.h          |  10 +++
 2 files changed, 99 insertions(+), 54 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 1b9da698a812..5f44970a6187 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -192,21 +192,12 @@ struct dmz_metadata {
 	/* Zone allocation management */
 	struct mutex		map_lock;
 	struct dmz_mblock	**map_mblk;
-	unsigned int		nr_rnd;
-	atomic_t		unmap_nr_rnd;
-	struct list_head	unmap_rnd_list;
-	struct list_head	map_rnd_list;
 
 	unsigned int		nr_cache;
 	atomic_t		unmap_nr_cache;
 	struct list_head	unmap_cache_list;
 	struct list_head	map_cache_list;
 
-	unsigned int		nr_seq;
-	atomic_t		unmap_nr_seq;
-	struct list_head	unmap_seq_list;
-	struct list_head	map_seq_list;
-
 	atomic_t		nr_reserved_seq_zones;
 	struct list_head	reserved_seq_zones_list;
 
@@ -281,12 +272,22 @@ unsigned int dmz_nr_chunks(struct dmz_metadata *zmd)
 
 unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd)
 {
-	return zmd->nr_rnd;
+	unsigned int nr_rnd_zones = 0;
+	int i;
+
+	for (i = 0; i < zmd->nr_devs; i++)
+		nr_rnd_zones += zmd->dev[i].nr_rnd;
+	return nr_rnd_zones;
 }
 
 unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd)
 {
-	return atomic_read(&zmd->unmap_nr_rnd);
+	unsigned int nr_unmap_rnd_zones = 0;
+	int i;
+
+	for (i = 0; i < zmd->nr_devs; i++)
+		nr_unmap_rnd_zones += atomic_read(&zmd->dev[i].unmap_nr_rnd);
+	return nr_unmap_rnd_zones;
 }
 
 unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd)
@@ -301,12 +302,22 @@ unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd)
 
 unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd)
 {
-	return zmd->nr_seq;
+	unsigned int nr_seq_zones = 0;
+	int i;
+
+	for (i = 0; i < zmd->nr_devs; i++)
+		nr_seq_zones += zmd->dev[i].nr_seq;
+	return nr_seq_zones;
 }
 
 unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
 {
-	return atomic_read(&zmd->unmap_nr_seq);
+	unsigned int nr_unmap_seq_zones = 0;
+	int i;
+
+	for (i = 0; i < zmd->nr_devs; i++)
+		nr_unmap_seq_zones += atomic_read(&zmd->dev[i].unmap_nr_seq);
+	return nr_unmap_seq_zones;
 }
 
 static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
@@ -1485,6 +1496,14 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
 
 		dev->metadata = zmd;
 		zmd->nr_zones += dev->nr_zones;
+
+		atomic_set(&dev->unmap_nr_rnd, 0);
+		INIT_LIST_HEAD(&dev->unmap_rnd_list);
+		INIT_LIST_HEAD(&dev->map_rnd_list);
+
+		atomic_set(&dev->unmap_nr_seq, 0);
+		INIT_LIST_HEAD(&dev->unmap_seq_list);
+		INIT_LIST_HEAD(&dev->map_seq_list);
 	}
 
 	if (!zmd->nr_zones) {
@@ -1702,9 +1721,9 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 		if (dmz_is_cache(dzone))
 			list_add_tail(&dzone->link, &zmd->map_cache_list);
 		else if (dmz_is_rnd(dzone))
-			list_add_tail(&dzone->link, &zmd->map_rnd_list);
+			list_add_tail(&dzone->link, &dzone->dev->map_rnd_list);
 		else
-			list_add_tail(&dzone->link, &zmd->map_seq_list);
+			list_add_tail(&dzone->link, &dzone->dev->map_seq_list);
 
 		/* Check buffer zone */
 		bzone_id = le32_to_cpu(dmap[e].bzone_id);
@@ -1738,7 +1757,7 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 		if (dmz_is_cache(bzone))
 			list_add_tail(&bzone->link, &zmd->map_cache_list);
 		else
-			list_add_tail(&bzone->link, &zmd->map_rnd_list);
+			list_add_tail(&bzone->link, &bzone->dev->map_rnd_list);
 next:
 		chunk++;
 		e++;
@@ -1763,9 +1782,9 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 		if (dmz_is_cache(dzone))
 			zmd->nr_cache++;
 		else if (dmz_is_rnd(dzone))
-			zmd->nr_rnd++;
+			dzone->dev->nr_rnd++;
 		else
-			zmd->nr_seq++;
+			dzone->dev->nr_seq++;
 
 		if (dmz_is_data(dzone)) {
 			/* Already initialized */
@@ -1779,16 +1798,18 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 			list_add_tail(&dzone->link, &zmd->unmap_cache_list);
 			atomic_inc(&zmd->unmap_nr_cache);
 		} else if (dmz_is_rnd(dzone)) {
-			list_add_tail(&dzone->link, &zmd->unmap_rnd_list);
-			atomic_inc(&zmd->unmap_nr_rnd);
+			list_add_tail(&dzone->link,
+				      &dzone->dev->unmap_rnd_list);
+			atomic_inc(&dzone->dev->unmap_nr_rnd);
 		} else if (atomic_read(&zmd->nr_reserved_seq_zones) < zmd->nr_reserved_seq) {
 			list_add_tail(&dzone->link, &zmd->reserved_seq_zones_list);
 			set_bit(DMZ_RESERVED, &dzone->flags);
 			atomic_inc(&zmd->nr_reserved_seq_zones);
-			zmd->nr_seq--;
+			dzone->dev->nr_seq--;
 		} else {
-			list_add_tail(&dzone->link, &zmd->unmap_seq_list);
-			atomic_inc(&zmd->unmap_nr_seq);
+			list_add_tail(&dzone->link,
+				      &dzone->dev->unmap_seq_list);
+			atomic_inc(&dzone->dev->unmap_nr_seq);
 		}
 	}
 
@@ -1822,13 +1843,13 @@ static void __dmz_lru_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 	list_del_init(&zone->link);
 	if (dmz_is_seq(zone)) {
 		/* LRU rotate sequential zone */
-		list_add_tail(&zone->link, &zmd->map_seq_list);
+		list_add_tail(&zone->link, &zone->dev->map_seq_list);
 	} else if (dmz_is_cache(zone)) {
 		/* LRU rotate cache zone */
 		list_add_tail(&zone->link, &zmd->map_cache_list);
 	} else {
 		/* LRU rotate random zone */
-		list_add_tail(&zone->link, &zmd->map_rnd_list);
+		list_add_tail(&zone->link, &zone->dev->map_rnd_list);
 	}
 }
 
@@ -1910,14 +1931,24 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
 {
 	struct dm_zone *dzone = NULL;
 	struct dm_zone *zone;
-	struct list_head *zone_list = &zmd->map_rnd_list;
+	struct list_head *zone_list;
 
 	/* If we have cache zones select from the cache zone list */
 	if (zmd->nr_cache) {
 		zone_list = &zmd->map_cache_list;
+		/* Try to reclaim random zones, too, when idle */
-		if (idle && list_empty(zone_list))
-			zone_list = &zmd->map_rnd_list;
+		if (idle && list_empty(zone_list)) {
+			int i;
+
+			for (i = 1; i < zmd->nr_devs; i++) {
+				zone_list = &zmd->dev[i].map_rnd_list;
+				if (!list_empty(zone_list))
+					break;
+			}
+		}
+	} else {
+		/* Otherwise the random zones are on the first disk */
+		zone_list = &zmd->dev[0].map_rnd_list;
 	}
 
 	list_for_each_entry(zone, zone_list, link) {
@@ -1938,12 +1969,17 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
 static struct dm_zone *dmz_get_seq_zone_for_reclaim(struct dmz_metadata *zmd)
 {
 	struct dm_zone *zone;
+	int i;
 
-	list_for_each_entry(zone, &zmd->map_seq_list, link) {
-		if (!zone->bzone)
-			continue;
-		if (dmz_lock_zone_reclaim(zone))
-			return zone;
+	for (i = 0; i < zmd->nr_devs; i++) {
+		struct dmz_dev *dev = &zmd->dev[i];
+
+		list_for_each_entry(zone, &dev->map_seq_list, link) {
+			if (!zone->bzone)
+				continue;
+			if (dmz_lock_zone_reclaim(zone))
+				return zone;
+		}
 	}
 
 	return NULL;
@@ -2129,7 +2165,7 @@ struct dm_zone *dmz_get_chunk_buffer(struct dmz_metadata *zmd,
 	if (dmz_is_cache(bzone))
 		list_add_tail(&bzone->link, &zmd->map_cache_list);
 	else
-		list_add_tail(&bzone->link, &zmd->map_rnd_list);
+		list_add_tail(&bzone->link, &bzone->dev->map_rnd_list);
 out:
 	dmz_unlock_map(zmd);
 
@@ -2144,21 +2180,27 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
 {
 	struct list_head *list;
 	struct dm_zone *zone;
+	unsigned int dev_idx = 0;
 
+again:
 	if (flags & DMZ_ALLOC_CACHE)
 		list = &zmd->unmap_cache_list;
 	else if (flags & DMZ_ALLOC_RND)
-		list = &zmd->unmap_rnd_list;
+		list = &zmd->dev[dev_idx].unmap_rnd_list;
 	else
-		list = &zmd->unmap_seq_list;
+		list = &zmd->dev[dev_idx].unmap_seq_list;
 
-again:
 	if (list_empty(list)) {
 		/*
 		 * No free zone: return NULL if this is not for reclaim.
 		 */
 		if (!(flags & DMZ_ALLOC_RECLAIM))
 			return NULL;
+		if (dev_idx < zmd->nr_devs - 1) {
+			dev_idx++;
+			goto again;
+		}
+
 		/*
 		 * Fallback to the reserved sequential zones
 		 */
@@ -2177,9 +2219,9 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
 	if (dmz_is_cache(zone))
 		atomic_dec(&zmd->unmap_nr_cache);
 	else if (dmz_is_rnd(zone))
-		atomic_dec(&zmd->unmap_nr_rnd);
+		atomic_dec(&zone->dev->unmap_nr_rnd);
 	else
-		atomic_dec(&zmd->unmap_nr_seq);
+		atomic_dec(&zone->dev->unmap_nr_seq);
 
 	if (dmz_is_offline(zone)) {
 		dmz_zmd_warn(zmd, "Zone %u is offline", zone->id);
@@ -2209,14 +2251,14 @@ void dmz_free_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 		list_add_tail(&zone->link, &zmd->unmap_cache_list);
 		atomic_inc(&zmd->unmap_nr_cache);
 	} else if (dmz_is_rnd(zone)) {
-		list_add_tail(&zone->link, &zmd->unmap_rnd_list);
-		atomic_inc(&zmd->unmap_nr_rnd);
+		list_add_tail(&zone->link, &zone->dev->unmap_rnd_list);
+		atomic_inc(&zone->dev->unmap_nr_rnd);
 	} else if (dmz_is_reserved(zone)) {
 		list_add_tail(&zone->link, &zmd->reserved_seq_zones_list);
 		atomic_inc(&zmd->nr_reserved_seq_zones);
 	} else {
-		list_add_tail(&zone->link, &zmd->unmap_seq_list);
-		atomic_inc(&zmd->unmap_nr_seq);
+		list_add_tail(&zone->link, &zone->dev->unmap_seq_list);
+		atomic_inc(&zone->dev->unmap_nr_seq);
 	}
 
 	wake_up_all(&zmd->free_wq);
@@ -2236,9 +2278,9 @@ void dmz_map_zone(struct dmz_metadata *zmd, struct dm_zone *dzone,
 	if (dmz_is_cache(dzone))
 		list_add_tail(&dzone->link, &zmd->map_cache_list);
 	else if (dmz_is_rnd(dzone))
-		list_add_tail(&dzone->link, &zmd->map_rnd_list);
+		list_add_tail(&dzone->link, &dzone->dev->map_rnd_list);
 	else
-		list_add_tail(&dzone->link, &zmd->map_seq_list);
+		list_add_tail(&dzone->link, &dzone->dev->map_seq_list);
 }
 
 /*
@@ -2806,18 +2848,11 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
 	INIT_LIST_HEAD(&zmd->mblk_dirty_list);
 
 	mutex_init(&zmd->map_lock);
-	atomic_set(&zmd->unmap_nr_rnd, 0);
-	INIT_LIST_HEAD(&zmd->unmap_rnd_list);
-	INIT_LIST_HEAD(&zmd->map_rnd_list);
 
 	atomic_set(&zmd->unmap_nr_cache, 0);
 	INIT_LIST_HEAD(&zmd->unmap_cache_list);
 	INIT_LIST_HEAD(&zmd->map_cache_list);
 
-	atomic_set(&zmd->unmap_nr_seq, 0);
-	INIT_LIST_HEAD(&zmd->unmap_seq_list);
-	INIT_LIST_HEAD(&zmd->map_seq_list);
-
 	atomic_set(&zmd->nr_reserved_seq_zones, 0);
 	INIT_LIST_HEAD(&zmd->reserved_seq_zones_list);
 
@@ -2887,9 +2922,9 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
 	dmz_zmd_debug(zmd, "    %u cache zones (%u unmapped)",
 		      zmd->nr_cache, atomic_read(&zmd->unmap_nr_cache));
 	dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
-		      zmd->nr_rnd, atomic_read(&zmd->unmap_nr_rnd));
+		      dmz_nr_rnd_zones(zmd), dmz_nr_unmap_rnd_zones(zmd));
 	dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
-		      zmd->nr_seq, atomic_read(&zmd->unmap_nr_seq));
+		      dmz_nr_seq_zones(zmd), dmz_nr_unmap_seq_zones(zmd));
 	dmz_zmd_debug(zmd, "  %u reserved sequential data zones",
 		      zmd->nr_reserved_seq);
 	dmz_zmd_debug(zmd, "Format:");
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index 983f5b5e9fa0..56e138586d9b 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -66,6 +66,16 @@ struct dmz_dev {
 	unsigned int		flags;
 
 	sector_t		zone_nr_sectors;
+
+	unsigned int		nr_rnd;
+	atomic_t		unmap_nr_rnd;
+	struct list_head	unmap_rnd_list;
+	struct list_head	map_rnd_list;
+
+	unsigned int		nr_seq;
+	atomic_t		unmap_nr_seq;
+	struct list_head	unmap_seq_list;
+	struct list_head	map_seq_list;
 };
 
 #define dmz_bio_chunk(zmd, bio)	((bio)->bi_iter.bi_sector >> \
-- 
2.16.4


* [PATCH 09/12] dm-zoned: improve logging messages for reclaim
  2020-05-22 15:38 [PATCH RFC 00/12] dm-zoned: multi-device support Hannes Reinecke
                   ` (7 preceding siblings ...)
  2020-05-22 15:38 ` [PATCH 08/12] dm-zoned: move random and sequential zones into struct dmz_dev Hannes Reinecke
@ 2020-05-22 15:38 ` Hannes Reinecke
  2020-05-25  2:28   ` Damien Le Moal
  2020-05-22 15:38 ` [PATCH 10/12] dm-zoned: support arbitrary number of devices Hannes Reinecke
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-22 15:38 UTC (permalink / raw)
  To: Damien LeMoal; +Cc: dm-devel, Mike Snitzer

Instead of just reporting the errno, add more verbose debugging
messages in the reclaim path.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-reclaim.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index d1a72b42dea2..fba0d48e38a7 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -367,8 +367,11 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
 
 	/* Get a data zone */
 	dzone = dmz_get_zone_for_reclaim(zmd, dmz_target_idle(zrc));
-	if (!dzone)
+	if (!dzone) {
+		DMDEBUG("(%s): No zone found to reclaim",
+			dmz_metadata_label(zmd));
 		return -EBUSY;
+	}
 
 	start = jiffies;
 	if (dmz_is_cache(dzone) || dmz_is_rnd(dzone)) {
@@ -412,6 +415,12 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
 	}
 out:
 	if (ret) {
+		if (ret == -EINTR)
+			DMDEBUG("(%s): reclaim zone %u interrupted",
+				dmz_metadata_label(zmd), rzone->id);
+		else
+			DMDEBUG("(%s): Failed to reclaim zone %u, err %d",
+				dmz_metadata_label(zmd), rzone->id, ret);
 		dmz_unlock_zone_reclaim(dzone);
 		return ret;
 	}
@@ -515,8 +524,6 @@ static void dmz_reclaim_work(struct work_struct *work)
 
 	ret = dmz_do_reclaim(zrc);
 	if (ret && ret != -EINTR) {
-		DMDEBUG("(%s): Reclaim error %d",
-			dmz_metadata_label(zmd), ret);
 		if (!dmz_check_dev(zmd))
 			return;
 	}
-- 
2.16.4


* [PATCH 10/12] dm-zoned: support arbitrary number of devices
  2020-05-22 15:38 [PATCH RFC 00/12] dm-zoned: multi-device support Hannes Reinecke
                   ` (8 preceding siblings ...)
  2020-05-22 15:38 ` [PATCH 09/12] dm-zoned: improve logging messages for reclaim Hannes Reinecke
@ 2020-05-22 15:38 ` Hannes Reinecke
  2020-05-25  2:36   ` Damien Le Moal
  2020-05-22 15:39 ` [PATCH 11/12] dm-zoned: round-robin load balancer for reclaiming zones Hannes Reinecke
  2020-05-22 15:39 ` [PATCH 12/12] dm-zoned: per-device reclaim Hannes Reinecke
  11 siblings, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-22 15:38 UTC (permalink / raw)
  To: Damien LeMoal; +Cc: dm-devel, Mike Snitzer

Remove the hard-coded limit of two devices and support an arbitrary
number of additional zoned devices.
As this modifies the interface, increase the device-mapper target
version number to 3.0.0.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c |  68 +++++++++++-----------
 drivers/md/dm-zoned-reclaim.c  |  28 ++++++---
 drivers/md/dm-zoned-target.c   | 129 +++++++++++++++++++++++++----------------
 drivers/md/dm-zoned.h          |   9 +--
 4 files changed, 139 insertions(+), 95 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 5f44970a6187..87784e7785bc 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -260,6 +260,11 @@ unsigned int dmz_zone_nr_sectors_shift(struct dmz_metadata *zmd)
 	return zmd->zone_nr_sectors_shift;
 }
 
+unsigned int dmz_nr_devs(struct dmz_metadata *zmd)
+{
+	return zmd->nr_devs;
+}
+
 unsigned int dmz_nr_zones(struct dmz_metadata *zmd)
 {
 	return zmd->nr_zones;
@@ -270,24 +275,14 @@ unsigned int dmz_nr_chunks(struct dmz_metadata *zmd)
 	return zmd->nr_chunks;
 }
 
-unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd)
+unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd, int idx)
 {
-	unsigned int nr_rnd_zones = 0;
-	int i;
-
-	for (i = 0; i < zmd->nr_devs; i++)
-		nr_rnd_zones += zmd->dev[i].nr_rnd;
-	return nr_rnd_zones;
+	return zmd->dev[idx].nr_rnd;
 }
 
-unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd)
+unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd, int idx)
 {
-	unsigned int nr_unmap_rnd_zones = 0;
-	int i;
-
-	for (i = 0; i < zmd->nr_devs; i++)
-		nr_unmap_rnd_zones += atomic_read(&zmd->dev[i].unmap_nr_rnd);
-	return nr_unmap_rnd_zones;
+	return atomic_read(&zmd->dev[idx].unmap_nr_rnd);
 }
 
 unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd)
@@ -300,24 +295,14 @@ unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd)
 	return atomic_read(&zmd->unmap_nr_cache);
 }
 
-unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd)
+unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd, int idx)
 {
-	unsigned int nr_seq_zones = 0;
-	int i;
-
-	for (i = 0; i < zmd->nr_devs; i++)
-		nr_seq_zones += zmd->dev[i].nr_seq;
-	return nr_seq_zones;
+	return zmd->dev[idx].nr_seq;
 }
 
-unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
+unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd, int idx)
 {
-	unsigned int nr_unmap_seq_zones = 0;
-	int i;
-
-	for (i = 0; i < zmd->nr_devs; i++)
-		nr_unmap_seq_zones += atomic_read(&zmd->dev[i].unmap_nr_seq);
-	return nr_unmap_seq_zones;
+	return atomic_read(&zmd->dev[idx].unmap_nr_seq);
 }
 
 static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
@@ -1530,7 +1515,20 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
 		 */
 		zmd->sb[0].zone = dmz_get(zmd, 0);
 
-		zoned_dev = &zmd->dev[1];
+		for (i = 1; i < zmd->nr_devs; i++) {
+			zoned_dev = &zmd->dev[i];
+
+			ret = blkdev_report_zones(zoned_dev->bdev, 0,
+						  BLK_ALL_ZONES,
+						  dmz_init_zone, zoned_dev);
+			if (ret < 0) {
+				DMDEBUG("(%s): Failed to report zones, error %d",
+					zmd->devname, ret);
+				dmz_drop_zones(zmd);
+				return ret;
+			}
+		}
+		return 0;
 	}
 
 	/*
@@ -2921,10 +2919,14 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
 		      zmd->nr_data_zones, zmd->nr_chunks);
 	dmz_zmd_debug(zmd, "    %u cache zones (%u unmapped)",
 		      zmd->nr_cache, atomic_read(&zmd->unmap_nr_cache));
-	dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
-		      dmz_nr_rnd_zones(zmd), dmz_nr_unmap_rnd_zones(zmd));
-	dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
-		      dmz_nr_seq_zones(zmd), dmz_nr_unmap_seq_zones(zmd));
+	for (i = 0; i < zmd->nr_devs; i++) {
+		dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
+			      dmz_nr_rnd_zones(zmd, i),
+			      dmz_nr_unmap_rnd_zones(zmd, i));
+		dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
+			      dmz_nr_seq_zones(zmd, i),
+			      dmz_nr_unmap_seq_zones(zmd, i));
+	}
 	dmz_zmd_debug(zmd, "  %u reserved sequential data zones",
 		      zmd->nr_reserved_seq);
 	dmz_zmd_debug(zmd, "Format:");
diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index fba0d48e38a7..f2e053b5f2db 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -442,15 +442,18 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
 {
 	struct dmz_metadata *zmd = zrc->metadata;
 	unsigned int nr_cache = dmz_nr_cache_zones(zmd);
-	unsigned int nr_rnd = dmz_nr_rnd_zones(zmd);
-	unsigned int nr_unmap, nr_zones;
+	unsigned int nr_unmap = 0, nr_zones = 0;
 
 	if (nr_cache) {
 		nr_zones = nr_cache;
 		nr_unmap = dmz_nr_unmap_cache_zones(zmd);
 	} else {
-		nr_zones = nr_rnd;
-		nr_unmap = dmz_nr_unmap_rnd_zones(zmd);
+		int i;
+
+		for (i = 0; i < dmz_nr_devs(zmd); i++) {
+			nr_zones += dmz_nr_rnd_zones(zmd, i);
+			nr_unmap += dmz_nr_unmap_rnd_zones(zmd, i);
+		}
 	}
 	return nr_unmap * 100 / nr_zones;
 }
@@ -460,7 +463,11 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
  */
 static bool dmz_should_reclaim(struct dmz_reclaim *zrc, unsigned int p_unmap)
 {
-	unsigned int nr_reclaim = dmz_nr_rnd_zones(zrc->metadata);
+	int i;
+	unsigned int nr_reclaim = 0;
+
+	for (i = 0; i < dmz_nr_devs(zrc->metadata); i++)
+		nr_reclaim += dmz_nr_rnd_zones(zrc->metadata, i);
 
 	if (dmz_nr_cache_zones(zrc->metadata))
 		nr_reclaim += dmz_nr_cache_zones(zrc->metadata);
@@ -487,8 +494,8 @@ static void dmz_reclaim_work(struct work_struct *work)
 {
 	struct dmz_reclaim *zrc = container_of(work, struct dmz_reclaim, work.work);
 	struct dmz_metadata *zmd = zrc->metadata;
-	unsigned int p_unmap;
-	int ret;
+	unsigned int p_unmap, nr_unmap_rnd = 0, nr_rnd = 0;
+	int ret, i;
 
 	if (dmz_dev_is_dying(zmd))
 		return;
@@ -513,14 +520,17 @@ static void dmz_reclaim_work(struct work_struct *work)
 		zrc->kc_throttle.throttle = min(75U, 100U - p_unmap / 2);
 	}
 
+	for (i = 0; i < dmz_nr_devs(zmd); i++) {
+		nr_unmap_rnd += dmz_nr_unmap_rnd_zones(zmd, i);
+		nr_rnd += dmz_nr_rnd_zones(zmd, i);
+	}
 	DMDEBUG("(%s): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
 		dmz_metadata_label(zmd),
 		zrc->kc_throttle.throttle,
 		(dmz_target_idle(zrc) ? "Idle" : "Busy"),
 		p_unmap, dmz_nr_unmap_cache_zones(zmd),
 		dmz_nr_cache_zones(zmd),
-		dmz_nr_unmap_rnd_zones(zmd),
-		dmz_nr_rnd_zones(zmd));
+		nr_unmap_rnd, nr_rnd);
 
 	ret = dmz_do_reclaim(zrc);
 	if (ret && ret != -EINTR) {
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index bca9a611b8dd..f34fcc3f7cc6 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -13,8 +13,6 @@
 
 #define DMZ_MIN_BIOS		8192
 
-#define DMZ_MAX_DEVS		2
-
 /*
  * Zone BIO context.
  */
@@ -40,9 +38,10 @@ struct dm_chunk_work {
  * Target descriptor.
  */
 struct dmz_target {
-	struct dm_dev		*ddev[DMZ_MAX_DEVS];
+	struct dm_dev		**ddev;
+	unsigned int		nr_ddevs;
 
-	unsigned long		flags;
+	unsigned int		flags;
 
 	/* Zoned block device information */
 	struct dmz_dev		*dev;
@@ -764,7 +763,7 @@ static void dmz_put_zoned_device(struct dm_target *ti)
 	struct dmz_target *dmz = ti->private;
 	int i;
 
-	for (i = 0; i < DMZ_MAX_DEVS; i++) {
+	for (i = 0; i < dmz->nr_ddevs; i++) {
 		if (dmz->ddev[i]) {
 			dm_put_device(ti, dmz->ddev[i]);
 			dmz->ddev[i] = NULL;
@@ -777,21 +776,35 @@ static int dmz_fixup_devices(struct dm_target *ti)
 	struct dmz_target *dmz = ti->private;
 	struct dmz_dev *reg_dev, *zoned_dev;
 	struct request_queue *q;
+	sector_t zone_nr_sectors = 0;
+	int i;
 
 	/*
-	 * When we have two devices, the first one must be a regular block
-	 * device and the second a zoned block device.
+	 * When we have more than one device, the first one must be a
+	 * regular block device and the others zoned block devices.
 	 */
-	if (dmz->ddev[0] && dmz->ddev[1]) {
+	if (dmz->nr_ddevs > 1) {
 		reg_dev = &dmz->dev[0];
 		if (!(reg_dev->flags & DMZ_BDEV_REGULAR)) {
 			ti->error = "Primary disk is not a regular device";
 			return -EINVAL;
 		}
-		zoned_dev = &dmz->dev[1];
-		if (zoned_dev->flags & DMZ_BDEV_REGULAR) {
-			ti->error = "Secondary disk is not a zoned device";
-			return -EINVAL;
+		for (i = 1; i < dmz->nr_ddevs; i++) {
+			zoned_dev = &dmz->dev[i];
+			if (zoned_dev->flags & DMZ_BDEV_REGULAR) {
+				ti->error = "Secondary disk is not a zoned device";
+				return -EINVAL;
+			}
+			q = bdev_get_queue(zoned_dev->bdev);
+			if (zone_nr_sectors &&
+			    zone_nr_sectors != blk_queue_zone_sectors(q)) {
+				ti->error = "Zone nr sectors mismatch";
+				return -EINVAL;
+			}
+			zone_nr_sectors = blk_queue_zone_sectors(q);
+			zoned_dev->zone_nr_sectors = zone_nr_sectors;
+			zoned_dev->nr_zones =
+				blkdev_nr_zones(zoned_dev->bdev->bd_disk);
 		}
 	} else {
 		reg_dev = NULL;
@@ -800,17 +813,24 @@ static int dmz_fixup_devices(struct dm_target *ti)
 			ti->error = "Disk is not a zoned device";
 			return -EINVAL;
 		}
+		q = bdev_get_queue(zoned_dev->bdev);
+		zoned_dev->zone_nr_sectors = blk_queue_zone_sectors(q);
+		zoned_dev->nr_zones = blkdev_nr_zones(zoned_dev->bdev->bd_disk);
 	}
-	q = bdev_get_queue(zoned_dev->bdev);
-	zoned_dev->zone_nr_sectors = blk_queue_zone_sectors(q);
-	zoned_dev->nr_zones = blkdev_nr_zones(zoned_dev->bdev->bd_disk);
 
 	if (reg_dev) {
-		reg_dev->zone_nr_sectors = zoned_dev->zone_nr_sectors;
+		sector_t zone_offset;
+
+		reg_dev->zone_nr_sectors = zone_nr_sectors;
 		reg_dev->nr_zones =
 			DIV_ROUND_UP_SECTOR_T(reg_dev->capacity,
 					      reg_dev->zone_nr_sectors);
-		zoned_dev->zone_offset = reg_dev->nr_zones;
+		reg_dev->zone_offset = 0;
+		zone_offset = reg_dev->nr_zones;
+		for (i = 1; i < dmz->nr_ddevs; i++) {
+			dmz->dev[i].zone_offset = zone_offset;
+			zone_offset += dmz->dev[i].nr_zones;
+		}
 	}
 	return 0;
 }
@@ -821,10 +841,10 @@ static int dmz_fixup_devices(struct dm_target *ti)
 static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 {
 	struct dmz_target *dmz;
-	int ret;
+	int ret, i;
 
 	/* Check arguments */
-	if (argc < 1 || argc > 2) {
+	if (argc < 1) {
 		ti->error = "Invalid argument count";
 		return -EINVAL;
 	}
@@ -835,31 +855,31 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		ti->error = "Unable to allocate the zoned target descriptor";
 		return -ENOMEM;
 	}
-	dmz->dev = kcalloc(2, sizeof(struct dmz_dev), GFP_KERNEL);
+	dmz->dev = kcalloc(argc, sizeof(struct dmz_dev), GFP_KERNEL);
 	if (!dmz->dev) {
 		ti->error = "Unable to allocate the zoned device descriptors";
 		kfree(dmz);
 		return -ENOMEM;
 	}
+	dmz->ddev = kcalloc(argc, sizeof(struct dm_dev *), GFP_KERNEL);
+	if (!dmz->ddev) {
+		ti->error = "Unable to allocate the dm device descriptors";
+		ret = -ENOMEM;
+		goto err;
+	}
+	dmz->nr_ddevs = argc;
+
 	ti->private = dmz;
 
 	/* Get the target zoned block device */
-	ret = dmz_get_zoned_device(ti, argv[0], 0, argc);
-	if (ret)
-		goto err;
-
-	if (argc == 2) {
-		ret = dmz_get_zoned_device(ti, argv[1], 1, argc);
-		if (ret) {
-			dmz_put_zoned_device(ti);
-			goto err;
-		}
+	for (i = 0; i < argc; i++) {
+		ret = dmz_get_zoned_device(ti, argv[i], i, argc);
+		if (ret)
+			goto err_dev;
 	}
 	ret = dmz_fixup_devices(ti);
-	if (ret) {
-		dmz_put_zoned_device(ti);
-		goto err;
-	}
+	if (ret)
+		goto err_dev;
 
 	/* Initialize metadata */
 	ret = dmz_ctr_metadata(dmz->dev, argc, &dmz->metadata,
@@ -1047,13 +1067,13 @@ static int dmz_iterate_devices(struct dm_target *ti,
 	struct dmz_target *dmz = ti->private;
 	unsigned int zone_nr_sectors = dmz_zone_nr_sectors(dmz->metadata);
 	sector_t capacity;
-	int r;
+	int i, r;
 
-	capacity = dmz->dev[0].capacity & ~(zone_nr_sectors - 1);
-	r = fn(ti, dmz->ddev[0], 0, capacity, data);
-	if (!r && dmz->ddev[1]) {
-		capacity = dmz->dev[1].capacity & ~(zone_nr_sectors - 1);
-		r = fn(ti, dmz->ddev[1], 0, capacity, data);
+	for (i = 0; i < dmz->nr_ddevs; i++) {
+		capacity = dmz->dev[i].capacity & ~(zone_nr_sectors - 1);
+		r = fn(ti, dmz->ddev[i], 0, capacity, data);
+		if (r)
+			break;
 	}
 	return r;
 }
@@ -1066,24 +1086,35 @@ static void dmz_status(struct dm_target *ti, status_type_t type,
 	ssize_t sz = 0;
 	char buf[BDEVNAME_SIZE];
 	struct dmz_dev *dev;
+	int i;
 
 	switch (type) {
 	case STATUSTYPE_INFO:
-		DMEMIT("%u zones %u/%u cache %u/%u random %u/%u sequential",
+		DMEMIT("%u zones %u/%u cache",
 		       dmz_nr_zones(dmz->metadata),
 		       dmz_nr_unmap_cache_zones(dmz->metadata),
-		       dmz_nr_cache_zones(dmz->metadata),
-		       dmz_nr_unmap_rnd_zones(dmz->metadata),
-		       dmz_nr_rnd_zones(dmz->metadata),
-		       dmz_nr_unmap_seq_zones(dmz->metadata),
-		       dmz_nr_seq_zones(dmz->metadata));
+		       dmz_nr_cache_zones(dmz->metadata));
+		for (i = 0; i < dmz->nr_ddevs; i++) {
+			/*
+			 * For a multi-device setup the first device
+			 * contains only cache zones.
+			 */
+			if ((i == 0) &&
+			    (dmz_nr_cache_zones(dmz->metadata) > 0))
+				continue;
+			DMEMIT(" %u/%u random %u/%u sequential",
+			       dmz_nr_unmap_rnd_zones(dmz->metadata, i),
+			       dmz_nr_rnd_zones(dmz->metadata, i),
+			       dmz_nr_unmap_seq_zones(dmz->metadata, i),
+			       dmz_nr_seq_zones(dmz->metadata, i));
+		}
 		break;
 	case STATUSTYPE_TABLE:
 		dev = &dmz->dev[0];
 		format_dev_t(buf, dev->bdev->bd_dev);
 		DMEMIT("%s", buf);
-		if (dmz->dev[1].bdev) {
-			dev = &dmz->dev[1];
+		for (i = 1; i < dmz->nr_ddevs; i++) {
+			dev = &dmz->dev[i];
 			format_dev_t(buf, dev->bdev->bd_dev);
 			DMEMIT(" %s", buf);
 		}
@@ -1108,7 +1139,7 @@ static int dmz_message(struct dm_target *ti, unsigned int argc, char **argv,
 
 static struct target_type dmz_type = {
 	.name		 = "zoned",
-	.version	 = {2, 0, 0},
+	.version	 = {3, 0, 0},
 	.features	 = DM_TARGET_SINGLETON | DM_TARGET_ZONED_HM,
 	.module		 = THIS_MODULE,
 	.ctr		 = dmz_ctr,
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index 56e138586d9b..0052eee12299 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -219,13 +219,14 @@ void dmz_free_zone(struct dmz_metadata *zmd, struct dm_zone *zone);
 void dmz_map_zone(struct dmz_metadata *zmd, struct dm_zone *zone,
 		  unsigned int chunk);
 void dmz_unmap_zone(struct dmz_metadata *zmd, struct dm_zone *zone);
+unsigned int dmz_nr_devs(struct dmz_metadata *zmd);
 unsigned int dmz_nr_zones(struct dmz_metadata *zmd);
 unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd);
 unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd);
-unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd);
-unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd);
-unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd);
-unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd);
+unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd, int idx);
+unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd, int idx);
+unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd, int idx);
+unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd, int idx);
 unsigned int dmz_zone_nr_blocks(struct dmz_metadata *zmd);
 unsigned int dmz_zone_nr_blocks_shift(struct dmz_metadata *zmd);
 unsigned int dmz_zone_nr_sectors(struct dmz_metadata *zmd);
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 11/12] dm-zoned: round-robin load balancer for reclaiming zones
  2020-05-22 15:38 [PATCH RFC 00/12] dm-zoned: multi-device support Hannes Reinecke
                   ` (9 preceding siblings ...)
  2020-05-22 15:38 ` [PATCH 10/12] dm-zoned: support arbitrary number of devices Hannes Reinecke
@ 2020-05-22 15:39 ` Hannes Reinecke
  2020-05-25  2:42   ` Damien Le Moal
  2020-05-22 15:39 ` [PATCH 12/12] dm-zoned: per-device reclaim Hannes Reinecke
  11 siblings, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-22 15:39 UTC (permalink / raw)
  To: Damien LeMoal; +Cc: dm-devel, Mike Snitzer

When reclaiming zones we should arbitrate between the zoned
devices to get better throughput. So implement a simple
round-robin load balancer between the zoned devices.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 87784e7785bc..25dcad2a565f 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -171,6 +171,8 @@ struct dmz_metadata {
 	unsigned int		nr_reserved_seq;
 	unsigned int		nr_chunks;
 
+	unsigned int		last_alloc_idx;
+
 	/* Zone information array */
 	struct xarray		zones;
 
@@ -2178,7 +2180,7 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
 {
 	struct list_head *list;
 	struct dm_zone *zone;
-	unsigned int dev_idx = 0;
+	unsigned int dev_idx = zmd->last_alloc_idx;
 
 again:
 	if (flags & DMZ_ALLOC_CACHE)
@@ -2214,6 +2216,9 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
 	zone = list_first_entry(list, struct dm_zone, link);
 	list_del_init(&zone->link);
 
+	if (!(flags & DMZ_ALLOC_CACHE))
+		zmd->last_alloc_idx = (dev_idx + 1) % zmd->nr_devs;
+
 	if (dmz_is_cache(zone))
 		atomic_dec(&zmd->unmap_nr_cache);
 	else if (dmz_is_rnd(zone))
@@ -2839,6 +2844,7 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
 	zmd->dev = dev;
 	zmd->nr_devs = num_dev;
 	zmd->mblk_rbtree = RB_ROOT;
+	zmd->last_alloc_idx = 0;
 	init_rwsem(&zmd->mblk_sem);
 	mutex_init(&zmd->mblk_flush_lock);
 	spin_lock_init(&zmd->mblk_lock);
-- 
2.16.4

* [PATCH 12/12] dm-zoned: per-device reclaim
  2020-05-22 15:38 [PATCH RFC 00/12] dm-zoned: multi-device support Hannes Reinecke
                   ` (10 preceding siblings ...)
  2020-05-22 15:39 ` [PATCH 11/12] dm-zoned: round-robin load balancer for reclaiming zones Hannes Reinecke
@ 2020-05-22 15:39 ` Hannes Reinecke
  2020-05-25  2:46   ` Damien Le Moal
  11 siblings, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-22 15:39 UTC (permalink / raw)
  To: Damien LeMoal; +Cc: dm-devel, Mike Snitzer

Instead of having one reclaim workqueue for the entire set we should
allocate a reclaim workqueue per device; that will reduce
contention and should boost performance for a multi-device setup.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-reclaim.c | 70 +++++++++++++++++++++----------------------
 drivers/md/dm-zoned-target.c  | 36 +++++++++++++---------
 drivers/md/dm-zoned.h         | 38 ++++++++++++-----------
 3 files changed, 76 insertions(+), 68 deletions(-)

diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index f2e053b5f2db..6f3d8f18b989 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -21,6 +21,8 @@ struct dmz_reclaim {
 	struct dm_kcopyd_throttle kc_throttle;
 	int			kc_err;
 
+	int			dev_idx;
+
 	unsigned long		flags;
 
 	/* Last target access time */
@@ -197,8 +199,8 @@ static int dmz_reclaim_buf(struct dmz_reclaim *zrc, struct dm_zone *dzone)
 	struct dmz_metadata *zmd = zrc->metadata;
 	int ret;
 
-	DMDEBUG("(%s): Chunk %u, move buf zone %u (weight %u) to data zone %u (weight %u)",
-		dmz_metadata_label(zmd),
+	DMDEBUG("(%s/%u): Chunk %u, move buf zone %u (weight %u) to data zone %u (weight %u)",
+		dmz_metadata_label(zmd), zrc->dev_idx,
 		dzone->chunk, bzone->id, dmz_weight(bzone),
 		dzone->id, dmz_weight(dzone));
 
@@ -236,8 +238,8 @@ static int dmz_reclaim_seq_data(struct dmz_reclaim *zrc, struct dm_zone *dzone)
 	struct dmz_metadata *zmd = zrc->metadata;
 	int ret = 0;
 
-	DMDEBUG("(%s): Chunk %u, move data zone %u (weight %u) to buf zone %u (weight %u)",
-		dmz_metadata_label(zmd),
+	DMDEBUG("(%s/%u): Chunk %u, move data zone %u (weight %u) to buf zone %u (weight %u)",
+		dmz_metadata_label(zmd), zrc->dev_idx,
 		chunk, dzone->id, dmz_weight(dzone),
 		bzone->id, dmz_weight(bzone));
 
@@ -294,8 +296,8 @@ static int dmz_reclaim_rnd_data(struct dmz_reclaim *zrc, struct dm_zone *dzone)
 	if (!szone)
 		return -ENOSPC;
 
-	DMDEBUG("(%s): Chunk %u, move %s zone %u (weight %u) to %s zone %u",
-		dmz_metadata_label(zmd), chunk,
+	DMDEBUG("(%s/%u): Chunk %u, move %s zone %u (weight %u) to %s zone %u",
+		dmz_metadata_label(zmd), zrc->dev_idx, chunk,
 		dmz_is_cache(dzone) ? "cache" : "rnd",
 		dzone->id, dmz_weight(dzone),
 		dmz_is_rnd(szone) ? "rnd" : "seq", szone->id);
@@ -368,8 +370,8 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
 	/* Get a data zone */
 	dzone = dmz_get_zone_for_reclaim(zmd, dmz_target_idle(zrc));
 	if (!dzone) {
-		DMDEBUG("(%s): No zone found to reclaim",
-			dmz_metadata_label(zmd));
+		DMDEBUG("(%s/%u): No zone found to reclaim",
+			dmz_metadata_label(zmd), zrc->dev_idx);
 		return -EBUSY;
 	}
 
@@ -416,24 +418,26 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
 out:
 	if (ret) {
 		if (ret == -EINTR)
-			DMDEBUG("(%s): reclaim zone %u interrupted",
-				dmz_metadata_label(zmd), rzone->id);
+			DMDEBUG("(%s/%u): reclaim zone %u interrupted",
+				dmz_metadata_label(zmd), zrc->dev_idx,
+				rzone->id);
 		else
-			DMDEBUG("(%s): Failed to reclaim zone %u, err %d",
-				dmz_metadata_label(zmd), rzone->id, ret);
+			DMDEBUG("(%s/%u): Failed to reclaim zone %u, err %d",
+				dmz_metadata_label(zmd), zrc->dev_idx,
+				rzone->id, ret);
 		dmz_unlock_zone_reclaim(dzone);
 		return ret;
 	}
 
 	ret = dmz_flush_metadata(zrc->metadata);
 	if (ret) {
-		DMDEBUG("(%s): Metadata flush for zone %u failed, err %d",
-			dmz_metadata_label(zmd), rzone->id, ret);
+		DMDEBUG("(%s/%u): Metadata flush for zone %u failed, err %d",
+			dmz_metadata_label(zmd), zrc->dev_idx, rzone->id, ret);
 		return ret;
 	}
 
-	DMDEBUG("(%s): Reclaimed zone %u in %u ms",
-		dmz_metadata_label(zmd),
+	DMDEBUG("(%s/%u): Reclaimed zone %u in %u ms",
+		dmz_metadata_label(zmd), zrc->dev_idx,
 		rzone->id, jiffies_to_msecs(jiffies - start));
 	return 0;
 }
@@ -448,12 +452,8 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
 		nr_zones = nr_cache;
 		nr_unmap = dmz_nr_unmap_cache_zones(zmd);
 	} else {
-		int i;
-
-		for (i = 0; i < dmz_nr_devs(zmd); i++) {
-			nr_zones += dmz_nr_rnd_zones(zmd, i);
-			nr_unmap += dmz_nr_unmap_rnd_zones(zmd, i);
-		}
+		nr_zones = dmz_nr_rnd_zones(zmd, zrc->dev_idx);
+		nr_unmap = dmz_nr_unmap_rnd_zones(zmd, zrc->dev_idx);
 	}
 	return nr_unmap * 100 / nr_zones;
 }
@@ -463,11 +463,9 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
  */
 static bool dmz_should_reclaim(struct dmz_reclaim *zrc, unsigned int p_unmap)
 {
-	int i;
-	unsigned int nr_reclaim = 0;
+	unsigned int nr_reclaim;
 
-	for (i = 0; i < dmz_nr_devs(zrc->metadata); i++)
-		nr_reclaim += dmz_nr_rnd_zones(zrc->metadata, i);
+	nr_reclaim = dmz_nr_rnd_zones(zrc->metadata, zrc->dev_idx);
 
 	if (dmz_nr_cache_zones(zrc->metadata))
 		nr_reclaim += dmz_nr_cache_zones(zrc->metadata);
@@ -495,7 +493,7 @@ static void dmz_reclaim_work(struct work_struct *work)
 	struct dmz_reclaim *zrc = container_of(work, struct dmz_reclaim, work.work);
 	struct dmz_metadata *zmd = zrc->metadata;
 	unsigned int p_unmap, nr_unmap_rnd = 0, nr_rnd = 0;
-	int ret, i;
+	int ret;
 
 	if (dmz_dev_is_dying(zmd))
 		return;
@@ -520,12 +518,11 @@ static void dmz_reclaim_work(struct work_struct *work)
 		zrc->kc_throttle.throttle = min(75U, 100U - p_unmap / 2);
 	}
 
-	for (i = 0; i < dmz_nr_devs(zmd); i++) {
-		nr_unmap_rnd += dmz_nr_unmap_rnd_zones(zmd, i);
-		nr_rnd += dmz_nr_rnd_zones(zmd, i);
-	}
-	DMDEBUG("(%s): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
-		dmz_metadata_label(zmd),
+	nr_unmap_rnd = dmz_nr_unmap_rnd_zones(zmd, zrc->dev_idx);
+	nr_rnd = dmz_nr_rnd_zones(zmd, zrc->dev_idx);
+
+	DMDEBUG("(%s/%u): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
+		dmz_metadata_label(zmd), zrc->dev_idx,
 		zrc->kc_throttle.throttle,
 		(dmz_target_idle(zrc) ? "Idle" : "Busy"),
 		p_unmap, dmz_nr_unmap_cache_zones(zmd),
@@ -545,7 +542,7 @@ static void dmz_reclaim_work(struct work_struct *work)
  * Initialize reclaim.
  */
 int dmz_ctr_reclaim(struct dmz_metadata *zmd,
-		    struct dmz_reclaim **reclaim)
+		    struct dmz_reclaim **reclaim, int idx)
 {
 	struct dmz_reclaim *zrc;
 	int ret;
@@ -556,6 +553,7 @@ int dmz_ctr_reclaim(struct dmz_metadata *zmd,
 
 	zrc->metadata = zmd;
 	zrc->atime = jiffies;
+	zrc->dev_idx = idx;
 
 	/* Reclaim kcopyd client */
 	zrc->kc = dm_kcopyd_client_create(&zrc->kc_throttle);
@@ -567,8 +565,8 @@ int dmz_ctr_reclaim(struct dmz_metadata *zmd,
 
 	/* Reclaim work */
 	INIT_DELAYED_WORK(&zrc->work, dmz_reclaim_work);
-	zrc->wq = alloc_ordered_workqueue("dmz_rwq_%s", WQ_MEM_RECLAIM,
-					  dmz_metadata_label(zmd));
+	zrc->wq = alloc_ordered_workqueue("dmz_rwq_%s_%d", WQ_MEM_RECLAIM,
+					  dmz_metadata_label(zmd), idx);
 	if (!zrc->wq) {
 		ret = -ENOMEM;
 		goto err;
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index f34fcc3f7cc6..a33c26a6ab31 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -49,9 +49,6 @@ struct dmz_target {
 	/* For metadata handling */
 	struct dmz_metadata     *metadata;
 
-	/* For reclaim */
-	struct dmz_reclaim	*reclaim;
-
 	/* For chunk work */
 	struct radix_tree_root	chunk_rxtree;
 	struct workqueue_struct *chunk_wq;
@@ -402,14 +399,15 @@ static void dmz_handle_bio(struct dmz_target *dmz, struct dm_chunk_work *cw,
 		dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
 	struct dmz_metadata *zmd = dmz->metadata;
 	struct dm_zone *zone;
-	int ret;
+	int i, ret;
 
 	/*
 	 * Write may trigger a zone allocation. So make sure the
 	 * allocation can succeed.
 	 */
 	if (bio_op(bio) == REQ_OP_WRITE)
-		dmz_schedule_reclaim(dmz->reclaim);
+		for (i = 0; i < dmz->nr_ddevs; i++)
+			dmz_schedule_reclaim(dmz->dev[i].reclaim);
 
 	dmz_lock_metadata(zmd);
 
@@ -575,7 +573,6 @@ static int dmz_queue_chunk_work(struct dmz_target *dmz, struct bio *bio)
 
 	bio_list_add(&cw->bio_list, bio);
 
-	dmz_reclaim_bio_acc(dmz->reclaim);
 	if (queue_work(dmz->chunk_wq, &cw->work))
 		dmz_get_chunk_work(cw);
 out:
@@ -935,10 +932,12 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	mod_delayed_work(dmz->flush_wq, &dmz->flush_work, DMZ_FLUSH_PERIOD);
 
 	/* Initialize reclaim */
-	ret = dmz_ctr_reclaim(dmz->metadata, &dmz->reclaim);
-	if (ret) {
-		ti->error = "Zone reclaim initialization failed";
-		goto err_fwq;
+	for (i = 0; i < argc; i++) {
+		ret = dmz_ctr_reclaim(dmz->metadata, &dmz->dev[i].reclaim, i);
+		if (ret) {
+			ti->error = "Zone reclaim initialization failed";
+			goto err_fwq;
+		}
 	}
 
 	DMINFO("(%s): Target device: %llu 512-byte logical sectors (%llu blocks)",
@@ -971,11 +970,13 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 static void dmz_dtr(struct dm_target *ti)
 {
 	struct dmz_target *dmz = ti->private;
+	int i;
 
 	flush_workqueue(dmz->chunk_wq);
 	destroy_workqueue(dmz->chunk_wq);
 
-	dmz_dtr_reclaim(dmz->reclaim);
+	for (i = 0; i < dmz_nr_devs(dmz->metadata); i++)
+		dmz_dtr_reclaim(dmz->dev[i].reclaim);
 
 	cancel_delayed_work_sync(&dmz->flush_work);
 	destroy_workqueue(dmz->flush_wq);
@@ -1044,9 +1045,11 @@ static int dmz_prepare_ioctl(struct dm_target *ti, struct block_device **bdev)
 static void dmz_suspend(struct dm_target *ti)
 {
 	struct dmz_target *dmz = ti->private;
+	int i;
 
 	flush_workqueue(dmz->chunk_wq);
-	dmz_suspend_reclaim(dmz->reclaim);
+	for (i = 0; i < dmz->nr_ddevs; i++)
+		dmz_suspend_reclaim(dmz->dev[i].reclaim);
 	cancel_delayed_work_sync(&dmz->flush_work);
 }
 
@@ -1056,9 +1059,11 @@ static void dmz_suspend(struct dm_target *ti)
 static void dmz_resume(struct dm_target *ti)
 {
 	struct dmz_target *dmz = ti->private;
+	int i;
 
 	queue_delayed_work(dmz->flush_wq, &dmz->flush_work, DMZ_FLUSH_PERIOD);
-	dmz_resume_reclaim(dmz->reclaim);
+	for (i = 0; i < dmz->nr_ddevs; i++)
+		dmz_resume_reclaim(dmz->dev[i].reclaim);
 }
 
 static int dmz_iterate_devices(struct dm_target *ti,
@@ -1130,7 +1135,10 @@ static int dmz_message(struct dm_target *ti, unsigned int argc, char **argv,
 	int r = -EINVAL;
 
 	if (!strcasecmp(argv[0], "reclaim")) {
-		dmz_schedule_reclaim(dmz->reclaim);
+		int i;
+
+		for (i = 0; i < dmz->nr_ddevs; i++)
+			dmz_schedule_reclaim(dmz->dev[i].reclaim);
 		r = 0;
 	} else
 		DMERR("unrecognized message %s", argv[0]);
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index 0052eee12299..1ee91a3a4076 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -54,6 +54,7 @@ struct dmz_reclaim;
 struct dmz_dev {
 	struct block_device	*bdev;
 	struct dmz_metadata	*metadata;
+	struct dmz_reclaim	*reclaim;
 
 	char			name[BDEVNAME_SIZE];
 	uuid_t			uuid;
@@ -240,23 +241,6 @@ static inline void dmz_activate_zone(struct dm_zone *zone)
 	atomic_inc(&zone->refcount);
 }
 
-/*
- * Deactivate a zone. This decrement the zone reference counter
- * indicating that all BIOs to the zone have completed when the count is 0.
- */
-static inline void dmz_deactivate_zone(struct dm_zone *zone)
-{
-	atomic_dec(&zone->refcount);
-}
-
-/*
- * Test if a zone is active, that is, has a refcount > 0.
- */
-static inline bool dmz_is_active(struct dm_zone *zone)
-{
-	return atomic_read(&zone->refcount);
-}
-
 int dmz_lock_zone_reclaim(struct dm_zone *zone);
 void dmz_unlock_zone_reclaim(struct dm_zone *zone);
 struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd, bool idle);
@@ -283,7 +267,7 @@ int dmz_merge_valid_blocks(struct dmz_metadata *zmd, struct dm_zone *from_zone,
 /*
  * Functions defined in dm-zoned-reclaim.c
  */
-int dmz_ctr_reclaim(struct dmz_metadata *zmd, struct dmz_reclaim **zrc);
+int dmz_ctr_reclaim(struct dmz_metadata *zmd, struct dmz_reclaim **zrc, int idx);
 void dmz_dtr_reclaim(struct dmz_reclaim *zrc);
 void dmz_suspend_reclaim(struct dmz_reclaim *zrc);
 void dmz_resume_reclaim(struct dmz_reclaim *zrc);
@@ -296,4 +280,22 @@ void dmz_schedule_reclaim(struct dmz_reclaim *zrc);
 bool dmz_bdev_is_dying(struct dmz_dev *dmz_dev);
 bool dmz_check_bdev(struct dmz_dev *dmz_dev);
 
+/*
+ * Deactivate a zone. This decrement the zone reference counter
+ * indicating that all BIOs to the zone have completed when the count is 0.
+ */
+static inline void dmz_deactivate_zone(struct dm_zone *zone)
+{
+	dmz_reclaim_bio_acc(zone->dev->reclaim);
+	atomic_dec(&zone->refcount);
+}
+
+/*
+ * Test if a zone is active, that is, has a refcount > 0.
+ */
+static inline bool dmz_is_active(struct dm_zone *zone)
+{
+	return atomic_read(&zone->refcount);
+}
+
 #endif /* DM_ZONED_H */
-- 
2.16.4

* Re: [PATCH 01/12] dm-zoned: add debugging message for reading superblocks
  2020-05-22 15:38 ` [PATCH 01/12] dm-zoned: add debugging message for reading superblocks Hannes Reinecke
@ 2020-05-25  1:54   ` Damien Le Moal
  0 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2020-05-25  1:54 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/23 0:39, Hannes Reinecke wrote:
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 4a2e351365c5..b0d3ed4ac56a 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -1105,6 +1105,9 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
>   */
>  static int dmz_read_sb(struct dmz_metadata *zmd, unsigned int set)
>  {
> +	DMDEBUG("(%s): read superblock set %d dev %s block %llu",
> +		zmd->devname, set, zmd->sb[set].dev->name,
> +		zmd->sb[set].block);

A blank line here would be nice. Cosmetic only, no big deal.

>  	return dmz_rdwr_block(zmd->sb[set].dev, REQ_OP_READ,
>  			      zmd->sb[set].block, zmd->sb[set].mblk->page);
>  }
> 

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH 02/12] dm-zoned: convert to xarray
  2020-05-22 15:38 ` [PATCH 02/12] dm-zoned: convert to xarray Hannes Reinecke
@ 2020-05-25  2:01   ` Damien Le Moal
  2020-05-25  7:40     ` Hannes Reinecke
  0 siblings, 1 reply; 34+ messages in thread
From: Damien Le Moal @ 2020-05-25  2:01 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/23 0:39, Hannes Reinecke wrote:
> The zones array is getting really large, and large arrays
> tend to wreak havoc with the caches.

s/caches/CPU cache, maybe?

> So convert it to xarray to become more cache friendly.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 98 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 73 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index b0d3ed4ac56a..3da6702bb1ae 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -172,7 +172,7 @@ struct dmz_metadata {
>  	unsigned int		nr_chunks;
>  
>  	/* Zone information array */
> -	struct dm_zone		*zones;
> +	struct xarray		zones;
>  
>  	struct dmz_sb		sb[3];
>  	unsigned int		mblk_primary;
> @@ -327,6 +327,11 @@ unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
>  	return atomic_read(&zmd->unmap_nr_seq);
>  }
>  
> +static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
> +{
> +	return xa_load(&zmd->zones, zone_id);
> +}
> +
>  const char *dmz_metadata_label(struct dmz_metadata *zmd)
>  {
>  	return (const char *)zmd->label;
> @@ -1121,6 +1126,7 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>  {
>  	unsigned int zone_nr_blocks = zmd->zone_nr_blocks;
>  	struct dmz_mblock *mblk;
> +	unsigned int zone_id = zmd->sb[0].zone->id;
>  	int i;
>  
>  	/* Allocate a block */
> @@ -1133,17 +1139,16 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>  
>  	/* Bad first super block: search for the second one */
>  	zmd->sb[1].block = zmd->sb[0].block + zone_nr_blocks;
> -	zmd->sb[1].zone = zmd->sb[0].zone + 1;
> +	zmd->sb[1].zone = xa_load(&zmd->zones, zone_id + 1);
>  	zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
> -	for (i = 0; i < zmd->nr_rnd_zones - 1; i++) {
> +	for (i = 1; i < zmd->nr_rnd_zones; i++) {
>  		if (dmz_read_sb(zmd, 1) != 0)
>  			break;
> -		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC) {
> -			zmd->sb[1].zone += i;
> +		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC)
>  			return 0;
> -		}
>  		zmd->sb[1].block += zone_nr_blocks;
> -		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone + i);
> +		zmd->sb[1].zone = dmz_get(zmd, zone_id + i);
> +		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
>  	}
>  
>  	dmz_free_mblock(zmd, mblk);
> @@ -1259,8 +1264,12 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>  	/* Read and check secondary super block */
>  	if (ret == 0) {
>  		sb_good[0] = true;
> -		if (!zmd->sb[1].zone)
> -			zmd->sb[1].zone = zmd->sb[0].zone + zmd->nr_meta_zones;
> +		if (!zmd->sb[1].zone) {
> +			unsigned int zone_id =
> +				zmd->sb[0].zone->id + zmd->nr_meta_zones;
> +
> +			zmd->sb[1].zone = dmz_get(zmd, zone_id);
> +		}
>  		zmd->sb[1].block = dmz_start_block(zmd, zmd->sb[1].zone);
>  		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
>  		ret = dmz_get_sb(zmd, 1);
> @@ -1341,7 +1350,12 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>  	struct dmz_metadata *zmd = data;
>  	struct dmz_dev *dev = zmd->nr_devs > 1 ? &zmd->dev[1] : &zmd->dev[0];
>  	int idx = num + dev->zone_offset;
> -	struct dm_zone *zone = &zmd->zones[idx];
> +	struct dm_zone *zone = kzalloc(sizeof(struct dm_zone), GFP_KERNEL);
> +
> +	if (!zone)
> +		return -ENOMEM;
> +	if (xa_insert(&zmd->zones, idx, zone, GFP_KERNEL))
> +		return -EBUSY;
>  
>  	if (blkz->len != zmd->zone_nr_sectors) {
>  		if (zmd->sb_version > 1) {
> @@ -1397,14 +1411,18 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>  	return 0;
>  }
>  
> -static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
> +static int dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
>  {
>  	int idx;
>  	sector_t zone_offset = 0;
>  
>  	for(idx = 0; idx < dev->nr_zones; idx++) {
> -		struct dm_zone *zone = &zmd->zones[idx];
> -
> +		struct dm_zone *zone =
> +			kzalloc(sizeof(struct dm_zone), GFP_KERNEL);
> +		if (!zone)
> +			return -ENOMEM;
> +		if (xa_insert(&zmd->zones, idx, zone, GFP_KERNEL) < 0)
> +			return -EBUSY;

Same change as in dmz_init_zone(). Make this hunk a helper?

>  		INIT_LIST_HEAD(&zone->link);
>  		atomic_set(&zone->refcount, 0);
>  		zone->id = idx;

And we can add this inside the helper too.

> @@ -1420,6 +1438,7 @@ static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
>  		}
>  		zone_offset += zmd->zone_nr_sectors;
>  	}
> +	return 0;
>  }
>  
>  /*
> @@ -1427,8 +1446,15 @@ static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
>   */
>  static void dmz_drop_zones(struct dmz_metadata *zmd)
>  {
> -	kfree(zmd->zones);
> -	zmd->zones = NULL;
> +	int idx;
> +
> +	for(idx = 0; idx < zmd->nr_zones; idx++) {
> +		struct dm_zone *zone = xa_load(&zmd->zones, idx);
> +
> +		kfree(zone);
> +		xa_erase(&zmd->zones, idx);
> +	}
> +	xa_destroy(&zmd->zones);
>  }
>  
>  /*
> @@ -1460,20 +1486,25 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
>  		DMERR("(%s): No zones found", zmd->devname);
>  		return -ENXIO;
>  	}
> -	zmd->zones = kcalloc(zmd->nr_zones, sizeof(struct dm_zone), GFP_KERNEL);
> -	if (!zmd->zones)
> -		return -ENOMEM;
> +	xa_init(&zmd->zones);
>  
>  	DMDEBUG("(%s): Using %zu B for zone information",
>  		zmd->devname, sizeof(struct dm_zone) * zmd->nr_zones);
>  
>  	if (zmd->nr_devs > 1) {
> -		dmz_emulate_zones(zmd, &zmd->dev[0]);
> +		ret = dmz_emulate_zones(zmd, &zmd->dev[0]);
> +		if (ret < 0) {
> +			DMDEBUG("(%s): Failed to emulate zones, error %d",
> +				zmd->devname, ret);
> +			dmz_drop_zones(zmd);
> +			return ret;
> +		}
> +
>  		/*
>  		 * Primary superblock zone is always at zone 0 when multiple
>  		 * drives are present.
>  		 */
> -		zmd->sb[0].zone = &zmd->zones[0];
> +		zmd->sb[0].zone = dmz_get(zmd, 0);
>  
>  		zoned_dev = &zmd->dev[1];
>  	}
> @@ -1576,11 +1607,6 @@ static int dmz_handle_seq_write_err(struct dmz_metadata *zmd,
>  	return 0;
>  }
>  
> -static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
> -{
> -	return &zmd->zones[zone_id];
> -}
> -
>  /*
>   * Reset a zone write pointer.
>   */
> @@ -1662,6 +1688,11 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  		}
>  
>  		dzone = dmz_get(zmd, dzone_id);
> +		if (!dzone) {
> +			dmz_zmd_err(zmd, "Chunk %u mapping: data zone %u not present",
> +				    chunk, dzone_id);
> +			return -EIO;
> +		}
>  		set_bit(DMZ_DATA, &dzone->flags);
>  		dzone->chunk = chunk;
>  		dmz_get_zone_weight(zmd, dzone);
> @@ -1685,6 +1716,11 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  		}
>  
>  		bzone = dmz_get(zmd, bzone_id);
> +		if (!bzone) {
> +			dmz_zmd_err(zmd, "Chunk %u mapping: buffer zone %u not present",
> +				    chunk, bzone_id);
> +			return -EIO;
> +		}
>  		if (!dmz_is_rnd(bzone) && !dmz_is_cache(bzone)) {
>  			dmz_zmd_err(zmd, "Chunk %u mapping: invalid buffer zone %u",
>  				    chunk, bzone_id);
> @@ -1715,6 +1751,8 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  	 */
>  	for (i = 0; i < zmd->nr_zones; i++) {
>  		dzone = dmz_get(zmd, i);
> +		if (!dzone)
> +			continue;
>  		if (dmz_is_meta(dzone))
>  			continue;
>  		if (dmz_is_offline(dzone))
> @@ -1977,6 +2015,10 @@ struct dm_zone *dmz_get_chunk_mapping(struct dmz_metadata *zmd, unsigned int chu
>  	} else {
>  		/* The chunk is already mapped: get the mapping zone */
>  		dzone = dmz_get(zmd, dzone_id);
> +		if (!dzone) {
> +			dzone = ERR_PTR(-EIO);
> +			goto out;
> +		}
>  		if (dzone->chunk != chunk) {
>  			dzone = ERR_PTR(-EIO);
>  			goto out;
> @@ -2794,6 +2836,12 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>  	/* Set metadata zones starting from sb_zone */
>  	for (i = 0; i < zmd->nr_meta_zones << 1; i++) {
>  		zone = dmz_get(zmd, zmd->sb[0].zone->id + i);
> +		if (!zone) {
> +			dmz_zmd_err(zmd,
> +				    "metadata zone %u not present", i);
> +			ret = -ENXIO;
> +			goto err;
> +		}
>  		if (!dmz_is_rnd(zone) && !dmz_is_cache(zone)) {
>  			dmz_zmd_err(zmd,
>  				    "metadata zone %d is not random", i);
> 

Apart from the nits above, this looks good to me.

-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH 03/12] dm-zoned: use on-stack superblock for tertiary devices
  2020-05-22 15:38 ` [PATCH 03/12] dm-zoned: use on-stack superblock for tertiary devices Hannes Reinecke
@ 2020-05-25  2:09   ` Damien Le Moal
  2020-05-25  7:41     ` Hannes Reinecke
  2020-05-26  8:25     ` Hannes Reinecke
  0 siblings, 2 replies; 34+ messages in thread
From: Damien Le Moal @ 2020-05-25  2:09 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/23 0:39, Hannes Reinecke wrote:
> Checking the teriary superblock just consists of validating UUIDs,

s/teriary/tertiary

> crcs, and the generation number; it doesn't have contents which
> would be required during the actual operation.
> So we should use an on-stack superblock and avoid having to store
> it together with the 'real' superblocks.

...a temporary in-memory superblock allocation...

The entire structure should not be on stack... see below.

> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 98 +++++++++++++++++++++++-------------------
>  1 file changed, 53 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 3da6702bb1ae..b70a988fa771 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -174,7 +174,7 @@ struct dmz_metadata {
>  	/* Zone information array */
>  	struct xarray		zones;
>  
> -	struct dmz_sb		sb[3];
> +	struct dmz_sb		sb[2];
>  	unsigned int		mblk_primary;
>  	unsigned int		sb_version;
>  	u64			sb_gen;
> @@ -995,10 +995,11 @@ int dmz_flush_metadata(struct dmz_metadata *zmd)
>  /*
>   * Check super block.
>   */
> -static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
> +static int dmz_check_sb(struct dmz_metadata *zmd, struct dmz_sb *dsb,
> +			bool tertiary)
>  {
> -	struct dmz_super *sb = zmd->sb[set].sb;
> -	struct dmz_dev *dev = zmd->sb[set].dev;
> +	struct dmz_super *sb = dsb->sb;
> +	struct dmz_dev *dev = dsb->dev;
>  	unsigned int nr_meta_zones, nr_data_zones;
>  	u32 crc, stored_crc;
>  	u64 gen;
> @@ -1015,7 +1016,7 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
>  			    DMZ_META_VER, zmd->sb_version);
>  		return -EINVAL;
>  	}
> -	if ((zmd->sb_version < 1) && (set == 2)) {
> +	if ((zmd->sb_version < 1) && tertiary) {
>  		dmz_dev_err(dev, "Tertiary superblocks are not supported");
>  		return -EINVAL;
>  	}
> @@ -1059,7 +1060,7 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
>  			return -ENXIO;
>  		}
>  
> -		if (set == 2) {
> +		if (tertiary) {
>  			/*
>  			 * Generation number should be 0, but it doesn't
>  			 * really matter if it isn't.
> @@ -1108,13 +1109,13 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
>  /*
>   * Read the first or second super block from disk.
>   */
> -static int dmz_read_sb(struct dmz_metadata *zmd, unsigned int set)
> +static int dmz_read_sb(struct dmz_metadata *zmd, struct dmz_sb *sb, int set)
>  {
>  	DMDEBUG("(%s): read superblock set %d dev %s block %llu",
>  		zmd->devname, set, zmd->sb[set].dev->name,
>  		zmd->sb[set].block);
> -	return dmz_rdwr_block(zmd->sb[set].dev, REQ_OP_READ,
> -			      zmd->sb[set].block, zmd->sb[set].mblk->page);
> +	return dmz_rdwr_block(sb->dev, REQ_OP_READ,
> +			      sb->block, sb->mblk->page);
>  }
>  
>  /*
> @@ -1142,7 +1143,7 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>  	zmd->sb[1].zone = xa_load(&zmd->zones, zone_id + 1);
>  	zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
>  	for (i = 1; i < zmd->nr_rnd_zones; i++) {
> -		if (dmz_read_sb(zmd, 1) != 0)
> +		if (dmz_read_sb(zmd, &zmd->sb[1], 1) != 0)
>  			break;
>  		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC)
>  			return 0;
> @@ -1160,9 +1161,9 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>  }
>  
>  /*
> - * Read the first or second super block from disk.
> + * Read a super block from disk.
>   */
> -static int dmz_get_sb(struct dmz_metadata *zmd, unsigned int set)
> +static int dmz_get_sb(struct dmz_metadata *zmd, struct dmz_sb *sb, int set)
>  {
>  	struct dmz_mblock *mblk;
>  	int ret;
> @@ -1172,14 +1173,14 @@ static int dmz_get_sb(struct dmz_metadata *zmd, unsigned int set)
>  	if (!mblk)
>  		return -ENOMEM;
>  
> -	zmd->sb[set].mblk = mblk;
> -	zmd->sb[set].sb = mblk->data;
> +	sb->mblk = mblk;
> +	sb->sb = mblk->data;
>  
>  	/* Read super block */
> -	ret = dmz_read_sb(zmd, set);
> +	ret = dmz_read_sb(zmd, sb, set);
>  	if (ret) {
>  		dmz_free_mblock(zmd, mblk);
> -		zmd->sb[set].mblk = NULL;
> +		sb->mblk = NULL;
>  		return ret;
>  	}
>  
> @@ -1253,13 +1254,13 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>  	/* Read and check the primary super block */
>  	zmd->sb[0].block = dmz_start_block(zmd, zmd->sb[0].zone);
>  	zmd->sb[0].dev = dmz_zone_to_dev(zmd, zmd->sb[0].zone);
> -	ret = dmz_get_sb(zmd, 0);
> +	ret = dmz_get_sb(zmd, &zmd->sb[0], 0);
>  	if (ret) {
>  		dmz_dev_err(zmd->sb[0].dev, "Read primary super block failed");
>  		return ret;
>  	}
>  
> -	ret = dmz_check_sb(zmd, 0);
> +	ret = dmz_check_sb(zmd, &zmd->sb[0], false);
>  
>  	/* Read and check secondary super block */
>  	if (ret == 0) {
> @@ -1272,7 +1273,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>  		}
>  		zmd->sb[1].block = dmz_start_block(zmd, zmd->sb[1].zone);
>  		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
> -		ret = dmz_get_sb(zmd, 1);
> +		ret = dmz_get_sb(zmd, &zmd->sb[1], 1);
>  	} else
>  		ret = dmz_lookup_secondary_sb(zmd);
>  
> @@ -1281,7 +1282,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>  		return ret;
>  	}
>  
> -	ret = dmz_check_sb(zmd, 1);
> +	ret = dmz_check_sb(zmd, &zmd->sb[1], false);
>  	if (ret == 0)
>  		sb_good[1] = true;
>  
> @@ -1326,18 +1327,32 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>  		      "Using super block %u (gen %llu)",
>  		      zmd->mblk_primary, zmd->sb_gen);
>  
> -	if ((zmd->sb_version > 1) && zmd->sb[2].zone) {
> -		zmd->sb[2].block = dmz_start_block(zmd, zmd->sb[2].zone);
> -		zmd->sb[2].dev = dmz_zone_to_dev(zmd, zmd->sb[2].zone);
> -		ret = dmz_get_sb(zmd, 2);
> -		if (ret) {
> -			dmz_dev_err(zmd->sb[2].dev,
> -				    "Read tertiary super block failed");
> -			return ret;
> +	if (zmd->sb_version > 1) {
> +		int i;
> +
> +		for (i = 1; i < zmd->nr_devs; i++) {
> +			struct dmz_sb sb;

I would rather have dmz_get_sb() allocate this struct than have it on stack...
It is not big, but still. To be symmetric, we can add dmz_put_sb() for freeing it.

> +
> +			sb.block = 0;
> +			sb.zone = dmz_get(zmd, zmd->dev[i].zone_offset);
> +			sb.dev = &zmd->dev[i];
> +			if (!dmz_is_meta(sb.zone)) {
> +				dmz_dev_err(sb.dev,
> +					    "Tertiary super block zone %u not marked as metadata zone",
> +					    sb.zone->id);
> +				return -EINVAL;
> +			}
> +			ret = dmz_get_sb(zmd, &sb, i + 1);
> +			if (ret) {
> +				dmz_dev_err(sb.dev,
> +					    "Read tertiary super block failed");
> +				return ret;
> +			}
> +			ret = dmz_check_sb(zmd, &sb, true);
> +			dmz_free_mblock(zmd, sb.mblk);
> +			if (ret == -EINVAL)
> +				return ret;
>  		}
> -		ret = dmz_check_sb(zmd, 2);
> -		if (ret == -EINVAL)
> -			return ret;
>  	}
>  	return 0;
>  }
> @@ -1402,12 +1417,15 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>  				zmd->sb[0].zone = zone;
>  			}
>  		}
> -		if (zmd->nr_devs > 1 && !zmd->sb[2].zone) {
> -			/* Tertiary superblock zone */
> -			zmd->sb[2].zone = zone;
> +		if (zmd->nr_devs > 1 && num == 0) {
> +			/*
> +			 * Tertiary superblock zones are always at the
> +			 * start of the zoned devices, so mark them
> +			 * as metadata zone.
> +			 */
> +			set_bit(DMZ_META, &zone->flags);
>  		}
>  	}
> -
>  	return 0;
>  }
>  
> @@ -2850,16 +2868,6 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>  		}
>  		set_bit(DMZ_META, &zone->flags);
>  	}
> -	if (zmd->sb[2].zone) {
> -		zone = dmz_get(zmd, zmd->sb[2].zone->id);
> -		if (!zone) {
> -			dmz_zmd_err(zmd,
> -				    "Tertiary metadata zone not present");
> -			ret = -ENXIO;
> -			goto err;
> -		}
> -		set_bit(DMZ_META, &zone->flags);
> -	}
>  	/* Load mapping table */
>  	ret = dmz_load_mapping(zmd);
>  	if (ret)
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/12] dm-zoned: secondary superblock must reside on the same devices than primary superblock
  2020-05-22 15:38 ` [PATCH 04/12] dm-zoned: secondary superblock must reside on the same devices than primary superblock Hannes Reinecke
@ 2020-05-25  2:10   ` Damien Le Moal
  0 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2020-05-25  2:10 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/23 0:39, Hannes Reinecke wrote:
> The secondary superblock must reside on the same device than the
> primary superblock, so there's no need to re-calculate the device.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index b70a988fa771..fdae4e0228e7 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -1141,7 +1141,7 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>  	/* Bad first super block: search for the second one */
>  	zmd->sb[1].block = zmd->sb[0].block + zone_nr_blocks;
>  	zmd->sb[1].zone = xa_load(&zmd->zones, zone_id + 1);
> -	zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
> +	zmd->sb[1].dev = zmd->sb[0].dev;
>  	for (i = 1; i < zmd->nr_rnd_zones; i++) {
>  		if (dmz_read_sb(zmd, &zmd->sb[1], 1) != 0)
>  			break;
> @@ -1149,7 +1149,6 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>  			return 0;
>  		zmd->sb[1].block += zone_nr_blocks;
>  		zmd->sb[1].zone = dmz_get(zmd, zone_id + i);
> -		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
>  	}
>  
>  	dmz_free_mblock(zmd, mblk);
> @@ -1272,7 +1271,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>  			zmd->sb[1].zone = dmz_get(zmd, zone_id);
>  		}
>  		zmd->sb[1].block = dmz_start_block(zmd, zmd->sb[1].zone);
> -		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
> +		zmd->sb[1].dev = zmd->sb[0].dev;
>  		ret = dmz_get_sb(zmd, &zmd->sb[1], 1);
>  	} else
>  		ret = dmz_lookup_secondary_sb(zmd);
> 

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 05/12] dm-zoned: add device pointer to struct dm_zone
  2020-05-22 15:38 ` [PATCH 05/12] dm-zoned: add device pointer to struct dm_zone Hannes Reinecke
@ 2020-05-25  2:15   ` Damien Le Moal
  2020-05-25  7:42     ` Hannes Reinecke
  0 siblings, 1 reply; 34+ messages in thread
From: Damien Le Moal @ 2020-05-25  2:15 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/23 0:39, Hannes Reinecke wrote:
> Add a pointer to the containing device to struct dm_zone and
> kill dmz_zone_to_dev().
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 47 ++++++++++++------------------------------
>  drivers/md/dm-zoned-reclaim.c  | 18 +++++++---------
>  drivers/md/dm-zoned-target.c   |  7 +++----
>  drivers/md/dm-zoned.h          |  4 +++-
>  4 files changed, 26 insertions(+), 50 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index fdae4e0228e7..7b6e7404f1e8 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -229,16 +229,10 @@ struct dmz_metadata {
>   */
>  static unsigned int dmz_dev_zone_id(struct dmz_metadata *zmd, struct dm_zone *zone)
>  {
> -	unsigned int zone_id;
> -
>  	if (WARN_ON(!zone))
>  		return 0;
>  
> -	zone_id = zone->id;
> -	if (zmd->nr_devs > 1 &&
> -	    (zone_id >= zmd->dev[1].zone_offset))
> -		zone_id -= zmd->dev[1].zone_offset;
> -	return zone_id;
> +	return zone->id - zone->dev->zone_offset;
>  }
>  
>  sector_t dmz_start_sect(struct dmz_metadata *zmd, struct dm_zone *zone)
> @@ -255,18 +249,6 @@ sector_t dmz_start_block(struct dmz_metadata *zmd, struct dm_zone *zone)
>  	return (sector_t)zone_id << zmd->zone_nr_blocks_shift;
>  }
>  
> -struct dmz_dev *dmz_zone_to_dev(struct dmz_metadata *zmd, struct dm_zone *zone)
> -{
> -	if (WARN_ON(!zone))
> -		return &zmd->dev[0];
> -
> -	if (zmd->nr_devs > 1 &&
> -	    zone->id >= zmd->dev[1].zone_offset)
> -		return &zmd->dev[1];
> -
> -	return &zmd->dev[0];
> -}
> -
>  unsigned int dmz_zone_nr_blocks(struct dmz_metadata *zmd)
>  {
>  	return zmd->zone_nr_blocks;
> @@ -1252,7 +1234,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>  
>  	/* Read and check the primary super block */
>  	zmd->sb[0].block = dmz_start_block(zmd, zmd->sb[0].zone);
> -	zmd->sb[0].dev = dmz_zone_to_dev(zmd, zmd->sb[0].zone);
> +	zmd->sb[0].dev = zmd->sb[0].zone->dev;
>  	ret = dmz_get_sb(zmd, &zmd->sb[0], 0);
>  	if (ret) {
>  		dmz_dev_err(zmd->sb[0].dev, "Read primary super block failed");
> @@ -1383,6 +1365,7 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>  
>  	INIT_LIST_HEAD(&zone->link);
>  	atomic_set(&zone->refcount, 0);
> +	zone->dev = dev;
>  	zone->id = idx;
>  	zone->chunk = DMZ_MAP_UNMAPPED;
>  
> @@ -1442,6 +1425,7 @@ static int dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
>  			return -EBUSY;
>  		INIT_LIST_HEAD(&zone->link);
>  		atomic_set(&zone->refcount, 0);
> +		zone->dev = dev;
>  		zone->id = idx;
>  		zone->chunk = DMZ_MAP_UNMAPPED;
>  		set_bit(DMZ_CACHE, &zone->flags);
> @@ -1567,11 +1551,10 @@ static int dmz_update_zone_cb(struct blk_zone *blkz, unsigned int idx,
>   */
>  static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>  {
> -	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);

If you keep this one and make it:

	struct dmz_dev *dev = zone->dev;

You can avoid all the changes below, and dereferencing the same pointer multiple
times.

>  	unsigned int noio_flag;
>  	int ret;
>  
> -	if (dev->flags & DMZ_BDEV_REGULAR)
> +	if (zone->dev->flags & DMZ_BDEV_REGULAR)
>  		return 0;
>  
>  	/*
> @@ -1581,16 +1564,16 @@ static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>  	 * GFP_NOIO was specified.
>  	 */
>  	noio_flag = memalloc_noio_save();
> -	ret = blkdev_report_zones(dev->bdev, dmz_start_sect(zmd, zone), 1,
> +	ret = blkdev_report_zones(zone->dev->bdev, dmz_start_sect(zmd, zone), 1,
>  				  dmz_update_zone_cb, zone);
>  	memalloc_noio_restore(noio_flag);
>  
>  	if (ret == 0)
>  		ret = -EIO;
>  	if (ret < 0) {
> -		dmz_dev_err(dev, "Get zone %u report failed",
> +		dmz_dev_err(zone->dev, "Get zone %u report failed",
>  			    zone->id);
> -		dmz_check_bdev(dev);
> +		dmz_check_bdev(zone->dev);
>  		return ret;
>  	}
>  
> @@ -1604,7 +1587,6 @@ static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>  static int dmz_handle_seq_write_err(struct dmz_metadata *zmd,
>  				    struct dm_zone *zone)
>  {
> -	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
>  	unsigned int wp = 0;
>  	int ret;
>  
> @@ -1613,7 +1595,8 @@ static int dmz_handle_seq_write_err(struct dmz_metadata *zmd,
>  	if (ret)
>  		return ret;
>  
> -	dmz_dev_warn(dev, "Processing zone %u write error (zone wp %u/%u)",
> +	dmz_dev_warn(zone->dev,
> +		     "Processing zone %u write error (zone wp %u/%u)",
>  		     zone->id, zone->wp_block, wp);
>  
>  	if (zone->wp_block < wp) {
> @@ -1641,13 +1624,11 @@ static int dmz_reset_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>  		return 0;
>  
>  	if (!dmz_is_empty(zone) || dmz_seq_write_err(zone)) {
> -		struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
> -
> -		ret = blkdev_zone_mgmt(dev->bdev, REQ_OP_ZONE_RESET,
> +		ret = blkdev_zone_mgmt(zone->dev->bdev, REQ_OP_ZONE_RESET,
>  				       dmz_start_sect(zmd, zone),
>  				       zmd->zone_nr_sectors, GFP_NOIO);
>  		if (ret) {
> -			dmz_dev_err(dev, "Reset zone %u failed %d",
> +			dmz_dev_err(zone->dev, "Reset zone %u failed %d",
>  				    zone->id, ret);
>  			return ret;
>  		}
> @@ -2201,9 +2182,7 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>  		goto again;
>  	}
>  	if (dmz_is_meta(zone)) {
> -		struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
> -
> -		dmz_dev_warn(dev, "Zone %u has metadata", zone->id);
> +		dmz_zmd_warn(zmd, "Zone %u has metadata", zone->id);
>  		zone = NULL;
>  		goto again;
>  	}

Same comment as above for all these changes.

> diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
> index 571bc1d41bab..d1a72b42dea2 100644
> --- a/drivers/md/dm-zoned-reclaim.c
> +++ b/drivers/md/dm-zoned-reclaim.c
> @@ -58,7 +58,6 @@ static int dmz_reclaim_align_wp(struct dmz_reclaim *zrc, struct dm_zone *zone,
>  				sector_t block)
>  {
>  	struct dmz_metadata *zmd = zrc->metadata;
> -	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
>  	sector_t wp_block = zone->wp_block;
>  	unsigned int nr_blocks;
>  	int ret;
> @@ -74,15 +73,15 @@ static int dmz_reclaim_align_wp(struct dmz_reclaim *zrc, struct dm_zone *zone,
>  	 * pointer and the requested position.
>  	 */
>  	nr_blocks = block - wp_block;
> -	ret = blkdev_issue_zeroout(dev->bdev,
> +	ret = blkdev_issue_zeroout(zone->dev->bdev,
>  				   dmz_start_sect(zmd, zone) + dmz_blk2sect(wp_block),
>  				   dmz_blk2sect(nr_blocks), GFP_NOIO, 0);
>  	if (ret) {
> -		dmz_dev_err(dev,
> +		dmz_dev_err(zone->dev,
>  			    "Align zone %u wp %llu to %llu (wp+%u) blocks failed %d",
>  			    zone->id, (unsigned long long)wp_block,
>  			    (unsigned long long)block, nr_blocks, ret);
> -		dmz_check_bdev(dev);
> +		dmz_check_bdev(zone->dev);
>  		return ret;
>  	}

Same again.

>  
> @@ -116,7 +115,6 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
>  			    struct dm_zone *src_zone, struct dm_zone *dst_zone)
>  {
>  	struct dmz_metadata *zmd = zrc->metadata;
> -	struct dmz_dev *src_dev, *dst_dev;
>  	struct dm_io_region src, dst;
>  	sector_t block = 0, end_block;
>  	sector_t nr_blocks;
> @@ -130,17 +128,15 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
>  	else
>  		end_block = dmz_zone_nr_blocks(zmd);
>  	src_zone_block = dmz_start_block(zmd, src_zone);
> -	src_dev = dmz_zone_to_dev(zmd, src_zone);
>  	dst_zone_block = dmz_start_block(zmd, dst_zone);
> -	dst_dev = dmz_zone_to_dev(zmd, dst_zone);
>  
>  	if (dmz_is_seq(dst_zone))
>  		set_bit(DM_KCOPYD_WRITE_SEQ, &flags);
>  
>  	while (block < end_block) {
> -		if (src_dev->flags & DMZ_BDEV_DYING)
> +		if (src_zone->dev->flags & DMZ_BDEV_DYING)
>  			return -EIO;
> -		if (dst_dev->flags & DMZ_BDEV_DYING)
> +		if (dst_zone->dev->flags & DMZ_BDEV_DYING)
>  			return -EIO;
>  
>  		if (dmz_reclaim_should_terminate(src_zone))
> @@ -163,11 +159,11 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
>  				return ret;
>  		}
>  
> -		src.bdev = src_dev->bdev;
> +		src.bdev = src_zone->dev->bdev;
>  		src.sector = dmz_blk2sect(src_zone_block + block);
>  		src.count = dmz_blk2sect(nr_blocks);
>  
> -		dst.bdev = dst_dev->bdev;
> +		dst.bdev = dst_zone->dev->bdev;
>  		dst.sector = dmz_blk2sect(dst_zone_block + block);
>  		dst.count = src.count;

And again the same here.

>  
> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
> index 2770e293a97b..bca9a611b8dd 100644
> --- a/drivers/md/dm-zoned-target.c
> +++ b/drivers/md/dm-zoned-target.c
> @@ -123,18 +123,17 @@ static int dmz_submit_bio(struct dmz_target *dmz, struct dm_zone *zone,
>  {
>  	struct dmz_bioctx *bioctx =
>  		dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
> -	struct dmz_dev *dev = dmz_zone_to_dev(dmz->metadata, zone);
>  	struct bio *clone;
>  
> -	if (dev->flags & DMZ_BDEV_DYING)
> +	if (zone->dev->flags & DMZ_BDEV_DYING)
>  		return -EIO;
>  
>  	clone = bio_clone_fast(bio, GFP_NOIO, &dmz->bio_set);
>  	if (!clone)
>  		return -ENOMEM;
>  
> -	bio_set_dev(clone, dev->bdev);
> -	bioctx->dev = dev;
> +	bio_set_dev(clone, zone->dev->bdev);
> +	bioctx->dev = zone->dev;
>  	clone->bi_iter.bi_sector =
>  		dmz_start_sect(dmz->metadata, zone) + dmz_blk2sect(chunk_block);
>  	clone->bi_iter.bi_size = dmz_blk2sect(nr_blocks) << SECTOR_SHIFT;

And here too. The patch would become much shorter :)

> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
> index 8083607b9535..356b436425e4 100644
> --- a/drivers/md/dm-zoned.h
> +++ b/drivers/md/dm-zoned.h
> @@ -80,6 +80,9 @@ struct dm_zone {
>  	/* For listing the zone depending on its state */
>  	struct list_head	link;
>  
> +	/* Device containing this zone */
> +	struct dmz_dev		*dev;
> +
>  	/* Zone type and state */
>  	unsigned long		flags;
>  
> @@ -188,7 +191,6 @@ const char *dmz_metadata_label(struct dmz_metadata *zmd);
>  sector_t dmz_start_sect(struct dmz_metadata *zmd, struct dm_zone *zone);
>  sector_t dmz_start_block(struct dmz_metadata *zmd, struct dm_zone *zone);
>  unsigned int dmz_nr_chunks(struct dmz_metadata *zmd);
> -struct dmz_dev *dmz_zone_to_dev(struct dmz_metadata *zmd, struct dm_zone *zone);
>  
>  bool dmz_check_dev(struct dmz_metadata *zmd);
>  bool dmz_dev_is_dying(struct dmz_metadata *zmd);
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 06/12] dm-zoned: add metadata pointer to struct dmz_dev
  2020-05-22 15:38 ` [PATCH 06/12] dm-zoned: add metadata pointer to struct dmz_dev Hannes Reinecke
@ 2020-05-25  2:17   ` Damien Le Moal
  0 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2020-05-25  2:17 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/23 0:39, Hannes Reinecke wrote:
> Add a metadata pointer to struct dmz_dev and use it as argument
> for blkdev_report_zones() instead of the metadata itself.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 14 +++++++++-----
>  drivers/md/dm-zoned.h          |  7 ++++---
>  2 files changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 7b6e7404f1e8..73479b4c8bca 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -1343,8 +1343,8 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>   */
>  static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>  {
> -	struct dmz_metadata *zmd = data;
> -	struct dmz_dev *dev = zmd->nr_devs > 1 ? &zmd->dev[1] : &zmd->dev[0];
> +	struct dmz_dev *dev = data;
> +	struct dmz_metadata *zmd = dev->metadata;
>  	int idx = num + dev->zone_offset;
>  	struct dm_zone *zone = kzalloc(sizeof(struct dm_zone), GFP_KERNEL);
>  
> @@ -1480,8 +1480,12 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
>  
>  	/* Allocate zone array */
>  	zmd->nr_zones = 0;
> -	for (i = 0; i < zmd->nr_devs; i++)
> -		zmd->nr_zones += zmd->dev[i].nr_zones;
> +	for (i = 0; i < zmd->nr_devs; i++) {
> +		struct dmz_dev *dev = &zmd->dev[i];
> +
> +		dev->metadata = zmd;
> +		zmd->nr_zones += dev->nr_zones;
> +	}
>  
>  	if (!zmd->nr_zones) {
>  		DMERR("(%s): No zones found", zmd->devname);
> @@ -1516,7 +1520,7 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
>  	 * first randomly writable zone.
>  	 */
>  	ret = blkdev_report_zones(zoned_dev->bdev, 0, BLK_ALL_ZONES,
> -				  dmz_init_zone, zmd);
> +				  dmz_init_zone, zoned_dev);
>  	if (ret < 0) {
>  		DMDEBUG("(%s): Failed to report zones, error %d",
>  			zmd->devname, ret);
> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
> index 356b436425e4..dab701893b67 100644
> --- a/drivers/md/dm-zoned.h
> +++ b/drivers/md/dm-zoned.h
> @@ -45,11 +45,15 @@
>  #define dmz_bio_block(bio)	dmz_sect2blk((bio)->bi_iter.bi_sector)
>  #define dmz_bio_blocks(bio)	dmz_sect2blk(bio_sectors(bio))
>  
> +struct dmz_metadata;
> +struct dmz_reclaim;
> +
>  /*
>   * Zoned block device information.
>   */
>  struct dmz_dev {
>  	struct block_device	*bdev;
> +	struct dmz_metadata	*metadata;
>  
>  	char			name[BDEVNAME_SIZE];
>  	uuid_t			uuid;
> @@ -168,9 +172,6 @@ enum {
>  #define dmz_dev_debug(dev, format, args...)	\
>  	DMDEBUG("(%s): " format, (dev)->name, ## args)
>  
> -struct dmz_metadata;
> -struct dmz_reclaim;
> -
>  /*
>   * Functions defined in dm-zoned-metadata.c
>   */
> 

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 07/12] dm-zoned: add a 'reserved' zone flag
  2020-05-22 15:38 ` [PATCH 07/12] dm-zoned: add a 'reserved' zone flag Hannes Reinecke
@ 2020-05-25  2:18   ` Damien Le Moal
  0 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2020-05-25  2:18 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/23 0:39, Hannes Reinecke wrote:
> Instead of counting the number of reserved zones in dmz_free_zone()
> we should mark the zone as 'reserved' during allocation and simplify
> dmz_free_zone().
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 4 ++--
>  drivers/md/dm-zoned.h          | 2 ++
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 73479b4c8bca..1b9da698a812 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -1783,6 +1783,7 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  			atomic_inc(&zmd->unmap_nr_rnd);
>  		} else if (atomic_read(&zmd->nr_reserved_seq_zones) < zmd->nr_reserved_seq) {
>  			list_add_tail(&dzone->link, &zmd->reserved_seq_zones_list);
> +			set_bit(DMZ_RESERVED, &dzone->flags);
>  			atomic_inc(&zmd->nr_reserved_seq_zones);
>  			zmd->nr_seq--;
>  		} else {
> @@ -2210,8 +2211,7 @@ void dmz_free_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>  	} else if (dmz_is_rnd(zone)) {
>  		list_add_tail(&zone->link, &zmd->unmap_rnd_list);
>  		atomic_inc(&zmd->unmap_nr_rnd);
> -	} else if (atomic_read(&zmd->nr_reserved_seq_zones) <
> -		   zmd->nr_reserved_seq) {
> +	} else if (dmz_is_reserved(zone)) {
>  		list_add_tail(&zone->link, &zmd->reserved_seq_zones_list);
>  		atomic_inc(&zmd->nr_reserved_seq_zones);
>  	} else {
> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
> index dab701893b67..983f5b5e9fa0 100644
> --- a/drivers/md/dm-zoned.h
> +++ b/drivers/md/dm-zoned.h
> @@ -130,6 +130,7 @@ enum {
>  	DMZ_META,
>  	DMZ_DATA,
>  	DMZ_BUF,
> +	DMZ_RESERVED,
>  
>  	/* Zone internal state */
>  	DMZ_RECLAIM,
> @@ -147,6 +148,7 @@ enum {
>  #define dmz_is_offline(z)	test_bit(DMZ_OFFLINE, &(z)->flags)
>  #define dmz_is_readonly(z)	test_bit(DMZ_READ_ONLY, &(z)->flags)
>  #define dmz_in_reclaim(z)	test_bit(DMZ_RECLAIM, &(z)->flags)
> +#define dmz_is_reserved(z)	test_bit(DMZ_RESERVED, &(z)->flags)
>  #define dmz_seq_write_err(z)	test_bit(DMZ_SEQ_WRITE_ERR, &(z)->flags)
>  #define dmz_reclaim_should_terminate(z) \
>  				test_bit(DMZ_RECLAIM_TERMINATE, &(z)->flags)
> 

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 08/12] dm-zoned: move random and sequential zones into struct dmz_dev
  2020-05-22 15:38 ` [PATCH 08/12] dm-zoned: move random and sequential zones into struct dmz_dev Hannes Reinecke
@ 2020-05-25  2:27   ` Damien Le Moal
  2020-05-25  7:47     ` Hannes Reinecke
  0 siblings, 1 reply; 34+ messages in thread
From: Damien Le Moal @ 2020-05-25  2:27 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/23 0:39, Hannes Reinecke wrote:
> Random and sequential zones should be part of the respective
> device structure to make arbitration between devices possible.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 143 +++++++++++++++++++++++++----------------
>  drivers/md/dm-zoned.h          |  10 +++
>  2 files changed, 99 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 1b9da698a812..5f44970a6187 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -192,21 +192,12 @@ struct dmz_metadata {
>  	/* Zone allocation management */
>  	struct mutex		map_lock;
>  	struct dmz_mblock	**map_mblk;
> -	unsigned int		nr_rnd;
> -	atomic_t		unmap_nr_rnd;
> -	struct list_head	unmap_rnd_list;
> -	struct list_head	map_rnd_list;
>  
>  	unsigned int		nr_cache;
>  	atomic_t		unmap_nr_cache;
>  	struct list_head	unmap_cache_list;
>  	struct list_head	map_cache_list;
>  
> -	unsigned int		nr_seq;
> -	atomic_t		unmap_nr_seq;
> -	struct list_head	unmap_seq_list;
> -	struct list_head	map_seq_list;
> -
>  	atomic_t		nr_reserved_seq_zones;
>  	struct list_head	reserved_seq_zones_list;
>  
> @@ -281,12 +272,22 @@ unsigned int dmz_nr_chunks(struct dmz_metadata *zmd)
>  
>  unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd)
>  {
> -	return zmd->nr_rnd;
> +	unsigned int nr_rnd_zones = 0;
> +	int i;
> +
> +	for (i = 0; i < zmd->nr_devs; i++)
> +		nr_rnd_zones += zmd->dev[i].nr_rnd;

We could keep the total nr_rnd_zones in dmz_metadata to avoid this one since the
value will never change at run time.

> +	return nr_rnd_zones;
>  }
>  
>  unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd)
>  {
> -	return atomic_read(&zmd->unmap_nr_rnd);
> +	unsigned int nr_unmap_rnd_zones = 0;
> +	int i;
> +
> +	for (i = 0; i < zmd->nr_devs; i++)
> +		nr_unmap_rnd_zones += atomic_read(&zmd->dev[i].unmap_nr_rnd);
> +	return nr_unmap_rnd_zones;
>  }
>  
>  unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd)
> @@ -301,12 +302,22 @@ unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd)
>  
>  unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd)
>  {
> -	return zmd->nr_seq;
> +	unsigned int nr_seq_zones = 0;
> +	int i;
> +
> +	for (i = 0; i < zmd->nr_devs; i++)
> +		nr_seq_zones += zmd->dev[i].nr_seq;

Same here. This value does not change at runtime.

> +	return nr_seq_zones;
>  }
>  
>  unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
>  {
> -	return atomic_read(&zmd->unmap_nr_seq);
> +	unsigned int nr_unmap_seq_zones = 0;
> +	int i;
> +
> +	for (i = 0; i < zmd->nr_devs; i++)
> +		nr_unmap_seq_zones += atomic_read(&zmd->dev[i].unmap_nr_seq);
> +	return nr_unmap_seq_zones;
>  }
>  
>  static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
> @@ -1485,6 +1496,14 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
>  
>  		dev->metadata = zmd;
>  		zmd->nr_zones += dev->nr_zones;
> +
> +		atomic_set(&dev->unmap_nr_rnd, 0);
> +		INIT_LIST_HEAD(&dev->unmap_rnd_list);
> +		INIT_LIST_HEAD(&dev->map_rnd_list);
> +
> +		atomic_set(&dev->unmap_nr_seq, 0);
> +		INIT_LIST_HEAD(&dev->unmap_seq_list);
> +		INIT_LIST_HEAD(&dev->map_seq_list);
>  	}
>  
>  	if (!zmd->nr_zones) {
> @@ -1702,9 +1721,9 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  		if (dmz_is_cache(dzone))
>  			list_add_tail(&dzone->link, &zmd->map_cache_list);
>  		else if (dmz_is_rnd(dzone))
> -			list_add_tail(&dzone->link, &zmd->map_rnd_list);
> +			list_add_tail(&dzone->link, &dzone->dev->map_rnd_list);
>  		else
> -			list_add_tail(&dzone->link, &zmd->map_seq_list);
> +			list_add_tail(&dzone->link, &dzone->dev->map_seq_list);
>  
>  		/* Check buffer zone */
>  		bzone_id = le32_to_cpu(dmap[e].bzone_id);
> @@ -1738,7 +1757,7 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  		if (dmz_is_cache(bzone))
>  			list_add_tail(&bzone->link, &zmd->map_cache_list);
>  		else
> -			list_add_tail(&bzone->link, &zmd->map_rnd_list);
> +			list_add_tail(&bzone->link, &bzone->dev->map_rnd_list);
>  next:
>  		chunk++;
>  		e++;
> @@ -1763,9 +1782,9 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  		if (dmz_is_cache(dzone))
>  			zmd->nr_cache++;
>  		else if (dmz_is_rnd(dzone))
> -			zmd->nr_rnd++;
> +			dzone->dev->nr_rnd++;
>  		else
> -			zmd->nr_seq++;
> +			dzone->dev->nr_seq++;
>  
>  		if (dmz_is_data(dzone)) {
>  			/* Already initialized */
> @@ -1779,16 +1798,18 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  			list_add_tail(&dzone->link, &zmd->unmap_cache_list);
>  			atomic_inc(&zmd->unmap_nr_cache);
>  		} else if (dmz_is_rnd(dzone)) {
> -			list_add_tail(&dzone->link, &zmd->unmap_rnd_list);
> -			atomic_inc(&zmd->unmap_nr_rnd);
> +			list_add_tail(&dzone->link,
> +				      &dzone->dev->unmap_rnd_list);
> +			atomic_inc(&dzone->dev->unmap_nr_rnd);
>  		} else if (atomic_read(&zmd->nr_reserved_seq_zones) < zmd->nr_reserved_seq) {
>  			list_add_tail(&dzone->link, &zmd->reserved_seq_zones_list);
>  			set_bit(DMZ_RESERVED, &dzone->flags);
>  			atomic_inc(&zmd->nr_reserved_seq_zones);
> -			zmd->nr_seq--;
> +			dzone->dev->nr_seq--;
>  		} else {
> -			list_add_tail(&dzone->link, &zmd->unmap_seq_list);
> -			atomic_inc(&zmd->unmap_nr_seq);
> +			list_add_tail(&dzone->link,
> +				      &dzone->dev->unmap_seq_list);
> +			atomic_inc(&dzone->dev->unmap_nr_seq);
>  		}
>  	}
>  
> @@ -1822,13 +1843,13 @@ static void __dmz_lru_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>  	list_del_init(&zone->link);
>  	if (dmz_is_seq(zone)) {
>  		/* LRU rotate sequential zone */
> -		list_add_tail(&zone->link, &zmd->map_seq_list);
> +		list_add_tail(&zone->link, &zone->dev->map_seq_list);
>  	} else if (dmz_is_cache(zone)) {
>  		/* LRU rotate cache zone */
>  		list_add_tail(&zone->link, &zmd->map_cache_list);
>  	} else {
>  		/* LRU rotate random zone */
> -		list_add_tail(&zone->link, &zmd->map_rnd_list);
> +		list_add_tail(&zone->link, &zone->dev->map_rnd_list);
>  	}
>  }
>  
> @@ -1910,14 +1931,24 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
>  {
>  	struct dm_zone *dzone = NULL;
>  	struct dm_zone *zone;
> -	struct list_head *zone_list = &zmd->map_rnd_list;
> +	struct list_head *zone_list;
>  
>  	/* If we have cache zones select from the cache zone list */
>  	if (zmd->nr_cache) {
>  		zone_list = &zmd->map_cache_list;
>  		/* Try to reclaim random zones, too, when idle */
> -		if (idle && list_empty(zone_list))
> -			zone_list = &zmd->map_rnd_list;
> +		if (idle && list_empty(zone_list)) {
> +			int i;
> +
> +			for (i = 1; i < zmd->nr_devs; i++) {
> +				zone_list = &zmd->dev[i].map_rnd_list;
> +				if (!list_empty(zone_list))
> +					break;
> +			}

This is going to use the first zoned dev until it has no more random zones, then
switch to the next zoned dev. What about going round-robin on the devices to
increase parallelism between the drives?


> +		}
> +	} else {
> +		/* Otherwise the random zones are on the first disk */
> +		zone_list = &zmd->dev[0].map_rnd_list;
>  	}
>  
>  	list_for_each_entry(zone, zone_list, link) {
> @@ -1938,12 +1969,17 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
>  static struct dm_zone *dmz_get_seq_zone_for_reclaim(struct dmz_metadata *zmd)
>  {
>  	struct dm_zone *zone;
> +	int i;
>  
> -	list_for_each_entry(zone, &zmd->map_seq_list, link) {
> -		if (!zone->bzone)
> -			continue;
> -		if (dmz_lock_zone_reclaim(zone))
> -			return zone;
> +	for (i = 0; i < zmd->nr_devs; i++) {
> +		struct dmz_dev *dev = &zmd->dev[i];
> +
> +		list_for_each_entry(zone, &dev->map_seq_list, link) {
> +			if (!zone->bzone)
> +				continue;
> +			if (dmz_lock_zone_reclaim(zone))
> +				return zone;
> +		}

Same comment here.

>  	}
>  
>  	return NULL;
> @@ -2129,7 +2165,7 @@ struct dm_zone *dmz_get_chunk_buffer(struct dmz_metadata *zmd,
>  	if (dmz_is_cache(bzone))
>  		list_add_tail(&bzone->link, &zmd->map_cache_list);
>  	else
> -		list_add_tail(&bzone->link, &zmd->map_rnd_list);
> +		list_add_tail(&bzone->link, &bzone->dev->map_rnd_list);
>  out:
>  	dmz_unlock_map(zmd);
>  
> @@ -2144,21 +2180,27 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>  {
>  	struct list_head *list;
>  	struct dm_zone *zone;
> +	unsigned int dev_idx = 0;
>  
> +again:
>  	if (flags & DMZ_ALLOC_CACHE)
>  		list = &zmd->unmap_cache_list;
>  	else if (flags & DMZ_ALLOC_RND)
> -		list = &zmd->unmap_rnd_list;
> +		list = &zmd->dev[dev_idx].unmap_rnd_list;
>  	else
> -		list = &zmd->unmap_seq_list;
> +		list = &zmd->dev[dev_idx].unmap_seq_list;
>  
> -again:
>  	if (list_empty(list)) {
>  	if (list_empty(list)) {
>  		/*
>  		 * No free zone: return NULL if this is not for reclaim.
>  		 */
>  		if (!(flags & DMZ_ALLOC_RECLAIM))
>  			return NULL;
> +		if (dev_idx < zmd->nr_devs) {
> +			dev_idx++;
> +			goto again;
> +		}
> +
>  		/*
>  		 * Fallback to the reserved sequential zones
>  		 */
> @@ -2177,9 +2219,9 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>  	if (dmz_is_cache(zone))
>  		atomic_dec(&zmd->unmap_nr_cache);
>  	else if (dmz_is_rnd(zone))
> -		atomic_dec(&zmd->unmap_nr_rnd);
> +		atomic_dec(&zone->dev->unmap_nr_rnd);
>  	else
> -		atomic_dec(&zmd->unmap_nr_seq);
> +		atomic_dec(&zone->dev->unmap_nr_seq);
>  
>  	if (dmz_is_offline(zone)) {
>  		dmz_zmd_warn(zmd, "Zone %u is offline", zone->id);
> @@ -2209,14 +2251,14 @@ void dmz_free_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>  		list_add_tail(&zone->link, &zmd->unmap_cache_list);
>  		atomic_inc(&zmd->unmap_nr_cache);
>  	} else if (dmz_is_rnd(zone)) {
> -		list_add_tail(&zone->link, &zmd->unmap_rnd_list);
> -		atomic_inc(&zmd->unmap_nr_rnd);
> +		list_add_tail(&zone->link, &zone->dev->unmap_rnd_list);
> +		atomic_inc(&zone->dev->unmap_nr_rnd);
>  	} else if (dmz_is_reserved(zone)) {
>  		list_add_tail(&zone->link, &zmd->reserved_seq_zones_list);
>  		atomic_inc(&zmd->nr_reserved_seq_zones);
>  	} else {
> -		list_add_tail(&zone->link, &zmd->unmap_seq_list);
> -		atomic_inc(&zmd->unmap_nr_seq);
> +		list_add_tail(&zone->link, &zone->dev->unmap_seq_list);
> +		atomic_inc(&zone->dev->unmap_nr_seq);
>  	}
>  
>  	wake_up_all(&zmd->free_wq);
> @@ -2236,9 +2278,9 @@ void dmz_map_zone(struct dmz_metadata *zmd, struct dm_zone *dzone,
>  	if (dmz_is_cache(dzone))
>  		list_add_tail(&dzone->link, &zmd->map_cache_list);
>  	else if (dmz_is_rnd(dzone))
> -		list_add_tail(&dzone->link, &zmd->map_rnd_list);
> +		list_add_tail(&dzone->link, &dzone->dev->map_rnd_list);
>  	else
> -		list_add_tail(&dzone->link, &zmd->map_seq_list);
> +		list_add_tail(&dzone->link, &dzone->dev->map_seq_list);
>  }
>  
>  /*
> @@ -2806,18 +2848,11 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>  	INIT_LIST_HEAD(&zmd->mblk_dirty_list);
>  
>  	mutex_init(&zmd->map_lock);
> -	atomic_set(&zmd->unmap_nr_rnd, 0);
> -	INIT_LIST_HEAD(&zmd->unmap_rnd_list);
> -	INIT_LIST_HEAD(&zmd->map_rnd_list);
>  
>  	atomic_set(&zmd->unmap_nr_cache, 0);
>  	INIT_LIST_HEAD(&zmd->unmap_cache_list);
>  	INIT_LIST_HEAD(&zmd->map_cache_list);
>  
> -	atomic_set(&zmd->unmap_nr_seq, 0);
> -	INIT_LIST_HEAD(&zmd->unmap_seq_list);
> -	INIT_LIST_HEAD(&zmd->map_seq_list);
> -
>  	atomic_set(&zmd->nr_reserved_seq_zones, 0);
>  	INIT_LIST_HEAD(&zmd->reserved_seq_zones_list);
>  
> @@ -2887,9 +2922,9 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>  	dmz_zmd_debug(zmd, "    %u cache zones (%u unmapped)",
>  		      zmd->nr_cache, atomic_read(&zmd->unmap_nr_cache));
>  	dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
> -		      zmd->nr_rnd, atomic_read(&zmd->unmap_nr_rnd));
> +		      dmz_nr_rnd_zones(zmd), dmz_nr_unmap_rnd_zones(zmd));
>  	dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
> -		      zmd->nr_seq, atomic_read(&zmd->unmap_nr_seq));
> +		      dmz_nr_seq_zones(zmd), dmz_nr_unmap_seq_zones(zmd));
>  	dmz_zmd_debug(zmd, "  %u reserved sequential data zones",
>  		      zmd->nr_reserved_seq);
>  	dmz_zmd_debug(zmd, "Format:");
> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
> index 983f5b5e9fa0..56e138586d9b 100644
> --- a/drivers/md/dm-zoned.h
> +++ b/drivers/md/dm-zoned.h
> @@ -66,6 +66,16 @@ struct dmz_dev {
>  	unsigned int		flags;
>  
>  	sector_t		zone_nr_sectors;
> +
> +	unsigned int		nr_rnd;
> +	atomic_t		unmap_nr_rnd;
> +	struct list_head	unmap_rnd_list;
> +	struct list_head	map_rnd_list;
> +
> +	unsigned int		nr_seq;
> +	atomic_t		unmap_nr_seq;
> +	struct list_head	unmap_seq_list;
> +	struct list_head	map_seq_list;
>  };
>  
>  #define dmz_bio_chunk(zmd, bio)	((bio)->bi_iter.bi_sector >> \
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/12] dm-zoned: improve logging messages for reclaim
  2020-05-22 15:38 ` [PATCH 09/12] dm-zoned: improve logging messages for reclaim Hannes Reinecke
@ 2020-05-25  2:28   ` Damien Le Moal
  0 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2020-05-25  2:28 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/23 0:39, Hannes Reinecke wrote:
> Instead of just reporting the errno, this patch adds more verbose
> debugging messages in the reclaim path.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-reclaim.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
> index d1a72b42dea2..fba0d48e38a7 100644
> --- a/drivers/md/dm-zoned-reclaim.c
> +++ b/drivers/md/dm-zoned-reclaim.c
> @@ -367,8 +367,11 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
>  
>  	/* Get a data zone */
>  	dzone = dmz_get_zone_for_reclaim(zmd, dmz_target_idle(zrc));
> -	if (!dzone)
> +	if (!dzone) {
> +		DMDEBUG("(%s): No zone found to reclaim",
> +			dmz_metadata_label(zmd));
>  		return -EBUSY;
> +	}
>  
>  	start = jiffies;
>  	if (dmz_is_cache(dzone) || dmz_is_rnd(dzone)) {
> @@ -412,6 +415,12 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
>  	}
>  out:
>  	if (ret) {
> +		if (ret == -EINTR)
> +			DMDEBUG("(%s): reclaim zone %u interrupted",
> +				dmz_metadata_label(zmd), rzone->id);
> +		else
> +			DMDEBUG("(%s): Failed to reclaim zone %u, err %d",
> +				dmz_metadata_label(zmd), rzone->id, ret);
>  		dmz_unlock_zone_reclaim(dzone);
>  		return ret;
>  	}
> @@ -515,8 +524,6 @@ static void dmz_reclaim_work(struct work_struct *work)
>  
>  	ret = dmz_do_reclaim(zrc);
>  	if (ret && ret != -EINTR) {
> -		DMDEBUG("(%s): Reclaim error %d",
> -			dmz_metadata_label(zmd), ret);
>  		if (!dmz_check_dev(zmd))
>  			return;
>  	}
> 

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 10/12] dm-zoned: support arbitrary number of devices
  2020-05-22 15:38 ` [PATCH 10/12] dm-zoned: support arbitrary number of devices Hannes Reinecke
@ 2020-05-25  2:36   ` Damien Le Moal
  2020-05-25  7:52     ` Hannes Reinecke
  0 siblings, 1 reply; 34+ messages in thread
From: Damien Le Moal @ 2020-05-25  2:36 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/23 0:39, Hannes Reinecke wrote:
> Remove the hard-coded limit of two devices and support an unlimited
> number of additional zoned devices.
> With that we need to increase the device-mapper version number to
> 3.0.0 as we've modified the interface.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c |  68 +++++++++++-----------
>  drivers/md/dm-zoned-reclaim.c  |  28 ++++++---
>  drivers/md/dm-zoned-target.c   | 129 +++++++++++++++++++++++++----------------
>  drivers/md/dm-zoned.h          |   9 +--
>  4 files changed, 139 insertions(+), 95 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 5f44970a6187..87784e7785bc 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -260,6 +260,11 @@ unsigned int dmz_zone_nr_sectors_shift(struct dmz_metadata *zmd)
>  	return zmd->zone_nr_sectors_shift;
>  }
>  
> +unsigned int dmz_nr_devs(struct dmz_metadata *zmd)
> +{
> +	return zmd->nr_devs;
> +}

Is this helper really needed?

> +
>  unsigned int dmz_nr_zones(struct dmz_metadata *zmd)
>  {
>  	return zmd->nr_zones;
> @@ -270,24 +275,14 @@ unsigned int dmz_nr_chunks(struct dmz_metadata *zmd)
>  	return zmd->nr_chunks;
>  }
>  
> -unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd)
> +unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd, int idx)
>  {
> -	unsigned int nr_rnd_zones = 0;
> -	int i;
> -
> -	for (i = 0; i < zmd->nr_devs; i++)
> -		nr_rnd_zones += zmd->dev[i].nr_rnd;
> -	return nr_rnd_zones;
> +	return zmd->dev[idx].nr_rnd;

AH. OK. So my comment on patch 8 is voided :)

>  }
>  
> -unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd)
> +unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd, int idx)
>  {
> -	unsigned int nr_unmap_rnd_zones = 0;
> -	int i;
> -
> -	for (i = 0; i < zmd->nr_devs; i++)
> -		nr_unmap_rnd_zones += atomic_read(&zmd->dev[i].unmap_nr_rnd);
> -	return nr_unmap_rnd_zones;
> +	return atomic_read(&zmd->dev[idx].unmap_nr_rnd);
>  }
>  
>  unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd)
> @@ -300,24 +295,14 @@ unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd)
>  	return atomic_read(&zmd->unmap_nr_cache);
>  }
>  
> -unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd)
> +unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd, int idx)
>  {
> -	unsigned int nr_seq_zones = 0;
> -	int i;
> -
> -	for (i = 0; i < zmd->nr_devs; i++)
> -		nr_seq_zones += zmd->dev[i].nr_seq;
> -	return nr_seq_zones;
> +	return zmd->dev[idx].nr_seq;
>  }
>  
> -unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
> +unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd, int idx)
>  {
> -	unsigned int nr_unmap_seq_zones = 0;
> -	int i;
> -
> -	for (i = 0; i < zmd->nr_devs; i++)
> -		nr_unmap_seq_zones += atomic_read(&zmd->dev[i].unmap_nr_seq);
> -	return nr_unmap_seq_zones;
> +	return atomic_read(&zmd->dev[idx].unmap_nr_seq);
>  }
>  
>  static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
> @@ -1530,7 +1515,20 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
>  		 */
>  		zmd->sb[0].zone = dmz_get(zmd, 0);
>  
> -		zoned_dev = &zmd->dev[1];
> +		for (i = 1; i < zmd->nr_devs; i++) {
> +			zoned_dev = &zmd->dev[i];
> +
> +			ret = blkdev_report_zones(zoned_dev->bdev, 0,
> +						  BLK_ALL_ZONES,
> +						  dmz_init_zone, zoned_dev);
> +			if (ret < 0) {
> +				DMDEBUG("(%s): Failed to report zones, error %d",
> +					zmd->devname, ret);
> +				dmz_drop_zones(zmd);
> +				return ret;
> +			}
> +		}
> +		return 0;
>  	}
>  
>  	/*
> @@ -2921,10 +2919,14 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>  		      zmd->nr_data_zones, zmd->nr_chunks);
>  	dmz_zmd_debug(zmd, "    %u cache zones (%u unmapped)",
>  		      zmd->nr_cache, atomic_read(&zmd->unmap_nr_cache));
> -	dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
> -		      dmz_nr_rnd_zones(zmd), dmz_nr_unmap_rnd_zones(zmd));
> -	dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
> -		      dmz_nr_seq_zones(zmd), dmz_nr_unmap_seq_zones(zmd));
> +	for (i = 0; i < zmd->nr_devs; i++) {
> +		dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
> +			      dmz_nr_rnd_zones(zmd, i),
> +			      dmz_nr_unmap_rnd_zones(zmd, i));
> +		dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
> +			      dmz_nr_seq_zones(zmd, i),
> +			      dmz_nr_unmap_seq_zones(zmd, i));
> +	}
>  	dmz_zmd_debug(zmd, "  %u reserved sequential data zones",
>  		      zmd->nr_reserved_seq);
>  	dmz_zmd_debug(zmd, "Format:");
> diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
> index fba0d48e38a7..f2e053b5f2db 100644
> --- a/drivers/md/dm-zoned-reclaim.c
> +++ b/drivers/md/dm-zoned-reclaim.c
> @@ -442,15 +442,18 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
>  {
>  	struct dmz_metadata *zmd = zrc->metadata;
>  	unsigned int nr_cache = dmz_nr_cache_zones(zmd);
> -	unsigned int nr_rnd = dmz_nr_rnd_zones(zmd);
> -	unsigned int nr_unmap, nr_zones;
> +	unsigned int nr_unmap = 0, nr_zones = 0;
>  
>  	if (nr_cache) {
>  		nr_zones = nr_cache;
>  		nr_unmap = dmz_nr_unmap_cache_zones(zmd);
>  	} else {
> -		nr_zones = nr_rnd;
> -		nr_unmap = dmz_nr_unmap_rnd_zones(zmd);
> +		int i;
> +
> +		for (i = 0; i < dmz_nr_devs(zmd); i++) {
> +			nr_zones += dmz_nr_rnd_zones(zmd, i);

Maybe not... We could keep running totals in zmd to avoid this loop.
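The idea of keeping totals in zmd is to update an aggregate counter at the same points the per-device counters change, so callers never need a summing loop. A toy model of the bookkeeping (field and function names are illustrative, not the actual dm-zoned members):

```c
#include <assert.h>

/*
 * Toy model: per-device random-zone counts plus an aggregate total in
 * the metadata. The aggregate is maintained alongside every per-device
 * update, so reads are O(1) instead of O(nr_devs).
 */
struct toy_dev { int nr_rnd; };

struct toy_zmd {
	struct toy_dev dev[4];
	int nr_devs;
	int total_rnd;	/* aggregate across all devices */
};

static void toy_add_rnd_zone(struct toy_zmd *zmd, int dev_idx)
{
	zmd->dev[dev_idx].nr_rnd++;
	zmd->total_rnd++;	/* keep aggregate in sync */
}

static void toy_del_rnd_zone(struct toy_zmd *zmd, int dev_idx)
{
	zmd->dev[dev_idx].nr_rnd--;
	zmd->total_rnd--;
}
```

The trade-off is one extra increment per update versus a per-device loop on every `dmz_reclaim_percentage()` call.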

> +			nr_unmap += dmz_nr_unmap_rnd_zones(zmd, i);
> +		}
>  	}
>  	return nr_unmap * 100 / nr_zones;
>  }
> @@ -460,7 +463,11 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
>   */
>  static bool dmz_should_reclaim(struct dmz_reclaim *zrc, unsigned int p_unmap)
>  {
> -	unsigned int nr_reclaim = dmz_nr_rnd_zones(zrc->metadata);
> +	int i;
> +	unsigned int nr_reclaim = 0;
> +
> +	for (i = 0; i < dmz_nr_devs(zrc->metadata); i++)
> +		nr_reclaim += dmz_nr_rnd_zones(zrc->metadata, i);
>  
>  	if (dmz_nr_cache_zones(zrc->metadata))
>  		nr_reclaim += dmz_nr_cache_zones(zrc->metadata);
> @@ -487,8 +494,8 @@ static void dmz_reclaim_work(struct work_struct *work)
>  {
>  	struct dmz_reclaim *zrc = container_of(work, struct dmz_reclaim, work.work);
>  	struct dmz_metadata *zmd = zrc->metadata;
> -	unsigned int p_unmap;
> -	int ret;
> +	unsigned int p_unmap, nr_unmap_rnd = 0, nr_rnd = 0;
> +	int ret, i;
>  
>  	if (dmz_dev_is_dying(zmd))
>  		return;
> @@ -513,14 +520,17 @@ static void dmz_reclaim_work(struct work_struct *work)
>  		zrc->kc_throttle.throttle = min(75U, 100U - p_unmap / 2);
>  	}
>  
> +	for (i = 0; i < dmz_nr_devs(zmd); i++) {
> +		nr_unmap_rnd += dmz_nr_unmap_rnd_zones(zmd, i);
> +		nr_rnd += dmz_nr_rnd_zones(zmd, i);
> +	}
>  	DMDEBUG("(%s): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
>  		dmz_metadata_label(zmd),
>  		zrc->kc_throttle.throttle,
>  		(dmz_target_idle(zrc) ? "Idle" : "Busy"),
>  		p_unmap, dmz_nr_unmap_cache_zones(zmd),
>  		dmz_nr_cache_zones(zmd),
> -		dmz_nr_unmap_rnd_zones(zmd),
> -		dmz_nr_rnd_zones(zmd));
> +		nr_unmap_rnd, nr_rnd);
>  
>  	ret = dmz_do_reclaim(zrc);
>  	if (ret && ret != -EINTR) {
> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
> index bca9a611b8dd..f34fcc3f7cc6 100644
> --- a/drivers/md/dm-zoned-target.c
> +++ b/drivers/md/dm-zoned-target.c
> @@ -13,8 +13,6 @@
>  
>  #define DMZ_MIN_BIOS		8192
>  
> -#define DMZ_MAX_DEVS		2
> -
>  /*
>   * Zone BIO context.
>   */
> @@ -40,9 +38,10 @@ struct dm_chunk_work {
>   * Target descriptor.
>   */
>  struct dmz_target {
> -	struct dm_dev		*ddev[DMZ_MAX_DEVS];
> +	struct dm_dev		**ddev;
> +	unsigned int		nr_ddevs;
>  
> -	unsigned long		flags;
> +	unsigned int		flags;
>  
>  	/* Zoned block device information */
>  	struct dmz_dev		*dev;
> @@ -764,7 +763,7 @@ static void dmz_put_zoned_device(struct dm_target *ti)
>  	struct dmz_target *dmz = ti->private;
>  	int i;
>  
> -	for (i = 0; i < DMZ_MAX_DEVS; i++) {
> +	for (i = 0; i < dmz->nr_ddevs; i++) {
>  		if (dmz->ddev[i]) {
>  			dm_put_device(ti, dmz->ddev[i]);
>  			dmz->ddev[i] = NULL;
> @@ -777,21 +776,35 @@ static int dmz_fixup_devices(struct dm_target *ti)
>  	struct dmz_target *dmz = ti->private;
>  	struct dmz_dev *reg_dev, *zoned_dev;
>  	struct request_queue *q;
> +	sector_t zone_nr_sectors = 0;
> +	int i;
>  
>  	/*
> -	 * When we have two devices, the first one must be a regular block
> -	 * device and the second a zoned block device.
> +	 * When we have more than one device, the first one must be a
> +	 * regular block device and the others zoned block devices.
>  	 */
> -	if (dmz->ddev[0] && dmz->ddev[1]) {
> +	if (dmz->nr_ddevs > 1) {
>  		reg_dev = &dmz->dev[0];
>  		if (!(reg_dev->flags & DMZ_BDEV_REGULAR)) {
>  			ti->error = "Primary disk is not a regular device";
>  			return -EINVAL;
>  		}
> -		zoned_dev = &dmz->dev[1];
> -		if (zoned_dev->flags & DMZ_BDEV_REGULAR) {
> -			ti->error = "Secondary disk is not a zoned device";
> -			return -EINVAL;
> +		for (i = 1; i < dmz->nr_ddevs; i++) {
> +			zoned_dev = &dmz->dev[i];
> +			if (zoned_dev->flags & DMZ_BDEV_REGULAR) {
> +				ti->error = "Secondary disk is not a zoned device";
> +				return -EINVAL;
> +			}
> +			q = bdev_get_queue(zoned_dev->bdev);

Maybe add a comment here noting that we must check that all zoned devices have
the same zone size?

> +			if (zone_nr_sectors &&
> +			    zone_nr_sectors != blk_queue_zone_sectors(q)) {
> +				ti->error = "Zone nr sectors mismatch";
> +				return -EINVAL;
> +			}
> +			zone_nr_sectors = blk_queue_zone_sectors(q);
> +			zoned_dev->zone_nr_sectors = zone_nr_sectors;
> +			zoned_dev->nr_zones =
> +				blkdev_nr_zones(zoned_dev->bdev->bd_disk);
>  		}
>  	} else {
>  		reg_dev = NULL;
> @@ -800,17 +813,24 @@ static int dmz_fixup_devices(struct dm_target *ti)
>  			ti->error = "Disk is not a zoned device";
>  			return -EINVAL;
>  		}
> +		q = bdev_get_queue(zoned_dev->bdev);
> +		zoned_dev->zone_nr_sectors = blk_queue_zone_sectors(q);
> +		zoned_dev->nr_zones = blkdev_nr_zones(zoned_dev->bdev->bd_disk);
>  	}
> -	q = bdev_get_queue(zoned_dev->bdev);
> -	zoned_dev->zone_nr_sectors = blk_queue_zone_sectors(q);
> -	zoned_dev->nr_zones = blkdev_nr_zones(zoned_dev->bdev->bd_disk);
>  
>  	if (reg_dev) {
> -		reg_dev->zone_nr_sectors = zoned_dev->zone_nr_sectors;
> +		sector_t zone_offset;
> +
> +		reg_dev->zone_nr_sectors = zone_nr_sectors;
>  		reg_dev->nr_zones =
>  			DIV_ROUND_UP_SECTOR_T(reg_dev->capacity,
>  					      reg_dev->zone_nr_sectors);
> -		zoned_dev->zone_offset = reg_dev->nr_zones;
> +		reg_dev->zone_offset = 0;
> +		zone_offset = reg_dev->nr_zones;
> +		for (i = 1; i < dmz->nr_ddevs; i++) {
> +			dmz->dev[i].zone_offset = zone_offset;
> +			zone_offset += dmz->dev[i].nr_zones;
> +		}
>  	}
>  	return 0;
>  }
> @@ -821,10 +841,10 @@ static int dmz_fixup_devices(struct dm_target *ti)
>  static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  {
>  	struct dmz_target *dmz;
> -	int ret;
> +	int ret, i;
>  
>  	/* Check arguments */
> -	if (argc < 1 || argc > 2) {
> +	if (argc < 1) {
>  		ti->error = "Invalid argument count";
>  		return -EINVAL;
>  	}
> @@ -835,31 +855,31 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  		ti->error = "Unable to allocate the zoned target descriptor";
>  		return -ENOMEM;
>  	}
> -	dmz->dev = kcalloc(2, sizeof(struct dmz_dev), GFP_KERNEL);
> +	dmz->dev = kcalloc(argc, sizeof(struct dmz_dev), GFP_KERNEL);
>  	if (!dmz->dev) {
>  		ti->error = "Unable to allocate the zoned device descriptors";
>  		kfree(dmz);
>  		return -ENOMEM;
>  	}
> +	dmz->ddev = kcalloc(argc, sizeof(struct dm_dev *), GFP_KERNEL);
> +	if (!dmz->ddev) {
> +		ti->error = "Unable to allocate the dm device descriptors";
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +	dmz->nr_ddevs = argc;
> +
>  	ti->private = dmz;
>  
>  	/* Get the target zoned block device */
> -	ret = dmz_get_zoned_device(ti, argv[0], 0, argc);
> -	if (ret)
> -		goto err;
> -
> -	if (argc == 2) {
> -		ret = dmz_get_zoned_device(ti, argv[1], 1, argc);
> -		if (ret) {
> -			dmz_put_zoned_device(ti);
> -			goto err;
> -		}
> +	for (i = 0; i < argc; i++) {
> +		ret = dmz_get_zoned_device(ti, argv[i], i, argc);
> +		if (ret)
> +			goto err_dev;
>  	}
>  	ret = dmz_fixup_devices(ti);
> -	if (ret) {
> -		dmz_put_zoned_device(ti);
> -		goto err;
> -	}
> +	if (ret)
> +		goto err_dev;
>  
>  	/* Initialize metadata */
>  	ret = dmz_ctr_metadata(dmz->dev, argc, &dmz->metadata,
> @@ -1047,13 +1067,13 @@ static int dmz_iterate_devices(struct dm_target *ti,
>  	struct dmz_target *dmz = ti->private;
>  	unsigned int zone_nr_sectors = dmz_zone_nr_sectors(dmz->metadata);
>  	sector_t capacity;
> -	int r;
> +	int i, r;
>  
> -	capacity = dmz->dev[0].capacity & ~(zone_nr_sectors - 1);
> -	r = fn(ti, dmz->ddev[0], 0, capacity, data);
> -	if (!r && dmz->ddev[1]) {
> -		capacity = dmz->dev[1].capacity & ~(zone_nr_sectors - 1);
> -		r = fn(ti, dmz->ddev[1], 0, capacity, data);
> +	for (i = 0; i < dmz->nr_ddevs; i++) {
> +		capacity = dmz->dev[i].capacity & ~(zone_nr_sectors - 1);
> +		r = fn(ti, dmz->ddev[i], 0, capacity, data);
> +		if (r)
> +			break;
>  	}
>  	return r;
>  }
> @@ -1066,24 +1086,35 @@ static void dmz_status(struct dm_target *ti, status_type_t type,
>  	ssize_t sz = 0;
>  	char buf[BDEVNAME_SIZE];
>  	struct dmz_dev *dev;
> +	int i;
>  
>  	switch (type) {
>  	case STATUSTYPE_INFO:
> -		DMEMIT("%u zones %u/%u cache %u/%u random %u/%u sequential",
> +		DMEMIT("%u zones %u/%u cache",
>  		       dmz_nr_zones(dmz->metadata),
>  		       dmz_nr_unmap_cache_zones(dmz->metadata),
> -		       dmz_nr_cache_zones(dmz->metadata),
> -		       dmz_nr_unmap_rnd_zones(dmz->metadata),
> -		       dmz_nr_rnd_zones(dmz->metadata),
> -		       dmz_nr_unmap_seq_zones(dmz->metadata),
> -		       dmz_nr_seq_zones(dmz->metadata));
> +		       dmz_nr_cache_zones(dmz->metadata));
> +		for (i = 0; i < dmz->nr_ddevs; i++) {
> +			/*
> +			 * For a multi-device setup the first device
> +			 * contains only cache zones.
> +			 */
> +			if ((i == 0) &&
> +			    (dmz_nr_cache_zones(dmz->metadata) > 0))
> +				continue;
> +			DMEMIT(" %u/%u random %u/%u sequential",
> +			       dmz_nr_unmap_rnd_zones(dmz->metadata, i),
> +			       dmz_nr_rnd_zones(dmz->metadata, i),
> +			       dmz_nr_unmap_seq_zones(dmz->metadata, i),
> +			       dmz_nr_seq_zones(dmz->metadata, i));
> +		}
>  		break;
>  	case STATUSTYPE_TABLE:
>  		dev = &dmz->dev[0];
>  		format_dev_t(buf, dev->bdev->bd_dev);
>  		DMEMIT("%s", buf);
> -		if (dmz->dev[1].bdev) {
> -			dev = &dmz->dev[1];
> +		for (i = 1; i < dmz->nr_ddevs; i++) {
> +			dev = &dmz->dev[i];
>  			format_dev_t(buf, dev->bdev->bd_dev);
>  			DMEMIT(" %s", buf);
>  		}
> @@ -1108,7 +1139,7 @@ static int dmz_message(struct dm_target *ti, unsigned int argc, char **argv,
>  
>  static struct target_type dmz_type = {
>  	.name		 = "zoned",
> -	.version	 = {2, 0, 0},
> +	.version	 = {3, 0, 0},
>  	.features	 = DM_TARGET_SINGLETON | DM_TARGET_ZONED_HM,
>  	.module		 = THIS_MODULE,
>  	.ctr		 = dmz_ctr,
> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
> index 56e138586d9b..0052eee12299 100644
> --- a/drivers/md/dm-zoned.h
> +++ b/drivers/md/dm-zoned.h
> @@ -219,13 +219,14 @@ void dmz_free_zone(struct dmz_metadata *zmd, struct dm_zone *zone);
>  void dmz_map_zone(struct dmz_metadata *zmd, struct dm_zone *zone,
>  		  unsigned int chunk);
>  void dmz_unmap_zone(struct dmz_metadata *zmd, struct dm_zone *zone);
> +unsigned int dmz_nr_devs(struct dmz_metadata *zmd);
>  unsigned int dmz_nr_zones(struct dmz_metadata *zmd);
>  unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd);
>  unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd);
> -unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd);
> -unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd);
> -unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd);
> -unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd);
> +unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd, int idx);
> +unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd, int idx);
> +unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd, int idx);
> +unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd, int idx);
>  unsigned int dmz_zone_nr_blocks(struct dmz_metadata *zmd);
>  unsigned int dmz_zone_nr_blocks_shift(struct dmz_metadata *zmd);
>  unsigned int dmz_zone_nr_sectors(struct dmz_metadata *zmd);
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 11/12] dm-zoned: round-robin load balancer for reclaiming zones
  2020-05-22 15:39 ` [PATCH 11/12] dm-zoned: round-robin load balancer for reclaiming zones Hannes Reinecke
@ 2020-05-25  2:42   ` Damien Le Moal
  2020-05-25  7:53     ` Hannes Reinecke
  0 siblings, 1 reply; 34+ messages in thread
From: Damien Le Moal @ 2020-05-25  2:42 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/23 0:39, Hannes Reinecke wrote:
> When reclaiming zones we should arbitrate between the zoned
> devices to get better throughput, so implement a simple
> round-robin load balancer between the zoned devices.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 87784e7785bc..25dcad2a565f 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -171,6 +171,8 @@ struct dmz_metadata {
>  	unsigned int		nr_reserved_seq;
>  	unsigned int		nr_chunks;
>  
> +	unsigned int		last_alloc_idx;
> +
>  	/* Zone information array */
>  	struct xarray		zones;
>  
> @@ -2178,7 +2180,7 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>  {
>  	struct list_head *list;
>  	struct dm_zone *zone;
> -	unsigned int dev_idx = 0;
> +	unsigned int dev_idx = zmd->last_alloc_idx;
>  
>  again:
>  	if (flags & DMZ_ALLOC_CACHE)
> @@ -2214,6 +2216,9 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>  	zone = list_first_entry(list, struct dm_zone, link);
>  	list_del_init(&zone->link);
>  
> +	if (!(flags & DMZ_ALLOC_CACHE))
> +		zmd->last_alloc_idx = (dev_idx + 1) % zmd->nr_devs;
> +
>  	if (dmz_is_cache(zone))
>  		atomic_dec(&zmd->unmap_nr_cache);
>  	else if (dmz_is_rnd(zone))
> @@ -2839,6 +2844,7 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>  	zmd->dev = dev;
>  	zmd->nr_devs = num_dev;
>  	zmd->mblk_rbtree = RB_ROOT;
> +	zmd->last_alloc_idx = 0;
>  	init_rwsem(&zmd->mblk_sem);
>  	mutex_init(&zmd->mblk_flush_lock);
>  	spin_lock_init(&zmd->mblk_lock);
> 


OK. So my comment on patch 8 is already addressed. Or at least partly... Where
is last_alloc_idx actually used? It looks like this only sets last_alloc_idx
but does not use that value on entry to dmz_alloc_zone() to allocate the zone.

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 12/12] dm-zoned: per-device reclaim
  2020-05-22 15:39 ` [PATCH 12/12] dm-zoned: per-device reclaim Hannes Reinecke
@ 2020-05-25  2:46   ` Damien Le Moal
  0 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2020-05-25  2:46 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/23 0:39, Hannes Reinecke wrote:
> Instead of having one reclaim workqueue for the entire set we should
> be allocating a reclaim workqueue per device; that will reduce
> contention and should boost performance for a multi-device setup.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-reclaim.c | 70 +++++++++++++++++++++----------------------
>  drivers/md/dm-zoned-target.c  | 36 +++++++++++++---------
>  drivers/md/dm-zoned.h         | 38 ++++++++++++-----------
>  3 files changed, 76 insertions(+), 68 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
> index f2e053b5f2db..6f3d8f18b989 100644
> --- a/drivers/md/dm-zoned-reclaim.c
> +++ b/drivers/md/dm-zoned-reclaim.c
> @@ -21,6 +21,8 @@ struct dmz_reclaim {
>  	struct dm_kcopyd_throttle kc_throttle;
>  	int			kc_err;
>  
> +	int			dev_idx;
> +
>  	unsigned long		flags;
>  
>  	/* Last target access time */
> @@ -197,8 +199,8 @@ static int dmz_reclaim_buf(struct dmz_reclaim *zrc, struct dm_zone *dzone)
>  	struct dmz_metadata *zmd = zrc->metadata;
>  	int ret;
>  
> -	DMDEBUG("(%s): Chunk %u, move buf zone %u (weight %u) to data zone %u (weight %u)",
> -		dmz_metadata_label(zmd),
> +	DMDEBUG("(%s/%u): Chunk %u, move buf zone %u (weight %u) to data zone %u (weight %u)",
> +		dmz_metadata_label(zmd), zrc->dev_idx,
>  		dzone->chunk, bzone->id, dmz_weight(bzone),
>  		dzone->id, dmz_weight(dzone));
>  
> @@ -236,8 +238,8 @@ static int dmz_reclaim_seq_data(struct dmz_reclaim *zrc, struct dm_zone *dzone)
>  	struct dmz_metadata *zmd = zrc->metadata;
>  	int ret = 0;
>  
> -	DMDEBUG("(%s): Chunk %u, move data zone %u (weight %u) to buf zone %u (weight %u)",
> -		dmz_metadata_label(zmd),
> +	DMDEBUG("(%s/%u): Chunk %u, move data zone %u (weight %u) to buf zone %u (weight %u)",
> +		dmz_metadata_label(zmd), zrc->dev_idx,
>  		chunk, dzone->id, dmz_weight(dzone),
>  		bzone->id, dmz_weight(bzone));
>  
> @@ -294,8 +296,8 @@ static int dmz_reclaim_rnd_data(struct dmz_reclaim *zrc, struct dm_zone *dzone)
>  	if (!szone)
>  		return -ENOSPC;
>  
> -	DMDEBUG("(%s): Chunk %u, move %s zone %u (weight %u) to %s zone %u",
> -		dmz_metadata_label(zmd), chunk,
> +	DMDEBUG("(%s/%u): Chunk %u, move %s zone %u (weight %u) to %s zone %u",
> +		dmz_metadata_label(zmd), zrc->dev_idx, chunk,
>  		dmz_is_cache(dzone) ? "cache" : "rnd",
>  		dzone->id, dmz_weight(dzone),
>  		dmz_is_rnd(szone) ? "rnd" : "seq", szone->id);
> @@ -368,8 +370,8 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
>  	/* Get a data zone */
>  	dzone = dmz_get_zone_for_reclaim(zmd, dmz_target_idle(zrc));
>  	if (!dzone) {
> -		DMDEBUG("(%s): No zone found to reclaim",
> -			dmz_metadata_label(zmd));
> +		DMDEBUG("(%s/%u): No zone found to reclaim",
> +			dmz_metadata_label(zmd), zrc->dev_idx);
>  		return -EBUSY;
>  	}
>  
> @@ -416,24 +418,26 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
>  out:
>  	if (ret) {
>  		if (ret == -EINTR)
> -			DMDEBUG("(%s): reclaim zone %u interrupted",
> -				dmz_metadata_label(zmd), rzone->id);
> +			DMDEBUG("(%s/%u): reclaim zone %u interrupted",
> +				dmz_metadata_label(zmd), zrc->dev_idx,
> +				rzone->id);
>  		else
> -			DMDEBUG("(%s): Failed to reclaim zone %u, err %d",
> -				dmz_metadata_label(zmd), rzone->id, ret);
> +			DMDEBUG("(%s/%u): Failed to reclaim zone %u, err %d",
> +				dmz_metadata_label(zmd), zrc->dev_idx,
> +				rzone->id, ret);
>  		dmz_unlock_zone_reclaim(dzone);
>  		return ret;
>  	}
>  
>  	ret = dmz_flush_metadata(zrc->metadata);
>  	if (ret) {
> -		DMDEBUG("(%s): Metadata flush for zone %u failed, err %d",
> -			dmz_metadata_label(zmd), rzone->id, ret);
> +		DMDEBUG("(%s/%u): Metadata flush for zone %u failed, err %d",
> +			dmz_metadata_label(zmd), zrc->dev_idx, rzone->id, ret);
>  		return ret;
>  	}
>  
> -	DMDEBUG("(%s): Reclaimed zone %u in %u ms",
> -		dmz_metadata_label(zmd),
> +	DMDEBUG("(%s/%u): Reclaimed zone %u in %u ms",
> +		dmz_metadata_label(zmd), zrc->dev_idx,
>  		rzone->id, jiffies_to_msecs(jiffies - start));
>  	return 0;
>  }
> @@ -448,12 +452,8 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
>  		nr_zones = nr_cache;
>  		nr_unmap = dmz_nr_unmap_cache_zones(zmd);
>  	} else {
> -		int i;
> -
> -		for (i = 0; i < dmz_nr_devs(zmd); i++) {
> -			nr_zones += dmz_nr_rnd_zones(zmd, i);
> -			nr_unmap += dmz_nr_unmap_rnd_zones(zmd, i);
> -		}
> +		nr_zones = dmz_nr_rnd_zones(zmd, zrc->dev_idx);
> +		nr_unmap = dmz_nr_unmap_rnd_zones(zmd, zrc->dev_idx);
>  	}
>  	return nr_unmap * 100 / nr_zones;
>  }
> @@ -463,11 +463,9 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
>   */
>  static bool dmz_should_reclaim(struct dmz_reclaim *zrc, unsigned int p_unmap)
>  {
> -	int i;
> -	unsigned int nr_reclaim = 0;
> +	unsigned int nr_reclaim;
>  
> -	for (i = 0; i < dmz_nr_devs(zrc->metadata); i++)
> -		nr_reclaim += dmz_nr_rnd_zones(zrc->metadata, i);
> +	nr_reclaim = dmz_nr_rnd_zones(zrc->metadata, zrc->dev_idx);
>  
>  	if (dmz_nr_cache_zones(zrc->metadata))
>  		nr_reclaim += dmz_nr_cache_zones(zrc->metadata);
> @@ -495,7 +493,7 @@ static void dmz_reclaim_work(struct work_struct *work)
>  	struct dmz_reclaim *zrc = container_of(work, struct dmz_reclaim, work.work);
>  	struct dmz_metadata *zmd = zrc->metadata;
>  	unsigned int p_unmap, nr_unmap_rnd = 0, nr_rnd = 0;
> -	int ret, i;
> +	int ret;
>  
>  	if (dmz_dev_is_dying(zmd))
>  		return;
> @@ -520,12 +518,11 @@ static void dmz_reclaim_work(struct work_struct *work)
>  		zrc->kc_throttle.throttle = min(75U, 100U - p_unmap / 2);
>  	}
>  
> -	for (i = 0; i < dmz_nr_devs(zmd); i++) {
> -		nr_unmap_rnd += dmz_nr_unmap_rnd_zones(zmd, i);
> -		nr_rnd += dmz_nr_rnd_zones(zmd, i);
> -	}
> -	DMDEBUG("(%s): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
> -		dmz_metadata_label(zmd),
> +	nr_unmap_rnd = dmz_nr_unmap_rnd_zones(zmd, zrc->dev_idx);
> +	nr_rnd = dmz_nr_rnd_zones(zmd, zrc->dev_idx);
> +
> +	DMDEBUG("(%s/%u): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
> +		dmz_metadata_label(zmd), zrc->dev_idx,
>  		zrc->kc_throttle.throttle,
>  		(dmz_target_idle(zrc) ? "Idle" : "Busy"),
>  		p_unmap, dmz_nr_unmap_cache_zones(zmd),
> @@ -545,7 +542,7 @@ static void dmz_reclaim_work(struct work_struct *work)
>   * Initialize reclaim.
>   */
>  int dmz_ctr_reclaim(struct dmz_metadata *zmd,
> -		    struct dmz_reclaim **reclaim)
> +		    struct dmz_reclaim **reclaim, int idx)
>  {
>  	struct dmz_reclaim *zrc;
>  	int ret;
> @@ -556,6 +553,7 @@ int dmz_ctr_reclaim(struct dmz_metadata *zmd,
>  
>  	zrc->metadata = zmd;
>  	zrc->atime = jiffies;
> +	zrc->dev_idx = idx;
>  
>  	/* Reclaim kcopyd client */
>  	zrc->kc = dm_kcopyd_client_create(&zrc->kc_throttle);
> @@ -567,8 +565,8 @@ int dmz_ctr_reclaim(struct dmz_metadata *zmd,
>  
>  	/* Reclaim work */
>  	INIT_DELAYED_WORK(&zrc->work, dmz_reclaim_work);
> -	zrc->wq = alloc_ordered_workqueue("dmz_rwq_%s", WQ_MEM_RECLAIM,
> -					  dmz_metadata_label(zmd));
> +	zrc->wq = alloc_ordered_workqueue("dmz_rwq_%s_%d", WQ_MEM_RECLAIM,
> +					  dmz_metadata_label(zmd), idx);
>  	if (!zrc->wq) {
>  		ret = -ENOMEM;
>  		goto err;
> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
> index f34fcc3f7cc6..a33c26a6ab31 100644
> --- a/drivers/md/dm-zoned-target.c
> +++ b/drivers/md/dm-zoned-target.c
> @@ -49,9 +49,6 @@ struct dmz_target {
>  	/* For metadata handling */
>  	struct dmz_metadata     *metadata;
>  
> -	/* For reclaim */
> -	struct dmz_reclaim	*reclaim;
> -
>  	/* For chunk work */
>  	struct radix_tree_root	chunk_rxtree;
>  	struct workqueue_struct *chunk_wq;
> @@ -402,14 +399,15 @@ static void dmz_handle_bio(struct dmz_target *dmz, struct dm_chunk_work *cw,
>  		dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
>  	struct dmz_metadata *zmd = dmz->metadata;
>  	struct dm_zone *zone;
> -	int ret;
> +	int i, ret;
>  
>  	/*
>  	 * Write may trigger a zone allocation. So make sure the
>  	 * allocation can succeed.
>  	 */
>  	if (bio_op(bio) == REQ_OP_WRITE)
> -		dmz_schedule_reclaim(dmz->reclaim);
> +		for (i = 0; i < dmz->nr_ddevs; i++)
> +			dmz_schedule_reclaim(dmz->dev[i].reclaim);
>  
>  	dmz_lock_metadata(zmd);
>  
> @@ -575,7 +573,6 @@ static int dmz_queue_chunk_work(struct dmz_target *dmz, struct bio *bio)
>  
>  	bio_list_add(&cw->bio_list, bio);
>  
> -	dmz_reclaim_bio_acc(dmz->reclaim);
>  	if (queue_work(dmz->chunk_wq, &cw->work))
>  		dmz_get_chunk_work(cw);
>  out:
> @@ -935,10 +932,12 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	mod_delayed_work(dmz->flush_wq, &dmz->flush_work, DMZ_FLUSH_PERIOD);
>  
>  	/* Initialize reclaim */
> -	ret = dmz_ctr_reclaim(dmz->metadata, &dmz->reclaim);
> -	if (ret) {
> -		ti->error = "Zone reclaim initialization failed";
> -		goto err_fwq;
> +	for (i = 0; i < argc; i++) {
> +		ret = dmz_ctr_reclaim(dmz->metadata, &dmz->dev[i].reclaim, i);
> +		if (ret) {
> +			ti->error = "Zone reclaim initialization failed";
> +			goto err_fwq;
> +		}
>  	}
>  
>  	DMINFO("(%s): Target device: %llu 512-byte logical sectors (%llu blocks)",
> @@ -971,11 +970,13 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  static void dmz_dtr(struct dm_target *ti)
>  {
>  	struct dmz_target *dmz = ti->private;
> +	int i;
>  
>  	flush_workqueue(dmz->chunk_wq);
>  	destroy_workqueue(dmz->chunk_wq);
>  
> -	dmz_dtr_reclaim(dmz->reclaim);
> +	for (i = 0; i < dmz_nr_devs(dmz->metadata); i++)
> +		dmz_dtr_reclaim(dmz->dev[i].reclaim);
>  
>  	cancel_delayed_work_sync(&dmz->flush_work);
>  	destroy_workqueue(dmz->flush_wq);
> @@ -1044,9 +1045,11 @@ static int dmz_prepare_ioctl(struct dm_target *ti, struct block_device **bdev)
>  static void dmz_suspend(struct dm_target *ti)
>  {
>  	struct dmz_target *dmz = ti->private;
> +	int i;
>  
>  	flush_workqueue(dmz->chunk_wq);
> -	dmz_suspend_reclaim(dmz->reclaim);
> +	for (i = 0; i < dmz->nr_ddevs; i++)
> +		dmz_suspend_reclaim(dmz->dev[i].reclaim);
>  	cancel_delayed_work_sync(&dmz->flush_work);
>  }
>  
> @@ -1056,9 +1059,11 @@ static void dmz_suspend(struct dm_target *ti)
>  static void dmz_resume(struct dm_target *ti)
>  {
>  	struct dmz_target *dmz = ti->private;
> +	int i;
>  
>  	queue_delayed_work(dmz->flush_wq, &dmz->flush_work, DMZ_FLUSH_PERIOD);
> -	dmz_resume_reclaim(dmz->reclaim);
> +	for (i = 0; i < dmz->nr_ddevs; i++)
> +		dmz_resume_reclaim(dmz->dev[i].reclaim);
>  }
>  
>  static int dmz_iterate_devices(struct dm_target *ti,
> @@ -1130,7 +1135,10 @@ static int dmz_message(struct dm_target *ti, unsigned int argc, char **argv,
>  	int r = -EINVAL;
>  
>  	if (!strcasecmp(argv[0], "reclaim")) {
> -		dmz_schedule_reclaim(dmz->reclaim);
> +		int i;
> +
> +		for (i = 0; i < dmz->nr_ddevs; i++)
> +			dmz_schedule_reclaim(dmz->dev[i].reclaim);
>  		r = 0;
>  	} else
>  		DMERR("unrecognized message %s", argv[0]);
> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
> index 0052eee12299..1ee91a3a4076 100644
> --- a/drivers/md/dm-zoned.h
> +++ b/drivers/md/dm-zoned.h
> @@ -54,6 +54,7 @@ struct dmz_reclaim;
>  struct dmz_dev {
>  	struct block_device	*bdev;
>  	struct dmz_metadata	*metadata;
> +	struct dmz_reclaim	*reclaim;
>  
>  	char			name[BDEVNAME_SIZE];
>  	uuid_t			uuid;
> @@ -240,23 +241,6 @@ static inline void dmz_activate_zone(struct dm_zone *zone)
>  	atomic_inc(&zone->refcount);
>  }
>  
> -/*
> - * Deactivate a zone. This decrement the zone reference counter
> - * indicating that all BIOs to the zone have completed when the count is 0.
> - */
> -static inline void dmz_deactivate_zone(struct dm_zone *zone)
> -{
> -	atomic_dec(&zone->refcount);
> -}
> -
> -/*
> - * Test if a zone is active, that is, has a refcount > 0.
> - */
> -static inline bool dmz_is_active(struct dm_zone *zone)
> -{
> -	return atomic_read(&zone->refcount);
> -}
> -
>  int dmz_lock_zone_reclaim(struct dm_zone *zone);
>  void dmz_unlock_zone_reclaim(struct dm_zone *zone);
>  struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd, bool idle);
> @@ -283,7 +267,7 @@ int dmz_merge_valid_blocks(struct dmz_metadata *zmd, struct dm_zone *from_zone,
>  /*
>   * Functions defined in dm-zoned-reclaim.c
>   */
> -int dmz_ctr_reclaim(struct dmz_metadata *zmd, struct dmz_reclaim **zrc);
> +int dmz_ctr_reclaim(struct dmz_metadata *zmd, struct dmz_reclaim **zrc, int idx);
>  void dmz_dtr_reclaim(struct dmz_reclaim *zrc);
>  void dmz_suspend_reclaim(struct dmz_reclaim *zrc);
>  void dmz_resume_reclaim(struct dmz_reclaim *zrc);
> @@ -296,4 +280,22 @@ void dmz_schedule_reclaim(struct dmz_reclaim *zrc);
>  bool dmz_bdev_is_dying(struct dmz_dev *dmz_dev);
>  bool dmz_check_bdev(struct dmz_dev *dmz_dev);
>  
> +/*
> + * Deactivate a zone. This decrement the zone reference counter
> + * indicating that all BIOs to the zone have completed when the count is 0.
> + */
> +static inline void dmz_deactivate_zone(struct dm_zone *zone)
> +{
> +	dmz_reclaim_bio_acc(zone->dev->reclaim);
> +	atomic_dec(&zone->refcount);
> +}
> +
> +/*
> + * Test if a zone is active, that is, has a refcount > 0.
> + */
> +static inline bool dmz_is_active(struct dm_zone *zone)
> +{
> +	return atomic_read(&zone->refcount);
> +}
> +
>  #endif /* DM_ZONED_H */
> 

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 02/12] dm-zoned: convert to xarray
  2020-05-25  2:01   ` Damien Le Moal
@ 2020-05-25  7:40     ` Hannes Reinecke
  0 siblings, 0 replies; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-25  7:40 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: dm-devel, Mike Snitzer

On 5/25/20 4:01 AM, Damien Le Moal wrote:
> On 2020/05/23 0:39, Hannes Reinecke wrote:
>> The zones array is getting really large, and large arrays
>> tend to wreak havoc with the caches.
> 
> s/caches/CPU cache, maybe?
> 
>> So convert it to xarray to become more cache friendly.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/md/dm-zoned-metadata.c | 98 +++++++++++++++++++++++++++++++-----------
>>   1 file changed, 73 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index b0d3ed4ac56a..3da6702bb1ae 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
>> @@ -172,7 +172,7 @@ struct dmz_metadata {
>>   	unsigned int		nr_chunks;
>>   
>>   	/* Zone information array */
>> -	struct dm_zone		*zones;
>> +	struct xarray		zones;
>>   
>>   	struct dmz_sb		sb[3];
>>   	unsigned int		mblk_primary;
>> @@ -327,6 +327,11 @@ unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
>>   	return atomic_read(&zmd->unmap_nr_seq);
>>   }
>>   
>> +static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
>> +{
>> +	return xa_load(&zmd->zones, zone_id);
>> +}
>> +
>>   const char *dmz_metadata_label(struct dmz_metadata *zmd)
>>   {
>>   	return (const char *)zmd->label;
>> @@ -1121,6 +1126,7 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>>   {
>>   	unsigned int zone_nr_blocks = zmd->zone_nr_blocks;
>>   	struct dmz_mblock *mblk;
>> +	unsigned int zone_id = zmd->sb[0].zone->id;
>>   	int i;
>>   
>>   	/* Allocate a block */
>> @@ -1133,17 +1139,16 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>>   
>>   	/* Bad first super block: search for the second one */
>>   	zmd->sb[1].block = zmd->sb[0].block + zone_nr_blocks;
>> -	zmd->sb[1].zone = zmd->sb[0].zone + 1;
>> +	zmd->sb[1].zone = xa_load(&zmd->zones, zone_id + 1);
>>   	zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
>> -	for (i = 0; i < zmd->nr_rnd_zones - 1; i++) {
>> +	for (i = 1; i < zmd->nr_rnd_zones; i++) {
>>   		if (dmz_read_sb(zmd, 1) != 0)
>>   			break;
>> -		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC) {
>> -			zmd->sb[1].zone += i;
>> +		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC)
>>   			return 0;
>> -		}
>>   		zmd->sb[1].block += zone_nr_blocks;
>> -		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone + i);
>> +		zmd->sb[1].zone = dmz_get(zmd, zone_id + i);
>> +		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
>>   	}
>>   
>>   	dmz_free_mblock(zmd, mblk);
>> @@ -1259,8 +1264,12 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>>   	/* Read and check secondary super block */
>>   	if (ret == 0) {
>>   		sb_good[0] = true;
>> -		if (!zmd->sb[1].zone)
>> -			zmd->sb[1].zone = zmd->sb[0].zone + zmd->nr_meta_zones;
>> +		if (!zmd->sb[1].zone) {
>> +			unsigned int zone_id =
>> +				zmd->sb[0].zone->id + zmd->nr_meta_zones;
>> +
>> +			zmd->sb[1].zone = dmz_get(zmd, zone_id);
>> +		}
>>   		zmd->sb[1].block = dmz_start_block(zmd, zmd->sb[1].zone);
>>   		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
>>   		ret = dmz_get_sb(zmd, 1);
>> @@ -1341,7 +1350,12 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>>   	struct dmz_metadata *zmd = data;
>>   	struct dmz_dev *dev = zmd->nr_devs > 1 ? &zmd->dev[1] : &zmd->dev[0];
>>   	int idx = num + dev->zone_offset;
>> -	struct dm_zone *zone = &zmd->zones[idx];
>> +	struct dm_zone *zone = kzalloc(sizeof(struct dm_zone), GFP_KERNEL);
>> +
>> +	if (!zone)
>> +		return -ENOMEM;
>> +	if (xa_insert(&zmd->zones, idx, zone, GFP_KERNEL))
>> +		return -EBUSY;
>>   
>>   	if (blkz->len != zmd->zone_nr_sectors) {
>>   		if (zmd->sb_version > 1) {
>> @@ -1397,14 +1411,18 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>>   	return 0;
>>   }
>>   
>> -static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
>> +static int dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
>>   {
>>   	int idx;
>>   	sector_t zone_offset = 0;
>>   
>>   	for(idx = 0; idx < dev->nr_zones; idx++) {
>> -		struct dm_zone *zone = &zmd->zones[idx];
>> -
>> +		struct dm_zone *zone =
>> +			kzalloc(sizeof(struct dm_zone), GFP_KERNEL);
>> +		if (!zone)
>> +			return -ENOMEM;
>> +		if (xa_insert(&zmd->zones, idx, zone, GFP_KERNEL) < 0)
>> +			return -EBUSY;
> 
> Same change as in dmz_init_zone(). Make this hunk a helper?
> 
>>   		INIT_LIST_HEAD(&zone->link);
>>   		atomic_set(&zone->refcount, 0);
>>   		zone->id = idx;
> 
> And we can add this inside the helper too.
> 

Okay, will be doing so.
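
Something along these lines, maybe (untested sketch; the helper name and
the exact error handling are provisional):

```c
/*
 * Sketch of a common helper for dmz_init_zone() and dmz_emulate_zones():
 * allocate the zone, insert it into the xarray, and do the shared field
 * initialization in one place.  Frees the zone again if xa_insert()
 * fails so the caller cannot leak it.
 */
static struct dm_zone *dmz_insert(struct dmz_metadata *zmd,
				  unsigned int zone_id)
{
	struct dm_zone *zone = kzalloc(sizeof(struct dm_zone), GFP_KERNEL);

	if (!zone)
		return ERR_PTR(-ENOMEM);

	if (xa_insert(&zmd->zones, zone_id, zone, GFP_KERNEL)) {
		kfree(zone);
		return ERR_PTR(-EBUSY);
	}

	INIT_LIST_HEAD(&zone->link);
	atomic_set(&zone->refcount, 0);
	zone->id = zone_id;
	zone->chunk = DMZ_MAP_UNMAPPED;

	return zone;
}
```

Then both call sites shrink to a single dmz_insert() call plus an
IS_ERR() check.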

>> @@ -1420,6 +1438,7 @@ static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
>>   		}
>>   		zone_offset += zmd->zone_nr_sectors;
>>   	}
>> +	return 0;
>>   }
>>   
>>   /*
>> @@ -1427,8 +1446,15 @@ static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
>>    */
>>   static void dmz_drop_zones(struct dmz_metadata *zmd)
>>   {
>> -	kfree(zmd->zones);
>> -	zmd->zones = NULL;
>> +	int idx;
>> +
>> +	for(idx = 0; idx < zmd->nr_zones; idx++) {
>> +		struct dm_zone *zone = xa_load(&zmd->zones, idx);
>> +
>> +		kfree(zone);
>> +		xa_erase(&zmd->zones, idx);
>> +	}
>> +	xa_destroy(&zmd->zones);
>>   }
>>   
>>   /*
>> @@ -1460,20 +1486,25 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
>>   		DMERR("(%s): No zones found", zmd->devname);
>>   		return -ENXIO;
>>   	}
>> -	zmd->zones = kcalloc(zmd->nr_zones, sizeof(struct dm_zone), GFP_KERNEL);
>> -	if (!zmd->zones)
>> -		return -ENOMEM;
>> +	xa_init(&zmd->zones);
>>   
>>   	DMDEBUG("(%s): Using %zu B for zone information",
>>   		zmd->devname, sizeof(struct dm_zone) * zmd->nr_zones);
>>   
>>   	if (zmd->nr_devs > 1) {
>> -		dmz_emulate_zones(zmd, &zmd->dev[0]);
>> +		ret = dmz_emulate_zones(zmd, &zmd->dev[0]);
>> +		if (ret < 0) {
>> +			DMDEBUG("(%s): Failed to emulate zones, error %d",
>> +				zmd->devname, ret);
>> +			dmz_drop_zones(zmd);
>> +			return ret;
>> +		}
>> +
>>   		/*
>>   		 * Primary superblock zone is always at zone 0 when multiple
>>   		 * drives are present.
>>   		 */
>> -		zmd->sb[0].zone = &zmd->zones[0];
>> +		zmd->sb[0].zone = dmz_get(zmd, 0);
>>   
>>   		zoned_dev = &zmd->dev[1];
>>   	}
>> @@ -1576,11 +1607,6 @@ static int dmz_handle_seq_write_err(struct dmz_metadata *zmd,
>>   	return 0;
>>   }
>>   
>> -static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
>> -{
>> -	return &zmd->zones[zone_id];
>> -}
>> -
>>   /*
>>    * Reset a zone write pointer.
>>    */
>> @@ -1662,6 +1688,11 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>>   		}
>>   
>>   		dzone = dmz_get(zmd, dzone_id);
>> +		if (!dzone) {
>> +			dmz_zmd_err(zmd, "Chunk %u mapping: data zone %u not present",
>> +				    chunk, dzone_id);
>> +			return -EIO;
>> +		}
>>   		set_bit(DMZ_DATA, &dzone->flags);
>>   		dzone->chunk = chunk;
>>   		dmz_get_zone_weight(zmd, dzone);
>> @@ -1685,6 +1716,11 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>>   		}
>>   
>>   		bzone = dmz_get(zmd, bzone_id);
>> +		if (!bzone) {
>> +			dmz_zmd_err(zmd, "Chunk %u mapping: buffer zone %u not present",
>> +				    chunk, bzone_id);
>> +			return -EIO;
>> +		}
>>   		if (!dmz_is_rnd(bzone) && !dmz_is_cache(bzone)) {
>>   			dmz_zmd_err(zmd, "Chunk %u mapping: invalid buffer zone %u",
>>   				    chunk, bzone_id);
>> @@ -1715,6 +1751,8 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>>   	 */
>>   	for (i = 0; i < zmd->nr_zones; i++) {
>>   		dzone = dmz_get(zmd, i);
>> +		if (!dzone)
>> +			continue;
>>   		if (dmz_is_meta(dzone))
>>   			continue;
>>   		if (dmz_is_offline(dzone))
>> @@ -1977,6 +2015,10 @@ struct dm_zone *dmz_get_chunk_mapping(struct dmz_metadata *zmd, unsigned int chu
>>   	} else {
>>   		/* The chunk is already mapped: get the mapping zone */
>>   		dzone = dmz_get(zmd, dzone_id);
>> +		if (!dzone) {
>> +			dzone = ERR_PTR(-EIO);
>> +			goto out;
>> +		}
>>   		if (dzone->chunk != chunk) {
>>   			dzone = ERR_PTR(-EIO);
>>   			goto out;
>> @@ -2794,6 +2836,12 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>>   	/* Set metadata zones starting from sb_zone */
>>   	for (i = 0; i < zmd->nr_meta_zones << 1; i++) {
>>   		zone = dmz_get(zmd, zmd->sb[0].zone->id + i);
>> +		if (!zone) {
>> +			dmz_zmd_err(zmd,
>> +				    "metadata zone %u not present", i);
>> +			ret = -ENXIO;
>> +			goto err;
>> +		}
>>   		if (!dmz_is_rnd(zone) && !dmz_is_cache(zone)) {
>>   			dmz_zmd_err(zmd,
>>   				    "metadata zone %d is not random", i);
>>
> 
> Apart from the nits above, this looks good to me.
> 
Thanks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] dm-zoned: use on-stack superblock for tertiary devices
  2020-05-25  2:09   ` Damien Le Moal
@ 2020-05-25  7:41     ` Hannes Reinecke
  2020-05-26  8:25     ` Hannes Reinecke
  1 sibling, 0 replies; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-25  7:41 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: dm-devel, Mike Snitzer

On 5/25/20 4:09 AM, Damien Le Moal wrote:
> On 2020/05/23 0:39, Hannes Reinecke wrote:
>> Checking the teriary superblock just consists of validating UUIDs,
> 
> s/teriary/tertiary
> 
>> crcs, and the generation number; it doesn't have contents which
>> would be required during the actual operation.
>> So we should use an on-stack superblock and avoid having to store
>> it together with the 'real' superblocks.
> 
> ...a temporary in-memory superblock allocation...
> 
> The entire structure should not be on stack... see below.
> 
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/md/dm-zoned-metadata.c | 98 +++++++++++++++++++++++-------------------
>>   1 file changed, 53 insertions(+), 45 deletions(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index 3da6702bb1ae..b70a988fa771 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
>> @@ -174,7 +174,7 @@ struct dmz_metadata {
>>   	/* Zone information array */
>>   	struct xarray		zones;
>>   
>> -	struct dmz_sb		sb[3];
>> +	struct dmz_sb		sb[2];
>>   	unsigned int		mblk_primary;
>>   	unsigned int		sb_version;
>>   	u64			sb_gen;
>> @@ -995,10 +995,11 @@ int dmz_flush_metadata(struct dmz_metadata *zmd)
>>   /*
>>    * Check super block.
>>    */
>> -static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
>> +static int dmz_check_sb(struct dmz_metadata *zmd, struct dmz_sb *dsb,
>> +			bool tertiary)
>>   {
>> -	struct dmz_super *sb = zmd->sb[set].sb;
>> -	struct dmz_dev *dev = zmd->sb[set].dev;
>> +	struct dmz_super *sb = dsb->sb;
>> +	struct dmz_dev *dev = dsb->dev;
>>   	unsigned int nr_meta_zones, nr_data_zones;
>>   	u32 crc, stored_crc;
>>   	u64 gen;
>> @@ -1015,7 +1016,7 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
>>   			    DMZ_META_VER, zmd->sb_version);
>>   		return -EINVAL;
>>   	}
>> -	if ((zmd->sb_version < 1) && (set == 2)) {
>> +	if ((zmd->sb_version < 1) && tertiary) {
>>   		dmz_dev_err(dev, "Tertiary superblocks are not supported");
>>   		return -EINVAL;
>>   	}
>> @@ -1059,7 +1060,7 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
>>   			return -ENXIO;
>>   		}
>>   
>> -		if (set == 2) {
>> +		if (tertiary) {
>>   			/*
>>   			 * Generation number should be 0, but it doesn't
>>   			 * really matter if it isn't.
>> @@ -1108,13 +1109,13 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
>>   /*
>>    * Read the first or second super block from disk.
>>    */
>> -static int dmz_read_sb(struct dmz_metadata *zmd, unsigned int set)
>> +static int dmz_read_sb(struct dmz_metadata *zmd, struct dmz_sb *sb, int set)
>>   {
>>   	DMDEBUG("(%s): read superblock set %d dev %s block %llu",
>>   		zmd->devname, set, zmd->sb[set].dev->name,
>>   		zmd->sb[set].block);
>> -	return dmz_rdwr_block(zmd->sb[set].dev, REQ_OP_READ,
>> -			      zmd->sb[set].block, zmd->sb[set].mblk->page);
>> +	return dmz_rdwr_block(sb->dev, REQ_OP_READ,
>> +			      sb->block, sb->mblk->page);
>>   }
>>   
>>   /*
>> @@ -1142,7 +1143,7 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>>   	zmd->sb[1].zone = xa_load(&zmd->zones, zone_id + 1);
>>   	zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
>>   	for (i = 1; i < zmd->nr_rnd_zones; i++) {
>> -		if (dmz_read_sb(zmd, 1) != 0)
>> +		if (dmz_read_sb(zmd, &zmd->sb[1], 1) != 0)
>>   			break;
>>   		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC)
>>   			return 0;
>> @@ -1160,9 +1161,9 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>>   }
>>   
>>   /*
>> - * Read the first or second super block from disk.
>> + * Read a super block from disk.
>>    */
>> -static int dmz_get_sb(struct dmz_metadata *zmd, unsigned int set)
>> +static int dmz_get_sb(struct dmz_metadata *zmd, struct dmz_sb *sb, int set)
>>   {
>>   	struct dmz_mblock *mblk;
>>   	int ret;
>> @@ -1172,14 +1173,14 @@ static int dmz_get_sb(struct dmz_metadata *zmd, unsigned int set)
>>   	if (!mblk)
>>   		return -ENOMEM;
>>   
>> -	zmd->sb[set].mblk = mblk;
>> -	zmd->sb[set].sb = mblk->data;
>> +	sb->mblk = mblk;
>> +	sb->sb = mblk->data;
>>   
>>   	/* Read super block */
>> -	ret = dmz_read_sb(zmd, set);
>> +	ret = dmz_read_sb(zmd, sb, set);
>>   	if (ret) {
>>   		dmz_free_mblock(zmd, mblk);
>> -		zmd->sb[set].mblk = NULL;
>> +		sb->mblk = NULL;
>>   		return ret;
>>   	}
>>   
>> @@ -1253,13 +1254,13 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>>   	/* Read and check the primary super block */
>>   	zmd->sb[0].block = dmz_start_block(zmd, zmd->sb[0].zone);
>>   	zmd->sb[0].dev = dmz_zone_to_dev(zmd, zmd->sb[0].zone);
>> -	ret = dmz_get_sb(zmd, 0);
>> +	ret = dmz_get_sb(zmd, &zmd->sb[0], 0);
>>   	if (ret) {
>>   		dmz_dev_err(zmd->sb[0].dev, "Read primary super block failed");
>>   		return ret;
>>   	}
>>   
>> -	ret = dmz_check_sb(zmd, 0);
>> +	ret = dmz_check_sb(zmd, &zmd->sb[0], false);
>>   
>>   	/* Read and check secondary super block */
>>   	if (ret == 0) {
>> @@ -1272,7 +1273,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>>   		}
>>   		zmd->sb[1].block = dmz_start_block(zmd, zmd->sb[1].zone);
>>   		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
>> -		ret = dmz_get_sb(zmd, 1);
>> +		ret = dmz_get_sb(zmd, &zmd->sb[1], 1);
>>   	} else
>>   		ret = dmz_lookup_secondary_sb(zmd);
>>   
>> @@ -1281,7 +1282,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>>   		return ret;
>>   	}
>>   
>> -	ret = dmz_check_sb(zmd, 1);
>> +	ret = dmz_check_sb(zmd, &zmd->sb[1], false);
>>   	if (ret == 0)
>>   		sb_good[1] = true;
>>   
>> @@ -1326,18 +1327,32 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>>   		      "Using super block %u (gen %llu)",
>>   		      zmd->mblk_primary, zmd->sb_gen);
>>   
>> -	if ((zmd->sb_version > 1) && zmd->sb[2].zone) {
>> -		zmd->sb[2].block = dmz_start_block(zmd, zmd->sb[2].zone);
>> -		zmd->sb[2].dev = dmz_zone_to_dev(zmd, zmd->sb[2].zone);
>> -		ret = dmz_get_sb(zmd, 2);
>> -		if (ret) {
>> -			dmz_dev_err(zmd->sb[2].dev,
>> -				    "Read tertiary super block failed");
>> -			return ret;
>> +	if (zmd->sb_version > 1) {
>> +		int i;
>> +
>> +		for (i = 1; i < zmd->nr_devs; i++) {
>> +			struct dmz_sb sb;
> 
> I would rather have dmz_get_sb() allocate this struct than have it on stack...
> It is not big, but still. To be symmetric, we can add dmz_put_sb() for freeing it.
> 

Okay, no big deal.
I'll convert it to allocate one.
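For the record, I'm thinking of something simple like this (sketch only,
names not final):

```c
/*
 * Allocate/free a superblock descriptor instead of keeping it on the
 * stack; dmz_put_sb() is the symmetric release suggested above.
 * The tertiary-superblock loop would then allocate one descriptor,
 * check each extra device with it, and release it again at the end.
 */
static struct dmz_sb *dmz_alloc_sb(void)
{
	return kzalloc(sizeof(struct dmz_sb), GFP_KERNEL);
}

static void dmz_put_sb(struct dmz_sb *sb)
{
	kfree(sb);
}
```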

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 05/12] dm-zoned: add device pointer to struct dm_zone
  2020-05-25  2:15   ` Damien Le Moal
@ 2020-05-25  7:42     ` Hannes Reinecke
  0 siblings, 0 replies; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-25  7:42 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: dm-devel, Mike Snitzer

On 5/25/20 4:15 AM, Damien Le Moal wrote:
> On 2020/05/23 0:39, Hannes Reinecke wrote:
>> Add a pointer to the containing device to struct dm_zone and
>> kill dmz_zone_to_dev().
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/md/dm-zoned-metadata.c | 47 ++++++++++++------------------------------
>>   drivers/md/dm-zoned-reclaim.c  | 18 +++++++---------
>>   drivers/md/dm-zoned-target.c   |  7 +++----
>>   drivers/md/dm-zoned.h          |  4 +++-
>>   4 files changed, 26 insertions(+), 50 deletions(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index fdae4e0228e7..7b6e7404f1e8 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
>> @@ -229,16 +229,10 @@ struct dmz_metadata {
>>    */
>>   static unsigned int dmz_dev_zone_id(struct dmz_metadata *zmd, struct dm_zone *zone)
>>   {
>> -	unsigned int zone_id;
>> -
>>   	if (WARN_ON(!zone))
>>   		return 0;
>>   
>> -	zone_id = zone->id;
>> -	if (zmd->nr_devs > 1 &&
>> -	    (zone_id >= zmd->dev[1].zone_offset))
>> -		zone_id -= zmd->dev[1].zone_offset;
>> -	return zone_id;
>> +	return zone->id - zone->dev->zone_offset;
>>   }
>>   
>>   sector_t dmz_start_sect(struct dmz_metadata *zmd, struct dm_zone *zone)
>> @@ -255,18 +249,6 @@ sector_t dmz_start_block(struct dmz_metadata *zmd, struct dm_zone *zone)
>>   	return (sector_t)zone_id << zmd->zone_nr_blocks_shift;
>>   }
>>   
>> -struct dmz_dev *dmz_zone_to_dev(struct dmz_metadata *zmd, struct dm_zone *zone)
>> -{
>> -	if (WARN_ON(!zone))
>> -		return &zmd->dev[0];
>> -
>> -	if (zmd->nr_devs > 1 &&
>> -	    zone->id >= zmd->dev[1].zone_offset)
>> -		return &zmd->dev[1];
>> -
>> -	return &zmd->dev[0];
>> -}
>> -
>>   unsigned int dmz_zone_nr_blocks(struct dmz_metadata *zmd)
>>   {
>>   	return zmd->zone_nr_blocks;
>> @@ -1252,7 +1234,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>>   
>>   	/* Read and check the primary super block */
>>   	zmd->sb[0].block = dmz_start_block(zmd, zmd->sb[0].zone);
>> -	zmd->sb[0].dev = dmz_zone_to_dev(zmd, zmd->sb[0].zone);
>> +	zmd->sb[0].dev = zmd->sb[0].zone->dev;
>>   	ret = dmz_get_sb(zmd, &zmd->sb[0], 0);
>>   	if (ret) {
>>   		dmz_dev_err(zmd->sb[0].dev, "Read primary super block failed");
>> @@ -1383,6 +1365,7 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>>   
>>   	INIT_LIST_HEAD(&zone->link);
>>   	atomic_set(&zone->refcount, 0);
>> +	zone->dev = dev;
>>   	zone->id = idx;
>>   	zone->chunk = DMZ_MAP_UNMAPPED;
>>   
>> @@ -1442,6 +1425,7 @@ static int dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
>>   			return -EBUSY;
>>   		INIT_LIST_HEAD(&zone->link);
>>   		atomic_set(&zone->refcount, 0);
>> +		zone->dev = dev;
>>   		zone->id = idx;
>>   		zone->chunk = DMZ_MAP_UNMAPPED;
>>   		set_bit(DMZ_CACHE, &zone->flags);
>> @@ -1567,11 +1551,10 @@ static int dmz_update_zone_cb(struct blk_zone *blkz, unsigned int idx,
>>    */
>>   static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>>   {
>> -	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
> 
> If you keep this one and make it:
> 
> 	struct dmz_dev *dev = zone->dev;
> 
> You can avoid all the changes below, and dereferencing the same pointer multiple
> times.
> 
>>   	unsigned int noio_flag;
>>   	int ret;
>>   
>> -	if (dev->flags & DMZ_BDEV_REGULAR)
>> +	if (zone->dev->flags & DMZ_BDEV_REGULAR)
>>   		return 0;
>>   
>>   	/*
>> @@ -1581,16 +1564,16 @@ static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>>   	 * GFP_NOIO was specified.
>>   	 */
>>   	noio_flag = memalloc_noio_save();
>> -	ret = blkdev_report_zones(dev->bdev, dmz_start_sect(zmd, zone), 1,
>> +	ret = blkdev_report_zones(zone->dev->bdev, dmz_start_sect(zmd, zone), 1,
>>   				  dmz_update_zone_cb, zone);
>>   	memalloc_noio_restore(noio_flag);
>>   
>>   	if (ret == 0)
>>   		ret = -EIO;
>>   	if (ret < 0) {
>> -		dmz_dev_err(dev, "Get zone %u report failed",
>> +		dmz_dev_err(zone->dev, "Get zone %u report failed",
>>   			    zone->id);
>> -		dmz_check_bdev(dev);
>> +		dmz_check_bdev(zone->dev);
>>   		return ret;
>>   	}
>>   
>> @@ -1604,7 +1587,6 @@ static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>>   static int dmz_handle_seq_write_err(struct dmz_metadata *zmd,
>>   				    struct dm_zone *zone)
>>   {
>> -	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
>>   	unsigned int wp = 0;
>>   	int ret;
>>   
>> @@ -1613,7 +1595,8 @@ static int dmz_handle_seq_write_err(struct dmz_metadata *zmd,
>>   	if (ret)
>>   		return ret;
>>   
>> -	dmz_dev_warn(dev, "Processing zone %u write error (zone wp %u/%u)",
>> +	dmz_dev_warn(zone->dev,
>> +		     "Processing zone %u write error (zone wp %u/%u)",
>>   		     zone->id, zone->wp_block, wp);
>>   
>>   	if (zone->wp_block < wp) {
>> @@ -1641,13 +1624,11 @@ static int dmz_reset_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>>   		return 0;
>>   
>>   	if (!dmz_is_empty(zone) || dmz_seq_write_err(zone)) {
>> -		struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
>> -
>> -		ret = blkdev_zone_mgmt(dev->bdev, REQ_OP_ZONE_RESET,
>> +		ret = blkdev_zone_mgmt(zone->dev->bdev, REQ_OP_ZONE_RESET,
>>   				       dmz_start_sect(zmd, zone),
>>   				       zmd->zone_nr_sectors, GFP_NOIO);
>>   		if (ret) {
>> -			dmz_dev_err(dev, "Reset zone %u failed %d",
>> +			dmz_dev_err(zone->dev, "Reset zone %u failed %d",
>>   				    zone->id, ret);
>>   			return ret;
>>   		}
>> @@ -2201,9 +2182,7 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>>   		goto again;
>>   	}
>>   	if (dmz_is_meta(zone)) {
>> -		struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
>> -
>> -		dmz_dev_warn(dev, "Zone %u has metadata", zone->id);
>> +		dmz_zmd_warn(zmd, "Zone %u has metadata", zone->id);
>>   		zone = NULL;
>>   		goto again;
>>   	}
> 
> Same comment as above for all these changes.
> 
>> diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
>> index 571bc1d41bab..d1a72b42dea2 100644
>> --- a/drivers/md/dm-zoned-reclaim.c
>> +++ b/drivers/md/dm-zoned-reclaim.c
>> @@ -58,7 +58,6 @@ static int dmz_reclaim_align_wp(struct dmz_reclaim *zrc, struct dm_zone *zone,
>>   				sector_t block)
>>   {
>>   	struct dmz_metadata *zmd = zrc->metadata;
>> -	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
>>   	sector_t wp_block = zone->wp_block;
>>   	unsigned int nr_blocks;
>>   	int ret;
>> @@ -74,15 +73,15 @@ static int dmz_reclaim_align_wp(struct dmz_reclaim *zrc, struct dm_zone *zone,
>>   	 * pointer and the requested position.
>>   	 */
>>   	nr_blocks = block - wp_block;
>> -	ret = blkdev_issue_zeroout(dev->bdev,
>> +	ret = blkdev_issue_zeroout(zone->dev->bdev,
>>   				   dmz_start_sect(zmd, zone) + dmz_blk2sect(wp_block),
>>   				   dmz_blk2sect(nr_blocks), GFP_NOIO, 0);
>>   	if (ret) {
>> -		dmz_dev_err(dev,
>> +		dmz_dev_err(zone->dev,
>>   			    "Align zone %u wp %llu to %llu (wp+%u) blocks failed %d",
>>   			    zone->id, (unsigned long long)wp_block,
>>   			    (unsigned long long)block, nr_blocks, ret);
>> -		dmz_check_bdev(dev);
>> +		dmz_check_bdev(zone->dev);
>>   		return ret;
>>   	}
> 
> Same again.
> 
>>   
>> @@ -116,7 +115,6 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
>>   			    struct dm_zone *src_zone, struct dm_zone *dst_zone)
>>   {
>>   	struct dmz_metadata *zmd = zrc->metadata;
>> -	struct dmz_dev *src_dev, *dst_dev;
>>   	struct dm_io_region src, dst;
>>   	sector_t block = 0, end_block;
>>   	sector_t nr_blocks;
>> @@ -130,17 +128,15 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
>>   	else
>>   		end_block = dmz_zone_nr_blocks(zmd);
>>   	src_zone_block = dmz_start_block(zmd, src_zone);
>> -	src_dev = dmz_zone_to_dev(zmd, src_zone);
>>   	dst_zone_block = dmz_start_block(zmd, dst_zone);
>> -	dst_dev = dmz_zone_to_dev(zmd, dst_zone);
>>   
>>   	if (dmz_is_seq(dst_zone))
>>   		set_bit(DM_KCOPYD_WRITE_SEQ, &flags);
>>   
>>   	while (block < end_block) {
>> -		if (src_dev->flags & DMZ_BDEV_DYING)
>> +		if (src_zone->dev->flags & DMZ_BDEV_DYING)
>>   			return -EIO;
>> -		if (dst_dev->flags & DMZ_BDEV_DYING)
>> +		if (dst_zone->dev->flags & DMZ_BDEV_DYING)
>>   			return -EIO;
>>   
>>   		if (dmz_reclaim_should_terminate(src_zone))
>> @@ -163,11 +159,11 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
>>   				return ret;
>>   		}
>>   
>> -		src.bdev = src_dev->bdev;
>> +		src.bdev = src_zone->dev->bdev;
>>   		src.sector = dmz_blk2sect(src_zone_block + block);
>>   		src.count = dmz_blk2sect(nr_blocks);
>>   
>> -		dst.bdev = dst_dev->bdev;
>> +		dst.bdev = dst_zone->dev->bdev;
>>   		dst.sector = dmz_blk2sect(dst_zone_block + block);
>>   		dst.count = src.count;
> 
> And again the same here.
> 
>>   
>> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
>> index 2770e293a97b..bca9a611b8dd 100644
>> --- a/drivers/md/dm-zoned-target.c
>> +++ b/drivers/md/dm-zoned-target.c
>> @@ -123,18 +123,17 @@ static int dmz_submit_bio(struct dmz_target *dmz, struct dm_zone *zone,
>>   {
>>   	struct dmz_bioctx *bioctx =
>>   		dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
>> -	struct dmz_dev *dev = dmz_zone_to_dev(dmz->metadata, zone);
>>   	struct bio *clone;
>>   
>> -	if (dev->flags & DMZ_BDEV_DYING)
>> +	if (zone->dev->flags & DMZ_BDEV_DYING)
>>   		return -EIO;
>>   
>>   	clone = bio_clone_fast(bio, GFP_NOIO, &dmz->bio_set);
>>   	if (!clone)
>>   		return -ENOMEM;
>>   
>> -	bio_set_dev(clone, dev->bdev);
>> -	bioctx->dev = dev;
>> +	bio_set_dev(clone, zone->dev->bdev);
>> +	bioctx->dev = zone->dev;
>>   	clone->bi_iter.bi_sector =
>>   		dmz_start_sect(dmz->metadata, zone) + dmz_blk2sect(chunk_block);
>>   	clone->bi_iter.bi_size = dmz_blk2sect(nr_blocks) << SECTOR_SHIFT;
> 
> And here too. The patch would become much shorter :)
> 
You know what, that was my first attempt. But then I decided to drop the 
variable :-)

Anyway, will be redoing the patch.
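For what it's worth, the variant suggested above — dereferencing
`zone->dev` once into a local and using that everywhere — would look
roughly like this. This is only a sketch with simplified stand-in
structs, not the actual kernel types:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (illustration only). */
struct dmz_dev {
	unsigned int flags;
};

struct dm_zone {
	struct dmz_dev *dev;
	unsigned int id;
};

#define DMZ_BDEV_REGULAR 0x1u

/*
 * Cache zone->dev in a local once, then use 'dev' for the flag test,
 * the report call and the error path, keeping the diff against the
 * original code minimal.
 */
static int dmz_update_zone_sketch(struct dm_zone *zone)
{
	struct dmz_dev *dev = zone->dev;

	if (dev->flags & DMZ_BDEV_REGULAR)
		return 0;

	/* ... blkdev_report_zones(dev->bdev, ...), dmz_check_bdev(dev) ... */
	return 1;
}
```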

Cheers

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 08/12] dm-zoned: move random and sequential zones into struct dmz_dev
  2020-05-25  2:27   ` Damien Le Moal
@ 2020-05-25  7:47     ` Hannes Reinecke
  0 siblings, 0 replies; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-25  7:47 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: dm-devel, Mike Snitzer

On 5/25/20 4:27 AM, Damien Le Moal wrote:
> On 2020/05/23 0:39, Hannes Reinecke wrote:
>> Random and sequential zones should be part of the respective
>> device structure to make arbitration between devices possible.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/md/dm-zoned-metadata.c | 143 +++++++++++++++++++++++++----------------
>>   drivers/md/dm-zoned.h          |  10 +++
>>   2 files changed, 99 insertions(+), 54 deletions(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index 1b9da698a812..5f44970a6187 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
>> @@ -192,21 +192,12 @@ struct dmz_metadata {
>>   	/* Zone allocation management */
>>   	struct mutex		map_lock;
>>   	struct dmz_mblock	**map_mblk;
>> -	unsigned int		nr_rnd;
>> -	atomic_t		unmap_nr_rnd;
>> -	struct list_head	unmap_rnd_list;
>> -	struct list_head	map_rnd_list;
>>   
>>   	unsigned int		nr_cache;
>>   	atomic_t		unmap_nr_cache;
>>   	struct list_head	unmap_cache_list;
>>   	struct list_head	map_cache_list;
>>   
>> -	unsigned int		nr_seq;
>> -	atomic_t		unmap_nr_seq;
>> -	struct list_head	unmap_seq_list;
>> -	struct list_head	map_seq_list;
>> -
>>   	atomic_t		nr_reserved_seq_zones;
>>   	struct list_head	reserved_seq_zones_list;
>>   
>> @@ -281,12 +272,22 @@ unsigned int dmz_nr_chunks(struct dmz_metadata *zmd)
>>   
>>   unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd)
>>   {
>> -	return zmd->nr_rnd;
>> +	unsigned int nr_rnd_zones = 0;
>> +	int i;
>> +
>> +	for (i = 0; i < zmd->nr_devs; i++)
>> +		nr_rnd_zones += zmd->dev[i].nr_rnd;
> 
> We could keep the total nr_rnd_zones in dmz_metadata to avoid this one since the
> value will never change at run time.
> 

Yeah, we could, but in the end this is only used for logging, so it's
hardly performance critical.
And I have an aversion to having two counters for the same thing;
they inevitably tend to get out of sync.

>> +	return nr_rnd_zones;
>>   }
>>   
>>   unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd)
>>   {
>> -	return atomic_read(&zmd->unmap_nr_rnd);
>> +	unsigned int nr_unmap_rnd_zones = 0;
>> +	int i;
>> +
>> +	for (i = 0; i < zmd->nr_devs; i++)
>> +		nr_unmap_rnd_zones += atomic_read(&zmd->dev[i].unmap_nr_rnd);
>> +	return nr_unmap_rnd_zones;
>>   }
>>   
>>   unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd)
>> @@ -301,12 +302,22 @@ unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd)
>>   
>>   unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd)
>>   {
>> -	return zmd->nr_seq;
>> +	unsigned int nr_seq_zones = 0;
>> +	int i;
>> +
>> +	for (i = 0; i < zmd->nr_devs; i++)
>> +		nr_seq_zones += zmd->dev[i].nr_seq;
> 
> Same here. This value does not change at runtime.
> 
>> +	return nr_seq_zones;
>>   }
>>   
>>   unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
>>   {
>> -	return atomic_read(&zmd->unmap_nr_seq);
>> +	unsigned int nr_unmap_seq_zones = 0;
>> +	int i;
>> +
>> +	for (i = 0; i < zmd->nr_devs; i++)
>> +		nr_unmap_seq_zones += atomic_read(&zmd->dev[i].unmap_nr_seq);
>> +	return nr_unmap_seq_zones;
>>   }
>>   
>>   static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
>> @@ -1485,6 +1496,14 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
>>   
>>   		dev->metadata = zmd;
>>   		zmd->nr_zones += dev->nr_zones;
>> +
>> +		atomic_set(&dev->unmap_nr_rnd, 0);
>> +		INIT_LIST_HEAD(&dev->unmap_rnd_list);
>> +		INIT_LIST_HEAD(&dev->map_rnd_list);
>> +
>> +		atomic_set(&dev->unmap_nr_seq, 0);
>> +		INIT_LIST_HEAD(&dev->unmap_seq_list);
>> +		INIT_LIST_HEAD(&dev->map_seq_list);
>>   	}
>>   
>>   	if (!zmd->nr_zones) {
>> @@ -1702,9 +1721,9 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>>   		if (dmz_is_cache(dzone))
>>   			list_add_tail(&dzone->link, &zmd->map_cache_list);
>>   		else if (dmz_is_rnd(dzone))
>> -			list_add_tail(&dzone->link, &zmd->map_rnd_list);
>> +			list_add_tail(&dzone->link, &dzone->dev->map_rnd_list);
>>   		else
>> -			list_add_tail(&dzone->link, &zmd->map_seq_list);
>> +			list_add_tail(&dzone->link, &dzone->dev->map_seq_list);
>>   
>>   		/* Check buffer zone */
>>   		bzone_id = le32_to_cpu(dmap[e].bzone_id);
>> @@ -1738,7 +1757,7 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>>   		if (dmz_is_cache(bzone))
>>   			list_add_tail(&bzone->link, &zmd->map_cache_list);
>>   		else
>> -			list_add_tail(&bzone->link, &zmd->map_rnd_list);
>> +			list_add_tail(&bzone->link, &bzone->dev->map_rnd_list);
>>   next:
>>   		chunk++;
>>   		e++;
>> @@ -1763,9 +1782,9 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>>   		if (dmz_is_cache(dzone))
>>   			zmd->nr_cache++;
>>   		else if (dmz_is_rnd(dzone))
>> -			zmd->nr_rnd++;
>> +			dzone->dev->nr_rnd++;
>>   		else
>> -			zmd->nr_seq++;
>> +			dzone->dev->nr_seq++;
>>   
>>   		if (dmz_is_data(dzone)) {
>>   			/* Already initialized */
>> @@ -1779,16 +1798,18 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>>   			list_add_tail(&dzone->link, &zmd->unmap_cache_list);
>>   			atomic_inc(&zmd->unmap_nr_cache);
>>   		} else if (dmz_is_rnd(dzone)) {
>> -			list_add_tail(&dzone->link, &zmd->unmap_rnd_list);
>> -			atomic_inc(&zmd->unmap_nr_rnd);
>> +			list_add_tail(&dzone->link,
>> +				      &dzone->dev->unmap_rnd_list);
>> +			atomic_inc(&dzone->dev->unmap_nr_rnd);
>>   		} else if (atomic_read(&zmd->nr_reserved_seq_zones) < zmd->nr_reserved_seq) {
>>   			list_add_tail(&dzone->link, &zmd->reserved_seq_zones_list);
>>   			set_bit(DMZ_RESERVED, &dzone->flags);
>>   			atomic_inc(&zmd->nr_reserved_seq_zones);
>> -			zmd->nr_seq--;
>> +			dzone->dev->nr_seq--;
>>   		} else {
>> -			list_add_tail(&dzone->link, &zmd->unmap_seq_list);
>> -			atomic_inc(&zmd->unmap_nr_seq);
>> +			list_add_tail(&dzone->link,
>> +				      &dzone->dev->unmap_seq_list);
>> +			atomic_inc(&dzone->dev->unmap_nr_seq);
>>   		}
>>   	}
>>   
>> @@ -1822,13 +1843,13 @@ static void __dmz_lru_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>>   	list_del_init(&zone->link);
>>   	if (dmz_is_seq(zone)) {
>>   		/* LRU rotate sequential zone */
>> -		list_add_tail(&zone->link, &zmd->map_seq_list);
>> +		list_add_tail(&zone->link, &zone->dev->map_seq_list);
>>   	} else if (dmz_is_cache(zone)) {
>>   		/* LRU rotate cache zone */
>>   		list_add_tail(&zone->link, &zmd->map_cache_list);
>>   	} else {
>>   		/* LRU rotate random zone */
>> -		list_add_tail(&zone->link, &zmd->map_rnd_list);
>> +		list_add_tail(&zone->link, &zone->dev->map_rnd_list);
>>   	}
>>   }
>>   
>> @@ -1910,14 +1931,24 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
>>   {
>>   	struct dm_zone *dzone = NULL;
>>   	struct dm_zone *zone;
>> -	struct list_head *zone_list = &zmd->map_rnd_list;
>> +	struct list_head *zone_list;
>>   
>>   	/* If we have cache zones select from the cache zone list */
>>   	if (zmd->nr_cache) {
>>   		zone_list = &zmd->map_cache_list;
>>   		/* Try to relaim random zones, too, when idle */
>> -		if (idle && list_empty(zone_list))
>> -			zone_list = &zmd->map_rnd_list;
>> +		if (idle && list_empty(zone_list)) {
>> +			int i;
>> +
>> +			for (i = 1; i < zmd->nr_devs; i++) {
>> +				zone_list = &zmd->dev[i].map_rnd_list;
>> +				if (!list_empty(zone_list))
>> +					break;
>> +			}
> 
> This is going to use the first zoned dev until it has no more random zones, then
> switch to the next zoned dev. What about going round-robin on the devices to
> increase parallelism between the drives?
> 
> 

That will happen in a later patch.
This patch just has the basic necessities to get the infrastructure in 
place.

>> +		}
>> +	} else {
>> +		/* Otherwise the random zones are on the first disk */
>> +		zone_list = &zmd->dev[0].map_rnd_list;
>>   	}
>>   
>>   	list_for_each_entry(zone, zone_list, link) {
>> @@ -1938,12 +1969,17 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
>>   static struct dm_zone *dmz_get_seq_zone_for_reclaim(struct dmz_metadata *zmd)
>>   {
>>   	struct dm_zone *zone;
>> +	int i;
>>   
>> -	list_for_each_entry(zone, &zmd->map_seq_list, link) {
>> -		if (!zone->bzone)
>> -			continue;
>> -		if (dmz_lock_zone_reclaim(zone))
>> -			return zone;
>> +	for (i = 0; i < zmd->nr_devs; i++) {
>> +		struct dmz_dev *dev = &zmd->dev[i];
>> +
>> +		list_for_each_entry(zone, &dev->map_seq_list, link) {
>> +			if (!zone->bzone)
>> +				continue;
>> +			if (dmz_lock_zone_reclaim(zone))
>> +				return zone;
>> +		}
> 
> Same comment here.
> 

Same response here :-)

Cheers,

Hannes

* Re: [PATCH 10/12] dm-zoned: support arbitrary number of devices
  2020-05-25  2:36   ` Damien Le Moal
@ 2020-05-25  7:52     ` Hannes Reinecke
  2020-05-25  8:22       ` Damien Le Moal
  0 siblings, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-25  7:52 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: dm-devel, Mike Snitzer

On 5/25/20 4:36 AM, Damien Le Moal wrote:
> On 2020/05/23 0:39, Hannes Reinecke wrote:
>> Remove the hard-coded limit of two devices and support an unlimited
>> number of additional zoned devices.
>> With that we need to increase the device-mapper version number to
>> 3.0.0 as we've modified the interface.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/md/dm-zoned-metadata.c |  68 +++++++++++-----------
>>   drivers/md/dm-zoned-reclaim.c  |  28 ++++++---
>>   drivers/md/dm-zoned-target.c   | 129 +++++++++++++++++++++++++----------------
>>   drivers/md/dm-zoned.h          |   9 +--
>>   4 files changed, 139 insertions(+), 95 deletions(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index 5f44970a6187..87784e7785bc 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
>> @@ -260,6 +260,11 @@ unsigned int dmz_zone_nr_sectors_shift(struct dmz_metadata *zmd)
>>   	return zmd->zone_nr_sectors_shift;
>>   }
>>   
>> +unsigned int dmz_nr_devs(struct dmz_metadata *zmd)
>> +{
>> +	return zmd->nr_devs;
>> +}
> 
> Is this helper really needed?
> 

Yes, in dm-zoned-reclaim.c

>> +
>>   unsigned int dmz_nr_zones(struct dmz_metadata *zmd)
>>   {
>>   	return zmd->nr_zones;
>> @@ -270,24 +275,14 @@ unsigned int dmz_nr_chunks(struct dmz_metadata *zmd)
>>   	return zmd->nr_chunks;
>>   }
>>   
>> -unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd)
>> +unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd, int idx)
>>   {
>> -	unsigned int nr_rnd_zones = 0;
>> -	int i;
>> -
>> -	for (i = 0; i < zmd->nr_devs; i++)
>> -		nr_rnd_zones += zmd->dev[i].nr_rnd;
>> -	return nr_rnd_zones;
>> +	return zmd->dev[idx].nr_rnd;
> 
> AH. OK. So my comment on patch 8 is voided :)
> 
Yeah, the patch arrangement could be improved; I'll see about rolling
both changes into one patch.

>>   }
>>   
>> -unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd)
>> +unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd, int idx)
>>   {
>> -	unsigned int nr_unmap_rnd_zones = 0;
>> -	int i;
>> -
>> -	for (i = 0; i < zmd->nr_devs; i++)
>> -		nr_unmap_rnd_zones += atomic_read(&zmd->dev[i].unmap_nr_rnd);
>> -	return nr_unmap_rnd_zones;
>> +	return atomic_read(&zmd->dev[idx].unmap_nr_rnd);
>>   }
>>   
>>   unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd)
>> @@ -300,24 +295,14 @@ unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd)
>>   	return atomic_read(&zmd->unmap_nr_cache);
>>   }
>>   
>> -unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd)
>> +unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd, int idx)
>>   {
>> -	unsigned int nr_seq_zones = 0;
>> -	int i;
>> -
>> -	for (i = 0; i < zmd->nr_devs; i++)
>> -		nr_seq_zones += zmd->dev[i].nr_seq;
>> -	return nr_seq_zones;
>> +	return zmd->dev[idx].nr_seq;
>>   }
>>   
>> -unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
>> +unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd, int idx)
>>   {
>> -	unsigned int nr_unmap_seq_zones = 0;
>> -	int i;
>> -
>> -	for (i = 0; i < zmd->nr_devs; i++)
>> -		nr_unmap_seq_zones += atomic_read(&zmd->dev[i].unmap_nr_seq);
>> -	return nr_unmap_seq_zones;
>> +	return atomic_read(&zmd->dev[idx].unmap_nr_seq);
>>   }
>>   
>>   static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
>> @@ -1530,7 +1515,20 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
>>   		 */
>>   		zmd->sb[0].zone = dmz_get(zmd, 0);
>>   
>> -		zoned_dev = &zmd->dev[1];
>> +		for (i = 1; i < zmd->nr_devs; i++) {
>> +			zoned_dev = &zmd->dev[i];
>> +
>> +			ret = blkdev_report_zones(zoned_dev->bdev, 0,
>> +						  BLK_ALL_ZONES,
>> +						  dmz_init_zone, zoned_dev);
>> +			if (ret < 0) {
>> +				DMDEBUG("(%s): Failed to report zones, error %d",
>> +					zmd->devname, ret);
>> +				dmz_drop_zones(zmd);
>> +				return ret;
>> +			}
>> +		}
>> +		return 0;
>>   	}
>>   
>>   	/*
>> @@ -2921,10 +2919,14 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>>   		      zmd->nr_data_zones, zmd->nr_chunks);
>>   	dmz_zmd_debug(zmd, "    %u cache zones (%u unmapped)",
>>   		      zmd->nr_cache, atomic_read(&zmd->unmap_nr_cache));
>> -	dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
>> -		      dmz_nr_rnd_zones(zmd), dmz_nr_unmap_rnd_zones(zmd));
>> -	dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
>> -		      dmz_nr_seq_zones(zmd), dmz_nr_unmap_seq_zones(zmd));
>> +	for (i = 0; i < zmd->nr_devs; i++) {
>> +		dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
>> +			      dmz_nr_rnd_zones(zmd, i),
>> +			      dmz_nr_unmap_rnd_zones(zmd, i));
>> +		dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
>> +			      dmz_nr_seq_zones(zmd, i),
>> +			      dmz_nr_unmap_seq_zones(zmd, i));
>> +	}
>>   	dmz_zmd_debug(zmd, "  %u reserved sequential data zones",
>>   		      zmd->nr_reserved_seq);
>>   	dmz_zmd_debug(zmd, "Format:");
>> diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
>> index fba0d48e38a7..f2e053b5f2db 100644
>> --- a/drivers/md/dm-zoned-reclaim.c
>> +++ b/drivers/md/dm-zoned-reclaim.c
>> @@ -442,15 +442,18 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
>>   {
>>   	struct dmz_metadata *zmd = zrc->metadata;
>>   	unsigned int nr_cache = dmz_nr_cache_zones(zmd);
>> -	unsigned int nr_rnd = dmz_nr_rnd_zones(zmd);
>> -	unsigned int nr_unmap, nr_zones;
>> +	unsigned int nr_unmap = 0, nr_zones = 0;
>>   
>>   	if (nr_cache) {
>>   		nr_zones = nr_cache;
>>   		nr_unmap = dmz_nr_unmap_cache_zones(zmd);
>>   	} else {
>> -		nr_zones = nr_rnd;
>> -		nr_unmap = dmz_nr_unmap_rnd_zones(zmd);
>> +		int i;
>> +
>> +		for (i = 0; i < dmz_nr_devs(zmd); i++) {
>> +			nr_zones += dmz_nr_rnd_zones(zmd, i);
> 
> Maybe not... We could keep constant totals in zmd to avoid this.
> 
>> +			nr_unmap += dmz_nr_unmap_rnd_zones(zmd, i);
>> +		}
>>   	}
>>   	return nr_unmap * 100 / nr_zones;
>>   }
>> @@ -460,7 +463,11 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
>>    */
>>   static bool dmz_should_reclaim(struct dmz_reclaim *zrc, unsigned int p_unmap)
>>   {
>> -	unsigned int nr_reclaim = dmz_nr_rnd_zones(zrc->metadata);
>> +	int i;
>> +	unsigned int nr_reclaim = 0;
>> +
>> +	for (i = 0; i < dmz_nr_devs(zrc->metadata); i++)
>> +		nr_reclaim += dmz_nr_rnd_zones(zrc->metadata, i);
>>   
>>   	if (dmz_nr_cache_zones(zrc->metadata))
>>   		nr_reclaim += dmz_nr_cache_zones(zrc->metadata);
>> @@ -487,8 +494,8 @@ static void dmz_reclaim_work(struct work_struct *work)
>>   {
>>   	struct dmz_reclaim *zrc = container_of(work, struct dmz_reclaim, work.work);
>>   	struct dmz_metadata *zmd = zrc->metadata;
>> -	unsigned int p_unmap;
>> -	int ret;
>> +	unsigned int p_unmap, nr_unmap_rnd = 0, nr_rnd = 0;
>> +	int ret, i;
>>   
>>   	if (dmz_dev_is_dying(zmd))
>>   		return;
>> @@ -513,14 +520,17 @@ static void dmz_reclaim_work(struct work_struct *work)
>>   		zrc->kc_throttle.throttle = min(75U, 100U - p_unmap / 2);
>>   	}
>>   
>> +	for (i = 0; i < dmz_nr_devs(zmd); i++) {
>> +		nr_unmap_rnd += dmz_nr_unmap_rnd_zones(zmd, i);
>> +		nr_rnd += dmz_nr_rnd_zones(zmd, i);
>> +	}
>>   	DMDEBUG("(%s): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
>>   		dmz_metadata_label(zmd),
>>   		zrc->kc_throttle.throttle,
>>   		(dmz_target_idle(zrc) ? "Idle" : "Busy"),
>>   		p_unmap, dmz_nr_unmap_cache_zones(zmd),
>>   		dmz_nr_cache_zones(zmd),
>> -		dmz_nr_unmap_rnd_zones(zmd),
>> -		dmz_nr_rnd_zones(zmd));
>> +		nr_unmap_rnd, nr_rnd);
>>   
>>   	ret = dmz_do_reclaim(zrc);
>>   	if (ret && ret != -EINTR) {

In the light of this I guess there is a benefit to having the counters
in the metadata; that would indeed save us from having to export the
number of devices.
I'll give it a go with the next round.
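The compromise discussed here could be sketched as follows: keep a
total in the metadata that is computed once at init (the per-device
zone counts never change at runtime), while the unmapped counts stay
as per-device atomics. Illustrative stand-in types only, not the real
dm-zoned structures:

```c
#include <assert.h>

/* Illustrative stand-ins; the real code uses struct dmz_dev / dmz_metadata. */
struct dev_sketch {
	unsigned int nr_rnd;		/* constant after init */
};

struct zmd_sketch {
	unsigned int nr_devs;
	struct dev_sketch *dev;
	unsigned int total_nr_rnd;	/* computed once, never updated */
};

/* Sum the per-device counts a single time at construction. */
static void zmd_init_totals(struct zmd_sketch *zmd)
{
	unsigned int i;

	zmd->total_nr_rnd = 0;
	for (i = 0; i < zmd->nr_devs; i++)
		zmd->total_nr_rnd += zmd->dev[i].nr_rnd;
}
```

Since the total is fixed after init, there is only one writer and no
second counter to drift out of sync.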

>> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
>> index bca9a611b8dd..f34fcc3f7cc6 100644
>> --- a/drivers/md/dm-zoned-target.c
>> +++ b/drivers/md/dm-zoned-target.c
>> @@ -13,8 +13,6 @@
>>   
>>   #define DMZ_MIN_BIOS		8192
>>   
>> -#define DMZ_MAX_DEVS		2
>> -
>>   /*
>>    * Zone BIO context.
>>    */
>> @@ -40,9 +38,10 @@ struct dm_chunk_work {
>>    * Target descriptor.
>>    */
>>   struct dmz_target {
>> -	struct dm_dev		*ddev[DMZ_MAX_DEVS];
>> +	struct dm_dev		**ddev;
>> +	unsigned int		nr_ddevs;
>>   
>> -	unsigned long		flags;
>> +	unsigned int		flags;
>>   
>>   	/* Zoned block device information */
>>   	struct dmz_dev		*dev;
>> @@ -764,7 +763,7 @@ static void dmz_put_zoned_device(struct dm_target *ti)
>>   	struct dmz_target *dmz = ti->private;
>>   	int i;
>>   
>> -	for (i = 0; i < DMZ_MAX_DEVS; i++) {
>> +	for (i = 0; i < dmz->nr_ddevs; i++) {
>>   		if (dmz->ddev[i]) {
>>   			dm_put_device(ti, dmz->ddev[i]);
>>   			dmz->ddev[i] = NULL;
>> @@ -777,21 +776,35 @@ static int dmz_fixup_devices(struct dm_target *ti)
>>   	struct dmz_target *dmz = ti->private;
>>   	struct dmz_dev *reg_dev, *zoned_dev;
>>   	struct request_queue *q;
>> +	sector_t zone_nr_sectors = 0;
>> +	int i;
>>   
>>   	/*
>> -	 * When we have two devices, the first one must be a regular block
>> -	 * device and the second a zoned block device.
>> +	 * When we have more than on devices, the first one must be a
>> +	 * regular block device and the others zoned block devices.
>>   	 */
>> -	if (dmz->ddev[0] && dmz->ddev[1]) {
>> +	if (dmz->nr_ddevs > 1) {
>>   		reg_dev = &dmz->dev[0];
>>   		if (!(reg_dev->flags & DMZ_BDEV_REGULAR)) {
>>   			ti->error = "Primary disk is not a regular device";
>>   			return -EINVAL;
>>   		}
>> -		zoned_dev = &dmz->dev[1];
>> -		if (zoned_dev->flags & DMZ_BDEV_REGULAR) {
>> -			ti->error = "Secondary disk is not a zoned device";
>> -			return -EINVAL;
>> +		for (i = 1; i < dmz->nr_ddevs; i++) {
>> +			zoned_dev = &dmz->dev[i];
>> +			if (zoned_dev->flags & DMZ_BDEV_REGULAR) {
>> +				ti->error = "Secondary disk is not a zoned device";
>> +				return -EINVAL;
>> +			}
>> +			q = bdev_get_queue(zoned_dev->bdev);
> 
> Maybe add a comment here that we must check that all zoned devices have the
> same zone size?
> 

I thought it was self-explanatory, but maybe not.
Will be adding it.

Cheers,

Hannes

* Re: [PATCH 11/12] dm-zoned: round-robin load balancer for reclaiming zones
  2020-05-25  2:42   ` Damien Le Moal
@ 2020-05-25  7:53     ` Hannes Reinecke
  0 siblings, 0 replies; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-25  7:53 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: dm-devel, Mike Snitzer

On 5/25/20 4:42 AM, Damien Le Moal wrote:
> On 2020/05/23 0:39, Hannes Reinecke wrote:
>> When reclaiming zones we should arbitrate between the zoned
>> devices to get a better throughput. So implement a simple
>> round-robin load balancer between the zoned devices.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/md/dm-zoned-metadata.c | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index 87784e7785bc..25dcad2a565f 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
>> @@ -171,6 +171,8 @@ struct dmz_metadata {
>>   	unsigned int		nr_reserved_seq;
>>   	unsigned int		nr_chunks;
>>   
>> +	unsigned int		last_alloc_idx;
>> +
>>   	/* Zone information array */
>>   	struct xarray		zones;
>>   
>> @@ -2178,7 +2180,7 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>>   {
>>   	struct list_head *list;
>>   	struct dm_zone *zone;
>> -	unsigned int dev_idx = 0;
>> +	unsigned int dev_idx = zmd->last_alloc_idx;
>>   
>>   again:
>>   	if (flags & DMZ_ALLOC_CACHE)
>> @@ -2214,6 +2216,9 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>>   	zone = list_first_entry(list, struct dm_zone, link);
>>   	list_del_init(&zone->link);
>>   
>> +	if (!(flags & DMZ_ALLOC_CACHE))
>> +		zmd->last_alloc_idx = (dev_idx + 1) % zmd->nr_devs;
>> +
>>   	if (dmz_is_cache(zone))
>>   		atomic_dec(&zmd->unmap_nr_cache);
>>   	else if (dmz_is_rnd(zone))
>> @@ -2839,6 +2844,7 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>>   	zmd->dev = dev;
>>   	zmd->nr_devs = num_dev;
>>   	zmd->mblk_rbtree = RB_ROOT;
>> +	zmd->last_alloc_idx = 0;
>>   	init_rwsem(&zmd->mblk_sem);
>>   	mutex_init(&zmd->mblk_flush_lock);
>>   	spin_lock_init(&zmd->mblk_lock);
>>
> 
> 
> OK. So my comment on patch 8 is already addressed. Or at least partly... Where
> is last_alloc_idx actually used ? It looks like this only sets last_alloc_idx
> but do not use that value on entry to dmz_alloc_zone() to allocate the zone.
> 
Aw, fsck. Something went astray when generating the patches.
Will be fixing it up for the next round.
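For reference, the intended round-robin presumably needs both halves:
start the allocation search at the saved index, and advance the index
after a successful allocation. A minimal sketch of the index
arithmetic (hypothetical helper name, not the final patch):

```c
#include <assert.h>

/*
 * Advance the round-robin device index after a successful allocation,
 * wrapping modulo the number of devices.
 */
static unsigned int dmz_next_dev_idx(unsigned int last_idx,
				     unsigned int nr_devs)
{
	return (last_idx + 1) % nr_devs;
}
```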

Cheers,

Hannes

* Re: [PATCH 10/12] dm-zoned: support arbitrary number of devices
  2020-05-25  7:52     ` Hannes Reinecke
@ 2020-05-25  8:22       ` Damien Le Moal
  0 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2020-05-25  8:22 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/25 16:52, Hannes Reinecke wrote:
> On 5/25/20 4:36 AM, Damien Le Moal wrote:
>> On 2020/05/23 0:39, Hannes Reinecke wrote:
>>> Remove the hard-coded limit of two devices and support an unlimited
>>> number of additional zoned devices.
>>> With that we need to increase the device-mapper version number to
>>> 3.0.0 as we've modified the interface.
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> ---
>>>   drivers/md/dm-zoned-metadata.c |  68 +++++++++++-----------
>>>   drivers/md/dm-zoned-reclaim.c  |  28 ++++++---
>>>   drivers/md/dm-zoned-target.c   | 129 +++++++++++++++++++++++++----------------
>>>   drivers/md/dm-zoned.h          |   9 +--
>>>   4 files changed, 139 insertions(+), 95 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>>> index 5f44970a6187..87784e7785bc 100644
>>> --- a/drivers/md/dm-zoned-metadata.c
>>> +++ b/drivers/md/dm-zoned-metadata.c
>>> @@ -260,6 +260,11 @@ unsigned int dmz_zone_nr_sectors_shift(struct dmz_metadata *zmd)
>>>   	return zmd->zone_nr_sectors_shift;
>>>   }
>>>   
>>> +unsigned int dmz_nr_devs(struct dmz_metadata *zmd)
>>> +{
>>> +	return zmd->nr_devs;
>>> +}
>>
>> Is this helper really needed?
>>
> 
> Yes, in dm-zoned-reclaim.c

I meant to say: whoever needs to know the number of devices can just use
"zmd->nr_devs". My point was that no helper is needed for that.


> 
>>> +
>>>   unsigned int dmz_nr_zones(struct dmz_metadata *zmd)
>>>   {
>>>   	return zmd->nr_zones;
>>> @@ -270,24 +275,14 @@ unsigned int dmz_nr_chunks(struct dmz_metadata *zmd)
>>>   	return zmd->nr_chunks;
>>>   }
>>>   
>>> -unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd)
>>> +unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd, int idx)
>>>   {
>>> -	unsigned int nr_rnd_zones = 0;
>>> -	int i;
>>> -
>>> -	for (i = 0; i < zmd->nr_devs; i++)
>>> -		nr_rnd_zones += zmd->dev[i].nr_rnd;
>>> -	return nr_rnd_zones;
>>> +	return zmd->dev[idx].nr_rnd;
>>
>> AH. OK. So my comment on patch 8 is voided :)
>>
> Yeah, the patch arrangement could be improved; I'll see to roll both 
> changes into one patch.
> 
>>>   }
>>>   
>>> -unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd)
>>> +unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd, int idx)
>>>   {
>>> -	unsigned int nr_unmap_rnd_zones = 0;
>>> -	int i;
>>> -
>>> -	for (i = 0; i < zmd->nr_devs; i++)
>>> -		nr_unmap_rnd_zones += atomic_read(&zmd->dev[i].unmap_nr_rnd);
>>> -	return nr_unmap_rnd_zones;
>>> +	return atomic_read(&zmd->dev[idx].unmap_nr_rnd);
>>>   }
>>>   
>>>   unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd)
>>> @@ -300,24 +295,14 @@ unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd)
>>>   	return atomic_read(&zmd->unmap_nr_cache);
>>>   }
>>>   
>>> -unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd)
>>> +unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd, int idx)
>>>   {
>>> -	unsigned int nr_seq_zones = 0;
>>> -	int i;
>>> -
>>> -	for (i = 0; i < zmd->nr_devs; i++)
>>> -		nr_seq_zones += zmd->dev[i].nr_seq;
>>> -	return nr_seq_zones;
>>> +	return zmd->dev[idx].nr_seq;
>>>   }
>>>   
>>> -unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
>>> +unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd, int idx)
>>>   {
>>> -	unsigned int nr_unmap_seq_zones = 0;
>>> -	int i;
>>> -
>>> -	for (i = 0; i < zmd->nr_devs; i++)
>>> -		nr_unmap_seq_zones += atomic_read(&zmd->dev[i].unmap_nr_seq);
>>> -	return nr_unmap_seq_zones;
>>> +	return atomic_read(&zmd->dev[idx].unmap_nr_seq);
>>>   }
>>>   
>>>   static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
>>> @@ -1530,7 +1515,20 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
>>>   		 */
>>>   		zmd->sb[0].zone = dmz_get(zmd, 0);
>>>   
>>> -		zoned_dev = &zmd->dev[1];
>>> +		for (i = 1; i < zmd->nr_devs; i++) {
>>> +			zoned_dev = &zmd->dev[i];
>>> +
>>> +			ret = blkdev_report_zones(zoned_dev->bdev, 0,
>>> +						  BLK_ALL_ZONES,
>>> +						  dmz_init_zone, zoned_dev);
>>> +			if (ret < 0) {
>>> +				DMDEBUG("(%s): Failed to report zones, error %d",
>>> +					zmd->devname, ret);
>>> +				dmz_drop_zones(zmd);
>>> +				return ret;
>>> +			}
>>> +		}
>>> +		return 0;
>>>   	}
>>>   
>>>   	/*
>>> @@ -2921,10 +2919,14 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>>>   		      zmd->nr_data_zones, zmd->nr_chunks);
>>>   	dmz_zmd_debug(zmd, "    %u cache zones (%u unmapped)",
>>>   		      zmd->nr_cache, atomic_read(&zmd->unmap_nr_cache));
>>> -	dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
>>> -		      dmz_nr_rnd_zones(zmd), dmz_nr_unmap_rnd_zones(zmd));
>>> -	dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
>>> -		      dmz_nr_seq_zones(zmd), dmz_nr_unmap_seq_zones(zmd));
>>> +	for (i = 0; i < zmd->nr_devs; i++) {
>>> +		dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
>>> +			      dmz_nr_rnd_zones(zmd, i),
>>> +			      dmz_nr_unmap_rnd_zones(zmd, i));
>>> +		dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
>>> +			      dmz_nr_seq_zones(zmd, i),
>>> +			      dmz_nr_unmap_seq_zones(zmd, i));
>>> +	}
>>>   	dmz_zmd_debug(zmd, "  %u reserved sequential data zones",
>>>   		      zmd->nr_reserved_seq);
>>>   	dmz_zmd_debug(zmd, "Format:");
>>> diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
>>> index fba0d48e38a7..f2e053b5f2db 100644
>>> --- a/drivers/md/dm-zoned-reclaim.c
>>> +++ b/drivers/md/dm-zoned-reclaim.c
>>> @@ -442,15 +442,18 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
>>>   {
>>>   	struct dmz_metadata *zmd = zrc->metadata;
>>>   	unsigned int nr_cache = dmz_nr_cache_zones(zmd);
>>> -	unsigned int nr_rnd = dmz_nr_rnd_zones(zmd);
>>> -	unsigned int nr_unmap, nr_zones;
>>> +	unsigned int nr_unmap = 0, nr_zones = 0;
>>>   
>>>   	if (nr_cache) {
>>>   		nr_zones = nr_cache;
>>>   		nr_unmap = dmz_nr_unmap_cache_zones(zmd);
>>>   	} else {
>>> -		nr_zones = nr_rnd;
>>> -		nr_unmap = dmz_nr_unmap_rnd_zones(zmd);
>>> +		int i;
>>> +
>>> +		for (i = 0; i < dmz_nr_devs(zmd); i++) {
>>> +			nr_zones += dmz_nr_rnd_zones(zmd, i);
>>
>> Maybe not... We could keep constant totals in zmd to avoid this.
>>
>>> +			nr_unmap += dmz_nr_unmap_rnd_zones(zmd, i);
>>> +		}
>>>   	}
>>>   	return nr_unmap * 100 / nr_zones;
>>>   }
>>> @@ -460,7 +463,11 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
>>>    */
>>>   static bool dmz_should_reclaim(struct dmz_reclaim *zrc, unsigned int p_unmap)
>>>   {
>>> -	unsigned int nr_reclaim = dmz_nr_rnd_zones(zrc->metadata);
>>> +	int i;
>>> +	unsigned int nr_reclaim = 0;
>>> +
>>> +	for (i = 0; i < dmz_nr_devs(zrc->metadata); i++)
>>> +		nr_reclaim += dmz_nr_rnd_zones(zrc->metadata, i);
>>>   
>>>   	if (dmz_nr_cache_zones(zrc->metadata))
>>>   		nr_reclaim += dmz_nr_cache_zones(zrc->metadata);
>>> @@ -487,8 +494,8 @@ static void dmz_reclaim_work(struct work_struct *work)
>>>   {
>>>   	struct dmz_reclaim *zrc = container_of(work, struct dmz_reclaim, work.work);
>>>   	struct dmz_metadata *zmd = zrc->metadata;
>>> -	unsigned int p_unmap;
>>> -	int ret;
>>> +	unsigned int p_unmap, nr_unmap_rnd = 0, nr_rnd = 0;
>>> +	int ret, i;
>>>   
>>>   	if (dmz_dev_is_dying(zmd))
>>>   		return;
>>> @@ -513,14 +520,17 @@ static void dmz_reclaim_work(struct work_struct *work)
>>>   		zrc->kc_throttle.throttle = min(75U, 100U - p_unmap / 2);
>>>   	}
>>>   
>>> +	for (i = 0; i < dmz_nr_devs(zmd); i++) {
>>> +		nr_unmap_rnd += dmz_nr_unmap_rnd_zones(zmd, i);
>>> +		nr_rnd += dmz_nr_rnd_zones(zmd, i);
>>> +	}
>>>   	DMDEBUG("(%s): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
>>>   		dmz_metadata_label(zmd),
>>>   		zrc->kc_throttle.throttle,
>>>   		(dmz_target_idle(zrc) ? "Idle" : "Busy"),
>>>   		p_unmap, dmz_nr_unmap_cache_zones(zmd),
>>>   		dmz_nr_cache_zones(zmd),
>>> -		dmz_nr_unmap_rnd_zones(zmd),
>>> -		dmz_nr_rnd_zones(zmd));
>>> +		nr_unmap_rnd, nr_rnd);
>>>   
>>>   	ret = dmz_do_reclaim(zrc);
>>>   	if (ret && ret != -EINTR) {
> 
> In light of this I guess there is a benefit to having the counters
> in the metadata; that would indeed save us from having to export the
> number of devices.
> I'll give it a go with the next round.
> 
>>> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
>>> index bca9a611b8dd..f34fcc3f7cc6 100644
>>> --- a/drivers/md/dm-zoned-target.c
>>> +++ b/drivers/md/dm-zoned-target.c
>>> @@ -13,8 +13,6 @@
>>>   
>>>   #define DMZ_MIN_BIOS		8192
>>>   
>>> -#define DMZ_MAX_DEVS		2
>>> -
>>>   /*
>>>    * Zone BIO context.
>>>    */
>>> @@ -40,9 +38,10 @@ struct dm_chunk_work {
>>>    * Target descriptor.
>>>    */
>>>   struct dmz_target {
>>> -	struct dm_dev		*ddev[DMZ_MAX_DEVS];
>>> +	struct dm_dev		**ddev;
>>> +	unsigned int		nr_ddevs;
>>>   
>>> -	unsigned long		flags;
>>> +	unsigned int		flags;
>>>   
>>>   	/* Zoned block device information */
>>>   	struct dmz_dev		*dev;
>>> @@ -764,7 +763,7 @@ static void dmz_put_zoned_device(struct dm_target *ti)
>>>   	struct dmz_target *dmz = ti->private;
>>>   	int i;
>>>   
>>> -	for (i = 0; i < DMZ_MAX_DEVS; i++) {
>>> +	for (i = 0; i < dmz->nr_ddevs; i++) {
>>>   		if (dmz->ddev[i]) {
>>>   			dm_put_device(ti, dmz->ddev[i]);
>>>   			dmz->ddev[i] = NULL;
>>> @@ -777,21 +776,35 @@ static int dmz_fixup_devices(struct dm_target *ti)
>>>   	struct dmz_target *dmz = ti->private;
>>>   	struct dmz_dev *reg_dev, *zoned_dev;
>>>   	struct request_queue *q;
>>> +	sector_t zone_nr_sectors = 0;
>>> +	int i;
>>>   
>>>   	/*
>>> -	 * When we have two devices, the first one must be a regular block
>>> -	 * device and the second a zoned block device.
>>> +	 * When we have more than one device, the first one must be a
>>> +	 * regular block device and the others zoned block devices.
>>>   	 */
>>> -	if (dmz->ddev[0] && dmz->ddev[1]) {
>>> +	if (dmz->nr_ddevs > 1) {
>>>   		reg_dev = &dmz->dev[0];
>>>   		if (!(reg_dev->flags & DMZ_BDEV_REGULAR)) {
>>>   			ti->error = "Primary disk is not a regular device";
>>>   			return -EINVAL;
>>>   		}
>>> -		zoned_dev = &dmz->dev[1];
>>> -		if (zoned_dev->flags & DMZ_BDEV_REGULAR) {
>>> -			ti->error = "Secondary disk is not a zoned device";
>>> -			return -EINVAL;
>>> +		for (i = 1; i < dmz->nr_ddevs; i++) {
>>> +			zoned_dev = &dmz->dev[i];
>>> +			if (zoned_dev->flags & DMZ_BDEV_REGULAR) {
>>> +				ti->error = "Secondary disk is not a zoned device";
>>> +				return -EINVAL;
>>> +			}
>>> +			q = bdev_get_queue(zoned_dev->bdev);
>>
>> Maybe add a comment here that we must check that all zoned devices have the
>> same zone size?
>>
> 
> I thought it was self-explanatory, but maybe not.
> Will be adding it.

It is indeed not too hard to figure out. But a plain English sentence is nice too :)

> 
> Cheers,
> 
> Hannes
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 03/12] dm-zoned: use on-stack superblock for tertiary devices
  2020-05-25  2:09   ` Damien Le Moal
  2020-05-25  7:41     ` Hannes Reinecke
@ 2020-05-26  8:25     ` Hannes Reinecke
  2020-05-26  8:48       ` Damien Le Moal
  1 sibling, 1 reply; 34+ messages in thread
From: Hannes Reinecke @ 2020-05-26  8:25 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: dm-devel, Mike Snitzer

On 5/25/20 4:09 AM, Damien Le Moal wrote:
> On 2020/05/23 0:39, Hannes Reinecke wrote:
>> Checking the teriary superblock just consists of validating UUIDs,
> 
> s/teriary/tertiary
> 
>> crcs, and the generation number; it doesn't have contents which
>> would be required during the actual operation.
>> So we should use an on-stack superblock and avoid having to store
>> it together with the 'real' superblocks.
> 
> ...a temporary in-memory superblock allocation...
> 
> The entire structure should not be on stack... see below.
> 
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/md/dm-zoned-metadata.c | 98 +++++++++++++++++++++++-------------------
>>   1 file changed, 53 insertions(+), 45 deletions(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index 3da6702bb1ae..b70a988fa771 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
[ .. ]
>> @@ -1326,18 +1327,32 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>>   		      "Using super block %u (gen %llu)",
>>   		      zmd->mblk_primary, zmd->sb_gen);
>>   
>> -	if ((zmd->sb_version > 1) && zmd->sb[2].zone) {
>> -		zmd->sb[2].block = dmz_start_block(zmd, zmd->sb[2].zone);
>> -		zmd->sb[2].dev = dmz_zone_to_dev(zmd, zmd->sb[2].zone);
>> -		ret = dmz_get_sb(zmd, 2);
>> -		if (ret) {
>> -			dmz_dev_err(zmd->sb[2].dev,
>> -				    "Read tertiary super block failed");
>> -			return ret;
>> +	if (zmd->sb_version > 1) {
>> +		int i;
>> +
>> +		for (i = 1; i < zmd->nr_devs; i++) {
>> +			struct dmz_sb sb;
> 
> I would rather have dmz_get_sb() allocate this struct than have it on stack...
> It is not big, but still. To be symmetric, we can add dmz_put_sb() for freeing it.
> 
While I do agree about not having it on the stack, having dmz_get_sb()
return the structure would require (yet another) overhaul of the
main metadata structure, which currently has the primary and secondary
superblocks embedded.
And I would argue to keep it that way, as the primary and secondary 
superblocks are essential to the actual operation. So allocating them 
separately would mean yet another indirection to get to them.
At the same time, any tertiary superblock is just used for validation
during startup, and not referenced anywhere afterwards.
So using kzalloc() here and freeing them after checking is fine.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


* Re: [PATCH 03/12] dm-zoned: use on-stack superblock for tertiary devices
  2020-05-26  8:25     ` Hannes Reinecke
@ 2020-05-26  8:48       ` Damien Le Moal
  0 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2020-05-26  8:48 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Mike Snitzer

On 2020/05/26 17:25, Hannes Reinecke wrote:
> On 5/25/20 4:09 AM, Damien Le Moal wrote:
>> On 2020/05/23 0:39, Hannes Reinecke wrote:
>>> Checking the teriary superblock just consists of validating UUIDs,
>>
>> s/teriary/tertiary
>>
>>> crcs, and the generation number; it doesn't have contents which
>>> would be required during the actual operation.
>>> So we should use an on-stack superblock and avoid having to store
>>> it together with the 'real' superblocks.
>>
>> ...a temporary in-memory superblock allocation...
>>
>> The entire structure should not be on stack... see below.
>>
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> ---
>>>   drivers/md/dm-zoned-metadata.c | 98 +++++++++++++++++++++++-------------------
>>>   1 file changed, 53 insertions(+), 45 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>>> index 3da6702bb1ae..b70a988fa771 100644
>>> --- a/drivers/md/dm-zoned-metadata.c
>>> +++ b/drivers/md/dm-zoned-metadata.c
> [ .. ]
>>> @@ -1326,18 +1327,32 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>>>   		      "Using super block %u (gen %llu)",
>>>   		      zmd->mblk_primary, zmd->sb_gen);
>>>   
>>> -	if ((zmd->sb_version > 1) && zmd->sb[2].zone) {
>>> -		zmd->sb[2].block = dmz_start_block(zmd, zmd->sb[2].zone);
>>> -		zmd->sb[2].dev = dmz_zone_to_dev(zmd, zmd->sb[2].zone);
>>> -		ret = dmz_get_sb(zmd, 2);
>>> -		if (ret) {
>>> -			dmz_dev_err(zmd->sb[2].dev,
>>> -				    "Read tertiary super block failed");
>>> -			return ret;
>>> +	if (zmd->sb_version > 1) {
>>> +		int i;
>>> +
>>> +		for (i = 1; i < zmd->nr_devs; i++) {
>>> +			struct dmz_sb sb;
>>
>> I would rather have dmz_get_sb() allocate this struct than have it on stack...
>> It is not big, but still. To be symmetric, we can add dmz_put_sb() for freeing it.
>>
> While I do agree about not having it on the stack, having dmz_get_sb()
> return the structure would require (yet another) overhaul of the
> main metadata structure, which currently has the primary and secondary
> superblocks embedded.
> And I would argue to keep it that way, as the primary and secondary 
> superblocks are essential to the actual operation. So allocating them 
> separately would mean yet another indirection to get to them.
> At the same time, any tertiary superblock is just used for validation
> during startup, and not referenced anywhere afterwards.
> So using kzalloc() here and freeing them after checking is fine.

OK. Works for me.

> 
> Cheers,
> 
> Hannes
> 


-- 
Damien Le Moal
Western Digital Research



2020-05-22 15:38 [PATCH RFC 00/12] dm-zoned: multi-device support Hannes Reinecke
2020-05-22 15:38 ` [PATCH 01/12] dm-zoned: add debugging message for reading superblocks Hannes Reinecke
2020-05-25  1:54   ` Damien Le Moal
2020-05-22 15:38 ` [PATCH 02/12] dm-zoned: convert to xarray Hannes Reinecke
2020-05-25  2:01   ` Damien Le Moal
2020-05-25  7:40     ` Hannes Reinecke
2020-05-22 15:38 ` [PATCH 03/12] dm-zoned: use on-stack superblock for tertiary devices Hannes Reinecke
2020-05-25  2:09   ` Damien Le Moal
2020-05-25  7:41     ` Hannes Reinecke
2020-05-26  8:25     ` Hannes Reinecke
2020-05-26  8:48       ` Damien Le Moal
2020-05-22 15:38 ` [PATCH 04/12] dm-zoned: secondary superblock must reside on the same devices than primary superblock Hannes Reinecke
2020-05-25  2:10   ` Damien Le Moal
2020-05-22 15:38 ` [PATCH 05/12] dm-zoned: add device pointer to struct dm_zone Hannes Reinecke
2020-05-25  2:15   ` Damien Le Moal
2020-05-25  7:42     ` Hannes Reinecke
2020-05-22 15:38 ` [PATCH 06/12] dm-zoned: add metadata pointer to struct dmz_dev Hannes Reinecke
2020-05-25  2:17   ` Damien Le Moal
2020-05-22 15:38 ` [PATCH 07/12] dm-zoned: add a 'reserved' zone flag Hannes Reinecke
2020-05-25  2:18   ` Damien Le Moal
2020-05-22 15:38 ` [PATCH 08/12] dm-zoned: move random and sequential zones into struct dmz_dev Hannes Reinecke
2020-05-25  2:27   ` Damien Le Moal
2020-05-25  7:47     ` Hannes Reinecke
2020-05-22 15:38 ` [PATCH 09/12] dm-zoned: improve logging messages for reclaim Hannes Reinecke
2020-05-25  2:28   ` Damien Le Moal
2020-05-22 15:38 ` [PATCH 10/12] dm-zoned: support arbitrary number of devices Hannes Reinecke
2020-05-25  2:36   ` Damien Le Moal
2020-05-25  7:52     ` Hannes Reinecke
2020-05-25  8:22       ` Damien Le Moal
2020-05-22 15:39 ` [PATCH 11/12] dm-zoned: round-robin load balancer for reclaiming zones Hannes Reinecke
2020-05-25  2:42   ` Damien Le Moal
2020-05-25  7:53     ` Hannes Reinecke
2020-05-22 15:39 ` [PATCH 12/12] dm-zoned: per-device reclaim Hannes Reinecke
2020-05-25  2:46   ` Damien Le Moal
