All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv2 00/15] dm-zoned: multi-device support
@ 2020-05-27  6:22 Hannes Reinecke
  2020-05-27  6:22 ` [PATCH 01/15] dm-zoned: add debugging message for reading superblocks Hannes Reinecke
                   ` (16 more replies)
  0 siblings, 17 replies; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

Hi all,

here's the second version of my patchset to support multiple zoned
drives with dm-zoned.
This patchset:
- Converts the zone array to using xarray for better scalability
- Separates out shared structures into per-device structure
- Enforce drive-locality for allocating and reclaiming zones
- Lifts the restriction of 2 devices to handle an arbitrary number
  of drives.

This gives me a near-perfect scalability by increasing the write
speed from 150MB/s (for a cache and one zoned drive) to 300MB/s
(for a cache and two zoned drives).

Changes to v1:
- Include reviews from Damien
- Reshuffle patches

Hannes Reinecke (15):
  dm-zoned: add debugging message for reading superblocks
  dm-zoned: secondary superblock must reside on the same devices than
    primary superblock
  dm-zoned: improve logging messages for reclaim
  dm-zoned: add a 'reserved' zone flag
  dm-zoned: convert to xarray
  dm-zoned: temporary superblock for tertiary devices
  dm-zoned: add device pointer to struct dm_zone
  dm-zoned: add metadata pointer to struct dmz_dev
  dm-zoned: allocate dm devices dynamically
  dm-zoned: per-device reclaim
  dm-zoned: move random and sequential zones into struct dmz_dev
  dm-zoned: support arbitrary number of devices
  dm-zoned: allocate zone by device index
  dm-zoned: select reclaim zone based on device index
  dm-zoned: prefer full zones for reclaim

 drivers/md/dm-zoned-metadata.c | 448 ++++++++++++++++++++++++-----------------
 drivers/md/dm-zoned-reclaim.c  |  95 +++++----
 drivers/md/dm-zoned-target.c   | 169 ++++++++++------
 drivers/md/dm-zoned.h          |  77 ++++---
 4 files changed, 481 insertions(+), 308 deletions(-)

-- 
2.16.4

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/15] dm-zoned: add debugging message for reading superblocks
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-27  6:22 ` [PATCH 02/15] dm-zoned: secondary superblock must reside on the same devices than primary superblock Hannes Reinecke
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/md/dm-zoned-metadata.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 4a2e351365c5..ef1524d5928a 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1105,6 +1105,10 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
  */
 static int dmz_read_sb(struct dmz_metadata *zmd, unsigned int set)
 {
+	dmz_zmd_debug(zmd, "read superblock set %d dev %s block %llu",
+		      set, zmd->sb[set].dev->name,
+		      zmd->sb[set].block);
+
 	return dmz_rdwr_block(zmd->sb[set].dev, REQ_OP_READ,
 			      zmd->sb[set].block, zmd->sb[set].mblk->page);
 }
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 02/15] dm-zoned: secondary superblock must reside on the same devices than primary superblock
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
  2020-05-27  6:22 ` [PATCH 01/15] dm-zoned: add debugging message for reading superblocks Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-27  6:22 ` [PATCH 03/15] dm-zoned: improve logging messages for reclaim Hannes Reinecke
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

The secondary superblock must reside on the same device than the
primary superblock, so there's no need to re-calculate the device.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/md/dm-zoned-metadata.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index ef1524d5928a..043ed882970a 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1135,7 +1135,7 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
 	/* Bad first super block: search for the second one */
 	zmd->sb[1].block = zmd->sb[0].block + zone_nr_blocks;
 	zmd->sb[1].zone = zmd->sb[0].zone + 1;
-	zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
+	zmd->sb[1].dev = zmd->sb[0].dev;
 	for (i = 0; i < zmd->nr_rnd_zones - 1; i++) {
 		if (dmz_read_sb(zmd, 1) != 0)
 			break;
@@ -1144,7 +1144,6 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
 			return 0;
 		}
 		zmd->sb[1].block += zone_nr_blocks;
-		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone + i);
 	}
 
 	dmz_free_mblock(zmd, mblk);
@@ -1263,7 +1262,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 		if (!zmd->sb[1].zone)
 			zmd->sb[1].zone = zmd->sb[0].zone + zmd->nr_meta_zones;
 		zmd->sb[1].block = dmz_start_block(zmd, zmd->sb[1].zone);
-		zmd->sb[1].dev = dmz_zone_to_dev(zmd, zmd->sb[1].zone);
+		zmd->sb[1].dev = zmd->sb[0].dev;
 		ret = dmz_get_sb(zmd, 1);
 	} else
 		ret = dmz_lookup_secondary_sb(zmd);
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 03/15] dm-zoned: improve logging messages for reclaim
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
  2020-05-27  6:22 ` [PATCH 01/15] dm-zoned: add debugging message for reading superblocks Hannes Reinecke
  2020-05-27  6:22 ` [PATCH 02/15] dm-zoned: secondary superblock must reside on the same devices than primary superblock Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-27  6:22 ` [PATCH 04/15] dm-zoned: add a 'reserved' zone flag Hannes Reinecke
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

Instead of just reporting the errno this patch adds some more
verbose debugging message in the reclaim path.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/md/dm-zoned-reclaim.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index 571bc1d41bab..fd4d47dfcea1 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -371,8 +371,11 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
 
 	/* Get a data zone */
 	dzone = dmz_get_zone_for_reclaim(zmd, dmz_target_idle(zrc));
-	if (!dzone)
+	if (!dzone) {
+		DMDEBUG("(%s): No zone found to reclaim",
+			dmz_metadata_label(zmd));
 		return -EBUSY;
+	}
 
 	start = jiffies;
 	if (dmz_is_cache(dzone) || dmz_is_rnd(dzone)) {
@@ -416,6 +419,12 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
 	}
 out:
 	if (ret) {
+		if (ret == -EINTR)
+			DMDEBUG("(%s): reclaim zone %u interrupted",
+				dmz_metadata_label(zmd), rzone->id);
+		else
+			DMDEBUG("(%s): Failed to reclaim zone %u, err %d",
+				dmz_metadata_label(zmd), rzone->id, ret);
 		dmz_unlock_zone_reclaim(dzone);
 		return ret;
 	}
@@ -519,8 +528,6 @@ static void dmz_reclaim_work(struct work_struct *work)
 
 	ret = dmz_do_reclaim(zrc);
 	if (ret && ret != -EINTR) {
-		DMDEBUG("(%s): Reclaim error %d",
-			dmz_metadata_label(zmd), ret);
 		if (!dmz_check_dev(zmd))
 			return;
 	}
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 04/15] dm-zoned: add a 'reserved' zone flag
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (2 preceding siblings ...)
  2020-05-27  6:22 ` [PATCH 03/15] dm-zoned: improve logging messages for reclaim Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-27  6:22 ` [PATCH 05/15] dm-zoned: convert to xarray Hannes Reinecke
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

Instead of counting the number of reserved zones in dmz_free_zone()
we should mark the zone as 'reserved' during allocation and simplify
dmz_free_zone().

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/md/dm-zoned-metadata.c | 4 ++--
 drivers/md/dm-zoned.h          | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 043ed882970a..0982ab1758a6 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1743,6 +1743,7 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 			atomic_inc(&zmd->unmap_nr_rnd);
 		} else if (atomic_read(&zmd->nr_reserved_seq_zones) < zmd->nr_reserved_seq) {
 			list_add_tail(&dzone->link, &zmd->reserved_seq_zones_list);
+			set_bit(DMZ_RESERVED, &dzone->flags);
 			atomic_inc(&zmd->nr_reserved_seq_zones);
 			zmd->nr_seq--;
 		} else {
@@ -2168,8 +2169,7 @@ void dmz_free_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 	} else if (dmz_is_rnd(zone)) {
 		list_add_tail(&zone->link, &zmd->unmap_rnd_list);
 		atomic_inc(&zmd->unmap_nr_rnd);
-	} else if (atomic_read(&zmd->nr_reserved_seq_zones) <
-		   zmd->nr_reserved_seq) {
+	} else if (dmz_is_reserved(zone)) {
 		list_add_tail(&zone->link, &zmd->reserved_seq_zones_list);
 		atomic_inc(&zmd->nr_reserved_seq_zones);
 	} else {
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index 8083607b9535..3451b5a768b4 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -123,6 +123,7 @@ enum {
 	DMZ_META,
 	DMZ_DATA,
 	DMZ_BUF,
+	DMZ_RESERVED,
 
 	/* Zone internal state */
 	DMZ_RECLAIM,
@@ -140,6 +141,7 @@ enum {
 #define dmz_is_offline(z)	test_bit(DMZ_OFFLINE, &(z)->flags)
 #define dmz_is_readonly(z)	test_bit(DMZ_READ_ONLY, &(z)->flags)
 #define dmz_in_reclaim(z)	test_bit(DMZ_RECLAIM, &(z)->flags)
+#define dmz_is_reserved(z)	test_bit(DMZ_RESERVED, &(z)->flags)
 #define dmz_seq_write_err(z)	test_bit(DMZ_SEQ_WRITE_ERR, &(z)->flags)
 #define dmz_reclaim_should_terminate(z) \
 				test_bit(DMZ_RECLAIM_TERMINATE, &(z)->flags)
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 05/15] dm-zoned: convert to xarray
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (3 preceding siblings ...)
  2020-05-27  6:22 ` [PATCH 04/15] dm-zoned: add a 'reserved' zone flag Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-28  2:46   ` Damien Le Moal
  2020-05-27  6:22 ` [PATCH 06/15] dm-zoned: temporary superblock for tertiary devices Hannes Reinecke
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

The zones array is getting really large, and large arrays
tend to wreak havoc with the CPU caches.
So convert it to xarray to become more cache friendly.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 120 ++++++++++++++++++++++++++++++-----------
 1 file changed, 88 insertions(+), 32 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 0982ab1758a6..839f9078806d 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -172,7 +172,7 @@ struct dmz_metadata {
 	unsigned int		nr_chunks;
 
 	/* Zone information array */
-	struct dm_zone		*zones;
+	struct xarray		zones;
 
 	struct dmz_sb		sb[3];
 	unsigned int		mblk_primary;
@@ -327,6 +327,30 @@ unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
 	return atomic_read(&zmd->unmap_nr_seq);
 }
 
+static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
+{
+	return xa_load(&zmd->zones, zone_id);
+}
+
+static struct dm_zone *dmz_insert(struct dmz_metadata *zmd,
+				  unsigned int zone_id)
+{
+	struct dm_zone *zone = kzalloc(sizeof(struct dm_zone), GFP_KERNEL);
+
+	if (!zone)
+		return ERR_PTR(-ENOMEM);
+
+	if (xa_insert(&zmd->zones, zone_id, zone, GFP_KERNEL))
+		return ERR_PTR(-EBUSY);
+
+	INIT_LIST_HEAD(&zone->link);
+	atomic_set(&zone->refcount, 0);
+	zone->id = zone_id;
+	zone->chunk = DMZ_MAP_UNMAPPED;
+
+	return zone;
+}
+
 const char *dmz_metadata_label(struct dmz_metadata *zmd)
 {
 	return (const char *)zmd->label;
@@ -1122,6 +1146,7 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
 {
 	unsigned int zone_nr_blocks = zmd->zone_nr_blocks;
 	struct dmz_mblock *mblk;
+	unsigned int zone_id = zmd->sb[0].zone->id;
 	int i;
 
 	/* Allocate a block */
@@ -1134,16 +1159,15 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
 
 	/* Bad first super block: search for the second one */
 	zmd->sb[1].block = zmd->sb[0].block + zone_nr_blocks;
-	zmd->sb[1].zone = zmd->sb[0].zone + 1;
+	zmd->sb[1].zone = dmz_get(zmd, zone_id + 1);
 	zmd->sb[1].dev = zmd->sb[0].dev;
-	for (i = 0; i < zmd->nr_rnd_zones - 1; i++) {
+	for (i = 1; i < zmd->nr_rnd_zones; i++) {
 		if (dmz_read_sb(zmd, 1) != 0)
 			break;
-		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC) {
-			zmd->sb[1].zone += i;
+		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC)
 			return 0;
-		}
 		zmd->sb[1].block += zone_nr_blocks;
+		zmd->sb[1].zone = dmz_get(zmd, zone_id + i);
 	}
 
 	dmz_free_mblock(zmd, mblk);
@@ -1259,8 +1283,12 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 	/* Read and check secondary super block */
 	if (ret == 0) {
 		sb_good[0] = true;
-		if (!zmd->sb[1].zone)
-			zmd->sb[1].zone = zmd->sb[0].zone + zmd->nr_meta_zones;
+		if (!zmd->sb[1].zone) {
+			unsigned int zone_id =
+				zmd->sb[0].zone->id + zmd->nr_meta_zones;
+
+			zmd->sb[1].zone = dmz_get(zmd, zone_id);
+		}
 		zmd->sb[1].block = dmz_start_block(zmd, zmd->sb[1].zone);
 		zmd->sb[1].dev = zmd->sb[0].dev;
 		ret = dmz_get_sb(zmd, 1);
@@ -1341,7 +1369,11 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
 	struct dmz_metadata *zmd = data;
 	struct dmz_dev *dev = zmd->nr_devs > 1 ? &zmd->dev[1] : &zmd->dev[0];
 	int idx = num + dev->zone_offset;
-	struct dm_zone *zone = &zmd->zones[idx];
+	struct dm_zone *zone;
+
+	zone = dmz_insert(zmd, idx);
+	if (IS_ERR(zone))
+		return PTR_ERR(zone);
 
 	if (blkz->len != zmd->zone_nr_sectors) {
 		if (zmd->sb_version > 1) {
@@ -1353,11 +1385,6 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
 		return -ENXIO;
 	}
 
-	INIT_LIST_HEAD(&zone->link);
-	atomic_set(&zone->refcount, 0);
-	zone->id = idx;
-	zone->chunk = DMZ_MAP_UNMAPPED;
-
 	switch (blkz->type) {
 	case BLK_ZONE_TYPE_CONVENTIONAL:
 		set_bit(DMZ_RND, &zone->flags);
@@ -1397,18 +1424,17 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
 	return 0;
 }
 
-static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
+static int dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
 {
 	int idx;
 	sector_t zone_offset = 0;
 
 	for(idx = 0; idx < dev->nr_zones; idx++) {
-		struct dm_zone *zone = &zmd->zones[idx];
+		struct dm_zone *zone;
 
-		INIT_LIST_HEAD(&zone->link);
-		atomic_set(&zone->refcount, 0);
-		zone->id = idx;
-		zone->chunk = DMZ_MAP_UNMAPPED;
+		zone = dmz_insert(zmd, idx);
+		if (IS_ERR(zone))
+			return PTR_ERR(zone);
 		set_bit(DMZ_CACHE, &zone->flags);
 		zone->wp_block = 0;
 		zmd->nr_cache_zones++;
@@ -1420,6 +1446,7 @@ static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
 		}
 		zone_offset += zmd->zone_nr_sectors;
 	}
+	return 0;
 }
 
 /*
@@ -1427,8 +1454,15 @@ static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
  */
 static void dmz_drop_zones(struct dmz_metadata *zmd)
 {
-	kfree(zmd->zones);
-	zmd->zones = NULL;
+	int idx;
+
+	for(idx = 0; idx < zmd->nr_zones; idx++) {
+		struct dm_zone *zone = xa_load(&zmd->zones, idx);
+
+		kfree(zone);
+		xa_erase(&zmd->zones, idx);
+	}
+	xa_destroy(&zmd->zones);
 }
 
 /*
@@ -1460,20 +1494,25 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
 		DMERR("(%s): No zones found", zmd->devname);
 		return -ENXIO;
 	}
-	zmd->zones = kcalloc(zmd->nr_zones, sizeof(struct dm_zone), GFP_KERNEL);
-	if (!zmd->zones)
-		return -ENOMEM;
+	xa_init(&zmd->zones);
 
 	DMDEBUG("(%s): Using %zu B for zone information",
 		zmd->devname, sizeof(struct dm_zone) * zmd->nr_zones);
 
 	if (zmd->nr_devs > 1) {
-		dmz_emulate_zones(zmd, &zmd->dev[0]);
+		ret = dmz_emulate_zones(zmd, &zmd->dev[0]);
+		if (ret < 0) {
+			DMDEBUG("(%s): Failed to emulate zones, error %d",
+				zmd->devname, ret);
+			dmz_drop_zones(zmd);
+			return ret;
+		}
+
 		/*
 		 * Primary superblock zone is always at zone 0 when multiple
 		 * drives are present.
 		 */
-		zmd->sb[0].zone = &zmd->zones[0];
+		zmd->sb[0].zone = dmz_get(zmd, 0);
 
 		zoned_dev = &zmd->dev[1];
 	}
@@ -1576,11 +1615,6 @@ static int dmz_handle_seq_write_err(struct dmz_metadata *zmd,
 	return 0;
 }
 
-static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
-{
-	return &zmd->zones[zone_id];
-}
-
 /*
  * Reset a zone write pointer.
  */
@@ -1662,6 +1696,11 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 		}
 
 		dzone = dmz_get(zmd, dzone_id);
+		if (!dzone) {
+			dmz_zmd_err(zmd, "Chunk %u mapping: data zone %u not present",
+				    chunk, dzone_id);
+			return -EIO;
+		}
 		set_bit(DMZ_DATA, &dzone->flags);
 		dzone->chunk = chunk;
 		dmz_get_zone_weight(zmd, dzone);
@@ -1685,6 +1724,11 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 		}
 
 		bzone = dmz_get(zmd, bzone_id);
+		if (!bzone) {
+			dmz_zmd_err(zmd, "Chunk %u mapping: buffer zone %u not present",
+				    chunk, bzone_id);
+			return -EIO;
+		}
 		if (!dmz_is_rnd(bzone) && !dmz_is_cache(bzone)) {
 			dmz_zmd_err(zmd, "Chunk %u mapping: invalid buffer zone %u",
 				    chunk, bzone_id);
@@ -1715,6 +1759,8 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 	 */
 	for (i = 0; i < zmd->nr_zones; i++) {
 		dzone = dmz_get(zmd, i);
+		if (!dzone)
+			continue;
 		if (dmz_is_meta(dzone))
 			continue;
 		if (dmz_is_offline(dzone))
@@ -1978,6 +2024,10 @@ struct dm_zone *dmz_get_chunk_mapping(struct dmz_metadata *zmd, unsigned int chu
 	} else {
 		/* The chunk is already mapped: get the mapping zone */
 		dzone = dmz_get(zmd, dzone_id);
+		if (!dzone) {
+			dzone = ERR_PTR(-EIO);
+			goto out;
+		}
 		if (dzone->chunk != chunk) {
 			dzone = ERR_PTR(-EIO);
 			goto out;
@@ -2794,6 +2844,12 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
 	/* Set metadata zones starting from sb_zone */
 	for (i = 0; i < zmd->nr_meta_zones << 1; i++) {
 		zone = dmz_get(zmd, zmd->sb[0].zone->id + i);
+		if (!zone) {
+			dmz_zmd_err(zmd,
+				    "metadata zone %u not present", i);
+			ret = -ENXIO;
+			goto err;
+		}
 		if (!dmz_is_rnd(zone) && !dmz_is_cache(zone)) {
 			dmz_zmd_err(zmd,
 				    "metadata zone %d is not random", i);
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 06/15] dm-zoned: temporary superblock for tertiary devices
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (4 preceding siblings ...)
  2020-05-27  6:22 ` [PATCH 05/15] dm-zoned: convert to xarray Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-28  2:54   ` Damien Le Moal
  2020-05-27  6:22 ` [PATCH 07/15] dm-zoned: add device pointer to struct dm_zone Hannes Reinecke
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

Checking the tertiary superblock just consists of validating UUIDs,
crcs, and the generation number; it doesn't have contents which
would be required during the actual operation.
So we should be allocating a temporary superblock when checking
tertiary devices and avoid having to store it together with the
'real' superblocks.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 104 ++++++++++++++++++++++-------------------
 1 file changed, 57 insertions(+), 47 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 839f9078806d..bb9ce72bf18c 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -174,7 +174,7 @@ struct dmz_metadata {
 	/* Zone information array */
 	struct xarray		zones;
 
-	struct dmz_sb		sb[3];
+	struct dmz_sb		sb[2];
 	unsigned int		mblk_primary;
 	unsigned int		sb_version;
 	u64			sb_gen;
@@ -1014,10 +1014,11 @@ int dmz_flush_metadata(struct dmz_metadata *zmd)
 /*
  * Check super block.
  */
-static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
+static int dmz_check_sb(struct dmz_metadata *zmd, struct dmz_sb *dsb,
+			bool tertiary)
 {
-	struct dmz_super *sb = zmd->sb[set].sb;
-	struct dmz_dev *dev = zmd->sb[set].dev;
+	struct dmz_super *sb = dsb->sb;
+	struct dmz_dev *dev = dsb->dev;
 	unsigned int nr_meta_zones, nr_data_zones;
 	u32 crc, stored_crc;
 	u64 gen;
@@ -1034,7 +1035,7 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
 			    DMZ_META_VER, zmd->sb_version);
 		return -EINVAL;
 	}
-	if ((zmd->sb_version < 1) && (set == 2)) {
+	if ((zmd->sb_version < 1) && tertiary) {
 		dmz_dev_err(dev, "Tertiary superblocks are not supported");
 		return -EINVAL;
 	}
@@ -1078,7 +1079,7 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
 			return -ENXIO;
 		}
 
-		if (set == 2) {
+		if (tertiary) {
 			/*
 			 * Generation number should be 0, but it doesn't
 			 * really matter if it isn't.
@@ -1127,14 +1128,13 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
 /*
  * Read the first or second super block from disk.
  */
-static int dmz_read_sb(struct dmz_metadata *zmd, unsigned int set)
+static int dmz_read_sb(struct dmz_metadata *zmd, struct dmz_sb *sb, int set)
 {
 	dmz_zmd_debug(zmd, "read superblock set %d dev %s block %llu",
-		      set, zmd->sb[set].dev->name,
-		      zmd->sb[set].block);
+		      set, sb->dev->name, sb->block);
 
-	return dmz_rdwr_block(zmd->sb[set].dev, REQ_OP_READ,
-			      zmd->sb[set].block, zmd->sb[set].mblk->page);
+	return dmz_rdwr_block(sb->dev, REQ_OP_READ,
+			      sb->block, sb->mblk->page);
 }
 
 /*
@@ -1162,7 +1162,7 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
 	zmd->sb[1].zone = dmz_get(zmd, zone_id + 1);
 	zmd->sb[1].dev = zmd->sb[0].dev;
 	for (i = 1; i < zmd->nr_rnd_zones; i++) {
-		if (dmz_read_sb(zmd, 1) != 0)
+		if (dmz_read_sb(zmd, &zmd->sb[1], 1) != 0)
 			break;
 		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC)
 			return 0;
@@ -1179,9 +1179,9 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
 }
 
 /*
- * Read the first or second super block from disk.
+ * Read a super block from disk.
  */
-static int dmz_get_sb(struct dmz_metadata *zmd, unsigned int set)
+static int dmz_get_sb(struct dmz_metadata *zmd, struct dmz_sb *sb, int set)
 {
 	struct dmz_mblock *mblk;
 	int ret;
@@ -1191,14 +1191,14 @@ static int dmz_get_sb(struct dmz_metadata *zmd, unsigned int set)
 	if (!mblk)
 		return -ENOMEM;
 
-	zmd->sb[set].mblk = mblk;
-	zmd->sb[set].sb = mblk->data;
+	sb->mblk = mblk;
+	sb->sb = mblk->data;
 
 	/* Read super block */
-	ret = dmz_read_sb(zmd, set);
+	ret = dmz_read_sb(zmd, sb, set);
 	if (ret) {
 		dmz_free_mblock(zmd, mblk);
-		zmd->sb[set].mblk = NULL;
+		sb->mblk = NULL;
 		return ret;
 	}
 
@@ -1272,13 +1272,13 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 	/* Read and check the primary super block */
 	zmd->sb[0].block = dmz_start_block(zmd, zmd->sb[0].zone);
 	zmd->sb[0].dev = dmz_zone_to_dev(zmd, zmd->sb[0].zone);
-	ret = dmz_get_sb(zmd, 0);
+	ret = dmz_get_sb(zmd, &zmd->sb[0], 0);
 	if (ret) {
 		dmz_dev_err(zmd->sb[0].dev, "Read primary super block failed");
 		return ret;
 	}
 
-	ret = dmz_check_sb(zmd, 0);
+	ret = dmz_check_sb(zmd, &zmd->sb[0], false);
 
 	/* Read and check secondary super block */
 	if (ret == 0) {
@@ -1291,7 +1291,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 		}
 		zmd->sb[1].block = dmz_start_block(zmd, zmd->sb[1].zone);
 		zmd->sb[1].dev = zmd->sb[0].dev;
-		ret = dmz_get_sb(zmd, 1);
+		ret = dmz_get_sb(zmd, &zmd->sb[1], 1);
 	} else
 		ret = dmz_lookup_secondary_sb(zmd);
 
@@ -1300,7 +1300,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 		return ret;
 	}
 
-	ret = dmz_check_sb(zmd, 1);
+	ret = dmz_check_sb(zmd, &zmd->sb[1], false);
 	if (ret == 0)
 		sb_good[1] = true;
 
@@ -1345,18 +1345,35 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 		      "Using super block %u (gen %llu)",
 		      zmd->mblk_primary, zmd->sb_gen);
 
-	if ((zmd->sb_version > 1) && zmd->sb[2].zone) {
-		zmd->sb[2].block = dmz_start_block(zmd, zmd->sb[2].zone);
-		zmd->sb[2].dev = dmz_zone_to_dev(zmd, zmd->sb[2].zone);
-		ret = dmz_get_sb(zmd, 2);
-		if (ret) {
-			dmz_dev_err(zmd->sb[2].dev,
-				    "Read tertiary super block failed");
-			return ret;
+	if (zmd->sb_version > 1) {
+		int i;
+		struct dmz_sb *sb;
+
+		sb = kzalloc(sizeof(struct dmz_sb), GFP_KERNEL);
+		if (!sb)
+			return -ENOMEM;
+		for (i = 1; i < zmd->nr_devs; i++) {
+			sb->block = 0;
+			sb->zone = dmz_get(zmd, zmd->dev[i].zone_offset);
+			sb->dev = &zmd->dev[i];
+			if (!dmz_is_meta(sb->zone)) {
+				dmz_dev_err(sb->dev,
+					    "Tertiary super block zone %u not marked as metadata zone",
+					    sb->zone->id);
+				return -EINVAL;
+			}
+			ret = dmz_get_sb(zmd, sb, i + 1);
+			if (ret) {
+				dmz_dev_err(sb->dev,
+					    "Read tertiary super block failed");
+				return ret;
+			}
+			ret = dmz_check_sb(zmd, sb, true);
+			dmz_free_mblock(zmd, sb->mblk);
+			if (ret == -EINVAL)
+				return ret;
 		}
-		ret = dmz_check_sb(zmd, 2);
-		if (ret == -EINVAL)
-			return ret;
+		kfree(sb);
 	}
 	return 0;
 }
@@ -1415,12 +1432,15 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
 				zmd->sb[0].zone = zone;
 			}
 		}
-		if (zmd->nr_devs > 1 && !zmd->sb[2].zone) {
-			/* Tertiary superblock zone */
-			zmd->sb[2].zone = zone;
+		if (zmd->nr_devs > 1 && num == 0) {
+			/*
+			 * Tertiary superblock zones are always at the
+			 * start of the zoned devices, so mark them
+			 * as metadata zone.
+			 */
+			set_bit(DMZ_META, &zone->flags);
 		}
 	}
-
 	return 0;
 }
 
@@ -2858,16 +2878,6 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
 		}
 		set_bit(DMZ_META, &zone->flags);
 	}
-	if (zmd->sb[2].zone) {
-		zone = dmz_get(zmd, zmd->sb[2].zone->id);
-		if (!zone) {
-			dmz_zmd_err(zmd,
-				    "Tertiary metadata zone not present");
-			ret = -ENXIO;
-			goto err;
-		}
-		set_bit(DMZ_META, &zone->flags);
-	}
 	/* Load mapping table */
 	ret = dmz_load_mapping(zmd);
 	if (ret)
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 07/15] dm-zoned: add device pointer to struct dm_zone
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (5 preceding siblings ...)
  2020-05-27  6:22 ` [PATCH 06/15] dm-zoned: temporary superblock for tertiary devices Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-28  2:57   ` Damien Le Moal
  2020-05-27  6:22 ` [PATCH 08/15] dm-zoned: add metadata pointer to struct dmz_dev Hannes Reinecke
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

Add a pointer to the containing device to struct dm_zone and
kill dmz_zone_to_dev().

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 39 ++++++++++-----------------------------
 drivers/md/dm-zoned-reclaim.c  | 13 +++++--------
 drivers/md/dm-zoned-target.c   |  2 +-
 drivers/md/dm-zoned.h          |  4 +++-
 4 files changed, 19 insertions(+), 39 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index bb9ce72bf18c..302443b2d885 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -229,16 +229,10 @@ struct dmz_metadata {
  */
 static unsigned int dmz_dev_zone_id(struct dmz_metadata *zmd, struct dm_zone *zone)
 {
-	unsigned int zone_id;
-
 	if (WARN_ON(!zone))
 		return 0;
 
-	zone_id = zone->id;
-	if (zmd->nr_devs > 1 &&
-	    (zone_id >= zmd->dev[1].zone_offset))
-		zone_id -= zmd->dev[1].zone_offset;
-	return zone_id;
+	return zone->id - zone->dev->zone_offset;
 }
 
 sector_t dmz_start_sect(struct dmz_metadata *zmd, struct dm_zone *zone)
@@ -255,18 +249,6 @@ sector_t dmz_start_block(struct dmz_metadata *zmd, struct dm_zone *zone)
 	return (sector_t)zone_id << zmd->zone_nr_blocks_shift;
 }
 
-struct dmz_dev *dmz_zone_to_dev(struct dmz_metadata *zmd, struct dm_zone *zone)
-{
-	if (WARN_ON(!zone))
-		return &zmd->dev[0];
-
-	if (zmd->nr_devs > 1 &&
-	    zone->id >= zmd->dev[1].zone_offset)
-		return &zmd->dev[1];
-
-	return &zmd->dev[0];
-}
-
 unsigned int dmz_zone_nr_blocks(struct dmz_metadata *zmd)
 {
 	return zmd->zone_nr_blocks;
@@ -333,7 +315,7 @@ static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
 }
 
 static struct dm_zone *dmz_insert(struct dmz_metadata *zmd,
-				  unsigned int zone_id)
+				  unsigned int zone_id, struct dmz_dev *dev)
 {
 	struct dm_zone *zone = kzalloc(sizeof(struct dm_zone), GFP_KERNEL);
 
@@ -347,6 +329,7 @@ static struct dm_zone *dmz_insert(struct dmz_metadata *zmd,
 	atomic_set(&zone->refcount, 0);
 	zone->id = zone_id;
 	zone->chunk = DMZ_MAP_UNMAPPED;
+	zone->dev = dev;
 
 	return zone;
 }
@@ -1271,7 +1254,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
 
 	/* Read and check the primary super block */
 	zmd->sb[0].block = dmz_start_block(zmd, zmd->sb[0].zone);
-	zmd->sb[0].dev = dmz_zone_to_dev(zmd, zmd->sb[0].zone);
+	zmd->sb[0].dev = zmd->sb[0].zone->dev;
 	ret = dmz_get_sb(zmd, &zmd->sb[0], 0);
 	if (ret) {
 		dmz_dev_err(zmd->sb[0].dev, "Read primary super block failed");
@@ -1388,7 +1371,7 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
 	int idx = num + dev->zone_offset;
 	struct dm_zone *zone;
 
-	zone = dmz_insert(zmd, idx);
+	zone = dmz_insert(zmd, idx, dev);
 	if (IS_ERR(zone))
 		return PTR_ERR(zone);
 
@@ -1452,7 +1435,7 @@ static int dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
 	for(idx = 0; idx < dev->nr_zones; idx++) {
 		struct dm_zone *zone;
 
-		zone = dmz_insert(zmd, idx);
+		zone = dmz_insert(zmd, idx, dev);
 		if (IS_ERR(zone))
 			return PTR_ERR(zone);
 		set_bit(DMZ_CACHE, &zone->flags);
@@ -1578,7 +1561,7 @@ static int dmz_update_zone_cb(struct blk_zone *blkz, unsigned int idx,
  */
 static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 {
-	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
+	struct dmz_dev *dev = zone->dev;
 	unsigned int noio_flag;
 	int ret;
 
@@ -1615,7 +1598,7 @@ static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 static int dmz_handle_seq_write_err(struct dmz_metadata *zmd,
 				    struct dm_zone *zone)
 {
-	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
+	struct dmz_dev *dev = zone->dev;
 	unsigned int wp = 0;
 	int ret;
 
@@ -1652,7 +1635,7 @@ static int dmz_reset_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 		return 0;
 
 	if (!dmz_is_empty(zone) || dmz_seq_write_err(zone)) {
-		struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
+		struct dmz_dev *dev = zone->dev;
 
 		ret = blkdev_zone_mgmt(dev->bdev, REQ_OP_ZONE_RESET,
 				       dmz_start_sect(zmd, zone),
@@ -2213,9 +2196,7 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
 		goto again;
 	}
 	if (dmz_is_meta(zone)) {
-		struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
-
-		dmz_dev_warn(dev, "Zone %u has metadata", zone->id);
+		dmz_zmd_warn(zmd, "Zone %u has metadata", zone->id);
 		zone = NULL;
 		goto again;
 	}
diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index fd4d47dfcea1..e9e3b730e258 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -58,7 +58,7 @@ static int dmz_reclaim_align_wp(struct dmz_reclaim *zrc, struct dm_zone *zone,
 				sector_t block)
 {
 	struct dmz_metadata *zmd = zrc->metadata;
-	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
+	struct dmz_dev *dev = zone->dev;
 	sector_t wp_block = zone->wp_block;
 	unsigned int nr_blocks;
 	int ret;
@@ -116,7 +116,6 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
 			    struct dm_zone *src_zone, struct dm_zone *dst_zone)
 {
 	struct dmz_metadata *zmd = zrc->metadata;
-	struct dmz_dev *src_dev, *dst_dev;
 	struct dm_io_region src, dst;
 	sector_t block = 0, end_block;
 	sector_t nr_blocks;
@@ -130,17 +129,15 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
 	else
 		end_block = dmz_zone_nr_blocks(zmd);
 	src_zone_block = dmz_start_block(zmd, src_zone);
-	src_dev = dmz_zone_to_dev(zmd, src_zone);
 	dst_zone_block = dmz_start_block(zmd, dst_zone);
-	dst_dev = dmz_zone_to_dev(zmd, dst_zone);
 
 	if (dmz_is_seq(dst_zone))
 		set_bit(DM_KCOPYD_WRITE_SEQ, &flags);
 
 	while (block < end_block) {
-		if (src_dev->flags & DMZ_BDEV_DYING)
+		if (src_zone->dev->flags & DMZ_BDEV_DYING)
 			return -EIO;
-		if (dst_dev->flags & DMZ_BDEV_DYING)
+		if (dst_zone->dev->flags & DMZ_BDEV_DYING)
 			return -EIO;
 
 		if (dmz_reclaim_should_terminate(src_zone))
@@ -163,11 +160,11 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
 				return ret;
 		}
 
-		src.bdev = src_dev->bdev;
+		src.bdev = src_zone->dev->bdev;
 		src.sector = dmz_blk2sect(src_zone_block + block);
 		src.count = dmz_blk2sect(nr_blocks);
 
-		dst.bdev = dst_dev->bdev;
+		dst.bdev = dst_zone->dev->bdev;
 		dst.sector = dmz_blk2sect(dst_zone_block + block);
 		dst.count = src.count;
 
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index 2770e293a97b..087dd4801663 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -123,7 +123,7 @@ static int dmz_submit_bio(struct dmz_target *dmz, struct dm_zone *zone,
 {
 	struct dmz_bioctx *bioctx =
 		dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
-	struct dmz_dev *dev = dmz_zone_to_dev(dmz->metadata, zone);
+	struct dmz_dev *dev = zone->dev;
 	struct bio *clone;
 
 	if (dev->flags & DMZ_BDEV_DYING)
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index 3451b5a768b4..316344bf07bd 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -80,6 +80,9 @@ struct dm_zone {
 	/* For listing the zone depending on its state */
 	struct list_head	link;
 
+	/* Device containing this zone */
+	struct dmz_dev		*dev;
+
 	/* Zone type and state */
 	unsigned long		flags;
 
@@ -190,7 +193,6 @@ const char *dmz_metadata_label(struct dmz_metadata *zmd);
 sector_t dmz_start_sect(struct dmz_metadata *zmd, struct dm_zone *zone);
 sector_t dmz_start_block(struct dmz_metadata *zmd, struct dm_zone *zone);
 unsigned int dmz_nr_chunks(struct dmz_metadata *zmd);
-struct dmz_dev *dmz_zone_to_dev(struct dmz_metadata *zmd, struct dm_zone *zone);
 
 bool dmz_check_dev(struct dmz_metadata *zmd);
 bool dmz_dev_is_dying(struct dmz_metadata *zmd);
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 08/15] dm-zoned: add metadata pointer to struct dmz_dev
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (6 preceding siblings ...)
  2020-05-27  6:22 ` [PATCH 07/15] dm-zoned: add device pointer to struct dm_zone Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-27  6:22 ` [PATCH 09/15] dm-zoned: allocate dm devices dynamically Hannes Reinecke
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

Add a metadata pointer to struct dmz_dev and use it as argument
for blkdev_report_zones() instead of the metadata itself.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/md/dm-zoned-metadata.c | 14 +++++++++-----
 drivers/md/dm-zoned.h          |  7 ++++---
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 302443b2d885..445760730d10 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1366,8 +1366,8 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
  */
 static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
 {
-	struct dmz_metadata *zmd = data;
-	struct dmz_dev *dev = zmd->nr_devs > 1 ? &zmd->dev[1] : &zmd->dev[0];
+	struct dmz_dev *dev = data;
+	struct dmz_metadata *zmd = dev->metadata;
 	int idx = num + dev->zone_offset;
 	struct dm_zone *zone;
 
@@ -1490,8 +1490,12 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
 
 	/* Allocate zone array */
 	zmd->nr_zones = 0;
-	for (i = 0; i < zmd->nr_devs; i++)
-		zmd->nr_zones += zmd->dev[i].nr_zones;
+	for (i = 0; i < zmd->nr_devs; i++) {
+		struct dmz_dev *dev = &zmd->dev[i];
+
+		dev->metadata = zmd;
+		zmd->nr_zones += dev->nr_zones;
+	}
 
 	if (!zmd->nr_zones) {
 		DMERR("(%s): No zones found", zmd->devname);
@@ -1526,7 +1530,7 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
 	 * first randomly writable zone.
 	 */
 	ret = blkdev_report_zones(zoned_dev->bdev, 0, BLK_ALL_ZONES,
-				  dmz_init_zone, zmd);
+				  dmz_init_zone, zoned_dev);
 	if (ret < 0) {
 		DMDEBUG("(%s): Failed to report zones, error %d",
 			zmd->devname, ret);
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index 316344bf07bd..983f5b5e9fa0 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -45,11 +45,15 @@
 #define dmz_bio_block(bio)	dmz_sect2blk((bio)->bi_iter.bi_sector)
 #define dmz_bio_blocks(bio)	dmz_sect2blk(bio_sectors(bio))
 
+struct dmz_metadata;
+struct dmz_reclaim;
+
 /*
  * Zoned block device information.
  */
 struct dmz_dev {
 	struct block_device	*bdev;
+	struct dmz_metadata	*metadata;
 
 	char			name[BDEVNAME_SIZE];
 	uuid_t			uuid;
@@ -170,9 +174,6 @@ enum {
 #define dmz_dev_debug(dev, format, args...)	\
 	DMDEBUG("(%s): " format, (dev)->name, ## args)
 
-struct dmz_metadata;
-struct dmz_reclaim;
-
 /*
  * Functions defined in dm-zoned-metadata.c
  */
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 09/15] dm-zoned: allocate dm devices dynamically
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (7 preceding siblings ...)
  2020-05-27  6:22 ` [PATCH 08/15] dm-zoned: add metadata pointer to struct dmz_dev Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-28  2:59   ` Damien Le Moal
  2020-05-27  6:22 ` [PATCH 10/15] dm-zoned: per-device reclaim Hannes Reinecke
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

Allocate dm devices dynamically to allow for expansion to
several devices.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-target.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index 087dd4801663..fec1b6a9c6f1 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -40,9 +40,10 @@ struct dm_chunk_work {
  * Target descriptor.
  */
 struct dmz_target {
-	struct dm_dev		*ddev[DMZ_MAX_DEVS];
+	struct dm_dev		**ddev;
+	unsigned int		nr_ddevs;
 
-	unsigned long		flags;
+	unsigned int		flags;
 
 	/* Zoned block device information */
 	struct dmz_dev		*dev;
@@ -836,12 +837,20 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		ti->error = "Unable to allocate the zoned target descriptor";
 		return -ENOMEM;
 	}
-	dmz->dev = kcalloc(2, sizeof(struct dmz_dev), GFP_KERNEL);
+	dmz->dev = kcalloc(argc, sizeof(struct dmz_dev), GFP_KERNEL);
 	if (!dmz->dev) {
 		ti->error = "Unable to allocate the zoned device descriptors";
 		kfree(dmz);
 		return -ENOMEM;
 	}
+	dmz->ddev = kcalloc(argc, sizeof(struct dm_dev *), GFP_KERNEL);
+	if (!dmz->ddev) {
+		ti->error = "Unable to allocate the dm device descriptors";
+		ret = -ENOMEM;
+		goto err;
+	}
+	dmz->nr_ddevs = argc;
+
 	ti->private = dmz;
 
 	/* Get the target zoned block device */
@@ -1048,13 +1057,13 @@ static int dmz_iterate_devices(struct dm_target *ti,
 	struct dmz_target *dmz = ti->private;
 	unsigned int zone_nr_sectors = dmz_zone_nr_sectors(dmz->metadata);
 	sector_t capacity;
-	int r;
+	int i, r;
 
-	capacity = dmz->dev[0].capacity & ~(zone_nr_sectors - 1);
-	r = fn(ti, dmz->ddev[0], 0, capacity, data);
-	if (!r && dmz->ddev[1]) {
-		capacity = dmz->dev[1].capacity & ~(zone_nr_sectors - 1);
-		r = fn(ti, dmz->ddev[1], 0, capacity, data);
+	for (i = 0; i < dmz->nr_ddevs; i++) {
+		capacity = dmz->dev[i].capacity & ~(zone_nr_sectors - 1);
+		r = fn(ti, dmz->ddev[i], 0, capacity, data);
+		if (r)
+			break;
 	}
 	return r;
 }
@@ -1083,7 +1092,7 @@ static void dmz_status(struct dm_target *ti, status_type_t type,
 		dev = &dmz->dev[0];
 		format_dev_t(buf, dev->bdev->bd_dev);
 		DMEMIT("%s", buf);
-		if (dmz->dev[1].bdev) {
+		if (dmz->nr_ddevs > 1) {
 			dev = &dmz->dev[1];
 			format_dev_t(buf, dev->bdev->bd_dev);
 			DMEMIT(" %s", buf);
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/15] dm-zoned: per-device reclaim
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (8 preceding siblings ...)
  2020-05-27  6:22 ` [PATCH 09/15] dm-zoned: allocate dm devices dynamically Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-28  3:19   ` Damien Le Moal
  2020-05-27  6:22 ` [PATCH 11/15] dm-zoned: move random and sequential zones into struct dmz_dev Hannes Reinecke
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

Instead of having one reclaim workqueue for the entire set we should
be allocating a reclaim workqueue per device; that will reduce
contention and should boost performance for a multi-device setup.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-reclaim.c | 66 +++++++++++++++++++++++++++----------------
 drivers/md/dm-zoned-target.c  | 39 +++++++++++++++----------
 drivers/md/dm-zoned.h         | 38 +++++++++++++------------
 3 files changed, 86 insertions(+), 57 deletions(-)

diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index e9e3b730e258..09843645248a 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -21,6 +21,8 @@ struct dmz_reclaim {
 	struct dm_kcopyd_throttle kc_throttle;
 	int			kc_err;
 
+	int			dev_idx;
+
 	unsigned long		flags;
 
 	/* Last target access time */
@@ -198,8 +200,8 @@ static int dmz_reclaim_buf(struct dmz_reclaim *zrc, struct dm_zone *dzone)
 	struct dmz_metadata *zmd = zrc->metadata;
 	int ret;
 
-	DMDEBUG("(%s): Chunk %u, move buf zone %u (weight %u) to data zone %u (weight %u)",
-		dmz_metadata_label(zmd),
+	DMDEBUG("(%s/%u): Chunk %u, move buf zone %u (weight %u) to data zone %u (weight %u)",
+		dmz_metadata_label(zmd), zrc->dev_idx,
 		dzone->chunk, bzone->id, dmz_weight(bzone),
 		dzone->id, dmz_weight(dzone));
 
@@ -237,8 +239,8 @@ static int dmz_reclaim_seq_data(struct dmz_reclaim *zrc, struct dm_zone *dzone)
 	struct dmz_metadata *zmd = zrc->metadata;
 	int ret = 0;
 
-	DMDEBUG("(%s): Chunk %u, move data zone %u (weight %u) to buf zone %u (weight %u)",
-		dmz_metadata_label(zmd),
+	DMDEBUG("(%s/%u): Chunk %u, move data zone %u (weight %u) to buf zone %u (weight %u)",
+		dmz_metadata_label(zmd), zrc->dev_idx,
 		chunk, dzone->id, dmz_weight(dzone),
 		bzone->id, dmz_weight(bzone));
 
@@ -295,8 +297,8 @@ static int dmz_reclaim_rnd_data(struct dmz_reclaim *zrc, struct dm_zone *dzone)
 	if (!szone)
 		return -ENOSPC;
 
-	DMDEBUG("(%s): Chunk %u, move %s zone %u (weight %u) to %s zone %u",
-		dmz_metadata_label(zmd), chunk,
+	DMDEBUG("(%s/%u): Chunk %u, move %s zone %u (weight %u) to %s zone %u",
+		dmz_metadata_label(zmd), zrc->dev_idx, chunk,
 		dmz_is_cache(dzone) ? "cache" : "rnd",
 		dzone->id, dmz_weight(dzone),
 		dmz_is_rnd(szone) ? "rnd" : "seq", szone->id);
@@ -369,8 +371,8 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
 	/* Get a data zone */
 	dzone = dmz_get_zone_for_reclaim(zmd, dmz_target_idle(zrc));
 	if (!dzone) {
-		DMDEBUG("(%s): No zone found to reclaim",
-			dmz_metadata_label(zmd));
+		DMDEBUG("(%s/%u): No zone found to reclaim",
+			dmz_metadata_label(zmd), zrc->dev_idx);
 		return -EBUSY;
 	}
 
@@ -417,24 +419,26 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
 out:
 	if (ret) {
 		if (ret == -EINTR)
-			DMDEBUG("(%s): reclaim zone %u interrupted",
-				dmz_metadata_label(zmd), rzone->id);
+			DMDEBUG("(%s/%u): reclaim zone %u interrupted",
+				dmz_metadata_label(zmd), zrc->dev_idx,
+				rzone->id);
 		else
-			DMDEBUG("(%s): Failed to reclaim zone %u, err %d",
-				dmz_metadata_label(zmd), rzone->id, ret);
+			DMDEBUG("(%s/%u): Failed to reclaim zone %u, err %d",
+				dmz_metadata_label(zmd), zrc->dev_idx,
+				rzone->id, ret);
 		dmz_unlock_zone_reclaim(dzone);
 		return ret;
 	}
 
 	ret = dmz_flush_metadata(zrc->metadata);
 	if (ret) {
-		DMDEBUG("(%s): Metadata flush for zone %u failed, err %d",
-			dmz_metadata_label(zmd), rzone->id, ret);
+		DMDEBUG("(%s/%u): Metadata flush for zone %u failed, err %d",
+			dmz_metadata_label(zmd), zrc->dev_idx, rzone->id, ret);
 		return ret;
 	}
 
-	DMDEBUG("(%s): Reclaimed zone %u in %u ms",
-		dmz_metadata_label(zmd),
+	DMDEBUG("(%s/%u): Reclaimed zone %u in %u ms",
+		dmz_metadata_label(zmd), zrc->dev_idx,
 		rzone->id, jiffies_to_msecs(jiffies - start));
 	return 0;
 }
@@ -461,10 +465,20 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
  */
 static bool dmz_should_reclaim(struct dmz_reclaim *zrc, unsigned int p_unmap)
 {
-	unsigned int nr_reclaim = dmz_nr_rnd_zones(zrc->metadata);
+	unsigned int nr_reclaim;
+
+	nr_reclaim = dmz_nr_rnd_zones(zrc->metadata);
 
-	if (dmz_nr_cache_zones(zrc->metadata))
+	if (dmz_nr_cache_zones(zrc->metadata)) {
+		/*
+		 * The first device in a multi-device
+		 * setup only contains cache zones, so
+		 * never start reclaim there.
+		 */
+		if (zrc->dev_idx == 0)
+			return false;
 		nr_reclaim += dmz_nr_cache_zones(zrc->metadata);
+	}
 
 	/* Reclaim when idle */
 	if (dmz_target_idle(zrc) && nr_reclaim)
@@ -488,7 +502,7 @@ static void dmz_reclaim_work(struct work_struct *work)
 {
 	struct dmz_reclaim *zrc = container_of(work, struct dmz_reclaim, work.work);
 	struct dmz_metadata *zmd = zrc->metadata;
-	unsigned int p_unmap;
+	unsigned int p_unmap, nr_unmap_rnd = 0, nr_rnd = 0;
 	int ret;
 
 	if (dmz_dev_is_dying(zmd))
@@ -514,8 +528,11 @@ static void dmz_reclaim_work(struct work_struct *work)
 		zrc->kc_throttle.throttle = min(75U, 100U - p_unmap / 2);
 	}
 
-	DMDEBUG("(%s): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
-		dmz_metadata_label(zmd),
+	nr_unmap_rnd = dmz_nr_unmap_rnd_zones(zmd);
+	nr_rnd = dmz_nr_rnd_zones(zmd);
+
+	DMDEBUG("(%s/%u): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
+		dmz_metadata_label(zmd), zrc->dev_idx,
 		zrc->kc_throttle.throttle,
 		(dmz_target_idle(zrc) ? "Idle" : "Busy"),
 		p_unmap, dmz_nr_unmap_cache_zones(zmd),
@@ -536,7 +553,7 @@ static void dmz_reclaim_work(struct work_struct *work)
  * Initialize reclaim.
  */
 int dmz_ctr_reclaim(struct dmz_metadata *zmd,
-		    struct dmz_reclaim **reclaim)
+		    struct dmz_reclaim **reclaim, int idx)
 {
 	struct dmz_reclaim *zrc;
 	int ret;
@@ -547,6 +564,7 @@ int dmz_ctr_reclaim(struct dmz_metadata *zmd,
 
 	zrc->metadata = zmd;
 	zrc->atime = jiffies;
+	zrc->dev_idx = idx;
 
 	/* Reclaim kcopyd client */
 	zrc->kc = dm_kcopyd_client_create(&zrc->kc_throttle);
@@ -558,8 +576,8 @@ int dmz_ctr_reclaim(struct dmz_metadata *zmd,
 
 	/* Reclaim work */
 	INIT_DELAYED_WORK(&zrc->work, dmz_reclaim_work);
-	zrc->wq = alloc_ordered_workqueue("dmz_rwq_%s", WQ_MEM_RECLAIM,
-					  dmz_metadata_label(zmd));
+	zrc->wq = alloc_ordered_workqueue("dmz_rwq_%s_%d", WQ_MEM_RECLAIM,
+					  dmz_metadata_label(zmd), idx);
 	if (!zrc->wq) {
 		ret = -ENOMEM;
 		goto err;
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index fec1b6a9c6f1..fc1df9714f63 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -51,9 +51,6 @@ struct dmz_target {
 	/* For metadata handling */
 	struct dmz_metadata     *metadata;
 
-	/* For reclaim */
-	struct dmz_reclaim	*reclaim;
-
 	/* For chunk work */
 	struct radix_tree_root	chunk_rxtree;
 	struct workqueue_struct *chunk_wq;
@@ -405,14 +402,15 @@ static void dmz_handle_bio(struct dmz_target *dmz, struct dm_chunk_work *cw,
 		dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
 	struct dmz_metadata *zmd = dmz->metadata;
 	struct dm_zone *zone;
-	int ret;
+	int i, ret;
 
 	/*
 	 * Write may trigger a zone allocation. So make sure the
 	 * allocation can succeed.
 	 */
 	if (bio_op(bio) == REQ_OP_WRITE)
-		dmz_schedule_reclaim(dmz->reclaim);
+		for (i = 0; i < dmz->nr_ddevs; i++)
+			dmz_schedule_reclaim(dmz->dev[i].reclaim);
 
 	dmz_lock_metadata(zmd);
 
@@ -432,6 +430,7 @@ static void dmz_handle_bio(struct dmz_target *dmz, struct dm_chunk_work *cw,
 	if (zone) {
 		dmz_activate_zone(zone);
 		bioctx->zone = zone;
+		dmz_reclaim_bio_acc(zone->dev->reclaim);
 	}
 
 	switch (bio_op(bio)) {
@@ -578,7 +577,6 @@ static int dmz_queue_chunk_work(struct dmz_target *dmz, struct bio *bio)
 
 	bio_list_add(&cw->bio_list, bio);
 
-	dmz_reclaim_bio_acc(dmz->reclaim);
 	if (queue_work(dmz->chunk_wq, &cw->work))
 		dmz_get_chunk_work(cw);
 out:
@@ -823,7 +821,7 @@ static int dmz_fixup_devices(struct dm_target *ti)
 static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 {
 	struct dmz_target *dmz;
-	int ret;
+	int ret, i;
 
 	/* Check arguments */
 	if (argc < 1 || argc > 2) {
@@ -925,10 +923,12 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	mod_delayed_work(dmz->flush_wq, &dmz->flush_work, DMZ_FLUSH_PERIOD);
 
 	/* Initialize reclaim */
-	ret = dmz_ctr_reclaim(dmz->metadata, &dmz->reclaim);
-	if (ret) {
-		ti->error = "Zone reclaim initialization failed";
-		goto err_fwq;
+	for (i = 0; i < dmz->nr_ddevs; i++) {
+		ret = dmz_ctr_reclaim(dmz->metadata, &dmz->dev[i].reclaim, i);
+		if (ret) {
+			ti->error = "Zone reclaim initialization failed";
+			goto err_fwq;
+		}
 	}
 
 	DMINFO("(%s): Target device: %llu 512-byte logical sectors (%llu blocks)",
@@ -961,11 +961,13 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 static void dmz_dtr(struct dm_target *ti)
 {
 	struct dmz_target *dmz = ti->private;
+	int i;
 
 	flush_workqueue(dmz->chunk_wq);
 	destroy_workqueue(dmz->chunk_wq);
 
-	dmz_dtr_reclaim(dmz->reclaim);
+	for (i = 0; i < dmz->nr_ddevs; i++)
+		dmz_dtr_reclaim(dmz->dev[i].reclaim);
 
 	cancel_delayed_work_sync(&dmz->flush_work);
 	destroy_workqueue(dmz->flush_wq);
@@ -1034,9 +1036,11 @@ static int dmz_prepare_ioctl(struct dm_target *ti, struct block_device **bdev)
 static void dmz_suspend(struct dm_target *ti)
 {
 	struct dmz_target *dmz = ti->private;
+	int i;
 
 	flush_workqueue(dmz->chunk_wq);
-	dmz_suspend_reclaim(dmz->reclaim);
+	for (i = 0; i < dmz->nr_ddevs; i++)
+		dmz_suspend_reclaim(dmz->dev[i].reclaim);
 	cancel_delayed_work_sync(&dmz->flush_work);
 }
 
@@ -1046,9 +1050,11 @@ static void dmz_suspend(struct dm_target *ti)
 static void dmz_resume(struct dm_target *ti)
 {
 	struct dmz_target *dmz = ti->private;
+	int i;
 
 	queue_delayed_work(dmz->flush_wq, &dmz->flush_work, DMZ_FLUSH_PERIOD);
-	dmz_resume_reclaim(dmz->reclaim);
+	for (i = 0; i < dmz->nr_ddevs; i++)
+		dmz_resume_reclaim(dmz->dev[i].reclaim);
 }
 
 static int dmz_iterate_devices(struct dm_target *ti,
@@ -1109,7 +1115,10 @@ static int dmz_message(struct dm_target *ti, unsigned int argc, char **argv,
 	int r = -EINVAL;
 
 	if (!strcasecmp(argv[0], "reclaim")) {
-		dmz_schedule_reclaim(dmz->reclaim);
+		int i;
+
+		for (i = 0; i < dmz->nr_ddevs; i++)
+			dmz_schedule_reclaim(dmz->dev[i].reclaim);
 		r = 0;
 	} else
 		DMERR("unrecognized message %s", argv[0]);
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index 983f5b5e9fa0..0cc3459f78ce 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -54,6 +54,7 @@ struct dmz_reclaim;
 struct dmz_dev {
 	struct block_device	*bdev;
 	struct dmz_metadata	*metadata;
+	struct dmz_reclaim	*reclaim;
 
 	char			name[BDEVNAME_SIZE];
 	uuid_t			uuid;
@@ -229,23 +230,6 @@ static inline void dmz_activate_zone(struct dm_zone *zone)
 	atomic_inc(&zone->refcount);
 }
 
-/*
- * Deactivate a zone. This decrement the zone reference counter
- * indicating that all BIOs to the zone have completed when the count is 0.
- */
-static inline void dmz_deactivate_zone(struct dm_zone *zone)
-{
-	atomic_dec(&zone->refcount);
-}
-
-/*
- * Test if a zone is active, that is, has a refcount > 0.
- */
-static inline bool dmz_is_active(struct dm_zone *zone)
-{
-	return atomic_read(&zone->refcount);
-}
-
 int dmz_lock_zone_reclaim(struct dm_zone *zone);
 void dmz_unlock_zone_reclaim(struct dm_zone *zone);
 struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd, bool idle);
@@ -272,7 +256,7 @@ int dmz_merge_valid_blocks(struct dmz_metadata *zmd, struct dm_zone *from_zone,
 /*
  * Functions defined in dm-zoned-reclaim.c
  */
-int dmz_ctr_reclaim(struct dmz_metadata *zmd, struct dmz_reclaim **zrc);
+int dmz_ctr_reclaim(struct dmz_metadata *zmd, struct dmz_reclaim **zrc, int idx);
 void dmz_dtr_reclaim(struct dmz_reclaim *zrc);
 void dmz_suspend_reclaim(struct dmz_reclaim *zrc);
 void dmz_resume_reclaim(struct dmz_reclaim *zrc);
@@ -285,4 +269,22 @@ void dmz_schedule_reclaim(struct dmz_reclaim *zrc);
 bool dmz_bdev_is_dying(struct dmz_dev *dmz_dev);
 bool dmz_check_bdev(struct dmz_dev *dmz_dev);
 
+/*
+ * Deactivate a zone. This decrement the zone reference counter
+ * indicating that all BIOs to the zone have completed when the count is 0.
+ */
+static inline void dmz_deactivate_zone(struct dm_zone *zone)
+{
+	dmz_reclaim_bio_acc(zone->dev->reclaim);
+	atomic_dec(&zone->refcount);
+}
+
+/*
+ * Test if a zone is active, that is, has a refcount > 0.
+ */
+static inline bool dmz_is_active(struct dm_zone *zone)
+{
+	return atomic_read(&zone->refcount);
+}
+
 #endif /* DM_ZONED_H */
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 11/15] dm-zoned: move random and sequential zones into struct dmz_dev
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (9 preceding siblings ...)
  2020-05-27  6:22 ` [PATCH 10/15] dm-zoned: per-device reclaim Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-28  3:25   ` Damien Le Moal
  2020-05-27  6:22 ` [PATCH 12/15] dm-zoned: support arbitrary number of devices Hannes Reinecke
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

Random and sequential zones should be part of the respective
device structure to make arbitration between devices possible.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 139 +++++++++++++++++++++++------------------
 drivers/md/dm-zoned-reclaim.c  |  15 +++--
 drivers/md/dm-zoned-target.c   |  25 ++++++--
 drivers/md/dm-zoned.h          |  18 ++++--
 4 files changed, 119 insertions(+), 78 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 445760730d10..f309219a5457 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -192,21 +192,12 @@ struct dmz_metadata {
 	/* Zone allocation management */
 	struct mutex		map_lock;
 	struct dmz_mblock	**map_mblk;
-	unsigned int		nr_rnd;
-	atomic_t		unmap_nr_rnd;
-	struct list_head	unmap_rnd_list;
-	struct list_head	map_rnd_list;
 
 	unsigned int		nr_cache;
 	atomic_t		unmap_nr_cache;
 	struct list_head	unmap_cache_list;
 	struct list_head	map_cache_list;
 
-	unsigned int		nr_seq;
-	atomic_t		unmap_nr_seq;
-	struct list_head	unmap_seq_list;
-	struct list_head	map_seq_list;
-
 	atomic_t		nr_reserved_seq_zones;
 	struct list_head	reserved_seq_zones_list;
 
@@ -279,14 +270,14 @@ unsigned int dmz_nr_chunks(struct dmz_metadata *zmd)
 	return zmd->nr_chunks;
 }
 
-unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd)
+unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd, int idx)
 {
-	return zmd->nr_rnd;
+	return zmd->dev[idx].nr_rnd;
 }
 
-unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd)
+unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd, int idx)
 {
-	return atomic_read(&zmd->unmap_nr_rnd);
+	return atomic_read(&zmd->dev[idx].unmap_nr_rnd);
 }
 
 unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd)
@@ -299,14 +290,14 @@ unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd)
 	return atomic_read(&zmd->unmap_nr_cache);
 }
 
-unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd)
+unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd, int idx)
 {
-	return zmd->nr_seq;
+	return zmd->dev[idx].nr_seq;
 }
 
-unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
+unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd, int idx)
 {
-	return atomic_read(&zmd->unmap_nr_seq);
+	return atomic_read(&zmd->dev[idx].unmap_nr_seq);
 }
 
 static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
@@ -1495,6 +1486,14 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
 
 		dev->metadata = zmd;
 		zmd->nr_zones += dev->nr_zones;
+
+		atomic_set(&dev->unmap_nr_rnd, 0);
+		INIT_LIST_HEAD(&dev->unmap_rnd_list);
+		INIT_LIST_HEAD(&dev->map_rnd_list);
+
+		atomic_set(&dev->unmap_nr_seq, 0);
+		INIT_LIST_HEAD(&dev->unmap_seq_list);
+		INIT_LIST_HEAD(&dev->map_seq_list);
 	}
 
 	if (!zmd->nr_zones) {
@@ -1715,9 +1714,9 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 		if (dmz_is_cache(dzone))
 			list_add_tail(&dzone->link, &zmd->map_cache_list);
 		else if (dmz_is_rnd(dzone))
-			list_add_tail(&dzone->link, &zmd->map_rnd_list);
+			list_add_tail(&dzone->link, &dzone->dev->map_rnd_list);
 		else
-			list_add_tail(&dzone->link, &zmd->map_seq_list);
+			list_add_tail(&dzone->link, &dzone->dev->map_seq_list);
 
 		/* Check buffer zone */
 		bzone_id = le32_to_cpu(dmap[e].bzone_id);
@@ -1751,7 +1750,7 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 		if (dmz_is_cache(bzone))
 			list_add_tail(&bzone->link, &zmd->map_cache_list);
 		else
-			list_add_tail(&bzone->link, &zmd->map_rnd_list);
+			list_add_tail(&bzone->link, &bzone->dev->map_rnd_list);
 next:
 		chunk++;
 		e++;
@@ -1776,9 +1775,9 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 		if (dmz_is_cache(dzone))
 			zmd->nr_cache++;
 		else if (dmz_is_rnd(dzone))
-			zmd->nr_rnd++;
+			dzone->dev->nr_rnd++;
 		else
-			zmd->nr_seq++;
+			dzone->dev->nr_seq++;
 
 		if (dmz_is_data(dzone)) {
 			/* Already initialized */
@@ -1792,16 +1791,18 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
 			list_add_tail(&dzone->link, &zmd->unmap_cache_list);
 			atomic_inc(&zmd->unmap_nr_cache);
 		} else if (dmz_is_rnd(dzone)) {
-			list_add_tail(&dzone->link, &zmd->unmap_rnd_list);
-			atomic_inc(&zmd->unmap_nr_rnd);
+			list_add_tail(&dzone->link,
+				      &dzone->dev->unmap_rnd_list);
+			atomic_inc(&dzone->dev->unmap_nr_rnd);
 		} else if (atomic_read(&zmd->nr_reserved_seq_zones) < zmd->nr_reserved_seq) {
 			list_add_tail(&dzone->link, &zmd->reserved_seq_zones_list);
 			set_bit(DMZ_RESERVED, &dzone->flags);
 			atomic_inc(&zmd->nr_reserved_seq_zones);
-			zmd->nr_seq--;
+			dzone->dev->nr_seq--;
 		} else {
-			list_add_tail(&dzone->link, &zmd->unmap_seq_list);
-			atomic_inc(&zmd->unmap_nr_seq);
+			list_add_tail(&dzone->link,
+				      &dzone->dev->unmap_seq_list);
+			atomic_inc(&dzone->dev->unmap_nr_seq);
 		}
 	}
 
@@ -1835,13 +1836,13 @@ static void __dmz_lru_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 	list_del_init(&zone->link);
 	if (dmz_is_seq(zone)) {
 		/* LRU rotate sequential zone */
-		list_add_tail(&zone->link, &zmd->map_seq_list);
+		list_add_tail(&zone->link, &zone->dev->map_seq_list);
 	} else if (dmz_is_cache(zone)) {
 		/* LRU rotate cache zone */
 		list_add_tail(&zone->link, &zmd->map_cache_list);
 	} else {
 		/* LRU rotate random zone */
-		list_add_tail(&zone->link, &zmd->map_rnd_list);
+		list_add_tail(&zone->link, &zone->dev->map_rnd_list);
 	}
 }
 
@@ -1923,14 +1924,24 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
 {
 	struct dm_zone *dzone = NULL;
 	struct dm_zone *zone;
-	struct list_head *zone_list = &zmd->map_rnd_list;
+	struct list_head *zone_list;
 
 	/* If we have cache zones select from the cache zone list */
 	if (zmd->nr_cache) {
 		zone_list = &zmd->map_cache_list;
 		/* Try to relaim random zones, too, when idle */
-		if (idle && list_empty(zone_list))
-			zone_list = &zmd->map_rnd_list;
+		if (idle && list_empty(zone_list)) {
+			int i;
+
+			for (i = 1; i < zmd->nr_devs; i++) {
+				zone_list = &zmd->dev[i].map_rnd_list;
+				if (!list_empty(zone_list))
+					break;
+			}
+		}
+	} else {
+		/* Otherwise the random zones are on the first disk */
+		zone_list = &zmd->dev[0].map_rnd_list;
 	}
 
 	list_for_each_entry(zone, zone_list, link) {
@@ -1951,12 +1962,17 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
 static struct dm_zone *dmz_get_seq_zone_for_reclaim(struct dmz_metadata *zmd)
 {
 	struct dm_zone *zone;
+	int i;
 
-	list_for_each_entry(zone, &zmd->map_seq_list, link) {
-		if (!zone->bzone)
-			continue;
-		if (dmz_lock_zone_reclaim(zone))
-			return zone;
+	for (i = 0; i < zmd->nr_devs; i++) {
+		struct dmz_dev *dev = &zmd->dev[i];
+
+		list_for_each_entry(zone, &dev->map_seq_list, link) {
+			if (!zone->bzone)
+				continue;
+			if (dmz_lock_zone_reclaim(zone))
+				return zone;
+		}
 	}
 
 	return NULL;
@@ -2142,7 +2158,7 @@ struct dm_zone *dmz_get_chunk_buffer(struct dmz_metadata *zmd,
 	if (dmz_is_cache(bzone))
 		list_add_tail(&bzone->link, &zmd->map_cache_list);
 	else
-		list_add_tail(&bzone->link, &zmd->map_rnd_list);
+		list_add_tail(&bzone->link, &bzone->dev->map_rnd_list);
 out:
 	dmz_unlock_map(zmd);
 
@@ -2157,21 +2173,27 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
 {
 	struct list_head *list;
 	struct dm_zone *zone;
+	unsigned int dev_idx = 0;
 
+again:
 	if (flags & DMZ_ALLOC_CACHE)
 		list = &zmd->unmap_cache_list;
 	else if (flags & DMZ_ALLOC_RND)
-		list = &zmd->unmap_rnd_list;
+		list = &zmd->dev[dev_idx].unmap_rnd_list;
 	else
-		list = &zmd->unmap_seq_list;
+		list = &zmd->dev[dev_idx].unmap_seq_list;
 
-again:
 	if (list_empty(list)) {
 		/*
 		 * No free zone: return NULL if this is for not reclaim.
 		 */
 		if (!(flags & DMZ_ALLOC_RECLAIM))
 			return NULL;
+		if (dev_idx < zmd->nr_devs) {
+			dev_idx++;
+			goto again;
+		}
+
 		/*
 		 * Fallback to the reserved sequential zones
 		 */
@@ -2190,9 +2212,9 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
 	if (dmz_is_cache(zone))
 		atomic_dec(&zmd->unmap_nr_cache);
 	else if (dmz_is_rnd(zone))
-		atomic_dec(&zmd->unmap_nr_rnd);
+		atomic_dec(&zone->dev->unmap_nr_rnd);
 	else
-		atomic_dec(&zmd->unmap_nr_seq);
+		atomic_dec(&zone->dev->unmap_nr_seq);
 
 	if (dmz_is_offline(zone)) {
 		dmz_zmd_warn(zmd, "Zone %u is offline", zone->id);
@@ -2222,14 +2244,14 @@ void dmz_free_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 		list_add_tail(&zone->link, &zmd->unmap_cache_list);
 		atomic_inc(&zmd->unmap_nr_cache);
 	} else if (dmz_is_rnd(zone)) {
-		list_add_tail(&zone->link, &zmd->unmap_rnd_list);
-		atomic_inc(&zmd->unmap_nr_rnd);
+		list_add_tail(&zone->link, &zone->dev->unmap_rnd_list);
+		atomic_inc(&zone->dev->unmap_nr_rnd);
 	} else if (dmz_is_reserved(zone)) {
 		list_add_tail(&zone->link, &zmd->reserved_seq_zones_list);
 		atomic_inc(&zmd->nr_reserved_seq_zones);
 	} else {
-		list_add_tail(&zone->link, &zmd->unmap_seq_list);
-		atomic_inc(&zmd->unmap_nr_seq);
+		list_add_tail(&zone->link, &zone->dev->unmap_seq_list);
+		atomic_inc(&zone->dev->unmap_nr_seq);
 	}
 
 	wake_up_all(&zmd->free_wq);
@@ -2249,9 +2271,9 @@ void dmz_map_zone(struct dmz_metadata *zmd, struct dm_zone *dzone,
 	if (dmz_is_cache(dzone))
 		list_add_tail(&dzone->link, &zmd->map_cache_list);
 	else if (dmz_is_rnd(dzone))
-		list_add_tail(&dzone->link, &zmd->map_rnd_list);
+		list_add_tail(&dzone->link, &dzone->dev->map_rnd_list);
 	else
-		list_add_tail(&dzone->link, &zmd->map_seq_list);
+		list_add_tail(&dzone->link, &dzone->dev->map_seq_list);
 }
 
 /*
@@ -2819,18 +2841,11 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
 	INIT_LIST_HEAD(&zmd->mblk_dirty_list);
 
 	mutex_init(&zmd->map_lock);
-	atomic_set(&zmd->unmap_nr_rnd, 0);
-	INIT_LIST_HEAD(&zmd->unmap_rnd_list);
-	INIT_LIST_HEAD(&zmd->map_rnd_list);
 
 	atomic_set(&zmd->unmap_nr_cache, 0);
 	INIT_LIST_HEAD(&zmd->unmap_cache_list);
 	INIT_LIST_HEAD(&zmd->map_cache_list);
 
-	atomic_set(&zmd->unmap_nr_seq, 0);
-	INIT_LIST_HEAD(&zmd->unmap_seq_list);
-	INIT_LIST_HEAD(&zmd->map_seq_list);
-
 	atomic_set(&zmd->nr_reserved_seq_zones, 0);
 	INIT_LIST_HEAD(&zmd->reserved_seq_zones_list);
 
@@ -2899,10 +2914,14 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
 		      zmd->nr_data_zones, zmd->nr_chunks);
 	dmz_zmd_debug(zmd, "    %u cache zones (%u unmapped)",
 		      zmd->nr_cache, atomic_read(&zmd->unmap_nr_cache));
-	dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
-		      zmd->nr_rnd, atomic_read(&zmd->unmap_nr_rnd));
-	dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
-		      zmd->nr_seq, atomic_read(&zmd->unmap_nr_seq));
+	for (i = 0; i < zmd->nr_devs; i++) {
+		dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
+			      dmz_nr_rnd_zones(zmd, i),
+			      dmz_nr_unmap_rnd_zones(zmd, i));
+		dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
+			      dmz_nr_seq_zones(zmd, i),
+			      dmz_nr_unmap_seq_zones(zmd, i));
+	}
 	dmz_zmd_debug(zmd, "  %u reserved sequential data zones",
 		      zmd->nr_reserved_seq);
 	dmz_zmd_debug(zmd, "Format:");
diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index 09843645248a..18edf1b9bf52 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -447,15 +447,14 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
 {
 	struct dmz_metadata *zmd = zrc->metadata;
 	unsigned int nr_cache = dmz_nr_cache_zones(zmd);
-	unsigned int nr_rnd = dmz_nr_rnd_zones(zmd);
 	unsigned int nr_unmap, nr_zones;
 
 	if (nr_cache) {
 		nr_zones = nr_cache;
 		nr_unmap = dmz_nr_unmap_cache_zones(zmd);
 	} else {
-		nr_zones = nr_rnd;
-		nr_unmap = dmz_nr_unmap_rnd_zones(zmd);
+		nr_zones = dmz_nr_rnd_zones(zmd, zrc->dev_idx);
+		nr_unmap = dmz_nr_unmap_rnd_zones(zmd, zrc->dev_idx);
 	}
 	return nr_unmap * 100 / nr_zones;
 }
@@ -467,7 +466,7 @@ static bool dmz_should_reclaim(struct dmz_reclaim *zrc, unsigned int p_unmap)
 {
 	unsigned int nr_reclaim;
 
-	nr_reclaim = dmz_nr_rnd_zones(zrc->metadata);
+	nr_reclaim = dmz_nr_rnd_zones(zrc->metadata, zrc->dev_idx);
 
 	if (dmz_nr_cache_zones(zrc->metadata)) {
 		/*
@@ -528,8 +527,8 @@ static void dmz_reclaim_work(struct work_struct *work)
 		zrc->kc_throttle.throttle = min(75U, 100U - p_unmap / 2);
 	}
 
-	nr_unmap_rnd = dmz_nr_unmap_rnd_zones(zmd);
-	nr_rnd = dmz_nr_rnd_zones(zmd);
+	nr_unmap_rnd = dmz_nr_unmap_rnd_zones(zmd, zrc->dev_idx);
+	nr_rnd = dmz_nr_rnd_zones(zmd, zrc->dev_idx);
 
 	DMDEBUG("(%s/%u): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
 		dmz_metadata_label(zmd), zrc->dev_idx,
@@ -537,8 +536,8 @@ static void dmz_reclaim_work(struct work_struct *work)
 		(dmz_target_idle(zrc) ? "Idle" : "Busy"),
 		p_unmap, dmz_nr_unmap_cache_zones(zmd),
 		dmz_nr_cache_zones(zmd),
-		dmz_nr_unmap_rnd_zones(zmd),
-		dmz_nr_rnd_zones(zmd));
+		dmz_nr_unmap_rnd_zones(zmd, zrc->dev_idx),
+		dmz_nr_rnd_zones(zmd, zrc->dev_idx));
 
 	ret = dmz_do_reclaim(zrc);
 	if (ret && ret != -EINTR) {
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index fc1df9714f63..f6f00a363903 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -1082,17 +1082,30 @@ static void dmz_status(struct dm_target *ti, status_type_t type,
 	ssize_t sz = 0;
 	char buf[BDEVNAME_SIZE];
 	struct dmz_dev *dev;
+	int i;
 
 	switch (type) {
 	case STATUSTYPE_INFO:
-		DMEMIT("%u zones %u/%u cache %u/%u random %u/%u sequential",
+		DMEMIT("%u zones %u/%u cache",
 		       dmz_nr_zones(dmz->metadata),
 		       dmz_nr_unmap_cache_zones(dmz->metadata),
-		       dmz_nr_cache_zones(dmz->metadata),
-		       dmz_nr_unmap_rnd_zones(dmz->metadata),
-		       dmz_nr_rnd_zones(dmz->metadata),
-		       dmz_nr_unmap_seq_zones(dmz->metadata),
-		       dmz_nr_seq_zones(dmz->metadata));
+		       dmz_nr_cache_zones(dmz->metadata));
+		for (i = 0; i < DMZ_MAX_DEVS; i++) {
+			if (!dmz->ddev[i])
+				continue;
+			/*
+			 * For a multi-device setup the first device
+			 * contains only cache zones.
+			 */
+			if ((i == 0) &&
+			    (dmz_nr_cache_zones(dmz->metadata) > 0))
+				continue;
+			DMEMIT(" %u/%u random %u/%u sequential",
+			       dmz_nr_unmap_rnd_zones(dmz->metadata, i),
+			       dmz_nr_rnd_zones(dmz->metadata, i),
+			       dmz_nr_unmap_seq_zones(dmz->metadata, i),
+			       dmz_nr_seq_zones(dmz->metadata, i));
+		}
 		break;
 	case STATUSTYPE_TABLE:
 		dev = &dmz->dev[0];
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index 0cc3459f78ce..f2a760f62db5 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -67,6 +67,16 @@ struct dmz_dev {
 	unsigned int		flags;
 
 	sector_t		zone_nr_sectors;
+
+	unsigned int		nr_rnd;
+	atomic_t		unmap_nr_rnd;
+	struct list_head	unmap_rnd_list;
+	struct list_head	map_rnd_list;
+
+	unsigned int		nr_seq;
+	atomic_t		unmap_nr_seq;
+	struct list_head	unmap_seq_list;
+	struct list_head	map_seq_list;
 };
 
 #define dmz_bio_chunk(zmd, bio)	((bio)->bi_iter.bi_sector >> \
@@ -213,10 +223,10 @@ void dmz_unmap_zone(struct dmz_metadata *zmd, struct dm_zone *zone);
 unsigned int dmz_nr_zones(struct dmz_metadata *zmd);
 unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd);
 unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd);
-unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd);
-unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd);
-unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd);
-unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd);
+unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd, int idx);
+unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd, int idx);
+unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd, int idx);
+unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd, int idx);
 unsigned int dmz_zone_nr_blocks(struct dmz_metadata *zmd);
 unsigned int dmz_zone_nr_blocks_shift(struct dmz_metadata *zmd);
 unsigned int dmz_zone_nr_sectors(struct dmz_metadata *zmd);
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 12/15] dm-zoned: support arbitrary number of devices
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (10 preceding siblings ...)
  2020-05-27  6:22 ` [PATCH 11/15] dm-zoned: move random and sequential zones into struct dmz_dev Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-28  4:04   ` Damien Le Moal
  2020-05-27  6:22 ` [PATCH 13/15] dm-zoned: allocate zone by device index Hannes Reinecke
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

Remove the hard-coded limit of two devices and support an unlimited
number of additional zoned devices.
With that we need to increase the device-mapper version number to
3.0.0 as we've modified the interface.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 15 +++++++-
 drivers/md/dm-zoned-target.c   | 81 +++++++++++++++++++++++-------------------
 2 files changed, 59 insertions(+), 37 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index f309219a5457..689c1dd7ab20 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1520,7 +1520,20 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
 		 */
 		zmd->sb[0].zone = dmz_get(zmd, 0);
 
-		zoned_dev = &zmd->dev[1];
+		for (i = 1; i < zmd->nr_devs; i++) {
+			zoned_dev = &zmd->dev[i];
+
+			ret = blkdev_report_zones(zoned_dev->bdev, 0,
+						  BLK_ALL_ZONES,
+						  dmz_init_zone, zoned_dev);
+			if (ret < 0) {
+				DMDEBUG("(%s): Failed to report zones, error %d",
+					zmd->devname, ret);
+				dmz_drop_zones(zmd);
+				return ret;
+			}
+		}
+		return 0;
 	}
 
 	/*
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index f6f00a363903..4a51738d4b0d 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -13,8 +13,6 @@
 
 #define DMZ_MIN_BIOS		8192
 
-#define DMZ_MAX_DEVS		2
-
 /*
  * Zone BIO context.
  */
@@ -764,7 +762,7 @@ static void dmz_put_zoned_device(struct dm_target *ti)
 	struct dmz_target *dmz = ti->private;
 	int i;
 
-	for (i = 0; i < DMZ_MAX_DEVS; i++) {
+	for (i = 0; i < dmz->nr_ddevs; i++) {
 		if (dmz->ddev[i]) {
 			dm_put_device(ti, dmz->ddev[i]);
 			dmz->ddev[i] = NULL;
@@ -777,21 +775,35 @@ static int dmz_fixup_devices(struct dm_target *ti)
 	struct dmz_target *dmz = ti->private;
 	struct dmz_dev *reg_dev, *zoned_dev;
 	struct request_queue *q;
+	sector_t zone_nr_sectors = 0;
+	int i;
 
 	/*
-	 * When we have two devices, the first one must be a regular block
-	 * device and the second a zoned block device.
+	 * When we have more than on devices, the first one must be a
+	 * regular block device and the others zoned block devices.
 	 */
-	if (dmz->ddev[0] && dmz->ddev[1]) {
+	if (dmz->nr_ddevs > 1) {
 		reg_dev = &dmz->dev[0];
 		if (!(reg_dev->flags & DMZ_BDEV_REGULAR)) {
 			ti->error = "Primary disk is not a regular device";
 			return -EINVAL;
 		}
-		zoned_dev = &dmz->dev[1];
-		if (zoned_dev->flags & DMZ_BDEV_REGULAR) {
-			ti->error = "Secondary disk is not a zoned device";
-			return -EINVAL;
+		for (i = 1; i < dmz->nr_ddevs; i++) {
+			zoned_dev = &dmz->dev[i];
+			if (zoned_dev->flags & DMZ_BDEV_REGULAR) {
+				ti->error = "Secondary disk is not a zoned device";
+				return -EINVAL;
+			}
+			q = bdev_get_queue(zoned_dev->bdev);
+			if (zone_nr_sectors &&
+			    zone_nr_sectors != blk_queue_zone_sectors(q)) {
+				ti->error = "Zone nr sectors mismatch";
+				return -EINVAL;
+			}
+			zone_nr_sectors = blk_queue_zone_sectors(q);
+			zoned_dev->zone_nr_sectors = zone_nr_sectors;
+			zoned_dev->nr_zones =
+				blkdev_nr_zones(zoned_dev->bdev->bd_disk);
 		}
 	} else {
 		reg_dev = NULL;
@@ -800,17 +812,24 @@ static int dmz_fixup_devices(struct dm_target *ti)
 			ti->error = "Disk is not a zoned device";
 			return -EINVAL;
 		}
+		q = bdev_get_queue(zoned_dev->bdev);
+		zoned_dev->zone_nr_sectors = blk_queue_zone_sectors(q);
+		zoned_dev->nr_zones = blkdev_nr_zones(zoned_dev->bdev->bd_disk);
 	}
-	q = bdev_get_queue(zoned_dev->bdev);
-	zoned_dev->zone_nr_sectors = blk_queue_zone_sectors(q);
-	zoned_dev->nr_zones = blkdev_nr_zones(zoned_dev->bdev->bd_disk);
 
 	if (reg_dev) {
-		reg_dev->zone_nr_sectors = zoned_dev->zone_nr_sectors;
+		sector_t zone_offset;
+
+		reg_dev->zone_nr_sectors = zone_nr_sectors;
 		reg_dev->nr_zones =
 			DIV_ROUND_UP_SECTOR_T(reg_dev->capacity,
 					      reg_dev->zone_nr_sectors);
-		zoned_dev->zone_offset = reg_dev->nr_zones;
+		reg_dev->zone_offset = 0;
+		zone_offset = reg_dev->nr_zones;
+		for (i = 1; i < dmz->nr_ddevs; i++) {
+			dmz->dev[i].zone_offset = zone_offset;
+			zone_offset += dmz->dev[i].nr_zones;
+		}
 	}
 	return 0;
 }
@@ -824,7 +843,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	int ret, i;
 
 	/* Check arguments */
-	if (argc < 1 || argc > 2) {
+	if (argc < 1) {
 		ti->error = "Invalid argument count";
 		return -EINVAL;
 	}
@@ -852,22 +871,14 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	ti->private = dmz;
 
 	/* Get the target zoned block device */
-	ret = dmz_get_zoned_device(ti, argv[0], 0, argc);
-	if (ret)
-		goto err;
-
-	if (argc == 2) {
-		ret = dmz_get_zoned_device(ti, argv[1], 1, argc);
-		if (ret) {
-			dmz_put_zoned_device(ti);
-			goto err;
-		}
+	for (i = 0; i < argc; i++) {
+		ret = dmz_get_zoned_device(ti, argv[i], i, argc);
+		if (ret)
+			goto err_dev;
 	}
 	ret = dmz_fixup_devices(ti);
-	if (ret) {
-		dmz_put_zoned_device(ti);
-		goto err;
-	}
+	if (ret)
+		goto err_dev;
 
 	/* Initialize metadata */
 	ret = dmz_ctr_metadata(dmz->dev, argc, &dmz->metadata,
@@ -1090,9 +1101,7 @@ static void dmz_status(struct dm_target *ti, status_type_t type,
 		       dmz_nr_zones(dmz->metadata),
 		       dmz_nr_unmap_cache_zones(dmz->metadata),
 		       dmz_nr_cache_zones(dmz->metadata));
-		for (i = 0; i < DMZ_MAX_DEVS; i++) {
-			if (!dmz->ddev[i])
-				continue;
+		for (i = 0; i < dmz->nr_ddevs; i++) {
 			/*
 			 * For a multi-device setup the first device
 			 * contains only cache zones.
@@ -1111,8 +1120,8 @@ static void dmz_status(struct dm_target *ti, status_type_t type,
 		dev = &dmz->dev[0];
 		format_dev_t(buf, dev->bdev->bd_dev);
 		DMEMIT("%s", buf);
-		if (dmz->nr_ddevs > 1) {
-			dev = &dmz->dev[1];
+		for (i = 1; i < dmz->nr_ddevs; i++) {
+			dev = &dmz->dev[i];
 			format_dev_t(buf, dev->bdev->bd_dev);
 			DMEMIT(" %s", buf);
 		}
@@ -1140,7 +1149,7 @@ static int dmz_message(struct dm_target *ti, unsigned int argc, char **argv,
 
 static struct target_type dmz_type = {
 	.name		 = "zoned",
-	.version	 = {2, 0, 0},
+	.version	 = {3, 0, 0},
 	.features	 = DM_TARGET_SINGLETON | DM_TARGET_ZONED_HM,
 	.module		 = THIS_MODULE,
 	.ctr		 = dmz_ctr,
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 13/15] dm-zoned: allocate zone by device index
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (11 preceding siblings ...)
  2020-05-27  6:22 ` [PATCH 12/15] dm-zoned: support arbitrary number of devices Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-28  4:08   ` Damien Le Moal
  2020-05-27  6:22 ` [PATCH 14/15] dm-zoned: select reclaim zone based on " Hannes Reinecke
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

When allocating a zone we should pass in an indicator on which
device the zone should be allocated; this increases performance
for a multi-device setup as then reclaim can allocate zones on
the device for which reclaim is running.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 17 +++++++++++------
 drivers/md/dm-zoned-reclaim.c  |  3 ++-
 drivers/md/dm-zoned.h          |  3 ++-
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 689c1dd7ab20..0d65af94309a 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -2045,7 +2045,7 @@ struct dm_zone *dmz_get_chunk_mapping(struct dmz_metadata *zmd, unsigned int chu
 			goto out;
 
 		/* Allocate a random zone */
-		dzone = dmz_alloc_zone(zmd, alloc_flags);
+		dzone = dmz_alloc_zone(zmd, 0, alloc_flags);
 		if (!dzone) {
 			if (dmz_dev_is_dying(zmd)) {
 				dzone = ERR_PTR(-EIO);
@@ -2151,7 +2151,7 @@ struct dm_zone *dmz_get_chunk_buffer(struct dmz_metadata *zmd,
 		goto out;
 
 	/* Allocate a random zone */
-	bzone = dmz_alloc_zone(zmd, alloc_flags);
+	bzone = dmz_alloc_zone(zmd, 0, alloc_flags);
 	if (!bzone) {
 		if (dmz_dev_is_dying(zmd)) {
 			bzone = ERR_PTR(-EIO);
@@ -2182,11 +2182,12 @@ struct dm_zone *dmz_get_chunk_buffer(struct dmz_metadata *zmd,
  * Get an unmapped (free) zone.
  * This must be called with the mapping lock held.
  */
-struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
+struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned int dev_idx,
+			       unsigned long flags)
 {
 	struct list_head *list;
 	struct dm_zone *zone;
-	unsigned int dev_idx = 0;
+	int i = 0;
 
 again:
 	if (flags & DMZ_ALLOC_CACHE)
@@ -2202,8 +2203,12 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
 		 */
 		if (!(flags & DMZ_ALLOC_RECLAIM))
 			return NULL;
-		if (dev_idx < zmd->nr_devs) {
-			dev_idx++;
+		/*
+		 * Try to allocate from other devices
+		 */
+		if (i < zmd->nr_devs) {
+			dev_idx = (dev_idx + 1) % zmd->nr_devs;
+			i++;
 			goto again;
 		}
 
diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index 18edf1b9bf52..5a04e34d17a9 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -288,7 +288,8 @@ static int dmz_reclaim_rnd_data(struct dmz_reclaim *zrc, struct dm_zone *dzone)
 	/* Get a free random or sequential zone */
 	dmz_lock_map(zmd);
 again:
-	szone = dmz_alloc_zone(zmd, alloc_flags | DMZ_ALLOC_RECLAIM);
+	szone = dmz_alloc_zone(zmd, zrc->dev_idx,
+			       alloc_flags | DMZ_ALLOC_RECLAIM);
 	if (!szone && alloc_flags == DMZ_ALLOC_SEQ && dmz_nr_cache_zones(zmd)) {
 		alloc_flags = DMZ_ALLOC_RND;
 		goto again;
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index f2a760f62db5..ec020bb1caf7 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -214,7 +214,8 @@ bool dmz_dev_is_dying(struct dmz_metadata *zmd);
 #define DMZ_ALLOC_SEQ		0x04
 #define DMZ_ALLOC_RECLAIM	0x10
 
-struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags);
+struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd,
+			       unsigned int dev_idx, unsigned long flags);
 void dmz_free_zone(struct dmz_metadata *zmd, struct dm_zone *zone);
 
 void dmz_map_zone(struct dmz_metadata *zmd, struct dm_zone *zone,
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 14/15] dm-zoned: select reclaim zone based on device index
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (12 preceding siblings ...)
  2020-05-27  6:22 ` [PATCH 13/15] dm-zoned: allocate zone by device index Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-28  4:11   ` Damien Le Moal
  2020-05-27  6:22 ` [PATCH 15/15] dm-zoned: prefer full zones for reclaim Hannes Reinecke
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

per-device reclaim should select zones on that device only.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 50 +++++++++++++++++-------------------------
 drivers/md/dm-zoned-reclaim.c  |  3 ++-
 drivers/md/dm-zoned-target.c   |  1 +
 drivers/md/dm-zoned.h          |  5 ++++-
 4 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index 0d65af94309a..b89b3d3b9ec9 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1933,7 +1933,7 @@ static void dmz_wait_for_reclaim(struct dmz_metadata *zmd, struct dm_zone *zone)
  * Select a cache or random write zone for reclaim.
  */
 static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
-						    bool idle)
+						    unsigned int idx, bool idle)
 {
 	struct dm_zone *dzone = NULL;
 	struct dm_zone *zone;
@@ -1943,24 +1943,17 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
 	if (zmd->nr_cache) {
 		zone_list = &zmd->map_cache_list;
 		/* Try to relaim random zones, too, when idle */
-		if (idle && list_empty(zone_list)) {
-			int i;
-
-			for (i = 1; i < zmd->nr_devs; i++) {
-				zone_list = &zmd->dev[i].map_rnd_list;
-				if (!list_empty(zone_list))
-					break;
-			}
-		}
-	} else {
-		/* Otherwise the random zones are on the first disk */
-		zone_list = &zmd->dev[0].map_rnd_list;
-	}
+		if (idle && list_empty(zone_list))
+			zone_list = &zmd->dev[idx].map_rnd_list;
+	} else
+		zone_list = &zmd->dev[idx].map_rnd_list;
 
 	list_for_each_entry(zone, zone_list, link) {
-		if (dmz_is_buf(zone))
+		if (dmz_is_buf(zone)) {
 			dzone = zone->bzone;
-		else
+			if (dzone->dev->dev_idx != idx)
+				continue;
+		} else
 			dzone = zone;
 		if (dmz_lock_zone_reclaim(dzone))
 			return dzone;
@@ -1972,20 +1965,16 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
 /*
  * Select a buffered sequential zone for reclaim.
  */
-static struct dm_zone *dmz_get_seq_zone_for_reclaim(struct dmz_metadata *zmd)
+static struct dm_zone *dmz_get_seq_zone_for_reclaim(struct dmz_metadata *zmd,
+						    unsigned int idx)
 {
 	struct dm_zone *zone;
-	int i;
-
-	for (i = 0; i < zmd->nr_devs; i++) {
-		struct dmz_dev *dev = &zmd->dev[i];
 
-		list_for_each_entry(zone, &dev->map_seq_list, link) {
-			if (!zone->bzone)
-				continue;
-			if (dmz_lock_zone_reclaim(zone))
-				return zone;
-		}
+	list_for_each_entry(zone, &zmd->dev[idx].map_seq_list, link) {
+		if (!zone->bzone)
+			continue;
+		if (dmz_lock_zone_reclaim(zone))
+			return zone;
 	}
 
 	return NULL;
@@ -1994,7 +1983,8 @@ static struct dm_zone *dmz_get_seq_zone_for_reclaim(struct dmz_metadata *zmd)
 /*
  * Select a zone for reclaim.
  */
-struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd, bool idle)
+struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd,
+					 unsigned int dev_idx, bool idle)
 {
 	struct dm_zone *zone;
 
@@ -2008,9 +1998,9 @@ struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd, bool idle)
 	 */
 	dmz_lock_map(zmd);
 	if (list_empty(&zmd->reserved_seq_zones_list))
-		zone = dmz_get_seq_zone_for_reclaim(zmd);
+		zone = dmz_get_seq_zone_for_reclaim(zmd, dev_idx);
 	else
-		zone = dmz_get_rnd_zone_for_reclaim(zmd, idle);
+		zone = dmz_get_rnd_zone_for_reclaim(zmd, dev_idx, idle);
 	dmz_unlock_map(zmd);
 
 	return zone;
diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
index 5a04e34d17a9..2261b4dd60b7 100644
--- a/drivers/md/dm-zoned-reclaim.c
+++ b/drivers/md/dm-zoned-reclaim.c
@@ -370,7 +370,8 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
 	int ret;
 
 	/* Get a data zone */
-	dzone = dmz_get_zone_for_reclaim(zmd, dmz_target_idle(zrc));
+	dzone = dmz_get_zone_for_reclaim(zmd, zrc->dev_idx,
+					 dmz_target_idle(zrc));
 	if (!dzone) {
 		DMDEBUG("(%s/%u): No zone found to reclaim",
 			dmz_metadata_label(zmd), zrc->dev_idx);
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index 4a51738d4b0d..a23f1fbd208f 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -738,6 +738,7 @@ static int dmz_get_zoned_device(struct dm_target *ti, char *path,
 		dev = &dmz->dev[idx];
 	}
 	dev->bdev = bdev;
+	dev->dev_idx = idx;
 	(void)bdevname(dev->bdev, dev->name);
 
 	dev->capacity = i_size_read(bdev->bd_inode) >> SECTOR_SHIFT;
diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
index ec020bb1caf7..22f11440b423 100644
--- a/drivers/md/dm-zoned.h
+++ b/drivers/md/dm-zoned.h
@@ -61,6 +61,8 @@ struct dmz_dev {
 
 	sector_t		capacity;
 
+	unsigned int		dev_idx;
+
 	unsigned int		nr_zones;
 	unsigned int		zone_offset;
 
@@ -243,7 +245,8 @@ static inline void dmz_activate_zone(struct dm_zone *zone)
 
 int dmz_lock_zone_reclaim(struct dm_zone *zone);
 void dmz_unlock_zone_reclaim(struct dm_zone *zone);
-struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd, bool idle);
+struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd,
+					 unsigned int dev_idx, bool idle);
 
 struct dm_zone *dmz_get_chunk_mapping(struct dmz_metadata *zmd,
 				      unsigned int chunk, int op);
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 15/15] dm-zoned: prefer full zones for reclaim
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (13 preceding siblings ...)
  2020-05-27  6:22 ` [PATCH 14/15] dm-zoned: select reclaim zone based on " Hannes Reinecke
@ 2020-05-27  6:22 ` Hannes Reinecke
  2020-05-28  4:16   ` Damien Le Moal
  2020-05-27 13:41 ` [PATCHv2 00/15] dm-zoned: multi-device support Mike Snitzer
  2020-05-28  8:24 ` Damien Le Moal
  16 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-27  6:22 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Damien LeMoal, dm-devel

Prefer full zones when selecting the next zone for reclaim.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-zoned-metadata.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
index b89b3d3b9ec9..f161ef4e3d3d 100644
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@ -1936,7 +1936,7 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
 						    unsigned int idx, bool idle)
 {
 	struct dm_zone *dzone = NULL;
-	struct dm_zone *zone;
+	struct dm_zone *zone, *last = NULL;
 	struct list_head *zone_list;
 
 	/* If we have cache zones select from the cache zone list */
@@ -1953,6 +1953,13 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
 			dzone = zone->bzone;
 			if (dzone->dev->dev_idx != idx)
 				continue;
+			if (!last) {
+				last = dzone;
+				continue;
+			}
+			if (last->weight < dzone->weight)
+				continue;
+			dzone = last;
 		} else
 			dzone = zone;
 		if (dmz_lock_zone_reclaim(dzone))
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 00/15] dm-zoned: multi-device support
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (14 preceding siblings ...)
  2020-05-27  6:22 ` [PATCH 15/15] dm-zoned: prefer full zones for reclaim Hannes Reinecke
@ 2020-05-27 13:41 ` Mike Snitzer
  2020-05-28  4:20   ` Damien Le Moal
  2020-05-28  8:24 ` Damien Le Moal
  16 siblings, 1 reply; 32+ messages in thread
From: Mike Snitzer @ 2020-05-27 13:41 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Damien LeMoal, dm-devel

On Wed, May 27 2020 at  2:22am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> Hi all,
> 
> here's the second version of my patchset to support multiple zoned
> drives with dm-zoned.
> This patchset:
> - Converts the zone array to using xarray for better scalability
> - Separates out shared structures into per-device structure
> - Enforce drive-locality for allocating and reclaiming zones
> - Lifts the restriction of 2 devices to handle an arbitrary number
>   of drives.
> 
> This gives me a near-perfect scalability by increasing the write
> speed from 150MB/s (for a cache and one zoned drive) to 300MB/s
> (for a cache and two zoned drives).
> 
> Changes to v1:
> - Include reviews from Damien
> - Reshuffle patches
> 
> Hannes Reinecke (15):
>   dm-zoned: add debugging message for reading superblocks
>   dm-zoned: secondary superblock must reside on the same devices than
>     primary superblock
>   dm-zoned: improve logging messages for reclaim
>   dm-zoned: add a 'reserved' zone flag
>   dm-zoned: convert to xarray
>   dm-zoned: temporary superblock for tertiary devices
>   dm-zoned: add device pointer to struct dm_zone
>   dm-zoned: add metadata pointer to struct dmz_dev
>   dm-zoned: allocate dm devices dynamically
>   dm-zoned: per-device reclaim
>   dm-zoned: move random and sequential zones into struct dmz_dev
>   dm-zoned: support arbitrary number of devices
>   dm-zoned: allocate zone by device index
>   dm-zoned: select reclaim zone based on device index
>   dm-zoned: prefer full zones for reclaim
> 
>  drivers/md/dm-zoned-metadata.c | 448 ++++++++++++++++++++++++-----------------
>  drivers/md/dm-zoned-reclaim.c  |  95 +++++----
>  drivers/md/dm-zoned-target.c   | 169 ++++++++++------
>  drivers/md/dm-zoned.h          |  77 ++++---
>  4 files changed, 481 insertions(+), 308 deletions(-)

Would you still like to wait until the 5.9 merge window?

Or would you prefer to see these changes land for 5.8 so as to limit the
variants of related code that needs to be supported?

If you and Damien are OK with 5.8 (and testing backs that up) then I
should be able to get it to land.

Let me know, thanks.
Mike

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 05/15] dm-zoned: convert to xarray
  2020-05-27  6:22 ` [PATCH 05/15] dm-zoned: convert to xarray Hannes Reinecke
@ 2020-05-28  2:46   ` Damien Le Moal
  0 siblings, 0 replies; 32+ messages in thread
From: Damien Le Moal @ 2020-05-28  2:46 UTC (permalink / raw)
  To: Hannes Reinecke, Mike Snitzer; +Cc: dm-devel

On 2020/05/27 15:23, Hannes Reinecke wrote:
> The zones array is getting really large, and large arrays
> tend to wreak havoc with the CPU caches.
> So convert it to xarray to become more cache friendly.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 120 ++++++++++++++++++++++++++++++-----------
>  1 file changed, 88 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 0982ab1758a6..839f9078806d 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -172,7 +172,7 @@ struct dmz_metadata {
>  	unsigned int		nr_chunks;
>  
>  	/* Zone information array */
> -	struct dm_zone		*zones;
> +	struct xarray		zones;
>  
>  	struct dmz_sb		sb[3];
>  	unsigned int		mblk_primary;
> @@ -327,6 +327,30 @@ unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
>  	return atomic_read(&zmd->unmap_nr_seq);
>  }
>  
> +static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
> +{
> +	return xa_load(&zmd->zones, zone_id);
> +}
> +
> +static struct dm_zone *dmz_insert(struct dmz_metadata *zmd,
> +				  unsigned int zone_id)
> +{
> +	struct dm_zone *zone = kzalloc(sizeof(struct dm_zone), GFP_KERNEL);
> +
> +	if (!zone)
> +		return ERR_PTR(-ENOMEM);
> +
> +	if (xa_insert(&zmd->zones, zone_id, zone, GFP_KERNEL))
> +		return ERR_PTR(-EBUSY);
> +
> +	INIT_LIST_HEAD(&zone->link);
> +	atomic_set(&zone->refcount, 0);
> +	zone->id = zone_id;
> +	zone->chunk = DMZ_MAP_UNMAPPED;
> +
> +	return zone;
> +}
> +
>  const char *dmz_metadata_label(struct dmz_metadata *zmd)
>  {
>  	return (const char *)zmd->label;
> @@ -1122,6 +1146,7 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>  {
>  	unsigned int zone_nr_blocks = zmd->zone_nr_blocks;
>  	struct dmz_mblock *mblk;
> +	unsigned int zone_id = zmd->sb[0].zone->id;
>  	int i;
>  
>  	/* Allocate a block */
> @@ -1134,16 +1159,15 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>  
>  	/* Bad first super block: search for the second one */
>  	zmd->sb[1].block = zmd->sb[0].block + zone_nr_blocks;
> -	zmd->sb[1].zone = zmd->sb[0].zone + 1;
> +	zmd->sb[1].zone = dmz_get(zmd, zone_id + 1);
>  	zmd->sb[1].dev = zmd->sb[0].dev;
> -	for (i = 0; i < zmd->nr_rnd_zones - 1; i++) {
> +	for (i = 1; i < zmd->nr_rnd_zones; i++) {
>  		if (dmz_read_sb(zmd, 1) != 0)
>  			break;
> -		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC) {
> -			zmd->sb[1].zone += i;
> +		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC)
>  			return 0;
> -		}
>  		zmd->sb[1].block += zone_nr_blocks;
> +		zmd->sb[1].zone = dmz_get(zmd, zone_id + i);
>  	}
>  
>  	dmz_free_mblock(zmd, mblk);
> @@ -1259,8 +1283,12 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>  	/* Read and check secondary super block */
>  	if (ret == 0) {
>  		sb_good[0] = true;
> -		if (!zmd->sb[1].zone)
> -			zmd->sb[1].zone = zmd->sb[0].zone + zmd->nr_meta_zones;
> +		if (!zmd->sb[1].zone) {
> +			unsigned int zone_id =
> +				zmd->sb[0].zone->id + zmd->nr_meta_zones;
> +
> +			zmd->sb[1].zone = dmz_get(zmd, zone_id);
> +		}
>  		zmd->sb[1].block = dmz_start_block(zmd, zmd->sb[1].zone);
>  		zmd->sb[1].dev = zmd->sb[0].dev;
>  		ret = dmz_get_sb(zmd, 1);
> @@ -1341,7 +1369,11 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>  	struct dmz_metadata *zmd = data;
>  	struct dmz_dev *dev = zmd->nr_devs > 1 ? &zmd->dev[1] : &zmd->dev[0];
>  	int idx = num + dev->zone_offset;
> -	struct dm_zone *zone = &zmd->zones[idx];
> +	struct dm_zone *zone;
> +
> +	zone = dmz_insert(zmd, idx);
> +	if (IS_ERR(zone))
> +		return PTR_ERR(zone);
>  
>  	if (blkz->len != zmd->zone_nr_sectors) {
>  		if (zmd->sb_version > 1) {
> @@ -1353,11 +1385,6 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>  		return -ENXIO;
>  	}
>  
> -	INIT_LIST_HEAD(&zone->link);
> -	atomic_set(&zone->refcount, 0);
> -	zone->id = idx;
> -	zone->chunk = DMZ_MAP_UNMAPPED;
> -
>  	switch (blkz->type) {
>  	case BLK_ZONE_TYPE_CONVENTIONAL:
>  		set_bit(DMZ_RND, &zone->flags);
> @@ -1397,18 +1424,17 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>  	return 0;
>  }
>  
> -static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
> +static int dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
>  {
>  	int idx;
>  	sector_t zone_offset = 0;
>  
>  	for(idx = 0; idx < dev->nr_zones; idx++) {
> -		struct dm_zone *zone = &zmd->zones[idx];
> +		struct dm_zone *zone;
>  
> -		INIT_LIST_HEAD(&zone->link);
> -		atomic_set(&zone->refcount, 0);
> -		zone->id = idx;
> -		zone->chunk = DMZ_MAP_UNMAPPED;
> +		zone = dmz_insert(zmd, idx);
> +		if (IS_ERR(zone))
> +			return PTR_ERR(zone);
>  		set_bit(DMZ_CACHE, &zone->flags);
>  		zone->wp_block = 0;
>  		zmd->nr_cache_zones++;
> @@ -1420,6 +1446,7 @@ static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
>  		}
>  		zone_offset += zmd->zone_nr_sectors;
>  	}
> +	return 0;
>  }
>  
>  /*
> @@ -1427,8 +1454,15 @@ static void dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
>   */
>  static void dmz_drop_zones(struct dmz_metadata *zmd)
>  {
> -	kfree(zmd->zones);
> -	zmd->zones = NULL;
> +	int idx;
> +
> +	for(idx = 0; idx < zmd->nr_zones; idx++) {
> +		struct dm_zone *zone = xa_load(&zmd->zones, idx);
> +
> +		kfree(zone);
> +		xa_erase(&zmd->zones, idx);
> +	}
> +	xa_destroy(&zmd->zones);
>  }
>  
>  /*
> @@ -1460,20 +1494,25 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
>  		DMERR("(%s): No zones found", zmd->devname);
>  		return -ENXIO;
>  	}
> -	zmd->zones = kcalloc(zmd->nr_zones, sizeof(struct dm_zone), GFP_KERNEL);
> -	if (!zmd->zones)
> -		return -ENOMEM;
> +	xa_init(&zmd->zones);
>  
>  	DMDEBUG("(%s): Using %zu B for zone information",
>  		zmd->devname, sizeof(struct dm_zone) * zmd->nr_zones);
>  
>  	if (zmd->nr_devs > 1) {
> -		dmz_emulate_zones(zmd, &zmd->dev[0]);
> +		ret = dmz_emulate_zones(zmd, &zmd->dev[0]);
> +		if (ret < 0) {
> +			DMDEBUG("(%s): Failed to emulate zones, error %d",
> +				zmd->devname, ret);
> +			dmz_drop_zones(zmd);
> +			return ret;
> +		}
> +
>  		/*
>  		 * Primary superblock zone is always at zone 0 when multiple
>  		 * drives are present.
>  		 */
> -		zmd->sb[0].zone = &zmd->zones[0];
> +		zmd->sb[0].zone = dmz_get(zmd, 0);
>  
>  		zoned_dev = &zmd->dev[1];
>  	}
> @@ -1576,11 +1615,6 @@ static int dmz_handle_seq_write_err(struct dmz_metadata *zmd,
>  	return 0;
>  }
>  
> -static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
> -{
> -	return &zmd->zones[zone_id];
> -}
> -
>  /*
>   * Reset a zone write pointer.
>   */
> @@ -1662,6 +1696,11 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  		}
>  
>  		dzone = dmz_get(zmd, dzone_id);
> +		if (!dzone) {
> +			dmz_zmd_err(zmd, "Chunk %u mapping: data zone %u not present",
> +				    chunk, dzone_id);
> +			return -EIO;
> +		}
>  		set_bit(DMZ_DATA, &dzone->flags);
>  		dzone->chunk = chunk;
>  		dmz_get_zone_weight(zmd, dzone);
> @@ -1685,6 +1724,11 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  		}
>  
>  		bzone = dmz_get(zmd, bzone_id);
> +		if (!bzone) {
> +			dmz_zmd_err(zmd, "Chunk %u mapping: buffer zone %u not present",
> +				    chunk, bzone_id);
> +			return -EIO;
> +		}
>  		if (!dmz_is_rnd(bzone) && !dmz_is_cache(bzone)) {
>  			dmz_zmd_err(zmd, "Chunk %u mapping: invalid buffer zone %u",
>  				    chunk, bzone_id);
> @@ -1715,6 +1759,8 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  	 */
>  	for (i = 0; i < zmd->nr_zones; i++) {
>  		dzone = dmz_get(zmd, i);
> +		if (!dzone)
> +			continue;
>  		if (dmz_is_meta(dzone))
>  			continue;
>  		if (dmz_is_offline(dzone))
> @@ -1978,6 +2024,10 @@ struct dm_zone *dmz_get_chunk_mapping(struct dmz_metadata *zmd, unsigned int chu
>  	} else {
>  		/* The chunk is already mapped: get the mapping zone */
>  		dzone = dmz_get(zmd, dzone_id);
> +		if (!dzone) {
> +			dzone = ERR_PTR(-EIO);
> +			goto out;
> +		}
>  		if (dzone->chunk != chunk) {
>  			dzone = ERR_PTR(-EIO);
>  			goto out;
> @@ -2794,6 +2844,12 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>  	/* Set metadata zones starting from sb_zone */
>  	for (i = 0; i < zmd->nr_meta_zones << 1; i++) {
>  		zone = dmz_get(zmd, zmd->sb[0].zone->id + i);
> +		if (!zone) {
> +			dmz_zmd_err(zmd,
> +				    "metadata zone %u not present", i);
> +			ret = -ENXIO;
> +			goto err;
> +		}
>  		if (!dmz_is_rnd(zone) && !dmz_is_cache(zone)) {
>  			dmz_zmd_err(zmd,
>  				    "metadata zone %d is not random", i);
> 

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/15] dm-zoned: temporary superblock for tertiary devices
  2020-05-27  6:22 ` [PATCH 06/15] dm-zoned: temporary superblock for tertiary devices Hannes Reinecke
@ 2020-05-28  2:54   ` Damien Le Moal
  0 siblings, 0 replies; 32+ messages in thread
From: Damien Le Moal @ 2020-05-28  2:54 UTC (permalink / raw)
  To: Hannes Reinecke, Mike Snitzer; +Cc: dm-devel

On 2020/05/27 15:22, Hannes Reinecke wrote:
> Checking the tertiary superblock just consists of validating UUIDs,
> crcs, and the generation number; it doesn't have contents which
> would be required during the actual operation.
> So we should be allocating a temporary superblock when checking
> tertiary devices and avoid having to store it together with the
> 'real' superblocks.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 104 ++++++++++++++++++++++-------------------
>  1 file changed, 57 insertions(+), 47 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 839f9078806d..bb9ce72bf18c 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -174,7 +174,7 @@ struct dmz_metadata {
>  	/* Zone information array */
>  	struct xarray		zones;
>  
> -	struct dmz_sb		sb[3];
> +	struct dmz_sb		sb[2];
>  	unsigned int		mblk_primary;
>  	unsigned int		sb_version;
>  	u64			sb_gen;
> @@ -1014,10 +1014,11 @@ int dmz_flush_metadata(struct dmz_metadata *zmd)
>  /*
>   * Check super block.
>   */
> -static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
> +static int dmz_check_sb(struct dmz_metadata *zmd, struct dmz_sb *dsb,
> +			bool tertiary)
>  {
> -	struct dmz_super *sb = zmd->sb[set].sb;
> -	struct dmz_dev *dev = zmd->sb[set].dev;
> +	struct dmz_super *sb = dsb->sb;
> +	struct dmz_dev *dev = dsb->dev;
>  	unsigned int nr_meta_zones, nr_data_zones;
>  	u32 crc, stored_crc;
>  	u64 gen;
> @@ -1034,7 +1035,7 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
>  			    DMZ_META_VER, zmd->sb_version);
>  		return -EINVAL;
>  	}
> -	if ((zmd->sb_version < 1) && (set == 2)) {
> +	if ((zmd->sb_version < 1) && tertiary) {

You can remove the parentheses around zmd->sb_version < 1 here.
An shouldn't it be "zmd->sb_version <= 1" here ? That already was in the
previous round of updates and did not detect any problem though. Will check again.

>  		dmz_dev_err(dev, "Tertiary superblocks are not supported");
>  		return -EINVAL;
>  	}
> @@ -1078,7 +1079,7 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
>  			return -ENXIO;
>  		}
>  
> -		if (set == 2) {
> +		if (tertiary) {
>  			/*
>  			 * Generation number should be 0, but it doesn't
>  			 * really matter if it isn't.
> @@ -1127,14 +1128,13 @@ static int dmz_check_sb(struct dmz_metadata *zmd, unsigned int set)
>  /*
>   * Read the first or second super block from disk.
>   */
> -static int dmz_read_sb(struct dmz_metadata *zmd, unsigned int set)
> +static int dmz_read_sb(struct dmz_metadata *zmd, struct dmz_sb *sb, int set)
>  {
>  	dmz_zmd_debug(zmd, "read superblock set %d dev %s block %llu",
> -		      set, zmd->sb[set].dev->name,
> -		      zmd->sb[set].block);
> +		      set, sb->dev->name, sb->block);
>  
> -	return dmz_rdwr_block(zmd->sb[set].dev, REQ_OP_READ,
> -			      zmd->sb[set].block, zmd->sb[set].mblk->page);
> +	return dmz_rdwr_block(sb->dev, REQ_OP_READ,
> +			      sb->block, sb->mblk->page);
>  }
>  
>  /*
> @@ -1162,7 +1162,7 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>  	zmd->sb[1].zone = dmz_get(zmd, zone_id + 1);
>  	zmd->sb[1].dev = zmd->sb[0].dev;
>  	for (i = 1; i < zmd->nr_rnd_zones; i++) {
> -		if (dmz_read_sb(zmd, 1) != 0)
> +		if (dmz_read_sb(zmd, &zmd->sb[1], 1) != 0)
>  			break;
>  		if (le32_to_cpu(zmd->sb[1].sb->magic) == DMZ_MAGIC)
>  			return 0;
> @@ -1179,9 +1179,9 @@ static int dmz_lookup_secondary_sb(struct dmz_metadata *zmd)
>  }
>  
>  /*
> - * Read the first or second super block from disk.
> + * Read a super block from disk.
>   */
> -static int dmz_get_sb(struct dmz_metadata *zmd, unsigned int set)
> +static int dmz_get_sb(struct dmz_metadata *zmd, struct dmz_sb *sb, int set)
>  {
>  	struct dmz_mblock *mblk;
>  	int ret;
> @@ -1191,14 +1191,14 @@ static int dmz_get_sb(struct dmz_metadata *zmd, unsigned int set)
>  	if (!mblk)
>  		return -ENOMEM;
>  
> -	zmd->sb[set].mblk = mblk;
> -	zmd->sb[set].sb = mblk->data;
> +	sb->mblk = mblk;
> +	sb->sb = mblk->data;
>  
>  	/* Read super block */
> -	ret = dmz_read_sb(zmd, set);
> +	ret = dmz_read_sb(zmd, sb, set);
>  	if (ret) {
>  		dmz_free_mblock(zmd, mblk);
> -		zmd->sb[set].mblk = NULL;
> +		sb->mblk = NULL;
>  		return ret;
>  	}
>  
> @@ -1272,13 +1272,13 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>  	/* Read and check the primary super block */
>  	zmd->sb[0].block = dmz_start_block(zmd, zmd->sb[0].zone);
>  	zmd->sb[0].dev = dmz_zone_to_dev(zmd, zmd->sb[0].zone);
> -	ret = dmz_get_sb(zmd, 0);
> +	ret = dmz_get_sb(zmd, &zmd->sb[0], 0);
>  	if (ret) {
>  		dmz_dev_err(zmd->sb[0].dev, "Read primary super block failed");
>  		return ret;
>  	}
>  
> -	ret = dmz_check_sb(zmd, 0);
> +	ret = dmz_check_sb(zmd, &zmd->sb[0], false);
>  
>  	/* Read and check secondary super block */
>  	if (ret == 0) {
> @@ -1291,7 +1291,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>  		}
>  		zmd->sb[1].block = dmz_start_block(zmd, zmd->sb[1].zone);
>  		zmd->sb[1].dev = zmd->sb[0].dev;
> -		ret = dmz_get_sb(zmd, 1);
> +		ret = dmz_get_sb(zmd, &zmd->sb[1], 1);
>  	} else
>  		ret = dmz_lookup_secondary_sb(zmd);
>  
> @@ -1300,7 +1300,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>  		return ret;
>  	}
>  
> -	ret = dmz_check_sb(zmd, 1);
> +	ret = dmz_check_sb(zmd, &zmd->sb[1], false);
>  	if (ret == 0)
>  		sb_good[1] = true;
>  
> @@ -1345,18 +1345,35 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>  		      "Using super block %u (gen %llu)",
>  		      zmd->mblk_primary, zmd->sb_gen);
>  
> -	if ((zmd->sb_version > 1) && zmd->sb[2].zone) {
> -		zmd->sb[2].block = dmz_start_block(zmd, zmd->sb[2].zone);
> -		zmd->sb[2].dev = dmz_zone_to_dev(zmd, zmd->sb[2].zone);
> -		ret = dmz_get_sb(zmd, 2);
> -		if (ret) {
> -			dmz_dev_err(zmd->sb[2].dev,
> -				    "Read tertiary super block failed");
> -			return ret;
> +	if (zmd->sb_version > 1) {
> +		int i;
> +		struct dmz_sb *sb;
> +
> +		sb = kzalloc(sizeof(struct dmz_sb), GFP_KERNEL);
> +		if (!sb)
> +			return -ENOMEM;
> +		for (i = 1; i < zmd->nr_devs; i++) {
> +			sb->block = 0;
> +			sb->zone = dmz_get(zmd, zmd->dev[i].zone_offset);
> +			sb->dev = &zmd->dev[i];
> +			if (!dmz_is_meta(sb->zone)) {
> +				dmz_dev_err(sb->dev,
> +					    "Tertiary super block zone %u not marked as metadata zone",
> +					    sb->zone->id);
> +				return -EINVAL;

Mem leaking sb here...

> +			}
> +			ret = dmz_get_sb(zmd, sb, i + 1);
> +			if (ret) {
> +				dmz_dev_err(sb->dev,
> +					    "Read tertiary super block failed");
> +				return ret;

Here too.

> +			}
> +			ret = dmz_check_sb(zmd, sb, true);
> +			dmz_free_mblock(zmd, sb->mblk);
> +			if (ret == -EINVAL)
> +				return ret;

And here.

>  		}
> -		ret = dmz_check_sb(zmd, 2);
> -		if (ret == -EINVAL)
> -			return ret;
> +		kfree(sb);
>  	}
>  	return 0;
>  }
> @@ -1415,12 +1432,15 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>  				zmd->sb[0].zone = zone;
>  			}
>  		}
> -		if (zmd->nr_devs > 1 && !zmd->sb[2].zone) {
> -			/* Tertiary superblock zone */
> -			zmd->sb[2].zone = zone;
> +		if (zmd->nr_devs > 1 && num == 0) {
> +			/*
> +			 * Tertiary superblock zones are always at the
> +			 * start of the zoned devices, so mark them
> +			 * as metadata zone.
> +			 */
> +			set_bit(DMZ_META, &zone->flags);
>  		}
>  	}
> -
>  	return 0;
>  }
>  
> @@ -2858,16 +2878,6 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>  		}
>  		set_bit(DMZ_META, &zone->flags);
>  	}
> -	if (zmd->sb[2].zone) {
> -		zone = dmz_get(zmd, zmd->sb[2].zone->id);
> -		if (!zone) {
> -			dmz_zmd_err(zmd,
> -				    "Tertiary metadata zone not present");
> -			ret = -ENXIO;
> -			goto err;
> -		}
> -		set_bit(DMZ_META, &zone->flags);
> -	}
>  	/* Load mapping table */
>  	ret = dmz_load_mapping(zmd);
>  	if (ret)
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/15] dm-zoned: add device pointer to struct dm_zone
  2020-05-27  6:22 ` [PATCH 07/15] dm-zoned: add device pointer to struct dm_zone Hannes Reinecke
@ 2020-05-28  2:57   ` Damien Le Moal
  0 siblings, 0 replies; 32+ messages in thread
From: Damien Le Moal @ 2020-05-28  2:57 UTC (permalink / raw)
  To: Hannes Reinecke, Mike Snitzer; +Cc: dm-devel

On 2020/05/27 15:22, Hannes Reinecke wrote:
> Add a pointer to the containing device to struct dm_zone and
> kill dmz_zone_to_dev().
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 39 ++++++++++-----------------------------
>  drivers/md/dm-zoned-reclaim.c  | 13 +++++--------
>  drivers/md/dm-zoned-target.c   |  2 +-
>  drivers/md/dm-zoned.h          |  4 +++-
>  4 files changed, 19 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index bb9ce72bf18c..302443b2d885 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -229,16 +229,10 @@ struct dmz_metadata {
>   */
>  static unsigned int dmz_dev_zone_id(struct dmz_metadata *zmd, struct dm_zone *zone)
>  {
> -	unsigned int zone_id;
> -
>  	if (WARN_ON(!zone))
>  		return 0;
>  
> -	zone_id = zone->id;
> -	if (zmd->nr_devs > 1 &&
> -	    (zone_id >= zmd->dev[1].zone_offset))
> -		zone_id -= zmd->dev[1].zone_offset;
> -	return zone_id;
> +	return zone->id - zone->dev->zone_offset;
>  }
>  
>  sector_t dmz_start_sect(struct dmz_metadata *zmd, struct dm_zone *zone)
> @@ -255,18 +249,6 @@ sector_t dmz_start_block(struct dmz_metadata *zmd, struct dm_zone *zone)
>  	return (sector_t)zone_id << zmd->zone_nr_blocks_shift;
>  }
>  
> -struct dmz_dev *dmz_zone_to_dev(struct dmz_metadata *zmd, struct dm_zone *zone)
> -{
> -	if (WARN_ON(!zone))
> -		return &zmd->dev[0];
> -
> -	if (zmd->nr_devs > 1 &&
> -	    zone->id >= zmd->dev[1].zone_offset)
> -		return &zmd->dev[1];
> -
> -	return &zmd->dev[0];
> -}
> -
>  unsigned int dmz_zone_nr_blocks(struct dmz_metadata *zmd)
>  {
>  	return zmd->zone_nr_blocks;
> @@ -333,7 +315,7 @@ static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
>  }
>  
>  static struct dm_zone *dmz_insert(struct dmz_metadata *zmd,
> -				  unsigned int zone_id)
> +				  unsigned int zone_id, struct dmz_dev *dev)
>  {
>  	struct dm_zone *zone = kzalloc(sizeof(struct dm_zone), GFP_KERNEL);
>  
> @@ -347,6 +329,7 @@ static struct dm_zone *dmz_insert(struct dmz_metadata *zmd,
>  	atomic_set(&zone->refcount, 0);
>  	zone->id = zone_id;
>  	zone->chunk = DMZ_MAP_UNMAPPED;
> +	zone->dev = dev;
>  
>  	return zone;
>  }
> @@ -1271,7 +1254,7 @@ static int dmz_load_sb(struct dmz_metadata *zmd)
>  
>  	/* Read and check the primary super block */
>  	zmd->sb[0].block = dmz_start_block(zmd, zmd->sb[0].zone);
> -	zmd->sb[0].dev = dmz_zone_to_dev(zmd, zmd->sb[0].zone);
> +	zmd->sb[0].dev = zmd->sb[0].zone->dev;
>  	ret = dmz_get_sb(zmd, &zmd->sb[0], 0);
>  	if (ret) {
>  		dmz_dev_err(zmd->sb[0].dev, "Read primary super block failed");
> @@ -1388,7 +1371,7 @@ static int dmz_init_zone(struct blk_zone *blkz, unsigned int num, void *data)
>  	int idx = num + dev->zone_offset;
>  	struct dm_zone *zone;
>  
> -	zone = dmz_insert(zmd, idx);
> +	zone = dmz_insert(zmd, idx, dev);
>  	if (IS_ERR(zone))
>  		return PTR_ERR(zone);
>  
> @@ -1452,7 +1435,7 @@ static int dmz_emulate_zones(struct dmz_metadata *zmd, struct dmz_dev *dev)
>  	for(idx = 0; idx < dev->nr_zones; idx++) {
>  		struct dm_zone *zone;
>  
> -		zone = dmz_insert(zmd, idx);
> +		zone = dmz_insert(zmd, idx, dev);
>  		if (IS_ERR(zone))
>  			return PTR_ERR(zone);
>  		set_bit(DMZ_CACHE, &zone->flags);
> @@ -1578,7 +1561,7 @@ static int dmz_update_zone_cb(struct blk_zone *blkz, unsigned int idx,
>   */
>  static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>  {
> -	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
> +	struct dmz_dev *dev = zone->dev;
>  	unsigned int noio_flag;
>  	int ret;
>  
> @@ -1615,7 +1598,7 @@ static int dmz_update_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>  static int dmz_handle_seq_write_err(struct dmz_metadata *zmd,
>  				    struct dm_zone *zone)
>  {
> -	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
> +	struct dmz_dev *dev = zone->dev;
>  	unsigned int wp = 0;
>  	int ret;
>  
> @@ -1652,7 +1635,7 @@ static int dmz_reset_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>  		return 0;
>  
>  	if (!dmz_is_empty(zone) || dmz_seq_write_err(zone)) {
> -		struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
> +		struct dmz_dev *dev = zone->dev;
>  
>  		ret = blkdev_zone_mgmt(dev->bdev, REQ_OP_ZONE_RESET,
>  				       dmz_start_sect(zmd, zone),
> @@ -2213,9 +2196,7 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>  		goto again;
>  	}
>  	if (dmz_is_meta(zone)) {
> -		struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
> -
> -		dmz_dev_warn(dev, "Zone %u has metadata", zone->id);
> +		dmz_zmd_warn(zmd, "Zone %u has metadata", zone->id);
>  		zone = NULL;
>  		goto again;
>  	}
> diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
> index fd4d47dfcea1..e9e3b730e258 100644
> --- a/drivers/md/dm-zoned-reclaim.c
> +++ b/drivers/md/dm-zoned-reclaim.c
> @@ -58,7 +58,7 @@ static int dmz_reclaim_align_wp(struct dmz_reclaim *zrc, struct dm_zone *zone,
>  				sector_t block)
>  {
>  	struct dmz_metadata *zmd = zrc->metadata;
> -	struct dmz_dev *dev = dmz_zone_to_dev(zmd, zone);
> +	struct dmz_dev *dev = zone->dev;
>  	sector_t wp_block = zone->wp_block;
>  	unsigned int nr_blocks;
>  	int ret;
> @@ -116,7 +116,6 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
>  			    struct dm_zone *src_zone, struct dm_zone *dst_zone)
>  {
>  	struct dmz_metadata *zmd = zrc->metadata;
> -	struct dmz_dev *src_dev, *dst_dev;
>  	struct dm_io_region src, dst;
>  	sector_t block = 0, end_block;
>  	sector_t nr_blocks;
> @@ -130,17 +129,15 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
>  	else
>  		end_block = dmz_zone_nr_blocks(zmd);
>  	src_zone_block = dmz_start_block(zmd, src_zone);
> -	src_dev = dmz_zone_to_dev(zmd, src_zone);
>  	dst_zone_block = dmz_start_block(zmd, dst_zone);
> -	dst_dev = dmz_zone_to_dev(zmd, dst_zone);
>  
>  	if (dmz_is_seq(dst_zone))
>  		set_bit(DM_KCOPYD_WRITE_SEQ, &flags);
>  
>  	while (block < end_block) {
> -		if (src_dev->flags & DMZ_BDEV_DYING)
> +		if (src_zone->dev->flags & DMZ_BDEV_DYING)
>  			return -EIO;
> -		if (dst_dev->flags & DMZ_BDEV_DYING)
> +		if (dst_zone->dev->flags & DMZ_BDEV_DYING)
>  			return -EIO;
>  
>  		if (dmz_reclaim_should_terminate(src_zone))
> @@ -163,11 +160,11 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
>  				return ret;
>  		}
>  
> -		src.bdev = src_dev->bdev;
> +		src.bdev = src_zone->dev->bdev;
>  		src.sector = dmz_blk2sect(src_zone_block + block);
>  		src.count = dmz_blk2sect(nr_blocks);
>  
> -		dst.bdev = dst_dev->bdev;
> +		dst.bdev = dst_zone->dev->bdev;
>  		dst.sector = dmz_blk2sect(dst_zone_block + block);
>  		dst.count = src.count;
>  
> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
> index 2770e293a97b..087dd4801663 100644
> --- a/drivers/md/dm-zoned-target.c
> +++ b/drivers/md/dm-zoned-target.c
> @@ -123,7 +123,7 @@ static int dmz_submit_bio(struct dmz_target *dmz, struct dm_zone *zone,
>  {
>  	struct dmz_bioctx *bioctx =
>  		dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
> -	struct dmz_dev *dev = dmz_zone_to_dev(dmz->metadata, zone);
> +	struct dmz_dev *dev = zone->dev;
>  	struct bio *clone;
>  
>  	if (dev->flags & DMZ_BDEV_DYING)
> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
> index 3451b5a768b4..316344bf07bd 100644
> --- a/drivers/md/dm-zoned.h
> +++ b/drivers/md/dm-zoned.h
> @@ -80,6 +80,9 @@ struct dm_zone {
>  	/* For listing the zone depending on its state */
>  	struct list_head	link;
>  
> +	/* Device containing this zone */
> +	struct dmz_dev		*dev;
> +
>  	/* Zone type and state */
>  	unsigned long		flags;
>  
> @@ -190,7 +193,6 @@ const char *dmz_metadata_label(struct dmz_metadata *zmd);
>  sector_t dmz_start_sect(struct dmz_metadata *zmd, struct dm_zone *zone);
>  sector_t dmz_start_block(struct dmz_metadata *zmd, struct dm_zone *zone);
>  unsigned int dmz_nr_chunks(struct dmz_metadata *zmd);
> -struct dmz_dev *dmz_zone_to_dev(struct dmz_metadata *zmd, struct dm_zone *zone);
>  
>  bool dmz_check_dev(struct dmz_metadata *zmd);
>  bool dmz_dev_is_dying(struct dmz_metadata *zmd);
> 

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/15] dm-zoned: allocate dm devices dynamically
  2020-05-27  6:22 ` [PATCH 09/15] dm-zoned: allocate dm devices dynamically Hannes Reinecke
@ 2020-05-28  2:59   ` Damien Le Moal
  0 siblings, 0 replies; 32+ messages in thread
From: Damien Le Moal @ 2020-05-28  2:59 UTC (permalink / raw)
  To: Hannes Reinecke, Mike Snitzer; +Cc: dm-devel

On 2020/05/27 15:22, Hannes Reinecke wrote:
> Allocate dm devices dynamically to allow for expansion to
> several devices.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-target.c | 29 +++++++++++++++++++----------
>  1 file changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
> index 087dd4801663..fec1b6a9c6f1 100644
> --- a/drivers/md/dm-zoned-target.c
> +++ b/drivers/md/dm-zoned-target.c
> @@ -40,9 +40,10 @@ struct dm_chunk_work {
>   * Target descriptor.
>   */
>  struct dmz_target {
> -	struct dm_dev		*ddev[DMZ_MAX_DEVS];
> +	struct dm_dev		**ddev;
> +	unsigned int		nr_ddevs;
>  
> -	unsigned long		flags;
> +	unsigned int		flags;
>  
>  	/* Zoned block device information */
>  	struct dmz_dev		*dev;
> @@ -836,12 +837,20 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  		ti->error = "Unable to allocate the zoned target descriptor";
>  		return -ENOMEM;
>  	}
> -	dmz->dev = kcalloc(2, sizeof(struct dmz_dev), GFP_KERNEL);
> +	dmz->dev = kcalloc(argc, sizeof(struct dmz_dev), GFP_KERNEL);
>  	if (!dmz->dev) {
>  		ti->error = "Unable to allocate the zoned device descriptors";
>  		kfree(dmz);
>  		return -ENOMEM;
>  	}
> +	dmz->ddev = kcalloc(argc, sizeof(struct dm_dev *), GFP_KERNEL);
> +	if (!dmz->ddev) {
> +		ti->error = "Unable to allocate the dm device descriptors";
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +	dmz->nr_ddevs = argc;
> +
>  	ti->private = dmz;
>  
>  	/* Get the target zoned block device */
> @@ -1048,13 +1057,13 @@ static int dmz_iterate_devices(struct dm_target *ti,
>  	struct dmz_target *dmz = ti->private;
>  	unsigned int zone_nr_sectors = dmz_zone_nr_sectors(dmz->metadata);
>  	sector_t capacity;
> -	int r;
> +	int i, r;
>  
> -	capacity = dmz->dev[0].capacity & ~(zone_nr_sectors - 1);
> -	r = fn(ti, dmz->ddev[0], 0, capacity, data);
> -	if (!r && dmz->ddev[1]) {
> -		capacity = dmz->dev[1].capacity & ~(zone_nr_sectors - 1);
> -		r = fn(ti, dmz->ddev[1], 0, capacity, data);
> +	for (i = 0; i < dmz->nr_ddevs; i++) {
> +		capacity = dmz->dev[i].capacity & ~(zone_nr_sectors - 1);
> +		r = fn(ti, dmz->ddev[i], 0, capacity, data);
> +		if (r)
> +			break;
>  	}
>  	return r;
>  }
> @@ -1083,7 +1092,7 @@ static void dmz_status(struct dm_target *ti, status_type_t type,
>  		dev = &dmz->dev[0];
>  		format_dev_t(buf, dev->bdev->bd_dev);
>  		DMEMIT("%s", buf);
> -		if (dmz->dev[1].bdev) {
> +		if (dmz->nr_ddevs > 1) {
>  			dev = &dmz->dev[1];
>  			format_dev_t(buf, dev->bdev->bd_dev);
>  			DMEMIT(" %s", buf);
> 

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/15] dm-zoned: per-device reclaim
  2020-05-27  6:22 ` [PATCH 10/15] dm-zoned: per-device reclaim Hannes Reinecke
@ 2020-05-28  3:19   ` Damien Le Moal
  0 siblings, 0 replies; 32+ messages in thread
From: Damien Le Moal @ 2020-05-28  3:19 UTC (permalink / raw)
  To: snitzer, hare; +Cc: dm-devel

On Wed, 2020-05-27 at 08:22 +0200, Hannes Reinecke wrote:
> Instead of having one reclaim workqueue for the entire set we should
> be allocating a reclaim workqueue per device; that will reduce
> contention and should boost performance for a multi-device setup.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-reclaim.c | 66 +++++++++++++++++++++++++++----------------
>  drivers/md/dm-zoned-target.c  | 39 +++++++++++++++----------
>  drivers/md/dm-zoned.h         | 38 +++++++++++++------------
>  3 files changed, 86 insertions(+), 57 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
> index e9e3b730e258..09843645248a 100644
> --- a/drivers/md/dm-zoned-reclaim.c
> +++ b/drivers/md/dm-zoned-reclaim.c
> @@ -21,6 +21,8 @@ struct dmz_reclaim {
>  	struct dm_kcopyd_throttle kc_throttle;
>  	int			kc_err;
>  
> +	int			dev_idx;
> +
>  	unsigned long		flags;
>  
>  	/* Last target access time */
> @@ -198,8 +200,8 @@ static int dmz_reclaim_buf(struct dmz_reclaim *zrc, struct dm_zone *dzone)
>  	struct dmz_metadata *zmd = zrc->metadata;
>  	int ret;
>  
> -	DMDEBUG("(%s): Chunk %u, move buf zone %u (weight %u) to data zone %u (weight %u)",
> -		dmz_metadata_label(zmd),
> +	DMDEBUG("(%s/%u): Chunk %u, move buf zone %u (weight %u) to data zone %u (weight %u)",
> +		dmz_metadata_label(zmd), zrc->dev_idx,
>  		dzone->chunk, bzone->id, dmz_weight(bzone),
>  		dzone->id, dmz_weight(dzone));
>  
> @@ -237,8 +239,8 @@ static int dmz_reclaim_seq_data(struct dmz_reclaim *zrc, struct dm_zone *dzone)
>  	struct dmz_metadata *zmd = zrc->metadata;
>  	int ret = 0;
>  
> -	DMDEBUG("(%s): Chunk %u, move data zone %u (weight %u) to buf zone %u (weight %u)",
> -		dmz_metadata_label(zmd),
> +	DMDEBUG("(%s/%u): Chunk %u, move data zone %u (weight %u) to buf zone %u (weight %u)",
> +		dmz_metadata_label(zmd), zrc->dev_idx,
>  		chunk, dzone->id, dmz_weight(dzone),
>  		bzone->id, dmz_weight(bzone));
>  
> @@ -295,8 +297,8 @@ static int dmz_reclaim_rnd_data(struct dmz_reclaim *zrc, struct dm_zone *dzone)
>  	if (!szone)
>  		return -ENOSPC;
>  
> -	DMDEBUG("(%s): Chunk %u, move %s zone %u (weight %u) to %s zone %u",
> -		dmz_metadata_label(zmd), chunk,
> +	DMDEBUG("(%s/%u): Chunk %u, move %s zone %u (weight %u) to %s zone %u",
> +		dmz_metadata_label(zmd), zrc->dev_idx, chunk,
>  		dmz_is_cache(dzone) ? "cache" : "rnd",
>  		dzone->id, dmz_weight(dzone),
>  		dmz_is_rnd(szone) ? "rnd" : "seq", szone->id);
> @@ -369,8 +371,8 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
>  	/* Get a data zone */
>  	dzone = dmz_get_zone_for_reclaim(zmd, dmz_target_idle(zrc));
>  	if (!dzone) {
> -		DMDEBUG("(%s): No zone found to reclaim",
> -			dmz_metadata_label(zmd));
> +		DMDEBUG("(%s/%u): No zone found to reclaim",
> +			dmz_metadata_label(zmd), zrc->dev_idx);
>  		return -EBUSY;
>  	}
>  
> @@ -417,24 +419,26 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
>  out:
>  	if (ret) {
>  		if (ret == -EINTR)
> -			DMDEBUG("(%s): reclaim zone %u interrupted",
> -				dmz_metadata_label(zmd), rzone->id);
> +			DMDEBUG("(%s/%u): reclaim zone %u interrupted",
> +				dmz_metadata_label(zmd), zrc->dev_idx,
> +				rzone->id);
>  		else
> -			DMDEBUG("(%s): Failed to reclaim zone %u, err %d",
> -				dmz_metadata_label(zmd), rzone->id, ret);
> +			DMDEBUG("(%s/%u): Failed to reclaim zone %u, err %d",
> +				dmz_metadata_label(zmd), zrc->dev_idx,
> +				rzone->id, ret);
>  		dmz_unlock_zone_reclaim(dzone);
>  		return ret;
>  	}
>  
>  	ret = dmz_flush_metadata(zrc->metadata);
>  	if (ret) {
> -		DMDEBUG("(%s): Metadata flush for zone %u failed, err %d",
> -			dmz_metadata_label(zmd), rzone->id, ret);
> +		DMDEBUG("(%s/%u): Metadata flush for zone %u failed, err %d",
> +			dmz_metadata_label(zmd), zrc->dev_idx, rzone->id, ret);
>  		return ret;
>  	}
>  
> -	DMDEBUG("(%s): Reclaimed zone %u in %u ms",
> -		dmz_metadata_label(zmd),
> +	DMDEBUG("(%s/%u): Reclaimed zone %u in %u ms",
> +		dmz_metadata_label(zmd), zrc->dev_idx,
>  		rzone->id, jiffies_to_msecs(jiffies - start));
>  	return 0;
>  }
> @@ -461,10 +465,20 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
>   */
>  static bool dmz_should_reclaim(struct dmz_reclaim *zrc, unsigned int p_unmap)
>  {
> -	unsigned int nr_reclaim = dmz_nr_rnd_zones(zrc->metadata);
> +	unsigned int nr_reclaim;
> +
> +	nr_reclaim = dmz_nr_rnd_zones(zrc->metadata);
>  
> -	if (dmz_nr_cache_zones(zrc->metadata))
> +	if (dmz_nr_cache_zones(zrc->metadata)) {
> +		/*
> +		 * The first device in a multi-device
> +		 * setup only contains cache zones, so
> +		 * never start reclaim there.
> +		 */
> +		if (zrc->dev_idx == 0)
> +			return false;
>  		nr_reclaim += dmz_nr_cache_zones(zrc->metadata);
> +	}
>  
>  	/* Reclaim when idle */
>  	if (dmz_target_idle(zrc) && nr_reclaim)
> @@ -488,7 +502,7 @@ static void dmz_reclaim_work(struct work_struct *work)
>  {
>  	struct dmz_reclaim *zrc = container_of(work, struct dmz_reclaim, work.work);
>  	struct dmz_metadata *zmd = zrc->metadata;
> -	unsigned int p_unmap;
> +	unsigned int p_unmap, nr_unmap_rnd = 0, nr_rnd = 0;
>  	int ret;
>  
>  	if (dmz_dev_is_dying(zmd))
> @@ -514,8 +528,11 @@ static void dmz_reclaim_work(struct work_struct *work)
>  		zrc->kc_throttle.throttle = min(75U, 100U - p_unmap / 2);
>  	}
>  
> -	DMDEBUG("(%s): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
> -		dmz_metadata_label(zmd),
> +	nr_unmap_rnd = dmz_nr_unmap_rnd_zones(zmd);
> +	nr_rnd = dmz_nr_rnd_zones(zmd);
> +
> +	DMDEBUG("(%s/%u): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
> +		dmz_metadata_label(zmd), zrc->dev_idx,
>  		zrc->kc_throttle.throttle,
>  		(dmz_target_idle(zrc) ? "Idle" : "Busy"),
>  		p_unmap, dmz_nr_unmap_cache_zones(zmd),
> @@ -536,7 +553,7 @@ static void dmz_reclaim_work(struct work_struct *work)
>   * Initialize reclaim.
>   */
>  int dmz_ctr_reclaim(struct dmz_metadata *zmd,
> -		    struct dmz_reclaim **reclaim)
> +		    struct dmz_reclaim **reclaim, int idx)
>  {
>  	struct dmz_reclaim *zrc;
>  	int ret;
> @@ -547,6 +564,7 @@ int dmz_ctr_reclaim(struct dmz_metadata *zmd,
>  
>  	zrc->metadata = zmd;
>  	zrc->atime = jiffies;
> +	zrc->dev_idx = idx;
>  
>  	/* Reclaim kcopyd client */
>  	zrc->kc = dm_kcopyd_client_create(&zrc->kc_throttle);
> @@ -558,8 +576,8 @@ int dmz_ctr_reclaim(struct dmz_metadata *zmd,
>  
>  	/* Reclaim work */
>  	INIT_DELAYED_WORK(&zrc->work, dmz_reclaim_work);
> -	zrc->wq = alloc_ordered_workqueue("dmz_rwq_%s", WQ_MEM_RECLAIM,
> -					  dmz_metadata_label(zmd));
> +	zrc->wq = alloc_ordered_workqueue("dmz_rwq_%s_%d", WQ_MEM_RECLAIM,
> +					  dmz_metadata_label(zmd), idx);

Nit: We do not need the wq for idx == 0 && nrdevs > 1.

>  	if (!zrc->wq) {
>  		ret = -ENOMEM;
>  		goto err;
> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
> index fec1b6a9c6f1..fc1df9714f63 100644
> --- a/drivers/md/dm-zoned-target.c
> +++ b/drivers/md/dm-zoned-target.c
> @@ -51,9 +51,6 @@ struct dmz_target {
>  	/* For metadata handling */
>  	struct dmz_metadata     *metadata;
>  
> -	/* For reclaim */
> -	struct dmz_reclaim	*reclaim;
> -
>  	/* For chunk work */
>  	struct radix_tree_root	chunk_rxtree;
>  	struct workqueue_struct *chunk_wq;
> @@ -405,14 +402,15 @@ static void dmz_handle_bio(struct dmz_target *dmz, struct dm_chunk_work *cw,
>  		dm_per_bio_data(bio, sizeof(struct dmz_bioctx));
>  	struct dmz_metadata *zmd = dmz->metadata;
>  	struct dm_zone *zone;
> -	int ret;
> +	int i, ret;
>  
>  	/*
>  	 * Write may trigger a zone allocation. So make sure the
>  	 * allocation can succeed.
>  	 */
>  	if (bio_op(bio) == REQ_OP_WRITE)
> -		dmz_schedule_reclaim(dmz->reclaim);
> +		for (i = 0; i < dmz->nr_ddevs; i++)
> +			dmz_schedule_reclaim(dmz->dev[i].reclaim);
>  
>  	dmz_lock_metadata(zmd);
>  
> @@ -432,6 +430,7 @@ static void dmz_handle_bio(struct dmz_target *dmz, struct dm_chunk_work *cw,
>  	if (zone) {
>  		dmz_activate_zone(zone);
>  		bioctx->zone = zone;
> +		dmz_reclaim_bio_acc(zone->dev->reclaim);
>  	}
>  
>  	switch (bio_op(bio)) {
> @@ -578,7 +577,6 @@ static int dmz_queue_chunk_work(struct dmz_target *dmz, struct bio *bio)
>  
>  	bio_list_add(&cw->bio_list, bio);
>  
> -	dmz_reclaim_bio_acc(dmz->reclaim);
>  	if (queue_work(dmz->chunk_wq, &cw->work))
>  		dmz_get_chunk_work(cw);
>  out:
> @@ -823,7 +821,7 @@ static int dmz_fixup_devices(struct dm_target *ti)
>  static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  {
>  	struct dmz_target *dmz;
> -	int ret;
> +	int ret, i;
>  
>  	/* Check arguments */
>  	if (argc < 1 || argc > 2) {
> @@ -925,10 +923,12 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	mod_delayed_work(dmz->flush_wq, &dmz->flush_work, DMZ_FLUSH_PERIOD);
>  
>  	/* Initialize reclaim */
> -	ret = dmz_ctr_reclaim(dmz->metadata, &dmz->reclaim);
> -	if (ret) {
> -		ti->error = "Zone reclaim initialization failed";
> -		goto err_fwq;
> +	for (i = 0; i < dmz->nr_ddevs; i++) {

We probably could start the loop from 1 here when nr_ddevs > 1 to not
initialize anything for the cache device since reclaim is not run from
it.

> +		ret = dmz_ctr_reclaim(dmz->metadata, &dmz->dev[i].reclaim, i);
> +		if (ret) {
> +			ti->error = "Zone reclaim initialization failed";
> +			goto err_fwq;
> +		}
>  	}
>  
>  	DMINFO("(%s): Target device: %llu 512-byte logical sectors (%llu blocks)",
> @@ -961,11 +961,13 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  static void dmz_dtr(struct dm_target *ti)
>  {
>  	struct dmz_target *dmz = ti->private;
> +	int i;
>  
>  	flush_workqueue(dmz->chunk_wq);
>  	destroy_workqueue(dmz->chunk_wq);
>  
> -	dmz_dtr_reclaim(dmz->reclaim);
> +	for (i = 0; i < dmz->nr_ddevs; i++)
> +		dmz_dtr_reclaim(dmz->dev[i].reclaim);
>  
>  	cancel_delayed_work_sync(&dmz->flush_work);
>  	destroy_workqueue(dmz->flush_wq);
> @@ -1034,9 +1036,11 @@ static int dmz_prepare_ioctl(struct dm_target *ti, struct block_device **bdev)
>  static void dmz_suspend(struct dm_target *ti)
>  {
>  	struct dmz_target *dmz = ti->private;
> +	int i;
>  
>  	flush_workqueue(dmz->chunk_wq);
> -	dmz_suspend_reclaim(dmz->reclaim);
> +	for (i = 0; i < dmz->nr_ddevs; i++)
> +		dmz_suspend_reclaim(dmz->dev[i].reclaim);
>  	cancel_delayed_work_sync(&dmz->flush_work);
>  }
>  
> @@ -1046,9 +1050,11 @@ static void dmz_suspend(struct dm_target *ti)
>  static void dmz_resume(struct dm_target *ti)
>  {
>  	struct dmz_target *dmz = ti->private;
> +	int i;
>  
>  	queue_delayed_work(dmz->flush_wq, &dmz->flush_work, DMZ_FLUSH_PERIOD);
> -	dmz_resume_reclaim(dmz->reclaim);
> +	for (i = 0; i < dmz->nr_ddevs; i++)
> +		dmz_resume_reclaim(dmz->dev[i].reclaim);
>  }
>  
>  static int dmz_iterate_devices(struct dm_target *ti,
> @@ -1109,7 +1115,10 @@ static int dmz_message(struct dm_target *ti, unsigned int argc, char **argv,
>  	int r = -EINVAL;
>  
>  	if (!strcasecmp(argv[0], "reclaim")) {
> -		dmz_schedule_reclaim(dmz->reclaim);
> +		int i;
> +
> +		for (i = 0; i < dmz->nr_ddevs; i++)
> +			dmz_schedule_reclaim(dmz->dev[i].reclaim);
>  		r = 0;
>  	} else
>  		DMERR("unrecognized message %s", argv[0]);
> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
> index 983f5b5e9fa0..0cc3459f78ce 100644
> --- a/drivers/md/dm-zoned.h
> +++ b/drivers/md/dm-zoned.h
> @@ -54,6 +54,7 @@ struct dmz_reclaim;
>  struct dmz_dev {
>  	struct block_device	*bdev;
>  	struct dmz_metadata	*metadata;
> +	struct dmz_reclaim	*reclaim;
>  
>  	char			name[BDEVNAME_SIZE];
>  	uuid_t			uuid;
> @@ -229,23 +230,6 @@ static inline void dmz_activate_zone(struct dm_zone *zone)
>  	atomic_inc(&zone->refcount);
>  }
>  
> -/*
> - * Deactivate a zone. This decrement the zone reference counter
> - * indicating that all BIOs to the zone have completed when the count is 0.
> - */
> -static inline void dmz_deactivate_zone(struct dm_zone *zone)
> -{
> -	atomic_dec(&zone->refcount);
> -}
> -
> -/*
> - * Test if a zone is active, that is, has a refcount > 0.
> - */
> -static inline bool dmz_is_active(struct dm_zone *zone)
> -{
> -	return atomic_read(&zone->refcount);
> -}
> -
>  int dmz_lock_zone_reclaim(struct dm_zone *zone);
>  void dmz_unlock_zone_reclaim(struct dm_zone *zone);
>  struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd, bool idle);
> @@ -272,7 +256,7 @@ int dmz_merge_valid_blocks(struct dmz_metadata *zmd, struct dm_zone *from_zone,
>  /*
>   * Functions defined in dm-zoned-reclaim.c
>   */
> -int dmz_ctr_reclaim(struct dmz_metadata *zmd, struct dmz_reclaim **zrc);
> +int dmz_ctr_reclaim(struct dmz_metadata *zmd, struct dmz_reclaim **zrc, int idx);
>  void dmz_dtr_reclaim(struct dmz_reclaim *zrc);
>  void dmz_suspend_reclaim(struct dmz_reclaim *zrc);
>  void dmz_resume_reclaim(struct dmz_reclaim *zrc);
> @@ -285,4 +269,22 @@ void dmz_schedule_reclaim(struct dmz_reclaim *zrc);
>  bool dmz_bdev_is_dying(struct dmz_dev *dmz_dev);
>  bool dmz_check_bdev(struct dmz_dev *dmz_dev);
>  
> +/*
> + * Deactivate a zone. This decrement the zone reference counter
> + * indicating that all BIOs to the zone have completed when the count is 0.
> + */
> +static inline void dmz_deactivate_zone(struct dm_zone *zone)
> +{
> +	dmz_reclaim_bio_acc(zone->dev->reclaim);
> +	atomic_dec(&zone->refcount);
> +}
> +
> +/*
> + * Test if a zone is active, that is, has a refcount > 0.
> + */
> +static inline bool dmz_is_active(struct dm_zone *zone)
> +{
> +	return atomic_read(&zone->refcount);
> +}
> +
>  #endif /* DM_ZONED_H */

Apart from the nit above, looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 11/15] dm-zoned: move random and sequential zones into struct dmz_dev
  2020-05-27  6:22 ` [PATCH 11/15] dm-zoned: move random and sequential zones into struct dmz_dev Hannes Reinecke
@ 2020-05-28  3:25   ` Damien Le Moal
  0 siblings, 0 replies; 32+ messages in thread
From: Damien Le Moal @ 2020-05-28  3:25 UTC (permalink / raw)
  To: snitzer, hare; +Cc: dm-devel

On Wed, 2020-05-27 at 08:22 +0200, Hannes Reinecke wrote:
> Random and sequential zones should be part of the respective
> device structure to make arbitration between devices possible.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 139 +++++++++++++++++++++++------------------
>  drivers/md/dm-zoned-reclaim.c  |  15 +++--
>  drivers/md/dm-zoned-target.c   |  25 ++++++--
>  drivers/md/dm-zoned.h          |  18 ++++--
>  4 files changed, 119 insertions(+), 78 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 445760730d10..f309219a5457 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -192,21 +192,12 @@ struct dmz_metadata {
>  	/* Zone allocation management */
>  	struct mutex		map_lock;
>  	struct dmz_mblock	**map_mblk;
> -	unsigned int		nr_rnd;
> -	atomic_t		unmap_nr_rnd;
> -	struct list_head	unmap_rnd_list;
> -	struct list_head	map_rnd_list;
>  
>  	unsigned int		nr_cache;
>  	atomic_t		unmap_nr_cache;
>  	struct list_head	unmap_cache_list;
>  	struct list_head	map_cache_list;
>  
> -	unsigned int		nr_seq;
> -	atomic_t		unmap_nr_seq;
> -	struct list_head	unmap_seq_list;
> -	struct list_head	map_seq_list;
> -
>  	atomic_t		nr_reserved_seq_zones;
>  	struct list_head	reserved_seq_zones_list;
>  
> @@ -279,14 +270,14 @@ unsigned int dmz_nr_chunks(struct dmz_metadata *zmd)
>  	return zmd->nr_chunks;
>  }
>  
> -unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd)
> +unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd, int idx)
>  {
> -	return zmd->nr_rnd;
> +	return zmd->dev[idx].nr_rnd;
>  }
>  
> -unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd)
> +unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd, int idx)
>  {
> -	return atomic_read(&zmd->unmap_nr_rnd);
> +	return atomic_read(&zmd->dev[idx].unmap_nr_rnd);
>  }
>  
>  unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd)
> @@ -299,14 +290,14 @@ unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd)
>  	return atomic_read(&zmd->unmap_nr_cache);
>  }
>  
> -unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd)
> +unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd, int idx)
>  {
> -	return zmd->nr_seq;
> +	return zmd->dev[idx].nr_seq;
>  }
>  
> -unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd)
> +unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd, int idx)
>  {
> -	return atomic_read(&zmd->unmap_nr_seq);
> +	return atomic_read(&zmd->dev[idx].unmap_nr_seq);
>  }
>  
>  static struct dm_zone *dmz_get(struct dmz_metadata *zmd, unsigned int zone_id)
> @@ -1495,6 +1486,14 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
>  
>  		dev->metadata = zmd;
>  		zmd->nr_zones += dev->nr_zones;
> +
> +		atomic_set(&dev->unmap_nr_rnd, 0);
> +		INIT_LIST_HEAD(&dev->unmap_rnd_list);
> +		INIT_LIST_HEAD(&dev->map_rnd_list);
> +
> +		atomic_set(&dev->unmap_nr_seq, 0);
> +		INIT_LIST_HEAD(&dev->unmap_seq_list);
> +		INIT_LIST_HEAD(&dev->map_seq_list);
>  	}
>  
>  	if (!zmd->nr_zones) {
> @@ -1715,9 +1714,9 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  		if (dmz_is_cache(dzone))
>  			list_add_tail(&dzone->link, &zmd->map_cache_list);
>  		else if (dmz_is_rnd(dzone))
> -			list_add_tail(&dzone->link, &zmd->map_rnd_list);
> +			list_add_tail(&dzone->link, &dzone->dev->map_rnd_list);
>  		else
> -			list_add_tail(&dzone->link, &zmd->map_seq_list);
> +			list_add_tail(&dzone->link, &dzone->dev->map_seq_list);
>  
>  		/* Check buffer zone */
>  		bzone_id = le32_to_cpu(dmap[e].bzone_id);
> @@ -1751,7 +1750,7 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  		if (dmz_is_cache(bzone))
>  			list_add_tail(&bzone->link, &zmd->map_cache_list);
>  		else
> -			list_add_tail(&bzone->link, &zmd->map_rnd_list);
> +			list_add_tail(&bzone->link, &bzone->dev->map_rnd_list);
>  next:
>  		chunk++;
>  		e++;
> @@ -1776,9 +1775,9 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  		if (dmz_is_cache(dzone))
>  			zmd->nr_cache++;
>  		else if (dmz_is_rnd(dzone))
> -			zmd->nr_rnd++;
> +			dzone->dev->nr_rnd++;
>  		else
> -			zmd->nr_seq++;
> +			dzone->dev->nr_seq++;
>  
>  		if (dmz_is_data(dzone)) {
>  			/* Already initialized */
> @@ -1792,16 +1791,18 @@ static int dmz_load_mapping(struct dmz_metadata *zmd)
>  			list_add_tail(&dzone->link, &zmd->unmap_cache_list);
>  			atomic_inc(&zmd->unmap_nr_cache);
>  		} else if (dmz_is_rnd(dzone)) {
> -			list_add_tail(&dzone->link, &zmd->unmap_rnd_list);
> -			atomic_inc(&zmd->unmap_nr_rnd);
> +			list_add_tail(&dzone->link,
> +				      &dzone->dev->unmap_rnd_list);
> +			atomic_inc(&dzone->dev->unmap_nr_rnd);
>  		} else if (atomic_read(&zmd->nr_reserved_seq_zones) < zmd->nr_reserved_seq) {
>  			list_add_tail(&dzone->link, &zmd->reserved_seq_zones_list);
>  			set_bit(DMZ_RESERVED, &dzone->flags);
>  			atomic_inc(&zmd->nr_reserved_seq_zones);
> -			zmd->nr_seq--;
> +			dzone->dev->nr_seq--;
>  		} else {
> -			list_add_tail(&dzone->link, &zmd->unmap_seq_list);
> -			atomic_inc(&zmd->unmap_nr_seq);
> +			list_add_tail(&dzone->link,
> +				      &dzone->dev->unmap_seq_list);
> +			atomic_inc(&dzone->dev->unmap_nr_seq);
>  		}
>  	}
>  
> @@ -1835,13 +1836,13 @@ static void __dmz_lru_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>  	list_del_init(&zone->link);
>  	if (dmz_is_seq(zone)) {
>  		/* LRU rotate sequential zone */
> -		list_add_tail(&zone->link, &zmd->map_seq_list);
> +		list_add_tail(&zone->link, &zone->dev->map_seq_list);
>  	} else if (dmz_is_cache(zone)) {
>  		/* LRU rotate cache zone */
>  		list_add_tail(&zone->link, &zmd->map_cache_list);
>  	} else {
>  		/* LRU rotate random zone */
> -		list_add_tail(&zone->link, &zmd->map_rnd_list);
> +		list_add_tail(&zone->link, &zone->dev->map_rnd_list);
>  	}
>  }
>  
> @@ -1923,14 +1924,24 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
>  {
>  	struct dm_zone *dzone = NULL;
>  	struct dm_zone *zone;
> -	struct list_head *zone_list = &zmd->map_rnd_list;
> +	struct list_head *zone_list;
>  
>  	/* If we have cache zones select from the cache zone list */
>  	if (zmd->nr_cache) {
>  		zone_list = &zmd->map_cache_list;
>  		/* Try to relaim random zones, too, when idle */
> -		if (idle && list_empty(zone_list))
> -			zone_list = &zmd->map_rnd_list;
> +		if (idle && list_empty(zone_list)) {
> +			int i;
> +
> +			for (i = 1; i < zmd->nr_devs; i++) {
> +				zone_list = &zmd->dev[i].map_rnd_list;
> +				if (!list_empty(zone_list))
> +					break;
> +			}
> +		}
> +	} else {
> +		/* Otherwise the random zones are on the first disk */
> +		zone_list = &zmd->dev[0].map_rnd_list;
>  	}
>  
>  	list_for_each_entry(zone, zone_list, link) {
> @@ -1951,12 +1962,17 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
>  static struct dm_zone *dmz_get_seq_zone_for_reclaim(struct dmz_metadata *zmd)
>  {
>  	struct dm_zone *zone;
> +	int i;
>  
> -	list_for_each_entry(zone, &zmd->map_seq_list, link) {
> -		if (!zone->bzone)
> -			continue;
> -		if (dmz_lock_zone_reclaim(zone))
> -			return zone;
> +	for (i = 0; i < zmd->nr_devs; i++) {
> +		struct dmz_dev *dev = &zmd->dev[i];
> +
> +		list_for_each_entry(zone, &dev->map_seq_list, link) {
> +			if (!zone->bzone)
> +				continue;
> +			if (dmz_lock_zone_reclaim(zone))
> +				return zone;
> +		}
>  	}
>  
>  	return NULL;
> @@ -2142,7 +2158,7 @@ struct dm_zone *dmz_get_chunk_buffer(struct dmz_metadata *zmd,
>  	if (dmz_is_cache(bzone))
>  		list_add_tail(&bzone->link, &zmd->map_cache_list);
>  	else
> -		list_add_tail(&bzone->link, &zmd->map_rnd_list);
> +		list_add_tail(&bzone->link, &bzone->dev->map_rnd_list);
>  out:
>  	dmz_unlock_map(zmd);
>  
> @@ -2157,21 +2173,27 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>  {
>  	struct list_head *list;
>  	struct dm_zone *zone;
> +	unsigned int dev_idx = 0;
>  
> +again:
>  	if (flags & DMZ_ALLOC_CACHE)
>  		list = &zmd->unmap_cache_list;
>  	else if (flags & DMZ_ALLOC_RND)
> -		list = &zmd->unmap_rnd_list;
> +		list = &zmd->dev[dev_idx].unmap_rnd_list;
>  	else
> -		list = &zmd->unmap_seq_list;
> +		list = &zmd->dev[dev_idx].unmap_seq_list;
>  
> -again:
>  	if (list_empty(list)) {
>  		/*
>  		 * No free zone: return NULL if this is for not reclaim.
>  		 */
>  		if (!(flags & DMZ_ALLOC_RECLAIM))
>  			return NULL;
> +		if (dev_idx < zmd->nr_devs) {
> +			dev_idx++;
> +			goto again;
> +		}
> +
>  		/*
>  		 * Fallback to the reserved sequential zones
>  		 */
> @@ -2190,9 +2212,9 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>  	if (dmz_is_cache(zone))
>  		atomic_dec(&zmd->unmap_nr_cache);
>  	else if (dmz_is_rnd(zone))
> -		atomic_dec(&zmd->unmap_nr_rnd);
> +		atomic_dec(&zone->dev->unmap_nr_rnd);
>  	else
> -		atomic_dec(&zmd->unmap_nr_seq);
> +		atomic_dec(&zone->dev->unmap_nr_seq);
>  
>  	if (dmz_is_offline(zone)) {
>  		dmz_zmd_warn(zmd, "Zone %u is offline", zone->id);
> @@ -2222,14 +2244,14 @@ void dmz_free_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
>  		list_add_tail(&zone->link, &zmd->unmap_cache_list);
>  		atomic_inc(&zmd->unmap_nr_cache);
>  	} else if (dmz_is_rnd(zone)) {
> -		list_add_tail(&zone->link, &zmd->unmap_rnd_list);
> -		atomic_inc(&zmd->unmap_nr_rnd);
> +		list_add_tail(&zone->link, &zone->dev->unmap_rnd_list);
> +		atomic_inc(&zone->dev->unmap_nr_rnd);
>  	} else if (dmz_is_reserved(zone)) {
>  		list_add_tail(&zone->link, &zmd->reserved_seq_zones_list);
>  		atomic_inc(&zmd->nr_reserved_seq_zones);
>  	} else {
> -		list_add_tail(&zone->link, &zmd->unmap_seq_list);
> -		atomic_inc(&zmd->unmap_nr_seq);
> +		list_add_tail(&zone->link, &zone->dev->unmap_seq_list);
> +		atomic_inc(&zone->dev->unmap_nr_seq);
>  	}
>  
>  	wake_up_all(&zmd->free_wq);
> @@ -2249,9 +2271,9 @@ void dmz_map_zone(struct dmz_metadata *zmd, struct dm_zone *dzone,
>  	if (dmz_is_cache(dzone))
>  		list_add_tail(&dzone->link, &zmd->map_cache_list);
>  	else if (dmz_is_rnd(dzone))
> -		list_add_tail(&dzone->link, &zmd->map_rnd_list);
> +		list_add_tail(&dzone->link, &dzone->dev->map_rnd_list);
>  	else
> -		list_add_tail(&dzone->link, &zmd->map_seq_list);
> +		list_add_tail(&dzone->link, &dzone->dev->map_seq_list);
>  }
>  
>  /*
> @@ -2819,18 +2841,11 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>  	INIT_LIST_HEAD(&zmd->mblk_dirty_list);
>  
>  	mutex_init(&zmd->map_lock);
> -	atomic_set(&zmd->unmap_nr_rnd, 0);
> -	INIT_LIST_HEAD(&zmd->unmap_rnd_list);
> -	INIT_LIST_HEAD(&zmd->map_rnd_list);
>  
>  	atomic_set(&zmd->unmap_nr_cache, 0);
>  	INIT_LIST_HEAD(&zmd->unmap_cache_list);
>  	INIT_LIST_HEAD(&zmd->map_cache_list);
>  
> -	atomic_set(&zmd->unmap_nr_seq, 0);
> -	INIT_LIST_HEAD(&zmd->unmap_seq_list);
> -	INIT_LIST_HEAD(&zmd->map_seq_list);
> -
>  	atomic_set(&zmd->nr_reserved_seq_zones, 0);
>  	INIT_LIST_HEAD(&zmd->reserved_seq_zones_list);
>  
> @@ -2899,10 +2914,14 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>  		      zmd->nr_data_zones, zmd->nr_chunks);
>  	dmz_zmd_debug(zmd, "    %u cache zones (%u unmapped)",
>  		      zmd->nr_cache, atomic_read(&zmd->unmap_nr_cache));
> -	dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
> -		      zmd->nr_rnd, atomic_read(&zmd->unmap_nr_rnd));
> -	dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
> -		      zmd->nr_seq, atomic_read(&zmd->unmap_nr_seq));
> +	for (i = 0; i < zmd->nr_devs; i++) {
> +		dmz_zmd_debug(zmd, "    %u random zones (%u unmapped)",
> +			      dmz_nr_rnd_zones(zmd, i),
> +			      dmz_nr_unmap_rnd_zones(zmd, i));
> +		dmz_zmd_debug(zmd, "    %u sequential zones (%u unmapped)",
> +			      dmz_nr_seq_zones(zmd, i),
> +			      dmz_nr_unmap_seq_zones(zmd, i));
> +	}
>  	dmz_zmd_debug(zmd, "  %u reserved sequential data zones",
>  		      zmd->nr_reserved_seq);
>  	dmz_zmd_debug(zmd, "Format:");
> diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
> index 09843645248a..18edf1b9bf52 100644
> --- a/drivers/md/dm-zoned-reclaim.c
> +++ b/drivers/md/dm-zoned-reclaim.c
> @@ -447,15 +447,14 @@ static unsigned int dmz_reclaim_percentage(struct dmz_reclaim *zrc)
>  {
>  	struct dmz_metadata *zmd = zrc->metadata;
>  	unsigned int nr_cache = dmz_nr_cache_zones(zmd);
> -	unsigned int nr_rnd = dmz_nr_rnd_zones(zmd);
>  	unsigned int nr_unmap, nr_zones;
>  
>  	if (nr_cache) {
>  		nr_zones = nr_cache;
>  		nr_unmap = dmz_nr_unmap_cache_zones(zmd);
>  	} else {
> -		nr_zones = nr_rnd;
> -		nr_unmap = dmz_nr_unmap_rnd_zones(zmd);
> +		nr_zones = dmz_nr_rnd_zones(zmd, zrc->dev_idx);
> +		nr_unmap = dmz_nr_unmap_rnd_zones(zmd, zrc->dev_idx);
>  	}
>  	return nr_unmap * 100 / nr_zones;
>  }
> @@ -467,7 +466,7 @@ static bool dmz_should_reclaim(struct dmz_reclaim *zrc, unsigned int p_unmap)
>  {
>  	unsigned int nr_reclaim;
>  
> -	nr_reclaim = dmz_nr_rnd_zones(zrc->metadata);
> +	nr_reclaim = dmz_nr_rnd_zones(zrc->metadata, zrc->dev_idx);
>  
>  	if (dmz_nr_cache_zones(zrc->metadata)) {
>  		/*
> @@ -528,8 +527,8 @@ static void dmz_reclaim_work(struct work_struct *work)
>  		zrc->kc_throttle.throttle = min(75U, 100U - p_unmap / 2);
>  	}
>  
> -	nr_unmap_rnd = dmz_nr_unmap_rnd_zones(zmd);
> -	nr_rnd = dmz_nr_rnd_zones(zmd);
> +	nr_unmap_rnd = dmz_nr_unmap_rnd_zones(zmd, zrc->dev_idx);
> +	nr_rnd = dmz_nr_rnd_zones(zmd, zrc->dev_idx);
>  
>  	DMDEBUG("(%s/%u): Reclaim (%u): %s, %u%% free zones (%u/%u cache %u/%u random)",
>  		dmz_metadata_label(zmd), zrc->dev_idx,
> @@ -537,8 +536,8 @@ static void dmz_reclaim_work(struct work_struct *work)
>  		(dmz_target_idle(zrc) ? "Idle" : "Busy"),
>  		p_unmap, dmz_nr_unmap_cache_zones(zmd),
>  		dmz_nr_cache_zones(zmd),
> -		dmz_nr_unmap_rnd_zones(zmd),
> -		dmz_nr_rnd_zones(zmd));
> +		dmz_nr_unmap_rnd_zones(zmd, zrc->dev_idx),
> +		dmz_nr_rnd_zones(zmd, zrc->dev_idx));
>  
>  	ret = dmz_do_reclaim(zrc);
>  	if (ret && ret != -EINTR) {
> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
> index fc1df9714f63..f6f00a363903 100644
> --- a/drivers/md/dm-zoned-target.c
> +++ b/drivers/md/dm-zoned-target.c
> @@ -1082,17 +1082,30 @@ static void dmz_status(struct dm_target *ti, status_type_t type,
>  	ssize_t sz = 0;
>  	char buf[BDEVNAME_SIZE];
>  	struct dmz_dev *dev;
> +	int i;
>  
>  	switch (type) {
>  	case STATUSTYPE_INFO:
> -		DMEMIT("%u zones %u/%u cache %u/%u random %u/%u sequential",
> +		DMEMIT("%u zones %u/%u cache",
>  		       dmz_nr_zones(dmz->metadata),
>  		       dmz_nr_unmap_cache_zones(dmz->metadata),
> -		       dmz_nr_cache_zones(dmz->metadata),
> -		       dmz_nr_unmap_rnd_zones(dmz->metadata),
> -		       dmz_nr_rnd_zones(dmz->metadata),
> -		       dmz_nr_unmap_seq_zones(dmz->metadata),
> -		       dmz_nr_seq_zones(dmz->metadata));
> +		       dmz_nr_cache_zones(dmz->metadata));
> +		for (i = 0; i < DMZ_MAX_DEVS; i++) {

i < zmd->nr_devs ? Since now only what is needed is allocated.

> +			if (!dmz->ddev[i])
> +				continue;
> +			/*
> +			 * For a multi-device setup the first device
> +			 * contains only cache zones.
> +			 */
> +			if ((i == 0) &&
> +			    (dmz_nr_cache_zones(dmz->metadata) > 0))
> +				continue;
> +			DMEMIT(" %u/%u random %u/%u sequential",
> +			       dmz_nr_unmap_rnd_zones(dmz->metadata, i),
> +			       dmz_nr_rnd_zones(dmz->metadata, i),
> +			       dmz_nr_unmap_seq_zones(dmz->metadata, i),
> +			       dmz_nr_seq_zones(dmz->metadata, i));
> +		}
>  		break;
>  	case STATUSTYPE_TABLE:
>  		dev = &dmz->dev[0];
> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
> index 0cc3459f78ce..f2a760f62db5 100644
> --- a/drivers/md/dm-zoned.h
> +++ b/drivers/md/dm-zoned.h
> @@ -67,6 +67,16 @@ struct dmz_dev {
>  	unsigned int		flags;
>  
>  	sector_t		zone_nr_sectors;
> +
> +	unsigned int		nr_rnd;
> +	atomic_t		unmap_nr_rnd;
> +	struct list_head	unmap_rnd_list;
> +	struct list_head	map_rnd_list;
> +
> +	unsigned int		nr_seq;
> +	atomic_t		unmap_nr_seq;
> +	struct list_head	unmap_seq_list;
> +	struct list_head	map_seq_list;
>  };
>  
>  #define dmz_bio_chunk(zmd, bio)	((bio)->bi_iter.bi_sector >> \
> @@ -213,10 +223,10 @@ void dmz_unmap_zone(struct dmz_metadata *zmd, struct dm_zone *zone);
>  unsigned int dmz_nr_zones(struct dmz_metadata *zmd);
>  unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd);
>  unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd);
> -unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd);
> -unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd);
> -unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd);
> -unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd);
> +unsigned int dmz_nr_rnd_zones(struct dmz_metadata *zmd, int idx);
> +unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd, int idx);
> +unsigned int dmz_nr_seq_zones(struct dmz_metadata *zmd, int idx);
> +unsigned int dmz_nr_unmap_seq_zones(struct dmz_metadata *zmd, int idx);
>  unsigned int dmz_zone_nr_blocks(struct dmz_metadata *zmd);
>  unsigned int dmz_zone_nr_blocks_shift(struct dmz_metadata *zmd);
>  unsigned int dmz_zone_nr_sectors(struct dmz_metadata *zmd);

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 12/15] dm-zoned: support arbitrary number of devices
  2020-05-27  6:22 ` [PATCH 12/15] dm-zoned: support arbitrary number of devices Hannes Reinecke
@ 2020-05-28  4:04   ` Damien Le Moal
  0 siblings, 0 replies; 32+ messages in thread
From: Damien Le Moal @ 2020-05-28  4:04 UTC (permalink / raw)
  To: snitzer, hare; +Cc: dm-devel

On Wed, 2020-05-27 at 08:22 +0200, Hannes Reinecke wrote:
> Remove the hard-coded limit of two devices and support an unlimited
> number of additional zoned devices.
> With that we need to increase the device-mapper version number to
> 3.0.0 as we've modified the interface.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 15 +++++++-
>  drivers/md/dm-zoned-target.c   | 81 +++++++++++++++++++++++-------------------
>  2 files changed, 59 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index f309219a5457..689c1dd7ab20 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -1520,7 +1520,20 @@ static int dmz_init_zones(struct dmz_metadata *zmd)
>  		 */
>  		zmd->sb[0].zone = dmz_get(zmd, 0);
>  
> -		zoned_dev = &zmd->dev[1];
> +		for (i = 1; i < zmd->nr_devs; i++) {
> +			zoned_dev = &zmd->dev[i];
> +
> +			ret = blkdev_report_zones(zoned_dev->bdev, 0,
> +						  BLK_ALL_ZONES,
> +						  dmz_init_zone, zoned_dev);
> +			if (ret < 0) {
> +				DMDEBUG("(%s): Failed to report zones, error %d",
> +					zmd->devname, ret);
> +				dmz_drop_zones(zmd);
> +				return ret;
> +			}
> +		}
> +		return 0;
>  	}
>  
>  	/*
> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
> index f6f00a363903..4a51738d4b0d 100644
> --- a/drivers/md/dm-zoned-target.c
> +++ b/drivers/md/dm-zoned-target.c
> @@ -13,8 +13,6 @@
>  
>  #define DMZ_MIN_BIOS		8192
>  
> -#define DMZ_MAX_DEVS		2

This should probably be done in patch 9, and that will naturally fix
the forgotten out-of-bound loop access in patch 11.

Or just merge this patch with patch 9 ?

> -
>  /*
>   * Zone BIO context.
>   */
> @@ -764,7 +762,7 @@ static void dmz_put_zoned_device(struct dm_target *ti)
>  	struct dmz_target *dmz = ti->private;
>  	int i;
>  
> -	for (i = 0; i < DMZ_MAX_DEVS; i++) {
> +	for (i = 0; i < dmz->nr_ddevs; i++) {
>  		if (dmz->ddev[i]) {
>  			dm_put_device(ti, dmz->ddev[i]);
>  			dmz->ddev[i] = NULL;
> @@ -777,21 +775,35 @@ static int dmz_fixup_devices(struct dm_target *ti)
>  	struct dmz_target *dmz = ti->private;
>  	struct dmz_dev *reg_dev, *zoned_dev;
>  	struct request_queue *q;
> +	sector_t zone_nr_sectors = 0;
> +	int i;
>  
>  	/*
> -	 * When we have two devices, the first one must be a regular block
> -	 * device and the second a zoned block device.
> +	 * When we have more than on devices, the first one must be a
> +	 * regular block device and the others zoned block devices.
>  	 */
> -	if (dmz->ddev[0] && dmz->ddev[1]) {
> +	if (dmz->nr_ddevs > 1) {
>  		reg_dev = &dmz->dev[0];
>  		if (!(reg_dev->flags & DMZ_BDEV_REGULAR)) {
>  			ti->error = "Primary disk is not a regular device";
>  			return -EINVAL;
>  		}
> -		zoned_dev = &dmz->dev[1];
> -		if (zoned_dev->flags & DMZ_BDEV_REGULAR) {
> -			ti->error = "Secondary disk is not a zoned device";
> -			return -EINVAL;
> +		for (i = 1; i < dmz->nr_ddevs; i++) {
> +			zoned_dev = &dmz->dev[i];
> +			if (zoned_dev->flags & DMZ_BDEV_REGULAR) {
> +				ti->error = "Secondary disk is not a zoned device";
> +				return -EINVAL;
> +			}
> +			q = bdev_get_queue(zoned_dev->bdev);
> +			if (zone_nr_sectors &&
> +			    zone_nr_sectors != blk_queue_zone_sectors(q)) {
> +				ti->error = "Zone nr sectors mismatch";
> +				return -EINVAL;
> +			}
> +			zone_nr_sectors = blk_queue_zone_sectors(q);
> +			zoned_dev->zone_nr_sectors = zone_nr_sectors;
> +			zoned_dev->nr_zones =
> +				blkdev_nr_zones(zoned_dev->bdev->bd_disk);
>  		}
>  	} else {
>  		reg_dev = NULL;
> @@ -800,17 +812,24 @@ static int dmz_fixup_devices(struct dm_target *ti)
>  			ti->error = "Disk is not a zoned device";
>  			return -EINVAL;
>  		}
> +		q = bdev_get_queue(zoned_dev->bdev);
> +		zoned_dev->zone_nr_sectors = blk_queue_zone_sectors(q);
> +		zoned_dev->nr_zones = blkdev_nr_zones(zoned_dev->bdev->bd_disk);
>  	}
> -	q = bdev_get_queue(zoned_dev->bdev);
> -	zoned_dev->zone_nr_sectors = blk_queue_zone_sectors(q);
> -	zoned_dev->nr_zones = blkdev_nr_zones(zoned_dev->bdev->bd_disk);
>  
>  	if (reg_dev) {
> -		reg_dev->zone_nr_sectors = zoned_dev->zone_nr_sectors;
> +		sector_t zone_offset;
> +
> +		reg_dev->zone_nr_sectors = zone_nr_sectors;
>  		reg_dev->nr_zones =
>  			DIV_ROUND_UP_SECTOR_T(reg_dev->capacity,
>  					      reg_dev->zone_nr_sectors);
> -		zoned_dev->zone_offset = reg_dev->nr_zones;
> +		reg_dev->zone_offset = 0;
> +		zone_offset = reg_dev->nr_zones;
> +		for (i = 1; i < dmz->nr_ddevs; i++) {
> +			dmz->dev[i].zone_offset = zone_offset;
> +			zone_offset += dmz->dev[i].nr_zones;
> +		}
>  	}
>  	return 0;
>  }
> @@ -824,7 +843,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	int ret, i;
>  
>  	/* Check arguments */
> -	if (argc < 1 || argc > 2) {
> +	if (argc < 1) {
>  		ti->error = "Invalid argument count";
>  		return -EINVAL;
>  	}
> @@ -852,22 +871,14 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	ti->private = dmz;
>  
>  	/* Get the target zoned block device */
> -	ret = dmz_get_zoned_device(ti, argv[0], 0, argc);
> -	if (ret)
> -		goto err;
> -
> -	if (argc == 2) {
> -		ret = dmz_get_zoned_device(ti, argv[1], 1, argc);
> -		if (ret) {
> -			dmz_put_zoned_device(ti);
> -			goto err;
> -		}
> +	for (i = 0; i < argc; i++) {
> +		ret = dmz_get_zoned_device(ti, argv[i], i, argc);
> +		if (ret)
> +			goto err_dev;
>  	}
>  	ret = dmz_fixup_devices(ti);
> -	if (ret) {
> -		dmz_put_zoned_device(ti);
> -		goto err;
> -	}
> +	if (ret)
> +		goto err_dev;
>  
>  	/* Initialize metadata */
>  	ret = dmz_ctr_metadata(dmz->dev, argc, &dmz->metadata,
> @@ -1090,9 +1101,7 @@ static void dmz_status(struct dm_target *ti, status_type_t type,
>  		       dmz_nr_zones(dmz->metadata),
>  		       dmz_nr_unmap_cache_zones(dmz->metadata),
>  		       dmz_nr_cache_zones(dmz->metadata));
> -		for (i = 0; i < DMZ_MAX_DEVS; i++) {
> -			if (!dmz->ddev[i])
> -				continue;
> +		for (i = 0; i < dmz->nr_ddevs; i++) {
>  			/*
>  			 * For a multi-device setup the first device
>  			 * contains only cache zones.
> @@ -1111,8 +1120,8 @@ static void dmz_status(struct dm_target *ti, status_type_t type,
>  		dev = &dmz->dev[0];
>  		format_dev_t(buf, dev->bdev->bd_dev);
>  		DMEMIT("%s", buf);
> -		if (dmz->nr_ddevs > 1) {
> -			dev = &dmz->dev[1];
> +		for (i = 1; i < dmz->nr_ddevs; i++) {
> +			dev = &dmz->dev[i];
>  			format_dev_t(buf, dev->bdev->bd_dev);
>  			DMEMIT(" %s", buf);
>  		}
> @@ -1140,7 +1149,7 @@ static int dmz_message(struct dm_target *ti, unsigned int argc, char **argv,
>  
>  static struct target_type dmz_type = {
>  	.name		 = "zoned",
> -	.version	 = {2, 0, 0},
> +	.version	 = {3, 0, 0},
>  	.features	 = DM_TARGET_SINGLETON | DM_TARGET_ZONED_HM,
>  	.module		 = THIS_MODULE,
>  	.ctr		 = dmz_ctr,

Apart from the nit above, looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 13/15] dm-zoned: allocate zone by device index
  2020-05-27  6:22 ` [PATCH 13/15] dm-zoned: allocate zone by device index Hannes Reinecke
@ 2020-05-28  4:08   ` Damien Le Moal
  2020-05-29 15:48     ` Hannes Reinecke
  0 siblings, 1 reply; 32+ messages in thread
From: Damien Le Moal @ 2020-05-28  4:08 UTC (permalink / raw)
  To: snitzer, hare; +Cc: dm-devel

On Wed, 2020-05-27 at 08:22 +0200, Hannes Reinecke wrote:
> When allocating a zone we should pass in an indicator on which
> device the zone should be allocated; this increases performance
> for a multi-device setup as then reclaim can allocate zones on
> the device for which reclaim is running.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 17 +++++++++++------
>  drivers/md/dm-zoned-reclaim.c  |  3 ++-
>  drivers/md/dm-zoned.h          |  3 ++-
>  3 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 689c1dd7ab20..0d65af94309a 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -2045,7 +2045,7 @@ struct dm_zone *dmz_get_chunk_mapping(struct dmz_metadata *zmd, unsigned int chu
>  			goto out;
>  
>  		/* Allocate a random zone */
> -		dzone = dmz_alloc_zone(zmd, alloc_flags);
> +		dzone = dmz_alloc_zone(zmd, 0, alloc_flags);
>  		if (!dzone) {
>  			if (dmz_dev_is_dying(zmd)) {
>  				dzone = ERR_PTR(-EIO);
> @@ -2151,7 +2151,7 @@ struct dm_zone *dmz_get_chunk_buffer(struct dmz_metadata *zmd,
>  		goto out;
>  
>  	/* Allocate a random zone */
> -	bzone = dmz_alloc_zone(zmd, alloc_flags);
> +	bzone = dmz_alloc_zone(zmd, 0, alloc_flags);
>  	if (!bzone) {
>  		if (dmz_dev_is_dying(zmd)) {
>  			bzone = ERR_PTR(-EIO);
> @@ -2182,11 +2182,12 @@ struct dm_zone *dmz_get_chunk_buffer(struct dmz_metadata *zmd,
>   * Get an unmapped (free) zone.
>   * This must be called with the mapping lock held.
>   */
> -struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
> +struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned int dev_idx,
> +			       unsigned long flags)
>  {
>  	struct list_head *list;
>  	struct dm_zone *zone;
> -	unsigned int dev_idx = 0;
> +	int i = 0;
>  
>  again:
>  	if (flags & DMZ_ALLOC_CACHE)
> @@ -2202,8 +2203,12 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>  		 */
>  		if (!(flags & DMZ_ALLOC_RECLAIM))
>  			return NULL;
> -		if (dev_idx < zmd->nr_devs) {
> -			dev_idx++;
> +		/*
> +		 * Try to allocate from other devices
> +		 */
> +		if (i < zmd->nr_devs) {
> +			dev_idx = (dev_idx + 1) % zmd->nr_devs;

Hu ? You deleted dev_idx declaration above...

> +			i++;
>  			goto again;
>  		}
>  
> diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
> index 18edf1b9bf52..5a04e34d17a9 100644
> --- a/drivers/md/dm-zoned-reclaim.c
> +++ b/drivers/md/dm-zoned-reclaim.c
> @@ -288,7 +288,8 @@ static int dmz_reclaim_rnd_data(struct dmz_reclaim *zrc, struct dm_zone *dzone)
>  	/* Get a free random or sequential zone */
>  	dmz_lock_map(zmd);
>  again:
> -	szone = dmz_alloc_zone(zmd, alloc_flags | DMZ_ALLOC_RECLAIM);
> +	szone = dmz_alloc_zone(zmd, zrc->dev_idx,
> +			       alloc_flags | DMZ_ALLOC_RECLAIM);
>  	if (!szone && alloc_flags == DMZ_ALLOC_SEQ && dmz_nr_cache_zones(zmd)) {
>  		alloc_flags = DMZ_ALLOC_RND;
>  		goto again;
> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
> index f2a760f62db5..ec020bb1caf7 100644
> --- a/drivers/md/dm-zoned.h
> +++ b/drivers/md/dm-zoned.h
> @@ -214,7 +214,8 @@ bool dmz_dev_is_dying(struct dmz_metadata *zmd);
>  #define DMZ_ALLOC_SEQ		0x04
>  #define DMZ_ALLOC_RECLAIM	0x10
>  
> -struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags);
> +struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd,
> +			       unsigned int dev_idx, unsigned long flags);
>  void dmz_free_zone(struct dmz_metadata *zmd, struct dm_zone *zone);
>  
>  void dmz_map_zone(struct dmz_metadata *zmd, struct dm_zone *zone,

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 14/15] dm-zoned: select reclaim zone based on device index
  2020-05-27  6:22 ` [PATCH 14/15] dm-zoned: select reclaim zone based on " Hannes Reinecke
@ 2020-05-28  4:11   ` Damien Le Moal
  0 siblings, 0 replies; 32+ messages in thread
From: Damien Le Moal @ 2020-05-28  4:11 UTC (permalink / raw)
  To: snitzer, hare; +Cc: dm-devel

On Wed, 2020-05-27 at 08:22 +0200, Hannes Reinecke wrote:
> per-device reclaim should select zones on that device only.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 50 +++++++++++++++++-------------------------
>  drivers/md/dm-zoned-reclaim.c  |  3 ++-
>  drivers/md/dm-zoned-target.c   |  1 +
>  drivers/md/dm-zoned.h          |  5 ++++-
>  4 files changed, 27 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 0d65af94309a..b89b3d3b9ec9 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -1933,7 +1933,7 @@ static void dmz_wait_for_reclaim(struct dmz_metadata *zmd, struct dm_zone *zone)
>   * Select a cache or random write zone for reclaim.
>   */
>  static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
> -						    bool idle)
> +						    unsigned int idx, bool idle)
>  {
>  	struct dm_zone *dzone = NULL;
>  	struct dm_zone *zone;
> @@ -1943,24 +1943,17 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
>  	if (zmd->nr_cache) {
>  		zone_list = &zmd->map_cache_list;
>  		/* Try to relaim random zones, too, when idle */
> -		if (idle && list_empty(zone_list)) {
> -			int i;
> -
> -			for (i = 1; i < zmd->nr_devs; i++) {
> -				zone_list = &zmd->dev[i].map_rnd_list;
> -				if (!list_empty(zone_list))
> -					break;
> -			}
> -		}
> -	} else {
> -		/* Otherwise the random zones are on the first disk */
> -		zone_list = &zmd->dev[0].map_rnd_list;
> -	}
> +		if (idle && list_empty(zone_list))
> +			zone_list = &zmd->dev[idx].map_rnd_list;
> +	} else
> +		zone_list = &zmd->dev[idx].map_rnd_list;
>  
>  	list_for_each_entry(zone, zone_list, link) {
> -		if (dmz_is_buf(zone))
> +		if (dmz_is_buf(zone)) {
>  			dzone = zone->bzone;
> -		else
> +			if (dzone->dev->dev_idx != idx)
> +				continue;
> +		} else
>  			dzone = zone;
>  		if (dmz_lock_zone_reclaim(dzone))
>  			return dzone;
> @@ -1972,20 +1965,16 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
>  /*
>   * Select a buffered sequential zone for reclaim.
>   */
> -static struct dm_zone *dmz_get_seq_zone_for_reclaim(struct dmz_metadata *zmd)
> +static struct dm_zone *dmz_get_seq_zone_for_reclaim(struct dmz_metadata *zmd,
> +						    unsigned int idx)
>  {
>  	struct dm_zone *zone;
> -	int i;
> -
> -	for (i = 0; i < zmd->nr_devs; i++) {
> -		struct dmz_dev *dev = &zmd->dev[i];
>  
> -		list_for_each_entry(zone, &dev->map_seq_list, link) {
> -			if (!zone->bzone)
> -				continue;
> -			if (dmz_lock_zone_reclaim(zone))
> -				return zone;
> -		}
> +	list_for_each_entry(zone, &zmd->dev[idx].map_seq_list, link) {
> +		if (!zone->bzone)
> +			continue;
> +		if (dmz_lock_zone_reclaim(zone))
> +			return zone;
>  	}
>  
>  	return NULL;
> @@ -1994,7 +1983,8 @@ static struct dm_zone *dmz_get_seq_zone_for_reclaim(struct dmz_metadata *zmd)
>  /*
>   * Select a zone for reclaim.
>   */
> -struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd, bool idle)
> +struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd,
> +					 unsigned int dev_idx, bool idle)
>  {
>  	struct dm_zone *zone;
>  
> @@ -2008,9 +1998,9 @@ struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd, bool idle)
>  	 */
>  	dmz_lock_map(zmd);
>  	if (list_empty(&zmd->reserved_seq_zones_list))
> -		zone = dmz_get_seq_zone_for_reclaim(zmd);
> +		zone = dmz_get_seq_zone_for_reclaim(zmd, dev_idx);
>  	else
> -		zone = dmz_get_rnd_zone_for_reclaim(zmd, idle);
> +		zone = dmz_get_rnd_zone_for_reclaim(zmd, dev_idx, idle);
>  	dmz_unlock_map(zmd);
>  
>  	return zone;
> diff --git a/drivers/md/dm-zoned-reclaim.c b/drivers/md/dm-zoned-reclaim.c
> index 5a04e34d17a9..2261b4dd60b7 100644
> --- a/drivers/md/dm-zoned-reclaim.c
> +++ b/drivers/md/dm-zoned-reclaim.c
> @@ -370,7 +370,8 @@ static int dmz_do_reclaim(struct dmz_reclaim *zrc)
>  	int ret;
>  
>  	/* Get a data zone */
> -	dzone = dmz_get_zone_for_reclaim(zmd, dmz_target_idle(zrc));
> +	dzone = dmz_get_zone_for_reclaim(zmd, zrc->dev_idx,
> +					 dmz_target_idle(zrc));
>  	if (!dzone) {
>  		DMDEBUG("(%s/%u): No zone found to reclaim",
>  			dmz_metadata_label(zmd), zrc->dev_idx);
> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
> index 4a51738d4b0d..a23f1fbd208f 100644
> --- a/drivers/md/dm-zoned-target.c
> +++ b/drivers/md/dm-zoned-target.c
> @@ -738,6 +738,7 @@ static int dmz_get_zoned_device(struct dm_target *ti, char *path,
>  		dev = &dmz->dev[idx];
>  	}
>  	dev->bdev = bdev;
> +	dev->dev_idx = idx;
>  	(void)bdevname(dev->bdev, dev->name);
>  
>  	dev->capacity = i_size_read(bdev->bd_inode) >> SECTOR_SHIFT;
> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
> index ec020bb1caf7..22f11440b423 100644
> --- a/drivers/md/dm-zoned.h
> +++ b/drivers/md/dm-zoned.h
> @@ -61,6 +61,8 @@ struct dmz_dev {
>  
>  	sector_t		capacity;
>  
> +	unsigned int		dev_idx;
> +
>  	unsigned int		nr_zones;
>  	unsigned int		zone_offset;
>  
> @@ -243,7 +245,8 @@ static inline void dmz_activate_zone(struct dm_zone *zone)
>  
>  int dmz_lock_zone_reclaim(struct dm_zone *zone);
>  void dmz_unlock_zone_reclaim(struct dm_zone *zone);
> -struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd, bool idle);
> +struct dm_zone *dmz_get_zone_for_reclaim(struct dmz_metadata *zmd,
> +					 unsigned int dev_idx, bool idle);
>  
>  struct dm_zone *dmz_get_chunk_mapping(struct dmz_metadata *zmd,
>  				      unsigned int chunk, int op);

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/15] dm-zoned: prefer full zones for reclaim
  2020-05-27  6:22 ` [PATCH 15/15] dm-zoned: prefer full zones for reclaim Hannes Reinecke
@ 2020-05-28  4:16   ` Damien Le Moal
  2020-05-28  5:15     ` Damien Le Moal
  0 siblings, 1 reply; 32+ messages in thread
From: Damien Le Moal @ 2020-05-28  4:16 UTC (permalink / raw)
  To: snitzer, hare; +Cc: dm-devel

On Wed, 2020-05-27 at 08:22 +0200, Hannes Reinecke wrote:
> Prefer full zones when selecting the next zone for reclaim.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/md/dm-zoned-metadata.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index b89b3d3b9ec9..f161ef4e3d3d 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -1936,7 +1936,7 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
>  						    unsigned int idx, bool idle)
>  {
>  	struct dm_zone *dzone = NULL;
> -	struct dm_zone *zone;
> +	struct dm_zone *zone, *last = NULL;
>  	struct list_head *zone_list;
>  
>  	/* If we have cache zones select from the cache zone list */
> @@ -1953,6 +1953,13 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
>  			dzone = zone->bzone;
>  			if (dzone->dev->dev_idx != idx)
>  				continue;
> +			if (!last) {
> +				last = dzone;
> +				continue;
> +			}
> +			if (last->weight < dzone->weight)
> +				continue;
> +			dzone = last;
>  		} else
>  			dzone = zone;
>  		if (dmz_lock_zone_reclaim(dzone))

If all random/cache zones are used but none of them satisfy the
condition last->weight < dzone->weight, we may end up starving reclaim
and having user IOs accessing a new chunk wait a loooong time, if not
forever, No ? I agree that aiming at reclaim of full zones first is
more efficient, but we need a fallback to ensure forward progress.

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 00/15] dm-zoned: multi-device support
  2020-05-27 13:41 ` [PATCHv2 00/15] dm-zoned: multi-device support Mike Snitzer
@ 2020-05-28  4:20   ` Damien Le Moal
  0 siblings, 0 replies; 32+ messages in thread
From: Damien Le Moal @ 2020-05-28  4:20 UTC (permalink / raw)
  To: snitzer, hare; +Cc: dm-devel

On Wed, 2020-05-27 at 09:41 -0400, Mike Snitzer wrote:
> On Wed, May 27 2020 at  2:22am -0400,
> Hannes Reinecke <hare@suse.de> wrote:
> 
> > Hi all,
> > 
> > here's the second version of my patchset to support multiple zoned
> > drives with dm-zoned.
> > This patchset:
> > - Converts the zone array to using xarray for better scalability
> > - Separates out shared structures into per-device structure
> > - Enforce drive-locality for allocating and reclaiming zones
> > - Lifts the restriction of 2 devices to handle an arbitrary number
> >   of drives.
> > 
> > This gives me a near-perfect scalability by increasing the write
> > speed from 150MB/s (for a cache and one zoned drive) to 300MB/s
> > (for a cache and two zoned drives).
> > 
> > Changes to v1:
> > - Include reviews from Damien
> > - Reshuffle patches
> > 
> > Hannes Reinecke (15):
> >   dm-zoned: add debugging message for reading superblocks
> >   dm-zoned: secondary superblock must reside on the same devices than
> >     primary superblock
> >   dm-zoned: improve logging messages for reclaim
> >   dm-zoned: add a 'reserved' zone flag
> >   dm-zoned: convert to xarray
> >   dm-zoned: temporary superblock for tertiary devices
> >   dm-zoned: add device pointer to struct dm_zone
> >   dm-zoned: add metadata pointer to struct dmz_dev
> >   dm-zoned: allocate dm devices dynamically
> >   dm-zoned: per-device reclaim
> >   dm-zoned: move random and sequential zones into struct dmz_dev
> >   dm-zoned: support arbitrary number of devices
> >   dm-zoned: allocate zone by device index
> >   dm-zoned: select reclaim zone based on device index
> >   dm-zoned: prefer full zones for reclaim
> > 
> >  drivers/md/dm-zoned-metadata.c | 448 ++++++++++++++++++++++++-----------------
> >  drivers/md/dm-zoned-reclaim.c  |  95 +++++----
> >  drivers/md/dm-zoned-target.c   | 169 ++++++++++------
> >  drivers/md/dm-zoned.h          |  77 ++++---
> >  4 files changed, 481 insertions(+), 308 deletions(-)
> 
> Would you still like to wait until the 5.9 merge window?
> 
> Or would you prefer to see these changes land for 5.8 so as to limit the
> variants of related code that needs to be supported?
> 
> If you and Damien are OK with 5.8 (and testing backs that up) then I
> should be able to get it to land.

Yes, it would be nice to have all the recent changes under dm-zoned V2.
I just reviewed Hannes V2 series and I am running tests right now. So
far so good. Will send results by the end of my day.

What is your deadline for a final series to make it to 5.8 ? This end
of week ? If Hannes has time to rework another round, we should be good
then.

Thanks.

> 
> Let me know, thanks.
> Mike
> 

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/15] dm-zoned: prefer full zones for reclaim
  2020-05-28  4:16   ` Damien Le Moal
@ 2020-05-28  5:15     ` Damien Le Moal
  0 siblings, 0 replies; 32+ messages in thread
From: Damien Le Moal @ 2020-05-28  5:15 UTC (permalink / raw)
  To: snitzer, hare; +Cc: dm-devel

On Thu, 2020-05-28 at 13:16 +0900, Damien Le Moal wrote:
> On Wed, 2020-05-27 at 08:22 +0200, Hannes Reinecke wrote:
> > Prefer full zones when selecting the next zone for reclaim.
> > 
> > Signed-off-by: Hannes Reinecke <hare@suse.de>
> > ---
> >  drivers/md/dm-zoned-metadata.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> > index b89b3d3b9ec9..f161ef4e3d3d 100644
> > --- a/drivers/md/dm-zoned-metadata.c
> > +++ b/drivers/md/dm-zoned-metadata.c
> > @@ -1936,7 +1936,7 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
> >  						    unsigned int idx, bool idle)
> >  {
> >  	struct dm_zone *dzone = NULL;
> > -	struct dm_zone *zone;
> > +	struct dm_zone *zone, *last = NULL;
> >  	struct list_head *zone_list;
> >  
> >  	/* If we have cache zones select from the cache zone list */
> > @@ -1953,6 +1953,13 @@ static struct dm_zone *dmz_get_rnd_zone_for_reclaim(struct dmz_metadata *zmd,
> >  			dzone = zone->bzone;
> >  			if (dzone->dev->dev_idx != idx)
> >  				continue;
> > +			if (!last) {
> > +				last = dzone;
> > +				continue;
> > +			}
> > +			if (last->weight < dzone->weight)
> > +				continue;
> > +			dzone = last;
> >  		} else
> >  			dzone = zone;
> >  		if (dmz_lock_zone_reclaim(dzone))
> 
> If all random/cache zones are used but none of them satisfy the
> condition last->weight < dzone->weight, we may end up starving reclaim
> and having user IOs accessing a new chunk wait a loooong time, if not
> forever, No ? I agree that aiming at reclaim of full zones first is
> more efficient, but we need a fallback to ensure forward progress.

Ignore... You are simply trying to select the zone with the largest
weight, which makes perfect sense. The commit message mentioning "full"
zones tricked me... Arg. I need more coffee. Slow afternoon :)

Renaming "last" to "max" would make the intent clearer I think.

Anyway, please feel free to add:

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCHv2 00/15] dm-zoned: multi-device support
  2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
                   ` (15 preceding siblings ...)
  2020-05-27 13:41 ` [PATCHv2 00/15] dm-zoned: multi-device support Mike Snitzer
@ 2020-05-28  8:24 ` Damien Le Moal
  16 siblings, 0 replies; 32+ messages in thread
From: Damien Le Moal @ 2020-05-28  8:24 UTC (permalink / raw)
  To: snitzer, hare; +Cc: dm-devel

[-- Attachment #1: Type: text/plain, Size: 2334 bytes --]

On Wed, 2020-05-27 at 08:22 +0200, Hannes Reinecke wrote:
> Hi all,
> 
> here's the second version of my patchset to support multiple zoned
> drives with dm-zoned.
> This patchset:
> - Converts the zone array to using xarray for better scalability
> - Separates out shared structures into per-device structure
> - Enforce drive-locality for allocating and reclaiming zones
> - Lifts the restriction of 2 devices to handle an arbitrary number
>   of drives.
> 
> This gives me a near-perfect scalability by increasing the write
> speed from 150MB/s (for a cache and one zoned drive) to 300MB/s
> (for a cache and two zoned drives).

Finished running 3 tests (2 hours each) for single drive, Optane SSD +
single disk and Optane SSD + 2 disks. No problems detected.

See the attached file for a plot of the results. I also get near
perfect avergare performance scaling with the cache device from 185
MB/s with ssd+single disk to 367 MB/s with ssd+2 disks. And that is
more than 10 times higher than the legacy single drive setup at 34 MB/s
in average.

> 
> Changes to v1:
> - Include reviews from Damien
> - Reshuffle patches
> 
> Hannes Reinecke (15):
>   dm-zoned: add debugging message for reading superblocks
>   dm-zoned: secondary superblock must reside on the same devices than
>     primary superblock
>   dm-zoned: improve logging messages for reclaim
>   dm-zoned: add a 'reserved' zone flag
>   dm-zoned: convert to xarray
>   dm-zoned: temporary superblock for tertiary devices
>   dm-zoned: add device pointer to struct dm_zone
>   dm-zoned: add metadata pointer to struct dmz_dev
>   dm-zoned: allocate dm devices dynamically
>   dm-zoned: per-device reclaim
>   dm-zoned: move random and sequential zones into struct dmz_dev
>   dm-zoned: support arbitrary number of devices
>   dm-zoned: allocate zone by device index
>   dm-zoned: select reclaim zone based on device index
>   dm-zoned: prefer full zones for reclaim
> 
>  drivers/md/dm-zoned-metadata.c | 448 ++++++++++++++++++++++++-----------------
>  drivers/md/dm-zoned-reclaim.c  |  95 +++++----
>  drivers/md/dm-zoned-target.c   | 169 ++++++++++------
>  drivers/md/dm-zoned.h          |  77 ++++---
>  4 files changed, 481 insertions(+), 308 deletions(-)
> 

-- 
Damien Le Moal
Western Digital Research

[-- Attachment #2: dm-zoned.png --]
[-- Type: image/png, Size: 114840 bytes --]

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 13/15] dm-zoned: allocate zone by device index
  2020-05-28  4:08   ` Damien Le Moal
@ 2020-05-29 15:48     ` Hannes Reinecke
  2020-05-31  8:54       ` Damien Le Moal
  0 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2020-05-29 15:48 UTC (permalink / raw)
  To: Damien Le Moal, snitzer; +Cc: dm-devel

On 5/28/20 6:08 AM, Damien Le Moal wrote:
> On Wed, 2020-05-27 at 08:22 +0200, Hannes Reinecke wrote:
>> When allocating a zone we should pass in an indicator on which
>> device the zone should be allocated; this increases performance
>> for a multi-device setup as then reclaim can allocate zones on
>> the device for which reclaim is running.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/md/dm-zoned-metadata.c | 17 +++++++++++------
>>   drivers/md/dm-zoned-reclaim.c  |  3 ++-
>>   drivers/md/dm-zoned.h          |  3 ++-
>>   3 files changed, 15 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index 689c1dd7ab20..0d65af94309a 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
>> @@ -2045,7 +2045,7 @@ struct dm_zone *dmz_get_chunk_mapping(struct dmz_metadata *zmd, unsigned int chu
>>   			goto out;
>>   
>>   		/* Allocate a random zone */
>> -		dzone = dmz_alloc_zone(zmd, alloc_flags);
>> +		dzone = dmz_alloc_zone(zmd, 0, alloc_flags);
>>   		if (!dzone) {
>>   			if (dmz_dev_is_dying(zmd)) {
>>   				dzone = ERR_PTR(-EIO);
>> @@ -2151,7 +2151,7 @@ struct dm_zone *dmz_get_chunk_buffer(struct dmz_metadata *zmd,
>>   		goto out;
>>   
>>   	/* Allocate a random zone */
>> -	bzone = dmz_alloc_zone(zmd, alloc_flags);
>> +	bzone = dmz_alloc_zone(zmd, 0, alloc_flags);
>>   	if (!bzone) {
>>   		if (dmz_dev_is_dying(zmd)) {
>>   			bzone = ERR_PTR(-EIO);
>> @@ -2182,11 +2182,12 @@ struct dm_zone *dmz_get_chunk_buffer(struct dmz_metadata *zmd,
>>    * Get an unmapped (free) zone.
>>    * This must be called with the mapping lock held.
>>    */
>> -struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>> +struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned int dev_idx,
>> +			       unsigned long flags)
>>   {
>>   	struct list_head *list;
>>   	struct dm_zone *zone;
>> -	unsigned int dev_idx = 0;
>> +	int i = 0;
>>   
>>   again:
>>   	if (flags & DMZ_ALLOC_CACHE)
>> @@ -2202,8 +2203,12 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
>>   		 */
>>   		if (!(flags & DMZ_ALLOC_RECLAIM))
>>   			return NULL;
>> -		if (dev_idx < zmd->nr_devs) {
>> -			dev_idx++;
>> +		/*
>> +		 * Try to allocate from other devices
>> +		 */
>> +		if (i < zmd->nr_devs) {
>> +			dev_idx = (dev_idx + 1) % zmd->nr_devs;
> 
> Hu ? You deleted dev_idx declaration above...
> 
It's now the function argument ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 13/15] dm-zoned: allocate zone by device index
  2020-05-29 15:48     ` Hannes Reinecke
@ 2020-05-31  8:54       ` Damien Le Moal
  0 siblings, 0 replies; 32+ messages in thread
From: Damien Le Moal @ 2020-05-31  8:54 UTC (permalink / raw)
  To: snitzer, hare; +Cc: dm-devel

On Fri, 2020-05-29 at 17:48 +0200, Hannes Reinecke wrote:
> On 5/28/20 6:08 AM, Damien Le Moal wrote:
> > On Wed, 2020-05-27 at 08:22 +0200, Hannes Reinecke wrote:
> > > When allocating a zone we should pass in an indicator on which
> > > device the zone should be allocated; this increases performance
> > > for a multi-device setup as then reclaim can allocate zones on
> > > the device for which reclaim is running.
> > > 
> > > Signed-off-by: Hannes Reinecke <hare@suse.de>
> > > ---
> > >   drivers/md/dm-zoned-metadata.c | 17 +++++++++++------
> > >   drivers/md/dm-zoned-reclaim.c  |  3 ++-
> > >   drivers/md/dm-zoned.h          |  3 ++-
> > >   3 files changed, 15 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> > > index 689c1dd7ab20..0d65af94309a 100644
> > > --- a/drivers/md/dm-zoned-metadata.c
> > > +++ b/drivers/md/dm-zoned-metadata.c
> > > @@ -2045,7 +2045,7 @@ struct dm_zone *dmz_get_chunk_mapping(struct dmz_metadata *zmd, unsigned int chu
> > >   			goto out;
> > >   
> > >   		/* Allocate a random zone */
> > > -		dzone = dmz_alloc_zone(zmd, alloc_flags);
> > > +		dzone = dmz_alloc_zone(zmd, 0, alloc_flags);
> > >   		if (!dzone) {
> > >   			if (dmz_dev_is_dying(zmd)) {
> > >   				dzone = ERR_PTR(-EIO);
> > > @@ -2151,7 +2151,7 @@ struct dm_zone *dmz_get_chunk_buffer(struct dmz_metadata *zmd,
> > >   		goto out;
> > >   
> > >   	/* Allocate a random zone */
> > > -	bzone = dmz_alloc_zone(zmd, alloc_flags);
> > > +	bzone = dmz_alloc_zone(zmd, 0, alloc_flags);
> > >   	if (!bzone) {
> > >   		if (dmz_dev_is_dying(zmd)) {
> > >   			bzone = ERR_PTR(-EIO);
> > > @@ -2182,11 +2182,12 @@ struct dm_zone *dmz_get_chunk_buffer(struct dmz_metadata *zmd,
> > >    * Get an unmapped (free) zone.
> > >    * This must be called with the mapping lock held.
> > >    */
> > > -struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
> > > +struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned int dev_idx,
> > > +			       unsigned long flags)
> > >   {
> > >   	struct list_head *list;
> > >   	struct dm_zone *zone;
> > > -	unsigned int dev_idx = 0;
> > > +	int i = 0;
> > >   
> > >   again:
> > >   	if (flags & DMZ_ALLOC_CACHE)
> > > @@ -2202,8 +2203,12 @@ struct dm_zone *dmz_alloc_zone(struct dmz_metadata *zmd, unsigned long flags)
> > >   		 */
> > >   		if (!(flags & DMZ_ALLOC_RECLAIM))
> > >   			return NULL;
> > > -		if (dev_idx < zmd->nr_devs) {
> > > -			dev_idx++;
> > > +		/*
> > > +		 * Try to allocate from other devices
> > > +		 */
> > > +		if (i < zmd->nr_devs) {
> > > +			dev_idx = (dev_idx + 1) % zmd->nr_devs;
> > 
> > Hu ? You deleted dev_idx declaration above...
> > 
> It's now the function argument ...

Oops... Yes, missed that...

> 
> Cheers,
> 
> Hannes

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-05-31  8:54 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-27  6:22 [PATCHv2 00/15] dm-zoned: multi-device support Hannes Reinecke
2020-05-27  6:22 ` [PATCH 01/15] dm-zoned: add debugging message for reading superblocks Hannes Reinecke
2020-05-27  6:22 ` [PATCH 02/15] dm-zoned: secondary superblock must reside on the same devices than primary superblock Hannes Reinecke
2020-05-27  6:22 ` [PATCH 03/15] dm-zoned: improve logging messages for reclaim Hannes Reinecke
2020-05-27  6:22 ` [PATCH 04/15] dm-zoned: add a 'reserved' zone flag Hannes Reinecke
2020-05-27  6:22 ` [PATCH 05/15] dm-zoned: convert to xarray Hannes Reinecke
2020-05-28  2:46   ` Damien Le Moal
2020-05-27  6:22 ` [PATCH 06/15] dm-zoned: temporary superblock for tertiary devices Hannes Reinecke
2020-05-28  2:54   ` Damien Le Moal
2020-05-27  6:22 ` [PATCH 07/15] dm-zoned: add device pointer to struct dm_zone Hannes Reinecke
2020-05-28  2:57   ` Damien Le Moal
2020-05-27  6:22 ` [PATCH 08/15] dm-zoned: add metadata pointer to struct dmz_dev Hannes Reinecke
2020-05-27  6:22 ` [PATCH 09/15] dm-zoned: allocate dm devices dynamically Hannes Reinecke
2020-05-28  2:59   ` Damien Le Moal
2020-05-27  6:22 ` [PATCH 10/15] dm-zoned: per-device reclaim Hannes Reinecke
2020-05-28  3:19   ` Damien Le Moal
2020-05-27  6:22 ` [PATCH 11/15] dm-zoned: move random and sequential zones into struct dmz_dev Hannes Reinecke
2020-05-28  3:25   ` Damien Le Moal
2020-05-27  6:22 ` [PATCH 12/15] dm-zoned: support arbitrary number of devices Hannes Reinecke
2020-05-28  4:04   ` Damien Le Moal
2020-05-27  6:22 ` [PATCH 13/15] dm-zoned: allocate zone by device index Hannes Reinecke
2020-05-28  4:08   ` Damien Le Moal
2020-05-29 15:48     ` Hannes Reinecke
2020-05-31  8:54       ` Damien Le Moal
2020-05-27  6:22 ` [PATCH 14/15] dm-zoned: select reclaim zone based on " Hannes Reinecke
2020-05-28  4:11   ` Damien Le Moal
2020-05-27  6:22 ` [PATCH 15/15] dm-zoned: prefer full zones for reclaim Hannes Reinecke
2020-05-28  4:16   ` Damien Le Moal
2020-05-28  5:15     ` Damien Le Moal
2020-05-27 13:41 ` [PATCHv2 00/15] dm-zoned: multi-device support Mike Snitzer
2020-05-28  4:20   ` Damien Le Moal
2020-05-28  8:24 ` Damien Le Moal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.