* [PATCH v2 0/9] null_blk fixes, improvements and cleanup
@ 2020-11-11  5:16 Damien Le Moal
  2020-11-11  5:16 ` [PATCH v2 1/9] null_blk: Fix zone size initialization Damien Le Moal
                   ` (9 more replies)
  0 siblings, 10 replies; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  5:16 UTC (permalink / raw)
  To: linux-block, Jens Axboe

Jens,

This series provides fixes and improvements for null_blk.

The first two patches are bug fixes which likely should go into 5.10.
The first patch fixes a problem with zone initialization when the
device capacity is not a multiple of the zone size and the second
patch fixes zone append handling.

The following patches are improvements and cleanups:
* Patch 3 makes sure that the device max_sectors limit is aligned to the
  block size.
* Patch 4 improves zone locking overall, and especially the case where
  memory backing is disabled, by introducing a spinlock array to
  implement per-zone locking in place of a global lock. With this
  patch, write performance remains mostly unchanged, but read
  performance with a multi-queue setup more than doubles from 1.3 MIOPS
  to 3.3 MIOPS (4K random reads to zones with fio zonemode=zbd).
* Patch 5 improves implicit zone close handling.
* Patches 6 and 7 clean up discard handling code and use that code to free
  the memory backing a zone that is being reset.
* Patch 8 adds the max_sectors configuration option to allow changing
  the max_sectors/max_hw_sectors of the device.
* Finally, patch 9 moves null_blk into its own directory under
  drivers/block/null_blk/

Comments are as always welcome.

Damien Le Moal (9):
  null_blk: Fix zone size initialization
  null_blk: Fail zone append to conventional zones
  null_blk: Align max_hw_sectors to blocksize
  null_blk: improve zone locking
  null_blk: Improve implicit zone close
  null_blk: cleanup discard handling
  null_blk: discard zones on reset
  null_blk: Allow controlling max_hw_sectors limit
  null_blk: Move driver into its own directory

 drivers/block/Kconfig                         |   8 +-
 drivers/block/Makefile                        |   7 +-
 drivers/block/null_blk/Kconfig                |  12 +
 drivers/block/null_blk/Makefile               |  11 +
 .../{null_blk_main.c => null_blk/main.c}      |  65 +++--
 drivers/block/{ => null_blk}/null_blk.h       |   9 +-
 .../{null_blk_trace.c => null_blk/trace.c}    |   2 +-
 .../{null_blk_trace.h => null_blk/trace.h}    |   2 +-
 .../{null_blk_zoned.c => null_blk/zoned.c}    | 245 ++++++++++++------
 9 files changed, 239 insertions(+), 122 deletions(-)
 create mode 100644 drivers/block/null_blk/Kconfig
 create mode 100644 drivers/block/null_blk/Makefile
 rename drivers/block/{null_blk_main.c => null_blk/main.c} (97%)
 rename drivers/block/{ => null_blk}/null_blk.h (94%)
 rename drivers/block/{null_blk_trace.c => null_blk/trace.c} (93%)
 rename drivers/block/{null_blk_trace.h => null_blk/trace.h} (97%)
 rename drivers/block/{null_blk_zoned.c => null_blk/zoned.c} (76%)

-- 
2.26.2



* [PATCH v2 1/9] null_blk: Fix zone size initialization
  2020-11-11  5:16 [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
@ 2020-11-11  5:16 ` Damien Le Moal
  2020-11-11  8:05   ` Christoph Hellwig
  2020-11-11  8:06   ` Johannes Thumshirn
  2020-11-11  5:16 ` [PATCH v2 2/9] null_blk: Fail zone append to conventional zones Damien Le Moal
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  5:16 UTC (permalink / raw)
  To: linux-block, Jens Axboe

A null_blk device with zoned mode enabled is currently initialized
with a number of zones equal to the device capacity divided by the zone
size, without considering whether the device capacity is a multiple of
the zone size. If the zone size is not a divisor of the capacity, the
zones end up not covering the entire capacity, potentially resulting in
out-of-bounds accesses to the zone array.

Fix this by adding one last, smaller zone with a size equal to the
remainder of the disk capacity divided by the zone size, if the capacity
is not a multiple of the zone size. For such a smaller last zone, the
zone capacity is also capped so that it does not exceed the smaller zone
size.
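The fixed sizing logic can be sketched in standalone form (simplified
userspace C mirroring the patch's MB_TO_SECTS() arithmetic; the helper
names are illustrative, not part of the driver):

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT	9
#define SZ_1M		(1024 * 1024)

/* zone_size in MBs to sectors, as defined by the patch. */
#define MB_TO_SECTS(mb)	(((uint64_t)(mb) * SZ_1M) >> SECTOR_SHIFT)

/*
 * Round the zone count up so that a capacity that is not a multiple of
 * the zone size is still fully covered, with a smaller last zone
 * holding the remainder.
 */
static unsigned int null_blk_nr_zones(uint64_t capacity_mb,
				      uint64_t zone_size_mb)
{
	uint64_t cap_sects = MB_TO_SECTS(capacity_mb);
	uint64_t zone_sects = MB_TO_SECTS(zone_size_mb);
	unsigned int nr_zones = cap_sects / zone_sects;

	if (cap_sects % zone_sects)
		nr_zones++;	/* smaller last zone */
	return nr_zones;
}

/* Length in sectors of zone number zno (the last zone may be smaller). */
static uint64_t null_blk_zone_len(uint64_t capacity_mb, uint64_t zone_size_mb,
				  unsigned int zno)
{
	uint64_t cap_sects = MB_TO_SECTS(capacity_mb);
	uint64_t zone_sects = MB_TO_SECTS(zone_size_mb);
	uint64_t start = (uint64_t)zno * zone_sects;

	if (start + zone_sects > cap_sects)
		return cap_sects - start;
	return zone_sects;
}
```

For example, a 1000 MB capacity with 256 MB zones yields 4 zones, the
last one covering only the remaining 232 MB, matching the behavior the
patch introduces in null_init_zoned_dev().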

Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Fixes: ca4b2a011948 ("null_blk: add zone support")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/null_blk_zoned.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index beb34b4f76b0..1d0370d91fe7 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -6,8 +6,7 @@
 #define CREATE_TRACE_POINTS
 #include "null_blk_trace.h"
 
-/* zone_size in MBs to sectors. */
-#define ZONE_SIZE_SHIFT		11
+#define MB_TO_SECTS(mb) (((sector_t)mb * SZ_1M) >> SECTOR_SHIFT)
 
 static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
 {
@@ -16,7 +15,7 @@ static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
 
 int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
 {
-	sector_t dev_size = (sector_t)dev->size * 1024 * 1024;
+	sector_t dev_capacity_sects, zone_capacity_sects;
 	sector_t sector = 0;
 	unsigned int i;
 
@@ -38,9 +37,13 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
 		return -EINVAL;
 	}
 
-	dev->zone_size_sects = dev->zone_size << ZONE_SIZE_SHIFT;
-	dev->nr_zones = dev_size >>
-				(SECTOR_SHIFT + ilog2(dev->zone_size_sects));
+	zone_capacity_sects = MB_TO_SECTS(dev->zone_capacity);
+	dev_capacity_sects = MB_TO_SECTS(dev->size);
+	dev->zone_size_sects = MB_TO_SECTS(dev->zone_size);
+	dev->nr_zones = dev_capacity_sects >> ilog2(dev->zone_size_sects);
+	if (dev_capacity_sects & (dev->zone_size_sects - 1))
+		dev->nr_zones++;
+
 	dev->zones = kvmalloc_array(dev->nr_zones, sizeof(struct blk_zone),
 			GFP_KERNEL | __GFP_ZERO);
 	if (!dev->zones)
@@ -101,8 +104,12 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
 		struct blk_zone *zone = &dev->zones[i];
 
 		zone->start = zone->wp = sector;
-		zone->len = dev->zone_size_sects;
-		zone->capacity = dev->zone_capacity << ZONE_SIZE_SHIFT;
+		if (zone->start + dev->zone_size_sects > dev_capacity_sects)
+			zone->len = dev_capacity_sects - zone->start;
+		else
+			zone->len = dev->zone_size_sects;
+		zone->capacity =
+			min_t(sector_t, zone->len, zone_capacity_sects);
 		zone->type = BLK_ZONE_TYPE_SEQWRITE_REQ;
 		zone->cond = BLK_ZONE_COND_EMPTY;
 
-- 
2.26.2



* [PATCH v2 2/9] null_blk: Fail zone append to conventional zones
  2020-11-11  5:16 [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
  2020-11-11  5:16 ` [PATCH v2 1/9] null_blk: Fix zone size initialization Damien Le Moal
@ 2020-11-11  5:16 ` Damien Le Moal
  2020-11-11  8:06   ` Christoph Hellwig
  2020-11-11  8:07   ` Johannes Thumshirn
  2020-11-11  5:16 ` [PATCH v2 3/9] null_blk: Align max_hw_sectors to blocksize Damien Le Moal
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  5:16 UTC (permalink / raw)
  To: linux-block, Jens Axboe

Conventional zones do not have a write pointer and so cannot accept zone
append writes. Make sure to fail any zone append write command issued to
a conventional zone.

Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Fixes: e0489ed5daeb ("null_blk: Support REQ_OP_ZONE_APPEND")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/null_blk_zoned.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index 1d0370d91fe7..172f720b8d63 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -339,8 +339,11 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
 
 	trace_nullb_zone_op(cmd, zno, zone->cond);
 
-	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
+	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL) {
+		if (append)
+			return BLK_STS_IOERR;
 		return null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors);
+	}
 
 	null_lock_zone(dev, zno);
 
-- 
2.26.2



* [PATCH v2 3/9] null_blk: Align max_hw_sectors to blocksize
  2020-11-11  5:16 [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
  2020-11-11  5:16 ` [PATCH v2 1/9] null_blk: Fix zone size initialization Damien Le Moal
  2020-11-11  5:16 ` [PATCH v2 2/9] null_blk: Fail zone append to conventional zones Damien Le Moal
@ 2020-11-11  5:16 ` Damien Le Moal
  2020-11-11  8:06   ` Christoph Hellwig
  2020-11-11  5:16 ` [PATCH v2 4/9] null_blk: improve zone locking Damien Le Moal
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  5:16 UTC (permalink / raw)
  To: linux-block, Jens Axboe

null_blk always uses the default BLK_SAFE_MAX_SECTORS value of 255 for
the max_hw_sectors and max_sectors queue limits, resulting in a maximum
request size of 255 sectors (127.5 KB). When the blocksize setting is
larger than the default 512 B, this maximum request size is not aligned
to the block size. To emulate a real device more accurately, fix this by
setting the max_hw_sectors and max_sectors queue limits to a value
that is aligned to the block size.
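The alignment the patch performs can be sketched as follows (standalone
userspace C; round_down() is reproduced in kernel style for power-of-two
alignments, and the helper name is illustrative):

```c
#include <assert.h>

#define SECTOR_SHIFT	9

/* Kernel-style round_down(); y must be a power of 2. */
#define round_down(x, y)	((x) & ~((y) - 1))

/*
 * Align a max_hw_sectors limit (in 512B sectors) down to a multiple of
 * the logical block size (in bytes), as the patch does before calling
 * blk_queue_max_hw_sectors().
 */
static unsigned int align_max_sectors(unsigned int max_hw_sectors,
				      unsigned int blocksize)
{
	return round_down(max_hw_sectors, blocksize >> SECTOR_SHIFT);
}
```

With the default 255-sector limit and a 4096 B block size, the limit is
rounded down to 248 sectors, a multiple of the 8-sector (4 KB) block.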

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/null_blk_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 4685ea401d5b..b77a506a4ae4 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1866,6 +1866,9 @@ static int null_add_dev(struct nullb_device *dev)
 
 	blk_queue_logical_block_size(nullb->q, dev->blocksize);
 	blk_queue_physical_block_size(nullb->q, dev->blocksize);
+	blk_queue_max_hw_sectors(nullb->q,
+				 round_down(queue_max_hw_sectors(nullb->q),
+					    dev->blocksize >> SECTOR_SHIFT));
 
 	null_config_discard(nullb);
 
-- 
2.26.2



* [PATCH v2 4/9] null_blk: improve zone locking
  2020-11-11  5:16 [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
                   ` (2 preceding siblings ...)
  2020-11-11  5:16 ` [PATCH v2 3/9] null_blk: Align max_hw_sectors to blocksize Damien Le Moal
@ 2020-11-11  5:16 ` Damien Le Moal
  2020-11-11  8:11   ` Christoph Hellwig
  2020-11-11  8:18   ` Johannes Thumshirn
  2020-11-11  5:16 ` [PATCH v2 5/9] null_blk: Improve implicit zone close Damien Le Moal
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  5:16 UTC (permalink / raw)
  To: linux-block, Jens Axboe

With memory backing disabled, using a single spinlock to protect both
zone information and zone resource management prevents the parallel
execution, on multiple queues, of IO requests to different zones.
Furthermore, regardless of the use of memory backing, if a null_blk
device is created without limits on the number of open and active zones,
accounting for zone resource management is not necessary.

From these observations, zone locking is changed as follows to improve
performance:
1) The zone_lock spinlock is renamed zone_res_lock and used only when
   zone resource management is necessary, that is, when either
   zone_max_open or zone_max_active is not 0. This is indicated using
   the new boolean need_zone_res_mgmt in the nullb_device structure.
   null_zone_write() is modified to reduce the amount of code executed
   with the zone_res_lock spinlock held. null_zone_valid_read_len() is
   also modified to avoid taking the zone lock before calling
   null_process_cmd() for read operations in null_process_zoned_cmd().
2) With memory backing disabled, the single zone_lock spinlock is
   replaced with an array of per-zone spinlocks.

With these changes, fio performance with zonemode=zbd for 4K random
reads and random writes on a dual socket (24 cores per socket) machine
using the none scheduler is as follows:

before patch:
	write (psync x 96 jobs) = 465 KIOPS
	read (libaio@qd=8 x 96 jobs) = 1361 KIOPS
after patch:
	write (psync x 96 jobs) = 468 KIOPS
	read (libaio@qd=8 x 96 jobs) = 3340 KIOPS

Write performance remains mostly unchanged but read performance more
than doubles. Performance when using the mq-deadline scheduler is not
changed by this patch, as mq-deadline becomes the bottleneck for a
multi-queue device.
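The resulting locking scheme can be sketched in simplified form
(userspace C with a plain counter standing in for a spinlock so the
conditional pattern is visible; the names mirror the patch but the code
is illustrative only):

```c
#include <assert.h>
#include <stdbool.h>

/* A counter stands in for a spinlock in this sketch. */
struct fake_lock { int held; };

static void take(struct fake_lock *l)    { l->held++; }
static void release(struct fake_lock *l) { l->held--; }

struct dev {
	bool need_zone_res_mgmt;	/* zone_max_open || zone_max_active */
	struct fake_lock zone_res_lock;	/* global, resource accounting only */
	struct fake_lock zone_locks[4];	/* one lock per zone */
};

/* The resource lock is skipped entirely when no zone limits are set. */
static void lock_zone_res(struct dev *d)
{
	if (d->need_zone_res_mgmt)
		take(&d->zone_res_lock);
}

static void unlock_zone_res(struct dev *d)
{
	if (d->need_zone_res_mgmt)
		release(&d->zone_res_lock);
}

/* Writers to different zones take different locks and do not contend. */
static void lock_zone(struct dev *d, unsigned int zno)
{
	take(&d->zone_locks[zno]);
}
```

A device created without zone_max_open/zone_max_active never touches
zone_res_lock, while IO to different zones only ever serializes on the
per-zone locks.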

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/null_blk.h       |   5 +-
 drivers/block/null_blk_zoned.c | 192 +++++++++++++++++++++------------
 2 files changed, 126 insertions(+), 71 deletions(-)

diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk.h
index c24d9b5ad81a..4c101c39c3d1 100644
--- a/drivers/block/null_blk.h
+++ b/drivers/block/null_blk.h
@@ -47,8 +47,9 @@ struct nullb_device {
 	unsigned int nr_zones_closed;
 	struct blk_zone *zones;
 	sector_t zone_size_sects;
-	spinlock_t zone_lock;
-	unsigned long *zone_locks;
+	bool need_zone_res_mgmt;
+	spinlock_t zone_res_lock;
+	void *zone_locks;
 
 	unsigned long size; /* device size in MB */
 	unsigned long completion_nsec; /* time in ns to complete a request */
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index 172f720b8d63..2630aeda757d 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -56,13 +56,15 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
 	 * wait_on_bit_lock_io(). Sleeping on the lock is OK as memory backing
 	 * implies that the queue is marked with BLK_MQ_F_BLOCKING.
 	 */
-	spin_lock_init(&dev->zone_lock);
-	if (dev->memory_backed) {
+	spin_lock_init(&dev->zone_res_lock);
+	if (dev->memory_backed)
 		dev->zone_locks = bitmap_zalloc(dev->nr_zones, GFP_KERNEL);
-		if (!dev->zone_locks) {
-			kvfree(dev->zones);
-			return -ENOMEM;
-		}
+	else
+		dev->zone_locks = kmalloc_array(dev->nr_zones,
+						sizeof(spinlock_t), GFP_KERNEL);
+	if (!dev->zone_locks) {
+		kvfree(dev->zones);
+		return -ENOMEM;
 	}
 
 	if (dev->zone_nr_conv >= dev->nr_zones) {
@@ -86,10 +88,14 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
 		dev->zone_max_open = 0;
 		pr_info("zone_max_open limit disabled, limit >= zone count\n");
 	}
+	dev->need_zone_res_mgmt = dev->zone_max_active || dev->zone_max_open;
 
 	for (i = 0; i <  dev->zone_nr_conv; i++) {
 		struct blk_zone *zone = &dev->zones[i];
 
+		if (!dev->memory_backed)
+			spin_lock_init((spinlock_t *)dev->zone_locks + i);
+
 		zone->start = sector;
 		zone->len = dev->zone_size_sects;
 		zone->capacity = zone->len;
@@ -103,6 +109,9 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
 	for (i = dev->zone_nr_conv; i < dev->nr_zones; i++) {
 		struct blk_zone *zone = &dev->zones[i];
 
+		if (!dev->memory_backed)
+			spin_lock_init((spinlock_t *)dev->zone_locks + i);
+
 		zone->start = zone->wp = sector;
 		if (zone->start + dev->zone_size_sects > dev_capacity_sects)
 			zone->len = dev_capacity_sects - zone->start;
@@ -147,23 +156,41 @@ int null_register_zoned_dev(struct nullb *nullb)
 
 void null_free_zoned_dev(struct nullb_device *dev)
 {
-	bitmap_free(dev->zone_locks);
+	if (dev->memory_backed)
+		bitmap_free(dev->zone_locks);
+	else
+		kfree(dev->zone_locks);
 	kvfree(dev->zones);
 }
 
+#define null_lock_zone_res(dev, flags)					\
+	do {								\
+		if ((dev)->need_zone_res_mgmt)				\
+			spin_lock_irqsave(&(dev)->zone_res_lock,	\
+					  (flags));			\
+	} while (0)
+
+#define null_unlock_zone_res(dev, flags)				\
+	do {								\
+		if ((dev)->need_zone_res_mgmt)				\
+			spin_unlock_irqrestore(&(dev)->zone_res_lock,	\
+					       (flags));		\
+	} while (0)
+
 static inline void null_lock_zone(struct nullb_device *dev, unsigned int zno)
 {
 	if (dev->memory_backed)
 		wait_on_bit_lock_io(dev->zone_locks, zno, TASK_UNINTERRUPTIBLE);
-	spin_lock_irq(&dev->zone_lock);
+	else
+		spin_lock_irq((spinlock_t *)dev->zone_locks + zno);
 }
 
 static inline void null_unlock_zone(struct nullb_device *dev, unsigned int zno)
 {
-	spin_unlock_irq(&dev->zone_lock);
-
 	if (dev->memory_backed)
 		clear_and_wake_up_bit(zno, dev->zone_locks);
+	else
+		spin_unlock_irq((spinlock_t *)dev->zone_locks + zno);
 }
 
 int null_report_zones(struct gendisk *disk, sector_t sector,
@@ -224,11 +251,9 @@ size_t null_zone_valid_read_len(struct nullb *nullb,
 	return (zone->wp - sector) << SECTOR_SHIFT;
 }
 
-static blk_status_t null_close_zone(struct nullb_device *dev, struct blk_zone *zone)
+static blk_status_t __null_close_zone(struct nullb_device *dev,
+				      struct blk_zone *zone)
 {
-	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
-		return BLK_STS_IOERR;
-
 	switch (zone->cond) {
 	case BLK_ZONE_COND_CLOSED:
 		/* close operation on closed is not an error */
@@ -261,7 +286,7 @@ static void null_close_first_imp_zone(struct nullb_device *dev)
 
 	for (i = dev->zone_nr_conv; i < dev->nr_zones; i++) {
 		if (dev->zones[i].cond == BLK_ZONE_COND_IMP_OPEN) {
-			null_close_zone(dev, &dev->zones[i]);
+			__null_close_zone(dev, &dev->zones[i]);
 			return;
 		}
 	}
@@ -335,6 +360,7 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
 	struct nullb_device *dev = cmd->nq->dev;
 	unsigned int zno = null_zone_no(dev, sector);
 	struct blk_zone *zone = &dev->zones[zno];
+	unsigned long flags;
 	blk_status_t ret;
 
 	trace_nullb_zone_op(cmd, zno, zone->cond);
@@ -347,24 +373,10 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
 
 	null_lock_zone(dev, zno);
 
-	switch (zone->cond) {
-	case BLK_ZONE_COND_FULL:
+	if (zone->cond == BLK_ZONE_COND_FULL) {
 		/* Cannot write to a full zone */
 		ret = BLK_STS_IOERR;
 		goto unlock;
-	case BLK_ZONE_COND_EMPTY:
-	case BLK_ZONE_COND_CLOSED:
-		ret = null_check_zone_resources(dev, zone);
-		if (ret != BLK_STS_OK)
-			goto unlock;
-		break;
-	case BLK_ZONE_COND_IMP_OPEN:
-	case BLK_ZONE_COND_EXP_OPEN:
-		break;
-	default:
-		/* Invalid zone condition */
-		ret = BLK_STS_IOERR;
-		goto unlock;
 	}
 
 	/*
@@ -389,37 +401,43 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
 		goto unlock;
 	}
 
-	if (zone->cond == BLK_ZONE_COND_CLOSED) {
-		dev->nr_zones_closed--;
-		dev->nr_zones_imp_open++;
-	} else if (zone->cond == BLK_ZONE_COND_EMPTY) {
-		dev->nr_zones_imp_open++;
+	if (zone->cond == BLK_ZONE_COND_CLOSED ||
+	    zone->cond == BLK_ZONE_COND_EMPTY) {
+		null_lock_zone_res(dev, flags);
+
+		ret = null_check_zone_resources(dev, zone);
+		if (ret != BLK_STS_OK) {
+			null_unlock_zone_res(dev, flags);
+			goto unlock;
+		}
+		if (zone->cond == BLK_ZONE_COND_CLOSED) {
+			dev->nr_zones_closed--;
+			dev->nr_zones_imp_open++;
+		} else if (zone->cond == BLK_ZONE_COND_EMPTY) {
+			dev->nr_zones_imp_open++;
+		}
+
+		if (zone->cond != BLK_ZONE_COND_EXP_OPEN)
+			zone->cond = BLK_ZONE_COND_IMP_OPEN;
+
+		null_unlock_zone_res(dev, flags);
 	}
-	if (zone->cond != BLK_ZONE_COND_EXP_OPEN)
-		zone->cond = BLK_ZONE_COND_IMP_OPEN;
 
-	/*
-	 * Memory backing allocation may sleep: release the zone_lock spinlock
-	 * to avoid scheduling in atomic context. Zone operation atomicity is
-	 * still guaranteed through the zone_locks bitmap.
-	 */
-	if (dev->memory_backed)
-		spin_unlock_irq(&dev->zone_lock);
 	ret = null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors);
-	if (dev->memory_backed)
-		spin_lock_irq(&dev->zone_lock);
-
 	if (ret != BLK_STS_OK)
 		goto unlock;
 
 	zone->wp += nr_sectors;
 	if (zone->wp == zone->start + zone->capacity) {
+		null_lock_zone_res(dev, flags);
 		if (zone->cond == BLK_ZONE_COND_EXP_OPEN)
 			dev->nr_zones_exp_open--;
 		else if (zone->cond == BLK_ZONE_COND_IMP_OPEN)
 			dev->nr_zones_imp_open--;
 		zone->cond = BLK_ZONE_COND_FULL;
+		null_unlock_zone_res(dev, flags);
 	}
+
 	ret = BLK_STS_OK;
 
 unlock:
@@ -430,19 +448,22 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
 
 static blk_status_t null_open_zone(struct nullb_device *dev, struct blk_zone *zone)
 {
-	blk_status_t ret;
+	blk_status_t ret = BLK_STS_OK;
+	unsigned long flags;
 
 	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
 		return BLK_STS_IOERR;
 
+	null_lock_zone_res(dev, flags);
+
 	switch (zone->cond) {
 	case BLK_ZONE_COND_EXP_OPEN:
 		/* open operation on exp open is not an error */
-		return BLK_STS_OK;
+		goto unlock;
 	case BLK_ZONE_COND_EMPTY:
 		ret = null_check_zone_resources(dev, zone);
 		if (ret != BLK_STS_OK)
-			return ret;
+			goto unlock;
 		break;
 	case BLK_ZONE_COND_IMP_OPEN:
 		dev->nr_zones_imp_open--;
@@ -450,35 +471,57 @@ static blk_status_t null_open_zone(struct nullb_device *dev, struct blk_zone *zo
 	case BLK_ZONE_COND_CLOSED:
 		ret = null_check_zone_resources(dev, zone);
 		if (ret != BLK_STS_OK)
-			return ret;
+			goto unlock;
 		dev->nr_zones_closed--;
 		break;
 	case BLK_ZONE_COND_FULL:
 	default:
-		return BLK_STS_IOERR;
+		ret = BLK_STS_IOERR;
+		goto unlock;
 	}
 
 	zone->cond = BLK_ZONE_COND_EXP_OPEN;
 	dev->nr_zones_exp_open++;
 
-	return BLK_STS_OK;
+unlock:
+	null_unlock_zone_res(dev, flags);
+
+	return ret;
 }
 
-static blk_status_t null_finish_zone(struct nullb_device *dev, struct blk_zone *zone)
+static blk_status_t null_close_zone(struct nullb_device *dev, struct blk_zone *zone)
 {
+	unsigned long flags;
 	blk_status_t ret;
 
 	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
 		return BLK_STS_IOERR;
 
+	null_lock_zone_res(dev, flags);
+	ret = __null_close_zone(dev, zone);
+	null_unlock_zone_res(dev, flags);
+
+	return ret;
+}
+
+static blk_status_t null_finish_zone(struct nullb_device *dev, struct blk_zone *zone)
+{
+	blk_status_t ret = BLK_STS_OK;
+	unsigned long flags;
+
+	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
+		return BLK_STS_IOERR;
+
+	null_lock_zone_res(dev, flags);
+
 	switch (zone->cond) {
 	case BLK_ZONE_COND_FULL:
 		/* finish operation on full is not an error */
-		return BLK_STS_OK;
+		goto unlock;
 	case BLK_ZONE_COND_EMPTY:
 		ret = null_check_zone_resources(dev, zone);
 		if (ret != BLK_STS_OK)
-			return ret;
+			goto unlock;
 		break;
 	case BLK_ZONE_COND_IMP_OPEN:
 		dev->nr_zones_imp_open--;
@@ -489,27 +532,36 @@ static blk_status_t null_finish_zone(struct nullb_device *dev, struct blk_zone *
 	case BLK_ZONE_COND_CLOSED:
 		ret = null_check_zone_resources(dev, zone);
 		if (ret != BLK_STS_OK)
-			return ret;
+			goto unlock;
 		dev->nr_zones_closed--;
 		break;
 	default:
-		return BLK_STS_IOERR;
+		ret = BLK_STS_IOERR;
+		goto unlock;
 	}
 
 	zone->cond = BLK_ZONE_COND_FULL;
 	zone->wp = zone->start + zone->len;
 
-	return BLK_STS_OK;
+unlock:
+	null_unlock_zone_res(dev, flags);
+
+	return ret;
 }
 
 static blk_status_t null_reset_zone(struct nullb_device *dev, struct blk_zone *zone)
 {
+	unsigned long flags;
+
 	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
 		return BLK_STS_IOERR;
 
+	null_lock_zone_res(dev, flags);
+
 	switch (zone->cond) {
 	case BLK_ZONE_COND_EMPTY:
 		/* reset operation on empty is not an error */
+		null_unlock_zone_res(dev, flags);
 		return BLK_STS_OK;
 	case BLK_ZONE_COND_IMP_OPEN:
 		dev->nr_zones_imp_open--;
@@ -523,12 +575,15 @@ static blk_status_t null_reset_zone(struct nullb_device *dev, struct blk_zone *z
 	case BLK_ZONE_COND_FULL:
 		break;
 	default:
+		null_unlock_zone_res(dev, flags);
 		return BLK_STS_IOERR;
 	}
 
 	zone->cond = BLK_ZONE_COND_EMPTY;
 	zone->wp = zone->start;
 
+	null_unlock_zone_res(dev, flags);
+
 	return BLK_STS_OK;
 }
 
@@ -588,29 +643,28 @@ static blk_status_t null_zone_mgmt(struct nullb_cmd *cmd, enum req_opf op,
 blk_status_t null_process_zoned_cmd(struct nullb_cmd *cmd, enum req_opf op,
 				    sector_t sector, sector_t nr_sectors)
 {
-	struct nullb_device *dev = cmd->nq->dev;
-	unsigned int zno = null_zone_no(dev, sector);
+	struct nullb_device *dev;
+	unsigned int zno;
 	blk_status_t sts;
 
 	switch (op) {
 	case REQ_OP_WRITE:
-		sts = null_zone_write(cmd, sector, nr_sectors, false);
-		break;
+		return null_zone_write(cmd, sector, nr_sectors, false);
 	case REQ_OP_ZONE_APPEND:
-		sts = null_zone_write(cmd, sector, nr_sectors, true);
-		break;
+		return null_zone_write(cmd, sector, nr_sectors, true);
 	case REQ_OP_ZONE_RESET:
 	case REQ_OP_ZONE_RESET_ALL:
 	case REQ_OP_ZONE_OPEN:
 	case REQ_OP_ZONE_CLOSE:
 	case REQ_OP_ZONE_FINISH:
-		sts = null_zone_mgmt(cmd, op, sector);
-		break;
+		return null_zone_mgmt(cmd, op, sector);
 	default:
+		dev = cmd->nq->dev;
+		zno = null_zone_no(dev, sector);
+
 		null_lock_zone(dev, zno);
 		sts = null_process_cmd(cmd, op, sector, nr_sectors);
 		null_unlock_zone(dev, zno);
+		return sts;
 	}
-
-	return sts;
 }
-- 
2.26.2



* [PATCH v2 5/9] null_blk: Improve implicit zone close
  2020-11-11  5:16 [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
                   ` (3 preceding siblings ...)
  2020-11-11  5:16 ` [PATCH v2 4/9] null_blk: improve zone locking Damien Le Moal
@ 2020-11-11  5:16 ` Damien Le Moal
  2020-11-11  5:16 ` [PATCH v2 6/9] null_blk: cleanup discard handling Damien Le Moal
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  5:16 UTC (permalink / raw)
  To: linux-block, Jens Axboe

When open zone resource management is enabled, that is, when a null_blk
zoned device is created with zone_max_open different than 0, implicitly
or explicitly opening a zone may require implicitly closing a zone
that is already implicitly open. This operation is done using the
function null_close_first_imp_zone(), which searches for an implicitly
open zone to close starting from the first sequential zone. This
implementation is simple but may result in the same zone, namely the
lowest numbered zone that is being written, constantly being implicitly
closed and then implicitly reopened on write.

Avoid this by starting the search for an implicitly open zone from the
zone following the last zone that was implicitly closed. The function
null_close_first_imp_zone() is renamed
null_close_imp_open_zone().
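The round-robin search can be sketched as follows (standalone userspace
C; the structure and function names are simplified stand-ins for the
driver's, not the driver code itself):

```c
#include <assert.h>

enum cond { EMPTY, IMP_OPEN, CLOSED };

struct dev {
	enum cond zones[8];
	unsigned int nr_zones;
	unsigned int zone_nr_conv;	/* conventional zones are skipped */
	unsigned int imp_close_zone_no;	/* resume point for the search */
};

/*
 * Start scanning from the zone after the last implicitly closed one
 * instead of always restarting from the first sequential zone, so the
 * same low-numbered zone is not repeatedly closed and reopened.
 * Returns the number of the zone closed, or -1 if none was open.
 */
static int close_imp_open_zone(struct dev *d)
{
	unsigned int zno = d->imp_close_zone_no;
	unsigned int i;

	if (zno >= d->nr_zones)
		zno = d->zone_nr_conv;

	for (i = d->zone_nr_conv; i < d->nr_zones; i++) {
		unsigned int cur = zno++;

		if (zno >= d->nr_zones)
			zno = d->zone_nr_conv;	/* wrap past the end */
		if (d->zones[cur] == IMP_OPEN) {
			d->zones[cur] = CLOSED;
			d->imp_close_zone_no = zno;
			return (int)cur;
		}
	}
	return -1;
}
```

Successive calls walk forward through the sequential zones and wrap
around, so implicit closes are spread across all open zones rather than
always hitting the lowest numbered one.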

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/null_blk.h       |  1 +
 drivers/block/null_blk_zoned.c | 22 +++++++++++++++++-----
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk.h
index 4c101c39c3d1..683b573b7e14 100644
--- a/drivers/block/null_blk.h
+++ b/drivers/block/null_blk.h
@@ -45,6 +45,7 @@ struct nullb_device {
 	unsigned int nr_zones_imp_open;
 	unsigned int nr_zones_exp_open;
 	unsigned int nr_zones_closed;
+	unsigned int imp_close_zone_no;
 	struct blk_zone *zones;
 	sector_t zone_size_sects;
 	bool need_zone_res_mgmt;
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index 2630aeda757d..905cab12ee3c 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -89,6 +89,7 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
 		pr_info("zone_max_open limit disabled, limit >= zone count\n");
 	}
 	dev->need_zone_res_mgmt = dev->zone_max_active || dev->zone_max_open;
+	dev->imp_close_zone_no = dev->zone_nr_conv;
 
 	for (i = 0; i <  dev->zone_nr_conv; i++) {
 		struct blk_zone *zone = &dev->zones[i];
@@ -280,13 +281,24 @@ static blk_status_t __null_close_zone(struct nullb_device *dev,
 	return BLK_STS_OK;
 }
 
-static void null_close_first_imp_zone(struct nullb_device *dev)
+static void null_close_imp_open_zone(struct nullb_device *dev)
 {
-	unsigned int i;
+	struct blk_zone *zone;
+	unsigned int zno, i;
+
+	zno = dev->imp_close_zone_no;
+	if (zno >= dev->nr_zones)
+		zno = dev->zone_nr_conv;
 
 	for (i = dev->zone_nr_conv; i < dev->nr_zones; i++) {
-		if (dev->zones[i].cond == BLK_ZONE_COND_IMP_OPEN) {
-			__null_close_zone(dev, &dev->zones[i]);
+		zone = &dev->zones[zno];
+		zno++;
+		if (zno >= dev->nr_zones)
+			zno = dev->zone_nr_conv;
+
+		if (zone->cond == BLK_ZONE_COND_IMP_OPEN) {
+			__null_close_zone(dev, zone);
+			dev->imp_close_zone_no = zno;
 			return;
 		}
 	}
@@ -314,7 +326,7 @@ static blk_status_t null_check_open(struct nullb_device *dev)
 
 	if (dev->nr_zones_imp_open) {
 		if (null_check_active(dev) == BLK_STS_OK) {
-			null_close_first_imp_zone(dev);
+			null_close_imp_open_zone(dev);
 			return BLK_STS_OK;
 		}
 	}
-- 
2.26.2



* [PATCH v2 6/9] null_blk: cleanup discard handling
  2020-11-11  5:16 [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
                   ` (4 preceding siblings ...)
  2020-11-11  5:16 ` [PATCH v2 5/9] null_blk: Improve implicit zone close Damien Le Moal
@ 2020-11-11  5:16 ` Damien Le Moal
  2020-11-11  8:13   ` Christoph Hellwig
  2020-11-11  5:16 ` [PATCH v2 7/9] null_blk: discard zones on reset Damien Le Moal
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  5:16 UTC (permalink / raw)
  To: linux-block, Jens Axboe

null_handle_discard() is called from both null_handle_rq() and
null_handle_bio(). As these functions are only passed a nullb_cmd
structure, this forces pointer dereferences to identify the discard
operation code and to access the sector range to be discarded.
Simplify all this by changing the interface of the functions
null_handle_discard() and null_handle_memory_backed() to pass along
the operation code, operation start sector and number of sectors. With
this change null_handle_discard() can be called directly from
null_handle_memory_backed().

Also add a message warning that the discard configuration attribute has
no effect when memory backing is disabled.

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/null_blk_main.c | 43 ++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index b77a506a4ae4..06b909fd230b 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1076,13 +1076,16 @@ static void nullb_fill_pattern(struct nullb *nullb, struct page *page,
 	kunmap_atomic(dst);
 }
 
-static void null_handle_discard(struct nullb *nullb, sector_t sector, size_t n)
+static void null_handle_discard(struct nullb_device *dev, sector_t sector,
+				sector_t nr_sectors)
 {
+	struct nullb *nullb = dev->nullb;
+	size_t n = nr_sectors << SECTOR_SHIFT;
 	size_t temp;
 
 	spin_lock_irq(&nullb->lock);
 	while (n > 0) {
-		temp = min_t(size_t, n, nullb->dev->blocksize);
+		temp = min_t(size_t, n, dev->blocksize);
 		null_free_sector(nullb, sector, false);
 		if (null_cache_active(nullb))
 			null_free_sector(nullb, sector, true);
@@ -1149,17 +1152,10 @@ static int null_handle_rq(struct nullb_cmd *cmd)
 	struct nullb *nullb = cmd->nq->dev->nullb;
 	int err;
 	unsigned int len;
-	sector_t sector;
+	sector_t sector = blk_rq_pos(rq);
 	struct req_iterator iter;
 	struct bio_vec bvec;
 
-	sector = blk_rq_pos(rq);
-
-	if (req_op(rq) == REQ_OP_DISCARD) {
-		null_handle_discard(nullb, sector, blk_rq_bytes(rq));
-		return 0;
-	}
-
 	spin_lock_irq(&nullb->lock);
 	rq_for_each_segment(bvec, rq, iter) {
 		len = bvec.bv_len;
@@ -1183,18 +1179,10 @@ static int null_handle_bio(struct nullb_cmd *cmd)
 	struct nullb *nullb = cmd->nq->dev->nullb;
 	int err;
 	unsigned int len;
-	sector_t sector;
+	sector_t sector = bio->bi_iter.bi_sector;
 	struct bio_vec bvec;
 	struct bvec_iter iter;
 
-	sector = bio->bi_iter.bi_sector;
-
-	if (bio_op(bio) == REQ_OP_DISCARD) {
-		null_handle_discard(nullb, sector,
-			bio_sectors(bio) << SECTOR_SHIFT);
-		return 0;
-	}
-
 	spin_lock_irq(&nullb->lock);
 	bio_for_each_segment(bvec, bio, iter) {
 		len = bvec.bv_len;
@@ -1263,11 +1251,18 @@ static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd,
 }
 
 static inline blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd,
-						     enum req_opf op)
+						enum req_opf op, sector_t sector,
+						sector_t nr_sectors)
+
 {
 	struct nullb_device *dev = cmd->nq->dev;
 	int err;
 
+	if (op == REQ_OP_DISCARD) {
+		null_handle_discard(dev, sector, nr_sectors);
+		return 0;
+	}
+
 	if (dev->queue_mode == NULL_Q_BIO)
 		err = null_handle_bio(cmd);
 	else
@@ -1343,7 +1338,7 @@ blk_status_t null_process_cmd(struct nullb_cmd *cmd,
 	}
 
 	if (dev->memory_backed)
-		return null_handle_memory_backed(cmd, op);
+		return null_handle_memory_backed(cmd, op, sector, nr_sectors);
 
 	return BLK_STS_OK;
 }
@@ -1589,6 +1584,12 @@ static void null_config_discard(struct nullb *nullb)
 	if (nullb->dev->discard == false)
 		return;
 
+	if (!nullb->dev->memory_backed) {
+		nullb->dev->discard = false;
+		pr_info("discard option is ignored without memory backing\n");
+		return;
+	}
+
 	if (nullb->dev->zoned) {
 		nullb->dev->discard = false;
 		pr_info("discard option is ignored in zoned mode\n");
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 7/9] null_blk: discard zones on reset
  2020-11-11  5:16 [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
                   ` (5 preceding siblings ...)
  2020-11-11  5:16 ` [PATCH v2 6/9] null_blk: cleanup discard handling Damien Le Moal
@ 2020-11-11  5:16 ` Damien Le Moal
  2020-11-11  8:14   ` Christoph Hellwig
  2020-11-11  8:19   ` Johannes Thumshirn
  2020-11-11  5:16 ` [PATCH v2 8/9] null_blk: Allow controlling max_hw_sectors limit Damien Le Moal
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  5:16 UTC (permalink / raw)
  To: linux-block, Jens Axboe

When memory backing is enabled, use null_handle_discard() to free the
backing memory used by a zone when the zone is being reset.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/null_blk.h       | 2 ++
 drivers/block/null_blk_main.c  | 4 ++--
 drivers/block/null_blk_zoned.c | 3 +++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk.h
index 683b573b7e14..76bd190fa185 100644
--- a/drivers/block/null_blk.h
+++ b/drivers/block/null_blk.h
@@ -95,6 +95,8 @@ struct nullb {
 	char disk_name[DISK_NAME_LEN];
 };
 
+void null_handle_discard(struct nullb_device *dev, sector_t sector,
+			 sector_t nr_sectors);
 blk_status_t null_process_cmd(struct nullb_cmd *cmd,
 			      enum req_opf op, sector_t sector,
 			      unsigned int nr_sectors);
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 06b909fd230b..fa0bc65bbd1e 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1076,8 +1076,8 @@ static void nullb_fill_pattern(struct nullb *nullb, struct page *page,
 	kunmap_atomic(dst);
 }
 
-static void null_handle_discard(struct nullb_device *dev, sector_t sector,
-				sector_t nr_sectors)
+void null_handle_discard(struct nullb_device *dev, sector_t sector,
+			 sector_t nr_sectors)
 {
 	struct nullb *nullb = dev->nullb;
 	size_t n = nr_sectors << SECTOR_SHIFT;
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index 905cab12ee3c..87c9b6ebdccb 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -596,6 +596,9 @@ static blk_status_t null_reset_zone(struct nullb_device *dev, struct blk_zone *z
 
 	null_unlock_zone_res(dev, flags);
 
+	if (dev->memory_backed)
+		null_handle_discard(dev, zone->start, zone->len);
+
 	return BLK_STS_OK;
 }
 
-- 
2.26.2



* [PATCH v2 8/9] null_blk: Allow controlling max_hw_sectors limit
  2020-11-11  5:16 [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
                   ` (6 preceding siblings ...)
  2020-11-11  5:16 ` [PATCH v2 7/9] null_blk: discard zones on reset Damien Le Moal
@ 2020-11-11  5:16 ` Damien Le Moal
  2020-11-11  8:14   ` Christoph Hellwig
  2020-11-11  8:22   ` Johannes Thumshirn
  2020-11-11  5:16 ` [PATCH v2 9/9] null_blk: Move driver into its own directory Damien Le Moal
  2020-11-11  6:04 ` [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
  9 siblings, 2 replies; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  5:16 UTC (permalink / raw)
  To: linux-block, Jens Axboe

Add the module option and configfs attribute max_sectors to allow
configuring the maximum size of a command issued to a null_blk device.
This allows exercising the block layer BIO splitting with different
limits than the default BLK_SAFE_MAX_SECTORS. This is also useful for
testing the zone append write path of file systems as the max_hw_sectors
limit value is also used for the max_zone_append_sectors limit.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/null_blk.h      |  1 +
 drivers/block/null_blk_main.c | 25 +++++++++++++++++++++----
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk.h
index 76bd190fa185..6e5197987093 100644
--- a/drivers/block/null_blk.h
+++ b/drivers/block/null_blk.h
@@ -64,6 +64,7 @@ struct nullb_device {
 	unsigned int home_node; /* home node for the device */
 	unsigned int queue_mode; /* block interface */
 	unsigned int blocksize; /* block size */
+	unsigned int max_sectors; /* Max sectors per command */
 	unsigned int irqmode; /* IRQ completion handler */
 	unsigned int hw_queue_depth; /* queue depth */
 	unsigned int index; /* index of the disk, only valid with a disk */
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index fa0bc65bbd1e..5f7fbcd56489 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -152,6 +152,10 @@ static int g_bs = 512;
 module_param_named(bs, g_bs, int, 0444);
 MODULE_PARM_DESC(bs, "Block size (in bytes)");
 
+static int g_max_sectors;
+module_param_named(max_sectors, g_max_sectors, int, 0444);
+MODULE_PARM_DESC(max_sectors, "Maximum size of a command (in 512B sectors)");
+
 static unsigned int nr_devices = 1;
 module_param(nr_devices, uint, 0444);
 MODULE_PARM_DESC(nr_devices, "Number of devices to register");
@@ -346,6 +350,7 @@ NULLB_DEVICE_ATTR(submit_queues, uint, nullb_apply_submit_queues);
 NULLB_DEVICE_ATTR(home_node, uint, NULL);
 NULLB_DEVICE_ATTR(queue_mode, uint, NULL);
 NULLB_DEVICE_ATTR(blocksize, uint, NULL);
+NULLB_DEVICE_ATTR(max_sectors, uint, NULL);
 NULLB_DEVICE_ATTR(irqmode, uint, NULL);
 NULLB_DEVICE_ATTR(hw_queue_depth, uint, NULL);
 NULLB_DEVICE_ATTR(index, uint, NULL);
@@ -463,6 +468,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
 	&nullb_device_attr_home_node,
 	&nullb_device_attr_queue_mode,
 	&nullb_device_attr_blocksize,
+	&nullb_device_attr_max_sectors,
 	&nullb_device_attr_irqmode,
 	&nullb_device_attr_hw_queue_depth,
 	&nullb_device_attr_index,
@@ -533,7 +539,7 @@ nullb_group_drop_item(struct config_group *group, struct config_item *item)
 static ssize_t memb_group_features_show(struct config_item *item, char *page)
 {
 	return snprintf(page, PAGE_SIZE,
-			"memory_backed,discard,bandwidth,cache,badblocks,zoned,zone_size,zone_capacity,zone_nr_conv,zone_max_open,zone_max_active\n");
+			"memory_backed,discard,bandwidth,cache,badblocks,zoned,zone_size,zone_capacity,zone_nr_conv,zone_max_open,zone_max_active,blocksize,max_sectors\n");
 }
 
 CONFIGFS_ATTR_RO(memb_group_, features);
@@ -588,6 +594,7 @@ static struct nullb_device *null_alloc_dev(void)
 	dev->home_node = g_home_node;
 	dev->queue_mode = g_queue_mode;
 	dev->blocksize = g_bs;
+	dev->max_sectors = g_max_sectors;
 	dev->irqmode = g_irqmode;
 	dev->hw_queue_depth = g_hw_queue_depth;
 	dev->blocking = g_blocking;
@@ -1867,9 +1874,13 @@ static int null_add_dev(struct nullb_device *dev)
 
 	blk_queue_logical_block_size(nullb->q, dev->blocksize);
 	blk_queue_physical_block_size(nullb->q, dev->blocksize);
-	blk_queue_max_hw_sectors(nullb->q,
-				 round_down(queue_max_hw_sectors(nullb->q),
-					    dev->blocksize >> SECTOR_SHIFT));
+	if (!dev->max_sectors)
+		dev->max_sectors = queue_max_hw_sectors(nullb->q);
+	dev->max_sectors = min_t(unsigned int, dev->max_sectors,
+				 BLK_DEF_MAX_SECTORS);
+	dev->max_sectors = round_down(dev->max_sectors,
+				      dev->blocksize >> SECTOR_SHIFT);
+	blk_queue_max_hw_sectors(nullb->q, dev->max_sectors);
 
 	null_config_discard(nullb);
 
@@ -1913,6 +1924,12 @@ static int __init null_init(void)
 		g_bs = PAGE_SIZE;
 	}
 
+	if (g_max_sectors > BLK_DEF_MAX_SECTORS) {
+		pr_warn("invalid max sectors\n");
+		pr_warn("defaults max sectors to %u\n", BLK_DEF_MAX_SECTORS);
+		g_max_sectors = BLK_DEF_MAX_SECTORS;
+	}
+
 	if (g_home_node != NUMA_NO_NODE && g_home_node >= nr_online_nodes) {
 		pr_err("invalid home_node value\n");
 		g_home_node = NUMA_NO_NODE;
-- 
2.26.2



* [PATCH v2 9/9] null_blk: Move driver into its own directory
  2020-11-11  5:16 [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
                   ` (7 preceding siblings ...)
  2020-11-11  5:16 ` [PATCH v2 8/9] null_blk: Allow controlling max_hw_sectors limit Damien Le Moal
@ 2020-11-11  5:16 ` Damien Le Moal
  2020-11-11  7:52   ` Johannes Thumshirn
  2020-11-11  6:04 ` [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
  9 siblings, 1 reply; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  5:16 UTC (permalink / raw)
  To: linux-block, Jens Axboe

Move null_blk driver code into the new sub-directory
drivers/block/null_blk.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/Kconfig                                |  8 +-------
 drivers/block/Makefile                               |  7 +------
 drivers/block/null_blk/Kconfig                       | 12 ++++++++++++
 drivers/block/null_blk/Makefile                      | 11 +++++++++++
 drivers/block/{null_blk_main.c => null_blk/main.c}   |  0
 drivers/block/{ => null_blk}/null_blk.h              |  0
 drivers/block/{null_blk_trace.c => null_blk/trace.c} |  2 +-
 drivers/block/{null_blk_trace.h => null_blk/trace.h} |  2 +-
 drivers/block/{null_blk_zoned.c => null_blk/zoned.c} |  2 +-
 9 files changed, 28 insertions(+), 16 deletions(-)
 create mode 100644 drivers/block/null_blk/Kconfig
 create mode 100644 drivers/block/null_blk/Makefile
 rename drivers/block/{null_blk_main.c => null_blk/main.c} (100%)
 rename drivers/block/{ => null_blk}/null_blk.h (100%)
 rename drivers/block/{null_blk_trace.c => null_blk/trace.c} (93%)
 rename drivers/block/{null_blk_trace.h => null_blk/trace.h} (97%)
 rename drivers/block/{null_blk_zoned.c => null_blk/zoned.c} (99%)

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index ecceaaa1a66f..262326973ee0 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -16,13 +16,7 @@ menuconfig BLK_DEV
 
 if BLK_DEV
 
-config BLK_DEV_NULL_BLK
-	tristate "Null test block driver"
-	select CONFIGFS_FS
-
-config BLK_DEV_NULL_BLK_FAULT_INJECTION
-	bool "Support fault injection for Null test block driver"
-	depends on BLK_DEV_NULL_BLK && FAULT_INJECTION
+source "drivers/block/null_blk/Kconfig"
 
 config BLK_DEV_FD
 	tristate "Normal floppy disk support"
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index e1f63117ee94..a3170859e01d 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -41,12 +41,7 @@ obj-$(CONFIG_BLK_DEV_RSXX) += rsxx/
 obj-$(CONFIG_ZRAM) += zram/
 obj-$(CONFIG_BLK_DEV_RNBD)	+= rnbd/
 
-obj-$(CONFIG_BLK_DEV_NULL_BLK)	+= null_blk.o
-null_blk-objs	:= null_blk_main.o
-ifeq ($(CONFIG_BLK_DEV_ZONED), y)
-null_blk-$(CONFIG_TRACING) += null_blk_trace.o
-endif
-null_blk-$(CONFIG_BLK_DEV_ZONED) += null_blk_zoned.o
+obj-$(CONFIG_BLK_DEV_NULL_BLK)	+= null_blk/
 
 skd-y		:= skd_main.o
 swim_mod-y	:= swim.o swim_asm.o
diff --git a/drivers/block/null_blk/Kconfig b/drivers/block/null_blk/Kconfig
new file mode 100644
index 000000000000..6bf1f8ca20a2
--- /dev/null
+++ b/drivers/block/null_blk/Kconfig
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Null block device driver configuration
+#
+
+config BLK_DEV_NULL_BLK
+	tristate "Null test block driver"
+	select CONFIGFS_FS
+
+config BLK_DEV_NULL_BLK_FAULT_INJECTION
+	bool "Support fault injection for Null test block driver"
+	depends on BLK_DEV_NULL_BLK && FAULT_INJECTION
diff --git a/drivers/block/null_blk/Makefile b/drivers/block/null_blk/Makefile
new file mode 100644
index 000000000000..84c36e512ab8
--- /dev/null
+++ b/drivers/block/null_blk/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# needed for trace events
+ccflags-y			+= -I$(src)
+
+obj-$(CONFIG_BLK_DEV_NULL_BLK)	+= null_blk.o
+null_blk-objs			:= main.o
+ifeq ($(CONFIG_BLK_DEV_ZONED), y)
+null_blk-$(CONFIG_TRACING) 	+= trace.o
+endif
+null_blk-$(CONFIG_BLK_DEV_ZONED) += zoned.o
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk/main.c
similarity index 100%
rename from drivers/block/null_blk_main.c
rename to drivers/block/null_blk/main.c
diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk/null_blk.h
similarity index 100%
rename from drivers/block/null_blk.h
rename to drivers/block/null_blk/null_blk.h
diff --git a/drivers/block/null_blk_trace.c b/drivers/block/null_blk/trace.c
similarity index 93%
rename from drivers/block/null_blk_trace.c
rename to drivers/block/null_blk/trace.c
index f246e7bff698..3711cba16071 100644
--- a/drivers/block/null_blk_trace.c
+++ b/drivers/block/null_blk/trace.c
@@ -4,7 +4,7 @@
  *
  * Copyright (C) 2020 Western Digital Corporation or its affiliates.
  */
-#include "null_blk_trace.h"
+#include "trace.h"
 
 /*
  * Helper to use for all null_blk traces to extract disk name.
diff --git a/drivers/block/null_blk_trace.h b/drivers/block/null_blk/trace.h
similarity index 97%
rename from drivers/block/null_blk_trace.h
rename to drivers/block/null_blk/trace.h
index 4f83032eb544..ce3b430e88c5 100644
--- a/drivers/block/null_blk_trace.h
+++ b/drivers/block/null_blk/trace.h
@@ -73,7 +73,7 @@ TRACE_EVENT(nullb_report_zones,
 #undef TRACE_INCLUDE_PATH
 #define TRACE_INCLUDE_PATH .
 #undef TRACE_INCLUDE_FILE
-#define TRACE_INCLUDE_FILE null_blk_trace
+#define TRACE_INCLUDE_FILE trace
 
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk/zoned.c
similarity index 99%
rename from drivers/block/null_blk_zoned.c
rename to drivers/block/null_blk/zoned.c
index 87c9b6ebdccb..2e25a0a1c40d 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk/zoned.c
@@ -4,7 +4,7 @@
 #include "null_blk.h"
 
 #define CREATE_TRACE_POINTS
-#include "null_blk_trace.h"
+#include "trace.h"
 
 #define MB_TO_SECTS(mb) (((sector_t)mb * SZ_1M) >> SECTOR_SHIFT)
 
-- 
2.26.2



* Re: [PATCH v2 0/9] null_blk fixes, improvements and cleanup
  2020-11-11  5:16 [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
                   ` (8 preceding siblings ...)
  2020-11-11  5:16 ` [PATCH v2 9/9] null_blk: Move driver into its own directory Damien Le Moal
@ 2020-11-11  6:04 ` Damien Le Moal
  9 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  6:04 UTC (permalink / raw)
  To: linux-block, Jens Axboe

On 2020/11/11 14:16, Damien Le Moal wrote:
> Jens,
> 
> This series provides fixes and improvements for null_blk.
> 
> The first two patches are bug fixes which likely should go into 5.10.
> The first patch fixes a problem with zone initialization when the
> device capacity is not a multiple of the zone size and the second
> patch fixes zone append handling.
> 
> The following patches are improvements and cleanups:
> * Patch 3 makes sure that the device max_sectors limit is aligned to the
>   block size.
> * Patch 4 improves zone locking overall, and especially the memory
>   backing disabled case by introducing a spinlock array to implement a
>   per zone lock in place of a global lock. With this patch, write
>   performance remains mostly unchanged, but read performance with a
>   multi-queue setup more than double from 1.3 MIOPS to 3.3 MIOPS (4K
>   random reads to zones with fio zonemode=zbd).
> * Patch 5 improves implicit zone close
> * Patch 6 and 7 cleanup discard handling code and use that code to free
>   the memory backing a zone that is being reset.
> * Patch 8 adds the max_sectors configuration option to allow changing
>   the max_sectors/max_hw_sectors of the device.
> * Finally, patch 9 moves nullblk into its own directory under
>   drivers/block/null_blk/
> 
> Comments are as always welcome.

I forgot to add the changelog. Here it is:

Changes from V1:
* Added patch 2, 3 and 8.
* Fix the last patch as suggested by Bart (file names under
  driver/block/null_blk/)
* Reworded patch 1 commit message to more correctly describe the problem.

Note that I also just sent a patch to improve blk_revalidate_disk_zones() to
catch problems such as what patch 1 here fixes. C.f. "[PATCH] block: Improve
blk_revalidate_disk_zones() checks".

> 
> Damien Le Moal (9):
>   null_blk: Fix zone size initialization
>   null_blk: Fail zone append to conventional zones
>   null_blk: Align max_hw_sectors to blocksize
>   null_blk: improve zone locking
>   null_blk: Improve implicit zone close
>   null_blk: cleanup discard handling
>   null_blk: discard zones on reset
>   null_blk: Allow controlling max_hw_sectors limit
>   null_blk: Move driver into its own directory
> 
>  drivers/block/Kconfig                         |   8 +-
>  drivers/block/Makefile                        |   7 +-
>  drivers/block/null_blk/Kconfig                |  12 +
>  drivers/block/null_blk/Makefile               |  11 +
>  .../{null_blk_main.c => null_blk/main.c}      |  65 +++--
>  drivers/block/{ => null_blk}/null_blk.h       |   9 +-
>  .../{null_blk_trace.c => null_blk/trace.c}    |   2 +-
>  .../{null_blk_trace.h => null_blk/trace.h}    |   2 +-
>  .../{null_blk_zoned.c => null_blk/zoned.c}    | 245 ++++++++++++------
>  9 files changed, 239 insertions(+), 122 deletions(-)
>  create mode 100644 drivers/block/null_blk/Kconfig
>  create mode 100644 drivers/block/null_blk/Makefile
>  rename drivers/block/{null_blk_main.c => null_blk/main.c} (97%)
>  rename drivers/block/{ => null_blk}/null_blk.h (94%)
>  rename drivers/block/{null_blk_trace.c => null_blk/trace.c} (93%)
>  rename drivers/block/{null_blk_trace.h => null_blk/trace.h} (97%)
>  rename drivers/block/{null_blk_zoned.c => null_blk/zoned.c} (76%)
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v2 9/9] null_blk: Move driver into its own directory
  2020-11-11  5:16 ` [PATCH v2 9/9] null_blk: Move driver into its own directory Damien Le Moal
@ 2020-11-11  7:52   ` Johannes Thumshirn
  0 siblings, 0 replies; 27+ messages in thread
From: Johannes Thumshirn @ 2020-11-11  7:52 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH v2 1/9] null_blk: Fix zone size initialization
  2020-11-11  5:16 ` [PATCH v2 1/9] null_blk: Fix zone size initialization Damien Le Moal
@ 2020-11-11  8:05   ` Christoph Hellwig
  2020-11-11  8:06   ` Johannes Thumshirn
  1 sibling, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2020-11-11  8:05 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-block, Jens Axboe

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v2 2/9] null_blk: Fail zone append to conventional zones
  2020-11-11  5:16 ` [PATCH v2 2/9] null_blk: Fail zone append to conventional zones Damien Le Moal
@ 2020-11-11  8:06   ` Christoph Hellwig
  2020-11-11  8:07   ` Johannes Thumshirn
  1 sibling, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2020-11-11  8:06 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-block, Jens Axboe

On Wed, Nov 11, 2020 at 02:16:41PM +0900, Damien Le Moal wrote:
> Conventional zones do not have a write pointer and so cannot accept zone
> append writes. Make sure to fail any zone append write command issued to
> a conventional zone.
> 
> Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
> Fixes: e0489ed5daeb ("null_blk: Support REQ_OP_ZONE_APPEND")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v2 1/9] null_blk: Fix zone size initialization
  2020-11-11  5:16 ` [PATCH v2 1/9] null_blk: Fix zone size initialization Damien Le Moal
  2020-11-11  8:05   ` Christoph Hellwig
@ 2020-11-11  8:06   ` Johannes Thumshirn
  1 sibling, 0 replies; 27+ messages in thread
From: Johannes Thumshirn @ 2020-11-11  8:06 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH v2 3/9] null_blk: Align max_hw_sectors to blocksize
  2020-11-11  5:16 ` [PATCH v2 3/9] null_blk: Align max_hw_sectors to blocksize Damien Le Moal
@ 2020-11-11  8:06   ` Christoph Hellwig
  2020-11-11  8:19     ` Damien Le Moal
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2020-11-11  8:06 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-block, Jens Axboe

On Wed, Nov 11, 2020 at 02:16:42PM +0900, Damien Le Moal wrote:
> null_blk always uses the default BLK_SAFE_MAX_SECTORS value for the
> max_hw_sectors and max_sectors queue limits resulting in a maximum
> request size of 127 sectors. When the blocksize setting is larger than
> the default 512B, this maximum request size is not aligned to the
> block size. To emulate a real device more accurately, fix this by
> setting the max_hw_sectors and max_sectors queue limits to a value
> that is aligned to the block size.

Shouldn't we fix this in the block layer instead of individual drivers?


* Re: [PATCH v2 2/9] null_blk: Fail zone append to conventional zones
  2020-11-11  5:16 ` [PATCH v2 2/9] null_blk: Fail zone append to conventional zones Damien Le Moal
  2020-11-11  8:06   ` Christoph Hellwig
@ 2020-11-11  8:07   ` Johannes Thumshirn
  1 sibling, 0 replies; 27+ messages in thread
From: Johannes Thumshirn @ 2020-11-11  8:07 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH v2 4/9] null_blk: improve zone locking
  2020-11-11  5:16 ` [PATCH v2 4/9] null_blk: improve zone locking Damien Le Moal
@ 2020-11-11  8:11   ` Christoph Hellwig
  2020-11-11  8:20     ` Damien Le Moal
  2020-11-11  8:18   ` Johannes Thumshirn
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2020-11-11  8:11 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-block, Jens Axboe

> +	spin_lock_init(&dev->zone_res_lock);
> +	if (dev->memory_backed)
>  		dev->zone_locks = bitmap_zalloc(dev->nr_zones, GFP_KERNEL);
> +	else
> +		dev->zone_locks = kmalloc_array(dev->nr_zones,
> +						sizeof(spinlock_t), GFP_KERNEL);
> +	if (!dev->zone_locks) {

Using the same pointer for different types is pretty horrible.  At the
very least we should use a union.

But I think we'd end up with less contention if we add a new

struct nullb_zone

that contains the fields of blk_zone that we actually care about plus:

	union {
		spinlock_t	spinlock;
		mutex_t		mutex;
	}

and then use the proper lock primitive also for the memory backed case.


* Re: [PATCH v2 6/9] null_blk: cleanup discard handling
  2020-11-11  5:16 ` [PATCH v2 6/9] null_blk: cleanup discard handling Damien Le Moal
@ 2020-11-11  8:13   ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2020-11-11  8:13 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-block, Jens Axboe

On Wed, Nov 11, 2020 at 02:16:45PM +0900, Damien Le Moal wrote:
>  static inline blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd,
> -						     enum req_opf op)
> +						enum req_opf op, sector_t sector,

This adds an overly long line.

> +	if (op == REQ_OP_DISCARD) {
> +		null_handle_discard(dev, sector, nr_sectors);
> +		return 0;

Maybe add a return value to null_handle_discard to simplify this a bit?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v2 7/9] null_blk: discard zones on reset
  2020-11-11  5:16 ` [PATCH v2 7/9] null_blk: discard zones on reset Damien Le Moal
@ 2020-11-11  8:14   ` Christoph Hellwig
  2020-11-11  8:19   ` Johannes Thumshirn
  1 sibling, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2020-11-11  8:14 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-block, Jens Axboe

On Wed, Nov 11, 2020 at 02:16:46PM +0900, Damien Le Moal wrote:
> When memory backing is enabled, use null_handle_discard() to free the
> backing memory used by a zone when the zone is being reset.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v2 8/9] null_blk: Allow controlling max_hw_sectors limit
  2020-11-11  5:16 ` [PATCH v2 8/9] null_blk: Allow controlling max_hw_sectors limit Damien Le Moal
@ 2020-11-11  8:14   ` Christoph Hellwig
  2020-11-11  8:22   ` Johannes Thumshirn
  1 sibling, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2020-11-11  8:14 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-block, Jens Axboe

On Wed, Nov 11, 2020 at 02:16:47PM +0900, Damien Le Moal wrote:
> Add the module option and configfs attribute max_sectors to allow
> configuring the maximum size of a command issued to a null_blk device.
> This allows exercising the block layer BIO splitting with different
> limits than the default BLK_SAFE_MAX_SECTORS. This is also useful for
> testing the zone append write path of file systems as the max_hw_sectors
> limit value is also used for the max_zone_append_sectors limit.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v2 4/9] null_blk: improve zone locking
  2020-11-11  5:16 ` [PATCH v2 4/9] null_blk: improve zone locking Damien Le Moal
  2020-11-11  8:11   ` Christoph Hellwig
@ 2020-11-11  8:18   ` Johannes Thumshirn
  1 sibling, 0 replies; 27+ messages in thread
From: Johannes Thumshirn @ 2020-11-11  8:18 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe

On 11/11/2020 06:17, Damien Le Moal wrote:
> With memory backing disabled, using a single spinlock for protecting
> zone information and zone resource management prevent the parallel
> execution on multiple queue of IO requests to different zones.
> Furthermore, regardless of the use of memory backing, if a null_blk
> device is created without limits on the number of opn and active zone
                                              open ~^        zones ~^

> accounting for zone resource management is not necessary.
> 
> From these observations, zone locking is changed as follow to improve
                                             follows? ~^
> performance:
> 1) the zone_lock spinlock is renamed zone_res_lock and used only if zone
>    resource management is necessary, that is, if either zone_max_open or
>    zone_max_active are not 0. This is indicated using the new boolean
>    need_zone_res_mgmt in the nullb_device structure. null_zone_write()
>    is modified to reduce the amount of code executed with the
>    zone_res_lock spinlock held. null_zone_valid_read_len() is also
>    modified to avoid taking the zone lock before calling
>    null_process_cmd() for read operations in null_process_zoned_cmd().
> 2) With memory backing disabled, per zone locking is changed to a
>    spinlock per zone.
> 
> With these changes, fio performance with zonemode=zbd for 4K random
> read and random write on a dual socket (24 cores per socket) machine
> using the none schedulder is as follows:
                  scheduler ~^

> 
> before patch:
> 	write (psync x 96 jobs) = 465 KIOPS
> 	read (libaio@qd=8 x 96 jobs) = 1361 KIOPS
> after patch:
> 	write (psync x 96 jobs) = 468 KIOPS
> 	read (libaio@qd=8 x 96 jobs) = 3340 KIOPS
> 
> Write performance remains mostly unchanged but read performance more
> than double. Performance when using the mq-deadline scheduler is not
doubles ~^


* Re: [PATCH v2 3/9] null_blk: Align max_hw_sectors to blocksize
  2020-11-11  8:06   ` Christoph Hellwig
@ 2020-11-11  8:19     ` Damien Le Moal
  0 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  8:19 UTC (permalink / raw)
  To: hch; +Cc: linux-block, Jens Axboe

On 2020/11/11 17:07, Christoph Hellwig wrote:
> On Wed, Nov 11, 2020 at 02:16:42PM +0900, Damien Le Moal wrote:
>> null_blk always uses the default BLK_SAFE_MAX_SECTORS value for the
>> max_hw_sectors and max_sectors queue limits resulting in a maximum
>> request size of 127 sectors. When the blocksize setting is larger than
>> the default 512B, this maximum request size is not aligned to the
>> block size. To emulate a real device more accurately, fix this by
>> setting the max_hw_sectors and max_sectors queue limits to a value
>> that is aligned to the block size.
> 
> Shouldn't we fix this in the block layer instead of individual drivers?
> 

Yes, I wondered about that too. Probably a good idea to make that generic, but
that will require changing both blk_queue_logical_block_size() and
blk_queue_max_hw_sectors() since drivers may call these in different orders.

Jens, thoughts ?

-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v2 7/9] null_blk: discard zones on reset
  2020-11-11  5:16 ` [PATCH v2 7/9] null_blk: discard zones on reset Damien Le Moal
  2020-11-11  8:14   ` Christoph Hellwig
@ 2020-11-11  8:19   ` Johannes Thumshirn
  1 sibling, 0 replies; 27+ messages in thread
From: Johannes Thumshirn @ 2020-11-11  8:19 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe

Sounds reasonable,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH v2 4/9] null_blk: improve zone locking
  2020-11-11  8:11   ` Christoph Hellwig
@ 2020-11-11  8:20     ` Damien Le Moal
  0 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  8:20 UTC (permalink / raw)
  To: hch; +Cc: linux-block, Jens Axboe

On 2020/11/11 17:11, Christoph Hellwig wrote:
>> +	spin_lock_init(&dev->zone_res_lock);
>> +	if (dev->memory_backed)
>>  		dev->zone_locks = bitmap_zalloc(dev->nr_zones, GFP_KERNEL);
>> +	else
>> +		dev->zone_locks = kmalloc_array(dev->nr_zones,
>> +						sizeof(spinlock_t), GFP_KERNEL);
>> +	if (!dev->zone_locks) {
> 
> Using the same pointer for different types is pretty horrible.  At very
> least we should use a union.
> 
> But I think we'd end up with less contention if we add a new
> 
> struct nullb_zone
> 
> that contains the fields of blk_zone that we actually care about plus:
> 
> 	union {
> 		spinlock_t	spinlock;
> 		struct mutex	mutex;
> 	}
> 
> and then use the proper lock primitive also for the memory backed case.

OK. Cooking V3 with that.
-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH v2 8/9] null_blk: Allow controlling max_hw_sectors limit
  2020-11-11  5:16 ` [PATCH v2 8/9] null_blk: Allow controlling max_hw_sectors limit Damien Le Moal
  2020-11-11  8:14   ` Christoph Hellwig
@ 2020-11-11  8:22   ` Johannes Thumshirn
  2020-11-11  8:23     ` Damien Le Moal
  1 sibling, 1 reply; 27+ messages in thread
From: Johannes Thumshirn @ 2020-11-11  8:22 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe

On 11/11/2020 06:17, Damien Le Moal wrote:
> Add the module option and configfs attribute max_sectors to allow
> configuring the maximum size of a command issued to a null_blk device.
> This allows exercising the block layer BIO splitting with different
> limits than the default BLK_SAFE_MAX_SECTORS. This is also useful for
> testing the zone append write path of file systems as the max_hw_sectors
> limit value is also used for the max_zone_append_sectors limit.

Oh this sounds super useful for zonefs-tests and xfstests
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

* Re: [PATCH v2 8/9] null_blk: Allow controlling max_hw_sectors limit
  2020-11-11  8:22   ` Johannes Thumshirn
@ 2020-11-11  8:23     ` Damien Le Moal
  0 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2020-11-11  8:23 UTC (permalink / raw)
  To: Johannes Thumshirn, linux-block, Jens Axboe

On 2020/11/11 17:22, Johannes Thumshirn wrote:
> On 11/11/2020 06:17, Damien Le Moal wrote:
>> Add the module option and configfs attribute max_sectors to allow
>> configuring the maximum size of a command issued to a null_blk device.
>> This allows exercising the block layer BIO splitting with different
>> limits than the default BLK_SAFE_MAX_SECTORS. This is also useful for
>> testing the zone append write path of file systems as the max_hw_sectors
>> limit value is also used for the max_zone_append_sectors limit.
> 
> Oh this sounds super useful for zonefs-tests and xfstests

And btrfs-zoned. That's why I added it :)
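
For example, something along these lines should work once the patch is
applied (all attributes other than max_sectors are existing null_blk
configfs attributes; device name and values are only an illustration):

```shell
# Create a zoned null_blk device with a 4KB block size and a reduced
# max_sectors limit via configfs, then bring it online.
modprobe null_blk nr_devices=0
mkdir /sys/kernel/config/nullb/nullb0
echo 4096 > /sys/kernel/config/nullb/nullb0/blocksize
echo 1 > /sys/kernel/config/nullb/nullb0/zoned
echo 64 > /sys/kernel/config/nullb/nullb0/max_sectors
echo 1 > /sys/kernel/config/nullb/nullb0/power
```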

> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

-- 
Damien Le Moal
Western Digital Research

Thread overview: 27+ messages
2020-11-11  5:16 [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
2020-11-11  5:16 ` [PATCH v2 1/9] null_blk: Fix zone size initialization Damien Le Moal
2020-11-11  8:05   ` Christoph Hellwig
2020-11-11  8:06   ` Johannes Thumshirn
2020-11-11  5:16 ` [PATCH v2 2/9] null_blk: Fail zone append to conventional zones Damien Le Moal
2020-11-11  8:06   ` Christoph Hellwig
2020-11-11  8:07   ` Johannes Thumshirn
2020-11-11  5:16 ` [PATCH v2 3/9] null_blk: Align max_hw_sectors to blocksize Damien Le Moal
2020-11-11  8:06   ` Christoph Hellwig
2020-11-11  8:19     ` Damien Le Moal
2020-11-11  5:16 ` [PATCH v2 4/9] null_blk: improve zone locking Damien Le Moal
2020-11-11  8:11   ` Christoph Hellwig
2020-11-11  8:20     ` Damien Le Moal
2020-11-11  8:18   ` Johannes Thumshirn
2020-11-11  5:16 ` [PATCH v2 5/9] null_blk: Improve implicit zone close Damien Le Moal
2020-11-11  5:16 ` [PATCH v2 6/9] null_blk: cleanup discard handling Damien Le Moal
2020-11-11  8:13   ` Christoph Hellwig
2020-11-11  5:16 ` [PATCH v2 7/9] null_blk: discard zones on reset Damien Le Moal
2020-11-11  8:14   ` Christoph Hellwig
2020-11-11  8:19   ` Johannes Thumshirn
2020-11-11  5:16 ` [PATCH v2 8/9] null_blk: Allow controlling max_hw_sectors limit Damien Le Moal
2020-11-11  8:14   ` Christoph Hellwig
2020-11-11  8:22   ` Johannes Thumshirn
2020-11-11  8:23     ` Damien Le Moal
2020-11-11  5:16 ` [PATCH v2 9/9] null_blk: Move driver into its own directory Damien Le Moal
2020-11-11  7:52   ` Johannes Thumshirn
2020-11-11  6:04 ` [PATCH v2 0/9] null_blk fixes, improvements and cleanup Damien Le Moal
