linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] zoned null_blk fixes
@ 2020-10-29 11:04 Damien Le Moal
  2020-10-29 11:04 ` [PATCH 1/2] null_blk: Fix zone reset all tracing Damien Le Moal
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Damien Le Moal @ 2020-10-29 11:04 UTC (permalink / raw)
  To: linux-block, Jens Axboe; +Cc: Chaitanya Kulkarni, Kanchan Joshi

Jens,

Here is a couple of fixes for nullblk for this cycle.

The first patch fixes tracing of zone condition in the case of a
REQ_OP_ZONE_RESET_ALL operation.

The second patch fixes a more serious problem introduced with the recent
modification for protecting zone information using a spinlock. A
spinlock cannot be used when memory backing is turned on as that results
in potential memory allocations with the lock held and IRQ disabled.
The fix changes the locking method to using a bit, with the spinlock
retained only for function local protection. This new locking was
extensively tested with xfstests/btrfs and zonefs tests runs.

Damien Le Moal (2):
  null_blk: Fix zone reset all tracing
  null_blk: Fix locking in zoned mode

 drivers/block/null_blk.h       |   3 +-
 drivers/block/null_blk_zoned.c | 113 +++++++++++++++++++++++++--------
 2 files changed, 89 insertions(+), 27 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] null_blk: Fix zone reset all tracing
  2020-10-29 11:04 [PATCH 0/2] zoned null_blk fixes Damien Le Moal
@ 2020-10-29 11:04 ` Damien Le Moal
  2020-10-29 11:05 ` [PATCH 2/2] null_blk: Fix locking in zoned mode Damien Le Moal
  2020-10-29 13:27 ` [PATCH 0/2] zoned null_blk fixes Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Damien Le Moal @ 2020-10-29 11:04 UTC (permalink / raw)
  To: linux-block, Jens Axboe; +Cc: Chaitanya Kulkarni, Kanchan Joshi

In the cae of the REQ_OP_ZONE_RESET_ALL operation, the command sector is
ignored and the operation is applied to all sequential zones. For these
commands, tracing the effect of the command using the command sector to
determine the target zone is thus incorrect.

Fix null_zone_mgmt() zone condition tracing in the case of
REQ_OP_ZONE_RESET_ALL to apply tracing to all sequential zones that are
not already empty.

Fixes: 766c3297d7e1 ("null_blk: add trace in null_blk_zoned.c")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/null_blk_zoned.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index 98056c88926b..b637b16a5f54 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -475,9 +475,14 @@ static blk_status_t null_zone_mgmt(struct nullb_cmd *cmd, enum req_opf op,
 
 	switch (op) {
 	case REQ_OP_ZONE_RESET_ALL:
-		for (i = dev->zone_nr_conv; i < dev->nr_zones; i++)
-			null_reset_zone(dev, &dev->zones[i]);
-		break;
+		for (i = dev->zone_nr_conv; i < dev->nr_zones; i++) {
+			zone = &dev->zones[i];
+			if (zone->cond != BLK_ZONE_COND_EMPTY) {
+				null_reset_zone(dev, zone);
+				trace_nullb_zone_op(cmd, i, zone->cond);
+			}
+		}
+		return BLK_STS_OK;
 	case REQ_OP_ZONE_RESET:
 		ret = null_reset_zone(dev, zone);
 		break;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] null_blk: Fix locking in zoned mode
  2020-10-29 11:04 [PATCH 0/2] zoned null_blk fixes Damien Le Moal
  2020-10-29 11:04 ` [PATCH 1/2] null_blk: Fix zone reset all tracing Damien Le Moal
@ 2020-10-29 11:05 ` Damien Le Moal
  2020-10-29 13:27 ` [PATCH 0/2] zoned null_blk fixes Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Damien Le Moal @ 2020-10-29 11:05 UTC (permalink / raw)
  To: linux-block, Jens Axboe; +Cc: Chaitanya Kulkarni, Kanchan Joshi

When the zoned mode is enabled in null_blk, Serializing read, write
and zone management operations for each zone is necessary to protect
device level information for managing zone resources (zone open and
closed counters) as well as each zone condition and write pointer
position. Commit 35bc10b2eafb ("null_blk: synchronization fix for
zoned device") introduced a spinlock to implement this serialization.
However, when memory backing is also enabled, GFP_NOIO memory
allocations are executed under the spinlock, resulting in might_sleep()
warnings. Furthermore, the zone_lock spinlock is locked/unlocked using
spin_lock_irq/spin_unlock_irq, similarly to the memory backing code with
the nullb->lock spinlock. This nested use of irq locks wrecks the irq
enabled/disabled state.

Fix all this by introducing a bitmap for per-zone lock, with locking
implemented using wait_on_bit_lock_io() and clear_and_wake_up_bit().
This locking mechanism allows keeping a zone locked while executing
null_process_cmd(), serializing all operations to the zone while
allowing to sleep during memory backing allocation with GFP_NOIO.
Device level zone resource management information is protected using
a spinlock which is not held while executing null_process_cmd();

Fixes: 35bc10b2eafb ("null_blk: synchronization fix for zoned device")
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/block/null_blk.h       |   3 +-
 drivers/block/null_blk_zoned.c | 104 +++++++++++++++++++++++++--------
 2 files changed, 82 insertions(+), 25 deletions(-)

diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk.h
index 3176b269b822..cfd00ad40355 100644
--- a/drivers/block/null_blk.h
+++ b/drivers/block/null_blk.h
@@ -47,7 +47,8 @@ struct nullb_device {
 	unsigned int nr_zones_closed;
 	struct blk_zone *zones;
 	sector_t zone_size_sects;
-	spinlock_t zone_lock;
+	spinlock_t zone_dev_lock;
+	unsigned long *zone_locks;
 
 	unsigned long size; /* device size in MB */
 	unsigned long completion_nsec; /* time in ns to complete a request */
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index b637b16a5f54..8775acbb4f8f 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/vmalloc.h>
+#include <linux/bitmap.h>
 #include "null_blk.h"
 
 #define CREATE_TRACE_POINTS
@@ -45,7 +46,13 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
 	if (!dev->zones)
 		return -ENOMEM;
 
-	spin_lock_init(&dev->zone_lock);
+	spin_lock_init(&dev->zone_dev_lock);
+	dev->zone_locks = bitmap_zalloc(dev->nr_zones, GFP_KERNEL);
+	if (!dev->zone_locks) {
+		kvfree(dev->zones);
+		return -ENOMEM;
+	}
+
 	if (dev->zone_nr_conv >= dev->nr_zones) {
 		dev->zone_nr_conv = dev->nr_zones - 1;
 		pr_info("changed the number of conventional zones to %u",
@@ -124,15 +131,26 @@ int null_register_zoned_dev(struct nullb *nullb)
 
 void null_free_zoned_dev(struct nullb_device *dev)
 {
+	bitmap_free(dev->zone_locks);
 	kvfree(dev->zones);
 }
 
+static inline void null_lock_zone(struct nullb_device *dev, unsigned int zno)
+{
+	wait_on_bit_lock_io(dev->zone_locks, zno, TASK_UNINTERRUPTIBLE);
+}
+
+static inline void null_unlock_zone(struct nullb_device *dev, unsigned int zno)
+{
+	clear_and_wake_up_bit(zno, dev->zone_locks);
+}
+
 int null_report_zones(struct gendisk *disk, sector_t sector,
 		unsigned int nr_zones, report_zones_cb cb, void *data)
 {
 	struct nullb *nullb = disk->private_data;
 	struct nullb_device *dev = nullb->dev;
-	unsigned int first_zone, i;
+	unsigned int first_zone, i, zno;
 	struct blk_zone zone;
 	int error;
 
@@ -143,17 +161,17 @@ int null_report_zones(struct gendisk *disk, sector_t sector,
 	nr_zones = min(nr_zones, dev->nr_zones - first_zone);
 	trace_nullb_report_zones(nullb, nr_zones);
 
-	for (i = 0; i < nr_zones; i++) {
+	zno = first_zone;
+	for (i = 0; i < nr_zones; i++, zno++) {
 		/*
 		 * Stacked DM target drivers will remap the zone information by
 		 * modifying the zone information passed to the report callback.
 		 * So use a local copy to avoid corruption of the device zone
 		 * array.
 		 */
-		spin_lock_irq(&dev->zone_lock);
-		memcpy(&zone, &dev->zones[first_zone + i],
-		       sizeof(struct blk_zone));
-		spin_unlock_irq(&dev->zone_lock);
+		null_lock_zone(dev, zno);
+		memcpy(&zone, &dev->zones[zno], sizeof(struct blk_zone));
+		null_unlock_zone(dev, zno);
 
 		error = cb(&zone, i, data);
 		if (error)
@@ -163,6 +181,10 @@ int null_report_zones(struct gendisk *disk, sector_t sector,
 	return nr_zones;
 }
 
+/*
+ * This is called in the case of memory backing from null_process_cmd()
+ * with the target zone already locked.
+ */
 size_t null_zone_valid_read_len(struct nullb *nullb,
 				sector_t sector, unsigned int len)
 {
@@ -299,22 +321,27 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
 	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
 		return null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors);
 
+	null_lock_zone(dev, zno);
+	spin_lock(&dev->zone_dev_lock);
+
 	switch (zone->cond) {
 	case BLK_ZONE_COND_FULL:
 		/* Cannot write to a full zone */
-		return BLK_STS_IOERR;
+		ret = BLK_STS_IOERR;
+		goto unlock;
 	case BLK_ZONE_COND_EMPTY:
 	case BLK_ZONE_COND_CLOSED:
 		ret = null_check_zone_resources(dev, zone);
 		if (ret != BLK_STS_OK)
-			return ret;
+			goto unlock;
 		break;
 	case BLK_ZONE_COND_IMP_OPEN:
 	case BLK_ZONE_COND_EXP_OPEN:
 		break;
 	default:
 		/* Invalid zone condition */
-		return BLK_STS_IOERR;
+		ret = BLK_STS_IOERR;
+		goto unlock;
 	}
 
 	/*
@@ -330,11 +357,14 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
 		else
 			cmd->rq->__sector = sector;
 	} else if (sector != zone->wp) {
-		return BLK_STS_IOERR;
+		ret = BLK_STS_IOERR;
+		goto unlock;
 	}
 
-	if (zone->wp + nr_sectors > zone->start + zone->capacity)
-		return BLK_STS_IOERR;
+	if (zone->wp + nr_sectors > zone->start + zone->capacity) {
+		ret = BLK_STS_IOERR;
+		goto unlock;
+	}
 
 	if (zone->cond == BLK_ZONE_COND_CLOSED) {
 		dev->nr_zones_closed--;
@@ -345,9 +375,11 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
 	if (zone->cond != BLK_ZONE_COND_EXP_OPEN)
 		zone->cond = BLK_ZONE_COND_IMP_OPEN;
 
+	spin_unlock(&dev->zone_dev_lock);
 	ret = null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors);
+	spin_lock(&dev->zone_dev_lock);
 	if (ret != BLK_STS_OK)
-		return ret;
+		goto unlock;
 
 	zone->wp += nr_sectors;
 	if (zone->wp == zone->start + zone->capacity) {
@@ -357,7 +389,13 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
 			dev->nr_zones_imp_open--;
 		zone->cond = BLK_ZONE_COND_FULL;
 	}
-	return BLK_STS_OK;
+	ret = BLK_STS_OK;
+
+unlock:
+	spin_unlock(&dev->zone_dev_lock);
+	null_unlock_zone(dev, zno);
+
+	return ret;
 }
 
 static blk_status_t null_open_zone(struct nullb_device *dev, struct blk_zone *zone)
@@ -468,21 +506,33 @@ static blk_status_t null_zone_mgmt(struct nullb_cmd *cmd, enum req_opf op,
 				   sector_t sector)
 {
 	struct nullb_device *dev = cmd->nq->dev;
-	unsigned int zone_no = null_zone_no(dev, sector);
-	struct blk_zone *zone = &dev->zones[zone_no];
-	blk_status_t ret = BLK_STS_OK;
+	unsigned int zone_no;
+	struct blk_zone *zone;
+	blk_status_t ret;
 	size_t i;
 
-	switch (op) {
-	case REQ_OP_ZONE_RESET_ALL:
+	if (op == REQ_OP_ZONE_RESET_ALL) {
 		for (i = dev->zone_nr_conv; i < dev->nr_zones; i++) {
+			null_lock_zone(dev, i);
 			zone = &dev->zones[i];
 			if (zone->cond != BLK_ZONE_COND_EMPTY) {
+				spin_lock(&dev->zone_dev_lock);
 				null_reset_zone(dev, zone);
+				spin_unlock(&dev->zone_dev_lock);
 				trace_nullb_zone_op(cmd, i, zone->cond);
 			}
+			null_unlock_zone(dev, i);
 		}
 		return BLK_STS_OK;
+	}
+
+	zone_no = null_zone_no(dev, sector);
+	zone = &dev->zones[zone_no];
+
+	null_lock_zone(dev, zone_no);
+	spin_lock(&dev->zone_dev_lock);
+
+	switch (op) {
 	case REQ_OP_ZONE_RESET:
 		ret = null_reset_zone(dev, zone);
 		break;
@@ -496,22 +546,27 @@ static blk_status_t null_zone_mgmt(struct nullb_cmd *cmd, enum req_opf op,
 		ret = null_finish_zone(dev, zone);
 		break;
 	default:
-		return BLK_STS_NOTSUPP;
+		ret = BLK_STS_NOTSUPP;
+		break;
 	}
 
+	spin_unlock(&dev->zone_dev_lock);
+
 	if (ret == BLK_STS_OK)
 		trace_nullb_zone_op(cmd, zone_no, zone->cond);
 
+	null_unlock_zone(dev, zone_no);
+
 	return ret;
 }
 
 blk_status_t null_process_zoned_cmd(struct nullb_cmd *cmd, enum req_opf op,
 				    sector_t sector, sector_t nr_sectors)
 {
-	blk_status_t sts;
 	struct nullb_device *dev = cmd->nq->dev;
+	unsigned int zno = null_zone_no(dev, sector);
+	blk_status_t sts;
 
-	spin_lock_irq(&dev->zone_lock);
 	switch (op) {
 	case REQ_OP_WRITE:
 		sts = null_zone_write(cmd, sector, nr_sectors, false);
@@ -527,9 +582,10 @@ blk_status_t null_process_zoned_cmd(struct nullb_cmd *cmd, enum req_opf op,
 		sts = null_zone_mgmt(cmd, op, sector);
 		break;
 	default:
+		null_lock_zone(dev, zno);
 		sts = null_process_cmd(cmd, op, sector, nr_sectors);
+		null_unlock_zone(dev, zno);
 	}
-	spin_unlock_irq(&dev->zone_lock);
 
 	return sts;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 0/2] zoned null_blk fixes
  2020-10-29 11:04 [PATCH 0/2] zoned null_blk fixes Damien Le Moal
  2020-10-29 11:04 ` [PATCH 1/2] null_blk: Fix zone reset all tracing Damien Le Moal
  2020-10-29 11:05 ` [PATCH 2/2] null_blk: Fix locking in zoned mode Damien Le Moal
@ 2020-10-29 13:27 ` Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2020-10-29 13:27 UTC (permalink / raw)
  To: Damien Le Moal, linux-block; +Cc: Chaitanya Kulkarni, Kanchan Joshi

On 10/29/20 5:04 AM, Damien Le Moal wrote:
> Jens,
> 
> Here is a couple of fixes for nullblk for this cycle.
> 
> The first patch fixes tracing of zone condition in the case of a
> REQ_OP_ZONE_RESET_ALL operation.
> 
> The second patch fixes a more serious problem introduced with the recent
> modification for protecting zone information using a spinlock. A
> spinlock cannot be used when memory backing is turned on as that results
> in potential memory allocations with the lock held and IRQ disabled.
> The fix changes the locking method to using a bit, with the spinlock
> retained only for function local protection. This new locking was
> extensively tested with xfstests/btrfs and zonefs tests runs.

Applied, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-10-29 13:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-29 11:04 [PATCH 0/2] zoned null_blk fixes Damien Le Moal
2020-10-29 11:04 ` [PATCH 1/2] null_blk: Fix zone reset all tracing Damien Le Moal
2020-10-29 11:05 ` [PATCH 2/2] null_blk: Fix locking in zoned mode Damien Le Moal
2020-10-29 13:27 ` [PATCH 0/2] zoned null_blk fixes Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).