linux-ide.vger.kernel.org archive mirror
* [PATCH v3 0/4] Initial support for multi-actuator HDDs
@ 2021-07-26  1:38 Damien Le Moal
  2021-07-26  1:38 ` [PATCH v3 1/4] block: Add concurrent positioning ranges support Damien Le Moal
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Damien Le Moal @ 2021-07-26  1:38 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen, linux-ide
  Cc: Hannes Reinecke

Single LUN multi-actuator hard-disks are capable of seeking and executing
multiple commands in parallel. This capability is exposed to the host
using the Concurrent Positioning Ranges VPD page (SCSI) and Log (ATA).
Each positioning range describes the contiguous set of LBAs that an
actuator serves.

This series adds support to the scsi disk driver to retrieve this
information and advertise it to user space through sysfs. libata is also
modified to handle ATA drives.

The first patch adds the block layer plumbing to expose concurrent
sector ranges of the device through sysfs as a sub-directory of the
device sysfs queue directory. Patches 2 and 3 add support to sd and
libata. Finally, patch 4 documents the sysfs queue attribute changes.

This series does not attempt in any way to optimize accesses to
multi-actuator devices (e.g. block IO scheduler or filesystems). This
initial support only exposes the actuators information to user space
through sysfs.
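As a user-space illustration of what this series exposes (not part of the
patches themselves), the sketch below enumerates the per-actuator ranges
from the sysfs layout introduced in patch 1. The device path in the usage
comment is hypothetical; the cranges layout follows the tree shown in the
patch 1 commit message:

```python
import os

def read_cranges(queue_dir):
    """Return {range_index: (sector, nr_sectors)} from a queue/cranges
    directory, or an empty dict when the device has a single actuator
    (in which case the cranges directory does not exist)."""
    cranges = os.path.join(queue_dir, "cranges")
    ranges = {}
    if not os.path.isdir(cranges):
        return ranges
    for name in sorted(os.listdir(cranges), key=int):
        rdir = os.path.join(cranges, name)
        with open(os.path.join(rdir, "sector")) as f:
            sector = int(f.read())
        with open(os.path.join(rdir, "nr_sectors")) as f:
            nr_sectors = int(f.read())
        ranges[int(name)] = (sector, nr_sectors)
    return ranges

# Hypothetical usage on a dual-actuator drive:
#   read_cranges("/sys/block/sdk/queue")
```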

Changes from v2:
* Update patch 1 to fix a compilation warning for a potential NULL
  pointer dereference of the cr argument of blk_queue_set_cranges().
  Warning reported by the kernel test robot <lkp@intel.com>.

Changes from v1:
* Moved libata-scsi hunk from patch 1 to patch 3 where it belongs
* Fixed uninitialized variable in patch 2
  Reported-by: kernel test robot <lkp@intel.com>
  Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
* Changed patch 3 adding struct ata_cpr_log to contain both the number
  of concurrent ranges and the array of concurrent ranges.
* Added a note in the documentation (patch 4) about the unit used for
  the concurrent ranges attributes.

Damien Le Moal (4):
  block: Add concurrent positioning ranges support
  scsi: sd: add concurrent positioning ranges support
  libata: support concurrent positioning ranges log
  doc: document sysfs queue/cranges attributes

 Documentation/block/queue-sysfs.rst |  30 ++-
 block/Makefile                      |   2 +-
 block/blk-cranges.c                 | 295 ++++++++++++++++++++++++++++
 block/blk-sysfs.c                   |  13 ++
 block/blk.h                         |   3 +
 drivers/ata/libata-core.c           |  52 +++++
 drivers/ata/libata-scsi.c           |  48 ++++-
 drivers/scsi/sd.c                   |  81 ++++++++
 drivers/scsi/sd.h                   |   1 +
 include/linux/ata.h                 |   1 +
 include/linux/blkdev.h              |  29 +++
 include/linux/libata.h              |  15 ++
 12 files changed, 559 insertions(+), 11 deletions(-)
 create mode 100644 block/blk-cranges.c

-- 
2.31.1



* [PATCH v3 1/4] block: Add concurrent positioning ranges support
  2021-07-26  1:38 [PATCH v3 0/4] Initial support for multi-actuator HDDs Damien Le Moal
@ 2021-07-26  1:38 ` Damien Le Moal
  2021-07-26  7:33   ` Hannes Reinecke
  2021-08-10  8:23   ` Christoph Hellwig
  2021-07-26  1:38 ` [PATCH v3 2/4] scsi: sd: add " Damien Le Moal
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Damien Le Moal @ 2021-07-26  1:38 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen, linux-ide
  Cc: Hannes Reinecke

The Concurrent Positioning Ranges VPD page (for SCSI) and Log (for ATA)
contain parameters describing the number of sets of contiguous LBAs that
can be served independently by a single LUN multi-actuator disk. This
patch provides the blk_queue_set_cranges() function allowing a device
driver to signal to the block layer that a disk has multiple actuators,
each one serving a contiguous range of sectors. To describe the set
of sector ranges representing the different actuators of a device, the
data type struct blk_cranges is introduced.

For a device with multiple actuators, a struct blk_cranges is attached
to the device request queue by the blk_queue_set_cranges() function. The
function blk_alloc_cranges() is provided for drivers to allocate this
structure.

The blk_cranges structure contains kobjects (struct kobject) to register
with sysfs the set of sector ranges defined by a device. On initial
device scan, this registration is done from blk_register_queue() using
the block layer internal function blk_register_cranges(). If a driver
calls blk_queue_set_cranges() for a registered queue, e.g. when a device
is revalidated, blk_queue_set_cranges() will execute
blk_register_cranges() to update the queue sysfs attribute files.

The sysfs file structure created starts from the cranges sub-directory
and contains the start sector and number of sectors served by an
actuator, with the information for each actuator grouped in one
directory per actuator. E.g. for a dual actuator drive, we have:

$ tree /sys/block/sdk/queue/cranges/
/sys/block/sdk/queue/cranges/
|-- 0
|   |-- nr_sectors
|   `-- sector
`-- 1
    |-- nr_sectors
    `-- sector

For a regular single actuator device, the cranges directory does not
exist.

Device revalidation may lead to changes to this structure and to the
attribute values. When manipulated, the queue sysfs_lock and
sysfs_dir_lock are held for atomicity, similarly to how the blk-mq and
elevator sysfs queue sub-directories are protected.

The code related to the management of cranges is added in the new
file block/blk-cranges.c.
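The validity rule enforced by blk-cranges.c (ranges may overlap, but
together they must cover every sector of the disk, and no range may extend
beyond the capacity) can be sketched in user space as follows. This is an
illustrative re-implementation of the blk_check_ranges() logic, not kernel
code:

```python
def cranges_valid(capacity, ranges):
    """ranges: list of (sector, nr_sectors) tuples.

    Mirrors blk_check_ranges(): ranges may overlap, but their union
    must start at sector 0 and end exactly at the disk capacity."""
    if not ranges:
        return False
    min_sector = min(s for s, _ in ranges)
    max_sector = max(s + n for s, n in ranges)
    if min_sector != 0 or max_sector < capacity:
        return False  # missing sectors
    if max_sector > capacity:
        return False  # range beyond capacity
    return True
```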

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/Makefile         |   2 +-
 block/blk-cranges.c    | 295 +++++++++++++++++++++++++++++++++++++++++
 block/blk-sysfs.c      |  13 ++
 block/blk.h            |   3 +
 include/linux/blkdev.h |  29 ++++
 5 files changed, 341 insertions(+), 1 deletion(-)
 create mode 100644 block/blk-cranges.c

diff --git a/block/Makefile b/block/Makefile
index bfbe4e13ca1e..e477e6ca9ea6 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -9,7 +9,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
 			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
 			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
 			genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
-			disk-events.o
+			disk-events.o blk-cranges.o
 
 obj-$(CONFIG_BOUNCE)		+= bounce.o
 obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
diff --git a/block/blk-cranges.c b/block/blk-cranges.c
new file mode 100644
index 000000000000..deaa09e564f7
--- /dev/null
+++ b/block/blk-cranges.c
@@ -0,0 +1,295 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  Block device concurrent positioning ranges.
+ *
+ *  Copyright (C) 2021 Western Digital Corporation or its Affiliates.
+ */
+#include <linux/kernel.h>
+#include <linux/blkdev.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+
+#include "blk.h"
+
+static ssize_t blk_crange_sector_show(struct blk_crange *cr, char *page)
+{
+	return sprintf(page, "%llu\n", cr->sector);
+}
+
+static ssize_t blk_crange_nr_sectors_show(struct blk_crange *cr, char *page)
+{
+	return sprintf(page, "%llu\n", cr->nr_sectors);
+}
+
+struct blk_crange_sysfs_entry {
+	struct attribute attr;
+	ssize_t (*show)(struct blk_crange *cr, char *page);
+};
+
+static struct blk_crange_sysfs_entry blk_crange_sector_entry = {
+	.attr = { .name = "sector", .mode = 0444 },
+	.show = blk_crange_sector_show,
+};
+
+static struct blk_crange_sysfs_entry blk_crange_nr_sectors_entry = {
+	.attr = { .name = "nr_sectors", .mode = 0444 },
+	.show = blk_crange_nr_sectors_show,
+};
+
+static struct attribute *blk_crange_attrs[] = {
+	&blk_crange_sector_entry.attr,
+	&blk_crange_nr_sectors_entry.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(blk_crange);
+
+static ssize_t blk_crange_sysfs_show(struct kobject *kobj,
+				     struct attribute *attr, char *page)
+{
+	struct blk_crange_sysfs_entry *entry;
+	struct blk_crange *cr;
+	struct request_queue *q;
+	ssize_t ret;
+
+	entry = container_of(attr, struct blk_crange_sysfs_entry, attr);
+	cr = container_of(kobj, struct blk_crange, kobj);
+	q = cr->queue;
+
+	mutex_lock(&q->sysfs_lock);
+	ret = entry->show(cr, page);
+	mutex_unlock(&q->sysfs_lock);
+
+	return ret;
+}
+
+static const struct sysfs_ops blk_crange_sysfs_ops = {
+	.show	= blk_crange_sysfs_show,
+};
+
+/*
+ * Dummy release function to make kobj happy.
+ */
+static void blk_cranges_sysfs_nop_release(struct kobject *kobj)
+{
+}
+
+static struct kobj_type blk_crange_ktype = {
+	.sysfs_ops	= &blk_crange_sysfs_ops,
+	.default_groups	= blk_crange_groups,
+	.release	= blk_cranges_sysfs_nop_release,
+};
+
+static struct kobj_type blk_cranges_ktype = {
+	.release	= blk_cranges_sysfs_nop_release,
+};
+
+/**
+ * blk_register_cranges - register with sysfs a set of concurrent ranges
+ * @disk:		Target disk
+ * @new_cranges:	New set of concurrent ranges
+ *
+ * Register with sysfs a set of concurrent ranges for @disk. If @new_cranges
+ * is not NULL, this set of concurrent ranges is registered and the
+ * old set specified by q->cranges is unregistered. Otherwise, q->cranges
+ * is registered if it is not already.
+ */
+int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges)
+{
+	struct request_queue *q = disk->queue;
+	struct blk_cranges *cranges;
+	int i, ret;
+
+	lockdep_assert_held(&q->sysfs_dir_lock);
+	lockdep_assert_held(&q->sysfs_lock);
+
+	/* If a new range set is specified, unregister the old one */
+	if (new_cranges) {
+		if (q->cranges)
+			blk_unregister_cranges(disk);
+		q->cranges = new_cranges;
+	}
+
+	cranges = q->cranges;
+	if (!cranges)
+		return 0;
+
+	/*
+	 * At this point, q->cranges is the new set of sector ranges that needs
+	 * to be registered with sysfs.
+	 */
+	WARN_ON(cranges->sysfs_registered);
+	ret = kobject_init_and_add(&cranges->kobj, &blk_cranges_ktype,
+				   &q->kobj, "%s", "cranges");
+	if (ret)
+		goto free;
+
+	for (i = 0; i < cranges->nr_ranges; i++) {
+		cranges->ranges[i].queue = q;
+		ret = kobject_init_and_add(&cranges->ranges[i].kobj,
+					   &blk_crange_ktype, &cranges->kobj,
+					   "%d", i);
+		if (ret)
+			goto delete_obj;
+	}
+
+	cranges->sysfs_registered = true;
+
+	return 0;
+
+delete_obj:
+	while (--i >= 0)
+		kobject_del(&cranges->ranges[i].kobj);
+	kobject_del(&cranges->kobj);
+free:
+	kfree(cranges);
+	q->cranges = NULL;
+	return ret;
+}
+
+void blk_unregister_cranges(struct gendisk *disk)
+{
+	struct request_queue *q = disk->queue;
+	struct blk_cranges *cranges = q->cranges;
+	int i;
+
+	lockdep_assert_held(&q->sysfs_dir_lock);
+	lockdep_assert_held(&q->sysfs_lock);
+
+	if (!cranges)
+		return;
+
+	if (cranges->sysfs_registered) {
+		for (i = 0; i < cranges->nr_ranges; i++)
+			kobject_del(&cranges->ranges[i].kobj);
+		kobject_del(&cranges->kobj);
+	}
+
+	kfree(cranges);
+	q->cranges = NULL;
+}
+
+static bool blk_check_ranges(struct gendisk *disk, struct blk_cranges *cr)
+{
+	sector_t capacity = get_capacity(disk);
+	sector_t min_sector = (sector_t)-1;
+	sector_t max_sector = 0;
+	int i;
+
+	/*
+	 * Sector ranges may overlap but should overall contain all sectors
+	 * within the disk capacity.
+	 */
+	for (i = 0; i < cr->nr_ranges; i++) {
+		min_sector = min(min_sector, cr->ranges[i].sector);
+		max_sector = max(max_sector, cr->ranges[i].sector +
+					     cr->ranges[i].nr_sectors);
+	}
+
+	if (min_sector != 0 || max_sector < capacity) {
+		pr_warn("Invalid concurrent ranges: missing sectors\n");
+		return false;
+	}
+
+	if (max_sector > capacity) {
+		pr_warn("Invalid concurrent ranges: beyond capacity\n");
+		return false;
+	}
+
+	return true;
+}
+
+static bool blk_cranges_changed(struct gendisk *disk, struct blk_cranges *new)
+{
+	struct blk_cranges *old = disk->queue->cranges;
+	int i;
+
+	if (!old)
+		return true;
+
+	if (old->nr_ranges != new->nr_ranges)
+		return true;
+
+	for (i = 0; i < old->nr_ranges; i++) {
+		if (new->ranges[i].sector != old->ranges[i].sector ||
+		    new->ranges[i].nr_sectors != old->ranges[i].nr_sectors)
+			return true;
+	}
+
+	return false;
+}
+
+/**
+ * blk_alloc_cranges - Allocate a concurrent positioning range structure
+ * @disk:	target disk
+ * @nr_ranges:	Number of concurrent ranges
+ *
+ * Allocate a struct blk_cranges structure with @nr_ranges range descriptors.
+ */
+struct blk_cranges *blk_alloc_cranges(struct gendisk *disk, int nr_ranges)
+{
+	struct blk_cranges *cr;
+
+	cr = kzalloc_node(struct_size(cr, ranges, nr_ranges), GFP_KERNEL,
+			  disk->queue->node);
+	if (cr)
+		cr->nr_ranges = nr_ranges;
+	return cr;
+}
+EXPORT_SYMBOL_GPL(blk_alloc_cranges);
+
+/**
+ * blk_queue_set_cranges - Set a disk concurrent positioning ranges
+ * @disk:	target disk
+ * @cr:		concurrent ranges structure
+ *
+ * Set the concurrent positioning ranges information of the request queue
+ * of @disk to @cr. If @cr is NULL, the concurrent ranges structure
+ * already set, if any, is cleared. If there are no differences between
+ * @cr and the concurrent ranges structure already set, @cr is freed.
+ */
+void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr)
+{
+	struct request_queue *q = disk->queue;
+
+	if (WARN_ON_ONCE(cr && !cr->nr_ranges)) {
+		kfree(cr);
+		cr = NULL;
+	}
+
+	mutex_lock(&q->sysfs_dir_lock);
+	mutex_lock(&q->sysfs_lock);
+
+	if (cr) {
+		if (!blk_check_ranges(disk, cr)) {
+			kfree(cr);
+			cr = NULL;
+			goto reg;
+		}
+
+		if (!blk_cranges_changed(disk, cr)) {
+			kfree(cr);
+			goto unlock;
+		}
+	}
+
+	/*
+	 * This may be called for a registered queue. E.g. during a device
+	 * revalidation. If that is the case, we need to unregister the old
+	 * set of concurrent ranges and register the new set. If the queue
+	 * is not registered, the device request queue registration will
+	 * register the ranges, so only swap in the new set and free the
+	 * old one.
+	 */
+reg:
+	if (blk_queue_registered(q)) {
+		blk_register_cranges(disk, cr);
+	} else {
+		swap(q->cranges, cr);
+		kfree(cr);
+	}
+
+unlock:
+	mutex_unlock(&q->sysfs_lock);
+	mutex_unlock(&q->sysfs_dir_lock);
+}
+EXPORT_SYMBOL_GPL(blk_queue_set_cranges);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 370d83c18057..aeac98ecc5a0 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -899,9 +899,21 @@ int blk_register_queue(struct gendisk *disk)
 	}
 
 	mutex_lock(&q->sysfs_lock);
+
+	ret = blk_register_cranges(disk, NULL);
+	if (ret) {
+		mutex_unlock(&q->sysfs_lock);
+		mutex_unlock(&q->sysfs_dir_lock);
+		kobject_del(&q->kobj);
+		blk_trace_remove_sysfs(dev);
+		kobject_put(&dev->kobj);
+		return ret;
+	}
+
 	if (q->elevator) {
 		ret = elv_register_queue(q, false);
 		if (ret) {
+			blk_unregister_cranges(disk);
 			mutex_unlock(&q->sysfs_lock);
 			mutex_unlock(&q->sysfs_dir_lock);
 			kobject_del(&q->kobj);
@@ -985,6 +997,7 @@ void blk_unregister_queue(struct gendisk *disk)
 	mutex_lock(&q->sysfs_lock);
 	if (q->elevator)
 		elv_unregister_queue(q);
+	blk_unregister_cranges(disk);
 	mutex_unlock(&q->sysfs_lock);
 	mutex_unlock(&q->sysfs_dir_lock);
 
diff --git a/block/blk.h b/block/blk.h
index 4b885c0f6708..650c0d87987c 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -368,4 +368,7 @@ extern struct device_attribute dev_attr_events;
 extern struct device_attribute dev_attr_events_async;
 extern struct device_attribute dev_attr_events_poll_msecs;
 
+int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges);
+void blk_unregister_cranges(struct gendisk *disk);
+
 #endif /* BLK_INTERNAL_H */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d3afea47ade6..d10352674d20 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -378,6 +378,29 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
 
 #endif /* CONFIG_BLK_DEV_ZONED */
 
+/*
+ * Concurrent sector ranges: struct blk_crange describes range of
+ * contiguous sectors that can be served by independent resources on the
+ * device. The set of ranges defined in struct blk_cranges must overall
+ * include all sectors within the device capacity.
+ * For a device with multiple ranges, e.g. a single LUN multi-actuator HDD,
+ * requests targeting sectors in different ranges can be executed in parallel.
+ * A request can straddle a range boundary.
+ */
+struct blk_crange {
+	struct kobject		kobj;
+	struct request_queue	*queue;
+	sector_t		sector;
+	sector_t		nr_sectors;
+};
+
+struct blk_cranges {
+	struct kobject		kobj;
+	bool			sysfs_registered;
+	unsigned int		nr_ranges;
+	struct blk_crange	ranges[];
+};
+
 struct request_queue {
 	struct request		*last_merge;
 	struct elevator_queue	*elevator;
@@ -570,6 +593,9 @@ struct request_queue {
 
 #define BLK_MAX_WRITE_HINTS	5
 	u64			write_hints[BLK_MAX_WRITE_HINTS];
+
+	/* Concurrent sector ranges */
+	struct blk_cranges	*cranges;
 };
 
 /* Keep blk_queue_flag_name[] in sync with the definitions below */
@@ -1163,6 +1189,9 @@ extern void blk_queue_required_elevator_features(struct request_queue *q,
 extern bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
 					      struct device *dev);
 
+struct blk_cranges *blk_alloc_cranges(struct gendisk *disk, int nr_ranges);
+void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr);
+
 /*
  * Number of physical segments as sent to the device.
  *
-- 
2.31.1



* [PATCH v3 2/4] scsi: sd: add concurrent positioning ranges support
  2021-07-26  1:38 [PATCH v3 0/4] Initial support for multi-actuator HDDs Damien Le Moal
  2021-07-26  1:38 ` [PATCH v3 1/4] block: Add concurrent positioning ranges support Damien Le Moal
@ 2021-07-26  1:38 ` Damien Le Moal
  2021-08-10  8:24   ` Christoph Hellwig
  2021-07-26  1:38 ` [PATCH v3 3/4] libata: support concurrent positioning ranges log Damien Le Moal
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Damien Le Moal @ 2021-07-26  1:38 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen, linux-ide
  Cc: Hannes Reinecke

Add the sd_read_cpr() function to the sd scsi disk driver to discover
if a device has multiple concurrent positioning ranges (i.e. multiple
actuators on an HDD). This new function is called from
sd_revalidate_disk() and uses the block layer functions
blk_alloc_cranges() and blk_queue_set_cranges() to set a device's
cranges according to the information retrieved from VPD page B9h,
if the device supports it.

The format of the Concurrent Positioning Ranges VPD page B9h is defined
in section 6.6.6 of SBC-5.
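For reference, the page layout that sd_read_cpr() walks (a 64-byte header
followed by 32-byte range descriptors, with a big-endian start LBA at
descriptor offset 8 and a big-endian LBA count at offset 16) can be
exercised with a small stand-alone parser. The sketch below assumes the
usual SPC convention that the PAGE LENGTH field at bytes 2-3 excludes the
4-byte VPD header; the buffer used in testing is synthetic, not from a
real drive:

```python
import struct

def parse_cpr_vpd(buf):
    """Parse a Concurrent Positioning Ranges VPD page (B9h).

    Returns a list of (start_lba, num_lbas) tuples, or None if the page
    fails the same sanity checks as sd_read_cpr(): at least a 64-byte
    header plus one 32-byte descriptor, whole descriptors only."""
    vpd_len = struct.unpack_from(">H", buf, 2)[0] + 4
    if vpd_len > len(buf) or vpd_len < 64 + 32 or (vpd_len - 64) % 32:
        return None
    ranges = []
    for off in range(64, vpd_len, 32):
        # Start LBA at descriptor offset 8, LBA count at offset 16,
        # both big-endian 64-bit values.
        start_lba, num_lbas = struct.unpack_from(">QQ", buf, off + 8)
        ranges.append((start_lba, num_lbas))
    return ranges
```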

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/sd.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/sd.h |  1 +
 2 files changed, 82 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b8d55af763f9..8e83099b49f6 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3125,6 +3125,86 @@ static void sd_read_security(struct scsi_disk *sdkp, unsigned char *buffer)
 		sdkp->security = 1;
 }
 
+static inline sector_t sd64_to_sectors(struct scsi_disk *sdkp, u8 *buf)
+{
+	return logical_to_sectors(sdkp->device, get_unaligned_be64(buf));
+}
+
+/**
+ * sd_read_cpr - Query concurrent positioning ranges
+ * @sdkp:	disk to query
+ */
+static void sd_read_cpr(struct scsi_disk *sdkp)
+{
+	unsigned char *buffer = NULL;
+	struct blk_cranges *cr = NULL;
+	unsigned int nr_cpr = 0;
+	int i, vpd_len, buf_len = SD_BUF_SIZE;
+	u8 *desc;
+
+	/*
+	 * We need to have the capacity set first for the block layer to be
+	 * able to check the ranges.
+	 */
+	if (sdkp->first_scan)
+		return;
+
+	if (!sdkp->capacity)
+		goto out;
+
+	/*
+	 * Concurrent Positioning Ranges VPD: there can be at most 256 ranges,
+	 * leading to a maximum page size of 64 + 256*32 bytes.
+	 */
+	buf_len = 64 + 256*32;
+	buffer = kmalloc(buf_len, GFP_KERNEL);
+	if (!buffer || scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len))
+		goto out;
+
+	/* We must have at least a 64B header and one 32B range descriptor */
+	vpd_len = get_unaligned_be16(&buffer[2]) + 4;
+	if (vpd_len > buf_len || vpd_len < 64 + 32 || (vpd_len & 31)) {
+		sd_printk(KERN_ERR, sdkp,
+			  "Invalid Concurrent Positioning Ranges VPD page\n");
+		goto out;
+	}
+
+	nr_cpr = (vpd_len - 64) / 32;
+	if (nr_cpr == 1) {
+		nr_cpr = 0;
+		goto out;
+	}
+
+	cr = blk_alloc_cranges(sdkp->disk, nr_cpr);
+	if (!cr) {
+		nr_cpr = 0;
+		goto out;
+	}
+
+	desc = &buffer[64];
+	for (i = 0; i < nr_cpr; i++, desc += 32) {
+		if (desc[0] != i) {
+			sd_printk(KERN_ERR, sdkp,
+				"Invalid Concurrent Positioning Range number\n");
+			nr_cpr = 0;
+			break;
+		}
+
+		cr->ranges[i].sector = sd64_to_sectors(sdkp, desc + 8);
+		cr->ranges[i].nr_sectors = sd64_to_sectors(sdkp, desc + 16);
+	}
+
+out:
+	blk_queue_set_cranges(sdkp->disk, cr);
+	if (nr_cpr && sdkp->nr_actuators != nr_cpr) {
+		sd_printk(KERN_NOTICE, sdkp,
+			  "%u concurrent positioning ranges\n", nr_cpr);
+		sdkp->nr_actuators = nr_cpr;
+	}
+
+	kfree(buffer);
+}
+
 /*
  * Determine the device's preferred I/O size for reads and writes
  * unless the reported value is unreasonably small, large, not a
@@ -3240,6 +3320,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		sd_read_app_tag_own(sdkp, buffer);
 		sd_read_write_same(sdkp, buffer);
 		sd_read_security(sdkp, buffer);
+		sd_read_cpr(sdkp);
 	}
 
 	/*
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index b59136c4125b..2e5932bde43d 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -106,6 +106,7 @@ struct scsi_disk {
 	u8		protection_type;/* Data Integrity Field */
 	u8		provisioning_mode;
 	u8		zeroing_mode;
+	u8		nr_actuators;		/* Number of actuators */
 	unsigned	ATO : 1;	/* state of disk ATO bit */
 	unsigned	cache_override : 1; /* temp override of WCE,RCD */
 	unsigned	WCE : 1;	/* state of disk WCE bit */
-- 
2.31.1



* [PATCH v3 3/4] libata: support concurrent positioning ranges log
  2021-07-26  1:38 [PATCH v3 0/4] Initial support for multi-actuator HDDs Damien Le Moal
  2021-07-26  1:38 ` [PATCH v3 1/4] block: Add concurrent positioning ranges support Damien Le Moal
  2021-07-26  1:38 ` [PATCH v3 2/4] scsi: sd: add " Damien Le Moal
@ 2021-07-26  1:38 ` Damien Le Moal
  2021-07-26  7:34   ` Hannes Reinecke
  2021-08-10  8:26   ` Christoph Hellwig
  2021-07-26  1:38 ` [PATCH v3 4/4] doc: document sysfs queue/cranges attributes Damien Le Moal
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Damien Le Moal @ 2021-07-26  1:38 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen, linux-ide
  Cc: Hannes Reinecke

Add support to discover if an ATA device supports the Concurrent
Positioning Ranges Log (address 0x47), indicating that the device is
capable of seeking to multiple different locations in parallel using
multiple actuators serving different LBA ranges.

Also add support to translate the concurrent positioning ranges log
into its equivalent Concurrent Positioning Ranges VPD page B9h in
libata-scsi.c.

The format of the Concurrent Positioning Ranges Log is defined in ACS-5
r9.
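The libata-scsi.c translation is essentially an endianness and framing
conversion: the ATA log stores the range count in byte 0 and little-endian
LBA fields, while the emulated SCSI VPD page carries big-endian fields
after a 64-byte header. The sketch below is an illustrative stand-alone
version of that conversion, assuming the standard SPC placement of the
page length at bytes 2-3; the data in the test is synthetic:

```python
import struct

def ata_cpr_log_to_vpd(log):
    """Convert an ATA Concurrent Positioning Ranges log (page 0x47)
    into the equivalent SCSI VPD page B9h, along the lines of
    ata_scsiop_inq_b9().

    Log layout: byte 0 = number of ranges; 32-byte descriptors from
    offset 64 with LE64 start LBA at +8 and LE64 LBA count at +16."""
    nr_cpr = log[0]
    vpd = bytearray(64 + nr_cpr * 32)
    vpd[1] = 0xb9
    struct.pack_into(">H", vpd, 2, 64 + nr_cpr * 32 - 4)
    for i in range(nr_cpr):
        off = 64 + i * 32
        vpd[off] = log[off]          # range number
        vpd[off + 1] = log[off + 1]  # number of storage elements
        start, count = struct.unpack_from("<QQ", log, off + 8)
        struct.pack_into(">QQ", vpd, off + 8, start, count)
    return bytes(vpd)
```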

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/ata/libata-core.c | 52 +++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata-scsi.c | 48 +++++++++++++++++++++++++++++-------
 include/linux/ata.h       |  1 +
 include/linux/libata.h    | 15 +++++++++++
 4 files changed, 107 insertions(+), 9 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 61c762961ca8..ab3f61ea743e 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2363,6 +2363,57 @@ static void ata_dev_config_trusted(struct ata_device *dev)
 		dev->flags |= ATA_DFLAG_TRUSTED;
 }
 
+static void ata_dev_config_cpr(struct ata_device *dev)
+{
+	unsigned int err_mask;
+	size_t buf_len;
+	int i, nr_cpr = 0;
+	struct ata_cpr_log *cpr_log = NULL;
+	u8 *desc, *buf = NULL;
+
+	if (!ata_identify_page_supported(dev,
+				 ATA_LOG_CONCURRENT_POSITIONING_RANGES))
+		goto out;
+
+	/*
+	 * Read IDENTIFY DEVICE data log, page 0x47
+	 * (concurrent positioning ranges). We can have at most 255 32B range
+	 * descriptors plus a 64B header.
+	 */
+	buf_len = (64 + 255 * 32 + 511) & ~511;
+	buf = kzalloc(buf_len, GFP_KERNEL);
+	if (!buf)
+		goto out;
+
+	err_mask = ata_read_log_page(dev, ATA_LOG_IDENTIFY_DEVICE,
+				     ATA_LOG_CONCURRENT_POSITIONING_RANGES,
+				     buf, buf_len >> 9);
+	if (err_mask)
+		goto out;
+
+	nr_cpr = buf[0];
+	if (!nr_cpr)
+		goto out;
+
+	cpr_log = kzalloc(struct_size(cpr_log, cpr, nr_cpr), GFP_KERNEL);
+	if (!cpr_log)
+		goto out;
+
+	cpr_log->nr_cpr = nr_cpr;
+	desc = &buf[64];
+	for (i = 0; i < nr_cpr; i++, desc += 32) {
+		cpr_log->cpr[i].num = desc[0];
+		cpr_log->cpr[i].num_storage_elements = desc[1];
+		cpr_log->cpr[i].start_lba = get_unaligned_le64(&desc[8]);
+		cpr_log->cpr[i].num_lbas = get_unaligned_le64(&desc[16]);
+	}
+
+out:
+	swap(dev->cpr_log, cpr_log);
+	kfree(cpr_log);
+	kfree(buf);
+}
+
 /**
  *	ata_dev_configure - Configure the specified ATA/ATAPI device
  *	@dev: Target device to configure
@@ -2591,6 +2642,7 @@ int ata_dev_configure(struct ata_device *dev)
 		ata_dev_config_sense_reporting(dev);
 		ata_dev_config_zac(dev);
 		ata_dev_config_trusted(dev);
+		ata_dev_config_cpr(dev);
 		dev->cdb_len = 32;
 	}
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index b9588c52815d..5cadbb9a8bf2 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1937,7 +1937,7 @@ static unsigned int ata_scsiop_inq_std(struct ata_scsi_args *args, u8 *rbuf)
  */
 static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
 {
-	int num_pages;
+	int i, num_pages = 0;
 	static const u8 pages[] = {
 		0x00,	/* page 0x00, this page */
 		0x80,	/* page 0x80, unit serial no page */
@@ -1947,13 +1947,17 @@ static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
 		0xb1,	/* page 0xb1, block device characteristics page */
 		0xb2,	/* page 0xb2, thin provisioning page */
 		0xb6,	/* page 0xb6, zoned block device characteristics */
+		0xb9,	/* page 0xb9, concurrent positioning ranges */
 	};
 
-	num_pages = sizeof(pages);
-	if (!(args->dev->flags & ATA_DFLAG_ZAC))
-		num_pages--;
+	for (i = 0; i < sizeof(pages); i++) {
+		if (pages[i] == 0xb6 &&
+		    !(args->dev->flags & ATA_DFLAG_ZAC))
+			continue;
+		rbuf[num_pages + 4] = pages[i];
+		num_pages++;
+	}
 	rbuf[3] = num_pages;	/* number of supported VPD pages */
-	memcpy(rbuf + 4, pages, num_pages);
 	return 0;
 }
 
@@ -2163,6 +2167,26 @@ static unsigned int ata_scsiop_inq_b6(struct ata_scsi_args *args, u8 *rbuf)
 	return 0;
 }
 
+static unsigned int ata_scsiop_inq_b9(struct ata_scsi_args *args, u8 *rbuf)
+{
+	struct ata_cpr_log *cpr_log = args->dev->cpr_log;
+	u8 *desc = &rbuf[64];
+	int i;
+
+	/* SCSI Concurrent Positioning Ranges VPD page: SBC-5 rev 1 or later */
+	rbuf[1] = 0xb9;
+	put_unaligned_be16(64 + (int)cpr_log->nr_cpr * 32 - 4, &rbuf[2]);
+
+	for (i = 0; i < cpr_log->nr_cpr; i++, desc += 32) {
+		desc[0] = cpr_log->cpr[i].num;
+		desc[1] = cpr_log->cpr[i].num_storage_elements;
+		put_unaligned_be64(cpr_log->cpr[i].start_lba, &desc[8]);
+		put_unaligned_be64(cpr_log->cpr[i].num_lbas, &desc[16]);
+	}
+
+	return 0;
+}
+
 /**
  *	modecpy - Prepare response for MODE SENSE
  *	@dest: output buffer
@@ -4162,11 +4186,17 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd)
 			ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b2);
 			break;
 		case 0xb6:
-			if (dev->flags & ATA_DFLAG_ZAC) {
+			if (dev->flags & ATA_DFLAG_ZAC)
 				ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b6);
-				break;
-			}
-			fallthrough;
+			else
+				ata_scsi_set_invalid_field(dev, cmd, 2, 0xff);
+			break;
+		case 0xb9:
+			if (dev->cpr_log)
+				ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b9);
+			else
+				ata_scsi_set_invalid_field(dev, cmd, 2, 0xff);
+			break;
 		default:
 			ata_scsi_set_invalid_field(dev, cmd, 2, 0xff);
 			break;
diff --git a/include/linux/ata.h b/include/linux/ata.h
index 1b44f40c7700..199e47e97d64 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -329,6 +329,7 @@ enum {
 	ATA_LOG_SECURITY	  = 0x06,
 	ATA_LOG_SATA_SETTINGS	  = 0x08,
 	ATA_LOG_ZONED_INFORMATION = 0x09,
+	ATA_LOG_CONCURRENT_POSITIONING_RANGES = 0x47,
 
 	/* Identify device SATA settings log:*/
 	ATA_LOG_DEVSLP_OFFSET	  = 0x30,
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 3fcd24236793..b159a245d88c 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -670,6 +670,18 @@ struct ata_ering {
 	struct ata_ering_entry	ring[ATA_ERING_SIZE];
 };
 
+struct ata_cpr {
+	u8			num;
+	u8			num_storage_elements;
+	u64			start_lba;
+	u64			num_lbas;
+};
+
+struct ata_cpr_log {
+	u8			nr_cpr;
+	struct ata_cpr		cpr[];
+};
+
 struct ata_device {
 	struct ata_link		*link;
 	unsigned int		devno;		/* 0 or 1 */
@@ -729,6 +741,9 @@ struct ata_device {
 	u32			zac_zones_optimal_nonseq;
 	u32			zac_zones_max_open;
 
+	/* Concurrent positioning ranges */
+	struct ata_cpr_log	*cpr_log;
+
 	/* error history */
 	int			spdn_cnt;
 	/* ering is CLEAR_END, read comment above CLEAR_END */
-- 
2.31.1



* [PATCH v3 4/4] doc: document sysfs queue/cranges attributes
  2021-07-26  1:38 [PATCH v3 0/4] Initial support for multi-actuator HDDs Damien Le Moal
                   ` (2 preceding siblings ...)
  2021-07-26  1:38 ` [PATCH v3 3/4] libata: support concurrent positioning ranges log Damien Le Moal
@ 2021-07-26  1:38 ` Damien Le Moal
  2021-07-26  7:35   ` Hannes Reinecke
  2021-08-10  8:27   ` Christoph Hellwig
  2021-07-28 22:59 ` [PATCH v3 0/4] Initial support for multi-actuator HDDs Damien Le Moal
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Damien Le Moal @ 2021-07-26  1:38 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen, linux-ide
  Cc: Hannes Reinecke

Update the file Documentation/block/queue-sysfs.rst to add a description
of the device queue sysfs entries related to concurrent sector ranges
(e.g. concurrent positioning ranges for multi-actuator hard-disks).

While at it, also fix a typo in this file's introduction paragraph.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 Documentation/block/queue-sysfs.rst | 30 ++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
index 4dc7f0d499a8..25f4a768f450 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -4,7 +4,7 @@ Queue sysfs files
 
 This text file will detail the queue files that are located in the sysfs tree
 for each block device. Note that stacked devices typically do not export
-any settings, since their queue merely functions are a remapping target.
+any settings, since their queue merely functions as a remapping target.
 These files are the ones found in the /sys/block/xxx/queue/ directory.
 
 Files denoted with a RO postfix are readonly and the RW postfix means
@@ -286,4 +286,32 @@ sequential zones of zoned block devices (devices with a zoned attributed
 that reports "host-managed" or "host-aware"). This value is always 0 for
 regular block devices.
 
+cranges (RO)
+------------
+
+The presence of this sub-directory of the /sys/block/xxx/queue/ directory
+indicates that the device is capable of executing requests targeting
+different sector ranges in parallel. For instance, single LUN multi-actuator
+hard-disks will likely have a cranges directory if the device correctly
+advertises the sector ranges of its actuators.
+
+The cranges directory contains one directory per concurrent range, with each
+range described using the sector (RO) attribute file to indicate the first
+sector of the range and the nr_sectors (RO) attribute file to indicate the
+total number of sectors in the range, starting from the first sector.
+For example, a dual-actuator hard disk will have the following cranges
+entries::
+
+        $ tree /sys/block/<device>/queue/cranges/
+        /sys/block/<device>/queue/cranges/
+        |-- 0
+        |   |-- nr_sectors
+        |   `-- sector
+        `-- 1
+            |-- nr_sectors
+            `-- sector
+
+The sector and nr_sectors attributes are reported in 512B sector units, regardless of
+the actual block size of the device.
+
 Jens Axboe <jens.axboe@oracle.com>, February 2009
-- 
2.31.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/4] block: Add concurrent positioning ranges support
  2021-07-26  1:38 ` [PATCH v3 1/4] block: Add concurrent positioning ranges support Damien Le Moal
@ 2021-07-26  7:33   ` Hannes Reinecke
  2021-07-26  8:30     ` Damien Le Moal
  2021-08-10  8:23   ` Christoph Hellwig
  1 sibling, 1 reply; 27+ messages in thread
From: Hannes Reinecke @ 2021-07-26  7:33 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 7/26/21 3:38 AM, Damien Le Moal wrote:
> The Concurrent Positioning Ranges VPD page (for SCSI) and Log (for ATA)
> contain parameters describing the number of sets of contiguous LBAs that
> can be served independently by a single LUN multi-actuator disk. This
> patch provides the blk_queue_set_cranges() function allowing a device
> driver to signal to the block layer that a disk has multiple actuators,
> each one serving a contiguous range of sectors. To describe the set
> of sector ranges representing the different actuators of a device, the
> data type struct blk_cranges is introduced.
> 
> For a device with multiple actuators, a struct blk_cranges is attached
> to the device request queue by the blk_queue_set_cranges() function. The
> function blk_alloc_cranges() is provided for drivers to allocate this
> structure.
> 
> The blk_cranges structure contains kobjects (struct kobject) to register
> with sysfs the set of sector ranges defined by a device. On initial
> device scan, this registration is done from blk_register_queue() using
> the block layer internal function blk_register_cranges(). If a driver
> calls blk_queue_set_cranges() for a registered queue, e.g. when a device
> is revalidated, blk_queue_set_cranges() will execute
> blk_register_cranges() to update the queue sysfs attribute files.
> 
> The sysfs file structure created starts from the cranges sub-directory
> and contains the start sector and number of sectors served by an
> actuator, with the information for each actuator grouped in one
> directory per actuator. E.g. for a dual actuator drive, we have:
> 
> $ tree /sys/block/sdk/queue/cranges/
> /sys/block/sdk/queue/cranges/
> |-- 0
> |   |-- nr_sectors
> |   `-- sector
> `-- 1
>      |-- nr_sectors
>      `-- sector
> 
> For a regular single actuator device, the cranges directory does not
> exist.
> 
> Device revalidation may lead to changes to this structure and to the
> attribute values. When manipulated, the queue sysfs_lock and
> sysfs_dir_lock are held for atomicity, similarly to how the blk-mq and
> elevator sysfs queue sub-directories are protected.
> 
> The code related to the management of cranges is added in the new
> file block/blk-cranges.c.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   block/Makefile         |   2 +-
>   block/blk-cranges.c    | 295 +++++++++++++++++++++++++++++++++++++++++
>   block/blk-sysfs.c      |  13 ++
>   block/blk.h            |   3 +
>   include/linux/blkdev.h |  29 ++++
>   5 files changed, 341 insertions(+), 1 deletion(-)
>   create mode 100644 block/blk-cranges.c
> 
> diff --git a/block/Makefile b/block/Makefile
> index bfbe4e13ca1e..e477e6ca9ea6 100644
> --- a/block/Makefile
> +++ b/block/Makefile
> @@ -9,7 +9,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
>   			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
>   			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
>   			genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
> -			disk-events.o
> +			disk-events.o blk-cranges.o
>   
>   obj-$(CONFIG_BOUNCE)		+= bounce.o
>   obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
> diff --git a/block/blk-cranges.c b/block/blk-cranges.c
> new file mode 100644
> index 000000000000..deaa09e564f7
> --- /dev/null
> +++ b/block/blk-cranges.c
> @@ -0,0 +1,295 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + *  Block device concurrent positioning ranges.
> + *
> + *  Copyright (C) 2021 Western Digital Corporation or its Affiliates.
> + */
> +#include <linux/kernel.h>
> +#include <linux/blkdev.h>
> +#include <linux/slab.h>
> +#include <linux/init.h>
> +
> +#include "blk.h"
> +
> +static ssize_t blk_crange_sector_show(struct blk_crange *cr, char *page)
> +{
> +	return sprintf(page, "%llu\n", cr->sector);
> +}
> +
> +static ssize_t blk_crange_nr_sectors_show(struct blk_crange *cr, char *page)
> +{
> +	return sprintf(page, "%llu\n", cr->nr_sectors);
> +}
> +
> +struct blk_crange_sysfs_entry {
> +	struct attribute attr;
> +	ssize_t (*show)(struct blk_crange *cr, char *page);
> +};
> +
> +static struct blk_crange_sysfs_entry blk_crange_sector_entry = {
> +	.attr = { .name = "sector", .mode = 0444 },
> +	.show = blk_crange_sector_show,
> +};
> +
> +static struct blk_crange_sysfs_entry blk_crange_nr_sectors_entry = {
> +	.attr = { .name = "nr_sectors", .mode = 0444 },
> +	.show = blk_crange_nr_sectors_show,
> +};
> +
> +static struct attribute *blk_crange_attrs[] = {
> +	&blk_crange_sector_entry.attr,
> +	&blk_crange_nr_sectors_entry.attr,
> +	NULL,
> +};
> +ATTRIBUTE_GROUPS(blk_crange);
> +
> +static ssize_t blk_crange_sysfs_show(struct kobject *kobj,
> +				     struct attribute *attr, char *page)
> +{
> +	struct blk_crange_sysfs_entry *entry;
> +	struct blk_crange *cr;
> +	struct request_queue *q;
> +	ssize_t ret;
> +
> +	entry = container_of(attr, struct blk_crange_sysfs_entry, attr);
> +	cr = container_of(kobj, struct blk_crange, kobj);
> +	q = cr->queue;
> +
> +	mutex_lock(&q->sysfs_lock);
> +	ret = entry->show(cr, page);
> +	mutex_unlock(&q->sysfs_lock);
> +
> +	return ret;
> +}
> +
> +static const struct sysfs_ops blk_crange_sysfs_ops = {
> +	.show	= blk_crange_sysfs_show,
> +};
> +
> +/*
> + * Dummy release function to make kobj happy.
> + */
> +static void blk_cranges_sysfs_nop_release(struct kobject *kobj)
> +{
> +}
> +
> +static struct kobj_type blk_crange_ktype = {
> +	.sysfs_ops	= &blk_crange_sysfs_ops,
> +	.default_groups	= blk_crange_groups,
> +	.release	= blk_cranges_sysfs_nop_release,
> +};
> +
> +static struct kobj_type blk_cranges_ktype = {
> +	.release	= blk_cranges_sysfs_nop_release,
> +};
> +
> +/**
> + * blk_register_cranges - register with sysfs a set of concurrent ranges
> + * @disk:		Target disk
> + * @new_cranges:	New set of concurrent ranges
> + *
> + * Register with sysfs a set of concurrent ranges for @disk. If @new_cranges
> + * is not NULL, this set of concurrent ranges is registered and the
> + * old set specified by q->cranges is unregistered. Otherwise, q->cranges
> + * is registered if it is not already.
> + */
> +int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges)
> +{
> +	struct request_queue *q = disk->queue;
> +	struct blk_cranges *cranges;
> +	int i, ret;
> +
> +	lockdep_assert_held(&q->sysfs_dir_lock);
> +	lockdep_assert_held(&q->sysfs_lock);
> +
> +	/* If a new range set is specified, unregister the old one */
> +	if (new_cranges) {
> +		if (q->cranges)
> +			blk_unregister_cranges(disk);
> +		q->cranges = new_cranges;
> +	}
> +
> +	cranges = q->cranges;
> +	if (!cranges)
> +		return 0;
> +
> +	/*
> +	 * At this point, q->cranges is the new set of sector ranges that needs
> +	 * to be registered with sysfs.
> +	 */
> +	WARN_ON(cranges->sysfs_registered);
> +	ret = kobject_init_and_add(&cranges->kobj, &blk_cranges_ktype,
> +				   &q->kobj, "%s", "cranges");
> +	if (ret)
> +		goto free;
> +
> +	for (i = 0; i < cranges->nr_ranges; i++) {
> +		cranges->ranges[i].queue = q;
> +		ret = kobject_init_and_add(&cranges->ranges[i].kobj,
> +					   &blk_crange_ktype, &cranges->kobj,
> +					   "%d", i);
> +		if (ret)
> +			goto delete_obj;
> +	}
> +
> +	cranges->sysfs_registered = true;
> +
> +	return 0;
> +
> +delete_obj:
> +	while (--i >= 0)
> +		kobject_del(&cranges->ranges[i].kobj);
> +	kobject_del(&cranges->kobj);
> +free:
> +	kfree(cranges);
> +	q->cranges = NULL;
> +	return ret;
> +}
> +
> +void blk_unregister_cranges(struct gendisk *disk)
> +{
> +	struct request_queue *q = disk->queue;
> +	struct blk_cranges *cranges = q->cranges;
> +	int i;
> +
> +	lockdep_assert_held(&q->sysfs_dir_lock);
> +	lockdep_assert_held(&q->sysfs_lock);
> +
> +	if (!cranges)
> +		return;
> +
> +	if (cranges->sysfs_registered) {
> +		for (i = 0; i < cranges->nr_ranges; i++)
> +			kobject_del(&cranges->ranges[i].kobj);
> +		kobject_del(&cranges->kobj);
> +	}
> +
> +	kfree(cranges);
> +	q->cranges = NULL;
> +}
> +
> +static bool blk_check_ranges(struct gendisk *disk, struct blk_cranges *cr)
> +{
> +	sector_t capacity = get_capacity(disk);
> +	sector_t min_sector = (sector_t)-1;
> +	sector_t max_sector = 0;
> +	int i;
> +
> +	/*
> +	 * Sector ranges may overlap but should overall contain all sectors
> +	 * within the disk capacity.
> +	 */
> +	for (i = 0; i < cr->nr_ranges; i++) {
> +		min_sector = min(min_sector, cr->ranges[i].sector);
> +		max_sector = max(max_sector, cr->ranges[i].sector +
> +					     cr->ranges[i].nr_sectors);
> +	}
> +
> +	if (min_sector != 0 || max_sector < capacity) {
> +		pr_warn("Invalid concurrent ranges: missing sectors\n");
> +		return false;
> +	}
> +
> +	if (max_sector > capacity) {
> +		pr_warn("Invalid concurrent ranges: beyond capacity\n");
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static bool blk_cranges_changed(struct gendisk *disk, struct blk_cranges *new)
> +{
> +	struct blk_cranges *old = disk->queue->cranges;
> +	int i;
> +
> +	if (!old)
> +		return true;
> +
> +	if (old->nr_ranges != new->nr_ranges)
> +		return true;
> +
> +	for (i = 0; i < old->nr_ranges; i++) {
> +		if (new->ranges[i].sector != old->ranges[i].sector ||
> +		    new->ranges[i].nr_sectors != old->ranges[i].nr_sectors)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +/**
> + * blk_alloc_cranges - Allocate a concurrent positioning range structure
> + * @disk:	target disk
> + * @nr_ranges:	Number of concurrent ranges
> + *
> + * Allocate a struct blk_cranges structure with @nr_ranges range descriptors.
> + */
> +struct blk_cranges *blk_alloc_cranges(struct gendisk *disk, int nr_ranges)
> +{
> +	struct blk_cranges *cr;
> +
> +	cr = kzalloc_node(struct_size(cr, ranges, nr_ranges), GFP_KERNEL,
> +			  disk->queue->node);
> +	if (cr)
> +		cr->nr_ranges = nr_ranges;
> +	return cr;
> +}
> +EXPORT_SYMBOL_GPL(blk_alloc_cranges);
> +
> +/**
> + * blk_queue_set_cranges - Set a disk concurrent positioning ranges
> + * @disk:	target disk
> + * @cr:		concurrent ranges structure
> + *
> + * Set the concurrent positioning ranges information of the request queue
> + * of @disk to @cr. If @cr is NULL, the concurrent ranges structure
> + * already set, if any, is cleared. If there are no differences between
> + * @cr and the concurrent ranges structure already set, @cr is freed.
> + */
> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr)
> +{
> +	struct request_queue *q = disk->queue;
> +
> +	if (WARN_ON_ONCE(cr && !cr->nr_ranges)) {
> +		kfree(cr);
> +		cr = NULL;
> +	}
> +
> +	mutex_lock(&q->sysfs_dir_lock);
> +	mutex_lock(&q->sysfs_lock);
> +
> +	if (cr) {
> +		if (!blk_check_ranges(disk, cr)) {
> +			kfree(cr);
> +			cr = NULL;
> +			goto reg;
> +		}
> +
> +		if (!blk_cranges_changed(disk, cr)) {
> +			kfree(cr);
> +			goto unlock;
> +		}
> +	}
> +
> +	/*
> +	 * This may be called for a registered queue. E.g. during a device
> +	 * revalidation. If that is the case, we need to unregister the old
> +	 * set of concurrent ranges and register the new set. If the queue
> +	 * is not registered, the device request queue registration will
> +	 * register the ranges, so only swap in the new set and free the
> +	 * old one.
> +	 */
> +reg:
> +	if (blk_queue_registered(q)) {
> +		blk_register_cranges(disk, cr);
> +	} else {
> +		swap(q->cranges, cr);
> +		kfree(cr);
> +	}
> +
> +unlock:
> +	mutex_unlock(&q->sysfs_lock);
> +	mutex_unlock(&q->sysfs_dir_lock);
> +}
> +EXPORT_SYMBOL_GPL(blk_queue_set_cranges);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 370d83c18057..aeac98ecc5a0 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -899,9 +899,21 @@ int blk_register_queue(struct gendisk *disk)
>   	}
>   
>   	mutex_lock(&q->sysfs_lock);
> +
> +	ret = blk_register_cranges(disk, NULL);
> +	if (ret) {
> +		mutex_unlock(&q->sysfs_lock);
> +		mutex_unlock(&q->sysfs_dir_lock);
> +		kobject_del(&q->kobj);
> +		blk_trace_remove_sysfs(dev);
> +		kobject_put(&dev->kobj);
> +		return ret;
> +	}
> +
>   	if (q->elevator) {
>   		ret = elv_register_queue(q, false);
>   		if (ret) {
> +			blk_unregister_cranges(disk);
>   			mutex_unlock(&q->sysfs_lock);
>   			mutex_unlock(&q->sysfs_dir_lock);
>   			kobject_del(&q->kobj);
> @@ -985,6 +997,7 @@ void blk_unregister_queue(struct gendisk *disk)
>   	mutex_lock(&q->sysfs_lock);
>   	if (q->elevator)
>   		elv_unregister_queue(q);
> +	blk_unregister_cranges(disk);
>   	mutex_unlock(&q->sysfs_lock);
>   	mutex_unlock(&q->sysfs_dir_lock);
>   
> diff --git a/block/blk.h b/block/blk.h
> index 4b885c0f6708..650c0d87987c 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -368,4 +368,7 @@ extern struct device_attribute dev_attr_events;
>   extern struct device_attribute dev_attr_events_async;
>   extern struct device_attribute dev_attr_events_poll_msecs;
>   
> +int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges);
> +void blk_unregister_cranges(struct gendisk *disk);
> +
>   #endif /* BLK_INTERNAL_H */
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index d3afea47ade6..d10352674d20 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -378,6 +378,29 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
>   
>   #endif /* CONFIG_BLK_DEV_ZONED */
>   
> +/*
> + * Concurrent sector ranges: struct blk_crange describes range of
> + * contiguous sectors that can be served by independent resources on the
> + * device. The set of ranges defined in struct blk_cranges must overall
> + * include all sectors within the device capacity.
> + * For a device with multiple ranges, e.g. a single LUN multi-actuator HDD,
> + * requests targeting sectors in different ranges can be executed in parallel.
> + * A request can straddle a range boundary.
> + */
> +struct blk_crange {
> +	struct kobject		kobj;
> +	struct request_queue	*queue;
> +	sector_t		sector;
> +	sector_t		nr_sectors;
> +};
> +
> +struct blk_cranges {
> +	struct kobject		kobj;
> +	bool			sysfs_registered;
> +	unsigned int		nr_ranges;
> +	struct blk_crange	ranges[];
> +};
> +
>   struct request_queue {
>   	struct request		*last_merge;
>   	struct elevator_queue	*elevator;
> @@ -570,6 +593,9 @@ struct request_queue {
>   
>   #define BLK_MAX_WRITE_HINTS	5
>   	u64			write_hints[BLK_MAX_WRITE_HINTS];
> +
> +	/* Concurrent sector ranges */
> +	struct blk_cranges	*cranges;
>   };
>   
>   /* Keep blk_queue_flag_name[] in sync with the definitions below */
> @@ -1163,6 +1189,9 @@ extern void blk_queue_required_elevator_features(struct request_queue *q,
>   extern bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
>   					      struct device *dev);
>   
> +struct blk_cranges *blk_alloc_cranges(struct gendisk *disk, int nr_ranges);
> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr);
> +
>   /*
>    * Number of physical segments as sent to the device.
>    *
> 
In principle it looks good, but what would be the appropriate action
when invalid ranges are detected during revalidation?
The current code leaves the original ones intact, but I guess that's
questionable, as the current settings are most likely invalid.
I would vote for deregistering the old ones and implementing an error
state (using an error pointer?); that would signal that there _are_
ranges, but we couldn't parse them properly.
Hmm?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 3/4] libata: support concurrent positioning ranges log
  2021-07-26  1:38 ` [PATCH v3 3/4] libata: support concurrent positioning ranges log Damien Le Moal
@ 2021-07-26  7:34   ` Hannes Reinecke
  2021-08-10  8:26   ` Christoph Hellwig
  1 sibling, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2021-07-26  7:34 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 7/26/21 3:38 AM, Damien Le Moal wrote:
> Add support to discover if an ATA device supports the Concurrent
> Positioning Ranges Log (address 0x47), indicating that the device is
> capable of seeking to multiple different locations in parallel using
> multiple actuators serving different LBA ranges.
> 
> Also add support to translate the concurrent positioning ranges log
> into its equivalent Concurrent Positioning Ranges VPD page B9h in
> libata-scsi.c.
> 
> The format of the Concurrent Positioning Ranges Log is defined in ACS-5
> r9.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   drivers/ata/libata-core.c | 52 +++++++++++++++++++++++++++++++++++++++
>   drivers/ata/libata-scsi.c | 48 +++++++++++++++++++++++++++++-------
>   include/linux/ata.h       |  1 +
>   include/linux/libata.h    | 15 +++++++++++
>   4 files changed, 107 insertions(+), 9 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/4] doc: document sysfs queue/cranges attributes
  2021-07-26  1:38 ` [PATCH v3 4/4] doc: document sysfs queue/cranges attributes Damien Le Moal
@ 2021-07-26  7:35   ` Hannes Reinecke
  2021-08-10  8:27   ` Christoph Hellwig
  1 sibling, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2021-07-26  7:35 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 7/26/21 3:38 AM, Damien Le Moal wrote:
> Update the file Documentation/block/queue-sysfs.rst to add a description
> of the device queue sysfs entries related to concurrent sector ranges
> (e.g. concurrent positioning ranges for multi-actuator hard-disks).
> 
> While at it, also fix a typo in this file's introduction paragraph.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   Documentation/block/queue-sysfs.rst | 30 ++++++++++++++++++++++++++++-
>   1 file changed, 29 insertions(+), 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/4] block: Add concurrent positioning ranges support
  2021-07-26  7:33   ` Hannes Reinecke
@ 2021-07-26  8:30     ` Damien Le Moal
  2021-07-26  8:47       ` Hannes Reinecke
  0 siblings, 1 reply; 27+ messages in thread
From: Damien Le Moal @ 2021-07-26  8:30 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 2021/07/26 16:34, Hannes Reinecke wrote:
> On 7/26/21 3:38 AM, Damien Le Moal wrote:
>> The Concurrent Positioning Ranges VPD page (for SCSI) and Log (for ATA)
>> contain parameters describing the number of sets of contiguous LBAs that
>> can be served independently by a single LUN multi-actuator disk. This
>> patch provides the blk_queue_set_cranges() function allowing a device
>> driver to signal to the block layer that a disk has multiple actuators,
>> each one serving a contiguous range of sectors. To describe the set
>> of sector ranges representing the different actuators of a device, the
>> data type struct blk_cranges is introduced.
>>
>> For a device with multiple actuators, a struct blk_cranges is attached
>> to the device request queue by the blk_queue_set_cranges() function. The
>> function blk_alloc_cranges() is provided for drivers to allocate this
>> structure.
>>
>> The blk_cranges structure contains kobjects (struct kobject) to register
>> with sysfs the set of sector ranges defined by a device. On initial
>> device scan, this registration is done from blk_register_queue() using
>> the block layer internal function blk_register_cranges(). If a driver
>> calls blk_queue_set_cranges() for a registered queue, e.g. when a device
>> is revalidated, blk_queue_set_cranges() will execute
>> blk_register_cranges() to update the queue sysfs attribute files.
>>
>> The sysfs file structure created starts from the cranges sub-directory
>> and contains the start sector and number of sectors served by an
>> actuator, with the information for each actuator grouped in one
>> directory per actuator. E.g. for a dual actuator drive, we have:
>>
>> $ tree /sys/block/sdk/queue/cranges/
>> /sys/block/sdk/queue/cranges/
>> |-- 0
>> |   |-- nr_sectors
>> |   `-- sector
>> `-- 1
>>      |-- nr_sectors
>>      `-- sector
>>
>> For a regular single actuator device, the cranges directory does not
>> exist.
>>
>> Device revalidation may lead to changes to this structure and to the
>> attribute values. When manipulated, the queue sysfs_lock and
>> sysfs_dir_lock are held for atomicity, similarly to how the blk-mq and
>> elevator sysfs queue sub-directories are protected.
>>
>> The code related to the management of cranges is added in the new
>> file block/blk-cranges.c.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>   block/Makefile         |   2 +-
>>   block/blk-cranges.c    | 295 +++++++++++++++++++++++++++++++++++++++++
>>   block/blk-sysfs.c      |  13 ++
>>   block/blk.h            |   3 +
>>   include/linux/blkdev.h |  29 ++++
>>   5 files changed, 341 insertions(+), 1 deletion(-)
>>   create mode 100644 block/blk-cranges.c
>>
>> diff --git a/block/Makefile b/block/Makefile
>> index bfbe4e13ca1e..e477e6ca9ea6 100644
>> --- a/block/Makefile
>> +++ b/block/Makefile
>> @@ -9,7 +9,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
>>   			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
>>   			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
>>   			genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
>> -			disk-events.o
>> +			disk-events.o blk-cranges.o
>>   
>>   obj-$(CONFIG_BOUNCE)		+= bounce.o
>>   obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
>> diff --git a/block/blk-cranges.c b/block/blk-cranges.c
>> new file mode 100644
>> index 000000000000..deaa09e564f7
>> --- /dev/null
>> +++ b/block/blk-cranges.c
>> @@ -0,0 +1,295 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + *  Block device concurrent positioning ranges.
>> + *
>> + *  Copyright (C) 2021 Western Digital Corporation or its Affiliates.
>> + */
>> +#include <linux/kernel.h>
>> +#include <linux/blkdev.h>
>> +#include <linux/slab.h>
>> +#include <linux/init.h>
>> +
>> +#include "blk.h"
>> +
>> +static ssize_t blk_crange_sector_show(struct blk_crange *cr, char *page)
>> +{
>> +	return sprintf(page, "%llu\n", cr->sector);
>> +}
>> +
>> +static ssize_t blk_crange_nr_sectors_show(struct blk_crange *cr, char *page)
>> +{
>> +	return sprintf(page, "%llu\n", cr->nr_sectors);
>> +}
>> +
>> +struct blk_crange_sysfs_entry {
>> +	struct attribute attr;
>> +	ssize_t (*show)(struct blk_crange *cr, char *page);
>> +};
>> +
>> +static struct blk_crange_sysfs_entry blk_crange_sector_entry = {
>> +	.attr = { .name = "sector", .mode = 0444 },
>> +	.show = blk_crange_sector_show,
>> +};
>> +
>> +static struct blk_crange_sysfs_entry blk_crange_nr_sectors_entry = {
>> +	.attr = { .name = "nr_sectors", .mode = 0444 },
>> +	.show = blk_crange_nr_sectors_show,
>> +};
>> +
>> +static struct attribute *blk_crange_attrs[] = {
>> +	&blk_crange_sector_entry.attr,
>> +	&blk_crange_nr_sectors_entry.attr,
>> +	NULL,
>> +};
>> +ATTRIBUTE_GROUPS(blk_crange);
>> +
>> +static ssize_t blk_crange_sysfs_show(struct kobject *kobj,
>> +				     struct attribute *attr, char *page)
>> +{
>> +	struct blk_crange_sysfs_entry *entry;
>> +	struct blk_crange *cr;
>> +	struct request_queue *q;
>> +	ssize_t ret;
>> +
>> +	entry = container_of(attr, struct blk_crange_sysfs_entry, attr);
>> +	cr = container_of(kobj, struct blk_crange, kobj);
>> +	q = cr->queue;
>> +
>> +	mutex_lock(&q->sysfs_lock);
>> +	ret = entry->show(cr, page);
>> +	mutex_unlock(&q->sysfs_lock);
>> +
>> +	return ret;
>> +}
>> +
>> +static const struct sysfs_ops blk_crange_sysfs_ops = {
>> +	.show	= blk_crange_sysfs_show,
>> +};
>> +
>> +/*
>> + * Dummy release function to make kobj happy.
>> + */
>> +static void blk_cranges_sysfs_nop_release(struct kobject *kobj)
>> +{
>> +}
>> +
>> +static struct kobj_type blk_crange_ktype = {
>> +	.sysfs_ops	= &blk_crange_sysfs_ops,
>> +	.default_groups	= blk_crange_groups,
>> +	.release	= blk_cranges_sysfs_nop_release,
>> +};
>> +
>> +static struct kobj_type blk_cranges_ktype = {
>> +	.release	= blk_cranges_sysfs_nop_release,
>> +};
>> +
>> +/**
>> + * blk_register_cranges - register with sysfs a set of concurrent ranges
>> + * @disk:		Target disk
>> + * @new_cranges:	New set of concurrent ranges
>> + *
>> + * Register with sysfs a set of concurrent ranges for @disk. If @new_cranges
>> + * is not NULL, this set of concurrent ranges is registered and the
>> + * old set specified by q->cranges is unregistered. Otherwise, q->cranges
>> + * is registered if it is not already.
>> + */
>> +int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges)
>> +{
>> +	struct request_queue *q = disk->queue;
>> +	struct blk_cranges *cranges;
>> +	int i, ret;
>> +
>> +	lockdep_assert_held(&q->sysfs_dir_lock);
>> +	lockdep_assert_held(&q->sysfs_lock);
>> +
>> +	/* If a new range set is specified, unregister the old one */
>> +	if (new_cranges) {
>> +		if (q->cranges)
>> +			blk_unregister_cranges(disk);
>> +		q->cranges = new_cranges;
>> +	}
>> +
>> +	cranges = q->cranges;
>> +	if (!cranges)
>> +		return 0;
>> +
>> +	/*
>> +	 * At this point, q->cranges is the new set of sector ranges that needs
>> +	 * to be registered with sysfs.
>> +	 */
>> +	WARN_ON(cranges->sysfs_registered);
>> +	ret = kobject_init_and_add(&cranges->kobj, &blk_cranges_ktype,
>> +				   &q->kobj, "%s", "cranges");
>> +	if (ret)
>> +		goto free;
>> +
>> +	for (i = 0; i < cranges->nr_ranges; i++) {
>> +		cranges->ranges[i].queue = q;
>> +		ret = kobject_init_and_add(&cranges->ranges[i].kobj,
>> +					   &blk_crange_ktype, &cranges->kobj,
>> +					   "%d", i);
>> +		if (ret)
>> +			goto delete_obj;
>> +	}
>> +
>> +	cranges->sysfs_registered = true;
>> +
>> +	return 0;
>> +
>> +delete_obj:
>> +	while (--i >= 0)
>> +		kobject_del(&cranges->ranges[i].kobj);
>> +	kobject_del(&cranges->kobj);
>> +free:
>> +	kfree(cranges);
>> +	q->cranges = NULL;
>> +	return ret;
>> +}
>> +
>> +void blk_unregister_cranges(struct gendisk *disk)
>> +{
>> +	struct request_queue *q = disk->queue;
>> +	struct blk_cranges *cranges = q->cranges;
>> +	int i;
>> +
>> +	lockdep_assert_held(&q->sysfs_dir_lock);
>> +	lockdep_assert_held(&q->sysfs_lock);
>> +
>> +	if (!cranges)
>> +		return;
>> +
>> +	if (cranges->sysfs_registered) {
>> +		for (i = 0; i < cranges->nr_ranges; i++)
>> +			kobject_del(&cranges->ranges[i].kobj);
>> +		kobject_del(&cranges->kobj);
>> +	}
>> +
>> +	kfree(cranges);
>> +	q->cranges = NULL;
>> +}
>> +
>> +static bool blk_check_ranges(struct gendisk *disk, struct blk_cranges *cr)
>> +{
>> +	sector_t capacity = get_capacity(disk);
>> +	sector_t min_sector = (sector_t)-1;
>> +	sector_t max_sector = 0;
>> +	int i;
>> +
>> +	/*
>> +	 * Sector ranges may overlap but should overall contain all sectors
>> +	 * within the disk capacity.
>> +	 */
>> +	for (i = 0; i < cr->nr_ranges; i++) {
>> +		min_sector = min(min_sector, cr->ranges[i].sector);
>> +		max_sector = max(max_sector, cr->ranges[i].sector +
>> +					     cr->ranges[i].nr_sectors);
>> +	}
>> +
>> +	if (min_sector != 0 || max_sector < capacity) {
>> +		pr_warn("Invalid concurrent ranges: missing sectors\n");
>> +		return false;
>> +	}
>> +
>> +	if (max_sector > capacity) {
>> +		pr_warn("Invalid concurrent ranges: beyond capacity\n");
>> +		return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +static bool blk_cranges_changed(struct gendisk *disk, struct blk_cranges *new)
>> +{
>> +	struct blk_cranges *old = disk->queue->cranges;
>> +	int i;
>> +
>> +	if (!old)
>> +		return true;
>> +
>> +	if (old->nr_ranges != new->nr_ranges)
>> +		return true;
>> +
>> +	for (i = 0; i < old->nr_ranges; i++) {
>> +		if (new->ranges[i].sector != old->ranges[i].sector ||
>> +		    new->ranges[i].nr_sectors != old->ranges[i].nr_sectors)
>> +			return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> +/**
>> + * blk_alloc_cranges - Allocate a concurrent positioning range structure
>> + * @disk:	target disk
>> + * @nr_ranges:	Number of concurrent ranges
>> + *
>> + * Allocate a struct blk_cranges structure with @nr_ranges range descriptors.
>> + */
>> +struct blk_cranges *blk_alloc_cranges(struct gendisk *disk, int nr_ranges)
>> +{
>> +	struct blk_cranges *cr;
>> +
>> +	cr = kzalloc_node(struct_size(cr, ranges, nr_ranges), GFP_KERNEL,
>> +			  disk->queue->node);
>> +	if (cr)
>> +		cr->nr_ranges = nr_ranges;
>> +	return cr;
>> +}
>> +EXPORT_SYMBOL_GPL(blk_alloc_cranges);
>> +
>> +/**
>> + * blk_queue_set_cranges - Set a disk concurrent positioning ranges
>> + * @disk:	target disk
>> + * @cr:		concurrent ranges structure
>> + *
>> + * Set the concurrent positioning ranges information of the request queue
>> + * of @disk to @cr. If @cr is NULL, the concurrent ranges structure
>> + * already set, if any, is cleared. If there are no differences between
>> + * @cr and the concurrent ranges structure already set, @cr is freed.
>> + */
>> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr)
>> +{
>> +	struct request_queue *q = disk->queue;
>> +
>> +	if (WARN_ON_ONCE(cr && !cr->nr_ranges)) {
>> +		kfree(cr);
>> +		cr = NULL;
>> +	}
>> +
>> +	mutex_lock(&q->sysfs_dir_lock);
>> +	mutex_lock(&q->sysfs_lock);
>> +
>> +	if (cr) {
>> +		if (!blk_check_ranges(disk, cr)) {
>> +			kfree(cr);
>> +			cr = NULL;
>> +			goto reg;
>> +		}
>> +
>> +		if (!blk_cranges_changed(disk, cr)) {
>> +			kfree(cr);
>> +			goto unlock;
>> +		}
>> +	}
>> +
>> +	/*
>> +	 * This may be called for a registered queue. E.g. during a device
>> +	 * revalidation. If that is the case, we need to unregister the old
>> +	 * set of concurrent ranges and register the new set. If the queue
>> +	 * is not registered, the device request queue registration will
>> +	 * register the ranges, so only swap in the new set and free the
>> +	 * old one.
>> +	 */
>> +reg:
>> +	if (blk_queue_registered(q)) {
>> +		blk_register_cranges(disk, cr);
>> +	} else {
>> +		swap(q->cranges, cr);
>> +		kfree(cr);
>> +	}
>> +
>> +unlock:
>> +	mutex_unlock(&q->sysfs_lock);
>> +	mutex_unlock(&q->sysfs_dir_lock);
>> +}
>> +EXPORT_SYMBOL_GPL(blk_queue_set_cranges);
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index 370d83c18057..aeac98ecc5a0 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -899,9 +899,21 @@ int blk_register_queue(struct gendisk *disk)
>>   	}
>>   
>>   	mutex_lock(&q->sysfs_lock);
>> +
>> +	ret = blk_register_cranges(disk, NULL);
>> +	if (ret) {
>> +		mutex_unlock(&q->sysfs_lock);
>> +		mutex_unlock(&q->sysfs_dir_lock);
>> +		kobject_del(&q->kobj);
>> +		blk_trace_remove_sysfs(dev);
>> +		kobject_put(&dev->kobj);
>> +		return ret;
>> +	}
>> +
>>   	if (q->elevator) {
>>   		ret = elv_register_queue(q, false);
>>   		if (ret) {
>> +			blk_unregister_cranges(disk);
>>   			mutex_unlock(&q->sysfs_lock);
>>   			mutex_unlock(&q->sysfs_dir_lock);
>>   			kobject_del(&q->kobj);
>> @@ -985,6 +997,7 @@ void blk_unregister_queue(struct gendisk *disk)
>>   	mutex_lock(&q->sysfs_lock);
>>   	if (q->elevator)
>>   		elv_unregister_queue(q);
>> +	blk_unregister_cranges(disk);
>>   	mutex_unlock(&q->sysfs_lock);
>>   	mutex_unlock(&q->sysfs_dir_lock);
>>   
>> diff --git a/block/blk.h b/block/blk.h
>> index 4b885c0f6708..650c0d87987c 100644
>> --- a/block/blk.h
>> +++ b/block/blk.h
>> @@ -368,4 +368,7 @@ extern struct device_attribute dev_attr_events;
>>   extern struct device_attribute dev_attr_events_async;
>>   extern struct device_attribute dev_attr_events_poll_msecs;
>>   
>> +int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges);
>> +void blk_unregister_cranges(struct gendisk *disk);
>> +
>>   #endif /* BLK_INTERNAL_H */
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index d3afea47ade6..d10352674d20 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -378,6 +378,29 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
>>   
>>   #endif /* CONFIG_BLK_DEV_ZONED */
>>   
>> +/*
>> + * Concurrent sector ranges: struct blk_crange describes range of
>> + * contiguous sectors that can be served by independent resources on the
>> + * device. The set of ranges defined in struct blk_cranges must overall
>> + * include all sectors within the device capacity.
>> + * For a device with multiple ranges, e.g. a single LUN multi-actuator HDD,
>> + * requests targeting sectors in different ranges can be executed in parallel.
>> + * A request can straddle a range boundary.
>> + */
>> +struct blk_crange {
>> +	struct kobject		kobj;
>> +	struct request_queue	*queue;
>> +	sector_t		sector;
>> +	sector_t		nr_sectors;
>> +};
>> +
>> +struct blk_cranges {
>> +	struct kobject		kobj;
>> +	bool			sysfs_registered;
>> +	unsigned int		nr_ranges;
>> +	struct blk_crange	ranges[];
>> +};
>> +
>>   struct request_queue {
>>   	struct request		*last_merge;
>>   	struct elevator_queue	*elevator;
>> @@ -570,6 +593,9 @@ struct request_queue {
>>   
>>   #define BLK_MAX_WRITE_HINTS	5
>>   	u64			write_hints[BLK_MAX_WRITE_HINTS];
>> +
>> +	/* Concurrent sector ranges */
>> +	struct blk_cranges	*cranges;
>>   };
>>   
>>   /* Keep blk_queue_flag_name[] in sync with the definitions below */
>> @@ -1163,6 +1189,9 @@ extern void blk_queue_required_elevator_features(struct request_queue *q,
>>   extern bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
>>   					      struct device *dev);
>>   
>> +struct blk_cranges *blk_alloc_cranges(struct gendisk *disk, int nr_ranges);
>> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr);
>> +
>>   /*
>>    * Number of physical segments as sent to the device.
>>    *
>>
> In principle it looks good, but what would be the appropriate action 
> when invalid ranges are being detected during revalidation?
> The current code will leave the original ones intact, but I guess that's 
> questionable as the current settings are most likely invalid.

Nope. In that case, the old ranges are removed. In blk_queue_set_cranges(),
there is:

+		if (!blk_check_ranges(disk, cr)) {
+			kfree(cr);
+			cr = NULL;
+			goto reg;
+		}

So for incorrect ranges, we will register "NULL", so no ranges. The old ranges
are gone.
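For illustration, the validity rule that blk_check_ranges() presumably enforces here — per the struct blk_cranges comment, the set of ranges must overall include all sectors within the device capacity — can be sketched in user space. This is a hypothetical sketch: the function and variable names are made up, and blk_check_ranges() itself is not quoted in this hunk, so it may check more or less than this.

```python
def ranges_cover_capacity(ranges, capacity):
    """Check that (sector, nr_sectors) ranges exactly tile [0, capacity).

    Sketch of the rule stated in the struct blk_cranges comment; names
    are illustrative, not taken from the patch.
    """
    pos = 0
    for sector, nr_sectors in sorted(ranges):
        if sector != pos or nr_sectors == 0:
            return False  # gap, overlap, or empty range
        pos += nr_sectors
    return pos == capacity

# A dual-actuator split passes; a layout with a hole is rejected, which
# corresponds to the "register NULL, drop the old ranges" case above.
print(ranges_cover_capacity([(0, 500), (500, 500)], 1000))  # True
print(ranges_cover_capacity([(0, 400), (500, 500)], 1000))  # False
```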

> I would vote for de-register the old ones and implement an error state 
> (using an error pointer?); that would signal that there _are_ ranges, 
> but we couldn't parse them properly.
> Hmm?

With the current code, the information "there are ranges" will be completely
gone if the ranges are bad... dmesg will have a message about it, but that's it.


> 
> Cheers,
> 
> Hannes
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/4] block: Add concurrent positioning ranges support
  2021-07-26  8:30     ` Damien Le Moal
@ 2021-07-26  8:47       ` Hannes Reinecke
  2021-07-26 11:33         ` Damien Le Moal
  0 siblings, 1 reply; 27+ messages in thread
From: Hannes Reinecke @ 2021-07-26  8:47 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 7/26/21 10:30 AM, Damien Le Moal wrote:
> On 2021/07/26 16:34, Hannes Reinecke wrote:
[ .. ]
>> In principle it looks good, but what would be the appropriate action
>> when invalid ranges are being detected during revalidation?
>> The current code will leave the original ones intact, but I guess that's
>> questionable as the current settings are most likely invalid.
> 
> Nope. In that case, the old ranges are removed. In blk_queue_set_cranges(),
> there is:
> 
> +		if (!blk_check_ranges(disk, cr)) {
> +			kfree(cr);
> +			cr = NULL;
> +			goto reg;
> +		}
> 
> So for incorrect ranges, we will register "NULL", so no ranges. The old ranges
> are gone.
> 

Right. So that's the first concern addressed.

>> I would vote for de-register the old ones and implement an error state
>> (using an error pointer?); that would signal that there _are_ ranges,
>> but we couldn't parse them properly.
>> Hmm?
> 
> With the current code, the information "there are ranges" will be completely
> gone if the ranges are bad... dmesg will have a message about it, but that's it.
> 
So there will be no additional information in sysfs in case of incorrect 
ranges?

Hmm. Not sure if I like that, but then it might be the best option after 
all. So you can add my:

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH v3 1/4] block: Add concurrent positioning ranges support
  2021-07-26  8:47       ` Hannes Reinecke
@ 2021-07-26 11:33         ` Damien Le Moal
  2021-07-27 14:07           ` Paolo Valente
  0 siblings, 1 reply; 27+ messages in thread
From: Damien Le Moal @ 2021-07-26 11:33 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 2021/07/26 17:47, Hannes Reinecke wrote:
> On 7/26/21 10:30 AM, Damien Le Moal wrote:
>> On 2021/07/26 16:34, Hannes Reinecke wrote:
> [ .. ]
>>> In principle it looks good, but what would be the appropriate action
>>> when invalid ranges are being detected during revalidation?
>>> The current code will leave the original ones intact, but I guess that's
>>> questionable as the current settings are most likely invalid.
>>
>> Nope. In that case, the old ranges are removed. In blk_queue_set_cranges(),
>> there is:
>>
>> +		if (!blk_check_ranges(disk, cr)) {
>> +			kfree(cr);
>> +			cr = NULL;
>> +			goto reg;
>> +		}
>>
>> So for incorrect ranges, we will register "NULL", so no ranges. The old ranges
>> are gone.
>>
> 
> Right. So that's the first concern addressed.

Note that at the scsi layer, if there is an error retrieving the range
information, blk_queue_set_cranges(q, NULL) is called, so the same happens:
the ranges already set are removed and no range information will appear in
sysfs.

> 
>>> I would vote for de-register the old ones and implement an error state
>>> (using an error pointer?); that would signal that there _are_ ranges,
>>> but we couldn't parse them properly.
>>> Hmm?
>>
>> With the current code, the information "there are ranges" will be completely
>> gone if the ranges are bad... dmesg will have a message about it, but that's it.
>>
> So there will be no additional information in sysfs in case of incorrect 
> ranges?

Yep, there will be no queue/cranges directory. The drive will be the same as a
single actuator one.

> Hmm. Not sure if I like that, but then it might be the best option after 
> all. So you can add my:

There is not much we can do. If we fail to retrieve the ranges, or the ranges
are incorrect, access optimization by an FS or scheduler is not really
possible. Note that the drive will still work; only any eventual optimization
will be turned off.
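From user space, then, the presence check is simply whether the queue/cranges directory exists. Below is a self-contained sketch against a mock tree; the per-range attribute names (sector, nr_sectors) are guesses based on struct blk_crange — the real names are defined in patch 4, which is not quoted here.

```python
import os
import tempfile

# Mock of /sys/block/sda/queue/cranges for a dual-actuator drive.
# Attribute names are assumptions based on struct blk_crange.
cranges = os.path.join(tempfile.mkdtemp(), "cranges")
for i, (sector, nr_sectors) in enumerate([(0, 1000), (1000, 1000)]):
    rng = os.path.join(cranges, str(i))
    os.makedirs(rng)
    with open(os.path.join(rng, "sector"), "w") as f:
        f.write(f"{sector}\n")
    with open(os.path.join(rng, "nr_sectors"), "w") as f:
        f.write(f"{nr_sectors}\n")

if os.path.isdir(cranges):
    nr_ranges = len(os.listdir(cranges))
    print(f"multi-actuator: {nr_ranges} ranges")
else:
    # No cranges directory: either a single-actuator drive or a drive
    # whose reported ranges were invalid and therefore dropped.
    nr_ranges = 0
    print("single actuator (or invalid range info)")
```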

> Reviewed-by: Hannes Reinecke <hare@suse.de>

Thanks !

> 
> Cheers,
> 
> Hannes
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v3 1/4] block: Add concurrent positioning ranges support
  2021-07-26 11:33         ` Damien Le Moal
@ 2021-07-27 14:07           ` Paolo Valente
  2021-07-27 23:44             ` Damien Le Moal
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Valente @ 2021-07-27 14:07 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide



> Il giorno 26 lug 2021, alle ore 13:33, Damien Le Moal <Damien.LeMoal@wdc.com> ha scritto:
> 
> On 2021/07/26 17:47, Hannes Reinecke wrote:
>> On 7/26/21 10:30 AM, Damien Le Moal wrote:
>>> On 2021/07/26 16:34, Hannes Reinecke wrote:
>> [ .. ]
>>>> In principle it looks good, but what would be the appropriate action
>>>> when invalid ranges are being detected during revalidation?
>>>> The current code will leave the original ones intact, but I guess that's
>>>> questionable as the current settings are most likely invalid.
>>> 
>>> Nope. In that case, the old ranges are removed. In blk_queue_set_cranges(),
>>> there is:
>>> 
>>> +		if (!blk_check_ranges(disk, cr)) {
>>> +			kfree(cr);
>>> +			cr = NULL;
>>> +			goto reg;
>>> +		}
>>> 
>>> So for incorrect ranges, we will register "NULL", so no ranges. The old ranges
>>> are gone.
>>> 
>> 
>> Right. So that's the first concern addressed.
> 
> Note that at the scsi layer, if there is an error retrieving the range
> information, blk_queue_set_cranges(q, NULL) is called, so the same happens:
> the ranges already set are removed and no range information will appear in
> sysfs.
> 

As a very personal opinion, silent failures are often misleading when
trying to understand what is going wrong in a system.  But I guess
this is nevertheless the best option.

Thanks,
Paolo

>> 
>>>> I would vote for de-register the old ones and implement an error state
>>>> (using an error pointer?); that would signal that there _are_ ranges,
>>>> but we couldn't parse them properly.
>>>> Hmm?
>>> 
>>> With the current code, the information "there are ranges" will be completely
>>> gone if the ranges are bad... dmesg will have a message about it, but that's it.
>>> 
>> So there will be no additional information in sysfs in case of incorrect 
>> ranges?
> 
> Yep, there will be no queue/cranges directory. The drive will be the same as a
> single actuator one.
> 
>> Hmm. Not sure if I like that, but then it might be the best option after 
>> all. So you can add my:
> 
> Nothing much that we can do. If we fail to retrieve the ranges, or the ranges
> are incorrect, access optimization by FS or scheduler is not really possible.
> Note that the drive will still work. Only any eventual optimization will be
> turned off.
> 
>> Reviewed-by: Hannes Reinecke <hare@suse.de>
> 
> Thanks !
> 
>> 
>> Cheers,
>> 
>> Hannes
>> 
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research



* Re: [PATCH v3 1/4] block: Add concurrent positioning ranges support
  2021-07-27 14:07           ` Paolo Valente
@ 2021-07-27 23:44             ` Damien Le Moal
  0 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2021-07-27 23:44 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 2021/07/27 23:07, Paolo Valente wrote:
> 
> 
>> Il giorno 26 lug 2021, alle ore 13:33, Damien Le Moal <Damien.LeMoal@wdc.com> ha scritto:
>>
>> On 2021/07/26 17:47, Hannes Reinecke wrote:
>>> On 7/26/21 10:30 AM, Damien Le Moal wrote:
>>>> On 2021/07/26 16:34, Hannes Reinecke wrote:
>>> [ .. ]
>>>>> In principle it looks good, but what would be the appropriate action
>>>>> when invalid ranges are being detected during revalidation?
>>>>> The current code will leave the original ones intact, but I guess that's
>>>>> questionable as the current settings are most likely invalid.
>>>>
>>>> Nope. In that case, the old ranges are removed. In blk_queue_set_cranges(),
>>>> there is:
>>>>
>>>> +		if (!blk_check_ranges(disk, cr)) {
>>>> +			kfree(cr);
>>>> +			cr = NULL;
>>>> +			goto reg;
>>>> +		}
>>>>
>>>> So for incorrect ranges, we will register "NULL", so no ranges. The old ranges
>>>> are gone.
>>>>
>>>
>>> Right. So that's the first concern addressed.
>>
>> Note that at the scsi layer, if there is an error retrieving the range
>> information, blk_queue_set_cranges(q, NULL) is called, so the same happens:
>> the ranges already set are removed and no range information will appear in
>> sysfs.
>>
> 
> As a very personal opinion, silent failures are often misleading when
> trying to understand what is going wrong in a system.  But I guess
> this is nevertheless the best option.

Failures are not silent: error messages are printed and will be visible in dmesg.

We could always ignore the drive and fail its initialization entirely when
the cranges initialization fails. But I find that rather extreme, since the
drive is supposed to work anyway, even without any access optimization for
the multiple-cranges case.

> 
> Thanks,
> Paolo
> 
>>>
>>>>> I would vote for de-register the old ones and implement an error state
>>>>> (using an error pointer?); that would signal that there _are_ ranges,
>>>>> but we couldn't parse them properly.
>>>>> Hmm?
>>>>
>>>> With the current code, the information "there are ranges" will be completely
>>>> gone if the ranges are bad... dmesg will have a message about it, but that's it.
>>>>
>>> So there will be no additional information in sysfs in case of incorrect 
>>> ranges?
>>
>> Yep, there will be no queue/cranges directory. The drive will be the same as a
>> single actuator one.
>>
>>> Hmm. Not sure if I like that, but then it might be the best option after 
>>> all. So you can add my:
>>
>> Nothing much that we can do. If we fail to retrieve the ranges, or the ranges
>> are incorrect, access optimization by FS or scheduler is not really possible.
>> Note that the drive will still work. Only any eventual optimization will be
>> turned off.
>>
>>> Reviewed-by: Hannes Reinecke <hare@suse.de>
>>
>> Thanks !
>>
>>>
>>> Cheers,
>>>
>>> Hannes
>>>
>>
>>
>> -- 
>> Damien Le Moal
>> Western Digital Research
> 
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v3 0/4] Initial support for multi-actuator HDDs
  2021-07-26  1:38 [PATCH v3 0/4] Initial support for multi-actuator HDDs Damien Le Moal
                   ` (3 preceding siblings ...)
  2021-07-26  1:38 ` [PATCH v3 4/4] doc: document sysfs queue/cranges attributes Damien Le Moal
@ 2021-07-28 22:59 ` Damien Le Moal
  2021-08-06  2:12 ` Damien Le Moal
  2021-08-06  3:41 ` Martin K. Petersen
  6 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2021-07-28 22:59 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen, linux-ide
  Cc: Hannes Reinecke

On 2021/07/26 10:38, Damien Le Moal wrote:
> Single LUN multi-actuator hard-disks are capable of seeking and
> executing multiple commands in parallel. This capability is exposed to
> the host using the Concurrent Positioning Ranges VPD page (SCSI) and
> Log (ATA). Each positioning range describes the contiguous set of LBAs
> that an actuator serves.
> 
> This series adds support to the scsi disk driver to retrieve this
> information and advertise it to user space through sysfs. libata is also
> modified to handle ATA drives.
> 
> The first patch adds the block layer plumbing to expose concurrent
> sector ranges of the device through sysfs as a sub-directory of the
> device sysfs queue directory. Patches 2 and 3 add support to sd and
> libata. Finally, patch 4 documents the sysfs queue attribute changes.
> 
> This series does not attempt in any way to optimize accesses to
> multi-actuator devices (e.g. block IO scheduler or filesystems). This
> initial support only exposes the actuators information to user space
> through sysfs.

Jens, Martin,

Any comment on this series ?

> 
> Changes from v2:
> * Update patch 1 to fix a compilation warning for a potential NULL
>   pointer dereference of the cr argument of blk_queue_set_cranges().
>   Warning reported by the kernel test robot <lkp@intel.com>.
> 
> Changes from v1:
> * Moved libata-scsi hunk from patch 1 to patch 3 where it belongs
> * Fixed uninitialized variable in patch 2
>   Reported-by: kernel test robot <lkp@intel.com>
>   Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> * Changed patch 3 adding struct ata_cpr_log to contain both the number
>   of concurrent ranges and the array of concurrent ranges.
> * Added a note in the documentation (patch 4) about the unit used for
>   the concurrent ranges attributes.
> 
> Damien Le Moal (4):
>   block: Add concurrent positioning ranges support
>   scsi: sd: add concurrent positioning ranges support
>   libata: support concurrent positioning ranges log
>   doc: document sysfs queue/cranges attributes
> 
>  Documentation/block/queue-sysfs.rst |  30 ++-
>  block/Makefile                      |   2 +-
>  block/blk-cranges.c                 | 295 ++++++++++++++++++++++++++++
>  block/blk-sysfs.c                   |  13 ++
>  block/blk.h                         |   3 +
>  drivers/ata/libata-core.c           |  52 +++++
>  drivers/ata/libata-scsi.c           |  48 ++++-
>  drivers/scsi/sd.c                   |  81 ++++++++
>  drivers/scsi/sd.h                   |   1 +
>  include/linux/ata.h                 |   1 +
>  include/linux/blkdev.h              |  29 +++
>  include/linux/libata.h              |  15 ++
>  12 files changed, 559 insertions(+), 11 deletions(-)
>  create mode 100644 block/blk-cranges.c
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v3 0/4] Initial support for multi-actuator HDDs
  2021-07-26  1:38 [PATCH v3 0/4] Initial support for multi-actuator HDDs Damien Le Moal
                   ` (4 preceding siblings ...)
  2021-07-28 22:59 ` [PATCH v3 0/4] Initial support for multi-actuator HDDs Damien Le Moal
@ 2021-08-06  2:12 ` Damien Le Moal
  2021-08-06  3:41 ` Martin K. Petersen
  6 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2021-08-06  2:12 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen, linux-ide
  Cc: Hannes Reinecke

On 2021/07/26 10:38, Damien Le Moal wrote:
> Single LUN multi-actuator hard-disks are capable of seeking and
> executing multiple commands in parallel. This capability is exposed to
> the host using the Concurrent Positioning Ranges VPD page (SCSI) and
> Log (ATA). Each positioning range describes the contiguous set of LBAs
> that an actuator serves.
> 
> This series adds support to the scsi disk driver to retrieve this
> information and advertise it to user space through sysfs. libata is also
> modified to handle ATA drives.
> 
> The first patch adds the block layer plumbing to expose concurrent
> sector ranges of the device through sysfs as a sub-directory of the
> device sysfs queue directory. Patches 2 and 3 add support to sd and
> libata. Finally, patch 4 documents the sysfs queue attribute changes.
> 
> This series does not attempt in any way to optimize accesses to
> multi-actuator devices (e.g. block IO scheduler or filesystems). This
> initial support only exposes the actuators information to user space
> through sysfs.

Jens, Martin,

re-ping... Any comment on this series ?

> 
> Changes from v2:
> * Update patch 1 to fix a compilation warning for a potential NULL
>   pointer dereference of the cr argument of blk_queue_set_cranges().
>   Warning reported by the kernel test robot <lkp@intel.com>.
> 
> Changes from v1:
> * Moved libata-scsi hunk from patch 1 to patch 3 where it belongs
> * Fixed uninitialized variable in patch 2
>   Reported-by: kernel test robot <lkp@intel.com>
>   Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> * Changed patch 3 adding struct ata_cpr_log to contain both the number
>   of concurrent ranges and the array of concurrent ranges.
> * Added a note in the documentation (patch 4) about the unit used for
>   the concurrent ranges attributes.
> 
> Damien Le Moal (4):
>   block: Add concurrent positioning ranges support
>   scsi: sd: add concurrent positioning ranges support
>   libata: support concurrent positioning ranges log
>   doc: document sysfs queue/cranges attributes
> 
>  Documentation/block/queue-sysfs.rst |  30 ++-
>  block/Makefile                      |   2 +-
>  block/blk-cranges.c                 | 295 ++++++++++++++++++++++++++++
>  block/blk-sysfs.c                   |  13 ++
>  block/blk.h                         |   3 +
>  drivers/ata/libata-core.c           |  52 +++++
>  drivers/ata/libata-scsi.c           |  48 ++++-
>  drivers/scsi/sd.c                   |  81 ++++++++
>  drivers/scsi/sd.h                   |   1 +
>  include/linux/ata.h                 |   1 +
>  include/linux/blkdev.h              |  29 +++
>  include/linux/libata.h              |  15 ++
>  12 files changed, 559 insertions(+), 11 deletions(-)
>  create mode 100644 block/blk-cranges.c
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v3 0/4] Initial support for multi-actuator HDDs
  2021-07-26  1:38 [PATCH v3 0/4] Initial support for multi-actuator HDDs Damien Le Moal
                   ` (5 preceding siblings ...)
  2021-08-06  2:12 ` Damien Le Moal
@ 2021-08-06  3:41 ` Martin K. Petersen
  2021-08-06  4:05   ` Damien Le Moal
  6 siblings, 1 reply; 27+ messages in thread
From: Martin K. Petersen @ 2021-08-06  3:41 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	linux-ide, Hannes Reinecke


Damien,

> Single LUN multi-actuator hard-disks are capable of seeking and
> executing multiple commands in parallel. This capability is exposed to
> the host using the Concurrent Positioning Ranges VPD page (SCSI) and
> Log (ATA). Each positioning range describes the contiguous set of LBAs
> that an actuator serves.

I have to say that I prefer the multi-LUN model.

> The first patch adds the block layer plumbing to expose concurrent
> sector ranges of the device through sysfs as a sub-directory of the
> device sysfs queue directory.

So how do you envision this range reporting should work when putting
DM/MD on top of a multi-actuator disk?

And even without multi-actuator drives, how would you express concurrent
ranges on a DM/MD device sitting on top of a several single-actuator
devices?

While I appreciate that it is easy to just export what the hardware
reports in sysfs, I also think we should consider how filesystems would
use that information. And how things would work outside of the simple
fs-on-top-of-multi-actuator-drive case.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCH v3 0/4] Initial support for multi-actuator HDDs
  2021-08-06  3:41 ` Martin K. Petersen
@ 2021-08-06  4:05   ` Damien Le Moal
  2021-08-06  8:35     ` Hannes Reinecke
  0 siblings, 1 reply; 27+ messages in thread
From: Damien Le Moal @ 2021-08-06  4:05 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-block, Jens Axboe, linux-scsi, linux-ide, Hannes Reinecke

On 2021/08/06 12:42, Martin K. Petersen wrote:
> 
> Damien,
> 
>> Single LUN multi-actuator hard-disks are capable of seeking and
>> executing multiple commands in parallel. This capability is exposed to
>> the host using the Concurrent Positioning Ranges VPD page (SCSI) and
>> Log (ATA). Each positioning range describes the contiguous set of LBAs
>> that an actuator serves.
> 
> I have to say that I prefer the multi-LUN model.

It is certainly easier: nothing to do :)
SATA, as usual, makes things harder...

> 
>> The first patch adds the block layer plumbing to expose concurrent
>> sector ranges of the device through sysfs as a sub-directory of the
>> device sysfs queue directory.
> 
> So how do you envision this range reporting should work when putting
> DM/MD on top of a multi-actuator disk?

The ranges are attached to the device request queue. So the DM/MD target driver
can use that information from the underlying devices for whatever possible
optimization. For the logical device exposed by the target driver, the ranges
are not limits so they are not inherited. As is, right now, DM target devices
will not show any range information for the logical devices they create, even if
the underlying devices have multiple ranges.

The DM/MD target driver is free to set any range information pertinent to the
target. E.g. dm-linear could set the range information corresponding to sector
chunks from different devices used to build the dm-linear device.

> And even without multi-actuator drives, how would you express concurrent
> ranges on a DM/MD device sitting on top of a several single-actuator
> devices?

Similar comment as above: it is up to the DM/MD target driver to decide if range
information can be useful. For dm-linear, there are obvious cases where it is.
Ex: 2 single actuator drives concatenated together can generate 2 ranges
similarly to a real split-actuator disk. Expressing the chunks of a dm-linear
setup as ranges may not always be possible though, that is, if we keep the
assumption that a range is independent from others in terms of command
execution. Ex: a dm-linear setup that shuffles a drive LBA mapping (high to low
and low to high) has no business showing sector ranges.

> While I appreciate that it is easy to just export what the hardware
> reports in sysfs, I also think we should consider how filesystems would
> use that information. And how things would work outside of the simple
> fs-on-top-of-multi-actuator-drive case.

Without any change anywhere in existing code (kernel and applications using raw
disk accesses), things will just work as is. The multi/split actuator drive will
behave as a single actuator drive, even for commands spanning range boundaries.
Your guess on potential IOPS gains is as good as mine in this case. Performance
will totally depend on the workload but will not be worse than an equivalent
single actuator disk.

FS block allocators can definitely use the range information to distribute
writes among actuators. For reads, well, gains will depend on the workload,
obviously, but optimizations at the block IO scheduler level can improve things
too, especially if the drive is being used at a QD beyond its capability (that
is, requests are accumulated in the IO scheduler).
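The core lookup such an allocator or scheduler would need — mapping a request's start sector to its actuator — is just a binary search over the range start sectors. A hedged sketch follows; this is illustrative code, not part of this series, and the class and method names are invented.

```python
import bisect

class ActuatorMap:
    """Map a sector to the index of its concurrent positioning range."""

    def __init__(self, ranges):
        # ranges: sorted, contiguous (sector, nr_sectors) pairs, as a
        # hypothetical consumer would read them from queue/cranges.
        self.starts = [sector for sector, _ in ranges]

    def range_of(self, sector):
        # Index of the last range starting at or before this sector.
        return bisect.bisect_right(self.starts, sector) - 1

amap = ActuatorMap([(0, 1000), (1000, 1000)])
print(amap.range_of(0), amap.range_of(999), amap.range_of(1000))  # 0 0 1
```

A request straddling the boundary (say, starting at sector 999 with a length of 2) maps to range 0 here, consistent with the patch comment that a request can straddle a range boundary.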

Similar write optimization can be achieved by applications using block device
files directly. This series is intended for this case for now. FS and block IO
scheduler optimizations can be added later.


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v3 0/4] Initial support for multi-actuator HDDs
  2021-08-06  4:05   ` Damien Le Moal
@ 2021-08-06  8:35     ` Hannes Reinecke
  2021-08-06  8:52       ` Damien Le Moal
  0 siblings, 1 reply; 27+ messages in thread
From: Hannes Reinecke @ 2021-08-06  8:35 UTC (permalink / raw)
  To: Damien Le Moal, Martin K. Petersen
  Cc: linux-block, Jens Axboe, linux-scsi, linux-ide

On 8/6/21 6:05 AM, Damien Le Moal wrote:
> On 2021/08/06 12:42, Martin K. Petersen wrote:
>>
>> Damien,
>>
>>> Single LUN multi-actuator hard-disks are capable of seeking and
>>> executing multiple commands in parallel. This capability is exposed to
>>> the host using the Concurrent Positioning Ranges VPD page (SCSI) and
>>> Log (ATA). Each positioning range describes the contiguous set of LBAs
>>> that an actuator serves.
>>
>> I have to say that I prefer the multi-LUN model.
> 
> It is certainly easier: nothing to do :)
> SATA, as usual, makes things harder...
> 
>>
>>> The first patch adds the block layer plumbing to expose concurrent
>>> sector ranges of the device through sysfs as a sub-directory of the
>>> device sysfs queue directory.
>>
>> So how do you envision this range reporting should work when putting
>> DM/MD on top of a multi-actuator disk?
> 
> The ranges are attached to the device request queue. So the DM/MD target driver
> can use that information from the underlying devices for whatever possible
> optimization. For the logical device exposed by the target driver, the ranges
> are not limits so they are not inherited. As is, right now, DM target devices
> will not show any range information for the logical devices they create, even if
> the underlying devices have multiple ranges.
> 
> The DM/MD target driver is free to set any range information pertinent to the
> target. E.g. dm-liear could set the range information corresponding to sector
> chunks from different devices used to build the dm-linear device.
> 
And indeed, that would be the easiest consumer.
One 'just' needs a simple script converting the sysfs ranges into the
corresponding dm-linear table definitions, creating one DM device for
each range.
That would simulate the multi-LUN approach.
Not sure if that would warrant a 'real' DM target, seeing that it's
fully scriptable.
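A sketch of such a script: it only builds the dmsetup table lines here, from a mock cranges tree, so the snippet is self-contained. The sysfs attribute names (sector, nr_sectors) and the idea of piping each line to "dmsetup create sda-actuatorN" are assumptions, not taken from the patches.

```python
import os
import tempfile

def dm_linear_tables(dev, cranges_dir):
    """Build one dm-linear table line per concurrent range.

    Passing each line to "dmsetup create <name>" would expose every
    actuator of e.g. /dev/sda as its own block device, simulating the
    multi-LUN model in user space.
    """
    tables = []
    for r in sorted(os.listdir(cranges_dir), key=int):
        rng = os.path.join(cranges_dir, r)
        with open(os.path.join(rng, "sector")) as f:
            sector = int(f.read())
        with open(os.path.join(rng, "nr_sectors")) as f:
            nr_sectors = int(f.read())
        tables.append(f"0 {nr_sectors} linear {dev} {sector}")
    return tables

# Mock /sys/block/sda/queue/cranges with two ranges of 1000 sectors.
cranges = tempfile.mkdtemp()
for i, (sector, nr) in enumerate([(0, 1000), (1000, 1000)]):
    rng = os.path.join(cranges, str(i))
    os.makedirs(rng)
    with open(os.path.join(rng, "sector"), "w") as f:
        f.write(f"{sector}\n")
    with open(os.path.join(rng, "nr_sectors"), "w") as f:
        f.write(f"{nr}\n")

print(dm_linear_tables("/dev/sda", cranges))
# ['0 1000 linear /dev/sda 0', '0 1000 linear /dev/sda 1000']
```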

>> And even without multi-actuator drives, how would you express concurrent
>> ranges on a DM/MD device sitting on top of a several single-actuator
>> devices?
> 
> Similar comment as above: it is up to the DM/MD target driver to decide if range
> information can be useful. For dm-linear, there are obvious cases where it is.
> Ex: 2 single actuator drives concatenated together can generate 2 ranges
> similarly to a real split-actuator disk. Expressing the chunks of a dm-linear
> setup as ranges may not always be possible though, that is, if we keep the
> assumption that a range is independent from others in terms of command
> execution. Ex: a dm-linear setup that shuffles a drive LBA mapping (high to low
> and low to high) has no business showing sector ranges.
> 
>> While I appreciate that it is easy to just export what the hardware
>> reports in sysfs, I also think we should consider how filesystems would
>> use that information. And how things would work outside of the simple
>> fs-on-top-of-multi-actuator-drive case.
> 
> Without any change anywhere in existing code (kernel and applications using raw
> disk accesses), things will just work as is. The multi/split actuator drive will
> behave as a single actuator drive, even for commands spanning range boundaries.
> Your guess on potential IOPS gains is as good as mine in this case. Performance
> will totally depend on the workload but will not be worse than an equivalent
> single actuator disk.
> 
> FS block allocators can definitely use the range information to distribute
> writes among actuators. For reads, well, gains will depend on the workload,
> obviously, but optimizations at the block IO scheduler level can improve things
> too, especially if the drive is being used at a QD beyond its capability (that
> is, requests are accumulated in the IO scheduler).
> 
> Similar write optimization can be achieved by applications using block device
> files directly. This series is intended for this case for now. FS and block IO
> scheduler optimization can be added later.
> 
> 
Rumours have it that Paolo Valente is working on adapting BFQ to utilize
the range information for better actuator utilisation.
And eventually one should modify filesystem utilities like xfs to adapt
the metadata layout to multi-actuator drives.

The _real_ fun starts once the HDD manufacturers start putting out
multi-actuator SMR drives :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/4] Initial support for multi-actuator HDDs
  2021-08-06  8:35     ` Hannes Reinecke
@ 2021-08-06  8:52       ` Damien Le Moal
  0 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2021-08-06  8:52 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: linux-block, Jens Axboe, linux-scsi, linux-ide

On 2021/08/06 17:36, Hannes Reinecke wrote:
> On 8/6/21 6:05 AM, Damien Le Moal wrote:
>> On 2021/08/06 12:42, Martin K. Petersen wrote:
>>>
>>> Damien,
>>>
>>>> Single LUN multi-actuator hard-disks are capable of seeking and executing
>>>> multiple commands in parallel. This capability is exposed to the host
>>>> using the Concurrent Positioning Ranges VPD page (SCSI) and Log (ATA).
>>>> Each positioning range describes the contiguous set of LBAs that an
>>>> actuator serves.
>>>
>>> I have to say that I prefer the multi-LUN model.
>>
>> It is certainly easier: nothing to do :)
>> SATA, as usual, makes things harder...
>>
>>>
>>>> The first patch adds the block layer plumbing to expose concurrent
>>>> sector ranges of the device through sysfs as a sub-directory of the
>>>> device sysfs queue directory.
>>>
>>> So how do you envision this range reporting should work when putting
>>> DM/MD on top of a multi-actuator disk?
>>
>> The ranges are attached to the device request queue. So the DM/MD target driver
>> can use that information from the underlying devices for whatever possible
>> optimization. For the logical device exposed by the target driver, the ranges
>> are not limits so they are not inherited. As is, right now, DM target devices
>> will not show any range information for the logical devices they create, even if
>> the underlying devices have multiple ranges.
>>
>> The DM/MD target driver is free to set any range information pertinent to the
>> target. E.g. dm-linear could set the range information corresponding to sector
>> chunks from different devices used to build the dm-linear device.
>>
> And indeed, that would be the easiest consumer.
> One 'just' needs to have a simple script converting the sysfs ranges
> into the corresponding dm-linear table definitions, and create one DM
> device for each range.
> That would simulate the multi-LUN approach.
> Not sure if that would warrant a 'real' DM target, seeing that it's
> fully scriptable.
> 
>>> And even without multi-actuator drives, how would you express concurrent
>>> ranges on a DM/MD device sitting on top of several single-actuator
>>> devices?
>>
>> Similar comment as above: it is up to the DM/MD target driver to decide if range
>> information can be useful. For dm-linear, there are obvious cases where it is.
>> Ex: 2 single actuator drives concatenated together can generate 2 ranges
>> similarly to a real split-actuator disk. Expressing the chunks of a dm-linear
>> setup as ranges may not always be possible though, that is, if we keep the
>> assumption that a range is independent from others in terms of command
>> execution. Ex: a dm-linear setup that shuffles a drive LBA mapping (high to low
>> and low to high) has no business showing sector ranges.
>>
>>> While I appreciate that it is easy to just export what the hardware
>>> reports in sysfs, I also think we should consider how filesystems would
>>> use that information. And how things would work outside of the simple
>>> fs-on-top-of-multi-actuator-drive case.
>>
>> Without any change anywhere in existing code (kernel and applications using raw
>> disk accesses), things will just work as is. The multi/split actuator drive will
>> behave as a single actuator drive, even for commands spanning range boundaries.
>> Your guess on potential IOPS gains is as good as mine in this case. Performance
>> will totally depend on the workload but will not be worse than an equivalent
>> single actuator disk.
>>
>> FS block allocators can definitely use the range information to distribute
>> writes among actuators. For reads, well, gains will depend on the workload,
>> obviously, but optimizations at the block IO scheduler level can improve things
>> too, especially if the drive is being used at a QD beyond its capability (that
>> is, requests are accumulated in the IO scheduler).
>>
>> Similar write optimization can be achieved by applications using block device
>> files directly. This series is intended for this case for now. FS and block IO
>> scheduler optimization can be added later.
>>
>>
> Rumours have it that Paolo Valente is working on adapting BFQ to utilize
> the range information for better actuator utilisation.

Paolo has a talk on this subject scheduled for SNIA SDC 2021.

https://storagedeveloper.org/events/sdc-2021/abstracts#hd-Walker

> And eventually one should modify filesystem utilities like xfs to adapt
> the metadata layout to multi-actuator drives.
> 
> The _real_ fun starts once the HDD manufacturers start putting out
> multi-actuator SMR drives :-)

Well, that does not change things that much in the end. The same constraints
remain, and the sector ranges will be aligned to zones. So no added difficulty.

> 
> Cheers,
> 
> Hannes
> 


-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH v3 1/4] block: Add concurrent positioning ranges support
  2021-07-26  1:38 ` [PATCH v3 1/4] block: Add concurrent positioning ranges support Damien Le Moal
  2021-07-26  7:33   ` Hannes Reinecke
@ 2021-08-10  8:23   ` Christoph Hellwig
  2021-08-10 11:03     ` Damien Le Moal
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2021-08-10  8:23 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	linux-ide, Hannes Reinecke

On Mon, Jul 26, 2021 at 10:38:03AM +0900, Damien Le Moal wrote:
> The Concurrent Positioning Ranges VPD page (for SCSI) and Log (for ATA)
> contain parameters describing the number of sets of contiguous LBAs that
> can be served independently by a single LUN multi-actuator disk. This
> patch provides the blk_queue_set_cranges() function allowing a device
> driver to signal to the block layer that a disk has multiple actuators,
> each one serving a contiguous range of sectors. To describe the set
> of sector ranges representing the different actuators of a device, the
> data type struct blk_cranges is introduced.
> 
> For a device with multiple actuators, a struct blk_cranges is attached
> to the device request queue by the blk_queue_set_cranges() function. The
> function blk_alloc_cranges() is provided for drivers to allocate this
> structure.
> 
> The blk_cranges structure contains kobjects (struct kobject) to register
> with sysfs the set of sector ranges defined by a device. On initial
> device scan, this registration is done from blk_register_queue() using
> the block layer internal function blk_register_cranges(). If a driver
> calls blk_queue_set_cranges() for a registered queue, e.g. when a device
> is revalidated, blk_queue_set_cranges() will execute
> blk_register_cranges() to update the queue sysfs attribute files.
> 
> The sysfs file structure created starts from the cranges sub-directory
> and contains the start sector and number of sectors served by an
> actuator, with the information for each actuator grouped in one
> directory per actuator. E.g. for a dual actuator drive, we have:
> 
> $ tree /sys/block/sdk/queue/cranges/
> /sys/block/sdk/queue/cranges/
> |-- 0
> |   |-- nr_sectors
> |   `-- sector
> `-- 1
>     |-- nr_sectors
>     `-- sector
> 
> For a regular single actuator device, the cranges directory does not
> exist.
> 
> Device revalidation may lead to changes to this structure and to the
> attribute values. When manipulated, the queue sysfs_lock and
> sysfs_dir_lock are held for atomicity, similarly to how the blk-mq and
> elevator sysfs queue sub-directories are protected.
> 
> The code related to the management of cranges is added in the new
> file block/blk-cranges.c.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  block/Makefile         |   2 +-
>  block/blk-cranges.c    | 295 +++++++++++++++++++++++++++++++++++++++++
>  block/blk-sysfs.c      |  13 ++
>  block/blk.h            |   3 +
>  include/linux/blkdev.h |  29 ++++
>  5 files changed, 341 insertions(+), 1 deletion(-)
>  create mode 100644 block/blk-cranges.c
> 
> diff --git a/block/Makefile b/block/Makefile
> index bfbe4e13ca1e..e477e6ca9ea6 100644
> --- a/block/Makefile
> +++ b/block/Makefile
> @@ -9,7 +9,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
>  			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
>  			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
>  			genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
> -			disk-events.o
> +			disk-events.o blk-cranges.o
>  
>  obj-$(CONFIG_BOUNCE)		+= bounce.o
>  obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
> diff --git a/block/blk-cranges.c b/block/blk-cranges.c
> new file mode 100644
> index 000000000000..deaa09e564f7
> --- /dev/null
> +++ b/block/blk-cranges.c
> @@ -0,0 +1,295 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + *  Block device concurrent positioning ranges.
> + *
> + *  Copyright (C) 2021 Western Digital Corporation or its Affiliates.
> + */
> +#include <linux/kernel.h>
> +#include <linux/blkdev.h>
> +#include <linux/slab.h>
> +#include <linux/init.h>
> +
> +#include "blk.h"
> +
> +static ssize_t blk_crange_sector_show(struct blk_crange *cr, char *page)
> +{
> +	return sprintf(page, "%llu\n", cr->sector);
> +}
> +
> +static ssize_t blk_crange_nr_sectors_show(struct blk_crange *cr, char *page)
> +{
> +	return sprintf(page, "%llu\n", cr->nr_sectors);
> +}
> +
> +struct blk_crange_sysfs_entry {
> +	struct attribute attr;
> +	ssize_t (*show)(struct blk_crange *cr, char *page);
> +};
> +
> +static struct blk_crange_sysfs_entry blk_crange_sector_entry = {
> +	.attr = { .name = "sector", .mode = 0444 },
> +	.show = blk_crange_sector_show,
> +};
> +
> +static struct blk_crange_sysfs_entry blk_crange_nr_sectors_entry = {
> +	.attr = { .name = "nr_sectors", .mode = 0444 },
> +	.show = blk_crange_nr_sectors_show,
> +};
> +
> +static struct attribute *blk_crange_attrs[] = {
> +	&blk_crange_sector_entry.attr,
> +	&blk_crange_nr_sectors_entry.attr,
> +	NULL,
> +};
> +ATTRIBUTE_GROUPS(blk_crange);
> +
> +static ssize_t blk_crange_sysfs_show(struct kobject *kobj,
> +				     struct attribute *attr, char *page)
> +{
> +	struct blk_crange_sysfs_entry *entry;
> +	struct blk_crange *cr;
> +	struct request_queue *q;
> +	ssize_t ret;
> +
> +	entry = container_of(attr, struct blk_crange_sysfs_entry, attr);
> +	cr = container_of(kobj, struct blk_crange, kobj);
> +	q = cr->queue;

Nit: I'd find this a little easier to read if the variables were
initialized at declaration time:

	struct blk_crange_sysfs_entry *entry =
		container_of(attr, struct blk_crange_sysfs_entry, attr);
	struct blk_crange *cr = container_of(kobj, struct blk_crange, kobj);
	struct request_queue *q = cr->queue;

(or maybe even don't bother with the q local variable).

> +/*
> + * Dummy release function to make kobj happy.
> + */
> +static void blk_cranges_sysfs_nop_release(struct kobject *kobj)
> +{
> +}

How do we ensure the kobj is still alive while it is accessed?

> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr)

s/blk_queue/disk/

>  	mutex_lock(&q->sysfs_lock);
> +
> +	ret = blk_register_cranges(disk, NULL);
> +	if (ret) {
> +		mutex_unlock(&q->sysfs_lock);
> +		mutex_unlock(&q->sysfs_dir_lock);
> +		kobject_del(&q->kobj);
> +		blk_trace_remove_sysfs(dev);
> +		kobject_put(&dev->kobj);
> +		return ret;
> +	}
> +
>  	if (q->elevator) {
>  		ret = elv_register_queue(q, false);
>  		if (ret) {
> +			blk_unregister_cranges(disk);
>  			mutex_unlock(&q->sysfs_lock);
>  			mutex_unlock(&q->sysfs_dir_lock);
>  			kobject_del(&q->kobj);

Please use a goto instead of duplicating the code.

* Re: [PATCH v3 2/4] scsi: sd: add concurrent positioning ranges support
  2021-07-26  1:38 ` [PATCH v3 2/4] scsi: sd: add " Damien Le Moal
@ 2021-08-10  8:24   ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2021-08-10  8:24 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	linux-ide, Hannes Reinecke

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH v3 3/4] libata: support concurrent positioning ranges log
  2021-07-26  1:38 ` [PATCH v3 3/4] libata: support concurrent positioning ranges log Damien Le Moal
  2021-07-26  7:34   ` Hannes Reinecke
@ 2021-08-10  8:26   ` Christoph Hellwig
  1 sibling, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2021-08-10  8:26 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	linux-ide, Hannes Reinecke

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH v3 4/4] doc: document sysfs queue/cranges attributes
  2021-07-26  1:38 ` [PATCH v3 4/4] doc: document sysfs queue/cranges attributes Damien Le Moal
  2021-07-26  7:35   ` Hannes Reinecke
@ 2021-08-10  8:27   ` Christoph Hellwig
  2021-08-10 11:04     ` Damien Le Moal
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2021-08-10  8:27 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	linux-ide, Hannes Reinecke

On Mon, Jul 26, 2021 at 10:38:06AM +0900, Damien Le Moal wrote:
> Update the file Documentation/block/queue-sysfs.rst to add a description
> of a device queue sysfs entries related to concurrent sector ranges
> (e.g. concurrent positioning ranges for multi-actuator hard-disks).
> 
> While at it, also fix a typo in this file introduction paragraph.

That should really be a separate patch.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH v3 1/4] block: Add concurrent positioning ranges support
  2021-08-10  8:23   ` Christoph Hellwig
@ 2021-08-10 11:03     ` Damien Le Moal
  2021-08-10 16:02       ` hch
  0 siblings, 1 reply; 27+ messages in thread
From: Damien Le Moal @ 2021-08-10 11:03 UTC (permalink / raw)
  To: hch
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	linux-ide, Hannes Reinecke

On 2021/08/10 17:24, Christoph Hellwig wrote:
> On Mon, Jul 26, 2021 at 10:38:03AM +0900, Damien Le Moal wrote:
>> The Concurrent Positioning Ranges VPD page (for SCSI) and Log (for ATA)
>> contain parameters describing the number of sets of contiguous LBAs that
>> can be served independently by a single LUN multi-actuator disk. This
>> patch provides the blk_queue_set_cranges() function allowing a device
>> driver to signal to the block layer that a disk has multiple actuators,
>> each one serving a contiguous range of sectors. To describe the set
>> of sector ranges representing the different actuators of a device, the
>> data type struct blk_cranges is introduced.
>>
>> For a device with multiple actuators, a struct blk_cranges is attached
>> to the device request queue by the blk_queue_set_cranges() function. The
>> function blk_alloc_cranges() is provided for drivers to allocate this
>> structure.
>>
>> The blk_cranges structure contains kobjects (struct kobject) to register
>> with sysfs the set of sector ranges defined by a device. On initial
>> device scan, this registration is done from blk_register_queue() using
>> the block layer internal function blk_register_cranges(). If a driver
>> calls blk_queue_set_cranges() for a registered queue, e.g. when a device
>> is revalidated, blk_queue_set_cranges() will execute
>> blk_register_cranges() to update the queue sysfs attribute files.
>>
>> The sysfs file structure created starts from the cranges sub-directory
>> and contains the start sector and number of sectors served by an
>> actuator, with the information for each actuator grouped in one
>> directory per actuator. E.g. for a dual actuator drive, we have:
>>
>> $ tree /sys/block/sdk/queue/cranges/
>> /sys/block/sdk/queue/cranges/
>> |-- 0
>> |   |-- nr_sectors
>> |   `-- sector
>> `-- 1
>>     |-- nr_sectors
>>     `-- sector
>>
>> For a regular single actuator device, the cranges directory does not
>> exist.
>>
>> Device revalidation may lead to changes to this structure and to the
>> attribute values. When manipulated, the queue sysfs_lock and
>> sysfs_dir_lock are held for atomicity, similarly to how the blk-mq and
>> elevator sysfs queue sub-directories are protected.
>>
>> The code related to the management of cranges is added in the new
>> file block/blk-cranges.c.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>  block/Makefile         |   2 +-
>>  block/blk-cranges.c    | 295 +++++++++++++++++++++++++++++++++++++++++
>>  block/blk-sysfs.c      |  13 ++
>>  block/blk.h            |   3 +
>>  include/linux/blkdev.h |  29 ++++
>>  5 files changed, 341 insertions(+), 1 deletion(-)
>>  create mode 100644 block/blk-cranges.c
>>
>> diff --git a/block/Makefile b/block/Makefile
>> index bfbe4e13ca1e..e477e6ca9ea6 100644
>> --- a/block/Makefile
>> +++ b/block/Makefile
>> @@ -9,7 +9,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
>>  			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
>>  			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
>>  			genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
>> -			disk-events.o
>> +			disk-events.o blk-cranges.o
>>  
>>  obj-$(CONFIG_BOUNCE)		+= bounce.o
>>  obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
>> diff --git a/block/blk-cranges.c b/block/blk-cranges.c
>> new file mode 100644
>> index 000000000000..deaa09e564f7
>> --- /dev/null
>> +++ b/block/blk-cranges.c
>> @@ -0,0 +1,295 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + *  Block device concurrent positioning ranges.
>> + *
>> + *  Copyright (C) 2021 Western Digital Corporation or its Affiliates.
>> + */
>> +#include <linux/kernel.h>
>> +#include <linux/blkdev.h>
>> +#include <linux/slab.h>
>> +#include <linux/init.h>
>> +
>> +#include "blk.h"
>> +
>> +static ssize_t blk_crange_sector_show(struct blk_crange *cr, char *page)
>> +{
>> +	return sprintf(page, "%llu\n", cr->sector);
>> +}
>> +
>> +static ssize_t blk_crange_nr_sectors_show(struct blk_crange *cr, char *page)
>> +{
>> +	return sprintf(page, "%llu\n", cr->nr_sectors);
>> +}
>> +
>> +struct blk_crange_sysfs_entry {
>> +	struct attribute attr;
>> +	ssize_t (*show)(struct blk_crange *cr, char *page);
>> +};
>> +
>> +static struct blk_crange_sysfs_entry blk_crange_sector_entry = {
>> +	.attr = { .name = "sector", .mode = 0444 },
>> +	.show = blk_crange_sector_show,
>> +};
>> +
>> +static struct blk_crange_sysfs_entry blk_crange_nr_sectors_entry = {
>> +	.attr = { .name = "nr_sectors", .mode = 0444 },
>> +	.show = blk_crange_nr_sectors_show,
>> +};
>> +
>> +static struct attribute *blk_crange_attrs[] = {
>> +	&blk_crange_sector_entry.attr,
>> +	&blk_crange_nr_sectors_entry.attr,
>> +	NULL,
>> +};
>> +ATTRIBUTE_GROUPS(blk_crange);
>> +
>> +static ssize_t blk_crange_sysfs_show(struct kobject *kobj,
>> +				     struct attribute *attr, char *page)
>> +{
>> +	struct blk_crange_sysfs_entry *entry;
>> +	struct blk_crange *cr;
>> +	struct request_queue *q;
>> +	ssize_t ret;
>> +
>> +	entry = container_of(attr, struct blk_crange_sysfs_entry, attr);
>> +	cr = container_of(kobj, struct blk_crange, kobj);
>> +	q = cr->queue;
> 
> Nit: I'd find this a little easier to read if the variables were
> initialized at declaration time:
> 
> 	struct blk_crange_sysfs_entry *entry =
> 		container_of(attr, struct blk_crange_sysfs_entry, attr);
> 	struct blk_crange *cr = container_of(kobj, struct blk_crange, kobj);
> 	struct request_queue *q = cr->queue;
> 
> (or maybe even don't bother with the q local variable).

OK.

> 
>> +/*
>> + * Dummy release function to make kobj happy.
>> + */
>> +static void blk_cranges_sysfs_nop_release(struct kobject *kobj)
>> +{
>> +}
> 
> How do we ensure the kobj is still alive while it is accessed?

q->sysfs_lock ensures that. This mutex is taken whenever revalidate registers
new ranges (see blk_queue_set_cranges below), and is taken also when the ranges
are unregistered (on revalidate if the ranges changed and when the request queue
is unregistered). And blk_crange_sysfs_show() takes that lock too. So the kobj
cannot be freed while it is being accessed (the sysfs inode lock also prevents
it since kobject_del() will take the inode lock).

> 
>> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr)
> 
> s/blk_queue/disk/

Hmmm... The argument is a gendisk, but it is the request queue that is modified.
So following other functions like this, isn't blk_queue_ prefix better?

> 
>>  	mutex_lock(&q->sysfs_lock);
>> +
>> +	ret = blk_register_cranges(disk, NULL);
>> +	if (ret) {
>> +		mutex_unlock(&q->sysfs_lock);
>> +		mutex_unlock(&q->sysfs_dir_lock);
>> +		kobject_del(&q->kobj);
>> +		blk_trace_remove_sysfs(dev);
>> +		kobject_put(&dev->kobj);
>> +		return ret;
>> +	}
>> +
>>  	if (q->elevator) {
>>  		ret = elv_register_queue(q, false);
>>  		if (ret) {
>> +			blk_unregister_cranges(disk);
>>  			mutex_unlock(&q->sysfs_lock);
>>  			mutex_unlock(&q->sysfs_dir_lock);
>>  			kobject_del(&q->kobj);
> 
> Please use a goto instead of duplicating the code.

I had tried that but it made a mess. Will have another look at it.

-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH v3 4/4] doc: document sysfs queue/cranges attributes
  2021-08-10  8:27   ` Christoph Hellwig
@ 2021-08-10 11:04     ` Damien Le Moal
  0 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2021-08-10 11:04 UTC (permalink / raw)
  To: hch
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	linux-ide, Hannes Reinecke

On 2021/08/10 17:28, Christoph Hellwig wrote:
> On Mon, Jul 26, 2021 at 10:38:06AM +0900, Damien Le Moal wrote:
>> Update the file Documentation/block/queue-sysfs.rst to add a description
>> of a device queue sysfs entries related to concurrent sector ranges
>> (e.g. concurrent positioning ranges for multi-actuator hard-disks).
>>
>> While at it, also fix a typo in this file introduction paragraph.
> 
> That should really be a separate patch.

OK. Will split that part out in a different patch (and cc stable maybe).

> 
> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 


-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH v3 1/4] block: Add concurrent positioning ranges support
  2021-08-10 11:03     ` Damien Le Moal
@ 2021-08-10 16:02       ` hch
  2021-08-10 23:46         ` Damien Le Moal
  0 siblings, 1 reply; 27+ messages in thread
From: hch @ 2021-08-10 16:02 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: hch, linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	linux-ide, Hannes Reinecke

On Tue, Aug 10, 2021 at 11:03:41AM +0000, Damien Le Moal wrote:
> >> + * Dummy release function to make kobj happy.
> >> + */
> >> +static void blk_cranges_sysfs_nop_release(struct kobject *kobj)
> >> +{
> >> +}
> > 
> > How do we ensure the kobj is still alive while it is accessed?
> 
> q->sysfs_lock ensures that. This mutex is taken whenever revalidate registers
> new ranges (see blk_queue_set_cranges below), and is taken also when the ranges
> are unregistered (on revalidate if the ranges changed and when the request queue
> is unregistered). And blk_crange_sysfs_show() takes that lock too. So the kobj
> cannot be freed while it is being accessed (the sysfs inode lock also prevents
> it since kobj_del() will take the inode lock).

Does it?  It only protects the access inside of it, but not the object
lifetime.

> 
> > 
> >> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr)
> > 
> > s/blk_queue/disk/
> 
> Hmmm... The argument is a gendisk, but it is the request queue that is modified.
> So following other functions like this, isn't blk_queue_ prefix better ?

Do we have blk_queue_ functions that take a gendisk anywhere?  The
ones I noticed (and the ones I've recently added) all use disk_.

* Re: [PATCH v3 1/4] block: Add concurrent positioning ranges support
  2021-08-10 16:02       ` hch
@ 2021-08-10 23:46         ` Damien Le Moal
  0 siblings, 0 replies; 27+ messages in thread
From: Damien Le Moal @ 2021-08-10 23:46 UTC (permalink / raw)
  To: hch, Damien Le Moal
  Cc: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen,
	linux-ide, Hannes Reinecke

On 2021/08/11 1:02, hch@infradead.org wrote:
> On Tue, Aug 10, 2021 at 11:03:41AM +0000, Damien Le Moal wrote:
>>>> + * Dummy release function to make kobj happy.
>>>> + */
>>>> +static void blk_cranges_sysfs_nop_release(struct kobject *kobj)
>>>> +{
>>>> +}
>>>
>>> How do we ensure the kobj is still alive while it is accessed?
>>
>> q->sysfs_lock ensures that. This mutex is taken whenever revalidate registers
>> new ranges (see blk_queue_set_cranges below), and is taken also when the ranges
>> are unregistered (on revalidate if the ranges changed and when the request queue
>> is unregistered). And blk_crange_sysfs_show() takes that lock too. So the kobj
>> cannot be freed while it is being accessed (the sysfs inode lock also prevents
>> it since kobject_del() will take the inode lock).
> 
> Does it?  It only protects the access inside of it, but not the object
> lifetime.

Alloc & free of the cranges structure (the top kobj) are under the
q->sysfs_lock, always. With the accesses also under that same lock, this is
mutually exclusive and all protected. Furthermore, the crange
kobject_add()/kobject_del() do a kobject_get()/kobject_put() on the parent kobj, which is
the request queue kobj. So the queue cannot go away under the crange struct.

I can add a kobject_get/put for the crange struct, but I really do not see the
need for that. Or am I missing something?
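
The lifetime concern hch raises is the standard kobject rule: a lock can
serialize accesses, but only the reference count decides when release()
may run. A userspace sketch of that rule (illustrative only, not the
kernel kobject API; struct obj, obj_get() and obj_put() are made-up
names standing in for kobject, kobject_get() and kobject_put()):

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Illustrative userspace sketch of the kobject lifetime rule under
 * discussion: the object may only be freed when the last reference is
 * dropped, regardless of which locks serialize the accesses.
 */
struct obj {
	atomic_int refcount;
	int released;		/* stands in for "release() / kfree() ran" */
};

/* kernel analogue: the kobj_type release() callback */
static void obj_release(struct obj *o)
{
	o->released = 1;
}

/* kernel analogue: kobject_get() */
static struct obj *obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
	return o;
}

/* kernel analogue: kobject_put(); release() runs on the last put */
static void obj_put(struct obj *o)
{
	if (atomic_fetch_sub(&o->refcount, 1) == 1)
		obj_release(o);
}
```

A reader that takes a reference with obj_get() before dereferencing and
drops it with obj_put() afterwards can never observe the object after
release(); a mutex alone gives no such guarantee when the final put can
happen on another thread.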

>>>> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr)
>>>
>>> s/blk_queue/disk/
>>
>> Hmmm... The argument is a gendisk, but it is the request queue that is modified.
>> So following other functions like this, isn't blk_queue_ prefix better?
> 
> Do we have blk_queue_ functions that take a gendisk anywhere?  The
> ones I noticed (and the ones I've recently added) all use disk_.

The only one I can find is blk_queue_set_zoned(), which we could probably rename
to disk_set_zoned(). There are blk_revalidate_disk_zones() and blkdev_nr_zones()
which also take a gendisk and could probably be renamed as
disk_revalidate_zones() and disk_nr_zones() for consistency.

Will rename to disk_set_cranges().

-- 
Damien Le Moal
Western Digital Research


Thread overview: 27+ messages
2021-07-26  1:38 [PATCH v3 0/4] Initial support for multi-actuator HDDs Damien Le Moal
2021-07-26  1:38 ` [PATCH v3 1/4] block: Add concurrent positioning ranges support Damien Le Moal
2021-07-26  7:33   ` Hannes Reinecke
2021-07-26  8:30     ` Damien Le Moal
2021-07-26  8:47       ` Hannes Reinecke
2021-07-26 11:33         ` Damien Le Moal
2021-07-27 14:07           ` Paolo Valente
2021-07-27 23:44             ` Damien Le Moal
2021-08-10  8:23   ` Christoph Hellwig
2021-08-10 11:03     ` Damien Le Moal
2021-08-10 16:02       ` hch
2021-08-10 23:46         ` Damien Le Moal
2021-07-26  1:38 ` [PATCH v3 2/4] scsi: sd: add " Damien Le Moal
2021-08-10  8:24   ` Christoph Hellwig
2021-07-26  1:38 ` [PATCH v3 3/4] libata: support concurrent positioning ranges log Damien Le Moal
2021-07-26  7:34   ` Hannes Reinecke
2021-08-10  8:26   ` Christoph Hellwig
2021-07-26  1:38 ` [PATCH v3 4/4] doc: document sysfs queue/cranges attributes Damien Le Moal
2021-07-26  7:35   ` Hannes Reinecke
2021-08-10  8:27   ` Christoph Hellwig
2021-08-10 11:04     ` Damien Le Moal
2021-07-28 22:59 ` [PATCH v3 0/4] Initial support for multi-actuator HDDs Damien Le Moal
2021-08-06  2:12 ` Damien Le Moal
2021-08-06  3:41 ` Martin K. Petersen
2021-08-06  4:05   ` Damien Le Moal
2021-08-06  8:35     ` Hannes Reinecke
2021-08-06  8:52       ` Damien Le Moal
