linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Initial support for multi-actuator HDDs
@ 2021-07-21 10:42 Damien Le Moal
  2021-07-21 10:42 ` [PATCH 1/4] block: Add concurrent positioning ranges support Damien Le Moal
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Damien Le Moal @ 2021-07-21 10:42 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen, linux-ide

Single LUN multi-actuator hard-disks are cappable to seek and execute
multiple commands in parallel. This capability is exposed to the host
using the Concurrent Positioning Ranges VPD page (SCSI) and Log (ATA).
Each positioning range describes the contiguous set of LBAs that an
actuator serves.

This series adds support the scsi disk driver to retreive this
information and advertize it to user space through sysfs. libata is also
modified to handle ATA drives.

The first patch adds the block layer plumbing to expose concurrent
sector ranges of the device through sysfs as a sub-directory of the
device sysfs queue directory. Patch 2 and 3 add support to sd and
libata. Finally patch 4 documents the sysfs queue attributed changes.

This series does not attempt in any way to optimize accesses to
multi-actuator devices (e.g. block IO scheduler or filesystems). This
initial support only exposes the actuators information to user space
through sysfs.

Damien Le Moal (4):
  block: Add concurrent positioning ranges support
  scsi: sd: add concurrent positioning ranges support
  libata: support concurrent positioning ranges log
  doc: document sysfs queue/cranges attributes

 Documentation/block/queue-sysfs.rst |  27 ++-
 block/Makefile                      |   2 +-
 block/blk-cranges.c                 | 286 ++++++++++++++++++++++++++++
 block/blk-sysfs.c                   |  13 ++
 block/blk.h                         |   3 +
 drivers/ata/libata-core.c           |  57 ++++++
 drivers/ata/libata-scsi.c           |  47 ++++-
 drivers/scsi/sd.c                   |  80 ++++++++
 drivers/scsi/sd.h                   |   1 +
 include/linux/ata.h                 |   1 +
 include/linux/blkdev.h              |  29 +++
 include/linux/libata.h              |  11 ++
 12 files changed, 546 insertions(+), 11 deletions(-)
 create mode 100644 block/blk-cranges.c

-- 
2.31.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] block: Add concurrent positioning ranges support
  2021-07-21 10:42 [PATCH 0/4] Initial support for multi-actuator HDDs Damien Le Moal
@ 2021-07-21 10:42 ` Damien Le Moal
  2021-07-22 16:11   ` Hannes Reinecke
  2021-07-21 10:42 ` [PATCH 2/4] scsi: sd: add " Damien Le Moal
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2021-07-21 10:42 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen, linux-ide

The Concurrent Positioning Ranges VPD page (for SCSI) and Log (for ATA)
contain parameters describing the number of sets of contiguous LBAs that
can be served independently by a single LUN multi-actuator disk. This
patch provides the blk_queue_set_cranges() function allowing a device
driver to signal to the block layer that a disk has multiple actuators,
each one serving a contiguous range of sectors. To describe the set
of sector ranges representing the different actuators of a device, the
data type struct blk_cranges is introduced.

For a device with multiple actuators, a struct blk_cranges is attached
to the device request queue by the blk_queue_set_cranges() function. The
function blk_alloc_cranges() is provided for drivers to allocate this
structure.

The blk_cranges structure contains kobjects (struct kobject) to register
with sysfs the set of sector ranges defined by a device. On initial
device scan, this registration is done from blk_register_queue() using
the block layer internal function blk_register_cranges(). If a driver
calls blk_queue_set_cranges() for a registered queue, e.g. when a device
is revalidated, blk_queue_set_cranges() will execute
blk_register_cranges() to update the queue sysfs attribute files.

The sysfs file structure created starts from the cranges sub-directory
and contains the start sector and number of sectors served by an
actuator, with the information for each actuator grouped in one
directory per actuator. E.g. for a dual actuator drive, we have:

$ tree /sys/block/sdk/queue/cranges/
/sys/block/sdk/queue/cranges/
|-- 0
|   |-- nr_sectors
|   `-- sector
`-- 1
    |-- nr_sectors
    `-- sector

For a regular single actuator device, the cranges directory does not
exist.

Device revalidation may lead to changes to this structure and to the
attribute values. When manipulated, the queue sysfs_lock and
sysfs_dir_lock are held for atomicity, similarly to how the blk-mq and
elevator sysfs queue sub-directories are protected.

The code related to the management of cranges is added in the new
file block/blk-cranges.c.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/Makefile            |   2 +-
 block/blk-cranges.c       | 286 ++++++++++++++++++++++++++++++++++++++
 block/blk-sysfs.c         |  13 ++
 block/blk.h               |   3 +
 drivers/ata/libata-scsi.c |   1 +
 include/linux/blkdev.h    |  29 ++++
 6 files changed, 333 insertions(+), 1 deletion(-)
 create mode 100644 block/blk-cranges.c

diff --git a/block/Makefile b/block/Makefile
index bfbe4e13ca1e..e477e6ca9ea6 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -9,7 +9,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
 			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
 			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
 			genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
-			disk-events.o
+			disk-events.o blk-cranges.o
 
 obj-$(CONFIG_BOUNCE)		+= bounce.o
 obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
diff --git a/block/blk-cranges.c b/block/blk-cranges.c
new file mode 100644
index 000000000000..e8ff0d40a5c9
--- /dev/null
+++ b/block/blk-cranges.c
@@ -0,0 +1,286 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  Block device concurrent positioning ranges.
+ *
+ *  Copyright (C) 2021 Western Digital Corporation or its Affiliates.
+ */
+#include <linux/kernel.h>
+#include <linux/blkdev.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+
+#include "blk.h"
+
+static ssize_t blk_crange_sector_show(struct blk_crange *cr, char *page)
+{
+	return sprintf(page, "%llu\n", cr->sector);
+}
+
+static ssize_t blk_crange_nr_sectors_show(struct blk_crange *cr, char *page)
+{
+	return sprintf(page, "%llu\n", cr->nr_sectors);
+}
+
+struct blk_crange_sysfs_entry {
+	struct attribute attr;
+	ssize_t (*show)(struct blk_crange *cr, char *page);
+};
+
+static struct blk_crange_sysfs_entry blk_crange_sector_entry = {
+	.attr = { .name = "sector", .mode = 0444 },
+	.show = blk_crange_sector_show,
+};
+
+static struct blk_crange_sysfs_entry blk_crange_nr_sectors_entry = {
+	.attr = { .name = "nr_sectors", .mode = 0444 },
+	.show = blk_crange_nr_sectors_show,
+};
+
+static struct attribute *blk_crange_attrs[] = {
+	&blk_crange_sector_entry.attr,
+	&blk_crange_nr_sectors_entry.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(blk_crange);
+
+static ssize_t blk_crange_sysfs_show(struct kobject *kobj,
+				     struct attribute *attr, char *page)
+{
+	struct blk_crange_sysfs_entry *entry;
+	struct blk_crange *cr;
+	struct request_queue *q;
+	ssize_t ret;
+
+	entry = container_of(attr, struct blk_crange_sysfs_entry, attr);
+	cr = container_of(kobj, struct blk_crange, kobj);
+	q = cr->queue;
+
+	mutex_lock(&q->sysfs_lock);
+	ret = entry->show(cr, page);
+	mutex_unlock(&q->sysfs_lock);
+
+	return ret;
+}
+
+static const struct sysfs_ops blk_crange_sysfs_ops = {
+	.show	= blk_crange_sysfs_show,
+};
+
+/*
+ * Dummy release function to make kobj happy.
+ */
+static void blk_cranges_sysfs_nop_release(struct kobject *kobj)
+{
+}
+
+static struct kobj_type blk_crange_ktype = {
+	.sysfs_ops	= &blk_crange_sysfs_ops,
+	.default_groups	= blk_crange_groups,
+	.release	= blk_cranges_sysfs_nop_release,
+};
+
+static struct kobj_type blk_cranges_ktype = {
+	.release	= blk_cranges_sysfs_nop_release,
+};
+
+/**
+ * blk_register_cranges - register with sysfs a set of concurrent ranges
+ * @disk:		Target disk
+ * @new_cranges:	New set of concurrent ranges
+ *
+ * Register with sysfs a set of concurrent ranges for @disk. If @new_cranges
+ * is not NULL, this set of concurrent ranges is registered and the
+ * old set specified by q->cranges is unregistered. Otherwise, q->cranges
+ * is registered if it is not already.
+ */
+int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges)
+{
+	struct request_queue *q = disk->queue;
+	struct blk_cranges *cranges;
+	int i, ret;
+
+	lockdep_assert_held(&q->sysfs_dir_lock);
+	lockdep_assert_held(&q->sysfs_lock);
+
+	/* If a new range set is specified, unregister the old one */
+	if (new_cranges) {
+		if (q->cranges)
+			blk_unregister_cranges(disk);
+		q->cranges = new_cranges;
+	}
+
+	cranges = q->cranges;
+	if (!cranges)
+		return 0;
+
+	/*
+	 * At this point, q->cranges is the new set of sector ranges that needs
+	 * to be registered with sysfs.
+	 */
+	WARN_ON(cranges->sysfs_registered);
+	ret = kobject_init_and_add(&cranges->kobj, &blk_cranges_ktype,
+				   &q->kobj, "%s", "cranges");
+	if (ret)
+		goto free;
+
+	for (i = 0; i < cranges->nr_ranges; i++) {
+		cranges->ranges[i].queue = q;
+		ret = kobject_init_and_add(&cranges->ranges[i].kobj,
+					   &blk_crange_ktype, &cranges->kobj,
+					   "%d", i);
+		if (ret)
+			goto delete_obj;
+	}
+
+	cranges->sysfs_registered = true;
+
+	return 0;
+
+delete_obj:
+	while (--i >= 0)
+		kobject_del(&cranges->ranges[i].kobj);
+	kobject_del(&cranges->kobj);
+free:
+	kfree(cranges);
+	q->cranges = NULL;
+	return ret;
+}
+
+void blk_unregister_cranges(struct gendisk *disk)
+{
+	struct request_queue *q = disk->queue;
+	struct blk_cranges *cranges = q->cranges;
+	int i;
+
+	lockdep_assert_held(&q->sysfs_dir_lock);
+	lockdep_assert_held(&q->sysfs_lock);
+
+	if (!cranges)
+		return;
+
+	if (cranges->sysfs_registered) {
+		for (i = 0; i < cranges->nr_ranges; i++)
+			kobject_del(&cranges->ranges[i].kobj);
+		kobject_del(&cranges->kobj);
+	}
+
+	kfree(cranges);
+	q->cranges = NULL;
+}
+
+static bool blk_check_ranges(struct gendisk *disk, struct blk_cranges *cr)
+{
+	sector_t capacity = get_capacity(disk);
+	sector_t min_sector = (sector_t)-1;
+	sector_t max_sector = 0;
+	int i;
+
+	/*
+	 * Sector ranges may overlap but should overall contain all sectors
+	 * within the disk capacity.
+	 */
+	for (i = 0; i < cr->nr_ranges; i++) {
+		min_sector = min(min_sector, cr->ranges[i].sector);
+		max_sector = max(max_sector, cr->ranges[i].sector +
+					     cr->ranges[i].nr_sectors);
+	}
+
+	if (min_sector != 0 || max_sector < capacity) {
+		pr_warn("Invalid concurrent ranges: missing sectors\n");
+		return false;
+	}
+
+	if (max_sector > capacity) {
+		pr_warn("Invalid concurrent ranges: beyond capacity\n");
+		return false;
+	}
+
+	return true;
+}
+
+static bool blk_cranges_changed(struct gendisk *disk, struct blk_cranges *new)
+{
+	struct blk_cranges *old = disk->queue->cranges;
+	int i;
+
+	if (!old || old->nr_ranges != new->nr_ranges)
+		return true;
+
+	for (i = 0; i < new->nr_ranges; i++) {
+		if (old->ranges[i].sector != new->ranges[i].sector ||
+		    old->ranges[i].nr_sectors != new->ranges[i].nr_sectors)
+			return true;
+	}
+
+	return false;
+}
+
+/**
+ * blk_alloc_cranges - Allocate a concurrent positioning range structure
+ * @disk:	target disk
+ * @nr_ranges:	Number of concurrent ranges
+ *
+ * Allocate a struct blk_cranges structure with @nr_ranges range descriptors.
+ */
+struct blk_cranges *blk_alloc_cranges(struct gendisk *disk, int nr_ranges)
+{
+	struct blk_cranges *cr;
+
+	cr = kzalloc_node(struct_size(cr, ranges, nr_ranges), GFP_KERNEL,
+			  disk->queue->node);
+	if (cr)
+		cr->nr_ranges = nr_ranges;
+	return cr;
+}
+EXPORT_SYMBOL_GPL(blk_alloc_cranges);
+
+/**
+ * blk_queue_set_cranges - Set a disk concurrent positioning ranges
+ * @disk:	target disk
+ * @cr:		concurrent ranges structure
+ *
+ * Set the concurrant positioning ranges information of the request queue
+ * of @disk to @cr. If @cr is NULL and the concurrent ranges structure
+ * already set, if any, is cleared. If there are no differences between
+ * @cr and the concurrent ranges structure already set, @cr is freed.
+ */
+void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr)
+{
+	struct request_queue *q = disk->queue;
+
+	if (WARN_ON_ONCE(cr && !cr->nr_ranges)) {
+		kfree(cr);
+		cr = NULL;
+	}
+
+	mutex_lock(&q->sysfs_dir_lock);
+	mutex_lock(&q->sysfs_lock);
+
+	if (cr && !blk_check_ranges(disk, cr)) {
+		kfree(cr);
+		cr = NULL;
+	}
+
+	if (!blk_cranges_changed(disk, cr)) {
+		kfree(cr);
+		goto unlock;
+	}
+
+	/*
+	 * This may be called for a registered queue. E.g. during a device
+	 * revalidation. If that is the case, we need to register the new set
+	 * of concurrent ranges. If the queue is not already registered, the
+	 * device request queue registration will register the ranges.
+	 */
+	if (blk_queue_registered(q)) {
+		blk_register_cranges(disk, cr);
+	} else {
+		swap(q->cranges, cr);
+		kfree(cr);
+	}
+
+unlock:
+	mutex_unlock(&q->sysfs_lock);
+	mutex_unlock(&q->sysfs_dir_lock);
+}
+EXPORT_SYMBOL_GPL(blk_queue_set_cranges);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 370d83c18057..aeac98ecc5a0 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -899,9 +899,21 @@ int blk_register_queue(struct gendisk *disk)
 	}
 
 	mutex_lock(&q->sysfs_lock);
+
+	ret = blk_register_cranges(disk, NULL);
+	if (ret) {
+		mutex_unlock(&q->sysfs_lock);
+		mutex_unlock(&q->sysfs_dir_lock);
+		kobject_del(&q->kobj);
+		blk_trace_remove_sysfs(dev);
+		kobject_put(&dev->kobj);
+		return ret;
+	}
+
 	if (q->elevator) {
 		ret = elv_register_queue(q, false);
 		if (ret) {
+			blk_unregister_cranges(disk);
 			mutex_unlock(&q->sysfs_lock);
 			mutex_unlock(&q->sysfs_dir_lock);
 			kobject_del(&q->kobj);
@@ -985,6 +997,7 @@ void blk_unregister_queue(struct gendisk *disk)
 	mutex_lock(&q->sysfs_lock);
 	if (q->elevator)
 		elv_unregister_queue(q);
+	blk_unregister_cranges(disk);
 	mutex_unlock(&q->sysfs_lock);
 	mutex_unlock(&q->sysfs_dir_lock);
 
diff --git a/block/blk.h b/block/blk.h
index 4b885c0f6708..650c0d87987c 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -368,4 +368,7 @@ extern struct device_attribute dev_attr_events;
 extern struct device_attribute dev_attr_events_async;
 extern struct device_attribute dev_attr_events_poll_msecs;
 
+int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges);
+void blk_unregister_cranges(struct gendisk *disk);
+
 #endif /* BLK_INTERNAL_H */
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index b9588c52815d..144e8cac44ba 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1947,6 +1947,7 @@ static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
 		0xb1,	/* page 0xb1, block device characteristics page */
 		0xb2,	/* page 0xb2, thin provisioning page */
 		0xb6,	/* page 0xb6, zoned block device characteristics */
+		0xb9,	/* page 0xb9, concurrent positioning ranges */
 	};
 
 	num_pages = sizeof(pages);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3177181c4326..cd58aa1090a3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -378,6 +378,29 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
 
 #endif /* CONFIG_BLK_DEV_ZONED */
 
+/*
+ * Concurrent sector ranges: struct blk_crange describes range of
+ * contiguous sectors that can be served by independent resources on the
+ * device. The set of ranges defined in struct blk_cranges must overall
+ * include all sectors within the device capacity.
+ * For a device with multiple ranges, e.g. a single LUN multi-actuator HDD,
+ * requests targeting sectors in different ranges can be executed in parallel.
+ * A request can straddle a range boundary.
+ */
+struct blk_crange {
+	struct kobject		kobj;
+	struct request_queue	*queue;
+	sector_t		sector;
+	sector_t		nr_sectors;
+};
+
+struct blk_cranges {
+	struct kobject		kobj;
+	bool			sysfs_registered;
+	unsigned int		nr_ranges;
+	struct blk_crange	ranges[];
+};
+
 struct request_queue {
 	struct request		*last_merge;
 	struct elevator_queue	*elevator;
@@ -570,6 +593,9 @@ struct request_queue {
 
 #define BLK_MAX_WRITE_HINTS	5
 	u64			write_hints[BLK_MAX_WRITE_HINTS];
+
+	/* Concurrent sector ranges */
+	struct blk_cranges	*cranges;
 };
 
 /* Keep blk_queue_flag_name[] in sync with the definitions below */
@@ -1163,6 +1189,9 @@ extern void blk_queue_required_elevator_features(struct request_queue *q,
 extern bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
 					      struct device *dev);
 
+struct blk_cranges *blk_alloc_cranges(struct gendisk *disk, int nr_ranges);
+void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr);
+
 /*
  * Number of physical segments as sent to the device.
  *
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] scsi: sd: add concurrent positioning ranges support
  2021-07-21 10:42 [PATCH 0/4] Initial support for multi-actuator HDDs Damien Le Moal
  2021-07-21 10:42 ` [PATCH 1/4] block: Add concurrent positioning ranges support Damien Le Moal
@ 2021-07-21 10:42 ` Damien Le Moal
  2021-07-21 18:01   ` Dan Carpenter
  2021-07-22 16:14   ` Hannes Reinecke
  2021-07-21 10:42 ` [PATCH 3/4] libata: support concurrent positioning ranges log Damien Le Moal
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Damien Le Moal @ 2021-07-21 10:42 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen, linux-ide

Add the sd_read_cpr() function to the sd scsi disk driver to discover
if a device has multiple concurrent positioning ranges (i.e. multiple
actuators on an HDD). This new function is called from
sd_revalidate_disk() and uses the block layer functions
blk_alloc_cranges() and blk_queue_set_cranges() to set a device
cranges according to the information retrieved from log page B9h,
if the device supports it.

The format of the Concurrent Positioning Ranges VPD page B9h is defined
in section 6.6.6 of SBC-5.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/sd.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/sd.h |  1 +
 2 files changed, 81 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b8d55af763f9..b1e767a01b9f 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3125,6 +3125,85 @@ static void sd_read_security(struct scsi_disk *sdkp, unsigned char *buffer)
 		sdkp->security = 1;
 }
 
+static inline sector_t sd64_to_sectors(struct scsi_disk *sdkp, u8 *buf)
+{
+	return logical_to_sectors(sdkp->device, get_unaligned_be64(buf));
+}
+
+/**
+ * sd_read_cpr - Query concurrent positioning ranges
+ * @sdkp:	disk to query
+ */
+static void sd_read_cpr(struct scsi_disk *sdkp)
+{
+	unsigned char *buffer, *desc;
+	struct blk_cranges *cr = NULL;
+	unsigned int nr_cpr = 0;
+	int i, vpd_len, buf_len = SD_BUF_SIZE;
+
+	/*
+	 * We need to have the capacity set first for the block layer to be
+	 * able to check the ranges.
+	 */
+	if (sdkp->first_scan)
+		return;
+
+	if (!sdkp->capacity)
+		goto out;
+
+	/*
+	 * Concurrent Positioning Ranges VPD: there can be at most 256 ranges,
+	 * leading to a maximum page size of 64 + 256*32 bytes.
+	 */
+	buf_len = 64 + 256*32;
+	buffer = kmalloc(buf_len, GFP_KERNEL);
+	if (!buffer || scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len))
+		goto out;
+
+	/* We must have at least a 64B header and one 32B range descriptor */
+	vpd_len = get_unaligned_be16(&buffer[2]) + 3;
+	if (vpd_len > buf_len || vpd_len < 64 + 32 || (vpd_len & 31)) {
+		sd_printk(KERN_ERR, sdkp,
+			  "Invalid Concurrent Positioning Ranges VPD page\n");
+		goto out;
+	}
+
+	nr_cpr = (vpd_len - 64) / 32;
+	if (nr_cpr == 1) {
+		nr_cpr = 0;
+		goto out;
+	}
+
+	cr = blk_alloc_cranges(sdkp->disk, nr_cpr);
+	if (!cr) {
+		nr_cpr = 0;
+		goto out;
+	}
+
+	desc = &buffer[64];
+	for (i = 0; i < nr_cpr; i++, desc += 32) {
+		if (desc[0] != i) {
+			sd_printk(KERN_ERR, sdkp,
+				"Invalid Concurrent Positioning Range number\n");
+			nr_cpr = 0;
+			break;
+		}
+
+		cr->ranges[i].sector = sd64_to_sectors(sdkp, desc + 8);
+		cr->ranges[i].nr_sectors = sd64_to_sectors(sdkp, desc + 16);
+	}
+
+out:
+	blk_queue_set_cranges(sdkp->disk, cr);
+	if (nr_cpr && sdkp->nr_actuators != nr_cpr) {
+		sd_printk(KERN_NOTICE, sdkp,
+			  "%u concurrent positioning ranges\n", nr_cpr);
+		sdkp->nr_actuators = nr_cpr;
+	}
+
+	kfree(buffer);
+}
+
 /*
  * Determine the device's preferred I/O size for reads and writes
  * unless the reported value is unreasonably small, large, not a
@@ -3240,6 +3319,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		sd_read_app_tag_own(sdkp, buffer);
 		sd_read_write_same(sdkp, buffer);
 		sd_read_security(sdkp, buffer);
+		sd_read_cpr(sdkp);
 	}
 
 	/*
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index b59136c4125b..2e5932bde43d 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -106,6 +106,7 @@ struct scsi_disk {
 	u8		protection_type;/* Data Integrity Field */
 	u8		provisioning_mode;
 	u8		zeroing_mode;
+	u8		nr_actuators;		/* Number of actuators */
 	unsigned	ATO : 1;	/* state of disk ATO bit */
 	unsigned	cache_override : 1; /* temp override of WCE,RCD */
 	unsigned	WCE : 1;	/* state of disk WCE bit */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] libata: support concurrent positioning ranges log
  2021-07-21 10:42 [PATCH 0/4] Initial support for multi-actuator HDDs Damien Le Moal
  2021-07-21 10:42 ` [PATCH 1/4] block: Add concurrent positioning ranges support Damien Le Moal
  2021-07-21 10:42 ` [PATCH 2/4] scsi: sd: add " Damien Le Moal
@ 2021-07-21 10:42 ` Damien Le Moal
  2021-07-22 17:34   ` Hannes Reinecke
  2021-07-21 10:42 ` [PATCH 4/4] doc: document sysfs queue/cranges attributes Damien Le Moal
  2021-07-22 15:58 ` [PATCH 0/4] Initial support for multi-actuator HDDs Hannes Reinecke
  4 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2021-07-21 10:42 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen, linux-ide

Add support to discover if an ATA device supports the Concurrent
Positioning Ranges Log (address 0x47), indicating that the device is
capable of seeking to multiple different locations in parallel using
multiple actuators serving different LBA ranges.

Also add support to translate the concurrent positioning ranges log
into its equivalent Concurrent Positioning Ranges VPD page B9h in
libata-scsi.c.

The format of the Concurrent Positioning Ranges Log is defined in ACS-5
r9.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/ata/libata-core.c | 57 +++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata-scsi.c | 46 ++++++++++++++++++++++++-------
 include/linux/ata.h       |  1 +
 include/linux/libata.h    | 11 ++++++++
 4 files changed, 106 insertions(+), 9 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 61c762961ca8..863287c77ba2 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2363,6 +2363,62 @@ static void ata_dev_config_trusted(struct ata_device *dev)
 		dev->flags |= ATA_DFLAG_TRUSTED;
 }
 
+static void ata_dev_config_cpr(struct ata_device *dev)
+{
+	unsigned int err_mask;
+	size_t buf_len;
+	int i, nr_cpr = 0;
+	struct ata_cpr *cpr = NULL;
+	u8 *buf, *desc;
+
+	dev->nr_cpr = 0;
+	kfree(dev->cpr);
+	dev->cpr = NULL;
+
+	if (!ata_identify_page_supported(dev,
+				 ATA_LOG_CONCURRENT_POSITIONING_RANGES))
+		return;
+
+	/*
+	 * Read IDENTIFY DEVICE data log, page 0x47
+	 * (concurrent positioning ranges). We can have at most 255 32B range
+	 * descriptors plus a 64B header.
+	 */
+	buf_len = (64 + 255 * 32 + 511) & ~511;
+	buf = kzalloc(buf_len, GFP_KERNEL);
+	if (!buf)
+		return;
+
+	err_mask = ata_read_log_page(dev, ATA_LOG_IDENTIFY_DEVICE,
+				     ATA_LOG_CONCURRENT_POSITIONING_RANGES,
+				     buf, buf_len >> 9);
+	if (err_mask)
+		goto free;
+
+	nr_cpr = buf[0];
+	if (!nr_cpr)
+		goto free;
+
+	cpr = kcalloc(nr_cpr, sizeof(struct ata_cpr), GFP_KERNEL);
+	if (!cpr) {
+		nr_cpr = 0;
+		goto free;
+	}
+
+	desc = &buf[64];
+	for (i = 0; i < nr_cpr; i++, desc += 32) {
+		cpr[i].num = desc[0];
+		cpr[i].num_storage_elements = desc[1];
+		cpr[i].start_lba = get_unaligned_le64(&desc[8]);
+		cpr[i].num_lbas = get_unaligned_le64(&desc[16]);
+	}
+
+free:
+	kfree(buf);
+	dev->nr_cpr = nr_cpr;
+	dev->cpr = cpr;
+}
+
 /**
  *	ata_dev_configure - Configure the specified ATA/ATAPI device
  *	@dev: Target device to configure
@@ -2591,6 +2647,7 @@ int ata_dev_configure(struct ata_device *dev)
 		ata_dev_config_sense_reporting(dev);
 		ata_dev_config_zac(dev);
 		ata_dev_config_trusted(dev);
+		ata_dev_config_cpr(dev);
 		dev->cdb_len = 32;
 	}
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 144e8cac44ba..922f130b8382 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1937,7 +1937,7 @@ static unsigned int ata_scsiop_inq_std(struct ata_scsi_args *args, u8 *rbuf)
  */
 static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
 {
-	int num_pages;
+	int i, num_pages = 0;
 	static const u8 pages[] = {
 		0x00,	/* page 0x00, this page */
 		0x80,	/* page 0x80, unit serial no page */
@@ -1950,11 +1950,14 @@ static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
 		0xb9,	/* page 0xb9, concurrent positioning ranges */
 	};
 
-	num_pages = sizeof(pages);
-	if (!(args->dev->flags & ATA_DFLAG_ZAC))
-		num_pages--;
+	for (i = 0; i < sizeof(pages); i++) {
+		if (pages[i] == 0xb6 &&
+		    !(args->dev->flags & ATA_DFLAG_ZAC))
+			continue;
+		rbuf[num_pages + 4] = pages[i];
+		num_pages++;
+	}
 	rbuf[3] = num_pages;	/* number of supported VPD pages */
-	memcpy(rbuf + 4, pages, num_pages);
 	return 0;
 }
 
@@ -2164,6 +2167,25 @@ static unsigned int ata_scsiop_inq_b6(struct ata_scsi_args *args, u8 *rbuf)
 	return 0;
 }
 
+static unsigned int ata_scsiop_inq_b9(struct ata_scsi_args *args, u8 *rbuf)
+{
+	u8 *desc = &rbuf[64];
+	int i;
+
+	/* SCSI Concurrent Positioning Ranges VPD page: SBC-5 rev 1 or later */
+	rbuf[1] = 0xb9;
+	put_unaligned_be16(64 + (int)args->dev->nr_cpr * 32 - 4, &rbuf[3]);
+
+	for (i = 0; i < args->dev->nr_cpr; i++, desc += 32) {
+		desc[0] = args->dev->cpr[i].num;
+		desc[1] = args->dev->cpr[i].num_storage_elements;
+		put_unaligned_be64(args->dev->cpr[i].start_lba, &desc[8]);
+		put_unaligned_be64(args->dev->cpr[i].num_lbas, &desc[16]);
+	}
+
+	return 0;
+}
+
 /**
  *	modecpy - Prepare response for MODE SENSE
  *	@dest: output buffer
@@ -4163,11 +4185,17 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd)
 			ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b2);
 			break;
 		case 0xb6:
-			if (dev->flags & ATA_DFLAG_ZAC) {
+			if (dev->flags & ATA_DFLAG_ZAC)
 				ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b6);
-				break;
-			}
-			fallthrough;
+			else
+				ata_scsi_set_invalid_field(dev, cmd, 2, 0xff);
+			break;
+		case 0xb9:
+			if (dev->nr_cpr)
+				ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b9);
+			else
+				ata_scsi_set_invalid_field(dev, cmd, 2, 0xff);
+			break;
 		default:
 			ata_scsi_set_invalid_field(dev, cmd, 2, 0xff);
 			break;
diff --git a/include/linux/ata.h b/include/linux/ata.h
index 1b44f40c7700..199e47e97d64 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -329,6 +329,7 @@ enum {
 	ATA_LOG_SECURITY	  = 0x06,
 	ATA_LOG_SATA_SETTINGS	  = 0x08,
 	ATA_LOG_ZONED_INFORMATION = 0x09,
+	ATA_LOG_CONCURRENT_POSITIONING_RANGES = 0x47,
 
 	/* Identify device SATA settings log:*/
 	ATA_LOG_DEVSLP_OFFSET	  = 0x30,
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 3fcd24236793..94b7e152e76e 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -670,6 +670,13 @@ struct ata_ering {
 	struct ata_ering_entry	ring[ATA_ERING_SIZE];
 };
 
+struct ata_cpr {
+	u8			num;
+	u8			num_storage_elements;
+	u64			start_lba;
+	u64			num_lbas;
+};
+
 struct ata_device {
 	struct ata_link		*link;
 	unsigned int		devno;		/* 0 or 1 */
@@ -729,6 +736,10 @@ struct ata_device {
 	u32			zac_zones_optimal_nonseq;
 	u32			zac_zones_max_open;
 
+	/* Concurrent positioning ranges */
+	u8			nr_cpr;
+	struct ata_cpr		*cpr;
+
 	/* error history */
 	int			spdn_cnt;
 	/* ering is CLEAR_END, read comment above CLEAR_END */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] doc: document sysfs queue/cranges attributes
  2021-07-21 10:42 [PATCH 0/4] Initial support for multi-actuator HDDs Damien Le Moal
                   ` (2 preceding siblings ...)
  2021-07-21 10:42 ` [PATCH 3/4] libata: support concurrent positioning ranges log Damien Le Moal
@ 2021-07-21 10:42 ` Damien Le Moal
  2021-07-22 15:58 ` [PATCH 0/4] Initial support for multi-actuator HDDs Hannes Reinecke
  4 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2021-07-21 10:42 UTC (permalink / raw)
  To: linux-block, Jens Axboe, linux-scsi, Martin K . Petersen, linux-ide

Update the file Documentation/block/queue-sysfs.rst to add a description
of a device queue sysfs entries related to concurrent sector ranges
(e.g. concurrent positioning ranges for multi-actuator hard-disks).

While at it, also fix a typo in this file introduction paragraph.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 Documentation/block/queue-sysfs.rst | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
index 4dc7f0d499a8..ffbea0c7a576 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -4,7 +4,7 @@ Queue sysfs files
 
 This text file will detail the queue files that are located in the sysfs tree
 for each block device. Note that stacked devices typically do not export
-any settings, since their queue merely functions are a remapping target.
+any settings, since their queue merely functions as a remapping target.
 These files are the ones found in the /sys/block/xxx/queue/ directory.
 
 Files denoted with a RO postfix are readonly and the RW postfix means
@@ -286,4 +286,29 @@ sequential zones of zoned block devices (devices with a zoned attributed
 that reports "host-managed" or "host-aware"). This value is always 0 for
 regular block devices.
 
+cranges (RO)
+------------
+
+The presence of this sub-directory of the /sys/block/xxx/queue/ directory
+indicates that the device is capable of executing requests targeting
+different sector ranges in parallel. For instance, single LUN multi-actuator
+hard-disks will likely have a cranges directory if the device correctly
+advertizes the sector ranges of its actuators.
+
+The cranges directory contains one directory per sector range, with each
+range described using the sector (RO) attribute file to indicate the first
+sector of the range and the nr_sectors (RO) attribute file to indicate the
+total number of sector in the range starting from the first sector.
+For example, a dual-actuator hard disk will have the following cranges
+entries.::
+
+        $ tree /sys/block/<device>/queue/cranges/
+        /sys/block/<device>/queue/cranges/
+        |-- 0
+        |   |-- nr_sectors
+        |   `-- sector
+        `-- 1
+            |-- nr_sectors
+            `-- sector
+
 Jens Axboe <jens.axboe@oracle.com>, February 2009
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] scsi: sd: add concurrent positioning ranges support
  2021-07-21 10:42 ` [PATCH 2/4] scsi: sd: add " Damien Le Moal
@ 2021-07-21 18:01   ` Dan Carpenter
  2021-07-22 16:14   ` Hannes Reinecke
  1 sibling, 0 replies; 14+ messages in thread
From: Dan Carpenter @ 2021-07-21 18:01 UTC (permalink / raw)
  To: kbuild, Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide
  Cc: lkp, kbuild-all

Hi Damien,

url:    https://github.com/0day-ci/linux/commits/Damien-Le-Moal/Initial-support-for-multi-actuator-HDDs/20210721-185447
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: x86_64-randconfig-m001-20210720 (attached as .config)
compiler: gcc-10 (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
drivers/scsi/sd.c:3204 sd_read_cpr() error: uninitialized symbol 'buffer'.

vim +/buffer +3204 drivers/scsi/sd.c

331fc9cf44c011 Damien Le Moal 2021-07-21  3137  static void sd_read_cpr(struct scsi_disk *sdkp)
331fc9cf44c011 Damien Le Moal 2021-07-21  3138  {
331fc9cf44c011 Damien Le Moal 2021-07-21  3139  	unsigned char *buffer, *desc;
                                                                       ^^^^^^
331fc9cf44c011 Damien Le Moal 2021-07-21  3140  	struct blk_cranges *cr = NULL;
331fc9cf44c011 Damien Le Moal 2021-07-21  3141  	unsigned int nr_cpr = 0;
331fc9cf44c011 Damien Le Moal 2021-07-21  3142  	int i, vpd_len, buf_len = SD_BUF_SIZE;
331fc9cf44c011 Damien Le Moal 2021-07-21  3143  
331fc9cf44c011 Damien Le Moal 2021-07-21  3144  	/*
331fc9cf44c011 Damien Le Moal 2021-07-21  3145  	 * We need to have the capacity set first for the block layer to be
331fc9cf44c011 Damien Le Moal 2021-07-21  3146  	 * able to check the ranges.
331fc9cf44c011 Damien Le Moal 2021-07-21  3147  	 */
331fc9cf44c011 Damien Le Moal 2021-07-21  3148  	if (sdkp->first_scan)
331fc9cf44c011 Damien Le Moal 2021-07-21  3149  		return;
331fc9cf44c011 Damien Le Moal 2021-07-21  3150  
331fc9cf44c011 Damien Le Moal 2021-07-21  3151  	if (!sdkp->capacity)
331fc9cf44c011 Damien Le Moal 2021-07-21  3152  		goto out;
                                                                ^^^^^^^^^
This should just return probably?

331fc9cf44c011 Damien Le Moal 2021-07-21  3153  
331fc9cf44c011 Damien Le Moal 2021-07-21  3154  	/*
331fc9cf44c011 Damien Le Moal 2021-07-21  3155  	 * Concurrent Positioning Ranges VPD: there can be at most 256 ranges,
331fc9cf44c011 Damien Le Moal 2021-07-21  3156  	 * leading to a maximum page size of 64 + 256*32 bytes.
331fc9cf44c011 Damien Le Moal 2021-07-21  3157  	 */
331fc9cf44c011 Damien Le Moal 2021-07-21  3158  	buf_len = 64 + 256*32;
331fc9cf44c011 Damien Le Moal 2021-07-21  3159  	buffer = kmalloc(buf_len, GFP_KERNEL);
331fc9cf44c011 Damien Le Moal 2021-07-21  3160  	if (!buffer || scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len))
331fc9cf44c011 Damien Le Moal 2021-07-21  3161  		goto out;
331fc9cf44c011 Damien Le Moal 2021-07-21  3162  
331fc9cf44c011 Damien Le Moal 2021-07-21  3163  	/* We must have at least a 64B header and one 32B range descriptor */
331fc9cf44c011 Damien Le Moal 2021-07-21  3164  	vpd_len = get_unaligned_be16(&buffer[2]) + 3;
331fc9cf44c011 Damien Le Moal 2021-07-21  3165  	if (vpd_len > buf_len || vpd_len < 64 + 32 || (vpd_len & 31)) {
331fc9cf44c011 Damien Le Moal 2021-07-21  3166  		sd_printk(KERN_ERR, sdkp,
331fc9cf44c011 Damien Le Moal 2021-07-21  3167  			  "Invalid Concurrent Positioning Ranges VPD page\n");
331fc9cf44c011 Damien Le Moal 2021-07-21  3168  		goto out;
331fc9cf44c011 Damien Le Moal 2021-07-21  3169  	}
331fc9cf44c011 Damien Le Moal 2021-07-21  3170  
331fc9cf44c011 Damien Le Moal 2021-07-21  3171  	nr_cpr = (vpd_len - 64) / 32;
331fc9cf44c011 Damien Le Moal 2021-07-21  3172  	if (nr_cpr == 1) {
331fc9cf44c011 Damien Le Moal 2021-07-21  3173  		nr_cpr = 0;
331fc9cf44c011 Damien Le Moal 2021-07-21  3174  		goto out;
331fc9cf44c011 Damien Le Moal 2021-07-21  3175  	}
331fc9cf44c011 Damien Le Moal 2021-07-21  3176  
331fc9cf44c011 Damien Le Moal 2021-07-21  3177  	cr = blk_alloc_cranges(sdkp->disk, nr_cpr);
331fc9cf44c011 Damien Le Moal 2021-07-21  3178  	if (!cr) {
331fc9cf44c011 Damien Le Moal 2021-07-21  3179  		nr_cpr = 0;
331fc9cf44c011 Damien Le Moal 2021-07-21  3180  		goto out;
331fc9cf44c011 Damien Le Moal 2021-07-21  3181  	}
331fc9cf44c011 Damien Le Moal 2021-07-21  3182  
331fc9cf44c011 Damien Le Moal 2021-07-21  3183  	desc = &buffer[64];
331fc9cf44c011 Damien Le Moal 2021-07-21  3184  	for (i = 0; i < nr_cpr; i++, desc += 32) {
331fc9cf44c011 Damien Le Moal 2021-07-21  3185  		if (desc[0] != i) {
331fc9cf44c011 Damien Le Moal 2021-07-21  3186  			sd_printk(KERN_ERR, sdkp,
331fc9cf44c011 Damien Le Moal 2021-07-21  3187  				"Invalid Concurrent Positioning Range number\n");
331fc9cf44c011 Damien Le Moal 2021-07-21  3188  			nr_cpr = 0;
331fc9cf44c011 Damien Le Moal 2021-07-21  3189  			break;
331fc9cf44c011 Damien Le Moal 2021-07-21  3190  		}
331fc9cf44c011 Damien Le Moal 2021-07-21  3191  
331fc9cf44c011 Damien Le Moal 2021-07-21  3192  		cr->ranges[i].sector = sd64_to_sectors(sdkp, desc + 8);
331fc9cf44c011 Damien Le Moal 2021-07-21  3193  		cr->ranges[i].nr_sectors = sd64_to_sectors(sdkp, desc + 16);
331fc9cf44c011 Damien Le Moal 2021-07-21  3194  	}
331fc9cf44c011 Damien Le Moal 2021-07-21  3195  
331fc9cf44c011 Damien Le Moal 2021-07-21  3196  out:
331fc9cf44c011 Damien Le Moal 2021-07-21  3197  	blk_queue_set_cranges(sdkp->disk, cr);
331fc9cf44c011 Damien Le Moal 2021-07-21  3198  	if (nr_cpr && sdkp->nr_actuators != nr_cpr) {
331fc9cf44c011 Damien Le Moal 2021-07-21  3199  		sd_printk(KERN_NOTICE, sdkp,
331fc9cf44c011 Damien Le Moal 2021-07-21  3200  			  "%u concurrent positioning ranges\n", nr_cpr);
331fc9cf44c011 Damien Le Moal 2021-07-21  3201  		sdkp->nr_actuators = nr_cpr;
331fc9cf44c011 Damien Le Moal 2021-07-21  3202  	}
331fc9cf44c011 Damien Le Moal 2021-07-21  3203  
331fc9cf44c011 Damien Le Moal 2021-07-21 @3204  	kfree(buffer);
                                                        ^^^^^^^^^^^^^

331fc9cf44c011 Damien Le Moal 2021-07-21  3205  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] Initial support for multi-actuator HDDs
  2021-07-21 10:42 [PATCH 0/4] Initial support for multi-actuator HDDs Damien Le Moal
                   ` (3 preceding siblings ...)
  2021-07-21 10:42 ` [PATCH 4/4] doc: document sysfs queue/cranges attributes Damien Le Moal
@ 2021-07-22 15:58 ` Hannes Reinecke
  4 siblings, 0 replies; 14+ messages in thread
From: Hannes Reinecke @ 2021-07-22 15:58 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 7/21/21 12:42 PM, Damien Le Moal wrote:
> Single LUN multi-actuator hard-disks are cappable to seek and execute
> multiple commands in parallel. This capability is exposed to the host
> using the Concurrent Positioning Ranges VPD page (SCSI) and Log (ATA).
> Each positioning range describes the contiguous set of LBAs that an
> actuator serves.
> 
> This series adds support the scsi disk driver to retreive this
> information and advertize it to user space through sysfs. libata is also
> modified to handle ATA drives.
> 
> The first patch adds the block layer plumbing to expose concurrent
> sector ranges of the device through sysfs as a sub-directory of the
> device sysfs queue directory. Patch 2 and 3 add support to sd and
> libata. Finally patch 4 documents the sysfs queue attributed changes.
> 
> This series does not attempt in any way to optimize accesses to
> multi-actuator devices (e.g. block IO scheduler or filesystems). This
> initial support only exposes the actuators information to user space
> through sysfs.
> 
> Damien Le Moal (4):
>    block: Add concurrent positioning ranges support
>    scsi: sd: add concurrent positioning ranges support
>    libata: support concurrent positioning ranges log
>    doc: document sysfs queue/cranges attributes
> 
>   Documentation/block/queue-sysfs.rst |  27 ++-
>   block/Makefile                      |   2 +-
>   block/blk-cranges.c                 | 286 ++++++++++++++++++++++++++++
>   block/blk-sysfs.c                   |  13 ++
>   block/blk.h                         |   3 +
>   drivers/ata/libata-core.c           |  57 ++++++
>   drivers/ata/libata-scsi.c           |  47 ++++-
>   drivers/scsi/sd.c                   |  80 ++++++++
>   drivers/scsi/sd.h                   |   1 +
>   include/linux/ata.h                 |   1 +
>   include/linux/blkdev.h              |  29 +++
>   include/linux/libata.h              |  11 ++
>   12 files changed, 546 insertions(+), 11 deletions(-)
>   create mode 100644 block/blk-cranges.c
> 
Oh well, you beat me to it.
Actually I have patches here locally, which just had been waiting for 
the missing SBC definitions, which apparently are present now.

Too bad; should have been more aggressive posting them :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] block: Add concurrent positioning ranges support
  2021-07-21 10:42 ` [PATCH 1/4] block: Add concurrent positioning ranges support Damien Le Moal
@ 2021-07-22 16:11   ` Hannes Reinecke
  2021-07-22 23:20     ` Damien Le Moal
  2021-07-22 23:59     ` Damien Le Moal
  0 siblings, 2 replies; 14+ messages in thread
From: Hannes Reinecke @ 2021-07-22 16:11 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 7/21/21 12:42 PM, Damien Le Moal wrote:
> The Concurrent Positioning Ranges VPD page (for SCSI) and Log (for ATA)
> contain parameters describing the number of sets of contiguous LBAs that
> can be served independently by a single LUN multi-actuator disk. This
> patch provides the blk_queue_set_cranges() function allowing a device
> driver to signal to the block layer that a disk has multiple actuators,
> each one serving a contiguous range of sectors. To describe the set
> of sector ranges representing the different actuators of a device, the
> data type struct blk_cranges is introduced.
> 
> For a device with multiple actuators, a struct blk_cranges is attached
> to the device request queue by the blk_queue_set_cranges() function. The
> function blk_alloc_cranges() is provided for drivers to allocate this
> structure.
> 
> The blk_cranges structure contains kobjects (struct kobject) to register
> with sysfs the set of sector ranges defined by a device. On initial
> device scan, this registration is done from blk_register_queue() using
> the block layer internal function blk_register_cranges(). If a driver
> calls blk_queue_set_cranges() for a registered queue, e.g. when a device
> is revalidated, blk_queue_set_cranges() will execute
> blk_register_cranges() to update the queue sysfs attribute files.
> 
> The sysfs file structure created starts from the cranges sub-directory
> and contains the start sector and number of sectors served by an
> actuator, with the information for each actuator grouped in one
> directory per actuator. E.g. for a dual actuator drive, we have:
> 
> $ tree /sys/block/sdk/queue/cranges/
> /sys/block/sdk/queue/cranges/
> |-- 0
> |   |-- nr_sectors
> |   `-- sector
> `-- 1
>      |-- nr_sectors
>      `-- sector
> 
> For a regular single actuator device, the cranges directory does not
> exist.
> 
Hmm. I do wonder if we need such a detailed layout.
In my patch I have just:

+       unsigned int            num_lba_ranges;
+       sector_t                *lba_ranges;

to hold the information; after all, I don't expect this data to be 
changing very often during the lifetime of the OS.

> Device revalidation may lead to changes to this structure and to the
> attribute values. When manipulated, the queue sysfs_lock and
> sysfs_dir_lock are held for atomicity, similarly to how the blk-mq and
> elevator sysfs queue sub-directories are protected.
> 

Indeed, the values might change with revalidation. But not 
independently; a change in _one_ range will typically affect the 
adjacent ranges, too.
So it should be sufficient to drop them completely and redo from scratch 
for revalidation.

> The code related to the management of cranges is added in the new
> file block/blk-cranges.c.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   block/Makefile            |   2 +-
>   block/blk-cranges.c       | 286 ++++++++++++++++++++++++++++++++++++++
>   block/blk-sysfs.c         |  13 ++
>   block/blk.h               |   3 +
>   drivers/ata/libata-scsi.c |   1 +
>   include/linux/blkdev.h    |  29 ++++
>   6 files changed, 333 insertions(+), 1 deletion(-)
>   create mode 100644 block/blk-cranges.c
> 
> diff --git a/block/Makefile b/block/Makefile
> index bfbe4e13ca1e..e477e6ca9ea6 100644
> --- a/block/Makefile
> +++ b/block/Makefile
> @@ -9,7 +9,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
>   			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
>   			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
>   			genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
> -			disk-events.o
> +			disk-events.o blk-cranges.o
>   
>   obj-$(CONFIG_BOUNCE)		+= bounce.o
>   obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
> diff --git a/block/blk-cranges.c b/block/blk-cranges.c
> new file mode 100644
> index 000000000000..e8ff0d40a5c9
> --- /dev/null
> +++ b/block/blk-cranges.c
> @@ -0,0 +1,286 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + *  Block device concurrent positioning ranges.
> + *
> + *  Copyright (C) 2021 Western Digital Corporation or its Affiliates.
> + */
> +#include <linux/kernel.h>
> +#include <linux/blkdev.h>
> +#include <linux/slab.h>
> +#include <linux/init.h>
> +
> +#include "blk.h"
> +
> +static ssize_t blk_crange_sector_show(struct blk_crange *cr, char *page)
> +{
> +	return sprintf(page, "%llu\n", cr->sector);
> +}
> +
> +static ssize_t blk_crange_nr_sectors_show(struct blk_crange *cr, char *page)
> +{
> +	return sprintf(page, "%llu\n", cr->nr_sectors);
> +}
> +
> +struct blk_crange_sysfs_entry {
> +	struct attribute attr;
> +	ssize_t (*show)(struct blk_crange *cr, char *page);
> +};
> +
> +static struct blk_crange_sysfs_entry blk_crange_sector_entry = {
> +	.attr = { .name = "sector", .mode = 0444 },
> +	.show = blk_crange_sector_show,
> +};
> +
> +static struct blk_crange_sysfs_entry blk_crange_nr_sectors_entry = {
> +	.attr = { .name = "nr_sectors", .mode = 0444 },
> +	.show = blk_crange_nr_sectors_show,
> +};
> +
> +static struct attribute *blk_crange_attrs[] = {
> +	&blk_crange_sector_entry.attr,
> +	&blk_crange_nr_sectors_entry.attr,
> +	NULL,
> +};
> +ATTRIBUTE_GROUPS(blk_crange);
> +
> +static ssize_t blk_crange_sysfs_show(struct kobject *kobj,
> +				     struct attribute *attr, char *page)
> +{
> +	struct blk_crange_sysfs_entry *entry;
> +	struct blk_crange *cr;
> +	struct request_queue *q;
> +	ssize_t ret;
> +
> +	entry = container_of(attr, struct blk_crange_sysfs_entry, attr);
> +	cr = container_of(kobj, struct blk_crange, kobj);
> +	q = cr->queue;
> +
> +	mutex_lock(&q->sysfs_lock);
> +	ret = entry->show(cr, page);
> +	mutex_unlock(&q->sysfs_lock);
> +
> +	return ret;
> +}
> +
> +static const struct sysfs_ops blk_crange_sysfs_ops = {
> +	.show	= blk_crange_sysfs_show,
> +};
> +
> +/*
> + * Dummy release function to make kobj happy.
> + */
> +static void blk_cranges_sysfs_nop_release(struct kobject *kobj)
> +{
> +}
> +
> +static struct kobj_type blk_crange_ktype = {
> +	.sysfs_ops	= &blk_crange_sysfs_ops,
> +	.default_groups	= blk_crange_groups,
> +	.release	= blk_cranges_sysfs_nop_release,
> +};
> +
> +static struct kobj_type blk_cranges_ktype = {
> +	.release	= blk_cranges_sysfs_nop_release,
> +};
> +
> +/**
> + * blk_register_cranges - register with sysfs a set of concurrent ranges
> + * @disk:		Target disk
> + * @new_cranges:	New set of concurrent ranges
> + *
> + * Register with sysfs a set of concurrent ranges for @disk. If @new_cranges
> + * is not NULL, this set of concurrent ranges is registered and the
> + * old set specified by q->cranges is unregistered. Otherwise, q->cranges
> + * is registered if it is not already.
> + */
> +int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges)
> +{
> +	struct request_queue *q = disk->queue;
> +	struct blk_cranges *cranges;
> +	int i, ret;
> +
> +	lockdep_assert_held(&q->sysfs_dir_lock);
> +	lockdep_assert_held(&q->sysfs_lock);
> +
> +	/* If a new range set is specified, unregister the old one */
> +	if (new_cranges) {
> +		if (q->cranges)
> +			blk_unregister_cranges(disk);
> +		q->cranges = new_cranges;
> +	}
> +
> +	cranges = q->cranges;
> +	if (!cranges)
> +		return 0;
> +
> +	/*
> +	 * At this point, q->cranges is the new set of sector ranges that needs
> +	 * to be registered with sysfs.
> +	 */
> +	WARN_ON(cranges->sysfs_registered);
> +	ret = kobject_init_and_add(&cranges->kobj, &blk_cranges_ktype,
> +				   &q->kobj, "%s", "cranges");
> +	if (ret)
> +		goto free;
> + > +	for (i = 0; i < cranges->nr_ranges; i++) {
> +		cranges->ranges[i].queue = q;
> +		ret = kobject_init_and_add(&cranges->ranges[i].kobj,
> +					   &blk_crange_ktype, &cranges->kobj,
> +					   "%d", i);
> +		if (ret)
> +			goto delete_obj;
> +	}
> +
> +	cranges->sysfs_registered = true;
> +

Do you really need independent kobj here?
As discussed above, the individual ranges are not independent, and it 
might be easier to just lump them into one structure, seeing that all of 
them will have to be redone during revalidation anyway.

> +	return 0;
> +
> +delete_obj:
> +	while (--i >= 0)
> +		kobject_del(&cranges->ranges[i].kobj);
> +	kobject_del(&cranges->kobj);
> +free:
> +	kfree(cranges);
> +	q->cranges = NULL;
> +	return ret;
> +}
> +
> +void blk_unregister_cranges(struct gendisk *disk)
> +{
> +	struct request_queue *q = disk->queue;
> +	struct blk_cranges *cranges = q->cranges;
> +	int i;
> +
> +	lockdep_assert_held(&q->sysfs_dir_lock);
> +	lockdep_assert_held(&q->sysfs_lock);
> +
> +	if (!cranges)
> +		return;
> +
> +	if (cranges->sysfs_registered) {
> +		for (i = 0; i < cranges->nr_ranges; i++)
> +			kobject_del(&cranges->ranges[i].kobj);
> +		kobject_del(&cranges->kobj);
> +	}
> +
> +	kfree(cranges);
> +	q->cranges = NULL;
> +}
> +
> +static bool blk_check_ranges(struct gendisk *disk, struct blk_cranges *cr)
> +{
> +	sector_t capacity = get_capacity(disk);
> +	sector_t min_sector = (sector_t)-1;
> +	sector_t max_sector = 0;
> +	int i;
> +
> +	/*
> +	 * Sector ranges may overlap but should overall contain all sectors
> +	 * within the disk capacity.
> +	 */
> +	for (i = 0; i < cr->nr_ranges; i++) {
> +		min_sector = min(min_sector, cr->ranges[i].sector);
> +		max_sector = max(max_sector, cr->ranges[i].sector +
> +					     cr->ranges[i].nr_sectors);
> +	}
> +

Warm fuzzy feelings.
Sector ranges may overlap? Seriously?

The only way this could happen is when a given LBA will be present in 
_two_ ranges. And this is valid?

> +	if (min_sector != 0 || max_sector < capacity) {
> +		pr_warn("Invalid concurrent ranges: missing sectors\n");
> +		return false;
> +	}
> +
> +	if (max_sector > capacity) {
> +		pr_warn("Invalid concurrent ranges: beyond capacity\n");
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static bool blk_cranges_changed(struct gendisk *disk, struct blk_cranges *new)
> +{
> +	struct blk_cranges *old = disk->queue->cranges;
> +	int i;
> +
> +	if (!old || old->nr_ranges != new->nr_ranges)
> +		return true;
> +
> +	for (i = 0; i < new->nr_ranges; i++) {
> +		if (old->ranges[i].sector != new->ranges[i].sector ||
> +		    old->ranges[i].nr_sectors != new->ranges[i].nr_sectors)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +

As mentioned above: I doubt we need to go into such detail. Just redo 
the ranges on revalidation should be sufficient.

> +/**
> + * blk_alloc_cranges - Allocate a concurrent positioning range structure
> + * @disk:	target disk
> + * @nr_ranges:	Number of concurrent ranges
> + *
> + * Allocate a struct blk_cranges structure with @nr_ranges range descriptors.
> + */
> +struct blk_cranges *blk_alloc_cranges(struct gendisk *disk, int nr_ranges)
> +{
> +	struct blk_cranges *cr;
> +
> +	cr = kzalloc_node(struct_size(cr, ranges, nr_ranges), GFP_KERNEL,
> +			  disk->queue->node);
> +	if (cr)
> +		cr->nr_ranges = nr_ranges;
> +	return cr;
> +}
> +EXPORT_SYMBOL_GPL(blk_alloc_cranges);
> +
> +/**
> + * blk_queue_set_cranges - Set a disk concurrent positioning ranges
> + * @disk:	target disk
> + * @cr:		concurrent ranges structure
> + *
> + * Set the concurrant positioning ranges information of the request queue
> + * of @disk to @cr. If @cr is NULL and the concurrent ranges structure
> + * already set, if any, is cleared. If there are no differences between
> + * @cr and the concurrent ranges structure already set, @cr is freed.
> + */
> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr)
> +{
> +	struct request_queue *q = disk->queue;
> +
> +	if (WARN_ON_ONCE(cr && !cr->nr_ranges)) {
> +		kfree(cr);
> +		cr = NULL;
> +	}
> +
> +	mutex_lock(&q->sysfs_dir_lock);
> +	mutex_lock(&q->sysfs_lock);
> +

And this is what you get for all the complexity.
Having to take _two_ locks, and have a detailed description how the 
function should be used.

I still think that redoing everything is simpler.

> +	if (cr && !blk_check_ranges(disk, cr)) {
> +		kfree(cr);
> +		cr = NULL;
> +	}
> +
> +	if (!blk_cranges_changed(disk, cr)) {
> +		kfree(cr);
> +		goto unlock;
> +	}
> +
> +	/*
> +	 * This may be called for a registered queue. E.g. during a device
> +	 * revalidation. If that is the case, we need to register the new set
> +	 * of concurrent ranges. If the queue is not already registered, the
> +	 * device request queue registration will register the ranges.
> +	 */
> +	if (blk_queue_registered(q)) {
> +		blk_register_cranges(disk, cr);
> +	} else {
> +		swap(q->cranges, cr);
> +		kfree(cr);
> +	}
> +
> +unlock:
> +	mutex_unlock(&q->sysfs_lock);
> +	mutex_unlock(&q->sysfs_dir_lock);
> +}
> +EXPORT_SYMBOL_GPL(blk_queue_set_cranges);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 370d83c18057..aeac98ecc5a0 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -899,9 +899,21 @@ int blk_register_queue(struct gendisk *disk)
>   	}
>   
>   	mutex_lock(&q->sysfs_lock);
> +
> +	ret = blk_register_cranges(disk, NULL);
> +	if (ret) {
> +		mutex_unlock(&q->sysfs_lock);
> +		mutex_unlock(&q->sysfs_dir_lock);
> +		kobject_del(&q->kobj);
> +		blk_trace_remove_sysfs(dev);
> +		kobject_put(&dev->kobj);
> +		return ret;
> +	}
> +
>   	if (q->elevator) {
>   		ret = elv_register_queue(q, false);
>   		if (ret) {
> +			blk_unregister_cranges(disk);
>   			mutex_unlock(&q->sysfs_lock);
>   			mutex_unlock(&q->sysfs_dir_lock);
>   			kobject_del(&q->kobj);
> @@ -985,6 +997,7 @@ void blk_unregister_queue(struct gendisk *disk)
>   	mutex_lock(&q->sysfs_lock);
>   	if (q->elevator)
>   		elv_unregister_queue(q);
> +	blk_unregister_cranges(disk);
>   	mutex_unlock(&q->sysfs_lock);
>   	mutex_unlock(&q->sysfs_dir_lock);
>   
> diff --git a/block/blk.h b/block/blk.h
> index 4b885c0f6708..650c0d87987c 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -368,4 +368,7 @@ extern struct device_attribute dev_attr_events;
>   extern struct device_attribute dev_attr_events_async;
>   extern struct device_attribute dev_attr_events_poll_msecs;
>   
> +int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges);
> +void blk_unregister_cranges(struct gendisk *disk);
> +
>   #endif /* BLK_INTERNAL_H */
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index b9588c52815d..144e8cac44ba 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1947,6 +1947,7 @@ static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
>   		0xb1,	/* page 0xb1, block device characteristics page */
>   		0xb2,	/* page 0xb2, thin provisioning page */
>   		0xb6,	/* page 0xb6, zoned block device characteristics */
> +		0xb9,	/* page 0xb9, concurrent positioning ranges */
>   	};
>   
>   	num_pages = sizeof(pages);

And this hunk is here because ... ?

> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 3177181c4326..cd58aa1090a3 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -378,6 +378,29 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
>   
>   #endif /* CONFIG_BLK_DEV_ZONED */
>   
> +/*
> + * Concurrent sector ranges: struct blk_crange describes range of
> + * contiguous sectors that can be served by independent resources on the
> + * device. The set of ranges defined in struct blk_cranges must overall
> + * include all sectors within the device capacity.
> + * For a device with multiple ranges, e.g. a single LUN multi-actuator HDD,
> + * requests targeting sectors in different ranges can be executed in parallel.
> + * A request can straddle a range boundary.
> + */
> +struct blk_crange {
> +	struct kobject		kobj;
> +	struct request_queue	*queue;
> +	sector_t		sector;
> +	sector_t		nr_sectors;
> +};
> +
> +struct blk_cranges {
> +	struct kobject		kobj;
> +	bool			sysfs_registered;
> +	unsigned int		nr_ranges;
> +	struct blk_crange	ranges[];
> +};
> +
>   struct request_queue {
>   	struct request		*last_merge;
>   	struct elevator_queue	*elevator;
> @@ -570,6 +593,9 @@ struct request_queue {
>   
>   #define BLK_MAX_WRITE_HINTS	5
>   	u64			write_hints[BLK_MAX_WRITE_HINTS];
> +
> +	/* Concurrent sector ranges */
> +	struct blk_cranges	*cranges;
>   };
>   
>   /* Keep blk_queue_flag_name[] in sync with the definitions below */
> @@ -1163,6 +1189,9 @@ extern void blk_queue_required_elevator_features(struct request_queue *q,
>   extern bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
>   					      struct device *dev);
>   
> +struct blk_cranges *blk_alloc_cranges(struct gendisk *disk, int nr_ranges);
> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr);
> +
>   /*
>    * Number of physical segments as sent to the device.
>    *
> 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] scsi: sd: add concurrent positioning ranges support
  2021-07-21 10:42 ` [PATCH 2/4] scsi: sd: add " Damien Le Moal
  2021-07-21 18:01   ` Dan Carpenter
@ 2021-07-22 16:14   ` Hannes Reinecke
  2021-07-22 23:26     ` Damien Le Moal
  1 sibling, 1 reply; 14+ messages in thread
From: Hannes Reinecke @ 2021-07-22 16:14 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 7/21/21 12:42 PM, Damien Le Moal wrote:
> Add the sd_read_cpr() function to the sd scsi disk driver to discover
> if a device has multiple concurrent positioning ranges (i.e. multiple
> actuators on an HDD). This new function is called from
> sd_revalidate_disk() and uses the block layer functions
> blk_alloc_cranges() and blk_queue_set_cranges() to set a device
> cranges according to the information retrieved from log page B9h,
> if the device supports it.
> 
> The format of the Concurrent Positioning Ranges VPD page B9h is defined
> in section 6.6.6 of SBC-5.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   drivers/scsi/sd.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++
>   drivers/scsi/sd.h |  1 +
>   2 files changed, 81 insertions(+)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index b8d55af763f9..b1e767a01b9f 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3125,6 +3125,85 @@ static void sd_read_security(struct scsi_disk *sdkp, unsigned char *buffer)
>   		sdkp->security = 1;
>   }
>   
> +static inline sector_t sd64_to_sectors(struct scsi_disk *sdkp, u8 *buf)
> +{
> +	return logical_to_sectors(sdkp->device, get_unaligned_be64(buf));
> +}
> +
> +/**
> + * sd_read_cpr - Query concurrent positioning ranges
> + * @sdkp:	disk to query
> + */
> +static void sd_read_cpr(struct scsi_disk *sdkp)
> +{
> +	unsigned char *buffer, *desc;
> +	struct blk_cranges *cr = NULL;
> +	unsigned int nr_cpr = 0;
> +	int i, vpd_len, buf_len = SD_BUF_SIZE;
> +
> +	/*
> +	 * We need to have the capacity set first for the block layer to be
> +	 * able to check the ranges.
> +	 */
> +	if (sdkp->first_scan)
> +		return;
> +
> +	if (!sdkp->capacity)
> +		goto out;
> +
> +	/*
> +	 * Concurrent Positioning Ranges VPD: there can be at most 256 ranges,
> +	 * leading to a maximum page size of 64 + 256*32 bytes.
> +	 */
> +	buf_len = 64 + 256*32;
> +	buffer = kmalloc(buf_len, GFP_KERNEL);
> +	if (!buffer || scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len))
> +		goto out;
> +
> +	/* We must have at least a 64B header and one 32B range descriptor */
> +	vpd_len = get_unaligned_be16(&buffer[2]) + 3;
> +	if (vpd_len > buf_len || vpd_len < 64 + 32 || (vpd_len & 31)) {
> +		sd_printk(KERN_ERR, sdkp,
> +			  "Invalid Concurrent Positioning Ranges VPD page\n");
> +		goto out;
> +	}
> +
> +	nr_cpr = (vpd_len - 64) / 32;
> +	if (nr_cpr == 1) {
> +		nr_cpr = 0;
> +		goto out;
> +	}
> +
> +	cr = blk_alloc_cranges(sdkp->disk, nr_cpr);
> +	if (!cr) {
> +		nr_cpr = 0;
> +		goto out;
> +	}
> +
> +	desc = &buffer[64];
> +	for (i = 0; i < nr_cpr; i++, desc += 32) {
> +		if (desc[0] != i) {
> +			sd_printk(KERN_ERR, sdkp,
> +				"Invalid Concurrent Positioning Range number\n");
> +			nr_cpr = 0;
> +			break;
> +		}
> +
> +		cr->ranges[i].sector = sd64_to_sectors(sdkp, desc + 8);
> +		cr->ranges[i].nr_sectors = sd64_to_sectors(sdkp, desc + 16);
> +	}
> +
> +out:
> +	blk_queue_set_cranges(sdkp->disk, cr);

See? We are _are_ creating a new set of ranges.
So why bother updating the old ones?

> +	if (nr_cpr && sdkp->nr_actuators != nr_cpr) {
> +		sd_printk(KERN_NOTICE, sdkp,
> +			  "%u concurrent positioning ranges\n", nr_cpr);
> +		sdkp->nr_actuators = nr_cpr;
> +	}
> +
> +	kfree(buffer);
> +}
> +
>   /*
>    * Determine the device's preferred I/O size for reads and writes
>    * unless the reported value is unreasonably small, large, not a
> @@ -3240,6 +3319,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
>   		sd_read_app_tag_own(sdkp, buffer);
>   		sd_read_write_same(sdkp, buffer);
>   		sd_read_security(sdkp, buffer);
> +		sd_read_cpr(sdkp);
>   	}
>   
>   	/*
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index b59136c4125b..2e5932bde43d 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -106,6 +106,7 @@ struct scsi_disk {
>   	u8		protection_type;/* Data Integrity Field */
>   	u8		provisioning_mode;
>   	u8		zeroing_mode;
> +	u8		nr_actuators;		/* Number of actuators */
>   	unsigned	ATO : 1;	/* state of disk ATO bit */
>   	unsigned	cache_override : 1; /* temp override of WCE,RCD */
>   	unsigned	WCE : 1;	/* state of disk WCE bit */
> 
Otherwise:

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] libata: support concurrent positioning ranges log
  2021-07-21 10:42 ` [PATCH 3/4] libata: support concurrent positioning ranges log Damien Le Moal
@ 2021-07-22 17:34   ` Hannes Reinecke
  2021-07-22 23:27     ` Damien Le Moal
  0 siblings, 1 reply; 14+ messages in thread
From: Hannes Reinecke @ 2021-07-22 17:34 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 7/21/21 12:42 PM, Damien Le Moal wrote:
> Add support to discover if an ATA device supports the Concurrent
> Positioning Ranges Log (address 0x47), indicating that the device is
> capable of seeking to multiple different locations in parallel using
> multiple actuators serving different LBA ranges.
> 
> Also add support to translate the concurrent positioning ranges log
> into its equivalent Concurrent Positioning Ranges VPD page B9h in
> libata-scsi.c.
> 
> The format of the Concurrent Positioning Ranges Log is defined in ACS-5
> r9.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   drivers/ata/libata-core.c | 57 +++++++++++++++++++++++++++++++++++++++
>   drivers/ata/libata-scsi.c | 46 ++++++++++++++++++++++++-------
>   include/linux/ata.h       |  1 +
>   include/linux/libata.h    | 11 ++++++++
>   4 files changed, 106 insertions(+), 9 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] block: Add concurrent positioning ranges support
  2021-07-22 16:11   ` Hannes Reinecke
@ 2021-07-22 23:20     ` Damien Le Moal
  2021-07-22 23:59     ` Damien Le Moal
  1 sibling, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2021-07-22 23:20 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 2021/07/23 1:11, Hannes Reinecke wrote:
> On 7/21/21 12:42 PM, Damien Le Moal wrote:
>> The Concurrent Positioning Ranges VPD page (for SCSI) and Log (for ATA)
>> contain parameters describing the number of sets of contiguous LBAs that
>> can be served independently by a single LUN multi-actuator disk. This
>> patch provides the blk_queue_set_cranges() function allowing a device
>> driver to signal to the block layer that a disk has multiple actuators,
>> each one serving a contiguous range of sectors. To describe the set
>> of sector ranges representing the different actuators of a device, the
>> data type struct blk_cranges is introduced.
>>
>> For a device with multiple actuators, a struct blk_cranges is attached
>> to the device request queue by the blk_queue_set_cranges() function. The
>> function blk_alloc_cranges() is provided for drivers to allocate this
>> structure.
>>
>> The blk_cranges structure contains kobjects (struct kobject) to register
>> with sysfs the set of sector ranges defined by a device. On initial
>> device scan, this registration is done from blk_register_queue() using
>> the block layer internal function blk_register_cranges(). If a driver
>> calls blk_queue_set_cranges() for a registered queue, e.g. when a device
>> is revalidated, blk_queue_set_cranges() will execute
>> blk_register_cranges() to update the queue sysfs attribute files.
>>
>> The sysfs file structure created starts from the cranges sub-directory
>> and contains the start sector and number of sectors served by an
>> actuator, with the information for each actuator grouped in one
>> directory per actuator. E.g. for a dual actuator drive, we have:
>>
>> $ tree /sys/block/sdk/queue/cranges/
>> /sys/block/sdk/queue/cranges/
>> |-- 0
>> |   |-- nr_sectors
>> |   `-- sector
>> `-- 1
>>      |-- nr_sectors
>>      `-- sector
>>
>> For a regular single actuator device, the cranges directory does not
>> exist.
>>
> Hmm. I do wonder if we need such a detailed layout.
> In my patch I have just:
> 
> +       unsigned int            num_lba_ranges;
> +       sector_t                *lba_ranges;
> 
> to hold the information; after all, I don't expect this data to be 
> changing very often during the lifetime of the OS.

Indeed, it will not. Except after a reformat that change the LBA size or that
depops heads. I was also split between what I sent here and a flat list. In the
end, I went for this one as I think it is more elegant and also allows simply
swaping the old cranges with the new on revalidation (more on that below).
Otherwise, with the flat list, we will have to mess with the visibility of the
ranges OR add/remove attribute files to the queue sysfs directory when something
changes (and the number of ranges can actually change with head depop formats...).

>> Device revalidation may lead to changes to this structure and to the
>> attribute values. When manipulated, the queue sysfs_lock and
>> sysfs_dir_lock are held for atomicity, similarly to how the blk-mq and
>> elevator sysfs queue sub-directories are protected.
>>
> 
> Indeed, the values might change with revalidation. But not 
> independently; a change in _one_ range will typically affect the 
> adjacent ranges, too.
> So it should be sufficient to drop them completely and redo from scratch 
> for revalidation.

Which is what blk_queue_set_cranges() does :) If the new cranges specified as
argument has a change from the old one already set as q->cranges, the old one is
dropped and the new one registred in its place.

>> The code related to the management of cranges is added in the new
>> file block/blk-cranges.c.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>   block/Makefile            |   2 +-
>>   block/blk-cranges.c       | 286 ++++++++++++++++++++++++++++++++++++++
>>   block/blk-sysfs.c         |  13 ++
>>   block/blk.h               |   3 +
>>   drivers/ata/libata-scsi.c |   1 +
>>   include/linux/blkdev.h    |  29 ++++
>>   6 files changed, 333 insertions(+), 1 deletion(-)
>>   create mode 100644 block/blk-cranges.c
>>
>> diff --git a/block/Makefile b/block/Makefile
>> index bfbe4e13ca1e..e477e6ca9ea6 100644
>> --- a/block/Makefile
>> +++ b/block/Makefile
>> @@ -9,7 +9,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
>>   			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
>>   			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
>>   			genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
>> -			disk-events.o
>> +			disk-events.o blk-cranges.o
>>   
>>   obj-$(CONFIG_BOUNCE)		+= bounce.o
>>   obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
>> diff --git a/block/blk-cranges.c b/block/blk-cranges.c
>> new file mode 100644
>> index 000000000000..e8ff0d40a5c9
>> --- /dev/null
>> +++ b/block/blk-cranges.c
>> @@ -0,0 +1,286 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + *  Block device concurrent positioning ranges.
>> + *
>> + *  Copyright (C) 2021 Western Digital Corporation or its Affiliates.
>> + */
>> +#include <linux/kernel.h>
>> +#include <linux/blkdev.h>
>> +#include <linux/slab.h>
>> +#include <linux/init.h>
>> +
>> +#include "blk.h"
>> +
>> +static ssize_t blk_crange_sector_show(struct blk_crange *cr, char *page)
>> +{
>> +	return sprintf(page, "%llu\n", cr->sector);
>> +}
>> +
>> +static ssize_t blk_crange_nr_sectors_show(struct blk_crange *cr, char *page)
>> +{
>> +	return sprintf(page, "%llu\n", cr->nr_sectors);
>> +}
>> +
>> +struct blk_crange_sysfs_entry {
>> +	struct attribute attr;
>> +	ssize_t (*show)(struct blk_crange *cr, char *page);
>> +};
>> +
>> +static struct blk_crange_sysfs_entry blk_crange_sector_entry = {
>> +	.attr = { .name = "sector", .mode = 0444 },
>> +	.show = blk_crange_sector_show,
>> +};
>> +
>> +static struct blk_crange_sysfs_entry blk_crange_nr_sectors_entry = {
>> +	.attr = { .name = "nr_sectors", .mode = 0444 },
>> +	.show = blk_crange_nr_sectors_show,
>> +};
>> +
>> +static struct attribute *blk_crange_attrs[] = {
>> +	&blk_crange_sector_entry.attr,
>> +	&blk_crange_nr_sectors_entry.attr,
>> +	NULL,
>> +};
>> +ATTRIBUTE_GROUPS(blk_crange);
>> +
>> +static ssize_t blk_crange_sysfs_show(struct kobject *kobj,
>> +				     struct attribute *attr, char *page)
>> +{
>> +	struct blk_crange_sysfs_entry *entry;
>> +	struct blk_crange *cr;
>> +	struct request_queue *q;
>> +	ssize_t ret;
>> +
>> +	entry = container_of(attr, struct blk_crange_sysfs_entry, attr);
>> +	cr = container_of(kobj, struct blk_crange, kobj);
>> +	q = cr->queue;
>> +
>> +	mutex_lock(&q->sysfs_lock);
>> +	ret = entry->show(cr, page);
>> +	mutex_unlock(&q->sysfs_lock);
>> +
>> +	return ret;
>> +}
>> +
>> +static const struct sysfs_ops blk_crange_sysfs_ops = {
>> +	.show	= blk_crange_sysfs_show,
>> +};
>> +
>> +/*
>> + * Dummy release function to make kobj happy.
>> + */
>> +static void blk_cranges_sysfs_nop_release(struct kobject *kobj)
>> +{
>> +}
>> +
>> +static struct kobj_type blk_crange_ktype = {
>> +	.sysfs_ops	= &blk_crange_sysfs_ops,
>> +	.default_groups	= blk_crange_groups,
>> +	.release	= blk_cranges_sysfs_nop_release,
>> +};
>> +
>> +static struct kobj_type blk_cranges_ktype = {
>> +	.release	= blk_cranges_sysfs_nop_release,
>> +};
>> +
>> +/**
>> + * blk_register_cranges - register with sysfs a set of concurrent ranges
>> + * @disk:		Target disk
>> + * @new_cranges:	New set of concurrent ranges
>> + *
>> + * Register with sysfs a set of concurrent ranges for @disk. If @new_cranges
>> + * is not NULL, this set of concurrent ranges is registered and the
>> + * old set specified by q->cranges is unregistered. Otherwise, q->cranges
>> + * is registered if it is not already.
>> + */
>> +int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges)
>> +{
>> +	struct request_queue *q = disk->queue;
>> +	struct blk_cranges *cranges;
>> +	int i, ret;
>> +
>> +	lockdep_assert_held(&q->sysfs_dir_lock);
>> +	lockdep_assert_held(&q->sysfs_lock);
>> +
>> +	/* If a new range set is specified, unregister the old one */
>> +	if (new_cranges) {
>> +		if (q->cranges)
>> +			blk_unregister_cranges(disk);
>> +		q->cranges = new_cranges;
>> +	}
>> +
>> +	cranges = q->cranges;
>> +	if (!cranges)
>> +		return 0;
>> +
>> +	/*
>> +	 * At this point, q->cranges is the new set of sector ranges that needs
>> +	 * to be registered with sysfs.
>> +	 */
>> +	WARN_ON(cranges->sysfs_registered);
>> +	ret = kobject_init_and_add(&cranges->kobj, &blk_cranges_ktype,
>> +				   &q->kobj, "%s", "cranges");
>> +	if (ret)
>> +		goto free;
>> + > +	for (i = 0; i < cranges->nr_ranges; i++) {
>> +		cranges->ranges[i].queue = q;
>> +		ret = kobject_init_and_add(&cranges->ranges[i].kobj,
>> +					   &blk_crange_ktype, &cranges->kobj,
>> +					   "%d", i);
>> +		if (ret)
>> +			goto delete_obj;
>> +	}
>> +
>> +	cranges->sysfs_registered = true;
>> +
> 
> Do you really need independent kobj here?
> As discussed above, the individual ranges are not independent, and it 
> might be easier to just lump them into one structure, seeing that all of 
> them will have to be redone during revalidation anyway.

For the structure lifetime, no, we do not need separate kobjs. But with sysfs,
one kobj == one directory. SO I have these for the queue/cranges sysfs directory
and its sub directories (one per range). Note that there are no kobj_get|put()
for these. It is not needed. They are only for the sub-directories.

>> +	return 0;
>> +
>> +delete_obj:
>> +	while (--i >= 0)
>> +		kobject_del(&cranges->ranges[i].kobj);
>> +	kobject_del(&cranges->kobj);
>> +free:
>> +	kfree(cranges);
>> +	q->cranges = NULL;
>> +	return ret;
>> +}
>> +
>> +void blk_unregister_cranges(struct gendisk *disk)
>> +{
>> +	struct request_queue *q = disk->queue;
>> +	struct blk_cranges *cranges = q->cranges;
>> +	int i;
>> +
>> +	lockdep_assert_held(&q->sysfs_dir_lock);
>> +	lockdep_assert_held(&q->sysfs_lock);
>> +
>> +	if (!cranges)
>> +		return;
>> +
>> +	if (cranges->sysfs_registered) {
>> +		for (i = 0; i < cranges->nr_ranges; i++)
>> +			kobject_del(&cranges->ranges[i].kobj);
>> +		kobject_del(&cranges->kobj);
>> +	}
>> +
>> +	kfree(cranges);
>> +	q->cranges = NULL;
>> +}
>> +
>> +static bool blk_check_ranges(struct gendisk *disk, struct blk_cranges *cr)
>> +{
>> +	sector_t capacity = get_capacity(disk);
>> +	sector_t min_sector = (sector_t)-1;
>> +	sector_t max_sector = 0;
>> +	int i;
>> +
>> +	/*
>> +	 * Sector ranges may overlap but should overall contain all sectors
>> +	 * within the disk capacity.
>> +	 */
>> +	for (i = 0; i < cr->nr_ranges; i++) {
>> +		min_sector = min(min_sector, cr->ranges[i].sector);
>> +		max_sector = max(max_sector, cr->ranges[i].sector +
>> +					     cr->ranges[i].nr_sectors);
>> +	}
>> +
> 
> Warm fuzzy feelings.
> Sector ranges may overlap? Seriously?

Yes they can. The standards do not forbid it, and for a good reason:
split-actuator drives vs multi-actuator drives. These two terms are often used
interchangably, but they actually refer to something different:
1) Split-actuator drives: On these, the head stack (actuator) is split in 2 on
the same axis. This creates 2 subsets of heads, each serving different disk
platters, so different LBAs. In this case, there will be no overlap.
2) Multi-actuator drives: With these, the drives have 2 distinct actuators (head
stacks) and each stack can potentially access the entire LBA range (all disk
platters). In this case, the ranges for each one will be the entire disk
capacity. Which makes reporting that rather pointless I guess, at least in terms
of optimization for the storage stack since we will be better off having the
disk do its own scheduling in this case.

So following the standard, both are allowed here so I only check that all
sectors are covered. No check on overlap. File systems or IO schedulers who want
to optimize for multiple cranges will have to be careful about that.

> The only way this could happen is when a given LBA will be present in 
> _two_ ranges. And this is valid?

Yep. Dual-actuator case.

>> +	if (min_sector != 0 || max_sector < capacity) {
>> +		pr_warn("Invalid concurrent ranges: missing sectors\n");
>> +		return false;
>> +	}
>> +
>> +	if (max_sector > capacity) {
>> +		pr_warn("Invalid concurrent ranges: beyond capacity\n");
>> +		return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +static bool blk_cranges_changed(struct gendisk *disk, struct blk_cranges *new)
>> +{
>> +	struct blk_cranges *old = disk->queue->cranges;
>> +	int i;
>> +
>> +	if (!old || old->nr_ranges != new->nr_ranges)
>> +		return true;
>> +
>> +	for (i = 0; i < new->nr_ranges; i++) {
>> +		if (old->ranges[i].sector != new->ranges[i].sector ||
>> +		    old->ranges[i].nr_sectors != new->ranges[i].nr_sectors)
>> +			return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
> 
> As mentioned above: I doubt we need to go into such detail. Just redo 
> the ranges on revalidation should be sufficient.

Yes, we could. But checking for the change is trivial with this function and it
will return true most of the time. If it does, nothing is done and the cranges
structure passed by blk_queue_set_cranges() is simply freed and sysfs remains
untouched. I prefer that over constantly changing the cranges structure for
nothing. after all, revalidation is not exactly a rare event...

>> +/**
>> + * blk_alloc_cranges - Allocate a concurrent positioning range structure
>> + * @disk:	target disk
>> + * @nr_ranges:	Number of concurrent ranges
>> + *
>> + * Allocate a struct blk_cranges structure with @nr_ranges range descriptors.
>> + */
>> +struct blk_cranges *blk_alloc_cranges(struct gendisk *disk, int nr_ranges)
>> +{
>> +	struct blk_cranges *cr;
>> +
>> +	cr = kzalloc_node(struct_size(cr, ranges, nr_ranges), GFP_KERNEL,
>> +			  disk->queue->node);
>> +	if (cr)
>> +		cr->nr_ranges = nr_ranges;
>> +	return cr;
>> +}
>> +EXPORT_SYMBOL_GPL(blk_alloc_cranges);
>> +
>> +/**
>> + * blk_queue_set_cranges - Set a disk concurrent positioning ranges
>> + * @disk:	target disk
>> + * @cr:		concurrent ranges structure
>> + *
>> + * Set the concurrant positioning ranges information of the request queue
>> + * of @disk to @cr. If @cr is NULL and the concurrent ranges structure
>> + * already set, if any, is cleared. If there are no differences between
>> + * @cr and the concurrent ranges structure already set, @cr is freed.
>> + */
>> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr)
>> +{
>> +	struct request_queue *q = disk->queue;
>> +
>> +	if (WARN_ON_ONCE(cr && !cr->nr_ranges)) {
>> +		kfree(cr);
>> +		cr = NULL;
>> +	}
>> +
>> +	mutex_lock(&q->sysfs_dir_lock);
>> +	mutex_lock(&q->sysfs_lock);
>> +
> 
> And this is what you get for all the complexity.
> Having to take _two_ locks, and have a detailed description how the 
> function should be used.
> 
> I still think that redoing everything is simpler.

It is redoing everything, but only in the case of a difference between the old
and new cranges. q->sysfs_dir_lock is taken because the structure has
sub-directories in sysfs. We would not need to take it with a flat list of
attribute files that are updated, unless the number of of actuators actually
changes. In which case, attribute files would need to be removed and
q->sysfs_dir_lock taken. Or we need to muck with attribute visibility, which I
find ugly personnally... But if you really insist, we can swith to that.

> 
>> +	if (cr && !blk_check_ranges(disk, cr)) {
>> +		kfree(cr);
>> +		cr = NULL;
>> +	}
>> +
>> +	if (!blk_cranges_changed(disk, cr)) {
>> +		kfree(cr);
>> +		goto unlock;
>> +	}
>> +
>> +	/*
>> +	 * This may be called for a registered queue. E.g. during a device
>> +	 * revalidation. If that is the case, we need to register the new set
>> +	 * of concurrent ranges. If the queue is not already registered, the
>> +	 * device request queue registration will register the ranges.
>> +	 */
>> +	if (blk_queue_registered(q)) {
>> +		blk_register_cranges(disk, cr);
>> +	} else {
>> +		swap(q->cranges, cr);
>> +		kfree(cr);
>> +	}
>> +
>> +unlock:
>> +	mutex_unlock(&q->sysfs_lock);
>> +	mutex_unlock(&q->sysfs_dir_lock);
>> +}
>> +EXPORT_SYMBOL_GPL(blk_queue_set_cranges);
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index 370d83c18057..aeac98ecc5a0 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -899,9 +899,21 @@ int blk_register_queue(struct gendisk *disk)
>>   	}
>>   
>>   	mutex_lock(&q->sysfs_lock);
>> +
>> +	ret = blk_register_cranges(disk, NULL);
>> +	if (ret) {
>> +		mutex_unlock(&q->sysfs_lock);
>> +		mutex_unlock(&q->sysfs_dir_lock);
>> +		kobject_del(&q->kobj);
>> +		blk_trace_remove_sysfs(dev);
>> +		kobject_put(&dev->kobj);
>> +		return ret;
>> +	}
>> +
>>   	if (q->elevator) {
>>   		ret = elv_register_queue(q, false);
>>   		if (ret) {
>> +			blk_unregister_cranges(disk);
>>   			mutex_unlock(&q->sysfs_lock);
>>   			mutex_unlock(&q->sysfs_dir_lock);
>>   			kobject_del(&q->kobj);
>> @@ -985,6 +997,7 @@ void blk_unregister_queue(struct gendisk *disk)
>>   	mutex_lock(&q->sysfs_lock);
>>   	if (q->elevator)
>>   		elv_unregister_queue(q);
>> +	blk_unregister_cranges(disk);
>>   	mutex_unlock(&q->sysfs_lock);
>>   	mutex_unlock(&q->sysfs_dir_lock);
>>   
>> diff --git a/block/blk.h b/block/blk.h
>> index 4b885c0f6708..650c0d87987c 100644
>> --- a/block/blk.h
>> +++ b/block/blk.h
>> @@ -368,4 +368,7 @@ extern struct device_attribute dev_attr_events;
>>   extern struct device_attribute dev_attr_events_async;
>>   extern struct device_attribute dev_attr_events_poll_msecs;
>>   
>> +int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges);
>> +void blk_unregister_cranges(struct gendisk *disk);
>> +
>>   #endif /* BLK_INTERNAL_H */
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index b9588c52815d..144e8cac44ba 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -1947,6 +1947,7 @@ static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
>>   		0xb1,	/* page 0xb1, block device characteristics page */
>>   		0xb2,	/* page 0xb2, thin provisioning page */
>>   		0xb6,	/* page 0xb6, zoned block device characteristics */
>> +		0xb9,	/* page 0xb9, concurrent positioning ranges */
>>   	};
>>   
>>   	num_pages = sizeof(pages);
> 
> And this hunk is here because ... ?

Oops... That belongs to patch 3.

Thanks for the review. Sending V2.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] scsi: sd: add concurrent positioning ranges support
  2021-07-22 16:14   ` Hannes Reinecke
@ 2021-07-22 23:26     ` Damien Le Moal
  0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2021-07-22 23:26 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 2021/07/23 1:14, Hannes Reinecke wrote:
> On 7/21/21 12:42 PM, Damien Le Moal wrote:
>> Add the sd_read_cpr() function to the sd scsi disk driver to discover
>> if a device has multiple concurrent positioning ranges (i.e. multiple
>> actuators on an HDD). This new function is called from
>> sd_revalidate_disk() and uses the block layer functions
>> blk_alloc_cranges() and blk_queue_set_cranges() to set a device
>> cranges according to the information retrieved from log page B9h,
>> if the device supports it.
>>
>> The format of the Concurrent Positioning Ranges VPD page B9h is defined
>> in section 6.6.6 of SBC-5.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>   drivers/scsi/sd.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++
>>   drivers/scsi/sd.h |  1 +
>>   2 files changed, 81 insertions(+)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index b8d55af763f9..b1e767a01b9f 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -3125,6 +3125,85 @@ static void sd_read_security(struct scsi_disk *sdkp, unsigned char *buffer)
>>   		sdkp->security = 1;
>>   }
>>   
>> +static inline sector_t sd64_to_sectors(struct scsi_disk *sdkp, u8 *buf)
>> +{
>> +	return logical_to_sectors(sdkp->device, get_unaligned_be64(buf));
>> +}
>> +
>> +/**
>> + * sd_read_cpr - Query concurrent positioning ranges
>> + * @sdkp:	disk to query
>> + */
>> +static void sd_read_cpr(struct scsi_disk *sdkp)
>> +{
>> +	unsigned char *buffer, *desc;
>> +	struct blk_cranges *cr = NULL;
>> +	unsigned int nr_cpr = 0;
>> +	int i, vpd_len, buf_len = SD_BUF_SIZE;
>> +
>> +	/*
>> +	 * We need to have the capacity set first for the block layer to be
>> +	 * able to check the ranges.
>> +	 */
>> +	if (sdkp->first_scan)
>> +		return;
>> +
>> +	if (!sdkp->capacity)
>> +		goto out;
>> +
>> +	/*
>> +	 * Concurrent Positioning Ranges VPD: there can be at most 256 ranges,
>> +	 * leading to a maximum page size of 64 + 256*32 bytes.
>> +	 */
>> +	buf_len = 64 + 256*32;
>> +	buffer = kmalloc(buf_len, GFP_KERNEL);
>> +	if (!buffer || scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len))
>> +		goto out;
>> +
>> +	/* We must have at least a 64B header and one 32B range descriptor */
>> +	vpd_len = get_unaligned_be16(&buffer[2]) + 3;
>> +	if (vpd_len > buf_len || vpd_len < 64 + 32 || (vpd_len & 31)) {
>> +		sd_printk(KERN_ERR, sdkp,
>> +			  "Invalid Concurrent Positioning Ranges VPD page\n");
>> +		goto out;
>> +	}
>> +
>> +	nr_cpr = (vpd_len - 64) / 32;
>> +	if (nr_cpr == 1) {
>> +		nr_cpr = 0;
>> +		goto out;
>> +	}
>> +
>> +	cr = blk_alloc_cranges(sdkp->disk, nr_cpr);
>> +	if (!cr) {
>> +		nr_cpr = 0;
>> +		goto out;
>> +	}
>> +
>> +	desc = &buffer[64];
>> +	for (i = 0; i < nr_cpr; i++, desc += 32) {
>> +		if (desc[0] != i) {
>> +			sd_printk(KERN_ERR, sdkp,
>> +				"Invalid Concurrent Positioning Range number\n");
>> +			nr_cpr = 0;
>> +			break;
>> +		}
>> +
>> +		cr->ranges[i].sector = sd64_to_sectors(sdkp, desc + 8);
>> +		cr->ranges[i].nr_sectors = sd64_to_sectors(sdkp, desc + 16);
>> +	}
>> +
>> +out:
>> +	blk_queue_set_cranges(sdkp->disk, cr);
> 
> See? We are _are_ creating a new set of ranges.
> So why bother updating the old ones?

See my reply to my comments on patch 1. We do not update the old ones. The cases
are:
1) This is the first call, q->cranges is NULL: cr will become q->cranges.
2) q->cranges is already set and cr has the same info as q->cranges (no change):
cr will be freed by blk_queue_set_cranges().
3) q->cranges is already set and cr is different from q-cranges: q->cranges will
be completely dropped and cr become the new q->cranges.

So we never directly update anything inside q->cranges. We just swap in a new
full structure in case of change.

>> +	if (nr_cpr && sdkp->nr_actuators != nr_cpr) {
>> +		sd_printk(KERN_NOTICE, sdkp,
>> +			  "%u concurrent positioning ranges\n", nr_cpr);
>> +		sdkp->nr_actuators = nr_cpr;
>> +	}
>> +
>> +	kfree(buffer);
>> +}
>> +
>>   /*
>>    * Determine the device's preferred I/O size for reads and writes
>>    * unless the reported value is unreasonably small, large, not a
>> @@ -3240,6 +3319,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
>>   		sd_read_app_tag_own(sdkp, buffer);
>>   		sd_read_write_same(sdkp, buffer);
>>   		sd_read_security(sdkp, buffer);
>> +		sd_read_cpr(sdkp);
>>   	}
>>   
>>   	/*
>> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
>> index b59136c4125b..2e5932bde43d 100644
>> --- a/drivers/scsi/sd.h
>> +++ b/drivers/scsi/sd.h
>> @@ -106,6 +106,7 @@ struct scsi_disk {
>>   	u8		protection_type;/* Data Integrity Field */
>>   	u8		provisioning_mode;
>>   	u8		zeroing_mode;
>> +	u8		nr_actuators;		/* Number of actuators */
>>   	unsigned	ATO : 1;	/* state of disk ATO bit */
>>   	unsigned	cache_override : 1; /* temp override of WCE,RCD */
>>   	unsigned	WCE : 1;	/* state of disk WCE bit */
>>
> Otherwise:
> 
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> 
> Cheers,
> 
> Hannes
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] libata: support concurrent positioning ranges log
  2021-07-22 17:34   ` Hannes Reinecke
@ 2021-07-22 23:27     ` Damien Le Moal
  0 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2021-07-22 23:27 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 2021/07/23 2:34, Hannes Reinecke wrote:
> On 7/21/21 12:42 PM, Damien Le Moal wrote:
>> Add support to discover if an ATA device supports the Concurrent
>> Positioning Ranges Log (address 0x47), indicating that the device is
>> capable of seeking to multiple different locations in parallel using
>> multiple actuators serving different LBA ranges.
>>
>> Also add support to translate the concurrent positioning ranges log
>> into its equivalent Concurrent Positioning Ranges VPD page B9h in
>> libata-scsi.c.
>>
>> The format of the Concurrent Positioning Ranges Log is defined in ACS-5
>> r9.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>   drivers/ata/libata-core.c | 57 +++++++++++++++++++++++++++++++++++++++
>>   drivers/ata/libata-scsi.c | 46 ++++++++++++++++++++++++-------
>>   include/linux/ata.h       |  1 +
>>   include/linux/libata.h    | 11 ++++++++
>>   4 files changed, 106 insertions(+), 9 deletions(-)
>>
> Reviewed-by: Hannes Reinecke <hare@suse.de>

Thanks. But I will change this one, bringing in the misplaced hunk in patch 1.

> 
> Cheers,
> 
> Hannes
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] block: Add concurrent positioning ranges support
  2021-07-22 16:11   ` Hannes Reinecke
  2021-07-22 23:20     ` Damien Le Moal
@ 2021-07-22 23:59     ` Damien Le Moal
  1 sibling, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2021-07-22 23:59 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block, Jens Axboe, linux-scsi,
	Martin K . Petersen, linux-ide

On 2021/07/23 1:11, Hannes Reinecke wrote:
> On 7/21/21 12:42 PM, Damien Le Moal wrote:
>> The Concurrent Positioning Ranges VPD page (for SCSI) and Log (for ATA)
>> contain parameters describing the number of sets of contiguous LBAs that
>> can be served independently by a single LUN multi-actuator disk. This
>> patch provides the blk_queue_set_cranges() function allowing a device
>> driver to signal to the block layer that a disk has multiple actuators,
>> each one serving a contiguous range of sectors. To describe the set
>> of sector ranges representing the different actuators of a device, the
>> data type struct blk_cranges is introduced.
>>
>> For a device with multiple actuators, a struct blk_cranges is attached
>> to the device request queue by the blk_queue_set_cranges() function. The
>> function blk_alloc_cranges() is provided for drivers to allocate this
>> structure.
>>
>> The blk_cranges structure contains kobjects (struct kobject) to register
>> with sysfs the set of sector ranges defined by a device. On initial
>> device scan, this registration is done from blk_register_queue() using
>> the block layer internal function blk_register_cranges(). If a driver
>> calls blk_queue_set_cranges() for a registered queue, e.g. when a device
>> is revalidated, blk_queue_set_cranges() will execute
>> blk_register_cranges() to update the queue sysfs attribute files.
>>
>> The sysfs file structure created starts from the cranges sub-directory
>> and contains the start sector and number of sectors served by an
>> actuator, with the information for each actuator grouped in one
>> directory per actuator. E.g. for a dual actuator drive, we have:
>>
>> $ tree /sys/block/sdk/queue/cranges/
>> /sys/block/sdk/queue/cranges/
>> |-- 0
>> |   |-- nr_sectors
>> |   `-- sector
>> `-- 1
>>      |-- nr_sectors
>>      `-- sector
>>
>> For a regular single actuator device, the cranges directory does not
>> exist.
>>
> Hmm. I do wonder if we need such a detailed layout.
> In my patch I have just:
> 
> +       unsigned int            num_lba_ranges;
> +       sector_t                *lba_ranges;
> 
> to hold the information; after all, I don't expect this data to be 
> changing very often during the lifetime of the OS.

Another thing about this sysfs structure: it allows easily adding attributes for
each actuator if needed. For instance, the VPD and Log page advertize a "number
of storage elements" for each actuator. This is not exposed here as I do not
find that information particularly useful. But we could add anything that the
standards defines. Or purely software things like a request counter to see how
well balanced accesses are.

> 
>> Device revalidation may lead to changes to this structure and to the
>> attribute values. When manipulated, the queue sysfs_lock and
>> sysfs_dir_lock are held for atomicity, similarly to how the blk-mq and
>> elevator sysfs queue sub-directories are protected.
>>
> 
> Indeed, the values might change with revalidation. But not 
> independently; a change in _one_ range will typically affect the 
> adjacent ranges, too.
> So it should be sufficient to drop them completely and redo from scratch 
> for revalidation.
> 
>> The code related to the management of cranges is added in the new
>> file block/blk-cranges.c.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>   block/Makefile            |   2 +-
>>   block/blk-cranges.c       | 286 ++++++++++++++++++++++++++++++++++++++
>>   block/blk-sysfs.c         |  13 ++
>>   block/blk.h               |   3 +
>>   drivers/ata/libata-scsi.c |   1 +
>>   include/linux/blkdev.h    |  29 ++++
>>   6 files changed, 333 insertions(+), 1 deletion(-)
>>   create mode 100644 block/blk-cranges.c
>>
>> diff --git a/block/Makefile b/block/Makefile
>> index bfbe4e13ca1e..e477e6ca9ea6 100644
>> --- a/block/Makefile
>> +++ b/block/Makefile
>> @@ -9,7 +9,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
>>   			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
>>   			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
>>   			genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
>> -			disk-events.o
>> +			disk-events.o blk-cranges.o
>>   
>>   obj-$(CONFIG_BOUNCE)		+= bounce.o
>>   obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
>> diff --git a/block/blk-cranges.c b/block/blk-cranges.c
>> new file mode 100644
>> index 000000000000..e8ff0d40a5c9
>> --- /dev/null
>> +++ b/block/blk-cranges.c
>> @@ -0,0 +1,286 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + *  Block device concurrent positioning ranges.
>> + *
>> + *  Copyright (C) 2021 Western Digital Corporation or its Affiliates.
>> + */
>> +#include <linux/kernel.h>
>> +#include <linux/blkdev.h>
>> +#include <linux/slab.h>
>> +#include <linux/init.h>
>> +
>> +#include "blk.h"
>> +
>> +static ssize_t blk_crange_sector_show(struct blk_crange *cr, char *page)
>> +{
>> +	return sprintf(page, "%llu\n", cr->sector);
>> +}
>> +
>> +static ssize_t blk_crange_nr_sectors_show(struct blk_crange *cr, char *page)
>> +{
>> +	return sprintf(page, "%llu\n", cr->nr_sectors);
>> +}
>> +
>> +struct blk_crange_sysfs_entry {
>> +	struct attribute attr;
>> +	ssize_t (*show)(struct blk_crange *cr, char *page);
>> +};
>> +
>> +static struct blk_crange_sysfs_entry blk_crange_sector_entry = {
>> +	.attr = { .name = "sector", .mode = 0444 },
>> +	.show = blk_crange_sector_show,
>> +};
>> +
>> +static struct blk_crange_sysfs_entry blk_crange_nr_sectors_entry = {
>> +	.attr = { .name = "nr_sectors", .mode = 0444 },
>> +	.show = blk_crange_nr_sectors_show,
>> +};
>> +
>> +static struct attribute *blk_crange_attrs[] = {
>> +	&blk_crange_sector_entry.attr,
>> +	&blk_crange_nr_sectors_entry.attr,
>> +	NULL,
>> +};
>> +ATTRIBUTE_GROUPS(blk_crange);
>> +
>> +static ssize_t blk_crange_sysfs_show(struct kobject *kobj,
>> +				     struct attribute *attr, char *page)
>> +{
>> +	struct blk_crange_sysfs_entry *entry;
>> +	struct blk_crange *cr;
>> +	struct request_queue *q;
>> +	ssize_t ret;
>> +
>> +	entry = container_of(attr, struct blk_crange_sysfs_entry, attr);
>> +	cr = container_of(kobj, struct blk_crange, kobj);
>> +	q = cr->queue;
>> +
>> +	mutex_lock(&q->sysfs_lock);
>> +	ret = entry->show(cr, page);
>> +	mutex_unlock(&q->sysfs_lock);
>> +
>> +	return ret;
>> +}
>> +
>> +static const struct sysfs_ops blk_crange_sysfs_ops = {
>> +	.show	= blk_crange_sysfs_show,
>> +};
>> +
>> +/*
>> + * Dummy release function to make kobj happy.
>> + */
>> +static void blk_cranges_sysfs_nop_release(struct kobject *kobj)
>> +{
>> +}
>> +
>> +static struct kobj_type blk_crange_ktype = {
>> +	.sysfs_ops	= &blk_crange_sysfs_ops,
>> +	.default_groups	= blk_crange_groups,
>> +	.release	= blk_cranges_sysfs_nop_release,
>> +};
>> +
>> +static struct kobj_type blk_cranges_ktype = {
>> +	.release	= blk_cranges_sysfs_nop_release,
>> +};
>> +
>> +/**
>> + * blk_register_cranges - register with sysfs a set of concurrent ranges
>> + * @disk:		Target disk
>> + * @new_cranges:	New set of concurrent ranges
>> + *
>> + * Register with sysfs a set of concurrent ranges for @disk. If @new_cranges
>> + * is not NULL, this set of concurrent ranges is registered and the
>> + * old set specified by q->cranges is unregistered. Otherwise, q->cranges
>> + * is registered if it is not already.
>> + */
>> +int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges)
>> +{
>> +	struct request_queue *q = disk->queue;
>> +	struct blk_cranges *cranges;
>> +	int i, ret;
>> +
>> +	lockdep_assert_held(&q->sysfs_dir_lock);
>> +	lockdep_assert_held(&q->sysfs_lock);
>> +
>> +	/* If a new range set is specified, unregister the old one */
>> +	if (new_cranges) {
>> +		if (q->cranges)
>> +			blk_unregister_cranges(disk);
>> +		q->cranges = new_cranges;
>> +	}
>> +
>> +	cranges = q->cranges;
>> +	if (!cranges)
>> +		return 0;
>> +
>> +	/*
>> +	 * At this point, q->cranges is the new set of sector ranges that needs
>> +	 * to be registered with sysfs.
>> +	 */
>> +	WARN_ON(cranges->sysfs_registered);
>> +	ret = kobject_init_and_add(&cranges->kobj, &blk_cranges_ktype,
>> +				   &q->kobj, "%s", "cranges");
>> +	if (ret)
>> +		goto free;
>> + > +	for (i = 0; i < cranges->nr_ranges; i++) {
>> +		cranges->ranges[i].queue = q;
>> +		ret = kobject_init_and_add(&cranges->ranges[i].kobj,
>> +					   &blk_crange_ktype, &cranges->kobj,
>> +					   "%d", i);
>> +		if (ret)
>> +			goto delete_obj;
>> +	}
>> +
>> +	cranges->sysfs_registered = true;
>> +
> 
> Do you really need independent kobj here?
> As discussed above, the individual ranges are not independent, and it 
> might be easier to just lump them into one structure, seeing that all of 
> them will have to be redone during revalidation anyway.
> 
>> +	return 0;
>> +
>> +delete_obj:
>> +	while (--i >= 0)
>> +		kobject_del(&cranges->ranges[i].kobj);
>> +	kobject_del(&cranges->kobj);
>> +free:
>> +	kfree(cranges);
>> +	q->cranges = NULL;
>> +	return ret;
>> +}
>> +
>> +void blk_unregister_cranges(struct gendisk *disk)
>> +{
>> +	struct request_queue *q = disk->queue;
>> +	struct blk_cranges *cranges = q->cranges;
>> +	int i;
>> +
>> +	lockdep_assert_held(&q->sysfs_dir_lock);
>> +	lockdep_assert_held(&q->sysfs_lock);
>> +
>> +	if (!cranges)
>> +		return;
>> +
>> +	if (cranges->sysfs_registered) {
>> +		for (i = 0; i < cranges->nr_ranges; i++)
>> +			kobject_del(&cranges->ranges[i].kobj);
>> +		kobject_del(&cranges->kobj);
>> +	}
>> +
>> +	kfree(cranges);
>> +	q->cranges = NULL;
>> +}
>> +
>> +static bool blk_check_ranges(struct gendisk *disk, struct blk_cranges *cr)
>> +{
>> +	sector_t capacity = get_capacity(disk);
>> +	sector_t min_sector = (sector_t)-1;
>> +	sector_t max_sector = 0;
>> +	int i;
>> +
>> +	/*
>> +	 * Sector ranges may overlap but should overall contain all sectors
>> +	 * within the disk capacity.
>> +	 */
>> +	for (i = 0; i < cr->nr_ranges; i++) {
>> +		min_sector = min(min_sector, cr->ranges[i].sector);
>> +		max_sector = max(max_sector, cr->ranges[i].sector +
>> +					     cr->ranges[i].nr_sectors);
>> +	}
>> +
> 
> Warm fuzzy feelings.
> Sector ranges may overlap? Seriously?
> 
> The only way this could happen is when a given LBA will be present in 
> _two_ ranges. And this is valid?
> 
>> +	if (min_sector != 0 || max_sector < capacity) {
>> +		pr_warn("Invalid concurrent ranges: missing sectors\n");
>> +		return false;
>> +	}
>> +
>> +	if (max_sector > capacity) {
>> +		pr_warn("Invalid concurrent ranges: beyond capacity\n");
>> +		return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +static bool blk_cranges_changed(struct gendisk *disk, struct blk_cranges *new)
>> +{
>> +	struct blk_cranges *old = disk->queue->cranges;
>> +	int i;
>> +
>> +	if (!old || old->nr_ranges != new->nr_ranges)
>> +		return true;
>> +
>> +	for (i = 0; i < new->nr_ranges; i++) {
>> +		if (old->ranges[i].sector != new->ranges[i].sector ||
>> +		    old->ranges[i].nr_sectors != new->ranges[i].nr_sectors)
>> +			return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
> 
> As mentioned above: I doubt we need to go into such detail. Just redo 
> the ranges on revalidation should be sufficient.
> 
>> +/**
>> + * blk_alloc_cranges - Allocate a concurrent positioning range structure
>> + * @disk:	target disk
>> + * @nr_ranges:	Number of concurrent ranges
>> + *
>> + * Allocate a struct blk_cranges structure with @nr_ranges range descriptors.
>> + */
>> +struct blk_cranges *blk_alloc_cranges(struct gendisk *disk, int nr_ranges)
>> +{
>> +	struct blk_cranges *cr;
>> +
>> +	cr = kzalloc_node(struct_size(cr, ranges, nr_ranges), GFP_KERNEL,
>> +			  disk->queue->node);
>> +	if (cr)
>> +		cr->nr_ranges = nr_ranges;
>> +	return cr;
>> +}
>> +EXPORT_SYMBOL_GPL(blk_alloc_cranges);
>> +
>> +/**
>> + * blk_queue_set_cranges - Set a disk concurrent positioning ranges
>> + * @disk:	target disk
>> + * @cr:		concurrent ranges structure
>> + *
>> + * Set the concurrant positioning ranges information of the request queue
>> + * of @disk to @cr. If @cr is NULL and the concurrent ranges structure
>> + * already set, if any, is cleared. If there are no differences between
>> + * @cr and the concurrent ranges structure already set, @cr is freed.
>> + */
>> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr)
>> +{
>> +	struct request_queue *q = disk->queue;
>> +
>> +	if (WARN_ON_ONCE(cr && !cr->nr_ranges)) {
>> +		kfree(cr);
>> +		cr = NULL;
>> +	}
>> +
>> +	mutex_lock(&q->sysfs_dir_lock);
>> +	mutex_lock(&q->sysfs_lock);
>> +
> 
> And this is what you get for all the complexity.
> Having to take _two_ locks, and have a detailed description how the 
> function should be used.
> 
> I still think that redoing everything is simpler.
> 
>> +	if (cr && !blk_check_ranges(disk, cr)) {
>> +		kfree(cr);
>> +		cr = NULL;
>> +	}
>> +
>> +	if (!blk_cranges_changed(disk, cr)) {
>> +		kfree(cr);
>> +		goto unlock;
>> +	}
>> +
>> +	/*
>> +	 * This may be called for a registered queue. E.g. during a device
>> +	 * revalidation. If that is the case, we need to register the new set
>> +	 * of concurrent ranges. If the queue is not already registered, the
>> +	 * device request queue registration will register the ranges.
>> +	 */
>> +	if (blk_queue_registered(q)) {
>> +		blk_register_cranges(disk, cr);
>> +	} else {
>> +		swap(q->cranges, cr);
>> +		kfree(cr);
>> +	}
>> +
>> +unlock:
>> +	mutex_unlock(&q->sysfs_lock);
>> +	mutex_unlock(&q->sysfs_dir_lock);
>> +}
>> +EXPORT_SYMBOL_GPL(blk_queue_set_cranges);
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index 370d83c18057..aeac98ecc5a0 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -899,9 +899,21 @@ int blk_register_queue(struct gendisk *disk)
>>   	}
>>   
>>   	mutex_lock(&q->sysfs_lock);
>> +
>> +	ret = blk_register_cranges(disk, NULL);
>> +	if (ret) {
>> +		mutex_unlock(&q->sysfs_lock);
>> +		mutex_unlock(&q->sysfs_dir_lock);
>> +		kobject_del(&q->kobj);
>> +		blk_trace_remove_sysfs(dev);
>> +		kobject_put(&dev->kobj);
>> +		return ret;
>> +	}
>> +
>>   	if (q->elevator) {
>>   		ret = elv_register_queue(q, false);
>>   		if (ret) {
>> +			blk_unregister_cranges(disk);
>>   			mutex_unlock(&q->sysfs_lock);
>>   			mutex_unlock(&q->sysfs_dir_lock);
>>   			kobject_del(&q->kobj);
>> @@ -985,6 +997,7 @@ void blk_unregister_queue(struct gendisk *disk)
>>   	mutex_lock(&q->sysfs_lock);
>>   	if (q->elevator)
>>   		elv_unregister_queue(q);
>> +	blk_unregister_cranges(disk);
>>   	mutex_unlock(&q->sysfs_lock);
>>   	mutex_unlock(&q->sysfs_dir_lock);
>>   
>> diff --git a/block/blk.h b/block/blk.h
>> index 4b885c0f6708..650c0d87987c 100644
>> --- a/block/blk.h
>> +++ b/block/blk.h
>> @@ -368,4 +368,7 @@ extern struct device_attribute dev_attr_events;
>>   extern struct device_attribute dev_attr_events_async;
>>   extern struct device_attribute dev_attr_events_poll_msecs;
>>   
>> +int blk_register_cranges(struct gendisk *disk, struct blk_cranges *new_cranges);
>> +void blk_unregister_cranges(struct gendisk *disk);
>> +
>>   #endif /* BLK_INTERNAL_H */
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index b9588c52815d..144e8cac44ba 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -1947,6 +1947,7 @@ static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
>>   		0xb1,	/* page 0xb1, block device characteristics page */
>>   		0xb2,	/* page 0xb2, thin provisioning page */
>>   		0xb6,	/* page 0xb6, zoned block device characteristics */
>> +		0xb9,	/* page 0xb9, concurrent positioning ranges */
>>   	};
>>   
>>   	num_pages = sizeof(pages);
> 
> And this hunk is here because ... ?
> 
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 3177181c4326..cd58aa1090a3 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -378,6 +378,29 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
>>   
>>   #endif /* CONFIG_BLK_DEV_ZONED */
>>   
>> +/*
>> + * Concurrent sector ranges: struct blk_crange describes range of
>> + * contiguous sectors that can be served by independent resources on the
>> + * device. The set of ranges defined in struct blk_cranges must overall
>> + * include all sectors within the device capacity.
>> + * For a device with multiple ranges, e.g. a single LUN multi-actuator HDD,
>> + * requests targeting sectors in different ranges can be executed in parallel.
>> + * A request can straddle a range boundary.
>> + */
>> +struct blk_crange {
>> +	struct kobject		kobj;
>> +	struct request_queue	*queue;
>> +	sector_t		sector;
>> +	sector_t		nr_sectors;
>> +};
>> +
>> +struct blk_cranges {
>> +	struct kobject		kobj;
>> +	bool			sysfs_registered;
>> +	unsigned int		nr_ranges;
>> +	struct blk_crange	ranges[];
>> +};
>> +
>>   struct request_queue {
>>   	struct request		*last_merge;
>>   	struct elevator_queue	*elevator;
>> @@ -570,6 +593,9 @@ struct request_queue {
>>   
>>   #define BLK_MAX_WRITE_HINTS	5
>>   	u64			write_hints[BLK_MAX_WRITE_HINTS];
>> +
>> +	/* Concurrent sector ranges */
>> +	struct blk_cranges	*cranges;
>>   };
>>   
>>   /* Keep blk_queue_flag_name[] in sync with the definitions below */
>> @@ -1163,6 +1189,9 @@ extern void blk_queue_required_elevator_features(struct request_queue *q,
>>   extern bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
>>   					      struct device *dev);
>>   
>> +struct blk_cranges *blk_alloc_cranges(struct gendisk *disk, int nr_ranges);
>> +void blk_queue_set_cranges(struct gendisk *disk, struct blk_cranges *cr);
>> +
>>   /*
>>    * Number of physical segments as sent to the device.
>>    *
>>
> 
> Cheers,
> 
> Hannes
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-07-23  0:00 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-21 10:42 [PATCH 0/4] Initial support for multi-actuator HDDs Damien Le Moal
2021-07-21 10:42 ` [PATCH 1/4] block: Add concurrent positioning ranges support Damien Le Moal
2021-07-22 16:11   ` Hannes Reinecke
2021-07-22 23:20     ` Damien Le Moal
2021-07-22 23:59     ` Damien Le Moal
2021-07-21 10:42 ` [PATCH 2/4] scsi: sd: add " Damien Le Moal
2021-07-21 18:01   ` Dan Carpenter
2021-07-22 16:14   ` Hannes Reinecke
2021-07-22 23:26     ` Damien Le Moal
2021-07-21 10:42 ` [PATCH 3/4] libata: support concurrent positioning ranges log Damien Le Moal
2021-07-22 17:34   ` Hannes Reinecke
2021-07-22 23:27     ` Damien Le Moal
2021-07-21 10:42 ` [PATCH 4/4] doc: document sysfs queue/cranges attributes Damien Le Moal
2021-07-22 15:58 ` [PATCH 0/4] Initial support for multi-actuator HDDs Hannes Reinecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).