linux-block.vger.kernel.org archive mirror
* [PATCH v6 0/5] Initial support for multi-actuator HDDs
@ 2021-08-27  7:50 Damien Le Moal
  2021-08-27  7:50 ` [PATCH v6 1/5] block: Add independent access ranges support Damien Le Moal
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Damien Le Moal @ 2021-08-27  7:50 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi

Single LUN multi-actuator hard-disks are capable of seeking and executing
multiple commands in parallel. This capability is exposed to the host
using the Concurrent Positioning Ranges VPD page (SCSI) and Log (ATA).
Each positioning range describes the contiguous set of LBAs that an
actuator serves.

This series adds support to the scsi disk driver to retrieve this
information and advertise it to user space through sysfs. libata is
also modified to handle ATA drives.

The first patch adds the block layer plumbing to expose concurrent
sector ranges of the device through sysfs as a sub-directory of the
device sysfs queue directory. Patch 2 and 3 add support to sd and
libata. Finally, patch 4 documents the sysfs queue attribute changes.
Patch 5 fixes a typo in the documentation file (strictly speaking, not
related to this series).

This series does not attempt in any way to optimize accesses to
multi-actuator devices (e.g. block IO schedulers or filesystems). This
initial support only exposes the independent access ranges information
to user space through sysfs.
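
As an example, on a dual actuator drive, user space will see something
like this (the device name and sector counts below are only
illustrative):

$ grep . /sys/block/sdk/queue/independent_access_ranges/*/*
/sys/block/sdk/queue/independent_access_ranges/0/nr_sectors:17578328064
/sys/block/sdk/queue/independent_access_ranges/0/sector:0
/sys/block/sdk/queue/independent_access_ranges/1/nr_sectors:17578328064
/sys/block/sdk/queue/independent_access_ranges/1/sector:17578328064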

Changes from v5:
* Changed type names in patch 1:
  - struct blk_crange -> struct blk_independent_access_range
  - struct blk_cranges -> struct blk_independent_access_ranges
  All functions and variables are renamed accordingly, using shorter
  names related to the new type names, e.g.
  struct blk_independent_access_ranges -> iaranges or iars.
* Updated the commit messages of patches 1 to 4. Patch 1 and 4 titles are
  also new.
* Dropped reviewed-by tags on the modified patches. Patches 3 and 5 are
  unmodified.

Changes from v4:
* Fixed kdoc comment function name mismatch for disk_register_cranges()
  in patch 1

Changes from v3:
* Modified patch 1:
  - Prefix functions that take a struct gendisk as argument with
    "disk_". Modified patch 2 accordingly.
  - Added a functional release operation for struct blk_cranges kobj to
    ensure that this structure is freed only after all references to it
    are released, including kobject_del() execution for all crange sysfs
    entries.
* Added patch 5 to separate the typo fix from the crange documentation
  addition.
* Added reviewed-by tags

Changes from v2:
* Updated patch 1 to fix a compilation warning for a potential NULL
  pointer dereference of the cr argument of blk_queue_set_cranges().
  Warning reported by the kernel test robot <lkp@intel.com>.

Changes from v1:
* Moved libata-scsi hunk from patch 1 to patch 3 where it belongs
* Fixed an uninitialized variable in patch 2
  Reported-by: kernel test robot <lkp@intel.com>
  Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
* Changed patch 3 adding struct ata_cpr_log to contain both the number
  of concurrent ranges and the array of concurrent ranges.
* Added a note in the documentation (patch 4) about the unit used for
  the concurrent ranges attributes.

Damien Le Moal (5):
  block: Add independent access ranges support
  scsi: sd: add concurrent positioning ranges support
  libata: support concurrent positioning ranges log
  doc: document sysfs queue/independent_access_ranges attributes
  doc: Fix typo in request queue sysfs documentation

 Documentation/block/queue-sysfs.rst |  30 ++-
 block/Makefile                      |   2 +-
 block/blk-iaranges.c                | 322 ++++++++++++++++++++++++++++
 block/blk-sysfs.c                   |  26 ++-
 block/blk.h                         |   4 +
 drivers/ata/libata-core.c           |  52 +++++
 drivers/ata/libata-scsi.c           |  48 ++++-
 drivers/scsi/sd.c                   |  81 +++++++
 drivers/scsi/sd.h                   |   1 +
 include/linux/ata.h                 |   1 +
 include/linux/blkdev.h              |  34 +++
 include/linux/libata.h              |  15 ++
 12 files changed, 597 insertions(+), 19 deletions(-)
 create mode 100644 block/blk-iaranges.c

-- 
2.31.1



* [PATCH v6 1/5] block: Add independent access ranges support
  2021-08-27  7:50 [PATCH v6 0/5] Initial support for multi-actuator HDDs Damien Le Moal
@ 2021-08-27  7:50 ` Damien Le Moal
  2021-08-27  7:50 ` [PATCH v6 2/5] scsi: sd: add concurrent positioning " Damien Le Moal
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2021-08-27  7:50 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi

The Concurrent Positioning Ranges VPD page (for SCSI) and data log page
(for ATA) contain parameters describing the set of contiguous LBAs that
can be served independently by a single LUN multi-actuator hard-disk.
Similarly, a logically defined block device composed of multiple disks
can in some cases execute requests directed at different sector ranges
in parallel. A dm-linear device aggregating 2 block devices together is
an example.

This patch implements support for exposing this information to the
user to allow optimizing device accesses and increasing performance.

To describe the set of independent sector ranges of a device (actuators
of a multi-actuator HDD or table entries of a dm-linear device),
the type struct blk_independent_access_ranges is introduced. This
structure describes each sector range using an array of
struct blk_independent_access_range structures. This range structure
defines the start sector and number of sectors of the access range.

The function disk_set_iaranges() allows a device driver to signal to
the block layer that a device has multiple independent access ranges.
struct blk_independent_access_ranges is attached to the device request
queue by the disk_set_iaranges() function. The function
disk_alloc_iaranges() is provided for drivers to allocate this
structure.
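
As a rough illustration only (not part of this patch), a driver that
wants to describe two equally sized ranges covering the whole device
would use these helpers more or less as follows, with the real range
boundaries normally coming from the device itself (see patch 2):

  struct blk_independent_access_ranges *iars;
  sector_t half = get_capacity(disk) / 2;

  iars = disk_alloc_iaranges(disk, 2);
  if (!iars)
          return;         /* no ranges exposed; not a fatal error */

  iars->iarange[0].sector = 0;
  iars->iarange[0].nr_sectors = half;
  iars->iarange[1].sector = half;
  iars->iarange[1].nr_sectors = get_capacity(disk) - half;

  /* disk_set_iaranges() takes ownership of iars and frees it as needed */
  disk_set_iaranges(disk, iars);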

The blk_independent_access_ranges structure contains kobjects
(struct kobject) to expose to the user through sysfs the set of
independent access ranges supported by a device. When the device is
initialized, sysfs registration of the range information is done from
blk_register_queue() using the block layer internal function
disk_register_iaranges(). If a driver calls disk_set_iaranges() for a
registered queue, e.g. when a device is revalidated, disk_set_iaranges()
will execute disk_register_iaranges() to update the queue sysfs
attribute files.

The sysfs file structure created starts from the
independent_access_ranges sub-directory and contains the start sector
and number of sectors of each range, with the information for each
range grouped in sub-directories.

E.g. for a dual actuator HDD, the user sees:

$ tree /sys/block/sdk/queue/independent_access_ranges/
/sys/block/sdk/queue/independent_access_ranges/
|-- 0
|   |-- nr_sectors
|   `-- sector
`-- 1
    |-- nr_sectors
    `-- sector

For a regular device with a single access range, the
independent_access_ranges sysfs directory does not exist.

Device revalidation may lead to changes to this structure and to the
attribute values. When manipulated, the queue sysfs_lock and
sysfs_dir_lock are held for atomicity, similarly to how the blk-mq and
elevator sysfs queue sub-directories are protected.

The code related to the management of independent access ranges is
added in the new file block/blk-iaranges.c.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/Makefile         |   2 +-
 block/blk-iaranges.c   | 322 +++++++++++++++++++++++++++++++++++++++++
 block/blk-sysfs.c      |  26 +++-
 block/blk.h            |   4 +
 include/linux/blkdev.h |  34 +++++
 5 files changed, 379 insertions(+), 9 deletions(-)
 create mode 100644 block/blk-iaranges.c

diff --git a/block/Makefile b/block/Makefile
index 0d951adce796..8e45fced0bb8 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -9,7 +9,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
 			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
 			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
 			genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
-			disk-events.o
+			disk-events.o blk-iaranges.o
 
 obj-$(CONFIG_BOUNCE)		+= bounce.o
 obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
diff --git a/block/blk-iaranges.c b/block/blk-iaranges.c
new file mode 100644
index 000000000000..bdecdf095f6a
--- /dev/null
+++ b/block/blk-iaranges.c
@@ -0,0 +1,322 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  Block device concurrent positioning ranges.
+ *
+ *  Copyright (C) 2021 Western Digital Corporation or its Affiliates.
+ */
+#include <linux/kernel.h>
+#include <linux/blkdev.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+
+#include "blk.h"
+
+static ssize_t
+blk_iarange_sector_show(struct blk_independent_access_range *iar, char *buf)
+{
+	return sprintf(buf, "%llu\n", iar->sector);
+}
+
+static ssize_t
+blk_iarange_nr_sectors_show(struct blk_independent_access_range *iar, char *buf)
+{
+	return sprintf(buf, "%llu\n", iar->nr_sectors);
+}
+
+struct blk_iarange_sysfs_entry {
+	struct attribute attr;
+	ssize_t (*show)(struct blk_independent_access_range *iar, char *buf);
+};
+
+static struct blk_iarange_sysfs_entry blk_iarange_sector_entry = {
+	.attr = { .name = "sector", .mode = 0444 },
+	.show = blk_iarange_sector_show,
+};
+
+static struct blk_iarange_sysfs_entry blk_iarange_nr_sectors_entry = {
+	.attr = { .name = "nr_sectors", .mode = 0444 },
+	.show = blk_iarange_nr_sectors_show,
+};
+
+static struct attribute *blk_iarange_attrs[] = {
+	&blk_iarange_sector_entry.attr,
+	&blk_iarange_nr_sectors_entry.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(blk_iarange);
+
+static ssize_t blk_iarange_sysfs_show(struct kobject *kobj,
+				      struct attribute *attr, char *buf)
+{
+	struct blk_iarange_sysfs_entry *entry =
+		container_of(attr, struct blk_iarange_sysfs_entry, attr);
+	struct blk_independent_access_range *iar =
+		container_of(kobj, struct blk_independent_access_range, kobj);
+	ssize_t ret;
+
+	mutex_lock(&iar->queue->sysfs_lock);
+	ret = entry->show(iar, buf);
+	mutex_unlock(&iar->queue->sysfs_lock);
+
+	return ret;
+}
+
+static const struct sysfs_ops blk_iarange_sysfs_ops = {
+	.show	= blk_iarange_sysfs_show,
+};
+
+/*
+ * Independent access range entries are not freed individually, but together
+ * with struct blk_independent_access_ranges and its array of range entries.
+ * Since kobject_add() takes a reference on the parent kobject contained in
+ * struct blk_independent_access_ranges, the array of independent access range
+ * entries cannot be freed until kobject_del() is called for all entries.
+ * So we do not need to do anything here, but still need this no-op release
+ * operation to avoid complaints from the kobject code.
+ */
+static void blk_iarange_sysfs_nop_release(struct kobject *kobj)
+{
+}
+
+static struct kobj_type blk_iarange_ktype = {
+	.sysfs_ops	= &blk_iarange_sysfs_ops,
+	.default_groups	= blk_iarange_groups,
+	.release	= blk_iarange_sysfs_nop_release,
+};
+
+/*
+ * This will be executed only after all independent access range entries are
+ * removed with kobject_del(), at which point, it is safe to free everything,
+ * including the array of range entries.
+ */
+static void blk_iaranges_sysfs_release(struct kobject *kobj)
+{
+	struct blk_independent_access_ranges *iars =
+		container_of(kobj, struct blk_independent_access_ranges, kobj);
+
+	kfree(iars);
+}
+
+static struct kobj_type blk_iaranges_ktype = {
+	.release	= blk_iaranges_sysfs_release,
+};
+
+/**
+ * disk_register_iaranges - register with sysfs a set of independent
+ *			    access ranges
+ * @disk:	Target disk
+ * @new_iars:	New set of independent access ranges
+ *
+ * Register with sysfs a set of independent access ranges for @disk.
+ * If @new_iars is not NULL, this set of ranges is registered and the old set
+ * specified by q->iaranges is unregistered. Otherwise, q->iaranges is
+ * registered if it is not already.
+ */
+int disk_register_iaranges(struct gendisk *disk,
+			   struct blk_independent_access_ranges *new_iars)
+{
+	struct request_queue *q = disk->queue;
+	struct blk_independent_access_ranges *iars;
+	int i, ret;
+
+	lockdep_assert_held(&q->sysfs_dir_lock);
+	lockdep_assert_held(&q->sysfs_lock);
+
+	/* If a new range set is specified, unregister the old one */
+	if (new_iars) {
+		if (q->iaranges)
+			disk_unregister_iaranges(disk);
+		q->iaranges = new_iars;
+	}
+
+	iars = q->iaranges;
+	if (!iars)
+		return 0;
+
+	/*
+	 * At this point, iars is the new set of sector access ranges that needs
+	 * to be registered with sysfs.
+	 */
+	WARN_ON(iars->sysfs_registered);
+	ret = kobject_init_and_add(&iars->kobj, &blk_iaranges_ktype,
+				   &q->kobj, "%s", "independent_access_ranges");
+	if (ret) {
+		q->iaranges = NULL;
+		kfree(iars);
+		return ret;
+	}
+
+	for (i = 0; i < iars->nr_iaranges; i++) {
+		iars->iarange[i].queue = q;
+		ret = kobject_init_and_add(&iars->iarange[i].kobj,
+					   &blk_iarange_ktype, &iars->kobj,
+					   "%d", i);
+		if (ret) {
+			while (--i >= 0)
+				kobject_del(&iars->iarange[i].kobj);
+			kobject_del(&iars->kobj);
+			kobject_put(&iars->kobj);
+			return ret;
+		}
+	}
+
+	iars->sysfs_registered = true;
+
+	return 0;
+}
+
+void disk_unregister_iaranges(struct gendisk *disk)
+{
+	struct request_queue *q = disk->queue;
+	struct blk_independent_access_ranges *iars = q->iaranges;
+	int i;
+
+	lockdep_assert_held(&q->sysfs_dir_lock);
+	lockdep_assert_held(&q->sysfs_lock);
+
+	if (!iars)
+		return;
+
+	if (iars->sysfs_registered) {
+		for (i = 0; i < iars->nr_iaranges; i++)
+			kobject_del(&iars->iarange[i].kobj);
+		kobject_del(&iars->kobj);
+		kobject_put(&iars->kobj);
+	} else {
+		kfree(iars);
+	}
+
+	q->iaranges = NULL;
+}
+
+static bool disk_check_iaranges(struct gendisk *disk,
+				struct blk_independent_access_ranges *iars)
+{
+	sector_t capacity = get_capacity(disk);
+	sector_t min_sector = (sector_t)-1;
+	sector_t max_sector = 0;
+	int i;
+
+	/*
+	 * Sector ranges may overlap but should overall contain all sectors
+	 * within the disk capacity.
+	 */
+	for (i = 0; i < iars->nr_iaranges; i++) {
+		min_sector = min(min_sector, iars->iarange[i].sector);
+		max_sector = max(max_sector, iars->iarange[i].sector +
+					     iars->iarange[i].nr_sectors);
+	}
+
+	if (min_sector != 0 || max_sector < capacity) {
+		pr_warn("Invalid independent access ranges: missing sectors\n");
+		return false;
+	}
+
+	if (max_sector > capacity) {
+		pr_warn("Invalid independent access ranges: beyond capacity\n");
+		return false;
+	}
+
+	return true;
+}
+
+static bool disk_iaranges_changed(struct gendisk *disk,
+				  struct blk_independent_access_ranges *new)
+{
+	struct blk_independent_access_ranges *old = disk->queue->iaranges;
+	int i;
+
+	if (!old)
+		return true;
+
+	if (old->nr_iaranges != new->nr_iaranges)
+		return true;
+
+	for (i = 0; i < old->nr_iaranges; i++) {
+		if (new->iarange[i].sector != old->iarange[i].sector ||
+		    new->iarange[i].nr_sectors != old->iarange[i].nr_sectors)
+			return true;
+	}
+
+	return false;
+}
+
+/**
+ * disk_alloc_iaranges - Allocate an independent access range structure
+ * @disk:		target disk
+ * @nr_iaranges:	Number of independent access ranges
+ *
+ * Allocate a struct blk_independent_access_ranges structure with @nr_iaranges
+ * access range descriptors.
+ */
+struct blk_independent_access_ranges *disk_alloc_iaranges(struct gendisk *disk,
+							  int nr_iaranges)
+{
+	struct blk_independent_access_ranges *iars;
+
+	iars = kzalloc_node(struct_size(iars, iarange, nr_iaranges),
+			    GFP_KERNEL, disk->queue->node);
+	if (iars)
+		iars->nr_iaranges = nr_iaranges;
+	return iars;
+}
+EXPORT_SYMBOL_GPL(disk_alloc_iaranges);
+
+/**
+ * disk_set_iaranges - Set a disk's independent access ranges
+ * @disk:	target disk
+ * @iars:	independent access ranges structure
+ *
+ * Set the independent access ranges information of the request queue
+ * of @disk to @iars. If @iars is NULL, the independent access ranges
+ * structure already set, if any, is cleared. If there are no differences
+ * between @iars and the independent access ranges structure already set,
+ * @iars is freed.
+ */
+void disk_set_iaranges(struct gendisk *disk,
+		       struct blk_independent_access_ranges *iars)
+{
+	struct request_queue *q = disk->queue;
+
+	if (WARN_ON_ONCE(iars && !iars->nr_iaranges)) {
+		kfree(iars);
+		iars = NULL;
+	}
+
+	mutex_lock(&q->sysfs_dir_lock);
+	mutex_lock(&q->sysfs_lock);
+
+	if (iars) {
+		if (!disk_check_iaranges(disk, iars)) {
+			kfree(iars);
+			iars = NULL;
+			goto reg;
+		}
+
+		if (!disk_iaranges_changed(disk, iars)) {
+			kfree(iars);
+			goto unlock;
+		}
+	}
+
+	/*
+	 * This may be called for a registered queue. E.g. during a device
+	 * revalidation. If that is the case, we need to unregister the old
+	 * set of independent access ranges and register the new set. If the
+	 * queue is not registered, registration of the device request queue
+	 * will register the independent access ranges, so only swap in the
+	 * new set and free the old one.
+	 */
+reg:
+	if (blk_queue_registered(q)) {
+		disk_register_iaranges(disk, iars);
+	} else {
+		swap(q->iaranges, iars);
+		kfree(iars);
+	}
+
+unlock:
+	mutex_unlock(&q->sysfs_lock);
+	mutex_unlock(&q->sysfs_dir_lock);
+}
+EXPORT_SYMBOL_GPL(disk_set_iaranges);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 614d9d47de36..874b34060efa 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -887,16 +887,15 @@ int blk_register_queue(struct gendisk *disk)
 	}
 
 	mutex_lock(&q->sysfs_lock);
+
+	ret = disk_register_iaranges(disk, NULL);
+	if (ret)
+		goto put_dev;
+
 	if (q->elevator) {
 		ret = elv_register_queue(q, false);
-		if (ret) {
-			mutex_unlock(&q->sysfs_lock);
-			mutex_unlock(&q->sysfs_dir_lock);
-			kobject_del(&q->kobj);
-			blk_trace_remove_sysfs(dev);
-			kobject_put(&dev->kobj);
-			return ret;
-		}
+		if (ret)
+			goto put_dev;
 	}
 
 	blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
@@ -927,6 +926,16 @@ int blk_register_queue(struct gendisk *disk)
 		percpu_ref_switch_to_percpu(&q->q_usage_counter);
 	}
 
+	return ret;
+
+put_dev:
+	disk_unregister_iaranges(disk);
+	mutex_unlock(&q->sysfs_lock);
+	mutex_unlock(&q->sysfs_dir_lock);
+	kobject_del(&q->kobj);
+	blk_trace_remove_sysfs(dev);
+	kobject_put(&dev->kobj);
+
 	return ret;
 }
 
@@ -972,6 +981,7 @@ void blk_unregister_queue(struct gendisk *disk)
 	mutex_lock(&q->sysfs_lock);
 	if (q->elevator)
 		elv_unregister_queue(q);
+	disk_unregister_iaranges(disk);
 	mutex_unlock(&q->sysfs_lock);
 	mutex_unlock(&q->sysfs_dir_lock);
 
diff --git a/block/blk.h b/block/blk.h
index bbbcc1a64a2d..2aab4a936c4e 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -370,4 +370,8 @@ extern struct device_attribute dev_attr_events;
 extern struct device_attribute dev_attr_events_async;
 extern struct device_attribute dev_attr_events_poll_msecs;
 
+int disk_register_iaranges(struct gendisk *disk,
+			   struct blk_independent_access_ranges *new_iars);
+void disk_unregister_iaranges(struct gendisk *disk);
+
 #endif /* BLK_INTERNAL_H */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c9cb12483e12..62cebeec6699 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -377,6 +377,32 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
 
 #endif /* CONFIG_BLK_DEV_ZONED */
 
+/*
+ * Independent access ranges: struct blk_independent_access_range describes
+ * a range of contiguous sectors that can be accessed using device command
+ * execution resources that are independent from the resources used for
+ * other access ranges. This is typically found with multi-actuator HDDs where
+ * each access range is served by a different set of heads.
+ * The set of ranges defined in struct blk_independent_access_ranges must
+ * overall include all sectors within the device capacity.
+ * For a device with multiple ranges, e.g. a single LUN multi-actuator HDD,
+ * requests targeting sectors in different ranges can be executed in parallel.
+ * A request can straddle an access range boundary.
+ */
+struct blk_independent_access_range {
+	struct kobject		kobj;
+	struct request_queue	*queue;
+	sector_t		sector;
+	sector_t		nr_sectors;
+};
+
+struct blk_independent_access_ranges {
+	struct kobject				kobj;
+	bool					sysfs_registered;
+	unsigned int				nr_iaranges;
+	struct blk_independent_access_range	iarange[];
+};
+
 struct request_queue {
 	struct request		*last_merge;
 	struct elevator_queue	*elevator;
@@ -569,6 +595,9 @@ struct request_queue {
 
 #define BLK_MAX_WRITE_HINTS	5
 	u64			write_hints[BLK_MAX_WRITE_HINTS];
+
+	/* Independent sector ranges */
+	struct blk_independent_access_ranges *iaranges;
 };
 
 /* Keep blk_queue_flag_name[] in sync with the definitions below */
@@ -1164,6 +1193,11 @@ extern void blk_queue_required_elevator_features(struct request_queue *q,
 extern bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
 					      struct device *dev);
 
+struct blk_independent_access_ranges *disk_alloc_iaranges(struct gendisk *disk,
+							  int nr_iaranges);
+void disk_set_iaranges(struct gendisk *disk,
+		       struct blk_independent_access_ranges *iars);
+
 /*
  * Number of physical segments as sent to the device.
  *
-- 
2.31.1



* [PATCH v6 2/5] scsi: sd: add concurrent positioning ranges support
  2021-08-27  7:50 [PATCH v6 0/5] Initial support for multi-actuator HDDs Damien Le Moal
  2021-08-27  7:50 ` [PATCH v6 1/5] block: Add independent access ranges support Damien Le Moal
@ 2021-08-27  7:50 ` Damien Le Moal
  2021-08-27  7:50 ` [PATCH v6 3/5] libata: support concurrent positioning ranges log Damien Le Moal
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2021-08-27  7:50 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi

Add the sd_read_cpr() function to the sd scsi disk driver to discover
if a device has multiple concurrent positioning ranges (i.e. multiple
actuators on an HDD). The existence of VPD page B9h indicates whether a
device has multiple concurrent positioning ranges, and the page content
describes each range supported by the device.

sd_read_cpr() is called from sd_revalidate_disk() and uses the block
layer functions disk_alloc_iaranges() and disk_set_iaranges() to
represent the set of actuators of the device as independent access
ranges.

The format of the Concurrent Positioning Ranges VPD page B9h is defined
in section 6.6.6 of SBC-5.
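
For reference, the 32B range descriptor layout that sd_read_cpr() relies
on can be sketched as follows (field names here are made up for
illustration; SBC-5 remains the authoritative definition):

  /* One descriptor per range, the first one at byte 64 of the page */
  struct cpr_vpd_range_desc {
          u8      range_number;           /* byte 0 */
          u8      num_storage_elements;   /* byte 1 */
          u8      reserved1[6];           /* bytes 2-7 */
          __be64  lowest_lba;             /* bytes 8-15, in logical blocks */
          __be64  num_lbas;               /* bytes 16-23, in logical blocks */
          u8      reserved2[8];           /* bytes 24-31 */
  };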

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/scsi/sd.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/sd.h |  1 +
 2 files changed, 82 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 610ebba0d66e..1bb49d74c3db 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3126,6 +3126,86 @@ static void sd_read_security(struct scsi_disk *sdkp, unsigned char *buffer)
 		sdkp->security = 1;
 }
 
+static inline sector_t sd64_to_sectors(struct scsi_disk *sdkp, u8 *buf)
+{
+	return logical_to_sectors(sdkp->device, get_unaligned_be64(buf));
+}
+
+/**
+ * sd_read_cpr - Query concurrent positioning ranges
+ * @sdkp:	disk to query
+ */
+static void sd_read_cpr(struct scsi_disk *sdkp)
+{
+	struct blk_independent_access_ranges *iars = NULL;
+	unsigned char *buffer = NULL;
+	unsigned int nr_cpr = 0;
+	int i, vpd_len, buf_len = SD_BUF_SIZE;
+	u8 *desc;
+
+	/*
+	 * We need to have the capacity set first for the block layer to be
+	 * able to check the ranges.
+	 */
+	if (sdkp->first_scan)
+		return;
+
+	if (!sdkp->capacity)
+		goto out;
+
+	/*
+	 * Concurrent Positioning Ranges VPD: there can be at most 256 ranges,
+	 * leading to a maximum page size of 64 + 256*32 bytes.
+	 */
+	buf_len = 64 + 256*32;
+	buffer = kmalloc(buf_len, GFP_KERNEL);
+	if (!buffer || scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len))
+		goto out;
+
+	/* We must have at least a 64B header and one 32B range descriptor */
+	vpd_len = get_unaligned_be16(&buffer[2]) + 3;
+	if (vpd_len > buf_len || vpd_len < 64 + 32 || (vpd_len & 31)) {
+		sd_printk(KERN_ERR, sdkp,
+			  "Invalid Concurrent Positioning Ranges VPD page\n");
+		goto out;
+	}
+
+	nr_cpr = (vpd_len - 64) / 32;
+	if (nr_cpr == 1) {
+		nr_cpr = 0;
+		goto out;
+	}
+
+	iars = disk_alloc_iaranges(sdkp->disk, nr_cpr);
+	if (!iars) {
+		nr_cpr = 0;
+		goto out;
+	}
+
+	desc = &buffer[64];
+	for (i = 0; i < nr_cpr; i++, desc += 32) {
+		if (desc[0] != i) {
+			sd_printk(KERN_ERR, sdkp,
+				"Invalid Concurrent Positioning Range number\n");
+			nr_cpr = 0;
+			break;
+		}
+
+		iars->iarange[i].sector = sd64_to_sectors(sdkp, desc + 8);
+		iars->iarange[i].nr_sectors = sd64_to_sectors(sdkp, desc + 16);
+	}
+
+out:
+	disk_set_iaranges(sdkp->disk, iars);
+	if (nr_cpr && sdkp->nr_actuators != nr_cpr) {
+		sd_printk(KERN_NOTICE, sdkp,
+			  "%u concurrent positioning ranges\n", nr_cpr);
+		sdkp->nr_actuators = nr_cpr;
+	}
+
+	kfree(buffer);
+}
+
 /*
  * Determine the device's preferred I/O size for reads and writes
  * unless the reported value is unreasonably small, large, not a
@@ -3241,6 +3321,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		sd_read_app_tag_own(sdkp, buffer);
 		sd_read_write_same(sdkp, buffer);
 		sd_read_security(sdkp, buffer);
+		sd_read_cpr(sdkp);
 	}
 
 	/*
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index b59136c4125b..2e5932bde43d 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -106,6 +106,7 @@ struct scsi_disk {
 	u8		protection_type;/* Data Integrity Field */
 	u8		provisioning_mode;
 	u8		zeroing_mode;
+	u8		nr_actuators;		/* Number of actuators */
 	unsigned	ATO : 1;	/* state of disk ATO bit */
 	unsigned	cache_override : 1; /* temp override of WCE,RCD */
 	unsigned	WCE : 1;	/* state of disk WCE bit */
-- 
2.31.1



* [PATCH v6 3/5] libata: support concurrent positioning ranges log
  2021-08-27  7:50 [PATCH v6 0/5] Initial support for multi-actuator HDDs Damien Le Moal
  2021-08-27  7:50 ` [PATCH v6 1/5] block: Add independent access ranges support Damien Le Moal
  2021-08-27  7:50 ` [PATCH v6 2/5] scsi: sd: add concurrent positioning " Damien Le Moal
@ 2021-08-27  7:50 ` Damien Le Moal
  2021-08-27  7:50 ` [PATCH v6 4/5] doc: document sysfs queue/independent_access_ranges attributes Damien Le Moal
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2021-08-27  7:50 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi

Add support to discover if an ATA device supports the Concurrent
Positioning Ranges Log (address 0x47), indicating that the device is
capable of seeking to multiple different locations in parallel using
multiple actuators serving different LBA ranges.

Also add support to translate the concurrent positioning ranges log
into its equivalent Concurrent Positioning Ranges VPD page B9h in
libata-scsi.c.

The format of the Concurrent Positioning Ranges Log is defined in ACS-5
r9.
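
With this patch applied on top of patch 2, an ATA multi-actuator drive
is reported through the sd driver just like a native SCSI device, e.g.
(hypothetical device address and name):

  $ dmesg | grep "concurrent positioning"
  sd 1:0:0:0: [sdb] 2 concurrent positioning ranges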

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/ata/libata-core.c | 52 +++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata-scsi.c | 48 +++++++++++++++++++++++++++++-------
 include/linux/ata.h       |  1 +
 include/linux/libata.h    | 15 +++++++++++
 4 files changed, 107 insertions(+), 9 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 61c762961ca8..ab3f61ea743e 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2363,6 +2363,57 @@ static void ata_dev_config_trusted(struct ata_device *dev)
 		dev->flags |= ATA_DFLAG_TRUSTED;
 }
 
+static void ata_dev_config_cpr(struct ata_device *dev)
+{
+	unsigned int err_mask;
+	size_t buf_len;
+	int i, nr_cpr = 0;
+	struct ata_cpr_log *cpr_log = NULL;
+	u8 *desc, *buf = NULL;
+
+	if (!ata_identify_page_supported(dev,
+				 ATA_LOG_CONCURRENT_POSITIONING_RANGES))
+		goto out;
+
+	/*
+	 * Read IDENTIFY DEVICE data log, page 0x47
+	 * (concurrent positioning ranges). We can have at most 255 32B range
+	 * descriptors plus a 64B header.
+	 */
+	buf_len = (64 + 255 * 32 + 511) & ~511;
+	buf = kzalloc(buf_len, GFP_KERNEL);
+	if (!buf)
+		goto out;
+
+	err_mask = ata_read_log_page(dev, ATA_LOG_IDENTIFY_DEVICE,
+				     ATA_LOG_CONCURRENT_POSITIONING_RANGES,
+				     buf, buf_len >> 9);
+	if (err_mask)
+		goto out;
+
+	nr_cpr = buf[0];
+	if (!nr_cpr)
+		goto out;
+
+	cpr_log = kzalloc(struct_size(cpr_log, cpr, nr_cpr), GFP_KERNEL);
+	if (!cpr_log)
+		goto out;
+
+	cpr_log->nr_cpr = nr_cpr;
+	desc = &buf[64];
+	for (i = 0; i < nr_cpr; i++, desc += 32) {
+		cpr_log->cpr[i].num = desc[0];
+		cpr_log->cpr[i].num_storage_elements = desc[1];
+		cpr_log->cpr[i].start_lba = get_unaligned_le64(&desc[8]);
+		cpr_log->cpr[i].num_lbas = get_unaligned_le64(&desc[16]);
+	}
+
+out:
+	swap(dev->cpr_log, cpr_log);
+	kfree(cpr_log);
+	kfree(buf);
+}
+
 /**
  *	ata_dev_configure - Configure the specified ATA/ATAPI device
  *	@dev: Target device to configure
@@ -2591,6 +2642,7 @@ int ata_dev_configure(struct ata_device *dev)
 		ata_dev_config_sense_reporting(dev);
 		ata_dev_config_zac(dev);
 		ata_dev_config_trusted(dev);
+		ata_dev_config_cpr(dev);
 		dev->cdb_len = 32;
 	}
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index b9588c52815d..5cadbb9a8bf2 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1937,7 +1937,7 @@ static unsigned int ata_scsiop_inq_std(struct ata_scsi_args *args, u8 *rbuf)
  */
 static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
 {
-	int num_pages;
+	int i, num_pages = 0;
 	static const u8 pages[] = {
 		0x00,	/* page 0x00, this page */
 		0x80,	/* page 0x80, unit serial no page */
@@ -1947,13 +1947,17 @@ static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
 		0xb1,	/* page 0xb1, block device characteristics page */
 		0xb2,	/* page 0xb2, thin provisioning page */
 		0xb6,	/* page 0xb6, zoned block device characteristics */
+		0xb9,	/* page 0xb9, concurrent positioning ranges */
 	};
 
-	num_pages = sizeof(pages);
-	if (!(args->dev->flags & ATA_DFLAG_ZAC))
-		num_pages--;
+	for (i = 0; i < sizeof(pages); i++) {
+		if (pages[i] == 0xb6 &&
+		    !(args->dev->flags & ATA_DFLAG_ZAC))
+			continue;
+		rbuf[num_pages + 4] = pages[i];
+		num_pages++;
+	}
 	rbuf[3] = num_pages;	/* number of supported VPD pages */
-	memcpy(rbuf + 4, pages, num_pages);
 	return 0;
 }
 
@@ -2163,6 +2167,26 @@ static unsigned int ata_scsiop_inq_b6(struct ata_scsi_args *args, u8 *rbuf)
 	return 0;
 }
 
+static unsigned int ata_scsiop_inq_b9(struct ata_scsi_args *args, u8 *rbuf)
+{
+	struct ata_cpr_log *cpr_log = args->dev->cpr_log;
+	u8 *desc = &rbuf[64];
+	int i;
+
+	/* SCSI Concurrent Positioning Ranges VPD page: SBC-5 rev 1 or later */
+	rbuf[1] = 0xb9;
+	put_unaligned_be16(64 + (int)cpr_log->nr_cpr * 32 - 4, &rbuf[3]);
+
+	for (i = 0; i < cpr_log->nr_cpr; i++, desc += 32) {
+		desc[0] = cpr_log->cpr[i].num;
+		desc[1] = cpr_log->cpr[i].num_storage_elements;
+		put_unaligned_be64(cpr_log->cpr[i].start_lba, &desc[8]);
+		put_unaligned_be64(cpr_log->cpr[i].num_lbas, &desc[16]);
+	}
+
+	return 0;
+}
+
 /**
  *	modecpy - Prepare response for MODE SENSE
  *	@dest: output buffer
@@ -4162,11 +4186,17 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd)
 			ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b2);
 			break;
 		case 0xb6:
-			if (dev->flags & ATA_DFLAG_ZAC) {
+			if (dev->flags & ATA_DFLAG_ZAC)
 				ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b6);
-				break;
-			}
-			fallthrough;
+			else
+				ata_scsi_set_invalid_field(dev, cmd, 2, 0xff);
+			break;
+		case 0xb9:
+			if (dev->cpr_log)
+				ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b9);
+			else
+				ata_scsi_set_invalid_field(dev, cmd, 2, 0xff);
+			break;
 		default:
 			ata_scsi_set_invalid_field(dev, cmd, 2, 0xff);
 			break;
diff --git a/include/linux/ata.h b/include/linux/ata.h
index 1b44f40c7700..199e47e97d64 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -329,6 +329,7 @@ enum {
 	ATA_LOG_SECURITY	  = 0x06,
 	ATA_LOG_SATA_SETTINGS	  = 0x08,
 	ATA_LOG_ZONED_INFORMATION = 0x09,
+	ATA_LOG_CONCURRENT_POSITIONING_RANGES = 0x47,
 
 	/* Identify device SATA settings log:*/
 	ATA_LOG_DEVSLP_OFFSET	  = 0x30,
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 3fcd24236793..b159a245d88c 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -670,6 +670,18 @@ struct ata_ering {
 	struct ata_ering_entry	ring[ATA_ERING_SIZE];
 };
 
+struct ata_cpr {
+	u8			num;
+	u8			num_storage_elements;
+	u64			start_lba;
+	u64			num_lbas;
+};
+
+struct ata_cpr_log {
+	u8			nr_cpr;
+	struct ata_cpr		cpr[];
+};
+
 struct ata_device {
 	struct ata_link		*link;
 	unsigned int		devno;		/* 0 or 1 */
@@ -729,6 +741,9 @@ struct ata_device {
 	u32			zac_zones_optimal_nonseq;
 	u32			zac_zones_max_open;
 
+	/* Concurrent positioning ranges */
+	struct ata_cpr_log	*cpr_log;
+
 	/* error history */
 	int			spdn_cnt;
 	/* ering is CLEAR_END, read comment above CLEAR_END */
-- 
2.31.1



* [PATCH v6 4/5] doc: document sysfs queue/independent_access_ranges attributes
  2021-08-27  7:50 [PATCH v6 0/5] Initial support for multi-actuator HDDs Damien Le Moal
                   ` (2 preceding siblings ...)
  2021-08-27  7:50 ` [PATCH v6 3/5] libata: support concurrent positioning ranges log Damien Le Moal
@ 2021-08-27  7:50 ` Damien Le Moal
  2021-08-27  7:50 ` [PATCH v6 5/5] doc: Fix typo in request queue sysfs documentation Damien Le Moal
  2021-08-27 13:58 ` [PATCH v6 0/5] Initial support for multi-actuator HDDs Phillip Susi
  5 siblings, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2021-08-27  7:50 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi

Update the file Documentation/block/queue-sysfs.rst to add a description
of the device queue sysfs entries related to independent access ranges
(e.g. concurrent positioning ranges for multi-actuator hard-disks).

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 Documentation/block/queue-sysfs.rst | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
index 4dc7f0d499a8..17eef55434e7 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -286,4 +286,32 @@ sequential zones of zoned block devices (devices with a zoned attributed
 that reports "host-managed" or "host-aware"). This value is always 0 for
 regular block devices.
 
+independent_access_ranges (RO)
+------------------------------
+
+The presence of this sub-directory of the /sys/block/xxx/queue/ directory
+indicates that the device is capable of executing requests targeting
+different sector ranges in parallel. For instance, single LUN multi-actuator
+hard-disks will likely have an independent_access_ranges directory if the
+device correctly advertises the sector ranges of its actuators.
+
+The independent_access_ranges directory contains one directory per access
+range, with each range described using the sector (RO) attribute file to
+indicate the first sector of the range and the nr_sectors (RO) attribute file
+to indicate the total number of sectors in the range starting from the first
+sector of the range.  For example, a dual-actuator hard disk will have the
+following independent_access_ranges entries::
+
+        $ tree /sys/block/<device>/queue/independent_access_ranges/
+        /sys/block/<device>/queue/independent_access_ranges/
+        |-- 0
+        |   |-- nr_sectors
+        |   `-- sector
+        `-- 1
+            |-- nr_sectors
+            `-- sector
+
+The sector and nr_sectors attributes use 512B sector units, regardless of
+the actual block size of the device.
+
 Jens Axboe <jens.axboe@oracle.com>, February 2009
-- 
2.31.1



* [PATCH v6 5/5] doc: Fix typo in request queue sysfs documentation
  2021-08-27  7:50 [PATCH v6 0/5] Initial support for multi-actuator HDDs Damien Le Moal
                   ` (3 preceding siblings ...)
  2021-08-27  7:50 ` [PATCH v6 4/5] doc: document sysfs queue/independent_access_ranges attributes Damien Le Moal
@ 2021-08-27  7:50 ` Damien Le Moal
  2021-08-27 13:58 ` [PATCH v6 0/5] Initial support for multi-actuator HDDs Phillip Susi
  5 siblings, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2021-08-27  7:50 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi

Fix a typo (are -> as) in the introduction paragraph of
Documentation/block/queue-sysfs.rst.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 Documentation/block/queue-sysfs.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
index 17eef55434e7..f4cf9e20f878 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -4,7 +4,7 @@ Queue sysfs files
 
 This text file will detail the queue files that are located in the sysfs tree
 for each block device. Note that stacked devices typically do not export
-any settings, since their queue merely functions are a remapping target.
+any settings, since their queue merely functions as a remapping target.
 These files are the ones found in the /sys/block/xxx/queue/ directory.
 
 Files denoted with a RO postfix are readonly and the RW postfix means
-- 
2.31.1



* Re: [PATCH v6 0/5] Initial support for multi-actuator HDDs
  2021-08-27  7:50 [PATCH v6 0/5] Initial support for multi-actuator HDDs Damien Le Moal
                   ` (4 preceding siblings ...)
  2021-08-27  7:50 ` [PATCH v6 5/5] doc: Fix typo in request queue sysfs documentation Damien Le Moal
@ 2021-08-27 13:58 ` Phillip Susi
  2021-08-27 14:28   ` Tim Walker
  5 siblings, 1 reply; 13+ messages in thread
From: Phillip Susi @ 2021-08-27 13:58 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi


Damien Le Moal <damien.lemoal@wdc.com> writes:

> Single LUN multi-actuator hard-disks are capable of seeking and executing
> multiple commands in parallel. This capability is exposed to the host
> using the Concurrent Positioning Ranges VPD page (SCSI) and Log (ATA).
> Each positioning range describes the contiguous set of LBAs that an
> actuator serves.

Are these ranges exclusive to each actuator or can they overlap?

> This series does not attempt in any way to optimize accesses to
> multi-actuator devices (e.g. block IO schedulers or filesystems). This
> initial support only exposes the independent access ranges information
> to user space through sysfs.

Is the plan to eventually change the IO scheduler to maintain two
different queues, one for each actuator, and send down commands for two
different IO streams that the elevator attempts to keep sequential?



* Re: [PATCH v6 0/5] Initial support for multi-actuator HDDs
  2021-08-27 13:58 ` [PATCH v6 0/5] Initial support for multi-actuator HDDs Phillip Susi
@ 2021-08-27 14:28   ` Tim Walker
  2021-08-27 16:41     ` Christoph Hellwig
  2021-08-27 17:34     ` Phillip Susi
  0 siblings, 2 replies; 13+ messages in thread
From: Tim Walker @ 2021-08-27 14:28 UTC (permalink / raw)
  To: Phillip Susi, Damien Le Moal
  Cc: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi



On  Friday, August 27, 2021 at 10:10:15 AM Phillip Susi wrote:

>
>Damien Le Moal <damien.lemoal@wdc.com> writes:
>
>> Single LUN multi-actuator hard-disks are capable of seeking and executing
>> multiple commands in parallel. This capability is exposed to the host
>> using the Concurrent Positioning Ranges VPD page (SCSI) and Log (ATA).
>> Each positioning range describes the contiguous set of LBAs that an
>> actuator serves.
>
>Are these ranges exlusive to each actuator or can they overlap?
>
>> This series does not attempt in any way to optimize accesses to
>> multi-actuator devices (e.g. block IO schedulers or filesystems). This
>> initial support only exposes the independent access ranges information
>> to user space through sysfs.
>
>Is the plan to eventually change the IO scheduler to maintain two
>different queues, one for each actuator, and send down commands for two
>different IO streams that the elevator attempts to keep sequential?
>
>

There is nothing in the spec that requires the ranges to be contiguous or non-overlapping. It's easy to imagine a HDD architecture that allows multiple heads to access the same sectors on the disk. It's also easy to imagine a workload scenario where parallel access to the same disk could be useful. (Think of a typical storage design that sequentially writes new user data gradually filling the disk, while simultaneously supporting random user reads over the written data.)

The IO Scheduler is a useful place to implement per-actuator load management, but with the LBA-to-actuator mapping available to user space (via sysfs) it could also be done at the user level. Or pretty much anywhere else where we have knowledge and control of the various streams.
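
As a rough sketch of that user-level approach (device name and tool
invocation below are only an example), one could pin one IO stream per
access range:

  dev=sdk
  for r in /sys/block/$dev/queue/independent_access_ranges/*; do
          # sector/nr_sectors are 512B units; fio offsets/sizes are bytes
          off=$(( $(cat $r/sector) * 512 ))
          len=$(( $(cat $r/nr_sectors) * 512 ))
          fio --name=range$(basename $r) --filename=/dev/$dev --direct=1 \
              --rw=read --offset=$off --size=$len &
  done
  wait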

The system is flexible and adaptable to a really wide range of HDD designs and usage models.

Best regards,
-Tim

Tim Walker
Seagate Research



* Re: [PATCH v6 0/5] Initial support for multi-actuator HDDs
  2021-08-27 14:28   ` Tim Walker
@ 2021-08-27 16:41     ` Christoph Hellwig
  2021-08-27 17:00       ` Tim Walker
  2021-08-29 22:55       ` Damien Le Moal
  2021-08-27 17:34     ` Phillip Susi
  1 sibling, 2 replies; 13+ messages in thread
From: Christoph Hellwig @ 2021-08-27 16:41 UTC (permalink / raw)
  To: Tim Walker
  Cc: Phillip Susi, Damien Le Moal, Jens Axboe, linux-block,
	Martin K . Petersen, linux-scsi

On Fri, Aug 27, 2021 at 02:28:58PM +0000, Tim Walker wrote:
> There is nothing in the spec that requires the ranges to be contiguous
> or non-overlapping.

Yikes, that is a pretty stupid standard.  Almost as bad as allowing
non-uniform sized non-power of two sized zones :)

> It's easy to imagine a HDD architecture that allows multiple heads to access the same sectors on the disk. It's also easy to imagine a workload scenario where parallel access to the same disk could be useful. (Think of a typical storage design that sequentially writes new user data gradually filling the disk, while simultaneously supporting random user reads over the written data.)

> But for those drives you do not actually need this scheme at all.
Storage devices that support higher concurrency are bog standard with
SSDs and if you want to go back storage arrays.  The only interesting
case is when these ranges are separate so that the access can be carved
up based on the boundary.  Now I don't want to give people ideas with
overlapping but not identical, which would be just horrible.


* Re: [PATCH v6 0/5] Initial support for multi-actuator HDDs
  2021-08-27 16:41     ` Christoph Hellwig
@ 2021-08-27 17:00       ` Tim Walker
  2021-08-29 22:55       ` Damien Le Moal
  1 sibling, 0 replies; 13+ messages in thread
From: Tim Walker @ 2021-08-27 17:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Phillip Susi, Damien Le Moal, Jens Axboe, linux-block,
	Martin K . Petersen, linux-scsi


On  Friday, August 27, 2021 at 12:42:54 PM Christoph Hellwig wrote:
>
>
>
>On Fri, Aug 27, 2021 at 02:28:58PM +0000, Tim Walker wrote:
>> There is nothing in the spec that requires the ranges to be contiguous
>> or non-overlapping.
>
>Yikes, that is a pretty stupid standard.  Almost as bad as allowing
>non-uniform sized non-power of two sized zones :)
>
>> It's easy to imagine a HDD architecture that allows multiple heads to access the same sectors on the disk. It's also easy to imagine a workload scenario where parallel access to the same disk could be useful. (Think of a typical storage design that sequentially writes new user data gradually filling the disk, while simultaneously supporting random user reads over the written data.)
>
>But for those drivers you do not actually need this scheme at all.
>Storage devices that support higher concurrency are bog standard with
>SSDs and if you want to go back storage arrays.  The only interesting
>case is when these ranges are separate so that the access can be carved
>up based on the boundary.  Now I don't want to give people ideas with
>overlapping but not identical, which would be just horrible.
>

Christoph - you are right. The main purpose, AFAIC, is to expose the parallel access capabilities within a LUN/SATA target due to multiple actuators. I hope the ranges are *always* contiguous and *never* overlapping.  But there's no telling what somebody has up their sleeve. 

Best regards,
-Tim



* Re: [PATCH v6 0/5] Initial support for multi-actuator HDDs
  2021-08-27 14:28   ` Tim Walker
  2021-08-27 16:41     ` Christoph Hellwig
@ 2021-08-27 17:34     ` Phillip Susi
  2021-08-29 22:50       ` Damien Le Moal
  1 sibling, 1 reply; 13+ messages in thread
From: Phillip Susi @ 2021-08-27 17:34 UTC (permalink / raw)
  To: Tim Walker
  Cc: Damien Le Moal, Jens Axboe, linux-block, Martin K . Petersen, linux-scsi


Tim Walker <tim.t.walker@seagate.com> writes:

> The IO Scheduler is a useful place to implement per-actuator load
> management, but with the LBA-to-actuator mapping available to user
> space (via sysfs) it could also be done at the user level. Or pretty
> much anywhere else where we have knowledge and control of the various
> streams.

I suppose there may be some things user space could do with the
information, but mainly doesn't it have to be done in the IO scheduler?
As it stands now, it is going to try to avoid seeking between the two
regions even though the drive can service a contiguous stream from both
just fine, right?



* Re: [PATCH v6 0/5] Initial support for multi-actuator HDDs
  2021-08-27 17:34     ` Phillip Susi
@ 2021-08-29 22:50       ` Damien Le Moal
  0 siblings, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2021-08-29 22:50 UTC (permalink / raw)
  To: Phillip Susi, Tim Walker
  Cc: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi

On 2021/08/28 2:38, Phillip Susi wrote:
> 
> Tim Walker <tim.t.walker@seagate.com> writes:
> 
>> The IO Scheduler is a useful place to implement per-actuator load
>> management, but with the LBA-to-actuator mapping available to user
>> space (via sysfs) it could also be done at the user level. Or pretty
>> much anywhere else where we have knowledge and control of the various
>> streams.
> 
> I suppose there may be some things user space could do with the
> information, but mainly doesn't it have to be done in the IO scheduler?

Correct, if the user does not use a file system then optimizations will depend
on the user application and the IO scheduler.

> As it stands now, it is going to try to avoid seeking between the two
> regions even though the drive can service a contiguous stream from both
> just fine, right?

Correct. But any IO scheduler optimization will kick in only if the
user is accessing the drive at a queue depth beyond the drive max QD, 32 for
SATA. If the drive is exercised at a QD less than its maximum, the scheduler
does not hold on to requests (at least mq-deadline does not, not sure about
bfq). So even with only this patch set (no optimizations at the kernel level),
the user can still make things work as expected, that is, get multiple streams
of IOs to execute in parallel.


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH v6 0/5] Initial support for multi-actuator HDDs
  2021-08-27 16:41     ` Christoph Hellwig
  2021-08-27 17:00       ` Tim Walker
@ 2021-08-29 22:55       ` Damien Le Moal
  1 sibling, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2021-08-29 22:55 UTC (permalink / raw)
  To: hch, Tim Walker
  Cc: Phillip Susi, Jens Axboe, linux-block, Martin K . Petersen, linux-scsi

On 2021/08/28 1:43, Christoph Hellwig wrote:
> On Fri, Aug 27, 2021 at 02:28:58PM +0000, Tim Walker wrote:
>> There is nothing in the spec that requires the ranges to be contiguous
>> or non-overlapping.
> 
> Yikes, that is a pretty stupid standard.  Almost as bad as allowing
> non-uniform sized non-power of two sized zones :)
> 
>> It's easy to imagine a HDD architecture that allows multiple heads to access the same sectors on the disk. It's also easy to imagine a workload scenario where parallel access to the same disk could be useful. (Think of a typical storage design that sequentially writes new user data gradually filling the disk, while simultaneously supporting random user reads over the written data.)
> 
> But for those drives you do not actually need this scheme at all.

Agree.

> Storage devices that support higher concurrency are bog standard with
> SSDs and if you want to go back storage arrays.  The only interesting
> case is when these ranges are separate so that the access can be carved
> up based on the boundary.  Now I don't want to give people ideas with
> overlapping but not identical, which would be just horrible.

Agree too. And looking at my patch again, the function disk_check_iaranges() in
patch 1 only checks that the overall sector range of all access ranges is from 0
to capacity - 1, but it does not check for holes or overlaps. I need to change
that and ignore any disk that reports overlapping ranges or ranges with holes in
the LBA space. Holes would be horrible and if we have overlap, then the drive
can optimize by itself. Will resend a V7 with corrections for that.


-- 
Damien Le Moal
Western Digital Research


end of thread

Thread overview: 13+ messages
2021-08-27  7:50 [PATCH v6 0/5] Initial support for multi-actuator HDDs Damien Le Moal
2021-08-27  7:50 ` [PATCH v6 1/5] block: Add independent access ranges support Damien Le Moal
2021-08-27  7:50 ` [PATCH v6 2/5] scsi: sd: add concurrent positioning " Damien Le Moal
2021-08-27  7:50 ` [PATCH v6 3/5] libata: support concurrent positioning ranges log Damien Le Moal
2021-08-27  7:50 ` [PATCH v6 4/5] doc: document sysfs queue/independent_access_ranges attributes Damien Le Moal
2021-08-27  7:50 ` [PATCH v6 5/5] doc: Fix typo in request queue sysfs documentation Damien Le Moal
2021-08-27 13:58 ` [PATCH v6 0/5] Initial support for multi-actuator HDDs Phillip Susi
2021-08-27 14:28   ` Tim Walker
2021-08-27 16:41     ` Christoph Hellwig
2021-08-27 17:00       ` Tim Walker
2021-08-29 22:55       ` Damien Le Moal
2021-08-27 17:34     ` Phillip Susi
2021-08-29 22:50       ` Damien Le Moal
