linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v2 0/6] scsi: rework mid-layer with xarrays
@ 2020-05-24 15:58 Douglas Gilbert
  2020-05-24 15:58 ` [RFC v2 1/6] scsi: xarray hctl Douglas Gilbert
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Douglas Gilbert @ 2020-05-24 15:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

The SCSI subsystem uses a "hctl" topological tuple that stands for
"host,channel,target,lun". It implies that the SCSI mid-layer (or
mid-level) has a four level (inverted) object tree, or five level if
the root and leaf levels are both counted. Actually it is implemented
as a three level object tree in which the channel level is merged into
the target level.

The leaf level of the object tree (i.e. the "lun" part of the tuple)
derives its name from SCSI Logical Units (LUs) identified by LU Numbers
(LUNs). For historical reasons Linux terms LUs as "devices" and the
corresponding object in the subsystem's API is of type struct
scsi_device. Somewhat confusingly Linux also terms the host as a
"device". LUs are typically block storage while SCSI hosts are usually
associated with Host Bus Adapters (HBAs). To complete the confusion
in SCSI terminology the target is known as a "device". 

This series replaces the doubly linked lists that bind together the SCSI
subsystem's object tree with xarrays. At the top of that object tree is
an idr/ida virtual array which holds a collection of SCSI hosts that are
active in the system. The idr/ida implementation already uses xarrays,
so the collection of SCSI hosts is not altered by this patchset.

Each SCSI host object (shost) holds two collections: one of scsi_target
objects (starg(s)), the other of scsi_device objects (sdev(s)). That
second collection is technically redundant since each starg has a
collection of sdevs. Currently these three collections are implemented
using doubly linked lists (see: include/linux/list.h). This patchset
replaces those linked lists collections with xarrays. That replacement
is done in the first patch. In the remaining 5 patches, some advantage
is taken of xarray features. Also some inefficient iterations are
reworked.


Why xarrays?
------------
  - xarrays have built-in locking: spinlocks for modifying operations
    and rcu (locks) for non-modifying operations. With list_head you
    are on your own.
  - the name xarray suggests O(1) indexing (but it isn't that fast) and
    feels like it should be faster than doubly linked lists. [And
    xarrays are not true arrays, they just have a similar interface.]
  - struct list_head seems too clever. Looking at a group of related
    structures used to build an object tree (e.g. struct scsi_target,
    it is difficult to distinguish the collections from the elements of
    a collection. There are just lots of:
        struct list_head <name_obscures_whether_collector_or_collectee>;
  - struct list_head evenly distributes its storage overhead between the
    collection holder and the elements of a collection: 2 pointers each,
    16 bytes on 64 bit machines. xarray imposes an overhead on the
    collection holder but a smaller overhead on its elements, perhaps we
    can restrict it to 2 bytes: [0..USHRT_MAX]? Failing that, a 4 byte
    overhead per element.
  - linked lists don't scale very well on multi-core machines
  - any decisions that can be made on the basis of xarray marks is a
    cache(line) win, as there is no need to pull in the corresponding
    element (e.g. see: SCSI_XA_NON_SDEV_DEL)
  - xarray is well instrumented and will warn (once) at run time if it
    doesn't like what it finds with regard to locking.

At several points in the code changes are comments starting with
"XARRAY: ". These are typically where list manipulations have been
changed out for xarray operations. There might be subtle changes in
semantics (e.g. if an addition racing with a deletion is permitted).

Locking
-------
xarrays have inbuilt locking. Since the collection is held in the
parent, they are inherently safer than doubly linked lists. Lists can
crash within an iteration if either the current or next element is
deleted. As long as the parent doesn't get deleted, then a xarray
iteration will produce a pointer (or NULL) without crashing. The rcu
lock on the xarray iteration guarantees that. Dereferencing that
pointer is where the fun begins (i.e. it may crash). The caller may
know that (scsi_device) object must exist because it holds a reference
to it.

This patchset does _not_ weaken the existing locking structures in the
SCSI mid-layer. It just adds another locking layer underneath it. So it
is "over-locked" especially on modifying operations.

Performance
===========
It is a real struggle to measure anything meaningful on the creation and
deletion side. The problem is systemd and udev which conspire to show
the tester that they own 99.9% of all available cores during major
disruptions (e.g. adding 6000 SCSI devices). Using mask and stop on
systemd-udevd and systemd-journald helps (but they must be unmasked
before a reboot otherwise problems will follow).

It is hard to beat a linked list when iterating through a collection.
A possible win is on the lookup side, if that is required (and it may
not be). A xarray could be maintained in ascending order so a binary
search could be done (e.g. order shost->__targets with the channel
combined with target_id).


Douglas Gilbert (6):
  scsi: xarray hctl
  scsi: xarray, iterations on scsi_target
  scsi: xarray mark sdev_del state
  scsi: improve scsi_device_lookup
  scsi: add __scsi_target_lookup
  scsi: count number of targets

 drivers/scsi/hosts.c               |   8 +-
 drivers/scsi/scsi.c                | 194 +++++++++++++++++++++++------
 drivers/scsi/scsi_error.c          |  34 ++---
 drivers/scsi/scsi_lib.c            |  38 +++++-
 drivers/scsi/scsi_scan.c           |  70 +++++++----
 drivers/scsi/scsi_sysfs.c          |  56 ++++++---
 drivers/target/target_core_pscsi.c |   2 +-
 include/scsi/scsi_device.h         | 148 ++++++++++++++++++++--
 include/scsi/scsi_host.h           |  15 ++-
 include/scsi/scsi_transport.h      |   3 +-
 10 files changed, 448 insertions(+), 120 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC v2 1/6] scsi: xarray hctl
  2020-05-24 15:58 [RFC v2 0/6] scsi: rework mid-layer with xarrays Douglas Gilbert
@ 2020-05-24 15:58 ` Douglas Gilbert
  2020-05-25  6:57   ` Hannes Reinecke
  2020-05-24 15:58 ` [RFC v2 2/6] scsi: xarray, iterations on scsi_target Douglas Gilbert
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Douglas Gilbert @ 2020-05-24 15:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, kbuild test robot

Replace three linked lists with xarrays:
  - Scsi_Host::__devices  collection of scsi_device objects
  - Scsi_Host::__targets  collection of scsi_target objects
  - scsi_target::devices  collection of scsi_device objects that
    belong to this target

The collection of Scsi_Host objects remains unaltered. It uses the
idr/ida mechanism which already uses xarrays in its implementation.

Add a scsi_target::parent_shost pointer for direct access to a
target's host since the oft called dev_to_shost() needs to loop through
any intermediate (transport supplied) objects between a target and its
parent host. Add a new, trivial access function: starg_to_shost() and
thus allow calls to dev_to_shost(starg->dev.parent) to be replaced
with starg_to_shost(starg). Use this faster replacement in mid-level
source and in an inline function defined in scsi_transport.h .

Against the advice in scsi_devices.h and scsi_host.h this file:
drivers/target/target_core_pscsi.c uses Scsi_Host::__devices directly
and needs a minimal change to allow it to compile and work in the
same fashion.

xarray technical info: all three xarrays are initialized with
XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ flags. When XA_FLAGS_ALLOC is used
a xarray allocates unique index numbers and uses one of the 3 available
marks to manage that allocation. So there are two marks remaining and
they are currently unused. The LOCK_IRQ flags means it calls
spin_lock_irq() internally on xarray modifying operations. xarray
non-modifying operations use the rcu lock.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/hosts.c               |  8 +++--
 drivers/scsi/scsi.c                | 43 +++++++++++++++--------
 drivers/scsi/scsi_lib.c            |  8 ++---
 drivers/scsi/scsi_scan.c           | 48 +++++++++++++++++--------
 drivers/scsi/scsi_sysfs.c          | 41 +++++++++++++++-------
 drivers/target/target_core_pscsi.c |  2 +-
 include/scsi/scsi_device.h         | 56 +++++++++++++++++++++++++-----
 include/scsi/scsi_host.h           | 12 ++++---
 include/scsi/scsi_transport.h      |  3 +-
 9 files changed, 158 insertions(+), 63 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 7ec91c3a66ca..2bba293a497d 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -345,6 +345,8 @@ static void scsi_host_dev_release(struct device *dev)
 
 	if (parent)
 		put_device(parent);
+	xa_destroy(&shost->__targets);
+	xa_destroy(&shost->__devices);
 	kfree(shost);
 }
 
@@ -382,8 +384,8 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	shost->host_lock = &shost->default_lock;
 	spin_lock_init(shost->host_lock);
 	shost->shost_state = SHOST_CREATED;
-	INIT_LIST_HEAD(&shost->__devices);
-	INIT_LIST_HEAD(&shost->__targets);
+	xa_init_flags(&shost->__devices, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
+	xa_init_flags(&shost->__targets, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
 	INIT_LIST_HEAD(&shost->eh_cmd_q);
 	INIT_LIST_HEAD(&shost->starved_list);
 	init_waitqueue_head(&shost->host_wait);
@@ -502,6 +504,8 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
  fail_index_remove:
 	ida_simple_remove(&host_index_ida, shost->host_no);
  fail_kfree:
+	xa_destroy(&shost->__targets);
+	xa_destroy(&shost->__devices);
 	kfree(shost);
 	return NULL;
 }
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 56c24a73e0c7..61aa68083f67 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -554,19 +554,32 @@ EXPORT_SYMBOL(scsi_device_put);
 struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
 					   struct scsi_device *prev)
 {
-	struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
-	struct scsi_device *next = NULL;
-	unsigned long flags;
+	struct scsi_device *next = prev;
+	unsigned long flags, l_idx;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	while (list->next != &shost->__devices) {
-		next = list_entry(list->next, struct scsi_device, siblings);
-		/* skip devices that we can't get a reference to */
-		if (!scsi_device_get(next))
-			break;
+	if (xa_empty(&shost->__devices)) {
 		next = NULL;
-		list = list->next;
+		goto unlock;
 	}
+	do {
+		if (!next) {	/* get first element iff first iteration */
+			l_idx = 0;
+			next = xa_find(&shost->__devices, &l_idx,
+				       scsi_xa_limit_16b.max, XA_PRESENT);
+		} else {
+			l_idx = next->sh_idx,
+			next = xa_find_after(&shost->__devices, &l_idx,
+					     scsi_xa_limit_16b.max,
+					     XA_PRESENT);
+		}
+		if (next) {
+			/* skip devices that we can't get a reference to */
+			if (!scsi_device_get(next))
+				break;
+		}
+	} while (next);
+unlock:
 	spin_unlock_irqrestore(shost->host_lock, flags);
 
 	if (prev)
@@ -588,7 +601,7 @@ EXPORT_SYMBOL(__scsi_iterate_devices);
 void starget_for_each_device(struct scsi_target *starget, void *data,
 		     void (*fn)(struct scsi_device *, void *))
 {
-	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
+	struct Scsi_Host *shost = starg_to_shost(starget);
 	struct scsi_device *sdev;
 
 	shost_for_each_device(sdev, shost) {
@@ -616,7 +629,7 @@ EXPORT_SYMBOL(starget_for_each_device);
 void __starget_for_each_device(struct scsi_target *starget, void *data,
 			       void (*fn)(struct scsi_device *, void *))
 {
-	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
+	struct Scsi_Host *shost = starg_to_shost(starget);
 	struct scsi_device *sdev;
 
 	__shost_for_each_device(sdev, shost) {
@@ -645,9 +658,10 @@ EXPORT_SYMBOL(__starget_for_each_device);
 struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *starget,
 						   u64 lun)
 {
+	unsigned long l_idx;
 	struct scsi_device *sdev;
 
-	list_for_each_entry(sdev, &starget->devices, same_target_siblings) {
+	xa_for_each(&starget->devices, l_idx, sdev) {
 		if (sdev->sdev_state == SDEV_DEL)
 			continue;
 		if (sdev->lun ==lun)
@@ -671,7 +685,7 @@ struct scsi_device *scsi_device_lookup_by_target(struct scsi_target *starget,
 						 u64 lun)
 {
 	struct scsi_device *sdev;
-	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
+	struct Scsi_Host *shost = starg_to_shost(starget);
 	unsigned long flags;
 
 	spin_lock_irqsave(shost->host_lock, flags);
@@ -703,9 +717,10 @@ EXPORT_SYMBOL(scsi_device_lookup_by_target);
 struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
 		uint channel, uint id, u64 lun)
 {
+	unsigned long l_idx;
 	struct scsi_device *sdev;
 
-	list_for_each_entry(sdev, &shost->__devices, siblings) {
+	xa_for_each(&shost->__devices, l_idx, sdev) {
 		if (sdev->sdev_state == SDEV_DEL)
 			continue;
 		if (sdev->channel == channel && sdev->id == id &&
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 47835c4b4ee0..68df68f54fc8 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -372,9 +372,9 @@ static void scsi_kick_queue(struct request_queue *q)
 static void scsi_single_lun_run(struct scsi_device *current_sdev)
 {
 	struct Scsi_Host *shost = current_sdev->host;
-	struct scsi_device *sdev, *tmp;
+	struct scsi_device *sdev;
 	struct scsi_target *starget = scsi_target(current_sdev);
-	unsigned long flags;
+	unsigned long flags, l_idx;
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	starget->starget_sdev_user = NULL;
@@ -391,8 +391,8 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
 	spin_lock_irqsave(shost->host_lock, flags);
 	if (starget->starget_sdev_user)
 		goto out;
-	list_for_each_entry_safe(sdev, tmp, &starget->devices,
-			same_target_siblings) {
+	/* XARRAY: was list_for_each_entry_safe(); is this safe ? */
+	xa_for_each(&starget->devices, l_idx, sdev) {
 		if (sdev == current_sdev)
 			continue;
 		if (scsi_device_get(sdev))
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index f2437a7570ce..b8f5850c5a04 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -217,7 +217,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
 {
 	struct scsi_device *sdev;
 	int display_failure_msg = 1, ret;
-	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
+	struct Scsi_Host *shost = starg_to_shost(starget);
 
 	sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size,
 		       GFP_KERNEL);
@@ -234,8 +234,9 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
 	sdev->channel = starget->channel;
 	mutex_init(&sdev->state_mutex);
 	sdev->sdev_state = SDEV_CREATED;
-	INIT_LIST_HEAD(&sdev->siblings);
-	INIT_LIST_HEAD(&sdev->same_target_siblings);
+	/* XARRAY: INIT_LIST_HEAD()s here on list leaves, why ? */
+	sdev->sh_idx = -1;
+	sdev->starg_idx = -1;		/* for debugging */
 	INIT_LIST_HEAD(&sdev->starved_entry);
 	INIT_LIST_HEAD(&sdev->event_list);
 	spin_lock_init(&sdev->list_lock);
@@ -306,8 +307,9 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
 
 static void scsi_target_destroy(struct scsi_target *starget)
 {
+	struct scsi_target *e_starg;
 	struct device *dev = &starget->dev;
-	struct Scsi_Host *shost = dev_to_shost(dev->parent);
+	struct Scsi_Host *shost = starg_to_shost(starget);
 	unsigned long flags;
 
 	BUG_ON(starget->state == STARGET_DEL);
@@ -316,7 +318,10 @@ static void scsi_target_destroy(struct scsi_target *starget)
 	spin_lock_irqsave(shost->host_lock, flags);
 	if (shost->hostt->target_destroy)
 		shost->hostt->target_destroy(starget);
-	list_del_init(&starget->siblings);
+	/* XARRAY: was list_del_init(); why the _init ?  */
+	e_starg = xa_erase(&shost->__targets, starget->sh_idx);
+	if (e_starg != starget)
+		pr_err("%s: bad xa_erase()\n", __func__);
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	put_device(dev);
 }
@@ -326,6 +331,7 @@ static void scsi_target_dev_release(struct device *dev)
 	struct device *parent = dev->parent;
 	struct scsi_target *starget = to_scsi_target(dev);
 
+	xa_destroy(&starget->devices);
 	kfree(starget);
 	put_device(parent);
 }
@@ -344,12 +350,13 @@ EXPORT_SYMBOL(scsi_is_target_device);
 static struct scsi_target *__scsi_find_target(struct device *parent,
 					      int channel, uint id)
 {
+	unsigned long l_idx;
 	struct scsi_target *starget, *found_starget = NULL;
 	struct Scsi_Host *shost = dev_to_shost(parent);
 	/*
-	 * Search for an existing target for this sdev.
+	 * Search for an existing target for this host.
 	 */
-	list_for_each_entry(starget, &shost->__targets, siblings) {
+	xa_for_each(&shost->__targets, l_idx, starget) {
 		if (starget->id == id &&
 		    starget->channel == channel) {
 			found_starget = starget;
@@ -412,6 +419,8 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
 	struct Scsi_Host *shost = dev_to_shost(parent);
 	struct device *dev = NULL;
 	unsigned long flags;
+	unsigned int u_idx;
+	int res;
 	const int size = sizeof(struct scsi_target)
 		+ shost->transportt->target_size;
 	struct scsi_target *starget;
@@ -427,14 +436,15 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
 	device_initialize(dev);
 	kref_init(&starget->reap_ref);
 	dev->parent = get_device(parent);
+	starget->parent_shost = shost;
 	dev_set_name(dev, "target%d:%d:%d", shost->host_no, channel, id);
 	dev->bus = &scsi_bus_type;
 	dev->type = &scsi_target_type;
 	starget->id = id;
 	starget->channel = channel;
 	starget->can_queue = 0;
-	INIT_LIST_HEAD(&starget->siblings);
-	INIT_LIST_HEAD(&starget->devices);
+	xa_init_flags(&starget->devices, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
+	starget->sh_idx = -1;		/* debugging */
 	starget->state = STARGET_CREATED;
 	starget->scsi_level = SCSI_2;
 	starget->max_target_blocked = SCSI_DEFAULT_TARGET_BLOCKED;
@@ -445,7 +455,15 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
 	if (found_target)
 		goto found;
 
-	list_add_tail(&starget->siblings, &shost->__targets);
+	/* XARRAY: was list_add_tail(); may get hole in xarray or tail */
+	res = xa_alloc(&shost->__targets, &u_idx, starget, scsi_xa_limit_16b,
+		       GFP_ATOMIC);
+	if (res < 0) {
+		spin_unlock_irqrestore(shost->host_lock, flags);
+		pr_err("%s: xa_alloc failure errno=%d\n", __func__, -res);
+		return NULL;
+	}
+	starget->sh_idx = u_idx;
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	/* allocate and add */
 	transport_setup_device(dev);
@@ -1049,7 +1067,7 @@ static int scsi_probe_and_add_lun(struct scsi_target *starget,
 	unsigned char *result;
 	blist_flags_t bflags;
 	int res = SCSI_SCAN_NO_RESPONSE, result_len = 256;
-	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
+	struct Scsi_Host *shost = starg_to_shost(starget);
 
 	/*
 	 * The rescan flag is used as an optimization, the first scan of a
@@ -1199,7 +1217,7 @@ static void scsi_sequential_lun_scan(struct scsi_target *starget,
 {
 	uint max_dev_lun;
 	u64 sparse_lun, lun;
-	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
+	struct Scsi_Host *shost = starg_to_shost(starget);
 
 	SCSI_LOG_SCAN_BUS(3, starget_printk(KERN_INFO, starget,
 		"scsi scan: Sequential scan\n"));
@@ -1297,7 +1315,7 @@ static int scsi_report_lun_scan(struct scsi_target *starget, blist_flags_t bflag
 	struct scsi_lun *lunp, *lun_data;
 	struct scsi_sense_hdr sshdr;
 	struct scsi_device *sdev;
-	struct Scsi_Host *shost = dev_to_shost(&starget->dev);
+	struct Scsi_Host *shost = starg_to_shost(starget);
 	int ret = 0;
 
 	/*
@@ -1860,11 +1878,11 @@ EXPORT_SYMBOL(scsi_scan_host);
 void scsi_forget_host(struct Scsi_Host *shost)
 {
 	struct scsi_device *sdev;
-	unsigned long flags;
+	unsigned long flags, l_idx;
 
  restart:
 	spin_lock_irqsave(shost->host_lock, flags);
-	list_for_each_entry(sdev, &shost->__devices, siblings) {
+	xa_for_each(&shost->__devices, l_idx, sdev) {
 		if (sdev->sdev_state == SDEV_DEL)
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 163dbcb741c1..4bfcf33139a2 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -433,7 +433,7 @@ static void scsi_device_cls_release(struct device *class_dev)
 
 static void scsi_device_dev_release_usercontext(struct work_struct *work)
 {
-	struct scsi_device *sdev;
+	struct scsi_device *sdev, *e_sdev;
 	struct device *parent;
 	struct list_head *this, *tmp;
 	struct scsi_vpd *vpd_pg80 = NULL, *vpd_pg83 = NULL;
@@ -447,8 +447,12 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 	parent = sdev->sdev_gendev.parent;
 
 	spin_lock_irqsave(sdev->host->host_lock, flags);
-	list_del(&sdev->siblings);
-	list_del(&sdev->same_target_siblings);
+	e_sdev = xa_erase(&sdev->host->__devices, sdev->sh_idx);
+	if (e_sdev != sdev)
+		pr_err("%s: bad xa_erase(host:__devices)\n", __func__);
+	e_sdev = xa_erase(&sdev->sdev_target->devices, sdev->starg_idx);
+	if (e_sdev != sdev)
+		pr_err("%s: bad xa_erase(sdev_target->devices)\n", __func__);
 	list_del(&sdev->starved_entry);
 	spin_unlock_irqrestore(sdev->host->host_lock, flags);
 
@@ -1471,22 +1475,19 @@ EXPORT_SYMBOL(scsi_remove_device);
 
 static void __scsi_remove_target(struct scsi_target *starget)
 {
-	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
-	unsigned long flags;
+	struct Scsi_Host *shost = starg_to_shost(starget);
 	struct scsi_device *sdev;
+	unsigned long flags, l_idx;
 
 	spin_lock_irqsave(shost->host_lock, flags);
  restart:
-	list_for_each_entry(sdev, &shost->__devices, siblings) {
+	xa_for_each(&starget->devices, l_idx, sdev) {
 		/*
 		 * We cannot call scsi_device_get() here, as
 		 * we might've been called from rmmod() causing
 		 * scsi_device_get() to fail the module_is_live()
 		 * check.
 		 */
-		if (sdev->channel != starget->channel ||
-		    sdev->id != starget->id)
-			continue;
 		if (sdev->sdev_state == SDEV_DEL ||
 		    sdev->sdev_state == SDEV_CANCEL ||
 		    !get_device(&sdev->sdev_gendev))
@@ -1495,6 +1496,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
 		scsi_remove_device(sdev);
 		put_device(&sdev->sdev_gendev);
 		spin_lock_irqsave(shost->host_lock, flags);
+		/* XARRAY: is this goto start of loop correct ? */
 		goto restart;
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
@@ -1512,11 +1514,11 @@ void scsi_remove_target(struct device *dev)
 {
 	struct Scsi_Host *shost = dev_to_shost(dev->parent);
 	struct scsi_target *starget;
-	unsigned long flags;
+	unsigned long flags, l_idx;
 
 restart:
 	spin_lock_irqsave(shost->host_lock, flags);
-	list_for_each_entry(starget, &shost->__targets, siblings) {
+	xa_for_each(&shost->__targets, l_idx, starget) {
 		if (starget->state == STARGET_DEL ||
 		    starget->state == STARGET_REMOVE ||
 		    starget->state == STARGET_CREATED_REMOVE)
@@ -1584,6 +1586,8 @@ static struct device_type scsi_dev_type = {
 
 void scsi_sysfs_device_initialize(struct scsi_device *sdev)
 {
+	int res;
+	unsigned int u_idx;
 	unsigned long flags;
 	struct Scsi_Host *shost = sdev->host;
 	struct scsi_target  *starget = sdev->sdev_target;
@@ -1614,8 +1618,19 @@ void scsi_sysfs_device_initialize(struct scsi_device *sdev)
 
 	transport_setup_device(&sdev->sdev_gendev);
 	spin_lock_irqsave(shost->host_lock, flags);
-	list_add_tail(&sdev->same_target_siblings, &starget->devices);
-	list_add_tail(&sdev->siblings, &shost->__devices);
+	/* XARRAY: might add in hole */
+	res = xa_alloc(&starget->devices, &u_idx, sdev, scsi_xa_limit_16b,
+		       GFP_ATOMIC);
+	if (res < 0)
+		pr_err("%s: xa_alloc 1 failure errno=%d\n", __func__, -res);
+	else
+		sdev->starg_idx = u_idx;
+	res = xa_alloc(&shost->__devices, &u_idx, sdev, scsi_xa_limit_16b,
+		       GFP_ATOMIC);
+	if (res < 0)
+		pr_err("%s: xa_alloc 2 failure errno=%d\n", __func__, -res);
+	else
+		sdev->sh_idx = u_idx;
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	/*
 	 * device can now only be removed via __scsi_remove_device() so hold
diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index c9d92b3e777d..8547fc9ec344 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -496,7 +496,7 @@ static int pscsi_configure_device(struct se_device *dev)
 	}
 
 	spin_lock_irq(sh->host_lock);
-	list_for_each_entry(sd, &sh->__devices, siblings) {
+	__shost_for_each_device(sd, sh) {
 		if ((pdv->pdv_channel_id != sd->channel) ||
 		    (pdv->pdv_target_id != sd->id) ||
 		    (pdv->pdv_lun_id != sd->lun))
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index c3cba2aaf934..dc9de4cc0e79 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -7,7 +7,9 @@
 #include <linux/workqueue.h>
 #include <linux/blkdev.h>
 #include <scsi/scsi.h>
+#include <scsi/scsi_host.h>
 #include <linux/atomic.h>
+#include <linux/xarray.h>
 
 struct device;
 struct request_queue;
@@ -103,8 +105,8 @@ struct scsi_device {
 	struct request_queue *request_queue;
 
 	/* the next two are protected by the host->host_lock */
-	struct list_head    siblings;   /* list of all devices on this host */
-	struct list_head    same_target_siblings; /* just the devices sharing same target id */
+	int sh_idx;		/* index within host->__devices array */
+	int starg_idx;		/* index within sdev_target->devices array */
 
 	atomic_t device_busy;		/* commands actually active on LLDD */
 	atomic_t device_blocked;	/* Device returned QUEUE_FULL. */
@@ -281,16 +283,18 @@ enum scsi_target_state {
  * scsi_target: representation of a scsi target, for now, this is only
  * used for single_lun devices. If no one has active IO to the target,
  * starget_sdev_user is NULL, else it points to the active sdev.
+ * Invariant: starg->parent_shost == dev_to_shost(starg->dev.parent)
  */
 struct scsi_target {
 	struct scsi_device	*starget_sdev_user;
-	struct list_head	siblings;
-	struct list_head	devices;
-	struct device		dev;
+	struct Scsi_Host	*parent_shost;
+	int			sh_idx;	/* index within Scsi_Host::__targets */
 	struct kref		reap_ref; /* last put renders target invisible */
 	unsigned int		channel;
 	unsigned int		id; /* target id ... replace
 				     * scsi_device.id eventually */
+	struct xarray devices;	/* scsi_device objects owned by this target */
+	struct device		dev;
 	unsigned int		create:1; /* signal that it needs to be added */
 	unsigned int		single_lun:1;	/* Indicates we should only
 						 * allow I/O to one of the luns
@@ -329,6 +333,11 @@ static inline struct scsi_target *scsi_target(struct scsi_device *sdev)
 #define transport_class_to_starget(class_dev) \
 	to_scsi_target(class_dev->parent)
 
+static inline struct Scsi_Host *starg_to_shost(struct scsi_target *starg)
+{
+	return starg->parent_shost;
+}
+
 #define starget_printk(prefix, starget, fmt, a...)	\
 	dev_printk(prefix, &(starget)->dev, fmt, ##a)
 
@@ -365,7 +374,7 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
 /**
  * shost_for_each_device - iterate over all devices of a host
  * @sdev: the &struct scsi_device to use as a cursor
- * @shost: the &struct scsi_host to iterate over
+ * @shost: the &struct Scsi_Host to iterate over
  *
  * Iterator that returns each device attached to @shost.  This loop
  * takes a reference on each device and releases it at the end.  If
@@ -376,21 +385,50 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
 	     (sdev); \
 	     (sdev) = __scsi_iterate_devices((shost), (sdev)))
 
+/* helper for __shost_for_each_device */
+static inline struct scsi_device *__scsi_h2d_1st_it(struct Scsi_Host *shost)
+{
+	unsigned long l_idx = 0;
+
+	return xa_find(&shost->__devices, &l_idx, scsi_xa_limit_16b.max,
+		       XA_PRESENT);
+}
+
+/* helper for __shost_for_each_device */
+static inline struct scsi_device *__scsi_h2d_next_it(struct Scsi_Host *shost,
+						     struct scsi_device *prev)
+{
+	unsigned long l_idx;
+
+	if (prev) {
+		l_idx = prev->sh_idx,
+		prev = xa_find_after(&shost->__devices, &l_idx,
+				     scsi_xa_limit_16b.max, XA_PRESENT);
+	}
+	return prev;	/* now either _next_ or NULL */
+}
+
 /**
  * __shost_for_each_device - iterate over all devices of a host (UNLOCKED)
  * @sdev: the &struct scsi_device to use as a cursor
- * @shost: the &struct scsi_host to iterate over
+ * @shost: the &struct Scsi_Host to iterate over
  *
  * Iterator that returns each device attached to @shost.  It does _not_
  * take a reference on the scsi_device, so the whole loop must be
- * protected by shost->host_lock.
+ * protected by shost->host_lock (see Note 2).
  *
  * Note: The only reason to use this is because you need to access the
  * device list in interrupt context.  Otherwise you really want to use
  * shost_for_each_device instead.
+ *
+ * Note 2: The iteration won't fail but dereferencing sdev might. To access
+ * the xarray features (e.g. marks) associated with sdev safely use:
+ * xa_for_each(&shost->__devices, l_idx, next) directly then use l_idx.
  */
 #define __shost_for_each_device(sdev, shost) \
-	list_for_each_entry((sdev), &((shost)->__devices), siblings)
+	for ((sdev) = __scsi_h2d_1st_it((shost)); \
+	     (sdev); \
+	     (sdev) = __scsi_h2d_next_it((shost), (sdev)))
 
 extern int scsi_change_queue_depth(struct scsi_device *, int);
 extern int scsi_track_queue_full(struct scsi_device *, int);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 822e8cda8d9b..e6386f3d6de1 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -9,6 +9,7 @@
 #include <linux/mutex.h>
 #include <linux/seq_file.h>
 #include <linux/blk-mq.h>
+#include <linux/xarray.h>
 #include <scsi/scsi.h>
 
 struct block_device;
@@ -29,6 +30,9 @@ struct scsi_transport_template;
 #define MODE_INITIATOR 0x01
 #define MODE_TARGET 0x02
 
+/* XARRAY: this limits number of devices (LUs) in host to 64K */
+#define scsi_xa_limit_16b    XA_LIMIT(0, USHRT_MAX)
+
 struct scsi_host_template {
 	struct module *module;
 	const char *name;
@@ -233,7 +237,7 @@ struct scsi_host_template {
 	 * If a host has the ability to discover targets on its own instead
 	 * of scanning the entire bus, it can fill in this function and
 	 * call scsi_scan_host().  This function will be called periodically
-	 * until it returns 1 with the scsi_host and the elapsed time of
+	 * until it returns 1 with the Scsi_Host and the elapsed time of
 	 * the scan in jiffies.
 	 *
 	 * Status: OPTIONAL
@@ -520,9 +524,9 @@ struct Scsi_Host {
 	 * their __ prefixed variants with the lock held. NEVER
 	 * access this list directly from a driver.
 	 */
-	struct list_head	__devices;
-	struct list_head	__targets;
-	
+	struct xarray		__devices;	/* array of scsi_debug objs */
+	struct xarray		__targets;	/* array of scsi_target objs */
+
 	struct list_head	starved_list;
 
 	spinlock_t		default_lock;
diff --git a/include/scsi/scsi_transport.h b/include/scsi/scsi_transport.h
index a0458bda3148..3756b264809a 100644
--- a/include/scsi/scsi_transport.h
+++ b/include/scsi/scsi_transport.h
@@ -70,7 +70,8 @@ scsi_transport_reserve_device(struct scsi_transport_template * t, int space)
 static inline void *
 scsi_transport_target_data(struct scsi_target *starget)
 {
-	struct Scsi_Host *shost = dev_to_shost(&starget->dev);
+	struct Scsi_Host *shost = starg_to_shost(starget);
+
 	return (u8 *)starget->starget_data
 		+ shost->transportt->target_private_offset;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 2/6] scsi: xarray, iterations on scsi_target
  2020-05-24 15:58 [RFC v2 0/6] scsi: rework mid-layer with xarrays Douglas Gilbert
  2020-05-24 15:58 ` [RFC v2 1/6] scsi: xarray hctl Douglas Gilbert
@ 2020-05-24 15:58 ` Douglas Gilbert
  2020-05-25  7:06   ` Hannes Reinecke
  2020-05-24 15:58 ` [RFC v2 3/6] scsi: xarray mark sdev_del state Douglas Gilbert
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Douglas Gilbert @ 2020-05-24 15:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

For reasons unknown, the supplied functions for iterating through all
scsi_device objects that belonged to a scsi_target involved going
up to the scsi_host object (i.e. the target's parent) and iterating
through all scsi_device objects belonging to that scsi_host, and only
selecting those scsi_device objects that belonged to the original
scsi_target object. While correct, that is very inefficient when it
is noted that each scsi_target object has a 'devices' xarray which
contains pointers to the scsi_device object it owns.

Modify starget_for_each_device() and __starget_for_each_device()
to take this more efficient route. Add starg_for_each_device() and
__starg_for_each_device() macros in scsi_device.h that are finer
grained versions of the existing shost_for_each_device() and
__shost_for_each_device() macros.

The new __scsi_target_iterate_devices() helper function takes
the host_lock to be consistent with the existing
__scsi_iterate_devices() function's locking. At this stage the user
is _not_ being encouraged to use per scsi_target locks.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/scsi.c        | 68 ++++++++++++++++++++++++++---------
 drivers/scsi/scsi_scan.c   |  3 +-
 include/scsi/scsi_device.h | 73 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 125 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 61aa68083f67..aa3240f7aed9 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -588,9 +588,50 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
 }
 EXPORT_SYMBOL(__scsi_iterate_devices);
 
+/* helper for starg_for_each_device, see that for documentation */
+struct scsi_device *__scsi_target_iterate_devices(struct scsi_target *starg,
+						  struct scsi_device *prev)
+{
+	struct Scsi_Host *shost = starg_to_shost(starg);
+	struct scsi_device *next = prev;
+	unsigned long flags, l_idx;
+
+	if (!shost)
+		return NULL;
+	spin_lock_irqsave(shost->host_lock, flags);
+	if (xa_empty(&starg->devices)) {
+		next = NULL;
+		goto unlock;
+	}
+	do {
+		if (!next) {	/* get first element iff first iteration */
+			l_idx = 0;
+			next = xa_find(&starg->devices, &l_idx,
+				       scsi_xa_limit_16b.max, XA_PRESENT);
+		} else {
+			l_idx = next->sh_idx,
+			next = xa_find_after(&starg->devices, &l_idx,
+					     scsi_xa_limit_16b.max,
+					     XA_PRESENT);
+		}
+		if (next) {
+			/* skip devices that we can't get a reference to */
+			if (!scsi_device_get(next))
+				break;
+		}
+	} while (next);
+unlock:
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
+	if (prev)
+		scsi_device_put(prev);
+	return next;
+}
+EXPORT_SYMBOL(__scsi_target_iterate_devices);
+
 /**
  * starget_for_each_device  -  helper to walk all devices of a target
- * @starget:	target whose devices we want to iterate over.
+ * @starg:	target whose devices we want to iterate over.
  * @data:	Opaque passed to each function call.
  * @fn:		Function to call on each device
  *
@@ -598,23 +639,20 @@ EXPORT_SYMBOL(__scsi_iterate_devices);
  * a reference that must be released by scsi_host_put when breaking
  * out of the loop.
  */
-void starget_for_each_device(struct scsi_target *starget, void *data,
-		     void (*fn)(struct scsi_device *, void *))
+void starget_for_each_device(struct scsi_target *starg, void *data,
+			     void (*fn)(struct scsi_device *, void *))
 {
-	struct Scsi_Host *shost = starg_to_shost(starget);
 	struct scsi_device *sdev;
 
-	shost_for_each_device(sdev, shost) {
-		if ((sdev->channel == starget->channel) &&
-		    (sdev->id == starget->id))
-			fn(sdev, data);
-	}
+	/* XARRAY: this looks like recursion, but isn't. Rename function? */
+	sstarg_for_each_device(sdev, starg)
+		fn(sdev, data);
 }
 EXPORT_SYMBOL(starget_for_each_device);
 
 /**
  * __starget_for_each_device - helper to walk all devices of a target (UNLOCKED)
- * @starget:	target whose devices we want to iterate over.
+ * @starg:	target whose devices we want to iterate over.
  * @data:	parameter for callback @fn()
  * @fn:		callback function that is invoked for each device
  *
@@ -626,17 +664,13 @@ EXPORT_SYMBOL(starget_for_each_device);
  * they need to access the device list in irq context.  Otherwise you
  * really want to use starget_for_each_device instead.
  **/
-void __starget_for_each_device(struct scsi_target *starget, void *data,
+void __starget_for_each_device(struct scsi_target *starg, void *data,
 			       void (*fn)(struct scsi_device *, void *))
 {
-	struct Scsi_Host *shost = starg_to_shost(starget);
 	struct scsi_device *sdev;
 
-	__shost_for_each_device(sdev, shost) {
-		if ((sdev->channel == starget->channel) &&
-		    (sdev->id == starget->id))
-			fn(sdev, data);
-	}
+	__sstarg_for_each_device(sdev, starg)
+		fn(sdev, data);
 }
 EXPORT_SYMBOL(__starget_for_each_device);
 
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index b8f5850c5a04..72a0064a879a 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -403,7 +403,8 @@ static void scsi_target_reap_ref_put(struct scsi_target *starget)
 
 /**
  * scsi_alloc_target - allocate a new or find an existing target
- * @parent:	parent of the target (need not be a scsi host)
+ * @parent:	may point to the parent shost, or an intermediate object
+ *		that dev_to_shost() can resolve to the parent shost
  * @channel:	target channel number (zero if no channels)
  * @id:		target id number
  *
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index dc9de4cc0e79..4219b8d1b94c 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -295,6 +295,7 @@ struct scsi_target {
 				     * scsi_device.id eventually */
 	struct xarray devices;	/* scsi_device objects owned by this target */
 	struct device		dev;
+
 	unsigned int		create:1; /* signal that it needs to be added */
 	unsigned int		single_lun:1;	/* Indicates we should only
 						 * allow I/O to one of the luns
@@ -361,6 +362,7 @@ extern struct scsi_device *scsi_device_lookup_by_target(struct scsi_target *,
 							u64);
 extern struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *,
 							  u64);
+/* XARRAY: these visitor names too close to sstarg_for_each_device macros? */
 extern void starget_for_each_device(struct scsi_target *, void *,
 		     void (*fn)(struct scsi_device *, void *));
 extern void __starget_for_each_device(struct scsi_target *, void *,
@@ -370,6 +372,10 @@ extern void __starget_for_each_device(struct scsi_target *, void *,
 /* only exposed to implement shost_for_each_device */
 extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
 						  struct scsi_device *);
+/* only exposed to implement starg_for_each_device */
+extern struct scsi_device *__scsi_target_iterate_devices
+					(struct scsi_target *starg,
+					 struct scsi_device *sdev);
 
 /**
  * shost_for_each_device - iterate over all devices of a host
@@ -423,13 +429,78 @@ static inline struct scsi_device *__scsi_h2d_next_it(struct Scsi_Host *shost,
  *
  * Note 2: The iteration won't fail but dereferencing sdev might. To access
  * the xarray features (e.g. marks) associated with sdev safely use:
- * xa_for_each(&shost->__devices, l_idx, next) directly then use l_idx.
+ * xa_for_each(&shost->__devices, l_idx, sdev) directly then use l_idx.
  */
 #define __shost_for_each_device(sdev, shost) \
 	for ((sdev) = __scsi_h2d_1st_it((shost)); \
 	     (sdev); \
 	     (sdev) = __scsi_h2d_next_it((shost), (sdev)))
 
+/**
+ * sstarg_for_each_device - iterate over all devices of a target
+ * @sdev: the &struct scsi_device to use as a cursor
+ * @starg: the &struct scsi_target to iterate over
+ *
+ * Iterator that returns each device attached to @starg.  This loop
+ * takes a reference on each device and releases it at the end.  If
+ * you break out of the loop, you must call scsi_device_put(sdev).
+ *
+ * Note: the leading double "ss" is to lessen the similarity between this
+ * macro and the starget_for_each_device() function declared above.
+ */
+#define sstarg_for_each_device(sdev, starg) \
+	for ((sdev) = __scsi_target_iterate_devices((starg), NULL); \
+	     (sdev); \
+	     (sdev) = __scsi_target_iterate_devices((starg), (sdev)))
+
+/* helper for __sstarg_for_each_device */
+static inline struct scsi_device *__scsi_t2d_1st_it(struct scsi_target *starg)
+{
+	unsigned long l_idx = 0;
+
+	return xa_find(&starg->devices, &l_idx, scsi_xa_limit_16b.max,
+		       XA_PRESENT);
+}
+
+/* helper for __sstarg_for_each_device */
+static inline struct scsi_device *__scsi_t2d_next_it(struct scsi_target *starg,
+						     struct scsi_device *prev)
+{
+	unsigned long l_idx;
+
+	if (prev) {
+		l_idx = prev->starg_idx,
+		prev = xa_find_after(&starg->devices, &l_idx,
+				     scsi_xa_limit_16b.max, XA_PRESENT);
+	}
+	return prev;	/* now either _next_ or NULL */
+}
+
+/**
+ * __sstarg_for_each_device - iterate over all devices of a target (UNLOCKED)
+ * @sdev: the &struct scsi_device to use as a cursor
+ * @starg: the &struct scsi_target to iterate over
+ *
+ * Iterator that returns each device attached to @starg.  It does _not_
+ * take a reference on the scsi_device, so the whole loop must be
+ * protected by shost->host_lock.
+ *
+ * Note: The only reason to use this is because you need to access the
+ * device list in interrupt context.  Otherwise you really want to use
+ * starg_for_each_device instead.
+ *
+ * Note 2: The iteration won't fail but dereferencing sdev might. To access
+ * the xarray features (e.g. marks) associated with sdev safely use:
+ * xa_for_each(&starg->devices, l_idx, sdev) directly then use l_idx.
+ *
+ * Note 3: the leading double "ss" is to lessen the similarity between this
+ * macro and the __starget_for_each_device() function declared above.
+ */
+#define __sstarg_for_each_device(sdev, starg) \
+	for ((sdev) = __scsi_t2d_1st_it((starg)); \
+	     (sdev); \
+	     (sdev) = __scsi_t2d_next_it((starg), (sdev)))
+
 extern int scsi_change_queue_depth(struct scsi_device *, int);
 extern int scsi_track_queue_full(struct scsi_device *, int);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 3/6] scsi: xarray mark sdev_del state
  2020-05-24 15:58 [RFC v2 0/6] scsi: rework mid-layer with xarrays Douglas Gilbert
  2020-05-24 15:58 ` [RFC v2 1/6] scsi: xarray hctl Douglas Gilbert
  2020-05-24 15:58 ` [RFC v2 2/6] scsi: xarray, iterations on scsi_target Douglas Gilbert
@ 2020-05-24 15:58 ` Douglas Gilbert
  2020-05-25  7:00   ` Hannes Reinecke
  2020-05-24 15:58 ` [RFC v2 4/6] scsi: improve scsi_device_lookup Douglas Gilbert
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Douglas Gilbert @ 2020-05-24 15:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

One of the features of an xarray is the ability to mark the pointers
that it holds. Sadly there are only two marks available per xarray so
it is not possible to map marks to the sdev_state enumeration which
holds 9 states. Since several loops skip sdev_s in SDEV_DEL state
then its negation was chosen as one mark: SCSI_XA_NON_SDEV_DEL.

To support this mark a new iterator macro:
  shost_for_each_non_del_device(sdev, shost)
has been introduced plus a helper function:
   __scsi_iterate_non_del_devices(shost, sdev).
The new iterator macro is used once (in scsi_scan.c) and the new
mark is used directly in three other loops. The machinery to support
the mark is placed in a helper (scsi_adjust_non_sdev_del_mark())
called mainly by scsi_device_set_state(). Other locations that set
sdev_state were checked and some additional calls to that helper
were added.

Following a complaint from 'make W=1' the formal name of a parameter
to both starget_for_each_device() and __starget_for_each_device()
was changed from starg to starget.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/scsi.c        | 49 +++++++++++++++++++++++++++++++++-----
 drivers/scsi/scsi_lib.c    | 34 ++++++++++++++++++++++++--
 drivers/scsi/scsi_scan.c   | 14 +++++------
 drivers/scsi/scsi_sysfs.c  | 19 +++++++++------
 include/scsi/scsi_device.h | 19 +++++++++++++++
 include/scsi/scsi_host.h   |  2 ++
 6 files changed, 114 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index aa3240f7aed9..0fb650aebcfb 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -588,6 +588,45 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
 }
 EXPORT_SYMBOL(__scsi_iterate_devices);
 
+/* helper for shost_for_each_non_del_device, see that for documentation */
+struct scsi_device *__scsi_iterate_non_del_devices(struct Scsi_Host *shost,
+						   struct scsi_device *prev)
+{
+	struct scsi_device *next = prev;
+	unsigned long flags, l_idx;
+
+	spin_lock_irqsave(shost->host_lock, flags);
+	if (xa_empty(&shost->__devices)) {
+		next = NULL;
+		goto unlock;
+	}
+	do {
+		if (!next) {	/* get first element iff first iteration */
+			l_idx = 0;
+			next = xa_find(&shost->__devices, &l_idx,
+				       scsi_xa_limit_16b.max,
+				       SCSI_XA_NON_SDEV_DEL);
+		} else {
+			l_idx = next->sh_idx,
+			next = xa_find_after(&shost->__devices, &l_idx,
+					     scsi_xa_limit_16b.max,
+					     SCSI_XA_NON_SDEV_DEL);
+		}
+		if (next) {
+			/* skip devices that we can't get a reference to */
+			if (!scsi_device_get(next))
+				break;
+		}
+	} while (next);
+unlock:
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
+	if (prev)
+		scsi_device_put(prev);
+	return next;
+}
+EXPORT_SYMBOL(__scsi_iterate_non_del_devices);
+
 /* helper for starg_for_each_device, see that for documentation */
 struct scsi_device *__scsi_target_iterate_devices(struct scsi_target *starg,
 						  struct scsi_device *prev)
@@ -695,9 +734,8 @@ struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *starget,
 	unsigned long l_idx;
 	struct scsi_device *sdev;
 
-	xa_for_each(&starget->devices, l_idx, sdev) {
-		if (sdev->sdev_state == SDEV_DEL)
-			continue;
+	xa_for_each_marked(&starget->devices, l_idx, sdev,
+			   SCSI_XA_NON_SDEV_DEL) {
 		if (sdev->lun ==lun)
 			return sdev;
 	}
@@ -754,9 +792,8 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
 	unsigned long l_idx;
 	struct scsi_device *sdev;
 
-	xa_for_each(&shost->__devices, l_idx, sdev) {
-		if (sdev->sdev_state == SDEV_DEL)
-			continue;
+	xa_for_each_marked(&shost->__devices, l_idx, sdev,
+			   SCSI_XA_NON_SDEV_DEL) {
 		if (sdev->channel == channel && sdev->id == id &&
 				sdev->lun ==lun)
 			return sdev;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 68df68f54fc8..f484bf2b5d7d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2217,6 +2217,27 @@ scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
 }
 EXPORT_SYMBOL(scsi_test_unit_ready);
 
+/* Assume host_lock taken and that xahp->xa_lock is the same lock. */
+static inline void
+scsi_adjust_non_sdev_del_mark(struct scsi_device *sdev,
+			      enum scsi_device_state old_state,
+			      enum scsi_device_state new_state)
+{
+	struct xarray *xatp =
+			&to_scsi_target(sdev->sdev_gendev.parent)->devices;
+	struct xarray *xahp = &sdev->host->__devices;
+
+	if (old_state == new_state)
+		return;
+	if (old_state == SDEV_DEL) {
+		xa_set_mark(xatp, sdev->starg_idx, SCSI_XA_NON_SDEV_DEL);
+		xa_set_mark(xahp, sdev->sh_idx, SCSI_XA_NON_SDEV_DEL);
+	} else if (new_state == SDEV_DEL) {
+		xa_clear_mark(xatp, sdev->starg_idx, SCSI_XA_NON_SDEV_DEL);
+		xa_clear_mark(xahp, sdev->sh_idx, SCSI_XA_NON_SDEV_DEL);
+	}
+}
+
 /**
  *	scsi_device_set_state - Take the given device through the device state model.
  *	@sdev:	scsi device to change the state of.
@@ -2330,6 +2351,8 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 
 	}
 	sdev->offline_already = false;
+	if (unlikely(oldstate == SDEV_DEL || state == SDEV_DEL))
+		scsi_adjust_non_sdev_del_mark(sdev, oldstate, state);
 	sdev->sdev_state = state;
 	return 0;
 
@@ -2731,14 +2754,21 @@ int scsi_internal_device_unblock_nowait(struct scsi_device *sdev,
 	switch (sdev->sdev_state) {
 	case SDEV_BLOCK:
 	case SDEV_TRANSPORT_OFFLINE:
+		if (unlikely(new_state == SDEV_DEL))
+			scsi_adjust_non_sdev_del_mark(sdev, sdev->sdev_state,
+						      SDEV_DEL);
 		sdev->sdev_state = new_state;
 		break;
 	case SDEV_CREATED_BLOCK:
 		if (new_state == SDEV_TRANSPORT_OFFLINE ||
-		    new_state == SDEV_OFFLINE)
+		    new_state == SDEV_OFFLINE) {
 			sdev->sdev_state = new_state;
-		else
+		} else {
+			if (unlikely(new_state == SDEV_DEL))
+				scsi_adjust_non_sdev_del_mark
+					(sdev, sdev->sdev_state, SDEV_DEL);
 			sdev->sdev_state = SDEV_CREATED;
+		}
 		break;
 	case SDEV_CANCEL:
 	case SDEV_OFFLINE:
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 72a0064a879a..f5a5e48ca5ae 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -280,7 +280,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
 	scsi_change_queue_depth(sdev, sdev->host->cmd_per_lun ?
 					sdev->host->cmd_per_lun : 1);
 
-	scsi_sysfs_device_initialize(sdev);
+	scsi_sysfs_device_initialize(sdev);	/* marked as non_sdev_del */
 
 	if (shost->hostt->slave_alloc) {
 		ret = shost->hostt->slave_alloc(sdev);
@@ -1708,10 +1708,9 @@ int scsi_scan_host_selected(struct Scsi_Host *shost, unsigned int channel,
 static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
 {
 	struct scsi_device *sdev;
-	shost_for_each_device(sdev, shost) {
-		/* target removed before the device could be added */
-		if (sdev->sdev_state == SDEV_DEL)
-			continue;
+
+	/* target may be removed before devices could be added */
+	shost_for_each_non_del_device(sdev, shost) {
 		/* If device is already visible, skip adding it to sysfs */
 		if (sdev->is_visible)
 			continue;
@@ -1883,9 +1882,8 @@ void scsi_forget_host(struct Scsi_Host *shost)
 
  restart:
 	spin_lock_irqsave(shost->host_lock, flags);
-	xa_for_each(&shost->__devices, l_idx, sdev) {
-		if (sdev->sdev_state == SDEV_DEL)
-			continue;
+	xa_for_each_marked(&shost->__devices, l_idx, sdev,
+			   SCSI_XA_NON_SDEV_DEL) {
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		__scsi_remove_device(sdev);
 		goto restart;
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 4bfcf33139a2..6fdd4648f374 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1481,15 +1481,15 @@ static void __scsi_remove_target(struct scsi_target *starget)
 
 	spin_lock_irqsave(shost->host_lock, flags);
  restart:
-	xa_for_each(&starget->devices, l_idx, sdev) {
+	xa_for_each_marked(&starget->devices, l_idx, sdev,
+			   SCSI_XA_NON_SDEV_DEL) {
 		/*
 		 * We cannot call scsi_device_get() here, as
 		 * we might've been called from rmmod() causing
 		 * scsi_device_get() to fail the module_is_live()
 		 * check.
 		 */
-		if (sdev->sdev_state == SDEV_DEL ||
-		    sdev->sdev_state == SDEV_CANCEL ||
+		if (sdev->sdev_state == SDEV_CANCEL ||
 		    !get_device(&sdev->sdev_gendev))
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
@@ -1621,16 +1621,21 @@ void scsi_sysfs_device_initialize(struct scsi_device *sdev)
 	/* XARRAY: might add in hole */
 	res = xa_alloc(&starget->devices, &u_idx, sdev, scsi_xa_limit_16b,
 		       GFP_ATOMIC);
-	if (res < 0)
+	if (res < 0) {
 		pr_err("%s: xa_alloc 1 failure errno=%d\n", __func__, -res);
-	else
+	} else {
 		sdev->starg_idx = u_idx;
+		xa_set_mark(&starget->devices, u_idx, SCSI_XA_NON_SDEV_DEL);
+	}
 	res = xa_alloc(&shost->__devices, &u_idx, sdev, scsi_xa_limit_16b,
 		       GFP_ATOMIC);
-	if (res < 0)
+	if (res < 0) {
 		pr_err("%s: xa_alloc 2 failure errno=%d\n", __func__, -res);
-	else
+	} else {
 		sdev->sh_idx = u_idx;
+		/* already have host_lock which is shost->__devices.xa_lock */
+		xa_set_mark(&shost->__devices, u_idx, SCSI_XA_NON_SDEV_DEL);
+	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	/*
 	 * device can now only be removed via __scsi_remove_device() so hold
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 4219b8d1b94c..23bf686cbabc 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -372,6 +372,9 @@ extern void __starget_for_each_device(struct scsi_target *, void *,
 /* only exposed to implement shost_for_each_device */
 extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
 						  struct scsi_device *);
+/* only exposed to implement shost_for_each_non_del_device */
+extern struct scsi_device *__scsi_iterate_non_del_devices
+			(struct Scsi_Host *shost, struct scsi_device *sdev);
 /* only exposed to implement starg_for_each_device */
 extern struct scsi_device *__scsi_target_iterate_devices
 					(struct scsi_target *starg,
@@ -391,6 +394,22 @@ extern struct scsi_device *__scsi_target_iterate_devices
 	     (sdev); \
 	     (sdev) = __scsi_iterate_devices((shost), (sdev)))
 
+/**
+ * shost_for_each_non_del_device - iterate over those scsi_devices that do not
+ *				   have the SDEV_DEL state and are associated
+ *				   with given host
+ * @sdev: the &struct scsi_device to use as a cursor
+ * @shost: the &struct Scsi_Host to iterate over
+ *
+ * Iterator that returns each device attached to @shost.  This loop
+ * takes a reference on each device and releases it at the end.  If
+ * you break out of the loop, you must call scsi_device_put(sdev).
+ */
+#define shost_for_each_non_del_device(sdev, shost) \
+	for ((sdev) = __scsi_iterate_non_del_devices((shost), NULL); \
+	     (sdev); \
+	     (sdev) = __scsi_iterate_non_del_devices((shost), (sdev)))
+
 /* helper for __shost_for_each_device */
 static inline struct scsi_device *__scsi_h2d_1st_it(struct Scsi_Host *shost)
 {
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index e6386f3d6de1..0e94b1feb8e9 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -33,6 +33,8 @@ struct scsi_transport_template;
 /* XARRAY: this limits number of devices (LUs) in host to 64K */
 #define scsi_xa_limit_16b    XA_LIMIT(0, USHRT_MAX)
 
+#define SCSI_XA_NON_SDEV_DEL XA_MARK_1
+
 struct scsi_host_template {
 	struct module *module;
 	const char *name;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 4/6] scsi: improve scsi_device_lookup
  2020-05-24 15:58 [RFC v2 0/6] scsi: rework mid-layer with xarrays Douglas Gilbert
                   ` (2 preceding siblings ...)
  2020-05-24 15:58 ` [RFC v2 3/6] scsi: xarray mark sdev_del state Douglas Gilbert
@ 2020-05-24 15:58 ` Douglas Gilbert
  2020-05-25  7:07   ` Hannes Reinecke
  2020-05-24 15:58 ` [RFC v2 5/6] scsi: add __scsi_target_lookup Douglas Gilbert
  2020-05-24 15:58 ` [RFC v2 6/6] scsi: count number of targets Douglas Gilbert
  5 siblings, 1 reply; 25+ messages in thread
From: Douglas Gilbert @ 2020-05-24 15:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

When the __scsi_device_lookup() function is given a "ctl" (i.e. the
latter part of "hctl" tuple), improve the loop to find a matching
device (LU) in the given host. Rather than loop over all LUs in the
host, first loop over all targets to find a match on "ct" then, if
found, loop over all LUs in that target for a match on "l". These
nested loops are better since they don't visit LUs belonging to
non-matching targets. This improvement flows through to the locked
version of this function, namely scsi_device_lookup().

Remove a 21 year old comment by the author that no longer seem
relevant.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/scsi.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 0fb650aebcfb..9e7658aebdb7 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -35,7 +35,6 @@
  *
  *  Jiffies wrap fixes (host->resetting), 3 Dec 1998 Andrea Arcangeli
  *
- *  out_of_space hacks, D. Gilbert (dpg) 990608
  */
 
 #include <linux/module.h>
@@ -789,16 +788,31 @@ EXPORT_SYMBOL(scsi_device_lookup_by_target);
 struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
 		uint channel, uint id, u64 lun)
 {
-	unsigned long l_idx;
+	unsigned long l_idx, m_idx;
+	struct scsi_target *starg;
 	struct scsi_device *sdev;
 
-	xa_for_each_marked(&shost->__devices, l_idx, sdev,
+	if (xa_empty(&shost->__devices))
+		return NULL;
+	if (xa_empty(&shost->__targets))
+		goto inconsistent;
+	xa_for_each(&shost->__targets, l_idx, starg) {
+		if (!(starg->id == id && starg->channel == channel))
+			continue;
+		xa_for_each_marked(&starg->devices, m_idx, sdev,
+				   SCSI_XA_NON_SDEV_DEL) {
+			if (sdev->lun == lun)
+				return sdev;
+		}
+	}
+	return NULL;
+inconsistent:
+	xa_for_each_marked(&shost->__devices, m_idx, sdev,
 			   SCSI_XA_NON_SDEV_DEL) {
-		if (sdev->channel == channel && sdev->id == id &&
-				sdev->lun ==lun)
+		if (sdev->id == id && sdev->channel == channel &&
+		    sdev->lun == lun)
 			return sdev;
 	}
-
 	return NULL;
 }
 EXPORT_SYMBOL(__scsi_device_lookup);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 5/6] scsi: add __scsi_target_lookup
  2020-05-24 15:58 [RFC v2 0/6] scsi: rework mid-layer with xarrays Douglas Gilbert
                   ` (3 preceding siblings ...)
  2020-05-24 15:58 ` [RFC v2 4/6] scsi: improve scsi_device_lookup Douglas Gilbert
@ 2020-05-24 15:58 ` Douglas Gilbert
  2020-05-24 15:58 ` [RFC v2 6/6] scsi: count number of targets Douglas Gilbert
  5 siblings, 0 replies; 25+ messages in thread
From: Douglas Gilbert @ 2020-05-24 15:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Hides the xarray based lookup of a target given the channel and target
identifiers within a shost. Use this and __sstarg_for_each_device()
where useful.

Break the pattern when deleting to jump back and restart the iteration
as that seems to be a linked list quirk. May change semantics if
elements can be added in the newly created holes in the xarray at the
same time as the ongoing deletion.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/scsi.c        | 24 ++++++++++++++++++++++++
 drivers/scsi/scsi_error.c  | 34 +++++++++++++++++-----------------
 drivers/scsi/scsi_lib.c    |  8 +++-----
 drivers/scsi/scsi_scan.c   |  4 ++--
 drivers/scsi/scsi_sysfs.c  |  8 +++-----
 include/scsi/scsi_device.h |  2 ++
 6 files changed, 51 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 9e7658aebdb7..1b1fc8d4e5c2 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -667,6 +667,30 @@ struct scsi_device *__scsi_target_iterate_devices(struct scsi_target *starg,
 }
 EXPORT_SYMBOL(__scsi_target_iterate_devices);
 
+/**
+ * __scsi_target_lookup  -  helper to find target given channel and target_id
+ * @shost:	host to search within
+ * @chan:	channel number
+ * @t_id:	target identifier
+ *
+ * No locks or references taken. Returns NULL if not found.
+ */
+struct scsi_target *__scsi_target_lookup(struct Scsi_Host *shost,
+					 uint chan, uint t_id)
+{
+	unsigned long l_idx;
+	struct scsi_target *starg = NULL;
+
+	if (shost) {
+		xa_for_each(&shost->__targets, l_idx, starg) {
+			if (chan == starg->channel && t_id == starg->id)
+				break;
+		}
+	}
+	return starg;
+}
+EXPORT_SYMBOL(__scsi_target_lookup);
+
 /**
  * starget_for_each_device  -  helper to walk all devices of a target
  * @starg:	target whose devices we want to iterate over.
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 978be1602f71..b2dbfa5f73a0 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -644,6 +644,7 @@ static void scsi_handle_queue_ramp_up(struct scsi_device *sdev)
 {
 	struct scsi_host_template *sht = sdev->host->hostt;
 	struct scsi_device *tmp_sdev;
+	struct scsi_target *starg = scsi_target(sdev);
 
 	if (!sht->track_queue_depth ||
 	    sdev->queue_depth >= sdev->max_queue_depth)
@@ -658,13 +659,10 @@ static void scsi_handle_queue_ramp_up(struct scsi_device *sdev)
 		return;
 
 	/*
-	 * Walk all devices of a target and do
-	 * ramp up on them.
+	 * Walk all devices of a target and do ramp up on them.
 	 */
-	shost_for_each_device(tmp_sdev, sdev->host) {
-		if (tmp_sdev->channel != sdev->channel ||
-		    tmp_sdev->id != sdev->id ||
-		    tmp_sdev->queue_depth == sdev->max_queue_depth)
+	sstarg_for_each_device(tmp_sdev, starg) {
+		if (tmp_sdev->queue_depth == sdev->max_queue_depth)
 			continue;
 
 		scsi_change_queue_depth(tmp_sdev, tmp_sdev->queue_depth + 1);
@@ -672,25 +670,26 @@ static void scsi_handle_queue_ramp_up(struct scsi_device *sdev)
 	}
 }
 
+/*
+ * queue_full corresponds to the SCSI status: "Task Set Full". Its scope is
+ * at the SCSI taret device level.
+ */
 static void scsi_handle_queue_full(struct scsi_device *sdev)
 {
 	struct scsi_host_template *sht = sdev->host->hostt;
 	struct scsi_device *tmp_sdev;
+	struct scsi_target *starg = scsi_target(sdev);
 
 	if (!sht->track_queue_depth)
 		return;
 
-	shost_for_each_device(tmp_sdev, sdev->host) {
-		if (tmp_sdev->channel != sdev->channel ||
-		    tmp_sdev->id != sdev->id)
-			continue;
+	sstarg_for_each_device(tmp_sdev, starg)
 		/*
 		 * We do not know the number of commands that were at
-		 * the device when we got the queue full so we start
+		 * the target when we got the queue full so we start
 		 * from the highest possible value and work our way down.
 		 */
 		scsi_track_queue_full(tmp_sdev, tmp_sdev->queue_depth - 1);
-	}
 }
 
 /**
@@ -2305,12 +2304,13 @@ EXPORT_SYMBOL(scsi_report_bus_reset);
 void scsi_report_device_reset(struct Scsi_Host *shost, int channel, int target)
 {
 	struct scsi_device *sdev;
+	struct scsi_target *starg = __scsi_target_lookup(shost, channel,
+							 target);
 
-	__shost_for_each_device(sdev, shost) {
-		if (channel == sdev_channel(sdev) &&
-		    target == sdev_id(sdev))
-			__scsi_report_device_reset(sdev, NULL);
-	}
+	if (!starg)
+		return;
+	__sstarg_for_each_device(sdev, starg)
+		__scsi_report_device_reset(sdev, NULL);
 }
 EXPORT_SYMBOL(scsi_report_device_reset);
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f484bf2b5d7d..034ec2e57d1c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -374,7 +374,7 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
 	struct Scsi_Host *shost = current_sdev->host;
 	struct scsi_device *sdev;
 	struct scsi_target *starget = scsi_target(current_sdev);
-	unsigned long flags, l_idx;
+	unsigned long flags;
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	starget->starget_sdev_user = NULL;
@@ -392,7 +392,7 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
 	if (starget->starget_sdev_user)
 		goto out;
 	/* XARRAY: was list_for_each_entry_safe(); is this safe ? */
-	xa_for_each(&starget->devices, l_idx, sdev) {
+	__sstarg_for_each_device(sdev, starget) {
 		if (sdev == current_sdev)
 			continue;
 		if (scsi_device_get(sdev))
@@ -2217,14 +2217,12 @@ scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
 }
 EXPORT_SYMBOL(scsi_test_unit_ready);
 
-/* Assume host_lock taken and that xahp->xa_lock is the same lock. */
 static inline void
 scsi_adjust_non_sdev_del_mark(struct scsi_device *sdev,
 			      enum scsi_device_state old_state,
 			      enum scsi_device_state new_state)
 {
-	struct xarray *xatp =
-			&to_scsi_target(sdev->sdev_gendev.parent)->devices;
+	struct xarray *xatp = &scsi_target(sdev)->devices;
 	struct xarray *xahp = &sdev->host->__devices;
 
 	if (old_state == new_state)
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index f5a5e48ca5ae..c055ee083ea9 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1880,13 +1880,13 @@ void scsi_forget_host(struct Scsi_Host *shost)
 	struct scsi_device *sdev;
 	unsigned long flags, l_idx;
 
- restart:
 	spin_lock_irqsave(shost->host_lock, flags);
 	xa_for_each_marked(&shost->__devices, l_idx, sdev,
 			   SCSI_XA_NON_SDEV_DEL) {
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		__scsi_remove_device(sdev);
-		goto restart;
+		/* XARRAY: was a goto to before first line */
+		spin_lock_irqsave(shost->host_lock, flags);
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 }
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 6fdd4648f374..394230d69afd 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1480,7 +1480,6 @@ static void __scsi_remove_target(struct scsi_target *starget)
 	unsigned long flags, l_idx;
 
 	spin_lock_irqsave(shost->host_lock, flags);
- restart:
 	xa_for_each_marked(&starget->devices, l_idx, sdev,
 			   SCSI_XA_NON_SDEV_DEL) {
 		/*
@@ -1496,8 +1495,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
 		scsi_remove_device(sdev);
 		put_device(&sdev->sdev_gendev);
 		spin_lock_irqsave(shost->host_lock, flags);
-		/* XARRAY: is this goto start of loop correct ? */
-		goto restart;
+		/* XARRAY: was a goto to restart loop */
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 }
@@ -1516,7 +1514,6 @@ void scsi_remove_target(struct device *dev)
 	struct scsi_target *starget;
 	unsigned long flags, l_idx;
 
-restart:
 	spin_lock_irqsave(shost->host_lock, flags);
 	xa_for_each(&shost->__targets, l_idx, starget) {
 		if (starget->state == STARGET_DEL ||
@@ -1532,7 +1529,8 @@ void scsi_remove_target(struct device *dev)
 			spin_unlock_irqrestore(shost->host_lock, flags);
 			__scsi_remove_target(starget);
 			scsi_target_reap(starget);
-			goto restart;
+			/* XARRAY: was a goto to before first line */
+			spin_lock_irqsave(shost->host_lock, flags);
 		}
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 23bf686cbabc..e019621416d5 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -362,6 +362,8 @@ extern struct scsi_device *scsi_device_lookup_by_target(struct scsi_target *,
 							u64);
 extern struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *,
 							  u64);
+extern struct scsi_target *__scsi_target_lookup(struct Scsi_Host *shost,
+						uint chan, uint t_id);
 /* XARRAY: these visitor names too close to sstarg_for_each_device macros? */
 extern void starget_for_each_device(struct scsi_target *, void *,
 		     void (*fn)(struct scsi_device *, void *));
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 6/6] scsi: count number of targets
  2020-05-24 15:58 [RFC v2 0/6] scsi: rework mid-layer with xarrays Douglas Gilbert
                   ` (4 preceding siblings ...)
  2020-05-24 15:58 ` [RFC v2 5/6] scsi: add __scsi_target_lookup Douglas Gilbert
@ 2020-05-24 15:58 ` Douglas Gilbert
  5 siblings, 0 replies; 25+ messages in thread
From: Douglas Gilbert @ 2020-05-24 15:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

If less than 2 used to flatten iteration in __scsi_device_lookup().
The majority of hosts will have 1 target and 1 LU (device).

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/scsi.c      | 12 ++++--------
 drivers/scsi/scsi_scan.c |  5 ++++-
 include/scsi/scsi_host.h |  1 +
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 1b1fc8d4e5c2..6acd85ce21dd 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -557,10 +557,6 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
 	unsigned long flags, l_idx;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (xa_empty(&shost->__devices)) {
-		next = NULL;
-		goto unlock;
-	}
 	do {
 		if (!next) {	/* get first element iff first iteration */
 			l_idx = 0;
@@ -578,7 +574,6 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
 				break;
 		}
 	} while (next);
-unlock:
 	spin_unlock_irqrestore(shost->host_lock, flags);
 
 	if (prev)
@@ -818,8 +813,9 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
 
 	if (xa_empty(&shost->__devices))
 		return NULL;
-	if (xa_empty(&shost->__targets))
-		goto inconsistent;
+	if (shost->num_targets < 2)
+		goto flat_iteration;
+
 	xa_for_each(&shost->__targets, l_idx, starg) {
 		if (!(starg->id == id && starg->channel == channel))
 			continue;
@@ -830,7 +826,7 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
 		}
 	}
 	return NULL;
-inconsistent:
+flat_iteration:
 	xa_for_each_marked(&shost->__devices, m_idx, sdev,
 			   SCSI_XA_NON_SDEV_DEL) {
 		if (sdev->id == id && sdev->channel == channel &&
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index c055ee083ea9..d665da2b01a4 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -320,7 +320,9 @@ static void scsi_target_destroy(struct scsi_target *starget)
 		shost->hostt->target_destroy(starget);
 	/* XARRAY: was list_del_init(); why the _init ?  */
 	e_starg = xa_erase(&shost->__targets, starget->sh_idx);
-	if (e_starg != starget)
+	if (e_starg == starget)
+		--shost->num_targets;
+	else
 		pr_err("%s: bad xa_erase()\n", __func__);
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	put_device(dev);
@@ -465,6 +467,7 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
 		return NULL;
 	}
 	starget->sh_idx = u_idx;
+	++shost->num_targets;
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	/* allocate and add */
 	transport_setup_device(dev);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 0e94b1feb8e9..f49298a4c452 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -528,6 +528,7 @@ struct Scsi_Host {
 	 */
 	struct xarray		__devices;	/* array of scsi_debug objs */
 	struct xarray		__targets;	/* array of scsi_target objs */
+	int			num_targets;	/* modified under host_lock */
 
 	struct list_head	starved_list;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-24 15:58 ` [RFC v2 1/6] scsi: xarray hctl Douglas Gilbert
@ 2020-05-25  6:57   ` Hannes Reinecke
  2020-05-25 16:30     ` Douglas Gilbert
  0 siblings, 1 reply; 25+ messages in thread
From: Hannes Reinecke @ 2020-05-25  6:57 UTC (permalink / raw)
  To: Douglas Gilbert, linux-scsi; +Cc: martin.petersen, jejb, kbuild test robot

On 5/24/20 5:58 PM, Douglas Gilbert wrote:
> Replace three linked lists with xarrays:
>    - Scsi_Host::__devices  collection of scsi_device objects
>    - Scsi_Host::__targets  collection of scsi_target objects
>    - scsi_target::devices  collection of scsi_device objects that
>      belong to this target
> 
> The collection of Scsi_Host objects remains unaltered. It uses the
> idr/ida mechanism which already uses xarrays in its implementation.
> 
> Add a scsi_target::parent_shost pointer for direct access to a
> target's host since the oft called dev_to_shost() needs to loop through
> any intermediate (transport supplied) objects between a target and its
> parent host. Add a new, trivial access function: starg_to_shost() and
> thus allow calls to dev_to_shost(starg->dev.parent) to be replaced
> with starg_to_shost(starg). Use this faster replacement in mid-level
> source and in an inline function defined in scsi_transport.h .
> 
> Against the advice in scsi_devices.h and scsi_host.h this file:
> drivers/target/target_core_pscsi.c uses Scsi_Host::__devices directly
> and needs a minimal change to allow it to compile and work in the
> same fashion.
> 
> xarray technical info: all three xarrays are initialized with
> XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ flags. When XA_FLAGS_ALLOC is used
> a xarray allocates unique index numbers and uses one of the 3 available
> marks to manage that allocation. So there are two marks remaining and
> they are currently unused. The LOCK_IRQ flags means it calls
> spin_lock_irq() internally on xarray modifying operations. xarray
> non-modifying operations use the rcu lock.
> 
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> ---
>   drivers/scsi/hosts.c               |  8 +++--
>   drivers/scsi/scsi.c                | 43 +++++++++++++++--------
>   drivers/scsi/scsi_lib.c            |  8 ++---
>   drivers/scsi/scsi_scan.c           | 48 +++++++++++++++++--------
>   drivers/scsi/scsi_sysfs.c          | 41 +++++++++++++++-------
>   drivers/target/target_core_pscsi.c |  2 +-
>   include/scsi/scsi_device.h         | 56 +++++++++++++++++++++++++-----
>   include/scsi/scsi_host.h           | 12 ++++---
>   include/scsi/scsi_transport.h      |  3 +-
>   9 files changed, 158 insertions(+), 63 deletions(-)
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index 7ec91c3a66ca..2bba293a497d 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -345,6 +345,8 @@ static void scsi_host_dev_release(struct device *dev)
>   
>   	if (parent)
>   		put_device(parent);
> +	xa_destroy(&shost->__targets);
> +	xa_destroy(&shost->__devices);
>   	kfree(shost);
>   }
>   
> @@ -382,8 +384,8 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>   	shost->host_lock = &shost->default_lock;
>   	spin_lock_init(shost->host_lock);
>   	shost->shost_state = SHOST_CREATED;
> -	INIT_LIST_HEAD(&shost->__devices);
> -	INIT_LIST_HEAD(&shost->__targets);
> +	xa_init_flags(&shost->__devices, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
> +	xa_init_flags(&shost->__targets, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
>   	INIT_LIST_HEAD(&shost->eh_cmd_q);
>   	INIT_LIST_HEAD(&shost->starved_list);
>   	init_waitqueue_head(&shost->host_wait);
> @@ -502,6 +504,8 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>    fail_index_remove:
>   	ida_simple_remove(&host_index_ida, shost->host_no);
>    fail_kfree:
> +	xa_destroy(&shost->__targets);
> +	xa_destroy(&shost->__devices);
>   	kfree(shost);
>   	return NULL;
>   }
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 56c24a73e0c7..61aa68083f67 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -554,19 +554,32 @@ EXPORT_SYMBOL(scsi_device_put);
>   struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
>   					   struct scsi_device *prev)
>   {
> -	struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
> -	struct scsi_device *next = NULL;
> -	unsigned long flags;
> +	struct scsi_device *next = prev;
> +	unsigned long flags, l_idx;
>   
>   	spin_lock_irqsave(shost->host_lock, flags);
> -	while (list->next != &shost->__devices) {
> -		next = list_entry(list->next, struct scsi_device, siblings);
> -		/* skip devices that we can't get a reference to */
> -		if (!scsi_device_get(next))
> -			break;
> +	if (xa_empty(&shost->__devices)) {
>   		next = NULL;
> -		list = list->next;
> +		goto unlock;
>   	}
> +	do {
> +		if (!next) {	/* get first element iff first iteration */
> +			l_idx = 0;
> +			next = xa_find(&shost->__devices, &l_idx,
> +				       scsi_xa_limit_16b.max, XA_PRESENT);
> +		} else {
> +			l_idx = next->sh_idx,
> +			next = xa_find_after(&shost->__devices, &l_idx,
> +					     scsi_xa_limit_16b.max,
> +					     XA_PRESENT);
> +		}
> +		if (next) {
> +			/* skip devices that we can't get a reference to */
> +			if (!scsi_device_get(next))
> +				break;
> +		}
> +	} while (next);
> +unlock:
>   	spin_unlock_irqrestore(shost->host_lock, flags);
>   
>   	if (prev)
This one is patently ugly.

Why not kill __scsi_iterate_devices() and use xa_for_each() on each call 
site?

> @@ -588,7 +601,7 @@ EXPORT_SYMBOL(__scsi_iterate_devices);
>   void starget_for_each_device(struct scsi_target *starget, void *data,
>   		     void (*fn)(struct scsi_device *, void *))
>   {
> -	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> +	struct Scsi_Host *shost = starg_to_shost(starget);
>   	struct scsi_device *sdev;
>   
>   	shost_for_each_device(sdev, shost) {

See comment above.

> @@ -616,7 +629,7 @@ EXPORT_SYMBOL(starget_for_each_device);
>   void __starget_for_each_device(struct scsi_target *starget, void *data,
>   			       void (*fn)(struct scsi_device *, void *))
>   {
> -	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> +	struct Scsi_Host *shost = starg_to_shost(starget);
>   	struct scsi_device *sdev;
>   
>   	__shost_for_each_device(sdev, shost) {

Same here.

> @@ -645,9 +658,10 @@ EXPORT_SYMBOL(__starget_for_each_device);
>   struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *starget,
>   						   u64 lun)
>   {
> +	unsigned long l_idx;
>   	struct scsi_device *sdev;
>   
> -	list_for_each_entry(sdev, &starget->devices, same_target_siblings) {
> +	xa_for_each(&starget->devices, l_idx, sdev) {
>   		if (sdev->sdev_state == SDEV_DEL)
>   			continue;
>   		if (sdev->lun ==lun)
> @@ -671,7 +685,7 @@ struct scsi_device *scsi_device_lookup_by_target(struct scsi_target *starget,
>   						 u64 lun)
>   {
>   	struct scsi_device *sdev;
> -	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> +	struct Scsi_Host *shost = starg_to_shost(starget);
>   	unsigned long flags;
>   
>   	spin_lock_irqsave(shost->host_lock, flags);
> @@ -703,9 +717,10 @@ EXPORT_SYMBOL(scsi_device_lookup_by_target);
>   struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
>   		uint channel, uint id, u64 lun)
>   {
> +	unsigned long l_idx;
>   	struct scsi_device *sdev;
>   
> -	list_for_each_entry(sdev, &shost->__devices, siblings) {
> +	xa_for_each(&shost->__devices, l_idx, sdev) {
>   		if (sdev->sdev_state == SDEV_DEL)
>   			continue;
>   		if (sdev->channel == channel && sdev->id == id &&
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 47835c4b4ee0..68df68f54fc8 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -372,9 +372,9 @@ static void scsi_kick_queue(struct request_queue *q)
>   static void scsi_single_lun_run(struct scsi_device *current_sdev)
>   {
>   	struct Scsi_Host *shost = current_sdev->host;
> -	struct scsi_device *sdev, *tmp;
> +	struct scsi_device *sdev;
>   	struct scsi_target *starget = scsi_target(current_sdev);
> -	unsigned long flags;
> +	unsigned long flags, l_idx;
>   
>   	spin_lock_irqsave(shost->host_lock, flags);
>   	starget->starget_sdev_user = NULL;
> @@ -391,8 +391,8 @@ static void scsi_single_lun_run(struct scsi_device *current_sdev)
>   	spin_lock_irqsave(shost->host_lock, flags);
>   	if (starget->starget_sdev_user)
>   		goto out;
> -	list_for_each_entry_safe(sdev, tmp, &starget->devices,
> -			same_target_siblings) {
> +	/* XARRAY: was list_for_each_entry_safe(); is this safe ? */
> +	xa_for_each(&starget->devices, l_idx, sdev) {
>   		if (sdev == current_sdev)
>   			continue;
>   		if (scsi_device_get(sdev))

Frankly, I don't know why we're using the _safe variant here.
Typically the _safe variant is used to protect against removal
(ie if a list element is removed while iterating over the list),
but I can't really imagine why starting I/O on the device would
cause one entry to be deleted...
James?

> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index f2437a7570ce..b8f5850c5a04 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -217,7 +217,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
>   {
>   	struct scsi_device *sdev;
>   	int display_failure_msg = 1, ret;
> -	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> +	struct Scsi_Host *shost = starg_to_shost(starget);
>   
>   	sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size,
>   		       GFP_KERNEL);

This is actually a cleanup in itself :-)

> @@ -234,8 +234,9 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
>   	sdev->channel = starget->channel;
>   	mutex_init(&sdev->state_mutex);
>   	sdev->sdev_state = SDEV_CREATED;
> -	INIT_LIST_HEAD(&sdev->siblings);
> -	INIT_LIST_HEAD(&sdev->same_target_siblings);
> +	/* XARRAY: INIT_LIST_HEAD()s here on list leaves, why ? */
> +	sdev->sh_idx = -1;
> +	sdev->starg_idx = -1;		/* for debugging */
>   	INIT_LIST_HEAD(&sdev->starved_entry);
>   	INIT_LIST_HEAD(&sdev->event_list);
>   	spin_lock_init(&sdev->list_lock);

It is good practice to call INIT_LIST_HEAD() on list leaves as then you 
can call 'list_empty()' to figure out if they are part of a list or not.

But I'm wondering about the 'sh_idx' and the 'starg_idx' here.
We already have perfectly good indices with the 'lun' and 'id' entries 
in struct scsi_device, which (incidentally) _need_ to be unique.

So why not use them?

> @@ -306,8 +307,9 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
>   
>   static void scsi_target_destroy(struct scsi_target *starget)
>   {
> +	struct scsi_target *e_starg;
>   	struct device *dev = &starget->dev;
> -	struct Scsi_Host *shost = dev_to_shost(dev->parent);
> +	struct Scsi_Host *shost = starg_to_shost(starget);
>   	unsigned long flags;
>   
>   	BUG_ON(starget->state == STARGET_DEL);
> @@ -316,7 +318,10 @@ static void scsi_target_destroy(struct scsi_target *starget)
>   	spin_lock_irqsave(shost->host_lock, flags);
>   	if (shost->hostt->target_destroy)
>   		shost->hostt->target_destroy(starget);
> -	list_del_init(&starget->siblings);
> +	/* XARRAY: was list_del_init(); why the _init ?  */
> +	e_starg = xa_erase(&shost->__targets, starget->sh_idx);
> +	if (e_starg != starget)
> +		pr_err("%s: bad xa_erase()\n", __func__);
>   	spin_unlock_irqrestore(shost->host_lock, flags);
>   	put_device(dev);
>   }
See above for my comment about INIT_LIST_HEAD().
Calling list_del_init() will reset the 'siblings' entry such that 
list_empty() returns true.

> @@ -326,6 +331,7 @@ static void scsi_target_dev_release(struct device *dev)
>   	struct device *parent = dev->parent;
>   	struct scsi_target *starget = to_scsi_target(dev);
>   
> +	xa_destroy(&starget->devices);
>   	kfree(starget);
>   	put_device(parent);
>   }
> @@ -344,12 +350,13 @@ EXPORT_SYMBOL(scsi_is_target_device);
>   static struct scsi_target *__scsi_find_target(struct device *parent,
>   					      int channel, uint id)
>   {
> +	unsigned long l_idx;
>   	struct scsi_target *starget, *found_starget = NULL;
>   	struct Scsi_Host *shost = dev_to_shost(parent);
>   	/*
> -	 * Search for an existing target for this sdev.
> +	 * Search for an existing target for this host.
>   	 */
> -	list_for_each_entry(starget, &shost->__targets, siblings) {
> +	xa_for_each(&shost->__targets, l_idx, starget) {
>   		if (starget->id == id &&
>   		    starget->channel == channel) {
>   			found_starget = starget; > @@ -412,6 +419,8 @@ static struct scsi_target 
*scsi_alloc_target(struct device *parent,
>   	struct Scsi_Host *shost = dev_to_shost(parent);
>   	struct device *dev = NULL;
>   	unsigned long flags;
> +	unsigned int u_idx;
> +	int res;
>   	const int size = sizeof(struct scsi_target)
>   		+ shost->transportt->target_size;
>   	struct scsi_target *starget;
> @@ -427,14 +436,15 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
>   	device_initialize(dev);
>   	kref_init(&starget->reap_ref);
>   	dev->parent = get_device(parent);
> +	starget->parent_shost = shost;
>   	dev_set_name(dev, "target%d:%d:%d", shost->host_no, channel, id);
>   	dev->bus = &scsi_bus_type;
>   	dev->type = &scsi_target_type;
>   	starget->id = id;
>   	starget->channel = channel;
>   	starget->can_queue = 0;
> -	INIT_LIST_HEAD(&starget->siblings);
> -	INIT_LIST_HEAD(&starget->devices);
> +	xa_init_flags(&starget->devices, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
> +	starget->sh_idx = -1;		/* debugging */
>   	starget->state = STARGET_CREATED;
>   	starget->scsi_level = SCSI_2;
>   	starget->max_target_blocked = SCSI_DEFAULT_TARGET_BLOCKED;

Why do you need the parent_shost pointer?
That's precisely the point of the starget_to_shost() accessor; the shost 
device must be part of the parent chain, so we just need to follow it.
And it's not really performance-critical, either...

> @@ -445,7 +455,15 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
>   	if (found_target)
>   		goto found;
>   
> -	list_add_tail(&starget->siblings, &shost->__targets);
> +	/* XARRAY: was list_add_tail(); may get hole in xarray or tail */
> +	res = xa_alloc(&shost->__targets, &u_idx, starget, scsi_xa_limit_16b,
> +		       GFP_ATOMIC);
> +	if (res < 0) {
> +		spin_unlock_irqrestore(shost->host_lock, flags);
> +		pr_err("%s: xa_alloc failure errno=%d\n", __func__, -res);
> +		return NULL;
> +	}
> +	starget->sh_idx = u_idx;
>   	spin_unlock_irqrestore(shost->host_lock, flags);
>   	/* allocate and add */
>   	transport_setup_device(dev);

'scsi_xa_limit_16b' ?
What's that for?
And, again, if we were to use the target id as an index we could be 
using 'xa_store' and the problem would be solved...

> @@ -1049,7 +1067,7 @@ static int scsi_probe_and_add_lun(struct scsi_target *starget,
>   	unsigned char *result;
>   	blist_flags_t bflags;
>   	int res = SCSI_SCAN_NO_RESPONSE, result_len = 256;
> -	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> +	struct Scsi_Host *shost = starg_to_shost(starget);
>   
>   	/*
>   	 * The rescan flag is used as an optimization, the first scan of a
> @@ -1199,7 +1217,7 @@ static void scsi_sequential_lun_scan(struct scsi_target *starget,
>   {
>   	uint max_dev_lun;
>   	u64 sparse_lun, lun;
> -	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> +	struct Scsi_Host *shost = starg_to_shost(starget);
>   
>   	SCSI_LOG_SCAN_BUS(3, starget_printk(KERN_INFO, starget,
>   		"scsi scan: Sequential scan\n"));
> @@ -1297,7 +1315,7 @@ static int scsi_report_lun_scan(struct scsi_target *starget, blist_flags_t bflag
>   	struct scsi_lun *lunp, *lun_data;
>   	struct scsi_sense_hdr sshdr;
>   	struct scsi_device *sdev;
> -	struct Scsi_Host *shost = dev_to_shost(&starget->dev);
> +	struct Scsi_Host *shost = starg_to_shost(starget);
>   	int ret = 0;
>   
>   	/*
> @@ -1860,11 +1878,11 @@ EXPORT_SYMBOL(scsi_scan_host);
>   void scsi_forget_host(struct Scsi_Host *shost)
>   {
>   	struct scsi_device *sdev;
> -	unsigned long flags;
> +	unsigned long flags, l_idx;
>   
>    restart:
>   	spin_lock_irqsave(shost->host_lock, flags);
> -	list_for_each_entry(sdev, &shost->__devices, siblings) {
> +	xa_for_each(&shost->__devices, l_idx, sdev) {
>   		if (sdev->sdev_state == SDEV_DEL)
>   			continue;
>   		spin_unlock_irqrestore(shost->host_lock, flags);
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 163dbcb741c1..4bfcf33139a2 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -433,7 +433,7 @@ static void scsi_device_cls_release(struct device *class_dev)
>   
>   static void scsi_device_dev_release_usercontext(struct work_struct *work)
>   {
> -	struct scsi_device *sdev;
> +	struct scsi_device *sdev, *e_sdev;
>   	struct device *parent;
>   	struct list_head *this, *tmp;
>   	struct scsi_vpd *vpd_pg80 = NULL, *vpd_pg83 = NULL;
> @@ -447,8 +447,12 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
>   	parent = sdev->sdev_gendev.parent;
>   
>   	spin_lock_irqsave(sdev->host->host_lock, flags);
> -	list_del(&sdev->siblings);
> -	list_del(&sdev->same_target_siblings);
> +	e_sdev = xa_erase(&sdev->host->__devices, sdev->sh_idx);
> +	if (e_sdev != sdev)
> +		pr_err("%s: bad xa_erase(host:__devices)\n", __func__);
> +	e_sdev = xa_erase(&sdev->sdev_target->devices, sdev->starg_idx);
> +	if (e_sdev != sdev)
> +		pr_err("%s: bad xa_erase(sdev_target->devices)\n", __func__);
>   	list_del(&sdev->starved_entry);
>   	spin_unlock_irqrestore(sdev->host->host_lock, flags);
>   

Oh yeah, these double lists. I really wonder if we shouldn't do away 
with one of them (presumably the starget device list), and iterate over 
the shost device list with the starget filter instead.

> @@ -1471,22 +1475,19 @@ EXPORT_SYMBOL(scsi_remove_device);
>   
>   static void __scsi_remove_target(struct scsi_target *starget)
>   {
> -	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> -	unsigned long flags;
> +	struct Scsi_Host *shost = starg_to_shost(starget);
>   	struct scsi_device *sdev;
> +	unsigned long flags, l_idx;
>   
>   	spin_lock_irqsave(shost->host_lock, flags);
>    restart:
> -	list_for_each_entry(sdev, &shost->__devices, siblings) {
> +	xa_for_each(&starget->devices, l_idx, sdev) {
>   		/*
>   		 * We cannot call scsi_device_get() here, as
>   		 * we might've been called from rmmod() causing
>   		 * scsi_device_get() to fail the module_is_live()
>   		 * check.
>   		 */
> -		if (sdev->channel != starget->channel ||
> -		    sdev->id != starget->id)
> -			continue;
>   		if (sdev->sdev_state == SDEV_DEL ||
>   		    sdev->sdev_state == SDEV_CANCEL ||
>   		    !get_device(&sdev->sdev_gendev))
> @@ -1495,6 +1496,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
>   		scsi_remove_device(sdev);
>   		put_device(&sdev->sdev_gendev);
>   		spin_lock_irqsave(shost->host_lock, flags);
> +		/* XARRAY: is this goto start of loop correct ? */
>   		goto restart;
>   	}
>   	spin_unlock_irqrestore(shost->host_lock, flags);
This is a very convoluted way of _not_ using list_for_each_safe() ;-)
(James will probably telling me off as I'm missing the intricacies here)
But again, the prime reason for the goto here is that we're removing an 
element from the list, and need to ensure list consistency as we want to 
iterate across all devices and remove each of them.

So no need for that complicated loop; xa_for_each() and xa_erase() on 
each element should be sufficient.

> @@ -1512,11 +1514,11 @@ void scsi_remove_target(struct device *dev)
>   {
>   	struct Scsi_Host *shost = dev_to_shost(dev->parent);
>   	struct scsi_target *starget;
> -	unsigned long flags;
> +	unsigned long flags, l_idx;
>   
>   restart:
>   	spin_lock_irqsave(shost->host_lock, flags);
> -	list_for_each_entry(starget, &shost->__targets, siblings) {
> +	xa_for_each(&shost->__targets, l_idx, starget) {
>   		if (starget->state == STARGET_DEL ||
>   		    starget->state == STARGET_REMOVE ||
>   		    starget->state == STARGET_CREATED_REMOVE)
> @@ -1584,6 +1586,8 @@ static struct device_type scsi_dev_type = {
>   
>   void scsi_sysfs_device_initialize(struct scsi_device *sdev)
>   {
> +	int res;
> +	unsigned int u_idx;
>   	unsigned long flags;
>   	struct Scsi_Host *shost = sdev->host;
>   	struct scsi_target  *starget = sdev->sdev_target;
> @@ -1614,8 +1618,19 @@ void scsi_sysfs_device_initialize(struct scsi_device *sdev)
>   
>   	transport_setup_device(&sdev->sdev_gendev);
>   	spin_lock_irqsave(shost->host_lock, flags);
> -	list_add_tail(&sdev->same_target_siblings, &starget->devices);
> -	list_add_tail(&sdev->siblings, &shost->__devices);
> +	/* XARRAY: might add in hole */
> +	res = xa_alloc(&starget->devices, &u_idx, sdev, scsi_xa_limit_16b,
> +		       GFP_ATOMIC);
> +	if (res < 0)
> +		pr_err("%s: xa_alloc 1 failure errno=%d\n", __func__, -res);
> +	else
> +		sdev->starg_idx = u_idx;
> +	res = xa_alloc(&shost->__devices, &u_idx, sdev, scsi_xa_limit_16b,
> +		       GFP_ATOMIC);
> +	if (res < 0)
> +		pr_err("%s: xa_alloc 2 failure errno=%d\n", __func__, -res);
> +	else
> +		sdev->sh_idx = u_idx;
>   	spin_unlock_irqrestore(shost->host_lock, flags);
>   	/*
>   	 * device can now only be removed via __scsi_remove_device() so hold
Same here.
While LUNs are 64-bit, I've yet to come across a target leveraging the 
full 64 bit range. Plus for most arrays the number of LUNs per host is 
capped to 255 anyway.
Hence I guess we should be okay with using the LUN as an index into the 
xarray. If we ever get a real 64-LUN we can stuff them into the unused 
areas of the LUN range, eg by using the single-level LUN structure with 
LUNs above 256.

> diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> index c9d92b3e777d..8547fc9ec344 100644
> --- a/drivers/target/target_core_pscsi.c
> +++ b/drivers/target/target_core_pscsi.c
> @@ -496,7 +496,7 @@ static int pscsi_configure_device(struct se_device *dev)
>   	}
>   
>   	spin_lock_irq(sh->host_lock);
> -	list_for_each_entry(sd, &sh->__devices, siblings) {
> +	__shost_for_each_device(sd, sh) {
>   		if ((pdv->pdv_channel_id != sd->channel) ||
>   		    (pdv->pdv_target_id != sd->id) ||
>   		    (pdv->pdv_lun_id != sd->lun))
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index c3cba2aaf934..dc9de4cc0e79 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -7,7 +7,9 @@
>   #include <linux/workqueue.h>
>   #include <linux/blkdev.h>
>   #include <scsi/scsi.h>
> +#include <scsi/scsi_host.h>
>   #include <linux/atomic.h>
> +#include <linux/xarray.h>
>   
>   struct device;
>   struct request_queue;
> @@ -103,8 +105,8 @@ struct scsi_device {
>   	struct request_queue *request_queue;
>   
>   	/* the next two are protected by the host->host_lock */
> -	struct list_head    siblings;   /* list of all devices on this host */
> -	struct list_head    same_target_siblings; /* just the devices sharing same target id */
> +	int sh_idx;		/* index within host->__devices array */
> +	int starg_idx;		/* index within sdev_target->devices array */
>   
>   	atomic_t device_busy;		/* commands actually active on LLDD */
>   	atomic_t device_blocked;	/* Device returned QUEUE_FULL. */
> @@ -281,16 +283,18 @@ enum scsi_target_state {
>    * scsi_target: representation of a scsi target, for now, this is only
>    * used for single_lun devices. If no one has active IO to the target,
>    * starget_sdev_user is NULL, else it points to the active sdev.
> + * Invariant: starg->parent_shost == dev_to_shost(starg->dev.parent)
>    */
>   struct scsi_target {
>   	struct scsi_device	*starget_sdev_user;
> -	struct list_head	siblings;
> -	struct list_head	devices;
> -	struct device		dev;
> +	struct Scsi_Host	*parent_shost;
> +	int			sh_idx;	/* index within Scsi_Host::__targets */
>   	struct kref		reap_ref; /* last put renders target invisible */
>   	unsigned int		channel;
>   	unsigned int		id; /* target id ... replace
>   				     * scsi_device.id eventually */
> +	struct xarray devices;	/* scsi_device objects owned by this target */
> +	struct device		dev;
>   	unsigned int		create:1; /* signal that it needs to be added */
>   	unsigned int		single_lun:1;	/* Indicates we should only
>   						 * allow I/O to one of the luns
> @@ -329,6 +333,11 @@ static inline struct scsi_target *scsi_target(struct scsi_device *sdev)
>   #define transport_class_to_starget(class_dev) \
>   	to_scsi_target(class_dev->parent)
>   
> +static inline struct Scsi_Host *starg_to_shost(struct scsi_target *starg)
> +{
> +	return starg->parent_shost;
> +}
> +
>   #define starget_printk(prefix, starget, fmt, a...)	\
>   	dev_printk(prefix, &(starget)->dev, fmt, ##a)
>   
> @@ -365,7 +374,7 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
>   /**
>    * shost_for_each_device - iterate over all devices of a host
>    * @sdev: the &struct scsi_device to use as a cursor
> - * @shost: the &struct scsi_host to iterate over
> + * @shost: the &struct Scsi_Host to iterate over
>    *
>    * Iterator that returns each device attached to @shost.  This loop
>    * takes a reference on each device and releases it at the end.  If
> @@ -376,21 +385,50 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
>   	     (sdev); \
>   	     (sdev) = __scsi_iterate_devices((shost), (sdev)))
>   
> +/* helper for __shost_for_each_device */
> +static inline struct scsi_device *__scsi_h2d_1st_it(struct Scsi_Host *shost)
> +{
> +	unsigned long l_idx = 0;
> +
> +	return xa_find(&shost->__devices, &l_idx, scsi_xa_limit_16b.max,
> +		       XA_PRESENT);
> +}
> +
> +/* helper for __shost_for_each_device */
> +static inline struct scsi_device *__scsi_h2d_next_it(struct Scsi_Host *shost,
> +						     struct scsi_device *prev)
> +{
> +	unsigned long l_idx;
> +
> +	if (prev) {
> +		l_idx = prev->sh_idx,
> +		prev = xa_find_after(&shost->__devices, &l_idx,
> +				     scsi_xa_limit_16b.max, XA_PRESENT);
> +	}
> +	return prev;	/* now either _next_ or NULL */
> +}
> +
>   /**
>    * __shost_for_each_device - iterate over all devices of a host (UNLOCKED)
>    * @sdev: the &struct scsi_device to use as a cursor
> - * @shost: the &struct scsi_host to iterate over
> + * @shost: the &struct Scsi_Host to iterate over
>    *
>    * Iterator that returns each device attached to @shost.  It does _not_
>    * take a reference on the scsi_device, so the whole loop must be
> - * protected by shost->host_lock.
> + * protected by shost->host_lock (see Note 2).
>    *
>    * Note: The only reason to use this is because you need to access the
>    * device list in interrupt context.  Otherwise you really want to use
>    * shost_for_each_device instead.
> + *
> + * Note 2: The iteration won't fail but dereferencing sdev might. To access
> + * the xarray features (e.g. marks) associated with sdev safely use:
> + * xa_for_each(&shost->__devices, l_idx, next) directly then use l_idx.
>    */
>   #define __shost_for_each_device(sdev, shost) \
> -	list_for_each_entry((sdev), &((shost)->__devices), siblings)
> +	for ((sdev) = __scsi_h2d_1st_it((shost)); \
> +	     (sdev); \
> +	     (sdev) = __scsi_h2d_next_it((shost), (sdev)))
>   
>   extern int scsi_change_queue_depth(struct scsi_device *, int);
>   extern int scsi_track_queue_full(struct scsi_device *, int);
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 822e8cda8d9b..e6386f3d6de1 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -9,6 +9,7 @@
>   #include <linux/mutex.h>
>   #include <linux/seq_file.h>
>   #include <linux/blk-mq.h>
> +#include <linux/xarray.h>
>   #include <scsi/scsi.h>
>   
>   struct block_device;
> @@ -29,6 +30,9 @@ struct scsi_transport_template;
>   #define MODE_INITIATOR 0x01
>   #define MODE_TARGET 0x02
>   
> +/* XARRAY: this limits number of devices (LUs) in host to 64K */
> +#define scsi_xa_limit_16b    XA_LIMIT(0, USHRT_MAX)
> +
>   struct scsi_host_template {
>   	struct module *module;
>   	const char *name;
> @@ -233,7 +237,7 @@ struct scsi_host_template {
>   	 * If a host has the ability to discover targets on its own instead
>   	 * of scanning the entire bus, it can fill in this function and
>   	 * call scsi_scan_host().  This function will be called periodically
> -	 * until it returns 1 with the scsi_host and the elapsed time of
> +	 * until it returns 1 with the Scsi_Host and the elapsed time of
>   	 * the scan in jiffies.
>   	 *
>   	 * Status: OPTIONAL
> @@ -520,9 +524,9 @@ struct Scsi_Host {
>   	 * their __ prefixed variants with the lock held. NEVER
>   	 * access this list directly from a driver.
>   	 */
> -	struct list_head	__devices;
> -	struct list_head	__targets;
> -	
> +	struct xarray		__devices;	/* array of scsi_debug objs */
> +	struct xarray		__targets;	/* array of scsi_target objs */
> +
>   	struct list_head	starved_list;
>   
>   	spinlock_t		default_lock;
> diff --git a/include/scsi/scsi_transport.h b/include/scsi/scsi_transport.h
> index a0458bda3148..3756b264809a 100644
> --- a/include/scsi/scsi_transport.h
> +++ b/include/scsi/scsi_transport.h
> @@ -70,7 +70,8 @@ scsi_transport_reserve_device(struct scsi_transport_template * t, int space)
>   static inline void *
>   scsi_transport_target_data(struct scsi_target *starget)
>   {
> -	struct Scsi_Host *shost = dev_to_shost(&starget->dev);
> +	struct Scsi_Host *shost = starg_to_shost(starget);
> +
>   	return (u8 *)starget->starget_data
>   		+ shost->transportt->target_private_offset;
>   
> 
Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 3/6] scsi: xarray mark sdev_del state
  2020-05-24 15:58 ` [RFC v2 3/6] scsi: xarray mark sdev_del state Douglas Gilbert
@ 2020-05-25  7:00   ` Hannes Reinecke
  2020-05-25 16:47     ` Douglas Gilbert
  0 siblings, 1 reply; 25+ messages in thread
From: Hannes Reinecke @ 2020-05-25  7:00 UTC (permalink / raw)
  To: Douglas Gilbert, linux-scsi; +Cc: martin.petersen, jejb

On 5/24/20 5:58 PM, Douglas Gilbert wrote:
> One of the features of an xarray is the ability to mark the pointers
> that it holds. Sadly there are only two marks available per xarray so
> it is not possible to map marks to the sdev_state enumeration which
> holds 9 states. Since several loops skip sdev_s in SDEV_DEL state
> then its negation was chosen as one mark: SCSI_XA_NON_SDEV_DEL.
> 
> To support this mark a new iterator macro:
>    shost_for_each_non_del_device(sdev, shost)
> has been introduced plus a helper function:
>     __scsi_iterate_non_del_devices(shost, sdev).
> The new iterator macro is used once (in scsi_scan.c) and the new
> mark is used directly in three other loops. The machinery to support
> the mark is placed in a helper (scsi_adjust_non_sdev_del_mark())
> called mainly by scsi_device_set_state(). Other locations that set
> sdev_state were checked and some additional calls to that helper
> were added.
> 
> Following a complaint from 'make W=1' the formal name of a parameter
> to both starget_for_each_device() and __starget_for_each_device()
> was changed from starg to starget.
> 
> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> ---
>   drivers/scsi/scsi.c        | 49 +++++++++++++++++++++++++++++++++-----
>   drivers/scsi/scsi_lib.c    | 34 ++++++++++++++++++++++++--
>   drivers/scsi/scsi_scan.c   | 14 +++++------
>   drivers/scsi/scsi_sysfs.c  | 19 +++++++++------
>   include/scsi/scsi_device.h | 19 +++++++++++++++
>   include/scsi/scsi_host.h   |  2 ++
>   6 files changed, 114 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index aa3240f7aed9..0fb650aebcfb 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -588,6 +588,45 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
>   }
>   EXPORT_SYMBOL(__scsi_iterate_devices);
>   
> +/* helper for shost_for_each_non_del_device, see that for documentation */
> +struct scsi_device *__scsi_iterate_non_del_devices(struct Scsi_Host *shost,
> +						   struct scsi_device *prev)
> +{
> +	struct scsi_device *next = prev;
> +	unsigned long flags, l_idx;
> +
> +	spin_lock_irqsave(shost->host_lock, flags);
> +	if (xa_empty(&shost->__devices)) {
> +		next = NULL;
> +		goto unlock;
> +	}
> +	do {
> +		if (!next) {	/* get first element iff first iteration */
> +			l_idx = 0;
> +			next = xa_find(&shost->__devices, &l_idx,
> +				       scsi_xa_limit_16b.max,
> +				       SCSI_XA_NON_SDEV_DEL);
> +		} else {
> +			l_idx = next->sh_idx,
> +			next = xa_find_after(&shost->__devices, &l_idx,
> +					     scsi_xa_limit_16b.max,
> +					     SCSI_XA_NON_SDEV_DEL);
> +		}
> +		if (next) {
> +			/* skip devices that we can't get a reference to */
> +			if (!scsi_device_get(next))
> +				break;
> +		}
> +	} while (next);
> +unlock:
> +	spin_unlock_irqrestore(shost->host_lock, flags);
> +
> +	if (prev)
> +		scsi_device_put(prev);
> +	return next;
> +}
> +EXPORT_SYMBOL(__scsi_iterate_non_del_devices);
> +
>   /* helper for starg_for_each_device, see that for documentation */
>   struct scsi_device *__scsi_target_iterate_devices(struct scsi_target *starg,
>   						  struct scsi_device *prev)
> @@ -695,9 +734,8 @@ struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *starget,
>   	unsigned long l_idx;
>   	struct scsi_device *sdev;
>   
> -	xa_for_each(&starget->devices, l_idx, sdev) {
> -		if (sdev->sdev_state == SDEV_DEL)
> -			continue;
> +	xa_for_each_marked(&starget->devices, l_idx, sdev,
> +			   SCSI_XA_NON_SDEV_DEL) {
>   		if (sdev->lun ==lun)
>   			return sdev;
>   	}
> @@ -754,9 +792,8 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
>   	unsigned long l_idx;
>   	struct scsi_device *sdev;
>   
> -	xa_for_each(&shost->__devices, l_idx, sdev) {
> -		if (sdev->sdev_state == SDEV_DEL)
> -			continue;
> +	xa_for_each_marked(&shost->__devices, l_idx, sdev,
> +			   SCSI_XA_NON_SDEV_DEL) {
>   		if (sdev->channel == channel && sdev->id == id &&
>   				sdev->lun ==lun)
>   			return sdev;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 68df68f54fc8..f484bf2b5d7d 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2217,6 +2217,27 @@ scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
>   }
>   EXPORT_SYMBOL(scsi_test_unit_ready);
>   
> +/* Assume host_lock taken and that xahp->xa_lock is the same lock. */
> +static inline void
> +scsi_adjust_non_sdev_del_mark(struct scsi_device *sdev,
> +			      enum scsi_device_state old_state,
> +			      enum scsi_device_state new_state)
> +{
> +	struct xarray *xatp =
> +			&to_scsi_target(sdev->sdev_gendev.parent)->devices;
> +	struct xarray *xahp = &sdev->host->__devices;
> +
> +	if (old_state == new_state)
> +		return;
> +	if (old_state == SDEV_DEL) {
> +		xa_set_mark(xatp, sdev->starg_idx, SCSI_XA_NON_SDEV_DEL);
> +		xa_set_mark(xahp, sdev->sh_idx, SCSI_XA_NON_SDEV_DEL);
> +	} else if (new_state == SDEV_DEL) {
> +		xa_clear_mark(xatp, sdev->starg_idx, SCSI_XA_NON_SDEV_DEL);
> +		xa_clear_mark(xahp, sdev->sh_idx, SCSI_XA_NON_SDEV_DEL);
> +	}
> +}
> +
>   /**
>    *	scsi_device_set_state - Take the given device through the device state model.
>    *	@sdev:	scsi device to change the state of.
> @@ -2330,6 +2351,8 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>   
>   	}
>   	sdev->offline_already = false;
> +	if (unlikely(oldstate == SDEV_DEL || state == SDEV_DEL))
> +		scsi_adjust_non_sdev_del_mark(sdev, oldstate, state);
>   	sdev->sdev_state = state;
>   	return 0;
>   
> @@ -2731,14 +2754,21 @@ int scsi_internal_device_unblock_nowait(struct scsi_device *sdev,
>   	switch (sdev->sdev_state) {
>   	case SDEV_BLOCK:
>   	case SDEV_TRANSPORT_OFFLINE:
> +		if (unlikely(new_state == SDEV_DEL))
> +			scsi_adjust_non_sdev_del_mark(sdev, sdev->sdev_state,
> +						      SDEV_DEL);
>   		sdev->sdev_state = new_state;
>   		break;
>   	case SDEV_CREATED_BLOCK:
>   		if (new_state == SDEV_TRANSPORT_OFFLINE ||
> -		    new_state == SDEV_OFFLINE)
> +		    new_state == SDEV_OFFLINE) {
>   			sdev->sdev_state = new_state;
> -		else
> +		} else {
> +			if (unlikely(new_state == SDEV_DEL))
> +				scsi_adjust_non_sdev_del_mark
> +					(sdev, sdev->sdev_state, SDEV_DEL);
>   			sdev->sdev_state = SDEV_CREATED;
> +		}
>   		break;
>   	case SDEV_CANCEL:
>   	case SDEV_OFFLINE:
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 72a0064a879a..f5a5e48ca5ae 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -280,7 +280,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
>   	scsi_change_queue_depth(sdev, sdev->host->cmd_per_lun ?
>   					sdev->host->cmd_per_lun : 1);
>   
> -	scsi_sysfs_device_initialize(sdev);
> +	scsi_sysfs_device_initialize(sdev);	/* marked as non_sdev_del */
>   
>   	if (shost->hostt->slave_alloc) {
>   		ret = shost->hostt->slave_alloc(sdev);
> @@ -1708,10 +1708,9 @@ int scsi_scan_host_selected(struct Scsi_Host *shost, unsigned int channel,
>   static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
>   {
>   	struct scsi_device *sdev;
> -	shost_for_each_device(sdev, shost) {
> -		/* target removed before the device could be added */
> -		if (sdev->sdev_state == SDEV_DEL)
> -			continue;
> +
> +	/* target may be removed before devices could be added */
> +	shost_for_each_non_del_device(sdev, shost) {
>   		/* If device is already visible, skip adding it to sysfs */
>   		if (sdev->is_visible)
>   			continue;
> @@ -1883,9 +1882,8 @@ void scsi_forget_host(struct Scsi_Host *shost)
>   
>    restart:
>   	spin_lock_irqsave(shost->host_lock, flags);
> -	xa_for_each(&shost->__devices, l_idx, sdev) {
> -		if (sdev->sdev_state == SDEV_DEL)
> -			continue;
> +	xa_for_each_marked(&shost->__devices, l_idx, sdev,
> +			   SCSI_XA_NON_SDEV_DEL) {
>   		spin_unlock_irqrestore(shost->host_lock, flags);
>   		__scsi_remove_device(sdev);
>   		goto restart;
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 4bfcf33139a2..6fdd4648f374 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1481,15 +1481,15 @@ static void __scsi_remove_target(struct scsi_target *starget)
>   
>   	spin_lock_irqsave(shost->host_lock, flags);
>    restart:
> -	xa_for_each(&starget->devices, l_idx, sdev) {
> +	xa_for_each_marked(&starget->devices, l_idx, sdev,
> +			   SCSI_XA_NON_SDEV_DEL) {
>   		/*
>   		 * We cannot call scsi_device_get() here, as
>   		 * we might've been called from rmmod() causing
>   		 * scsi_device_get() to fail the module_is_live()
>   		 * check.
>   		 */
> -		if (sdev->sdev_state == SDEV_DEL ||
> -		    sdev->sdev_state == SDEV_CANCEL ||
> +		if (sdev->sdev_state == SDEV_CANCEL ||
>   		    !get_device(&sdev->sdev_gendev))
>   			continue;
>   		spin_unlock_irqrestore(shost->host_lock, flags);
> @@ -1621,16 +1621,21 @@ void scsi_sysfs_device_initialize(struct scsi_device *sdev)
>   	/* XARRAY: might add in hole */
>   	res = xa_alloc(&starget->devices, &u_idx, sdev, scsi_xa_limit_16b,
>   		       GFP_ATOMIC);
> -	if (res < 0)
> +	if (res < 0) {
>   		pr_err("%s: xa_alloc 1 failure errno=%d\n", __func__, -res);
> -	else
> +	} else {
>   		sdev->starg_idx = u_idx;
> +		xa_set_mark(&starget->devices, u_idx, SCSI_XA_NON_SDEV_DEL);
> +	}
>   	res = xa_alloc(&shost->__devices, &u_idx, sdev, scsi_xa_limit_16b,
>   		       GFP_ATOMIC);
> -	if (res < 0)
> +	if (res < 0) {
>   		pr_err("%s: xa_alloc 2 failure errno=%d\n", __func__, -res);
> -	else
> +	} else {
>   		sdev->sh_idx = u_idx;
> +		/* already have host_lock which is shost->__devices.xa_lock */
> +		xa_set_mark(&shost->__devices, u_idx, SCSI_XA_NON_SDEV_DEL);
> +	}
>   	spin_unlock_irqrestore(shost->host_lock, flags);
>   	/*
>   	 * device can now only be removed via __scsi_remove_device() so hold
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index 4219b8d1b94c..23bf686cbabc 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -372,6 +372,9 @@ extern void __starget_for_each_device(struct scsi_target *, void *,
>   /* only exposed to implement shost_for_each_device */
>   extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
>   						  struct scsi_device *);
> +/* only exposed to implement shost_for_each_non_del_device */
> +extern struct scsi_device *__scsi_iterate_non_del_devices
> +			(struct Scsi_Host *shost, struct scsi_device *sdev);
>   /* only exposed to implement starg_for_each_device */
>   extern struct scsi_device *__scsi_target_iterate_devices
>   					(struct scsi_target *starg,
> @@ -391,6 +394,22 @@ extern struct scsi_device *__scsi_target_iterate_devices
>   	     (sdev); \
>   	     (sdev) = __scsi_iterate_devices((shost), (sdev)))
>   
> +/**
> + * shost_for_each_non_del_device - iterate over those scsi_devices that do not
> + *				   have the SDEV_DEL state and are associated
> + *				   with given host
> + * @sdev: the &struct scsi_device to use as a cursor
> + * @shost: the &struct Scsi_Host to iterate over
> + *
> + * Iterator that returns each device attached to @shost.  This loop
> + * takes a reference on each device and releases it at the end.  If
> + * you break out of the loop, you must call scsi_device_put(sdev).
> + */
> +#define shost_for_each_non_del_device(sdev, shost) \
> +	for ((sdev) = __scsi_iterate_non_del_devices((shost), NULL); \
> +	     (sdev); \
> +	     (sdev) = __scsi_iterate_non_del_devices((shost), (sdev)))
> +
>   /* helper for __shost_for_each_device */
>   static inline struct scsi_device *__scsi_h2d_1st_it(struct Scsi_Host *shost)
>   {
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index e6386f3d6de1..0e94b1feb8e9 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -33,6 +33,8 @@ struct scsi_transport_template;
>   /* XARRAY: this limits number of devices (LUs) in host to 64K */
>   #define scsi_xa_limit_16b    XA_LIMIT(0, USHRT_MAX)
>   
> +#define SCSI_XA_NON_SDEV_DEL XA_MARK_1
> +
>   struct scsi_host_template {
>   	struct module *module;
>   	const char *name;
> 
Hmm.
Nice idea, yet I'm not sure if it's worth the hassle.
We still have to check additional conditions during the iterators;
saving just one condition is probably not worth it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 2/6] scsi: xarray, iterations on scsi_target
  2020-05-24 15:58 ` [RFC v2 2/6] scsi: xarray, iterations on scsi_target Douglas Gilbert
@ 2020-05-25  7:06   ` Hannes Reinecke
  0 siblings, 0 replies; 25+ messages in thread
From: Hannes Reinecke @ 2020-05-25  7:06 UTC (permalink / raw)
  To: Douglas Gilbert, linux-scsi; +Cc: martin.petersen, jejb

On 5/24/20 5:58 PM, Douglas Gilbert wrote:
> For reasons unknown, the supplied functions for iterating through all
> scsi_device objects that belonged to a scsi_target involved going
> up to the scsi_host object (i.e. the target's parent) and iterating
> through all scsi_device objects belonging to that scsi_host, and only
> selecting those scsi_device objects that belonged to the original
> scsi_target object. While correct, that is very inefficient when it
> is noted that each scsi_target object has a 'devices' xarray which
> contains pointers to the scsi_device object it owns.
> 
> Modify starget_for_each_device() and __starget_for_each_device()
> to take this more efficient route. Add starg_for_each_device() and
> __starg_for_each_device() macros in scsi_device.h that are finer
> grained versions of the existing shost_for_each_device() and
> __shost_for_each_device() macros.
> 
> The new __scsi_target_iterate_devices() helper function takes
> the host_lock to be consistent with the existing
> __scsi_iterate_devices() function's locking. At this stage the user
> is _not_ being encouraged to use per scsi_target locks.
> 
> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> ---
>   drivers/scsi/scsi.c        | 68 ++++++++++++++++++++++++++---------
>   drivers/scsi/scsi_scan.c   |  3 +-
>   include/scsi/scsi_device.h | 73 +++++++++++++++++++++++++++++++++++++-
>   3 files changed, 125 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 61aa68083f67..aa3240f7aed9 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -588,9 +588,50 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
>   }
>   EXPORT_SYMBOL(__scsi_iterate_devices);
>   
> +/* helper for starg_for_each_device, see that for documentation */
> +struct scsi_device *__scsi_target_iterate_devices(struct scsi_target *starg,
> +						  struct scsi_device *prev)
> +{
> +	struct Scsi_Host *shost = starg_to_shost(starg);
> +	struct scsi_device *next = prev;
> +	unsigned long flags, l_idx;
> +
> +	if (!shost)
> +		return NULL;
> +	spin_lock_irqsave(shost->host_lock, flags);
> +	if (xa_empty(&starg->devices)) {
> +		next = NULL;
> +		goto unlock;
> +	}
> +	do {
> +		if (!next) {	/* get first element iff first iteration */
> +			l_idx = 0;
> +			next = xa_find(&starg->devices, &l_idx,
> +				       scsi_xa_limit_16b.max, XA_PRESENT);
> +		} else {
> +			l_idx = next->sh_idx,
> +			next = xa_find_after(&starg->devices, &l_idx,
> +					     scsi_xa_limit_16b.max,
> +					     XA_PRESENT);
> +		}
> +		if (next) {
> +			/* skip devices that we can't get a reference to */
> +			if (!scsi_device_get(next))
> +				break;
> +		}
> +	} while (next);
> +unlock:
> +	spin_unlock_irqrestore(shost->host_lock, flags);
> +
> +	if (prev)
> +		scsi_device_put(prev);
> +	return next;
> +}
> +EXPORT_SYMBOL(__scsi_target_iterate_devices);
> +

As mentioned in the comments to the first patch, these iterations are 
horrible.
We really should look at doing away with __scsi_target_iterate_devices() 
and use xa_for_each() in the callers.

>   /**
>    * starget_for_each_device  -  helper to walk all devices of a target
> - * @starget:	target whose devices we want to iterate over.
> + * @starg:	target whose devices we want to iterate over.
>    * @data:	Opaque passed to each function call.
>    * @fn:		Function to call on each device
>    *
> @@ -598,23 +639,20 @@ EXPORT_SYMBOL(__scsi_iterate_devices);
>    * a reference that must be released by scsi_host_put when breaking
>    * out of the loop.
>    */
> -void starget_for_each_device(struct scsi_target *starget, void *data,
> -		     void (*fn)(struct scsi_device *, void *))
> +void starget_for_each_device(struct scsi_target *starg, void *data,
> +			     void (*fn)(struct scsi_device *, void *))
>   {
> -	struct Scsi_Host *shost = starg_to_shost(starget);
>   	struct scsi_device *sdev;
>   
> -	shost_for_each_device(sdev, shost) {
> -		if ((sdev->channel == starget->channel) &&
> -		    (sdev->id == starget->id))
> -			fn(sdev, data);
> -	}
> +	/* XARRAY: this looks like recursion, but isn't. Rename function? */
> +	sstarg_for_each_device(sdev, starg)
> +		fn(sdev, data);
>   }
>   EXPORT_SYMBOL(starget_for_each_device);
>   
>   /**
>    * __starget_for_each_device - helper to walk all devices of a target (UNLOCKED)
> - * @starget:	target whose devices we want to iterate over.
> + * @starg:	target whose devices we want to iterate over.
>    * @data:	parameter for callback @fn()
>    * @fn:		callback function that is invoked for each device
>    *
> @@ -626,17 +664,13 @@ EXPORT_SYMBOL(starget_for_each_device);
>    * they need to access the device list in irq context.  Otherwise you
>    * really want to use starget_for_each_device instead.
>    **/
> -void __starget_for_each_device(struct scsi_target *starget, void *data,
> +void __starget_for_each_device(struct scsi_target *starg, void *data,
>   			       void (*fn)(struct scsi_device *, void *))
>   {
> -	struct Scsi_Host *shost = starg_to_shost(starget);
>   	struct scsi_device *sdev;
>   
> -	__shost_for_each_device(sdev, shost) {
> -		if ((sdev->channel == starget->channel) &&
> -		    (sdev->id == starget->id))
> -			fn(sdev, data);
> -	}
> +	__sstarg_for_each_device(sdev, starg)
> +		fn(sdev, data);
>   }
>   EXPORT_SYMBOL(__starget_for_each_device);
>   
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index b8f5850c5a04..72a0064a879a 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -403,7 +403,8 @@ static void scsi_target_reap_ref_put(struct scsi_target *starget)
>   
>   /**
>    * scsi_alloc_target - allocate a new or find an existing target
> - * @parent:	parent of the target (need not be a scsi host)
> + * @parent:	may point to the parent shost, or an intermediate object
> + *		that dev_to_shost() can resolve to the parent shost
>    * @channel:	target channel number (zero if no channels)
>    * @id:		target id number
>    *
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index dc9de4cc0e79..4219b8d1b94c 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -295,6 +295,7 @@ struct scsi_target {
>   				     * scsi_device.id eventually */
>   	struct xarray devices;	/* scsi_device objects owned by this target */
>   	struct device		dev;
> +
>   	unsigned int		create:1; /* signal that it needs to be added */
>   	unsigned int		single_lun:1;	/* Indicates we should only
>   						 * allow I/O to one of the luns

Stray newline.

> @@ -361,6 +362,7 @@ extern struct scsi_device *scsi_device_lookup_by_target(struct scsi_target *,
>   							u64);
>   extern struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *,
>   							  u64);
> +/* XARRAY: these visitor names too close to sstarg_for_each_device macros? */
>   extern void starget_for_each_device(struct scsi_target *, void *,
>   		     void (*fn)(struct scsi_device *, void *));
>   extern void __starget_for_each_device(struct scsi_target *, void *,
> @@ -370,6 +372,10 @@ extern void __starget_for_each_device(struct scsi_target *, void *,
>   /* only exposed to implement shost_for_each_device */
>   extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
>   						  struct scsi_device *);
> +/* only exposed to implement starg_for_each_device */
> +extern struct scsi_device *__scsi_target_iterate_devices
> +					(struct scsi_target *starg,
> +					 struct scsi_device *sdev);
>   
>   /**
>    * shost_for_each_device - iterate over all devices of a host
> @@ -423,13 +429,78 @@ static inline struct scsi_device *__scsi_h2d_next_it(struct Scsi_Host *shost,
>    *
>    * Note 2: The iteration won't fail but dereferencing sdev might. To access
>    * the xarray features (e.g. marks) associated with sdev safely use:
> - * xa_for_each(&shost->__devices, l_idx, next) directly then use l_idx.
> + * xa_for_each(&shost->__devices, l_idx, sdev) directly then use l_idx.
>    */
>   #define __shost_for_each_device(sdev, shost) \
>   	for ((sdev) = __scsi_h2d_1st_it((shost)); \
>   	     (sdev); \
>   	     (sdev) = __scsi_h2d_next_it((shost), (sdev)))
>   
> +/**
> + * sstarg_for_each_device - iterate over all devices of a target
> + * @sdev: the &struct scsi_device to use as a cursor
> + * @starg: the &struct scsi_target to iterate over
> + *
> + * Iterator that returns each device attached to @starg.  This loop
> + * takes a reference on each device and releases it at the end.  If
> + * you break out of the loop, you must call scsi_device_put(sdev).
> + *
> + * Note: the leading double "ss" is to lessen the similarity between this
> + * macro and the starget_for_each_device() function declared above.
> + */
> +#define sstarg_for_each_device(sdev, starg) \
> +	for ((sdev) = __scsi_target_iterate_devices((starg), NULL); \
> +	     (sdev); \
> +	     (sdev) = __scsi_target_iterate_devices((starg), (sdev)))
> +
> +/* helper for __sstarg_for_each_device */
> +static inline struct scsi_device *__scsi_t2d_1st_it(struct scsi_target *starg)
> +{
> +	unsigned long l_idx = 0;
> +
> +	return xa_find(&starg->devices, &l_idx, scsi_xa_limit_16b.max,
> +		       XA_PRESENT);
> +}
> +
> +/* helper for __sstarg_for_each_device */
> +static inline struct scsi_device *__scsi_t2d_next_it(struct scsi_target *starg,
> +						     struct scsi_device *prev)
> +{
> +	unsigned long l_idx;
> +
> +	if (prev) {
> +		l_idx = prev->starg_idx,
> +		prev = xa_find_after(&starg->devices, &l_idx,
> +				     scsi_xa_limit_16b.max, XA_PRESENT);
> +	}
> +	return prev;	/* now either _next_ or NULL */
> +}
> +
> +/**
> + * __sstarg_for_each_device - iterate over all devices of a target (UNLOCKED)
> + * @sdev: the &struct scsi_device to use as a cursor
> + * @starg: the &struct scsi_target to iterate over
> + *
> + * Iterator that returns each device attached to @starg.  It does _not_
> + * take a reference on the scsi_device, so the whole loop must be
> + * protected by shost->host_lock.
> + *
> + * Note: The only reason to use this is because you need to access the
> + * device list in interrupt context.  Otherwise you really want to use
> + * starg_for_each_device instead.
> + *
> + * Note 2: The iteration won't fail but dereferencing sdev might. To access
> + * the xarray features (e.g. marks) associated with sdev safely use:
> + * xa_for_each(&starg->devices, l_idx, sdev) directly then use l_idx.
> + *
> + * Note 3: the leading double "ss" is to lessen the similarity between this
> + * macro and the __starget_for_each_device() function declared above.
> + */
> +#define __sstarg_for_each_device(sdev, starg) \
> +	for ((sdev) = __scsi_t2d_1st_it((starg)); \
> +	     (sdev); \
> +	     (sdev) = __scsi_t2d_next_it((starg), (sdev)))
> +
>   extern int scsi_change_queue_depth(struct scsi_device *, int);
>   extern int scsi_track_queue_full(struct scsi_device *, int);
>   
> 
I think we should be looking at an overhaul of these functions; the 
current model is structured around lists and the requirements for 
iterating over them.
When switching to xarray quite some restrictions don't apply anymore,
and as such trying to fit into the list model makes only limited sense.

Don't get me wrong: I _do_ think that using xarrays here _is_ a good 
idea. It's just that doing it properly might warrant a larger overhaul...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 4/6] scsi: improve scsi_device_lookup
  2020-05-24 15:58 ` [RFC v2 4/6] scsi: improve scsi_device_lookup Douglas Gilbert
@ 2020-05-25  7:07   ` Hannes Reinecke
  0 siblings, 0 replies; 25+ messages in thread
From: Hannes Reinecke @ 2020-05-25  7:07 UTC (permalink / raw)
  To: Douglas Gilbert, linux-scsi; +Cc: martin.petersen, jejb

On 5/24/20 5:58 PM, Douglas Gilbert wrote:
> When the __scsi_device_lookup() function is given a "ctl" (i.e. the
> latter part of "hctl" tuple), improve the loop to find a matching
> device (LU) in the given host. Rather than loop over all LUs in the
> host, first loop over all targets to find a match on "ct" then, if
> found, loop over all LUs in that target for a match on "l". These
> nested loops are better since they don't visit LUs belonging to
> non-matching targets. This improvement flows through to the locked
> version of this function, namely scsi_device_lookup().
> 
> Remove a 21 year old comment by the author that no longer seem
> relevant.
> 
> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> ---
>   drivers/scsi/scsi.c | 26 ++++++++++++++++++++------
>   1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 0fb650aebcfb..9e7658aebdb7 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -35,7 +35,6 @@
>    *
>    *  Jiffies wrap fixes (host->resetting), 3 Dec 1998 Andrea Arcangeli
>    *
> - *  out_of_space hacks, D. Gilbert (dpg) 990608
>    */
>   
>   #include <linux/module.h>
> @@ -789,16 +788,31 @@ EXPORT_SYMBOL(scsi_device_lookup_by_target);
>   struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
>   		uint channel, uint id, u64 lun)
>   {
> -	unsigned long l_idx;
> +	unsigned long l_idx, m_idx;
> +	struct scsi_target *starg;
>   	struct scsi_device *sdev;
>   
> -	xa_for_each_marked(&shost->__devices, l_idx, sdev,
> +	if (xa_empty(&shost->__devices))
> +		return NULL;
> +	if (xa_empty(&shost->__targets))
> +		goto inconsistent;
> +	xa_for_each(&shost->__targets, l_idx, starg) {
> +		if (!(starg->id == id && starg->channel == channel))
> +			continue;
> +		xa_for_each_marked(&starg->devices, m_idx, sdev,
> +				   SCSI_XA_NON_SDEV_DEL) {
> +			if (sdev->lun == lun)
> +				return sdev;
> +		}
> +	}
> +	return NULL;
> +inconsistent:
> +	xa_for_each_marked(&shost->__devices, m_idx, sdev,
>   			   SCSI_XA_NON_SDEV_DEL) {
> -		if (sdev->channel == channel && sdev->id == id &&
> -				sdev->lun ==lun)
> +		if (sdev->id == id && sdev->channel == channel &&
> +		    sdev->lun == lun)
>   			return sdev;
>   	}
> -
>   	return NULL;
>   }
>   EXPORT_SYMBOL(__scsi_device_lookup);
> 
... and if we had been using the LUN as an index we could have using 
'xa_load()' directly without needing any loop ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-25  6:57   ` Hannes Reinecke
@ 2020-05-25 16:30     ` Douglas Gilbert
  2020-05-25 17:40       ` Matthew Wilcox
  0 siblings, 1 reply; 25+ messages in thread
From: Douglas Gilbert @ 2020-05-25 16:30 UTC (permalink / raw)
  To: Hannes Reinecke, linux-scsi; +Cc: martin.petersen, jejb, Matthew Wilcox

On 2020-05-25 2:57 a.m., Hannes Reinecke wrote:
> On 5/24/20 5:58 PM, Douglas Gilbert wrote:
>> Replace three linked lists with xarrays:
>>    - Scsi_Host::__devices  collection of scsi_device objects
>>    - Scsi_Host::__targets  collection of scsi_target objects
>>    - scsi_target::devices  collection of scsi_device objects that
>>      belong to this target
>>
>> The collection of Scsi_Host objects remains unaltered. It uses the
>> idr/ida mechanism which already uses xarrays in its implementation.
>>
>> Add a scsi_target::parent_shost pointer for direct access to a
>> target's host since the oft called dev_to_shost() needs to loop through
>> any intermediate (transport supplied) objects between a target and its
>> parent host. Add a new, trivial access function: starg_to_shost() and
>> thus allow calls to dev_to_shost(starg->dev.parent) to be replaced
>> with starg_to_shost(starg). Use this faster replacement in mid-level
>> source and in an inline function defined in scsi_transport.h .
>>
>> Against the advice in scsi_devices.h and scsi_host.h this file:
>> drivers/target/target_core_pscsi.c uses Scsi_Host::__devices directly
>> and needs a minimal change to allow it to compile and work in the
>> same fashion.
>>
>> xarray technical info: all three xarrays are initialized with
>> XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ flags. When XA_FLAGS_ALLOC is used
>> a xarray allocates unique index numbers and uses one of the 3 available
>> marks to manage that allocation. So there are two marks remaining and
>> they are currently unused. The LOCK_IRQ flags means it calls
>> spin_lock_irq() internally on xarray modifying operations. xarray
>> non-modifying operations use the rcu lock.
>>
>> Reported-by: kbuild test robot <lkp@intel.com>
>> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
>> ---
>>   drivers/scsi/hosts.c               |  8 +++--
>>   drivers/scsi/scsi.c                | 43 +++++++++++++++--------
>>   drivers/scsi/scsi_lib.c            |  8 ++---
>>   drivers/scsi/scsi_scan.c           | 48 +++++++++++++++++--------
>>   drivers/scsi/scsi_sysfs.c          | 41 +++++++++++++++-------
>>   drivers/target/target_core_pscsi.c |  2 +-
>>   include/scsi/scsi_device.h         | 56 +++++++++++++++++++++++++-----
>>   include/scsi/scsi_host.h           | 12 ++++---
>>   include/scsi/scsi_transport.h      |  3 +-
>>   9 files changed, 158 insertions(+), 63 deletions(-)
>>
>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>> index 7ec91c3a66ca..2bba293a497d 100644
>> --- a/drivers/scsi/hosts.c
>> +++ b/drivers/scsi/hosts.c
>> @@ -345,6 +345,8 @@ static void scsi_host_dev_release(struct device *dev)
>>       if (parent)
>>           put_device(parent);
>> +    xa_destroy(&shost->__targets);
>> +    xa_destroy(&shost->__devices);
>>       kfree(shost);
>>   }
>> @@ -382,8 +384,8 @@ struct Scsi_Host *scsi_host_alloc(struct 
>> scsi_host_template *sht, int privsize)
>>       shost->host_lock = &shost->default_lock;
>>       spin_lock_init(shost->host_lock);
>>       shost->shost_state = SHOST_CREATED;
>> -    INIT_LIST_HEAD(&shost->__devices);
>> -    INIT_LIST_HEAD(&shost->__targets);
>> +    xa_init_flags(&shost->__devices, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
>> +    xa_init_flags(&shost->__targets, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
>>       INIT_LIST_HEAD(&shost->eh_cmd_q);
>>       INIT_LIST_HEAD(&shost->starved_list);
>>       init_waitqueue_head(&shost->host_wait);
>> @@ -502,6 +504,8 @@ struct Scsi_Host *scsi_host_alloc(struct 
>> scsi_host_template *sht, int privsize)
>>    fail_index_remove:
>>       ida_simple_remove(&host_index_ida, shost->host_no);
>>    fail_kfree:
>> +    xa_destroy(&shost->__targets);
>> +    xa_destroy(&shost->__devices);
>>       kfree(shost);
>>       return NULL;
>>   }
>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>> index 56c24a73e0c7..61aa68083f67 100644
>> --- a/drivers/scsi/scsi.c
>> +++ b/drivers/scsi/scsi.c
>> @@ -554,19 +554,32 @@ EXPORT_SYMBOL(scsi_device_put);
>>   struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
>>                          struct scsi_device *prev)
>>   {
>> -    struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
>> -    struct scsi_device *next = NULL;
>> -    unsigned long flags;
>> +    struct scsi_device *next = prev;
>> +    unsigned long flags, l_idx;
>>       spin_lock_irqsave(shost->host_lock, flags);
>> -    while (list->next != &shost->__devices) {
>> -        next = list_entry(list->next, struct scsi_device, siblings);
>> -        /* skip devices that we can't get a reference to */
>> -        if (!scsi_device_get(next))
>> -            break;
>> +    if (xa_empty(&shost->__devices)) {
>>           next = NULL;
>> -        list = list->next;
>> +        goto unlock;
>>       }
>> +    do {
>> +        if (!next) {    /* get first element iff first iteration */
>> +            l_idx = 0;
>> +            next = xa_find(&shost->__devices, &l_idx,
>> +                       scsi_xa_limit_16b.max, XA_PRESENT);
>> +        } else {
>> +            l_idx = next->sh_idx,
>> +            next = xa_find_after(&shost->__devices, &l_idx,
>> +                         scsi_xa_limit_16b.max,
>> +                         XA_PRESENT);
>> +        }
>> +        if (next) {
>> +            /* skip devices that we can't get a reference to */
>> +            if (!scsi_device_get(next))
>> +                break;
>> +        }
>> +    } while (next);
>> +unlock:
>>       spin_unlock_irqrestore(shost->host_lock, flags);
>>       if (prev)
> This one is patently ugly.
> 
> Why not kill __scsi_iterate_devices() and use xa_for_each() on each call site?

They are in the public interface of the ML. I didn't want to visit
ever file that touches uses scsi_devices.h . Apart from anything else
it makes that larger patchset harder to carry forward when those
other users (e.g. LLDs) have theri own changes.

Also it is a bit of proof-of-concept, IOwWs can xarray do this
contortion?

>> @@ -588,7 +601,7 @@ EXPORT_SYMBOL(__scsi_iterate_devices);
>>   void starget_for_each_device(struct scsi_target *starget, void *data,
>>                void (*fn)(struct scsi_device *, void *))
>>   {
>> -    struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
>> +    struct Scsi_Host *shost = starg_to_shost(starget);
>>       struct scsi_device *sdev;
>>       shost_for_each_device(sdev, shost) {
> 
> See comment above.

As James pointed out, there can be one or more levels inserted into
the ML object tree between the shost layer and the starget layer.
So dev_to_shost has a loop, an identity check and some pointer
arithmetic at each iteration. But if you start with a startget
and want your parent shost, that tomfoolery can be bypassed. And
I believe the improvement is measurable.

>> @@ -616,7 +629,7 @@ EXPORT_SYMBOL(starget_for_each_device);
>>   void __starget_for_each_device(struct scsi_target *starget, void *data,
>>                      void (*fn)(struct scsi_device *, void *))
>>   {
>> -    struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
>> +    struct Scsi_Host *shost = starg_to_shost(starget);
>>       struct scsi_device *sdev;
>>       __shost_for_each_device(sdev, shost) {
> 
> Same here.

dito.

>> @@ -645,9 +658,10 @@ EXPORT_SYMBOL(__starget_for_each_device);
>>   struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *starget,
>>                              u64 lun)
>>   {
>> +    unsigned long l_idx;
>>       struct scsi_device *sdev;
>> -    list_for_each_entry(sdev, &starget->devices, same_target_siblings) {
>> +    xa_for_each(&starget->devices, l_idx, sdev) {
>>           if (sdev->sdev_state == SDEV_DEL)
>>               continue;
>>           if (sdev->lun ==lun)
>> @@ -671,7 +685,7 @@ struct scsi_device *scsi_device_lookup_by_target(struct 
>> scsi_target *starget,
>>                            u64 lun)
>>   {
>>       struct scsi_device *sdev;
>> -    struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
>> +    struct Scsi_Host *shost = starg_to_shost(starget);
>>       unsigned long flags;
>>       spin_lock_irqsave(shost->host_lock, flags);
>> @@ -703,9 +717,10 @@ EXPORT_SYMBOL(scsi_device_lookup_by_target);
>>   struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
>>           uint channel, uint id, u64 lun)
>>   {
>> +    unsigned long l_idx;
>>       struct scsi_device *sdev;
>> -    list_for_each_entry(sdev, &shost->__devices, siblings) {
>> +    xa_for_each(&shost->__devices, l_idx, sdev) {
>>           if (sdev->sdev_state == SDEV_DEL)
>>               continue;
>>           if (sdev->channel == channel && sdev->id == id &&
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 47835c4b4ee0..68df68f54fc8 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -372,9 +372,9 @@ static void scsi_kick_queue(struct request_queue *q)
>>   static void scsi_single_lun_run(struct scsi_device *current_sdev)
>>   {
>>       struct Scsi_Host *shost = current_sdev->host;
>> -    struct scsi_device *sdev, *tmp;
>> +    struct scsi_device *sdev;
>>       struct scsi_target *starget = scsi_target(current_sdev);
>> -    unsigned long flags;
>> +    unsigned long flags, l_idx;
>>       spin_lock_irqsave(shost->host_lock, flags);
>>       starget->starget_sdev_user = NULL;
>> @@ -391,8 +391,8 @@ static void scsi_single_lun_run(struct scsi_device 
>> *current_sdev)
>>       spin_lock_irqsave(shost->host_lock, flags);
>>       if (starget->starget_sdev_user)
>>           goto out;
>> -    list_for_each_entry_safe(sdev, tmp, &starget->devices,
>> -            same_target_siblings) {
>> +    /* XARRAY: was list_for_each_entry_safe(); is this safe ? */
>> +    xa_for_each(&starget->devices, l_idx, sdev) {
>>           if (sdev == current_sdev)
>>               continue;
>>           if (scsi_device_get(sdev))
> 
> Frankly, I don't know why we're using the _safe variant here.
> Typically the _safe variant is used to protect against removal
> (ie if a list element is removed while iterating over the list),
> but I can't really imagine why starting I/O on the device would
> cause one entry to be deleted...
> James?
> 
>> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
>> index f2437a7570ce..b8f5850c5a04 100644
>> --- a/drivers/scsi/scsi_scan.c
>> +++ b/drivers/scsi/scsi_scan.c
>> @@ -217,7 +217,7 @@ static struct scsi_device *scsi_alloc_sdev(struct 
>> scsi_target *starget,
>>   {
>>       struct scsi_device *sdev;
>>       int display_failure_msg = 1, ret;
>> -    struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
>> +    struct Scsi_Host *shost = starg_to_shost(starget);
>>       sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size,
>>                  GFP_KERNEL);
> 
> This is actually a cleanup in itself :-)

Follow one pointer, or:
   - dive into its device object,
   - get its parent,
   - and run the dev_to_shost while loop

Hopefully you have convinced yourself by now.

>> @@ -234,8 +234,9 @@ static struct scsi_device *scsi_alloc_sdev(struct 
>> scsi_target *starget,
>>       sdev->channel = starget->channel;
>>       mutex_init(&sdev->state_mutex);
>>       sdev->sdev_state = SDEV_CREATED;
>> -    INIT_LIST_HEAD(&sdev->siblings);
>> -    INIT_LIST_HEAD(&sdev->same_target_siblings);
>> +    /* XARRAY: INIT_LIST_HEAD()s here on list leaves, why ? */
>> +    sdev->sh_idx = -1;
>> +    sdev->starg_idx = -1;        /* for debugging */
>>       INIT_LIST_HEAD(&sdev->starved_entry);
>>       INIT_LIST_HEAD(&sdev->event_list);
>>       spin_lock_init(&sdev->list_lock);
> 
> It is good practice to call INIT_LIST_HEAD() on list leaves as then you can call 
> 'list_empty()' to figure out if they are part of a list or not.
> 
> But I'm wondering about the 'sh_idx' and the 'starg_idx' here.
> We already have perfectly good indices with the 'lun' and 'id' entries in struct 
> scsi_device, which (incidentally) _need_ to be unique.
> 
> So why not use them?

Ah now we have something to discuss and perhaps Willy can help

At the target level we have two indexes:
    - channel (is 16 bits enough)
    - target (16 or 32 bits?)
This could be address to ways:
    1) introduce struct scsi_channel {int num_chans; scsi_target *t1p;}
    2) combine channnel and target into one 32 bit integer.

The xarray type of its index seems to be unsigned long (Willy,
is that correct?). That is the same size as a pointer, 32 bits
on a 32 bit machine, 64 bits on a 64 bit machine. That is
unfortunate, since we can represent the full 64 bit LUN as
a single index on a 32 bit machine. We could 'hide' that and
have two xarrays in scsi_device for 32 bit machines.

So to start with, I took the easy approach, the one that most
resembles the what include/linux/list.h has been using. There
is also a (overly ?) complex locking scheme that I really did
not want to upset.

Those num_chans, t1p members that I put in scsi_channel is just
to show the xarray lookup can be bypassed when num_chans=1
which is very likely. Same approach could be taken in scsi_target
to scsi_device.

BTW scsi_device is a terrible name for a LU. The SCSI device in
T10 jargon is the damn TARGET, not the LU.

>> @@ -306,8 +307,9 @@ static struct scsi_device *scsi_alloc_sdev(struct 
>> scsi_target *starget,
>>   static void scsi_target_destroy(struct scsi_target *starget)
>>   {
>> +    struct scsi_target *e_starg;
>>       struct device *dev = &starget->dev;
>> -    struct Scsi_Host *shost = dev_to_shost(dev->parent);
>> +    struct Scsi_Host *shost = starg_to_shost(starget);
>>       unsigned long flags;
>>       BUG_ON(starget->state == STARGET_DEL);
>> @@ -316,7 +318,10 @@ static void scsi_target_destroy(struct scsi_target *starget)
>>       spin_lock_irqsave(shost->host_lock, flags);
>>       if (shost->hostt->target_destroy)
>>           shost->hostt->target_destroy(starget);
>> -    list_del_init(&starget->siblings);
>> +    /* XARRAY: was list_del_init(); why the _init ?  */
>> +    e_starg = xa_erase(&shost->__targets, starget->sh_idx);
>> +    if (e_starg != starget)
>> +        pr_err("%s: bad xa_erase()\n", __func__);
>>       spin_unlock_irqrestore(shost->host_lock, flags);
>>       put_device(dev);
>>   }
> See above for my comment about INIT_LIST_HEAD().
> Calling list_del_init() will reset the 'siblings' entry such that list_empty() 
> returns true.
> 
>> @@ -326,6 +331,7 @@ static void scsi_target_dev_release(struct device *dev)
>>       struct device *parent = dev->parent;
>>       struct scsi_target *starget = to_scsi_target(dev);
>> +    xa_destroy(&starget->devices);
>>       kfree(starget);
>>       put_device(parent);
>>   }
>> @@ -344,12 +350,13 @@ EXPORT_SYMBOL(scsi_is_target_device);
>>   static struct scsi_target *__scsi_find_target(struct device *parent,
>>                             int channel, uint id)
>>   {
>> +    unsigned long l_idx;
>>       struct scsi_target *starget, *found_starget = NULL;
>>       struct Scsi_Host *shost = dev_to_shost(parent);
>>       /*
>> -     * Search for an existing target for this sdev.
>> +     * Search for an existing target for this host.
>>        */
>> -    list_for_each_entry(starget, &shost->__targets, siblings) {
>> +    xa_for_each(&shost->__targets, l_idx, starget) {
>>           if (starget->id == id &&
>>               starget->channel == channel) {
>>               found_starget = starget; > @@ -412,6 +419,8 @@ static struct 
>> scsi_target 
> *scsi_alloc_target(struct device *parent,
>>       struct Scsi_Host *shost = dev_to_shost(parent);
>>       struct device *dev = NULL;
>>       unsigned long flags;
>> +    unsigned int u_idx;
>> +    int res;
>>       const int size = sizeof(struct scsi_target)
>>           + shost->transportt->target_size;
>>       struct scsi_target *starget;
>> @@ -427,14 +436,15 @@ static struct scsi_target *scsi_alloc_target(struct 
>> device *parent,
>>       device_initialize(dev);
>>       kref_init(&starget->reap_ref);
>>       dev->parent = get_device(parent);
>> +    starget->parent_shost = shost;
>>       dev_set_name(dev, "target%d:%d:%d", shost->host_no, channel, id);
>>       dev->bus = &scsi_bus_type;
>>       dev->type = &scsi_target_type;
>>       starget->id = id;
>>       starget->channel = channel;
>>       starget->can_queue = 0;
>> -    INIT_LIST_HEAD(&starget->siblings);
>> -    INIT_LIST_HEAD(&starget->devices);
>> +    xa_init_flags(&starget->devices, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
>> +    starget->sh_idx = -1;        /* debugging */
>>       starget->state = STARGET_CREATED;
>>       starget->scsi_level = SCSI_2;
>>       starget->max_target_blocked = SCSI_DEFAULT_TARGET_BLOCKED;
> 
> Why do you need the parent_shost pointer?
> That's precisely the point of the starget_to_shost() accessor; the shost device 
> must be part of the parent chain, so we just need to follow it.
> And it's not really performance-critical, either...

See "As James pointed out ..." above

> 
>> @@ -445,7 +455,15 @@ static struct scsi_target *scsi_alloc_target(struct 
>> device *parent,
>>       if (found_target)
>>           goto found;
>> -    list_add_tail(&starget->siblings, &shost->__targets);
>> +    /* XARRAY: was list_add_tail(); may get hole in xarray or tail */
>> +    res = xa_alloc(&shost->__targets, &u_idx, starget, scsi_xa_limit_16b,
>> +               GFP_ATOMIC);
>> +    if (res < 0) {
>> +        spin_unlock_irqrestore(shost->host_lock, flags);
>> +        pr_err("%s: xa_alloc failure errno=%d\n", __func__, -res);
>> +        return NULL;
>> +    }
>> +    starget->sh_idx = u_idx;
>>       spin_unlock_irqrestore(shost->host_lock, flags);
>>       /* allocate and add */
>>       transport_setup_device(dev);
> 
> 'scsi_xa_limit_16b' ?
> What's that for?

See include/linux/xarray.h xa_limit_32b

This is a 16 bit equivalent. Not sure if that is too restrictive.
Anyway, the xa_alloc() call amongst others needs an upper and
lower bound.

> And, again, if we were to use the target id as an index we could be using 
> 'xa_store' and the problem would be solved...
> 
>> @@ -1049,7 +1067,7 @@ static int scsi_probe_and_add_lun(struct scsi_target 
>> *starget,
>>       unsigned char *result;
>>       blist_flags_t bflags;
>>       int res = SCSI_SCAN_NO_RESPONSE, result_len = 256;
>> -    struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
>> +    struct Scsi_Host *shost = starg_to_shost(starget);
>>       /*
>>        * The rescan flag is used as an optimization, the first scan of a
>> @@ -1199,7 +1217,7 @@ static void scsi_sequential_lun_scan(struct scsi_target 
>> *starget,
>>   {
>>       uint max_dev_lun;
>>       u64 sparse_lun, lun;
>> -    struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
>> +    struct Scsi_Host *shost = starg_to_shost(starget);
>>       SCSI_LOG_SCAN_BUS(3, starget_printk(KERN_INFO, starget,
>>           "scsi scan: Sequential scan\n"));
>> @@ -1297,7 +1315,7 @@ static int scsi_report_lun_scan(struct scsi_target 
>> *starget, blist_flags_t bflag
>>       struct scsi_lun *lunp, *lun_data;
>>       struct scsi_sense_hdr sshdr;
>>       struct scsi_device *sdev;
>> -    struct Scsi_Host *shost = dev_to_shost(&starget->dev);
>> +    struct Scsi_Host *shost = starg_to_shost(starget);
>>       int ret = 0;
>>       /*
>> @@ -1860,11 +1878,11 @@ EXPORT_SYMBOL(scsi_scan_host);
>>   void scsi_forget_host(struct Scsi_Host *shost)
>>   {
>>       struct scsi_device *sdev;
>> -    unsigned long flags;
>> +    unsigned long flags, l_idx;
>>    restart:
>>       spin_lock_irqsave(shost->host_lock, flags);
>> -    list_for_each_entry(sdev, &shost->__devices, siblings) {
>> +    xa_for_each(&shost->__devices, l_idx, sdev) {
>>           if (sdev->sdev_state == SDEV_DEL)
>>               continue;
>>           spin_unlock_irqrestore(shost->host_lock, flags);
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index 163dbcb741c1..4bfcf33139a2 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -433,7 +433,7 @@ static void scsi_device_cls_release(struct device *class_dev)
>>   static void scsi_device_dev_release_usercontext(struct work_struct *work)
>>   {
>> -    struct scsi_device *sdev;
>> +    struct scsi_device *sdev, *e_sdev;
>>       struct device *parent;
>>       struct list_head *this, *tmp;
>>       struct scsi_vpd *vpd_pg80 = NULL, *vpd_pg83 = NULL;
>> @@ -447,8 +447,12 @@ static void scsi_device_dev_release_usercontext(struct 
>> work_struct *work)
>>       parent = sdev->sdev_gendev.parent;
>>       spin_lock_irqsave(sdev->host->host_lock, flags);
>> -    list_del(&sdev->siblings);
>> -    list_del(&sdev->same_target_siblings);
>> +    e_sdev = xa_erase(&sdev->host->__devices, sdev->sh_idx);
>> +    if (e_sdev != sdev)
>> +        pr_err("%s: bad xa_erase(host:__devices)\n", __func__);
>> +    e_sdev = xa_erase(&sdev->sdev_target->devices, sdev->starg_idx);
>> +    if (e_sdev != sdev)
>> +        pr_err("%s: bad xa_erase(sdev_target->devices)\n", __func__);
>>       list_del(&sdev->starved_entry);
>>       spin_unlock_irqrestore(sdev->host->host_lock, flags);
> 
> Oh yeah, these double lists. I really wonder if we shouldn't do away with one of 
> them (presumably the starget device list), and iterate over the shost device 
> list with the starget filter instead.

The are lots of iterations over a shost for sdev_s in the ML code.
Some of them have been changed to an interation over stargat for
sdev_s.

>> @@ -1471,22 +1475,19 @@ EXPORT_SYMBOL(scsi_remove_device);
>>   static void __scsi_remove_target(struct scsi_target *starget)
>>   {
>> -    struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
>> -    unsigned long flags;
>> +    struct Scsi_Host *shost = starg_to_shost(starget);
>>       struct scsi_device *sdev;
>> +    unsigned long flags, l_idx;
>>       spin_lock_irqsave(shost->host_lock, flags);
>>    restart:
>> -    list_for_each_entry(sdev, &shost->__devices, siblings) {
>> +    xa_for_each(&starget->devices, l_idx, sdev) {
>>           /*
>>            * We cannot call scsi_device_get() here, as
>>            * we might've been called from rmmod() causing
>>            * scsi_device_get() to fail the module_is_live()
>>            * check.
>>            */
>> -        if (sdev->channel != starget->channel ||
>> -            sdev->id != starget->id)
>> -            continue;
>>           if (sdev->sdev_state == SDEV_DEL ||
>>               sdev->sdev_state == SDEV_CANCEL ||
>>               !get_device(&sdev->sdev_gendev))
>> @@ -1495,6 +1496,7 @@ static void __scsi_remove_target(struct scsi_target 
>> *starget)
>>           scsi_remove_device(sdev);
>>           put_device(&sdev->sdev_gendev);
>>           spin_lock_irqsave(shost->host_lock, flags);
>> +        /* XARRAY: is this goto start of loop correct ? */
>>           goto restart;
>>       }
>>       spin_unlock_irqrestore(shost->host_lock, flags);
> This is a very convoluted way of _not_ using list_for_each_safe() ;-)
> (James will probably telling me off as I'm missing the intricacies here)
> But again, the prime reason for the goto here is that we're removing an element 
> from the list, and need to ensure list consistency as we want to iterate across 
> all devices and remove each of them.
> 
> So no need for that complicated loop; xa_for_each() and xa_erase() on each 
> element should be sufficient.

Good. If nothing else, xarrays lead to cleaner code with less subtleties
and a clear distinction between collector and collectee.

>> @@ -1512,11 +1514,11 @@ void scsi_remove_target(struct device *dev)
>>   {
>>       struct Scsi_Host *shost = dev_to_shost(dev->parent);
>>       struct scsi_target *starget;
>> -    unsigned long flags;
>> +    unsigned long flags, l_idx;
>>   restart:
>>       spin_lock_irqsave(shost->host_lock, flags);
>> -    list_for_each_entry(starget, &shost->__targets, siblings) {
>> +    xa_for_each(&shost->__targets, l_idx, starget) {
>>           if (starget->state == STARGET_DEL ||
>>               starget->state == STARGET_REMOVE ||
>>               starget->state == STARGET_CREATED_REMOVE)
>> @@ -1584,6 +1586,8 @@ static struct device_type scsi_dev_type = {
>>   void scsi_sysfs_device_initialize(struct scsi_device *sdev)
>>   {
>> +    int res;
>> +    unsigned int u_idx;
>>       unsigned long flags;
>>       struct Scsi_Host *shost = sdev->host;
>>       struct scsi_target  *starget = sdev->sdev_target;
>> @@ -1614,8 +1618,19 @@ void scsi_sysfs_device_initialize(struct scsi_device 
>> *sdev)
>>       transport_setup_device(&sdev->sdev_gendev);
>>       spin_lock_irqsave(shost->host_lock, flags);
>> -    list_add_tail(&sdev->same_target_siblings, &starget->devices);
>> -    list_add_tail(&sdev->siblings, &shost->__devices);
>> +    /* XARRAY: might add in hole */
>> +    res = xa_alloc(&starget->devices, &u_idx, sdev, scsi_xa_limit_16b,
>> +               GFP_ATOMIC);
>> +    if (res < 0)
>> +        pr_err("%s: xa_alloc 1 failure errno=%d\n", __func__, -res);
>> +    else
>> +        sdev->starg_idx = u_idx;
>> +    res = xa_alloc(&shost->__devices, &u_idx, sdev, scsi_xa_limit_16b,
>> +               GFP_ATOMIC);
>> +    if (res < 0)
>> +        pr_err("%s: xa_alloc 2 failure errno=%d\n", __func__, -res);
>> +    else
>> +        sdev->sh_idx = u_idx;
>>       spin_unlock_irqrestore(shost->host_lock, flags);
>>       /*
>>        * device can now only be removed via __scsi_remove_device() so hold
> Same here.
> While LUNs are 64-bit, I've yet to come across a target leveraging the full 64 
> bit range. Plus for most arrays the number of LUNs per host is capped to 255 
> anyway.
> Hence I guess we should be okay with using the LUN as an index into the xarray. 
> If we ever get a real 64-LUN we can stuff them into the unused areas of the LUN 
> range, eg by using the single-level LUN structure with LUNs above 256.

Discussed above.

>> diff --git a/drivers/target/target_core_pscsi.c 
>> b/drivers/target/target_core_pscsi.c
>> index c9d92b3e777d..8547fc9ec344 100644
>> --- a/drivers/target/target_core_pscsi.c
>> +++ b/drivers/target/target_core_pscsi.c
>> @@ -496,7 +496,7 @@ static int pscsi_configure_device(struct se_device *dev)
>>       }
>>       spin_lock_irq(sh->host_lock);
>> -    list_for_each_entry(sd, &sh->__devices, siblings) {
>> +    __shost_for_each_device(sd, sh) {
>>           if ((pdv->pdv_channel_id != sd->channel) ||
>>               (pdv->pdv_target_id != sd->id) ||
>>               (pdv->pdv_lun_id != sd->lun))
>> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
>> index c3cba2aaf934..dc9de4cc0e79 100644
>> --- a/include/scsi/scsi_device.h
>> +++ b/include/scsi/scsi_device.h
>> @@ -7,7 +7,9 @@
>>   #include <linux/workqueue.h>
>>   #include <linux/blkdev.h>
>>   #include <scsi/scsi.h>
>> +#include <scsi/scsi_host.h>
>>   #include <linux/atomic.h>
>> +#include <linux/xarray.h>
>>   struct device;
>>   struct request_queue;
>> @@ -103,8 +105,8 @@ struct scsi_device {
>>       struct request_queue *request_queue;
>>       /* the next two are protected by the host->host_lock */
>> -    struct list_head    siblings;   /* list of all devices on this host */
>> -    struct list_head    same_target_siblings; /* just the devices sharing 
>> same target id */
>> +    int sh_idx;        /* index within host->__devices array */
>> +    int starg_idx;        /* index within sdev_target->devices array */
>>       atomic_t device_busy;        /* commands actually active on LLDD */
>>       atomic_t device_blocked;    /* Device returned QUEUE_FULL. */
>> @@ -281,16 +283,18 @@ enum scsi_target_state {
>>    * scsi_target: representation of a scsi target, for now, this is only
>>    * used for single_lun devices. If no one has active IO to the target,
>>    * starget_sdev_user is NULL, else it points to the active sdev.
>> + * Invariant: starg->parent_shost == dev_to_shost(starg->dev.parent)
>>    */
>>   struct scsi_target {
>>       struct scsi_device    *starget_sdev_user;
>> -    struct list_head    siblings;
>> -    struct list_head    devices;
>> -    struct device        dev;
>> +    struct Scsi_Host    *parent_shost;
>> +    int            sh_idx;    /* index within Scsi_Host::__targets */
>>       struct kref        reap_ref; /* last put renders target invisible */
>>       unsigned int        channel;
>>       unsigned int        id; /* target id ... replace
>>                        * scsi_device.id eventually */
>> +    struct xarray devices;    /* scsi_device objects owned by this target */
>> +    struct device        dev;
>>       unsigned int        create:1; /* signal that it needs to be added */
>>       unsigned int        single_lun:1;    /* Indicates we should only
>>                            * allow I/O to one of the luns
>> @@ -329,6 +333,11 @@ static inline struct scsi_target *scsi_target(struct 
>> scsi_device *sdev)
>>   #define transport_class_to_starget(class_dev) \
>>       to_scsi_target(class_dev->parent)
>> +static inline struct Scsi_Host *starg_to_shost(struct scsi_target *starg)
>> +{
>> +    return starg->parent_shost;
>> +}
>> +
>>   #define starget_printk(prefix, starget, fmt, a...)    \
>>       dev_printk(prefix, &(starget)->dev, fmt, ##a)
>> @@ -365,7 +374,7 @@ extern struct scsi_device *__scsi_iterate_devices(struct 
>> Scsi_Host *,
>>   /**
>>    * shost_for_each_device - iterate over all devices of a host
>>    * @sdev: the &struct scsi_device to use as a cursor
>> - * @shost: the &struct scsi_host to iterate over
>> + * @shost: the &struct Scsi_Host to iterate over
>>    *
>>    * Iterator that returns each device attached to @shost.  This loop
>>    * takes a reference on each device and releases it at the end.  If
>> @@ -376,21 +385,50 @@ extern struct scsi_device *__scsi_iterate_devices(struct 
>> Scsi_Host *,
>>            (sdev); \
>>            (sdev) = __scsi_iterate_devices((shost), (sdev)))
>> +/* helper for __shost_for_each_device */
>> +static inline struct scsi_device *__scsi_h2d_1st_it(struct Scsi_Host *shost)
>> +{
>> +    unsigned long l_idx = 0;
>> +
>> +    return xa_find(&shost->__devices, &l_idx, scsi_xa_limit_16b.max,
>> +               XA_PRESENT);
>> +}
>> +
>> +/* helper for __shost_for_each_device */
>> +static inline struct scsi_device *__scsi_h2d_next_it(struct Scsi_Host *shost,
>> +                             struct scsi_device *prev)
>> +{
>> +    unsigned long l_idx;
>> +
>> +    if (prev) {
>> +        l_idx = prev->sh_idx,
>> +        prev = xa_find_after(&shost->__devices, &l_idx,
>> +                     scsi_xa_limit_16b.max, XA_PRESENT);
>> +    }
>> +    return prev;    /* now either _next_ or NULL */
>> +}
>> +
>>   /**
>>    * __shost_for_each_device - iterate over all devices of a host (UNLOCKED)
>>    * @sdev: the &struct scsi_device to use as a cursor
>> - * @shost: the &struct scsi_host to iterate over
>> + * @shost: the &struct Scsi_Host to iterate over
>>    *
>>    * Iterator that returns each device attached to @shost.  It does _not_
>>    * take a reference on the scsi_device, so the whole loop must be
>> - * protected by shost->host_lock.
>> + * protected by shost->host_lock (see Note 2).
>>    *
>>    * Note: The only reason to use this is because you need to access the
>>    * device list in interrupt context.  Otherwise you really want to use
>>    * shost_for_each_device instead.
>> + *
>> + * Note 2: The iteration won't fail but dereferencing sdev might. To access
>> + * the xarray features (e.g. marks) associated with sdev safely use:
>> + * xa_for_each(&shost->__devices, l_idx, next) directly then use l_idx.
>>    */
>>   #define __shost_for_each_device(sdev, shost) \
>> -    list_for_each_entry((sdev), &((shost)->__devices), siblings)
>> +    for ((sdev) = __scsi_h2d_1st_it((shost)); \
>> +         (sdev); \
>> +         (sdev) = __scsi_h2d_next_it((shost), (sdev)))
>>   extern int scsi_change_queue_depth(struct scsi_device *, int);
>>   extern int scsi_track_queue_full(struct scsi_device *, int);
>> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
>> index 822e8cda8d9b..e6386f3d6de1 100644
>> --- a/include/scsi/scsi_host.h
>> +++ b/include/scsi/scsi_host.h
>> @@ -9,6 +9,7 @@
>>   #include <linux/mutex.h>
>>   #include <linux/seq_file.h>
>>   #include <linux/blk-mq.h>
>> +#include <linux/xarray.h>
>>   #include <scsi/scsi.h>
>>   struct block_device;
>> @@ -29,6 +30,9 @@ struct scsi_transport_template;
>>   #define MODE_INITIATOR 0x01
>>   #define MODE_TARGET 0x02
>> +/* XARRAY: this limits number of devices (LUs) in host to 64K */
>> +#define scsi_xa_limit_16b    XA_LIMIT(0, USHRT_MAX)
>> +
>>   struct scsi_host_template {
>>       struct module *module;
>>       const char *name;
>> @@ -233,7 +237,7 @@ struct scsi_host_template {
>>        * If a host has the ability to discover targets on its own instead
>>        * of scanning the entire bus, it can fill in this function and
>>        * call scsi_scan_host().  This function will be called periodically
>> -     * until it returns 1 with the scsi_host and the elapsed time of
>> +     * until it returns 1 with the Scsi_Host and the elapsed time of
>>        * the scan in jiffies.
>>        *
>>        * Status: OPTIONAL
>> @@ -520,9 +524,9 @@ struct Scsi_Host {
>>        * their __ prefixed variants with the lock held. NEVER
>>        * access this list directly from a driver.
>>        */
>> -    struct list_head    __devices;
>> -    struct list_head    __targets;
>> -
>> +    struct xarray        __devices;    /* array of scsi_debug objs */
>> +    struct xarray        __targets;    /* array of scsi_target objs */
>> +
>>       struct list_head    starved_list;
>>       spinlock_t        default_lock;
>> diff --git a/include/scsi/scsi_transport.h b/include/scsi/scsi_transport.h
>> index a0458bda3148..3756b264809a 100644
>> --- a/include/scsi/scsi_transport.h
>> +++ b/include/scsi/scsi_transport.h
>> @@ -70,7 +70,8 @@ scsi_transport_reserve_device(struct scsi_transport_template 
>> * t, int space)
>>   static inline void *
>>   scsi_transport_target_data(struct scsi_target *starget)
>>   {
>> -    struct Scsi_Host *shost = dev_to_shost(&starget->dev);
>> +    struct Scsi_Host *shost = starg_to_shost(starget);
>> +
>>       return (u8 *)starget->starget_data
>>           + shost->transportt->target_private_offset;
>>
> Cheers,

Thanks for the review.

Doug Gilbert


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 3/6] scsi: xarray mark sdev_del state
  2020-05-25  7:00   ` Hannes Reinecke
@ 2020-05-25 16:47     ` Douglas Gilbert
  0 siblings, 0 replies; 25+ messages in thread
From: Douglas Gilbert @ 2020-05-25 16:47 UTC (permalink / raw)
  To: Hannes Reinecke, linux-scsi; +Cc: martin.petersen, jejb

On 2020-05-25 3:00 a.m., Hannes Reinecke wrote:
> On 5/24/20 5:58 PM, Douglas Gilbert wrote:
>> One of the features of an xarray is the ability to mark the pointers
>> that it holds. Sadly there are only two marks available per xarray so
>> it is not possible to map marks to the sdev_state enumeration which
>> holds 9 states. Since several loops skip sdev_s in SDEV_DEL state
>> then its negation was chosen as one mark: SCSI_XA_NON_SDEV_DEL.
>>
>> To support this mark a new iterator macro:
>>    shost_for_each_non_del_device(sdev, shost)
>> has been introduced plus a helper function:
>>     __scsi_iterate_non_del_devices(shost, sdev).
>> The new iterator macro is used once (in scsi_scan.c) and the new
>> mark is used directly in three other loops. The machinery to support
>> the mark is placed in a helper (scsi_adjust_non_sdev_del_mark())
>> called mainly by scsi_device_set_state(). Other locations that set
>> sdev_state were checked and some additional calls to that helper
>> were added.
>>
>> Following a complaint from 'make W=1' the formal name of a parameter
>> to both starget_for_each_device() and __starget_for_each_device()
>> was changed from starg to starget.
>>
>> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
>> ---
>>   drivers/scsi/scsi.c        | 49 +++++++++++++++++++++++++++++++++-----
>>   drivers/scsi/scsi_lib.c    | 34 ++++++++++++++++++++++++--
>>   drivers/scsi/scsi_scan.c   | 14 +++++------
>>   drivers/scsi/scsi_sysfs.c  | 19 +++++++++------
>>   include/scsi/scsi_device.h | 19 +++++++++++++++
>>   include/scsi/scsi_host.h   |  2 ++
>>   6 files changed, 114 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>> index aa3240f7aed9..0fb650aebcfb 100644
>> --- a/drivers/scsi/scsi.c
>> +++ b/drivers/scsi/scsi.c
>> @@ -588,6 +588,45 @@ struct scsi_device *__scsi_iterate_devices(struct 
>> Scsi_Host *shost,
>>   }
>>   EXPORT_SYMBOL(__scsi_iterate_devices);
>> +/* helper for shost_for_each_non_del_device, see that for documentation */
>> +struct scsi_device *__scsi_iterate_non_del_devices(struct Scsi_Host *shost,
>> +                           struct scsi_device *prev)
>> +{
>> +    struct scsi_device *next = prev;
>> +    unsigned long flags, l_idx;
>> +
>> +    spin_lock_irqsave(shost->host_lock, flags);
>> +    if (xa_empty(&shost->__devices)) {
>> +        next = NULL;
>> +        goto unlock;
>> +    }
>> +    do {
>> +        if (!next) {    /* get first element iff first iteration */
>> +            l_idx = 0;
>> +            next = xa_find(&shost->__devices, &l_idx,
>> +                       scsi_xa_limit_16b.max,
>> +                       SCSI_XA_NON_SDEV_DEL);
>> +        } else {
>> +            l_idx = next->sh_idx,
>> +            next = xa_find_after(&shost->__devices, &l_idx,
>> +                         scsi_xa_limit_16b.max,
>> +                         SCSI_XA_NON_SDEV_DEL);
>> +        }
>> +        if (next) {
>> +            /* skip devices that we can't get a reference to */
>> +            if (!scsi_device_get(next))
>> +                break;
>> +        }
>> +    } while (next);
>> +unlock:
>> +    spin_unlock_irqrestore(shost->host_lock, flags);
>> +
>> +    if (prev)
>> +        scsi_device_put(prev);
>> +    return next;
>> +}
>> +EXPORT_SYMBOL(__scsi_iterate_non_del_devices);
>> +
>>   /* helper for starg_for_each_device, see that for documentation */
>>   struct scsi_device *__scsi_target_iterate_devices(struct scsi_target *starg,
>>                             struct scsi_device *prev)
>> @@ -695,9 +734,8 @@ struct scsi_device *__scsi_device_lookup_by_target(struct 
>> scsi_target *starget,
>>       unsigned long l_idx;
>>       struct scsi_device *sdev;
>> -    xa_for_each(&starget->devices, l_idx, sdev) {
>> -        if (sdev->sdev_state == SDEV_DEL)
>> -            continue;
>> +    xa_for_each_marked(&starget->devices, l_idx, sdev,
>> +               SCSI_XA_NON_SDEV_DEL) {
>>           if (sdev->lun ==lun)
>>               return sdev;
>>       }
>> @@ -754,9 +792,8 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host 
>> *shost,
>>       unsigned long l_idx;
>>       struct scsi_device *sdev;
>> -    xa_for_each(&shost->__devices, l_idx, sdev) {
>> -        if (sdev->sdev_state == SDEV_DEL)
>> -            continue;
>> +    xa_for_each_marked(&shost->__devices, l_idx, sdev,
>> +               SCSI_XA_NON_SDEV_DEL) {
>>           if (sdev->channel == channel && sdev->id == id &&
>>                   sdev->lun ==lun)
>>               return sdev;
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 68df68f54fc8..f484bf2b5d7d 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -2217,6 +2217,27 @@ scsi_test_unit_ready(struct scsi_device *sdev, int 
>> timeout, int retries,
>>   }
>>   EXPORT_SYMBOL(scsi_test_unit_ready);
>> +/* Assume host_lock taken and that xahp->xa_lock is the same lock. */
>> +static inline void
>> +scsi_adjust_non_sdev_del_mark(struct scsi_device *sdev,
>> +                  enum scsi_device_state old_state,
>> +                  enum scsi_device_state new_state)
>> +{
>> +    struct xarray *xatp =
>> +            &to_scsi_target(sdev->sdev_gendev.parent)->devices;
>> +    struct xarray *xahp = &sdev->host->__devices;
>> +
>> +    if (old_state == new_state)
>> +        return;
>> +    if (old_state == SDEV_DEL) {
>> +        xa_set_mark(xatp, sdev->starg_idx, SCSI_XA_NON_SDEV_DEL);
>> +        xa_set_mark(xahp, sdev->sh_idx, SCSI_XA_NON_SDEV_DEL);
>> +    } else if (new_state == SDEV_DEL) {
>> +        xa_clear_mark(xatp, sdev->starg_idx, SCSI_XA_NON_SDEV_DEL);
>> +        xa_clear_mark(xahp, sdev->sh_idx, SCSI_XA_NON_SDEV_DEL);
>> +    }
>> +}
>> +
>>   /**
>>    *    scsi_device_set_state - Take the given device through the device state 
>> model.
>>    *    @sdev:    scsi device to change the state of.
>> @@ -2330,6 +2351,8 @@ scsi_device_set_state(struct scsi_device *sdev, enum 
>> scsi_device_state state)
>>       }
>>       sdev->offline_already = false;
>> +    if (unlikely(oldstate == SDEV_DEL || state == SDEV_DEL))
>> +        scsi_adjust_non_sdev_del_mark(sdev, oldstate, state);
>>       sdev->sdev_state = state;
>>       return 0;
>> @@ -2731,14 +2754,21 @@ int scsi_internal_device_unblock_nowait(struct 
>> scsi_device *sdev,
>>       switch (sdev->sdev_state) {
>>       case SDEV_BLOCK:
>>       case SDEV_TRANSPORT_OFFLINE:
>> +        if (unlikely(new_state == SDEV_DEL))
>> +            scsi_adjust_non_sdev_del_mark(sdev, sdev->sdev_state,
>> +                              SDEV_DEL);
>>           sdev->sdev_state = new_state;
>>           break;
>>       case SDEV_CREATED_BLOCK:
>>           if (new_state == SDEV_TRANSPORT_OFFLINE ||
>> -            new_state == SDEV_OFFLINE)
>> +            new_state == SDEV_OFFLINE) {
>>               sdev->sdev_state = new_state;
>> -        else
>> +        } else {
>> +            if (unlikely(new_state == SDEV_DEL))
>> +                scsi_adjust_non_sdev_del_mark
>> +                    (sdev, sdev->sdev_state, SDEV_DEL);
>>               sdev->sdev_state = SDEV_CREATED;
>> +        }
>>           break;
>>       case SDEV_CANCEL:
>>       case SDEV_OFFLINE:
>> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
>> index 72a0064a879a..f5a5e48ca5ae 100644
>> --- a/drivers/scsi/scsi_scan.c
>> +++ b/drivers/scsi/scsi_scan.c
>> @@ -280,7 +280,7 @@ static struct scsi_device *scsi_alloc_sdev(struct 
>> scsi_target *starget,
>>       scsi_change_queue_depth(sdev, sdev->host->cmd_per_lun ?
>>                       sdev->host->cmd_per_lun : 1);
>> -    scsi_sysfs_device_initialize(sdev);
>> +    scsi_sysfs_device_initialize(sdev);    /* marked as non_sdev_del */
>>       if (shost->hostt->slave_alloc) {
>>           ret = shost->hostt->slave_alloc(sdev);
>> @@ -1708,10 +1708,9 @@ int scsi_scan_host_selected(struct Scsi_Host *shost, 
>> unsigned int channel,
>>   static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
>>   {
>>       struct scsi_device *sdev;
>> -    shost_for_each_device(sdev, shost) {
>> -        /* target removed before the device could be added */
>> -        if (sdev->sdev_state == SDEV_DEL)
>> -            continue;
>> +
>> +    /* target may be removed before devices could be added */
>> +    shost_for_each_non_del_device(sdev, shost) {
>>           /* If device is already visible, skip adding it to sysfs */
>>           if (sdev->is_visible)
>>               continue;
>> @@ -1883,9 +1882,8 @@ void scsi_forget_host(struct Scsi_Host *shost)
>>    restart:
>>       spin_lock_irqsave(shost->host_lock, flags);
>> -    xa_for_each(&shost->__devices, l_idx, sdev) {
>> -        if (sdev->sdev_state == SDEV_DEL)
>> -            continue;
>> +    xa_for_each_marked(&shost->__devices, l_idx, sdev,
>> +               SCSI_XA_NON_SDEV_DEL) {
>>           spin_unlock_irqrestore(shost->host_lock, flags);
>>           __scsi_remove_device(sdev);
>>           goto restart;
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index 4bfcf33139a2..6fdd4648f374 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -1481,15 +1481,15 @@ static void __scsi_remove_target(struct scsi_target 
>> *starget)
>>       spin_lock_irqsave(shost->host_lock, flags);
>>    restart:
>> -    xa_for_each(&starget->devices, l_idx, sdev) {
>> +    xa_for_each_marked(&starget->devices, l_idx, sdev,
>> +               SCSI_XA_NON_SDEV_DEL) {
>>           /*
>>            * We cannot call scsi_device_get() here, as
>>            * we might've been called from rmmod() causing
>>            * scsi_device_get() to fail the module_is_live()
>>            * check.
>>            */
>> -        if (sdev->sdev_state == SDEV_DEL ||
>> -            sdev->sdev_state == SDEV_CANCEL ||
>> +        if (sdev->sdev_state == SDEV_CANCEL ||
>>               !get_device(&sdev->sdev_gendev))
>>               continue;
>>           spin_unlock_irqrestore(shost->host_lock, flags);
>> @@ -1621,16 +1621,21 @@ void scsi_sysfs_device_initialize(struct scsi_device 
>> *sdev)
>>       /* XARRAY: might add in hole */
>>       res = xa_alloc(&starget->devices, &u_idx, sdev, scsi_xa_limit_16b,
>>                  GFP_ATOMIC);
>> -    if (res < 0)
>> +    if (res < 0) {
>>           pr_err("%s: xa_alloc 1 failure errno=%d\n", __func__, -res);
>> -    else
>> +    } else {
>>           sdev->starg_idx = u_idx;
>> +        xa_set_mark(&starget->devices, u_idx, SCSI_XA_NON_SDEV_DEL);
>> +    }
>>       res = xa_alloc(&shost->__devices, &u_idx, sdev, scsi_xa_limit_16b,
>>                  GFP_ATOMIC);
>> -    if (res < 0)
>> +    if (res < 0) {
>>           pr_err("%s: xa_alloc 2 failure errno=%d\n", __func__, -res);
>> -    else
>> +    } else {
>>           sdev->sh_idx = u_idx;
>> +        /* already have host_lock which is shost->__devices.xa_lock */
>> +        xa_set_mark(&shost->__devices, u_idx, SCSI_XA_NON_SDEV_DEL);
>> +    }
>>       spin_unlock_irqrestore(shost->host_lock, flags);
>>       /*
>>        * device can now only be removed via __scsi_remove_device() so hold
>> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
>> index 4219b8d1b94c..23bf686cbabc 100644
>> --- a/include/scsi/scsi_device.h
>> +++ b/include/scsi/scsi_device.h
>> @@ -372,6 +372,9 @@ extern void __starget_for_each_device(struct scsi_target 
>> *, void *,
>>   /* only exposed to implement shost_for_each_device */
>>   extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
>>                             struct scsi_device *);
>> +/* only exposed to implement shost_for_each_non_del_device */
>> +extern struct scsi_device *__scsi_iterate_non_del_devices
>> +            (struct Scsi_Host *shost, struct scsi_device *sdev);
>>   /* only exposed to implement starg_for_each_device */
>>   extern struct scsi_device *__scsi_target_iterate_devices
>>                       (struct scsi_target *starg,
>> @@ -391,6 +394,22 @@ extern struct scsi_device *__scsi_target_iterate_devices
>>            (sdev); \
>>            (sdev) = __scsi_iterate_devices((shost), (sdev)))
>> +/**
>> + * shost_for_each_non_del_device - iterate over those scsi_devices that do not
>> + *                   have the SDEV_DEL state and are associated
>> + *                   with given host
>> + * @sdev: the &struct scsi_device to use as a cursor
>> + * @shost: the &struct Scsi_Host to iterate over
>> + *
>> + * Iterator that returns each device attached to @shost.  This loop
>> + * takes a reference on each device and releases it at the end.  If
>> + * you break out of the loop, you must call scsi_device_put(sdev).
>> + */
>> +#define shost_for_each_non_del_device(sdev, shost) \
>> +    for ((sdev) = __scsi_iterate_non_del_devices((shost), NULL); \
>> +         (sdev); \
>> +         (sdev) = __scsi_iterate_non_del_devices((shost), (sdev)))
>> +
>>   /* helper for __shost_for_each_device */
>>   static inline struct scsi_device *__scsi_h2d_1st_it(struct Scsi_Host *shost)
>>   {
>> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
>> index e6386f3d6de1..0e94b1feb8e9 100644
>> --- a/include/scsi/scsi_host.h
>> +++ b/include/scsi/scsi_host.h
>> @@ -33,6 +33,8 @@ struct scsi_transport_template;
>>   /* XARRAY: this limits number of devices (LUs) in host to 64K */
>>   #define scsi_xa_limit_16b    XA_LIMIT(0, USHRT_MAX)
>> +#define SCSI_XA_NON_SDEV_DEL XA_MARK_1
>> +
>>   struct scsi_host_template {
>>       struct module *module;
>>       const char *name;
>>
> Hmm.
> Nice idea, yet I'm not sure if it's worth the hassle.
> We still have to check additional conditions during the iterators;
> saving just one condition is probably not worth it.

Again, this was proof-of-concept. I believe it is useful. In this
case, the mark allows the code to know that a sdev is in SDEV_DELETE
state _without_ derefencing sdev. And that makes it inherently safer.

Also showing LLD coders how to use marks on xarrays and leaving at least
one unused might give them ideas for improving their code. For example
a mark for every sdev that is "in-flight" might make for a very
useful sub-list.


And if I understand xarrays correctly, the concept of:
     pointer_s[index]
can be replaced by:
     <32 or 64 set of bit_flags>_s[index]

with the restriction that <32 or 64 set of bit_flags> can't be 0. Also
I think some of those 32 or 64 bit positions are used, so deduct a few.
That way the collector (shost or starget) could hold the target or
LU states (e.g. replacing the need for an 'enum scsi_device_state'
member in sdev objects).

Doug Gilbert

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-25 16:30     ` Douglas Gilbert
@ 2020-05-25 17:40       ` Matthew Wilcox
  2020-05-26  2:01         ` Douglas Gilbert
  2020-05-26  6:21         ` Hannes Reinecke
  0 siblings, 2 replies; 25+ messages in thread
From: Matthew Wilcox @ 2020-05-25 17:40 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: Hannes Reinecke, linux-scsi, martin.petersen, jejb

On Mon, May 25, 2020 at 12:30:52PM -0400, Douglas Gilbert wrote:
> On 2020-05-25 2:57 a.m., Hannes Reinecke wrote:
> > On 5/24/20 5:58 PM, Douglas Gilbert wrote:
> > > +++ b/drivers/scsi/scsi.c
> > > @@ -554,19 +554,32 @@ EXPORT_SYMBOL(scsi_device_put);
> > >   struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
> > >                          struct scsi_device *prev)
> > >   {
> > > -    struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
> > > -    struct scsi_device *next = NULL;
> > > -    unsigned long flags;
> > > +    struct scsi_device *next = prev;
> > > +    unsigned long flags, l_idx;
> > >       spin_lock_irqsave(shost->host_lock, flags);
> > > -    while (list->next != &shost->__devices) {
> > > -        next = list_entry(list->next, struct scsi_device, siblings);
> > > -        /* skip devices that we can't get a reference to */
> > > -        if (!scsi_device_get(next))
> > > -            break;
> > > +    if (xa_empty(&shost->__devices)) {
> > >           next = NULL;
> > > -        list = list->next;
> > > +        goto unlock;
> > >       }

You don't need this stanza.  xa_find() will do the right thing (ie return
NULL) with an empty array.

> > > +    do {
> > > +        if (!next) {    /* get first element iff first iteration */
> > > +            l_idx = 0;
> > > +            next = xa_find(&shost->__devices, &l_idx,
> > > +                       scsi_xa_limit_16b.max, XA_PRESENT);

You don't need to use a 16-bit limit here; the XArray code will stop after
the end of the array, and the only thing you might save is scanning the top
node unnecessarily far.  To get there, you'd have to have an index >= 2^12,
and the additional limit would mean that you'd scan the first 16 entries of
the top node instead of all 64.  Kind of pales into insignificance compared
to actually walking all 2^12 elements up to that point ;-)

> > > @@ -391,8 +391,8 @@ static void scsi_single_lun_run(struct
> > > scsi_device *current_sdev)
> > >       spin_lock_irqsave(shost->host_lock, flags);
> > >       if (starget->starget_sdev_user)
> > >           goto out;
> > > -    list_for_each_entry_safe(sdev, tmp, &starget->devices,
> > > -            same_target_siblings) {
> > > +    /* XARRAY: was list_for_each_entry_safe(); is this safe ? */
> > > +    xa_for_each(&starget->devices, l_idx, sdev) {
> > >           if (sdev == current_sdev)
> > >               continue;
> > >           if (scsi_device_get(sdev))
> > 
> > Frankly, I don't know why we're using the _safe variant here.
> > Typically the _safe variant is used to protect against removal
> > (ie if a list element is removed while iterating over the list),
> > but I can't really imagine why starting I/O on the device would
> > cause one entry to be deleted...
> > James?

... to add, xa_for_each() is safe against deletion while walking.
Or addition while walking.

> > > @@ -234,8 +234,9 @@ static struct scsi_device
> > > *scsi_alloc_sdev(struct scsi_target *starget,
> > >       sdev->channel = starget->channel;
> > >       mutex_init(&sdev->state_mutex);
> > >       sdev->sdev_state = SDEV_CREATED;
> > > -    INIT_LIST_HEAD(&sdev->siblings);
> > > -    INIT_LIST_HEAD(&sdev->same_target_siblings);
> > > +    /* XARRAY: INIT_LIST_HEAD()s here on list leaves, why ? */
> > > +    sdev->sh_idx = -1;
> > > +    sdev->starg_idx = -1;        /* for debugging */
> > >       INIT_LIST_HEAD(&sdev->starved_entry);
> > >       INIT_LIST_HEAD(&sdev->event_list);
> > >       spin_lock_init(&sdev->list_lock);
> > 
> > It is good practice to call INIT_LIST_HEAD() on list leaves as then you
> > can call 'list_empty()' to figure out if they are part of a list or not.
> > 
> > But I'm wondering about the 'sh_idx' and the 'starg_idx' here.
> > We already have perfectly good indices with the 'lun' and 'id' entries
> > in struct scsi_device, which (incidentally) _need_ to be unique.
> > 
> > So why not use them?
> 
> Ah now we have something to discuss and perhaps Willy can help
> 
> At the target level we have two indexes:
>    - channel (is 16 bits enough)
>    - target (16 or 32 bits?)
> This could be address to ways:
>    1) introduce struct scsi_channel {int num_chans; scsi_target *t1p;}
>    2) combine channnel and target into one 32 bit integer.
> 
> The xarray type of its index seems to be unsigned long (Willy,
> is that correct?). That is the same size as a pointer, 32 bits
> on a 32 bit machine, 64 bits on a 64 bit machine. That is
> unfortunate, since we can represent the full 64 bit LUN as
> a single index on a 32 bit machine. We could 'hide' that and
> have two xarrays in scsi_device for 32 bit machines.

You aren't the first person to ask about having a 64-bit lookup on
32-bit machines.  Indeed, I remember Hannes asking for a 256 bit lookup
at LinuxCon in Prague.  I have always been reluctant to add such a thing because the XArray uses quite a naive data type underneath.  It works well with
dense arrays but becomes very wasteful of memory for sparse arrays.

My understanding of SCSI-world is that most devices have a single
LUN 0.  Most devices that have multiple LUNs number them sequentially.
Some devices have either an internal structure or essentially pick random
LUNs for the devices they expose.

> So to start with, I took the easy approach, the one that most
> resembles the what include/linux/list.h has been using. There
> is also a (overly ?) complex locking scheme that I really did
> not want to upset.

I think this is currently the best way to use the XArray.  I do have
a project in the works to replace the data structure implementing the
XArray API with a better one, but that project has been underway for
almost two years now, so I would have a hard time suggesting that you
wait for it to be ready.

> Those num_chans, t1p members that I put in scsi_channel is just
> to show the xarray lookup can be bypassed when num_chans=1
> which is very likely. Same approach could be taken in scsi_target
> to scsi_device.

If you do only have a single entry at 0, the XArray is very efficient.
It simplifies to a single pointer.  You don't need to optimise for that
case yourself ;-)

> > > @@ -326,6 +331,7 @@ static void scsi_target_dev_release(struct device *dev)
> > >       struct device *parent = dev->parent;
> > >       struct scsi_target *starget = to_scsi_target(dev);
> > > +    xa_destroy(&starget->devices);

By the way, you don't have to call xa_destroy() if you know the array
is empty.  The IDR used to have that unfortunate requirement, and about a
third of users blissfully ignored it, leaking uncounted amounts of memory.

> > > @@ -445,7 +455,15 @@ static struct scsi_target
> > > *scsi_alloc_target(struct device *parent,
> > >       if (found_target)
> > >           goto found;
> > > -    list_add_tail(&starget->siblings, &shost->__targets);
> > > +    /* XARRAY: was list_add_tail(); may get hole in xarray or tail */
> > > +    res = xa_alloc(&shost->__targets, &u_idx, starget, scsi_xa_limit_16b,
> > > +               GFP_ATOMIC);
> > > +    if (res < 0) {
> > > +        spin_unlock_irqrestore(shost->host_lock, flags);
> > > +        pr_err("%s: xa_alloc failure errno=%d\n", __func__, -res);
> > > +        return NULL;
> > > +    }
> > > +    starget->sh_idx = u_idx;
> > >       spin_unlock_irqrestore(shost->host_lock, flags);
> > >       /* allocate and add */
> > >       transport_setup_device(dev);
> > 
> > 'scsi_xa_limit_16b' ?
> > What's that for?
> 
> See include/linux/xarray.h xa_limit_32b
> 
> This is a 16 bit equivalent. Not sure if that is too restrictive.
> Anyway, the xa_alloc() call amongst others needs an upper and
> lower bound.

I'm happy to have an xa_limit_16b in xarray.h.  I estimate it would have
about 5-10 users across the entire kernel.

> > And, again, if we were to use the target id as an index we could be
> > using 'xa_store' and the problem would be solved...

It may make sense to use an allocating XArray for LUNs and targets, but
use the channel ID directly.  It depends how structured these things
are in SCSI, and I'll admit I haven't been keeping up with how SCSI
numbers things on a modern system (I suspect 99.9% of SCSI devices
are USB keys with one host, one channel, one target, one LUN, but
that the 0.1% of SCSI devices on Linux systems are responsible for 80%
of the performance end of the market)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-25 17:40       ` Matthew Wilcox
@ 2020-05-26  2:01         ` Douglas Gilbert
  2020-05-26  3:01           ` Matthew Wilcox
  2020-05-26  7:24           ` Hannes Reinecke
  2020-05-26  6:21         ` Hannes Reinecke
  1 sibling, 2 replies; 25+ messages in thread
From: Douglas Gilbert @ 2020-05-26  2:01 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Hannes Reinecke, linux-scsi, martin.petersen, jejb

[-- Attachment #1: Type: text/plain, Size: 11364 bytes --]

On 2020-05-25 1:40 p.m., Matthew Wilcox wrote:
> On Mon, May 25, 2020 at 12:30:52PM -0400, Douglas Gilbert wrote:
>> On 2020-05-25 2:57 a.m., Hannes Reinecke wrote:
>>> On 5/24/20 5:58 PM, Douglas Gilbert wrote:
>>>> +++ b/drivers/scsi/scsi.c
>>>> @@ -554,19 +554,32 @@ EXPORT_SYMBOL(scsi_device_put);
>>>>    struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
>>>>                           struct scsi_device *prev)
>>>>    {
>>>> -    struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
>>>> -    struct scsi_device *next = NULL;
>>>> -    unsigned long flags;
>>>> +    struct scsi_device *next = prev;
>>>> +    unsigned long flags, l_idx;
>>>>        spin_lock_irqsave(shost->host_lock, flags);
>>>> -    while (list->next != &shost->__devices) {
>>>> -        next = list_entry(list->next, struct scsi_device, siblings);
>>>> -        /* skip devices that we can't get a reference to */
>>>> -        if (!scsi_device_get(next))
>>>> -            break;
>>>> +    if (xa_empty(&shost->__devices)) {
>>>>            next = NULL;
>>>> -        list = list->next;
>>>> +        goto unlock;
>>>>        }
> 
> You don't need this stanza.  xa_find() will do the right thing (ie return
> NULL) with an empty array.

Okay.

When using xarray to store a bitset (e.g. say the 9 states in:
    enum scsi_device_state   [include/scsi/scsi_device.h]
as 2**0, 2**1, 2**2 .... 2**8), is there a mask to indicate which bit
positions are used internally by xarray? Or are the full 32 bits available,
other than the special value 0 aka NULL?

>>>> +    do {
>>>> +        if (!next) {    /* get first element iff first iteration */
>>>> +            l_idx = 0;
>>>> +            next = xa_find(&shost->__devices, &l_idx,
>>>> +                       scsi_xa_limit_16b.max, XA_PRESENT);
> 
> You don't need to use a 16-bit limit here; the XArray code will stop after
> the end of the array, and the only thing you might save is scanning the top
> node unnecessarily far.  To get there, you'd have to have an index >= 2^12,
> and the additional limit would mean that you'd scan the first 16 entries of
> the top node instead of all 64.  Kind of pales into insignificance compared
> to actually walking all 2^12 elements up to that point ;-)
> 

If we go to using parts of the hctl tuple (host, channel, target_id, lun)
as xarray indexes, then large numbers can occur in unstable systems with
say the same group of targets going offline and then back online  when,
for example, a power supply trips out. The number of elements will still
be relatively small, say well less than 2**16.

>>>> @@ -391,8 +391,8 @@ static void scsi_single_lun_run(struct
>>>> scsi_device *current_sdev)
>>>>        spin_lock_irqsave(shost->host_lock, flags);
>>>>        if (starget->starget_sdev_user)
>>>>            goto out;
>>>> -    list_for_each_entry_safe(sdev, tmp, &starget->devices,
>>>> -            same_target_siblings) {
>>>> +    /* XARRAY: was list_for_each_entry_safe(); is this safe ? */
>>>> +    xa_for_each(&starget->devices, l_idx, sdev) {
>>>>            if (sdev == current_sdev)
>>>>                continue;
>>>>            if (scsi_device_get(sdev))
>>>
>>> Frankly, I don't know why we're using the _safe variant here.
>>> Typically the _safe variant is used to protect against removal
>>> (ie if a list element is removed while iterating over the list),
>>> but I can't really imagine why starting I/O on the device would
>>> cause one entry to be deleted...
>>> James?
> 
> ... to add, xa_for_each() is safe against deletion while walking.
> Or addition while walking.
> 
>>>> @@ -234,8 +234,9 @@ static struct scsi_device
>>>> *scsi_alloc_sdev(struct scsi_target *starget,
>>>>        sdev->channel = starget->channel;
>>>>        mutex_init(&sdev->state_mutex);
>>>>        sdev->sdev_state = SDEV_CREATED;
>>>> -    INIT_LIST_HEAD(&sdev->siblings);
>>>> -    INIT_LIST_HEAD(&sdev->same_target_siblings);
>>>> +    /* XARRAY: INIT_LIST_HEAD()s here on list leaves, why ? */
>>>> +    sdev->sh_idx = -1;
>>>> +    sdev->starg_idx = -1;        /* for debugging */
>>>>        INIT_LIST_HEAD(&sdev->starved_entry);
>>>>        INIT_LIST_HEAD(&sdev->event_list);
>>>>        spin_lock_init(&sdev->list_lock);
>>>
>>> It is good practice to call INIT_LIST_HEAD() on list leaves as then you
>>> can call 'list_empty()' to figure out if they are part of a list or not.
>>>
>>> But I'm wondering about the 'sh_idx' and the 'starg_idx' here.
>>> We already have perfectly good indices with the 'lun' and 'id' entries
>>> in struct scsi_device, which (incidentally) _need_ to be unique.
>>>
>>> So why not use them?
>>
>> Ah now we have something to discuss and perhaps Willy can help
>>
>> At the target level we have two indexes:
>>     - channel (is 16 bits enough)
>>     - target (16 or 32 bits?)
>> This could be address to ways:
>>     1) introduce struct scsi_channel {int num_chans; scsi_target *t1p;}
>>     2) combine channnel and target into one 32 bit integer.
>>
>> The xarray type of its index seems to be unsigned long (Willy,
>> is that correct?). That is the same size as a pointer, 32 bits
>> on a 32 bit machine, 64 bits on a 64 bit machine. That is
>> unfortunate, since we can represent the full 64 bit LUN as
>> a single index on a 32 bit machine. We could 'hide' that and
>> have two xarrays in scsi_device for 32 bit machines.
> 
> You aren't the first person to ask about having a 64-bit lookup on
> 32-bit machines.  Indeed, I remember Hannes asking for a 256 bit lookup
> at LinuxCon in Prague.  I have always been reluctant to add such a thing because the XArray uses quite a naive data type underneath.  It works well with
> dense arrays but becomes very wasteful of memory for sparse arrays.
> 
> My understanding of SCSI-world is that most devices have a single
> LUN 0.  Most devices that have multiple LUNs number them sequentially.
> Some devices have either an internal structure or essentially pick random
> LUNs for the devices they expose.

Yes an no. There are now all sorts of weird patterns hiding in the
64 bit LUNs. See chapter 4.7 in sam6r05.pdf over at www.t10.org .
And arguably there is a 65th bit with the LU_CONG bit in an INQUIRY
standard response changing the interpretation of the 64 bit LUN.

>> So to start with, I took the easy approach, the one that most
>> resembles the what include/linux/list.h has been using. There
>> is also a (overly ?) complex locking scheme that I really did
>> not want to upset.
> 
> I think this is currently the best way to use the XArray.  I do have
> a project in the works to replace the data structure implementing the
> XArray API with a better one, but that project has been underway for
> almost two years now, so I would have a hard time suggesting that you
> wait for it to be ready.
> 
>> Those num_chans, t1p members that I put in scsi_channel is just
>> to show the xarray lookup can be bypassed when num_chans=1
>> which is very likely. Same approach could be taken in scsi_target
>> to scsi_device.
> 
> If you do only have a single entry at 0, the XArray is very efficient.
> It simplifies to a single pointer.  You don't need to optimise for that
> case yourself ;-)

That is good to know.

>>>> @@ -326,6 +331,7 @@ static void scsi_target_dev_release(struct device *dev)
>>>>        struct device *parent = dev->parent;
>>>>        struct scsi_target *starget = to_scsi_target(dev);
>>>> +    xa_destroy(&starget->devices);
> 
> By the way, you don't have to call xa_destroy() if you know the array
> is empty.  The IDR used to have that unfortunate requirement, and about a
> third of users blissfully ignored it, leaking uncounted amounts of memory.
> 
>>>> @@ -445,7 +455,15 @@ static struct scsi_target
>>>> *scsi_alloc_target(struct device *parent,
>>>>        if (found_target)
>>>>            goto found;
>>>> -    list_add_tail(&starget->siblings, &shost->__targets);
>>>> +    /* XARRAY: was list_add_tail(); may get hole in xarray or tail */
>>>> +    res = xa_alloc(&shost->__targets, &u_idx, starget, scsi_xa_limit_16b,
>>>> +               GFP_ATOMIC);
>>>> +    if (res < 0) {
>>>> +        spin_unlock_irqrestore(shost->host_lock, flags);
>>>> +        pr_err("%s: xa_alloc failure errno=%d\n", __func__, -res);
>>>> +        return NULL;
>>>> +    }
>>>> +    starget->sh_idx = u_idx;
>>>>        spin_unlock_irqrestore(shost->host_lock, flags);
>>>>        /* allocate and add */
>>>>        transport_setup_device(dev);
>>>
>>> 'scsi_xa_limit_16b' ?
>>> What's that for?
>>
>> See include/linux/xarray.h xa_limit_32b
>>
>> This is a 16 bit equivalent. Not sure if that is too restrictive.
>> Anyway, the xa_alloc() call amongst others needs an upper and
>> lower bound.
> 
> I'm happy to have an xa_limit_16b in xarray.h.  I estimate it would have
> about 5-10 users across the entire kernel.

No, that was just experimental.

>>> And, again, if we were to use the target id as an index we could be
>>> using 'xa_store' and the problem would be solved...
> 
> It may make sense to use an allocating XArray for LUNs and targets, but
> use the channel ID directly.  It depends how structured these things
> are in SCSI, and I'll admit I haven't been keeping up with how SCSI
> numbers things on a modern system (I suspect 99.9% of SCSI devices
> are USB keys with one host, one channel, one target, one LUN, but
> that the 0.1% of SCSI devices on Linux systems are responsible for 80%
> of the performance end of the market)

Attached is a UML (like) class diagram for folks to kick around. It handles
u32 values for the host, channel and target_id level and u64 at the lun
(leaf) level, via a bit of hackery, even on 32 bit machines. So the object
tree is built on the diagonal, with the multiplicity coming from xarray
members (attributes). The arrowed lines show inheritance, with
struct device being the base class aka superclass (one could argue
otherwise). The "transport_class_<n>"s are added as needed by SCSI
transports such as SAS and iSCSI.

And it uses native xarray indexes up to 2**32 - 1. So it meets all the
constraints we have currently.

Plus, based on the answers I get from above questions, in the
scsi_lu_low32 class I would like to add another xarray, that holds
bitsets rather than pointers. Each bitset holds amongst other things
the 9 scsi_device states. That way you could iterate through all
LU devices looking for an uncommon state _without_ dereferencing
pointers to any sdev_s that are bypassed. And if xarray is cache
efficient in storing its data, then it would be much faster. The
same thing (i.e. bitset xarray) could be done in scsi_channel, for
starget states.

Finally, for efficient xarray use, every object needs to be able to
get to its parent (or at least, its "collection") quickly as that is
where all xarray operations take place. And by efficient I do _not_
mean calling dev_to_shost(starget->dev.parent) when we built the
object hierarchy and _know_ where its "shost" is.

Doug Gilbert

[-- Attachment #2: scsi_ml_cd1.pdf --]
[-- Type: application/pdf, Size: 20482 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-26  2:01         ` Douglas Gilbert
@ 2020-05-26  3:01           ` Matthew Wilcox
  2020-05-26  7:24           ` Hannes Reinecke
  1 sibling, 0 replies; 25+ messages in thread
From: Matthew Wilcox @ 2020-05-26  3:01 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: Hannes Reinecke, linux-scsi, martin.petersen, jejb

On Mon, May 25, 2020 at 10:01:59PM -0400, Douglas Gilbert wrote:
> When using xarray to store a bitset (e.g. say the 9 states in:
>    enum scsi_device_state   [include/scsi/scsi_device.h]
> as 2**0, 2**1, 2**2 .... 2**8), is there a mask to indicate which bit
> positions are used internally by xarray? Or are the full 32 bits available,
> other than the special value 0 aka NULL?

erm ... not quite sure what you mean by 'bitset'.  Are you saying that
you want to store a number between [0..511] at a particular offset in
the array?  If so, you want xa_mk_value(N) to turn a value into an xarray
entry.  When you load it, you can check it's a value with xa_is_value()
and convert it back to an integer with xa_to_value().  You han have up
to 31/63 bits that way, depending on sizeof(void *).

> Plus, based on the answers I get from above questions, in the
> scsi_lu_low32 class I would like to add another xarray, that holds
> bitsets rather than pointers. Each bitset holds amongst other things
> the 9 scsi_device states. That way you could iterate through all
> LU devices looking for an uncommon state _without_ dereferencing
> pointers to any sdev_s that are bypassed. And if xarray is cache
> efficient in storing its data, then it would be much faster. The
> same thing (i.e. bitset xarray) could be done in scsi_channel, for
> starget states.

Sounds like you really want more search marks.  This is a relatively common request and is disappointingly hard to do efficiently with the current XArray.
It's a request that should be solved with the new data structure, but
see earlier comments on when you might see that.

In the meantime, you might consider using an IDA for an efficient
expandable bitmap.  It's built on top of the XArray, but stores only
one bit per index instead of a pointer.  You'd need nine IDAs for nine
bits of state ...



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-25 17:40       ` Matthew Wilcox
  2020-05-26  2:01         ` Douglas Gilbert
@ 2020-05-26  6:21         ` Hannes Reinecke
  2020-05-26 14:27           ` Matthew Wilcox
  1 sibling, 1 reply; 25+ messages in thread
From: Hannes Reinecke @ 2020-05-26  6:21 UTC (permalink / raw)
  To: Matthew Wilcox, Douglas Gilbert; +Cc: linux-scsi, martin.petersen, jejb

On 5/25/20 7:40 PM, Matthew Wilcox wrote:
> On Mon, May 25, 2020 at 12:30:52PM -0400, Douglas Gilbert wrote:
>> On 2020-05-25 2:57 a.m., Hannes Reinecke wrote:
>>> On 5/24/20 5:58 PM, Douglas Gilbert wrote:
>>>> +++ b/drivers/scsi/scsi.c
>>>> @@ -554,19 +554,32 @@ EXPORT_SYMBOL(scsi_device_put);
>>>>    struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
>>>>                           struct scsi_device *prev)
>>>>    {
>>>> -    struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
>>>> -    struct scsi_device *next = NULL;
>>>> -    unsigned long flags;
>>>> +    struct scsi_device *next = prev;
>>>> +    unsigned long flags, l_idx;
>>>>        spin_lock_irqsave(shost->host_lock, flags);
>>>> -    while (list->next != &shost->__devices) {
>>>> -        next = list_entry(list->next, struct scsi_device, siblings);
>>>> -        /* skip devices that we can't get a reference to */
>>>> -        if (!scsi_device_get(next))
>>>> -            break;
>>>> +    if (xa_empty(&shost->__devices)) {
>>>>            next = NULL;
>>>> -        list = list->next;
>>>> +        goto unlock;
>>>>        }
> 
> You don't need this stanza.  xa_find() will do the right thing (ie return
> NULL) with an empty array.
> 
>>>> +    do {
>>>> +        if (!next) {    /* get first element iff first iteration */
>>>> +            l_idx = 0;
>>>> +            next = xa_find(&shost->__devices, &l_idx,
>>>> +                       scsi_xa_limit_16b.max, XA_PRESENT);
> 
> You don't need to use a 16-bit limit here; the XArray code will stop after
> the end of the array, and the only thing you might save is scanning the top
> node unnecessarily far.  To get there, you'd have to have an index >= 2^12,
> and the additional limit would mean that you'd scan the first 16 entries of
> the top node instead of all 64.  Kind of pales into insignificance compared
> to actually walking all 2^12 elements up to that point ;-)
> 
>>>> @@ -391,8 +391,8 @@ static void scsi_single_lun_run(struct
>>>> scsi_device *current_sdev)
>>>>        spin_lock_irqsave(shost->host_lock, flags);
>>>>        if (starget->starget_sdev_user)
>>>>            goto out;
>>>> -    list_for_each_entry_safe(sdev, tmp, &starget->devices,
>>>> -            same_target_siblings) {
>>>> +    /* XARRAY: was list_for_each_entry_safe(); is this safe ? */
>>>> +    xa_for_each(&starget->devices, l_idx, sdev) {
>>>>            if (sdev == current_sdev)
>>>>                continue;
>>>>            if (scsi_device_get(sdev))
>>>
>>> Frankly, I don't know why we're using the _safe variant here.
>>> Typically the _safe variant is used to protect against removal
>>> (ie if a list element is removed while iterating over the list),
>>> but I can't really imagine why starting I/O on the device would
>>> cause one entry to be deleted...
>>> James?
> 
> ... to add, xa_for_each() is safe against deletion while walking.
> Or addition while walking.
> 
>>>> @@ -234,8 +234,9 @@ static struct scsi_device
>>>> *scsi_alloc_sdev(struct scsi_target *starget,
>>>>        sdev->channel = starget->channel;
>>>>        mutex_init(&sdev->state_mutex);
>>>>        sdev->sdev_state = SDEV_CREATED;
>>>> -    INIT_LIST_HEAD(&sdev->siblings);
>>>> -    INIT_LIST_HEAD(&sdev->same_target_siblings);
>>>> +    /* XARRAY: INIT_LIST_HEAD()s here on list leaves, why ? */
>>>> +    sdev->sh_idx = -1;
>>>> +    sdev->starg_idx = -1;        /* for debugging */
>>>>        INIT_LIST_HEAD(&sdev->starved_entry);
>>>>        INIT_LIST_HEAD(&sdev->event_list);
>>>>        spin_lock_init(&sdev->list_lock);
>>>
>>> It is good practice to call INIT_LIST_HEAD() on list leaves as then you
>>> can call 'list_empty()' to figure out if they are part of a list or not.
>>>
>>> But I'm wondering about the 'sh_idx' and the 'starg_idx' here.
>>> We already have perfectly good indices with the 'lun' and 'id' entries
>>> in struct scsi_device, which (incidentally) _need_ to be unique.
>>>
>>> So why not use them?
>>
>> Ah now we have something to discuss and perhaps Willy can help
>>
>> At the target level we have two indexes:
>>     - channel (is 16 bits enough)
>>     - target (16 or 32 bits?)
>> This could be address to ways:
>>     1) introduce struct scsi_channel {int num_chans; scsi_target *t1p;}
>>     2) combine channnel and target into one 32 bit integer.
>>
>> The xarray type of its index seems to be unsigned long (Willy,
>> is that correct?). That is the same size as a pointer, 32 bits
>> on a 32 bit machine, 64 bits on a 64 bit machine. That is
>> unfortunate, since we can represent the full 64 bit LUN as
>> a single index on a 32 bit machine. We could 'hide' that and
>> have two xarrays in scsi_device for 32 bit machines.
> 
> You aren't the first person to ask about having a 64-bit lookup on
> 32-bit machines.  Indeed, I remember Hannes asking for a 256 bit lookup
> at LinuxCon in Prague.  I have always been reluctant to add such a thing
> because the XArray uses quite a naive data type underneath.  It works well with
> dense arrays but becomes very wasteful of memory for sparse arrays.
> 
> My understanding of SCSI-world is that most devices have a single
> LUN 0.  Most devices that have multiple LUNs number them sequentially.
> Some devices have either an internal structure or essentially pick random
> LUNs for the devices they expose.
> 
Not quite. You are correct that most devices have a single LUN 0
(think of libata :-), but those with several LUNs typically
enumerate them. In most cases the enumeration starts at 0 (or 1,
if LUN 0 is used for a management LUN), and reaches up to 256.
Some arrays use a different LUN layout, which means that the top
two bit of the LUN number are set, and possibly some intermediate
numbers, too. But the LUNs themselves are numbered consecutively, too;
it's just at a certain offset.
I've never seen anyone picking LUN numbers at random.

>> So to start with, I took the easy approach, the one that most
>> resembles the what include/linux/list.h has been using. There
>> is also a (overly ?) complex locking scheme that I really did
>> not want to upset.
> 
> I think this is currently the best way to use the XArray.  I do have
> a project in the works to replace the data structure implementing the
> XArray API with a better one, but that project has been underway for
> almost two years now, so I would have a hard time suggesting that you
> wait for it to be ready.
> 
>> Those num_chans, t1p members that I put in scsi_channel is just
>> to show the xarray lookup can be bypassed when num_chans=1
>> which is very likely. Same approach could be taken in scsi_target
>> to scsi_device.
> 
> If you do only have a single entry at 0, the XArray is very efficient.
> It simplifies to a single pointer.  You don't need to optimise for that
> case yourself ;-)
> 
But still, the original question still stands: what would be the most 
efficient way using xarrays here?
We have a four-level hierarchy Host:Channel:Target:LUN
and we need to lookup devices (and, occasinally, targets) per host.
At this time, 'channel' and 'target' are unsigned integer, and
LUNs are 64 bit.
It should be possible to shorten 'channel' and 'target' to 16 bits,
as most driver have a limit well below that anyway.
And as discussed above we might be able to shorten the LUN number
to 32 bits by (ab-)using unused ranges in the LUN structure.
But that still leaves us with two 32 bit numbers.

The current approach from Doug is using an _additional_ index into
the xarray. Which I think it pretty wasteful, seeing that the
HCTL number already _is_ a linear index. So using the HCTL number
as index into the xarray would be the logical choice, only we can't
due to implementation limitations.

>>>> @@ -326,6 +331,7 @@ static void scsi_target_dev_release(struct device *dev)
>>>>        struct device *parent = dev->parent;
>>>>        struct scsi_target *starget = to_scsi_target(dev);
>>>> +    xa_destroy(&starget->devices);
> 
> By the way, you don't have to call xa_destroy() if you know the array
> is empty.  The IDR used to have that unfortunate requirement, and about a
> third of users blissfully ignored it, leaking uncounted amounts of memory.
> 
>>>> @@ -445,7 +455,15 @@ static struct scsi_target
>>>> *scsi_alloc_target(struct device *parent,
>>>>        if (found_target)
>>>>            goto found;
>>>> -    list_add_tail(&starget->siblings, &shost->__targets);
>>>> +    /* XARRAY: was list_add_tail(); may get hole in xarray or tail */
>>>> +    res = xa_alloc(&shost->__targets, &u_idx, starget, scsi_xa_limit_16b,
>>>> +               GFP_ATOMIC);
>>>> +    if (res < 0) {
>>>> +        spin_unlock_irqrestore(shost->host_lock, flags);
>>>> +        pr_err("%s: xa_alloc failure errno=%d\n", __func__, -res);
>>>> +        return NULL;
>>>> +    }
>>>> +    starget->sh_idx = u_idx;
>>>>        spin_unlock_irqrestore(shost->host_lock, flags);
>>>>        /* allocate and add */
>>>>        transport_setup_device(dev);
>>>
>>> 'scsi_xa_limit_16b' ?
>>> What's that for?
>>
>> See include/linux/xarray.h xa_limit_32b
>>
>> This is a 16 bit equivalent. Not sure if that is too restrictive.
>> Anyway, the xa_alloc() call amongst others needs an upper and
>> lower bound.
> 
> I'm happy to have an xa_limit_16b in xarray.h.  I estimate it would have
> about 5-10 users across the entire kernel.
> 
>>> And, again, if we were to use the target id as an index we could be
>>> using 'xa_store' and the problem would be solved...
> 
> It may make sense to use an allocating XArray for LUNs and targets, but
> use the channel ID directly.  It depends how structured these things
> are in SCSI, and I'll admit I haven't been keeping up with how SCSI
> numbers things on a modern system (I suspect 99.9% of SCSI devices
> are USB keys with one host, one channel, one target, one LUN, but
> that the 0.1% of SCSI devices on Linux systems are responsible for 80%
> of the performance end of the market)
> 
I can't really argue here, as my perception is a bit skewed. What with 
95% of my customer calls coming from the latter end of the spectrum ...

Anyway. I'll see if I can come up with a mapping for HCTL numbers
onto xarray.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-26  2:01         ` Douglas Gilbert
  2020-05-26  3:01           ` Matthew Wilcox
@ 2020-05-26  7:24           ` Hannes Reinecke
  1 sibling, 0 replies; 25+ messages in thread
From: Hannes Reinecke @ 2020-05-26  7:24 UTC (permalink / raw)
  To: dgilbert, Matthew Wilcox; +Cc: linux-scsi, martin.petersen, jejb

On 5/26/20 4:01 AM, Douglas Gilbert wrote:
[ .. ]
> 
> Attached is a UML (like) class diagram for folks to kick around. It handles
> u32 values for the host, channel and target_id level and u64 at the lun
> (leaf) level, via a bit of hackery, even on 32 bit machines. So the object
> tree is built on the diagonal, with the multiplicity coming from xarray
> members (attributes). The arrowed lines show inheritance, with
> struct device being the base class aka superclass (one could argue
> otherwise). The "transport_class_<n>"s are added as needed by SCSI
> transports such as SAS and iSCSI.
> 
> And it uses native xarray indexes up to 2**32 - 1. So it meets all the
> constraints we have currently.
> 
> Plus, based on the answers I get from above questions, in the
> scsi_lu_low32 class I would like to add another xarray, that holds
> bitsets rather than pointers. Each bitset holds amongst other things
> the 9 scsi_device states. That way you could iterate through all
> LU devices looking for an uncommon state _without_ dereferencing
> pointers to any sdev_s that are bypassed. And if xarray is cache
> efficient in storing its data, then it would be much faster. The
> same thing (i.e. bitset xarray) could be done in scsi_channel, for
> starget states.
> 
> Finally, for efficient xarray use, every object needs to be able to
> get to its parent (or at least, its "collection") quickly as that is
> where all xarray operations take place. And by efficient I do _not_
> mean calling dev_to_shost(starget->dev.parent) when we built the
> object hierarchy and _know_ where its "shost" is.
> 
I would kill the 'channel' object, and use a combined 'channel':'target' 
index for the scsi target. The 'channel' really is only a number, and
doesn't have an object on its own.
And for the LUN numbers I doubt we need to have two lookup arrays; one
32bit array would be sufficient to hold basically _all_ installations
I've seen so far (I've yet to come across on installation using more 
than two LUN levels).
What we could to is use the index into REPORT LUNs as the array index;
things might shuffle around there if someone remaps devices, but that's
a tough one even nowadays.

Lemme see...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-26  6:21         ` Hannes Reinecke
@ 2020-05-26 14:27           ` Matthew Wilcox
  2020-05-26 17:53             ` Hannes Reinecke
  0 siblings, 1 reply; 25+ messages in thread
From: Matthew Wilcox @ 2020-05-26 14:27 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Douglas Gilbert, linux-scsi, martin.petersen, jejb

On Tue, May 26, 2020 at 08:21:26AM +0200, Hannes Reinecke wrote:
> On 5/25/20 7:40 PM, Matthew Wilcox wrote:
> > You aren't the first person to ask about having a 64-bit lookup on
> > 32-bit machines.  Indeed, I remember Hannes asking for a 256 bit lookup
> > at LinuxCon in Prague.  I have always been reluctant to add such a thing
> > because the XArray uses quite a naive data type underneath.  It works well with
> > dense arrays but becomes very wasteful of memory for sparse arrays.
> > 
> > My understanding of SCSI-world is that most devices have a single
> > LUN 0.  Most devices that have multiple LUNs number them sequentially.
> > Some devices have either an internal structure or essentially pick random
> > LUNs for the devices they expose.
> 
> Not quite. You are correct that most devices have a single LUN 0
> (think of libata :-), but those with several LUNs typically
> enumerate them. In most cases the enumeration starts at 0 (or 1,
> if LUN 0 is used for a management LUN), and reaches up to 256.
> Some arrays use a different LUN layout, which means that the top
> two bit of the LUN number are set, and possibly some intermediate
> numbers, too. But the LUNs themselves are numbered consecutively, too;
> it's just at a certain offset.
> I've never seen anyone picking LUN numbers at random.

Ah, OK.  I think for these arrays you'd be better off accepting the cost
of an extra 4 bytes in the struct scsi_device rather than the cost of
storing the scsi_device at the LUN.

Let's just work an example where you have a 64-bit LUN with 4 ranges,
each of 64 entries (this is almost a best-case scenario for the XArray).
[0,63], 2^62+[0,63], 2^63+[0,63], 2^63+2^62+[0,63].

If we store them sequentially in an allocating XArray, we take up 256 *
4 bytes = 1kB extra space in the scsi_device.  The XArray will allocate
four nodes plus one node to hold the four nodes, which is 5 * 576 bytes
(2780 bytes) for a total of 3804 bytes.

Storing them in at their LUN will allocate a top level node which covers
bits 60-66, then four nodes, each covering bits of 54-59, another four
nodes covering bits 48-53, four nodes for 42-47, ...  I make it 41 nodes,
coming to 23616 bytes.  And the pointer chase to get to each LUN is
ten deep.  It'll mostly be cached, but still ...

> But still, the original question still stands: what would be the most
> efficient way using xarrays here?
> We have a four-level hierarchy Host:Channel:Target:LUN
> and we need to lookup devices (and, occasinally, targets) per host.
> At this time, 'channel' and 'target' are unsigned integer, and
> LUNs are 64 bit.

It certainly seems sensible to me to have a per-host allocating XArray
to store the targets that belong to that host.  I imagine you also want
a per-target XArray for the LUNs that belong to that target.  Do you
also want a per-host XArray to store the LUNs so you can iterate all
LUNs per host as a single lookup rather than indirecting through the
target Xarray?  That's a design decision for you to make.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-26 14:27           ` Matthew Wilcox
@ 2020-05-26 17:53             ` Hannes Reinecke
  2020-05-26 18:39               ` Matthew Wilcox
  2020-05-26 19:10               ` Douglas Gilbert
  0 siblings, 2 replies; 25+ messages in thread
From: Hannes Reinecke @ 2020-05-26 17:53 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Douglas Gilbert, linux-scsi, martin.petersen, jejb

On 5/26/20 4:27 PM, Matthew Wilcox wrote:
> On Tue, May 26, 2020 at 08:21:26AM +0200, Hannes Reinecke wrote:
>> On 5/25/20 7:40 PM, Matthew Wilcox wrote:
>>> You aren't the first person to ask about having a 64-bit lookup on
>>> 32-bit machines.  Indeed, I remember Hannes asking for a 256 bit lookup
>>> at LinuxCon in Prague.  I have always been reluctant to add such a thing
>>> because the XArray uses quite a naive data type underneath.  It works well with
>>> dense arrays but becomes very wasteful of memory for sparse arrays.
>>>
>>> My understanding of SCSI-world is that most devices have a single
>>> LUN 0.  Most devices that have multiple LUNs number them sequentially.
>>> Some devices have either an internal structure or essentially pick random
>>> LUNs for the devices they expose.
>>
>> Not quite. You are correct that most devices have a single LUN 0
>> (think of libata :-), but those with several LUNs typically
>> enumerate them. In most cases the enumeration starts at 0 (or 1,
>> if LUN 0 is used for a management LUN), and reaches up to 256.
>> Some arrays use a different LUN layout, which means that the top
>> two bit of the LUN number are set, and possibly some intermediate
>> numbers, too. But the LUNs themselves are numbered consecutively, too;
>> it's just at a certain offset.
>> I've never seen anyone picking LUN numbers at random.
> 
> Ah, OK.  I think for these arrays you'd be better off accepting the cost
> of an extra 4 bytes in the struct scsi_device rather than the cost of
> storing the scsi_device at the LUN.
> 
> Let's just work an example where you have a 64-bit LUN with 4 ranges,
> each of 64 entries (this is almost a best-case scenario for the XArray).
> [0,63], 2^62+[0,63], 2^63+[0,63], 2^63+2^62+[0,63].
> 
> If we store them sequentially in an allocating XArray, we take up 256 *
> 4 bytes = 1kB extra space in the scsi_device.  The XArray will allocate
> four nodes plus one node to hold the four nodes, which is 5 * 576 bytes
> (2780 bytes) for a total of 3804 bytes.
> 
> Storing them in at their LUN will allocate a top level node which covers
> bits 60-66, then four nodes, each covering bits of 54-59, another four
> nodes covering bits 48-53, four nodes for 42-47, ...  I make it 41 nodes,
> coming to 23616 bytes.  And the pointer chase to get to each LUN is
> ten deep.  It'll mostly be cached, but still ...
> 
Which is my worry, too.
In the end we're having a massively large array space (128bit if we take 
the numbers as they stand today), of which only a _tiny_ fraction is 
actually allocated.
We can try to reduce the array space by eg. restricting channels and 
targets to be 16 bits, and the LUN to be 32 bits.
But then we'll be having a hard time arguing; "Hey, we have a cool new 
feature, which has a really nice interface, but we can't support the 
same set of devices as we have now...".
That surely will go down well.

Maybe one should look at using an rbtree directly; _that_ could be made 
to work with an 128bit index, and lookup should be fast, too.
Let's see.

>> But still, the original question still stands: what would be the most
>> efficient way using xarrays here?
>> We have a four-level hierarchy Host:Channel:Target:LUN
>> and we need to lookup devices (and, occasinally, targets) per host.
>> At this time, 'channel' and 'target' are unsigned integer, and
>> LUNs are 64 bit.
> 
> It certainly seems sensible to me to have a per-host allocating XArray
> to store the targets that belong to that host.  I imagine you also want
> a per-target XArray for the LUNs that belong to that target.  Do you
> also want a per-host XArray to store the LUNs so you can iterate all
> LUNs per host as a single lookup rather than indirecting through the
> target Xarray?  That's a design decision for you to make.
> 
Seeing that I'm trying to use the existing HCTL number as index it's 
quite hard to have a per-host lookup structure, as this would inevitably
require a separate LUN index somewhere.
Which would mean to have a two-level xarray structure, where the first 
xarray hold the scsi targets, and the second one the LUNs per target.
Not super-nice, but doable.

Still thinking about the rbtree idea, though.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-26 17:53             ` Hannes Reinecke
@ 2020-05-26 18:39               ` Matthew Wilcox
  2020-05-26 19:28                 ` James Bottomley
  2020-05-26 19:10               ` Douglas Gilbert
  1 sibling, 1 reply; 25+ messages in thread
From: Matthew Wilcox @ 2020-05-26 18:39 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Douglas Gilbert, linux-scsi, martin.petersen, jejb

On Tue, May 26, 2020 at 07:53:35PM +0200, Hannes Reinecke wrote:
> On 5/26/20 4:27 PM, Matthew Wilcox wrote:
> > Ah, OK.  I think for these arrays you'd be better off accepting the cost
> > of an extra 4 bytes in the struct scsi_device rather than the cost of
> > storing the scsi_device at the LUN.
> > 
> > Let's just work an example where you have a 64-bit LUN with 4 ranges,
> > each of 64 entries (this is almost a best-case scenario for the XArray).
> > [0,63], 2^62+[0,63], 2^63+[0,63], 2^63+2^62+[0,63].
> > 
> > If we store them sequentially in an allocating XArray, we take up 256 *
> > 4 bytes = 1kB extra space in the scsi_device.  The XArray will allocate
> > four nodes plus one node to hold the four nodes, which is 5 * 576 bytes
> > (2780 bytes) for a total of 3804 bytes.
> > 
> > Storing them in at their LUN will allocate a top level node which covers
> > bits 60-66, then four nodes, each covering bits of 54-59, another four
> > nodes covering bits 48-53, four nodes for 42-47, ...  I make it 41 nodes,
> > coming to 23616 bytes.  And the pointer chase to get to each LUN is
> > ten deep.  It'll mostly be cached, but still ...
> > 
> Which is my worry, too.
> In the end we're having a massively large array space (128bit if we take the
> numbers as they stand today), of which only a _tiny_ fraction is actually
> allocated.

In your scheme, yes.  Do you ever need to look up scsi_device from
a scsi_host with only the C:T:L as a key (and it's a situation where
speed matters)?  Everything I've seen so far is about iterating every
sdev in an shost, but maybe this is a "when you have a hammer" situation.

> We can try to reduce the array space by eg. restricting channels and targets
> to be 16 bits, and the LUN to be 32 bits.
> But then we'll be having a hard time arguing; "Hey, we have a cool new
> feature, which has a really nice interface, but we can't support the same
> set of devices as we have now...".
> That surely will go down well.

But using the allocating XArray will outperform both the linked list and
the direct-store XArray that you're proposing.

> Maybe one should look at using an rbtree directly; _that_ could be made to
> work with an 128bit index, and lookup should be fast, too.
> Let's see.

The rbtree is even worse than the linked list if your primary API is
"iterate all objects".


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-26 17:53             ` Hannes Reinecke
  2020-05-26 18:39               ` Matthew Wilcox
@ 2020-05-26 19:10               ` Douglas Gilbert
  2020-05-26 20:27                 ` Douglas Gilbert
  1 sibling, 1 reply; 25+ messages in thread
From: Douglas Gilbert @ 2020-05-26 19:10 UTC (permalink / raw)
  To: Hannes Reinecke, Matthew Wilcox; +Cc: linux-scsi, martin.petersen, jejb

On 2020-05-26 1:53 p.m., Hannes Reinecke wrote:
> On 5/26/20 4:27 PM, Matthew Wilcox wrote:
>> On Tue, May 26, 2020 at 08:21:26AM +0200, Hannes Reinecke wrote:
>>> On 5/25/20 7:40 PM, Matthew Wilcox wrote:
>>>> You aren't the first person to ask about having a 64-bit lookup on
>>>> 32-bit machines.  Indeed, I remember Hannes asking for a 256 bit lookup
>>>> at LinuxCon in Prague.  I have always been reluctant to add such a thing
>>>> because the XArray uses quite a naive data type underneath.  It works well with
>>>> dense arrays but becomes very wasteful of memory for sparse arrays.
>>>>
>>>> My understanding of SCSI-world is that most devices have a single
>>>> LUN 0.  Most devices that have multiple LUNs number them sequentially.
>>>> Some devices have either an internal structure or essentially pick random
>>>> LUNs for the devices they expose.
>>>
>>> Not quite. You are correct that most devices have a single LUN 0
>>> (think of libata :-), but those with several LUNs typically
>>> enumerate them. In most cases the enumeration starts at 0 (or 1,
>>> if LUN 0 is used for a management LUN), and reaches up to 256.
>>> Some arrays use a different LUN layout, which means that the top
>>> two bit of the LUN number are set, and possibly some intermediate
>>> numbers, too. But the LUNs themselves are numbered consecutively, too;
>>> it's just at a certain offset.
>>> I've never seen anyone picking LUN numbers at random.
>>
>> Ah, OK.  I think for these arrays you'd be better off accepting the cost
>> of an extra 4 bytes in the struct scsi_device rather than the cost of
>> storing the scsi_device at the LUN.
>>
>> Let's just work an example where you have a 64-bit LUN with 4 ranges,
>> each of 64 entries (this is almost a best-case scenario for the XArray).
>> [0,63], 2^62+[0,63], 2^63+[0,63], 2^63+2^62+[0,63].
>>
>> If we store them sequentially in an allocating XArray, we take up 256 *
>> 4 bytes = 1kB extra space in the scsi_device.  The XArray will allocate
>> four nodes plus one node to hold the four nodes, which is 5 * 576 bytes
>> (2780 bytes) for a total of 3804 bytes.
>>
>> Storing them in at their LUN will allocate a top level node which covers
>> bits 60-66, then four nodes, each covering bits of 54-59, another four
>> nodes covering bits 48-53, four nodes for 42-47, ...  I make it 41 nodes,
>> coming to 23616 bytes.  And the pointer chase to get to each LUN is
>> ten deep.  It'll mostly be cached, but still ...
>>
> Which is my worry, too.
> In the end we're having a massively large array space (128bit if we take the 
> numbers as they stand today), of which only a _tiny_ fraction is actually 
> allocated.
> We can try to reduce the array space by eg. restricting channels and targets to 
> be 16 bits, and the LUN to be 32 bits.
> But then we'll be having a hard time arguing; "Hey, we have a cool new feature, 
> which has a really nice interface, but we can't support the same set of devices 
> as we have now...".
> That surely will go down well.
> 
> Maybe one should look at using an rbtree directly; _that_ could be made to work 
> with an 128bit index, and lookup should be fast, too.
> Let's see.
> 
>>> But still, the original question still stands: what would be the most
>>> efficient way using xarrays here?
>>> We have a four-level hierarchy Host:Channel:Target:LUN
>>> and we need to lookup devices (and, occasinally, targets) per host.
>>> At this time, 'channel' and 'target' are unsigned integer, and
>>> LUNs are 64 bit.
>>
>> It certainly seems sensible to me to have a per-host allocating XArray
>> to store the targets that belong to that host.  I imagine you also want
>> a per-target XArray for the LUNs that belong to that target.  Do you
>> also want a per-host XArray to store the LUNs so you can iterate all
>> LUNs per host as a single lookup rather than indirecting through the
>> target Xarray?  That's a design decision for you to make.
>>
> Seeing that I'm trying to use the existing HCTL number as index it's quite hard 
> to have a per-host lookup structure, as this would inevitably
> require a separate LUN index somewhere.
> Which would mean to have a two-level xarray structure, where the first xarray 
> hold the scsi targets, and the second one the LUNs per target.
> Not super-nice, but doable.
> 
> Still thinking about the rbtree idea, though.

You already have one, and it handles its own locking. It is called xarray :-)


Circling back to where I started: xarrays that allocate their own indexes.

Apart from the LUN ***, the other indexes tend to be monotonic increasing, most
of the time, correct Hannes? So thinking about an xarray in a shost object
with pointers to starget objects. At run time if say a target disappears,
often another appears either with the same target_id or a larger target_id,
perhaps larger than any previously issued id. So with careful insertions
(xa_alloc() will allow us to precisely position replacements) we can maintain
that monotonic ascending order. As I showed in patch 6/6 we can easily
track the number of elements in each xarray, lets call it 'n'. For
n < 12 (say) we don't care and just do O(n) iterator searches. For n >= 12
we check if the xarray is in ascending order, if it isn't, we sort it by
the index we want (e.g. channel_id,target_id). For elements that we
change positions of, in the xarray, we need to change the variable I called 
'sh_idx' (i.e. the xarray index) in the corresponding starget object.

Why do this, to do a binary search when n >= 12, of course. Lookup time
is then O(ln(n)) with only around ln(n) cache-unfriendly visits to starget
objects.

And that sort can be much better than O(n.ln(n)) since at the point of an
insertion, if the xarray was known to be sorted before the insertion,
then we only need as single iteration through the xarray to place the
new element in the correct position. After that it would be handy to
xa_swap() function that gave us the new xarray index. Making sure the
xarray indexes don't become too sparse might also be addressed in the
sort.


Question to Matthew:
Will xa_find(&xarray, &idx, full_range, 0)
return NULL _and_ place the index of the lowest "hole" in that xarray
(or 1 beyond the upper limit of the range) in 'idx' ?

Note to others: the last argument in a typical xa_find() call
is either XA_PRESENT or some other mark the app is using, not 0.

Doug Gilbert


*** the SCSI REPORT LUNS command doesn't say anything about the _ordering_
     of LUNs or even if it is stable from one invocation to the next.
     There is nothing to stop the scsi_scan.c code from sorting the LUN
     inventory before applying it to the ML object tree.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-26 18:39               ` Matthew Wilcox
@ 2020-05-26 19:28                 ` James Bottomley
  0 siblings, 0 replies; 25+ messages in thread
From: James Bottomley @ 2020-05-26 19:28 UTC (permalink / raw)
  To: Matthew Wilcox, Hannes Reinecke
  Cc: Douglas Gilbert, linux-scsi, martin.petersen

On Tue, 2020-05-26 at 11:39 -0700, Matthew Wilcox wrote:
> On Tue, May 26, 2020 at 07:53:35PM +0200, Hannes Reinecke wrote:
> > On 5/26/20 4:27 PM, Matthew Wilcox wrote:
> > > Ah, OK.  I think for these arrays you'd be better off accepting
> > > the cost of an extra 4 bytes in the struct scsi_device rather
> > > than the cost of storing the scsi_device at the LUN.
> > > 
> > > Let's just work an example where you have a 64-bit LUN with 4
> > > ranges, each of 64 entries (this is almost a best-case scenario
> > > for the XArray). [0,63], 2^62+[0,63], 2^63+[0,63],
> > > 2^63+2^62+[0,63].
> > > 
> > > If we store them sequentially in an allocating XArray, we take up
> > > 256 * 4 bytes = 1kB extra space in the scsi_device.  The XArray
> > > will allocate four nodes plus one node to hold the four nodes,
> > > which is 5 * 576 bytes (2780 bytes) for a total of 3804 bytes.
> > > 
> > > Storing them in at their LUN will allocate a top level node which
> > > covers bits 60-66, then four nodes, each covering bits of 54-59,
> > > another four nodes covering bits 48-53, four nodes for 42-47,
> > > ...  I make it 41 nodes, coming to 23616 bytes.  And the pointer
> > > chase to get to each LUN is ten deep.  It'll mostly be cached,
> > > but still ...
> > > 
> > 
> > Which is my worry, too.
> > In the end we're having a massively large array space (128bit if we
> > take the numbers as they stand today), of which only a _tiny_
> > fraction is actually allocated.
> 
> In your scheme, yes.  Do you ever need to look up scsi_device from
> a scsi_host with only the C:T:L as a key (and it's a situation where
> speed matters)?  Everything I've seen so far is about iterating every
> sdev in an shost, but maybe this is a "when you have a hammer"
> situation.

I don't believe we ever do.  We've arranged pretty much everything so
the SCSI device is immediately obtainable from the enclosing structure
(sysfs, rw, ioctl, interrupt from device etc).  The only time we do a
direct lookup by H:C:T:L is the very old scsi proc API, which we're
trying to deprecate.

James

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-26 19:10               ` Douglas Gilbert
@ 2020-05-26 20:27                 ` Douglas Gilbert
  2020-05-27  2:53                   ` Douglas Gilbert
  0 siblings, 1 reply; 25+ messages in thread
From: Douglas Gilbert @ 2020-05-26 20:27 UTC (permalink / raw)
  To: Hannes Reinecke, Matthew Wilcox; +Cc: linux-scsi, martin.petersen, jejb

On 2020-05-26 3:10 p.m., Douglas Gilbert wrote:
> On 2020-05-26 1:53 p.m., Hannes Reinecke wrote:
>> On 5/26/20 4:27 PM, Matthew Wilcox wrote:
>>> On Tue, May 26, 2020 at 08:21:26AM +0200, Hannes Reinecke wrote:
>>>> On 5/25/20 7:40 PM, Matthew Wilcox wrote:
>>>>> You aren't the first person to ask about having a 64-bit lookup on
>>>>> 32-bit machines.  Indeed, I remember Hannes asking for a 256 bit lookup
>>>>> at LinuxCon in Prague.  I have always been reluctant to add such a thing
>>>>> because the XArray uses quite a naive data type underneath.  It works well 
>>>>> with
>>>>> dense arrays but becomes very wasteful of memory for sparse arrays.
>>>>>
>>>>> My understanding of SCSI-world is that most devices have a single
>>>>> LUN 0.  Most devices that have multiple LUNs number them sequentially.
>>>>> Some devices have either an internal structure or essentially pick random
>>>>> LUNs for the devices they expose.
>>>>
>>>> Not quite. You are correct that most devices have a single LUN 0
>>>> (think of libata :-), but those with several LUNs typically
>>>> enumerate them. In most cases the enumeration starts at 0 (or 1,
>>>> if LUN 0 is used for a management LUN), and reaches up to 256.
>>>> Some arrays use a different LUN layout, which means that the top
>>>> two bit of the LUN number are set, and possibly some intermediate
>>>> numbers, too. But the LUNs themselves are numbered consecutively, too;
>>>> it's just at a certain offset.
>>>> I've never seen anyone picking LUN numbers at random.
>>>
>>> Ah, OK.  I think for these arrays you'd be better off accepting the cost
>>> of an extra 4 bytes in the struct scsi_device rather than the cost of
>>> storing the scsi_device at the LUN.
>>>
>>> Let's just work an example where you have a 64-bit LUN with 4 ranges,
>>> each of 64 entries (this is almost a best-case scenario for the XArray).
>>> [0,63], 2^62+[0,63], 2^63+[0,63], 2^63+2^62+[0,63].
>>>
>>> If we store them sequentially in an allocating XArray, we take up 256 *
>>> 4 bytes = 1kB extra space in the scsi_device.  The XArray will allocate
>>> four nodes plus one node to hold the four nodes, which is 5 * 576 bytes
>>> (2780 bytes) for a total of 3804 bytes.
>>>
>>> Storing them in at their LUN will allocate a top level node which covers
>>> bits 60-66, then four nodes, each covering bits of 54-59, another four
>>> nodes covering bits 48-53, four nodes for 42-47, ...  I make it 41 nodes,
>>> coming to 23616 bytes.  And the pointer chase to get to each LUN is
>>> ten deep.  It'll mostly be cached, but still ...
>>>
>> Which is my worry, too.
>> In the end we're having a massively large array space (128bit if we take the 
>> numbers as they stand today), of which only a _tiny_ fraction is actually 
>> allocated.
>> We can try to reduce the array space by eg. restricting channels and targets 
>> to be 16 bits, and the LUN to be 32 bits.
>> But then we'll be having a hard time arguing; "Hey, we have a cool new 
>> feature, which has a really nice interface, but we can't support the same set 
>> of devices as we have now...".
>> That surely will go down well.
>>
>> Maybe one should look at using an rbtree directly; _that_ could be made to 
>> work with an 128bit index, and lookup should be fast, too.
>> Let's see.
>>
>>>> But still, the original question still stands: what would be the most
>>>> efficient way using xarrays here?
>>>> We have a four-level hierarchy Host:Channel:Target:LUN
>>>> and we need to lookup devices (and, occasinally, targets) per host.
>>>> At this time, 'channel' and 'target' are unsigned integer, and
>>>> LUNs are 64 bit.
>>>
>>> It certainly seems sensible to me to have a per-host allocating XArray
>>> to store the targets that belong to that host.  I imagine you also want
>>> a per-target XArray for the LUNs that belong to that target.  Do you
>>> also want a per-host XArray to store the LUNs so you can iterate all
>>> LUNs per host as a single lookup rather than indirecting through the
>>> target Xarray?  That's a design decision for you to make.
>>>
>> Seeing that I'm trying to use the existing HCTL number as index it's quite 
>> hard to have a per-host lookup structure, as this would inevitably
>> require a separate LUN index somewhere.
>> Which would mean to have a two-level xarray structure, where the first xarray 
>> hold the scsi targets, and the second one the LUNs per target.
>> Not super-nice, but doable.
>>
>> Still thinking about the rbtree idea, though.
> 
> You already have one, and it handles its own locking. It is called xarray :-)
> 
> 
> Circling back to where I started: xarrays that allocate their own indexes.
> 
> Apart from the LUN ***, the other indexes tend to be monotonic increasing, most
> of the time, correct Hannes? So thinking about an xarray in a shost object
> with pointers to starget objects. At run time if say a target disappears,
> often another appears either with the same target_id or a larger target_id,
> perhaps larger than any previously issued id. So with careful insertions
> (xa_alloc() will allow us to precisely position replacements) we can maintain
> that monotonic ascending order. As I showed in patch 6/6 we can easily
> track the number of elements in each xarray, lets call it 'n'. For
> n < 12 (say) we don't care and just do O(n) iterator searches. For n >= 12
> we check if the xarray is in ascending order, if it isn't, we sort it by
> the index we want (e.g. channel_id,target_id). For elements that we
> change positions of, in the xarray, we need to change the variable I called 
> 'sh_idx' (i.e. the xarray index) in the corresponding starget object.
> 
> Why do this, to do a binary search when n >= 12, of course. Lookup time
> is then O(ln(n)) with only around ln(n) cache-unfriendly visits to starget
> objects.
> 
> And that sort can be much better than O(n.ln(n)) since at the point of an
> insertion, if the xarray was known to be sorted before the insertion,
> then we only need as single iteration through the xarray to place the
> new element in the correct position. After that it would be handy to
> xa_swap() function that gave us the new xarray index. Making sure the
> xarray indexes don't become too sparse might also be addressed in the
> sort.
> 
> 
> Question to Matthew:
> Will xa_find(&xarray, &idx, full_range, 0)
> return NULL _and_ place the index of the lowest "hole" in that xarray
> (or 1 beyond the upper limit of the range) in 'idx' ?
> 
> Note to others: the last argument in a typical xa_find() call
> is either XA_PRESENT or some other mark the app is using, not 0.
> 
> Doug Gilbert
> 
> 
> *** the SCSI REPORT LUNS command doesn't say anything about the _ordering_
>      of LUNs or even if it is stable from one invocation to the next.
>      There is nothing to stop the scsi_scan.c code from sorting the LUN
>      inventory before applying it to the ML object tree.

Maybe interest Matthew into doing the heavy lifting:

void xa_compact_and_sort(struct xa_array *xa,
                          int offset_ul_idx,
                          int (*compar)(const void *l, const void *r));

[Function modelled on qsort(3), perhaps qsort_r(3) would be better.]

The second argument to xa_compact_and_sort() is either:
     - offsetof(struct struct, sh_idx) where sh_idx is an unsigned long, or
     - if less than zero then ignore index fixup and don't compact

The third argument is either:
      - a compare function that, for example, acts on an ordering for the
        key: channel_id,target_id and uses it to return <0, ==0 or >=0
        based on being called with pointers to two starget objects, or
      - NULL: then just do a compaction if ul_idx >= 0

So when ul_idx >= 0 xa_compact_and_sort() places the new compacted index
at each *((unsigned long *)((u8 *)starget) + ul_idx)). So in the case of
my 1/6 patch the type of sh_idx would change from int to unsigned long.

		

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 1/6] scsi: xarray hctl
  2020-05-26 20:27                 ` Douglas Gilbert
@ 2020-05-27  2:53                   ` Douglas Gilbert
  0 siblings, 0 replies; 25+ messages in thread
From: Douglas Gilbert @ 2020-05-27  2:53 UTC (permalink / raw)
  To: Hannes Reinecke, Matthew Wilcox; +Cc: linux-scsi, martin.petersen, jejb

[-- Attachment #1: Type: text/plain, Size: 9108 bytes --]

On 2020-05-26 4:27 p.m., Douglas Gilbert wrote:
> On 2020-05-26 3:10 p.m., Douglas Gilbert wrote:
>> On 2020-05-26 1:53 p.m., Hannes Reinecke wrote:
>>> On 5/26/20 4:27 PM, Matthew Wilcox wrote:
>>>> On Tue, May 26, 2020 at 08:21:26AM +0200, Hannes Reinecke wrote:
>>>>> On 5/25/20 7:40 PM, Matthew Wilcox wrote:
>>>>>> You aren't the first person to ask about having a 64-bit lookup on
>>>>>> 32-bit machines.  Indeed, I remember Hannes asking for a 256 bit lookup
>>>>>> at LinuxCon in Prague.  I have always been reluctant to add such a thing
>>>>>> because the XArray uses quite a naive data type underneath.  It works well 
>>>>>> with
>>>>>> dense arrays but becomes very wasteful of memory for sparse arrays.
>>>>>>
>>>>>> My understanding of SCSI-world is that most devices have a single
>>>>>> LUN 0.  Most devices that have multiple LUNs number them sequentially.
>>>>>> Some devices have either an internal structure or essentially pick random
>>>>>> LUNs for the devices they expose.
>>>>>
>>>>> Not quite. You are correct that most devices have a single LUN 0
>>>>> (think of libata :-), but those with several LUNs typically
>>>>> enumerate them. In most cases the enumeration starts at 0 (or 1,
>>>>> if LUN 0 is used for a management LUN), and reaches up to 256.
>>>>> Some arrays use a different LUN layout, which means that the top
>>>>> two bit of the LUN number are set, and possibly some intermediate
>>>>> numbers, too. But the LUNs themselves are numbered consecutively, too;
>>>>> it's just at a certain offset.
>>>>> I've never seen anyone picking LUN numbers at random.
>>>>
>>>> Ah, OK.  I think for these arrays you'd be better off accepting the cost
>>>> of an extra 4 bytes in the struct scsi_device rather than the cost of
>>>> storing the scsi_device at the LUN.
>>>>
>>>> Let's just work an example where you have a 64-bit LUN with 4 ranges,
>>>> each of 64 entries (this is almost a best-case scenario for the XArray).
>>>> [0,63], 2^62+[0,63], 2^63+[0,63], 2^63+2^62+[0,63].
>>>>
>>>> If we store them sequentially in an allocating XArray, we take up 256 *
>>>> 4 bytes = 1kB extra space in the scsi_device.  The XArray will allocate
>>>> four nodes plus one node to hold the four nodes, which is 5 * 576 bytes
>>>> (2780 bytes) for a total of 3804 bytes.
>>>>
>>>> Storing them in at their LUN will allocate a top level node which covers
>>>> bits 60-66, then four nodes, each covering bits of 54-59, another four
>>>> nodes covering bits 48-53, four nodes for 42-47, ...  I make it 41 nodes,
>>>> coming to 23616 bytes.  And the pointer chase to get to each LUN is
>>>> ten deep.  It'll mostly be cached, but still ...
>>>>
>>> Which is my worry, too.
>>> In the end we're having a massively large array space (128bit if we take the 
>>> numbers as they stand today), of which only a _tiny_ fraction is actually 
>>> allocated.
>>> We can try to reduce the array space by eg. restricting channels and targets 
>>> to be 16 bits, and the LUN to be 32 bits.
>>> But then we'll be having a hard time arguing; "Hey, we have a cool new 
>>> feature, which has a really nice interface, but we can't support the same set 
>>> of devices as we have now...".
>>> That surely will go down well.
>>>
>>> Maybe one should look at using an rbtree directly; _that_ could be made to 
>>> work with an 128bit index, and lookup should be fast, too.
>>> Let's see.
>>>
>>>>> But still, the original question still stands: what would be the most
>>>>> efficient way using xarrays here?
>>>>> We have a four-level hierarchy Host:Channel:Target:LUN
>>>>> and we need to lookup devices (and, occasinally, targets) per host.
>>>>> At this time, 'channel' and 'target' are unsigned integer, and
>>>>> LUNs are 64 bit.
>>>>
>>>> It certainly seems sensible to me to have a per-host allocating XArray
>>>> to store the targets that belong to that host.  I imagine you also want
>>>> a per-target XArray for the LUNs that belong to that target.  Do you
>>>> also want a per-host XArray to store the LUNs so you can iterate all
>>>> LUNs per host as a single lookup rather than indirecting through the
>>>> target Xarray?  That's a design decision for you to make.
>>>>
>>> Seeing that I'm trying to use the existing HCTL number as index it's quite 
>>> hard to have a per-host lookup structure, as this would inevitably
>>> require a separate LUN index somewhere.
>>> Which would mean to have a two-level xarray structure, where the first xarray 
>>> hold the scsi targets, and the second one the LUNs per target.
>>> Not super-nice, but doable.
>>>
>>> Still thinking about the rbtree idea, though.
>>
>> You already have one, and it handles its own locking. It is called xarray :-)
>>
>>
>> Circling back to where I started: xarrays that allocate their own indexes.
>>
>> Apart from the LUN ***, the other indexes tend to be monotonic increasing, most
>> of the time, correct Hannes? So thinking about an xarray in a shost object
>> with pointers to starget objects. At run time if say a target disappears,
>> often another appears either with the same target_id or a larger target_id,
>> perhaps larger than any previously issued id. So with careful insertions
>> (xa_alloc() will allow us to precisely position replacements) we can maintain
>> that monotonic ascending order. As I showed in patch 6/6 we can easily
>> track the number of elements in each xarray, lets call it 'n'. For
>> n < 12 (say) we don't care and just do O(n) iterator searches. For n >= 12
>> we check if the xarray is in ascending order, if it isn't, we sort it by
>> the index we want (e.g. channel_id,target_id). For elements that we
>> change positions of, in the xarray, we need to change the variable I called 
>> 'sh_idx' (i.e. the xarray index) in the corresponding starget object.
>>
>> Why do this, to do a binary search when n >= 12, of course. Lookup time
>> is then O(ln(n)) with only around ln(n) cache-unfriendly visits to starget
>> objects.
>>
>> And that sort can be much better than O(n.ln(n)) since at the point of an
>> insertion, if the xarray was known to be sorted before the insertion,
>> then we only need as single iteration through the xarray to place the
>> new element in the correct position. After that it would be handy to
>> xa_swap() function that gave us the new xarray index. Making sure the
>> xarray indexes don't become too sparse might also be addressed in the
>> sort.
>>
>>
>> Question to Matthew:
>> Will xa_find(&xarray, &idx, full_range, 0)
>> return NULL _and_ place the index of the lowest "hole" in that xarray
>> (or 1 beyond the upper limit of the range) in 'idx' ?

Looked at xa_find() in lib/xarray.c and the answer is no. No harm,
we can find holes by watching the index arguments output by
xa_find() and xa_find_after() as we iterate over the array.

>> Note to others: the last argument in a typical xa_find() call
>> is either XA_PRESENT or some other mark the app is using, not 0.
>>
>> Doug Gilbert
>>
>>
>> *** the SCSI REPORT LUNS command doesn't say anything about the _ordering_
>>      of LUNs or even if it is stable from one invocation to the next.
>>      There is nothing to stop the scsi_scan.c code from sorting the LUN
>>      inventory before applying it to the ML object tree.
> 
> Maybe interest Matthew into doing the heavy lifting:
> 
> void xa_compact_and_sort(struct xa_array *xa,
>                           int offset_ul_idx,
>                           int (*compar)(const void *l, const void *r));
> 
> [Function modelled on qsort(3), perhaps qsort_r(3) would be better.]
> 
> The second argument to xa_compact_and_sort() is either:
>      - offsetof(struct struct, sh_idx) where sh_idx is an unsigned long, or
>      - if less than zero then ignore index fixup and don't compact
> 
> The third argument is either:
>       - a compare function that, for example, acts on an ordering for the
>         key: channel_id,target_id and uses it to return <0, ==0 or >=0
>         based on being called with pointers to two starget objects, or
>       - NULL: then just do a compaction if ul_idx >= 0
> 
> So when ul_idx >= 0 xa_compact_and_sort() places the new compacted index
> at each *((unsigned long *)((u8 *)starget) + ul_idx)). So in the case of
> my 1/6 patch the type of sh_idx would change from int to unsigned long.

On further thought, I would add a new bool argument 'do_compact' and
drop that "and don't compact" from offset_ul_idx description. Or
just have separate the xa_compact() and xa_sort() functions.

Attached is a rough idea of what the xa_compact logic might do. It in
turn depends on either a xa_swap() or xa_move() function which we
don't currently have. Naturally I would like those functions to "carry"
the mark info with them (unlike xa_load() and xa_store()) as they
swap or move elements.

A common use case might be when all but the last element are already
sorted, and a newly arrived target (disk) is appended to the array.
So only the last element needs to be re-placed if it is out of order.

Doug Gilbert





[-- Attachment #2: xa_compact.c --]
[-- Type: text/x-csrc, Size: 742 bytes --]

#include <linux/xarray.h>


unsigned long
xa_compact(struct xa_array *xap, int off_ul_idx, unsigned long last)
{
	bool adjust_user_idx = (offset_ul_idx >= 0);
	unsigned long idx;
	unsigned long pp1_idx = 0;	/* previous index plus 1 */
	unsigned long num = 0;
	unsigned long gap = 0;
	void *vp;

	for (vp = xa_find(xap, idx, last, XA_PRESENT); vp;
	     ++num, vp = xa_find_after(xap, idx, last, XA_PRESENT)) {
		if (idx > pp1_idx)
			gap += idx - pp1_idx;
		if (gap > 0)
			xa_swap(xap, num, idx);
			/* or xa_move(xap, dst_idx, src_idx) */
			if (adjust_user_idx) {
				vp = xa_load(xap, num);
				*((unsigned long *)((u8 *)vp + off_ul_idx)) =
								num;
			}
		}
		pp1_idx = idx + 1;
	}
	return num;		/* compacted number of elements */
}

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-05-27  2:53 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-24 15:58 [RFC v2 0/6] scsi: rework mid-layer with xarrays Douglas Gilbert
2020-05-24 15:58 ` [RFC v2 1/6] scsi: xarray hctl Douglas Gilbert
2020-05-25  6:57   ` Hannes Reinecke
2020-05-25 16:30     ` Douglas Gilbert
2020-05-25 17:40       ` Matthew Wilcox
2020-05-26  2:01         ` Douglas Gilbert
2020-05-26  3:01           ` Matthew Wilcox
2020-05-26  7:24           ` Hannes Reinecke
2020-05-26  6:21         ` Hannes Reinecke
2020-05-26 14:27           ` Matthew Wilcox
2020-05-26 17:53             ` Hannes Reinecke
2020-05-26 18:39               ` Matthew Wilcox
2020-05-26 19:28                 ` James Bottomley
2020-05-26 19:10               ` Douglas Gilbert
2020-05-26 20:27                 ` Douglas Gilbert
2020-05-27  2:53                   ` Douglas Gilbert
2020-05-24 15:58 ` [RFC v2 2/6] scsi: xarray, iterations on scsi_target Douglas Gilbert
2020-05-25  7:06   ` Hannes Reinecke
2020-05-24 15:58 ` [RFC v2 3/6] scsi: xarray mark sdev_del state Douglas Gilbert
2020-05-25  7:00   ` Hannes Reinecke
2020-05-25 16:47     ` Douglas Gilbert
2020-05-24 15:58 ` [RFC v2 4/6] scsi: improve scsi_device_lookup Douglas Gilbert
2020-05-25  7:07   ` Hannes Reinecke
2020-05-24 15:58 ` [RFC v2 5/6] scsi: add __scsi_target_lookup Douglas Gilbert
2020-05-24 15:58 ` [RFC v2 6/6] scsi: count number of targets Douglas Gilbert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).