linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] scsi: sd: support specify probe type of build-in driver
@ 2023-06-06  5:12 Jianlin Lv
  2023-06-06 17:38 ` Bart Van Assche
  2023-06-07 14:10 ` Christoph Hellwig
  0 siblings, 2 replies; 9+ messages in thread
From: Jianlin Lv @ 2023-06-06  5:12 UTC (permalink / raw)
  To: jejb, martin.petersen, paulmck, bp, peterz, will, rdunlap,
	kim.phillips, rostedt, wyes.karny
  Cc: iecedge, jianlv, linux-kernel, linux-doc, linux-scsi

When SCSI disks are located on different SCSI hosts within a system,
asynchronous detection can lead to non-deterministic SCSI disk names.

This patch introduces the 'sd_probe_type=' kernel boot parameter.

In scenarios where SCSI disk name sensitivity is crucial, the probe type
of the build-in sd driver can be set to synchronous. As a result,
the scsi disk names are deterministic.

Signed-off-by: Jianlin Lv <iecedge@gmail.com>
---
 .../admin-guide/kernel-parameters.txt         |  9 ++++++++
 drivers/scsi/sd.c                             | 23 +++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9e5bab29685f..083f741d63bb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5611,6 +5611,15 @@
 			non-zero "wait" parameter.  See weight_single
 			and weight_many.
 
+	sd_probe_type=	[HW,SCSI] Manual setup probe type of built-in scsi disk driver
+			Format: <int>
+			Default: 1
+			<int> -- device driver probe type to try
+				0 - PROBE_DEFAULT_STRATEGY
+				1 - PROBE_PREFER_ASYNCHRONOUS
+				2 - PROBE_FORCE_SYNCHRONOUS
+			Example: sd_probe_type=1
+
 	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate
 			xtime_lock contention on larger systems, and/or RCU lock
 			contention on all systems with CONFIG_MAXSMP set.
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 1624d528aa1f..78b80b9e5618 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -121,6 +121,9 @@ static void scsi_disk_release(struct device *cdev);
 
 static DEFINE_IDA(sd_index_ida);
 
+/* Probe type of SCSI Disk Driver */
+static int sd_probe_type = PROBE_PREFER_ASYNCHRONOUS;
+
 static mempool_t *sd_page_pool;
 static struct lock_class_key sd_bio_compl_lkclass;
 
@@ -3826,6 +3829,25 @@ static int sd_resume_runtime(struct device *dev)
 	return sd_resume(dev);
 }
 
+#ifndef MODULE
+
+/* Set the boot options to sd driver.
+ * Syntax is defined in Documentation/admin-guide/kernel-parameters.txt.
+ */
+static int __init sd_probe_setup(char *str)
+{
+	int probe_type = -1;
+
+	if (get_option(&str, &probe_type) && probe_type >= 0 && probe_type < 3)
+		sd_probe_type = probe_type;
+
+	return 1;
+}
+
+__setup("sd_probe_type=", sd_probe_setup);
+
+#endif
+
 /**
  *	init_sd - entry point for this driver (both when built in or when
  *	a module).
@@ -3858,6 +3880,7 @@ static int __init init_sd(void)
 		goto err_out_class;
 	}
 
+	sd_template.gendrv.probe_type = sd_probe_type;
 	err = scsi_register_driver(&sd_template.gendrv);
 	if (err)
 		goto err_out_driver;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: sd: support specify probe type of build-in driver
  2023-06-06  5:12 [PATCH] scsi: sd: support specify probe type of build-in driver Jianlin Lv
@ 2023-06-06 17:38 ` Bart Van Assche
  2023-06-07 15:55   ` Jianlin Lv
  2023-06-07 14:10 ` Christoph Hellwig
  1 sibling, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2023-06-06 17:38 UTC (permalink / raw)
  To: Jianlin Lv, jejb, martin.petersen, paulmck, bp, peterz, will,
	rdunlap, kim.phillips, rostedt, wyes.karny
  Cc: jianlv, linux-kernel, linux-doc, linux-scsi

On 6/5/23 22:12, Jianlin Lv wrote:
> In scenarios where SCSI disk name sensitivity is crucial, the probe type
> of the build-in sd driver can be set to synchronous. As a result,
> the scsi disk names are deterministic.

Which are these scenarios?

Additionally, how can synchronous scanning of sd devices make a 
difference if there are multiple host bus adapters that use an interface 
type that is scanned asynchronously?

Bart.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: sd: support specify probe type of build-in driver
  2023-06-06  5:12 [PATCH] scsi: sd: support specify probe type of build-in driver Jianlin Lv
  2023-06-06 17:38 ` Bart Van Assche
@ 2023-06-07 14:10 ` Christoph Hellwig
  2023-06-08  3:10   ` Jianlin Lv
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2023-06-07 14:10 UTC (permalink / raw)
  To: Jianlin Lv
  Cc: jejb, martin.petersen, paulmck, bp, peterz, will, rdunlap,
	kim.phillips, rostedt, wyes.karny, jianlv, linux-kernel,
	linux-doc, linux-scsi

On Tue, Jun 06, 2023 at 05:12:17AM +0000, Jianlin Lv wrote:
> When SCSI disks are located on different SCSI hosts within a system,
> asynchronous detection can lead to non-deterministic SCSI disk names.

Yes, as can various other conditions.  Your code better be able to deal
with that.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: sd: support specify probe type of build-in driver
  2023-06-06 17:38 ` Bart Van Assche
@ 2023-06-07 15:55   ` Jianlin Lv
  2023-06-07 17:07     ` Bart Van Assche
  0 siblings, 1 reply; 9+ messages in thread
From: Jianlin Lv @ 2023-06-07 15:55 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jejb, martin.petersen, paulmck, bp, peterz, will, rdunlap,
	kim.phillips, rostedt, wyes.karny, jianlv, linux-kernel,
	linux-doc, linux-scsi

On Wed, Jun 7, 2023 at 1:38 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 6/5/23 22:12, Jianlin Lv wrote:
> > In scenarios where SCSI disk name sensitivity is crucial, the probe type
> > of the build-in sd driver can be set to synchronous. As a result,
> > the scsi disk names are deterministic.
>
> Which are these scenarios?
>
> Additionally, how can synchronous scanning of sd devices make a
> difference if there are multiple host bus adapters that use an interface
> type that is scanned asynchronously?
>
> Bart.

The change was prompted by an issue with SCSI devices probing
non-deterministic. On the issue node, there are two types of SCSI hosts:

1. MegaRAID adapters associated with 24 local disks. The disks are named
sequentially as "sda," "sdb," and so on, up to "sdx."
2. STAT controllers associated with the root disk, named "sdy."

Both the MegaRAID adapters and the SATA controller (PCH) are accessed via
the PCIe bus. In theory, depending on their PCIe bus ID in ascending order,
the devices should be initialized in ascending order as well.

However, the SCSI driver currently probes devices asynchronously to allow
for more parallelism.

__driver_attach
  ->if (driver_allows_async_probing(drv))
      async_schedule_dev(__driver_attach_async_helper, dev);

During the probing of SCSI disks attached to MegaRAID, root disk probing
may occur, resulting in a disk naming inconsistency issue.
For example, if root disk probing happens in the middle,it is named "sdq",
The subsequent SCSI disks that are probed will have their names drift,
starting from "sdr" up to "sdy."

For cloud deployment, the local volume provisioner detects and creates PVs
for each local disk (from sda to sdx) on the host, and it cleans up the
disks when they are released.
This requires the logical names of the disks to be deterministic.

Therefore, I have submitted this patch to allow users to configure the
SCSI disk probe type.
If synchronous probing is configured, the SCSI disk probing order is
deterministic and will follow the ascending order of the PCIe bus ID.

Jianlin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: sd: support specify probe type of build-in driver
  2023-06-07 15:55   ` Jianlin Lv
@ 2023-06-07 17:07     ` Bart Van Assche
  2023-06-08  2:51       ` Jianlin Lv
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2023-06-07 17:07 UTC (permalink / raw)
  To: Jianlin Lv
  Cc: jejb, martin.petersen, paulmck, bp, peterz, will, rdunlap,
	kim.phillips, rostedt, wyes.karny, jianlv, linux-kernel,
	linux-doc, linux-scsi

On 6/7/23 08:55, Jianlin Lv wrote:
> 1. MegaRAID adapters associated with 24 local disks. The disks are named
> sequentially as "sda," "sdb," and so on, up to "sdx."
> 2. STAT controllers associated with the root disk, named "sdy."
> 
> Both the MegaRAID adapters and the SATA controller (PCH) are accessed via
> the PCIe bus. In theory, depending on their PCIe bus ID in ascending order,
> the devices should be initialized in ascending order as well.

Hmm ... I don't think there is anything that prevents the PCIe maintainer
from changing the PCIe probing behavior from synchronous to asynchronous?
In other words, I don't think it is safe to assume that PCIe devices are
always scanned in the same order.

> For cloud deployment, the local volume provisioner detects and creates PVs
> for each local disk (from sda to sdx) on the host, and it cleans up the
> disks when they are released.
> This requires the logical names of the disks to be deterministic.

I see two possible solutions:
- Change the volume provisioner such that it uses disk references that do
   not depend on the probing order, e.g. /dev/disk/by-id/...
- Implement an algorithm in systemd that makes disk names predictable.
   An explanation of how predictable names work for network interfaces is
   available here: https://wiki.debian.org/NetworkInterfaceNames. The
   systemd documentation about predictable network names is available here:
   https://www.freedesktop.org/software/systemd/man/systemd.net-naming-scheme.html

These alternatives have the advantage that disk scanning remains asynchronous.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: sd: support specify probe type of build-in driver
  2023-06-07 17:07     ` Bart Van Assche
@ 2023-06-08  2:51       ` Jianlin Lv
  2023-06-08 16:23         ` Bart Van Assche
  0 siblings, 1 reply; 9+ messages in thread
From: Jianlin Lv @ 2023-06-08  2:51 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jejb, martin.petersen, paulmck, bp, peterz, will, rdunlap,
	kim.phillips, rostedt, wyes.karny, jianlv, linux-kernel,
	linux-doc, linux-scsi

On Thu, Jun 8, 2023 at 1:07 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 6/7/23 08:55, Jianlin Lv wrote:
> > 1. MegaRAID adapters associated with 24 local disks. The disks are named
> > sequentially as "sda," "sdb," and so on, up to "sdx."
> > 2. STAT controllers associated with the root disk, named "sdy."
> >
> > Both the MegaRAID adapters and the SATA controller (PCH) are accessed via
> > the PCIe bus. In theory, depending on their PCIe bus ID in ascending order,
> > the devices should be initialized in ascending order as well.
>
> Hmm ... I don't think there is anything that prevents the PCIe maintainer
> from changing the PCIe probing behavior from synchronous to asynchronous?
> In other words, I don't think it is safe to assume that PCIe devices are
> always scanned in the same order.
>
> > For cloud deployment, the local volume provisioner detects and creates PVs
> > for each local disk (from sda to sdx) on the host, and it cleans up the
> > disks when they are released.
> > This requires the logical names of the disks to be deterministic.
>
> I see two possible solutions:
> - Change the volume provisioner such that it uses disk references that do
>    not depend on the probing order, e.g. /dev/disk/by-id/...

Yes, The "/dev/disk/by-id/" can uniquely identify SCSI devices. However,
I don't think it is suitable for the volume provisioner workflow.
For nodes of the same SKU , a unified YAML file will be defined to instruct
the volume provisioner on how to manage the local disks.
If use WWID, it would mean that a unique YAML file needs to be defined
for each node. This approach becomes impractical when dealing with a large
number of work nodes.

Jianlin

> - Implement an algorithm in systemd that makes disk names predictable.
>    An explanation of how predictable names work for network interfaces is
>    available here: https://wiki.debian.org/NetworkInterfaceNames. The
>    systemd documentation about predictable network names is available here:
>    https://www.freedesktop.org/software/systemd/man/systemd.net-naming-scheme.html
>
> These alternatives have the advantage that disk scanning remains asynchronous.
>
> Thanks,
>
> Bart.
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: sd: support specify probe type of build-in driver
  2023-06-07 14:10 ` Christoph Hellwig
@ 2023-06-08  3:10   ` Jianlin Lv
  0 siblings, 0 replies; 9+ messages in thread
From: Jianlin Lv @ 2023-06-08  3:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: jejb, martin.petersen, paulmck, bp, peterz, will, rdunlap,
	kim.phillips, rostedt, wyes.karny, jianlv, linux-kernel,
	linux-doc, linux-scsi

On Wed, Jun 7, 2023 at 10:10 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Jun 06, 2023 at 05:12:17AM +0000, Jianlin Lv wrote:
> > When SCSI disks are located on different SCSI hosts within a system,
> > asynchronous detection can lead to non-deterministic SCSI disk names.
>
> Yes, as can various other conditions.  Your code better be able to deal
> with that.

Could you give an example about "other conditions" ?

Jianlin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: sd: support specify probe type of build-in driver
  2023-06-08  2:51       ` Jianlin Lv
@ 2023-06-08 16:23         ` Bart Van Assche
  2023-06-24 11:45           ` Jianlin Lv
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2023-06-08 16:23 UTC (permalink / raw)
  To: Jianlin Lv
  Cc: jejb, martin.petersen, paulmck, bp, peterz, will, rdunlap,
	kim.phillips, rostedt, wyes.karny, jianlv, linux-kernel,
	linux-doc, linux-scsi

On 6/7/23 19:51, Jianlin Lv wrote:
> On Thu, Jun 8, 2023 at 1:07 AM Bart Van Assche <bvanassche@acm.org> wrote:
>> On 6/7/23 08:55, Jianlin Lv wrote:
>> I see two possible solutions:
>> - Change the volume provisioner such that it uses disk references that do
>>     not depend on the probing order, e.g. /dev/disk/by-id/...
> 
> Yes, The "/dev/disk/by-id/" can uniquely identify SCSI devices. However,
> I don't think it is suitable for the volume provisioner workflow.
> For nodes of the same SKU , a unified YAML file will be defined to instruct
> the volume provisioner on how to manage the local disks.
> If use WWID, it would mean that a unique YAML file needs to be defined
> for each node. This approach becomes impractical when dealing with a large
> number of work nodes.
Please consider using the paths available in /dev/disk/by-path.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] scsi: sd: support specify probe type of build-in driver
  2023-06-08 16:23         ` Bart Van Assche
@ 2023-06-24 11:45           ` Jianlin Lv
  0 siblings, 0 replies; 9+ messages in thread
From: Jianlin Lv @ 2023-06-24 11:45 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jejb, martin.petersen, paulmck, bp, peterz, will, rdunlap,
	kim.phillips, rostedt, wyes.karny, jianlv, linux-kernel,
	linux-doc, linux-scsi

On Fri, Jun 9, 2023 at 12:23 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 6/7/23 19:51, Jianlin Lv wrote:
> > On Thu, Jun 8, 2023 at 1:07 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >> On 6/7/23 08:55, Jianlin Lv wrote:
> >> I see two possible solutions:
> >> - Change the volume provisioner such that it uses disk references that do
> >>     not depend on the probing order, e.g. /dev/disk/by-id/...
> >
> > Yes, The "/dev/disk/by-id/" can uniquely identify SCSI devices. However,
> > I don't think it is suitable for the volume provisioner workflow.
> > For nodes of the same SKU , a unified YAML file will be defined to instruct
> > the volume provisioner on how to manage the local disks.
> > If use WWID, it would mean that a unique YAML file needs to be defined
> > for each node. This approach becomes impractical when dealing with a large
> > number of work nodes.
> Please consider using the paths available in /dev/disk/by-path.

Sorry for the late reply.
I carefully checked the server in the production environment and found
some corner cases where there are differences in the dev/disk/by-path/ of
nodes with the same SKU. These differences are caused by inconsistent
target_numbers.

For example:

diff -y aa-by-path bb-by-path
pci-0000:86:00.0-scsi-0:3:86:0 -> ../../sda |
pci-0000:86:00.0-scsi-0:3:88:0 -> ../../sda
pci-0000:86:00.0-scsi-0:3:87:0 -> ../../sdb |
pci-0000:86:00.0-scsi-0:3:89:0 -> ../../sdb
pci-0000:86:00.0-scsi-0:3:88:0 -> ../../sdc |
pci-0000:86:00.0-scsi-0:3:90:0 -> ../../sdc
pci-0000:86:00.0-scsi-0:3:89:0 -> ../../sdd |
pci-0000:86:00.0-scsi-0:3:91:0 -> ../../sdd
pci-0000:86:00.0-scsi-0:3:90:0 -> ../../sde |
pci-0000:86:00.0-scsi-0:3:92:0 -> ../../sde
pci-0000:86:00.0-scsi-0:3:92:0 -> ../../sdf |
pci-0000:86:00.0-scsi-0:3:93:0 -> ../../sdf
pci-0000:86:00.0-scsi-0:3:93:0 -> ../../sdg |
pci-0000:86:00.0-scsi-0:3:94:0 -> ../../sdg
pci-0000:86:00.0-scsi-0:3:94:0 -> ../../sdh |
pci-0000:86:00.0-scsi-0:3:95:0 -> ../../sdh

I'm still not sure what causes the target_numbers to be different.
However, the existence of such corner cases makes /dev/disk/by-path
unusable for the volume provisioner, similar to /dev/disk/by-id/.

So, If it's not possible to configure disk serialization detection, then
it seems that implementing predictable disk names is the only option.

Jianlin

>
> Thanks,
>
> Bart.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-06-24 11:45 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-06  5:12 [PATCH] scsi: sd: support specify probe type of build-in driver Jianlin Lv
2023-06-06 17:38 ` Bart Van Assche
2023-06-07 15:55   ` Jianlin Lv
2023-06-07 17:07     ` Bart Van Assche
2023-06-08  2:51       ` Jianlin Lv
2023-06-08 16:23         ` Bart Van Assche
2023-06-24 11:45           ` Jianlin Lv
2023-06-07 14:10 ` Christoph Hellwig
2023-06-08  3:10   ` Jianlin Lv

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).