All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] SCSI core and UFS patches for kernel v6.4
@ 2023-04-17 23:06 Bart Van Assche
  2023-04-17 23:06 ` [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O Bart Van Assche
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Bart Van Assche @ 2023-04-17 23:06 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi, Bart Van Assche

Hi Martin,

This patch series includes a patch for making sd_shutdown() fail future I/O
and also two UFS patches.

Patch 3/3 of this series has been posted earlier. Compared to the previous
version of that patch, a Fixes: tag has been added.

Please consider this patch series for the next merge window. 

Thanks,

Bart.

Changes compared to v1:
- Slightly changed the description of patch "scsi: sd: Let sd_shutdown() fail
  future I/O".
- Included patch "scsi: ufs: Fix handling of lrbp->cmd"

Bart Van Assche (4):
  scsi: sd: Let sd_shutdown() fail future I/O
  scsi: ufs: Simplify ufshcd_wl_shutdown()
  scsi: ufs: Increase the START STOP UNIT timeout from one to ten
    seconds
  scsi: ufs: Fix handling of lrbp->cmd

 drivers/scsi/sd.c         | 11 ++++++++++-
 drivers/ufs/core/ufshcd.c | 20 +++-----------------
 2 files changed, 13 insertions(+), 18 deletions(-)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O
  2023-04-17 23:06 [PATCH v2 0/4] SCSI core and UFS patches for kernel v6.4 Bart Van Assche
@ 2023-04-17 23:06 ` Bart Van Assche
  2023-04-18  4:37   ` Ming Lei
                     ` (2 more replies)
  2023-04-17 23:06 ` [PATCH v2 2/4] scsi: ufs: Simplify ufshcd_wl_shutdown() Bart Van Assche
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 24+ messages in thread
From: Bart Van Assche @ 2023-04-17 23:06 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi,
	Bart Van Assche, Christoph Hellwig, Ming Lei, Hannes Reinecke,
	Mike Christie, Tomas Henzl, James E.J. Bottomley

System shutdown happens as follows (see e.g. the systemd source file
src/shutdown/shutdown.c):
* sync() is called.
* reboot(RB_AUTOBOOT/RB_HALT_SYSTEM/RB_POWER_OFF) is called.
* If the reboot() system call returns, log an error message.

The reboot() system call causes the kernel to call kernel_restart(),
kernel_halt() or kernel_power_off(). Each of these functions calls
device_shutdown(). device_shutdown() calls sd_shutdown(). After
sd_shutdown() has been called the .shutdown() callback of the LLD
will be called. Hence, I/O submitted after sd_shutdown() will hang or
may even cause a kernel crash.

Let sd_shutdown() fail future I/O such that LLD .shutdown() callbacks
can be simplified.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/sd.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4bb87043e6db..4017b5412ba4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3699,12 +3699,13 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
 static void sd_shutdown(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
+	struct request_queue *q;
 
 	if (!sdkp)
 		return;         /* this can happen */
 
 	if (pm_runtime_suspended(dev))
-		return;
+		goto fail_future_io;
 
 	if (sdkp->WCE && sdkp->media_present) {
 		sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n");
@@ -3715,6 +3716,14 @@ static void sd_shutdown(struct device *dev)
 		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
 		sd_start_stop_device(sdkp, 0);
 	}
+
+fail_future_io:
+	q = sdkp->disk->queue;
+	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
+	if (!scsi_host_busy(sdkp->device->host))
+		return;
+	blk_mq_freeze_queue(q);
+	blk_mq_unfreeze_queue(q);
 }
 
 static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/4] scsi: ufs: Simplify ufshcd_wl_shutdown()
  2023-04-17 23:06 [PATCH v2 0/4] SCSI core and UFS patches for kernel v6.4 Bart Van Assche
  2023-04-17 23:06 ` [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O Bart Van Assche
@ 2023-04-17 23:06 ` Bart Van Assche
  2023-04-18 13:45   ` Adrian Hunter
  2023-04-17 23:06 ` [PATCH v2 3/4] scsi: ufs: Increase the START STOP UNIT timeout from one to ten seconds Bart Van Assche
  2023-04-17 23:06 ` [PATCH v2 4/4] scsi: ufs: Fix handling of lrbp->cmd Bart Van Assche
  3 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2023-04-17 23:06 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi,
	Bart Van Assche, Asutosh Das, Tomas Henzl, James E.J. Bottomley,
	Bart Van Assche, Bean Huo, Stanley Chu, Asutosh Das

Now that sd_shutdown() fails future I/O the code for quiescing LUNs in
ufshcd_wl_shutdown() is superfluous. Remove the code for quiescing LUNs.
Also remove the ufshcd_rpm_get_sync() call because it is not necessary
to resume a UFS device before submitting a START STOP UNIT command.

Cc: Asutosh Das <asutoshd@codeaurora.org>
Cc: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 9434328ba323..784787cf08c3 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -9768,22 +9768,12 @@ static int ufshcd_wl_resume(struct device *dev)
 static void ufshcd_wl_shutdown(struct device *dev)
 {
 	struct scsi_device *sdev = to_scsi_device(dev);
-	struct ufs_hba *hba;
-
-	hba = shost_priv(sdev->host);
+	struct ufs_hba *hba = shost_priv(sdev->host);
 
 	down(&hba->host_sem);
 	hba->shutting_down = true;
 	up(&hba->host_sem);
 
-	/* Turn on everything while shutting down */
-	ufshcd_rpm_get_sync(hba);
-	scsi_device_quiesce(sdev);
-	shost_for_each_device(sdev, hba->host) {
-		if (sdev == hba->ufs_device_wlun)
-			continue;
-		scsi_device_quiesce(sdev);
-	}
 	__ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM);
 }
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 3/4] scsi: ufs: Increase the START STOP UNIT timeout from one to ten seconds
  2023-04-17 23:06 [PATCH v2 0/4] SCSI core and UFS patches for kernel v6.4 Bart Van Assche
  2023-04-17 23:06 ` [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O Bart Van Assche
  2023-04-17 23:06 ` [PATCH v2 2/4] scsi: ufs: Simplify ufshcd_wl_shutdown() Bart Van Assche
@ 2023-04-17 23:06 ` Bart Van Assche
  2023-04-18  7:30   ` Adrian Hunter
  2023-04-18  7:57   ` Stanley Chu
  2023-04-17 23:06 ` [PATCH v2 4/4] scsi: ufs: Fix handling of lrbp->cmd Bart Van Assche
  3 siblings, 2 replies; 24+ messages in thread
From: Bart Van Assche @ 2023-04-17 23:06 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi,
	Bart Van Assche, James E.J. Bottomley, Bart Van Assche, Bean Huo,
	Stanley Chu, Asutosh Das

One UFS vendor asked to increase the UFS timeout from 1 s to 3 s.
Another UFS vendor asked to increase the UFS timeout from 1 s to 10 s.
Hence this patch that increases the UFS timeout to 10 s. This patch can
cause the total timeout to exceed 20 s, the Android shutdown timeout.
This is fine since the loop around ufshcd_execute_start_stop() exists to
deal with unit attentions and because unit attentions are reported
quickly.

Fixes: dcd5b7637c6d ("scsi: ufs: Reduce the START STOP UNIT timeout")
Fixes: 8f2c96420c6e ("scsi: ufs: core: Reduce the power mode change timeout")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 784787cf08c3..6831eb1afc30 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -9182,7 +9182,8 @@ static int ufshcd_execute_start_stop(struct scsi_device *sdev,
 	};
 
 	return scsi_execute_cmd(sdev, cdb, REQ_OP_DRV_IN, /*buffer=*/NULL,
-			/*bufflen=*/0, /*timeout=*/HZ, /*retries=*/0, &args);
+			/*bufflen=*/0, /*timeout=*/10 * HZ, /*retries=*/0,
+			&args);
 }
 
 /**

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 4/4] scsi: ufs: Fix handling of lrbp->cmd
  2023-04-17 23:06 [PATCH v2 0/4] SCSI core and UFS patches for kernel v6.4 Bart Van Assche
                   ` (2 preceding siblings ...)
  2023-04-17 23:06 ` [PATCH v2 3/4] scsi: ufs: Increase the START STOP UNIT timeout from one to ten seconds Bart Van Assche
@ 2023-04-17 23:06 ` Bart Van Assche
  3 siblings, 0 replies; 24+ messages in thread
From: Bart Van Assche @ 2023-04-17 23:06 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi,
	Bart Van Assche, Bart Van Assche, James E.J. Bottomley, Bean Huo,
	Stanley Chu, Asutosh Das, James Bottomley, Santosh Y

From: Bart Van Assche <bvanassche@google.com>

ufshcd_queuecommand() may be called two times in a row for a SCSI
command before it is completed. Additionally, a single SCSI command may
be completed twice. Hence make the following changes:
- In the functions that submit a command, do not check the old value of
  lrbp->cmd nor clear lrbp->cmd in error paths.
- In ufshcd_release_scsi_cmd(), do not clear lrbp->cmd.

See also scsi_send_eh_cmnd().

This patch prevents that the following appears if a command times out:

WARNING: at drivers/ufs/core/ufshcd.c:2965 ufshcd_queuecommand+0x6f8/0x9a8
Call trace:
 ufshcd_queuecommand+0x6f8/0x9a8
 scsi_send_eh_cmnd+0x2c0/0x960
 scsi_eh_test_devices+0x100/0x314
 scsi_eh_ready_devs+0xd90/0x114c
 scsi_error_handler+0x2b4/0xb70
 kthread+0x16c/0x1e0

Fixes: 5a0b0cb9bee7 ("[SCSI] ufs: Add support for sending NOP OUT UPIU")
Signed-off-by: Bart Van Assche <bvanassche@google.com>
---
 drivers/ufs/core/ufshcd.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 6831eb1afc30..4f4c5ba66bac 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -2952,7 +2952,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 		(hba->clk_gating.state != CLKS_ON));
 
 	lrbp = &hba->lrb[tag];
-	WARN_ON(lrbp->cmd);
 	lrbp->cmd = cmd;
 	lrbp->task_tag = tag;
 	lrbp->lun = ufshcd_scsi_to_upiu_lun(cmd->device->lun);
@@ -2968,7 +2967,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 
 	err = ufshcd_map_sg(hba, lrbp);
 	if (err) {
-		lrbp->cmd = NULL;
 		ufshcd_release(hba);
 		goto out;
 	}
@@ -5429,7 +5427,6 @@ static void ufshcd_release_scsi_cmd(struct ufs_hba *hba,
 	struct scsi_cmnd *cmd = lrbp->cmd;
 
 	scsi_dma_unmap(cmd);
-	lrbp->cmd = NULL;	/* Mark the command as completed. */
 	ufshcd_release(hba);
 	ufshcd_clk_scaling_update_busy(hba);
 }
@@ -7044,7 +7041,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 	down_read(&hba->clk_scaling_lock);
 
 	lrbp = &hba->lrb[tag];
-	WARN_ON(lrbp->cmd);
 	lrbp->cmd = NULL;
 	lrbp->task_tag = tag;
 	lrbp->lun = 0;
@@ -7216,7 +7212,6 @@ int ufshcd_advanced_rpmb_req_handler(struct ufs_hba *hba, struct utp_upiu_req *r
 	down_read(&hba->clk_scaling_lock);
 
 	lrbp = &hba->lrb[tag];
-	WARN_ON(lrbp->cmd);
 	lrbp->cmd = NULL;
 	lrbp->task_tag = tag;
 	lrbp->lun = UFS_UPIU_RPMB_WLUN;

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O
  2023-04-17 23:06 ` [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O Bart Van Assche
@ 2023-04-18  4:37   ` Ming Lei
  2023-04-18 14:09     ` Bart Van Assche
  2023-04-18  5:06   ` Christoph Hellwig
  2023-04-18 14:36   ` James Bottomley
  2 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2023-04-18  4:37 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K . Petersen, Jaegeuk Kim, Avri Altman, Adrian Hunter,
	linux-scsi, Christoph Hellwig, Hannes Reinecke, Mike Christie,
	Tomas Henzl, James E.J. Bottomley, ming.lei

On Mon, Apr 17, 2023 at 04:06:53PM -0700, Bart Van Assche wrote:
> System shutdown happens as follows (see e.g. the systemd source file
> src/shutdown/shutdown.c):
> * sync() is called.
> * reboot(RB_AUTOBOOT/RB_HALT_SYSTEM/RB_POWER_OFF) is called.
> * If the reboot() system call returns, log an error message.
> 
> The reboot() system call causes the kernel to call kernel_restart(),
> kernel_halt() or kernel_power_off(). Each of these functions calls
> device_shutdown(). device_shutdown() calls sd_shutdown(). After
> sd_shutdown() has been called the .shutdown() callback of the LLD
> will be called. Hence, I/O submitted after sd_shutdown() will hang or
> may even cause a kernel crash.
> 
> Let sd_shutdown() fail future I/O such that LLD .shutdown() callbacks
> can be simplified.

Hi Bart,

Last time you mentioned the current way may have kernel panic risk, but
you never explain the panic, can you document the panic in commit log?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O
  2023-04-17 23:06 ` [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O Bart Van Assche
  2023-04-18  4:37   ` Ming Lei
@ 2023-04-18  5:06   ` Christoph Hellwig
  2023-04-18 14:36   ` James Bottomley
  2 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-04-18  5:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K . Petersen, Jaegeuk Kim, Avri Altman, Adrian Hunter,
	linux-scsi, Christoph Hellwig, Ming Lei, Hannes Reinecke,
	Mike Christie, Tomas Henzl, James E.J. Bottomley

On Mon, Apr 17, 2023 at 04:06:53PM -0700, Bart Van Assche wrote:
> +fail_future_io:
> +	q = sdkp->disk->queue;
> +	blk_queue_flag_set(QUEUE_FLAG_DYING, q);

SD does not own the queue and has abslutely not business marking it
dying.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/4] scsi: ufs: Increase the START STOP UNIT timeout from one to ten seconds
  2023-04-17 23:06 ` [PATCH v2 3/4] scsi: ufs: Increase the START STOP UNIT timeout from one to ten seconds Bart Van Assche
@ 2023-04-18  7:30   ` Adrian Hunter
  2023-04-18  7:57   ` Stanley Chu
  1 sibling, 0 replies; 24+ messages in thread
From: Adrian Hunter @ 2023-04-18  7:30 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, linux-scsi, James E.J. Bottomley,
	Bart Van Assche, Bean Huo, Stanley Chu, Asutosh Das

On 18/04/23 02:06, Bart Van Assche wrote:
> One UFS vendor asked to increase the UFS timeout from 1 s to 3 s.
> Another UFS vendor asked to increase the UFS timeout from 1 s to 10 s.
> Hence this patch that increases the UFS timeout to 10 s. This patch can
> cause the total timeout to exceed 20 s, the Android shutdown timeout.
> This is fine since the loop around ufshcd_execute_start_stop() exists to
> deal with unit attentions and because unit attentions are reported
> quickly.
> 
> Fixes: dcd5b7637c6d ("scsi: ufs: Reduce the START STOP UNIT timeout")
> Fixes: 8f2c96420c6e ("scsi: ufs: core: Reduce the power mode change timeout")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Acked-by: Adrian Hunter <adrian.hunter@intel.com>

> ---
>  drivers/ufs/core/ufshcd.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 784787cf08c3..6831eb1afc30 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -9182,7 +9182,8 @@ static int ufshcd_execute_start_stop(struct scsi_device *sdev,
>  	};
>  
>  	return scsi_execute_cmd(sdev, cdb, REQ_OP_DRV_IN, /*buffer=*/NULL,
> -			/*bufflen=*/0, /*timeout=*/HZ, /*retries=*/0, &args);
> +			/*bufflen=*/0, /*timeout=*/10 * HZ, /*retries=*/0,
> +			&args);
>  }
>  
>  /**


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/4] scsi: ufs: Increase the START STOP UNIT timeout from one to ten seconds
  2023-04-17 23:06 ` [PATCH v2 3/4] scsi: ufs: Increase the START STOP UNIT timeout from one to ten seconds Bart Van Assche
  2023-04-18  7:30   ` Adrian Hunter
@ 2023-04-18  7:57   ` Stanley Chu
  1 sibling, 0 replies; 24+ messages in thread
From: Stanley Chu @ 2023-04-18  7:57 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K . Petersen, Jaegeuk Kim, Avri Altman, Adrian Hunter,
	linux-scsi, James E.J. Bottomley, Bart Van Assche, Bean Huo,
	Stanley Chu, Asutosh Das

Bart Van Assche <bvanassche@acm.org> 於 2023年4月18日 週二 上午7:17寫道:
>
> One UFS vendor asked to increase the UFS timeout from 1 s to 3 s.
> Another UFS vendor asked to increase the UFS timeout from 1 s to 10 s.
> Hence this patch that increases the UFS timeout to 10 s. This patch can
> cause the total timeout to exceed 20 s, the Android shutdown timeout.
> This is fine since the loop around ufshcd_execute_start_stop() exists to
> deal with unit attentions and because unit attentions are reported
> quickly.
>
> Fixes: dcd5b7637c6d ("scsi: ufs: Reduce the START STOP UNIT timeout")
> Fixes: 8f2c96420c6e ("scsi: ufs: core: Reduce the power mode change timeout")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] scsi: ufs: Simplify ufshcd_wl_shutdown()
  2023-04-17 23:06 ` [PATCH v2 2/4] scsi: ufs: Simplify ufshcd_wl_shutdown() Bart Van Assche
@ 2023-04-18 13:45   ` Adrian Hunter
  2023-04-18 14:06     ` Bart Van Assche
  0 siblings, 1 reply; 24+ messages in thread
From: Adrian Hunter @ 2023-04-18 13:45 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, linux-scsi, Asutosh Das, Tomas Henzl,
	James E.J. Bottomley, Bart Van Assche, Bean Huo, Stanley Chu,
	Asutosh Das

On 18/04/23 02:06, Bart Van Assche wrote:
> Now that sd_shutdown() fails future I/O the code for quiescing LUNs in
> ufshcd_wl_shutdown() is superfluous. Remove the code for quiescing LUNs.
> Also remove the ufshcd_rpm_get_sync() call because it is not necessary
> to resume a UFS device before submitting a START STOP UNIT command.

What about the host controller hba->dev?

> 
> Cc: Asutosh Das <asutoshd@codeaurora.org>
> Cc: Tomas Henzl <thenzl@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/ufs/core/ufshcd.c | 12 +-----------
>  1 file changed, 1 insertion(+), 11 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 9434328ba323..784787cf08c3 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -9768,22 +9768,12 @@ static int ufshcd_wl_resume(struct device *dev)
>  static void ufshcd_wl_shutdown(struct device *dev)
>  {
>  	struct scsi_device *sdev = to_scsi_device(dev);
> -	struct ufs_hba *hba;
> -
> -	hba = shost_priv(sdev->host);
> +	struct ufs_hba *hba = shost_priv(sdev->host);
>  
>  	down(&hba->host_sem);
>  	hba->shutting_down = true;
>  	up(&hba->host_sem);
>  
> -	/* Turn on everything while shutting down */
> -	ufshcd_rpm_get_sync(hba);
> -	scsi_device_quiesce(sdev);
> -	shost_for_each_device(sdev, hba->host) {
> -		if (sdev == hba->ufs_device_wlun)
> -			continue;
> -		scsi_device_quiesce(sdev);
> -	}
>  	__ufshcd_wl_suspend(hba, UFS_SHUTDOWN_PM);
>  }
>  


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] scsi: ufs: Simplify ufshcd_wl_shutdown()
  2023-04-18 13:45   ` Adrian Hunter
@ 2023-04-18 14:06     ` Bart Van Assche
  2023-04-18 14:13       ` Adrian Hunter
  0 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2023-04-18 14:06 UTC (permalink / raw)
  To: Adrian Hunter, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, linux-scsi, Asutosh Das, Tomas Henzl,
	James E.J. Bottomley, Bart Van Assche, Bean Huo, Stanley Chu,
	Asutosh Das

On 4/18/23 06:45, Adrian Hunter wrote:
> On 18/04/23 02:06, Bart Van Assche wrote:
>> Now that sd_shutdown() fails future I/O the code for quiescing LUNs in
>> ufshcd_wl_shutdown() is superfluous. Remove the code for quiescing LUNs.
>> Also remove the ufshcd_rpm_get_sync() call because it is not necessary
>> to resume a UFS device before submitting a START STOP UNIT command.
> 
> What about the host controller hba->dev?

The above question is not clear to me. Please elaborate.

Bart.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O
  2023-04-18  4:37   ` Ming Lei
@ 2023-04-18 14:09     ` Bart Van Assche
  0 siblings, 0 replies; 24+ messages in thread
From: Bart Van Assche @ 2023-04-18 14:09 UTC (permalink / raw)
  To: Ming Lei
  Cc: Martin K . Petersen, Jaegeuk Kim, Avri Altman, Adrian Hunter,
	linux-scsi, Christoph Hellwig, Hannes Reinecke, Mike Christie,
	Tomas Henzl, James E.J. Bottomley

On 4/17/23 21:37, Ming Lei wrote:
> On Mon, Apr 17, 2023 at 04:06:53PM -0700, Bart Van Assche wrote:
>> System shutdown happens as follows (see e.g. the systemd source file
>> src/shutdown/shutdown.c):
>> * sync() is called.
>> * reboot(RB_AUTOBOOT/RB_HALT_SYSTEM/RB_POWER_OFF) is called.
>> * If the reboot() system call returns, log an error message.
>>
>> The reboot() system call causes the kernel to call kernel_restart(),
>> kernel_halt() or kernel_power_off(). Each of these functions calls
>> device_shutdown(). device_shutdown() calls sd_shutdown(). After
>> sd_shutdown() has been called the .shutdown() callback of the LLD
>> will be called. Hence, I/O submitted after sd_shutdown() will hang or
>> may even cause a kernel crash.
>>
>> Let sd_shutdown() fail future I/O such that LLD .shutdown() callbacks
>> can be simplified.
> 
> Hi Bart,
> 
> Last time you mentioned the current way may have kernel panic risk, but
> you never explain the panic, can you document the panic in commit log?

Hi Ming,

I removed the references to the risk of a kernel panic since I think 
that shutdown methods should not introduce that risk. From 
include/device/bus.h:

  * @shutdown:	Called at shut-down time to quiesce the device.

That comment says "quiesce the device". It does not say that it is 
allowed to crash the system if more I/O is submitted to the device.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] scsi: ufs: Simplify ufshcd_wl_shutdown()
  2023-04-18 14:06     ` Bart Van Assche
@ 2023-04-18 14:13       ` Adrian Hunter
  2023-04-18 20:14         ` Bart Van Assche
  0 siblings, 1 reply; 24+ messages in thread
From: Adrian Hunter @ 2023-04-18 14:13 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, linux-scsi, Asutosh Das, Tomas Henzl,
	James E.J. Bottomley, Bart Van Assche, Bean Huo, Stanley Chu,
	Asutosh Das

On 18/04/23 17:06, Bart Van Assche wrote:
> On 4/18/23 06:45, Adrian Hunter wrote:
>> On 18/04/23 02:06, Bart Van Assche wrote:
>>> Now that sd_shutdown() fails future I/O the code for quiescing LUNs in
>>> ufshcd_wl_shutdown() is superfluous. Remove the code for quiescing LUNs.
>>> Also remove the ufshcd_rpm_get_sync() call because it is not necessary
>>> to resume a UFS device before submitting a START STOP UNIT command.
>>
>> What about the host controller hba->dev?
> 
> The above question is not clear to me. Please elaborate.

Does hba->dev need to be runtime resumed?



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O
  2023-04-17 23:06 ` [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O Bart Van Assche
  2023-04-18  4:37   ` Ming Lei
  2023-04-18  5:06   ` Christoph Hellwig
@ 2023-04-18 14:36   ` James Bottomley
  2023-04-18 18:37     ` Bart Van Assche
  2 siblings, 1 reply; 24+ messages in thread
From: James Bottomley @ 2023-04-18 14:36 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, Mike Christie,
	Tomas Henzl

On Mon, 2023-04-17 at 16:06 -0700, Bart Van Assche wrote:
> System shutdown happens as follows (see e.g. the systemd source file
> src/shutdown/shutdown.c):
> * sync() is called.
> * reboot(RB_AUTOBOOT/RB_HALT_SYSTEM/RB_POWER_OFF) is called.
> * If the reboot() system call returns, log an error message.
> 
> The reboot() system call causes the kernel to call kernel_restart(),
> kernel_halt() or kernel_power_off(). Each of these functions calls
> device_shutdown(). device_shutdown() calls sd_shutdown(). After
> sd_shutdown() has been called the .shutdown() callback of the LLD
> will be called. Hence, I/O submitted after sd_shutdown() will hang or
> may even cause a kernel crash.
> 
> Let sd_shutdown() fail future I/O such that LLD .shutdown() callbacks
> can be simplified.

What is the actual reason for this?  What is it you think might be
submitting I/O after the system gets into this state?  Current
sd_shutdown is constructed on the premise that it's the last thing that
ever happens to the device before reboot/power off which is why it
flushes the cache if necessary and stops the device if required, but
for most standard devices neither is required because we don't expect
Linux to go down with pending items in the block queue and for a write
through disk cache anything that's completed on the block queue is
safely durable on the device.

James


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O
  2023-04-18 14:36   ` James Bottomley
@ 2023-04-18 18:37     ` Bart Van Assche
  2023-04-19  2:34       ` James Bottomley
  0 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2023-04-18 18:37 UTC (permalink / raw)
  To: jejb, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, Mike Christie,
	Tomas Henzl

On 4/18/23 07:36, James Bottomley wrote:
> On Mon, 2023-04-17 at 16:06 -0700, Bart Van Assche wrote:
>> System shutdown happens as follows (see e.g. the systemd source file
>> src/shutdown/shutdown.c):
>> * sync() is called.
>> * reboot(RB_AUTOBOOT/RB_HALT_SYSTEM/RB_POWER_OFF) is called.
>> * If the reboot() system call returns, log an error message.
>>
>> The reboot() system call causes the kernel to call kernel_restart(),
>> kernel_halt() or kernel_power_off(). Each of these functions calls
>> device_shutdown(). device_shutdown() calls sd_shutdown(). After
>> sd_shutdown() has been called the .shutdown() callback of the LLD
>> will be called. Hence, I/O submitted after sd_shutdown() will hang or
>> may even cause a kernel crash.
>>
>> Let sd_shutdown() fail future I/O such that LLD .shutdown() callbacks
>> can be simplified.
> 
> What is the actual reason for this?  What is it you think might be
> submitting I/O after the system gets into this state?  Current
> sd_shutdown is constructed on the premise that it's the last thing that
> ever happens to the device before reboot/power off which is why it
> flushes the cache if necessary and stops the device if required, but
> for most standard devices neither is required because we don't expect
> Linux to go down with pending items in the block queue and for a write
> through disk cache anything that's completed on the block queue is
> safely durable on the device.

Hi James,

.shutdown() callbacks should quiesce I/O but the sd_shutdown() function 
doesn't do this. I see this as a bug.

Regarding your question, I think that sd_check_events() can be called 
while sd_shutdown() is in progress or after sd_shutdown() has finished. 
sd_check_events() may submit a TEST UNIT READY command.

In pci_device_shutdown() one can see that the PCI core clears the bus 
master bit for PCI devices during shutdown. In other words, it is not 
safe to submit I/O or to process completions during invocation of 
shutdown callbacks. I think that also shows that this patch fixes a bug.

Bart.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] scsi: ufs: Simplify ufshcd_wl_shutdown()
  2023-04-18 14:13       ` Adrian Hunter
@ 2023-04-18 20:14         ` Bart Van Assche
  2023-04-19  5:23           ` Adrian Hunter
  0 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2023-04-18 20:14 UTC (permalink / raw)
  To: Adrian Hunter, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, linux-scsi, Asutosh Das, Tomas Henzl,
	James E.J. Bottomley, Bart Van Assche, Bean Huo, Stanley Chu,
	Asutosh Das

On 4/18/23 07:13, Adrian Hunter wrote:
> On 18/04/23 17:06, Bart Van Assche wrote:
>> On 4/18/23 06:45, Adrian Hunter wrote:
>>> On 18/04/23 02:06, Bart Van Assche wrote:
>>>> Now that sd_shutdown() fails future I/O the code for quiescing LUNs in
>>>> ufshcd_wl_shutdown() is superfluous. Remove the code for quiescing LUNs.
>>>> Also remove the ufshcd_rpm_get_sync() call because it is not necessary
>>>> to resume a UFS device before submitting a START STOP UNIT command.
>>>
>>> What about the host controller hba->dev?
>>
>> The above question is not clear to me. Please elaborate.
> 
> Does hba->dev need to be runtime resumed?

Hi Adrian,

I don't think so. Shutdown callback functions are expected to quiesce 
hardware activity. To me runtime resuming a device seems to contradict 
the goal of quiescing hardware activity.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O
  2023-04-18 18:37     ` Bart Van Assche
@ 2023-04-19  2:34       ` James Bottomley
  2023-04-19 13:36         ` Tomas Henzl
  0 siblings, 1 reply; 24+ messages in thread
From: James Bottomley @ 2023-04-19  2:34 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, Mike Christie,
	Tomas Henzl

On Tue, 2023-04-18 at 11:37 -0700, Bart Van Assche wrote:
> On 4/18/23 07:36, James Bottomley wrote:
> > On Mon, 2023-04-17 at 16:06 -0700, Bart Van Assche wrote:
> > > System shutdown happens as follows (see e.g. the systemd source
> > > file src/shutdown/shutdown.c):
> > > * sync() is called.
> > > * reboot(RB_AUTOBOOT/RB_HALT_SYSTEM/RB_POWER_OFF) is called.
> > > * If the reboot() system call returns, log an error message.
> > > 
> > > The reboot() system call causes the kernel to call
> > > kernel_restart(), kernel_halt() or kernel_power_off(). Each of
> > > these functions calls device_shutdown(). device_shutdown() calls
> > > sd_shutdown(). After sd_shutdown() has been called the
> > > .shutdown() callback of the LLD will be called. Hence, I/O
> > > submitted after sd_shutdown() will hang or may even cause a
> > > kernel crash.
> > > 
> > > Let sd_shutdown() fail future I/O such that LLD .shutdown()
> > > callbacks can be simplified.
> > 
> > What is the actual reason for this?  What is it you think might be
> > submitting I/O after the system gets into this state?  Current
> > sd_shutdown is constructed on the premise that it's the last thing
> > that ever happens to the device before reboot/power off which is
> > why it flushes the cache if necessary and stops the device if
> > required, but for most standard devices neither is required because
> > we don't expect Linux to go down with pending items in the block
> > queue and for a write through disk cache anything that's completed
> > on the block queue is safely durable on the device.
> 
> Hi James,
> 
> .shutdown() callbacks should quiesce I/O but the sd_shutdown()
> function doesn't do this. I see this as a bug.

Why? They're only called on reboot or shutdown.  In orderly cases, the
queues should have been stopped long ago, so there should be no I/O to
quiesce, and in the disorderly case, you obviously didn't care about
the data anyway, so the job of the shutdown routine is to salvage as
much as it can, which is why we flush the cache and stop the disk if
necessary.

> Regarding your question, I think that sd_check_events() can be called
> while sd_shutdown() is in progress or after sd_shutdown() has
> finished.  sd_check_events() may submit a TEST UNIT READY command.

If that's true, that would argue that the block layer caller should be
aware of the shutdown and not do this.  On the other hand, if the TUR
fails, what is the harm?

> In pci_device_shutdown() one can see that the PCI core clears the bus
> master bit for PCI devices during shutdown. In other words, it is not
> safe to submit I/O or to process completions during invocation of 
> shutdown callbacks. I think that also shows that this patch fixes a
> bug.

I don't think it does: in the orderly case, there should be no I/O
outstanding, so nothing to trigger and in the disorderly case, you have
no expectation of the I/O reaching the device, so what would the actual
bug be that this fixes?

James


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/4] scsi: ufs: Simplify ufshcd_wl_shutdown()
  2023-04-18 20:14         ` Bart Van Assche
@ 2023-04-19  5:23           ` Adrian Hunter
  0 siblings, 0 replies; 24+ messages in thread
From: Adrian Hunter @ 2023-04-19  5:23 UTC (permalink / raw)
  To: Bart Van Assche, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, linux-scsi, Asutosh Das, Tomas Henzl,
	James E.J. Bottomley, Bart Van Assche, Bean Huo, Stanley Chu,
	Asutosh Das

On 18/04/23 23:14, Bart Van Assche wrote:
> On 4/18/23 07:13, Adrian Hunter wrote:
>> On 18/04/23 17:06, Bart Van Assche wrote:
>>> On 4/18/23 06:45, Adrian Hunter wrote:
>>>> On 18/04/23 02:06, Bart Van Assche wrote:
>>>>> Now that sd_shutdown() fails future I/O the code for quiescing LUNs in
>>>>> ufshcd_wl_shutdown() is superfluous. Remove the code for quiescing LUNs.
>>>>> Also remove the ufshcd_rpm_get_sync() call because it is not necessary
>>>>> to resume a UFS device before submitting a START STOP UNIT command.
>>>>
>>>> What about the host controller hba->dev?
>>>
>>> The above question is not clear to me. Please elaborate.
>>
>> Does hba->dev need to be runtime resumed?
> 
> Hi Adrian,
> 
> I don't think so. Shutdown callback functions are expected to quiesce hardware activity. To me runtime resuming a device seems to contradict the goal of quiescing hardware activity.

I don't think the host controller can be used to submit a START STOP UNIT command if the host controller is runtime suspended.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O
  2023-04-19  2:34       ` James Bottomley
@ 2023-04-19 13:36         ` Tomas Henzl
  2023-04-19 14:02           ` James Bottomley
  0 siblings, 1 reply; 24+ messages in thread
From: Tomas Henzl @ 2023-04-19 13:36 UTC (permalink / raw)
  To: jejb, Bart Van Assche, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, Mike Christie

On 4/19/23 04:34, James Bottomley wrote:
> On Tue, 2023-04-18 at 11:37 -0700, Bart Van Assche wrote:
>> On 4/18/23 07:36, James Bottomley wrote:
>>> On Mon, 2023-04-17 at 16:06 -0700, Bart Van Assche wrote:
>>>> System shutdown happens as follows (see e.g. the systemd source
>>>> file src/shutdown/shutdown.c):
>>>> * sync() is called.
>>>> * reboot(RB_AUTOBOOT/RB_HALT_SYSTEM/RB_POWER_OFF) is called.
>>>> * If the reboot() system call returns, log an error message.
>>>>
>>>> The reboot() system call causes the kernel to call
>>>> kernel_restart(), kernel_halt() or kernel_power_off(). Each of
>>>> these functions calls device_shutdown(). device_shutdown() calls
>>>> sd_shutdown(). After sd_shutdown() has been called the
>>>> .shutdown() callback of the LLD will be called. Hence, I/O
>>>> submitted after sd_shutdown() will hang or may even cause a
>>>> kernel crash.
>>>>
>>>> Let sd_shutdown() fail future I/O such that LLD .shutdown()
>>>> callbacks can be simplified.
>>>
>>> What is the actual reason for this?  What is it you think might be
>>> submitting I/O after the system gets into this state?  Current
>>> sd_shutdown is constructed on the premise that it's the last thing
>>> that ever happens to the device before reboot/power off which is
>>> why it flushes the cache if necessary and stops the device if
>>> required, but for most standard devices neither is required because
>>> we don't expect Linux to go down with pending items in the block
>>> queue and for a write through disk cache anything that's completed
>>> on the block queue is safely durable on the device.
>>
>> Hi James,
>>
>> .shutdown() callbacks should quiesce I/O but the sd_shutdown()
>> function doesn't do this. I see this as a bug.
> 
> Why? They're only called on reboot or shutdown.  In orderly cases, the
> queues should have been stopped long ago, so there should be no I/O to
> quiesce, and in the disorderly case, you obviously didn't care about
> the data anyway, so the job of the shutdown routine is to salvage as
> much as it can, which is why we flush the cache and stop the disk if
> necessary.

A kernel panic has been reported caused by I/O arriving after driver's
shutdown (the driver is fixed now so just scsi exception handling is
logged).
All drivers should have a protection against late commands queued, with
a block after sd.shutdown that could be dropped  and so well written
drivers could enjoy a bit easier/faster code and and those not well
written wouldn't be waiting for issues yet to come.

What actually is the downside of blocking further I/O in sd shutdown?

> 
>> Regarding your question, I think that sd_check_events() can be called
>> while sd_shutdown() is in progress or after sd_shutdown() has
>> finished.  sd_check_events() may submit a TEST UNIT READY command.
> 
> If that's true, that would argue that the block layer caller should be
> aware of the shutdown and not do this.  On the other hand, if the TUR
> fails, what is the harm?
> 
>> In pci_device_shutdown() one can see that the PCI core clears the bus
>> master bit for PCI devices during shutdown. In other words, it is not
>> safe to submit I/O or to process completions during invocation of 
>> shutdown callbacks. I think that also shows that this patch fixes a
>> bug.
> 
> I don't think it does: in the orderly case, there should be no I/O
> outstanding, so nothing to trigger and in the disorderly case, you have
> no expectation of the I/O reaching the device, so what would the actual
> bug be that this fixes?


> 
> James
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O
  2023-04-19 13:36         ` Tomas Henzl
@ 2023-04-19 14:02           ` James Bottomley
  2023-04-19 17:58             ` Bart Van Assche
  0 siblings, 1 reply; 24+ messages in thread
From: James Bottomley @ 2023-04-19 14:02 UTC (permalink / raw)
  To: Tomas Henzl, Bart Van Assche, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, Mike Christie

On Wed, 2023-04-19 at 15:36 +0200, Tomas Henzl wrote:
> On 4/19/23 04:34, James Bottomley wrote:
> > On Tue, 2023-04-18 at 11:37 -0700, Bart Van Assche wrote:
> > > On 4/18/23 07:36, James Bottomley wrote:
> > > > On Mon, 2023-04-17 at 16:06 -0700, Bart Van Assche wrote:
> > > > > System shutdown happens as follows (see e.g. the systemd
> > > > > source
> > > > > file src/shutdown/shutdown.c):
> > > > > * sync() is called.
> > > > > * reboot(RB_AUTOBOOT/RB_HALT_SYSTEM/RB_POWER_OFF) is called.
> > > > > * If the reboot() system call returns, log an error message.
> > > > > 
> > > > > The reboot() system call causes the kernel to call
> > > > > kernel_restart(), kernel_halt() or kernel_power_off(). Each
> > > > > of these functions calls device_shutdown(). device_shutdown()
> > > > > calls sd_shutdown(). After sd_shutdown() has been called the
> > > > > .shutdown() callback of the LLD will be called. Hence, I/O
> > > > > submitted after sd_shutdown() will hang or may even cause a
> > > > > kernel crash.
> > > > > 
> > > > > Let sd_shutdown() fail future I/O such that LLD .shutdown()
> > > > > callbacks can be simplified.
> > > > 
> > > > What is the actual reason for this?  What is it you think might
> > > > be submitting I/O after the system gets into this state? 
> > > > Current sd_shutdown is constructed on the premise that it's the
> > > > last thing that ever happens to the device before reboot/power
> > > > off which is why it flushes the cache if necessary and stops
> > > > the device if required, but for most standard devices neither
> > > > is required because we don't expect Linux to go down with
> > > > pending items in the block queue and for a write through disk
> > > > cache anything that's completed on the block queue is safely
> > > > durable on the device.
> > > 
> > > Hi James,
> > > 
> > > .shutdown() callbacks should quiesce I/O but the sd_shutdown()
> > > function doesn't do this. I see this as a bug.
> > 
> > Why? They're only called on reboot or shutdown.  In orderly cases,
> > the queues should have been stopped long ago, so there should be no
> > I/O to quiesce, and in the disorderly case, you obviously didn't
> > care about the data anyway, so the job of the shutdown routine is
> > to salvage as much as it can, which is why we flush the cache and
> > stop the disk if necessary.
> 
> A kernel panic has been reported caused by I/O arriving after
> driver's shutdown (the driver is fixed now so just scsi exception
> handling is logged). All drivers should have a protection against
> late commands queued, with a block after sd.shutdown that could be
> dropped  and so well written drivers could enjoy a bit easier/faster
> code and and those not well written wouldn't be waiting for issues
> yet to come.
> 
> What actually is the downside of blocking further I/O in sd shutdown?

Well, it's a layering problem: if all queues should be stopped on
shutdown, then this should be done from block (for all queues including
those outside SCSI).

device_shutdown() goes in reverse devices_kset->list order, so it looks
like it would do the PCI device then the SCSI device then the ULD then
block, so we can use the queues in SCSI for emergency actions (like
flush or stop) before block goes down.  Although this isn't guaranteed;
there are things, like device_link_add, which reorder this kset, so
we'd need to make sure the above assumption is correct.

James


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O
  2023-04-19 14:02           ` James Bottomley
@ 2023-04-19 17:58             ` Bart Van Assche
  2023-04-19 18:33               ` James Bottomley
  0 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2023-04-19 17:58 UTC (permalink / raw)
  To: jejb, Tomas Henzl, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, Mike Christie

[-- Attachment #1: Type: text/plain, Size: 1108 bytes --]

On 4/19/23 07:02, James Bottomley wrote:
> device_shutdown() goes in reverse devices_kset->list order, so it looks
> like it would do the PCI device then the SCSI device then the ULD then
> block, so we can use the queues in SCSI for emergency actions (like
> flush or stop) before block goes down.  Although this isn't guaranteed;
> there are things, like device_link_add, which reorder this kset, so
> we'd need to make sure the above assumption is correct.

Hi James,

My understanding is that both the block device associated with 
/dev/sd<x> and the struct scsi_disk associated with the same SCSI device 
have the sdev_gendev member of struct scsi_device as parent. In other 
words, without creating device links, there are no guarantees about the 
order in which the .shutdown() methods of struct block_device.bd_device 
and struct scsi_disk.disk_dev are called. Adding device links seems like 
an unnecessary complexity to me. Hence my preference to make 
sd_shutdown() responsible for quiescing future SCSI command activity.

See also the attached diagram (not sure I got that diagram right).

Bart.

[-- Attachment #2: scsi-disk-structures.svg --]
[-- Type: image/svg+xml, Size: 17334 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O
  2023-04-19 17:58             ` Bart Van Assche
@ 2023-04-19 18:33               ` James Bottomley
  2023-04-19 19:24                 ` Bart Van Assche
  0 siblings, 1 reply; 24+ messages in thread
From: James Bottomley @ 2023-04-19 18:33 UTC (permalink / raw)
  To: Bart Van Assche, Tomas Henzl, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, Mike Christie

On Wed, 2023-04-19 at 10:58 -0700, Bart Van Assche wrote:
> On 4/19/23 07:02, James Bottomley wrote:
> > device_shutdown() goes in reverse devices_kset->list order, so it
> > looks like it would do the PCI device then the SCSI device then the
> > ULD then block, so we can use the queues in SCSI for emergency
> > actions (like flush or stop) before block goes down.  Although this
> > isn't guaranteed; there are things, like device_link_add, which
> > reorder this kset, so we'd need to make sure the above assumption
> > is correct.
> 
> Hi James,
> 
> My understanding is that both the block device associated with 
> /dev/sd<x> and the struct scsi_disk associated with the same SCSI
> device have the sdev_gendev member of struct scsi_device as parent.
> In other words, without creating device links, there are no
> guarantees about the order in which the .shutdown() methods of struct
> block_device.bd_device  and struct scsi_disk.disk_dev are called.
> Adding device links seems like an unnecessary complexity to me. Hence
> my preference to make sd_shutdown() responsible for quiescing future
> SCSI command activity.

So if we can't reliably get it right, let's not do it at all.  The
previous argument you made was about problems with I/O to shutdown PCI
devices.  That's not SCSI specific, so either we fix it for everything
or decide it's not really a problem for anything.

James


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O
  2023-04-19 18:33               ` James Bottomley
@ 2023-04-19 19:24                 ` Bart Van Assche
  2023-04-19 19:29                   ` James Bottomley
  0 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2023-04-19 19:24 UTC (permalink / raw)
  To: jejb, Tomas Henzl, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, Mike Christie

On 4/19/23 11:33, James Bottomley wrote:
> So if we can't reliably get it right, let's not do it at all.  The
> previous argument you made was about problems with I/O to shutdown PCI
> devices.  That's not SCSI specific, so either we fix it for everything
> or decide it's not really a problem for anything.

The reference to PCI devices was an example only. As mentioned elsewhere 
in this email thread, sd_shutdown() does not do what it should do, 
namely to quiesce I/O. From include/linux/bus.h:

  * @shutdown:	Called at shut-down time to quiesce the device.

Bart.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O
  2023-04-19 19:24                 ` Bart Van Assche
@ 2023-04-19 19:29                   ` James Bottomley
  0 siblings, 0 replies; 24+ messages in thread
From: James Bottomley @ 2023-04-19 19:29 UTC (permalink / raw)
  To: Bart Van Assche, Tomas Henzl, Martin K . Petersen
  Cc: Jaegeuk Kim, Avri Altman, Adrian Hunter, linux-scsi,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, Mike Christie

On Wed, 2023-04-19 at 12:24 -0700, Bart Van Assche wrote:
> On 4/19/23 11:33, James Bottomley wrote:
> > So if we can't reliably get it right, let's not do it at all.  The
> > previous argument you made was about problems with I/O to shutdown
> > PCI devices.  That's not SCSI specific, so either we fix it for
> > everything or decide it's not really a problem for anything.
> 
> The reference to PCI devices was an example only. As mentioned
> elsewhere in this email thread, sd_shutdown() does not do what it
> should do, namely to quiesce I/O. From include/linux/bus.h:
> 
>   * @shutdown:  Called at shut-down time to quiesce the device.

Shutdown seems to be one of the most ill documented device hooks.  I
really don't think we should be declaring existing code to be buggy
based on this single comment.  But the fact remains that if you want to
act on it, block is the correct place to shut down the queues.

James


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-04-19 19:30 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-17 23:06 [PATCH v2 0/4] SCSI core and UFS patches for kernel v6.4 Bart Van Assche
2023-04-17 23:06 ` [PATCH v2 1/4] scsi: sd: Let sd_shutdown() fail future I/O Bart Van Assche
2023-04-18  4:37   ` Ming Lei
2023-04-18 14:09     ` Bart Van Assche
2023-04-18  5:06   ` Christoph Hellwig
2023-04-18 14:36   ` James Bottomley
2023-04-18 18:37     ` Bart Van Assche
2023-04-19  2:34       ` James Bottomley
2023-04-19 13:36         ` Tomas Henzl
2023-04-19 14:02           ` James Bottomley
2023-04-19 17:58             ` Bart Van Assche
2023-04-19 18:33               ` James Bottomley
2023-04-19 19:24                 ` Bart Van Assche
2023-04-19 19:29                   ` James Bottomley
2023-04-17 23:06 ` [PATCH v2 2/4] scsi: ufs: Simplify ufshcd_wl_shutdown() Bart Van Assche
2023-04-18 13:45   ` Adrian Hunter
2023-04-18 14:06     ` Bart Van Assche
2023-04-18 14:13       ` Adrian Hunter
2023-04-18 20:14         ` Bart Van Assche
2023-04-19  5:23           ` Adrian Hunter
2023-04-17 23:06 ` [PATCH v2 3/4] scsi: ufs: Increase the START STOP UNIT timeout from one to ten seconds Bart Van Assche
2023-04-18  7:30   ` Adrian Hunter
2023-04-18  7:57   ` Stanley Chu
2023-04-17 23:06 ` [PATCH v2 4/4] scsi: ufs: Fix handling of lrbp->cmd Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.