All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code
@ 2020-08-31  2:53 Bart Van Assche
  2020-08-31  2:53 ` [PATCH RFC 1/6] ide: Do not set the RQF_PREEMPT flag for sense requests Bart Van Assche
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Bart Van Assche @ 2020-08-31  2:53 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-scsi, Martin Kepplinger, Can Guo, Bart Van Assche

Recently Martin Kepplinger reported a problem with the SCSI runtime PM
code. Alan Stern root-caused the reported deadlock. This patch series is
an attempt to fix that deadlock. These patches compile but have not yet
been tested.

Bart Van Assche (6):
  ide: Do not set the RQF_PREEMPT flag for sense requests
  scsi: Remove an incorrect comment
  scsi: Pass a request queue pointer to __scsi_execute()
  scsi_transport_spi: Make spi_execute() accept a request queue pointer
  scsi_transport_spi: Freeze request queues instead of quiescing
  block, scsi, ide: Only submit power management requests in state
    RPM_SUSPENDED

 block/blk-core.c                  |   6 +-
 block/blk-mq-debugfs.c            |   1 -
 block/blk-mq.c                    |   4 +-
 drivers/ide/ide-atapi.c           |   1 -
 drivers/ide/ide-io.c              |   3 +-
 drivers/ide/ide-pm.c              |   2 +-
 drivers/scsi/scsi_lib.c           |  61 ++++++++-------
 drivers/scsi/scsi_priv.h          |   2 +
 drivers/scsi/scsi_transport_spi.c | 123 +++++++++++++++++-------------
 include/linux/blk-mq.h            |   4 +-
 include/linux/blkdev.h            |   6 +-
 include/scsi/scsi_device.h        |  16 ++--
 12 files changed, 120 insertions(+), 109 deletions(-)


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFC 1/6] ide: Do not set the RQF_PREEMPT flag for sense requests
  2020-08-31  2:53 [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code Bart Van Assche
@ 2020-08-31  2:53 ` Bart Van Assche
  2020-08-31  2:53 ` [PATCH RFC 2/6] scsi: Remove an incorrect comment Bart Van Assche
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2020-08-31  2:53 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-scsi, Martin Kepplinger, Can Guo, Bart Van Assche,
	David S . Miller

RQF_PREEMPT is used for two different purposes in the legacy IDE code:
1. To mark power management requests.
2. To mark requests that should preempt another request. An (old)
   explanation of that feature is as follows:
   "The IDE driver in the Linux kernel normally uses a series of busywait
   delays during its initialization. When the driver executes these
   busywaits, the kernel does nothing for the duration of the wait. The
   time spent in these waits could be used for other initialization
   activities, if they could be run concurrently with these waits.

   More specifically, busywait-style delays such as udelay() in module
   init functions inhibit kernel preemption because the Big Kernel Lock
   is held, while yielding APIs such as schedule_timeout() allow preemption.
   This is true because the kernel handles the BKL specially and releases
   and reacquires it across reschedules allowed by the current thread.

   This IDE-preempt specification requires that the driver eliminate these
   busywaits and replace them with a mechanism that allows other work to
   proceed while the IDE driver is initializing."

Since I haven't found an implementation of (2), do not set the PREEMPT
flag for sense requests. This patch causes sense requests to be
postponed while a drive is suspended instead of being submitted to
ide_queue_rq().

If it would ever be necessary to restore the IDE PREEMPT functionality,
that can be done by introducing a new flag in struct ide_request.

This patch is a first step towards removing the PREEMPT flag from the
block layer.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ide/ide-atapi.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/ide/ide-atapi.c b/drivers/ide/ide-atapi.c
index 2162bc80f09e..013ad33fbbc8 100644
--- a/drivers/ide/ide-atapi.c
+++ b/drivers/ide/ide-atapi.c
@@ -223,7 +223,6 @@ void ide_prep_sense(ide_drive_t *drive, struct request *rq)
 	sense_rq->rq_disk = rq->rq_disk;
 	sense_rq->cmd_flags = REQ_OP_DRV_IN;
 	ide_req(sense_rq)->type = ATA_PRIV_SENSE;
-	sense_rq->rq_flags |= RQF_PREEMPT;
 
 	req->cmd[0] = GPCMD_REQUEST_SENSE;
 	req->cmd[4] = cmd_len;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 2/6] scsi: Remove an incorrect comment
  2020-08-31  2:53 [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code Bart Van Assche
  2020-08-31  2:53 ` [PATCH RFC 1/6] ide: Do not set the RQF_PREEMPT flag for sense requests Bart Van Assche
@ 2020-08-31  2:53 ` Bart Van Assche
  2020-08-31  2:53 ` [PATCH RFC 3/6] scsi: Pass a request queue pointer to __scsi_execute() Bart Van Assche
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2020-08-31  2:53 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-scsi, Martin Kepplinger, Can Guo, Bart Van Assche

scsi_device.sdev_target is used in more code than the single_lun code,
hence remove the comment next to the definition of the sdev_target member.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 include/scsi/scsi_device.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index bc5909033d13..264501d23aea 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -144,7 +144,7 @@ struct scsi_device {
 	struct scsi_vpd __rcu *vpd_pg80;
 	struct scsi_vpd __rcu *vpd_pg89;
 	unsigned char current_tag;	/* current tag */
-	struct scsi_target      *sdev_target;   /* used only for single_lun */
+	struct scsi_target      *sdev_target;
 
 	blist_flags_t		sdev_bflags; /* black/white flags as also found in
 				 * scsi_devinfo.[hc]. For now used only to

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 3/6] scsi: Pass a request queue pointer to __scsi_execute()
  2020-08-31  2:53 [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code Bart Van Assche
  2020-08-31  2:53 ` [PATCH RFC 1/6] ide: Do not set the RQF_PREEMPT flag for sense requests Bart Van Assche
  2020-08-31  2:53 ` [PATCH RFC 2/6] scsi: Remove an incorrect comment Bart Van Assche
@ 2020-08-31  2:53 ` Bart Van Assche
  2020-08-31  2:53 ` [PATCH RFC 4/6] scsi_transport_spi: Make spi_execute() accept a request queue pointer Bart Van Assche
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2020-08-31  2:53 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-scsi, Martin Kepplinger, Can Guo, Bart Van Assche

This patch does not change any functionality but makes a later patch easier
to read.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_lib.c    | 12 +++++-------
 include/scsi/scsi_device.h |  8 ++++----
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1bc7f2ffe627..1d7135f61962 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -221,7 +221,7 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
 
 /**
  * __scsi_execute - insert request and wait for the result
- * @sdev:	scsi device
+ * @q:		queue to insert the request into
  * @cmd:	scsi command
  * @data_direction: data direction
  * @buffer:	data buffer
@@ -237,7 +237,7 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
  * Returns the scsi_cmnd result field if a command was executed, or a negative
  * Linux error code if we didn't get that far.
  */
-int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
+int __scsi_execute(struct request_queue *q, const unsigned char *cmd,
 		 int data_direction, void *buffer, unsigned bufflen,
 		 unsigned char *sense, struct scsi_sense_hdr *sshdr,
 		 int timeout, int retries, u64 flags, req_flags_t rq_flags,
@@ -247,15 +247,13 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	struct scsi_request *rq;
 	int ret = DRIVER_ERROR << 24;
 
-	req = blk_get_request(sdev->request_queue,
-			data_direction == DMA_TO_DEVICE ?
+	req = blk_get_request(q, data_direction == DMA_TO_DEVICE ?
 			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, BLK_MQ_REQ_PREEMPT);
 	if (IS_ERR(req))
 		return ret;
 	rq = scsi_req(req);
 
-	if (bufflen &&	blk_rq_map_kern(sdev->request_queue, req,
-					buffer, bufflen, GFP_NOIO))
+	if (bufflen && blk_rq_map_kern(q, req, buffer, bufflen, GFP_NOIO))
 		goto out;
 
 	rq->cmd_len = COMMAND_SIZE(cmd[0]);
@@ -268,7 +266,7 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	/*
 	 * head injection *required* here otherwise quiesce won't work
 	 */
-	blk_execute_rq(req->q, NULL, req, 1);
+	blk_execute_rq(q, NULL, req, 1);
 
 	/*
 	 * Some devices (USB mass-storage in particular) may transfer
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 264501d23aea..ef6e96e12c7c 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -437,7 +437,7 @@ extern const char *scsi_device_state_name(enum scsi_device_state);
 extern int scsi_is_sdev_device(const struct device *);
 extern int scsi_is_target_device(const struct device *);
 extern void scsi_sanitize_inquiry_string(unsigned char *s, int len);
-extern int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
+extern int __scsi_execute(struct request_queue *q, const unsigned char *cmd,
 			int data_direction, void *buffer, unsigned bufflen,
 			unsigned char *sense, struct scsi_sense_hdr *sshdr,
 			int timeout, int retries, u64 flags,
@@ -448,9 +448,9 @@ extern int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 ({									\
 	BUILD_BUG_ON((sense) != NULL &&					\
 		     sizeof(sense) != SCSI_SENSE_BUFFERSIZE);		\
-	__scsi_execute(sdev, cmd, data_direction, buffer, bufflen,	\
-		       sense, sshdr, timeout, retries, flags, rq_flags,	\
-		       resid);						\
+	__scsi_execute(sdev->request_queue, cmd, data_direction,	\
+		       buffer, bufflen, sense, sshdr, timeout, retries,	\
+		       flags, rq_flags, resid);				\
 })
 static inline int scsi_execute_req(struct scsi_device *sdev,
 	const unsigned char *cmd, int data_direction, void *buffer,

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 4/6] scsi_transport_spi: Make spi_execute() accept a request queue pointer
  2020-08-31  2:53 [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code Bart Van Assche
                   ` (2 preceding siblings ...)
  2020-08-31  2:53 ` [PATCH RFC 3/6] scsi: Pass a request queue pointer to __scsi_execute() Bart Van Assche
@ 2020-08-31  2:53 ` Bart Van Assche
  2020-08-31  2:53 ` [PATCH RFC 5/6] scsi_transport_spi: Freeze request queues instead of quiescing Bart Van Assche
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2020-08-31  2:53 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-scsi, Martin Kepplinger, Can Guo, Bart Van Assche

Passing a request queue pointer to spi_execute() instead of a SCSI device
pointer will allow a later patch to associate two request queues with a
SCSI device. Additionally, instead of assuming that the device state is
SDEV_QUIESCE before domain validation starts, read the device state. This
patch does not change any functionality but makes a later patch easier to
read.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_transport_spi.c | 69 ++++++++++++++++---------------
 1 file changed, 36 insertions(+), 33 deletions(-)

diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c
index f3d5b1bbd5aa..959990f66865 100644
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -104,7 +104,7 @@ static int sprint_frac(char *dest, int value, int denom)
 	return result;
 }
 
-static int spi_execute(struct scsi_device *sdev, const void *cmd,
+static int spi_execute(struct request_queue *q, const void *cmd,
 		       enum dma_data_direction dir,
 		       void *buffer, unsigned bufflen,
 		       struct scsi_sense_hdr *sshdr)
@@ -117,7 +117,7 @@ static int spi_execute(struct scsi_device *sdev, const void *cmd,
 		sshdr = &sshdr_tmp;
 
 	for(i = 0; i < DV_RETRIES; i++) {
-		result = scsi_execute(sdev, cmd, dir, buffer, bufflen, sense,
+		result = __scsi_execute(q, cmd, dir, buffer, bufflen, sense,
 				      sshdr, DV_TIMEOUT, /* retries */ 1,
 				      REQ_FAILFAST_DEV |
 				      REQ_FAILFAST_TRANSPORT |
@@ -620,13 +620,14 @@ enum spi_compare_returns {
 /* This is for read/write Domain Validation:  If the device supports
  * an echo buffer, we do read/write tests to it */
 static enum spi_compare_returns
-spi_dv_device_echo_buffer(struct scsi_device *sdev, u8 *buffer,
-			  u8 *ptr, const int retries)
+spi_dv_device_echo_buffer(struct scsi_device *sdev, struct request_queue *q,
+			  u8 *buffer, u8 *ptr, const int retries)
 {
 	int len = ptr - buffer;
 	int j, k, r, result;
 	unsigned int pattern = 0x0000ffff;
 	struct scsi_sense_hdr sshdr;
+	enum scsi_device_state sdev_state = sdev->sdev_state;
 
 	const char spi_write_buffer[] = {
 		WRITE_BUFFER, 0x0a, 0, 0, 0, 0, 0, len >> 8, len & 0xff, 0
@@ -671,11 +672,10 @@ spi_dv_device_echo_buffer(struct scsi_device *sdev, u8 *buffer,
 	}
 
 	for (r = 0; r < retries; r++) {
-		result = spi_execute(sdev, spi_write_buffer, DMA_TO_DEVICE,
+		result = spi_execute(q, spi_write_buffer, DMA_TO_DEVICE,
 				     buffer, len, &sshdr);
 		if(result || !scsi_device_online(sdev)) {
-
-			scsi_device_set_state(sdev, SDEV_QUIESCE);
+			scsi_device_set_state(sdev, sdev_state);
 			if (scsi_sense_valid(&sshdr)
 			    && sshdr.sense_key == ILLEGAL_REQUEST
 			    /* INVALID FIELD IN CDB */
@@ -693,9 +693,9 @@ spi_dv_device_echo_buffer(struct scsi_device *sdev, u8 *buffer,
 		}
 
 		memset(ptr, 0, len);
-		spi_execute(sdev, spi_read_buffer, DMA_FROM_DEVICE,
-			    ptr, len, NULL);
-		scsi_device_set_state(sdev, SDEV_QUIESCE);
+		spi_execute(q, spi_read_buffer, DMA_FROM_DEVICE, ptr, len,
+			    NULL);
+		scsi_device_set_state(sdev, sdev_state);
 
 		if (memcmp(buffer, ptr, len) != 0)
 			return SPI_COMPARE_FAILURE;
@@ -706,11 +706,12 @@ spi_dv_device_echo_buffer(struct scsi_device *sdev, u8 *buffer,
 /* This is for the simplest form of Domain Validation: a read test
  * on the inquiry data from the device */
 static enum spi_compare_returns
-spi_dv_device_compare_inquiry(struct scsi_device *sdev, u8 *buffer,
-			      u8 *ptr, const int retries)
+spi_dv_device_compare_inquiry(struct scsi_device *sdev, struct request_queue *q,
+			      u8 *buffer, u8 *ptr, const int retries)
 {
 	int r, result;
 	const int len = sdev->inquiry_len;
+	enum scsi_device_state sdev_state = sdev->sdev_state;
 	const char spi_inquiry[] = {
 		INQUIRY, 0, 0, 0, len, 0
 	};
@@ -718,11 +719,11 @@ spi_dv_device_compare_inquiry(struct scsi_device *sdev, u8 *buffer,
 	for (r = 0; r < retries; r++) {
 		memset(ptr, 0, len);
 
-		result = spi_execute(sdev, spi_inquiry, DMA_FROM_DEVICE,
-				     ptr, len, NULL);
+		result = spi_execute(q, spi_inquiry, DMA_FROM_DEVICE, ptr, len,
+				     NULL);
 		
 		if(result || !scsi_device_online(sdev)) {
-			scsi_device_set_state(sdev, SDEV_QUIESCE);
+			scsi_device_set_state(sdev, sdev_state);
 			return SPI_COMPARE_FAILURE;
 		}
 
@@ -742,9 +743,10 @@ spi_dv_device_compare_inquiry(struct scsi_device *sdev, u8 *buffer,
 }
 
 static enum spi_compare_returns
-spi_dv_retrain(struct scsi_device *sdev, u8 *buffer, u8 *ptr,
-	       enum spi_compare_returns 
-	       (*compare_fn)(struct scsi_device *, u8 *, u8 *, int))
+spi_dv_retrain(struct scsi_device *sdev, struct request_queue *q, u8 *buffer,
+	       u8 *ptr, enum spi_compare_returns
+	       (*compare_fn)(struct scsi_device *, struct request_queue *,
+			     u8 *, u8 *, int))
 {
 	struct spi_internal *i = to_spi_internal(sdev->host->transportt);
 	struct scsi_target *starget = sdev->sdev_target;
@@ -754,7 +756,7 @@ spi_dv_retrain(struct scsi_device *sdev, u8 *buffer, u8 *ptr,
 
 	for (;;) {
 		int newperiod;
-		retval = compare_fn(sdev, buffer, ptr, DV_LOOPS);
+		retval = compare_fn(sdev, q, buffer, ptr, DV_LOOPS);
 
 		if (retval == SPI_COMPARE_SUCCESS
 		    || retval == SPI_COMPARE_SKIP_TEST)
@@ -800,7 +802,8 @@ spi_dv_retrain(struct scsi_device *sdev, u8 *buffer, u8 *ptr,
 }
 
 static int
-spi_dv_device_get_echo_buffer(struct scsi_device *sdev, u8 *buffer)
+spi_dv_device_get_echo_buffer(struct scsi_device *sdev,
+			      struct request_queue *q, u8 *buffer)
 {
 	int l, result;
 
@@ -824,8 +827,8 @@ spi_dv_device_get_echo_buffer(struct scsi_device *sdev, u8 *buffer)
 	 * (reservation conflict, device not ready, etc) just
 	 * skip the write tests */
 	for (l = 0; ; l++) {
-		result = spi_execute(sdev, spi_test_unit_ready, DMA_NONE, 
-				     NULL, 0, NULL);
+		result = spi_execute(q, spi_test_unit_ready, DMA_NONE, NULL, 0,
+				     NULL);
 
 		if(result) {
 			if(l >= 3)
@@ -836,8 +839,8 @@ spi_dv_device_get_echo_buffer(struct scsi_device *sdev, u8 *buffer)
 		}
 	}
 
-	result = spi_execute(sdev, spi_read_buffer_descriptor, 
-			     DMA_FROM_DEVICE, buffer, 4, NULL);
+	result = spi_execute(q, spi_read_buffer_descriptor, DMA_FROM_DEVICE,
+			     buffer, 4, NULL);
 
 	if (result)
 		/* Device has no echo buffer */
@@ -847,7 +850,8 @@ spi_dv_device_get_echo_buffer(struct scsi_device *sdev, u8 *buffer)
 }
 
 static void
-spi_dv_device_internal(struct scsi_device *sdev, u8 *buffer)
+spi_dv_device_internal(struct scsi_device *sdev, struct request_queue *q,
+		       u8 *buffer)
 {
 	struct spi_internal *i = to_spi_internal(sdev->host->transportt);
 	struct scsi_target *starget = sdev->sdev_target;
@@ -859,7 +863,7 @@ spi_dv_device_internal(struct scsi_device *sdev, u8 *buffer)
 	DV_SET(offset, 0);
 	DV_SET(width, 0);
 
-	if (spi_dv_device_compare_inquiry(sdev, buffer, buffer, DV_LOOPS)
+	if (spi_dv_device_compare_inquiry(sdev, q, buffer, buffer, DV_LOOPS)
 	    != SPI_COMPARE_SUCCESS) {
 		starget_printk(KERN_ERR, starget, "Domain Validation Initial Inquiry Failed\n");
 		/* FIXME: should probably offline the device here? */
@@ -875,9 +879,8 @@ spi_dv_device_internal(struct scsi_device *sdev, u8 *buffer)
 	if (i->f->set_width && max_width) {
 		i->f->set_width(starget, 1);
 
-		if (spi_dv_device_compare_inquiry(sdev, buffer,
-						   buffer + len,
-						   DV_LOOPS)
+		if (spi_dv_device_compare_inquiry(sdev, q, buffer, buffer + len,
+						  DV_LOOPS)
 		    != SPI_COMPARE_SUCCESS) {
 			starget_printk(KERN_ERR, starget, "Wide Transfers Fail\n");
 			i->f->set_width(starget, 0);
@@ -946,7 +949,7 @@ spi_dv_device_internal(struct scsi_device *sdev, u8 *buffer)
 	DV_SET(width, max_width);
 
 	/* Do the read only INQUIRY tests */
-	spi_dv_retrain(sdev, buffer, buffer + sdev->inquiry_len,
+	spi_dv_retrain(sdev, q, buffer, buffer + sdev->inquiry_len,
 		       spi_dv_device_compare_inquiry);
 	/* See if we actually managed to negotiate and sustain DT */
 	if (i->f->get_dt)
@@ -958,7 +961,7 @@ spi_dv_device_internal(struct scsi_device *sdev, u8 *buffer)
 	 * negotiated DT */
 
 	if (len == -1 && spi_dt(starget))
-		len = spi_dv_device_get_echo_buffer(sdev, buffer);
+		len = spi_dv_device_get_echo_buffer(sdev, q, buffer);
 
 	if (len <= 0) {
 		starget_printk(KERN_INFO, starget, "Domain Validation skipping write tests\n");
@@ -970,7 +973,7 @@ spi_dv_device_internal(struct scsi_device *sdev, u8 *buffer)
 		len = SPI_MAX_ECHO_BUFFER_SIZE;
 	}
 
-	if (spi_dv_retrain(sdev, buffer, buffer + len,
+	if (spi_dv_retrain(sdev, q, buffer, buffer + len,
 			   spi_dv_device_echo_buffer)
 	    == SPI_COMPARE_SKIP_TEST) {
 		/* OK, the stupid drive can't do a write echo buffer
@@ -1030,7 +1033,7 @@ spi_dv_device(struct scsi_device *sdev)
 
 	starget_printk(KERN_INFO, starget, "Beginning Domain Validation\n");
 
-	spi_dv_device_internal(sdev, buffer);
+	spi_dv_device_internal(sdev, sdev->request_queue, buffer);
 
 	starget_printk(KERN_INFO, starget, "Ending Domain Validation\n");
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 5/6] scsi_transport_spi: Freeze request queues instead of quiescing
  2020-08-31  2:53 [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code Bart Van Assche
                   ` (3 preceding siblings ...)
  2020-08-31  2:53 ` [PATCH RFC 4/6] scsi_transport_spi: Make spi_execute() accept a request queue pointer Bart Van Assche
@ 2020-08-31  2:53 ` Bart Van Assche
  2020-09-03  1:39   ` Alan Stern
  2020-08-31  2:53 ` [PATCH RFC 6/6] block, scsi, ide: Only submit power management requests in state RPM_SUSPENDED Bart Van Assche
  2020-08-31  9:09 ` [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code Martin Kepplinger
  6 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2020-08-31  2:53 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-scsi, Martin Kepplinger, Can Guo, Bart Van Assche

Instead of quiescing the request queues involved in domain validation,
freeze these. As a result, the struct request_queue pm_only member is no
longer set during domain validation. That will allow to modify
scsi_execute() such that it stops setting the BLK_MQ_REQ_PREEMPT flag.
Three additional changes in this patch are that scsi_mq_alloc_queue() is
exported, that scsi_device_quiesce() is no longer exported and that
scsi_target_{quiesce,resume}() have been changed into
scsi_target_{freeze,unfreeze}().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_lib.c           | 22 ++++++------
 drivers/scsi/scsi_priv.h          |  2 ++
 drivers/scsi/scsi_transport_spi.c | 56 ++++++++++++++++++++-----------
 include/scsi/scsi_device.h        |  6 ++--
 4 files changed, 51 insertions(+), 35 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1d7135f61962..49eb8f2dffd8 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1866,6 +1866,7 @@ struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
 	blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
 	return sdev->request_queue;
 }
+EXPORT_SYMBOL_GPL(scsi_mq_alloc_queue);
 
 int scsi_mq_setup_tags(struct Scsi_Host *shost)
 {
@@ -2540,7 +2541,6 @@ scsi_device_quiesce(struct scsi_device *sdev)
 
 	return err;
 }
-EXPORT_SYMBOL(scsi_device_quiesce);
 
 /**
  *	scsi_device_resume - Restart user issued commands to a quiesced device.
@@ -2569,30 +2569,30 @@ void scsi_device_resume(struct scsi_device *sdev)
 EXPORT_SYMBOL(scsi_device_resume);
 
 static void
-device_quiesce_fn(struct scsi_device *sdev, void *data)
+device_freeze_fn(struct scsi_device *sdev, void *data)
 {
-	scsi_device_quiesce(sdev);
+	blk_mq_freeze_queue(sdev->request_queue);
 }
 
 void
-scsi_target_quiesce(struct scsi_target *starget)
+scsi_target_freeze(struct scsi_target *starget)
 {
-	starget_for_each_device(starget, NULL, device_quiesce_fn);
+	starget_for_each_device(starget, NULL, device_freeze_fn);
 }
-EXPORT_SYMBOL(scsi_target_quiesce);
+EXPORT_SYMBOL(scsi_target_freeze);
 
 static void
-device_resume_fn(struct scsi_device *sdev, void *data)
+device_unfreeze_fn(struct scsi_device *sdev, void *data)
 {
-	scsi_device_resume(sdev);
+	blk_mq_unfreeze_queue(sdev->request_queue);
 }
 
 void
-scsi_target_resume(struct scsi_target *starget)
+scsi_target_unfreeze(struct scsi_target *starget)
 {
-	starget_for_each_device(starget, NULL, device_resume_fn);
+	starget_for_each_device(starget, NULL, device_unfreeze_fn);
 }
-EXPORT_SYMBOL(scsi_target_resume);
+EXPORT_SYMBOL(scsi_target_unfreeze);
 
 /**
  * scsi_internal_device_block_nowait - try to transition to the SDEV_BLOCK state
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index d12ada035961..6b9203df84c8 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -95,6 +95,8 @@ extern int scsi_mq_setup_tags(struct Scsi_Host *shost);
 extern void scsi_mq_destroy_tags(struct Scsi_Host *shost);
 extern void scsi_exit_queue(void);
 extern void scsi_evt_thread(struct work_struct *work);
+extern int scsi_device_quiesce(struct scsi_device *sdev);
+extern void scsi_device_resume(struct scsi_device *sdev);
 struct request_queue;
 struct request;
 
diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c
index 959990f66865..63bec8980b27 100644
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -997,59 +997,75 @@ void
 spi_dv_device(struct scsi_device *sdev)
 {
 	struct scsi_target *starget = sdev->sdev_target;
+	struct request_queue *q1, *q2;
 	u8 *buffer;
 	const int len = SPI_MAX_ECHO_BUFFER_SIZE*2;
 
 	/*
-	 * Because this function and the power management code both call
-	 * scsi_device_quiesce(), it is not safe to perform domain validation
-	 * while suspend or resume is in progress. Hence the
-	 * lock/unlock_system_sleep() calls.
+	 * Because creates a new request queue that is not visible to the rest
+	 * of the system, domain validation must be serialized against suspend,
+	 * resume and runtime power management. Hence the
+	 * lock/unlock_system_sleep() and scsi_autopm_{get,put}_device() calls.
 	 */
 	lock_system_sleep();
 
+	if (scsi_autopm_get_device(sdev))
+		goto unlock_system_sleep;
+
 	if (unlikely(spi_dv_in_progress(starget)))
-		goto unlock;
+		goto put_autopm;
 
 	if (unlikely(scsi_device_get(sdev)))
-		goto unlock;
+		goto put_autopm;
 
 	spi_dv_in_progress(starget) = 1;
 
 	buffer = kzalloc(len, GFP_KERNEL);
 
 	if (unlikely(!buffer))
-		goto out_put;
-
-	/* We need to verify that the actual device will quiesce; the
-	 * later target quiesce is just a nice to have */
-	if (unlikely(scsi_device_quiesce(sdev)))
-		goto out_free;
-
-	scsi_target_quiesce(starget);
+		goto put_sdev;
 
 	spi_dv_pending(starget) = 1;
 	mutex_lock(&spi_dv_mutex(starget));
 
 	starget_printk(KERN_INFO, starget, "Beginning Domain Validation\n");
 
-	spi_dv_device_internal(sdev, sdev->request_queue, buffer);
+	/*
+	 * Save the request queue pointer before it is overwritten by
+	 * scsi_mq_alloc_queue().
+	 */
+	q1 = sdev->request_queue;
+	q2 = scsi_mq_alloc_queue(sdev);
+
+	if (q2) {
+		/*
+		 * Restore the request queue pointer such that no other
+		 * subsystem can submit SCSI commands to 'sdev'.
+		 */
+		sdev->request_queue = q1;
+		scsi_target_freeze(starget);
+		spi_dv_device_internal(sdev, q2, buffer);
+		blk_cleanup_queue(q2);
+		scsi_target_unfreeze(starget);
+	}
 
 	starget_printk(KERN_INFO, starget, "Ending Domain Validation\n");
 
 	mutex_unlock(&spi_dv_mutex(starget));
 	spi_dv_pending(starget) = 0;
 
-	scsi_target_resume(starget);
-
 	spi_initial_dv(starget) = 1;
 
- out_free:
 	kfree(buffer);
- out_put:
+
+put_sdev:
 	spi_dv_in_progress(starget) = 0;
 	scsi_device_put(sdev);
-unlock:
+
+put_autopm:
+	scsi_autopm_put_device(sdev);
+
+unlock_system_sleep:
 	unlock_system_sleep();
 }
 EXPORT_SYMBOL(spi_dv_device);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index ef6e96e12c7c..08f88ef04bc9 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -422,10 +422,8 @@ extern struct scsi_event *sdev_evt_alloc(enum scsi_device_event evt_type,
 extern void sdev_evt_send(struct scsi_device *sdev, struct scsi_event *evt);
 extern void sdev_evt_send_simple(struct scsi_device *sdev,
 			  enum scsi_device_event evt_type, gfp_t gfpflags);
-extern int scsi_device_quiesce(struct scsi_device *sdev);
-extern void scsi_device_resume(struct scsi_device *sdev);
-extern void scsi_target_quiesce(struct scsi_target *);
-extern void scsi_target_resume(struct scsi_target *);
+extern void scsi_target_freeze(struct scsi_target *);
+extern void scsi_target_unfreeze(struct scsi_target *);
 extern void scsi_scan_target(struct device *parent, unsigned int channel,
 			     unsigned int id, u64 lun,
 			     enum scsi_scan_mode rescan);

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 6/6] block, scsi, ide: Only submit power management requests in state RPM_SUSPENDED
  2020-08-31  2:53 [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code Bart Van Assche
                   ` (4 preceding siblings ...)
  2020-08-31  2:53 ` [PATCH RFC 5/6] scsi_transport_spi: Freeze request queues instead of quiescing Bart Van Assche
@ 2020-08-31  2:53 ` Bart Van Assche
  2020-08-31 18:25   ` Alan Stern
  2020-08-31  9:09 ` [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code Martin Kepplinger
  6 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2020-08-31  2:53 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-scsi, Martin Kepplinger, Can Guo, Bart Van Assche

Recently Martin Kepplinger reported a problem with the SCSI runtime PM
code. Alan Stern root-caused that deadlock as follows:

	Thread 0		Thread 1
	--------		--------
	Start runtime suspend
	blk_pre_runtime_suspend calls
	  blk_set_pm_only and sets
	  q->rpm_status to RPM_SUSPENDING

				Call sd_open -> ... -> scsi_test_unit_ready
				  -> __scsi_execute -> ...
				  -> blk_queue_enter
				Sees BLK_MQ_REQ_PREEMPT set and
				  RPM_SUSPENDING queue status, so does
				  not postpone the request

	blk_post_runtime_suspend sets
	  q->rpm_status to RPM_SUSPENDED
	The drive goes into runtime suspend

				Issues the TEST UNIT READY request
				Request fails because the drive is suspended

Fix that deadlock by only accepting power management requests while
suspended. Remove flag RQF_PREEMPT.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Martin Kepplinger <martin.kepplinger@puri.sm>
Cc: Can Guo <cang@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-core.c        |  6 +++---
 block/blk-mq-debugfs.c  |  1 -
 block/blk-mq.c          |  4 ++--
 drivers/ide/ide-io.c    |  3 +--
 drivers/ide/ide-pm.c    |  2 +-
 drivers/scsi/scsi_lib.c | 27 ++++++++++++++-------------
 include/linux/blk-mq.h  |  4 ++--
 include/linux/blkdev.h  |  6 +-----
 8 files changed, 24 insertions(+), 29 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d9d632639bd1..78a002ff9473 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -420,11 +420,11 @@ EXPORT_SYMBOL(blk_cleanup_queue);
 /**
  * blk_queue_enter() - try to increase q->q_usage_counter
  * @q: request queue pointer
- * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
+ * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PM
  */
 int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 {
-	const bool pm = flags & BLK_MQ_REQ_PREEMPT;
+	const bool pm = flags & BLK_MQ_REQ_PM;
 
 	while (true) {
 		bool success = false;
@@ -626,7 +626,7 @@ struct request *blk_get_request(struct request_queue *q, unsigned int op,
 	struct request *req;
 
 	WARN_ON_ONCE(op & REQ_NOWAIT);
-	WARN_ON_ONCE(flags & ~(BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PREEMPT));
+	WARN_ON_ONCE(flags & ~(BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PM));
 
 	req = blk_mq_alloc_request(q, op, flags);
 	if (!IS_ERR(req) && q->mq_ops->initialize_rq_fn)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 3f09bcb8a6fd..9273978f06e7 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -296,7 +296,6 @@ static const char *const rqf_name[] = {
 	RQF_NAME(MIXED_MERGE),
 	RQF_NAME(MQ_INFLIGHT),
 	RQF_NAME(DONTPREP),
-	RQF_NAME(PREEMPT),
 	RQF_NAME(FAILED),
 	RQF_NAME(QUIET),
 	RQF_NAME(ELVPRIV),
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0015a1892153..4e7589e55bf5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -292,8 +292,8 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	rq->mq_hctx = data->hctx;
 	rq->rq_flags = 0;
 	rq->cmd_flags = data->cmd_flags;
-	if (data->flags & BLK_MQ_REQ_PREEMPT)
-		rq->rq_flags |= RQF_PREEMPT;
+	if (data->flags & BLK_MQ_REQ_PM)
+		rq->rq_flags |= RQF_PM;
 	if (blk_queue_io_stat(data->q))
 		rq->rq_flags |= RQF_IO_STAT;
 	INIT_LIST_HEAD(&rq->queuelist);
diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index 1a53c7a75224..beb850679fa9 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -522,8 +522,7 @@ blk_status_t ide_issue_rq(ide_drive_t *drive, struct request *rq,
 		 * state machine.
 		 */
 		if ((drive->dev_flags & IDE_DFLAG_BLOCKED) &&
-		    ata_pm_request(rq) == 0 &&
-		    (rq->rq_flags & RQF_PREEMPT) == 0) {
+		    ata_pm_request(rq) == 0) {
 			/* there should be no pending command at this point */
 			ide_unlock_port(hwif);
 			goto plug_device;
diff --git a/drivers/ide/ide-pm.c b/drivers/ide/ide-pm.c
index 192e6c65d34e..82ab308f1aaf 100644
--- a/drivers/ide/ide-pm.c
+++ b/drivers/ide/ide-pm.c
@@ -77,7 +77,7 @@ int generic_ide_resume(struct device *dev)
 	}
 
 	memset(&rqpm, 0, sizeof(rqpm));
-	rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, BLK_MQ_REQ_PREEMPT);
+	rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, BLK_MQ_REQ_PM);
 	ide_req(rq)->type = ATA_PRIV_PM_RESUME;
 	ide_req(rq)->special = &rqpm;
 	rqpm.pm_step = IDE_PM_START_RESUME;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 49eb8f2dffd8..bd9930964697 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -248,7 +248,8 @@ int __scsi_execute(struct request_queue *q, const unsigned char *cmd,
 	int ret = DRIVER_ERROR << 24;
 
 	req = blk_get_request(q, data_direction == DMA_TO_DEVICE ?
-			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, BLK_MQ_REQ_PREEMPT);
+			      REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN,
+			      rq_flags & RQF_PM ? BLK_MQ_REQ_PM : 0);
 	if (IS_ERR(req))
 		return ret;
 	rq = scsi_req(req);
@@ -1219,6 +1220,8 @@ static blk_status_t
 scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 {
 	switch (sdev->sdev_state) {
+	case SDEV_CREATED:
+		return BLK_STS_OK;
 	case SDEV_OFFLINE:
 	case SDEV_TRANSPORT_OFFLINE:
 		/*
@@ -1245,18 +1248,18 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 		return BLK_STS_RESOURCE;
 	case SDEV_QUIESCE:
 		/*
-		 * If the devices is blocked we defer normal commands.
+		 * If the device is blocked we only accept power management
+		 * commands.
 		 */
-		if (req && !(req->rq_flags & RQF_PREEMPT))
+		if (req && WARN_ON_ONCE(!(req->rq_flags & RQF_PM)))
 			return BLK_STS_RESOURCE;
 		return BLK_STS_OK;
 	default:
 		/*
 		 * For any other not fully online state we only allow
-		 * special commands.  In particular any user initiated
-		 * command is not allowed.
+		 * power management commands.
 		 */
-		if (req && !(req->rq_flags & RQF_PREEMPT))
+		if (req && !(req->rq_flags & RQF_PM))
 			return BLK_STS_IOERR;
 		return BLK_STS_OK;
 	}
@@ -2489,15 +2492,13 @@ void sdev_evt_send_simple(struct scsi_device *sdev,
 EXPORT_SYMBOL_GPL(sdev_evt_send_simple);
 
 /**
- *	scsi_device_quiesce - Block user issued commands.
+ *	scsi_device_quiesce - Block all commands except power management.
  *	@sdev:	scsi device to quiesce.
  *
  *	This works by trying to transition to the SDEV_QUIESCE state
  *	(which must be a legal transition).  When the device is in this
- *	state, only special requests will be accepted, all others will
- *	be deferred.  Since special requests may also be requeued requests,
- *	a successful return doesn't guarantee the device will be
- *	totally quiescent.
+ *	state, only power management requests will be accepted, all others will
+ *	be deferred.
  *
  *	Must be called with user context, may sleep.
  *
@@ -2558,12 +2559,12 @@ void scsi_device_resume(struct scsi_device *sdev)
 	 * device deleted during suspend)
 	 */
 	mutex_lock(&sdev->state_mutex);
+	if (sdev->sdev_state == SDEV_QUIESCE)
+		scsi_device_set_state(sdev, SDEV_RUNNING);
 	if (sdev->quiesced_by) {
 		sdev->quiesced_by = NULL;
 		blk_clear_pm_only(sdev->request_queue);
 	}
-	if (sdev->sdev_state == SDEV_QUIESCE)
-		scsi_device_set_state(sdev, SDEV_RUNNING);
 	mutex_unlock(&sdev->state_mutex);
 }
 EXPORT_SYMBOL(scsi_device_resume);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 9d2d5ad367a4..864f3b512d83 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -433,8 +433,8 @@ enum {
 	BLK_MQ_REQ_NOWAIT	= (__force blk_mq_req_flags_t)(1 << 0),
 	/* allocate from reserved pool */
 	BLK_MQ_REQ_RESERVED	= (__force blk_mq_req_flags_t)(1 << 1),
-	/* set RQF_PREEMPT */
-	BLK_MQ_REQ_PREEMPT	= (__force blk_mq_req_flags_t)(1 << 3),
+	/* set RQF_PM */
+	BLK_MQ_REQ_PM		= (__force blk_mq_req_flags_t)(1 << 3),
 };
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bb5636cc17b9..909cb8b1d4ee 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -77,9 +77,6 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_MQ_INFLIGHT		((__force req_flags_t)(1 << 6))
 /* don't call prep for this one */
 #define RQF_DONTPREP		((__force req_flags_t)(1 << 7))
-/* set for "ide_preempt" requests and also for requests for which the SCSI
-   "quiesce" state must be ignored. */
-#define RQF_PREEMPT		((__force req_flags_t)(1 << 8))
 /* vaguely specified driver internal error.  Ignored by the block layer */
 #define RQF_FAILED		((__force req_flags_t)(1 << 10))
 /* don't warn about errors */
@@ -424,8 +421,7 @@ struct request_queue {
 	unsigned long		queue_flags;
 	/*
 	 * Number of contexts that have called blk_set_pm_only(). If this
-	 * counter is above zero then only RQF_PM and RQF_PREEMPT requests are
-	 * processed.
+	 * counter is above zero then only RQF_PM requests are processed.
 	 */
 	atomic_t		pm_only;
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code
  2020-08-31  2:53 [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code Bart Van Assche
                   ` (5 preceding siblings ...)
  2020-08-31  2:53 ` [PATCH RFC 6/6] block, scsi, ide: Only submit power management requests in state RPM_SUSPENDED Bart Van Assche
@ 2020-08-31  9:09 ` Martin Kepplinger
  2020-09-01  3:55   ` Bart Van Assche
  6 siblings, 1 reply; 13+ messages in thread
From: Martin Kepplinger @ 2020-08-31  9:09 UTC (permalink / raw)
  To: Bart Van Assche, Alan Stern; +Cc: linux-scsi, Can Guo

On 31.08.20 04:53, Bart Van Assche wrote:
> Recently Martin Kepplinger reported a problem with the SCSI runtime PM
> code. Alan Stern root-caused the reported deadlock. This patch series is
> an attempt to fix that deadlock. These patches compile but have not yet
> been tested.
> 
> Bart Van Assche (6):
>   ide: Do not set the RQF_PREEMPT flag for sense requests
>   scsi: Remove an incorrect comment
>   scsi: Pass a request queue pointer to __scsi_execute()
>   scsi_transport_spi: Make spi_execute() accept a request queue pointer
>   scsi_transport_spi: Freeze request queues instead of quiescing
>   block, scsi, ide: Only submit power management requests in state
>     RPM_SUSPENDED
> 

this patchset works for me (as an alternative to Alan's initial fix:
https://lore.kernel.org/linux-scsi/20200623111018.31954-1-martin.kepplinger@puri.sm/T/#ma566fe2e39cb3fcccdd245564913d17a343e1d1a
)

On top of this I have to apply scsi "fixes" of course and now run
exactly this:
https://lore.kernel.org/linux-scsi/20200824190400.12339-1-martin.kepplinger@puri.sm/T/#u

Is this fully applicable still now? Again, the cardreader works and
suspends as expected. So thanks a lot for working on this!

Sure someone who has experience with block and scsi code should review
this, but definitely:

Tested-by: Martin Kepplinger <martin.kepplinger@puri.sm>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 6/6] block, scsi, ide: Only submit power management requests in state RPM_SUSPENDED
  2020-08-31  2:53 ` [PATCH RFC 6/6] block, scsi, ide: Only submit power management requests in state RPM_SUSPENDED Bart Van Assche
@ 2020-08-31 18:25   ` Alan Stern
  2020-09-01  5:00     ` Bart Van Assche
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Stern @ 2020-08-31 18:25 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-scsi, Martin Kepplinger, Can Guo

On Sun, Aug 30, 2020 at 07:53:57PM -0700, Bart Van Assche wrote:
> Recently Martin Kepplinger reported a problem with the SCSI runtime PM
> code. Alan Stern root-caused that deadlock as follows:
> 
> 	Thread 0		Thread 1
> 	--------		--------
> 	Start runtime suspend
> 	blk_pre_runtime_suspend calls
> 	  blk_set_pm_only and sets
> 	  q->rpm_status to RPM_SUSPENDING
> 
> 				Call sd_open -> ... -> scsi_test_unit_ready
> 				  -> __scsi_execute -> ...
> 				  -> blk_queue_enter
> 				Sees BLK_MQ_REQ_PREEMPT set and
> 				  RPM_SUSPENDING queue status, so does
> 				  not postpone the request
> 
> 	blk_post_runtime_suspend sets
> 	  q->rpm_status to RPM_SUSPENDED
> 	The drive goes into runtime suspend
> 
> 				Issues the TEST UNIT READY request
> 				Request fails because the drive is suspended
> 
> Fix that deadlock by only accepting power management requests while
> suspended. Remove flag RQF_PREEMPT.

Let me clarify this description.

Firstly, the second-to-last sentence is ambiguous.  The word "only" is
all too easy to misuse through carelessness.  In this case you meant
to say "by accepting only power management requests while suspended",
but what you actually wrote was equivalent to "by accepting power
management requests only while suspended".  And as it happens, both
meanings are incorrect because we don't want to accept _any_ requests
while the device is suspended -- not even power-management requests.
The description should have said "by postponing all non-power-management 
requests while the device is suspending, suspended, or resuming."

Secondly, the scenario described above is not a deadlock; it is a race 
leading to a command failure.  Namely, the thread setting q->rpm_status 
to RPM_SUSPENDED races with the thread testing q->rpm_status.

Thirdly, this race is _not_ the problem that Martin encountered.  His 
problem was a simple bug (failure to postpone a request), not a race or 
a deadlock.

Fourthly, the race illustrated above is, for now, theoretical.  It 
cannot occur with the existing code base (mostly because the existing 
code is buggy).  The advantage of your series over the patch I submitted 
on Aug. 23 ("block: Fix bug in runtime-resume handling") is that it 
fixes Martin's problem without introducing this new race.

Alan Stern

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code
  2020-08-31  9:09 ` [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code Martin Kepplinger
@ 2020-09-01  3:55   ` Bart Van Assche
  0 siblings, 0 replies; 13+ messages in thread
From: Bart Van Assche @ 2020-09-01  3:55 UTC (permalink / raw)
  To: Martin Kepplinger, Alan Stern; +Cc: linux-scsi, Can Guo

On 2020-08-31 02:09, Martin Kepplinger wrote:
> Tested-by: Martin Kepplinger <martin.kepplinger@puri.sm>

Thanks for having tested this patch series!

Bart.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 6/6] block, scsi, ide: Only submit power management requests in state RPM_SUSPENDED
  2020-08-31 18:25   ` Alan Stern
@ 2020-09-01  5:00     ` Bart Van Assche
  2020-09-01 14:50       ` Alan Stern
  0 siblings, 1 reply; 13+ messages in thread
From: Bart Van Assche @ 2020-09-01  5:00 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-scsi, Martin Kepplinger, Can Guo

On 2020-08-31 11:25, Alan Stern wrote:
> Let me clarify this description.
> 
> Firstly, the second-to-last sentence is ambiguous.  The word "only" is
> all too easy to misuse through carelessness.  In this case you meant
> to say "by accepting only power management requests while suspended",
> but what you actually wrote was equivalent to "by accepting power
> management requests only while suspended".  And as it happens, both
> meanings are incorrect because we don't want to accept _any_ requests
> while the device is suspended -- not even power-management requests.
> The description should have said "by postponing all non-power-management 
> requests while the device is suspending, suspended, or resuming."
> 
> Secondly, the scenario described above is not a deadlock; it is a race 
> leading to a command failure.  Namely, the thread setting q->rpm_status 
> to RPM_SUSPENDED races with the thread testing q->rpm_status.
> 
> Thirdly, this race is _not_ the problem that Martin encountered.  His 
> problem was a simple bug (failure to postpone a request), not a race or 
> a deadlock.
> 
> Fourthly, the race illustrated above is, for now, theoretical.  It 
> cannot occur with the existing code base (mostly because the existing 
> code is buggy).  The advantage of your series over the patch I submitted 
> on Aug. 23 ("block: Fix bug in runtime-resume handling") is that it 
> fixes Martin's problem without introducing this new race.

Hi Alan,

Thanks for having taken a look at this patch. Do you perhaps plan to
review the other patches in this series too?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 6/6] block, scsi, ide: Only submit power management requests in state RPM_SUSPENDED
  2020-09-01  5:00     ` Bart Van Assche
@ 2020-09-01 14:50       ` Alan Stern
  0 siblings, 0 replies; 13+ messages in thread
From: Alan Stern @ 2020-09-01 14:50 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-scsi, Martin Kepplinger, Can Guo

On Mon, Aug 31, 2020 at 10:00:31PM -0700, Bart Van Assche wrote:
> Hi Alan,
> 
> Thanks for having taken a look at this patch. Do you perhaps plan to
> review the other patches in this series too?

I'll look over them, but I don't know enough about the block and IDE 
subsystems to do a competent review.

Alan Stern

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 5/6] scsi_transport_spi: Freeze request queues instead of quiescing
  2020-08-31  2:53 ` [PATCH RFC 5/6] scsi_transport_spi: Freeze request queues instead of quiescing Bart Van Assche
@ 2020-09-03  1:39   ` Alan Stern
  0 siblings, 0 replies; 13+ messages in thread
From: Alan Stern @ 2020-09-03  1:39 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-scsi, Martin Kepplinger, Can Guo

On Sun, Aug 30, 2020 at 07:53:56PM -0700, Bart Van Assche wrote:
> Instead of quiescing the request queues involved in domain validation,
> freeze these. As a result, the struct request_queue pm_only member is no
> longer set during domain validation. That will allow to modify
> scsi_execute() such that it stops setting the BLK_MQ_REQ_PREEMPT flag.
> Three additional changes in this patch are that scsi_mq_alloc_queue() is
> exported, that scsi_device_quiesce() is no longer exported and that
> scsi_target_{quiesce,resume}() have been changed into
> scsi_target_{freeze,unfreeze}().
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---

> --- a/drivers/scsi/scsi_transport_spi.c
> +++ b/drivers/scsi/scsi_transport_spi.c
> @@ -997,59 +997,75 @@ void
>  spi_dv_device(struct scsi_device *sdev)
>  {
>  	struct scsi_target *starget = sdev->sdev_target;
> +	struct request_queue *q1, *q2;
>  	u8 *buffer;
>  	const int len = SPI_MAX_ECHO_BUFFER_SIZE*2;
>  
>  	/*
> -	 * Because this function and the power management code both call
> -	 * scsi_device_quiesce(), it is not safe to perform domain validation
> -	 * while suspend or resume is in progress. Hence the
> -	 * lock/unlock_system_sleep() calls.
> +	 * Because creates a new request queue that is not visible to the rest
> +	 * of the system, domain validation must be serialized against suspend,
> +	 * resume and runtime power management. Hence the
> +	 * lock/unlock_system_sleep() and scsi_autopm_{get,put}_device() calls.
>  	 */
>  	lock_system_sleep();
>  
> +	if (scsi_autopm_get_device(sdev))
> +		goto unlock_system_sleep;
> +
>  	if (unlikely(spi_dv_in_progress(starget)))
> -		goto unlock;
> +		goto put_autopm;
>  
>  	if (unlikely(scsi_device_get(sdev)))
> -		goto unlock;
> +		goto put_autopm;
>  
>  	spi_dv_in_progress(starget) = 1;
>  
>  	buffer = kzalloc(len, GFP_KERNEL);
>  
>  	if (unlikely(!buffer))
> -		goto out_put;
> -
> -	/* We need to verify that the actual device will quiesce; the
> -	 * later target quiesce is just a nice to have */
> -	if (unlikely(scsi_device_quiesce(sdev)))
> -		goto out_free;
> -
> -	scsi_target_quiesce(starget);
> +		goto put_sdev;
>  
>  	spi_dv_pending(starget) = 1;

I'm not familiar with this code.  What happens if the test and 
assignment of spi_dv_in_progress() race between two threads?

It looks like the first to acquire the spi_dv_mutex will do a domain 
validation, then clear the spi_dv_in_progress and spi_dv_pending flags.
Then the second thread will perform another domain validation with 
those flags set to 0.  Is it supposed to work like that?

>  	mutex_lock(&spi_dv_mutex(starget));
>  
>  	starget_printk(KERN_INFO, starget, "Beginning Domain Validation\n");
>  
> -	spi_dv_device_internal(sdev, sdev->request_queue, buffer);
> +	/*
> +	 * Save the request queue pointer before it is overwritten by
> +	 * scsi_mq_alloc_queue().
> +	 */
> +	q1 = sdev->request_queue;
> +	q2 = scsi_mq_alloc_queue(sdev);
> +
> +	if (q2) {
> +		/*
> +		 * Restore the request queue pointer such that no other
> +		 * subsystem can submit SCSI commands to 'sdev'.
> +		 */

Too bad there's a little window here during which external commands can 
get sent to q2.

> +		sdev->request_queue = q1;
> +		scsi_target_freeze(starget);
> +		spi_dv_device_internal(sdev, q2, buffer);
> +		blk_cleanup_queue(q2);
> +		scsi_target_unfreeze(starget);
> +	}

No error message if the domain validation couldn't be performed?

Also, do you need to restore sdev->request_queue if the allocation 
failed?  It would be a little safer to move the restoration line 
immediately after the scsi_mq_alloc_queue call.

Ideally scsi_mq_alloc_queue would return q2 without assigning it to 
sdev->request_queue.

>  
>  	starget_printk(KERN_INFO, starget, "Ending Domain Validation\n");
>  
>  	mutex_unlock(&spi_dv_mutex(starget));
>  	spi_dv_pending(starget) = 0;
>  
> -	scsi_target_resume(starget);
> -
>  	spi_initial_dv(starget) = 1;
>  
> - out_free:
>  	kfree(buffer);
> - out_put:
> +
> +put_sdev:
>  	spi_dv_in_progress(starget) = 0;
>  	scsi_device_put(sdev);
> -unlock:
> +
> +put_autopm:
> +	scsi_autopm_put_device(sdev);
> +
> +unlock_system_sleep:
>  	unlock_system_sleep();
>  }
>  EXPORT_SYMBOL(spi_dv_device);

Alan Stern

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-09-03  1:39 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-31  2:53 [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code Bart Van Assche
2020-08-31  2:53 ` [PATCH RFC 1/6] ide: Do not set the RQF_PREEMPT flag for sense requests Bart Van Assche
2020-08-31  2:53 ` [PATCH RFC 2/6] scsi: Remove an incorrect comment Bart Van Assche
2020-08-31  2:53 ` [PATCH RFC 3/6] scsi: Pass a request queue pointer to __scsi_execute() Bart Van Assche
2020-08-31  2:53 ` [PATCH RFC 4/6] scsi_transport_spi: Make spi_execute() accept a request queue pointer Bart Van Assche
2020-08-31  2:53 ` [PATCH RFC 5/6] scsi_transport_spi: Freeze request queues instead of quiescing Bart Van Assche
2020-09-03  1:39   ` Alan Stern
2020-08-31  2:53 ` [PATCH RFC 6/6] block, scsi, ide: Only submit power management requests in state RPM_SUSPENDED Bart Van Assche
2020-08-31 18:25   ` Alan Stern
2020-09-01  5:00     ` Bart Van Assche
2020-09-01 14:50       ` Alan Stern
2020-08-31  9:09 ` [PATCH RFC 0/6] Fix a deadlock in the SCSI power management code Martin Kepplinger
2020-09-01  3:55   ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.