linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
@ 2016-12-29  4:30 Jason Baron
  2016-12-29  8:02 ` Christoph Hellwig
  2016-12-29 16:16 ` David Miller
  0 siblings, 2 replies; 22+ messages in thread
From: Jason Baron @ 2016-12-29  4:30 UTC (permalink / raw)
  To: linux-scsi
  Cc: linux-kernel, Sathya Prakash, Chaitra P B,
	Suganath Prabu Subramani, Sreekanth Reddy, Hannes Reinecke,
	Martin K. Petersen, Bart Van Assche, Sagi Grimberg,
	James Bottomley, Christoph Hellwig, Doug Ledford, David Miller

On ata passthru commands scsih_qcmd() ends up spinning in
scsi_wait_for_queuecommand() indefinitely. scsih_qcmd() is called from
__blk_run_queue_uncond() which first increments request_fn_active to a
non-zero value. Thus, scsi_wait_for_queuecommand() never completes because
its spinning waiting for request_fn_active to become 0.

Two patches interact here. The first:

commit 18f6084a989b ("scsi: mpt3sas: Fix secure erase premature
termination") calls scsi_internal_device_block() for ata passthru commands.

The second patch:

commit 669f044170d8 ("scsi: srp_transport: Move queuecommand() wait code
to SCSI core") adds a call to scsi_wait_for_queuecommand() from
scsi_internal_device_block().

Add a new parameter to scsi_internal_device_block() to decide whether
or not to invoke scsi_wait_for_queuecommand().

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Sathya Prakash <sathya.prakash@broadcom.com>
Cc: Chaitra P B <chaitra.basappa@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Doug Ledford <dledford@redhat.com>
Cc: David Miller <davem@davemloft.net>
---
 drivers/scsi/mpt3sas/mpt3sas_base.h  |  2 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c |  6 +++---
 drivers/scsi/scsi_lib.c              | 11 +++++++----
 drivers/scsi/scsi_priv.h             |  2 +-
 4 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 394fe13..5da3427 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -1431,7 +1431,7 @@ void mpt3sas_transport_update_links(struct MPT3SAS_ADAPTER *ioc,
 	u64 sas_address, u16 handle, u8 phy_number, u8 link_rate);
 extern struct sas_function_template mpt3sas_transport_functions;
 extern struct scsi_transport_template *mpt3sas_transport_template;
-extern int scsi_internal_device_block(struct scsi_device *sdev);
+extern int scsi_internal_device_block(struct scsi_device *sdev, bool flush);
 extern int scsi_internal_device_unblock(struct scsi_device *sdev,
 				enum scsi_device_state new_state);
 /* trigger data externs */
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index b5c966e..509ef8a 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -2839,7 +2839,7 @@ _scsih_internal_device_block(struct scsi_device *sdev,
 	    sas_device_priv_data->sas_target->handle);
 	sas_device_priv_data->block = 1;
 
-	r = scsi_internal_device_block(sdev);
+	r = scsi_internal_device_block(sdev, true);
 	if (r == -EINVAL)
 		sdev_printk(KERN_WARNING, sdev,
 		    "device_block failed with return(%d) for handle(0x%04x)\n",
@@ -2875,7 +2875,7 @@ _scsih_internal_device_unblock(struct scsi_device *sdev,
 		    "performing a block followed by an unblock\n",
 		    r, sas_device_priv_data->sas_target->handle);
 		sas_device_priv_data->block = 1;
-		r = scsi_internal_device_block(sdev);
+		r = scsi_internal_device_block(sdev, true);
 		if (r)
 			sdev_printk(KERN_WARNING, sdev, "retried device_block "
 			    "failed with return(%d) for handle(0x%04x)\n",
@@ -4068,7 +4068,7 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 	 * done.
 	 */
 	if (ata_12_16_cmd(scmd))
-		scsi_internal_device_block(scmd->device);
+		scsi_internal_device_block(scmd->device, false);
 
 	sas_device_priv_data = scmd->device->hostdata;
 	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c35b6de..2ee2db9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2856,9 +2856,11 @@ EXPORT_SYMBOL(scsi_target_resume);
 /**
  * scsi_internal_device_block - internal function to put a device temporarily into the SDEV_BLOCK state
  * @sdev:	device to block
+ * @flush:	wait for oustanding queuecommand calls to finish
  *
  * Block request made by scsi lld's to temporarily stop all
- * scsi commands on the specified device. May sleep.
+ * scsi commands on the specified device. May sleep if
+ * flush is set
  *
  * Returns zero if successful or error if not
  *
@@ -2873,7 +2875,7 @@ EXPORT_SYMBOL(scsi_target_resume);
  * remove the rport mutex lock and unlock calls from srp_queuecommand().
  */
 int
-scsi_internal_device_block(struct scsi_device *sdev)
+scsi_internal_device_block(struct scsi_device *sdev, bool flush)
 {
 	struct request_queue *q = sdev->request_queue;
 	unsigned long flags;
@@ -2898,7 +2900,8 @@ scsi_internal_device_block(struct scsi_device *sdev)
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_stop_queue(q);
 		spin_unlock_irqrestore(q->queue_lock, flags);
-		scsi_wait_for_queuecommand(sdev);
+		if (flush)
+			scsi_wait_for_queuecommand(sdev);
 	}
 
 	return 0;
@@ -2960,7 +2963,7 @@ EXPORT_SYMBOL_GPL(scsi_internal_device_unblock);
 static void
 device_block(struct scsi_device *sdev, void *data)
 {
-	scsi_internal_device_block(sdev);
+	scsi_internal_device_block(sdev, true);
 }
 
 static int
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 193636a..c0f79b8 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -189,7 +189,7 @@ static inline void scsi_dh_remove_device(struct scsi_device *sdev) { }
  */
 
 #define SCSI_DEVICE_BLOCK_MAX_TIMEOUT	600	/* units in seconds */
-extern int scsi_internal_device_block(struct scsi_device *sdev);
+extern int scsi_internal_device_block(struct scsi_device *sdev, bool flush);
 extern int scsi_internal_device_unblock(struct scsi_device *sdev,
 					enum scsi_device_state new_state);
 
-- 
2.6.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2016-12-29  4:30 [PATCH] scsi: mpt3sas: fix hang on ata passthru commands Jason Baron
@ 2016-12-29  8:02 ` Christoph Hellwig
  2016-12-29 16:02   ` Jason Baron
  2016-12-31 23:19   ` James Bottomley
  2016-12-29 16:16 ` David Miller
  1 sibling, 2 replies; 22+ messages in thread
From: Christoph Hellwig @ 2016-12-29  8:02 UTC (permalink / raw)
  To: Jason Baron
  Cc: linux-scsi, linux-kernel, Sathya Prakash, Chaitra P B,
	Suganath Prabu Subramani, Sreekanth Reddy, Hannes Reinecke,
	Martin K. Petersen, Bart Van Assche, Sagi Grimberg,
	James Bottomley, Christoph Hellwig, Doug Ledford, David Miller

On Wed, Dec 28, 2016 at 11:30:24PM -0500, Jason Baron wrote:
> Add a new parameter to scsi_internal_device_block() to decide whether
> or not to invoke scsi_wait_for_queuecommand().

We'll also need to deal with the blk-mq wait path that Bart has been
working on (I think it's already in the scsi tree, but I'd have to
check).

Also adding a bool flag for the last call in a function is style that's
a little annoying.

I'd prefer to add a scsi_internal_device_block_nowait that contains
all the code except for the waiting, and then make
scsi_internal_device_block_nowait a wrapper around it.  Or drop the
annoying internal for both while we're at it :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2016-12-29  8:02 ` Christoph Hellwig
@ 2016-12-29 16:02   ` Jason Baron
  2016-12-31 23:19   ` James Bottomley
  1 sibling, 0 replies; 22+ messages in thread
From: Jason Baron @ 2016-12-29 16:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-scsi, linux-kernel, Sathya Prakash, Chaitra P B,
	Suganath Prabu Subramani, Sreekanth Reddy, Hannes Reinecke,
	Martin K. Petersen, Bart Van Assche, Sagi Grimberg,
	James Bottomley, Christoph Hellwig, Doug Ledford, David Miller

On 12/29/2016 03:02 AM, Christoph Hellwig wrote:

> On Wed, Dec 28, 2016 at 11:30:24PM -0500, Jason Baron wrote:
>> Add a new parameter to scsi_internal_device_block() to decide whether
>> or not to invoke scsi_wait_for_queuecommand().
> We'll also need to deal with the blk-mq wait path that Bart has been
> working on (I think it's already in the scsi tree, but I'd have to
> check).
Ok, I'm not sure either.

> Also adding a bool flag for the last call in a function is style that's
> a little annoying.
>
> I'd prefer to add a scsi_internal_device_block_nowait that contains
> all the code except for the waiting, and then make
> scsi_internal_device_block_nowait a wrapper around it.  Or drop the
> annoying internal for both while we're at it :)

The proposed patch brings the code in-line with what is in 4.8 stable
where scsi_internal_device_block() does not call
scsi_wait_for_queuecommand(). So I saw it as a minimal fix to make
my system boot again :)

I was wondering if the original fix is racy in that there could be multiple
threads in the queuecommand. Perhaps we should do something like:
       
    if (ata_12_16_cmd(scmd))
{                                                                                                                                   

                if (!test_and_set_bit(MPT_DEVICE_EXCLUSIVE,
&sas_device_priv_data->flags))
{                                                                         
                       
scsi_internal_device_block(scmd->device);                                                                                                    

                }
else                                                                                                                                               

                        return
SCSI_MLQUEUE_HOST_BUSY;                                                                                                               

        }        

where scsi_internal_device_block() could be taught to wait for
request_fn_active becoming 1 instead of 0.

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2016-12-29  4:30 [PATCH] scsi: mpt3sas: fix hang on ata passthru commands Jason Baron
  2016-12-29  8:02 ` Christoph Hellwig
@ 2016-12-29 16:16 ` David Miller
  1 sibling, 0 replies; 22+ messages in thread
From: David Miller @ 2016-12-29 16:16 UTC (permalink / raw)
  To: jbaron
  Cc: linux-scsi, linux-kernel, sathya.prakash, chaitra.basappa,
	suganath-prabu.subramani, Sreekanth.Reddy, hare, martin.petersen,
	bart.vanassche, sagi, jejb, hch, dledford

From: Jason Baron <jbaron@akamai.com>
Date: Wed, 28 Dec 2016 23:30:24 -0500

> On ata passthru commands scsih_qcmd() ends up spinning in
> scsi_wait_for_queuecommand() indefinitely. scsih_qcmd() is called from
> __blk_run_queue_uncond() which first increments request_fn_active to a
> non-zero value. Thus, scsi_wait_for_queuecommand() never completes because
> its spinning waiting for request_fn_active to become 0.
> 
> Two patches interact here. The first:
> 
> commit 18f6084a989b ("scsi: mpt3sas: Fix secure erase premature
> termination") calls scsi_internal_device_block() for ata passthru commands.
> 
> The second patch:
> 
> commit 669f044170d8 ("scsi: srp_transport: Move queuecommand() wait code
> to SCSI core") adds a call to scsi_wait_for_queuecommand() from
> scsi_internal_device_block().
> 
> Add a new parameter to scsi_internal_device_block() to decide whether
> or not to invoke scsi_wait_for_queuecommand().
> 
> Signed-off-by: Jason Baron <jbaron@akamai.com>

Tested-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2016-12-29  8:02 ` Christoph Hellwig
  2016-12-29 16:02   ` Jason Baron
@ 2016-12-31 23:19   ` James Bottomley
  2017-01-01 14:22     ` Bart Van Assche
  1 sibling, 1 reply; 22+ messages in thread
From: James Bottomley @ 2016-12-31 23:19 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Baron
  Cc: linux-scsi, linux-kernel, Sathya Prakash, Chaitra P B,
	Suganath Prabu Subramani, Sreekanth Reddy, Hannes Reinecke,
	Martin K. Petersen, Bart Van Assche, Sagi Grimberg,
	Christoph Hellwig, Doug Ledford, David Miller

On Thu, 2016-12-29 at 00:02 -0800, Christoph Hellwig wrote:
> On Wed, Dec 28, 2016 at 11:30:24PM -0500, Jason Baron wrote:
> > Add a new parameter to scsi_internal_device_block() to decide 
> > whether or not to invoke scsi_wait_for_queuecommand().
> 
> We'll also need to deal with the blk-mq wait path that Bart has been
> working on (I think it's already in the scsi tree, but I'd have to
> check).
> 
> Also adding a bool flag for the last call in a function is style 
> that's a little annoying.
> 
> I'd prefer to add a scsi_internal_device_block_nowait that contains
> all the code except for the waiting, and then make
> scsi_internal_device_block_nowait a wrapper around it.  Or drop the
> annoying internal for both while we're at it :)

OK, I know it's new year, but this is an unpatched regression in -rc1
that's causing serious issues.  I would like this fixed by -rc3 so we
have 3 options

   1. revert all the queuecommand wait stuff until it proves it's actually
      working without regressions
   2. apply this patch and fix the style issues later
   3. someone else supplies the correctly styled fix patch

The conservative in me says that 1. is probably the most correct thing
to do because it gives us time to get the queuecommand wait stuff
right; that's what I'll probably do if there's no movement next week. 
 However, since we're reasonably early in the -rc cycle, so either 2 or
3 are viable provided no further regressions caused by the queuecommand
wait stuff pop up.

James

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2016-12-31 23:19   ` James Bottomley
@ 2017-01-01 14:22     ` Bart Van Assche
  2017-01-01 15:30       ` Jason Baron
  2017-01-01 16:33       ` David Miller
  0 siblings, 2 replies; 22+ messages in thread
From: Bart Van Assche @ 2017-01-01 14:22 UTC (permalink / raw)
  To: jejb, hch, jbaron
  Cc: linux-kernel, sagi, sathya.prakash, suganath-prabu.subramani,
	martin.petersen, hare, linux-scsi, hch, davem, Sreekanth.Reddy,
	chaitra.basappa, dledford

On Sat, 2016-12-31 at 15:19 -0800, James Bottomley wrote:
> On Thu, 2016-12-29 at 00:02 -0800, Christoph Hellwig wrote:
> > On Wed, Dec 28, 2016 at 11:30:24PM -0500, Jason Baron wrote:
> > > Add a new parameter to scsi_internal_device_block() to decide 
> > > whether or not to invoke scsi_wait_for_queuecommand().
> > 
> > We'll also need to deal with the blk-mq wait path that Bart has been
> > working on (I think it's already in the scsi tree, but I'd have to
> > check).
> > 
> > Also adding a bool flag for the last call in a function is style 
> > that's a little annoying.
> > 
> > I'd prefer to add a scsi_internal_device_block_nowait that contains
> > all the code except for the waiting, and then make
> > scsi_internal_device_block_nowait a wrapper around it.  Or drop the
> > annoying internal for both while we're at it :)
> 
> OK, I know it's new year, but this is an unpatched regression in -rc1
> that's causing serious issues.  I would like this fixed by -rc3 so we
> have 3 options
> 
>    1. revert all the queuecommand wait stuff until it proves it's actually
>       working without regressions
>    2. apply this patch and fix the style issues later
>    3. someone else supplies the correctly styled fix patch
> 
> The conservative in me says that 1. is probably the most correct thing
> to do because it gives us time to get the queuecommand wait stuff
> right; that's what I'll probably do if there's no movement next week. 
>  However, since we're reasonably early in the -rc cycle, so either 2 or
> 3 are viable provided no further regressions caused by the queuecommand
> wait stuff pop up.

Hello James,

My recommendation is to revert commit 18f6084a989b ("scsi: mpt3sas: Fix
secure erase premature termination"). Since the mpt3sas driver uses the
single-queue approach and since the SCSI core unlocks the block layer
request queue lock before the .queuecommand callback function is called,
multiple threads can execute that callback function (scsih_qcmd() in this
case) simultaneously. This means that using scsi_internal_device_block()
from inside .queuecommand to serialize SCSI command execution is wrong.

Bart.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-01 14:22     ` Bart Van Assche
@ 2017-01-01 15:30       ` Jason Baron
  2017-01-01 16:33       ` David Miller
  1 sibling, 0 replies; 22+ messages in thread
From: Jason Baron @ 2017-01-01 15:30 UTC (permalink / raw)
  To: Bart Van Assche, jejb, hch
  Cc: linux-kernel, sagi, sathya.prakash, suganath-prabu.subramani,
	martin.petersen, hare, linux-scsi, hch, davem, Sreekanth.Reddy,
	chaitra.basappa, dledford



On 01/01/2017 09:22 AM, Bart Van Assche wrote:
> On Sat, 2016-12-31 at 15:19 -0800, James Bottomley wrote:
>> On Thu, 2016-12-29 at 00:02 -0800, Christoph Hellwig wrote:
>>> On Wed, Dec 28, 2016 at 11:30:24PM -0500, Jason Baron wrote:
>>>> Add a new parameter to scsi_internal_device_block() to decide 
>>>> whether or not to invoke scsi_wait_for_queuecommand().
>>> We'll also need to deal with the blk-mq wait path that Bart has been
>>> working on (I think it's already in the scsi tree, but I'd have to
>>> check).
>>>
>>> Also adding a bool flag for the last call in a function is style 
>>> that's a little annoying.
>>>
>>> I'd prefer to add a scsi_internal_device_block_nowait that contains
>>> all the code except for the waiting, and then make
>>> scsi_internal_device_block_nowait a wrapper around it.  Or drop the
>>> annoying internal for both while we're at it :)
>> OK, I know it's new year, but this is an unpatched regression in -rc1
>> that's causing serious issues.  I would like this fixed by -rc3 so we
>> have 3 options
>>
>>    1. revert all the queuecommand wait stuff until it proves it's actually
>>       working without regressions
>>    2. apply this patch and fix the style issues later
>>    3. someone else supplies the correctly styled fix patch
>>
>> The conservative in me says that 1. is probably the most correct thing
>> to do because it gives us time to get the queuecommand wait stuff
>> right; that's what I'll probably do if there's no movement next week. 
>>  However, since we're reasonably early in the -rc cycle, so either 2 or
>> 3 are viable provided no further regressions caused by the queuecommand
>> wait stuff pop up.
> Hello James,
>
> My recommendation is to revert commit 18f6084a989b ("scsi: mpt3sas: Fix
> secure erase premature termination"). Since the mpt3sas driver uses the
> single-queue approach and since the SCSI core unlocks the block layer
> request queue lock before the .queuecommand callback function is called,
> multiple threads can execute that callback function (scsih_qcmd() in this
> case) simultaneously. This means that using scsi_internal_device_block()
> from inside .queuecommand to serialize SCSI command execution is wrong.
>
> Bart.

Hi,

Yes, commit 18f6084a989b looked racy to me as well. I noted that
in a follow up to my patch, and a suggested a way of fixing it:
http://lkml.iu.edu/hypermail/linux/kernel/1612.3/01301.html

Also worth noting is that commit 18f6084a989b was backported
to at least 4.8 stable.

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-01 14:22     ` Bart Van Assche
  2017-01-01 15:30       ` Jason Baron
@ 2017-01-01 16:33       ` David Miller
  2017-01-01 17:39         ` James Bottomley
  1 sibling, 1 reply; 22+ messages in thread
From: David Miller @ 2017-01-01 16:33 UTC (permalink / raw)
  To: Bart.VanAssche
  Cc: jejb, hch, jbaron, linux-kernel, sagi, sathya.prakash,
	suganath-prabu.subramani, martin.petersen, hare, linux-scsi, hch,
	Sreekanth.Reddy, chaitra.basappa, dledford

From: Bart Van Assche <Bart.VanAssche@sandisk.com>
Date: Sun, 1 Jan 2017 14:22:11 +0000

> My recommendation is to revert commit 18f6084a989b ("scsi: mpt3sas: Fix
> secure erase premature termination"). Since the mpt3sas driver uses the
> single-queue approach and since the SCSI core unlocks the block layer
> request queue lock before the .queuecommand callback function is called,
> multiple threads can execute that callback function (scsih_qcmd() in this
> case) simultaneously. This means that using scsi_internal_device_block()
> from inside .queuecommand to serialize SCSI command execution is wrong.

But this causes a regression for the thing being fixed by that commit.

Why don't we figure out what that semantics that commit was trying to
achieve and implement that properly?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-01 16:33       ` David Miller
@ 2017-01-01 17:39         ` James Bottomley
  2017-01-03 20:46           ` Jason Baron
  2017-01-06  1:59           ` Martin K. Petersen
  0 siblings, 2 replies; 22+ messages in thread
From: James Bottomley @ 2017-01-01 17:39 UTC (permalink / raw)
  To: David Miller, Bart.VanAssche
  Cc: hch, jbaron, linux-kernel, sagi, sathya.prakash,
	suganath-prabu.subramani, martin.petersen, hare, linux-scsi, hch,
	Sreekanth.Reddy, chaitra.basappa, dledford

On Sun, 2017-01-01 at 11:33 -0500, David Miller wrote:
> From: Bart Van Assche <Bart.VanAssche@sandisk.com>
> Date: Sun, 1 Jan 2017 14:22:11 +0000
> 
> > My recommendation is to revert commit 18f6084a989b ("scsi: mpt3sas: Fix
> > secure erase premature termination"). Since the mpt3sas driver uses the
> > single-queue approach and since the SCSI core unlocks the block layer
> > request queue lock before the .queuecommand callback function is called,
> > multiple threads can execute that callback function (scsih_qcmd() in this
> > case) simultaneously. This means that using scsi_internal_device_block()
> > from inside .queuecommand to serialize SCSI command execution is wrong.
> 
> But this causes a regression for the thing being fixed by that
> commit.

Right, we don't do that; that's why I didn't list it as one of the
options.

> Why don't we figure out what that semantics that commit was trying to
> achieve and implement that properly?

Now that I look at the reviews, each of the reviewers said what the
correct thing to do was: return SAM_STAT_BUSY if SATL commands are
outstanding like the spec says.  You all get negative brownie points
for not insisting on a rework.

Does this patch (compile tested only) fix the problems for everyone?

James

---

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 394fe13..0983a65 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -393,6 +393,7 @@ struct MPT3SAS_TARGET {
  * @eedp_enable: eedp support enable bit
  * @eedp_type: 0(type_1), 1(type_2), 2(type_3)
  * @eedp_block_length: block size
+ * @ata_command_pending: SATL passthrough outstanding for device
  */
 struct MPT3SAS_DEVICE {
 	struct MPT3SAS_TARGET *sas_target;
@@ -404,6 +405,17 @@ struct MPT3SAS_DEVICE {
 	u8	ignore_delay_remove;
 	/* Iopriority Command Handling */
 	u8	ncq_prio_enable;
+	/*
+	 * Bug workaround for SATL handling: the mpt2/3sas firmware
+	 * doesn't return BUSY or TASK_SET_FULL for subsequent
+	 * commands while a SATL pass through is in operation as the
+	 * spec requires, it simply does nothing with them until the
+	 * pass through completes, causing them possibly to timeout if
+	 * the passthrough is a long executing command (like format or
+	 * secure erase).  This variable allows us to do the right
+	 * thing while a SATL command is pending.
+	 */
+	u8 ata_command_pending;
 
 };
 
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index b5c966e..1446a0a 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3899,9 +3899,12 @@ _scsih_temp_threshold_events(struct MPT3SAS_ADAPTER *ioc,
 	}
 }
 
-static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+static void set_satl_pending(struct scsi_cmnd *scmd, bool pending)
 {
-	return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+	struct MPT3SAS_DEVICE *priv = scmd->device->hostdata;
+
+	if  (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16)
+		priv->ata_command_pending = pending;
 }
 
 /**
@@ -3925,9 +3928,7 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
 		if (!scmd)
 			continue;
 		count++;
-		if (ata_12_16_cmd(scmd))
-			scsi_internal_device_unblock(scmd->device,
-							SDEV_RUNNING);
+		set_satl_pending(scmd, false);
 		mpt3sas_base_free_smid(ioc, smid);
 		scsi_dma_unmap(scmd);
 		if (ioc->pci_error_recovery)
@@ -4063,13 +4064,6 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 	if (ioc->logging_level & MPT_DEBUG_SCSI)
 		scsi_print_command(scmd);
 
-	/*
-	 * Lock the device for any subsequent command until command is
-	 * done.
-	 */
-	if (ata_12_16_cmd(scmd))
-		scsi_internal_device_block(scmd->device);
-
 	sas_device_priv_data = scmd->device->hostdata;
 	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
 		scmd->result = DID_NO_CONNECT << 16;
@@ -4083,6 +4077,16 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 		return 0;
 	}
 
+	/*
+	 * Bug work around for firmware SATL handling
+	 */
+	if (sas_device_priv_data->ata_command_pending) {
+		scmd->result = SAM_STAT_BUSY;
+		scmd->scsi_done(scmd);
+		return 0;
+	}
+	set_satl_pending(scmd, true);
+
 	sas_target_priv_data = sas_device_priv_data->sas_target;
 
 	/* invalid device handle */
@@ -4650,8 +4654,7 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply)
 	if (scmd == NULL)
 		return 1;
 
-	if (ata_12_16_cmd(scmd))
-		scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+	set_satl_pending(scmd, false);
 
 	mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-01 17:39         ` James Bottomley
@ 2017-01-03 20:46           ` Jason Baron
  2017-01-15 17:01             ` James Bottomley
  2017-01-06  1:59           ` Martin K. Petersen
  1 sibling, 1 reply; 22+ messages in thread
From: Jason Baron @ 2017-01-03 20:46 UTC (permalink / raw)
  To: James Bottomley, David Miller, Bart.VanAssche
  Cc: hch, linux-kernel, sagi, sathya.prakash,
	suganath-prabu.subramani, martin.petersen, hare, linux-scsi, hch,
	Sreekanth.Reddy, chaitra.basappa, dledford

On 01/01/2017 12:39 PM, James Bottomley wrote:
> On Sun, 2017-01-01 at 11:33 -0500, David Miller wrote:
>> From: Bart Van Assche <Bart.VanAssche@sandisk.com>
>> Date: Sun, 1 Jan 2017 14:22:11 +0000
>>
>>> My recommendation is to revert commit 18f6084a989b ("scsi: mpt3sas: Fix
>>> secure erase premature termination"). Since the mpt3sas driver uses the
>>> single-queue approach and since the SCSI core unlocks the block layer
>>> request queue lock before the .queuecommand callback function is called,
>>> multiple threads can execute that callback function (scsih_qcmd() in this
>>> case) simultaneously. This means that using scsi_internal_device_block()
>>> from inside .queuecommand to serialize SCSI command execution is wrong.
>>
>> But this causes a regression for the thing being fixed by that
>> commit.
>
> Right, we don't do that; that's why I didn't list it as one of the
> options.
>
>> Why don't we figure out what that semantics that commit was trying to
>> achieve and implement that properly?
>
> Now that I look at the reviews, each of the reviewers said what the
> correct thing to do was: return SAM_STAT_BUSY if SATL commands are
> outstanding like the spec says.  You all get negative brownie points
> for not insisting on a rework.
>
> Does this patch (compile tested only) fix the problems for everyone?
>

Hi,

Yes, with this patch applied my system boots :)

...

> @@ -4083,6 +4077,16 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
>  		return 0;
>  	}
>
> +	/*
> +	 * Bug work around for firmware SATL handling
> +	 */
> +	if (sas_device_priv_data->ata_command_pending) {
> +		scmd->result = SAM_STAT_BUSY;
> +		scmd->scsi_done(scmd);
> +		return 0;
> +	}
> +	set_satl_pending(scmd, true);
> +
>  	sas_target_priv_data = sas_device_priv_data->sas_target;
>
>  	/* invalid device handle */


I was also wondering if 2 threads could both fall through and do:
'set_satl_pending(scmd, true)'; ?

Afaict there is nothing preventing that race?

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-01 17:39         ` James Bottomley
  2017-01-03 20:46           ` Jason Baron
@ 2017-01-06  1:59           ` Martin K. Petersen
  2017-01-06 15:46             ` Sreekanth Reddy
  2017-01-16 20:01             ` James Bottomley
  1 sibling, 2 replies; 22+ messages in thread
From: Martin K. Petersen @ 2017-01-06  1:59 UTC (permalink / raw)
  To: James Bottomley
  Cc: David Miller, Bart.VanAssche, hch, jbaron, linux-kernel, sagi,
	sathya.prakash, suganath-prabu.subramani, martin.petersen, hare,
	linux-scsi, hch, Sreekanth.Reddy, chaitra.basappa, dledford

>>>>> "James" == James Bottomley <jejb@linux.vnet.ibm.com> writes:

James> Now that I look at the reviews, each of the reviewers said what
James> the correct thing to do was: return SAM_STAT_BUSY if SATL
James> commands are outstanding like the spec says.  You all get
James> negative brownie points for not insisting on a rework.

James> Does this patch (compile tested only) fix the problems for
James> everyone?

I also like this approach better.

Broadcom folks: Please comment and test.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-06  1:59           ` Martin K. Petersen
@ 2017-01-06 15:46             ` Sreekanth Reddy
  2017-01-10  4:50               ` Martin K. Petersen
  2017-01-16 20:01             ` James Bottomley
  1 sibling, 1 reply; 22+ messages in thread
From: Sreekanth Reddy @ 2017-01-06 15:46 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, David Miller, Bart Van Assche,
	Christoph Hellwig, jbaron, linux-kernel, sagi, Sathya Prakash,
	Suganath Prabu Subramani, Hannes Reinecke, linux-scsi,
	Christoph Hellwig, Chaitra Basappa, dledford

On Fri, Jan 6, 2017 at 7:29 AM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "James" == James Bottomley <jejb@linux.vnet.ibm.com> writes:
>
> James> Now that I look at the reviews, each of the reviewers said what
> James> the correct thing to do was: return SAM_STAT_BUSY if SATL
> James> commands are outstanding like the spec says.  You all get
> James> negative brownie points for not insisting on a rework.
>
> James> Does this patch (compile tested only) fix the problems for
> James> everyone?
>
> I also like this approach better.
>
> Broadcom folks: Please comment and test.

Matin, We are fine with this patch. Can we rename function
'set_satl_pending()' name to '_scsih_set_satl_pending()' and can add
headers to this function.

other wise I am OK.

Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>

>
> --
> Martin K. Petersen      Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-06 15:46             ` Sreekanth Reddy
@ 2017-01-10  4:50               ` Martin K. Petersen
  0 siblings, 0 replies; 22+ messages in thread
From: Martin K. Petersen @ 2017-01-10  4:50 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: Martin K. Petersen, James Bottomley, David Miller,
	Bart Van Assche, Christoph Hellwig, jbaron, linux-kernel, sagi,
	Sathya Prakash, Suganath Prabu Subramani, Hannes Reinecke,
	linux-scsi, Christoph Hellwig, Chaitra Basappa, dledford

>>>>> "Sreekanth" == Sreekanth Reddy <sreekanth.reddy@broadcom.com> writes:

Sreekanth> We are fine with this patch. Can we rename function
Sreekanth> 'set_satl_pending()' name to '_scsih_set_satl_pending()' and
Sreekanth> can add headers to this function.

Sreekanth> other wise I am OK.

Sreekanth> Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>

James: Please tweak and post an updated patch.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-03 20:46           ` Jason Baron
@ 2017-01-15 17:01             ` James Bottomley
  2017-01-16 16:20               ` Bart Van Assche
  0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2017-01-15 17:01 UTC (permalink / raw)
  To: Jason Baron, David Miller, Bart.VanAssche
  Cc: hch, linux-kernel, sagi, sathya.prakash,
	suganath-prabu.subramani, martin.petersen, hare, linux-scsi, hch,
	Sreekanth.Reddy, chaitra.basappa, dledford

On Tue, 2017-01-03 at 15:46 -0500, Jason Baron wrote:
> On 01/01/2017 12:39 PM, James Bottomley wrote:
> > +	/*
> > +	 * Bug work around for firmware SATL handling
> > +	 */
> > +	if (sas_device_priv_data->ata_command_pending) {
> > +		scmd->result = SAM_STAT_BUSY;
> > +		scmd->scsi_done(scmd);
> > +		return 0;
> > +	}
> > +	set_satl_pending(scmd, true);
> > +
> >  	sas_target_priv_data = sas_device_priv_data->sas_target;
> > 
> >  	/* invalid device handle */
> 
> 
> I was also wondering if 2 threads could both fall through and do:
> 'set_satl_pending(scmd, true)'; ?
> 
> Afaict there is nothing preventing that race?

Yes, it does look like queuecommand is lockless in the mpt3sas.  How
about this version using bitops for atomicity?

James

---

>From b47c28434e9cee9cbb95a794c97ec53657408111 Mon Sep 17 00:00:00 2001
From: James Bottomley <jejb@linux.vnet.ibm.com>
Date: Sun, 1 Jan 2017 09:39:24 -0800
Subject: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands

mp3sas has a firmware failure where it can only handle one pass
through ATA command at a time.  If another comes in, contrary to the
SAT standard, it will hang until the first one completes (causing long
commands like secure erase to timeout).  The original fix was to block
the device when an ATA command came in, but this caused a regression
with

commit 669f044170d8933c3d66d231b69ea97cb8447338
Author: Bart Van Assche <bart.vanassche@sandisk.com>
Date:   Tue Nov 22 16:17:13 2016 -0800

    scsi: srp_transport: Move queuecommand() wait code to SCSI core

So fix the original fix of the secure erase timeout by properly
returning SAM_STAT_BUSY like the SAT recommends.

Fixes: 18f6084a989ba1b38702f9af37a2e4049a924be6
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

---

v2 - use bitops for lockless atomicity

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 394fe13..dcb33f4 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -393,6 +393,7 @@ struct MPT3SAS_TARGET {
  * @eedp_enable: eedp support enable bit
  * @eedp_type: 0(type_1), 1(type_2), 2(type_3)
  * @eedp_block_length: block size
+ * @ata_command_pending: SATL passthrough outstanding for device
  */
 struct MPT3SAS_DEVICE {
 	struct MPT3SAS_TARGET *sas_target;
@@ -404,6 +405,17 @@ struct MPT3SAS_DEVICE {
 	u8	ignore_delay_remove;
 	/* Iopriority Command Handling */
 	u8	ncq_prio_enable;
+	/*
+	 * Bug workaround for SATL handling: the mpt2/3sas firmware
+	 * doesn't return BUSY or TASK_SET_FULL for subsequent
+	 * commands while a SATL pass through is in operation as the
+	 * spec requires, it simply does nothing with them until the
+	 * pass through completes, causing them possibly to timeout if
+	 * the passthrough is a long executing command (like format or
+	 * secure erase).  This variable allows us to do the right
+	 * thing while a SATL command is pending.
+	 */
+	unsigned long ata_command_pending;
 
 };
 
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index b5c966e..6f9b4c0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3899,9 +3899,18 @@ _scsih_temp_threshold_events(struct MPT3SAS_ADAPTER *ioc,
 	}
 }
 
-static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+static int set_satl_pending(struct scsi_cmnd *scmd, bool pending)
 {
-	return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+	struct MPT3SAS_DEVICE *priv = scmd->device->hostdata;
+
+	if  (scmd->cmnd[0] != ATA_12 && scmd->cmnd[0] != ATA_16)
+		return 0;
+
+	if (pending)
+		return test_and_set_bit(0, &priv->ata_command_pending);
+
+	clear_bit(0, &priv->ata_command_pending);
+	return 0;
 }
 
 /**
@@ -3925,9 +3934,7 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
 		if (!scmd)
 			continue;
 		count++;
-		if (ata_12_16_cmd(scmd))
-			scsi_internal_device_unblock(scmd->device,
-							SDEV_RUNNING);
+		set_satl_pending(scmd, false);
 		mpt3sas_base_free_smid(ioc, smid);
 		scsi_dma_unmap(scmd);
 		if (ioc->pci_error_recovery)
@@ -4063,13 +4070,6 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 	if (ioc->logging_level & MPT_DEBUG_SCSI)
 		scsi_print_command(scmd);
 
-	/*
-	 * Lock the device for any subsequent command until command is
-	 * done.
-	 */
-	if (ata_12_16_cmd(scmd))
-		scsi_internal_device_block(scmd->device);
-
 	sas_device_priv_data = scmd->device->hostdata;
 	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
 		scmd->result = DID_NO_CONNECT << 16;
@@ -4083,6 +4083,17 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 		return 0;
 	}
 
+	/*
+	 * Bug work around for firmware SATL handling
+	 */
+	do {
+		if (sas_device_priv_data->ata_command_pending) {
+			scmd->result = SAM_STAT_BUSY;
+			scmd->scsi_done(scmd);
+			return 0;
+		}
+	} while (set_satl_pending(scmd, true));
+
 	sas_target_priv_data = sas_device_priv_data->sas_target;
 
 	/* invalid device handle */
@@ -4650,8 +4661,7 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply)
 	if (scmd == NULL)
 		return 1;
 
-	if (ata_12_16_cmd(scmd))
-		scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+	set_satl_pending(scmd, false);
 
 	mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-15 17:01             ` James Bottomley
@ 2017-01-16 16:20               ` Bart Van Assche
  0 siblings, 0 replies; 22+ messages in thread
From: Bart Van Assche @ 2017-01-16 16:20 UTC (permalink / raw)
  To: jejb, davem, jbaron
  Cc: linux-kernel, hch, sagi, suganath-prabu.subramani,
	sathya.prakash, martin.petersen, hare, linux-scsi, hch,
	Sreekanth.Reddy, chaitra.basappa, dledford

On Sun, 2017-01-15 at 09:01 -0800, James Bottomley wrote:
> From b47c28434e9cee9cbb95a794c97ec53657408111 Mon Sep 17 00:00:00 2001
> From: James Bottomley <jejb@linux.vnet.ibm.com>
> Date: Sun, 1 Jan 2017 09:39:24 -0800
> Subject: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
> 
> mp3sas has a firmware failure where it can only handle one pass
> through ATA command at a time.  If another comes in, contrary to the
> SAT standard, it will hang until the first one completes (causing long
> commands like secure erase to timeout).  The original fix was to block
> the device when an ATA command came in, but this caused a regression
> with
> 
> commit 669f044170d8933c3d66d231b69ea97cb8447338
> Author: Bart Van Assche <bart.vanassche@sandisk.com>
> Date:   Tue Nov 22 16:17:13 2016 -0800
> 
>     scsi: srp_transport: Move queuecommand() wait code to SCSI core
> 
> So fix the original fix of the secure erase timeout by properly
> returning SAM_STAT_BUSY like the SAT recommends.
> 
> Fixes: 18f6084a989ba1b38702f9af37a2e4049a924be6
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

Hello James,

This description looks incomplete to me. It doesn't mention the race
condition that was introduced by patch "scsi: mpt3sas: Fix secure
erase premature termination". Please also follow the guidelines from
process/submitting-patches.rst for the "Fixes:" tag. A quote from that
document: "please use the 'Fixes:' tag with the first 12 characters
of the SHA-1 ID, and the one line summary". Please also fix the
spelling of the adapter name "mp3sas".

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-06  1:59           ` Martin K. Petersen
  2017-01-06 15:46             ` Sreekanth Reddy
@ 2017-01-16 20:01             ` James Bottomley
  2017-01-16 21:01               ` Martin K. Petersen
  2017-01-17 14:13               ` Sreekanth Reddy
  1 sibling, 2 replies; 22+ messages in thread
From: James Bottomley @ 2017-01-16 20:01 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: David Miller, Bart Van Assche, Christoph Hellwig, jbaron,
	linux-kernel, sagi, Sathya Prakash, Suganath Prabu Subramani,
	Hannes Reinecke, linux-scsi, Christoph Hellwig, Chaitra Basappa,
	dledford, Sreekanth Reddy

>From 91d249409546569444897a1ffde65c421e064899 Mon Sep 17 00:00:00 2001
From: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: Sun, 1 Jan 2017 09:39:24 -0800
Subject: [PATCH] scsi: mpt3sas: fix hang on ata passthrough commands

mpt3sas has a firmware failure where it can only handle one pass
through ATA command at a time.  If another comes in, contrary to the
SAT standard, it will hang until the first one completes (causing long
commands like secure erase to timeout).  The original fix was to block
the device when an ATA command came in, but this caused a regression
with

commit 669f044170d8933c3d66d231b69ea97cb8447338
Author: Bart Van Assche <bart.vanassche@sandisk.com>
Date:   Tue Nov 22 16:17:13 2016 -0800

    scsi: srp_transport: Move queuecommand() wait code to SCSI core

So fix the original fix of the secure erase timeout by properly
returning SAM_STAT_BUSY like the SAT recommends.  The original patch
also had a concurrency problem since scsih_qcmd is lockless at that
point (this is fixed by using atomic bitops to set and test the flag).

Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature termination)
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

---

v2 - use bitops for lockless atomicity
v3 - update description, change function name
---
 drivers/scsi/mpt3sas/mpt3sas_base.h  | 12 +++++++++++
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 40 +++++++++++++++++++++++-------------
 2 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 394fe13..dcb33f4 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -393,6 +393,7 @@ struct MPT3SAS_TARGET {
  * @eedp_enable: eedp support enable bit
  * @eedp_type: 0(type_1), 1(type_2), 2(type_3)
  * @eedp_block_length: block size
+ * @ata_command_pending: SATL passthrough outstanding for device
  */
 struct MPT3SAS_DEVICE {
 	struct MPT3SAS_TARGET *sas_target;
@@ -404,6 +405,17 @@ struct MPT3SAS_DEVICE {
 	u8	ignore_delay_remove;
 	/* Iopriority Command Handling */
 	u8	ncq_prio_enable;
+	/*
+	 * Bug workaround for SATL handling: the mpt2/3sas firmware
+	 * doesn't return BUSY or TASK_SET_FULL for subsequent
+	 * commands while a SATL pass through is in operation as the
+	 * spec requires, it simply does nothing with them until the
+	 * pass through completes, causing them possibly to timeout if
+	 * the passthrough is a long executing command (like format or
+	 * secure erase).  This variable allows us to do the right
+	 * thing while a SATL command is pending.
+	 */
+	unsigned long ata_command_pending;
 
 };
 
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index b5c966e..830e2c1 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3899,9 +3899,18 @@ _scsih_temp_threshold_events(struct MPT3SAS_ADAPTER *ioc,
 	}
 }
 
-static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+static int _scsih_set_satl_pending(struct scsi_cmnd *scmd, bool pending)
 {
-	return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+	struct MPT3SAS_DEVICE *priv = scmd->device->hostdata;
+
+	if  (scmd->cmnd[0] != ATA_12 && scmd->cmnd[0] != ATA_16)
+		return 0;
+
+	if (pending)
+		return test_and_set_bit(0, &priv->ata_command_pending);
+
+	clear_bit(0, &priv->ata_command_pending);
+	return 0;
 }
 
 /**
@@ -3925,9 +3934,7 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
 		if (!scmd)
 			continue;
 		count++;
-		if (ata_12_16_cmd(scmd))
-			scsi_internal_device_unblock(scmd->device,
-							SDEV_RUNNING);
+		_scsih_set_satl_pending(scmd, false);
 		mpt3sas_base_free_smid(ioc, smid);
 		scsi_dma_unmap(scmd);
 		if (ioc->pci_error_recovery)
@@ -4063,13 +4070,6 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 	if (ioc->logging_level & MPT_DEBUG_SCSI)
 		scsi_print_command(scmd);
 
-	/*
-	 * Lock the device for any subsequent command until command is
-	 * done.
-	 */
-	if (ata_12_16_cmd(scmd))
-		scsi_internal_device_block(scmd->device);
-
 	sas_device_priv_data = scmd->device->hostdata;
 	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
 		scmd->result = DID_NO_CONNECT << 16;
@@ -4083,6 +4083,19 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 		return 0;
 	}
 
+	/*
+	 * Bug work around for firmware SATL handling.  The loop
+	 * is based on atomic operations and ensures consistency
+	 * since we're lockless at this point
+	 */
+	do {
+		if (sas_device_priv_data->ata_command_pending) {
+			scmd->result = SAM_STAT_BUSY;
+			scmd->scsi_done(scmd);
+			return 0;
+		}
+	} while (_scsih_set_satl_pending(scmd, true));
+
 	sas_target_priv_data = sas_device_priv_data->sas_target;
 
 	/* invalid device handle */
@@ -4650,8 +4663,7 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply)
 	if (scmd == NULL)
 		return 1;
 
-	if (ata_12_16_cmd(scmd))
-		scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+	_scsih_set_satl_pending(scmd, false);
 
 	mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-16 20:01             ` James Bottomley
@ 2017-01-16 21:01               ` Martin K. Petersen
  2017-01-17  9:20                 ` Ingo Molnar
  2017-01-17 14:13               ` Sreekanth Reddy
  1 sibling, 1 reply; 22+ messages in thread
From: Martin K. Petersen @ 2017-01-16 21:01 UTC (permalink / raw)
  To: James Bottomley
  Cc: Martin K. Petersen, David Miller, Bart Van Assche,
	Christoph Hellwig, jbaron, linux-kernel, sagi, Sathya Prakash,
	Suganath Prabu Subramani, Hannes Reinecke, linux-scsi,
	Christoph Hellwig, Chaitra Basappa, dledford, Sreekanth Reddy,
	Ingo Molnar

>>>>> "James" == James Bottomley <James.Bottomley@HansenPartnership.com> writes:

James> Subject: [PATCH] scsi: mpt3sas: fix hang on ata passthrough
James> commands

James> mpt3sas has a firmware failure where it can only handle one pass
James> through ATA command at a time.  If another comes in, contrary to
James> the SAT standard, it will hang until the first one completes
James> (causing long commands like secure erase to timeout).  The
James> original fix was to block the device when an ATA command came in,
James> but this caused a regression with

Broadcom folks: Please test and ack as soon as possible so we can get
this fix queued up.

Ingo: Since you appear to have hardware, it would be great if you could
test James' v3 (https://patchwork.kernel.org/patch/9519383/). Sorry for
the inconvenience.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-16 21:01               ` Martin K. Petersen
@ 2017-01-17  9:20                 ` Ingo Molnar
  0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2017-01-17  9:20 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, David Miller, Bart Van Assche,
	Christoph Hellwig, jbaron, linux-kernel, sagi, Sathya Prakash,
	Suganath Prabu Subramani, Hannes Reinecke, linux-scsi,
	Christoph Hellwig, Chaitra Basappa, dledford, Sreekanth Reddy


* Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "James" == James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> 
> James> Subject: [PATCH] scsi: mpt3sas: fix hang on ata passthrough
> James> commands
> 
> James> mpt3sas has a firmware failure where it can only handle one pass
> James> through ATA command at a time.  If another comes in, contrary to
> James> the SAT standard, it will hang until the first one completes
> James> (causing long commands like secure erase to timeout).  The
> James> original fix was to block the device when an ATA command came in,
> James> but this caused a regression with
> 
> Broadcom folks: Please test and ack as soon as possible so we can get
> this fix queued up.
> 
> Ingo: Since you appear to have hardware, it would be great if you could
> test James' v3 (https://patchwork.kernel.org/patch/9519383/). Sorry for
> the inconvenience.

As per the interdiff below v2->v3 did not change the code in any way, only the 
name of the function and a comment, so you can add this to v3 as well:

  Reported-by: Ingo Molnar <mingo@kernel.org>
  Tested-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 6f9b4c051e4d..830e2c10ba02 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3899,7 +3899,7 @@ _scsih_temp_threshold_events(struct MPT3SAS_ADAPTER *ioc,
 	}
 }
 
-static int set_satl_pending(struct scsi_cmnd *scmd, bool pending)
+static int _scsih_set_satl_pending(struct scsi_cmnd *scmd, bool pending)
 {
 	struct MPT3SAS_DEVICE *priv = scmd->device->hostdata;
 
@@ -3934,7 +3934,7 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
 		if (!scmd)
 			continue;
 		count++;
-		set_satl_pending(scmd, false);
+		_scsih_set_satl_pending(scmd, false);
 		mpt3sas_base_free_smid(ioc, smid);
 		scsi_dma_unmap(scmd);
 		if (ioc->pci_error_recovery)
@@ -4084,7 +4084,9 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 	}
 
 	/*
-	 * Bug work around for firmware SATL handling
+	 * Bug work around for firmware SATL handling.  The loop
+	 * is based on atomic operations and ensures consistency
+	 * since we're lockless at this point
 	 */
 	do {
 		if (sas_device_priv_data->ata_command_pending) {
@@ -4092,7 +4094,7 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 			scmd->scsi_done(scmd);
 			return 0;
 		}
-	} while (set_satl_pending(scmd, true));
+	} while (_scsih_set_satl_pending(scmd, true));
 
 	sas_target_priv_data = sas_device_priv_data->sas_target;
 
@@ -4661,7 +4663,7 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply)
 	if (scmd == NULL)
 		return 1;
 
-	set_satl_pending(scmd, false);
+	_scsih_set_satl_pending(scmd, false);
 
 	mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-16 20:01             ` James Bottomley
  2017-01-16 21:01               ` Martin K. Petersen
@ 2017-01-17 14:13               ` Sreekanth Reddy
  2017-01-17 14:15                 ` Christoph Hellwig
  2017-01-17 14:44                 ` James Bottomley
  1 sibling, 2 replies; 22+ messages in thread
From: Sreekanth Reddy @ 2017-01-17 14:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: Martin K. Petersen, David Miller, Bart Van Assche,
	Christoph Hellwig, jbaron, linux-kernel, sagi, Sathya Prakash,
	Suganath Prabu Subramani, Hannes Reinecke, linux-scsi,
	Christoph Hellwig, Chaitra Basappa, dledford

On Tue, Jan 17, 2017 at 1:31 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> From 91d249409546569444897a1ffde65c421e064899 Mon Sep 17 00:00:00 2001
> From: James Bottomley <James.Bottomley@HansenPartnership.com>
> Date: Sun, 1 Jan 2017 09:39:24 -0800
> Subject: [PATCH] scsi: mpt3sas: fix hang on ata passthrough commands
>
> mpt3sas has a firmware failure where it can only handle one pass
> through ATA command at a time.  If another comes in, contrary to the
> SAT standard, it will hang until the first one completes (causing long
> commands like secure erase to timeout).  The original fix was to block
> the device when an ATA command came in, but this caused a regression
> with
>
> commit 669f044170d8933c3d66d231b69ea97cb8447338
> Author: Bart Van Assche <bart.vanassche@sandisk.com>
> Date:   Tue Nov 22 16:17:13 2016 -0800
>
>     scsi: srp_transport: Move queuecommand() wait code to SCSI core
>
> So fix the original fix of the secure erase timeout by properly
> returning SAM_STAT_BUSY like the SAT recommends.  The original patch
> also had a concurrency problem since scsih_qcmd is lockless at that
> point (this is fixed by using atomic bitops to set and test the flag).
>
> Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature termination)
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
>
> ---
>
> v2 - use bitops for lockless atomicity
> v3 - update description, change function name
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.h  | 12 +++++++++++
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 40 +++++++++++++++++++++++-------------
>  2 files changed, 38 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index 394fe13..dcb33f4 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -393,6 +393,7 @@ struct MPT3SAS_TARGET {
>   * @eedp_enable: eedp support enable bit
>   * @eedp_type: 0(type_1), 1(type_2), 2(type_3)
>   * @eedp_block_length: block size
> + * @ata_command_pending: SATL passthrough outstanding for device
>   */
>  struct MPT3SAS_DEVICE {
>         struct MPT3SAS_TARGET *sas_target;
> @@ -404,6 +405,17 @@ struct MPT3SAS_DEVICE {
>         u8      ignore_delay_remove;
>         /* Iopriority Command Handling */
>         u8      ncq_prio_enable;
> +       /*
> +        * Bug workaround for SATL handling: the mpt2/3sas firmware
> +        * doesn't return BUSY or TASK_SET_FULL for subsequent
> +        * commands while a SATL pass through is in operation as the
> +        * spec requires, it simply does nothing with them until the
> +        * pass through completes, causing them possibly to timeout if
> +        * the passthrough is a long executing command (like format or
> +        * secure erase).  This variable allows us to do the right
> +        * thing while a SATL command is pending.
> +        */
> +       unsigned long ata_command_pending;
>
>  };
>
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index b5c966e..830e2c1 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -3899,9 +3899,18 @@ _scsih_temp_threshold_events(struct MPT3SAS_ADAPTER *ioc,
>         }
>  }
>
> -static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
> +static int _scsih_set_satl_pending(struct scsi_cmnd *scmd, bool pending)
>  {
> -       return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
> +       struct MPT3SAS_DEVICE *priv = scmd->device->hostdata;
> +
> +       if  (scmd->cmnd[0] != ATA_12 && scmd->cmnd[0] != ATA_16)
> +               return 0;
> +
> +       if (pending)
> +               return test_and_set_bit(0, &priv->ata_command_pending);
> +
> +       clear_bit(0, &priv->ata_command_pending);
> +       return 0;
>  }
>
>  /**
> @@ -3925,9 +3934,7 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
>                 if (!scmd)
>                         continue;
>                 count++;
> -               if (ata_12_16_cmd(scmd))
> -                       scsi_internal_device_unblock(scmd->device,
> -                                                       SDEV_RUNNING);
> +               _scsih_set_satl_pending(scmd, false);
>                 mpt3sas_base_free_smid(ioc, smid);
>                 scsi_dma_unmap(scmd);
>                 if (ioc->pci_error_recovery)
> @@ -4063,13 +4070,6 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
>         if (ioc->logging_level & MPT_DEBUG_SCSI)
>                 scsi_print_command(scmd);
>
> -       /*
> -        * Lock the device for any subsequent command until command is
> -        * done.
> -        */
> -       if (ata_12_16_cmd(scmd))
> -               scsi_internal_device_block(scmd->device);
> -
>         sas_device_priv_data = scmd->device->hostdata;
>         if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
>                 scmd->result = DID_NO_CONNECT << 16;
> @@ -4083,6 +4083,19 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
>                 return 0;
>         }
>
> +       /*
> +        * Bug work around for firmware SATL handling.  The loop
> +        * is based on atomic operations and ensures consistency
> +        * since we're lockless at this point
> +        */
> +       do {
> +               if (sas_device_priv_data->ata_command_pending) {
> +                       scmd->result = SAM_STAT_BUSY;
> +                       scmd->scsi_done(scmd);
> +                       return 0;
> +               }
> +       } while (_scsih_set_satl_pending(scmd, true));
> +

[Sreekanth] Just for readability purpose, can use use "if (test_bit(0,
&sas_device_priv_data->ata_command_pending)"
 instead of "if (sas_device_priv_data->ata_command_pending)".
Since while setting & clearing the bit we are using atomic bit
operations. I don't see any issue functionality wise.

>         sas_target_priv_data = sas_device_priv_data->sas_target;
>
>         /* invalid device handle */
> @@ -4650,8 +4663,7 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply)
>         if (scmd == NULL)
>                 return 1;
>
> -       if (ata_12_16_cmd(scmd))
> -               scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
> +       _scsih_set_satl_pending(scmd, false);
>
>         mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
>
> --
> 2.6.6
>

Tested this patch. It is working fine. Please consider this patch as

Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>

Thanks,
Sreekanth

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-17 14:13               ` Sreekanth Reddy
@ 2017-01-17 14:15                 ` Christoph Hellwig
  2017-01-17 19:44                   ` Martin K. Petersen
  2017-01-17 14:44                 ` James Bottomley
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2017-01-17 14:15 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: James Bottomley, Martin K. Petersen, David Miller,
	Bart Van Assche, Christoph Hellwig, jbaron, linux-kernel, sagi,
	Sathya Prakash, Suganath Prabu Subramani, Hannes Reinecke,
	linux-scsi, Christoph Hellwig, Chaitra Basappa, dledford

On Tue, Jan 17, 2017 at 07:43:51PM +0530, Sreekanth Reddy wrote:
> [Sreekanth] Just for readability purpose, can use use "if (test_bit(0,
> &sas_device_priv_data->ata_command_pending)"
>  instead of "if (sas_device_priv_data->ata_command_pending)".
> Since while setting & clearing the bit we are using atomic bit
> operations. I don't see any issue functionality wise.

I agree.  Also while we're into nitpicking - it would be good to
give bit 0 of the bitmap a name instead of hardcoding the 0.

Except for these patch looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-17 14:13               ` Sreekanth Reddy
  2017-01-17 14:15                 ` Christoph Hellwig
@ 2017-01-17 14:44                 ` James Bottomley
  1 sibling, 0 replies; 22+ messages in thread
From: James Bottomley @ 2017-01-17 14:44 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: Martin K. Petersen, David Miller, Bart Van Assche,
	Christoph Hellwig, jbaron, linux-kernel, sagi, Sathya Prakash,
	Suganath Prabu Subramani, Hannes Reinecke, linux-scsi,
	Christoph Hellwig, Chaitra Basappa, dledford

On Tue, 2017-01-17 at 19:43 +0530, Sreekanth Reddy wrote:
> On Tue, Jan 17, 2017 at 1:31 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > From 91d249409546569444897a1ffde65c421e064899 Mon Sep 17 00:00:00
> > 2001
> > From: James Bottomley <James.Bottomley@HansenPartnership.com>
> > Date: Sun, 1 Jan 2017 09:39:24 -0800
> > Subject: [PATCH] scsi: mpt3sas: fix hang on ata passthrough
> > commands
> > 
> > mpt3sas has a firmware failure where it can only handle one pass
> > through ATA command at a time.  If another comes in, contrary to
> > the
> > SAT standard, it will hang until the first one completes (causing
> > long
> > commands like secure erase to timeout).  The original fix was to
> > block
> > the device when an ATA command came in, but this caused a
> > regression
> > with
> > 
> > commit 669f044170d8933c3d66d231b69ea97cb8447338
> > Author: Bart Van Assche <bart.vanassche@sandisk.com>
> > Date:   Tue Nov 22 16:17:13 2016 -0800
> > 
> >     scsi: srp_transport: Move queuecommand() wait code to SCSI core
> > 
> > So fix the original fix of the secure erase timeout by properly
> > returning SAM_STAT_BUSY like the SAT recommends.  The original
> > patch
> > also had a concurrency problem since scsih_qcmd is lockless at that
> > point (this is fixed by using atomic bitops to set and test the
> > flag).
> > 
> > Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature
> > termination)
> > Signed-off-by: James Bottomley <
> > James.Bottomley@HansenPartnership.com>
> > 
> > ---
> > 
> > v2 - use bitops for lockless atomicity
> > v3 - update description, change function name
> > ---
> >  drivers/scsi/mpt3sas/mpt3sas_base.h  | 12 +++++++++++
> >  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 40 +++++++++++++++++++++++-
> > ------------
> >  2 files changed, 38 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h
> > b/drivers/scsi/mpt3sas/mpt3sas_base.h
> > index 394fe13..dcb33f4 100644
> > --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> > +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> > @@ -393,6 +393,7 @@ struct MPT3SAS_TARGET {
> >   * @eedp_enable: eedp support enable bit
> >   * @eedp_type: 0(type_1), 1(type_2), 2(type_3)
> >   * @eedp_block_length: block size
> > + * @ata_command_pending: SATL passthrough outstanding for device
> >   */
> >  struct MPT3SAS_DEVICE {
> >         struct MPT3SAS_TARGET *sas_target;
> > @@ -404,6 +405,17 @@ struct MPT3SAS_DEVICE {
> >         u8      ignore_delay_remove;
> >         /* Iopriority Command Handling */
> >         u8      ncq_prio_enable;
> > +       /*
> > +        * Bug workaround for SATL handling: the mpt2/3sas firmware
> > +        * doesn't return BUSY or TASK_SET_FULL for subsequent
> > +        * commands while a SATL pass through is in operation as
> > the
> > +        * spec requires, it simply does nothing with them until
> > the
> > +        * pass through completes, causing them possibly to timeout
> > if
> > +        * the passthrough is a long executing command (like format
> > or
> > +        * secure erase).  This variable allows us to do the right
> > +        * thing while a SATL command is pending.
> > +        */
> > +       unsigned long ata_command_pending;
> > 
> >  };
> > 
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > index b5c966e..830e2c1 100644
> > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > @@ -3899,9 +3899,18 @@ _scsih_temp_threshold_events(struct
> > MPT3SAS_ADAPTER *ioc,
> >         }
> >  }
> > 
> > -static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
> > +static int _scsih_set_satl_pending(struct scsi_cmnd *scmd, bool
> > pending)
> >  {
> > -       return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] ==
> > ATA_16);
> > +       struct MPT3SAS_DEVICE *priv = scmd->device->hostdata;
> > +
> > +       if  (scmd->cmnd[0] != ATA_12 && scmd->cmnd[0] != ATA_16)
> > +               return 0;
> > +
> > +       if (pending)
> > +               return test_and_set_bit(0, &priv
> > ->ata_command_pending);
> > +
> > +       clear_bit(0, &priv->ata_command_pending);
> > +       return 0;
> >  }
> > 
> >  /**
> > @@ -3925,9 +3934,7 @@ _scsih_flush_running_cmds(struct
> > MPT3SAS_ADAPTER *ioc)
> >                 if (!scmd)
> >                         continue;
> >                 count++;
> > -               if (ata_12_16_cmd(scmd))
> > -                       scsi_internal_device_unblock(scmd->device,
> > -                                                      
> >  SDEV_RUNNING);
> > +               _scsih_set_satl_pending(scmd, false);
> >                 mpt3sas_base_free_smid(ioc, smid);
> >                 scsi_dma_unmap(scmd);
> >                 if (ioc->pci_error_recovery)
> > @@ -4063,13 +4070,6 @@ scsih_qcmd(struct Scsi_Host *shost, struct
> > scsi_cmnd *scmd)
> >         if (ioc->logging_level & MPT_DEBUG_SCSI)
> >                 scsi_print_command(scmd);
> > 
> > -       /*
> > -        * Lock the device for any subsequent command until command
> > is
> > -        * done.
> > -        */
> > -       if (ata_12_16_cmd(scmd))
> > -               scsi_internal_device_block(scmd->device);
> > -
> >         sas_device_priv_data = scmd->device->hostdata;
> >         if (!sas_device_priv_data || !sas_device_priv_data
> > ->sas_target) {
> >                 scmd->result = DID_NO_CONNECT << 16;
> > @@ -4083,6 +4083,19 @@ scsih_qcmd(struct Scsi_Host *shost, struct
> > scsi_cmnd *scmd)
> >                 return 0;
> >         }
> > 
> > +       /*
> > +        * Bug work around for firmware SATL handling.  The loop
> > +        * is based on atomic operations and ensures consistency
> > +        * since we're lockless at this point
> > +        */
> > +       do {
> > +               if (sas_device_priv_data->ata_command_pending) {
> > +                       scmd->result = SAM_STAT_BUSY;
> > +                       scmd->scsi_done(scmd);
> > +                       return 0;
> > +               }
> > +       } while (_scsih_set_satl_pending(scmd, true));
> > +
> 
> [Sreekanth] Just for readability purpose, can use use "if
> (test_bit(0,
> &sas_device_priv_data->ata_command_pending)"
>  instead of "if (sas_device_priv_data->ata_command_pending)".
> Since while setting & clearing the bit we are using atomic bit
> operations. I don't see any issue functionality wise.

It's taste I suppose.  I like the idea of exposing a true or false
value that can be read but which uses bitops under the cover to ensure
atomicity.

The clincher for why not is probably that if you want another go
around, I'm just about to get on a 'plane, so it won't get to it for a
while and Linus will likely revert the other patch if we don't get a
fix in before -rc4

James

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands
  2017-01-17 14:15                 ` Christoph Hellwig
@ 2017-01-17 19:44                   ` Martin K. Petersen
  0 siblings, 0 replies; 22+ messages in thread
From: Martin K. Petersen @ 2017-01-17 19:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sreekanth Reddy, James Bottomley, Martin K. Petersen,
	David Miller, Bart Van Assche, Christoph Hellwig, jbaron,
	linux-kernel, sagi, Sathya Prakash, Suganath Prabu Subramani,
	Hannes Reinecke, linux-scsi, Chaitra Basappa, dledford,
	Ingo Molnar

>>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:

Christoph> On Tue, Jan 17, 2017 at 07:43:51PM +0530, Sreekanth Reddy wrote:
>> [Sreekanth] Just for readability purpose, can use use "if
>> (test_bit(0, &sas_device_priv_data->ata_command_pending)" instead of
>> "if (sas_device_priv_data->ata_command_pending)".  Since while
>> setting & clearing the bit we are using atomic bit operations. I
>> don't see any issue functionality wise.

Christoph> I agree.  Also while we're into nitpicking - it would be good
Christoph> to give bit 0 of the bitmap a name instead of hardcoding the
Christoph> 0.

I tweaked the test case. We can name the bit later if more flags are
needed (and in that case the ata_command_pending would need to get
renamed too).

In any case. This issue has taken waaay too long to get resolved so the
patch is now queued up in 4.10/scsi-fixes.

Thanks everyone!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-01-17 20:59 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-29  4:30 [PATCH] scsi: mpt3sas: fix hang on ata passthru commands Jason Baron
2016-12-29  8:02 ` Christoph Hellwig
2016-12-29 16:02   ` Jason Baron
2016-12-31 23:19   ` James Bottomley
2017-01-01 14:22     ` Bart Van Assche
2017-01-01 15:30       ` Jason Baron
2017-01-01 16:33       ` David Miller
2017-01-01 17:39         ` James Bottomley
2017-01-03 20:46           ` Jason Baron
2017-01-15 17:01             ` James Bottomley
2017-01-16 16:20               ` Bart Van Assche
2017-01-06  1:59           ` Martin K. Petersen
2017-01-06 15:46             ` Sreekanth Reddy
2017-01-10  4:50               ` Martin K. Petersen
2017-01-16 20:01             ` James Bottomley
2017-01-16 21:01               ` Martin K. Petersen
2017-01-17  9:20                 ` Ingo Molnar
2017-01-17 14:13               ` Sreekanth Reddy
2017-01-17 14:15                 ` Christoph Hellwig
2017-01-17 19:44                   ` Martin K. Petersen
2017-01-17 14:44                 ` James Bottomley
2016-12-29 16:16 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).