All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups
@ 2014-12-30 14:07 Joe Lawrence
  2014-12-30 14:07 ` [PATCH 1/2] mpt2sas,mpt3sas: correct master-abort checking in doorbell ack Joe Lawrence
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Joe Lawrence @ 2014-12-30 14:07 UTC (permalink / raw)
  To: linux-scsi
  Cc: Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, James E.J. Bottomley, Christoph Hellwig,
	Derek Shute, Joe Lawrence

A colleague noticed that the mpt2 and mpt3sas drivers do not correctly
check the PCI master abort pattern in _base_wait_for_doorbell_ack.  This
pattern should be checked *prior* to any valid bit patterns, which would
always return true since a PCI read on master abort sets all bits high.

The second patch adds similar checking to _base_wait_for_doorbell_int and
_base_wait_for_doorbell_not_used to avoid potentially long loops around
PCI reads.

Joe Lawrence (2):
  mpt2sas,mpt3sas: correct master-abort checking in doorbell ack
  mpt2sas,mpt3sas: additional master abort checks

 drivers/scsi/mpt2sas/mpt2sas_base.c |   17 ++++++++++++-----
 drivers/scsi/mpt3sas/mpt3sas_base.c |   17 ++++++++++++-----
 2 files changed, 24 insertions(+), 10 deletions(-)

-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] mpt2sas,mpt3sas: correct master-abort checking in doorbell ack
  2014-12-30 14:07 [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups Joe Lawrence
@ 2014-12-30 14:07 ` Joe Lawrence
  2014-12-30 14:07 ` [PATCH 2/2] mpt2sas,mpt3sas: additional master abort checks Joe Lawrence
  2015-04-13  0:11 ` [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups Joe Lawrence
  2 siblings, 0 replies; 8+ messages in thread
From: Joe Lawrence @ 2014-12-30 14:07 UTC (permalink / raw)
  To: linux-scsi
  Cc: Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, James E.J. Bottomley, Christoph Hellwig,
	Derek Shute, Joe Lawrence

Test against the invalid data pattern ~0 before testing valid data
patterns.

Reported-by: Derek Shute <derek.shute@stratus.com>
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
---
 drivers/scsi/mpt2sas/mpt2sas_base.c |    7 ++++---
 drivers/scsi/mpt3sas/mpt3sas_base.c |    7 ++++---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
index 3fb01d1883c6..16d6fd5e037e 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
@@ -2983,7 +2983,9 @@ _base_wait_for_doorbell_ack(struct MPT2SAS_ADAPTER *ioc, int timeout,
 	cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout;
 	do {
 		int_status = readl(&ioc->chip->HostInterruptStatus);
-		if (!(int_status & MPI2_HIS_SYS2IOC_DB_STATUS)) {
+		if (int_status == 0xFFFFFFFF)
+			goto out;
+		else if (!(int_status & MPI2_HIS_SYS2IOC_DB_STATUS)) {
 			dhsprintk(ioc, printk(MPT2SAS_INFO_FMT "%s: "
 			    "successful count(%d), timeout(%d)\n", ioc->name,
 			    __func__, count, timeout));
@@ -2995,8 +2997,7 @@ _base_wait_for_doorbell_ack(struct MPT2SAS_ADAPTER *ioc, int timeout,
 				mpt2sas_base_fault_info(ioc , doorbell);
 				return -EFAULT;
 			}
-		} else if (int_status == 0xFFFFFFFF)
-			goto out;
+		}
 
 		if (sleep_flag == CAN_SLEEP)
 			msleep(1);
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index b9c27398e206..878bf6ddd2a0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3157,7 +3157,9 @@ _base_wait_for_doorbell_ack(struct MPT3SAS_ADAPTER *ioc, int timeout,
 	cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout;
 	do {
 		int_status = readl(&ioc->chip->HostInterruptStatus);
-		if (!(int_status & MPI2_HIS_SYS2IOC_DB_STATUS)) {
+		if (int_status == 0xFFFFFFFF)
+			goto out;
+		else if (!(int_status & MPI2_HIS_SYS2IOC_DB_STATUS)) {
 			dhsprintk(ioc, pr_info(MPT3SAS_FMT
 				"%s: successful count(%d), timeout(%d)\n",
 				ioc->name, __func__, count, timeout));
@@ -3169,8 +3171,7 @@ _base_wait_for_doorbell_ack(struct MPT3SAS_ADAPTER *ioc, int timeout,
 				mpt3sas_base_fault_info(ioc , doorbell);
 				return -EFAULT;
 			}
-		} else if (int_status == 0xFFFFFFFF)
-			goto out;
+		}
 
 		if (sleep_flag == CAN_SLEEP)
 			usleep_range(1000, 1500);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] mpt2sas,mpt3sas: additional master abort checks
  2014-12-30 14:07 [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups Joe Lawrence
  2014-12-30 14:07 ` [PATCH 1/2] mpt2sas,mpt3sas: correct master-abort checking in doorbell ack Joe Lawrence
@ 2014-12-30 14:07 ` Joe Lawrence
  2015-04-13  0:11 ` [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups Joe Lawrence
  2 siblings, 0 replies; 8+ messages in thread
From: Joe Lawrence @ 2014-12-30 14:07 UTC (permalink / raw)
  To: linux-scsi
  Cc: Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, James E.J. Bottomley, Christoph Hellwig,
	Derek Shute, Joe Lawrence

Add checks for PCI master abort reads in _base_wait_for_doorbell_int and
_base_wait_for_doorbell_not_used, which contain potentially long loops
around readl calls.

Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
---
 drivers/scsi/mpt2sas/mpt2sas_base.c |   10 ++++++++--
 drivers/scsi/mpt3sas/mpt3sas_base.c |   10 ++++++++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
index 16d6fd5e037e..6ad1268cc57b 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
@@ -2942,7 +2942,9 @@ _base_wait_for_doorbell_int(struct MPT2SAS_ADAPTER *ioc, int timeout,
 	cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout;
 	do {
 		int_status = readl(&ioc->chip->HostInterruptStatus);
-		if (int_status & MPI2_HIS_IOC2SYS_DB_STATUS) {
+		if (int_status == 0xFFFFFFFF)
+			goto out;
+		else if (int_status & MPI2_HIS_IOC2SYS_DB_STATUS) {
 			dhsprintk(ioc, printk(MPT2SAS_INFO_FMT "%s: "
 			    "successful count(%d), timeout(%d)\n", ioc->name,
 			    __func__, count, timeout));
@@ -2955,6 +2957,7 @@ _base_wait_for_doorbell_int(struct MPT2SAS_ADAPTER *ioc, int timeout,
 		count++;
 	} while (--cntdn);
 
+ out:
 	printk(MPT2SAS_ERR_FMT "%s: failed due to timeout count(%d), "
 	    "int_status(%x)!\n", ioc->name, __func__, count, int_status);
 	return -EFAULT;
@@ -3032,7 +3035,9 @@ _base_wait_for_doorbell_not_used(struct MPT2SAS_ADAPTER *ioc, int timeout,
 	cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout;
 	do {
 		doorbell_reg = readl(&ioc->chip->Doorbell);
-		if (!(doorbell_reg & MPI2_DOORBELL_USED)) {
+		if (doorbell_reg == 0xFFFFFFFF)
+			goto out;
+		else if (!(doorbell_reg & MPI2_DOORBELL_USED)) {
 			dhsprintk(ioc, printk(MPT2SAS_INFO_FMT "%s: "
 			    "successful count(%d), timeout(%d)\n", ioc->name,
 			    __func__, count, timeout));
@@ -3045,6 +3050,7 @@ _base_wait_for_doorbell_not_used(struct MPT2SAS_ADAPTER *ioc, int timeout,
 		count++;
 	} while (--cntdn);
 
+ out:
 	printk(MPT2SAS_ERR_FMT "%s: failed due to timeout count(%d), "
 	    "doorbell_reg(%x)!\n", ioc->name, __func__, count, doorbell_reg);
 	return -EFAULT;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 878bf6ddd2a0..d3b6549640c7 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3115,7 +3115,9 @@ _base_wait_for_doorbell_int(struct MPT3SAS_ADAPTER *ioc, int timeout,
 	cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout;
 	do {
 		int_status = readl(&ioc->chip->HostInterruptStatus);
-		if (int_status & MPI2_HIS_IOC2SYS_DB_STATUS) {
+		if (int_status == 0xFFFFFFFF)
+			goto out;
+		else if (int_status & MPI2_HIS_IOC2SYS_DB_STATUS) {
 			dhsprintk(ioc, pr_info(MPT3SAS_FMT
 				"%s: successful count(%d), timeout(%d)\n",
 				ioc->name, __func__, count, timeout));
@@ -3128,6 +3130,7 @@ _base_wait_for_doorbell_int(struct MPT3SAS_ADAPTER *ioc, int timeout,
 		count++;
 	} while (--cntdn);
 
+ out:
 	pr_err(MPT3SAS_FMT
 		"%s: failed due to timeout count(%d), int_status(%x)!\n",
 		ioc->name, __func__, count, int_status);
@@ -3207,7 +3210,9 @@ _base_wait_for_doorbell_not_used(struct MPT3SAS_ADAPTER *ioc, int timeout,
 	cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout;
 	do {
 		doorbell_reg = readl(&ioc->chip->Doorbell);
-		if (!(doorbell_reg & MPI2_DOORBELL_USED)) {
+		if (doorbell_reg == 0xFFFFFFFF)
+			goto out;
+		else if (!(doorbell_reg & MPI2_DOORBELL_USED)) {
 			dhsprintk(ioc, pr_info(MPT3SAS_FMT
 				"%s: successful count(%d), timeout(%d)\n",
 				ioc->name, __func__, count, timeout));
@@ -3220,6 +3225,7 @@ _base_wait_for_doorbell_not_used(struct MPT3SAS_ADAPTER *ioc, int timeout,
 		count++;
 	} while (--cntdn);
 
+ out:
 	pr_err(MPT3SAS_FMT
 		"%s: failed due to timeout count(%d), doorbell_reg(%x)!\n",
 		ioc->name, __func__, count, doorbell_reg);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups
  2014-12-30 14:07 [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups Joe Lawrence
  2014-12-30 14:07 ` [PATCH 1/2] mpt2sas,mpt3sas: correct master-abort checking in doorbell ack Joe Lawrence
  2014-12-30 14:07 ` [PATCH 2/2] mpt2sas,mpt3sas: additional master abort checks Joe Lawrence
@ 2015-04-13  0:11 ` Joe Lawrence
  2015-04-13  0:54   ` James Bottomley
  2 siblings, 1 reply; 8+ messages in thread
From: Joe Lawrence @ 2015-04-13  0:11 UTC (permalink / raw)
  To: linux-scsi
  Cc: Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan, James E.J. Bottomley, Christoph Hellwig

On 12/30/2014 09:07 AM, Joe Lawrence wrote:
> A colleague noticed that the mpt2 and mpt3sas drivers do not correctly
> check the PCI master abort pattern in _base_wait_for_doorbell_ack.  This
> pattern should be checked *prior* to any valid bit patterns, which would
> always return true since a PCI read on master abort sets all bits high.
> 
> The second patch adds similar checking to _base_wait_for_doorbell_int and
> _base_wait_for_doorbell_not_used to avoid potentially long loops around
> PCI reads.
> 
> Joe Lawrence (2):
>   mpt2sas,mpt3sas: correct master-abort checking in doorbell ack
>   mpt2sas,mpt3sas: additional master abort checks
> 
>  drivers/scsi/mpt2sas/mpt2sas_base.c |   17 ++++++++++++-----
>  drivers/scsi/mpt3sas/mpt3sas_base.c |   17 ++++++++++++-----
>  2 files changed, 24 insertions(+), 10 deletions(-)
> 

Avago ping?

This one was pretty straightforward: check 0xFFFFFFFF *before* any
individual bit(s), i.e. before reading the doorbell register.

-- Joe

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups
  2015-04-13  0:11 ` [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups Joe Lawrence
@ 2015-04-13  0:54   ` James Bottomley
  2015-04-13 14:06     ` Joe Lawrence
  0 siblings, 1 reply; 8+ messages in thread
From: James Bottomley @ 2015-04-13  0:54 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: linux-scsi, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Christoph Hellwig

On Sun, 2015-04-12 at 20:11 -0400, Joe Lawrence wrote:
> On 12/30/2014 09:07 AM, Joe Lawrence wrote:
> > A colleague noticed that the mpt2 and mpt3sas drivers do not correctly
> > check the PCI master abort pattern in _base_wait_for_doorbell_ack.  This
> > pattern should be checked *prior* to any valid bit patterns, which would
> > always return true since a PCI read on master abort sets all bits high.
> > 
> > The second patch adds similar checking to _base_wait_for_doorbell_int and
> > _base_wait_for_doorbell_not_used to avoid potentially long loops around
> > PCI reads.
> > 
> > Joe Lawrence (2):
> >   mpt2sas,mpt3sas: correct master-abort checking in doorbell ack
> >   mpt2sas,mpt3sas: additional master abort checks
> > 
> >  drivers/scsi/mpt2sas/mpt2sas_base.c |   17 ++++++++++++-----
> >  drivers/scsi/mpt3sas/mpt3sas_base.c |   17 ++++++++++++-----
> >  2 files changed, 24 insertions(+), 10 deletions(-)
> > 
> 
> Avago ping?
> 
> This one was pretty straightforward: check 0xFFFFFFFF *before* any
> individual bit(s), i.e. before reading the doorbell register.

OK, Joe, explain why this patch is important: what problems could result
from it not being present?  If you convince everyone then no more mpt2/3
sas patches until this is at least commented on and a plan of action
proposed.

James



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups
  2015-04-13  0:54   ` James Bottomley
@ 2015-04-13 14:06     ` Joe Lawrence
  2015-04-13 14:38       ` James Bottomley
  0 siblings, 1 reply; 8+ messages in thread
From: Joe Lawrence @ 2015-04-13 14:06 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Christoph Hellwig

On 04/12/2015 08:54 PM, James Bottomley wrote:
> On Sun, 2015-04-12 at 20:11 -0400, Joe Lawrence wrote:
>> On 12/30/2014 09:07 AM, Joe Lawrence wrote:
>>> A colleague noticed that the mpt2 and mpt3sas drivers do not correctly
>>> check the PCI master abort pattern in _base_wait_for_doorbell_ack.  This
>>> pattern should be checked *prior* to any valid bit patterns, which would
>>> always return true since a PCI read on master abort sets all bits high.
>>>
>>> The second patch adds similar checking to _base_wait_for_doorbell_int and
>>> _base_wait_for_doorbell_not_used to avoid potentially long loops around
>>> PCI reads.
>>>
>>> Joe Lawrence (2):
>>>   mpt2sas,mpt3sas: correct master-abort checking in doorbell ack
>>>   mpt2sas,mpt3sas: additional master abort checks
>>>
>>>  drivers/scsi/mpt2sas/mpt2sas_base.c |   17 ++++++++++++-----
>>>  drivers/scsi/mpt3sas/mpt3sas_base.c |   17 ++++++++++++-----
>>>  2 files changed, 24 insertions(+), 10 deletions(-)
>>>
>>
>> Avago ping?
>>
>> This one was pretty straightforward: check 0xFFFFFFFF *before* any
>> individual bit(s), i.e. before reading the doorbell register.
> 
> OK, Joe, explain why this patch is important: what problems could result
> from it not being present?  If you convince everyone then no more mpt2/3
> sas patches until this is at least commented on and a plan of action
> proposed.

Hi James,

As currently coded:  If the PCI read returns a master abort,
_base_wait_for_doorbell_ack will loop until it exhausts its timeout (up
to 15 seconds).  Other parts of the driver, like the periodic watchdog
or EEH, may detect a similar problem before such a long time and cleanup
the mess.  However, complete device removal may be stalled until whoever
called _base_wait_for_doorbell_ack is satisfied that it has finished.

This behavior is not really a bug, but feels like one in the making.
Should additional code be introduced, copy/pasted, etc. it may not do
what was intended.

For future reference, would a repost have been more appropriate?  This
changeset was so small that I figured a status ping would have sufficed.

Regards,

-- Joe

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups
  2015-04-13 14:06     ` Joe Lawrence
@ 2015-04-13 14:38       ` James Bottomley
  2015-04-13 15:44         ` Joe Lawrence
  0 siblings, 1 reply; 8+ messages in thread
From: James Bottomley @ 2015-04-13 14:38 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: linux-scsi, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Christoph Hellwig

On Mon, 2015-04-13 at 10:06 -0400, Joe Lawrence wrote:
> On 04/12/2015 08:54 PM, James Bottomley wrote:
> > On Sun, 2015-04-12 at 20:11 -0400, Joe Lawrence wrote:
> >> On 12/30/2014 09:07 AM, Joe Lawrence wrote:
> >>> A colleague noticed that the mpt2 and mpt3sas drivers do not correctly
> >>> check the PCI master abort pattern in _base_wait_for_doorbell_ack.  This
> >>> pattern should be checked *prior* to any valid bit patterns, which would
> >>> always return true since a PCI read on master abort sets all bits high.
> >>>
> >>> The second patch adds similar checking to _base_wait_for_doorbell_int and
> >>> _base_wait_for_doorbell_not_used to avoid potentially long loops around
> >>> PCI reads.
> >>>
> >>> Joe Lawrence (2):
> >>>   mpt2sas,mpt3sas: correct master-abort checking in doorbell ack
> >>>   mpt2sas,mpt3sas: additional master abort checks
> >>>
> >>>  drivers/scsi/mpt2sas/mpt2sas_base.c |   17 ++++++++++++-----
> >>>  drivers/scsi/mpt3sas/mpt3sas_base.c |   17 ++++++++++++-----
> >>>  2 files changed, 24 insertions(+), 10 deletions(-)
> >>>
> >>
> >> Avago ping?
> >>
> >> This one was pretty straightforward: check 0xFFFFFFFF *before* any
> >> individual bit(s), i.e. before reading the doorbell register.
> > 
> > OK, Joe, explain why this patch is important: what problems could result
> > from it not being present?  If you convince everyone then no more mpt2/3
> > sas patches until this is at least commented on and a plan of action
> > proposed.
> 
> Hi James,
> 
> As currently coded:  If the PCI read returns a master abort,
> _base_wait_for_doorbell_ack will loop until it exhausts its timeout (up
> to 15 seconds).  Other parts of the driver, like the periodic watchdog
> or EEH, may detect a similar problem before such a long time and cleanup
> the mess.  However, complete device removal may be stalled until whoever
> called _base_wait_for_doorbell_ack is satisfied that it has finished.

I think we all picked this up from the changelog.  What I meant was in
what situations might a card get a master abort ... because that's when
the problem becomes user visible.  It sounds like it's something that
might not occur very often or is a bit theoretical, is that right?

> This behavior is not really a bug, but feels like one in the making.
> Should additional code be introduced, copy/pasted, etc. it may not do
> what was intended.
> 
> For future reference, would a repost have been more appropriate?  This
> changeset was so small that I figured a status ping would have sufficed.

Either works.  I was just trying to work out what sort of attention
needs to be paid to the fix based on what sort of problem it fixes for
the end user.

James



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups
  2015-04-13 14:38       ` James Bottomley
@ 2015-04-13 15:44         ` Joe Lawrence
  0 siblings, 0 replies; 8+ messages in thread
From: Joe Lawrence @ 2015-04-13 15:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Nagalakshmi Nandigama, Praveen Krishnamoorthy,
	Sreekanth Reddy, Abhijit Mahajan, Christoph Hellwig

On 04/13/2015 10:38 AM, James Bottomley wrote:
> On Mon, 2015-04-13 at 10:06 -0400, Joe Lawrence wrote:
>> On 04/12/2015 08:54 PM, James Bottomley wrote:
>>> On Sun, 2015-04-12 at 20:11 -0400, Joe Lawrence wrote:
>>>> On 12/30/2014 09:07 AM, Joe Lawrence wrote:
>>>>> A colleague noticed that the mpt2 and mpt3sas drivers do not correctly
>>>>> check the PCI master abort pattern in _base_wait_for_doorbell_ack.  This
>>>>> pattern should be checked *prior* to any valid bit patterns, which would
>>>>> always return true since a PCI read on master abort sets all bits high.
>>>>>
>>>>> The second patch adds similar checking to _base_wait_for_doorbell_int and
>>>>> _base_wait_for_doorbell_not_used to avoid potentially long loops around
>>>>> PCI reads.
>>>>>
>>>>> Joe Lawrence (2):
>>>>>   mpt2sas,mpt3sas: correct master-abort checking in doorbell ack
>>>>>   mpt2sas,mpt3sas: additional master abort checks
>>>>>
>>>>>  drivers/scsi/mpt2sas/mpt2sas_base.c |   17 ++++++++++++-----
>>>>>  drivers/scsi/mpt3sas/mpt3sas_base.c |   17 ++++++++++++-----
>>>>>  2 files changed, 24 insertions(+), 10 deletions(-)
>>>>>
>>>>
>>>> Avago ping?
>>>>
>>>> This one was pretty straightforward: check 0xFFFFFFFF *before* any
>>>> individual bit(s), i.e. before reading the doorbell register.
>>>
>>> OK, Joe, explain why this patch is important: what problems could result
>>> from it not being present?  If you convince everyone then no more mpt2/3
>>> sas patches until this is at least commented on and a plan of action
>>> proposed.
>>
>> Hi James,
>>
>> As currently coded:  If the PCI read returns a master abort,
>> _base_wait_for_doorbell_ack will loop until it exhausts its timeout (up
>> to 15 seconds).  Other parts of the driver, like the periodic watchdog
>> or EEH, may detect a similar problem before such a long time and cleanup
>> the mess.  However, complete device removal may be stalled until whoever
>> called _base_wait_for_doorbell_ack is satisfied that it has finished.
> 
> I think we all picked this up from the changelog.  What I meant was in
> what situations might a card get a master abort ... because that's when
> the problem becomes user visible.  It sounds like it's something that
> might not occur very often or is a bit theoretical, is that right?

Got it -- yes, master abort should be an infrequent event.  On Stratus
HW it typically indicates PCI hotplug removal of a device.  We run many
automated removal/add tests, which find corner case bugs and other race
conditions in driver error paths and whatnot.

>> This behavior is not really a bug, but feels like one in the making.
>> Should additional code be introduced, copy/pasted, etc. it may not do
>> what was intended.
>>
>> For future reference, would a repost have been more appropriate?  This
>> changeset was so small that I figured a status ping would have sufficed.
> 
> Either works.  I was just trying to work out what sort of attention
> needs to be paid to the fix based on what sort of problem it fixes for
> the end user.

Ok.  This particular piece of code isn't going to crash the machine or
cause corruption, etc.  It only seemed proper to fix the check that was
already there.

-- Joe

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-04-13 15:52 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-30 14:07 [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups Joe Lawrence
2014-12-30 14:07 ` [PATCH 1/2] mpt2sas,mpt3sas: correct master-abort checking in doorbell ack Joe Lawrence
2014-12-30 14:07 ` [PATCH 2/2] mpt2sas,mpt3sas: additional master abort checks Joe Lawrence
2015-04-13  0:11 ` [PATCH 0/2] mpt2sas,mpt3sas - PCI master abort fixups Joe Lawrence
2015-04-13  0:54   ` James Bottomley
2015-04-13 14:06     ` Joe Lawrence
2015-04-13 14:38       ` James Bottomley
2015-04-13 15:44         ` Joe Lawrence

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.