* [PATCH] scsi: mpt3sas: disable ASPM for mpt3sas / SAS3.0
@ 2021-03-25 19:51 Joe Damato
  2021-04-06  4:00 ` Martin K. Petersen
  0 siblings, 1 reply; 3+ messages in thread
From: Joe Damato @ 2021-03-25 19:51 UTC
  To: linux-scsi; +Cc: suganath-prabu.subramani, sreekanth.reddy, Joe Damato

Noticed commit ffdadd68af5a ("scsi: mpt3sas: disable ASPM for MPI2
controllers") disables ASPM for SAS-2.0 HBAs, but this change was not
replicated for SAS-3.0 HBAs. This change replicates this behavior.

Signed-off-by: Joe Damato <ice799@gmail.com>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 6aa6de7..bc038e4 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -11842,6 +11842,8 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		break;
 	case MPI25_VERSION:
 	case MPI26_VERSION:
+		pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S |
+			PCIE_LINK_STATE_L1 | PCIE_LINK_STATE_CLKPM);
 		/* Use mpt3sas driver host template for SAS 3.0 HBA's */
 		shost = scsi_host_alloc(&mpt3sas_driver_template,
 		  sizeof(struct MPT3SAS_ADAPTER));
-- 
2.7.4



* Re: [PATCH] scsi: mpt3sas: disable ASPM for mpt3sas / SAS3.0
  2021-03-25 19:51 [PATCH] scsi: mpt3sas: disable ASPM for mpt3sas / SAS3.0 Joe Damato
@ 2021-04-06  4:00 ` Martin K. Petersen
  2021-04-06 19:01   ` Joe Damato
  0 siblings, 1 reply; 3+ messages in thread
From: Martin K. Petersen @ 2021-04-06  4:00 UTC
  To: Joe Damato; +Cc: linux-scsi, suganath-prabu.subramani, sreekanth.reddy


Joe,

> Noticed commit ffdadd68af5a ("scsi: mpt3sas: disable ASPM for MPI2
> controllers") disables ASPM for SAS-2.0 HBAs, but this change was not
> replicated for SAS-3.0 HBAs. This change replicates this behavior.

Do you have a system that exhibits problems with ASPM enabled?

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCH] scsi: mpt3sas: disable ASPM for mpt3sas / SAS3.0
  2021-04-06  4:00 ` Martin K. Petersen
@ 2021-04-06 19:01   ` Joe Damato
  0 siblings, 0 replies; 3+ messages in thread
From: Joe Damato @ 2021-04-06 19:01 UTC
  To: Martin K. Petersen; +Cc: linux-scsi, suganath-prabu.subramani, sreekanth.reddy

On Mon, Apr 5, 2021 at 9:00 PM Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>
>
> Joe,
>
> > Noticed commit ffdadd68af5a ("scsi: mpt3sas: disable ASPM for MPI2
> > controllers") disables ASPM for SAS-2.0 HBAs, but this change was not
> > replicated for SAS-3.0 HBAs. This change replicates this behavior.
>
> Do you have a system that exhibits problems with ASPM enabled?

I am not sure.

I get intermittent messages in dmesg as seen below and stumbled upon
commit ffdadd68af5a while researching, which looked similar.

I haven't found a way to easily or reliably reproduce this issue, but
it surfaces as dmesg reporting an unknown NMI and all the disks
suddenly going offline. Some sort of controller fault appears to be
occurring, judging by the dmesg line which says "mpt3sas_cm0:
_base_fault_reset_work: Running mpt3sas_dead_ioc thread success."

My naive thought process was that:

- A message from Sreekanth back in ~2016 suggested that ASPM should be
  disabled explicitly for SAS-2.0 [1]; perhaps this is also true for
  SAS-3.0?
- I'm not sure, but disabling ASPM for SAS-3.0 probably wouldn't
  negatively impact users.
- Disabling ASPM explicitly in the driver only has an effect if the
  BIOS has given the kernel control of ASPM, but it could be a good
  safeguard.
- It may (or may not) reduce the incidence of this event I sporadically see.
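
As a rough userspace aid for the points above: the kernel's current ASPM
policy can be read from /sys/module/pcie_aspm/parameters/policy, where the
active entry is shown in square brackets (for example
"[default] performance powersave powersupersave"). The sketch below only
parses such a line; the function name aspm_active_policy() is hypothetical
and not part of any kernel or library API. Note that the policy alone does
not show whether firmware has granted the OS control of ASPM.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bracketed (active) entry of a
 * pcie_aspm policy line into out.
 * Returns 0 on success, -1 if no non-empty "[...]" entry is found
 * or if out is too small to hold it.
 */
int aspm_active_policy(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (!open || !close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}
```

A caller would read one line from the sysfs file with fgets() and pass it
in; a result of "performance" would suggest the kernel is already keeping
ASPM off system-wide.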

Is there a way to induce ASPM events so that I could test this? Or
perhaps can I tweak the fault handler to get more information about
the specific type of fault?
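
This does not induce ASPM events, but one way to confirm that a change like
this takes effect is to inspect the HBA's PCIe Link Control register before
and after loading the driver. A minimal sketch, assuming the 16-bit value
has already been obtained (for example with something like
"setpci -s <bdf> CAP_EXP+10.w", or from the LnkCtl line of "lspci -vv").
The bit positions follow the PCIe Link Control layout (bits 1:0 ASPM
control, bit 8 clock power management enable); the macro and function names
below are made up for illustration.

```c
#include <stdint.h>

/* Link Control register bits relevant to the three states the patch disables. */
#define LNKCTL_ASPM_L0S   0x0001u	/* ASPM L0s entry enabled */
#define LNKCTL_ASPM_L1    0x0002u	/* ASPM L1 entry enabled */
#define LNKCTL_CLKREQ_EN  0x0100u	/* clock power management enabled */

/* Hypothetical helper: describe the ASPM control field of a LnkCtl value. */
const char *lnkctl_aspm_str(uint16_t lnkctl)
{
	switch (lnkctl & (LNKCTL_ASPM_L0S | LNKCTL_ASPM_L1)) {
	case 0:
		return "disabled";
	case LNKCTL_ASPM_L0S:
		return "L0s";
	case LNKCTL_ASPM_L1:
		return "L1";
	default:
		return "L0s L1";
	}
}

/* Hypothetical helper: nonzero if L0s, L1 or clock PM is still enabled. */
int lnkctl_pm_enabled(uint16_t lnkctl)
{
	return (lnkctl & (LNKCTL_ASPM_L0S | LNKCTL_ASPM_L1 |
			  LNKCTL_CLKREQ_EN)) != 0;
}
```

If lnkctl_pm_enabled() still reports a nonzero result after the driver has
probed, the disable request presumably did not apply, which would be
consistent with the OS not having control of ASPM on that system.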

All in all I figured the change was relatively harmless and could
reduce the incidence of this sporadic NMI I see.

Thanks,
Joe

[1]: https://patchwork.kernel.org/project/linux-scsi/patch/20161228110524.7516-1-ojab@ojab.ru/#20106435

[1513141.713575] Uhhuh. NMI received for unknown reason 30 on CPU 0.
[1513141.713576] Do you have a strange power saving mode enabled?
[1513141.713577] Dazed and confused, but trying to continue
[1513141.839140] mpt3sas_cm0: SAS host is non-operational !!!!
[1513142.867056] mpt3sas_cm0: SAS host is non-operational !!!!
[1513143.890996] mpt3sas_cm0: SAS host is non-operational !!!!
[1513144.914887] mpt3sas_cm0: SAS host is non-operational !!!!
[1513145.934806] mpt3sas_cm0: SAS host is non-operational !!!!
[1513146.958724] mpt3sas_cm0: SAS host is non-operational !!!!
[1513146.965053] mpt3sas_cm0: _base_fault_reset_work: Running mpt3sas_dead_ioc thread success !!!!
[1513146.965423] sd 0:0:7:0: [sdh] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.973762] sd 0:0:7:0: [sdh] tag#0 CDB: Read(10) 28 00 d7 72 30 b0 00 00 10 00
[1513146.973764] print_req_error: I/O error, dev sdh, sector 3614585008
[1513146.978754] sd 0:0:6:0: [sdg] tag#29 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.978756] sd 0:0:6:0: [sdg] tag#9 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.978757] sd 0:0:6:0: [sdg] tag#33 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[1513146.978759] sd 0:0:6:0: [sdg] tag#33 CDB: Read(10) 28 00 d8 47 30 68 00 00 30 00
[1513146.978760] sd 0:0:6:0: [sdg] tag#9 CDB: Write(10) 2a 00 61 d1 ae 20 00 04 00 00
