iommu.lists.linux-foundation.org archive mirror
* [bugzilla-daemon@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]
@ 2020-09-23 16:03 Bjorn Helgaas
  2020-09-23 16:19 ` Raj, Ashok
  2020-09-23 16:31 ` Kai-Heng Feng
  0 siblings, 2 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2020-09-23 16:03 UTC (permalink / raw)
  To: linux-pci
  Cc: Joerg Roedel, Ashok Raj, Sagi Grimberg, linux-nvme, Jens Axboe,
	Lalithambika Krishnakumar, iommu, Kai-Heng Feng, Keith Busch,
	Rajat Jain, Mika Westerberg, Christoph Hellwig

[+cc IOMMU and NVMe folks]

Sorry, I forgot to forward this to linux-pci when it was first
reported.

Apparently this happens with v5.9-rc3, and may be related to
50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint"),
which appeared in v5.8-rc3.

There are several dmesg logs and proposed patches in the bugzilla, but
no analysis yet of what the problem is.  From the first dmesg
attachment (https://bugzilla.kernel.org/attachment.cgi?id=292327):

  [   50.434945] PM: suspend entry (deep)
  [   50.802086] nvme 0000:01:00.0: saving config space at offset 0x0 (reading 0x11e0f)
  [   50.842775] ACPI: Preparing to enter system sleep state S3
  [   50.858922] ACPI: Waking up from system sleep state S3
  [   50.883622] nvme 0000:01:00.0: can't change power state from D3hot to D0 (config space inaccessible)
  [   50.947352] nvme 0000:01:00.0: restoring config space at offset 0x0 (was 0xffffffff, writing 0x11e0f)
  [   50.947816] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000
  [   50.947817] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected
  [   50.947829] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
  [   50.947830] pcieport 0000:00:1b.0:   device [8086:06ac] error status/mask=00200000/00010000
  [   50.947831] pcieport 0000:00:1b.0:    [21] ACSViol                (First)
  [   50.947841] pcieport 0000:00:1b.0: AER: broadcast error_detected message
  [   50.947843] nvme nvme0: frozen state error detected, reset controller

I suspect the nvme "can't change power state" and restore config space
errors are a consequence of the DPC event.  If DPC disables the link,
the device is inaccessible.
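
As an illustration only, here is a minimal sketch of how the DPC status value in the log (0x1f01) breaks down. The field masks are assumptions based on the PCI_EXP_DPC_STATUS_* definitions as I remember them, and decode_dpc_status() and config_read_failed() are hypothetical helpers, not kernel functions.

```python
# Sketch: decode the DPC Status value reported in the log above (0x1f01).
# The field masks are assumptions taken from the PCI_EXP_DPC_STATUS_*
# definitions as I remember them; please verify against pci_regs.h.
DPC_STATUS_TRIGGER = 0x0001          # bit 0: containment was triggered
DPC_STATUS_TRIGGER_RSN = 0x0006      # bits 2:1: trigger reason
DPC_STATUS_TRIGGER_RSN_EXT = 0x0060  # bits 6:5: trigger reason extension

TRIGGER_REASONS = {
    0: "unmasked uncorrectable error",
    1: "ERR_NONFATAL received",
    2: "ERR_FATAL received",
    3: "see trigger reason extension",
}


def decode_dpc_status(status):
    """Split a DPC Status value into the fields relevant to this report."""
    reason = (status & DPC_STATUS_TRIGGER_RSN) >> 1
    return {
        "triggered": bool(status & DPC_STATUS_TRIGGER),
        "reason": TRIGGER_REASONS[reason],
        "reason_ext": (status & DPC_STATUS_TRIGGER_RSN_EXT) >> 5,
    }


def config_read_failed(value):
    """A config read of all ones usually means the device did not respond."""
    return value == 0xFFFFFFFF


if __name__ == "__main__":
    print(decode_dpc_status(0x1F01))
    print(config_read_failed(0xFFFFFFFF))
```

With the trigger bit set and a trigger reason of 0, the decoded value agrees with the "unmasked uncorrectable error detected" line in the log, and the 0xffffffff read during restore is what an unreachable device would return.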

I don't know what caused the ACS Violation.  The AER TLP Header Log
might have a clue, but unfortunately we didn't print it.
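
For readers decoding the "error status/mask=00200000/00010000" line, a small sketch follows. It assumes the two values are the AER Uncorrectable Error Status and Mask registers; the bit names are descriptive labels chosen for this sketch, and decode_uncor() is a hypothetical helper.

```python
# Names for a few AER Uncorrectable Error Status bits. Only bit 21 (ACSViol)
# and the status/mask values come from the log; the other names are
# descriptive labels chosen for this sketch.
UNCOR_BIT_NAMES = {
    12: "Poisoned TLP",
    15: "Completer Abort",
    16: "Unexpected Completion",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}


def bit_names(value):
    """Return names for every set bit in a 32-bit status or mask value."""
    return [
        UNCOR_BIT_NAMES.get(bit, "bit %d" % bit)
        for bit in range(32)
        if value & (1 << bit)
    ]


def decode_uncor(status, mask):
    """Report which errors are logged, which are masked, and which are
    both logged and unmasked (the ones that can be signaled)."""
    return {
        "status": bit_names(status),
        "masked": bit_names(mask),
        "unmasked": bit_names(status & ~mask & 0xFFFFFFFF),
    }


if __name__ == "__main__":
    print(decode_uncor(0x00200000, 0x00010000))
```

Under these assumptions the only logged and unmasked error is bit 21, which matches the "[21] ACSViol (First)" line; the mask value covers a different bit (16).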

Tangent:

  The fact that we didn't print the AER TLP Header log looks like
  a bug in itself.  PCIe r5.0, sec 6.2.7, table 6-5, says many
  errors, including ACS Violation, should log the TLP header.  But
  aer_get_device_error_info() only reads the log for error bits in
  AER_LOG_TLP_MASKS, which doesn't include PCI_ERR_UNC_ACSV.

  I don't think there's a "TLP Header Log Valid" bit, and it's ugly to
  have to update AER_LOG_TLP_MASKS if new errors are added.  I think
  maybe we should always print the header log.
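
  To make the tangent concrete, here is a rough Python model of the mask
  check. The constant values and the composition of AER_LOG_TLP_MASKS are
  assumptions reproduced from memory of the kernel headers around v5.9, and
  reads_tlp_header_log() is a hypothetical stand-in for the condition in
  aer_get_device_error_info(), not the kernel code itself.

```python
# Assumed values, recalled from include/uapi/linux/pci_regs.h.
PCI_ERR_UNC_POISON_TLP = 0x00001000
PCI_ERR_UNC_COMP_ABORT = 0x00008000
PCI_ERR_UNC_UNX_COMP = 0x00010000
PCI_ERR_UNC_MALF_TLP = 0x00040000
PCI_ERR_UNC_ECRC = 0x00080000
PCI_ERR_UNC_UNSUP = 0x00100000
PCI_ERR_UNC_ACSV = 0x00200000

# Assumed composition of the mask; note that ACSV is not part of it.
AER_LOG_TLP_MASKS = (
    PCI_ERR_UNC_POISON_TLP
    | PCI_ERR_UNC_ECRC
    | PCI_ERR_UNC_UNSUP
    | PCI_ERR_UNC_COMP_ABORT
    | PCI_ERR_UNC_UNX_COMP
    | PCI_ERR_UNC_MALF_TLP
)


def reads_tlp_header_log(status, masks=AER_LOG_TLP_MASKS):
    """Model of the check: the header log is read only if a listed bit is set."""
    return bool(status & masks)


if __name__ == "__main__":
    acs_violation_status = 0x00200000  # status value from the log
    print(reads_tlp_header_log(acs_violation_status))
    print(reads_tlp_header_log(acs_violation_status,
                               AER_LOG_TLP_MASKS | PCI_ERR_UNC_ACSV))
```

  In this model the status from the log does not trigger a header log read
  unless PCI_ERR_UNC_ACSV is added to the mask, which is the maintenance
  burden that always printing the header log would avoid.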

----- Forwarded message from bugzilla-daemon@bugzilla.kernel.org -----

Date: Fri, 04 Sep 2020 14:31:20 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: bjorn@helgaas.com
Subject: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in
	hint" makes NVMe config space not accessible after S3
Message-ID: <bug-209149-41252@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=209149

            Bug ID: 209149
           Summary: "iommu/vt-d: Enable PCI ACS for platform opt in hint"
                    makes NVMe config space not accessible after S3
           Product: Drivers
           Version: 2.5
    Kernel Version: mainline
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: PCI
          Assignee: drivers_pci@kernel-bugs.osdl.org
          Reporter: kai.heng.feng@canonical.com
        Regression: No

Here's the error:
[   50.947816] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01
source:0x0000
[   50.947817] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error
detected
[   50.947829] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Receiver ID)
[   50.947830] pcieport 0000:00:1b.0:   device [8086:06ac] error
status/mask=00200000/00010000
[   50.947831] pcieport 0000:00:1b.0:    [21] ACSViol                (First)
[   50.947841] pcieport 0000:00:1b.0: AER: broadcast error_detected message
[   50.947843] nvme nvme0: frozen state error detected, reset controller

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

----- End forwarded message -----
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

* Re: [bugzilla-daemon@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]
  2020-09-23 16:03 [bugzilla-daemon@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3] Bjorn Helgaas
@ 2020-09-23 16:19 ` Raj, Ashok
  2020-09-23 19:45   ` Rajat Jain via iommu
  2020-09-23 16:31 ` Kai-Heng Feng
  1 sibling, 1 reply; 9+ messages in thread
From: Raj, Ashok @ 2020-09-23 16:19 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Joerg Roedel, Sagi Grimberg, linux-pci, linux-nvme, Jens Axboe,
	Lalithambika Krishnakumar, iommu, Kai-Heng Feng, Keith Busch,
	Ashok Raj, Rajat Jain, Mika Westerberg, Christoph Hellwig

Hi Bjorn


On Wed, Sep 23, 2020 at 11:03:27AM -0500, Bjorn Helgaas wrote:
> [+cc IOMMU and NVMe folks]
> 
> Sorry, I forgot to forward this to linux-pci when it was first
> reported.
> 
> Apparently this happens with v5.9-rc3, and may be related to
> 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint"),
> which appeared in v5.8-rc3.
> 
> There are several dmesg logs and proposed patches in the bugzilla, but
> no analysis yet of what the problem is.  From the first dmesg
> attachment (https://bugzilla.kernel.org/attachment.cgi?id=292327):

We have been investigating this internally as well. It appears the
specupdate for Comet Lake may be missing the errata documentation. The
offsets were wrong in some of them, and if it's the same issue, it's the
likely cause.

Will nudge the hw folks to hunt that down :-(.

Cheers,
Ashok

* Re: [bugzilla-daemon@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]
  2020-09-23 16:03 [bugzilla-daemon@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3] Bjorn Helgaas
  2020-09-23 16:19 ` Raj, Ashok
@ 2020-09-23 16:31 ` Kai-Heng Feng
  2020-09-24 18:09   ` Raj, Ashok
  1 sibling, 1 reply; 9+ messages in thread
From: Kai-Heng Feng @ 2020-09-23 16:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Joerg Roedel, Jechlitschek, Christoph, Ashok Raj, Sagi Grimberg,
	open list:PCI SUBSYSTEM, open list:NVM EXPRESS DRIVER,
	Jens Axboe, Lalithambika Krishnakumar, iommu, Keith Busch,
	Rajat Jain, Mika Westerberg, Christoph Hellwig

[+Cc Christoph]

> On Sep 24, 2020, at 00:03, Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> [+cc IOMMU and NVMe folks]
> 
> Sorry, I forgot to forward this to linux-pci when it was first
> reported.
> 
> Apparently this happens with v5.9-rc3, and may be related to
> 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint"),
> which appeared in v5.8-rc3.
> 
> There are several dmesg logs and proposed patches in the bugzilla, but
> no analysis yet of what the problem is.  From the first dmesg
> attachment (https://bugzilla.kernel.org/attachment.cgi?id=292327):

AFAIK Intel is working on it internally.
Comet Lake probably needs an ACS quirk like older generation chips.

> 
>  [   50.434945] PM: suspend entry (deep)
>  [   50.802086] nvme 0000:01:00.0: saving config space at offset 0x0 (reading 0x11e0f)
>  [   50.842775] ACPI: Preparing to enter system sleep state S3
>  [   50.858922] ACPI: Waking up from system sleep state S3
>  [   50.883622] nvme 0000:01:00.0: can't change power state from D3hot to D0 (config space inaccessible)
>  [   50.947352] nvme 0000:01:00.0: restoring config space at offset 0x0 (was 0xffffffff, writing 0x11e0f)
>  [   50.947816] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000
>  [   50.947817] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected
>  [   50.947829] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
>  [   50.947830] pcieport 0000:00:1b.0:   device [8086:06ac] error status/mask=00200000/00010000
>  [   50.947831] pcieport 0000:00:1b.0:    [21] ACSViol                (First)
>  [   50.947841] pcieport 0000:00:1b.0: AER: broadcast error_detected message
>  [   50.947843] nvme nvme0: frozen state error detected, reset controller
> 
> I suspect the nvme "can't change power state" and restore config space
> errors are a consequence of the DPC event.  If DPC disables the link,
> the device is inaccessible.
> 
> I don't know what caused the ACS Violation.  The AER TLP Header Log
> might have a clue, but unfortunately we didn't print it.
> 
> Tangent:
> 
>  The fact that we didn't print the AER TLP Header log looks like
>  a bug in itself.  PCIe r5.0, sec 6.2.7, table 6-5, says many
>  errors, including ACS Violation, should log the TLP header.  But
>  aer_get_device_error_info() only reads the log for error bits in
>  AER_LOG_TLP_MASKS, which doesn't include PCI_ERR_UNC_ACSV.
> 
>  I don't think there's a "TLP Header Log Valid" bit, and it's ugly to
>  have to update AER_LOG_TLP_MASKS if new errors are added.  I think
>  maybe we should always print the header log.

I can attach the TLP header log if there's a patch to print it...

Kai-Heng

* Re: [bugzilla-daemon@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]
  2020-09-23 16:19 ` Raj, Ashok
@ 2020-09-23 19:45   ` Rajat Jain via iommu
  2020-09-24 20:03     ` Raj, Ashok
  0 siblings, 1 reply; 9+ messages in thread
From: Rajat Jain via iommu @ 2020-09-23 19:45 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Joerg Roedel, Sagi Grimberg, linux-pci, linux-nvme, Jens Axboe,
	Lalithambika Krishnakumar, open list:AMD IOMMU (AMD-VI),
	Kai-Heng Feng, Bjorn Helgaas, Keith Busch, Mika Westerberg,
	Christoph Hellwig

On Wed, Sep 23, 2020 at 9:19 AM Raj, Ashok <ashok.raj@intel.com> wrote:
>
> Hi Bjorn
>
>
> On Wed, Sep 23, 2020 at 11:03:27AM -0500, Bjorn Helgaas wrote:
> > [+cc IOMMU and NVMe folks]
> >
> > Sorry, I forgot to forward this to linux-pci when it was first
> > reported.
> >
> > Apparently this happens with v5.9-rc3, and may be related to
> > 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint"),
> > which appeared in v5.8-rc3.
> >
> > There are several dmesg logs and proposed patches in the bugzilla, but
> > no analysis yet of what the problem is.  From the first dmesg
> > attachment (https://bugzilla.kernel.org/attachment.cgi?id=292327):
>
> We have been investigating this internally as well. It appears the
> specupdate for Comet Lake may be missing the errata documentation. The
> offsets were wrong in some of them, and if it's the same issue, it's the
> likely cause.

Can you please also confirm whether the erratum applies to Tiger Lake?

Thanks,

Rajat

>
> Will nudge the hw folks to hunt that down :-(.
>
> Cheers,
> Ashok

* Re: [bugzilla-daemon@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]
  2020-09-23 16:31 ` Kai-Heng Feng
@ 2020-09-24 18:09   ` Raj, Ashok
  2020-09-24 19:39     ` Alex Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Raj, Ashok @ 2020-09-24 18:09 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Joerg Roedel, Jechlitschek, Christoph, Sagi Grimberg,
	open list:PCI SUBSYSTEM, Alex Williamson,
	open list:NVM EXPRESS DRIVER, Jens Axboe,
	Lalithambika Krishnakumar, iommu, Bjorn Helgaas, Keith Busch,
	Ashok Raj, Rajat Jain, Mika Westerberg, Christoph Hellwig

Hi Kai

+ Alex, since he authored some of the early quirks.

On Thu, Sep 24, 2020 at 12:31:53AM +0800, Kai-Heng Feng wrote:
> [+Cc Christoph]
> 
> > On Sep 24, 2020, at 00:03, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > 
> > [+cc IOMMU and NVMe folks]
> > 
> > Sorry, I forgot to forward this to linux-pci when it was first
> > reported.
> > 
> > Apparently this happens with v5.9-rc3, and may be related to
> > 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint"),
> > which appeared in v5.8-rc3.
> > 
> > There are several dmesg logs and proposed patches in the bugzilla, but
> > no analysis yet of what the problem is.  From the first dmesg
> > attachment (https://bugzilla.kernel.org/attachment.cgi?id=292327):
> 
> AFAIK Intel is working on it internally.
> Comet Lake probably needs an ACS quirk like older generation chips.

I have confirmed with internal documentation that the problem exists on
Comet Lake, but it's fixed in the ICL and TGL generations.

Unfortunately I can't tell whether public specupdate documents exist for
these chipset generations, which we'd need to make sure all root port IDs
are captured.

There is also another entry in bugzilla that was forwarded, which said the
Request Redirect (RR) capability should always be disabled as well. This
same workaround also seems to turn off RR for the root port, so I believe
it should fix that as well. But I saw another patch attached.

Can you tell me how you reproduce this? Is just doing

# echo mem > /sys/power/state

sufficient with an attached NVMe drive?

> 
> > 
> >  [   50.434945] PM: suspend entry (deep)
> >  [   50.802086] nvme 0000:01:00.0: saving config space at offset 0x0 (reading 0x11e0f)
> >  [   50.842775] ACPI: Preparing to enter system sleep state S3
> >  [   50.858922] ACPI: Waking up from system sleep state S3
> >  [   50.883622] nvme 0000:01:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> >  [   50.947352] nvme 0000:01:00.0: restoring config space at offset 0x0 (was 0xffffffff, writing 0x11e0f)
> >  [   50.947816] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000
> >  [   50.947817] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected
> >  [   50.947829] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
> >  [   50.947830] pcieport 0000:00:1b.0:   device [8086:06ac] error status/mask=00200000/00010000
> >  [   50.947831] pcieport 0000:00:1b.0:    [21] ACSViol                (First)
> >  [   50.947841] pcieport 0000:00:1b.0: AER: broadcast error_detected message
> >  [   50.947843] nvme nvme0: frozen state error detected, reset controller
> > 
> > I suspect the nvme "can't change power state" and restore config space
> > errors are a consequence of the DPC event.  If DPC disables the link,
> > the device is inaccessible.
> > 
> > I don't know what caused the ACS Violation.  The AER TLP Header Log
> > might have a clue, but unfortunately we didn't print it.
> > 

Apparently it also requires disabling RR, and I'm not able to confirm
whether CML requires that as well.

pci_quirk_disable_intel_spt_pch_acs_redir() also seems to consult the same
table, so I'm not sure why the other patch in bugzilla is required.


Cheers,
Ashok

* Re: [bugzilla-daemon@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]
  2020-09-24 18:09   ` Raj, Ashok
@ 2020-09-24 19:39     ` Alex Williamson
  2020-09-24 19:44       ` Raj, Ashok
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2020-09-24 19:39 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Joerg Roedel, Jechlitschek, Christoph, Sagi Grimberg,
	open list:PCI SUBSYSTEM, open list:NVM EXPRESS DRIVER,
	Jens Axboe, Lalithambika Krishnakumar, iommu, Kai-Heng Feng,
	Bjorn Helgaas, Keith Busch, Rajat Jain, Mika Westerberg,
	Christoph Hellwig

On Thu, 24 Sep 2020 11:09:05 -0700
"Raj, Ashok" <ashok.raj@intel.com> wrote:

> Hi Kai
> 
> + Alex, since he authored some of the early quirks.
> 
> On Thu, Sep 24, 2020 at 12:31:53AM +0800, Kai-Heng Feng wrote:
> > [+Cc Christoph]
> >   
> > > On Sep 24, 2020, at 00:03, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > 
> > > [+cc IOMMU and NVMe folks]
> > > 
> > > Sorry, I forgot to forward this to linux-pci when it was first
> > > reported.
> > > 
> > > Apparently this happens with v5.9-rc3, and may be related to
> > > 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint"),
> > > which appeared in v5.8-rc3.
> > > 
> > > There are several dmesg logs and proposed patches in the bugzilla, but
> > > no analysis yet of what the problem is.  From the first dmesg
> > > attachment (https://bugzilla.kernel.org/attachment.cgi?id=292327):  
> > 
> > AFAIK Intel is working on it internally.
> > Comet Lake probably needs an ACS quirk like older generation chips.
> 
> I have confirmed with internal documentation that the problem exists on
> Comet Lake, but it's fixed in the ICL and TGL generations.
> 
> Unfortunately I can't tell whether public specupdate documents exist for
> these chipset generations, which we'd need to make sure all root port IDs
> are captured.
> 
> There is also another entry in bugzilla that was forwarded, which said the
> Request Redirect (RR) capability should always be disabled as well. This
> same workaround also seems to turn off RR for the root port, so I believe
> it should fix that as well. But I saw another patch attached.
> 
> Can you tell me how you reproduce this? Is just doing
> 
> # echo mem > /sys/power/state
> 
> sufficient with an attached NVMe drive?
> 
> >   
> > > 
> > >  [   50.434945] PM: suspend entry (deep)
> > >  [   50.802086] nvme 0000:01:00.0: saving config space at offset 0x0 (reading 0x11e0f)
> > >  [   50.842775] ACPI: Preparing to enter system sleep state S3
> > >  [   50.858922] ACPI: Waking up from system sleep state S3
> > >  [   50.883622] nvme 0000:01:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> > >  [   50.947352] nvme 0000:01:00.0: restoring config space at offset 0x0 (was 0xffffffff, writing 0x11e0f)
> > >  [   50.947816] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000
> > >  [   50.947817] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected
> > >  [   50.947829] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
> > >  [   50.947830] pcieport 0000:00:1b.0:   device [8086:06ac] error status/mask=00200000/00010000
> > >  [   50.947831] pcieport 0000:00:1b.0:    [21] ACSViol                (First)
> > >  [   50.947841] pcieport 0000:00:1b.0: AER: broadcast error_detected message
> > >  [   50.947843] nvme nvme0: frozen state error detected, reset controller
> > > 
> > > I suspect the nvme "can't change power state" and restore config space
> > > errors are a consequence of the DPC event.  If DPC disables the link,
> > > the device is inaccessible.
> > > 
> > > I don't know what caused the ACS Violation.  The AER TLP Header Log
> > > might have a clue, but unfortunately we didn't print it.
> > >   
> 
> Apparently it also requires disabling RR, and I'm not able to confirm
> whether CML requires that as well.
> 
> pci_quirk_disable_intel_spt_pch_acs_redir() also seems to consult the same
> table, so I'm not sure why the other patch in bugzilla is required.

If we're talking about the Intel bug where PCH root ports implement
the ACS capability and control registers as dword rather than word
registers, then how is ACS getting enabled in order to generate an ACS
violation?  The standard ACS code would write to the control register
word at offset 6, which is still the read-only capability register on
those devices.  Thanks,

Alex
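
To illustrate the point above, here is a toy model of the two register layouts. It assumes the standard layout keeps a 16-bit capability register at offset 4 and a 16-bit control register at offset 6, and that the affected ports instead keep a dword capability at offset 4 followed by a dword control register at offset 8. The offset 8, the AcsRegisters class, and standard_enable() are assumptions made for this sketch, not code from the kernel.

```python
import struct

PCI_ACS_CAP = 0x04      # standard layout: 16-bit ACS Capability register
PCI_ACS_CTRL = 0x06     # standard layout: 16-bit ACS Control register
QUIRKY_ACS_CTRL = 0x08  # assumed: dword control after a dword capability

# ACS control bits that generic enable code would typically set.
PCI_ACS_SV = 0x01  # Source Validation
PCI_ACS_RR = 0x04  # P2P Request Redirect
PCI_ACS_CR = 0x08  # P2P Completion Redirect
PCI_ACS_UF = 0x10  # Upstream Forwarding


class AcsRegisters:
    """Toy model of the registers following the extended capability header."""

    def __init__(self, dword_registers):
        self.regs = bytearray(12)
        self.dword_registers = dword_registers
        # The capability register is read-only: 2 bytes normally, 4 if quirky.
        cap_size = 4 if dword_registers else 2
        self.read_only = range(PCI_ACS_CAP, PCI_ACS_CAP + cap_size)

    def write_word(self, offset, value):
        """16-bit config write; bytes landing on read-only space are dropped."""
        for i, byte in enumerate(struct.pack("<H", value)):
            if offset + i not in self.read_only:
                self.regs[offset + i] = byte

    def control(self):
        """Read the control register where this layout actually keeps it."""
        if self.dword_registers:
            return struct.unpack_from("<I", self.regs, QUIRKY_ACS_CTRL)[0]
        return struct.unpack_from("<H", self.regs, PCI_ACS_CTRL)[0]


def standard_enable(dev):
    """What generic code would do: write the control word at offset 6."""
    dev.write_word(PCI_ACS_CTRL,
                   PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)
    return dev.control()


if __name__ == "__main__":
    print(hex(standard_enable(AcsRegisters(dword_registers=False))))
    print(hex(standard_enable(AcsRegisters(dword_registers=True))))
```

In this model the standard write takes effect only on the standard layout; on the dword layout it lands in the read-only capability register and the control register stays zero, which is why an ACS violation from generic enable code alone would be surprising.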

* Re: [bugzilla-daemon@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]
  2020-09-24 19:39     ` Alex Williamson
@ 2020-09-24 19:44       ` Raj, Ashok
  2020-09-25  6:35         ` Kai-Heng Feng
  0 siblings, 1 reply; 9+ messages in thread
From: Raj, Ashok @ 2020-09-24 19:44 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Joerg Roedel, Jechlitschek, Christoph, Sagi Grimberg,
	open list:PCI SUBSYSTEM, open list:NVM EXPRESS DRIVER,
	Jens Axboe, Lalithambika Krishnakumar, iommu, Kai-Heng Feng,
	Bjorn Helgaas, Keith Busch, Rajat Jain, Mika Westerberg,
	Christoph Hellwig, Ashok Raj

Hi Alex

> > Apparently it also requires disabling RR, and I'm not able to confirm
> > whether CML requires that as well.
> > 
> > pci_quirk_disable_intel_spt_pch_acs_redir() also seems to consult the same
> > table, so I'm not sure why the other patch in bugzilla is required.
> 
> If we're talking about the Intel bug where PCH root ports implement
> the ACS capability and control registers as dword rather than word
> registers, then how is ACS getting enabled in order to generate an ACS
> violation?  The standard ACS code would write to the control register
> word at offset 6, which is still the read-only capability register on
> those devices.  Thanks,


Right... Maybe we need the header log to figure out what exactly is happening.

* Re: [bugzilla-daemon@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]
  2020-09-23 19:45   ` Rajat Jain via iommu
@ 2020-09-24 20:03     ` Raj, Ashok
  0 siblings, 0 replies; 9+ messages in thread
From: Raj, Ashok @ 2020-09-24 20:03 UTC (permalink / raw)
  To: Rajat Jain
  Cc: Joerg Roedel, Sagi Grimberg, linux-pci, linux-nvme, Jens Axboe,
	Lalithambika Krishnakumar, open list:AMD IOMMU (AMD-VI),
	Kai-Heng Feng, Bjorn Helgaas, Keith Busch, Ashok Raj,
	Mika Westerberg, Christoph Hellwig

On Wed, Sep 23, 2020 at 12:45:11PM -0700, Rajat Jain wrote:
> On Wed, Sep 23, 2020 at 9:19 AM Raj, Ashok <ashok.raj@intel.com> wrote:
> >
> > Hi Bjorn
> >
> >
> > On Wed, Sep 23, 2020 at 11:03:27AM -0500, Bjorn Helgaas wrote:
> > > [+cc IOMMU and NVMe folks]
> > >
> > > Sorry, I forgot to forward this to linux-pci when it was first
> > > reported.
> > >
> > > Apparently this happens with v5.9-rc3, and may be related to
> > > 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint"),
> > > which appeared in v5.8-rc3.
> > >
> > > There are several dmesg logs and proposed patches in the bugzilla, but
> > > no analysis yet of what the problem is.  From the first dmesg
> > > attachment (https://bugzilla.kernel.org/attachment.cgi?id=292327):
> >
> > We have been investigating this internally as well. It appears the
> > specupdate for Comet Lake may be missing the errata documentation. The
> > offsets were wrong in some of them, and if it's the same issue, it's the
> > likely cause.
> 
> Can you please also confirm whether the erratum applies to Tiger Lake?
> 

We confirmed ICL/TGL isn't affected.


* Re: [bugzilla-daemon@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]
  2020-09-24 19:44       ` Raj, Ashok
@ 2020-09-25  6:35         ` Kai-Heng Feng
  0 siblings, 0 replies; 9+ messages in thread
From: Kai-Heng Feng @ 2020-09-25  6:35 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Joerg Roedel, Jechlitschek, Christoph, Sagi Grimberg,
	open list:PCI SUBSYSTEM, iommu, open list:NVM EXPRESS DRIVER,
	Jens Axboe, Lalithambika Krishnakumar, Alex Williamson,
	Bjorn Helgaas, Keith Busch, Rajat Jain, Mika Westerberg,
	Christoph Hellwig

Raj,

> On Sep 25, 2020, at 03:44, Raj, Ashok <ashok.raj@intel.com> wrote:
> 
> Hi Alex
> 
>>> Apparently it also requires disabling RR, and I'm not able to confirm
>>> whether CML requires that as well.
>>> 
>>> pci_quirk_disable_intel_spt_pch_acs_redir() also seems to consult the same
>>> table, so I'm not sure why the other patch in bugzilla is required.
>> 
>> If we're talking about the Intel bug where PCH root ports implement
>> the ACS capability and control registers as dword rather than word
>> registers, then how is ACS getting enabled in order to generate an ACS
>> violation?  The standard ACS code would write to the control register
>> word at offset 6, which is still the read-only capability register on
>> those devices.  Thanks,
> 
> 
> Right... Maybe we need the header log to figure out what exactly is happening.
> 

Please let me know what logs you need.

As Bjorn mentioned earlier, is there currently no way to dump the TLP header log?

Kai-Heng
