All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
@ 2017-04-07 14:32 Joerg Roedel
  2017-04-07 16:46   ` Deucher, Alexander
                   ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: Joerg Roedel @ 2017-04-07 14:32 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Daniel Drake, Alexander Deucher,
	Samuel Sieb, David Woodhouse, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

ATS is broken on this hardware and causes IOMMU stalls and
system failure. Disable ATS on these devices to make them
usable again with IOMMU enabled.

Note that the commit in the Fixes-tag is not buggy, it
just uncovers the problem in the hardware by increasing
the ATS-flush rate.

Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/pci/quirks.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6736836..7cbe316 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031, quirk_no_aersid);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032, quirk_no_aersid);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033, quirk_no_aersid);
+
+#ifdef CONFIG_PCI_ATS
+/*
+ * Some devices have a broken ATS implementation causing IOMMU stalls.
+ * Don't use ATS for those devices.
+ */
+static void quirk_disable_ats(struct pci_dev *pdev)
+{
+	/*
+	 * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
+	 * early.
+	 */
+	dev_info(&pdev->dev, "QUIRK: Disabling ATS");
+	pdev->ats_cap = 0;
+}
+
+/* AMD Stoney platform GPU */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_disable_ats);
+#endif /* CONFIG_PCI_ATS */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-04-07 14:32 [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs Joerg Roedel
@ 2017-04-07 16:46   ` Deucher, Alexander
  2017-04-08  7:41 ` Lukas Wunner
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-04-07 16:46 UTC (permalink / raw)
  To: 'Joerg Roedel', Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Daniel Drake, Samuel Sieb,
	David Woodhouse, Joerg Roedel

> -----Original Message-----
> From: Joerg Roedel [mailto:joro@8bytes.org]
> Sent: Friday, April 07, 2017 10:32 AM
> To: Bjorn Helgaas
> Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake;
> Deucher, Alexander; Samuel Sieb; David Woodhouse; Joerg Roedel
> Subject: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> From: Joerg Roedel <jroedel@suse.de>
> 
> ATS is broken on this hardware and causes IOMMU stalls and
> system failure. Disable ATS on these devices to make them
> usable again with IOMMU enabled.
> 
> Note that the commit in the Fixes-tag is not buggy, it
> just uncovers the problem in the hardware by increasing
> the ATS-flush rate.
> 
> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> Signed-off-by: Joerg Roedel <jroedel@suse.de>

Acked-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/pci/quirks.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6736836..7cbe316 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031,
> quirk_no_aersid);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032,
> quirk_no_aersid);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033,
> quirk_no_aersid);
> +
> +#ifdef CONFIG_PCI_ATS
> +/*
> + * Some devices have a broken ATS implementation causing IOMMU stalls.
> + * Don't use ATS for those devices.
> + */
> +static void quirk_disable_ats(struct pci_dev *pdev)
> +{
> +	/*
> +	 * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
> +	 * early.
> +	 */
> +	dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> +	pdev->ats_cap = 0;
> +}
> +
> +/* AMD Stoney platform GPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4,
> quirk_disable_ats);
> +#endif /* CONFIG_PCI_ATS */
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
@ 2017-04-07 16:46   ` Deucher, Alexander
  0 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-04-07 16:46 UTC (permalink / raw)
  To: 'Joerg Roedel', Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Daniel Drake, Samuel Sieb,
	David Woodhouse, Joerg Roedel

> -----Original Message-----
> From: Joerg Roedel [mailto:joro@8bytes.org]
> Sent: Friday, April 07, 2017 10:32 AM
> To: Bjorn Helgaas
> Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake=
;
> Deucher, Alexander; Samuel Sieb; David Woodhouse; Joerg Roedel
> Subject: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
>=20
> From: Joerg Roedel <jroedel@suse.de>
>=20
> ATS is broken on this hardware and causes IOMMU stalls and
> system failure. Disable ATS on these devices to make them
> usable again with IOMMU enabled.
>=20
> Note that the commit in the Fixes-tag is not buggy, it
> just uncovers the problem in the hardware by increasing
> the ATS-flush rate.
>=20
> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> Signed-off-by: Joerg Roedel <jroedel@suse.de>

Acked-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/pci/quirks.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>=20
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6736836..7cbe316 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031,
> quirk_no_aersid);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032,
> quirk_no_aersid);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033,
> quirk_no_aersid);
> +
> +#ifdef CONFIG_PCI_ATS
> +/*
> + * Some devices have a broken ATS implementation causing IOMMU stalls.
> + * Don't use ATS for those devices.
> + */
> +static void quirk_disable_ats(struct pci_dev *pdev)
> +{
> +	/*
> +	 * Set pdev->ats_cap =3D 0 to make pci_enable_ats() bail out
> +	 * early.
> +	 */
> +	dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> +	pdev->ats_cap =3D 0;
> +}
> +
> +/* AMD Stoney platform GPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4,
> quirk_disable_ats);
> +#endif /* CONFIG_PCI_ATS */
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-04-07 14:32 [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs Joerg Roedel
  2017-04-07 16:46   ` Deucher, Alexander
@ 2017-04-08  7:41 ` Lukas Wunner
  2017-04-20 12:11   ` Joerg Roedel
  2017-06-15 14:04 ` Joerg Roedel
  2017-07-13  2:56 ` Bjorn Helgaas
  3 siblings, 1 reply; 35+ messages in thread
From: Lukas Wunner @ 2017-04-08  7:41 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Alexander Deucher, Samuel Sieb, David Woodhouse, Joerg Roedel,
	Paul Menzel

On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> ATS is broken on this hardware and causes IOMMU stalls and
> system failure. Disable ATS on these devices to make them
> usable again with IOMMU enabled.

AMD Stoney Ridge is an x86 CPU + GPU combo and this quirk pertains
to the GPU, right?

In that case the quirk should go to arch/x86.  Paul Menzel (+cc)
has just complained on linux-pci@ that final fixups are taking half
a second, and I think that could be reduced if more efforts were
spent to move arch-specific quirks out of the catch-all in
drivers/pci/quirks.c.

Thanks,

Lukas

> 
> Note that the commit in the Fixes-tag is not buggy, it
> just uncovers the problem in the hardware by increasing
> the ATS-flush rate.
> 
> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  drivers/pci/quirks.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6736836..7cbe316 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031, quirk_no_aersid);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032, quirk_no_aersid);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033, quirk_no_aersid);
> +
> +#ifdef CONFIG_PCI_ATS
> +/*
> + * Some devices have a broken ATS implementation causing IOMMU stalls.
> + * Don't use ATS for those devices.
> + */
> +static void quirk_disable_ats(struct pci_dev *pdev)
> +{
> +	/*
> +	 * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
> +	 * early.
> +	 */
> +	dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> +	pdev->ats_cap = 0;
> +}
> +
> +/* AMD Stoney platform GPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_disable_ats);
> +#endif /* CONFIG_PCI_ATS */
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-04-08  7:41 ` Lukas Wunner
@ 2017-04-20 12:11   ` Joerg Roedel
  2017-06-15 17:12     ` Bjorn Helgaas
  0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2017-04-20 12:11 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Joerg Roedel, Bjorn Helgaas, linux-pci, linux-kernel,
	Daniel Drake, Alexander Deucher, Samuel Sieb, David Woodhouse,
	Paul Menzel

On Sat, Apr 08, 2017 at 09:41:07AM +0200, Lukas Wunner wrote:
> On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> > From: Joerg Roedel <jroedel@suse.de>
> > 
> > ATS is broken on this hardware and causes IOMMU stalls and
> > system failure. Disable ATS on these devices to make them
> > usable again with IOMMU enabled.
> 
> AMD Stoney Ridge is an x86 CPU + GPU combo and this quirk pertains
> to the GPU, right?
> 
> In that case the quirk should go to arch/x86.  Paul Menzel (+cc)
> has just complained on linux-pci@ that final fixups are taking half
> a second, and I think that could be reduced if more efforts were
> spent to move arch-specific quirks out of the catch-all in
> drivers/pci/quirks.c.

The affected hardware here might be x86-only, but ATS is not. If a
broken ATS-capable plug-in card appears, we need this in generic code
anyway.

Also has anyone profiled why the fixups take so long (and on what
hardware)? Maybe the fixup-device matching can be improved instead of
cluttering arch-code with pci-fixups.


	Joerg

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-04-07 16:46   ` Deucher, Alexander
  (?)
@ 2017-05-04 10:21   ` David Woodhouse
  2017-05-04 14:41       ` Deucher, Alexander
  2017-05-23 19:54       ` Deucher, Alexander
  -1 siblings, 2 replies; 35+ messages in thread
From: David Woodhouse @ 2017-05-04 10:21 UTC (permalink / raw)
  To: Deucher, Alexander, 'Joerg Roedel', Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Daniel Drake, Samuel Sieb, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 2407 bytes --]

On Fri, 2017-04-07 at 16:46 +0000, Deucher, Alexander wrote:
> > 
> > -----Original Message-----
> > From: Joerg Roedel [mailto:joro@8bytes.org]
> > Sent: Friday, April 07, 2017 10:32 AM
> > To: Bjorn Helgaas
> > Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake;
> > Deucher, Alexander; Samuel Sieb; David Woodhouse; Joerg Roedel
> > Subject: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> > 
> > From: Joerg Roedel <jroedel@suse.de>
> > 
> > ATS is broken on this hardware and causes IOMMU stalls and
> > system failure. Disable ATS on these devices to make them
> > usable again with IOMMU enabled.
> > 
> > Note that the commit in the Fixes-tag is not buggy, it
> > just uncovers the problem in the hardware by increasing
> > the ATS-flush rate.
> > 
> > Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> > Signed-off-by: Joerg Roedel <jroedel@suse.de>
> Acked-by: Alex Deucher <alexander.deucher@amd.com>

Alex, are you able to confirm that it is *only* the device with PCI ID
0x98e4 which has this problem, or (more likely) come up with an
exhaustive list? Thanks.

We'll want the same blacklist in Xen too, won't we?

> > 
> > ---
> >  drivers/pci/quirks.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> > 
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 6736836..7cbe316 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev *pdev)
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031,
> > quirk_no_aersid);
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032,
> > quirk_no_aersid);
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033,
> > quirk_no_aersid);
> > +
> > +#ifdef CONFIG_PCI_ATS
> > +/*
> > + * Some devices have a broken ATS implementation causing IOMMU stalls.
> > + * Don't use ATS for those devices.
> > + */
> > +static void quirk_disable_ats(struct pci_dev *pdev)
> > +{
> > +	/*
> > +	 * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
> > +	 * early.
> > +	 */
> > +	dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> > +	pdev->ats_cap = 0;
> > +}
> > +
> > +/* AMD Stoney platform GPU */
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_disable_ats);
> > +#endif /* CONFIG_PCI_ATS */
> > --
> > 1.9.1

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-05-04 10:21   ` David Woodhouse
@ 2017-05-04 14:41       ` Deucher, Alexander
  2017-05-23 19:54       ` Deucher, Alexander
  1 sibling, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-05-04 14:41 UTC (permalink / raw)
  To: 'David Woodhouse', 'Joerg Roedel', Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Daniel Drake, Samuel Sieb, Joerg Roedel

> -----Original Message-----
> From: David Woodhouse [mailto:dwmw2@infradead.org]
> Sent: Thursday, May 04, 2017 6:22 AM
> To: Deucher, Alexander; 'Joerg Roedel'; Bjorn Helgaas
> Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake;
> Samuel Sieb; Joerg Roedel
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> On Fri, 2017-04-07 at 16:46 +0000, Deucher, Alexander wrote:
> > >
> > > -----Original Message-----
> > > From: Joerg Roedel [mailto:joro@8bytes.org]
> > > Sent: Friday, April 07, 2017 10:32 AM
> > > To: Bjorn Helgaas
> > > Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel
> Drake;
> > > Deucher, Alexander; Samuel Sieb; David Woodhouse; Joerg Roedel
> > > Subject: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> > >
> > > From: Joerg Roedel <jroedel@suse.de>
> > >
> > > ATS is broken on this hardware and causes IOMMU stalls and
> > > system failure. Disable ATS on these devices to make them
> > > usable again with IOMMU enabled.
> > >
> > > Note that the commit in the Fixes-tag is not buggy, it
> > > just uncovers the problem in the hardware by increasing
> > > the ATS-flush rate.
> > >
> > > Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> > > Signed-off-by: Joerg Roedel <jroedel@suse.de>
> > Acked-by: Alex Deucher <alexander.deucher@amd.com>
> 
> Alex, are you able to confirm that it is *only* the device with PCI ID
> 0x98e4 which has this problem, or (more likely) come up with an
> exhaustive list? Thanks.

It's just Stoney, that is the only ID.  ATS is validated on the other GPU parts.

Alex

> 
> We'll want the same blacklist in Xen too, won't we?
> 
> > >
> > > ---
> > >  drivers/pci/quirks.c | 19 +++++++++++++++++++
> > >  1 file changed, 19 insertions(+)
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 6736836..7cbe316 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev
> *pdev)
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031,
> > > quirk_no_aersid);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032,
> > > quirk_no_aersid);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033,
> > > quirk_no_aersid);
> > > +
> > > +#ifdef CONFIG_PCI_ATS
> > > +/*
> > > + * Some devices have a broken ATS implementation causing IOMMU
> stalls.
> > > + * Don't use ATS for those devices.
> > > + */
> > > +static void quirk_disable_ats(struct pci_dev *pdev)
> > > +{
> > > +	/*
> > > +	 * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
> > > +	 * early.
> > > +	 */
> > > +	dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> > > +	pdev->ats_cap = 0;
> > > +}
> > > +
> > > +/* AMD Stoney platform GPU */
> > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4,
> quirk_disable_ats);
> > > +#endif /* CONFIG_PCI_ATS */
> > > --
> > > 1.9.1

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
@ 2017-05-04 14:41       ` Deucher, Alexander
  0 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-05-04 14:41 UTC (permalink / raw)
  To: 'David Woodhouse', 'Joerg Roedel', Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Daniel Drake, Samuel Sieb, Joerg Roedel

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBEYXZpZCBXb29kaG91c2UgW21h
aWx0bzpkd213MkBpbmZyYWRlYWQub3JnXQ0KPiBTZW50OiBUaHVyc2RheSwgTWF5IDA0LCAyMDE3
IDY6MjIgQU0NCj4gVG86IERldWNoZXIsIEFsZXhhbmRlcjsgJ0pvZXJnIFJvZWRlbCc7IEJqb3Ju
IEhlbGdhYXMNCj4gQ2M6IGxpbnV4LXBjaUB2Z2VyLmtlcm5lbC5vcmc7IGxpbnV4LWtlcm5lbEB2
Z2VyLmtlcm5lbC5vcmc7IERhbmllbCBEcmFrZTsNCj4gU2FtdWVsIFNpZWI7IEpvZXJnIFJvZWRl
bA0KPiBTdWJqZWN0OiBSZTogW1BBVENIIHYyXSBQQ0k6IEFkZCBBVFMtZGlzYWJsZSBxdWlyayBm
b3IgQU1EIFN0b25leSBHUFVzDQo+IA0KPiBPbiBGcmksIDIwMTctMDQtMDcgYXQgMTY6NDYgKzAw
MDAsIERldWNoZXIsIEFsZXhhbmRlciB3cm90ZToNCj4gPiA+DQo+ID4gPiAtLS0tLU9yaWdpbmFs
IE1lc3NhZ2UtLS0tLQ0KPiA+ID4gRnJvbTogSm9lcmcgUm9lZGVsIFttYWlsdG86am9yb0A4Ynl0
ZXMub3JnXQ0KPiA+ID4gU2VudDogRnJpZGF5LCBBcHJpbCAwNywgMjAxNyAxMDozMiBBTQ0KPiA+
ID4gVG86IEJqb3JuIEhlbGdhYXMNCj4gPiA+IENjOiBsaW51eC1wY2lAdmdlci5rZXJuZWwub3Jn
OyBsaW51eC1rZXJuZWxAdmdlci5rZXJuZWwub3JnOyBEYW5pZWwNCj4gRHJha2U7DQo+ID4gPiBE
ZXVjaGVyLCBBbGV4YW5kZXI7IFNhbXVlbCBTaWViOyBEYXZpZCBXb29kaG91c2U7IEpvZXJnIFJv
ZWRlbA0KPiA+ID4gU3ViamVjdDogW1BBVENIIHYyXSBQQ0k6IEFkZCBBVFMtZGlzYWJsZSBxdWly
ayBmb3IgQU1EIFN0b25leSBHUFVzDQo+ID4gPg0KPiA+ID4gRnJvbTogSm9lcmcgUm9lZGVsIDxq
cm9lZGVsQHN1c2UuZGU+DQo+ID4gPg0KPiA+ID4gQVRTIGlzIGJyb2tlbiBvbiB0aGlzIGhhcmR3
YXJlIGFuZCBjYXVzZXMgSU9NTVUgc3RhbGxzIGFuZA0KPiA+ID4gc3lzdGVtIGZhaWx1cmUuIERp
c2FibGUgQVRTIG9uIHRoZXNlIGRldmljZXMgdG8gbWFrZSB0aGVtDQo+ID4gPiB1c2FibGUgYWdh
aW4gd2l0aCBJT01NVSBlbmFibGVkLg0KPiA+ID4NCj4gPiA+IE5vdGUgdGhhdCB0aGUgY29tbWl0
IGluIHRoZSBGaXhlcy10YWcgaXMgbm90IGJ1Z2d5LCBpdA0KPiA+ID4ganVzdCB1bmNvdmVycyB0
aGUgcHJvYmxlbSBpbiB0aGUgaGFyZHdhcmUgYnkgaW5jcmVhc2luZw0KPiA+ID4gdGhlIEFUUy1m
bHVzaCByYXRlLg0KPiA+ID4NCj4gPiA+IEZpeGVzOiBiMTUxNmExNDY1N2EgKCdpb21tdS9hbWQ6
IEltcGxlbWVudCBmbHVzaCBxdWV1ZScpDQo+ID4gPiBTaWduZWQtb2ZmLWJ5OiBKb2VyZyBSb2Vk
ZWwgPGpyb2VkZWxAc3VzZS5kZT4NCj4gPiBBY2tlZC1ieTogQWxleCBEZXVjaGVyIDxhbGV4YW5k
ZXIuZGV1Y2hlckBhbWQuY29tPg0KPiANCj4gQWxleCwgYXJlIHlvdSBhYmxlIHRvIGNvbmZpcm0g
dGhhdCBpdCBpcyAqb25seSogdGhlIGRldmljZSB3aXRoIFBDSSBJRA0KPiAweDk4ZTQgd2hpY2gg
aGFzIHRoaXMgcHJvYmxlbSwgb3IgKG1vcmUgbGlrZWx5KSBjb21lIHVwIHdpdGggYW4NCj4gZXho
YXVzdGl2ZSBsaXN0PyBUaGFua3MuDQoNCkl0J3MganVzdCBTdG9uZXksIHRoYXQgaXMgdGhlIG9u
bHkgSUQuICBBVFMgaXMgdmFsaWRhdGVkIG9uIHRoZSBvdGhlciBHUFUgcGFydHMuDQoNCkFsZXgN
Cg0KPiANCj4gV2UnbGwgd2FudCB0aGUgc2FtZSBibGFja2xpc3QgaW4gWGVuIHRvbywgd29uJ3Qg
d2U/DQo+IA0KPiA+ID4NCj4gPiA+IC0tLQ0KPiA+ID4gwqBkcml2ZXJzL3BjaS9xdWlya3MuYyB8
IDE5ICsrKysrKysrKysrKysrKysrKysNCj4gPiA+IMKgMSBmaWxlIGNoYW5nZWQsIDE5IGluc2Vy
dGlvbnMoKykNCj4gPiA+DQo+ID4gPiBkaWZmIC0tZ2l0IGEvZHJpdmVycy9wY2kvcXVpcmtzLmMg
Yi9kcml2ZXJzL3BjaS9xdWlya3MuYw0KPiA+ID4gaW5kZXggNjczNjgzNi4uN2NiZTMxNiAxMDA2
NDQNCj4gPiA+IC0tLSBhL2RyaXZlcnMvcGNpL3F1aXJrcy5jDQo+ID4gPiArKysgYi9kcml2ZXJz
L3BjaS9xdWlya3MuYw0KPiA+ID4gQEAgLTQ2MzQsMyArNDYzNCwyMiBAQCBzdGF0aWMgdm9pZCBx
dWlya19ub19hZXJzaWQoc3RydWN0IHBjaV9kZXYNCj4gKnBkZXYpDQo+ID4gPiDCoERFQ0xBUkVf
UENJX0ZJWFVQX0VBUkxZKFBDSV9WRU5ET1JfSURfSU5URUwsIDB4MjAzMSwNCj4gPiA+IHF1aXJr
X25vX2FlcnNpZCk7DQo+ID4gPiDCoERFQ0xBUkVfUENJX0ZJWFVQX0VBUkxZKFBDSV9WRU5ET1Jf
SURfSU5URUwsIDB4MjAzMiwNCj4gPiA+IHF1aXJrX25vX2FlcnNpZCk7DQo+ID4gPiDCoERFQ0xB
UkVfUENJX0ZJWFVQX0VBUkxZKFBDSV9WRU5ET1JfSURfSU5URUwsIDB4MjAzMywNCj4gPiA+IHF1
aXJrX25vX2FlcnNpZCk7DQo+ID4gPiArDQo+ID4gPiArI2lmZGVmIENPTkZJR19QQ0lfQVRTDQo+
ID4gPiArLyoNCj4gPiA+ICsgKiBTb21lIGRldmljZXMgaGF2ZSBhIGJyb2tlbiBBVFMgaW1wbGVt
ZW50YXRpb24gY2F1c2luZyBJT01NVQ0KPiBzdGFsbHMuDQo+ID4gPiArICogRG9uJ3QgdXNlIEFU
UyBmb3IgdGhvc2UgZGV2aWNlcy4NCj4gPiA+ICsgKi8NCj4gPiA+ICtzdGF0aWMgdm9pZCBxdWly
a19kaXNhYmxlX2F0cyhzdHJ1Y3QgcGNpX2RldiAqcGRldikNCj4gPiA+ICt7DQo+ID4gPiArCS8q
DQo+ID4gPiArCcKgKiBTZXQgcGRldi0+YXRzX2NhcCA9IDAgdG8gbWFrZSBwY2lfZW5hYmxlX2F0
cygpIGJhaWwgb3V0DQo+ID4gPiArCcKgKiBlYXJseS4NCj4gPiA+ICsJwqAqLw0KPiA+ID4gKwlk
ZXZfaW5mbygmcGRldi0+ZGV2LCAiUVVJUks6IERpc2FibGluZyBBVFMiKTsNCj4gPiA+ICsJcGRl
di0+YXRzX2NhcCA9IDA7DQo+ID4gPiArfQ0KPiA+ID4gKw0KPiA+ID4gKy8qIEFNRCBTdG9uZXkg
cGxhdGZvcm0gR1BVICovDQo+ID4gPiArREVDTEFSRV9QQ0lfRklYVVBfRklOQUwoUENJX1ZFTkRP
Ul9JRF9BVEksIDB4OThlNCwNCj4gcXVpcmtfZGlzYWJsZV9hdHMpOw0KPiA+ID4gKyNlbmRpZiAv
KiBDT05GSUdfUENJX0FUUyAqLw0KPiA+ID4gLS0NCj4gPiA+IDEuOS4xDQo=

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-05-04 10:21   ` David Woodhouse
@ 2017-05-23 19:54       ` Deucher, Alexander
  2017-05-23 19:54       ` Deucher, Alexander
  1 sibling, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-05-23 19:54 UTC (permalink / raw)
  To: 'David Woodhouse', 'Joerg Roedel', Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Daniel Drake, Samuel Sieb, Joerg Roedel

> -----Original Message-----
> From: David Woodhouse [mailto:dwmw2@infradead.org]
> Sent: Thursday, May 04, 2017 6:22 AM
> To: Deucher, Alexander; 'Joerg Roedel'; Bjorn Helgaas
> Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake;
> Samuel Sieb; Joerg Roedel
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> On Fri, 2017-04-07 at 16:46 +0000, Deucher, Alexander wrote:
> > >
> > > -----Original Message-----
> > > From: Joerg Roedel [mailto:joro@8bytes.org]
> > > Sent: Friday, April 07, 2017 10:32 AM
> > > To: Bjorn Helgaas
> > > Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel
> Drake;
> > > Deucher, Alexander; Samuel Sieb; David Woodhouse; Joerg Roedel
> > > Subject: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> > >
> > > From: Joerg Roedel <jroedel@suse.de>
> > >
> > > ATS is broken on this hardware and causes IOMMU stalls and
> > > system failure. Disable ATS on these devices to make them
> > > usable again with IOMMU enabled.
> > >
> > > Note that the commit in the Fixes-tag is not buggy, it
> > > just uncovers the problem in the hardware by increasing
> > > the ATS-flush rate.
> > >
> > > Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> > > Signed-off-by: Joerg Roedel <jroedel@suse.de>
> > Acked-by: Alex Deucher <alexander.deucher@amd.com>
> 
> Alex, are you able to confirm that it is *only* the device with PCI ID
> 0x98e4 which has this problem, or (more likely) come up with an
> exhaustive list? Thanks.
> 
> We'll want the same blacklist in Xen too, won't we?

I finally got an answer from the hw team and we validated ATS on stoney as well so in theory this patch shouldn’t actually be needed.  I think we may actually be papering over some other issue.  The following patch seems to also fix this issue (and other issues):
https://www.spinics.net/lists/stable/msg172631.html

Alex

> 
> > >
> > > ---
> > >  drivers/pci/quirks.c | 19 +++++++++++++++++++
> > >  1 file changed, 19 insertions(+)
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 6736836..7cbe316 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev
> *pdev)
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031,
> > > quirk_no_aersid);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032,
> > > quirk_no_aersid);
> > >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033,
> > > quirk_no_aersid);
> > > +
> > > +#ifdef CONFIG_PCI_ATS
> > > +/*
> > > + * Some devices have a broken ATS implementation causing IOMMU
> stalls.
> > > + * Don't use ATS for those devices.
> > > + */
> > > +static void quirk_disable_ats(struct pci_dev *pdev)
> > > +{
> > > +	/*
> > > +	 * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
> > > +	 * early.
> > > +	 */
> > > +	dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> > > +	pdev->ats_cap = 0;
> > > +}
> > > +
> > > +/* AMD Stoney platform GPU */
> > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4,
> quirk_disable_ats);
> > > +#endif /* CONFIG_PCI_ATS */
> > > --
> > > 1.9.1

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
@ 2017-05-23 19:54       ` Deucher, Alexander
  0 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-05-23 19:54 UTC (permalink / raw)
  To: 'David Woodhouse', 'Joerg Roedel', Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Daniel Drake, Samuel Sieb, Joerg Roedel

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBEYXZpZCBXb29kaG91c2UgW21h
aWx0bzpkd213MkBpbmZyYWRlYWQub3JnXQ0KPiBTZW50OiBUaHVyc2RheSwgTWF5IDA0LCAyMDE3
IDY6MjIgQU0NCj4gVG86IERldWNoZXIsIEFsZXhhbmRlcjsgJ0pvZXJnIFJvZWRlbCc7IEJqb3Ju
IEhlbGdhYXMNCj4gQ2M6IGxpbnV4LXBjaUB2Z2VyLmtlcm5lbC5vcmc7IGxpbnV4LWtlcm5lbEB2
Z2VyLmtlcm5lbC5vcmc7IERhbmllbCBEcmFrZTsNCj4gU2FtdWVsIFNpZWI7IEpvZXJnIFJvZWRl
bA0KPiBTdWJqZWN0OiBSZTogW1BBVENIIHYyXSBQQ0k6IEFkZCBBVFMtZGlzYWJsZSBxdWlyayBm
b3IgQU1EIFN0b25leSBHUFVzDQo+IA0KPiBPbiBGcmksIDIwMTctMDQtMDcgYXQgMTY6NDYgKzAw
MDAsIERldWNoZXIsIEFsZXhhbmRlciB3cm90ZToNCj4gPiA+DQo+ID4gPiAtLS0tLU9yaWdpbmFs
IE1lc3NhZ2UtLS0tLQ0KPiA+ID4gRnJvbTogSm9lcmcgUm9lZGVsIFttYWlsdG86am9yb0A4Ynl0
ZXMub3JnXQ0KPiA+ID4gU2VudDogRnJpZGF5LCBBcHJpbCAwNywgMjAxNyAxMDozMiBBTQ0KPiA+
ID4gVG86IEJqb3JuIEhlbGdhYXMNCj4gPiA+IENjOiBsaW51eC1wY2lAdmdlci5rZXJuZWwub3Jn
OyBsaW51eC1rZXJuZWxAdmdlci5rZXJuZWwub3JnOyBEYW5pZWwNCj4gRHJha2U7DQo+ID4gPiBE
ZXVjaGVyLCBBbGV4YW5kZXI7IFNhbXVlbCBTaWViOyBEYXZpZCBXb29kaG91c2U7IEpvZXJnIFJv
ZWRlbA0KPiA+ID4gU3ViamVjdDogW1BBVENIIHYyXSBQQ0k6IEFkZCBBVFMtZGlzYWJsZSBxdWly
ayBmb3IgQU1EIFN0b25leSBHUFVzDQo+ID4gPg0KPiA+ID4gRnJvbTogSm9lcmcgUm9lZGVsIDxq
cm9lZGVsQHN1c2UuZGU+DQo+ID4gPg0KPiA+ID4gQVRTIGlzIGJyb2tlbiBvbiB0aGlzIGhhcmR3
YXJlIGFuZCBjYXVzZXMgSU9NTVUgc3RhbGxzIGFuZA0KPiA+ID4gc3lzdGVtIGZhaWx1cmUuIERp
c2FibGUgQVRTIG9uIHRoZXNlIGRldmljZXMgdG8gbWFrZSB0aGVtDQo+ID4gPiB1c2FibGUgYWdh
aW4gd2l0aCBJT01NVSBlbmFibGVkLg0KPiA+ID4NCj4gPiA+IE5vdGUgdGhhdCB0aGUgY29tbWl0
IGluIHRoZSBGaXhlcy10YWcgaXMgbm90IGJ1Z2d5LCBpdA0KPiA+ID4ganVzdCB1bmNvdmVycyB0
aGUgcHJvYmxlbSBpbiB0aGUgaGFyZHdhcmUgYnkgaW5jcmVhc2luZw0KPiA+ID4gdGhlIEFUUy1m
bHVzaCByYXRlLg0KPiA+ID4NCj4gPiA+IEZpeGVzOiBiMTUxNmExNDY1N2EgKCdpb21tdS9hbWQ6
IEltcGxlbWVudCBmbHVzaCBxdWV1ZScpDQo+ID4gPiBTaWduZWQtb2ZmLWJ5OiBKb2VyZyBSb2Vk
ZWwgPGpyb2VkZWxAc3VzZS5kZT4NCj4gPiBBY2tlZC1ieTogQWxleCBEZXVjaGVyIDxhbGV4YW5k
ZXIuZGV1Y2hlckBhbWQuY29tPg0KPiANCj4gQWxleCwgYXJlIHlvdSBhYmxlIHRvIGNvbmZpcm0g
dGhhdCBpdCBpcyAqb25seSogdGhlIGRldmljZSB3aXRoIFBDSSBJRA0KPiAweDk4ZTQgd2hpY2gg
aGFzIHRoaXMgcHJvYmxlbSwgb3IgKG1vcmUgbGlrZWx5KSBjb21lIHVwIHdpdGggYW4NCj4gZXho
YXVzdGl2ZSBsaXN0PyBUaGFua3MuDQo+IA0KPiBXZSdsbCB3YW50IHRoZSBzYW1lIGJsYWNrbGlz
dCBpbiBYZW4gdG9vLCB3b24ndCB3ZT8NCg0KSSBmaW5hbGx5IGdvdCBhbiBhbnN3ZXIgZnJvbSB0
aGUgaHcgdGVhbSBhbmQgd2UgdmFsaWRhdGVkIEFUUyBvbiBzdG9uZXkgYXMgd2VsbCBzbyBpbiB0
aGVvcnkgdGhpcyBwYXRjaCBzaG91bGRu4oCZdCBhY3R1YWxseSBiZSBuZWVkZWQuICBJIHRoaW5r
IHdlIG1heSBhY3R1YWxseSBiZSBwYXBlcmluZyBvdmVyIHNvbWUgb3RoZXIgaXNzdWUuICBUaGUg
Zm9sbG93aW5nIHBhdGNoIHNlZW1zIHRvIGFsc28gZml4IHRoaXMgaXNzdWUgKGFuZCBvdGhlciBp
c3N1ZXMpOg0KaHR0cHM6Ly93d3cuc3Bpbmljcy5uZXQvbGlzdHMvc3RhYmxlL21zZzE3MjYzMS5o
dG1sDQoNCkFsZXgNCg0KPiANCj4gPiA+DQo+ID4gPiAtLS0NCj4gPiA+IMKgZHJpdmVycy9wY2kv
cXVpcmtzLmMgfCAxOSArKysrKysrKysrKysrKysrKysrDQo+ID4gPiDCoDEgZmlsZSBjaGFuZ2Vk
LCAxOSBpbnNlcnRpb25zKCspDQo+ID4gPg0KPiA+ID4gZGlmZiAtLWdpdCBhL2RyaXZlcnMvcGNp
L3F1aXJrcy5jIGIvZHJpdmVycy9wY2kvcXVpcmtzLmMNCj4gPiA+IGluZGV4IDY3MzY4MzYuLjdj
YmUzMTYgMTAwNjQ0DQo+ID4gPiAtLS0gYS9kcml2ZXJzL3BjaS9xdWlya3MuYw0KPiA+ID4gKysr
IGIvZHJpdmVycy9wY2kvcXVpcmtzLmMNCj4gPiA+IEBAIC00NjM0LDMgKzQ2MzQsMjIgQEAgc3Rh
dGljIHZvaWQgcXVpcmtfbm9fYWVyc2lkKHN0cnVjdCBwY2lfZGV2DQo+ICpwZGV2KQ0KPiA+ID4g
wqBERUNMQVJFX1BDSV9GSVhVUF9FQVJMWShQQ0lfVkVORE9SX0lEX0lOVEVMLCAweDIwMzEsDQo+
ID4gPiBxdWlya19ub19hZXJzaWQpOw0KPiA+ID4gwqBERUNMQVJFX1BDSV9GSVhVUF9FQVJMWShQ
Q0lfVkVORE9SX0lEX0lOVEVMLCAweDIwMzIsDQo+ID4gPiBxdWlya19ub19hZXJzaWQpOw0KPiA+
ID4gwqBERUNMQVJFX1BDSV9GSVhVUF9FQVJMWShQQ0lfVkVORE9SX0lEX0lOVEVMLCAweDIwMzMs
DQo+ID4gPiBxdWlya19ub19hZXJzaWQpOw0KPiA+ID4gKw0KPiA+ID4gKyNpZmRlZiBDT05GSUdf
UENJX0FUUw0KPiA+ID4gKy8qDQo+ID4gPiArICogU29tZSBkZXZpY2VzIGhhdmUgYSBicm9rZW4g
QVRTIGltcGxlbWVudGF0aW9uIGNhdXNpbmcgSU9NTVUNCj4gc3RhbGxzLg0KPiA+ID4gKyAqIERv
bid0IHVzZSBBVFMgZm9yIHRob3NlIGRldmljZXMuDQo+ID4gPiArICovDQo+ID4gPiArc3RhdGlj
IHZvaWQgcXVpcmtfZGlzYWJsZV9hdHMoc3RydWN0IHBjaV9kZXYgKnBkZXYpDQo+ID4gPiArew0K
PiA+ID4gKwkvKg0KPiA+ID4gKwnCoCogU2V0IHBkZXYtPmF0c19jYXAgPSAwIHRvIG1ha2UgcGNp
X2VuYWJsZV9hdHMoKSBiYWlsIG91dA0KPiA+ID4gKwnCoCogZWFybHkuDQo+ID4gPiArCcKgKi8N
Cj4gPiA+ICsJZGV2X2luZm8oJnBkZXYtPmRldiwgIlFVSVJLOiBEaXNhYmxpbmcgQVRTIik7DQo+
ID4gPiArCXBkZXYtPmF0c19jYXAgPSAwOw0KPiA+ID4gK30NCj4gPiA+ICsNCj4gPiA+ICsvKiBB
TUQgU3RvbmV5IHBsYXRmb3JtIEdQVSAqLw0KPiA+ID4gK0RFQ0xBUkVfUENJX0ZJWFVQX0ZJTkFM
KFBDSV9WRU5ET1JfSURfQVRJLCAweDk4ZTQsDQo+IHF1aXJrX2Rpc2FibGVfYXRzKTsNCj4gPiA+
ICsjZW5kaWYgLyogQ09ORklHX1BDSV9BVFMgKi8NCj4gPiA+IC0tDQo+ID4gPiAxLjkuMQ0K

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-05-23 19:54       ` Deucher, Alexander
  (?)
@ 2017-05-24  8:44       ` Joerg Roedel
  2017-05-24 10:38         ` David Woodhouse
                           ` (2 more replies)
  -1 siblings, 3 replies; 35+ messages in thread
From: Joerg Roedel @ 2017-05-24  8:44 UTC (permalink / raw)
  To: Deucher, Alexander
  Cc: 'David Woodhouse', 'Joerg Roedel',
	Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Samuel Sieb

Hi Alexander,

On Tue, May 23, 2017 at 07:54:12PM +0000, Deucher, Alexander wrote:
> I finally got an answer from the hw team and we validated ATS on
> stoney as well so in theory this patch shouldn’t actually be needed.
> I think we may actually be papering over some other issue.  The
> following patch seems to also fix this issue (and other issues):
> https://www.spinics.net/lists/stable/msg172631.html

Yeah, but it still looks to me like that the hardware got into some
weird state with the storm of ATS invalidations sent to it.

The Completion-Wait loop timeouts seen in the original bug report
indicate that the IOMMU is waiting for a response that never comes. And
this is probably the ATS flush completion response from the GPU, as
disabling ATS on the GPU makes the issue disappear.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-05-24  8:44       ` Joerg Roedel
@ 2017-05-24 10:38         ` David Woodhouse
  2017-05-24 12:56           ` Deucher, Alexander
  2017-05-26 11:57           ` Deucher, Alexander
  2 siblings, 0 replies; 35+ messages in thread
From: David Woodhouse @ 2017-05-24 10:38 UTC (permalink / raw)
  To: Joerg Roedel, Deucher, Alexander
  Cc: 'Joerg Roedel',
	Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Samuel Sieb

[-- Attachment #1: Type: text/plain, Size: 1326 bytes --]

On Wed, 2017-05-24 at 10:44 +0200, Joerg Roedel wrote:
> Hi Alexander,
> 
> On Tue, May 23, 2017 at 07:54:12PM +0000, Deucher, Alexander wrote:
> > 
> > I finally got an answer from the hw team and we validated ATS on
> > stoney as well so in theory this patch shouldn’t actually be needed.
> > I think we may actually be papering over some other issue.  The
> > following patch seems to also fix this issue (and other issues):
> > https://www.spinics.net/lists/stable/msg172631.html
>
> Yeah, but it still looks to me like that the hardware got into some
> weird state with the storm of ATS invalidations sent to it.
> 
> The Completion-Wait loop timeouts seen in the original bug report
> indicate that the IOMMU is waiting for a response that never comes. And
> this is probably the ATS flush completion response from the GPU, as
> disabling ATS on the GPU makes the issue disappear.

The above patch doesn't actually fix any spec violation which could the
GPU an *excuse* to crash and stop responding to invalidations, does it?
It just seems to reduce the invalidation load a little, and thus paper
over the problem that the card tends to crash under load. Absent a more
coherent explanation, it still seems like the correct answer is to
blacklist these devices for ATS because they're broken.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-05-24  8:44       ` Joerg Roedel
@ 2017-05-24 12:56           ` Deucher, Alexander
  2017-05-24 12:56           ` Deucher, Alexander
  2017-05-26 11:57           ` Deucher, Alexander
  2 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-05-24 12:56 UTC (permalink / raw)
  To: 'Joerg Roedel'
  Cc: 'David Woodhouse', 'Joerg Roedel',
	Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Samuel Sieb

> -----Original Message-----
> From: Joerg Roedel [mailto:jroedel@suse.de]
> Sent: Wednesday, May 24, 2017 4:45 AM
> To: Deucher, Alexander
> Cc: 'David Woodhouse'; 'Joerg Roedel'; Bjorn Helgaas; linux-
> pci@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake; Samuel
> Sieb
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> Hi Alexander,
> 
> On Tue, May 23, 2017 at 07:54:12PM +0000, Deucher, Alexander wrote:
> > I finally got an answer from the hw team and we validated ATS on
> > stoney as well so in theory this patch shouldn’t actually be needed.
> > I think we may actually be papering over some other issue.  The
> > following patch seems to also fix this issue (and other issues):
> > https://www.spinics.net/lists/stable/msg172631.html
> 
> Yeah, but it still looks to me like that the hardware got into some
> weird state with the storm of ATS invalidations sent to it.
> 
> The Completion-Wait loop timeouts seen in the original bug report
> indicate that the IOMMU is waiting for a response that never comes. And
> this is probably the ATS flush completion response from the GPU, as
> disabling ATS on the GPU makes the issue disappear.

Yeah, it's weird.  My ack on the patch still stands.  Just adding some additional data.

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
@ 2017-05-24 12:56           ` Deucher, Alexander
  0 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-05-24 12:56 UTC (permalink / raw)
  To: 'Joerg Roedel'
  Cc: 'David Woodhouse', 'Joerg Roedel',
	Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Samuel Sieb

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBKb2VyZyBSb2VkZWwgW21haWx0
bzpqcm9lZGVsQHN1c2UuZGVdDQo+IFNlbnQ6IFdlZG5lc2RheSwgTWF5IDI0LCAyMDE3IDQ6NDUg
QU0NCj4gVG86IERldWNoZXIsIEFsZXhhbmRlcg0KPiBDYzogJ0RhdmlkIFdvb2Rob3VzZSc7ICdK
b2VyZyBSb2VkZWwnOyBCam9ybiBIZWxnYWFzOyBsaW51eC0NCj4gcGNpQHZnZXIua2VybmVsLm9y
ZzsgbGludXgta2VybmVsQHZnZXIua2VybmVsLm9yZzsgRGFuaWVsIERyYWtlOyBTYW11ZWwNCj4g
U2llYg0KPiBTdWJqZWN0OiBSZTogW1BBVENIIHYyXSBQQ0k6IEFkZCBBVFMtZGlzYWJsZSBxdWly
ayBmb3IgQU1EIFN0b25leSBHUFVzDQo+IA0KPiBIaSBBbGV4YW5kZXIsDQo+IA0KPiBPbiBUdWUs
IE1heSAyMywgMjAxNyBhdCAwNzo1NDoxMlBNICswMDAwLCBEZXVjaGVyLCBBbGV4YW5kZXIgd3Jv
dGU6DQo+ID4gSSBmaW5hbGx5IGdvdCBhbiBhbnN3ZXIgZnJvbSB0aGUgaHcgdGVhbSBhbmQgd2Ug
dmFsaWRhdGVkIEFUUyBvbg0KPiA+IHN0b25leSBhcyB3ZWxsIHNvIGluIHRoZW9yeSB0aGlzIHBh
dGNoIHNob3VsZG7igJl0IGFjdHVhbGx5IGJlIG5lZWRlZC4NCj4gPiBJIHRoaW5rIHdlIG1heSBh
Y3R1YWxseSBiZSBwYXBlcmluZyBvdmVyIHNvbWUgb3RoZXIgaXNzdWUuICBUaGUNCj4gPiBmb2xs
b3dpbmcgcGF0Y2ggc2VlbXMgdG8gYWxzbyBmaXggdGhpcyBpc3N1ZSAoYW5kIG90aGVyIGlzc3Vl
cyk6DQo+ID4gaHR0cHM6Ly93d3cuc3Bpbmljcy5uZXQvbGlzdHMvc3RhYmxlL21zZzE3MjYzMS5o
dG1sDQo+IA0KPiBZZWFoLCBidXQgaXQgc3RpbGwgbG9va3MgdG8gbWUgbGlrZSB0aGF0IHRoZSBo
YXJkd2FyZSBnb3QgaW50byBzb21lDQo+IHdlaXJkIHN0YXRlIHdpdGggdGhlIHN0b3JtIG9mIEFU
UyBpbnZhbGlkYXRpb25zIHNlbnQgdG8gaXQuDQo+IA0KPiBUaGUgQ29tcGxldGlvbi1XYWl0IGxv
b3AgdGltZW91dHMgc2VlbiBpbiB0aGUgb3JpZ2luYWwgYnVnIHJlcG9ydA0KPiBpbmRpY2F0ZSB0
aGF0IHRoZSBJT01NVSBpcyB3YWl0aW5nIGZvciBhIHJlc3BvbnNlIHRoYXQgbmV2ZXIgY29tZXMu
IEFuZA0KPiB0aGlzIGlzIHByb2JhYmx5IHRoZSBBVFMgZmx1c2ggY29tcGxldGlvbiByZXNwb25z
ZSBmcm9tIHRoZSBHUFUsIGFzDQo+IGRpc2FibGluZyBBVFMgb24gdGhlIEdQVSBtYWtlcyB0aGUg
aXNzdWUgZGlzYXBwZWFyLg0KDQpZZWFoLCBpdCdzIHdlaXJkLiAgTXkgYWNrIG9uIHRoZSBwYXRj
aCBzdGlsbCBzdGFuZHMuICBKdXN0IGFkZGluZyBzb21lIGFkZGl0aW9uYWwgZGF0YS4NCg0KQWxl
eA0KDQo=

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-05-24 12:56           ` Deucher, Alexander
  (?)
@ 2017-05-26  6:48           ` Samuel Sieb
  -1 siblings, 0 replies; 35+ messages in thread
From: Samuel Sieb @ 2017-05-26  6:48 UTC (permalink / raw)
  To: Deucher, Alexander, 'Joerg Roedel'
  Cc: 'David Woodhouse', 'Joerg Roedel',
	Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake

On 05/24/2017 05:56 AM, Deucher, Alexander wrote:
>> -----Original Message-----
>> From: Joerg Roedel [mailto:jroedel@suse.de]
>> Sent: Wednesday, May 24, 2017 4:45 AM
>> To: Deucher, Alexander
>> Cc: 'David Woodhouse'; 'Joerg Roedel'; Bjorn Helgaas; linux-
>> pci@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake; Samuel
>> Sieb
>> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
>>
>> Hi Alexander,
>>
>> On Tue, May 23, 2017 at 07:54:12PM +0000, Deucher, Alexander wrote:
>>> I finally got an answer from the hw team and we validated ATS on
>>> stoney as well so in theory this patch shouldn’t actually be needed.
>>> I think we may actually be papering over some other issue.  The
>>> following patch seems to also fix this issue (and other issues):
>>> https://www.spinics.net/lists/stable/msg172631.html
>>
>> Yeah, but it still looks to me like that the hardware got into some
>> weird state with the storm of ATS invalidations sent to it.
>>
>> The Completion-Wait loop timeouts seen in the original bug report
>> indicate that the IOMMU is waiting for a response that never comes. And
>> this is probably the ATS flush completion response from the GPU, as
>> disabling ATS on the GPU makes the issue disappear.
> 
> Yeah, it's weird.  My ack on the patch still stands.  Just adding some additional data.
> 

I just tested this patch without the previous ATS disabling patch (I 
verified that ATS was enabled).  Doing a stress-test kernel build while 
running a 3D graphical application caused no disk corruption.  That was 
running for several hours.  If it's not the solution, it sure hides the 
problem really well.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-05-24  8:44       ` Joerg Roedel
@ 2017-05-26 11:57           ` Deucher, Alexander
  2017-05-24 12:56           ` Deucher, Alexander
  2017-05-26 11:57           ` Deucher, Alexander
  2 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-05-26 11:57 UTC (permalink / raw)
  To: 'Joerg Roedel'
  Cc: 'David Woodhouse', 'Joerg Roedel',
	Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Samuel Sieb

> -----Original Message-----
> From: Joerg Roedel [mailto:jroedel@suse.de]
> Sent: Wednesday, May 24, 2017 4:45 AM
> To: Deucher, Alexander
> Cc: 'David Woodhouse'; 'Joerg Roedel'; Bjorn Helgaas; linux-
> pci@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake; Samuel
> Sieb
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> Hi Alexander,
> 
> On Tue, May 23, 2017 at 07:54:12PM +0000, Deucher, Alexander wrote:
> > I finally got an answer from the hw team and we validated ATS on
> > stoney as well so in theory this patch shouldn’t actually be needed.
> > I think we may actually be papering over some other issue.  The
> > following patch seems to also fix this issue (and other issues):
> > https://www.spinics.net/lists/stable/msg172631.html
> 
> Yeah, but it still looks to me like that the hardware got into some
> weird state with the storm of ATS invalidations sent to it.
> 
> The Completion-Wait loop timeouts seen in the original bug report
> indicate that the IOMMU is waiting for a response that never comes. And
> this is probably the ATS flush completion response from the GPU, as
> disabling ATS on the GPU makes the issue disappear.

FWIW, the GPU driver does not actually use ATS at the moment so I don't think we should see any ATS transactions.

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
@ 2017-05-26 11:57           ` Deucher, Alexander
  0 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-05-26 11:57 UTC (permalink / raw)
  To: 'Joerg Roedel'
  Cc: 'David Woodhouse', 'Joerg Roedel',
	Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Samuel Sieb

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBKb2VyZyBSb2VkZWwgW21haWx0
bzpqcm9lZGVsQHN1c2UuZGVdDQo+IFNlbnQ6IFdlZG5lc2RheSwgTWF5IDI0LCAyMDE3IDQ6NDUg
QU0NCj4gVG86IERldWNoZXIsIEFsZXhhbmRlcg0KPiBDYzogJ0RhdmlkIFdvb2Rob3VzZSc7ICdK
b2VyZyBSb2VkZWwnOyBCam9ybiBIZWxnYWFzOyBsaW51eC0NCj4gcGNpQHZnZXIua2VybmVsLm9y
ZzsgbGludXgta2VybmVsQHZnZXIua2VybmVsLm9yZzsgRGFuaWVsIERyYWtlOyBTYW11ZWwNCj4g
U2llYg0KPiBTdWJqZWN0OiBSZTogW1BBVENIIHYyXSBQQ0k6IEFkZCBBVFMtZGlzYWJsZSBxdWly
ayBmb3IgQU1EIFN0b25leSBHUFVzDQo+IA0KPiBIaSBBbGV4YW5kZXIsDQo+IA0KPiBPbiBUdWUs
IE1heSAyMywgMjAxNyBhdCAwNzo1NDoxMlBNICswMDAwLCBEZXVjaGVyLCBBbGV4YW5kZXIgd3Jv
dGU6DQo+ID4gSSBmaW5hbGx5IGdvdCBhbiBhbnN3ZXIgZnJvbSB0aGUgaHcgdGVhbSBhbmQgd2Ug
dmFsaWRhdGVkIEFUUyBvbg0KPiA+IHN0b25leSBhcyB3ZWxsIHNvIGluIHRoZW9yeSB0aGlzIHBh
dGNoIHNob3VsZG7igJl0IGFjdHVhbGx5IGJlIG5lZWRlZC4NCj4gPiBJIHRoaW5rIHdlIG1heSBh
Y3R1YWxseSBiZSBwYXBlcmluZyBvdmVyIHNvbWUgb3RoZXIgaXNzdWUuICBUaGUNCj4gPiBmb2xs
b3dpbmcgcGF0Y2ggc2VlbXMgdG8gYWxzbyBmaXggdGhpcyBpc3N1ZSAoYW5kIG90aGVyIGlzc3Vl
cyk6DQo+ID4gaHR0cHM6Ly93d3cuc3Bpbmljcy5uZXQvbGlzdHMvc3RhYmxlL21zZzE3MjYzMS5o
dG1sDQo+IA0KPiBZZWFoLCBidXQgaXQgc3RpbGwgbG9va3MgdG8gbWUgbGlrZSB0aGF0IHRoZSBo
YXJkd2FyZSBnb3QgaW50byBzb21lDQo+IHdlaXJkIHN0YXRlIHdpdGggdGhlIHN0b3JtIG9mIEFU
UyBpbnZhbGlkYXRpb25zIHNlbnQgdG8gaXQuDQo+IA0KPiBUaGUgQ29tcGxldGlvbi1XYWl0IGxv
b3AgdGltZW91dHMgc2VlbiBpbiB0aGUgb3JpZ2luYWwgYnVnIHJlcG9ydA0KPiBpbmRpY2F0ZSB0
aGF0IHRoZSBJT01NVSBpcyB3YWl0aW5nIGZvciBhIHJlc3BvbnNlIHRoYXQgbmV2ZXIgY29tZXMu
IEFuZA0KPiB0aGlzIGlzIHByb2JhYmx5IHRoZSBBVFMgZmx1c2ggY29tcGxldGlvbiByZXNwb25z
ZSBmcm9tIHRoZSBHUFUsIGFzDQo+IGRpc2FibGluZyBBVFMgb24gdGhlIEdQVSBtYWtlcyB0aGUg
aXNzdWUgZGlzYXBwZWFyLg0KDQpGV0lXLCB0aGUgR1BVIGRyaXZlciBkb2VzIG5vdCBhY3R1YWxs
eSB1c2UgQVRTIGF0IHRoZSBtb21lbnQgc28gSSBkb24ndCB0aGluayB3ZSBzaG91bGQgc2VlIGFu
eSBBVFMgdHJhbnNhY3Rpb25zLg0KDQpBbGV4DQoNCg==

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-05-26 11:57           ` Deucher, Alexander
  (?)
@ 2017-05-26 12:54           ` David Woodhouse
  2017-05-26 15:59               ` Deucher, Alexander
  -1 siblings, 1 reply; 35+ messages in thread
From: David Woodhouse @ 2017-05-26 12:54 UTC (permalink / raw)
  To: Deucher, Alexander, 'Joerg Roedel'
  Cc: 'Joerg Roedel',
	Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Samuel Sieb

[-- Attachment #1: Type: text/plain, Size: 885 bytes --]

On Fri, 2017-05-26 at 11:57 +0000, Deucher, Alexander wrote:
> 
> FWIW, the GPU driver does not actually use ATS at the moment so I
> don't think we should see any ATS transactions.

That's a confusing sentence. The "GPU driver", if you mean software
running in the OS, wouldn't be expected to have anything to do with
ATS.

ATS is something that the CPU itself (or its DMA engine) would do.
Instead of just performing a DMA transaction to a given bus address,
and letting the IOMMU do the translation, the hardware might choose to
first perform an IOTLB lookup, and then later do the actual DMA
transaction to the pre-translated, raw physical address. Which kind of
makes a mockery of any kind of protection the IOMMU is supposed to give
you, but does shave a cycle or two of latency off the DMA when it
finally happens, since the translation can be done in advance.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-05-26 12:54           ` David Woodhouse
@ 2017-05-26 15:59               ` Deucher, Alexander
  0 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-05-26 15:59 UTC (permalink / raw)
  To: 'David Woodhouse', 'Joerg Roedel',
	Bridgman, John, Suthikulpanit, Suravee
  Cc: 'Joerg Roedel',
	Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Samuel Sieb

> -----Original Message-----
> From: David Woodhouse [mailto:dwmw2@infradead.org]
> Sent: Friday, May 26, 2017 8:55 AM
> To: Deucher, Alexander; 'Joerg Roedel'
> Cc: 'Joerg Roedel'; Bjorn Helgaas; linux-pci@vger.kernel.org; linux-
> kernel@vger.kernel.org; Daniel Drake; Samuel Sieb
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> On Fri, 2017-05-26 at 11:57 +0000, Deucher, Alexander wrote:
> >
> > FWIW, the GPU driver does not actually use ATS at the moment so I
> > don't think we should see any ATS transactions.
> 
> That's a confusing sentence. The "GPU driver", if you mean software
> running in the OS, wouldn't be expected to have anything to do with
> ATS.
> 
> ATS is something that the CPU itself (or its DMA engine) would do.
> Instead of just performing a DMA transaction to a given bus address,
> and letting the IOMMU do the translation, the hardware might choose to
> first perform an IOTLB lookup, and then later do the actual DMA
> transaction to the pre-translated, raw physical address. Which kind of
> makes a mockery of any kind of protection the IOMMU is supposed to give
> you, but does shave a cycle or two of latency off the DMA when it
> finally happens, since the translation can be done in advance.

+ John, Suravee

Full disclosure, I'm not by any means an expert with ATS.  I guess I'm thinking of PRI support rather than ATS per se.  On the GPU side the GPU's memory controller has multiple paths to system memory, the non-ATS/PRI path and the ATS/PRI path.  The GPU has its own integrated MMU to virtualize the GPU's internal address space per GPU client.  The non-ATS/PRI path uses the GPU's MMU and is just "regular" dma to addresses potentially translated by the IOMMU just like any other device that may not have ATS support.  The system memory has to be resident because if the GPU faults, it can't retry the transaction.  For the ATS/PRI path, the GPU's MMU is bypassed and PASIDs need to be setup on the IOMMU for each client, but once done, transactions that use that interface support retries on GPU page faults (after the OS had paged the memory in and the IOMMU tables been updated) and other features.  I think only the ATS/PRI case uses the ATC on the end point.  John, Suravee, correct me if I'm wrong.

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
@ 2017-05-26 15:59               ` Deucher, Alexander
  0 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-05-26 15:59 UTC (permalink / raw)
  To: 'David Woodhouse', 'Joerg Roedel',
	Bridgman, John, Suthikulpanit, Suravee
  Cc: 'Joerg Roedel',
	Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Samuel Sieb

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBEYXZpZCBXb29kaG91c2UgW21h
aWx0bzpkd213MkBpbmZyYWRlYWQub3JnXQ0KPiBTZW50OiBGcmlkYXksIE1heSAyNiwgMjAxNyA4
OjU1IEFNDQo+IFRvOiBEZXVjaGVyLCBBbGV4YW5kZXI7ICdKb2VyZyBSb2VkZWwnDQo+IENjOiAn
Sm9lcmcgUm9lZGVsJzsgQmpvcm4gSGVsZ2FhczsgbGludXgtcGNpQHZnZXIua2VybmVsLm9yZzsg
bGludXgtDQo+IGtlcm5lbEB2Z2VyLmtlcm5lbC5vcmc7IERhbmllbCBEcmFrZTsgU2FtdWVsIFNp
ZWINCj4gU3ViamVjdDogUmU6IFtQQVRDSCB2Ml0gUENJOiBBZGQgQVRTLWRpc2FibGUgcXVpcmsg
Zm9yIEFNRCBTdG9uZXkgR1BVcw0KPiANCj4gT24gRnJpLCAyMDE3LTA1LTI2IGF0IDExOjU3ICsw
MDAwLCBEZXVjaGVyLCBBbGV4YW5kZXIgd3JvdGU6DQo+ID4NCj4gPiBGV0lXLCB0aGUgR1BVIGRy
aXZlciBkb2VzIG5vdCBhY3R1YWxseSB1c2UgQVRTIGF0IHRoZSBtb21lbnQgc28gSQ0KPiA+IGRv
bid0IHRoaW5rIHdlIHNob3VsZCBzZWUgYW55IEFUUyB0cmFuc2FjdGlvbnMuDQo+IA0KPiBUaGF0
J3MgYSBjb25mdXNpbmcgc2VudGVuY2UuIFRoZSAiR1BVIGRyaXZlciIsIGlmIHlvdSBtZWFuIHNv
ZnR3YXJlDQo+IHJ1bm5pbmcgaW4gdGhlIE9TLCB3b3VsZG4ndCBiZSBleHBlY3RlZCB0byBoYXZl
IGFueXRoaW5nIHRvIGRvIHdpdGgNCj4gQVRTLg0KPiANCj4gQVRTIGlzIHNvbWV0aGluZyB0aGF0
IHRoZSBDUFUgaXRzZWxmIChvciBpdHMgRE1BIGVuZ2luZSkgd291bGQgZG8uDQo+IEluc3RlYWQg
b2YganVzdCBwZXJmb3JtaW5nIGEgRE1BIHRyYW5zYWN0aW9uIHRvIGEgZ2l2ZW4gYnVzIGFkZHJl
c3MsDQo+IGFuZCBsZXR0aW5nIHRoZSBJT01NVSBkbyB0aGUgdHJhbnNsYXRpb24sIHRoZSBoYXJk
d2FyZSBtaWdodCBjaG9vc2UgdG8NCj4gZmlyc3QgcGVyZm9ybSBhbiBJT1RMQiBsb29rdXAsIGFu
ZCB0aGVuIGxhdGVyIGRvIHRoZSBhY3R1YWwgRE1BDQo+IHRyYW5zYWN0aW9uIHRvIHRoZSBwcmUt
dHJhbnNsYXRlZCwgcmF3IHBoeXNpY2FsIGFkZHJlc3MuIFdoaWNoIGtpbmQgb2YNCj4gbWFrZXMg
YSBtb2NrZXJ5IG9mIGFueSBraW5kIG9mIHByb3RlY3Rpb24gdGhlIElPTU1VIGlzIHN1cHBvc2Vk
IHRvIGdpdmUNCj4geW91LCBidXQgZG9lcyBzaGF2ZSBhIGN5Y2xlIG9yIHR3byBvZiBsYXRlbmN5
IG9mZiB0aGUgRE1BIHdoZW4gaXQNCj4gZmluYWxseSBoYXBwZW5zLCBzaW5jZSB0aGUgdHJhbnNs
YXRpb24gY2FuIGJlIGRvbmUgaW4gYWR2YW5jZS4NCg0KKyBKb2huLCBTdXJhdmVlDQoNCkZ1bGwg
ZGlzY2xvc3VyZSwgSSdtIG5vdCBieSBhbnkgbWVhbnMgYW4gZXhwZXJ0IHdpdGggQVRTLiAgSSBn
dWVzcyBJJ20gdGhpbmtpbmcgb2YgUFJJIHN1cHBvcnQgcmF0aGVyIHRoYW4gQVRTIHBlciBzZS4g
IE9uIHRoZSBHUFUgc2lkZSB0aGUgR1BVJ3MgbWVtb3J5IGNvbnRyb2xsZXIgaGFzIG11bHRpcGxl
IHBhdGhzIHRvIHN5c3RlbSBtZW1vcnksIHRoZSBub24tQVRTL1BSSSBwYXRoIGFuZCB0aGUgQVRT
L1BSSSBwYXRoLiAgVGhlIEdQVSBoYXMgaXRzIG93biBpbnRlZ3JhdGVkIE1NVSB0byB2aXJ0dWFs
aXplIHRoZSBHUFUncyBpbnRlcm5hbCBhZGRyZXNzIHNwYWNlIHBlciBHUFUgY2xpZW50LiAgVGhl
IG5vbi1BVFMvUFJJIHBhdGggdXNlcyB0aGUgR1BVJ3MgTU1VIGFuZCBpcyBqdXN0ICJyZWd1bGFy
IiBkbWEgdG8gYWRkcmVzc2VzIHBvdGVudGlhbGx5IHRyYW5zbGF0ZWQgYnkgdGhlIElPTU1VIGp1
c3QgbGlrZSBhbnkgb3RoZXIgZGV2aWNlIHRoYXQgbWF5IG5vdCBoYXZlIEFUUyBzdXBwb3J0LiAg
VGhlIHN5c3RlbSBtZW1vcnkgaGFzIHRvIGJlIHJlc2lkZW50IGJlY2F1c2UgaWYgdGhlIEdQVSBm
YXVsdHMsIGl0IGNhbid0IHJldHJ5IHRoZSB0cmFuc2FjdGlvbi4gIEZvciB0aGUgQVRTL1BSSSBw
YXRoLCB0aGUgR1BVJ3MgTU1VIGlzIGJ5cGFzc2VkIGFuZCBQQVNJRHMgbmVlZCB0byBiZSBzZXR1
cCBvbiB0aGUgSU9NTVUgZm9yIGVhY2ggY2xpZW50LCBidXQgb25jZSBkb25lLCB0cmFuc2FjdGlv
bnMgdGhhdCB1c2UgdGhhdCBpbnRlcmZhY2Ugc3VwcG9ydCByZXRyaWVzIG9uIEdQVSBwYWdlIGZh
dWx0cyAoYWZ0ZXIgdGhlIE9TIGhhZCBwYWdlZCB0aGUgbWVtb3J5IGluIGFuZCB0aGUgSU9NTVUg
dGFibGVzIGJlZW4gdXBkYXRlZCkgYW5kIG90aGVyIGZlYXR1cmVzLiAgSSB0aGluayBvbmx5IHRo
ZSBBVFMvUFJJIGNhc2UgdXNlcyB0aGUgQVRDIG9uIHRoZSBlbmQgcG9pbnQuICBKb2huLCBTdXJh
dmVlLCBjb3JyZWN0IG1lIGlmIEknbSB3cm9uZy4NCg0KQWxleA0KDQo=

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-04-07 14:32 [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs Joerg Roedel
  2017-04-07 16:46   ` Deucher, Alexander
  2017-04-08  7:41 ` Lukas Wunner
@ 2017-06-15 14:04 ` Joerg Roedel
  2017-06-15 17:01   ` Samuel Sieb
  2017-06-15 19:15   ` Bjorn Helgaas
  2017-07-13  2:56 ` Bjorn Helgaas
  3 siblings, 2 replies; 35+ messages in thread
From: Joerg Roedel @ 2017-06-15 14:04 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Daniel Drake, Alexander Deucher,
	Samuel Sieb, David Woodhouse

Hi Bjorn,

On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> ATS is broken on this hardware and causes IOMMU stalls and
> system failure. Disable ATS on these devices to make them
> usable again with IOMMU enabled.
> 
> Note that the commit in the Fixes-tag is not buggy, it
> just uncovers the problem in the hardware by increasing
> the ATS-flush rate.
> 
> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  drivers/pci/quirks.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)

Any more objections on this patch? Please let me know if you want to
have something changed.


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-06-15 14:04 ` Joerg Roedel
@ 2017-06-15 17:01   ` Samuel Sieb
  2017-06-15 18:13       ` Deucher, Alexander
  2017-06-15 19:15   ` Bjorn Helgaas
  1 sibling, 1 reply; 35+ messages in thread
From: Samuel Sieb @ 2017-06-15 17:01 UTC (permalink / raw)
  To: Joerg Roedel, Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Daniel Drake, Alexander Deucher,
	David Woodhouse

On 06/15/2017 07:04 AM, Joerg Roedel wrote:
> Hi Bjorn,
> 
> On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
>> From: Joerg Roedel <jroedel@suse.de>
>>
>> ATS is broken on this hardware and causes IOMMU stalls and
>> system failure. Disable ATS on these devices to make them
>> usable again with IOMMU enabled.
>>
>> Note that the commit in the Fixes-tag is not buggy, it
>> just uncovers the problem in the hardware by increasing
>> the ATS-flush rate.
>>
>> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
>> Signed-off-by: Joerg Roedel <jroedel@suse.de>
>> ---
>>   drivers/pci/quirks.c | 19 +++++++++++++++++++
>>   1 file changed, 19 insertions(+)
> 
> Any more objections on this patch? Please let me know if you want to
> have something changed.

The other patch seems to fix this issue without disabling ATS.  Isn't 
that better?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-04-20 12:11   ` Joerg Roedel
@ 2017-06-15 17:12     ` Bjorn Helgaas
  0 siblings, 0 replies; 35+ messages in thread
From: Bjorn Helgaas @ 2017-06-15 17:12 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Lukas Wunner, Joerg Roedel, Bjorn Helgaas, linux-pci,
	linux-kernel, Daniel Drake, Alexander Deucher, Samuel Sieb,
	David Woodhouse, Paul Menzel

On Thu, Apr 20, 2017 at 02:11:42PM +0200, Joerg Roedel wrote:
> On Sat, Apr 08, 2017 at 09:41:07AM +0200, Lukas Wunner wrote:
> > On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> > > From: Joerg Roedel <jroedel@suse.de>
> > > 
> > > ATS is broken on this hardware and causes IOMMU stalls and
> > > system failure. Disable ATS on these devices to make them
> > > usable again with IOMMU enabled.
> > 
> > AMD Stoney Ridge is an x86 CPU + GPU combo and this quirk pertains
> > to the GPU, right?
> > 
> > In that case the quirk should go to arch/x86.  Paul Menzel (+cc)
> > has just complained on linux-pci@ that final fixups are taking half
> > a second, and I think that could be reduced if more efforts were
> > spent to move arch-specific quirks out of the catch-all in
> > drivers/pci/quirks.c.
> 
> The affected hardware here might be x86-only, but ATS is not. If a
> broken ATS-capable plug-in card appears, we need this in generic code
> anyway.

It could go in either arch/x86/pci/fixup.c or drivers/pci/quirks.c.

It's not clear to me exactly what the hardware defect is or where it
is.  If it's in the CPU or in a GPU that can only be found on x86, I
think arch/x86/pci/fixup.c is the appropriate place.

If it's in a GPU that could be found on other arches,
drivers/pci/quirks.c would be the appropriate place.

I don't personally think the possibility of a plugin card with broken
ATS is a real reason to put this quirk in drivers/pci/quirks.c.  It's
a trivial patch and easy to copy or move later if we need to.

Bjorn

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-06-15 17:01   ` Samuel Sieb
@ 2017-06-15 18:13       ` Deucher, Alexander
  0 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-06-15 18:13 UTC (permalink / raw)
  To: 'Samuel Sieb', Joerg Roedel, Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Daniel Drake, David Woodhouse

> -----Original Message-----
> From: Samuel Sieb [mailto:samuel@sieb.net]
> Sent: Thursday, June 15, 2017 1:02 PM
> To: Joerg Roedel; Bjorn Helgaas
> Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Daniel Drake;
> Deucher, Alexander; David Woodhouse
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> On 06/15/2017 07:04 AM, Joerg Roedel wrote:
> > Hi Bjorn,
> >
> > On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> >> From: Joerg Roedel <jroedel@suse.de>
> >>
> >> ATS is broken on this hardware and causes IOMMU stalls and
> >> system failure. Disable ATS on these devices to make them
> >> usable again with IOMMU enabled.
> >>
> >> Note that the commit in the Fixes-tag is not buggy, it
> >> just uncovers the problem in the hardware by increasing
> >> the ATS-flush rate.
> >>
> >> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> >> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> >> ---
> >>   drivers/pci/quirks.c | 19 +++++++++++++++++++
> >>   1 file changed, 19 insertions(+)
> >
> > Any more objections on this patch? Please let me know if you want to
> > have something changed.
> 
> The other patch seems to fix this issue without disabling ATS.  Isn't
> that better?

I talked to our validation team and ATS was validated on Stoney, so this patch is just working around something else.  The other patch fixes it and is a valid optimization (it should be applied eventually), but apparently the current behavior is allowed even if it's now optimal.  I'm not really an ATS expert.

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
@ 2017-06-15 18:13       ` Deucher, Alexander
  0 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-06-15 18:13 UTC (permalink / raw)
  To: 'Samuel Sieb', Joerg Roedel, Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Daniel Drake, David Woodhouse

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBTYW11ZWwgU2llYiBbbWFpbHRv
OnNhbXVlbEBzaWViLm5ldF0NCj4gU2VudDogVGh1cnNkYXksIEp1bmUgMTUsIDIwMTcgMTowMiBQ
TQ0KPiBUbzogSm9lcmcgUm9lZGVsOyBCam9ybiBIZWxnYWFzDQo+IENjOiBsaW51eC1wY2lAdmdl
ci5rZXJuZWwub3JnOyBsaW51eC1rZXJuZWxAdmdlci5rZXJuZWwub3JnOyBEYW5pZWwgRHJha2U7
DQo+IERldWNoZXIsIEFsZXhhbmRlcjsgRGF2aWQgV29vZGhvdXNlDQo+IFN1YmplY3Q6IFJlOiBb
UEFUQ0ggdjJdIFBDSTogQWRkIEFUUy1kaXNhYmxlIHF1aXJrIGZvciBBTUQgU3RvbmV5IEdQVXMN
Cj4gDQo+IE9uIDA2LzE1LzIwMTcgMDc6MDQgQU0sIEpvZXJnIFJvZWRlbCB3cm90ZToNCj4gPiBI
aSBCam9ybiwNCj4gPg0KPiA+IE9uIEZyaSwgQXByIDA3LCAyMDE3IGF0IDA0OjMyOjE4UE0gKzAy
MDAsIEpvZXJnIFJvZWRlbCB3cm90ZToNCj4gPj4gRnJvbTogSm9lcmcgUm9lZGVsIDxqcm9lZGVs
QHN1c2UuZGU+DQo+ID4+DQo+ID4+IEFUUyBpcyBicm9rZW4gb24gdGhpcyBoYXJkd2FyZSBhbmQg
Y2F1c2VzIElPTU1VIHN0YWxscyBhbmQNCj4gPj4gc3lzdGVtIGZhaWx1cmUuIERpc2FibGUgQVRT
IG9uIHRoZXNlIGRldmljZXMgdG8gbWFrZSB0aGVtDQo+ID4+IHVzYWJsZSBhZ2FpbiB3aXRoIElP
TU1VIGVuYWJsZWQuDQo+ID4+DQo+ID4+IE5vdGUgdGhhdCB0aGUgY29tbWl0IGluIHRoZSBGaXhl
cy10YWcgaXMgbm90IGJ1Z2d5LCBpdA0KPiA+PiBqdXN0IHVuY292ZXJzIHRoZSBwcm9ibGVtIGlu
IHRoZSBoYXJkd2FyZSBieSBpbmNyZWFzaW5nDQo+ID4+IHRoZSBBVFMtZmx1c2ggcmF0ZS4NCj4g
Pj4NCj4gPj4gRml4ZXM6IGIxNTE2YTE0NjU3YSAoJ2lvbW11L2FtZDogSW1wbGVtZW50IGZsdXNo
IHF1ZXVlJykNCj4gPj4gU2lnbmVkLW9mZi1ieTogSm9lcmcgUm9lZGVsIDxqcm9lZGVsQHN1c2Uu
ZGU+DQo+ID4+IC0tLQ0KPiA+PiAgIGRyaXZlcnMvcGNpL3F1aXJrcy5jIHwgMTkgKysrKysrKysr
KysrKysrKysrKw0KPiA+PiAgIDEgZmlsZSBjaGFuZ2VkLCAxOSBpbnNlcnRpb25zKCspDQo+ID4N
Cj4gPiBBbnkgbW9yZSBvYmplY3Rpb25zIG9uIHRoaXMgcGF0Y2g/IFBsZWFzZSBsZXQgbWUga25v
dyBpZiB5b3Ugd2FudCB0bw0KPiA+IGhhdmUgc29tZXRoaW5nIGNoYW5nZWQuDQo+IA0KPiBUaGUg
b3RoZXIgcGF0Y2ggc2VlbXMgdG8gZml4IHRoaXMgaXNzdWUgd2l0aG91dCBkaXNhYmxpbmcgQVRT
LiAgSXNuJ3QNCj4gdGhhdCBiZXR0ZXI/DQoNCkkgdGFsa2VkIHRvIG91ciB2YWxpZGF0aW9uIHRl
YW0gYW5kIEFUUyB3YXMgdmFsaWRhdGVkIG9uIFN0b25leSwgc28gdGhpcyBwYXRjaCBpcyBqdXN0
IHdvcmtpbmcgYXJvdW5kIHNvbWV0aGluZyBlbHNlLiAgVGhlIG90aGVyIHBhdGNoIGZpeGVzIGl0
IGFuZCBpcyBhIHZhbGlkIG9wdGltaXphdGlvbiAoaXQgc2hvdWxkIGJlIGFwcGxpZWQgZXZlbnR1
YWxseSksIGJ1dCBhcHBhcmVudGx5IHRoZSBjdXJyZW50IGJlaGF2aW9yIGlzIGFsbG93ZWQgZXZl
biBpZiBpdCdzIG5vdyBvcHRpbWFsLiAgSSdtIG5vdCByZWFsbHkgYW4gQVRTIGV4cGVydC4NCg0K
QWxleA0KDQo=

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-06-15 14:04 ` Joerg Roedel
  2017-06-15 17:01   ` Samuel Sieb
@ 2017-06-15 19:15   ` Bjorn Helgaas
  2017-06-16 16:29     ` Joerg Roedel
  1 sibling, 1 reply; 35+ messages in thread
From: Bjorn Helgaas @ 2017-06-15 19:15 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Alexander Deucher, Samuel Sieb, David Woodhouse

On Thu, Jun 15, 2017 at 04:04:21PM +0200, Joerg Roedel wrote:
> Hi Bjorn,
> 
> On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> > From: Joerg Roedel <jroedel@suse.de>
> > 
> > ATS is broken on this hardware and causes IOMMU stalls and
> > system failure. Disable ATS on these devices to make them
> > usable again with IOMMU enabled.
> > 
> > Note that the commit in the Fixes-tag is not buggy, it
> > just uncovers the problem in the hardware by increasing
> > the ATS-flush rate.
> > 
> > Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> > Signed-off-by: Joerg Roedel <jroedel@suse.de>
> > ---
> >  drivers/pci/quirks.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> 
> Any more objections on this patch? Please let me know if you want to
> have something changed.

It was marked "superseded" in patchwork and thus off my radar.  I
don't remember if I did that or why.  I changed it back to "New" so I
won't forget about it.

You mention (May 24) the original bug report.  Can you include the URL
for that?

I admit I just don't have warm fuzzies that the problem is well
understood.

Bjorn

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-06-15 19:15   ` Bjorn Helgaas
@ 2017-06-16 16:29     ` Joerg Roedel
  2017-07-10 16:53       ` Bjorn Helgaas
  0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2017-06-16 16:29 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Alexander Deucher, Samuel Sieb, David Woodhouse

Hi Bjorn,

On Thu, Jun 15, 2017 at 02:15:45PM -0500, Bjorn Helgaas wrote:
> It was marked "superseded" in patchwork and thus off my radar.  I
> don't remember if I did that or why.  I changed it back to "New" so I
> won't forget about it.

Great!

> You mention (May 24) the original bug report.  Can you include the URL
> for that?

I think there were multiple reports, here is one I could still find:

	https://lists.linuxfoundation.org/pipermail/iommu/2017-March/020836.html

> I admit I just don't have warm fuzzies that the problem is well
> understood.

The current understanding (without my ability to debug the hardware
involved) is that the GPU in the Stoney systems gets into a weird state
when ATS invalidations are sent too fast and stops responding to the
iommu.

The iommu then can't complete the invalidation commands and the driver
throws completion-wait loop timeout messages out.



	Joerg

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-06-16 16:29     ` Joerg Roedel
@ 2017-07-10 16:53       ` Bjorn Helgaas
  2017-07-11 11:49         ` Joerg Roedel
  0 siblings, 1 reply; 35+ messages in thread
From: Bjorn Helgaas @ 2017-07-10 16:53 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Alexander Deucher, Samuel Sieb, David Woodhouse

On Fri, Jun 16, 2017 at 06:29:23PM +0200, Joerg Roedel wrote:
> Hi Bjorn,
> 
> On Thu, Jun 15, 2017 at 02:15:45PM -0500, Bjorn Helgaas wrote:
> > It was marked "superseded" in patchwork and thus off my radar.  I
> > don't remember if I did that or why.  I changed it back to "New" so I
> > won't forget about it.
> 
> Great!
> 
> > You mention (May 24) the original bug report.  Can you include the URL
> > for that?
> 
> I think there were multiple reports, here is one I could still find:
> 
> 	https://lists.linuxfoundation.org/pipermail/iommu/2017-March/020836.html
> 
> > I admit I just don't have warm fuzzies that the problem is well
> > understood.
> 
> The current understanding (without my ability to debug the hardware
> involved) is that the GPU in the Stoney systems gets into a weird state
> when ATS invalidations are sent too fast and stops responding to the
> iommu.
> 
> The iommu then can't complete the invalidation commands and the driver
> throws completion-wait loop timeout messages out.

I'm still confused.  Per Samuel
(6dd9dbac-9b65-bc7c-bb08-413a05d09fc8@sieb.net):

Samuel> The other patch seems to fix this issue without disabling ATS.
Samuel> Isn't that better?

and Alex
(BN6PR12MB1652DF4130FC792B71DD9974F7C00@BN6PR12MB1652.namprd12.prod.outlook.com):

Alex> I talked to our validation team and ATS was validated on Stoney,
Alex> so this patch is just working around something else.  The other
Alex> patch fixes it and is a valid optimization ...

I'm confused about what this "other patch" is and whether we want that
one, this one, or both.

Bjorn

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-07-10 16:53       ` Bjorn Helgaas
@ 2017-07-11 11:49         ` Joerg Roedel
  2017-07-11 19:08             ` Deucher, Alexander
  0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2017-07-11 11:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Alexander Deucher, Samuel Sieb, David Woodhouse

Hi Bjorn,

On Mon, Jul 10, 2017 at 11:53:58AM -0500, Bjorn Helgaas wrote:
> I'm still confused.  Per Samuel
> (6dd9dbac-9b65-bc7c-bb08-413a05d09fc8@sieb.net):
> 
> Samuel> The other patch seems to fix this issue without disabling ATS.
> Samuel> Isn't that better?
> 
> and Alex
> (BN6PR12MB1652DF4130FC792B71DD9974F7C00@BN6PR12MB1652.namprd12.prod.outlook.com):
> 
> Alex> I talked to our validation team and ATS was validated on Stoney,
> Alex> so this patch is just working around something else.  The other
> Alex> patch fixes it and is a valid optimization ...
> 
> I'm confused about what this "other patch" is and whether we want that
> one, this one, or both.

The other patches floating around lowered the ATS flush-rate from the
AMD IOMMU driver, which makes the issue disappear as well. But the issue
only disappeared, it is not solved and could probably still be
reproduced with a GPU usage pattern that increases the ATS flush-rate.

So blacklisting the device for ATS is still the safest thing we could do
here.


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-07-11 11:49         ` Joerg Roedel
@ 2017-07-11 19:08             ` Deucher, Alexander
  0 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-07-11 19:08 UTC (permalink / raw)
  To: 'Joerg Roedel', Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Samuel Sieb, David Woodhouse

> -----Original Message-----
> From: Joerg Roedel [mailto:jroedel@suse.de]
> Sent: Tuesday, July 11, 2017 7:50 AM
> To: Bjorn Helgaas
> Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org;
> Daniel Drake; Deucher, Alexander; Samuel Sieb; David Woodhouse
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> Hi Bjorn,
> 
> On Mon, Jul 10, 2017 at 11:53:58AM -0500, Bjorn Helgaas wrote:
> > I'm still confused.  Per Samuel
> > (6dd9dbac-9b65-bc7c-bb08-413a05d09fc8@sieb.net):
> >
> > Samuel> The other patch seems to fix this issue without disabling ATS.
> > Samuel> Isn't that better?
> >
> > and Alex
> >
> (BN6PR12MB1652DF4130FC792B71DD9974F7C00@BN6PR12MB1652.namprd1
> 2.prod.outlook.com):
> >
> > Alex> I talked to our validation team and ATS was validated on Stoney,
> > Alex> so this patch is just working around something else.  The other
> > Alex> patch fixes it and is a valid optimization ...
> >
> > I'm confused about what this "other patch" is and whether we want that
> > one, this one, or both.

Here's the other patch:
https://lists.freedesktop.org/archives/amd-gfx/2017-May/009421.html

> 
> The other patches floating around lowered the ATS flush-rate from the
> AMD IOMMU driver, which makes the issue disappear as well. But the issue
> only disappeared, it is not solved and could probably still be
> reproduced with a GPU usage pattern that increases the ATS flush-rate.
> 
> So blacklisting the device for ATS is still the safest thing we could do
> here.

I don't have any objection per se, but I'd hate to add a quirk to disable it only to remove it again in the future if we needed ATS related functionality later.  We are in the process of upstreaming KFD support for Carrizo (which is a bigger version of Stoney) and that utilizes ATS related functionality to provide GPU access to pageable memory.  There are no immediate requirements for Stoney, but that may change.

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
@ 2017-07-11 19:08             ` Deucher, Alexander
  0 siblings, 0 replies; 35+ messages in thread
From: Deucher, Alexander @ 2017-07-11 19:08 UTC (permalink / raw)
  To: 'Joerg Roedel', Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Samuel Sieb, David Woodhouse

> -----Original Message-----
> From: Joerg Roedel [mailto:jroedel@suse.de]
> Sent: Tuesday, July 11, 2017 7:50 AM
> To: Bjorn Helgaas
> Cc: Bjorn Helgaas; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.or=
g;
> Daniel Drake; Deucher, Alexander; Samuel Sieb; David Woodhouse
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
>=20
> Hi Bjorn,
>=20
> On Mon, Jul 10, 2017 at 11:53:58AM -0500, Bjorn Helgaas wrote:
> > I'm still confused.  Per Samuel
> > (6dd9dbac-9b65-bc7c-bb08-413a05d09fc8@sieb.net):
> >
> > Samuel> The other patch seems to fix this issue without disabling ATS.
> > Samuel> Isn't that better?
> >
> > and Alex
> >
> (BN6PR12MB1652DF4130FC792B71DD9974F7C00@BN6PR12MB1652.namprd1
> 2.prod.outlook.com):
> >
> > Alex> I talked to our validation team and ATS was validated on Stoney,
> > Alex> so this patch is just working around something else.  The other
> > Alex> patch fixes it and is a valid optimization ...
> >
> > I'm confused about what this "other patch" is and whether we want that
> > one, this one, or both.

Here's the other patch:
https://lists.freedesktop.org/archives/amd-gfx/2017-May/009421.html

>=20
> The other patches floating around lowered the ATS flush-rate from the
> AMD IOMMU driver, which makes the issue disappear as well. But the issue
> only disappeared, it is not solved and could probably still be
> reproduced with a GPU usage pattern that increases the ATS flush-rate.
>=20
> So blacklisting the device for ATS is still the safest thing we could do
> here.

I don't have any objection per se, but I'd hate to add a quirk to disable i=
t only to remove it again in the future if we needed ATS related functional=
ity later.  We are in the process of upstreaming KFD support for Carrizo (w=
hich is a bigger version of Stoney) and that utilizes ATS related functiona=
lity to provide GPU access to pageable memory.  There are no immediate requ=
irements for Stoney, but that may change.

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-04-07 14:32 [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs Joerg Roedel
                   ` (2 preceding siblings ...)
  2017-06-15 14:04 ` Joerg Roedel
@ 2017-07-13  2:56 ` Bjorn Helgaas
  2017-08-29 20:02   ` Samuel Sieb
  3 siblings, 1 reply; 35+ messages in thread
From: Bjorn Helgaas @ 2017-07-13  2:56 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Alexander Deucher, Samuel Sieb, David Woodhouse, Joerg Roedel

On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> ATS is broken on this hardware and causes IOMMU stalls and
> system failure. Disable ATS on these devices to make them
> usable again with IOMMU enabled.
> 
> Note that the commit in the Fixes-tag is not buggy, it
> just uncovers the problem in the hardware by increasing
> the ATS-flush rate.
> 
> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> Signed-off-by: Joerg Roedel <jroedel@suse.de>

Applied with Alex's ack to pci/virtualization for v4.14, thanks!

Alex, you seemed a little ambivalent later.  If you want to rescind
your ack, let me know and I'll remove it.

> ---
>  drivers/pci/quirks.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6736836..7cbe316 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031, quirk_no_aersid);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032, quirk_no_aersid);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033, quirk_no_aersid);
> +
> +#ifdef CONFIG_PCI_ATS
> +/*
> + * Some devices have a broken ATS implementation causing IOMMU stalls.
> + * Don't use ATS for those devices.
> + */
> +static void quirk_disable_ats(struct pci_dev *pdev)
> +{
> +	/*
> +	 * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
> +	 * early.
> +	 */
> +	dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> +	pdev->ats_cap = 0;
> +}
> +
> +/* AMD Stoney platform GPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_disable_ats);
> +#endif /* CONFIG_PCI_ATS */
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-07-13  2:56 ` Bjorn Helgaas
@ 2017-08-29 20:02   ` Samuel Sieb
  2017-08-29 20:49     ` Bjorn Helgaas
  2017-08-30 11:44     ` Joerg Roedel
  0 siblings, 2 replies; 35+ messages in thread
From: Samuel Sieb @ 2017-08-29 20:02 UTC (permalink / raw)
  To: Bjorn Helgaas, Joerg Roedel
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Daniel Drake,
	Alexander Deucher, David Woodhouse, Joerg Roedel

On 07/12/2017 07:56 PM, Bjorn Helgaas wrote:
> On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
>> From: Joerg Roedel <jroedel@suse.de>
>>
>> ATS is broken on this hardware and causes IOMMU stalls and
>> system failure. Disable ATS on these devices to make them
>> usable again with IOMMU enabled.
>>
>> Note that the commit in the Fixes-tag is not buggy, it
>> just uncovers the problem in the hardware by increasing
>> the ATS-flush rate.
>>
>> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
>> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> 
> Applied with Alex's ack to pci/virtualization for v4.14, thanks!

Is there any chance of getting this into an earlier kernel?  This is a 
pretty devastating bug for users!  I'm currently providing patched 
kernels, but if they run upgrades and get a new kernel without noticing 
it, any filesystem they access will get completely mangled.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-08-29 20:02   ` Samuel Sieb
@ 2017-08-29 20:49     ` Bjorn Helgaas
  2017-08-30 11:44     ` Joerg Roedel
  1 sibling, 0 replies; 35+ messages in thread
From: Bjorn Helgaas @ 2017-08-29 20:49 UTC (permalink / raw)
  To: Samuel Sieb
  Cc: Joerg Roedel, Bjorn Helgaas, linux-pci, linux-kernel,
	Daniel Drake, Alexander Deucher, David Woodhouse, Joerg Roedel

On Tue, Aug 29, 2017 at 01:02:41PM -0700, Samuel Sieb wrote:
> On 07/12/2017 07:56 PM, Bjorn Helgaas wrote:
> >On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> >>From: Joerg Roedel <jroedel@suse.de>
> >>
> >>ATS is broken on this hardware and causes IOMMU stalls and
> >>system failure. Disable ATS on these devices to make them
> >>usable again with IOMMU enabled.
> >>
> >>Note that the commit in the Fixes-tag is not buggy, it
> >>just uncovers the problem in the hardware by increasing
> >>the ATS-flush rate.
> >>
> >>Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> >>Signed-off-by: Joerg Roedel <jroedel@suse.de>
> >
> >Applied with Alex's ack to pci/virtualization for v4.14, thanks!
> 
> Is there any chance of getting this into an earlier kernel?  This is
> a pretty devastating bug for users!  I'm currently providing patched
> kernels, but if they run upgrades and get a new kernel without
> noticing it, any filesystem they access will get completely mangled.

I assume you're looking to get this into stable kernels or distro
update kernels.  I don't personally deal with either of those, but for
stable kernels, see
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/stable-kernel-rules.rst

For distro update kernels, you'd have to talk to the distro folks, and
I don't have contacts for those.

Bjorn

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
  2017-08-29 20:02   ` Samuel Sieb
  2017-08-29 20:49     ` Bjorn Helgaas
@ 2017-08-30 11:44     ` Joerg Roedel
  1 sibling, 0 replies; 35+ messages in thread
From: Joerg Roedel @ 2017-08-30 11:44 UTC (permalink / raw)
  To: Samuel Sieb
  Cc: Bjorn Helgaas, Bjorn Helgaas, linux-pci, linux-kernel,
	Daniel Drake, Alexander Deucher, David Woodhouse, Joerg Roedel

On Tue, Aug 29, 2017 at 01:02:41PM -0700, Samuel Sieb wrote:
> On 07/12/2017 07:56 PM, Bjorn Helgaas wrote:
> >Applied with Alex's ack to pci/virtualization for v4.14, thanks!
> 
> Is there any chance of getting this into an earlier kernel?  This is a
> pretty devastating bug for users!  I'm currently providing patched kernels,
> but if they run upgrades and get a new kernel without noticing it, any
> filesystem they access will get completely mangled.

The patch is queued for v4.14, so it will probably be picked up by
distributions when v4.14-rc1 is released. At least this will be the case
for the affected (Open)SUSE distributions.


	Joerg

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2017-08-30 11:44 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-07 14:32 [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs Joerg Roedel
2017-04-07 16:46 ` Deucher, Alexander
2017-04-07 16:46   ` Deucher, Alexander
2017-05-04 10:21   ` David Woodhouse
2017-05-04 14:41     ` Deucher, Alexander
2017-05-04 14:41       ` Deucher, Alexander
2017-05-23 19:54     ` Deucher, Alexander
2017-05-23 19:54       ` Deucher, Alexander
2017-05-24  8:44       ` Joerg Roedel
2017-05-24 10:38         ` David Woodhouse
2017-05-24 12:56         ` Deucher, Alexander
2017-05-24 12:56           ` Deucher, Alexander
2017-05-26  6:48           ` Samuel Sieb
2017-05-26 11:57         ` Deucher, Alexander
2017-05-26 11:57           ` Deucher, Alexander
2017-05-26 12:54           ` David Woodhouse
2017-05-26 15:59             ` Deucher, Alexander
2017-05-26 15:59               ` Deucher, Alexander
2017-04-08  7:41 ` Lukas Wunner
2017-04-20 12:11   ` Joerg Roedel
2017-06-15 17:12     ` Bjorn Helgaas
2017-06-15 14:04 ` Joerg Roedel
2017-06-15 17:01   ` Samuel Sieb
2017-06-15 18:13     ` Deucher, Alexander
2017-06-15 18:13       ` Deucher, Alexander
2017-06-15 19:15   ` Bjorn Helgaas
2017-06-16 16:29     ` Joerg Roedel
2017-07-10 16:53       ` Bjorn Helgaas
2017-07-11 11:49         ` Joerg Roedel
2017-07-11 19:08           ` Deucher, Alexander
2017-07-11 19:08             ` Deucher, Alexander
2017-07-13  2:56 ` Bjorn Helgaas
2017-08-29 20:02   ` Samuel Sieb
2017-08-29 20:49     ` Bjorn Helgaas
2017-08-30 11:44     ` Joerg Roedel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.