linux-kernel.vger.kernel.org archive mirror
* AMD IOMMU causing filesystem corruption
@ 2017-04-03 20:38 Samuel Sieb
  2017-04-03 21:39 ` Joerg Roedel
  0 siblings, 1 reply; 14+ messages in thread
From: Samuel Sieb @ 2017-04-03 20:38 UTC
  To: linux-kernel

I filed a bug in bugzilla, but I wasn't sure what category to put it in, 
so I suspect I ended up picking one that doesn't get looked at much.

https://bugzilla.kernel.org/show_bug.cgi?id=195051

The issue is that on a specific Acer laptop with a dual-core A9, if I 
don't disable the IOMMU using iommu=off, it has immediate and rapidly 
fatal filesystem corruption by the time a user logs into the desktop. 
What led me to try that was at one point I noticed an error message 
about the iommu in the logs.  However, I did not have a chance to save 
that due to the corruption obliterating the log files.

* Re: AMD IOMMU causing filesystem corruption
  2017-04-03 20:38 AMD IOMMU causing filesystem corruption Samuel Sieb
@ 2017-04-03 21:39 ` Joerg Roedel
  2017-04-04  3:37   ` Samuel Sieb
  0 siblings, 1 reply; 14+ messages in thread
From: Joerg Roedel @ 2017-04-03 21:39 UTC
  To: Samuel Sieb; +Cc: linux-kernel

Hi Samuel,

On Mon, Apr 03, 2017 at 01:38:08PM -0700, Samuel Sieb wrote:
> I filed a bug in bugzilla, but I wasn't sure what category to put it
> in, so I suspect I ended up picking one that doesn't get looked at
> much.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=195051
> 
> The issue is that on a specific Acer laptop with a dual-core A9, if
> I don't disable the IOMMU using iommu=off, it has immediate and
> rapidly fatal filesystem corruption by the time a user logs into the
> desktop. What led me to try that was at one point I noticed an error
> message about the iommu in the logs.  However, I did not have a
> chance to save that due to the corruption obliterating the log
> files.

You have a system based on the AMD Stoney platform, on which the PCI-ATS
feature of the GPU is broken, as we recently found out.

Can you please test whether the attached patch fixes the issue on your
machine?

From 09cbdcbbd23f0823e7651b4f35b13ae633b3fbe2 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <jroedel@suse.de>
Date: Tue, 28 Mar 2017 13:20:27 +0200
Subject: [PATCH] PCI: Blacklist AMD Stoney GPU devices for ATS

ATS is broken on these devices. Under invalidation load, the
GPU does not reply to invalidations anymore, causing
Completion-wait loop timeouts on the AMD IOMMU driver side.
Fix it by not enabling ATS on these devices.

Note that below mentioned commit is not broken, it just
triggers the issue because it might cause invalidation
storms on devices.

Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
Reported-by: Daniel Drake <drake@endlessm.com>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/pci/ats.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index eeb9fb2..711bdb2 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -17,10 +17,18 @@
 
 #include "pci.h"
 
+static const struct pci_device_id broken_ats_tbl[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, 0x98e4) }, /* AMD Stoney GPU part */
+	{ 0 }
+};
+
 void pci_ats_init(struct pci_dev *dev)
 {
 	int pos;
 
+	if (pci_match_id(broken_ats_tbl, dev))
+		return;
+
 	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ATS);
 	if (!pos)
 		return;
-- 
1.9.1

* Re: AMD IOMMU causing filesystem corruption
  2017-04-03 21:39 ` Joerg Roedel
@ 2017-04-04  3:37   ` Samuel Sieb
  2017-04-04  7:11     ` Samuel Sieb
  0 siblings, 1 reply; 14+ messages in thread
From: Samuel Sieb @ 2017-04-04  3:37 UTC
  To: Joerg Roedel; +Cc: linux-kernel

On 04/03/2017 02:39 PM, Joerg Roedel wrote:
> You have a system based on the AMD Stoney platform, on which the PCI-ATS
> feature of the GPU is broken, as we recently found out.
>
> Can you please test whether the attached patch fixes the issue on your
> machine?
>
Yes, that works, thank you!

Now I'm curious what Windows does.  Either they don't use that feature 
or they already knew to avoid it.  In which case, why did AMD take so 
long to let the kernel developers know?

* Re: AMD IOMMU causing filesystem corruption
  2017-04-04  3:37   ` Samuel Sieb
@ 2017-04-04  7:11     ` Samuel Sieb
  2017-04-04  7:32       ` Joerg Roedel
  0 siblings, 1 reply; 14+ messages in thread
From: Samuel Sieb @ 2017-04-04  7:11 UTC
  To: Joerg Roedel; +Cc: linux-kernel

On 04/03/2017 08:37 PM, Samuel Sieb wrote:
> On 04/03/2017 02:39 PM, Joerg Roedel wrote:
>> You have a system based on the AMD Stoney platform, on which the PCI-ATS
>> feature of the GPU is broken, as we recently found out.
>>
>> Can you please test whether the attached patch fixes the issue on your
>> machine?
>>
> Yes, that works, thank you!
>
Unfortunately, that turned out to be a bit premature.  After compiling a 
kernel on it remotely over ssh (not interacting with the logged in 
desktop), a reboot failed with endless completion-wait loop timeout 
messages and after a force poweroff and restart it won't boot.  The EFI 
filesystem was even destroyed, probably because the kernel installation 
modified a file on there.

* Re: AMD IOMMU causing filesystem corruption
  2017-04-04  7:11     ` Samuel Sieb
@ 2017-04-04  7:32       ` Joerg Roedel
  2017-04-04 16:29         ` Samuel Sieb
  0 siblings, 1 reply; 14+ messages in thread
From: Joerg Roedel @ 2017-04-04  7:32 UTC
  To: Samuel Sieb; +Cc: linux-kernel

On Tue, Apr 04, 2017 at 12:11:31AM -0700, Samuel Sieb wrote:
> Unfortunately, that turned out to be a bit premature.  After
> compiling a kernel on it remotely over ssh (not interacting with the
> logged in desktop), a reboot failed with endless completion-wait
> loop timeout messages and after a force poweroff and restart it
> won't boot.  The EFI filesystem was even destroyed, probably because
> the kernel installation modified a file on there.

Yeah, please boot the machine with amd_iommu=off during re-installation
and when you install the modified kernel. And then boot into the patched
kernel with amd_iommu=on (which is the default).


Thanks,

	Joerg

* Re: AMD IOMMU causing filesystem corruption
  2017-04-04  7:32       ` Joerg Roedel
@ 2017-04-04 16:29         ` Samuel Sieb
  2017-04-07 10:22           ` Joerg Roedel
  2017-04-07 10:27           ` Joerg Roedel
  0 siblings, 2 replies; 14+ messages in thread
From: Samuel Sieb @ 2017-04-04 16:29 UTC
  To: Joerg Roedel; +Cc: Linux Kernel

On 04/04/2017 12:32 AM, Joerg Roedel wrote:
> On Tue, Apr 04, 2017 at 12:11:31AM -0700, Samuel Sieb wrote:
>> Unfortunately, that turned out to be a bit premature.  After
>> compiling a kernel on it remotely over ssh (not interacting with the
>> logged in desktop), a reboot failed with endless completion-wait
>> loop timeout messages and after a force poweroff and restart it
>> won't boot.  The EFI filesystem was even destroyed, probably because
>> the kernel installation modified a file on there.
>
> Yeah, please boot the machine with amd_iommu=off during re-installation
> and when you install the modified kernel. And then boot into the patched
> kernel with amd_iommu=on (which is the default).
>
That's what I did.  While running with iommu=off, I compiled and 
installed a 4.11rc kernel with the patch.  I rebooted to use that kernel 
and then compiled and installed a 4.10 kernel with that patch and 
another unrelated patch.  That is what I described above.  The 
filesystem destruction happened while running the 4.11rc kernel with 
that patch.  Is there any way to verify that the patch was actually 
having any effect?  Can I check if ATS is enabled or not?  I will have 
to rebuild the system before I can test again.

* Re: AMD IOMMU causing filesystem corruption
  2017-04-04 16:29         ` Samuel Sieb
@ 2017-04-07 10:22           ` Joerg Roedel
  2017-04-08  6:49             ` Samuel Sieb
  2017-04-07 10:27           ` Joerg Roedel
  1 sibling, 1 reply; 14+ messages in thread
From: Joerg Roedel @ 2017-04-07 10:22 UTC
  To: Samuel Sieb; +Cc: Linux Kernel

On Tue, Apr 04, 2017 at 09:29:37AM -0700, Samuel Sieb wrote:
> That's what I did.  While running with iommu=off, I compiled and
> installed a 4.11rc kernel with the patch.  I rebooted to use that
> kernel and then compiled and installed a 4.10 kernel with that patch
> and another unrelated patch.  That is what I described above.  The
> filesystem destruction happened while running the 4.11rc kernel with
> that patch.  Is there any way to verify that the patch was actually
> having any effect?  Can I check if ATS is enabled or not?  I will
> have to rebuild the system before I can test again.

Can you please send me output of 'lspci -nv' on your system?


Thanks,

	Joerg

* Re: AMD IOMMU causing filesystem corruption
  2017-04-04 16:29         ` Samuel Sieb
  2017-04-07 10:22           ` Joerg Roedel
@ 2017-04-07 10:27           ` Joerg Roedel
  2017-04-25 17:55             ` Samuel Sieb
  1 sibling, 1 reply; 14+ messages in thread
From: Joerg Roedel @ 2017-04-07 10:27 UTC
  To: Samuel Sieb; +Cc: Linux Kernel

On Tue, Apr 04, 2017 at 09:29:37AM -0700, Samuel Sieb wrote:
> That's what I did.  While running with iommu=off, I compiled and
> installed a 4.11rc kernel with the patch.  I rebooted to use that
> kernel and then compiled and installed a 4.10 kernel with that patch
> and another unrelated patch.  That is what I described above.  The
> filesystem destruction happened while running the 4.11rc kernel with
> that patch.  Is there any way to verify that the patch was actually
> having any effect?  Can I check if ATS is enabled or not?  I will
> have to rebuild the system before I can test again.

Also, please try the attached debug-diff on your kernel. It completely
disables the use of ATS in the amd-iommu driver.

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 98940d1392cb..f019aa67c54c 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -467,7 +467,7 @@ static int iommu_init_device(struct device *dev)
 		struct amd_iommu *iommu;
 
 		iommu = amd_iommu_rlookup_table[dev_data->devid];
-		dev_data->iommu_v2 = iommu->is_iommu_v2;
+		dev_data->iommu_v2 = false;
 	}
 
 	dev->archdata.iommu = dev_data;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6130278c5d71..41d0e645960c 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -171,7 +171,7 @@ int amd_iommus_present;
 
 /* IOMMUs have a non-present cache? */
 bool amd_iommu_np_cache __read_mostly;
-bool amd_iommu_iotlb_sup __read_mostly = true;
+bool amd_iommu_iotlb_sup __read_mostly = false;
 
 u32 amd_iommu_max_pasid __read_mostly = ~0;
 

* Re: AMD IOMMU causing filesystem corruption
  2017-04-07 10:22           ` Joerg Roedel
@ 2017-04-08  6:49             ` Samuel Sieb
  0 siblings, 0 replies; 14+ messages in thread
From: Samuel Sieb @ 2017-04-08  6:49 UTC
  To: Joerg Roedel; +Cc: Linux Kernel

On 04/07/2017 03:22 AM, Joerg Roedel wrote:
> Can you please send me output of 'lspci -nv' on your system?
>
I have to figure out how to rebuild the system and find the time to do 
it before I can test that patch, but here's the lspci output:

00:00.0 0600: 1022:1576
	Subsystem: 1025:1099
	Flags: bus master, fast devsel, latency 0

00:00.2 0806: 1022:1577
	Subsystem: 1025:1099
	Flags: fast devsel, IRQ 24
	Capabilities: [40] Secure device <?>
	Capabilities: [64] MSI: Enable+ Count=1/4 Maskable- 64bit+
	Capabilities: [74] HyperTransport: MSI Mapping Enable+ Fixed+

00:01.0 0300: 1002:98e4 (rev c1) (prog-if 00 [VGA controller])
	Subsystem: 1025:1099
	Flags: bus master, fast devsel, latency 0, IRQ 37
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f0000000 (64-bit, prefetchable) [size=8M]
	I/O ports at 3000 [size=256]
	Memory at f0d00000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at f0d80000 [disabled] [size=128K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Root Complex Integrated Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [270] #19
	Capabilities: [2b0] Address Translation Service (ATS)
	Capabilities: [2c0] Page Request Interface (PRI)
	Capabilities: [2d0] Process Address Space ID (PASID)
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

00:01.1 0403: 1002:15b3
	Subsystem: 1002:15b3
	Flags: bus master, fast devsel, latency 0, IRQ 255
	Memory at f0d60000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Root Complex Integrated Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Kernel modules: snd_hda_intel

00:02.0 0600: 1022:157b
	Flags: fast devsel

00:02.1 0604: 1022:157c (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 26
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 00001000-00001fff
	Memory behind bridge: f0e00000-f0ffffff
	Prefetchable memory behind bridge: 00000000f1000000-00000000f11fffff
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Root Port (Slot+), MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [c0] Subsystem: 1022:1234
	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [270] #19
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:02.2 0604: 1022:157c (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 27
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 00002000-00002fff
	Memory behind bridge: f0c00000-f0cfffff
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Root Port (Slot+), MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [c0] Subsystem: 1022:1234
	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [270] #19
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:02.3 0604: 1022:157c (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 28
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
	Memory behind bridge: f0800000-f09fffff
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Root Port (Slot+), MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [c0] Subsystem: 1022:1234
	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [270] #19
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:03.0 0600: 1022:157b
	Flags: fast devsel

00:08.0 1080: 1022:1578
	Subsystem: 1025:1099
	Flags: bus master, fast devsel, latency 0, IRQ 255
	Memory at f0d40000 (64-bit, prefetchable) [size=128K]
	Memory at f0b00000 (32-bit, non-prefetchable) [size=1M]
	Memory at f0d6f000 (32-bit, non-prefetchable) [size=4K]
	Memory at f0d6a000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: [50] MSI-X: Enable- Count=2 Masked-
	Capabilities: [5c] HyperTransport: MSI Mapping Enable+ Fixed+
	Capabilities: [60] Power Management version 3
	Capabilities: [a4] PCI Advanced Features

00:09.0 0600: 1022:157d
	Flags: fast devsel

00:09.2 0403: 1022:157a
	Subsystem: 1025:1099
	Flags: bus master, fast devsel, latency 0, IRQ 255
	Memory at f0d64000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [60] Power Management version 3
	Capabilities: [a4] PCI Advanced Features
	Kernel modules: snd_hda_intel

00:10.0 0c03: 1022:7914 (rev 20) (prog-if 30 [XHCI])
	Subsystem: 1025:1099
	Flags: bus master, fast devsel, latency 0, IRQ 18
	Memory at f0d68000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: [50] Power Management version 3
	Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
	Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
	Capabilities: [a0] Express Root Complex Integrated Endpoint, MSI 00
	Capabilities: [100] Latency Tolerance Reporting
	Kernel driver in use: xhci_hcd

00:11.0 0106: 1022:7901 (rev 4b) (prog-if 01 [AHCI 1.0])
	Subsystem: 1025:1099
	Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 19
	I/O ports at 3118 [size=8]
	I/O ports at 3124 [size=4]
	I/O ports at 3110 [size=8]
	I/O ports at 3120 [size=4]
	I/O ports at 3100 [size=16]
	Memory at f0d6c000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [60] Power Management version 3
	Capabilities: [70] SATA HBA v1.0
	Kernel driver in use: ahci

00:12.0 0c03: 1022:7908 (rev 49) (prog-if 20 [EHCI])
	Subsystem: 1025:1099
	Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 18
	Memory at f0d6d000 (32-bit, non-prefetchable) [size=256]
	Capabilities: [c0] Power Management version 2
	Capabilities: [e4] Debug port: BAR=1 offset=00e0
	Kernel driver in use: ehci-pci

00:14.0 0c05: 1022:790b (rev 4b)
	Subsystem: 1025:1099
	Flags: 66MHz, medium devsel
	Kernel driver in use: piix4_smbus
	Kernel modules: i2c_piix4, sp5100_tco

00:14.3 0601: 1022:790e (rev 11)
	Subsystem: 1025:1099
	Flags: bus master, 66MHz, medium devsel, latency 0

00:18.0 0600: 1022:15b0
	Flags: fast devsel

00:18.1 0600: 1022:15b1
	Flags: fast devsel

00:18.2 0600: 1022:15b2
	Flags: fast devsel

00:18.3 0600: 1022:15b3
	Flags: fast devsel
	Capabilities: [f0] Secure device <?>

00:18.4 0600: 1022:15b4
	Flags: fast devsel
	Kernel modules: fam15h_power

00:18.5 0600: 1022:15b5
	Flags: fast devsel

02:00.0 ff00: 10ec:5287 (rev 01)
	Subsystem: 1025:1099
	Flags: bus master, fast devsel, latency 0, IRQ 33
	Memory at f0c05000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at f0c10000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
	Capabilities: [d0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [170] Latency Tolerance Reporting
	Capabilities: [178] L1 PM Substates
	Kernel driver in use: rtsx_pci
	Kernel modules: rtsx_pci

02:00.1 0200: 10ec:8168 (rev 12)
	Subsystem: 1025:1099
	Flags: bus master, fast devsel, latency 0, IRQ 35
	I/O ports at 2000 [size=256]
	Memory at f0c04000 (64-bit, non-prefetchable) [size=4K]
	Memory at f0c00000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
	Capabilities: [d0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [170] Latency Tolerance Reporting
	Capabilities: [178] L1 PM Substates
	Kernel driver in use: r8169
	Kernel modules: r8169

03:00.0 0280: 168c:0042 (rev 31)
	Subsystem: 11ad:08a6
	Flags: bus master, fast devsel, latency 0, IRQ 39
	Memory at f0800000 (64-bit, non-prefetchable) [size=2M]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=8/8 Maskable+ 64bit-
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [148] Virtual Channel
	Capabilities: [168] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [178] Latency Tolerance Reporting
	Capabilities: [180] L1 PM Substates
	Kernel driver in use: ath10k_pci
	Kernel modules: ath10k_pci
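
Reader's note (not part of the original message): in the listing above the
GPU at 00:01.0 reports the ID pair 1002:98e4, whereas the table entry in the
first patch uses PCI_VENDOR_ID_AMD. In the kernel's pci_ids.h that constant
is 0x1022, and 0x1002 is PCI_VENDOR_ID_ATI. The thread never says why the
first patch had no effect, so treat this as one possible explanation only.
The sketch below is a simplified user-space stand-in for table matching, not
the kernel's pci_match_id(); struct dev_id, match_id(), tbl_amd and tbl_ati
are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Vendor ID values as found in the kernel's include/linux/pci_ids.h. */
#define VENDOR_ID_ATI 0x1002
#define VENDOR_ID_AMD 0x1022

/* Hypothetical, cut-down stand-in for struct pci_device_id. */
struct dev_id {
	uint16_t vendor;
	uint16_t device;
};

/*
 * Walk a table terminated by an all-zero entry and return the first
 * entry whose vendor AND device both equal the given IDs, or NULL.
 */
static const struct dev_id *match_id(const struct dev_id *tbl,
				     uint16_t vendor, uint16_t device)
{
	assert(tbl != NULL);
	for (; tbl->vendor || tbl->device; tbl++) {
		if (tbl->vendor == vendor && tbl->device == device)
			return tbl;
	}
	return NULL;
}

/* Entry shaped like the one in the first patch: AMD vendor ID. */
static const struct dev_id tbl_amd[] = {
	{ VENDOR_ID_AMD, 0x98e4 },
	{ 0, 0 }
};

/* Same device ID paired with the vendor ID shown by lspci. */
static const struct dev_id tbl_ati[] = {
	{ VENDOR_ID_ATI, 0x98e4 },
	{ 0, 0 }
};
```

With the IDs from the listing, match_id(tbl_amd, 0x1002, 0x98e4) returns
NULL, so an early return guarded by such a lookup would never be taken,
while match_id(tbl_ati, 0x1002, 0x98e4) returns the table entry.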

* Re: AMD IOMMU causing filesystem corruption
  2017-04-07 10:27           ` Joerg Roedel
@ 2017-04-25 17:55             ` Samuel Sieb
  2017-04-26 10:14               ` Joerg Roedel
  0 siblings, 1 reply; 14+ messages in thread
From: Samuel Sieb @ 2017-04-25 17:55 UTC
  To: Joerg Roedel; +Cc: Linux Kernel

On 04/07/2017 03:27 AM, Joerg Roedel wrote:
> Also, please try the attached debug-diff on your kernel. It completely
> disables the use of ATS in the amd-iommu driver.
> 
I applied this patch to 4.11.0 rc8 and then stress tested the laptop 
with another kernel build while running graphical applications and there 
appears to be no damage to the filesystem.  Is there any way to 
determine if ATS is enabled or disabled?

* Re: AMD IOMMU causing filesystem corruption
  2017-04-25 17:55             ` Samuel Sieb
@ 2017-04-26 10:14               ` Joerg Roedel
  2017-04-26 21:31                 ` Samuel Sieb
  0 siblings, 1 reply; 14+ messages in thread
From: Joerg Roedel @ 2017-04-26 10:14 UTC
  To: Samuel Sieb; +Cc: Linux Kernel

Hi Samuel,

On Tue, Apr 25, 2017 at 10:55:24AM -0700, Samuel Sieb wrote:
> On 04/07/2017 03:27 AM, Joerg Roedel wrote:
> >Also, please try the attached debug-diff on your kernel. It completely
> >disables the use of ATS in the amd-iommu driver.
> >
> I applied this patch to 4.11.0 rc8 and then stress tested the laptop
> with another kernel build while running graphical applications and
> there appears to be no damage to the filesystem.  Is there any way
> to determine if ATS is enabled or disabled?

Great, thanks for testing the patch. The lspci tool should be able to
tell you whether the ATS capability is enabled on the GPU. Running
'lspci -vvv -s <GPUDEV>' should give you that info.


	Joerg
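
Reader's note (not part of the original message): lspci takes the ATS state
from the device's PCI config space, which Linux also exposes through the
sysfs "config" file of each device. As a rough sketch of the same check,
the code below walks the extended capability list in a raw config-space
buffer and tests the Enable bit of the ATS control register. The helper
names (rd16, rd32, find_ext_cap, ats_enabled, read_config) are made up for
this example; the register offsets follow the PCIe ATS capability layout
and should be checked against the specification before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define EXT_CAP_START   0x100   /* first extended capability header */
#define EXT_CAP_ID_ATS  0x0f    /* Address Translation Service */
#define ATS_CTRL_OFFSET 0x06    /* control register, relative to the cap */
#define ATS_CTRL_ENABLE 0x8000  /* Enable bit in the control register */

/* Little-endian accessors for a raw config-space buffer. */
static uint16_t rd16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static uint32_t rd32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)rd16(cfg, off) | ((uint32_t)rd16(cfg, off + 2) << 16);
}

/* Return the offset of the extended capability with this ID, or 0. */
static size_t find_ext_cap(const uint8_t *cfg, size_t len, uint16_t id)
{
	size_t pos = EXT_CAP_START;
	int ttl = (4096 - 256) / 8;	/* bound the walk against loops */

	assert(cfg != NULL);
	while (pos >= EXT_CAP_START && pos + 4 <= len && ttl-- > 0) {
		uint32_t hdr = rd32(cfg, pos);

		if (hdr == 0 || hdr == 0xffffffffu)
			break;			/* empty or unreadable list */
		if ((hdr & 0xffff) == id)
			return pos;
		pos = (hdr >> 20) & 0xffc;	/* next-capability pointer */
	}
	return 0;
}

/* 1: ATS enabled, 0: capability present but disabled, -1: not visible. */
static int ats_enabled(const uint8_t *cfg, size_t len)
{
	size_t pos = find_ext_cap(cfg, len, EXT_CAP_ID_ATS);

	if (!pos || pos + ATS_CTRL_OFFSET + 2 > len)
		return -1;
	return (rd16(cfg, pos + ATS_CTRL_OFFSET) & ATS_CTRL_ENABLE) ? 1 : 0;
}

/* Read up to cap bytes of a sysfs config file; returns bytes read. */
static size_t read_config(const char *path, uint8_t *buf, size_t cap)
{
	FILE *f = fopen(path, "rb");
	size_t n;

	if (!f)
		return 0;
	n = fread(buf, 1, cap, f);
	fclose(f);
	return n;
}
```

For the GPU in the lspci listing earlier in this thread, the file would be
/sys/bus/pci/devices/0000:00:01.0/config. Read it as root into a 4096-byte
buffer with read_config() and pass the buffer and byte count to
ats_enabled(). Without root only the start of config space is readable, in
which case the function reports -1 rather than a misleading 0.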

* Re: AMD IOMMU causing filesystem corruption
  2017-04-26 10:14               ` Joerg Roedel
@ 2017-04-26 21:31                 ` Samuel Sieb
  2017-04-26 21:43                   ` Joerg Roedel
  0 siblings, 1 reply; 14+ messages in thread
From: Samuel Sieb @ 2017-04-26 21:31 UTC
  To: Joerg Roedel; +Cc: Linux Kernel

On 04/26/2017 03:14 AM, Joerg Roedel wrote:
> On Tue, Apr 25, 2017 at 10:55:24AM -0700, Samuel Sieb wrote:
>> On 04/07/2017 03:27 AM, Joerg Roedel wrote:
>>> Also, please try the attached debug-diff on your kernel. It completely
>>> disables the use of ATS in the amd-iommu driver.
>>>
>> I applied this patch to 4.11.0 rc8 and then stress tested the laptop
>> with another kernel build while running graphical applications and
>> there appears to be no damage to the filesystem.  Is there any way
>> to determine if ATS is enabled or disabled?
> 
> Great, thanks for testing the patch. The lspci tool should be able to
> tell you whether the ATS capability is enabled on the GPU. Running
> 'lspci -vvv -s <GPUDEV>' should give you that info.
> 
This test was done with the patch that always disables ATS.  Which is 
the current patch to selectively disable it?  The last patch I tried 
didn't seem to work.

* Re: AMD IOMMU causing filesystem corruption
  2017-04-26 21:31                 ` Samuel Sieb
@ 2017-04-26 21:43                   ` Joerg Roedel
  2017-04-27 19:32                     ` Samuel Sieb
  0 siblings, 1 reply; 14+ messages in thread
From: Joerg Roedel @ 2017-04-26 21:43 UTC
  To: Samuel Sieb; +Cc: Linux Kernel

On Wed, Apr 26, 2017 at 02:31:40PM -0700, Samuel Sieb wrote:
> This test was done with the patch that always disables ATS.  Which
> is the current patch to selectively disable it?  The last patch I
> tried didn't seem to work.

It's

	[PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs


You should have received it as you were on the Cc list.



	Joerg

* Re: AMD IOMMU causing filesystem corruption
  2017-04-26 21:43                   ` Joerg Roedel
@ 2017-04-27 19:32                     ` Samuel Sieb
  0 siblings, 0 replies; 14+ messages in thread
From: Samuel Sieb @ 2017-04-27 19:32 UTC
  To: Joerg Roedel; +Cc: Linux Kernel

On 04/26/2017 02:43 PM, Joerg Roedel wrote:
> On Wed, Apr 26, 2017 at 02:31:40PM -0700, Samuel Sieb wrote:
>> This test was done with the patch that always disables ATS.  Which
>> is the current patch to selectively disable it?  The last patch I
>> tried didn't seem to work.
> 
> It's
> 
> 	[PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
> 
> 
> You should have received it as you were on the Cc list.
> 
Yes, but there was some discussion about it, so I wanted to make sure 
that was the latest.  I can verify that the patch works.  ATS is 
disabled and there is no filesystem corruption.
