linux-kernel.vger.kernel.org archive mirror
* Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
@ 2012-11-24 19:40 Justin Piszcz
  2012-11-26 21:42 ` Bruno Prémont
  0 siblings, 1 reply; 24+ messages in thread
From: Justin Piszcz @ 2012-11-24 19:40 UTC (permalink / raw)
  To: support, linux-kernel

Hi,

Is the following normal on an X9SRL-F board (bios 1.0a)?

In the manual it states:

Data Direct I/O
Select Enabled to enable Intel I/OAT (I/O Acceleration Technology), which
significantly reduces CPU overhead by leveraging CPU architectural
improvements and freeing the system resource for other tasks. The options
are Disabled and Enabled.

Default is Enabled.

When enabled in the kernel, I see the following:

[    0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00
[    0.696487] ioatdma 0000:00:04.0: channel error register unreachable
[    0.696546] ioatdma 0000:00:04.0: channel enumeration error
[    0.696604] ioatdma 0000:00:04.0: Intel(R) I/OAT DMA Engine init failed
[    0.696721] ioatdma 0000:00:04.1: channel error register unreachable
[    0.696779] ioatdma 0000:00:04.1: channel enumeration error
[    0.697522] ioatdma 0000:00:04.1: Intel(R) I/OAT DMA Engine init failed
[    0.697617] ioatdma 0000:00:04.2: channel error register unreachable
[    0.697681] ioatdma 0000:00:04.2: channel enumeration error
[    0.697739] ioatdma 0000:00:04.2: Intel(R) I/OAT DMA Engine init failed
[    0.697831] ioatdma 0000:00:04.3: channel error register unreachable
[    0.697890] ioatdma 0000:00:04.3: channel enumeration error
[    0.697948] ioatdma 0000:00:04.3: Intel(R) I/OAT DMA Engine init failed
[    0.698037] ioatdma 0000:00:04.4: channel error register unreachable
[    0.698095] ioatdma 0000:00:04.4: channel enumeration error
[    0.698153] ioatdma 0000:00:04.4: Intel(R) I/OAT DMA Engine init failed
[    0.698245] ioatdma 0000:00:04.5: channel error register unreachable
[    0.698303] ioatdma 0000:00:04.5: channel enumeration error
[    0.698360] ioatdma 0000:00:04.5: Intel(R) I/OAT DMA Engine init failed
[    0.698449] ioatdma 0000:00:04.6: channel error register unreachable
[    0.698508] ioatdma 0000:00:04.6: channel enumeration error
[    0.698565] ioatdma 0000:00:04.6: Intel(R) I/OAT DMA Engine init failed
[    0.698676] ioatdma 0000:00:04.7: channel error register unreachable
[    0.698735] ioatdma 0000:00:04.7: channel enumeration error
[    0.698792] ioatdma 0000:00:04.7: Intel(R) I/OAT DMA Engine init failed

--

Also, I tried using ASPM (enabled in BIOS), but since ACPI Linux query is
ignored, it fails to work:
[    0.562229] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored

I assume this is something Supermicro has to fix?

Justin.
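
The log above repeats the same three messages for each of the eight I/OAT functions. As a minimal sketch (not part of the original report), the Python below groups `ioatdma` messages by PCI address and lists the functions whose initialization failed. The function names and the regular expression are illustrative assumptions about the dmesg format shown here, not anything provided by the driver.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [    0.696604] ioatdma 0000:00:04.0: Intel(R) I/OAT DMA Engine init failed
# The driver banner ("ioatdma: Intel(R) QuickData ...") has no PCI address
# and is deliberately not matched.
LINE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+ioatdma\s+"
    r"([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+(.*)$"
)


def ioatdma_messages(dmesg_text):
    """Group ioatdma messages by PCI address (domain:bus:device.function)."""
    per_device = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            per_device[match.group(1)].append(match.group(2))
    return dict(per_device)


def failed_devices(dmesg_text):
    """Return the sorted PCI addresses that logged an 'init failed' message."""
    return sorted(
        dev
        for dev, msgs in ioatdma_messages(dmesg_text).items()
        if any("init failed" in msg for msg in msgs)
    )


if __name__ == "__main__":
    sample = """\
[    0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00
[    0.696487] ioatdma 0000:00:04.0: channel error register unreachable
[    0.696546] ioatdma 0000:00:04.0: channel enumeration error
[    0.696604] ioatdma 0000:00:04.0: Intel(R) I/OAT DMA Engine init failed
"""
    print(failed_devices(sample))
```

Feeding the full `dmesg` output to `failed_devices` should make it easy to compare a failing boot with a later one.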



* Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-24 19:40 Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question Justin Piszcz
@ 2012-11-26 21:42 ` Bruno Prémont
  2012-11-27  0:50   ` Justin Piszcz
  2012-11-27  0:56   ` Bjorn Helgaas
  0 siblings, 2 replies; 24+ messages in thread
From: Bruno Prémont @ 2012-11-26 21:42 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: support, linux-kernel

Hi Justin,

On Sat, 24 November 2012 "Justin Piszcz" wrote:
> Is the following normal on an X9SRL-F board (bios 1.0a)?
> 
> In the manual it states:
> 
> Data Direct I/O
> Select Enabled to enable Intel I/OAT (I/O Acceleration Technology), which
> significantly reduces CPU overhead by leveraging CPU architectural
> improvements and freeing the system resource for other tasks. The options
> are Disabled and Enabled.
> 
> Default is Enabled.
> 
> When enabled in the kernel, I see the following:
> 
> [    0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00
> [    0.696487] ioatdma 0000:00:04.0: channel error register unreachable
> [    0.696546] ioatdma 0000:00:04.0: channel enumeration error
> [    0.696604] ioatdma 0000:00:04.0: Intel(R) I/OAT DMA Engine init failed
> [    0.696721] ioatdma 0000:00:04.1: channel error register unreachable
> [    0.696779] ioatdma 0000:00:04.1: channel enumeration error
> [    0.697522] ioatdma 0000:00:04.1: Intel(R) I/OAT DMA Engine init failed
> [    0.697617] ioatdma 0000:00:04.2: channel error register unreachable
> [    0.697681] ioatdma 0000:00:04.2: channel enumeration error
> [    0.697739] ioatdma 0000:00:04.2: Intel(R) I/OAT DMA Engine init failed
> [    0.697831] ioatdma 0000:00:04.3: channel error register unreachable
> [    0.697890] ioatdma 0000:00:04.3: channel enumeration error
> [    0.697948] ioatdma 0000:00:04.3: Intel(R) I/OAT DMA Engine init failed
> [    0.698037] ioatdma 0000:00:04.4: channel error register unreachable
> [    0.698095] ioatdma 0000:00:04.4: channel enumeration error
> [    0.698153] ioatdma 0000:00:04.4: Intel(R) I/OAT DMA Engine init failed
> [    0.698245] ioatdma 0000:00:04.5: channel error register unreachable
> [    0.698303] ioatdma 0000:00:04.5: channel enumeration error
> [    0.698360] ioatdma 0000:00:04.5: Intel(R) I/OAT DMA Engine init failed
> [    0.698449] ioatdma 0000:00:04.6: channel error register unreachable
> [    0.698508] ioatdma 0000:00:04.6: channel enumeration error
> [    0.698565] ioatdma 0000:00:04.6: Intel(R) I/OAT DMA Engine init failed
> [    0.698676] ioatdma 0000:00:04.7: channel error register unreachable
> [    0.698735] ioatdma 0000:00:04.7: channel enumeration error
> [    0.698792] ioatdma 0000:00:04.7: Intel(R) I/OAT DMA Engine init failed
> 
> --
> 
> Also, I tried using ASPM (enabled in BIOS), but since ACPI Linux query is
> ignored, it fails to work:
> [    0.562229] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
> 
> I assume this is something Supermicro has to fix?

You are probably missing some kernel config option(s) :) - I did fight similar
issues on a Fujitsu SandyBridge Xeon based server.

Check if enabling CONFIG_X86_X2APIC helps as well as other APIC/IOMMU options.

Bruno


* RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-26 21:42 ` Bruno Prémont
@ 2012-11-27  0:50   ` Justin Piszcz
  2012-11-27  0:56   ` Bjorn Helgaas
  1 sibling, 0 replies; 24+ messages in thread
From: Justin Piszcz @ 2012-11-27  0:50 UTC (permalink / raw)
  To: 'Bruno Prémont'; +Cc: support, linux-kernel



> [    0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00
> [    0.696487] ioatdma 0000:00:04.0: channel error register unreachable
> I assume this is something Supermicro has to fix?

> You are probably missing some kernel config option(s) :) - I did fight similar
> issues on a Fujitsu SandyBridge Xeon based server.
>
> Check if enabling CONFIG_X86_X2APIC helps as well as other APIC/IOMMU options.
>
> Bruno

=> Enabled:
CONFIG_IOMMU_SUPPORT
CONFIG_INTEL_IOMMU
CONFIG_INTEL_IOMMU_DEFAULT_ON
CONFIG_IRQ_REMAP

Also tried enabling NUMA, etc:

[    0.330998] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[    0.331068] ACPI: bus type pci registered

[    0.615234] ACPI: Dynamic OEM Table Load:
[    0.615373] ACPI: PRAD           (null) 000BE (v02 PRADID  PRADTID 00000001 MSFT 04000000)
[    0.615631] \_SB_:_OSC invalid UUID
[    0.615633] _OSC request data:1 7


[    0.663138] pci 0000:ff:13.5: [8086:3c44] type 00 class 0x110100
[    0.663170] pci 0000:ff:13.6: [8086:3c45] type 00 class 0x088000
[    0.663211]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
[    0.663281]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)

:(

Justin.





* Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-26 21:42 ` Bruno Prémont
  2012-11-27  0:50   ` Justin Piszcz
@ 2012-11-27  0:56   ` Bjorn Helgaas
  2012-11-27  1:00     ` Bjorn Helgaas
  2012-11-27  1:11     ` Dan Williams
  1 sibling, 2 replies; 24+ messages in thread
From: Bjorn Helgaas @ 2012-11-27  0:56 UTC (permalink / raw)
  To: Bruno Prémont; +Cc: Justin Piszcz, support, linux-kernel, Dan Williams

[+cc Dan]

On Mon, Nov 26, 2012 at 2:42 PM, Bruno Prémont
<bonbons@linux-vserver.org> wrote:
> You are probably missing some kernel config option(s) :) - I did fight similar
> issues on a Fujitsu SandyBridge Xeon based server.
>
> Check if enabling CONFIG_X86_X2APIC helps as well as other APIC/IOMMU options.

Changing config options is not a valid fix for error messages like
this.  We should be able to make the config smarter by adding
dependencies or something, or else make the driver smart enough to
give a more useful diagnostic.

The "channel error register unreachable" message indicates that
pci_read_config_dword() failed.  The register in question
(IOAT_PCI_CHANERR_INT_OFFSET) is at 0x180, so possibly we don't have
PCI config accessors for the extended config space (0x100-0xfff).  A
complete dmesg log should show that.
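
As a rough illustration of the point above (not part of the original message): on Linux, the size of a device's sysfs `config` file normally reflects how much configuration space the kernel can reach, 256 bytes for conventional space and 4096 bytes when extended space is accessible. The Python sketch below checks whether a 4-byte read at an offset such as 0x180 fits inside that range. The helper names are hypothetical, and the device address is simply the first function from the log.

```python
import os

CONVENTIONAL_SIZE = 0x100   # 256 bytes: offsets 0x00-0xff
EXTENDED_SIZE = 0x1000      # 4096 bytes: offsets 0x000-0xfff


def dword_reachable(offset, config_size):
    """Return True if a 4-byte read at `offset` fits inside `config_size` bytes."""
    return offset >= 0 and offset + 4 <= config_size


def sysfs_config_size(bdf):
    """Size in bytes of the sysfs config file for a device like '0000:00:04.0'."""
    return os.path.getsize(f"/sys/bus/pci/devices/{bdf}/config")


if __name__ == "__main__":
    chanerr_int = 0x180      # offset quoted in the message above
    bdf = "0000:00:04.0"     # first I/OAT function from the log
    try:
        size = sysfs_config_size(bdf)
    except OSError as exc:
        # Device absent on this machine, or sysfs not mounted.
        print(f"cannot stat config space for {bdf}: {exc}")
    else:
        ok = dword_reachable(chanerr_int, size)
        print(f"{bdf}: config size {size}, offset {chanerr_int:#x} reachable: {ok}")
```

If the reported size is 256, a read at 0x180 cannot succeed, which would match the "channel error register unreachable" message.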


* Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-27  0:56   ` Bjorn Helgaas
@ 2012-11-27  1:00     ` Bjorn Helgaas
  2012-11-27  1:00       ` Justin Piszcz
  2012-11-27  1:11     ` Dan Williams
  1 sibling, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2012-11-27  1:00 UTC (permalink / raw)
  To: Bruno Prémont; +Cc: Justin Piszcz, support, linux-kernel, Dan Williams

[Try Dan's current email address; sorry Dan]

On Mon, Nov 26, 2012 at 5:56 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Changing config options is not a valid fix for error messages like
> this.  We should be able to make the config smarter by adding
> dependencies or something, or else make the driver smart enough to
> give a more useful diagnostic.
>
> The "channel error register unreachable" message indicates that
> pci_read_config_dword() failed.  The register in question
> (IOAT_PCI_CHANERR_INT_OFFSET) is at 0x180, so possibly we don't have
> PCI config accessors for the extended config space (0x100-0xfff).  A
> complete dmesg log should show that.


* RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-27  1:00     ` Bjorn Helgaas
@ 2012-11-27  1:00       ` Justin Piszcz
  2012-11-27  1:11         ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Justin Piszcz @ 2012-11-27  1:00 UTC (permalink / raw)
  To: 'Bjorn Helgaas', 'Bruno Prémont'
  Cc: support, linux-kernel, 'Dan Williams'



-----Original Message-----
From: Bjorn Helgaas [mailto:bhelgaas@google.com] 
Sent: Monday, November 26, 2012 8:00 PM
To: Bruno Prémont
Cc: Justin Piszcz; support@supermicro.com; linux-kernel@vger.kernel.org; Dan
Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
bug question

[Try Dan's current email address; sorry Dan]

On Mon, Nov 26, 2012 at 5:56 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Changing config options is not a valid fix for error messages like
> this.  We should be able to make the config smarter by adding
> dependencies or something, or else make the driver smart enough to
> give a more useful diagnostic.
>
> The "channel error register unreachable" message indicates that
> pci_read_config_dword() failed.  The register in question
> (IOAT_PCI_CHANERR_INT_OFFSET) is at 0x180, so possibly we don't have
> PCI config accessors for the extended config space (0x100-0xfff).  A
> complete dmesg log should show that.

--

Here is the full dmesg: (I went back to my older kernel, let me know if you
need a dmesg w/ those options enabled)
http://home.comcast.net/~jpiszcz/20121126/dmesg.txt

Justin.



* Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-27  0:56   ` Bjorn Helgaas
  2012-11-27  1:00     ` Bjorn Helgaas
@ 2012-11-27  1:11     ` Dan Williams
  1 sibling, 0 replies; 24+ messages in thread
From: Dan Williams @ 2012-11-27  1:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bruno Prémont, Justin Piszcz, support, linux-kernel,
	Dan Williams, dave.jiang

On Mon, Nov 26, 2012 at 4:56 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> The "channel error register unreachable" message indicates that
> pci_read_config_dword() failed.  The register in question
> (IOAT_PCI_CHANERR_INT_OFFSET) is at 0x180, so possibly we don't have
> PCI config accessors for the extended config space (0x100-0xfff).  A
> complete dmesg log should show that.

Yes, this happens when extended PCI configuration space is not
reachable.  However, access to this register has since been found to
be unnecessary.  So, it appears this patch [1] from Dave should be
modified to just stop touching that register altogether and then go
to -stable.

--
Dan

[1]: http://marc.info/?l=linux-kernel&m=135310841032707&w=2


* Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-27  1:00       ` Justin Piszcz
@ 2012-11-27  1:11         ` Bjorn Helgaas
  2012-11-27 13:33           ` Justin Piszcz
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2012-11-27  1:11 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Bruno Prémont, support, linux-kernel, Dan Williams

On Mon, Nov 26, 2012 at 6:00 PM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>> Changing config options is not a valid fix for error messages like
>> this.  We should be able to make the config smarter by adding
>> dependencies or something, or else make the driver smart enough to
>> give a more useful diagnostic.
>>
>> The "channel error register unreachable" message indicates that
>> pci_read_config_dword() failed.  The register in question
>> (IOAT_PCI_CHANERR_INT_OFFSET) is at 0x180, so possibly we don't have
>> PCI config accessors for the extended config space (0x100-0xfff).  A
>> complete dmesg log should show that.
>
> --
>
> Here is the full dmesg: (I went back to my older kernel, let me know if you
> need a dmesg w/ those options enabled)
> http://home.comcast.net/~jpiszcz/20121126/dmesg.txt

It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?
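
One way to check a question like this without reading the whole config by hand is sketched below in Python. It parses kernel `.config` text and reports whether an option is built in or modular. This is an illustration only: the helper names are hypothetical, the sample text is invented, and where the running kernel's config can be found (for example `/boot/config-<version>`, or `/proc/config.gz` on kernels that expose it) depends on the distribution and build.

```python
import re

# "CONFIG_FOO=y", "CONFIG_BAR=m", "CONFIG_BAZ=\"string\"", ...
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
# "# CONFIG_FOO is not set"
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_kconfig(text):
    """Map option name -> value string, or None for 'is not set' lines."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = None
    return options


def is_enabled(options, name):
    """True if the option is built in ('y') or modular ('m')."""
    return options.get(name) in ("y", "m")


if __name__ == "__main__":
    # Invented sample, not Justin's actual config.
    sample = """\
CONFIG_PCI=y
# CONFIG_PCI_MMCONFIG is not set
CONFIG_INTEL_IOATDMA=y
"""
    opts = parse_kconfig(sample)
    for name in ("CONFIG_PCI_MMCONFIG", "CONFIG_INTEL_IOATDMA"):
        print(name, "enabled" if is_enabled(opts, name) else "not enabled")
```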


* RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-27  1:11         ` Bjorn Helgaas
@ 2012-11-27 13:33           ` Justin Piszcz
  2012-11-27 13:49             ` Justin Piszcz
  0 siblings, 1 reply; 24+ messages in thread
From: Justin Piszcz @ 2012-11-27 13:33 UTC (permalink / raw)
  To: 'Bjorn Helgaas'
  Cc: 'Bruno Prémont',
	support, linux-kernel, 'Dan Williams'



-----Original Message-----
From: Bjorn Helgaas [mailto:bhelgaas@google.com] 
Sent: Monday, November 26, 2012 8:12 PM
To: Justin Piszcz
Cc: Bruno Prémont; support@supermicro.com; linux-kernel@vger.kernel.org; Dan
Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
bug question

It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?

Hi,

I am trying this on two Supermicro boards. On the other system (X8DTH-6F),
with all of these options enabled, the system does not boot: it cannot talk
to the SATA boot drive.

" 5520 chips built in, the X8DTH-6/X8DTH-6F/X8DTH-i/X8DTH-iF offers ......
The Intel I/OAT (I/O Acceleration Technology) significantly reduces CPU
overhead by ..."

When the following options are enabled, the system does not boot:

+CONFIG_HAVE_INTEL_TXT=y
+CONFIG_IOMMU_API=y
+CONFIG_IOMMU_SUPPORT=y
+CONFIG_DMAR_TABLE=y
+CONFIG_INTEL_IOMMU=y
+CONFIG_INTEL_IOMMU_DEFAULT_ON=y
+CONFIG_INTEL_IOMMU_FLOPPY_WA=y

It fails like so:

(Fails to talk to the SSD)
http://home.comcast.net/~jpiszcz/20121127/photo1-resize.jpg

(then, a few moments later: Kernel panic)
http://home.comcast.net/~jpiszcz/20121127/photo2-resize.jpg

With those options disabled, the system boots (and always has booted fine).
Is there a certain combination of parameters that allows I/OAT to be enabled
_and_ allow the system to boot?

Justin.






* RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-27 13:33           ` Justin Piszcz
@ 2012-11-27 13:49             ` Justin Piszcz
  2012-11-27 13:56               ` Justin Piszcz
                                 ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Justin Piszcz @ 2012-11-27 13:49 UTC (permalink / raw)
  To: 'Bjorn Helgaas'
  Cc: 'Bruno Prémont',
	support, linux-kernel, 'Dan Williams'


> It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?

===> FOR I/OAT DMA
Latest status: it _appears_ it's working on the X9SRL-F now, thank you!

1) Supermicro X9SRL-F (GOOD)
[    0.738510] ioatdma: Intel(R) QuickData Technology Driver 4.00
[    0.738719] ioatdma 0000:00:04.0: irq 75 for MSI/MSI-X
[    0.739088] ioatdma 0000:00:04.1: irq 76 for MSI/MSI-X
[    0.739408] ioatdma 0000:00:04.2: irq 77 for MSI/MSI-X
[    0.739739] ioatdma 0000:00:04.3: irq 78 for MSI/MSI-X
[    0.740040] ioatdma 0000:00:04.4: irq 79 for MSI/MSI-X
[    0.740342] ioatdma 0000:00:04.5: irq 80 for MSI/MSI-X
[    0.740670] ioatdma 0000:00:04.6: irq 81 for MSI/MSI-X
[    0.740971] ioatdma 0000:00:04.7: irq 82 for MSI/MSI-X

It is _not_ working on the:

2) Supermicro X8DTH-6F. The boot drive in this system runs off a PCI-e
card; could the IRQ for the I/O controller be getting remapped and failing?
Worst case, I can move the SSD from the 6.0gbps SATA card to the motherboard
and see if that works, but that kind of defeats the purpose of a 6.0gbps
SATA SSD.

(Fails to talk to the SSD)
http://home.comcast.net/~jpiszcz/20121127/photo1-resize.jpg

(then, a few moments later: Kernel panic)
http://home.comcast.net/~jpiszcz/20121127/photo2-resize.jpg

Would be curious if anyone had any suggestions besides removing the
controller card?

--


==> Further issues with the X9SRL-F -- does this board support ASPM or is
this a Linux/ASPM implementation issue?
[    0.632170]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
[    0.632239]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)

Justin.
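
For readers trying to interpret the `_OSC support mask: 0x08` value, here is a small Python sketch that decodes such a mask. The bit assignments are an assumption based on the PCI host bridge `_OSC` support field as commonly documented and should be checked against the PCI Firmware Specification or the kernel headers for the kernel in use. The names `OSC_SUPPORT_BITS` and `decode_osc_support` are hypothetical.

```python
# Assumed bit layout of the _OSC "support" field for a PCI host bridge.
# Verify against the PCI Firmware Specification before relying on it.
OSC_SUPPORT_BITS = {
    0x01: "extended PCI config operation regions",
    0x02: "ASPM",
    0x04: "clock power management",
    0x08: "PCI segment groups",
    0x10: "MSI",
}


def decode_osc_support(mask):
    """Return the names of the capabilities whose bits are set in `mask`."""
    return [name for bit, name in sorted(OSC_SUPPORT_BITS.items()) if mask & bit]


if __name__ == "__main__":
    mask = 0x08  # value from the log above
    print(f"{mask:#04x}: {decode_osc_support(mask)}")
```

Under that assumed layout, 0x08 would mean the kernel advertised only segment-group support to the firmware, with neither extended config space nor ASPM declared.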








* RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-27 13:49             ` Justin Piszcz
@ 2012-11-27 13:56               ` Justin Piszcz
  2012-11-27 14:35                 ` Justin Piszcz
  2012-11-28 23:54               ` Bjorn Helgaas
  2012-11-29  0:34               ` Robert Hancock
  2 siblings, 1 reply; 24+ messages in thread
From: Justin Piszcz @ 2012-11-27 13:56 UTC (permalink / raw)
  To: 'Bjorn Helgaas'
  Cc: 'Bruno Prémont',
	support, linux-kernel, 'Dan Williams'


> It is _not_ working on the:

> 2) Supermicro X8DTH-F (the boot drive in this system is running off a PCI-e
> card, could the IRQ for the I/O controller be getting re-mapped and fail?)--
> worst case I can move the SSD from the 6.0gbps SATA card to the motherboard
> and see if that works, but that kind of defeats the purpose of a 6.0gbps
> SATA SSD.

When IOMMU is disabled, I/OAT DMA is successful on the second motherboard
(X8DTH-6F).
Specifically:

--- DMA Engine support
[*]   Intel I/OAT DMA support
[*]   Network: TCP receive copy offload   
[*]   Async_tx: Offload support for the async_tx api

When IOMMU/X2APIC is enabled on the X8DTH-6F it fails to boot.
Will keep doing more testing to see if I get anywhere w/regards to the
IOMMU.

Proof of success:

[    0.757467] ioatdma: Intel(R) QuickData Technology Driver 4.00
[    0.757690] ioatdma 0000:00:16.0: irq 88 for MSI/MSI-X
[    0.757948] ioatdma 0000:00:16.1: irq 89 for MSI/MSI-X
[    0.758166] ioatdma 0000:00:16.2: irq 90 for MSI/MSI-X
[    0.758377] ioatdma 0000:00:16.3: irq 91 for MSI/MSI-X
[    0.758577] ioatdma 0000:00:16.4: irq 92 for MSI/MSI-X
[    0.758794] ioatdma 0000:00:16.5: irq 93 for MSI/MSI-X
[    0.759000] ioatdma 0000:00:16.6: irq 94 for MSI/MSI-X
[    0.759214] ioatdma 0000:00:16.7: irq 95 for MSI/MSI-X
[    0.759461] ioatdma 0000:80:16.0: irq 96 for MSI/MSI-X
[    0.759720] ioatdma 0000:80:16.1: irq 97 for MSI/MSI-X
[    0.759963] ioatdma 0000:80:16.2: irq 98 for MSI/MSI-X
[    0.760190] ioatdma 0000:80:16.3: irq 99 for MSI/MSI-X
[    0.760414] ioatdma 0000:80:16.4: irq 100 for MSI/MSI-X
[    0.760630] ioatdma 0000:80:16.5: irq 101 for MSI/MSI-X
[    0.760862] ioatdma 0000:80:16.6: irq 102 for MSI/MSI-X
[    0.761081] ioatdma 0000:80:16.7: irq 103 for MSI/MSI-X

--


==> Further issues with the X9SRL-F -- does this board support ASPM or is
this a Linux/ASPM implementation issue?
[    0.632170]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
[    0.632239]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)

Justin.








^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-27 13:56               ` Justin Piszcz
@ 2012-11-27 14:35                 ` Justin Piszcz
  2012-11-29  0:08                   ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Justin Piszcz @ 2012-11-27 14:35 UTC (permalink / raw)
  To: 'Bjorn Helgaas'
  Cc: 'Bruno Prémont',
	support, linux-kernel, 'Dan Williams'



-----Original Message-----
From: Justin Piszcz [mailto:jpiszcz@lucidpixels.com] 
Sent: Tuesday, November 27, 2012 8:56 AM
To: 'Bjorn Helgaas'
Cc: 'Bruno Prémont'; support@supermicro.com; linux-kernel@vger.kernel.org;
'Dan Williams'
Subject: RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
bug question


> It is _not_ working on the:

> 2) Supermicro X8DTH-F (the boot drive in this system is running off a PCI-e
> card, could the IRQ for the I/O controller be getting re-mapped and fail?)--
> worst case I can move the SSD from the 6.0gbps SATA card to the motherboard
> and see if that works, but that kind of defeats the purpose of a 6.0gbps
> SATA SSD.

When I removed the Highpoint 2-port SATA card and plugged the SSD into the
motherboard, the system boots.
So if you use a HIGHPOINT 2-PORT SATA 6.0gbps card, do NOT enable IOMMU or
it will fail to initialize the Highpoint 2-port SATA controller card!
I also tried upgrading the BIOS of the mobo (no diff).
I also tried just leaving the SATA card in and plugging the SSD into the
motherboard (no diff).
Removing the Highpoint 2-port SATA card is what brought success. It would be
nice to use that card with IOMMU support though: is it just not compatible
(marvell-problem?) or is it a driver bug, based on the pictures/etc sent
earlier?

$ dmesg|grep -i iommu
[    0.055134] dmar: IOMMU 0: reg_base_addr cfdfe000 ver 1:0 cap c90780106f0462 ecap f020f6
[    0.055396] dmar: IOMMU 1: reg_base_addr fecfe000 ver 1:0 cap c90780106f0462 ecap f020f6
[    0.760665] IOMMU 0 0xcfdfe000: using Queued invalidation
[    0.760803] IOMMU 1 0xfecfe000: using Queued invalidation
[    0.760937] IOMMU: Setting RMRR:
[    0.761102] IOMMU: Setting identity map for device 0000:00:1d.0 [0xbf7ec000 - 0xbf7fffff]
[    0.761329] IOMMU: Setting identity map for device 0000:00:1d.1 [0xbf7ec000 - 0xbf7fffff]
[    0.761542] IOMMU: Setting identity map for device 0000:00:1d.2 [0xbf7ec000 - 0xbf7fffff]
[    0.761758] IOMMU: Setting identity map for device 0000:00:1d.7 [0xbf7ec000 - 0xbf7fffff]
[    0.761974] IOMMU: Setting identity map for device 0000:00:1a.0 [0xbf7ec000 - 0xbf7fffff]
[    0.762190] IOMMU: Setting identity map for device 0000:00:1a.1 [0xbf7ec000 - 0xbf7fffff]
[    0.762407] IOMMU: Setting identity map for device 0000:00:1a.2 [0xbf7ec000 - 0xbf7fffff]
[    0.762620] IOMMU: Setting identity map for device 0000:00:1a.7 [0xbf7ec000 - 0xbf7fffff]
[    0.762816] IOMMU: Setting identity map for device 0000:00:1d.0 [0xec000 - 0xeffff]
[    0.763010] IOMMU: Setting identity map for device 0000:00:1d.1 [0xec000 - 0xeffff]
[    0.763197] IOMMU: Setting identity map for device 0000:00:1d.2 [0xec000 - 0xeffff]
[    0.763382] IOMMU: Setting identity map for device 0000:00:1d.7 [0xec000 - 0xeffff]
[    0.763567] IOMMU: Setting identity map for device 0000:00:1a.0 [0xec000 - 0xeffff]
[    0.763749] IOMMU: Setting identity map for device 0000:00:1a.1 [0xec000 - 0xeffff]
[    0.763934] IOMMU: Setting identity map for device 0000:00:1a.2 [0xec000 - 0xeffff]
[    0.764127] IOMMU: Setting identity map for device 0000:00:1a.7 [0xec000 - 0xeffff]
[    0.764311] IOMMU: Prepare 0-16MiB unity mapping for LPC
[    0.764465] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]

--


==> Further issues with the X9SRL-F -- does this board support ASPM or is
this a Linux/ASPM implementation issue?
[    0.632170]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
[    0.632239]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)

Justin.









^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-27 13:49             ` Justin Piszcz
  2012-11-27 13:56               ` Justin Piszcz
@ 2012-11-28 23:54               ` Bjorn Helgaas
  2012-11-29  0:48                 ` Justin Piszcz
  2012-11-29  0:34               ` Robert Hancock
  2 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2012-11-28 23:54 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Bruno Prémont, support, linux-kernel, Dan Williams

On Tue, Nov 27, 2012 at 6:49 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>
>> It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?
>
> ===> FOR I/OAT DMA
> Latest status, it _appears_ it's working on the X9SRL-F now, thank you!
>
> 1) Supermicro X9SRL-F (GOOD)
> [    0.738510] ioatdma: Intel(R) QuickData Technology Driver 4.00
> [    0.738719] ioatdma 0000:00:04.0: irq 75 for MSI/MSI-X
> [    0.739088] ioatdma 0000:00:04.1: irq 76 for MSI/MSI-X
> [    0.739408] ioatdma 0000:00:04.2: irq 77 for MSI/MSI-X
> [    0.739739] ioatdma 0000:00:04.3: irq 78 for MSI/MSI-X
> [    0.740040] ioatdma 0000:00:04.4: irq 79 for MSI/MSI-X
> [    0.740342] ioatdma 0000:00:04.5: irq 80 for MSI/MSI-X
> [    0.740670] ioatdma 0000:00:04.6: irq 81 for MSI/MSI-X
> [    0.740971] ioatdma 0000:00:04.7: irq 82 for MSI/MSI-X

Good.  You have two issues, and I'm going to separate them and only
address the first one here.  I opened a bug report [1] against the
IOAT driver.  It should do something more useful when
CONFIG_PCI_MMCONFIG=n so we don't have to debug this again in the
future.  But otherwise, it sounds like this issue is resolved.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=51101
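
For anyone who hits the same ioatdma "channel error register unreachable" messages, the option discussed above can be checked in the kernel config before rebuilding. What follows is only a minimal Python sketch and not part of the original exchange: the helper name `config_option_state` and the sample config text are made up for illustration. On a real system the text would typically come from /boot/config-$(uname -r), or from /proc/config.gz if the kernel exposes it.

```python
import re

def config_option_state(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...),
    or None when the option is absent or marked '# ... is not set'."""
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None

# Hypothetical config excerpt, for illustration only.
SAMPLE = """\
CONFIG_PCI=y
# CONFIG_PCI_MMCONFIG is not set
CONFIG_INTEL_IOATDMA=y
"""

print(config_option_state(SAMPLE, "CONFIG_PCI_MMCONFIG"))   # None
print(config_option_state(SAMPLE, "CONFIG_INTEL_IOATDMA"))  # y
```

A result of None for CONFIG_PCI_MMCONFIG alongside an enabled I/OAT driver matches the situation described in this thread.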

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-27 14:35                 ` Justin Piszcz
@ 2012-11-29  0:08                   ` Bjorn Helgaas
  2012-11-29  0:49                     ` Justin Piszcz
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2012-11-29  0:08 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Bruno Prémont, support, linux-kernel, Dan Williams

On Tue, Nov 27, 2012 at 7:35 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>
>
> -----Original Message-----
> From: Justin Piszcz [mailto:jpiszcz@lucidpixels.com]
> Sent: Tuesday, November 27, 2012 8:56 AM
> To: 'Bjorn Helgaas'
> Cc: 'Bruno Prémont'; support@supermicro.com; linux-kernel@vger.kernel.org;
> 'Dan Williams'
> Subject: RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
> bug question
>
>
>> It is _not_ working on the:
>
>> 2) Supermicro X8DTH-F (the boot drive in this system is running off a PCI-e
>> card, could the IRQ for the I/O controller be getting re-mapped and fail?)--
>> worst case I can move the SSD from the 6.0gbps SATA card to the motherboard
>> and see if that works, but that kind of defeats the purpose of a 6.0gbps
>> SATA SSD.
>
> When I removed the Highpoint 2-port SATA card and plugged it into the
> motherboard, the system boots (plugged the SSD into the motherboard).
> So if you use a HIGHPOINT 2-PORT SATA 6.0gbps card, do NOT enable IOMMU or
> it will fail to initialize the Highpoint 2-port SATA controller card!
> I also tried upgrading the BIOS (of the mobo, no diff)
> I also tried just leaving the SATA card in and plugging it into the
> motherboard (no diff)
> Removed the Highpoint 2-port SATA card and then success, it would be nice to
> use that card with IOMMU support though, is it just not compatible
> (marvell-problem?) or is a driver bug?  Based on the pictures/etc sent
> earlier?

I would guess this is a core bug, but it's hard to tell without more
information.

If you boot with "intel_iommu=off", I would guess the Highpoint card
would work (this should have the same effect as turning off
CONFIG_INTEL_IOMMU).  I'd like to compare the complete dmesg log for
that boot with the one that fails.

It sounds like it might be hard to collect the log for the failing
case -- you said the boot fails when the Highpoint card is in the
system even if the SSD is connected to the motherboard instead of the
Highpoint card.  The panic in the photo2 image looks like it's just a
failure to mount the root filesystem, which is what I'd expect if we
can't find the SSD.  It seems like we ought to be able to *boot* with
the SSD connected to the motherboard, even if the Highpoint card
doesn't work.  But worst-case, a video of the failing boot might be
enough, especially if you can slow it down with "boot_delay="
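
The comparison asked for above (a boot with "intel_iommu=off" against a failing boot) is easier once the timestamps are stripped, since they differ on every boot. The following is a rough Python sketch under that assumption. The function names and the two short sample logs are illustrative only, with the sample lines borrowed from messages in this thread rather than from two real matching boots.

```python
import difflib
import re

# Matches the leading "[    0.757467] " printk timestamp.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")

def strip_timestamps(log_text):
    """Drop the printk timestamp from every line of a dmesg dump."""
    return [TIMESTAMP.sub("", line) for line in log_text.splitlines()]

def diff_boot_logs(good_log, bad_log):
    """Return only the added ('+') and removed ('-') lines between two logs."""
    good = strip_timestamps(good_log)
    bad = strip_timestamps(bad_log)
    diff = difflib.unified_diff(good, bad, "iommu-off", "iommu-on",
                                lineterm="", n=0)
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

# Illustrative samples only; real input would be two complete dmesg captures.
GOOD = "[    0.757467] ioatdma: Intel(R) QuickData Technology Driver 4.00\n"
BAD = ("[    0.738510] ioatdma: Intel(R) QuickData Technology Driver 4.00\n"
       "[    7.771060] ata14.00: qc timeout (cmd 0xa1)\n")

for line in diff_boot_logs(GOOD, BAD):
    print(line)  # +ata14.00: qc timeout (cmd 0xa1)
```

Lines that appear only in the failing boot show up with a leading "+", which narrows down where the two boots diverge.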

> $ dmesg|grep -i iommu
> [    0.055134] dmar: IOMMU 0: reg_base_addr cfdfe000 ver 1:0 cap
> c90780106f0462 ecap f020f6
> [    0.055396] dmar: IOMMU 1: reg_base_addr fecfe000 ver 1:0 cap
> c90780106f0462 ecap f020f6
> [    0.760665] IOMMU 0 0xcfdfe000: using Queued invalidation
> [    0.760803] IOMMU 1 0xfecfe000: using Queued invalidation
> [    0.760937] IOMMU: Setting RMRR:
> [    0.761102] IOMMU: Setting identity map for device 0000:00:1d.0
> [0xbf7ec000 - 0xbf7fffff]
> [    0.761329] IOMMU: Setting identity map for device 0000:00:1d.1
> [0xbf7ec000 - 0xbf7fffff]
> [    0.761542] IOMMU: Setting identity map for device 0000:00:1d.2
> [0xbf7ec000 - 0xbf7fffff]
> [    0.761758] IOMMU: Setting identity map for device 0000:00:1d.7
> [0xbf7ec000 - 0xbf7fffff]
> [    0.761974] IOMMU: Setting identity map for device 0000:00:1a.0
> [0xbf7ec000 - 0xbf7fffff]
> [    0.762190] IOMMU: Setting identity map for device 0000:00:1a.1
> [0xbf7ec000 - 0xbf7fffff]
> [    0.762407] IOMMU: Setting identity map for device 0000:00:1a.2
> [0xbf7ec000 - 0xbf7fffff]
> [    0.762620] IOMMU: Setting identity map for device 0000:00:1a.7
> [0xbf7ec000 - 0xbf7fffff]
> [    0.762816] IOMMU: Setting identity map for device 0000:00:1d.0 [0xec000
> - 0xeffff]
> [    0.763010] IOMMU: Setting identity map for device 0000:00:1d.1 [0xec000
> - 0xeffff]
> [    0.763197] IOMMU: Setting identity map for device 0000:00:1d.2 [0xec000
> - 0xeffff]
> [    0.763382] IOMMU: Setting identity map for device 0000:00:1d.7 [0xec000
> - 0xeffff]
> [    0.763567] IOMMU: Setting identity map for device 0000:00:1a.0 [0xec000
> - 0xeffff]
> [    0.763749] IOMMU: Setting identity map for device 0000:00:1a.1 [0xec000
> - 0xeffff]
> [    0.763934] IOMMU: Setting identity map for device 0000:00:1a.2 [0xec000
> - 0xeffff]
> [    0.764127] IOMMU: Setting identity map for device 0000:00:1a.7 [0xec000
> - 0xeffff]
> [    0.764311] IOMMU: Prepare 0-16MiB unity mapping for LPC
> [    0.764465] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 -
> 0xffffff]
>
> --
>
>
> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
> this a Linux/ASPM implementation issue?
> [    0.632170]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
> [    0.632239]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)

I'm going to ignore this issue for the time being.  I know we complain
about this on many machines, and I don't know whether it's a real
problem or just an overly alarming message.
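
As a reading aid for the "_OSC support mask: 0x08" in the message quoted above, here is a small Python sketch that decodes the mask. The bit table is an assumption on my part, based on the _OSC Support field in the PCI Firmware Specification and the matching OSC_*_SUPPORT definitions in Linux; check it against include/linux/acpi.h for the kernel in question before relying on it.

```python
# Assumed layout of the _OSC "Support" field (verify against your kernel headers).
OSC_SUPPORT_BITS = {
    0x01: "Extended PCI config operation regions",
    0x02: "Active State Power Management (ASPM)",
    0x04: "Clock Power Management capability",
    0x08: "PCI segment groups",
    0x10: "MSI",
}

def decode_osc_support(mask):
    """List the features whose bits are set in an _OSC support mask."""
    return [name for bit, name in sorted(OSC_SUPPORT_BITS.items())
            if mask & bit]

print(decode_osc_support(0x08))  # ['PCI segment groups']
```

Under that reading, a mask of 0x08 means only segment group support was advertised to the firmware, and the ASPM bit was not part of the request.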

Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-27 13:49             ` Justin Piszcz
  2012-11-27 13:56               ` Justin Piszcz
  2012-11-28 23:54               ` Bjorn Helgaas
@ 2012-11-29  0:34               ` Robert Hancock
  2012-11-29  0:49                 ` Justin Piszcz
  2 siblings, 1 reply; 24+ messages in thread
From: Robert Hancock @ 2012-11-29  0:34 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: 'Bjorn Helgaas', 'Bruno Prémont',
	support, linux-kernel, 'Dan Williams'

On 11/27/2012 07:49 AM, Justin Piszcz wrote:
>
>> It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?
>
> ===> FOR I/OAT DMA
> Latest status, it _appears_ it's working on the X9SRL-F now, thank you!
>
> 1) Supermicro X9SRL-F (GOOD)
> [    0.738510] ioatdma: Intel(R) QuickData Technology Driver 4.00
> [    0.738719] ioatdma 0000:00:04.0: irq 75 for MSI/MSI-X
> [    0.739088] ioatdma 0000:00:04.1: irq 76 for MSI/MSI-X
> [    0.739408] ioatdma 0000:00:04.2: irq 77 for MSI/MSI-X
> [    0.739739] ioatdma 0000:00:04.3: irq 78 for MSI/MSI-X
> [    0.740040] ioatdma 0000:00:04.4: irq 79 for MSI/MSI-X
> [    0.740342] ioatdma 0000:00:04.5: irq 80 for MSI/MSI-X
> [    0.740670] ioatdma 0000:00:04.6: irq 81 for MSI/MSI-X
> [    0.740971] ioatdma 0000:00:04.7: irq 82 for MSI/MSI-X
>
> It is _not_ working on the:
>
> 2) Supermicro X8DTH-F (the boot drive in this system is running off a PCI-e
> card, could the IRQ for the I/O controller be getting re-mapped and fail?)--
> worst case I can move the SSD from the 6.0gbps SATA card to the motherboard
> and see if that works, but that kind of defeats the purpose of a 6.0gbps
> SATA SSD.
>
> (Fails to talk to the SSD)
> http://home.comcast.net/~jpiszcz/20121127/photo1-resize.jpg
>
> (then, a few moments later: Kernel panic)
> http://home.comcast.net/~jpiszcz/20121127/photo2-resize.jpg
>
> Would be curious if anyone had any suggestions besides removing the
> controller card?

What does lspci -vv show on that controller? Not sure what actual 
chipset that controller is, but there's a known issue with some Marvell 
6Gbps SATA controllers with DMAR enabled - it seems the device issues 
memory read/write requests from the wrong PCI function ID and the IOMMU 
rightly denies access as the function listed in the requests doesn't 
have any mapping to that memory. I don't think there's presently a 
workaround other than disabling DMAR. We could (and likely should) be 
detecting that device and adding some kind of quirk for it.
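
To make the failure mode described above concrete, here is a toy Python model of an IOMMU that keys its mappings on the PCIe requester ID (bus, device, function). It is only a sketch and not how the kernel's DMAR code is structured: the `ToyIommu` class is hypothetical, and function 1 is picked arbitrarily as the "wrong" function, since the message does not say which function the device actually uses.

```python
def parse_bdf(bdf):
    """Split a 'bus:dev.fn' string (hex fields) into three integers."""
    bus, rest = bdf.split(":")
    dev, fn = rest.split(".")
    return int(bus, 16), int(dev, 16), int(fn, 16)

def requester_id(bdf):
    """16-bit PCIe requester ID: bus in bits 15:8, device 7:3, function 2:0."""
    bus, dev, fn = parse_bdf(bdf)
    return (bus << 8) | (dev << 3) | fn

class ToyIommu:
    """Hypothetical model: each requester ID has its own set of mapped pages."""

    def __init__(self):
        self.mappings = {}

    def map_page(self, bdf, addr):
        self.mappings.setdefault(requester_id(bdf), set()).add(addr)

    def dma_allowed(self, bdf, addr):
        return addr in self.mappings.get(requester_id(bdf), set())

iommu = ToyIommu()
iommu.map_page("84:00.0", 0x1000)            # driver maps a buffer for function 0
print(iommu.dma_allowed("84:00.0", 0x1000))  # True
print(iommu.dma_allowed("84:00.1", 0x1000))  # False: request tagged with another function
```

The second lookup fails even though the buffer is mapped, because the mapping belongs to a different requester ID. That is the sense in which the IOMMU "rightly denies access".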

>
> --
>
>
> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
> this a Linux/ASPM implementation issue?
> [    0.632170]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
> [    0.632239]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)

What's the full dmesg from this machine (or is it already posted somewhere)?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-28 23:54               ` Bjorn Helgaas
@ 2012-11-29  0:48                 ` Justin Piszcz
  0 siblings, 0 replies; 24+ messages in thread
From: Justin Piszcz @ 2012-11-29  0:48 UTC (permalink / raw)
  To: 'Bjorn Helgaas', 'Robert Hancock'
  Cc: 'Bruno Prémont',
	support, linux-kernel, 'Dan Williams'



-----Original Message-----
From: Bjorn Helgaas [mailto:bhelgaas@google.com] 
Sent: Wednesday, November 28, 2012 6:54 PM
To: Justin Piszcz
Cc: Bruno Prémont; support@supermicro.com; linux-kernel@vger.kernel.org; Dan
Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
bug question

On Tue, Nov 27, 2012 at 6:49 AM, Justin Piszcz <jpiszcz@lucidpixels.com>
wrote:
>
>> It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?
>
> ===> FOR I/OAT DMA
> Latest status, it _appears_ it's working on the X9SRL-F now, thank you!
>
> 1) Supermicro X9SRL-F (GOOD)
> [    0.738510] ioatdma: Intel(R) QuickData Technology Driver 4.00
> [    0.738719] ioatdma 0000:00:04.0: irq 75 for MSI/MSI-X
> [    0.739088] ioatdma 0000:00:04.1: irq 76 for MSI/MSI-X
> [    0.739408] ioatdma 0000:00:04.2: irq 77 for MSI/MSI-X
> [    0.739739] ioatdma 0000:00:04.3: irq 78 for MSI/MSI-X
> [    0.740040] ioatdma 0000:00:04.4: irq 79 for MSI/MSI-X
> [    0.740342] ioatdma 0000:00:04.5: irq 80 for MSI/MSI-X
> [    0.740670] ioatdma 0000:00:04.6: irq 81 for MSI/MSI-X
> [    0.740971] ioatdma 0000:00:04.7: irq 82 for MSI/MSI-X

Good.  You have two issues, and I'm going to separate them and only
address the first one here.  I opened a bug report [1] against the
IOAT driver.  It should do something more useful when
CONFIG_PCI_MMCONFIG=n so we don't have to debug this again in the
future.  But otherwise, it sounds like this issue is resolved.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=51101

--

Yes--(agree w/ config option) Thank you!

Justin.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-29  0:08                   ` Bjorn Helgaas
@ 2012-11-29  0:49                     ` Justin Piszcz
  0 siblings, 0 replies; 24+ messages in thread
From: Justin Piszcz @ 2012-11-29  0:49 UTC (permalink / raw)
  To: 'Bjorn Helgaas', 'Robert Hancock'
  Cc: 'Bruno Prémont',
	support, linux-kernel, 'Dan Williams'



-----Original Message-----
From: Bjorn Helgaas [mailto:bhelgaas@google.com] 
Sent: Wednesday, November 28, 2012 7:09 PM
To: Justin Piszcz
Cc: Bruno Prémont; support@supermicro.com; linux-kernel@vger.kernel.org; Dan
Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
bug question

On Tue, Nov 27, 2012 at 7:35 AM, Justin Piszcz <jpiszcz@lucidpixels.com>
wrote:
>
>
> -----Original Message-----
> From: Justin Piszcz [mailto:jpiszcz@lucidpixels.com]
> Sent: Tuesday, November 27, 2012 8:56 AM
> To: 'Bjorn Helgaas'
> Cc: 'Bruno Prémont'; support@supermicro.com; linux-kernel@vger.kernel.org;
> 'Dan Williams'
> Subject: RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
> bug question
>
>
>> It is _not_ working on the:
>
>> 2) Supermicro X8DTH-F (the boot drive in this system is running off a PCI-e
>> card, could the IRQ for the I/O controller be getting re-mapped and fail?)--
>> worst case I can move the SSD from the 6.0gbps SATA card to the motherboard
>> and see if that works, but that kind of defeats the purpose of a 6.0gbps
>> SATA SSD.
>
> When I removed the Highpoint 2-port SATA card and plugged it into the
> motherboard, the system boots (plugged the SSD into the motherboard).
> So if you use a HIGHPOINT 2-PORT SATA 6.0gbps card, do NOT enable IOMMU or
> it will fail to initialize the Highpoint 2-port SATA controller card!
> I also tried upgrading the BIOS (of the mobo, no diff)
> I also tried just leaving the SATA card in and plugging it into the
> motherboard (no diff)
> Removed the Highpoint 2-port SATA card and then success, it would be nice to
> use that card with IOMMU support though, is it just not compatible
> (marvell-problem?) or is a driver bug?  Based on the pictures/etc sent
> earlier?

I would guess this is a core bug, but it's hard to tell without more
information.

If you boot with "intel_iommu=off", I would guess the Highpoint card
would work (this should have the same effect as turning off
CONFIG_INTEL_IOMMU).  I'd like to compare the complete dmesg log for
that boot with the one that fails.

It sounds like it might be hard to collect the log for the failing
case -- you said the boot fails when the Highpoint card is in the
system even if the SSD is connected to the motherboard instead of the
Highpoint card.  The panic in the photo2 image looks like it's just a
failure to mount the root filesystem, which is what I'd expect if we
can't find the SSD.  It seems like we ought to be able to *boot* with
the SSD connected to the motherboard, even if the Highpoint card
doesn't work.  But worst-case, a video of the failing boot might be
enough, especially if you can slow it down with "boot_delay="

--

SUMMARY: The card fails with iommu support in the kernel, but the system does
now boot (3.6.8) with the card in as long as the system disk isn't attached to
it. Not sure what was wrong earlier.

It seems to be working now:
=> SSD on motherboard
=> PCI-e card (highpoint in the system but not used, no disks attached)

I put the card in (after I enabled nouveau, not sure that has anything to do
with it), and it errors as usual, but with the SSD now on the motherboard the
system does boot successfully.

Here are the errors from the kernel trying to initialize the board with
iommu enabled (retrieved via netconsole) also picture below (w/help from
boot_delay=100 && nouveau enabled):
http://home.comcast.net/~jpiszcz/20121128/highpoint.jpg

Nov 28 19:30:16 p34 [    7.771060] ata14.00: qc timeout (cmd 0xa1)
Nov 28 19:30:16 p34 [    8.270153] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 28 19:30:17 p34 [    9.073935] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Nov 28 19:30:27 p34 [   19.058915] ata14.00: qc timeout (cmd 0xa1)
Nov 28 19:30:28 p34 [   19.557885] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 28 19:30:28 p34 [   19.558478] ata14: limiting SATA link speed to 1.5 Gbps
Nov 28 19:30:29 p34 [   20.363658] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov 28 19:30:48 p34 [   39.568234] dmar: DRHD: handling fault status reg 502
Nov 28 19:30:48 p34 [   39.571508] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0  [   39.571508] DMAR:[fault reason 06] PTE Read access is not set
Nov 28 19:30:59 p34 [   50.318146] ata14.00: qc timeout (cmd 0xa1)
Nov 28 19:30:59 p34 [   50.818061] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 28 19:31:00 p34 [   51.621827] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
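
When going through a longer netconsole capture like the one above, it can help to pull out just the DMAR fault records, so the faulting device can be compared with the address lspci reports for the controller. Below is a minimal Python sketch of that. The regular expression is written against the message format seen in this log only, and the function name `parse_dmar_faults` is made up for illustration.

```python
import re

# Written against the fault format seen in this thread; other kernels may differ.
FAULT = re.compile(
    r"DMAR:\[(DMA Read|DMA Write)\] Request device "
    r"\[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] fault addr ([0-9a-f]+) "
    r".*?DMAR:\[fault reason (\d+)\]"
)

def parse_dmar_faults(log_text):
    """Extract DMAR fault records from a (possibly hard-wrapped) log."""
    flat = " ".join(log_text.split())  # undo mail-client line wrapping
    return [
        {
            "kind": m.group(1),
            "device": m.group(2),
            "addr": int(m.group(3), 16),
            "reason": int(m.group(4)),
        }
        for m in FAULT.finditer(flat)
    ]

# The fault lines as they were posted, including the mail wrapping.
SAMPLE = """\
Nov 28 19:30:48 p34 [   39.571508] dmar: DMAR:[DMA Read] Request device
[04:00.0] fault addr 0  [   39.571508] DMAR:[fault reason 06] PTE Read
access is not set
"""

print(parse_dmar_faults(SAMPLE))
```

For the sample this yields a single record: a DMA Read from device 04:00.0 at address 0 with fault reason 6.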

Justin.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-29  0:34               ` Robert Hancock
@ 2012-11-29  0:49                 ` Justin Piszcz
  2012-11-29  0:55                   ` Robert Hancock
  0 siblings, 1 reply; 24+ messages in thread
From: Justin Piszcz @ 2012-11-29  0:49 UTC (permalink / raw)
  To: 'Robert Hancock'
  Cc: 'Bjorn Helgaas', 'Bruno Prémont',
	support, linux-kernel, 'Dan Williams'



-----Original Message-----
From: Robert Hancock [mailto:hancockrwd@gmail.com] 
Sent: Wednesday, November 28, 2012 7:35 PM
To: Justin Piszcz
Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; support@supermicro.com;
linux-kernel@vger.kernel.org; 'Dan Williams'
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
bug question


> What does lspci -vv show on that controller? Not sure what actual
> chipset that controller is, but there's a known issue with some Marvell
> 6Gbps SATA controllers with DMAR enabled - it seems the device issues
> memory read/write requests from the wrong PCI function ID and the IOMMU
> rightly denies access as the function listed in the requests doesn't
> have any mapping to that memory. I don't think there's presently a
> workaround other than disabling DMAR. We could (and likely should) be
> detecting that device and adding some kind of quirk for it.

That sounds likely...
It is shown below:

Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
Adapter

lspci -vv output:

84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
  Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller
  Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
  Latency: 0, Cache Line Size: 256 bytes
  Interrupt: pin A routed to IRQ 119
  Region 0: I/O ports at e000 [size=8]
  Region 1: I/O ports at dc00 [size=4]
  Region 2: I/O ports at ec00 [size=8]
  Region 3: I/O ports at e800 [size=4]
  Region 4: I/O ports at e400 [size=16]
  Region 5: Memory at cfeee000 (32-bit, non-prefetchable) [size=2K]
  Expansion ROM at cfef0000 [disabled] [size=64K]
  Capabilities: [40] Power Management version 3
    Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
  Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Address: fee20000  Data: 4076
  Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
    DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
      ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
    DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
      RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
      MaxPayload 256 bytes, MaxReadReq 512 bytes
    DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
    LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
      ClockPM- Surprise- LLActRep- BwNot-
    LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
      ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
    LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    DevCap2: Completion Timeout: Not Supported, TimeoutDis+
    DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
    LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
       Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
       Compliance De-emphasis: -6dB
    LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
       EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
  Capabilities: [100 v1] Advanced Error Reporting
    UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
    CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
    CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
    AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
  Kernel driver in use: ahci

>
> --
>
>
> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
> this a Linux/ASPM implementation issue?
> [    0.632170]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
> [    0.632239]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)

> What's the full dmesg from this machine (or is it already posted somewhere)?

It is now available here:
http://home.comcast.net/~jpiszcz/20121128/dmesg.txt

Justin.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-29  0:49                 ` Justin Piszcz
@ 2012-11-29  0:55                   ` Robert Hancock
  2012-11-29  8:55                     ` Justin Piszcz
  0 siblings, 1 reply; 24+ messages in thread
From: Robert Hancock @ 2012-11-29  0:55 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Bjorn Helgaas, Bruno Prémont, support, linux-kernel, Dan Williams

On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>
>
> -----Original Message-----
> From: Robert Hancock [mailto:hancockrwd@gmail.com]
> Sent: Wednesday, November 28, 2012 7:35 PM
> To: Justin Piszcz
> Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; support@supermicro.com;
> linux-kernel@vger.kernel.org; 'Dan Williams'
> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
> bug question
>
>
> What does lspci -vv show on that controller? Not sure what actual
> chipset that controller is, but there's a known issue with some Marvell
> 6Gbps SATA controllers with DMAR enabled - it seems the device issues
> memory read/write requests from the wrong PCI function ID and the IOMMU
> rightly denies access as the function listed in the requests doesn't
> have any mapping to that memory. I don't think there's presently a
> workaround other than disabling DMAR. We could (and likely should) be
> detecting that device and adding some kind of quirk for it.
>
> That sounds likely...
> It is shown below:
>
> Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
> Adapter
>
> lspci -vv output:
>
> 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
> 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
>   Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
> controller

Yeah, that's one of those controllers I think. But I can't tell from
the bit of the dmesg you posted exactly what's going on. Can you post
a full boot log from having the card installed and some drive attached
(by putting the boot drive on another controller for example)?

>> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
>> this a Linux/ASPM implementation issue?
>> [    0.632170]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
>> [    0.632239]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)
>
> What's the full dmesg from this machine (or is it already posted somewhere)?
>
> It is now available here:
> http://home.comcast.net/~jpiszcz/20121128/dmesg.txt

Is that the same boot log? It doesn't have this error in it.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-29  0:55                   ` Robert Hancock
@ 2012-11-29  8:55                     ` Justin Piszcz
  2012-11-29 18:16                       ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Justin Piszcz @ 2012-11-29  8:55 UTC (permalink / raw)
  To: 'Robert Hancock'
  Cc: 'Bjorn Helgaas', 'Bruno Prémont',
	support, linux-kernel, 'Dan Williams'



-----Original Message-----
From: Robert Hancock [mailto:hancockrwd@gmail.com] 
Sent: Wednesday, November 28, 2012 7:55 PM
To: Justin Piszcz
Cc: Bjorn Helgaas; Bruno Prémont; support@supermicro.com;
linux-kernel@vger.kernel.org; Dan Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
bug question

On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz <jpiszcz@lucidpixels.com>
wrote:
>
>
> -----Original Message-----
> From: Robert Hancock [mailto:hancockrwd@gmail.com]
> Sent: Wednesday, November 28, 2012 7:35 PM
> To: Justin Piszcz
> Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; support@supermicro.com;
> linux-kernel@vger.kernel.org; 'Dan Williams'
> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
> bug question
>
>
> What does lspci -vv show on that controller? Not sure what actual
> chipset that controller is, but there's a known issue with some Marvell
> 6Gbps SATA controllers with DMAR enabled - it seems the device issues
> memory read/write requests from the wrong PCI function ID and the IOMMU
> rightly denies access as the function listed in the requests doesn't
> have any mapping to that memory. I don't think there's presently a
> workaround other than disabling DMAR. We could (and likely should) be
> detecting that device and adding some kind of quirk for it.
>
> That sounds likely...
> It is shown below:
>
> Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
> Adapter
>
> lspci -vv output:
>
> 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
> 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
>   Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
> controller

Yeah, that's one of those controllers I think. But I can't tell from
the bit of the dmesg you posted exactly what's going on. Can you post
a full boot log from having the card installed and some drive attached
(by putting the boot drive on another controller for example)?

>> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
>> this a Linux/ASPM implementation issue?
>> [    0.632170]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
>> [    0.632239]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)
>
> What's the full dmesg from this machine (or is it already posted somewhere)?
>
> It is now available here:
> http://home.comcast.net/~jpiszcz/20121128/dmesg.txt

> Is that the same boot log? It doesn't have this error in it.

Yes, the error is here (it's towards the bottom):

 [    7.973015] ata14.00: qc timeout (cmd 0xa1)
[    8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[    9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   19.260667] ata14.00: qc timeout (cmd 0xa1)
[   19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   19.760451] ata14: limiting SATA link speed to 1.5 Gbps
[   20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[   50.521078] ata14.00: qc timeout (cmd 0xa1)
[   51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[   51.824682] dmar: DRHD: handling fault status reg 502
[   51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0 
[   51.824686] DMAR:[fault reason 06] PTE Read access is not set
[   52.338871] EXT3-fs (sdb2): error: couldn't mount because of unsupported optional features (240)
[   52.348938] EXT2-fs (sdb2): error: couldn't mount because of unsupported optional features (240)
[   52.360314] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)

The system does not boot when the SSD is on that SATA controller.
I can no longer reproduce the error we were trying to capture earlier (the
kernel panic) since adding nouveau, for whatever reason.
So to recap: it now boots with nothing connected to the controller, but the
controller is unusable, as shown above.
When the SSD is attached to it, the rootfs cannot be mounted.

Justin.




* Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-29  8:55                     ` Justin Piszcz
@ 2012-11-29 18:16                       ` Bjorn Helgaas
  2012-11-30  2:39                         ` Robert Hancock
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2012-11-29 18:16 UTC (permalink / raw)
  To: Justin Piszcz
  Cc: Robert Hancock, Bruno Prémont, support, linux-kernel, Dan Williams

On Thu, Nov 29, 2012 at 1:55 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>
>
> -----Original Message-----
> From: Robert Hancock [mailto:hancockrwd@gmail.com]
> Sent: Wednesday, November 28, 2012 7:55 PM
> To: Justin Piszcz
> Cc: Bjorn Helgaas; Bruno Prémont; support@supermicro.com;
> linux-kernel@vger.kernel.org; Dan Williams
> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
> bug question
>
> On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz <jpiszcz@lucidpixels.com>
> wrote:
>>
>>
>> -----Original Message-----
>> From: Robert Hancock [mailto:hancockrwd@gmail.com]
>> Sent: Wednesday, November 28, 2012 7:35 PM
>> To: Justin Piszcz
>> Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; support@supermicro.com;
>> linux-kernel@vger.kernel.org; 'Dan Williams'
>> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
>> bug question
>>
>>
>> What does lspci -vv show on that controller? Not sure what actual
>> chipset that controller is, but there's a known issue with some Marvell
>> 6Gbps SATA controllers with DMAR enabled - it seems the device issues
>> memory read/write requests from the wrong PCI function ID and the IOMMU
>> rightly denies access as the function listed in the requests doesn't
>> have any mapping to that memory. I don't think there's presently a
>> workaround other than disabling DMAR. We could (and likely should) be
>> detecting that device and adding some kind of quirk for it.
>>
>> That sounds likely...
>> It is shown below:
>>
>> Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
>> Adapter
>>
>> lspci -vv output:
>>
>> 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
>> 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
>>   Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
>> controller
>
> Yeah, that's one of those controllers I think. But I can't tell from
> the bit of the dmesg you posted exactly what's going on. Can you post
> a full boot log from having the card installed and some drive attached
> (by putting the boot drive on another controller for example)?
>
>>> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
>>> this a Linux/ASPM implementation issue?
>>> [    0.632170]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
>>> [    0.632239]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)
>>
>> What's the full dmesg from this machine (or is it already posted somewhere)?
>>
>> It is now available here:
>> http://home.comcast.net/~jpiszcz/20121128/dmesg.txt
>
>> Is that the same boot log? It doesn't have this error in it.
>
> Yes, the error is here: (its towards the bottom)
>
>  [    7.973015] ata14.00: qc timeout (cmd 0xa1)
> [    8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [    9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [   19.260667] ata14.00: qc timeout (cmd 0xa1)
> [   19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [   19.760451] ata14: limiting SATA link speed to 1.5 Gbps
> [   20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [   50.521078] ata14.00: qc timeout (cmd 0xa1)
> [   51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [   51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [   51.824682] dmar: DRHD: handling fault status reg 502
> [   51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
> [   51.824686] DMAR:[fault reason 06] PTE Read access is not set

You have these devices:

    pci 0000:04:00.0: [10de:01d3] type 00 class 0x030000 nVidia G72
    pci 0000:84:00.0: [1b4b:9123] type 00 class 0x010601 Marvell 88SE9123 SATA
    pci 0000:84:00.1: [1b4b:91a4] type 00 class 0x01018f Marvell 88SE9128 IDE

I think the 04:00.0 DMAR errors are symptoms of nouveau driver issues,
and if you get rid of that driver, they'll probably go away.

But this 84:00.1 DMAR error:

    dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff00000
    DMAR:[fault reason 02] Present bit in context entry is clear

looks like the probable cause of the Marvell issue.  It looks similar
to https://bugzilla.kernel.org/show_bug.cgi?id=42679, although the
reports there show a bb:dd.0 device (but no bb:dd.1 device), and the
DMAR rejects DMA that appears to be from bb:dd.1.
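
As a rough illustration of how the requester in these fault lines can be
read (this is not part of the original mail), the sketch below pulls the
bus:device.function out of a DMAR fault line and packs it into the 16-bit
PCI requester ID. The function names parse_dmar_fault and requester_id are
made up for this example, and the regular expression only assumes the log
format shown in this thread.

```python
import re

# Matches the requester and address in lines such as:
#   dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff00000
# The pattern is an assumption based only on the log lines quoted in this thread.
FAULT_RE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device "
    r"\[([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])\] fault addr ([0-9a-f]+)"
)


def parse_dmar_fault(line):
    """Return the fields of one DMAR fault line, or None if it is not one."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    direction, bus, dev, fn, addr = m.groups()
    return {
        "direction": direction,
        "bus": int(bus, 16),
        "device": int(dev, 16),
        "function": int(fn),
        "addr": int(addr, 16),
    }


def requester_id(bus, device, function):
    """Pack bus[15:8], device[7:3], function[2:0] into a PCI requester ID."""
    return (bus << 8) | (device << 3) | function
```

With the 84:00.1 line above, parse_dmar_fault() yields bus 0x84, device 0,
function 1, and requester_id(0x84, 0, 1) is 0x8401, which is the ID the
IOMMU sees on the faulting request.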

Another report that's even more similar is
https://bugzilla.redhat.com/show_bug.cgi?id=757166 .  In that case,
both bb:dd.0 and bb:dd.1 exist (as in your system), and the DMAR fault
is exactly like what you're seeing.

So you're not alone, but unfortunately, nobody seems to be working on
either bug report.  I took the liberty of adding you to the cc: list of
both.

I don't really know what else to do at this point.  Maybe a SATA
expert with some Marvell docs could figure out why we're seeing DMA
from the IDE controller, but I'm not that person :)

> [   52.338871] EXT3-fs (sdb2): error: couldn't mount because of unsupported optional features (240)
> [   52.348938] EXT2-fs (sdb2): error: couldn't mount because of unsupported optional features (240)
> [   52.360314] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)
>
> The system does not boot when the SSD is on that SATA controller.
> The error we were trying to get earlier (kernel panic)-- I cannot reproduce
> that anymore after adding nouveau for whatever reason.
> So to re-cap it boots now with nothing connected to the controller but the
> controller is non-workable/useless, as shown above.
> When you put the SSD on it, it cannot mount rootfs.
>
> Justin.
>
>


* Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-29 18:16                       ` Bjorn Helgaas
@ 2012-11-30  2:39                         ` Robert Hancock
  2012-11-30  3:38                           ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Robert Hancock @ 2012-11-30  2:39 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Justin Piszcz, Bruno Prémont, support, linux-kernel, Dan Williams

On Thu, Nov 29, 2012 at 12:16 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, Nov 29, 2012 at 1:55 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>>
>>
>> -----Original Message-----
>> From: Robert Hancock [mailto:hancockrwd@gmail.com]
>> Sent: Wednesday, November 28, 2012 7:55 PM
>> To: Justin Piszcz
>> Cc: Bjorn Helgaas; Bruno Prémont; support@supermicro.com;
>> linux-kernel@vger.kernel.org; Dan Williams
>> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
>> bug question
>>
>> On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz <jpiszcz@lucidpixels.com>
>> wrote:
>>>
>>>
>>> -----Original Message-----
>>> From: Robert Hancock [mailto:hancockrwd@gmail.com]
>>> Sent: Wednesday, November 28, 2012 7:35 PM
>>> To: Justin Piszcz
>>> Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; support@supermicro.com;
>>> linux-kernel@vger.kernel.org; 'Dan Williams'
>>> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
>>> bug question
>>>
>>>
>>> What does lspci -vv show on that controller? Not sure what actual
>>> chipset that controller is, but there's a known issue with some Marvell
>>> 6Gbps SATA controllers with DMAR enabled - it seems the device issues
>>> memory read/write requests from the wrong PCI function ID and the IOMMU
>>> rightly denies access as the function listed in the requests doesn't
>>> have any mapping to that memory. I don't think there's presently a
>>> workaround other than disabling DMAR. We could (and likely should) be
>>> detecting that device and adding some kind of quirk for it.
>>>
>>> That sounds likely...
>>> It is shown below:
>>>
>>> Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
>>> Adapter
>>>
>>> lspci -vv output:
>>>
>>> 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
>>> 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
>>>   Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
>>> controller
>>
>> Yeah, that's one of those controllers I think. But I can't tell from
>> the bit of the dmesg you posted exactly what's going on. Can you post
>> a full boot log from having the card installed and some drive attached
>> (by putting the boot drive on another controller for example)?
>>
>>>> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
>>>> this a Linux/ASPM implementation issue?
>>>> [    0.632170]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
>>>> [    0.632239]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)
>>>
>>> What's the full dmesg from this machine (or is it already posted somewhere)?
>>>
>>> It is now available here:
>>> http://home.comcast.net/~jpiszcz/20121128/dmesg.txt
>>
>>> Is that the same boot log? It doesn't have this error in it.
>>
>> Yes, the error is here: (its towards the bottom)
>>
>>  [    7.973015] ata14.00: qc timeout (cmd 0xa1)
>> [    8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [    9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> [   19.260667] ata14.00: qc timeout (cmd 0xa1)
>> [   19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [   19.760451] ata14: limiting SATA link speed to 1.5 Gbps
>> [   20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>> [   50.521078] ata14.00: qc timeout (cmd 0xa1)
>> [   51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [   51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>> [   51.824682] dmar: DRHD: handling fault status reg 502
>> [   51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
>> [   51.824686] DMAR:[fault reason 06] PTE Read access is not set
>
> You have these devices:
>
>     pci 0000:04:00.0: [10de:01d3] type 00 class 0x030000 nVidia G72
>     pci 0000:84:00.0: [1b4b:9123] type 00 class 0x010601 Marvell 88SE9123 SATA
>     pci 0000:84:00.1: [1b4b:91a4] type 00 class 0x01018f Marvell 88SE9128 IDE
>
> I think the 04:00.0 DMAR errors are symptoms of nouveau driver issues,
> and if you get rid of that driver, they'll probably go away.
>
> But this 84:00.1 DMAR error:
>
>     dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff00000
>     DMAR:[fault reason 02] Present bit in context entry is clear
>
> looks like the probable cause of the Marvell issue.  It looks similar
> to https://bugzilla.kernel.org/show_bug.cgi?id=42679, although the
> reports there show a bb:dd.0 device (but no bb:dd.1 device), and the
> DMAR rejects DMA that appears to be from bb:dd.1.
>
> Another report that's even more similar is
> https://bugzilla.redhat.com/show_bug.cgi?id=757166 .  In that case,
> both bb:dd.0 and bb:dd.1 exist (as in your system), and the DMAR fault
> is exactly like what you're seeing.
>
> So you're not alone, but unfortunately, nobody seems to be working on
> either bug report.  I took the liberty to add you to the cc: list of
> both.
>
> I don't really know what else to do at this point.  Maybe a SATA
> expert with some Marvell docs could figure out why we're seeing DMA
> from the IDE controller, but I'm not that person :)

I doubt any Marvell docs would really be very helpful (except for
maybe an errata list but that likely would just tell us what we can
already figure out). The SATA controller part of the device seems to
just be issuing accesses with the wrong PCI function ID.

The only solution I can think of would be at the PCI/DMAR layer -
basically functions 0 and 1 on this device should be allowed to access
each other's DMA regions.
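
To make that idea concrete, here is a toy model (an illustration only, not
kernel code and not from the original mail). It treats the IOMMU context
table as a dictionary keyed by requester ID. ContextTable, DmarFault and
attach_with_sibling are hypothetical names, and the "domain" is just a
placeholder string.

```python
class DmarFault(Exception):
    """Raised when a request arrives from a requester ID with no context entry."""


def requester_id(bus, device, function):
    """Pack bus[15:8], device[7:3], function[2:0] into a PCI requester ID."""
    return (bus << 8) | (device << 3) | function


class ContextTable:
    """Toy stand-in for the IOMMU's per-requester context entries."""

    def __init__(self):
        self._domain_by_rid = {}

    def attach(self, rid, domain):
        # Mark the context entry for this requester ID as present.
        self._domain_by_rid[rid] = domain

    def lookup(self, rid):
        # A request from an unknown requester ID is rejected, as in the log.
        try:
            return self._domain_by_rid[rid]
        except KeyError:
            raise DmarFault("no context entry for requester %04x" % rid)


def attach_with_sibling(table, bus, device, domain, functions=(0, 1)):
    """Attach every listed function of one slot to the same domain."""
    for fn in functions:
        table.attach(requester_id(bus, device, fn), domain)
```

If only 84:00.0 is attached, a lookup for 84:00.1 raises DmarFault, which
mirrors the "Present bit in context entry is clear" fault. After
attach_with_sibling(table, 0x84, 0, domain), both functions resolve to the
same domain, so DMA tagged with either function ID would be accepted.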


* Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-30  2:39                         ` Robert Hancock
@ 2012-11-30  3:38                           ` Bjorn Helgaas
  2012-12-02 13:26                             ` Joerg Roedel
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2012-11-30  3:38 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Justin Piszcz, Bruno Prémont, support, linux-kernel,
	Dan Williams, Jeff Garzik, linux-ide, David Woodhouse,
	Joerg Roedel, iommu

[+cc Jeff, linux-ide, David, Joerg, iommu]

On Thu, Nov 29, 2012 at 7:39 PM, Robert Hancock <hancockrwd@gmail.com> wrote:
> On Thu, Nov 29, 2012 at 12:16 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Thu, Nov 29, 2012 at 1:55 AM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>>>
>>>
>>> -----Original Message-----
>>> From: Robert Hancock [mailto:hancockrwd@gmail.com]
>>> Sent: Wednesday, November 28, 2012 7:55 PM
>>> To: Justin Piszcz
>>> Cc: Bjorn Helgaas; Bruno Prémont; support@supermicro.com;
>>> linux-kernel@vger.kernel.org; Dan Williams
>>> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
>>> bug question
>>>
>>> On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz <jpiszcz@lucidpixels.com>
>>> wrote:
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Robert Hancock [mailto:hancockrwd@gmail.com]
>>>> Sent: Wednesday, November 28, 2012 7:35 PM
>>>> To: Justin Piszcz
>>>> Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; support@supermicro.com;
>>>> linux-kernel@vger.kernel.org; 'Dan Williams'
>>>> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
>>>> bug question
>>>>
>>>>
>>>> What does lspci -vv show on that controller? Not sure what actual
>>>> chipset that controller is, but there's a known issue with some Marvell
>>>> 6Gbps SATA controllers with DMAR enabled - it seems the device issues
>>>> memory read/write requests from the wrong PCI function ID and the IOMMU
>>>> rightly denies access as the function listed in the requests doesn't
>>>> have any mapping to that memory. I don't think there's presently a
>>>> workaround other than disabling DMAR. We could (and likely should) be
>>>> detecting that device and adding some kind of quirk for it.
>>>>
>>>> That sounds likely...
>>>> It is shown below:
>>>>
>>>> Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
>>>> Adapter
>>>>
>>>> lspci -vv output:
>>>>
>>>> 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
>>>> 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
>>>>   Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
>>>> controller
>>>
>>> Yeah, that's one of those controllers I think. But I can't tell from
>>> the bit of the dmesg you posted exactly what's going on. Can you post
>>> a full boot log from having the card installed and some drive attached
>>> (by putting the boot drive on another controller for example)?
>>>
>>>>> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
>>>>> this a Linux/ASPM implementation issue?
>>>>> [    0.632170]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
>>>>> [    0.632239]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)
>>>>
>>>> What's the full dmesg from this machine (or is it already posted somewhere)?
>>>>
>>>> It is now available here:
>>>> http://home.comcast.net/~jpiszcz/20121128/dmesg.txt
>>>
>>>> Is that the same boot log? It doesn't have this error in it.
>>>
>>> Yes, the error is here: (its towards the bottom)
>>>
>>>  [    7.973015] ata14.00: qc timeout (cmd 0xa1)
>>> [    8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [    9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [   19.260667] ata14.00: qc timeout (cmd 0xa1)
>>> [   19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [   19.760451] ata14: limiting SATA link speed to 1.5 Gbps
>>> [   20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>>> [   50.521078] ata14.00: qc timeout (cmd 0xa1)
>>> [   51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [   51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>>> [   51.824682] dmar: DRHD: handling fault status reg 502
>>> [   51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
>>> [   51.824686] DMAR:[fault reason 06] PTE Read access is not set
>>
>> You have these devices:
>>
>>     pci 0000:04:00.0: [10de:01d3] type 00 class 0x030000 nVidia G72
>>     pci 0000:84:00.0: [1b4b:9123] type 00 class 0x010601 Marvell 88SE9123 SATA
>>     pci 0000:84:00.1: [1b4b:91a4] type 00 class 0x01018f Marvell 88SE9128 IDE
>>
>> I think the 04:00.0 DMAR errors are symptoms of nouveau driver issues,
>> and if you get rid of that driver, they'll probably go away.
>>
>> But this 84:00.1 DMAR error:
>>
>>     dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff00000
>>     DMAR:[fault reason 02] Present bit in context entry is clear
>>
>> looks like the probable cause of the Marvell issue.  It looks similar
>> to https://bugzilla.kernel.org/show_bug.cgi?id=42679, although the
>> reports there show a bb:dd.0 device (but no bb:dd.1 device), and the
>> DMAR rejects DMA that appears to be from bb:dd.1.
>>
>> Another report that's even more similar is
>> https://bugzilla.redhat.com/show_bug.cgi?id=757166 .  In that case,
>> both bb:dd.0 and bb:dd.1 exist (as in your system), and the DMAR fault
>> is exactly like what you're seeing.
>>
>> So you're not alone, but unfortunately, nobody seems to be working on
>> either bug report.  I took the liberty to add you to the cc: list of
>> both.
>>
>> I don't really know what else to do at this point.  Maybe a SATA
>> expert with some Marvell docs could figure out why we're seeing DMA
>> from the IDE controller, but I'm not that person :)
>
> I doubt any Marvell docs would really be very helpful (except for
> maybe an errata list but that likely would just tell us what we can
> already figure out). The SATA controller part of the device seems to
> just be issuing accesses with the wrong PCI function ID.
>
> The only solution I can think of would be at the PCI/DMAR layer -
> basically functions 0 and 1 on this device should be allowed to access
> each other's DMA regions.

That's essentially the patch at
https://bugzilla.redhat.com/show_bug.cgi?id=757166#c16, which in my
opinion is too ugly to consider.  But fortunately, I'm not the
maintainer for any IOMMU drivers.

My point about the docs is that often we think "this hardware is
clearly broken and the only workaround is X," but sometimes it's just
that we don't understand the hardware designer's intent.  It may be
that the hardware was just never tested with DMAR and is indeed
broken, or it may be that it does work with DMAR given a different
driver structure or different device initialization.  I just don't
want lack of imagination to force us to assume there's only one
workaround.


* Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
  2012-11-30  3:38                           ` Bjorn Helgaas
@ 2012-12-02 13:26                             ` Joerg Roedel
  0 siblings, 0 replies; 24+ messages in thread
From: Joerg Roedel @ 2012-12-02 13:26 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Robert Hancock, Justin Piszcz, Bruno Prémont, support,
	linux-kernel, Dan Williams, Jeff Garzik, linux-ide,
	David Woodhouse, iommu

On Thu, Nov 29, 2012 at 08:38:53PM -0700, Bjorn Helgaas wrote:
> That's essentially the patch at
> https://bugzilla.redhat.com/show_bug.cgi?id=757166#c16, which in my
> opinion is too ugly to consider.  But fortunately, I'm not the
> maintainer for any IOMMU drivers.

There is a quirk infrastructure for those kinds of broken devices in
drivers/pci/quirks.c. Have a look at the function
pci_get_dma_source(). This function is used by the IOMMU drivers to
create the correct mappings.
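
The general shape of such a quirk lookup can be sketched in a few lines.
This is a userspace Python model of the idea only, not the code in
drivers/pci/quirks.c. PciDev, DMA_SOURCE_QUIRKS, func1_dma_source and
get_dma_source are hypothetical names, and the single table entry for the
Marvell 1b4b:9123 function is an assumption made for illustration; the
thread does not settle whether such an entry would be the right fix.

```python
from collections import namedtuple

# Minimal description of a PCI function: location plus vendor/product IDs.
PciDev = namedtuple("PciDev", "bus device function vendor product")


def func1_dma_source(dev):
    """Hypothetical quirk: DMA from this device appears to come from function 1."""
    return (dev.bus, dev.device, 1)


# (vendor, product) -> quirk callback. The entry below is illustrative only.
DMA_SOURCE_QUIRKS = {
    (0x1B4B, 0x9123): func1_dma_source,
}


def get_dma_source(dev):
    """Return the (bus, device, function) that DMA from dev will be tagged with."""
    quirk = DMA_SOURCE_QUIRKS.get((dev.vendor, dev.product))
    if quirk is None:
        # Well-behaved devices issue DMA under their own function number.
        return (dev.bus, dev.device, dev.function)
    return quirk(dev)
```

An IOMMU driver modelled this way would set up its mapping for the
function returned by get_dma_source() rather than for the function the
driver is bound to, so a device without a table entry is unaffected.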


	Joerg




Thread overview: 24+ messages
2012-11-24 19:40 Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question Justin Piszcz
2012-11-26 21:42 ` Bruno Prémont
2012-11-27  0:50   ` Justin Piszcz
2012-11-27  0:56   ` Bjorn Helgaas
2012-11-27  1:00     ` Bjorn Helgaas
2012-11-27  1:00       ` Justin Piszcz
2012-11-27  1:11         ` Bjorn Helgaas
2012-11-27 13:33           ` Justin Piszcz
2012-11-27 13:49             ` Justin Piszcz
2012-11-27 13:56               ` Justin Piszcz
2012-11-27 14:35                 ` Justin Piszcz
2012-11-29  0:08                   ` Bjorn Helgaas
2012-11-29  0:49                     ` Justin Piszcz
2012-11-28 23:54               ` Bjorn Helgaas
2012-11-29  0:48                 ` Justin Piszcz
2012-11-29  0:34               ` Robert Hancock
2012-11-29  0:49                 ` Justin Piszcz
2012-11-29  0:55                   ` Robert Hancock
2012-11-29  8:55                     ` Justin Piszcz
2012-11-29 18:16                       ` Bjorn Helgaas
2012-11-30  2:39                         ` Robert Hancock
2012-11-30  3:38                           ` Bjorn Helgaas
2012-12-02 13:26                             ` Joerg Roedel
2012-11-27  1:11     ` Dan Williams
