All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present
@ 2013-06-04 10:13 Andrew Cooper
  2013-06-04 10:36 ` George Dunlap
  2013-06-04 16:29 ` Jan Beulich
  0 siblings, 2 replies; 9+ messages in thread
From: Andrew Cooper @ 2013-06-04 10:13 UTC (permalink / raw)
  To: xen-devel; +Cc: Jacob Shin, Keir Fraser, Suravee Suthikulpanit, Jan Beulich

References at time of patch:

AMD 8132 PCI-X HyperTransport Tunnel
http://support.amd.com/us/ChipsetMotherboard_TechDocs/30801.pdf

AMD 8131 PCI-X HyperTransport Tunnel
http://support.amd.com/us/ChipsetMotherboard_TechDocs/26310_AMD-8131_HyperTransport_PCI-X_Tunnel_Revision_Guide_rev_3_18.pdf

The IO-APICs on the above two PCI-X HyperTransport Tunnels will issue illegal
HT packets if an interrupt gets masked while active.

This is sadly exactly what the "new" IO-APIC ack mode does.  The "old" ack
mode does however avoid this bad behaviour, and stabilises an affected system
belonging to a customer.

Original patch:
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>

Forward ported and tweaked for upstream:
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

--
Changes since v1:
 * In the case that ioapic_ack_mode is specified on the command line, warn if
   instability will ensue.

diff -r 2d37d2d652a8 -r 115069a50a65 xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2016,8 +2016,76 @@ static void __init ioapic_pm_state_alloc
     BUG_ON(ioapic_pm_state == NULL);
 }
 
+/* AMD 8131 and 8132 PCI-X HyperTransport Tunnel IO-APICs will issue illegal
+ * HyperTransport packets if an interrupt is masked while active.  This leads
+ * to system instability and crashes with no obvious cause.
+ *
+ * This is sadly the exact behaviour of the "new" IO-APIC ack mode, but using
+ * "old" mode works around the issue.
+ */
+void __init amd813x_errata_quirks(void)
+{
+    unsigned i, found = 0;
+    uint32_t bus, dev, vendor_id;
+
+    if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
+        return;
+
+    /* Check for the affected IO-APIC version (0x11) to save scanning the PCI
+     * bus on most systems. */
+    for ( i = 0; i < nr_ioapics; ++i )
+    {
+        if ( mp_ioapics[i].mpc_apicver == 0x11 )
+        {
+            found = 1;
+            break;
+        }
+    }
+
+    if ( !found )
+        return;
+
+    for ( bus = 0; bus < 256; ++bus )
+    {
+        for ( dev = 0; dev < 32; ++dev )
+        {
+            /* IO-APIC is always Function 1.  Check for Multifunction. */
+            if ( !( pci_conf_read8(0, bus, dev, 0, PCI_HEADER_TYPE ) & 0x80 ) )
+                continue;
+
+            vendor_id = pci_conf_read32(0, bus, dev, 1, PCI_VENDOR_ID);
+
+            /* 0x7451 is AMD-8131 PCI-X IO-APIC */
+            if ( vendor_id == 0x74511022 )
+                printk(XENLOG_WARNING "Found AMD 8131 PCI-X tunnel, erraturm #63: ");
+            /* 0x7459 is AMD-8132 PCI-X IO-APIC */
+            else if ( vendor_id == 0x74591022 )
+                printk(XENLOG_WARNING "Found AMD 8132 PCI-X tunnel, erraturm #81: ");
+            else
+                continue;
+
+            if ( ioapic_ack_forced )
+            {
+                if ( ioapic_ack_new != 0 )
+                    printk("Not overriding command line. System will be unstable. "
+                           "Use ioapic_ack_mode=old\n");
+                else
+                    printk("Command line is ok for system stability\n");
+            }
+            else
+            {
+                printk("Forcing 'old' IO-APIC ack method\n");
+                ioapic_ack_new = 0;
+            }
+            return;
+        }
+    }
+}
+
 void __init setup_IO_APIC(void)
 {
+    amd813x_errata_quirks();
+
     enable_IO_APIC();
 
     if (acpi_ioapic)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present
  2013-06-04 10:13 [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present Andrew Cooper
@ 2013-06-04 10:36 ` George Dunlap
  2013-06-04 16:29 ` Jan Beulich
  1 sibling, 0 replies; 9+ messages in thread
From: George Dunlap @ 2013-06-04 10:36 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Keir Fraser, Jacob Shin, Jan Beulich, Suravee Suthikulpanit, xen-devel

On Tue, Jun 4, 2013 at 11:13 AM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> References at time of patch:
>
> AMD 8132 PCI-X HyperTransport Tunnel
> http://support.amd.com/us/ChipsetMotherboard_TechDocs/30801.pdf
>
> AMD 8131 PCI-X HyperTransport Tunnel
> http://support.amd.com/us/ChipsetMotherboard_TechDocs/26310_AMD-8131_HyperTransport_PCI-X_Tunnel_Revision_Guide_rev_3_18.pdf
>
> The IO-APICs on the above two PCI-X HyperTransport Tunnels will issue illegal
> HT packets if an interrupt gets masked while active.
>
> This is sadly exactly what the "new" IO-APIC ack mode does.  The "old" ack
> mode does however avoid this bad behaviour, and stabilises an affected system
> belonging to a customer.
>
> Original patch:
> Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
>
> Forward ported and tweaked for upstream:
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Re the release:

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present
  2013-06-04 10:13 [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present Andrew Cooper
  2013-06-04 10:36 ` George Dunlap
@ 2013-06-04 16:29 ` Jan Beulich
  2013-06-04 16:48   ` Andrew Cooper
  1 sibling, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2013-06-04 16:29 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Jacob Shin, SuraveeSuthikulpanit, xen-devel

>>> On 04.06.13 at 12:13, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> References at time of patch:
> 
> AMD 8132 PCI-X HyperTransport Tunnel
> http://support.amd.com/us/ChipsetMotherboard_TechDocs/30801.pdf 
> 
> AMD 8131 PCI-X HyperTransport Tunnel
> http://support.amd.com/us/ChipsetMotherboard_TechDocs/26310_AMD-8131_HyperTra 
> nsport_PCI-X_Tunnel_Revision_Guide_rev_3_18.pdf
> 
> The IO-APICs on the above two PCI-X HyperTransport Tunnels will issue illegal
> HT packets if an interrupt gets masked while active.
> 
> This is sadly exactly what the "new" IO-APIC ack mode does.  The "old" ack
> mode does however avoid this bad behaviour, and stabilises an affected 
> system belonging to a customer.

Reading the errata descriptions, I'm getting the impression that
your patch only tries to deal with some portion of the issue, and
even that only on the basis that devices (and perhaps even their
drivers) are well behaved. That's because there's no silencing of
interrupt sources being done here, and I also can't see how we
would from the hypervisor.

If that's correct, is adding the workaround code really worth it,
rather than documenting that on those systems "ioapic_ack=old"
will make them a little more reliable?

If it's not correct, would you mind explaining in some more detail
how you see your change dealing with all aspects of the erratum,
namely for edge triggered IRQs as well as for the masking that's
done with the old ack mode for level triggered ones (or why you
don't have to deal with those cases)?

> +void __init amd813x_errata_quirks(void)
> +{
> +    unsigned i, found = 0;
> +    uint32_t bus, dev, vendor_id;
> +
> +    if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
> +        return;
> +
> +    /* Check for the affected IO-APIC version (0x11) to save scanning the PCI
> +     * bus on most systems. */
> +    for ( i = 0; i < nr_ioapics; ++i )
> +    {
> +        if ( mp_ioapics[i].mpc_apicver == 0x11 )
> +        {
> +            found = 1;
> +            break;
> +        }
> +    }
> +
> +    if ( !found )
> +        return;
> +
> +    for ( bus = 0; bus < 256; ++bus )
> +    {
> +        for ( dev = 0; dev < 32; ++dev )
> +        {
> +            /* IO-APIC is always Function 1.  Check for Multifunction. */
> +            if ( !( pci_conf_read8(0, bus, dev, 0, PCI_HEADER_TYPE ) & 0x80 ) )
> +                continue;
> +
> +            vendor_id = pci_conf_read32(0, bus, dev, 1, PCI_VENDOR_ID);
> +
> +            /* 0x7451 is AMD-8131 PCI-X IO-APIC */
> +            if ( vendor_id == 0x74511022 )
> +                printk(XENLOG_WARNING "Found AMD 8131 PCI-X tunnel, erraturm #63: ");

"erratum"

> +            /* 0x7459 is AMD-8132 PCI-X IO-APIC */
> +            else if ( vendor_id == 0x74591022 )
> +                printk(XENLOG_WARNING "Found AMD 8132 PCI-X tunnel, erraturm #81: ");

Again.

> +            else
> +                continue;

And overall this is the kind of thing that I think a switch statement
is dealing with in a more legible way, at once avoiding the need for
the "vendor_id" local variable.

> +
> +            if ( ioapic_ack_forced )
> +            {

Inverting the condition would allow you to make this a "if/else-if/else"
sequence, with less deep indentation...

> +                if ( ioapic_ack_new != 0 )

!= 0 on a boolean variable? Two lines earlier you didn't do so,
please be consistent.

> +                    printk("Not overriding command line. System will be unstable. "

Is "will" really right here? These chipsets are rather old, and not
having seen reports of instabilities makes me imply that many
such systems just happen to work fine. Therefore, "may" might
be the better term...

> +                           "Use ioapic_ack_mode=old\n");

"Don't use \"ioapic_ack=new\"\n" would seem better.

Jan

> +                else
> +                    printk("Command line is ok for system stability\n");
> +            }
> +            else
> +            {
> +                printk("Forcing 'old' IO-APIC ack method\n");
> +                ioapic_ack_new = 0;
> +            }
> +            return;
> +        }
> +    }
> +}
> +
>  void __init setup_IO_APIC(void)
>  {
> +    amd813x_errata_quirks();
> +
>      enable_IO_APIC();
>  
>      if (acpi_ioapic)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present
  2013-06-04 16:29 ` Jan Beulich
@ 2013-06-04 16:48   ` Andrew Cooper
  2013-06-05  8:20     ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2013-06-04 16:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), Jacob Shin, SuraveeSuthikulpanit, xen-devel

On 04/06/13 17:29, Jan Beulich wrote:
>>>> On 04.06.13 at 12:13, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> References at time of patch:
>>
>> AMD 8132 PCI-X HyperTransport Tunnel
>> http://support.amd.com/us/ChipsetMotherboard_TechDocs/30801.pdf 
>>
>> AMD 8131 PCI-X HyperTransport Tunnel
>> http://support.amd.com/us/ChipsetMotherboard_TechDocs/26310_AMD-8131_HyperTra 
>> nsport_PCI-X_Tunnel_Revision_Guide_rev_3_18.pdf
>>
>> The IO-APICs on the above two PCI-X HyperTransport Tunnels will issue illegal
>> HT packets if an interrupt gets masked while active.
>>
>> This is sadly exactly what the "new" IO-APIC ack mode does.  The "old" ack
>> mode does however avoid this bad behaviour, and stabilises an affected 
>> system belonging to a customer.
> Reading the errata descriptions, I'm getting the impression that
> your patch only tries to deal with some portion of the issue, and
> even that only on the basis that devices (and perhaps even their
> drivers) are well behaved. That's because there's no silencing of
> interrupt sources being done here, and I also can't see how we
> would from the hypervisor.
>
> If that's correct, is adding the workaround code really worth it,
> rather than documenting that on those systems "ioapic_ack=old"
> will make them a little more reliable?
>
> If it's not correct, would you mind explaining in some more detail
> how you see your change dealing with all aspects of the erratum,
> namely for edge triggered IRQs as well as for the masking that's
> done with the old ack mode for level triggered ones (or why you
> don't have to deal with those cases)?

The observation we have from the customer is that using "new" mode will
result in host hangs/crashes about once every two hours.  With "old",
the system has been rock solid for a week.

Unfortunately I do not have the hardware at my disposal.

>
>> +void __init amd813x_errata_quirks(void)
>> +{
>> +    unsigned i, found = 0;
>> +    uint32_t bus, dev, vendor_id;
>> +
>> +    if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
>> +        return;
>> +
>> +    /* Check for the affected IO-APIC version (0x11) to save scanning the PCI
>> +     * bus on most systems. */
>> +    for ( i = 0; i < nr_ioapics; ++i )
>> +    {
>> +        if ( mp_ioapics[i].mpc_apicver == 0x11 )
>> +        {
>> +            found = 1;
>> +            break;
>> +        }
>> +    }
>> +
>> +    if ( !found )
>> +        return;
>> +
>> +    for ( bus = 0; bus < 256; ++bus )
>> +    {
>> +        for ( dev = 0; dev < 32; ++dev )
>> +        {
>> +            /* IO-APIC is always Function 1.  Check for Multifunction. */
>> +            if ( !( pci_conf_read8(0, bus, dev, 0, PCI_HEADER_TYPE ) & 0x80 ) )
>> +                continue;
>> +
>> +            vendor_id = pci_conf_read32(0, bus, dev, 1, PCI_VENDOR_ID);
>> +
>> +            /* 0x7451 is AMD-8131 PCI-X IO-APIC */
>> +            if ( vendor_id == 0x74511022 )
>> +                printk(XENLOG_WARNING "Found AMD 8131 PCI-X tunnel, erraturm #63: ");
> "erratum"

Doh.

>
>> +            /* 0x7459 is AMD-8132 PCI-X IO-APIC */
>> +            else if ( vendor_id == 0x74591022 )
>> +                printk(XENLOG_WARNING "Found AMD 8132 PCI-X tunnel, erraturm #81: ");
> Again.
>
>> +            else
>> +                continue;
> And overall this is the kind of thing that I think a switch statement
> is dealing with in a more legible way, at once avoiding the need for
> the "vendor_id" local variable.

Ok

>
>> +
>> +            if ( ioapic_ack_forced )
>> +            {
> Inverting the condition would allow you to make this a "if/else-if/else"
> sequence, with less deep indentation...
>
>> +                if ( ioapic_ack_new != 0 )
> != 0 on a boolean variable? Two lines earlier you didn't do so,
> please be consistent.

Sorry - I thought I checked this.

>
>> +                    printk("Not overriding command line. System will be unstable. "
> Is "will" really right here? These chipsets are rather old, and not
> having seen reports of instabilities makes me imply that many
> such systems just happen to work fine. Therefore, "may" might
> be the better term...

Yes - see description above.

>
>> +                           "Use ioapic_ack_mode=old\n");
> "Don't use \"ioapic_ack=new\"\n" would seem better.
>
> Jan

No - this is more informative to someone who doesn't know the Xen
command line arguments inside/out.

~Andrew

>
>> +                else
>> +                    printk("Command line is ok for system stability\n");
>> +            }
>> +            else
>> +            {
>> +                printk("Forcing 'old' IO-APIC ack method\n");
>> +                ioapic_ack_new = 0;
>> +            }
>> +            return;
>> +        }
>> +    }
>> +}
>> +
>>  void __init setup_IO_APIC(void)
>>  {
>> +    amd813x_errata_quirks();
>> +
>>      enable_IO_APIC();
>>  
>>      if (acpi_ioapic)
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present
  2013-06-04 16:48   ` Andrew Cooper
@ 2013-06-05  8:20     ` Jan Beulich
  2013-06-05  9:43       ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2013-06-05  8:20 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir (Xen.org), Jacob Shin, SuraveeSuthikulpanit, xen-devel

>>> On 04.06.13 at 18:48, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 04/06/13 17:29, Jan Beulich wrote:
>>> +                           "Use ioapic_ack_mode=old\n");
>> "Don't use \"ioapic_ack=new\"\n" would seem better.
>>
>> Jan
> 
> No - this is more informative to someone who doesn't know the Xen
> command line arguments inside/out.

At least the spelling of the option needs to be fixed anyway. And
the reason I prefer the inverted statement is that if this someone
adds the suggested option to the beginning of the command line,
but leaves the bad option in somewhere towards the tail, the bad
behavior (and the message) will still occur.

But anyway - I continue to be unconvinced that this case can't be
easily enough dealt with the admin adding "ioapic_ack=old" to the
command line.

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present
  2013-06-05  8:20     ` Jan Beulich
@ 2013-06-05  9:43       ` Andrew Cooper
  2013-06-05  9:54         ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2013-06-05  9:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), Jacob Shin, SuraveeSuthikulpanit, xen-devel

On 05/06/13 09:20, Jan Beulich wrote:
>>>> On 04.06.13 at 18:48, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 04/06/13 17:29, Jan Beulich wrote:
>>>> +                           "Use ioapic_ack_mode=old\n");
>>> "Don't use \"ioapic_ack=new\"\n" would seem better.
>>>
>>> Jan
>> No - this is more informative to someone who doesn't know the Xen
>> command line arguments inside/out.
> At least the spelling of the option needs to be fixed anyway. And
> the reason I prefer the inverted statement is that if this someone
> adds the suggested option to the beginning of the command line,
> but leaves the bad option in somewhere towards the tail, the bad
> behavior (and the message) will still occur.

I completely missed the spelling - that does need fixing.

While I can appreciate your point of view with the statement, it is only
rare cases where someone would set io_apic_mode anyway and when given
instructions to use =new, can purge instances of =old.

>
> But anyway - I continue to be unconvinced that this case can't be
> easily enough dealt with the admin adding "ioapic_ack=old" to the
> command line.
>
> Jan
>

That describes half the errata workarounds we do.  Why does this deserve
different treatment?.

It can certainly can be worked around by using io_apci_ack=old on the
command line, and that is how we verified the 'fix'.  But in the
meantime it took a normally technically-savvy customer 2 months of time
in highest level support (i.e. my colleagues and I) before we got to the
bottom of the issue.

~Andrew

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present
  2013-06-05  9:43       ` Andrew Cooper
@ 2013-06-05  9:54         ` Jan Beulich
  2013-06-05 10:39           ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2013-06-05  9:54 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir (Xen.org), Jacob Shin, SuraveeSuthikulpanit, xen-devel

>>> On 05.06.13 at 11:43, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 05/06/13 09:20, Jan Beulich wrote:
>> But anyway - I continue to be unconvinced that this case can't be
>> easily enough dealt with the admin adding "ioapic_ack=old" to the
>> command line.
> 
> That describes half the errata workarounds we do.  Why does this deserve
> different treatment?.

Because (and I don't think you said anything to the contrary so far)
other than those other workarounds, this one only addresses a
portion of the effects of said erratum.

> It can certainly can be worked around by using io_apci_ack=old on the
> command line, and that is how we verified the 'fix'.  But in the
> meantime it took a normally technically-savvy customer 2 months of time
> in highest level support (i.e. my colleagues and I) before we got to the
> bottom of the issue.

I appreciate you having found the root cause.

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present
  2013-06-05  9:54         ` Jan Beulich
@ 2013-06-05 10:39           ` Andrew Cooper
  2013-06-05 11:33             ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2013-06-05 10:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), Jacob Shin, SuraveeSuthikulpanit, xen-devel

On 05/06/13 10:54, Jan Beulich wrote:
>>>> On 05.06.13 at 11:43, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 05/06/13 09:20, Jan Beulich wrote:
>>> But anyway - I continue to be unconvinced that this case can't be
>>> easily enough dealt with the admin adding "ioapic_ack=old" to the
>>> command line.
>> That describes half the errata workarounds we do.  Why does this deserve
>> different treatment?.
> Because (and I don't think you said anything to the contrary so far)
> other than those other workarounds, this one only addresses a
> portion of the effects of said erratum.

I am working on the knowledge from the customer that using
io_apic_ack=old does fix all their unstable behaviour.  I suppose this
does not guarantee that the fix is good, but is certainly a good indication.

>From the errata documentation:

The following interrupt and virtual wire message register settings
should not be changed in such a manner that
would cause an active interrupt or virtual wire message source to go
from unmasked to masked without first
quiescing the interrupts and/or virtual wire messages they affect in the
system

Dev[B,A]:0x48, bit 14 [INTx_PACKET_EN]
Dev[B,A]:1x04, bit 2 [MASEN]
Dev[B,A]:1x44, bit 1 [IOAEN]
Dev[B,A]:1x44, bit 0 [OSVISBAR]
Any IOAPIC Redirection Register, bit 16, Interrupt Mask [IM]
Any IOAPIC Redirection Register, bit 15, Trigger Mode [TM]
Any IOAPIC Redirection Register, bit 13, Polarity [POL]

Without a hypertransport specification to hand I cant really comment
about 1,3,4 other than expecting that Xen does not no nor care about
them.  Xen might possibly play with the busmaster bit, but without an
IOMMU on the affected system, I cant spot any codepaths which would.  It
does occur to me however that these devices are more candidates for
"stuff not even dom0 should be permitted to play with"

After boot, I am not aware of anything which would play with the
poliarity bit of an RTE.  The Trigger Mode bit might be played with if a
line level interrupt gets delivered as an edge interrupt.  As I
remember, this was a bugfix workaround for ancient IO-APICs anyway and
would not normally be expected to happen.

The only bit which is regularly played with by xen is the mask bit, but
only in the default case of io_apic_ack=new.

~Andrew

>
>> It can certainly can be worked around by using io_apci_ack=old on the
>> command line, and that is how we verified the 'fix'.  But in the
>> meantime it took a normally technically-savvy customer 2 months of time
>> in highest level support (i.e. my colleagues and I) before we got to the
>> bottom of the issue.
> I appreciate you having found the root cause.
>
> Jan
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present
  2013-06-05 10:39           ` Andrew Cooper
@ 2013-06-05 11:33             ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2013-06-05 11:33 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir (Xen.org), Jacob Shin, SuraveeSuthikulpanit, xen-devel

>>> On 05.06.13 at 12:39, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 05/06/13 10:54, Jan Beulich wrote:
>>>>> On 05.06.13 at 11:43, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 05/06/13 09:20, Jan Beulich wrote:
>>>> But anyway - I continue to be unconvinced that this case can't be
>>>> easily enough dealt with the admin adding "ioapic_ack=old" to the
>>>> command line.
>>> That describes half the errata workarounds we do.  Why does this deserve
>>> different treatment?.
>> Because (and I don't think you said anything to the contrary so far)
>> other than those other workarounds, this one only addresses a
>> portion of the effects of said erratum.
> 
> I am working on the knowledge from the customer that using
> io_apic_ack=old does fix all their unstable behaviour.  I suppose this
> does not guarantee that the fix is good, but is certainly a good indication.
> 
> From the errata documentation:
> 
> The following interrupt and virtual wire message register settings
> should not be changed in such a manner that
> would cause an active interrupt or virtual wire message source to go
> from unmasked to masked without first
> quiescing the interrupts and/or virtual wire messages they affect in the
> system
> 
> Dev[B,A]:0x48, bit 14 [INTx_PACKET_EN]
> Dev[B,A]:1x04, bit 2 [MASEN]
> Dev[B,A]:1x44, bit 1 [IOAEN]
> Dev[B,A]:1x44, bit 0 [OSVISBAR]
> Any IOAPIC Redirection Register, bit 16, Interrupt Mask [IM]
> Any IOAPIC Redirection Register, bit 15, Trigger Mode [TM]
> Any IOAPIC Redirection Register, bit 13, Polarity [POL]
> 
> Without a hypertransport specification to hand I cant really comment
> about 1,3,4 other than expecting that Xen does not no nor care about
> them.  Xen might possibly play with the busmaster bit, but without an
> IOMMU on the affected system, I cant spot any codepaths which would.  It
> does occur to me however that these devices are more candidates for
> "stuff not even dom0 should be permitted to play with"
> 
> After boot, I am not aware of anything which would play with the
> poliarity bit of an RTE.  The Trigger Mode bit might be played with if a
> line level interrupt gets delivered as an edge interrupt.  As I
> remember, this was a bugfix workaround for ancient IO-APICs anyway and
> would not normally be expected to happen.

None of these are what make me consider the workaround partial.

> The only bit which is regularly played with by xen is the mask bit, but
> only in the default case of io_apic_ack=new.

This is what does - what makes you think the mask bit would be
played with only for level IRQs in new-ack mode?

end_level_ioapic_irq_old() has a call to mask_IO_APIC_irq() as
much as ack_edge_ioapic_irq() has. They're being called only
under certain conditions (just like in end_level_ioapic_irq_new()),
but that doesn't mean they won't be called at all. And both uses
are - afaict - also falling under the same don't-do-this that the
erratum workaround describes.

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-06-05 11:33 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-04 10:13 [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present Andrew Cooper
2013-06-04 10:36 ` George Dunlap
2013-06-04 16:29 ` Jan Beulich
2013-06-04 16:48   ` Andrew Cooper
2013-06-05  8:20     ` Jan Beulich
2013-06-05  9:43       ` Andrew Cooper
2013-06-05  9:54         ` Jan Beulich
2013-06-05 10:39           ` Andrew Cooper
2013-06-05 11:33             ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.