* Re: Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge
       [not found] <43-60fd5980-39-6ab37100@54709844>
@ 2021-07-25 13:55 ` Marek Marczykowski-Górecki
       [not found]   ` <659EA336-E36F-4025-9B6A-DC50A31F0FF1@openhardware.de>
  2021-07-27 12:21   ` Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge Andrew Cooper
  0 siblings, 2 replies; 5+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-07-25 13:55 UTC (permalink / raw)
  To: luja; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 3120 bytes --]

On Sun, Jul 25, 2021 at 02:31:17PM +0200, luja wrote:
> Hi Marek, Hi all,

Hi luja,

First of all, please use the appropriate mailing list for such emails rather
than emailing individual developers privately. I'm adding xen-devel here.

> 
> On an HP Z600 I am trying to run Qubes.
> The Xen log says that the chipset is affected by Intel errata #47, #53
> 
> the code in Xen is this:
> 
> "
> /* 5500/5520/X58 Chipset Interrupt remapping errata, for stepping B-3.
>  * Fixed in stepping C-2. */
> static void __init tylersburg_intremap_quirk(void)
> {
>     uint32_t bus, device;
>     uint8_t rev;
> 
>     for ( bus = 0; bus < 0x100; bus++ )
>     {
>         /* Match on System Management Registers on Device 20 Function 0 */
>         device = pci_conf_read32(0, bus, 20, 0, PCI_VENDOR_ID);
>         rev = pci_conf_read8(0, bus, 20, 0, PCI_REVISION_ID);
> 
>         if ( rev == 0x13 && device == 0x342e8086 )
>         {
>             printk(XENLOG_WARNING VTDPREFIX
>                    "Disabling IOMMU due to Intel 5500/5520/X58 Chipset errata #47, #53\n");
>             iommu_enable = 0;
>             break;
>         }
>     }
> }
> 
> "
> 
> But! rev 0x13 is not sufficient to detect the "wrong" host bridge.

According to the spec by Intel (page 11 in the PDF you attached), it is.

> This Z600 is equipped with the 0B54h mainboard, as can be seen with dmidecode.
> 
> The manual states that 0B54h mainboard has the "newer C2 stepping",
> so it is *not* affected by Intel "spec update" (nota bene: Intel updates the
> spec, others report erratas) bugs  

The code above checks for rev 0x13, and the spec (page 11) clearly says that rev
0x13 is stepping B-3. Stepping C-2 is rev 0x22. So, if this check
triggers for you, I'm afraid you have the affected chipset.
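
As an illustration of the mapping described above, here is a minimal sketch in standalone C (not Xen code). It assumes only the two Revision ID values named in this thread; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Revision IDs as quoted in this thread from Intel's specification update. */
#define TYLERSBURG_REV_B3 0x13u /* stepping B-3: affected by errata #47, #53 */
#define TYLERSBURG_REV_C2 0x22u /* stepping C-2: errata fixed */

/* Hypothetical helper: translate a PCI Revision ID into a stepping name. */
static const char *tylersburg_stepping_name(uint8_t rev)
{
    switch ( rev )
    {
    case TYLERSBURG_REV_B3:
        return "B-3";
    case TYLERSBURG_REV_C2:
        return "C-2";
    default:
        /* Any other value is outside what this thread documents. */
        return "unknown";
    }
}
```

With this table, a host bridge that reports 0x13 in lspci maps to "B-3", which is the case the Xen quirk matches.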

According to HP doc you attached, you can additionally confirm it via
BIOS:
    To determine if a specific HP Z600 system
    has the C2 revision of the chipset:
    1. Use the BIOS setup menu to access the “Boot
    Block Date” from the “System Information Menu.”
    All B3-based systems will have a “1/30/09”
    date and C2-based systems will have a
    “01/07/10” date.

> So the way Xen detects the "bug" (PCI rev 13) is not sufficient, as my Z600
> shows PCI rev 13 with lspci but 0B54h (board rev only on Z600) with dmidecode.
> I would suggest first adding an override Xen boot option to disable the
> disablement in this code section. Or just patch this part out of the Xen code
> and rebuild Xen. If this stuff really crashes, one will see it.

Patching it out is out of the question; this check is there for a
reason.

> So please build a new Xen without this stupid disablement or please add an override boot command for it.
> 
> Please see the attached upgrade manual of the Z600 and the errata "spec update" by Intel.
> You see that the C2 stepping is not affected by the bugs referred to in the Xen code,
> so removing that section or adding better detection of the mask revision (B3 vs. C2) of the 5520 host bridge would allow many users to operate Qubes4.

Maybe someone else has an alternative idea?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge // Laser markings: SLH3P
       [not found]   ` <659EA336-E36F-4025-9B6A-DC50A31F0FF1@openhardware.de>
@ 2021-07-25 14:38     ` Ludwig Jaffe
  0 siblings, 0 replies; 5+ messages in thread
From: Ludwig Jaffe @ 2021-07-25 14:38 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 4709 bytes --]

Just for documentation:
the heat sink was reassembled using normal grey CPU thermal grease.


On July 25, 2021 4:30:39 PM GMT+02:00, Ludwig Jaffe <luja@openhardware.de> wrote:
>Hi Marek, as you are referred to as a Xen expert, I thought you were the only
>one in the Qubes project who knows about it.
>Hi people at Xen, it would be nice to add override options in such code
>for test purposes, something like forceiommu=1.
>
>So, after disassembling the cooler, the chip reads
>"SLH3P". The errata sheet assigns it to the C2 stepping and states that it
>supports Intel Trusted Execution Technology (TXT).
>This is on page 11 (3rd line of the table) of said Intel errata.
>
>So things get a bit weird. Having an override in the kernel boot flags
>would surely help
>to bring the computer up with Qubes, as it should be supported according
>to the laser markings. Maybe the PCI revisions are written into
>registers of the host bridge at the time the BIOS does PCI(e) config
>cycles, and a buggy BIOS could simply write buggy PCI revisions (just an
>assumption). Laser markings on the die should be trusted.
>
>Regards,
>
>luja
>
>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

[-- Attachment #1.2: Type: text/html, Size: 5508 bytes --]

[-- Attachment #2: IMG_20210725_163710.jpg --]
[-- Type: image/jpeg, Size: 5979970 bytes --]


* Re: Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge
  2021-07-25 13:55 ` Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge Marek Marczykowski-Górecki
       [not found]   ` <659EA336-E36F-4025-9B6A-DC50A31F0FF1@openhardware.de>
@ 2021-07-27 12:21   ` Andrew Cooper
  2021-07-27 14:36     ` Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge // SLH3P marking on die luja
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Cooper @ 2021-07-27 12:21 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, luja; +Cc: xen-devel

On 25/07/2021 14:55, Marek Marczykowski-Górecki wrote:
> On Sun, Jul 25, 2021 at 02:31:17PM +0200, luja wrote:
>> This Z600 is equipped with the 0B54h mainboard, as can be seen with dmidecode.
>>
>> The manual states that 0B54h mainboard has the "newer C2 stepping",
>> so it is *not* affected by Intel "spec update" (nota bene: Intel updates the
>> spec, others report erratas) bugs  
> The code above checks for rev 0x13, and the spec (page 11) clearly says that rev
> 0x13 is stepping B-3. Stepping C-2 is rev 0x22. So, if this check
> triggers for you, I'm afraid you have the affected chipset.

The ID in hardware is the authoritative information.  Sounds like the
Z600 manual is wrong.

>> So the way Xen detects the "bug" (PCI rev 13) is not sufficient, as my Z600
>> shows PCI rev 13 with lspci but 0B54h (board rev only on Z600) with dmidecode.
>> I would suggest first adding an override Xen boot option to disable the
>> disablement in this code section. Or just patch this part out of the Xen code
>> and rebuild Xen. If this stuff really crashes, one will see it.
> Patching it out is out of the question; this check is there for a
> reason.

Using interrupt remapping on these systems does cause it to cease
functioning.

>> So please build a new Xen without this stupid disablement or please add an override boot command for it.
>>
>> Please see the attached upgrade manual of the Z600 and the errata "spec update" by Intel.
>> You see that the C2 stepping is not affected by the bugs referred to in the Xen code,
>> so removing that section or adding better detection of the mask revision (B3 vs. C2) of the 5520 host bridge would allow many users to operate Qubes4.
> Maybe someone else has an alternative idea?

The logic in Xen is broken.  I've tried fixing it before for XenServer,
but the fix was objected to, and the patch is still in the patchqueue.

The errata is with the Queued Invalidation, which (in Xen) is tied to
interrupt remapping.  The rest of the IOMMU works fine.

The current status quo is that if Xen boots with an Intel gen1 IOMMU, it
will be happy with DMA remapping but no IRQ remapping.  If Xen boots on
this specific buggy system, it will turn the entire IOMMU off in
protest, which leaves the system less secure than booting on the
previous generation of hardware.

The correct behaviour is to just disable interrupt remapping in this
case, which brings Xen's behaviour in line with adjacent generations of
hardware.
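
A minimal sketch of the narrower quirk described above, written as a standalone C model rather than Xen code. The struct and function names are hypothetical, and clearing queued invalidation together with interrupt remapping is an assumption based on the statement that the two are tied together in Xen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hypervisor's IOMMU feature switches. */
struct iommu_features {
    bool dma_remap; /* DMA remapping */
    bool intremap;  /* interrupt remapping */
    bool qinval;    /* queued invalidation */
};

/*
 * Model of the narrower quirk: on the affected B-3 stepping, turn off
 * only the features touched by the errata and leave DMA remapping alone.
 * vendor_device is the 32-bit value read at PCI_VENDOR_ID, as in the
 * snippet quoted earlier in the thread.
 */
static void tylersburg_quirk_narrow(struct iommu_features *f,
                                    uint32_t vendor_device, uint8_t rev)
{
    if ( rev == 0x13 && vendor_device == 0x342e8086u )
    {
        f->intremap = false;
        f->qinval = false; /* assumption: errata is in queued invalidation */
        /* f->dma_remap is deliberately left unchanged. */
    }
}
```

On a matching device the model leaves dma_remap set, which corresponds to "DMA remapping but no IRQ remapping" as on a gen1 IOMMU; on any other revision nothing changes.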

~Andrew




* Re: Xen-Error: Disabling IOMMU on Stepping C2  5520 Host-Bridge // SLH3P marking on die
  2021-07-27 12:21   ` Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge Andrew Cooper
@ 2021-07-27 14:36     ` luja
  2021-07-27 15:36       ` Andrew Cooper
  0 siblings, 1 reply; 5+ messages in thread
From: luja @ 2021-07-27 14:36 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Marek Marczykowski-Górecki, xen-devel

[-- Attachment #1: Type: text/plain, Size: 5245 bytes --]


Hi all,

No, the correct behavior is to just use the host bridge as it is correct and works!
Just the PCI config space is done wrongly in the board's BIOS?

To get the truth...
I disassembled the cooler, cleaned the "phase change" wax from it,
photographed the laser engraving of the flip chip die and compared
the text with the errata "spec update" by Intel.

According to the laser marking and the errata, the chip is a 5520 with C2
stepping, as it has an SLH3P marking on its die. I made a photo of it,
which is available on request.
The errata sheet assigns it to the C2 stepping and states that it supports Intel
Trusted Execution Technology (TXT). This is on page 11 (3rd line of the table) of said Intel errata.
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5520-and-5500-chipset-ioh-specification-update.pdf


So both chipset errata #47 and #53, mentioned in the code snippet
disabling the VT-d feature, are not present in this hardware, so the host bridge should
be kosher.

For some weird reason the PCI rev is 13.
I guess that the ID is written by the BIOS using
PCI config cycles at early boot into registers of the host bridge, to
be then displayed by tools like lspci.
Page 11 of the errata:
"3. The Revision Number corresponds to bits 7:0 of the Revision ID Register located at offset 08h in the PCI
function 0 configuration space
"
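
To make the register layout concrete, here is a small sketch in standalone C of how the values in the Xen check decompose. It relies on the standard PCI configuration header layout (Vendor ID in bits 15:0 and Device ID in bits 31:16 of the dword at offset 00h; Revision ID in bits 7:0 of the dword at offset 08h). The function names are hypothetical.

```c
#include <stdint.h>

/* Vendor ID: bits 15:0 of the dword at config offset 00h. */
static uint16_t pci_vendor_of(uint32_t dword_00h)
{
    return (uint16_t)(dword_00h & 0xffffu);
}

/* Device ID: bits 31:16 of the dword at config offset 00h. */
static uint16_t pci_device_of(uint32_t dword_00h)
{
    return (uint16_t)(dword_00h >> 16);
}

/* Revision ID: bits 7:0 of the dword at config offset 08h. */
static uint8_t pci_revision_of(uint32_t dword_08h)
{
    return (uint8_t)(dword_08h & 0xffu);
}
```

Applied to the constant in the quirk, 0x342e8086 splits into vendor 0x8086 (Intel) and device 0x342e, and a Revision ID byte of 0x13 is the value lspci shows as "rev 13".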

But in general:
This is not Windows, so I would expect a kernel boot option
to just say "I ignore your warning, and when a black hole forms in my mainboard
it is my fault", so force_5520_C2=1 or something like this should be appropriate.
A small readme would then advise the people who are affected by a flaky implementation
of the C2 host bridge to give it a try! So what should happen?!
Lose all your data on a freshly installed Qubes OS?!
Oh, I forgot my HDD password, and forgot to write it under the keyboard ;-) , so
I need to reinstall.
What is the difference? Computers should do what the user wants them to do,
and when they break it is the fault of the user who ordered them to fail.

So please add a kernel boot option to override this if-statement,
so that only a warning is printed to the log but the IOMMU is not disabled:

    if ( rev == 0x13 && device == 0x342e8086 )
    {
        if ( force_5520_C2 == 1 )
        {
            printk(XENLOG_WARNING VTDPREFIX
                   "NOT Disabling IOMMU as you requested force_5520_C2=1 and ignoring Intel 5500/5520/X58 Chipset errata #47, #53\n");
        }
        else
        {
            printk(XENLOG_WARNING VTDPREFIX
                   "Disabling IOMMU due to Intel 5500/5520/X58 Chipset errata #47, #53\n");
            iommu_enable = 0;
            break;
        }
    }
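
The decision this proposal encodes can be sketched as a standalone C function (not Xen code). The function name is hypothetical, and the `force` parameter models the proposed force_5520_C2 option, which does not exist in Xen.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns whether the IOMMU stays enabled after the quirk runs.
 * `force` models the proposed (hypothetical) force_5520_C2=1 option.
 */
static bool iommu_stays_enabled(uint32_t vendor_device, uint8_t rev,
                                bool force)
{
    bool affected = (rev == 0x13 && vendor_device == 0x342e8086u);

    if ( !affected )
        return true;  /* quirk does not match: nothing is disabled */

    return force;     /* matched: enabled only if the user overrides */
}
```

Note that the override changes behaviour only for hardware that reports revision 0x13; all other systems take the first return path regardless of the flag.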

Cheers,

luja



[-- Attachment #2: Type: text/html, Size: 6554 bytes --]


* Re: Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge // SLH3P marking on die
  2021-07-27 14:36     ` Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge // SLH3P marking on die luja
@ 2021-07-27 15:36       ` Andrew Cooper
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Cooper @ 2021-07-27 15:36 UTC (permalink / raw)
  To: luja; +Cc: Marek Marczykowski-Górecki, xen-devel

On 27/07/2021 15:36, luja wrote:
> Hi all,
>
> No, the correct behavior is to just use the host bridge as it is
> correct and works!

What evidence do you have of this claim?

Have you actually deleted the workaround, and confirmed that Xen works
fully and correctly on this hardware?  If not, that is your next task.

> Just the PCI config space is done wrongly in the board's BIOS?

These details are typically hard wired.

>
> To get the truth...
> I disassembled the cooler, cleaned the "phase change" wax from it,
> photographed the laser engraving of the flip chip die and compared
> the text with the errata "spec update" by Intel.
>
> According to the laser marking and the errata the chip is a 5520 with C2
> stepping. As it has an SLH3P marking on its die. I made a photo of it,
> which is available on request.
> The errata sheet refers it to C2 stepping and states it supports Intel
> Trusted Execution TXT. This is on page 11 (3rd line of table) of said
> intel errata.
> https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5520-and-5500-chipset-ioh-specification-update.pdf

I'm afraid that this doesn't prove anything.

Topmarking fraud sadly exists.   A famous example is the overclocking
multiplier which used to be an external pin to chips, and no longer is
because the cheaper slower CPUs had their topmarkings forged and sold as
expensive faster ones.

~Andrew



