* Re: Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge
       [not found] <43-60fd5980-39-6ab37100@54709844>
@ 2021-07-25 13:55 ` Marek Marczykowski-Górecki
       [not found]   ` <659EA336-E36F-4025-9B6A-DC50A31F0FF1@openhardware.de>
  2021-07-27 12:21   ` Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge Andrew Cooper
  0 siblings, 2 replies; 5+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-07-25 13:55 UTC (permalink / raw)
  To: luja; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 3120 bytes --]

On Sun, Jul 25, 2021 at 02:31:17PM +0200, luja wrote:
> Hi Marek, Hi all,

Hi luja,

First of all, please use the appropriate mailing list for such emails rather
than emailing individual developers privately. I'm adding xen-devel here.

>
> On an HP Z600 I am trying to run Qubes.
> The Xen log says that the chipset is affected by Intel errata #47, #53.
>
> The code in Xen is this:
>
> "
> /* 5500/5520/X58 Chipset Interrupt remapping errata, for stepping B-3.
>  * Fixed in stepping C-2. */
> static void __init tylersburg_intremap_quirk(void)
> {
>     uint32_t bus, device;
>     uint8_t rev;
>
>     for ( bus = 0; bus < 0x100; bus++ )
>     {
>         /* Match on System Management Registers on Device 20 Function 0 */
>         device = pci_conf_read32(0, bus, 20, 0, PCI_VENDOR_ID);
>         rev = pci_conf_read8(0, bus, 20, 0, PCI_REVISION_ID);
>
>         if ( rev == 0x13 && device == 0x342e8086 )
>         {
>             printk(XENLOG_WARNING VTDPREFIX
>                    "Disabling IOMMU due to Intel 5500/5520/X58 Chipset errata #47, #53\n");
>             iommu_enable = 0;
>             break;
>         }
>     }
> }
>
> "
>
> But! rev 0x13 is not sufficient to detect the "wrong" host bridge.

According to the spec by Intel (page 11 in the PDF you attached), it is.

> This Z600 is equipped with the 0B54h mainboard, as can be seen with dmidecode.
>
> The manual states that the 0B54h mainboard has the "newer C2 stepping",
> so it is *not* affected by the Intel "spec update" (nota bene: Intel updates the
> spec, others report errata) bugs.

The code above checks for rev 0x13, and the spec (page 11) clearly says that rev
0x13 is stepping B-3. Stepping C-2 is rev 0x22. So, if this check
triggers for you, I'm afraid you have the affected chipset.

According to the HP doc you attached, you can additionally confirm it via
the BIOS:

    To determine if a specific HP Z600 system
    has the C2 revision of the chipset:
    1. Use the BIOS setup menu to access the “Boot
    Block Date” from the “System Information Menu.”
    All B3-based systems will have a “1/30/09”
    date and C2-based systems will have a
    “01/07/10” date.

> So the way Xen detects the "bug" (PCI rev 13) is not sufficient, as my Z600
> shows PCI rev 13 with lspci but 0B54h (board rev only on Z600) with dmidecode.
> I would suggest first having an override Xen kernel boot option to disable the
> disablement in this code section. Or just patch this part out of the Xen code
> and rebuild Xen. If this stuff really crashes, one will see it.

Patching it out is out of the question; this check is there for a
reason.

> So please build a new Xen without this stupid disablement or please add an
> override boot command for it.
>
> Please see the attached upgrade manual of the Z600 and the errata "spec update" by Intel.
> You see that the C2 stepping is not affected by the bugs referred to in the Xen code,
> so removing that section or adding better detection of the mask revision
> (B3 vs. C2) of the 5520 host bridge would allow many users to operate Qubes4.

Maybe someone else has an alternative idea?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread
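The revision check discussed in this message can be restated as a small standalone sketch. It only encodes what the thread itself says: the vendor/device dword 0x342e8086 identifies the matched function, revision 0x13 is stepping B-3, and revision 0x22 is stepping C-2. The helper names `ioh_stepping` and `ioh_is_affected` are hypothetical and are not part of Xen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vendor/device dword matched by the quirk quoted above (value from the thread). */
#define TYLERSBURG_ID 0x342e8086u

/* Revision IDs as cited in this thread from the Intel spec update, page 11. */
#define REV_B3 0x13u
#define REV_C2 0x22u

/* Hypothetical helper: map a PCI revision ID to the stepping name. */
static const char *ioh_stepping(uint8_t rev)
{
    switch ( rev )
    {
    case REV_B3: return "B-3";
    case REV_C2: return "C-2";
    default:     return "unknown";
    }
}

/* Hypothetical helper: the same condition the quoted quirk tests. */
static bool ioh_is_affected(uint32_t id, uint8_t rev)
{
    return id == TYLERSBURG_ID && rev == REV_B3;
}
```

With these definitions, a device reporting revision 0x13 is classified as B-3 and matches the quirk, while one reporting 0x22 does not, which is the distinction Marek draws above.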
[parent not found: <659EA336-E36F-4025-9B6A-DC50A31F0FF1@openhardware.de>]
* Re: Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge // Laser markings: SLH3P
       [not found]   ` <659EA336-E36F-4025-9B6A-DC50A31F0FF1@openhardware.de>
@ 2021-07-25 14:38     ` Ludwig Jaffe
  0 siblings, 0 replies; 5+ messages in thread
From: Ludwig Jaffe @ 2021-07-25 14:38 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: xen-devel

[-- Attachment #1.1: Type: text/plain, Size: 4709 bytes --]

Just for documentation: the heat sink was reassembled using normal grey CPU thermal grease.

On July 25, 2021 4:30:39 PM GMT+02:00, Ludwig Jaffe <luja@openhardware.de> wrote:
>Hi Marek, as you are referred to as a Xen expert, I thought you were the
>only one in the Qubes project to know about it.
>Hi people at Xen, it would be nice to add override options in such code
>for test purposes, something like forceiommu=1.
>
>So, after disassembling the cooler, the chip reads
>"SLH3P". The errata sheet maps this to the C2 stepping and states that it
>supports Intel Trusted Execution (TXT).
>This is on page 11 (3rd line of the table) of said Intel errata.
>
>So things get a bit weird. Having an override in the kernel boot flags
>would surely help
>to bring the computer up with Qubes, as it should be supported according
>to the laser markings. Maybe the PCI revisions are written into
>registers of the host bridge at the time the BIOS does PCI(e) config
>cycles, and a buggy BIOS could simply write buggy PCI revisions (just an
>assumption). Laser markings on the die should be trusted.
>
>Regards,
>
>luja
>
>
>On July 25, 2021 3:55:52 PM GMT+02:00, "Marek Marczykowski-Górecki"
><marmarek@invisiblethingslab.com> wrote:
>>On Sun, Jul 25, 2021 at 02:31:17PM +0200, luja wrote:
>>> Hi Marek, Hi all,
>>
>>Hi luja,
>>
>>First of all, please use the appropriate mailing list for such emails
>>rather than emailing individual developers privately. I'm adding
>>xen-devel here.
>>
>>>
>>> On an HP Z600 I am trying to run Qubes.
>>> The Xen log says that the chipset is affected by Intel errata #47, #53.
>>>
>>> The code in Xen is this:
>>>
>>> "
>>> /* 5500/5520/X58 Chipset Interrupt remapping errata, for stepping B-3.
>>>  * Fixed in stepping C-2. */
>>> static void __init tylersburg_intremap_quirk(void)
>>> {
>>>     uint32_t bus, device;
>>>     uint8_t rev;
>>>
>>>     for ( bus = 0; bus < 0x100; bus++ )
>>>     {
>>>         /* Match on System Management Registers on Device 20 Function 0 */
>>>         device = pci_conf_read32(0, bus, 20, 0, PCI_VENDOR_ID);
>>>         rev = pci_conf_read8(0, bus, 20, 0, PCI_REVISION_ID);
>>>
>>>         if ( rev == 0x13 && device == 0x342e8086 )
>>>         {
>>>             printk(XENLOG_WARNING VTDPREFIX
>>>                    "Disabling IOMMU due to Intel 5500/5520/X58 Chipset errata #47, #53\n");
>>>             iommu_enable = 0;
>>>             break;
>>>         }
>>>     }
>>> }
>>>
>>> "
>>>
>>> But! rev 0x13 is not sufficient to detect the "wrong" host bridge.
>>
>>According to the spec by Intel (page 11 in the PDF you attached), it
>>is.
>>
>>> This Z600 is equipped with the 0B54h mainboard, as can be seen with
>>> dmidecode.
>>>
>>> The manual states that the 0B54h mainboard has the "newer C2 stepping",
>>> so it is *not* affected by the Intel "spec update" (nota bene: Intel
>>> updates the spec, others report errata) bugs.
>>
>>The code above checks for rev 0x13, and the spec (page 11) clearly says
>>that rev 0x13 is stepping B-3. Stepping C-2 is rev 0x22. So, if this check
>>triggers for you, I'm afraid you have the affected chipset.
>>
>>According to the HP doc you attached, you can additionally confirm it via
>>the BIOS:
>>    To determine if a specific HP Z600 system
>>    has the C2 revision of the chipset:
>>    1. Use the BIOS setup menu to access the “Boot
>>    Block Date” from the “System Information Menu.”
>>    All B3-based systems will have a “1/30/09”
>>    date and C2-based systems will have a
>>    “01/07/10” date.
>>
>>> So the way Xen detects the "bug" (PCI rev 13) is not sufficient, as
>>> my Z600 shows PCI rev 13 with lspci but 0B54h (board rev only on Z600)
>>> with dmidecode.
>>> I would suggest first having an override Xen kernel boot option to
>>> disable the disablement in this code section. Or just patch this part
>>> out of the Xen code and rebuild Xen. If this stuff really crashes, one
>>> will see it.
>>
>>Patching it out is out of the question; this check is there for a
>>reason.
>>
>>> So please build a new Xen without this stupid disablement or please
>>> add an override boot command for it.
>>>
>>> Please see the attached upgrade manual of the Z600 and the errata "spec
>>> update" by Intel.
>>> You see that the C2 stepping is not affected by the bugs referred to
>>> in the Xen code,
>>> so removing that section or adding better detection of the mask
>>> revision (B3 vs. C2) of the 5520 host bridge would allow many users to
>>> operate Qubes4.
>>
>>Maybe someone else has an alternative idea?
>>
>>--
>>Best Regards,
>>Marek Marczykowski-Górecki
>>Invisible Things Lab
>
>--
>Sent from my Android device with K-9 Mail. Please excuse my brevity.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

[-- Attachment #1.2: Type: text/html, Size: 5508 bytes --]

[-- Attachment #2: IMG_20210725_163710.jpg --]
[-- Type: image/jpeg, Size: 5979970 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge
  2021-07-25 13:55 ` Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge Marek Marczykowski-Górecki
       [not found]   ` <659EA336-E36F-4025-9B6A-DC50A31F0FF1@openhardware.de>
@ 2021-07-27 12:21   ` Andrew Cooper
  2021-07-27 14:36     ` Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge // SLH3P marking on die luja
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Cooper @ 2021-07-27 12:21 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, luja; +Cc: xen-devel

On 25/07/2021 14:55, Marek Marczykowski-Górecki wrote:
> On Sun, Jul 25, 2021 at 02:31:17PM +0200, luja wrote:
>> This Z600 is equipped with the 0B54h mainboard, as can be seen with dmidecode.
>>
>> The manual states that the 0B54h mainboard has the "newer C2 stepping",
>> so it is *not* affected by the Intel "spec update" (nota bene: Intel updates the
>> spec, others report errata) bugs.
> The code above checks for rev 0x13, and the spec (page 11) clearly says that rev
> 0x13 is stepping B-3. Stepping C-2 is rev 0x22. So, if this check
> triggers for you, I'm afraid you have the affected chipset.

The ID in hardware is the authoritative information.  Sounds like the
Z600 manual is wrong.

>> So the way Xen detects the "bug" (PCI rev 13) is not sufficient, as my Z600
>> shows PCI rev 13 with lspci but 0B54h (board rev only on Z600) with dmidecode.
>> I would suggest first having an override Xen kernel boot option to disable the
>> disablement in this code section. Or just patch this part out of the Xen code
>> and rebuild Xen. If this stuff really crashes, one will see it.
> Patching it out is out of the question; this check is there for a
> reason.

Using interrupt remapping on these systems does cause it to cease
functioning.

>> So please build a new Xen without this stupid disablement or please add an
>> override boot command for it.
>>
>> Please see the attached upgrade manual of the Z600 and the errata "spec update" by Intel.
>> You see that the C2 stepping is not affected by the bugs referred to in the Xen code,
>> so removing that section or adding better detection of the mask revision
>> (B3 vs. C2) of the 5520 host bridge would allow many users to operate Qubes4.
> Maybe someone else has an alternative idea?

The logic in Xen is broken.  I've tried fixing it before for XenServer,
but was objected to, and the patch is still in the patchqueue.

The errata is with the Queued Invalidation, which (in Xen) is tied to
interrupt remapping.  The rest of the IOMMU works fine.

The current status quo is that if Xen boots with an Intel gen1 IOMMU, it
will be happy with DMA remapping but no IRQ remapping.  If Xen boots on
this specific buggy system, it will turn the entire IOMMU off in
protest, which leaves the system less secure than booting on the
previous generation of hardware.

The correct behaviour is to just disable interrupt remapping in this
case, which brings Xen's behaviour in line with adjacent generations of
hardware.

~Andrew

^ permalink raw reply	[flat|nested] 5+ messages in thread
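The difference between the status quo and the narrower behaviour described in this message can be illustrated with a standalone sketch. This is not the XenServer patch mentioned above, which is not shown in the thread. The struct `iommu_features` and both function names are hypothetical stand-ins for the hypervisor's real feature switches; only the ID and revision values come from the quoted code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hypervisor's IOMMU feature switches. */
struct iommu_features {
    bool enable;    /* the IOMMU as a whole, including DMA remapping */
    bool qinval;    /* Queued Invalidation */
    bool intremap;  /* interrupt remapping */
};

/* Status quo as quoted in the thread: turn the whole IOMMU off. */
static void quirk_status_quo(struct iommu_features *f, uint32_t id, uint8_t rev)
{
    if ( id == 0x342e8086u && rev == 0x13 )
        f->enable = false;
}

/* Narrower alternative: drop only the features tied to the errata. */
static void quirk_intremap_only(struct iommu_features *f, uint32_t id, uint8_t rev)
{
    if ( id == 0x342e8086u && rev == 0x13 )
    {
        f->qinval = false;
        f->intremap = false;
        /* f->enable is left untouched, so DMA remapping stays available. */
    }
}
```

Under the second variant an affected B-3 system ends up in the same state Andrew describes for a gen1 IOMMU: DMA remapping on, interrupt remapping off.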
* Re: Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge // SLH3P marking on die
  2021-07-27 12:21   ` Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge Andrew Cooper
@ 2021-07-27 14:36     ` luja
  2021-07-27 15:36       ` Andrew Cooper
  0 siblings, 1 reply; 5+ messages in thread
From: luja @ 2021-07-27 14:36 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Marek Marczykowski-Górecki, xen-devel

[-- Attachment #1: Type: text/plain, Size: 5245 bytes --]

Hi all,

No, the correct behavior is to just use the host bridge as it is correct and works!
Just the PCI config space is done wrongly in the board's BIOS?

To get the truth...
I disassembled the cooler, cleaned the "phase change" wax from it,
photographed the laser engraving of the flip chip die and compared
the text with the errata "spec update" by Intel.

According to the laser marking and the errata, the chip is a 5520 with C2
stepping, as it has an SLH3P marking on its die. I made a photo of it,
which is available on request.
The errata sheet maps this marking to the C2 stepping and states that it supports
Intel Trusted Execution (TXT). This is on page 11 (3rd line of the table) of said
Intel errata.
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5520-and-5500-chipset-ioh-specification-update.pdf

So both chipset errata #47 and #53, mentioned in the code snippet disabling the
VT-d feature, are not present in this hardware, so the host bridge should be kosher.
For some weird reason the PCI rev is 13.

I guess that the ID is written by the BIOS using PCI config cycles at early boot
into registers of the host bridge, to be then displayed using tools like lspci.

Page 11 of the errata:
"3. The Revision Number corresponds to bits 7:0 of the Revision ID Register
located at offset 08h in the PCI function 0 configuration space"

But in general: this is not Windows, so I would expect a kernel boot option to
just say "I ignore your warning, and when a black hole forms in my mainboard it
is my fault", so force_5520_C2=1 or something like this should be appropriate.
A small readme would then advise the people who are affected by a flaky
implementation of the C2 host bridge to give it a try!

So what should happen?! Lose all your data on a freshly installed Qubes OS?!
Oh, I forgot my HDD password, and forgot to write it under the keyboard ;-),
so I need to reinstall. What is the difference? Computers should do what the
user wants them to do, and when they break it is the fault of the user who
ordered them to fail.

So please add a kernel boot option to just go against this if-statement, so that
only a warning is printed into the log but the IOMMU is not disabled:

    if ( rev == 0x13 && device == 0x342e8086 )
    {
        if ( force_5520_C2 == 1 )
        {
            printk(XENLOG_WARNING VTDPREFIX
                   "NOT Disabling IOMMU as you requested force_5520_C2=1 and ignoring Intel 5500/5520/X58 Chipset errata #47, #53\n");
        }
        else
        {
            printk(XENLOG_WARNING VTDPREFIX
                   "Disabling IOMMU due to Intel 5500/5520/X58 Chipset errata #47, #53\n");
            iommu_enable = 0;
            break;
        }
    }

Cheers,

luja

On Tuesday, July 27, 2021 14:21 CEST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:

On 25/07/2021 14:55, Marek Marczykowski-Górecki wrote:
> On Sun, Jul 25, 2021 at 02:31:17PM +0200, luja wrote:
>> This Z600 is equipped with the 0B54h mainboard, as can be seen with dmidecode.
>>
>> The manual states that the 0B54h mainboard has the "newer C2 stepping",
>> so it is *not* affected by the Intel "spec update" (nota bene: Intel updates the
>> spec, others report errata) bugs.
> The code above checks for rev 0x13, and the spec (page 11) clearly says that rev
> 0x13 is stepping B-3. Stepping C-2 is rev 0x22. So, if this check
> triggers for you, I'm afraid you have the affected chipset.

The ID in hardware is the authoritative information.  Sounds like the
Z600 manual is wrong.

>> So the way Xen detects the "bug" (PCI rev 13) is not sufficient, as my Z600
>> shows PCI rev 13 with lspci but 0B54h (board rev only on Z600) with dmidecode.
>> I would suggest first having an override Xen kernel boot option to disable the
>> disablement in this code section. Or just patch this part out of the Xen code
>> and rebuild Xen. If this stuff really crashes, one will see it.
> Patching it out is out of the question; this check is there for a
> reason.

Using interrupt remapping on these systems does cause it to cease
functioning.

>> So please build a new Xen without this stupid disablement or please add an
>> override boot command for it.
>>
>> Please see the attached upgrade manual of the Z600 and the errata "spec update" by Intel.
>> You see that the C2 stepping is not affected by the bugs referred to in the Xen code,
>> so removing that section or adding better detection of the mask revision
>> (B3 vs. C2) of the 5520 host bridge would allow many users to operate Qubes4.
> Maybe someone else has an alternative idea?

The logic in Xen is broken.  I've tried fixing it before for XenServer,
but was objected to, and the patch is still in the patchqueue.

The errata is with the Queued Invalidation, which (in Xen) is tied to
interrupt remapping.  The rest of the IOMMU works fine.

The current status quo is that if Xen boots with an Intel gen1 IOMMU, it
will be happy with DMA remapping but no IRQ remapping.  If Xen boots on
this specific buggy system, it will turn the entire IOMMU off in
protest, which leaves the system less secure than booting on the
previous generation of hardware.

The correct behaviour is to just disable interrupt remapping in this
case, which brings Xen's behaviour in line with adjacent generations of
hardware.

~Andrew

[-- Attachment #2: Type: text/html, Size: 6554 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread
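The override requested in this message boils down to a three-way decision, which the following standalone sketch makes explicit. `force_5520_C2` is the option name proposed in the message; no such option appears in the quoted Xen code. The enum and the function `tylersburg_decide` are hypothetical and only separate the decision from the logging and flag updates so that it can be checked in isolation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of evaluating the quirk for one device. */
enum quirk_action {
    QUIRK_NONE,           /* device does not match, nothing to do */
    QUIRK_DISABLE_IOMMU,  /* match, no override: behave as the quoted code does */
    QUIRK_OVERRIDDEN      /* match, but the user asked to ignore the errata */
};

/* force_5520_C2 mirrors the boot option proposed in the message above. */
static enum quirk_action tylersburg_decide(uint32_t id, uint8_t rev,
                                           bool force_5520_C2)
{
    if ( id != 0x342e8086u || rev != 0x13 )
        return QUIRK_NONE;

    return force_5520_C2 ? QUIRK_OVERRIDDEN : QUIRK_DISABLE_IOMMU;
}
```

A caller would print the warning in both matching cases and clear `iommu_enable` only for `QUIRK_DISABLE_IOMMU`. Whether such an override is safe is exactly what the thread disputes: Andrew notes that interrupt remapping on affected systems stops working.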
* Re: Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge // SLH3P marking on die
  2021-07-27 14:36     ` Xen-Error: Disabling IOMMU on Stepping C2 5520 Host-Bridge // SLH3P marking on die luja
@ 2021-07-27 15:36       ` Andrew Cooper
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Cooper @ 2021-07-27 15:36 UTC (permalink / raw)
  To: luja; +Cc: Marek Marczykowski-Górecki, xen-devel

On 27/07/2021 15:36, luja wrote:
> Hi all,
>
> No, the correct behavior is to just use the host bridge as it is
> correct and works!

What evidence do you have of this claim?

Have you actually deleted the workaround, and confirmed that Xen works
fully and correctly on this hardware?  If not, that is your next task.

> Just the PCI config space is done wrongly in the board's BIOS?

These details are typically hard wired.

>
> To get the truth...
> I disassembled the cooler, cleaned the "phase change" wax from it,
> photographed the laser engraving of the flip chip die and compared
> the text with the errata "spec update" by Intel.
>
> According to the laser marking and the errata the chip is a 5520 with C2
> stepping. As it has an SLH3P marking on its die. I made a photo of it,
> which is available on request.
> The errata sheet refers it to C2 stepping and states it supports Intel
> Trusted Execution TXT. This is on page 11 (3rd line of table) of said
> intel errata.
> https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5520-and-5500-chipset-ioh-specification-update.pdf

I'm afraid that this doesn't prove anything.  Topmarking fraud sadly
exists.  A famous example is the overclocking multiplier which used to
be an external pin to chips, and no longer is because the cheaper slower
CPUs had their topmarkings forged and sold as expensive faster ones.

~Andrew

^ permalink raw reply	[flat|nested] 5+ messages in thread