All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/PCI: Fix Broadwell-EP Home Agent & PCU non-compliant BARs
@ 2016-05-09 18:23 Prarit Bhargava
  2016-05-09 19:20 ` Bjorn Helgaas
  0 siblings, 1 reply; 6+ messages in thread
From: Prarit Bhargava @ 2016-05-09 18:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Prarit Bhargava, Bjorn Helgaas, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Myron Stowe, x86

commit b894157 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having
non-compliant BARs") marks Home Agent 0 & PCU has having non-compliant
BARs.

Before commit b894157,

pci 0000:ff:12.0: BAR 2: failed to assign [mem size 0x00000040]
pci 0000:ff:12.0: BAR 4: failed to assign [mem size 0x00000040]
pci 0000:ff:12.4: BAR 2: failed to assign [mem size 0x00000040]
pci 0000:ff:12.4: BAR 4: failed to assign [mem size 0x00000040]
pci 0000:ff:12.0: BAR 1: failed to assign [mem size 0x00000010]
pci 0000:ff:12.0: BAR 3: failed to assign [mem size 0x00000010]
pci 0000:ff:12.0: BAR 5: failed to assign [mem size 0x00000010]
pci 0000:ff:12.4: BAR 1: failed to assign [mem size 0x00000010]
pci 0000:ff:12.4: BAR 3: failed to assign [mem size 0x00000010]
pci 0000:ff:12.4: BAR 5: failed to assign [mem size 0x00000010]
pci 0000:7f:12.0: BAR 2: failed to assign [mem size 0x00000040]
pci 0000:7f:12.0: BAR 4: failed to assign [mem size 0x00000040]
pci 0000:7f:12.4: BAR 2: failed to assign [mem size 0x00000040]
pci 0000:7f:12.4: BAR 4: failed to assign [mem size 0x00000040]
pci 0000:7f:12.0: BAR 1: failed to assign [mem size 0x00000010]
pci 0000:7f:12.0: BAR 3: failed to assign [mem size 0x00000010]
pci 0000:7f:12.0: BAR 5: failed to assign [mem size 0x00000010]
pci 0000:7f:12.4: BAR 1: failed to assign [mem size 0x00000010]
pci 0000:7f:12.4: BAR 3: failed to assign [mem size 0x00000010]
pci 0000:7f:12.4: BAR 5: failed to assign [mem size 0x00000010]

After commit b894157, there are still "failed to assign" messages,
as well as new "failed to assign" messages for ff:12.0, ff:1e.3,
7f:12.0, and 7f:1e.3.

 pci 0000:ff:12.4: BAR 2: failed to assign [mem size 0x00000040]
 pci 0000:ff:12.4: BAR 4: failed to assign [mem size 0x00000040]
 pci 0000:ff:12.4: BAR 1: failed to assign [mem size 0x00000010]
 pci 0000:ff:12.4: BAR 3: failed to assign [mem size 0x00000010]
 pci 0000:ff:12.4: BAR 5: failed to assign [mem size 0x00000010]
 pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
 pci 0000:ff:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
 pci 0000:7f:12.4: BAR 2: failed to assign [mem size 0x00000040]
 pci 0000:7f:12.4: BAR 4: failed to assign [mem size 0x00000040]
 pci 0000:7f:12.4: BAR 1: failed to assign [mem size 0x00000010]
 pci 0000:7f:12.4: BAR 3: failed to assign [mem size 0x00000010]
 pci 0000:7f:12.4: BAR 5: failed to assign [mem size 0x00000010]
 pci 0000:7f:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
 pci 0000:7f:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]

There are two issues with commit b894157.

The first is that there is another device, Home Agent 1 & PCU, that must
also be quirked in the same way.

\# lspci -n -s 7f:12.4
7f:12.4 0880: 8086:6f60 (rev 01)

After applying the quirk patch, we end up with:

pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
pci 0000:ff:12.4: BAR 6: failed to assign [mem size 0x00000001 pref]
pci 0000:ff:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
pci 0000:7f:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
pci 0000:7f:12.4: BAR 6: failed to assign [mem size 0x00000001 pref]
pci 0000:7f:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]

which drives us to the second issue.  Since the PCI devices now
have unnassigned resources (BARs), pcibios_assign_resources()
call pci_assign_unassigned_root_bus_resources().  This results in the
messages above.  I have added a non_compliant_bars check in
pbus_assign_resources_sorted() to avoid the unassigned device's resources
from being added to the failed resources list for the bus.

Successfully tested by me on a three vendor's Broadwell-EP systems which
no longer show the above false errors messages.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Myron Stowe <mstowe@redhat.com>
Cc: x86@kernel.org
Fixes: b894157 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
---
 arch/x86/pci/fixup.c    |    1 +
 drivers/pci/setup-bus.c |    6 ++++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index b7de192..59a79e6 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -556,5 +556,6 @@ static void pci_bdwep_bar(struct pci_dev *dev)
 {
 	dev->non_compliant_bars = 1;
 }
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6f60, pci_bdwep_bar);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fa0, pci_bdwep_bar);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fc0, pci_bdwep_bar);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 55641a3..78b6b69 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -515,8 +515,10 @@ static void pbus_assign_resources_sorted(const struct pci_bus *bus,
 	struct pci_dev *dev;
 	LIST_HEAD(head);
 
-	list_for_each_entry(dev, &bus->devices, bus_list)
-		__dev_sort_resources(dev, &head);
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		if (!dev->non_compliant_bars)
+			__dev_sort_resources(dev, &head);
+	}
 
 	__assign_resources_sorted(&head, realloc_head, fail_head);
 }
-- 
1.7.9.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/PCI: Fix Broadwell-EP Home Agent & PCU non-compliant BARs
  2016-05-09 18:23 [PATCH] x86/PCI: Fix Broadwell-EP Home Agent & PCU non-compliant BARs Prarit Bhargava
@ 2016-05-09 19:20 ` Bjorn Helgaas
  2016-05-09 20:34   ` Andi Kleen
  2016-05-10 13:06   ` Prarit Bhargava
  0 siblings, 2 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2016-05-09 19:20 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-pci, Bjorn Helgaas, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Myron Stowe, x86, Andi Kleen

[+cc Andi]

Hi Prarit,

On Mon, May 09, 2016 at 02:23:15PM -0400, Prarit Bhargava wrote:
> commit b894157 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having
> non-compliant BARs") marks Home Agent 0 & PCU has having non-compliant
> BARs.

By convention, I use 12-char SHA1 ("[core] abbrev=12" in .git/config)
when citing commits.

> Before commit b894157,
> 
> pci 0000:ff:12.0: BAR 2: failed to assign [mem size 0x00000040]
> pci 0000:ff:12.0: BAR 4: failed to assign [mem size 0x00000040]
> pci 0000:ff:12.4: BAR 2: failed to assign [mem size 0x00000040]
> pci 0000:ff:12.4: BAR 4: failed to assign [mem size 0x00000040]
> pci 0000:ff:12.0: BAR 1: failed to assign [mem size 0x00000010]
> pci 0000:ff:12.0: BAR 3: failed to assign [mem size 0x00000010]
> pci 0000:ff:12.0: BAR 5: failed to assign [mem size 0x00000010]
> pci 0000:ff:12.4: BAR 1: failed to assign [mem size 0x00000010]
> pci 0000:ff:12.4: BAR 3: failed to assign [mem size 0x00000010]
> pci 0000:ff:12.4: BAR 5: failed to assign [mem size 0x00000010]
> pci 0000:7f:12.0: BAR 2: failed to assign [mem size 0x00000040]
> pci 0000:7f:12.0: BAR 4: failed to assign [mem size 0x00000040]
> pci 0000:7f:12.4: BAR 2: failed to assign [mem size 0x00000040]
> pci 0000:7f:12.4: BAR 4: failed to assign [mem size 0x00000040]
> pci 0000:7f:12.0: BAR 1: failed to assign [mem size 0x00000010]
> pci 0000:7f:12.0: BAR 3: failed to assign [mem size 0x00000010]
> pci 0000:7f:12.0: BAR 5: failed to assign [mem size 0x00000010]
> pci 0000:7f:12.4: BAR 1: failed to assign [mem size 0x00000010]
> pci 0000:7f:12.4: BAR 3: failed to assign [mem size 0x00000010]
> pci 0000:7f:12.4: BAR 5: failed to assign [mem size 0x00000010]
> 
> After commit b894157, there are still "failed to assign" messages,
> as well as new "failed to assign" messages for ff:12.0, ff:1e.3,
> 7f:12.0, and 7f:1e.3.
> 
>  pci 0000:ff:12.4: BAR 2: failed to assign [mem size 0x00000040]
>  pci 0000:ff:12.4: BAR 4: failed to assign [mem size 0x00000040]
>  pci 0000:ff:12.4: BAR 1: failed to assign [mem size 0x00000010]
>  pci 0000:ff:12.4: BAR 3: failed to assign [mem size 0x00000010]
>  pci 0000:ff:12.4: BAR 5: failed to assign [mem size 0x00000010]
>  pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
>  pci 0000:ff:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
>  pci 0000:7f:12.4: BAR 2: failed to assign [mem size 0x00000040]
>  pci 0000:7f:12.4: BAR 4: failed to assign [mem size 0x00000040]
>  pci 0000:7f:12.4: BAR 1: failed to assign [mem size 0x00000010]
>  pci 0000:7f:12.4: BAR 3: failed to assign [mem size 0x00000010]
>  pci 0000:7f:12.4: BAR 5: failed to assign [mem size 0x00000010]
>  pci 0000:7f:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
>  pci 0000:7f:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
> 
> There are two issues with commit b894157.
> 
> The first is that there is another device, Home Agent 1 & PCU, that must
> also be quirked in the same way.
> 
> \# lspci -n -s 7f:12.4
> 7f:12.4 0880: 8086:6f60 (rev 01)

I think we should split this into two patches: one to add quirks for
the Home Agent 1 & PCU, and a second for the resource assignment
issue.

Can you dig up a spec for these devices?  I should have asked Andi for
that the first time around, but I didn't.  Maybe there's something
we're not interpreting correctly.  I still have a hard time believing
that Intel would produce a PCI device with non-BAR registers where the
BARs are supposed to be.  Maybe there's supposed to be an EA
capability or something that tells us to ignore these registers.

Can you collect "lspci -vvxxx" output for one of these devices?

> After applying the quirk patch, we end up with:
> 
> pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
> pci 0000:ff:12.4: BAR 6: failed to assign [mem size 0x00000001 pref]
> pci 0000:ff:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
> pci 0000:7f:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
> pci 0000:7f:12.4: BAR 6: failed to assign [mem size 0x00000001 pref]
> pci 0000:7f:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
> 
> which drives us to the second issue.  Since the PCI devices now
> have unnassigned resources (BARs), pcibios_assign_resources()
> call pci_assign_unassigned_root_bus_resources().  This results in the
> messages above.  I have added a non_compliant_bars check in
> pbus_assign_resources_sorted() to avoid the unassigned device's resources
> from being added to the failed resources list for the bus.

I don't understand this part yet.  If we mark a device with
non_compliant_bars, __pci_read_base() will return without doing
anything, so we should not fill in the struct resource at all.  It
wouldn't have the "mem" or "pref" bits shown above, and it shouldn't
participate in pcibios_assign_resources() at all.  All of these are
for BAR 6 (the ROM BAR), so maybe there's something wrong with the way
to handle that in particular.

Can you collect "lspci -vvxxx" output for one of these devices also?

> Successfully tested by me on a three vendor's Broadwell-EP systems which
> no longer show the above false errors messages.
> 
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Myron Stowe <mstowe@redhat.com>
> Cc: x86@kernel.org
> Fixes: b894157 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs")
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> ---
>  arch/x86/pci/fixup.c    |    1 +
>  drivers/pci/setup-bus.c |    6 ++++--
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index b7de192..59a79e6 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -556,5 +556,6 @@ static void pci_bdwep_bar(struct pci_dev *dev)
>  {
>  	dev->non_compliant_bars = 1;
>  }
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6f60, pci_bdwep_bar);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fa0, pci_bdwep_bar);
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fc0, pci_bdwep_bar);
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 55641a3..78b6b69 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -515,8 +515,10 @@ static void pbus_assign_resources_sorted(const struct pci_bus *bus,
>  	struct pci_dev *dev;
>  	LIST_HEAD(head);
>  
> -	list_for_each_entry(dev, &bus->devices, bus_list)
> -		__dev_sort_resources(dev, &head);
> +	list_for_each_entry(dev, &bus->devices, bus_list) {
> +		if (!dev->non_compliant_bars)
> +			__dev_sort_resources(dev, &head);
> +	}
>  
>  	__assign_resources_sorted(&head, realloc_head, fail_head);
>  }
> -- 
> 1.7.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/PCI: Fix Broadwell-EP Home Agent & PCU non-compliant BARs
  2016-05-09 19:20 ` Bjorn Helgaas
@ 2016-05-09 20:34   ` Andi Kleen
  2016-05-10 13:06   ` Prarit Bhargava
  1 sibling, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2016-05-09 20:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Prarit Bhargava, linux-pci, Bjorn Helgaas, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Myron Stowe, x86

> > The first is that there is another device, Home Agent 1 & PCU, that must
> > also be quirked in the same way.
> > 
> > \# lspci -n -s 7f:12.4
> > 7f:12.4 0880: 8086:6f60 (rev 01)

I think I had this in the later versions of my patches.
Perhaps the second ID got lost somewhere when the patches
got changed around. Change looks good to me.

> 
> I think we should split this into two patches: one to add quirks for
> the Home Agent 1 & PCU, and a second for the resource assignment
> issue.
> 
> Can you dig up a spec for these devices?  I should have asked Andi for
> that the first time around, but I didn't.  Maybe there's something
> we're not interpreting correctly.  I still have a hard time believing
> that Intel would produce a PCI device with non-BAR registers where the
> BARs are supposed to be.  Maybe there's supposed to be an EA
> capability or something that tells us to ignore these registers.

It's not a real register, but due to a hardware problem it
still returns non zero on reads.

The issue is documented in the Xeon v4 specification update (but unfortunately
missing the second device ID there)

http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v4-spec-update.html

BDF2
PCI BARs in the Home Agent Will Return Non-Zero Values During 
Enumeration
Problem:
During system initialization the Operating System may access the standard PCI BARs 
(Base Address Registers). Due to this erratum, accesses to the Home Agent BAR 
registers (Bus 1; Device 18; Function 0,4; Offsets 0x14-0x24) will return non-zero 
values.
Implication:
The operating system may issue a warning. Intel has not observed any functional 
failures due to this erratum.
Workaround:
None identified.

-Andi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/PCI: Fix Broadwell-EP Home Agent & PCU non-compliant BARs
  2016-05-09 19:20 ` Bjorn Helgaas
  2016-05-09 20:34   ` Andi Kleen
@ 2016-05-10 13:06   ` Prarit Bhargava
  2016-05-10 17:45     ` Bjorn Helgaas
  1 sibling, 1 reply; 6+ messages in thread
From: Prarit Bhargava @ 2016-05-10 13:06 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Bjorn Helgaas, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Myron Stowe, x86, Andi Kleen



On 05/09/2016 03:20 PM, Bjorn Helgaas wrote:
> [+cc Andi]
> 
> Hi Prarit,
> 
> On Mon, May 09, 2016 at 02:23:15PM -0400, Prarit Bhargava wrote:
>> commit b894157 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having
>> non-compliant BARs") marks Home Agent 0 & PCU has having non-compliant
>> BARs.
> 
> By convention, I use 12-char SHA1 ("[core] abbrev=12" in .git/config)
> when citing commits.
> 
>> Before commit b894157,
>>
>> pci 0000:ff:12.0: BAR 2: failed to assign [mem size 0x00000040]
>> pci 0000:ff:12.0: BAR 4: failed to assign [mem size 0x00000040]
>> pci 0000:ff:12.4: BAR 2: failed to assign [mem size 0x00000040]
>> pci 0000:ff:12.4: BAR 4: failed to assign [mem size 0x00000040]
>> pci 0000:ff:12.0: BAR 1: failed to assign [mem size 0x00000010]
>> pci 0000:ff:12.0: BAR 3: failed to assign [mem size 0x00000010]
>> pci 0000:ff:12.0: BAR 5: failed to assign [mem size 0x00000010]
>> pci 0000:ff:12.4: BAR 1: failed to assign [mem size 0x00000010]
>> pci 0000:ff:12.4: BAR 3: failed to assign [mem size 0x00000010]
>> pci 0000:ff:12.4: BAR 5: failed to assign [mem size 0x00000010]
>> pci 0000:7f:12.0: BAR 2: failed to assign [mem size 0x00000040]
>> pci 0000:7f:12.0: BAR 4: failed to assign [mem size 0x00000040]
>> pci 0000:7f:12.4: BAR 2: failed to assign [mem size 0x00000040]
>> pci 0000:7f:12.4: BAR 4: failed to assign [mem size 0x00000040]
>> pci 0000:7f:12.0: BAR 1: failed to assign [mem size 0x00000010]
>> pci 0000:7f:12.0: BAR 3: failed to assign [mem size 0x00000010]
>> pci 0000:7f:12.0: BAR 5: failed to assign [mem size 0x00000010]
>> pci 0000:7f:12.4: BAR 1: failed to assign [mem size 0x00000010]
>> pci 0000:7f:12.4: BAR 3: failed to assign [mem size 0x00000010]
>> pci 0000:7f:12.4: BAR 5: failed to assign [mem size 0x00000010]
>>
>> After commit b894157, there are still "failed to assign" messages,
>> as well as new "failed to assign" messages for ff:12.0, ff:1e.3,
>> 7f:12.0, and 7f:1e.3.
>>
>>  pci 0000:ff:12.4: BAR 2: failed to assign [mem size 0x00000040]
>>  pci 0000:ff:12.4: BAR 4: failed to assign [mem size 0x00000040]
>>  pci 0000:ff:12.4: BAR 1: failed to assign [mem size 0x00000010]
>>  pci 0000:ff:12.4: BAR 3: failed to assign [mem size 0x00000010]
>>  pci 0000:ff:12.4: BAR 5: failed to assign [mem size 0x00000010]
>>  pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
>>  pci 0000:ff:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
>>  pci 0000:7f:12.4: BAR 2: failed to assign [mem size 0x00000040]
>>  pci 0000:7f:12.4: BAR 4: failed to assign [mem size 0x00000040]
>>  pci 0000:7f:12.4: BAR 1: failed to assign [mem size 0x00000010]
>>  pci 0000:7f:12.4: BAR 3: failed to assign [mem size 0x00000010]
>>  pci 0000:7f:12.4: BAR 5: failed to assign [mem size 0x00000010]
>>  pci 0000:7f:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
>>  pci 0000:7f:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
>>
>> There are two issues with commit b894157.
>>
>> The first is that there is another device, Home Agent 1 & PCU, that must
>> also be quirked in the same way.
>>
>> \# lspci -n -s 7f:12.4
>> 7f:12.4 0880: 8086:6f60 (rev 01)
> 
> I think we should split this into two patches: one to add quirks for
> the Home Agent 1 & PCU, and a second for the resource assignment
> issue.
> 
> Can you dig up a spec for these devices?  I should have asked Andi for
> that the first time around, but I didn't.  Maybe there's something
> we're not interpreting correctly.  I still have a hard time believing
> that Intel would produce a PCI device with non-BAR registers where the
> BARs are supposed to be.  Maybe there's supposed to be an EA
> capability or something that tells us to ignore these registers.

It looks like Andi has provided this information in his reply.  I will add some
of that info in my next post.

> 
> Can you collect "lspci -vvxxx" output for one of these devices?

# lspci -xxxvv -s 7f:12.4
7f:12.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3
v4/Xeon D Home Agent 1 (rev 01)
        Subsystem: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D
Home Agent 1
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
00: 86 80 60 6f 00 00 00 00 01 00 80 08 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 60 6f
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 04 f5 01 00 04 f5 21 00 00 f0 21 00 00 f0 21 00
50: 00 f0 21 00 00 f0 21 00 00 f0 21 00 00 f0 21 00
60: 00 f0 21 00 00 f0 21 00 00 f0 21 00 00 f0 21 00
70: 00 00 02 00 00 00 00 00 00 00 88 44 04 00 00 00
80: 02 82 30 03 88 4a 61 40 00 00 00 00 00 00 00 00
90: 00 00 02 00 00 00 01 00 5c 00 00 00 18 00 00 00
a0: 00 00 00 00 93 97 15 00 0d 34 90 00 00 30 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 0c 00 00 00
c0: 00 00 00 00 00 00 00 00 2d 2d 00 00 f8 e3 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


> 
>> After applying the quirk patch, we end up with:
>>
>> pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
>> pci 0000:ff:12.4: BAR 6: failed to assign [mem size 0x00000001 pref]
>> pci 0000:ff:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
>> pci 0000:7f:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
>> pci 0000:7f:12.4: BAR 6: failed to assign [mem size 0x00000001 pref]
>> pci 0000:7f:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
>>
>> which drives us to the second issue.  Since the PCI devices now
>> have unnassigned resources (BARs), pcibios_assign_resources()
>> call pci_assign_unassigned_root_bus_resources().  This results in the
>> messages above.  I have added a non_compliant_bars check in
>> pbus_assign_resources_sorted() to avoid the unassigned device's resources
>> from being added to the failed resources list for the bus.
> 
> I don't understand this part yet.  If we mark a device with
> non_compliant_bars, __pci_read_base() will return without doing
> anything, so we should not fill in the struct resource at all.  It
> wouldn't have the "mem" or "pref" bits shown above, and it shouldn't
> participate in pcibios_assign_resources() at all.  All of these are
> for BAR 6 (the ROM BAR), so maybe there's something wrong with the way
> to handle that in particular.

I did some additional debugging after reading your comment.  I dumped out the
contents of the resources for each bus's devices.  I concentrated on one
particular device, 7f:12.4 (the output of lspci is above).

Before the call to pcibios_assign_resources(), 7f:12.4 has

pci 0000:7f:12.4: PRARIT:    BAR 6: [mem 0x00000000 pref]

so that means it the resource was not changed/modified in the call of
pcibios_assign_resources().

I took a closer look at the code, and I think I know what the issue is.  In
pci_read_bases() the code does

        if (rom) {
                struct resource *res = &dev->resource[PCI_ROM_RESOURCE];
                dev->rom_base_reg = rom;
                res->flags = IORESOURCE_MEM | IORESOURCE_PREFETCH |
                                IORESOURCE_READONLY | IORESOURCE_CACHEABLE |
                                IORESOURCE_SIZEALIGN;
                __pci_read_base(dev, pci_bar_mem32, res, rom);
        }

which initializes the res->flags field.  This field is later checked in
pcibios_allocate_dev_rom_resource() which is called from
pcibios_assign_resources(), and the resource is declared unassigned because it
has a res->flags field.

Perhaps (sorry for the cut-and-paste) this is a more correct fix which would
avoid the setting of res->flags?  There is no point in setting
dev->rom_base_reg, etc., if this is a non-compliant device.

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2384100..818731a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -361,7 +361,7 @@ static void pci_read_bases(struct pci_dev *dev, unsigned int
                pos += __pci_read_base(dev, pci_bar_unknown, res, reg);
        }

-       if (rom) {
+       if (rom && !dev->non_compliant_bars) {
                struct resource *res = &dev->resource[PCI_ROM_RESOURCE];
                dev->rom_base_reg = rom;
                res->flags = IORESOURCE_MEM | IORESOURCE_PREFETCH |

P.

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/PCI: Fix Broadwell-EP Home Agent & PCU non-compliant BARs
  2016-05-10 13:06   ` Prarit Bhargava
@ 2016-05-10 17:45     ` Bjorn Helgaas
  2016-05-11 10:08       ` Prarit Bhargava
  0 siblings, 1 reply; 6+ messages in thread
From: Bjorn Helgaas @ 2016-05-10 17:45 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-pci, Bjorn Helgaas, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Myron Stowe, x86, Andi Kleen

On Tue, May 10, 2016 at 09:06:29AM -0400, Prarit Bhargava wrote:
> 
> 
> On 05/09/2016 03:20 PM, Bjorn Helgaas wrote:
> > [+cc Andi]
> > 
> > Hi Prarit,
> > 
> > On Mon, May 09, 2016 at 02:23:15PM -0400, Prarit Bhargava wrote:
> >> commit b894157 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having
> >> non-compliant BARs") marks Home Agent 0 & PCU has having non-compliant
> >> BARs.
> > 
> > By convention, I use 12-char SHA1 ("[core] abbrev=12" in .git/config)
> > when citing commits.
> > 
> >> Before commit b894157,
> >>
> >> pci 0000:ff:12.0: BAR 2: failed to assign [mem size 0x00000040]
> >> pci 0000:ff:12.0: BAR 4: failed to assign [mem size 0x00000040]
> >> pci 0000:ff:12.4: BAR 2: failed to assign [mem size 0x00000040]
> >> pci 0000:ff:12.4: BAR 4: failed to assign [mem size 0x00000040]
> >> pci 0000:ff:12.0: BAR 1: failed to assign [mem size 0x00000010]
> >> pci 0000:ff:12.0: BAR 3: failed to assign [mem size 0x00000010]
> >> pci 0000:ff:12.0: BAR 5: failed to assign [mem size 0x00000010]
> >> pci 0000:ff:12.4: BAR 1: failed to assign [mem size 0x00000010]
> >> pci 0000:ff:12.4: BAR 3: failed to assign [mem size 0x00000010]
> >> pci 0000:ff:12.4: BAR 5: failed to assign [mem size 0x00000010]
> >> pci 0000:7f:12.0: BAR 2: failed to assign [mem size 0x00000040]
> >> pci 0000:7f:12.0: BAR 4: failed to assign [mem size 0x00000040]
> >> pci 0000:7f:12.4: BAR 2: failed to assign [mem size 0x00000040]
> >> pci 0000:7f:12.4: BAR 4: failed to assign [mem size 0x00000040]
> >> pci 0000:7f:12.0: BAR 1: failed to assign [mem size 0x00000010]
> >> pci 0000:7f:12.0: BAR 3: failed to assign [mem size 0x00000010]
> >> pci 0000:7f:12.0: BAR 5: failed to assign [mem size 0x00000010]
> >> pci 0000:7f:12.4: BAR 1: failed to assign [mem size 0x00000010]
> >> pci 0000:7f:12.4: BAR 3: failed to assign [mem size 0x00000010]
> >> pci 0000:7f:12.4: BAR 5: failed to assign [mem size 0x00000010]
> >>
> >> After commit b894157, there are still "failed to assign" messages,
> >> as well as new "failed to assign" messages for ff:12.0, ff:1e.3,
> >> 7f:12.0, and 7f:1e.3.
> >>
> >>  pci 0000:ff:12.4: BAR 2: failed to assign [mem size 0x00000040]
> >>  pci 0000:ff:12.4: BAR 4: failed to assign [mem size 0x00000040]
> >>  pci 0000:ff:12.4: BAR 1: failed to assign [mem size 0x00000010]
> >>  pci 0000:ff:12.4: BAR 3: failed to assign [mem size 0x00000010]
> >>  pci 0000:ff:12.4: BAR 5: failed to assign [mem size 0x00000010]
> >>  pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
> >>  pci 0000:ff:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
> >>  pci 0000:7f:12.4: BAR 2: failed to assign [mem size 0x00000040]
> >>  pci 0000:7f:12.4: BAR 4: failed to assign [mem size 0x00000040]
> >>  pci 0000:7f:12.4: BAR 1: failed to assign [mem size 0x00000010]
> >>  pci 0000:7f:12.4: BAR 3: failed to assign [mem size 0x00000010]
> >>  pci 0000:7f:12.4: BAR 5: failed to assign [mem size 0x00000010]
> >>  pci 0000:7f:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
> >>  pci 0000:7f:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
> >>
> >> There are two issues with commit b894157.
> >>
> >> The first is that there is another device, Home Agent 1 & PCU, that must
> >> also be quirked in the same way.
> >>
> >> \# lspci -n -s 7f:12.4
> >> 7f:12.4 0880: 8086:6f60 (rev 01)
> > 
> > I think we should split this into two patches: one to add quirks for
> > the Home Agent 1 & PCU, and a second for the resource assignment
> > issue.
> > 
> > Can you dig up a spec for these devices?  I should have asked Andi for
> > that the first time around, but I didn't.  Maybe there's something
> > we're not interpreting correctly.  I still have a hard time believing
> > that Intel would produce a PCI device with non-BAR registers where the
> > BARs are supposed to be.  Maybe there's supposed to be an EA
> > capability or something that tells us to ignore these registers.
> 
> It looks like Andi has provided this information in his reply.  I will add some
> of that info in my next post.
> 
> > 
> > Can you collect "lspci -vvxxx" output for one of these devices?
> 
> # lspci -xxxvv -s 7f:12.4
> 7f:12.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3
> v4/Xeon D Home Agent 1 (rev 01)
>         Subsystem: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D
> Home Agent 1
>         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
> 00: 86 80 60 6f 00 00 00 00 01 00 80 08 00 00 80 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 60 6f
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 40: 04 f5 01 00 04 f5 21 00 00 f0 21 00 00 f0 21 00
> 50: 00 f0 21 00 00 f0 21 00 00 f0 21 00 00 f0 21 00
> 60: 00 f0 21 00 00 f0 21 00 00 f0 21 00 00 f0 21 00
> 70: 00 00 02 00 00 00 00 00 00 00 88 44 04 00 00 00
> 80: 02 82 30 03 88 4a 61 40 00 00 00 00 00 00 00 00
> 90: 00 00 02 00 00 00 01 00 5c 00 00 00 18 00 00 00
> a0: 00 00 00 00 93 97 15 00 0d 34 90 00 00 30 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 0c 00 00 00
> c0: 00 00 00 00 00 00 00 00 2d 2d 00 00 f8 e3 00 00
> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 
> 
> > 
> >> After applying the quirk patch, we end up with:
> >>
> >> pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
> >> pci 0000:ff:12.4: BAR 6: failed to assign [mem size 0x00000001 pref]
> >> pci 0000:ff:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
> >> pci 0000:7f:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
> >> pci 0000:7f:12.4: BAR 6: failed to assign [mem size 0x00000001 pref]
> >> pci 0000:7f:1e.3: BAR 6: failed to assign [mem size 0x00000001 pref]
> >>
> >> which drives us to the second issue.  Since the PCI devices now
> >> have unnassigned resources (BARs), pcibios_assign_resources()
> >> call pci_assign_unassigned_root_bus_resources().  This results in the
> >> messages above.  I have added a non_compliant_bars check in
> >> pbus_assign_resources_sorted() to avoid the unassigned device's resources
> >> from being added to the failed resources list for the bus.
> > 
> > I don't understand this part yet.  If we mark a device with
> > non_compliant_bars, __pci_read_base() will return without doing
> > anything, so we should not fill in the struct resource at all.  It
> > wouldn't have the "mem" or "pref" bits shown above, and it shouldn't
> > participate in pcibios_assign_resources() at all.  All of these are
> > for BAR 6 (the ROM BAR), so maybe there's something wrong with the way
> > to handle that in particular.
> 
> I did some additional debugging after reading your comment.  I dumped out the
> contents of the resources for each bus's devices.  I concentrated on one
> particular device, 7f:12.4 (the output of lspci is above).
> 
> Before the call to pcibios_assign_resources(), 7f:12.4 has
> 
> pci 0000:7f:12.4: PRARIT:    BAR 6: [mem 0x00000000 pref]
> 
> so that means it the resource was not changed/modified in the call of
> pcibios_assign_resources().
> 
> I took a closer look at the code, and I think I know what the issue is.  In
> pci_read_bases() the code does
> 
>         if (rom) {
>                 struct resource *res = &dev->resource[PCI_ROM_RESOURCE];
>                 dev->rom_base_reg = rom;
>                 res->flags = IORESOURCE_MEM | IORESOURCE_PREFETCH |
>                                 IORESOURCE_READONLY | IORESOURCE_CACHEABLE |
>                                 IORESOURCE_SIZEALIGN;
>                 __pci_read_base(dev, pci_bar_mem32, res, rom);
>         }
> 
> which initializes the res->flags field.  This field is later checked in
> pcibios_allocate_dev_rom_resource() which is called from
> pcibios_assign_resources(), and the resource is declared unassigned because it
> has a res->flags field.
> 
> Perhaps (sorry for the cut-and-paste) this is a more correct fix which would
> avoid the setting of res->flags?  There is no point in setting
> dev->rom_base_reg, etc., if this is a non-compliant device.

Right, thanks for the analysis.  I think this is a much better fix. We
should not set any flags in this case.

> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 2384100..818731a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -361,7 +361,7 @@ static void pci_read_bases(struct pci_dev *dev, unsigned int
>                 pos += __pci_read_base(dev, pci_bar_unknown, res, reg);
>         }
> 
> -       if (rom) {
> +       if (rom && !dev->non_compliant_bars) {

What if we just checked dev->non_compliant_bars as the very first
thing in pci_read_bases()?  I think I originally put the check in
__pci_read_base() because we considered doing this on a per-BAR basis.
But I later decided that was overkill.

I could be convinced either way, but if we put the check in
pci_read_bases(), I think we could probably drop the one in
__pci_read_base().  We do call __pci_read_base() directly from
sriov_init(), but we don't have any issues with BARs in SR-IOV
capabilities yet, so we probably don't need to worry about that path.

>                 struct resource *res = &dev->resource[PCI_ROM_RESOURCE];
>                 dev->rom_base_reg = rom;
>                 res->flags = IORESOURCE_MEM | IORESOURCE_PREFETCH |
> 
> P.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/PCI: Fix Broadwell-EP Home Agent & PCU non-compliant BARs
  2016-05-10 17:45     ` Bjorn Helgaas
@ 2016-05-11 10:08       ` Prarit Bhargava
  0 siblings, 0 replies; 6+ messages in thread
From: Prarit Bhargava @ 2016-05-11 10:08 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Bjorn Helgaas, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Myron Stowe, x86, Andi Kleen



On 05/10/2016 01:45 PM, Bjorn Helgaas wrote:
> On Tue, May 10, 2016 at 09:06:29AM -0400, Prarit Bhargava wrote:

<snip output of lspci>

>>>>
>>>> which drives us to the second issue.  Since the PCI devices now
>>>> have unnassigned resources (BARs), pcibios_assign_resources()
>>>> call pci_assign_unassigned_root_bus_resources().  This results in the
>>>> messages above.  I have added a non_compliant_bars check in
>>>> pbus_assign_resources_sorted() to avoid the unassigned device's resources
>>>> from being added to the failed resources list for the bus.
>>>
>>> I don't understand this part yet.  If we mark a device with
>>> non_compliant_bars, __pci_read_base() will return without doing
>>> anything, so we should not fill in the struct resource at all.  It
>>> wouldn't have the "mem" or "pref" bits shown above, and it shouldn't
>>> participate in pcibios_assign_resources() at all.  All of these are
>>> for BAR 6 (the ROM BAR), so maybe there's something wrong with the way
>>> to handle that in particular.
>>
>> I did some additional debugging after reading your comment.  I dumped out the
>> contents of the resources for each bus's devices.  I concentrated on one
>> particular device, 7f:12.4 (the output of lspci is above).
>>
>> Before the call to pcibios_assign_resources(), 7f:12.4 has
>>
>> pci 0000:7f:12.4: PRARIT:    BAR 6: [mem 0x00000000 pref]
>>
>> so that means it the resource was not changed/modified in the call of
>> pcibios_assign_resources().
>>
>> I took a closer look at the code, and I think I know what the issue is.  In
>> pci_read_bases() the code does
>>
>>         if (rom) {
>>                 struct resource *res = &dev->resource[PCI_ROM_RESOURCE];
>>                 dev->rom_base_reg = rom;
>>                 res->flags = IORESOURCE_MEM | IORESOURCE_PREFETCH |
>>                                 IORESOURCE_READONLY | IORESOURCE_CACHEABLE |
>>                                 IORESOURCE_SIZEALIGN;
>>                 __pci_read_base(dev, pci_bar_mem32, res, rom);
>>         }
>>
>> which initializes the res->flags field.  This field is later checked in
>> pcibios_allocate_dev_rom_resource() which is called from
>> pcibios_assign_resources(), and the resource is declared unassigned because it
>> has a res->flags field.
>>
>> Perhaps (sorry for the cut-and-paste) this is a more correct fix which would
>> avoid the setting of res->flags?  There is no point in setting
>> dev->rom_base_reg, etc., if this is a non-compliant device.
> 
> Right, thanks for the analysis.  I think this is a much better fix. We
> should not set any flags in this case.
> 
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 2384100..818731a 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -361,7 +361,7 @@ static void pci_read_bases(struct pci_dev *dev, unsigned int
>>                 pos += __pci_read_base(dev, pci_bar_unknown, res, reg);
>>         }
>>
>> -       if (rom) {
>> +       if (rom && !dev->non_compliant_bars) {
> 
> What if we just checked dev->non_compliant_bars as the very first
> thing in pci_read_bases()?  I think I originally put the check in
> __pci_read_base() because we considered doing this on a per-BAR basis.
> But I later decided that was overkill.

Okay -- FWIW, I found it odd that the field was plural when its action appeared
to be on a single BAR.

> 
> I could be convinced either way, but if we put the check in
> pci_read_bases(), I think we could probably drop the one in
> __pci_read_base().  We do call __pci_read_base() directly from
> sriov_init(), but we don't have any issues with BARs in SR-IOV
> capabilities yet, so we probably don't need to worry about that path.

I have tested on BDW-EP systems and will post new patches shortly.

P.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-05-11 10:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-09 18:23 [PATCH] x86/PCI: Fix Broadwell-EP Home Agent & PCU non-compliant BARs Prarit Bhargava
2016-05-09 19:20 ` Bjorn Helgaas
2016-05-09 20:34   ` Andi Kleen
2016-05-10 13:06   ` Prarit Bhargava
2016-05-10 17:45     ` Bjorn Helgaas
2016-05-11 10:08       ` Prarit Bhargava

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.