All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] PCI: Unassigned Expansion ROM BARs
@ 2015-09-24  2:47 Myron Stowe
  2015-09-24  3:21 ` Yinghai Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Myron Stowe @ 2015-09-24  2:47 UTC (permalink / raw)
  To: linux-pci; +Cc: LKML, Bjorn Helgaas, Yinghai Lu, Alex Williamson

I've encountered numerous bugzilla reports related to platform BIOS' not
programming valid values into a PCI device's Type 0 Configuration space
"Expansion ROM Base Address" field (a.k.a. Expansion ROM BAR).  The main
observed consequence being 'dmesg' entries like the following that get
customers excited enough to file reports against the kernel.

  pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]:
    no compatible bridge window
  pci 0000:04:03.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]:
    no compatible bridge window

After I've provided an analysis similar to [1] the respective BIOS response
(teams from two of the major vendors) is typically:
   "The OS has no business touching the Expansion ROM BARs and it
    provides no value to the equation here.  The Expansion ROM BAR
    is only useful in pre-boot for the BIOS to get boot code from
    a device."

This scenario has occurred enough times now that I'd like to attempt to
"raise the bar" and invite a technically merit based discussion concerning
this topic - via a public forum that is archived and provides a source of
reference for use upon future occurrences - and see if a consensus can be
reached between the various vendor's BIOS engineers and kernel engineers.


A little more background context -

The kernel expects device Expansion ROM BARs to be programmed with valid
values - even if the respective Expansion ROM's Enable bit is 0 (i.e. the
device’s expansion ROM address space is disabled).  This seems to be the
main contention point with said BIOS engineers.  If an Expansion ROM BAR is
not programmed, the kernel will attempt to find available resources and, if
successful, program it.  As this occurs various 'dmesg' entries
related to kernel's actions are output.

Note that for devices that share decoders between the Expansion ROM BAR and
other BARs the firmware (probably) should not enable the Expansion ROM BAR
at hand-off to the operating system (see the last paragraph of the PCI
Firmware Specification, Rev 3.2, Section 3.5 "Device State at
Firmware/Operating System Handoff").

There is a kernel boot parameter, pci=norom, that is intended to disable the
kernel's resource assignment actions for Expansion ROMs that do not already
have BIOS assigned address ranges.  Note however, if I remember correctly,
that this only works if the Expansion ROM BAR is set to "0" by the BIOS
before hand-off.


I've opened https://bugzilla.kernel.org/show_bug.cgi?id=104931 and attached
the full 'dmesg' that exhibits a typical occurrence as an example.  I'd like
to use the bugzilla to archive any discussion that takes place.   I'll copy all
relevant discussion that takes place here into the bugzilla as "Additional
Comments".

Please continue with this thread, adding your views in these regards.  Citing's
from pertinent specifications that back up your position would be appreciated.

Thanks,
 Myron



[1]  Annotated 'dmesg' log concerning Expansion ROM BARs not setup by BIOS

The "can't claim" messages of interest are:
  pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]:
  no compatible bridge window
  pci 0000:04:03.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]:
  no compatible bridge window

The PCI devices of interest are a device at PCI Bus 1, Device 0, Function
0 (01:00.0) and another device at PCI Bus 4, Device 3, Function 0 (04:03.0).

The "root bridge" that leads to PCI buses 1 and 4 - the buses of interest -
is "PCI0" and its I/O Port space and Memory Mapped I/O (MMIO) space are:

 ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
 PCI host bridge to bus 0000:00
 pci_bus 0000:00: root bus resource [bus 00-fe]
 pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
 pci_bus 0000:00: root bus resource [io  0x0d00-0xffff]
 pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
 pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfeafffff]

It's helpful to gather up all the resource related information pertaining to
the devices of interest in one place.  Concentrating on the PCI-to-PCI
bridges and individual PCI devices that lead to 01:00.0, the first device
exhibiting the "can't claim" message (everything that is consuming resources
on PCI bus 0 and PCI bus 1):

    pci 0000:00:1a.0: [8086:1c2d] type 00 class 0x0c0320
    pci 0000:00:1a.0: reg 0x10: [mem 0xc1305000-0xc13053ff]

    pci 0000:00:1d.0: [8086:1c26] type 00 class 0x0c0320
    pci 0000:00:1d.0: reg 0x10: [mem 0xc1304000-0xc13043ff]

    pci 0000:00:1f.2: [8086:1c00] type 00 class 0x01018f
    pci 0000:00:1f.2: reg 0x10: [io  0x3078-0x307f]
    pci 0000:00:1f.2: reg 0x14: [io  0x308c-0x308f]
    pci 0000:00:1f.2: reg 0x18: [io  0x3070-0x3077]
    pci 0000:00:1f.2: reg 0x1c: [io  0x3088-0x308b]
    pci 0000:00:1f.2: reg 0x20: [io  0x3050-0x305f]
    pci 0000:00:1f.2: reg 0x24: [io  0x3040-0x304f]

    pci 0000:00:1f.3: [8086:1c22] type 00 class 0x0c0500
    pci 0000:00:1f.3: reg 0x10: [mem 0xc1302000-0xc13020ff 64bit]
    pci 0000:00:1f.3: reg 0x20: [io  0x3000-0x301f]

    pci 0000:00:1f.5: [8086:1c08] type 00 class 0x010185
    pci 0000:00:1f.5: reg 0x10: [io  0x3068-0x306f]
    pci 0000:00:1f.5: reg 0x14: [io  0x3084-0x3087]
    pci 0000:00:1f.5: reg 0x18: [io  0x3060-0x3067]
    pci 0000:00:1f.5: reg 0x1c: [io  0x3080-0x3083]
    pci 0000:00:1f.5: reg 0x20: [io  0x3030-0x303f]
    pci 0000:00:1f.5: reg 0x24: [io  0x3020-0x302f]

  pci 0000:00:01.0: PCI bridge to [bus 01]
  pci 0000:00:01.0:   bridge window [io  0x2000-0x2fff]
  pci 0000:00:01.0:   bridge window [mem 0xc1200000-0xc12fffff]

    pci 0000:01:00.0: [1000:0072] type 00 class 0x010700
      [1000:0072] - LSI (Symbios) Logic : SAS2008 PCIe Fusion-MPT SAS-2
    pci 0000:01:00.0: reg 0x10: [io  0x2000-0x20ff]
    pci 0000:01:00.0: reg 0x14: [mem 0xc1240000-0xc124ffff 64bit]
    pci 0000:01:00.0: reg 0x1c: [mem 0xc1200000-0xc123ffff 64bit]
x   pci 0000:01:00.0: reg 0x30: [mem 0xfff00000-0xffffffff pref]


The PCI-to-PCI bridge device for Bus 0 to Bus 1 only has one memory space
apeture ("bridge window") active - [mem 0xc1200000-0xc12fffff].  This must
be a subset of one of the "root bus" memory resources and looking at those,
it is.

The target device - 01:00.0; an 'mpt2sas' device - consumes three memory
ranges.  These correspond to the device's BAR 1 and 2 (64 bit addresses
consume two BAR registers), BAR 3 and 4, and the "Expansion ROM Base Address
(a.k.a. BAR 6).  These also must be a subset of both the corresponding root
bus resources and all PCI-to-PCI bridge devices in the PCI hiearchy leading
to the device itself.  Looking at them we see that the first two satisfy the
requirement but the third - [mem 0xfff00000-0xffffffff pref] - does not!
It's because the "Expansion ROM Base Address" register (a.k.a. BAR 6) does
not adhear to the subset requirement(s) that the kernel later outputs:

 pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff00000-0xffffffff pref]:
 no compatible bridge window
 pci 0000:01:00.0: BAR 6: no space for [mem size 0x00100000 pref]
 pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00100000 pref]

The "can't claim" message is the kernel alerting us that the BIOS has not
correctly set up resources that fulfill all the requirements (subset,
alignment, type, ...).

The kernel then "sizes" the BAR to see how much address space that BAR
requires - in this case we see BAR 6 of 01:00.0 needs 1 MB of contiguous
space - and subsequently tries to work around the BIOS' failure, attempting
to find available, currently unused, resource space that meets all the
requirements which is where the "no space for" message comes from.

There is no contiguous space that meets all the requirements (subset,
alignment, type, ...) available which is fairly easy to see here; there was
only a 1 MB memory aperture provided by the PCI-to-PCI bridge device to
begin with and the 01:00.0 device consumed subsets of that for BARs 1 and 3
so there is no way 1 MB remains free to satisfy BAR 6's needs.  And so the
kernel outputs the "failed to assign" message.


In a very similar scenario, the 04:03.0 device also has not been properly
set up by BIOS ([mem 0xffff0000-0xffffffff pref]).  The difference in this
case is that there were enough available resources left to satisfy all the
subset, alignment, type, ..., requirements and thus the kernel was able to
allocate from such and re-program the device's BAR 6 ([mem
0xc1010000-0xc101ffff pref]) so that the device can function correctly.

  pci 0000:00:1e.0: PCI bridge to [bus 04] (subtractive decode)
  pci 0000:00:1e.0:   bridge window [mem 0xc0800000-0xc10fffff]
  pci 0000:00:1e.0:   bridge window [mem 0xc0000000-0xc07fffff 64bit pref]
  pci 0000:00:1e.0:   bridge window [io  0x0000-0x0cf7] (subtractive decode)
  pci 0000:00:1e.0:   bridge window [io  0x0d00-0xffff] (subtractive decode)
  pci 0000:00:1e.0:   bridge window [mem 0x000a0000-0x000bffff] (subtract d)
  pci 0000:00:1e.0:   bridge window [mem 0xc0000000-0xfeafffff] (subtract d)

    pci 0000:04:03.0: [102b:0532] type 00 class 0x030000
      [102b:0532] - Matrox : MGS G200eW WPCM450 (Graphics)
    pci 0000:04:03.0: reg 0x10: [mem 0xc0000000-0xc07fffff pref]
    pci 0000:04:03.0: reg 0x14: [mem 0xc1000000-0xc1003fff]
    pci 0000:04:03.0: reg 0x18: [mem 0xc0800000-0xc0ffffff]
x   pci 0000:04:03.0: reg 0x30: [mem 0xffff0000-0xffffffff pref]

 pci 0000:04:03.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]:
 no compatible bridge window
 pci 0000:04:03.0: BAR 6: assigned [mem 0xc1010000-0xc101ffff pref]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
  2015-09-24  2:47 [RFC] PCI: Unassigned Expansion ROM BARs Myron Stowe
@ 2015-09-24  3:21 ` Yinghai Lu
  2015-09-24  3:58   ` Alex Williamson
  2015-09-24 17:06   ` Myron Stowe
  2015-09-27 19:29 ` Myron Stowe
  2015-09-27 20:19 ` Myron Stowe
  2 siblings, 2 replies; 15+ messages in thread
From: Yinghai Lu @ 2015-09-24  3:21 UTC (permalink / raw)
  To: Myron Stowe; +Cc: linux-pci, LKML, Bjorn Helgaas, Alex Williamson

On Wed, Sep 23, 2015 at 7:47 PM, Myron Stowe <myron.stowe@gmail.com> wrote:
>
> The kernel expects device Expansion ROM BARs to be programmed with valid
> values - even if the respective Expansion ROM's Enable bit is 0 (i.e. the
> device’s expansion ROM address space is disabled).  This seems to be the
> main contention point with said BIOS engineers.  If an Expansion ROM BAR is
> not programmed, the kernel will attempt to find available resources and, if
> successful, program it.  As this occurs various 'dmesg' entries
> related to kernel's actions are output.
...
> There is a kernel boot parameter, pci=norom, that is intended to disable the
> kernel's resource assignment actions for Expansion ROMs that do not already
> have BIOS assigned address ranges.  Note however, if I remember correctly,
> that this only works if the Expansion ROM BAR is set to "0" by the BIOS
> before hand-off.

option rom is used by legacy bios to enable booting from external device.
usually BIOS call the option rom, so the firmware will be loaded to
add on cards.
and firmware get started.
Also option rom would include tools that is used to configure behavior of cards
like add/remove raid.

Also there is some use case that kernel driver try to get some parameters from
BIOS. like intel soft raid ? --- bad practice !

I would like to treat option rom BAR as optional resources during
resource allocation.

https://git.kernel.org/cgit/linux/kernel/git/yinghai/linux-yinghai.git/patch/?id=7f689da33302e4871fd18aee2c19abb5e3ea5261

Subject: PCI: Treat ROM resource as optional during realloc

Current on realloc path, we just ignore ROM resource if we can not assign
them in first try.

Treat ROM resources as optional resources,so try to allocate them together
with required ones, if can not assign them, could go with other required
resources only, and try to allocate them second time in expand path.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
  2015-09-24  3:21 ` Yinghai Lu
@ 2015-09-24  3:58   ` Alex Williamson
  2015-09-24 17:06   ` Myron Stowe
  1 sibling, 0 replies; 15+ messages in thread
From: Alex Williamson @ 2015-09-24  3:58 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Myron Stowe, linux-pci, LKML, Bjorn Helgaas

On Wed, 2015-09-23 at 20:21 -0700, Yinghai Lu wrote:
> On Wed, Sep 23, 2015 at 7:47 PM, Myron Stowe <myron.stowe@gmail.com> wrote:
> >
> > The kernel expects device Expansion ROM BARs to be programmed with valid
> > values - even if the respective Expansion ROM's Enable bit is 0 (i.e. the
> > device’s expansion ROM address space is disabled).  This seems to be the
> > main contention point with said BIOS engineers.  If an Expansion ROM BAR is
> > not programmed, the kernel will attempt to find available resources and, if
> > successful, program it.  As this occurs various 'dmesg' entries
> > related to kernel's actions are output.
> ...
> > There is a kernel boot parameter, pci=norom, that is intended to disable the
> > kernel's resource assignment actions for Expansion ROMs that do not already
> > have BIOS assigned address ranges.  Note however, if I remember correctly,
> > that this only works if the Expansion ROM BAR is set to "0" by the BIOS
> > before hand-off.
> 
> option rom is used by legacy bios to enable booting from external device.
> usually BIOS call the option rom, so the firmware will be loaded to
> add on cards.
> and firmware get started.
> Also option rom would include tools that is used to configure behavior of cards
> like add/remove raid.
> 
> Also there is some use case that kernel driver try to get some parameters from
> BIOS. like intel soft raid ? --- bad practice !
> 
> I would like to treat option rom BAR as optional resources during
> resource allocation.
> 
> https://git.kernel.org/cgit/linux/kernel/git/yinghai/linux-yinghai.git/patch/?id=7f689da33302e4871fd18aee2c19abb5e3ea5261
> 
> Subject: PCI: Treat ROM resource as optional during realloc
> 
> Current on realloc path, we just ignore ROM resource if we can not assign
> them in first try.
> 
> Treat ROM resources as optional resources,so try to allocate them together
> with required ones, if can not assign them, could go with other required
> resources only, and try to allocate them second time in expand path.

Don't forget that the physical system boot may not be the only "boot" of
the PCI device.  We can assign a PCI device to a VM running on top of
the bare-metal OS, at which point the option ROM of the device may be
re-executed and the device re-initialized by the VM BIOS.  A BIOS
engineer that argues that the option ROM is unnecessary after bare-metal
BIOS boot is completely disregarding this use case.  We do have ways to
make this be a soft requirement, we can pass the option ROM as a file to
the VM, but we need to be able to rip the option ROM from the device in
order to do that, likely from a better behaved platform wrt option ROM
mapping.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
  2015-09-24  3:21 ` Yinghai Lu
  2015-09-24  3:58   ` Alex Williamson
@ 2015-09-24 17:06   ` Myron Stowe
  2015-09-24 19:01     ` Yinghai Lu
  1 sibling, 1 reply; 15+ messages in thread
From: Myron Stowe @ 2015-09-24 17:06 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci, LKML, Bjorn Helgaas, Alex Williamson

On Wed, Sep 23, 2015 at 9:21 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, Sep 23, 2015 at 7:47 PM, Myron Stowe <myron.stowe@gmail.com> wrote:
>>
>> The kernel expects device Expansion ROM BARs to be programmed with valid
>> values - even if the respective Expansion ROM's Enable bit is 0 (i.e. the
>> device’s expansion ROM address space is disabled).  This seems to be the
>> main contention point with said BIOS engineers.  If an Expansion ROM BAR is
>> not programmed, the kernel will attempt to find available resources and, if
>> successful, program it.  As this occurs various 'dmesg' entries
>> related to kernel's actions are output.
> ...
>> There is a kernel boot parameter, pci=norom, that is intended to disable the
>> kernel's resource assignment actions for Expansion ROMs that do not already
>> have BIOS assigned address ranges.  Note however, if I remember correctly,
>> that this only works if the Expansion ROM BAR is set to "0" by the BIOS
>> before hand-off.
>
> option rom is used by legacy bios to enable booting from external device.
> usually BIOS call the option rom, so the firmware will be loaded to
> add on cards.
> and firmware get started.
> Also option rom would include tools that is used to configure behavior of cards
> like add/remove raid.

I'm not sure what you are getting at here but yes, there are use cases
where the BIOS
needs to access the Expansion ROM, one of the more common being PXE booting from
a NIC device where the PXE boot content is retrieved from the ROM, but
that has little,
if anything, to do with what I'm after here.

The BIOS engineers are expressing that the kernel should *never* need
to access the Expansion
ROM, and thus should *never* try to allocate resources for these BARs
and program them
to sane address range values.


I know you work with bringing up new hardware.  So picture yourself
sitting with some
members from your platform's BIOS team.  They tell you: "The OS is
incorrect in thinking
it needs to find, and program, sane resource range values into a
device's Expansion ROM
BAR.  We (the BIOS) hand-off the platform with these disabled, thus
whatever values are in
the ROMs BAR should be totally ignored, and the OS should never touch
them."  What would you
reply with to them in an attempt to show that your position (i.e. the
kernel finding, and programming
values under these circumstances) is correct and that the BIOS opinion
is in-correct?  That is
what I'm after.

>
> Also there is some use case that kernel driver try to get some parameters from
> BIOS. like intel soft raid ? --- bad practice !

Again, your replies are so terse I have no idea what you are saying; it's
undecipherable!  Are you indicating that you agree with the BIOS
engineers views?

>
> I would like to treat option rom BAR as optional resources during
> resource allocation.
>
> https://git.kernel.org/cgit/linux/kernel/git/yinghai/linux-yinghai.git/patch/?id=7f689da33302e4871fd18aee2c19abb5e3ea5261
>
> Subject: PCI: Treat ROM resource as optional during realloc
>
> Current on realloc path, we just ignore ROM resource if we can not assign
> them in first try.
>
> Treat ROM resources as optional resources,so try to allocate them together
> with required ones, if can not assign them, could go with other required
> resources only, and try to allocate them second time in expand path.

Yes, while they may have lower priority in obtaining resources, your still
attempting to do so initially.  The BIOS engineers seem to believe that this is
incorrect - the OS should not even attempt to allocate them in the first try.

So, which side are you on and can you support your view with some
technical based
argument (and any references from the specifications)?

Please take some time and respond with some thought out explanations
and opinions.
I value your opinion because I have seen your work but your terse
replies are going to
do nothing what so ever in trying to convince BIOS engineers that the
OS should, or needs
to, access such.  Otherwise: "Why are we (the kernel) allocating
resources for them?"

Myron

>
> Thanks
>
> Yinghai

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
  2015-09-24 17:06   ` Myron Stowe
@ 2015-09-24 19:01     ` Yinghai Lu
  2015-09-25  4:35       ` Yinghai Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Yinghai Lu @ 2015-09-24 19:01 UTC (permalink / raw)
  To: Myron Stowe; +Cc: linux-pci, LKML, Bjorn Helgaas, Alex Williamson

On Thu, Sep 24, 2015 at 10:06 AM, Myron Stowe <myron.stowe@gmail.com> wrote:
> On Wed, Sep 23, 2015 at 9:21 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Wed, Sep 23, 2015 at 7:47 PM, Myron Stowe <myron.stowe@gmail.com> wrote:
>>>
>>> The kernel expects device Expansion ROM BARs to be programmed with valid
>>> values - even if the respective Expansion ROM's Enable bit is 0 (i.e. the
>>> device’s expansion ROM address space is disabled).  This seems to be the
>>> main contention point with said BIOS engineers.  If an Expansion ROM BAR is
>>> not programmed, the kernel will attempt to find available resources and, if
>>> successful, program it.  As this occurs various 'dmesg' entries
>>> related to kernel's actions are output.
>> ...
>>> There is a kernel boot parameter, pci=norom, that is intended to disable the
>>> kernel's resource assignment actions for Expansion ROMs that do not already
>>> have BIOS assigned address ranges.  Note however, if I remember correctly,
>>> that this only works if the Expansion ROM BAR is set to "0" by the BIOS
>>> before hand-off.
>>
>> option rom is used by legacy bios to enable booting from external device.
>> usually BIOS call the option rom, so the firmware will be loaded to
>> add on cards.
>> and firmware get started.
>> Also option rom would include tools that is used to configure behavior of cards
>> like add/remove raid.
>
> I'm not sure what you are getting at here but yes, there are use cases
> where the BIOS
> needs to access the Expansion ROM, one of the more common being PXE booting from
> a NIC device where the PXE boot content is retrieved from the ROM, but
> that has little,
> if anything, to do with what I'm after here.
>
> The BIOS engineers are expressing that the kernel should *never* need
> to access the Expansion
> ROM, and thus should *never* try to allocate resources for these BARs
> and program them
> to sane address range values.

They are wrong.
It still depends on how addon card firmware and drivers
from OS to use it.

>
>
> I know you work with bringing up new hardware.  So picture yourself
> sitting with some
> members from your platform's BIOS team.  They tell you: "The OS is
> incorrect in thinking
> it needs to find, and program, sane resource range values into a
> device's Expansion ROM
> BAR.  We (the BIOS) hand-off the platform with these disabled, thus
> whatever values are in
> the ROMs BAR should be totally ignored, and the OS should never touch
> them."  What would you
> reply with to them in an attempt to show that your position (i.e. the
> kernel finding, and programming
> values under these circumstances) is correct and that the BIOS opinion
> is in-correct?  That is
> what I'm after.

Some addon cards drivers are use ROM bar to talk to card.
like  Intel DC21285 driver in drivers/mtd/maps/pci.c

>
>>
>> Also there is some use case that kernel driver try to get some parameters from
>> BIOS. like intel soft raid ? --- bad practice !
>
> Again, your replies are so terse I have no idea what you are saying; it's
> undecipherable!  Are you indicating that you agree with the BIOS
> engineers views?

No.

Just some addon card/drivers would avoid accessing expand rom to get
parameter or settings.



>
>>
>> I would like to treat option rom BAR as optional resources during
>> resource allocation.
>>
>> https://git.kernel.org/cgit/linux/kernel/git/yinghai/linux-yinghai.git/patch/?id=7f689da33302e4871fd18aee2c19abb5e3ea5261
>>
>> Subject: PCI: Treat ROM resource as optional during realloc
>>
>> Current on realloc path, we just ignore ROM resource if we can not assign
>> them in first try.
>>
>> Treat ROM resources as optional resources,so try to allocate them together
>> with required ones, if can not assign them, could go with other required
>> resources only, and try to allocate them second time in expand path.
>
> Yes, while they may have lower priority in obtaining resources, your still
> attempting to do so initially.  The BIOS engineers seem to believe that this is
> incorrect - the OS should not even attempt to allocate them in the first try.

Some drivers may use it, and we don't know who is really need them.
so just give it a try.

Or do we want to keep a white list to say which device should have
ROM bar as mush have, and other is optional to have ?

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
  2015-09-24 19:01     ` Yinghai Lu
@ 2015-09-25  4:35       ` Yinghai Lu
  2015-09-25 13:31         ` Myron Stowe
  2015-09-25 14:35         ` Bjorn Helgaas
  0 siblings, 2 replies; 15+ messages in thread
From: Yinghai Lu @ 2015-09-25  4:35 UTC (permalink / raw)
  To: Myron Stowe; +Cc: linux-pci, LKML, Bjorn Helgaas, Alex Williamson

On Thu, Sep 24, 2015 at 12:01 PM, Yinghai Lu <yinghai@kernel.org> wrote:

> Or do we want to keep a white list to say which device should have
> ROM bar as mush have, and other is optional to have ?

Subject: [RFC PATCH] PCI: Add pci_dev_need_rom_bar()

Only set that for
1. if BIOS/firmware already set ROM bar.
2. via quirks for some devices.

We assign those needed ROM bar as required
and other ROM bars as optional resources.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/pci/i386.c        |    9 +++++-
 drivers/dma/pch_dma.c      |    1
 drivers/gpio/gpio-ml-ioh.c |    2 -
 drivers/misc/pch_phub.c    |   12 --------
 drivers/pci/probe.c        |    7 +++++
 drivers/pci/quirks.c       |   63 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/setup-bus.c    |   18 ++++++++++--
 include/linux/pci.h        |   13 +++++++++
 include/linux/pci_ids.h    |    7 +++++
 9 files changed, 112 insertions(+), 20 deletions(-)

Index: linux-2.6/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/i386.c
+++ linux-2.6/arch/x86/pci/i386.c
@@ -377,11 +377,16 @@ static void pcibios_allocate_rom_resourc
     }
 }

+bool pci_assign_roms(void)
+{
+    return !!(pci_probe & PCI_ASSIGN_ROMS);
+}
+
 static int __init pcibios_assign_resources(void)
 {
     struct pci_host_bridge *host_bridge = NULL;

-    if (!(pci_probe & PCI_ASSIGN_ROMS))
+    if (!pci_assign_roms())
         for_each_pci_host_bridge(host_bridge)
             pcibios_allocate_rom_resources(host_bridge->bus);

@@ -406,7 +411,7 @@ void pcibios_resource_survey_bus(struct
     pcibios_allocate_resources(bus, 0);
     pcibios_allocate_resources(bus, 1);

-    if (!(pci_probe & PCI_ASSIGN_ROMS))
+    if (!pci_assign_roms())
         pcibios_allocate_rom_resources(bus);
 }

Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -224,6 +224,13 @@ int __pci_read_base(struct pci_dev *dev,
         l64 = l & PCI_ROM_ADDRESS_MASK;
         sz64 = sz & PCI_ROM_ADDRESS_MASK;
         mask64 = (u32)PCI_ROM_ADDRESS_MASK;
+        /* simple validation */
+        if (l64 && sz64 &&
+            (l64 & 0xff000000) != 0xff000000 &&
+            system_state == SYSTEM_BOOTING) {
+            dev_printk(KERN_DEBUG, &dev->dev, "set dev_flags NEED_ROM_BAR\n");
+            pci_dev_set_need_rom_bar(dev);
+        }
     }

     if (res->flags & IORESOURCE_MEM_64) {
Index: linux-2.6/drivers/pci/quirks.c
===================================================================
--- linux-2.6.orig/drivers/pci/quirks.c
+++ linux-2.6/drivers/pci/quirks.c
@@ -4197,3 +4197,66 @@ static void quirk_intel_qat_vf_cap(struc
     }
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
+
+/* from drivers/mtd/maps/pci.c */
+static void quirk_set_need_rom_bar(struct pci_dev *pdev)
+{
+    if (!pci_dev_need_rom_bar(pdev)) {
+        dev_printk(KERN_DEBUG, &pdev->dev, "set dev_flags NEED_ROM_BAR\n");
+        pci_dev_set_need_rom_bar(pdev);
+    }
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_DEC, PCI_DEVICE_ID_DEC_21285,
+             quirk_set_need_rom_bar);
+
+#ifdef CONFIG_PARISC
+/* from drivers/video/console/sticore.c */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_EG,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_FX6,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_FX4,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_FX2,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_FXE,
+             quirk_set_need_rom_bar);
+#endif
+
+/* from drivers/misc/pch_phub.c */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_PCH1_PHUB,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ROHM, PCI_DEVICE_ID_ROHM_ML7213_PHUB,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ROHM, PCI_DEVICE_ID_ROHM_ML7223_mPHUB,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ROHM, PCI_DEVICE_ID_ROHM_ML7223_nPHUB,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ROHM, PCI_DEVICE_ID_ROHM_ML7831_PHUB,
+             quirk_set_need_rom_bar);
+
+/* from drivers/net/ethernet/sun/sungem.c */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_GEM,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_RIO_GEM,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_UNI_N_GMAC,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_UNI_N_GMACP,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_UNI_N_GMAC2,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_K2_GMAC,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_SH_SUNGEM,
+             quirk_set_need_rom_bar);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_IPID2_GMAC,
+             quirk_set_need_rom_bar);
+
+/* from drivers/net/ethernet/sun/sunhme.c */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_HAPPYMEAL,
+             quirk_set_need_rom_bar);
+
+/* from drm and fbdev */
+DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA,
+                8, quirk_set_need_rom_bar);
Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -226,6 +226,10 @@ static void pdev_assign_resources_prepar
         if (resource_disabled(r) || r->parent)
             continue;

+        if (i == PCI_ROM_RESOURCE &&
+            !pci_assign_roms() && !pci_dev_need_rom_bar(dev))
+            continue;
+
         if ((r->flags & IORESOURCE_IO) &&
             !pci_find_host_bridge(dev->bus)->has_ioport)
             continue;
@@ -1461,11 +1465,16 @@ out:
     return good_size;
 }

-static inline bool is_optional(int i)
+bool __weak pci_assign_roms(void)
+{
+    return false;
+}
+
+static inline bool is_optional(struct pci_dev *dev, int i)
 {

     if (i == PCI_ROM_RESOURCE)
-        return true;
+        return !pci_assign_roms() && !pci_dev_need_rom_bar(dev);

 #ifdef CONFIG_PCI_IOV
     if (i >= PCI_IOV_RESOURCES && i <= PCI_IOV_RESOURCE_END)
@@ -1538,7 +1547,7 @@ static int pbus_size_mem(struct pci_bus
             r_size = resource_size(r);
             align = pci_resource_alignment(dev, r);
             /* put SRIOV/ROM res to realloc list */
-            if (realloc_head && is_optional(i)) {
+            if (realloc_head && is_optional(dev, i)) {
                 add_to_align_test_list(&align_test_add_list,
                         align, r_size, &dev->dev, i);
                 r->end = r->start - 1;
@@ -1549,6 +1558,9 @@ static int pbus_size_mem(struct pci_bus
                     max_add_align = align;
                 continue;
             }
+            if (!realloc_head && i == PCI_ROM_RESOURCE &&
+                !pci_assign_roms() && !pci_dev_need_rom_bar(dev))
+                continue;

             if (align > (1ULL<<37)) { /*128 Gb*/
                 dev_warn(&dev->dev, "disabling BAR %d: %pR (bad
alignment %#llx)\n",
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -182,6 +182,8 @@ enum pci_dev_flags {
     PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
     /* Get VPD from function 0 VPD */
     PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
+    /* Need to have ROM BAR */
+    PCI_DEV_FLAGS_NEED_ROM_BAR = (__force pci_dev_flags_t) (1 << 9),
 };

 enum pci_irq_reroute_variant {
@@ -1965,6 +1967,14 @@ static inline bool pci_is_dev_assigned(s
 {
     return (pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED) ==
PCI_DEV_FLAGS_ASSIGNED;
 }
+static inline void pci_dev_set_need_rom_bar(struct pci_dev *pdev)
+{
+    pdev->dev_flags |= PCI_DEV_FLAGS_NEED_ROM_BAR;
+}
+static inline bool pci_dev_need_rom_bar(struct pci_dev *pdev)
+{
+    return !!(pdev->dev_flags & PCI_DEV_FLAGS_NEED_ROM_BAR);
+}

 /**
  * pci_ari_enabled - query ARI forwarding status
@@ -1976,4 +1986,7 @@ static inline bool pci_ari_enabled(struc
 {
     return bus->self && bus->self->ari_enabled;
 }
+
+bool pci_assign_roms(void);
+
 #endif /* LINUX_PCI_H */
Index: linux-2.6/drivers/misc/pch_phub.c
===================================================================
--- linux-2.6.orig/drivers/misc/pch_phub.c
+++ linux-2.6/drivers/misc/pch_phub.c
@@ -49,7 +49,6 @@

 /* MAX number of INT_REDUCE_CONTROL registers */
 #define MAX_NUM_INT_REDUCE_CONTROL_REG 128
-#define PCI_DEVICE_ID_PCH1_PHUB 0x8801
 #define PCH_MINOR_NOS 1
 #define CLKCFG_CAN_50MHZ 0x12000000
 #define CLKCFG_CANCLK_MASK 0xFF000000
@@ -61,17 +60,6 @@
 #define CLKCFG_PLL2VCO                (8 << 9)
 #define CLKCFG_UARTCLKSEL            (1 << 18)

-/* Macros for ML7213 */
-#define PCI_VENDOR_ID_ROHM            0x10db
-#define PCI_DEVICE_ID_ROHM_ML7213_PHUB        0x801A
-
-/* Macros for ML7223 */
-#define PCI_DEVICE_ID_ROHM_ML7223_mPHUB    0x8012 /* for Bus-m */
-#define PCI_DEVICE_ID_ROHM_ML7223_nPHUB    0x8002 /* for Bus-n */
-
-/* Macros for ML7831 */
-#define PCI_DEVICE_ID_ROHM_ML7831_PHUB 0x8801
-
 /* SROM ACCESS Macro */
 #define PCH_WORD_ADDR_MASK (~((1 << 2) - 1))

Index: linux-2.6/include/linux/pci_ids.h
===================================================================
--- linux-2.6.orig/include/linux/pci_ids.h
+++ linux-2.6/include/linux/pci_ids.h
@@ -1122,6 +1122,12 @@
 #define PCI_VENDOR_ID_TCONRAD        0x10da
 #define PCI_DEVICE_ID_TCONRAD_TOKENRING    0x0508

+#define PCI_VENDOR_ID_ROHM             0x10DB
+#define PCI_DEVICE_ID_ROHM_ML7213_PHUB          0x801A
+#define PCI_DEVICE_ID_ROHM_ML7223_mPHUB 0x8012 /* for Bus-m */
+#define PCI_DEVICE_ID_ROHM_ML7223_nPHUB 0x8002 /* for Bus-n */
+#define PCI_DEVICE_ID_ROHM_ML7831_PHUB 0x8801
+
 #define PCI_VENDOR_ID_NVIDIA            0x10de
 #define PCI_DEVICE_ID_NVIDIA_TNT        0x0020
 #define PCI_DEVICE_ID_NVIDIA_TNT2        0x0028
@@ -2900,6 +2906,7 @@
 #define PCI_DEVICE_ID_INTEL_82454NX     0x84cb
 #define PCI_DEVICE_ID_INTEL_84460GX    0x84ea
 #define PCI_DEVICE_ID_INTEL_IXP4XX    0x8500
+#define PCI_DEVICE_ID_PCH1_PHUB 0x8801
 #define PCI_DEVICE_ID_INTEL_IXP2800    0x9004
 #define PCI_DEVICE_ID_INTEL_S21152BB    0xb152

Index: linux-2.6/drivers/dma/pch_dma.c
===================================================================
--- linux-2.6.orig/drivers/dma/pch_dma.c
+++ linux-2.6/drivers/dma/pch_dma.c
@@ -976,7 +976,6 @@ static void pch_dma_remove(struct pci_de
 }

 /* PCI Device ID of DMA device */
-#define PCI_VENDOR_ID_ROHM             0x10DB
 #define PCI_DEVICE_ID_EG20T_PCH_DMA_8CH        0x8810
 #define PCI_DEVICE_ID_EG20T_PCH_DMA_4CH        0x8815
 #define PCI_DEVICE_ID_ML7213_DMA1_8CH    0x8026
Index: linux-2.6/drivers/gpio/gpio-ml-ioh.c
===================================================================
--- linux-2.6.orig/drivers/gpio/gpio-ml-ioh.c
+++ linux-2.6/drivers/gpio/gpio-ml-ioh.c
@@ -31,8 +31,6 @@

 #define IOH_IRQ_BASE        0

-#define PCI_VENDOR_ID_ROHM             0x10DB
-
 struct ioh_reg_comn {
     u32    ien;
     u32    istatus;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
  2015-09-25  4:35       ` Yinghai Lu
@ 2015-09-25 13:31         ` Myron Stowe
  2015-09-25 17:02           ` Myron Stowe
  2015-09-25 14:35         ` Bjorn Helgaas
  1 sibling, 1 reply; 15+ messages in thread
From: Myron Stowe @ 2015-09-25 13:31 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci, LKML, Bjorn Helgaas, Alex Williamson

On Thu, Sep 24, 2015 at 10:35 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Thu, Sep 24, 2015 at 12:01 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>
>> Or do we want to keep a white list to say which device should have
>> ROM bar as mush have, and other is optional to have ?

I suppose this idea is one possible outcome that could occur but I
think we need to have a discussion first before we start making a lot
of changes.  We need to try and come to some consensus with BIOS
engineers.  I know that both sets have been alerted to this
conversation so *if* we come up with some good arguments to support
the kernel's current view/actions perhaps things will start to
progress.

In the prior thread you replied:
  "They are wrong.
   It still depends on how addon card firmware and drivers
   from OS to use it."

So continuing my suggested thought experiment where you are sitting
across the table from your platform's BIOS engineers having this
discussion ...  Do you *really* think your reply was helpful in any
way?  Do you *really* think you did anything what so ever to get the
BIOS engineers to consider something they hadn't before.  Do you
*really* think your reply was technically based in any way?  Do you
have any specification references or such to back up such a strong
claim?

Come on here Yinghai - you are an intelligent person.  Take 1/10th the
time that you spent developing this patch and think, gather your
thoughts, and then sit down calmly, have a beer or coffee or tea
(which ever you prefer), relax, and take some time to reply in a
logical, thoughtful manner here with enough expression that others can
understand what you are getting at and hopefully even with some
passion or logic to try to convince the BIOS engineers that the
kernel's current behavior is correct.  This is your area of expertise
- so stand up and rise to the occasion here!

Hacking out a patch before this thread has played out serves no
purpose what so ever so I'm not even going to waste my time and look
at it.  It serves no purpose and will only make matters worse as there
is already strong disagreement with the kernel's actions in these
regards.

>
> Subject: [RFC PATCH] PCI: Add pci_dev_need_rom_bar()
>
> Only set that for
> 1. if BIOS/firmware already set ROM bar.
> 2. via quirks for some devices.
>
> We assign those needed ROM bar as required
> and other ROM bars as optional resources.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
>  arch/x86/pci/i386.c        |    9 +++++-
>  drivers/dma/pch_dma.c      |    1
>  drivers/gpio/gpio-ml-ioh.c |    2 -
>  drivers/misc/pch_phub.c    |   12 --------
>  drivers/pci/probe.c        |    7 +++++
>  drivers/pci/quirks.c       |   63 +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/pci/setup-bus.c    |   18 ++++++++++--
>  include/linux/pci.h        |   13 +++++++++
>  include/linux/pci_ids.h    |    7 +++++
>  9 files changed, 112 insertions(+), 20 deletions(-)
>
> Index: linux-2.6/arch/x86/pci/i386.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/pci/i386.c
> +++ linux-2.6/arch/x86/pci/i386.c
> @@ -377,11 +377,16 @@ static void pcibios_allocate_rom_resourc
>      }
>  }
>
> +bool pci_assign_roms(void)
> +{
> +    return !!(pci_probe & PCI_ASSIGN_ROMS);
> +}
> +
>  static int __init pcibios_assign_resources(void)
>  {
>      struct pci_host_bridge *host_bridge = NULL;
>
> -    if (!(pci_probe & PCI_ASSIGN_ROMS))
> +    if (!pci_assign_roms())
>          for_each_pci_host_bridge(host_bridge)
>              pcibios_allocate_rom_resources(host_bridge->bus);
>
> @@ -406,7 +411,7 @@ void pcibios_resource_survey_bus(struct
>      pcibios_allocate_resources(bus, 0);
>      pcibios_allocate_resources(bus, 1);
>
> -    if (!(pci_probe & PCI_ASSIGN_ROMS))
> +    if (!pci_assign_roms())
>          pcibios_allocate_rom_resources(bus);
>  }
>
> Index: linux-2.6/drivers/pci/probe.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/probe.c
> +++ linux-2.6/drivers/pci/probe.c
> @@ -224,6 +224,13 @@ int __pci_read_base(struct pci_dev *dev,
>          l64 = l & PCI_ROM_ADDRESS_MASK;
>          sz64 = sz & PCI_ROM_ADDRESS_MASK;
>          mask64 = (u32)PCI_ROM_ADDRESS_MASK;
> +        /* simple validation */
> +        if (l64 && sz64 &&
> +            (l64 & 0xff000000) != 0xff000000 &&
> +            system_state == SYSTEM_BOOTING) {
> +            dev_printk(KERN_DEBUG, &dev->dev, "set dev_flags NEED_ROM_BAR\n");
> +            pci_dev_set_need_rom_bar(dev);
> +        }
>      }
>
>      if (res->flags & IORESOURCE_MEM_64) {
> Index: linux-2.6/drivers/pci/quirks.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/quirks.c
> +++ linux-2.6/drivers/pci/quirks.c
> @@ -4197,3 +4197,66 @@ static void quirk_intel_qat_vf_cap(struc
>      }
>  }
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> +
> +/* from drivers/mtd/maps/pci.c */
> +static void quirk_set_need_rom_bar(struct pci_dev *pdev)
> +{
> +    if (!pci_dev_need_rom_bar(pdev)) {
> +        dev_printk(KERN_DEBUG, &pdev->dev, "set dev_flags NEED_ROM_BAR\n");
> +        pci_dev_set_need_rom_bar(pdev);
> +    }
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_DEC, PCI_DEVICE_ID_DEC_21285,
> +             quirk_set_need_rom_bar);
> +
> +#ifdef CONFIG_PARISC
> +/* from drivers/video/console/sticore.c */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_EG,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_FX6,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_FX4,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_FX2,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_FXE,
> +             quirk_set_need_rom_bar);
> +#endif
> +
> +/* from drivers/misc/pch_phub.c */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_PCH1_PHUB,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ROHM, PCI_DEVICE_ID_ROHM_ML7213_PHUB,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ROHM, PCI_DEVICE_ID_ROHM_ML7223_mPHUB,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ROHM, PCI_DEVICE_ID_ROHM_ML7223_nPHUB,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ROHM, PCI_DEVICE_ID_ROHM_ML7831_PHUB,
> +             quirk_set_need_rom_bar);
> +
> +/* from drivers/net/ethernet/sun/sungem.c */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_GEM,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_RIO_GEM,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_UNI_N_GMAC,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_UNI_N_GMACP,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_UNI_N_GMAC2,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_K2_GMAC,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_SH_SUNGEM,
> +             quirk_set_need_rom_bar);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_IPID2_GMAC,
> +             quirk_set_need_rom_bar);
> +
> +/* from drivers/net/ethernet/sun/sunhme.c */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_HAPPYMEAL,
> +             quirk_set_need_rom_bar);
> +
> +/* from drm and fbdev */
> +DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA,
> +                8, quirk_set_need_rom_bar);
> Index: linux-2.6/drivers/pci/setup-bus.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/setup-bus.c
> +++ linux-2.6/drivers/pci/setup-bus.c
> @@ -226,6 +226,10 @@ static void pdev_assign_resources_prepar
>          if (resource_disabled(r) || r->parent)
>              continue;
>
> +        if (i == PCI_ROM_RESOURCE &&
> +            !pci_assign_roms() && !pci_dev_need_rom_bar(dev))
> +            continue;
> +
>          if ((r->flags & IORESOURCE_IO) &&
>              !pci_find_host_bridge(dev->bus)->has_ioport)
>              continue;
> @@ -1461,11 +1465,16 @@ out:
>      return good_size;
>  }
>
> -static inline bool is_optional(int i)
> +bool __weak pci_assign_roms(void)
> +{
> +    return false;
> +}
> +
> +static inline bool is_optional(struct pci_dev *dev, int i)
>  {
>
>      if (i == PCI_ROM_RESOURCE)
> -        return true;
> +        return !pci_assign_roms() && !pci_dev_need_rom_bar(dev);
>
>  #ifdef CONFIG_PCI_IOV
>      if (i >= PCI_IOV_RESOURCES && i <= PCI_IOV_RESOURCE_END)
> @@ -1538,7 +1547,7 @@ static int pbus_size_mem(struct pci_bus
>              r_size = resource_size(r);
>              align = pci_resource_alignment(dev, r);
>              /* put SRIOV/ROM res to realloc list */
> -            if (realloc_head && is_optional(i)) {
> +            if (realloc_head && is_optional(dev, i)) {
>                  add_to_align_test_list(&align_test_add_list,
>                          align, r_size, &dev->dev, i);
>                  r->end = r->start - 1;
> @@ -1549,6 +1558,9 @@ static int pbus_size_mem(struct pci_bus
>                      max_add_align = align;
>                  continue;
>              }
> +            if (!realloc_head && i == PCI_ROM_RESOURCE &&
> +                !pci_assign_roms() && !pci_dev_need_rom_bar(dev))
> +                continue;
>
>              if (align > (1ULL<<37)) { /*128 Gb*/
>                  dev_warn(&dev->dev, "disabling BAR %d: %pR (bad
> alignment %#llx)\n",
> Index: linux-2.6/include/linux/pci.h
> ===================================================================
> --- linux-2.6.orig/include/linux/pci.h
> +++ linux-2.6/include/linux/pci.h
> @@ -182,6 +182,8 @@ enum pci_dev_flags {
>      PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
>      /* Get VPD from function 0 VPD */
>      PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> +    /* Need to have ROM BAR */
> +    PCI_DEV_FLAGS_NEED_ROM_BAR = (__force pci_dev_flags_t) (1 << 9),
>  };
>
>  enum pci_irq_reroute_variant {
> @@ -1965,6 +1967,14 @@ static inline bool pci_is_dev_assigned(s
>  {
>      return (pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED) ==
> PCI_DEV_FLAGS_ASSIGNED;
>  }
> +static inline void pci_dev_set_need_rom_bar(struct pci_dev *pdev)
> +{
> +    pdev->dev_flags |= PCI_DEV_FLAGS_NEED_ROM_BAR;
> +}
> +static inline bool pci_dev_need_rom_bar(struct pci_dev *pdev)
> +{
> +    return !!(pdev->dev_flags & PCI_DEV_FLAGS_NEED_ROM_BAR);
> +}
>
>  /**
>   * pci_ari_enabled - query ARI forwarding status
> @@ -1976,4 +1986,7 @@ static inline bool pci_ari_enabled(struc
>  {
>      return bus->self && bus->self->ari_enabled;
>  }
> +
> +bool pci_assign_roms(void);
> +
>  #endif /* LINUX_PCI_H */
> Index: linux-2.6/drivers/misc/pch_phub.c
> ===================================================================
> --- linux-2.6.orig/drivers/misc/pch_phub.c
> +++ linux-2.6/drivers/misc/pch_phub.c
> @@ -49,7 +49,6 @@
>
>  /* MAX number of INT_REDUCE_CONTROL registers */
>  #define MAX_NUM_INT_REDUCE_CONTROL_REG 128
> -#define PCI_DEVICE_ID_PCH1_PHUB 0x8801
>  #define PCH_MINOR_NOS 1
>  #define CLKCFG_CAN_50MHZ 0x12000000
>  #define CLKCFG_CANCLK_MASK 0xFF000000
> @@ -61,17 +60,6 @@
>  #define CLKCFG_PLL2VCO                (8 << 9)
>  #define CLKCFG_UARTCLKSEL            (1 << 18)
>
> -/* Macros for ML7213 */
> -#define PCI_VENDOR_ID_ROHM            0x10db
> -#define PCI_DEVICE_ID_ROHM_ML7213_PHUB        0x801A
> -
> -/* Macros for ML7223 */
> -#define PCI_DEVICE_ID_ROHM_ML7223_mPHUB    0x8012 /* for Bus-m */
> -#define PCI_DEVICE_ID_ROHM_ML7223_nPHUB    0x8002 /* for Bus-n */
> -
> -/* Macros for ML7831 */
> -#define PCI_DEVICE_ID_ROHM_ML7831_PHUB 0x8801
> -
>  /* SROM ACCESS Macro */
>  #define PCH_WORD_ADDR_MASK (~((1 << 2) - 1))
>
> Index: linux-2.6/include/linux/pci_ids.h
> ===================================================================
> --- linux-2.6.orig/include/linux/pci_ids.h
> +++ linux-2.6/include/linux/pci_ids.h
> @@ -1122,6 +1122,12 @@
>  #define PCI_VENDOR_ID_TCONRAD        0x10da
>  #define PCI_DEVICE_ID_TCONRAD_TOKENRING    0x0508
>
> +#define PCI_VENDOR_ID_ROHM             0x10DB
> +#define PCI_DEVICE_ID_ROHM_ML7213_PHUB          0x801A
> +#define PCI_DEVICE_ID_ROHM_ML7223_mPHUB 0x8012 /* for Bus-m */
> +#define PCI_DEVICE_ID_ROHM_ML7223_nPHUB 0x8002 /* for Bus-n */
> +#define PCI_DEVICE_ID_ROHM_ML7831_PHUB 0x8801
> +
>  #define PCI_VENDOR_ID_NVIDIA            0x10de
>  #define PCI_DEVICE_ID_NVIDIA_TNT        0x0020
>  #define PCI_DEVICE_ID_NVIDIA_TNT2        0x0028
> @@ -2900,6 +2906,7 @@
>  #define PCI_DEVICE_ID_INTEL_82454NX     0x84cb
>  #define PCI_DEVICE_ID_INTEL_84460GX    0x84ea
>  #define PCI_DEVICE_ID_INTEL_IXP4XX    0x8500
> +#define PCI_DEVICE_ID_PCH1_PHUB 0x8801
>  #define PCI_DEVICE_ID_INTEL_IXP2800    0x9004
>  #define PCI_DEVICE_ID_INTEL_S21152BB    0xb152
>
> Index: linux-2.6/drivers/dma/pch_dma.c
> ===================================================================
> --- linux-2.6.orig/drivers/dma/pch_dma.c
> +++ linux-2.6/drivers/dma/pch_dma.c
> @@ -976,7 +976,6 @@ static void pch_dma_remove(struct pci_de
>  }
>
>  /* PCI Device ID of DMA device */
> -#define PCI_VENDOR_ID_ROHM             0x10DB
>  #define PCI_DEVICE_ID_EG20T_PCH_DMA_8CH        0x8810
>  #define PCI_DEVICE_ID_EG20T_PCH_DMA_4CH        0x8815
>  #define PCI_DEVICE_ID_ML7213_DMA1_8CH    0x8026
> Index: linux-2.6/drivers/gpio/gpio-ml-ioh.c
> ===================================================================
> --- linux-2.6.orig/drivers/gpio/gpio-ml-ioh.c
> +++ linux-2.6/drivers/gpio/gpio-ml-ioh.c
> @@ -31,8 +31,6 @@
>
>  #define IOH_IRQ_BASE        0
>
> -#define PCI_VENDOR_ID_ROHM             0x10DB
> -
>  struct ioh_reg_comn {
>      u32    ien;
>      u32    istatus;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
  2015-09-25  4:35       ` Yinghai Lu
  2015-09-25 13:31         ` Myron Stowe
@ 2015-09-25 14:35         ` Bjorn Helgaas
  2015-09-25 16:18           ` Alex Williamson
  1 sibling, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2015-09-25 14:35 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Myron Stowe, linux-pci, LKML, Bjorn Helgaas, Alex Williamson

On Thu, Sep 24, 2015 at 09:35:20PM -0700, Yinghai Lu wrote:
> On Thu, Sep 24, 2015 at 12:01 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> 
> > Or do we want to keep a white list to say which device should have
> > ROM bar as mush have, and other is optional to have ?
> 
> Subject: [RFC PATCH] PCI: Add pci_dev_need_rom_bar()
> 
> Only set that for
> 1. if BIOS/firmware already set ROM bar.
> 2. via quirks for some devices.
> 
> We assign those needed ROM bar as required
> and other ROM bars as optional resources.

I'd rather not have a whitelist if we can avoid it.  We'd be
continually adding new devices to it, and it makes the system harder
to understand because there's no consistent rule about how ROMs are
handled.

Alex mentioned the idea of ripping the ROM, and I'd like to explore
that idea a little more.  What if we could temporarily assign space
for the ROM during enumeration, read the contents, cache the contents
somewhere, then deallocate the actual BAR space?  We could hang onto
the cache and give it to anybody who later needs the ROM.

I know there are probably issues here, but I don't know what they are,
so I'd like to at least have a conversation about it.

Bjorn

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
  2015-09-25 14:35         ` Bjorn Helgaas
@ 2015-09-25 16:18           ` Alex Williamson
  2015-09-26  0:04             ` Yinghai Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Williamson @ 2015-09-25 16:18 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Yinghai Lu, Myron Stowe, linux-pci, LKML, Bjorn Helgaas

On Fri, 2015-09-25 at 09:35 -0500, Bjorn Helgaas wrote:
> On Thu, Sep 24, 2015 at 09:35:20PM -0700, Yinghai Lu wrote:
> > On Thu, Sep 24, 2015 at 12:01 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > 
> > > Or do we want to keep a white list to say which device should have
> > > ROM bar as mush have, and other is optional to have ?
> > 
> > Subject: [RFC PATCH] PCI: Add pci_dev_need_rom_bar()
> > 
> > Only set that for
> > 1. if BIOS/firmware already set ROM bar.
> > 2. via quirks for some devices.
> > 
> > We assign those needed ROM bar as required
> > and other ROM bars as optional resources.
> 
> I'd rather not have a whitelist if we can avoid it.  We'd be
> continually adding new devices to it, and it makes the system harder
> to understand because there's no consistent rule about how ROMs are
> handled.
> 
> Alex mentioned the idea of ripping the ROM, and I'd like to explore
> that idea a little more.  What if we could temporarily assign space
> for the ROM during enumeration, read the contents, cache the contents
> somewhere, then deallocate the actual BAR space?  We could hang onto
> the cache and give it to anybody who later needs the ROM.
> 
> I know there are probably issues here, but I don't know what they are,
> so I'd like to at least have a conversation about it.

Ok, so for background in the case I mentioned, we can often use the
pci-sysfs rom interface to get a copy of the device ROM, which we then
pass to QEMU and we avoid any access to the physical ROM from the VM.
We can obviously get the ROM from other sources too, but that's not
really relevant here.

If we want to extend this idea into the kernel, creating a buffer that
holds the ROM contents that we access rather than mapping and enabling
the ROM to provide access, we first need to get access to the ROM at
least once.  The simplification here is that we can do this on boot and
we can re-use the space allocated to the standard BARs since we don't
need space for both the ROM and the standard BARs at the same time.  I
think that in the vast majority of cases, we're going to find that the
ROM BAR is smaller than the largest standard MMIO BAR of the device or
at least smaller than the minimum bridge aperture.  This suggests that
we could simply record the BAR values, reprogram them to zero, then
overlap the ROM BAR to the orignal addresses, enable, rip the ROM, then
restore the configuration without needing to adjust any bridge
apertures.

That sounds pretty good, so long as we can consider the ROM to be
perfectly static.  I don't know if anything relies on an in-place update
or if there are ROMs that are less read-only than others.  Maybe it's
safe to assume that or at least safe to assume that if the BIOS didn't
leave room for them, then we can consider them static.  It might be
interesting to strace some of the userspace firmware update programs for
add-in cards to see if they re-read the ROM to determine if it has
changed.

We already sort of do this for VGA ROMs.  When a driver tries to map the
boot VGA ROM they actually map the shadow copy at 0xC0000 (iirc) rather
than the one on the device.  This actually sort of sucks because this
particular shadow copy certainly is not read-only and in all the glory
that is VGA, the shadow sometimes gets modified by the execution of the
VGA ROM itself and we no longer have access to the pristine device
version of it (bad for device assignment of primary graphics).

Now, if we want to make our own shadow copies for all ROMs, it seems
pretty clear that we first need to get access to the ROM, which means we
need to figure out if the BIOS mapped it.  If the ROM BAR is outside of
the bridge aperture (catching both 0 and 0xFFF..000) or overlaps a
standard ROM BAR, we can consider it unprogrammed.  In that case we need
to try to do the trick above with unmapping standard BARs to get the
shadow copy.  Otherwise we should be able to get the shadow copy in
place (maybe a question of which we prefer to use in this case).  I
would be willing to risk that if the BIOS didn't leave room for the ROM
and we can't map it into the space used by other BARs or it doesn't fit
the bridge aperture, we can spit out a printk warning and skip it.  I
expect a very low failure rate.

What dependencies would we have on the BIOS programming of the ROM BAR
if we took such an approach?  It seems like we should always be able to
detect invalid contents in the ROM BAR and it doesn't matter if they
think we should have access or not, we own the device and can give
ourselves access.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
  2015-09-25 13:31         ` Myron Stowe
@ 2015-09-25 17:02           ` Myron Stowe
  0 siblings, 0 replies; 15+ messages in thread
From: Myron Stowe @ 2015-09-25 17:02 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci, LKML, Bjorn Helgaas, Alex Williamson

On Fri, Sep 25, 2015 at 7:31 AM, Myron Stowe <myron.stowe@gmail.com> wrote:
> On Thu, Sep 24, 2015 at 10:35 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Thu, Sep 24, 2015 at 12:01 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>
>>> Or do we want to keep a white list to say which device should have
>>> ROM bar as mush have, and other is optional to have ?
>
> I suppose this idea is one possible outcome that could occur but I
> think we need to have a discussion first before we start making a lot
> of changes.  We need to try and come to some consensus with BIOS
> engineers.  I know that both sets have been alerted to this
> conversation so *if* we come up with some good arguments to support
> the kernel's current view/actions perhaps things will start to
> progress.
>
> In the prior thread you replied:
>   "They are wrong.
>    It still depends on how addon card firmware and drivers
>    from OS to use it."
>
> So continuing my suggested thought experiment where you are sitting
> across the table from your platform's BIOS engineers having this
> discussion ...  Do you *really* think your reply was helpful in any
> way?  Do you *really* think you did anything what so ever to get the
> BIOS engineers to consider something they hadn't before.  Do you
> *really* think your reply was technically based in any way?  Do you
> have any specification references or such to back up such a strong
> claim?
>
> Come on here Yinghai - you are an intelligent person.  Take 1/10th the
> time that you spent developing this patch and think, gather your
> thoughts, and then sit down calmly, have a beer or coffee or tea
> (which ever you prefer), relax, and take some time to reply in a
> logical, thoughtful manner here with enough expression that others can
> understand what you are getting at and hopefully even with some
> passion or logic to try to convince the BIOS engineers that the
> kernel's current behavior is correct.  This is your area of expertise
> - so stand up and rise to the occasion here!
>
> Hacking out a patch before this thread has played out serves no
> purpose what so ever so I'm not even going to waste my time and look
> at it.  It serves no purpose and will only make matters worse as there
> is already strong disagreement with the kernel's actions in these
> regards.

Sigh, that was a terrible response on my part - I'm trying to
encourage engagement in this discussion and yet my response likely has
exactly the opposite result; shutting you down.  Yinghai, I apologize,
truly!

Others have privately said that you may be uncomfortable with
expressing your views due to language skills.  If that's the case then
please don't be intimidated and limit your contributions.  I expect
you know at least two languages which is 50% more than me so don't
worry about grammar or such - that's not important and we could
benefit from your experience and knowledge.  English is my only
language and I still too often find it difficult to express my
opinions.

Again, I'm sorry for my rash, harsh, response,
 Myron

>
>>
>> Subject: [RFC PATCH] PCI: Add pci_dev_need_rom_bar()
>>
>> Only set that for
>> 1. if BIOS/firmware already set ROM bar.
>> 2. via quirks for some devices.
>>
>> We assign those needed ROM bar as required
>> and other ROM bars as optional resources.
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>>
>> ---
>>  arch/x86/pci/i386.c        |    9 +++++-
>>  drivers/dma/pch_dma.c      |    1
>>  drivers/gpio/gpio-ml-ioh.c |    2 -
>>  drivers/misc/pch_phub.c    |   12 --------
>>  drivers/pci/probe.c        |    7 +++++
>>  drivers/pci/quirks.c       |   63 +++++++++++++++++++++++++++++++++++++++++++++
>>  drivers/pci/setup-bus.c    |   18 ++++++++++--
>>  include/linux/pci.h        |   13 +++++++++
>>  include/linux/pci_ids.h    |    7 +++++
>>  9 files changed, 112 insertions(+), 20 deletions(-)
>>
>> Index: linux-2.6/arch/x86/pci/i386.c
>> ===================================================================
>> --- linux-2.6.orig/arch/x86/pci/i386.c
>> +++ linux-2.6/arch/x86/pci/i386.c
>> @@ -377,11 +377,16 @@ static void pcibios_allocate_rom_resourc
>>      }
>>  }
>>
>> +bool pci_assign_roms(void)
>> +{
>> +    return !!(pci_probe & PCI_ASSIGN_ROMS);
>> +}
>> +
>>  static int __init pcibios_assign_resources(void)
>>  {
>>      struct pci_host_bridge *host_bridge = NULL;
>>
>> -    if (!(pci_probe & PCI_ASSIGN_ROMS))
>> +    if (!pci_assign_roms())
>>          for_each_pci_host_bridge(host_bridge)
>>              pcibios_allocate_rom_resources(host_bridge->bus);
>>
>> @@ -406,7 +411,7 @@ void pcibios_resource_survey_bus(struct
>>      pcibios_allocate_resources(bus, 0);
>>      pcibios_allocate_resources(bus, 1);
>>
>> -    if (!(pci_probe & PCI_ASSIGN_ROMS))
>> +    if (!pci_assign_roms())
>>          pcibios_allocate_rom_resources(bus);
>>  }
>>
>> Index: linux-2.6/drivers/pci/probe.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/probe.c
>> +++ linux-2.6/drivers/pci/probe.c
>> @@ -224,6 +224,13 @@ int __pci_read_base(struct pci_dev *dev,
>>          l64 = l & PCI_ROM_ADDRESS_MASK;
>>          sz64 = sz & PCI_ROM_ADDRESS_MASK;
>>          mask64 = (u32)PCI_ROM_ADDRESS_MASK;
>> +        /* simple validation */
>> +        if (l64 && sz64 &&
>> +            (l64 & 0xff000000) != 0xff000000 &&
>> +            system_state == SYSTEM_BOOTING) {
>> +            dev_printk(KERN_DEBUG, &dev->dev, "set dev_flags NEED_ROM_BAR\n");
>> +            pci_dev_set_need_rom_bar(dev);
>> +        }
>>      }
>>
>>      if (res->flags & IORESOURCE_MEM_64) {
>> Index: linux-2.6/drivers/pci/quirks.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/quirks.c
>> +++ linux-2.6/drivers/pci/quirks.c
>> @@ -4197,3 +4197,66 @@ static void quirk_intel_qat_vf_cap(struc
>>      }
>>  }
>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
>> +
>> +/* from drivers/mtd/maps/pci.c */
>> +static void quirk_set_need_rom_bar(struct pci_dev *pdev)
>> +{
>> +    if (!pci_dev_need_rom_bar(pdev)) {
>> +        dev_printk(KERN_DEBUG, &pdev->dev, "set dev_flags NEED_ROM_BAR\n");
>> +        pci_dev_set_need_rom_bar(pdev);
>> +    }
>> +}
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_DEC, PCI_DEVICE_ID_DEC_21285,
>> +             quirk_set_need_rom_bar);
>> +
>> +#ifdef CONFIG_PARISC
>> +/* from drivers/video/console/sticore.c */
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_EG,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_FX6,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_FX4,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_FX2,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_VISUALIZE_FXE,
>> +             quirk_set_need_rom_bar);
>> +#endif
>> +
>> +/* from drivers/misc/pch_phub.c */
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_PCH1_PHUB,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ROHM, PCI_DEVICE_ID_ROHM_ML7213_PHUB,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ROHM, PCI_DEVICE_ID_ROHM_ML7223_mPHUB,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ROHM, PCI_DEVICE_ID_ROHM_ML7223_nPHUB,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ROHM, PCI_DEVICE_ID_ROHM_ML7831_PHUB,
>> +             quirk_set_need_rom_bar);
>> +
>> +/* from drivers/net/ethernet/sun/sungem.c */
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_GEM,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_RIO_GEM,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_UNI_N_GMAC,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_UNI_N_GMACP,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_UNI_N_GMAC2,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_K2_GMAC,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_SH_SUNGEM,
>> +             quirk_set_need_rom_bar);
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_IPID2_GMAC,
>> +             quirk_set_need_rom_bar);
>> +
>> +/* from drivers/net/ethernet/sun/sunhme.c */
>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SUN, PCI_DEVICE_ID_SUN_HAPPYMEAL,
>> +             quirk_set_need_rom_bar);
>> +
>> +/* from drm and fbdev */
>> +DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA,
>> +                8, quirk_set_need_rom_bar);
>> Index: linux-2.6/drivers/pci/setup-bus.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/setup-bus.c
>> +++ linux-2.6/drivers/pci/setup-bus.c
>> @@ -226,6 +226,10 @@ static void pdev_assign_resources_prepar
>>          if (resource_disabled(r) || r->parent)
>>              continue;
>>
>> +        if (i == PCI_ROM_RESOURCE &&
>> +            !pci_assign_roms() && !pci_dev_need_rom_bar(dev))
>> +            continue;
>> +
>>          if ((r->flags & IORESOURCE_IO) &&
>>              !pci_find_host_bridge(dev->bus)->has_ioport)
>>              continue;
>> @@ -1461,11 +1465,16 @@ out:
>>      return good_size;
>>  }
>>
>> -static inline bool is_optional(int i)
>> +bool __weak pci_assign_roms(void)
>> +{
>> +    return false;
>> +}
>> +
>> +static inline bool is_optional(struct pci_dev *dev, int i)
>>  {
>>
>>      if (i == PCI_ROM_RESOURCE)
>> -        return true;
>> +        return !pci_assign_roms() && !pci_dev_need_rom_bar(dev);
>>
>>  #ifdef CONFIG_PCI_IOV
>>      if (i >= PCI_IOV_RESOURCES && i <= PCI_IOV_RESOURCE_END)
>> @@ -1538,7 +1547,7 @@ static int pbus_size_mem(struct pci_bus
>>              r_size = resource_size(r);
>>              align = pci_resource_alignment(dev, r);
>>              /* put SRIOV/ROM res to realloc list */
>> -            if (realloc_head && is_optional(i)) {
>> +            if (realloc_head && is_optional(dev, i)) {
>>                  add_to_align_test_list(&align_test_add_list,
>>                          align, r_size, &dev->dev, i);
>>                  r->end = r->start - 1;
>> @@ -1549,6 +1558,9 @@ static int pbus_size_mem(struct pci_bus
>>                      max_add_align = align;
>>                  continue;
>>              }
>> +            if (!realloc_head && i == PCI_ROM_RESOURCE &&
>> +                !pci_assign_roms() && !pci_dev_need_rom_bar(dev))
>> +                continue;
>>
>>              if (align > (1ULL<<37)) { /*128 Gb*/
>>                  dev_warn(&dev->dev, "disabling BAR %d: %pR (bad
>> alignment %#llx)\n",
>> Index: linux-2.6/include/linux/pci.h
>> ===================================================================
>> --- linux-2.6.orig/include/linux/pci.h
>> +++ linux-2.6/include/linux/pci.h
>> @@ -182,6 +182,8 @@ enum pci_dev_flags {
>>      PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
>>      /* Get VPD from function 0 VPD */
>>      PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
>> +    /* Need to have ROM BAR */
>> +    PCI_DEV_FLAGS_NEED_ROM_BAR = (__force pci_dev_flags_t) (1 << 9),
>>  };
>>
>>  enum pci_irq_reroute_variant {
>> @@ -1965,6 +1967,14 @@ static inline bool pci_is_dev_assigned(s
>>  {
>>      return (pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED) ==
>> PCI_DEV_FLAGS_ASSIGNED;
>>  }
>> +static inline void pci_dev_set_need_rom_bar(struct pci_dev *pdev)
>> +{
>> +    pdev->dev_flags |= PCI_DEV_FLAGS_NEED_ROM_BAR;
>> +}
>> +static inline bool pci_dev_need_rom_bar(struct pci_dev *pdev)
>> +{
>> +    return !!(pdev->dev_flags & PCI_DEV_FLAGS_NEED_ROM_BAR);
>> +}
>>
>>  /**
>>   * pci_ari_enabled - query ARI forwarding status
>> @@ -1976,4 +1986,7 @@ static inline bool pci_ari_enabled(struc
>>  {
>>      return bus->self && bus->self->ari_enabled;
>>  }
>> +
>> +bool pci_assign_roms(void);
>> +
>>  #endif /* LINUX_PCI_H */
>> Index: linux-2.6/drivers/misc/pch_phub.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/misc/pch_phub.c
>> +++ linux-2.6/drivers/misc/pch_phub.c
>> @@ -49,7 +49,6 @@
>>
>>  /* MAX number of INT_REDUCE_CONTROL registers */
>>  #define MAX_NUM_INT_REDUCE_CONTROL_REG 128
>> -#define PCI_DEVICE_ID_PCH1_PHUB 0x8801
>>  #define PCH_MINOR_NOS 1
>>  #define CLKCFG_CAN_50MHZ 0x12000000
>>  #define CLKCFG_CANCLK_MASK 0xFF000000
>> @@ -61,17 +60,6 @@
>>  #define CLKCFG_PLL2VCO                (8 << 9)
>>  #define CLKCFG_UARTCLKSEL            (1 << 18)
>>
>> -/* Macros for ML7213 */
>> -#define PCI_VENDOR_ID_ROHM            0x10db
>> -#define PCI_DEVICE_ID_ROHM_ML7213_PHUB        0x801A
>> -
>> -/* Macros for ML7223 */
>> -#define PCI_DEVICE_ID_ROHM_ML7223_mPHUB    0x8012 /* for Bus-m */
>> -#define PCI_DEVICE_ID_ROHM_ML7223_nPHUB    0x8002 /* for Bus-n */
>> -
>> -/* Macros for ML7831 */
>> -#define PCI_DEVICE_ID_ROHM_ML7831_PHUB 0x8801
>> -
>>  /* SROM ACCESS Macro */
>>  #define PCH_WORD_ADDR_MASK (~((1 << 2) - 1))
>>
>> Index: linux-2.6/include/linux/pci_ids.h
>> ===================================================================
>> --- linux-2.6.orig/include/linux/pci_ids.h
>> +++ linux-2.6/include/linux/pci_ids.h
>> @@ -1122,6 +1122,12 @@
>>  #define PCI_VENDOR_ID_TCONRAD        0x10da
>>  #define PCI_DEVICE_ID_TCONRAD_TOKENRING    0x0508
>>
>> +#define PCI_VENDOR_ID_ROHM             0x10DB
>> +#define PCI_DEVICE_ID_ROHM_ML7213_PHUB          0x801A
>> +#define PCI_DEVICE_ID_ROHM_ML7223_mPHUB 0x8012 /* for Bus-m */
>> +#define PCI_DEVICE_ID_ROHM_ML7223_nPHUB 0x8002 /* for Bus-n */
>> +#define PCI_DEVICE_ID_ROHM_ML7831_PHUB 0x8801
>> +
>>  #define PCI_VENDOR_ID_NVIDIA            0x10de
>>  #define PCI_DEVICE_ID_NVIDIA_TNT        0x0020
>>  #define PCI_DEVICE_ID_NVIDIA_TNT2        0x0028
>> @@ -2900,6 +2906,7 @@
>>  #define PCI_DEVICE_ID_INTEL_82454NX     0x84cb
>>  #define PCI_DEVICE_ID_INTEL_84460GX    0x84ea
>>  #define PCI_DEVICE_ID_INTEL_IXP4XX    0x8500
>> +#define PCI_DEVICE_ID_PCH1_PHUB 0x8801
>>  #define PCI_DEVICE_ID_INTEL_IXP2800    0x9004
>>  #define PCI_DEVICE_ID_INTEL_S21152BB    0xb152
>>
>> Index: linux-2.6/drivers/dma/pch_dma.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/dma/pch_dma.c
>> +++ linux-2.6/drivers/dma/pch_dma.c
>> @@ -976,7 +976,6 @@ static void pch_dma_remove(struct pci_de
>>  }
>>
>>  /* PCI Device ID of DMA device */
>> -#define PCI_VENDOR_ID_ROHM             0x10DB
>>  #define PCI_DEVICE_ID_EG20T_PCH_DMA_8CH        0x8810
>>  #define PCI_DEVICE_ID_EG20T_PCH_DMA_4CH        0x8815
>>  #define PCI_DEVICE_ID_ML7213_DMA1_8CH    0x8026
>> Index: linux-2.6/drivers/gpio/gpio-ml-ioh.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/gpio/gpio-ml-ioh.c
>> +++ linux-2.6/drivers/gpio/gpio-ml-ioh.c
>> @@ -31,8 +31,6 @@
>>
>>  #define IOH_IRQ_BASE        0
>>
>> -#define PCI_VENDOR_ID_ROHM             0x10DB
>> -
>>  struct ioh_reg_comn {
>>      u32    ien;
>>      u32    istatus;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
  2015-09-25 16:18           ` Alex Williamson
@ 2015-09-26  0:04             ` Yinghai Lu
  0 siblings, 0 replies; 15+ messages in thread
From: Yinghai Lu @ 2015-09-26  0:04 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Bjorn Helgaas, Myron Stowe, linux-pci, LKML, Bjorn Helgaas

On Fri, Sep 25, 2015 at 9:18 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
>> > > Or do we want to keep a white list to say which device should have
>> > > ROM bar as mush have, and other is optional to have ?
>> >
>> > Subject: [RFC PATCH] PCI: Add pci_dev_need_rom_bar()
>> >
>> > Only set that for
>> > 1. if BIOS/firmware already set ROM bar.
>> > 2. via quirks for some devices.
>> >
>> > We assign those needed ROM bar as required
>> > and other ROM bars as optional resources.
>>
>> I'd rather not have a whitelist if we can avoid it.  We'd be
>> continually adding new devices to it, and it makes the system harder
>> to understand because there's no consistent rule about how ROMs are
>> handled.

I don't like that whitelist way, and hope we can find better way to
handle all the
cases elegantly.

>>
>> Alex mentioned the idea of ripping the ROM, and I'd like to explore
>> that idea a little more.  What if we could temporarily assign space
>> for the ROM during enumeration, read the contents, cache the contents
>> somewhere, then deallocate the actual BAR space?  We could hang onto
>> the cache and give it to anybody who later needs the ROM.
>
> That sounds pretty good, so long as we can consider the ROM to be
> perfectly static.  I don't know if anything relies on an in-place update
> or if there are ROMs that are less read-only than others.  Maybe it's
> safe to assume that or at least safe to assume that if the BIOS didn't
> leave room for them, then we can consider them static.  It might be
> interesting to strace some of the userspace firmware update programs for
> add-in cards to see if they re-read the ROM to determine if it has
> changed.

Should cover most cases. But there is some driver like
drivers/mtd/maps/pci.c::drivers/mtd/maps/pci.c do have write operation
to the mmio range.

>
> We already sort of do this for VGA ROMs.  When a driver tries to map the
> boot VGA ROM they actually map the shadow copy at 0xC0000 (iirc) rather
> than the one on the device.  This actually sort of sucks because this
> particular shadow copy certainly is not read-only and in all the glory
> that is VGA, the shadow sometimes gets modified by the execution of the
> VGA ROM itself and we no longer have access to the pristine device
> version of it (bad for device assignment of primary graphics).
>
> Now, if we want to make our own shadow copies for all ROMs, it seems
> pretty clear that we first need to get access to the ROM, which means we
> need to figure out if the BIOS mapped it.  If the ROM BAR is outside of
> the bridge aperture (catching both 0 and 0xFFF..000) or overlaps a
> standard ROM BAR, we can consider it unprogrammed.  In that case we need
> to try to do the trick above with unmapping standard BARs to get the
> shadow copy.  Otherwise we should be able to get the shadow copy in
> place (maybe a question of which we prefer to use in this case).  I
> would be willing to risk that if the BIOS didn't leave room for the ROM
> and we can't map it into the space used by other BARs or it doesn't fit
> the bridge aperture, we can spit out a printk warning and skip it.  I
> expect a very low failure rate.

Maybe we can combine two methods together:
1. have NEED_ROM_BAR, and it is set
    a) if BIOS allocate resource to yet (maybe not, so we leave space
for MMIO bars)
    b) via limited whitelist that will not support static copy.
2. kernel will try to allocate resource to ROM bar with
    NEED_ROM_BAR as required, and others
    as optional
3. for ROM bar that can not get assigned, kernel try to borrow mmio range
     from other MMIO bar on the device, and save a copy and  expose that
     via /sys/.../rom
     That should happen FINAL_FIXUP stage before driver get
      loaded.

    --- There is risk on it, some add-on card firmware would
         stop working if kernel ever try to change MMIO bar.
         then we will need blacklist to skip BAR on those devices.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
  2015-09-24  2:47 [RFC] PCI: Unassigned Expansion ROM BARs Myron Stowe
  2015-09-24  3:21 ` Yinghai Lu
@ 2015-09-27 19:29 ` Myron Stowe
  2015-09-27 20:19 ` Myron Stowe
  2 siblings, 0 replies; 15+ messages in thread
From: Myron Stowe @ 2015-09-27 19:29 UTC (permalink / raw)
  To: linux-pci; +Cc: LKML, Bjorn Helgaas, Yinghai Lu, Alex Williamson

On Wed, Sep 23, 2015 at 8:47 PM, Myron Stowe <myron.stowe@gmail.com> wrote:
snip
>
> The kernel expects device Expansion ROM BARs to be programmed with valid
> values - even if the respective Expansion ROM's Enable bit is 0 (i.e. the
> device’s expansion ROM address space is disabled).  This seems to be the
> main contention point with said BIOS engineers.  If an Expansion ROM BAR is
> not programmed, the kernel will attempt to find available resources and, if
> successful, program it.  As this occurs various 'dmesg' entries
> related to kernel's actions are output.
>

The respective BIOS engineers from the two major vendors exhibiting the
behavior outlined are aware of, and monitoring, this thread.  With the
exception of Daniel's recent post, there hasn't been much substance presented
supporting the OS's viewpoint to encourage the BIOS engineers to enter
into any kind of discussion.  The majority of the responses have gone
straight towards how the OS can effectively work around platform's that
exhibit such setups.  I'd like to step back and present known instances of
the OS's need(s) to access the Expansion (a.k.a. option) ROMs - something
for the BIOS engineers to consider; something with which to start a
dialogue.


There are at least three known major reasons why Linux uses the ROMs:

1) For many of the video cards, Linux has drivers that assume the card has
been initialized by the ROM.  The drivers work fine, but they aren't smart
enough to work with the card straight out of reset - a lot of which is due
to specific vendor's keeping their devices closed; the code remains
proprietary.  When such devices are reset when the OS is running (i.e.
when X is restarted), the OS has to run the ROM before the driver works
again.  Alex Williamson and Daniel Blueman both covered this fairly well,
including the current dificiencies of such, in prior threads.

2) As Daniel further expressed, hot-plug scenarios and PCI domains which
may not be visible to the BIOS at initial boot, may need to access the
ROMs.  In these environments - PCI hiearchies shared among multiple,
distinct, servers; hiearchies using non-transparent bridges - option ROMs
handed off by the BIOS unassigned need to be allocated by the OS so that
they can be accessed under these circumstances.

3) Virtualized guest environments where a device may be assigned to a
virtualized guest is an interesting case.  In such environments the host
OS effectively functions as the meta-level BIOS, setting up a guest's
environment (virtual platform) prior to instantianting it.  Within such a
context consider a simple example:

  NIC devices often have Preboot Execution Environment (PXE) code in their
  ROMs.  In a bare-metal scenario, the BIOS (a.k.a. platform firmware)
  obtains the PXE code from the ROM and initiates its execution.  In this
  scenario, once the OS is up and running there would seem to be no
  further need to access such device's ROMs.

  If we now extend the scenario one meta-level to include virtualization,
  the host OS [1] assumes the role of bare-metal environment's BIOS and
  the virtualized guest takes on the role of bare-metal OS.  As such, if
  the guest is booted via PXE from a NIC device, the meta-level BIOS
  (QEMU/seabios) needs the ROM's content in order initiate PXE execution
  to bring up the guest.

So in virtualized environments, it becomes obvious that all the
traditional BIOS ROM related actions extend to the (host) OS - PXE
booting, device initialization, hot-plug, and directly assigning physical
devices to virtualized guests, etc.


[1]  "host OS" is used here in the generalized sence (i.e. it is in
control and thus the subsequent use of QEMU and seabios are not
specifically differentiated).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
  2015-09-24  2:47 [RFC] PCI: Unassigned Expansion ROM BARs Myron Stowe
  2015-09-24  3:21 ` Yinghai Lu
  2015-09-27 19:29 ` Myron Stowe
@ 2015-09-27 20:19 ` Myron Stowe
  2016-09-01 19:16   ` Myron Stowe
  2 siblings, 1 reply; 15+ messages in thread
From: Myron Stowe @ 2015-09-27 20:19 UTC (permalink / raw)
  To: linux-pci; +Cc: LKML, Bjorn Helgaas, Yinghai Lu, Alex Williamson

On Wed, Sep 23, 2015 at 8:47 PM, Myron Stowe <myron.stowe@gmail.com> wrote:
snip
>
> There is a kernel boot parameter, pci=norom, that is intended to disable the
> kernel's resource assignment actions for Expansion ROMs that do not already
> have BIOS assigned address ranges.  Note however, if I remember correctly,
> that this only works if the Expansion ROM BAR is set to "0" by the BIOS
> before hand-off.

In private conversations I was asked: Why do you propose asking the BIOS
to assign setting Expansion ROM BARs to "0"?

That is not what I'm advocating.  I think it's a complete hack.

  Some background context - this is effectively the defacto detente that
  has come to be somehow with one of the major vendor's BIOS' to
  circumvent 'dmesg' entries corresponding to unassigned Expansion ROM
  BARs which draws customer attention.

Unless something has changed recently, specifying "pci=norom" when booting
does not cause the kernel to completely ignore Expansion ROM BARs
all together as one would expect.  The kernel still outputs 'dmesg's
corresponding to unassigned Expansion ROM BARs and also attempts to
allocate such.  This is a kernel bug in my opinion.  It's only if both
"pci=norom" has been specified, and, the BIOS has set the Expansion ROM
BARs to "0" that the kernel completely ignores Expansion ROM BARs and no
'dmesg's are output.

Customers don't want to, and shouldn't have to, utilize kernel parameters.
They are indispensable for kernel engineers to use for debugging and such
but not for normal, every day, (i.e. customer) use.  So, no, I am not
advocating the current defacto detente that is in place today

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
  2015-09-27 20:19 ` Myron Stowe
@ 2016-09-01 19:16   ` Myron Stowe
  0 siblings, 0 replies; 15+ messages in thread
From: Myron Stowe @ 2016-09-01 19:16 UTC (permalink / raw)
  To: linux-pci; +Cc: LKML, Bjorn Helgaas, Yinghai Lu, Alex Williamson

Here it is a year later and there has basically been no progress on
this ongoing situation.  I still often encounter bugs raised against
the kernel w.r.t. unmet resource allocations - here is the most recent
example, I'll attach the 'dmesg' log from the platform at
https://bugzilla.kernel.org/show_bug.cgi?id=104931.


Researching device 0000:04:00.3 as it's the device with the issue (and all
other devices/functions under PCI bus 04 due to possible competing resource
needs).


Analysis from v4.7.0 kernel run 'dmesg' log with comments interspersed ...

This platform has two PCI Root Bridges.  Limiting analysis to the first
Root Bridge handling PCI buses 0x00 through 0x7e as it contains the
PCI bus in question - bus 04.

  ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7e])
  PCI host bridge to bus 0000:00
  pci_bus 0000:00: root bus resource [io  0x0000-0x03bb window]
  pci_bus 0000:00: root bus resource [io  0x03bc-0x03df window]
  pci_bus 0000:00: root bus resource [io  0x03e0-0x0cf7 window]
  pci_bus 0000:00: root bus resource [io  0x1000-0x7fff window]
  pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
  pci_bus 0000:00: root bus resource [mem 0x90000000-0xc7ffbfff window]
  pci_bus 0000:00: root bus resource [mem 0x30000000000-0x33fffffffff window]

CPU addresses falling into the above resource ranges will get intercepted
by the host controller and converted into PCI bus transactions.  Looking
further into the log we find the set of resource ranges (PCI-to-PCI bridge
apertures) corresponding to PCI bus 04.

  pci 0000:00:02.0: PCI bridge to [bus 04]
  pci 0000:00:02.0:   bridge window [io  0x2000-0x2fff]
  pci 0000:00:02.0:   bridge window [mem 0x92000000-0x940fffff]          33M

The following shows what the platforms BIOS programmed into the BARs of
device(s) under PCI bus 04.

  pci 0000:04:00.0: [1924:0923] type 00 class 0x020000
  pci 0000:04:00.0: reg 0x10: [io  0x2300-0x23ff]
  pci 0000:04:00.0: reg 0x18: [mem 0x93800000-0x93ffffff 64bit]         BAR2
  pci 0000:04:00.0: reg 0x20: [mem 0x9400c000-0x9400ffff 64bit]         BAR4
  pci 0000:04:00.0: reg 0x30: [mem 0xfffc0000-0xffffffff pref]          E ROM
  pci 0000:04:00.1: [1924:0923] type 00 class 0x020000
  pci 0000:04:00.1: reg 0x10: [io  0x2200-0x22ff]
  pci 0000:04:00.1: reg 0x18: [mem 0x93000000-0x937fffff 64bit]
  pci 0000:04:00.1: reg 0x20: [mem 0x94008000-0x9400bfff 64bit]
  pci 0000:04:00.1: reg 0x30: [mem 0xfffc0000-0xffffffff pref]
  pci 0000:04:00.2: [1924:0923] type 00 class 0x020000
  pci 0000:04:00.2: reg 0x10: [io  0x2100-0x21ff]
  pci 0000:04:00.2: reg 0x18: [mem 0x92800000-0x92ffffff 64bit]
  pci 0000:04:00.2: reg 0x20: [mem 0x94004000-0x94007fff 64bit]
  pci 0000:04:00.2: reg 0x30: [mem 0xfffc0000-0xffffffff pref]
  pci 0000:04:00.3: [1924:0923] type 00 class 0x020000
  pci 0000:04:00.3: reg 0x10: [io  0x2000-0x20ff]
  pci 0000:04:00.3: reg 0x18: [mem 0x92000000-0x927fffff 64bit]           8M
  pci 0000:04:00.3: reg 0x20: [mem 0x94000000-0x94003fff 64bit]          16K
  pci 0000:04:00.3: reg 0x30: [mem 0xfffc0000-0xffffffff pref]          256K

It's already obvious that the 33M of MMIO space that the PCI-to-PCI bridge
leading to PCI bus 04 provides (0x92000000-0x940fffff) is not enough space
to fully satisfy the MMIO specific addressing needs of all device's BARs
below it - the 4 combined ports - totaling (8M + 16K + 256K) *4) = 33M + 64K.
This is _without_ taking into account any alignment constraints that likely
would increase the buses needed aperture range even further.

Note that the values programmed into the device's Expansion ROM BARs do not
fit within any of its immediately upstream bridge's MMIO related apertures.

  pci 0000:04:00.0: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no
  compatible bridge window
  pci 0000:04:00.1: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no
  compatible bridge window
  pci 0000:04:00.2: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no
  compatible bridge window
  pci 0000:04:00.3: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no
  compatible bridge window

The kernel notices this and attempts to allocate appropriate space for them
from any remaining, available, MMIO space that meets all the alignment
constraints and such.

  pci 0000:04:00.0: BAR 6: assigned [mem 0x94040000-0x9407ffff pref]
  pci 0000:04:00.1: BAR 6: assigned [mem 0x94080000-0x940bffff pref]
  pci 0000:04:00.2: BAR 6: assigned [mem 0x940c0000-0x940fffff pref]
  pci 0000:04:00.3: BAR 6: no space for [mem size 0x00040000 pref]
  pci 0000:04:00.3: BAR 6: failed to assign [mem size 0x00040000 pref]

The kernel was able to satisfy the first three ports MMIO needs but was
_not_ able to for the last port - there is no remaining available
addressing space within the range to satisfy its needs!

At this point the 0000:04:00.3 device just happens to work by luck due to
the fact that the unmet resource needs correspond to its Expansion ROM
BAR [1].


Next a "user" initiates a PCIe hot-unplug of the device, the bus
is re-scanned and as a result, BAR4 of all 4 of the device's functions fail
getting their appropriate resources allocated.

  pci 0000:00:02.0: PCI bridge to [bus 04]
  pci 0000:00:02.0:   bridge window [io  0x2000-0x2fff]
  pci 0000:00:02.0:   bridge window [mem 0x92000000-0x940fffff]          33M

  pci 0000:04:00.0: BAR 2: assigned [mem 0x92000000-0x927fffff 64bit]
  pci 0000:04:00.1: BAR 2: assigned [mem 0x92800000-0x92ffffff 64bit]
  pci 0000:04:00.2: BAR 2: assigned [mem 0x93000000-0x937fffff 64bit]
  pci 0000:04:00.3: BAR 2: assigned [mem 0x93800000-0x93ffffff 64bit]
  pci 0000:04:00.0: BAR 6: assigned [mem 0x94000000-0x9403ffff pref]
  pci 0000:04:00.1: BAR 6: assigned [mem 0x94040000-0x9407ffff pref]
  pci 0000:04:00.2: BAR 6: assigned [mem 0x94080000-0x940bffff pref]
  pci 0000:04:00.3: BAR 6: assigned [mem 0x940c0000-0x940fffff pref]

At this point -all- available MMIO resource space has been consumed.

  For the more visually inclined (if it's not already obvious).  There's
  probably an easier way to visualize the exhaustion but here is my lame
  attempt:

  PCI Bridge 04's MMIO aperture resource range totals 33M
  ( 0x92000000-0x940fffff ).  The first line below denotes the 33M in
  1M increments (chunks).  The second line denotes the addressing range;
  specifically bytes 7 and 6 withing the resource's range ( 0x9--xxxxx ).
  The last line denotes the port (0 through 3) consuming that portion
  of the resource's range.

   1 2 3 4 5 6 7 8 9101112131415161718192021222324252627282930313233    33M
  202122232425262728292a2b2c2d232f303132333435363738393a3b3c3d3e3f40    [76]
   0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3--

  The last 1M is consumed by a smaller granularity so expanding the
  above conceptualization to a finer level.

  1M of resource range ( 94000000-940fffff ) visualized in 32K increments
  ( bytes 5 and 4; 0x940--xxx ).
   1 2 3 4 5 6 7 8 91011121314151617181920212223242526272829303132       1M
  0008101820283038404850586068707880889098a0a8b0b8c0c8d0d8e0e8f0f8      [54]
   0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3

and the remaining needed resource allocation attempts are going to
fail.

  pci 0000:04:00.0: BAR 4: no space for [mem size 0x00004000 64bit]
  pci 0000:04:00.0: BAR 4: failed to assign [mem size 0x00004000 64bit]
  pci 0000:04:00.1: BAR 4: no space for [mem size 0x00004000 64bit]
  pci 0000:04:00.1: BAR 4: failed to assign [mem size 0x00004000 64bit]
  pci 0000:04:00.2: BAR 4: no space for [mem size 0x00004000 64bit]
  pci 0000:04:00.2: BAR 4: failed to assign [mem size 0x00004000 64bit]
  pci 0000:04:00.3: BAR 4: no space for [mem size 0x00004000 64bit]
  pci 0000:04:00.3: BAR 4: failed to assign [mem size 0x00004000 64bit]
  pci 0000:04:00.0: BAR 0: assigned [io  0x2000-0x20ff]
  pci 0000:04:00.1: BAR 0: assigned [io  0x2400-0x24ff]
  pci 0000:04:00.2: BAR 0: assigned [io  0x2800-0x28ff]
  pci 0000:04:00.3: BAR 0: assigned [io  0x2c00-0x2cff]

At this point none of the four functions (ports) - 0000:04:00.{0..3} were
able to get their necessary resource needs met and thus the device's functions
(NIC ports) do not work.  In fact, I would expect the driver's call into
the kernel's PCI core 'pci_enable_device()' routine to fail [1].


Conclusion ...

The root cause of the issue(s) [2] is the platform's BIOS not providing
enough, and setting up properly, resource needs that the device requires -
specifically MMIO addressing space related resources.  Most notably
conspicuous is the device's Expansion ROM BAR(s) as they are improperly
programmed - the initial BIOS programmed values do not fall within any
valid resource ranges of the immediately upstream PCI-to-PCI Bridge's MMIO
apertures.


As for "symptomatic" solutions (just a band-aid to treat the symptom and
not addressing the root cause) ...

Short of getting the platform's BIOS updated to appropriately account for
the device's total needs, a "compromized" solution has been to get them to
program device's Expansion ROM BAR values with '0'.  This has been
done in the past so why this platform's BIOS engineers have chosen not
to do that
again in this instance is "out of character" and concerning.  If, and only
if, a device's Expansion ROM BAR is programmed with '0', then adding the
"norom" kernel boot parameter will cause the kernel to ignore, and not
attempt to assign resources to, such.

Short of that, drivers can use, and check the return value of,
pci_enable_rom().  That should fail if it's unassigned.  Looking at it, it
only fails if 'flags == 0' so I'm not sure that catches all cases of it
being unassigned.


[1] For a device's normal BARs - the BARs corresponding to the PCI
    specification's "Base Address 0 through 5" Type 0 configuration header
    space entries - that are initially ill programmed and the kernel can
    not subsequently assign appropriate resources for such, then the
    kernel's PCI core subsystem's 'pci_enable_device()' routine should
    fail.

[2] While the analysis only covers one specific device, the 'dmesg' log
    shows that the same base root cause occurs in at least two additional
    instances.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] PCI: Unassigned Expansion ROM BARs
@ 2015-09-26  6:49 Daniel J Blueman
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel J Blueman @ 2015-09-26  6:49 UTC (permalink / raw)
  To: Myron Stowe; +Cc: Yinghai Lu, Alex Williamson, Bjorn Helgaas, Linux Kernel

On Thursday, September 24, 2015 at 10:50:07 AM UTC+8, Myron Stowe wrote:
> I've encountered numerous bugzilla reports related to platform BIOS' not
> programming valid values into a PCI device's Type 0 Configuration space
> "Expansion ROM Base Address" field (a.k.a. Expansion ROM BAR).  The main
> observed consequence being 'dmesg' entries like the following that get
> customers excited enough to file reports against the kernel.

PCI option ROMs legitimately hold real-mode/EFI code needed to
initialise devices; the problem is, we can't guarantee that the BIOS
has initialised all devices with the option ROM code, so linux must
ensure they are correctly accessible.

In addition to VMs as Alex points out, hotplug (eg Thunderbold GPUs)
and PCI domains which may not be visible to the BIOS at early boot,
may need the option ROM. Nvidia GPUs primarily have had a lot of
encoder/connector (HDCP?) and product-specific voltage-frequency setup
code and tables in the ROM.

As such, in my NumaConnect open firmware which maps the PCI domains of
multiple servers into one, I have to also reallocate PCI option ROMs
[1] to guarantee GPU VBIOS execution in linux. That said, option ROMs
are a dying trend in favour of shipped binary blobs and open-coded
initialisation for cross-platform support, and there are only 10 users
of pci_map_rom().

Thanks,
  Daniel

[1] https://github.com/numascale/nc-utils/blob/master/bootloader/dnc-mmio.c
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-09-01 21:14 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-24  2:47 [RFC] PCI: Unassigned Expansion ROM BARs Myron Stowe
2015-09-24  3:21 ` Yinghai Lu
2015-09-24  3:58   ` Alex Williamson
2015-09-24 17:06   ` Myron Stowe
2015-09-24 19:01     ` Yinghai Lu
2015-09-25  4:35       ` Yinghai Lu
2015-09-25 13:31         ` Myron Stowe
2015-09-25 17:02           ` Myron Stowe
2015-09-25 14:35         ` Bjorn Helgaas
2015-09-25 16:18           ` Alex Williamson
2015-09-26  0:04             ` Yinghai Lu
2015-09-27 19:29 ` Myron Stowe
2015-09-27 20:19 ` Myron Stowe
2016-09-01 19:16   ` Myron Stowe
2015-09-26  6:49 Daniel J Blueman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.