linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] ARM64 PCI resource survey issue(s)
@ 2019-06-03 23:41 Benjamin Herrenschmidt
  2019-06-04  1:49 ` Bjorn Helgaas
  0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-03 23:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Sinan Kaya, Lorenzo Pieralisi, bhelgaas, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

Hi Folks !

I'd like to revive the discussion around Ard's old patch:

https://patchwork.kernel.org/patch/9675707/

We (Amazon) need that sorted one way or another ASAP since we have
setups coming where we must not let Linux change the FW assignments
under some host bridges.

In fact it's a reasonable thing to require for other reasons. The EFI
framebuffer is an example, there can be others where FW/ACPI/EL3 etc...
might have assumptions based on where some system devices were located
by the boot FW and will break if we move them (such things are common
on x86 and powerpc).

Taking a step back I think (and I suspect we generally agree based on
followup discussions I've seen) that the "right" thing to do is to have
our default behaviour be:

   - Claim what the FW established if it's not obviously broken

   - Call pci_assign_unassigned_resources() to assign what the FW
didn't assign

Which is more or less what powerpc and x86 do today, but using a
different mechanism than what's in pci_bus_claim_resources() (they are
similar to each other, I wrote the current powerpc one loosely based on
the x86 one at the time). That leads to a side question (which we
should probably discuss in a separate thread) of whether we want to
consolidate all that.

That said, we also know this is going to break *some* existing
platforms that have known broken FW assignment. The question is then
can we sufficiently detect the breakage and re-assign in those cases
(like x86 does fairly successfully in a number of cases), or do we need
to add some board/platform quirks to force the full re-assigment on
known broken platforms ?

Even if all arm64 platforms are found to be broken today, I would still
like to have our default be to use the FW setup, if anything as an
incentive for newer platforms to get it right. At the very least on
ACPI.

We can use DSM#5 when it exists to force one way or another (ideally on
a per bus basis but that's harder, so let's start with host bridges
maybe ?)

Thoughts ? I'm happy to do some of the work here. We have an emergency
to get the AZ case solved, and Ard old patch does that...

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] ARM64 PCI resource survey issue(s)
  2019-06-03 23:41 [RFC] ARM64 PCI resource survey issue(s) Benjamin Herrenschmidt
@ 2019-06-04  1:49 ` Bjorn Helgaas
  2019-06-04  3:32   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 27+ messages in thread
From: Bjorn Helgaas @ 2019-06-04  1:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Tue, Jun 04, 2019 at 09:41:16AM +1000, Benjamin Herrenschmidt wrote:
> Hi Folks !
> 
> I'd like to revive the discussion around Ard's old patch:
> 
> https://patchwork.kernel.org/patch/9675707/
> 
> We (Amazon) need that sorted one way or another ASAP since we have
> setups coming where we must not let Linux change the FW assignments
> under some host bridges.
> 
> In fact it's a reasonable thing to require for other reasons. The EFI
> framebuffer is an example, there can be others where FW/ACPI/EL3 etc...
> might have assumptions based on where some system devices were located
> by the boot FW and will break if we move them (such things are common
> on x86 and powerpc).

I would like to handle these individual devices that cannot be moved
the same way we handle legacy (IDE, VGA) devices, i.e., mark the BARs
with IORESOURCE_IO_FIXED.

This could be done with either Enhanced Allocation capabilities or via
ACPI _DSM function #5.  My preference would be to do this at the
lowest possible level of the PCI hierarchy.  IIRC, EA can do it for
individual BARs, and _DSM can be supplied for any individual device
(or bridge, but I'd prefer to do it on the device because that gives
us more information about exactly what needs to be preserved).

Of course, _DSM *can* be higher, e.g., at the host bridge, but then we
lose the information about what specifically must be immutable, and
that means the OS cannot ever move *anything*, even if it becomes
capable of moving things around to accommodate hot-added devices.

I'm not aware of anything in DT that would correspond to DSM #5, but
it could be added.

> Taking a step back I think (and I suspect we generally agree based on
> followup discussions I've seen) that the "right" thing to do is to have
> our default behaviour be:
> 
>    - Claim what the FW established if it's not obviously broken
> 
>    - Call pci_assign_unassigned_resources() to assign what the FW
> didn't assign
> 
> Which is more or less what powerpc and x86 do today, but using a
> different mechanism than what's in pci_bus_claim_resources() (they are
> similar to each other, I wrote the current powerpc one loosely based on
> the x86 one at the time). That leads to a side question (which we
> should probably discuss in a separate thread) of whether we want to
> consolidate all that.
> 
> That said, we also know this is going to break *some* existing
> platforms that have known broken FW assignment. The question is then
> can we sufficiently detect the breakage and re-assign in those cases
> (like x86 does fairly successfully in a number of cases), or do we need
> to add some board/platform quirks to force the full re-assigment on
> known broken platforms ?

I don't know how to parse this.  What does "known broken FW
assignment" mean?  Are you saying the assignment from FW *looks* valid
(all BARs contain valid addresses and are inside windows of upstream
bridges), but it doesn't work for some reason?  If that's the case,
how would full reassignment by Linux fix anything?  Linux has no idea
how to change a valid-looking assignment to make an actually-valid
assignment.

> Even if all arm64 platforms are found to be broken today, I would still
> like to have our default be to use the FW setup, if anything as an
> incentive for newer platforms to get it right. At the very least on
> ACPI.

I agree that Linux should not change anything unless it needs to.  Of
course, reasons it "needs to" might include allocating more space for
hot-added devices, either because Linux discovered an open slot or
because a user requested more space with a kernel parameter.

> We can use DSM#5 when it exists to force one way or another (ideally on
> a per bus basis but that's harder, so let's start with host bridges
> maybe ?)

I'd prefer starting with endpoints, but I think it will all work out
the same in the end.  When enumerating X, we look for a local _DSM #5
and mark X's BARs/windows accordingly and set a pdev->fixed_resources
bit.  If there's no local _DSM #5, act based on X's parent's
fixed_resources bit.

Bjorn

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] ARM64 PCI resource survey issue(s)
  2019-06-04  1:49 ` Bjorn Helgaas
@ 2019-06-04  3:32   ` Benjamin Herrenschmidt
  2019-06-04  3:37     ` Benjamin Herrenschmidt
                       ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-04  3:32 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Mon, 2019-06-03 at 20:49 -0500, Bjorn Helgaas wrote:
> On Tue, Jun 04, 2019 at 09:41:16AM +1000, Benjamin Herrenschmidt wrote:
> > Hi Folks !
> > 
> > I'd like to revive the discussion around Ard's old patch:
> > 
> > https://patchwork.kernel.org/patch/9675707/
> > 
> > We (Amazon) need that sorted one way or another ASAP since we have
> > setups coming where we must not let Linux change the FW assignments
> > under some host bridges.
> > 
> > In fact it's a reasonable thing to require for other reasons. The EFI
> > framebuffer is an example, there can be others where FW/ACPI/EL3 etc...
> > might have assumptions based on where some system devices were located
> > by the boot FW and will break if we move them (such things are common
> > on x86 and powerpc).
> 
> I would like to handle these individual devices that cannot be moved
> the same way we handle legacy (IDE, VGA) devices, i.e., mark the BARs
> with IORESOURCE_IO_FIXED.

A bit more messy but doable. However....

> This could be done with either Enhanced Allocation capabilities or
via
> ACPI _DSM function #5.  My preference would be to do this at the
> lowest possible level of the PCI hierarchy.
>  IIRC, EA can do it for
> individual BARs, and _DSM can be supplied for any individual device
> (or bridge, but I'd prefer to do it on the device because that gives
> us more information about exactly what needs to be preserved).

_DSM #5 seems to be working the other way around, it tells us to ignore
the FW setting. So the "intent" here is that unless that things is
present and says "1", we should just leave things alone as long as they
don't conflict.

What you seem to want to do is to go a step beyond and if present and
0, force everything to be fixed. I'm not completely comfortable with
that approach. Let's see what others think.

As fas as bridges vs. individual endpoints, It might have to handle
both cases. For example in our case, the security policies will prevent
changing the switch windows completely.

> Of course, _DSM *can* be higher, e.g., at the host bridge, but then we
> lose the information about what specifically must be immutable, and
> that means the OS cannot ever move *anything*, even if it becomes
> capable of moving things around to accommodate hot-added devices.

Well, in our case at least this is a non-issue, we don't want the OS to
move anything or change anything and there is no hotplug.

That said, the two aren't exclusive. The presence at the host bridge
level can be honored, and if absent, we can also honor at a finer
granularity.

However, as I said above, I'm not completely comfortable with treating
_DSM #5 = 0 as meaning "must be fixed". This is not what it means.

> I'm not aware of anything in DT that would correspond to DSM #5, but
> it could be added.

Yes, we could. On DT what we tend to do in those cases on powerpc and
sparc is to "manufacture" the pci_dev structures based on the info in
the DT, and only use config space to fill the remaining blanks. Let's
look at the ACPI issue for now though, we can handle DT later.

> > Taking a step back I think (and I suspect we generally agree based on
> > followup discussions I've seen) that the "right" thing to do is to have
> > our default behaviour be:
> > 
> >    - Claim what the FW established if it's not obviously broken
> > 
> >    - Call pci_assign_unassigned_resources() to assign what the FW
> > didn't assign
> > 
> > Which is more or less what powerpc and x86 do today, but using a
> > different mechanism than what's in pci_bus_claim_resources() (they are
> > similar to each other, I wrote the current powerpc one loosely based on
> > the x86 one at the time). That leads to a side question (which we
> > should probably discuss in a separate thread) of whether we want to
> > consolidate all that.
> > 
> > That said, we also know this is going to break *some* existing
> > platforms that have known broken FW assignment. The question is then
> > can we sufficiently detect the breakage and re-assign in those cases
> > (like x86 does fairly successfully in a number of cases), or do we need
> > to add some board/platform quirks to force the full re-assigment on
> > known broken platforms ?
> 
> I don't know how to parse this.  What does "known broken FW
> assignment" mean?  Are you saying the assignment from FW *looks* valid
> (all BARs contain valid addresses and are inside windows of upstream
> bridges), but it doesn't work for some reason? 

Yes... I am not personally aware of such a case but I was led to
believe based on prior conversations that such setups do exist,
especially in the non-ACPI ARM64 world. Which is why I would suggest
initially changing the default only on ACPI, at least until we have a
bit better visibility.

>  If that's the case,
> how would full reassignment by Linux fix anything?  Linux has no idea
> how to change a valid-looking assignment to make an actually-valid
> assignment.

Isn't it exactly what happens today though on arm64 ? We ignore
whatever the FW set, and assign everything anew... This is also what we
do on powerpc when the corresponding flag is set by the platform.

IE. Currently arm64 does:

	pci_bus_size_bridges(bus);
	pci_bus_assign_resources(bus);

Unconditionally... Or am I missing something ? That code is headache
inducing :)

There is no attempt at leaving things where they are. There is code to
avoid messing with IORESOURCE_IO_FIXED in there but I'm not sure how
well that works when it comes to dealing with bridge windows.

In comparison, x86 and powerpc have a different mechanism that first
surveys existing stuff, blasts away what looks bad/conflicting, and
then does something like pci_assign_unassigned_resources() to assign
whaetver's left.

If we look at Ard's patch, he wants to use pci_bus_claim_resources()
instead when the FW doesn't explicitely tells us to leave things alone.
I think that needs pci_assign_unassigned_resources() as well, though I
am not yet completely familiar with how pci_bus_claim_resources()
differs from the x86 and powerpc resource survey.

I think pci_bus_claim_resources() is intended for the "PROBE_ONLY" case
which is the extreme opposite: we *must* trust everything the FW did
and not try to touch anything at all. It would work for us too mind you
but we don't have a way to convey that via ACPI since, as I said above,
it's not really what _DSM #5 is intended to be.

We seem to enjoy creating mostly-duplicate ways of doing things over
and over again in PCI land :)

> > Even if all arm64 platforms are found to be broken today, I would still
> > like to have our default be to use the FW setup, if anything as an
> > incentive for newer platforms to get it right. At the very least on
> > ACPI.
> 
> I agree that Linux should not change anything unless it needs to.  Of
> course, reasons it "needs to" might include allocating more space for
> hot-added devices, either because Linux discovered an open slot or
> because a user requested more space with a kernel parameter.

Right and I'm fine with that. It's not an issue for us.

> > We can use DSM#5 when it exists to force one way or another (ideally on
> > a per bus basis but that's harder, so let's start with host bridges
> > maybe ?)
> 
> I'd prefer starting with endpoints, but I think it will all work out
> the same in the end.  When enumerating X, we look for a local _DSM #5
> and mark X's BARs/windows accordingly and set a pdev->fixed_resources
> bit.  If there's no local _DSM #5, act based on X's parent's
> fixed_resources bit.

Lorenzo, Zeev, have you already started looking at doing something this
way ? Does it work ?

Bjorn, we could do both, I don't see any problem there.

Do you see any reason why we shouldn't change the arm64 logic today, at
least when ACPI is present, to claim existing resources & assign
unassigned one instead of reallocating everything ?

And followup question, if the above is yes, will the sequence:

	pci_bus_claim_resources(b);
	if (!pci_has_flag(PCI_PROBE_ONLY))
		pci_assign_unassigned_resources(b);

Do what we want or do we need to replace pci_bus_claim_resources() with
something a bit more like what we have on x86 with the 2 pass
allocation mechanism ?

Finally what about pci_host_probe() ? It seems to also miss
pci_assign_unassigned_resources() and will unconditionally reassign
everything if PCI_PROBE_ONLY is not set, so it's yet another different
logic used by some archs.

Cheers,
Ben.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] ARM64 PCI resource survey issue(s)
  2019-06-04  3:32   ` Benjamin Herrenschmidt
@ 2019-06-04  3:37     ` Benjamin Herrenschmidt
  2019-06-04  6:56     ` Benjamin Herrenschmidt
  2019-06-04 12:49     ` Bjorn Helgaas
  2 siblings, 0 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-04  3:37 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Tue, 2019-06-04 at 13:32 +1000, Benjamin Herrenschmidt wrote:
> 
> > Of course, _DSM *can* be higher, e.g., at the host bridge, but then we
> > lose the information about what specifically must be immutable, and
> > that means the OS cannot ever move *anything*, even if it becomes
> > capable of moving things around to accommodate hot-added devices.
> 
> Well, in our case at least this is a non-issue, we don't want the OS to
> move anything or change anything and there is no hotplug.

Correction: There is hotplug in the leaf ports, but the FW will have
setup the switch with enough space already.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] ARM64 PCI resource survey issue(s)
  2019-06-04  3:32   ` Benjamin Herrenschmidt
  2019-06-04  3:37     ` Benjamin Herrenschmidt
@ 2019-06-04  6:56     ` Benjamin Herrenschmidt
  2019-06-04 12:49     ` Bjorn Helgaas
  2 siblings, 0 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-04  6:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Tue, 2019-06-04 at 13:32 +1000, Benjamin Herrenschmidt wrote:
> 
> > I would like to handle these individual devices that cannot be moved
> > the same way we handle legacy (IDE, VGA) devices, i.e., mark the BARs
> > with IORESOURCE_IO_FIXED.
> 
> A bit more messy but doable. However....

Sooo.... I spent some quality whisky getting my head around the current
state of setup-bus.c and setup-res.c ... gosh, what a mess ! Anyway, I
have some concerns about the use of IORESOURCE_IO_FIXED in the context
of an arch like arm64 that just blindly does:

        pci_bus_size_bridges(bus);
        pci_bus_assign_resources(bus);

Unless I'm missing something (please correct me if I am), this is all
extremely fragile and will only work under some very specific
circumstances:

First, the big issue is that having individual devices with such
"Fixed" BAR doesn't work in isolation. There is no chance in hell the
code in setup-bus.c will manage to configure the enclosing bridges
etc... to accomodate such fixed devices, it would require a major
refactoring of our entire resource allocation scheme.

pci_bus_size_bridges() does all the calculation for sizing up bridges
and ... completely ignores fixed BARs. Later pci_bus_assign_resources()
will assign bridges a location, again, completely ignoring children
fixed BARs if any.

Chances that we successfully land the fixed BARs in the resulting
allocation are slim at best.

So at the very least, to have a chance of working, if any endpoint has
fixed BARs, then all of it's parent chain must also be fixed, because
we won't be able to deal with it otherwise, at least not via the
"blunt" assignment code path.

But there seem to be more damage here (though that one maybe easier to
fix) from looking at the code: Unless I'm mistaken in my reading of the
code, for a given "level" of the bus tree, __pci_bus_assign_resources()
will *first* assign all the non-fixed devices by calling
pbus_assign_resources_sorted(), and *then* try to claim the fixed
resources of any device at that level using
pdev_assign_fixed_resources().

Again, even if we had the parent bridge by design (fixed too) or by
chance, covering the space where the fixed BAR is, chances that a
sibling non-fixed device will be given that slot before we have a
chance to claim it.

So in practice, IORESOURCE_IO_FIXED only works if all parents and
siblings are also IORESOURCE_IO_FIXED (either that or I just
misunderstood the code). The sibling case might be fixable by changing
the order inside __pci_bus_assign_resources().

There's more gunk too ... for example for IORESOURCE_IO_FIXED to work
on bridges one must call pci_read_bridge_bases() first which the
generic code doesn't do. So here comes an arch pcibios_fixup_bus ...
yuck.

There's also an oddball test about enabled bridges in
__pci_bus_assign_resources() that __pci_bridge_assign_resources()
doesn't have (yet another almost identical but not quite set of
functions just to confuse people) which might make snse

I looked at Lorenzo patches in his git repo for trying to deal with
_DSM #5 via IORESOURCE_IO_FIXED and I don't like what I see in that
context either :)

So at this stage, I think we are better off solving the immediate
issues of platforms that have good allocations coming from FW by using
Ard's old patch at the host brigde level (possibly changing the default
in absence of _DSM #5).

Taking on step back here and trying to think more long term, I am
concerned that we have how many different methods of doing the resource
allocation depending on the arch that go through different (sometime
majorly, sometimes subltly) path in the generic code, this doesn't seem
particularily maintainable...

It used to be that pci_bus_assign_resources() basically meant "force
reassign everything". If you throw IORESOURCE_IO_FIXED in the mix then
it's no longer that. It's now starting to look a bit more (but not
quite) like pci_assign_unassigned_bus_resources() except that we don't
have a generic 2-pass survey of existing resources like x86 has.

We do have pci_bus_claim_resources() but that one seems more targetted
as the opposite where we trust everything setup by FW and don't
reassign anything. It doesn't seem to have provisions for detecting
unsassigned resources unless I missed another bit here.

Then we have pci_host_probe() that some archs use (but not arm64),
which seems to be yet a different combination, which either trusts
firmware completely (pci_bus_claim_resources) or not at all
(pci_bus_size_bridges+pci_bus_assign_resources) based on a single
global flag (shouldn't it be per-host bridge ?).

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] ARM64 PCI resource survey issue(s)
  2019-06-04  3:32   ` Benjamin Herrenschmidt
  2019-06-04  3:37     ` Benjamin Herrenschmidt
  2019-06-04  6:56     ` Benjamin Herrenschmidt
@ 2019-06-04 12:49     ` Bjorn Helgaas
  2019-06-04 20:41       ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 27+ messages in thread
From: Bjorn Helgaas @ 2019-06-04 12:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Tue, Jun 04, 2019 at 01:32:11PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2019-06-03 at 20:49 -0500, Bjorn Helgaas wrote:

> > This could be done with either Enhanced Allocation capabilities or
> > via ACPI _DSM function #5.  My preference would be to do this at
> > the lowest possible level of the PCI hierarchy.  IIRC, EA can do
> > it for individual BARs, and _DSM can be supplied for any
> > individual device (or bridge, but I'd prefer to do it on the
> > device because that gives us more information about exactly what
> > needs to be preserved).
> 
> _DSM #5 seems to be working the other way around, it tells us to ignore
> the FW setting. So the "intent" here is that unless that things is
> present and says "1", we should just leave things alone as long as they
> don't conflict.

I wish you'd been involved in the recent effort to revise the _DSM #5
documentation.  The language in PCI FW r3.2 makes the implicit
assumption that by default, in the absence of _DSM, the OS must
preserve all window and BAR assignments.  But nobody has ever been
able to come up with a spec reference to support that assumption and I
think it is invalid.

The ECN under consideration ("Clarifications to _DSM Function 5",
March 26, 2019, currently posted for member review at
https://members.pcisig.com/wg/PCI-SIG-WG_Members/document/13014)
changes some of that language to basically say "if _DSM #5 exists and
returns 0, the OS must preserve the settings of this device and its
children; otherwise the OS is free to modify things."

> What you seem to want to do is to go a step beyond and if present and
> 0, force everything to be fixed. I'm not completely comfortable with
> that approach. Let's see what others think.

I'm not grasping the distinction you're making here.  What you
describe seems be what _DSM #5 requires.

> > Of course, _DSM *can* be higher, e.g., at the host bridge, but then we
> > lose the information about what specifically must be immutable, and
> > that means the OS cannot ever move *anything*, even if it becomes
> > capable of moving things around to accommodate hot-added devices.
> 
> Well, in our case at least this is a non-issue, we don't want the OS to
> move anything or change anything and there is no hotplug.
> 
> That said, the two aren't exclusive. The presence at the host bridge
> level can be honored, and if absent, we can also honor at a finer
> granularity.
> 
> However, as I said above, I'm not completely comfortable with treating
> _DSM #5 = 0 as meaning "must be fixed". This is not what it means.

The existing language in PCI FW r3.2 is "if _DSM #5 does not exist or
it exists and returns 0, the OS must not ignore PCI configuration done
by firmware."

> > I'm not aware of anything in DT that would correspond to DSM #5, but
> > it could be added.
> 
> Yes, we could. On DT what we tend to do in those cases on powerpc and
> sparc is to "manufacture" the pci_dev structures based on the info in
> the DT, and only use config space to fill the remaining blanks. Let's
> look at the ACPI issue for now though, we can handle DT later.
> 
> > > Taking a step back I think (and I suspect we generally agree based on
> > > followup discussions I've seen) that the "right" thing to do is to have
> > > our default behaviour be:
> > > 
> > >    - Claim what the FW established if it's not obviously broken
> > > 
> > >    - Call pci_assign_unassigned_resources() to assign what the FW
> > > didn't assign
> > > 
> > > Which is more or less what powerpc and x86 do today, but using a
> > > different mechanism than what's in pci_bus_claim_resources() (they are
> > > similar to each other, I wrote the current powerpc one loosely based on
> > > the x86 one at the time). That leads to a side question (which we
> > > should probably discuss in a separate thread) of whether we want to
> > > consolidate all that.
> > > 
> > > That said, we also know this is going to break *some* existing
> > > platforms that have known broken FW assignment. The question is then
> > > can we sufficiently detect the breakage and re-assign in those cases
> > > (like x86 does fairly successfully in a number of cases), or do we need
> > > to add some board/platform quirks to force the full re-assigment on
> > > known broken platforms ?
> > 
> > I don't know how to parse this.  What does "known broken FW
> > assignment" mean?  Are you saying the assignment from FW *looks* valid
> > (all BARs contain valid addresses and are inside windows of upstream
> > bridges), but it doesn't work for some reason? 
> 
> Yes... I am not personally aware of such a case but I was led to
> believe based on prior conversations that such setups do exist,
> especially in the non-ACPI ARM64 world. Which is why I would suggest
> initially changing the default only on ACPI, at least until we have a
> bit better visibility.

If a resource assignment that is valid in terms of all the PCI rules
(BARs are valid, BARs are inside upstream bridge windows, etc) doesn't
work, we would need more information in order to fix anything.  We'd
need to know exactly *what* doesn't work and *why* so we can avoid it.
The current blanket statement of "reassign everything and hope it
works better" is useless.

Bjorn

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] ARM64 PCI resource survey issue(s)
  2019-06-04 12:49     ` Bjorn Helgaas
@ 2019-06-04 20:41       ` Benjamin Herrenschmidt
  2019-06-06  9:00         ` [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt
  2019-06-10 10:11         ` [RFC] ARM64 PCI resource survey issue(s) Lorenzo Pieralisi
  0 siblings, 2 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-04 20:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Tue, 2019-06-04 at 07:49 -0500, Bjorn Helgaas wrote:
> > Yes... I am not personally aware of such a case but I was led to
> > believe based on prior conversations that such setups do exist,
> > especially in the non-ACPI ARM64 world. Which is why I would suggest
> > initially changing the default only on ACPI, at least until we have a
> > bit better visibility.
> 
> If a resource assignment that is valid in terms of all the PCI rules
> (BARs are valid, BARs are inside upstream bridge windows, etc) doesn't
> work, we would need more information in order to fix anything.  We'd
> need to know exactly *what* doesn't work and *why* so we can avoid it.
> The current blanket statement of "reassign everything and hope it
> works better" is useless.

I agree and I assume the problem stems from BIOSes creating either
invalid or incomplete assignments but as I said, I don't know for sure
as our platforms dont have that problem.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-04 20:41       ` Benjamin Herrenschmidt
@ 2019-06-06  9:00         ` Benjamin Herrenschmidt
  2019-06-06  9:13           ` Ard Biesheuvel
                             ` (2 more replies)
  2019-06-10 10:11         ` [RFC] ARM64 PCI resource survey issue(s) Lorenzo Pieralisi
  1 sibling, 3 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-06  9:00 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

From: Ard Biesheuvel <ard.biesheuvel@linaro.org>

On arm64 ACPI systems, we unconditionally reconfigure the entire PCI
hierarchy at boot. This is a departure from what is customary on ACPI
systems, and may break assumptions in some places (e.g., EFIFB), that
the kernel will leave BARs of enabled PCI devices where they are.

Given that PCI already specifies a device specific ACPI method (_DSM)
for PCI root bridge nodes that tells us whether the firmware thinks
the configuration should be left alone, let's sidestep the entire
policy debate about whether the PCI configuration should be preserved
or not, and put it under the control of the firmware instead.

[BenH: Added pci_assign_unassigned_root_bus_resources()]

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---

So I would like this variant rather than mucking around with
IORESOURCE_PCI_FIXED at this stage to fix the problem with our platforms.

See my other email, IORESOURCE_PCI_FIXED doesn't really work terribly well
when using pci_bus_size_bridges and pci_bus_assign_resources, and the
resulting patches are ugly and add more mess.

Long run, I propose to start working on consolidating all those various
resource survey mechanisms around what x86 does, unless people strongly
object... (with the addition of the probe only and force reassign quirks
so platforms can still chose that).

Note: I haven't tested the effect of pci_assign_unassigned_root_bus_resources
as our platforms don't leave anything unassigned. I'm not entirely sure how
well pci_bus_claim_resources() will deal with a partially assigned setup...

We do want to support partial assignment by BIOS though, it's a trend to
reduce boot time, people seem to want BIOSes to only assign what's critical
for booting.

Bjorn: I haven't made the claim path the default in absence of _DSM #5 yet.
I suggest we do that as a separate patch in case it breaks somebody, thus
making bisection more meaningful. It will also make this one more palatable
to distros since it won't change the behaviour on systems without _DSM #5,
and we verified nobody has it except Seattle which returns 1. 

 arch/arm64/kernel/pci.c  | 23 +++++++++++++++++++++--
 include/linux/pci-acpi.h |  7 ++++---
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index bb85e2f4603f..6358e1cb4f9f 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -168,6 +168,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 	struct acpi_pci_generic_root_info *ri;
 	struct pci_bus *bus, *child;
 	struct acpi_pci_root_ops *root_ops;
+	union acpi_object *obj;
 
 	ri = kzalloc(sizeof(*ri), GFP_KERNEL);
 	if (!ri)
@@ -193,8 +194,26 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 	if (!bus)
 		return NULL;
 
-	pci_bus_size_bridges(bus);
-	pci_bus_assign_resources(bus);
+	/*
+	 * Invoke the PCI device specific method (_DSM) #5 'Ignore PCI Boot
+	 * Configuration', which tells us whether the firmware wants us to
+	 * preserve the configuration of the PCI resource tree for this root
+	 * bridge.
+	 */
+	obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
+	                        IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
+	if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) {
+		/* preserve existing resource assignment */
+		pci_bus_claim_resources(bus);
+
+		/* Assign anything that might have been left out */
+		pci_assign_unassigned_root_bus_resources(bus);
+	} else {
+		/* reconfigure the resource tree from scratch */
+		pci_bus_size_bridges(bus);
+		pci_bus_assign_resources(bus);
+	}
+	ACPI_FREE(obj);
 
 	list_for_each_entry(child, &bus->children, node)
 		pcie_bus_configure_settings(child);
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 8082b612f561..62b7fdcc661c 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { }
 #endif
 
 extern const guid_t pci_acpi_dsm_guid;
-#define DEVICE_LABEL_DSM	0x07
-#define RESET_DELAY_DSM		0x08
-#define FUNCTION_DELAY_DSM	0x09
+#define IGNORE_PCI_BOOT_CONFIG_DSM	0x05
+#define DEVICE_LABEL_DSM		0x07
+#define RESET_DELAY_DSM			0x08
+#define FUNCTION_DELAY_DSM		0x09
 
 #else	/* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-06  9:00         ` [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt
@ 2019-06-06  9:13           ` Ard Biesheuvel
  2019-06-06 10:55             ` Benjamin Herrenschmidt
  2019-06-11 14:58           ` Lorenzo Pieralisi
  2019-06-11 23:39           ` Bjorn Helgaas
  2 siblings, 1 reply; 27+ messages in thread
From: Ard Biesheuvel @ 2019-06-06  9:13 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Bjorn Helgaas, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Thu, 6 Jun 2019 at 11:00, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> On arm64 ACPI systems, we unconditionally reconfigure the entire PCI
> hierarchy at boot. This is a departure from what is customary on ACPI
> systems, and may break assumptions in some places (e.g., EFIFB), that
> the kernel will leave BARs of enabled PCI devices where they are.
>
> Given that PCI already specifies a device specific ACPI method (_DSM)
> for PCI root bridge nodes that tells us whether the firmware thinks
> the configuration should be left alone, let's sidestep the entire
> policy debate about whether the PCI configuration should be preserved
> or not, and put it under the control of the firmware instead.
>
> [BenH: Added pci_assign_unassigned_root_bus_resources()]
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>
> So I would like this variant rather than mucking around with
> IORESOURCE_PCI_FIXED at this stage to fix the problem with our platforms.
>
> See my other email, IORESOURCE_PCI_FIXED doesn't really work terribly well
> when using pci_bus_size_bridges and pci_bus_assign_resources, and the
> resulting patches are ugly and add more mess.
>
> Long run, I propose to start working on consolidating all those various
> resource survey mechanisms around what x86 does, unless people strongly
> object... (with the addition of the probe only and force reassign quirks
> so platforms can still chose that).
>
> Note: I haven't tested the effect of pci_assign_unassigned_root_bus_resources
> as our platforms don't leave anything unassigned. I'm not entirely sure how
> well pci_bus_claim_resources() will deal with a partially assigned setup...
>
> We do want to support partial assignment by BIOS though, it's a trend to
> reduce boot time, people seem to want BIOSes to only assign what's critical
> for booting.
>
> Bjorn: I haven't made the claim path the default in absence of _DSM #5 yet.
> I suggest we do that as a separate patch in case it breaks somebody, thus
> making bisection more meaningful. It will also make this one more palatable
> to distros since it won't change the behaviour on systems without _DSM #5,
> and we verified nobody has it except Seattle which returns 1.
>

FYI Seattle is broken in any case since it returns Package(1) rather
than just an int.

The problem with this patch is that currently, the PCI fw spec permits
_DSM #5 everywhere *except* on the host bridge device object itself,
and this is in the process of being changed.

I will leave it up to the maintainers to decide whether we can take
this patch in anticipation of that, even though it doesn't deal with
_DSM #5 on nodes anywhere else in the PCIe tree.

>  arch/arm64/kernel/pci.c  | 23 +++++++++++++++++++++--
>  include/linux/pci-acpi.h |  7 ++++---
>  2 files changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
> index bb85e2f4603f..6358e1cb4f9f 100644
> --- a/arch/arm64/kernel/pci.c
> +++ b/arch/arm64/kernel/pci.c
> @@ -168,6 +168,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
>         struct acpi_pci_generic_root_info *ri;
>         struct pci_bus *bus, *child;
>         struct acpi_pci_root_ops *root_ops;
> +       union acpi_object *obj;
>
>         ri = kzalloc(sizeof(*ri), GFP_KERNEL);
>         if (!ri)
> @@ -193,8 +194,26 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
>         if (!bus)
>                 return NULL;
>
> -       pci_bus_size_bridges(bus);
> -       pci_bus_assign_resources(bus);
> +       /*
> +        * Invoke the PCI device specific method (_DSM) #5 'Ignore PCI Boot
> +        * Configuration', which tells us whether the firmware wants us to
> +        * preserve the configuration of the PCI resource tree for this root
> +        * bridge.
> +        */
> +       obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
> +                               IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
> +       if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) {
> +               /* preserve existing resource assignment */
> +               pci_bus_claim_resources(bus);
> +
> +               /* Assign anything that might have been left out */
> +               pci_assign_unassigned_root_bus_resources(bus);
> +       } else {
> +               /* reconfigure the resource tree from scratch */
> +               pci_bus_size_bridges(bus);
> +               pci_bus_assign_resources(bus);
> +       }
> +       ACPI_FREE(obj);
>
>         list_for_each_entry(child, &bus->children, node)
>                 pcie_bus_configure_settings(child);
> diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
> index 8082b612f561..62b7fdcc661c 100644
> --- a/include/linux/pci-acpi.h
> +++ b/include/linux/pci-acpi.h
> @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { }
>  #endif
>
>  extern const guid_t pci_acpi_dsm_guid;
> -#define DEVICE_LABEL_DSM       0x07
> -#define RESET_DELAY_DSM                0x08
> -#define FUNCTION_DELAY_DSM     0x09
> +#define IGNORE_PCI_BOOT_CONFIG_DSM     0x05
> +#define DEVICE_LABEL_DSM               0x07
> +#define RESET_DELAY_DSM                        0x08
> +#define FUNCTION_DELAY_DSM             0x09
>
>  #else  /* CONFIG_ACPI */
>  static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-06  9:13           ` Ard Biesheuvel
@ 2019-06-06 10:55             ` Benjamin Herrenschmidt
  2019-06-11 14:31               ` Lorenzo Pieralisi
  0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-06 10:55 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Bjorn Helgaas, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Thu, 2019-06-06 at 11:13 +0200, Ard Biesheuvel wrote:
> > Bjorn: I haven't made the claim path the default in absence of _DSM #5 yet.
> > I suggest we do that as a separate patch in case it breaks somebody, thus
> > making bisection more meaningful. It will also make this one more palatable
> > to distros since it won't change the behaviour on systems without _DSM #5,
> > and we verified nobody has it except Seattle which returns 1.
> > 
> 
> FYI Seattle is broken in any case since it returns Package(1) rather
> than just an int.

Great .... not. Do we care ?

> The problem with this patch is that currently, the PCI fw spec permits
> _DSM #5 everywhere *except* on the host bridge device object itself,
> and this is in the process of being changed.

Yes, I'm indirectly aware of that :)

> I will leave it up to the maintainers to decide whether we can take
> this patch in anticipation of that, even though it doesn't deal with
> _DSM #5 on nodes anywhere else in the PCIe tree.

Right, so the problem at this point is that dealing with it elsewhere
in the tree is very fragile and problematic (see my other messages).
Doing it at the host bridge level fixes the immediate problem for us
(provided we are ok anticipating the spec update), and doesn't preclude
also honoring it for individual devices later on.

My thinking is if we converge everybody toward the x86 method of doing
a 2 pass survey of existing resources followed by assign_unassigned,
and have that the main generic code path (with added quirks to force a
full assignment and keeping probe_only around but that's easy, we have
that on powerpc and our code is originally based on the x86 one), then
we'll have a much easier time supporting IORESOURCE_PCI_FIXED on
portions of the tree as well (though it also becomes less critical to
do so since we will no longer reallocate unless we have to).

That said we need to understand what "fixed" means and why we do it.

IE, If an endpoint somehere has "fixed" BARs for example, that means
all parent bridge must be setup to enclose that range.

Now our allocator for bridge windows cannot handle that and probably
never will, so we have to rely on the existing window established by
the FW being reasonable and use it. We can still *extend" bridge
windows (and we have code to do that) if necessary but we cannot move
them if they contain a fixed BAR device.

There is a much bigger discussion to be had around that concept of
fixed device anyway, maybe at Plumbers ? Why is the BAR fixed ? Because
the EFI FB is on it ? Because HW bugs ? Because FW might access it from
SMM or ARM equivalent ? Because ACPI will poke at it based on its
initial address ? etc...

Some of the answers to the above questions imply more than the need to
fix the BAR: Does it also mean that disabling access to that BAR, even
temporarily, isn't safe ? However that's what we do today when we
probe, if anything, to do the BAR sizing...

This isn't a new problem. We had issues like that dating back 15 years
on powerpc for example, where a big ASIC hanging off PCI had all the
Apple gunk including the interrupt controller, which was initialized
from the DT way before PCI probing. If you took an interrupt at the
"wrong" time during BAR sizing, kaboom ! If you had debug printk's in
the wrong place in the PCI probing code, kaboom ! etc....

If we want to solve that properly in the long run, we'll probably want
ACPI to tell us the BAR sizes and use that instead of doing manual
sizing on such "system" devices. We similarily have ways to "construct"
pci_dev's from the OF tree on sparc64 and powerpc, limiting direct
config access to populate stuff we can't get from FW.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] ARM64 PCI resource survey issue(s)
  2019-06-04 20:41       ` Benjamin Herrenschmidt
  2019-06-06  9:00         ` [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt
@ 2019-06-10 10:11         ` Lorenzo Pieralisi
  2019-06-11  5:46           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 27+ messages in thread
From: Lorenzo Pieralisi @ 2019-06-10 10:11 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Bjorn Helgaas, Ard Biesheuvel, Sinan Kaya, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Wed, Jun 05, 2019 at 06:41:16AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2019-06-04 at 07:49 -0500, Bjorn Helgaas wrote:
> > > Yes... I am not personally aware of such a case but I was led to
> > > believe based on prior conversations that such setups do exist,
> > > especially in the non-ACPI ARM64 world. Which is why I would suggest
> > > initially changing the default only on ACPI, at least until we have a
> > > bit better visibility.
> > 
> > If a resource assignment that is valid in terms of all the PCI rules
> > (BARs are valid, BARs are inside upstream bridge windows, etc) doesn't
> > work, we would need more information in order to fix anything.  We'd
> > need to know exactly *what* doesn't work and *why* so we can avoid it.
> > The current blanket statement of "reassign everything and hope it
> > works better" is useless.
> 
> I agree and I assume the problem stems from BIOSes creating either
> invalid or incomplete assignments but as I said, I don't know for sure
> as our platforms dont have that problem.

The first thing that I would like to agree on is what mechanism the
kernel has to ensure a BAR resource is not changed by the
PCI resource assignment mechanism.

(1) IORESOURCE_PCI_FIXED flag: I have *very* strong feelings that this
    flag works because x86 kernel code claims all resources on probe
    (which effectively makes them immutable). It does not work for
    PCI bridges apertures and I am not sure it works for arches (eg
    ARM64) that reassign everything, even for *devices*
(2) Claiming resources: this basically means assigning a parent to
    a resource.

Kernel PCI resource assignment code is full of code I do not understand
and that in my opinion works because of (1). I *tried*, while posting
ACPI PCI code for ARM64, to do what x86 does:

- Claim all resources
- Reassign the resources that could not be claimed

This led to:

a) Spurious dmesg logs (Eg Resources that could not be claimed)
b) If FW set-up bridge windows, it may lead to reassignment failures
   if the bridge windows were sized *wrongly* but the kernel still
   manage to claim them (because they fit in the parent window).

Hence, we reassign everything on ARM64 on ACPI (except for bus numbers
and that was, indeed, a mistake).

In short, this is a mess and one that I do not have much leeway to fix
because the PCI resource assignment code is as it is owing to legacy
that we can't touch unless we want to fix regressions for the next
year.

And then there is _DSM #5.

The problem we have is that there is *no* specific way to tell what's
right or wrong and that's what _DSM #5 is supposed to fix. To implement
it to the letter it will take lots of work.

First thing is to answer and agree (1) vs (2) above, aka define what
an immutable resource is from a kernel point of view, one with a
parent != NULL OR one with IORESOURCE_PCI_FIXED flag.

Lorenzo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] ARM64 PCI resource survey issue(s)
  2019-06-10 10:11         ` [RFC] ARM64 PCI resource survey issue(s) Lorenzo Pieralisi
@ 2019-06-11  5:46           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-11  5:46 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Bjorn Helgaas, Ard Biesheuvel, Sinan Kaya, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Mon, 2019-06-10 at 11:11 +0100, Lorenzo Pieralisi wrote:
> 
> > I agree and I assume the problem stems from BIOSes creating either
> > invalid or incomplete assignments but as I said, I don't know for sure
> > as our platforms dont have that problem.
> 
> The first thing that I would like to agree on is what mechanism the
> kernel has to ensure a BAR resource is not changed by the
> PCI resource assignment mechanism.
> 
> (1) IORESOURCE_PCI_FIXED flag: I have *very* strong feelings that this
>     flag works because x86 kernel code claims all resources on probe
>     (which effectively makes them immutable). It does not work for
>     PCI bridges apertures and I am not sure it works for arches (eg
>     ARM64) that reassign everything, even for *devices*

I'm pretty sure this flag generally doesn't work indeed, see my other
messages on that subject. The only valid use I can see for this flag
honestly is in a context where we do an x86-style claim pass, to
prevent selected resources or bridge windows from being moved or
resized due to conflicts (ie: move the *other* one if there's a
conflict) but it's of very limited value.

> (2) Claiming resources: this basically means assigning a parent to
>     a resource.

Well yes, a claimed resource is immutable, that's a given.

That and of course it's implied that there are no conflicts between
claimed resources and/or the resources have valid existing assigned
values. From my quick look last week, it doesn't seem like
pci_bus_claim_resources() handle those edge cases terribly well.

This is why I'm keen on trying to standardize around the x86 (and
powerpc which is similar) method which deals with these partial
assignments and some broken cases much better.

> Kernel PCI resource assignment code is full of code I do not understand
> and that in my opinion works because of (1). I *tried*, while posting
> ACPI PCI code for ARM64, to do what x86 does:

:-)

> - Claim all resources
> - Reassign the resources that could not be claimed
> 
> This led to:
> 
> a) Spurious dmesg logs (Eg Resources that could not be claimed)
> b) If FW set-up bridge windows, it may lead to reassignment failures
>    if the bridge windows were sized *wrongly* but the kernel still
>    manage to claim them (because they fit in the parent window).
> 
> Hence, we reassign everything on ARM64 on ACPI (except for bus numbers
> and that was, indeed, a mistake).

Hrm... do you have example systems or the ability to give me access to
systems where these problems occur ? The x86 code has some smarts to
deal with BIOSes who didn't quite do the right thing that we might be
able to port over. What I called the "2 pass survey followed by
pci_assign_unassigned_resources". I've implemented something similar
back in the days for powerpc and it served us well there too. I'd like
to give a shot and making this generic and using it on arm64.

> In short, this is a mess and one that I do not have much leeway to fix
> because the PCI resource assignment code is as it is owing to legacy
> that we can't touch unless we want to fix regressions for the next
> year.

Yes and no... We can try to make the x86 method work on arm64 and see
what goes right or wrong. I wouldn't mind adding quirks to recognize
existing broken amd64 firmwares if those really still don't work
properly, if we can help getting future ones better as well.

> And then there is _DSM #5.
> 
> The problem we have is that there is *no* specific way to tell what's
> right or wrong and that's what _DSM #5 is supposed to fix. To implement
> it to the letter it will take lots of work.

Correct but I don't think we need fully "to the letter", it's almost
impossible to support all possible cases of _DSM #5 values at arbitrary
locations in the tree.

> First thing is to answer and agree (1) vs (2) above, aka define what
> an immutable resource is from a kernel point of view, one with a
> parent != NULL OR one with IORESOURCE_PCI_FIXED flag.

Well, I think (2) is a given. The issue however is that a resource
can't be "fixed" in isolation. It's parent need to be sufficiently
"fixed" to enclose that resource, and it's parent parent etc.... This
is why random _DSM #5 on leaf nodes isn't going to fly terribly well if
the FW hasn't already assigned sane windows to bridges.

I think the key here is to try to claim what FW setup iff it doesn't
conflict and looks like it's valid at all (assigned at all), using a
similar kind of 2 pass mechanism as x86 and powrpc that favors already
enabled devices (likely to be used by FW), and (re)assigns everything
else.

I can look at making x86 and powerpc common with reasonable ease as I
wrote a lot of the powerpc code for that. I don't have all the HW at
hand to test anymore but I still know who does :-)

For ARM64, the difficulty is all I have access to at the moment are
Amazon own systems which seem to have a reasonable FW assignment, and
qemu. So I'll need help with testing against the bad guys.

In the meantime, I'm still keen on getting in Ard patch that handles
_DSM #5 at the host bridge level. It will help existing valid use cases
and we are reasonably confident it won't break anything.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-06 10:55             ` Benjamin Herrenschmidt
@ 2019-06-11 14:31               ` Lorenzo Pieralisi
  2019-06-11 22:09                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 27+ messages in thread
From: Lorenzo Pieralisi @ 2019-06-11 14:31 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ard Biesheuvel, Bjorn Helgaas, Sinan Kaya, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Thu, Jun 06, 2019 at 08:55:07PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2019-06-06 at 11:13 +0200, Ard Biesheuvel wrote:
> > > Bjorn: I haven't made the claim path the default in absence of _DSM #5 yet.
> > > I suggest we do that as a separate patch in case it breaks somebody, thus
> > > making bisection more meaningful. It will also make this one more palatable
> > > to distros since it won't change the behaviour on systems without _DSM #5,
> > > and we verified nobody has it except Seattle which returns 1.
> > > 
> > 
> > FYI Seattle is broken in any case since it returns Package(1) rather
> > than just an int.
> 
> Great .... not. Do we care ?
> 
> > The problem with this patch is that currently, the PCI fw spec permits
> > _DSM #5 everywhere *except* on the host bridge device object itself,
> > and this is in the process of being changed.
> 
> Yes, I'm indirectly aware of that :)
> 
> > I will leave it up to the maintainers to decide whether we can take
> > this patch in anticipation of that, even though it doesn't deal with
> > _DSM #5 on nodes anywhere else in the PCIe tree.
> 
> Right, so the problem at this point is that dealing with it elsewhere
> in the tree is very fragile and problematic (see my other messages).
> Doing it at the host bridge level fixes the immediate problem for us
> (provided we are ok anticipating the spec update), and doesn't preclude
> also honoring it for individual devices later on.

True, minus specs update schedule, I can't change that and merging
this patch (and firmware thereof) relies on specifications that
are intent changes till they become an ECN (~another merge window,
so this patch could land at v5.4).

The other option is doing what this patch does *without* relying
on _DSM #5, we may have regressions unfortunately though.

It is kind of orthogonal (but not really), bus numbers assignment
is _not_ in line with resource assignment at the moment and I want
to change it.

Since ACPI on ARM64 is still at its inception maybe we should have
a stab at patching the kernel so that it reassigns bus numbers by
default and toggle that behaviour on _DSM #5 == 0 detection.

I doubt that reassigning bus numbers by default can trigger
regressions on existing platforms but the only way to figure
it out is by testing it.

> My thinking is if we converge everybody toward the x86 method of doing
> a 2 pass survey of existing resources followed by assign_unassigned,

I am not entirely sure we need a 2-pass survey,

pci_bus_claim_resources()

should be enough; if it is not we update it.

> and have that the main generic code path (with added quirks to force a
> full assignment and keeping probe_only around but that's easy, we have
> that on powerpc and our code is originally based on the x86 one), then
> we'll have a much easier time supporting IORESOURCE_PCI_FIXED on
> portions of the tree as well (though it also becomes less critical to
> do so since we will no longer reallocate unless we have to).
> 
> That said we need to understand what "fixed" means and why we do it.

Agree, totally and I want to make it clear how a BAR is fixed in
the kernel, there are too many discrepancies in the resource management
code already.

> IE, If an endpoint somehere has "fixed" BARs for example, that means
> all parent bridge must be setup to enclose that range.
> 
> Now our allocator for bridge windows cannot handle that and probably
> never will, so we have to rely on the existing window established by
> the FW being reasonable and use it. We can still *extend" bridge
> windows (and we have code to do that) if necessary but we cannot move
> them if they contain a fixed BAR device.
> 
> There is a much bigger discussion to be had around that concept of
> fixed device anyway, maybe at Plumbers ? Why is the BAR fixed ? Because
> the EFI FB is on it ? Because HW bugs ? Because FW might access it from
> SMM or ARM equivalent ? Because ACPI will poke at it based on its
> initial address ? etc...

Consider a slot booked at LPC PCI uconf for this discussion.

> Some of the answers to the above questions imply more than the need to
> fix the BAR: Does it also mean that disabling access to that BAR, even
> temporarily, isn't safe ? However that's what we do today when we
> probe, if anything, to do the BAR sizing...

Eh, another question that came up already should be debated.

> This isn't a new problem. We had issues like that dating back 15 years
> on powerpc for example, where a big ASIC hanging off PCI had all the
> Apple gunk including the interrupt controller, which was initialized
> from the DT way before PCI probing. If you took an interrupt at the
> "wrong" time during BAR sizing, kaboom ! If you had debug printk's in
> the wrong place in the PCI probing code, kaboom ! etc....
> 
> If we want to solve that properly in the long run, we'll probably want
> ACPI to tell us the BAR sizes and use that instead of doing manual
> sizing on such "system" devices. We similarily have ways to "construct"
> pci_dev's from the OF tree on sparc64 and powerpc, limiting direct
> config access to populate stuff we can't get from FW.

https://lore.kernel.org/linux-pci/20190121174225.15835-1-mr.nuke.me@gmail.com/

?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-06  9:00         ` [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt
  2019-06-06  9:13           ` Ard Biesheuvel
@ 2019-06-11 14:58           ` Lorenzo Pieralisi
  2019-06-11 22:19             ` Benjamin Herrenschmidt
  2019-06-11 23:39           ` Bjorn Helgaas
  2 siblings, 1 reply; 27+ messages in thread
From: Lorenzo Pieralisi @ 2019-06-11 14:58 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Bjorn Helgaas, Ard Biesheuvel, Sinan Kaya, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Thu, Jun 06, 2019 at 07:00:12PM +1000, Benjamin Herrenschmidt wrote:
> From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> 
> On arm64 ACPI systems, we unconditionally reconfigure the entire PCI
> hierarchy at boot. This is a departure from what is customary on ACPI
> systems, and may break assumptions in some places (e.g., EFIFB), that
> the kernel will leave BARs of enabled PCI devices where they are.
> 
> Given that PCI already specifies a device specific ACPI method (_DSM)
> for PCI root bridge nodes that tells us whether the firmware thinks
> the configuration should be left alone, let's sidestep the entire
> policy debate about whether the PCI configuration should be preserved
> or not, and put it under the control of the firmware instead.
> 
> [BenH: Added pci_assign_unassigned_root_bus_resources()]
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> 
> So I would like this variant rather than mucking around with
> IORESOURCE_PCI_FIXED at this stage to fix the problem with our platforms.
> 
> See my other email, IORESOURCE_PCI_FIXED doesn't really work terribly well
> when using pci_bus_size_bridges and pci_bus_assign_resources, and the
> resulting patches are ugly and add more mess.
> 
> Long run, I propose to start working on consolidating all those various
> resource survey mechanisms around what x86 does, unless people strongly
> object... (with the addition of the probe only and force reassign quirks
> so platforms can still chose that).
> 
> Note: I haven't tested the effect of pci_assign_unassigned_root_bus_resources
> as our platforms don't leave anything unassigned. I'm not entirely sure how
> well pci_bus_claim_resources() will deal with a partially assigned setup...
> 
> We do want to support partial assignment by BIOS though, it's a trend to
> reduce boot time, people seem to want BIOSes to only assign what's critical
> for booting.
> 
> Bjorn: I haven't made the claim path the default in absence of _DSM #5 yet.
> I suggest we do that as a separate patch in case it breaks somebody, thus
> making bisection more meaningful. It will also make this one more palatable
> to distros since it won't change the behaviour on systems without _DSM #5,
> and we verified nobody has it except Seattle which returns 1. 
> 
>  arch/arm64/kernel/pci.c  | 23 +++++++++++++++++++++--
>  include/linux/pci-acpi.h |  7 ++++---
>  2 files changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
> index bb85e2f4603f..6358e1cb4f9f 100644
> --- a/arch/arm64/kernel/pci.c
> +++ b/arch/arm64/kernel/pci.c
> @@ -168,6 +168,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
>  	struct acpi_pci_generic_root_info *ri;
>  	struct pci_bus *bus, *child;
>  	struct acpi_pci_root_ops *root_ops;
> +	union acpi_object *obj;
>  
>  	ri = kzalloc(sizeof(*ri), GFP_KERNEL);
>  	if (!ri)
> @@ -193,8 +194,26 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
>  	if (!bus)
>  		return NULL;
>  
> -	pci_bus_size_bridges(bus);
> -	pci_bus_assign_resources(bus);
> +	/*
> +	 * Invoke the PCI device specific method (_DSM) #5 'Ignore PCI Boot
> +	 * Configuration', which tells us whether the firmware wants us to
> +	 * preserve the configuration of the PCI resource tree for this root
> +	 * bridge.
> +	 */
> +	obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
> +	                        IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
> +	if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) {
> +		/* preserve existing resource assignment */
> +		pci_bus_claim_resources(bus);
> +
> +		/* Assign anything that might have been left out */
> +		pci_assign_unassigned_root_bus_resources(bus);
> +	} else {
> +		/* reconfigure the resource tree from scratch */
> +		pci_bus_size_bridges(bus);
> +		pci_bus_assign_resources(bus);
> +	}

	if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) {
		/* preserve existing resource assignment */
		pci_bus_claim_resources(bus);
	}

	pci_bus_size_bridges(bus);
	pci_bus_assign_resources(bus);

That's how it should be I think:

1) we do not want pci_assign_unassigned_root_bus_resources(bus) to
   reallocate resources already claimed (see realloc parameter), do we ?
2) pci_bus_size_bridges(bus) and pci_bus_assign_resources(bus) should
   not interfere with resources already claimed so it *should* be safe
   to call them anyway

Most importantly: I want everyone to agree that claiming is equivalent
to making a resource immutable (except for realloc, see (1) above)
because that's what we are doing by claiming on _DSM #5 == 0.

There are too many ways to make a resource immutable in the kernel
and this is confusing and prone to bugs.

Thanks,
Lorenzo

> +	ACPI_FREE(obj);
>  
>  	list_for_each_entry(child, &bus->children, node)
>  		pcie_bus_configure_settings(child);
> diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
> index 8082b612f561..62b7fdcc661c 100644
> --- a/include/linux/pci-acpi.h
> +++ b/include/linux/pci-acpi.h
> @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { }
>  #endif
>  
>  extern const guid_t pci_acpi_dsm_guid;
> -#define DEVICE_LABEL_DSM	0x07
> -#define RESET_DELAY_DSM		0x08
> -#define FUNCTION_DELAY_DSM	0x09
> +#define IGNORE_PCI_BOOT_CONFIG_DSM	0x05
> +#define DEVICE_LABEL_DSM		0x07
> +#define RESET_DELAY_DSM			0x08
> +#define FUNCTION_DELAY_DSM		0x09
>  
>  #else	/* CONFIG_ACPI */
>  static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-11 14:31               ` Lorenzo Pieralisi
@ 2019-06-11 22:09                 ` Benjamin Herrenschmidt
  2019-06-11 22:34                   ` Ard Biesheuvel
  2019-06-12 10:21                   ` Lorenzo Pieralisi
  0 siblings, 2 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-11 22:09 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Ard Biesheuvel, Bjorn Helgaas, Sinan Kaya, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Tue, 2019-06-11 at 15:31 +0100, Lorenzo Pieralisi wrote:
> 
> True, minus specs update schedule, I can't change that and merging
> this patch (and firmware thereof) relies on specifications that
> are intent changes till they become an ECN (~another merge window,
> so this patch could land at v5.4).

Hrm... annoying for us but I understand your reasoning.

> The other option is doing what this patch does *without* relying
> on _DSM #5, we may have regressions unfortunately though.

We could work around regressions with quirks I suppose. It does make
sense to assume that if you have ACPI and UEFI, you have a decent PCI
BAR assignment at boot in the "general case". That said, we need to
double check first that pci_bus_claim_resources() will not do horrible
things on partially assigned setups, since there's a real interest in
doing that in the field.

> It is kind of orthogonal (but not really), bus numbers assignment
> is _not_ in line with resource assignment at the moment and I want
> to change it.

Hrm. We should probably reassign bus numbers if we reassign resources
yes, but then I'd like us to not reassign resources unless we have to
:-)

> Since ACPI on ARM64 is still at its inception maybe we should have
> a stab at patching the kernel so that it reassigns bus numbers by
> default and toggle that behaviour on _DSM #5 == 0 detection.
> 
> I doubt that reassigning bus numbers by default can trigger
> regressions on existing platforms but the only way to figure
> it out is by testing it.
>
> > My thinking is if we converge everybody toward the x86 method of
> > doing
> > a 2 pass survey of existing resources followed by
> > assign_unassigned,
> 
> I am not entirely sure we need a 2-pass survey,
>
> pci_bus_claim_resources()
> 
> should be enough; if it is not we update it.

So it's not so much about the 2 passes per-se, though they have value,
it's more about consolidating archs to do the same thing. Chances that
we change x86 are nil. But we can change powerpc and arm64 to do like
x86 and move that code to generic.

pci_bus_claim_resources() seems to be a "lightweight" variant of the
survey done by x86. The main differences I can see are:

 - The 2 passes thing which we may or may not care about, its main
purpose is to favor resources that are already enabled by the BIOS in
case of conflicts as far as I understand.

 - pci_read_bridge_bases() is done by pci_bus_claim_resources(), while
x86 (and powerpc and others) do it in their pcibios_fixup_bus. That one
is interesting... Any reason why we shouldn't unconditionally read the
bridges while probing ? Bjorn ?

 - When allocating bridge resources, there are interesting differences:

  * x86 (and powerpc to some extent): If one has a 0 start or we fail
to claim it, x86 will wipe out the resource struct (including flags). I
assume that pci_assign_unassign_* will restore bridges when needed but
I haven't verified. 

  * pci_bus_claim_resources() is dumber in that regard. It will call
pci_claim_bridge_resources() blindly try to claim whatever is there
even if res->start is 0. This could be a problem with partially
assigned trees. It also doesn't wipe the resource in case of failure to
claim which could be a problem going down the tree and letting children
attach to the non-claimed resource, thus potentially causing the
reassign pass to fail.

The r->start == 0 test is interesting ... the generic claim code will
honor IORESOURCE_UNSET but we don't seem to set that generically unless
we hit some of the specific pass for explicit resource alignment, or
during the reassignment phases.

 - When allocating device resources, the main difference other than the
2 passes is that x86 will "0 base" the resource (r->end -= r->start; r-
>start = 0) for later reassignment. The claim path we use won't do
that. Note: none sets IORESOURCE_UNSET... Additionally x86 has some
oddball code to save the original FW values and restore them if
assignment later fails, which is somewhat odd since there's a conflict
but probably helps really broken setups.

 - x86 will not claim ROMs in that pass, it does a 3rd pass just for
them (it's common I think to not have room for all the ROMs). It also
disables them in config space during the survey.
pci_bus_claim_resources() will claim everything and leave ROMs enabled.

So as a somewhat temprary conclusion, I think the main difference here
is what happens when claim fails (also the res->start = 0 case which we
need to look at more closely) and whether we should make the generic
code also "0-base" the resource.

The question for me really is, do we want to just "upgrade" (if
necessary) pci_bus_claim_resources() and continue having x86 do its own
thing for ever, or do we want to consolidate around what is probably
the most tested platform when it comes to PCI :-)

And if we consolidate, I think that won't be by changing what x86 does,
that code is the result of decades of fiddling to get things right with
all sorts of broken BIOSes...

> > and have that the main generic code path (with added quirks to force a
> > full assignment and keeping probe_only around but that's easy, we have
> > that on powerpc and our code is originally based on the x86 one), then
> > we'll have a much easier time supporting IORESOURCE_PCI_FIXED on
> > portions of the tree as well (though it also becomes less critical to
> > do so since we will no longer reallocate unless we have to).
> > 
> > That said we need to understand what "fixed" means and why we do it.
> 
> Agree, totally and I want to make it clear how a BAR is fixed in
> the kernel, there are too many discrepancies in the resource
> management code already.
> 
> > IE, If an endpoint somehere has "fixed" BARs for example, that means
> > all parent bridge must be setup to enclose that range.
> > 
> > Now our allocator for bridge windows cannot handle that and probably
> > never will, so we have to rely on the existing window established by
> > the FW being reasonable and use it. We can still *extend" bridge
> > windows (and we have code to do that) if necessary but we cannot move
> > them if they contain a fixed BAR device.
> > 
> > There is a much bigger discussion to be had around that concept of
> > fixed device anyway, maybe at Plumbers ? Why is the BAR fixed ? Because
> > the EFI FB is on it ? Because HW bugs ? Because FW might access it from
> > SMM or ARM equivalent ? Because ACPI will poke at it based on its
> > initial address ? etc...
> 
> Consider a slot booked at LPC PCI uconf for this discussion.

Excellent.

> > Some of the answers to the above questions imply more than the need to
> > fix the BAR: Does it also mean that disabling access to that BAR, even
> > temporarily, isn't safe ? However that's what we do today when we
> > probe, if anything, to do the BAR sizing...
> 
> Eh, another question that came up already should be debated.

Yup.

> > This isn't a new problem. We had issues like that dating back 15 years
> > on powerpc for example, where a big ASIC hanging off PCI had all the
> > Apple gunk including the interrupt controller, which was initialized
> > from the DT way before PCI probing. If you took an interrupt at the
> > "wrong" time during BAR sizing, kaboom ! If you had debug printk's in
> > the wrong place in the PCI probing code, kaboom ! etc....
> > 
> > If we want to solve that properly in the long run, we'll probably want
> > ACPI to tell us the BAR sizes and use that instead of doing manual
> > sizing on such "system" devices. We similarily have ways to "construct"
> > pci_dev's from the OF tree on sparc64 and powerpc, limiting direct
> > config access to populate stuff we can't get from FW.
> 
> https://lore.kernel.org/linux-pci/20190121174225.15835-1-mr.nuke.me@gmail.com/
> 
> ?

Ah I don't know enough about ACPI yet, on my reading list :-)

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-11 14:58           ` Lorenzo Pieralisi
@ 2019-06-11 22:19             ` Benjamin Herrenschmidt
  2019-06-12 10:08               ` Lorenzo Pieralisi
  0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-11 22:19 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Bjorn Helgaas, Ard Biesheuvel, Sinan Kaya, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Tue, 2019-06-11 at 15:58 +0100, Lorenzo Pieralisi wrote:
> 
> 
> 	if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) {
> 		/* preserve existing resource assignment */
> 		pci_bus_claim_resources(bus);
> 	}
> 
> 	pci_bus_size_bridges(bus);
> 	pci_bus_assign_resources(bus);

So that makes me nervous... my understanding is that the pair

	pci_bus_size_bridges(bus);
 	pci_bus_assign_resources(bus);

Is intended for full reassignment. Now they will try to skip resources
that already have a parent, but that's yet another code path. What's
wrong with pci_unassigned_* ? That's what it's meant for...

> That's how it should be I think:
> 
> 1) we do not want pci_assign_unassigned_root_bus_resources(bus) to
>    reallocate resources already claimed (see realloc parameter), do we ?

Well, realloc is useful to handle SR_IOV when the BIOS doesn't do it
right (common case). That said, at this point, we should be able to
honor IORESOURCE_PCI_FIXED for things that have _DSM #5 since they have
been claimed. I don't see that realloc logic being a problem for us,
and I want to avoid gratuitous differences with x86, but maybe I'm
missing something here...

> 2) pci_bus_size_bridges(bus) and pci_bus_assign_resources(bus) should
>    not interfere with resources already claimed so it *should* be safe
>    to call them anyway

Sure, *should* and here we introduce yet another way of doing things
though... Any reason we don't want to do what x86 does here ?

> Most importantly: I want everyone to agree that claiming is equivalent
> to making a resource immutable (except for realloc, see (1) above)
> because that's what we are doing by claiming on _DSM #5 == 0.

I think the combination of claiming *and* IORESOURCE_PCI_FIXED is what
makes it *really* immutable. I'm a bit confused by the realloc logic
right now, I'll need more quality time looking at it after ingesting
more caffeing but I'm under the impression that it will honor the flag.

> There are too many ways to make a resource immutable in the kernel
> and this is confusing and prone to bugs.

It is, but I don't want to create new ways of doing things, and what
you seem to propose is a new way imho :-)

Cheers,
Ben.

> Thanks,
> Lorenzo
> 
> > +	ACPI_FREE(obj);
> >  
> >  	list_for_each_entry(child, &bus->children, node)
> >  		pcie_bus_configure_settings(child);
> > diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
> > index 8082b612f561..62b7fdcc661c 100644
> > --- a/include/linux/pci-acpi.h
> > +++ b/include/linux/pci-acpi.h
> > @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { }
> >  #endif
> >  
> >  extern const guid_t pci_acpi_dsm_guid;
> > -#define DEVICE_LABEL_DSM	0x07
> > -#define RESET_DELAY_DSM		0x08
> > -#define FUNCTION_DELAY_DSM	0x09
> > +#define IGNORE_PCI_BOOT_CONFIG_DSM	0x05
> > +#define DEVICE_LABEL_DSM		0x07
> > +#define RESET_DELAY_DSM			0x08
> > +#define FUNCTION_DELAY_DSM		0x09
> >  
> >  #else	/* CONFIG_ACPI */
> >  static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
> > 
> > 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-11 22:09                 ` Benjamin Herrenschmidt
@ 2019-06-11 22:34                   ` Ard Biesheuvel
  2019-06-11 22:40                     ` Benjamin Herrenschmidt
  2019-06-12 10:21                   ` Lorenzo Pieralisi
  1 sibling, 1 reply; 27+ messages in thread
From: Ard Biesheuvel @ 2019-06-11 22:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Lorenzo Pieralisi, Bjorn Helgaas, Sinan Kaya, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

'

On Wed, 12 Jun 2019 at 00:09, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Tue, 2019-06-11 at 15:31 +0100, Lorenzo Pieralisi wrote:
> >
> > True, minus specs update schedule, I can't change that and merging
> > this patch (and firmware thereof) relies on specifications that
> > are intent changes till they become an ECN (~another merge window,
> > so this patch could land at v5.4).
>
> Hrm... annoying for us but I understand your reasoning.
>
> > The other option is doing what this patch does *without* relying
> > on _DSM #5, we may have regressions unfortunately though.
>
> We could work around regressions with quirks I suppose. It does make
> sense to assume that if you have ACPI and UEFI, you have a decent PCI
> BAR assignment at boot in the "general case". That said, we need to
> double check first that pci_bus_claim_resources() will not do horrible
> things on partially assigned setups, since there's a real interest in
> doing that in the field.
>

EDK2 based code is typically very fork heavy, in the sense that,
instead of upstreaming a change, a driver gets forked and changes are
applied locally, which then need to be carried into perpetuity. That
means that 'recent' ports could still display behavior that was
removed from the generic code a long time ago. All the open source
arm64 platforms now use the generic PCI host bridge driver (which is
in charge of the bus enumeration and resource allocation) and so
hopefully, future platforms will not deviate too much from that.

In particular, EDK2 has some PCD tunables for things like PCIe hotplug
and SR-IOV support, which affects the number of spare buses that get
allocated for hotplug capable root ports, and for SR-IOV capable
endpoints.

As Lorenzo mentions, we don't actively reassign bus numbers from
scratch, but I am not sure if that is 100% true. I think you do get
some errors when booting with hotplug capable root ports that don't
have 'pci_hotplug_bus_size' spare bus numbers available.

Also note that EDK2 leaves ROM BARs unassigned.

> > It is kind of orthogonal (but not really), bus numbers assignment
> > is _not_ in line with resource assignment at the moment and I want
> > to change it.
>
> Hrm. We should probably reassign bus numbers if we reassign resources
> yes, but then I'd like us to not reassign resources unless we have to
> :-)
>
> > Since ACPI on ARM64 is still at its inception maybe we should have
> > a stab at patching the kernel so that it reassigns bus numbers by
> > default and toggle that behaviour on _DSM #5 == 0 detection.
> >
> > I doubt that reassigning bus numbers by default can trigger
> > regressions on existing platforms but the only way to figure
> > it out is by testing it.
> >
> > > My thinking is if we converge everybody toward the x86 method of
> > > doing
> > > a 2 pass survey of existing resources followed by
> > > assign_unassigned,
> >
> > I am not entirely sure we need a 2-pass survey,
> >
> > pci_bus_claim_resources()
> >
> > should be enough; if it is not we update it.
>
> So it's not so much about the 2 passes per-se, though they have value,
> it's more about consolidating archs to do the same thing. Chances that
> we change x86 are nil. But we can change powerpc and arm64 to do like
> x86 and move that code to generic.
>
> pci_bus_claim_resources() seems to be a "lightweight" variant of the
> survey done by x86. The main differences I can see are:
>
>  - The 2 passes thing which we may or may not care about, its main
> purpose is to favor resources that are already enabled by the BIOS in
> case of conflicts as far as I understand.
>
>  - pci_read_bridge_bases() is done by pci_bus_claim_resources(), while
> x86 (and powerpc and others) do it in their pcibios_fixup_bus. That one
> is interesting... Any reason why we shouldn't unconditionally read the
> bridges while probing ? Bjorn ?
>
>  - When allocating bridge resources, there are interesting differences:
>
>   * x86 (and powerpc to some extent): If one has a 0 start or we fail
> to claim it, x86 will wipe out the resource struct (including flags). I
> assume that pci_assign_unassign_* will restore bridges when needed but
> I haven't verified.
>
>   * pci_bus_claim_resources() is dumber in that regard. It will call
> pci_claim_bridge_resources() blindly try to claim whatever is there
> even if res->start is 0. This could be a problem with partially
> assigned trees. It also doesn't wipe the resource in case of failure to
> claim which could be a problem going down the tree and letting children
> attach to the non-claimed resource, thus potentially causing the
> reassign pass to fail.
>
> The r->start == 0 test is interesting ... the generic claim code will
> honor IORESOURCE_UNSET but we don't seem to set that generically unless
> we hit some of the specific pass for explicit resource alignment, or
> during the reassignment phases.
>
>  - When allocating device resources, the main difference other than the
> 2 passes is that x86 will "0 base" the resource (r->end -= r->start; r-
> >start = 0) for later reassignment. The claim path we use won't do
> that. Note: none sets IORESOURCE_UNSET... Additionally x86 has some
> oddball code to save the original FW values and restore them if
> assignment later fails, which is somewhat odd since there's a conflict
> but probably helps really broken setups.
>
>  - x86 will not claim ROMs in that pass, it does a 3rd pass just for
> them (it's common I think to not have room for all the ROMs). It also
> disables them in config space during the survey.
> pci_bus_claim_resources() will claim everything and leave ROMs enabled.
>
> So as a somewhat temprary conclusion, I think the main difference here
> is what happens when claim fails (also the res->start = 0 case which we
> need to look at more closely) and whether we should make the generic
> code also "0-base" the resource.
>
> The question for me really is, do we want to just "upgrade" (if
> necessary) pci_bus_claim_resources() and continue having x86 do its own
> thing for ever, or do we want to consolidate around what is probably
> the most tested platform when it comes to PCI :-)
>
> And if we consolidate, I think that won't be by changing what x86 does,
> that code is the result of decades of fiddling to get things right with
> all sorts of broken BIOSes...
>
> > > and have that the main generic code path (with added quirks to force a
> > > full assignment and keeping probe_only around but that's easy, we have
> > > that on powerpc and our code is originally based on the x86 one), then
> > > we'll have a much easier time supporting IORESOURCE_PCI_FIXED on
> > > portions of the tree as well (though it also becomes less critical to
> > > do so since we will no longer reallocate unless we have to).
> > >
> > > That said we need to understand what "fixed" means and why we do it.
> >
> > Agree, totally and I want to make it clear how a BAR is fixed in
> > the kernel, there are too many discrepancies in the resource
> > management code already.
> >
> > > IE, If an endpoint somehere has "fixed" BARs for example, that means
> > > all parent bridge must be setup to enclose that range.
> > >
> > > Now our allocator for bridge windows cannot handle that and probably
> > > never will, so we have to rely on the existing window established by
> > > the FW being reasonable and use it. We can still *extend" bridge
> > > windows (and we have code to do that) if necessary but we cannot move
> > > them if they contain a fixed BAR device.
> > >
> > > There is a much bigger discussion to be had around that concept of
> > > fixed device anyway, maybe at Plumbers ? Why is the BAR fixed ? Because
> > > the EFI FB is on it ? Because HW bugs ? Because FW might access it from
> > > SMM or ARM equivalent ? Because ACPI will poke at it based on its
> > > initial address ? etc...
> >
> > Consider a slot booked at LPC PCI uconf for this discussion.
>
> Excellent.
>
> > > Some of the answers to the above questions imply more than the need to
> > > fix the BAR: Does it also mean that disabling access to that BAR, even
> > > temporarily, isn't safe ? However that's what we do today when we
> > > probe, if anything, to do the BAR sizing...
> >
> > Eh, another question that came up already should be debated.
>
> Yup.
>
> > > This isn't a new problem. We had issues like that dating back 15 years
> > > on powerpc for example, where a big ASIC hanging off PCI had all the
> > > Apple gunk including the interrupt controller, which was initialized
> > > from the DT way before PCI probing. If you took an interrupt at the
> > > "wrong" time during BAR sizing, kaboom ! If you had debug printk's in
> > > the wrong place in the PCI probing code, kaboom ! etc....
> > >
> > > If we want to solve that properly in the long run, we'll probably want
> > > ACPI to tell us the BAR sizes and use that instead of doing manual
> > > sizing on such "system" devices. We similarily have ways to "construct"
> > > pci_dev's from the OF tree on sparc64 and powerpc, limiting direct
> > > config access to populate stuff we can't get from FW.
> >
> > https://lore.kernel.org/linux-pci/20190121174225.15835-1-mr.nuke.me@gmail.com/
> >
> > ?
>
> Ah I don't know enough about ACPI yet, on my reading list :-)
>
> Cheers,
> Ben.
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-11 22:34                   ` Ard Biesheuvel
@ 2019-06-11 22:40                     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-11 22:40 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Lorenzo Pieralisi, Bjorn Helgaas, Sinan Kaya, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Wed, 2019-06-12 at 00:34 +0200, Ard Biesheuvel wrote:

> EDK2 based code is typically very fork heavy, in the sense that,
> instead of upstreaming a change, a driver gets forked and changes are
> applied locally, which then need to be carried into perpetuity. That
> means that 'recent' ports could still display behavior that was
> removed from the generic code a long time ago. All the open source
> arm64 platforms now use the generic PCI host bridge driver (which is
> in charge of the bus enumeration and resource allocation) and so
> hopefully, future platforms will not deviate too much from that.
> 
> In particular, EDK2 has some PCD tunables for things like PCIe
> hotplug
> and SR-IOV support, which affects the number of spare buses that get
> allocated for hotplug capable root ports, and for SR-IOV capable
> endpoints.
> 
> As Lorenzo mentions, we don't actively reassign bus numbers from
> scratch, but I am not sure if that is 100% true. I think you do get
> some errors when booting with hotplug capable root ports that don't
> have 'pci_hotplug_bus_size' spare bus numbers available.
> 
> Also note that EDK2 leaves ROM BARs unassigned.

This is all somewhat reasonable. x86 is in the same situation which is
why I'm really keen on trying to consolidate the two approaches.

> > > It is kind of orthogonal (but not really), bus numbers assignment
> > > is _not_ in line with resource assignment at the moment and I
> > > want
> > > to change it.
> > 
> > Hrm. We should probably reassign bus numbers if we reassign
> > resources
> > yes, but then I'd like us to not reassign resources unless we have
> > to
> > :-)
> > 
> > > Since ACPI on ARM64 is still at its inception maybe we should
> > > have
> > > a stab at patching the kernel so that it reassigns bus numbers by
> > > default and toggle that behaviour on _DSM #5 == 0 detection.
> > > 
> > > I doubt that reassigning bus numbers by default can trigger
> > > regressions on existing platforms but the only way to figure
> > > it out is by testing it.
> > > 
> > > > My thinking is if we converge everybody toward the x86 method
> > > > of
> > > > doing
> > > > a 2 pass survey of existing resources followed by
> > > > assign_unassigned,
> > > 
> > > I am not entirely sure we need a 2-pass survey,
> > > 
> > > pci_bus_claim_resources()
> > > 
> > > should be enough; if it is not we update it.
> > 
> > So it's not so much about the 2 passes per-se, though they have
> > value,
> > it's more about consolidating archs to do the same thing. Chances
> > that
> > we change x86 are nil. But we can change powerpc and arm64 to do
> > like
> > x86 and move that code to generic.
> > 
> > pci_bus_claim_resources() seems to be a "lightweight" variant of
> > the
> > survey done by x86. The main differences I can see are:
> > 
> >  - The 2 passes thing which we may or may not care about, its main
> > purpose is to favor resources that are already enabled by the BIOS
> > in
> > case of conflicts as far as I understand.
> > 
> >  - pci_read_bridge_bases() is done by pci_bus_claim_resources(),
> > while
> > x86 (and powerpc and others) do it in their pcibios_fixup_bus. That
> > one
> > is interesting... Any reason why we shouldn't unconditionally read
> > the
> > bridges while probing ? Bjorn ?
> > 
> >  - When allocating bridge resources, there are interesting
> > differences:
> > 
> >   * x86 (and powerpc to some extent): If one has a 0 start or we
> > fail
> > to claim it, x86 will wipe out the resource struct (including
> > flags). I
> > assume that pci_assign_unassign_* will restore bridges when needed
> > but
> > I haven't verified.
> > 
> >   * pci_bus_claim_resources() is dumber in that regard. It will
> > call
> > pci_claim_bridge_resources() blindly try to claim whatever is there
> > even if res->start is 0. This could be a problem with partially
> > assigned trees. It also doesn't wipe the resource in case of
> > failure to
> > claim which could be a problem going down the tree and letting
> > children
> > attach to the non-claimed resource, thus potentially causing the
> > reassign pass to fail.
> > 
> > The r->start == 0 test is interesting ... the generic claim code
> > will
> > honor IORESOURCE_UNSET but we don't seem to set that generically
> > unless
> > we hit some of the specific pass for explicit resource alignment,
> > or
> > during the reassignment phases.
> > 
> >  - When allocating device resources, the main difference other than
> > the
> > 2 passes is that x86 will "0 base" the resource (r->end -= r-
> > >start; r-
> > > start = 0) for later reassignment. The claim path we use won't do
> > 
> > that. Note: none sets IORESOURCE_UNSET... Additionally x86 has some
> > oddball code to save the original FW values and restore them if
> > assignment later fails, which is somewhat odd since there's a
> > conflict
> > but probably helps really broken setups.
> > 
> >  - x86 will not claim ROMs in that pass, it does a 3rd pass just
> > for
> > them (it's common I think to not have room for all the ROMs). It
> > also
> > disables them in config space during the survey.
> > pci_bus_claim_resources() will claim everything and leave ROMs
> > enabled.
> > 
> > So as a somewhat temprary conclusion, I think the main difference
> > here
> > is what happens when claim fails (also the res->start = 0 case
> > which we
> > need to look at more closely) and whether we should make the
> > generic
> > code also "0-base" the resource.
> > 
> > The question for me really is, do we want to just "upgrade" (if
> > necessary) pci_bus_claim_resources() and continue having x86 do its
> > own
> > thing for ever, or do we want to consolidate around what is
> > probably
> > the most tested platform when it comes to PCI :-)
> > 
> > And if we consolidate, I think that won't be by changing what x86
> > does,
> > that code is the result of decades of fiddling to get things right
> > with
> > all sorts of broken BIOSes...
> > 
> > > > and have that the main generic code path (with added quirks to
> > > > force a
> > > > full assignment and keeping probe_only around but that's easy,
> > > > we have
> > > > that on powerpc and our code is originally based on the x86
> > > > one), then
> > > > we'll have a much easier time supporting IORESOURCE_PCI_FIXED
> > > > on
> > > > portions of the tree as well (though it also becomes less
> > > > critical to
> > > > do so since we will no longer reallocate unless we have to).
> > > > 
> > > > That said we need to understand what "fixed" means and why we
> > > > do it.
> > > 
> > > Agree, totally and I want to make it clear how a BAR is fixed in
> > > the kernel, there are too many discrepancies in the resource
> > > management code already.
> > > 
> > > > IE, If an endpoint somehere has "fixed" BARs for example, that
> > > > means
> > > > all parent bridge must be setup to enclose that range.
> > > > 
> > > > Now our allocator for bridge windows cannot handle that and
> > > > probably
> > > > never will, so we have to rely on the existing window
> > > > established by
> > > > the FW being reasonable and use it. We can still *extend"
> > > > bridge
> > > > windows (and we have code to do that) if necessary but we
> > > > cannot move
> > > > them if they contain a fixed BAR device.
> > > > 
> > > > There is a much bigger discussion to be had around that concept
> > > > of
> > > > fixed device anyway, maybe at Plumbers ? Why is the BAR fixed ?
> > > > Because
> > > > the EFI FB is on it ? Because HW bugs ? Because FW might access
> > > > it from
> > > > SMM or ARM equivalent ? Because ACPI will poke at it based on
> > > > its
> > > > initial address ? etc...
> > > 
> > > Consider a slot booked at LPC PCI uconf for this discussion.
> > 
> > Excellent.
> > 
> > > > Some of the answers to the above questions imply more than the
> > > > need to
> > > > fix the BAR: Does it also mean that disabling access to that
> > > > BAR, even
> > > > temporarily, isn't safe ? However that's what we do today when
> > > > we
> > > > probe, if anything, to do the BAR sizing...
> > > 
> > > Eh, another question that came up already should be debated.
> > 
> > Yup.
> > 
> > > > This isn't a new problem. We had issues like that dating back
> > > > 15 years
> > > > on powerpc for example, where a big ASIC hanging off PCI had
> > > > all the
> > > > Apple gunk including the interrupt controller, which was
> > > > initialized
> > > > from the DT way before PCI probing. If you took an interrupt at
> > > > the
> > > > "wrong" time during BAR sizing, kaboom ! If you had debug
> > > > printk's in
> > > > the wrong place in the PCI probing code, kaboom ! etc....
> > > > 
> > > > If we want to solve that properly in the long run, we'll
> > > > probably want
> > > > ACPI to tell us the BAR sizes and use that instead of doing
> > > > manual
> > > > sizing on such "system" devices. We similarily have ways to
> > > > "construct"
> > > > pci_dev's from the OF tree on sparc64 and powerpc, limiting
> > > > direct
> > > > config access to populate stuff we can't get from FW.
> > > 
> > > 
https://lore.kernel.org/linux-pci/20190121174225.15835-1-mr.nuke.me@gmail.com/
> > > 
> > > ?
> > 
> > Ah I don't know enough about ACPI yet, on my reading list :-)
> > 
> > Cheers,
> > Ben.
> > 
> > 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-06  9:00         ` [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt
  2019-06-06  9:13           ` Ard Biesheuvel
  2019-06-11 14:58           ` Lorenzo Pieralisi
@ 2019-06-11 23:39           ` Bjorn Helgaas
  2019-06-12  0:06             ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 27+ messages in thread
From: Bjorn Helgaas @ 2019-06-11 23:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Thu, Jun 06, 2019 at 07:00:12PM +1000, Benjamin Herrenschmidt wrote:
> From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> 
> On arm64 ACPI systems, we unconditionally reconfigure the entire PCI
> hierarchy at boot. This is a departure from what is customary on ACPI
> systems, and may break assumptions in some places (e.g., EFIFB), that
> the kernel will leave BARs of enabled PCI devices where they are.
> 
> Given that PCI already specifies a device specific ACPI method (_DSM)
> for PCI root bridge nodes that tells us whether the firmware thinks
> the configuration should be left alone, let's sidestep the entire
> policy debate about whether the PCI configuration should be preserved
> or not, and put it under the control of the firmware instead.

The current PCI Firmware spec r3.2 specifies _DSM function 5 for
PCI-to-PCI bridge objects, which does not include host bridge
(PNP0A03) nodes, but the proposed revision does allow it under host
bridges.  So I'm fine with this, but we should update the commit log
so it doesn't say "PCI *already* specifies this".

> [BenH: Added pci_assign_unassigned_root_bus_resources()]
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

I think you should add a signed-off-by for yourself?

> ---
> 
> So I would like this variant rather than mucking around with
> IORESOURCE_PCI_FIXED at this stage to fix the problem with our platforms.
> 
> See my other email, IORESOURCE_PCI_FIXED doesn't really work terribly well
> when using pci_bus_size_bridges and pci_bus_assign_resources, and the
> resulting patches are ugly and add more mess.
> 
> Long run, I propose to start working on consolidating all those various
> resource survey mechanisms around what x86 does, unless people strongly
> object... (with the addition of the probe only and force reassign quirks
> so platforms can still chose that).
> 
> Note: I haven't tested the effect of pci_assign_unassigned_root_bus_resources
> as our platforms don't leave anything unassigned. I'm not entirely sure how
> well pci_bus_claim_resources() will deal with a partially assigned setup...
> 
> We do want to support partial assignment by BIOS though, it's a trend to
> reduce boot time, people seem to want BIOSes to only assign what's critical
> for booting.
> 
> Bjorn: I haven't made the claim path the default in absence of _DSM #5 yet.
> I suggest we do that as a separate patch in case it breaks somebody, thus
> making bisection more meaningful. It will also make this one more palatable
> to distros since it won't change the behaviour on systems without _DSM #5,
> and we verified nobody has it except Seattle which returns 1. 
> 
>  arch/arm64/kernel/pci.c  | 23 +++++++++++++++++++++--
>  include/linux/pci-acpi.h |  7 ++++---
>  2 files changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
> index bb85e2f4603f..6358e1cb4f9f 100644
> --- a/arch/arm64/kernel/pci.c
> +++ b/arch/arm64/kernel/pci.c
> @@ -168,6 +168,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
>  	struct acpi_pci_generic_root_info *ri;
>  	struct pci_bus *bus, *child;
>  	struct acpi_pci_root_ops *root_ops;
> +	union acpi_object *obj;
>  
>  	ri = kzalloc(sizeof(*ri), GFP_KERNEL);
>  	if (!ri)
> @@ -193,8 +194,26 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
>  	if (!bus)
>  		return NULL;
>  
> -	pci_bus_size_bridges(bus);
> -	pci_bus_assign_resources(bus);
> +	/*
> +	 * Invoke the PCI device specific method (_DSM) #5 'Ignore PCI Boot
> +	 * Configuration', which tells us whether the firmware wants us to
> +	 * preserve the configuration of the PCI resource tree for this root
> +	 * bridge.
> +	 */
> +	obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
> +	                        IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
> +	if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) {

This is fine, but can we make a tiny step toward doing this in generic
code instead of adding more arch-specific stuff?

E.g., evaluate the _DSM in the generic acpi_pci_root_add(), set a
"preserve_config" bit in the struct acpi_pci_root, and test the bit
here?

It would also be nice to add a printk in the other
pci_acpi_scan_root() implementations if the bit is set so we know that
the platform supplied the _DSM but we're ignoring it.

> +		/* preserve existing resource assignment */
> +		pci_bus_claim_resources(bus);
> +
> +		/* Assign anything that might have been left out */
> +		pci_assign_unassigned_root_bus_resources(bus);
> +	} else {
> +		/* reconfigure the resource tree from scratch */
> +		pci_bus_size_bridges(bus);
> +		pci_bus_assign_resources(bus);
> +	}
> +	ACPI_FREE(obj);
>  
>  	list_for_each_entry(child, &bus->children, node)
>  		pcie_bus_configure_settings(child);
> diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
> index 8082b612f561..62b7fdcc661c 100644
> --- a/include/linux/pci-acpi.h
> +++ b/include/linux/pci-acpi.h
> @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { }
>  #endif
>  
>  extern const guid_t pci_acpi_dsm_guid;
> -#define DEVICE_LABEL_DSM	0x07
> -#define RESET_DELAY_DSM		0x08
> -#define FUNCTION_DELAY_DSM	0x09
> +#define IGNORE_PCI_BOOT_CONFIG_DSM	0x05
> +#define DEVICE_LABEL_DSM		0x07
> +#define RESET_DELAY_DSM			0x08
> +#define FUNCTION_DELAY_DSM		0x09
>  
>  #else	/* CONFIG_ACPI */
>  static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-11 23:39           ` Bjorn Helgaas
@ 2019-06-12  0:06             ` Benjamin Herrenschmidt
  2019-06-12 13:27               ` Bjorn Helgaas
  0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-12  0:06 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Tue, 2019-06-11 at 18:39 -0500, Bjorn Helgaas wrote:
> On Thu, Jun 06, 2019 at 07:00:12PM +1000, Benjamin Herrenschmidt wrote:
> > From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > 
> > On arm64 ACPI systems, we unconditionally reconfigure the entire PCI
> > hierarchy at boot. This is a departure from what is customary on ACPI
> > systems, and may break assumptions in some places (e.g., EFIFB), that
> > the kernel will leave BARs of enabled PCI devices where they are.
> > 
> > Given that PCI already specifies a device specific ACPI method (_DSM)
> > for PCI root bridge nodes that tells us whether the firmware thinks
> > the configuration should be left alone, let's sidestep the entire
> > policy debate about whether the PCI configuration should be preserved
> > or not, and put it under the control of the firmware instead.
> 
> The current PCI Firmware spec r3.2 specifies _DSM function 5 for
> PCI-to-PCI bridge objects, which does not include host bridge
> (PNP0A03) nodes, but the proposed revision does allow it under host
> bridges.  So I'm fine with this, but we should update the commit log
> so it doesn't say "PCI *already* specifies this".
> 
> > [BenH: Added pci_assign_unassigned_root_bus_resources()]
> > 
> > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> 
> I think you should add a signed-off-by for yourself?

I should, I forgot. That said, Lorenzo wants to wait for the actual
ECN... and we're also discussing some details.
> 

 .../...

> > +	/*
> > +	 * Invoke the PCI device specific method (_DSM) #5 'Ignore PCI Boot
> > +	 * Configuration', which tells us whether the firmware wants us to
> > +	 * preserve the configuration of the PCI resource tree for this root
> > +	 * bridge.
> > +	 */
> > +	obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
> > +	                        IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
> > +	if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) {
> 
> This is fine, but can we make a tiny step toward doing this in generic
> code instead of adding more arch-specific stuff?
> 
> E.g., evaluate the _DSM in the generic acpi_pci_root_add(), set a
> "preserve_config" bit in the struct acpi_pci_root, and test the bit
> here?

I'd rather have the flag in the host bridge no ?

> It would also be nice to add a printk in the oter
> pci_acpi_scan_root() implementations if the bit is set so we know that
> the platform supplied the _DSM but we're ignoring it.

Ok.

Talking of which, look at the ongoing discussion I have with Lorenzo
when it comes to pci_bus_claim_resources vs. what x86 does, I'd love
for you to chime in. I'd like to try to consolidate things further
accross architectures but there might be reasons I don't see as to why
things are different in that area, so ...

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-11 22:19             ` Benjamin Herrenschmidt
@ 2019-06-12 10:08               ` Lorenzo Pieralisi
  2019-06-12 10:58                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 27+ messages in thread
From: Lorenzo Pieralisi @ 2019-06-12 10:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Bjorn Helgaas, Ard Biesheuvel, Sinan Kaya, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Wed, Jun 12, 2019 at 08:19:40AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2019-06-11 at 15:58 +0100, Lorenzo Pieralisi wrote:
> > 
> > 
> > 	if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) {
> > 		/* preserve existing resource assignment */
> > 		pci_bus_claim_resources(bus);
> > 	}
> > 
> > 	pci_bus_size_bridges(bus);
> > 	pci_bus_assign_resources(bus);
> 
> So that makes me nervous... my understanding is that the pair
> 
> 	pci_bus_size_bridges(bus);
>  	pci_bus_assign_resources(bus);
> 
> Is intended for full reassignment. Now they will try to skip resources
> that already have a parent, but that's yet another code path. What's
> wrong with pci_unassigned_* ? That's what it's meant for...

Nothing wrong, we have not understood each others. What I am asking
is to write it like this:

if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) {
	/* preserve existing resource assignment */
	pci_bus_claim_resources(bus);
}

/* this code path should be common, indipendent of _DSM #5
pci_assign_unassigned_root_bus_resources(bus);

There is no reason to have two distinct code paths, current code
can call:

pci_assign_unassigned_root_bus_resources(bus);

instead of

pci_bus_size_bridges(bus);
pci_bus_assign_resources(bus);

Actually, current code is *questionable* given that AFAICS it
does not account for hotplug bridges additional memory window
size.

> > That's how it should be I think:
> > 
> > 1) we do not want pci_assign_unassigned_root_bus_resources(bus) to
> >    reallocate resources already claimed (see realloc parameter), do we ?
> 
> Well, realloc is useful to handle SR_IOV when the BIOS doesn't do it
> right (common case). That said, at this point, we should be able to
> honor IORESOURCE_PCI_FIXED for things that have _DSM #5 since they have
> been claimed. I don't see that realloc logic being a problem for us,
> and I want to avoid gratuitous differences with x86, but maybe I'm
> missing something here...

See above. The conditions that make IORESOURCE_PCI_FIXED work are
still unclear to me.

> > 2) pci_bus_size_bridges(bus) and pci_bus_assign_resources(bus) should
> >    not interfere with resources already claimed so it *should* be safe
> >    to call them anyway
> 
> Sure, *should* and here we introduce yet another way of doing things
> though... Any reason we don't want to do what x86 does here ?

No, see above, keeping in mind that what x86 does is still not
very well defined AFAICS.

> > Most importantly: I want everyone to agree that claiming is equivalent
> > to making a resource immutable (except for realloc, see (1) above)
> > because that's what we are doing by claiming on _DSM #5 == 0.
> 
> I think the combination of claiming *and* IORESOURCE_PCI_FIXED is what
> makes it *really* immutable. I'm a bit confused by the realloc logic
> right now, I'll need more quality time looking at it after ingesting
> more caffeing but I'm under the impression that it will honor the flag.

Likewise, this requires some fuzzing because it is really hard to
understand by perusing the code.

> > There are too many ways to make a resource immutable in the kernel
> > and this is confusing and prone to bugs.
> 
> It is, but I don't want to create new ways of doing things, and what
> you seem to propose is a new way imho :-)

No, see above. What I am saying is that we have IORESOURCE_PCI_FIXED,
res->parent != NULL and Enhanced allocation to make a BAR immutable and
before going any further it would be great if we understand their
interaction given that we are starting from a pseudo clean slate.

If we fail to do that it is quirks later (and I would rather avoid
seeing x86 command line parameters to work around them).

Cheers,
Lorenzo

> Cheers,
> Ben.
> 
> > Thanks,
> > Lorenzo
> > 
> > > +	ACPI_FREE(obj);
> > >  
> > >  	list_for_each_entry(child, &bus->children, node)
> > >  		pcie_bus_configure_settings(child);
> > > diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
> > > index 8082b612f561..62b7fdcc661c 100644
> > > --- a/include/linux/pci-acpi.h
> > > +++ b/include/linux/pci-acpi.h
> > > @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { }
> > >  #endif
> > >  
> > >  extern const guid_t pci_acpi_dsm_guid;
> > > -#define DEVICE_LABEL_DSM	0x07
> > > -#define RESET_DELAY_DSM		0x08
> > > -#define FUNCTION_DELAY_DSM	0x09
> > > +#define IGNORE_PCI_BOOT_CONFIG_DSM	0x05
> > > +#define DEVICE_LABEL_DSM		0x07
> > > +#define RESET_DELAY_DSM			0x08
> > > +#define FUNCTION_DELAY_DSM		0x09
> > >  
> > >  #else	/* CONFIG_ACPI */
> > >  static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
> > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-11 22:09                 ` Benjamin Herrenschmidt
  2019-06-11 22:34                   ` Ard Biesheuvel
@ 2019-06-12 10:21                   ` Lorenzo Pieralisi
  2019-06-12 22:05                     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 27+ messages in thread
From: Lorenzo Pieralisi @ 2019-06-12 10:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ard Biesheuvel, Bjorn Helgaas, Sinan Kaya, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Wed, Jun 12, 2019 at 08:09:01AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2019-06-11 at 15:31 +0100, Lorenzo Pieralisi wrote:
> > 
> > True, minus specs update schedule, I can't change that and merging
> > this patch (and firmware thereof) relies on specifications that
> > are intent changes till they become an ECN (~another merge window,
> > so this patch could land at v5.4).
> 
> Hrm... annoying for us but I understand your reasoning.

If we can wait it is better, also because it gives us time to
bring this discussion to completion.

> > The other option is doing what this patch does *without* relying
> > on _DSM #5, we may have regressions unfortunately though.
> 
> We could work around regressions with quirks I suppose. It does make
> sense to assume that if you have ACPI and UEFI, you have a decent PCI
> BAR assignment at boot in the "general case". That said, we need to
> double check first that pci_bus_claim_resources() will not do horrible
> things on partially assigned setups, since there's a real interest in
> doing that in the field.
> 
> > It is kind of orthogonal (but not really), bus numbers assignment
> > is _not_ in line with resource assignment at the moment and I want
> > to change it.
> 
> Hrm. We should probably reassign bus numbers if we reassign resources
> yes, but then I'd like us to not reassign resources unless we have to
> :-)

But for that we can use _DSM #5 returning 0, at least we would
be consistent.

Current situation is inconsistent and that bothers me I can put
together a separate patch and send it as an RFT, there are not
that many ARM64 PCI ACPI platforms to test it on.

> > a stab at patching the kernel so that it reassigns bus numbers by
> > default and toggle that behaviour on _DSM #5 == 0 detection.
> > 
> > I doubt that reassigning bus numbers by default can trigger
> > regressions on existing platforms but the only way to figure
> > it out is by testing it.
> >
> > > My thinking is if we converge everybody toward the x86 method of
> > > doing
> > > a 2 pass survey of existing resources followed by
> > > assign_unassigned,
> > 
> > I am not entirely sure we need a 2-pass survey,
> >
> > pci_bus_claim_resources()
> > 
> > should be enough; if it is not we update it.
> 
> So it's not so much about the 2 passes per-se, though they have value,
> it's more about consolidating archs to do the same thing. Chances that
> we change x86 are nil. But we can change powerpc and arm64 to do like
> x86 and move that code to generic.

Agree on that.

> pci_bus_claim_resources() seems to be a "lightweight" variant of the
> survey done by x86. The main differences I can see are:
> 
>  - The 2 passes thing which we may or may not care about, its main
> purpose is to favor resources that are already enabled by the BIOS in
> case of conflicts as far as I understand.

Yes.

>  - pci_read_bridge_bases() is done by pci_bus_claim_resources(), while
> x86 (and powerpc and others) do it in their pcibios_fixup_bus. That one
> is interesting... Any reason why we shouldn't unconditionally read the
> bridges while probing ? Bjorn ?

I tried and failed miserably:

https://lore.kernel.org/linux-pci/20150916085850.GA17510@red-moon/

>  - When allocating bridge resources, there are interesting differences:
> 
>   * x86 (and powerpc to some extent): If one has a 0 start or we fail
> to claim it, x86 will wipe out the resource struct (including flags). I
> assume that pci_assign_unassign_* will restore bridges when needed but
> I haven't verified. 
> 
>   * pci_bus_claim_resources() is dumber in that regard. It will call
> pci_claim_bridge_resources() blindly try to claim whatever is there
> even if res->start is 0. This could be a problem with partially
> assigned trees. It also doesn't wipe the resource in case of failure to
> claim which could be a problem going down the tree and letting children
> attach to the non-claimed resource, thus potentially causing the
> reassign pass to fail.
> 
> The r->start == 0 test is interesting ... the generic claim code will
> honor IORESOURCE_UNSET but we don't seem to set that generically unless
> we hit some of the specific pass for explicit resource alignment, or
> during the reassignment phases.
> 
>  - When allocating device resources, the main difference other than the
> 2 passes is that x86 will "0 base" the resource (r->end -= r->start; r-
> >start = 0) for later reassignment. The claim path we use won't do
> that. Note: none sets IORESOURCE_UNSET... Additionally x86 has some
> oddball code to save the original FW values and restore them if
> assignment later fails, which is somewhat odd since there's a conflict
> but probably helps really broken setups.
> 
>  - x86 will not claim ROMs in that pass, it does a 3rd pass just for
> them (it's common I think to not have room for all the ROMs). It also
> disables them in config space during the survey.
> pci_bus_claim_resources() will claim everything and leave ROMs enabled.
> 
> So as a somewhat temprary conclusion, I think the main difference here
> is what happens when claim fails (also the res->start = 0 case which we
> need to look at more closely) and whether we should make the generic
> code also "0-base" the resource.

Oh my, res->start == 0, another can of worms. Honestly I do not know
what to do on that one mostly because we need to figure out how it
plays with resource assignment code (and legacy stuff, you know the
drill).

> 
> The question for me really is, do we want to just "upgrade" (if
> necessary) pci_bus_claim_resources() and continue having x86 do its own
> thing for ever, or do we want to consolidate around what is probably
> the most tested platform when it comes to PCI :-)

Consolidating is the right thing to do, with the caveats above, there
are many but you have all my support.

> And if we consolidate, I think that won't be by changing what x86 does,
> that code is the result of decades of fiddling to get things right with
> all sorts of broken BIOSes...

There is 0 chance to change x86 code (and there is 0 chance to change
core PCI code with x86 assumptions in it).

Cheers,
Lorenzo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-12 10:08               ` Lorenzo Pieralisi
@ 2019-06-12 10:58                 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-12 10:58 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Bjorn Helgaas, Ard Biesheuvel, Sinan Kaya, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Wed, 2019-06-12 at 11:08 +0100, Lorenzo Pieralisi wrote:
> 
> Nothing wrong, we have not understood each others. What I am asking
> is to write it like this:
> 
> if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) {
> 	/* preserve existing resource assignment */
> 	pci_bus_claim_resources(bus);
> }
> 
> /* this code path should be common, indipendent of _DSM #5
> pci_assign_unassigned_root_bus_resources(bus);
> 
> There is no reason to have two distinct code paths, current code
> can call:
> 
> pci_assign_unassigned_root_bus_resources(bus);
> 
> instead of
> 
> pci_bus_size_bridges(bus);
> pci_bus_assign_resources(bus);
> 
> Actually, current code is *questionable* given that AFAICS it
> does not account for hotplug bridges additional memory window
> size.

Ah yes, I see, ok, I'll play with that in qemu a bit tomorrow
make sure we aren't missing something obvious & update the patch.

> > > That's how it should be I think:
> > > 
> > > 1) we do not want pci_assign_unassigned_root_bus_resources(bus) to
> > >    reallocate resources already claimed (see realloc parameter), do we ?
> > 
> > Well, realloc is useful to handle SR_IOV when the BIOS doesn't do it
> > right (common case). That said, at this point, we should be able to
> > honor IORESOURCE_PCI_FIXED for things that have _DSM #5 since they have
> > been claimed. I don't see that realloc logic being a problem for us,
> > and I want to avoid gratuitous differences with x86, but maybe I'm
> > missing something here...
> 
> See above. The conditions that make IORESOURCE_PCI_FIXED work are
> still unclear to me.

I more/less convinced myself it doesn't in the full-reassignment case
:-) As for the more traditional survey + assign, it can work if the
original survey led to reasonable values, mostly by preventing (I
*think*, I still need to convince myself) the realloc path for kicking
in, but this is all very fishy... At least x86 does use that flag at
least in one place from my grepping around...

> > > 2) pci_bus_size_bridges(bus) and pci_bus_assign_resources(bus) should
> > >    not interfere with resources already claimed so it *should* be safe
> > >    to call them anyway
> > 
> > Sure, *should* and here we introduce yet another way of doing things
> > though... Any reason we don't want to do what x86 does here ?
> 
> No, see above, keeping in mind that what x86 does is still not
> very well defined AFAICS.

It's not defined, it's grown through an evolutionary process :-) It's
still by far the most tested with the widest range of crazy PCI devices
and BIOSes I would think.

> > > Most importantly: I want everyone to agree that claiming is equivalent
> > > to making a resource immutable (except for realloc, see (1) above)
> > > because that's what we are doing by claiming on _DSM #5 == 0.
> > 
> > I think the combination of claiming *and* IORESOURCE_PCI_FIXED is what
> > makes it *really* immutable. I'm a bit confused by the realloc logic
> > right now, I'll need more quality time looking at it after ingesting
> > more caffeing but I'm under the impression that it will honor the flag.
> 
> Likewise, this requires some fuzzing because it is really hard to
> understand by perusing the code.

Yup.

> > > There are too many ways to make a resource immutable in the kernel
> > > and this is confusing and prone to bugs.
> > 
> > It is, but I don't want to create new ways of doing things, and what
> > you seem to propose is a new way imho :-)
> 
> No, see above. What I am saying is that we have IORESOURCE_PCI_FIXED,
> res->parent != NULL and Enhanced allocation to make a BAR immutable and
> before going any further it would be great if we understand their
> interaction given that we are starting from a pseudo clean slate.

> If we fail to do that it is quirks later (and I would rather avoid
> seeing x86 command line parameters to work around them).

Well.. while I would hate to have to *use* the x86 command line
parameters, I also don't like them not existing on all archs, as they
can actually be useful to help diagnose issues if anything (with of
course the intention of fixing things so they aren't needed in the long
run of course).

Cheers,
Ben.

> Cheers,
> Lorenzo
> 
> > Cheers,
> > Ben.
> > 
> > > Thanks,
> > > Lorenzo
> > > 
> > > > +	ACPI_FREE(obj);
> > > >  
> > > >  	list_for_each_entry(child, &bus->children, node)
> > > >  		pcie_bus_configure_settings(child);
> > > > diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
> > > > index 8082b612f561..62b7fdcc661c 100644
> > > > --- a/include/linux/pci-acpi.h
> > > > +++ b/include/linux/pci-acpi.h
> > > > @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { }
> > > >  #endif
> > > >  
> > > >  extern const guid_t pci_acpi_dsm_guid;
> > > > -#define DEVICE_LABEL_DSM	0x07
> > > > -#define RESET_DELAY_DSM		0x08
> > > > -#define FUNCTION_DELAY_DSM	0x09
> > > > +#define IGNORE_PCI_BOOT_CONFIG_DSM	0x05
> > > > +#define DEVICE_LABEL_DSM		0x07
> > > > +#define RESET_DELAY_DSM			0x08
> > > > +#define FUNCTION_DELAY_DSM		0x09
> > > >  
> > > >  #else	/* CONFIG_ACPI */
> > > >  static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
> > > > 
> > > > 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-12  0:06             ` Benjamin Herrenschmidt
@ 2019-06-12 13:27               ` Bjorn Helgaas
  2019-06-12 21:46                 ` Benjamin Herrenschmidt
  2019-06-12 23:58                 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 27+ messages in thread
From: Bjorn Helgaas @ 2019-06-12 13:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Wed, Jun 12, 2019 at 10:06:06AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2019-06-11 at 18:39 -0500, Bjorn Helgaas wrote:

> > This is fine, but can we make a tiny step toward doing this in generic
> > code instead of adding more arch-specific stuff?
> > 
> > E.g., evaluate the _DSM in the generic acpi_pci_root_add(), set a
> > "preserve_config" bit in the struct acpi_pci_root, and test the bit
> > here?
> 
> I'd rather have the flag in the host bridge no ?

Oh, of course, that would make more sense.

> Talking of which, look at the ongoing discussion I have with Lorenzo
> when it comes to pci_bus_claim_resources vs. what x86 does, I'd love
> for you to chime in. I'd like to try to consolidate things further
> accross architectures but there might be reasons I don't see as to why
> things are different in that area, so ...

I don't know any reasons why things are different per arch.  In most
cases I suspect FUD.

Speaking of which, *this* patch looks like FUD because it essentially
says "Linux shouldn't change the PCI configuration on this system" but
it offers no explanation of *why* the config needs to be preserved.  I
would really like some note like "run-time firmware depends on the
addresses of device X".

Bjorn

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-12 13:27               ` Bjorn Helgaas
@ 2019-06-12 21:46                 ` Benjamin Herrenschmidt
  2019-06-12 23:58                 ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-12 21:46 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Wed, 2019-06-12 at 08:27 -0500, Bjorn Helgaas wrote:
> On Wed, Jun 12, 2019 at 10:06:06AM +1000, Benjamin Herrenschmidt
> wrote:
> > On Tue, 2019-06-11 at 18:39 -0500, Bjorn Helgaas wrote:
> > > This is fine, but can we make a tiny step toward doing this in
> > > generic
> > > code instead of adding more arch-specific stuff?
> > > 
> > > E.g., evaluate the _DSM in the generic acpi_pci_root_add(), set a
> > > "preserve_config" bit in the struct acpi_pci_root, and test the
> > > bit
> > > here?
> > 
> > I'd rather have the flag in the host bridge no ?
> 
> Oh, of course, that would make more sense.

Thinking of this ... I still believe eventually the default should be
to preseve the BIOS config (see ongoing discussions about converging
everybody toward the x86 model), so the flag should be the other way
around.

I'm thinking moving PROBE_ONLY and REASSIGN_ALL_RSRC/BUS to be host
bridge flags (initially copied from the global flags).

To not break things, ARM64 would start setting that 'reassign all' by
default, then we can flip it.

> > Talking of which, look at the ongoing discussion I have with Lorenzo
> > when it comes to pci_bus_claim_resources vs. what x86 does, I'd love
> > for you to chime in. I'd like to try to consolidate things further
> > accross architectures but there might be reasons I don't see as to why
> > things are different in that area, so ...
> 
> I don't know any reasons why things are different per arch.  In most
> cases I suspect FUD.
>
> Speaking of which, *this* patch looks like FUD because it essentially
> says "Linux shouldn't change the PCI configuration on this system" but
> it offers no explanation of *why* the config needs to be preserved.  I
> would really like some note like "run-time firmware depends on the
> addresses of device X".

Oh there are a number of reasons as to why but yes, that's one of them.
I can improve the comments.

That said, I think everybody tends to agree that reassigning everything
by default isn't a great idea, so while I still need something like
this patch in asap, I'll be working towards making that stuff more
common accross archs.

My logic is that x86 is the most tested arch with the widest range of
PCI devices and broken BIOSes. So what works for x86 should work for
others (minus maybe a handful of quirks). So it doesn't make sense to
have the main resource survey logic stuck in arch/x86 and everybody
else growing different things.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-12 10:21                   ` Lorenzo Pieralisi
@ 2019-06-12 22:05                     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-12 22:05 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Ard Biesheuvel, Bjorn Helgaas, Sinan Kaya, linux-pci,
	linux-arm-kernel, Zilberman, Zeev, Saidi, Ali

On Wed, 2019-06-12 at 11:21 +0100, Lorenzo Pieralisi wrote:

> Hrm. We should probably reassign bus numbers if we reassign resources
> > yes, but then I'd like us to not reassign resources unless we have to
> > :-)
> 
> But for that we can use _DSM #5 returning 0, at least we would
> be consistent.

Yes we should be consistent. My personal preference would be to always
honor FW resources by default regardless of DSM, and have DSM = 1 force
a full reassign instead. In a way, by always reassigning we send the
wrong message to FW folks, that it's ok to be broken bcs Linux will fix
it up..

Do you know how Windows deals with this ?

> Current situation is inconsistent and that bothers me I can put
> together a separate patch and send it as an RFT, there are not
> that many ARM64 PCI ACPI platforms to test it on.

Ok.

 ../..

> >  - pci_read_bridge_bases() is done by pci_bus_claim_resources(), while
> > x86 (and powerpc and others) do it in their pcibios_fixup_bus. That one
> > is interesting... Any reason why we shouldn't unconditionally read the
> > bridges while probing ? Bjorn ?
> 
> I tried and failed miserably:
> 
> https://lore.kernel.org/linux-pci/20150916085850.GA17510@red-moon/

Yes, I see... I think we can revive this if we key it off not
reassigning all resources.

There's a PCI flag PCI_REASSIGN_ALL_RSRC that's currently only use on
powerpc, but it wouldn't be hard to make sure it's set on archs that do
a full reassign. We could then have the generic code key off that.

That said, I'd rather have this be a host bridge flag. I'll look into
it later.

> >  - When allocating bridge resources, there are interesting differences:
> > 
> >   * x86 (and powerpc to some extent): If one has a 0 start or we fail
> > to claim it, x86 will wipe out the resource struct (including flags). I
> > assume that pci_assign_unassign_* will restore bridges when needed but
> > I haven't verified. 
> > 
> >   * pci_bus_claim_resources() is dumber in that regard. It will call
> > pci_claim_bridge_resources() blindly try to claim whatever is there
> > even if res->start is 0. This could be a problem with partially
> > assigned trees. It also doesn't wipe the resource in case of failure to
> > claim which could be a problem going down the tree and letting children
> > attach to the non-claimed resource, thus potentially causing the
> > reassign pass to fail.
> > 
> > The r->start == 0 test is interesting ... the generic claim code will
> > honor IORESOURCE_UNSET but we don't seem to set that generically unless
> > we hit some of the specific pass for explicit resource alignment, or
> > during the reassignment phases.
> > 
> >  - When allocating device resources, the main difference other than the
> > 2 passes is that x86 will "0 base" the resource (r->end -= r->start; r-
> > > start = 0) for later reassignment. The claim path we use won't do
> > 
> > that. Note: none sets IORESOURCE_UNSET... Additionally x86 has some
> > oddball code to save the original FW values and restore them if
> > assignment later fails, which is somewhat odd since there's a conflict
> > but probably helps really broken setups.
> > 
> >  - x86 will not claim ROMs in that pass, it does a 3rd pass just for
> > them (it's common I think to not have room for all the ROMs). It also
> > disables them in config space during the survey.
> > pci_bus_claim_resources() will claim everything and leave ROMs enabled.
> > 
> > So as a somewhat temprary conclusion, I think the main difference here
> > is what happens when claim fails (also the res->start = 0 case which we
> > need to look at more closely) and whether we should make the generic
> > code also "0-base" the resource.
> 
> Oh my, res->start == 0, another can of worms. Honestly I do not know
> what to do on that one mostly because we need to figure out how it
> plays with resource assignment code (and legacy stuff, you know the
> drill).

Yes. We have that funny pcibios_uninitialized_bridge_resource() in
arch/powerpc/kernel/pci-common.c which tries to "guess" whether a
bridge with a 0 res->start means that it's uninitialized or has a
"valid" 0 based resource. Among others, we check if memory decoding is
enabled, etc... If we decide it's really uninitialized we set
IORESOURCE_UNSET, and we rely on that later on.

In an idea world, nobody should create valid 0 based resources, it's
best to stay off the first 1MB of PCI space due to various legacy
concerns anyways but ...

> > The question for me really is, do we want to just "upgrade" (if
> > necessary) pci_bus_claim_resources() and continue having x86 do its own
> > thing for ever, or do we want to consolidate around what is probably
> > the most tested platform when it comes to PCI :-)
> 
> Consolidating is the right thing to do, with the caveats above, there
> are many but you have all my support.
> 
> > And if we consolidate, I think that won't be by changing what x86 does,
> > that code is the result of decades of fiddling to get things right with
> > all sorts of broken BIOSes...
> 
> There is 0 chance to change x86 code (and there is 0 chance to change
> core PCI code with x86 assumptions in it).

I wouldn't say 0 but the bar is high yes.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-12 13:27               ` Bjorn Helgaas
  2019-06-12 21:46                 ` Benjamin Herrenschmidt
@ 2019-06-12 23:58                 ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-12 23:58 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Lorenzo Pieralisi, Ard Biesheuvel, linux-pci, Sinan Kaya,
	Zilberman, Zeev, Saidi, Ali, linux-arm-kernel

On Wed, 2019-06-12 at 08:27 -0500, Bjorn Helgaas wrote:
> Speaking of which, *this* patch looks like FUD because it essentially
> says "Linux shouldn't change the PCI configuration on this system" but
> it offers no explanation of *why* the config needs to be preserved.  I
> would really like some note like "run-time firmware depends on the
> addresses of device X".

BTW. the patch doesn't say that. ACPI tells us that :-)

Cheers
Ben.



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-06-13 17:16 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-03 23:41 [RFC] ARM64 PCI resource survey issue(s) Benjamin Herrenschmidt
2019-06-04  1:49 ` Bjorn Helgaas
2019-06-04  3:32   ` Benjamin Herrenschmidt
2019-06-04  3:37     ` Benjamin Herrenschmidt
2019-06-04  6:56     ` Benjamin Herrenschmidt
2019-06-04 12:49     ` Bjorn Helgaas
2019-06-04 20:41       ` Benjamin Herrenschmidt
2019-06-06  9:00         ` [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt
2019-06-06  9:13           ` Ard Biesheuvel
2019-06-06 10:55             ` Benjamin Herrenschmidt
2019-06-11 14:31               ` Lorenzo Pieralisi
2019-06-11 22:09                 ` Benjamin Herrenschmidt
2019-06-11 22:34                   ` Ard Biesheuvel
2019-06-11 22:40                     ` Benjamin Herrenschmidt
2019-06-12 10:21                   ` Lorenzo Pieralisi
2019-06-12 22:05                     ` Benjamin Herrenschmidt
2019-06-11 14:58           ` Lorenzo Pieralisi
2019-06-11 22:19             ` Benjamin Herrenschmidt
2019-06-12 10:08               ` Lorenzo Pieralisi
2019-06-12 10:58                 ` Benjamin Herrenschmidt
2019-06-11 23:39           ` Bjorn Helgaas
2019-06-12  0:06             ` Benjamin Herrenschmidt
2019-06-12 13:27               ` Bjorn Helgaas
2019-06-12 21:46                 ` Benjamin Herrenschmidt
2019-06-12 23:58                 ` Benjamin Herrenschmidt
2019-06-10 10:11         ` [RFC] ARM64 PCI resource survey issue(s) Lorenzo Pieralisi
2019-06-11  5:46           ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).