* [RFC] ARM64 PCI resource survey issue(s) @ 2019-06-03 23:41 Benjamin Herrenschmidt 2019-06-04 1:49 ` Bjorn Helgaas 0 siblings, 1 reply; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-03 23:41 UTC (permalink / raw) To: Ard Biesheuvel Cc: Lorenzo Pieralisi, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, bhelgaas, linux-arm-kernel Hi Folks ! I'd like to revive the discussion around Ard's old patch: https://patchwork.kernel.org/patch/9675707/ We (Amazon) need that sorted one way or another ASAP since we have setups coming where we must not let Linux change the FW assignments under some host bridges. In fact it's a reasonable thing to require for other reasons. The EFI framebuffer is an example, there can be others where FW/ACPI/EL3 etc... might have assumptions based on where some system devices were located by the boot FW and will break if we move them (such things are common on x86 and powerpc). Taking a step back I think (and I suspect we generally agree based on followup discussions I've seen) that the "right" thing to do is to have our default behaviour be: - Claim what the FW established if it's not obviously broken - Call pci_assign_unassigned_resources() to assign what the FW didn't assign Which is more or less what powerpc and x86 do today, but using a different mechanism than what's in pci_bus_claim_resources() (they are similar to each other, I wrote the current powerpc one loosely based on the x86 one at the time). That leads to a side question (which we should probably discuss in a separate thread) of whether we want to consolidate all that. That said, we also know this is going to break *some* existing platforms that have known broken FW assignment. The question is then can we sufficiently detect the breakage and re-assign in those cases (like x86 does fairly successfully in a number of cases), or do we need to add some board/platform quirks to force the full re-assigment on known broken platforms ? Even if all arm64 platforms are found to be broken today, I would still like to have our default be to use the FW setup, if anything as an incentive for newer platforms to get it right. At the very least on ACPI. We can use DSM#5 when it exists to force one way or another (ideally on a per bus basis but that's harder, so let's start with host bridges maybe ?) Thoughts ? I'm happy to do some of the work here. We have an emergency to get the AZ case solved, and Ard old patch does that... Cheers, Ben. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] ARM64 PCI resource survey issue(s) 2019-06-03 23:41 [RFC] ARM64 PCI resource survey issue(s) Benjamin Herrenschmidt @ 2019-06-04 1:49 ` Bjorn Helgaas 2019-06-04 3:32 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 27+ messages in thread From: Bjorn Helgaas @ 2019-06-04 1:49 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Lorenzo Pieralisi, Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, linux-arm-kernel On Tue, Jun 04, 2019 at 09:41:16AM +1000, Benjamin Herrenschmidt wrote: > Hi Folks ! > > I'd like to revive the discussion around Ard's old patch: > > https://patchwork.kernel.org/patch/9675707/ > > We (Amazon) need that sorted one way or another ASAP since we have > setups coming where we must not let Linux change the FW assignments > under some host bridges. > > In fact it's a reasonable thing to require for other reasons. The EFI > framebuffer is an example, there can be others where FW/ACPI/EL3 etc... > might have assumptions based on where some system devices were located > by the boot FW and will break if we move them (such things are common > on x86 and powerpc). I would like to handle these individual devices that cannot be moved the same way we handle legacy (IDE, VGA) devices, i.e., mark the BARs with IORESOURCE_IO_FIXED. This could be done with either Enhanced Allocation capabilities or via ACPI _DSM function #5. My preference would be to do this at the lowest possible level of the PCI hierarchy. IIRC, EA can do it for individual BARs, and _DSM can be supplied for any individual device (or bridge, but I'd prefer to do it on the device because that gives us more information about exactly what needs to be preserved). Of course, _DSM *can* be higher, e.g., at the host bridge, but then we lose the information about what specifically must be immutable, and that means the OS cannot ever move *anything*, even if it becomes capable of moving things around to accommodate hot-added devices. I'm not aware of anything in DT that would correspond to DSM #5, but it could be added. > Taking a step back I think (and I suspect we generally agree based on > followup discussions I've seen) that the "right" thing to do is to have > our default behaviour be: > > - Claim what the FW established if it's not obviously broken > > - Call pci_assign_unassigned_resources() to assign what the FW > didn't assign > > Which is more or less what powerpc and x86 do today, but using a > different mechanism than what's in pci_bus_claim_resources() (they are > similar to each other, I wrote the current powerpc one loosely based on > the x86 one at the time). That leads to a side question (which we > should probably discuss in a separate thread) of whether we want to > consolidate all that. > > That said, we also know this is going to break *some* existing > platforms that have known broken FW assignment. The question is then > can we sufficiently detect the breakage and re-assign in those cases > (like x86 does fairly successfully in a number of cases), or do we need > to add some board/platform quirks to force the full re-assigment on > known broken platforms ? I don't know how to parse this. What does "known broken FW assignment" mean? Are you saying the assignment from FW *looks* valid (all BARs contain valid addresses and are inside windows of upstream bridges), but it doesn't work for some reason? If that's the case, how would full reassignment by Linux fix anything? Linux has no idea how to change a valid-looking assignment to make an actually-valid assignment. > Even if all arm64 platforms are found to be broken today, I would still > like to have our default be to use the FW setup, if anything as an > incentive for newer platforms to get it right. At the very least on > ACPI. I agree that Linux should not change anything unless it needs to. Of course, reasons it "needs to" might include allocating more space for hot-added devices, either because Linux discovered an open slot or because a user requested more space with a kernel parameter. > We can use DSM#5 when it exists to force one way or another (ideally on > a per bus basis but that's harder, so let's start with host bridges > maybe ?) I'd prefer starting with endpoints, but I think it will all work out the same in the end. When enumerating X, we look for a local _DSM #5 and mark X's BARs/windows accordingly and set a pdev->fixed_resources bit. If there's no local _DSM #5, act based on X's parent's fixed_resources bit. Bjorn _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] ARM64 PCI resource survey issue(s) 2019-06-04 1:49 ` Bjorn Helgaas @ 2019-06-04 3:32 ` Benjamin Herrenschmidt 2019-06-04 3:37 ` Benjamin Herrenschmidt ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-04 3:32 UTC (permalink / raw) To: Bjorn Helgaas Cc: Lorenzo Pieralisi, Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, linux-arm-kernel On Mon, 2019-06-03 at 20:49 -0500, Bjorn Helgaas wrote: > On Tue, Jun 04, 2019 at 09:41:16AM +1000, Benjamin Herrenschmidt wrote: > > Hi Folks ! > > > > I'd like to revive the discussion around Ard's old patch: > > > > https://patchwork.kernel.org/patch/9675707/ > > > > We (Amazon) need that sorted one way or another ASAP since we have > > setups coming where we must not let Linux change the FW assignments > > under some host bridges. > > > > In fact it's a reasonable thing to require for other reasons. The EFI > > framebuffer is an example, there can be others where FW/ACPI/EL3 etc... > > might have assumptions based on where some system devices were located > > by the boot FW and will break if we move them (such things are common > > on x86 and powerpc). > > I would like to handle these individual devices that cannot be moved > the same way we handle legacy (IDE, VGA) devices, i.e., mark the BARs > with IORESOURCE_IO_FIXED. A bit more messy but doable. However.... > This could be done with either Enhanced Allocation capabilities or via > ACPI _DSM function #5. My preference would be to do this at the > lowest possible level of the PCI hierarchy. > IIRC, EA can do it for > individual BARs, and _DSM can be supplied for any individual device > (or bridge, but I'd prefer to do it on the device because that gives > us more information about exactly what needs to be preserved). _DSM #5 seems to be working the other way around, it tells us to ignore the FW setting. So the "intent" here is that unless that things is present and says "1", we should just leave things alone as long as they don't conflict. What you seem to want to do is to go a step beyond and if present and 0, force everything to be fixed. I'm not completely comfortable with that approach. Let's see what others think. As fas as bridges vs. individual endpoints, It might have to handle both cases. For example in our case, the security policies will prevent changing the switch windows completely. > Of course, _DSM *can* be higher, e.g., at the host bridge, but then we > lose the information about what specifically must be immutable, and > that means the OS cannot ever move *anything*, even if it becomes > capable of moving things around to accommodate hot-added devices. Well, in our case at least this is a non-issue, we don't want the OS to move anything or change anything and there is no hotplug. That said, the two aren't exclusive. The presence at the host bridge level can be honored, and if absent, we can also honor at a finer granularity. However, as I said above, I'm not completely comfortable with treating _DSM #5 = 0 as meaning "must be fixed". This is not what it means. > I'm not aware of anything in DT that would correspond to DSM #5, but > it could be added. Yes, we could. On DT what we tend to do in those cases on powerpc and sparc is to "manufacture" the pci_dev structures based on the info in the DT, and only use config space to fill the remaining blanks. Let's look at the ACPI issue for now though, we can handle DT later. > > Taking a step back I think (and I suspect we generally agree based on > > followup discussions I've seen) that the "right" thing to do is to have > > our default behaviour be: > > > > - Claim what the FW established if it's not obviously broken > > > > - Call pci_assign_unassigned_resources() to assign what the FW > > didn't assign > > > > Which is more or less what powerpc and x86 do today, but using a > > different mechanism than what's in pci_bus_claim_resources() (they are > > similar to each other, I wrote the current powerpc one loosely based on > > the x86 one at the time). That leads to a side question (which we > > should probably discuss in a separate thread) of whether we want to > > consolidate all that. > > > > That said, we also know this is going to break *some* existing > > platforms that have known broken FW assignment. The question is then > > can we sufficiently detect the breakage and re-assign in those cases > > (like x86 does fairly successfully in a number of cases), or do we need > > to add some board/platform quirks to force the full re-assigment on > > known broken platforms ? > > I don't know how to parse this. What does "known broken FW > assignment" mean? Are you saying the assignment from FW *looks* valid > (all BARs contain valid addresses and are inside windows of upstream > bridges), but it doesn't work for some reason? Yes... I am not personally aware of such a case but I was led to believe based on prior conversations that such setups do exist, especially in the non-ACPI ARM64 world. Which is why I would suggest initially changing the default only on ACPI, at least until we have a bit better visibility. > If that's the case, > how would full reassignment by Linux fix anything? Linux has no idea > how to change a valid-looking assignment to make an actually-valid > assignment. Isn't it exactly what happens today though on arm64 ? We ignore whatever the FW set, and assign everything anew... This is also what we do on powerpc when the corresponding flag is set by the platform. IE. Currently arm64 does: pci_bus_size_bridges(bus); pci_bus_assign_resources(bus); Unconditionally... Or am I missing something ? That code is headache inducing :) There is no attempt at leaving things where they are. There is code to avoid messing with IORESOURCE_IO_FIXED in there but I'm not sure how well that works when it comes to dealing with bridge windows. In comparison, x86 and powerpc have a different mechanism that first surveys existing stuff, blasts away what looks bad/conflicting, and then does something like pci_assign_unassigned_resources() to assign whaetver's left. If we look at Ard's patch, he wants to use pci_bus_claim_resources() instead when the FW doesn't explicitely tells us to leave things alone. I think that needs pci_assign_unassigned_resources() as well, though I am not yet completely familiar with how pci_bus_claim_resources() differs from the x86 and powerpc resource survey. I think pci_bus_claim_resources() is intended for the "PROBE_ONLY" case which is the extreme opposite: we *must* trust everything the FW did and not try to touch anything at all. It would work for us too mind you but we don't have a way to convey that via ACPI since, as I said above, it's not really what _DSM #5 is intended to be. We seem to enjoy creating mostly-duplicate ways of doing things over and over again in PCI land :) > > Even if all arm64 platforms are found to be broken today, I would still > > like to have our default be to use the FW setup, if anything as an > > incentive for newer platforms to get it right. At the very least on > > ACPI. > > I agree that Linux should not change anything unless it needs to. Of > course, reasons it "needs to" might include allocating more space for > hot-added devices, either because Linux discovered an open slot or > because a user requested more space with a kernel parameter. Right and I'm fine with that. It's not an issue for us. > > We can use DSM#5 when it exists to force one way or another (ideally on > > a per bus basis but that's harder, so let's start with host bridges > > maybe ?) > > I'd prefer starting with endpoints, but I think it will all work out > the same in the end. When enumerating X, we look for a local _DSM #5 > and mark X's BARs/windows accordingly and set a pdev->fixed_resources > bit. If there's no local _DSM #5, act based on X's parent's > fixed_resources bit. Lorenzo, Zeev, have you already started looking at doing something this way ? Does it work ? Bjorn, we could do both, I don't see any problem there. Do you see any reason why we shouldn't change the arm64 logic today, at least when ACPI is present, to claim existing resources & assign unassigned one instead of reallocating everything ? And followup question, if the above is yes, will the sequence: pci_bus_claim_resources(b); if (!pci_has_flag(PCI_PROBE_ONLY)) pci_assign_unassigned_resources(b); Do what we want or do we need to replace pci_bus_claim_resources() with something a bit more like what we have on x86 with the 2 pass allocation mechanism ? Finally what about pci_host_probe() ? It seems to also miss pci_assign_unassigned_resources() and will unconditionally reassign everything if PCI_PROBE_ONLY is not set, so it's yet another different logic used by some archs. Cheers, Ben. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] ARM64 PCI resource survey issue(s) 2019-06-04 3:32 ` Benjamin Herrenschmidt @ 2019-06-04 3:37 ` Benjamin Herrenschmidt 2019-06-04 6:56 ` Benjamin Herrenschmidt 2019-06-04 12:49 ` Bjorn Helgaas 2 siblings, 0 replies; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-04 3:37 UTC (permalink / raw) To: Bjorn Helgaas Cc: Lorenzo Pieralisi, Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, linux-arm-kernel On Tue, 2019-06-04 at 13:32 +1000, Benjamin Herrenschmidt wrote: > > > Of course, _DSM *can* be higher, e.g., at the host bridge, but then we > > lose the information about what specifically must be immutable, and > > that means the OS cannot ever move *anything*, even if it becomes > > capable of moving things around to accommodate hot-added devices. > > Well, in our case at least this is a non-issue, we don't want the OS to > move anything or change anything and there is no hotplug. Correction: There is hotplug in the leaf ports, but the FW will have setup the switch with enough space already. Cheers, Ben. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] ARM64 PCI resource survey issue(s) 2019-06-04 3:32 ` Benjamin Herrenschmidt 2019-06-04 3:37 ` Benjamin Herrenschmidt @ 2019-06-04 6:56 ` Benjamin Herrenschmidt 2019-06-04 12:49 ` Bjorn Helgaas 2 siblings, 0 replies; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-04 6:56 UTC (permalink / raw) To: Bjorn Helgaas Cc: Lorenzo Pieralisi, Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, linux-arm-kernel On Tue, 2019-06-04 at 13:32 +1000, Benjamin Herrenschmidt wrote: > > > I would like to handle these individual devices that cannot be moved > > the same way we handle legacy (IDE, VGA) devices, i.e., mark the BARs > > with IORESOURCE_IO_FIXED. > > A bit more messy but doable. However.... Sooo.... I spent some quality whisky getting my head around the current state of setup-bus.c and setup-res.c ... gosh, what a mess ! Anyway, I have some concerns about the use of IORESOURCE_IO_FIXED in the context of an arch like arm64 that just blindly does: pci_bus_size_bridges(bus); pci_bus_assign_resources(bus); Unless I'm missing something (please correct me if I am), this is all extremely fragile and will only work under some very specific circumstances: First, the big issue is that having individual devices with such "Fixed" BAR doesn't work in isolation. There is no chance in hell the code in setup-bus.c will manage to configure the enclosing bridges etc... to accomodate such fixed devices, it would require a major refactoring of our entire resource allocation scheme. pci_bus_size_bridges() does all the calculation for sizing up bridges and ... completely ignores fixed BARs. Later pci_bus_assign_resources() will assign bridges a location, again, completely ignoring children fixed BARs if any. Chances that we successfully land the fixed BARs in the resulting allocation are slim at best. So at the very least, to have a chance of working, if any endpoint has fixed BARs, then all of it's parent chain must also be fixed, because we won't be able to deal with it otherwise, at least not via the "blunt" assignment code path. But there seem to be more damage here (though that one maybe easier to fix) from looking at the code: Unless I'm mistaken in my reading of the code, for a given "level" of the bus tree, __pci_bus_assign_resources() will *first* assign all the non-fixed devices by calling pbus_assign_resources_sorted(), and *then* try to claim the fixed resources of any device at that level using pdev_assign_fixed_resources(). Again, even if we had the parent bridge by design (fixed too) or by chance, covering the space where the fixed BAR is, chances that a sibling non-fixed device will be given that slot before we have a chance to claim it. So in practice, IORESOURCE_IO_FIXED only works if all parents and siblings are also IORESOURCE_IO_FIXED (either that or I just misunderstood the code). The sibling case might be fixable by changing the order inside __pci_bus_assign_resources(). There's more gunk too ... for example for IORESOURCE_IO_FIXED to work on bridges one must call pci_read_bridge_bases() first which the generic code doesn't do. So here comes an arch pcibios_fixup_bus ... yuck. There's also an oddball test about enabled bridges in __pci_bus_assign_resources() that __pci_bridge_assign_resources() doesn't have (yet another almost identical but not quite set of functions just to confuse people) which might make snse I looked at Lorenzo patches in his git repo for trying to deal with _DSM #5 via IORESOURCE_IO_FIXED and I don't like what I see in that context either :) So at this stage, I think we are better off solving the immediate issues of platforms that have good allocations coming from FW by using Ard's old patch at the host brigde level (possibly changing the default in absence of _DSM #5). Taking on step back here and trying to think more long term, I am concerned that we have how many different methods of doing the resource allocation depending on the arch that go through different (sometime majorly, sometimes subltly) path in the generic code, this doesn't seem particularily maintainable... It used to be that pci_bus_assign_resources() basically meant "force reassign everything". If you throw IORESOURCE_IO_FIXED in the mix then it's no longer that. It's now starting to look a bit more (but not quite) like pci_assign_unassigned_bus_resources() except that we don't have a generic 2-pass survey of existing resources like x86 has. We do have pci_bus_claim_resources() but that one seems more targetted as the opposite where we trust everything setup by FW and don't reassign anything. It doesn't seem to have provisions for detecting unsassigned resources unless I missed another bit here. Then we have pci_host_probe() that some archs use (but not arm64), which seems to be yet a different combination, which either trusts firmware completely (pci_bus_claim_resources) or not at all (pci_bus_size_bridges+pci_bus_assign_resources) based on a single global flag (shouldn't it be per-host bridge ?). Cheers, Ben. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] ARM64 PCI resource survey issue(s) 2019-06-04 3:32 ` Benjamin Herrenschmidt 2019-06-04 3:37 ` Benjamin Herrenschmidt 2019-06-04 6:56 ` Benjamin Herrenschmidt @ 2019-06-04 12:49 ` Bjorn Helgaas 2019-06-04 20:41 ` Benjamin Herrenschmidt 2 siblings, 1 reply; 27+ messages in thread From: Bjorn Helgaas @ 2019-06-04 12:49 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Lorenzo Pieralisi, Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, linux-arm-kernel On Tue, Jun 04, 2019 at 01:32:11PM +1000, Benjamin Herrenschmidt wrote: > On Mon, 2019-06-03 at 20:49 -0500, Bjorn Helgaas wrote: > > This could be done with either Enhanced Allocation capabilities or > > via ACPI _DSM function #5. My preference would be to do this at > > the lowest possible level of the PCI hierarchy. IIRC, EA can do > > it for individual BARs, and _DSM can be supplied for any > > individual device (or bridge, but I'd prefer to do it on the > > device because that gives us more information about exactly what > > needs to be preserved). > > _DSM #5 seems to be working the other way around, it tells us to ignore > the FW setting. So the "intent" here is that unless that things is > present and says "1", we should just leave things alone as long as they > don't conflict. I wish you'd been involved in the recent effort to revise the _DSM #5 documentation. The language in PCI FW r3.2 makes the implicit assumption that by default, in the absence of _DSM, the OS must preserve all window and BAR assignments. But nobody has ever been able to come up with a spec reference to support that assumption and I think it is invalid. The ECN under consideration ("Clarifications to _DSM Function 5", March 26, 2019, currently posted for member review at https://members.pcisig.com/wg/PCI-SIG-WG_Members/document/13014) changes some of that language to basically say "if _DSM #5 exists and returns 0, the OS must preserve the settings of this device and its children; otherwise the OS is free to modify things." > What you seem to want to do is to go a step beyond and if present and > 0, force everything to be fixed. I'm not completely comfortable with > that approach. Let's see what others think. I'm not grasping the distinction you're making here. What you describe seems be what _DSM #5 requires. > > Of course, _DSM *can* be higher, e.g., at the host bridge, but then we > > lose the information about what specifically must be immutable, and > > that means the OS cannot ever move *anything*, even if it becomes > > capable of moving things around to accommodate hot-added devices. > > Well, in our case at least this is a non-issue, we don't want the OS to > move anything or change anything and there is no hotplug. > > That said, the two aren't exclusive. The presence at the host bridge > level can be honored, and if absent, we can also honor at a finer > granularity. > > However, as I said above, I'm not completely comfortable with treating > _DSM #5 = 0 as meaning "must be fixed". This is not what it means. The existing language in PCI FW r3.2 is "if _DSM #5 does not exist or it exists and returns 0, the OS must not ignore PCI configuration done by firmware." > > I'm not aware of anything in DT that would correspond to DSM #5, but > > it could be added. > > Yes, we could. On DT what we tend to do in those cases on powerpc and > sparc is to "manufacture" the pci_dev structures based on the info in > the DT, and only use config space to fill the remaining blanks. Let's > look at the ACPI issue for now though, we can handle DT later. > > > > Taking a step back I think (and I suspect we generally agree based on > > > followup discussions I've seen) that the "right" thing to do is to have > > > our default behaviour be: > > > > > > - Claim what the FW established if it's not obviously broken > > > > > > - Call pci_assign_unassigned_resources() to assign what the FW > > > didn't assign > > > > > > Which is more or less what powerpc and x86 do today, but using a > > > different mechanism than what's in pci_bus_claim_resources() (they are > > > similar to each other, I wrote the current powerpc one loosely based on > > > the x86 one at the time). That leads to a side question (which we > > > should probably discuss in a separate thread) of whether we want to > > > consolidate all that. > > > > > > That said, we also know this is going to break *some* existing > > > platforms that have known broken FW assignment. The question is then > > > can we sufficiently detect the breakage and re-assign in those cases > > > (like x86 does fairly successfully in a number of cases), or do we need > > > to add some board/platform quirks to force the full re-assigment on > > > known broken platforms ? > > > > I don't know how to parse this. What does "known broken FW > > assignment" mean? Are you saying the assignment from FW *looks* valid > > (all BARs contain valid addresses and are inside windows of upstream > > bridges), but it doesn't work for some reason? > > Yes... I am not personally aware of such a case but I was led to > believe based on prior conversations that such setups do exist, > especially in the non-ACPI ARM64 world. Which is why I would suggest > initially changing the default only on ACPI, at least until we have a > bit better visibility. If a resource assignment that is valid in terms of all the PCI rules (BARs are valid, BARs are inside upstream bridge windows, etc) doesn't work, we would need more information in order to fix anything. We'd need to know exactly *what* doesn't work and *why* so we can avoid it. The current blanket statement of "reassign everything and hope it works better" is useless. Bjorn _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] ARM64 PCI resource survey issue(s) 2019-06-04 12:49 ` Bjorn Helgaas @ 2019-06-04 20:41 ` Benjamin Herrenschmidt 2019-06-06 9:00 ` [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt 2019-06-10 10:11 ` [RFC] ARM64 PCI resource survey issue(s) Lorenzo Pieralisi 0 siblings, 2 replies; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-04 20:41 UTC (permalink / raw) To: Bjorn Helgaas Cc: Lorenzo Pieralisi, Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, linux-arm-kernel On Tue, 2019-06-04 at 07:49 -0500, Bjorn Helgaas wrote: > > Yes... I am not personally aware of such a case but I was led to > > believe based on prior conversations that such setups do exist, > > especially in the non-ACPI ARM64 world. Which is why I would suggest > > initially changing the default only on ACPI, at least until we have a > > bit better visibility. > > If a resource assignment that is valid in terms of all the PCI rules > (BARs are valid, BARs are inside upstream bridge windows, etc) doesn't > work, we would need more information in order to fix anything. We'd > need to know exactly *what* doesn't work and *why* so we can avoid it. > The current blanket statement of "reassign everything and hope it > works better" is useless. I agree and I assume the problem stems from BIOSes creating either invalid or incomplete assignments but as I said, I don't know for sure as our platforms dont have that problem. Cheers, Ben. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-04 20:41 ` Benjamin Herrenschmidt @ 2019-06-06 9:00 ` Benjamin Herrenschmidt 2019-06-06 9:13 ` Ard Biesheuvel ` (2 more replies) 2019-06-10 10:11 ` [RFC] ARM64 PCI resource survey issue(s) Lorenzo Pieralisi 1 sibling, 3 replies; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-06 9:00 UTC (permalink / raw) To: Bjorn Helgaas Cc: Lorenzo Pieralisi, Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, linux-arm-kernel From: Ard Biesheuvel <ard.biesheuvel@linaro.org> On arm64 ACPI systems, we unconditionally reconfigure the entire PCI hierarchy at boot. This is a departure from what is customary on ACPI systems, and may break assumptions in some places (e.g., EFIFB), that the kernel will leave BARs of enabled PCI devices where they are. Given that PCI already specifies a device specific ACPI method (_DSM) for PCI root bridge nodes that tells us whether the firmware thinks the configuration should be left alone, let's sidestep the entire policy debate about whether the PCI configuration should be preserved or not, and put it under the control of the firmware instead. [BenH: Added pci_assign_unassigned_root_bus_resources()] Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> --- So I would like this variant rather than mucking around with IORESOURCE_PCI_FIXED at this stage to fix the problem with our platforms. See my other email, IORESOURCE_PCI_FIXED doesn't really work terribly well when using pci_bus_size_bridges and pci_bus_assign_resources, and the resulting patches are ugly and add more mess. Long run, I propose to start working on consolidating all those various resource survey mechanisms around what x86 does, unless people strongly object... (with the addition of the probe only and force reassign quirks so platforms can still chose that). Note: I haven't tested the effect of pci_assign_unassigned_root_bus_resources as our platforms don't leave anything unassigned. I'm not entirely sure how well pci_bus_claim_resources() will deal with a partially assigned setup... We do want to support partial assignment by BIOS though, it's a trend to reduce boot time, people seem to want BIOSes to only assign what's critical for booting. Bjorn: I haven't made the claim path the default in absence of _DSM #5 yet. I suggest we do that as a separate patch in case it breaks somebody, thus making bisection more meaningful. It will also make this one more palatable to distros since it won't change the behaviour on systems without _DSM #5, and we verified nobody has it except Seattle which returns 1. arch/arm64/kernel/pci.c | 23 +++++++++++++++++++++-- include/linux/pci-acpi.h | 7 ++++--- 2 files changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c index bb85e2f4603f..6358e1cb4f9f 100644 --- a/arch/arm64/kernel/pci.c +++ b/arch/arm64/kernel/pci.c @@ -168,6 +168,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) struct acpi_pci_generic_root_info *ri; struct pci_bus *bus, *child; struct acpi_pci_root_ops *root_ops; + union acpi_object *obj; ri = kzalloc(sizeof(*ri), GFP_KERNEL); if (!ri) @@ -193,8 +194,26 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) if (!bus) return NULL; - pci_bus_size_bridges(bus); - pci_bus_assign_resources(bus); + /* + * Invoke the PCI device specific method (_DSM) #5 'Ignore PCI Boot + * Configuration', which tells us whether the firmware wants us to + * preserve the configuration of the PCI resource tree for this root + * bridge. + */ + obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1, + IGNORE_PCI_BOOT_CONFIG_DSM, NULL); + if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) { + /* preserve existing resource assignment */ + pci_bus_claim_resources(bus); + + /* Assign anything that might have been left out */ + pci_assign_unassigned_root_bus_resources(bus); + } else { + /* reconfigure the resource tree from scratch */ + pci_bus_size_bridges(bus); + pci_bus_assign_resources(bus); + } + ACPI_FREE(obj); list_for_each_entry(child, &bus->children, node) pcie_bus_configure_settings(child); diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h index 8082b612f561..62b7fdcc661c 100644 --- a/include/linux/pci-acpi.h +++ b/include/linux/pci-acpi.h @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { } #endif extern const guid_t pci_acpi_dsm_guid; -#define DEVICE_LABEL_DSM 0x07 -#define RESET_DELAY_DSM 0x08 -#define FUNCTION_DELAY_DSM 0x09 +#define IGNORE_PCI_BOOT_CONFIG_DSM 0x05 +#define DEVICE_LABEL_DSM 0x07 +#define RESET_DELAY_DSM 0x08 +#define FUNCTION_DELAY_DSM 0x09 #else /* CONFIG_ACPI */ static inline void acpi_pci_add_bus(struct pci_bus *bus) { } _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-06 9:00 ` [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt @ 2019-06-06 9:13 ` Ard Biesheuvel 2019-06-06 10:55 ` Benjamin Herrenschmidt 2019-06-11 14:58 ` Lorenzo Pieralisi 2019-06-11 23:39 ` Bjorn Helgaas 2 siblings, 1 reply; 27+ messages in thread From: Ard Biesheuvel @ 2019-06-06 9:13 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Lorenzo Pieralisi, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel On Thu, 6 Jun 2019 at 11:00, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > From: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > On arm64 ACPI systems, we unconditionally reconfigure the entire PCI > hierarchy at boot. This is a departure from what is customary on ACPI > systems, and may break assumptions in some places (e.g., EFIFB), that > the kernel will leave BARs of enabled PCI devices where they are. > > Given that PCI already specifies a device specific ACPI method (_DSM) > for PCI root bridge nodes that tells us whether the firmware thinks > the configuration should be left alone, let's sidestep the entire > policy debate about whether the PCI configuration should be preserved > or not, and put it under the control of the firmware instead. > > [BenH: Added pci_assign_unassigned_root_bus_resources()] > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > --- > > So I would like this variant rather than mucking around with > IORESOURCE_PCI_FIXED at this stage to fix the problem with our platforms. > > See my other email, IORESOURCE_PCI_FIXED doesn't really work terribly well > when using pci_bus_size_bridges and pci_bus_assign_resources, and the > resulting patches are ugly and add more mess. > > Long run, I propose to start working on consolidating all those various > resource survey mechanisms around what x86 does, unless people strongly > object... (with the addition of the probe only and force reassign quirks > so platforms can still chose that). > > Note: I haven't tested the effect of pci_assign_unassigned_root_bus_resources > as our platforms don't leave anything unassigned. I'm not entirely sure how > well pci_bus_claim_resources() will deal with a partially assigned setup... > > We do want to support partial assignment by BIOS though, it's a trend to > reduce boot time, people seem to want BIOSes to only assign what's critical > for booting. > > Bjorn: I haven't made the claim path the default in absence of _DSM #5 yet. > I suggest we do that as a separate patch in case it breaks somebody, thus > making bisection more meaningful. It will also make this one more palatable > to distros since it won't change the behaviour on systems without _DSM #5, > and we verified nobody has it except Seattle which returns 1. > FYI Seattle is broken in any case since it returns Package(1) rather than just an int. The problem with this patch is that currently, the PCI fw spec permits _DSM #5 everywhere *except* on the host bridge device object itself, and this is in the process of being changed. I will leave it up to the maintainers to decide whether we can take this patch in anticipation of that, even though it doesn't deal with _DSM #5 on nodes anywhere else in the PCIe tree. > arch/arm64/kernel/pci.c | 23 +++++++++++++++++++++-- > include/linux/pci-acpi.h | 7 ++++--- > 2 files changed, 25 insertions(+), 5 deletions(-) > > diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c > index bb85e2f4603f..6358e1cb4f9f 100644 > --- a/arch/arm64/kernel/pci.c > +++ b/arch/arm64/kernel/pci.c > @@ -168,6 +168,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) > struct acpi_pci_generic_root_info *ri; > struct pci_bus *bus, *child; > struct acpi_pci_root_ops *root_ops; > + union acpi_object *obj; > > ri = kzalloc(sizeof(*ri), GFP_KERNEL); > if (!ri) > @@ -193,8 +194,26 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) > if (!bus) > return NULL; > > - pci_bus_size_bridges(bus); > - pci_bus_assign_resources(bus); > + /* > + * Invoke the PCI device specific method (_DSM) #5 'Ignore PCI Boot > + * Configuration', which tells us whether the firmware wants us to > + * preserve the configuration of the PCI resource tree for this root > + * bridge. > + */ > + obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1, > + IGNORE_PCI_BOOT_CONFIG_DSM, NULL); > + if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) { > + /* preserve existing resource assignment */ > + pci_bus_claim_resources(bus); > + > + /* Assign anything that might have been left out */ > + pci_assign_unassigned_root_bus_resources(bus); > + } else { > + /* reconfigure the resource tree from scratch */ > + pci_bus_size_bridges(bus); > + pci_bus_assign_resources(bus); > + } > + ACPI_FREE(obj); > > list_for_each_entry(child, &bus->children, node) > pcie_bus_configure_settings(child); > diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h > index 8082b612f561..62b7fdcc661c 100644 > --- a/include/linux/pci-acpi.h > +++ b/include/linux/pci-acpi.h > @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { } > #endif > > extern const guid_t pci_acpi_dsm_guid; > -#define DEVICE_LABEL_DSM 0x07 > -#define RESET_DELAY_DSM 0x08 > -#define FUNCTION_DELAY_DSM 0x09 > +#define IGNORE_PCI_BOOT_CONFIG_DSM 0x05 > +#define DEVICE_LABEL_DSM 0x07 > +#define RESET_DELAY_DSM 0x08 > +#define FUNCTION_DELAY_DSM 0x09 > > #else /* CONFIG_ACPI */ > static inline void acpi_pci_add_bus(struct pci_bus *bus) { } > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-06 9:13 ` Ard Biesheuvel @ 2019-06-06 10:55 ` Benjamin Herrenschmidt 2019-06-11 14:31 ` Lorenzo Pieralisi 0 siblings, 1 reply; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-06 10:55 UTC (permalink / raw) To: Ard Biesheuvel Cc: Lorenzo Pieralisi, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel On Thu, 2019-06-06 at 11:13 +0200, Ard Biesheuvel wrote: > > Bjorn: I haven't made the claim path the default in absence of _DSM #5 yet. > > I suggest we do that as a separate patch in case it breaks somebody, thus > > making bisection more meaningful. It will also make this one more palatable > > to distros since it won't change the behaviour on systems without _DSM #5, > > and we verified nobody has it except Seattle which returns 1. > > > > FYI Seattle is broken in any case since it returns Package(1) rather > than just an int. Great .... not. Do we care ? > The problem with this patch is that currently, the PCI fw spec permits > _DSM #5 everywhere *except* on the host bridge device object itself, > and this is in the process of being changed. Yes, I'm indirectly aware of that :) > I will leave it up to the maintainers to decide whether we can take > this patch in anticipation of that, even though it doesn't deal with > _DSM #5 on nodes anywhere else in the PCIe tree. Right, so the problem at this point is that dealing with it elsewhere in the tree is very fragile and problematic (see my other messages). Doing it at the host bridge level fixes the immediate problem for us (provided we are ok anticipating the spec update), and doesn't preclude also honoring it for individual devices later on. My thinking is if we converge everybody toward the x86 method of doing a 2 pass survey of existing resources followed by assign_unassigned, and have that the main generic code path (with added quirks to force a full assignment and keeping probe_only around but that's easy, we have that on powerpc and our code is originally based on the x86 one), then we'll have a much easier time supporting IORESOURCE_PCI_FIXED on portions of the tree as well (though it also becomes less critical to do so since we will no longer reallocate unless we have to). That said we need to understand what "fixed" means and why we do it. IE, If an endpoint somehere has "fixed" BARs for example, that means all parent bridge must be setup to enclose that range. Now our allocator for bridge windows cannot handle that and probably never will, so we have to rely on the existing window established by the FW being reasonable and use it. We can still *extend" bridge windows (and we have code to do that) if necessary but we cannot move them if they contain a fixed BAR device. There is a much bigger discussion to be had around that concept of fixed device anyway, maybe at Plumbers ? Why is the BAR fixed ? Because the EFI FB is on it ? Because HW bugs ? Because FW might access it from SMM or ARM equivalent ? Because ACPI will poke at it based on its initial address ? etc... Some of the answers to the above questions imply more than the need to fix the BAR: Does it also mean that disabling access to that BAR, even temporarily, isn't safe ? However that's what we do today when we probe, if anything, to do the BAR sizing... This isn't a new problem. We had issues like that dating back 15 years on powerpc for example, where a big ASIC hanging off PCI had all the Apple gunk including the interrupt controller, which was initialized from the DT way before PCI probing. If you took an interrupt at the "wrong" time during BAR sizing, kaboom ! If you had debug printk's in the wrong place in the PCI probing code, kaboom ! etc.... If we want to solve that properly in the long run, we'll probably want ACPI to tell us the BAR sizes and use that instead of doing manual sizing on such "system" devices. We similarily have ways to "construct" pci_dev's from the OF tree on sparc64 and powerpc, limiting direct config access to populate stuff we can't get from FW. Cheers, Ben. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-06 10:55 ` Benjamin Herrenschmidt @ 2019-06-11 14:31 ` Lorenzo Pieralisi 2019-06-11 22:09 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 27+ messages in thread From: Lorenzo Pieralisi @ 2019-06-11 14:31 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel On Thu, Jun 06, 2019 at 08:55:07PM +1000, Benjamin Herrenschmidt wrote: > On Thu, 2019-06-06 at 11:13 +0200, Ard Biesheuvel wrote: > > > Bjorn: I haven't made the claim path the default in absence of _DSM #5 yet. > > > I suggest we do that as a separate patch in case it breaks somebody, thus > > > making bisection more meaningful. It will also make this one more palatable > > > to distros since it won't change the behaviour on systems without _DSM #5, > > > and we verified nobody has it except Seattle which returns 1. > > > > > > > FYI Seattle is broken in any case since it returns Package(1) rather > > than just an int. > > Great .... not. Do we care ? > > > The problem with this patch is that currently, the PCI fw spec permits > > _DSM #5 everywhere *except* on the host bridge device object itself, > > and this is in the process of being changed. > > Yes, I'm indirectly aware of that :) > > > I will leave it up to the maintainers to decide whether we can take > > this patch in anticipation of that, even though it doesn't deal with > > _DSM #5 on nodes anywhere else in the PCIe tree. > > Right, so the problem at this point is that dealing with it elsewhere > in the tree is very fragile and problematic (see my other messages). > Doing it at the host bridge level fixes the immediate problem for us > (provided we are ok anticipating the spec update), and doesn't preclude > also honoring it for individual devices later on. True, minus specs update schedule, I can't change that and merging this patch (and firmware thereof) relies on specifications that are intent changes till they become an ECN (~another merge window, so this patch could land at v5.4). The other option is doing what this patch does *without* relying on _DSM #5, we may have regressions unfortunately though. It is kind of orthogonal (but not really), bus numbers assignment is _not_ in line with resource assignment at the moment and I want to change it. Since ACPI on ARM64 is still at its inception maybe we should have a stab at patching the kernel so that it reassigns bus numbers by default and toggle that behaviour on _DSM #5 == 0 detection. I doubt that reassigning bus numbers by default can trigger regressions on existing platforms but the only way to figure it out is by testing it. > My thinking is if we converge everybody toward the x86 method of doing > a 2 pass survey of existing resources followed by assign_unassigned, I am not entirely sure we need a 2-pass survey, pci_bus_claim_resources() should be enough; if it is not we update it. > and have that the main generic code path (with added quirks to force a > full assignment and keeping probe_only around but that's easy, we have > that on powerpc and our code is originally based on the x86 one), then > we'll have a much easier time supporting IORESOURCE_PCI_FIXED on > portions of the tree as well (though it also becomes less critical to > do so since we will no longer reallocate unless we have to). > > That said we need to understand what "fixed" means and why we do it. Agree, totally and I want to make it clear how a BAR is fixed in the kernel, there are too many discrepancies in the resource management code already. > IE, If an endpoint somehere has "fixed" BARs for example, that means > all parent bridge must be setup to enclose that range. > > Now our allocator for bridge windows cannot handle that and probably > never will, so we have to rely on the existing window established by > the FW being reasonable and use it. We can still *extend" bridge > windows (and we have code to do that) if necessary but we cannot move > them if they contain a fixed BAR device. > > There is a much bigger discussion to be had around that concept of > fixed device anyway, maybe at Plumbers ? Why is the BAR fixed ? Because > the EFI FB is on it ? Because HW bugs ? Because FW might access it from > SMM or ARM equivalent ? Because ACPI will poke at it based on its > initial address ? etc... Consider a slot booked at LPC PCI uconf for this discussion. > Some of the answers to the above questions imply more than the need to > fix the BAR: Does it also mean that disabling access to that BAR, even > temporarily, isn't safe ? However that's what we do today when we > probe, if anything, to do the BAR sizing... Eh, another question that came up already should be debated. > This isn't a new problem. We had issues like that dating back 15 years > on powerpc for example, where a big ASIC hanging off PCI had all the > Apple gunk including the interrupt controller, which was initialized > from the DT way before PCI probing. If you took an interrupt at the > "wrong" time during BAR sizing, kaboom ! If you had debug printk's in > the wrong place in the PCI probing code, kaboom ! etc.... > > If we want to solve that properly in the long run, we'll probably want > ACPI to tell us the BAR sizes and use that instead of doing manual > sizing on such "system" devices. We similarily have ways to "construct" > pci_dev's from the OF tree on sparc64 and powerpc, limiting direct > config access to populate stuff we can't get from FW. https://lore.kernel.org/linux-pci/20190121174225.15835-1-mr.nuke.me@gmail.com/ ? _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-11 14:31 ` Lorenzo Pieralisi @ 2019-06-11 22:09 ` Benjamin Herrenschmidt 2019-06-11 22:34 ` Ard Biesheuvel 2019-06-12 10:21 ` Lorenzo Pieralisi 0 siblings, 2 replies; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-11 22:09 UTC (permalink / raw) To: Lorenzo Pieralisi Cc: Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel On Tue, 2019-06-11 at 15:31 +0100, Lorenzo Pieralisi wrote: > > True, minus specs update schedule, I can't change that and merging > this patch (and firmware thereof) relies on specifications that > are intent changes till they become an ECN (~another merge window, > so this patch could land at v5.4). Hrm... annoying for us but I understand your reasoning. > The other option is doing what this patch does *without* relying > on _DSM #5, we may have regressions unfortunately though. We could work around regressions with quirks I suppose. It does make sense to assume that if you have ACPI and UEFI, you have a decent PCI BAR assignment at boot in the "general case". That said, we need to double check first that pci_bus_claim_resources() will not do horrible things on partially assigned setups, since there's a real interest in doing that in the field. > It is kind of orthogonal (but not really), bus numbers assignment > is _not_ in line with resource assignment at the moment and I want > to change it. Hrm. We should probably reassign bus numbers if we reassign resources yes, but then I'd like us to not reassign resources unless we have to :-) > Since ACPI on ARM64 is still at its inception maybe we should have > a stab at patching the kernel so that it reassigns bus numbers by > default and toggle that behaviour on _DSM #5 == 0 detection. > > I doubt that reassigning bus numbers by default can trigger > regressions on existing platforms but the only way to figure > it out is by testing it. > > > My thinking is if we converge everybody toward the x86 method of > > doing > > a 2 pass survey of existing resources followed by > > assign_unassigned, > > I am not entirely sure we need a 2-pass survey, > > pci_bus_claim_resources() > > should be enough; if it is not we update it. So it's not so much about the 2 passes per-se, though they have value, it's more about consolidating archs to do the same thing. Chances that we change x86 are nil. But we can change powerpc and arm64 to do like x86 and move that code to generic. pci_bus_claim_resources() seems to be a "lightweight" variant of the survey done by x86. The main differences I can see are: - The 2 passes thing which we may or may not care about, its main purpose is to favor resources that are already enabled by the BIOS in case of conflicts as far as I understand. - pci_read_bridge_bases() is done by pci_bus_claim_resources(), while x86 (and powerpc and others) do it in their pcibios_fixup_bus. That one is interesting... Any reason why we shouldn't unconditionally read the bridges while probing ? Bjorn ? - When allocating bridge resources, there are interesting differences: * x86 (and powerpc to some extent): If one has a 0 start or we fail to claim it, x86 will wipe out the resource struct (including flags). I assume that pci_assign_unassign_* will restore bridges when needed but I haven't verified. * pci_bus_claim_resources() is dumber in that regard. It will call pci_claim_bridge_resources() blindly try to claim whatever is there even if res->start is 0. This could be a problem with partially assigned trees. It also doesn't wipe the resource in case of failure to claim which could be a problem going down the tree and letting children attach to the non-claimed resource, thus potentially causing the reassign pass to fail. The r->start == 0 test is interesting ... the generic claim code will honor IORESOURCE_UNSET but we don't seem to set that generically unless we hit some of the specific pass for explicit resource alignment, or during the reassignment phases. - When allocating device resources, the main difference other than the 2 passes is that x86 will "0 base" the resource (r->end -= r->start; r- >start = 0) for later reassignment. The claim path we use won't do that. Note: none sets IORESOURCE_UNSET... Additionally x86 has some oddball code to save the original FW values and restore them if assignment later fails, which is somewhat odd since there's a conflict but probably helps really broken setups. - x86 will not claim ROMs in that pass, it does a 3rd pass just for them (it's common I think to not have room for all the ROMs). It also disables them in config space during the survey. pci_bus_claim_resources() will claim everything and leave ROMs enabled. So as a somewhat temprary conclusion, I think the main difference here is what happens when claim fails (also the res->start = 0 case which we need to look at more closely) and whether we should make the generic code also "0-base" the resource. The question for me really is, do we want to just "upgrade" (if necessary) pci_bus_claim_resources() and continue having x86 do its own thing for ever, or do we want to consolidate around what is probably the most tested platform when it comes to PCI :-) And if we consolidate, I think that won't be by changing what x86 does, that code is the result of decades of fiddling to get things right with all sorts of broken BIOSes... > > and have that the main generic code path (with added quirks to force a > > full assignment and keeping probe_only around but that's easy, we have > > that on powerpc and our code is originally based on the x86 one), then > > we'll have a much easier time supporting IORESOURCE_PCI_FIXED on > > portions of the tree as well (though it also becomes less critical to > > do so since we will no longer reallocate unless we have to). > > > > That said we need to understand what "fixed" means and why we do it. > > Agree, totally and I want to make it clear how a BAR is fixed in > the kernel, there are too many discrepancies in the resource > management code already. > > > IE, If an endpoint somehere has "fixed" BARs for example, that means > > all parent bridge must be setup to enclose that range. > > > > Now our allocator for bridge windows cannot handle that and probably > > never will, so we have to rely on the existing window established by > > the FW being reasonable and use it. We can still *extend" bridge > > windows (and we have code to do that) if necessary but we cannot move > > them if they contain a fixed BAR device. > > > > There is a much bigger discussion to be had around that concept of > > fixed device anyway, maybe at Plumbers ? Why is the BAR fixed ? Because > > the EFI FB is on it ? Because HW bugs ? Because FW might access it from > > SMM or ARM equivalent ? Because ACPI will poke at it based on its > > initial address ? etc... > > Consider a slot booked at LPC PCI uconf for this discussion. Excellent. > > Some of the answers to the above questions imply more than the need to > > fix the BAR: Does it also mean that disabling access to that BAR, even > > temporarily, isn't safe ? However that's what we do today when we > > probe, if anything, to do the BAR sizing... > > Eh, another question that came up already should be debated. Yup. > > This isn't a new problem. We had issues like that dating back 15 years > > on powerpc for example, where a big ASIC hanging off PCI had all the > > Apple gunk including the interrupt controller, which was initialized > > from the DT way before PCI probing. If you took an interrupt at the > > "wrong" time during BAR sizing, kaboom ! If you had debug printk's in > > the wrong place in the PCI probing code, kaboom ! etc.... > > > > If we want to solve that properly in the long run, we'll probably want > > ACPI to tell us the BAR sizes and use that instead of doing manual > > sizing on such "system" devices. We similarily have ways to "construct" > > pci_dev's from the OF tree on sparc64 and powerpc, limiting direct > > config access to populate stuff we can't get from FW. > > https://lore.kernel.org/linux-pci/20190121174225.15835-1-mr.nuke.me@gmail.com/ > > ? Ah I don't know enough about ACPI yet, on my reading list :-) Cheers, Ben. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-11 22:09 ` Benjamin Herrenschmidt @ 2019-06-11 22:34 ` Ard Biesheuvel 2019-06-11 22:40 ` Benjamin Herrenschmidt 2019-06-12 10:21 ` Lorenzo Pieralisi 1 sibling, 1 reply; 27+ messages in thread From: Ard Biesheuvel @ 2019-06-11 22:34 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Lorenzo Pieralisi, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel ' On Wed, 12 Jun 2019 at 00:09, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > On Tue, 2019-06-11 at 15:31 +0100, Lorenzo Pieralisi wrote: > > > > True, minus specs update schedule, I can't change that and merging > > this patch (and firmware thereof) relies on specifications that > > are intent changes till they become an ECN (~another merge window, > > so this patch could land at v5.4). > > Hrm... annoying for us but I understand your reasoning. > > > The other option is doing what this patch does *without* relying > > on _DSM #5, we may have regressions unfortunately though. > > We could work around regressions with quirks I suppose. It does make > sense to assume that if you have ACPI and UEFI, you have a decent PCI > BAR assignment at boot in the "general case". That said, we need to > double check first that pci_bus_claim_resources() will not do horrible > things on partially assigned setups, since there's a real interest in > doing that in the field. > EDK2 based code is typically very fork heavy, in the sense that, instead of upstreaming a change, a driver gets forked and changes are applied locally, which then need to be carried into perpetuity. That means that 'recent' ports could still display behavior that was removed from the generic code a long time ago. All the open source arm64 platforms now use the generic PCI host bridge driver (which is in charge of the bus enumeration and resource allocation) and so hopefully, future platforms will not deviate too much from that. In particular, EDK2 has some PCD tunables for things like PCIe hotplug and SR-IOV support, which affects the number of spare buses that get allocated for hotplug capable root ports, and for SR-IOV capable endpoints. As Lorenzo mentions, we don't actively reassign bus numbers from scratch, but I am not sure if that is 100% true. I think you do get some errors when booting with hotplug capable root ports that don't have 'pci_hotplug_bus_size' spare bus numbers available. Also note that EDK2 leaves ROM BARs unassigned. > > It is kind of orthogonal (but not really), bus numbers assignment > > is _not_ in line with resource assignment at the moment and I want > > to change it. > > Hrm. We should probably reassign bus numbers if we reassign resources > yes, but then I'd like us to not reassign resources unless we have to > :-) > > > Since ACPI on ARM64 is still at its inception maybe we should have > > a stab at patching the kernel so that it reassigns bus numbers by > > default and toggle that behaviour on _DSM #5 == 0 detection. > > > > I doubt that reassigning bus numbers by default can trigger > > regressions on existing platforms but the only way to figure > > it out is by testing it. > > > > > My thinking is if we converge everybody toward the x86 method of > > > doing > > > a 2 pass survey of existing resources followed by > > > assign_unassigned, > > > > I am not entirely sure we need a 2-pass survey, > > > > pci_bus_claim_resources() > > > > should be enough; if it is not we update it. > > So it's not so much about the 2 passes per-se, though they have value, > it's more about consolidating archs to do the same thing. Chances that > we change x86 are nil. But we can change powerpc and arm64 to do like > x86 and move that code to generic. > > pci_bus_claim_resources() seems to be a "lightweight" variant of the > survey done by x86. The main differences I can see are: > > - The 2 passes thing which we may or may not care about, its main > purpose is to favor resources that are already enabled by the BIOS in > case of conflicts as far as I understand. > > - pci_read_bridge_bases() is done by pci_bus_claim_resources(), while > x86 (and powerpc and others) do it in their pcibios_fixup_bus. That one > is interesting... Any reason why we shouldn't unconditionally read the > bridges while probing ? Bjorn ? > > - When allocating bridge resources, there are interesting differences: > > * x86 (and powerpc to some extent): If one has a 0 start or we fail > to claim it, x86 will wipe out the resource struct (including flags). I > assume that pci_assign_unassign_* will restore bridges when needed but > I haven't verified. > > * pci_bus_claim_resources() is dumber in that regard. It will call > pci_claim_bridge_resources() blindly try to claim whatever is there > even if res->start is 0. This could be a problem with partially > assigned trees. It also doesn't wipe the resource in case of failure to > claim which could be a problem going down the tree and letting children > attach to the non-claimed resource, thus potentially causing the > reassign pass to fail. > > The r->start == 0 test is interesting ... the generic claim code will > honor IORESOURCE_UNSET but we don't seem to set that generically unless > we hit some of the specific pass for explicit resource alignment, or > during the reassignment phases. > > - When allocating device resources, the main difference other than the > 2 passes is that x86 will "0 base" the resource (r->end -= r->start; r- > >start = 0) for later reassignment. The claim path we use won't do > that. Note: none sets IORESOURCE_UNSET... Additionally x86 has some > oddball code to save the original FW values and restore them if > assignment later fails, which is somewhat odd since there's a conflict > but probably helps really broken setups. > > - x86 will not claim ROMs in that pass, it does a 3rd pass just for > them (it's common I think to not have room for all the ROMs). It also > disables them in config space during the survey. > pci_bus_claim_resources() will claim everything and leave ROMs enabled. > > So as a somewhat temprary conclusion, I think the main difference here > is what happens when claim fails (also the res->start = 0 case which we > need to look at more closely) and whether we should make the generic > code also "0-base" the resource. > > The question for me really is, do we want to just "upgrade" (if > necessary) pci_bus_claim_resources() and continue having x86 do its own > thing for ever, or do we want to consolidate around what is probably > the most tested platform when it comes to PCI :-) > > And if we consolidate, I think that won't be by changing what x86 does, > that code is the result of decades of fiddling to get things right with > all sorts of broken BIOSes... > > > > and have that the main generic code path (with added quirks to force a > > > full assignment and keeping probe_only around but that's easy, we have > > > that on powerpc and our code is originally based on the x86 one), then > > > we'll have a much easier time supporting IORESOURCE_PCI_FIXED on > > > portions of the tree as well (though it also becomes less critical to > > > do so since we will no longer reallocate unless we have to). > > > > > > That said we need to understand what "fixed" means and why we do it. > > > > Agree, totally and I want to make it clear how a BAR is fixed in > > the kernel, there are too many discrepancies in the resource > > management code already. > > > > > IE, If an endpoint somehere has "fixed" BARs for example, that means > > > all parent bridge must be setup to enclose that range. > > > > > > Now our allocator for bridge windows cannot handle that and probably > > > never will, so we have to rely on the existing window established by > > > the FW being reasonable and use it. We can still *extend" bridge > > > windows (and we have code to do that) if necessary but we cannot move > > > them if they contain a fixed BAR device. > > > > > > There is a much bigger discussion to be had around that concept of > > > fixed device anyway, maybe at Plumbers ? Why is the BAR fixed ? Because > > > the EFI FB is on it ? Because HW bugs ? Because FW might access it from > > > SMM or ARM equivalent ? Because ACPI will poke at it based on its > > > initial address ? etc... > > > > Consider a slot booked at LPC PCI uconf for this discussion. > > Excellent. > > > > Some of the answers to the above questions imply more than the need to > > > fix the BAR: Does it also mean that disabling access to that BAR, even > > > temporarily, isn't safe ? However that's what we do today when we > > > probe, if anything, to do the BAR sizing... > > > > Eh, another question that came up already should be debated. > > Yup. > > > > This isn't a new problem. We had issues like that dating back 15 years > > > on powerpc for example, where a big ASIC hanging off PCI had all the > > > Apple gunk including the interrupt controller, which was initialized > > > from the DT way before PCI probing. If you took an interrupt at the > > > "wrong" time during BAR sizing, kaboom ! If you had debug printk's in > > > the wrong place in the PCI probing code, kaboom ! etc.... > > > > > > If we want to solve that properly in the long run, we'll probably want > > > ACPI to tell us the BAR sizes and use that instead of doing manual > > > sizing on such "system" devices. We similarily have ways to "construct" > > > pci_dev's from the OF tree on sparc64 and powerpc, limiting direct > > > config access to populate stuff we can't get from FW. > > > > https://lore.kernel.org/linux-pci/20190121174225.15835-1-mr.nuke.me@gmail.com/ > > > > ? > > Ah I don't know enough about ACPI yet, on my reading list :-) > > Cheers, > Ben. > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-11 22:34 ` Ard Biesheuvel @ 2019-06-11 22:40 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-11 22:40 UTC (permalink / raw) To: Ard Biesheuvel Cc: Lorenzo Pieralisi, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel On Wed, 2019-06-12 at 00:34 +0200, Ard Biesheuvel wrote: > EDK2 based code is typically very fork heavy, in the sense that, > instead of upstreaming a change, a driver gets forked and changes are > applied locally, which then need to be carried into perpetuity. That > means that 'recent' ports could still display behavior that was > removed from the generic code a long time ago. All the open source > arm64 platforms now use the generic PCI host bridge driver (which is > in charge of the bus enumeration and resource allocation) and so > hopefully, future platforms will not deviate too much from that. > > In particular, EDK2 has some PCD tunables for things like PCIe > hotplug > and SR-IOV support, which affects the number of spare buses that get > allocated for hotplug capable root ports, and for SR-IOV capable > endpoints. > > As Lorenzo mentions, we don't actively reassign bus numbers from > scratch, but I am not sure if that is 100% true. I think you do get > some errors when booting with hotplug capable root ports that don't > have 'pci_hotplug_bus_size' spare bus numbers available. > > Also note that EDK2 leaves ROM BARs unassigned. This is all somewhat reasonable. x86 is in the same situation which is why I'm really keen on trying to consolidate the two approaches. > > > It is kind of orthogonal (but not really), bus numbers assignment > > > is _not_ in line with resource assignment at the moment and I > > > want > > > to change it. > > > > Hrm. We should probably reassign bus numbers if we reassign > > resources > > yes, but then I'd like us to not reassign resources unless we have > > to > > :-) > > > > > Since ACPI on ARM64 is still at its inception maybe we should > > > have > > > a stab at patching the kernel so that it reassigns bus numbers by > > > default and toggle that behaviour on _DSM #5 == 0 detection. > > > > > > I doubt that reassigning bus numbers by default can trigger > > > regressions on existing platforms but the only way to figure > > > it out is by testing it. > > > > > > > My thinking is if we converge everybody toward the x86 method > > > > of > > > > doing > > > > a 2 pass survey of existing resources followed by > > > > assign_unassigned, > > > > > > I am not entirely sure we need a 2-pass survey, > > > > > > pci_bus_claim_resources() > > > > > > should be enough; if it is not we update it. > > > > So it's not so much about the 2 passes per-se, though they have > > value, > > it's more about consolidating archs to do the same thing. Chances > > that > > we change x86 are nil. But we can change powerpc and arm64 to do > > like > > x86 and move that code to generic. > > > > pci_bus_claim_resources() seems to be a "lightweight" variant of > > the > > survey done by x86. The main differences I can see are: > > > > - The 2 passes thing which we may or may not care about, its main > > purpose is to favor resources that are already enabled by the BIOS > > in > > case of conflicts as far as I understand. > > > > - pci_read_bridge_bases() is done by pci_bus_claim_resources(), > > while > > x86 (and powerpc and others) do it in their pcibios_fixup_bus. That > > one > > is interesting... Any reason why we shouldn't unconditionally read > > the > > bridges while probing ? Bjorn ? > > > > - When allocating bridge resources, there are interesting > > differences: > > > > * x86 (and powerpc to some extent): If one has a 0 start or we > > fail > > to claim it, x86 will wipe out the resource struct (including > > flags). I > > assume that pci_assign_unassign_* will restore bridges when needed > > but > > I haven't verified. > > > > * pci_bus_claim_resources() is dumber in that regard. It will > > call > > pci_claim_bridge_resources() blindly try to claim whatever is there > > even if res->start is 0. This could be a problem with partially > > assigned trees. It also doesn't wipe the resource in case of > > failure to > > claim which could be a problem going down the tree and letting > > children > > attach to the non-claimed resource, thus potentially causing the > > reassign pass to fail. > > > > The r->start == 0 test is interesting ... the generic claim code > > will > > honor IORESOURCE_UNSET but we don't seem to set that generically > > unless > > we hit some of the specific pass for explicit resource alignment, > > or > > during the reassignment phases. > > > > - When allocating device resources, the main difference other than > > the > > 2 passes is that x86 will "0 base" the resource (r->end -= r- > > >start; r- > > > start = 0) for later reassignment. The claim path we use won't do > > > > that. Note: none sets IORESOURCE_UNSET... Additionally x86 has some > > oddball code to save the original FW values and restore them if > > assignment later fails, which is somewhat odd since there's a > > conflict > > but probably helps really broken setups. > > > > - x86 will not claim ROMs in that pass, it does a 3rd pass just > > for > > them (it's common I think to not have room for all the ROMs). It > > also > > disables them in config space during the survey. > > pci_bus_claim_resources() will claim everything and leave ROMs > > enabled. > > > > So as a somewhat temprary conclusion, I think the main difference > > here > > is what happens when claim fails (also the res->start = 0 case > > which we > > need to look at more closely) and whether we should make the > > generic > > code also "0-base" the resource. > > > > The question for me really is, do we want to just "upgrade" (if > > necessary) pci_bus_claim_resources() and continue having x86 do its > > own > > thing for ever, or do we want to consolidate around what is > > probably > > the most tested platform when it comes to PCI :-) > > > > And if we consolidate, I think that won't be by changing what x86 > > does, > > that code is the result of decades of fiddling to get things right > > with > > all sorts of broken BIOSes... > > > > > > and have that the main generic code path (with added quirks to > > > > force a > > > > full assignment and keeping probe_only around but that's easy, > > > > we have > > > > that on powerpc and our code is originally based on the x86 > > > > one), then > > > > we'll have a much easier time supporting IORESOURCE_PCI_FIXED > > > > on > > > > portions of the tree as well (though it also becomes less > > > > critical to > > > > do so since we will no longer reallocate unless we have to). > > > > > > > > That said we need to understand what "fixed" means and why we > > > > do it. > > > > > > Agree, totally and I want to make it clear how a BAR is fixed in > > > the kernel, there are too many discrepancies in the resource > > > management code already. > > > > > > > IE, If an endpoint somehere has "fixed" BARs for example, that > > > > means > > > > all parent bridge must be setup to enclose that range. > > > > > > > > Now our allocator for bridge windows cannot handle that and > > > > probably > > > > never will, so we have to rely on the existing window > > > > established by > > > > the FW being reasonable and use it. We can still *extend" > > > > bridge > > > > windows (and we have code to do that) if necessary but we > > > > cannot move > > > > them if they contain a fixed BAR device. > > > > > > > > There is a much bigger discussion to be had around that concept > > > > of > > > > fixed device anyway, maybe at Plumbers ? Why is the BAR fixed ? > > > > Because > > > > the EFI FB is on it ? Because HW bugs ? Because FW might access > > > > it from > > > > SMM or ARM equivalent ? Because ACPI will poke at it based on > > > > its > > > > initial address ? etc... > > > > > > Consider a slot booked at LPC PCI uconf for this discussion. > > > > Excellent. > > > > > > Some of the answers to the above questions imply more than the > > > > need to > > > > fix the BAR: Does it also mean that disabling access to that > > > > BAR, even > > > > temporarily, isn't safe ? However that's what we do today when > > > > we > > > > probe, if anything, to do the BAR sizing... > > > > > > Eh, another question that came up already should be debated. > > > > Yup. > > > > > > This isn't a new problem. We had issues like that dating back > > > > 15 years > > > > on powerpc for example, where a big ASIC hanging off PCI had > > > > all the > > > > Apple gunk including the interrupt controller, which was > > > > initialized > > > > from the DT way before PCI probing. If you took an interrupt at > > > > the > > > > "wrong" time during BAR sizing, kaboom ! If you had debug > > > > printk's in > > > > the wrong place in the PCI probing code, kaboom ! etc.... > > > > > > > > If we want to solve that properly in the long run, we'll > > > > probably want > > > > ACPI to tell us the BAR sizes and use that instead of doing > > > > manual > > > > sizing on such "system" devices. We similarily have ways to > > > > "construct" > > > > pci_dev's from the OF tree on sparc64 and powerpc, limiting > > > > direct > > > > config access to populate stuff we can't get from FW. > > > > > > https://lore.kernel.org/linux-pci/20190121174225.15835-1-mr.nuke.me@gmail.com/ > > > > > > ? > > > > Ah I don't know enough about ACPI yet, on my reading list :-) > > > > Cheers, > > Ben. > > > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-11 22:09 ` Benjamin Herrenschmidt 2019-06-11 22:34 ` Ard Biesheuvel @ 2019-06-12 10:21 ` Lorenzo Pieralisi 2019-06-12 22:05 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 27+ messages in thread From: Lorenzo Pieralisi @ 2019-06-12 10:21 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel On Wed, Jun 12, 2019 at 08:09:01AM +1000, Benjamin Herrenschmidt wrote: > On Tue, 2019-06-11 at 15:31 +0100, Lorenzo Pieralisi wrote: > > > > True, minus specs update schedule, I can't change that and merging > > this patch (and firmware thereof) relies on specifications that > > are intent changes till they become an ECN (~another merge window, > > so this patch could land at v5.4). > > Hrm... annoying for us but I understand your reasoning. If we can wait it is better, also because it gives us time to bring this discussion to completion. > > The other option is doing what this patch does *without* relying > > on _DSM #5, we may have regressions unfortunately though. > > We could work around regressions with quirks I suppose. It does make > sense to assume that if you have ACPI and UEFI, you have a decent PCI > BAR assignment at boot in the "general case". That said, we need to > double check first that pci_bus_claim_resources() will not do horrible > things on partially assigned setups, since there's a real interest in > doing that in the field. > > > It is kind of orthogonal (but not really), bus numbers assignment > > is _not_ in line with resource assignment at the moment and I want > > to change it. > > Hrm. We should probably reassign bus numbers if we reassign resources > yes, but then I'd like us to not reassign resources unless we have to > :-) But for that we can use _DSM #5 returning 0, at least we would be consistent. Current situation is inconsistent and that bothers me I can put together a separate patch and send it as an RFT, there are not that many ARM64 PCI ACPI platforms to test it on. > > a stab at patching the kernel so that it reassigns bus numbers by > > default and toggle that behaviour on _DSM #5 == 0 detection. > > > > I doubt that reassigning bus numbers by default can trigger > > regressions on existing platforms but the only way to figure > > it out is by testing it. > > > > > My thinking is if we converge everybody toward the x86 method of > > > doing > > > a 2 pass survey of existing resources followed by > > > assign_unassigned, > > > > I am not entirely sure we need a 2-pass survey, > > > > pci_bus_claim_resources() > > > > should be enough; if it is not we update it. > > So it's not so much about the 2 passes per-se, though they have value, > it's more about consolidating archs to do the same thing. Chances that > we change x86 are nil. But we can change powerpc and arm64 to do like > x86 and move that code to generic. Agree on that. > pci_bus_claim_resources() seems to be a "lightweight" variant of the > survey done by x86. The main differences I can see are: > > - The 2 passes thing which we may or may not care about, its main > purpose is to favor resources that are already enabled by the BIOS in > case of conflicts as far as I understand. Yes. > - pci_read_bridge_bases() is done by pci_bus_claim_resources(), while > x86 (and powerpc and others) do it in their pcibios_fixup_bus. That one > is interesting... Any reason why we shouldn't unconditionally read the > bridges while probing ? Bjorn ? I tried and failed miserably: https://lore.kernel.org/linux-pci/20150916085850.GA17510@red-moon/ > - When allocating bridge resources, there are interesting differences: > > * x86 (and powerpc to some extent): If one has a 0 start or we fail > to claim it, x86 will wipe out the resource struct (including flags). I > assume that pci_assign_unassign_* will restore bridges when needed but > I haven't verified. > > * pci_bus_claim_resources() is dumber in that regard. It will call > pci_claim_bridge_resources() blindly try to claim whatever is there > even if res->start is 0. This could be a problem with partially > assigned trees. It also doesn't wipe the resource in case of failure to > claim which could be a problem going down the tree and letting children > attach to the non-claimed resource, thus potentially causing the > reassign pass to fail. > > The r->start == 0 test is interesting ... the generic claim code will > honor IORESOURCE_UNSET but we don't seem to set that generically unless > we hit some of the specific pass for explicit resource alignment, or > during the reassignment phases. > > - When allocating device resources, the main difference other than the > 2 passes is that x86 will "0 base" the resource (r->end -= r->start; r- > >start = 0) for later reassignment. The claim path we use won't do > that. Note: none sets IORESOURCE_UNSET... Additionally x86 has some > oddball code to save the original FW values and restore them if > assignment later fails, which is somewhat odd since there's a conflict > but probably helps really broken setups. > > - x86 will not claim ROMs in that pass, it does a 3rd pass just for > them (it's common I think to not have room for all the ROMs). It also > disables them in config space during the survey. > pci_bus_claim_resources() will claim everything and leave ROMs enabled. > > So as a somewhat temprary conclusion, I think the main difference here > is what happens when claim fails (also the res->start = 0 case which we > need to look at more closely) and whether we should make the generic > code also "0-base" the resource. Oh my, res->start == 0, another can of worms. Honestly I do not know what to do on that one mostly because we need to figure out how it plays with resource assignment code (and legacy stuff, you know the drill). > > The question for me really is, do we want to just "upgrade" (if > necessary) pci_bus_claim_resources() and continue having x86 do its own > thing for ever, or do we want to consolidate around what is probably > the most tested platform when it comes to PCI :-) Consolidating is the right thing to do, with the caveats above, there are many but you have all my support. > And if we consolidate, I think that won't be by changing what x86 does, > that code is the result of decades of fiddling to get things right with > all sorts of broken BIOSes... There is 0 chance to change x86 code (and there is 0 chance to change core PCI code with x86 assumptions in it). Cheers, Lorenzo _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-12 10:21 ` Lorenzo Pieralisi @ 2019-06-12 22:05 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-12 22:05 UTC (permalink / raw) To: Lorenzo Pieralisi Cc: Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel On Wed, 2019-06-12 at 11:21 +0100, Lorenzo Pieralisi wrote: > Hrm. We should probably reassign bus numbers if we reassign resources > > yes, but then I'd like us to not reassign resources unless we have to > > :-) > > But for that we can use _DSM #5 returning 0, at least we would > be consistent. Yes we should be consistent. My personal preference would be to always honor FW resources by default regardless of DSM, and have DSM = 1 force a full reassign instead. In a way, by always reassigning we send the wrong message to FW folks, that it's ok to be broken bcs Linux will fix it up.. Do you know how Windows deals with this ? > Current situation is inconsistent and that bothers me I can put > together a separate patch and send it as an RFT, there are not > that many ARM64 PCI ACPI platforms to test it on. Ok. ../.. > > - pci_read_bridge_bases() is done by pci_bus_claim_resources(), while > > x86 (and powerpc and others) do it in their pcibios_fixup_bus. That one > > is interesting... Any reason why we shouldn't unconditionally read the > > bridges while probing ? Bjorn ? > > I tried and failed miserably: > > https://lore.kernel.org/linux-pci/20150916085850.GA17510@red-moon/ Yes, I see... I think we can revive this if we key it off not reassigning all resources. There's a PCI flag PCI_REASSIGN_ALL_RSRC that's currently only use on powerpc, but it wouldn't be hard to make sure it's set on archs that do a full reassign. We could then have the generic code key off that. That said, I'd rather have this be a host bridge flag. I'll look into it later. > > - When allocating bridge resources, there are interesting differences: > > > > * x86 (and powerpc to some extent): If one has a 0 start or we fail > > to claim it, x86 will wipe out the resource struct (including flags). I > > assume that pci_assign_unassign_* will restore bridges when needed but > > I haven't verified. > > > > * pci_bus_claim_resources() is dumber in that regard. It will call > > pci_claim_bridge_resources() blindly try to claim whatever is there > > even if res->start is 0. This could be a problem with partially > > assigned trees. It also doesn't wipe the resource in case of failure to > > claim which could be a problem going down the tree and letting children > > attach to the non-claimed resource, thus potentially causing the > > reassign pass to fail. > > > > The r->start == 0 test is interesting ... the generic claim code will > > honor IORESOURCE_UNSET but we don't seem to set that generically unless > > we hit some of the specific pass for explicit resource alignment, or > > during the reassignment phases. > > > > - When allocating device resources, the main difference other than the > > 2 passes is that x86 will "0 base" the resource (r->end -= r->start; r- > > > start = 0) for later reassignment. The claim path we use won't do > > > > that. Note: none sets IORESOURCE_UNSET... Additionally x86 has some > > oddball code to save the original FW values and restore them if > > assignment later fails, which is somewhat odd since there's a conflict > > but probably helps really broken setups. > > > > - x86 will not claim ROMs in that pass, it does a 3rd pass just for > > them (it's common I think to not have room for all the ROMs). It also > > disables them in config space during the survey. > > pci_bus_claim_resources() will claim everything and leave ROMs enabled. > > > > So as a somewhat temprary conclusion, I think the main difference here > > is what happens when claim fails (also the res->start = 0 case which we > > need to look at more closely) and whether we should make the generic > > code also "0-base" the resource. > > Oh my, res->start == 0, another can of worms. Honestly I do not know > what to do on that one mostly because we need to figure out how it > plays with resource assignment code (and legacy stuff, you know the > drill). Yes. We have that funny pcibios_uninitialized_bridge_resource() in arch/powerpc/kernel/pci-common.c which tries to "guess" whether a bridge with a 0 res->start means that it's uninitialized or has a "valid" 0 based resource. Among others, we check if memory decoding is enabled, etc... If we decide it's really uninitialized we set IORESOURCE_UNSET, and we rely on that later on. In an idea world, nobody should create valid 0 based resources, it's best to stay off the first 1MB of PCI space due to various legacy concerns anyways but ... > > The question for me really is, do we want to just "upgrade" (if > > necessary) pci_bus_claim_resources() and continue having x86 do its own > > thing for ever, or do we want to consolidate around what is probably > > the most tested platform when it comes to PCI :-) > > Consolidating is the right thing to do, with the caveats above, there > are many but you have all my support. > > > And if we consolidate, I think that won't be by changing what x86 does, > > that code is the result of decades of fiddling to get things right with > > all sorts of broken BIOSes... > > There is 0 chance to change x86 code (and there is 0 chance to change > core PCI code with x86 assumptions in it). I wouldn't say 0 but the bar is high yes. Cheers, Ben. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-06 9:00 ` [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt 2019-06-06 9:13 ` Ard Biesheuvel @ 2019-06-11 14:58 ` Lorenzo Pieralisi 2019-06-11 22:19 ` Benjamin Herrenschmidt 2019-06-11 23:39 ` Bjorn Helgaas 2 siblings, 1 reply; 27+ messages in thread From: Lorenzo Pieralisi @ 2019-06-11 14:58 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel On Thu, Jun 06, 2019 at 07:00:12PM +1000, Benjamin Herrenschmidt wrote: > From: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > On arm64 ACPI systems, we unconditionally reconfigure the entire PCI > hierarchy at boot. This is a departure from what is customary on ACPI > systems, and may break assumptions in some places (e.g., EFIFB), that > the kernel will leave BARs of enabled PCI devices where they are. > > Given that PCI already specifies a device specific ACPI method (_DSM) > for PCI root bridge nodes that tells us whether the firmware thinks > the configuration should be left alone, let's sidestep the entire > policy debate about whether the PCI configuration should be preserved > or not, and put it under the control of the firmware instead. > > [BenH: Added pci_assign_unassigned_root_bus_resources()] > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > --- > > So I would like this variant rather than mucking around with > IORESOURCE_PCI_FIXED at this stage to fix the problem with our platforms. > > See my other email, IORESOURCE_PCI_FIXED doesn't really work terribly well > when using pci_bus_size_bridges and pci_bus_assign_resources, and the > resulting patches are ugly and add more mess. > > Long run, I propose to start working on consolidating all those various > resource survey mechanisms around what x86 does, unless people strongly > object... (with the addition of the probe only and force reassign quirks > so platforms can still chose that). > > Note: I haven't tested the effect of pci_assign_unassigned_root_bus_resources > as our platforms don't leave anything unassigned. I'm not entirely sure how > well pci_bus_claim_resources() will deal with a partially assigned setup... > > We do want to support partial assignment by BIOS though, it's a trend to > reduce boot time, people seem to want BIOSes to only assign what's critical > for booting. > > Bjorn: I haven't made the claim path the default in absence of _DSM #5 yet. > I suggest we do that as a separate patch in case it breaks somebody, thus > making bisection more meaningful. It will also make this one more palatable > to distros since it won't change the behaviour on systems without _DSM #5, > and we verified nobody has it except Seattle which returns 1. > > arch/arm64/kernel/pci.c | 23 +++++++++++++++++++++-- > include/linux/pci-acpi.h | 7 ++++--- > 2 files changed, 25 insertions(+), 5 deletions(-) > > diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c > index bb85e2f4603f..6358e1cb4f9f 100644 > --- a/arch/arm64/kernel/pci.c > +++ b/arch/arm64/kernel/pci.c > @@ -168,6 +168,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) > struct acpi_pci_generic_root_info *ri; > struct pci_bus *bus, *child; > struct acpi_pci_root_ops *root_ops; > + union acpi_object *obj; > > ri = kzalloc(sizeof(*ri), GFP_KERNEL); > if (!ri) > @@ -193,8 +194,26 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) > if (!bus) > return NULL; > > - pci_bus_size_bridges(bus); > - pci_bus_assign_resources(bus); > + /* > + * Invoke the PCI device specific method (_DSM) #5 'Ignore PCI Boot > + * Configuration', which tells us whether the firmware wants us to > + * preserve the configuration of the PCI resource tree for this root > + * bridge. > + */ > + obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1, > + IGNORE_PCI_BOOT_CONFIG_DSM, NULL); > + if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) { > + /* preserve existing resource assignment */ > + pci_bus_claim_resources(bus); > + > + /* Assign anything that might have been left out */ > + pci_assign_unassigned_root_bus_resources(bus); > + } else { > + /* reconfigure the resource tree from scratch */ > + pci_bus_size_bridges(bus); > + pci_bus_assign_resources(bus); > + } if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) { /* preserve existing resource assignment */ pci_bus_claim_resources(bus); } pci_bus_size_bridges(bus); pci_bus_assign_resources(bus); That's how it should be I think: 1) we do not want pci_assign_unassigned_root_bus_resources(bus) to reallocate resources already claimed (see realloc parameter), do we ? 2) pci_bus_size_bridges(bus) and pci_bus_assign_resources(bus) should not interfere with resources already claimed so it *should* be safe to call them anyway Most importantly: I want everyone to agree that claiming is equivalent to making a resource immutable (except for realloc, see (1) above) because that's what we are doing by claiming on _DSM #5 == 0. There are too many ways to make a resource immutable in the kernel and this is confusing and prone to bugs. Thanks, Lorenzo > + ACPI_FREE(obj); > > list_for_each_entry(child, &bus->children, node) > pcie_bus_configure_settings(child); > diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h > index 8082b612f561..62b7fdcc661c 100644 > --- a/include/linux/pci-acpi.h > +++ b/include/linux/pci-acpi.h > @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { } > #endif > > extern const guid_t pci_acpi_dsm_guid; > -#define DEVICE_LABEL_DSM 0x07 > -#define RESET_DELAY_DSM 0x08 > -#define FUNCTION_DELAY_DSM 0x09 > +#define IGNORE_PCI_BOOT_CONFIG_DSM 0x05 > +#define DEVICE_LABEL_DSM 0x07 > +#define RESET_DELAY_DSM 0x08 > +#define FUNCTION_DELAY_DSM 0x09 > > #else /* CONFIG_ACPI */ > static inline void acpi_pci_add_bus(struct pci_bus *bus) { } > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-11 14:58 ` Lorenzo Pieralisi @ 2019-06-11 22:19 ` Benjamin Herrenschmidt 2019-06-12 10:08 ` Lorenzo Pieralisi 0 siblings, 1 reply; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-11 22:19 UTC (permalink / raw) To: Lorenzo Pieralisi Cc: Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel On Tue, 2019-06-11 at 15:58 +0100, Lorenzo Pieralisi wrote: > > > if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) { > /* preserve existing resource assignment */ > pci_bus_claim_resources(bus); > } > > pci_bus_size_bridges(bus); > pci_bus_assign_resources(bus); So that makes me nervous... my understanding is that the pair pci_bus_size_bridges(bus); pci_bus_assign_resources(bus); Is intended for full reassignment. Now they will try to skip resources that already have a parent, but that's yet another code path. What's wrong with pci_unassigned_* ? That's what it's meant for... > That's how it should be I think: > > 1) we do not want pci_assign_unassigned_root_bus_resources(bus) to > reallocate resources already claimed (see realloc parameter), do we ? Well, realloc is useful to handle SR_IOV when the BIOS doesn't do it right (common case). That said, at this point, we should be able to honor IORESOURCE_PCI_FIXED for things that have _DSM #5 since they have been claimed. I don't see that realloc logic being a problem for us, and I want to avoid gratuitous differences with x86, but maybe I'm missing something here... > 2) pci_bus_size_bridges(bus) and pci_bus_assign_resources(bus) should > not interfere with resources already claimed so it *should* be safe > to call them anyway Sure, *should* and here we introduce yet another way of doing things though... Any reason we don't want to do what x86 does here ? > Most importantly: I want everyone to agree that claiming is equivalent > to making a resource immutable (except for realloc, see (1) above) > because that's what we are doing by claiming on _DSM #5 == 0. I think the combination of claiming *and* IORESOURCE_PCI_FIXED is what makes it *really* immutable. I'm a bit confused by the realloc logic right now, I'll need more quality time looking at it after ingesting more caffeing but I'm under the impression that it will honor the flag. > There are too many ways to make a resource immutable in the kernel > and this is confusing and prone to bugs. It is, but I don't want to create new ways of doing things, and what you seem to propose is a new way imho :-) Cheers, Ben. > Thanks, > Lorenzo > > > + ACPI_FREE(obj); > > > > list_for_each_entry(child, &bus->children, node) > > pcie_bus_configure_settings(child); > > diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h > > index 8082b612f561..62b7fdcc661c 100644 > > --- a/include/linux/pci-acpi.h > > +++ b/include/linux/pci-acpi.h > > @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { } > > #endif > > > > extern const guid_t pci_acpi_dsm_guid; > > -#define DEVICE_LABEL_DSM 0x07 > > -#define RESET_DELAY_DSM 0x08 > > -#define FUNCTION_DELAY_DSM 0x09 > > +#define IGNORE_PCI_BOOT_CONFIG_DSM 0x05 > > +#define DEVICE_LABEL_DSM 0x07 > > +#define RESET_DELAY_DSM 0x08 > > +#define FUNCTION_DELAY_DSM 0x09 > > > > #else /* CONFIG_ACPI */ > > static inline void acpi_pci_add_bus(struct pci_bus *bus) { } > > > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-11 22:19 ` Benjamin Herrenschmidt @ 2019-06-12 10:08 ` Lorenzo Pieralisi 2019-06-12 10:58 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 27+ messages in thread From: Lorenzo Pieralisi @ 2019-06-12 10:08 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel On Wed, Jun 12, 2019 at 08:19:40AM +1000, Benjamin Herrenschmidt wrote: > On Tue, 2019-06-11 at 15:58 +0100, Lorenzo Pieralisi wrote: > > > > > > if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) { > > /* preserve existing resource assignment */ > > pci_bus_claim_resources(bus); > > } > > > > pci_bus_size_bridges(bus); > > pci_bus_assign_resources(bus); > > So that makes me nervous... my understanding is that the pair > > pci_bus_size_bridges(bus); > pci_bus_assign_resources(bus); > > Is intended for full reassignment. Now they will try to skip resources > that already have a parent, but that's yet another code path. What's > wrong with pci_unassigned_* ? That's what it's meant for... Nothing wrong, we have not understood each others. What I am asking is to write it like this: if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) { /* preserve existing resource assignment */ pci_bus_claim_resources(bus); } /* this code path should be common, indipendent of _DSM #5 pci_assign_unassigned_root_bus_resources(bus); There is no reason to have two distinct code paths, current code can call: pci_assign_unassigned_root_bus_resources(bus); instead of pci_bus_size_bridges(bus); pci_bus_assign_resources(bus); Actually, current code is *questionable* given that AFAICS it does not account for hotplug bridges additional memory window size. > > That's how it should be I think: > > > > 1) we do not want pci_assign_unassigned_root_bus_resources(bus) to > > reallocate resources already claimed (see realloc parameter), do we ? > > Well, realloc is useful to handle SR_IOV when the BIOS doesn't do it > right (common case). That said, at this point, we should be able to > honor IORESOURCE_PCI_FIXED for things that have _DSM #5 since they have > been claimed. I don't see that realloc logic being a problem for us, > and I want to avoid gratuitous differences with x86, but maybe I'm > missing something here... See above. The conditions that make IORESOURCE_PCI_FIXED work are still unclear to me. > > 2) pci_bus_size_bridges(bus) and pci_bus_assign_resources(bus) should > > not interfere with resources already claimed so it *should* be safe > > to call them anyway > > Sure, *should* and here we introduce yet another way of doing things > though... Any reason we don't want to do what x86 does here ? No, see above, keeping in mind that what x86 does is still not very well defined AFAICS. > > Most importantly: I want everyone to agree that claiming is equivalent > > to making a resource immutable (except for realloc, see (1) above) > > because that's what we are doing by claiming on _DSM #5 == 0. > > I think the combination of claiming *and* IORESOURCE_PCI_FIXED is what > makes it *really* immutable. I'm a bit confused by the realloc logic > right now, I'll need more quality time looking at it after ingesting > more caffeing but I'm under the impression that it will honor the flag. Likewise, this requires some fuzzing because it is really hard to understand by perusing the code. > > There are too many ways to make a resource immutable in the kernel > > and this is confusing and prone to bugs. > > It is, but I don't want to create new ways of doing things, and what > you seem to propose is a new way imho :-) No, see above. What I am saying is that we have IORESOURCE_PCI_FIXED, res->parent != NULL and Enhanced allocation to make a BAR immutable and before going any further it would be great if we understand their interaction given that we are starting from a pseudo clean slate. If we fail to do that it is quirks later (and I would rather avoid seeing x86 command line parameters to work around them). Cheers, Lorenzo > Cheers, > Ben. > > > Thanks, > > Lorenzo > > > > > + ACPI_FREE(obj); > > > > > > list_for_each_entry(child, &bus->children, node) > > > pcie_bus_configure_settings(child); > > > diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h > > > index 8082b612f561..62b7fdcc661c 100644 > > > --- a/include/linux/pci-acpi.h > > > +++ b/include/linux/pci-acpi.h > > > @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { } > > > #endif > > > > > > extern const guid_t pci_acpi_dsm_guid; > > > -#define DEVICE_LABEL_DSM 0x07 > > > -#define RESET_DELAY_DSM 0x08 > > > -#define FUNCTION_DELAY_DSM 0x09 > > > +#define IGNORE_PCI_BOOT_CONFIG_DSM 0x05 > > > +#define DEVICE_LABEL_DSM 0x07 > > > +#define RESET_DELAY_DSM 0x08 > > > +#define FUNCTION_DELAY_DSM 0x09 > > > > > > #else /* CONFIG_ACPI */ > > > static inline void acpi_pci_add_bus(struct pci_bus *bus) { } > > > > > > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-12 10:08 ` Lorenzo Pieralisi @ 2019-06-12 10:58 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-12 10:58 UTC (permalink / raw) To: Lorenzo Pieralisi Cc: Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel On Wed, 2019-06-12 at 11:08 +0100, Lorenzo Pieralisi wrote: > > Nothing wrong, we have not understood each others. What I am asking > is to write it like this: > > if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) { > /* preserve existing resource assignment */ > pci_bus_claim_resources(bus); > } > > /* this code path should be common, indipendent of _DSM #5 > pci_assign_unassigned_root_bus_resources(bus); > > There is no reason to have two distinct code paths, current code > can call: > > pci_assign_unassigned_root_bus_resources(bus); > > instead of > > pci_bus_size_bridges(bus); > pci_bus_assign_resources(bus); > > Actually, current code is *questionable* given that AFAICS it > does not account for hotplug bridges additional memory window > size. Ah yes, I see, ok, I'll play with that in qemu a bit tomorrow make sure we aren't missing something obvious & update the patch. > > > That's how it should be I think: > > > > > > 1) we do not want pci_assign_unassigned_root_bus_resources(bus) to > > > reallocate resources already claimed (see realloc parameter), do we ? > > > > Well, realloc is useful to handle SR_IOV when the BIOS doesn't do it > > right (common case). That said, at this point, we should be able to > > honor IORESOURCE_PCI_FIXED for things that have _DSM #5 since they have > > been claimed. I don't see that realloc logic being a problem for us, > > and I want to avoid gratuitous differences with x86, but maybe I'm > > missing something here... > > See above. The conditions that make IORESOURCE_PCI_FIXED work are > still unclear to me. I more/less convinced myself it doesn't in the full-reassignment case :-) As for the more traditional survey + assign, it can work if the original survey led to reasonable values, mostly by preventing (I *think*, I still need to convince myself) the realloc path for kicking in, but this is all very fishy... At least x86 does use that flag at least in one place from my grepping around... > > > 2) pci_bus_size_bridges(bus) and pci_bus_assign_resources(bus) should > > > not interfere with resources already claimed so it *should* be safe > > > to call them anyway > > > > Sure, *should* and here we introduce yet another way of doing things > > though... Any reason we don't want to do what x86 does here ? > > No, see above, keeping in mind that what x86 does is still not > very well defined AFAICS. It's not defined, it's grown through an evolutionary process :-) It's still by far the most tested with the widest range of crazy PCI devices and BIOSes I would think. > > > Most importantly: I want everyone to agree that claiming is equivalent > > > to making a resource immutable (except for realloc, see (1) above) > > > because that's what we are doing by claiming on _DSM #5 == 0. > > > > I think the combination of claiming *and* IORESOURCE_PCI_FIXED is what > > makes it *really* immutable. I'm a bit confused by the realloc logic > > right now, I'll need more quality time looking at it after ingesting > > more caffeing but I'm under the impression that it will honor the flag. > > Likewise, this requires some fuzzing because it is really hard to > understand by perusing the code. Yup. > > > There are too many ways to make a resource immutable in the kernel > > > and this is confusing and prone to bugs. > > > > It is, but I don't want to create new ways of doing things, and what > > you seem to propose is a new way imho :-) > > No, see above. What I am saying is that we have IORESOURCE_PCI_FIXED, > res->parent != NULL and Enhanced allocation to make a BAR immutable and > before going any further it would be great if we understand their > interaction given that we are starting from a pseudo clean slate. > If we fail to do that it is quirks later (and I would rather avoid > seeing x86 command line parameters to work around them). Well.. while I would hate to have to *use* the x86 command line parameters, I also don't like them not existing on all archs, as they can actually be useful to help diagnose issues if anything (with of course the intention of fixing things so they aren't needed in the long run of course). Cheers, Ben. > Cheers, > Lorenzo > > > Cheers, > > Ben. > > > > > Thanks, > > > Lorenzo > > > > > > > + ACPI_FREE(obj); > > > > > > > > list_for_each_entry(child, &bus->children, node) > > > > pcie_bus_configure_settings(child); > > > > diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h > > > > index 8082b612f561..62b7fdcc661c 100644 > > > > --- a/include/linux/pci-acpi.h > > > > +++ b/include/linux/pci-acpi.h > > > > @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { } > > > > #endif > > > > > > > > extern const guid_t pci_acpi_dsm_guid; > > > > -#define DEVICE_LABEL_DSM 0x07 > > > > -#define RESET_DELAY_DSM 0x08 > > > > -#define FUNCTION_DELAY_DSM 0x09 > > > > +#define IGNORE_PCI_BOOT_CONFIG_DSM 0x05 > > > > +#define DEVICE_LABEL_DSM 0x07 > > > > +#define RESET_DELAY_DSM 0x08 > > > > +#define FUNCTION_DELAY_DSM 0x09 > > > > > > > > #else /* CONFIG_ACPI */ > > > > static inline void acpi_pci_add_bus(struct pci_bus *bus) { } > > > > > > > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-06 9:00 ` [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt 2019-06-06 9:13 ` Ard Biesheuvel 2019-06-11 14:58 ` Lorenzo Pieralisi @ 2019-06-11 23:39 ` Bjorn Helgaas 2019-06-12 0:06 ` Benjamin Herrenschmidt 2 siblings, 1 reply; 27+ messages in thread From: Bjorn Helgaas @ 2019-06-11 23:39 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Lorenzo Pieralisi, Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, linux-arm-kernel On Thu, Jun 06, 2019 at 07:00:12PM +1000, Benjamin Herrenschmidt wrote: > From: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > On arm64 ACPI systems, we unconditionally reconfigure the entire PCI > hierarchy at boot. This is a departure from what is customary on ACPI > systems, and may break assumptions in some places (e.g., EFIFB), that > the kernel will leave BARs of enabled PCI devices where they are. > > Given that PCI already specifies a device specific ACPI method (_DSM) > for PCI root bridge nodes that tells us whether the firmware thinks > the configuration should be left alone, let's sidestep the entire > policy debate about whether the PCI configuration should be preserved > or not, and put it under the control of the firmware instead. The current PCI Firmware spec r3.2 specifies _DSM function 5 for PCI-to-PCI bridge objects, which does not include host bridge (PNP0A03) nodes, but the proposed revision does allow it under host bridges. So I'm fine with this, but we should update the commit log so it doesn't say "PCI *already* specifies this". > [BenH: Added pci_assign_unassigned_root_bus_resources()] > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> I think you should add a signed-off-by for yourself? > --- > > So I would like this variant rather than mucking around with > IORESOURCE_PCI_FIXED at this stage to fix the problem with our platforms. > > See my other email, IORESOURCE_PCI_FIXED doesn't really work terribly well > when using pci_bus_size_bridges and pci_bus_assign_resources, and the > resulting patches are ugly and add more mess. > > Long run, I propose to start working on consolidating all those various > resource survey mechanisms around what x86 does, unless people strongly > object... (with the addition of the probe only and force reassign quirks > so platforms can still chose that). > > Note: I haven't tested the effect of pci_assign_unassigned_root_bus_resources > as our platforms don't leave anything unassigned. I'm not entirely sure how > well pci_bus_claim_resources() will deal with a partially assigned setup... > > We do want to support partial assignment by BIOS though, it's a trend to > reduce boot time, people seem to want BIOSes to only assign what's critical > for booting. > > Bjorn: I haven't made the claim path the default in absence of _DSM #5 yet. > I suggest we do that as a separate patch in case it breaks somebody, thus > making bisection more meaningful. It will also make this one more palatable > to distros since it won't change the behaviour on systems without _DSM #5, > and we verified nobody has it except Seattle which returns 1. > > arch/arm64/kernel/pci.c | 23 +++++++++++++++++++++-- > include/linux/pci-acpi.h | 7 ++++--- > 2 files changed, 25 insertions(+), 5 deletions(-) > > diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c > index bb85e2f4603f..6358e1cb4f9f 100644 > --- a/arch/arm64/kernel/pci.c > +++ b/arch/arm64/kernel/pci.c > @@ -168,6 +168,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) > struct acpi_pci_generic_root_info *ri; > struct pci_bus *bus, *child; > struct acpi_pci_root_ops *root_ops; > + union acpi_object *obj; > > ri = kzalloc(sizeof(*ri), GFP_KERNEL); > if (!ri) > @@ -193,8 +194,26 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) > if (!bus) > return NULL; > > - pci_bus_size_bridges(bus); > - pci_bus_assign_resources(bus); > + /* > + * Invoke the PCI device specific method (_DSM) #5 'Ignore PCI Boot > + * Configuration', which tells us whether the firmware wants us to > + * preserve the configuration of the PCI resource tree for this root > + * bridge. > + */ > + obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1, > + IGNORE_PCI_BOOT_CONFIG_DSM, NULL); > + if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) { This is fine, but can we make a tiny step toward doing this in generic code instead of adding more arch-specific stuff? E.g., evaluate the _DSM in the generic acpi_pci_root_add(), set a "preserve_config" bit in the struct acpi_pci_root, and test the bit here? It would also be nice to add a printk in the other pci_acpi_scan_root() implementations if the bit is set so we know that the platform supplied the _DSM but we're ignoring it. > + /* preserve existing resource assignment */ > + pci_bus_claim_resources(bus); > + > + /* Assign anything that might have been left out */ > + pci_assign_unassigned_root_bus_resources(bus); > + } else { > + /* reconfigure the resource tree from scratch */ > + pci_bus_size_bridges(bus); > + pci_bus_assign_resources(bus); > + } > + ACPI_FREE(obj); > > list_for_each_entry(child, &bus->children, node) > pcie_bus_configure_settings(child); > diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h > index 8082b612f561..62b7fdcc661c 100644 > --- a/include/linux/pci-acpi.h > +++ b/include/linux/pci-acpi.h > @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { } > #endif > > extern const guid_t pci_acpi_dsm_guid; > -#define DEVICE_LABEL_DSM 0x07 > -#define RESET_DELAY_DSM 0x08 > -#define FUNCTION_DELAY_DSM 0x09 > +#define IGNORE_PCI_BOOT_CONFIG_DSM 0x05 > +#define DEVICE_LABEL_DSM 0x07 > +#define RESET_DELAY_DSM 0x08 > +#define FUNCTION_DELAY_DSM 0x09 > > #else /* CONFIG_ACPI */ > static inline void acpi_pci_add_bus(struct pci_bus *bus) { } > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-11 23:39 ` Bjorn Helgaas @ 2019-06-12 0:06 ` Benjamin Herrenschmidt 2019-06-12 13:27 ` Bjorn Helgaas 0 siblings, 1 reply; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-12 0:06 UTC (permalink / raw) To: Bjorn Helgaas Cc: Lorenzo Pieralisi, Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, linux-arm-kernel On Tue, 2019-06-11 at 18:39 -0500, Bjorn Helgaas wrote: > On Thu, Jun 06, 2019 at 07:00:12PM +1000, Benjamin Herrenschmidt wrote: > > From: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > > > On arm64 ACPI systems, we unconditionally reconfigure the entire PCI > > hierarchy at boot. This is a departure from what is customary on ACPI > > systems, and may break assumptions in some places (e.g., EFIFB), that > > the kernel will leave BARs of enabled PCI devices where they are. > > > > Given that PCI already specifies a device specific ACPI method (_DSM) > > for PCI root bridge nodes that tells us whether the firmware thinks > > the configuration should be left alone, let's sidestep the entire > > policy debate about whether the PCI configuration should be preserved > > or not, and put it under the control of the firmware instead. > > The current PCI Firmware spec r3.2 specifies _DSM function 5 for > PCI-to-PCI bridge objects, which does not include host bridge > (PNP0A03) nodes, but the proposed revision does allow it under host > bridges. So I'm fine with this, but we should update the commit log > so it doesn't say "PCI *already* specifies this". > > > [BenH: Added pci_assign_unassigned_root_bus_resources()] > > > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > I think you should add a signed-off-by for yourself? I should, I forgot. That said, Lorenzo wants to wait for the actual ECN... and we're also discussing some details. > .../... > > + /* > > + * Invoke the PCI device specific method (_DSM) #5 'Ignore PCI Boot > > + * Configuration', which tells us whether the firmware wants us to > > + * preserve the configuration of the PCI resource tree for this root > > + * bridge. > > + */ > > + obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1, > > + IGNORE_PCI_BOOT_CONFIG_DSM, NULL); > > + if (obj && obj->type == ACPI_TYPE_INTEGER && obj->integer.value == 0) { > > This is fine, but can we make a tiny step toward doing this in generic > code instead of adding more arch-specific stuff? > > E.g., evaluate the _DSM in the generic acpi_pci_root_add(), set a > "preserve_config" bit in the struct acpi_pci_root, and test the bit > here? I'd rather have the flag in the host bridge no ? > It would also be nice to add a printk in the oter > pci_acpi_scan_root() implementations if the bit is set so we know that > the platform supplied the _DSM but we're ignoring it. Ok. Talking of which, look at the ongoing discussion I have with Lorenzo when it comes to pci_bus_claim_resources vs. what x86 does, I'd love for you to chime in. I'd like to try to consolidate things further accross architectures but there might be reasons I don't see as to why things are different in that area, so ... Cheers, Ben. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-12 0:06 ` Benjamin Herrenschmidt @ 2019-06-12 13:27 ` Bjorn Helgaas 2019-06-12 21:46 ` Benjamin Herrenschmidt 2019-06-12 23:58 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 27+ messages in thread From: Bjorn Helgaas @ 2019-06-12 13:27 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Lorenzo Pieralisi, Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, linux-arm-kernel On Wed, Jun 12, 2019 at 10:06:06AM +1000, Benjamin Herrenschmidt wrote: > On Tue, 2019-06-11 at 18:39 -0500, Bjorn Helgaas wrote: > > This is fine, but can we make a tiny step toward doing this in generic > > code instead of adding more arch-specific stuff? > > > > E.g., evaluate the _DSM in the generic acpi_pci_root_add(), set a > > "preserve_config" bit in the struct acpi_pci_root, and test the bit > > here? > > I'd rather have the flag in the host bridge no ? Oh, of course, that would make more sense. > Talking of which, look at the ongoing discussion I have with Lorenzo > when it comes to pci_bus_claim_resources vs. what x86 does, I'd love > for you to chime in. I'd like to try to consolidate things further > accross architectures but there might be reasons I don't see as to why > things are different in that area, so ... I don't know any reasons why things are different per arch. In most cases I suspect FUD. Speaking of which, *this* patch looks like FUD because it essentially says "Linux shouldn't change the PCI configuration on this system" but it offers no explanation of *why* the config needs to be preserved. I would really like some note like "run-time firmware depends on the addresses of device X". Bjorn _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-12 13:27 ` Bjorn Helgaas @ 2019-06-12 21:46 ` Benjamin Herrenschmidt 2019-06-12 23:58 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-12 21:46 UTC (permalink / raw) To: Bjorn Helgaas Cc: Lorenzo Pieralisi, Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, linux-arm-kernel On Wed, 2019-06-12 at 08:27 -0500, Bjorn Helgaas wrote: > On Wed, Jun 12, 2019 at 10:06:06AM +1000, Benjamin Herrenschmidt > wrote: > > On Tue, 2019-06-11 at 18:39 -0500, Bjorn Helgaas wrote: > > > This is fine, but can we make a tiny step toward doing this in > > > generic > > > code instead of adding more arch-specific stuff? > > > > > > E.g., evaluate the _DSM in the generic acpi_pci_root_add(), set a > > > "preserve_config" bit in the struct acpi_pci_root, and test the > > > bit > > > here? > > > > I'd rather have the flag in the host bridge no ? > > Oh, of course, that would make more sense. Thinking of this ... I still believe eventually the default should be to preseve the BIOS config (see ongoing discussions about converging everybody toward the x86 model), so the flag should be the other way around. I'm thinking moving PROBE_ONLY and REASSIGN_ALL_RSRC/BUS to be host bridge flags (initially copied from the global flags). To not break things, ARM64 would start setting that 'reassign all' by default, then we can flip it. > > Talking of which, look at the ongoing discussion I have with Lorenzo > > when it comes to pci_bus_claim_resources vs. what x86 does, I'd love > > for you to chime in. I'd like to try to consolidate things further > > accross architectures but there might be reasons I don't see as to why > > things are different in that area, so ... > > I don't know any reasons why things are different per arch. In most > cases I suspect FUD. > > Speaking of which, *this* patch looks like FUD because it essentially > says "Linux shouldn't change the PCI configuration on this system" but > it offers no explanation of *why* the config needs to be preserved. I > would really like some note like "run-time firmware depends on the > addresses of device X". Oh there are a number of reasons as to why but yes, that's one of them. I can improve the comments. That said, I think everybody tends to agree that reassigning everything by default isn't a great idea, so while I still need something like this patch in asap, I'll be working towards making that stuff more common accross archs. My logic is that x86 is the most tested arch with the widest range of PCI devices and broken BIOSes. So what works for x86 should work for others (minus maybe a handful of quirks). So it doesn't make sense to have the main resource survey logic stuck in arch/x86 and everybody else growing different things. Cheers, Ben. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup 2019-06-12 13:27 ` Bjorn Helgaas 2019-06-12 21:46 ` Benjamin Herrenschmidt @ 2019-06-12 23:58 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-12 23:58 UTC (permalink / raw) To: Bjorn Helgaas Cc: Lorenzo Pieralisi, Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, linux-arm-kernel, Saidi, Ali On Wed, 2019-06-12 at 08:27 -0500, Bjorn Helgaas wrote: > Speaking of which, *this* patch looks like FUD because it essentially > says "Linux shouldn't change the PCI configuration on this system" but > it offers no explanation of *why* the config needs to be preserved. I > would really like some note like "run-time firmware depends on the > addresses of device X". BTW. the patch doesn't say that. ACPI tells us that :-) Cheers Ben. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] ARM64 PCI resource survey issue(s) 2019-06-04 20:41 ` Benjamin Herrenschmidt 2019-06-06 9:00 ` [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt @ 2019-06-10 10:11 ` Lorenzo Pieralisi 2019-06-11 5:46 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 27+ messages in thread From: Lorenzo Pieralisi @ 2019-06-10 10:11 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel On Wed, Jun 05, 2019 at 06:41:16AM +1000, Benjamin Herrenschmidt wrote: > On Tue, 2019-06-04 at 07:49 -0500, Bjorn Helgaas wrote: > > > Yes... I am not personally aware of such a case but I was led to > > > believe based on prior conversations that such setups do exist, > > > especially in the non-ACPI ARM64 world. Which is why I would suggest > > > initially changing the default only on ACPI, at least until we have a > > > bit better visibility. > > > > If a resource assignment that is valid in terms of all the PCI rules > > (BARs are valid, BARs are inside upstream bridge windows, etc) doesn't > > work, we would need more information in order to fix anything. We'd > > need to know exactly *what* doesn't work and *why* so we can avoid it. > > The current blanket statement of "reassign everything and hope it > > works better" is useless. > > I agree and I assume the problem stems from BIOSes creating either > invalid or incomplete assignments but as I said, I don't know for sure > as our platforms dont have that problem. The first thing that I would like to agree on is what mechanism the kernel has to ensure a BAR resource is not changed by the PCI resource assignment mechanism. (1) IORESOURCE_PCI_FIXED flag: I have *very* strong feelings that this flag works because x86 kernel code claims all resources on probe (which effectively makes them immutable). It does not work for PCI bridges apertures and I am not sure it works for arches (eg ARM64) that reassign everything, even for *devices* (2) Claiming resources: this basically means assigning a parent to a resource. Kernel PCI resource assignment code is full of code I do not understand and that in my opinion works because of (1). I *tried*, while posting ACPI PCI code for ARM64, to do what x86 does: - Claim all resources - Reassign the resources that could not be claimed This led to: a) Spurious dmesg logs (Eg Resources that could not be claimed) b) If FW set-up bridge windows, it may lead to reassignment failures if the bridge windows were sized *wrongly* but the kernel still manage to claim them (because they fit in the parent window). Hence, we reassign everything on ARM64 on ACPI (except for bus numbers and that was, indeed, a mistake). In short, this is a mess and one that I do not have much leeway to fix because the PCI resource assignment code is as it is owing to legacy that we can't touch unless we want to fix regressions for the next year. And then there is _DSM #5. The problem we have is that there is *no* specific way to tell what's right or wrong and that's what _DSM #5 is supposed to fix. To implement it to the letter it will take lots of work. First thing is to answer and agree (1) vs (2) above, aka define what an immutable resource is from a kernel point of view, one with a parent != NULL OR one with IORESOURCE_PCI_FIXED flag. Lorenzo _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC] ARM64 PCI resource survey issue(s) 2019-06-10 10:11 ` [RFC] ARM64 PCI resource survey issue(s) Lorenzo Pieralisi @ 2019-06-11 5:46 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 27+ messages in thread From: Benjamin Herrenschmidt @ 2019-06-11 5:46 UTC (permalink / raw) To: Lorenzo Pieralisi Cc: Ard Biesheuvel, linux-pci, Sinan Kaya, Zilberman, Zeev, Saidi, Ali, Bjorn Helgaas, linux-arm-kernel On Mon, 2019-06-10 at 11:11 +0100, Lorenzo Pieralisi wrote: > > > I agree and I assume the problem stems from BIOSes creating either > > invalid or incomplete assignments but as I said, I don't know for sure > > as our platforms dont have that problem. > > The first thing that I would like to agree on is what mechanism the > kernel has to ensure a BAR resource is not changed by the > PCI resource assignment mechanism. > > (1) IORESOURCE_PCI_FIXED flag: I have *very* strong feelings that this > flag works because x86 kernel code claims all resources on probe > (which effectively makes them immutable). It does not work for > PCI bridges apertures and I am not sure it works for arches (eg > ARM64) that reassign everything, even for *devices* I'm pretty sure this flag generally doesn't work indeed, see my other messages on that subject. The only valid use I can see for this flag honestly is in a context where we do an x86-style claim pass, to prevent selected resources or bridge windows from being moved or resized due to conflicts (ie: move the *other* one if there's a conflict) but it's of very limited value. > (2) Claiming resources: this basically means assigning a parent to > a resource. Well yes, a claimed resource is immutable, that's a given. That and of course it's implied that there are no conflicts between claimed resources and/or the resources have valid existing assigned values. From my quick look last week, it doesn't seem like pci_bus_claim_resources() handle those edge cases terribly well. This is why I'm keen on trying to standardize around the x86 (and powerpc which is similar) method which deals with these partial assignments and some broken cases much better. > Kernel PCI resource assignment code is full of code I do not understand > and that in my opinion works because of (1). I *tried*, while posting > ACPI PCI code for ARM64, to do what x86 does: :-) > - Claim all resources > - Reassign the resources that could not be claimed > > This led to: > > a) Spurious dmesg logs (Eg Resources that could not be claimed) > b) If FW set-up bridge windows, it may lead to reassignment failures > if the bridge windows were sized *wrongly* but the kernel still > manage to claim them (because they fit in the parent window). > > Hence, we reassign everything on ARM64 on ACPI (except for bus numbers > and that was, indeed, a mistake). Hrm... do you have example systems or the ability to give me access to systems where these problems occur ? The x86 code has some smarts to deal with BIOSes who didn't quite do the right thing that we might be able to port over. What I called the "2 pass survey followed by pci_assign_unassigned_resources". I've implemented something similar back in the days for powerpc and it served us well there too. I'd like to give a shot and making this generic and using it on arm64. > In short, this is a mess and one that I do not have much leeway to fix > because the PCI resource assignment code is as it is owing to legacy > that we can't touch unless we want to fix regressions for the next > year. Yes and no... We can try to make the x86 method work on arm64 and see what goes right or wrong. I wouldn't mind adding quirks to recognize existing broken amd64 firmwares if those really still don't work properly, if we can help getting future ones better as well. > And then there is _DSM #5. > > The problem we have is that there is *no* specific way to tell what's > right or wrong and that's what _DSM #5 is supposed to fix. To implement > it to the letter it will take lots of work. Correct but I don't think we need fully "to the letter", it's almost impossible to support all possible cases of _DSM #5 values at arbitrary locations in the tree. > First thing is to answer and agree (1) vs (2) above, aka define what > an immutable resource is from a kernel point of view, one with a > parent != NULL OR one with IORESOURCE_PCI_FIXED flag. Well, I think (2) is a given. The issue however is that a resource can't be "fixed" in isolation. It's parent need to be sufficiently "fixed" to enclose that resource, and it's parent parent etc.... This is why random _DSM #5 on leaf nodes isn't going to fly terribly well if the FW hasn't already assigned sane windows to bridges. I think the key here is to try to claim what FW setup iff it doesn't conflict and looks like it's valid at all (assigned at all), using a similar kind of 2 pass mechanism as x86 and powrpc that favors already enabled devices (likely to be used by FW), and (re)assigns everything else. I can look at making x86 and powerpc common with reasonable ease as I wrote a lot of the powerpc code for that. I don't have all the HW at hand to test anymore but I still know who does :-) For ARM64, the difficulty is all I have access to at the moment are Amazon own systems which seem to have a reasonable FW assignment, and qemu. So I'll need help with testing against the bad guys. In the meantime, I'm still keen on getting in Ard patch that handles _DSM #5 at the host bridge level. It will help existing valid use cases and we are reasonably confident it won't break anything. Cheers, Ben. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2019-06-12 23:59 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-06-03 23:41 [RFC] ARM64 PCI resource survey issue(s) Benjamin Herrenschmidt 2019-06-04 1:49 ` Bjorn Helgaas 2019-06-04 3:32 ` Benjamin Herrenschmidt 2019-06-04 3:37 ` Benjamin Herrenschmidt 2019-06-04 6:56 ` Benjamin Herrenschmidt 2019-06-04 12:49 ` Bjorn Helgaas 2019-06-04 20:41 ` Benjamin Herrenschmidt 2019-06-06 9:00 ` [PATCH/RESEND] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt 2019-06-06 9:13 ` Ard Biesheuvel 2019-06-06 10:55 ` Benjamin Herrenschmidt 2019-06-11 14:31 ` Lorenzo Pieralisi 2019-06-11 22:09 ` Benjamin Herrenschmidt 2019-06-11 22:34 ` Ard Biesheuvel 2019-06-11 22:40 ` Benjamin Herrenschmidt 2019-06-12 10:21 ` Lorenzo Pieralisi 2019-06-12 22:05 ` Benjamin Herrenschmidt 2019-06-11 14:58 ` Lorenzo Pieralisi 2019-06-11 22:19 ` Benjamin Herrenschmidt 2019-06-12 10:08 ` Lorenzo Pieralisi 2019-06-12 10:58 ` Benjamin Herrenschmidt 2019-06-11 23:39 ` Bjorn Helgaas 2019-06-12 0:06 ` Benjamin Herrenschmidt 2019-06-12 13:27 ` Bjorn Helgaas 2019-06-12 21:46 ` Benjamin Herrenschmidt 2019-06-12 23:58 ` Benjamin Herrenschmidt 2019-06-10 10:11 ` [RFC] ARM64 PCI resource survey issue(s) Lorenzo Pieralisi 2019-06-11 5:46 ` Benjamin Herrenschmidt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).