From: Benjamin Herrenschmidt <firstname.lastname@example.org> To: Bjorn Helgaas <email@example.com> Cc: Ard Biesheuvel <firstname.lastname@example.org>, Sinan Kaya <email@example.com>, Lorenzo Pieralisi <firstname.lastname@example.org>, linux-pci <email@example.com>, linux-arm-kernel <firstname.lastname@example.org> Subject: Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Date: Fri, 14 Jun 2019 23:46:39 +1000 Message-ID: <email@example.com> (raw) In-Reply-To: <20190614130952.GQ13533@google.com> On Fri, 2019-06-14 at 08:09 -0500, Bjorn Helgaas wrote: > On Fri, Jun 14, 2019 at 06:36:32PM +1000, Benjamin Herrenschmidt wrote: > > Linux can't change the switch configuration. I may have mentioned > > earlier it has to do with platform sec policies. But that's not totally > > relevant, we shoudn't be changing resources anyway since in theory > > runtime FW might rely on where some system devices were assigned at > > boot. EFI fb is another example of that. > > "We shouldn't be changing resources anyway" is not really useful > guidance. I completely agree that we shouldn't change things > *unnecessarily*, but it's up to the OS to define what makes it > necessary -- it might be for rebalancing for hotplug, to make space > for SR-IOV, to respect user requests to increase alignment, etc. > > IMO the real value of _DSM #5 is as a mechanism to advertise > dependencies runtime firmware has on devices, e.g., SMM firmware might > want to log errors to a PCI remote management device. If the OS moved > that managment device, the SMM logging would itself cause errors. This I agree with, though that wasn't specifically why it was created but this is where it's going yes. However, I think the resource management code in Linux is some way from being able to handle that at device granularity. It's actually a very difficult problem to solve in an optimal way. For example you could have a bridge that can be moved but a child of that bridge is fixed. That means that now, the bridge must only be moved within the constraints that it still contains that child. This in turn is very tricky to properly define because the bridge can have other children who aren't fixed, but whose alignment constraints will also limit where the bridge can go. I don't think we want to practically create such a mechanism, we have to be realistic. So that means if something needs to be fixed, all of its parents need too. Due to how the spec is written, if that _DSM #5 == 0 object is a bridge, then both all of its parents and all of its descendants are now fixed (except hotplugged descendants), unless said descendants are themselves tagged with _DSM #5 == 1. But then, as I explain below, this does more than just "cancel" _DSM #5 == 0 ... what a mess :-) That said, I have been thinking about ways we could better honor that _DSM #5 at various levels of the tree, and I think it's doable. There's going to still be some arguments to sort out about policy details as above, and we might end up trying to get the SIG to clarify things further but I think we can get somewhere reasonably sane in practice once we've changed arm64 to use an x86-style allocation rather than trying to do everything from scratch. Working on it ... > > The biggest issue for me right now is that the spec says pretty much at > > _DSM #5 = 0 is equivalent to _DSM #5 absent, and Bjorn seems keen on > > having it this way, but for arm64, we specifically want to distinguish > > those 2 cases. > > Nope, my opinion is exactly the opposite. Sorry that I'm not > communicating this clearly. > > It's true that the r3.2 spec *does* say _DSM #5 = 0 is equivalent to > the situation where it's absent, but that's based on the assumption > that the OS is never allowed to change PCI configuration. I think > that assumption is complete nonsense and should be disregarded. > > _DSM #5 = 0: OS must preserve this device's BARs > (current spec says "OS must not ignore") > > _DSM #5 = 1: OS *may* change this device's BARs > (current spec says "OS may ignore") > > Other than _DSM #5, there's no spec I'm aware of that restricts the > OS's ability to change BARs. Therefore I think "_DSM #5 = 1" is > equivalent to _DSM #5 being absent, and in both cases the OS is free > to change BARs as it sees fit. I think we are conflating too many different things together here, it's not binary. If you look at the reasons why DSM#5 was created in the first place, it had to do with firmwares setting up a 32-bit compatible layout which is suboptimal for 64-bit, and thus marking with DSM #1 things that could (and probably should) be reallocated elsewhere. Based on that pattern, it's tempting (and I'm tempted...) to handle _DSM #5 == 1, at least at a non-root-bridge level, by wiping the resources (or setting IORESOURCE_UNSET), to force Linux to create a potentially more optimal allocation. However, that shouldn't be our default either :-) So my personal opinion is that DSM #5 0, 1 and absent are actually 3 different things. > > Looking at the spec (and followup discussions for specs updates), I'm > > quite keen on treating _DSM #5 = 1 as "wipe out the resource for that > > endpoint/bridge and realloc something better. There are reasons for > > that, but we can probably discuss that later. > > I disagree on the "wipe out all resources" part of this because we > have no idea how to realloc something better. We should of course > look for problems (overlapping devices, etc) and fix them. But > starting from scratch and reallocating won't reliably produce anything > different from the original, supposedly broken, configuration. Well, again, it depends *why* _DSM #5 was set to 1 in the first place. As I said above, there are actually more than 2 cases here and _DSM is trying to convey too much with too little information. We lack the firmware intent, it's not being conveyed well. From my understanding of the spec history, _DSM #5 == 1 was specifically created because the FW knew it was setting up some resources in a sub optimal way. The 32-bit example is spelled out. I can imagine others, such as FW whacking a fixed BAR value on a boot device to get going knowing full well it's sub optimal. Now unless we start building some pretty serious smarts into our allocation scheme to "decide" whether a valid resource could be done better elsewhere or not, which we aren't near being able to do I suspect, that case will not work *unless* we just wipe such resources and let Linux do what it thinks is best with it. All those interpretations are valid on either the current or the "proposed" spec, and it still leaves us in a bind because there is not enough intent being conveyed. It needs to be more than a binary choice ideally. That said, I'm not in a hurry to deal with the _DSM #5 == 1 case, so we can defer that conversation. Maybe we can create a mechanism where we do a "dummy run" of our allocation code on a snapshot of resources with the assumption that _DSM #5 == 1 ones get wiped, and create some metrics to decide whether the end result is "better" than what's already there, and based on that chose what to apply, but it sounds really really messy. Cheers, Ben.
prev parent reply index Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-06-13 7:54 Benjamin Herrenschmidt 2019-06-13 19:02 ` Bjorn Helgaas 2019-06-13 21:59 ` Benjamin Herrenschmidt 2019-06-13 23:07 ` Benjamin Herrenschmidt 2019-06-14 7:42 ` Ard Biesheuvel 2019-06-14 8:36 ` Benjamin Herrenschmidt 2019-06-14 9:57 ` Lorenzo Pieralisi 2019-06-14 10:36 ` Benjamin Herrenschmidt 2019-06-14 10:43 ` Benjamin Herrenschmidt 2019-06-14 13:12 ` Bjorn Helgaas 2019-06-14 13:48 ` Benjamin Herrenschmidt 2019-06-14 20:13 ` Bjorn Helgaas 2019-06-15 1:18 ` Benjamin Herrenschmidt 2019-06-14 13:09 ` Bjorn Helgaas 2019-06-14 13:46 ` Benjamin Herrenschmidt [this message]
Reply instructions: You may reply publically to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Linux-PCI Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/linux-pci/0 linux-pci/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 linux-pci linux-pci/ https://lore.kernel.org/linux-pci \ firstname.lastname@example.org email@example.com public-inbox-index linux-pci Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.linux-pci AGPL code for this site: git clone https://public-inbox.org/ public-inbox