linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Logan Gunthorpe <logang@deltatee.com>
To: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: Multitude of resource assignment functions
Date: Thu, 27 Jun 2019 10:35:12 -0600	[thread overview]
Message-ID: <e2eec9dc-5eef-62ba-6251-f420d6579d03@deltatee.com> (raw)
In-Reply-To: <SL2P216MB01875C9CB93E6B39846749B280FD0@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM>



On 2019-06-27 1:40 a.m., Nicholas Johnson wrote:
> On Mon, Jun 24, 2019 at 10:45:17AM -0600, Logan Gunthorpe wrote:
>>
>>
>> On 2019-06-24 3:13 a.m., Benjamin Herrenschmidt wrote:
>>> So I'm staring at these three mostly at this point:
>>>
>>> void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
>>> void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
>>> void pci_assign_unassigned_bus_resources(struct pci_bus *bus)
>>>
>>> Now we have 3 functions that fundamentally have the same purpose,
>>> assign what was left unassigned down a PCI hierarchy, but are going
>>> about it in quite a different manner.
>>>
>>> Now to make things worse, there's little consistency in which one gets
>>> called where. We have PCI controllers calling the first one sometimes,
>>> the last one sometimes, or doing the manual:
>>>
>>> 	pci_bus_size_bridges(bus);
>>> 	pci_bus_assign_resources(bus);
>>>
>>> Or variants with pci_bus_size_bridges sometimes missing etc...
>>
>> I suspect there isn't much rhyme or reason to it. None of this is well
>> documented so developers writing the controller drivers probably didn't
>> have a good idea of what the correct thing to do was, and just stuck
>> with the first thing that worked.
>>
>>> Now I've consolidated a lot of that and removed all of those "manual"
>>> cases in my work-in-progress branch, but I'd like to clarify and
>>> possibly remove the 3 ones above.
>>>
>>> Let's start with the last one, pci_assign_unassigned_bus_resources, as
>>> it's the easiest to remove from users in drivers/pci/controller/* (and
>>> replace with pci_assign_unassigned_root_bus_resources typically).
>>>
>>> This leaves it used in a couple of corner cases, most of them I think
>>> I can kill, and .... sysfs 'rescan'.
>>>
>>> The interesting thing about that function is that it tries to avoid
>>> resizing the bridge of the bus passed as an argument, it will only
>>> resize subordinate bridges. From the changelog it was created for
>>> hotplug bridges, but almost none uses it (some powerpc stuff I can
>>> probably kill) ... and sysfs rescan.
>>>
>>> I wonder what's the remaining purpose of it. sysfs rescan could
>>> probably be cleaned up to use the two first... Also why avoid resizing
>>> the bridge itself ?
>>>
>>> That leads to the difference between
>>> pci_assign_unassigned_root_bus_resources()
>>> and pci_assign_unassigned_bridge_resources().
>>>
>>> The names are misleading. The former isn't just about the root bus
>>> resources. It's about the entire tree underneath the root bus.
>>>
>>> The main difference that I can tell are:
>>>
>>>  - pci_assign_unassigned_root_bus_resources() may or may not try to
>>> realloc, depending on a combination of command line args, config
>>> option, presence of IOV devices etc... while
>>> pci_assign_unassigned_bridge_resources() always will
>>>
>>>  - pci_assign_unassigned_bridge_resources() will call
>>> pci_bridge_distribute_available_resources() to distribute resource to
>>> child hotplug bridges, while pci_assign_unassigned_root_bus_resources()
>>> won't.
>>>
>>> Now, are we 100% confident we want to keep those discrepancies ?
>>>
>>> It feels like the former function is intended for boot time resource
>>> allocation, and the latter for hotplug, but I can't make sense of why
>>> the resources of a device behind a hotplug bridge should be allocated
>>> differently depending on whether that device was plugged at boot or
>>> plugged later.
>>
>> I don't really know, but I kind of assumed reallocing any time but early
>> in boot would be dangerous. It involves un-assigning a bunch of
>> resources without any real check to see if a driver is using them or
>> not. If they were being used by a driver (which is typical) and they
>> were reassigned, everything would break.
>>
>> I mean, in theory the code could/should be the same for both paths and
>> it could just make a single, better decision on whether to realloc or
>> not. But that's going to be challenging to get there.
>>
>>> Also why not distribute available resources at boot between top level
>>> hotplug bridges ?
>>>
>>> I'm not even going into the question of why the resource
>>> sizing/assignment code is so obscure/cryptic/incomprehensible, that's
>>> another kettle of fish, but I'd like to at least clarify the usage
>>> patterns a bit better.
>> I got the impression the code was designed to generally let the firmware
>> set things up -- it just fixed things up if the firmware messed it up
>> somehow. My guess would be it evolved out of a bunch of hacks designed
>> to fix broken bioses into something new platforms used to do full
>> enumeration (because it happened to work).
> Unfortunately, the operating system is designed to let the firmware do 
> things. In my mind, ACPI should not need to exist, and the operating 
> system should start with a clean state with PCI and re-enumerate 
> everything at boot time. The PCI allocation is so broken and 
> inconsistent (as you have noted) because it tries to combine the two, 
> when firmware enumeration and native enumeration should be mutually 
> exclusive. I have attempted to re-write large chunks of probe.c, pci.c 
> and setup-bus.c to completely disregard firmware enumeration and clean 
> everything up. Unfortunately, I get stuck in probe.c with the double 
> recursive loop which assigns bus numbers - I cannot figure out how to 
> re-write it successfully. Plus, I feel like nobody will be ready for 
> such a drastic change - I am having trouble selling minor changes that 
> fix actual use cases, as opposed to code reworking.

My worry would be if the firmware depends on any of those PCI resources
for any of it's calls. For example, laptop firmware often has specific
code for screen blanking/dimming when the special buttons are pressed.
If it implements this by communicating with a PCI device then the kernel
will break things by reassigning all the addresses.

However, having a kernel parameter to ignore the firmware choices might
be a good way for us to start testing whether this is a problem or not
on some systems

Logan

  parent reply	other threads:[~2019-06-27 16:35 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <SL2P216MB01874DFDDBDE49B935A9B1B380E50@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM>
2019-06-19 16:21 ` [nicholas.johnson-opensource@outlook.com.au: [PATCH v6 3/4] PCI: Fix bug resulting in double hpmemsize being assigned to MMIO window] Logan Gunthorpe
2019-06-20  0:44   ` Nicholas Johnson
2019-06-20  0:49     ` Logan Gunthorpe
2019-06-23  5:01       ` Nicholas Johnson
2019-06-24  9:13         ` Multitude of resource assignment functions Benjamin Herrenschmidt
2019-06-24 16:45           ` Logan Gunthorpe
2019-06-27  7:40             ` Nicholas Johnson
2019-06-27  8:48               ` Benjamin Herrenschmidt
2019-06-30  2:40                 ` Nicholas Johnson
2019-06-27 16:35               ` Logan Gunthorpe [this message]
2019-06-27 20:26                 ` Benjamin Herrenschmidt
2019-06-30  2:57                 ` Nicholas Johnson
2019-07-01  4:33                   ` Oliver O'Halloran
2019-07-02 21:39                   ` Bjorn Helgaas
2019-07-03 13:43                     ` Nicholas Johnson
2019-07-03 14:19                       ` Bjorn Helgaas
2019-07-03 22:54                       ` Benjamin Herrenschmidt
2019-06-20 13:43     ` [nicholas.johnson-opensource@outlook.com.au: [PATCH v6 3/4] PCI: Fix bug resulting in double hpmemsize being assigned to MMIO window] Bjorn Helgaas
2019-06-20 23:24       ` Benjamin Herrenschmidt
2019-06-27  7:50   ` Nicholas Johnson
2019-06-27 16:54     ` Logan Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e2eec9dc-5eef-62ba-6251-f420d6579d03@deltatee.com \
    --to=logang@deltatee.com \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=nicholas.johnson-opensource@outlook.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).