linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Logan Gunthorpe <logang@deltatee.com>
Subject: Re: [PATCH v1 4/4] PCI: Allow extend_bridge_window() to shrink resource if necessary
Date: Wed, 8 Jan 2020 01:36:04 +0000	[thread overview]
Message-ID: <PSXP216MB043869924730BFD3AA97B0A0803E0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM> (raw)
In-Reply-To: <20200107203435.GA137091@google.com>

On Tue, Jan 07, 2020 at 02:34:35PM -0600, Bjorn Helgaas wrote:
> On Mon, Jan 06, 2020 at 03:48:06PM +0000, Nicholas Johnson wrote:
> > Remove checks for resource size in extend_bridge_window(). This is
> > necessary to allow the pci_bus_distribute_available_resources() to
> > function when the kernel parameter pci=hpmemsize=nn[KMG] is used to
> > allocate resources. Because the kernel parameter sets the size of all
> > hotplug bridges to be the same, there are problems when nested hotplug
> > bridges are encountered. Fitting a downstream hotplug bridge with size X
> > and normal bridges with non-zero size Y into parent hotplug bridge with
> > size X is impossible, and hence the downstream hotplug bridge needs to
> > shrink to fit into its parent.
> 
> s/extend_bridge_window()/adjust_bridge_window()/ above
> s/to allow the/to allow/

Okay
> 
> If this patch allows pci_bus_distribute_available_resources() to
> function when pci=hpmemsize=nn is used, what happens *before* this
> patch?  The text implies that pci_bus_distribute_available_resources()
> doesn't function, but what happens?  Do we try to assign a downstream
> bridge requiring X+n inside an upstream window of size X and the
> assignment fails, leaving the downstream bridge unusable?

I could add something similar to this to the log:

The hpmemsize is applied to add_size of every hotplug bridge, even 
nested ones. Say we set hpmemsize=256M, the upstream hotplug bridge gets 
256M. Then when we hot-add a Thunderbolt device with daisy chaining, the 
new nested bridge also gets 256M and this will not fit because some 
further space has been consumed by the endpoints in the Thunderbolt 
device. Hence, we cannot extend.

It works for Mika because he is interested in the cases when the 
firmware assigns the resources, hence hpmemsize=2M (default) and it does 
not cause problems, unless we run out of space and need to go below 2M.
> 
> > Add check for if bridge is extended or shrunken and reflect that in the
> > call to pci_dbg().
> > 
> > Reset the resource if its new size is zero (if we have run out of a
> > bridge window resource) to prevent the PCI resource assignment code from
> > attempting to assign a zero-sized resource.
> > 
> > Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> > ---
> >  drivers/pci/setup-bus.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index 0c51f4937..e7e57bf72 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -1836,18 +1836,25 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
> >  				 struct list_head *add_list,
> >  				 resource_size_t new_size)
> >  {
> > -	resource_size_t add_size;
> > +	resource_size_t add_size, size = resource_size(res);
> >  
> >  	if (res->parent)
> >  		return;
> >  
> > -	if (resource_size(res) >= new_size)
> > -		return;
> > +	if (new_size > size) {
> > +		add_size = new_size - size;
> > +		pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
> > +			&add_size);
> > +	} else if (new_size < size) {
> > +		add_size = size - new_size;
> > +		pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
> > +			&add_size);
> > +	}
> 
> Where's the patch that changes the caller so "new_size" may be smaller
> than "size"?  I guess it must be "[3/3] PCI: Consider alignment of
> hot-added bridges ..." because that's the only one that makes a
> non-trivial change, right?

As above, there was always a possibility of the new_size being smaller. 
For some reason, 1M is assigned to bridges, even if nothing is below 
them (for example, unused non hotplug bridges in a Thunderbolt dock). It 
may be an edge case if we are low on space, but theoretically it can 
happen.

Also, when writing this, Mika was not interested in using hpmemsize, 
which, when used, will cause new_size to be smaller than the current 
size (actual size and add_size combined).

So it does not need a patch to cause "new_size" to be smaller than 
"size" - just a change in user behaviour to use pci=hpmemsize.
> 
> > -	add_size = new_size - resource_size(res);
> > -	pci_dbg(bridge, "bridge window %pR extended by %pa\n", res, &add_size);
> >  	res->end = res->start + new_size - 1;
> >  	remove_from_list(add_list, res);
> > +	if (!new_size)
> > +		reset_resource(res);
> 
> I consider reset_resource() to be deprecated because it throws away
> res->flags, which tells us what kind of resource it is
> (mem/io/32-bit/64-bit/prefetchable).  We learn this during
> enumeration, and we shouldn't forget the information until we remove
> the device.

I will look at this, but I distinctly remember doing this because of IO 
BARs which would run out and cause the device not to be enabled.

Can you please comment on IORESOURCE_UNSET and what effect that would 
have if applied to the flags. Also, can you suggest any other solution 
other than making it handle zero-sized resources better? I agree, that 
would be ideal to make it handle zero-sized resources, but given the 
state of drivers/pci/setup-bus.c, I feel like it will just recursively 
open up more cans of worms until we metaphorically stack overflow. And I 
am happy to go down that path. But at some point I feel we will have to 
make some compromises / stop-gap measures to apply some patches and make 
progress, before going down that road. Would you agree?
> 
> If the resource assignment code doesn't do the right thing with a
> zero-sized resource, I think we should fix that code.  Clearing the
> resource struct does nothing with the hardware BAR or window
> registers, so the BAR/window remains enabled unless we do something
> more.  If we don't need a window and we want to disable it, we can do
> that, but it requires writing special values to the hardware
> registers.
> 
> Bjorn

https://lkml.org/lkml/2020/1/7/1544

You describe this as "black magic code", what appears to be the 
assignment code which handles lists of resources. And I agree. I believe 
it is in both our interests to avoid using add_size because nobody 
understands how these are handled. There may be bugs, and there is 
definitely lots of complexity involved. I believe simplicity is key. 
Hence why these changes in this series:

- Change resource size directly instead of using add_size

- Aside from the currently known cases of needing to shrink the 
resource, we cannot know that there will not be more cases of this in 
the future. There is no need for preventing it from shrinking - we have 
an available size for the bridge window, and if that happens to be 
smaller than the bridge window, we have no choice but to shrink. I 
believe this makes the check unnecessary and warrants removal.

Regards,
Nicholas

  reply	other threads:[~2020-01-08  1:36 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-06 15:48 [PATCH v1 4/4] PCI: Allow extend_bridge_window() to shrink resource if necessary Nicholas Johnson
2020-01-07 20:34 ` Bjorn Helgaas
2020-01-08  1:36   ` Nicholas Johnson [this message]
2020-01-13 16:21     ` Mika Westerberg
2020-01-15 14:51       ` Nicholas Johnson
2020-01-15 14:59         ` Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=PSXP216MB043869924730BFD3AA97B0A0803E0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM \
    --to=nicholas.johnson-opensource@outlook.com.au \
    --cc=benh@kernel.crashing.org \
    --cc=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=mika.westerberg@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).