linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org, xuyandong <xuyandong2@huawei.com>,
	Yinghai Lu <yinghai@kernel.org>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH v2] PCI: avoid bridge feature re-probing on hotplug
Date: Tue, 15 Jan 2019 16:23:52 -0600	[thread overview]
Message-ID: <20190115222352.GA158366@google.com> (raw)
In-Reply-To: <20190114230507-mutt-send-email-mst@kernel.org>

[-cc stable]

On Mon, Jan 14, 2019 at 11:07:27PM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 20, 2018 at 05:36:03PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Dec 20, 2018 at 04:31:58PM -0600, Bjorn Helgaas wrote:
> > > On Thu, Dec 20, 2018 at 04:26:54PM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 20, 2018 at 01:49:50PM -0600, Bjorn Helgaas wrote:
> > > > > On Mon, Dec 17, 2018 at 07:45:41PM -0500, Michael S. Tsirkin wrote:
> > > > > > commit 1f82de10d6b1 ("PCI/x86: don't assume prefetchable ranges are 64bit")
> > > > > > added probing of bridge support for 64 bit memory each time bridge is
> > > > > > re-enumerated.
> > > > > > 
> > > > > > Unfortunately this probing is destructive if any device behind
> > > > > > the bridge is in use at this time.
> > > > > > 
> > > > > > This was observed in the field, see
> > > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> > > > > > and specifically
> > > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
> > > > > > 
> > > > > > There's no real need to re-probe the bridge features as the
> > > > > > registers in question never change - detect that using
> > > > > > the memory flag being set (it's always set on the 1st pass since
> > > > > > all PCI2PCI bridges support memory forwarding) and skip the probing.
> > > > > > Thus, only the first call will perform the disruptive probing and sets
> > > > > > the resource flags as required - which we can be reasonably sure happens
> > > > > > before any devices have been configured.
> > > > > > Avoiding repeated calls to pci_bridge_check_ranges might be even nicer.
> > > > > > Unfortunately I couldn't come up with a clean way to do it without a
> > > > > > major probing code refactoring.
> > > > > 
> > > > > I'm OK with major probe code refactoring as long as it's done
> > > > > carefully.  Doing a special-case fix like this solves the immediate
> > > > > problem but adds to the long-term maintenance problem.
> > > > > 
> > > > > As far as I can tell, everything in pci_bridge_check_ranges() should
> > > > > be done once at enumeration-time, e.g., in the pci_read_bridge_bases()
> > > > > path, and pci_bridge_check_ranges() itself should be removed.
> > > > > 
> > > > > If that turns out to be impossible for some reason, we need a comment
> > > > > explaining why.
> > > > 
> > > > Maybe possible but I am not sure how.
> > > > 
> > > > Here's why:
> > > > 
> > > > Upon hotplug we want to poke at new bridges if any, but not the old
> > > > ones.  The issue is that e.g. with ACPI hotplug the event that Linux
> > > > knows how to handle is by design a heavy weight bus rescan.
> > > 
> > > Yeah, it's tricky.  But I don't think it's PCI or ACPI that makes this
> > > tricky; I think it's just the historical baggage of the PCI core
> > > design that makes it hard.
> > > 
> > > Even in the ACPI hotplug path, I think we use this pci_scan_device()
> > > path:
> > > 
> > >   pci_scan_device
> > >     pci_setup_device
> > >       case PCI_HEADER_TYPE_NORMAL:
> > >         pci_read_bases(6)             # normal PCI BARs
> > >       case PCI_HEADER_TYPE_BRIDGE:
> > >         pci_read_bases(2)             # bridge BARs (not windows)
> > > 
> > > Unfortunately that path doesn't call pci_read_bridge_bases() to read
> > > the bridge windows; that currently happens in pcibios_fixup_bus(),
> > > which is only called from pci_scan_child_bus_extend().
> > > 
> > > This is a broken design because reading the bridge apertures is not at
> > > all platform-specific, so it shouldn't be done in a pcibios hook.
> > > And, more to the issue at hand, it shouldn't be done in
> > > pci_scan_child_bus() either.  We might have to *update* the windows
> > > when scanning child buses, but we should be able to do the work of
> > > finding out what windows are implemented and their properties
> > > somewhere in the pci_setup_device() path.
> > > 
> > > > Specifically 
> > > > 
> > > > pci_read_bridge_bases does not
> > > > seem to be called on ACPI hotplug path.
> > > > 
> > > > Rather,
> > > > 
> > > > pci_assign_unassigned_root_bus_resources
> > > > pci_assign_unassigned_bridge_resources
> > > > 
> > > > would be the two functions in question.
> > > > 
> > > > 
> > > > Would above explanation be sufficient? If not, since I understand your
> > > > reluctance to pile up hacks, would you be open to doing the suggested
> > > > rewrite yourself? Me and xuyandong can help test it.
> > > 
> > > I'll be on vacation or holiday most of the time until the new year,
> > > but I put a reminder on my calendar to look at this again then.  I'm
> > > pretty sure we've tried to unravel this in the past, but I can't
> > > remember what issues we tripped over.  Maybe we can make some progress
> > > by restricting the problem we're trying to solve.
> > > 
> > > Thanks for bringing this up!  This is a wart in the PCI core that has
> > > bothered me for a long time, and maybe this is the incentive we need
> > > to make some progress on it.
> > 
> > Sounds good, let me know when I can help with testing.
> 
> FWIW this patch has been in -next through my tree for a while now with
> no ill effects. And the bug it fixes is real. So ... nudge nudge ...
> 
> Would you consider merging if I added a bg fat FIXME on top
> saying the fact that we re-probe multiple times is
> a sign that probing needs to be cleaned up?

I don't think we've really exhausted the possibilities for a cleaner
fix yet.  I have some ideas and would like to test them myself before
making a fool of myself in public.  Is there any chance one of you
could post a minimal way to reproduce the problem?  I fiddled with the
qemu stuff in
https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
but it's far from minimal and doesn't include the hot-add parts of the
recipe.  I'm pretty much qemu-illiterate, so any hints would be much
appreciated.

Bjorn

  reply	other threads:[~2019-01-15 22:23 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-18  0:45 [PATCH v2] PCI: avoid bridge feature re-probing on hotplug Michael S. Tsirkin
2018-12-20 19:49 ` Bjorn Helgaas
2018-12-20 21:26   ` Michael S. Tsirkin
2018-12-20 22:31     ` Bjorn Helgaas
2018-12-20 22:36       ` Michael S. Tsirkin
2019-01-07 15:01         ` Michael S. Tsirkin
2019-01-15  4:07         ` Michael S. Tsirkin
2019-01-15 22:23           ` Bjorn Helgaas [this message]
2019-01-19 20:12 ` [PATCH v3] " Bjorn Helgaas
2019-01-20 14:49   ` Michael S. Tsirkin
2019-01-22  5:31   ` xuyandong
2019-01-22 13:53     ` Michael S. Tsirkin
2019-01-22 18:58     ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190115222352.GA158366@google.com \
    --to=helgaas@kernel.org \
    --cc=jbarnes@virtuousgeek.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=xuyandong2@huawei.com \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).