linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] PCI: avoid bridge feature re-probing on hotplug
@ 2018-12-18  0:45 Michael S. Tsirkin
  2018-12-20 19:49 ` Bjorn Helgaas
  2019-01-19 20:12 ` [PATCH v3] " Bjorn Helgaas
  0 siblings, 2 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2018-12-18  0:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: xuyandong, stable, Yinghai Lu, Jesse Barnes, Bjorn Helgaas, linux-pci

commit 1f82de10d6b1 ("PCI/x86: don't assume prefetchable ranges are 64bit")
added probing of bridge support for 64 bit memory each time bridge is
re-enumerated.

Unfortunately this probing is destructive if any device behind
the bridge is in use at this time.

This was observed in the field, see
https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
and specifically
https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html

There's no real need to re-probe the bridge features as the
registers in question never change - detect that using
the memory flag being set (it's always set on the 1st pass since
all PCI2PCI bridges support memory forwarding) and skip the probing.
Thus, only the first call will perform the disruptive probing and sets
the resource flags as required - which we can be reasonably sure happens
before any devices have been configured.
Avoiding repeated calls to pci_bridge_check_ranges might be even nicer.
Unfortunately I couldn't come up with a clean way to do it without a
major probing code refactoring.

Reported-by: xuyandong <xuyandong2@huawei.com>
Tested-by: xuyandong <xuyandong2@huawei.com>
Cc: stable@vger.kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Please review and consider for stable.

changes from v1:
	comment and commit log updates to address comments by Bjorn.

 drivers/pci/setup-bus.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ed960436df5e..d5c25d465d97 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -741,6 +741,16 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
 	struct resource *b_res;
 
 	b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
+
+	/*
+	 * Don't re-check after this was called once already:
+	 * important since bridge might be in use.
+	 * Note: this is only reliable because as per spec all PCI to PCI
+	 * bridges support memory unconditionally so IORESOURCE_MEM is set.
+	 */
+	if (b_res[1].flags & IORESOURCE_MEM)
+		return;
+
 	b_res[1].flags |= IORESOURCE_MEM;
 
 	pci_read_config_word(bridge, PCI_IO_BASE, &io);
-- 
MST

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] PCI: avoid bridge feature re-probing on hotplug
  2018-12-18  0:45 [PATCH v2] PCI: avoid bridge feature re-probing on hotplug Michael S. Tsirkin
@ 2018-12-20 19:49 ` Bjorn Helgaas
  2018-12-20 21:26   ` Michael S. Tsirkin
  2019-01-19 20:12 ` [PATCH v3] " Bjorn Helgaas
  1 sibling, 1 reply; 13+ messages in thread
From: Bjorn Helgaas @ 2018-12-20 19:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, xuyandong, stable, Yinghai Lu, Jesse Barnes, linux-pci

Hi Michael,

On Mon, Dec 17, 2018 at 07:45:41PM -0500, Michael S. Tsirkin wrote:
> commit 1f82de10d6b1 ("PCI/x86: don't assume prefetchable ranges are 64bit")
> added probing of bridge support for 64 bit memory each time bridge is
> re-enumerated.
> 
> Unfortunately this probing is destructive if any device behind
> the bridge is in use at this time.
> 
> This was observed in the field, see
> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> and specifically
> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
> 
> There's no real need to re-probe the bridge features as the
> registers in question never change - detect that using
> the memory flag being set (it's always set on the 1st pass since
> all PCI2PCI bridges support memory forwarding) and skip the probing.
> Thus, only the first call will perform the disruptive probing and sets
> the resource flags as required - which we can be reasonably sure happens
> before any devices have been configured.
> Avoiding repeated calls to pci_bridge_check_ranges might be even nicer.
> Unfortunately I couldn't come up with a clean way to do it without a
> major probing code refactoring.

I'm OK with major probe code refactoring as long as it's done
carefully.  Doing a special-case fix like this solves the immediate
problem but adds to the long-term maintenance problem.

As far as I can tell, everything in pci_bridge_check_ranges() should
be done once at enumeration-time, e.g., in the pci_read_bridge_bases()
path, and pci_bridge_check_ranges() itself should be removed.

If that turns out to be impossible for some reason, we need a comment
explaining why.

> Reported-by: xuyandong <xuyandong2@huawei.com>
> Tested-by: xuyandong <xuyandong2@huawei.com>
> Cc: stable@vger.kernel.org
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> Please review and consider for stable.
> 
> changes from v1:
> 	comment and commit log updates to address comments by Bjorn.
> 
>  drivers/pci/setup-bus.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index ed960436df5e..d5c25d465d97 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -741,6 +741,16 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
>  	struct resource *b_res;
>  
>  	b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
> +
> +	/*
> +	 * Don't re-check after this was called once already:
> +	 * important since bridge might be in use.
> +	 * Note: this is only reliable because as per spec all PCI to PCI
> +	 * bridges support memory unconditionally so IORESOURCE_MEM is set.
> +	 */
> +	if (b_res[1].flags & IORESOURCE_MEM)
> +		return;
> +
>  	b_res[1].flags |= IORESOURCE_MEM;
>  
>  	pci_read_config_word(bridge, PCI_IO_BASE, &io);
> -- 
> MST

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] PCI: avoid bridge feature re-probing on hotplug
  2018-12-20 19:49 ` Bjorn Helgaas
@ 2018-12-20 21:26   ` Michael S. Tsirkin
  2018-12-20 22:31     ` Bjorn Helgaas
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2018-12-20 21:26 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, xuyandong, stable, Yinghai Lu, Jesse Barnes, linux-pci

On Thu, Dec 20, 2018 at 01:49:50PM -0600, Bjorn Helgaas wrote:
> Hi Michael,
> 
> On Mon, Dec 17, 2018 at 07:45:41PM -0500, Michael S. Tsirkin wrote:
> > commit 1f82de10d6b1 ("PCI/x86: don't assume prefetchable ranges are 64bit")
> > added probing of bridge support for 64 bit memory each time bridge is
> > re-enumerated.
> > 
> > Unfortunately this probing is destructive if any device behind
> > the bridge is in use at this time.
> > 
> > This was observed in the field, see
> > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> > and specifically
> > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
> > 
> > There's no real need to re-probe the bridge features as the
> > registers in question never change - detect that using
> > the memory flag being set (it's always set on the 1st pass since
> > all PCI2PCI bridges support memory forwarding) and skip the probing.
> > Thus, only the first call will perform the disruptive probing and sets
> > the resource flags as required - which we can be reasonably sure happens
> > before any devices have been configured.
> > Avoiding repeated calls to pci_bridge_check_ranges might be even nicer.
> > Unfortunately I couldn't come up with a clean way to do it without a
> > major probing code refactoring.
> 
> I'm OK with major probe code refactoring as long as it's done
> carefully.  Doing a special-case fix like this solves the immediate
> problem but adds to the long-term maintenance problem.
> 
> As far as I can tell, everything in pci_bridge_check_ranges() should
> be done once at enumeration-time, e.g., in the pci_read_bridge_bases()
> path, and pci_bridge_check_ranges() itself should be removed.
> 
> If that turns out to be impossible for some reason, we need a comment
> explaining why.

Maybe possible but I am not sure how.

Here's why:

Upon hotplug we want to poke at new bridges if any, but not the old
ones.  The issue is that e.g. with ACPI hotplug the event that Linux
knows how to handle is by design a heavy weight bus rescan.

Specifically 

pci_read_bridge_bases does not
seem to be called on ACPI hotplug path.

Rather,

pci_assign_unassigned_root_bus_resources
pci_assign_unassigned_bridge_resources

would be the two functions in question.


Would above explanation be sufficient? If not, since I understand your
reluctance to pile up hacks, would you be open to doing the suggested
rewrite yourself? Me and xuyandong can help test it.


> > Reported-by: xuyandong <xuyandong2@huawei.com>
> > Tested-by: xuyandong <xuyandong2@huawei.com>
> > Cc: stable@vger.kernel.org
> > Cc: Yinghai Lu <yinghai@kernel.org>
> > Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > Please review and consider for stable.
> > 
> > changes from v1:
> > 	comment and commit log updates to address comments by Bjorn.
> > 
> >  drivers/pci/setup-bus.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index ed960436df5e..d5c25d465d97 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -741,6 +741,16 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
> >  	struct resource *b_res;
> >  
> >  	b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
> > +
> > +	/*
> > +	 * Don't re-check after this was called once already:
> > +	 * important since bridge might be in use.
> > +	 * Note: this is only reliable because as per spec all PCI to PCI
> > +	 * bridges support memory unconditionally so IORESOURCE_MEM is set.
> > +	 */
> > +	if (b_res[1].flags & IORESOURCE_MEM)
> > +		return;
> > +
> >  	b_res[1].flags |= IORESOURCE_MEM;
> >  
> >  	pci_read_config_word(bridge, PCI_IO_BASE, &io);
> > -- 
> > MST

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] PCI: avoid bridge feature re-probing on hotplug
  2018-12-20 21:26   ` Michael S. Tsirkin
@ 2018-12-20 22:31     ` Bjorn Helgaas
  2018-12-20 22:36       ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Bjorn Helgaas @ 2018-12-20 22:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, xuyandong, stable, Yinghai Lu, Jesse Barnes, linux-pci

On Thu, Dec 20, 2018 at 04:26:54PM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 20, 2018 at 01:49:50PM -0600, Bjorn Helgaas wrote:
> > On Mon, Dec 17, 2018 at 07:45:41PM -0500, Michael S. Tsirkin wrote:
> > > commit 1f82de10d6b1 ("PCI/x86: don't assume prefetchable ranges are 64bit")
> > > added probing of bridge support for 64 bit memory each time bridge is
> > > re-enumerated.
> > > 
> > > Unfortunately this probing is destructive if any device behind
> > > the bridge is in use at this time.
> > > 
> > > This was observed in the field, see
> > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> > > and specifically
> > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
> > > 
> > > There's no real need to re-probe the bridge features as the
> > > registers in question never change - detect that using
> > > the memory flag being set (it's always set on the 1st pass since
> > > all PCI2PCI bridges support memory forwarding) and skip the probing.
> > > Thus, only the first call will perform the disruptive probing and sets
> > > the resource flags as required - which we can be reasonably sure happens
> > > before any devices have been configured.
> > > Avoiding repeated calls to pci_bridge_check_ranges might be even nicer.
> > > Unfortunately I couldn't come up with a clean way to do it without a
> > > major probing code refactoring.
> > 
> > I'm OK with major probe code refactoring as long as it's done
> > carefully.  Doing a special-case fix like this solves the immediate
> > problem but adds to the long-term maintenance problem.
> > 
> > As far as I can tell, everything in pci_bridge_check_ranges() should
> > be done once at enumeration-time, e.g., in the pci_read_bridge_bases()
> > path, and pci_bridge_check_ranges() itself should be removed.
> > 
> > If that turns out to be impossible for some reason, we need a comment
> > explaining why.
> 
> Maybe possible but I am not sure how.
> 
> Here's why:
> 
> Upon hotplug we want to poke at new bridges if any, but not the old
> ones.  The issue is that e.g. with ACPI hotplug the event that Linux
> knows how to handle is by design a heavy weight bus rescan.

Yeah, it's tricky.  But I don't think it's PCI or ACPI that makes this
tricky; I think it's just the historical baggage of the PCI core
design that makes it hard.

Even in the ACPI hotplug path, I think we use this pci_scan_device()
path:

  pci_scan_device
    pci_setup_device
      case PCI_HEADER_TYPE_NORMAL:
        pci_read_bases(6)             # normal PCI BARs
      case PCI_HEADER_TYPE_BRIDGE:
        pci_read_bases(2)             # bridge BARs (not windows)

Unfortunately that path doesn't call pci_read_bridge_bases() to read
the bridge windows; that currently happens in pcibios_fixup_bus(),
which is only called from pci_scan_child_bus_extend().

This is a broken design because reading the bridge apertures is not at
all platform-specific, so it shouldn't be done in a pcibios hook.
And, more to the issue at hand, it shouldn't be done in
pci_scan_child_bus() either.  We might have to *update* the windows
when scanning child buses, but we should be able to do the work of
finding out what windows are implemented and their properties
somewhere in the pci_setup_device() path.

> Specifically 
> 
> pci_read_bridge_bases does not
> seem to be called on ACPI hotplug path.
> 
> Rather,
> 
> pci_assign_unassigned_root_bus_resources
> pci_assign_unassigned_bridge_resources
> 
> would be the two functions in question.
> 
> 
> Would above explanation be sufficient? If not, since I understand your
> reluctance to pile up hacks, would you be open to doing the suggested
> rewrite yourself? Me and xuyandong can help test it.

I'll be on vacation or holiday most of the time until the new year,
but I put a reminder on my calendar to look at this again then.  I'm
pretty sure we've tried to unravel this in the past, but I can't
remember what issues we tripped over.  Maybe we can make some progress
by restricting the problem we're trying to solve.

Thanks for bringing this up!  This is a wart in the PCI core that has
bothered me for a long time, and maybe this is the incentive we need
to make some progress on it.

Bjorn

> > > Reported-by: xuyandong <xuyandong2@huawei.com>
> > > Tested-by: xuyandong <xuyandong2@huawei.com>
> > > Cc: stable@vger.kernel.org
> > > Cc: Yinghai Lu <yinghai@kernel.org>
> > > Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > > 
> > > Please review and consider for stable.
> > > 
> > > changes from v1:
> > >   comment and commit log updates to address comments by Bjorn.
> > > 
> > >  drivers/pci/setup-bus.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > > 
> > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > > index ed960436df5e..d5c25d465d97 100644
> > > --- a/drivers/pci/setup-bus.c
> > > +++ b/drivers/pci/setup-bus.c
> > > @@ -741,6 +741,16 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
> > >   struct resource *b_res;
> > >  
> > >   b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
> > > +
> > > + /*
> > > +  * Don't re-check after this was called once already:
> > > +  * important since bridge might be in use.
> > > +  * Note: this is only reliable because as per spec all PCI to PCI
> > > +  * bridges support memory unconditionally so IORESOURCE_MEM is set.
> > > +  */
> > > + if (b_res[1].flags & IORESOURCE_MEM)
> > > +         return;
> > > +
> > >   b_res[1].flags |= IORESOURCE_MEM;
> > >  
> > >   pci_read_config_word(bridge, PCI_IO_BASE, &io);
> > > -- 
> > > MST

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] PCI: avoid bridge feature re-probing on hotplug
  2018-12-20 22:31     ` Bjorn Helgaas
@ 2018-12-20 22:36       ` Michael S. Tsirkin
  2019-01-07 15:01         ` Michael S. Tsirkin
  2019-01-15  4:07         ` Michael S. Tsirkin
  0 siblings, 2 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2018-12-20 22:36 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, xuyandong, stable, Yinghai Lu, Jesse Barnes, linux-pci

On Thu, Dec 20, 2018 at 04:31:58PM -0600, Bjorn Helgaas wrote:
> On Thu, Dec 20, 2018 at 04:26:54PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Dec 20, 2018 at 01:49:50PM -0600, Bjorn Helgaas wrote:
> > > On Mon, Dec 17, 2018 at 07:45:41PM -0500, Michael S. Tsirkin wrote:
> > > > commit 1f82de10d6b1 ("PCI/x86: don't assume prefetchable ranges are 64bit")
> > > > added probing of bridge support for 64 bit memory each time bridge is
> > > > re-enumerated.
> > > > 
> > > > Unfortunately this probing is destructive if any device behind
> > > > the bridge is in use at this time.
> > > > 
> > > > This was observed in the field, see
> > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> > > > and specifically
> > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
> > > > 
> > > > There's no real need to re-probe the bridge features as the
> > > > registers in question never change - detect that using
> > > > the memory flag being set (it's always set on the 1st pass since
> > > > all PCI2PCI bridges support memory forwarding) and skip the probing.
> > > > Thus, only the first call will perform the disruptive probing and sets
> > > > the resource flags as required - which we can be reasonably sure happens
> > > > before any devices have been configured.
> > > > Avoiding repeated calls to pci_bridge_check_ranges might be even nicer.
> > > > Unfortunately I couldn't come up with a clean way to do it without a
> > > > major probing code refactoring.
> > > 
> > > I'm OK with major probe code refactoring as long as it's done
> > > carefully.  Doing a special-case fix like this solves the immediate
> > > problem but adds to the long-term maintenance problem.
> > > 
> > > As far as I can tell, everything in pci_bridge_check_ranges() should
> > > be done once at enumeration-time, e.g., in the pci_read_bridge_bases()
> > > path, and pci_bridge_check_ranges() itself should be removed.
> > > 
> > > If that turns out to be impossible for some reason, we need a comment
> > > explaining why.
> > 
> > Maybe possible but I am not sure how.
> > 
> > Here's why:
> > 
> > Upon hotplug we want to poke at new bridges if any, but not the old
> > ones.  The issue is that e.g. with ACPI hotplug the event that Linux
> > knows how to handle is by design a heavy weight bus rescan.
> 
> Yeah, it's tricky.  But I don't think it's PCI or ACPI that makes this
> tricky; I think it's just the historical baggage of the PCI core
> design that makes it hard.
> 
> Even in the ACPI hotplug path, I think we use this pci_scan_device()
> path:
> 
>   pci_scan_device
>     pci_setup_device
>       case PCI_HEADER_TYPE_NORMAL:
>         pci_read_bases(6)             # normal PCI BARs
>       case PCI_HEADER_TYPE_BRIDGE:
>         pci_read_bases(2)             # bridge BARs (not windows)
> 
> Unfortunately that path doesn't call pci_read_bridge_bases() to read
> the bridge windows; that currently happens in pcibios_fixup_bus(),
> which is only called from pci_scan_child_bus_extend().
> 
> This is a broken design because reading the bridge apertures is not at
> all platform-specific, so it shouldn't be done in a pcibios hook.
> And, more to the issue at hand, it shouldn't be done in
> pci_scan_child_bus() either.  We might have to *update* the windows
> when scanning child buses, but we should be able to do the work of
> finding out what windows are implemented and their properties
> somewhere in the pci_setup_device() path.
> 
> > Specifically 
> > 
> > pci_read_bridge_bases does not
> > seem to be called on ACPI hotplug path.
> > 
> > Rather,
> > 
> > pci_assign_unassigned_root_bus_resources
> > pci_assign_unassigned_bridge_resources
> > 
> > would be the two functions in question.
> > 
> > 
> > Would above explanation be sufficient? If not, since I understand your
> > reluctance to pile up hacks, would you be open to doing the suggested
> > rewrite yourself? Me and xuyandong can help test it.
> 
> I'll be on vacation or holiday most of the time until the new year,
> but I put a reminder on my calendar to look at this again then.  I'm
> pretty sure we've tried to unravel this in the past, but I can't
> remember what issues we tripped over.  Maybe we can make some progress
> by restricting the problem we're trying to solve.
> 
> Thanks for bringing this up!  This is a wart in the PCI core that has
> bothered me for a long time, and maybe this is the incentive we need
> to make some progress on it.
> 
> Bjorn


Sounds good, let me know when I can help with testing.


> > > > Reported-by: xuyandong <xuyandong2@huawei.com>
> > > > Tested-by: xuyandong <xuyandong2@huawei.com>
> > > > Cc: stable@vger.kernel.org
> > > > Cc: Yinghai Lu <yinghai@kernel.org>
> > > > Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > > 
> > > > Please review and consider for stable.
> > > > 
> > > > changes from v1:
> > > >   comment and commit log updates to address comments by Bjorn.
> > > > 
> > > >  drivers/pci/setup-bus.c | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > > > index ed960436df5e..d5c25d465d97 100644
> > > > --- a/drivers/pci/setup-bus.c
> > > > +++ b/drivers/pci/setup-bus.c
> > > > @@ -741,6 +741,16 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
> > > >   struct resource *b_res;
> > > >  
> > > >   b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
> > > > +
> > > > + /*
> > > > +  * Don't re-check after this was called once already:
> > > > +  * important since bridge might be in use.
> > > > +  * Note: this is only reliable because as per spec all PCI to PCI
> > > > +  * bridges support memory unconditionally so IORESOURCE_MEM is set.
> > > > +  */
> > > > + if (b_res[1].flags & IORESOURCE_MEM)
> > > > +         return;
> > > > +
> > > >   b_res[1].flags |= IORESOURCE_MEM;
> > > >  
> > > >   pci_read_config_word(bridge, PCI_IO_BASE, &io);
> > > > -- 
> > > > MST

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] PCI: avoid bridge feature re-probing on hotplug
  2018-12-20 22:36       ` Michael S. Tsirkin
@ 2019-01-07 15:01         ` Michael S. Tsirkin
  2019-01-15  4:07         ` Michael S. Tsirkin
  1 sibling, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2019-01-07 15:01 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, xuyandong, stable, Yinghai Lu, Jesse Barnes, linux-pci

On Thu, Dec 20, 2018 at 05:36:03PM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 20, 2018 at 04:31:58PM -0600, Bjorn Helgaas wrote:
> > On Thu, Dec 20, 2018 at 04:26:54PM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Dec 20, 2018 at 01:49:50PM -0600, Bjorn Helgaas wrote:
> > > > On Mon, Dec 17, 2018 at 07:45:41PM -0500, Michael S. Tsirkin wrote:
> > > > > commit 1f82de10d6b1 ("PCI/x86: don't assume prefetchable ranges are 64bit")
> > > > > added probing of bridge support for 64 bit memory each time bridge is
> > > > > re-enumerated.
> > > > > 
> > > > > Unfortunately this probing is destructive if any device behind
> > > > > the bridge is in use at this time.
> > > > > 
> > > > > This was observed in the field, see
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> > > > > and specifically
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
> > > > > 
> > > > > There's no real need to re-probe the bridge features as the
> > > > > registers in question never change - detect that using
> > > > > the memory flag being set (it's always set on the 1st pass since
> > > > > all PCI2PCI bridges support memory forwarding) and skip the probing.
> > > > > Thus, only the first call will perform the disruptive probing and sets
> > > > > the resource flags as required - which we can be reasonably sure happens
> > > > > before any devices have been configured.
> > > > > Avoiding repeated calls to pci_bridge_check_ranges might be even nicer.
> > > > > Unfortunately I couldn't come up with a clean way to do it without a
> > > > > major probing code refactoring.
> > > > 
> > > > I'm OK with major probe code refactoring as long as it's done
> > > > carefully.  Doing a special-case fix like this solves the immediate
> > > > problem but adds to the long-term maintenance problem.
> > > > 
> > > > As far as I can tell, everything in pci_bridge_check_ranges() should
> > > > be done once at enumeration-time, e.g., in the pci_read_bridge_bases()
> > > > path, and pci_bridge_check_ranges() itself should be removed.
> > > > 
> > > > If that turns out to be impossible for some reason, we need a comment
> > > > explaining why.
> > > 
> > > Maybe possible but I am not sure how.
> > > 
> > > Here's why:
> > > 
> > > Upon hotplug we want to poke at new bridges if any, but not the old
> > > ones.  The issue is that e.g. with ACPI hotplug the event that Linux
> > > knows how to handle is by design a heavy weight bus rescan.
> > 
> > Yeah, it's tricky.  But I don't think it's PCI or ACPI that makes this
> > tricky; I think it's just the historical baggage of the PCI core
> > design that makes it hard.
> > 
> > Even in the ACPI hotplug path, I think we use this pci_scan_device()
> > path:
> > 
> >   pci_scan_device
> >     pci_setup_device
> >       case PCI_HEADER_TYPE_NORMAL:
> >         pci_read_bases(6)             # normal PCI BARs
> >       case PCI_HEADER_TYPE_BRIDGE:
> >         pci_read_bases(2)             # bridge BARs (not windows)
> > 
> > Unfortunately that path doesn't call pci_read_bridge_bases() to read
> > the bridge windows; that currently happens in pcibios_fixup_bus(),
> > which is only called from pci_scan_child_bus_extend().
> > 
> > This is a broken design because reading the bridge apertures is not at
> > all platform-specific, so it shouldn't be done in a pcibios hook.
> > And, more to the issue at hand, it shouldn't be done in
> > pci_scan_child_bus() either.  We might have to *update* the windows
> > when scanning child buses, but we should be able to do the work of
> > finding out what windows are implemented and their properties
> > somewhere in the pci_setup_device() path.
> > 
> > > Specifically 
> > > 
> > > pci_read_bridge_bases does not
> > > seem to be called on ACPI hotplug path.
> > > 
> > > Rather,
> > > 
> > > pci_assign_unassigned_root_bus_resources
> > > pci_assign_unassigned_bridge_resources
> > > 
> > > would be the two functions in question.
> > > 
> > > 
> > > Would above explanation be sufficient? If not, since I understand your
> > > reluctance to pile up hacks, would you be open to doing the suggested
> > > rewrite yourself? Me and xuyandong can help test it.
> > 
> > I'll be on vacation or holiday most of the time until the new year,
> > but I put a reminder on my calendar to look at this again then.  I'm
> > pretty sure we've tried to unravel this in the past, but I can't
> > remember what issues we tripped over.  Maybe we can make some progress
> > by restricting the problem we're trying to solve.
> > 
> > Thanks for bringing this up!  This is a wart in the PCI core that has
> > bothered me for a long time, and maybe this is the incentive we need
> > to make some progress on it.
> > 
> > Bjorn
> 
> 
> Sounds good, let me know when I can help with testing.

Ping.

I guess I'll just have put this patch in the RHEL kernel for now.
No point in keeping kernels crashing until we can
make it perfect ...

> > > > > Reported-by: xuyandong <xuyandong2@huawei.com>
> > > > > Tested-by: xuyandong <xuyandong2@huawei.com>
> > > > > Cc: stable@vger.kernel.org
> > > > > Cc: Yinghai Lu <yinghai@kernel.org>
> > > > > Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
> > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > ---
> > > > > 
> > > > > Please review and consider for stable.
> > > > > 
> > > > > changes from v1:
> > > > >   comment and commit log updates to address comments by Bjorn.
> > > > > 
> > > > >  drivers/pci/setup-bus.c | 10 ++++++++++
> > > > >  1 file changed, 10 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > > > > index ed960436df5e..d5c25d465d97 100644
> > > > > --- a/drivers/pci/setup-bus.c
> > > > > +++ b/drivers/pci/setup-bus.c
> > > > > @@ -741,6 +741,16 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
> > > > >   struct resource *b_res;
> > > > >  
> > > > >   b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
> > > > > +
> > > > > + /*
> > > > > +  * Don't re-check after this was called once already:
> > > > > +  * important since bridge might be in use.
> > > > > +  * Note: this is only reliable because as per spec all PCI to PCI
> > > > > +  * bridges support memory unconditionally so IORESOURCE_MEM is set.
> > > > > +  */
> > > > > + if (b_res[1].flags & IORESOURCE_MEM)
> > > > > +         return;
> > > > > +
> > > > >   b_res[1].flags |= IORESOURCE_MEM;
> > > > >  
> > > > >   pci_read_config_word(bridge, PCI_IO_BASE, &io);
> > > > > -- 
> > > > > MST

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] PCI: avoid bridge feature re-probing on hotplug
  2018-12-20 22:36       ` Michael S. Tsirkin
  2019-01-07 15:01         ` Michael S. Tsirkin
@ 2019-01-15  4:07         ` Michael S. Tsirkin
  2019-01-15 22:23           ` Bjorn Helgaas
  1 sibling, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2019-01-15  4:07 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, xuyandong, stable, Yinghai Lu, Jesse Barnes, linux-pci

On Thu, Dec 20, 2018 at 05:36:03PM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 20, 2018 at 04:31:58PM -0600, Bjorn Helgaas wrote:
> > On Thu, Dec 20, 2018 at 04:26:54PM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Dec 20, 2018 at 01:49:50PM -0600, Bjorn Helgaas wrote:
> > > > On Mon, Dec 17, 2018 at 07:45:41PM -0500, Michael S. Tsirkin wrote:
> > > > > commit 1f82de10d6b1 ("PCI/x86: don't assume prefetchable ranges are 64bit")
> > > > > added probing of bridge support for 64 bit memory each time bridge is
> > > > > re-enumerated.
> > > > > 
> > > > > Unfortunately this probing is destructive if any device behind
> > > > > the bridge is in use at this time.
> > > > > 
> > > > > This was observed in the field, see
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> > > > > and specifically
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
> > > > > 
> > > > > There's no real need to re-probe the bridge features as the
> > > > > registers in question never change - detect that using
> > > > > the memory flag being set (it's always set on the 1st pass since
> > > > > all PCI2PCI bridges support memory forwarding) and skip the probing.
> > > > > Thus, only the first call will perform the disruptive probing and sets
> > > > > the resource flags as required - which we can be reasonably sure happens
> > > > > before any devices have been configured.
> > > > > Avoiding repeated calls to pci_bridge_check_ranges might be even nicer.
> > > > > Unfortunately I couldn't come up with a clean way to do it without a
> > > > > major probing code refactoring.
> > > > 
> > > > I'm OK with major probe code refactoring as long as it's done
> > > > carefully.  Doing a special-case fix like this solves the immediate
> > > > problem but adds to the long-term maintenance problem.
> > > > 
> > > > As far as I can tell, everything in pci_bridge_check_ranges() should
> > > > be done once at enumeration-time, e.g., in the pci_read_bridge_bases()
> > > > path, and pci_bridge_check_ranges() itself should be removed.
> > > > 
> > > > If that turns out to be impossible for some reason, we need a comment
> > > > explaining why.
> > > 
> > > Maybe possible but I am not sure how.
> > > 
> > > Here's why:
> > > 
> > > Upon hotplug we want to poke at new bridges if any, but not the old
> > > ones.  The issue is that e.g. with ACPI hotplug the event that Linux
> > > knows how to handle is by design a heavy weight bus rescan.
> > 
> > Yeah, it's tricky.  But I don't think it's PCI or ACPI that makes this
> > tricky; I think it's just the historical baggage of the PCI core
> > design that makes it hard.
> > 
> > Even in the ACPI hotplug path, I think we use this pci_scan_device()
> > path:
> > 
> >   pci_scan_device
> >     pci_setup_device
> >       case PCI_HEADER_TYPE_NORMAL:
> >         pci_read_bases(6)             # normal PCI BARs
> >       case PCI_HEADER_TYPE_BRIDGE:
> >         pci_read_bases(2)             # bridge BARs (not windows)
> > 
> > Unfortunately that path doesn't call pci_read_bridge_bases() to read
> > the bridge windows; that currently happens in pcibios_fixup_bus(),
> > which is only called from pci_scan_child_bus_extend().
> > 
> > This is a broken design because reading the bridge apertures is not at
> > all platform-specific, so it shouldn't be done in a pcibios hook.
> > And, more to the issue at hand, it shouldn't be done in
> > pci_scan_child_bus() either.  We might have to *update* the windows
> > when scanning child buses, but we should be able to do the work of
> > finding out what windows are implemented and their properties
> > somewhere in the pci_setup_device() path.
> > 
> > > Specifically 
> > > 
> > > pci_read_bridge_bases does not
> > > seem to be called on ACPI hotplug path.
> > > 
> > > Rather,
> > > 
> > > pci_assign_unassigned_root_bus_resources
> > > pci_assign_unassigned_bridge_resources
> > > 
> > > would be the two functions in question.
> > > 
> > > 
> > > Would above explanation be sufficient? If not, since I understand your
> > > reluctance to pile up hacks, would you be open to doing the suggested
> > > rewrite yourself? Me and xuyandong can help test it.
> > 
> > I'll be on vacation or holiday most of the time until the new year,
> > but I put a reminder on my calendar to look at this again then.  I'm
> > pretty sure we've tried to unravel this in the past, but I can't
> > remember what issues we tripped over.  Maybe we can make some progress
> > by restricting the problem we're trying to solve.
> > 
> > Thanks for bringing this up!  This is a wart in the PCI core that has
> > bothered me for a long time, and maybe this is the incentive we need
> > to make some progress on it.
> > 
> > Bjorn
> 
> 
> Sounds good, let me know when I can help with testing.

FWIW this patch has been in -next through my tree for a while now with
no ill effects. And the bug it fixes is real. So ... nudge nudge ...

Would you consider merging if I added a bg fat FIXME on top
saying the fact that we re-probe multiple times is
a sign that probing needs to be cleaned up?

> 
> > > > > Reported-by: xuyandong <xuyandong2@huawei.com>
> > > > > Tested-by: xuyandong <xuyandong2@huawei.com>
> > > > > Cc: stable@vger.kernel.org
> > > > > Cc: Yinghai Lu <yinghai@kernel.org>
> > > > > Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
> > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > ---
> > > > > 
> > > > > Please review and consider for stable.
> > > > > 
> > > > > changes from v1:
> > > > >   comment and commit log updates to address comments by Bjorn.
> > > > > 
> > > > >  drivers/pci/setup-bus.c | 10 ++++++++++
> > > > >  1 file changed, 10 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > > > > index ed960436df5e..d5c25d465d97 100644
> > > > > --- a/drivers/pci/setup-bus.c
> > > > > +++ b/drivers/pci/setup-bus.c
> > > > > @@ -741,6 +741,16 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
> > > > >   struct resource *b_res;
> > > > >  
> > > > >   b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
> > > > > +
> > > > > + /*
> > > > > +  * Don't re-check after this was called once already:
> > > > > +  * important since bridge might be in use.
> > > > > +  * Note: this is only reliable because as per spec all PCI to PCI
> > > > > +  * bridges support memory unconditionally so IORESOURCE_MEM is set.
> > > > > +  */
> > > > > + if (b_res[1].flags & IORESOURCE_MEM)
> > > > > +         return;
> > > > > +
> > > > >   b_res[1].flags |= IORESOURCE_MEM;
> > > > >  
> > > > >   pci_read_config_word(bridge, PCI_IO_BASE, &io);
> > > > > -- 
> > > > > MST

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] PCI: avoid bridge feature re-probing on hotplug
  2019-01-15  4:07         ` Michael S. Tsirkin
@ 2019-01-15 22:23           ` Bjorn Helgaas
  0 siblings, 0 replies; 13+ messages in thread
From: Bjorn Helgaas @ 2019-01-15 22:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, xuyandong, Yinghai Lu, Jesse Barnes, linux-pci

[-cc stable]

On Mon, Jan 14, 2019 at 11:07:27PM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 20, 2018 at 05:36:03PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Dec 20, 2018 at 04:31:58PM -0600, Bjorn Helgaas wrote:
> > > On Thu, Dec 20, 2018 at 04:26:54PM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 20, 2018 at 01:49:50PM -0600, Bjorn Helgaas wrote:
> > > > > On Mon, Dec 17, 2018 at 07:45:41PM -0500, Michael S. Tsirkin wrote:
> > > > > > commit 1f82de10d6b1 ("PCI/x86: don't assume prefetchable ranges are 64bit")
> > > > > > added probing of bridge support for 64 bit memory each time bridge is
> > > > > > re-enumerated.
> > > > > > 
> > > > > > Unfortunately this probing is destructive if any device behind
> > > > > > the bridge is in use at this time.
> > > > > > 
> > > > > > This was observed in the field, see
> > > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> > > > > > and specifically
> > > > > > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
> > > > > > 
> > > > > > There's no real need to re-probe the bridge features as the
> > > > > > registers in question never change - detect that using
> > > > > > the memory flag being set (it's always set on the 1st pass since
> > > > > > all PCI2PCI bridges support memory forwarding) and skip the probing.
> > > > > > Thus, only the first call will perform the disruptive probing and sets
> > > > > > the resource flags as required - which we can be reasonably sure happens
> > > > > > before any devices have been configured.
> > > > > > Avoiding repeated calls to pci_bridge_check_ranges might be even nicer.
> > > > > > Unfortunately I couldn't come up with a clean way to do it without a
> > > > > > major probing code refactoring.
> > > > > 
> > > > > I'm OK with major probe code refactoring as long as it's done
> > > > > carefully.  Doing a special-case fix like this solves the immediate
> > > > > problem but adds to the long-term maintenance problem.
> > > > > 
> > > > > As far as I can tell, everything in pci_bridge_check_ranges() should
> > > > > be done once at enumeration-time, e.g., in the pci_read_bridge_bases()
> > > > > path, and pci_bridge_check_ranges() itself should be removed.
> > > > > 
> > > > > If that turns out to be impossible for some reason, we need a comment
> > > > > explaining why.
> > > > 
> > > > Maybe possible but I am not sure how.
> > > > 
> > > > Here's why:
> > > > 
> > > > Upon hotplug we want to poke at new bridges if any, but not the old
> > > > ones.  The issue is that e.g. with ACPI hotplug the event that Linux
> > > > knows how to handle is by design a heavy weight bus rescan.
> > > 
> > > Yeah, it's tricky.  But I don't think it's PCI or ACPI that makes this
> > > tricky; I think it's just the historical baggage of the PCI core
> > > design that makes it hard.
> > > 
> > > Even in the ACPI hotplug path, I think we use this pci_scan_device()
> > > path:
> > > 
> > >   pci_scan_device
> > >     pci_setup_device
> > >       case PCI_HEADER_TYPE_NORMAL:
> > >         pci_read_bases(6)             # normal PCI BARs
> > >       case PCI_HEADER_TYPE_BRIDGE:
> > >         pci_read_bases(2)             # bridge BARs (not windows)
> > > 
> > > Unfortunately that path doesn't call pci_read_bridge_bases() to read
> > > the bridge windows; that currently happens in pcibios_fixup_bus(),
> > > which is only called from pci_scan_child_bus_extend().
> > > 
> > > This is a broken design because reading the bridge apertures is not at
> > > all platform-specific, so it shouldn't be done in a pcibios hook.
> > > And, more to the issue at hand, it shouldn't be done in
> > > pci_scan_child_bus() either.  We might have to *update* the windows
> > > when scanning child buses, but we should be able to do the work of
> > > finding out what windows are implemented and their properties
> > > somewhere in the pci_setup_device() path.
> > > 
> > > > Specifically 
> > > > 
> > > > pci_read_bridge_bases does not
> > > > seem to be called on ACPI hotplug path.
> > > > 
> > > > Rather,
> > > > 
> > > > pci_assign_unassigned_root_bus_resources
> > > > pci_assign_unassigned_bridge_resources
> > > > 
> > > > would be the two functions in question.
> > > > 
> > > > 
> > > > Would above explanation be sufficient? If not, since I understand your
> > > > reluctance to pile up hacks, would you be open to doing the suggested
> > > > rewrite yourself? Me and xuyandong can help test it.
> > > 
> > > I'll be on vacation or holiday most of the time until the new year,
> > > but I put a reminder on my calendar to look at this again then.  I'm
> > > pretty sure we've tried to unravel this in the past, but I can't
> > > remember what issues we tripped over.  Maybe we can make some progress
> > > by restricting the problem we're trying to solve.
> > > 
> > > Thanks for bringing this up!  This is a wart in the PCI core that has
> > > bothered me for a long time, and maybe this is the incentive we need
> > > to make some progress on it.
> > 
> > Sounds good, let me know when I can help with testing.
> 
> FWIW this patch has been in -next through my tree for a while now with
> no ill effects. And the bug it fixes is real. So ... nudge nudge ...
> 
> Would you consider merging if I added a bg fat FIXME on top
> saying the fact that we re-probe multiple times is
> a sign that probing needs to be cleaned up?

I don't think we've really exhausted the possibilities for a cleaner
fix yet.  I have some ideas and would like to test them myself before
making a fool of myself in public.  Is there any chance one of you
could post a minimal way to reproduce the problem?  I fiddled with the
qemu stuff in
https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
but it's far from minimal and doesn't include the hot-add parts of the
recipe.  I'm pretty much qemu-illiterate, so any hints would be much
appreciated.

Bjorn

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3] PCI: avoid bridge feature re-probing on hotplug
  2018-12-18  0:45 [PATCH v2] PCI: avoid bridge feature re-probing on hotplug Michael S. Tsirkin
  2018-12-20 19:49 ` Bjorn Helgaas
@ 2019-01-19 20:12 ` Bjorn Helgaas
  2019-01-20 14:49   ` Michael S. Tsirkin
  2019-01-22  5:31   ` xuyandong
  1 sibling, 2 replies; 13+ messages in thread
From: Bjorn Helgaas @ 2019-01-19 20:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, xuyandong, Yinghai Lu, Jesse Barnes, linux-pci,
	Sagi Grimberg, Ofer Hayut, Roy Shterman, Keith Busch, Zhou Wang

I gave up trying to reproduce the problem and test this patch with qemu;
can you guys (Michael and Xu (sorry if I mangled your name)) give this a
try?

I cc'd a few other people who have noticed this issue in the past, so just
FYI for them.

Bjorn


commit dd21b922db366ba069291b6fef2a8ce6768756a2
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Sat Jan 19 11:35:04 2019 -0600

    PCI: Probe bridge window attributes once at enumeration-time
    
    pci_bridge_check_ranges() determines whether a bridge supports the optional
    I/O and prefetchable memory windows and sets the flag bits in the bridge
    resources.  This could be done once during enumeration except that the
    resource allocation code completely clears the flag bits, e.g., in the
    pci_assign_unassigned_bridge_resources() path.
    
    The problem was that in some cases pci_bridge_check_ranges() *changes* the
    window registers to determine whether they're writable, and this may break
    concurrent accesses to devices behind the bridge.
    
    Add a new pci_read_bridge_windows() to determine whether a bridge supports
    the optional windows, call it once during enumeration, remember the
    results, and change pci_bridge_check_ranges() to set the flag bits based on
    those remembered results.
    
    Link: https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-email-wangzhou1@hisilicon.com
    Link: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
    Reported-by: xuyandong <xuyandong2@huawei.com>
    Cc: Sagi Grimberg <sagi@grimberg.me>
    Cc: Ofer Hayut <ofer@lightbitslabs.com>
    Cc: Roy Shterman <roys@lightbitslabs.com>
    Cc: Keith Busch <keith.busch@intel.com>
    Cc: Zhou Wang <wangzhou1@hisilicon.com>

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 257b9f6f2ebb..2ef8b954c65a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -348,6 +348,57 @@ static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom)
 	}
 }
 
+static void pci_read_bridge_windows(struct pci_dev *bridge)
+{
+	u16 io;
+	u32 pmem, tmp;
+
+	pci_read_config_word(bridge, PCI_IO_BASE, &io);
+	if (!io) {
+		pci_write_config_word(bridge, PCI_IO_BASE, 0xe0f0);
+		pci_read_config_word(bridge, PCI_IO_BASE, &io);
+		pci_write_config_word(bridge, PCI_IO_BASE, 0x0);
+	}
+	if (io)
+		bridge->io_window = 1;
+
+	/*
+	 * DECchip 21050 pass 2 errata: the bridge may miss an address
+	 * disconnect boundary by one PCI data phase.  Workaround: do not
+	 * use prefetching on this device.
+	 */
+	if (bridge->vendor == PCI_VENDOR_ID_DEC && bridge->device == 0x0001)
+		return;
+
+	pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem);
+	if (!pmem) {
+		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
+					       0xffe0fff0);
+		pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem);
+		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, 0x0);
+	}
+	if (!pmem)
+		return;
+
+	bridge->pref_window = 1;
+
+	if ((pmem & PCI_PREF_RANGE_TYPE_MASK) == PCI_PREF_RANGE_TYPE_64) {
+
+		/*
+		 * Bridge claims to have a 64-bit prefetchable memory
+		 * window; verify that the upper bits are actually
+		 * writable.
+		 */
+		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &pmem);
+		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
+				       0xffffffff);
+		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp);
+		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, pmem);
+		if (tmp)
+			bridge->pref_64_window = 1;
+	}
+}
+
 static void pci_read_bridge_io(struct pci_bus *child)
 {
 	struct pci_dev *dev = child->self;
@@ -1739,6 +1790,7 @@ int pci_setup_device(struct pci_dev *dev)
 		pci_read_irq(dev);
 		dev->transparent = ((dev->class & 0xff) == 1);
 		pci_read_bases(dev, 2, PCI_ROM_ADDRESS1);
+		pci_read_bridge_windows(dev);
 		set_pcie_hotplug_bridge(dev);
 		pos = pci_find_capability(dev, PCI_CAP_ID_SSVID);
 		if (pos) {
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ed960436df5e..1941bb0a6c13 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -735,58 +735,21 @@ int pci_claim_bridge_resource(struct pci_dev *bridge, int i)
    base/limit registers must be read-only and read as 0. */
 static void pci_bridge_check_ranges(struct pci_bus *bus)
 {
-	u16 io;
-	u32 pmem;
 	struct pci_dev *bridge = bus->self;
-	struct resource *b_res;
+	struct resource *b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
 
-	b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
 	b_res[1].flags |= IORESOURCE_MEM;
 
-	pci_read_config_word(bridge, PCI_IO_BASE, &io);
-	if (!io) {
-		pci_write_config_word(bridge, PCI_IO_BASE, 0xe0f0);
-		pci_read_config_word(bridge, PCI_IO_BASE, &io);
-		pci_write_config_word(bridge, PCI_IO_BASE, 0x0);
-	}
-	if (io)
+	if (bridge->io_window)
 		b_res[0].flags |= IORESOURCE_IO;
 
-	/*  DECchip 21050 pass 2 errata: the bridge may miss an address
-	    disconnect boundary by one PCI data phase.
-	    Workaround: do not use prefetching on this device. */
-	if (bridge->vendor == PCI_VENDOR_ID_DEC && bridge->device == 0x0001)
-		return;
-
-	pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem);
-	if (!pmem) {
-		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
-					       0xffe0fff0);
-		pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem);
-		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, 0x0);
-	}
-	if (pmem) {
+	if (bridge->pref_window) {
 		b_res[2].flags |= IORESOURCE_MEM | IORESOURCE_PREFETCH;
-		if ((pmem & PCI_PREF_RANGE_TYPE_MASK) ==
-		    PCI_PREF_RANGE_TYPE_64) {
+		if (bridge->pref_64_window) {
 			b_res[2].flags |= IORESOURCE_MEM_64;
 			b_res[2].flags |= PCI_PREF_RANGE_TYPE_64;
 		}
 	}
-
-	/* double check if bridge does support 64 bit pref */
-	if (b_res[2].flags & IORESOURCE_MEM_64) {
-		u32 mem_base_hi, tmp;
-		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
-					 &mem_base_hi);
-		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
-					       0xffffffff);
-		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp);
-		if (!tmp)
-			b_res[2].flags &= ~IORESOURCE_MEM_64;
-		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
-				       mem_base_hi);
-	}
 }
 
 /* Helper function for sizing routines: find first available
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 65f1d8c2f082..40b327b814aa 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -373,6 +373,9 @@ struct pci_dev {
 	bool		match_driver;		/* Skip attaching driver */
 
 	unsigned int	transparent:1;		/* Subtractive decode bridge */
+	unsigned int	io_window:1;		/* Bridge has I/O window */
+	unsigned int	pref_window:1;		/* Bridge has pref mem window */
+	unsigned int	pref_64_window:1;	/* Pref mem window is 64-bit */
 	unsigned int	multifunction:1;	/* Multi-function device */
 
 	unsigned int	is_busmaster:1;		/* Is busmaster */

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] PCI: avoid bridge feature re-probing on hotplug
  2019-01-19 20:12 ` [PATCH v3] " Bjorn Helgaas
@ 2019-01-20 14:49   ` Michael S. Tsirkin
  2019-01-22  5:31   ` xuyandong
  1 sibling, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2019-01-20 14:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, xuyandong, Yinghai Lu, Jesse Barnes, linux-pci,
	Sagi Grimberg, Ofer Hayut, Roy Shterman, Keith Busch, Zhou Wang

Will do within a week, thanks a lot!

On Sat, Jan 19, 2019 at 02:12:59PM -0600, Bjorn Helgaas wrote:
> I gave up trying to reproduce the problem and test this patch with qemu;
> can you guys (Michael and Xu (sorry if I mangled your name)) give this a
> try?
> 
> I cc'd a few other people who have noticed this issue in the past, so just
> FYI for them.
> 
> Bjorn
> 
> 
> commit dd21b922db366ba069291b6fef2a8ce6768756a2
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Sat Jan 19 11:35:04 2019 -0600
> 
>     PCI: Probe bridge window attributes once at enumeration-time
>     
>     pci_bridge_check_ranges() determines whether a bridge supports the optional
>     I/O and prefetchable memory windows and sets the flag bits in the bridge
>     resources.  This could be done once during enumeration except that the
>     resource allocation code completely clears the flag bits, e.g., in the
>     pci_assign_unassigned_bridge_resources() path.
>     
>     The problem was that in some cases pci_bridge_check_ranges() *changes* the
>     window registers to determine whether they're writable, and this may break
>     concurrent accesses to devices behind the bridge.
>     
>     Add a new pci_read_bridge_windows() to determine whether a bridge supports
>     the optional windows, call it once during enumeration, remember the
>     results, and change pci_bridge_check_ranges() to set the flag bits based on
>     those remembered results.
>     
>     Link: https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-email-wangzhou1@hisilicon.com
>     Link: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
>     Reported-by: xuyandong <xuyandong2@huawei.com>
>     Cc: Sagi Grimberg <sagi@grimberg.me>
>     Cc: Ofer Hayut <ofer@lightbitslabs.com>
>     Cc: Roy Shterman <roys@lightbitslabs.com>
>     Cc: Keith Busch <keith.busch@intel.com>
>     Cc: Zhou Wang <wangzhou1@hisilicon.com>
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 257b9f6f2ebb..2ef8b954c65a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -348,6 +348,57 @@ static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom)
>  	}
>  }
>  
> +static void pci_read_bridge_windows(struct pci_dev *bridge)
> +{
> +	u16 io;
> +	u32 pmem, tmp;
> +
> +	pci_read_config_word(bridge, PCI_IO_BASE, &io);
> +	if (!io) {
> +		pci_write_config_word(bridge, PCI_IO_BASE, 0xe0f0);
> +		pci_read_config_word(bridge, PCI_IO_BASE, &io);
> +		pci_write_config_word(bridge, PCI_IO_BASE, 0x0);
> +	}
> +	if (io)
> +		bridge->io_window = 1;
> +
> +	/*
> +	 * DECchip 21050 pass 2 errata: the bridge may miss an address
> +	 * disconnect boundary by one PCI data phase.  Workaround: do not
> +	 * use prefetching on this device.
> +	 */
> +	if (bridge->vendor == PCI_VENDOR_ID_DEC && bridge->device == 0x0001)
> +		return;
> +
> +	pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem);
> +	if (!pmem) {
> +		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> +					       0xffe0fff0);
> +		pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem);
> +		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, 0x0);
> +	}
> +	if (!pmem)
> +		return;
> +
> +	bridge->pref_window = 1;
> +
> +	if ((pmem & PCI_PREF_RANGE_TYPE_MASK) == PCI_PREF_RANGE_TYPE_64) {
> +
> +		/*
> +		 * Bridge claims to have a 64-bit prefetchable memory
> +		 * window; verify that the upper bits are actually
> +		 * writable.
> +		 */
> +		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &pmem);
> +		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> +				       0xffffffff);
> +		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp);
> +		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, pmem);
> +		if (tmp)
> +			bridge->pref_64_window = 1;
> +	}
> +}
> +
>  static void pci_read_bridge_io(struct pci_bus *child)
>  {
>  	struct pci_dev *dev = child->self;
> @@ -1739,6 +1790,7 @@ int pci_setup_device(struct pci_dev *dev)
>  		pci_read_irq(dev);
>  		dev->transparent = ((dev->class & 0xff) == 1);
>  		pci_read_bases(dev, 2, PCI_ROM_ADDRESS1);
> +		pci_read_bridge_windows(dev);
>  		set_pcie_hotplug_bridge(dev);
>  		pos = pci_find_capability(dev, PCI_CAP_ID_SSVID);
>  		if (pos) {
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index ed960436df5e..1941bb0a6c13 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -735,58 +735,21 @@ int pci_claim_bridge_resource(struct pci_dev *bridge, int i)
>     base/limit registers must be read-only and read as 0. */
>  static void pci_bridge_check_ranges(struct pci_bus *bus)
>  {
> -	u16 io;
> -	u32 pmem;
>  	struct pci_dev *bridge = bus->self;
> -	struct resource *b_res;
> +	struct resource *b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
>  
> -	b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
>  	b_res[1].flags |= IORESOURCE_MEM;
>  
> -	pci_read_config_word(bridge, PCI_IO_BASE, &io);
> -	if (!io) {
> -		pci_write_config_word(bridge, PCI_IO_BASE, 0xe0f0);
> -		pci_read_config_word(bridge, PCI_IO_BASE, &io);
> -		pci_write_config_word(bridge, PCI_IO_BASE, 0x0);
> -	}
> -	if (io)
> +	if (bridge->io_window)
>  		b_res[0].flags |= IORESOURCE_IO;
>  
> -	/*  DECchip 21050 pass 2 errata: the bridge may miss an address
> -	    disconnect boundary by one PCI data phase.
> -	    Workaround: do not use prefetching on this device. */
> -	if (bridge->vendor == PCI_VENDOR_ID_DEC && bridge->device == 0x0001)
> -		return;
> -
> -	pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem);
> -	if (!pmem) {
> -		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> -					       0xffe0fff0);
> -		pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem);
> -		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, 0x0);
> -	}
> -	if (pmem) {
> +	if (bridge->pref_window) {
>  		b_res[2].flags |= IORESOURCE_MEM | IORESOURCE_PREFETCH;
> -		if ((pmem & PCI_PREF_RANGE_TYPE_MASK) ==
> -		    PCI_PREF_RANGE_TYPE_64) {
> +		if (bridge->pref_64_window) {
>  			b_res[2].flags |= IORESOURCE_MEM_64;
>  			b_res[2].flags |= PCI_PREF_RANGE_TYPE_64;
>  		}
>  	}
> -
> -	/* double check if bridge does support 64 bit pref */
> -	if (b_res[2].flags & IORESOURCE_MEM_64) {
> -		u32 mem_base_hi, tmp;
> -		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> -					 &mem_base_hi);
> -		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> -					       0xffffffff);
> -		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp);
> -		if (!tmp)
> -			b_res[2].flags &= ~IORESOURCE_MEM_64;
> -		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> -				       mem_base_hi);
> -	}
>  }
>  
>  /* Helper function for sizing routines: find first available
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 65f1d8c2f082..40b327b814aa 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -373,6 +373,9 @@ struct pci_dev {
>  	bool		match_driver;		/* Skip attaching driver */
>  
>  	unsigned int	transparent:1;		/* Subtractive decode bridge */
> +	unsigned int	io_window:1;		/* Bridge has I/O window */
> +	unsigned int	pref_window:1;		/* Bridge has pref mem window */
> +	unsigned int	pref_64_window:1;	/* Pref mem window is 64-bit */
>  	unsigned int	multifunction:1;	/* Multi-function device */
>  
>  	unsigned int	is_busmaster:1;		/* Is busmaster */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v3] PCI: avoid bridge feature re-probing on hotplug
  2019-01-19 20:12 ` [PATCH v3] " Bjorn Helgaas
  2019-01-20 14:49   ` Michael S. Tsirkin
@ 2019-01-22  5:31   ` xuyandong
  2019-01-22 13:53     ` Michael S. Tsirkin
  2019-01-22 18:58     ` Bjorn Helgaas
  1 sibling, 2 replies; 13+ messages in thread
From: xuyandong @ 2019-01-22  5:31 UTC (permalink / raw)
  To: Bjorn Helgaas, Michael S. Tsirkin
  Cc: linux-kernel, Yinghai Lu, Jesse Barnes, linux-pci, Sagi Grimberg,
	Ofer Hayut, Roy Shterman, Keith Busch, Wangzhou (B)

Hi Bjorn and Michael

After trying to reproduce the problem for a whole day, the bug did not show up
any more. So I think the new patch does solve this problem.

> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> Sent: Sunday, January 20, 2019 4:13 AM
> To: Michael S. Tsirkin <mst@redhat.com>
> Cc: linux-kernel@vger.kernel.org; xuyandong <xuyandong2@huawei.com>;
> Yinghai Lu <yinghai@kernel.org>; Jesse Barnes <jbarnes@virtuousgeek.org>;
> linux-pci@vger.kernel.org; Sagi Grimberg <sagi@grimberg.me>; Ofer Hayut
> <ofer@lightbitslabs.com>; Roy Shterman <roys@lightbitslabs.com>; Keith
> Busch <keith.busch@intel.com>; Wangzhou (B) <wangzhou1@hisilicon.com>
> Subject: [PATCH v3] PCI: avoid bridge feature re-probing on hotplug
> 
> I gave up trying to reproduce the problem and test this patch with qemu; can
> you guys (Michael and Xu (sorry if I mangled your name)) give this a try?
> 
> I cc'd a few other people who have noticed this issue in the past, so just FYI for
> them.
> 
> Bjorn
> 
> 
> commit dd21b922db366ba069291b6fef2a8ce6768756a2
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Sat Jan 19 11:35:04 2019 -0600
> 
>     PCI: Probe bridge window attributes once at enumeration-time
> 
>     pci_bridge_check_ranges() determines whether a bridge supports the
> optional
>     I/O and prefetchable memory windows and sets the flag bits in the bridge
>     resources.  This could be done once during enumeration except that the
>     resource allocation code completely clears the flag bits, e.g., in the
>     pci_assign_unassigned_bridge_resources() path.
> 
>     The problem was that in some cases pci_bridge_check_ranges() *changes*
> the
>     window registers to determine whether they're writable, and this may break
>     concurrent accesses to devices behind the bridge.
> 
>     Add a new pci_read_bridge_windows() to determine whether a bridge
> supports
>     the optional windows, call it once during enumeration, remember the
>     results, and change pci_bridge_check_ranges() to set the flag bits based on
>     those remembered results.
> 
>     Link: https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-email-
> wangzhou1@hisilicon.com
>     Link: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
>     Reported-by: xuyandong <xuyandong2@huawei.com>
>     Cc: Sagi Grimberg <sagi@grimberg.me>
>     Cc: Ofer Hayut <ofer@lightbitslabs.com>
>     Cc: Roy Shterman <roys@lightbitslabs.com>
>     Cc: Keith Busch <keith.busch@intel.com>
>     Cc: Zhou Wang <wangzhou1@hisilicon.com>
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index
> 257b9f6f2ebb..2ef8b954c65a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -348,6 +348,57 @@ static void pci_read_bases(struct pci_dev *dev,
> unsigned int howmany, int rom)
>  	}
>  }
> 
> +static void pci_read_bridge_windows(struct pci_dev *bridge) {
> +	u16 io;
> +	u32 pmem, tmp;
> +
> +	pci_read_config_word(bridge, PCI_IO_BASE, &io);
> +	if (!io) {
> +		pci_write_config_word(bridge, PCI_IO_BASE, 0xe0f0);
> +		pci_read_config_word(bridge, PCI_IO_BASE, &io);
> +		pci_write_config_word(bridge, PCI_IO_BASE, 0x0);
> +	}
> +	if (io)
> +		bridge->io_window = 1;
> +
> +	/*
> +	 * DECchip 21050 pass 2 errata: the bridge may miss an address
> +	 * disconnect boundary by one PCI data phase.  Workaround: do not
> +	 * use prefetching on this device.
> +	 */
> +	if (bridge->vendor == PCI_VENDOR_ID_DEC && bridge->device ==
> 0x0001)
> +		return;
> +
> +	pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem);
> +	if (!pmem) {
> +		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> +					       0xffe0fff0);
> +		pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> &pmem);
> +		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> 0x0);
> +	}
> +	if (!pmem)
> +		return;
> +
> +	bridge->pref_window = 1;
> +
> +	if ((pmem & PCI_PREF_RANGE_TYPE_MASK) ==
> PCI_PREF_RANGE_TYPE_64) {
> +
> +		/*
> +		 * Bridge claims to have a 64-bit prefetchable memory
> +		 * window; verify that the upper bits are actually
> +		 * writable.
> +		 */
> +		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> &pmem);
> +		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> +				       0xffffffff);
> +		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> &tmp);
> +		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> pmem);
> +		if (tmp)
> +			bridge->pref_64_window = 1;
> +	}
> +}
> +
>  static void pci_read_bridge_io(struct pci_bus *child)  {
>  	struct pci_dev *dev = child->self;
> @@ -1739,6 +1790,7 @@ int pci_setup_device(struct pci_dev *dev)
>  		pci_read_irq(dev);
>  		dev->transparent = ((dev->class & 0xff) == 1);
>  		pci_read_bases(dev, 2, PCI_ROM_ADDRESS1);
> +		pci_read_bridge_windows(dev);
>  		set_pcie_hotplug_bridge(dev);
>  		pos = pci_find_capability(dev, PCI_CAP_ID_SSVID);
>  		if (pos) {
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index
> ed960436df5e..1941bb0a6c13 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -735,58 +735,21 @@ int pci_claim_bridge_resource(struct pci_dev
> *bridge, int i)
>     base/limit registers must be read-only and read as 0. */  static void
> pci_bridge_check_ranges(struct pci_bus *bus)  {
> -	u16 io;
> -	u32 pmem;
>  	struct pci_dev *bridge = bus->self;
> -	struct resource *b_res;
> +	struct resource *b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
> 
> -	b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
>  	b_res[1].flags |= IORESOURCE_MEM;
> 
> -	pci_read_config_word(bridge, PCI_IO_BASE, &io);
> -	if (!io) {
> -		pci_write_config_word(bridge, PCI_IO_BASE, 0xe0f0);
> -		pci_read_config_word(bridge, PCI_IO_BASE, &io);
> -		pci_write_config_word(bridge, PCI_IO_BASE, 0x0);
> -	}
> -	if (io)
> +	if (bridge->io_window)
>  		b_res[0].flags |= IORESOURCE_IO;
> 
> -	/*  DECchip 21050 pass 2 errata: the bridge may miss an address
> -	    disconnect boundary by one PCI data phase.
> -	    Workaround: do not use prefetching on this device. */
> -	if (bridge->vendor == PCI_VENDOR_ID_DEC && bridge->device ==
> 0x0001)
> -		return;
> -
> -	pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem);
> -	if (!pmem) {
> -		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> -					       0xffe0fff0);
> -		pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> &pmem);
> -		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> 0x0);
> -	}
> -	if (pmem) {
> +	if (bridge->pref_window) {
>  		b_res[2].flags |= IORESOURCE_MEM |
> IORESOURCE_PREFETCH;
> -		if ((pmem & PCI_PREF_RANGE_TYPE_MASK) ==
> -		    PCI_PREF_RANGE_TYPE_64) {
> +		if (bridge->pref_64_window) {
>  			b_res[2].flags |= IORESOURCE_MEM_64;
>  			b_res[2].flags |= PCI_PREF_RANGE_TYPE_64;
>  		}
>  	}
> -
> -	/* double check if bridge does support 64 bit pref */
> -	if (b_res[2].flags & IORESOURCE_MEM_64) {
> -		u32 mem_base_hi, tmp;
> -		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> -					 &mem_base_hi);
> -		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> -					       0xffffffff);
> -		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> &tmp);
> -		if (!tmp)
> -			b_res[2].flags &= ~IORESOURCE_MEM_64;
> -		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> -				       mem_base_hi);
> -	}
>  }
> 
>  /* Helper function for sizing routines: find first available diff --git
> a/include/linux/pci.h b/include/linux/pci.h index 65f1d8c2f082..40b327b814aa
> 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -373,6 +373,9 @@ struct pci_dev {
>  	bool		match_driver;		/* Skip attaching driver */
> 
>  	unsigned int	transparent:1;		/* Subtractive decode bridge
> */
> +	unsigned int	io_window:1;		/* Bridge has I/O window */
> +	unsigned int	pref_window:1;		/* Bridge has pref mem
> window */
> +	unsigned int	pref_64_window:1;	/* Pref mem window is 64-bit
> */
>  	unsigned int	multifunction:1;	/* Multi-function device */
> 
>  	unsigned int	is_busmaster:1;		/* Is busmaster */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] PCI: avoid bridge feature re-probing on hotplug
  2019-01-22  5:31   ` xuyandong
@ 2019-01-22 13:53     ` Michael S. Tsirkin
  2019-01-22 18:58     ` Bjorn Helgaas
  1 sibling, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2019-01-22 13:53 UTC (permalink / raw)
  To: xuyandong
  Cc: Bjorn Helgaas, linux-kernel, Yinghai Lu, Jesse Barnes, linux-pci,
	Sagi Grimberg, Ofer Hayut, Roy Shterman, Keith Busch,
	Wangzhou (B)

Thanks a lot for the testing!
I'll play with it today hopefully.

On Tue, Jan 22, 2019 at 05:31:10AM +0000, xuyandong wrote:
> Hi Bjorn and Michael
> 
> After trying to reproduce the problem for a whole day, the bug did not show up
> any more. So I think the new patch does solve this problem.
> 
> > -----Original Message-----
> > From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> > Sent: Sunday, January 20, 2019 4:13 AM
> > To: Michael S. Tsirkin <mst@redhat.com>
> > Cc: linux-kernel@vger.kernel.org; xuyandong <xuyandong2@huawei.com>;
> > Yinghai Lu <yinghai@kernel.org>; Jesse Barnes <jbarnes@virtuousgeek.org>;
> > linux-pci@vger.kernel.org; Sagi Grimberg <sagi@grimberg.me>; Ofer Hayut
> > <ofer@lightbitslabs.com>; Roy Shterman <roys@lightbitslabs.com>; Keith
> > Busch <keith.busch@intel.com>; Wangzhou (B) <wangzhou1@hisilicon.com>
> > Subject: [PATCH v3] PCI: avoid bridge feature re-probing on hotplug
> > 
> > I gave up trying to reproduce the problem and test this patch with qemu; can
> > you guys (Michael and Xu (sorry if I mangled your name)) give this a try?
> > 
> > I cc'd a few other people who have noticed this issue in the past, so just FYI for
> > them.
> > 
> > Bjorn
> > 
> > 
> > commit dd21b922db366ba069291b6fef2a8ce6768756a2
> > Author: Bjorn Helgaas <bhelgaas@google.com>
> > Date:   Sat Jan 19 11:35:04 2019 -0600
> > 
> >     PCI: Probe bridge window attributes once at enumeration-time
> > 
> >     pci_bridge_check_ranges() determines whether a bridge supports the
> > optional
> >     I/O and prefetchable memory windows and sets the flag bits in the bridge
> >     resources.  This could be done once during enumeration except that the
> >     resource allocation code completely clears the flag bits, e.g., in the
> >     pci_assign_unassigned_bridge_resources() path.
> > 
> >     The problem was that in some cases pci_bridge_check_ranges() *changes*
> > the
> >     window registers to determine whether they're writable, and this may break
> >     concurrent accesses to devices behind the bridge.
> > 
> >     Add a new pci_read_bridge_windows() to determine whether a bridge
> > supports
> >     the optional windows, call it once during enumeration, remember the
> >     results, and change pci_bridge_check_ranges() to set the flag bits based on
> >     those remembered results.
> > 
> >     Link: https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-email-
> > wangzhou1@hisilicon.com
> >     Link: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
> >     Reported-by: xuyandong <xuyandong2@huawei.com>
> >     Cc: Sagi Grimberg <sagi@grimberg.me>
> >     Cc: Ofer Hayut <ofer@lightbitslabs.com>
> >     Cc: Roy Shterman <roys@lightbitslabs.com>
> >     Cc: Keith Busch <keith.busch@intel.com>
> >     Cc: Zhou Wang <wangzhou1@hisilicon.com>
> > 
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index
> > 257b9f6f2ebb..2ef8b954c65a 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -348,6 +348,57 @@ static void pci_read_bases(struct pci_dev *dev,
> > unsigned int howmany, int rom)
> >  	}
> >  }
> > 
> > +static void pci_read_bridge_windows(struct pci_dev *bridge) {
> > +	u16 io;
> > +	u32 pmem, tmp;
> > +
> > +	pci_read_config_word(bridge, PCI_IO_BASE, &io);
> > +	if (!io) {
> > +		pci_write_config_word(bridge, PCI_IO_BASE, 0xe0f0);
> > +		pci_read_config_word(bridge, PCI_IO_BASE, &io);
> > +		pci_write_config_word(bridge, PCI_IO_BASE, 0x0);
> > +	}
> > +	if (io)
> > +		bridge->io_window = 1;
> > +
> > +	/*
> > +	 * DECchip 21050 pass 2 errata: the bridge may miss an address
> > +	 * disconnect boundary by one PCI data phase.  Workaround: do not
> > +	 * use prefetching on this device.
> > +	 */
> > +	if (bridge->vendor == PCI_VENDOR_ID_DEC && bridge->device ==
> > 0x0001)
> > +		return;
> > +
> > +	pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem);
> > +	if (!pmem) {
> > +		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> > +					       0xffe0fff0);
> > +		pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> > &pmem);
> > +		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> > 0x0);
> > +	}
> > +	if (!pmem)
> > +		return;
> > +
> > +	bridge->pref_window = 1;
> > +
> > +	if ((pmem & PCI_PREF_RANGE_TYPE_MASK) ==
> > PCI_PREF_RANGE_TYPE_64) {
> > +
> > +		/*
> > +		 * Bridge claims to have a 64-bit prefetchable memory
> > +		 * window; verify that the upper bits are actually
> > +		 * writable.
> > +		 */
> > +		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> > &pmem);
> > +		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> > +				       0xffffffff);
> > +		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> > &tmp);
> > +		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> > pmem);
> > +		if (tmp)
> > +			bridge->pref_64_window = 1;
> > +	}
> > +}
> > +
> >  static void pci_read_bridge_io(struct pci_bus *child)  {
> >  	struct pci_dev *dev = child->self;
> > @@ -1739,6 +1790,7 @@ int pci_setup_device(struct pci_dev *dev)
> >  		pci_read_irq(dev);
> >  		dev->transparent = ((dev->class & 0xff) == 1);
> >  		pci_read_bases(dev, 2, PCI_ROM_ADDRESS1);
> > +		pci_read_bridge_windows(dev);
> >  		set_pcie_hotplug_bridge(dev);
> >  		pos = pci_find_capability(dev, PCI_CAP_ID_SSVID);
> >  		if (pos) {
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index
> > ed960436df5e..1941bb0a6c13 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -735,58 +735,21 @@ int pci_claim_bridge_resource(struct pci_dev
> > *bridge, int i)
> >     base/limit registers must be read-only and read as 0. */  static void
> > pci_bridge_check_ranges(struct pci_bus *bus)  {
> > -	u16 io;
> > -	u32 pmem;
> >  	struct pci_dev *bridge = bus->self;
> > -	struct resource *b_res;
> > +	struct resource *b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
> > 
> > -	b_res = &bridge->resource[PCI_BRIDGE_RESOURCES];
> >  	b_res[1].flags |= IORESOURCE_MEM;
> > 
> > -	pci_read_config_word(bridge, PCI_IO_BASE, &io);
> > -	if (!io) {
> > -		pci_write_config_word(bridge, PCI_IO_BASE, 0xe0f0);
> > -		pci_read_config_word(bridge, PCI_IO_BASE, &io);
> > -		pci_write_config_word(bridge, PCI_IO_BASE, 0x0);
> > -	}
> > -	if (io)
> > +	if (bridge->io_window)
> >  		b_res[0].flags |= IORESOURCE_IO;
> > 
> > -	/*  DECchip 21050 pass 2 errata: the bridge may miss an address
> > -	    disconnect boundary by one PCI data phase.
> > -	    Workaround: do not use prefetching on this device. */
> > -	if (bridge->vendor == PCI_VENDOR_ID_DEC && bridge->device ==
> > 0x0001)
> > -		return;
> > -
> > -	pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE, &pmem);
> > -	if (!pmem) {
> > -		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> > -					       0xffe0fff0);
> > -		pci_read_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> > &pmem);
> > -		pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE,
> > 0x0);
> > -	}
> > -	if (pmem) {
> > +	if (bridge->pref_window) {
> >  		b_res[2].flags |= IORESOURCE_MEM |
> > IORESOURCE_PREFETCH;
> > -		if ((pmem & PCI_PREF_RANGE_TYPE_MASK) ==
> > -		    PCI_PREF_RANGE_TYPE_64) {
> > +		if (bridge->pref_64_window) {
> >  			b_res[2].flags |= IORESOURCE_MEM_64;
> >  			b_res[2].flags |= PCI_PREF_RANGE_TYPE_64;
> >  		}
> >  	}
> > -
> > -	/* double check if bridge does support 64 bit pref */
> > -	if (b_res[2].flags & IORESOURCE_MEM_64) {
> > -		u32 mem_base_hi, tmp;
> > -		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> > -					 &mem_base_hi);
> > -		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> > -					       0xffffffff);
> > -		pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> > &tmp);
> > -		if (!tmp)
> > -			b_res[2].flags &= ~IORESOURCE_MEM_64;
> > -		pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
> > -				       mem_base_hi);
> > -	}
> >  }
> > 
> >  /* Helper function for sizing routines: find first available diff --git
> > a/include/linux/pci.h b/include/linux/pci.h index 65f1d8c2f082..40b327b814aa
> > 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -373,6 +373,9 @@ struct pci_dev {
> >  	bool		match_driver;		/* Skip attaching driver */
> > 
> >  	unsigned int	transparent:1;		/* Subtractive decode bridge
> > */
> > +	unsigned int	io_window:1;		/* Bridge has I/O window */
> > +	unsigned int	pref_window:1;		/* Bridge has pref mem
> > window */
> > +	unsigned int	pref_64_window:1;	/* Pref mem window is 64-bit
> > */
> >  	unsigned int	multifunction:1;	/* Multi-function device */
> > 
> >  	unsigned int	is_busmaster:1;		/* Is busmaster */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] PCI: avoid bridge feature re-probing on hotplug
  2019-01-22  5:31   ` xuyandong
  2019-01-22 13:53     ` Michael S. Tsirkin
@ 2019-01-22 18:58     ` Bjorn Helgaas
  1 sibling, 0 replies; 13+ messages in thread
From: Bjorn Helgaas @ 2019-01-22 18:58 UTC (permalink / raw)
  To: xuyandong
  Cc: Michael S. Tsirkin, linux-kernel, Yinghai Lu, Jesse Barnes,
	linux-pci, Sagi Grimberg, Ofer Hayut, Roy Shterman, Keith Busch,
	Wangzhou (B)

On Tue, Jan 22, 2019 at 05:31:10AM +0000, xuyandong wrote:
> Hi Bjorn and Michael
> 
> After trying to reproduce the problem for a whole day, the bug did not show up
> any more. So I think the new patch does solve this problem.

Thank you very much for testing this!

I'd like to give you the appropriate credit in the changelog, but I don't
know exactly how your name should be spelled and capitalized.  Is the
following what you want?  If not, let me know and I'll correct it.

  Reported-by: xuyandong <xuyandong2@huawei.com>
  Tested-by: xuyandong <xuyandong2@huawei.com>

Bjorn

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-01-22 18:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-18  0:45 [PATCH v2] PCI: avoid bridge feature re-probing on hotplug Michael S. Tsirkin
2018-12-20 19:49 ` Bjorn Helgaas
2018-12-20 21:26   ` Michael S. Tsirkin
2018-12-20 22:31     ` Bjorn Helgaas
2018-12-20 22:36       ` Michael S. Tsirkin
2019-01-07 15:01         ` Michael S. Tsirkin
2019-01-15  4:07         ` Michael S. Tsirkin
2019-01-15 22:23           ` Bjorn Helgaas
2019-01-19 20:12 ` [PATCH v3] " Bjorn Helgaas
2019-01-20 14:49   ` Michael S. Tsirkin
2019-01-22  5:31   ` xuyandong
2019-01-22 13:53     ` Michael S. Tsirkin
2019-01-22 18:58     ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).