All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] PCI: PCI-E MPS bug fixes
@ 2011-09-08 21:41 Jon Mason
  2011-09-09 22:44 ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Jon Mason @ 2011-09-08 21:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jesse Barnes, linux-kernel, linux-pci

Please include these bug fixes to PCI in 3.1-rc6


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] PCI: PCI-E MPS bug fixes
  2011-09-08 21:41 [PATCH 0/2] PCI: PCI-E MPS bug fixes Jon Mason
@ 2011-09-09 22:44 ` Linus Torvalds
  2011-09-10  0:04   ` Jesse Barnes
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2011-09-09 22:44 UTC (permalink / raw)
  To: Jon Mason, Jesse Barnes; +Cc: linux-kernel, linux-pci

Jesse,
  what's the status of this? It does seem like our current PCIe
behavior is rather unsafe.

Should I apply the patches, should I wait for a pull request from you,
should I ignore them because of xyz?

                      Linus

On Thu, Sep 8, 2011 at 2:41 PM, Jon Mason <mason@myri.com> wrote:
> Please include these bug fixes to PCI in 3.1-rc6
>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] PCI: PCI-E MPS bug fixes
  2011-09-09 22:44 ` Linus Torvalds
@ 2011-09-10  0:04   ` Jesse Barnes
  2011-09-13 10:50     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Jesse Barnes @ 2011-09-10  0:04 UTC (permalink / raw)
  To: Linus Torvalds, Jon Mason; +Cc: linux-kernel, linux-pci

Linus Torvalds <torvalds@linux-foundation.org> wrote:

>Jesse,
>  what's the status of this? It does seem like our current PCIe
>behavior is rather unsafe.
>
>Should I apply the patches, should I wait for a pull request from you,
>should I ignore them because of xyz?
>
>                      Linus
>
>On Thu, Sep 8, 2011 at 2:41 PM, Jon Mason <mason@myri.com> wrote:
>> Please include these bug fixes to PCI in 3.1-rc6
>>
>>

You should apply them. I would queue them, and but I don't have another PCI tree anywhere for you to pull them. 

In fact, I'll be out for the next few weeks; I'll ask Bjorn to cover for me. 

Thanks,
Jesse
-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] PCI: PCI-E MPS bug fixes
  2011-09-10  0:04   ` Jesse Barnes
@ 2011-09-13 10:50     ` Benjamin Herrenschmidt
  2011-09-13 18:17       ` Benjamin Herrenschmidt
  2011-09-13 20:55       ` Jon Mason
  0 siblings, 2 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2011-09-13 10:50 UTC (permalink / raw)
  To: Jon Mason; +Cc: Linus Torvalds, linux-kernel, linux-pci, Jesse Barnes


> 
> You should apply them. I would queue them, and but I don't have
> another PCI tree anywhere for you to pull them. 
> 
> In fact, I'll be out for the next few weeks; I'll ask Bjorn to cover
> for me. 

Note that I might be missing some patches, but what's currently in Linus
tree is rather wrong.

The comment is right, but the implementation is bogus:

static void pcie_write_mps(struct pci_dev *dev, int mps)
{
	int rc, dev_mpss;

	dev_mpss = 128 << dev->pcie_mpss;

	if (pcie_bus_config == PCIE_BUS_PERFORMANCE) {
		if (dev->bus->self) {
			dev_dbg(&dev->bus->dev, "Bus MPSS %d\n",
				128 << dev->bus->self->pcie_mpss);

			/* For "MPS Force Max", the assumption is made that
			 * downstream communication will never be larger than
			 * the MRRS.  So, the MPS only needs to be configured
			 * for the upstream communication.  This being the case,
			 * walk from the top down and set the MPS of the child
			 * to that of the parent bus.
			 */
			mps = 128 << dev->bus->self->pcie_mpss;

So here, we decide to use as an mps, the mpss of the bridge. That's already not
quite right. The bridge might itself be clamped down (because it's parent has
a smaller MPSS, well, your other bugs below will prevent that but that's how
it should be)

So we should use the MPS of the bridge not the MPSS (ie the configured MPS of
the bridge, not the max MPS of the bridge).

That does mean that we do need to configure the bridge first before we configure
the bridge children, I don't remember if that's what happens with the walk function
so we need to dbl check (I have to run, no time to check right now)

			if (mps > dev_mpss)
				dev_warn(&dev->dev, "MPS configured higher than"
					 " maximum supported by the device.  If"
					 " a bus issue occurs, try running with"
					 " pci=pcie_bus_safe.\n");

Yeah right. Instead of warning, we should actually -clamp- it. IE If the
bridge can do 4096 bytes and the device can only do 128 bytes, it's crazy
to try to set the device to 4096 bytes.

The idea is that we set the device to the higher it can up to the size
supported by the brigde, not the exact size of the brigde regardless of
whether that's bigger than what the device can handle.

It -never- makes any sense to configure a device to an MPS larger than
it's own maxium.

		}

		dev->pcie_mpss = ffs(mps) - 8;

And here I'm baffled. Why on earth would we -ever- want to change dev->pcie_mpss ?

This is the the HARD maximum supported by the device, as read from the capability
register. Changing our idea of what the device supports as a maximum isn't going
to help us... all it will do is allow the subsequent:
  
	}

	rc = pcie_set_mps(dev, mps);
	if (rc)
		dev_err(&dev->dev, "Failed attempting to set the MPS\n");

To actually try to program the bogus mps value we just calculated (ie the device
larger than what it supports).

}

The right thing to do here for the "performance" case is to use the min() of
the bus mps and the dev_mpss basically.

Ie. Going top down:

 - If !parent, mps = mpss else mps = min(mpss, parent->mps)
 - Iterate children

That's it.

So of course, the current implementation -will- break things. It might even explain
some of the radeon problems you've been observing, ie, might have nothing to do with
MRRS...

As to whether we should default to "safe" or "performance", well, it really depends
on the platform.

I know for a fact on power that my host bridges are never going to generate a downstream
request with a TLP larger than 128 bytes, and we never do PCIe  <-> PCIe transfers, 
so "performance" is good for me.

x86 with funky DMA engines in the CPU etc... might be capable of generating larger
TLPs, in which case it must switch to "safe".

Note that this whole business is really a mis-design of PCIe. The 'performance' mode is
a way to try to work around it and not end up with everything clamped down to the minimum
MPS (128 bytes) all the time, but it's a bit risky. Ideally the spec should have separated
at least the MPS of packets generated vs. the MPS of packets received at the root
complex level.

Now, I am giving some courses/training today and may not have time to propose a patch,
but if you don't get to it by tomorrow I'll see what I can do.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] PCI: PCI-E MPS bug fixes
  2011-09-13 10:50     ` Benjamin Herrenschmidt
@ 2011-09-13 18:17       ` Benjamin Herrenschmidt
  2011-09-13 21:05         ` Jon Mason
  2011-09-13 20:55       ` Jon Mason
  1 sibling, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2011-09-13 18:17 UTC (permalink / raw)
  To: Jon Mason; +Cc: Linus Torvalds, linux-kernel, linux-pci, Jesse Barnes

On Tue, 2011-09-13 at 07:50 -0300, Benjamin Herrenschmidt wrote:
> > 
> > You should apply them. I would queue them, and but I don't have
> > another PCI tree anywhere for you to pull them. 
> > 
> > In fact, I'll be out for the next few weeks; I'll ask Bjorn to cover
> > for me. 
> 
> Note that I might be missing some patches, but what's currently in Linus
> tree is rather wrong.
> 
> The comment is right, but the implementation is bogus:

Oh and another one (again, I never saw the patches I'm responding to for
some odd reason, only email 0/2, so you might have already fixed that):

When using "safe" mode, I get a crash in pcie_find_smpss(). Patch below.

>From 1b542d690323fab332e61099193d598c5c515d13 Mon Sep 17 00:00:00 2001
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Tue, 13 Sep 2011 15:16:33 -0300
Subject: [PATCH] pci: Don't crash when reading mpss from root complex

In pcie_find_smpss(), we have the following statement:

 	if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) ||
	    dev->bus->self->pcie_type != PCI_EXP_TYPE_ROOT_PORT))

The problem is that at least on my machine, this gets called for the
root complex (virtual P2P bridge), and dev->bus->self is NULL since
the parent bus for this is not itself anchor to a PCI device.

This adds the necessary NULL check.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 drivers/pci/probe.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8473727..09d1c0f 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1351,7 +1351,8 @@ static int pcie_find_smpss(struct pci_dev *dev, void *data)
 	 * will occur as normal.
 	 */
 	if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) ||
-	    dev->bus->self->pcie_type != PCI_EXP_TYPE_ROOT_PORT))
+	     (dev->bus->self &&
+	      dev->bus->self->pcie_type != PCI_EXP_TYPE_ROOT_PORT)))
 		*smpss = 0;
 
 	if (*smpss > dev->pcie_mpss)
-- 
1.7.4.1







^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] PCI: PCI-E MPS bug fixes
  2011-09-13 10:50     ` Benjamin Herrenschmidt
  2011-09-13 18:17       ` Benjamin Herrenschmidt
@ 2011-09-13 20:55       ` Jon Mason
  1 sibling, 0 replies; 8+ messages in thread
From: Jon Mason @ 2011-09-13 20:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, linux-kernel, linux-pci, Jesse Barnes

On Tue, Sep 13, 2011 at 5:50 AM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
>>
>> You should apply them. I would queue them, and but I don't have
>> another PCI tree anywhere for you to pull them.
>>
>> In fact, I'll be out for the next few weeks; I'll ask Bjorn to cover
>> for me.
>
> Note that I might be missing some patches, but what's currently in Linus
> tree is rather wrong.

It is worth noting that with my last patch, this is not the default
behavior.  Whether it is faulty is open for debate.

>
> The comment is right, but the implementation is bogus:
>
> static void pcie_write_mps(struct pci_dev *dev, int mps)
> {
>        int rc, dev_mpss;
>
>        dev_mpss = 128 << dev->pcie_mpss;
>
>        if (pcie_bus_config == PCIE_BUS_PERFORMANCE) {
>                if (dev->bus->self) {
>                        dev_dbg(&dev->bus->dev, "Bus MPSS %d\n",
>                                128 << dev->bus->self->pcie_mpss);
>
>                        /* For "MPS Force Max", the assumption is made that
>                         * downstream communication will never be larger than
>                         * the MRRS.  So, the MPS only needs to be configured
>                         * for the upstream communication.  This being the case,
>                         * walk from the top down and set the MPS of the child
>                         * to that of the parent bus.
>                         */
>                        mps = 128 << dev->bus->self->pcie_mpss;
>
> So here, we decide to use as an mps, the mpss of the bridge. That's already not
> quite right. The bridge might itself be clamped down (because it's parent has
> a smaller MPSS, well, your other bugs below will prevent that but that's how
> it should be)

This is started from the root complex.  So, it should be able to
handle the maximum setting of the bridge chip.

>
> So we should use the MPS of the bridge not the MPSS (ie the configured MPS of
> the bridge, not the max MPS of the bridge).
>
> That does mean that we do need to configure the bridge first before we configure
> the bridge children, I don't remember if that's what happens with the walk function
> so we need to dbl check (I have to run, no time to check right now)

Yes, that is the behavior.

>
>                        if (mps > dev_mpss)
>                                dev_warn(&dev->dev, "MPS configured higher than"
>                                         " maximum supported by the device.  If"
>                                         " a bus issue occurs, try running with"
>                                         " pci=pcie_bus_safe.\n");
>
> Yeah right. Instead of warning, we should actually -clamp- it. IE If the
> bridge can do 4096 bytes and the device can only do 128 bytes, it's crazy
> to try to set the device to 4096 bytes.

You are correct.  I was looking at this code when the bugs started
coming in and realized this was erroneous.  A simple mps=min(mps,
dev_mpss) should fix it.

> The idea is that we set the device to the higher it can up to the size
> supported by the brigde, not the exact size of the brigde regardless of
> whether that's bigger than what the device can handle.
>
> It -never- makes any sense to configure a device to an MPS larger than
> it's own maxium.
>
>                }
>
>                dev->pcie_mpss = ffs(mps) - 8;
>
> And here I'm baffled. Why on earth would we -ever- want to change dev->pcie_mpss ?

I was being overly clever here and using it as a way to actual MPS of
the bridges.  A comment here would be helpful, but in hindsight I
think it is a bad idea to do it this way.

>
> This is the the HARD maximum supported by the device, as read from the capability
> register. Changing our idea of what the device supports as a maximum isn't going
> to help us... all it will do is allow the subsequent:
>
>        }
>
>        rc = pcie_set_mps(dev, mps);
>        if (rc)
>                dev_err(&dev->dev, "Failed attempting to set the MPS\n");
>
> To actually try to program the bogus mps value we just calculated (ie the device
> larger than what it supports).
>
> }
>
> The right thing to do here for the "performance" case is to use the min() of
> the bus mps and the dev_mpss basically.
>
> Ie. Going top down:
>
>  - If !parent, mps = mpss else mps = min(mpss, parent->mps)
>  - Iterate children
>
> That's it.
>
> So of course, the current implementation -will- break things. It might even explain
> some of the radeon problems you've been observing, ie, might have nothing to do with
> MRRS...

My patch before that removed MRRS setting from the "performance"
option solved a majority of the problems seen, but as you said in
response to that patch, it was buggy.

>
> As to whether we should default to "safe" or "performance", well, it really depends
> on the platform.
>
> I know for a fact on power that my host bridges are never going to generate a downstream
> request with a TLP larger than 128 bytes, and we never do PCIe  <-> PCIe transfers,
> so "performance" is good for me.
>
> x86 with funky DMA engines in the CPU etc... might be capable of generating larger
> TLPs, in which case it must switch to "safe".

Only is it is not observing the MRRS value.

>
> Note that this whole business is really a mis-design of PCIe. The 'performance' mode is
> a way to try to work around it and not end up with everything clamped down to the minimum
> MPS (128 bytes) all the time, but it's a bit risky. Ideally the spec should have separated
> at least the MPS of packets generated vs. the MPS of packets received at the root
> complex level.

And a fix for poorly written BIOS (the original cause of this patch).

>
> Now, I am giving some courses/training today and may not have time to propose a patch,
> but if you don't get to it by tomorrow I'll see what I can do.

I'll make a pass at this shortly and send it to linux-pci.  Thanks for
going over the code.

>
> Cheers,
> Ben.
>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] PCI: PCI-E MPS bug fixes
  2011-09-13 18:17       ` Benjamin Herrenschmidt
@ 2011-09-13 21:05         ` Jon Mason
  2011-09-13 21:56           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Jon Mason @ 2011-09-13 21:05 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, linux-kernel, linux-pci, Jesse Barnes

On Tue, Sep 13, 2011 at 1:17 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Tue, 2011-09-13 at 07:50 -0300, Benjamin Herrenschmidt wrote:
>> >
>> > You should apply them. I would queue them, and but I don't have
>> > another PCI tree anywhere for you to pull them.
>> >
>> > In fact, I'll be out for the next few weeks; I'll ask Bjorn to cover
>> > for me.
>>
>> Note that I might be missing some patches, but what's currently in Linus
>> tree is rather wrong.
>>
>> The comment is right, but the implementation is bogus:
>
> Oh and another one (again, I never saw the patches I'm responding to for
> some odd reason, only email 0/2, so you might have already fixed that):

DNS issue with kernel.org when the patches went out caused it to not
get out to any of the mailing lists.

In Linux' github tree
Patch 1/2 is commit 5307f6d5fb12fd01f9f321bc4a8fd77e74858647
Patch 2/2 is commit ed2888e906b56769b4ffabb9c577190438aa68b8

> When using "safe" mode, I get a crash in pcie_find_smpss(). Patch below.
>
> From 1b542d690323fab332e61099193d598c5c515d13 Mon Sep 17 00:00:00 2001
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Tue, 13 Sep 2011 15:16:33 -0300
> Subject: [PATCH] pci: Don't crash when reading mpss from root complex
>
> In pcie_find_smpss(), we have the following statement:
>
>        if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) ||
>            dev->bus->self->pcie_type != PCI_EXP_TYPE_ROOT_PORT))
>
> The problem is that at least on my machine, this gets called for the
> root complex (virtual P2P bridge), and dev->bus->self is NULL since
> the parent bus for this is not itself anchor to a PCI device.
>
> This adds the necessary NULL check.
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Good catch.

Acked-by: Jon Mason <mason@myri.com>

> ---
>  drivers/pci/probe.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 8473727..09d1c0f 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1351,7 +1351,8 @@ static int pcie_find_smpss(struct pci_dev *dev, void *data)
>         * will occur as normal.
>         */
>        if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) ||
> -           dev->bus->self->pcie_type != PCI_EXP_TYPE_ROOT_PORT))
> +            (dev->bus->self &&
> +             dev->bus->self->pcie_type != PCI_EXP_TYPE_ROOT_PORT)))
>                *smpss = 0;
>
>        if (*smpss > dev->pcie_mpss)
> --
> 1.7.4.1
>
>
>
>
>
>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] PCI: PCI-E MPS bug fixes
  2011-09-13 21:05         ` Jon Mason
@ 2011-09-13 21:56           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2011-09-13 21:56 UTC (permalink / raw)
  To: Jon Mason; +Cc: Linus Torvalds, linux-kernel, linux-pci, Jesse Barnes


> DNS issue with kernel.org when the patches went out caused it to not
> get out to any of the mailing lists.
> 
> In Linux' github tree
> Patch 1/2 is commit 5307f6d5fb12fd01f9f321bc4a8fd77e74858647
> Patch 2/2 is commit ed2888e906b56769b4ffabb9c577190438aa68b8
> 
> > When using "safe" mode, I get a crash in pcie_find_smpss(). Patch below.
> >
> > From 1b542d690323fab332e61099193d598c5c515d13 Mon Sep 17 00:00:00 2001
> > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Date: Tue, 13 Sep 2011 15:16:33 -0300
> > Subject: [PATCH] pci: Don't crash when reading mpss from root complex
> >
> > In pcie_find_smpss(), we have the following statement:
> >
> >        if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) ||
> >            dev->bus->self->pcie_type != PCI_EXP_TYPE_ROOT_PORT))
> >
> > The problem is that at least on my machine, this gets called for the
> > root complex (virtual P2P bridge), and dev->bus->self is NULL since
> > the parent bus for this is not itself anchor to a PCI device.
> >
> > This adds the necessary NULL check.
> >
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 
> Good catch.
> 
> Acked-by: Jon Mason <mason@myri.com>

Linus, please apply.

Cheers
Ben.

> > ---
> >  drivers/pci/probe.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 8473727..09d1c0f 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -1351,7 +1351,8 @@ static int pcie_find_smpss(struct pci_dev *dev, void *data)
> >         * will occur as normal.
> >         */
> >        if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) ||
> > -           dev->bus->self->pcie_type != PCI_EXP_TYPE_ROOT_PORT))
> > +            (dev->bus->self &&
> > +             dev->bus->self->pcie_type != PCI_EXP_TYPE_ROOT_PORT)))
> >                *smpss = 0;
> >
> >        if (*smpss > dev->pcie_mpss)
> > --
> > 1.7.4.1
> >
> >
> >
> >
> >
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-09-13 21:56 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-08 21:41 [PATCH 0/2] PCI: PCI-E MPS bug fixes Jon Mason
2011-09-09 22:44 ` Linus Torvalds
2011-09-10  0:04   ` Jesse Barnes
2011-09-13 10:50     ` Benjamin Herrenschmidt
2011-09-13 18:17       ` Benjamin Herrenschmidt
2011-09-13 21:05         ` Jon Mason
2011-09-13 21:56           ` Benjamin Herrenschmidt
2011-09-13 20:55       ` Jon Mason

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.