All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state
@ 2013-06-19 18:56 Radim Krčmář
  2013-06-25  1:38 ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Radim Krčmář @ 2013-06-19 18:56 UTC (permalink / raw)
  To: linux-pci
  Cc: linux-kernel, Myron Stowe, Joe Lawrence, Kenji Kaneshige,
	Bjorn Helgaas, Radim Krčmář

PCIe switch upstream port can be connected directly to the PCIe root bus
in QEMU; ASPM does not expect this topology and dereferences NULL pointer
when initializing.

I have not confirmed this can happen on real hardware, but it is presented
as a feature in QEMU, so there is no reason to panic if we can recover.

The dereference happens with topology defined by
  -M q35 -device x3130-upstream,bus=pcie.0,id=upstream \
  -device xio3130-downstream,bus=upstream,id=downstream,chassis=1
where on line drivers/pci/pcie/aspm.c:530 (alloc_pcie_link_state+13):
  		parent = pdev->bus->parent->self->link_state;
"pdev->bus->parent->self == NULL", because "pdev->bus->parent" has no
"->parent", hence no "->self".

Even though discouraged by QEMU documentation, one can set up even
topology without the upstream port
  -M q35 -device xio3130-downstream,bus=pcie.0,id=downstream,chassis=1
so "pdev->bus->parent == NULL", because "pdev->bus" is the root bus.
The patch checks for this too, because I do not like *NULL.

Right now, PCIe switch has to connect to the root port
  -M q35 -device ioh3420,bus=pcie.0,id=root.0 \
  -device x3130-upstream,bus=root.0,id=upstream \
  -device xio3130-downstream,bus=upstream,id=downstream,chassis=1

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 drivers/pci/pcie/aspm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 403a443..1ad1514 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -527,8 +527,8 @@ static struct pcie_link_state *alloc_pcie_link_state(struct pci_dev *pdev)
 	link->pdev = pdev;
 	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM) {
 		struct pcie_link_state *parent;
-		parent = pdev->bus->parent->self->link_state;
-		if (!parent) {
+		if (!pdev->bus->parent || !pdev->bus->parent->self ||
+		    !(parent = pdev->bus->parent->self->link_state)) {
 			kfree(link);
 			return NULL;
 		}
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state
  2013-06-19 18:56 [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state Radim Krčmář
@ 2013-06-25  1:38 ` Bjorn Helgaas
  2013-06-25  2:58   ` Alex Williamson
  2013-06-25 11:23   ` Michael S. Tsirkin
  0 siblings, 2 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2013-06-25  1:38 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-pci, linux-kernel, Myron Stowe, Joe Lawrence,
	Kenji Kaneshige, Michael S. Tsirkin, Isaku Yamahata,
	Alex Williamson

[+cc Michael, Alex, Isaku]

On Wed, Jun 19, 2013 at 12:56 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> PCIe switch upstream port can be connected directly to the PCIe root bus
> in QEMU; ASPM does not expect this topology and dereferences NULL pointer
> when initializing.
>
> I have not confirmed this can happen on real hardware, but it is presented
> as a feature in QEMU, so there is no reason to panic if we can recover.

This doesn't seem like a valid hardware topology to me.  If this *can*
occur on real hardware, we should fix it in Linux.  If not, maybe QEMU
should be changed to disallow it.

> The dereference happens with topology defined by
>   -M q35 -device x3130-upstream,bus=pcie.0,id=upstream \
>   -device xio3130-downstream,bus=upstream,id=downstream,chassis=1
> where on line drivers/pci/pcie/aspm.c:530 (alloc_pcie_link_state+13):
>                 parent = pdev->bus->parent->self->link_state;
> "pdev->bus->parent->self == NULL", because "pdev->bus->parent" has no
> "->parent", hence no "->self".
>
> Even though discouraged by QEMU documentation, one can set up even
> topology without the upstream port
>   -M q35 -device xio3130-downstream,bus=pcie.0,id=downstream,chassis=1
> so "pdev->bus->parent == NULL", because "pdev->bus" is the root bus.
> The patch checks for this too, because I do not like *NULL.
>
> Right now, PCIe switch has to connect to the root port
>   -M q35 -device ioh3420,bus=pcie.0,id=root.0 \
>   -device x3130-upstream,bus=root.0,id=upstream \
>   -device xio3130-downstream,bus=upstream,id=downstream,chassis=1
>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  drivers/pci/pcie/aspm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 403a443..1ad1514 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -527,8 +527,8 @@ static struct pcie_link_state *alloc_pcie_link_state(struct pci_dev *pdev)
>         link->pdev = pdev;
>         if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM) {
>                 struct pcie_link_state *parent;
> -               parent = pdev->bus->parent->self->link_state;
> -               if (!parent) {
> +               if (!pdev->bus->parent || !pdev->bus->parent->self ||
> +                   !(parent = pdev->bus->parent->self->link_state)) {
>                         kfree(link);
>                         return NULL;
>                 }
> --
> 1.8.1.4

I don't really want to further complicate the "if" statement you're
changing.  The link state allocation is pretty obtuse already, and if
this situation only occurs in QEMU, we're likely to break it again
when somebody refactors this code.

Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state
  2013-06-25  1:38 ` Bjorn Helgaas
@ 2013-06-25  2:58   ` Alex Williamson
  2013-06-25  3:35     ` Bjorn Helgaas
  2013-06-25 11:23   ` Michael S. Tsirkin
  1 sibling, 1 reply; 11+ messages in thread
From: Alex Williamson @ 2013-06-25  2:58 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Radim Krčmář,
	linux-pci, linux-kernel, Myron Stowe, Joe Lawrence,
	Kenji Kaneshige, Michael S. Tsirkin, Isaku Yamahata

On Mon, 2013-06-24 at 19:38 -0600, Bjorn Helgaas wrote:
> [+cc Michael, Alex, Isaku]
> 
> On Wed, Jun 19, 2013 at 12:56 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> > PCIe switch upstream port can be connected directly to the PCIe root bus
> > in QEMU; ASPM does not expect this topology and dereferences NULL pointer
> > when initializing.
> >
> > I have not confirmed this can happen on real hardware, but it is presented
> > as a feature in QEMU, so there is no reason to panic if we can recover.
> 
> This doesn't seem like a valid hardware topology to me.  If this *can*
> occur on real hardware, we should fix it in Linux.  If not, maybe QEMU
> should be changed to disallow it.

I think a quad-port 82576 plugged into an express slot is likely the
same topology.

> > The dereference happens with topology defined by
> >   -M q35 -device x3130-upstream,bus=pcie.0,id=upstream \
> >   -device xio3130-downstream,bus=upstream,id=downstream,chassis=1
> > where on line drivers/pci/pcie/aspm.c:530 (alloc_pcie_link_state+13):
> >                 parent = pdev->bus->parent->self->link_state;
> > "pdev->bus->parent->self == NULL", because "pdev->bus->parent" has no
> > "->parent", hence no "->self".
> >
> > Even though discouraged by QEMU documentation, one can set up even
> > topology without the upstream port
> >   -M q35 -device xio3130-downstream,bus=pcie.0,id=downstream,chassis=1
> > so "pdev->bus->parent == NULL", because "pdev->bus" is the root bus.
> > The patch checks for this too, because I do not like *NULL.

I don't think this is legal on real hardware.

> > Right now, PCIe switch has to connect to the root port
> >   -M q35 -device ioh3420,bus=pcie.0,id=root.0 \
> >   -device x3130-upstream,bus=root.0,id=upstream \
> >   -device xio3130-downstream,bus=upstream,id=downstream,chassis=1
> >
> > Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> > ---
> >  drivers/pci/pcie/aspm.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index 403a443..1ad1514 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -527,8 +527,8 @@ static struct pcie_link_state *alloc_pcie_link_state(struct pci_dev *pdev)
> >         link->pdev = pdev;
> >         if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM) {
> >                 struct pcie_link_state *parent;
> > -               parent = pdev->bus->parent->self->link_state;
> > -               if (!parent) {
> > +               if (!pdev->bus->parent || !pdev->bus->parent->self ||

I think there's an extra test in here.  Elsewhere we seem to assume that
if parent exists, then so does parent->self.  So this could be
simplified to just add:

if (pci_is_root_bus(pdev->bus) {
	kfree(link);
	return NULL;
}

> > +                   !(parent = pdev->bus->parent->self->link_state)) {
> >                         kfree(link);
> >                         return NULL;
> >                 }
> > --
> > 1.8.1.4
> 
> I don't really want to further complicate the "if" statement you're
> changing.  The link state allocation is pretty obtuse already, and if
> this situation only occurs in QEMU, we're likely to break it again
> when somebody refactors this code.

Maybe the above plus a common exit to avoid duplicate free/return.
Thanks,

Alex



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state
  2013-06-25  2:58   ` Alex Williamson
@ 2013-06-25  3:35     ` Bjorn Helgaas
  2013-06-25  3:57       ` Alex Williamson
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2013-06-25  3:35 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Radim Krčmář,
	linux-pci, linux-kernel, Myron Stowe, Joe Lawrence,
	Kenji Kaneshige, Michael S. Tsirkin, Isaku Yamahata

On Mon, Jun 24, 2013 at 8:58 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Mon, 2013-06-24 at 19:38 -0600, Bjorn Helgaas wrote:
>> [+cc Michael, Alex, Isaku]
>>
>> On Wed, Jun 19, 2013 at 12:56 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
>> > PCIe switch upstream port can be connected directly to the PCIe root bus
>> > in QEMU; ASPM does not expect this topology and dereferences NULL pointer
>> > when initializing.
>> >
>> > I have not confirmed this can happen on real hardware, but it is presented
>> > as a feature in QEMU, so there is no reason to panic if we can recover.
>>
>> This doesn't seem like a valid hardware topology to me.  If this *can*
>> occur on real hardware, we should fix it in Linux.  If not, maybe QEMU
>> should be changed to disallow it.
>
> I think a quad-port 82576 plugged into an express slot is likely the
> same topology.

I don't think that would be the same topology Radim described.  In
Radim's case, we have this:

  00:03.0 upstream port
  01:00.0 downstream port

and when we call alloc_pcie_link_state() for 01:00.0,

  pdev is 01:00.0
  pdev->bus is bus 01
  pdev->bus->parent is bus 00
  pdev->bus->parent->self (the bridge device leading to bus 00) is NULL

But in the case of a quad 82576 plugged into a slot, there would be a
root port or a downstream port leading to the slot's link, so my guess
is we'd have something like this (based on lspci output I found at
[1]):

  00:05.0 root port leading to slot (bridge to [bus 01-08])
  01:00.0 upstream port of switch on card (bridge to [bus 02-08])
  02:02.0 downstream port (bridge to [bus 03-05])
  02:04.0 downstream port (bridge to [bus 06-08])
  03:00.0 82576 port 0
  03:00.1 82576 port 1
  06:00.0 82576 port 2
  06:00.1 82576 port 3

So when we call alloc_pcie_link_state() for 02:02.0,

  pdev is 02:02.0
  pdev->bus is bus 02
  pdev->bus->parent is bus 01
  pdev->bus->parent->self (the bridge leading to bus 01) is 00:05.0

Bjorn

[1] http://sourceforge.net/p/e1000/bugs/112/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state
  2013-06-25  3:35     ` Bjorn Helgaas
@ 2013-06-25  3:57       ` Alex Williamson
  0 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2013-06-25  3:57 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Radim Krčmář,
	linux-pci, linux-kernel, Myron Stowe, Joe Lawrence,
	Kenji Kaneshige, Michael S. Tsirkin, Isaku Yamahata

On Mon, 2013-06-24 at 21:35 -0600, Bjorn Helgaas wrote:
> On Mon, Jun 24, 2013 at 8:58 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Mon, 2013-06-24 at 19:38 -0600, Bjorn Helgaas wrote:
> >> [+cc Michael, Alex, Isaku]
> >>
> >> On Wed, Jun 19, 2013 at 12:56 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> >> > PCIe switch upstream port can be connected directly to the PCIe root bus
> >> > in QEMU; ASPM does not expect this topology and dereferences NULL pointer
> >> > when initializing.
> >> >
> >> > I have not confirmed this can happen on real hardware, but it is presented
> >> > as a feature in QEMU, so there is no reason to panic if we can recover.
> >>
> >> This doesn't seem like a valid hardware topology to me.  If this *can*
> >> occur on real hardware, we should fix it in Linux.  If not, maybe QEMU
> >> should be changed to disallow it.
> >
> > I think a quad-port 82576 plugged into an express slot is likely the
> > same topology.
> 
> I don't think that would be the same topology Radim described.  In
> Radim's case, we have this:
> 
>   00:03.0 upstream port
>   01:00.0 downstream port
> 
> and when we call alloc_pcie_link_state() for 01:00.0,
> 
>   pdev is 01:00.0
>   pdev->bus is bus 01
>   pdev->bus->parent is bus 00
>   pdev->bus->parent->self (the bridge device leading to bus 00) is NULL
> 
> But in the case of a quad 82576 plugged into a slot, there would be a
> root port or a downstream port leading to the slot's link, so my guess
> is we'd have something like this (based on lspci output I found at
> [1]):
> 
>   00:05.0 root port leading to slot (bridge to [bus 01-08])
>   01:00.0 upstream port of switch on card (bridge to [bus 02-08])
>   02:02.0 downstream port (bridge to [bus 03-05])
>   02:04.0 downstream port (bridge to [bus 06-08])
>   03:00.0 82576 port 0
>   03:00.1 82576 port 1
>   06:00.0 82576 port 2
>   06:00.1 82576 port 3
> 
> So when we call alloc_pcie_link_state() for 02:02.0,
> 
>   pdev is 02:02.0
>   pdev->bus is bus 02
>   pdev->bus->parent is bus 01
>   pdev->bus->parent->self (the bridge leading to bus 01) is 00:05.0


Oops, I misread his statement.  I don't think an upstream port connected
directly to a root complex is valid.  I thought he was having problems
with an upstream port connected to a root port, which is obviously
valid.  QEMU lets us attach nearly anything to the root complex, most of
them aren't valid.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state
  2013-06-25  1:38 ` Bjorn Helgaas
  2013-06-25  2:58   ` Alex Williamson
@ 2013-06-25 11:23   ` Michael S. Tsirkin
  2013-06-25 17:17     ` Bjorn Helgaas
  1 sibling, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2013-06-25 11:23 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Radim Krčmář,
	linux-pci, linux-kernel, Myron Stowe, Joe Lawrence,
	Kenji Kaneshige, Isaku Yamahata, Alex Williamson

On Mon, Jun 24, 2013 at 07:38:45PM -0600, Bjorn Helgaas wrote:
> [+cc Michael, Alex, Isaku]
> 
> On Wed, Jun 19, 2013 at 12:56 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> > PCIe switch upstream port can be connected directly to the PCIe root bus
> > in QEMU; ASPM does not expect this topology and dereferences NULL pointer
> > when initializing.
> >
> > I have not confirmed this can happen on real hardware, but it is presented
> > as a feature in QEMU, so there is no reason to panic if we can recover.
> 
> This doesn't seem like a valid hardware topology to me.  If this *can*
> occur on real hardware, we should fix it in Linux.  If not, maybe QEMU
> should be changed to disallow it.

I don't think it's a spec compliant topology either.

Anything connected to an RC is either an integrated endpoint
or a root port.
There's no way to have an upstream port that is also
an integrated endpoint or a root port - these are distinct
device types.

So I don't think Linux needs to support it.

Having said that, there's all kind of broken hardware
out there - crashing is not a friendly way to tell users
that their hardware is not spec compliant.
Maybe linux can print a friendly warning and ignore
this port?


> > The dereference happens with topology defined by
> >   -M q35 -device x3130-upstream,bus=pcie.0,id=upstream \
> >   -device xio3130-downstream,bus=upstream,id=downstream,chassis=1
> > where on line drivers/pci/pcie/aspm.c:530 (alloc_pcie_link_state+13):
> >                 parent = pdev->bus->parent->self->link_state;
> > "pdev->bus->parent->self == NULL", because "pdev->bus->parent" has no
> > "->parent", hence no "->self".
> >
> > Even though discouraged by QEMU documentation, one can set up even
> > topology without the upstream port
> >   -M q35 -device xio3130-downstream,bus=pcie.0,id=downstream,chassis=1
> > so "pdev->bus->parent == NULL", because "pdev->bus" is the root bus.
> > The patch checks for this too, because I do not like *NULL.
> >
> > Right now, PCIe switch has to connect to the root port
> >   -M q35 -device ioh3420,bus=pcie.0,id=root.0 \
> >   -device x3130-upstream,bus=root.0,id=upstream \
> >   -device xio3130-downstream,bus=upstream,id=downstream,chassis=1
> >
> > Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> > ---
> >  drivers/pci/pcie/aspm.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index 403a443..1ad1514 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -527,8 +527,8 @@ static struct pcie_link_state *alloc_pcie_link_state(struct pci_dev *pdev)
> >         link->pdev = pdev;
> >         if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM) {
> >                 struct pcie_link_state *parent;
> > -               parent = pdev->bus->parent->self->link_state;
> > -               if (!parent) {
> > +               if (!pdev->bus->parent || !pdev->bus->parent->self ||
> > +                   !(parent = pdev->bus->parent->self->link_state)) {
> >                         kfree(link);
> >                         return NULL;
> >                 }
> > --
> > 1.8.1.4
> 
> I don't really want to further complicate the "if" statement you're
> changing.  The link state allocation is pretty obtuse already, and if
> this situation only occurs in QEMU, we're likely to break it again
> when somebody refactors this code.
> 
> Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state
  2013-06-25 11:23   ` Michael S. Tsirkin
@ 2013-06-25 17:17     ` Bjorn Helgaas
  2013-06-25 20:50       ` Radim Krčmář
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2013-06-25 17:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Radim Krčmář,
	linux-pci, linux-kernel, Myron Stowe, Joe Lawrence,
	Kenji Kaneshige, Isaku Yamahata, Alex Williamson

On Tue, Jun 25, 2013 at 5:23 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Jun 24, 2013 at 07:38:45PM -0600, Bjorn Helgaas wrote:
>> [+cc Michael, Alex, Isaku]
>>
>> On Wed, Jun 19, 2013 at 12:56 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
>> > PCIe switch upstream port can be connected directly to the PCIe root bus
>> > in QEMU; ASPM does not expect this topology and dereferences NULL pointer
>> > when initializing.
>> >
>> > I have not confirmed this can happen on real hardware, but it is presented
>> > as a feature in QEMU, so there is no reason to panic if we can recover.
>>
>> This doesn't seem like a valid hardware topology to me.  If this *can*
>> occur on real hardware, we should fix it in Linux.  If not, maybe QEMU
>> should be changed to disallow it.
>
> I don't think it's a spec compliant topology either.
>
> Anything connected to an RC is either an integrated endpoint
> or a root port.
> There's no way to have an upstream port that is also
> an integrated endpoint or a root port - these are distinct
> device types.
>
> So I don't think Linux needs to support it.
>
> Having said that, there's all kind of broken hardware
> out there - crashing is not a friendly way to tell users
> that their hardware is not spec compliant.
> Maybe linux can print a friendly warning and ignore
> this port?

Indeed, that would be nicer.  I booted Win7 on the same config, and it
came up fine.  It did complain that it couldn't start the PCI-to-PCI
bridge driver on 00:03.0, the upstream port.

I'd like it if Linux could similarly tolerate that.  But since this is
a relatively low-priority issue, I'm going to hold out for a more
straightforward solution.  Checking for a null
"pdev->bus->parent->self" pointer is not very obvious.  I think we can
probably come up with a more direct check that reads better and
possibly even detects the issue at the upstream port, not the
downstream port.

I opened a bugzilla for this issue:
https://bugzilla.kernel.org/show_bug.cgi?id=60111

Radim, can you please attach a complete dmesg log showing the oops,
i.e., console output with "ignore_loglevel" and lspci -vv output (have
to use your patch so it boots, I guess)?  I tried to reproduce it and
I know the problem is there, but I haven't quite found the right
recipe yet.

Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state
  2013-06-25 17:17     ` Bjorn Helgaas
@ 2013-06-25 20:50       ` Radim Krčmář
  0 siblings, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2013-06-25 20:50 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Michael S. Tsirkin, linux-pci, linux-kernel, Myron Stowe,
	Joe Lawrence, Kenji Kaneshige, Isaku Yamahata, Alex Williamson

2013-06-25 11:17-0600, Bjorn Helgaas:
> On Tue, Jun 25, 2013 at 5:23 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, Jun 24, 2013 at 07:38:45PM -0600, Bjorn Helgaas wrote:
> >> [+cc Michael, Alex, Isaku]
> >>
> >> On Wed, Jun 19, 2013 at 12:56 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> >> > PCIe switch upstream port can be connected directly to the PCIe root bus
> >> > in QEMU; ASPM does not expect this topology and dereferences NULL pointer
> >> > when initializing.
> >> >
> >> > I have not confirmed this can happen on real hardware, but it is presented
> >> > as a feature in QEMU, so there is no reason to panic if we can recover.
> >>
> >> This doesn't seem like a valid hardware topology to me.  If this *can*
> >> occur on real hardware, we should fix it in Linux.  If not, maybe QEMU
> >> should be changed to disallow it.
> >
> > I don't think it's a spec compliant topology either.
> >
> > Anything connected to an RC is either an integrated endpoint
> > or a root port.
> > There's no way to have an upstream port that is also
> > an integrated endpoint or a root port - these are distinct
> > device types.
> >
> > So I don't think Linux needs to support it.
> >
> > Having said that, there's all kind of broken hardware
> > out there - crashing is not a friendly way to tell users
> > that their hardware is not spec compliant.
> > Maybe linux can print a friendly warning and ignore
> > this port?
> 
> Indeed, that would be nicer.  I booted Win7 on the same config, and it
> came up fine.  It did complain that it couldn't start the PCI-to-PCI
> bridge driver on 00:03.0, the upstream port.

True, would something like

  printk(KERN_WARNING "pcie: %s connected directly to root bus[complex?]:"
                      " aspm disabled\n", dev_name(pdev));

be enough?

> I'd like it if Linux could similarly tolerate that.  But since this is
> a relatively low-priority issue, I'm going to hold out for a more
> straightforward solution.  Checking for a null
> "pdev->bus->parent->self" pointer is not very obvious.  I think we can
> probably come up with a more direct check that reads better and
> possibly even detects the issue at the upstream port, not the
> downstream port.

The check could be in pcie_aspm_init_link_state, where a "strange" VIA
chipset is already being detected and using "pci_is_root_bus()" instead
of "->self" is probably clearer as well.

  	/* QEMU can connect upstream port directly to the root complex */
  	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM &&
  	    !pci_is_root_bus(pdev->bus->parent))
  		return;

If we were to check for the, even worse, downstream<->root complex,
it would be "!pci_is_root_bus(pdev->bus)".

Upstream port is a bit neglected in our aspm implementation, so
refactoring might be required to get detection in it.

> I opened a bugzilla for this issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=60111
> 
> Radim, can you please attach a complete dmesg log showing the oops,
> i.e., console output with "ignore_loglevel" and lspci -vv output (have
> to use your patch so it boots, I guess)?  I tried to reproduce it and
> I know the problem is there, but I haven't quite found the right
> recipe yet.

Sorry, I have not mentioned that device must be connected to the switch,
so "pci_scan_slot" does not end with "nr == 0", skipping the link setup.
I connected the root disk; whole command for qemu-kvm 1.5:

  qemu-kvm -m 2048 -M q35 -serial stdio -vnc :0 \
  	-device x3130-upstream,bus=pcie.0,id=upstream \
  	-device xio3130-downstream,bus=upstream,id=downstream,chassis=1 \
  	-drive file=fc19.img,id=disk1,if=none,format=raw,media=disk,cache=none \
  	-device virtio-blk-pci,bus=downstream,id=virtio-disk1,drive=disk1

"pcie_aspm=off" should do exactly the same as patch.

Both logs attached in bugzilla.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state
  2013-08-23  0:02 ` Bjorn Helgaas
@ 2013-08-23 21:46   ` Bjorn Helgaas
  0 siblings, 0 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2013-08-23 21:46 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, linux-pci, Michael S. Tsirkin, Alex Williamson,
	Myron Stowe, Joe Lawrence, Kenji Kaneshige, Isaku Yamahata,
	Shaohua Li

[+cc Shaohua]

On Thu, Aug 22, 2013 at 6:02 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, Aug 08, 2013 at 03:57:07PM +0200, Radim Krčmář wrote:
>> PCIe switch can be connected directly to the PCIe root complex in QEMU;
>> ASPM does not expect this topology and dereferences NULL pointer when
>> initializing.
>>
>> Downstream port can be also connected to the root complex without
>> upstream one, so code checks for both, otherwise they dereference NULL
>> on line drivers/pci/pcie/aspm.c:530 (alloc_pcie_link_state+13):
>>               parent = pdev->bus->parent->self->link_state;
>> "pdev->bus->parent->self == NULL" if upstream port is connected directly
>> to the root bus and "pdev->bus->parent == NULL" in the second case.
>>
>> v1 -> v2: (https://lkml.org/lkml/2013/6/19/753)
>>  - Initialization is aborted in pcie_aspm_init_link_state, where other
>>    special cases are being handled
>>  - pci_is_root_bus is used
>>  - Warning is printed
>>
>> Reproducer for "downstream -- root" and "downstream -- upstream -- root"
>> (used qemu-kvm 1.5, q35 machine type might be missing on older ones)
>>
>>   for parent in pcie.0 upstream; do
>>    qemu-kvm -m 128 -M q35 -nographic -no-reboot \
>>      -device x3130-upstream,bus=pcie.0,id=upstream \
>>      -device xio3130-downstream,bus=$parent,id=downstream,chassis=1 \
>>      -device virtio-blk-pci,bus=downstream,id=virtio-zero,drive=zero \
>>      -drive  file=/dev/zero,id=zero,format=raw \
>>      -kernel bzImage -append "console=ttyS0 panic=3" # pcie_aspm=off
>>   done
>>
>> ASPM in QEMU works if we connect upstream through root port
>>   -device ioh3420,bus=pcie.0,id=root.0 \
>>   -device x3130-upstream,bus=root.0,id=upstream
>>
>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>> ---
>>  drivers/pci/pcie/aspm.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> index 403a443..209cd7f 100644
>> --- a/drivers/pci/pcie/aspm.c
>> +++ b/drivers/pci/pcie/aspm.c
>> @@ -570,6 +570,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev)
>>           pdev->bus->self)
>>               return;
>>
>> +     /* We require at least two ports between downstream and root bus */
>> +     if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM &&
>> +         (pci_is_root_bus(pdev->bus) ||
>> +          pci_is_root_bus(pdev->bus->parent))) {
>> +             dev_warn(&pdev->dev, "ASPM disabled"
>> +                                  " (connected directly to root bus)\n");
>> +             return;
>> +     }
>
> I don't really want to detect invalid topologies piecemeal -- we will
> likely find other areas (MPS, AER, link speed management, etc.) that
> have similar dependencies.  I'd rather do it generically, maybe with
> something like the following patch.
>
> I tried this with the following qemu invocation:
>
>     $ /usr/local/bin/qemu-system-x86_64 -M q35 -enable-kvm -m 512   -device x3130-upstream,bus=pcie.0,id=upstream   -device xio3130-downstream,bus=upstream,id=downstream,chassis=1   -drive file=ubuntu.img,if=none,id=mydisk   -device ide-drive,drive=mydisk,bus=ide.0   -drive file=scratch.img,id=disk1 -device virtio-blk-pci,bus=downstream,id=virtio-disk1,drive=disk1 -nographic -kernel ~/linux/arch/x86/boot/bzImage   -append "console=ttyS0,115200n8 root=/dev/sda1 ignore_loglevel"
>
> With unmodified v3.11-rc2, I see the NULL pointer dereference, but with
> the patch below applied, we just ignore the 00:03.0 device and the kernel
> boots fine.

Thinking about this some more, I'm inclined to do nothing.  The patch
below *would* avoid the issue, but it's a fair bit of code and
comments that future code readers would have to wade through, and the
only benefit is for topologies that are completely invalid and will
never be seen in real hardware.

I think that we could also avoid this issue by reworking our ASPM
support, and if we were to do anything, that's what I would prefer.
For example, consider the following (legal and common) topology:

    00:1c.0 Root Port (bridge leading to [bus 01-03])
    01:00.0 Upstream Port (bridge leading to [bus 02-03])
    02:00.0 Downstream Port (bridge leading to [bus 03])

When we initialize ASPM link state for 02:00.0, we look all the way up
to the link state for 00:1c.0 (the
"pdev->bus->parent->self->link_state" in alloc_pcie_link_state()).  I
think that's crazy.  ASPM is fundamentally a feature of a *link*, and
the only link directly related to 02:00.0 is the one leading
*downstream* to device 03:00.0.

I know there are latency questions where we do care about upstream
links, and there's commit 46bbdfa44c, which is what originally added
this "pdev->bus->parent->self->link_state" expression.  It suggests
that we need to check the whole hierarchy up to the root port.  But
there's not enough information in the changelog to really explain it.

So I strongly support rework that would simplify ASPM management
overall.   I just don't want to add point fixes that only handle
special cases, especially when they don't occur in the real world.

Bjorn

> ---
>  arch/powerpc/kernel/pci_of_scan.c |    7 ++++++-
>  arch/sparc/kernel/pci.c           |    7 ++++++-
>  drivers/pci/probe.c               |   37 +++++++++++++++++++++++++++++++++----
>  include/linux/pci.h               |    2 +-
>  4 files changed, 46 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
> index 6b0ba58..f6ef4dd 100644
> --- a/arch/powerpc/kernel/pci_of_scan.c
> +++ b/arch/powerpc/kernel/pci_of_scan.c
> @@ -143,7 +143,6 @@ struct pci_dev *of_create_pci_dev(struct device_node *node,
>         dev->devfn = devfn;
>         dev->multifunction = 0;         /* maybe a lie? */
>         dev->needs_freset = 0;          /* pcie fundamental reset required */
> -       set_pcie_port_type(dev);
>
>         list_for_each_entry(slot, &dev->bus->slots, list)
>                 if (PCI_SLOT(dev->devfn) == slot->number)
> @@ -164,6 +163,12 @@ struct pci_dev *of_create_pci_dev(struct device_node *node,
>         pr_debug("    class: 0x%x\n", dev->class);
>         pr_debug("    revision: 0x%x\n", dev->revision);
>
> +       if (set_pcie_port_type(dev)) {
> +               pci_bus_put(dev->bus);
> +               kfree(dev);
> +               return NULL;
> +       }
> +
>         dev->current_state = PCI_UNKNOWN;       /* unknown power state */
>         dev->error_state = pci_channel_io_normal;
>         dev->dma_mask = 0xffffffff;
> diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
> index bc4d3f5..5600849 100644
> --- a/arch/sparc/kernel/pci.c
> +++ b/arch/sparc/kernel/pci.c
> @@ -287,7 +287,6 @@ static struct pci_dev *of_create_pci_dev(struct pci_pbm_info *pbm,
>         dev->dev.of_node = of_node_get(node);
>         dev->devfn = devfn;
>         dev->multifunction = 0;         /* maybe a lie? */
> -       set_pcie_port_type(dev);
>
>         list_for_each_entry(slot, &dev->bus->slots, list)
>                 if (PCI_SLOT(dev->devfn) == slot->number)
> @@ -319,6 +318,12 @@ static struct pci_dev *of_create_pci_dev(struct pci_pbm_info *pbm,
>                 printk("    class: 0x%x device name: %s\n",
>                        dev->class, pci_name(dev));
>
> +       if (set_pcie_port_type(dev)) {
> +               pci_bus_put(dev->bus);
> +               kfree(dev);
> +               return NULL;
> +       }
> +
>         /* I have seen IDE devices which will not respond to
>          * the bmdma simplex check reads if bus mastering is
>          * disabled.
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index cf57fe7..a8cc929 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -970,20 +970,47 @@ static void pci_read_irq(struct pci_dev *dev)
>         dev->irq = irq;
>  }
>
> -void set_pcie_port_type(struct pci_dev *pdev)
> +int set_pcie_port_type(struct pci_dev *pdev)
>  {
> -       int pos;
> +       int pos, type;
>         u16 reg16;
> +       struct pci_dev *parent;
>
>         pos = pci_find_capability(pdev, PCI_CAP_ID_EXP);
>         if (!pos)
> -               return;
> +               return 0;
> +
>         pdev->is_pcie = 1;
>         pdev->pcie_cap = pos;
>         pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, &reg16);
>         pdev->pcie_flags_reg = reg16;
>         pci_read_config_word(pdev, pos + PCI_EXP_DEVCAP, &reg16);
>         pdev->pcie_mpss = reg16 & PCI_EXP_DEVCAP_PAYLOAD;
> +
> +       type = pci_pcie_type(pdev);
> +       parent = pdev->bus->self;
> +
> +       /* An Upstream Port must be below a Root Port or a Downstream Port */
> +       if (type == PCI_EXP_TYPE_UPSTREAM) {
> +               if (!parent ||
> +                   (pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT &&
> +                    pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM)) {
> +                       dev_warn(&pdev->dev,
> +                                "ignoring improperly connected PCIe switch\n");
> +                       return -EINVAL;
> +               }
> +       }
> +
> +       /* A Downstream Port must be directly below an Upstream Port */
> +       if (type == PCI_EXP_TYPE_DOWNSTREAM) {
> +               if (!parent ||
> +                   pci_pcie_type(parent) != PCI_EXP_TYPE_UPSTREAM) {
> +                       dev_warn(&pdev->dev, "ignoring malformed PCIe switch (no Upstream Port)\n");
> +                       return -EINVAL;
> +               }
> +       }
> +
> +       return 0;
>  }
>
>  void set_pcie_hotplug_bridge(struct pci_dev *pdev)
> @@ -1025,7 +1052,6 @@ int pci_setup_device(struct pci_dev *dev)
>         dev->hdr_type = hdr_type & 0x7f;
>         dev->multifunction = !!(hdr_type & 0x80);
>         dev->error_state = pci_channel_io_normal;
> -       set_pcie_port_type(dev);
>
>         list_for_each_entry(slot, &dev->bus->slots, list)
>                 if (PCI_SLOT(dev->devfn) == slot->number)
> @@ -1046,6 +1072,9 @@ int pci_setup_device(struct pci_dev *dev)
>         dev_printk(KERN_DEBUG, &dev->dev, "[%04x:%04x] type %02x class %#08x\n",
>                    dev->vendor, dev->device, dev->hdr_type, dev->class);
>
> +       if (set_pcie_port_type(dev))
> +               return -EINVAL;
> +
>         /* need to have dev->class ready */
>         dev->cfg_size = pci_cfg_space_size(dev);
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 89ed123..a61134b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -982,7 +982,7 @@ void pci_disable_ltr(struct pci_dev *dev);
>  int pci_set_ltr(struct pci_dev *dev, int snoop_lat_ns, int nosnoop_lat_ns);
>
>  /* For use by arch with custom probe code */
> -void set_pcie_port_type(struct pci_dev *pdev);
> +int set_pcie_port_type(struct pci_dev *pdev);
>  void set_pcie_hotplug_bridge(struct pci_dev *pdev);
>
>  /* Functions for PCI Hotplug drivers to use */

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state
  2013-08-08 13:57 Radim Krčmář
@ 2013-08-23  0:02 ` Bjorn Helgaas
  2013-08-23 21:46   ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2013-08-23  0:02 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, linux-pci, Michael S. Tsirkin, Alex Williamson,
	Myron Stowe, Joe Lawrence, Kenji Kaneshige, Isaku Yamahata

On Thu, Aug 08, 2013 at 03:57:07PM +0200, Radim Krčmář wrote:
> PCIe switch can be connected directly to the PCIe root complex in QEMU;
> ASPM does not expect this topology and dereferences NULL pointer when
> initializing.
> 
> Downstream port can be also connected to the root complex without
> upstream one, so code checks for both, otherwise they dereference NULL
> on line drivers/pci/pcie/aspm.c:530 (alloc_pcie_link_state+13):
>   		parent = pdev->bus->parent->self->link_state;
> "pdev->bus->parent->self == NULL" if upstream port is connected directly
> to the root bus and "pdev->bus->parent == NULL" in the second case.
> 
> v1 -> v2: (https://lkml.org/lkml/2013/6/19/753)
>  - Initialization is aborted in pcie_aspm_init_link_state, where other
>    special cases are being handled
>  - pci_is_root_bus is used
>  - Warning is printed
> 
> Reproducer for "downstream -- root" and "downstream -- upstream -- root"
> (used qemu-kvm 1.5, q35 machine type might be missing on older ones)
> 
>   for parent in pcie.0 upstream; do
>    qemu-kvm -m 128 -M q35 -nographic -no-reboot \
>      -device x3130-upstream,bus=pcie.0,id=upstream \
>      -device xio3130-downstream,bus=$parent,id=downstream,chassis=1 \
>      -device virtio-blk-pci,bus=downstream,id=virtio-zero,drive=zero \
>      -drive  file=/dev/zero,id=zero,format=raw \
>      -kernel bzImage -append "console=ttyS0 panic=3" # pcie_aspm=off
>   done
> 
> ASPM in QEMU works if we connect upstream through root port
>   -device ioh3420,bus=pcie.0,id=root.0 \
>   -device x3130-upstream,bus=root.0,id=upstream
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  drivers/pci/pcie/aspm.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 403a443..209cd7f 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -570,6 +570,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev)
>  	    pdev->bus->self)
>  		return;
>  
> +	/* We require at least two ports between downstream and root bus */
> +	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM &&
> +	    (pci_is_root_bus(pdev->bus) ||
> +	     pci_is_root_bus(pdev->bus->parent))) {
> +		dev_warn(&pdev->dev, "ASPM disabled"
> +		                     " (connected directly to root bus)\n");
> +		return;
> +	}

I don't really want to detect invalid topologies piecemeal -- we will
likely find other areas (MPS, AER, link speed management, etc.) that
have similar dependencies.  I'd rather do it generically, maybe with
something like the following patch.

I tried this with the following qemu invocation:

    $ /usr/local/bin/qemu-system-x86_64 -M q35 -enable-kvm -m 512   -device x3130-upstream,bus=pcie.0,id=upstream   -device xio3130-downstream,bus=upstream,id=downstream,chassis=1   -drive file=ubuntu.img,if=none,id=mydisk   -device ide-drive,drive=mydisk,bus=ide.0   -drive file=scratch.img,id=disk1 -device virtio-blk-pci,bus=downstream,id=virtio-disk1,drive=disk1 -nographic -kernel ~/linux/arch/x86/boot/bzImage   -append "console=ttyS0,115200n8 root=/dev/sda1 ignore_loglevel"

With unmodified v3.11-rc2, I see the NULL pointer dereference, but with
the patch below applied, we just ignore the 00:03.0 device and the kernel
boots fine.

Bjorn

---
 arch/powerpc/kernel/pci_of_scan.c |    7 ++++++-
 arch/sparc/kernel/pci.c           |    7 ++++++-
 drivers/pci/probe.c               |   37 +++++++++++++++++++++++++++++++++----
 include/linux/pci.h               |    2 +-
 4 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index 6b0ba58..f6ef4dd 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -143,7 +143,6 @@ struct pci_dev *of_create_pci_dev(struct device_node *node,
 	dev->devfn = devfn;
 	dev->multifunction = 0;		/* maybe a lie? */
 	dev->needs_freset = 0;		/* pcie fundamental reset required */
-	set_pcie_port_type(dev);
 
 	list_for_each_entry(slot, &dev->bus->slots, list)
 		if (PCI_SLOT(dev->devfn) == slot->number)
@@ -164,6 +163,12 @@ struct pci_dev *of_create_pci_dev(struct device_node *node,
 	pr_debug("    class: 0x%x\n", dev->class);
 	pr_debug("    revision: 0x%x\n", dev->revision);
 
+	if (set_pcie_port_type(dev)) {
+		pci_bus_put(dev->bus);
+		kfree(dev);
+		return NULL;
+	}
+
 	dev->current_state = PCI_UNKNOWN;	/* unknown power state */
 	dev->error_state = pci_channel_io_normal;
 	dev->dma_mask = 0xffffffff;
diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index bc4d3f5..5600849 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -287,7 +287,6 @@ static struct pci_dev *of_create_pci_dev(struct pci_pbm_info *pbm,
 	dev->dev.of_node = of_node_get(node);
 	dev->devfn = devfn;
 	dev->multifunction = 0;		/* maybe a lie? */
-	set_pcie_port_type(dev);
 
 	list_for_each_entry(slot, &dev->bus->slots, list)
 		if (PCI_SLOT(dev->devfn) == slot->number)
@@ -319,6 +318,12 @@ static struct pci_dev *of_create_pci_dev(struct pci_pbm_info *pbm,
 		printk("    class: 0x%x device name: %s\n",
 		       dev->class, pci_name(dev));
 
+	if (set_pcie_port_type(dev)) {
+		pci_bus_put(dev->bus);
+		kfree(dev);
+		return NULL;
+	}
+
 	/* I have seen IDE devices which will not respond to
 	 * the bmdma simplex check reads if bus mastering is
 	 * disabled.
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index cf57fe7..a8cc929 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -970,20 +970,47 @@ static void pci_read_irq(struct pci_dev *dev)
 	dev->irq = irq;
 }
 
-void set_pcie_port_type(struct pci_dev *pdev)
+int set_pcie_port_type(struct pci_dev *pdev)
 {
-	int pos;
+	int pos, type;
 	u16 reg16;
+	struct pci_dev *parent;
 
 	pos = pci_find_capability(pdev, PCI_CAP_ID_EXP);
 	if (!pos)
-		return;
+		return 0;
+
 	pdev->is_pcie = 1;
 	pdev->pcie_cap = pos;
 	pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, &reg16);
 	pdev->pcie_flags_reg = reg16;
 	pci_read_config_word(pdev, pos + PCI_EXP_DEVCAP, &reg16);
 	pdev->pcie_mpss = reg16 & PCI_EXP_DEVCAP_PAYLOAD;
+
+	type = pci_pcie_type(pdev);
+	parent = pdev->bus->self;
+
+	/* An Upstream Port must be below a Root Port or a Downstream Port */
+	if (type == PCI_EXP_TYPE_UPSTREAM) {
+		if (!parent ||
+		    (pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT &&
+		     pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM)) {
+			dev_warn(&pdev->dev,
+				 "ignoring improperly connected PCIe switch\n");
+			return -EINVAL;
+		}
+	}
+
+	/* A Downstream Port must be directly below an Upstream Port */
+	if (type == PCI_EXP_TYPE_DOWNSTREAM) {
+		if (!parent ||
+		    pci_pcie_type(parent) != PCI_EXP_TYPE_UPSTREAM) {
+			dev_warn(&pdev->dev, "ignoring malformed PCIe switch (no Upstream Port)\n");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
 }
 
 void set_pcie_hotplug_bridge(struct pci_dev *pdev)
@@ -1025,7 +1052,6 @@ int pci_setup_device(struct pci_dev *dev)
 	dev->hdr_type = hdr_type & 0x7f;
 	dev->multifunction = !!(hdr_type & 0x80);
 	dev->error_state = pci_channel_io_normal;
-	set_pcie_port_type(dev);
 
 	list_for_each_entry(slot, &dev->bus->slots, list)
 		if (PCI_SLOT(dev->devfn) == slot->number)
@@ -1046,6 +1072,9 @@ int pci_setup_device(struct pci_dev *dev)
 	dev_printk(KERN_DEBUG, &dev->dev, "[%04x:%04x] type %02x class %#08x\n",
 		   dev->vendor, dev->device, dev->hdr_type, dev->class);
 
+	if (set_pcie_port_type(dev))
+		return -EINVAL;
+
 	/* need to have dev->class ready */
 	dev->cfg_size = pci_cfg_space_size(dev);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 89ed123..a61134b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -982,7 +982,7 @@ void pci_disable_ltr(struct pci_dev *dev);
 int pci_set_ltr(struct pci_dev *dev, int snoop_lat_ns, int nosnoop_lat_ns);
 
 /* For use by arch with custom probe code */
-void set_pcie_port_type(struct pci_dev *pdev);
+int set_pcie_port_type(struct pci_dev *pdev);
 void set_pcie_hotplug_bridge(struct pci_dev *pdev);
 
 /* Functions for PCI Hotplug drivers to use */

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state
@ 2013-08-08 13:57 Radim Krčmář
  2013-08-23  0:02 ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Radim Krčmář @ 2013-08-08 13:57 UTC (permalink / raw)
  To: linux-kernel, linux-pci
  Cc: Bjorn Helgaas, Michael S. Tsirkin, Alex Williamson, Myron Stowe,
	Joe Lawrence, Kenji Kaneshige, Isaku Yamahata

PCIe switch can be connected directly to the PCIe root complex in QEMU;
ASPM does not expect this topology and dereferences NULL pointer when
initializing.

Downstream port can be also connected to the root complex without
upstream one, so code checks for both, otherwise they dereference NULL
on line drivers/pci/pcie/aspm.c:530 (alloc_pcie_link_state+13):
  		parent = pdev->bus->parent->self->link_state;
"pdev->bus->parent->self == NULL" if upstream port is connected directly
to the root bus and "pdev->bus->parent == NULL" in the second case.

v1 -> v2: (https://lkml.org/lkml/2013/6/19/753)
 - Initialization is aborted in pcie_aspm_init_link_state, where other
   special cases are being handled
 - pci_is_root_bus is used
 - Warning is printed

Reproducer for "downstream -- root" and "downstream -- upstream -- root"
(used qemu-kvm 1.5, q35 machine type might be missing on older ones)

  for parent in pcie.0 upstream; do
   qemu-kvm -m 128 -M q35 -nographic -no-reboot \
     -device x3130-upstream,bus=pcie.0,id=upstream \
     -device xio3130-downstream,bus=$parent,id=downstream,chassis=1 \
     -device virtio-blk-pci,bus=downstream,id=virtio-zero,drive=zero \
     -drive  file=/dev/zero,id=zero,format=raw \
     -kernel bzImage -append "console=ttyS0 panic=3" # pcie_aspm=off
  done

ASPM in QEMU works if we connect upstream through root port
  -device ioh3420,bus=pcie.0,id=root.0 \
  -device x3130-upstream,bus=root.0,id=upstream

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 drivers/pci/pcie/aspm.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 403a443..209cd7f 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -570,6 +570,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev)
 	    pdev->bus->self)
 		return;
 
+	/* We require at least two ports between downstream and root bus */
+	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM &&
+	    (pci_is_root_bus(pdev->bus) ||
+	     pci_is_root_bus(pdev->bus->parent))) {
+		dev_warn(&pdev->dev, "ASPM disabled"
+		                     " (connected directly to root bus)\n");
+		return;
+	}
+
 	down_read(&pci_bus_sem);
 	if (list_empty(&pdev->subordinate->devices))
 		goto out;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-08-23 21:46 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-19 18:56 [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state Radim Krčmář
2013-06-25  1:38 ` Bjorn Helgaas
2013-06-25  2:58   ` Alex Williamson
2013-06-25  3:35     ` Bjorn Helgaas
2013-06-25  3:57       ` Alex Williamson
2013-06-25 11:23   ` Michael S. Tsirkin
2013-06-25 17:17     ` Bjorn Helgaas
2013-06-25 20:50       ` Radim Krčmář
2013-08-08 13:57 Radim Krčmář
2013-08-23  0:02 ` Bjorn Helgaas
2013-08-23 21:46   ` Bjorn Helgaas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.