All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/eeh: Validate arch in eeh_add_device_early()
@ 2016-01-10  3:08 Guilherme G. Piccoli
  2016-01-13  6:04 ` Benjamin Herrenschmidt
  2016-01-13 10:38 ` Michael Ellerman
  0 siblings, 2 replies; 9+ messages in thread
From: Guilherme G. Piccoli @ 2016-01-10  3:08 UTC (permalink / raw)
  To: mpe, gwshan, linuxppc-dev; +Cc: benh, paulus, gpiccoli

Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
added a check on function eeh_add_device_early(): since in Cell arch eeh_ops
is NULL, that code used to crash on Cell. The commit's approach was validate
if EEH was available by checking the result of function eeh_enabled().

Since the function eeh_add_device_early() is used to perform EEH
initialization in devices added later on the system, like in hotplug/DLPAR
scenarios, we might reach a case in which no PCI devices are present on boot
and so EEH is not initialized. Then, if a device is added via DLPAR for
example, eeh_add_device_early() fails because eeh_enabled() is false.

We can hit a kernel oops on pSeries arch if eeh_add_device_early() fails:
if we have no PCI devices on machine at boot time, and then we add a PCI device
via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer
dereference in the line "cfg_addr = edev->config_addr;". It happens because
config_addr in edev is NULL, since the function eeh_add_device_early() was not
completed successfully.

This patch just changes the way the arch checking is done in function
eeh_add_device_early(): we use no more eeh_enabled(), but instead we check the
running architecture by using the macro machine_is(). If we are running on
pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out
the function. This way, we don't enable EEH on Cell and we don't hit the oops
on DLPAR either.

Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 40e4d4a..81e2d3e 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1072,7 +1072,13 @@ void eeh_add_device_early(struct pci_dn *pdn)
 	struct pci_controller *phb;
 	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
 
-	if (!edev || !eeh_enabled())
+	if (!edev)
+		return;
+
+	/* Some platforms (like Cell) don't have EEH capabilities, so we
+	 * need to abort here. In case of pseries or powernv, we have EEH
+	 * so we can continue. */
+	if (!machine_is(pseries) && !machine_is(powernv))
 		return;
 
 	if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] powerpc/eeh: Validate arch in eeh_add_device_early()
  2016-01-10  3:08 [PATCH] powerpc/eeh: Validate arch in eeh_add_device_early() Guilherme G. Piccoli
@ 2016-01-13  6:04 ` Benjamin Herrenschmidt
  2016-01-13 11:56   ` Guilherme G. Piccoli
  2016-01-13 10:38 ` Michael Ellerman
  1 sibling, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2016-01-13  6:04 UTC (permalink / raw)
  To: Guilherme G. Piccoli, mpe, gwshan, linuxppc-dev; +Cc: paulus

On Sun, 2016-01-10 at 01:08 -0200, Guilherme G. Piccoli wrote:weust changes the way the arch checking is done in function
> 
> This patch jeeh_add_device_early(): we use no more eeh_enabled(), but instead we check therunning architecture by using the macro machine_is(). If we are running on
> pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out
> the function. This way, we don't enable EEH on Cell and we don't hit the oops
> on DLPAR either.

Can't we just check for eeh_ops being NULL ?

Cheers,
Ben.

> Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/eeh.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 40e4d4a..81e2d3e 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1072,7 +1072,13 @@ void eeh_add_device_early(struct pci_dn *pdn)
>  	struct pci_controller *phb;
>  	struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>  
> -	if (!edev || !eeh_enabled())
> +	if (!edev)
> +		return;
> +
> +	/* Some platforms (like Cell) don't have EEH capabilities, so we
> +	 * need to abort here. In case of pseries or powernv, we have EEH
> +	 * so we can continue. */
> +	if (!machine_is(pseries) && !machine_is(powernv))
>  		return;
>  
>  	if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] powerpc/eeh: Validate arch in eeh_add_device_early()
  2016-01-10  3:08 [PATCH] powerpc/eeh: Validate arch in eeh_add_device_early() Guilherme G. Piccoli
  2016-01-13  6:04 ` Benjamin Herrenschmidt
@ 2016-01-13 10:38 ` Michael Ellerman
  2016-01-13 12:08   ` Guilherme G. Piccoli
  1 sibling, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2016-01-13 10:38 UTC (permalink / raw)
  To: Guilherme G. Piccoli, gwshan, linuxppc-dev; +Cc: benh, paulus

On Sun, 2016-01-10 at 01:08 -0200, Guilherme G. Piccoli wrote:

> Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
> added a check on function eeh_add_device_early(): since in Cell arch eeh_ops
> is NULL, that code used to crash on Cell. The commit's approach was validate
> if EEH was available by checking the result of function eeh_enabled().
> 
> Since the function eeh_add_device_early() is used to perform EEH
> initialization in devices added later on the system, like in hotplug/DLPAR
> scenarios, we might reach a case in which no PCI devices are present on boot
> and so EEH is not initialized. Then, if a device is added via DLPAR for
> example, eeh_add_device_early() fails because eeh_enabled() is false.
> 
> We can hit a kernel oops on pSeries arch if eeh_add_device_early() fails:
> if we have no PCI devices on machine at boot time, and then we add a PCI device
> via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer
> dereference in the line "cfg_addr = edev->config_addr;". It happens because
> config_addr in edev is NULL, since the function eeh_add_device_early() was not
> completed successfully.
> 
> This patch just changes the way the arch checking is done in function
> eeh_add_device_early(): we use no more eeh_enabled(), but instead we check the
> running architecture by using the macro machine_is(). If we are running on
> pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out
> the function. This way, we don't enable EEH on Cell and we don't hit the oops
> on DLPAR either.

But eeh_enabled() is still false? That seems like it's liable to cause breakage
elsewhere.

Shouldn't the PCI hotplug code instead be taught to initialise EEH correctly
when the first device is added?

cheers

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] powerpc/eeh: Validate arch in eeh_add_device_early()
  2016-01-13  6:04 ` Benjamin Herrenschmidt
@ 2016-01-13 11:56   ` Guilherme G. Piccoli
  0 siblings, 0 replies; 9+ messages in thread
From: Guilherme G. Piccoli @ 2016-01-13 11:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: mpe, gwshan, linuxppc-dev, paulus

On 01/13/2016 04:04 AM, Benjamin Herrenschmidt wrote:
> On Sun, 2016-01-10 at 01:08 -0200, Guilherme G. Piccoli wrote:weust changes the way the arch checking is done in function
>>
>> This patch jeeh_add_device_early(): we use no more eeh_enabled(), but instead we check the running architecture by using the macro machine_is(). If we are running on
>> pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out
>> the function. This way, we don't enable EEH on Cell and we don't hit the oops
>> on DLPAR either.
>
> Can't we just check for eeh_ops being NULL ?
>
> Cheers,
> Ben.

Sure, we can. I prefer the arch checking just because I think it's more 
"logical", so it's easier to understand why it's there. The "cost" is 
the same in practice, since the arch checking is just a macro that 
checks a struct member.

What do you think it's better? Thanks for the review.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] powerpc/eeh: Validate arch in eeh_add_device_early()
  2016-01-13 10:38 ` Michael Ellerman
@ 2016-01-13 12:08   ` Guilherme G. Piccoli
  2016-01-13 21:25     ` Michael Ellerman
  0 siblings, 1 reply; 9+ messages in thread
From: Guilherme G. Piccoli @ 2016-01-13 12:08 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: gwshan, linuxppc-dev, paulus, Benjamin Herrenschmidt

On 01/13/2016 08:38 AM, Michael Ellerman wrote:
> But eeh_enabled() is still false? That seems like it's liable to cause breakage
> elsewhere.

Yes, eeh_enabled() is false as expected. Notice that eeh_enabled() is 
telling if EEH is enabled or not, and since it's not (because there's no 
PCI adapters on machine yet!), makes sense it returns false.

At the end of runs of eeh_add_device_early(), the devices are probed and 
EEH is enabled, so the function will reflect this. Notice that the 
problem addressed by this patch is the use of eeh_enabled() in hotplug 
operations only, which in my opinion is not correct. I checked every 
other use of eeh_enable() in the code, and they all seems appropriate, 
so I expect no errors at all.


> Shouldn't the PCI hotplug code instead be taught to initialise EEH correctly
> when the first device is added?

Well, for sure there are other ways to achieve this patch's goal, like 
refactor PCI hotplug code in lots of places. But notice although the 
proposed solution is simple, it solves the issue because 
eeh_add_device_early() is ultimately the function used by PCI hotplug 
mechanism (no matter if pseries/cell/etc or the type of hotplug) to 
initialize EEH by probing the devices at the moment they are added to 
the system. In my opinion, this is exactly the location we want to 
change code to address this issue.

What do you think? Thanks very much for the review.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] powerpc/eeh: Validate arch in eeh_add_device_early()
  2016-01-13 12:08   ` Guilherme G. Piccoli
@ 2016-01-13 21:25     ` Michael Ellerman
  2016-01-14 19:59       ` Guilherme G. Piccoli
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2016-01-13 21:25 UTC (permalink / raw)
  To: Guilherme G. Piccoli; +Cc: gwshan, linuxppc-dev, paulus, Benjamin Herrenschmidt

On Wed, 2016-01-13 at 10:08 -0200, Guilherme G. Piccoli wrote:
> On 01/13/2016 08:38 AM, Michael Ellerman wrote:
> > But eeh_enabled() is still false? That seems like it's liable to cause breakage
> > elsewhere.
>
> Yes, eeh_enabled() is false as expected. Notice that eeh_enabled() is
> telling if EEH is enabled or not, and since it's not (because there's no
> PCI adapters on machine yet!), makes sense it returns false.
>
> At the end of runs of eeh_add_device_early(), the devices are probed and
> EEH is enabled, so the function will reflect this. Notice that the
> problem addressed by this patch is the use of eeh_enabled() in hotplug
> operations only, which in my opinion is not correct. I checked every
> other use of eeh_enable() in the code, and they all seems appropriate,
> so I expect no errors at all.

OK.

> > Shouldn't the PCI hotplug code instead be taught to initialise EEH correctly
> > when the first device is added?
>
> Well, for sure there are other ways to achieve this patch's goal, like
> refactor PCI hotplug code in lots of places. But notice although the
> proposed solution is simple, it solves the issue because
> eeh_add_device_early() is ultimately the function used by PCI hotplug
> mechanism (no matter if pseries/cell/etc or the type of hotplug) to
> initialize EEH by probing the devices at the moment they are added to
> the system. In my opinion, this is exactly the location we want to
> change code to address this issue.
>
> What do you think? Thanks very much for the review.

I'm still not sure. I'm certainly happy for the fix to be simple, it will need
to be backported after all.

But for example what happens if the user boots with eeh=off on the command
line, and then hotplugs a device. It looks like because you're not using
eeh_enabled() you will incorrectly initialise EEH anyway?

cheers

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] powerpc/eeh: Validate arch in eeh_add_device_early()
  2016-01-13 21:25     ` Michael Ellerman
@ 2016-01-14 19:59       ` Guilherme G. Piccoli
  2016-01-14 23:37         ` Michael Ellerman
  0 siblings, 1 reply; 9+ messages in thread
From: Guilherme G. Piccoli @ 2016-01-14 19:59 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: paulus, linuxppc-dev, gwshan

On 01/13/2016 07:25 PM, Michael Ellerman wrote:
> But for example what happens if the user boots with eeh=off on the command
> line, and then hotplugs a device. It looks like because you're not using
> eeh_enabled() you will incorrectly initialise EEH anyway?

Thanks very much for this catch Michael! I didn't think in this 
possibility; I just tested and it fails with the kernel oops.

So, since my patch does not cover this case, I think would be more 
interesting "unlink" the DDW mechanism from the EEH. It seems easy, I'll 
try to send you a patch soon.

Do you think it is a good approach?

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] powerpc/eeh: Validate arch in eeh_add_device_early()
  2016-01-14 19:59       ` Guilherme G. Piccoli
@ 2016-01-14 23:37         ` Michael Ellerman
  2016-01-19 20:11           ` Guilherme G. Piccoli
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2016-01-14 23:37 UTC (permalink / raw)
  To: Guilherme G. Piccoli; +Cc: paulus, linuxppc-dev, gwshan

On Thu, 2016-01-14 at 17:59 -0200, Guilherme G. Piccoli wrote:
> On 01/13/2016 07:25 PM, Michael Ellerman wrote:

> > But for example what happens if the user boots with eeh=off on the command
> > line, and then hotplugs a device. It looks like because you're not using
> > eeh_enabled() you will incorrectly initialise EEH anyway?
>
> Thanks very much for this catch Michael! I didn't think in this
> possibility; I just tested and it fails with the kernel oops.

OK, that's a pity.

> So, since my patch does not cover this case, I think would be more
> interesting "unlink" the DDW mechanism from the EEH. It seems easy, I'll
> try to send you a patch soon.
>
> Do you think it is a good approach?

It sounds good, but I don't know off hand whether it will work. See how it goes
and send us the patch.

cheers

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] powerpc/eeh: Validate arch in eeh_add_device_early()
  2016-01-14 23:37         ` Michael Ellerman
@ 2016-01-19 20:11           ` Guilherme G. Piccoli
  0 siblings, 0 replies; 9+ messages in thread
From: Guilherme G. Piccoli @ 2016-01-19 20:11 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, paulus, gwshan

On 01/14/2016 09:37 PM, Michael Ellerman wrote:
>> So, since my patch does not cover this case, I think would be more
>> interesting "unlink" the DDW mechanism from the EEH. It seems easy, I'll
>> try to send you a patch soon.
>>
>> Do you think it is a good approach?
>
> It sounds good, but I don't know off hand whether it will work. See how it goes
> and send us the patch.

You are right (again!). It's kind of complicated to unlink DDW from EEH 
since it relies on EEH to get the config. address of devices.

Instead of try to unlink it, I've inserted an EEH check to avoid issues 
in case DDW can't count on EEH being enabled. Will sent the patches to 
the list.

Thanks very much for your help and comments,



Guilherme

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-01-19 20:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-10  3:08 [PATCH] powerpc/eeh: Validate arch in eeh_add_device_early() Guilherme G. Piccoli
2016-01-13  6:04 ` Benjamin Herrenschmidt
2016-01-13 11:56   ` Guilherme G. Piccoli
2016-01-13 10:38 ` Michael Ellerman
2016-01-13 12:08   ` Guilherme G. Piccoli
2016-01-13 21:25     ` Michael Ellerman
2016-01-14 19:59       ` Guilherme G. Piccoli
2016-01-14 23:37         ` Michael Ellerman
2016-01-19 20:11           ` Guilherme G. Piccoli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.