All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
@ 2018-11-19  4:25 Alexey Kardashevskiy
  2018-11-19 23:28 ` Sam Bobroff
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Alexey Kardashevskiy @ 2018-11-19  4:25 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Sam Bobroff,
	Piotr Jaroszynski, Oliver O'Halloran, Reza Arbab

The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
skiboot's NPU driver does not touch the pci_error_type parameter so
it might have garbage but the powernv code analyzes it nevertheless.

This initializes pcierr and fstate to zero in all call sites.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

Without this, this happens:

pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
EEH: PHB#6 failure detected, location: N/A
CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106
Call Trace:
[c000003fea9df9c0] [c000000000a990ec] dump_stack+0xb0/0xf4 (unreliable)
[c000003fea9dfa00] [c000000000038480] eeh_dev_check_failure+0x1f0/0x5f0
[c000003fea9dfaa0] [c0000000000a2768] pnv_pci_read_config+0x128/0x160
[c000003fea9dfae0] [c0000000005d2b0c] pci_bus_read_config_dword+0x9c/0xf0
[c000003fea9dfb40] [c0000000005df3d4] pci_save_state+0x64/0x250
[c000003fea9dfbc0] [c0000000005e0730] pci_dev_save_and_disable+0x70/0xa0
[c000003fea9dfbf0] [c0000000005e4078] pci_try_reset_function+0x48/0xc0
[c000003fea9dfc20] [c00800001cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
[c000003fea9dfcf0] [c00800001ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
[c000003fea9dfd10] [c00000000039fd84] do_vfs_ioctl+0xd4/0xa00
[c000003fea9dfdb0] [c0000000003a07b4] ksys_ioctl+0x104/0x120
[c000003fea9dfe00] [c0000000003a07f8] sys_ioctl+0x28/0x80
[c000003fea9dfe20] [c00000000000b3a4] system_call+0x5c/0x70
EEH: Detected error on PHB#6
EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 fail
ures.
EEH: Notify device drivers to shutdown
EEH: Beginning: 'error_detected(IO frozen)'
EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen)
EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover'
EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen)
EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover'
EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen)
EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover'
EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen)
EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover'
EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'can recover'
EEH: Collect temporary log
pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
EEH: Reset without hotplug activity
iommu: Removing device 0006:00:01.0 from group 4
iommu: Removing device 0006:00:01.1 from group 4
iommu: Removing device 0006:00:02.0 from group 4
iommu: Removing device 0006:00:02.1 from group 4
pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
EEH: Sleep 5s ahead of partial hotplug
pci 0004:04     : [PE# 00] Setting up window#0 0..3fffffff pg=1000
pci 0004:05     : [PE# 18] Setting up window#0 0..3fffffff pg=1000
pci 0004:06     : [PE# 30] Setting up window#0 0..3fffffff pg=1000
pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fffffff pg=1000
pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fffffff pg=1000
pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fffffff pg=1000
EEH: Beginning: 'slot_reset'
EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
EEH: Finished:'slot_reset' with aggregate recovery state:'none'
EEH: Notify device driver to resume
EEH: Beginning: 'resume'
EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
EEH: Finished:'resume'
EEH: Recovery successful.
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 8 ++++----
 arch/powerpc/platforms/powernv/pci-ioda.c    | 4 ++--
 arch/powerpc/platforms/powernv/pci.c         | 4 ++--
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index abc0be7..f380789 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -564,8 +564,8 @@ static void pnv_eeh_get_phb_diag(struct eeh_pe *pe)
 static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
 {
 	struct pnv_phb *phb = pe->phb->private_data;
-	u8 fstate;
-	__be16 pcierr;
+	u8 fstate = 0;
+	__be16 pcierr = 0;
 	s64 rc;
 	int result = 0;
 
@@ -603,8 +603,8 @@ static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
 static int pnv_eeh_get_pe_state(struct eeh_pe *pe)
 {
 	struct pnv_phb *phb = pe->phb->private_data;
-	u8 fstate;
-	__be16 pcierr;
+	u8 fstate = 0;
+	__be16 pcierr = 0;
 	s64 rc;
 	int result;
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index dd80744..72b5cc0 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -604,8 +604,8 @@ static int pnv_ioda_unfreeze_pe(struct pnv_phb *phb, int pe_no, int opt)
 static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
 {
 	struct pnv_ioda_pe *slave, *pe;
-	u8 fstate, state;
-	__be16 pcierr;
+	u8 fstate = 0, state;
+	__be16 pcierr = 0;
 	s64 rc;
 
 	/* Sanity check on PE number */
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 13aef23..db230a35 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -602,8 +602,8 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
 static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
 {
 	struct pnv_phb *phb = pdn->phb->private_data;
-	u8	fstate;
-	__be16	pcierr;
+	u8	fstate = 0;
+	__be16	pcierr = 0;
 	unsigned int pe_no;
 	s64	rc;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
  2018-11-19  4:25 [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status Alexey Kardashevskiy
@ 2018-11-19 23:28 ` Sam Bobroff
  2018-11-20  2:51 ` Michael Ellerman
  2018-12-23 13:28 ` [kernel] " Michael Ellerman
  2 siblings, 0 replies; 8+ messages in thread
From: Sam Bobroff @ 2018-11-19 23:28 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alistair Popple, linuxppc-dev, Piotr Jaroszynski,
	Oliver O'Halloran, Reza Arbab

[-- Attachment #1: Type: text/plain, Size: 7219 bytes --]

On Mon, Nov 19, 2018 at 03:25:17PM +1100, Alexey Kardashevskiy wrote:
> The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
> skiboot's NPU driver does not touch the pci_error_type parameter so
> it might have garbage but the powernv code analyzes it nevertheless.
> 
> This initializes pcierr and fstate to zero in all call sites.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

The skiboot code for npu2_freeze_status() is just:

	*freeze_state = OPAL_EEH_STOPPED_NOT_FROZEN;
	return OPAL_SUCCESS;

It looks like npu_freeze_status() is affected as well. I think we
could consider these skiboot bugs, but this still seems like an
improvement.

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>

> ---
> 
> Without this, this happens:
> 
> pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
> EEH: PHB#6 failure detected, location: N/A
> CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106
> Call Trace:
> [c000003fea9df9c0] [c000000000a990ec] dump_stack+0xb0/0xf4 (unreliable)
> [c000003fea9dfa00] [c000000000038480] eeh_dev_check_failure+0x1f0/0x5f0
> [c000003fea9dfaa0] [c0000000000a2768] pnv_pci_read_config+0x128/0x160
> [c000003fea9dfae0] [c0000000005d2b0c] pci_bus_read_config_dword+0x9c/0xf0
> [c000003fea9dfb40] [c0000000005df3d4] pci_save_state+0x64/0x250
> [c000003fea9dfbc0] [c0000000005e0730] pci_dev_save_and_disable+0x70/0xa0
> [c000003fea9dfbf0] [c0000000005e4078] pci_try_reset_function+0x48/0xc0
> [c000003fea9dfc20] [c00800001cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
> [c000003fea9dfcf0] [c00800001ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
> [c000003fea9dfd10] [c00000000039fd84] do_vfs_ioctl+0xd4/0xa00
> [c000003fea9dfdb0] [c0000000003a07b4] ksys_ioctl+0x104/0x120
> [c000003fea9dfe00] [c0000000003a07f8] sys_ioctl+0x28/0x80
> [c000003fea9dfe20] [c00000000000b3a4] system_call+0x5c/0x70
> EEH: Detected error on PHB#6
> EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 fail
> ures.
> EEH: Notify device drivers to shutdown
> EEH: Beginning: 'error_detected(IO frozen)'
> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover'
> EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover'
> EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover'
> EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover'
> EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'can recover'
> EEH: Collect temporary log
> pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
> EEH: Reset without hotplug activity
> iommu: Removing device 0006:00:01.0 from group 4
> iommu: Removing device 0006:00:01.1 from group 4
> iommu: Removing device 0006:00:02.0 from group 4
> iommu: Removing device 0006:00:02.1 from group 4
> pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> EEH: Sleep 5s ahead of partial hotplug
> pci 0004:04     : [PE# 00] Setting up window#0 0..3fffffff pg=1000
> pci 0004:05     : [PE# 18] Setting up window#0 0..3fffffff pg=1000
> pci 0004:06     : [PE# 30] Setting up window#0 0..3fffffff pg=1000
> pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fffffff pg=1000
> pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fffffff pg=1000
> pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fffffff pg=1000
> EEH: Beginning: 'slot_reset'
> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> EEH: Finished:'slot_reset' with aggregate recovery state:'none'
> EEH: Notify device driver to resume
> EEH: Beginning: 'resume'
> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> EEH: Finished:'resume'
> EEH: Recovery successful.
> ---
>  arch/powerpc/platforms/powernv/eeh-powernv.c | 8 ++++----
>  arch/powerpc/platforms/powernv/pci-ioda.c    | 4 ++--
>  arch/powerpc/platforms/powernv/pci.c         | 4 ++--
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index abc0be7..f380789 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -564,8 +564,8 @@ static void pnv_eeh_get_phb_diag(struct eeh_pe *pe)
>  static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
>  {
>  	struct pnv_phb *phb = pe->phb->private_data;
> -	u8 fstate;
> -	__be16 pcierr;
> +	u8 fstate = 0;
> +	__be16 pcierr = 0;
>  	s64 rc;
>  	int result = 0;
>  
> @@ -603,8 +603,8 @@ static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
>  static int pnv_eeh_get_pe_state(struct eeh_pe *pe)
>  {
>  	struct pnv_phb *phb = pe->phb->private_data;
> -	u8 fstate;
> -	__be16 pcierr;
> +	u8 fstate = 0;
> +	__be16 pcierr = 0;
>  	s64 rc;
>  	int result;
>  
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index dd80744..72b5cc0 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -604,8 +604,8 @@ static int pnv_ioda_unfreeze_pe(struct pnv_phb *phb, int pe_no, int opt)
>  static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
>  {
>  	struct pnv_ioda_pe *slave, *pe;
> -	u8 fstate, state;
> -	__be16 pcierr;
> +	u8 fstate = 0, state;
> +	__be16 pcierr = 0;
>  	s64 rc;
>  
>  	/* Sanity check on PE number */
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index 13aef23..db230a35 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -602,8 +602,8 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
>  static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
>  {
>  	struct pnv_phb *phb = pdn->phb->private_data;
> -	u8	fstate;
> -	__be16	pcierr;
> +	u8	fstate = 0;
> +	__be16	pcierr = 0;
>  	unsigned int pe_no;
>  	s64	rc;
>  
> -- 
> 2.17.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
  2018-11-19  4:25 [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status Alexey Kardashevskiy
  2018-11-19 23:28 ` Sam Bobroff
@ 2018-11-20  2:51 ` Michael Ellerman
  2018-11-20  3:51   ` Sam Bobroff
  2018-12-23 13:28 ` [kernel] " Michael Ellerman
  2 siblings, 1 reply; 8+ messages in thread
From: Michael Ellerman @ 2018-11-20  2:51 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Sam Bobroff,
	Piotr Jaroszynski, Oliver O'Halloran, Reza Arbab

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
> skiboot's NPU driver does not touch the pci_error_type parameter so
> it might have garbage but the powernv code analyzes it nevertheless.
>
> This initializes pcierr and fstate to zero in all call sites.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---

Can we tag this with a Fixes? And seems like it should probably go to
stable, or can we not trigger this path on older kernels?

cheers

> Without this, this happens:
>
> pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
> EEH: PHB#6 failure detected, location: N/A
> CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106
> Call Trace:
> [c000003fea9df9c0] [c000000000a990ec] dump_stack+0xb0/0xf4 (unreliable)
> [c000003fea9dfa00] [c000000000038480] eeh_dev_check_failure+0x1f0/0x5f0
> [c000003fea9dfaa0] [c0000000000a2768] pnv_pci_read_config+0x128/0x160
> [c000003fea9dfae0] [c0000000005d2b0c] pci_bus_read_config_dword+0x9c/0xf0
> [c000003fea9dfb40] [c0000000005df3d4] pci_save_state+0x64/0x250
> [c000003fea9dfbc0] [c0000000005e0730] pci_dev_save_and_disable+0x70/0xa0
> [c000003fea9dfbf0] [c0000000005e4078] pci_try_reset_function+0x48/0xc0
> [c000003fea9dfc20] [c00800001cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
> [c000003fea9dfcf0] [c00800001ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
> [c000003fea9dfd10] [c00000000039fd84] do_vfs_ioctl+0xd4/0xa00
> [c000003fea9dfdb0] [c0000000003a07b4] ksys_ioctl+0x104/0x120
> [c000003fea9dfe00] [c0000000003a07f8] sys_ioctl+0x28/0x80
> [c000003fea9dfe20] [c00000000000b3a4] system_call+0x5c/0x70
> EEH: Detected error on PHB#6
> EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 fail
> ures.
> EEH: Notify device drivers to shutdown
> EEH: Beginning: 'error_detected(IO frozen)'
> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover'
> EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover'
> EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover'
> EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen)
> EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover'
> EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'can recover'
> EEH: Collect temporary log
> pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
> EEH: Reset without hotplug activity
> iommu: Removing device 0006:00:01.0 from group 4
> iommu: Removing device 0006:00:01.1 from group 4
> iommu: Removing device 0006:00:02.0 from group 4
> iommu: Removing device 0006:00:02.1 from group 4
> pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> EEH: Sleep 5s ahead of partial hotplug
> pci 0004:04     : [PE# 00] Setting up window#0 0..3fffffff pg=1000
> pci 0004:05     : [PE# 18] Setting up window#0 0..3fffffff pg=1000
> pci 0004:06     : [PE# 30] Setting up window#0 0..3fffffff pg=1000
> pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fffffff pg=1000
> pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fffffff pg=1000
> pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fffffff pg=1000
> EEH: Beginning: 'slot_reset'
> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> EEH: Finished:'slot_reset' with aggregate recovery state:'none'
> EEH: Notify device driver to resume
> EEH: Beginning: 'resume'
> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> EEH: Finished:'resume'
> EEH: Recovery successful.
> ---
>  arch/powerpc/platforms/powernv/eeh-powernv.c | 8 ++++----
>  arch/powerpc/platforms/powernv/pci-ioda.c    | 4 ++--
>  arch/powerpc/platforms/powernv/pci.c         | 4 ++--
>  3 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index abc0be7..f380789 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -564,8 +564,8 @@ static void pnv_eeh_get_phb_diag(struct eeh_pe *pe)
>  static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
>  {
>  	struct pnv_phb *phb = pe->phb->private_data;
> -	u8 fstate;
> -	__be16 pcierr;
> +	u8 fstate = 0;
> +	__be16 pcierr = 0;
>  	s64 rc;
>  	int result = 0;
>  
> @@ -603,8 +603,8 @@ static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
>  static int pnv_eeh_get_pe_state(struct eeh_pe *pe)
>  {
>  	struct pnv_phb *phb = pe->phb->private_data;
> -	u8 fstate;
> -	__be16 pcierr;
> +	u8 fstate = 0;
> +	__be16 pcierr = 0;
>  	s64 rc;
>  	int result;
>  
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index dd80744..72b5cc0 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -604,8 +604,8 @@ static int pnv_ioda_unfreeze_pe(struct pnv_phb *phb, int pe_no, int opt)
>  static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
>  {
>  	struct pnv_ioda_pe *slave, *pe;
> -	u8 fstate, state;
> -	__be16 pcierr;
> +	u8 fstate = 0, state;
> +	__be16 pcierr = 0;
>  	s64 rc;
>  
>  	/* Sanity check on PE number */
> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> index 13aef23..db230a35 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -602,8 +602,8 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
>  static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
>  {
>  	struct pnv_phb *phb = pdn->phb->private_data;
> -	u8	fstate;
> -	__be16	pcierr;
> +	u8	fstate = 0;
> +	__be16	pcierr = 0;
>  	unsigned int pe_no;
>  	s64	rc;
>  
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
  2018-11-20  2:51 ` Michael Ellerman
@ 2018-11-20  3:51   ` Sam Bobroff
  2018-11-20  6:55     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 8+ messages in thread
From: Sam Bobroff @ 2018-11-20  3:51 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Alexey Kardashevskiy, Alistair Popple, linuxppc-dev,
	Piotr Jaroszynski, Oliver O'Halloran, Reza Arbab

[-- Attachment #1: Type: text/plain, Size: 7692 bytes --]

On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
> > The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
> > skiboot's NPU driver does not touch the pci_error_type parameter so
> > it might have garbage but the powernv code analyzes it nevertheless.
> >
> > This initializes pcierr and fstate to zero in all call sites.
> >
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> 
> Can we tag this with a Fixes? And seems like it should probably go to
> stable, or can we not trigger this path on older kernels?
> 
> cheers

Hmm, it's triggered by use on an NPU PE so that would be any kernel that
can run on P8 or later (AFAIK).

It looks like the issue was present earlier, but the code was last
touched when it was moved, in...

40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()")

... which was back in v4.1.

Sam.

> > Without this, this happens:
> >
> > pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
> > EEH: PHB#6 failure detected, location: N/A
> > CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106
> > Call Trace:
> > [c000003fea9df9c0] [c000000000a990ec] dump_stack+0xb0/0xf4 (unreliable)
> > [c000003fea9dfa00] [c000000000038480] eeh_dev_check_failure+0x1f0/0x5f0
> > [c000003fea9dfaa0] [c0000000000a2768] pnv_pci_read_config+0x128/0x160
> > [c000003fea9dfae0] [c0000000005d2b0c] pci_bus_read_config_dword+0x9c/0xf0
> > [c000003fea9dfb40] [c0000000005df3d4] pci_save_state+0x64/0x250
> > [c000003fea9dfbc0] [c0000000005e0730] pci_dev_save_and_disable+0x70/0xa0
> > [c000003fea9dfbf0] [c0000000005e4078] pci_try_reset_function+0x48/0xc0
> > [c000003fea9dfc20] [c00800001cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
> > [c000003fea9dfcf0] [c00800001ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
> > [c000003fea9dfd10] [c00000000039fd84] do_vfs_ioctl+0xd4/0xa00
> > [c000003fea9dfdb0] [c0000000003a07b4] ksys_ioctl+0x104/0x120
> > [c000003fea9dfe00] [c0000000003a07f8] sys_ioctl+0x28/0x80
> > [c000003fea9dfe20] [c00000000000b3a4] system_call+0x5c/0x70
> > EEH: Detected error on PHB#6
> > EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 fail
> > ures.
> > EEH: Notify device drivers to shutdown
> > EEH: Beginning: 'error_detected(IO frozen)'
> > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover'
> > EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover'
> > EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover'
> > EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover'
> > EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'can recover'
> > EEH: Collect temporary log
> > pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
> > EEH: Reset without hotplug activity
> > iommu: Removing device 0006:00:01.0 from group 4
> > iommu: Removing device 0006:00:01.1 from group 4
> > iommu: Removing device 0006:00:02.0 from group 4
> > iommu: Removing device 0006:00:02.1 from group 4
> > pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> > EEH: Sleep 5s ahead of partial hotplug
> > pci 0004:04     : [PE# 00] Setting up window#0 0..3fffffff pg=1000
> > pci 0004:05     : [PE# 18] Setting up window#0 0..3fffffff pg=1000
> > pci 0004:06     : [PE# 30] Setting up window#0 0..3fffffff pg=1000
> > pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fffffff pg=1000
> > pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fffffff pg=1000
> > pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fffffff pg=1000
> > EEH: Beginning: 'slot_reset'
> > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > EEH: Finished:'slot_reset' with aggregate recovery state:'none'
> > EEH: Notify device driver to resume
> > EEH: Beginning: 'resume'
> > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > EEH: Finished:'resume'
> > EEH: Recovery successful.
> > ---
> >  arch/powerpc/platforms/powernv/eeh-powernv.c | 8 ++++----
> >  arch/powerpc/platforms/powernv/pci-ioda.c    | 4 ++--
> >  arch/powerpc/platforms/powernv/pci.c         | 4 ++--
> >  3 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> > index abc0be7..f380789 100644
> > --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> > +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> > @@ -564,8 +564,8 @@ static void pnv_eeh_get_phb_diag(struct eeh_pe *pe)
> >  static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
> >  {
> >  	struct pnv_phb *phb = pe->phb->private_data;
> > -	u8 fstate;
> > -	__be16 pcierr;
> > +	u8 fstate = 0;
> > +	__be16 pcierr = 0;
> >  	s64 rc;
> >  	int result = 0;
> >  
> > @@ -603,8 +603,8 @@ static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
> >  static int pnv_eeh_get_pe_state(struct eeh_pe *pe)
> >  {
> >  	struct pnv_phb *phb = pe->phb->private_data;
> > -	u8 fstate;
> > -	__be16 pcierr;
> > +	u8 fstate = 0;
> > +	__be16 pcierr = 0;
> >  	s64 rc;
> >  	int result;
> >  
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index dd80744..72b5cc0 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -604,8 +604,8 @@ static int pnv_ioda_unfreeze_pe(struct pnv_phb *phb, int pe_no, int opt)
> >  static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
> >  {
> >  	struct pnv_ioda_pe *slave, *pe;
> > -	u8 fstate, state;
> > -	__be16 pcierr;
> > +	u8 fstate = 0, state;
> > +	__be16 pcierr = 0;
> >  	s64 rc;
> >  
> >  	/* Sanity check on PE number */
> > diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> > index 13aef23..db230a35 100644
> > --- a/arch/powerpc/platforms/powernv/pci.c
> > +++ b/arch/powerpc/platforms/powernv/pci.c
> > @@ -602,8 +602,8 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
> >  static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
> >  {
> >  	struct pnv_phb *phb = pdn->phb->private_data;
> > -	u8	fstate;
> > -	__be16	pcierr;
> > +	u8	fstate = 0;
> > +	__be16	pcierr = 0;
> >  	unsigned int pe_no;
> >  	s64	rc;
> >  
> > -- 
> > 2.17.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
  2018-11-20  3:51   ` Sam Bobroff
@ 2018-11-20  6:55     ` Alexey Kardashevskiy
  2018-11-20  6:59       ` Russell Currey
  0 siblings, 1 reply; 8+ messages in thread
From: Alexey Kardashevskiy @ 2018-11-20  6:55 UTC (permalink / raw)
  To: Sam Bobroff, Michael Ellerman
  Cc: Alistair Popple, linuxppc-dev, Piotr Jaroszynski,
	Oliver O'Halloran, Reza Arbab



On 20/11/2018 14:51, Sam Bobroff wrote:
> On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>
>>> The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
>>> skiboot's NPU driver does not touch the pci_error_type parameter so
>>> it might have garbage but the powernv code analyzes it nevertheless.
>>>
>>> This initializes pcierr and fstate to zero in all call sites.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>
>> Can we tag this with a Fixes? And seems like it should probably go to
>> stable, or can we not trigger this path on older kernels?
>>
>> cheers
> 
> Hmm, it's triggered by use on an NPU PE so that would be any kernel that
> can run on P8 or later (AFAIK).
> 
> It looks like the issue was present earlier, but the code was last
> touched when it was moved, in...
> 
> 40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()")
> 
> ... which was back in v4.1.


The original commit is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c41a7f3f7593fe57

2013-06-20 13:21:09 +0800
===
powerpc/eeh: I/O chip EEH state retrieval

The patch adds I/O chip backend to retrieve the state for the
indicated PE. While the PE state is temperarily unavailable,
the upper layer (powernv platform) should return default delay
(1 second).
===

This was 3.10-rc5 (this is what that sha1's Makefile has).

But this was not the issue till skiboot decided not to zero these
parameters so "Fixes:" should point to what?


> 
> Sam.
> 
>>> Without this, this happens:
>>>
>>> pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
>>> EEH: PHB#6 failure detected, location: N/A
>>> CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106
>>> Call Trace:
>>> [c000003fea9df9c0] [c000000000a990ec] dump_stack+0xb0/0xf4 (unreliable)
>>> [c000003fea9dfa00] [c000000000038480] eeh_dev_check_failure+0x1f0/0x5f0
>>> [c000003fea9dfaa0] [c0000000000a2768] pnv_pci_read_config+0x128/0x160
>>> [c000003fea9dfae0] [c0000000005d2b0c] pci_bus_read_config_dword+0x9c/0xf0
>>> [c000003fea9dfb40] [c0000000005df3d4] pci_save_state+0x64/0x250
>>> [c000003fea9dfbc0] [c0000000005e0730] pci_dev_save_and_disable+0x70/0xa0
>>> [c000003fea9dfbf0] [c0000000005e4078] pci_try_reset_function+0x48/0xc0
>>> [c000003fea9dfc20] [c00800001cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
>>> [c000003fea9dfcf0] [c00800001ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
>>> [c000003fea9dfd10] [c00000000039fd84] do_vfs_ioctl+0xd4/0xa00
>>> [c000003fea9dfdb0] [c0000000003a07b4] ksys_ioctl+0x104/0x120
>>> [c000003fea9dfe00] [c0000000003a07f8] sys_ioctl+0x28/0x80
>>> [c000003fea9dfe20] [c00000000000b3a4] system_call+0x5c/0x70
>>> EEH: Detected error on PHB#6
>>> EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 fail
>>> ures.
>>> EEH: Notify device drivers to shutdown
>>> EEH: Beginning: 'error_detected(IO frozen)'
>>> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
>>> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
>>> EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen)
>>> EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover'
>>> EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen)
>>> EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover'
>>> EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen)
>>> EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover'
>>> EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen)
>>> EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover'
>>> EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'can recover'
>>> EEH: Collect temporary log
>>> pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
>>> EEH: Reset without hotplug activity
>>> iommu: Removing device 0006:00:01.0 from group 4
>>> iommu: Removing device 0006:00:01.1 from group 4
>>> iommu: Removing device 0006:00:02.0 from group 4
>>> iommu: Removing device 0006:00:02.1 from group 4
>>> pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
>>> EEH: Sleep 5s ahead of partial hotplug
>>> pci 0004:04     : [PE# 00] Setting up window#0 0..3fffffff pg=1000
>>> pci 0004:05     : [PE# 18] Setting up window#0 0..3fffffff pg=1000
>>> pci 0004:06     : [PE# 30] Setting up window#0 0..3fffffff pg=1000
>>> pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fffffff pg=1000
>>> pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fffffff pg=1000
>>> pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fffffff pg=1000
>>> EEH: Beginning: 'slot_reset'
>>> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
>>> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
>>> EEH: Finished:'slot_reset' with aggregate recovery state:'none'
>>> EEH: Notify device driver to resume
>>> EEH: Beginning: 'resume'
>>> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
>>> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
>>> EEH: Finished:'resume'
>>> EEH: Recovery successful.
>>> ---
>>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 8 ++++----
>>>  arch/powerpc/platforms/powernv/pci-ioda.c    | 4 ++--
>>>  arch/powerpc/platforms/powernv/pci.c         | 4 ++--
>>>  3 files changed, 8 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> index abc0be7..f380789 100644
>>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> @@ -564,8 +564,8 @@ static void pnv_eeh_get_phb_diag(struct eeh_pe *pe)
>>>  static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
>>>  {
>>>  	struct pnv_phb *phb = pe->phb->private_data;
>>> -	u8 fstate;
>>> -	__be16 pcierr;
>>> +	u8 fstate = 0;
>>> +	__be16 pcierr = 0;
>>>  	s64 rc;
>>>  	int result = 0;
>>>  
>>> @@ -603,8 +603,8 @@ static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
>>>  static int pnv_eeh_get_pe_state(struct eeh_pe *pe)
>>>  {
>>>  	struct pnv_phb *phb = pe->phb->private_data;
>>> -	u8 fstate;
>>> -	__be16 pcierr;
>>> +	u8 fstate = 0;
>>> +	__be16 pcierr = 0;
>>>  	s64 rc;
>>>  	int result;
>>>  
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index dd80744..72b5cc0 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -604,8 +604,8 @@ static int pnv_ioda_unfreeze_pe(struct pnv_phb *phb, int pe_no, int opt)
>>>  static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
>>>  {
>>>  	struct pnv_ioda_pe *slave, *pe;
>>> -	u8 fstate, state;
>>> -	__be16 pcierr;
>>> +	u8 fstate = 0, state;
>>> +	__be16 pcierr = 0;
>>>  	s64 rc;
>>>  
>>>  	/* Sanity check on PE number */
>>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>>> index 13aef23..db230a35 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.c
>>> +++ b/arch/powerpc/platforms/powernv/pci.c
>>> @@ -602,8 +602,8 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
>>>  static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
>>>  {
>>>  	struct pnv_phb *phb = pdn->phb->private_data;
>>> -	u8	fstate;
>>> -	__be16	pcierr;
>>> +	u8	fstate = 0;
>>> +	__be16	pcierr = 0;
>>>  	unsigned int pe_no;
>>>  	s64	rc;
>>>  
>>> -- 
>>> 2.17.1
>>

-- 
Alexey

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
  2018-11-20  6:55     ` Alexey Kardashevskiy
@ 2018-11-20  6:59       ` Russell Currey
  2018-11-20 10:58         ` Michael Ellerman
  0 siblings, 1 reply; 8+ messages in thread
From: Russell Currey @ 2018-11-20  6:59 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Sam Bobroff, Michael Ellerman
  Cc: Alistair Popple, linuxppc-dev, Piotr Jaroszynski,
	Oliver O'Halloran, Reza Arbab

On Tue, 2018-11-20 at 17:55 +1100, Alexey Kardashevskiy wrote:
> 
> On 20/11/2018 14:51, Sam Bobroff wrote:
> > On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote:
> > > Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> > > 
> > > > The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS
> > > > call in
> > > > skiboot's NPU driver does not touch the pci_error_type
> > > > parameter so
> > > > it might have garbage but the powernv code analyzes it
> > > > nevertheless.
> > > > 
> > > > This initializes pcierr and fstate to zero in all call sites.
> > > > 
> > > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > > > ---
> > > 
> > > Can we tag this with a Fixes? And seems like it should probably
> > > go to
> > > stable, or can we not trigger this path on older kernels?
> > > 
> > > cheers
> > 
> > Hmm, it's triggered by use on an NPU PE so that would be any kernel
> > that
> > can run on P8 or later (AFAIK).
> > 
> > It looks like the issue was present earlier, but the code was last
> > touched when it was moved, in...
> > 
> > 40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()")
> > 
> > ... which was back in v4.1.
> 
> The original commit is
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c41a7f3f7593fe57
> 
> 2013-06-20 13:21:09 +0800
> ===
> powerpc/eeh: I/O chip EEH state retrieval
> 
> The patch adds I/O chip backend to retrieve the state for the
> indicated PE. While the PE state is temperarily unavailable,
> the upper layer (powernv platform) should return default delay
> (1 second).
> ===
> 
> This was 3.10-rc5 (this is what that sha1's Makefile has).
> 
> But this was not the issue till skiboot decided not to zero these
> parameters so "Fixes:" should point to what?

It should still be that commit, it's perfectly reasonable (however
unlikely) that someone could be running a 3.10 kernel with a modern
skiboot.

> 
> 
> > Sam.
> > 
> > > > Without this, this happens:
> > > > 
> > > > pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
> > > > EEH: PHB#6 failure detected, location: N/A
> > > > CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 4.19.0-
> > > > le_f5a7bb7_aikATfstn1-p1 torvalds#106
> > > > Call Trace:
> > > > [c000003fea9df9c0] [c000000000a990ec] dump_stack+0xb0/0xf4
> > > > (unreliable)
> > > > [c000003fea9dfa00] [c000000000038480]
> > > > eeh_dev_check_failure+0x1f0/0x5f0
> > > > [c000003fea9dfaa0] [c0000000000a2768]
> > > > pnv_pci_read_config+0x128/0x160
> > > > [c000003fea9dfae0] [c0000000005d2b0c]
> > > > pci_bus_read_config_dword+0x9c/0xf0
> > > > [c000003fea9dfb40] [c0000000005df3d4] pci_save_state+0x64/0x250
> > > > [c000003fea9dfbc0] [c0000000005e0730]
> > > > pci_dev_save_and_disable+0x70/0xa0
> > > > [c000003fea9dfbf0] [c0000000005e4078]
> > > > pci_try_reset_function+0x48/0xc0
> > > > [c000003fea9dfc20] [c00800001cbc1b1c]
> > > > vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
> > > > [c000003fea9dfcf0] [c00800001ca9046c]
> > > > vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
> > > > [c000003fea9dfd10] [c00000000039fd84] do_vfs_ioctl+0xd4/0xa00
> > > > [c000003fea9dfdb0] [c0000000003a07b4] ksys_ioctl+0x104/0x120
> > > > [c000003fea9dfe00] [c0000000003a07f8] sys_ioctl+0x28/0x80
> > > > [c000003fea9dfe20] [c00000000000b3a4] system_call+0x5c/0x70
> > > > EEH: Detected error on PHB#6
> > > > EEH: This PCI device has failed 1 times in the last hour and
> > > > will be permanently disabled after 5 fail
> > > > ures.
> > > > EEH: Notify device drivers to shutdown
> > > > EEH: Beginning: 'error_detected(IO frozen)'
> > > > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > > > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > > > EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci-
> > > > >error_detected(IO frozen)
> > > > EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can
> > > > recover'
> > > > EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci-
> > > > >error_detected(IO frozen)
> > > > EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can
> > > > recover'
> > > > EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci-
> > > > >error_detected(IO frozen)
> > > > EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can
> > > > recover'
> > > > EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci-
> > > > >error_detected(IO frozen)
> > > > EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can
> > > > recover'
> > > > EEH: Finished:'error_detected(IO frozen)' with aggregate
> > > > recovery state:'can recover'
> > > > EEH: Collect temporary log
> > > > pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
> > > > EEH: Reset without hotplug activity
> > > > iommu: Removing device 0006:00:01.0 from group 4
> > > > iommu: Removing device 0006:00:01.1 from group 4
> > > > iommu: Removing device 0006:00:02.0 from group 4
> > > > iommu: Removing device 0006:00:02.1 from group 4
> > > > pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> > > > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> > > > EEH: Sleep 5s ahead of partial hotplug
> > > > pci 0004:04     : [PE# 00] Setting up window#0 0..3fffffff
> > > > pg=1000
> > > > pci 0004:05     : [PE# 18] Setting up window#0 0..3fffffff
> > > > pg=1000
> > > > pci 0004:06     : [PE# 30] Setting up window#0 0..3fffffff
> > > > pg=1000
> > > > pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fffffff
> > > > pg=1000
> > > > pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fffffff
> > > > pg=1000
> > > > pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fffffff
> > > > pg=1000
> > > > EEH: Beginning: 'slot_reset'
> > > > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > > > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > > > EEH: Finished:'slot_reset' with aggregate recovery state:'none'
> > > > EEH: Notify device driver to resume
> > > > EEH: Beginning: 'resume'
> > > > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > > > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > > > EEH: Finished:'resume'
> > > > EEH: Recovery successful.
> > > > ---
> > > >  arch/powerpc/platforms/powernv/eeh-powernv.c | 8 ++++----
> > > >  arch/powerpc/platforms/powernv/pci-ioda.c    | 4 ++--
> > > >  arch/powerpc/platforms/powernv/pci.c         | 4 ++--
> > > >  3 files changed, 8 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c
> > > > b/arch/powerpc/platforms/powernv/eeh-powernv.c
> > > > index abc0be7..f380789 100644
> > > > --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> > > > +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> > > > @@ -564,8 +564,8 @@ static void pnv_eeh_get_phb_diag(struct
> > > > eeh_pe *pe)
> > > >  static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
> > > >  {
> > > >  	struct pnv_phb *phb = pe->phb->private_data;
> > > > -	u8 fstate;
> > > > -	__be16 pcierr;
> > > > +	u8 fstate = 0;
> > > > +	__be16 pcierr = 0;
> > > >  	s64 rc;
> > > >  	int result = 0;
> > > >  
> > > > @@ -603,8 +603,8 @@ static int pnv_eeh_get_phb_state(struct
> > > > eeh_pe *pe)
> > > >  static int pnv_eeh_get_pe_state(struct eeh_pe *pe)
> > > >  {
> > > >  	struct pnv_phb *phb = pe->phb->private_data;
> > > > -	u8 fstate;
> > > > -	__be16 pcierr;
> > > > +	u8 fstate = 0;
> > > > +	__be16 pcierr = 0;
> > > >  	s64 rc;
> > > >  	int result;
> > > >  
> > > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > index dd80744..72b5cc0 100644
> > > > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > @@ -604,8 +604,8 @@ static int pnv_ioda_unfreeze_pe(struct
> > > > pnv_phb *phb, int pe_no, int opt)
> > > >  static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int
> > > > pe_no)
> > > >  {
> > > >  	struct pnv_ioda_pe *slave, *pe;
> > > > -	u8 fstate, state;
> > > > -	__be16 pcierr;
> > > > +	u8 fstate = 0, state;
> > > > +	__be16 pcierr = 0;
> > > >  	s64 rc;
> > > >  
> > > >  	/* Sanity check on PE number */
> > > > diff --git a/arch/powerpc/platforms/powernv/pci.c
> > > > b/arch/powerpc/platforms/powernv/pci.c
> > > > index 13aef23..db230a35 100644
> > > > --- a/arch/powerpc/platforms/powernv/pci.c
> > > > +++ b/arch/powerpc/platforms/powernv/pci.c
> > > > @@ -602,8 +602,8 @@ static void
> > > > pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
> > > >  static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
> > > >  {
> > > >  	struct pnv_phb *phb = pdn->phb->private_data;
> > > > -	u8	fstate;
> > > > -	__be16	pcierr;
> > > > +	u8	fstate = 0;
> > > > +	__be16	pcierr = 0;
> > > >  	unsigned int pe_no;
> > > >  	s64	rc;
> > > >  
> > > > -- 
> > > > 2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
  2018-11-20  6:59       ` Russell Currey
@ 2018-11-20 10:58         ` Michael Ellerman
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2018-11-20 10:58 UTC (permalink / raw)
  To: Russell Currey, Alexey Kardashevskiy, Sam Bobroff
  Cc: Alistair Popple, linuxppc-dev, Piotr Jaroszynski,
	Oliver O'Halloran, Reza Arbab

Russell Currey <ruscur@russell.cc> writes:
> On Tue, 2018-11-20 at 17:55 +1100, Alexey Kardashevskiy wrote:
>> 
>> On 20/11/2018 14:51, Sam Bobroff wrote:
>> > On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote:
>> > > Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> > > 
>> > > > The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS
>> > > > call in
>> > > > skiboot's NPU driver does not touch the pci_error_type
>> > > > parameter so
>> > > > it might have garbage but the powernv code analyzes it
>> > > > nevertheless.
>> > > > 
>> > > > This initializes pcierr and fstate to zero in all call sites.
>> > > > 
>> > > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> > > > ---
>> > > 
>> > > Can we tag this with a Fixes? And seems like it should probably
>> > > go to
>> > > stable, or can we not trigger this path on older kernels?
>> > > 
>> > > cheers
>> > 
>> > Hmm, it's triggered by use on an NPU PE so that would be any kernel
>> > that
>> > can run on P8 or later (AFAIK).
>> > 
>> > It looks like the issue was present earlier, but the code was last
>> > touched when it was moved, in...
>> > 
>> > 40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()")
>> > 
>> > ... which was back in v4.1.
>> 
>> The original commit is
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c41a7f3f7593fe57
>> 
>> 2013-06-20 13:21:09 +0800
>> ===
>> powerpc/eeh: I/O chip EEH state retrieval
>> 
>> The patch adds I/O chip backend to retrieve the state for the
>> indicated PE. While the PE state is temperarily unavailable,
>> the upper layer (powernv platform) should return default delay
>> (1 second).
>> ===
>> 
>> This was 3.10-rc5 (this is what that sha1's Makefile has).
>> 
>> But this was not the issue till skiboot decided not to zero these
>> parameters so "Fixes:" should point to what?
>
> It should still be that commit, it's perfectly reasonable (however
> unlikely) that someone could be running a 3.10 kernel with a modern
> skiboot.

Yeah that kernel commit is the earliest point where we could hit the
bug, so that's where Fixes should point.

If the bug's not actually in that commit, as it sounds here, then you
can explain that in the change log.

The other way to think about is if someone has back ported
8c41a7f3f7593fe57 then they would also want this fix, so pointing the
Fixes tag at 8c41a7f3f7593fe57 expresses that.

cheers

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
  2018-11-19  4:25 [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status Alexey Kardashevskiy
  2018-11-19 23:28 ` Sam Bobroff
  2018-11-20  2:51 ` Michael Ellerman
@ 2018-12-23 13:28 ` Michael Ellerman
  2 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2018-12-23 13:28 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Sam Bobroff,
	Piotr Jaroszynski, Oliver O'Halloran, Reza Arbab

On Mon, 2018-11-19 at 04:25:17 UTC, Alexey Kardashevskiy wrote:
> The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
> skiboot's NPU driver does not touch the pci_error_type parameter so
> it might have garbage but the powernv code analyzes it nevertheless.
> 
> This initializes pcierr and fstate to zero in all call sites.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c20577014f85f36d4e137d3d52a1f6

cheers

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-12-23 14:29 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-19  4:25 [PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status Alexey Kardashevskiy
2018-11-19 23:28 ` Sam Bobroff
2018-11-20  2:51 ` Michael Ellerman
2018-11-20  3:51   ` Sam Bobroff
2018-11-20  6:55     ` Alexey Kardashevskiy
2018-11-20  6:59       ` Russell Currey
2018-11-20 10:58         ` Michael Ellerman
2018-12-23 13:28 ` [kernel] " Michael Ellerman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.