linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
@ 2019-01-09 15:13 Frederic Barrat
  2019-01-09 16:25 ` Greg Kurz
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Frederic Barrat @ 2019-01-09 15:13 UTC (permalink / raw)
  To: linuxppc-dev, aik, andrew.donnellan

Since a recent change to IOMMU group setup, a system with an opencapi
adapter no longer boots and hits a kernel oops:

BUG: Kernel NULL pointer dereference at 0x00000028
Faulting instruction address: 0xc0000000000aa38c
Oops: Kernel access of bad area, sig: 7 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
NIP:  c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
REGS: c000000005783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-00001-g3bd6
MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000228  XER: 20
CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
Call Trace:
[c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
[c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
[c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
[c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
[c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
[c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
[c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
[c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68

An opencapi device uses a device PE, so the current code breaks
because pe->pbus is not set.

More generally, there's no need to define an IOMMU group for opencapi,
as the device sends real addresses directly (admittedly, the
virtualization story is yet to be written). So let's fix it by
skipping the IOMMU group setup for opencapi PHBs.

Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1d6406a051f1..7db3119f8a5b 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void)
 	list_for_each_entry(hose, &hose_list, list_node) {
 		phb = hose->private_data;
 
-		if (phb->type == PNV_PHB_NPU_NVLINK)
+		if (phb->type == PNV_PHB_NPU_NVLINK ||
+		    phb->type == PNV_PHB_NPU_OCAPI)
 			continue;
 
 		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-- 
2.19.1
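
For readers without the kernel tree at hand, the effect of the hunk above can be sketched in user space. This is only a mock-up under stated assumptions, not the kernel code: the enum names mirror the PHB types referenced in the patch, while `struct mock_pe`, `struct mock_phb`, `phb_needs_iommu_group()` and `count_group_setups()` are hypothetical names invented for the illustration. The notion that a device PE has no `pbus` comes from the commit message.

```c
#include <stddef.h>

/* Names mirror the PHB types referenced in the patch; the values are arbitrary here. */
enum pnv_phb_type {
	PNV_PHB_IODA1,
	PNV_PHB_IODA2,
	PNV_PHB_NPU_NVLINK,
	PNV_PHB_NPU_OCAPI,
};

/* Hypothetical stand-ins for the kernel structures. */
struct mock_pe {
	const void *pbus;	/* NULL for a device PE, per the commit message */
};

struct mock_phb {
	enum pnv_phb_type type;
	struct mock_pe pe;
};

/* Mirrors the patched condition: both NPU PHB types are skipped. */
static int phb_needs_iommu_group(const struct mock_phb *phb)
{
	return phb->type != PNV_PHB_NPU_NVLINK &&
	       phb->type != PNV_PHB_NPU_OCAPI;
}

/* Counts the PHBs that would go through IOMMU group setup. */
static size_t count_group_setups(const struct mock_phb *phbs, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (!phb_needs_iommu_group(&phbs[i]))
			continue;
		/*
		 * The real code goes on to use the PE's bus at this point;
		 * the mock only checks that one is present.
		 */
		if (phbs[i].pe.pbus)
			count++;
	}
	return count;
}
```

With one IODA2 PHB, one NVLink PHB and one opencapi PHB whose PE has a NULL `pbus`, only the IODA2 entry is counted, and the opencapi entry is never inspected past the type check.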



* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
  2019-01-09 15:13 [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group() Frederic Barrat
@ 2019-01-09 16:25 ` Greg Kurz
  2019-01-09 16:45   ` Frederic Barrat
  2019-01-10  0:40 ` Andrew Donnellan
  2019-01-14 10:12 ` Michael Ellerman
  2 siblings, 1 reply; 11+ messages in thread
From: Greg Kurz @ 2019-01-09 16:25 UTC (permalink / raw)
  To: Frederic Barrat; +Cc: aik, linuxppc-dev, stable, andrew.donnellan

On Wed,  9 Jan 2019 16:13:42 +0100
Frederic Barrat <fbarrat@linux.ibm.com> wrote:

> With a recent change around IOMMU group, a system with an opencapi
> adapter is no longer booting and we get a kernel oops:
> 
> BUG: Kernel NULL pointer dereference at 0x00000028
> Faulting instruction address: 0xc0000000000aa38c
> Oops: Kernel access of bad area, sig: 7 [#1]
> LE SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in:
> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
> NIP:  c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
> REGS: c000000005783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-00001-g3bd6
> MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000228  XER: 20
> CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
> GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
> GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
> GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
> GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
> GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
> GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
> GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
> GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
> NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> Call Trace:
> [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
> [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
> [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
> [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
> [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
> [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
> 
> An opencapi device is using a device PE, so the current code breaks
> because pe->pbus is not defined.
> 
> More generally, there's no need to define an IOMMU group for opencapi,
> as the device sends real addresses directly (admittedly, the
> virtualization story is yet to be written). So let's fix it by

Current plan is to go for mediated VFIO. The real HW stays under the control
of the host ocxl driver, and we still don't need an IOMMU group.

> skipping the IOMMU group setup for opencapi PHBs.
> 
> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>

and

Cc: stable@vger.kernel.org      # v4.20

>  arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 1d6406a051f1..7db3119f8a5b 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void)
>  	list_for_each_entry(hose, &hose_list, list_node) {
>  		phb = hose->private_data;
>  
> -		if (phb->type == PNV_PHB_NPU_NVLINK)
> +		if (phb->type == PNV_PHB_NPU_NVLINK ||
> +		    phb->type == PNV_PHB_NPU_OCAPI)
>  			continue;
>  
>  		list_for_each_entry(pe, &phb->ioda.pe_list, list) {



* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
  2019-01-09 16:25 ` Greg Kurz
@ 2019-01-09 16:45   ` Frederic Barrat
  2019-01-09 16:56     ` Greg Kurz
  2019-01-09 16:58     ` Greg KH
  0 siblings, 2 replies; 11+ messages in thread
From: Frederic Barrat @ 2019-01-09 16:45 UTC (permalink / raw)
  To: Greg Kurz; +Cc: aik, linuxppc-dev, stable, andrew.donnellan



On 09/01/2019 at 17:25, Greg Kurz wrote:
> On Wed,  9 Jan 2019 16:13:42 +0100
> Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> 
>> With a recent change around IOMMU group, a system with an opencapi
>> adapter is no longer booting and we get a kernel oops:
>>
>> BUG: Kernel NULL pointer dereference at 0x00000028
>> Faulting instruction address: 0xc0000000000aa38c
>> Oops: Kernel access of bad area, sig: 7 [#1]
>> LE SMP NR_CPUS=2048 NUMA PowerNV
>> Modules linked in:
>> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
>> NIP:  c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
>> REGS: c000000005783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-00001-g3bd6
>> MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000228  XER: 20
>> CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
>> GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
>> GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
>> GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
>> GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
>> GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
>> GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
>> GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
>> GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
>> NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
>> LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
>> Call Trace:
>> [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
>> [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
>> [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
>> [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
>> [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
>> [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
>> [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
>> [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
>>
>> An opencapi device is using a device PE, so the current code breaks
>> because pe->pbus is not defined.
>>
>> More generally, there's no need to define an IOMMU group for opencapi,
>> as the device sends real addresses directly (admittedly, the
>> virtualization story is yet to be written). So let's fix it by
> 
> Current plan is to go for mediated VFIO. The real HW stays under the control
> of the host ocxl driver, and we still don't need an IOMMU group.
> 
>> skipping the IOMMU group setup for opencapi PHBs.
>>
>> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
>> ---
> 
> Reviewed-by: Greg Kurz <groug@kaod.org>
> 
> and
> 
> Cc: stable@vger.kernel.org      # v4.20

Thanks for the review! But why did you add stable? That problem is only
seen on 5.0-rc1, isn't it?

   Fred


>>   arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 1d6406a051f1..7db3119f8a5b 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void)
>>   	list_for_each_entry(hose, &hose_list, list_node) {
>>   		phb = hose->private_data;
>>   
>> -		if (phb->type == PNV_PHB_NPU_NVLINK)
>> +		if (phb->type == PNV_PHB_NPU_NVLINK ||
>> +		    phb->type == PNV_PHB_NPU_OCAPI)
>>   			continue;
>>   
>>   		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> 



* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
  2019-01-09 16:45   ` Frederic Barrat
@ 2019-01-09 16:56     ` Greg Kurz
  2019-01-10 12:25       ` Michael Ellerman
  2019-01-09 16:58     ` Greg KH
  1 sibling, 1 reply; 11+ messages in thread
From: Greg Kurz @ 2019-01-09 16:56 UTC (permalink / raw)
  To: Frederic Barrat; +Cc: aik, linuxppc-dev, stable, andrew.donnellan

On Wed, 9 Jan 2019 17:45:53 +0100
Frederic Barrat <fbarrat@linux.ibm.com> wrote:

> On 09/01/2019 at 17:25, Greg Kurz wrote:
> > On Wed,  9 Jan 2019 16:13:42 +0100
> > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> >   
> >> With a recent change around IOMMU group, a system with an opencapi
> >> adapter is no longer booting and we get a kernel oops:
> >>
> >> BUG: Kernel NULL pointer dereference at 0x00000028
> >> Faulting instruction address: 0xc0000000000aa38c
> >> Oops: Kernel access of bad area, sig: 7 [#1]
> >> LE SMP NR_CPUS=2048 NUMA PowerNV
> >> Modules linked in:
> >> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
> >> NIP:  c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
> >> REGS: c000000005783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-00001-g3bd6
> >> MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000228  XER: 20
> >> CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
> >> GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
> >> GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
> >> GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
> >> GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
> >> GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
> >> GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
> >> GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
> >> GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
> >> NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> >> LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> >> Call Trace:
> >> [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> >> [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
> >> [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> >> [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
> >> [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
> >> [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
> >> [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
> >> [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
> >>
> >> An opencapi device is using a device PE, so the current code breaks
> >> because pe->pbus is not defined.
> >>
> >> More generally, there's no need to define an IOMMU group for opencapi,
> >> as the device sends real addresses directly (admittedly, the
> >> virtualization story is yet to be written). So let's fix it by  
> > 
> > Current plan is to go for mediated VFIO. The real HW stays under the control
> > of the host ocxl driver, and we still don't need an IOMMU group.
> >   
> >> skipping the IOMMU group setup for opencapi PHBs.
> >>
> >> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> >> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> >> ---  
> > 
> > Reviewed-by: Greg Kurz <groug@kaod.org>
> > 
> > and
> > 
> > Cc: stable@vger.kernel.org      # v4.20  
> 
> Thanks for the review! But why did you add stable? that problem is only 
> seen on 5.0-rc1, isn't it?
> 

Based on the fact that 0bd971676e68 was committed in 4.20... but I haven't
tested :)

>    Fred
> 
> 
> >>   arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >> index 1d6406a051f1..7db3119f8a5b 100644
> >> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >> @@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void)
> >>   	list_for_each_entry(hose, &hose_list, list_node) {
> >>   		phb = hose->private_data;
> >>   
> >> -		if (phb->type == PNV_PHB_NPU_NVLINK)
> >> +		if (phb->type == PNV_PHB_NPU_NVLINK ||
> >> +		    phb->type == PNV_PHB_NPU_OCAPI)
> >>   			continue;
> >>   
> >>   		list_for_each_entry(pe, &phb->ioda.pe_list, list) {  
> >   
> 



* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
  2019-01-09 16:45   ` Frederic Barrat
  2019-01-09 16:56     ` Greg Kurz
@ 2019-01-09 16:58     ` Greg KH
  1 sibling, 0 replies; 11+ messages in thread
From: Greg KH @ 2019-01-09 16:58 UTC (permalink / raw)
  To: Frederic Barrat; +Cc: aik, stable, linuxppc-dev, Greg Kurz, andrew.donnellan

On Wed, Jan 09, 2019 at 05:45:53PM +0100, Frederic Barrat wrote:
> 
> 
> On 09/01/2019 at 17:25, Greg Kurz wrote:
> > On Wed,  9 Jan 2019 16:13:42 +0100
> > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> > 
> > > With a recent change around IOMMU group, a system with an opencapi
> > > adapter is no longer booting and we get a kernel oops:
> > > 
> > > BUG: Kernel NULL pointer dereference at 0x00000028
> > > Faulting instruction address: 0xc0000000000aa38c
> > > Oops: Kernel access of bad area, sig: 7 [#1]
> > > LE SMP NR_CPUS=2048 NUMA PowerNV
> > > Modules linked in:
> > > CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
> > > NIP:  c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
> > > REGS: c000000005783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-00001-g3bd6
> > > MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000228  XER: 20
> > > CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
> > > GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
> > > GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
> > > GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
> > > GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
> > > GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
> > > GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
> > > GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
> > > GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
> > > NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> > > LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> > > Call Trace:
> > > [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> > > [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
> > > [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> > > [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
> > > [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
> > > [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
> > > [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
> > > [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
> > > 
> > > An opencapi device is using a device PE, so the current code breaks
> > > because pe->pbus is not defined.
> > > 
> > > More generally, there's no need to define an IOMMU group for opencapi,
> > > as the device sends real addresses directly (admittedly, the
> > > virtualization story is yet to be written). So let's fix it by
> > 
> > Current plan is to go for mediated VFIO. The real HW stays under the control
> > of the host ocxl driver, and we still don't need an IOMMU group.
> > 
> > > skipping the IOMMU group setup for opencapi PHBs.
> > > 
> > > Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> > > Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> > > ---
> > 
> > Reviewed-by: Greg Kurz <groug@kaod.org>
> > 
> > and
> > 
> > Cc: stable@vger.kernel.org      # v4.20
> 
> Thanks for the review! But why did you add stable? that problem is only seen
> on 5.0-rc1, isn't it?

No, this is fixing a patch that got backported to stable.

Well, a backport was attempted; I dropped it because of the problem :)

thanks,

greg k-h


* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
  2019-01-09 15:13 [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group() Frederic Barrat
  2019-01-09 16:25 ` Greg Kurz
@ 2019-01-10  0:40 ` Andrew Donnellan
  2019-01-14 10:12 ` Michael Ellerman
  2 siblings, 0 replies; 11+ messages in thread
From: Andrew Donnellan @ 2019-01-10  0:40 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev, aik

On 10/1/19 2:13 am, Frederic Barrat wrote:
> With a recent change around IOMMU group, a system with an opencapi
> adapter is no longer booting and we get a kernel oops:
> 
> BUG: Kernel NULL pointer dereference at 0x00000028
> Faulting instruction address: 0xc0000000000aa38c
> Oops: Kernel access of bad area, sig: 7 [#1]
> LE SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in:
> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
> NIP:  c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
> REGS: c000000005783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-00001-g3bd6
> MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000228  XER: 20
> CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
> GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
> GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
> GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
> GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
> GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
> GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
> GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
> GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
> NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> Call Trace:
> [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
> [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
> [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
> [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
> [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
> [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
> 
> An opencapi device is using a device PE, so the current code breaks
> because pe->pbus is not defined.
> 
> More generally, there's no need to define an IOMMU group for opencapi,
> as the device sends real addresses directly (admittedly, the
> virtualization story is yet to be written). So let's fix it by
> skipping the IOMMU group setup for opencapi PHBs.
> 
> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

> ---
>   arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 1d6406a051f1..7db3119f8a5b 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void)
>   	list_for_each_entry(hose, &hose_list, list_node) {
>   		phb = hose->private_data;
>   
> -		if (phb->type == PNV_PHB_NPU_NVLINK)
> +		if (phb->type == PNV_PHB_NPU_NVLINK ||
> +		    phb->type == PNV_PHB_NPU_OCAPI)
>   			continue;
>   
>   		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> 

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited



* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
  2019-01-09 16:56     ` Greg Kurz
@ 2019-01-10 12:25       ` Michael Ellerman
  2019-01-10 12:31         ` Greg Kurz
  2019-01-10 12:58         ` Frederic Barrat
  0 siblings, 2 replies; 11+ messages in thread
From: Michael Ellerman @ 2019-01-10 12:25 UTC (permalink / raw)
  To: Greg Kurz, Frederic Barrat; +Cc: aik, linuxppc-dev, andrew.donnellan, stable

Greg Kurz <groug@kaod.org> writes:
> On Wed, 9 Jan 2019 17:45:53 +0100
> Frederic Barrat <fbarrat@linux.ibm.com> wrote:
>
>> On 09/01/2019 at 17:25, Greg Kurz wrote:
>> > On Wed,  9 Jan 2019 16:13:42 +0100
>> > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
>> >   
>> >> With a recent change around IOMMU group, a system with an opencapi
>> >> adapter is no longer booting and we get a kernel oops:
>> >>
>> >> BUG: Kernel NULL pointer dereference at 0x00000028
>> >> Faulting instruction address: 0xc0000000000aa38c
>> >> Oops: Kernel access of bad area, sig: 7 [#1]
>> >> LE SMP NR_CPUS=2048 NUMA PowerNV
>> >> Modules linked in:
>> >> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
>> >> NIP:  c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
>> >> REGS: c000000005783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-00001-g3bd6
>> >> MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000228  XER: 20
>> >> CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
>> >> GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
>> >> GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
>> >> GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
>> >> GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
>> >> GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
>> >> GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
>> >> GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
>> >> GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
>> >> NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
>> >> LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
>> >> Call Trace:
>> >> [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
>> >> [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
>> >> [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
>> >> [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
>> >> [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
>> >> [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
>> >> [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
>> >> [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
>> >>
>> >> An opencapi device is using a device PE, so the current code breaks
>> >> because pe->pbus is not defined.
>> >>
>> >> More generally, there's no need to define an IOMMU group for opencapi,
>> >> as the device sends real addresses directly (admittedly, the
>> >> virtualization story is yet to be written). So let's fix it by  
>> > 
>> > Current plan is to go for mediated VFIO. The real HW stays under the control
>> > of the host ocxl driver, and we still don't need an IOMMU group.
>> >   
>> >> skipping the IOMMU group setup for opencapi PHBs.
>> >>
>> >> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
>> >> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
>> >> ---  
>> > 
>> > Reviewed-by: Greg Kurz <groug@kaod.org>
>> > 
>> > and
>> > 
>> > Cc: stable@vger.kernel.org      # v4.20  
>> 
>> Thanks for the review! But why did you add stable? that problem is only 
>> seen on 5.0-rc1, isn't it?
>
> Based on the fact that 0bd971676e68 was committed in 4.20... but I haven't
> tested :)

It was committed to a branch based off 4.20-rc2, but it wasn't merged
into the 4.20 release.

  $ git describe --match "v[0-9]*" --contains 0bd971676e68
  v5.0-rc1~137^2~15

So it doesn't need to go to stable.

cheers
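
The `git describe --contains` output quoted above names a tag that contains the commit, followed by a relative path (`~137^2~15`) from that tag back to the commit. As a rough sketch of how the tag part can be separated out, assuming the tag name itself contains neither `~` nor `^` (git does not allow either character in ref names), and with `describe_tag()` being a hypothetical helper written for this illustration:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the tag portion of a `git describe --contains` result into out.
 * Returns 0 on success, -1 if the tag does not fit in out.
 */
static int describe_tag(const char *desc, char *out, size_t outlen)
{
	/* Length of the leading run that holds neither '~' nor '^'. */
	size_t len = strcspn(desc, "~^");

	if (len >= outlen)
		return -1;
	memcpy(out, desc, len);
	out[len] = '\0';
	return 0;
}
```

Applied to the string `v5.0-rc1~137^2~15`, the helper yields `v5.0-rc1`, which is the release the reply points to when concluding that the fix does not need to go to stable.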


* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
  2019-01-10 12:25       ` Michael Ellerman
@ 2019-01-10 12:31         ` Greg Kurz
  2019-01-10 12:58         ` Frederic Barrat
  1 sibling, 0 replies; 11+ messages in thread
From: Greg Kurz @ 2019-01-10 12:31 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Frederic Barrat, aik, linuxppc-dev, andrew.donnellan, stable

On Thu, 10 Jan 2019 23:25:11 +1100
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Greg Kurz <groug@kaod.org> writes:
> > On Wed, 9 Jan 2019 17:45:53 +0100
> > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> >  
> >> On 09/01/2019 at 17:25, Greg Kurz wrote:
> >> > On Wed,  9 Jan 2019 16:13:42 +0100
> >> > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> >> >     
> >> >> With a recent change around IOMMU group, a system with an opencapi
> >> >> adapter is no longer booting and we get a kernel oops:
> >> >>
> >> >> BUG: Kernel NULL pointer dereference at 0x00000028
> >> >> Faulting instruction address: 0xc0000000000aa38c
> >> >> Oops: Kernel access of bad area, sig: 7 [#1]
> >> >> LE SMP NR_CPUS=2048 NUMA PowerNV
> >> >> Modules linked in:
> >> >> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
> >> >> NIP:  c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
> >> >> REGS: c000000005783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-00001-g3bd6
> >> >> MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000228  XER: 20
> >> >> CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
> >> >> GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
> >> >> GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
> >> >> GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
> >> >> GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
> >> >> GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
> >> >> GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
> >> >> GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
> >> >> GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
> >> >> NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> >> >> LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> >> >> Call Trace:
> >> >> [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> >> >> [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
> >> >> [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> >> >> [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
> >> >> [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
> >> >> [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
> >> >> [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
> >> >> [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
> >> >>
> >> >> An opencapi device is using a device PE, so the current code breaks
> >> >> because pe->pbus is not defined.
> >> >>
> >> >> More generally, there's no need to define an IOMMU group for opencapi,
> >> >> as the device sends real addresses directly (admittedly, the
> >> >> virtualization story is yet to be written). So let's fix it by    
> >> > 
> >> > Current plan is to go for mediated VFIO. The real HW stays under the control
> >> > of the host ocxl driver, and we still don't need an IOMMU group.
> >> >     
> >> >> skipping the IOMMU group setup for opencapi PHBs.
> >> >>
> >> >> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> >> >> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> >> >> ---    
> >> > 
> >> > Reviewed-by: Greg Kurz <groug@kaod.org>
> >> > 
> >> > and
> >> > 
> >> > Cc: stable@vger.kernel.org      # v4.20    
> >> 
> >> Thanks for the review! But why did you add stable? That problem is only
> >> seen on 5.0-rc1, isn't it?
> >
> > Based on the fact that 0bd971676e68 was committed in 4.20... but I haven't
> > tested :)  
> 
> It was committed to a branch based off 4.20-rc2, but it wasn't merged
> into the 4.20 release.
> 
>   $ git describe --match "v[0-9]*" --contains 0bd971676e68
>   v5.0-rc1~137^2~15
> 
> So it doesn't need to go to stable.
> 

Yeah I realized that afterwards, sorry for the noise and Happy New Year :)

> cheers
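
The `git describe --contains` check quoted above can be turned into a small helper. This is only a sketch under stated assumptions: the function names (`first_containing_tag`, `release_of`, `needs_backport`) are hypothetical and not part of git or any kernel tooling, and it assumes tags of the form `vMAJOR.MINOR[-rcN]` as matched by `--match "v[0-9]*"`. It only parses the string that git prints; it does not run git itself.

```python
import re


def first_containing_tag(describe_output: str) -> str:
    """Extract the tag from `git describe --contains` output.

    The output is a tag name optionally followed by ancestry
    suffixes such as ~137 or ^2, e.g. "v5.0-rc1~137^2~15".
    """
    return re.split(r"[~^]", describe_output.strip(), maxsplit=1)[0]


def release_of(tag: str) -> tuple:
    """Map "v5.0-rc1" or "v4.20" to a (major, minor) tuple.

    Assumption: the tag follows the kernel's vMAJOR.MINOR[-rcN] scheme.
    """
    match = re.match(r"v(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    return (int(match.group(1)), int(match.group(2)))


def needs_backport(describe_output: str, stable_release: tuple) -> bool:
    """Return True if the commit landed in or before `stable_release`.

    Simplification: an -rc tag of a release counts as that release,
    since anything merged during the rc cycle ships in the final tag.
    """
    return release_of(first_containing_tag(describe_output)) <= stable_release


if __name__ == "__main__":
    out = "v5.0-rc1~137^2~15"
    print(first_containing_tag(out))
    print(needs_backport(out, (4, 20)))
```

With the output from this thread, the earliest containing tag is in the 5.0 cycle, so a fix for that commit would not be a candidate for the 4.20 stable tree, which matches the conclusion above.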



* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
  2019-01-10 12:25       ` Michael Ellerman
  2019-01-10 12:31         ` Greg Kurz
@ 2019-01-10 12:58         ` Frederic Barrat
  2019-01-10 13:31           ` Greg KH
  1 sibling, 1 reply; 11+ messages in thread
From: Frederic Barrat @ 2019-01-10 12:58 UTC (permalink / raw)
  To: Michael Ellerman, Greg Kurz, greg
  Cc: aik, linuxppc-dev, andrew.donnellan, stable



On 10/01/2019 at 13:25, Michael Ellerman wrote:
> Greg Kurz <groug@kaod.org> writes:
>> On Wed, 9 Jan 2019 17:45:53 +0100
>> Frederic Barrat <fbarrat@linux.ibm.com> wrote:
>>
>>> On 09/01/2019 at 17:25, Greg Kurz wrote:
>>>> On Wed,  9 Jan 2019 16:13:42 +0100
>>>> Frederic Barrat <fbarrat@linux.ibm.com> wrote:
>>>>    
>>>>> With a recent change around IOMMU group, a system with an opencapi
>>>>> adapter is no longer booting and we get a kernel oops:
>>>>>
>>>>> BUG: Kernel NULL pointer dereference at 0x00000028
>>>>> Faulting instruction address: 0xc0000000000aa38c
>>>>> Oops: Kernel access of bad area, sig: 7 [#1]
>>>>> LE SMP NR_CPUS=2048 NUMA PowerNV
>>>>> Modules linked in:
>>>>> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
>>>>> NIP:  c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
>>>>> REGS: c000000005783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-00001-g3bd6
>>>>> MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000228  XER: 20
>>>>> CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
>>>>> GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
>>>>> GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
>>>>> GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
>>>>> GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
>>>>> GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
>>>>> GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
>>>>> GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
>>>>> GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
>>>>> NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
>>>>> LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
>>>>> Call Trace:
>>>>> [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
>>>>> [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
>>>>> [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
>>>>> [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
>>>>> [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
>>>>> [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
>>>>> [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
>>>>> [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
>>>>>
>>>>> An opencapi device is using a device PE, so the current code breaks
>>>>> because pe->pbus is not defined.
>>>>>
>>>>> More generally, there's no need to define an IOMMU group for opencapi,
>>>>> as the device sends real addresses directly (admittedly, the
>>>>> virtualization story is yet to be written). So let's fix it by
>>>>
>>>> Current plan is to go for mediated VFIO. The real HW stays under the control
>>>> of the host ocxl driver, and we still don't need an IOMMU group.
>>>>    
>>>>> skipping the IOMMU group setup for opencapi PHBs.
>>>>>
>>>>> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
>>>>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
>>>>> ---
>>>>
>>>> Reviewed-by: Greg Kurz <groug@kaod.org>
>>>>
>>>> and
>>>>
>>>> Cc: stable@vger.kernel.org      # v4.20
>>>
>>> Thanks for the review! But why did you add stable? That problem is only
>>> seen on 5.0-rc1, isn't it?
>>
>> Based on the fact that 0bd971676e68 was committed in 4.20... but I haven't
>> tested :)
> 
> It was committed to a branch based off 4.20-rc2, but it wasn't merged
> into the 4.20 release.
> 
>    $ git describe --match "v[0-9]*" --contains 0bd971676e68
>    v5.0-rc1~137^2~15
> 
> So it doesn't need to go to stable.

Which makes me wonder if Greg (KH) was really talking about that 
original patch and whether something worthwhile was dropped from stable 
by mistake?

   Fred



* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
  2019-01-10 12:58         ` Frederic Barrat
@ 2019-01-10 13:31           ` Greg KH
  0 siblings, 0 replies; 11+ messages in thread
From: Greg KH @ 2019-01-10 13:31 UTC (permalink / raw)
  To: Frederic Barrat; +Cc: aik, Greg Kurz, stable, andrew.donnellan, linuxppc-dev

On Thu, Jan 10, 2019 at 01:58:31PM +0100, Frederic Barrat wrote:
> 
> 
> On 10/01/2019 at 13:25, Michael Ellerman wrote:
> > Greg Kurz <groug@kaod.org> writes:
> > > On Wed, 9 Jan 2019 17:45:53 +0100
> > > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> > > 
> > > > On 09/01/2019 at 17:25, Greg Kurz wrote:
> > > > > On Wed,  9 Jan 2019 16:13:42 +0100
> > > > > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> > > > > > With a recent change around IOMMU group, a system with an opencapi
> > > > > > adapter is no longer booting and we get a kernel oops:
> > > > > > 
> > > > > > BUG: Kernel NULL pointer dereference at 0x00000028
> > > > > > Faulting instruction address: 0xc0000000000aa38c
> > > > > > Oops: Kernel access of bad area, sig: 7 [#1]
> > > > > > LE SMP NR_CPUS=2048 NUMA PowerNV
> > > > > > Modules linked in:
> > > > > > CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
> > > > > > NIP:  c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
> > > > > > REGS: c000000005783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-00001-g3bd6
> > > > > > MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000228  XER: 20
> > > > > > CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
> > > > > > GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
> > > > > > GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
> > > > > > GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
> > > > > > GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
> > > > > > GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
> > > > > > GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
> > > > > > GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
> > > > > > GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
> > > > > > NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> > > > > > LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> > > > > > Call Trace:
> > > > > > [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> > > > > > [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
> > > > > > [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> > > > > > [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
> > > > > > [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
> > > > > > [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
> > > > > > [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
> > > > > > [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
> > > > > > 
> > > > > > An opencapi device is using a device PE, so the current code breaks
> > > > > > because pe->pbus is not defined.
> > > > > > 
> > > > > > More generally, there's no need to define an IOMMU group for opencapi,
> > > > > > as the device sends real addresses directly (admittedly, the
> > > > > > virtualization story is yet to be written). So let's fix it by
> > > > > 
> > > > > Current plan is to go for mediated VFIO. The real HW stays under the control
> > > > > of the host ocxl driver, and we still don't need an IOMMU group.
> > > > > > skipping the IOMMU group setup for opencapi PHBs.
> > > > > > 
> > > > > > Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> > > > > > Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> > > > > > ---
> > > > > 
> > > > > Reviewed-by: Greg Kurz <groug@kaod.org>
> > > > > 
> > > > > and
> > > > > 
> > > > > Cc: stable@vger.kernel.org      # v4.20
> > > > 
> > > > Thanks for the review! But why did you add stable? That problem is only
> > > > seen on 5.0-rc1, isn't it?
> > > 
> > > Based on the fact that 0bd971676e68 was committed in 4.20... but I haven't
> > > tested :)
> > 
> > It was committed to a branch based off 4.20-rc2, but it wasn't merged
> > into the 4.20 release.
> > 
> >    $ git describe --match "v[0-9]*" --contains 0bd971676e68
> >    v5.0-rc1~137^2~15
> > 
> > So it doesn't need to go to stable.
> 
> Which makes me wonder if Greg (KH) was really talking about that original
> patch and whether something worthwhile was dropped from stable by mistake?

Totally different thread, sorry for the noise, my fault...

greg k-h


* Re: powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
  2019-01-09 15:13 [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group() Frederic Barrat
  2019-01-09 16:25 ` Greg Kurz
  2019-01-10  0:40 ` Andrew Donnellan
@ 2019-01-14 10:12 ` Michael Ellerman
  2 siblings, 0 replies; 11+ messages in thread
From: Michael Ellerman @ 2019-01-14 10:12 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev, aik, andrew.donnellan

On Wed, 2019-01-09 at 15:13:42 UTC, Frederic Barrat wrote:
> With a recent change around IOMMU group, a system with an opencapi
> adapter is no longer booting and we get a kernel oops:
> 
> BUG: Kernel NULL pointer dereference at 0x00000028
> Faulting instruction address: 0xc0000000000aa38c
> Oops: Kernel access of bad area, sig: 7 [#1]
> LE SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in:
> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
> NIP:  c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
> REGS: c000000005783700 TRAP: 0300   Not tainted  (5.0.0-rc1-fxb-00001-g3bd6
> MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000228  XER: 20
> CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
> GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
> GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
> GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
> GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
> GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
> GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
> GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
> GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
> NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> Call Trace:
> [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
> [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
> [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
> [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
> [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
> [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
> 
> An opencapi device is using a device PE, so the current code breaks
> because pe->pbus is not defined.
> 
> More generally, there's no need to define an IOMMU group for opencapi,
> as the device sends real addresses directly (admittedly, the
> virtualization story is yet to be written). So let's fix it by
> skipping the IOMMU group setup for opencapi PHBs.
> 
> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> Reviewed-by: Greg Kurz <groug@kaod.org>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/6bca515917515b66b7e1dfc1d1d3b7bd

cheers


end of thread, other threads:[~2019-01-14 10:19 UTC | newest]

Thread overview: 11+ messages
2019-01-09 15:13 [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group() Frederic Barrat
2019-01-09 16:25 ` Greg Kurz
2019-01-09 16:45   ` Frederic Barrat
2019-01-09 16:56     ` Greg Kurz
2019-01-10 12:25       ` Michael Ellerman
2019-01-10 12:31         ` Greg Kurz
2019-01-10 12:58         ` Frederic Barrat
2019-01-10 13:31           ` Greg KH
2019-01-09 16:58     ` Greg KH
2019-01-10  0:40 ` Andrew Donnellan
2019-01-14 10:12 ` Michael Ellerman
