linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE
@ 2024-04-16 20:58 Gaurav Batra
  2024-04-19  6:12 ` Michael Ellerman
  0 siblings, 1 reply; 6+ messages in thread
From: Gaurav Batra @ 2024-04-16 20:58 UTC (permalink / raw)
  To: mpe; +Cc: Gaurav Batra, linuxppc-dev

At the time of LPAR reboot, partition firmware provides Open Firmware
property ibm,dma-window for the PE. This property is provided on the PCI
bus the PE is attached to.

There are exceptions where the partition firmware might not provide this
property for the PE at the time of LPAR reboot. One such scenario is
where the firmware has frozen the PE due to some error condition. The
PE stays frozen for 24 hours or until the whole system is reinitialized.

Within this time frame, if the LPAR is rebooted, the frozen PE will be
presented to the LPAR but the ibm,dma-window property could be missing.

Today, under these circumstances, the LPAR oopses with a NULL pointer
dereference when configuring the PCI bus the PE is attached to.

BUG: Kernel NULL pointer dereference on read at 0x000000c8
Faulting instruction address: 0xc0000000001024c0
Oops: Kernel access of bad area, sig: 7 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
Supported: Yes
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries
NIP:  c0000000001024c0 LR: c0000000001024b0 CTR: c000000000102450
REGS: c0000000037db5c0 TRAP: 0300   Not tainted  (6.4.0-150600.9-default)
MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000822  XER: 00000000
CFAR: c00000000010254c DAR: 00000000000000c8 DSISR: 00080000 IRQMASK: 0
...
NIP [c0000000001024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
LR [c0000000001024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
Call Trace:
	pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
	pcibios_setup_bus_self+0x1c0/0x370
	__of_scan_bus+0x2f8/0x330
	pcibios_scan_phb+0x280/0x3d0
	pcibios_init+0x88/0x12c
	do_one_initcall+0x60/0x320
	kernel_init_freeable+0x344/0x3e4
	kernel_init+0x34/0x1d0
	ret_from_kernel_user_thread+0x14/0x1c
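
The small non-zero fault address in the trace above (DAR 0xc8) is typical
of reading a structure member through a NULL base pointer: the address
that faults is simply the member's offset. A minimal userspace sketch of
that arithmetic follows. `struct fake_node` and its padding are
hypothetical stand-ins chosen to reproduce the offset; they are not the
kernel's actual structure layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a node structure. The padding is chosen only
 * so that `data` lands at offset 0xc8, matching the DAR in the oops.
 */
struct fake_node {
	char pad[0xc8];
	void *data;
};

/*
 * Address that a load of `base->data` would touch, computed without
 * actually dereferencing `base`.
 */
static uintptr_t member_address(uintptr_t base)
{
	return base + offsetof(struct fake_node, data);
}
```

With a NULL base, `member_address(0)` evaluates to 0xc8, which is why a
fault address this close to zero usually points at a missing NULL check
rather than at memory corruption.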

Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e8c4129697b1..e808d5b1fa49 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 	 * parent bus. During reboot, there will be ibm,dma-window property to
 	 * define DMA window. For kdump, there will at least be default window or DDW
 	 * or both.
+	 * There is an exception to the above. In case the PE goes into frozen
+	 * state, firmware may not provide ibm,dma-window property at the time
+	 * of LPAR reboot.
 	 */
 
+	if (!pdn) {
+		pr_debug("  no ibm,dma-window property !\n");
+		return;
+	}
+
 	ppci = PCI_DN(pdn);
 
 	pr_debug("  parent is %pOF, iommu_table: 0x%p\n",

base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
-- 
2.39.3 (Apple Git-146)
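
The shape of the fix can be illustrated outside the kernel. The sketch
below is a userspace model under stated assumptions: `struct node`,
`find_window_node()` and `bus_setup()` are hypothetical names, and the
lookup only imitates walking up parent nodes in search of a DMA window
property. It is not the kernel's implementation; it only shows why an
early return is needed when the walk finds nothing.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a device-tree node. */
struct node {
	struct node *parent;
	const char *dma_window;	/* NULL when the property is absent */
	void *data;		/* per-node private data */
};

/* Walk towards the root; return the first node with a window, or NULL. */
static struct node *find_window_node(struct node *dn)
{
	for (; dn; dn = dn->parent)
		if (dn->dma_window)
			return dn;
	return NULL;
}

/* Returns 0 when a window was found and used, -1 when setup was skipped. */
static int bus_setup(struct node *dn)
{
	struct node *pdn = find_window_node(dn);

	if (!pdn) {
		/* Without this check, pdn->data below would dereference NULL. */
		fprintf(stderr, "no DMA window property, skipping setup\n");
		return -1;
	}

	printf("using window \"%s\", data %p\n", pdn->dma_window, pdn->data);
	return 0;
}
```

In this model, calling `bus_setup()` on a node whose ancestors all lack a
window returns -1 instead of crashing, which mirrors the early return the
patch adds before `PCI_DN(pdn)` is evaluated.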



* Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE
  2024-04-16 20:58 [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE Gaurav Batra
@ 2024-04-19  6:12 ` Michael Ellerman
  2024-04-19 11:11   ` Michal Suchánek
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Ellerman @ 2024-04-19  6:12 UTC (permalink / raw)
  To: Gaurav Batra; +Cc: Gaurav Batra, linuxppc-dev

Gaurav Batra <gbatra@linux.ibm.com> writes:
> At the time of LPAR reboot, partition firmware provides Open Firmware
> property ibm,dma-window for the PE. This property is provided on the PCI
> bus the PE is attached to.

AFAICS you're actually describing a bug that happens during boot *up*?

Describing it as "reboot" makes me think you're talking about the
shutdown path. I think that will confuse people, me at least :)

cheers



* Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE
  2024-04-19  6:12 ` Michael Ellerman
@ 2024-04-19 11:11   ` Michal Suchánek
  2024-04-19 14:41     ` Gaurav Batra
  2024-04-22  5:40     ` Michael Ellerman
  0 siblings, 2 replies; 6+ messages in thread
From: Michal Suchánek @ 2024-04-19 11:11 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Gaurav Batra, linuxppc-dev

Hello,

On Fri, Apr 19, 2024 at 04:12:46PM +1000, Michael Ellerman wrote:
> Gaurav Batra <gbatra@linux.ibm.com> writes:
> > At the time of LPAR reboot, partition firmware provides Open Firmware
> > property ibm,dma-window for the PE. This property is provided on the PCI
> > bus the PE is attached to.
> 
> AFAICS you're actually describing a bug that happens during boot *up*?
> 
> Describing it as "reboot" makes me think you're talking about the
> shutdown path. I think that will confuse people, me at least :)

There is probably an assumption that it must have been running
previously for the errors to happen in the first place, but given that
the error state persists for a day it may be a very long 'reboot'.

Thanks

Michal


* Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE
  2024-04-19 11:11   ` Michal Suchánek
@ 2024-04-19 14:41     ` Gaurav Batra
  2024-04-22  5:42       ` Michael Ellerman
  2024-04-22  5:40     ` Michael Ellerman
  1 sibling, 1 reply; 6+ messages in thread
From: Gaurav Batra @ 2024-04-19 14:41 UTC (permalink / raw)
  To: Michal Suchánek, Michael Ellerman; +Cc: linuxppc-dev

You are right. I think, the "reboot" should be replaced with just "boot 
up". If there are no other comments, or code changes, I can re-word the 
commit message and submit for review.

Thanks,

Gaurav

On 4/19/24 6:11 AM, Michal Suchánek wrote:
> Hello,
>
> On Fri, Apr 19, 2024 at 04:12:46PM +1000, Michael Ellerman wrote:
>> Gaurav Batra <gbatra@linux.ibm.com> writes:
>>> At the time of LPAR reboot, partition firmware provides Open Firmware
>>> property ibm,dma-window for the PE. This property is provided on the PCI
>>> bus the PE is attached to.
>> AFAICS you're actually describing a bug that happens during boot *up*?
>>
>> Describing it as "reboot" makes me think you're talking about the
>> shutdown path. I think that will confuse people, me at least :)
> there is probably an assumption that it must have been running
> previously for the errors to happen in the first place but given the
> error state persists for a day it may be a very long 'reboot'.
>
> Thanks
>
> Michal


* Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE
  2024-04-19 11:11   ` Michal Suchánek
  2024-04-19 14:41     ` Gaurav Batra
@ 2024-04-22  5:40     ` Michael Ellerman
  1 sibling, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2024-04-22  5:40 UTC (permalink / raw)
  To: Michal Suchánek; +Cc: Gaurav Batra, linuxppc-dev

Michal Suchánek <msuchanek@suse.de> writes:
> Hello,
>
> On Fri, Apr 19, 2024 at 04:12:46PM +1000, Michael Ellerman wrote:
>> Gaurav Batra <gbatra@linux.ibm.com> writes:
>> > At the time of LPAR reboot, partition firmware provides Open Firmware
>> > property ibm,dma-window for the PE. This property is provided on the PCI
>> > bus the PE is attached to.
>> 
>> AFAICS you're actually describing a bug that happens during boot *up*?
>> 
>> Describing it as "reboot" makes me think you're talking about the
>> shutdown path. I think that will confuse people, me at least :)
>
> there is probably an assumption that it must have been running
> previously for the errors to happen in the first place but given the
> error state persists for a day it may be a very long 'reboot'.

Yeah. Which is good detail, but the actual change is to the boot up path
so I think it's better described that way.

cheers


* Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE
  2024-04-19 14:41     ` Gaurav Batra
@ 2024-04-22  5:42       ` Michael Ellerman
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2024-04-22  5:42 UTC (permalink / raw)
  To: Gaurav Batra, Michal Suchánek; +Cc: linuxppc-dev

Gaurav Batra <gbatra@linux.ibm.com> writes:
> You are right. I think, the "reboot" should be replaced with just "boot 
> up". If there are no other comments, or code changes, I can re-word the 
> commit message and submit for review.

Yeah thanks. The change looks fine, just the change log needs a tweak.

It's fine to mention that the bug happens when a system has been
running, a device has been frozen, then the LPAR is rebooted, and *then*
we hit the bug at boot up.

cheers

