linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv3 1/2] powerpc/pseries: group lmb operation and memblock's
@ 2020-07-22  1:52 Pingfan Liu
  2020-07-22  1:52 ` [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents Pingfan Liu
  2020-07-23 14:41 ` [PATCHv3 1/2] powerpc/pseries: group lmb operation and memblock's Nathan Lynch
  0 siblings, 2 replies; 10+ messages in thread
From: Pingfan Liu @ 2020-07-22  1:52 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nathan Lynch, kexec, Hari Bathini, Pingfan Liu

This patch prepares for the incoming patch which swaps the order of KOBJ_
uevent and dt's updating.

It has no functional effect, just groups lmb operation and memblock's in
order to insert dt updating operation easily, and makes it easier to
review.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
To: linuxppc-dev@lists.ozlabs.org
Cc: kexec@lists.infradead.org
---
v2 -> v3: rebase onto ppc next-test branch
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 26 ++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 5d545b7..1a3ac3b 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -355,7 +355,8 @@ static int dlpar_add_lmb(struct drmem_lmb *);
 static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 {
 	unsigned long block_sz;
-	int rc;
+	phys_addr_t base_addr;
+	int rc, nid;
 
 	if (!lmb_is_removable(lmb))
 		return -EINVAL;
@@ -364,17 +365,19 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 	if (rc)
 		return rc;
 
+	base_addr = lmb->base_addr;
+	nid = lmb->nid;
 	block_sz = pseries_memory_block_size();
 
-	__remove_memory(lmb->nid, lmb->base_addr, block_sz);
-
-	/* Update memory regions for memory remove */
-	memblock_remove(lmb->base_addr, block_sz);
-
 	invalidate_lmb_associativity_index(lmb);
 	lmb_clear_nid(lmb);
 	lmb->flags &= ~DRCONF_MEM_ASSIGNED;
 
+	__remove_memory(nid, base_addr, block_sz);
+
+	/* Update memory regions for memory remove */
+	memblock_remove(base_addr, block_sz);
+
 	return 0;
 }
 
@@ -603,6 +606,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	}
 
 	lmb_set_nid(lmb);
+	lmb->flags |= DRCONF_MEM_ASSIGNED;
+
 	block_sz = memory_block_size_bytes();
 
 	/* Add the memory */
@@ -614,11 +619,14 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
 	rc = dlpar_online_lmb(lmb);
 	if (rc) {
-		__remove_memory(lmb->nid, lmb->base_addr, block_sz);
+		int nid = lmb->nid;
+		phys_addr_t base_addr = lmb->base_addr;
+
 		invalidate_lmb_associativity_index(lmb);
 		lmb_clear_nid(lmb);
-	} else {
-		lmb->flags |= DRCONF_MEM_ASSIGNED;
+		lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+
+		__remove_memory(nid, base_addr, block_sz);
 	}
 
 	return rc;
-- 
2.7.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
  2020-07-22  1:52 [PATCHv3 1/2] powerpc/pseries: group lmb operation and memblock's Pingfan Liu
@ 2020-07-22  1:52 ` Pingfan Liu
  2020-07-22  4:57   ` Michael Ellerman
  2020-07-23 13:27   ` Nathan Lynch
  2020-07-23 14:41 ` [PATCHv3 1/2] powerpc/pseries: group lmb operation and memblock's Nathan Lynch
  1 sibling, 2 replies; 10+ messages in thread
From: Pingfan Liu @ 2020-07-22  1:52 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nathan Lynch, kexec, Hari Bathini, Pingfan Liu

A bug is observed on pseries by taking the following steps on rhel:
-1. drmgr -c mem -r -q 5
-2. echo c > /proc/sysrq-trigger

And then, the failure looks like:
kdump: saving to /sysroot//var/crash/127.0.0.1-2020-01-16-02:06:14/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
 Checking for memory holes                         : [  0.0 %] /                   Checking for memory holes                         : [100.0 %] |                   Excluding unnecessary pages                       : [100.0 %] \                   Copying data                                      : [  0.3 %] -          eta: 38s[   44.337636] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
[   44.337663] hash-mmu:     trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
[   44.337677] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
[   44.337692] hash-mmu:     trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
[   44.337708] makedumpfile[469]: unhandled signal 7 at 00007fffba400000 nip 00007fffbbc4d7fc lr 000000011356ca3c code 2
[   44.338548] Core dump to |/bin/false pipe failed
/lib/kdump-lib-initramfs.sh: line 98:   469 Bus error               $CORE_COLLECTOR /proc/vmcore $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete
kdump: saving vmcore failed

* Root cause *
  After analyzing, it turns out that in the current implementation,
when hot-removing lmb, the KOBJ_REMOVE event ejects before the dt updating as
the code __remove_memory() comes before drmem_update_dt().
So in kdump kernel, when read_from_oldmem() resorts to
pSeries_lpar_hpte_insert() to install hpte, but fails with -2 due to
non-exist pfn. And finally, low_hash_fault() raise SIGBUS to process, as it
can be observed "Bus error"

From a viewpoint of listener and publisher, the publisher notifies the
listener before data is ready.  This introduces a problem where udev
launches kexec-tools (due to KOBJ_REMOVE) and loads a stale dt before
updating. And in capture kernel, makedumpfile will access the memory based
on the stale dt info, and hit a SIGBUS error due to an un-existed lmb.

* Fix *
  In order to fix this issue, update dt before __remove_memory(), and
accordingly the same rule in hot-add path.

This will introduce extra dt updating payload for each involved lmb when hotplug.
But it should be fine since drmem_update_dt() is memory based operation and
hotplug is not a hot path.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
To: linuxppc-dev@lists.ozlabs.org
Cc: kexec@lists.infradead.org
---
v2 -> v3: rebase onto ppc next-test branch
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 1a3ac3b..def8cb3f 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -372,6 +372,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 	invalidate_lmb_associativity_index(lmb);
 	lmb_clear_nid(lmb);
 	lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+	drmem_update_dt();
 
 	__remove_memory(nid, base_addr, block_sz);
 
@@ -607,6 +608,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
 	lmb_set_nid(lmb);
 	lmb->flags |= DRCONF_MEM_ASSIGNED;
+	drmem_update_dt();
 
 	block_sz = memory_block_size_bytes();
 
@@ -625,6 +627,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 		invalidate_lmb_associativity_index(lmb);
 		lmb_clear_nid(lmb);
 		lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+		drmem_update_dt();
 
 		__remove_memory(nid, base_addr, block_sz);
 	}
@@ -877,9 +880,6 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
 		break;
 	}
 
-	if (!rc)
-		rc = drmem_update_dt();
-
 	unlock_device_hotplug();
 	return rc;
 }
-- 
2.7.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
  2020-07-22  1:52 ` [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents Pingfan Liu
@ 2020-07-22  4:57   ` Michael Ellerman
  2020-07-23  2:27     ` Pingfan Liu
  2020-07-23 13:27   ` Nathan Lynch
  1 sibling, 1 reply; 10+ messages in thread
From: Michael Ellerman @ 2020-07-22  4:57 UTC (permalink / raw)
  To: Pingfan Liu, linuxppc-dev; +Cc: Nathan Lynch, kexec, Hari Bathini, Pingfan Liu

Pingfan Liu <kernelfans@gmail.com> writes:
> A bug is observed on pseries by taking the following steps on rhel:
                                                                ^
                                                                RHEL

I assume it happens on mainline too?

> -1. drmgr -c mem -r -q 5
> -2. echo c > /proc/sysrq-trigger
>
> And then, the failure looks like:
> kdump: saving to /sysroot//var/crash/127.0.0.1-2020-01-16-02:06:14/
> kdump: saving vmcore-dmesg.txt
> kdump: saving vmcore-dmesg.txt complete
> kdump: saving vmcore
>  Checking for memory holes                         : [  0.0 %] /                   Checking for memory holes                         : [100.0 %] |                   Excluding unnecessary pages                       : [100.0 %] \                   Copying data                                      : [  0.3 %] -          eta: 38s[   44.337636] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
> [   44.337663] hash-mmu:     trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
> [   44.337677] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
> [   44.337692] hash-mmu:     trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
> [   44.337708] makedumpfile[469]: unhandled signal 7 at 00007fffba400000 nip 00007fffbbc4d7fc lr 000000011356ca3c code 2
> [   44.338548] Core dump to |/bin/false pipe failed
> /lib/kdump-lib-initramfs.sh: line 98:   469 Bus error               $CORE_COLLECTOR /proc/vmcore $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete
> kdump: saving vmcore failed
>
> * Root cause *
>   After analyzing, it turns out that in the current implementation,
> when hot-removing lmb, the KOBJ_REMOVE event ejects before the dt updating as
> the code __remove_memory() comes before drmem_update_dt().
> So in kdump kernel, when read_from_oldmem() resorts to
> pSeries_lpar_hpte_insert() to install hpte, but fails with -2 due to
> non-exist pfn. And finally, low_hash_fault() raise SIGBUS to process, as it
> can be observed "Bus error"
>
> From a viewpoint of listener and publisher, the publisher notifies the
> listener before data is ready.  This introduces a problem where udev
> launches kexec-tools (due to KOBJ_REMOVE) and loads a stale dt before
> updating. And in capture kernel, makedumpfile will access the memory based
> on the stale dt info, and hit a SIGBUS error due to an un-existed lmb.
>
> * Fix *
>   In order to fix this issue, update dt before __remove_memory(), and
> accordingly the same rule in hot-add path.
>
> This will introduce extra dt updating payload for each involved lmb when hotplug.
> But it should be fine since drmem_update_dt() is memory based operation and
> hotplug is not a hot path.
>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Hari Bathini <hbathini@linux.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> To: linuxppc-dev@lists.ozlabs.org
> Cc: kexec@lists.infradead.org
> ---
> v2 -> v3: rebase onto ppc next-test branch
> ---
>  arch/powerpc/platforms/pseries/hotplug-memory.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 1a3ac3b..def8cb3f 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -372,6 +372,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
>  	invalidate_lmb_associativity_index(lmb);
>  	lmb_clear_nid(lmb);
>  	lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> +	drmem_update_dt();

No error checking?

>  	__remove_memory(nid, base_addr, block_sz);
>  
> @@ -607,6 +608,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>  
>  	lmb_set_nid(lmb);
>  	lmb->flags |= DRCONF_MEM_ASSIGNED;
> +	drmem_update_dt();

And here ..
>  
>  	block_sz = memory_block_size_bytes();
>  
> @@ -625,6 +627,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>  		invalidate_lmb_associativity_index(lmb);
>  		lmb_clear_nid(lmb);
>  		lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> +		drmem_update_dt();


And here ..

>  		__remove_memory(nid, base_addr, block_sz);
>  	}
> @@ -877,9 +880,6 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>  		break;
>  	}
>  
> -	if (!rc)
> -		rc = drmem_update_dt();
> -
>  	unlock_device_hotplug();
>  	return rc;

Whereas previously we did check it.


cheers

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
  2020-07-22  4:57   ` Michael Ellerman
@ 2020-07-23  2:27     ` Pingfan Liu
  2020-07-23  6:41       ` Michael Ellerman
  0 siblings, 1 reply; 10+ messages in thread
From: Pingfan Liu @ 2020-07-23  2:27 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Kexec Mailing List, linuxppc-dev, Hari Bathini

On Wed, Jul 22, 2020 at 12:57 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Pingfan Liu <kernelfans@gmail.com> writes:
> > A bug is observed on pseries by taking the following steps on rhel:
>                                                                 ^
>                                                                 RHEL
>
> I assume it happens on mainline too?
Yes, it does.
>
[...]
> > diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > index 1a3ac3b..def8cb3f 100644
> > --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > @@ -372,6 +372,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> >       invalidate_lmb_associativity_index(lmb);
> >       lmb_clear_nid(lmb);
> >       lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> > +     drmem_update_dt();
>
> No error checking?
Hmm, here should be a more careful design. Please see the comment at the end.
>
> >       __remove_memory(nid, base_addr, block_sz);
> >
> > @@ -607,6 +608,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> >
> >       lmb_set_nid(lmb);
> >       lmb->flags |= DRCONF_MEM_ASSIGNED;
> > +     drmem_update_dt();
>
> And here ..
> >
> >       block_sz = memory_block_size_bytes();
> >
> > @@ -625,6 +627,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> >               invalidate_lmb_associativity_index(lmb);
> >               lmb_clear_nid(lmb);
> >               lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> > +             drmem_update_dt();
>
>
> And here ..
>
> >               __remove_memory(nid, base_addr, block_sz);
> >       }
> > @@ -877,9 +880,6 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
> >               break;
> >       }
> >
> > -     if (!rc)
> > -             rc = drmem_update_dt();
> > -
> >       unlock_device_hotplug();
> >       return rc;
>
> Whereas previously we did check it.

drmem_update_dt() fails iff allocating memory fail. And in the failed
case, even the original code does not roll back the effect of
__add_memory()/__remove_memory().

And I plan to do the following in V4: if drmem_update_dt() fails in
dlpar_add_lmb(), then bails out immediately.

Thanks,
Pingfan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
  2020-07-23  2:27     ` Pingfan Liu
@ 2020-07-23  6:41       ` Michael Ellerman
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Ellerman @ 2020-07-23  6:41 UTC (permalink / raw)
  To: Pingfan Liu; +Cc: Nathan Lynch, Kexec Mailing List, linuxppc-dev, Hari Bathini

Pingfan Liu <kernelfans@gmail.com> writes:
> On Wed, Jul 22, 2020 at 12:57 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>> Pingfan Liu <kernelfans@gmail.com> writes:
>> > A bug is observed on pseries by taking the following steps on rhel:
>>                                                                 ^
>>                                                                 RHEL
>>
>> I assume it happens on mainline too?
> Yes, it does.
>>
> [...]
>> > diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> > index 1a3ac3b..def8cb3f 100644
>> > --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> > +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> > @@ -372,6 +372,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
>> >       invalidate_lmb_associativity_index(lmb);
>> >       lmb_clear_nid(lmb);
>> >       lmb->flags &= ~DRCONF_MEM_ASSIGNED;
>> > +     drmem_update_dt();
>>
>> No error checking?
> Hmm, here should be a more careful design. Please see the comment at the end.
>>
>> >       __remove_memory(nid, base_addr, block_sz);
>> >
>> > @@ -607,6 +608,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>> >
>> >       lmb_set_nid(lmb);
>> >       lmb->flags |= DRCONF_MEM_ASSIGNED;
>> > +     drmem_update_dt();
>>
>> And here ..
>> >
>> >       block_sz = memory_block_size_bytes();
>> >
>> > @@ -625,6 +627,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>> >               invalidate_lmb_associativity_index(lmb);
>> >               lmb_clear_nid(lmb);
>> >               lmb->flags &= ~DRCONF_MEM_ASSIGNED;
>> > +             drmem_update_dt();
>>
>>
>> And here ..
>>
>> >               __remove_memory(nid, base_addr, block_sz);
>> >       }
>> > @@ -877,9 +880,6 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>> >               break;
>> >       }
>> >
>> > -     if (!rc)
>> > -             rc = drmem_update_dt();
>> > -
>> >       unlock_device_hotplug();
>> >       return rc;
>>
>> Whereas previously we did check it.
>
> drmem_update_dt() fails iff allocating memory fail.

That's true currently, but it might change in future.

> And in the failed case, even the original code does not roll back the
> effect of __add_memory()/__remove_memory().

Yeah hard to know what the desired behaviour is. If something fails we
at least need to print a message though, not silently swallow it.

> And I plan to do the following in V4: if drmem_update_dt() fails in
> dlpar_add_lmb(), then bails out immediately.

That sounds reasonable.

cheers

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
  2020-07-22  1:52 ` [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents Pingfan Liu
  2020-07-22  4:57   ` Michael Ellerman
@ 2020-07-23 13:27   ` Nathan Lynch
  2020-07-24 12:24     ` Pingfan Liu
  1 sibling, 1 reply; 10+ messages in thread
From: Nathan Lynch @ 2020-07-23 13:27 UTC (permalink / raw)
  To: Pingfan Liu; +Cc: cheloha, kexec, ldufour, linuxppc-dev, Hari Bathini

Pingfan Liu <kernelfans@gmail.com> writes:
> A bug is observed on pseries by taking the following steps on rhel:
> -1. drmgr -c mem -r -q 5
> -2. echo c > /proc/sysrq-trigger
>
> And then, the failure looks like:
> kdump: saving to /sysroot//var/crash/127.0.0.1-2020-01-16-02:06:14/
> kdump: saving vmcore-dmesg.txt
> kdump: saving vmcore-dmesg.txt complete
> kdump: saving vmcore
>  Checking for memory holes                         : [  0.0 %] /                   Checking for memory holes                         : [100.0 %] |                   Excluding unnecessary pages                       : [100.0 %] \                   Copying data                                      : [  0.3 %] -          eta: 38s[   44.337636] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
> [   44.337663] hash-mmu:     trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
> [   44.337677] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
> [   44.337692] hash-mmu:     trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
> [   44.337708] makedumpfile[469]: unhandled signal 7 at 00007fffba400000 nip 00007fffbbc4d7fc lr 000000011356ca3c code 2
> [   44.338548] Core dump to |/bin/false pipe failed
> /lib/kdump-lib-initramfs.sh: line 98:   469 Bus error               $CORE_COLLECTOR /proc/vmcore $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete
> kdump: saving vmcore failed
>
> * Root cause *
>   After analyzing, it turns out that in the current implementation,
> when hot-removing lmb, the KOBJ_REMOVE event ejects before the dt updating as
> the code __remove_memory() comes before drmem_update_dt().
> So in kdump kernel, when read_from_oldmem() resorts to
> pSeries_lpar_hpte_insert() to install hpte, but fails with -2 due to
> non-exist pfn. And finally, low_hash_fault() raise SIGBUS to process, as it
> can be observed "Bus error"
>
> From a viewpoint of listener and publisher, the publisher notifies the
> listener before data is ready.  This introduces a problem where udev
> launches kexec-tools (due to KOBJ_REMOVE) and loads a stale dt before
> updating. And in capture kernel, makedumpfile will access the memory based
> on the stale dt info, and hit a SIGBUS error due to an un-existed lmb.
>
> * Fix *
>   In order to fix this issue, update dt before __remove_memory(), and
> accordingly the same rule in hot-add path.
>
> This will introduce extra dt updating payload for each involved lmb when hotplug.
> But it should be fine since drmem_update_dt() is memory based operation and
> hotplug is not a hot path.

This is great analysis but the performance implications of the change
are grave. The add/remove paths here are already O(n) where n is the
quantity of memory assigned to the LP, this change would make it O(n^2):

dlpar_memory_add_by_count
  for_each_drmem_lmb             <--
    dlpar_add_lmb
      drmem_update_dt(_v1|_v2)
        for_each_drmem_lmb       <--

Memory add/remove isn't a hot path but quadratic runtime complexity
isn't acceptable. Its current performance is bad enough that I have
internal bugs open on it.

Not to mention we leak memory every time drmem_update_dt is called
because we can't safely free device tree properties :-(

Also note that this sort of reverts (fixes?) 063b8b1251fd
("powerpc/pseries/memory-hotplug: Only update DT once per memory DLPAR
request").

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv3 1/2] powerpc/pseries: group lmb operation and memblock's
  2020-07-22  1:52 [PATCHv3 1/2] powerpc/pseries: group lmb operation and memblock's Pingfan Liu
  2020-07-22  1:52 ` [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents Pingfan Liu
@ 2020-07-23 14:41 ` Nathan Lynch
  2020-07-28  6:39   ` Pingfan Liu
  1 sibling, 1 reply; 10+ messages in thread
From: Nathan Lynch @ 2020-07-23 14:41 UTC (permalink / raw)
  To: Pingfan Liu; +Cc: kexec, linuxppc-dev, Hari Bathini

Pingfan Liu <kernelfans@gmail.com> writes:
> This patch prepares for the incoming patch which swaps the order of KOBJ_
> uevent and dt's updating.
>
> It has no functional effect, just groups lmb operation and memblock's in
> order to insert dt updating operation easily, and makes it easier to
> review.

...

> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 5d545b7..1a3ac3b 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -355,7 +355,8 @@ static int dlpar_add_lmb(struct drmem_lmb *);
>  static int dlpar_remove_lmb(struct drmem_lmb *lmb)
>  {
>  	unsigned long block_sz;
> -	int rc;
> +	phys_addr_t base_addr;
> +	int rc, nid;
>  
>  	if (!lmb_is_removable(lmb))
>  		return -EINVAL;
> @@ -364,17 +365,19 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
>  	if (rc)
>  		return rc;
>  
> +	base_addr = lmb->base_addr;
> +	nid = lmb->nid;
>  	block_sz = pseries_memory_block_size();
>  
> -	__remove_memory(lmb->nid, lmb->base_addr, block_sz);
> -
> -	/* Update memory regions for memory remove */
> -	memblock_remove(lmb->base_addr, block_sz);
> -
>  	invalidate_lmb_associativity_index(lmb);
>  	lmb_clear_nid(lmb);
>  	lmb->flags &= ~DRCONF_MEM_ASSIGNED;
>  
> +	__remove_memory(nid, base_addr, block_sz);
> +
> +	/* Update memory regions for memory remove */
> +	memblock_remove(base_addr, block_sz);
> +
>  	return 0;
>  }

I don't understand; the commit message should not claim this has no
functional effect when it changes the order of operations like
this. Maybe this is an improvement over the current behavior, but it's
not explained why it would be.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
  2020-07-23 13:27   ` Nathan Lynch
@ 2020-07-24 12:24     ` Pingfan Liu
  2020-07-24 16:50       ` Nathan Lynch
  0 siblings, 1 reply; 10+ messages in thread
From: Pingfan Liu @ 2020-07-24 12:24 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: cheloha, Kexec Mailing List, ldufour, linuxppc-dev, Hari Bathini

On Thu, Jul 23, 2020 at 9:27 PM Nathan Lynch <nathanl@linux.ibm.com> wrote:
>
> Pingfan Liu <kernelfans@gmail.com> writes:
> > A bug is observed on pseries by taking the following steps on rhel:
> > -1. drmgr -c mem -r -q 5
> > -2. echo c > /proc/sysrq-trigger
> >
> > And then, the failure looks like:
> > kdump: saving to /sysroot//var/crash/127.0.0.1-2020-01-16-02:06:14/
> > kdump: saving vmcore-dmesg.txt
> > kdump: saving vmcore-dmesg.txt complete
> > kdump: saving vmcore
> >  Checking for memory holes                         : [  0.0 %] /                   Checking for memory holes                         : [100.0 %] |                   Excluding unnecessary pages                       : [100.0 %] \                   Copying data                                      : [  0.3 %] -          eta: 38s[   44.337636] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
> > [   44.337663] hash-mmu:     trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
> > [   44.337677] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
> > [   44.337692] hash-mmu:     trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
> > [   44.337708] makedumpfile[469]: unhandled signal 7 at 00007fffba400000 nip 00007fffbbc4d7fc lr 000000011356ca3c code 2
> > [   44.338548] Core dump to |/bin/false pipe failed
> > /lib/kdump-lib-initramfs.sh: line 98:   469 Bus error               $CORE_COLLECTOR /proc/vmcore $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete
> > kdump: saving vmcore failed
> >
> > * Root cause *
> >   After analyzing, it turns out that in the current implementation,
> > when hot-removing lmb, the KOBJ_REMOVE event ejects before the dt updating as
> > the code __remove_memory() comes before drmem_update_dt().
> > So in kdump kernel, when read_from_oldmem() resorts to
> > pSeries_lpar_hpte_insert() to install hpte, but fails with -2 due to
> > non-exist pfn. And finally, low_hash_fault() raise SIGBUS to process, as it
> > can be observed "Bus error"
> >
> > From a viewpoint of listener and publisher, the publisher notifies the
> > listener before data is ready.  This introduces a problem where udev
> > launches kexec-tools (due to KOBJ_REMOVE) and loads a stale dt before
> > updating. And in capture kernel, makedumpfile will access the memory based
> > on the stale dt info, and hit a SIGBUS error due to an un-existed lmb.
> >
> > * Fix *
> >   In order to fix this issue, update dt before __remove_memory(), and
> > accordingly the same rule in hot-add path.
> >
> > This will introduce extra dt updating payload for each involved lmb when hotplug.
> > But it should be fine since drmem_update_dt() is memory based operation and
> > hotplug is not a hot path.
>
> This is great analysis but the performance implications of the change
> are grave. The add/remove paths here are already O(n) where n is the
> quantity of memory assigned to the LP, this change would make it O(n^2):
>
> dlpar_memory_add_by_count
>   for_each_drmem_lmb             <--
>     dlpar_add_lmb
>       drmem_update_dt(_v1|_v2)
>         for_each_drmem_lmb       <--
>
> Memory add/remove isn't a hot path but quadratic runtime complexity
> isn't acceptable. Its current performance is bad enough that I have
Yes, the quadratic runtime complexity sounds terrible.
And I am curious about the bug. Does the system have thousands of lmb?

> internal bugs open on it.
>
> Not to mention we leak memory every time drmem_update_dt is called
> because we can't safely free device tree properties :-(
Do you know what block us to free it?
>
> Also note that this sort of reverts (fixes?) 063b8b1251fd
> ("powerpc/pseries/memory-hotplug: Only update DT once per memory DLPAR
> request").
Yes. And now, I think I need to bring up another method to fix it.

Thanks,
Pingfan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
  2020-07-24 12:24     ` Pingfan Liu
@ 2020-07-24 16:50       ` Nathan Lynch
  0 siblings, 0 replies; 10+ messages in thread
From: Nathan Lynch @ 2020-07-24 16:50 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: cheloha, Kexec Mailing List, ldufour, linuxppc-dev, Hari Bathini

Pingfan Liu <kernelfans@gmail.com> writes:
> On Thu, Jul 23, 2020 at 9:27 PM Nathan Lynch <nathanl@linux.ibm.com> wrote:
>> Pingfan Liu <kernelfans@gmail.com> writes:
>> > This will introduce extra dt updating payload for each involved lmb when hotplug.
>> > But it should be fine since drmem_update_dt() is memory based operation and
>> > hotplug is not a hot path.
>>
>> This is great analysis but the performance implications of the change
>> are grave. The add/remove paths here are already O(n) where n is the
>> quantity of memory assigned to the LP, this change would make it O(n^2):
>>
>> dlpar_memory_add_by_count
>>   for_each_drmem_lmb             <--
>>     dlpar_add_lmb
>>       drmem_update_dt(_v1|_v2)
>>         for_each_drmem_lmb       <--
>>
>> Memory add/remove isn't a hot path but quadratic runtime complexity
>> isn't acceptable. Its current performance is bad enough that I have
> Yes, the quadratic runtime complexity sounds terrible.
> And I am curious about the bug. Does the system have thousands of lmb?

Yes.

>> Not to mention we leak memory every time drmem_update_dt is called
>> because we can't safely free device tree properties :-(
> Do you know what block us to free it?

It's a longstanding problem. References to device tree properties aren't
counted or tracked so there's no way to safely free them unless the node
itself is released. But the ibm,dynamic-reconfiguration-memory node does
not ever go away and its properties are only subject to updates.

Maybe there's a way to address the specific case of
ibm,dynamic-reconfiguration-memory and the ibm,dynamic-memory(-v2)
properties, instead of tackling the general problem.

Regardless of all that, the drmem code needs better data structures and
lookup functions.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv3 1/2] powerpc/pseries: group lmb operation and memblock's
  2020-07-23 14:41 ` [PATCHv3 1/2] powerpc/pseries: group lmb operation and memblock's Nathan Lynch
@ 2020-07-28  6:39   ` Pingfan Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Pingfan Liu @ 2020-07-28  6:39 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Kexec Mailing List, linuxppc-dev, Hari Bathini

On Thu, Jul 23, 2020 at 10:41 PM Nathan Lynch <nathanl@linux.ibm.com> wrote:
>
> Pingfan Liu <kernelfans@gmail.com> writes:
> > This patch prepares for the incoming patch which swaps the order of KOBJ_
> > uevent and dt's updating.
> >
> > It has no functional effect, just groups lmb operation and memblock's in
> > order to insert dt updating operation easily, and makes it easier to
> > review.
>
> ...
>
> > diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > index 5d545b7..1a3ac3b 100644
> > --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > @@ -355,7 +355,8 @@ static int dlpar_add_lmb(struct drmem_lmb *);
> >  static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> >  {
> >       unsigned long block_sz;
> > -     int rc;
> > +     phys_addr_t base_addr;
> > +     int rc, nid;
> >
> >       if (!lmb_is_removable(lmb))
> >               return -EINVAL;
> > @@ -364,17 +365,19 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> >       if (rc)
> >               return rc;
> >
> > +     base_addr = lmb->base_addr;
> > +     nid = lmb->nid;
> >       block_sz = pseries_memory_block_size();
> >
> > -     __remove_memory(lmb->nid, lmb->base_addr, block_sz);
> > -
> > -     /* Update memory regions for memory remove */
> > -     memblock_remove(lmb->base_addr, block_sz);
> > -
> >       invalidate_lmb_associativity_index(lmb);
> >       lmb_clear_nid(lmb);
> >       lmb->flags &= ~DRCONF_MEM_ASSIGNED;
> >
> > +     __remove_memory(nid, base_addr, block_sz);
> > +
> > +     /* Update memory regions for memory remove */
> > +     memblock_remove(base_addr, block_sz);
> > +
> >       return 0;
> >  }
>
> I don't understand; the commit message should not claim this has no
> functional effect when it changes the order of operations like
> this. Maybe this is an improvement over the current behavior, but it's
> not explained why it would be.
One group of functions, which name contains lmb, are powerpc specific,
and used to form dt.

The other group __remove_memory() and memblock_remove() are integrated
with linux mm.

And [2/2] arrange dt-updating just before __remove_memory()

Thanks,
Pingfan

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-07-28  6:42 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-22  1:52 [PATCHv3 1/2] powerpc/pseries: group lmb operation and memblock's Pingfan Liu
2020-07-22  1:52 ` [PATCHv3 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents Pingfan Liu
2020-07-22  4:57   ` Michael Ellerman
2020-07-23  2:27     ` Pingfan Liu
2020-07-23  6:41       ` Michael Ellerman
2020-07-23 13:27   ` Nathan Lynch
2020-07-24 12:24     ` Pingfan Liu
2020-07-24 16:50       ` Nathan Lynch
2020-07-23 14:41 ` [PATCHv3 1/2] powerpc/pseries: group lmb operation and memblock's Nathan Lynch
2020-07-28  6:39   ` Pingfan Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).