* [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
@ 2017-09-07  5:05 Anton Blanchard
  2017-09-07  5:09 ` Aneesh Kumar K.V
  2017-10-06 11:10 ` Michael Ellerman
  0 siblings, 2 replies; 9+ messages in thread
From: Anton Blanchard @ 2017-09-07  5:05 UTC (permalink / raw)
  To: benh, paulus, mpe, bharata, arbab, npiggin, mikey, cyrilbur,
	aneesh.kumar
  Cc: linuxppc-dev

From: Anton Blanchard <anton@samba.org>

Memory hot unplug on PowerNV radix hosts is broken. Our memory block
size is 256MB but since we map the linear region with very large pages,
each pte we tear down maps 1GB.

A hot unplug of one 256MB memory block results in 768MB of memory
getting unintentionally unmapped. At this point we are likely to oops.

Fix this by increasing our memory block size to 1GB on PowerNV radix
hosts.

Signed-off-by: Anton Blanchard <anton@samba.org>
---
 arch/powerpc/platforms/powernv/setup.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 897aa1400eb8..bbb73aa0eb8f 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -272,7 +272,15 @@ static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 static unsigned long pnv_memory_block_size(void)
 {
-	return 256UL * 1024 * 1024;
+	/*
+	 * We map the kernel linear region with 1GB large pages on radix. For
+	 * memory hot unplug to work our memory block size must be at least
+	 * this size.
+	 */
+	if (radix_enabled())
+		return 1UL * 1024 * 1024 * 1024;
+	else
+		return 256UL * 1024 * 1024;
 }
 #endif
 
-- 
2.11.0
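
The 768MB figure in the changelog follows from rounding the unplugged block out to the granularity of the mapping that covers it. A minimal sketch of that arithmetic, assuming the linear map can only be torn down in whole mapping-sized units; `collateral_unmapped()` and the `SZ_*` macros are illustrative names, not kernel interfaces:

```c
#include <assert.h>
#include <stdint.h>

#define SZ_256M (256ULL * 1024 * 1024)
#define SZ_1G   (1024ULL * 1024 * 1024)

/*
 * Hypothetical helper: number of bytes that get unmapped beyond the block
 * being removed, when mappings can only be torn down in units of map_size.
 * map_size must be a power of two and block_size must be non-zero.
 */
static uint64_t collateral_unmapped(uint64_t block_start, uint64_t block_size,
				    uint64_t map_size)
{
	assert(block_size != 0);
	assert(map_size != 0 && (map_size & (map_size - 1)) == 0);

	/* Round the block out to the enclosing mapping boundaries. */
	uint64_t lo = block_start & ~(map_size - 1);
	uint64_t hi = (block_start + block_size + map_size - 1) & ~(map_size - 1);

	return (hi - lo) - block_size;
}
```

With a 256MB block under a 1GB mapping this gives 768MB of collateral unmapping; with a 1GB-aligned 1GB block it gives zero, which is the effect of the patch.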


* Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
  2017-09-07  5:05 [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix Anton Blanchard
@ 2017-09-07  5:09 ` Aneesh Kumar K.V
  2017-09-07  5:17   ` Anton Blanchard
  2017-10-06 11:10 ` Michael Ellerman
  1 sibling, 1 reply; 9+ messages in thread
From: Aneesh Kumar K.V @ 2017-09-07  5:09 UTC (permalink / raw)
  To: Anton Blanchard, benh, paulus, mpe, bharata, arbab, npiggin,
	mikey, cyrilbur
  Cc: linuxppc-dev



On 09/07/2017 10:35 AM, Anton Blanchard wrote:
> From: Anton Blanchard <anton@samba.org>
> 
> Memory hot unplug on PowerNV radix hosts is broken. Our memory block
> size is 256MB but since we map the linear region with very large pages,
> each pte we tear down maps 1GB.
> 
> A hot unplug of one 256MB memory block results in 768MB of memory
> getting unintentionally unmapped. At this point we are likely to oops.
> 
> Fix this by increasing our memory block size to 1GB on PowerNV radix
> hosts.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>   arch/powerpc/platforms/powernv/setup.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> index 897aa1400eb8..bbb73aa0eb8f 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -272,7 +272,15 @@ static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
>   #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
>   static unsigned long pnv_memory_block_size(void)
>   {
> -	return 256UL * 1024 * 1024;
> +	/*
> +	 * We map the kernel linear region with 1GB large pages on radix. For
> +	 * memory hot unplug to work our memory block size must be at least
> +	 * this size.
> +	 */
> +	if (radix_enabled())
> +		return 1UL * 1024 * 1024 * 1024;
> +	else
> +		return 256UL * 1024 * 1024;
>   }
>   #endif
> 

There is a similar issue being worked on w.r.t pseries.

https://lkml.kernel.org/r/1502357028-27465-1-git-send-email-bharata@linux.vnet.ibm.com

The question is whether we should map these regions at all. That is, we
need to tell the kernel which memory regions we would like to hot unplug
later, so that it avoids making kernel allocations from them. If we do
that, could we then map them with 2M pages?

-aneesh


* Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
  2017-09-07  5:09 ` Aneesh Kumar K.V
@ 2017-09-07  5:17   ` Anton Blanchard
  2017-09-07  7:21     ` Benjamin Herrenschmidt
  2017-09-07 15:59     ` Reza Arbab
  0 siblings, 2 replies; 9+ messages in thread
From: Anton Blanchard @ 2017-09-07  5:17 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: benh, paulus, mpe, bharata, arbab, npiggin, mikey, cyrilbur,
	linuxppc-dev

Hi,

> There is a similar issue being worked on w.r.t pseries.
> 
> https://lkml.kernel.org/r/1502357028-27465-1-git-send-email-bharata@linux.vnet.ibm.com
> 
> The question is should we map these regions ? ie, we need to tell the 
> kernel memory region that we would like to hot unplug later so that
> we avoid doing kernel allocations from that. If we do that, then we
> can possibly map them via 2M size ?

But all of memory on PowerNV should be able to be hot unplugged, so
there are two options as I see it - either increase the memory block
size, or map everything with 2MB pages. 

Anton


* Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
  2017-09-07  5:17   ` Anton Blanchard
@ 2017-09-07  7:21     ` Benjamin Herrenschmidt
  2017-09-08 21:51       ` Balbir Singh
  2017-09-07 15:59     ` Reza Arbab
  1 sibling, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-07  7:21 UTC (permalink / raw)
  To: Anton Blanchard, Aneesh Kumar K.V
  Cc: paulus, mpe, bharata, arbab, npiggin, mikey, cyrilbur, linuxppc-dev

On Thu, 2017-09-07 at 15:17 +1000, Anton Blanchard wrote:
> Hi,
> 
> > There is a similar issue being worked on w.r.t pseries.
> > 
> > https://lkml.kernel.org/r/1502357028-27465-1-git-send-email-bharata@linux.vnet.ibm.com
> > 
> > The question is should we map these regions ? ie, we need to tell the 
> > kernel memory region that we would like to hot unplug later so that
> > we avoid doing kernel allocations from that. If we do that, then we
> > can possibly map them via 2M size ?
> 
> But all of memory on PowerNV should be able to be hot unplugged, so
> there are two options as I see it - either increase the memory block
> size, or map everything with 2MB pages. 

Or be smarter and map with 1G when blocks of 1G are available and break
down to 2M where necessary, it shouldn't be too hard.

Cheers,
Ben.
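
One way to read the suggestion above, as a userspace sketch only: walk a physical range and use a 1G mapping wherever the current address is 1G-aligned and at least 1G remains, otherwise fall back to 2M. `pick_map_size()` and `count_mappings()` are hypothetical names for illustration and do not correspond to existing kernel functions; the range is assumed to be 2M-aligned at both ends.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M (2ULL * 1024 * 1024)
#define SZ_1G (1024ULL * 1024 * 1024)

/* Hypothetical: choose the largest mapping size usable at addr. */
static uint64_t pick_map_size(uint64_t addr, uint64_t end)
{
	assert(addr < end);
	assert((addr & (SZ_2M - 1)) == 0);

	/* A 1G mapping needs 1G alignment and at least 1G left to map. */
	if ((addr & (SZ_1G - 1)) == 0 && end - addr >= SZ_1G)
		return SZ_1G;
	return SZ_2M;
}

/* Hypothetical: count how many 1G and 2M mappings cover [start, end). */
static void count_mappings(uint64_t start, uint64_t end,
			   uint64_t *n_1g, uint64_t *n_2m)
{
	assert((start & (SZ_2M - 1)) == 0 && (end & (SZ_2M - 1)) == 0);

	*n_1g = 0;
	*n_2m = 0;
	for (uint64_t addr = start; addr < end; ) {
		uint64_t sz = pick_map_size(addr, end);

		if (sz == SZ_1G)
			(*n_1g)++;
		else
			(*n_2m)++;
		addr += sz;
	}
}
```

For a range starting at 256MB and ending at 2GB + 256MB, this uses 2M mappings for the unaligned head and tail and a single 1G mapping for the aligned middle. It covers only the mapping side; splitting an existing 1G mapping at unplug time would still need the extra page table level discussed elsewhere in the thread.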


* Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
  2017-09-07  5:17   ` Anton Blanchard
  2017-09-07  7:21     ` Benjamin Herrenschmidt
@ 2017-09-07 15:59     ` Reza Arbab
  2017-09-08  1:15       ` Anton Blanchard
  1 sibling, 1 reply; 9+ messages in thread
From: Reza Arbab @ 2017-09-07 15:59 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Aneesh Kumar K.V, mikey, npiggin, paulus, bharata, linuxppc-dev,
	cyrilbur

On Thu, Sep 07, 2017 at 05:17:41AM +0000, Anton Blanchard wrote:
>But all of memory on PowerNV should be able to be hot unplugged, so
>there are two options as I see it - either increase the memory block
>size, or map everything with 2MB pages.

I may be misunderstanding this, but what if we did something like x86 
does? When trying to unplug a region smaller than the mapping, they fill 
that part of the pagetable with 0xFD instead of freeing the whole thing.  
Once the whole thing is 0xFD, free it.

See arch/x86/mm/init_64.c:remove_{pte,pmd,pud}_table()

---%<---
	memset((void *)addr, PAGE_INUSE, next - addr);

	page_addr = page_address(pte_page(*pte));
	if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
		...
		pte_clear(&init_mm, addr, pte);
		...
	}
---%<---

-- 
Reza Arbab
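
The fill-and-check pattern described above can be illustrated outside the kernel. This is a userspace sketch under stated assumptions: `all_bytes_equal()` stands in for the kernel's `memchr_inv()` check, `mark_removed()` is a hypothetical helper, and the 0xFD value for `PAGE_INUSE` is taken from the description in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_INUSE 0xFD	/* marker value as described in the message above */

/* Userspace stand-in for a memchr_inv() check: true if all n bytes equal c. */
static bool all_bytes_equal(const unsigned char *p, unsigned char c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (p[i] != c)
			return false;
	return true;
}

/*
 * Hypothetical helper: mark [off, off + len) of a backing region as removed.
 * Returns true once the whole region carries the marker, i.e. the point at
 * which the caller could free the region and clear the entry mapping it.
 */
static bool mark_removed(unsigned char *region, size_t region_size,
			 size_t off, size_t len)
{
	assert(off <= region_size && len <= region_size - off);

	memset(region + off, PAGE_INUSE, len);
	return all_bytes_equal(region, PAGE_INUSE, region_size);
}
```

Partial removals only mark their own part; the region is reported as freeable only after every part has been marked.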


* Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
  2017-09-07 15:59     ` Reza Arbab
@ 2017-09-08  1:15       ` Anton Blanchard
  2017-09-09 21:30         ` Michael Ellerman
  0 siblings, 1 reply; 9+ messages in thread
From: Anton Blanchard @ 2017-09-08  1:15 UTC (permalink / raw)
  To: Reza Arbab
  Cc: Aneesh Kumar K.V, mikey, npiggin, paulus, bharata, linuxppc-dev,
	cyrilbur, Benjamin Herrenschmidt

Hi Reza,

> I may be misunderstanding this, but what if we did something like x86 
> does? When trying to unplug a region smaller than the mapping, they
> fill that part of the pagetable with 0xFD instead of freeing the
> whole thing. Once the whole thing is 0xFD, free it.
> 
> See arch/x86/mm/init_64.c:remove_{pte,pmd,pud}_table()
> 
> ---%<---
> 	memset((void *)addr, PAGE_INUSE, next - addr);
> 
> 	page_addr = page_address(pte_page(*pte));
> 	if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
> 		...
> 		pte_clear(&init_mm, addr, pte);
> 		...
> 	}
> ---%<---

But you only have 1GB ptes at this point, you'd need to start
instantiating a new level in the tree, and populate 2MB ptes.

That is what Ben is suggesting. I'm happy to go any way (fix hotplug
to handle this, or increase the memblock size on PowerNV to 1GB), I just
need a solution.

Anton


* Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
  2017-09-07  7:21     ` Benjamin Herrenschmidt
@ 2017-09-08 21:51       ` Balbir Singh
  0 siblings, 0 replies; 9+ messages in thread
From: Balbir Singh @ 2017-09-08 21:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Anton Blanchard, Aneesh Kumar K.V, Michael Neuling,
	Nicholas Piggin, Paul Mackerras, Reza Arbab, Bharata B Rao,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	Cyril Bur

On Thu, Sep 7, 2017 at 5:21 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Thu, 2017-09-07 at 15:17 +1000, Anton Blanchard wrote:
>> Hi,
>>
>> > There is a similar issue being worked on w.r.t pseries.
>> >
>> > https://lkml.kernel.org/r/1502357028-27465-1-git-send-email-bharata@linux.vnet.ibm.com
>> >
>> > The question is should we map these regions ? ie, we need to tell the
>> > kernel memory region that we would like to hot unplug later so that
>> > we avoid doing kernel allocations from that. If we do that, then we
>> > can possibly map them via 2M size ?
>>
>> But all of memory on PowerNV should be able to be hot unplugged, so

For this, ideally we need movable mappings for the regions we intend
to hot-unplug, no? Otherwise there is no guarantee that hot-unplug
will work.

>> there are two options as I see it - either increase the memory block
>> size, or map everything with 2MB pages.
>
> Or be smarter and map with 1G when blocks of 1G are available and break
> down to 2M where necessary, it shouldn't be too hard.
>

The strict_rwx patches added helpers to do this.

Balbir Singh.


* Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
  2017-09-08  1:15       ` Anton Blanchard
@ 2017-09-09 21:30         ` Michael Ellerman
  0 siblings, 0 replies; 9+ messages in thread
From: Michael Ellerman @ 2017-09-09 21:30 UTC (permalink / raw)
  To: linuxppc-dev, Anton Blanchard, Reza Arbab
  Cc: mikey, npiggin, paulus, Aneesh Kumar K.V, bharata, cyrilbur


We should do the 1G block size as a fix, and backport it, and then make the hot unplug code smarter.

cheers

On 8 September 2017 11:15:47 am AEST, Anton Blanchard <anton@ozlabs.org> wrote:
>Hi Reza,
>
>> I may be misunderstanding this, but what if we did something like x86
>> does? When trying to unplug a region smaller than the mapping, they
>> fill that part of the pagetable with 0xFD instead of freeing the
>> whole thing. Once the whole thing is 0xFD, free it.
>> 
>> See arch/x86/mm/init_64.c:remove_{pte,pmd,pud}_table()
>> 
>> ---%<---
>> 	memset((void *)addr, PAGE_INUSE, next - addr);
>> 
>> 	page_addr = page_address(pte_page(*pte));
>> 	if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
>> 		...
>> 		pte_clear(&init_mm, addr, pte);
>> 		...
>> 	}
>> ---%<---
>
>But you only have 1GB ptes at this point, you'd need to start
>instantiating a new level in the tree, and populate 2MB ptes.
>
>That is what Ben is suggesting. I'm happy to go any way (fix hotplug
>to handle this, or increase the memblock size on PowerNV to 1GB), I
>just need a solution.
>
>Anton

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.



* Re: powerpc/powernv: Increase memory block size to 1GB on radix
  2017-09-07  5:05 [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix Anton Blanchard
  2017-09-07  5:09 ` Aneesh Kumar K.V
@ 2017-10-06 11:10 ` Michael Ellerman
  1 sibling, 0 replies; 9+ messages in thread
From: Michael Ellerman @ 2017-10-06 11:10 UTC (permalink / raw)
  To: Anton Blanchard, benh, paulus, bharata, arbab, npiggin, mikey,
	cyrilbur, aneesh.kumar
  Cc: linuxppc-dev

On Thu, 2017-09-07 at 05:05:51 UTC, Anton Blanchard wrote:
> From: Anton Blanchard <anton@samba.org>
> 
> Memory hot unplug on PowerNV radix hosts is broken. Our memory block
> size is 256MB but since we map the linear region with very large pages,
> each pte we tear down maps 1GB.
> 
> A hot unplug of one 256MB memory block results in 768MB of memory
> getting unintentionally unmapped. At this point we are likely to oops.
> 
> Fix this by increasing our memory block size to 1GB on PowerNV radix
> hosts.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/53ecde0b9126ff140abe3aefd7f0ec

cheers

