* Re: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
       [not found] <01010174769e2b68-a6f3768e-aef8-43c7-b357-a8cb1e17d3eb-000000@us-west-2.amazonses.com>
@ 2020-09-10  6:45   ` Anshuman Khandual
  2020-09-10  8:27   ` Steven Price
  1 sibling, 0 replies; 20+ messages in thread
From: Anshuman Khandual @ 2020-09-10  6:45 UTC (permalink / raw)
  To: Sudarshan Rajagopalan, linux-arm-kernel, linux-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Logan Gunthorpe,
	David Hildenbrand, Andrew Morton, Steven Price

Hello Sudarshan,

On 09/10/2020 11:35 AM, Sudarshan Rajagopalan wrote:
> When section mappings are enabled, we allocate vmemmap pages from physically
> contiguous memory of size PMD_SIZE using vmemmap_alloc_block_buf(). Section
> mappings are good to reduce TLB pressure. But when the system is highly fragmented
> and memory blocks are being hot-added at runtime, it's possible that such
> physically contiguous memory allocations can fail. Rather than failing the

Did you really see this happen on a system?

> memory hot-add procedure, add a fallback option to allocate vmemmap pages from
> discontinuous pages using vmemmap_populate_basepages().

Which could lead to a mixed page size mapping in the VMEMMAP area.
An allocation failure in vmemmap_populate() should just cleanly fail
the memory hot-add operation, which can then be retried. Why does the
retry have to be offloaded to the kernel?

> 
> Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Steven Price <steven.price@arm.com>
> ---
>  arch/arm64/mm/mmu.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 75df62f..a46c7d4 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  	p4d_t *p4dp;
>  	pud_t *pudp;
>  	pmd_t *pmdp;
> +	int ret = 0;
>  
>  	do {
>  		next = pmd_addr_end(addr, end);
> @@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  			void *p = NULL;
>  
>  			p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
> -			if (!p)
> -				return -ENOMEM;
> +			if (!p) {
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +				vmemmap_free(start, end, altmap);
> +#endif

The mapping was never created in the first place, as the allocation
failed. vmemmap_free() here will free an unmapped area!

> +				ret = -ENOMEM;
> +				break;
> +			}
>  
>  			pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
>  		} else
>  			vmemmap_verify((pte_t *)pmdp, node, addr, next);
>  	} while (addr = next, addr != end);
>  
> -	return 0;
> +	if (ret)
> +		return vmemmap_populate_basepages(start, end, node, altmap);
> +	else
> +		return ret;
>  }
>  #endif	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
>  void vmemmap_free(unsigned long start, unsigned long end,
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
  2020-09-10  6:45   ` Anshuman Khandual
@ 2020-09-10  8:08     ` David Hildenbrand
  -1 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-09-10  8:08 UTC (permalink / raw)
  To: Anshuman Khandual, Sudarshan Rajagopalan, linux-arm-kernel, linux-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Logan Gunthorpe,
	Andrew Morton, Steven Price

On 10.09.20 08:45, Anshuman Khandual wrote:
> Hello Sudarshan,
> 
> On 09/10/2020 11:35 AM, Sudarshan Rajagopalan wrote:
>> When section mappings are enabled, we allocate vmemmap pages from physically
>> contiguous memory of size PMD_SIZE using vmemmap_alloc_block_buf(). Section
>> mappings are good to reduce TLB pressure. But when the system is highly fragmented
>> and memory blocks are being hot-added at runtime, it's possible that such
>> physically contiguous memory allocations can fail. Rather than failing the
> 
> Did you really see this happen on a system ?
> 
>> memory hot-add procedure, add a fallback option to allocate vmemmap pages from
>> discontinuous pages using vmemmap_populate_basepages().
> 
> Which could lead to a mixed page size mapping in the VMEMMAP area.

Right, which gives you a slight performance hit - nobody really cares,
especially if it happens in corner cases only.

At least x86_64 (see vmemmap_populate_hugepages()) and s390x (added
recently by me) implement that behavior.

Assume you run in a virtualized environment where your hypervisor tries
to do some smart dynamic guest resizing - like monitoring the guest
memory consumption and adding more memory on demand. You would much rather
want hotadd to succeed (in these corner cases) than have it fail just because
you weren't able to grab a huge page in one instance.

Examples include the XEN balloon, Hyper-V balloon, and virtio-mem. We might
see some of these for arm64 as well (if we don't already).

> Allocation failure in vmemmap_populate() should just cleanly fail
> the memory hot add operation, which can then be retried. Why the
> retry has to be offloaded to kernel ?

(not sure what "offloaded to kernel" really means here - add_memory() is
also just triggered from the kernel) I disagree, we should try our best
to add memory and make it available, especially when short on memory
already.

-- 
Thanks,

David / dhildenb


* RE: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
  2020-09-10  6:45   ` Anshuman Khandual
                     ` (2 preceding siblings ...)
  (?)
@ 2020-09-10  8:27   ` sudaraja
  -1 siblings, 0 replies; 20+ messages in thread
From: sudaraja @ 2020-09-10  8:27 UTC (permalink / raw)
  To: 'Anshuman Khandual', linux-arm-kernel, linux-kernel
  Cc: 'Catalin Marinas', 'Will Deacon',
	'Mark Rutland', 'Logan Gunthorpe',
	'David Hildenbrand', 'Andrew Morton',
	'Steven Price',
	pratikp

Hello Anshuman,

>On 09/10/2020 11:35 AM, Sudarshan Rajagopalan wrote:
>> When section mappings are enabled, we allocate vmemmap pages from
>> physically contiguous memory of size PMD_SIZE using
>> vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB
>> pressure. But when the system is highly fragmented and memory blocks are
>> being hot-added at runtime, it's possible that such physically
>> contiguous memory allocations can fail. Rather than failing the
>
>Did you really see this happen on a system ?

Thanks for the response.

Yes, this happened on a system with very low RAM (~120MB) where no free order-9 pages were present. Pasting a few kernel logs below. On systems with low RAM, there is a high probability that memory is fragmented and no higher-order pages are free. In such scenarios, the vmemmap allocation would fail for PMD_SIZE of contiguous memory.

We have a usecase for memory sharing between VMs where one of the VMs uses add_memory() to add the memory that was donated by the other VM. This uses something similar to VirtIO-Mem. And this requires memory to be _guaranteed_ to be added in the VM so that the usecase can run without any failure.

vmemmap alloc failure: order:9, mode:0x4cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 1 PID: 294 Comm: -------- Tainted: G S                5.4.50 #1
Call trace:
 dump_stack+0xa4/0xdc
 warn_alloc+0x104/0x160
 vmemmap_alloc_block+0xe4/0xf4
 vmemmap_alloc_block_buf+0x34/0x38
 vmemmap_populate+0xc8/0x224
 __populate_section_memmap+0x34/0x54
 sparse_add_section+0x16c/0x254
 __add_pages+0xd0/0x138
 arch_add_memory+0x114/0x1a8

DMA32: 2627*4kB (UMC) 23*8kB (UME) 6*16kB (UM) 8*32kB (UME) 2*64kB (ME) 2*128kB (UE) 1*256kB (M) 2*512kB (ME) 1*1024kB (M) 0*2048kB 0*4096kB = 13732kB
30455 pages RAM

But keeping this usecase aside, won't this be problematic on any system with low RAM where an order-9 allocation would fail because the system is fragmented, so that any memory hot-add would fail? The same applies to other users, similar to VirtIO-Mem, that use arch_add_memory().

>
>> memory hot-add procedure, add a fallback option to allocate vmemmap 
>> pages from discontinuous pages using vmemmap_populate_basepages().
>
>Which could lead to a mixed page size mapping in the VMEMMAP area.

Would this be problematic? We would only lose one section mapping per failure, which slightly increases TLB pressure. Also, we would anyway do discontiguous page allocation for systems having non-4K base pages (ARM64_SWAPPER_USES_SECTION_MAPS will be 0). I only see a small cost to performance due to the slight TLB pressure.

>Allocation failure in vmemmap_populate() should just cleanly fail the memory hot add operation, which can then be retried. Why the retry has to be offloaded to kernel ?

While a retry can be attempted again, it won't help in cases where there are no order-9 pages available; any retry would just not succeed until an order-9 page gets freed. Here we are just falling back to discontiguous page allocation to help memory hot-add succeed as best as possible.

Thanks and Regards,
Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

* Re: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
       [not found] <01010174769e2b68-a6f3768e-aef8-43c7-b357-a8cb1e17d3eb-000000@us-west-2.amazonses.com>
@ 2020-09-10  8:27   ` Steven Price
  2020-09-10  8:27   ` Steven Price
  1 sibling, 0 replies; 20+ messages in thread
From: Steven Price @ 2020-09-10  8:27 UTC (permalink / raw)
  To: Sudarshan Rajagopalan, linux-arm-kernel, linux-kernel
  Cc: Catalin Marinas, Will Deacon, Anshuman Khandual, Mark Rutland,
	Logan Gunthorpe, David Hildenbrand, Andrew Morton

On 10/09/2020 07:05, Sudarshan Rajagopalan wrote:
> When section mappings are enabled, we allocate vmemmap pages from physically
> contiguous memory of size PMD_SIZE using vmemmap_alloc_block_buf(). Section
> mappings are good to reduce TLB pressure. But when the system is highly fragmented
> and memory blocks are being hot-added at runtime, it's possible that such
> physically contiguous memory allocations can fail. Rather than failing the
> memory hot-add procedure, add a fallback option to allocate vmemmap pages from
> discontinuous pages using vmemmap_populate_basepages().
> 
> Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Steven Price <steven.price@arm.com>
> ---
>   arch/arm64/mm/mmu.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 75df62f..a46c7d4 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>   	p4d_t *p4dp;
>   	pud_t *pudp;
>   	pmd_t *pmdp;
> +	int ret = 0;
>   
>   	do {
>   		next = pmd_addr_end(addr, end);
> @@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>   			void *p = NULL;
>   
>   			p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
> -			if (!p)
> -				return -ENOMEM;
> +			if (!p) {
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +				vmemmap_free(start, end, altmap);
> +#endif
> +				ret = -ENOMEM;
> +				break;
> +			}
>   
>   			pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
>   		} else
>   			vmemmap_verify((pte_t *)pmdp, node, addr, next);
>   	} while (addr = next, addr != end);
>   
> -	return 0;
> +	if (ret)
> +		return vmemmap_populate_basepages(start, end, node, altmap);
> +	else
> +		return ret;

Style comment: I find this usage of 'ret' confusing. When we assign 
-ENOMEM above that is never actually the return value of the function 
(in that case vmemmap_populate_basepages() provides the actual return 
value).

Also the "return ret" is misleading since we know by that point that 
ret==0 (and the 'else' is redundant).

Can you not just move the call to vmemmap_populate_basepages() up to 
just after the (possible) vmemmap_free() call and remove the 'ret' variable?

AFAICT the call to vmemmap_free() also doesn't need the #ifdef as the 
function is a no-op if CONFIG_MEMORY_HOTPLUG isn't set. I also feel you 
need at least a comment to explain Anshuman's point that it looks like 
you're freeing an unmapped area. Although if I'm reading the code 
correctly it seems like the unmapped area will just be skipped.

Steve

>   }
>   #endif	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
>   void vmemmap_free(unsigned long start, unsigned long end,
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
@ 2020-09-10  8:27   ` Steven Price
  0 siblings, 0 replies; 20+ messages in thread
From: Steven Price @ 2020-09-10  8:27 UTC (permalink / raw)
  To: Sudarshan Rajagopalan, linux-arm-kernel, linux-kernel
  Cc: Mark Rutland, Will Deacon, David Hildenbrand, Catalin Marinas,
	Anshuman Khandual, Andrew Morton, Logan Gunthorpe

On 10/09/2020 07:05, Sudarshan Rajagopalan wrote:
> When section mappings are enabled, we allocate vmemmap pages from physically
> continuous memory of size PMD_SZIE using vmemmap_alloc_block_buf(). Section
> mappings are good to reduce TLB pressure. But when system is highly fragmented
> and memory blocks are being hot-added at runtime, its possible that such
> physically continuous memory allocations can fail. Rather than failing the
> memory hot-add procedure, add a fallback option to allocate vmemmap pages from
> discontinuous pages using vmemmap_populate_basepages().
> 
> Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Steven Price <steven.price@arm.com>
> ---
>   arch/arm64/mm/mmu.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 75df62f..a46c7d4 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>   	p4d_t *p4dp;
>   	pud_t *pudp;
>   	pmd_t *pmdp;
> +	int ret = 0;
>   
>   	do {
>   		next = pmd_addr_end(addr, end);
> @@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>   			void *p = NULL;
>   
>   			p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
> -			if (!p)
> -				return -ENOMEM;
> +			if (!p) {
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +				vmemmap_free(start, end, altmap);
> +#endif
> +				ret = -ENOMEM;
> +				break;
> +			}
>   
>   			pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
>   		} else
>   			vmemmap_verify((pte_t *)pmdp, node, addr, next);
>   	} while (addr = next, addr != end);
>   
> -	return 0;
> +	if (ret)
> +		return vmemmap_populate_basepages(start, end, node, altmap);
> +	else
> +		return ret;

Style comment: I find this usage of 'ret' confusing. When we assign 
-ENOMEM above that is never actually the return value of the function 
(in that case vmemmap_populate_basepages() provides the actual return 
value).

Also the "return ret" is misleading since we know by that point that 
ret==0 (and the 'else' is redundant).

Can you not just move the call to vmemmap_populate_basepages() up to 
just after the (possible) vmemmap_free() call and remove the 'ret' variable?
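
To make the suggested restructure concrete, here is a hedged userspace C sketch of the control flow (hypothetical stand-in helpers, not the real arm64 mmu.c code): the fallback call comes right after the cleanup, so no 'ret' variable is needed and the fallback supplies the function's return value directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel helpers under discussion; this
 * models only the control flow, not the real page-table code. */
static int huge_alloc_ok;                 /* simulate PMD_SIZE allocation */

static void *alloc_huge(void)
{
	return huge_alloc_ok ? malloc(1) : NULL;
}

static void free_partial(void)            /* vmemmap_free() analogue */
{
}

static int populate_basepages(void)       /* base-page fallback, succeeds */
{
	return 0;
}

/* The suggested shape: on failure, clean up and immediately return the
 * fallback's result; the success path simply returns 0. */
static int populate(void)
{
	void *p = alloc_huge();

	if (!p) {
		free_partial();
		return populate_basepages();
	}
	free(p);                          /* stands in for pmd_set_huge() */
	return 0;
}
```

Either path yields a single, unambiguous return value, which is the point of Steven's comment.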

AFAICT the call to vmemmap_free() also doesn't need the #ifdef as the 
function is a no-op if CONFIG_MEMORY_HOTPLUG isn't set. I also feel you 
need at least a comment to explain Anshuman's point that it looks like 
you're freeing an unmapped area. Although if I'm reading the code 
correctly it seems like the unmapped area will just be skipped.

Steve

>   }
>   #endif	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
>   void vmemmap_free(unsigned long start, unsigned long end,
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
  2020-09-10  8:27   ` Steven Price
@ 2020-09-10 10:50     ` Anshuman Khandual
  -1 siblings, 0 replies; 20+ messages in thread
From: Anshuman Khandual @ 2020-09-10 10:50 UTC (permalink / raw)
  To: Steven Price, Sudarshan Rajagopalan, linux-arm-kernel, linux-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Logan Gunthorpe,
	David Hildenbrand, Andrew Morton



On 09/10/2020 01:57 PM, Steven Price wrote:
> On 10/09/2020 07:05, Sudarshan Rajagopalan wrote:
>> When section mappings are enabled, we allocate vmemmap pages from physically
>> contiguous memory of size PMD_SIZE using vmemmap_alloc_block_buf(). Section
>> mappings are good to reduce TLB pressure. But when the system is highly fragmented
>> and memory blocks are being hot-added at runtime, it's possible that such
>> physically contiguous memory allocations can fail. Rather than failing the
>> memory hot-add procedure, add a fallback option to allocate vmemmap pages from
>> discontiguous pages using vmemmap_populate_basepages().
>>
>> Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Logan Gunthorpe <logang@deltatee.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Steven Price <steven.price@arm.com>
>> ---
>>   arch/arm64/mm/mmu.c | 15 ++++++++++++---
>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 75df62f..a46c7d4 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>       p4d_t *p4dp;
>>       pud_t *pudp;
>>       pmd_t *pmdp;
>> +    int ret = 0;
>>         do {
>>           next = pmd_addr_end(addr, end);
>> @@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>               void *p = NULL;
>>                 p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
>> -            if (!p)
>> -                return -ENOMEM;
>> +            if (!p) {
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +                vmemmap_free(start, end, altmap);
>> +#endif
>> +                ret = -ENOMEM;
>> +                break;
>> +            }
>>                 pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
>>           } else
>>               vmemmap_verify((pte_t *)pmdp, node, addr, next);
>>       } while (addr = next, addr != end);
>>   -    return 0;
>> +    if (ret)
>> +        return vmemmap_populate_basepages(start, end, node, altmap);
>> +    else
>> +        return ret;
> 
> Style comment: I find this usage of 'ret' confusing. When we assign -ENOMEM above that is never actually the return value of the function (in that case vmemmap_populate_basepages() provides the actual return value).

Right.

> 
> Also the "return ret" is misleading since we know by that point that ret==0 (and the 'else' is redundant).

Right.

> 
> Can you not just move the call to vmemmap_populate_basepages() up to just after the (possible) vmemmap_free() call and remove the 'ret' variable?
> 
> AFAICT the call to vmemmap_free() also doesn't need the #ifdef as the function is a no-op if CONFIG_MEMORY_HOTPLUG isn't set. I also feel you 

Right, CONFIG_MEMORY_HOTPLUG is not required.

> need at least a comment to explain Anshuman's point that it looks like you're freeing an unmapped area. Although if I'm reading the code correctly it seems like the unmapped area will just be skipped.

The proposed vmemmap_free() attempts to free the entire requested vmemmap range
[start, end] when an intermediate PMD entry cannot be allocated. Hence even
if vmemmap_free() could skip an unmapped area (will double check on that), it
unnecessarily goes through large sections of unmapped range, which could not
have been mapped.

So, basically there could be two different methods for doing this fallback.

1. Call vmemmap_populate_basepages() for sections when PMD_SIZE allocation fails

	- vmemmap_free() need not be called

2. Abort at the first instance of PMD_SIZE allocation failure

	- Call vmemmap_free() to unmap all sections mapped till that point
	- Call vmemmap_populate_basepages() to map the entire request section

The proposed patch tried to mix both approaches. Regardless, the first approach
here seems better and is the case in vmemmap_populate_hugepages() implementation
on x86 as well.
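
As a hedged userspace sketch of the first approach (hypothetical helpers, loosely modeled on the shape of x86's vmemmap_populate_hugepages()): each section that cannot get a PMD_SIZE block falls back to base pages on its own, so nothing mapped so far ever needs freeing.

```c
#include <assert.h>
#include <stdlib.h>

#define NSECT 4                    /* pretend the request spans 4 PMDs */

static unsigned int fail_mask;     /* bit i set: huge alloc for PMD i fails */
static int basepage_sections;      /* sections that fell back to base pages */

static void *alloc_huge(int i)
{
	return (fail_mask & (1u << i)) ? NULL : malloc(1);
}

static int populate_basepages(int i)   /* per-section fallback, succeeds */
{
	(void)i;
	basepage_sections++;
	return 0;
}

/* Approach 1: when a PMD_SIZE block is unavailable, map only that section
 * with base pages and keep going; no vmemmap_free() pass is required. */
static int populate_range(void)
{
	int i, ret;

	for (i = 0; i < NSECT; i++) {
		void *p = alloc_huge(i);

		if (!p) {
			ret = populate_basepages(i);
			if (ret)
				return ret;   /* fail only if base pages fail */
			continue;
		}
		free(p);                      /* stands in for pmd_set_huge() */
	}
	return 0;                             /* mixed mapping is acceptable */
}
```

The cost of a failure is one section mapped with base pages rather than an aborted hot-add of the whole range.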



* Re: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
  2020-09-10  8:08     ` David Hildenbrand
@ 2020-09-10 10:58       ` Anshuman Khandual
  -1 siblings, 0 replies; 20+ messages in thread
From: Anshuman Khandual @ 2020-09-10 10:58 UTC (permalink / raw)
  To: David Hildenbrand, Sudarshan Rajagopalan, linux-arm-kernel, linux-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Logan Gunthorpe,
	Andrew Morton, Steven Price



On 09/10/2020 01:38 PM, David Hildenbrand wrote:
> On 10.09.20 08:45, Anshuman Khandual wrote:
>> Hello Sudarshan,
>>
>> On 09/10/2020 11:35 AM, Sudarshan Rajagopalan wrote:
>>> When section mappings are enabled, we allocate vmemmap pages from physically
>>> contiguous memory of size PMD_SIZE using vmemmap_alloc_block_buf(). Section
>>> mappings are good to reduce TLB pressure. But when the system is highly fragmented
>>> and memory blocks are being hot-added at runtime, it's possible that such
>>> physically contiguous memory allocations can fail. Rather than failing the
>>
>> Did you really see this happen on a system ?
>>
>>> memory hot-add procedure, add a fallback option to allocate vmemmap pages from
>>> discontinuous pages using vmemmap_populate_basepages().
>>
>> Which could lead to a mixed page size mapping in the VMEMMAP area.
> 
> Right, which gives you a slight performance hit - nobody really cares,
> especially if it happens in corner cases only.

On performance impact, will probably let Catalin and others comment from
arm64 platform perspective, because I might not have all information here.
But will do some more audit regarding possible impact of a mixed page size
vmemmap mapping.

> 
> At least x86_64 (see vmemmap_populate_hugepages()) and s390x (added
> recently by me) implement that behavior.
> 
> Assume you run in a virtualized environment where your hypervisor tries
> to do some smart dynamic guest resizing - like monitoring the guest
> memory consumption and adding more memory on demand. You much rather
> want hotadd to succeed (in these corner cases) than failing just because
> you weren't able to grab a huge page in one instance.
> 
> Examples include XEN balloon, Hyper-V balloon, and virtio-mem. We might
> see some of these for arm64 as well (if we don't already).

Makes sense.

> 
>> Allocation failure in vmemmap_populate() should just cleanly fail
>> the memory hot add operation, which can then be retried. Why the
>> retry has to be offloaded to kernel ?
> 
> (not sure what "offloaded to kernel" really means here - add_memory() is

"Offloaded" here referred to the responsibility to retry or to fall back:
the situation could be resolved by the user retrying the hot add operation
till it succeeds, rather than by the kernel falling back to allocating
normal pages.

> also just triggered from the kernel) I disagree, we should try our best
> to add memory and make it available, especially when short on memory
> already.

Okay.



* Re: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
  2020-09-10  8:27   ` sudaraja
@ 2020-09-10 11:16       ` Anshuman Khandual
  0 siblings, 0 replies; 20+ messages in thread
From: Anshuman Khandual @ 2020-09-10 11:16 UTC (permalink / raw)
  To: sudaraja, linux-arm-kernel, linux-kernel
  Cc: 'Mark Rutland', 'Will Deacon',
	'David Hildenbrand', 'Catalin Marinas',
	'Steven Price', 'Andrew Morton',
	'Logan Gunthorpe',
	pratikp



On 09/10/2020 01:57 PM, sudaraja@codeaurora.org wrote:
> Hello Anshuman,
> 
>> On 09/10/2020 11:35 AM, Sudarshan Rajagopalan wrote:
>>> When section mappings are enabled, we allocate vmemmap pages from 
>>> physically contiguous memory of size PMD_SIZE using 
>>> vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB 
>>> pressure. But when the system is highly fragmented and memory blocks are 
>>> being hot-added at runtime, it's possible that such physically 
>>> contiguous memory allocations can fail. Rather than failing the
>>
>> Did you really see this happen on a system ?
> 
> Thanks for the response.

There seems to be some text alignment problem in your response on this
thread, please have a look.

> 
> Yes, this happened on a system with very low RAM (size ~120MB) where no free order-9 pages were present. Pasting a few kernel logs below. On systems with low RAM, there is a high probability that memory is fragmented and no higher-order pages are free. In such scenarios, a vmemmap allocation of PMD_SIZE contiguous memory would fail.
> 
> We have a usecase for memory sharing between VMs where one of the VMs uses add_memory() to add the memory that was donated by the other VM. This uses something similar to VirtIO-Mem. And this requires memory to be _guaranteed_ to be added in the VM so that the usecase can run without any failure.
> 
> vmemmap alloc failure: order:9, mode:0x4cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL), nodemask=(null),cpuset=/,mems_allowed=0
> CPU: 1 PID: 294 Comm: -------- Tainted: G S                5.4.50 #1
> Call trace:
>  dump_stack+0xa4/0xdc
>  warn_alloc+0x104/0x160
>  vmemmap_alloc_block+0xe4/0xf4
>  vmemmap_alloc_block_buf+0x34/0x38
>  vmemmap_populate+0xc8/0x224
>  __populate_section_memmap+0x34/0x54
>  sparse_add_section+0x16c/0x254
>  __add_pages+0xd0/0x138
>  arch_add_memory+0x114/0x1a8
> 
> DMA32: 2627*4kB (UMC) 23*8kB (UME) 6*16kB (UM) 8*32kB (UME) 2*64kB (ME) 2*128kB (UE) 1*256kB (M) 2*512kB (ME) 1*1024kB (M) 0*2048kB 0*4096kB = 13732kB
> 30455 pages RAM
> 
> But keeping this usecase aside, won't this be problematic on any system with low RAM, where an order-9 allocation would fail because of fragmentation and hence any memory hot-add would fail? The same applies to other VirtIO-Mem-like users of arch_add_memory().
> 
>>
>>> memory hot-add procedure, add a fallback option to allocate vmemmap 
>>> pages from discontinuous pages using vmemmap_populate_basepages().
>>
>> Which could lead to a mixed page size mapping in the VMEMMAP area.
> 
> Would this be problematic? We would only lose one section mapping per failure, which slightly increases TLB pressure. Also, we anyway do discontiguous page allocation on systems with non-4K pages (where ARM64_SWAPPER_USES_SECTION_MAPS is 0). I only see a small performance cost due to the slight TLB pressure.
> 
>> Allocation failure in vmemmap_populate() should just cleanly fail the memory hot add operation, which can then be retried. Why the retry has to be offloaded to kernel ?
> 
> While a retry can be attempted, it won't help in cases where no order-9 pages are available; any retry would keep failing until an order-9 page gets freed. Here we are just falling back to discontiguous page allocation to let memory hot-add succeed as best as possible.

Understood. It seems there are enough potential use cases and scenarios
right now to consider this fallback mechanism and a possible mixed page
size vmemmap. But I would let others weigh in on the performance impact.

> 
> Thanks and Regards,
> Sudarshan
> 
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
> 
> -----Original Message-----
> From: Anshuman Khandual <anshuman.khandual@arm.com> 
> Sent: Wednesday, September 9, 2020 11:45 PM
> To: Sudarshan Rajagopalan <sudaraja@codeaurora.org>; linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org
> Cc: Catalin Marinas <catalin.marinas@arm.com>; Will Deacon <will@kernel.org>; Mark Rutland <mark.rutland@arm.com>; Logan Gunthorpe <logang@deltatee.com>; David Hildenbrand <david@redhat.com>; Andrew Morton <akpm@linux-foundation.org>; Steven Price <steven.price@arm.com>
> Subject: Re: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
> 
> Hello Sudarshan,
> 
> On 09/10/2020 11:35 AM, Sudarshan Rajagopalan wrote:
>> When section mappings are enabled, we allocate vmemmap pages from 
>> physically contiguous memory of size PMD_SIZE using 
>> vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB 
>> pressure. But when the system is highly fragmented and memory blocks are 
>> being hot-added at runtime, it's possible that such physically 
>> contiguous memory allocations can fail. Rather than failing the
> 
> Did you really see this happen on a system ?
> 
>> memory hot-add procedure, add a fallback option to allocate vmemmap 
>> pages from discontinuous pages using vmemmap_populate_basepages().
> 
> Which could lead to a mixed page size mapping in the VMEMMAP area.
> Allocation failure in vmemmap_populate() should just cleanly fail the memory hot add operation, which can then be retried. Why the retry has to be offloaded to kernel ?
> 
>>
>> Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Logan Gunthorpe <logang@deltatee.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Steven Price <steven.price@arm.com>
>> ---
>>  arch/arm64/mm/mmu.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 
>> 75df62f..a46c7d4 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>  	p4d_t *p4dp;
>>  	pud_t *pudp;
>>  	pmd_t *pmdp;
>> +	int ret = 0;
>>  
>>  	do {
>>  		next = pmd_addr_end(addr, end);
>> @@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>  			void *p = NULL;
>>  
>>  			p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
>> -			if (!p)
>> -				return -ENOMEM;
>> +			if (!p) {
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +				vmemmap_free(start, end, altmap); #endif
> 
> The mapping was never created in the first place, as the allocation failed. vmemmap_free() here will free an unmapped area !
> 
>> +				ret = -ENOMEM;
>> +				break;
>> +			}
>>  
>>  			pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
>>  		} else
>>  			vmemmap_verify((pte_t *)pmdp, node, addr, next);
>>  	} while (addr = next, addr != end);
>>  
>> -	return 0;
>> +	if (ret)
>> +		return vmemmap_populate_basepages(start, end, node, altmap);
>> +	else
>> +		return ret;
>>  }
>>  #endif	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
>>  void vmemmap_free(unsigned long start, unsigned long end,
>>
> 


* Re: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
@ 2020-09-10 11:16       ` Anshuman Khandual
  0 siblings, 0 replies; 20+ messages in thread
From: Anshuman Khandual @ 2020-09-10 11:16 UTC (permalink / raw)
  To: sudaraja, linux-arm-kernel, linux-kernel
  Cc: 'Mark Rutland', 'David Hildenbrand',
	'Catalin Marinas', 'Steven Price',
	'Logan Gunthorpe', 'Andrew Morton',
	'Will Deacon',
	pratikp



On 09/10/2020 01:57 PM, sudaraja@codeaurora.org wrote:
> Hello Anshuman,
> 
>> On 09/10/2020 11:35 AM, Sudarshan Rajagopalan wrote:
>>> When section mappings are enabled, we allocate vmemmap pages from 
>>> physically continuous memory of size PMD_SZIE using 
>>> vmemmap_alloc_block_buf(). Section> mappings are good to reduce TLB 
>>> pressure. But when system is highly fragmented and memory blocks are 
>>> being hot-added at runtime, its possible that such physically 
>>> continuous memory allocations can fail. Rather than failing the
>>
>> Did you really see this happen on a system ?
> 
> Thanks for the response.

There seems to be some text alignment problem in your response on this
thread, please have a look.

> 
> Yes, this happened on a system with very low RAM (size ~120MB) where no free order-9 pages were present. Pasting below few kernel logs. On systems with low RAM, its high probability where memory is fragmented and no higher order pages are free. On such scenarios, vmemmap alloc would fail for PMD_SIZE of contiguous memory.
> 
> We have a usecase for memory sharing between VMs where one of the VM uses add_memory() to add the memory that was donated by the other VM. This uses something similar to VirtIO-Mem. And this requires memory to be _guaranteed_ to be added in the VM so that the usecase can run without any failure.
> 
> vmemmap alloc failure: order:9, mode:0x4cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL), nodemask=(null),cpuset=/,mems_allowed=0
> CPU: 1 PID: 294 Comm: -------- Tainted: G S                5.4.50 #1
> Call trace:
>  dump_stack+0xa4/0xdc
>  warn_alloc+0x104/0x160
>  vmemmap_alloc_block+0xe4/0xf4
>  vmemmap_alloc_block_buf+0x34/0x38
>  vmemmap_populate+0xc8/0x224
>  __populate_section_memmap+0x34/0x54
>  sparse_add_section+0x16c/0x254
>  __add_pages+0xd0/0x138
>  arch_add_memory+0x114/0x1a8
> 
> DMA32: 2627*4kB (UMC) 23*8kB (UME) 6*16kB (UM) 8*32kB (UME) 2*64kB (ME) 2*128kB (UE) 1*256kB (M) 2*512kB (ME) 1*1024kB (M) 0*2048kB 0*4096kB = 13732kB
> 30455 pages RAM
> 
> But keeping this usecase aside, won’t this be problematic on any systems with low RAM where order-9 alloc would fail on a fragmented system, and any memory hot-adding would fail? Or other similar users of VirtIO-Mem which uses arch_add_memory.
> 
>>
>>> memory hot-add procedure, add a fallback option to allocate vmemmap 
>>> pages from discontinuous pages using vmemmap_populate_basepages().
>>
>> Which could lead to a mixed page size mapping in the VMEMMAP area.
> 
> Would this be problematic? We would only lose one section mapping per failure and increases slight TLB pressure. Also, we would anyway do discontinuous pages alloc for systems having non-4K pages (ARM64_SWAPPER_USES_SECTION_MAPS will be 0). I only see a small cost to performance due to slight TLB pressure.
> 
>> Allocation failure in vmemmap_populate() should just cleanly fail the memory hot add operation, which can then be retried. Why the retry has to be offloaded to kernel ?
> 
> While a retry can attempted again, but it won’t help in cases where there are no order-9 pages available and any retry would just not succeed again until a order-9 page gets free'ed. Here we are just falling back to use discontinuous pages allocation to help succeed memory hot-add as best as possible.

Understood, seems like there is enough potential use cases and scenarios
right now, to consider this fallback mechanism and a possible mixed page
size vmemmap. But I would let others weigh in, on the performance impact.

> 
> Thanks and Regards,
> Sudarshan
> 
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
> 
> -----Original Message-----
> From: Anshuman Khandual <anshuman.khandual@arm.com> 
> Sent: Wednesday, September 9, 2020 11:45 PM
> To: Sudarshan Rajagopalan <sudaraja@codeaurora.org>; linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org
> Cc: Catalin Marinas <catalin.marinas@arm.com>; Will Deacon <will@kernel.org>; Mark Rutland <mark.rutland@arm.com>; Logan Gunthorpe <logang@deltatee.com>; David Hildenbrand <david@redhat.com>; Andrew Morton <akpm@linux-foundation.org>; Steven Price <steven.price@arm.com>
> Subject: Re: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
> 
> Hello Sudarshan,
> 
> On 09/10/2020 11:35 AM, Sudarshan Rajagopalan wrote:
>> When section mappings are enabled, we allocate vmemmap pages from 
>> physically continuous memory of size PMD_SZIE using 
>> vmemmap_alloc_block_buf(). Section> mappings are good to reduce TLB 
>> pressure. But when system is highly fragmented and memory blocks are 
>> being hot-added at runtime, its possible that such physically 
>> continuous memory allocations can fail. Rather than failing the
> 
> Did you really see this happen on a system ?
> 
>> memory hot-add procedure, add a fallback option to allocate vmemmap 
>> pages from discontinuous pages using vmemmap_populate_basepages().
> 
> Which could lead to a mixed page size mapping in the VMEMMAP area.
> Allocation failure in vmemmap_populate() should just cleanly fail the memory hot add operation, which can then be retried. Why the retry has to be offloaded to kernel ?
> 
>>
>> Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Logan Gunthorpe <logang@deltatee.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Steven Price <steven.price@arm.com>
>> ---
>>  arch/arm64/mm/mmu.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 
>> 75df62f..a46c7d4 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>  	p4d_t *p4dp;
>>  	pud_t *pudp;
>>  	pmd_t *pmdp;
>> +	int ret = 0;
>>  
>>  	do {
>>  		next = pmd_addr_end(addr, end);
>> @@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>  			void *p = NULL;
>>  
>>  			p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
>> -			if (!p)
>> -				return -ENOMEM;
>> +			if (!p) {
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +				vmemmap_free(start, end, altmap); #endif
> 
> The mapping was never created in the first place, as the allocation failed. vmemmap_free() here will free an unmapped area !
> 
>> +				ret = -ENOMEM;
>> +				break;
>> +			}
>>  
>>  			pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
>>  		} else
>>  			vmemmap_verify((pte_t *)pmdp, node, addr, next);
>>  	} while (addr = next, addr != end);
>>  
>> -	return 0;
>> +	if (ret)
>> +		return vmemmap_populate_basepages(start, end, node, altmap);
>> +	else
>> +		return ret;
>>  }
>>  #endif	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
>>  void vmemmap_free(unsigned long start, unsigned long end,
>>
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
  2020-09-10 10:50     ` Anshuman Khandual
  (?)
@ 2020-09-10 20:48     ` sudaraja
  2020-09-21 17:43         ` Will Deacon
  -1 siblings, 1 reply; 20+ messages in thread
From: sudaraja @ 2020-09-10 20:48 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Steven Price, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Will Deacon, Mark Rutland, Logan Gunthorpe, David Hildenbrand,
	Andrew Morton, pratikp

On 2020-09-10 03:50, Anshuman Khandual wrote:
> On 09/10/2020 01:57 PM, Steven Price wrote:
>> On 10/09/2020 07:05, Sudarshan Rajagopalan wrote:
>>> When section mappings are enabled, we allocate vmemmap pages from
>>> physically contiguous memory of size PMD_SIZE using
>>> vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB
>>> pressure. But when the system is highly fragmented and memory blocks
>>> are being hot-added at runtime, it's possible that such physically
>>> contiguous memory allocations can fail. Rather than failing the
>>> memory hot-add procedure, add a fallback option to allocate vmemmap
>>> pages from discontiguous pages using vmemmap_populate_basepages().
>>> 
>>> Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: Logan Gunthorpe <logang@deltatee.com>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Steven Price <steven.price@arm.com>
>>> ---
>>>   arch/arm64/mm/mmu.c | 15 ++++++++++++---
>>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 75df62f..a46c7d4 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>       p4d_t *p4dp;
>>>       pud_t *pudp;
>>>       pmd_t *pmdp;
>>> +    int ret = 0;
>>>         do {
>>>           next = pmd_addr_end(addr, end);
>>> @@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>               void *p = NULL;
>>>                 p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
>>> -            if (!p)
>>> -                return -ENOMEM;
>>> +            if (!p) {
>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>>> +                vmemmap_free(start, end, altmap);
>>> +#endif
>>> +                ret = -ENOMEM;
>>> +                break;
>>> +            }
>>>               pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
>>>           } else
>>>               vmemmap_verify((pte_t *)pmdp, node, addr, next);
>>>       } while (addr = next, addr != end);
>>>   -    return 0;
>>> +    if (ret)
>>> +        return vmemmap_populate_basepages(start, end, node, altmap);
>>> +    else
>>> +        return ret;
>> 
>> Style comment: I find this usage of 'ret' confusing. When we assign 
>> -ENOMEM above that is never actually the return value of the function 
>> (in that case vmemmap_populate_basepages() provides the actual return 
>> value).
> 
> Right.
> 
>> 
>> Also the "return ret" is misleading since we know by that point that 
>> ret==0 (and the 'else' is redundant).
> 
> Right.
> 
>> 
>> Can you not just move the call to vmemmap_populate_basepages() up to 
>> just after the (possible) vmemmap_free() call and remove the 'ret' 
>> variable?
>> 

Yes, the usage of "return ret" is quite confusing and misleading here - 
will clean this up.

>> AFAICT the call to vmemmap_free() also doesn't need the #ifdef as the 
>> function is a no-op if CONFIG_MEMORY_HOTPLUG isn't set. I also feel 
>> you
> 
> Right, CONFIG_MEMORY_HOTPLUG is not required.

Not quite - the vmemmap_free() declaration in the include/linux/mm.h 
header is wrapped in CONFIG_MEMORY_HOTPLUG as well. And since the 
function definition sits below the place where it is called, this will 
throw an implicit-declaration compile error when CONFIG_MEMORY_HOTPLUG 
is not enabled. We could move the function definition up so that we 
don't need this #ifdef, but we can go with the 1st approach that 
Anshuman mentions below.

> 
>> need at least a comment to explain Anshuman's point that it looks like
>> you're freeing an unmapped area. Although if I'm reading the code
>> correctly it seems like the unmapped area will just be skipped.
> 
> Proposed vmemmap_free() attempts to free the entire requested vmemmap
> range [start, end] when an intermediate PMD entry cannot be allocated.
> Hence even if vmemmap_free() could skip an unmapped area (will double
> check on that), it unnecessarily goes through large sections of
> unmapped range, which could not have been mapped.
> 
> So, basically there could be two different methods for doing this
> fallback.
> 
> 1. Call vmemmap_populate_basepages() for sections when PMD_SIZE
>    allocation fails
> 
> 	- vmemmap_free() need not be called
> 
> 2. Abort at the first instance of PMD_SIZE allocation failure
> 
> 	- Call vmemmap_free() to unmap all sections mapped till that point
> 	- Call vmemmap_populate_basepages() to map the entire requested section
> 
> The proposed patch tried to mix both approaches. Regardless, the first
> approach here seems better and is also what the
> vmemmap_populate_hugepages() implementation on x86 does.

The 1st approach looks cleaner compared to bailing out on the first 
failure, unmapping all previously mapped sections, and mapping the 
entire request with vmemmap_populate_basepages(). Thanks for the review 
and suggestion - will send over a cleaner patch soon.

Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
  2020-09-10 20:48     ` sudaraja
@ 2020-09-21 17:43         ` Will Deacon
  0 siblings, 0 replies; 20+ messages in thread
From: Will Deacon @ 2020-09-21 17:43 UTC (permalink / raw)
  To: sudaraja
  Cc: Anshuman Khandual, Steven Price, linux-arm-kernel, linux-kernel,
	Catalin Marinas, Mark Rutland, Logan Gunthorpe,
	David Hildenbrand, Andrew Morton, pratikp

On Thu, Sep 10, 2020 at 08:48:40PM +0000, sudaraja@codeaurora.org wrote:
> On 2020-09-10 03:50, Anshuman Khandual wrote:
> > The proposed patch tried to mix both approaches. Regardless, the first
> > approach
> > here seems better and is the case in vmemmap_populate_hugepages()
> > implementation
> > on x86 as well.
> 
> The 1st approach looks more cleaner compared to bailing out in first
> failure, unmapping all previously mapped sections and map entire request
> with vmemmap_populate_basepages. Thanks for the review and suggestion - will
> send over a cleaner patch soon.

Did you send an updated version of this? The threading has gone wonky in
my mail client, so I may have missed it.

Will

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory
@ 2020-09-10  6:05 Sudarshan Rajagopalan
  0 siblings, 0 replies; 20+ messages in thread
From: Sudarshan Rajagopalan @ 2020-09-10  6:05 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: Sudarshan Rajagopalan, Catalin Marinas, Will Deacon,
	Anshuman Khandual, Mark Rutland, Logan Gunthorpe,
	David Hildenbrand, Andrew Morton, Steven Price

When section mappings are enabled, we allocate vmemmap pages from physically
contiguous memory of size PMD_SIZE using vmemmap_alloc_block_buf(). Section
mappings are good to reduce TLB pressure. But when the system is highly
fragmented and memory blocks are being hot-added at runtime, it's possible
that such physically contiguous memory allocations can fail. Rather than
failing the memory hot-add procedure, add a fallback option to allocate
vmemmap pages from discontiguous pages using vmemmap_populate_basepages().

Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Price <steven.price@arm.com>
---
 arch/arm64/mm/mmu.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75df62f..a46c7d4 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	p4d_t *p4dp;
 	pud_t *pudp;
 	pmd_t *pmdp;
+	int ret = 0;
 
 	do {
 		next = pmd_addr_end(addr, end);
@@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 			void *p = NULL;
 
 			p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
-			if (!p)
-				return -ENOMEM;
+			if (!p) {
+#ifdef CONFIG_MEMORY_HOTPLUG
+				vmemmap_free(start, end, altmap);
+#endif
+				ret = -ENOMEM;
+				break;
+			}
 
 			pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
 		} else
 			vmemmap_verify((pte_t *)pmdp, node, addr, next);
 	} while (addr = next, addr != end);
 
-	return 0;
+	if (ret)
+		return vmemmap_populate_basepages(start, end, node, altmap);
+	else
+		return ret;
 }
 #endif	/* !ARM64_SWAPPER_USES_SECTION_MAPS */
 void vmemmap_free(unsigned long start, unsigned long end,
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-09-21 17:44 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <01010174769e2b68-a6f3768e-aef8-43c7-b357-a8cb1e17d3eb-000000@us-west-2.amazonses.com>
2020-09-10  6:45 ` [PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory Anshuman Khandual
2020-09-10  6:45   ` Anshuman Khandual
2020-09-10  8:08   ` David Hildenbrand
2020-09-10  8:08     ` David Hildenbrand
2020-09-10 10:58     ` Anshuman Khandual
2020-09-10 10:58       ` Anshuman Khandual
2020-09-10  8:27   ` sudaraja
2020-09-10 11:16     ` Anshuman Khandual
2020-09-10 11:16       ` Anshuman Khandual
2020-09-10  8:27   ` sudaraja
2020-09-10  8:27 ` Steven Price
2020-09-10  8:27   ` Steven Price
2020-09-10 10:50   ` Anshuman Khandual
2020-09-10 10:50     ` Anshuman Khandual
2020-09-10 20:48     ` sudaraja
2020-09-21 17:43       ` Will Deacon
2020-09-21 17:43         ` Will Deacon
2020-09-10 20:48     ` sudaraja
2020-09-10  6:05 Sudarshan Rajagopalan
  -- strict thread matches above, loose matches on Subject: below --
2020-09-10  6:05 Sudarshan Rajagopalan
