linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm: Support memblock alloc on the exact node for sparse_buffer_init()
@ 2019-09-18  4:22 Yunfeng Ye
  2019-09-18  6:51 ` Wei Yang
  2019-09-19  4:47 ` Mike Rapoport
  0 siblings, 2 replies; 9+ messages in thread
From: Yunfeng Ye @ 2019-09-18  4:22 UTC (permalink / raw)
  To: rppt, akpm, osalvador, mhocko, dan.j.williams, rppt, david,
	richardw.yang, cai
  Cc: linux-mm, linux-kernel

Currently, when memblock_find_in_range_node() fails on the exact node, it
falls back to %NUMA_NO_NODE and searches the other nodes. This works well
in general, but when a node cannot satisfy a large allocation yet can
still satisfy small ones, we would rather allocate small chunks from this
node first than take a large chunk from another node.

sparse_buffer_init() prepares a large chunk of memory for the page
structures. The page management structures require a lot of memory, but
if the node cannot provide it in one piece, the request can be converted
to small allocations on the same node instead of being served from
another node.

Add a %MEMBLOCK_ALLOC_EXACT_NODE flag for this situation. Its behavior is
the same as %MEMBLOCK_ALLOC_ACCESSIBLE, except that it does not fall back
to other nodes when the requested node fails to allocate.

If the large contiguous allocation fails in sparse_buffer_init(), the
memory map is allocated in small blocks, section by section, later on.

Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
---
 include/linux/memblock.h | 1 +
 mm/memblock.c            | 3 ++-
 mm/sparse.c              | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index f491690..9a81d9c 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -339,6 +339,7 @@ static inline int memblock_get_region_node(const struct memblock_region *r)
 #define MEMBLOCK_ALLOC_ANYWHERE	(~(phys_addr_t)0)
 #define MEMBLOCK_ALLOC_ACCESSIBLE	0
 #define MEMBLOCK_ALLOC_KASAN		1
+#define MEMBLOCK_ALLOC_EXACT_NODE	2

 /* We are using top down, so it is safe to use 0 here */
 #define MEMBLOCK_LOW_LIMIT 0
diff --git a/mm/memblock.c b/mm/memblock.c
index 7d4f61a..dbd52c3c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -277,6 +277,7 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,

 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
+	    end == MEMBLOCK_ALLOC_EXACT_NODE ||
 	    end == MEMBLOCK_ALLOC_KASAN)
 		end = memblock.current_limit;

@@ -1365,7 +1366,7 @@ static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
 	if (found && !memblock_reserve(found, size))
 		goto done;

-	if (nid != NUMA_NO_NODE) {
+	if (end != MEMBLOCK_ALLOC_EXACT_NODE && nid != NUMA_NO_NODE) {
 		found = memblock_find_in_range_node(size, align, start,
 						    end, NUMA_NO_NODE,
 						    flags);
diff --git a/mm/sparse.c b/mm/sparse.c
index 72f010d..828db46 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -477,7 +477,7 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
 	sparsemap_buf =
 		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
 						addr,
-						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+						MEMBLOCK_ALLOC_EXACT_NODE, nid);
 	sparsemap_buf_end = sparsemap_buf + size;
 }

-- 
2.7.4.huawei.3




* Re: [PATCH] mm: Support memblock alloc on the exact node for sparse_buffer_init()
  2019-09-18  4:22 [PATCH] mm: Support memblock alloc on the exact node for sparse_buffer_init() Yunfeng Ye
@ 2019-09-18  6:51 ` Wei Yang
  2019-09-18  7:08   ` Yunfeng Ye
  2019-09-19  4:47 ` Mike Rapoport
  1 sibling, 1 reply; 9+ messages in thread
From: Wei Yang @ 2019-09-18  6:51 UTC (permalink / raw)
  To: Yunfeng Ye
  Cc: rppt, akpm, osalvador, mhocko, dan.j.williams, david,
	richardw.yang, cai, linux-mm, linux-kernel

On Wed, Sep 18, 2019 at 12:22:29PM +0800, Yunfeng Ye wrote:
>Currently, when memblock_find_in_range_node() fail on the exact node, it
>will use %NUMA_NO_NODE to find memblock from other nodes. At present,
>the work is good, but when the large memory is insufficient and the
>small memory is enough, we want to allocate the small memory of this
>node first, and do not need to allocate large memory from other nodes.
>
>In sparse_buffer_init(), it will prepare large chunks of memory for page
>structure. The page management structure requires a lot of memory, but
>if the node does not have enough memory, it can be converted to a small
>memory allocation without having to allocate it from other nodes.
>
>Add %MEMBLOCK_ALLOC_EXACT_NODE flag for this situation. Normally, the
>behavior is the same with %MEMBLOCK_ALLOC_ACCESSIBLE, only that it will
>not allocate from other nodes when a single node fails to allocate.
>
>If large contiguous block memory allocated fail in sparse_buffer_init(),
>it will allocates small block memmory section by section later.
>

It looks like this changes the current behavior, even if it falls back to
section-based allocation.


-- 
Wei Yang
Help you, Help me


* Re: [PATCH] mm: Support memblock alloc on the exact node for sparse_buffer_init()
  2019-09-18  6:51 ` Wei Yang
@ 2019-09-18  7:08   ` Yunfeng Ye
  2019-09-19  0:30     ` Wei Yang
  0 siblings, 1 reply; 9+ messages in thread
From: Yunfeng Ye @ 2019-09-18  7:08 UTC (permalink / raw)
  To: Wei Yang
  Cc: rppt, akpm, osalvador, mhocko, dan.j.williams, david, cai,
	linux-mm, linux-kernel



On 2019/9/18 14:51, Wei Yang wrote:
> On Wed, Sep 18, 2019 at 12:22:29PM +0800, Yunfeng Ye wrote:
>> Currently, when memblock_find_in_range_node() fail on the exact node, it
>> will use %NUMA_NO_NODE to find memblock from other nodes. At present,
>> the work is good, but when the large memory is insufficient and the
>> small memory is enough, we want to allocate the small memory of this
>> node first, and do not need to allocate large memory from other nodes.
>>
>> In sparse_buffer_init(), it will prepare large chunks of memory for page
>> structure. The page management structure requires a lot of memory, but
>> if the node does not have enough memory, it can be converted to a small
>> memory allocation without having to allocate it from other nodes.
>>
>> Add %MEMBLOCK_ALLOC_EXACT_NODE flag for this situation. Normally, the
>> behavior is the same with %MEMBLOCK_ALLOC_ACCESSIBLE, only that it will
>> not allocate from other nodes when a single node fails to allocate.
>>
>> If large contiguous block memory allocated fail in sparse_buffer_init(),
>> it will allocates small block memmory section by section later.
>>
> 
> Looks this changes current behavior even it fall back to section based
> allocation.
> 
When it falls back to section-based allocation, it still uses
%MEMBLOCK_ALLOC_ACCESSIBLE, so I think the behavior does not change. Can you
tell me in detail what changes? Thanks.





* Re: [PATCH] mm: Support memblock alloc on the exact node for sparse_buffer_init()
  2019-09-18  7:08   ` Yunfeng Ye
@ 2019-09-19  0:30     ` Wei Yang
  2019-09-19 11:33       ` Yunfeng Ye
  0 siblings, 1 reply; 9+ messages in thread
From: Wei Yang @ 2019-09-19  0:30 UTC (permalink / raw)
  To: Yunfeng Ye
  Cc: Wei Yang, rppt, akpm, osalvador, mhocko, dan.j.williams, david,
	cai, linux-mm, linux-kernel

On Wed, Sep 18, 2019 at 03:08:41PM +0800, Yunfeng Ye wrote:
>
>
>On 2019/9/18 14:51, Wei Yang wrote:
>> On Wed, Sep 18, 2019 at 12:22:29PM +0800, Yunfeng Ye wrote:
>>> Currently, when memblock_find_in_range_node() fail on the exact node, it
>>> will use %NUMA_NO_NODE to find memblock from other nodes. At present,
>>> the work is good, but when the large memory is insufficient and the
>>> small memory is enough, we want to allocate the small memory of this
>>> node first, and do not need to allocate large memory from other nodes.
>>>
>>> In sparse_buffer_init(), it will prepare large chunks of memory for page
>>> structure. The page management structure requires a lot of memory, but
>>> if the node does not have enough memory, it can be converted to a small
>>> memory allocation without having to allocate it from other nodes.
>>>
>>> Add %MEMBLOCK_ALLOC_EXACT_NODE flag for this situation. Normally, the
>>> behavior is the same with %MEMBLOCK_ALLOC_ACCESSIBLE, only that it will
>>> not allocate from other nodes when a single node fails to allocate.
>>>
>>> If large contiguous block memory allocated fail in sparse_buffer_init(),
>>> it will allocates small block memmory section by section later.
>>>
>> 
>> Looks this changes current behavior even it fall back to section based
>> allocation.
>> 
>When fall back to section allocation, it still use %MEMBLOCK_ALLOC_ACCESSIBLE
>,I think the behavior is not change, Can you tell me the detail about the
>changes. thanks.
>

You pass MEMBLOCK_ALLOC_EXACT_NODE for the first-round allocation, which
forbids it from allocating from other nodes. This is different from the
current behavior. Am I right?


* Re: [PATCH] mm: Support memblock alloc on the exact node for sparse_buffer_init()
  2019-09-18  4:22 [PATCH] mm: Support memblock alloc on the exact node for sparse_buffer_init() Yunfeng Ye
  2019-09-18  6:51 ` Wei Yang
@ 2019-09-19  4:47 ` Mike Rapoport
  2019-09-19  7:14   ` Yunfeng Ye
  1 sibling, 1 reply; 9+ messages in thread
From: Mike Rapoport @ 2019-09-19  4:47 UTC (permalink / raw)
  To: Yunfeng Ye
  Cc: akpm, osalvador, mhocko, dan.j.williams, david, richardw.yang,
	cai, linux-mm, linux-kernel

Hi,

On Wed, Sep 18, 2019 at 12:22:29PM +0800, Yunfeng Ye wrote:
> Currently, when memblock_find_in_range_node() fail on the exact node, it
> will use %NUMA_NO_NODE to find memblock from other nodes. At present,
> the work is good, but when the large memory is insufficient and the
> small memory is enough, we want to allocate the small memory of this
> node first, and do not need to allocate large memory from other nodes.
> 
> In sparse_buffer_init(), it will prepare large chunks of memory for page
> structure. The page management structure requires a lot of memory, but
> if the node does not have enough memory, it can be converted to a small
> memory allocation without having to allocate it from other nodes.
> 
> Add %MEMBLOCK_ALLOC_EXACT_NODE flag for this situation. Normally, the
> behavior is the same with %MEMBLOCK_ALLOC_ACCESSIBLE, only that it will
> not allocate from other nodes when a single node fails to allocate.
> 
> If large contiguous block memory allocated fail in sparse_buffer_init(),
> it will allocates small block memmory section by section later.

Did you see sparse_buffer_init() actually falling back to allocating from a
different node? If a node does not have enough memory to hold its own
memory map, filling it with only parts of the memory map will not make such
a node usable.
 

-- 
Sincerely yours,
Mike.



* Re: [PATCH] mm: Support memblock alloc on the exact node for sparse_buffer_init()
  2019-09-19  4:47 ` Mike Rapoport
@ 2019-09-19  7:14   ` Yunfeng Ye
  2019-09-19  9:28     ` Mike Rapoport
  0 siblings, 1 reply; 9+ messages in thread
From: Yunfeng Ye @ 2019-09-19  7:14 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: akpm, osalvador, mhocko, dan.j.williams, david, richardw.yang,
	cai, linux-mm, linux-kernel



On 2019/9/19 12:47, Mike Rapoport wrote:
> Hi,
> 
> On Wed, Sep 18, 2019 at 12:22:29PM +0800, Yunfeng Ye wrote:
>> Currently, when memblock_find_in_range_node() fail on the exact node, it
>> will use %NUMA_NO_NODE to find memblock from other nodes. At present,
>> the work is good, but when the large memory is insufficient and the
>> small memory is enough, we want to allocate the small memory of this
>> node first, and do not need to allocate large memory from other nodes.
>>
>> In sparse_buffer_init(), it will prepare large chunks of memory for page
>> structure. The page management structure requires a lot of memory, but
>> if the node does not have enough memory, it can be converted to a small
>> memory allocation without having to allocate it from other nodes.
>>
>> Add %MEMBLOCK_ALLOC_EXACT_NODE flag for this situation. Normally, the
>> behavior is the same with %MEMBLOCK_ALLOC_ACCESSIBLE, only that it will
>> not allocate from other nodes when a single node fails to allocate.
>>
>> If large contiguous block memory allocated fail in sparse_buffer_init(),
>> it will allocates small block memmory section by section later.
> 
> Did you see the sparse_buffer_init() actually falling back to allocate from a
> different node? If a node does not have enough memory to hold it's own
> memory map, filling only it with parts of the memory map will not make such
> node usable.
>  
Normally sparse_buffer_init() does not fall back to a different node, because
the page structures take 64 bytes per 4KB of memory, which is no more than 2%
of the total available memory. But there are special cases. For example, when
the BIOS isolates memory addresses after memory failures, the total memory is
split into many pieces. There is still enough memory overall, but no large
contiguous block on the node, and sparse_buffer_init() needs one large
contiguous block allocated in one go.

E.g., with 1TB of memory, sparse_buffer_init() needs 1TB * 64/4096 = 16GB. If
the memory consists of 100 blocks of only 10GB each, the total is still almost
100 * 10GB = 1TB, but there is no contiguous 16GB block.
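
The arithmetic in this example can be checked with a small userspace sketch.
This is not kernel code: the toy_* names are made up for illustration, and the
64-byte page structure and 4KB page size are simply the values assumed in the
example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants from the example: 64-byte struct page, 4KB pages. */
#define TOY_STRUCT_PAGE_SIZE 64ULL
#define TOY_PAGE_SIZE        4096ULL
#define TOY_GIB              (1ULL << 30)

/* Bytes of memory map needed to describe mem_bytes of RAM. */
static uint64_t toy_memmap_bytes(uint64_t mem_bytes)
{
	assert(mem_bytes % TOY_PAGE_SIZE == 0);
	return mem_bytes / TOY_PAGE_SIZE * TOY_STRUCT_PAGE_SIZE;
}

/* Returns 1 if some single free block can hold the whole memory map. */
static int toy_fits_contiguously(const uint64_t *blocks, int nr, uint64_t need)
{
	for (int i = 0; i < nr; i++)
		if (blocks[i] >= need)
			return 1;
	return 0;
}
```

With 1TB of memory, toy_memmap_bytes() gives 16GB, and no 10GB block passes
toy_fits_contiguously(), which is the situation described here.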

Before commit 2a3cb8baef71 ("mm/sparse: delete old sparse_init and enable new one"),
we had the %CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER config option to cover this
situation. Since that commit, the allocation falls back to other nodes, which
hurts performance because of remote NUMA accesses.

Commit 85c77f791390 ("mm/sparse: add new sparse_init_nid() and sparse_init()") says:
    "
    sparse_init_nid(), which only
    operates within one memory node, and thus allocates memory either in large
    contiguous block or allocates section by section
    "
This means that allocating section by section is a normal choice too, so I think
adding %MEMBLOCK_ALLOC_EXACT_NODE is a reasonable choice for this situation. In most
cases sparse_buffer_init() works fine at present and does not allocate from other nodes.

thanks.
Yunfeng Ye




* Re: [PATCH] mm: Support memblock alloc on the exact node for sparse_buffer_init()
  2019-09-19  7:14   ` Yunfeng Ye
@ 2019-09-19  9:28     ` Mike Rapoport
  2019-09-19 11:43       ` Yunfeng Ye
  0 siblings, 1 reply; 9+ messages in thread
From: Mike Rapoport @ 2019-09-19  9:28 UTC (permalink / raw)
  To: Yunfeng Ye
  Cc: akpm, osalvador, mhocko, dan.j.williams, david, richardw.yang,
	cai, linux-mm, linux-kernel

On Thu, Sep 19, 2019 at 03:14:22PM +0800, Yunfeng Ye wrote:
> 
> 
> On 2019/9/19 12:47, Mike Rapoport wrote:
> > Hi,
> > 
> > On Wed, Sep 18, 2019 at 12:22:29PM +0800, Yunfeng Ye wrote:
> >> Currently, when memblock_find_in_range_node() fail on the exact node, it
> >> will use %NUMA_NO_NODE to find memblock from other nodes. At present,
> >> the work is good, but when the large memory is insufficient and the
> >> small memory is enough, we want to allocate the small memory of this
> >> node first, and do not need to allocate large memory from other nodes.
> >>
> >> In sparse_buffer_init(), it will prepare large chunks of memory for page
> >> structure. The page management structure requires a lot of memory, but
> >> if the node does not have enough memory, it can be converted to a small
> >> memory allocation without having to allocate it from other nodes.
> >>
> >> Add %MEMBLOCK_ALLOC_EXACT_NODE flag for this situation. Normally, the
> >> behavior is the same with %MEMBLOCK_ALLOC_ACCESSIBLE, only that it will
> >> not allocate from other nodes when a single node fails to allocate.
> >>
> >> If large contiguous block memory allocated fail in sparse_buffer_init(),
> >> it will allocates small block memmory section by section later.
> > 
> > Did you see the sparse_buffer_init() actually falling back to allocate from a
> > different node? If a node does not have enough memory to hold it's own
> > memory map, filling only it with parts of the memory map will not make such
> > node usable.
> >  
> Normally, it won't happen that sparse_buffer_init() falling back from a different
> node, because page structure size is 64 bytes per 4KB of memory, no more than 2%
> of total available memory. But in the special cases, for eaxmple, memory address
> is isolated by BIOS when memory failure, split the total memory many pieces,
> although we have enough memory, but no large contiguous block memory in one node.
> sparse_buffer_init() needs large contiguous block memory to be alloc in one time,
> 
> Eg, the size of memory is 1TB, sparse_buffer_init() need 1TB * 64/4096 = 16GB, but
> we have 100 blocks memory which every block only have 10GB, although total memory
> have almost 100*10GB=1TB, but no contiguous 16GB block.
 
An explanation that a node's memory may become highly fragmented should be
part of the changelog.

> Before commit 2a3cb8baef71 ("mm/sparse: delete old sparse_init and enable new one"),
> we have %CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER config to meeting this situation,
> after that, it fall back to allocate memory from other nodes, so have the performance
> impact by remote numa access.
> 
> commit 85c77f791390 ("mm/sparse: add new sparse_init_nid() and sparse_init()") wrote
> that:
>     "
>     sparse_init_nid(), which only
>     operates within one memory node, and thus allocates memory either in large
>     contiguous block or allocates section by section
>     "
> it means that allocates section by section is a normal choice too, so I think add
> %MEMBLOCK_ALLOC_EXACT_NODE is also a choice for this situation. Most cases,
> sparse_buffer_init() works good and not allocated from other nodes at present.

I'd prefer to see memblock_alloc_exact_nid_raw() wrapper for
memblock_find_in_range_node() rather than using a flag.
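
To make the difference between a flag and a wrapper concrete, here is a rough
userspace model. It is only a sketch under assumptions: every toy_* name is
hypothetical, the region array stands in for memblock's free ranges, and none
of this is the actual memblock code or a proposed implementation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_NODE (-1)

struct toy_region {
	int nid;            /* node that owns this free region */
	unsigned long size; /* free bytes in the region */
};

/* First-fit search; nid == TOY_NO_NODE matches any node. Returns index or -1. */
static int toy_find_in_node(const struct toy_region *r, size_t nr,
			    unsigned long size, int nid)
{
	assert(r != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		if ((nid == TOY_NO_NODE || r[i].nid == nid) && r[i].size >= size)
			return (int)i;
	return -1;
}

/* Core allocator: try nid first, then any node unless exact_nid is set. */
static int toy_alloc_range_nid(struct toy_region *r, size_t nr,
			       unsigned long size, int nid, int exact_nid)
{
	int idx = toy_find_in_node(r, nr, size, nid);

	if (idx < 0 && nid != TOY_NO_NODE && !exact_nid)
		idx = toy_find_in_node(r, nr, size, TOY_NO_NODE);
	if (idx >= 0)
		r[idx].size -= size;	/* "reserve" the memory */
	return idx;
}

/* Existing-style entry point: may fall back to other nodes. */
static int toy_alloc_try_nid(struct toy_region *r, size_t nr,
			     unsigned long size, int nid)
{
	return toy_alloc_range_nid(r, nr, size, nid, 0);
}

/* Wrapper in the spirit of the suggestion: never leaves the node. */
static int toy_alloc_exact_nid(struct toy_region *r, size_t nr,
			       unsigned long size, int nid)
{
	return toy_alloc_range_nid(r, nr, size, nid, 1);
}
```

In this model the exact-node policy is a separate entry point, so the address
limit argument does not have to be overloaded with a magic value.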
 
> thanks.
> Yunfeng Ye
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH] mm: Support memblock alloc on the exact node for sparse_buffer_init()
  2019-09-19  0:30     ` Wei Yang
@ 2019-09-19 11:33       ` Yunfeng Ye
  0 siblings, 0 replies; 9+ messages in thread
From: Yunfeng Ye @ 2019-09-19 11:33 UTC (permalink / raw)
  To: Wei Yang
  Cc: rppt, akpm, osalvador, mhocko, dan.j.williams, david, cai,
	linux-mm, linux-kernel



On 2019/9/19 8:30, Wei Yang wrote:
> On Wed, Sep 18, 2019 at 03:08:41PM +0800, Yunfeng Ye wrote:
>>
>>
>> On 2019/9/18 14:51, Wei Yang wrote:
>>> On Wed, Sep 18, 2019 at 12:22:29PM +0800, Yunfeng Ye wrote:
>>>> Currently, when memblock_find_in_range_node() fail on the exact node, it
>>>> will use %NUMA_NO_NODE to find memblock from other nodes. At present,
>>>> the work is good, but when the large memory is insufficient and the
>>>> small memory is enough, we want to allocate the small memory of this
>>>> node first, and do not need to allocate large memory from other nodes.
>>>>
>>>> In sparse_buffer_init(), it will prepare large chunks of memory for page
>>>> structure. The page management structure requires a lot of memory, but
>>>> if the node does not have enough memory, it can be converted to a small
>>>> memory allocation without having to allocate it from other nodes.
>>>>
>>>> Add %MEMBLOCK_ALLOC_EXACT_NODE flag for this situation. Normally, the
>>>> behavior is the same with %MEMBLOCK_ALLOC_ACCESSIBLE, only that it will
>>>> not allocate from other nodes when a single node fails to allocate.
>>>>
>>>> If large contiguous block memory allocated fail in sparse_buffer_init(),
>>>> it will allocates small block memmory section by section later.
>>>>
>>>
>>> Looks this changes current behavior even it fall back to section based
>>> allocation.
>>>
>> When fall back to section allocation, it still use %MEMBLOCK_ALLOC_ACCESSIBLE
>> ,I think the behavior is not change, Can you tell me the detail about the
>> changes. thanks.
>>
> 
> You pass MEMBLOCK_ALLOC_EXACT_NODE for the first round allocation, which
> forbid it allocates from other node. This is different from current behavior.
> Am I right?
> 
In most cases the allocation succeeds on the requested node and the
%MEMBLOCK_ALLOC_EXACT_NODE restriction never takes effect, so the behavior does
not change. If the current node has no large contiguous block of memory, it
falls back to section-based allocation. This is the difference from the
current behavior.
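
The two-stage flow can be modelled roughly as below. This is a simplified
userspace sketch, not the code in mm/sparse.c: toy_populate_node() and its
parameters are hypothetical, sizes are in arbitrary units, and the fallback
here only looks at the node's own free blocks.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of populating one node's memory map. free_blocks[] holds the
 * sizes of the node's free contiguous blocks, in arbitrary units.
 * Returns the number of sections whose map was placed on the node and sets
 * *used_buffer to 1 if a single contiguous buffer covered all of them.
 */
static size_t toy_populate_node(unsigned long *free_blocks, size_t nr_blocks,
				size_t nr_sections, unsigned long map_size,
				int *used_buffer)
{
	unsigned long total = nr_sections * map_size;
	size_t done = 0;

	assert(free_blocks != NULL && used_buffer != NULL);
	*used_buffer = 0;

	/* First round: one large contiguous buffer, on this node only. */
	for (size_t i = 0; i < nr_blocks; i++) {
		if (free_blocks[i] >= total) {
			free_blocks[i] -= total;
			*used_buffer = 1;
			return nr_sections;
		}
	}

	/* Fallback: allocate each section's map separately (first fit). */
	for (size_t s = 0; s < nr_sections; s++) {
		for (size_t i = 0; i < nr_blocks; i++) {
			if (free_blocks[i] >= map_size) {
				free_blocks[i] -= map_size;
				done++;
				break;
			}
		}
	}
	return done;
}
```

With three free blocks of 10 units and 16 sections of 1 unit each, the first
round fails but all 16 section maps still fit on the node.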

thanks.
> .
> 



* Re: [PATCH] mm: Support memblock alloc on the exact node for sparse_buffer_init()
  2019-09-19  9:28     ` Mike Rapoport
@ 2019-09-19 11:43       ` Yunfeng Ye
  0 siblings, 0 replies; 9+ messages in thread
From: Yunfeng Ye @ 2019-09-19 11:43 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: akpm, osalvador, mhocko, dan.j.williams, david, richardw.yang,
	cai, linux-mm, linux-kernel



On 2019/9/19 17:28, Mike Rapoport wrote:
> On Thu, Sep 19, 2019 at 03:14:22PM +0800, Yunfeng Ye wrote:
>>
>>
>> On 2019/9/19 12:47, Mike Rapoport wrote:
>>> Hi,
>>>
>>> On Wed, Sep 18, 2019 at 12:22:29PM +0800, Yunfeng Ye wrote:
>>>> Currently, when memblock_find_in_range_node() fails on the exact node, it
>>>> will use %NUMA_NO_NODE to find a memblock from other nodes. At present,
>>>> this works well, but when large memory is insufficient and small memory
>>>> is enough, we want to allocate the small memory of this node first,
>>>> and do not need to allocate large memory from other nodes.
>>>>
>>>> sparse_buffer_init() prepares large chunks of memory for page
>>>> structures. The page management structure requires a lot of memory, but
>>>> if the node does not have enough memory, it can be converted to a small
>>>> memory allocation without having to allocate it from other nodes.
>>>>
>>>> Add a %MEMBLOCK_ALLOC_EXACT_NODE flag for this situation. Normally, the
>>>> behavior is the same as with %MEMBLOCK_ALLOC_ACCESSIBLE, except that it will
>>>> not allocate from other nodes when a single node fails to allocate.
>>>>
>>>> If allocating a large contiguous block of memory fails in
>>>> sparse_buffer_init(), it will allocate small blocks of memory section by
>>>> section later.
>>>
>>> Did you see sparse_buffer_init() actually falling back to allocating from a
>>> different node? If a node does not have enough memory to hold its own
>>> memory map, filling it with only parts of the memory map will not make such
>>> a node usable.
>>>  
>> Normally, sparse_buffer_init() will not fall back to a different node,
>> because the page structure size is 64 bytes per 4KB of memory, no more than 2%
>> of total available memory. But in special cases, for example, when memory
>> addresses are isolated by the BIOS after memory failures, the total memory is
>> split into many pieces. Although we have enough memory, there is no large
>> contiguous block of memory in one node, and sparse_buffer_init() needs a large
>> contiguous block of memory to be allocated at one time.
>>
>> E.g., if the size of memory is 1TB, sparse_buffer_init() needs 1TB * 64/4096 = 16GB,
>> but we have 100 blocks of memory of only 10GB each. Although the total memory
>> is almost 100*10GB=1TB, there is no contiguous 16GB block.
>  
> An explanation that a node's memory may become highly fragmented should be
> part of the changelog.
> 
ok, thanks for your advice.
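
The arithmetic in the quoted example can be checked with a short sketch. It
assumes binary units (1TB taken as 1 TiB), a 64-byte page structure and 4 KiB
pages, as in the example; memmap_bytes() and largest_block() are illustrative
helper names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values from the example: 64-byte page structure, 4 KiB pages. */
#define STRUCT_PAGE_BYTES 64ULL
#define PAGE_BYTES        4096ULL
#define GIB               (1ULL << 30)

/* Bytes of memory map needed to describe mem_bytes of RAM. */
static uint64_t memmap_bytes(uint64_t mem_bytes)
{
	assert(mem_bytes % PAGE_BYTES == 0);
	return mem_bytes / PAGE_BYTES * STRUCT_PAGE_BYTES;
}

/* Size of the largest free block, i.e. the biggest contiguous request that can succeed. */
static uint64_t largest_block(const uint64_t *blocks, size_t n)
{
	uint64_t max = 0;

	for (size_t i = 0; i < n; i++)
		if (blocks[i] > max)
			max = blocks[i];
	return max;
}
```

For 1 TiB of memory, memmap_bytes() gives 16 GiB, which is larger than any
single 10 GiB block, so one contiguous allocation cannot succeed even though
the total free memory is far larger than 16 GiB.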

>> Before commit 2a3cb8baef71 ("mm/sparse: delete old sparse_init and enable new one"),
>> we had the %CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER config to handle this situation.
>> After that commit, it falls back to allocating memory from other nodes, which has a
>> performance impact due to remote NUMA access.
>>
>> Commit 85c77f791390 ("mm/sparse: add new sparse_init_nid() and sparse_init()") says
>> that:
>>     "
>>     sparse_init_nid(), which only
>>     operates within one memory node, and thus allocates memory either in large
>>     contiguous block or allocates section by section
>>     "
>> This means that allocating section by section is a normal choice too, so I think
>> adding %MEMBLOCK_ALLOC_EXACT_NODE is also an option for this situation. In most
>> cases, sparse_buffer_init() works well and does not allocate from other nodes at present.
> 
> I'd prefer to see a memblock_alloc_exact_nid_raw() wrapper for
> memblock_find_in_range_node() rather than using a flag.
>  
I had also considered this approach. I will modify the patch as you suggest. Thanks.
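
The difference between the current fallback and an exact-node wrapper can be
sketched in userspace as follows. This is a toy model under assumptions, not
the memblock implementation: struct free_range, find_in_node(),
alloc_try_nid() and alloc_exact_nid() are hypothetical names, NO_NODE stands
in for NUMA_NO_NODE, and a return value of 0 means failure, so the model
requires non-zero base addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_NODE (-1)	/* stand-in for NUMA_NO_NODE in this toy model */

struct free_range {
	int nid;	/* node that owns the range */
	uint64_t base;	/* start address, must be non-zero in this model */
	uint64_t size;	/* length in bytes */
};

/* Base of the first range on nid (any node if NO_NODE) that fits size, else 0. */
static uint64_t find_in_node(const struct free_range *r, size_t n,
			     uint64_t size, int nid)
{
	for (size_t i = 0; i < n; i++) {
		assert(r[i].base != 0);
		if ((nid == NO_NODE || r[i].nid == nid) && r[i].size >= size)
			return r[i].base;
	}
	return 0;
}

/* Behavior described in the thread: retry on any node if the requested node fails. */
static uint64_t alloc_try_nid(const struct free_range *r, size_t n,
			      uint64_t size, int nid)
{
	uint64_t found = find_in_node(r, n, size, nid);

	if (!found && nid != NO_NODE)
		found = find_in_node(r, n, size, NO_NODE);
	return found;
}

/* Hypothetical exact variant: never leave the requested node. */
static uint64_t alloc_exact_nid(const struct free_range *r, size_t n,
				uint64_t size, int nid)
{
	return find_in_node(r, n, size, nid);
}
```

With a 4 KiB range on node 0 and a 1 MiB range on node 1, a 64 KiB request
for node 0 is satisfied from node 1 by alloc_try_nid() but fails in
alloc_exact_nid(), which lets the caller retry with smaller node-local
requests instead.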

>> thanks.
>> Yunfeng Ye
>>
>>>> Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
>>>> ---
>>>>  include/linux/memblock.h | 1 +
>>>>  mm/memblock.c            | 3 ++-
>>>>  mm/sparse.c              | 2 +-
>>>>  3 files changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>>>> index f491690..9a81d9c 100644
>>>> --- a/include/linux/memblock.h
>>>> +++ b/include/linux/memblock.h
>>>> @@ -339,6 +339,7 @@ static inline int memblock_get_region_node(const struct memblock_region *r)
>>>>  #define MEMBLOCK_ALLOC_ANYWHERE	(~(phys_addr_t)0)
>>>>  #define MEMBLOCK_ALLOC_ACCESSIBLE	0
>>>>  #define MEMBLOCK_ALLOC_KASAN		1
>>>> +#define MEMBLOCK_ALLOC_EXACT_NODE	2
>>>>
>>>>  /* We are using top down, so it is safe to use 0 here */
>>>>  #define MEMBLOCK_LOW_LIMIT 0
>>>> diff --git a/mm/memblock.c b/mm/memblock.c
>>>> index 7d4f61a..dbd52c3c 100644
>>>> --- a/mm/memblock.c
>>>> +++ b/mm/memblock.c
>>>> @@ -277,6 +277,7 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
>>>>
>>>>  	/* pump up @end */
>>>>  	if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
>>>> +	    end == MEMBLOCK_ALLOC_EXACT_NODE ||
>>>>  	    end == MEMBLOCK_ALLOC_KASAN)
>>>>  		end = memblock.current_limit;
>>>>
>>>> @@ -1365,7 +1366,7 @@ static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
>>>>  	if (found && !memblock_reserve(found, size))
>>>>  		goto done;
>>>>
>>>> -	if (nid != NUMA_NO_NODE) {
>>>> +	if (end != MEMBLOCK_ALLOC_EXACT_NODE && nid != NUMA_NO_NODE) {
>>>>  		found = memblock_find_in_range_node(size, align, start,
>>>>  						    end, NUMA_NO_NODE,
>>>>  						    flags);
>>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>>> index 72f010d..828db46 100644
>>>> --- a/mm/sparse.c
>>>> +++ b/mm/sparse.c
>>>> @@ -477,7 +477,7 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
>>>>  	sparsemap_buf =
>>>>  		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
>>>>  						addr,
>>>> -						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>>>> +						MEMBLOCK_ALLOC_EXACT_NODE, nid);
>>>>  	sparsemap_buf_end = sparsemap_buf + size;
>>>>  }
>>>>
>>>> -- 
>>>> 2.7.4.huawei.3
>>>>
>>>>
>>>
>>
> 



end of thread, other threads:[~2019-09-19 11:43 UTC | newest]

Thread overview: 9+ messages
2019-09-18  4:22 [PATCH] mm: Support memblock alloc on the exact node for sparse_buffer_init() Yunfeng Ye
2019-09-18  6:51 ` Wei Yang
2019-09-18  7:08   ` Yunfeng Ye
2019-09-19  0:30     ` Wei Yang
2019-09-19 11:33       ` Yunfeng Ye
2019-09-19  4:47 ` Mike Rapoport
2019-09-19  7:14   ` Yunfeng Ye
2019-09-19  9:28     ` Mike Rapoport
2019-09-19 11:43       ` Yunfeng Ye
