* [PATCH v3 0/2] fix numa spreading for large hash tables
@ 2021-10-21  8:07 Chen Wandun
  2021-10-21  8:07 ` [PATCH v3 1/2] mm/vmalloc: " Chen Wandun
  2021-10-21  8:07 ` [PATCH v3 2/2] mm/vmalloc: introduce alloc_pages_bulk_array_mempolicy to accelerate memory allocation Chen Wandun
  0 siblings, 2 replies; 5+ messages in thread
From: Chen Wandun @ 2021-10-21  8:07 UTC (permalink / raw)
  To: akpm, npiggin, linux-mm, linux-kernel, edumazet, wangkefeng.wang,
	guohanjun, shakeelb, urezki
  Cc: chenwandun

[PATCH v3 1/2] fix numa spreading problem
[PATCH v3 2/2] optimization about performance

v1 ==> v2:
1. do a minimal fix in [PATCH v2 1/2]
2. add some comments

v2 ==> v3:
remove redundant code based on commit
14a60b114a8560de785df502fdc5687a969eae81

Chen Wandun (2):
  mm/vmalloc: fix numa spreading for large hash tables
  mm/vmalloc: introduce alloc_pages_bulk_array_mempolicy to accelerate
    memory allocation

 include/linux/gfp.h |  4 +++
 mm/mempolicy.c      | 82 +++++++++++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c        | 27 +++++++++++----
 3 files changed, 107 insertions(+), 6 deletions(-)

-- 
2.25.1

^ permalink raw reply	[flat|nested] 5+ messages in thread
* [PATCH v3 1/2] mm/vmalloc: fix numa spreading for large hash tables
  2021-10-21  8:07 [PATCH v3 0/2] fix numa spreading for large hash tables Chen Wandun
@ 2021-10-21  8:07 ` Chen Wandun
  2021-10-21  8:07 ` [PATCH v3 2/2] mm/vmalloc: introduce alloc_pages_bulk_array_mempolicy to accelerate memory allocation Chen Wandun
  1 sibling, 0 replies; 5+ messages in thread
From: Chen Wandun @ 2021-10-21  8:07 UTC (permalink / raw)
  To: akpm, npiggin, linux-mm, linux-kernel, edumazet, wangkefeng.wang,
	guohanjun, shakeelb, urezki
  Cc: chenwandun

Eric Dumazet reported strange NUMA spreading info in [1], and found
that commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings")
introduced this issue [2].

Digging into the difference before and after that commit shows that
page allocation differs:

before:
alloc_large_system_hash
    __vmalloc
        __vmalloc_node(..., NUMA_NO_NODE, ...)
            __vmalloc_node_range
                __vmalloc_area_node
                    alloc_page /* because NUMA_NO_NODE, so choose alloc_page branch */
                        alloc_pages_current
                            alloc_page_interleave /* can be proved by printing policy mode */

after:
alloc_large_system_hash
    __vmalloc
        __vmalloc_node(..., NUMA_NO_NODE, ...)
            __vmalloc_node_range
                __vmalloc_area_node
                    alloc_pages_node /* choose nid by numa_mem_id() */
                        __alloc_pages_node(nid, ....)

So after commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings"),
memory is allocated on the current node instead of being interleaved
across nodes.
[1] https://lore.kernel.org/linux-mm/CANn89iL6AAyWhfxdHO+jaT075iOa3XcYn9k6JJc7JR2XYn6k_Q@mail.gmail.com/
[2] https://lore.kernel.org/linux-mm/CANn89iLofTR=AK-QOZY87RdUZENCZUT4O6a0hvhu3_EwRMerOg@mail.gmail.com/

Fixes: 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
---
 mm/vmalloc.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d77830ff604c..e8a807c78110 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2816,6 +2816,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 		unsigned int order, unsigned int nr_pages, struct page **pages)
 {
 	unsigned int nr_allocated = 0;
+	struct page *page;
+	int i;
 
 	/*
 	 * For order-0 pages we make use of bulk allocator, if
@@ -2823,7 +2825,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 	 * to fails, fallback to a single page allocator that is
 	 * more permissive.
 	 */
-	if (!order) {
+	if (!order && nid != NUMA_NO_NODE) {
 		while (nr_allocated < nr_pages) {
 			unsigned int nr, nr_pages_request;
 
@@ -2848,7 +2850,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			if (nr != nr_pages_request)
 				break;
 		}
-	} else
+	} else if (order)
 		/*
 		 * Compound pages required for remap_vmalloc_page if
 		 * high-order pages.
@@ -2856,11 +2858,12 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 		gfp |= __GFP_COMP;
 
 	/* High-order pages or fallback path if "bulk" fails. */
-	while (nr_allocated < nr_pages) {
-		struct page *page;
-		int i;
 
-		page = alloc_pages_node(nid, gfp, order);
+	while (nr_allocated < nr_pages) {
+		if (nid == NUMA_NO_NODE)
+			page = alloc_pages(gfp, order);
+		else
+			page = alloc_pages_node(nid, gfp, order);
 		if (unlikely(!page))
 			break;

-- 
2.25.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* [PATCH v3 2/2] mm/vmalloc: introduce alloc_pages_bulk_array_mempolicy to accelerate memory allocation
  2021-10-21  8:07 [PATCH v3 0/2] fix numa spreading for large hash tables Chen Wandun
  2021-10-21  8:07 ` [PATCH v3 1/2] mm/vmalloc: " Chen Wandun
@ 2021-10-21  8:07 ` Chen Wandun
  2021-10-22  3:26   ` Andrew Morton
  1 sibling, 1 reply; 5+ messages in thread
From: Chen Wandun @ 2021-10-21  8:07 UTC (permalink / raw)
  To: akpm, npiggin, linux-mm, linux-kernel, edumazet, wangkefeng.wang,
	guohanjun, shakeelb, urezki
  Cc: chenwandun

It
will cause significant performance regressions in some situations
as Andrew mentioned in [1]. The main situation is vmalloc, vmalloc
will allocate pages with NUMA_NO_NODE by default, that will result
in alloc page one by one;

In order to solve this, __alloc_pages_bulk and mempolicy should be
considered at the same time.

1) If node is specified in memory allocation request, it will alloc
all pages by __alloc_pages_bulk.

2) If interleaving allocate memory, it will calculate how many pages
should be allocated in each node, and use __alloc_pages_bulk to alloc
pages in each node.
[1]: https://lore.kernel.org/lkml/CALvZod4G3SzP3kWxQYn0fj+VgG-G3yWXz=gz17+3N57ru1iajw@mail.gmail.com/t/#m750c8e3231206134293b089feaa090590afa0f60

Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
---
 include/linux/gfp.h |  4 +++
 mm/mempolicy.c      | 82 +++++++++++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c        | 20 ++++++++---
 3 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 55b2ec1f965a..cd98c858fc74 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -535,6 +535,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 				struct list_head *page_list,
 				struct page **page_array);
 
+unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
+				unsigned long nr_pages,
+				struct page **page_array);
+
 /* Bulk allocate order-0 pages */
 static inline unsigned long
 alloc_pages_bulk_list(gfp_t gfp, unsigned long nr_pages, struct list_head *list)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1592b081c58e..56bb1fe4d179 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2202,6 +2202,88 @@ struct page *alloc_pages(gfp_t gfp, unsigned order)
 }
 EXPORT_SYMBOL(alloc_pages);
 
+unsigned long alloc_pages_bulk_array_interleave(gfp_t gfp,
+		struct mempolicy *pol, unsigned long nr_pages,
+		struct page **page_array)
+{
+	int nodes;
+	unsigned long nr_pages_per_node;
+	int delta;
+	int i;
+	unsigned long nr_allocated;
+	unsigned long total_allocated = 0;
+
+	nodes = nodes_weight(pol->nodes);
+	nr_pages_per_node = nr_pages / nodes;
+	delta = nr_pages - nodes * nr_pages_per_node;
+
+	for (i = 0; i < nodes; i++) {
+		if (delta) {
+			nr_allocated = __alloc_pages_bulk(gfp,
+					interleave_nodes(pol), NULL,
+					nr_pages_per_node + 1, NULL,
+					page_array);
+			delta--;
+		} else {
+			nr_allocated = __alloc_pages_bulk(gfp,
+					interleave_nodes(pol), NULL,
+					nr_pages_per_node, NULL, page_array);
+		}
+
+		page_array += nr_allocated;
+		total_allocated += nr_allocated;
+	}
+
+	return total_allocated;
+}
+
+unsigned long alloc_pages_bulk_array_preferred_many(gfp_t gfp, int nid,
+		struct mempolicy *pol, unsigned long nr_pages,
+		struct page **page_array)
+{
+	gfp_t preferred_gfp;
+	unsigned long nr_allocated = 0;
+
+	preferred_gfp = gfp | __GFP_NOWARN;
+	preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
+
+	nr_allocated = __alloc_pages_bulk(preferred_gfp, nid, &pol->nodes,
+					   nr_pages, NULL, page_array);
+
+	if (nr_allocated < nr_pages)
+		nr_allocated += __alloc_pages_bulk(gfp, numa_node_id(), NULL,
+				nr_pages - nr_allocated, NULL,
+				page_array + nr_allocated);
+	return nr_allocated;
+}
+
+/* alloc pages bulk and mempolicy should be considered at the
+ * same time in some situation such as vmalloc.
+ *
+ * It can accelerate memory allocation especially interleaving
+ * allocate memory.
+ */
+unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
+		unsigned long nr_pages, struct page **page_array)
+{
+	struct mempolicy *pol = &default_policy;
+
+	if (!in_interrupt() && !(gfp & __GFP_THISNODE))
+		pol = get_task_policy(current);
+
+	if (pol->mode == MPOL_INTERLEAVE)
+		return alloc_pages_bulk_array_interleave(gfp, pol,
+							 nr_pages, page_array);
+
+	if (pol->mode == MPOL_PREFERRED_MANY)
+		return alloc_pages_bulk_array_preferred_many(gfp,
+				numa_node_id(), pol, nr_pages, page_array);
+
+	return __alloc_pages_bulk(gfp, policy_node(gfp, pol, numa_node_id()),
+				  policy_nodemask(gfp, pol), nr_pages, NULL,
+				  page_array);
+}
+
 int vma_dup_policy(struct vm_area_struct *src, struct vm_area_struct *dst)
 {
 	struct mempolicy *pol = mpol_dup(vma_policy(src));
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e8a807c78110..c3ab25d408dd 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2825,7 +2825,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 	 * to fails, fallback to a single page allocator that is
 	 * more permissive.
 	 */
-	if (!order && nid != NUMA_NO_NODE) {
+	if (!order) {
 		while (nr_allocated < nr_pages) {
 			unsigned int nr, nr_pages_request;
 
@@ -2837,8 +2837,20 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			 */
 			nr_pages_request = min(100U, nr_pages - nr_allocated);
 
-			nr = alloc_pages_bulk_array_node(gfp, nid,
-				nr_pages_request, pages + nr_allocated);
+			/* memory allocation should consider mempolicy, we cant
+			 * wrongly use nearest node when nid == NUMA_NO_NODE,
+			 * otherwise memory may be allocated in only one node,
+			 * but mempolcy want to alloc memory by interleaving.
+			 */
+			if (nid == NUMA_NO_NODE)
+				nr = alloc_pages_bulk_array_mempolicy(gfp,
+						nr_pages_request,
+						pages + nr_allocated);
+
+			else
+				nr = alloc_pages_bulk_array_node(gfp, nid,
+						nr_pages_request,
+						pages + nr_allocated);
 
 			nr_allocated += nr;
 			cond_resched();
@@ -2850,7 +2862,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			if (nr != nr_pages_request)
 				break;
 		}
-	} else if (order)
+	} else
 		/*
 		 * Compound pages required for remap_vmalloc_page if
 		 * high-order pages.

-- 
2.25.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [PATCH v3 2/2] mm/vmalloc: introduce alloc_pages_bulk_array_mempolicy to accelerate memory allocation
  2021-10-21  8:07 ` [PATCH v3 2/2] mm/vmalloc: introduce alloc_pages_bulk_array_mempolicy to accelerate memory allocation Chen Wandun
@ 2021-10-22  3:26   ` Andrew Morton
  2021-10-28 13:39     ` Chen Wandun
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2021-10-22  3:26 UTC (permalink / raw)
  To: Chen Wandun
  Cc: npiggin, linux-mm, linux-kernel, edumazet, wangkefeng.wang,
	guohanjun, shakeelb, urezki

On Thu, 21 Oct 2021 16:07:44 +0800 Chen Wandun <chenwandun@huawei.com> wrote:

> It

What is "it"?

> will cause significant performance regressions in some situations
> as Andrew mentioned in [1]. The main situation is vmalloc, vmalloc
> will allocate pages with NUMA_NO_NODE by default, that will result
> in alloc page one by one;
>
> In order to solve this, __alloc_pages_bulk and mempolicy should be
> considered at the same time.
>
> 1) If node is specified in memory allocation request, it will alloc
> all pages by __alloc_pages_bulk.
>
> 2) If interleaving allocate memory, it will calculate how many pages
> should be allocated in each node, and use __alloc_pages_bulk to alloc
> pages in each node.

This v3 patch didn't incorporate my two fixes, below.  It is usual to
incorporate such fixes prior to resending.  I have retained those two
fixes, now against v3.
From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-vmalloc-introduce-alloc_pages_bulk_array_mempolicy-to-accelerate-memory-allocation-fix

make two functions static

Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mempolicy.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/mempolicy.c~mm-vmalloc-introduce-alloc_pages_bulk_array_mempolicy-to-accelerate-memory-allocation-fix
+++ a/mm/mempolicy.c
@@ -2196,7 +2196,7 @@ struct page *alloc_pages(gfp_t gfp, unsi
 }
 EXPORT_SYMBOL(alloc_pages);
 
-unsigned long alloc_pages_bulk_array_interleave(gfp_t gfp,
+static unsigned long alloc_pages_bulk_array_interleave(gfp_t gfp,
 		struct mempolicy *pol, unsigned long nr_pages,
 		struct page **page_array)
 {
@@ -2231,7 +2231,7 @@ unsigned long alloc_pages_bulk_array_int
 	return total_allocated;
 }
 
-unsigned long alloc_pages_bulk_array_preferred_many(gfp_t gfp, int nid,
+static unsigned long alloc_pages_bulk_array_preferred_many(gfp_t gfp, int nid,
 		struct mempolicy *pol, unsigned long nr_pages,
 		struct page **page_array)
 {
_


From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-vmalloc-introduce-alloc_pages_bulk_array_mempolicy-to-accelerate-memory-allocation-fix-2

fix CONFIG_NUMA=n build.  alloc_pages_bulk_array_mempolicy() was undefined

Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/vmalloc.c~mm-vmalloc-introduce-alloc_pages_bulk_array_mempolicy-to-accelerate-memory-allocation-fix-2
+++ a/mm/vmalloc.c
@@ -2860,7 +2860,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			 * otherwise memory may be allocated in only one node,
 			 * but mempolcy want to alloc memory by interleaving.
 			 */
-			if (nid == NUMA_NO_NODE)
+			if (IS_ENABLED(CONFIG_NUMA) && nid == NUMA_NO_NODE)
 				nr = alloc_pages_bulk_array_mempolicy(gfp,
 						nr_pages_request,
 						pages + nr_allocated);
_

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [PATCH v3 2/2] mm/vmalloc: introduce alloc_pages_bulk_array_mempolicy to accelerate memory allocation
  2021-10-22  3:26 ` Andrew Morton
@ 2021-10-28 13:39   ` Chen Wandun
  0 siblings, 0 replies; 5+ messages in thread
From: Chen Wandun @ 2021-10-28 13:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: npiggin, linux-mm, linux-kernel, edumazet, wangkefeng.wang,
	guohanjun, shakeelb, urezki

On 2021/10/22 11:26, Andrew Morton wrote:
> On Thu, 21 Oct 2021 16:07:44 +0800 Chen Wandun <chenwandun@huawei.com> wrote:
> 
>> It
> 
> What is "it"?

it == [PATCH] mm/vmalloc: fix numa spreading for large hash tables

>> will cause significant performance regressions in some situations
>> as Andrew mentioned in [1]. The main situation is vmalloc, vmalloc
>> will allocate pages with NUMA_NO_NODE by default, that will result
>> in alloc page one by one;
>>
>> In order to solve this, __alloc_pages_bulk and mempolicy should be
>> considered at the same time.
>>
>> 1) If node is specified in memory allocation request, it will alloc
>> all pages by __alloc_pages_bulk.
>>
>> 2) If interleaving allocate memory, it will calculate how many pages
>> should be allocated in each node, and use __alloc_pages_bulk to alloc
>> pages in each node.
> 
> This v3 patch didn't incorporate my two fixes, below.  It is usual to
> incorporate such fixes prior to resending.  I have retained those two
> fixes, now against v3.

[Chen's reply quoted the two fix patches above in full; the duplicate
quoted text is trimmed here.]

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-10-28 13:39 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-21  8:07 [PATCH v3 0/2] fix numa spreading for large hash tables Chen Wandun
2021-10-21  8:07 ` [PATCH v3 1/2] mm/vmalloc: " Chen Wandun
2021-10-21  8:07 ` [PATCH v3 2/2] mm/vmalloc: introduce alloc_pages_bulk_array_mempolicy to accelerate memory allocation Chen Wandun
2021-10-22  3:26   ` Andrew Morton
2021-10-28 13:39     ` Chen Wandun
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).