linux-mm.kvack.org archive mirror
* [PATCH 0/3] vmalloc() vs bulk allocator v2
@ 2021-05-16 20:20 Uladzislau Rezki (Sony)
  2021-05-16 20:20 ` [PATCH 1/3] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper Uladzislau Rezki (Sony)
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Uladzislau Rezki (Sony) @ 2021-05-16 20:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Mel Gorman, Matthew Wilcox, Nicholas Piggin,
	Uladzislau Rezki, Hillf Danton, Michal Hocko,
	Oleksiy Avramchenko, Steven Rostedt

Hi.

There are three patches in this small series. This is the second iteration.
The first one was buggy and led to a kernel panic due to passing NUMA_NO_NODE
as "nid" to the bulk allocator (which assumes a valid node ID).

Therefore patch [1] adds an extra helper that guarantees a correct NUMA
node ID if none is specified, i.e. when it is invoked with NUMA_NO_NODE.
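
In short, the helper just normalizes the node before calling the bulk
allocator; a stripped-down extract of what patch [1] adds:

<snip>
        if (nid == NUMA_NO_NODE)
                nid = numa_mem_id();

        return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
<snip>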

Patch [2] has been slightly updated; the change-log is below.

V1 -> V2:
    - Switch to the alloc_pages_bulk_array_node() helper so that NUMA_NO_NODE
      is correctly handled (similar to the alloc_pages_node() API function).
    - Use a "while()" loop instead of "for()" for high-order pages and increase
      the number of allocated pages after the actual allocation.

Uladzislau Rezki (Sony) (3):
[1]  mm/page_alloc: Add an alloc_pages_bulk_array_node() helper
[2]  mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node()
[3]  mm/vmalloc: Print a warning message first on failure

 include/linux/gfp.h |  9 ++++++
 mm/vmalloc.c        | 78 +++++++++++++++++++++++++--------------------
 2 files changed, 52 insertions(+), 35 deletions(-)

--
2.20.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper
  2021-05-16 20:20 [PATCH 0/3] vmalloc() vs bulk allocator v2 Uladzislau Rezki (Sony)
@ 2021-05-16 20:20 ` Uladzislau Rezki (Sony)
  2021-05-16 20:20 ` [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node() Uladzislau Rezki (Sony)
  2021-05-16 20:20 ` [PATCH 3/3] mm/vmalloc: Print a warning message first on failure Uladzislau Rezki (Sony)
  2 siblings, 0 replies; 12+ messages in thread
From: Uladzislau Rezki (Sony) @ 2021-05-16 20:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Mel Gorman, Matthew Wilcox, Nicholas Piggin,
	Uladzislau Rezki, Hillf Danton, Michal Hocko,
	Oleksiy Avramchenko, Steven Rostedt

Add a "node" variant of the alloc_pages_bulk_array() function.
The helper guarantees that a __alloc_pages_bulk() is invoked
with a valid NUMA node ID.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/gfp.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 11da8af06704..94f0b8b1cb55 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -536,6 +536,15 @@ alloc_pages_bulk_array(gfp_t gfp, unsigned long nr_pages, struct page **page_arr
 	return __alloc_pages_bulk(gfp, numa_mem_id(), NULL, nr_pages, NULL, page_array);
 }
 
+static inline unsigned long
+alloc_pages_bulk_array_node(gfp_t gfp, int nid, unsigned long nr_pages, struct page **page_array)
+{
+	if (nid == NUMA_NO_NODE)
+		nid = numa_mem_id();
+
+	return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
+}
+
 /*
  * Allocate pages, preferring the node given as nid. The node must be valid and
  * online. For more general interface, see alloc_pages_node().
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node()
  2021-05-16 20:20 [PATCH 0/3] vmalloc() vs bulk allocator v2 Uladzislau Rezki (Sony)
  2021-05-16 20:20 ` [PATCH 1/3] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper Uladzislau Rezki (Sony)
@ 2021-05-16 20:20 ` Uladzislau Rezki (Sony)
  2021-05-17  8:24   ` Mel Gorman
  2021-05-19 13:44   ` Christoph Hellwig
  2021-05-16 20:20 ` [PATCH 3/3] mm/vmalloc: Print a warning message first on failure Uladzislau Rezki (Sony)
  2 siblings, 2 replies; 12+ messages in thread
From: Uladzislau Rezki (Sony) @ 2021-05-16 20:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Mel Gorman, Matthew Wilcox, Nicholas Piggin,
	Uladzislau Rezki, Hillf Danton, Michal Hocko,
	Oleksiy Avramchenko, Steven Rostedt

A page bulk allocator has recently been introduced for users that
need to obtain a number of pages per one call.

For order-0 pages, switch from alloc_pages_node() to
alloc_pages_bulk_array_node(); the former is not capable of
allocating a set of pages, so one call is needed per page.
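
In practice, the order-0 change boils down to the following simplified
sketch (it reuses the names from __vmalloc_area_node(); the real diff
below also keeps the high-order path and the failure handling):

<snip>
        /* Before: one alloc_pages_node() call per order-0 page. */
        for (i = 0; i < nr_small_pages; i++)
                area->pages[i] = alloc_pages_node(node, gfp_mask, 0);

        /* After: a single call requests the whole set of order-0 pages. */
        area->nr_pages = alloc_pages_bulk_array_node(gfp_mask, node,
                                nr_small_pages, area->pages);
<snip>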

Second, according to my tests the bulk allocator uses fewer
cycles even in scenarios where only one page is requested.
Running "perf" on the same test case shows the difference below:

<default>
  - 45.18% __vmalloc_node
     - __vmalloc_node_range
        - 35.60% __alloc_pages
           - get_page_from_freelist
                3.36% __list_del_entry_valid
                3.00% check_preemption_disabled
                1.42% prep_new_page
<default>

<patch>
  - 31.00% __vmalloc_node
     - __vmalloc_node_range
        - 14.48% __alloc_pages_bulk
             3.22% __list_del_entry_valid
           - 0.83% __alloc_pages
                get_page_from_freelist
<patch>

The "test_vmalloc.sh" also shows performance improvements:

fix_size_alloc_test_4MB   loops: 1000000 avg: 89105095 usec
fix_size_alloc_test       loops: 1000000 avg: 513672   usec
full_fit_alloc_test       loops: 1000000 avg: 748900   usec
long_busy_list_alloc_test loops: 1000000 avg: 8043038  usec
random_size_alloc_test    loops: 1000000 avg: 4028582  usec
fix_align_alloc_test      loops: 1000000 avg: 1457671  usec

fix_size_alloc_test_4MB   loops: 1000000 avg: 62083711 usec
fix_size_alloc_test       loops: 1000000 avg: 449207   usec
full_fit_alloc_test       loops: 1000000 avg: 735985   usec
long_busy_list_alloc_test loops: 1000000 avg: 5176052  usec
random_size_alloc_test    loops: 1000000 avg: 2589252  usec
fix_align_alloc_test      loops: 1000000 avg: 1365009  usec

For example, 4MB allocations show a ~30% gain; all the
rest are also better.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 76 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 34 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5d96fee17226..a8e50278019a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2766,8 +2766,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	unsigned long array_size;
 	unsigned int nr_small_pages = size >> PAGE_SHIFT;
 	unsigned int page_order;
-	struct page **pages;
-	unsigned int i;
 
 	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
 	gfp_mask |= __GFP_NOWARN;
@@ -2776,13 +2774,13 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 
 	/* Please note that the recursion is strictly bounded. */
 	if (array_size > PAGE_SIZE) {
-		pages = __vmalloc_node(array_size, 1, nested_gfp, node,
+		area->pages = __vmalloc_node(array_size, 1, nested_gfp, node,
 					area->caller);
 	} else {
-		pages = kmalloc_node(array_size, nested_gfp, node);
+		area->pages = kmalloc_node(array_size, nested_gfp, node);
 	}
 
-	if (!pages) {
+	if (!area->pages) {
 		free_vm_area(area);
 		warn_alloc(gfp_mask, NULL,
 			   "vmalloc size %lu allocation failure: "
@@ -2791,43 +2789,53 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		return NULL;
 	}
 
-	area->pages = pages;
-	area->nr_pages = nr_small_pages;
+	area->nr_pages = 0;
 	set_vm_area_page_order(area, page_shift - PAGE_SHIFT);
-
 	page_order = vm_area_page_order(area);
 
-	/*
-	 * Careful, we allocate and map page_order pages, but tracking is done
-	 * per PAGE_SIZE page so as to keep the vm_struct APIs independent of
-	 * the physical/mapped size.
-	 */
-	for (i = 0; i < area->nr_pages; i += 1U << page_order) {
-		struct page *page;
-		int p;
-
-		/* Compound pages required for remap_vmalloc_page */
-		page = alloc_pages_node(node, gfp_mask | __GFP_COMP, page_order);
-		if (unlikely(!page)) {
-			/* Successfully allocated i pages, free them in __vfree() */
-			area->nr_pages = i;
-			atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
-			warn_alloc(gfp_mask, NULL,
-				   "vmalloc size %lu allocation failure: "
-				   "page order %u allocation failed",
-				   area->nr_pages * PAGE_SIZE, page_order);
-			goto fail;
-		}
+	if (!page_order) {
+		area->nr_pages = alloc_pages_bulk_array_node(
+			gfp_mask, node, nr_small_pages, area->pages);
+	} else {
+		/*
+		 * Careful, we allocate and map page_order pages, but tracking is done
+		 * per PAGE_SIZE page so as to keep the vm_struct APIs independent of
+		 * the physical/mapped size.
+		 */
+		while (area->nr_pages < nr_small_pages) {
+			struct page *page;
+			int i;
+
+			/* Compound pages required for remap_vmalloc_page */
+			page = alloc_pages_node(node, gfp_mask | __GFP_COMP, page_order);
+			if (unlikely(!page))
+				break;
 
-		for (p = 0; p < (1U << page_order); p++)
-			area->pages[i + p] = page + p;
+			for (i = 0; i < (1U << page_order); i++)
+				area->pages[area->nr_pages + i] = page + i;
 
-		if (gfpflags_allow_blocking(gfp_mask))
-			cond_resched();
+			if (gfpflags_allow_blocking(gfp_mask))
+				cond_resched();
+
+			area->nr_pages += 1U << page_order;
+		}
 	}
+
 	atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
 
-	if (vmap_pages_range(addr, addr + size, prot, pages, page_shift) < 0) {
+	/*
+	 * If not enough pages were obtained to accomplish an
+	 * allocation request, free them via __vfree() if any.
+	 */
+	if (area->nr_pages != nr_small_pages) {
+		warn_alloc(gfp_mask, NULL,
+			"vmalloc size %lu allocation failure: "
+			"page order %u allocation failed",
+			area->nr_pages * PAGE_SIZE, page_order);
+		goto fail;
+	}
+
+	if (vmap_pages_range(addr, addr + size, prot, area->pages, page_shift) < 0) {
 		warn_alloc(gfp_mask, NULL,
 			   "vmalloc size %lu allocation failure: "
 			   "failed to map pages",
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] mm/vmalloc: Print a warning message first on failure
  2021-05-16 20:20 [PATCH 0/3] vmalloc() vs bulk allocator v2 Uladzislau Rezki (Sony)
  2021-05-16 20:20 ` [PATCH 1/3] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper Uladzislau Rezki (Sony)
  2021-05-16 20:20 ` [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node() Uladzislau Rezki (Sony)
@ 2021-05-16 20:20 ` Uladzislau Rezki (Sony)
  2 siblings, 0 replies; 12+ messages in thread
From: Uladzislau Rezki (Sony) @ 2021-05-16 20:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Mel Gorman, Matthew Wilcox, Nicholas Piggin,
	Uladzislau Rezki, Hillf Danton, Michal Hocko,
	Oleksiy Avramchenko, Steven Rostedt

When the memory allocation for the array of pages fails, emit a
warning message as a first step and only then perform the further
cleanup.

The reason it should be done in this order is that the cleanup
function, free_vm_area(), can potentially also follow its own error
paths, which can lead to confusion about what was broken first.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a8e50278019a..b2a0cbfa37c1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2781,11 +2781,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	}
 
 	if (!area->pages) {
-		free_vm_area(area);
 		warn_alloc(gfp_mask, NULL,
 			   "vmalloc size %lu allocation failure: "
 			   "page array size %lu allocation failed",
 			   nr_small_pages * PAGE_SIZE, array_size);
+		free_vm_area(area);
 		return NULL;
 	}
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node()
  2021-05-16 20:20 ` [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node() Uladzislau Rezki (Sony)
@ 2021-05-17  8:24   ` Mel Gorman
  2021-05-17 11:51     ` Uladzislau Rezki
  2021-05-19 13:44   ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Mel Gorman @ 2021-05-17  8:24 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: Andrew Morton, linux-mm, LKML, Matthew Wilcox, Nicholas Piggin,
	Hillf Danton, Michal Hocko, Oleksiy Avramchenko, Steven Rostedt

On Sun, May 16, 2021 at 10:20:55PM +0200, Uladzislau Rezki (Sony) wrote:
> Recently there has been introduced a page bulk allocator for
> users which need to get number of pages per one call request.
> 
> For order-0 pages switch to an alloc_pages_bulk_array_node()
> instead of alloc_pages_node(), the reason is the former is
> not capable of allocating set of pages, thus a one call is
> per one page.
> 
> Second, according to my tests the bulk allocator uses less
> cycles even for scenarios when only one page is requested.
> Running the "perf" on same test case shows below difference:
> 
> <default>
>   - 45.18% __vmalloc_node
>      - __vmalloc_node_range
>         - 35.60% __alloc_pages
>            - get_page_from_freelist
>                 3.36% __list_del_entry_valid
>                 3.00% check_preemption_disabled
>                 1.42% prep_new_page
> <default>
> 
> <patch>
>   - 31.00% __vmalloc_node
>      - __vmalloc_node_range
>         - 14.48% __alloc_pages_bulk
>              3.22% __list_del_entry_valid
>            - 0.83% __alloc_pages
>                 get_page_from_freelist
> <patch>
> 
> The "test_vmalloc.sh" also shows performance improvements:
> 
> fix_size_alloc_test_4MB   loops: 1000000 avg: 89105095 usec
> fix_size_alloc_test       loops: 1000000 avg: 513672   usec
> full_fit_alloc_test       loops: 1000000 avg: 748900   usec
> long_busy_list_alloc_test loops: 1000000 avg: 8043038  usec
> random_size_alloc_test    loops: 1000000 avg: 4028582  usec
> fix_align_alloc_test      loops: 1000000 avg: 1457671  usec
> 
> fix_size_alloc_test_4MB   loops: 1000000 avg: 62083711 usec
> fix_size_alloc_test       loops: 1000000 avg: 449207   usec
> full_fit_alloc_test       loops: 1000000 avg: 735985   usec
> long_busy_list_alloc_test loops: 1000000 avg: 5176052  usec
> random_size_alloc_test    loops: 1000000 avg: 2589252  usec
> fix_align_alloc_test      loops: 1000000 avg: 1365009  usec
> 
> For example 4MB allocations illustrates ~30% gain, all the
> rest is also better.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

FWIW, it passed build and boot tests.

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node()
  2021-05-17  8:24   ` Mel Gorman
@ 2021-05-17 11:51     ` Uladzislau Rezki
  0 siblings, 0 replies; 12+ messages in thread
From: Uladzislau Rezki @ 2021-05-17 11:51 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux Memory Management List, LKML,
	Matthew Wilcox, Nicholas Piggin, Hillf Danton, Michal Hocko,
	Oleksiy Avramchenko, Steven Rostedt

>
> FWIW, it passed build and boot tests.
>
> Acked-by: Mel Gorman <mgorman@suse.de>
>
Thanks!

-- 
Vlad Rezki


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node()
  2021-05-16 20:20 ` [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node() Uladzislau Rezki (Sony)
  2021-05-17  8:24   ` Mel Gorman
@ 2021-05-19 13:44   ` Christoph Hellwig
  2021-05-19 14:39     ` Uladzislau Rezki
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2021-05-19 13:44 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: Andrew Morton, linux-mm, LKML, Mel Gorman, Matthew Wilcox,
	Nicholas Piggin, Hillf Danton, Michal Hocko, Oleksiy Avramchenko,
	Steven Rostedt

> +	if (!page_order) {
> +		area->nr_pages = alloc_pages_bulk_array_node(
> +			gfp_mask, node, nr_small_pages, area->pages);
> +	} else {
> +		/*
> +		 * Careful, we allocate and map page_order pages, but tracking is done
> +		 * per PAGE_SIZE page so as to keep the vm_struct APIs independent of

Comment lines over 80 columns are completely unreadable, so please avoid them.

> +		 * the physical/mapped size.
> +		 */
> +		while (area->nr_pages < nr_small_pages) {
> +			struct page *page;
> +			int i;
> +
> +			/* Compound pages required for remap_vmalloc_page */
> +			page = alloc_pages_node(node, gfp_mask | __GFP_COMP, page_order);
> +			if (unlikely(!page))
> +				break;
>  
> +			for (i = 0; i < (1U << page_order); i++)
> +				area->pages[area->nr_pages + i] = page + i;
>  
> +			if (gfpflags_allow_blocking(gfp_mask))
> +				cond_resched();
> +
> +			area->nr_pages += 1U << page_order;
> +		}

In fact, splitting this whole high-order allocation logic into a little
helper would massively benefit the function by ordering it more logically
and reducing a level of indentation.

> +	/*
> +	 * If not enough pages were obtained to accomplish an
> +	 * allocation request, free them via __vfree() if any.
> +	 */
> +	if (area->nr_pages != nr_small_pages) {
> +		warn_alloc(gfp_mask, NULL,
> +			"vmalloc size %lu allocation failure: "
> +			"page order %u allocation failed",
> +			area->nr_pages * PAGE_SIZE, page_order);
> +		goto fail;
> +	}

From reading __alloc_pages_bulk, not allocating all pages is something
that can happen fairly easily.  Shouldn't we try to allocate the missing
pages manually and/or retry here?

> +
> +	if (vmap_pages_range(addr, addr + size, prot, area->pages, page_shift) < 0) {

Another pointlessly long line.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node()
  2021-05-19 13:44   ` Christoph Hellwig
@ 2021-05-19 14:39     ` Uladzislau Rezki
  2021-05-19 15:56       ` Mel Gorman
  0 siblings, 1 reply; 12+ messages in thread
From: Uladzislau Rezki @ 2021-05-19 14:39 UTC (permalink / raw)
  To: Christoph Hellwig, Mel Gorman
  Cc: Uladzislau Rezki (Sony),
	Andrew Morton, linux-mm, LKML, Mel Gorman, Matthew Wilcox,
	Nicholas Piggin, Hillf Danton, Michal Hocko, Oleksiy Avramchenko,
	Steven Rostedt

On Wed, May 19, 2021 at 02:44:08PM +0100, Christoph Hellwig wrote:
> > +	if (!page_order) {
> > +		area->nr_pages = alloc_pages_bulk_array_node(
> > +			gfp_mask, node, nr_small_pages, area->pages);
> > +	} else {
> > +		/*
> > +		 * Careful, we allocate and map page_order pages, but tracking is done
> > +		 * per PAGE_SIZE page so as to keep the vm_struct APIs independent of
> 
> Comments over 80 lines are completely unreadable, so please avoid them.
> 
That I can fix in a separate patch.

> > +		 * the physical/mapped size.
> > +		 */
> > +		while (area->nr_pages < nr_small_pages) {
> > +			struct page *page;
> > +			int i;
> > +
> > +			/* Compound pages required for remap_vmalloc_page */
> > +			page = alloc_pages_node(node, gfp_mask | __GFP_COMP, page_order);
> > +			if (unlikely(!page))
> > +				break;
> >  
> > +			for (i = 0; i < (1U << page_order); i++)
> > +				area->pages[area->nr_pages + i] = page + i;
> >  
> > +			if (gfpflags_allow_blocking(gfp_mask))
> > +				cond_resched();
> > +
> > +			area->nr_pages += 1U << page_order;
> > +		}
> 
> In fact splitting this whole high order allocation logic into a little
> helper would massivel benefit the function by ordering it more logical
> and reducing a level of indentation.
> 
I can put it into a separate function. Actually I was thinking about it.

> > +	/*
> > +	 * If not enough pages were obtained to accomplish an
> > +	 * allocation request, free them via __vfree() if any.
> > +	 */
> > +	if (area->nr_pages != nr_small_pages) {
> > +		warn_alloc(gfp_mask, NULL,
> > +			"vmalloc size %lu allocation failure: "
> > +			"page order %u allocation failed",
> > +			area->nr_pages * PAGE_SIZE, page_order);
> > +		goto fail;
> > +	}
> 
> From reading __alloc_pages_bulk not allocating all pages is something
> that cn happen fairly easily.  Shouldn't we try to allocate the missing
> pages manually and/ore retry here?
> 
It is a good point. The bulk allocator, as I see it, only tries to take
pages from the pcp-list and falls back to a single-page allocation once
that fails, so the array may not be fully populated.

In that case it probably makes sense to manually populate it using the
single page allocator.

Mel, could you please also comment on it?

> > +
> > +	if (vmap_pages_range(addr, addr + size, prot, area->pages, page_shift) < 0) {
> 
> Another pointlessly long line.
Yep. Will fix it in a separate patch. Actually checkpatch.pl also
complains about splitting the text like below:

    warn_alloc(gfp_mask, NULL,
        "vmalloc size %lu allocation failure: "
        "page order %u allocation failed",
        area->nr_pages * PAGE_SIZE, page_order);


Thanks for the comments!

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node()
  2021-05-19 14:39     ` Uladzislau Rezki
@ 2021-05-19 15:56       ` Mel Gorman
  2021-05-19 19:52         ` Uladzislau Rezki
  0 siblings, 1 reply; 12+ messages in thread
From: Mel Gorman @ 2021-05-19 15:56 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Christoph Hellwig, Andrew Morton, linux-mm, LKML, Matthew Wilcox,
	Nicholas Piggin, Hillf Danton, Michal Hocko, Oleksiy Avramchenko,
	Steven Rostedt

On Wed, May 19, 2021 at 04:39:00PM +0200, Uladzislau Rezki wrote:
> > > +	/*
> > > +	 * If not enough pages were obtained to accomplish an
> > > +	 * allocation request, free them via __vfree() if any.
> > > +	 */
> > > +	if (area->nr_pages != nr_small_pages) {
> > > +		warn_alloc(gfp_mask, NULL,
> > > +			"vmalloc size %lu allocation failure: "
> > > +			"page order %u allocation failed",
> > > +			area->nr_pages * PAGE_SIZE, page_order);
> > > +		goto fail;
> > > +	}
> > 
> > From reading __alloc_pages_bulk not allocating all pages is something
> > that cn happen fairly easily.  Shouldn't we try to allocate the missing
> > pages manually and/ore retry here?
> > 
>
> It is a good point. The bulk-allocator, as i see, only tries to access
> to pcp-list and falls-back to a single allocator once it fails, so the
> array may not be fully populated.
> 

Partially correct. It does allocate via the pcp-list but the pcp-list will
be refilled if it's empty, so if the bulk allocator returns fewer pages
than requested, it may be due to hitting watermarks or the local zone being
depleted. It does not take any special action to correct the situation
or stall, e.g. wake kswapd, enter direct reclaim, allocate from a remote
node etc.

If no pages were allocated, it'll try to allocate at least one page via a
single allocation request, in case the bulk failure would push the zone
over the watermark but one page does not. That path, as a side-effect,
would also wake kswapd.

> In that case probably it makes sense to manually populate it using
> single page allocator.
> 
> Mel, could you please also comment on it?
> 

It is by design because it's unknown if callers can recover or if so,
how they want to recover and the primary intent behind the bulk allocator
was speed. In the case of network, it only wants some pages quickly so as
long as it gets 1, it makes progress. For the sunrpc user, it's willing
to wait and retry. For vmalloc, I'm unsure what a suitable recovery path
should be as I do not have a good handle on workloads that are sensitive
to vmalloc performance. The obvious option would be to loop and allocate
single pages with alloc_pages_node understanding that the additional
pages may take longer to allocate.
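
For example, an untested sketch of that option on the vmalloc side,
reusing the names from your patch:

<snip>
        area->nr_pages = alloc_pages_bulk_array_node(gfp_mask, node,
                                nr_small_pages, area->pages);

        /* Fill whatever the bulk request could not satisfy, page by page. */
        while (area->nr_pages < nr_small_pages) {
                struct page *page = alloc_pages_node(node, gfp_mask, 0);

                if (unlikely(!page))
                        break;  /* Genuinely out of memory, let the caller fail. */

                area->pages[area->nr_pages++] = page;
        }
<snip>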

An alternative option would be to define either __GFP_RETRY_MAYFAIL or
__GFP_NORETRY semantics for the bulk allocator to handle it in the failure
path. It's a bit more complex because the existing __GFP_RETRY_MAYFAIL
semantics deal with costly high-order allocations. __GFP_NORETRY would
be slightly trickier although it makes more sense. The failure path would
retry the allocation unless __GFP_NORETRY was specified. For that option,
the network path would need to be updated to add the __GFP_NORETRY flag
as it almost certainly does not want looping behaviour.

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node()
  2021-05-19 15:56       ` Mel Gorman
@ 2021-05-19 19:52         ` Uladzislau Rezki
  2021-05-19 21:07           ` Uladzislau Rezki
  0 siblings, 1 reply; 12+ messages in thread
From: Uladzislau Rezki @ 2021-05-19 19:52 UTC (permalink / raw)
  To: Mel Gorman, Christoph Hellwig
  Cc: Uladzislau Rezki, Christoph Hellwig, Andrew Morton, linux-mm,
	LKML, Matthew Wilcox, Nicholas Piggin, Hillf Danton,
	Michal Hocko, Oleksiy Avramchenko, Steven Rostedt

> On Wed, May 19, 2021 at 04:39:00PM +0200, Uladzislau Rezki wrote:
> > > > +	/*
> > > > +	 * If not enough pages were obtained to accomplish an
> > > > +	 * allocation request, free them via __vfree() if any.
> > > > +	 */
> > > > +	if (area->nr_pages != nr_small_pages) {
> > > > +		warn_alloc(gfp_mask, NULL,
> > > > +			"vmalloc size %lu allocation failure: "
> > > > +			"page order %u allocation failed",
> > > > +			area->nr_pages * PAGE_SIZE, page_order);
> > > > +		goto fail;
> > > > +	}
> > > 
> > > From reading __alloc_pages_bulk not allocating all pages is something
> > > that cn happen fairly easily.  Shouldn't we try to allocate the missing
> > > pages manually and/ore retry here?
> > > 
> >
> > It is a good point. The bulk-allocator, as i see, only tries to access
> > to pcp-list and falls-back to a single allocator once it fails, so the
> > array may not be fully populated.
> > 
> 
> Partially correct. It does allocate via the pcp-list but the pcp-list will
> be refilled if it's empty so if the bulk allocator returns fewer pages
> than requested, it may be due to hitting watermarks or the local zone is
> depleted. It does not take any special action to correct the situation
> or stall e.g.  wake kswapd, enter direct reclaim, allocate from a remote
> node etc.
> 
> If no pages were allocated, it'll try allocate at least one page via a
> single allocation request in case the bulk failure would push the zone
> over the watermark but 1 page does not. That path as a side-effect would
> also wake kswapd.
> 
OK. A single page allocator can enter a slow path, i.e. direct reclaim,
etc., to adjust watermarks.

> > In that case probably it makes sense to manually populate it using
> > single page allocator.
> > 
> > Mel, could you please also comment on it?
> > 
> 
> It is by design because it's unknown if callers can recover or if so,
> how they want to recover and the primary intent behind the bulk allocator
> was speed. In the case of network, it only wants some pages quickly so as
> long as it gets 1, it makes progress. For the sunrpc user, it's willing
> to wait and retry. For vmalloc, I'm unsure what a suitable recovery path
> should be as I do not have a good handle on workloads that are sensitive
> to vmalloc performance. The obvious option would be to loop and allocate
> single pages with alloc_pages_node understanding that the additional
> pages may take longer to allocate.
> 
I got it. At least we should fall back to the single page allocator, which
is how we used to allocate before (now it is kept for high-order pages). If
that also fails to obtain a page, we are done.

Basically the single-page allocator is more permissive, so it has a higher
chance of success. Therefore a fallback to it makes sense.

Thanks.

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node()
  2021-05-19 19:52         ` Uladzislau Rezki
@ 2021-05-19 21:07           ` Uladzislau Rezki
  2021-06-28 23:00             ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Uladzislau Rezki @ 2021-05-19 21:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mel Gorman, Christoph Hellwig, Andrew Morton, linux-mm, LKML,
	Matthew Wilcox, Nicholas Piggin, Hillf Danton, Michal Hocko,
	Oleksiy Avramchenko, Steven Rostedt

On Wed, May 19, 2021 at 09:52:14PM +0200, Uladzislau Rezki wrote:
> > On Wed, May 19, 2021 at 04:39:00PM +0200, Uladzislau Rezki wrote:
> > > > > +	/*
> > > > > +	 * If not enough pages were obtained to accomplish an
> > > > > +	 * allocation request, free them via __vfree() if any.
> > > > > +	 */
> > > > > +	if (area->nr_pages != nr_small_pages) {
> > > > > +		warn_alloc(gfp_mask, NULL,
> > > > > +			"vmalloc size %lu allocation failure: "
> > > > > +			"page order %u allocation failed",
> > > > > +			area->nr_pages * PAGE_SIZE, page_order);
> > > > > +		goto fail;
> > > > > +	}
> > > > 
> > > > From reading __alloc_pages_bulk not allocating all pages is something
> > > > that cn happen fairly easily.  Shouldn't we try to allocate the missing
> > > > pages manually and/ore retry here?
> > > > 
> > >
> > > It is a good point. The bulk-allocator, as i see, only tries to access
> > > to pcp-list and falls-back to a single allocator once it fails, so the
> > > array may not be fully populated.
> > > 
> > 
> > Partially correct. It does allocate via the pcp-list but the pcp-list will
> > be refilled if it's empty so if the bulk allocator returns fewer pages
> > than requested, it may be due to hitting watermarks or the local zone is
> > depleted. It does not take any special action to correct the situation
> > or stall e.g.  wake kswapd, enter direct reclaim, allocate from a remote
> > node etc.
> > 
> > If no pages were allocated, it'll try allocate at least one page via a
> > single allocation request in case the bulk failure would push the zone
> > over the watermark but 1 page does not. That path as a side-effect would
> > also wake kswapd.
> > 
> OK. A single page allocator can enter a slow path i mean direct reclaim,
> etc to adjust watermarks.
> 
> > > In that case probably it makes sense to manually populate it using
> > > single page allocator.
> > > 
> > > Mel, could you please also comment on it?
> > > 
> > 
> > It is by design because it's unknown if callers can recover or if so,
> > how they want to recover and the primary intent behind the bulk allocator
> > was speed. In the case of network, it only wants some pages quickly so as
> > long as it gets 1, it makes progress. For the sunrpc user, it's willing
> > to wait and retry. For vmalloc, I'm unsure what a suitable recovery path
> > should be as I do not have a good handle on workloads that are sensitive
> > to vmalloc performance. The obvious option would be to loop and allocate
> > single pages with alloc_pages_node understanding that the additional
> > pages may take longer to allocate.
> > 
> I got it. At least we should fall-back for a single allocator, that is how
> we used to allocate before(now it is for high-order pages). If it also fails
> to obtain a page we are done.
> 
> Basically a single-page allocator is more permissive so it is a higher
> chance to success. Therefore a fallback to it makes sense.
> 
Hello, Christoph.

See the patch below. Does it look good to you? It moves the
page allocation part into a separate function:

<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b2a0cbfa37c1..18773a4ad5fa 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2756,6 +2756,53 @@ void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot)
 EXPORT_SYMBOL_GPL(vmap_pfn);
 #endif /* CONFIG_VMAP_PFN */
 
+static inline unsigned int
+__vmalloc_area_node_get_pages(gfp_t gfp, int nid, unsigned int page_order,
+	unsigned long nr_small_pages, struct page **pages)
+{
+	unsigned int nr_allocated = 0;
+
+	/*
+	 * For order-0 pages we make use of bulk allocator, if
+	 * the page array is partly or not at all populated due
+	 * to fails, fallback to a single page allocator that is
+	 * more permissive.
+	 */
+	if (!page_order)
+		nr_allocated = alloc_pages_bulk_array_node(
+			gfp, nid, nr_small_pages, pages);
+
+	/* High-order pages or fallback path if "bulk" fails. */
+	while (nr_allocated < nr_small_pages) {
+		struct page *page;
+		int i;
+
+		/*
+		 * Compound pages required for remap_vmalloc_page if
+		 * high-order pages. For the order-0 the __GFP_COMP
+		 * is ignored.
+		 */
+		page = alloc_pages_node(nid, gfp | __GFP_COMP, page_order);
+		if (unlikely(!page))
+			break;
+
+		/*
+		 * Careful, we allocate and map page_order pages, but
+		 * tracking is done per PAGE_SIZE page so as to keep the
+		 * vm_struct APIs independent of the physical/mapped size.
+		 */
+		for (i = 0; i < (1U << page_order); i++)
+			pages[nr_allocated + i] = page + i;
+
+		if (gfpflags_allow_blocking(gfp))
+			cond_resched();
+
+		nr_allocated += 1U << page_order;
+	}
+
+	return nr_allocated;
+}
+
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, unsigned int page_shift,
 				 int node)
@@ -2789,37 +2836,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		return NULL;
 	}
 
-	area->nr_pages = 0;
 	set_vm_area_page_order(area, page_shift - PAGE_SHIFT);
 	page_order = vm_area_page_order(area);
 
-	if (!page_order) {
-		area->nr_pages = alloc_pages_bulk_array_node(
-			gfp_mask, node, nr_small_pages, area->pages);
-	} else {
-		/*
-		 * Careful, we allocate and map page_order pages, but tracking is done
-		 * per PAGE_SIZE page so as to keep the vm_struct APIs independent of
-		 * the physical/mapped size.
-		 */
-		while (area->nr_pages < nr_small_pages) {
-			struct page *page;
-			int i;
-
-			/* Compound pages required for remap_vmalloc_page */
-			page = alloc_pages_node(node, gfp_mask | __GFP_COMP, page_order);
-			if (unlikely(!page))
-				break;
-
-			for (i = 0; i < (1U << page_order); i++)
-				area->pages[area->nr_pages + i] = page + i;
-
-			if (gfpflags_allow_blocking(gfp_mask))
-				cond_resched();
-
-			area->nr_pages += 1U << page_order;
-		}
-	}
+	area->nr_pages = __vmalloc_area_node_get_pages(gfp_mask,
+		node, page_order, nr_small_pages, area->pages);
 
 	atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
<snip>

--
Vlad Rezki


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node()
  2021-05-19 21:07           ` Uladzislau Rezki
@ 2021-06-28 23:00             ` Andrew Morton
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2021-06-28 23:00 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Christoph Hellwig, Mel Gorman, linux-mm, LKML, Matthew Wilcox,
	Nicholas Piggin, Hillf Danton, Michal Hocko, Oleksiy Avramchenko,
	Steven Rostedt

On Wed, 19 May 2021 23:07:50 +0200 Uladzislau Rezki <urezki@gmail.com> wrote:

> > Basically a single-page allocator is more permissive so it is a higher
> > chance to success. Therefore a fallback to it makes sense.
> > 
> Hello, Christoph.
> 
> See below the patch. Does it sound good for you? It is about moving
> page allocation part into separate function:

Please just send over a patch which addresses this and Christoph's
other review comments.

I don't think the discussion with Mel against this patch identified any
needed changes, so I'll send this series to Linus.



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-06-28 23:00 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-16 20:20 [PATCH 0/3] vmalloc() vs bulk allocator v2 Uladzislau Rezki (Sony)
2021-05-16 20:20 ` [PATCH 1/3] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper Uladzislau Rezki (Sony)
2021-05-16 20:20 ` [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node() Uladzislau Rezki (Sony)
2021-05-17  8:24   ` Mel Gorman
2021-05-17 11:51     ` Uladzislau Rezki
2021-05-19 13:44   ` Christoph Hellwig
2021-05-19 14:39     ` Uladzislau Rezki
2021-05-19 15:56       ` Mel Gorman
2021-05-19 19:52         ` Uladzislau Rezki
2021-05-19 21:07           ` Uladzislau Rezki
2021-06-28 23:00             ` Andrew Morton
2021-05-16 20:20 ` [PATCH 3/3] mm/vmalloc: Print a warning message first on failure Uladzislau Rezki (Sony)
