From: Uladzislau Rezki <urezki@gmail.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>,
	Christoph Hellwig <hch@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	Hillf Danton <hdanton@sina.com>, Michal Hocko <mhocko@suse.com>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node()
Date: Wed, 19 May 2021 23:07:50 +0200
Message-ID: <20210519210750.GA5615@pc638.lan>
In-Reply-To: <20210519195214.GA2343@pc638.lan>

On Wed, May 19, 2021 at 09:52:14PM +0200, Uladzislau Rezki wrote:
> > On Wed, May 19, 2021 at 04:39:00PM +0200, Uladzislau Rezki wrote:
> > > > > +	/*
> > > > > +	 * If not enough pages were obtained to accomplish an
> > > > > +	 * allocation request, free them via __vfree() if any.
> > > > > +	 */
> > > > > +	if (area->nr_pages != nr_small_pages) {
> > > > > +		warn_alloc(gfp_mask, NULL,
> > > > > +			"vmalloc size %lu allocation failure: "
> > > > > +			"page order %u allocation failed",
> > > > > +			area->nr_pages * PAGE_SIZE, page_order);
> > > > > +		goto fail;
> > > > > +	}
> > > > 
> > > > From reading __alloc_pages_bulk, not allocating all pages is something
> > > > that can happen fairly easily.  Shouldn't we try to allocate the missing
> > > > pages manually and/or retry here?
> > > > 
> > >
> > > It is a good point. The bulk allocator, as I see it, only tries the
> > > pcp-list and falls back to the single-page allocator once that fails,
> > > so the array may not be fully populated.
> > > 
> > 
> > Partially correct. It does allocate via the pcp-list, but the pcp-list
> > will be refilled if it's empty, so if the bulk allocator returns fewer
> > pages than requested, it may be due to hitting watermarks or the local
> > zone being depleted. It does not take any special action to correct the
> > situation or stall, e.g. wake kswapd, enter direct reclaim, allocate
> > from a remote node, etc.
> > 
> > If no pages were allocated, it'll try to allocate at least one page via
> > a single allocation request, in case the bulk failure would push the
> > zone over the watermark but a single page does not. That path, as a
> > side-effect, would also wake kswapd.
> > 
> OK. A single-page allocator can enter a slow path, I mean direct reclaim,
> etc., to adjust the watermarks.
> 
> > > In that case it probably makes sense to manually populate it using
> > > the single-page allocator.
> > > 
> > > Mel, could you please also comment on it?
> > > 
> > 
> > It is by design because it's unknown if callers can recover or if so,
> > how they want to recover and the primary intent behind the bulk allocator
> > was speed. In the case of network, it only wants some pages quickly so as
> > long as it gets 1, it makes progress. For the sunrpc user, it's willing
> > to wait and retry. For vmalloc, I'm unsure what a suitable recovery path
> > should be as I do not have a good handle on workloads that are sensitive
> > to vmalloc performance. The obvious option would be to loop and allocate
> > single pages with alloc_pages_node understanding that the additional
> > pages may take longer to allocate.
> > 
> I got it. At the very least we should fall back to the single-page
> allocator; that is how we used to allocate before (and still do for
> high-order pages). If that also fails to obtain a page, we are done.
> 
> Basically, the single-page allocator is more permissive, so it has a
> higher chance of success. Therefore a fallback to it makes sense.
> 
Hello, Christoph.

See the patch below. Does it sound good to you? It moves the page
allocation part into a separate function:

<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b2a0cbfa37c1..18773a4ad5fa 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2756,6 +2756,53 @@ void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot)
 EXPORT_SYMBOL_GPL(vmap_pfn);
 #endif /* CONFIG_VMAP_PFN */
 
+static inline unsigned int
+__vmalloc_area_node_get_pages(gfp_t gfp, int nid, unsigned int page_order,
+	unsigned long nr_small_pages, struct page **pages)
+{
+	unsigned int nr_allocated = 0;
+
+	/*
+	 * For order-0 pages we make use of the bulk allocator. If
+	 * the page array ends up only partly populated, or not
+	 * populated at all, fall back to the single-page allocator,
+	 * which is more permissive.
+	 */
+	if (!page_order)
+		nr_allocated = alloc_pages_bulk_array_node(
+			gfp, nid, nr_small_pages, pages);
+
+	/* High-order pages or fallback path if "bulk" fails. */
+	while (nr_allocated < nr_small_pages) {
+		struct page *page;
+		int i;
+
+		/*
+		 * Compound pages are required for remap_vmalloc_page
+		 * when using high-order pages. For order-0 pages the
+		 * __GFP_COMP flag is ignored.
+		 */
+		page = alloc_pages_node(nid, gfp | __GFP_COMP, page_order);
+		if (unlikely(!page))
+			break;
+
+		/*
+		 * Careful, we allocate and map page_order pages, but
+		 * tracking is done per PAGE_SIZE page so as to keep the
+		 * vm_struct APIs independent of the physical/mapped size.
+		 */
+		for (i = 0; i < (1U << page_order); i++)
+			pages[nr_allocated + i] = page + i;
+
+		if (gfpflags_allow_blocking(gfp))
+			cond_resched();
+
+		nr_allocated += 1U << page_order;
+	}
+
+	return nr_allocated;
+}
+
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, unsigned int page_shift,
 				 int node)
@@ -2789,37 +2836,11 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		return NULL;
 	}
 
-	area->nr_pages = 0;
 	set_vm_area_page_order(area, page_shift - PAGE_SHIFT);
 	page_order = vm_area_page_order(area);
 
-	if (!page_order) {
-		area->nr_pages = alloc_pages_bulk_array_node(
-			gfp_mask, node, nr_small_pages, area->pages);
-	} else {
-		/*
-		 * Careful, we allocate and map page_order pages, but tracking is done
-		 * per PAGE_SIZE page so as to keep the vm_struct APIs independent of
-		 * the physical/mapped size.
-		 */
-		while (area->nr_pages < nr_small_pages) {
-			struct page *page;
-			int i;
-
-			/* Compound pages required for remap_vmalloc_page */
-			page = alloc_pages_node(node, gfp_mask | __GFP_COMP, page_order);
-			if (unlikely(!page))
-				break;
-
-			for (i = 0; i < (1U << page_order); i++)
-				area->pages[area->nr_pages + i] = page + i;
-
-			if (gfpflags_allow_blocking(gfp_mask))
-				cond_resched();
-
-			area->nr_pages += 1U << page_order;
-		}
-	}
+	area->nr_pages = __vmalloc_area_node_get_pages(gfp_mask,
+		node, page_order, nr_small_pages, area->pages);
 
 	atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
<snip>

--
Vlad Rezki


Thread overview: 13+ messages
2021-05-16 20:20 [PATCH 0/3] vmalloc() vs bulk allocator v2 Uladzislau Rezki (Sony)
2021-05-16 20:20 ` [PATCH 1/3] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper Uladzislau Rezki (Sony)
2021-05-16 20:20 ` [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node() Uladzislau Rezki (Sony)
2021-05-17  8:24   ` Mel Gorman
2021-05-17 11:51     ` Uladzislau Rezki
2021-05-17 11:51       ` Uladzislau Rezki
2021-05-19 13:44   ` Christoph Hellwig
2021-05-19 14:39     ` Uladzislau Rezki
2021-05-19 15:56       ` Mel Gorman
2021-05-19 19:52         ` Uladzislau Rezki
2021-05-19 21:07           ` Uladzislau Rezki [this message]
2021-06-28 23:00             ` Andrew Morton
2021-05-16 20:20 ` [PATCH 3/3] mm/vmalloc: Print a warning message first on failure Uladzislau Rezki (Sony)
