From: Eric Dumazet <eric.dumazet@gmail.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Linux 4.9-rc6
Date: Sun, 20 Nov 2016 20:59:50 -0800
Message-ID: <1479704390.8455.398.camel@edumazet-glaptop3.roam.corp.google.com>
In-Reply-To: <20161121013558.GG1555@ZenIV.linux.org.uk>

On Mon, 2016-11-21 at 01:35 +0000, Al Viro wrote:

> 
> Umm...  One possibility would be something like fs/namespace.c:m_start() -
> if nothing has changed since the last time, just use a cached pointer.
> That has sped the damn thing (/proc/mounts et al.) big way, but it's
> dependent upon having an event count updated whenever we change the
> mount tree - doing the same for vma_area list might or might not be
> a good idea.  /proc/mounts and friends get ->poll() on that as well;
> that probably would _not_ be a good idea in this case.

Yes, a generation number could help in some cases.
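Below is a minimal userspace sketch of the generation-number idea. It uses a plain singly linked list rather than the real mount tree or vma list. Every modification bumps an event counter. When the counter has not changed, the reader reuses its cached (event, index, node) triple instead of walking from the head. All names here (struct gen_list, struct cursor, gen_list_push(), gen_list_seek()) are hypothetical. This is not the fs/namespace.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; stands in for a mount or a vma. */
struct node {
	struct node *next;
	int value;
};

/* List plus an event counter bumped on every modification. */
struct gen_list {
	struct node *head;
	unsigned long event;
};

/* Per-reader cache, in the spirit of the cached cursor described above. */
struct cursor {
	unsigned long event;	/* list->event when the cache was filled */
	size_t pos;		/* index of the cached node */
	struct node *cached;	/* node at index pos, or NULL if none */
	size_t steps;		/* total nodes walked, to show the saving */
};

static void gen_list_push(struct gen_list *l, struct node *n)
{
	n->next = l->head;
	l->head = n;
	l->event++;		/* invalidates every reader's cache */
}

/* Return the node at index pos, resuming from the cache when nothing changed. */
static struct node *gen_list_seek(struct gen_list *l, struct cursor *c, size_t pos)
{
	struct node *n = l->head;
	size_t i = 0;

	assert(l && c);
	if (c->cached && c->event == l->event && c->pos <= pos) {
		/* Same generation and seeking forward: skip the prefix walk. */
		n = c->cached;
		i = c->pos;
	}
	while (n && i < pos) {
		n = n->next;
		i++;
		c->steps++;
	}
	c->event = l->event;
	c->pos = i;
	c->cached = n;
	return n;
}
```

Seeking to index 3 and then to index 4 walks 3 + 1 nodes instead of 3 + 4. After an insertion the cache is stale, so the next seek starts again from the head.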

Another potential issue with CONFIG_VMAP_STACK is that we make no
attempt to allocate the 4 pages of a stack as physically consecutive pages.

Even when we have plenty of memory, 4 calls to alloc_page() are likely to
give us 4 pages in completely different physical locations.

Here I printed the huge page number of each of the 4 pages, for some stacks:

0xffffc9001a07c000-0xffffc9001a081000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfcac Hfeba Hfec0 Hfc9d N0=4
0xffffc9001a084000-0xffffc9001a089000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc79 Hfc79 Hfc79 Hfc83 N0=4
0xffffc9001a08c000-0xffffc9001a091000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc9b Hfe91 Hfebe Hfca2 N0=4
0xffffc9001a094000-0xffffc9001a099000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfcaa Hfcaa Hfca6 Hfebc N0=4
0xffffc9001a09c000-0xffffc9001a0a1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe9b Hfe90 Hff09 Hfefb N0=4
0xffffc9001a0a4000-0xffffc9001a0a9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe94 Hfe62 Hfea0 Hfe7b N0=4
0xffffc9001a0ac000-0xffffc9001a0b1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe78 Hff05 Hff05 Hfc74 N0=4
0xffffc9001a0b4000-0xffffc9001a0b9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc9b Hfc9b Hfe83 Hf782 N0=4
0xffffc9001a0bc000-0xffffc9001a0c1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfe78 Hfe78 Hfc7f Hfc7f N0=4
0xffffc9001a0c4000-0xffffc9001a0c9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfebe Hfebe Hfe82 Hfe85 N0=4
0xffffc9001a0cc000-0xffffc9001a0d1000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfc6b Hfe62 Hfe62 Hfcaa N0=4
0xffffc9001a0d4000-0xffffc9001a0d9000   20480 _do_fork+0xe1/0x360 pages=4 vmalloc Hfebd Hfebd Hfc92 Hfc92 N0=4
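One way to summarise the table is to count how many distinct huge-page frames back each stack. The sketch below assumes that each H value is the 2 MB frame number of a page, meaning its 4 KB page frame number shifted right by 9, as on x86-64. The helper names are hypothetical.

The sketch also checks the property that makes a single order-2 allocation attractive. A naturally aligned block of 4 pages can never straddle a 2 MB boundary, because 512 is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>

#define HUGE_FRAME_SHIFT 9	/* 2 MB / 4 KB = 512 = 1 << 9 (x86-64 assumption) */

/* Huge-page frame containing a 4 KB page frame number. */
static unsigned long pfn_to_huge_frame(unsigned long pfn)
{
	return pfn >> HUGE_FRAME_SHIFT;
}

/* Number of distinct huge-page frames among the n frames backing one stack. */
static size_t distinct_frames(const unsigned long *frame, size_t n)
{
	size_t distinct = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < i; j++)
			if (frame[j] == frame[i])
				break;
		if (j == i)	/* first occurrence of this frame */
			distinct++;
	}
	return distinct;
}

/* Does a naturally aligned block of 1 << order pages fit in one huge page? */
static int block_fits_one_frame(unsigned long first_pfn, unsigned int order)
{
	unsigned long last_pfn = first_pfn + (1UL << order) - 1;

	assert(first_pfn % (1UL << order) == 0);	/* buddy blocks are aligned */
	return pfn_to_huge_frame(first_pfn) == pfn_to_huge_frame(last_pfn);
}
```

distinct_frames() returns 4 for the first stack listed above and 2 for the second. Counting this way, none of the 12 stacks shown has all four pages in the same huge page, and four of them use four different ones.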

This is a generic vmalloc() issue; is it worth fixing now?

Note that this RFC might conflict with the NUMA interleave policy.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f2481cb4e6b2..0123e97debb9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1602,9 +1602,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, int node)
 {
 	struct page **pages;
-	unsigned int nr_pages, array_size, i;
+	unsigned int nr_pages, array_size, i, j;
 	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
 	const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN;
+	const gfp_t multi_alloc_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
 
 	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
@@ -1624,20 +1625,34 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		return NULL;
 	}
 
-	for (i = 0; i < area->nr_pages; i++) {
-		struct page *page;
-
-		if (node == NUMA_NO_NODE)
-			page = alloc_page(alloc_mask);
-		else
-			page = alloc_pages_node(node, alloc_mask, 0);
+	for (i = 0; i < area->nr_pages;) {
+		struct page *page = NULL;
+		unsigned int chunk_order = min(ilog2(area->nr_pages - i), MAX_ORDER - 1);
+
+		while (chunk_order && !page) {
+			if (node == NUMA_NO_NODE)
+				page = alloc_pages(multi_alloc_mask, chunk_order);
+			else
+				page = alloc_pages_node(node, multi_alloc_mask, chunk_order);
+			if (page)
+				split_page(page, chunk_order);
+			else
+				chunk_order--;
+		}
+		if (!page) {
+			if (node == NUMA_NO_NODE)
+				page = alloc_pages(alloc_mask, 0);
+			else
+				page = alloc_pages_node(node, alloc_mask, 0);
+		}
 
 		if (unlikely(!page)) {
 			/* Successfully allocated i pages, free them in __vunmap() */
 			area->nr_pages = i;
 			goto fail;
 		}
-		area->pages[i] = page;
+		for (j = 0; j < (1 << chunk_order); j++)
+			area->pages[i++] = page++;
 		if (gfpflags_allow_blocking(gfp_mask))
 			cond_resched();
 	}
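
The patch's order-selection logic can be exercised in userspace. This sketch is a simulation under stated assumptions, not kernel code:

- mock_alloc_pages() is a hypothetical bump allocator. It only succeeds up to a configurable order and hands out naturally aligned PFN ranges.
- MAX_ORDER is assumed to be 11.
- GFP flags and split_page() are not modelled.

As in the patch, each iteration asks for the largest power-of-two chunk that fits in the remaining pages, capped at MAX_ORDER - 1. It lowers the order on each failure and finally falls back to a single page.

```c
#include <assert.h>

#define MAX_ORDER 11U	/* assumed value; not taken from the patch */

/* Hypothetical allocator: succeeds only for order <= max_ok and hands out
 * naturally aligned, physically contiguous page frame ranges. */
struct mock_alloc {
	unsigned long next_pfn;
	unsigned int max_ok;
};

/* Returns the first PFN of the block, or -1 on failure. */
static long mock_alloc_pages(struct mock_alloc *a, unsigned int order)
{
	unsigned long size = 1UL << order;
	unsigned long pfn;

	if (order > a->max_ok)
		return -1;
	pfn = (a->next_pfn + size - 1) & ~(size - 1);	/* align like a buddy block */
	a->next_pfn = pfn + size;
	return (long)pfn;
}

static unsigned int ilog2_u(unsigned int x)
{
	unsigned int r = 0;

	assert(x > 0);
	while (x >>= 1)
		r++;
	return r;
}

/* Mirrors the loop in the patch; returns the number of allocator calls. */
static unsigned int fill_pages(struct mock_alloc *a, unsigned long *pages,
			       unsigned int nr_pages)
{
	unsigned int i = 0, calls = 0;

	while (i < nr_pages) {
		unsigned int order = ilog2_u(nr_pages - i);
		long pfn = -1;

		if (order > MAX_ORDER - 1)
			order = MAX_ORDER - 1;
		/* Largest chunk that fits first, lowering the order on failure. */
		while (order && pfn < 0) {
			pfn = mock_alloc_pages(a, order);
			calls++;
			if (pfn < 0)
				order--;
		}
		/* Same fallback as the patch: a single order-0 page. */
		if (pfn < 0) {
			pfn = mock_alloc_pages(a, 0);
			calls++;
		}
		assert(pfn >= 0);	/* the patch does "goto fail" here */
		for (unsigned long j = 0; j < (1UL << order); j++)
			pages[i++] = (unsigned long)pfn + j;
	}
	return calls;
}
```

For a 4-page stack the first request is order 2, so a single successful call yields the whole stack. If only order-0 allocations succeed, the loop degrades to the old behaviour plus the failed higher-order attempts. In the mock that means 8 calls instead of 4.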
