From: Andrew Morton <akpm@linux-foundation.org>
To: aarcange@redhat.com, akpm@linux-foundation.org,
	arbn@yandex-team.com, linux-mm@kvack.org,
	mgorman@techsingularity.net, mhocko@suse.com,
	mm-commits@vger.kernel.org, rientjes@google.com,
	stable@vger.kernel.org, torvalds@linux-foundation.org
Subject: [patch 2/9] mm: mempolicy: fix THP allocations escaping mempolicy restrictions
Date: Fri, 24 Dec 2021 21:12:35 -0800
Message-ID: <20211225051235.JoA_I5IqL%akpm@linux-foundation.org>
In-Reply-To: <20211224211127.30b60764d059ff3b0afea38a@linux-foundation.org>

From: Andrey Ryabinin <arbn@yandex-team.com>
Subject: mm: mempolicy: fix THP allocations escaping mempolicy restrictions

alloc_pages_vma() may first try to allocate a THP on the local NUMA
node:

	page = __alloc_pages_node(hpage_node,
		gfp | __GFP_THISNODE | __GFP_NORETRY, order);

If that allocation fails, it retries, this time allowing remote memory:

	if (!page && (gfp & __GFP_DIRECT_RECLAIM))
		page = __alloc_pages_node(hpage_node,
					gfp, order);

However, the retry ignores the memory policy nodemask entirely, because
__alloc_pages_node() passes a NULL nodemask to __alloc_pages(), so the
allocation can escape the policy's restrictions.

The bug seems to have first appeared in commit ac5b2c18911f
 ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings").
It disappeared in commit 89c83fb539f9
 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
and reappeared in a slightly different form in commit 76e654cc91bb
 ("mm, page_alloc: allow hugepage fallback to remote nodes when madvised").

Fix this by passing the correct nodemask to the __alloc_pages() call.
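
The effect of the missing nodemask can be illustrated with a small
userspace model. This is only a sketch, not kernel code: struct
toy_nodes and toy_alloc() are hypothetical names, and the fallback order
is a simple round-robin rather than the kernel's zonelist walk. It
assumes only what is described above: a NULL nodemask means "any node",
while a non-NULL nodemask limits which nodes the fallback may use.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 2

/* Toy model (hypothetical): free huge pages remaining on each node. */
struct toy_nodes {
	int free[MAX_NODES];
};

/*
 * Allocate one page, trying 'preferred' first and then the other nodes.
 * A NULL nmask allows any node; a non-NULL nmask restricts the search
 * to nodes whose bit is set. Returns the node used, or -1 on failure.
 */
static int toy_alloc(struct toy_nodes *n, int preferred,
		     const unsigned long *nmask)
{
	assert(preferred >= 0 && preferred < MAX_NODES);

	for (int i = 0; i < MAX_NODES; i++) {
		int nid = (preferred + i) % MAX_NODES;

		if (nmask && !(*nmask & (1UL << nid)))
			continue;	/* node not allowed by the policy */
		if (n->free[nid] > 0) {
			n->free[nid]--;
			return nid;
		}
	}
	return -1;
}
```

With node 0 exhausted and a policy that binds to node 0 only, calling
toy_alloc() with a NULL mask returns node 1 (the escape), while passing
the policy mask makes the allocation fail instead of leaving node 0.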

A reproducer demonstrating the problem:
 $ mount -oremount,size=4G,huge=always /dev/shm/
 $ echo always > /sys/kernel/mm/transparent_hugepage/defrag
 $ cat mbind_thp.c
 #include <unistd.h>
 #include <sys/mman.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <assert.h>
 #include <stdlib.h>
 #include <stdio.h>
 #include <numaif.h>

 #define SIZE (2ULL << 30)
 int main(int argc, char **argv)
 {
   int fd;
   unsigned long long i;
   char *addr;
   pid_t pid;
   char buf[100];
   unsigned long nodemask = 1;

   /* O_CREAT requires an explicit mode argument */
   fd = open("/dev/shm/test", O_RDWR|O_CREAT, 0600);
   assert(fd >= 0);
   assert(ftruncate(fd, SIZE) == 0);

   addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                        MAP_SHARED, fd, 0);
   assert(addr != MAP_FAILED);

   assert(mbind(addr, SIZE, MPOL_BIND, &nodemask, 2, MPOL_MF_STRICT|MPOL_MF_MOVE)==0);
   for (i = 0; i < SIZE; i+=4096) {
     addr[i] = 1;
   }
   pid = getpid();
   snprintf(buf, sizeof(buf), "grep shm /proc/%d/numa_maps", pid);
   system(buf);
   sleep(10000);

   return 0;
 }
 $ gcc mbind_thp.c -o mbind_thp -lnuma
 $ numactl -H
 available: 2 nodes (0-1)
 node 0 cpus: 0 2
 node 0 size: 1918 MB
 node 0 free: 1595 MB
 node 1 cpus: 1 3
 node 1 size: 2014 MB
 node 1 free: 1731 MB
 node distances:
 node   0   1
   0:  10  20
   1:  20  10
 $ rm -f /dev/shm/test; taskset -c 0 ./mbind_thp
 7fd970a00000 bind:0 file=/dev/shm/test dirty=524288 active=0 N0=396800 N1=127488 kernelpagesize_kB=4
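
The numa_maps line above can be checked mechanically. The following is a
minimal sketch, assuming the per-node counts always appear as
space-separated "N<nid>=<pages>" tokens as they do in the output above;
numa_maps_pages() is a hypothetical helper name, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 64

/*
 * Sum the per-node page counts ("N<nid>=<pages>") found in one line of
 * /proc/<pid>/numa_maps. pages[] must have MAX_NODES entries and is
 * filled with the count for each node. Returns the total page count.
 */
static unsigned long numa_maps_pages(const char *line, unsigned long *pages)
{
	unsigned long total = 0;
	const char *p = line;

	assert(line && pages);
	memset(pages, 0, MAX_NODES * sizeof(*pages));

	while ((p = strstr(p, " N")) != NULL) {
		unsigned int nid;
		unsigned long count;

		p++;	/* skip the leading space */
		if (sscanf(p, "N%u=%lu", &nid, &count) == 2 &&
		    nid < MAX_NODES) {
			pages[nid] += count;
			total += count;
		}
	}
	return total;
}
```

For the line above this gives 396800 + 127488 = 524288 pages, matching
dirty=524288 (2 GiB in 4 kB pages). The 127488 pages on node 1, roughly
24% of the mapping, were allocated outside the bind:0 policy.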

Link: https://lkml.kernel.org/r/20211208165343.22349-1-arbn@yandex-team.com
Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mempolicy.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/mm/mempolicy.c~mm-mempolicy-fix-thp-allocations-escaping-mempolicy-restrictions
+++ a/mm/mempolicy.c
@@ -2140,8 +2140,7 @@ struct page *alloc_pages_vma(gfp_t gfp,
 			 * memory with both reclaim and compact as well.
 			 */
 			if (!page && (gfp & __GFP_DIRECT_RECLAIM))
-				page = __alloc_pages_node(hpage_node,
-								gfp, order);
+				page = __alloc_pages(gfp, order, hpage_node, nmask);
 
 			goto out;
 		}
_