stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Andrey Ryabinin <arbn@yandex-team.com>,
	Michal Hocko <mhocko@suse.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	David Rientjes <rientjes@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 5.4 39/47] mm: mempolicy: fix THP allocations escaping mempolicy restrictions
Date: Mon, 27 Dec 2021 16:31:15 +0100	[thread overview]
Message-ID: <20211227151322.146610498@linuxfoundation.org> (raw)
In-Reply-To: <20211227151320.801714429@linuxfoundation.org>

From: Andrey Ryabinin <arbn@yandex-team.com>

commit 338635340669d5b317c7e8dcf4fff4a0f3651d87 upstream.

alloc_pages_vma() may try to allocate THP page on the local NUMA node
first:

	page = __alloc_pages_node(hpage_node,
		gfp | __GFP_THISNODE | __GFP_NORETRY, order);

And if the allocation fails it retries allowing remote memory:

	if (!page && (gfp & __GFP_DIRECT_RECLAIM))
    		page = __alloc_pages_node(hpage_node,
					gfp, order);

However, this retry allocation completely ignores memory policy nodemask
allowing allocation to escape restrictions.

The first appearance of this bug seems to be the commit ac5b2c18911f
("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings").

The bug disappeared later in the commit 89c83fb539f9 ("mm, thp:
consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") and
reappeared again in slightly different form in the commit 76e654cc91bb
("mm, page_alloc: allow hugepage fallback to remote nodes when
madvised")

Fix this by passing correct nodemask to the __alloc_pages() call.

The demonstration/reproducer of the problem:

    $ mount -oremount,size=4G,huge=always /dev/shm/
    $ echo always > /sys/kernel/mm/transparent_hugepage/defrag
    $ cat mbind_thp.c
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <assert.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <numaif.h>

    #define SIZE 2ULL << 30
    int main(int argc, char **argv)
    {
        int fd;
        unsigned long long i;
        char *addr;
        pid_t pid;
        char buf[100];
        unsigned long nodemask = 1;

        fd = open("/dev/shm/test", O_RDWR|O_CREAT);
        assert(fd > 0);
        assert(ftruncate(fd, SIZE) == 0);

        addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                           MAP_SHARED, fd, 0);

        assert(mbind(addr, SIZE, MPOL_BIND, &nodemask, 2, MPOL_MF_STRICT|MPOL_MF_MOVE)==0);
        for (i = 0; i < SIZE; i+=4096) {
          addr[i] = 1;
        }
        pid = getpid();
        snprintf(buf, sizeof(buf), "grep shm /proc/%d/numa_maps", pid);
        system(buf);
        sleep(10000);

        return 0;
    }
    $ gcc mbind_thp.c -o mbind_thp -lnuma
    $ numactl -H
    available: 2 nodes (0-1)
    node 0 cpus: 0 2
    node 0 size: 1918 MB
    node 0 free: 1595 MB
    node 1 cpus: 1 3
    node 1 size: 2014 MB
    node 1 free: 1731 MB
    node distances:
    node   0   1
      0:  10  20
      1:  20  10
    $ rm -f /dev/shm/test; taskset -c 0 ./mbind_thp
    7fd970a00000 bind:0 file=/dev/shm/test dirty=524288 active=0 N0=396800 N1=127488 kernelpagesize_kB=4

Link: https://lkml.kernel.org/r/20211208165343.22349-1-arbn@yandex-team.com
Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/mempolicy.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2143,8 +2143,9 @@ alloc_pages_vma(gfp_t gfp, int order, st
 			 * memory as well.
 			 */
 			if (!page && (gfp & __GFP_DIRECT_RECLAIM))
-				page = __alloc_pages_node(hpage_node,
-						gfp | __GFP_NORETRY, order);
+				page = __alloc_pages_nodemask(gfp | __GFP_NORETRY,
+							order, hpage_node,
+							nmask);
 
 			goto out;
 		}



  parent reply	other threads:[~2021-12-27 15:37 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-27 15:30 [PATCH 5.4 00/47] 5.4.169-rc1 review Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 01/47] net: usb: lan78xx: add Allied Telesis AT29M2-AF Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 02/47] serial: 8250_fintek: Fix garbled text for console Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 03/47] HID: holtek: fix mouse probing Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 04/47] arm64: dts: allwinner: orangepi-zero-plus: fix PHY mode Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 05/47] spi: change clk_disable_unprepare to clk_unprepare Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 06/47] IB/qib: Fix memory leak in qib_user_sdma_queue_pkts() Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 07/47] netfilter: fix regression in looped (broad|multi)casts MAC handling Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 08/47] qlcnic: potential dereference null pointer of rx_queue->page_ring Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 09/47] net: accept UFOv6 packages in virtio_net_hdr_to_skb Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 10/47] net: skip virtio_net_hdr_set_proto if protocol already set Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 11/47] ipmi: Fix UAF when uninstall ipmi_si and ipmi_msghandler module Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 12/47] bonding: fix ad_actor_system option setting to default Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 13/47] fjes: Check for error irq Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 14/47] drivers: net: smc911x: " Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 15/47] sfc: falcon: Check null pointer of rx_queue->page_ring Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 16/47] Input: elantech - fix stack out of bound access in elantech_change_report_id() Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 17/47] hwmon: (lm90) Fix usage of CONFIG2 register in detect function Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 18/47] hwmon: (lm90) Add max6654 support to lm90 driver Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 19/47] hwmon: (lm90) Add basic support for TI TMP461 Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 20/47] hwmon: (lm90) Introduce flag indicating extended temperature support Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 21/47] hwmon: (lm90) Drop critical attribute support for MAX6654 Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 22/47] ALSA: jack: Check the return value of kstrdup() Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 23/47] ALSA: drivers: opl3: Fix incorrect use of vp->state Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 24/47] ALSA: hda/realtek: Amp init fixup for HP ZBook 15 G6 Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 25/47] Input: atmel_mxt_ts - fix double free in mxt_read_info_block Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 26/47] ipmi: bail out if init_srcu_struct fails Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 27/47] ipmi: ssif: initialize ssif_info->client early Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 28/47] ipmi: fix initialization when workqueue allocation fails Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 29/47] parisc: Correct completer in lws start Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 30/47] x86/pkey: Fix undefined behaviour with PKRU_WD_BIT Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 31/47] pinctrl: stm32: consider the GPIO offset to expose all the GPIO lines Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 32/47] mmc: sdhci-tegra: Fix switch to HS400ES mode Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 33/47] mmc: core: Disable card detect during shutdown Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 34/47] ARM: 9169/1: entry: fix Thumb2 bug in iWMMXt exception handling Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 35/47] tee: optee: Fix incorrect page free bug Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 36/47] f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr() Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 37/47] usb: gadget: u_ether: fix race in setting MAC address in setup phase Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 38/47] KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_state Greg Kroah-Hartman
2021-12-27 15:31 ` Greg Kroah-Hartman [this message]
2021-12-27 15:31 ` [PATCH 5.4 40/47] Input: i8042 - enable deferred probe quirk for ASUS UM325UA Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 41/47] pinctrl: mediatek: fix global-out-of-bounds issue Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 42/47] hwmom: (lm90) Fix citical alarm status for MAX6680/MAX6681 Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 43/47] hwmon: (lm90) Do not report busy status bit as alarm Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 44/47] ax25: NPD bug when detaching AX25 device Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 45/47] hamradio: defer ax25 kfree after unregister_netdev Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 46/47] hamradio: improve the incomplete fix to avoid NPD Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 47/47] phonet/pep: refuse to enable an unbound pipe Greg Kroah-Hartman
2021-12-27 17:43 ` [PATCH 5.4 00/47] 5.4.169-rc1 review Florian Fainelli
2021-12-28  0:56 ` Samuel Zou
2021-12-28 10:21 ` Naresh Kamboju
2021-12-28 13:20 ` Sudip Mukherjee
2021-12-28 17:07 ` Guenter Roeck
2021-12-28 21:28 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211227151322.146610498@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=arbn@yandex-team.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=rientjes@google.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).