All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Andrey Ryabinin <arbn@yandex-team.com>,
	Michal Hocko <mhocko@suse.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	David Rientjes <rientjes@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 5.4 39/47] mm: mempolicy: fix THP allocations escaping mempolicy restrictions
Date: Mon, 27 Dec 2021 16:31:15 +0100	[thread overview]
Message-ID: <20211227151322.146610498@linuxfoundation.org> (raw)
In-Reply-To: <20211227151320.801714429@linuxfoundation.org>

From: Andrey Ryabinin <arbn@yandex-team.com>

commit 338635340669d5b317c7e8dcf4fff4a0f3651d87 upstream.

alloc_pages_vma() may try to allocate THP page on the local NUMA node
first:

	page = __alloc_pages_node(hpage_node,
		gfp | __GFP_THISNODE | __GFP_NORETRY, order);

And if the allocation fails it retries allowing remote memory:

	if (!page && (gfp & __GFP_DIRECT_RECLAIM))
    		page = __alloc_pages_node(hpage_node,
					gfp, order);

However, this retry allocation completely ignores memory policy nodemask
allowing allocation to escape restrictions.

The first appearance of this bug seems to be the commit ac5b2c18911f
("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings").

The bug disappeared later in the commit 89c83fb539f9 ("mm, thp:
consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") and
reappeared again in slightly different form in the commit 76e654cc91bb
("mm, page_alloc: allow hugepage fallback to remote nodes when
madvised")

Fix this by passing correct nodemask to the __alloc_pages() call.

The demonstration/reproducer of the problem:

    $ mount -oremount,size=4G,huge=always /dev/shm/
    $ echo always > /sys/kernel/mm/transparent_hugepage/defrag
    $ cat mbind_thp.c
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <assert.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <numaif.h>

    #define SIZE 2ULL << 30
    int main(int argc, char **argv)
    {
        int fd;
        unsigned long long i;
        char *addr;
        pid_t pid;
        char buf[100];
        unsigned long nodemask = 1;

        fd = open("/dev/shm/test", O_RDWR|O_CREAT);
        assert(fd > 0);
        assert(ftruncate(fd, SIZE) == 0);

        addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                           MAP_SHARED, fd, 0);

        assert(mbind(addr, SIZE, MPOL_BIND, &nodemask, 2, MPOL_MF_STRICT|MPOL_MF_MOVE)==0);
        for (i = 0; i < SIZE; i+=4096) {
          addr[i] = 1;
        }
        pid = getpid();
        snprintf(buf, sizeof(buf), "grep shm /proc/%d/numa_maps", pid);
        system(buf);
        sleep(10000);

        return 0;
    }
    $ gcc mbind_thp.c -o mbind_thp -lnuma
    $ numactl -H
    available: 2 nodes (0-1)
    node 0 cpus: 0 2
    node 0 size: 1918 MB
    node 0 free: 1595 MB
    node 1 cpus: 1 3
    node 1 size: 2014 MB
    node 1 free: 1731 MB
    node distances:
    node   0   1
      0:  10  20
      1:  20  10
    $ rm -f /dev/shm/test; taskset -c 0 ./mbind_thp
    7fd970a00000 bind:0 file=/dev/shm/test dirty=524288 active=0 N0=396800 N1=127488 kernelpagesize_kB=4

Link: https://lkml.kernel.org/r/20211208165343.22349-1-arbn@yandex-team.com
Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/mempolicy.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2143,8 +2143,9 @@ alloc_pages_vma(gfp_t gfp, int order, st
 			 * memory as well.
 			 */
 			if (!page && (gfp & __GFP_DIRECT_RECLAIM))
-				page = __alloc_pages_node(hpage_node,
-						gfp | __GFP_NORETRY, order);
+				page = __alloc_pages_nodemask(gfp | __GFP_NORETRY,
+							order, hpage_node,
+							nmask);
 
 			goto out;
 		}



  parent reply	other threads:[~2021-12-27 15:37 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-27 15:30 [PATCH 5.4 00/47] 5.4.169-rc1 review Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 01/47] net: usb: lan78xx: add Allied Telesis AT29M2-AF Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 02/47] serial: 8250_fintek: Fix garbled text for console Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 03/47] HID: holtek: fix mouse probing Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 04/47] arm64: dts: allwinner: orangepi-zero-plus: fix PHY mode Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 05/47] spi: change clk_disable_unprepare to clk_unprepare Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 06/47] IB/qib: Fix memory leak in qib_user_sdma_queue_pkts() Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 07/47] netfilter: fix regression in looped (broad|multi)casts MAC handling Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 08/47] qlcnic: potential dereference null pointer of rx_queue->page_ring Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 09/47] net: accept UFOv6 packages in virtio_net_hdr_to_skb Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 10/47] net: skip virtio_net_hdr_set_proto if protocol already set Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 11/47] ipmi: Fix UAF when uninstall ipmi_si and ipmi_msghandler module Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 12/47] bonding: fix ad_actor_system option setting to default Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 13/47] fjes: Check for error irq Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 14/47] drivers: net: smc911x: " Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 15/47] sfc: falcon: Check null pointer of rx_queue->page_ring Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 16/47] Input: elantech - fix stack out of bound access in elantech_change_report_id() Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 17/47] hwmon: (lm90) Fix usage of CONFIG2 register in detect function Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 18/47] hwmon: (lm90) Add max6654 support to lm90 driver Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 19/47] hwmon: (lm90) Add basic support for TI TMP461 Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 20/47] hwmon: (lm90) Introduce flag indicating extended temperature support Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 21/47] hwmon: (lm90) Drop critical attribute support for MAX6654 Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 22/47] ALSA: jack: Check the return value of kstrdup() Greg Kroah-Hartman
2021-12-27 15:30 ` [PATCH 5.4 23/47] ALSA: drivers: opl3: Fix incorrect use of vp->state Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 24/47] ALSA: hda/realtek: Amp init fixup for HP ZBook 15 G6 Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 25/47] Input: atmel_mxt_ts - fix double free in mxt_read_info_block Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 26/47] ipmi: bail out if init_srcu_struct fails Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 27/47] ipmi: ssif: initialize ssif_info->client early Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 28/47] ipmi: fix initialization when workqueue allocation fails Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 29/47] parisc: Correct completer in lws start Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 30/47] x86/pkey: Fix undefined behaviour with PKRU_WD_BIT Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 31/47] pinctrl: stm32: consider the GPIO offset to expose all the GPIO lines Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 32/47] mmc: sdhci-tegra: Fix switch to HS400ES mode Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 33/47] mmc: core: Disable card detect during shutdown Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 34/47] ARM: 9169/1: entry: fix Thumb2 bug in iWMMXt exception handling Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 35/47] tee: optee: Fix incorrect page free bug Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 36/47] f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr() Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 37/47] usb: gadget: u_ether: fix race in setting MAC address in setup phase Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 38/47] KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_state Greg Kroah-Hartman
2021-12-27 15:31 ` Greg Kroah-Hartman [this message]
2021-12-27 15:31 ` [PATCH 5.4 40/47] Input: i8042 - enable deferred probe quirk for ASUS UM325UA Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 41/47] pinctrl: mediatek: fix global-out-of-bounds issue Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 42/47] hwmom: (lm90) Fix citical alarm status for MAX6680/MAX6681 Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 43/47] hwmon: (lm90) Do not report busy status bit as alarm Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 44/47] ax25: NPD bug when detaching AX25 device Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 45/47] hamradio: defer ax25 kfree after unregister_netdev Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 46/47] hamradio: improve the incomplete fix to avoid NPD Greg Kroah-Hartman
2021-12-27 15:31 ` [PATCH 5.4 47/47] phonet/pep: refuse to enable an unbound pipe Greg Kroah-Hartman
2021-12-27 17:43 ` [PATCH 5.4 00/47] 5.4.169-rc1 review Florian Fainelli
2021-12-28  0:56 ` Samuel Zou
2021-12-28 10:21 ` Naresh Kamboju
2021-12-28 13:20 ` Sudip Mukherjee
2021-12-28 17:07 ` Guenter Roeck
2021-12-28 21:28 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211227151322.146610498@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=arbn@yandex-team.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=rientjes@google.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.