stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 2/9] mm: mempolicy: fix THP allocations escaping mempolicy restrictions
       [not found] <20211224211127.30b60764d059ff3b0afea38a@linux-foundation.org>
@ 2021-12-25  5:12 ` Andrew Morton
  2021-12-25  5:12 ` [patch 5/9] mm, hwpoison: fix condition in free hugetlb page path Andrew Morton
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2021-12-25  5:12 UTC (permalink / raw)
  To: aarcange, akpm, arbn, linux-mm, mgorman, mhocko, mm-commits,
	rientjes, stable, torvalds

From: Andrey Ryabinin <arbn@yandex-team.com>
Subject: mm: mempolicy: fix THP allocations escaping mempolicy restrictions

alloc_pages_vma() may try to allocate THP page on the local NUMA node
first:

	page = __alloc_pages_node(hpage_node,
		gfp | __GFP_THISNODE | __GFP_NORETRY, order);

And if the allocation fails it retries allowing remote memory:

	if (!page && (gfp & __GFP_DIRECT_RECLAIM))
    		page = __alloc_pages_node(hpage_node,
					gfp, order);

However, this retry allocation completely ignores memory policy nodemask
allowing allocation to escape restrictions.

The first appearance of this bug seems to be the commit ac5b2c18911f
 ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
The bug disappeared later in the commit 89c83fb539f9
 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
and reappeared again in slightly different form in the commit 76e654cc91bb
 ("mm, page_alloc: allow hugepage fallback to remote nodes when madvised")

Fix this by passing correct nodemask to the __alloc_pages() call.

The demonstration/reproducer of the problem:
 $ mount -oremount,size=4G,huge=always /dev/shm/
 $ echo always > /sys/kernel/mm/transparent_hugepage/defrag
 $ cat mbind_thp.c
 #include <unistd.h>
 #include <sys/mman.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <assert.h>
 #include <stdlib.h>
 #include <stdio.h>
 #include <numaif.h>

 #define SIZE 2ULL << 30
 int main(int argc, char **argv)
 {
   int fd;
   unsigned long long i;
   char *addr;
   pid_t pid;
   char buf[100];
   unsigned long nodemask = 1;

   fd = open("/dev/shm/test", O_RDWR|O_CREAT);
   assert(fd > 0);
   assert(ftruncate(fd, SIZE) == 0);

   addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                        MAP_SHARED, fd, 0);

   assert(mbind(addr, SIZE, MPOL_BIND, &nodemask, 2, MPOL_MF_STRICT|MPOL_MF_MOVE)==0);
   for (i = 0; i < SIZE; i+=4096) {
     addr[i] = 1;
   }
   pid = getpid();
   snprintf(buf, sizeof(buf), "grep shm /proc/%d/numa_maps", pid);
   system(buf);
   sleep(10000);

   return 0;
 }
 $ gcc mbind_thp.c -o mbind_thp -lnuma
 $ numactl -H
 available: 2 nodes (0-1)
 node 0 cpus: 0 2
 node 0 size: 1918 MB
 node 0 free: 1595 MB
 node 1 cpus: 1 3
 node 1 size: 2014 MB
 node 1 free: 1731 MB
 node distances:
 node   0   1
   0:  10  20
   1:  20  10
 $ rm -f /dev/shm/test; taskset -c 0 ./mbind_thp
 7fd970a00000 bind:0 file=/dev/shm/test dirty=524288 active=0 N0=396800 N1=127488 kernelpagesize_kB=4

Link: https://lkml.kernel.org/r/20211208165343.22349-1-arbn@yandex-team.com
Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mempolicy.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/mm/mempolicy.c~mm-mempolicy-fix-thp-allocations-escaping-mempolicy-restrictions
+++ a/mm/mempolicy.c
@@ -2140,8 +2140,7 @@ struct page *alloc_pages_vma(gfp_t gfp,
 			 * memory with both reclaim and compact as well.
 			 */
 			if (!page && (gfp & __GFP_DIRECT_RECLAIM))
-				page = __alloc_pages_node(hpage_node,
-								gfp, order);
+				page = __alloc_pages(gfp, order, hpage_node, nmask);
 
 			goto out;
 		}
_

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [patch 5/9] mm, hwpoison: fix condition in free hugetlb page path
       [not found] <20211224211127.30b60764d059ff3b0afea38a@linux-foundation.org>
  2021-12-25  5:12 ` [patch 2/9] mm: mempolicy: fix THP allocations escaping mempolicy restrictions Andrew Morton
@ 2021-12-25  5:12 ` Andrew Morton
  2021-12-25  5:12 ` [patch 8/9] mm/damon/dbgfs: protect targets destructions with kdamond_lock Andrew Morton
  2021-12-25  5:12 ` [patch 9/9] mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page() Andrew Morton
  3 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2021-12-25  5:12 UTC (permalink / raw)
  To: akpm, linux-mm, luofei, mike.kravetz, mm-commits,
	naoya.horiguchi, stable, torvalds

From: Naoya Horiguchi <naoya.horiguchi@nec.com>
Subject: mm, hwpoison: fix condition in free hugetlb page path

When a memory error hits a tail page of a free hugepage,
__page_handle_poison() is expected to be called to isolate the error in
4kB unit, but it's not called due to the outdated if-condition in
memory_failure_hugetlb().  This loses the chance to isolate the error in
the finer unit, so it's not optimal.  Drop the condition.

This "(p != head && TestSetPageHWPoison(head)" condition is based on the
old semantics of PageHWPoison on hugepage (where PG_hwpoison flag was set
on the subpage), so it's not necessray any more.  By getting to set
PG_hwpoison on head page for hugepages, concurrent error events on
different subpages in a single hugepage can be prevented by
TestSetPageHWPoison(head) at the beginning of memory_failure_hugetlb(). 
So dropping the condition should not reopen the race window originally
mentioned in commit b985194c8c0a ("hwpoison, hugetlb:
lock_page/unlock_page does not match for handling a free hugepage")

[naoya.horiguchi@linux.dev: fix "HardwareCorrupted" counter]
  Link: https://lkml.kernel.org/r/20211220084851.GA1460264@u2004
Link: https://lkml.kernel.org/r/20211210110208.879740-1-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: Fei Luo <luofei@unicloud.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>	[5.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory-failure.c |   13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

--- a/mm/memory-failure.c~mm-hwpoison-fix-condition-in-free-hugetlb-page-path
+++ a/mm/memory-failure.c
@@ -1470,17 +1470,12 @@ static int memory_failure_hugetlb(unsign
 	if (!(flags & MF_COUNT_INCREASED)) {
 		res = get_hwpoison_page(p, flags);
 		if (!res) {
-			/*
-			 * Check "filter hit" and "race with other subpage."
-			 */
 			lock_page(head);
-			if (PageHWPoison(head)) {
-				if ((hwpoison_filter(p) && TestClearPageHWPoison(p))
-				    || (p != head && TestSetPageHWPoison(head))) {
+			if (hwpoison_filter(p)) {
+				if (TestClearPageHWPoison(head))
 					num_poisoned_pages_dec();
-					unlock_page(head);
-					return 0;
-				}
+				unlock_page(head);
+				return 0;
 			}
 			unlock_page(head);
 			res = MF_FAILED;
_

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [patch 8/9] mm/damon/dbgfs: protect targets destructions with kdamond_lock
       [not found] <20211224211127.30b60764d059ff3b0afea38a@linux-foundation.org>
  2021-12-25  5:12 ` [patch 2/9] mm: mempolicy: fix THP allocations escaping mempolicy restrictions Andrew Morton
  2021-12-25  5:12 ` [patch 5/9] mm, hwpoison: fix condition in free hugetlb page path Andrew Morton
@ 2021-12-25  5:12 ` Andrew Morton
  2021-12-25  5:12 ` [patch 9/9] mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page() Andrew Morton
  3 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2021-12-25  5:12 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, sangwoob, sj, stable, torvalds

From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/dbgfs: protect targets destructions with kdamond_lock

DAMON debugfs interface iterates current monitoring targets in
'dbgfs_target_ids_read()' while holding the corresponding 'kdamond_lock'. 
However, it also destructs the monitoring targets in
'dbgfs_before_terminate()' without holding the lock.  This can result in a
use_after_free bug.  This commit avoids the race by protecting the
destruction with the corresponding 'kdamond_lock'.

Link: https://lkml.kernel.org/r/20211221094447.2241-1-sj@kernel.org
Reported-by: Sangwoo Bae <sangwoob@amazon.com>
Fixes: 4bc05954d007 ("mm/damon: implement a debugfs-based user space interface")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>	[5.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/damon/dbgfs.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/mm/damon/dbgfs.c~mm-damon-dbgfs-protect-targets-destructions-with-kdamond_lock
+++ a/mm/damon/dbgfs.c
@@ -650,10 +650,12 @@ static void dbgfs_before_terminate(struc
 	if (!targetid_is_pid(ctx))
 		return;
 
+	mutex_lock(&ctx->kdamond_lock);
 	damon_for_each_target_safe(t, next, ctx) {
 		put_pid((struct pid *)t->id);
 		damon_destroy_target(t);
 	}
+	mutex_unlock(&ctx->kdamond_lock);
 }
 
 static struct damon_ctx *dbgfs_new_ctx(void)
_

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [patch 9/9] mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()
       [not found] <20211224211127.30b60764d059ff3b0afea38a@linux-foundation.org>
                   ` (2 preceding siblings ...)
  2021-12-25  5:12 ` [patch 8/9] mm/damon/dbgfs: protect targets destructions with kdamond_lock Andrew Morton
@ 2021-12-25  5:12 ` Andrew Morton
  3 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2021-12-25  5:12 UTC (permalink / raw)
  To: akpm, hulkci, linux-mm, liushixin2, mm-commits, naoya.horiguchi,
	osalvador, stable, torvalds

From: Liu Shixin <liushixin2@huawei.com>
Subject: mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()

Hulk Robot reported a panic in put_page_testzero() when testing madvise()
with MADV_SOFT_OFFLINE.  The BUG() is triggered when retrying
get_any_page().  This is because we keep MF_COUNT_INCREASED flag in second
try but the refcnt is not increased.

 page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
 ------------[ cut here ]------------
 kernel BUG at include/linux/mm.h:737!
 invalid opcode: 0000 [#1] PREEMPT SMP
 CPU: 5 PID: 2135 Comm: sshd Tainted: G    B             5.16.0-rc6-dirty #373
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
 RIP: 0010:release_pages+0x53f/0x840
 Code: 0c 01 4c 8d 60 ff e9 5b fb ff ff 48 c7 c6 d8 97 0c b3 4c 89 e7 48 83 05 0e 7b 3c 0c 01 e8 89 3d 04 00 48 83 05 11 7b 3c 0c 01 <0f> 0b 48 83 05 0f 7b 3c 0c 01 48 83 05 0f 7b 3c 0c 01 48 83 05f
 RSP: 0018:ffffc900015a7bc0 EFLAGS: 00010002
 RAX: 000000000000003e RBX: ffffffffbace04c8 RCX: 0000000000000002
 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000ffffffff
 RBP: ffff88817b9acd50 R08: 0000000000000000 R09: c0000000ffefffff
 R10: 0000000000000001 R11: ffffc900015a79b0 R12: ffffea0005e1c900
 R13: ffffea0005e1de88 R14: 000000000000001f R15: ffff888100071000
 FS:  0000000000000000(0000) GS:ffff88842fb40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f305e8de3d4 CR3: 000000017bb6f000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  free_pages_and_swap_cache+0x64/0x80
  tlb_flush_mmu+0x6f/0x220
  unmap_page_range+0xe6c/0x12c0
  unmap_single_vma+0x90/0x170
  unmap_vmas+0xc4/0x180
  exit_mmap+0xde/0x3a0
  mmput+0xa3/0x250
  do_exit+0x564/0x1470
  do_group_exit+0x3b/0x100
  __do_sys_exit_group+0x13/0x20
  __x64_sys_exit_group+0x16/0x20
  do_syscall_64+0x34/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f30625401d9
 Code: Unable to access opcode bytes at RIP 0x7f30625401af.
 RSP: 002b:00007ffe391b0c88 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f30625401d9
 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
 RBP: 00007f306283d838 R08: 000000000000003c R09: 00000000000000e7
 R10: fffffffffffffe30 R11: 0000000000000246 R12: 00007f306283d838
 R13: 00007f3062842e80 R14: 0000000000000000 R15: ffffaa4fb7932430
  </TASK>
 Modules linked in:
 ---[ end trace e99579b570fe0649 ]---
 RIP: 0010:release_pages+0x53f/0x840
 Code: 0c 01 4c 8d 60 ff e9 5b fb ff ff 48 c7 c6 d8 97 0c b3 4c 89 e7 48 83 05 0e 7b 3c 0c 01 e8 89 3d 04 00 48 83 05 11 7b 3c 0c 01 <0f> 0b 48 83 05 0f 7b 3c 0c 01 48 83 05 0f 7b 3c 0c 01 48 83 05f
 RSP: 0018:ffffc900015a7bc0 EFLAGS: 00010002
 RAX: 000000000000003e RBX: ffffffffbace04c8 RCX: 0000000000000002
 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000ffffffff
 RBP: ffff88817b9acd50 R08: 0000000000000000 R09: c0000000ffefffff
 R10: 0000000000000001 R11: ffffc900015a79b0 R12: ffffea0005e1c900
 R13: ffffea0005e1de88 R14: 000000000000001f R15: ffff888100071000
 FS:  0000000000000000(0000) GS:ffff88842fb40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f305e8de3d4 CR3: 000000017bb6f000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Link: https://lkml.kernel.org/r/20211221074908.3910286-1-liushixin2@huawei.com
Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory-failure.c |    1 +
 1 file changed, 1 insertion(+)

--- a/mm/memory-failure.c~mm-hwpoison-clear-mf_count_increased-before-retrying-get_any_page
+++ a/mm/memory-failure.c
@@ -2234,6 +2234,7 @@ retry:
 	} else if (ret == 0) {
 		if (soft_offline_free_page(page) && try_again) {
 			try_again = false;
+			flags &= ~MF_COUNT_INCREASED;
 			goto retry;
 		}
 	}
_

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-12-25  5:13 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20211224211127.30b60764d059ff3b0afea38a@linux-foundation.org>
2021-12-25  5:12 ` [patch 2/9] mm: mempolicy: fix THP allocations escaping mempolicy restrictions Andrew Morton
2021-12-25  5:12 ` [patch 5/9] mm, hwpoison: fix condition in free hugetlb page path Andrew Morton
2021-12-25  5:12 ` [patch 8/9] mm/damon/dbgfs: protect targets destructions with kdamond_lock Andrew Morton
2021-12-25  5:12 ` [patch 9/9] mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page() Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).