* incoming
@ 2021-09-09  1:08 Andrew Morton
  2021-09-09  1:10 ` [patch 1/8] mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled Andrew Morton
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Andrew Morton @ 2021-09-09  1:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm


A bunch of hotfixes, mostly cc:stable.


8 patches, based on 2d338201d5311bcd79d42f66df4cecbcbc5f4f2c.

Subsystems affected by this patch series:

  mm/hmm
  mm/hugetlb
  mm/vmscan
  mm/pagealloc
  mm/pagemap
  mm/kmemleak
  mm/mempolicy
  mm/memblock

Subsystem: mm/hmm

    Li Zhijian <lizhijian@cn.fujitsu.com>:
      mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled

Subsystem: mm/hugetlb

    Liu Zixian <liuzixian4@huawei.com>:
      mm/hugetlb: initialize hugetlb_usage in mm_init

Subsystem: mm/vmscan

    Rik van Riel <riel@surriel.com>:
      mm,vmscan: fix divide by zero in get_scan_count

Subsystem: mm/pagealloc

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype

Subsystem: mm/pagemap

    Liam Howlett <liam.howlett@oracle.com>:
      mmap_lock: change trace and locking order

Subsystem: mm/kmemleak

    Naohiro Aota <naohiro.aota@wdc.com>:
      mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp

Subsystem: mm/mempolicy

    yanghui <yanghui.def@bytedance.com>:
      mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task

Subsystem: mm/memblock

    Mike Rapoport <rppt@linux.ibm.com>:
      nds32/setup: remove unused memblock_region variable in setup_memory()

 arch/nds32/kernel/setup.c |    1 -
 include/linux/hugetlb.h   |    9 +++++++++
 include/linux/mmap_lock.h |    8 ++++----
 kernel/fork.c             |    1 +
 mm/hmm.c                  |    5 ++++-
 mm/kmemleak.c             |    3 ++-
 mm/mempolicy.c            |   17 +++++++++++++----
 mm/page_alloc.c           |    4 +++-
 mm/vmscan.c               |    2 +-
 9 files changed, 37 insertions(+), 13 deletions(-)




* [patch 1/8] mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled
  2021-09-09  1:08 incoming Andrew Morton
@ 2021-09-09  1:10 ` Andrew Morton
  2021-09-09  1:10 ` [patch 2/8] mm/hugetlb: initialize hugetlb_usage in mm_init Andrew Morton
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2021-09-09  1:10 UTC (permalink / raw)
  To: akpm, hch, jgg, linux-mm, lizhijian, mm-commits, stable, torvalds

From: Li Zhijian <lizhijian@cn.fujitsu.com>
Subject: mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled

Previously, we noticed that one rpma example, which uses the ODP feature
to do RDMA WRITE between fsdax files, has failed[1] since commit
36f30e486d.

After digging into the code, we found that hmm_vma_handle_pte() still
returns -EFAULT even though all of its requested pfn flags have been
fulfilled.  That's because a DAX page is marked as (_PAGE_SPECIAL |
_PAGE_DEVMAP) by pte_mkdevmap().

[1]: https://github.com/pmem/rpma/issues/1142
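
For reference, x86's pte_mkdevmap() sets both bits at once (paraphrased
from arch/x86/include/asm/pgtable.h; exact helper names may differ by
architecture and tree):

	static inline pte_t pte_mkdevmap(pte_t pte)
	{
		return pte_set_flags(pte, _PAGE_SPECIAL|_PAGE_DEVMAP);
	}

Such a pte therefore also passes pte_special(), which is why the
unpatched check below treated fully-serviced DAX ptes as faultable
special mappings.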

Link: https://lkml.kernel.org/r/20210830094232.203029-1-lizhijian@cn.fujitsu.com
Fixes: 405506274922 ("mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling")
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/hmm.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/mm/hmm.c~mm-hmm-bypass-devmap-pte-when-all-pfn-requested-flags-are-fulfilled
+++ a/mm/hmm.c
@@ -295,10 +295,13 @@ static int hmm_vma_handle_pte(struct mm_
 		goto fault;
 
 	/*
+	 * Bypass devmap pte such as DAX page when all pfn requested
+	 * flags(pfn_req_flags) are fulfilled.
 	 * Since each architecture defines a struct page for the zero page, just
 	 * fall through and treat it like a normal page.
 	 */
-	if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
+	if (pte_special(pte) && !pte_devmap(pte) &&
+	    !is_zero_pfn(pte_pfn(pte))) {
 		if (hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0)) {
 			pte_unmap(ptep);
 			return -EFAULT;
_



* [patch 2/8] mm/hugetlb: initialize hugetlb_usage in mm_init
  2021-09-09  1:08 incoming Andrew Morton
  2021-09-09  1:10 ` [patch 1/8] mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled Andrew Morton
@ 2021-09-09  1:10 ` Andrew Morton
  2021-09-09  1:10 ` [patch 3/8] mm,vmscan: fix divide by zero in get_scan_count Andrew Morton
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2021-09-09  1:10 UTC (permalink / raw)
  To: akpm, linux-mm, liuzixian4, mike.kravetz, mm-commits,
	naoya.horiguchi, stable, torvalds

From: Liu Zixian <liuzixian4@huawei.com>
Subject: mm/hugetlb: initialize hugetlb_usage in mm_init

After fork, the child process gets an incorrect (doubled) hugetlb_usage.
If a process uses five 2MB hugetlb pages in an anonymous mapping,

	HugetlbPages:	   10240 kB

and then forks, the child will show,

	HugetlbPages:	   20480 kB

The amount doubles because hugetlb_usage is copied from the parent and
then increased again when we copy page tables from parent to child,
leaving the child with twice its actual usage.

Fix this by adding hugetlb_count_init in mm_init.
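
A minimal userspace sketch of the repro (assuming 2MB hugepages are
configured and enough are free; this program is illustrative and not
part of the patch):

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/wait.h>
	#include <unistd.h>

	static void print_hugetlb_usage(const char *who)
	{
		char line[256];
		FILE *f = fopen("/proc/self/status", "r");

		if (!f)
			exit(1);
		while (fgets(line, sizeof(line), f))
			if (!strncmp(line, "HugetlbPages:", 13))
				printf("%s %s", who, line);
		fclose(f);
	}

	int main(void)
	{
		size_t len = 5 * 2 * 1024 * 1024;	/* five 2MB pages */
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
			       -1, 0);

		if (p == MAP_FAILED)
			return 1;
		memset(p, 0, len);		/* fault the pages in */

		print_hugetlb_usage("parent:");
		if (fork() == 0) {
			/* reports 20480 kB on unfixed kernels */
			print_hugetlb_usage("child: ");
			_exit(0);
		}
		wait(NULL);
		return 0;
	}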

Link: https://lkml.kernel.org/r/20210826071742.877-1-liuzixian4@huawei.com
Fixes: 5d317b2b6536 ("mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status")
Signed-off-by: Liu Zixian <liuzixian4@huawei.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/hugetlb.h |    9 +++++++++
 kernel/fork.c           |    1 +
 2 files changed, 10 insertions(+)

--- a/include/linux/hugetlb.h~mm-hugetlb-initialize-hugetlb_usage-in-mm_init
+++ a/include/linux/hugetlb.h
@@ -858,6 +858,11 @@ static inline spinlock_t *huge_pte_lockp
 
 void hugetlb_report_usage(struct seq_file *m, struct mm_struct *mm);
 
+static inline void hugetlb_count_init(struct mm_struct *mm)
+{
+	atomic_long_set(&mm->hugetlb_usage, 0);
+}
+
 static inline void hugetlb_count_add(long l, struct mm_struct *mm)
 {
 	atomic_long_add(l, &mm->hugetlb_usage);
@@ -1042,6 +1047,10 @@ static inline spinlock_t *huge_pte_lockp
 	return &mm->page_table_lock;
 }
 
+static inline void hugetlb_count_init(struct mm_struct *mm)
+{
+}
+
 static inline void hugetlb_report_usage(struct seq_file *f, struct mm_struct *m)
 {
 }
--- a/kernel/fork.c~mm-hugetlb-initialize-hugetlb_usage-in-mm_init
+++ a/kernel/fork.c
@@ -1063,6 +1063,7 @@ static struct mm_struct *mm_init(struct
 	mm->pmd_huge_pte = NULL;
 #endif
 	mm_init_uprobes_state(mm);
+	hugetlb_count_init(mm);
 
 	if (current->mm) {
 		mm->flags = current->mm->flags & MMF_INIT_MASK;
_



* [patch 3/8] mm,vmscan: fix divide by zero in get_scan_count
  2021-09-09  1:08 incoming Andrew Morton
  2021-09-09  1:10 ` [patch 1/8] mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled Andrew Morton
  2021-09-09  1:10 ` [patch 2/8] mm/hugetlb: initialize hugetlb_usage in mm_init Andrew Morton
@ 2021-09-09  1:10 ` Andrew Morton
  2021-09-09  1:10 ` [patch 4/8] mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype Andrew Morton
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2021-09-09  1:10 UTC (permalink / raw)
  To: akpm, chris, guro, hannes, linux-mm, mhocko, mm-commits, riel,
	stable, torvalds

From: Rik van Riel <riel@surriel.com>
Subject: mm,vmscan: fix divide by zero in get_scan_count

Commit f56ce412a59d ("mm: memcontrol: fix occasional OOMs due to
proportional memory.low reclaim") introduced a divide by zero corner case
when oomd is being used in combination with cgroup memory.low protection.

When oomd decides to kill a cgroup, it will force the cgroup memory to be
reclaimed after killing the tasks, by writing to the memory.max file for
that cgroup, forcing the remaining page cache and reclaimable slab to be
reclaimed down to zero.

Previously, on cgroups with some memory.low protection, that would
result in the memory being reclaimed only down to the memory.low limit,
or likely not at all, with the remaining page cache reclaimed
asynchronously later.

With f56ce412a59d the oomd write to memory.max tries to reclaim all the
way down to zero, which may race with another reclaimer, to the point of
ending up with the divide by zero below.

This patch implements the obvious fix.
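
A userspace sketch of the corner case (the values are illustrative;
this mirrors the shape of the expression in get_scan_count(), it is not
the kernel code itself):

	#include <stdio.h>

	int main(void)
	{
		unsigned long lruvec_size = 1024;
		unsigned long protection = 0;	/* protection fully consumed */
		unsigned long cgroup_size = 0;	/* racing reclaim emptied it */

		if (cgroup_size < protection)
			cgroup_size = protection;	/* max(): still 0 */

		/* old code divided by cgroup_size -> division by zero */
		unsigned long scan = lruvec_size -
			lruvec_size * protection / (cgroup_size + 1);

		printf("scan = %lu\n", scan);	/* 1024: scan everything */
		return 0;
	}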

Link: https://lkml.kernel.org/r/20210826220149.058089c6@imladris.surriel.com
Fixes: f56ce412a59d ("mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim")
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Chris Down <chris@chrisdown.name>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/vmscan.c~mmvmscan-fix-divide-by-zero-in-get_scan_count
+++ a/mm/vmscan.c
@@ -2715,7 +2715,7 @@ out:
 			cgroup_size = max(cgroup_size, protection);
 
 			scan = lruvec_size - lruvec_size * protection /
-				cgroup_size;
+				(cgroup_size + 1);
 
 			/*
 			 * Minimally target SWAP_CLUSTER_MAX pages to keep
_



* [patch 4/8] mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype
  2021-09-09  1:08 incoming Andrew Morton
                   ` (2 preceding siblings ...)
  2021-09-09  1:10 ` [patch 3/8] mm,vmscan: fix divide by zero in get_scan_count Andrew Morton
@ 2021-09-09  1:10 ` Andrew Morton
  2021-09-09  1:10 ` [patch 5/8] mmap_lock: change trace and locking order Andrew Morton
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2021-09-09  1:10 UTC (permalink / raw)
  To: akpm, david, linmiaohe, linux-mm, mgorman, mm-commits, stable,
	torvalds, vbabka

From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype

If free_unref_page_prepare() declines to free a page, that page's pcp
migratetype is left unset.  We would then read rubbish from
get_pcppage_migratetype() and might list_del() &page->lru again after it
has already been deleted from the list, leading to complaints about data
corruption.
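
The corruption report comes from the kernel's poisoned list pointers: a
second list_del() on the same entry chases LIST_POISON values.  A
userspace sketch with simplified stand-ins for include/linux/list.h:

	#include <stdio.h>

	struct list_head { struct list_head *next, *prev; };

	static void list_del(struct list_head *entry)
	{
		entry->prev->next = entry->next;
		entry->next->prev = entry->prev;
		/* poisoning, as in the kernel's __list_del_entry() */
		entry->next = (void *)0xdead000000000100;  /* LIST_POISON1 */
		entry->prev = (void *)0xdead000000000122;  /* LIST_POISON2 */
	}

	int main(void)
	{
		struct list_head head, page_lru;

		head.next = head.prev = &page_lru;
		page_lru.next = page_lru.prev = &head;

		list_del(&page_lru);	/* fine: unlinks the entry */
		list_del(&page_lru);	/* double delete: chases poison, crashes */
		printf("never reached\n");
		return 0;
	}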

Link: https://lkml.kernel.org/r/20210902115447.57050-1-linmiaohe@huawei.com
Fixes: df1acc856923 ("mm/page_alloc: avoid conflating IRQs disabled with zone->lock")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/mm/page_alloc.c~mm-page_allocc-avoid-accessing-uninitialized-pcp-page-migratetype
+++ a/mm/page_alloc.c
@@ -3428,8 +3428,10 @@ void free_unref_page_list(struct list_he
 	/* Prepare pages for freeing */
 	list_for_each_entry_safe(page, next, list, lru) {
 		pfn = page_to_pfn(page);
-		if (!free_unref_page_prepare(page, pfn, 0))
+		if (!free_unref_page_prepare(page, pfn, 0)) {
 			list_del(&page->lru);
+			continue;
+		}
 
 		/*
 		 * Free isolated pages directly to the allocator, see
_



* [patch 5/8] mmap_lock: change trace and locking order
  2021-09-09  1:08 incoming Andrew Morton
                   ` (3 preceding siblings ...)
  2021-09-09  1:10 ` [patch 4/8] mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype Andrew Morton
@ 2021-09-09  1:10 ` Andrew Morton
  2021-09-09 12:56   ` Liam Howlett
  2021-09-09  1:10 ` [patch 6/8] mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp Andrew Morton
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2021-09-09  1:10 UTC (permalink / raw)
  To: akpm, Liam.Howlett, linux-mm, mm-commits, rostedt, torvalds,
	vbabka, walken.cr, willy

From: Liam Howlett <liam.howlett@oracle.com>
Subject: mmap_lock: change trace and locking order

Print to the trace log before releasing the lock to avoid racing with
other trace log printers of the same lock type.
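
A sketch of the interleaving being avoided (the timing is illustrative;
time runs downward):

	CPU0 (old order)                 CPU1
	up_write(&mm->mmap_lock);
	                                 down_write(&mm->mmap_lock);
	                                 __mmap_lock_trace_acquire_returned(...);
	__mmap_lock_trace_released(...);

With the old order, CPU1's acquire event can reach the trace buffer
before CPU0's release event, so the log appears to show the lock being
acquired while still held.  Tracing before the unlock keeps the events
in lock order.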

Link: https://lkml.kernel.org/r/20210903022041.1843024-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michel Lespinasse <walken.cr@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mmap_lock.h |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/include/linux/mmap_lock.h~mmap_lock-change-trace-and-locking-order
+++ a/include/linux/mmap_lock.h
@@ -101,14 +101,14 @@ static inline bool mmap_write_trylock(st
 
 static inline void mmap_write_unlock(struct mm_struct *mm)
 {
-	up_write(&mm->mmap_lock);
 	__mmap_lock_trace_released(mm, true);
+	up_write(&mm->mmap_lock);
 }
 
 static inline void mmap_write_downgrade(struct mm_struct *mm)
 {
-	downgrade_write(&mm->mmap_lock);
 	__mmap_lock_trace_acquire_returned(mm, false, true);
+	downgrade_write(&mm->mmap_lock);
 }
 
 static inline void mmap_read_lock(struct mm_struct *mm)
@@ -140,8 +140,8 @@ static inline bool mmap_read_trylock(str
 
 static inline void mmap_read_unlock(struct mm_struct *mm)
 {
-	up_read(&mm->mmap_lock);
 	__mmap_lock_trace_released(mm, false);
+	up_read(&mm->mmap_lock);
 }
 
 static inline bool mmap_read_trylock_non_owner(struct mm_struct *mm)
@@ -155,8 +155,8 @@ static inline bool mmap_read_trylock_non
 
 static inline void mmap_read_unlock_non_owner(struct mm_struct *mm)
 {
-	up_read_non_owner(&mm->mmap_lock);
 	__mmap_lock_trace_released(mm, false);
+	up_read_non_owner(&mm->mmap_lock);
 }
 
 static inline void mmap_assert_locked(struct mm_struct *mm)
_



* [patch 6/8] mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp
  2021-09-09  1:08 incoming Andrew Morton
                   ` (4 preceding siblings ...)
  2021-09-09  1:10 ` [patch 5/8] mmap_lock: change trace and locking order Andrew Morton
@ 2021-09-09  1:10 ` Andrew Morton
  2021-09-09  1:10 ` [patch 7/8] mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task Andrew Morton
  2021-09-09  1:10 ` [patch 8/8] nds32/setup: remove unused memblock_region variable in setup_memory() Andrew Morton
  7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2021-09-09  1:10 UTC (permalink / raw)
  To: akpm, catalin.marinas, djwong, linux-mm, mm-commits,
	naohiro.aota, torvalds

From: Naohiro Aota <naohiro.aota@wdc.com>
Subject: mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp

In a memory pressure situation, I'm seeing the lockdep WARNING below. 
Actually, this is similar to a known false positive which is already
addressed by commit 6dcde60efd94 ("xfs: more lockdep whackamole with
kmem_alloc*").

This warning still persists because it comes not from kmalloc() itself
but from the allocation of the corresponding kmemleak object.  While
kmalloc() itself suppresses the warning with __GFP_NOLOCKDEP,
gfp_kmemleak_mask() drops the flag for kmemleak's allocation.

Allow __GFP_NOLOCKDEP to be passed to kmemleak's allocation, so that the
warning for it is also suppressed.

  ======================================================
  WARNING: possible circular locking dependency detected
  5.14.0-rc7-BTRFS-ZNS+ #37 Not tainted
  ------------------------------------------------------
  kswapd0/288 is trying to acquire lock:
  ffff88825ab45df0 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0x8a/0x250

  but task is already holding lock:
  ffffffff848cc1e0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (fs_reclaim){+.+.}-{0:0}:
         fs_reclaim_acquire+0x112/0x160
         kmem_cache_alloc+0x48/0x400
         create_object.isra.0+0x42/0xb10
         kmemleak_alloc+0x48/0x80
         __kmalloc+0x228/0x440
         kmem_alloc+0xd3/0x2b0
         kmem_alloc_large+0x5a/0x1c0
         xfs_attr_copy_value+0x112/0x190
         xfs_attr_shortform_getvalue+0x1fc/0x300
         xfs_attr_get_ilocked+0x125/0x170
         xfs_attr_get+0x329/0x450
         xfs_get_acl+0x18d/0x430
         get_acl.part.0+0xb6/0x1e0
         posix_acl_xattr_get+0x13a/0x230
         vfs_getxattr+0x21d/0x270
         getxattr+0x126/0x310
         __x64_sys_fgetxattr+0x1a6/0x2a0
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #0 (&xfs_nondir_ilock_class){++++}-{3:3}:
         __lock_acquire+0x2c0f/0x5a00
         lock_acquire+0x1a1/0x4b0
         down_read_nested+0x50/0x90
         xfs_ilock+0x8a/0x250
         xfs_can_free_eofblocks+0x34f/0x570
         xfs_inactive+0x411/0x520
         xfs_fs_destroy_inode+0x2c8/0x710
         destroy_inode+0xc5/0x1a0
         evict+0x444/0x620
         dispose_list+0xfe/0x1c0
         prune_icache_sb+0xdc/0x160
         super_cache_scan+0x31e/0x510
         do_shrink_slab+0x337/0x8e0
         shrink_slab+0x362/0x5c0
         shrink_node+0x7a7/0x1a40
         balance_pgdat+0x64e/0xfe0
         kswapd+0x590/0xa80
         kthread+0x38c/0x460
         ret_from_fork+0x22/0x30

  other info that might help us debug this:
   Possible unsafe locking scenario:
         CPU0                    CPU1
         ----                    ----
    lock(fs_reclaim);
                                 lock(&xfs_nondir_ilock_class);
                                 lock(fs_reclaim);
    lock(&xfs_nondir_ilock_class);

   *** DEADLOCK ***
  3 locks held by kswapd0/288:
   #0: ffffffff848cc1e0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
   #1: ffffffff848a08d8 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x269/0x5c0
   #2: ffff8881a7a820e8 (&type->s_umount_key#60){++++}-{3:3}, at: super_cache_scan+0x5a/0x510
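
A userspace sketch of the mask change (the bit values below are
illustrative placeholders, not the real GFP encodings from gfp.h):

	#include <stdio.h>

	#define GFP_KERNEL       0x001u
	#define GFP_ATOMIC       0x002u
	#define __GFP_NOLOCKDEP  0x004u
	#define __GFP_NORETRY    0x008u
	#define __GFP_NOMEMALLOC 0x010u
	#define __GFP_NOWARN     0x020u

	#define OLD_MASK(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
			       __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN)
	#define NEW_MASK(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC | \
					  __GFP_NOLOCKDEP)) | \
			       __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN)

	int main(void)
	{
		/* e.g. what a filesystem passes under fs reclaim */
		unsigned int gfp = GFP_KERNEL | __GFP_NOLOCKDEP;

		printf("old mask keeps NOLOCKDEP: %s\n",
		       OLD_MASK(gfp) & __GFP_NOLOCKDEP ? "yes" : "no");
		printf("new mask keeps NOLOCKDEP: %s\n",
		       NEW_MASK(gfp) & __GFP_NOLOCKDEP ? "yes" : "no");
		return 0;
	}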

Link: https://lkml.kernel.org/r/20210907055659.3182992-1-naohiro.aota@wdc.com
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: "Darrick J . Wong" <djwong@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kmemleak.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/kmemleak.c~mm-kmemleak-allow-__gfp_nolockdep-passed-to-kmemleaks-gfp
+++ a/mm/kmemleak.c
@@ -113,7 +113,8 @@
 #define BYTES_PER_POINTER	sizeof(void *)
 
 /* GFP bitmask for kmemleak internal allocations */
-#define gfp_kmemleak_mask(gfp)	(((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
+#define gfp_kmemleak_mask(gfp)	(((gfp) & (GFP_KERNEL | GFP_ATOMIC | \
+					   __GFP_NOLOCKDEP)) | \
 				 __GFP_NORETRY | __GFP_NOMEMALLOC | \
 				 __GFP_NOWARN)
 
_



* [patch 7/8] mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task
  2021-09-09  1:08 incoming Andrew Morton
                   ` (5 preceding siblings ...)
  2021-09-09  1:10 ` [patch 6/8] mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp Andrew Morton
@ 2021-09-09  1:10 ` Andrew Morton
  2021-09-09  1:10 ` [patch 8/8] nds32/setup: remove unused memblock_region variable in setup_memory() Andrew Morton
  7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2021-09-09  1:10 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, songmuchun, stable, torvalds, yanghui.def

From: yanghui <yanghui.def@bytedance.com>
Subject: mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task

Servers hit the panic below (kernel version 5.4.56):
BUG: unable to handle page fault for address: 0000000000002c48
RIP: 0010:__next_zones_zonelist+0x1d/0x40
[264003.977696] RAX: 0000000000002c40 RBX: 0000000000100dca RCX: 0000000000000014
[264003.977872] Call Trace:
[264003.977888]  __alloc_pages_nodemask+0x277/0x310
[264003.977908]  alloc_page_interleave+0x13/0x70
[264003.977926]  handle_mm_fault+0xf99/0x1390
[264003.977951]  __do_page_fault+0x288/0x500
[264003.977979]  ? schedule+0x39/0xa0
[264003.977994]  do_page_fault+0x30/0x110
[264003.978010]  page_fault+0x3e/0x50

The panic occurs because MAX_NUMNODES is passed as the third parameter
(preferred_nid) to __alloc_pages_nodemask(), so the access to
zonelist->zoneref->zone_idx in __next_zones_zonelist() causes a page
fault.

In offset_il_node(), first_node() returns a nid from pol->v.nodes; after
this, other threads may change pol->v.nodes before next_node() runs.
This race condition lets next_node() return MAX_NUMNODES.  Fix it by
putting pol->nodes in a local variable.

The race condition is between offset_il_node and cpuset_change_task_nodemask:
CPU0:                                     CPU1:
alloc_pages_vma()
  interleave_nid(pol,)
    offset_il_node(pol,)
      first_node(pol->v.nodes)            cpuset_change_task_nodemask
                      //nodes==0xc          mpol_rebind_task
                                              mpol_rebind_policy
                                                mpol_rebind_nodemask(pol,nodes)
                      //nodes==0x3
      next_node(nid, pol->v.nodes)//return MAX_NUMNODES
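
A userspace sketch of the failure mode, with an 8-node toy mask and a
hand-rolled scanner standing in for the kernel's nodemask helpers:

	#include <stdio.h>

	#define MAX_NUMNODES 8

	/* find the next set bit after 'after', as next_node() does */
	static int next_bit(unsigned int mask, int after)
	{
		for (int i = after + 1; i < MAX_NUMNODES; i++)
			if (mask & (1u << i))
				return i;
		return MAX_NUMNODES;	/* no further node set */
	}

	int main(void)
	{
		unsigned int nodes = 0xc;	/* nodes {2,3} */
		int nid = next_bit(nodes, -1);	/* first_node() -> 2 */

		nodes = 0x3;			/* racing rebind: now {0,1} */
		nid = next_bit(nodes, nid);	/* next_node() -> 8 */
		printf("nid = %d (MAX_NUMNODES = %d)\n", nid, MAX_NUMNODES);
		return 0;
	}

Snapshotting pol->nodes into a local nodemask means both first_node()
and next_node() see the same mask, so nid can never fall off the end.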

Link: https://lkml.kernel.org/r/20210906034658.48721-1-yanghui.def@bytedance.com
Signed-off-by: yanghui <yanghui.def@bytedance.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mempolicy.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

--- a/mm/mempolicy.c~mm-mempolicy-fix-a-race-between-offset_il_node-and-mpol_rebind_task
+++ a/mm/mempolicy.c
@@ -1876,17 +1876,26 @@ unsigned int mempolicy_slab_node(void)
  */
 static unsigned offset_il_node(struct mempolicy *pol, unsigned long n)
 {
-	unsigned nnodes = nodes_weight(pol->nodes);
-	unsigned target;
+	nodemask_t nodemask = pol->nodes;
+	unsigned int target, nnodes;
 	int i;
 	int nid;
+	/*
+	 * The barrier will stabilize the nodemask in a register or on
+	 * the stack so that it will stop changing under the code.
+	 *
+	 * Between first_node() and next_node(), pol->nodes could be changed
+	 * by other threads. So we put pol->nodes in a local stack.
+	 */
+	barrier();
 
+	nnodes = nodes_weight(nodemask);
 	if (!nnodes)
 		return numa_node_id();
 	target = (unsigned int)n % nnodes;
-	nid = first_node(pol->nodes);
+	nid = first_node(nodemask);
 	for (i = 0; i < target; i++)
-		nid = next_node(nid, pol->nodes);
+		nid = next_node(nid, nodemask);
 	return nid;
 }
 
_



* [patch 8/8] nds32/setup: remove unused memblock_region variable in setup_memory()
  2021-09-09  1:08 incoming Andrew Morton
                   ` (6 preceding siblings ...)
  2021-09-09  1:10 ` [patch 7/8] mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task Andrew Morton
@ 2021-09-09  1:10 ` Andrew Morton
  7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2021-09-09  1:10 UTC (permalink / raw)
  To: akpm, deanbo422, green.hu, linux-mm, linux, lkp, mm-commits,
	nickhu, rppt, torvalds

From: Mike Rapoport <rppt@linux.ibm.com>
Subject: nds32/setup: remove unused memblock_region variable in setup_memory()

The kernel test robot reports an unused variable warning:

cppcheck possible warnings: (new ones prefixed by >>, may not be real
problems)

>> arch/nds32/kernel/setup.c:247:26: warning: Unused variable: region
>> [unusedVariable]
    struct memblock_region *region;
                            ^

Remove the unused variable.

Link: https://lkml.kernel.org/r/20210712125218.28951-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/nds32/kernel/setup.c |    1 -
 1 file changed, 1 deletion(-)

--- a/arch/nds32/kernel/setup.c~nds32-setup-remove-unused-memblock_region-variable-in-setup_memory
+++ a/arch/nds32/kernel/setup.c
@@ -244,7 +244,6 @@ static void __init setup_memory(void)
 	unsigned long ram_start_pfn;
 	unsigned long free_ram_start_pfn;
 	phys_addr_t memory_start, memory_end;
-	struct memblock_region *region;
 
 	memory_end = memory_start = 0;
 
_



* Re: [patch 5/8] mmap_lock: change trace and locking order
  2021-09-09  1:10 ` [patch 5/8] mmap_lock: change trace and locking order Andrew Morton
@ 2021-09-09 12:56   ` Liam Howlett
  0 siblings, 0 replies; 10+ messages in thread
From: Liam Howlett @ 2021-09-09 12:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, mm-commits, rostedt, torvalds, vbabka, walken.cr, willy

Andrew,

I sent a v3 of this patch with a better description as suggested by
Vlastimil Babka and Steven Rostedt.  I also forgot to add the
reviewed-by's and acked-by's from v2 as Steven Rostedt pointed out.
It's probably best to look at the email message [1].

1.
https://lore.kernel.org/linux-mm/20210907162537.27cbf082@gandalf.local.home/

Thanks,
Liam


