linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* incoming
@ 2021-03-25  4:36 Andrew Morton
  2021-03-25  4:37 ` [patch 01/14] hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings Andrew Morton
                   ` (13 more replies)
  0 siblings, 14 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits


14 patches, based on 7acac4b3196caee5e21fb5ea53f8bc124e6a16fc.

Subsystems affected by this patch series:

  mm/hugetlb
  mm/kasan
  mm/gup
  mm/selftests
  mm/z3fold
  squashfs
  ia64
  gcov
  mm/kfence
  mm/memblock
  mm/highmem
  mailmap

Subsystem: mm/hugetlb

    Miaohe Lin <linmiaohe@huawei.com>:
      hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings

Subsystem: mm/kasan

    Andrey Konovalov <andreyknvl@google.com>:
      kasan: fix per-page tags for non-page_alloc pages

Subsystem: mm/gup

    Sean Christopherson <seanjc@google.com>:
      mm/mmu_notifiers: ensure range_end() is paired with range_start()

Subsystem: mm/selftests

    Rong Chen <rong.a.chen@intel.com>:
      selftests/vm: fix out-of-tree build

Subsystem: mm/z3fold

    Thomas Hebb <tommyhebb@gmail.com>:
      z3fold: prevent reclaim/free race for headless pages

Subsystem: squashfs

    Sean Nyekjaer <sean@geanix.com>:
      squashfs: fix inode lookup sanity checks

    Phillip Lougher <phillip@squashfs.org.uk>:
      squashfs: fix xattr id and id lookup sanity checks

Subsystem: ia64

    Sergei Trofimovich <slyfox@gentoo.org>:
      ia64: mca: allocate early mca with GFP_ATOMIC
      ia64: fix format strings for err_inject

Subsystem: gcov

    Nick Desaulniers <ndesaulniers@google.com>:
      gcov: fix clang-11+ support

Subsystem: mm/kfence

    Marco Elver <elver@google.com>:
      kfence: make compatible with kmemleak

Subsystem: mm/memblock

    Mike Rapoport <rppt@linux.ibm.com>:
      mm: memblock: fix section mismatch warning again

Subsystem: mm/highmem

    Ira Weiny <ira.weiny@intel.com>:
      mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP

Subsystem: mailmap

    Andrey Konovalov <andreyknvl@google.com>:
      mailmap: update Andrey Konovalov's email address

 .mailmap                            |    1 
 arch/ia64/kernel/err_inject.c       |   22 +++++------
 arch/ia64/kernel/mca.c              |    2 -
 fs/squashfs/export.c                |    8 +++-
 fs/squashfs/id.c                    |    6 ++-
 fs/squashfs/squashfs_fs.h           |    1 
 fs/squashfs/xattr_id.c              |    6 ++-
 include/linux/hugetlb_cgroup.h      |   15 ++++++-
 include/linux/memblock.h            |    4 +-
 include/linux/mm.h                  |   18 +++++++--
 include/linux/mmu_notifier.h        |   10 ++---
 kernel/gcov/clang.c                 |   69 ++++++++++++++++++++++++++++++++++++
 mm/highmem.c                        |    4 +-
 mm/hugetlb.c                        |   41 +++++++++++++++++++--
 mm/hugetlb_cgroup.c                 |   10 ++++-
 mm/kfence/core.c                    |    9 ++++
 mm/kmemleak.c                       |    3 +
 mm/mmu_notifier.c                   |   23 ++++++++++++
 mm/z3fold.c                         |   16 +++++++-
 tools/testing/selftests/vm/Makefile |    4 +-
 20 files changed, 230 insertions(+), 42 deletions(-)



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 01/14] hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
  2021-03-25  4:36 incoming Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  2021-03-25  4:37 ` [patch 02/14] kasan: fix per-page tags for non-page_alloc pages Andrew Morton
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: akpm, almasrymina, aneesh.kumar, linmiaohe, linux-mm, liwp.linux,
	lkp, mike.kravetz, mm-commits, stable, torvalds

From: Miaohe Lin <linmiaohe@huawei.com>
Subject: hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings

The current implementation of hugetlb_cgroup for shared mappings could
have different behavior.  Consider the following two scenarios:

1.Assume initial css reference count of hugetlb_cgroup is 1:
  1.1 Call hugetlb_reserve_pages with from = 1, to = 2. So css reference
count is 2 associated with 1 file_region.
  1.2 Call hugetlb_reserve_pages with from = 2, to = 3. So css reference
count is 3 associated with 2 file_region.
  1.3 coalesce_file_region will coalesce these two file_regions into one.
So css reference count is 3 associated with 1 file_region now.

2.Assume initial css reference count of hugetlb_cgroup is 1 again:
  2.1 Call hugetlb_reserve_pages with from = 1, to = 3. So css reference
count is 2 associated with 1 file_region.

Therefore, we might have one file_region while holding one or more css
reference counts.  This inconsistency could lead to imbalanced css_get()
and css_put() pair.  If we do css_put one by one (i.g.  hole punch case),
scenario 2 would put one more css reference.  If we do css_put all
together (i.g.  truncate case), scenario 1 will leak one css reference.

The imbalanced css_get() and css_put() pair would result in a non-zero
reference when we try to destroy the hugetlb cgroup.  The hugetlb cgroup
directory is removed __but__ associated resource is not freed.  This might
result in OOM or can not create a new hugetlb cgroup in a busy workload
ultimately.

In order to fix this, we have to make sure that one file_region must hold
exactly one css reference.  So in coalesce_file_region case, we should
release one css reference before coalescence.  Also only put css reference
when the entire file_region is removed.

The last thing to note is that the caller of region_add() will only hold
one reference to h_cg->css for the whole contiguous reservation region. 
But this area might be scattered when there are already some file_regions
reside in it.  As a result, many file_regions may share only one h_cg->css
reference.  In order to ensure that one file_region must hold exactly one
css reference, we should do css_get() for each file_region and release the
reference held by caller when they are done.

[linmiaohe@huawei.com: fix imbalanced css_get and css_put pair for shared mappings]
  Link: https://lkml.kernel.org/r/20210316023002.53921-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20210301120540.37076-1-linmiaohe@huawei.com
Fixes: 075a61d07a8e ("hugetlb_cgroup: add accounting for shared mappings")
Reported-by: kernel test robot <lkp@intel.com> (auto build test ERROR)
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/hugetlb_cgroup.h |   15 +++++++++--
 mm/hugetlb.c                   |   41 +++++++++++++++++++++++++++----
 mm/hugetlb_cgroup.c            |   10 ++++++-
 3 files changed, 58 insertions(+), 8 deletions(-)

--- a/include/linux/hugetlb_cgroup.h~hugetlb_cgroup-fix-imbalanced-css_get-and-css_put-pair-for-shared-mappings
+++ a/include/linux/hugetlb_cgroup.h
@@ -113,6 +113,11 @@ static inline bool hugetlb_cgroup_disabl
 	return !cgroup_subsys_enabled(hugetlb_cgrp_subsys);
 }
 
+static inline void hugetlb_cgroup_put_rsvd_cgroup(struct hugetlb_cgroup *h_cg)
+{
+	css_put(&h_cg->css);
+}
+
 extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
 					struct hugetlb_cgroup **ptr);
 extern int hugetlb_cgroup_charge_cgroup_rsvd(int idx, unsigned long nr_pages,
@@ -138,7 +143,8 @@ extern void hugetlb_cgroup_uncharge_coun
 
 extern void hugetlb_cgroup_uncharge_file_region(struct resv_map *resv,
 						struct file_region *rg,
-						unsigned long nr_pages);
+						unsigned long nr_pages,
+						bool region_del);
 
 extern void hugetlb_cgroup_file_init(void) __init;
 extern void hugetlb_cgroup_migrate(struct page *oldhpage,
@@ -147,7 +153,8 @@ extern void hugetlb_cgroup_migrate(struc
 #else
 static inline void hugetlb_cgroup_uncharge_file_region(struct resv_map *resv,
 						       struct file_region *rg,
-						       unsigned long nr_pages)
+						       unsigned long nr_pages,
+						       bool region_del)
 {
 }
 
@@ -185,6 +192,10 @@ static inline bool hugetlb_cgroup_disabl
 	return true;
 }
 
+static inline void hugetlb_cgroup_put_rsvd_cgroup(struct hugetlb_cgroup *h_cg)
+{
+}
+
 static inline int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
 					       struct hugetlb_cgroup **ptr)
 {
--- a/mm/hugetlb.c~hugetlb_cgroup-fix-imbalanced-css_get-and-css_put-pair-for-shared-mappings
+++ a/mm/hugetlb.c
@@ -280,6 +280,17 @@ static void record_hugetlb_cgroup_unchar
 		nrg->reservation_counter =
 			&h_cg->rsvd_hugepage[hstate_index(h)];
 		nrg->css = &h_cg->css;
+		/*
+		 * The caller will hold exactly one h_cg->css reference for the
+		 * whole contiguous reservation region. But this area might be
+		 * scattered when there are already some file_regions reside in
+		 * it. As a result, many file_regions may share only one css
+		 * reference. In order to ensure that one file_region must hold
+		 * exactly one h_cg->css reference, we should do css_get for
+		 * each file_region and leave the reference held by caller
+		 * untouched.
+		 */
+		css_get(&h_cg->css);
 		if (!resv->pages_per_hpage)
 			resv->pages_per_hpage = pages_per_huge_page(h);
 		/* pages_per_hpage should be the same for all entries in
@@ -293,6 +304,14 @@ static void record_hugetlb_cgroup_unchar
 #endif
 }
 
+static void put_uncharge_info(struct file_region *rg)
+{
+#ifdef CONFIG_CGROUP_HUGETLB
+	if (rg->css)
+		css_put(rg->css);
+#endif
+}
+
 static bool has_same_uncharge_info(struct file_region *rg,
 				   struct file_region *org)
 {
@@ -316,6 +335,7 @@ static void coalesce_file_region(struct
 		prg->to = rg->to;
 
 		list_del(&rg->link);
+		put_uncharge_info(rg);
 		kfree(rg);
 
 		rg = prg;
@@ -327,6 +347,7 @@ static void coalesce_file_region(struct
 		nrg->from = rg->from;
 
 		list_del(&rg->link);
+		put_uncharge_info(rg);
 		kfree(rg);
 	}
 }
@@ -662,7 +683,7 @@ retry:
 
 			del += t - f;
 			hugetlb_cgroup_uncharge_file_region(
-				resv, rg, t - f);
+				resv, rg, t - f, false);
 
 			/* New entry for end of split region */
 			nrg->from = t;
@@ -683,7 +704,7 @@ retry:
 		if (f <= rg->from && t >= rg->to) { /* Remove entire region */
 			del += rg->to - rg->from;
 			hugetlb_cgroup_uncharge_file_region(resv, rg,
-							    rg->to - rg->from);
+							    rg->to - rg->from, true);
 			list_del(&rg->link);
 			kfree(rg);
 			continue;
@@ -691,13 +712,13 @@ retry:
 
 		if (f <= rg->from) {	/* Trim beginning of region */
 			hugetlb_cgroup_uncharge_file_region(resv, rg,
-							    t - rg->from);
+							    t - rg->from, false);
 
 			del += t - rg->from;
 			rg->from = t;
 		} else {		/* Trim end of region */
 			hugetlb_cgroup_uncharge_file_region(resv, rg,
-							    rg->to - f);
+							    rg->to - f, false);
 
 			del += rg->to - f;
 			rg->to = f;
@@ -5187,6 +5208,10 @@ bool hugetlb_reserve_pages(struct inode
 			 */
 			long rsv_adjust;
 
+			/*
+			 * hugetlb_cgroup_uncharge_cgroup_rsvd() will put the
+			 * reference to h_cg->css. See comment below for detail.
+			 */
 			hugetlb_cgroup_uncharge_cgroup_rsvd(
 				hstate_index(h),
 				(chg - add) * pages_per_huge_page(h), h_cg);
@@ -5194,6 +5219,14 @@ bool hugetlb_reserve_pages(struct inode
 			rsv_adjust = hugepage_subpool_put_pages(spool,
 								chg - add);
 			hugetlb_acct_memory(h, -rsv_adjust);
+		} else if (h_cg) {
+			/*
+			 * The file_regions will hold their own reference to
+			 * h_cg->css. So we should release the reference held
+			 * via hugetlb_cgroup_charge_cgroup_rsvd() when we are
+			 * done.
+			 */
+			hugetlb_cgroup_put_rsvd_cgroup(h_cg);
 		}
 	}
 	return true;
--- a/mm/hugetlb_cgroup.c~hugetlb_cgroup-fix-imbalanced-css_get-and-css_put-pair-for-shared-mappings
+++ a/mm/hugetlb_cgroup.c
@@ -391,7 +391,8 @@ void hugetlb_cgroup_uncharge_counter(str
 
 void hugetlb_cgroup_uncharge_file_region(struct resv_map *resv,
 					 struct file_region *rg,
-					 unsigned long nr_pages)
+					 unsigned long nr_pages,
+					 bool region_del)
 {
 	if (hugetlb_cgroup_disabled() || !resv || !rg || !nr_pages)
 		return;
@@ -400,7 +401,12 @@ void hugetlb_cgroup_uncharge_file_region
 	    !resv->reservation_counter) {
 		page_counter_uncharge(rg->reservation_counter,
 				      nr_pages * resv->pages_per_hpage);
-		css_put(rg->css);
+		/*
+		 * Only do css_put(rg->css) when we delete the entire region
+		 * because one file_region must hold exactly one css reference.
+		 */
+		if (region_del)
+			css_put(rg->css);
 	}
 }
 
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 02/14] kasan: fix per-page tags for non-page_alloc pages
  2021-03-25  4:36 incoming Andrew Morton
  2021-03-25  4:37 ` [patch 01/14] hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  2021-03-25  4:37 ` [patch 03/14] mm/mmu_notifiers: ensure range_end() is paired with range_start() Andrew Morton
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, stable, torvalds, vincenzo.frascino,
	will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan: fix per-page tags for non-page_alloc pages

To allow performing tag checks on page_alloc addresses obtained via
page_address(), tag-based KASAN modes store tags for page_alloc
allocations in page->flags.

Currently, the default tag value stored in page->flags is 0x00. 
Therefore, page_address() returns a 0x00ffff...  address for pages that
were not allocated via page_alloc.

This might cause problems.  A particular case we encountered is a conflict
with KFENCE.  If a KFENCE-allocated slab object is being freed via
kfree(page_address(page) + offset), the address passed to kfree() will get
tagged with 0x00 (as slab pages keep the default per-page tags).  This
leads to is_kfence_address() check failing, and a KFENCE object ending up
in normal slab freelist, which causes memory corruptions.

This patch changes the way KASAN stores tag in page-flags: they are now
stored xor'ed with 0xff.  This way, KASAN doesn't need to initialize
per-page flags for every created page, which might be slow.

With this change, page_address() returns natively-tagged (with 0xff)
pointers for pages that didn't have tags set explicitly.

This patch fixes the encountered conflict with KFENCE and prevents more
similar issues that can occur in the future.

Link: https://lkml.kernel.org/r/1a41abb11c51b264511d9e71c303bb16d5cb367b.1615475452.git.andreyknvl@google.com
Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

--- a/include/linux/mm.h~kasan-fix-per-page-tags-for-non-page_alloc-pages
+++ a/include/linux/mm.h
@@ -1461,16 +1461,28 @@ static inline bool cpupid_match_pid(stru
 
 #if defined(CONFIG_KASAN_SW_TAGS) || defined(CONFIG_KASAN_HW_TAGS)
 
+/*
+ * KASAN per-page tags are stored xor'ed with 0xff. This allows to avoid
+ * setting tags for all pages to native kernel tag value 0xff, as the default
+ * value 0x00 maps to 0xff.
+ */
+
 static inline u8 page_kasan_tag(const struct page *page)
 {
-	if (kasan_enabled())
-		return (page->flags >> KASAN_TAG_PGSHIFT) & KASAN_TAG_MASK;
-	return 0xff;
+	u8 tag = 0xff;
+
+	if (kasan_enabled()) {
+		tag = (page->flags >> KASAN_TAG_PGSHIFT) & KASAN_TAG_MASK;
+		tag ^= 0xff;
+	}
+
+	return tag;
 }
 
 static inline void page_kasan_tag_set(struct page *page, u8 tag)
 {
 	if (kasan_enabled()) {
+		tag ^= 0xff;
 		page->flags &= ~(KASAN_TAG_MASK << KASAN_TAG_PGSHIFT);
 		page->flags |= (tag & KASAN_TAG_MASK) << KASAN_TAG_PGSHIFT;
 	}
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 03/14] mm/mmu_notifiers: ensure range_end() is paired with range_start()
  2021-03-25  4:36 incoming Andrew Morton
  2021-03-25  4:37 ` [patch 01/14] hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings Andrew Morton
  2021-03-25  4:37 ` [patch 02/14] kasan: fix per-page tags for non-page_alloc pages Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  2021-03-25  4:37 ` [patch 04/14] selftests/vm: fix out-of-tree build Andrew Morton
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: aarcange, akpm, bgardon, dimitri.sivanich, hannes, jgg, jgg,
	jglisse, linux-mm, mhocko, mm-commits, rientjes, seanjc,
	torvalds

From: Sean Christopherson <seanjc@google.com>
Subject: mm/mmu_notifiers: ensure range_end() is paired with range_start()

If one or more notifiers fails .invalidate_range_start(), invoke
.invalidate_range_end() for "all" notifiers.  If there are multiple
notifiers, those that did not fail are expecting _start() and _end() to be
paired, e.g.  KVM's mmu_notifier_count would become imbalanced.  Disallow
notifiers that can fail _start() from implementing _end() so that it's
unnecessary to either track which notifiers rejected _start(), or had
already succeeded prior to a failed _start().

Note, the existing behavior of calling _start() on all notifiers even
after a previous notifier failed _start() was an unintented "feature". 
Make it canon now that the behavior is depended on for correctness.

As of today, the bug is likely benign:

  1. The only caller of the non-blocking notifier is OOM kill.
  2. The only notifiers that can fail _start() are the i915 and Nouveau
     drivers.
  3. The only notifiers that utilize _end() are the SGI UV GRU driver
     and KVM.
  4. The GRU driver will never coincide with the i195/Nouveau drivers.
  5. An imbalanced kvm->mmu_notifier_count only causes soft lockup in the
     _guest_, and the guest is already doomed due to being an OOM victim.

Fix the bug now to play nice with future usage, e.g.  KVM has a potential
use case for blocking memslot updates in KVM while an invalidation is
in-progress, and failure to unblock would result in said updates being
blocked indefinitely and hanging.

Found by inspection.  Verified by adding a second notifier in KVM that
periodically returns -EAGAIN on non-blockable ranges, triggering OOM, and
observing that KVM exits with an elevated notifier count.

Link: https://lkml.kernel.org/r/20210311180057.1582638-1-seanjc@google.com
Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mmu_notifier.h |   10 +++++-----
 mm/mmu_notifier.c            |   23 +++++++++++++++++++++++
 2 files changed, 28 insertions(+), 5 deletions(-)

--- a/include/linux/mmu_notifier.h~mm-mmu_notifiers-esnure-range_end-is-paired-with-range_start
+++ a/include/linux/mmu_notifier.h
@@ -169,11 +169,11 @@ struct mmu_notifier_ops {
 	 * the last refcount is dropped.
 	 *
 	 * If blockable argument is set to false then the callback cannot
-	 * sleep and has to return with -EAGAIN. 0 should be returned
-	 * otherwise. Please note that if invalidate_range_start approves
-	 * a non-blocking behavior then the same applies to
-	 * invalidate_range_end.
-	 *
+	 * sleep and has to return with -EAGAIN if sleeping would be required.
+	 * 0 should be returned otherwise. Please note that notifiers that can
+	 * fail invalidate_range_start are not allowed to implement
+	 * invalidate_range_end, as there is no mechanism for informing the
+	 * notifier that its start failed.
 	 */
 	int (*invalidate_range_start)(struct mmu_notifier *subscription,
 				      const struct mmu_notifier_range *range);
--- a/mm/mmu_notifier.c~mm-mmu_notifiers-esnure-range_end-is-paired-with-range_start
+++ a/mm/mmu_notifier.c
@@ -501,10 +501,33 @@ static int mn_hlist_invalidate_range_sta
 						"");
 				WARN_ON(mmu_notifier_range_blockable(range) ||
 					_ret != -EAGAIN);
+				/*
+				 * We call all the notifiers on any EAGAIN,
+				 * there is no way for a notifier to know if
+				 * its start method failed, thus a start that
+				 * does EAGAIN can't also do end.
+				 */
+				WARN_ON(ops->invalidate_range_end);
 				ret = _ret;
 			}
 		}
 	}
+
+	if (ret) {
+		/*
+		 * Must be non-blocking to get here.  If there are multiple
+		 * notifiers and one or more failed start, any that succeeded
+		 * start are expecting their end to be called.  Do so now.
+		 */
+		hlist_for_each_entry_rcu(subscription, &subscriptions->list,
+					 hlist, srcu_read_lock_held(&srcu)) {
+			if (!subscription->ops->invalidate_range_end)
+				continue;
+
+			subscription->ops->invalidate_range_end(subscription,
+								range);
+		}
+	}
 	srcu_read_unlock(&srcu, id);
 
 	return ret;
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 04/14] selftests/vm: fix out-of-tree build
  2021-03-25  4:36 incoming Andrew Morton
                   ` (2 preceding siblings ...)
  2021-03-25  4:37 ` [patch 03/14] mm/mmu_notifiers: ensure range_end() is paired with range_start() Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  2021-03-25  4:37 ` [patch 05/14] z3fold: prevent reclaim/free race for headless pages Andrew Morton
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: akpm, linux-mm, lkp, mm-commits, rong.a.chen, shuah, torvalds

From: Rong Chen <rong.a.chen@intel.com>
Subject: selftests/vm: fix out-of-tree build

When building out-of-tree, attempting to make target from $(OUTPUT) directory:

  make[1]: *** No rule to make target '$(OUTPUT)/protection_keys.c', needed by '$(OUTPUT)/protection_keys_32'.

Link: https://lkml.kernel.org/r/20210315094700.522753-1-rong.a.chen@intel.com
Signed-off-by: Rong Chen <rong.a.chen@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 tools/testing/selftests/vm/Makefile |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/tools/testing/selftests/vm/Makefile~selftests-vm-fix-out-of-tree-build
+++ a/tools/testing/selftests/vm/Makefile
@@ -101,7 +101,7 @@ endef
 ifeq ($(CAN_BUILD_I386),1)
 $(BINARIES_32): CFLAGS += -m32
 $(BINARIES_32): LDLIBS += -lrt -ldl -lm
-$(BINARIES_32): %_32: %.c
+$(BINARIES_32): $(OUTPUT)/%_32: %.c
 	$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
 $(foreach t,$(TARGETS),$(eval $(call gen-target-rule-32,$(t))))
 endif
@@ -109,7 +109,7 @@ endif
 ifeq ($(CAN_BUILD_X86_64),1)
 $(BINARIES_64): CFLAGS += -m64
 $(BINARIES_64): LDLIBS += -lrt -ldl
-$(BINARIES_64): %_64: %.c
+$(BINARIES_64): $(OUTPUT)/%_64: %.c
 	$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
 $(foreach t,$(TARGETS),$(eval $(call gen-target-rule-64,$(t))))
 endif
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 05/14] z3fold: prevent reclaim/free race for headless pages
  2021-03-25  4:36 incoming Andrew Morton
                   ` (3 preceding siblings ...)
  2021-03-25  4:37 ` [patch 04/14] selftests/vm: fix out-of-tree build Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  2021-03-25  4:37 ` [patch 06/14] squashfs: fix inode lookup sanity checks Andrew Morton
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: akpm, ks77sj, linux-mm, mm-commits, snild, stable, tommyhebb,
	torvalds, vitaly.wool

From: Thomas Hebb <tommyhebb@gmail.com>
Subject: z3fold: prevent reclaim/free race for headless pages

commit ca0246bb97c2 ("z3fold: fix possible reclaim races") introduced the
PAGE_CLAIMED flag "to avoid racing on a z3fold 'headless' page release."
By atomically testing and setting the bit in each of z3fold_free() and
z3fold_reclaim_page(), a double-free was avoided.

However, commit dcf5aedb24f8 ("z3fold: stricter locking and more careful
reclaim") appears to have unintentionally broken this behavior by moving
the PAGE_CLAIMED check in z3fold_reclaim_page() to after the page lock
gets taken, which only happens for non-headless pages.  For headless
pages, the check is now skipped entirely and races can occur again.

I have observed such a race on my system:

    page:00000000ffbd76b7 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x165316
    flags: 0x2ffff0000000000()
    raw: 02ffff0000000000 ffffea0004535f48 ffff8881d553a170 0000000000000000
    raw: 0000000000000000 0000000000000011 00000000ffffffff 0000000000000000
    page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
    ------------[ cut here ]------------
    kernel BUG at include/linux/mm.h:707!
    invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
    CPU: 2 PID: 291928 Comm: kworker/2:0 Tainted: G    B             5.10.7-arch1-1-kasan #1
    Hardware name: Gigabyte Technology Co., Ltd. H97N-WIFI/H97N-WIFI, BIOS F9b 03/03/2016
    Workqueue: zswap-shrink shrink_worker
    RIP: 0010:__free_pages+0x10a/0x130
    Code: c1 e7 06 48 01 ef 45 85 e4 74 d1 44 89 e6 31 d2 41 83 ec 01 e8 e7 b0 ff ff eb da 48 c7 c6 e0 32 91 88 48 89 ef e8 a6 89 f8 ff <0f> 0b 4c 89 e7 e8 fc 79 07 00 e9 33 ff ff ff 48 89 ef e8 ff 79 07
    RSP: 0000:ffff88819a2ffb98 EFLAGS: 00010296
    RAX: 0000000000000000 RBX: ffffea000594c5a8 RCX: 0000000000000000
    RDX: 1ffffd4000b298b7 RSI: 0000000000000000 RDI: ffffea000594c5b8
    RBP: ffffea000594c580 R08: 000000000000003e R09: ffff8881d5520bbb
    R10: ffffed103aaa4177 R11: 0000000000000001 R12: ffffea000594c5b4
    R13: 0000000000000000 R14: ffff888165316000 R15: ffffea000594c588
    FS:  0000000000000000(0000) GS:ffff8881d5500000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f7c8c3654d8 CR3: 0000000103f42004 CR4: 00000000001706e0
    Call Trace:
     z3fold_zpool_shrink+0x9b6/0x1240
     ? sugov_update_single+0x357/0x990
     ? sched_clock+0x5/0x10
     ? sched_clock_cpu+0x18/0x180
     ? z3fold_zpool_map+0x490/0x490
     ? _raw_spin_lock_irq+0x88/0xe0
     shrink_worker+0x35/0x90
     process_one_work+0x70c/0x1210
     ? pwq_dec_nr_in_flight+0x15b/0x2a0
     worker_thread+0x539/0x1200
     ? __kthread_parkme+0x73/0x120
     ? rescuer_thread+0x1000/0x1000
     kthread+0x330/0x400
     ? __kthread_bind_mask+0x90/0x90
     ret_from_fork+0x22/0x30
    Modules linked in: rfcomm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ccm algif_aead des_generic libdes ecb algif_skcipher cmac bnep md4 algif_hash af_alg vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iwlmvm hid_logitech_hidpp kvm at24 mac80211 snd_hda_codec_realtek iTCO_wdt snd_hda_codec_generic intel_pmc_bxt snd_hda_codec_hdmi ledtrig_audio iTCO_vendor_support mei_wdt mei_hdcp snd_hda_intel snd_intel_dspcfg libarc4 soundwire_intel irqbypass iwlwifi soundwire_generic_allocation rapl soundwire_cadence intel_cstate snd_hda_codec intel_uncore btusb joydev mousedev snd_usb_audio pcspkr btrtl uvcvideo nouveau btbcm i2c_i801 btintel snd_hda_core videobuf2_vmalloc i2c_smbus snd_usbmidi_lib videobuf2_memops bluetooth snd_hwdep soundwire_bus snd_soc_rt5640 videobuf2_v4l2 cfg80211 snd_soc_rl6231 videobuf2_common snd_rawmidi lpc_ich alx videodev mdio snd_seq_device snd_soc_core mc ecdh_generic mxm_wmi mei_me
     hid_logitech_dj wmi snd_compress e1000e ac97_bus mei ttm rfkill snd_pcm_dmaengine ecc snd_pcm snd_timer snd soundcore mac_hid acpi_pad pkcs8_key_parser it87 hwmon_vid crypto_user fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted tpm rng_core usbhid dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas i915 video intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm agpgart
    ---[ end trace 126d646fc3dc0ad8 ]---

To fix the issue, re-add the earlier test and set in the case where we
have a headless page.

Link: https://lkml.kernel.org/r/c8106dbe6d8390b290cd1d7f873a2942e805349e.1615452048.git.tommyhebb@gmail.com
Fixes: dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim")
Signed-off-by: Thomas Hebb <tommyhebb@gmail.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Jongseok Kim <ks77sj@gmail.com>
Cc: Snild Dolkow <snild@sony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/z3fold.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- a/mm/z3fold.c~z3fold-prevent-reclaim-free-race-for-headless-pages
+++ a/mm/z3fold.c
@@ -1346,8 +1346,22 @@ static int z3fold_reclaim_page(struct z3
 			page = list_entry(pos, struct page, lru);
 
 			zhdr = page_address(page);
-			if (test_bit(PAGE_HEADLESS, &page->private))
+			if (test_bit(PAGE_HEADLESS, &page->private)) {
+				/*
+				 * For non-headless pages, we wait to do this
+				 * until we have the page lock to avoid racing
+				 * with __z3fold_alloc(). Headless pages don't
+				 * have a lock (and __z3fold_alloc() will never
+				 * see them), but we still need to test and set
+				 * PAGE_CLAIMED to avoid racing with
+				 * z3fold_free(), so just do it now before
+				 * leaving the loop.
+				 */
+				if (test_and_set_bit(PAGE_CLAIMED, &page->private))
+					continue;
+
 				break;
+			}
 
 			if (kref_get_unless_zero(&zhdr->refcount) == 0) {
 				zhdr = NULL;
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 06/14] squashfs: fix inode lookup sanity checks
  2021-03-25  4:36 incoming Andrew Morton
                   ` (4 preceding siblings ...)
  2021-03-25  4:37 ` [patch 05/14] z3fold: prevent reclaim/free race for headless pages Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  2021-03-25  4:37 ` [patch 07/14] squashfs: fix xattr id and id " Andrew Morton
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, phillip, sean, stable, torvalds

From: Sean Nyekjaer <sean@geanix.com>
Subject: squashfs: fix inode lookup sanity checks

When mouting a squashfs image created without inode compression it fails
with: "unable to read inode lookup table"

It turns out that the BLOCK_OFFSET is missing when checking the
SQUASHFS_METADATA_SIZE agaist the actual size.

Link: https://lkml.kernel.org/r/20210226092903.1473545-1-sean@geanix.com
Fixes: eabac19e40c0 ("squashfs: add more sanity checks in inode lookup")
Signed-off-by: Sean Nyekjaer <sean@geanix.com>
Acked-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/squashfs/export.c      |    8 ++++++--
 fs/squashfs/squashfs_fs.h |    1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

--- a/fs/squashfs/export.c~squashfs-fix-inode-lookup-sanity-checks
+++ a/fs/squashfs/export.c
@@ -152,14 +152,18 @@ __le64 *squashfs_read_inode_lookup_table
 		start = le64_to_cpu(table[n]);
 		end = le64_to_cpu(table[n + 1]);
 
-		if (start >= end || (end - start) > SQUASHFS_METADATA_SIZE) {
+		if (start >= end
+		    || (end - start) >
+		    (SQUASHFS_METADATA_SIZE + SQUASHFS_BLOCK_OFFSET)) {
 			kfree(table);
 			return ERR_PTR(-EINVAL);
 		}
 	}
 
 	start = le64_to_cpu(table[indexes - 1]);
-	if (start >= lookup_table_start || (lookup_table_start - start) > SQUASHFS_METADATA_SIZE) {
+	if (start >= lookup_table_start ||
+	    (lookup_table_start - start) >
+	    (SQUASHFS_METADATA_SIZE + SQUASHFS_BLOCK_OFFSET)) {
 		kfree(table);
 		return ERR_PTR(-EINVAL);
 	}
--- a/fs/squashfs/squashfs_fs.h~squashfs-fix-inode-lookup-sanity-checks
+++ a/fs/squashfs/squashfs_fs.h
@@ -17,6 +17,7 @@
 
 /* size of metadata (inode and directory) blocks */
 #define SQUASHFS_METADATA_SIZE		8192
+#define SQUASHFS_BLOCK_OFFSET		2
 
 /* default size of block device I/O */
 #ifdef CONFIG_SQUASHFS_4K_DEVBLK_SIZE
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 07/14] squashfs: fix xattr id and id lookup sanity checks
  2021-03-25  4:36 incoming Andrew Morton
                   ` (5 preceding siblings ...)
  2021-03-25  4:37 ` [patch 06/14] squashfs: fix inode lookup sanity checks Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  2021-03-25  4:37 ` [patch 08/14] ia64: mca: allocate early mca with GFP_ATOMIC Andrew Morton
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, phillip, sean, stable, torvalds

From: Phillip Lougher <phillip@squashfs.org.uk>
Subject: squashfs: fix xattr id and id lookup sanity checks

The checks for maximum metadata block size is missing
SQUASHFS_BLOCK_OFFSET (the two byte length count).

Link: https://lkml.kernel.org/r/2069685113.2081245.1614583677427@webmail.123-reg.co.uk
Fixes: f37aa4c7366e23f ("squashfs: add more sanity checks in id lookup")
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Sean Nyekjaer <sean@geanix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/squashfs/id.c       |    6 ++++--
 fs/squashfs/xattr_id.c |    6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

--- a/fs/squashfs/id.c~squashfs-fix-xattr-id-and-id-lookup-sanity-checks
+++ a/fs/squashfs/id.c
@@ -97,14 +97,16 @@ __le64 *squashfs_read_id_index_table(str
 		start = le64_to_cpu(table[n]);
 		end = le64_to_cpu(table[n + 1]);
 
-		if (start >= end || (end - start) > SQUASHFS_METADATA_SIZE) {
+		if (start >= end || (end - start) >
+				(SQUASHFS_METADATA_SIZE + SQUASHFS_BLOCK_OFFSET)) {
 			kfree(table);
 			return ERR_PTR(-EINVAL);
 		}
 	}
 
 	start = le64_to_cpu(table[indexes - 1]);
-	if (start >= id_table_start || (id_table_start - start) > SQUASHFS_METADATA_SIZE) {
+	if (start >= id_table_start || (id_table_start - start) >
+				(SQUASHFS_METADATA_SIZE + SQUASHFS_BLOCK_OFFSET)) {
 		kfree(table);
 		return ERR_PTR(-EINVAL);
 	}
--- a/fs/squashfs/xattr_id.c~squashfs-fix-xattr-id-and-id-lookup-sanity-checks
+++ a/fs/squashfs/xattr_id.c
@@ -109,14 +109,16 @@ __le64 *squashfs_read_xattr_id_table(str
 		start = le64_to_cpu(table[n]);
 		end = le64_to_cpu(table[n + 1]);
 
-		if (start >= end || (end - start) > SQUASHFS_METADATA_SIZE) {
+		if (start >= end || (end - start) >
+				(SQUASHFS_METADATA_SIZE + SQUASHFS_BLOCK_OFFSET)) {
 			kfree(table);
 			return ERR_PTR(-EINVAL);
 		}
 	}
 
 	start = le64_to_cpu(table[indexes - 1]);
-	if (start >= table_start || (table_start - start) > SQUASHFS_METADATA_SIZE) {
+	if (start >= table_start || (table_start - start) >
+				(SQUASHFS_METADATA_SIZE + SQUASHFS_BLOCK_OFFSET)) {
 		kfree(table);
 		return ERR_PTR(-EINVAL);
 	}
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 08/14] ia64: mca: allocate early mca with GFP_ATOMIC
  2021-03-25  4:36 incoming Andrew Morton
                   ` (6 preceding siblings ...)
  2021-03-25  4:37 ` [patch 07/14] squashfs: fix xattr id and id " Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  2021-03-25  4:37 ` [patch 09/14] ia64: fix format strings for err_inject Andrew Morton
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, slyfox, torvalds

From: Sergei Trofimovich <slyfox@gentoo.org>
Subject: ia64: mca: allocate early mca with GFP_ATOMIC

The sleep warning happens at early boot right at secondary CPU activation
bootup:

    smp: Bringing up secondary CPUs ...
    BUG: sleeping function called from invalid context at mm/page_alloc.c:4942
    in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.12.0-rc2-00007-g79e228d0b611-dirty #99

    Call Trace:
     [<a000000100014d10>] show_stack+0x90/0xc0
     [<a000000101111d90>] dump_stack+0x150/0x1c0
     [<a0000001000cbec0>] ___might_sleep+0x1c0/0x2a0
     [<a0000001000cc040>] __might_sleep+0xa0/0x160
     [<a000000100399960>] __alloc_pages_nodemask+0x1a0/0x600
     [<a0000001003b71b0>] alloc_page_interleave+0x30/0x1c0
     [<a0000001003b9b60>] alloc_pages_current+0x2c0/0x340
     [<a00000010038c270>] __get_free_pages+0x30/0xa0
     [<a000000100044730>] ia64_mca_cpu_init+0x2d0/0x3a0
     [<a000000100023430>] cpu_init+0x8b0/0x1440
     [<a000000100054680>] start_secondary+0x60/0x700
     [<a00000010111e1d0>] start_ap+0x750/0x780
    Fixed BSP b0 value from CPU 1

As I understand interrupts are not enabled yet and system has a lot of
memory.  There is little chance to sleep and switch to GFP_ATOMIC should
be a no-op.

Link: https://lkml.kernel.org/r/20210315085045.204414-1-slyfox@gentoo.org
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/ia64/kernel/mca.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/ia64/kernel/mca.c~ia64-mca-allocate-early-mca-with-gfp_atomic
+++ a/arch/ia64/kernel/mca.c
@@ -1824,7 +1824,7 @@ ia64_mca_cpu_init(void *cpu_data)
 			data = mca_bootmem();
 			first_time = 0;
 		} else
-			data = (void *)__get_free_pages(GFP_KERNEL,
+			data = (void *)__get_free_pages(GFP_ATOMIC,
 							get_order(sz));
 		if (!data)
 			panic("Could not allocate MCA memory for cpu %d\n",
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 09/14] ia64: fix format strings for err_inject
  2021-03-25  4:36 incoming Andrew Morton
                   ` (7 preceding siblings ...)
  2021-03-25  4:37 ` [patch 08/14] ia64: mca: allocate early mca with GFP_ATOMIC Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  2021-03-25  4:37 ` [patch 10/14] gcov: fix clang-11+ support Andrew Morton
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, slyfox, torvalds

From: Sergei Trofimovich <slyfox@gentoo.org>
Subject: ia64: fix format strings for err_inject

Fix warning with %lx / u64 mismatch:

  arch/ia64/kernel/err_inject.c: In function 'show_resources':
  arch/ia64/kernel/err_inject.c:62:22: warning:
    format '%lx' expects argument of type 'long unsigned int',
    but argument 3 has type 'u64' {aka 'long long unsigned int'}
     62 |  return sprintf(buf, "%lx
", name[cpu]);   \
        |                      ^~~~~~~

Link: https://lkml.kernel.org/r/20210313104312.1548232-1-slyfox@gentoo.org
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/ia64/kernel/err_inject.c |   22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

--- a/arch/ia64/kernel/err_inject.c~ia64-fix-format-strings-for-err_inject
+++ a/arch/ia64/kernel/err_inject.c
@@ -59,7 +59,7 @@ show_##name(struct device *dev, struct d
 		char *buf)						\
 {									\
 	u32 cpu=dev->id;						\
-	return sprintf(buf, "%lx\n", name[cpu]);			\
+	return sprintf(buf, "%llx\n", name[cpu]);			\
 }
 
 #define store(name)							\
@@ -86,9 +86,9 @@ store_call_start(struct device *dev, str
 
 #ifdef ERR_INJ_DEBUG
 	printk(KERN_DEBUG "pal_mc_err_inject for cpu%d:\n", cpu);
-	printk(KERN_DEBUG "err_type_info=%lx,\n", err_type_info[cpu]);
-	printk(KERN_DEBUG "err_struct_info=%lx,\n", err_struct_info[cpu]);
-	printk(KERN_DEBUG "err_data_buffer=%lx, %lx, %lx.\n",
+	printk(KERN_DEBUG "err_type_info=%llx,\n", err_type_info[cpu]);
+	printk(KERN_DEBUG "err_struct_info=%llx,\n", err_struct_info[cpu]);
+	printk(KERN_DEBUG "err_data_buffer=%llx, %llx, %llx.\n",
 			  err_data_buffer[cpu].data1,
 			  err_data_buffer[cpu].data2,
 			  err_data_buffer[cpu].data3);
@@ -117,8 +117,8 @@ store_call_start(struct device *dev, str
 
 #ifdef ERR_INJ_DEBUG
 	printk(KERN_DEBUG "Returns: status=%d,\n", (int)status[cpu]);
-	printk(KERN_DEBUG "capabilities=%lx,\n", capabilities[cpu]);
-	printk(KERN_DEBUG "resources=%lx\n", resources[cpu]);
+	printk(KERN_DEBUG "capabilities=%llx,\n", capabilities[cpu]);
+	printk(KERN_DEBUG "resources=%llx\n", resources[cpu]);
 #endif
 	return size;
 }
@@ -131,7 +131,7 @@ show_virtual_to_phys(struct device *dev,
 			char *buf)
 {
 	unsigned int cpu=dev->id;
-	return sprintf(buf, "%lx\n", phys_addr[cpu]);
+	return sprintf(buf, "%llx\n", phys_addr[cpu]);
 }
 
 static ssize_t
@@ -145,7 +145,7 @@ store_virtual_to_phys(struct device *dev
 	ret = get_user_pages_fast(virt_addr, 1, FOLL_WRITE, NULL);
 	if (ret<=0) {
 #ifdef ERR_INJ_DEBUG
-		printk("Virtual address %lx is not existing.\n",virt_addr);
+		printk("Virtual address %llx is not existing.\n", virt_addr);
 #endif
 		return -EINVAL;
 	}
@@ -163,7 +163,7 @@ show_err_data_buffer(struct device *dev,
 {
 	unsigned int cpu=dev->id;
 
-	return sprintf(buf, "%lx, %lx, %lx\n",
+	return sprintf(buf, "%llx, %llx, %llx\n",
 			err_data_buffer[cpu].data1,
 			err_data_buffer[cpu].data2,
 			err_data_buffer[cpu].data3);
@@ -178,13 +178,13 @@ store_err_data_buffer(struct device *dev
 	int ret;
 
 #ifdef ERR_INJ_DEBUG
-	printk("write err_data_buffer=[%lx,%lx,%lx] on cpu%d\n",
+	printk("write err_data_buffer=[%llx,%llx,%llx] on cpu%d\n",
 		 err_data_buffer[cpu].data1,
 		 err_data_buffer[cpu].data2,
 		 err_data_buffer[cpu].data3,
 		 cpu);
 #endif
-	ret=sscanf(buf, "%lx, %lx, %lx",
+	ret = sscanf(buf, "%llx, %llx, %llx",
 			&err_data_buffer[cpu].data1,
 			&err_data_buffer[cpu].data2,
 			&err_data_buffer[cpu].data3);
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 10/14] gcov: fix clang-11+ support
  2021-03-25  4:36 incoming Andrew Morton
                   ` (8 preceding siblings ...)
  2021-03-25  4:37 ` [patch 09/14] ia64: fix format strings for err_inject Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  2021-03-25  4:37 ` [patch 11/14] kfence: make compatible with kmemleak Andrew Morton
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: akpm, linux-mm, maskray, mm-commits, nathan, ndesaulniers,
	oberpar, psodagud, stable, torvalds

From: Nick Desaulniers <ndesaulniers@google.com>
Subject: gcov: fix clang-11+ support

LLVM changed the expected function signatures for llvm_gcda_start_file()
and llvm_gcda_emit_function() in the clang-11 release.  Users of clang-11
or newer may have noticed their kernels failing to boot due to a panic
when enabling CONFIG_GCOV_KERNEL=y +CONFIG_GCOV_PROFILE_ALL=y.  Fix up the
function signatures so calling these functions doesn't panic the kernel.

Link: https://reviews.llvm.org/rGcdd683b516d147925212724b09ec6fb792a40041
Link: https://reviews.llvm.org/rG13a633b438b6500ecad9e4f936ebadf3411d0f44
Link: https://lkml.kernel.org/r/20210312224132.3413602-2-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Prasad Sodagudi <psodagud@quicinc.com>
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Cc: <stable@vger.kernel.org>	[5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/gcov/clang.c |   69 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

--- a/kernel/gcov/clang.c~gcov-fix-clang-11-support
+++ a/kernel/gcov/clang.c
@@ -75,7 +75,9 @@ struct gcov_fn_info {
 
 	u32 num_counters;
 	u64 *counters;
+#if CONFIG_CLANG_VERSION < 110000
 	const char *function_name;
+#endif
 };
 
 static struct gcov_info *current_info;
@@ -105,6 +107,7 @@ void llvm_gcov_init(llvm_gcov_callback w
 }
 EXPORT_SYMBOL(llvm_gcov_init);
 
+#if CONFIG_CLANG_VERSION < 110000
 void llvm_gcda_start_file(const char *orig_filename, const char version[4],
 		u32 checksum)
 {
@@ -113,7 +116,17 @@ void llvm_gcda_start_file(const char *or
 	current_info->checksum = checksum;
 }
 EXPORT_SYMBOL(llvm_gcda_start_file);
+#else
+void llvm_gcda_start_file(const char *orig_filename, u32 version, u32 checksum)
+{
+	current_info->filename = orig_filename;
+	current_info->version = version;
+	current_info->checksum = checksum;
+}
+EXPORT_SYMBOL(llvm_gcda_start_file);
+#endif
 
+#if CONFIG_CLANG_VERSION < 110000
 void llvm_gcda_emit_function(u32 ident, const char *function_name,
 		u32 func_checksum, u8 use_extra_checksum, u32 cfg_checksum)
 {
@@ -133,6 +146,24 @@ void llvm_gcda_emit_function(u32 ident,
 	list_add_tail(&info->head, &current_info->functions);
 }
 EXPORT_SYMBOL(llvm_gcda_emit_function);
+#else
+void llvm_gcda_emit_function(u32 ident, u32 func_checksum,
+		u8 use_extra_checksum, u32 cfg_checksum)
+{
+	struct gcov_fn_info *info = kzalloc(sizeof(*info), GFP_KERNEL);
+
+	if (!info)
+		return;
+
+	INIT_LIST_HEAD(&info->head);
+	info->ident = ident;
+	info->checksum = func_checksum;
+	info->use_extra_checksum = use_extra_checksum;
+	info->cfg_checksum = cfg_checksum;
+	list_add_tail(&info->head, &current_info->functions);
+}
+EXPORT_SYMBOL(llvm_gcda_emit_function);
+#endif
 
 void llvm_gcda_emit_arcs(u32 num_counters, u64 *counters)
 {
@@ -295,6 +326,7 @@ void gcov_info_add(struct gcov_info *dst
 	}
 }
 
+#if CONFIG_CLANG_VERSION < 110000
 static struct gcov_fn_info *gcov_fn_info_dup(struct gcov_fn_info *fn)
 {
 	size_t cv_size; /* counter values size */
@@ -322,6 +354,28 @@ err_name:
 	kfree(fn_dup);
 	return NULL;
 }
+#else
+static struct gcov_fn_info *gcov_fn_info_dup(struct gcov_fn_info *fn)
+{
+	size_t cv_size; /* counter values size */
+	struct gcov_fn_info *fn_dup = kmemdup(fn, sizeof(*fn),
+			GFP_KERNEL);
+	if (!fn_dup)
+		return NULL;
+	INIT_LIST_HEAD(&fn_dup->head);
+
+	cv_size = fn->num_counters * sizeof(fn->counters[0]);
+	fn_dup->counters = vmalloc(cv_size);
+	if (!fn_dup->counters) {
+		kfree(fn_dup);
+		return NULL;
+	}
+
+	memcpy(fn_dup->counters, fn->counters, cv_size);
+
+	return fn_dup;
+}
+#endif
 
 /**
  * gcov_info_dup - duplicate profiling data set
@@ -362,6 +416,7 @@ err:
  * gcov_info_free - release memory for profiling data set duplicate
  * @info: profiling data set duplicate to free
  */
+#if CONFIG_CLANG_VERSION < 110000
 void gcov_info_free(struct gcov_info *info)
 {
 	struct gcov_fn_info *fn, *tmp;
@@ -375,6 +430,20 @@ void gcov_info_free(struct gcov_info *in
 	kfree(info->filename);
 	kfree(info);
 }
+#else
+void gcov_info_free(struct gcov_info *info)
+{
+	struct gcov_fn_info *fn, *tmp;
+
+	list_for_each_entry_safe(fn, tmp, &info->functions, head) {
+		vfree(fn->counters);
+		list_del(&fn->head);
+		kfree(fn);
+	}
+	kfree(info->filename);
+	kfree(info);
+}
+#endif
 
 #define ITER_STRIDE	PAGE_SIZE
 
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 11/14] kfence: make compatible with kmemleak
  2021-03-25  4:36 incoming Andrew Morton
                   ` (9 preceding siblings ...)
  2021-03-25  4:37 ` [patch 10/14] gcov: fix clang-11+ support Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  2021-03-25  4:37 ` [patch 12/14] mm: memblock: fix section mismatch warning again Andrew Morton
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: akpm, andreyknvl, catalin.marinas, dvyukov, elver, glider, jannh,
	lhenriques, linux-mm, mm-commits, torvalds

From: Marco Elver <elver@google.com>
Subject: kfence: make compatible with kmemleak

Because memblock allocations are registered with kmemleak, the KFENCE pool
was seen by kmemleak as one large object.  Later allocations through
kfence_alloc() that were registered with kmemleak via
slab_post_alloc_hook() would then overlap and trigger a warning. 
Therefore, once the pool is initialized, we can remove (free) it from
kmemleak again, since it should be treated as allocator-internal and be
seen as "free memory".

The second problem is that kmemleak is passed the rounded size, and not
the originally requested size, which is also the size of KFENCE objects. 
To avoid kmemleak scanning past the end of an object and trigger a KFENCE
out-of-bounds error, fix the size if it is a KFENCE object.

For simplicity, to avoid a call to kfence_ksize() in
slab_post_alloc_hook() (and avoid new IS_ENABLED(CONFIG_DEBUG_KMEMLEAK)
guard), just call kfence_ksize() in mm/kmemleak.c:create_object().

Link: https://lkml.kernel.org/r/20210317084740.3099921-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Luis Henriques <lhenriques@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kfence/core.c |    9 +++++++++
 mm/kmemleak.c    |    3 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

--- a/mm/kfence/core.c~kfence-make-compatible-with-kmemleak
+++ a/mm/kfence/core.c
@@ -12,6 +12,7 @@
 #include <linux/debugfs.h>
 #include <linux/kcsan-checks.h>
 #include <linux/kfence.h>
+#include <linux/kmemleak.h>
 #include <linux/list.h>
 #include <linux/lockdep.h>
 #include <linux/memblock.h>
@@ -480,6 +481,14 @@ static bool __init kfence_init_pool(void
 		addr += 2 * PAGE_SIZE;
 	}
 
+	/*
+	 * The pool is live and will never be deallocated from this point on.
+	 * Remove the pool object from the kmemleak object tree, as it would
+	 * otherwise overlap with allocations returned by kfence_alloc(), which
+	 * are registered with kmemleak through the slab post-alloc hook.
+	 */
+	kmemleak_free(__kfence_pool);
+
 	return true;
 
 err:
--- a/mm/kmemleak.c~kfence-make-compatible-with-kmemleak
+++ a/mm/kmemleak.c
@@ -97,6 +97,7 @@
 #include <linux/atomic.h>
 
 #include <linux/kasan.h>
+#include <linux/kfence.h>
 #include <linux/kmemleak.h>
 #include <linux/memory_hotplug.h>
 
@@ -589,7 +590,7 @@ static struct kmemleak_object *create_ob
 	atomic_set(&object->use_count, 1);
 	object->flags = OBJECT_ALLOCATED;
 	object->pointer = ptr;
-	object->size = size;
+	object->size = kfence_ksize((void *)ptr) ?: size;
 	object->excess_ref = 0;
 	object->min_count = min_count;
 	object->count = 0;			/* white color initially */
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 12/14] mm: memblock: fix section mismatch warning again
  2021-03-25  4:36 incoming Andrew Morton
                   ` (10 preceding siblings ...)
  2021-03-25  4:37 ` [patch 11/14] kfence: make compatible with kmemleak Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  2021-03-25  4:37 ` [patch 13/14] mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP Andrew Morton
  2021-03-25  4:37 ` [patch 14/14] mailmap: update Andrey Konovalov's email address Andrew Morton
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: akpm, arnd, david, linux-mm, lkp, mm-commits, ndesaulniers, rppt,
	torvalds

From: Mike Rapoport <rppt@linux.ibm.com>
Subject: mm: memblock: fix section mismatch warning again

Commit 34dc2efb39a2 ("memblock: fix section mismatch warning") marked
memblock_bottom_up() and memblock_set_bottom_up() as __init, but they
could be referenced from non-init functions like
memblock_find_in_range_node() on architectures that enable
CONFIG_ARCH_KEEP_MEMBLOCK.

For such builds kernel test robot reports: All warnings (new ones prefixed
by >>, old ones prefixed by <<):

>> WARNING: modpost: vmlinux.o(.text+0x74fea4): Section mismatch in reference from the function memblock_find_in_range_node() to the function .init.text:memblock_bottom_up()
The function memblock_find_in_range_node() references
the function __init memblock_bottom_up().
This is often because memblock_find_in_range_node lacks a __init
annotation or the annotation of memblock_bottom_up is wrong.

Replace __init annotations with __init_memblock annotations so that the
appropriate section will be selected depending on
CONFIG_ARCH_KEEP_MEMBLOCK.

Link: https://lore.kernel.org/lkml/202103160133.UzhgY0wt-lkp@intel.com
Link: https://lkml.kernel.org/r/20210316171347.14084-1-rppt@kernel.org
Fixes: 34dc2efb39a2 ("memblock: fix section mismatch warning")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memblock.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/memblock.h~memblock-fix-section-mismatch-warning-again
+++ a/include/linux/memblock.h
@@ -460,7 +460,7 @@ static inline void memblock_free_late(ph
 /*
  * Set the allocation direction to bottom-up or top-down.
  */
-static inline __init void memblock_set_bottom_up(bool enable)
+static inline __init_memblock void memblock_set_bottom_up(bool enable)
 {
 	memblock.bottom_up = enable;
 }
@@ -470,7 +470,7 @@ static inline __init void memblock_set_b
  * if this is true, that said, memblock will allocate memory
  * in bottom-up direction.
  */
-static inline __init bool memblock_bottom_up(void)
+static inline __init_memblock bool memblock_bottom_up(void)
 {
 	return memblock.bottom_up;
 }
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 13/14] mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
  2021-03-25  4:36 incoming Andrew Morton
                   ` (11 preceding siblings ...)
  2021-03-25  4:37 ` [patch 12/14] mm: memblock: fix section mismatch warning again Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  2021-03-25  4:37 ` [patch 14/14] mailmap: update Andrey Konovalov's email address Andrew Morton
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: akpm, Chaitanya.Kulkarni, dsterba, ira.weiny, linux-mm,
	mm-commits, oliver.sang, stable, tglx, torvalds

From: Ira Weiny <ira.weiny@intel.com>
Subject: mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP

The kernel test robot found that __kmap_local_sched_out() was not
correctly skipping the guard pages when CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
was set.[1] This was due to CONFIG_DEBUG_HIGHMEM check being used.

Change the configuration check to be correct.

[1] https://lore.kernel.org/lkml/20210304083825.GB17830@xsang-OptiPlex-9020/

Link: https://lkml.kernel.org/r/20210318230657.1497881-1-ira.weiny@intel.com
Fixes: 0e91a0c6984c ("mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP")
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oliver Sang <oliver.sang@intel.com>
Cc: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Cc: David Sterba <dsterba@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/highmem.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/highmem.c~mm-highmem-fix-config_debug_kmap_local_force_map
+++ a/mm/highmem.c
@@ -618,7 +618,7 @@ void __kmap_local_sched_out(void)
 		int idx;
 
 		/* With debug all even slots are unmapped and act as guard */
-		if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM) && !(i & 0x01)) {
+		if (IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL) && !(i & 0x01)) {
 			WARN_ON_ONCE(!pte_none(pteval));
 			continue;
 		}
@@ -654,7 +654,7 @@ void __kmap_local_sched_in(void)
 		int idx;
 
 		/* With debug all even slots are unmapped and act as guard */
-		if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM) && !(i & 0x01)) {
+		if (IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL) && !(i & 0x01)) {
 			WARN_ON_ONCE(!pte_none(pteval));
 			continue;
 		}
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 14/14] mailmap: update Andrey Konovalov's email address
  2021-03-25  4:36 incoming Andrew Morton
                   ` (12 preceding siblings ...)
  2021-03-25  4:37 ` [patch 13/14] mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP Andrew Morton
@ 2021-03-25  4:37 ` Andrew Morton
  13 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:37 UTC (permalink / raw)
  To: akpm, andreyknvl, linux-mm, mm-commits, torvalds

From: Andrey Konovalov <andreyknvl@google.com>
Subject: mailmap: update Andrey Konovalov's email address

Use my personal email, the @google.com one will stop functioning soon.

Link: https://lkml.kernel.org/r/ead0e9c32a2f70e0bde6f63b3b9470e0ef13d2ee.1616107969.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 .mailmap |    1 +
 1 file changed, 1 insertion(+)

--- a/.mailmap~mailmap-update-andrey-konovalovs-email-address
+++ a/.mailmap
@@ -36,6 +36,7 @@ Andrew Morton <akpm@linux-foundation.org
 Andrew Murray <amurray@thegoodpenguin.co.uk> <amurray@embedded-bits.co.uk>
 Andrew Murray <amurray@thegoodpenguin.co.uk> <andrew.murray@arm.com>
 Andrew Vasquez <andrew.vasquez@qlogic.com>
+Andrey Konovalov <andreyknvl@gmail.com> <andreyknvl@google.com>
 Andrey Ryabinin <ryabinin.a.a@gmail.com> <a.ryabinin@samsung.com>
 Andrey Ryabinin <ryabinin.a.a@gmail.com> <aryabinin@virtuozzo.com>
 Andy Adamson <andros@citi.umich.edu>
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch 05/14] z3fold: prevent reclaim/free race for headless pages
       [not found] <20210324213324.fe1ba237e471b5de17a08775@linux-foundation.org>
@ 2021-03-25  4:36 ` Andrew Morton
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2021-03-25  4:36 UTC (permalink / raw)
  To: akpm, ks77sj, linux-mm, mm-commits, snild, stable, tommyhebb,
	torvalds, vitaly.wool

From: Thomas Hebb <tommyhebb@gmail.com>
Subject: z3fold: prevent reclaim/free race for headless pages

commit ca0246bb97c2 ("z3fold: fix possible reclaim races") introduced the
PAGE_CLAIMED flag "to avoid racing on a z3fold 'headless' page release."
By atomically testing and setting the bit in each of z3fold_free() and
z3fold_reclaim_page(), a double-free was avoided.

However, commit dcf5aedb24f8 ("z3fold: stricter locking and more careful
reclaim") appears to have unintentionally broken this behavior by moving
the PAGE_CLAIMED check in z3fold_reclaim_page() to after the page lock
gets taken, which only happens for non-headless pages.  For headless
pages, the check is now skipped entirely and races can occur again.

I have observed such a race on my system:

    page:00000000ffbd76b7 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x165316
    flags: 0x2ffff0000000000()
    raw: 02ffff0000000000 ffffea0004535f48 ffff8881d553a170 0000000000000000
    raw: 0000000000000000 0000000000000011 00000000ffffffff 0000000000000000
    page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
    ------------[ cut here ]------------
    kernel BUG at include/linux/mm.h:707!
    invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
    CPU: 2 PID: 291928 Comm: kworker/2:0 Tainted: G    B             5.10.7-arch1-1-kasan #1
    Hardware name: Gigabyte Technology Co., Ltd. H97N-WIFI/H97N-WIFI, BIOS F9b 03/03/2016
    Workqueue: zswap-shrink shrink_worker
    RIP: 0010:__free_pages+0x10a/0x130
    Code: c1 e7 06 48 01 ef 45 85 e4 74 d1 44 89 e6 31 d2 41 83 ec 01 e8 e7 b0 ff ff eb da 48 c7 c6 e0 32 91 88 48 89 ef e8 a6 89 f8 ff <0f> 0b 4c 89 e7 e8 fc 79 07 00 e9 33 ff ff ff 48 89 ef e8 ff 79 07
    RSP: 0000:ffff88819a2ffb98 EFLAGS: 00010296
    RAX: 0000000000000000 RBX: ffffea000594c5a8 RCX: 0000000000000000
    RDX: 1ffffd4000b298b7 RSI: 0000000000000000 RDI: ffffea000594c5b8
    RBP: ffffea000594c580 R08: 000000000000003e R09: ffff8881d5520bbb
    R10: ffffed103aaa4177 R11: 0000000000000001 R12: ffffea000594c5b4
    R13: 0000000000000000 R14: ffff888165316000 R15: ffffea000594c588
    FS:  0000000000000000(0000) GS:ffff8881d5500000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f7c8c3654d8 CR3: 0000000103f42004 CR4: 00000000001706e0
    Call Trace:
     z3fold_zpool_shrink+0x9b6/0x1240
     ? sugov_update_single+0x357/0x990
     ? sched_clock+0x5/0x10
     ? sched_clock_cpu+0x18/0x180
     ? z3fold_zpool_map+0x490/0x490
     ? _raw_spin_lock_irq+0x88/0xe0
     shrink_worker+0x35/0x90
     process_one_work+0x70c/0x1210
     ? pwq_dec_nr_in_flight+0x15b/0x2a0
     worker_thread+0x539/0x1200
     ? __kthread_parkme+0x73/0x120
     ? rescuer_thread+0x1000/0x1000
     kthread+0x330/0x400
     ? __kthread_bind_mask+0x90/0x90
     ret_from_fork+0x22/0x30
    Modules linked in: rfcomm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ccm algif_aead des_generic libdes ecb algif_skcipher cmac bnep md4 algif_hash af_alg vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iwlmvm hid_logitech_hidpp kvm at24 mac80211 snd_hda_codec_realtek iTCO_wdt snd_hda_codec_generic intel_pmc_bxt snd_hda_codec_hdmi ledtrig_audio iTCO_vendor_support mei_wdt mei_hdcp snd_hda_intel snd_intel_dspcfg libarc4 soundwire_intel irqbypass iwlwifi soundwire_generic_allocation rapl soundwire_cadence intel_cstate snd_hda_codec intel_uncore btusb joydev mousedev snd_usb_audio pcspkr btrtl uvcvideo nouveau btbcm i2c_i801 btintel snd_hda_core videobuf2_vmalloc i2c_smbus snd_usbmidi_lib videobuf2_memops bluetooth snd_hwdep soundwire_bus snd_soc_rt5640 videobuf2_v4l2 cfg80211 snd_soc_rl6231 videobuf2_common snd_rawmidi lpc_ich alx videodev mdio snd_seq_device snd_soc_core mc ecdh_generic mxm_wmi mei_me
     hid_logitech_dj wmi snd_compress e1000e ac97_bus mei ttm rfkill snd_pcm_dmaengine ecc snd_pcm snd_timer snd soundcore mac_hid acpi_pad pkcs8_key_parser it87 hwmon_vid crypto_user fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted tpm rng_core usbhid dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas i915 video intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm agpgart
    ---[ end trace 126d646fc3dc0ad8 ]---

To fix the issue, re-add the earlier test and set in the case where we
have a headless page.

Link: https://lkml.kernel.org/r/c8106dbe6d8390b290cd1d7f873a2942e805349e.1615452048.git.tommyhebb@gmail.com
Fixes: dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim")
Signed-off-by: Thomas Hebb <tommyhebb@gmail.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Jongseok Kim <ks77sj@gmail.com>
Cc: Snild Dolkow <snild@sony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/z3fold.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- a/mm/z3fold.c~z3fold-prevent-reclaim-free-race-for-headless-pages
+++ a/mm/z3fold.c
@@ -1346,8 +1346,22 @@ static int z3fold_reclaim_page(struct z3
 			page = list_entry(pos, struct page, lru);
 
 			zhdr = page_address(page);
-			if (test_bit(PAGE_HEADLESS, &page->private))
+			if (test_bit(PAGE_HEADLESS, &page->private)) {
+				/*
+				 * For non-headless pages, we wait to do this
+				 * until we have the page lock to avoid racing
+				 * with __z3fold_alloc(). Headless pages don't
+				 * have a lock (and __z3fold_alloc() will never
+				 * see them), but we still need to test and set
+				 * PAGE_CLAIMED to avoid racing with
+				 * z3fold_free(), so just do it now before
+				 * leaving the loop.
+				 */
+				if (test_and_set_bit(PAGE_CLAIMED, &page->private))
+					continue;
+
 				break;
+			}
 
 			if (kref_get_unless_zero(&zhdr->refcount) == 0) {
 				zhdr = NULL;
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-03-25  4:37 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-25  4:36 incoming Andrew Morton
2021-03-25  4:37 ` [patch 01/14] hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings Andrew Morton
2021-03-25  4:37 ` [patch 02/14] kasan: fix per-page tags for non-page_alloc pages Andrew Morton
2021-03-25  4:37 ` [patch 03/14] mm/mmu_notifiers: ensure range_end() is paired with range_start() Andrew Morton
2021-03-25  4:37 ` [patch 04/14] selftests/vm: fix out-of-tree build Andrew Morton
2021-03-25  4:37 ` [patch 05/14] z3fold: prevent reclaim/free race for headless pages Andrew Morton
2021-03-25  4:37 ` [patch 06/14] squashfs: fix inode lookup sanity checks Andrew Morton
2021-03-25  4:37 ` [patch 07/14] squashfs: fix xattr id and id " Andrew Morton
2021-03-25  4:37 ` [patch 08/14] ia64: mca: allocate early mca with GFP_ATOMIC Andrew Morton
2021-03-25  4:37 ` [patch 09/14] ia64: fix format strings for err_inject Andrew Morton
2021-03-25  4:37 ` [patch 10/14] gcov: fix clang-11+ support Andrew Morton
2021-03-25  4:37 ` [patch 11/14] kfence: make compatible with kmemleak Andrew Morton
2021-03-25  4:37 ` [patch 12/14] mm: memblock: fix section mismatch warning again Andrew Morton
2021-03-25  4:37 ` [patch 13/14] mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP Andrew Morton
2021-03-25  4:37 ` [patch 14/14] mailmap: update Andrey Konovalov's email address Andrew Morton
     [not found] <20210324213324.fe1ba237e471b5de17a08775@linux-foundation.org>
2021-03-25  4:36 ` [patch 05/14] z3fold: prevent reclaim/free race for headless pages Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).