linux-mm.kvack.org archive mirror
* incoming
@ 2021-05-15  0:26 Andrew Morton
From: Andrew Morton @ 2021-05-15  0:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

13 patches, based on bd3c9cdb21a2674dd0db70199df884828e37abd4.

Subsystems affected by this patch series:

  mm/hugetlb
  mm/slub
  resource
  squashfs
  mm/userfaultfd
  mm/ksm
  mm/pagealloc
  mm/kasan
  mm/pagemap
  hfsplus
  modprobe
  mm/ioremap

Subsystem: mm/hugetlb

    Peter Xu <peterx@redhat.com>:
    Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2:
      mm/hugetlb: fix F_SEAL_FUTURE_WRITE
      mm/hugetlb: fix cow where page writable in child

Subsystem: mm/slub

    Vlastimil Babka <vbabka@suse.cz>:
      mm, slub: move slub_debug static key enabling outside slab_mutex

Subsystem: resource

    Alistair Popple <apopple@nvidia.com>:
      kernel/resource: fix return code check in __request_free_mem_region

Subsystem: squashfs

    Phillip Lougher <phillip@squashfs.org.uk>:
      squashfs: fix divide error in calculate_skip()

Subsystem: mm/userfaultfd

    Axel Rasmussen <axelrasmussen@google.com>:
      userfaultfd: release page in error path to avoid BUG_ON

Subsystem: mm/ksm

    Hugh Dickins <hughd@google.com>:
      ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()"

Subsystem: mm/pagealloc

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm: fix struct page layout on 32-bit systems

Subsystem: mm/kasan

    Peter Collingbourne <pcc@google.com>:
      kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled

Subsystem: mm/pagemap

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/filemap: fix readahead return types

Subsystem: hfsplus

    Jouni Roivas <jouni.roivas@tuxera.com>:
      hfsplus: prevent corruption in shrinking truncate

Subsystem: modprobe

    Rasmus Villemoes <linux@rasmusvillemoes.dk>:
      docs: admin-guide: update description for kernel.modprobe sysctl

Subsystem: mm/ioremap

    Christophe Leroy <christophe.leroy@csgroup.eu>:
      mm/ioremap: fix iomap_max_page_shift

 Documentation/admin-guide/sysctl/kernel.rst |    9 ++++---
 fs/hfsplus/extents.c                        |    7 +++--
 fs/hugetlbfs/inode.c                        |    5 ++++
 fs/iomap/buffered-io.c                      |    4 +--
 fs/squashfs/file.c                          |    6 ++--
 include/linux/mm.h                          |   32 ++++++++++++++++++++++++++
 include/linux/mm_types.h                    |    4 +--
 include/linux/pagemap.h                     |    6 ++--
 include/net/page_pool.h                     |   12 +++++++++
 kernel/resource.c                           |    2 -
 lib/test_kasan.c                            |   29 ++++++++++++++++++-----
 mm/hugetlb.c                                |    1 +
 mm/ioremap.c                                |    6 ++--
 mm/ksm.c                                    |    3 +-
 mm/shmem.c                                  |   34 ++++++++++++----------------
 mm/slab_common.c                            |   10 ++++++++
 mm/slub.c                                   |    9 -------
 net/core/page_pool.c                        |   12 +++++----
 18 files changed, 129 insertions(+), 62 deletions(-)




* [patch 01/13] mm/hugetlb: fix F_SEAL_FUTURE_WRITE
From: Andrew Morton @ 2021-05-15  0:27 UTC (permalink / raw)
  To: akpm, hughd, joel, linux-mm, mike.kravetz, mm-commits, peterx,
	stable, torvalds

From: Peter Xu <peterx@redhat.com>
Subject: mm/hugetlb: fix F_SEAL_FUTURE_WRITE

Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2.

Hugh reported an issue with F_SEAL_FUTURE_WRITE not being applied
correctly to hugetlbfs, which I could easily verify using the memfd_test
program; it seems the program is rarely run against hugetlbfs pages (it
uses shmem by default).

Meanwhile I found another, probably even more severe, issue: hugetlb fork
won't wr-protect child CoW pages, so the child can potentially write to
the parent's private pages.  Patch 2 addresses that.

After this series is applied, "memfd_test hugetlbfs" should start to pass.


This patch (of 2):

F_SEAL_FUTURE_WRITE has been missing for hugetlb since day one.  There is
a test program for it, and it fails consistently.

$ ./memfd_test hugetlbfs
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
mmap() didn't fail as expected
Aborted (core dumped)

That is probably because no one really runs the hugetlbfs test.

Fix it by also checking F_SEAL_FUTURE_WRITE in hugetlbfs_file_mmap(), as
shmem_mmap() does, and generalize the check into a shared helper.
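
A minimal userspace sketch (an editorial illustration, not part of the
patch; it assumes a libc that exposes memfd_create() and
F_SEAL_FUTURE_WRITE, and that a 2MB huge page is available) of what the
SEAL-FUTURE-WRITE test expects:

	#define _GNU_SOURCE
	#include <sys/mman.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <errno.h>
	#include <assert.h>

	int main(void)
	{
		size_t len = 2UL << 20;		/* one 2MB huge page */
		int fd = memfd_create("seal", MFD_HUGETLB | MFD_ALLOW_SEALING);

		assert(fd >= 0 && ftruncate(fd, len) == 0);
		assert(fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE) == 0);
		/* with this patch applied, this mmap() must fail with EPERM */
		assert(mmap(NULL, len, PROT_READ | PROT_WRITE,
			    MAP_SHARED, fd, 0) == MAP_FAILED && errno == EPERM);
		return 0;
	}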

Link: https://lkml.kernel.org/r/20210503234356.9097-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210503234356.9097-2-peterx@redhat.com
Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/hugetlbfs/inode.c |    5 +++++
 include/linux/mm.h   |   32 ++++++++++++++++++++++++++++++++
 mm/shmem.c           |   22 ++++------------------
 3 files changed, 41 insertions(+), 18 deletions(-)

--- a/fs/hugetlbfs/inode.c~mm-hugetlb-fix-f_seal_future_write
+++ a/fs/hugetlbfs/inode.c
@@ -131,6 +131,7 @@ static void huge_pagevec_release(struct
 static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file_inode(file);
+	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
 	loff_t len, vma_len;
 	int ret;
 	struct hstate *h = hstate_file(file);
@@ -146,6 +147,10 @@ static int hugetlbfs_file_mmap(struct fi
 	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
 	vma->vm_ops = &hugetlb_vm_ops;
 
+	ret = seal_check_future_write(info->seals, vma);
+	if (ret)
+		return ret;
+
 	/*
 	 * page based offset in vm_pgoff could be sufficiently large to
 	 * overflow a loff_t when converted to byte offset.  This can
--- a/include/linux/mm.h~mm-hugetlb-fix-f_seal_future_write
+++ a/include/linux/mm.h
@@ -3216,5 +3216,37 @@ void mem_dump_obj(void *object);
 static inline void mem_dump_obj(void *object) {}
 #endif
 
+/**
+ * seal_check_future_write - Check for F_SEAL_FUTURE_WRITE flag and handle it
+ * @seals: the seals to check
+ * @vma: the vma to operate on
+ *
+ * Check whether F_SEAL_FUTURE_WRITE is set; if so, do proper check/handling on
+ * the vma flags.  Return 0 if check pass, or <0 for errors.
+ */
+static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
+{
+	if (seals & F_SEAL_FUTURE_WRITE) {
+		/*
+		 * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
+		 * "future write" seal active.
+		 */
+		if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
+			return -EPERM;
+
+		/*
+		 * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
+		 * MAP_SHARED and read-only, take care to not allow mprotect to
+		 * revert protections on such mappings. Do this only for shared
+		 * mappings. For private mappings, don't need to mask
+		 * VM_MAYWRITE as we still want them to be COW-writable.
+		 */
+		if (vma->vm_flags & VM_SHARED)
+			vma->vm_flags &= ~(VM_MAYWRITE);
+	}
+
+	return 0;
+}
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
--- a/mm/shmem.c~mm-hugetlb-fix-f_seal_future_write
+++ a/mm/shmem.c
@@ -2258,25 +2258,11 @@ out_nomem:
 static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct shmem_inode_info *info = SHMEM_I(file_inode(file));
+	int ret;
 
-	if (info->seals & F_SEAL_FUTURE_WRITE) {
-		/*
-		 * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
-		 * "future write" seal active.
-		 */
-		if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
-			return -EPERM;
-
-		/*
-		 * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
-		 * MAP_SHARED and read-only, take care to not allow mprotect to
-		 * revert protections on such mappings. Do this only for shared
-		 * mappings. For private mappings, don't need to mask
-		 * VM_MAYWRITE as we still want them to be COW-writable.
-		 */
-		if (vma->vm_flags & VM_SHARED)
-			vma->vm_flags &= ~(VM_MAYWRITE);
-	}
+	ret = seal_check_future_write(info->seals, vma);
+	if (ret)
+		return ret;
 
 	/* arm64 - allow memory tagging on RAM-based files */
 	vma->vm_flags |= VM_MTE_ALLOWED;
_



* [patch 02/13] mm/hugetlb: fix cow where page writable in child
From: Andrew Morton @ 2021-05-15  0:27 UTC (permalink / raw)
  To: akpm, hughd, joel, linux-mm, mike.kravetz, mm-commits, peterx,
	stable, torvalds

From: Peter Xu <peterx@redhat.com>
Subject: mm/hugetlb: fix cow where page writable in child

When reworking early CoW of pinned hugetlb pages, we moved huge_ptep_get()
upwards but overlooked a side effect: huge_ptep_get() used to fetch the
pte after wr-protection.  After moving it up, we need to explicitly
wr-protect the child pte, or the write bit stays set in the child process,
which could cause data corruption where the child writes to the original
page directly.

This issue can also be exposed by "memfd_test hugetlbfs" kselftest.
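
For illustration, a hedged userspace sketch of the hazard (editorial,
not the kselftest itself; it assumes hugetlb pages have been reserved,
e.g. via /proc/sys/vm/nr_hugepages):

	#include <sys/mman.h>
	#include <sys/wait.h>
	#include <unistd.h>
	#include <assert.h>

	int main(void)
	{
		size_t len = 2UL << 20;
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

		assert(p != MAP_FAILED);
		p[0] = 1;		/* parent populates its private page */
		if (fork() == 0) {
			/* must CoW; with the bug the write bit is still set,
			 * so this write lands in the parent's page directly */
			p[0] = 2;
			_exit(0);
		}
		wait(NULL);
		assert(p[0] == 1);	/* fails if the child corrupted the parent */
		return 0;
	}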

Link: https://lkml.kernel.org/r/20210503234356.9097-3-peterx@redhat.com
Fixes: 4eae4efa2c299 ("hugetlb: do early cow when page pinned on src mm")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/hugetlb.c |    1 +
 1 file changed, 1 insertion(+)

--- a/mm/hugetlb.c~mm-hugetlb-fix-cow-where-page-writtable-in-child
+++ a/mm/hugetlb.c
@@ -4056,6 +4056,7 @@ again:
 				 * See Documentation/vm/mmu_notifier.rst
 				 */
 				huge_ptep_set_wrprotect(src, addr, src_pte);
+				entry = huge_pte_wrprotect(entry);
 			}
 
 			page_dup_rmap(ptepage, true);
_



* [patch 03/13] mm, slub: move slub_debug static key enabling outside slab_mutex
From: Andrew Morton @ 2021-05-15  0:27 UTC (permalink / raw)
  To: akpm, cl, iamjoonsoo.kim, linux-mm, mm-commits, paulmck, penberg,
	rientjes, torvalds, vbabka

From: Vlastimil Babka <vbabka@suse.cz>
Subject: mm, slub: move slub_debug static key enabling outside slab_mutex

Paul E.  McKenney reported [1] that commit 1f0723a4c0df ("mm, slub: enable
slub_debug static key when creating cache with explicit debug flags")
results in the lockdep complaint:

 ======================================================
 WARNING: possible circular locking dependency detected
 5.12.0+ #15 Not tainted
 ------------------------------------------------------
 rcu_torture_sta/109 is trying to acquire lock:
 ffffffff96063cd0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x9/0x20

 but task is already holding lock:
 ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (slab_mutex){+.+.}-{3:3}:
        lock_acquire+0xb9/0x3a0
        __mutex_lock+0x8d/0x920
        slub_cpu_dead+0x15/0xf0
        cpuhp_invoke_callback+0x17a/0x7c0
        cpuhp_invoke_callback_range+0x3b/0x80
        _cpu_down+0xdf/0x2a0
        cpu_down+0x2c/0x50
        device_offline+0x82/0xb0
        remove_cpu+0x1a/0x30
        torture_offline+0x80/0x140
        torture_onoff+0x147/0x260
        kthread+0x10a/0x140
        ret_from_fork+0x22/0x30

 -> #0 (cpu_hotplug_lock){++++}-{0:0}:
        check_prev_add+0x8f/0xbf0
        __lock_acquire+0x13f0/0x1d80
        lock_acquire+0xb9/0x3a0
        cpus_read_lock+0x21/0xa0
        static_key_enable+0x9/0x20
        __kmem_cache_create+0x38d/0x430
        kmem_cache_create_usercopy+0x146/0x250
        kmem_cache_create+0xd/0x10
        rcu_torture_stats+0x79/0x280
        kthread+0x10a/0x140
        ret_from_fork+0x22/0x30

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(slab_mutex);
                                lock(cpu_hotplug_lock);
                                lock(slab_mutex);
   lock(cpu_hotplug_lock);

  *** DEADLOCK ***

 1 lock held by rcu_torture_sta/109:
  #0: ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250

 stack backtrace:
 CPU: 3 PID: 109 Comm: rcu_torture_sta Not tainted 5.12.0+ #15
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
 Call Trace:
  dump_stack+0x6d/0x89
  check_noncircular+0xfe/0x110
  ? lock_is_held_type+0x98/0x110
  check_prev_add+0x8f/0xbf0
  __lock_acquire+0x13f0/0x1d80
  lock_acquire+0xb9/0x3a0
  ? static_key_enable+0x9/0x20
  ? mark_held_locks+0x49/0x70
  cpus_read_lock+0x21/0xa0
  ? static_key_enable+0x9/0x20
  static_key_enable+0x9/0x20
  __kmem_cache_create+0x38d/0x430
  kmem_cache_create_usercopy+0x146/0x250
  ? rcu_torture_stats_print+0xd0/0xd0
  kmem_cache_create+0xd/0x10
  rcu_torture_stats+0x79/0x280
  ? rcu_torture_stats_print+0xd0/0xd0
  kthread+0x10a/0x140
  ? kthread_park+0x80/0x80
  ret_from_fork+0x22/0x30

This is because there's one order of locking from the hotplug callbacks:

lock(cpu_hotplug_lock); // from hotplug machinery itself
lock(slab_mutex); // in e.g. slab_mem_going_offline_callback()

And commit 1f0723a4c0df made the reverse sequence possible:
lock(slab_mutex); // in kmem_cache_create_usercopy()
lock(cpu_hotplug_lock); // kmem_cache_open() -> static_key_enable()

The simplest fix is to move static_key_enable() to a place before
slab_mutex is taken.  That means kmem_cache_create_usercopy() in
mm/slab_common.c, which is not ideal for SLUB-specific code, but the
#ifdef CONFIG_SLUB_DEBUG at least keeps it self-contained and obvious.
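
With the fix, the cache-creation path takes the two locks in the same
order as the hotplug callbacks.  A condensed sketch of the resulting
order (from the patch below):

	/* kmem_cache_create_usercopy(), after this patch */
	if (flags & SLAB_DEBUG_FLAGS)
		static_branch_enable(&slub_debug_enabled); /* cpu_hotplug_lock */
	mutex_lock(&slab_mutex);

which matches lock(cpu_hotplug_lock) -> lock(slab_mutex) from the
hotplug side, so lockdep no longer sees an inversion.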

[1] https://lore.kernel.org/lkml/20210502171827.GA3670492@paulmck-ThinkPad-P17-Gen-1/

Link: https://lkml.kernel.org/r/20210504120019.26791-1-vbabka@suse.cz
Fixes: 1f0723a4c0df ("mm, slub: enable slub_debug static key when creating cache with explicit debug flags")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slab_common.c |   10 ++++++++++
 mm/slub.c        |    9 ---------
 2 files changed, 10 insertions(+), 9 deletions(-)

--- a/mm/slab_common.c~mm-slub-move-slub_debug-static-key-enabling-outside-slab_mutex
+++ a/mm/slab_common.c
@@ -318,6 +318,16 @@ kmem_cache_create_usercopy(const char *n
 	const char *cache_name;
 	int err;
 
+#ifdef CONFIG_SLUB_DEBUG
+	/*
+	 * If no slub_debug was enabled globally, the static key is not yet
+	 * enabled by setup_slub_debug(). Enable it if the cache is being
+	 * created with any of the debugging flags passed explicitly.
+	 */
+	if (flags & SLAB_DEBUG_FLAGS)
+		static_branch_enable(&slub_debug_enabled);
+#endif
+
 	mutex_lock(&slab_mutex);
 
 	err = kmem_cache_sanity_check(name, size);
--- a/mm/slub.c~mm-slub-move-slub_debug-static-key-enabling-outside-slab_mutex
+++ a/mm/slub.c
@@ -3828,15 +3828,6 @@ static int calculate_sizes(struct kmem_c
 
 static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
 {
-#ifdef CONFIG_SLUB_DEBUG
-	/*
-	 * If no slub_debug was enabled globally, the static key is not yet
-	 * enabled by setup_slub_debug(). Enable it if the cache is being
-	 * created with any of the debugging flags passed explicitly.
-	 */
-	if (flags & SLAB_DEBUG_FLAGS)
-		static_branch_enable(&slub_debug_enabled);
-#endif
 	s->flags = kmem_cache_flags(s->size, flags, s->name);
 #ifdef CONFIG_SLAB_FREELIST_HARDENED
 	s->random = get_random_long();
_



* [patch 04/13] kernel/resource: fix return code check in __request_free_mem_region
From: Andrew Morton @ 2021-05-15  0:27 UTC (permalink / raw)
  To: akpm, apopple, bsingharora, dan.j.williams, daniel.vetter, david,
	gregkh, jglisse, jhubbard, linux-mm, mm-commits, oliver.sang,
	smuchun, torvalds

From: Alistair Popple <apopple@nvidia.com>
Subject: kernel/resource: fix return code check in __request_free_mem_region

Splitting an earlier version of a patch that allowed calling
__request_region() while holding the resource lock into a series of
patches required changing the return code for the newly introduced
__request_region_locked().

Unfortunately this change was not carried through to a subsequent commit
56fd94919b8b ("kernel/resource: fix locking in request_free_mem_region")
in the series.  This resulted in a use-after-free due to freeing the
struct resource without properly releasing it.  Fix this by correcting the
return code check so that the struct is not freed if the request to add it
was successful.
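
A condensed view of the affected control flow (editorial sketch;
surrounding code elided):

	/* in __request_free_mem_region(): 0 means success, and the
	 * resource is now linked into the iomem tree */
	if (__request_region_locked(res, &iomem_resource, addr, size,
				    name, 0))
		break;		/* failure: fall through to free_resource() */
	...
	return res;		/* success: caller owns the region */
	...
	free_resource(res);	/* must only run when the insert failed */

With the inverted check, the success path took the break, and
free_resource() freed a struct resource that was still linked into the
tree; hence the use-after-free.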

Link: https://lkml.kernel.org/r/20210512073528.22334-1-apopple@nvidia.com
Fixes: 56fd94919b8b ("kernel/resource: fix locking in request_free_mem_region")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Muchun Song <smuchun@gmail.com>
Cc: Oliver Sang <oliver.sang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/resource.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/resource.c~kernel-resource-fix-return-code-check-in-__request_free_mem_region
+++ a/kernel/resource.c
@@ -1805,7 +1805,7 @@ static struct resource *__request_free_m
 				REGION_DISJOINT)
 			continue;
 
-		if (!__request_region_locked(res, &iomem_resource, addr, size,
+		if (__request_region_locked(res, &iomem_resource, addr, size,
 						name, 0))
 			break;
 
_



* [patch 05/13] squashfs: fix divide error in calculate_skip()
From: Andrew Morton @ 2021-05-15  0:27 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, phillip, stable,
	syzbot+7b98870d4fec9447b951, syzbot+e8f781243ce16ac2f962,
	torvalds

From: Phillip Lougher <phillip@squashfs.org.uk>
Subject: squashfs: fix divide error in calculate_skip()

Syzbot has reported a "divide error" which has been identified as being
caused by a corrupted file_size value within the file inode.  This value
has been corrupted to a much larger value than expected.

calculate_skip() is passed i_size_read(inode) >> msblk->block_log.  Due
to the file_size corruption this overflows the int argument/variable in
that function, leading to the divide error.

This patch changes the function to use u64.  This will accommodate any
unexpectedly large values due to corruption.

The value returned from calculate_skip() is clamped to be never more than
SQUASHFS_CACHED_BLKS - 1, or 7.  So file_size corruption does not lead to
an unexpectedly large return result here.
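
A hypothetical worked example: with file_size corrupted to ~2^60 and
block_log = 17, i_size_read(inode) >> msblk->block_log is ~2^43, which
cannot be represented in the old int parameter; as a u64 the division is
well defined, and the return value is still clamped to at most 7.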

Link: https://lkml.kernel.org/r/20210507152618.9447-1-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: <syzbot+e8f781243ce16ac2f962@syzkaller.appspotmail.com>
Reported-by: <syzbot+7b98870d4fec9447b951@syzkaller.appspotmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/squashfs/file.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/fs/squashfs/file.c~squashfs-fix-divide-error-in-calculate_skip
+++ a/fs/squashfs/file.c
@@ -211,11 +211,11 @@ failure:
  * If the skip factor is limited in this way then the file will use multiple
  * slots.
  */
-static inline int calculate_skip(int blocks)
+static inline int calculate_skip(u64 blocks)
 {
-	int skip = blocks / ((SQUASHFS_META_ENTRIES + 1)
+	u64 skip = blocks / ((SQUASHFS_META_ENTRIES + 1)
 		 * SQUASHFS_META_INDEXES);
-	return min(SQUASHFS_CACHED_BLKS - 1, skip + 1);
+	return min((u64) SQUASHFS_CACHED_BLKS - 1, skip + 1);
 }
 
 
_



* [patch 06/13] userfaultfd: release page in error path to avoid BUG_ON
From: Andrew Morton @ 2021-05-15  0:27 UTC (permalink / raw)
  To: akpm, axelrasmussen, hughd, linux-mm, mm-commits, peterx, stable,
	torvalds

From: Axel Rasmussen <axelrasmussen@google.com>
Subject: userfaultfd: release page in error path to avoid BUG_ON

Consider the following sequence of events:

1. Userspace issues a UFFD ioctl, which ends up calling into
   shmem_mfill_atomic_pte(). We successfully account the blocks, we
   shmem_alloc_page(), but then the copy_from_user() fails. We return
   -ENOENT. We don't release the page we allocated.
2. Our caller detects this error code, tries the copy_from_user() after
   dropping the mmap_lock, and retries, calling back into
   shmem_mfill_atomic_pte().
3. Meanwhile, let's say another process filled up the tmpfs being used.
4. So shmem_mfill_atomic_pte() fails to account blocks this time, and
   immediately returns - without releasing the page.

This triggers a BUG_ON in our caller, which asserts that the page
should always be consumed, unless -ENOENT is returned.

To fix this, detect if we have such a "dangling" page when accounting
fails, and if so, release it before returning.
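
A condensed sketch of the caller's retry contract (simplified from
__mcopy_atomic() in mm/userfaultfd.c; argument lists elided):

	err = mfill_atomic_pte(..., &page);
	if (unlikely(err == -ENOENT)) {
		mmap_read_unlock(dst_mm);
		BUG_ON(!page);
		/* do the copy_from_user() without mmap_lock, then retry,
		 * handing the already-allocated page back in */
		goto retry;
	} else
		BUG_ON(page);	/* other errors must not leave a page behind */

Releasing the dangling page in the accounting-failure path keeps this
invariant intact.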

Link: https://lkml.kernel.org/r/20210428230858.348400-1-axelrasmussen@google.com
Fixes: cb658a453b93 ("userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY")
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reported-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/shmem.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- a/mm/shmem.c~userfaultfd-release-page-in-error-path-to-avoid-bug_on
+++ a/mm/shmem.c
@@ -2361,8 +2361,18 @@ static int shmem_mfill_atomic_pte(struct
 	pgoff_t offset, max_off;
 
 	ret = -ENOMEM;
-	if (!shmem_inode_acct_block(inode, 1))
+	if (!shmem_inode_acct_block(inode, 1)) {
+		/*
+		 * We may have got a page, returned -ENOENT triggering a retry,
+		 * and now we find ourselves with -ENOMEM. Release the page, to
+		 * avoid a BUG_ON in our caller.
+		 */
+		if (unlikely(*pagep)) {
+			put_page(*pagep);
+			*pagep = NULL;
+		}
 		goto out;
+	}
 
 	if (!*pagep) {
 		page = shmem_alloc_page(gfp, info, pgoff);
_



* [patch 07/13] ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()"
From: Andrew Morton @ 2021-05-15  0:27 UTC (permalink / raw)
  To: akpm, hughd, linmiaohe, linux-mm, mm-commits, torvalds

From: Hugh Dickins <hughd@google.com>
Subject: ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()"

This reverts commit 3e96b6a2e9ad929a3230a22f4d64a74671a0720b, which
caused a General Protection Fault in rmap_walk_ksm() under memory
pressure: remove_rmap_item_from_tree() needs to take the page lock, of
course.

Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2105092253500.1127@eggly.anvils
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/ksm.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/ksm.c~ksm-revert-use-get_ksm_page_nolock-to-get-ksm-page-in-remove_rmap_item_from_tree
+++ a/mm/ksm.c
@@ -776,11 +776,12 @@ static void remove_rmap_item_from_tree(s
 		struct page *page;
 
 		stable_node = rmap_item->head;
-		page = get_ksm_page(stable_node, GET_KSM_PAGE_NOLOCK);
+		page = get_ksm_page(stable_node, GET_KSM_PAGE_LOCK);
 		if (!page)
 			goto out;
 
 		hlist_del(&rmap_item->hlist);
+		unlock_page(page);
 		put_page(page);
 
 		if (!hlist_empty(&stable_node->hlist))
_



* [patch 08/13] mm: fix struct page layout on 32-bit systems
From: Andrew Morton @ 2021-05-15  0:27 UTC (permalink / raw)
  To: akpm, brouer, ilias.apalodimas, linux-mm, mcroce, mm-commits,
	stable, torvalds, vbabka, willy

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: fix struct page layout on 32-bit systems

32-bit architectures which expect 8-byte alignment for 8-byte integers and
need 64-bit DMA addresses (arm, mips, ppc) had their struct page
inadvertently expanded in 2019.  When the dma_addr_t was added, it forced
the alignment of the union to 8 bytes, which inserted a 4 byte gap between
'flags' and the union.

Fix this by storing the dma_addr_t in one or two adjacent unsigned longs.
This restores the alignment to that of an unsigned long.  We always
store the low bits in the first word to prevent the PageTail bit from
being inadvertently set on a big endian platform.  If that happened,
get_user_pages_fast() racing against a page which was freed and
reallocated to the page_pool could dereference a bogus compound_head(),
which would be hard to trace back to this cause.
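
One subtlety in the helper added below (an editorial note, not from the
changelog): in

	ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;

the logical 32-bit shift is split in two because, when dma_addr_t is
itself only 32 bits wide, the (dead) branch would otherwise contain a
shift by the full width of the type, which is undefined behaviour and
draws a compiler warning.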

Link: https://lkml.kernel.org/r/20210510153211.1504886-1-willy@infradead.org
Fixes: c25fff7171be ("mm: add dma_addr_t to struct page")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Matteo Croce <mcroce@linux.microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm_types.h |    4 ++--
 include/net/page_pool.h  |   12 +++++++++++-
 net/core/page_pool.c     |   12 +++++++-----
 3 files changed, 20 insertions(+), 8 deletions(-)

--- a/include/linux/mm_types.h~mm-fix-struct-page-layout-on-32-bit-systems
+++ a/include/linux/mm_types.h
@@ -97,10 +97,10 @@ struct page {
 		};
 		struct {	/* page_pool used by netstack */
 			/**
-			 * @dma_addr: might require a 64-bit value even on
+			 * @dma_addr: might require a 64-bit value on
 			 * 32-bit architectures.
 			 */
-			dma_addr_t dma_addr;
+			unsigned long dma_addr[2];
 		};
 		struct {	/* slab, slob and slub */
 			union {
--- a/include/net/page_pool.h~mm-fix-struct-page-layout-on-32-bit-systems
+++ a/include/net/page_pool.h
@@ -198,7 +198,17 @@ static inline void page_pool_recycle_dir
 
 static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
-	return page->dma_addr;
+	dma_addr_t ret = page->dma_addr[0];
+	if (sizeof(dma_addr_t) > sizeof(unsigned long))
+		ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;
+	return ret;
+}
+
+static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
+{
+	page->dma_addr[0] = addr;
+	if (sizeof(dma_addr_t) > sizeof(unsigned long))
+		page->dma_addr[1] = upper_32_bits(addr);
 }
 
 static inline bool is_page_pool_compiled_in(void)
--- a/net/core/page_pool.c~mm-fix-struct-page-layout-on-32-bit-systems
+++ a/net/core/page_pool.c
@@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_devic
 					  struct page *page,
 					  unsigned int dma_sync_size)
 {
+	dma_addr_t dma_addr = page_pool_get_dma_addr(page);
+
 	dma_sync_size = min(dma_sync_size, pool->p.max_len);
-	dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
+	dma_sync_single_range_for_device(pool->p.dev, dma_addr,
 					 pool->p.offset, dma_sync_size,
 					 pool->p.dma_dir);
 }
@@ -195,7 +197,7 @@ static bool page_pool_dma_map(struct pag
 	if (dma_mapping_error(pool->p.dev, dma))
 		return false;
 
-	page->dma_addr = dma;
+	page_pool_set_dma_addr(page, dma);
 
 	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
 		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
@@ -331,13 +333,13 @@ void page_pool_release_page(struct page_
 		 */
 		goto skip_dma_unmap;
 
-	dma = page->dma_addr;
+	dma = page_pool_get_dma_addr(page);
 
-	/* When page is unmapped, it cannot be returned our pool */
+	/* When page is unmapped, it cannot be returned to our pool */
 	dma_unmap_page_attrs(pool->p.dev, dma,
 			     PAGE_SIZE << pool->p.order, pool->p.dma_dir,
 			     DMA_ATTR_SKIP_CPU_SYNC);
-	page->dma_addr = 0;
+	page_pool_set_dma_addr(page, 0);
 skip_dma_unmap:
 	/* This may be the last page returned, releasing the pool, so
 	 * it is not safe to reference pool afterwards.
_



* [patch 09/13] kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
From: Andrew Morton @ 2021-05-15  0:27 UTC (permalink / raw)
  To: akpm, andreyknvl, eugenis, georgepope, glider, lenaptr, linux-mm,
	mm-commits, pcc, stable, torvalds

From: Peter Collingbourne <pcc@google.com>
Subject: kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled

These tests deliberately access these arrays out of bounds, which will
cause the dynamic local bounds checks inserted by
CONFIG_UBSAN_LOCAL_BOUNDS to fail and panic the kernel.  To avoid this
problem, access the arrays via volatile pointers, which will prevent the
compiler from being able to determine the array bounds.

These accesses use volatile pointers to char (char *volatile) rather than
the more conventional pointers to volatile char (volatile char *) because
we want to prevent the compiler from making inferences about the pointer
itself (i.e.  its array bounds), not the data that it refers to.
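
A two-line illustration of the distinction (editorial sketch):

	volatile char *a = array;	/* the data is volatile; the compiler
					 * may still reason about the bounds
					 * of 'a' itself */
	char *volatile b = array;	/* the pointer is volatile; the
					 * compiler must reload 'b' and cannot
					 * assume it still points into 'array' */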

Link: https://lkml.kernel.org/r/20210507025915.1464056-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/I90b1713fbfa1bf68ff895aef099ea77b98a7c3b9
Signed-off-by: Peter Collingbourne <pcc@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: George Popescu <georgepope@android.com>
Cc: Elena Petrova <lenaptr@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/test_kasan.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

--- a/lib/test_kasan.c~kasan-fix-unit-tests-with-config_ubsan_local_bounds-enabled
+++ a/lib/test_kasan.c
@@ -654,8 +654,20 @@ static char global_array[10];
 
 static void kasan_global_oob(struct kunit *test)
 {
-	volatile int i = 3;
-	char *p = &global_array[ARRAY_SIZE(global_array) + i];
+	/*
+	 * Deliberate out-of-bounds access. To prevent CONFIG_UBSAN_LOCAL_BOUNDS
+	 * from failing here and panicing the kernel, access the array via a
+	 * volatile pointer, which will prevent the compiler from being able to
+	 * determine the array bounds.
+	 *
+	 * This access uses a volatile pointer to char (char *volatile) rather
+	 * than the more conventional pointer to volatile char (volatile char *)
+	 * because we want to prevent the compiler from making inferences about
+	 * the pointer itself (i.e. its array bounds), not the data that it
+	 * refers to.
+	 */
+	char *volatile array = global_array;
+	char *p = &array[ARRAY_SIZE(global_array) + 3];
 
 	/* Only generic mode instruments globals. */
 	KASAN_TEST_NEEDS_CONFIG_ON(test, CONFIG_KASAN_GENERIC);
@@ -703,8 +715,9 @@ static void ksize_uaf(struct kunit *test
 static void kasan_stack_oob(struct kunit *test)
 {
 	char stack_array[10];
-	volatile int i = OOB_TAG_OFF;
-	char *p = &stack_array[ARRAY_SIZE(stack_array) + i];
+	/* See comment in kasan_global_oob. */
+	char *volatile array = stack_array;
+	char *p = &array[ARRAY_SIZE(stack_array) + OOB_TAG_OFF];
 
 	KASAN_TEST_NEEDS_CONFIG_ON(test, CONFIG_KASAN_STACK);
 
@@ -715,7 +728,9 @@ static void kasan_alloca_oob_left(struct
 {
 	volatile int i = 10;
 	char alloca_array[i];
-	char *p = alloca_array - 1;
+	/* See comment in kasan_global_oob. */
+	char *volatile array = alloca_array;
+	char *p = array - 1;
 
 	/* Only generic mode instruments dynamic allocas. */
 	KASAN_TEST_NEEDS_CONFIG_ON(test, CONFIG_KASAN_GENERIC);
@@ -728,7 +743,9 @@ static void kasan_alloca_oob_right(struc
 {
 	volatile int i = 10;
 	char alloca_array[i];
-	char *p = alloca_array + i;
+	/* See comment in kasan_global_oob. */
+	char *volatile array = alloca_array;
+	char *p = array + i;
 
 	/* Only generic mode instruments dynamic allocas. */
 	KASAN_TEST_NEEDS_CONFIG_ON(test, CONFIG_KASAN_GENERIC);
_



* [patch 10/13] mm/filemap: fix readahead return types
From: Andrew Morton @ 2021-05-15  0:27 UTC (permalink / raw)
  To: akpm, djwong, linux-mm, mm-commits, torvalds, willy

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm/filemap: fix readahead return types

A readahead request will not allocate more memory than can be represented
by a size_t, even on systems that have HIGHMEM available.  Change the
length functions from returning an loff_t to a size_t.
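
For context: rac->_nr_pages is an unsigned int counting pages, and, as
noted above, a single request never exceeds what a size_t can represent,
so _nr_pages * PAGE_SIZE is safe in size_t arithmetic; the old loff_t
cast widened the computation without buying anything.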

Link: https://lkml.kernel.org/r/20210510201201.1558972-1-willy@infradead.org
Fixes: 32c0a6bcaa1f57 ("btrfs: add and use readahead_batch_length")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/iomap/buffered-io.c  |    4 ++--
 include/linux/pagemap.h |    6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

--- a/fs/iomap/buffered-io.c~mm-filemap-fix-readahead-return-types
+++ a/fs/iomap/buffered-io.c
@@ -394,7 +394,7 @@ void iomap_readahead(struct readahead_co
 {
 	struct inode *inode = rac->mapping->host;
 	loff_t pos = readahead_pos(rac);
-	loff_t length = readahead_length(rac);
+	size_t length = readahead_length(rac);
 	struct iomap_readpage_ctx ctx = {
 		.rac	= rac,
 	};
@@ -402,7 +402,7 @@ void iomap_readahead(struct readahead_co
 	trace_iomap_readahead(inode, readahead_count(rac));
 
 	while (length > 0) {
-		loff_t ret = iomap_apply(inode, pos, length, 0, ops,
+		ssize_t ret = iomap_apply(inode, pos, length, 0, ops,
 				&ctx, iomap_readahead_actor);
 		if (ret <= 0) {
 			WARN_ON_ONCE(ret == 0);
--- a/include/linux/pagemap.h~mm-filemap-fix-readahead-return-types
+++ a/include/linux/pagemap.h
@@ -997,9 +997,9 @@ static inline loff_t readahead_pos(struc
  * readahead_length - The number of bytes in this readahead request.
  * @rac: The readahead request.
  */
-static inline loff_t readahead_length(struct readahead_control *rac)
+static inline size_t readahead_length(struct readahead_control *rac)
 {
-	return (loff_t)rac->_nr_pages * PAGE_SIZE;
+	return rac->_nr_pages * PAGE_SIZE;
 }
 
 /**
@@ -1024,7 +1024,7 @@ static inline unsigned int readahead_cou
  * readahead_batch_length - The number of bytes in the current batch.
  * @rac: The readahead request.
  */
-static inline loff_t readahead_batch_length(struct readahead_control *rac)
+static inline size_t readahead_batch_length(struct readahead_control *rac)
 {
 	return rac->_batch_count * PAGE_SIZE;
 }
_



* [patch 11/13] hfsplus: prevent corruption in shrinking truncate
From: Andrew Morton @ 2021-05-15  0:27 UTC (permalink / raw)
  To: akpm, anatoly.trosinenko, anton, jouni.roivas, linux-mm,
	mm-commits, slava, stable, torvalds

From: Jouni Roivas <jouni.roivas@tuxera.com>
Subject: hfsplus: prevent corruption in shrinking truncate

I believe there are some issues introduced by commit 31651c607151
("hfsplus: avoid deadlock on file truncation").

HFS+ has extent records which always contain 8 extents.  When the first
extent record in the catalog file gets full, new ones are allocated from
the extents overflow file.

When a shrinking truncate happens in the middle of an extent record
located in the extents overflow file, the logic in
hfsplus_file_truncate() was changed so that the call to hfs_brec_remove()
is no longer guarded.

The right action would be to just free the extents that exceed the new
size inside the extent record, by calling hfsplus_free_extents(), and
then check whether the whole extent record should be removed.  However,
since the guard (blk_cnt > start) now comes after the call to
hfs_brec_remove(), this has the unfortunate effect that the last matching
extent record is removed unconditionally.

To reproduce this issue, create a file which has at least 10 extents, and
then perform a shrinking truncate into the middle of the last extent
record, so that the number of remaining extents is neither under nor
divisible by 8.  This causes the last extent record (8 extents) to be
removed entirely instead of being truncated into the middle, resulting in
corruption and lost data.

The fix is simply to check whether the new truncated end is below the
start of this extent record, which makes it safe to remove the full
extent record.  However, the call to hfs_brec_remove() can't be moved
back to its previous place, since we're dropping ->tree_lock; that could
cause a race condition, with the cached info being invalidated and
possibly corrupting the node data.

Another issue is related to this one.  When entering the (blk_cnt >
start) block we are not holding ->tree_lock.  We break out of the loop
without holding the lock, yet hfs_find_exit() unlocks it.  It is unclear
whether someone else could take the lock under our feet, but that could
cause hard-to-debug errors and premature unlocking.  Even if there is no
real risk, the locking should always be kept in balance; thus, take the
lock just before the check.
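
The resulting sequence inside the truncate loop (condensed from the
patch below):

	start = hip->cached_start;
	if (blk_cnt <= start)
		hfs_brec_remove(&fd);	/* whole record is past the new end */
	mutex_unlock(&fd.tree->tree_lock);
	hfsplus_free_extents(sb, hip->cached_extents,
			     alloc_cnt - start, alloc_cnt - blk_cnt);
	mutex_lock(&fd.tree->tree_lock);	/* re-take before the check */
	if (blk_cnt > start) {		/* truncated into the middle: keep
					 * the (now dirty) extent record */
		hip->extent_state |= HFSPLUS_EXT_DIRTY;
		break;
	}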

Link: https://lkml.kernel.org/r/20210429165139.3082828-1-jouni.roivas@tuxera.com
Fixes: 31651c607151f ("hfsplus: avoid deadlock on file truncation")
Signed-off-by: Jouni Roivas <jouni.roivas@tuxera.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Cc: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/hfsplus/extents.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/fs/hfsplus/extents.c~hfsplus-prevent-corruption-in-shrinking-truncate
+++ a/fs/hfsplus/extents.c
@@ -598,13 +598,15 @@ void hfsplus_file_truncate(struct inode
 		res = __hfsplus_ext_cache_extent(&fd, inode, alloc_cnt);
 		if (res)
 			break;
-		hfs_brec_remove(&fd);
 
-		mutex_unlock(&fd.tree->tree_lock);
 		start = hip->cached_start;
+		if (blk_cnt <= start)
+			hfs_brec_remove(&fd);
+		mutex_unlock(&fd.tree->tree_lock);
 		hfsplus_free_extents(sb, hip->cached_extents,
 				     alloc_cnt - start, alloc_cnt - blk_cnt);
 		hfsplus_dump_extent(hip->cached_extents);
+		mutex_lock(&fd.tree->tree_lock);
 		if (blk_cnt > start) {
 			hip->extent_state |= HFSPLUS_EXT_DIRTY;
 			break;
@@ -612,7 +614,6 @@ void hfsplus_file_truncate(struct inode
 		alloc_cnt = start;
 		hip->cached_start = hip->cached_blocks = 0;
 		hip->extent_state &= ~(HFSPLUS_EXT_DIRTY | HFSPLUS_EXT_NEW);
-		mutex_lock(&fd.tree->tree_lock);
 	}
 	hfs_find_exit(&fd);
 
_



* [patch 12/13] docs: admin-guide: update description for kernel.modprobe sysctl
From: Andrew Morton @ 2021-05-15  0:27 UTC (permalink / raw)
  To: akpm, corbet, gregkh, jeyu, linux-mm, linux, mcgrof, mm-commits,
	torvalds

From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Subject: docs: admin-guide: update description for kernel.modprobe sysctl

When I added CONFIG_MODPROBE_PATH, I neglected to update Documentation/. 
It's still true that this defaults to /sbin/modprobe, but now via a level
of indirection.  So document that the kernel might have been built with
something other than /sbin/modprobe as the initial value.
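
For example (an illustrative shell session; the first value shown is
simply whatever CONFIG_MODPROBE_PATH was at build time):

	# cat /proc/sys/kernel/modprobe
	/sbin/modprobe
	# echo /usr/local/sbin/modprobe > /proc/sys/kernel/modprobe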

Link: https://lkml.kernel.org/r/20210420125324.1246826-1-linux@rasmusvillemoes.dk
Fixes: 17652f4240f7a ("modules: add CONFIG_MODPROBE_PATH")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/admin-guide/sysctl/kernel.rst |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/Documentation/admin-guide/sysctl/kernel.rst~docs-admin-guide-update-description-for-kernelmodprobe-sysctl
+++ a/Documentation/admin-guide/sysctl/kernel.rst
@@ -483,10 +483,11 @@ modprobe
 ========
 
 The full path to the usermode helper for autoloading kernel modules,
-by default "/sbin/modprobe".  This binary is executed when the kernel
-requests a module.  For example, if userspace passes an unknown
-filesystem type to mount(), then the kernel will automatically request
-the corresponding filesystem module by executing this usermode helper.
+by default ``CONFIG_MODPROBE_PATH``, which in turn defaults to
+"/sbin/modprobe".  This binary is executed when the kernel requests a
+module.  For example, if userspace passes an unknown filesystem type
+to mount(), then the kernel will automatically request the
+corresponding filesystem module by executing this usermode helper.
 This usermode helper should insert the needed module into the kernel.
 
 This sysctl only affects module autoloading.  It has no effect on the
_



* [patch 13/13] mm/ioremap: fix iomap_max_page_shift
From: Andrew Morton @ 2021-05-15  0:27 UTC (permalink / raw)
  To: akpm, anshuman.khandual, christophe.leroy, linux-mm, mm-commits,
	npiggin, torvalds

From: Christophe Leroy <christophe.leroy@csgroup.eu>
Subject: mm/ioremap: fix iomap_max_page_shift

iomap_max_page_shift is expected to contain a page shift, so it can't be
a 'bool'; it has to be an 'unsigned int'.

Also fix the default values: P4D_SHIFT is the value for when huge iomap
is allowed.

However, on some architectures (e.g. powerpc book3s/64), P4D_SHIFT is not
a constant, so it can't be used to initialise a static variable.  So,
initialise iomap_max_page_shift with the maximum shift supported by the
architecture; it is gated by P4D_SHIFT in vmap_try_huge_p4d() anyway.
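
For illustration: with the old declaration, "static bool ... =
PAGE_SHIFT" collapses any non-zero shift to 1 (true), and the P4D_SHIFT
assigned by nohugeiomap collapses to 1 as well, so the variable never
carried an actual shift; and even read as shifts, the defaults were
inverted (PAGE_SHIFT by default, P4D_SHIFT when huge iomap was meant to
be disabled).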

Link: https://lkml.kernel.org/r/ad2d366015794a9f21320dcbdd0a8eb98979e9df.1620898113.git.christophe.leroy@csgroup.eu
Fixes: bbc180a5adb0 ("mm: HUGE_VMAP arch support cleanup")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/ioremap.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/ioremap.c~mm-ioremap-fix-iomap_max_page_shift
+++ a/mm/ioremap.c
@@ -16,16 +16,16 @@
 #include "pgalloc-track.h"
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-static bool __ro_after_init iomap_max_page_shift = PAGE_SHIFT;
+static unsigned int __ro_after_init iomap_max_page_shift = BITS_PER_LONG - 1;
 
 static int __init set_nohugeiomap(char *str)
 {
-	iomap_max_page_shift = P4D_SHIFT;
+	iomap_max_page_shift = PAGE_SHIFT;
 	return 0;
 }
 early_param("nohugeiomap", set_nohugeiomap);
 #else /* CONFIG_HAVE_ARCH_HUGE_VMAP */
-static const bool iomap_max_page_shift = PAGE_SHIFT;
+static const unsigned int iomap_max_page_shift = PAGE_SHIFT;
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
 int ioremap_page_range(unsigned long addr,
_


