linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* incoming
@ 2021-06-05  3:00 Andrew Morton
  2021-06-05  3:01 ` [patch 01/13] Revert "MIPS: make userspace mapping young by default" Andrew Morton
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

13 patches, based on 16f0596fc1d78a1f3ae4628cff962bb297dc908c.

Subsystems affected by this patch series:

  mips
  mm/kfence
  init
  mm/debug
  mm/pagealloc
  mm/memory-hotplug
  mm/hugetlb
  proc
  mm/kasan
  mm/hugetlb
  lib
  ocfs2
  mailmap

Subsystem: mips

    Thomas Bogendoerfer <tsbogend@alpha.franken.de>:
      Revert "MIPS: make userspace mapping young by default"

Subsystem: mm/kfence

    Marco Elver <elver@google.com>:
      kfence: use TASK_IDLE when awaiting allocation

Subsystem: init

    Mark Rutland <mark.rutland@arm.com>:
      pid: take a reference when initializing `cad_pid`

Subsystem: mm/debug

    Gerald Schaefer <gerald.schaefer@linux.ibm.com>:
      mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()

Subsystem: mm/pagealloc

    Ding Hui <dinghui@sangfor.com.cn>:
      mm/page_alloc: fix counting of free pages after take off from buddy

Subsystem: mm/memory-hotplug

    David Hildenbrand <david@redhat.com>:
      drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64

Subsystem: mm/hugetlb

    Naoya Horiguchi <naoya.horiguchi@nec.com>:
      hugetlb: pass head page to remove_hugetlb_page()

Subsystem: proc

    David Matlack <dmatlack@google.com>:
      proc: add .gitignore for proc-subset-pid selftest

Subsystem: mm/kasan

    Yu Kuai <yukuai3@huawei.com>:
      mm/kasan/init.c: fix doc warning

Subsystem: mm/hugetlb

    Mina Almasry <almasrymina@google.com>:
      mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY

Subsystem: lib

    YueHaibing <yuehaibing@huawei.com>:
      lib: crc64: fix kernel-doc warning

Subsystem: ocfs2

    Junxiao Bi <junxiao.bi@oracle.com>:
      ocfs2: fix data corruption by fallocate

Subsystem: mailmap

    Michel Lespinasse <michel@lespinasse.org>:
      mailmap: use private address for Michel Lespinasse

 .mailmap                                |    3 +
 arch/mips/mm/cache.c                    |   30 ++++++++---------
 drivers/base/memory.c                   |    6 +--
 fs/ocfs2/file.c                         |   55 +++++++++++++++++++++++++++++---
 include/linux/pgtable.h                 |    8 ++++
 init/main.c                             |    2 -
 lib/crc64.c                             |    2 -
 mm/debug_vm_pgtable.c                   |    4 +-
 mm/hugetlb.c                            |   16 +++++++--
 mm/kasan/init.c                         |    4 +-
 mm/kfence/core.c                        |    6 +--
 mm/memory.c                             |    4 ++
 mm/page_alloc.c                         |    2 +
 tools/testing/selftests/proc/.gitignore |    1 
 14 files changed, 107 insertions(+), 36 deletions(-)



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch 01/13] Revert "MIPS: make userspace mapping young by default"
  2021-06-05  3:00 incoming Andrew Morton
@ 2021-06-05  3:01 ` Andrew Morton
  2021-06-05  3:01 ` [patch 02/13] kfence: use TASK_IDLE when awaiting allocation Andrew Morton
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:01 UTC (permalink / raw)
  To: akpm, huangpei, linux-mm, mm-commits, npiggin, stable, torvalds,
	tsbogend, zhouyanjie

From: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Subject: Revert "MIPS: make userspace mapping young by default"

This reverts commit f685a533a7fab35c5d069dcd663f59c8e4171a75.

MIPS cache flush logic needs to know whether the mapping was already
established to decide how to flush caches.  This is done by checking the
valid bit in the PTE.  The commit above breaks this logic by setting the
valid in the PTE in new mappings, which causes kernel crashes.

Link: https://lkml.kernel.org/r/20210526094335.92948-1-tsbogend@alpha.franken.de
Fixes: f685a533a7f ("MIPS: make userspace mapping young by default")
Reported-by: Zhou Yanjie <zhouyanjie@wanyeetech.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Huang Pei <huangpei@loongson.cn>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/mips/mm/cache.c    |   30 ++++++++++++++----------------
 include/linux/pgtable.h |    8 ++++++++
 mm/memory.c             |    4 ++++
 3 files changed, 26 insertions(+), 16 deletions(-)

--- a/arch/mips/mm/cache.c~revert-mips-make-userspace-mapping-young-by-default
+++ a/arch/mips/mm/cache.c
@@ -158,31 +158,29 @@ unsigned long _page_cachable_default;
 EXPORT_SYMBOL(_page_cachable_default);
 
 #define PM(p)	__pgprot(_page_cachable_default | (p))
-#define PVA(p)	PM(_PAGE_VALID | _PAGE_ACCESSED | (p))
 
 static inline void setup_protection_map(void)
 {
 	protection_map[0]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
-	protection_map[1]  = PVA(_PAGE_PRESENT | _PAGE_NO_EXEC);
-	protection_map[2]  = PVA(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
-	protection_map[3]  = PVA(_PAGE_PRESENT | _PAGE_NO_EXEC);
-	protection_map[4]  = PVA(_PAGE_PRESENT);
-	protection_map[5]  = PVA(_PAGE_PRESENT);
-	protection_map[6]  = PVA(_PAGE_PRESENT);
-	protection_map[7]  = PVA(_PAGE_PRESENT);
+	protection_map[1]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
+	protection_map[2]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
+	protection_map[3]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
+	protection_map[4]  = PM(_PAGE_PRESENT);
+	protection_map[5]  = PM(_PAGE_PRESENT);
+	protection_map[6]  = PM(_PAGE_PRESENT);
+	protection_map[7]  = PM(_PAGE_PRESENT);
 
 	protection_map[8]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
-	protection_map[9]  = PVA(_PAGE_PRESENT | _PAGE_NO_EXEC);
-	protection_map[10] = PVA(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE |
+	protection_map[9]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
+	protection_map[10] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE |
 				_PAGE_NO_READ);
-	protection_map[11] = PVA(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE);
-	protection_map[12] = PVA(_PAGE_PRESENT);
-	protection_map[13] = PVA(_PAGE_PRESENT);
-	protection_map[14] = PVA(_PAGE_PRESENT);
-	protection_map[15] = PVA(_PAGE_PRESENT);
+	protection_map[11] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE);
+	protection_map[12] = PM(_PAGE_PRESENT);
+	protection_map[13] = PM(_PAGE_PRESENT);
+	protection_map[14] = PM(_PAGE_PRESENT | _PAGE_WRITE);
+	protection_map[15] = PM(_PAGE_PRESENT | _PAGE_WRITE);
 }
 
-#undef _PVA
 #undef PM
 
 void cpu_cache_init(void)
--- a/include/linux/pgtable.h~revert-mips-make-userspace-mapping-young-by-default
+++ a/include/linux/pgtable.h
@@ -432,6 +432,14 @@ static inline void ptep_set_wrprotect(st
  * To be differentiate with macro pte_mkyoung, this macro is used on platforms
  * where software maintains page access bit.
  */
+#ifndef pte_sw_mkyoung
+static inline pte_t pte_sw_mkyoung(pte_t pte)
+{
+	return pte;
+}
+#define pte_sw_mkyoung	pte_sw_mkyoung
+#endif
+
 #ifndef pte_savedwrite
 #define pte_savedwrite pte_write
 #endif
--- a/mm/memory.c~revert-mips-make-userspace-mapping-young-by-default
+++ a/mm/memory.c
@@ -2939,6 +2939,7 @@ static vm_fault_t wp_page_copy(struct vm
 		}
 		flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 		entry = mk_pte(new_page, vma->vm_page_prot);
+		entry = pte_sw_mkyoung(entry);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 
 		/*
@@ -3602,6 +3603,7 @@ static vm_fault_t do_anonymous_page(stru
 	__SetPageUptodate(page);
 
 	entry = mk_pte(page, vma->vm_page_prot);
+	entry = pte_sw_mkyoung(entry);
 	if (vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry));
 
@@ -3786,6 +3788,8 @@ void do_set_pte(struct vm_fault *vmf, st
 
 	if (prefault && arch_wants_old_prefaulted_pte())
 		entry = pte_mkold(entry);
+	else
+		entry = pte_sw_mkyoung(entry);
 
 	if (write)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch 02/13] kfence: use TASK_IDLE when awaiting allocation
  2021-06-05  3:00 incoming Andrew Morton
  2021-06-05  3:01 ` [patch 01/13] Revert "MIPS: make userspace mapping young by default" Andrew Morton
@ 2021-06-05  3:01 ` Andrew Morton
  2021-06-05  3:01 ` [patch 03/13] pid: take a reference when initializing `cad_pid` Andrew Morton
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:01 UTC (permalink / raw)
  To: akpm, David.Laight, dvyukov, elver, glider, hdanton, linux-mm,
	mgorman, mm-commits, stable, torvalds

From: Marco Elver <elver@google.com>
Subject: kfence: use TASK_IDLE when awaiting allocation

Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an
allocation counts towards load.  However, for KFENCE, this does not make
any sense, since there is no busy work we're awaiting.

Instead, use TASK_IDLE via wait_event_idle() to not count towards load.

BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1185565
Link: https://lkml.kernel.org/r/20210521083209.3740269-1-elver@google.com
Fixes: 407f1d8c1b5f ("kfence: await for allocation using wait_event")
Signed-off-by: Marco Elver <elver@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Hillf Danton <hdanton@sina.com>
Cc: <stable@vger.kernel.org>	[5.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kfence/core.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/kfence/core.c~kfence-use-task_idle-when-awaiting-allocation
+++ a/mm/kfence/core.c
@@ -627,10 +627,10 @@ static void toggle_allocation_gate(struc
 		 * During low activity with no allocations we might wait a
 		 * while; let's avoid the hung task warning.
 		 */
-		wait_event_timeout(allocation_wait, atomic_read(&kfence_allocation_gate),
-				   sysctl_hung_task_timeout_secs * HZ / 2);
+		wait_event_idle_timeout(allocation_wait, atomic_read(&kfence_allocation_gate),
+					sysctl_hung_task_timeout_secs * HZ / 2);
 	} else {
-		wait_event(allocation_wait, atomic_read(&kfence_allocation_gate));
+		wait_event_idle(allocation_wait, atomic_read(&kfence_allocation_gate));
 	}
 
 	/* Disable static key and reset timer. */
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch 03/13] pid: take a reference when initializing `cad_pid`
  2021-06-05  3:00 incoming Andrew Morton
  2021-06-05  3:01 ` [patch 01/13] Revert "MIPS: make userspace mapping young by default" Andrew Morton
  2021-06-05  3:01 ` [patch 02/13] kfence: use TASK_IDLE when awaiting allocation Andrew Morton
@ 2021-06-05  3:01 ` Andrew Morton
  2021-06-05  3:01 ` [patch 04/13] mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests() Andrew Morton
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:01 UTC (permalink / raw)
  To: akpm, christian.brauner, christian, clg, ebiederm, keescook,
	linux-mm, mark.rutland, mm-commits, paulus, schwidefsky, stable,
	torvalds

From: Mark Rutland <mark.rutland@arm.com>
Subject: pid: take a reference when initializing `cad_pid`

During boot, kernel_init_freeable() initializes `cad_pid` to the init
task's struct pid.  Later on, we may change `cad_pid` via a sysctl, and
when this happens proc_do_cad_pid() will increment the refcount on the new
pid via get_pid(), and will decrement the refcount on the old pid via
put_pid().  As we never called get_pid() when we initialized `cad_pid`, we
decrement a reference we never incremented, can therefore free the init
task's struct pid early.  As there can be dangling references to the
struct pid, we can later encounter a use-after-free (e.g.  when delivering
signals).

This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to have
been around since the conversion of `cad_pid` to struct pid in commit:

  9ec52099e4b8678a ("[PATCH] replace cad_pid by a struct pid")

... from the pre-KASAN stone age of v2.6.19.

Fix this by getting a reference to the init task's struct pid when we
assign it to `cad_pid`.

Full KASAN splat below.

==================================================================
BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline]
BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273

CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x4a8 arch/arm64/kernel/stacktrace.c:105
 show_stack+0x34/0x48 arch/arm64/kernel/stacktrace.c:191
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x1d4/0x2a0 lib/dump_stack.c:120
 print_address_description.constprop.11+0x60/0x3a8 mm/kasan/report.c:232
 __kasan_report mm/kasan/report.c:399 [inline]
 kasan_report+0x1e8/0x200 mm/kasan/report.c:416
 __asan_report_load4_noabort+0x30/0x48 mm/kasan/report_generic.c:308
 ns_of_pid include/linux/pid.h:153 [inline]
 task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
 do_notify_parent+0x308/0xe60 kernel/signal.c:1950
 exit_notify kernel/exit.c:682 [inline]
 do_exit+0x2334/0x2bd0 kernel/exit.c:845
 do_group_exit+0x108/0x2c8 kernel/exit.c:922
 get_signal+0x4e4/0x2a88 kernel/signal.c:2781
 do_signal arch/arm64/kernel/signal.c:882 [inline]
 do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936
 work_pending+0xc/0x2dc

Allocated by task 0:
 kasan_save_stack+0x28/0x58 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:427 [inline]
 __kasan_slab_alloc+0x88/0xa8 mm/kasan/common.c:460
 kasan_slab_alloc include/linux/kasan.h:223 [inline]
 slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516
 slab_alloc_node mm/slub.c:2907 [inline]
 slab_alloc mm/slub.c:2915 [inline]
 kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920
 alloc_pid+0xdc/0xc00 kernel/pid.c:180
 copy_process+0x2794/0x5e18 kernel/fork.c:2129
 kernel_clone+0x194/0x13c8 kernel/fork.c:2500
 kernel_thread+0xd4/0x110 kernel/fork.c:2552
 rest_init+0x44/0x4a0 init/main.c:687
 arch_call_rest_init+0x1c/0x28
 start_kernel+0x520/0x554 init/main.c:1064
 0x0

Freed by task 270:
 kasan_save_stack+0x28/0x58 mm/kasan/common.c:38
 kasan_set_track+0x28/0x40 mm/kasan/common.c:46
 kasan_set_free_info+0x28/0x50 mm/kasan/generic.c:357
 ____kasan_slab_free mm/kasan/common.c:360 [inline]
 ____kasan_slab_free mm/kasan/common.c:325 [inline]
 __kasan_slab_free+0xf4/0x148 mm/kasan/common.c:367
 kasan_slab_free include/linux/kasan.h:199 [inline]
 slab_free_hook mm/slub.c:1562 [inline]
 slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600
 slab_free mm/slub.c:3161 [inline]
 kmem_cache_free+0x224/0x8e0 mm/slub.c:3177
 put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114
 put_pid+0x30/0x48 kernel/pid.c:109
 proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401
 proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591
 proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617
 call_write_iter include/linux/fs.h:1977 [inline]
 new_sync_write+0x3ac/0x510 fs/read_write.c:518
 vfs_write fs/read_write.c:605 [inline]
 vfs_write+0x9c4/0x1018 fs/read_write.c:585
 ksys_write+0x124/0x240 fs/read_write.c:658
 __do_sys_write fs/read_write.c:670 [inline]
 __se_sys_write fs/read_write.c:667 [inline]
 __arm64_sys_write+0x78/0xb0 fs/read_write.c:667
 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:49 [inline]
 el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129
 do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168
 el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416
 el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432
 el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701

The buggy address belongs to the object at ffff23794dda0000
 which belongs to the cache pid of size 224
The buggy address is located 4 bytes inside of
 224-byte region [ffff23794dda0000, ffff23794dda00e0)
The buggy address belongs to the page:
page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0
head:(____ptrval____) order:1 compound_mapcount:0
flags: 0x3fffc0000010200(slab|head)
raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080
raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
==================================================================

Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com
Fixes: 9ec52099e4b8678a ("[PATCH] replace cad_pid by a struct pid")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 init/main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/init/main.c~pid-take-a-reference-when-initializing-cad_pid
+++ a/init/main.c
@@ -1537,7 +1537,7 @@ static noinline void __init kernel_init_
 	 */
 	set_mems_allowed(node_states[N_MEMORY]);
 
-	cad_pid = task_pid(current);
+	cad_pid = get_pid(task_pid(current));
 
 	smp_prepare_cpus(setup_max_cpus);
 
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch 04/13] mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
  2021-06-05  3:00 incoming Andrew Morton
                   ` (2 preceding siblings ...)
  2021-06-05  3:01 ` [patch 03/13] pid: take a reference when initializing `cad_pid` Andrew Morton
@ 2021-06-05  3:01 ` Andrew Morton
  2021-06-05  3:01 ` [patch 05/13] mm/page_alloc: fix counting of free pages after take off from buddy Andrew Morton
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:01 UTC (permalink / raw)
  To: akpm, anshuman.khandual, gerald.schaefer, linux-mm, mm-commits,
	palmer, paul.walmsley, stable, torvalds, vgupta

From: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Subject: mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()

In pmd/pud_advanced_tests(), the vaddr is aligned up to the next pmd/pud
entry, and so it does not match the given pmdp/pudp and (aligned down) pfn
any more.

For s390, this results in memory corruption, because the IDTE instruction
used e.g. in xxx_get_and_clear() will take the vaddr for some calculations,
in combination with the given pmdp. It will then end up with a wrong table
origin, ending on ...ff8, and some of those wrongly set low-order bits will
also select a wrong pagetable level for the index addition. IDTE could
therefore invalidate (or 0x20) something outside of the page tables,
depending on the wrongly picked index, which in turn depends on the random
vaddr.

As result, we sometimes see "BUG task_struct (Not tainted): Padding
overwritten" on s390, where one 0x5a padding value got overwritten with
0x7a.

Fix this by aligning down, similar to how the pmd/pud_aligned pfns are
calculated.

Link: https://lkml.kernel.org/r/20210525130043.186290-2-gerald.schaefer@linux.ibm.com
Fixes: a5c3b9ffb0f40 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/debug_vm_pgtable.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/debug_vm_pgtable.c~mm-debug_vm_pgtable-fix-alignment-for-pmd-pud_advanced_tests
+++ a/mm/debug_vm_pgtable.c
@@ -192,7 +192,7 @@ static void __init pmd_advanced_tests(st
 
 	pr_debug("Validating PMD advanced\n");
 	/* Align the address wrt HPAGE_PMD_SIZE */
-	vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
+	vaddr &= HPAGE_PMD_MASK;
 
 	pgtable_trans_huge_deposit(mm, pmdp, pgtable);
 
@@ -330,7 +330,7 @@ static void __init pud_advanced_tests(st
 
 	pr_debug("Validating PUD advanced\n");
 	/* Align the address wrt HPAGE_PUD_SIZE */
-	vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE;
+	vaddr &= HPAGE_PUD_MASK;
 
 	set_pud_at(mm, vaddr, pudp, pud);
 	pudp_set_wrprotect(mm, vaddr, pudp);
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch 05/13] mm/page_alloc: fix counting of free pages after take off from buddy
  2021-06-05  3:00 incoming Andrew Morton
                   ` (3 preceding siblings ...)
  2021-06-05  3:01 ` [patch 04/13] mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests() Andrew Morton
@ 2021-06-05  3:01 ` Andrew Morton
  2021-06-05  3:01 ` [patch 06/13] drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64 Andrew Morton
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:01 UTC (permalink / raw)
  To: akpm, david, dinghui, linux-mm, mm-commits, naoya.horiguchi,
	osalvador, stable, torvalds

From: Ding Hui <dinghui@sangfor.com.cn>
Subject: mm/page_alloc: fix counting of free pages after take off from buddy

Recently we found that there is a lot MemFree left in /proc/meminfo after
do a lot of pages soft offline, it's not quite correct.

Before Oscar rework soft offline for free pages [1], if we soft offline
free pages, these pages are left in buddy with HWPoison flag, and
NR_FREE_PAGES is not updated immediately.  So the difference between
NR_FREE_PAGES and real number of available free pages is also even big at
the beginning.

However, with the workload running, when we catch HWPoison page in any
alloc functions subsequently, we will remove it from buddy, meanwhile
update the NR_FREE_PAGES and try again, so the NR_FREE_PAGES will get more
and more closer to the real number of available free pages.  (regardless
of unpoison_memory())

Now, for offline free pages, after a successful call
take_page_off_buddy(), the page is no longer belong to buddy allocator,
and will not be used any more, but we missed accounting NR_FREE_PAGES in
this situation, and there is no chance to be updated later.

Do update in take_page_off_buddy() like rmqueue() does, but avoid double
counting if some one already set_migratetype_isolate() on the page.

[1]: commit 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages")

Link: https://lkml.kernel.org/r/20210526075247.11130-1-dinghui@sangfor.com.cn
Fixes: 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages")
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/mm/page_alloc.c~mm-page_alloc-fix-counting-of-free-pages-after-take-off-from-buddy
+++ a/mm/page_alloc.c
@@ -9158,6 +9158,8 @@ bool take_page_off_buddy(struct page *pa
 			del_page_from_free_list(page_head, zone, page_order);
 			break_down_buddy_pages(zone, page_head, page, 0,
 						page_order, migratetype);
+			if (!is_migrate_isolate(migratetype))
+				__mod_zone_freepage_state(zone, -1, migratetype);
 			ret = true;
 			break;
 		}
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch 06/13] drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64
  2021-06-05  3:00 incoming Andrew Morton
                   ` (4 preceding siblings ...)
  2021-06-05  3:01 ` [patch 05/13] mm/page_alloc: fix counting of free pages after take off from buddy Andrew Morton
@ 2021-06-05  3:01 ` Andrew Morton
  2021-06-05  3:01 ` [patch 07/13] hugetlb: pass head page to remove_hugetlb_page() Andrew Morton
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:01 UTC (permalink / raw)
  To: akpm, anshuman.khandual, david, linux-mm, mhocko, mm-commits,
	osalvador, quic_qiancai, rppt, torvalds

From: David Hildenbrand <david@redhat.com>
Subject: drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64

offline_pages() properly checks for memory holes and bails out.  However,
we do a page_zone(pfn_to_page(start_pfn)) before calling offline_pages()
when offlining a memory block.  We should not unconditionally call
page_zone(pfn_to_page(start_pfn)) on aarch64 in offlining code, otherwise
we can trigger a BUG when hitting a memory hole:

[  162.327720][ T1694] kernel BUG at include/linux/mm.h:1383!
[  162.333695][ T1694] Internal error: Oops - BUG: 0 [#1] SMP
[  162.339181][ T1694] Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb nvme i2c_algo_bit mlx5_core i2c_core nvme_core firmware_class
[  162.354604][ T1694] CPU: 13 PID: 1694 Comm: ranbug Not tainted 5.12.0-next-20210524+ #4
[  162.362601][ T1694] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
[  162.371116][ T1694] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[  162.377811][ T1694] pc : memory_subsys_offline+0x1f8/0x250
[  162.383295][ T1694] lr : memory_subsys_offline+0x1f8/0x250
[  162.388773][ T1694] sp : ffff80002458f8e0
[  162.392773][ T1694] x29: ffff80002458f8e0 x28: ffff800010914d30 x27: 0000000000000000
[  162.400602][ T1694] x26: 0000000000002000 x25: 1fffe00002550401 x24: ffff000012a82008
[  162.408431][ T1694] x23: fffffc0000000000 x22: 0000000000008000 x21: 0000000000000001
[  162.416259][ T1694] x20: ffffffffffffffff x19: ffff000012a82018 x18: ffff0008527b6a70
[  162.424086][ T1694] x17: 0000000000000000 x16: 0000000000000007 x15: 00000000000000c8
[  162.431914][ T1694] x14: 0000000000000000 x13: ffff800011c6eea4 x12: ffff60136ceb8574
[  162.439742][ T1694] x11: 1fffe0136ceb8573 x10: ffff60136ceb8573 x9 : dfff800000000000
[  162.447570][ T1694] x8 : ffff009b675c2b9b x7 : 0000000000000001 x6 : ffff009b675c2b98
[  162.455398][ T1694] x5 : 00009fec93147a8d x4 : ffff009b675c2b98 x3 : 1fffe0010a4f6c09
[  162.463226][ T1694] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000034
[  162.471054][ T1694] Call trace:
[  162.474186][ T1694]  memory_subsys_offline+0x1f8/0x250
[  162.479318][ T1694]  device_offline+0x154/0x1d8
[  162.483844][ T1694]  online_store+0xa4/0x118
[  162.488107][ T1694]  dev_attr_store+0x44/0x78
[  162.492457][ T1694]  sysfs_kf_write+0xe8/0x138
[  162.496896][ T1694]  kernfs_fop_write_iter+0x26c/0x3d0
[  162.502028][ T1694]  new_sync_write+0x2bc/0x4f8
[  162.506552][ T1694]  vfs_write+0x718/0xc88
[  162.510643][ T1694]  ksys_write+0xf8/0x1e0
[  162.514732][ T1694]  __arm64_sys_write+0x74/0xa8
[  162.519342][ T1694]  invoke_syscall.constprop.0+0x78/0x1e8
[  162.524824][ T1694]  do_el0_svc+0xe4/0x298
[  162.528914][ T1694]  el0_svc+0x20/0x30
[  162.532658][ T1694]  el0_sync_handler+0xb0/0xb8
[  162.537181][ T1694]  el0_sync+0x178/0x180
[  162.541187][ T1694] Code: f00033e1 91318021 91090021 97e38d8b (d4210000)
[  162.547968][ T1694] ---[ end trace 2a1964462a219f20 ]---
[  162.553273][ T1694] Kernel panic - not syncing: Oops - BUG: Fatal exception
[  162.560250][ T1694] SMP: stopping secondary CPUs
[  162.564871][ T1694] Kernel Offset: disabled
[  162.569045][ T1694] CPU features: 0x00000251,20000846
[  162.574089][ T1694] Memory Limit: none
[  162.577849][ T1694] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---

If nr_vmemmap_pages is set, we know that we are dealing with hotplugged
memory that doesn't have any holes.  So call
page_zone(pfn_to_page(start_pfn)) only when really necessary -- when
nr_vmemmap_pages is set and we actually adjust the present pages.

Link: https://lkml.kernel.org/r/20210526075226.5572-1-david@redhat.com
Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Qian Cai (QUIC) <quic_qiancai@quicinc.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/base/memory.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/base/memory.c~drivers-base-memory-fix-trying-offlining-memory-blocks-with-memory-holes-on-aarch64
+++ a/drivers/base/memory.c
@@ -218,14 +218,14 @@ static int memory_block_offline(struct m
 	struct zone *zone;
 	int ret;
 
-	zone = page_zone(pfn_to_page(start_pfn));
-
 	/*
 	 * Unaccount before offlining, such that unpopulated zone and kthreads
 	 * can properly be torn down in offline_pages().
 	 */
-	if (nr_vmemmap_pages)
+	if (nr_vmemmap_pages) {
+		zone = page_zone(pfn_to_page(start_pfn));
 		adjust_present_page_count(zone, -nr_vmemmap_pages);
+	}
 
 	ret = offline_pages(start_pfn + nr_vmemmap_pages,
 			    nr_pages - nr_vmemmap_pages);
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch 07/13] hugetlb: pass head page to remove_hugetlb_page()
  2021-06-05  3:00 incoming Andrew Morton
                   ` (5 preceding siblings ...)
  2021-06-05  3:01 ` [patch 06/13] drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64 Andrew Morton
@ 2021-06-05  3:01 ` Andrew Morton
  2021-06-05  3:01 ` [patch 08/13] proc: add .gitignore for proc-subset-pid selftest Andrew Morton
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:01 UTC (permalink / raw)
  To: akpm, linmiaohe, linux-mm, mhocko, mike.kravetz, mm-commits,
	naoya.horiguchi, osalvador, songmuchun, torvalds, willy

From: Naoya Horiguchi <naoya.horiguchi@nec.com>
Subject: hugetlb: pass head page to remove_hugetlb_page()

When memory_failure() or soft_offline_page() is called on a tail page of
some hugetlb page, "BUG: unable to handle page fault" error can be
triggered.

remove_hugetlb_page() dereferences page->lru, so it's assumed that the
page points to a head page, but one of the caller,
dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page'
which could be a tail page.  So pass 'head' to it, instead.

Link: https://lkml.kernel.org/r/20210526235257.2769473-1-nao.horiguchi@gmail.com
Fixes: 6eb4e88a6d27 ("hugetlb: create remove_hugetlb_page() to separate functionality")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/hugetlb.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/hugetlb.c~hugetlb-pass-head-page-to-remove_hugetlb_page
+++ a/mm/hugetlb.c
@@ -1793,7 +1793,7 @@ retry:
 			SetPageHWPoison(page);
 			ClearPageHWPoison(head);
 		}
-		remove_hugetlb_page(h, page, false);
+		remove_hugetlb_page(h, head, false);
 		h->max_huge_pages--;
 		spin_unlock_irq(&hugetlb_lock);
 		update_and_free_page(h, head);
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch 08/13] proc: add .gitignore for proc-subset-pid selftest
  2021-06-05  3:00 incoming Andrew Morton
                   ` (6 preceding siblings ...)
  2021-06-05  3:01 ` [patch 07/13] hugetlb: pass head page to remove_hugetlb_page() Andrew Morton
@ 2021-06-05  3:01 ` Andrew Morton
  2021-06-05  3:01 ` [patch 09/13] mm/kasan/init.c: fix doc warning Andrew Morton
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:01 UTC (permalink / raw)
  To: adobriyan, akpm, christian.brauner, dmatlack, gladkov.alexey,
	linux-mm, mm-commits, shuah, torvalds

From: David Matlack <dmatlack@google.com>
Subject: proc: add .gitignore for proc-subset-pid selftest

This new selftest needs an entry in the .gitignore file otherwise git
will try to track the binary.

Link: https://lkml.kernel.org/r/20210601164305.11776-1-dmatlack@google.com
Fixes: 268af17ada5855 ("selftests: proc: test subset=pid")
Signed-off-by: David Matlack <dmatlack@google.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 tools/testing/selftests/proc/.gitignore |    1 +
 1 file changed, 1 insertion(+)

--- a/tools/testing/selftests/proc/.gitignore~proc-add-gitignore-for-proc-subset-pid-selftest
+++ a/tools/testing/selftests/proc/.gitignore
@@ -10,6 +10,7 @@
 /proc-self-map-files-002
 /proc-self-syscall
 /proc-self-wchan
+/proc-subset-pid
 /proc-uptime-001
 /proc-uptime-002
 /read
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch 09/13] mm/kasan/init.c: fix doc warning
  2021-06-05  3:00 incoming Andrew Morton
                   ` (7 preceding siblings ...)
  2021-06-05  3:01 ` [patch 08/13] proc: add .gitignore for proc-subset-pid selftest Andrew Morton
@ 2021-06-05  3:01 ` Andrew Morton
  2021-06-05  3:01 ` [patch 10/13] mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY Andrew Morton
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:01 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, ryabinin.a.a, torvalds, yi.zhang, yukuai3

From: Yu Kuai <yukuai3@huawei.com>
Subject: mm/kasan/init.c: fix doc warning

Fix gcc W=1 warning:

mm/kasan/init.c:228: warning: Function parameter or member 'shadow_start' not described in 'kasan_populate_early_shadow'
mm/kasan/init.c:228: warning: Function parameter or member 'shadow_end' not described in 'kasan_populate_early_shadow'

Link: https://lkml.kernel.org/r/20210603140700.3045298-1-yukuai3@huawei.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kasan/init.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/kasan/init.c~kasan-fix-doc-warning-in-initc
+++ a/mm/kasan/init.c
@@ -220,8 +220,8 @@ static int __ref zero_p4d_populate(pgd_t
 /**
  * kasan_populate_early_shadow - populate shadow memory region with
  *                               kasan_early_shadow_page
- * @shadow_start - start of the memory range to populate
- * @shadow_end   - end of the memory range to populate
+ * @shadow_start: start of the memory range to populate
+ * @shadow_end: end of the memory range to populate
  */
 int __ref kasan_populate_early_shadow(const void *shadow_start,
 					const void *shadow_end)
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch 10/13] mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
  2021-06-05  3:00 incoming Andrew Morton
                   ` (8 preceding siblings ...)
  2021-06-05  3:01 ` [patch 09/13] mm/kasan/init.c: fix doc warning Andrew Morton
@ 2021-06-05  3:01 ` Andrew Morton
  2021-06-05  3:01 ` [patch 11/13] lib: crc64: fix kernel-doc warning Andrew Morton
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:01 UTC (permalink / raw)
  To: akpm, almasrymina, axelrasmussen, linux-mm, mike.kravetz,
	mm-commits, peterx, stable, torvalds

From: Mina Almasry <almasrymina@google.com>
Subject: mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY

The userfaultfd hugetlb tests cause a resv_huge_pages underflow. This
happens when hugetlb_mcopy_atomic_pte() is called with !is_continue on an
index for which we already have a page in the cache.  When this happens,
we allocate a second page, double consuming the reservation, and then fail
to insert the page into the cache and return -EEXIST.

To fix this, we first check if there is a page in the cache which already
consumed the reservation, and return -EEXIST immediately if so.

There is still a rare condition where we fail to copy the page contents
AND race with a call for hugetlb_no_page() for this index and again we
will underflow resv_huge_pages.  That is fixed in a more complicated patch
not targeted for -stable.

Test:
Hacked the code locally such that resv_huge_pages underflows produce
a warning, then:

./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10
	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
./tools/testing/selftests/vm/userfaultfd hugetlb 10
	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success

Both tests succeed and produce no warnings. After the
test runs number of free/resv hugepages is correct.

[mike.kravetz@oracle.com: changelog fixes]
Link: https://lkml.kernel.org/r/20210528004649.85298-1-almasrymina@google.com
Fixes: 8fb5debc5fcd ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/hugetlb.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

--- a/mm/hugetlb.c~mm-hugetlb-fix-simple-resv_huge_pages-underflow-on-uffdio_copy
+++ a/mm/hugetlb.c
@@ -4889,10 +4889,20 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
 		if (!page)
 			goto out;
 	} else if (!*pagep) {
-		ret = -ENOMEM;
+		/* If a page already exists, then it's UFFDIO_COPY for
+		 * a non-missing case. Return -EEXIST.
+		 */
+		if (vm_shared &&
+		    hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
+			ret = -EEXIST;
+			goto out;
+		}
+
 		page = alloc_huge_page(dst_vma, dst_addr, 0);
-		if (IS_ERR(page))
+		if (IS_ERR(page)) {
+			ret = -ENOMEM;
 			goto out;
+		}
 
 		ret = copy_huge_page_from_user(page,
 						(const void __user *) src_addr,
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch 11/13] lib: crc64: fix kernel-doc warning
  2021-06-05  3:00 incoming Andrew Morton
                   ` (9 preceding siblings ...)
  2021-06-05  3:01 ` [patch 10/13] mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY Andrew Morton
@ 2021-06-05  3:01 ` Andrew Morton
  2021-06-05  3:01 ` [patch 12/13] ocfs2: fix data corruption by fallocate Andrew Morton
  2021-06-05  3:01 ` [patch 13/13] mailmap: use private address for Michel Lespinasse Andrew Morton
  12 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:01 UTC (permalink / raw)
  To: akpm, colyli, linux-mm, mm-commits, rdunlap, torvalds, yuehaibing

From: YueHaibing <yuehaibing@huawei.com>
Subject: lib: crc64: fix kernel-doc warning

Fix W=1 kernel build warning:

lib/crc64.c:40: warning:
 bad line:         or the previous crc64 value if computing incrementally.

Link: https://lkml.kernel.org/r/20210601135851.15444-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Coly Li <colyli@suse.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/crc64.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/lib/crc64.c~lib-crc64-fix-kernel-doc-warning
+++ a/lib/crc64.c
@@ -37,7 +37,7 @@ MODULE_LICENSE("GPL v2");
 /**
  * crc64_be - Calculate bitwise big-endian ECMA-182 CRC64
  * @crc: seed value for computation. 0 or (u64)~0 for a new CRC calculation,
-	or the previous crc64 value if computing incrementally.
+ *       or the previous crc64 value if computing incrementally.
  * @p: pointer to buffer over which CRC64 is run
  * @len: length of buffer @p
  */
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch 12/13] ocfs2: fix data corruption by fallocate
  2021-06-05  3:00 incoming Andrew Morton
                   ` (10 preceding siblings ...)
  2021-06-05  3:01 ` [patch 11/13] lib: crc64: fix kernel-doc warning Andrew Morton
@ 2021-06-05  3:01 ` Andrew Morton
  2021-06-05  3:01 ` [patch 13/13] mailmap: use private address for Michel Lespinasse Andrew Morton
  12 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:01 UTC (permalink / raw)
  To: akpm, gechangwei, ghe, jack, jlbec, joseph.qi, junxiao.bi,
	linux-mm, mark, mm-commits, piaojun, stable, torvalds

From: Junxiao Bi <junxiao.bi@oracle.com>
Subject: ocfs2: fix data corruption by fallocate

When fallocate punches holes out of inode size, if original isize is in
the middle of last cluster, then the part from isize to the end of the
cluster will be zeroed with buffer write, at that time isize is not yet
updated to match the new size, if writeback is kicked in, it will invoke
ocfs2_writepage()->block_write_full_page() where the pages out of inode
size will be dropped.  That will cause file corruption.  Fix this by zero
out eof blocks when extending the inode size.

Running the following command with qemu-image 4.2.1 can get a corrupted
coverted image file easily.

    qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
             -O qcow2 -o compat=1.1 $qcow_image.conv

The usage of fallocate in qemu is like this, it first punches holes out of
inode size, then extend the inode size.

    fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
    fallocate(11, 0, 2276196352, 65536) = 0

v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
v2: https://lore.kernel.org/linux-fsdevel/20210525093034.GB4112@quack2.suse.cz/T/

Link: https://lkml.kernel.org/r/20210528210648.9124-1-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/file.c |   55 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 50 insertions(+), 5 deletions(-)

--- a/fs/ocfs2/file.c~ocfs2-fix-data-corruption-by-fallocate
+++ a/fs/ocfs2/file.c
@@ -1856,6 +1856,45 @@ out:
 }
 
 /*
+ * zero out partial blocks of one cluster.
+ *
+ * start: file offset where zero starts, will be made upper block aligned.
+ * len: it will be trimmed to the end of current cluster if "start + len"
+ *      is bigger than it.
+ */
+static int ocfs2_zeroout_partial_cluster(struct inode *inode,
+					u64 start, u64 len)
+{
+	int ret;
+	u64 start_block, end_block, nr_blocks;
+	u64 p_block, offset;
+	u32 cluster, p_cluster, nr_clusters;
+	struct super_block *sb = inode->i_sb;
+	u64 end = ocfs2_align_bytes_to_clusters(sb, start);
+
+	if (start + len < end)
+		end = start + len;
+
+	start_block = ocfs2_blocks_for_bytes(sb, start);
+	end_block = ocfs2_blocks_for_bytes(sb, end);
+	nr_blocks = end_block - start_block;
+	if (!nr_blocks)
+		return 0;
+
+	cluster = ocfs2_bytes_to_clusters(sb, start);
+	ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
+				&nr_clusters, NULL);
+	if (ret)
+		return ret;
+	if (!p_cluster)
+		return 0;
+
+	offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
+	p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
+	return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
+}
+
+/*
  * Parts of this function taken from xfs_change_file_space()
  */
 static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
@@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(str
 {
 	int ret;
 	s64 llen;
-	loff_t size;
+	loff_t size, orig_isize;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 	struct buffer_head *di_bh = NULL;
 	handle_t *handle;
@@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(str
 		goto out_inode_unlock;
 	}
 
+	orig_isize = i_size_read(inode);
 	switch (sr->l_whence) {
 	case 0: /*SEEK_SET*/
 		break;
@@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(str
 		sr->l_start += f_pos;
 		break;
 	case 2: /*SEEK_END*/
-		sr->l_start += i_size_read(inode);
+		sr->l_start += orig_isize;
 		break;
 	default:
 		ret = -EINVAL;
@@ -1957,6 +1997,14 @@ static int __ocfs2_change_file_space(str
 	default:
 		ret = -EINVAL;
 	}
+
+	/* zeroout eof blocks in the cluster. */
+	if (!ret && change_size && orig_isize < size) {
+		ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
+					size - orig_isize);
+		if (!ret)
+			i_size_write(inode, size);
+	}
 	up_write(&OCFS2_I(inode)->ip_alloc_sem);
 	if (ret) {
 		mlog_errno(ret);
@@ -1973,9 +2021,6 @@ static int __ocfs2_change_file_space(str
 		goto out_inode_unlock;
 	}
 
-	if (change_size && i_size_read(inode) < size)
-		i_size_write(inode, size);
-
 	inode->i_ctime = inode->i_mtime = current_time(inode);
 	ret = ocfs2_mark_inode_dirty(handle, inode, di_bh);
 	if (ret < 0)
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [patch 13/13] mailmap: use private address for Michel Lespinasse
  2021-06-05  3:00 incoming Andrew Morton
                   ` (11 preceding siblings ...)
  2021-06-05  3:01 ` [patch 12/13] ocfs2: fix data corruption by fallocate Andrew Morton
@ 2021-06-05  3:01 ` Andrew Morton
  12 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-06-05  3:01 UTC (permalink / raw)
  To: akpm, corbet, linux-mm, michel, mm-commits, torvalds

From: Michel Lespinasse <michel@lespinasse.org>
Subject: mailmap: use private address for Michel Lespinasse

Link: https://lkml.kernel.org/r/20210602221225.49446-1-michel@lespinasse.org
Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 .mailmap |    3 +++
 1 file changed, 3 insertions(+)

--- a/.mailmap~mailmap-use-private-address-for-michel-lespinasse
+++ a/.mailmap
@@ -243,6 +243,9 @@ Maxime Ripard <mripard@kernel.org> <maxi
 Mayuresh Janorkar <mayur@ti.com>
 Michael Buesch <m@bues.ch>
 Michel Dänzer <michel@tungstengraphics.com>
+Michel Lespinasse <michel@lespinasse.org>
+Michel Lespinasse <michel@lespinasse.org> <walken@google.com>
+Michel Lespinasse <michel@lespinasse.org> <walken@zoy.org>
 Miguel Ojeda <ojeda@kernel.org> <miguel.ojeda.sandonis@gmail.com>
 Mike Rapoport <rppt@kernel.org> <mike@compulab.co.il>
 Mike Rapoport <rppt@kernel.org> <mike.rapoport@gmail.com>
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-06-05  7:16 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-05  3:00 incoming Andrew Morton
2021-06-05  3:01 ` [patch 01/13] Revert "MIPS: make userspace mapping young by default" Andrew Morton
2021-06-05  3:01 ` [patch 02/13] kfence: use TASK_IDLE when awaiting allocation Andrew Morton
2021-06-05  3:01 ` [patch 03/13] pid: take a reference when initializing `cad_pid` Andrew Morton
2021-06-05  3:01 ` [patch 04/13] mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests() Andrew Morton
2021-06-05  3:01 ` [patch 05/13] mm/page_alloc: fix counting of free pages after take off from buddy Andrew Morton
2021-06-05  3:01 ` [patch 06/13] drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64 Andrew Morton
2021-06-05  3:01 ` [patch 07/13] hugetlb: pass head page to remove_hugetlb_page() Andrew Morton
2021-06-05  3:01 ` [patch 08/13] proc: add .gitignore for proc-subset-pid selftest Andrew Morton
2021-06-05  3:01 ` [patch 09/13] mm/kasan/init.c: fix doc warning Andrew Morton
2021-06-05  3:01 ` [patch 10/13] mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY Andrew Morton
2021-06-05  3:01 ` [patch 11/13] lib: crc64: fix kernel-doc warning Andrew Morton
2021-06-05  3:01 ` [patch 12/13] ocfs2: fix data corruption by fallocate Andrew Morton
2021-06-05  3:01 ` [patch 13/13] mailmap: use private address for Michel Lespinasse Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).