All of lore.kernel.org
 help / color / mirror / Atom feed
* incoming
@ 2020-11-22  6:16 Andrew Morton
  2020-11-22  6:16 ` [patch 1/8] mm/madvise: fix memory leak from process_madvise Andrew Morton
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22  6:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

8 patches, based on a349e4c659609fd20e4beea89e5c4a4038e33a95.

Subsystems affected by this patch series:

  mm/madvise
  kbuild
  mm/pagemap
  mm/readahead
  mm/memcg
  mm/userfaultfd
  vfs-akpm
  mm/madvise

Subsystem: mm/madvise

    Eric Dumazet <edumazet@google.com>:
      mm/madvise: fix memory leak from process_madvise

Subsystem: kbuild

    Nick Desaulniers <ndesaulniers@google.com>:
      compiler-clang: remove version check for BPF Tracing

Subsystem: mm/pagemap

    Dan Williams <dan.j.williams@intel.com>:
      mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports

Subsystem: mm/readahead

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm: fix readahead_page_batch for retry entries

Subsystem: mm/memcg

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcg/slab: fix root memcg vmstats

Subsystem: mm/userfaultfd

    Gerald Schaefer <gerald.schaefer@linux.ibm.com>:
      mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()

Subsystem: vfs-akpm

    Yicong Yang <yangyicong@hisilicon.com>:
      libfs: fix error cast of negative value in simple_attr_write()

Subsystem: mm/madvise

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm: fix madvise WILLNEED performance problem

 arch/ia64/include/asm/sparsemem.h    |    6 ++++++
 arch/powerpc/include/asm/mmzone.h    |    5 +++++
 arch/powerpc/include/asm/sparsemem.h |    5 ++---
 arch/powerpc/mm/mem.c                |    1 +
 arch/x86/include/asm/sparsemem.h     |   10 ++++++++++
 arch/x86/mm/numa.c                   |    2 ++
 drivers/dax/Kconfig                  |    1 -
 fs/libfs.c                           |    6 ++++--
 include/linux/compiler-clang.h       |    2 ++
 include/linux/memory_hotplug.h       |   14 --------------
 include/linux/numa.h                 |   30 +++++++++++++++++++++++++++++-
 include/linux/pagemap.h              |    2 ++
 mm/huge_memory.c                     |    9 ++++-----
 mm/madvise.c                         |    4 +---
 mm/memcontrol.c                      |    9 +++++++--
 mm/memory_hotplug.c                  |   18 ------------------
 16 files changed, 75 insertions(+), 49 deletions(-)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch 1/8] mm/madvise: fix memory leak from process_madvise
  2020-11-22  6:16 incoming Andrew Morton
@ 2020-11-22  6:16 ` Andrew Morton
  2020-11-22  6:17 ` [patch 2/8] compiler-clang: remove version check for BPF Tracing Andrew Morton
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22  6:16 UTC (permalink / raw)
  To: akpm, edumazet, linux-mm, minchan, mm-commits, torvalds

From: Eric Dumazet <edumazet@google.com>
Subject: mm/madvise: fix memory leak from process_madvise

The early return in process_madvise() will produce a memory leak.  Fix it.

Link: https://lkml.kernel.org/r/20201116155132.GA3805951@google.com
Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/madvise.c |    2 --
 1 file changed, 2 deletions(-)

--- a/mm/madvise.c~mm-madvise-fix-memory-leak-from-process_madvise
+++ a/mm/madvise.c
@@ -1231,8 +1231,6 @@ SYSCALL_DEFINE5(process_madvise, int, pi
 		ret = total_len - iov_iter_count(&iter);
 
 	mmput(mm);
-	return ret;
-
 release_task:
 	put_task_struct(task);
 put_pid:
_

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch 2/8] compiler-clang: remove version check for BPF Tracing
  2020-11-22  6:16 incoming Andrew Morton
  2020-11-22  6:16 ` [patch 1/8] mm/madvise: fix memory leak from process_madvise Andrew Morton
@ 2020-11-22  6:17 ` Andrew Morton
  2020-11-22  6:17 ` [patch 3/8] mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports Andrew Morton
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22  6:17 UTC (permalink / raw)
  To: akpm, jarkko, linux-mm, mm-commits, natechancellor, ndesaulniers,
	ojeda, songliubraving, torvalds, yu.chen.surf

From: Nick Desaulniers <ndesaulniers@google.com>
Subject: compiler-clang: remove version check for BPF Tracing

bpftrace parses the kernel headers and uses Clang under the hood.  Remove
the version check when __BPF_TRACING__ is defined (as bpftrace does) so
that this tool can continue to parse kernel headers, even with older clang
sources.

Link: https://lkml.kernel.org/r/20201104191052.390657-1-ndesaulniers@google.com
Fixes: commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1")
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Chen Yu <yu.chen.surf@gmail.com>
Reported-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/compiler-clang.h |    2 ++
 1 file changed, 2 insertions(+)

--- a/include/linux/compiler-clang.h~compiler-clang-remove-version-check-for-bpf-tracing
+++ a/include/linux/compiler-clang.h
@@ -8,8 +8,10 @@
 		     + __clang_patchlevel__)
 
 #if CLANG_VERSION < 100001
+#ifndef __BPF_TRACING__
 # error Sorry, your version of Clang is too old - please use 10.0.1 or newer.
 #endif
+#endif
 
 /* Compiler specific definitions for Clang compiler */
 
_

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch 3/8] mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
  2020-11-22  6:16 incoming Andrew Morton
  2020-11-22  6:16 ` [patch 1/8] mm/madvise: fix memory leak from process_madvise Andrew Morton
  2020-11-22  6:17 ` [patch 2/8] compiler-clang: remove version check for BPF Tracing Andrew Morton
@ 2020-11-22  6:17 ` Andrew Morton
  2020-11-22  6:17 ` [patch 4/8] mm: fix readahead_page_batch for retry entries Andrew Morton
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22  6:17 UTC (permalink / raw)
  To: akpm, benh, dan.j.williams, fenghua.yu, hch, hch, joao.m.martins,
	linux-mm, lkp, mm-commits, mpe, paulus, rdunlap, sfr, tglx,
	tony.luck, torvalds, vishal.l.verma

From: Dan Williams <dan.j.williams@intel.com>
Subject: mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports

The core-mm has a default __weak implementation of phys_to_target_node()
to mirror the weak definition of memory_add_physaddr_to_nid().  That
symbol is exported for modules.  However, while the export in
mm/memory_hotplug.c exported the symbol in the configuration cases of:

	CONFIG_NUMA_KEEP_MEMINFO=y
	CONFIG_MEMORY_HOTPLUG=y

...and:

	CONFIG_NUMA_KEEP_MEMINFO=n
	CONFIG_MEMORY_HOTPLUG=y

...it failed to export the symbol in the case of:

	CONFIG_NUMA_KEEP_MEMINFO=y
	CONFIG_MEMORY_HOTPLUG=n

Not only is that broken, but Christoph points out that the kernel should
not be exporting any __weak symbol, which means that
memory_add_physaddr_to_nid() example that phys_to_target_node() copied is
broken too.

Rework the definition of phys_to_target_node() and
memory_add_physaddr_to_nid() to not require weak symbols.  Move to the
common arch override design-pattern of an asm header defining a symbol to
replace the default implementation.

The only common header that all memory_add_physaddr_to_nid() producing
architectures implement is asm/sparsemem.h.  In fact, powerpc already
defines its memory_add_physaddr_to_nid() helper in sparsemem.h. 
Double-down on that observation and define phys_to_target_node() where
necessary in asm/sparsemem.h.  An alternate consideration that was
discarded was to put this override in asm/numa.h, but that entangles with
the definition of MAX_NUMNODES relative to the inclusion of
linux/nodemask.h, and requires powerpc to grow a new header.

The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid
now that the symbol is properly exported / stubbed in all combinations of
CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.

[dan.j.williams@intel.com: v4]
  Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
[dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning]
  Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/ia64/include/asm/sparsemem.h    |    6 +++++
 arch/powerpc/include/asm/mmzone.h    |    5 ++++
 arch/powerpc/include/asm/sparsemem.h |    5 +---
 arch/powerpc/mm/mem.c                |    1 
 arch/x86/include/asm/sparsemem.h     |   10 ++++++++
 arch/x86/mm/numa.c                   |    2 +
 drivers/dax/Kconfig                  |    1 
 include/linux/memory_hotplug.h       |   14 -----------
 include/linux/numa.h                 |   30 ++++++++++++++++++++++++-
 mm/memory_hotplug.c                  |   18 ---------------
 10 files changed, 55 insertions(+), 37 deletions(-)

--- a/arch/ia64/include/asm/sparsemem.h~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/arch/ia64/include/asm/sparsemem.h
@@ -18,4 +18,10 @@
 #endif
 
 #endif /* CONFIG_SPARSEMEM */
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int memory_add_physaddr_to_nid(u64 addr);
+#define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
+#endif
+
 #endif /* _ASM_IA64_SPARSEMEM_H */
--- a/arch/powerpc/include/asm/mmzone.h~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/arch/powerpc/include/asm/mmzone.h
@@ -46,5 +46,10 @@ u64 memory_hotplug_max(void);
 #define __HAVE_ARCH_RESERVED_KERNEL_PAGES
 #endif
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern int create_section_mapping(unsigned long start, unsigned long end,
+				  int nid, pgprot_t prot);
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_MMZONE_H_ */
--- a/arch/powerpc/include/asm/sparsemem.h~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/arch/powerpc/include/asm/sparsemem.h
@@ -13,9 +13,9 @@
 #endif /* CONFIG_SPARSEMEM */
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-extern int create_section_mapping(unsigned long start, unsigned long end,
-				  int nid, pgprot_t prot);
 extern int remove_section_mapping(unsigned long start, unsigned long end);
+extern int memory_add_physaddr_to_nid(u64 start);
+#define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
 
 #ifdef CONFIG_NUMA
 extern int hot_add_scn_to_nid(unsigned long scn_addr);
@@ -26,6 +26,5 @@ static inline int hot_add_scn_to_nid(uns
 }
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_MEMORY_HOTPLUG */
-
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_SPARSEMEM_H */
--- a/arch/powerpc/mm/mem.c~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/arch/powerpc/mm/mem.c
@@ -50,6 +50,7 @@
 #include <asm/rtas.h>
 #include <asm/kasan.h>
 #include <asm/svm.h>
+#include <asm/mmzone.h>
 
 #include <mm/mmu_decl.h>
 
--- a/arch/x86/include/asm/sparsemem.h~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/arch/x86/include/asm/sparsemem.h
@@ -28,4 +28,14 @@
 #endif
 
 #endif /* CONFIG_SPARSEMEM */
+
+#ifndef __ASSEMBLY__
+#ifdef CONFIG_NUMA_KEEP_MEMINFO
+extern int phys_to_target_node(phys_addr_t start);
+#define phys_to_target_node phys_to_target_node
+extern int memory_add_physaddr_to_nid(u64 start);
+#define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
+#endif
+#endif /* __ASSEMBLY__ */
+
 #endif /* _ASM_X86_SPARSEMEM_H */
--- a/arch/x86/mm/numa.c~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/arch/x86/mm/numa.c
@@ -938,6 +938,7 @@ int phys_to_target_node(phys_addr_t star
 
 	return meminfo_to_nid(&numa_reserved_meminfo, start);
 }
+EXPORT_SYMBOL_GPL(phys_to_target_node);
 
 int memory_add_physaddr_to_nid(u64 start)
 {
@@ -947,4 +948,5 @@ int memory_add_physaddr_to_nid(u64 start
 		nid = numa_meminfo.blk[0].nid;
 	return nid;
 }
+EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
--- a/drivers/dax/Kconfig~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/drivers/dax/Kconfig
@@ -50,7 +50,6 @@ config DEV_DAX_HMEM
 	  Say M if unsure.
 
 config DEV_DAX_HMEM_DEVICES
-	depends on NUMA_KEEP_MEMINFO # for phys_to_target_node()
 	depends on DEV_DAX_HMEM && DAX=y
 	def_bool y
 
--- a/include/linux/memory_hotplug.h~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/include/linux/memory_hotplug.h
@@ -281,20 +281,6 @@ static inline bool movable_node_is_enabl
 }
 #endif /* ! CONFIG_MEMORY_HOTPLUG */
 
-#ifdef CONFIG_NUMA
-extern int memory_add_physaddr_to_nid(u64 start);
-extern int phys_to_target_node(u64 start);
-#else
-static inline int memory_add_physaddr_to_nid(u64 start)
-{
-	return 0;
-}
-static inline int phys_to_target_node(u64 start)
-{
-	return 0;
-}
-#endif
-
 #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
 /*
  * pgdat resizing functions
--- a/include/linux/numa.h~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/include/linux/numa.h
@@ -21,13 +21,41 @@
 #endif
 
 #ifdef CONFIG_NUMA
+#include <linux/printk.h>
+#include <asm/sparsemem.h>
+
 /* Generic implementation available */
 int numa_map_to_online_node(int node);
-#else
+
+#ifndef memory_add_physaddr_to_nid
+static inline int memory_add_physaddr_to_nid(u64 start)
+{
+	pr_info_once("Unknown online node for memory at 0x%llx, assuming node 0\n",
+			start);
+	return 0;
+}
+#endif
+#ifndef phys_to_target_node
+static inline int phys_to_target_node(u64 start)
+{
+	pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",
+			start);
+	return 0;
+}
+#endif
+#else /* !CONFIG_NUMA */
 static inline int numa_map_to_online_node(int node)
 {
 	return NUMA_NO_NODE;
 }
+static inline int memory_add_physaddr_to_nid(u64 start)
+{
+	return 0;
+}
+static inline int phys_to_target_node(u64 start)
+{
+	return 0;
+}
 #endif
 
 #endif /* _LINUX_NUMA_H */
--- a/mm/memory_hotplug.c~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/mm/memory_hotplug.c
@@ -350,24 +350,6 @@ int __ref __add_pages(int nid, unsigned
 	return err;
 }
 
-#ifdef CONFIG_NUMA
-int __weak memory_add_physaddr_to_nid(u64 start)
-{
-	pr_info_once("Unknown online node for memory at 0x%llx, assuming node 0\n",
-			start);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
-
-int __weak phys_to_target_node(u64 start)
-{
-	pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",
-			start);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(phys_to_target_node);
-#endif
-
 /* find the smallest valid pfn in the range [start_pfn, end_pfn) */
 static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
 				     unsigned long start_pfn,
_

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch 4/8] mm: fix readahead_page_batch for retry entries
  2020-11-22  6:16 incoming Andrew Morton
                   ` (2 preceding siblings ...)
  2020-11-22  6:17 ` [patch 3/8] mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports Andrew Morton
@ 2020-11-22  6:17 ` Andrew Morton
  2020-11-22  6:17 ` [patch 5/8] mm: memcg/slab: fix root memcg vmstats Andrew Morton
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22  6:17 UTC (permalink / raw)
  To: akpm, dsterba, linux-mm, mm-commits, stable, torvalds, vvghjk1234, willy

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: fix readahead_page_batch for retry entries

Both btrfs and fuse have reported faults caused by seeing a retry entry
instead of the page they were looking for.  This was caused by a missing
check in the iterator.

As can be seen in the below panic log, the accessing 0x402 causes a
panic.  In the xarray.h, 0x402 means RETRY_ENTRY.

BUG: kernel NULL pointer dereference, address: 0000000000000402
CPU: 14 PID: 306003 Comm: as Not tainted 5.9.0-1-amd64 #1 Debian 5.9.1-1
Hardware name: Lenovo ThinkSystem SR665/7D2VCTO1WW, BIOS D8E106Q-1.01 05/30/2020
RIP: 0010:fuse_readahead+0x152/0x470 [fuse]
Code: 41 8b 57 18 4c 8d 54 10 ff 4c 89 d6 48 8d 7c 24 10 e8 d2 e3 28
f9 48 85 c0 0f 84 fe 00 00 00 44 89 f2 49 89 04 d4 44 8d 72 01 <48> 8b
10 41 8b 4f 1c 48 c1 ea 10 83 e2 01 80 fa 01 19 d2 81 e2 01
RSP: 0018:ffffad99ceaebc50 EFLAGS: 00010246
RAX: 0000000000000402 RBX: 0000000000000001 RCX: 0000000000000002
RDX: 0000000000000000 RSI: ffff94c5af90bd98 RDI: ffffad99ceaebc60
RBP: ffff94ddc1749a00 R08: 0000000000000402 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000100 R12: ffff94de6c429ce0
R13: ffff94de6c4d3700 R14: 0000000000000001 R15: ffffad99ceaebd68
FS:  00007f228c5c7040(0000) GS:ffff94de8ed80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000402 CR3: 0000001dbd9b4000 CR4: 0000000000350ee0
Call Trace:
  read_pages+0x83/0x270
  page_cache_readahead_unbounded+0x197/0x230
  generic_file_buffered_read+0x57a/0xa20
  new_sync_read+0x112/0x1a0
  vfs_read+0xf8/0x180
  ksys_read+0x5f/0xe0
  do_syscall_64+0x33/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: https://lkml.kernel.org/r/20201103142852.8543-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20201103124349.16722-1-vvghjk1234@gmail.com
Fixes: 042124cc64c3 ("mm: add new readahead_control API")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: David Sterba <dsterba@suse.com>
Reported-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/pagemap.h |    2 ++
 1 file changed, 2 insertions(+)

--- a/include/linux/pagemap.h~mm-fix-readahead_page_batch-for-retry-entries
+++ a/include/linux/pagemap.h
@@ -906,6 +906,8 @@ static inline unsigned int __readahead_b
 	xas_set(&xas, rac->_index);
 	rcu_read_lock();
 	xas_for_each(&xas, page, rac->_index + rac->_nr_pages - 1) {
+		if (xas_retry(&xas, page))
+			continue;
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
 		VM_BUG_ON_PAGE(PageTail(page), page);
 		array[i++] = page;
_

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch 5/8] mm: memcg/slab: fix root memcg vmstats
  2020-11-22  6:16 incoming Andrew Morton
                   ` (3 preceding siblings ...)
  2020-11-22  6:17 ` [patch 4/8] mm: fix readahead_page_batch for retry entries Andrew Morton
@ 2020-11-22  6:17 ` Andrew Morton
  2020-11-22  6:17 ` [patch 6/8] mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault() Andrew Morton
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22  6:17 UTC (permalink / raw)
  To: akpm, chris, cl, guro, hannes, iamjoonsoo.kim, laoar.shao,
	linux-mm, mhocko, mm-commits, penberg, rientjes, shakeelb,
	songmuchun, stable, torvalds, vbabka, vdavydov.dev

From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: memcg/slab: fix root memcg vmstats

If we reparent the slab objects to the root memcg, when we free the slab
object, we need to update the per-memcg vmstats to keep it correct for the
root memcg.  Now this at least affects the vmstat of NR_KERNEL_STACK_KB
for !CONFIG_VMAP_STACK when the thread stack size is smaller than the
PAGE_SIZE.

David said: "I assume that without this fix that the root memcg's
vmstat would always be inflated if we reparented."

Link: https://lkml.kernel.org/r/20201110031015.15715-1-songmuchun@bytedance.com
Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: <stable@vger.kernel.org>	[5.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/mm/memcontrol.c~mm-memcg-slab-fix-root-memcg-vmstats
+++ a/mm/memcontrol.c
@@ -867,8 +867,13 @@ void __mod_lruvec_slab_state(void *p, en
 	rcu_read_lock();
 	memcg = mem_cgroup_from_obj(p);
 
-	/* Untracked pages have no memcg, no lruvec. Update only the node */
-	if (!memcg || memcg == root_mem_cgroup) {
+	/*
+	 * Untracked pages have no memcg, no lruvec. Update only the
+	 * node. If we reparent the slab objects to the root memcg,
+	 * when we free the slab object, we need to update the per-memcg
+	 * vmstats to keep it correct for the root memcg.
+	 */
+	if (!memcg) {
 		__mod_node_page_state(pgdat, idx, val);
 	} else {
 		lruvec = mem_cgroup_lruvec(memcg, pgdat);
_

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch 6/8] mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
  2020-11-22  6:16 incoming Andrew Morton
                   ` (4 preceding siblings ...)
  2020-11-22  6:17 ` [patch 5/8] mm: memcg/slab: fix root memcg vmstats Andrew Morton
@ 2020-11-22  6:17 ` Andrew Morton
  2020-11-22  6:17 ` [patch 7/8] libfs: fix error cast of negative value in simple_attr_write() Andrew Morton
  2020-11-22  6:17 ` [patch 8/8] mm: fix madvise WILLNEED performance problem Andrew Morton
  7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22  6:17 UTC (permalink / raw)
  To: aarcange, akpm, egorenar, gerald.schaefer, hca, linux-mm,
	mm-commits, stable, torvalds

From: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Subject: mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()

Alexander reported a syzkaller / KASAN finding on s390, see below for
complete output.

In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be freed
in some cases.  In the case of userfaultfd_missing(), this will happen
after calling handle_userfault(), which might have released the mmap_lock.
Therefore, the following pte_free(vma->vm_mm, pgtable) will access an
unstable vma->vm_mm, which could have been freed or re-used already.

For all architectures other than s390 this will go w/o any negative
impact, because pte_free() simply frees the page and ignores the passed-in
mm.  The implementation for SPARC32 would also access mm->page_table_lock
for pte_free(), but there is no THP support in SPARC32, so the buggy code
path will not be used there.

For s390, the mm->context.pgtable_list is being used to maintain the 2K
pagetable fragments, and operating on an already freed or even re-used mm
could result in various more or less subtle bugs due to list / pagetable
corruption.

Fix this by calling pte_free() before handle_userfault(), similar to how
it is already done in __do_huge_pmd_anonymous_page() for the WRITE /
non-huge_zero_page case.

Commit 6b251fc96cf2c ("userfaultfd: call handle_userfault() for
userfaultfd_missing() faults") actually introduced both, the
do_huge_pmd_anonymous_page() and also __do_huge_pmd_anonymous_page()
changes wrt to calling handle_userfault(), but only in the latter case it
put the pte_free() before calling handle_userfault().

==================================================================
BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334

CPU: 1 PID: 9334 Comm: syz-executor.0 Not tainted 5.10.0-rc1-syzkaller-07083-g4c9720875573 #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
 [<00000000aa0a7a1c>] unwind_start arch/s390/include/asm/unwind.h:65 [inline]
 [<00000000aa0a7a1c>] show_stack+0x174/0x220 arch/s390/kernel/dumpstack.c:135
 [<00000000aa105952>] __dump_stack lib/dump_stack.c:77 [inline]
 [<00000000aa105952>] dump_stack+0x262/0x2e8 lib/dump_stack.c:118
 [<00000000aa0b484e>] print_address_description.constprop.0+0x5e/0x218 mm/kasan/report.c:385
 [<00000000a61f13aa>] __kasan_report mm/kasan/report.c:545 [inline]
 [<00000000a61f13aa>] kasan_report+0x11a/0x168 mm/kasan/report.c:562
 [<00000000a620d782>] do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
 [<00000000a610632e>] create_huge_pmd mm/memory.c:4256 [inline]
 [<00000000a610632e>] __handle_mm_fault+0xe6e/0x1068 mm/memory.c:4480
 [<00000000a61067b0>] handle_mm_fault+0x288/0x748 mm/memory.c:4607
 [<00000000a598b55c>] do_exception+0x394/0xae0 arch/s390/mm/fault.c:479
 [<00000000a598d7c4>] do_dat_exception+0x34/0x80 arch/s390/mm/fault.c:567
 [<00000000aa124e5e>] pgm_check_handler+0x1da/0x22c arch/s390/kernel/entry.S:706
 [<00000000aa0a6902>] copy_from_user_mvcos arch/s390/lib/uaccess.c:111 [inline]
 [<00000000aa0a6902>] raw_copy_from_user+0x3a/0x88 arch/s390/lib/uaccess.c:174
 [<00000000a7c24668>] _copy_from_user+0x48/0xa8 lib/usercopy.c:16
 [<00000000a5b0b2a8>] copy_from_user include/linux/uaccess.h:192 [inline]
 [<00000000a5b0b2a8>] __do_sys_sigaltstack kernel/signal.c:4064 [inline]
 [<00000000a5b0b2a8>] __s390x_sys_sigaltstack+0xc8/0x240 kernel/signal.c:4060
 [<00000000aa124a9c>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415

Allocated by task 9334:
 stack_trace_save+0xbe/0xf0 kernel/stacktrace.c:121
 kasan_save_stack+0x30/0x60 mm/kasan/common.c:48
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc.constprop.0+0xd0/0xe8 mm/kasan/common.c:461
 slab_post_alloc_hook mm/slab.h:526 [inline]
 slab_alloc_node mm/slub.c:2891 [inline]
 slab_alloc mm/slub.c:2899 [inline]
 kmem_cache_alloc+0x118/0x348 mm/slub.c:2904
 vm_area_dup+0x9c/0x2b8 kernel/fork.c:356
 __split_vma+0xba/0x560 mm/mmap.c:2742
 split_vma+0xca/0x108 mm/mmap.c:2800
 mlock_fixup+0x4ae/0x600 mm/mlock.c:550
 apply_vma_lock_flags+0x2c6/0x398 mm/mlock.c:619
 do_mlock+0x1aa/0x718 mm/mlock.c:711
 __do_sys_mlock2 mm/mlock.c:738 [inline]
 __s390x_sys_mlock2+0x86/0xa8 mm/mlock.c:728
 system_call+0xe0/0x28c arch/s390/kernel/entry.S:415

Freed by task 9333:
 stack_trace_save+0xbe/0xf0 kernel/stacktrace.c:121
 kasan_save_stack+0x30/0x60 mm/kasan/common.c:48
 kasan_set_track+0x32/0x48 mm/kasan/common.c:56
 kasan_set_free_info+0x34/0x50 mm/kasan/generic.c:355
 __kasan_slab_free+0x11e/0x190 mm/kasan/common.c:422
 slab_free_hook mm/slub.c:1544 [inline]
 slab_free_freelist_hook mm/slub.c:1577 [inline]
 slab_free mm/slub.c:3142 [inline]
 kmem_cache_free+0x7c/0x4b8 mm/slub.c:3158
 __vma_adjust+0x7b2/0x2508 mm/mmap.c:960
 vma_merge+0x87e/0xce0 mm/mmap.c:1209
 userfaultfd_release+0x412/0x6b8 fs/userfaultfd.c:868
 __fput+0x22c/0x7a8 fs/file_table.c:281
 task_work_run+0x200/0x320 kernel/task_work.c:151
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 do_notify_resume+0x100/0x148 arch/s390/kernel/signal.c:538
 system_call+0xe6/0x28c arch/s390/kernel/entry.S:416

The buggy address belongs to the object at 00000000962d6948
 which belongs to the cache vm_area_struct of size 200
The buggy address is located 64 bytes inside of
 200-byte region [00000000962d6948, 00000000962d6a10)
The buggy address belongs to the page:
page:00000000313a09fe refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x962d6
flags: 0x3ffff00000000200(slab)
raw: 3ffff00000000200 000040000257e080 0000000c0000000c 000000008020ba00
raw: 0000000000000000 000f001e00000000 ffffffff00000001 0000000096959501
page dumped because: kasan: bad access detected
page->mem_cgroup:0000000096959501

Memory state around the buggy address:
 00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
>00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
 00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00
 00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

Link: https://lkml.kernel.org/r/20201110190329.11920-1-gerald.schaefer@linux.ibm.com
Fixes: 6b251fc96cf2c ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults")
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: <stable@vger.kernel.org>	[4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/huge_memory.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/mm/huge_memory.c~mm-userfaultfd-do-not-access-vma-vm_mm-after-calling-handle_userfault
+++ a/mm/huge_memory.c
@@ -710,7 +710,6 @@ vm_fault_t do_huge_pmd_anonymous_page(st
 			transparent_hugepage_use_zero_page()) {
 		pgtable_t pgtable;
 		struct page *zero_page;
-		bool set;
 		vm_fault_t ret;
 		pgtable = pte_alloc_one(vma->vm_mm);
 		if (unlikely(!pgtable))
@@ -723,25 +722,25 @@ vm_fault_t do_huge_pmd_anonymous_page(st
 		}
 		vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
 		ret = 0;
-		set = false;
 		if (pmd_none(*vmf->pmd)) {
 			ret = check_stable_address_space(vma->vm_mm);
 			if (ret) {
 				spin_unlock(vmf->ptl);
+				pte_free(vma->vm_mm, pgtable);
 			} else if (userfaultfd_missing(vma)) {
 				spin_unlock(vmf->ptl);
+				pte_free(vma->vm_mm, pgtable);
 				ret = handle_userfault(vmf, VM_UFFD_MISSING);
 				VM_BUG_ON(ret & VM_FAULT_FALLBACK);
 			} else {
 				set_huge_zero_page(pgtable, vma->vm_mm, vma,
 						   haddr, vmf->pmd, zero_page);
 				spin_unlock(vmf->ptl);
-				set = true;
 			}
-		} else
+		} else {
 			spin_unlock(vmf->ptl);
-		if (!set)
 			pte_free(vma->vm_mm, pgtable);
+		}
 		return ret;
 	}
 	gfp = alloc_hugepage_direct_gfpmask(vma);
_

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch 7/8] libfs: fix error cast of negative value in simple_attr_write()
  2020-11-22  6:16 incoming Andrew Morton
                   ` (5 preceding siblings ...)
  2020-11-22  6:17 ` [patch 6/8] mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault() Andrew Morton
@ 2020-11-22  6:17 ` Andrew Morton
  2020-11-22  6:17 ` [patch 8/8] mm: fix madvise WILLNEED performance problem Andrew Morton
  7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22  6:17 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, torvalds, viro, yangyicong

From: Yicong Yang <yangyicong@hisilicon.com>
Subject: libfs: fix error cast of negative value in simple_attr_write()

The attr->set() receive a value of u64, but simple_strtoll() is used for
doing the conversion.  It will lead to the error cast if user inputs a
negative value.

Use kstrtoull() instead of simple_strtoll() to convert a string got from
the user to an unsigned value.  The former will return '-EINVAL' if it
gets a negetive value, but the latter can't handle the situation
correctly.  Make 'val' unsigned long long as what kstrtoull() takes, this
will eliminate the compile warning on no 64-bit architectures.

Link: https://lkml.kernel.org/r/1605341356-11872-1-git-send-email-yangyicong@hisilicon.com
Fixes: f7b88631a897 ("fs/libfs.c: fix simple_attr_write() on 32bit machines")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/libfs.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/fs/libfs.c~libfs-fix-error-cast-of-negative-value-in-simple_attr_write
+++ a/fs/libfs.c
@@ -959,7 +959,7 @@ ssize_t simple_attr_write(struct file *f
 			  size_t len, loff_t *ppos)
 {
 	struct simple_attr *attr;
-	u64 val;
+	unsigned long long val;
 	size_t size;
 	ssize_t ret;
 
@@ -977,7 +977,9 @@ ssize_t simple_attr_write(struct file *f
 		goto out;
 
 	attr->set_buf[size] = '\0';
-	val = simple_strtoll(attr->set_buf, NULL, 0);
+	ret = kstrtoull(attr->set_buf, 0, &val);
+	if (ret)
+		goto out;
 	ret = attr->set(attr->data, val);
 	if (ret == 0)
 		ret = len; /* on success, claim we got the whole input */
_

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch 8/8] mm: fix madvise WILLNEED performance problem
  2020-11-22  6:16 incoming Andrew Morton
                   ` (6 preceding siblings ...)
  2020-11-22  6:17 ` [patch 7/8] libfs: fix error cast of negative value in simple_attr_write() Andrew Morton
@ 2020-11-22  6:17 ` Andrew Morton
  7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22  6:17 UTC (permalink / raw)
  To: akpm, feng.tang, hannes, linux-mm, mm-commits, rong.a.chen,
	torvalds, william.kucharski, willy, zhengjun.xing

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: fix madvise WILLNEED performance problem

The calculation of the end page index was incorrect, leading to a
regression of 70% when running stress-ng.  With this fix, we instead
see a performance improvement of 3%.

Link: https://lkml.kernel.org/r/20201109134851.29692-1-willy@infradead.org
Fixes: e6e88712e43b ("mm: optimise madvise WILLNEED")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/madvise.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/madvise.c~mm-fix-madvise-willneed-performance-problem
+++ a/mm/madvise.c
@@ -226,7 +226,7 @@ static void force_shm_swapin_readahead(s
 		struct address_space *mapping)
 {
 	XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start));
-	pgoff_t end_index = end / PAGE_SIZE;
+	pgoff_t end_index = linear_page_index(vma, end + PAGE_SIZE - 1);
 	struct page *page;
 
 	rcu_read_lock();
_

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-11-22  6:17 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-22  6:16 incoming Andrew Morton
2020-11-22  6:16 ` [patch 1/8] mm/madvise: fix memory leak from process_madvise Andrew Morton
2020-11-22  6:17 ` [patch 2/8] compiler-clang: remove version check for BPF Tracing Andrew Morton
2020-11-22  6:17 ` [patch 3/8] mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports Andrew Morton
2020-11-22  6:17 ` [patch 4/8] mm: fix readahead_page_batch for retry entries Andrew Morton
2020-11-22  6:17 ` [patch 5/8] mm: memcg/slab: fix root memcg vmstats Andrew Morton
2020-11-22  6:17 ` [patch 6/8] mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault() Andrew Morton
2020-11-22  6:17 ` [patch 7/8] libfs: fix error cast of negative value in simple_attr_write() Andrew Morton
2020-11-22  6:17 ` [patch 8/8] mm: fix madvise WILLNEED performance problem Andrew Morton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.