* incoming
@ 2020-11-22 6:16 Andrew Morton
2020-11-22 6:16 ` [patch 1/8] mm/madvise: fix memory leak from process_madvise Andrew Morton
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22 6:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
8 patches, based on a349e4c659609fd20e4beea89e5c4a4038e33a95.
Subsystems affected by this patch series:
mm/madvise
kbuild
mm/pagemap
mm/readahead
mm/memcg
mm/userfaultfd
vfs-akpm
mm/madvise
Subsystem: mm/madvise
Eric Dumazet <edumazet@google.com>:
mm/madvise: fix memory leak from process_madvise
Subsystem: kbuild
Nick Desaulniers <ndesaulniers@google.com>:
compiler-clang: remove version check for BPF Tracing
Subsystem: mm/pagemap
Dan Williams <dan.j.williams@intel.com>:
mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
Subsystem: mm/readahead
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: fix readahead_page_batch for retry entries
Subsystem: mm/memcg
Muchun Song <songmuchun@bytedance.com>:
mm: memcg/slab: fix root memcg vmstats
Subsystem: mm/userfaultfd
Gerald Schaefer <gerald.schaefer@linux.ibm.com>:
mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
Subsystem: vfs-akpm
Yicong Yang <yangyicong@hisilicon.com>:
libfs: fix error cast of negative value in simple_attr_write()
Subsystem: mm/madvise
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: fix madvise WILLNEED performance problem
arch/ia64/include/asm/sparsemem.h | 6 ++++++
arch/powerpc/include/asm/mmzone.h | 5 +++++
arch/powerpc/include/asm/sparsemem.h | 5 ++---
arch/powerpc/mm/mem.c | 1 +
arch/x86/include/asm/sparsemem.h | 10 ++++++++++
arch/x86/mm/numa.c | 2 ++
drivers/dax/Kconfig | 1 -
fs/libfs.c | 6 ++++--
include/linux/compiler-clang.h | 2 ++
include/linux/memory_hotplug.h | 14 --------------
include/linux/numa.h | 30 +++++++++++++++++++++++++++++-
include/linux/pagemap.h | 2 ++
mm/huge_memory.c | 9 ++++-----
mm/madvise.c | 4 +---
mm/memcontrol.c | 9 +++++++--
mm/memory_hotplug.c | 18 ------------------
16 files changed, 75 insertions(+), 49 deletions(-)
^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch 1/8] mm/madvise: fix memory leak from process_madvise
2020-11-22 6:16 incoming Andrew Morton
@ 2020-11-22 6:16 ` Andrew Morton
2020-11-22 6:17 ` [patch 2/8] compiler-clang: remove version check for BPF Tracing Andrew Morton
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22 6:16 UTC (permalink / raw)
To: akpm, edumazet, linux-mm, minchan, mm-commits, torvalds
From: Eric Dumazet <edumazet@google.com>
Subject: mm/madvise: fix memory leak from process_madvise
The early return in process_madvise() will produce a memory leak. Fix it.
Link: https://lkml.kernel.org/r/20201116155132.GA3805951@google.com
Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/madvise.c | 2 --
1 file changed, 2 deletions(-)
--- a/mm/madvise.c~mm-madvise-fix-memory-leak-from-process_madvise
+++ a/mm/madvise.c
@@ -1231,8 +1231,6 @@ SYSCALL_DEFINE5(process_madvise, int, pi
ret = total_len - iov_iter_count(&iter);
mmput(mm);
- return ret;
-
release_task:
put_task_struct(task);
put_pid:
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch 2/8] compiler-clang: remove version check for BPF Tracing
2020-11-22 6:16 incoming Andrew Morton
2020-11-22 6:16 ` [patch 1/8] mm/madvise: fix memory leak from process_madvise Andrew Morton
@ 2020-11-22 6:17 ` Andrew Morton
2020-11-22 6:17 ` [patch 3/8] mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports Andrew Morton
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22 6:17 UTC (permalink / raw)
To: akpm, jarkko, linux-mm, mm-commits, natechancellor, ndesaulniers,
ojeda, songliubraving, torvalds, yu.chen.surf
From: Nick Desaulniers <ndesaulniers@google.com>
Subject: compiler-clang: remove version check for BPF Tracing
bpftrace parses the kernel headers and uses Clang under the hood. Remove
the version check when __BPF_TRACING__ is defined (as bpftrace does) so
that this tool can continue to parse kernel headers, even with older clang
sources.
Link: https://lkml.kernel.org/r/20201104191052.390657-1-ndesaulniers@google.com
Fixes: commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1")
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Chen Yu <yu.chen.surf@gmail.com>
Reported-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/compiler-clang.h | 2 ++
1 file changed, 2 insertions(+)
--- a/include/linux/compiler-clang.h~compiler-clang-remove-version-check-for-bpf-tracing
+++ a/include/linux/compiler-clang.h
@@ -8,8 +8,10 @@
+ __clang_patchlevel__)
#if CLANG_VERSION < 100001
+#ifndef __BPF_TRACING__
# error Sorry, your version of Clang is too old - please use 10.0.1 or newer.
#endif
+#endif
/* Compiler specific definitions for Clang compiler */
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch 3/8] mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
2020-11-22 6:16 incoming Andrew Morton
2020-11-22 6:16 ` [patch 1/8] mm/madvise: fix memory leak from process_madvise Andrew Morton
2020-11-22 6:17 ` [patch 2/8] compiler-clang: remove version check for BPF Tracing Andrew Morton
@ 2020-11-22 6:17 ` Andrew Morton
2020-11-22 6:17 ` [patch 4/8] mm: fix readahead_page_batch for retry entries Andrew Morton
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22 6:17 UTC (permalink / raw)
To: akpm, benh, dan.j.williams, fenghua.yu, hch, hch, joao.m.martins,
linux-mm, lkp, mm-commits, mpe, paulus, rdunlap, sfr, tglx,
tony.luck, torvalds, vishal.l.verma
From: Dan Williams <dan.j.williams@intel.com>
Subject: mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
The core-mm has a default __weak implementation of phys_to_target_node()
to mirror the weak definition of memory_add_physaddr_to_nid(). That
symbol is exported for modules. However, while the export in
mm/memory_hotplug.c exported the symbol in the configuration cases of:
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=y
...and:
CONFIG_NUMA_KEEP_MEMINFO=n
CONFIG_MEMORY_HOTPLUG=y
...it failed to export the symbol in the case of:
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=n
Not only is that broken, but Christoph points out that the kernel should
not be exporting any __weak symbol, which means that
memory_add_physaddr_to_nid() example that phys_to_target_node() copied is
broken too.
Rework the definition of phys_to_target_node() and
memory_add_physaddr_to_nid() to not require weak symbols. Move to the
common arch override design-pattern of an asm header defining a symbol to
replace the default implementation.
The only common header that all memory_add_physaddr_to_nid() producing
architectures implement is asm/sparsemem.h. In fact, powerpc already
defines its memory_add_physaddr_to_nid() helper in sparsemem.h.
Double-down on that observation and define phys_to_target_node() where
necessary in asm/sparsemem.h. An alternate consideration that was
discarded was to put this override in asm/numa.h, but that entangles with
the definition of MAX_NUMNODES relative to the inclusion of
linux/nodemask.h, and requires powerpc to grow a new header.
The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid
now that the symbol is properly exported / stubbed in all combinations of
CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.
[dan.j.williams@intel.com: v4]
Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
[dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning]
Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/ia64/include/asm/sparsemem.h | 6 +++++
arch/powerpc/include/asm/mmzone.h | 5 ++++
arch/powerpc/include/asm/sparsemem.h | 5 +---
arch/powerpc/mm/mem.c | 1
arch/x86/include/asm/sparsemem.h | 10 ++++++++
arch/x86/mm/numa.c | 2 +
drivers/dax/Kconfig | 1
include/linux/memory_hotplug.h | 14 -----------
include/linux/numa.h | 30 ++++++++++++++++++++++++-
mm/memory_hotplug.c | 18 ---------------
10 files changed, 55 insertions(+), 37 deletions(-)
--- a/arch/ia64/include/asm/sparsemem.h~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/arch/ia64/include/asm/sparsemem.h
@@ -18,4 +18,10 @@
#endif
#endif /* CONFIG_SPARSEMEM */
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int memory_add_physaddr_to_nid(u64 addr);
+#define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
+#endif
+
#endif /* _ASM_IA64_SPARSEMEM_H */
--- a/arch/powerpc/include/asm/mmzone.h~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/arch/powerpc/include/asm/mmzone.h
@@ -46,5 +46,10 @@ u64 memory_hotplug_max(void);
#define __HAVE_ARCH_RESERVED_KERNEL_PAGES
#endif
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern int create_section_mapping(unsigned long start, unsigned long end,
+ int nid, pgprot_t prot);
+#endif
+
#endif /* __KERNEL__ */
#endif /* _ASM_MMZONE_H_ */
--- a/arch/powerpc/include/asm/sparsemem.h~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/arch/powerpc/include/asm/sparsemem.h
@@ -13,9 +13,9 @@
#endif /* CONFIG_SPARSEMEM */
#ifdef CONFIG_MEMORY_HOTPLUG
-extern int create_section_mapping(unsigned long start, unsigned long end,
- int nid, pgprot_t prot);
extern int remove_section_mapping(unsigned long start, unsigned long end);
+extern int memory_add_physaddr_to_nid(u64 start);
+#define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
#ifdef CONFIG_NUMA
extern int hot_add_scn_to_nid(unsigned long scn_addr);
@@ -26,6 +26,5 @@ static inline int hot_add_scn_to_nid(uns
}
#endif /* CONFIG_NUMA */
#endif /* CONFIG_MEMORY_HOTPLUG */
-
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_SPARSEMEM_H */
--- a/arch/powerpc/mm/mem.c~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/arch/powerpc/mm/mem.c
@@ -50,6 +50,7 @@
#include <asm/rtas.h>
#include <asm/kasan.h>
#include <asm/svm.h>
+#include <asm/mmzone.h>
#include <mm/mmu_decl.h>
--- a/arch/x86/include/asm/sparsemem.h~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/arch/x86/include/asm/sparsemem.h
@@ -28,4 +28,14 @@
#endif
#endif /* CONFIG_SPARSEMEM */
+
+#ifndef __ASSEMBLY__
+#ifdef CONFIG_NUMA_KEEP_MEMINFO
+extern int phys_to_target_node(phys_addr_t start);
+#define phys_to_target_node phys_to_target_node
+extern int memory_add_physaddr_to_nid(u64 start);
+#define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
+#endif
+#endif /* __ASSEMBLY__ */
+
#endif /* _ASM_X86_SPARSEMEM_H */
--- a/arch/x86/mm/numa.c~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/arch/x86/mm/numa.c
@@ -938,6 +938,7 @@ int phys_to_target_node(phys_addr_t star
return meminfo_to_nid(&numa_reserved_meminfo, start);
}
+EXPORT_SYMBOL_GPL(phys_to_target_node);
int memory_add_physaddr_to_nid(u64 start)
{
@@ -947,4 +948,5 @@ int memory_add_physaddr_to_nid(u64 start
nid = numa_meminfo.blk[0].nid;
return nid;
}
+EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
#endif
--- a/drivers/dax/Kconfig~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/drivers/dax/Kconfig
@@ -50,7 +50,6 @@ config DEV_DAX_HMEM
Say M if unsure.
config DEV_DAX_HMEM_DEVICES
- depends on NUMA_KEEP_MEMINFO # for phys_to_target_node()
depends on DEV_DAX_HMEM && DAX=y
def_bool y
--- a/include/linux/memory_hotplug.h~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/include/linux/memory_hotplug.h
@@ -281,20 +281,6 @@ static inline bool movable_node_is_enabl
}
#endif /* ! CONFIG_MEMORY_HOTPLUG */
-#ifdef CONFIG_NUMA
-extern int memory_add_physaddr_to_nid(u64 start);
-extern int phys_to_target_node(u64 start);
-#else
-static inline int memory_add_physaddr_to_nid(u64 start)
-{
- return 0;
-}
-static inline int phys_to_target_node(u64 start)
-{
- return 0;
-}
-#endif
-
#if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
/*
* pgdat resizing functions
--- a/include/linux/numa.h~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/include/linux/numa.h
@@ -21,13 +21,41 @@
#endif
#ifdef CONFIG_NUMA
+#include <linux/printk.h>
+#include <asm/sparsemem.h>
+
/* Generic implementation available */
int numa_map_to_online_node(int node);
-#else
+
+#ifndef memory_add_physaddr_to_nid
+static inline int memory_add_physaddr_to_nid(u64 start)
+{
+ pr_info_once("Unknown online node for memory at 0x%llx, assuming node 0\n",
+ start);
+ return 0;
+}
+#endif
+#ifndef phys_to_target_node
+static inline int phys_to_target_node(u64 start)
+{
+ pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",
+ start);
+ return 0;
+}
+#endif
+#else /* !CONFIG_NUMA */
static inline int numa_map_to_online_node(int node)
{
return NUMA_NO_NODE;
}
+static inline int memory_add_physaddr_to_nid(u64 start)
+{
+ return 0;
+}
+static inline int phys_to_target_node(u64 start)
+{
+ return 0;
+}
#endif
#endif /* _LINUX_NUMA_H */
--- a/mm/memory_hotplug.c~mm-fix-phys_to_target_node-and-memory_add_physaddr_to_nid-exports
+++ a/mm/memory_hotplug.c
@@ -350,24 +350,6 @@ int __ref __add_pages(int nid, unsigned
return err;
}
-#ifdef CONFIG_NUMA
-int __weak memory_add_physaddr_to_nid(u64 start)
-{
- pr_info_once("Unknown online node for memory at 0x%llx, assuming node 0\n",
- start);
- return 0;
-}
-EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
-
-int __weak phys_to_target_node(u64 start)
-{
- pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",
- start);
- return 0;
-}
-EXPORT_SYMBOL_GPL(phys_to_target_node);
-#endif
-
/* find the smallest valid pfn in the range [start_pfn, end_pfn) */
static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
unsigned long start_pfn,
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch 4/8] mm: fix readahead_page_batch for retry entries
2020-11-22 6:16 incoming Andrew Morton
` (2 preceding siblings ...)
2020-11-22 6:17 ` [patch 3/8] mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports Andrew Morton
@ 2020-11-22 6:17 ` Andrew Morton
2020-11-22 6:17 ` [patch 5/8] mm: memcg/slab: fix root memcg vmstats Andrew Morton
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22 6:17 UTC (permalink / raw)
To: akpm, dsterba, linux-mm, mm-commits, stable, torvalds, vvghjk1234, willy
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: fix readahead_page_batch for retry entries
Both btrfs and fuse have reported faults caused by seeing a retry entry
instead of the page they were looking for. This was caused by a missing
check in the iterator.
As can be seen in the below panic log, the accessing 0x402 causes a
panic. In the xarray.h, 0x402 means RETRY_ENTRY.
BUG: kernel NULL pointer dereference, address: 0000000000000402
CPU: 14 PID: 306003 Comm: as Not tainted 5.9.0-1-amd64 #1 Debian 5.9.1-1
Hardware name: Lenovo ThinkSystem SR665/7D2VCTO1WW, BIOS D8E106Q-1.01 05/30/2020
RIP: 0010:fuse_readahead+0x152/0x470 [fuse]
Code: 41 8b 57 18 4c 8d 54 10 ff 4c 89 d6 48 8d 7c 24 10 e8 d2 e3 28
f9 48 85 c0 0f 84 fe 00 00 00 44 89 f2 49 89 04 d4 44 8d 72 01 <48> 8b
10 41 8b 4f 1c 48 c1 ea 10 83 e2 01 80 fa 01 19 d2 81 e2 01
RSP: 0018:ffffad99ceaebc50 EFLAGS: 00010246
RAX: 0000000000000402 RBX: 0000000000000001 RCX: 0000000000000002
RDX: 0000000000000000 RSI: ffff94c5af90bd98 RDI: ffffad99ceaebc60
RBP: ffff94ddc1749a00 R08: 0000000000000402 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000100 R12: ffff94de6c429ce0
R13: ffff94de6c4d3700 R14: 0000000000000001 R15: ffffad99ceaebd68
FS: 00007f228c5c7040(0000) GS:ffff94de8ed80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000402 CR3: 0000001dbd9b4000 CR4: 0000000000350ee0
Call Trace:
read_pages+0x83/0x270
page_cache_readahead_unbounded+0x197/0x230
generic_file_buffered_read+0x57a/0xa20
new_sync_read+0x112/0x1a0
vfs_read+0xf8/0x180
ksys_read+0x5f/0xe0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Link: https://lkml.kernel.org/r/20201103142852.8543-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20201103124349.16722-1-vvghjk1234@gmail.com
Fixes: 042124cc64c3 ("mm: add new readahead_control API")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: David Sterba <dsterba@suse.com>
Reported-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/pagemap.h | 2 ++
1 file changed, 2 insertions(+)
--- a/include/linux/pagemap.h~mm-fix-readahead_page_batch-for-retry-entries
+++ a/include/linux/pagemap.h
@@ -906,6 +906,8 @@ static inline unsigned int __readahead_b
xas_set(&xas, rac->_index);
rcu_read_lock();
xas_for_each(&xas, page, rac->_index + rac->_nr_pages - 1) {
+ if (xas_retry(&xas, page))
+ continue;
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageTail(page), page);
array[i++] = page;
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch 5/8] mm: memcg/slab: fix root memcg vmstats
2020-11-22 6:16 incoming Andrew Morton
` (3 preceding siblings ...)
2020-11-22 6:17 ` [patch 4/8] mm: fix readahead_page_batch for retry entries Andrew Morton
@ 2020-11-22 6:17 ` Andrew Morton
2020-11-22 6:17 ` [patch 6/8] mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault() Andrew Morton
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22 6:17 UTC (permalink / raw)
To: akpm, chris, cl, guro, hannes, iamjoonsoo.kim, laoar.shao,
linux-mm, mhocko, mm-commits, penberg, rientjes, shakeelb,
songmuchun, stable, torvalds, vbabka, vdavydov.dev
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: memcg/slab: fix root memcg vmstats
If we reparent the slab objects to the root memcg, when we free the slab
object, we need to update the per-memcg vmstats to keep it correct for the
root memcg. Now this at least affects the vmstat of NR_KERNEL_STACK_KB
for !CONFIG_VMAP_STACK when the thread stack size is smaller than the
PAGE_SIZE.
David said: "I assume that without this fix that the root memcg's
vmstat would always be inflated if we reparented."
Link: https://lkml.kernel.org/r/20201110031015.15715-1-songmuchun@bytedance.com
Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: <stable@vger.kernel.org> [5.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/mm/memcontrol.c~mm-memcg-slab-fix-root-memcg-vmstats
+++ a/mm/memcontrol.c
@@ -867,8 +867,13 @@ void __mod_lruvec_slab_state(void *p, en
rcu_read_lock();
memcg = mem_cgroup_from_obj(p);
- /* Untracked pages have no memcg, no lruvec. Update only the node */
- if (!memcg || memcg == root_mem_cgroup) {
+ /*
+ * Untracked pages have no memcg, no lruvec. Update only the
+ * node. If we reparent the slab objects to the root memcg,
+ * when we free the slab object, we need to update the per-memcg
+ * vmstats to keep it correct for the root memcg.
+ */
+ if (!memcg) {
__mod_node_page_state(pgdat, idx, val);
} else {
lruvec = mem_cgroup_lruvec(memcg, pgdat);
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch 6/8] mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
2020-11-22 6:16 incoming Andrew Morton
` (4 preceding siblings ...)
2020-11-22 6:17 ` [patch 5/8] mm: memcg/slab: fix root memcg vmstats Andrew Morton
@ 2020-11-22 6:17 ` Andrew Morton
2020-11-22 6:17 ` [patch 7/8] libfs: fix error cast of negative value in simple_attr_write() Andrew Morton
2020-11-22 6:17 ` [patch 8/8] mm: fix madvise WILLNEED performance problem Andrew Morton
7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22 6:17 UTC (permalink / raw)
To: aarcange, akpm, egorenar, gerald.schaefer, hca, linux-mm,
mm-commits, stable, torvalds
From: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Subject: mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
Alexander reported a syzkaller / KASAN finding on s390, see below for
complete output.
In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be freed
in some cases. In the case of userfaultfd_missing(), this will happen
after calling handle_userfault(), which might have released the mmap_lock.
Therefore, the following pte_free(vma->vm_mm, pgtable) will access an
unstable vma->vm_mm, which could have been freed or re-used already.
For all architectures other than s390 this will go w/o any negative
impact, because pte_free() simply frees the page and ignores the passed-in
mm. The implementation for SPARC32 would also access mm->page_table_lock
for pte_free(), but there is no THP support in SPARC32, so the buggy code
path will not be used there.
For s390, the mm->context.pgtable_list is being used to maintain the 2K
pagetable fragments, and operating on an already freed or even re-used mm
could result in various more or less subtle bugs due to list / pagetable
corruption.
Fix this by calling pte_free() before handle_userfault(), similar to how
it is already done in __do_huge_pmd_anonymous_page() for the WRITE /
non-huge_zero_page case.
Commit 6b251fc96cf2c ("userfaultfd: call handle_userfault() for
userfaultfd_missing() faults") actually introduced both, the
do_huge_pmd_anonymous_page() and also __do_huge_pmd_anonymous_page()
changes wrt to calling handle_userfault(), but only in the latter case it
put the pte_free() before calling handle_userfault().
==================================================================
BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334
CPU: 1 PID: 9334 Comm: syz-executor.0 Not tainted 5.10.0-rc1-syzkaller-07083-g4c9720875573 #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
[<00000000aa0a7a1c>] unwind_start arch/s390/include/asm/unwind.h:65 [inline]
[<00000000aa0a7a1c>] show_stack+0x174/0x220 arch/s390/kernel/dumpstack.c:135
[<00000000aa105952>] __dump_stack lib/dump_stack.c:77 [inline]
[<00000000aa105952>] dump_stack+0x262/0x2e8 lib/dump_stack.c:118
[<00000000aa0b484e>] print_address_description.constprop.0+0x5e/0x218 mm/kasan/report.c:385
[<00000000a61f13aa>] __kasan_report mm/kasan/report.c:545 [inline]
[<00000000a61f13aa>] kasan_report+0x11a/0x168 mm/kasan/report.c:562
[<00000000a620d782>] do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
[<00000000a610632e>] create_huge_pmd mm/memory.c:4256 [inline]
[<00000000a610632e>] __handle_mm_fault+0xe6e/0x1068 mm/memory.c:4480
[<00000000a61067b0>] handle_mm_fault+0x288/0x748 mm/memory.c:4607
[<00000000a598b55c>] do_exception+0x394/0xae0 arch/s390/mm/fault.c:479
[<00000000a598d7c4>] do_dat_exception+0x34/0x80 arch/s390/mm/fault.c:567
[<00000000aa124e5e>] pgm_check_handler+0x1da/0x22c arch/s390/kernel/entry.S:706
[<00000000aa0a6902>] copy_from_user_mvcos arch/s390/lib/uaccess.c:111 [inline]
[<00000000aa0a6902>] raw_copy_from_user+0x3a/0x88 arch/s390/lib/uaccess.c:174
[<00000000a7c24668>] _copy_from_user+0x48/0xa8 lib/usercopy.c:16
[<00000000a5b0b2a8>] copy_from_user include/linux/uaccess.h:192 [inline]
[<00000000a5b0b2a8>] __do_sys_sigaltstack kernel/signal.c:4064 [inline]
[<00000000a5b0b2a8>] __s390x_sys_sigaltstack+0xc8/0x240 kernel/signal.c:4060
[<00000000aa124a9c>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
Allocated by task 9334:
stack_trace_save+0xbe/0xf0 kernel/stacktrace.c:121
kasan_save_stack+0x30/0x60 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc.constprop.0+0xd0/0xe8 mm/kasan/common.c:461
slab_post_alloc_hook mm/slab.h:526 [inline]
slab_alloc_node mm/slub.c:2891 [inline]
slab_alloc mm/slub.c:2899 [inline]
kmem_cache_alloc+0x118/0x348 mm/slub.c:2904
vm_area_dup+0x9c/0x2b8 kernel/fork.c:356
__split_vma+0xba/0x560 mm/mmap.c:2742
split_vma+0xca/0x108 mm/mmap.c:2800
mlock_fixup+0x4ae/0x600 mm/mlock.c:550
apply_vma_lock_flags+0x2c6/0x398 mm/mlock.c:619
do_mlock+0x1aa/0x718 mm/mlock.c:711
__do_sys_mlock2 mm/mlock.c:738 [inline]
__s390x_sys_mlock2+0x86/0xa8 mm/mlock.c:728
system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
Freed by task 9333:
stack_trace_save+0xbe/0xf0 kernel/stacktrace.c:121
kasan_save_stack+0x30/0x60 mm/kasan/common.c:48
kasan_set_track+0x32/0x48 mm/kasan/common.c:56
kasan_set_free_info+0x34/0x50 mm/kasan/generic.c:355
__kasan_slab_free+0x11e/0x190 mm/kasan/common.c:422
slab_free_hook mm/slub.c:1544 [inline]
slab_free_freelist_hook mm/slub.c:1577 [inline]
slab_free mm/slub.c:3142 [inline]
kmem_cache_free+0x7c/0x4b8 mm/slub.c:3158
__vma_adjust+0x7b2/0x2508 mm/mmap.c:960
vma_merge+0x87e/0xce0 mm/mmap.c:1209
userfaultfd_release+0x412/0x6b8 fs/userfaultfd.c:868
__fput+0x22c/0x7a8 fs/file_table.c:281
task_work_run+0x200/0x320 kernel/task_work.c:151
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
do_notify_resume+0x100/0x148 arch/s390/kernel/signal.c:538
system_call+0xe6/0x28c arch/s390/kernel/entry.S:416
The buggy address belongs to the object at 00000000962d6948
which belongs to the cache vm_area_struct of size 200
The buggy address is located 64 bytes inside of
200-byte region [00000000962d6948, 00000000962d6a10)
The buggy address belongs to the page:
page:00000000313a09fe refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x962d6
flags: 0x3ffff00000000200(slab)
raw: 3ffff00000000200 000040000257e080 0000000c0000000c 000000008020ba00
raw: 0000000000000000 000f001e00000000 ffffffff00000001 0000000096959501
page dumped because: kasan: bad access detected
page->mem_cgroup:0000000096959501
Memory state around the buggy address:
00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
>00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00
00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Link: https://lkml.kernel.org/r/20201110190329.11920-1-gerald.schaefer@linux.ibm.com
Fixes: 6b251fc96cf2c ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults")
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/huge_memory.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
--- a/mm/huge_memory.c~mm-userfaultfd-do-not-access-vma-vm_mm-after-calling-handle_userfault
+++ a/mm/huge_memory.c
@@ -710,7 +710,6 @@ vm_fault_t do_huge_pmd_anonymous_page(st
transparent_hugepage_use_zero_page()) {
pgtable_t pgtable;
struct page *zero_page;
- bool set;
vm_fault_t ret;
pgtable = pte_alloc_one(vma->vm_mm);
if (unlikely(!pgtable))
@@ -723,25 +722,25 @@ vm_fault_t do_huge_pmd_anonymous_page(st
}
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
ret = 0;
- set = false;
if (pmd_none(*vmf->pmd)) {
ret = check_stable_address_space(vma->vm_mm);
if (ret) {
spin_unlock(vmf->ptl);
+ pte_free(vma->vm_mm, pgtable);
} else if (userfaultfd_missing(vma)) {
spin_unlock(vmf->ptl);
+ pte_free(vma->vm_mm, pgtable);
ret = handle_userfault(vmf, VM_UFFD_MISSING);
VM_BUG_ON(ret & VM_FAULT_FALLBACK);
} else {
set_huge_zero_page(pgtable, vma->vm_mm, vma,
haddr, vmf->pmd, zero_page);
spin_unlock(vmf->ptl);
- set = true;
}
- } else
+ } else {
spin_unlock(vmf->ptl);
- if (!set)
pte_free(vma->vm_mm, pgtable);
+ }
return ret;
}
gfp = alloc_hugepage_direct_gfpmask(vma);
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch 7/8] libfs: fix error cast of negative value in simple_attr_write()
2020-11-22 6:16 incoming Andrew Morton
` (5 preceding siblings ...)
2020-11-22 6:17 ` [patch 6/8] mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault() Andrew Morton
@ 2020-11-22 6:17 ` Andrew Morton
2020-11-22 6:17 ` [patch 8/8] mm: fix madvise WILLNEED performance problem Andrew Morton
7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22 6:17 UTC (permalink / raw)
To: akpm, linux-mm, mm-commits, torvalds, viro, yangyicong
From: Yicong Yang <yangyicong@hisilicon.com>
Subject: libfs: fix error cast of negative value in simple_attr_write()
The attr->set() receive a value of u64, but simple_strtoll() is used for
doing the conversion. It will lead to the error cast if user inputs a
negative value.
Use kstrtoull() instead of simple_strtoll() to convert a string got from
the user to an unsigned value. The former will return '-EINVAL' if it
gets a negetive value, but the latter can't handle the situation
correctly. Make 'val' unsigned long long as what kstrtoull() takes, this
will eliminate the compile warning on no 64-bit architectures.
Link: https://lkml.kernel.org/r/1605341356-11872-1-git-send-email-yangyicong@hisilicon.com
Fixes: f7b88631a897 ("fs/libfs.c: fix simple_attr_write() on 32bit machines")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/libfs.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/libfs.c~libfs-fix-error-cast-of-negative-value-in-simple_attr_write
+++ a/fs/libfs.c
@@ -959,7 +959,7 @@ ssize_t simple_attr_write(struct file *f
size_t len, loff_t *ppos)
{
struct simple_attr *attr;
- u64 val;
+ unsigned long long val;
size_t size;
ssize_t ret;
@@ -977,7 +977,9 @@ ssize_t simple_attr_write(struct file *f
goto out;
attr->set_buf[size] = '\0';
- val = simple_strtoll(attr->set_buf, NULL, 0);
+ ret = kstrtoull(attr->set_buf, 0, &val);
+ if (ret)
+ goto out;
ret = attr->set(attr->data, val);
if (ret == 0)
ret = len; /* on success, claim we got the whole input */
_
^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch 8/8] mm: fix madvise WILLNEED performance problem
2020-11-22 6:16 incoming Andrew Morton
` (6 preceding siblings ...)
2020-11-22 6:17 ` [patch 7/8] libfs: fix error cast of negative value in simple_attr_write() Andrew Morton
@ 2020-11-22 6:17 ` Andrew Morton
7 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2020-11-22 6:17 UTC (permalink / raw)
To: akpm, feng.tang, hannes, linux-mm, mm-commits, rong.a.chen,
torvalds, william.kucharski, willy, zhengjun.xing
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: fix madvise WILLNEED performance problem
The calculation of the end page index was incorrect, leading to a
regression of 70% when running stress-ng. With this fix, we instead
see a performance improvement of 3%.
Link: https://lkml.kernel.org/r/20201109134851.29692-1-willy@infradead.org
Fixes: e6e88712e43b ("mm: optimise madvise WILLNEED")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/madvise.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/madvise.c~mm-fix-madvise-willneed-performance-problem
+++ a/mm/madvise.c
@@ -226,7 +226,7 @@ static void force_shm_swapin_readahead(s
struct address_space *mapping)
{
XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start));
- pgoff_t end_index = end / PAGE_SIZE;
+ pgoff_t end_index = linear_page_index(vma, end + PAGE_SIZE - 1);
struct page *page;
rcu_read_lock();
_
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-11-22 6:17 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-22 6:16 incoming Andrew Morton
2020-11-22 6:16 ` [patch 1/8] mm/madvise: fix memory leak from process_madvise Andrew Morton
2020-11-22 6:17 ` [patch 2/8] compiler-clang: remove version check for BPF Tracing Andrew Morton
2020-11-22 6:17 ` [patch 3/8] mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports Andrew Morton
2020-11-22 6:17 ` [patch 4/8] mm: fix readahead_page_batch for retry entries Andrew Morton
2020-11-22 6:17 ` [patch 5/8] mm: memcg/slab: fix root memcg vmstats Andrew Morton
2020-11-22 6:17 ` [patch 6/8] mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault() Andrew Morton
2020-11-22 6:17 ` [patch 7/8] libfs: fix error cast of negative value in simple_attr_write() Andrew Morton
2020-11-22 6:17 ` [patch 8/8] mm: fix madvise WILLNEED performance problem Andrew Morton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).