* incoming
From: Andrew Morton @ 2021-07-08  0:59 UTC
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

54 patches, based on a931dd33d370896a683236bba67c0d6f3d01144d.

Subsystems affected by this patch series:

  lib
  mm/slub
  mm/secretmem
  mm/cleanups
  mm/init
  debug
  mm/pagemap
  mm/mremap

Subsystem: lib

    Zhen Lei <thunder.leizhen@huawei.com>:
      lib/test: fix spelling mistakes
      lib: fix spelling mistakes
      lib: fix spelling mistakes in header files

Subsystem: mm/slub

    Nathan Chancellor <nathan@kernel.org>:
    Patch series "hexagon: Fix build error with CONFIG_STACKDEPOT and select CONFIG_ARCH_WANT_LD_ORPHAN_WARN":
      hexagon: handle {,SOFT}IRQENTRY_TEXT in linker script
      hexagon: use common DISCARDS macro
      hexagon: select ARCH_WANT_LD_ORPHAN_WARN

    Oliver Glitta <glittao@gmail.com>:
      mm/slub: use stackdepot to save stack trace in objects

Subsystem: mm/secretmem

    Mike Rapoport <rppt@linux.ibm.com>:
    Patch series "mm: introduce memfd_secret system call to create "secret" memory areas", v20:
      mmap: make mlock_future_check() global
      riscv/Kconfig: make direct map manipulation options depend on MMU
      set_memory: allow querying whether set_direct_map_*() is actually enabled
      mm: introduce memfd_secret system call to create "secret" memory areas
      PM: hibernate: disable when there are active secretmem users
      arch, mm: wire up memfd_secret system call where relevant
      secretmem: test: add basic selftest for memfd_secret(2)

Subsystem: mm/cleanups

    Zhen Lei <thunder.leizhen@huawei.com>:
      mm: fix spelling mistakes in header files

Subsystem: mm/init

    Kefeng Wang <wangkefeng.wang@huawei.com>:
    Patch series "init_mm: cleanup ARCH's text/data/brk setup code", v3:
      mm: add setup_initial_init_mm() helper
      arc: convert to setup_initial_init_mm()
      arm: convert to setup_initial_init_mm()
      arm64: convert to setup_initial_init_mm()
      csky: convert to setup_initial_init_mm()
      h8300: convert to setup_initial_init_mm()
      m68k: convert to setup_initial_init_mm()
      nds32: convert to setup_initial_init_mm()
      nios2: convert to setup_initial_init_mm()
      openrisc: convert to setup_initial_init_mm()
      powerpc: convert to setup_initial_init_mm()
      riscv: convert to setup_initial_init_mm()
      s390: convert to setup_initial_init_mm()
      sh: convert to setup_initial_init_mm()
      x86: convert to setup_initial_init_mm()

Subsystem: debug

    Stephen Boyd <swboyd@chromium.org>:
    Patch series "Add build ID to stacktraces", v6:
      buildid: only consider GNU notes for build ID parsing
      buildid: add API to parse build ID out of buffer
      buildid: stash away kernels build ID on init
      dump_stack: add vmlinux build ID to stack traces
      module: add printk formats to add module build ID to stacktraces
      arm64: stacktrace: use %pSb for backtrace printing
      x86/dumpstack: use %pSb/%pBb for backtrace printing
      scripts/decode_stacktrace.sh: support debuginfod
      scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm
      scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path
      buildid: mark some arguments const
      buildid: fix kernel-doc notation
      kdump: use vmlinux_build_id to simplify

Subsystem: mm/pagemap

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
      mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *
      mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *

Subsystem: mm/mremap

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
    Patch series "mrermap fixes", v2:
      selftest/mremap_test: update the test to handle pagesize other than 4K
      selftest/mremap_test: avoid crash with static build
      mm/mremap: convert huge PUD move to separate helper
      mm/mremap: don't enable optimized PUD move if page table levels is 2
      mm/mremap: use pmd/pud_poplulate to update page table entries
      mm/mremap: hold the rmap lock in write mode when moving page table entries.
    Patch series "Speedup mremap on ppc64", v8:
      mm/mremap: allow arch runtime override
      powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache
      powerpc/mm: enable HAVE_MOVE_PMD support

 Documentation/core-api/printk-formats.rst           |   11 
 arch/alpha/include/asm/pgtable.h                    |    8 
 arch/arc/mm/init.c                                  |    5 
 arch/arm/include/asm/pgtable-3level.h               |    2 
 arch/arm/kernel/setup.c                             |    5 
 arch/arm64/include/asm/Kbuild                       |    1 
 arch/arm64/include/asm/cacheflush.h                 |    6 
 arch/arm64/include/asm/kfence.h                     |    2 
 arch/arm64/include/asm/pgtable.h                    |    8 
 arch/arm64/include/asm/set_memory.h                 |   17 +
 arch/arm64/include/uapi/asm/unistd.h                |    1 
 arch/arm64/kernel/machine_kexec.c                   |    1 
 arch/arm64/kernel/setup.c                           |    5 
 arch/arm64/kernel/stacktrace.c                      |    2 
 arch/arm64/mm/mmu.c                                 |    7 
 arch/arm64/mm/pageattr.c                            |   13 
 arch/csky/kernel/setup.c                            |    5 
 arch/h8300/kernel/setup.c                           |    5 
 arch/hexagon/Kconfig                                |    1 
 arch/hexagon/kernel/vmlinux.lds.S                   |    9 
 arch/ia64/include/asm/pgtable.h                     |    4 
 arch/m68k/include/asm/motorola_pgtable.h            |    2 
 arch/m68k/kernel/setup_mm.c                         |    5 
 arch/m68k/kernel/setup_no.c                         |    5 
 arch/mips/include/asm/pgtable-64.h                  |    8 
 arch/nds32/kernel/setup.c                           |    5 
 arch/nios2/kernel/setup.c                           |    5 
 arch/openrisc/kernel/setup.c                        |    5 
 arch/parisc/include/asm/pgtable.h                   |    4 
 arch/powerpc/include/asm/book3s/64/pgtable.h        |   11 
 arch/powerpc/include/asm/book3s/64/tlbflush-radix.h |    2 
 arch/powerpc/include/asm/nohash/64/pgtable-4k.h     |    6 
 arch/powerpc/include/asm/nohash/64/pgtable.h        |    6 
 arch/powerpc/include/asm/tlb.h                      |    6 
 arch/powerpc/kernel/setup-common.c                  |    5 
 arch/powerpc/mm/book3s64/radix_hugetlbpage.c        |    8 
 arch/powerpc/mm/book3s64/radix_pgtable.c            |    6 
 arch/powerpc/mm/book3s64/radix_tlb.c                |   44 +-
 arch/powerpc/mm/pgtable_64.c                        |    4 
 arch/powerpc/platforms/Kconfig.cputype              |    2 
 arch/riscv/Kconfig                                  |    4 
 arch/riscv/include/asm/pgtable-64.h                 |    4 
 arch/riscv/include/asm/unistd.h                     |    1 
 arch/riscv/kernel/setup.c                           |    5 
 arch/s390/kernel/setup.c                            |    5 
 arch/sh/include/asm/pgtable-3level.h                |    4 
 arch/sh/kernel/setup.c                              |    5 
 arch/sparc/include/asm/pgtable_32.h                 |    6 
 arch/sparc/include/asm/pgtable_64.h                 |   10 
 arch/um/include/asm/pgtable-3level.h                |    2 
 arch/x86/entry/syscalls/syscall_32.tbl              |    1 
 arch/x86/entry/syscalls/syscall_64.tbl              |    1 
 arch/x86/include/asm/pgtable.h                      |    8 
 arch/x86/kernel/dumpstack.c                         |    2 
 arch/x86/kernel/setup.c                             |    5 
 arch/x86/mm/init_64.c                               |    4 
 arch/x86/mm/pat/set_memory.c                        |    4 
 arch/x86/mm/pgtable.c                               |    2 
 include/asm-generic/pgtable-nop4d.h                 |    2 
 include/asm-generic/pgtable-nopmd.h                 |    2 
 include/asm-generic/pgtable-nopud.h                 |    4 
 include/linux/bootconfig.h                          |    4 
 include/linux/buildid.h                             |   10 
 include/linux/compaction.h                          |    4 
 include/linux/cpumask.h                             |    2 
 include/linux/crash_core.h                          |   12 
 include/linux/debugobjects.h                        |    2 
 include/linux/hmm.h                                 |    2 
 include/linux/hugetlb.h                             |    6 
 include/linux/kallsyms.h                            |   21 +
 include/linux/list_lru.h                            |    4 
 include/linux/lru_cache.h                           |    8 
 include/linux/mm.h                                  |    3 
 include/linux/mmu_notifier.h                        |    8 
 include/linux/module.h                              |    9 
 include/linux/nodemask.h                            |    6 
 include/linux/percpu-defs.h                         |    2 
 include/linux/percpu-refcount.h                     |    2 
 include/linux/pgtable.h                             |    4 
 include/linux/scatterlist.h                         |    2 
 include/linux/secretmem.h                           |   54 +++
 include/linux/set_memory.h                          |   12 
 include/linux/shrinker.h                            |    2 
 include/linux/syscalls.h                            |    1 
 include/linux/vmalloc.h                             |    4 
 include/uapi/asm-generic/unistd.h                   |    7 
 include/uapi/linux/magic.h                          |    1 
 init/Kconfig                                        |    1 
 init/main.c                                         |    2 
 kernel/crash_core.c                                 |   50 ---
 kernel/kallsyms.c                                   |  104 +++++--
 kernel/module.c                                     |   42 ++
 kernel/power/hibernate.c                            |    5 
 kernel/sys_ni.c                                     |    2 
 lib/Kconfig.debug                                   |   17 -
 lib/asn1_encoder.c                                  |    2 
 lib/buildid.c                                       |   80 ++++-
 lib/devres.c                                        |    2 
 lib/dump_stack.c                                    |   13 
 lib/dynamic_debug.c                                 |    2 
 lib/fonts/font_pearl_8x8.c                          |    2 
 lib/kfifo.c                                         |    2 
 lib/list_sort.c                                     |    2 
 lib/nlattr.c                                        |    4 
 lib/oid_registry.c                                  |    2 
 lib/pldmfw/pldmfw.c                                 |    2 
 lib/reed_solomon/test_rslib.c                       |    2 
 lib/refcount.c                                      |    2 
 lib/rhashtable.c                                    |    2 
 lib/sbitmap.c                                       |    2 
 lib/scatterlist.c                                   |    4 
 lib/seq_buf.c                                       |    2 
 lib/sort.c                                          |    2 
 lib/stackdepot.c                                    |    2 
 lib/test_bitops.c                                   |    2 
 lib/test_bpf.c                                      |    2 
 lib/test_kasan.c                                    |    2 
 lib/test_kmod.c                                     |    6 
 lib/test_scanf.c                                    |    2 
 lib/vsprintf.c                                      |   10 
 mm/Kconfig                                          |    4 
 mm/Makefile                                         |    1 
 mm/gup.c                                            |   12 
 mm/init-mm.c                                        |    9 
 mm/internal.h                                       |    3 
 mm/mlock.c                                          |    3 
 mm/mmap.c                                           |    5 
 mm/mremap.c                                         |  108 ++++++-
 mm/secretmem.c                                      |  254 +++++++++++++++++
 mm/slub.c                                           |   79 +++--
 scripts/checksyscalls.sh                            |    4 
 scripts/decode_stacktrace.sh                        |   89 +++++-
 tools/testing/selftests/vm/.gitignore               |    1 
 tools/testing/selftests/vm/Makefile                 |    3 
 tools/testing/selftests/vm/memfd_secret.c           |  296 ++++++++++++++++++++
 tools/testing/selftests/vm/mremap_test.c            |  116 ++++---
 tools/testing/selftests/vm/run_vmtests.sh           |   17 +
 137 files changed, 1470 insertions(+), 442 deletions(-)



* [patch 01/54] lib/test: fix spelling mistakes
From: Andrew Morton @ 2021-07-08  1:07 UTC
  To: akpm, linux-mm, mm-commits, thunder.leizhen, torvalds, yhs

From: Zhen Lei <thunder.leizhen@huawei.com>
Subject: lib/test: fix spelling mistakes

Fix some spelling mistakes in comments found by "codespell":
thats ==> that's
unitialized ==> uninitialized
panicing ==> panicking
sucess ==> success
possitive ==> positive
intepreted ==> interpreted

Link: https://lkml.kernel.org/r/20210607133036.12525-2-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>	[test_bfp.c]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/test_bitops.c |    2 +-
 lib/test_bpf.c    |    2 +-
 lib/test_kasan.c  |    2 +-
 lib/test_kmod.c   |    6 +++---
 lib/test_scanf.c  |    2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

--- a/lib/test_bitops.c~lib-test-fix-spelling-mistakes
+++ a/lib/test_bitops.c
@@ -15,7 +15,7 @@
  *   get_count_order/long
  */
 
-/* use an enum because thats the most common BITMAP usage */
+/* use an enum because that's the most common BITMAP usage */
 enum bitops_fun {
 	BITOPS_4 = 4,
 	BITOPS_7 = 7,
--- a/lib/test_bpf.c~lib-test-fix-spelling-mistakes
+++ a/lib/test_bpf.c
@@ -1095,7 +1095,7 @@ static struct bpf_test tests[] = {
 	{
 		"RET_A",
 		.u.insns = {
-			/* check that unitialized X and A contain zeros */
+			/* check that uninitialized X and A contain zeros */
 			BPF_STMT(BPF_MISC | BPF_TXA, 0),
 			BPF_STMT(BPF_RET | BPF_A, 0)
 		},
--- a/lib/test_kasan.c~lib-test-fix-spelling-mistakes
+++ a/lib/test_kasan.c
@@ -651,7 +651,7 @@ static void kasan_global_oob(struct kuni
 {
 	/*
 	 * Deliberate out-of-bounds access. To prevent CONFIG_UBSAN_LOCAL_BOUNDS
-	 * from failing here and panicing the kernel, access the array via a
+	 * from failing here and panicking the kernel, access the array via a
 	 * volatile pointer, which will prevent the compiler from being able to
 	 * determine the array bounds.
 	 *
--- a/lib/test_kmod.c~lib-test-fix-spelling-mistakes
+++ a/lib/test_kmod.c
@@ -286,7 +286,7 @@ static int tally_work_test(struct kmod_t
  * If this ran it means *all* tasks were created fine and we
  * are now just collecting results.
  *
- * Only propagate errors, do not override with a subsequent sucess case.
+ * Only propagate errors, do not override with a subsequent success case.
  */
 static void tally_up_work(struct kmod_test_device *test_dev)
 {
@@ -543,7 +543,7 @@ static int trigger_config_run(struct kmo
 	 * wrong with the setup of the test. If the test setup went fine
 	 * then userspace must just check the result of config->test_result.
 	 * One issue with relying on the return from a call in the kernel
-	 * is if the kernel returns a possitive value using this trigger
+	 * is if the kernel returns a positive value using this trigger
 	 * will not return the value to userspace, it would be lost.
 	 *
 	 * By not relying on capturing the return value of tests we are using
@@ -585,7 +585,7 @@ trigger_config_store(struct device *dev,
 	 * Note: any return > 0 will be treated as success
 	 * and the error value will not be available to userspace.
 	 * Do not rely on trying to send to userspace a test value
-	 * return value as possitive return errors will be lost.
+	 * return value as positive return errors will be lost.
 	 */
 	if (WARN_ON(ret > 0))
 		return -EINVAL;
--- a/lib/test_scanf.c~lib-test-fix-spelling-mistakes
+++ a/lib/test_scanf.c
@@ -600,7 +600,7 @@ static void __init numbers_prefix_overfl
 	/*
 	 * 0x prefix in a field of width 2 using %i conversion: first field
 	 * converts to 0. Next field scan starts at the character after "0x",
-	 * which will convert if can be intepreted as decimal but will fail
+	 * which will convert if can be interpreted as decimal but will fail
 	 * if it contains any hex digits (since no 0x prefix).
 	 */
 	test_number_prefix(long long,	"0x67", "%2lli%lli", 0, 67, 2, check_ll);
_
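
Editorial note on the lib/test_kmod.c hunks above: the comments they touch describe why a positive return value never reaches userspace. A store-style callback reports success as a positive count, so a positive "error" would be read as success. The following is a minimal user-space sketch of that convention, not the test_kmod code; the helper name `store_result` and its parameters are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from lib/test_kmod.c): map a test's return
 * value onto a store-style result, where any value > 0 means "this many
 * bytes were consumed" and only negative values signal an error.
 */
static long store_result(int test_ret, size_t count)
{
	/* A positive test result would be indistinguishable from success. */
	if (test_ret > 0)
		return -EINVAL;

	/* Negative errno values pass through unchanged. */
	if (test_ret < 0)
		return test_ret;

	/* Success: report the whole write as consumed. */
	return (long)count;
}
```

Under these assumptions, a caller that wants the real test outcome has to read it from a separate result field rather than from the return value of the write, which matches what the comment in the hunk recommends.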


* [patch 02/54] lib: fix spelling mistakes
From: Andrew Morton @ 2021-07-08  1:07 UTC
  To: akpm, jacob.e.keller, linux-mm, mm-commits, thunder.leizhen, torvalds

From: Zhen Lei <thunder.leizhen@huawei.com>
Subject: lib: fix spelling mistakes

Fix some spelling mistakes in comments:
permanentely ==> permanently
wont ==> won't
remaning ==> remaining
succed ==> succeed
shouldnt ==> shouldn't
alpha-numeric ==> alphanumeric
storeing ==> storing
funtion ==> function
documenation ==> documentation
Determin ==> Determine
intepreted ==> interpreted
ammount ==> amount
obious ==> obvious
interupts ==> interrupts
occured ==> occurred
asssociated ==> associated
taking into acount ==> taking into account
squence ==> sequence
stil ==> still
contiguos ==> contiguous
matchs ==> matches

Link: https://lkml.kernel.org/r/20210607072555.12416-1-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/Kconfig.debug             |    6 +++---
 lib/asn1_encoder.c            |    2 +-
 lib/devres.c                  |    2 +-
 lib/dynamic_debug.c           |    2 +-
 lib/fonts/font_pearl_8x8.c    |    2 +-
 lib/kfifo.c                   |    2 +-
 lib/list_sort.c               |    2 +-
 lib/nlattr.c                  |    4 ++--
 lib/oid_registry.c            |    2 +-
 lib/pldmfw/pldmfw.c           |    2 +-
 lib/reed_solomon/test_rslib.c |    2 +-
 lib/refcount.c                |    2 +-
 lib/rhashtable.c              |    2 +-
 lib/sbitmap.c                 |    2 +-
 lib/scatterlist.c             |    4 ++--
 lib/seq_buf.c                 |    2 +-
 lib/sort.c                    |    2 +-
 lib/stackdepot.c              |    2 +-
 lib/vsprintf.c                |    2 +-
 19 files changed, 23 insertions(+), 23 deletions(-)

--- a/lib/asn1_encoder.c~lib-fix-spelling-mistakes
+++ a/lib/asn1_encoder.c
@@ -181,7 +181,7 @@ EXPORT_SYMBOL_GPL(asn1_encode_oid);
 /**
  * asn1_encode_length() - encode a length to follow an ASN.1 tag
  * @data: pointer to encode at
- * @data_len: pointer to remaning length (adjusted by routine)
+ * @data_len: pointer to remaining length (adjusted by routine)
  * @len: length to encode
  *
  * This routine can encode lengths up to 65535 using the ASN.1 rules.
--- a/lib/devres.c~lib-fix-spelling-mistakes
+++ a/lib/devres.c
@@ -355,7 +355,7 @@ static void pcim_iomap_release(struct de
  * detach.
  *
  * This function might sleep when the table is first allocated but can
- * be safely called without context and guaranteed to succed once
+ * be safely called without context and guaranteed to succeed once
  * allocated.
  */
 void __iomem * const *pcim_iomap_table(struct pci_dev *pdev)
--- a/lib/dynamic_debug.c~lib-fix-spelling-mistakes
+++ a/lib/dynamic_debug.c
@@ -991,7 +991,7 @@ static int ddebug_dyndbg_param_cb(char *
 
 	ddebug_exec_queries((val ? val : "+p"), modname);
 
-	return 0; /* query failure shouldnt stop module load */
+	return 0; /* query failure shouldn't stop module load */
 }
 
 /* handle both dyndbg and $module.dyndbg params at boot */
--- a/lib/fonts/font_pearl_8x8.c~lib-fix-spelling-mistakes
+++ a/lib/fonts/font_pearl_8x8.c
@@ -3,7 +3,7 @@
 /*                                            */
 /*       Font file generated by cpi2fnt       */
 /*       ------------------------------       */
-/*       Combined with the alpha-numeric      */
+/*       Combined with the alphanumeric       */
 /*       portion of Greg Harp's old PEARL     */
 /*       font (from earlier versions of       */
 /*       linux-m86k) by John Shifflett        */
--- a/lib/Kconfig.debug~lib-fix-spelling-mistakes
+++ a/lib/Kconfig.debug
@@ -1282,7 +1282,7 @@ config PROVE_RAW_LOCK_NESTING
 	 option expect lockdep splats until these problems have been fully
 	 addressed which is work in progress. This config switch allows to
 	 identify and analyze these problems. It will be removed and the
-	 check permanentely enabled once the main issues have been fixed.
+	 check permanently enabled once the main issues have been fixed.
 
 	 If unsure, select N.
 
@@ -1448,7 +1448,7 @@ config DEBUG_LOCKING_API_SELFTESTS
 	  Say Y here if you want the kernel to run a short self-test during
 	  bootup. The self-test checks whether common types of locking bugs
 	  are detected by debugging mechanisms or not. (if you disable
-	  lock debugging then those bugs wont be detected of course.)
+	  lock debugging then those bugs won't be detected of course.)
 	  The following locking APIs are covered: spinlocks, rwlocks,
 	  mutexes and rwsems.
 
@@ -1928,7 +1928,7 @@ config FAIL_IO_TIMEOUT
 	  thus exercising the error handling.
 
 	  Only works with drivers that use the generic timeout handling,
-	  for others it wont do anything.
+	  for others it won't do anything.
 
 config FAIL_FUTEX
 	bool "Fault-injection capability for futexes"
--- a/lib/kfifo.c~lib-fix-spelling-mistakes
+++ a/lib/kfifo.c
@@ -415,7 +415,7 @@ static unsigned int __kfifo_peek_n(struc
 	)
 
 /*
- * __kfifo_poke_n internal helper function for storeing the length of
+ * __kfifo_poke_n internal helper function for storing the length of
  * the record into the fifo
  */
 static void __kfifo_poke_n(struct __kfifo *fifo, unsigned int n, size_t recsize)
--- a/lib/list_sort.c~lib-fix-spelling-mistakes
+++ a/lib/list_sort.c
@@ -104,7 +104,7 @@ static void merge_final(void *priv, list
  * @head: the list to sort
  * @cmp: the elements comparison function
  *
- * The comparison funtion @cmp must return > 0 if @a should sort after
+ * The comparison function @cmp must return > 0 if @a should sort after
  * @b ("@a > @b" if you want an ascending sort), and <= 0 if @a should
  * sort before @b *or* their original order should be preserved.  It is
  * always called with the element that came first in the input in @a,
--- a/lib/nlattr.c~lib-fix-spelling-mistakes
+++ a/lib/nlattr.c
@@ -619,7 +619,7 @@ static int __nla_validate_parse(const st
  * Validates all attributes in the specified attribute stream against the
  * specified policy. Validation depends on the validate flags passed, see
  * &enum netlink_validation for more details on that.
- * See documenation of struct nla_policy for more details.
+ * See documentation of struct nla_policy for more details.
  *
  * Returns 0 on success or a negative error code.
  */
@@ -633,7 +633,7 @@ int __nla_validate(const struct nlattr *
 EXPORT_SYMBOL(__nla_validate);
 
 /**
- * nla_policy_len - Determin the max. length of a policy
+ * nla_policy_len - Determine the max. length of a policy
  * @policy: policy to use
  * @n: number of policies
  *
--- a/lib/oid_registry.c~lib-fix-spelling-mistakes
+++ a/lib/oid_registry.c
@@ -124,7 +124,7 @@ EXPORT_SYMBOL_GPL(parse_OID);
  * @bufsize: The size of the buffer
  *
  * The OID is rendered into the buffer in "a.b.c.d" format and the number of
- * bytes is returned.  -EBADMSG is returned if the data could not be intepreted
+ * bytes is returned.  -EBADMSG is returned if the data could not be interpreted
  * and -ENOBUFS if the buffer was too small.
  */
 int sprint_oid(const void *data, size_t datasize, char *buffer, size_t bufsize)
--- a/lib/pldmfw/pldmfw.c~lib-fix-spelling-mistakes
+++ a/lib/pldmfw/pldmfw.c
@@ -82,7 +82,7 @@ pldm_check_fw_space(struct pldmfw_priv *
  * @bytes_to_move: number of bytes to move the offset forward by
  *
  * Check that there is enough space past the current offset, and then move the
- * offset forward by this ammount.
+ * offset forward by this amount.
  *
  * Returns: zero on success, or -EFAULT if the image is too small to fit the
  * expected length.
--- a/lib/reed_solomon/test_rslib.c~lib-fix-spelling-mistakes
+++ a/lib/reed_solomon/test_rslib.c
@@ -385,7 +385,7 @@ static void test_bc(struct rs_control *r
 
 			/*
 			 * We check that the returned word is actually a
-			 * codeword. The obious way to do this would be to
+			 * codeword. The obvious way to do this would be to
 			 * compute the syndrome, but we don't want to replicate
 			 * that code here. However, all the codes are in
 			 * systematic form, and therefore we can encode the
--- a/lib/refcount.c~lib-fix-spelling-mistakes
+++ a/lib/refcount.c
@@ -164,7 +164,7 @@ EXPORT_SYMBOL(refcount_dec_and_lock);
  * @flags: saved IRQ-flags if the is acquired
  *
  * Same as refcount_dec_and_lock() above except that the spinlock is acquired
- * with disabled interupts.
+ * with disabled interrupts.
  *
  * Return: true and hold spinlock if able to decrement refcount to 0, false
  *         otherwise
--- a/lib/rhashtable.c~lib-fix-spelling-mistakes
+++ a/lib/rhashtable.c
@@ -703,7 +703,7 @@ EXPORT_SYMBOL_GPL(rhashtable_walk_exit);
  *
  * Returns zero if successful.
  *
- * Returns -EAGAIN if resize event occured.  Note that the iterator
+ * Returns -EAGAIN if resize event occurred.  Note that the iterator
  * will rewind back to the beginning and you may use it immediately
  * by calling rhashtable_walk_next.
  *
--- a/lib/sbitmap.c~lib-fix-spelling-mistakes
+++ a/lib/sbitmap.c
@@ -583,7 +583,7 @@ void sbitmap_queue_clear(struct sbitmap_
 	/*
 	 * Once the clear bit is set, the bit may be allocated out.
 	 *
-	 * Orders READ/WRITE on the asssociated instance(such as request
+	 * Orders READ/WRITE on the associated instance(such as request
 	 * of blk_mq) by this bit for avoiding race with re-allocation,
 	 * and its pair is the memory barrier implied in __sbitmap_get_word.
 	 *
--- a/lib/scatterlist.c~lib-fix-spelling-mistakes
+++ a/lib/scatterlist.c
@@ -38,7 +38,7 @@ EXPORT_SYMBOL(sg_next);
  * @sg:		The scatterlist
  *
  * Description:
- * Allows to know how many entries are in sg, taking into acount
+ * Allows to know how many entries are in sg, taking into account
  * chaining as well
  *
  **/
@@ -59,7 +59,7 @@ EXPORT_SYMBOL(sg_nents);
  *
  * Description:
  * Determines the number of entries in sg that are required to meet
- * the supplied length, taking into acount chaining as well
+ * the supplied length, taking into account chaining as well
  *
  * Returns:
  *   the number of sg entries needed, negative error on failure
--- a/lib/seq_buf.c~lib-fix-spelling-mistakes
+++ a/lib/seq_buf.c
@@ -289,7 +289,7 @@ int seq_buf_path(struct seq_buf *s, cons
 }
 
 /**
- * seq_buf_to_user - copy the squence buffer to user space
+ * seq_buf_to_user - copy the sequence buffer to user space
  * @s: seq_buf descriptor
  * @ubuf: The userspace memory location to copy to
  * @cnt: The amount to copy
--- a/lib/sort.c~lib-fix-spelling-mistakes
+++ a/lib/sort.c
@@ -51,7 +51,7 @@ static bool is_aligned(const void *base,
  * which basically all CPUs have, to minimize loop overhead computations.
  *
  * For some reason, on x86 gcc 7.3.0 adds a redundant test of n at the
- * bottom of the loop, even though the zero flag is stil valid from the
+ * bottom of the loop, even though the zero flag is still valid from the
  * subtract (since the intervening mov instructions don't alter the flags).
  * Gcc 8.1.0 doesn't have that problem.
  */
--- a/lib/stackdepot.c~lib-fix-spelling-mistakes
+++ a/lib/stackdepot.c
@@ -11,7 +11,7 @@
  * Instead, stack depot maintains a hashtable of unique stacktraces. Since alloc
  * and free stacks repeat a lot, we save about 100x space.
  * Stacks are never removed from depot, so we store them contiguously one after
- * another in a contiguos memory allocation.
+ * another in a contiguous memory allocation.
  *
  * Author: Alexander Potapenko <glider@google.com>
  * Copyright (C) 2016 Google, Inc.
--- a/lib/vsprintf.c~lib-fix-spelling-mistakes
+++ a/lib/vsprintf.c
@@ -3417,7 +3417,7 @@ int vsscanf(const char *buf, const char
 
 	while (*fmt) {
 		/* skip any white space in format */
-		/* white space in format matchs any amount of
+		/* white space in format matches any amount of
 		 * white space, including none, in the input.
 		 */
 		if (isspace(*fmt)) {
_
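
Editorial note on the lib/list_sort.c hunk above: the corrected comment states the comparison contract. @cmp returns > 0 when @a should sort after @b, and <= 0 when @a should sort before @b or their original order should be preserved. The following is a minimal user-space sketch of that contract. It assumes a hypothetical `struct item` and uses a stable insertion sort over an array, which is a stand-in for illustration and not the kernel's merge sort over `struct list_head`.

```c
#include <stddef.h>

/* Hypothetical element type; seq records the original input position. */
struct item {
	int key;
	int seq;
};

/* Returns > 0 only if a must sort after b; 0 for equal keys keeps input order. */
static int cmp_by_key(const struct item *a, const struct item *b)
{
	return (a->key > b->key) - (a->key < b->key);
}

/*
 * Stable insertion sort (illustrative only): an element moves left only
 * past entries that cmp() reports as sorting after it.
 */
static void stable_sort(struct item *v, size_t n,
			int (*cmp)(const struct item *, const struct item *))
{
	for (size_t i = 1; i < n; i++) {
		struct item cur = v[i];
		size_t j = i;

		/* v[j - 1] came first in the input, so it is passed as @a. */
		while (j > 0 && cmp(&v[j - 1], &cur) > 0) {
			v[j] = v[j - 1];
			j--;
		}
		v[j] = cur;
	}
}
```

Because entries are shifted only on a strictly positive result, elements with equal keys keep their input order, which is the "original order should be preserved" half of the contract.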


* [patch 03/54] lib: fix spelling mistakes in header files
From: Andrew Morton @ 2021-07-08  1:07 UTC
  To: akpm, cl, dennis, joe, linux-mm, mhiramat, mm-commits,
	thunder.leizhen, tj, torvalds

From: Zhen Lei <thunder.leizhen@huawei.com>
Subject: lib: fix spelling mistakes in header files

Fix some spelling mistakes in comments found by "codespell":
Hoever ==> However
poiter ==> pointer
representaion ==> representation
uppon ==> upon
independend ==> independent
aquired ==> acquired
mis-match ==> mismatch
scrach ==> scratch
struture ==> structure
Analagous ==> Analogous
interation ==> iteration

And some were discovered manually by Joe Perches and Christoph Lameter:
stroed ==> stored
arch independent ==> an architecture independent
A example structure for ==> Example structure for

Link: https://lkml.kernel.org/r/20210609150027.14805-2-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Christoph Lameter <cl@gentwo.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/bootconfig.h      |    4 ++--
 include/linux/cpumask.h         |    2 +-
 include/linux/debugobjects.h    |    2 +-
 include/linux/lru_cache.h       |    8 ++++----
 include/linux/nodemask.h        |    6 +++---
 include/linux/percpu-refcount.h |    2 +-
 include/linux/scatterlist.h     |    2 +-
 7 files changed, 13 insertions(+), 13 deletions(-)

--- a/include/linux/bootconfig.h~lib-fix-spelling-mistakes-in-header-files
+++ a/include/linux/bootconfig.h
@@ -214,10 +214,10 @@ static inline struct xbc_node * __init x
  * @value: Iterated value of array entry.
  *
  * Iterate array entries of given @key under @node. Each array entry node
- * is stroed to @anode and @value. If the @node doesn't have @key node,
+ * is stored to @anode and @value. If the @node doesn't have @key node,
  * it does nothing.
  * Note that even if the found key node has only one value (not array)
- * this executes block once. Hoever, if the found key node has no value
+ * this executes block once. However, if the found key node has no value
  * (key-only node), this does nothing. So don't use this for testing the
  * key-value pair existence.
  */
--- a/include/linux/cpumask.h~lib-fix-spelling-mistakes-in-header-files
+++ a/include/linux/cpumask.h
@@ -259,7 +259,7 @@ extern int cpumask_next_wrap(int n, cons
 /**
  * for_each_cpu_wrap - iterate over every cpu in a mask, starting at a specified location
  * @cpu: the (optionally unsigned) integer iterator
- * @mask: the cpumask poiter
+ * @mask: the cpumask pointer
  * @start: the start location
  *
  * The implementation does not assume any bit in @mask is set (including @start).
--- a/include/linux/debugobjects.h~lib-fix-spelling-mistakes-in-header-files
+++ a/include/linux/debugobjects.h
@@ -18,7 +18,7 @@ enum debug_obj_state {
 struct debug_obj_descr;
 
 /**
- * struct debug_obj - representaion of an tracked object
+ * struct debug_obj - representation of an tracked object
  * @node:	hlist node to link the object into the tracker list
  * @state:	tracked object state
  * @astate:	current active state
--- a/include/linux/lru_cache.h~lib-fix-spelling-mistakes-in-header-files
+++ a/include/linux/lru_cache.h
@@ -32,7 +32,7 @@ This header file (and its .c file; kerne
   Because of this later property, it is called "lru_cache".
   As it actually Tracks Objects in an Active SeT, we could also call it
   toast (incidentally that is what may happen to the data on the
-  backend storage uppon next resync, if we don't get it right).
+  backend storage upon next resync, if we don't get it right).
 
 What for?
 
@@ -152,7 +152,7 @@ struct lc_element {
 	 * for paranoia, and for "lc_element_to_index" */
 	unsigned lc_index;
 	/* if we want to track a larger set of objects,
-	 * it needs to become arch independend u64 */
+	 * it needs to become an architecture independent u64 */
 	unsigned lc_number;
 	/* special label when on free list */
 #define LC_FREE (~0U)
@@ -263,7 +263,7 @@ extern void lc_seq_dump_details(struct s
  *
  * Allows (expects) the set to be "dirty".  Note that the reference counts and
  * order on the active and lru lists may still change.  Used to serialize
- * changing transactions.  Returns true if we aquired the lock.
+ * changing transactions.  Returns true if we acquired the lock.
  */
 static inline int lc_try_lock_for_transaction(struct lru_cache *lc)
 {
@@ -275,7 +275,7 @@ static inline int lc_try_lock_for_transa
  * @lc: the lru cache to operate on
  *
  * Note that the reference counts and order on the active and lru lists may
- * still change.  Only works on a "clean" set.  Returns true if we aquired the
+ * still change.  Only works on a "clean" set.  Returns true if we acquired the
  * lock, which means there are no pending changes, and any further attempt to
  * change the set will not succeed until the next lc_unlock().
  */
--- a/include/linux/nodemask.h~lib-fix-spelling-mistakes-in-header-files
+++ a/include/linux/nodemask.h
@@ -119,7 +119,7 @@ static inline const unsigned long *__nod
  * The inline keyword gives the compiler room to decide to inline, or
  * not inline a function as it sees best.  However, as these functions
  * are called in both __init and non-__init functions, if they are not
- * inlined we will end up with a section mis-match error (of the type of
+ * inlined we will end up with a section mismatch error (of the type of
  * freeable items not being freed).  So we must use __always_inline here
  * to fix the problem.  If other functions in the future also end up in
  * this situation they will also need to be annotated as __always_inline
@@ -515,7 +515,7 @@ static inline int node_random(const node
 #define for_each_online_node(node) for_each_node_state(node, N_ONLINE)
 
 /*
- * For nodemask scrach area.
+ * For nodemask scratch area.
  * NODEMASK_ALLOC(type, name) allocates an object with a specified type and
  * name.
  */
@@ -528,7 +528,7 @@ static inline int node_random(const node
 #define NODEMASK_FREE(m)			do {} while (0)
 #endif
 
-/* A example struture for using NODEMASK_ALLOC, used in mempolicy. */
+/* Example structure for using NODEMASK_ALLOC, used in mempolicy. */
 struct nodemask_scratch {
 	nodemask_t	mask1;
 	nodemask_t	mask2;
--- a/include/linux/percpu-refcount.h~lib-fix-spelling-mistakes-in-header-files
+++ a/include/linux/percpu-refcount.h
@@ -213,7 +213,7 @@ static inline void percpu_ref_get_many(s
  * percpu_ref_get - increment a percpu refcount
  * @ref: percpu_ref to get
  *
- * Analagous to atomic_long_inc().
+ * Analogous to atomic_long_inc().
  *
  * This function is safe to call as long as @ref is between init and exit.
  */
--- a/include/linux/scatterlist.h~lib-fix-spelling-mistakes-in-header-files
+++ a/include/linux/scatterlist.h
@@ -474,7 +474,7 @@ sg_page_iter_dma_address(struct sg_dma_p
  * Iterates over sg entries mapping page-by-page.  On each successful
  * iteration, @miter->page points to the mapped page and
  * @miter->length bytes of data can be accessed at @miter->addr.  As
- * long as an interation is enclosed between start and stop, the user
+ * long as an iteration is enclosed between start and stop, the user
  * is free to choose control structure and when to stop.
  *
  * @miter->consumed is set to @miter->length on each iteration.  It
_


* [patch 04/54] hexagon: handle {,SOFT}IRQENTRY_TEXT in linker script
  2021-07-08  0:59 incoming Andrew Morton
                   ` (2 preceding siblings ...)
  2021-07-08  1:07 ` [patch 03/54] lib: fix spelling mistakes in header files Andrew Morton
@ 2021-07-08  1:07 ` Andrew Morton
  2021-07-08  1:07 ` [patch 05/54] hexagon: use common DISCARDS macro Andrew Morton
                   ` (49 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:07 UTC (permalink / raw)
  To: akpm, bcain, glittao, linux-mm, mm-commits, nathan, ndesaulniers,
	rientjes, torvalds, vbabka

From: Nathan Chancellor <nathan@kernel.org>
Subject: hexagon: handle {,SOFT}IRQENTRY_TEXT in linker script

Patch series "hexagon: Fix build error with CONFIG_STACKDEPOT and select CONFIG_ARCH_WANT_LD_ORPHAN_WARN".

This series fixes an error with ARCH=hexagon that was pointed out by the
patch "mm/slub: use stackdepot to save stack trace in objects".

The first patch fixes that error by handling the '.irqentry.text' and
'.softirqentry.text' sections.

The second patch switches Hexagon over to the common DISCARDS macro, which
should have been done when Hexagon was merged into the tree to match
commit 023bf6f1b8bf ("linker script: unify usage of discard definition").

The third patch selects CONFIG_ARCH_WANT_LD_ORPHAN_WARN so that something
like this does not happen again.


This patch (of 3):

Patch "mm/slub: use stackdepot to save stack trace in objects" in -mm
selects CONFIG_STACKDEPOT when CONFIG_STACKTRACE_SUPPORT is selected and
CONFIG_STACKDEPOT requires IRQENTRY_TEXT and SOFTIRQENTRY_TEXT to be
handled after commit 505a0ef15f96 ("kasan: stackdepot: move
filter_irq_stacks() to stackdepot.c") due to the use of the
__{,soft}irqentry_text_{start,end} section symbols.  If those sections are
not handled, the build is broken.

$ make ARCH=hexagon CROSS_COMPILE=hexagon-linux- LLVM=1 LLVM_IAS=1 defconfig all
...
ld.lld: error: undefined symbol: __irqentry_text_start
>>> referenced by stackdepot.c
>>>               stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a
>>> referenced by stackdepot.c
>>>               stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a

ld.lld: error: undefined symbol: __irqentry_text_end
>>> referenced by stackdepot.c
>>>               stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a
>>> referenced by stackdepot.c
>>>               stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a

ld.lld: error: undefined symbol: __softirqentry_text_start
>>> referenced by stackdepot.c
>>>               stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a
>>> referenced by stackdepot.c
>>>               stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a

ld.lld: error: undefined symbol: __softirqentry_text_end
>>> referenced by stackdepot.c
>>>               stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a
>>> referenced by stackdepot.c
>>>               stackdepot.o:(filter_irq_stacks) in archive lib/built-in.a
...

Add these sections to the Hexagon linker script so the build continues to
work.  ld.lld's orphan section warning would have caught this prior to the
-mm commit mentioned above:

ld.lld: warning: kernel/built-in.a(softirq.o):(.softirqentry.text) is being placed in '.softirqentry.text'
ld.lld: warning: kernel/built-in.a(softirq.o):(.softirqentry.text) is being placed in '.softirqentry.text'
ld.lld: warning: kernel/built-in.a(softirq.o):(.softirqentry.text) is being placed in '.softirqentry.text'

Link: https://lkml.kernel.org/r/20210521011239.1332345-1-nathan@kernel.org
Link: https://lkml.kernel.org/r/20210521011239.1332345-2-nathan@kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1381
Fixes: 505a0ef15f96 ("kasan: stackdepot: move filter_irq_stacks() to stackdepot.c")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Brian Cain <bcain@codeaurora.org>
Cc: Oliver Glitta <glittao@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/hexagon/kernel/vmlinux.lds.S |    2 ++
 1 file changed, 2 insertions(+)

--- a/arch/hexagon/kernel/vmlinux.lds.S~hexagon-handle-softirqentry_text-in-linker-script
+++ a/arch/hexagon/kernel/vmlinux.lds.S
@@ -38,6 +38,8 @@ SECTIONS
 	.text : AT(ADDR(.text)) {
 		_text = .;
 		TEXT_TEXT
+		IRQENTRY_TEXT
+		SOFTIRQENTRY_TEXT
 		SCHED_TEXT
 		CPUIDLE_TEXT
 		LOCK_TEXT
_
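
Editorial illustration, not part of the patch: the undefined symbols in
the build log above are referenced by filter_irq_stacks(), which compares
return addresses against the bounds that IRQENTRY_TEXT and
SOFTIRQENTRY_TEXT define.  The userspace sketch below models that kind of
check under stated assumptions.  struct text_range and
model_filter_irq_stacks() are hypothetical names, and plain integers stand
in for the linker-provided __{,soft}irqentry_text_{start,end} symbols.

```c
#include <stddef.h>

/* Hypothetical stand-in for one pair of linker-provided section bounds. */
struct text_range {
	unsigned long start;	/* inclusive, models __irqentry_text_start */
	unsigned long end;	/* exclusive, models __irqentry_text_end */
};

/* True if addr lies inside the half-open range [start, end). */
static int in_range(const struct text_range *r, unsigned long addr)
{
	return addr >= r->start && addr < r->end;
}

/*
 * Return how many leading entries to keep: everything up to and
 * including the first address that falls in either entry section.
 * If no entry matches, the whole trace is kept.
 */
static size_t model_filter_irq_stacks(const unsigned long *entries, size_t nr,
				      const struct text_range *irq,
				      const struct text_range *softirq)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (in_range(irq, entries[i]) || in_range(softirq, entries[i]))
			return i + 1;
	return nr;
}
```

Without the two sections in the linker script there are no bounds to
compare against, which is why the link step fails rather than the compile
step.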


* [patch 05/54] hexagon: use common DISCARDS macro
  2021-07-08  0:59 incoming Andrew Morton
                   ` (3 preceding siblings ...)
  2021-07-08  1:07 ` [patch 04/54] hexagon: handle {,SOFT}IRQENTRY_TEXT in linker script Andrew Morton
@ 2021-07-08  1:07 ` Andrew Morton
  2021-07-08  1:07 ` [patch 06/54] hexagon: select ARCH_WANT_LD_ORPHAN_WARN Andrew Morton
                   ` (48 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:07 UTC (permalink / raw)
  To: akpm, bcain, glittao, linux-mm, mm-commits, nathan, ndesaulniers,
	rientjes, torvalds, vbabka

From: Nathan Chancellor <nathan@kernel.org>
Subject: hexagon: use common DISCARDS macro

ld.lld warns that the '.modinfo' section is not currently handled:

ld.lld: warning: kernel/built-in.a(workqueue.o):(.modinfo) is being placed in '.modinfo'
ld.lld: warning: kernel/built-in.a(printk/printk.o):(.modinfo) is being placed in '.modinfo'
ld.lld: warning: kernel/built-in.a(irq/spurious.o):(.modinfo) is being placed in '.modinfo'
ld.lld: warning: kernel/built-in.a(rcu/update.o):(.modinfo) is being placed in '.modinfo'

The '.modinfo' section was added in commit 898490c010b5 ("moduleparam:
Save information about built-in modules in separate file") to the DISCARDS
macro but Hexagon has never used that macro.  The unification of DISCARDS
happened in commit 023bf6f1b8bf ("linker script: unify usage of discard
definition") in 2009, prior to Hexagon being added in 2011.

Switch Hexagon over to the DISCARDS macro so that anything that is
expected to be discarded gets discarded.

Link: https://lkml.kernel.org/r/20210521011239.1332345-3-nathan@kernel.org
Fixes: e95bf452a9e2 ("Hexagon: Add configuration and makefiles for the Hexagon architecture.")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Brian Cain <bcain@codeaurora.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Oliver Glitta <glittao@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/hexagon/kernel/vmlinux.lds.S |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

--- a/arch/hexagon/kernel/vmlinux.lds.S~hexagon-use-common-discards-macro
+++ a/arch/hexagon/kernel/vmlinux.lds.S
@@ -61,14 +61,9 @@ SECTIONS
 
 	_end = .;
 
-	/DISCARD/ : {
-		EXIT_TEXT
-		EXIT_DATA
-		EXIT_CALL
-	}
-
 	STABS_DEBUG
 	DWARF_DEBUG
 	ELF_DETAILS
 
+	DISCARDS
 }
_


* [patch 06/54] hexagon: select ARCH_WANT_LD_ORPHAN_WARN
  2021-07-08  0:59 incoming Andrew Morton
                   ` (4 preceding siblings ...)
  2021-07-08  1:07 ` [patch 05/54] hexagon: use common DISCARDS macro Andrew Morton
@ 2021-07-08  1:07 ` Andrew Morton
  2021-07-08  1:07 ` [patch 07/54] mm/slub: use stackdepot to save stack trace in objects Andrew Morton
                   ` (47 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:07 UTC (permalink / raw)
  To: akpm, bcain, glittao, linux-mm, mm-commits, nathan, ndesaulniers,
	rientjes, torvalds, vbabka

From: Nathan Chancellor <nathan@kernel.org>
Subject: hexagon: select ARCH_WANT_LD_ORPHAN_WARN

Now that we handle all of the sections in a Hexagon defconfig, select
ARCH_WANT_LD_ORPHAN_WARN so that unhandled sections are warned about by
default.

Link: https://lkml.kernel.org/r/20210521011239.1332345-4-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Brian Cain <bcain@codeaurora.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Oliver Glitta <glittao@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/hexagon/Kconfig |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/hexagon/Kconfig~hexagon-select-arch_want_ld_orphan_warn
+++ a/arch/hexagon/Kconfig
@@ -30,6 +30,7 @@ config HEXAGON
 	select MODULES_USE_ELF_RELA
 	select GENERIC_CPU_DEVICES
 	select SET_FS
+	select ARCH_WANT_LD_ORPHAN_WARN
 	help
 	  Qualcomm Hexagon is a processor architecture designed for high
 	  performance and low power across a wide variety of applications.
_


* [patch 07/54] mm/slub: use stackdepot to save stack trace in objects
  2021-07-08  0:59 incoming Andrew Morton
                   ` (5 preceding siblings ...)
  2021-07-08  1:07 ` [patch 06/54] hexagon: select ARCH_WANT_LD_ORPHAN_WARN Andrew Morton
@ 2021-07-08  1:07 ` Andrew Morton
  2021-07-16  7:39   ` Christoph Hellwig
  2021-07-08  1:07 ` [patch 08/54] mmap: make mlock_future_check() global Andrew Morton
                   ` (46 subsequent siblings)
  53 siblings, 1 reply; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:07 UTC (permalink / raw)
  To: akpm, cl, glittao, iamjoonsoo.kim, linux-mm, mm-commits, penberg,
	rdunlap, rientjes, torvalds, vbabka

From: Oliver Glitta <glittao@gmail.com>
Subject: mm/slub: use stackdepot to save stack trace in objects

Many stack traces are similar so there are many similar arrays. 
Stackdepot saves each unique stack only once.

Replace field addrs in struct track with depot_stack_handle_t handle.  Use
stackdepot to save stack trace.

The benefits are a smaller memory overhead and the possibility of
aggregating per-cache statistics in the future using the stackdepot handle
instead of matching stacks manually.

[rdunlap@infradead.org: rename save_stack_trace()]
  Link: https://lkml.kernel.org/r/20210513051920.29320-1-rdunlap@infradead.org
[vbabka@suse.cz: fix lockdep splat]
  Link: https://lkml.kernel.org/r/20210516195150.26740-1-vbabka@suse.cz
Link: https://lkml.kernel.org/r/20210414163434.4376-1-glittao@gmail.com
Signed-off-by: Oliver Glitta <glittao@gmail.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 init/Kconfig |    1 
 mm/slub.c    |   79 ++++++++++++++++++++++++++++++-------------------
 2 files changed, 50 insertions(+), 30 deletions(-)

--- a/init/Kconfig~mm-slub-use-stackdepot-to-save-stack-trace-in-objects
+++ a/init/Kconfig
@@ -1847,6 +1847,7 @@ config SLUB_DEBUG
 	default y
 	bool "Enable SLUB debugging support" if EXPERT
 	depends on SLUB && SYSFS
+	select STACKDEPOT if STACKTRACE_SUPPORT
 	help
 	  SLUB has extensive debug support features. Disabling these can
 	  result in significant savings in code size. This also disables
--- a/mm/slub.c~mm-slub-use-stackdepot-to-save-stack-trace-in-objects
+++ a/mm/slub.c
@@ -26,6 +26,7 @@
 #include <linux/cpuset.h>
 #include <linux/mempolicy.h>
 #include <linux/ctype.h>
+#include <linux/stackdepot.h>
 #include <linux/debugobjects.h>
 #include <linux/kallsyms.h>
 #include <linux/kfence.h>
@@ -220,8 +221,8 @@ static inline bool kmem_cache_has_cpu_pa
 #define TRACK_ADDRS_COUNT 16
 struct track {
 	unsigned long addr;	/* Called from address */
-#ifdef CONFIG_STACKTRACE
-	unsigned long addrs[TRACK_ADDRS_COUNT];	/* Called from address */
+#ifdef CONFIG_STACKDEPOT
+	depot_stack_handle_t handle;
 #endif
 	int cpu;		/* Was running on cpu */
 	int pid;		/* Pid context */
@@ -625,22 +626,27 @@ static struct track *get_track(struct km
 	return kasan_reset_tag(p + alloc);
 }
 
+#ifdef CONFIG_STACKDEPOT
+static depot_stack_handle_t save_stack_depot_trace(gfp_t flags)
+{
+	unsigned long entries[TRACK_ADDRS_COUNT];
+	depot_stack_handle_t handle;
+	unsigned int nr_entries;
+
+	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 4);
+	handle = stack_depot_save(entries, nr_entries, flags);
+	return handle;
+}
+#endif
+
 static void set_track(struct kmem_cache *s, void *object,
 			enum track_item alloc, unsigned long addr)
 {
 	struct track *p = get_track(s, object, alloc);
 
 	if (addr) {
-#ifdef CONFIG_STACKTRACE
-		unsigned int nr_entries;
-
-		metadata_access_enable();
-		nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
-					      TRACK_ADDRS_COUNT, 3);
-		metadata_access_disable();
-
-		if (nr_entries < TRACK_ADDRS_COUNT)
-			p->addrs[nr_entries] = 0;
+#ifdef CONFIG_STACKDEPOT
+		p->handle = save_stack_depot_trace(GFP_NOWAIT);
 #endif
 		p->addr = addr;
 		p->cpu = smp_processor_id();
@@ -667,14 +673,19 @@ static void print_track(const char *s, s
 
 	pr_err("%s in %pS age=%lu cpu=%u pid=%d\n",
 	       s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
-#ifdef CONFIG_STACKTRACE
+#ifdef CONFIG_STACKDEPOT
 	{
-		int i;
-		for (i = 0; i < TRACK_ADDRS_COUNT; i++)
-			if (t->addrs[i])
-				pr_err("\t%pS\n", (void *)t->addrs[i]);
-			else
-				break;
+		depot_stack_handle_t handle;
+		unsigned long *entries;
+		unsigned int nr_entries;
+
+		handle = READ_ONCE(t->handle);
+		if (!handle) {
+			pr_err("object allocation/free stack trace missing\n");
+		} else {
+			nr_entries = stack_depot_fetch(handle, &entries);
+			stack_trace_print(entries, nr_entries, 0);
+		}
 	}
 #endif
 }
@@ -4048,18 +4059,26 @@ void kmem_obj_info(struct kmem_obj_info
 	objp = fixup_red_left(s, objp);
 	trackp = get_track(s, objp, TRACK_ALLOC);
 	kpp->kp_ret = (void *)trackp->addr;
-#ifdef CONFIG_STACKTRACE
-	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
-		kpp->kp_stack[i] = (void *)trackp->addrs[i];
-		if (!kpp->kp_stack[i])
-			break;
-	}
+#ifdef CONFIG_STACKDEPOT
+	{
+		depot_stack_handle_t handle;
+		unsigned long *entries;
+		unsigned int nr_entries;
 
-	trackp = get_track(s, objp, TRACK_FREE);
-	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
-		kpp->kp_free_stack[i] = (void *)trackp->addrs[i];
-		if (!kpp->kp_free_stack[i])
-			break;
+		handle = READ_ONCE(trackp->handle);
+		if (handle) {
+			nr_entries = stack_depot_fetch(handle, &entries);
+			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
+				kpp->kp_stack[i] = (void *)entries[i];
+		}
+
+		trackp = get_track(s, objp, TRACK_FREE);
+		handle = READ_ONCE(trackp->handle);
+		if (handle) {
+			nr_entries = stack_depot_fetch(handle, &entries);
+			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
+				kpp->kp_free_stack[i] = (void *)entries[i];
+		}
 	}
 #endif
 #endif
_
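
Editorial illustration, not part of the patch: the memory saving comes
from storing each distinct trace once and keeping only a small handle per
object.  The toy below sketches that idea in userspace.  Everything named
toy_* is hypothetical, the capacity is arbitrary, and a linear search
replaces the hash lookup the real stackdepot uses, so treat it as a model
of the behaviour rather than of the implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DEPTH  16	/* mirrors TRACK_ADDRS_COUNT in the patch */
#define MAX_STACKS 64	/* arbitrary capacity for this toy */

/* Handle 0 means "no trace", matching the !handle checks in the patch. */
typedef uint32_t toy_handle_t;

struct toy_depot {
	unsigned long frames[MAX_STACKS][MAX_DEPTH];
	unsigned int depth[MAX_STACKS];
	unsigned int used;
};

/* Store a trace once; an identical trace returns the existing handle. */
static toy_handle_t toy_depot_save(struct toy_depot *d,
				   const unsigned long *entries,
				   unsigned int nr)
{
	unsigned int i;

	if (nr == 0 || nr > MAX_DEPTH)
		return 0;
	for (i = 0; i < d->used; i++)
		if (d->depth[i] == nr &&
		    memcmp(d->frames[i], entries, nr * sizeof(*entries)) == 0)
			return i + 1;
	if (d->used == MAX_STACKS)
		return 0;	/* full: caller sees "trace missing" */
	memcpy(d->frames[d->used], entries, nr * sizeof(*entries));
	d->depth[d->used] = nr;
	return ++d->used;	/* handle is slot index plus one */
}

/* Look a handle up again; returns the depth and sets *entries. */
static unsigned int toy_depot_fetch(const struct toy_depot *d,
				    toy_handle_t h,
				    const unsigned long **entries)
{
	if (h == 0 || h > d->used) {
		*entries = NULL;
		return 0;
	}
	*entries = d->frames[h - 1];
	return d->depth[h - 1];
}
```

In this model each tracked object carries a 4-byte handle instead of a
16-entry address array, and two objects allocated from the same call chain
share one stored trace.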


* [patch 08/54] mmap: make mlock_future_check() global
  2021-07-08  0:59 incoming Andrew Morton
                   ` (6 preceding siblings ...)
  2021-07-08  1:07 ` [patch 07/54] mm/slub: use stackdepot to save stack trace in objects Andrew Morton
@ 2021-07-08  1:07 ` Andrew Morton
  2021-07-08  1:07 ` [patch 09/54] riscv/Kconfig: make direct map manipulation options depend on MMU Andrew Morton
                   ` (45 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:07 UTC (permalink / raw)
  To: akpm, arnd, bp, catalin.marinas, cl, dan.j.williams, dave.hansen,
	david, elena.reshetova, guro, hagen, hpa, James.Bottomley, jejb,
	kirill, linux-mm, lkp, luto, mark.rutland, mingo, mm-commits,
	mtk.manpages, palmer, palmerdabbelt, paul.walmsley, peterz,
	rick.p.edgecombe, rppt, shakeelb, shuah, tglx, torvalds, tycho,
	viro, will, willy

From: Mike Rapoport <rppt@linux.ibm.com>
Subject: mmap: make mlock_future_check() global

Patch series "mm: introduce memfd_secret system call to create "secret" memory areas", v20.

This is an implementation of "secret" mappings backed by a file
descriptor.

The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call.  The desired protection mode for the
memory is configured using the flags parameter of the system call.  The
mmap() of the file descriptor created with memfd_secret() will create a
"secret" memory mapping.  The pages in that mapping will be marked as not
present in the direct map and will be present only in the page table of
the owning mm.

Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants'
mappings.

It's designed to provide the following protections:

* Enhanced protection (in conjunction with all the other in-kernel
  attack prevention systems) against ROP attacks.  Secretmem makes
  "simple" ROP insufficient to perform exfiltration, which increases the
  required complexity of the attack.  Along with other protections like
  the kernel stack size limit and address space layout randomization,
  which make finding gadgets really hard, the absence of any in-kernel
  primitive for accessing secret memory means the one-gadget ROP attack
  can't work.  Since the only way to access secret memory is to
  reconstruct the missing mapping entry, the attacker has to recover the
  physical page and insert a PTE pointing to it in the kernel and then
  retrieve the contents.  That takes at least three gadgets, which is a
  level of difficulty beyond most standard attacks.

* Prevent cross-process secret userspace memory exposures.  Once the
  secret memory is allocated, the user can't accidentally pass it into the
  kernel to be transmitted somewhere.  The secretmem pages cannot be
  accessed via the direct map and they are disallowed in GUP.

* Harden against exploited kernel flaws.  In order to access secretmem,
  a kernel-side attack would need to either walk the page tables and
  create new ones, or spawn a new privileged userspace process to perform
  secrets exfiltration using ptrace.

In the future the secret mappings may be used as a means to protect guest
memory in a virtual machine host.

For demonstration of secret memory usage we've created a userspace library

https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloader.git

that does two things.  First, it acts as a preloader for openssl to
redirect all the OPENSSL_malloc calls to secret memory, meaning any secret
keys get automatically protected this way.  Second, it exposes the API to
the user who needs it.  We anticipate that a lot of the use cases would be
like the openssl one: many toolkits that deal with secret keys already
have special handling for the memory to try to give them greater
protection, so this would simply be pluggable into the toolkits without
any need for user application modification.

Hiding secret memory mappings behind an anonymous file allows usage of the
page cache for tracking pages allocated for the "secret" mappings as well
as using address_space_operations for e.g.  page migration callbacks.

The anonymous file may also be used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native"
mm ABIs in the future.

Removing pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory, which
affects the system performance.  However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "...  can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
("x86: add gbpages switches")) and the recent report [1] showed that "... 
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice".  Hence, it is sufficient to
have secretmem disabled by default with the ability of a system
administrator to enable it at boot time.

In addition, there is also a long term goal to improve management of the
direct map.

[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/


This patch (of 7):

It will be used by the upcoming secret memory implementation.

Link: https://lkml.kernel.org/r/20210518072034.31572-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20210518072034.31572-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/internal.h |    3 +++
 mm/mmap.c     |    5 ++---
 2 files changed, 5 insertions(+), 3 deletions(-)

--- a/mm/internal.h~mmap-make-mlock_future_check-global
+++ a/mm/internal.h
@@ -360,6 +360,9 @@ static inline void munlock_vma_pages_all
 extern void mlock_vma_page(struct page *page);
 extern unsigned int munlock_vma_page(struct page *page);
 
+extern int mlock_future_check(struct mm_struct *mm, unsigned long flags,
+			      unsigned long len);
+
 /*
  * Clear the page's PageMlocked().  This can be useful in a situation where
  * we want to unconditionally remove a page from the pagecache -- e.g.,
--- a/mm/mmap.c~mmap-make-mlock_future_check-global
+++ a/mm/mmap.c
@@ -1352,9 +1352,8 @@ static inline unsigned long round_hint_t
 	return hint;
 }
 
-static inline int mlock_future_check(struct mm_struct *mm,
-				     unsigned long flags,
-				     unsigned long len)
+int mlock_future_check(struct mm_struct *mm, unsigned long flags,
+		       unsigned long len)
 {
 	unsigned long locked, lock_limit;
 
_
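
Editorial illustration, not part of the patch: the cover letter above
describes creating a descriptor with memfd_secret() and then mmap()ing it.
A minimal userspace sketch of that flow follows.  secret_dup() is a
hypothetical helper name, and the fallback syscall number 447 is an
assumption for builds whose headers predate the call.  It is the number
used on x86 and on the generic syscall table, so check your architecture.
On kernels where the call is absent or disabled the helper simply fails
with errno set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447	/* assumption: see the note above */
#endif

/*
 * Copy len bytes from src into a freshly created secret mapping.
 * Returns the mapping, or NULL with errno set (for example ENOSYS when
 * the running kernel does not provide memfd_secret).
 */
static void *secret_dup(const void *src, size_t len)
{
	void *p;
	int fd, saved;

	assert(src != NULL && len > 0);

	fd = (int)syscall(SYS_memfd_secret, 0);
	if (fd < 0)
		return NULL;

	/* The file starts empty; size it before mapping. */
	if (ftruncate(fd, (off_t)len) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return NULL;
	}

	/* Secret memory must be mapped shared; private mappings fail. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	saved = errno;
	close(fd);		/* the mapping keeps the pages alive */
	errno = saved;
	if (p == MAP_FAILED)
		return NULL;

	memcpy(p, src, len);
	return p;
}
```

A caller would release the region with munmap() when done.  Because the
mapping counts against the locked-memory limit, a large request can also
fail with EAGAIN, which is where the mlock_future_check() export in this
patch comes in.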


* [patch 09/54] riscv/Kconfig: make direct map manipulation options depend on MMU
  2021-07-08  0:59 incoming Andrew Morton
                   ` (7 preceding siblings ...)
  2021-07-08  1:07 ` [patch 08/54] mmap: make mlock_future_check() global Andrew Morton
@ 2021-07-08  1:07 ` Andrew Morton
  2021-07-08  1:07 ` [patch 10/54] set_memory: allow querying whether set_direct_map_*() is actually enabled Andrew Morton
                   ` (44 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:07 UTC (permalink / raw)
  To: akpm, arnd, bp, catalin.marinas, cl, dan.j.williams, dave.hansen,
	david, elena.reshetova, guro, hagen, hpa, James.Bottomley, jejb,
	kirill, linux-mm, lkp, luto, mark.rutland, mingo, mm-commits,
	mtk.manpages, palmer, palmerdabbelt, paul.walmsley, peterz,
	rick.p.edgecombe, rppt, shakeelb, shuah, tglx, torvalds, tycho,
	viro, will, willy

From: Mike Rapoport <rppt@linux.ibm.com>
Subject: riscv/Kconfig: make direct map manipulation options depend on MMU

ARCH_HAS_SET_DIRECT_MAP and ARCH_HAS_SET_MEMORY configuration options have
no meaning when CONFIG_MMU is disabled and there is no point in enabling
them for the nommu case.

Add an explicit dependency on MMU for these options.

Link: https://lkml.kernel.org/r/20210518072034.31572-3-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/riscv/Kconfig |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/riscv/Kconfig~riscv-kconfig-make-direct-map-manipulation-options-depend-on-mmu
+++ a/arch/riscv/Kconfig
@@ -26,8 +26,8 @@ config RISCV
 	select ARCH_HAS_KCOV
 	select ARCH_HAS_MMIOWB
 	select ARCH_HAS_PTE_SPECIAL
-	select ARCH_HAS_SET_DIRECT_MAP
-	select ARCH_HAS_SET_MEMORY
+	select ARCH_HAS_SET_DIRECT_MAP if MMU
+	select ARCH_HAS_SET_MEMORY if MMU
 	select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
 	select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
_

* [patch 10/54] set_memory: allow querying whether set_direct_map_*() is actually enabled
  2021-07-08  0:59 incoming Andrew Morton
                   ` (8 preceding siblings ...)
  2021-07-08  1:07 ` [patch 09/54] riscv/Kconfig: make direct map manipulation options depend on MMU Andrew Morton
@ 2021-07-08  1:07 ` Andrew Morton
  2021-07-08  1:08 ` [patch 11/54] mm: introduce memfd_secret system call to create "secret" memory areas Andrew Morton
                   ` (43 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:07 UTC (permalink / raw)
  To: akpm, arnd, bp, catalin.marinas, cl, dan.j.williams, dave.hansen,
	david, elena.reshetova, guro, hagen, hpa, James.Bottomley, jejb,
	kirill, linux-mm, lkp, luto, mark.rutland, mingo, mm-commits,
	mtk.manpages, palmer, palmerdabbelt, paul.walmsley, peterz,
	rick.p.edgecombe, rppt, shakeelb, shuah, tglx, torvalds, tycho,
	viro, will, willy

From: Mike Rapoport <rppt@linux.ibm.com>
Subject: set_memory: allow querying whether set_direct_map_*() is actually enabled

On arm64, set_direct_map_*() functions may return 0 without actually
changing the linear map.  This behaviour can be controlled using kernel
parameters, so we need a way to determine at runtime whether calls to
set_direct_map_invalid_noflush() and set_direct_map_default_noflush() have
any effect.

Extend the set_memory API with a can_set_direct_map() function that allows
checking whether calling set_direct_map_*() will actually change the page
table.  Replace several occurrences of open-coded checks in arm64 with the
new function and provide a generic stub for architectures that always
modify page tables upon calls to set_direct_map APIs.
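
The override pattern described above can be sketched in user space as
follows.  This is only an illustration: the two boolean knobs and
mock_set_direct_map_invalid() are hypothetical stand-ins for the arm64
boot-time state and for set_direct_map_invalid_noflush(), not kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the arm64 boot-time knobs. */
static bool rodata_full;
static bool debug_pagealloc;

/* Architecture override: the answer depends on boot-time settings. */
static bool can_set_direct_map(void)
{
	return rodata_full || debug_pagealloc;
}
#define can_set_direct_map can_set_direct_map

/*
 * Generic fallback: compiled only when no architecture defined the
 * macro above, so it is skipped in this sketch.
 */
#ifndef can_set_direct_map
static inline bool can_set_direct_map(void)
{
	return true;
}
#define can_set_direct_map can_set_direct_map
#endif

/*
 * Mock caller: returns 0 in both cases, but only touches the "page"
 * when direct map changes are actually possible.
 */
static int mock_set_direct_map_invalid(bool *page_valid)
{
	if (!can_set_direct_map())
		return 0;	/* success reported, nothing changed */

	*page_valid = false;
	return 0;
}
```

A return value of 0 alone does not tell the caller whether the mapping
changed, which is why callers such as secretmem would need to ask
can_set_direct_map() first.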

[arnd@arndb.de: arm64: kfence: fix header inclusion ]
Link: https://lkml.kernel.org/r/20210518072034.31572-4-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm64/include/asm/Kbuild       |    1 -
 arch/arm64/include/asm/cacheflush.h |    6 ------
 arch/arm64/include/asm/kfence.h     |    2 +-
 arch/arm64/include/asm/set_memory.h |   17 +++++++++++++++++
 arch/arm64/kernel/machine_kexec.c   |    1 +
 arch/arm64/mm/mmu.c                 |    7 +++----
 arch/arm64/mm/pageattr.c            |   13 +++++++++----
 include/linux/set_memory.h          |   12 ++++++++++++
 8 files changed, 43 insertions(+), 16 deletions(-)

--- a/arch/arm64/include/asm/cacheflush.h~set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled
+++ a/arch/arm64/include/asm/cacheflush.h
@@ -144,12 +144,6 @@ static __always_inline void icache_inval
 	dsb(ish);
 }
 
-int set_memory_valid(unsigned long addr, int numpages, int enable);
-
-int set_direct_map_invalid_noflush(struct page *page);
-int set_direct_map_default_noflush(struct page *page);
-bool kernel_page_present(struct page *page);
-
 #include <asm-generic/cacheflush.h>
 
 #endif /* __ASM_CACHEFLUSH_H */
--- a/arch/arm64/include/asm/Kbuild~set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled
+++ a/arch/arm64/include/asm/Kbuild
@@ -3,7 +3,6 @@ generic-y += early_ioremap.h
 generic-y += mcs_spinlock.h
 generic-y += qrwlock.h
 generic-y += qspinlock.h
-generic-y += set_memory.h
 generic-y += user.h
 
 generated-y += cpucaps.h
--- a/arch/arm64/include/asm/kfence.h~set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled
+++ a/arch/arm64/include/asm/kfence.h
@@ -8,7 +8,7 @@
 #ifndef __ASM_KFENCE_H
 #define __ASM_KFENCE_H
 
-#include <asm/cacheflush.h>
+#include <asm/set_memory.h>
 
 static inline bool arch_kfence_init_pool(void) { return true; }
 
--- /dev/null
+++ a/arch/arm64/include/asm/set_memory.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_ARM64_SET_MEMORY_H
+#define _ASM_ARM64_SET_MEMORY_H
+
+#include <asm-generic/set_memory.h>
+
+bool can_set_direct_map(void);
+#define can_set_direct_map can_set_direct_map
+
+int set_memory_valid(unsigned long addr, int numpages, int enable);
+
+int set_direct_map_invalid_noflush(struct page *page);
+int set_direct_map_default_noflush(struct page *page);
+bool kernel_page_present(struct page *page);
+
+#endif /* _ASM_ARM64_SET_MEMORY_H */
--- a/arch/arm64/kernel/machine_kexec.c~set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled
+++ a/arch/arm64/kernel/machine_kexec.c
@@ -11,6 +11,7 @@
 #include <linux/kernel.h>
 #include <linux/kexec.h>
 #include <linux/page-flags.h>
+#include <linux/set_memory.h>
 #include <linux/smp.h>
 
 #include <asm/cacheflush.h>
--- a/arch/arm64/mm/mmu.c~set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled
+++ a/arch/arm64/mm/mmu.c
@@ -22,6 +22,7 @@
 #include <linux/io.h>
 #include <linux/mm.h>
 #include <linux/vmalloc.h>
+#include <linux/set_memory.h>
 
 #include <asm/barrier.h>
 #include <asm/cputype.h>
@@ -515,8 +516,7 @@ static void __init map_mem(pgd_t *pgdp)
 	 */
 	BUILD_BUG_ON(pgd_index(direct_map_end - 1) == pgd_index(direct_map_end));
 
-	if (rodata_full || crash_mem_map || debug_pagealloc_enabled() ||
-	    IS_ENABLED(CONFIG_KFENCE))
+	if (can_set_direct_map() || crash_mem_map || IS_ENABLED(CONFIG_KFENCE))
 		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
 
 	/*
@@ -1489,8 +1489,7 @@ int arch_add_memory(int nid, u64 start,
 	 * KFENCE requires linear map to be mapped at page granularity, so that
 	 * it is possible to protect/unprotect single pages in the KFENCE pool.
 	 */
-	if (rodata_full || debug_pagealloc_enabled() ||
-	    IS_ENABLED(CONFIG_KFENCE))
+	if (can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE))
 		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
 
 	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
--- a/arch/arm64/mm/pageattr.c~set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled
+++ a/arch/arm64/mm/pageattr.c
@@ -19,6 +19,11 @@ struct page_change_data {
 
 bool rodata_full __ro_after_init = IS_ENABLED(CONFIG_RODATA_FULL_DEFAULT_ENABLED);
 
+bool can_set_direct_map(void)
+{
+	return rodata_full || debug_pagealloc_enabled();
+}
+
 static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
 {
 	struct page_change_data *cdata = data;
@@ -155,7 +160,7 @@ int set_direct_map_invalid_noflush(struc
 		.clear_mask = __pgprot(PTE_VALID),
 	};
 
-	if (!debug_pagealloc_enabled() && !rodata_full)
+	if (!can_set_direct_map())
 		return 0;
 
 	return apply_to_page_range(&init_mm,
@@ -170,7 +175,7 @@ int set_direct_map_default_noflush(struc
 		.clear_mask = __pgprot(PTE_RDONLY),
 	};
 
-	if (!debug_pagealloc_enabled() && !rodata_full)
+	if (!can_set_direct_map())
 		return 0;
 
 	return apply_to_page_range(&init_mm,
@@ -181,7 +186,7 @@ int set_direct_map_default_noflush(struc
 #ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)
 {
-	if (!debug_pagealloc_enabled() && !rodata_full)
+	if (!can_set_direct_map())
 		return;
 
 	set_memory_valid((unsigned long)page_address(page), numpages, enable);
@@ -206,7 +211,7 @@ bool kernel_page_present(struct page *pa
 	pte_t *ptep;
 	unsigned long addr = (unsigned long)page_address(page);
 
-	if (!debug_pagealloc_enabled() && !rodata_full)
+	if (!can_set_direct_map())
 		return true;
 
 	pgdp = pgd_offset_k(addr);
--- a/include/linux/set_memory.h~set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled
+++ a/include/linux/set_memory.h
@@ -28,7 +28,19 @@ static inline bool kernel_page_present(s
 {
 	return true;
 }
+#else /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
+/*
+ * Some architectures, e.g. ARM64, can disable direct map modifications at
+ * boot time. Let them override this query.
+ */
+#ifndef can_set_direct_map
+static inline bool can_set_direct_map(void)
+{
+	return true;
+}
+#define can_set_direct_map can_set_direct_map
 #endif
+#endif /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
 
 #ifndef set_mce_nospec
 static inline int set_mce_nospec(unsigned long pfn, bool unmap)
_

* [patch 11/54] mm: introduce memfd_secret system call to create "secret" memory areas
  2021-07-08  0:59 incoming Andrew Morton
                   ` (9 preceding siblings ...)
  2021-07-08  1:07 ` [patch 10/54] set_memory: allow querying whether set_direct_map_*() is actually enabled Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  3:13     ` Linus Torvalds
  2021-07-08  1:08 ` [patch 12/54] PM: hibernate: disable when there are active secretmem users Andrew Morton
                   ` (42 subsequent siblings)
  53 siblings, 1 reply; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, arnd, bp, catalin.marinas, cl, dan.j.williams, dave.hansen,
	david, elena.reshetova, guro, hagen, hpa, James.Bottomley, jejb,
	kirill, linux-mm, lkp, luto, mark.rutland, mingo, mm-commits,
	mtk.manpages, palmer, palmerdabbelt, paul.walmsley, peterz,
	rick.p.edgecombe, rppt, shakeelb, shuah, tglx, torvalds, tycho,
	viro, will, willy

From: Mike Rapoport <rppt@linux.ibm.com>
Subject: mm: introduce memfd_secret system call to create "secret" memory areas

Introduce the "memfd_secret" system call with the ability to create memory
areas visible only in the context of the owning process and mapped neither
to other processes nor in the kernel page tables.

The secretmem feature is off by default and the user must explicitly
enable it at boot time.

Once secretmem is enabled, the user will be able to create a file
descriptor using the memfd_secret() system call.  The memory areas created
by mmap() calls from this file descriptor will be unmapped from the kernel
direct map and they will be only mapped in the page table of the processes
that have access to the file descriptor.

Secretmem is designed to provide the following protections:

* Enhanced protection (in conjunction with all the other in-kernel
  attack prevention systems) against ROP attacks.  Secretmem makes
  "simple" ROP insufficient to perform exfiltration, which increases the
  required complexity of the attack.  Along with other protections like
  the kernel stack size limit and address space layout randomization,
  which make finding gadgets really hard, the absence of any in-kernel
  primitive for accessing secret memory means the one-gadget ROP attack
  can't work.  Since the only way to access secret memory is to
  reconstruct the missing mapping entry, the attacker has to recover the
  physical page and insert a PTE pointing to it in the kernel and then
  retrieve the contents.  That takes at least three gadgets, which is a
  level of difficulty beyond most standard attacks.

* Prevent cross-process secret userspace memory exposures.  Once the
  secret memory is allocated, the user can't accidentally pass it into the
  kernel to be transmitted somewhere.  The secretmem pages cannot be
  accessed via the direct map and they are disallowed in GUP.

* Harden against exploited kernel flaws.  In order to access secretmem,
  a kernel-side attack would need to either walk the page tables and
  create new ones, or spawn a new privileged userspace process to perform
  secrets exfiltration using ptrace.

The file descriptor based memory has several advantages over the
"traditional" mm interfaces, such as mlock(), mprotect(), madvise().  The
file descriptor approach allows explicit and controlled sharing of the
memory areas and allows the operations to be sealed.  Besides, file
descriptor based memory paves the way for VMMs to remove the secret memory
range from the userspace hypervisor process, for instance QEMU.  Andy
Lutomirski says:

  "Getting fd-backed memory into a guest will take some possibly major
  work in the kernel, but getting vma-backed memory into a guest without
  mapping it in the host user address space seems much, much worse."

memfd_secret() is made a dedicated system call rather than an extension to
memfd_create() because its purpose is to allow the user to create more
secure memory mappings rather than to simply allow file based access to
the memory.  Nowadays the cost of a new system call is negligible, while
it is way simpler for userspace to deal with clear-cut system calls than
with a multiplexer or an overloaded syscall.  Moreover, the initial
implementation of memfd_secret() is completely distinct from
memfd_create(), so there is not much sense in overloading memfd_create()
to begin with.  If code sharing between these implementations is ever
needed, it can be easily achieved without a need to adjust user visible
APIs.

The secret memory remains accessible in the process context using uaccess
primitives, but it is not exposed to the kernel otherwise; secret memory
areas are removed from the direct map and functions in the
follow_page()/get_user_page() family will refuse to return a page that
belongs to the secret memory area.

If a use case arises that requires exposing secretmem to the kernel, it
will be an opt-in request in the system call flags, so that the user has
to decide what data can be exposed to the kernel.

Removing pages from the direct map may fragment it on architectures that
use large pages to map the physical memory, which affects system
performance.  However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "...  can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
("x86: add gbpages switches")) and the recent report [1] showed that "... 
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice".  Hence, it is sufficient to
have secretmem disabled by default and to let the system administrator
enable it at boot time.

Pages in the secretmem regions are unevictable and unmovable to avoid
accidental exposure of the sensitive data via swap or during page
migration.

Since the secretmem mappings are locked in memory, they cannot exceed
RLIMIT_MEMLOCK.  Since these mappings are already locked independently
from mlock(), an attempt to mlock()/munlock() a secretmem range would fail
and mlockall()/munlockall() will ignore secretmem mappings.
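
As a rough user-space illustration of the RLIMIT_MEMLOCK constraint, the
sketch below checks whether a mapping of a given length would still fit
under the soft limit.  fits_memlock_limit() is a hypothetical helper; it
works in bytes and ignores privileges, so it only approximates whatever
accounting the kernel performs in mlock_future_check().

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: return true if `len` additional locked bytes
 * fit under the RLIMIT_MEMLOCK soft limit, given `locked` bytes that
 * are already accounted as locked.
 */
static bool fits_memlock_limit(size_t locked, size_t len)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return false;	/* treat a failed query as "does not fit" */

	if (rl.rlim_cur == RLIM_INFINITY)
		return true;

	if (locked > rl.rlim_cur)
		return false;

	/* Subtract instead of adding to avoid overflow. */
	return len <= rl.rlim_cur - locked;
}
```

A caller could use such a check before mmap()ing a secretmem file
descriptor to predict whether the mapping is likely to be refused.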

However, unlike mlock()ed memory, secretmem currently behaves more like
long-term GUP: secretmem mappings are unmovable mappings directly consumed
by user space.  With default limits, there is no excessive use of
secretmem and it poses no real problem in combination with
ZONE_MOVABLE/CMA, but in the future this should be addressed to allow
balanced use of large amounts of secretmem along with ZONE_MOVABLE/CMA.

A page that was a part of the secret memory area is cleared when it is
freed to ensure the data is not exposed to the next user of that page.

The following example demonstrates creation of a secret mapping (error
handling is omitted):

	fd = memfd_secret(0);
	ftruncate(fd, MAP_SIZE);
	ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);
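
The same sequence with error handling might look like the sketch below.
secret_alloc() is a hypothetical helper, and the sketch assumes there is
no libc wrapper for memfd_secret(), so it goes through syscall(2) and
falls back to ENOSYS when the installed headers do not define
SYS_memfd_secret.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of this series: return a secret
 * mapping of `size` bytes, or NULL with errno set.
 */
static void *secret_alloc(size_t size)
{
	void *ptr;
	int fd, saved_errno;

#ifdef SYS_memfd_secret
	fd = (int)syscall(SYS_memfd_secret, 0);
#else
	/* Headers predate the system call; report it as unavailable. */
	fd = -1;
	errno = ENOSYS;
#endif
	if (fd < 0)
		return NULL;	/* ENOSYS if secretmem is not enabled */

	if (ftruncate(fd, (off_t)size) < 0) {
		saved_errno = errno;
		close(fd);
		errno = saved_errno;
		return NULL;
	}

	/* secretmem only accepts shared mappings. */
	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	saved_errno = errno;
	close(fd);		/* the mapping keeps the file alive */
	if (ptr == MAP_FAILED) {
		errno = saved_errno;
		return NULL;
	}

	return ptr;
}
```

With this patch, a caller should expect ENOSYS when secretmem was not
enabled at boot, EINVAL for a private mapping and EAGAIN when the
mapping would exceed RLIMIT_MEMLOCK.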

[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/

[akpm@linux-foundation.org: suppress Kconfig whine]
Link: https://lkml.kernel.org/r/20210518072034.31572-5-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/secretmem.h  |   48 +++++++
 include/uapi/linux/magic.h |    1 
 kernel/sys_ni.c            |    2 
 mm/Kconfig                 |    4 
 mm/Makefile                |    1 
 mm/gup.c                   |   12 +
 mm/mlock.c                 |    3 
 mm/secretmem.c             |  239 +++++++++++++++++++++++++++++++++++
 8 files changed, 309 insertions(+), 1 deletion(-)

--- /dev/null
+++ a/include/linux/secretmem.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_SECRETMEM_H
+#define _LINUX_SECRETMEM_H
+
+#ifdef CONFIG_SECRETMEM
+
+extern const struct address_space_operations secretmem_aops;
+
+static inline bool page_is_secretmem(struct page *page)
+{
+	struct address_space *mapping;
+
+	/*
+	 * Using page_mapping() is quite slow because of the actual call
+	 * instruction and repeated compound_head(page) inside the
+	 * page_mapping() function.
+	 * We know that secretmem pages are not compound and LRU so we can
+	 * save a couple of cycles here.
+	 */
+	if (PageCompound(page) || !PageLRU(page))
+		return false;
+
+	mapping = (struct address_space *)
+		((unsigned long)page->mapping & ~PAGE_MAPPING_FLAGS);
+
+	if (mapping != page->mapping)
+		return false;
+
+	return mapping->a_ops == &secretmem_aops;
+}
+
+bool vma_is_secretmem(struct vm_area_struct *vma);
+
+#else
+
+static inline bool vma_is_secretmem(struct vm_area_struct *vma)
+{
+	return false;
+}
+
+static inline bool page_is_secretmem(struct page *page)
+{
+	return false;
+}
+
+#endif /* CONFIG_SECRETMEM */
+
+#endif /* _LINUX_SECRETMEM_H */
--- a/include/uapi/linux/magic.h~mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas
+++ a/include/uapi/linux/magic.h
@@ -97,5 +97,6 @@
 #define DEVMEM_MAGIC		0x454d444d	/* "DMEM" */
 #define Z3FOLD_MAGIC		0x33
 #define PPC_CMM_MAGIC		0xc7571590
+#define SECRETMEM_MAGIC		0x5345434d	/* "SECM" */
 
 #endif /* __LINUX_MAGIC_H__ */
--- a/kernel/sys_ni.c~mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas
+++ a/kernel/sys_ni.c
@@ -358,6 +358,8 @@ COND_SYSCALL(pkey_mprotect);
 COND_SYSCALL(pkey_alloc);
 COND_SYSCALL(pkey_free);
 
+/* memfd_secret */
+COND_SYSCALL(memfd_secret);
 
 /*
  * Architecture specific weak syscall entries.
--- a/mm/gup.c~mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas
+++ a/mm/gup.c
@@ -10,6 +10,7 @@
 #include <linux/rmap.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+#include <linux/secretmem.h>
 
 #include <linux/sched/signal.h>
 #include <linux/rwsem.h>
@@ -855,6 +856,9 @@ struct page *follow_page(struct vm_area_
 	struct follow_page_context ctx = { NULL };
 	struct page *page;
 
+	if (vma_is_secretmem(vma))
+		return NULL;
+
 	page = follow_page_mask(vma, address, foll_flags, &ctx);
 	if (ctx.pgmap)
 		put_dev_pagemap(ctx.pgmap);
@@ -988,6 +992,9 @@ static int check_vma_flags(struct vm_are
 	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
 		return -EOPNOTSUPP;
 
+	if (vma_is_secretmem(vma))
+		return -EFAULT;
+
 	if (write) {
 		if (!(vm_flags & VM_WRITE)) {
 			if (!(gup_flags & FOLL_FORCE))
@@ -2170,6 +2177,11 @@ static int gup_pte_range(pmd_t pmd, unsi
 		if (!head)
 			goto pte_unmap;
 
+		if (unlikely(page_is_secretmem(page))) {
+			put_compound_head(head, 1, flags);
+			goto pte_unmap;
+		}
+
 		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
 			put_compound_head(head, 1, flags);
 			goto pte_unmap;
--- a/mm/Kconfig~mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas
+++ a/mm/Kconfig
@@ -885,4 +885,8 @@ config KMAP_LOCAL
 # struct io_mapping based helper.  Selected by drivers that need them
 config IO_MAPPING
 	bool
+
+config SECRETMEM
+	def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED
+
 endmenu
--- a/mm/Makefile~mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas
+++ a/mm/Makefile
@@ -113,6 +113,7 @@ obj-$(CONFIG_CMA)	+= cma.o
 obj-$(CONFIG_MEMORY_BALLOON) += balloon_compaction.o
 obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
+obj-$(CONFIG_SECRETMEM) += secretmem.o
 obj-$(CONFIG_CMA_SYSFS) += cma_sysfs.o
 obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
--- a/mm/mlock.c~mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas
+++ a/mm/mlock.c
@@ -23,6 +23,7 @@
 #include <linux/hugetlb.h>
 #include <linux/memcontrol.h>
 #include <linux/mm_inline.h>
+#include <linux/secretmem.h>
 
 #include "internal.h"
 
@@ -503,7 +504,7 @@ static int mlock_fixup(struct vm_area_st
 
 	if (newflags == vma->vm_flags || (vma->vm_flags & VM_SPECIAL) ||
 	    is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm) ||
-	    vma_is_dax(vma))
+	    vma_is_dax(vma) || vma_is_secretmem(vma))
 		/* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */
 		goto out;
 
--- /dev/null
+++ a/mm/secretmem.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright IBM Corporation, 2021
+ *
+ * Author: Mike Rapoport <rppt@linux.ibm.com>
+ */
+
+#include <linux/mm.h>
+#include <linux/fs.h>
+#include <linux/swap.h>
+#include <linux/mount.h>
+#include <linux/memfd.h>
+#include <linux/bitops.h>
+#include <linux/printk.h>
+#include <linux/pagemap.h>
+#include <linux/syscalls.h>
+#include <linux/pseudo_fs.h>
+#include <linux/secretmem.h>
+#include <linux/set_memory.h>
+#include <linux/sched/signal.h>
+
+#include <uapi/linux/magic.h>
+
+#include <asm/tlbflush.h>
+
+#include "internal.h"
+
+#undef pr_fmt
+#define pr_fmt(fmt) "secretmem: " fmt
+
+/*
+ * Define mode and flag masks to allow validation of the system call
+ * parameters.
+ */
+#define SECRETMEM_MODE_MASK	(0x0)
+#define SECRETMEM_FLAGS_MASK	SECRETMEM_MODE_MASK
+
+static bool secretmem_enable __ro_after_init;
+module_param_named(enable, secretmem_enable, bool, 0400);
+MODULE_PARM_DESC(secretmem_enable,
+		 "Enable secretmem and memfd_secret(2) system call");
+
+static vm_fault_t secretmem_fault(struct vm_fault *vmf)
+{
+	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
+	struct inode *inode = file_inode(vmf->vma->vm_file);
+	pgoff_t offset = vmf->pgoff;
+	gfp_t gfp = vmf->gfp_mask;
+	unsigned long addr;
+	struct page *page;
+	int err;
+
+	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
+		return vmf_error(-EINVAL);
+
+retry:
+	page = find_lock_page(mapping, offset);
+	if (!page) {
+		page = alloc_page(gfp | __GFP_ZERO);
+		if (!page)
+			return VM_FAULT_OOM;
+
+		err = set_direct_map_invalid_noflush(page);
+		if (err) {
+			put_page(page);
+			return vmf_error(err);
+		}
+
+		__SetPageUptodate(page);
+		err = add_to_page_cache_lru(page, mapping, offset, gfp);
+		if (unlikely(err)) {
+			put_page(page);
+			/*
+			 * If a split of large page was required, it
+			 * already happened when we marked the page invalid
+			 * which guarantees that this call won't fail
+			 */
+			set_direct_map_default_noflush(page);
+			if (err == -EEXIST)
+				goto retry;
+
+			return vmf_error(err);
+		}
+
+		addr = (unsigned long)page_address(page);
+		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+	}
+
+	vmf->page = page;
+	return VM_FAULT_LOCKED;
+}
+
+static const struct vm_operations_struct secretmem_vm_ops = {
+	.fault = secretmem_fault,
+};
+
+static int secretmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	unsigned long len = vma->vm_end - vma->vm_start;
+
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
+		return -EINVAL;
+
+	if (mlock_future_check(vma->vm_mm, vma->vm_flags | VM_LOCKED, len))
+		return -EAGAIN;
+
+	vma->vm_flags |= VM_LOCKED | VM_DONTDUMP;
+	vma->vm_ops = &secretmem_vm_ops;
+
+	return 0;
+}
+
+bool vma_is_secretmem(struct vm_area_struct *vma)
+{
+	return vma->vm_ops == &secretmem_vm_ops;
+}
+
+static const struct file_operations secretmem_fops = {
+	.mmap		= secretmem_mmap,
+};
+
+static bool secretmem_isolate_page(struct page *page, isolate_mode_t mode)
+{
+	return false;
+}
+
+static int secretmem_migratepage(struct address_space *mapping,
+				 struct page *newpage, struct page *page,
+				 enum migrate_mode mode)
+{
+	return -EBUSY;
+}
+
+static void secretmem_freepage(struct page *page)
+{
+	set_direct_map_default_noflush(page);
+	clear_highpage(page);
+}
+
+const struct address_space_operations secretmem_aops = {
+	.freepage	= secretmem_freepage,
+	.migratepage	= secretmem_migratepage,
+	.isolate_page	= secretmem_isolate_page,
+};
+
+static struct vfsmount *secretmem_mnt;
+
+static struct file *secretmem_file_create(unsigned long flags)
+{
+	struct file *file = ERR_PTR(-ENOMEM);
+	struct inode *inode;
+
+	inode = alloc_anon_inode(secretmem_mnt->mnt_sb);
+	if (IS_ERR(inode))
+		return ERR_CAST(inode);
+
+	file = alloc_file_pseudo(inode, secretmem_mnt, "secretmem",
+				 O_RDWR, &secretmem_fops);
+	if (IS_ERR(file))
+		goto err_free_inode;
+
+	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
+	mapping_set_unevictable(inode->i_mapping);
+
+	inode->i_mapping->a_ops = &secretmem_aops;
+
+	/* pretend we are a normal file with zero size */
+	inode->i_mode |= S_IFREG;
+	inode->i_size = 0;
+
+	return file;
+
+err_free_inode:
+	iput(inode);
+	return file;
+}
+
+SYSCALL_DEFINE1(memfd_secret, unsigned int, flags)
+{
+	struct file *file;
+	int fd, err;
+
+	/* make sure local flags do not conflict with global fcntl.h */
+	BUILD_BUG_ON(SECRETMEM_FLAGS_MASK & O_CLOEXEC);
+
+	if (!secretmem_enable)
+		return -ENOSYS;
+
+	if (flags & ~(SECRETMEM_FLAGS_MASK | O_CLOEXEC))
+		return -EINVAL;
+
+	fd = get_unused_fd_flags(flags & O_CLOEXEC);
+	if (fd < 0)
+		return fd;
+
+	file = secretmem_file_create(flags);
+	if (IS_ERR(file)) {
+		err = PTR_ERR(file);
+		goto err_put_fd;
+	}
+
+	file->f_flags |= O_LARGEFILE;
+
+	fd_install(fd, file);
+	return fd;
+
+err_put_fd:
+	put_unused_fd(fd);
+	return err;
+}
+
+static int secretmem_init_fs_context(struct fs_context *fc)
+{
+	return init_pseudo(fc, SECRETMEM_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type secretmem_fs = {
+	.name		= "secretmem",
+	.init_fs_context = secretmem_init_fs_context,
+	.kill_sb	= kill_anon_super,
+};
+
+static int secretmem_init(void)
+{
+	int ret = 0;
+
+	if (!secretmem_enable)
+		return ret;
+
+	secretmem_mnt = kern_mount(&secretmem_fs);
+	if (IS_ERR(secretmem_mnt))
+		return PTR_ERR(secretmem_mnt);
+
+	/* prevent secretmem mappings from ever getting PROT_EXEC */
+	secretmem_mnt->mnt_flags |= MNT_NOEXEC;
+
+	return ret;
+}
+fs_initcall(secretmem_init);
_

* [patch 12/54] PM: hibernate: disable when there are active secretmem users
  2021-07-08  0:59 incoming Andrew Morton
                   ` (10 preceding siblings ...)
  2021-07-08  1:08 ` [patch 11/54] mm: introduce memfd_secret system call to create "secret" memory areas Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  3:15     ` Linus Torvalds
  2021-07-08  1:08 ` [patch 13/54] arch, mm: wire up memfd_secret system call where relevant Andrew Morton
                   ` (41 subsequent siblings)
  53 siblings, 1 reply; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, arnd, bp, catalin.marinas, cl, dan.j.williams, dave.hansen,
	david, elena.reshetova, guro, hagen, hpa, James.Bottomley, jejb,
	kirill, linux-mm, lkp, luto, mark.rutland, mingo, mm-commits,
	mtk.manpages, palmer, palmerdabbelt, paul.walmsley, peterz,
	rick.p.edgecombe, rppt, shakeelb, shuah, tglx, torvalds, tycho,
	viro, will, willy

From: Mike Rapoport <rppt@linux.ibm.com>
Subject: PM: hibernate: disable when there are active secretmem users

It is unsafe to allow saving of secretmem areas to the hibernation
snapshot, as they would be visible after resume, which would essentially
defeat the purpose of secret memory mappings.

Prevent hibernation whenever there are active secret memory users.
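
The user counting can be sketched in user space with C11 atomics.  All
names below are hypothetical mocks of the kernel-side counter, of the
open and release paths, and of hibernation_available(); the real code
uses atomic_t and the file release callback.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Mock of the global count of open secretmem files. */
static atomic_int secret_users;

/* Called when a secretmem file descriptor is handed out. */
static void secret_open(void)
{
	atomic_fetch_add(&secret_users, 1);
}

/* Called when the last reference to a secretmem file is dropped. */
static void secret_release(void)
{
	atomic_fetch_sub(&secret_users, 1);
}

static bool secret_active(void)
{
	return atomic_load(&secret_users) != 0;
}

/* Mock of the availability check with the extra secretmem condition. */
static bool hibernation_allowed(bool nohibernate, bool locked_down)
{
	return !nohibernate && !locked_down && !secret_active();
}
```

Hibernation becomes available again only after every secretmem file has
been released, since the counter has to drop back to zero.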

Link: https://lkml.kernel.org/r/20210518072034.31572-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/secretmem.h |    6 ++++++
 kernel/power/hibernate.c  |    5 ++++-
 mm/secretmem.c            |   15 +++++++++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)

--- a/include/linux/secretmem.h~pm-hibernate-disable-when-there-are-active-secretmem-users
+++ a/include/linux/secretmem.h
@@ -30,6 +30,7 @@ static inline bool page_is_secretmem(str
 }
 
 bool vma_is_secretmem(struct vm_area_struct *vma);
+bool secretmem_active(void);
 
 #else
 
@@ -42,6 +43,11 @@ static inline bool page_is_secretmem(str
 {
 	return false;
 }
+
+static inline bool secretmem_active(void)
+{
+	return false;
+}
 
 #endif /* CONFIG_SECRETMEM */
 
--- a/kernel/power/hibernate.c~pm-hibernate-disable-when-there-are-active-secretmem-users
+++ a/kernel/power/hibernate.c
@@ -31,6 +31,7 @@
 #include <linux/genhd.h>
 #include <linux/ktime.h>
 #include <linux/security.h>
+#include <linux/secretmem.h>
 #include <trace/events/power.h>
 
 #include "power.h"
@@ -81,7 +82,9 @@ void hibernate_release(void)
 
 bool hibernation_available(void)
 {
-	return nohibernate == 0 && !security_locked_down(LOCKDOWN_HIBERNATION);
+	return nohibernate == 0 &&
+		!security_locked_down(LOCKDOWN_HIBERNATION) &&
+		!secretmem_active();
 }
 
 /**
--- a/mm/secretmem.c~pm-hibernate-disable-when-there-are-active-secretmem-users
+++ a/mm/secretmem.c
@@ -40,6 +40,13 @@ module_param_named(enable, secretmem_ena
 MODULE_PARM_DESC(secretmem_enable,
 		 "Enable secretmem and memfd_secret(2) system call");
 
+static atomic_t secretmem_users;
+
+bool secretmem_active(void)
+{
+	return !!atomic_read(&secretmem_users);
+}
+
 static vm_fault_t secretmem_fault(struct vm_fault *vmf)
 {
 	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
@@ -94,6 +101,12 @@ static const struct vm_operations_struct
 	.fault = secretmem_fault,
 };
 
+static int secretmem_release(struct inode *inode, struct file *file)
+{
+	atomic_dec(&secretmem_users);
+	return 0;
+}
+
 static int secretmem_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	unsigned long len = vma->vm_end - vma->vm_start;
@@ -116,6 +129,7 @@ bool vma_is_secretmem(struct vm_area_str
 }
 
 static const struct file_operations secretmem_fops = {
+	.release	= secretmem_release,
 	.mmap		= secretmem_mmap,
 };
 
@@ -202,6 +216,7 @@ SYSCALL_DEFINE1(memfd_secret, unsigned i
 	file->f_flags |= O_LARGEFILE;
 
 	fd_install(fd, file);
+	atomic_inc(&secretmem_users);
 	return fd;
 
 err_put_fd:
_
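The accounting in the patch above can be modelled in userspace. This is only a sketch under stated assumptions: it uses C11 atomics instead of the kernel's atomic_t, and every model_* name is hypothetical. It shows how one counter, incremented when memfd_secret() installs a file and decremented from ->release(), gates hibernation_available().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's secretmem_users counter. */
static atomic_int model_users;

/* Corresponds to the atomic_inc() after fd_install() in memfd_secret(). */
static void model_open(void)
{
	atomic_fetch_add(&model_users, 1);
}

/* Corresponds to ->release(), run when the last reference to the file is dropped. */
static void model_release(void)
{
	atomic_fetch_sub(&model_users, 1);
}

/* Mirrors secretmem_active(): true while at least one secretmem file exists. */
static bool model_active(void)
{
	return atomic_load(&model_users) != 0;
}

/* Mirrors hibernation_available() with the extra secretmem term. */
static bool model_hibernation_available(bool nohibernate, bool locked_down)
{
	return !nohibernate && !locked_down && !model_active();
}
```

In this model hibernation becomes available again only after every opened file has been released, which matches the intent stated in the patch subject.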


* [patch 13/54] arch, mm: wire up memfd_secret system call where relevant
  2021-07-08  0:59 incoming Andrew Morton
                   ` (11 preceding siblings ...)
  2021-07-08  1:08 ` [patch 12/54] PM: hibernate: disable when there are active secretmem users Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 14/54] secretmem: test: add basic selftest for memfd_secret(2) Andrew Morton
                   ` (40 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, arnd, bp, catalin.marinas, cl, dan.j.williams, dave.hansen,
	david, elena.reshetova, guro, hagen, hpa, James.Bottomley, jejb,
	kirill, linux-mm, lkp, luto, mark.rutland, mingo, mm-commits,
	mtk.manpages, palmer, palmerdabbelt, paul.walmsley, peterz,
	rick.p.edgecombe, rppt, shakeelb, shuah, tglx, torvalds, tycho,
	viro, will, willy

From: Mike Rapoport <rppt@linux.ibm.com>
Subject: arch, mm: wire up memfd_secret system call where relevant

Wire up memfd_secret system call on architectures that define
ARCH_HAS_SET_DIRECT_MAP, namely arm64, risc-v and x86.
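Until libc grows a wrapper, userspace would presumably reach the new call through syscall(2). The sketch below is illustrative only: try_memfd_secret() and memfd_secret_supported() are hypothetical names, and the fallback number 447 is the one this patch assigns in the x86 tables and in asm-generic, so it should not be assumed on architectures that number their system calls differently.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Fallback for older headers; 447 is taken from the tables in this patch. */
#ifndef __NR_memfd_secret
#define __NR_memfd_secret 447
#endif

/*
 * Hypothetical wrapper: returns a new file descriptor, or -1 with errno
 * set. ENOSYS is the expected error where the call is not wired up.
 */
static int try_memfd_secret(unsigned int flags)
{
	return (int)syscall(__NR_memfd_secret, flags);
}

/* Returns 1 if a secret fd could be created, 0 otherwise. */
static int memfd_secret_supported(void)
{
	int fd = try_memfd_secret(0);

	if (fd < 0)
		return 0;
	close(fd);
	return 1;
}
```

A caller that gets 0 here cannot tell from the return value alone whether the architecture lacks the call or the feature is disabled; errno from try_memfd_secret() would have to be inspected for that.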

Link: https://lkml.kernel.org/r/20210518072034.31572-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm64/include/uapi/asm/unistd.h   |    1 +
 arch/riscv/include/asm/unistd.h        |    1 +
 arch/x86/entry/syscalls/syscall_32.tbl |    1 +
 arch/x86/entry/syscalls/syscall_64.tbl |    1 +
 include/linux/syscalls.h               |    1 +
 include/uapi/asm-generic/unistd.h      |    7 ++++++-
 scripts/checksyscalls.sh               |    4 ++++
 7 files changed, 15 insertions(+), 1 deletion(-)

--- a/arch/arm64/include/uapi/asm/unistd.h~arch-mm-wire-up-memfd_secret-system-call-where-relevant
+++ a/arch/arm64/include/uapi/asm/unistd.h
@@ -20,5 +20,6 @@
 #define __ARCH_WANT_SET_GET_RLIMIT
 #define __ARCH_WANT_TIME32_SYSCALLS
 #define __ARCH_WANT_SYS_CLONE3
+#define __ARCH_WANT_MEMFD_SECRET
 
 #include <asm-generic/unistd.h>
--- a/arch/riscv/include/asm/unistd.h~arch-mm-wire-up-memfd_secret-system-call-where-relevant
+++ a/arch/riscv/include/asm/unistd.h
@@ -9,6 +9,7 @@
  */
 
 #define __ARCH_WANT_SYS_CLONE
+#define __ARCH_WANT_MEMFD_SECRET
 
 #include <uapi/asm/unistd.h>
 
--- a/arch/x86/entry/syscalls/syscall_32.tbl~arch-mm-wire-up-memfd_secret-system-call-where-relevant
+++ a/arch/x86/entry/syscalls/syscall_32.tbl
@@ -451,3 +451,4 @@
 444	i386	landlock_create_ruleset	sys_landlock_create_ruleset
 445	i386	landlock_add_rule	sys_landlock_add_rule
 446	i386	landlock_restrict_self	sys_landlock_restrict_self
+447	i386	memfd_secret		sys_memfd_secret
--- a/arch/x86/entry/syscalls/syscall_64.tbl~arch-mm-wire-up-memfd_secret-system-call-where-relevant
+++ a/arch/x86/entry/syscalls/syscall_64.tbl
@@ -368,6 +368,7 @@
 444	common	landlock_create_ruleset	sys_landlock_create_ruleset
 445	common	landlock_add_rule	sys_landlock_add_rule
 446	common	landlock_restrict_self	sys_landlock_restrict_self
+447	common	memfd_secret		sys_memfd_secret
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
--- a/include/linux/syscalls.h~arch-mm-wire-up-memfd_secret-system-call-where-relevant
+++ a/include/linux/syscalls.h
@@ -1050,6 +1050,7 @@ asmlinkage long sys_landlock_create_rule
 asmlinkage long sys_landlock_add_rule(int ruleset_fd, enum landlock_rule_type rule_type,
 		const void __user *rule_attr, __u32 flags);
 asmlinkage long sys_landlock_restrict_self(int ruleset_fd, __u32 flags);
+asmlinkage long sys_memfd_secret(unsigned int flags);
 
 /*
  * Architecture-specific system calls
--- a/include/uapi/asm-generic/unistd.h~arch-mm-wire-up-memfd_secret-system-call-where-relevant
+++ a/include/uapi/asm-generic/unistd.h
@@ -873,8 +873,13 @@ __SYSCALL(__NR_landlock_add_rule, sys_la
 #define __NR_landlock_restrict_self 446
 __SYSCALL(__NR_landlock_restrict_self, sys_landlock_restrict_self)
 
+#ifdef __ARCH_WANT_MEMFD_SECRET
+#define __NR_memfd_secret 447
+__SYSCALL(__NR_memfd_secret, sys_memfd_secret)
+#endif
+
 #undef __NR_syscalls
-#define __NR_syscalls 447
+#define __NR_syscalls 448
 
 /*
  * 32 bit systems traditionally used different
--- a/scripts/checksyscalls.sh~arch-mm-wire-up-memfd_secret-system-call-where-relevant
+++ a/scripts/checksyscalls.sh
@@ -40,6 +40,10 @@ cat << EOF
 #define __IGNORE_setrlimit	/* setrlimit */
 #endif
 
+#ifndef __ARCH_WANT_MEMFD_SECRET
+#define __IGNORE_memfd_secret
+#endif
+
 /* Missing flags argument */
 #define __IGNORE_renameat	/* renameat2 */
 
_


* [patch 14/54] secretmem: test: add basic selftest for memfd_secret(2)
  2021-07-08  0:59 incoming Andrew Morton
                   ` (12 preceding siblings ...)
  2021-07-08  1:08 ` [patch 13/54] arch, mm: wire up memfd_secret system call where relevant Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 15/54] mm: fix spelling mistakes in header files Andrew Morton
                   ` (39 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, arnd, bp, catalin.marinas, cl, dan.j.williams, dave.hansen,
	david, elena.reshetova, guro, hagen, hpa, James.Bottomley, jejb,
	kirill, linux-mm, lkp, luto, mark.rutland, mingo, mm-commits,
	mtk.manpages, palmer, palmerdabbelt, paul.walmsley, peterz,
	rick.p.edgecombe, rppt, shakeelb, shuah, tglx, torvalds, tycho,
	viro, will, willy

From: Mike Rapoport <rppt@linux.ibm.com>
Subject: secretmem: test: add basic selftest for memfd_secret(2)

The test verifies that a file descriptor created with memfd_secret() does
not allow read/write operations, that secret memory mappings respect
RLIMIT_MEMLOCK, and that remote accesses to the secret memory with
process_vm_readv() and ptrace() fail.
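For the remote-access cases the verdict comes from the exit status of a forked child. The following is a rough standalone sketch of that decision, not the selftest itself: the KSFT_* values are copied from tools/testing/selftests/kselftest.h, while enum verdict, classify_status() and run_child() are hypothetical names used only for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Exit codes as defined in tools/testing/selftests/kselftest.h. */
#define KSFT_PASS 0
#define KSFT_FAIL 1
#define KSFT_SKIP 4

enum verdict { VERDICT_PASS, VERDICT_FAIL, VERDICT_SKIP };

/* Same decision order as check_child_status() in the selftest. */
static enum verdict classify_status(int status)
{
	if (WIFEXITED(status) && WEXITSTATUS(status) == KSFT_SKIP)
		return VERDICT_SKIP;
	/* A child killed by a signal also counts as "access was blocked". */
	if ((WIFEXITED(status) && WEXITSTATUS(status) == KSFT_PASS) ||
	    WIFSIGNALED(status))
		return VERDICT_PASS;
	return VERDICT_FAIL;
}

/* Hypothetical helper: fork a child that exits with `code`, then classify it. */
static enum verdict run_child(int code)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return VERDICT_FAIL;
	if (pid == 0)
		_exit(code);
	if (waitpid(pid, &status, 0) != pid)
		return VERDICT_FAIL;
	return classify_status(status);
}
```

In the real test the child exits with KSFT_PASS when process_vm_readv() or PTRACE_PEEKDATA is refused, so a "pass" here means the access failed.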

Link: https://lkml.kernel.org/r/20210518072034.31572-8-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 tools/testing/selftests/vm/.gitignore     |    1 
 tools/testing/selftests/vm/Makefile       |    3 
 tools/testing/selftests/vm/memfd_secret.c |  296 ++++++++++++++++++++
 tools/testing/selftests/vm/run_vmtests.sh |   17 +
 4 files changed, 316 insertions(+), 1 deletion(-)

--- a/tools/testing/selftests/vm/.gitignore~secretmem-test-add-basic-selftest-for-memfd_secret2
+++ a/tools/testing/selftests/vm/.gitignore
@@ -24,5 +24,6 @@ va_128TBswitch
 map_fixed_noreplace
 write_to_hugetlbfs
 hmm-tests
+memfd_secret
 local_config.*
 split_huge_page_test
--- a/tools/testing/selftests/vm/Makefile~secretmem-test-add-basic-selftest-for-memfd_secret2
+++ a/tools/testing/selftests/vm/Makefile
@@ -35,6 +35,7 @@ TEST_GEN_FILES += madv_populate
 TEST_GEN_FILES += map_fixed_noreplace
 TEST_GEN_FILES += map_hugetlb
 TEST_GEN_FILES += map_populate
+TEST_GEN_FILES += memfd_secret
 TEST_GEN_FILES += mlock-random-test
 TEST_GEN_FILES += mlock2-tests
 TEST_GEN_FILES += mremap_dontunmap
@@ -135,7 +136,7 @@ warn_32bit_failure:
 endif
 endif
 
-$(OUTPUT)/mlock-random-test: LDLIBS += -lcap
+$(OUTPUT)/mlock-random-test $(OUTPUT)/memfd_secret: LDLIBS += -lcap
 
 $(OUTPUT)/gup_test: ../../../../mm/gup_test.h
 
--- /dev/null
+++ a/tools/testing/selftests/vm/memfd_secret.c
@@ -0,0 +1,296 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright IBM Corporation, 2021
+ *
+ * Author: Mike Rapoport <rppt@linux.ibm.com>
+ */
+
+#define _GNU_SOURCE
+#include <sys/uio.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <sys/types.h>
+#include <sys/ptrace.h>
+#include <sys/syscall.h>
+#include <sys/resource.h>
+#include <sys/capability.h>
+
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <stdio.h>
+
+#include "../kselftest.h"
+
+#define fail(fmt, ...) ksft_test_result_fail(fmt, ##__VA_ARGS__)
+#define pass(fmt, ...) ksft_test_result_pass(fmt, ##__VA_ARGS__)
+#define skip(fmt, ...) ksft_test_result_skip(fmt, ##__VA_ARGS__)
+
+#ifdef __NR_memfd_secret
+
+#define PATTERN	0x55
+
+static const int prot = PROT_READ | PROT_WRITE;
+static const int mode = MAP_SHARED;
+
+static unsigned long page_size;
+static unsigned long mlock_limit_cur;
+static unsigned long mlock_limit_max;
+
+static int memfd_secret(unsigned int flags)
+{
+	return syscall(__NR_memfd_secret, flags);
+}
+
+static void test_file_apis(int fd)
+{
+	char buf[64];
+
+	if ((read(fd, buf, sizeof(buf)) >= 0) ||
+	    (write(fd, buf, sizeof(buf)) >= 0) ||
+	    (pread(fd, buf, sizeof(buf), 0) >= 0) ||
+	    (pwrite(fd, buf, sizeof(buf), 0) >= 0))
+		fail("unexpected file IO\n");
+	else
+		pass("file IO is blocked as expected\n");
+}
+
+static void test_mlock_limit(int fd)
+{
+	size_t len;
+	char *mem;
+
+	len = mlock_limit_cur;
+	mem = mmap(NULL, len, prot, mode, fd, 0);
+	if (mem == MAP_FAILED) {
+		fail("unable to mmap secret memory\n");
+		return;
+	}
+	munmap(mem, len);
+
+	len = mlock_limit_max * 2;
+	mem = mmap(NULL, len, prot, mode, fd, 0);
+	if (mem != MAP_FAILED) {
+		fail("unexpected mlock limit violation\n");
+		munmap(mem, len);
+		return;
+	}
+
+	pass("mlock limit is respected\n");
+}
+
+static void try_process_vm_read(int fd, int pipefd[2])
+{
+	struct iovec liov, riov;
+	char buf[64];
+	char *mem;
+
+	if (read(pipefd[0], &mem, sizeof(mem)) < 0) {
+		fail("pipe read: %s\n", strerror(errno));
+		exit(KSFT_FAIL);
+	}
+
+	liov.iov_len = riov.iov_len = sizeof(buf);
+	liov.iov_base = buf;
+	riov.iov_base = mem;
+
+	if (process_vm_readv(getppid(), &liov, 1, &riov, 1, 0) < 0) {
+		if (errno == ENOSYS)
+			exit(KSFT_SKIP);
+		exit(KSFT_PASS);
+	}
+
+	exit(KSFT_FAIL);
+}
+
+static void try_ptrace(int fd, int pipefd[2])
+{
+	pid_t ppid = getppid();
+	int status;
+	char *mem;
+	long ret;
+
+	if (read(pipefd[0], &mem, sizeof(mem)) < 0) {
+		perror("pipe read");
+		exit(KSFT_FAIL);
+	}
+
+	ret = ptrace(PTRACE_ATTACH, ppid, 0, 0);
+	if (ret) {
+		perror("ptrace_attach");
+		exit(KSFT_FAIL);
+	}
+
+	ret = waitpid(ppid, &status, WUNTRACED);
+	if ((ret != ppid) || !(WIFSTOPPED(status))) {
+		fprintf(stderr, "weird waitpid result %ld stat %x\n",
+			ret, status);
+		exit(KSFT_FAIL);
+	}
+
+	if (ptrace(PTRACE_PEEKDATA, ppid, mem, 0))
+		exit(KSFT_PASS);
+
+	exit(KSFT_FAIL);
+}
+
+static void check_child_status(pid_t pid, const char *name)
+{
+	int status;
+
+	waitpid(pid, &status, 0);
+
+	if (WIFEXITED(status) && WEXITSTATUS(status) == KSFT_SKIP) {
+		skip("%s is not supported\n", name);
+		return;
+	}
+
+	if ((WIFEXITED(status) && WEXITSTATUS(status) == KSFT_PASS) ||
+	    WIFSIGNALED(status)) {
+		pass("%s is blocked as expected\n", name);
+		return;
+	}
+
+	fail("%s: unexpected memory access\n", name);
+}
+
+static void test_remote_access(int fd, const char *name,
+			       void (*func)(int fd, int pipefd[2]))
+{
+	int pipefd[2];
+	pid_t pid;
+	char *mem;
+
+	if (pipe(pipefd)) {
+		fail("pipe failed: %s\n", strerror(errno));
+		return;
+	}
+
+	pid = fork();
+	if (pid < 0) {
+		fail("fork failed: %s\n", strerror(errno));
+		return;
+	}
+
+	if (pid == 0) {
+		func(fd, pipefd);
+		return;
+	}
+
+	mem = mmap(NULL, page_size, prot, mode, fd, 0);
+	if (mem == MAP_FAILED) {
+		fail("Unable to mmap secret memory\n");
+		return;
+	}
+
+	ftruncate(fd, page_size);
+	memset(mem, PATTERN, page_size);
+
+	if (write(pipefd[1], &mem, sizeof(mem)) < 0) {
+		fail("pipe write: %s\n", strerror(errno));
+		return;
+	}
+
+	check_child_status(pid, name);
+}
+
+static void test_process_vm_read(int fd)
+{
+	test_remote_access(fd, "process_vm_read", try_process_vm_read);
+}
+
+static void test_ptrace(int fd)
+{
+	test_remote_access(fd, "ptrace", try_ptrace);
+}
+
+static int set_cap_limits(rlim_t max)
+{
+	struct rlimit new;
+	cap_t cap = cap_init();
+
+	new.rlim_cur = max;
+	new.rlim_max = max;
+	if (setrlimit(RLIMIT_MEMLOCK, &new)) {
+		perror("setrlimit() returns error");
+		return -1;
+	}
+
+	/* drop capabilities including CAP_IPC_LOCK */
+	if (cap_set_proc(cap)) {
+		perror("cap_set_proc() returns error");
+		return -2;
+	}
+
+	return 0;
+}
+
+static void prepare(void)
+{
+	struct rlimit rlim;
+
+	page_size = sysconf(_SC_PAGE_SIZE);
+	if (!page_size)
+		ksft_exit_fail_msg("Failed to get page size %s\n",
+				   strerror(errno));
+
+	if (getrlimit(RLIMIT_MEMLOCK, &rlim))
+		ksft_exit_fail_msg("Unable to detect mlock limit: %s\n",
+				   strerror(errno));
+
+	mlock_limit_cur = rlim.rlim_cur;
+	mlock_limit_max = rlim.rlim_max;
+
+	printf("page_size: %ld, mlock.soft: %ld, mlock.hard: %ld\n",
+	       page_size, mlock_limit_cur, mlock_limit_max);
+
+	if (page_size > mlock_limit_cur)
+		mlock_limit_cur = page_size;
+	if (page_size > mlock_limit_max)
+		mlock_limit_max = page_size;
+
+	if (set_cap_limits(mlock_limit_max))
+		ksft_exit_fail_msg("Unable to set mlock limit: %s\n",
+				   strerror(errno));
+}
+
+#define NUM_TESTS 4
+
+int main(int argc, char *argv[])
+{
+	int fd;
+
+	prepare();
+
+	ksft_print_header();
+	ksft_set_plan(NUM_TESTS);
+
+	fd = memfd_secret(0);
+	if (fd < 0) {
+		if (errno == ENOSYS)
+			ksft_exit_skip("memfd_secret is not supported\n");
+		else
+			ksft_exit_fail_msg("memfd_secret failed: %s\n",
+					   strerror(errno));
+	}
+
+	test_mlock_limit(fd);
+	test_file_apis(fd);
+	test_process_vm_read(fd);
+	test_ptrace(fd);
+
+	close(fd);
+
+	ksft_exit(!ksft_get_fail_cnt());
+}
+
+#else /* __NR_memfd_secret */
+
+int main(int argc, char *argv[])
+{
+	printf("skip: skipping memfd_secret test (missing __NR_memfd_secret)\n");
+	return KSFT_SKIP;
+}
+
+#endif /* __NR_memfd_secret */
--- a/tools/testing/selftests/vm/run_vmtests.sh~secretmem-test-add-basic-selftest-for-memfd_secret2
+++ a/tools/testing/selftests/vm/run_vmtests.sh
@@ -362,4 +362,21 @@ else
 	exitcode=1
 fi
 
+echo "running memfd_secret test"
+echo "------------------------------------"
+./memfd_secret
+ret_val=$?
+
+if [ $ret_val -eq 0 ]; then
+	echo "[PASS]"
+elif [ $ret_val -eq $ksft_skip ]; then
+	echo "[SKIP]"
+	exitcode=$ksft_skip
+else
+	echo "[FAIL]"
+	exitcode=1
+fi
+
+exit $exitcode
+
 exit $exitcode
_


* [patch 15/54] mm: fix spelling mistakes in header files
  2021-07-08  0:59 incoming Andrew Morton
                   ` (13 preceding siblings ...)
  2021-07-08  1:08 ` [patch 14/54] secretmem: test: add basic selftest for memfd_secret(2) Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 16/54] mm: add setup_initial_init_mm() helper Andrew Morton
                   ` (38 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, cl, dennis, jglisse, linux-mm, mike.kravetz, mm-commits,
	thunder.leizhen, tj, torvalds

From: Zhen Lei <thunder.leizhen@huawei.com>
Subject: mm: fix spelling mistakes in header files

Fix some spelling mistakes in comments:
successfull ==> successful
potentialy ==> potentially
alloced ==> allocated
indicies ==> indices
wont ==> won't
resposible ==> responsible
dirtyness ==> dirtiness
droppped ==> dropped
alread ==> already
occured ==> occurred
interupts ==> interrupts
extention ==> extension
slighly ==> slightly
Dont't ==> Don't

Link: https://lkml.kernel.org/r/20210531034849.9549-2-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/compaction.h   |    4 ++--
 include/linux/hmm.h          |    2 +-
 include/linux/hugetlb.h      |    6 +++---
 include/linux/list_lru.h     |    4 ++--
 include/linux/mmu_notifier.h |    8 ++++----
 include/linux/percpu-defs.h  |    2 +-
 include/linux/shrinker.h     |    2 +-
 include/linux/vmalloc.h      |    4 ++--
 8 files changed, 16 insertions(+), 16 deletions(-)

--- a/include/linux/compaction.h~mm-fix-spelling-mistakes-in-header-files
+++ a/include/linux/compaction.h
@@ -35,12 +35,12 @@ enum compact_result {
 	COMPACT_CONTINUE,
 
 	/*
-	 * The full zone was compacted scanned but wasn't successfull to compact
+	 * The full zone was compacted scanned but wasn't successful to compact
 	 * suitable pages.
 	 */
 	COMPACT_COMPLETE,
 	/*
-	 * direct compaction has scanned part of the zone but wasn't successfull
+	 * direct compaction has scanned part of the zone but wasn't successful
 	 * to compact suitable pages.
 	 */
 	COMPACT_PARTIAL_SKIPPED,
--- a/include/linux/hmm.h~mm-fix-spelling-mistakes-in-header-files
+++ a/include/linux/hmm.h
@@ -113,7 +113,7 @@ int hmm_range_fault(struct hmm_range *ra
  * HMM_RANGE_DEFAULT_TIMEOUT - default timeout (ms) when waiting for a range
  *
  * When waiting for mmu notifiers we need some kind of time out otherwise we
- * could potentialy wait for ever, 1000ms ie 1s sounds like a long time to
+ * could potentially wait for ever, 1000ms ie 1s sounds like a long time to
  * wait already.
  */
 #define HMM_RANGE_DEFAULT_TIMEOUT 1000
--- a/include/linux/hugetlb.h~mm-fix-spelling-mistakes-in-header-files
+++ a/include/linux/hugetlb.h
@@ -51,7 +51,7 @@ struct hugepage_subpool {
 	long count;
 	long max_hpages;	/* Maximum huge pages or -1 if no maximum. */
 	long used_hpages;	/* Used count against maximum, includes */
-				/* both alloced and reserved pages. */
+				/* both allocated and reserved pages. */
 	struct hstate *hstate;
 	long min_hpages;	/* Minimum huge pages or -1 if no minimum. */
 	long rsv_hpages;	/* Pages reserved against global pool to */
@@ -85,7 +85,7 @@ struct resv_map {
  * by a resv_map's lock.  The set of regions within the resv_map represent
  * reservations for huge pages, or huge pages that have already been
  * instantiated within the map.  The from and to elements are huge page
- * indicies into the associated mapping.  from indicates the starting index
+ * indices into the associated mapping.  from indicates the starting index
  * of the region.  to represents the first index past the end of  the region.
  *
  * For example, a file region structure with from == 0 and to == 4 represents
@@ -797,7 +797,7 @@ static inline bool hugepage_migration_su
  * It determines whether or not a huge page should be placed on
  * movable zone or not. Movability of any huge page should be
  * required only if huge page size is supported for migration.
- * There wont be any reason for the huge page to be movable if
+ * There won't be any reason for the huge page to be movable if
  * it is not migratable to start with. Also the size of the huge
  * page should be large enough to be placed under a movable zone
  * and still feasible enough to be migratable. Just the presence
--- a/include/linux/list_lru.h~mm-fix-spelling-mistakes-in-header-files
+++ a/include/linux/list_lru.h
@@ -146,7 +146,7 @@ typedef enum lru_status (*list_lru_walk_
  * @lru: the lru pointer.
  * @nid: the node id to scan from.
  * @memcg: the cgroup to scan from.
- * @isolate: callback function that is resposible for deciding what to do with
+ * @isolate: callback function that is responsible for deciding what to do with
  *  the item currently being scanned
  * @cb_arg: opaque type that will be passed to @isolate
  * @nr_to_walk: how many items to scan.
@@ -172,7 +172,7 @@ unsigned long list_lru_walk_one(struct l
  * @lru: the lru pointer.
  * @nid: the node id to scan from.
  * @memcg: the cgroup to scan from.
- * @isolate: callback function that is resposible for deciding what to do with
+ * @isolate: callback function that is responsible for deciding what to do with
  *  the item currently being scanned
  * @cb_arg: opaque type that will be passed to @isolate
  * @nr_to_walk: how many items to scan.
--- a/include/linux/mmu_notifier.h~mm-fix-spelling-mistakes-in-header-files
+++ a/include/linux/mmu_notifier.h
@@ -33,7 +33,7 @@ struct mmu_interval_notifier;
  *
  * @MMU_NOTIFY_SOFT_DIRTY: soft dirty accounting (still same page and same
  * access flags). User should soft dirty the page in the end callback to make
- * sure that anyone relying on soft dirtyness catch pages that might be written
+ * sure that anyone relying on soft dirtiness catch pages that might be written
  * through non CPU mappings.
  *
  * @MMU_NOTIFY_RELEASE: used during mmu_interval_notifier invalidate to signal
@@ -167,7 +167,7 @@ struct mmu_notifier_ops {
 	 * decrease the refcount. If the refcount is decreased on
 	 * invalidate_range_start() then the VM can free pages as page
 	 * table entries are removed.  If the refcount is only
-	 * droppped on invalidate_range_end() then the driver itself
+	 * dropped on invalidate_range_end() then the driver itself
 	 * will drop the last refcount but it must take care to flush
 	 * any secondary tlb before doing the final free on the
 	 * page. Pages will no longer be referenced by the linux
@@ -196,7 +196,7 @@ struct mmu_notifier_ops {
 	 * If invalidate_range() is used to manage a non-CPU TLB with
 	 * shared page-tables, it not necessary to implement the
 	 * invalidate_range_start()/end() notifiers, as
-	 * invalidate_range() alread catches the points in time when an
+	 * invalidate_range() already catches the points in time when an
 	 * external TLB range needs to be flushed. For more in depth
 	 * discussion on this see Documentation/vm/mmu_notifier.rst
 	 *
@@ -369,7 +369,7 @@ mmu_interval_read_retry(struct mmu_inter
  * mmu_interval_read_retry() will return true.
  *
  * False is not reliable and only suggests a collision may not have
- * occured. It can be called many times and does not have to hold the user
+ * occurred. It can be called many times and does not have to hold the user
  * provided lock.
  *
  * This call can be used as part of loops and other expensive operations to
--- a/include/linux/percpu-defs.h~mm-fix-spelling-mistakes-in-header-files
+++ a/include/linux/percpu-defs.h
@@ -412,7 +412,7 @@ do {									\
  * instead.
  *
  * If there is no other protection through preempt disable and/or disabling
- * interupts then one of these RMW operations can show unexpected behavior
+ * interrupts then one of these RMW operations can show unexpected behavior
  * because the execution thread was rescheduled on another processor or an
  * interrupt occurred and the same percpu variable was modified from the
  * interrupt context.
--- a/include/linux/shrinker.h~mm-fix-spelling-mistakes-in-header-files
+++ a/include/linux/shrinker.h
@@ -4,7 +4,7 @@
 
 /*
  * This struct is used to pass information from page reclaim to the shrinkers.
- * We consolidate the values for easier extention later.
+ * We consolidate the values for easier extension later.
  *
  * The 'gfpmask' refers to the allocation we are currently trying to
  * fulfil.
--- a/include/linux/vmalloc.h~mm-fix-spelling-mistakes-in-header-files
+++ a/include/linux/vmalloc.h
@@ -29,7 +29,7 @@ struct notifier_block;		/* in notifier.h
 #define VM_NO_HUGE_VMAP		0x00000400	/* force PAGE_SIZE pte mapping */
 
 /*
- * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
+ * VM_KASAN is used slightly differently depending on CONFIG_KASAN_VMALLOC.
  *
  * If IS_ENABLED(CONFIG_KASAN_VMALLOC), VM_KASAN is set on a vm_struct after
  * shadow memory has been mapped. It's used to handle allocation errors so that
@@ -247,7 +247,7 @@ static inline void set_vm_flush_reset_pe
 extern long vread(char *buf, char *addr, unsigned long count);
 
 /*
- *	Internals.  Dont't use..
+ *	Internals.  Don't use..
  */
 extern struct list_head vmap_area_list;
 extern __init void vm_area_add_early(struct vm_struct *vm);
_


* [patch 16/54] mm: add setup_initial_init_mm() helper
  2021-07-08  0:59 incoming Andrew Morton
                   ` (14 preceding siblings ...)
  2021-07-08  1:08 ` [patch 15/54] mm: fix spelling mistakes in header files Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 17/54] arc: convert to setup_initial_init_mm() Andrew Morton
                   ` (37 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, benh, borntraeger, catalin.marinas, christophe.leroy,
	dalias, geert, gerg, gor, green.hu, guoren, hca, jonas,
	jrdr.linux, ley.foon.tan, linux-mm, mingo, mm-commits, mpe,
	nickhu, palmer, paul.walmsley, rmk+kernel, shorne,
	stefan.kristiansson, tglx, torvalds, vgupta, wangkefeng.wang,
	will, ysato

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: mm: add setup_initial_init_mm() helper

Patch series "init_mm: cleanup ARCH's text/data/brk setup code", v3.

Add a setup_initial_init_mm() helper, then use it to clean up the
per-architecture text, data and brk setup code.


This patch (of 15):

Add a setup_initial_init_mm() helper to set up the kernel text, data and brk.
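As a rough illustration of the pattern, here is a userspace model of the helper. It is a sketch only: struct mm_model, init_mm_model and setup_initial_mm_model() are hypothetical stand-ins for struct mm_struct, init_mm and the real helper, and callers would pass linker symbols such as _text and _end rather than ordinary pointers.

```c
/* Hypothetical, trimmed-down stand-in for the mm_struct fields involved. */
struct mm_model {
	unsigned long start_code;
	unsigned long end_code;
	unsigned long end_data;
	unsigned long brk;
};

static struct mm_model init_mm_model;

/*
 * Same shape as the helper in this patch: take the section boundaries as
 * pointers and store them as plain addresses, so callers no longer need
 * four open-coded casts.
 */
static void setup_initial_mm_model(void *start_code, void *end_code,
				   void *end_data, void *brk)
{
	init_mm_model.start_code = (unsigned long)start_code;
	init_mm_model.end_code = (unsigned long)end_code;
	init_mm_model.end_data = (unsigned long)end_data;
	init_mm_model.brk = (unsigned long)brk;
}
```

With the helper in place, each architecture's four assignments collapse into a single call, which is the conversion the remaining patches in the series perform.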

Link: https://lkml.kernel.org/r/20210608083418.137226-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20210608083418.137226-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h |    3 +++
 mm/init-mm.c       |    9 +++++++++
 2 files changed, 12 insertions(+)

--- a/include/linux/mm.h~mm-add-setup_initial_init_mm-helper
+++ a/include/linux/mm.h
@@ -238,6 +238,9 @@ int __add_to_page_cache_locked(struct pa
 
 #define lru_to_page(head) (list_entry((head)->prev, struct page, lru))
 
+void setup_initial_init_mm(void *start_code, void *end_code,
+			   void *end_data, void *brk);
+
 /*
  * Linux kernel virtual memory manager primitives.
  * The idea being to have a "virtual" mm in the same way
--- a/mm/init-mm.c~mm-add-setup_initial_init_mm-helper
+++ a/mm/init-mm.c
@@ -40,3 +40,12 @@ struct mm_struct init_mm = {
 	.cpu_bitmap	= CPU_BITS_NONE,
 	INIT_MM_CONTEXT(init_mm)
 };
+
+void setup_initial_init_mm(void *start_code, void *end_code,
+			   void *end_data, void *brk)
+{
+	init_mm.start_code = (unsigned long)start_code;
+	init_mm.end_code = (unsigned long)end_code;
+	init_mm.end_data = (unsigned long)end_data;
+	init_mm.brk = (unsigned long)brk;
+}
_


* [patch 17/54] arc: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (15 preceding siblings ...)
  2021-07-08  1:08 ` [patch 16/54] mm: add setup_initial_init_mm() helper Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 18/54] arm: " Andrew Morton
                   ` (36 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, torvalds, vgupta, wangkefeng.wang

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: arc: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>	[arch/arc]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arc/mm/init.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/arc/mm/init.c~arc-convert-to-setup_initial_init_mm
+++ a/arch/arc/mm/init.c
@@ -89,10 +89,7 @@ void __init setup_arch_memory(void)
 {
 	unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0 };
 
-	init_mm.start_code = (unsigned long)_text;
-	init_mm.end_code = (unsigned long)_etext;
-	init_mm.end_data = (unsigned long)_edata;
-	init_mm.brk = (unsigned long)_end;
+	setup_initial_init_mm(_text, _etext, _edata, _end);
 
 	/* first page of system - kernel .vector starts here */
 	min_low_pfn = virt_to_pfn(CONFIG_LINUX_RAM_BASE);
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 18/54] arm: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (16 preceding siblings ...)
  2021-07-08  1:08 ` [patch 17/54] arc: convert to setup_initial_init_mm() Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 19/54] arm64: " Andrew Morton
                   ` (35 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, rmk+kernel, torvalds, wangkefeng.wang

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: arm: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm/kernel/setup.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/arm/kernel/setup.c~arm-convert-to-setup_initial_init_mm
+++ a/arch/arm/kernel/setup.c
@@ -1130,10 +1130,7 @@ void __init setup_arch(char **cmdline_p)
 	if (mdesc->reboot_mode != REBOOT_HARD)
 		reboot_mode = mdesc->reboot_mode;
 
-	init_mm.start_code = (unsigned long) _text;
-	init_mm.end_code   = (unsigned long) _etext;
-	init_mm.end_data   = (unsigned long) _edata;
-	init_mm.brk	   = (unsigned long) _end;
+	setup_initial_init_mm(_text, _etext, _edata, _end);
 
 	/* populate cmd_line too for later use, preserving boot_command_line */
 	strlcpy(cmd_line, boot_command_line, COMMAND_LINE_SIZE);
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 19/54] arm64: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (17 preceding siblings ...)
  2021-07-08  1:08 ` [patch 18/54] arm: " Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 20/54] csky: " Andrew Morton
                   ` (34 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, catalin.marinas, linux-mm, mm-commits, torvalds,
	wangkefeng.wang, will

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: arm64: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm64/kernel/setup.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/arm64/kernel/setup.c~arm64-convert-to-setup_initial_init_mm
+++ a/arch/arm64/kernel/setup.c
@@ -293,10 +293,7 @@ u64 cpu_logical_map(unsigned int cpu)
 
 void __init __no_sanitize_address setup_arch(char **cmdline_p)
 {
-	init_mm.start_code = (unsigned long) _stext;
-	init_mm.end_code   = (unsigned long) _etext;
-	init_mm.end_data   = (unsigned long) _edata;
-	init_mm.brk	   = (unsigned long) _end;
+	setup_initial_init_mm(_stext, _etext, _edata, _end);
 
 	*cmdline_p = boot_command_line;
 
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 20/54] csky: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (18 preceding siblings ...)
  2021-07-08  1:08 ` [patch 19/54] arm64: " Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 21/54] h8300: " Andrew Morton
                   ` (33 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, guoren, linux-mm, mm-commits, torvalds, wangkefeng.wang

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: csky: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-6-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/csky/kernel/setup.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/csky/kernel/setup.c~csky-convert-to-setup_initial_init_mm
+++ a/arch/csky/kernel/setup.c
@@ -78,10 +78,7 @@ void __init setup_arch(char **cmdline_p)
 	pr_info("Phys. mem: %ldMB\n",
 		(unsigned long) memblock_phys_mem_size()/1024/1024);
 
-	init_mm.start_code = (unsigned long) _stext;
-	init_mm.end_code = (unsigned long) _etext;
-	init_mm.end_data = (unsigned long) _edata;
-	init_mm.brk = (unsigned long) _end;
+	setup_initial_init_mm(_stext, _etext, _edata, _end);
 
 	parse_early_param();
 
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 21/54] h8300: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (19 preceding siblings ...)
  2021-07-08  1:08 ` [patch 20/54] csky: " Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 22/54] m68k: " Andrew Morton
                   ` (32 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, torvalds, wangkefeng.wang, ysato

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: h8300: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-7-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/h8300/kernel/setup.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/h8300/kernel/setup.c~h8300-convert-to-setup_initial_init_mm
+++ a/arch/h8300/kernel/setup.c
@@ -95,10 +95,7 @@ void __init setup_arch(char **cmdline_p)
 {
 	unflatten_and_copy_device_tree();
 
-	init_mm.start_code = (unsigned long) _stext;
-	init_mm.end_code = (unsigned long) _etext;
-	init_mm.end_data = (unsigned long) _edata;
-	init_mm.brk = (unsigned long) 0;
+	setup_initial_init_mm(_stext, _etext, _edata, NULL);
 
 	pr_notice("\r\n\nuClinux " CPU "\n");
 	pr_notice("Flat model support (C) 1998,1999 Kenneth Albanowski, D. Jeff Dionne\n");
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 22/54] m68k: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (20 preceding siblings ...)
  2021-07-08  1:08 ` [patch 21/54] h8300: " Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 23/54] nds32: " Andrew Morton
                   ` (31 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, geert, gerg, linux-mm, mm-commits, torvalds, wangkefeng.wang

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: m68k: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-8-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/m68k/kernel/setup_mm.c |    5 +----
 arch/m68k/kernel/setup_no.c |    5 +----
 2 files changed, 2 insertions(+), 8 deletions(-)

--- a/arch/m68k/kernel/setup_mm.c~m68k-convert-to-setup_initial_init_mm
+++ a/arch/m68k/kernel/setup_mm.c
@@ -258,10 +258,7 @@ void __init setup_arch(char **cmdline_p)
 		}
 	}
 
-	init_mm.start_code = PAGE_OFFSET;
-	init_mm.end_code = (unsigned long)_etext;
-	init_mm.end_data = (unsigned long)_edata;
-	init_mm.brk = (unsigned long)_end;
+	setup_initial_init_mm((void *)PAGE_OFFSET, _etext, _edata, _end);
 
 #if defined(CONFIG_BOOTPARAM)
 	strncpy(m68k_command_line, CONFIG_BOOTPARAM_STRING, CL_SIZE);
--- a/arch/m68k/kernel/setup_no.c~m68k-convert-to-setup_initial_init_mm
+++ a/arch/m68k/kernel/setup_no.c
@@ -87,10 +87,7 @@ void __init setup_arch(char **cmdline_p)
 	memory_start = PAGE_ALIGN(_ramstart);
 	memory_end = _ramend;
 
-	init_mm.start_code = (unsigned long) &_stext;
-	init_mm.end_code = (unsigned long) &_etext;
-	init_mm.end_data = (unsigned long) &_edata;
-	init_mm.brk = (unsigned long) 0;
+	setup_initial_init_mm(_stext, _etext, _edata, NULL);
 
 	config_BSP(&command_line[0], sizeof(command_line));
 
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 23/54] nds32: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (21 preceding siblings ...)
  2021-07-08  1:08 ` [patch 22/54] m68k: " Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 24/54] nios2: " Andrew Morton
                   ` (30 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, green.hu, linux-mm, mm-commits, nickhu, torvalds, wangkefeng.wang

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: nds32: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-9-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/nds32/kernel/setup.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/nds32/kernel/setup.c~nds32-convert-to-setup_initial_init_mm
+++ a/arch/nds32/kernel/setup.c
@@ -294,10 +294,7 @@ void __init setup_arch(char **cmdline_p)
 
 	setup_cpuinfo();
 
-	init_mm.start_code = (unsigned long)&_stext;
-	init_mm.end_code = (unsigned long)&_etext;
-	init_mm.end_data = (unsigned long)&_edata;
-	init_mm.brk = (unsigned long)&_end;
+	setup_initial_init_mm(_stext, _etext, _edata, _end);
 
 	/* setup bootmem allocator */
 	setup_memory();
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 24/54] nios2: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (22 preceding siblings ...)
  2021-07-08  1:08 ` [patch 23/54] nds32: " Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 25/54] openrisc: " Andrew Morton
                   ` (29 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, ley.foon.tan, linux-mm, mm-commits, torvalds, wangkefeng.wang

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: nios2: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-10-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/nios2/kernel/setup.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/nios2/kernel/setup.c~nios2-convert-to-setup_initial_init_mm
+++ a/arch/nios2/kernel/setup.c
@@ -156,10 +156,7 @@ void __init setup_arch(char **cmdline_p)
 	memory_start = memblock_start_of_DRAM();
 	memory_end = memblock_end_of_DRAM();
 
-	init_mm.start_code = (unsigned long) _stext;
-	init_mm.end_code = (unsigned long) _etext;
-	init_mm.end_data = (unsigned long) _edata;
-	init_mm.brk = (unsigned long) _end;
+	setup_initial_init_mm(_stext, _etext, _edata, _end);
 	init_task.thread.kregs = &fake_regs;
 
 	/* Keep a copy of command line */
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 25/54] openrisc: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (23 preceding siblings ...)
  2021-07-08  1:08 ` [patch 24/54] nios2: " Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 26/54] powerpc: " Andrew Morton
                   ` (28 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, jonas, linux-mm, mm-commits, shorne, stefan.kristiansson,
	torvalds, wangkefeng.wang

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: openrisc: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-11-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Stafford Horne <shorne@gmail.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/openrisc/kernel/setup.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/openrisc/kernel/setup.c~openrisc-convert-to-setup_initial_init_mm
+++ a/arch/openrisc/kernel/setup.c
@@ -293,10 +293,7 @@ void __init setup_arch(char **cmdline_p)
 #endif
 
 	/* process 1's initial memory region is the kernel code/data */
-	init_mm.start_code = (unsigned long)_stext;
-	init_mm.end_code = (unsigned long)_etext;
-	init_mm.end_data = (unsigned long)_edata;
-	init_mm.brk = (unsigned long)_end;
+	setup_initial_init_mm(_stext, _etext, _edata, _end);
 
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (initrd_start == initrd_end) {
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 26/54] powerpc: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (24 preceding siblings ...)
  2021-07-08  1:08 ` [patch 25/54] openrisc: " Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  4:46   ` Christophe Leroy
  2021-07-08  1:08 ` [patch 27/54] riscv: " Andrew Morton
                   ` (27 subsequent siblings)
  53 siblings, 1 reply; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, benh, christophe.leroy, jrdr.linux, linux-mm, mm-commits,
	mpe, torvalds, wangkefeng.wang

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: powerpc: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Note that klimit is (unsigned long)_end; with the new helper, _end is
passed directly.

Link: https://lkml.kernel.org/r/20210608083418.137226-12-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/powerpc/kernel/setup-common.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/powerpc/kernel/setup-common.c~powerpc-convert-to-setup_initial_init_mm
+++ a/arch/powerpc/kernel/setup-common.c
@@ -926,10 +926,7 @@ void __init setup_arch(char **cmdline_p)
 
 	klp_init_thread_info(&init_task);
 
-	init_mm.start_code = (unsigned long)_stext;
-	init_mm.end_code = (unsigned long) _etext;
-	init_mm.end_data = (unsigned long) _edata;
-	init_mm.brk = (unsigned long)_end;
+	setup_initial_init_mm(_stext, _etext, _edata, _end);
 
 	mm_iommu_init(&init_mm);
 	irqstack_early_init();
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 27/54] riscv: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (25 preceding siblings ...)
  2021-07-08  1:08 ` [patch 26/54] powerpc: " Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:08 ` [patch 28/54] s390: " Andrew Morton
                   ` (26 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, palmerdabbelt, paul.walmsley,
	torvalds, wangkefeng.wang

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: riscv: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-13-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/riscv/kernel/setup.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/riscv/kernel/setup.c~riscv-convert-to-setup_initial_init_mm
+++ a/arch/riscv/kernel/setup.c
@@ -264,10 +264,7 @@ static void __init parse_dtb(void)
 void __init setup_arch(char **cmdline_p)
 {
 	parse_dtb();
-	init_mm.start_code = (unsigned long) _stext;
-	init_mm.end_code   = (unsigned long) _etext;
-	init_mm.end_data   = (unsigned long) _edata;
-	init_mm.brk        = (unsigned long) _end;
+	setup_initial_init_mm(_stext, _etext, _edata, _end);
 
 	*cmdline_p = boot_command_line;
 
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 28/54] s390: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (26 preceding siblings ...)
  2021-07-08  1:08 ` [patch 27/54] riscv: " Andrew Morton
@ 2021-07-08  1:08 ` Andrew Morton
  2021-07-08  1:09 ` [patch 29/54] sh: " Andrew Morton
                   ` (25 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:08 UTC (permalink / raw)
  To: akpm, borntraeger, gor, hca, linux-mm, mm-commits, torvalds,
	wangkefeng.wang

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: s390: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-14-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/s390/kernel/setup.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/s390/kernel/setup.c~s390-convert-to-setup_initial_init_mm
+++ a/arch/s390/kernel/setup.c
@@ -1028,10 +1028,7 @@ void __init setup_arch(char **cmdline_p)
 
         ROOT_DEV = Root_RAM0;
 
-	init_mm.start_code = (unsigned long) _text;
-	init_mm.end_code = (unsigned long) _etext;
-	init_mm.end_data = (unsigned long) _edata;
-	init_mm.brk = (unsigned long) _end;
+	setup_initial_init_mm(_text, _etext, _edata, _end);
 
 	if (IS_ENABLED(CONFIG_EXPOLINE_AUTO))
 		nospec_auto_detect();
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 29/54] sh: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (27 preceding siblings ...)
  2021-07-08  1:08 ` [patch 28/54] s390: " Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 30/54] x86: " Andrew Morton
                   ` (24 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, dalias, linux-mm, mm-commits, torvalds, wangkefeng.wang, ysato

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: sh: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-15-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/sh/kernel/setup.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/sh/kernel/setup.c~sh-convert-to-setup_initial_init_mm
+++ a/arch/sh/kernel/setup.c
@@ -294,10 +294,7 @@ void __init setup_arch(char **cmdline_p)
 
 	if (!MOUNT_ROOT_RDONLY)
 		root_mountflags &= ~MS_RDONLY;
-	init_mm.start_code = (unsigned long) _text;
-	init_mm.end_code = (unsigned long) _etext;
-	init_mm.end_data = (unsigned long) _edata;
-	init_mm.brk = (unsigned long) _end;
+	setup_initial_init_mm(_text, _etext, _edata, _end);
 
 	code_resource.start = virt_to_phys(_text);
 	code_resource.end = virt_to_phys(_etext)-1;
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 30/54] x86: convert to setup_initial_init_mm()
  2021-07-08  0:59 incoming Andrew Morton
                   ` (28 preceding siblings ...)
  2021-07-08  1:09 ` [patch 29/54] sh: " Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 31/54] buildid: only consider GNU notes for build ID parsing Andrew Morton
                   ` (23 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, linux-mm, mingo, mm-commits, tglx, torvalds, wangkefeng.wang

From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: x86: convert to setup_initial_init_mm()

Use setup_initial_init_mm() helper to simplify code.

Link: https://lkml.kernel.org/r/20210608083418.137226-16-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/kernel/setup.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/x86/kernel/setup.c~x86-convert-to-setup_initial_init_mm
+++ a/arch/x86/kernel/setup.c
@@ -847,10 +847,7 @@ void __init setup_arch(char **cmdline_p)
 
 	if (!boot_params.hdr.root_flags)
 		root_mountflags &= ~MS_RDONLY;
-	init_mm.start_code = (unsigned long) _text;
-	init_mm.end_code = (unsigned long) _etext;
-	init_mm.end_data = (unsigned long) _edata;
-	init_mm.brk = _brk_end;
+	setup_initial_init_mm(_text, _etext, _edata, (void *)_brk_end);
 
 	code_resource.start = __pa_symbol(_text);
 	code_resource.end = __pa_symbol(_etext)-1;
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 31/54] buildid: only consider GNU notes for build ID parsing
  2021-07-08  0:59 incoming Andrew Morton
                   ` (29 preceding siblings ...)
  2021-07-08  1:09 ` [patch 30/54] x86: " Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 32/54] buildid: add API to parse build ID out of buffer Andrew Morton
                   ` (22 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, ast, bhe, bp, catalin.marinas, dyoung,
	evgreen, hsinyi, jeyu, jolsa, khlebnikov, linux-mm, linux, mingo,
	mm-commits, pmladek, rostedt, sashal, sergey.senozhatsky, swboyd,
	tglx, torvalds, vgoyal, will, willy

From: Stephen Boyd <swboyd@chromium.org>
Subject: buildid: only consider GNU notes for build ID parsing

Patch series "Add build ID to stacktraces", v6.

This series adds the kernel's build ID[1] to the stacktrace header printed
in oops messages, warnings, etc., and adds the build ID of any module that
appears in the stacktrace after the module name.  The goal is to make the
stacktrace more self-contained and descriptive by including the relevant
build IDs in the kernel logs when something goes wrong.  Post-processing
tools like scripts/decode_stacktrace.sh, and kernel developers, can then
easily locate the debug info associated with a kernel crash and work out
the file and line where things started falling apart.

To show how this can be used I've included a patch to decode_stacktrace.sh
that downloads the debuginfo from a debuginfod server.  This also includes
some patches to make the buildid.c file use more const arguments and
consolidate logic into buildid.c from kdump.  These are left to the end as
they were mostly cleanup patches.

Here's an example lkdtm stacktrace on arm64.

 WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm]
 Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE
 CPU: 4 PID: 3255 Comm: bash Not tainted 5.11 #3 aa23f7a1231c229de205662d5a9e0d4c580f19a1
 Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
 pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--)
 pc : lkdtm_WARNING+0x28/0x30 [lkdtm]
 lr : lkdtm_do_action+0x24/0x40 [lkdtm]
 sp : ffffffc0134fbca0
 x29: ffffffc0134fbca0 x28: ffffff92d53ba240
 x27: 0000000000000000 x26: 0000000000000000
 x25: 0000000000000000 x24: ffffffe3622352c0
 x23: 0000000000000020 x22: ffffffe362233366
 x21: ffffffe3622352e0 x20: ffffffc0134fbde0
 x19: 0000000000000008 x18: 0000000000000000
 x17: ffffff929b6536fc x16: 0000000000000000
 x15: 0000000000000000 x14: 0000000000000012
 x13: ffffffe380ed892c x12: ffffffe381d05068
 x11: 0000000000000000 x10: 0000000000000000
 x9 : 0000000000000001 x8 : ffffffe362237000
 x7 : aaaaaaaaaaaaaaaa x6 : 0000000000000000
 x5 : 0000000000000000 x4 : 0000000000000001
 x3 : 0000000000000008 x2 : ffffff93fef25a70
 x1 : ffffff93fef15788 x0 : ffffffe3622352e0
 Call trace:
  lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
  direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
  full_proxy_write+0x74/0xa4
  vfs_write+0xec/0x2e8
  ksys_write+0x84/0xf0
  __arm64_sys_write+0x24/0x30
  el0_svc_common+0xf4/0x1c0
  do_el0_svc_compat+0x28/0x3c
  el0_svc_compat+0x10/0x1c
  el0_sync_compat_handler+0xa8/0xcc
  el0_sync_compat+0x178/0x180
 ---[ end trace 3d95032303e59e68 ]---


This patch (of 13):

Some kernel ELF files contain notes whose ELF note type is 3, which
matches NT_GNU_BUILD_ID, but whose note name isn't "GNU".  For example,
this note trips up the existing logic:

 Owner  Data size   Description
 Xen    0x00000008  Unknown note type: (0x00000003) description data: 00 00 00 ffffff80 ffffffff ffffffff ffffffff ffffffff

Let's make sure that it is a GNU note when parsing the build ID so that we
can use this function to parse a vmlinux's build ID too.

Link: https://lkml.kernel.org/r/20210511003845.2429846-1-swboyd@chromium.org
Link: https://lkml.kernel.org/r/20210511003845.2429846-2-swboyd@chromium.org
Fixes: bd7525dacd7e ("bpf: Move stack_map_get_build_id into lib")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reported-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/buildid.c |    1 +
 1 file changed, 1 insertion(+)

--- a/lib/buildid.c~buildid-only-consider-gnu-notes-for-build-id-parsing
+++ a/lib/buildid.c
@@ -31,6 +31,7 @@ static inline int parse_build_id(void *p
 
 		if (nhdr->n_type == BUILD_ID &&
 		    nhdr->n_namesz == sizeof("GNU") &&
+		    !strcmp((char *)(nhdr + 1), "GNU") &&
 		    nhdr->n_descsz > 0 &&
 		    nhdr->n_descsz <= BUILD_ID_SIZE_MAX) {
 			memcpy(build_id,
_

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [patch 32/54] buildid: add API to parse build ID out of buffer
  2021-07-08  0:59 incoming Andrew Morton
                   ` (30 preceding siblings ...)
  2021-07-08  1:09 ` [patch 31/54] buildid: only consider GNU notes for build ID parsing Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 33/54] buildid: stash away kernels build ID on init Andrew Morton
                   ` (21 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, ast, bhe, bp, catalin.marinas, dyoung,
	evgreen, hsinyi, jeyu, jolsa, khlebnikov, linux-mm, linux, mingo,
	mm-commits, pmladek, rostedt, sashal, sergey.senozhatsky, swboyd,
	tglx, torvalds, vgoyal, will, willy

From: Stephen Boyd <swboyd@chromium.org>
Subject: buildid: add API to parse build ID out of buffer

Add an API that can parse the build ID out of a buffer, instead of a vma,
to support printing a kernel module's build ID for stack traces.

Link: https://lkml.kernel.org/r/20210511003845.2429846-3-swboyd@chromium.org
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/buildid.h |    1 
 lib/buildid.c           |   50 ++++++++++++++++++++++++++++----------
 2 files changed, 38 insertions(+), 13 deletions(-)

--- a/include/linux/buildid.h~buildid-add-api-to-parse-build-id-out-of-buffer
+++ a/include/linux/buildid.h
@@ -8,5 +8,6 @@
 
 int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
 		   __u32 *size);
+int build_id_parse_buf(const void *buf, unsigned char *build_id, u32 buf_size);
 
 #endif
--- a/lib/buildid.c~buildid-add-api-to-parse-build-id-out-of-buffer
+++ a/lib/buildid.c
@@ -2,30 +2,23 @@
 
 #include <linux/buildid.h>
 #include <linux/elf.h>
+#include <linux/kernel.h>
 #include <linux/pagemap.h>
 
 #define BUILD_ID 3
+
 /*
  * Parse build id from the note segment. This logic can be shared between
  * 32-bit and 64-bit system, because Elf32_Nhdr and Elf64_Nhdr are
  * identical.
  */
-static inline int parse_build_id(void *page_addr,
-				 unsigned char *build_id,
-				 __u32 *size,
-				 void *note_start,
-				 Elf32_Word note_size)
+static int parse_build_id_buf(unsigned char *build_id,
+			      __u32 *size,
+			      const void *note_start,
+			      Elf32_Word note_size)
 {
 	Elf32_Word note_offs = 0, new_offs;
 
-	/* check for overflow */
-	if (note_start < page_addr || note_start + note_size < note_start)
-		return -EINVAL;
-
-	/* only supports note that fits in the first page */
-	if (note_start + note_size > page_addr + PAGE_SIZE)
-		return -EINVAL;
-
 	while (note_offs + sizeof(Elf32_Nhdr) < note_size) {
 		Elf32_Nhdr *nhdr = (Elf32_Nhdr *)(note_start + note_offs);
 
@@ -50,9 +43,27 @@ static inline int parse_build_id(void *p
 			break;
 		note_offs = new_offs;
 	}
+
 	return -EINVAL;
 }
 
+static inline int parse_build_id(void *page_addr,
+				 unsigned char *build_id,
+				 __u32 *size,
+				 void *note_start,
+				 Elf32_Word note_size)
+{
+	/* check for overflow */
+	if (note_start < page_addr || note_start + note_size < note_start)
+		return -EINVAL;
+
+	/* only supports note that fits in the first page */
+	if (note_start + note_size > page_addr + PAGE_SIZE)
+		return -EINVAL;
+
+	return parse_build_id_buf(build_id, size, note_start, note_size);
+}
+
 /* Parse build ID from 32-bit ELF */
 static int get_build_id_32(void *page_addr, unsigned char *build_id,
 			   __u32 *size)
@@ -148,3 +159,16 @@ out:
 	put_page(page);
 	return ret;
 }
+
+/**
+ * build_id_parse_buf - Get build ID from a buffer
+ * @buf:      Elf note section(s) to parse
+ * @buf_size: Size of @buf in bytes
+ * @build_id: Build ID parsed from @buf, at least BUILD_ID_SIZE_MAX long
+ *
+ * Return: 0 on success, -EINVAL otherwise
+ */
+int build_id_parse_buf(const void *buf, unsigned char *build_id, u32 buf_size)
+{
+	return parse_build_id_buf(build_id, NULL, buf, buf_size);
+}
_


* [patch 33/54] buildid: stash away kernels build ID on init
  2021-07-08  0:59 incoming Andrew Morton
                   ` (31 preceding siblings ...)
  2021-07-08  1:09 ` [patch 32/54] buildid: add API to parse build ID out of buffer Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 34/54] dump_stack: add vmlinux build ID to stack traces Andrew Morton
                   ` (20 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, ast, bhe, bp, catalin.marinas, dyoung,
	evgreen, hsinyi, jeyu, jolsa, khlebnikov, linux-mm, linux, mingo,
	mm-commits, pmladek, rostedt, sashal, sergey.senozhatsky, swboyd,
	tglx, torvalds, vgoyal, will, willy

From: Stephen Boyd <swboyd@chromium.org>
Subject: buildid: stash away kernels build ID on init

Parse the kernel's build ID at initialization so that other code can print
a hex format string representation of the running kernel's build ID.  This
will be used in the kdump and dump_stack code so that developers can
easily locate the vmlinux debug symbols for a crash/stacktrace.

[swboyd@chromium.org: fix implicit declaration of init_vmlinux_build_id()]
  Link: https://lkml.kernel.org/r/CAE-0n51UjTbay8N9FXAyE7_aR2+ePrQnKSRJ0gbmRsXtcLBVaw@mail.gmail.com
Link: https://lkml.kernel.org/r/20210511003845.2429846-4-swboyd@chromium.org
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/buildid.h |    3 +++
 init/main.c             |    2 ++
 lib/buildid.c           |   15 +++++++++++++++
 3 files changed, 20 insertions(+)

--- a/include/linux/buildid.h~buildid-stash-away-kernels-build-id-on-init
+++ a/include/linux/buildid.h
@@ -10,4 +10,7 @@ int build_id_parse(struct vm_area_struct
 		   __u32 *size);
 int build_id_parse_buf(const void *buf, unsigned char *build_id, u32 buf_size);
 
+extern unsigned char vmlinux_build_id[BUILD_ID_SIZE_MAX];
+void init_vmlinux_build_id(void);
+
 #endif
--- a/init/main.c~buildid-stash-away-kernels-build-id-on-init
+++ a/init/main.c
@@ -45,6 +45,7 @@
 #include <linux/srcu.h>
 #include <linux/moduleparam.h>
 #include <linux/kallsyms.h>
+#include <linux/buildid.h>
 #include <linux/writeback.h>
 #include <linux/cpu.h>
 #include <linux/cpuset.h>
@@ -913,6 +914,7 @@ asmlinkage __visible void __init __no_sa
 	set_task_stack_end_magic(&init_task);
 	smp_setup_processor_id();
 	debug_objects_early_init();
+	init_vmlinux_build_id();
 
 	cgroup_init_early();
 
--- a/lib/buildid.c~buildid-stash-away-kernels-build-id-on-init
+++ a/lib/buildid.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
 #include <linux/buildid.h>
+#include <linux/cache.h>
 #include <linux/elf.h>
 #include <linux/kernel.h>
 #include <linux/pagemap.h>
@@ -172,3 +173,17 @@ int build_id_parse_buf(const void *buf,
 {
 	return parse_build_id_buf(build_id, NULL, buf, buf_size);
 }
+
+unsigned char vmlinux_build_id[BUILD_ID_SIZE_MAX] __ro_after_init;
+
+/**
+ * init_vmlinux_build_id - Compute and stash the running kernel's build ID
+ */
+void __init init_vmlinux_build_id(void)
+{
+	extern const void __start_notes __weak;
+	extern const void __stop_notes __weak;
+	unsigned int size = &__stop_notes - &__start_notes;
+
+	build_id_parse_buf(&__start_notes, vmlinux_build_id, size);
+}
_


* [patch 34/54] dump_stack: add vmlinux build ID to stack traces
  2021-07-08  0:59 incoming Andrew Morton
                   ` (32 preceding siblings ...)
  2021-07-08  1:09 ` [patch 33/54] buildid: stash away kernels build ID on init Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 35/54] module: add printk formats to add module build ID to stacktraces Andrew Morton
                   ` (19 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, ast, bhe, bp, catalin.marinas, dyoung,
	evgreen, hsinyi, jeyu, jolsa, khlebnikov, linux-mm, linux, mingo,
	mm-commits, pmladek, rostedt, sashal, sergey.senozhatsky, swboyd,
	tglx, torvalds, vgoyal, will, willy

From: Stephen Boyd <swboyd@chromium.org>
Subject: dump_stack: add vmlinux build ID to stack traces

Add the running kernel's build ID[1] to the stacktrace information header.
This makes it simpler for developers to locate the vmlinux with full
debuginfo for a particular kernel stacktrace.  Combined with
scripts/decode_stacktrace.sh, a developer can download the correct
vmlinux from a debuginfod[2] server and find the exact file and line
number for the functions plus offsets in a stacktrace.

This is especially useful for pstore crash debugging where the kernel
crashes are recorded in the pstore logs and the recovery kernel is
different or the debuginfo doesn't exist on the device due to space
concerns (the data can be large and a security concern).  The stacktrace
can be analyzed after the crash by using the build ID to find the matching
vmlinux and understand where in the function something went wrong.

Example stacktrace from lkdtm:

 WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm]
 Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE
 CPU: 4 PID: 3255 Comm: bash Not tainted 5.11 #3 aa23f7a1231c229de205662d5a9e0d4c580f19a1
 Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
 pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--)
 pc : lkdtm_WARNING+0x28/0x30 [lkdtm]

The hex string aa23f7a1231c229de205662d5a9e0d4c580f19a1 is the build ID,
following the kernel version number. Put it all behind a config option,
STACKTRACE_BUILD_ID, so that kernel developers can remove this
information if they decide it is too much.
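
As a rough illustration of how the raw build ID bytes map to the hex string
shown above, here is a small userspace sketch. The helper name is made up
for this example; in the kernel the same output comes from the "%20phN"
printk format rather than from a loop like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write @len bytes of @id into @out as lowercase hex
 * with no separators, which is the shape "%20phN" gives a 20-byte build ID.
 * @out must hold at least 2 * @len + 1 bytes.
 */
void sketch_build_id_hex(const unsigned char *id, size_t len, char *out)
{
	size_t i;

	assert(id && out);
	for (i = 0; i < len; i++)
		snprintf(out + 2 * i, 3, "%02x", id[i]);
	out[2 * len] = '\0';
}
```

Passing the first four bytes of the ID above (0xaa, 0x23, 0xf7, 0xa1) gives
"aa23f7a1"; the full 20 bytes give the 40-character string in the trace.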

Link: https://lkml.kernel.org/r/20210511003845.2429846-5-swboyd@chromium.org
Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
Link: https://sourceware.org/elfutils/Debuginfod.html [2]
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/buildid.h |    4 ++++
 lib/Kconfig.debug       |   11 +++++++++++
 lib/buildid.c           |    2 ++
 lib/dump_stack.c        |   13 +++++++++++--
 4 files changed, 28 insertions(+), 2 deletions(-)

--- a/include/linux/buildid.h~dump_stack-add-vmlinux-build-id-to-stack-traces
+++ a/include/linux/buildid.h
@@ -10,7 +10,11 @@ int build_id_parse(struct vm_area_struct
 		   __u32 *size);
 int build_id_parse_buf(const void *buf, unsigned char *build_id, u32 buf_size);
 
+#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
 extern unsigned char vmlinux_build_id[BUILD_ID_SIZE_MAX];
 void init_vmlinux_build_id(void);
+#else
+static inline void init_vmlinux_build_id(void) { }
+#endif
 
 #endif
--- a/lib/buildid.c~dump_stack-add-vmlinux-build-id-to-stack-traces
+++ a/lib/buildid.c
@@ -174,6 +174,7 @@ int build_id_parse_buf(const void *buf,
 	return parse_build_id_buf(build_id, NULL, buf, buf_size);
 }
 
+#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
 unsigned char vmlinux_build_id[BUILD_ID_SIZE_MAX] __ro_after_init;
 
 /**
@@ -187,3 +188,4 @@ void __init init_vmlinux_build_id(void)
 
 	build_id_parse_buf(&__start_notes, vmlinux_build_id, size);
 }
+#endif
--- a/lib/dump_stack.c~dump_stack-add-vmlinux-build-id-to-stack-traces
+++ a/lib/dump_stack.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/kernel.h>
+#include <linux/buildid.h>
 #include <linux/export.h>
 #include <linux/sched.h>
 #include <linux/sched/debug.h>
@@ -36,6 +37,14 @@ void __init dump_stack_set_arch_desc(con
 	va_end(args);
 }
 
+#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
+#define BUILD_ID_FMT " %20phN"
+#define BUILD_ID_VAL vmlinux_build_id
+#else
+#define BUILD_ID_FMT "%s"
+#define BUILD_ID_VAL ""
+#endif
+
 /**
  * dump_stack_print_info - print generic debug info for dump_stack()
  * @log_lvl: log level
@@ -45,13 +54,13 @@ void __init dump_stack_set_arch_desc(con
  */
 void dump_stack_print_info(const char *log_lvl)
 {
-	printk("%sCPU: %d PID: %d Comm: %.20s %s%s %s %.*s\n",
+	printk("%sCPU: %d PID: %d Comm: %.20s %s%s %s %.*s" BUILD_ID_FMT "\n",
 	       log_lvl, raw_smp_processor_id(), current->pid, current->comm,
 	       kexec_crash_loaded() ? "Kdump: loaded " : "",
 	       print_tainted(),
 	       init_utsname()->release,
 	       (int)strcspn(init_utsname()->version, " "),
-	       init_utsname()->version);
+	       init_utsname()->version, BUILD_ID_VAL);
 
 	if (dump_stack_arch_desc_str[0] != '\0')
 		printk("%sHardware name: %s\n",
--- a/lib/Kconfig.debug~dump_stack-add-vmlinux-build-id-to-stack-traces
+++ a/lib/Kconfig.debug
@@ -35,6 +35,17 @@ config PRINTK_CALLER
 	  no option to enable/disable at the kernel command line parameter or
 	  sysfs interface.
 
+config STACKTRACE_BUILD_ID
+	bool "Show build ID information in stacktraces"
+	depends on PRINTK
+	help
+	  Selecting this option adds build ID information for symbols in
+	  stacktraces printed with the printk format '%p[SR]b'.
+
+	  This option is intended for distros where debuginfo is not easily
+	  accessible but can be downloaded given the build ID of the vmlinux or
+	  kernel module where the function is located.
+
 config CONSOLE_LOGLEVEL_DEFAULT
 	int "Default console loglevel (1-15)"
 	range 1 15
_


* [patch 35/54] module: add printk formats to add module build ID to stacktraces
  2021-07-08  0:59 incoming Andrew Morton
                   ` (33 preceding siblings ...)
  2021-07-08  1:09 ` [patch 34/54] dump_stack: add vmlinux build ID to stack traces Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 36/54] arm64: stacktrace: use %pSb for backtrace printing Andrew Morton
                   ` (18 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, ast, bhe, bp, catalin.marinas,
	cuibixuan, dyoung, evgreen, hsinyi, jeyu, jolsa, khlebnikov,
	linux-mm, linux, mingo, mm-commits, pmladek, rdunlap, rostedt,
	sashal, sergey.senozhatsky, swboyd, tglx, torvalds, vgoyal, will,
	willy

From: Stephen Boyd <swboyd@chromium.org>
Subject: module: add printk formats to add module build ID to stacktraces

Let's make kernel stacktraces easier to identify by including the build
ID[1] of a module if the stacktrace is printing a symbol from a module. 
This makes it simpler for developers to locate a kernel module's full
debuginfo for a particular stacktrace.  Combined with
scripts/decode_stacktrace.sh, a developer can download the matching
debuginfo from a debuginfod[2] server and find the exact file and line
number for the functions plus offsets in a stacktrace that match the
module.  This is especially useful for pstore crash debugging where the
kernel crashes are recorded in something like console-ramoops and the
recovery kernel/modules are different or the debuginfo doesn't exist on
the device due to space concerns (the debuginfo can be too large for space
limited devices).

Originally, I put this on the %pS format, but that was quickly rejected
given that %pS is used in other places such as ftrace where build IDs
aren't meaningful.  There were some discussions on the list to put every
module build ID into the "Modules linked in:" section of the stacktrace
message but that quickly becomes very hard to read once you have more than
three or four modules linked in.  It also provides too much information
when we don't expect each module to be traversed in a stacktrace.  Having
the build ID for modules that aren't important just makes things messy. 
Splitting it to multiple lines for each module quickly explodes the number
of lines printed in an oops too, possibly wrapping the warning off the
console.  And finally, trying to stash away each module used in a
callstack to provide the ID of each symbol printed is cumbersome and would
require changes to each architecture to stash away modules and return
their build IDs once unwinding has completed.

Instead, we opt for the simpler approach of introducing new printk formats
'%pS[R]b' for "pointer symbolic backtrace with module build ID" and '%pBb'
for "pointer backtrace with module build ID" and then updating the few
places in the architecture layer where the stacktrace is printed to use
this new format.

Before:

 Call trace:
  lkdtm_WARNING+0x28/0x30 [lkdtm]
  direct_entry+0x16c/0x1b4 [lkdtm]
  full_proxy_write+0x74/0xa4
  vfs_write+0xec/0x2e8

After:

 Call trace:
  lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
  direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
  full_proxy_write+0x74/0xa4
  vfs_write+0xec/0x2e8
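
To make the "After" layout concrete, here is a userspace imitation of how
such a line could be assembled. It is only a sketch: the function name, its
parameters and the fixed 20-byte ID length are assumptions for illustration,
and the real work is done by __sprint_symbol() in the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_BUILD_ID_LEN 20	/* assumed build ID length (SHA-1 sized) */

/*
 * Hypothetical imitation of the string '%pSb' yields:
 *   "name+off/size [module buildid]"
 * @modname is NULL for built-in symbols; @build_id is NULL when no ID is
 * available, in which case only "[module]" is printed.
 * Returns the length the full string needs, with snprintf() semantics.
 */
int sketch_sprint_symbol_build_id(char *buf, size_t buf_len, const char *name,
				  unsigned long offset, unsigned long size,
				  const char *modname,
				  const unsigned char *build_id)
{
	char hex[2 * SKETCH_BUILD_ID_LEN + 1] = "";
	int i;

	assert(buf && name);

	/* Built-in symbol: no module suffix at all. */
	if (!modname)
		return snprintf(buf, buf_len, "%s+%#lx/%#lx",
				name, offset, size);

	/* Hex-encode the ID without separators. */
	if (build_id)
		for (i = 0; i < SKETCH_BUILD_ID_LEN; i++)
			snprintf(hex + 2 * i, 3, "%02x", build_id[i]);

	return snprintf(buf, buf_len, "%s+%#lx/%#lx [%s%s%s]",
			name, offset, size, modname,
			build_id ? " " : "", hex);
}
```

Called with "lkdtm_WARNING", 0x28, 0x30, "lkdtm" and the 20 ID bytes, this
reproduces the first "After" line; called with a NULL module name it
reproduces the unchanged vfs_write line.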

[akpm@linux-foundation.org: fix build with CONFIG_MODULES=n, tweak code layout]
[rdunlap@infradead.org: fix build when CONFIG_MODULES is not set]
  Link: https://lkml.kernel.org/r/20210513171510.20328-1-rdunlap@infradead.org
[akpm@linux-foundation.org: make kallsyms_lookup_buildid() static]
[cuibixuan@huawei.com: fix build error when CONFIG_SYSFS is disabled]
  Link: https://lkml.kernel.org/r/20210525105049.34804-1-cuibixuan@huawei.com
Link: https://lkml.kernel.org/r/20210511003845.2429846-6-swboyd@chromium.org
Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
Link: https://sourceware.org/elfutils/Debuginfod.html [2]
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/core-api/printk-formats.rst |   11 ++
 include/linux/kallsyms.h                  |   21 +++-
 include/linux/module.h                    |    9 +
 kernel/kallsyms.c                         |  104 ++++++++++++++++----
 kernel/module.c                           |   42 +++++++-
 lib/vsprintf.c                            |    8 +
 6 files changed, 166 insertions(+), 29 deletions(-)

--- a/Documentation/core-api/printk-formats.rst~module-add-printk-formats-to-add-module-build-id-to-stacktraces
+++ a/Documentation/core-api/printk-formats.rst
@@ -125,6 +125,17 @@ used when printing stack backtraces. The
 consideration the effect of compiler optimisations which may occur
 when tail-calls are used and marked with the noreturn GCC attribute.
 
+If the pointer is within a module, the module name and optionally build ID is
+printed after the symbol name with an extra ``b`` appended to the end of the
+specifier.
+
+::
+	%pS	versatile_init+0x0/0x110 [module_name]
+	%pSb	versatile_init+0x0/0x110 [module_name ed5019fdf5e53be37cb1ba7899292d7e143b259e]
+	%pSRb	versatile_init+0x9/0x110 [module_name ed5019fdf5e53be37cb1ba7899292d7e143b259e]
+		(with __builtin_extract_return_addr() translation)
+	%pBb	prev_fn_of_versatile_init+0x88/0x88 [module_name ed5019fdf5e53be37cb1ba7899292d7e143b259e]
+
 Probed Pointers from BPF / tracing
 ----------------------------------
 
--- a/include/linux/kallsyms.h~module-add-printk-formats-to-add-module-build-id-to-stacktraces
+++ a/include/linux/kallsyms.h
@@ -7,6 +7,7 @@
 #define _LINUX_KALLSYMS_H
 
 #include <linux/errno.h>
+#include <linux/buildid.h>
 #include <linux/kernel.h>
 #include <linux/stddef.h>
 #include <linux/mm.h>
@@ -15,8 +16,10 @@
 #include <asm/sections.h>
 
 #define KSYM_NAME_LEN 128
-#define KSYM_SYMBOL_LEN (sizeof("%s+%#lx/%#lx [%s]") + (KSYM_NAME_LEN - 1) + \
-			 2*(BITS_PER_LONG*3/10) + (MODULE_NAME_LEN - 1) + 1)
+#define KSYM_SYMBOL_LEN (sizeof("%s+%#lx/%#lx [%s %s]") + \
+			(KSYM_NAME_LEN - 1) + \
+			2*(BITS_PER_LONG*3/10) + (MODULE_NAME_LEN - 1) + \
+			(BUILD_ID_SIZE_MAX * 2) + 1)
 
 struct cred;
 struct module;
@@ -91,8 +94,10 @@ const char *kallsyms_lookup(unsigned lon
 
 /* Look up a kernel symbol and return it in a text buffer. */
 extern int sprint_symbol(char *buffer, unsigned long address);
+extern int sprint_symbol_build_id(char *buffer, unsigned long address);
 extern int sprint_symbol_no_offset(char *buffer, unsigned long address);
 extern int sprint_backtrace(char *buffer, unsigned long address);
+extern int sprint_backtrace_build_id(char *buffer, unsigned long address);
 
 int lookup_symbol_name(unsigned long addr, char *symname);
 int lookup_symbol_attrs(unsigned long addr, unsigned long *size, unsigned long *offset, char *modname, char *name);
@@ -128,6 +133,12 @@ static inline int sprint_symbol(char *bu
 	return 0;
 }
 
+static inline int sprint_symbol_build_id(char *buffer, unsigned long address)
+{
+	*buffer = '\0';
+	return 0;
+}
+
 static inline int sprint_symbol_no_offset(char *buffer, unsigned long addr)
 {
 	*buffer = '\0';
@@ -138,6 +149,12 @@ static inline int sprint_backtrace(char
 {
 	*buffer = '\0';
 	return 0;
+}
+
+static inline int sprint_backtrace_build_id(char *buffer, unsigned long addr)
+{
+	*buffer = '\0';
+	return 0;
 }
 
 static inline int lookup_symbol_name(unsigned long addr, char *symname)
--- a/include/linux/module.h~module-add-printk-formats-to-add-module-build-id-to-stacktraces
+++ a/include/linux/module.h
@@ -11,6 +11,7 @@
 
 #include <linux/list.h>
 #include <linux/stat.h>
+#include <linux/buildid.h>
 #include <linux/compiler.h>
 #include <linux/cache.h>
 #include <linux/kmod.h>
@@ -369,6 +370,11 @@ struct module {
 	/* Unique handle for this module */
 	char name[MODULE_NAME_LEN];
 
+#ifdef CONFIG_STACKTRACE_BUILD_ID
+	/* Module build ID */
+	unsigned char build_id[BUILD_ID_SIZE_MAX];
+#endif
+
 	/* Sysfs stuff. */
 	struct module_kobject mkobj;
 	struct module_attribute *modinfo_attrs;
@@ -636,7 +642,7 @@ void *dereference_module_function_descri
 const char *module_address_lookup(unsigned long addr,
 			    unsigned long *symbolsize,
 			    unsigned long *offset,
-			    char **modname,
+			    char **modname, const unsigned char **modbuildid,
 			    char *namebuf);
 int lookup_module_symbol_name(unsigned long addr, char *symname);
 int lookup_module_symbol_attrs(unsigned long addr, unsigned long *size, unsigned long *offset, char *modname, char *name);
@@ -740,6 +746,7 @@ static inline const char *module_address
 					  unsigned long *symbolsize,
 					  unsigned long *offset,
 					  char **modname,
+					  const unsigned char **modbuildid,
 					  char *namebuf)
 {
 	return NULL;
--- a/kernel/kallsyms.c~module-add-printk-formats-to-add-module-build-id-to-stacktraces
+++ a/kernel/kallsyms.c
@@ -25,7 +25,10 @@
 #include <linux/filter.h>
 #include <linux/ftrace.h>
 #include <linux/kprobes.h>
+#include <linux/build_bug.h>
 #include <linux/compiler.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
 
 /*
  * These will be re-linked against their real values
@@ -297,21 +300,14 @@ int kallsyms_lookup_size_offset(unsigned
 		get_symbol_pos(addr, symbolsize, offset);
 		return 1;
 	}
-	return !!module_address_lookup(addr, symbolsize, offset, NULL, namebuf) ||
+	return !!module_address_lookup(addr, symbolsize, offset, NULL, NULL, namebuf) ||
 	       !!__bpf_address_lookup(addr, symbolsize, offset, namebuf);
 }
 
-/*
- * Lookup an address
- * - modname is set to NULL if it's in the kernel.
- * - We guarantee that the returned name is valid until we reschedule even if.
- *   It resides in a module.
- * - We also guarantee that modname will be valid until rescheduled.
- */
-const char *kallsyms_lookup(unsigned long addr,
-			    unsigned long *symbolsize,
-			    unsigned long *offset,
-			    char **modname, char *namebuf)
+static const char *kallsyms_lookup_buildid(unsigned long addr,
+			unsigned long *symbolsize,
+			unsigned long *offset, char **modname,
+			const unsigned char **modbuildid, char *namebuf)
 {
 	const char *ret;
 
@@ -327,6 +323,8 @@ const char *kallsyms_lookup(unsigned lon
 				       namebuf, KSYM_NAME_LEN);
 		if (modname)
 			*modname = NULL;
+		if (modbuildid)
+			*modbuildid = NULL;
 
 		ret = namebuf;
 		goto found;
@@ -334,7 +332,7 @@ const char *kallsyms_lookup(unsigned lon
 
 	/* See if it's in a module or a BPF JITed image. */
 	ret = module_address_lookup(addr, symbolsize, offset,
-				    modname, namebuf);
+				    modname, modbuildid, namebuf);
 	if (!ret)
 		ret = bpf_address_lookup(addr, symbolsize,
 					 offset, modname, namebuf);
@@ -348,6 +346,22 @@ found:
 	return ret;
 }
 
+/*
+ * Lookup an address
+ * - modname is set to NULL if it's in the kernel.
+ * - We guarantee that the returned name is valid until we reschedule even if.
+ *   It resides in a module.
+ * - We also guarantee that modname will be valid until rescheduled.
+ */
+const char *kallsyms_lookup(unsigned long addr,
+			    unsigned long *symbolsize,
+			    unsigned long *offset,
+			    char **modname, char *namebuf)
+{
+	return kallsyms_lookup_buildid(addr, symbolsize, offset, modname,
+				       NULL, namebuf);
+}
+
 int lookup_symbol_name(unsigned long addr, char *symname)
 {
 	int res;
@@ -404,15 +418,17 @@ found:
 
 /* Look up a kernel symbol and return it in a text buffer. */
 static int __sprint_symbol(char *buffer, unsigned long address,
-			   int symbol_offset, int add_offset)
+			   int symbol_offset, int add_offset, int add_buildid)
 {
 	char *modname;
+	const unsigned char *buildid;
 	const char *name;
 	unsigned long offset, size;
 	int len;
 
 	address += symbol_offset;
-	name = kallsyms_lookup(address, &size, &offset, &modname, buffer);
+	name = kallsyms_lookup_buildid(address, &size, &offset, &modname, &buildid,
+				       buffer);
 	if (!name)
 		return sprintf(buffer, "0x%lx", address - symbol_offset);
 
@@ -424,8 +440,19 @@ static int __sprint_symbol(char *buffer,
 	if (add_offset)
 		len += sprintf(buffer + len, "+%#lx/%#lx", offset, size);
 
-	if (modname)
-		len += sprintf(buffer + len, " [%s]", modname);
+	if (modname) {
+		len += sprintf(buffer + len, " [%s", modname);
+#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
+		if (add_buildid && buildid) {
+			/* build ID should match length of sprintf */
+#if IS_ENABLED(CONFIG_MODULES)
+			static_assert(sizeof(typeof_member(struct module, build_id)) == 20);
+#endif
+			len += sprintf(buffer + len, " %20phN", buildid);
+		}
+#endif
+		len += sprintf(buffer + len, "]");
+	}
 
 	return len;
 }
@@ -443,11 +470,28 @@ static int __sprint_symbol(char *buffer,
  */
 int sprint_symbol(char *buffer, unsigned long address)
 {
-	return __sprint_symbol(buffer, address, 0, 1);
+	return __sprint_symbol(buffer, address, 0, 1, 0);
 }
 EXPORT_SYMBOL_GPL(sprint_symbol);
 
 /**
+ * sprint_symbol_build_id - Look up a kernel symbol and return it in a text buffer
+ * @buffer: buffer to be stored
+ * @address: address to lookup
+ *
+ * This function looks up a kernel symbol with @address and stores its name,
+ * offset, size, module name and module build ID to @buffer if possible. If no
+ * symbol was found, just saves its @address as is.
+ *
+ * This function returns the number of bytes stored in @buffer.
+ */
+int sprint_symbol_build_id(char *buffer, unsigned long address)
+{
+	return __sprint_symbol(buffer, address, 0, 1, 1);
+}
+EXPORT_SYMBOL_GPL(sprint_symbol_build_id);
+
+/**
  * sprint_symbol_no_offset - Look up a kernel symbol and return it in a text buffer
  * @buffer: buffer to be stored
  * @address: address to lookup
@@ -460,7 +504,7 @@ EXPORT_SYMBOL_GPL(sprint_symbol);
  */
 int sprint_symbol_no_offset(char *buffer, unsigned long address)
 {
-	return __sprint_symbol(buffer, address, 0, 0);
+	return __sprint_symbol(buffer, address, 0, 0, 0);
 }
 EXPORT_SYMBOL_GPL(sprint_symbol_no_offset);
 
@@ -480,7 +524,27 @@ EXPORT_SYMBOL_GPL(sprint_symbol_no_offse
  */
 int sprint_backtrace(char *buffer, unsigned long address)
 {
-	return __sprint_symbol(buffer, address, -1, 1);
+	return __sprint_symbol(buffer, address, -1, 1, 0);
+}
+
+/**
+ * sprint_backtrace_build_id - Look up a backtrace symbol and return it in a text buffer
+ * @buffer: buffer to be stored
+ * @address: address to lookup
+ *
+ * This function is for stack backtrace and does the same thing as
+ * sprint_symbol() but with modified/decreased @address. If there is a
+ * tail-call to the function marked "noreturn", gcc optimized out code after
+ * the call so that the stack-saved return address could point outside of the
+ * caller. This function ensures that kallsyms will find the original caller
+ * by decreasing @address. This function also appends the module build ID to
+ * the @buffer if @address is within a kernel module.
+ *
+ * This function returns the number of bytes stored in @buffer.
+ */
+int sprint_backtrace_build_id(char *buffer, unsigned long address)
+{
+	return __sprint_symbol(buffer, address, -1, 1, 1);
 }
 
 /* To avoid using get_symbol_offset for every symbol, we carry prefix along. */
--- a/kernel/module.c~module-add-printk-formats-to-add-module-build-id-to-stacktraces
+++ a/kernel/module.c
@@ -13,6 +13,7 @@
 #include <linux/trace_events.h>
 #include <linux/init.h>
 #include <linux/kallsyms.h>
+#include <linux/buildid.h>
 #include <linux/file.h>
 #include <linux/fs.h>
 #include <linux/sysfs.h>
@@ -1465,6 +1466,13 @@ resolve_symbol_wait(struct module *mod,
 	return ksym;
 }
 
+#ifdef CONFIG_KALLSYMS
+static inline bool sect_empty(const Elf_Shdr *sect)
+{
+	return !(sect->sh_flags & SHF_ALLOC) || sect->sh_size == 0;
+}
+#endif
+
 /*
  * /sys/module/foo/sections stuff
  * J. Corbet <corbet@lwn.net>
@@ -1472,11 +1480,6 @@ resolve_symbol_wait(struct module *mod,
 #ifdef CONFIG_SYSFS
 
 #ifdef CONFIG_KALLSYMS
-static inline bool sect_empty(const Elf_Shdr *sect)
-{
-	return !(sect->sh_flags & SHF_ALLOC) || sect->sh_size == 0;
-}
-
 struct module_sect_attr {
 	struct bin_attribute battr;
 	unsigned long address;
@@ -2797,6 +2800,26 @@ static void add_kallsyms(struct module *
 }
 #endif /* CONFIG_KALLSYMS */
 
+#if IS_ENABLED(CONFIG_KALLSYMS) && IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
+static void init_build_id(struct module *mod, const struct load_info *info)
+{
+	const Elf_Shdr *sechdr;
+	unsigned int i;
+
+	for (i = 0; i < info->hdr->e_shnum; i++) {
+		sechdr = &info->sechdrs[i];
+		if (!sect_empty(sechdr) && sechdr->sh_type == SHT_NOTE &&
+		    !build_id_parse_buf((void *)sechdr->sh_addr, mod->build_id,
+					sechdr->sh_size))
+			break;
+	}
+}
+#else
+static void init_build_id(struct module *mod, const struct load_info *info)
+{
+}
+#endif
+
 static void dynamic_debug_setup(struct module *mod, struct _ddebug *debug, unsigned int num)
 {
 	if (!debug)
@@ -4021,6 +4044,7 @@ static int load_module(struct load_info
 		goto free_arch_cleanup;
 	}
 
+	init_build_id(mod, info);
 	dynamic_debug_setup(mod, info->debug, info->num_debug);
 
 	/* Ftrace init must be called in the MODULE_STATE_UNFORMED state */
@@ -4254,6 +4278,7 @@ const char *module_address_lookup(unsign
 			    unsigned long *size,
 			    unsigned long *offset,
 			    char **modname,
+			    const unsigned char **modbuildid,
 			    char *namebuf)
 {
 	const char *ret = NULL;
@@ -4264,6 +4289,13 @@ const char *module_address_lookup(unsign
 	if (mod) {
 		if (modname)
 			*modname = mod->name;
+		if (modbuildid) {
+#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
+			*modbuildid = mod->build_id;
+#else
+			*modbuildid = NULL;
+#endif
+		}
 
 		ret = find_kallsyms_symbol(mod, addr, size, offset);
 	}
--- a/lib/vsprintf.c~module-add-printk-formats-to-add-module-build-id-to-stacktraces
+++ a/lib/vsprintf.c
@@ -993,8 +993,12 @@ char *symbol_string(char *buf, char *end
 	value = (unsigned long)ptr;
 
 #ifdef CONFIG_KALLSYMS
-	if (*fmt == 'B')
+	if (*fmt == 'B' && fmt[1] == 'b')
+		sprint_backtrace_build_id(sym, value);
+	else if (*fmt == 'B')
 		sprint_backtrace(sym, value);
+	else if (*fmt == 'S' && (fmt[1] == 'b' || (fmt[1] == 'R' && fmt[2] == 'b')))
+		sprint_symbol_build_id(sym, value);
 	else if (*fmt != 's')
 		sprint_symbol(sym, value);
 	else
@@ -2263,9 +2267,11 @@ early_param("no_hash_pointers", no_hash_
  * - 'S' For symbolic direct pointers (or function descriptors) with offset
  * - 's' For symbolic direct pointers (or function descriptors) without offset
  * - '[Ss]R' as above with __builtin_extract_return_addr() translation
+ * - 'S[R]b' as above with module build ID (for use in backtraces)
  * - '[Ff]' %pf and %pF were obsoleted and later removed in favor of
  *	    %ps and %pS. Be careful when re-using these specifiers.
  * - 'B' For backtraced symbolic direct pointers with offset
+ * - 'Bb' as above with module build ID (for use in backtraces)
  * - 'R' For decoded struct resource, e.g., [mem 0x0-0x1f 64bit pref]
  * - 'r' For raw struct resource, e.g., [mem 0x0-0x1f flags 0x201]
  * - 'b[l]' For a bitmap, the number of bits is determined by the field
_

* [patch 36/54] arm64: stacktrace: use %pSb for backtrace printing
  2021-07-08  0:59 incoming Andrew Morton
                   ` (34 preceding siblings ...)
  2021-07-08  1:09 ` [patch 35/54] module: add printk formats to add module build ID to stacktraces Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 37/54] x86/dumpstack: use %pSb/%pBb " Andrew Morton
                   ` (17 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, ast, bhe, bp, catalin.marinas, dyoung,
	evgreen, hsinyi, jeyu, jolsa, khlebnikov, linux-mm, linux, mingo,
	mm-commits, pmladek, rostedt, sashal, sergey.senozhatsky, swboyd,
	tglx, torvalds, vgoyal, will, willy

From: Stephen Boyd <swboyd@chromium.org>
Subject: arm64: stacktrace: use %pSb for backtrace printing

Let's use the new printk format to print the stacktrace entry when
printing a backtrace to the kernel logs. This will include any module's
build ID[1] in it so that offline/crash debugging can easily locate the
debuginfo for a module via something like debuginfod[2].
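
As a rough illustration of what this enables, the sketch below pulls the
build ID out of one backtrace line and prints the debuginfod-find command
that could fetch the matching debuginfo.  The sample line, the lkdtm module
name, the hex ID and the helper name buildid_from_line are illustrative
assumptions, not output captured from a real kernel; the only format assumed
is a trailing "[module buildid]" pair, which is what the
decode_stacktrace.sh change in this series parses.

```shell
# Hypothetical helper: print the build ID from a backtrace line that ends
# in "[module buildid]"; print nothing if the line has no such suffix.
buildid_from_line() {
	line=$1
	case $line in
	*"["*" "*"]") ;;		# needs a bracketed pair at the end
	*) return 0 ;;
	esac
	id=${line##* }			# last space-separated word: "id]"
	id=${id%\]}			# drop the closing bracket
	case $id in
	""|*[!0-9a-f]*) return 0 ;;	# not a lowercase hex string
	esac
	printf '%s\n' "$id"
}

# Illustrative line only; not copied from a real log.
sample=' lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]'
id=$(buildid_from_line "$sample")
if [ -n "$id" ]; then
	# Print the lookup command instead of running it, so the sketch
	# works without a debuginfod server.
	echo "debuginfod-find debuginfo $id"
fi
```

Lines without a build ID (built-in symbols, or kernels built without
CONFIG_STACKTRACE_BUILD_ID) simply produce no output here.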

Link: https://lkml.kernel.org/r/20210511003845.2429846-7-swboyd@chromium.org
Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
Link: https://sourceware.org/elfutils/Debuginfod.html [2]
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm64/kernel/stacktrace.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm64/kernel/stacktrace.c~arm64-stacktrace-use-%psb-for-backtrace-printing
+++ a/arch/arm64/kernel/stacktrace.c
@@ -153,7 +153,7 @@ NOKPROBE_SYMBOL(walk_stackframe);
 
 static void dump_backtrace_entry(unsigned long where, const char *loglvl)
 {
-	printk("%s %pS\n", loglvl, (void *)where);
+	printk("%s %pSb\n", loglvl, (void *)where);
 }
 
 void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
_

* [patch 37/54] x86/dumpstack: use %pSb/%pBb for backtrace printing
  2021-07-08  0:59 incoming Andrew Morton
                   ` (35 preceding siblings ...)
  2021-07-08  1:09 ` [patch 36/54] arm64: stacktrace: use %pSb for backtrace printing Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 38/54] scripts/decode_stacktrace.sh: support debuginfod Andrew Morton
                   ` (16 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, ast, bhe, bp, catalin.marinas, dyoung,
	evgreen, hsinyi, jeyu, jolsa, khlebnikov, linux-mm, linux, mingo,
	mm-commits, pmladek, rostedt, sashal, sergey.senozhatsky, swboyd,
	tglx, torvalds, vgoyal, will, willy

From: Stephen Boyd <swboyd@chromium.org>
Subject: x86/dumpstack: use %pSb/%pBb for backtrace printing

Let's use the new printk formats to print the stacktrace entries when
printing a backtrace to the kernel logs.  This will include any module's
build ID[1] in it so that offline/crash debugging can easily locate the
debuginfo for a module via something like debuginfod[2].

Link: https://lkml.kernel.org/r/20210511003845.2429846-8-swboyd@chromium.org
Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
Link: https://sourceware.org/elfutils/Debuginfod.html [2]
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/kernel/dumpstack.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/dumpstack.c~x86-dumpstack-use-%psb-%pbb-for-backtrace-printing
+++ a/arch/x86/kernel/dumpstack.c
@@ -69,7 +69,7 @@ static void printk_stack_address(unsigne
 				 const char *log_lvl)
 {
 	touch_nmi_watchdog();
-	printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address);
+	printk("%s %s%pBb\n", log_lvl, reliable ? "" : "? ", (void *)address);
 }
 
 static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src,
_

* [patch 38/54] scripts/decode_stacktrace.sh: support debuginfod
  2021-07-08  0:59 incoming Andrew Morton
                   ` (36 preceding siblings ...)
  2021-07-08  1:09 ` [patch 37/54] x86/dumpstack: use %pSb/%pBb " Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 39/54] scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm Andrew Morton
                   ` (15 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, ast, bhe, bp, catalin.marinas, dyoung,
	evgreen, hsinyi, jeyu, jolsa, khlebnikov, linux-mm, linux, mingo,
	mm-commits, pmladek, rostedt, sashal, sergey.senozhatsky, swboyd,
	tglx, torvalds, vgoyal, will, willy

From: Stephen Boyd <swboyd@chromium.org>
Subject: scripts/decode_stacktrace.sh: support debuginfod

Now that stacktraces contain the build ID information we can update this
script to use debuginfod-find to locate the debuginfo for the vmlinux and
modules automatically.  This can replace the existing code that requires
specifying a path to vmlinux or tries to find the vmlinux and modules
automatically by using the release number.  Work it into the script as a
fallback option if the vmlinux isn't specified on the command line.
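
The selection between a user-supplied vmlinux and debuginfod can be
sketched roughly as follows.  This is a simplified reading of the diff
below, not the script itself: the helper name pick_symbol_source, its
"yes"/"no" second argument and the printed labels are assumptions made for
the example.

```shell
# Hypothetical helper mirroring the argument handling described above.
# $1: vmlinux path (may be empty); $2: "yes" if debuginfod-find is usable.
pick_symbol_source() {
	vmlinux=$1
	have_debuginfod=$2
	if [ -n "$vmlinux" ] && [ "$have_debuginfod" = yes ]; then
		echo "vmlinux, with debuginfod for modules"
	elif [ -n "$vmlinux" ]; then
		echo "vmlinux"
	elif [ "$have_debuginfod" = yes ]; then
		echo "debuginfod only"
	else
		echo "ERROR! vmlinux image must be specified" >&2
		return 1
	fi
}

# Probe for the tool, then ask what would happen with no vmlinux given.
if command -v debuginfod-find >/dev/null 2>&1; then
	have=yes
else
	have=no
fi
pick_symbol_source "" "$have" || true
```

Passing availability in as an argument keeps the decision testable on
machines that don't have elfutils installed.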

Link: https://lkml.kernel.org/r/20210511003845.2429846-9-swboyd@chromium.org
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/decode_stacktrace.sh |   81 ++++++++++++++++++++++++++++-----
 1 file changed, 70 insertions(+), 11 deletions(-)

--- a/scripts/decode_stacktrace.sh~scripts-decode_stacktracesh-support-debuginfod
+++ a/scripts/decode_stacktrace.sh
@@ -3,11 +3,10 @@
 # (c) 2014, Sasha Levin <sasha.levin@oracle.com>
 #set -x
 
-if [[ $# < 1 ]]; then
+usage() {
 	echo "Usage:"
 	echo "	$0 -r <release> | <vmlinux> [base path] [modules path]"
-	exit 1
-fi
+}
 
 if [[ $1 == "-r" ]] ; then
 	vmlinux=""
@@ -24,6 +23,7 @@ if [[ $1 == "-r" ]] ; then
 
 	if [[ $vmlinux == "" ]] ; then
 		echo "ERROR! vmlinux image for release $release is not found" >&2
+		usage
 		exit 2
 	fi
 else
@@ -31,12 +31,35 @@ else
 	basepath=${2-auto}
 	modpath=$3
 	release=""
+	debuginfod=
+
+	# Can we use debuginfod-find?
+	if type debuginfod-find >/dev/null 2>&1 ; then
+		debuginfod=${1-only}
+	fi
+
+	if [[ $vmlinux == "" && -z $debuginfod ]] ; then
+		echo "ERROR! vmlinux image must be specified" >&2
+		usage
+		exit 1
+	fi
 fi
 
 declare -A cache
 declare -A modcache
 
 find_module() {
+	if [[ -n $debuginfod ]] ; then
+		if [[ -n $modbuildid ]] ; then
+			debuginfod-find debuginfo $modbuildid && return
+		fi
+
+		# Only using debuginfod so don't try to find vmlinux module path
+		if [[ $debuginfod == "only" ]] ; then
+			return
+		fi
+	fi
+
 	if [[ "$modpath" != "" ]] ; then
 		for fn in $(find "$modpath" -name "${module//_/[-_]}.ko*") ; do
 			if readelf -WS "$fn" | grep -qwF .debug_line ; then
@@ -150,6 +173,27 @@ parse_symbol() {
 	symbol="$segment$name ($code)"
 }
 
+debuginfod_get_vmlinux() {
+	local vmlinux_buildid=${1##* }
+
+	if [[ $vmlinux != "" ]]; then
+		return
+	fi
+
+	if [[ $vmlinux_buildid =~ ^[0-9a-f]+ ]]; then
+		vmlinux=$(debuginfod-find debuginfo $vmlinux_buildid)
+		if [[ $? -ne 0 ]] ; then
+			echo "ERROR! vmlinux image not found via debuginfod-find" >&2
+			usage
+			exit 2
+		fi
+		return
+	fi
+	echo "ERROR! Build ID for vmlinux not found. Try passing -r or specifying vmlinux" >&2
+	usage
+	exit 2
+}
+
 decode_code() {
 	local scripts=`dirname "${BASH_SOURCE[0]}"`
 
@@ -157,6 +201,14 @@ decode_code() {
 }
 
 handle_line() {
+	if [[ $basepath == "auto" && $vmlinux != "" ]] ; then
+		module=""
+		symbol="kernel_init+0x0/0x0"
+		parse_symbol
+		basepath=${symbol#kernel_init (}
+		basepath=${basepath%/init/main.c:*)}
+	fi
+
 	local words
 
 	# Tokenize
@@ -182,16 +234,28 @@ handle_line() {
 		fi
 	done
 
+	if [[ ${words[$last]} =~ ^[0-9a-f]+\] ]]; then
+		words[$last-1]="${words[$last-1]} ${words[$last]}"
+		unset words[$last]
+		last=$(( $last - 1 ))
+	fi
+
 	if [[ ${words[$last]} =~ \[([^]]+)\] ]]; then
 		module=${words[$last]}
 		module=${module#\[}
 		module=${module%\]}
+		modbuildid=${module#* }
+		module=${module% *}
+		if [[ $modbuildid == $module ]]; then
+			modbuildid=
+		fi
 		symbol=${words[$last-1]}
 		unset words[$last-1]
 	else
 		# The symbol is the last element, process it
 		symbol=${words[$last]}
 		module=
+		modbuildid=
 	fi
 
 	unset words[$last]
@@ -201,14 +265,6 @@ handle_line() {
 	echo "${words[@]}" "$symbol $module"
 }
 
-if [[ $basepath == "auto" ]] ; then
-	module=""
-	symbol="kernel_init+0x0/0x0"
-	parse_symbol
-	basepath=${symbol#kernel_init (}
-	basepath=${basepath%/init/main.c:*)}
-fi
-
 while read line; do
 	# Let's see if we have an address in the line
 	if [[ $line =~ \[\<([^]]+)\>\] ]] ||
@@ -218,6 +274,9 @@ while read line; do
 	# Is it a code line?
 	elif [[ $line == *Code:* ]]; then
 		decode_code "$line"
+	# Is it a version line?
+	elif [[ -n $debuginfod && $line =~ PID:\ [0-9]+\ Comm: ]]; then
+		debuginfod_get_vmlinux "$line"
 	else
 		# Nothing special in this line, show it as is
 		echo "$line"
_

* [patch 39/54] scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm
  2021-07-08  0:59 incoming Andrew Morton
                   ` (37 preceding siblings ...)
  2021-07-08  1:09 ` [patch 38/54] scripts/decode_stacktrace.sh: support debuginfod Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 40/54] scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path Andrew Morton
                   ` (14 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, ast, bhe, bp, catalin.marinas, dyoung,
	evgreen, hsinyi, jeyu, jolsa, khlebnikov, linux-mm, linux, mingo,
	mm-commits, pmladek, rostedt, sashal, sergey.senozhatsky, swboyd,
	tglx, torvalds, vgoyal, will, willy

From: Stephen Boyd <swboyd@chromium.org>
Subject: scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm

Sometimes, if you're using tools that have linked things improperly or have
new features/sections that older tools don't expect, you'll see warnings
printed to stderr.  We don't really care about these warnings, so let's
just silence these messages to clean up the output of this script.
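
A minimal sketch of the technique, assuming a made-up noisy_tool function
that stands in for nm or addr2line (its name and both messages are
invented for the example): stderr is redirected for the tool alone, so the
data flowing through the pipeline is untouched.

```shell
# Stand-in for a tool such as nm: one useful line on stdout and one
# warning on stderr.  Name and messages are made up for the demo.
noisy_tool() {
	echo "ffffffff81000000 T kernel_init"
	echo "warning: unknown section type" >&2
}

# Redirect stderr of the tool only; its stdout still reaches awk.
base_addr=$(noisy_tool 2>/dev/null | awk '$3 == "kernel_init" {print $1; exit}')
echo "$base_addr"
```

The trade-off is that genuine errors from the tool are hidden as well, so
a failed lookup shows up only as an empty result.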

Link: https://lkml.kernel.org/r/20210511003845.2429846-10-swboyd@chromium.org
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/decode_stacktrace.sh |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/scripts/decode_stacktrace.sh~scripts-decode_stacktracesh-silence-stderr-messages-from-addr2line-nm
+++ a/scripts/decode_stacktrace.sh
@@ -74,7 +74,7 @@ find_module() {
 	find_module && return
 
 	if [[ $release == "" ]] ; then
-		release=$(gdb -ex 'print init_uts_ns.name.release' -ex 'quit' -quiet -batch "$vmlinux" | sed -n 's/\$1 = "\(.*\)".*/\1/p')
+		release=$(gdb -ex 'print init_uts_ns.name.release' -ex 'quit' -quiet -batch "$vmlinux" 2>/dev/null | sed -n 's/\$1 = "\(.*\)".*/\1/p')
 	fi
 
 	for dn in {/usr/lib/debug,}/lib/modules/$release ; do
@@ -128,7 +128,7 @@ parse_symbol() {
 	if [[ "${cache[$module,$name]+isset}" == "isset" ]]; then
 		local base_addr=${cache[$module,$name]}
 	else
-		local base_addr=$(nm "$objfile" | awk '$3 == "'$name'" && ($2 == "t" || $2 == "T") {print $1; exit}')
+		local base_addr=$(nm "$objfile" 2>/dev/null | awk '$3 == "'$name'" && ($2 == "t" || $2 == "T") {print $1; exit}')
 		if [[ $base_addr == "" ]] ; then
 			# address not found
 			return
@@ -152,7 +152,7 @@ parse_symbol() {
 	if [[ "${cache[$module,$address]+isset}" == "isset" ]]; then
 		local code=${cache[$module,$address]}
 	else
-		local code=$(${CROSS_COMPILE}addr2line -i -e "$objfile" "$address")
+		local code=$(${CROSS_COMPILE}addr2line -i -e "$objfile" "$address" 2>/dev/null)
 		cache[$module,$address]=$code
 	fi
 
_

* [patch 40/54] scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path
  2021-07-08  0:59 incoming Andrew Morton
                   ` (38 preceding siblings ...)
  2021-07-08  1:09 ` [patch 39/54] scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 41/54] buildid: mark some arguments const Andrew Morton
                   ` (13 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, ast, bhe, bp, catalin.marinas, dyoung,
	evgreen, hsinyi, jeyu, jolsa, khlebnikov, linux-mm, linux, mingo,
	mm-commits, pmladek, rostedt, sashal, sergey.senozhatsky, swboyd,
	tglx, torvalds, vgoyal, will, willy

From: Stephen Boyd <swboyd@chromium.org>
Subject: scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path

Add "auto" to the usage message so that it's a little clearer that you can
pass "auto" as the second argument.  When passing "auto" the script tries
to find the base path automatically instead of requiring it to be passed on
the command line.  Also use [<variable>] to indicate the variable argument
and that it is optional so that we can differentiate from the literal
"auto" that should be passed.
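
For reference, the default comes from the script's basepath=${2-auto}
assignment.  The sketch below shows how that expansion behaves; the wrapper
name basepath_for and the example path are assumptions made for the
illustration.

```shell
# Hypothetical wrapper showing how the optional second argument defaults.
# "${2-auto}" yields "auto" only when the argument is absent; an argument
# that is present but empty is kept as-is.
basepath_for() {
	printf '%s\n' "${2-auto}"
}

basepath_for vmlinux			# second argument omitted
basepath_for vmlinux /home/user/src	# explicit base path (made-up path)
```

Because the expansion uses "-" rather than ":-", an explicitly empty second
argument is not replaced by "auto".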

Link: https://lkml.kernel.org/r/20210511003845.2429846-11-swboyd@chromium.org
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/decode_stacktrace.sh |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/scripts/decode_stacktrace.sh~scripts-decode_stacktracesh-indicate-auto-can-be-used-for-base-path
+++ a/scripts/decode_stacktrace.sh
@@ -5,7 +5,7 @@
 
 usage() {
 	echo "Usage:"
-	echo "	$0 -r <release> | <vmlinux> [base path] [modules path]"
+	echo "	$0 -r <release> | <vmlinux> [<base path>|auto] [<modules path>]"
 }
 
 if [[ $1 == "-r" ]] ; then
_

* [patch 41/54] buildid: mark some arguments const
  2021-07-08  0:59 incoming Andrew Morton
                   ` (39 preceding siblings ...)
  2021-07-08  1:09 ` [patch 40/54] scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 42/54] buildid: fix kernel-doc notation Andrew Morton
                   ` (12 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, ast, bhe, bp, catalin.marinas, dyoung,
	evgreen, hsinyi, jeyu, jolsa, khlebnikov, linux-mm, linux, mingo,
	mm-commits, pmladek, rostedt, sashal, sergey.senozhatsky, swboyd,
	tglx, torvalds, vgoyal, will, willy

From: Stephen Boyd <swboyd@chromium.org>
Subject: buildid: mark some arguments const

These arguments are never modified, so they can be marked const to indicate
as much.

Link: https://lkml.kernel.org/r/20210511003845.2429846-12-swboyd@chromium.org
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/buildid.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/lib/buildid.c~buildid-mark-some-arguments-const
+++ a/lib/buildid.c
@@ -48,10 +48,10 @@ static int parse_build_id_buf(unsigned c
 	return -EINVAL;
 }
 
-static inline int parse_build_id(void *page_addr,
+static inline int parse_build_id(const void *page_addr,
 				 unsigned char *build_id,
 				 __u32 *size,
-				 void *note_start,
+				 const void *note_start,
 				 Elf32_Word note_size)
 {
 	/* check for overflow */
@@ -66,7 +66,7 @@ static inline int parse_build_id(void *p
 }
 
 /* Parse build ID from 32-bit ELF */
-static int get_build_id_32(void *page_addr, unsigned char *build_id,
+static int get_build_id_32(const void *page_addr, unsigned char *build_id,
 			   __u32 *size)
 {
 	Elf32_Ehdr *ehdr = (Elf32_Ehdr *)page_addr;
@@ -91,7 +91,7 @@ static int get_build_id_32(void *page_ad
 }
 
 /* Parse build ID from 64-bit ELF */
-static int get_build_id_64(void *page_addr, unsigned char *build_id,
+static int get_build_id_64(const void *page_addr, unsigned char *build_id,
 			   __u32 *size)
 {
 	Elf64_Ehdr *ehdr = (Elf64_Ehdr *)page_addr;
_

* [patch 42/54] buildid: fix kernel-doc notation
  2021-07-08  0:59 incoming Andrew Morton
                   ` (40 preceding siblings ...)
  2021-07-08  1:09 ` [patch 41/54] buildid: mark some arguments const Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 43/54] kdump: use vmlinux_build_id to simplify Andrew Morton
                   ` (11 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, ast, bhe, bp, catalin.marinas, dyoung,
	evgreen, hsinyi, jeyu, jolsa, khlebnikov, linux-mm, linux, mingo,
	mm-commits, pmladek, rostedt, sashal, sergey.senozhatsky, swboyd,
	tglx, torvalds, vgoyal, will, willy

From: Stephen Boyd <swboyd@chromium.org>
Subject: buildid: fix kernel-doc notation

Kernel doc should use "Return:" instead of "Returns" to properly reflect
the return values.

Link: https://lkml.kernel.org/r/20210511003845.2429846-13-swboyd@chromium.org
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/buildid.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/lib/buildid.c~buildid-fix-kernel-doc-notation
+++ a/lib/buildid.c
@@ -121,7 +121,7 @@ static int get_build_id_64(const void *p
  * @build_id: buffer to store build id, at least BUILD_ID_SIZE long
  * @size:     returns actual build id size in case of success
  *
- * Returns 0 on success, otherwise error (< 0).
+ * Return: 0 on success, -EINVAL otherwise
  */
 int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
 		   __u32 *size)
_

* [patch 43/54] kdump: use vmlinux_build_id to simplify
  2021-07-08  0:59 incoming Andrew Morton
                   ` (41 preceding siblings ...)
  2021-07-08  1:09 ` [patch 42/54] buildid: fix kernel-doc notation Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 44/54] mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t * Andrew Morton
                   ` (10 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, ast, bhe, bp, catalin.marinas, dyoung,
	evgreen, hsinyi, jeyu, jolsa, khlebnikov, linux-mm, linux, mingo,
	mm-commits, pmladek, rostedt, sashal, sergey.senozhatsky, swboyd,
	tglx, torvalds, vgoyal, will, willy

From: Stephen Boyd <swboyd@chromium.org>
Subject: kdump: use vmlinux_build_id to simplify

We can use the vmlinux_build_id array here now instead of open coding the
build ID lookup.  This mostly consolidates code.
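
The resulting vmcoreinfo line is "BUILD-ID=" followed by the 20 bytes of
vmlinux_build_id printed as hex digits with no separators.  A userspace
imitation of that formatting, assuming a made-up helper build_id_line and
20 printable bytes standing in for a real build ID:

```shell
# Hypothetical userspace imitation of "BUILD-ID=%20phN\n": dump each input
# byte as two lowercase hex digits with no separators.
build_id_line() {
	hex=$(printf '%s' "$1" | od -An -tx1 | tr -d ' \t\n')
	printf 'BUILD-ID=%s\n' "$hex"
}

# 20 printable bytes stand in for a real SHA-1 sized build ID.
build_id_line 'ABCDEFGHIJKLMNOPQRST'
```

Twenty input bytes give forty hex digits, the length a SHA-1 sized build ID
occupies in the line.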

Link: https://lkml.kernel.org/r/20210511003845.2429846-14-swboyd@chromium.org
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/buildid.h    |    2 -
 include/linux/crash_core.h |   12 ++++----
 kernel/crash_core.c        |   50 +----------------------------------
 lib/buildid.c              |    2 -
 4 files changed, 10 insertions(+), 56 deletions(-)

--- a/include/linux/buildid.h~kdump-use-vmlinux_build_id-to-simplify
+++ a/include/linux/buildid.h
@@ -10,7 +10,7 @@ int build_id_parse(struct vm_area_struct
 		   __u32 *size);
 int build_id_parse_buf(const void *buf, unsigned char *build_id, u32 buf_size);
 
-#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
+#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID) || IS_ENABLED(CONFIG_CRASH_CORE)
 extern unsigned char vmlinux_build_id[BUILD_ID_SIZE_MAX];
 void init_vmlinux_build_id(void);
 #else
--- a/include/linux/crash_core.h~kdump-use-vmlinux_build_id-to-simplify
+++ a/include/linux/crash_core.h
@@ -38,8 +38,12 @@ phys_addr_t paddr_vmcoreinfo_note(void);
 
 #define VMCOREINFO_OSRELEASE(value) \
 	vmcoreinfo_append_str("OSRELEASE=%s\n", value)
-#define VMCOREINFO_BUILD_ID(value) \
-	vmcoreinfo_append_str("BUILD-ID=%s\n", value)
+#define VMCOREINFO_BUILD_ID()						\
+	({								\
+		static_assert(sizeof(vmlinux_build_id) == 20);		\
+		vmcoreinfo_append_str("BUILD-ID=%20phN\n", vmlinux_build_id); \
+	})
+
 #define VMCOREINFO_PAGESIZE(value) \
 	vmcoreinfo_append_str("PAGESIZE=%ld\n", value)
 #define VMCOREINFO_SYMBOL(name) \
@@ -69,10 +73,6 @@ extern unsigned char *vmcoreinfo_data;
 extern size_t vmcoreinfo_size;
 extern u32 *vmcoreinfo_note;
 
-/* raw contents of kernel .notes section */
-extern const void __start_notes __weak;
-extern const void __stop_notes __weak;
-
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len);
 void final_note(Elf_Word *buf);
--- a/kernel/crash_core.c~kdump-use-vmlinux_build_id-to-simplify
+++ a/kernel/crash_core.c
@@ -4,6 +4,7 @@
  * Copyright (C) 2002-2004 Eric Biederman  <ebiederm@xmission.com>
  */
 
+#include <linux/buildid.h>
 #include <linux/crash_core.h>
 #include <linux/utsname.h>
 #include <linux/vmalloc.h>
@@ -378,53 +379,6 @@ phys_addr_t __weak paddr_vmcoreinfo_note
 }
 EXPORT_SYMBOL(paddr_vmcoreinfo_note);
 
-#define NOTES_SIZE (&__stop_notes - &__start_notes)
-#define BUILD_ID_MAX SHA1_DIGEST_SIZE
-#define NT_GNU_BUILD_ID 3
-
-struct elf_note_section {
-	struct elf_note	n_hdr;
-	u8 n_data[];
-};
-
-/*
- * Add build ID from .notes section as generated by the GNU ld(1)
- * or LLVM lld(1) --build-id option.
- */
-static void add_build_id_vmcoreinfo(void)
-{
-	char build_id[BUILD_ID_MAX * 2 + 1];
-	int n_remain = NOTES_SIZE;
-
-	while (n_remain >= sizeof(struct elf_note)) {
-		const struct elf_note_section *note_sec =
-			&__start_notes + NOTES_SIZE - n_remain;
-		const u32 n_namesz = note_sec->n_hdr.n_namesz;
-
-		if (note_sec->n_hdr.n_type == NT_GNU_BUILD_ID &&
-		    n_namesz != 0 &&
-		    !strcmp((char *)&note_sec->n_data[0], "GNU")) {
-			if (note_sec->n_hdr.n_descsz <= BUILD_ID_MAX) {
-				const u32 n_descsz = note_sec->n_hdr.n_descsz;
-				const u8 *s = &note_sec->n_data[n_namesz];
-
-				s = PTR_ALIGN(s, 4);
-				bin2hex(build_id, s, n_descsz);
-				build_id[2 * n_descsz] = '\0';
-				VMCOREINFO_BUILD_ID(build_id);
-				return;
-			}
-			pr_warn("Build ID is too large to include in vmcoreinfo: %u > %u\n",
-				note_sec->n_hdr.n_descsz,
-				BUILD_ID_MAX);
-			return;
-		}
-		n_remain -= sizeof(struct elf_note) +
-			ALIGN(note_sec->n_hdr.n_namesz, 4) +
-			ALIGN(note_sec->n_hdr.n_descsz, 4);
-	}
-}
-
 static int __init crash_save_vmcoreinfo_init(void)
 {
 	vmcoreinfo_data = (unsigned char *)get_zeroed_page(GFP_KERNEL);
@@ -443,7 +397,7 @@ static int __init crash_save_vmcoreinfo_
 	}
 
 	VMCOREINFO_OSRELEASE(init_uts_ns.name.release);
-	add_build_id_vmcoreinfo();
+	VMCOREINFO_BUILD_ID();
 	VMCOREINFO_PAGESIZE(PAGE_SIZE);
 
 	VMCOREINFO_SYMBOL(init_uts_ns);
--- a/lib/buildid.c~kdump-use-vmlinux_build_id-to-simplify
+++ a/lib/buildid.c
@@ -174,7 +174,7 @@ int build_id_parse_buf(const void *buf,
 	return parse_build_id_buf(build_id, NULL, buf, buf_size);
 }
 
-#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
+#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID) || IS_ENABLED(CONFIG_CRASH_CORE)
 unsigned char vmlinux_build_id[BUILD_ID_SIZE_MAX] __ro_after_init;
 
 /**
_


* [patch 44/54] mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *
  2021-07-08  0:59 incoming Andrew Morton
                   ` (42 preceding siblings ...)
  2021-07-08  1:09 ` [patch 43/54] kdump: use vmlinux_build_id to simplify Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 45/54] mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t * Andrew Morton
                   ` (9 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, aneesh.kumar, christophe.leroy, hughd, joel, kaleshsingh,
	kirill.shutemov, linux-mm, mm-commits, mpe, npiggin, sfr,
	torvalds

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *

No functional change in this patch.
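
To illustrate what the rename changes at call sites, here is a minimal userspace sketch. Everything in it is a mock: the `mock_` helpers and the simplified `pmd_t`/`pud_t` structs are hypothetical stand-ins rather than the kernel definitions, and the PUD entry is assumed to hold the table's virtual address directly. The old accessor returns an untyped address that every caller must cast, while the new one returns a typed pointer to the next-level table.

```c
#include <stdint.h>

/* Mock stand-ins for the kernel types; NOT the real definitions. */
typedef struct { uintptr_t pmd; } pmd_t;
typedef struct { uintptr_t pud; } pud_t;

#define MOCK_PTRS_PER_PMD 512

/* Old style: returns an untyped address, so callers have to cast it. */
static inline uintptr_t mock_pud_page_vaddr(pud_t pud)
{
	return pud.pud;
}

/* New style: returns a typed pointer to the PMD table. */
static inline pmd_t *mock_pud_pgtable(pud_t pud)
{
	return (pmd_t *)pud.pud;
}

/* Caller before the rename: explicit cast at the call site. */
static inline pmd_t *mock_pmd_offset_old(pud_t *pud, unsigned long index)
{
	return (pmd_t *)mock_pud_page_vaddr(*pud) + index;
}

/* Caller after the rename: the cast lives in the accessor only. */
static inline pmd_t *mock_pmd_offset_new(pud_t *pud, unsigned long index)
{
	return mock_pud_pgtable(*pud) + index;
}
```

Both callers compute the same address; only the place where the cast happens moves, which is consistent with the "no functional change" statement above.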

[aneesh.kumar@linux.ibm.com: fix]
  Link: https://lkml.kernel.org/r/87wnqtnb60.fsf@linux.ibm.com
[sfr@canb.auug.org.au: another fix]
  Link: https://lkml.kernel.org/r/20210619134410.89559-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20210615110859.320299-1-aneesh.kumar@linux.ibm.com
Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/alpha/include/asm/pgtable.h             |    8 +++++---
 arch/arm/include/asm/pgtable-3level.h        |    2 +-
 arch/arm64/include/asm/pgtable.h             |    4 ++--
 arch/ia64/include/asm/pgtable.h              |    2 +-
 arch/m68k/include/asm/motorola_pgtable.h     |    2 +-
 arch/mips/include/asm/pgtable-64.h           |    4 ++--
 arch/parisc/include/asm/pgtable.h            |    4 ++--
 arch/powerpc/include/asm/book3s/64/pgtable.h |    6 +++++-
 arch/powerpc/include/asm/nohash/64/pgtable.h |    6 +++++-
 arch/powerpc/mm/book3s64/radix_pgtable.c     |    4 ++--
 arch/powerpc/mm/pgtable_64.c                 |    2 +-
 arch/riscv/include/asm/pgtable-64.h          |    4 ++--
 arch/sh/include/asm/pgtable-3level.h         |    4 ++--
 arch/sparc/include/asm/pgtable_32.h          |    6 +++---
 arch/sparc/include/asm/pgtable_64.h          |    6 +++---
 arch/um/include/asm/pgtable-3level.h         |    2 +-
 arch/x86/include/asm/pgtable.h               |    4 ++--
 arch/x86/mm/pat/set_memory.c                 |    4 ++--
 arch/x86/mm/pgtable.c                        |    2 +-
 include/asm-generic/pgtable-nopmd.h          |    2 +-
 include/asm-generic/pgtable-nopud.h          |    2 +-
 include/linux/pgtable.h                      |    2 +-
 22 files changed, 46 insertions(+), 36 deletions(-)

--- a/arch/alpha/include/asm/pgtable.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/alpha/include/asm/pgtable.h
@@ -236,8 +236,10 @@ pmd_page_vaddr(pmd_t pmd)
 #define pmd_page(pmd)	(pfn_to_page(pmd_val(pmd) >> 32))
 #define pud_page(pud)	(pfn_to_page(pud_val(pud) >> 32))
 
-extern inline unsigned long pud_page_vaddr(pud_t pgd)
-{ return PAGE_OFFSET + ((pud_val(pgd) & _PFN_MASK) >> (32-PAGE_SHIFT)); }
+extern inline pmd_t *pud_pgtable(pud_t pgd)
+{
+	return (pmd_t *)(PAGE_OFFSET + ((pud_val(pgd) & _PFN_MASK) >> (32-PAGE_SHIFT)));
+}
 
 extern inline int pte_none(pte_t pte)		{ return !pte_val(pte); }
 extern inline int pte_present(pte_t pte)	{ return pte_val(pte) & _PAGE_VALID; }
@@ -287,7 +289,7 @@ extern inline pte_t pte_mkyoung(pte_t pt
 /* Find an entry in the second-level page table.. */
 extern inline pmd_t * pmd_offset(pud_t * dir, unsigned long address)
 {
-	pmd_t *ret = (pmd_t *) pud_page_vaddr(*dir) + ((address >> PMD_SHIFT) & (PTRS_PER_PAGE - 1));
+	pmd_t *ret = pud_pgtable(*dir) + ((address >> PMD_SHIFT) & (PTRS_PER_PAGE - 1));
 	smp_rmb(); /* see above */
 	return ret;
 }
--- a/arch/arm64/include/asm/pgtable.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/arm64/include/asm/pgtable.h
@@ -649,9 +649,9 @@ static inline phys_addr_t pud_page_paddr
 	return __pud_to_phys(pud);
 }
 
-static inline unsigned long pud_page_vaddr(pud_t pud)
+static inline pmd_t *pud_pgtable(pud_t pud)
 {
-	return (unsigned long)__va(pud_page_paddr(pud));
+	return (pmd_t *)__va(pud_page_paddr(pud));
 }
 
 /* Find an entry in the second-level page table. */
--- a/arch/arm/include/asm/pgtable-3level.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/arm/include/asm/pgtable-3level.h
@@ -130,7 +130,7 @@
 		flush_pmd_entry(pudp);	\
 	} while (0)
 
-static inline pmd_t *pud_page_vaddr(pud_t pud)
+static inline pmd_t *pud_pgtable(pud_t pud)
 {
 	return __va(pud_val(pud) & PHYS_MASK & (s32)PAGE_MASK);
 }
--- a/arch/ia64/include/asm/pgtable.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/ia64/include/asm/pgtable.h
@@ -273,7 +273,7 @@ ia64_phys_addr_valid (unsigned long addr
 #define pud_bad(pud)			(!ia64_phys_addr_valid(pud_val(pud)))
 #define pud_present(pud)		(pud_val(pud) != 0UL)
 #define pud_clear(pudp)			(pud_val(*(pudp)) = 0UL)
-#define pud_page_vaddr(pud)		((unsigned long) __va(pud_val(pud) & _PFN_MASK))
+#define pud_pgtable(pud)		((pmd_t *) __va(pud_val(pud) & _PFN_MASK))
 #define pud_page(pud)			virt_to_page((pud_val(pud) + PAGE_OFFSET))
 
 #if CONFIG_PGTABLE_LEVELS == 4
--- a/arch/m68k/include/asm/motorola_pgtable.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/m68k/include/asm/motorola_pgtable.h
@@ -131,7 +131,7 @@ static inline void pud_set(pud_t *pudp,
 
 #define __pte_page(pte) ((unsigned long)__va(pte_val(pte) & PAGE_MASK))
 #define pmd_page_vaddr(pmd) ((unsigned long)__va(pmd_val(pmd) & _TABLE_MASK))
-#define pud_page_vaddr(pud) ((unsigned long)__va(pud_val(pud) & _TABLE_MASK))
+#define pud_pgtable(pud) ((pmd_t *)__va(pud_val(pud) & _TABLE_MASK))
 
 
 #define pte_none(pte)		(!pte_val(pte))
--- a/arch/mips/include/asm/pgtable-64.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/mips/include/asm/pgtable-64.h
@@ -313,9 +313,9 @@ static inline void pud_clear(pud_t *pudp
 #endif
 
 #ifndef __PAGETABLE_PMD_FOLDED
-static inline unsigned long pud_page_vaddr(pud_t pud)
+static inline pmd_t *pud_pgtable(pud_t pud)
 {
-	return pud_val(pud);
+	return (pmd_t *)pud_val(pud);
 }
 #define pud_phys(pud)		virt_to_phys((void *)pud_val(pud))
 #define pud_page(pud)		(pfn_to_page(pud_phys(pud) >> PAGE_SHIFT))
--- a/arch/parisc/include/asm/pgtable.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/parisc/include/asm/pgtable.h
@@ -322,8 +322,8 @@ static inline void pmd_clear(pmd_t *pmd)
 
 
 #if CONFIG_PGTABLE_LEVELS == 3
-#define pud_page_vaddr(pud) ((unsigned long) __va(pud_address(pud)))
-#define pud_page(pud)	virt_to_page((void *)pud_page_vaddr(pud))
+#define pud_pgtable(pud) ((pmd_t *) __va(pud_address(pud)))
+#define pud_page(pud)	virt_to_page((void *)pud_pgtable(pud))
 
 /* For 64 bit we have three level tables */
 
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1051,9 +1051,13 @@ extern struct page *p4d_page(p4d_t p4d);
 /* Pointers in the page table tree are physical addresses */
 #define __pgtable_ptr_val(ptr)	__pa(ptr)
 
-#define pud_page_vaddr(pud)	__va(pud_val(pud) & ~PUD_MASKED_BITS)
 #define p4d_page_vaddr(p4d)	__va(p4d_val(p4d) & ~P4D_MASKED_BITS)
 
+static inline pmd_t *pud_pgtable(pud_t pud)
+{
+	return (pmd_t *)__va(pud_val(pud) & ~PUD_MASKED_BITS);
+}
+
 #define pte_ERROR(e) \
 	pr_err("%s:%d: bad pte %08lx.\n", __FILE__, __LINE__, pte_val(e))
 #define pmd_ERROR(e) \
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -162,7 +162,11 @@ static inline void pud_clear(pud_t *pudp
 #define	pud_bad(pud)		(!is_kernel_addr(pud_val(pud)) \
 				 || (pud_val(pud) & PUD_BAD_BITS))
 #define pud_present(pud)	(pud_val(pud) != 0)
-#define pud_page_vaddr(pud)	(pud_val(pud) & ~PUD_MASKED_BITS)
+
+static inline pmd_t *pud_pgtable(pud_t pud)
+{
+	return (pmd_t *)(pud_val(pud) & ~PUD_MASKED_BITS);
+}
 
 extern struct page *pud_page(pud_t pud);
 
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -820,7 +820,7 @@ static void __meminit remove_pud_table(p
 			continue;
 		}
 
-		pmd_base = (pmd_t *)pud_page_vaddr(*pud);
+		pmd_base = pud_pgtable(*pud);
 		remove_pmd_table(pmd_base, addr, next);
 		free_pmd_table(pmd_base, pud);
 	}
@@ -1105,7 +1105,7 @@ int pud_free_pmd_page(pud_t *pud, unsign
 	pmd_t *pmd;
 	int i;
 
-	pmd = (pmd_t *)pud_page_vaddr(*pud);
+	pmd = pud_pgtable(*pud);
 	pud_clear(pud);
 
 	flush_tlb_kernel_range(addr, addr + PUD_SIZE);
--- a/arch/powerpc/mm/pgtable_64.c~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/powerpc/mm/pgtable_64.c
@@ -115,7 +115,7 @@ struct page *pud_page(pud_t pud)
 		VM_WARN_ON(!pud_huge(pud));
 		return pte_page(pud_pte(pud));
 	}
-	return virt_to_page(pud_page_vaddr(pud));
+	return virt_to_page(pud_pgtable(pud));
 }
 
 /*
--- a/arch/riscv/include/asm/pgtable-64.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/riscv/include/asm/pgtable-64.h
@@ -60,9 +60,9 @@ static inline void pud_clear(pud_t *pudp
 	set_pud(pudp, __pud(0));
 }
 
-static inline unsigned long pud_page_vaddr(pud_t pud)
+static inline pmd_t *pud_pgtable(pud_t pud)
 {
-	return (unsigned long)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
+	return (pmd_t *)pfn_to_virt(pud_val(pud) >> _PAGE_PFN_SHIFT);
 }
 
 static inline struct page *pud_page(pud_t pud)
--- a/arch/sh/include/asm/pgtable-3level.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/sh/include/asm/pgtable-3level.h
@@ -32,9 +32,9 @@ typedef struct { unsigned long long pmd;
 #define pmd_val(x)	((x).pmd)
 #define __pmd(x)	((pmd_t) { (x) } )
 
-static inline unsigned long pud_page_vaddr(pud_t pud)
+static inline pmd_t *pud_pgtable(pud_t pud)
 {
-	return pud_val(pud);
+	return (pmd_t *)pud_val(pud);
 }
 
 /* only used by the stubbed out hugetlb gup code, should never be called */
--- a/arch/sparc/include/asm/pgtable_32.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/sparc/include/asm/pgtable_32.h
@@ -151,13 +151,13 @@ static inline unsigned long pmd_page_vad
 	return (unsigned long)__nocache_va(v << 4);
 }
 
-static inline unsigned long pud_page_vaddr(pud_t pud)
+static inline pmd_t *pud_pgtable(pud_t pud)
 {
 	if (srmmu_device_memory(pud_val(pud))) {
-		return ~0;
+		return (pmd_t *)~0;
 	} else {
 		unsigned long v = pud_val(pud) & SRMMU_PTD_PMASK;
-		return (unsigned long)__nocache_va(v << 4);
+		return (pmd_t *)__nocache_va(v << 4);
 	}
 }
 
--- a/arch/sparc/include/asm/pgtable_64.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/sparc/include/asm/pgtable_64.h
@@ -841,18 +841,18 @@ static inline unsigned long pmd_page_vad
 	return ((unsigned long) __va(pfn << PAGE_SHIFT));
 }
 
-static inline unsigned long pud_page_vaddr(pud_t pud)
+static inline pmd_t *pud_pgtable(pud_t pud)
 {
 	pte_t pte = __pte(pud_val(pud));
 	unsigned long pfn;
 
 	pfn = pte_pfn(pte);
 
-	return ((unsigned long) __va(pfn << PAGE_SHIFT));
+	return ((pmd_t *) __va(pfn << PAGE_SHIFT));
 }
 
 #define pmd_page(pmd) 			virt_to_page((void *)pmd_page_vaddr(pmd))
-#define pud_page(pud) 			virt_to_page((void *)pud_page_vaddr(pud))
+#define pud_page(pud)			virt_to_page((void *)pud_pgtable(pud))
 #define pmd_clear(pmdp)			(pmd_val(*(pmdp)) = 0UL)
 #define pud_present(pud)		(pud_val(pud) != 0U)
 #define pud_clear(pudp)			(pud_val(*(pudp)) = 0UL)
--- a/arch/um/include/asm/pgtable-3level.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/um/include/asm/pgtable-3level.h
@@ -83,7 +83,7 @@ static inline void pud_clear (pud_t *pud
 }
 
 #define pud_page(pud) phys_to_page(pud_val(pud) & PAGE_MASK)
-#define pud_page_vaddr(pud) ((unsigned long) __va(pud_val(pud) & PAGE_MASK))
+#define pud_pgtable(pud) ((pmd_t *) __va(pud_val(pud) & PAGE_MASK))
 
 static inline unsigned long pte_pfn(pte_t pte)
 {
--- a/arch/x86/include/asm/pgtable.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/x86/include/asm/pgtable.h
@@ -836,9 +836,9 @@ static inline int pud_present(pud_t pud)
 	return pud_flags(pud) & _PAGE_PRESENT;
 }
 
-static inline unsigned long pud_page_vaddr(pud_t pud)
+static inline pmd_t *pud_pgtable(pud_t pud)
 {
-	return (unsigned long)__va(pud_val(pud) & pud_pfn_mask(pud));
+	return (pmd_t *)__va(pud_val(pud) & pud_pfn_mask(pud));
 }
 
 /*
--- a/arch/x86/mm/pat/set_memory.c~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/x86/mm/pat/set_memory.c
@@ -1134,7 +1134,7 @@ static void __unmap_pmd_range(pud_t *pud
 			      unsigned long start, unsigned long end)
 {
 	if (unmap_pte_range(pmd, start, end))
-		if (try_to_free_pmd_page((pmd_t *)pud_page_vaddr(*pud)))
+		if (try_to_free_pmd_page(pud_pgtable(*pud)))
 			pud_clear(pud);
 }
 
@@ -1178,7 +1178,7 @@ static void unmap_pmd_range(pud_t *pud,
 	 * Try again to free the PMD page if haven't succeeded above.
 	 */
 	if (!pud_none(*pud))
-		if (try_to_free_pmd_page((pmd_t *)pud_page_vaddr(*pud)))
+		if (try_to_free_pmd_page(pud_pgtable(*pud)))
 			pud_clear(pud);
 }
 
--- a/arch/x86/mm/pgtable.c~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/x86/mm/pgtable.c
@@ -801,7 +801,7 @@ int pud_free_pmd_page(pud_t *pud, unsign
 	pte_t *pte;
 	int i;
 
-	pmd = (pmd_t *)pud_page_vaddr(*pud);
+	pmd = pud_pgtable(*pud);
 	pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
 	if (!pmd_sv)
 		return 0;
--- a/include/asm-generic/pgtable-nopmd.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/include/asm-generic/pgtable-nopmd.h
@@ -51,7 +51,7 @@ static inline pmd_t * pmd_offset(pud_t *
 #define __pmd(x)				((pmd_t) { __pud(x) } )
 
 #define pud_page(pud)				(pmd_page((pmd_t){ pud }))
-#define pud_page_vaddr(pud)			(pmd_page_vaddr((pmd_t){ pud }))
+#define pud_pgtable(pud)			((pmd_t *)(pmd_page_vaddr((pmd_t){ pud })))
 
 /*
  * allocating and freeing a pmd is trivial: the 1-entry pmd is
--- a/include/asm-generic/pgtable-nopud.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/include/asm-generic/pgtable-nopud.h
@@ -49,7 +49,7 @@ static inline pud_t *pud_offset(p4d_t *p
 #define __pud(x)				((pud_t) { __p4d(x) })
 
 #define p4d_page(p4d)				(pud_page((pud_t){ p4d }))
-#define p4d_page_vaddr(p4d)			(pud_page_vaddr((pud_t){ p4d }))
+#define p4d_page_vaddr(p4d)			(pud_pgtable((pud_t){ p4d }))
 
 /*
  * allocating and freeing a pud is trivial: the 1-entry pud is
--- a/include/linux/pgtable.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/include/linux/pgtable.h
@@ -106,7 +106,7 @@ static inline pte_t *pte_offset_kernel(p
 #ifndef pmd_offset
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
 {
-	return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
+	return pud_pgtable(*pud) + pmd_index(address);
 }
 #define pmd_offset pmd_offset
 #endif
_


* [patch 45/54] mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *
  2021-07-08  0:59 incoming Andrew Morton
                   ` (43 preceding siblings ...)
  2021-07-08  1:09 ` [patch 44/54] mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t * Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:09 ` [patch 46/54] selftest/mremap_test: update the test to handle pagesize other than 4K Andrew Morton
                   ` (8 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, aneesh.kumar, christophe.leroy, hughd, joel, kaleshsingh,
	kirill.shutemov, linux-mm, mm-commits, mpe, npiggin, sfr,
	torvalds

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *

No functional change in this patch.

[aneesh.kumar@linux.ibm.com: m68k build error reported by kernel robot]
  Link: https://lkml.kernel.org/r/87tulxnb2v.fsf@linux.ibm.com
Link: https://lkml.kernel.org/r/20210615110859.320299-2-aneesh.kumar@linux.ibm.com
Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm64/include/asm/pgtable.h                |    4 ++--
 arch/ia64/include/asm/pgtable.h                 |    2 +-
 arch/mips/include/asm/pgtable-64.h              |    4 ++--
 arch/powerpc/include/asm/book3s/64/pgtable.h    |    5 ++++-
 arch/powerpc/include/asm/nohash/64/pgtable-4k.h |    6 +++++-
 arch/powerpc/mm/book3s64/radix_pgtable.c        |    2 +-
 arch/powerpc/mm/pgtable_64.c                    |    2 +-
 arch/sparc/include/asm/pgtable_64.h             |    4 ++--
 arch/x86/include/asm/pgtable.h                  |    4 ++--
 arch/x86/mm/init_64.c                           |    4 ++--
 include/asm-generic/pgtable-nop4d.h             |    2 +-
 include/asm-generic/pgtable-nopud.h             |    2 +-
 include/linux/pgtable.h                         |    2 +-
 13 files changed, 25 insertions(+), 18 deletions(-)

--- a/arch/arm64/include/asm/pgtable.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/arm64/include/asm/pgtable.h
@@ -710,9 +710,9 @@ static inline phys_addr_t p4d_page_paddr
 	return __p4d_to_phys(p4d);
 }
 
-static inline unsigned long p4d_page_vaddr(p4d_t p4d)
+static inline pud_t *p4d_pgtable(p4d_t p4d)
 {
-	return (unsigned long)__va(p4d_page_paddr(p4d));
+	return (pud_t *)__va(p4d_page_paddr(p4d));
 }
 
 /* Find an entry in the frst-level page table. */
--- a/arch/ia64/include/asm/pgtable.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/ia64/include/asm/pgtable.h
@@ -281,7 +281,7 @@ ia64_phys_addr_valid (unsigned long addr
 #define p4d_bad(p4d)			(!ia64_phys_addr_valid(p4d_val(p4d)))
 #define p4d_present(p4d)		(p4d_val(p4d) != 0UL)
 #define p4d_clear(p4dp)			(p4d_val(*(p4dp)) = 0UL)
-#define p4d_page_vaddr(p4d)		((unsigned long) __va(p4d_val(p4d) & _PFN_MASK))
+#define p4d_pgtable(p4d)		((pud_t *) __va(p4d_val(p4d) & _PFN_MASK))
 #define p4d_page(p4d)			virt_to_page((p4d_val(p4d) + PAGE_OFFSET))
 #endif
 
--- a/arch/mips/include/asm/pgtable-64.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/mips/include/asm/pgtable-64.h
@@ -209,9 +209,9 @@ static inline void p4d_clear(p4d_t *p4dp
 	p4d_val(*p4dp) = (unsigned long)invalid_pud_table;
 }
 
-static inline unsigned long p4d_page_vaddr(p4d_t p4d)
+static inline pud_t *p4d_pgtable(p4d_t p4d)
 {
-	return p4d_val(p4d);
+	return (pud_t *)p4d_val(p4d);
 }
 
 #define p4d_phys(p4d)		virt_to_phys((void *)p4d_val(p4d))
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1051,7 +1051,10 @@ extern struct page *p4d_page(p4d_t p4d);
 /* Pointers in the page table tree are physical addresses */
 #define __pgtable_ptr_val(ptr)	__pa(ptr)
 
-#define p4d_page_vaddr(p4d)	__va(p4d_val(p4d) & ~P4D_MASKED_BITS)
+static inline pud_t *p4d_pgtable(p4d_t p4d)
+{
+	return (pud_t *)__va(p4d_val(p4d) & ~P4D_MASKED_BITS);
+}
 
 static inline pmd_t *pud_pgtable(pud_t pud)
 {
--- a/arch/powerpc/include/asm/nohash/64/pgtable-4k.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/powerpc/include/asm/nohash/64/pgtable-4k.h
@@ -56,10 +56,14 @@
 #define p4d_none(p4d)		(!p4d_val(p4d))
 #define p4d_bad(p4d)		(p4d_val(p4d) == 0)
 #define p4d_present(p4d)	(p4d_val(p4d) != 0)
-#define p4d_page_vaddr(p4d)	(p4d_val(p4d) & ~P4D_MASKED_BITS)
 
 #ifndef __ASSEMBLY__
 
+static inline pud_t *p4d_pgtable(p4d_t p4d)
+{
+	return (pud_t *) (p4d_val(p4d) & ~P4D_MASKED_BITS);
+}
+
 static inline void p4d_clear(p4d_t *p4dp)
 {
 	*p4dp = __p4d(0);
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -854,7 +854,7 @@ static void __meminit remove_pagetable(u
 			continue;
 		}
 
-		pud_base = (pud_t *)p4d_page_vaddr(*p4d);
+		pud_base = p4d_pgtable(*p4d);
 		remove_pud_table(pud_base, addr, next);
 		free_pud_table(pud_base, p4d);
 	}
--- a/arch/powerpc/mm/pgtable_64.c~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/powerpc/mm/pgtable_64.c
@@ -105,7 +105,7 @@ struct page *p4d_page(p4d_t p4d)
 		VM_WARN_ON(!p4d_huge(p4d));
 		return pte_page(p4d_pte(p4d));
 	}
-	return virt_to_page(p4d_page_vaddr(p4d));
+	return virt_to_page(p4d_pgtable(p4d));
 }
 #endif
 
--- a/arch/sparc/include/asm/pgtable_64.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/sparc/include/asm/pgtable_64.h
@@ -856,8 +856,8 @@ static inline pmd_t *pud_pgtable(pud_t p
 #define pmd_clear(pmdp)			(pmd_val(*(pmdp)) = 0UL)
 #define pud_present(pud)		(pud_val(pud) != 0U)
 #define pud_clear(pudp)			(pud_val(*(pudp)) = 0UL)
-#define p4d_page_vaddr(p4d)		\
-	((unsigned long) __va(p4d_val(p4d)))
+#define p4d_pgtable(p4d)		\
+	((pud_t *) __va(p4d_val(p4d)))
 #define p4d_present(p4d)		(p4d_val(p4d) != 0U)
 #define p4d_clear(p4dp)			(p4d_val(*(p4dp)) = 0UL)
 
--- a/arch/x86/include/asm/pgtable.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/x86/include/asm/pgtable.h
@@ -877,9 +877,9 @@ static inline int p4d_present(p4d_t p4d)
 	return p4d_flags(p4d) & _PAGE_PRESENT;
 }
 
-static inline unsigned long p4d_page_vaddr(p4d_t p4d)
+static inline pud_t *p4d_pgtable(p4d_t p4d)
 {
-	return (unsigned long)__va(p4d_val(p4d) & p4d_pfn_mask(p4d));
+	return (pud_t *)__va(p4d_val(p4d) & p4d_pfn_mask(p4d));
 }
 
 /*
--- a/arch/x86/mm/init_64.c~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/x86/mm/init_64.c
@@ -194,8 +194,8 @@ static void sync_global_pgds_l4(unsigned
 			spin_lock(pgt_lock);
 
 			if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
-				BUG_ON(p4d_page_vaddr(*p4d)
-				       != p4d_page_vaddr(*p4d_ref));
+				BUG_ON(p4d_pgtable(*p4d)
+				       != p4d_pgtable(*p4d_ref));
 
 			if (p4d_none(*p4d))
 				set_p4d(p4d, *p4d_ref);
--- a/include/asm-generic/pgtable-nop4d.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/include/asm-generic/pgtable-nop4d.h
@@ -41,7 +41,7 @@ static inline p4d_t *p4d_offset(pgd_t *p
 #define __p4d(x)				((p4d_t) { __pgd(x) })
 
 #define pgd_page(pgd)				(p4d_page((p4d_t){ pgd }))
-#define pgd_page_vaddr(pgd)			(p4d_page_vaddr((p4d_t){ pgd }))
+#define pgd_page_vaddr(pgd)			((unsigned long)(p4d_pgtable((p4d_t){ pgd })))
 
 /*
  * allocating and freeing a p4d is trivial: the 1-entry p4d is
--- a/include/asm-generic/pgtable-nopud.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/include/asm-generic/pgtable-nopud.h
@@ -49,7 +49,7 @@ static inline pud_t *pud_offset(p4d_t *p
 #define __pud(x)				((pud_t) { __p4d(x) })
 
 #define p4d_page(p4d)				(pud_page((pud_t){ p4d }))
-#define p4d_page_vaddr(p4d)			(pud_pgtable((pud_t){ p4d }))
+#define p4d_pgtable(p4d)			((pud_t *)(pud_pgtable((pud_t){ p4d })))
 
 /*
  * allocating and freeing a pud is trivial: the 1-entry pud is
--- a/include/linux/pgtable.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/include/linux/pgtable.h
@@ -114,7 +114,7 @@ static inline pmd_t *pmd_offset(pud_t *p
 #ifndef pud_offset
 static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
 {
-	return (pud_t *)p4d_page_vaddr(*p4d) + pud_index(address);
+	return p4d_pgtable(*p4d) + pud_index(address);
 }
 #define pud_offset pud_offset
 #endif
_


* [patch 46/54] selftest/mremap_test: update the test to handle pagesize other than 4K
  2021-07-08  0:59 incoming Andrew Morton
                   ` (44 preceding siblings ...)
  2021-07-08  1:09 ` [patch 45/54] mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t * Andrew Morton
@ 2021-07-08  1:09 ` Andrew Morton
  2021-07-08  1:10 ` [patch 47/54] selftest/mremap_test: avoid crash with static build Andrew Morton
                   ` (7 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:09 UTC (permalink / raw)
  To: akpm, aneesh.kumar, christophe.leroy, hughd, joel, kaleshsingh,
	kirill.shutemov, kirill, linux-mm, mm-commits, mpe, npiggin, sfr,
	torvalds

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: selftest/mremap_test: update the test to handle pagesize other than 4K

Patch series "mremap fixes", v2.


This patch (of 6):

Instead of hardcoding a 4K page size, fetch it at run time using sysconf().
The performance measurement tests still assume that 2M and 1G are the
hugepage sizes.
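
As a rough sketch of the idea (not the code in this patch), the page size can be queried once with sysconf(_SC_PAGESIZE) and test sizes and alignments derived from it. The helper names `runtime_page_size` and `is_aligned` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the page size reported by the system,
 * or 0 if sysconf() fails.
 */
static inline size_t runtime_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	return sz > 0 ? (size_t)sz : 0;
}

/*
 * Hypothetical helper: non-zero if addr is a multiple of align.
 * align must be non-zero.
 */
static inline int is_aligned(size_t addr, size_t align)
{
	return addr % align == 0;
}
```

With helpers like these, a region of two pages is `2 * runtime_page_size()` bytes on any configuration, and a deliberately misaligned address can be built as a quarter of the page size, which mirrors how the test derives its sizes after this change.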

Link: https://lkml.kernel.org/r/20210616045239.370802-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20210616045239.370802-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 tools/testing/selftests/vm/mremap_test.c |  111 +++++++++++----------
 1 file changed, 60 insertions(+), 51 deletions(-)

--- a/tools/testing/selftests/vm/mremap_test.c~selftest-mremap_test-update-the-test-to-handle-pagesize-other-than-4k
+++ a/tools/testing/selftests/vm/mremap_test.c
@@ -45,14 +45,15 @@ enum {
 	_4MB = 4ULL << 20,
 	_1GB = 1ULL << 30,
 	_2GB = 2ULL << 30,
-	PTE = _4KB,
 	PMD = _2MB,
 	PUD = _1GB,
 };
 
+#define PTE page_size
+
 #define MAKE_TEST(source_align, destination_align, size,	\
 		  overlaps, should_fail, test_name)		\
-{								\
+(struct test){							\
 	.name = test_name,					\
 	.config = {						\
 		.src_alignment = source_align,			\
@@ -252,12 +253,17 @@ static int parse_args(int argc, char **a
 	return 0;
 }
 
+#define MAX_TEST 13
+#define MAX_PERF_TEST 3
 int main(int argc, char **argv)
 {
 	int failures = 0;
 	int i, run_perf_tests;
 	unsigned int threshold_mb = VALIDATION_DEFAULT_THRESHOLD;
 	unsigned int pattern_seed;
+	struct test test_cases[MAX_TEST];
+	struct test perf_test_cases[MAX_PERF_TEST];
+	int page_size;
 	time_t t;
 
 	pattern_seed = (unsigned int) time(&t);
@@ -268,56 +274,59 @@ int main(int argc, char **argv)
 	ksft_print_msg("Test configs:\n\tthreshold_mb=%u\n\tpattern_seed=%u\n\n",
 		       threshold_mb, pattern_seed);
 
-	struct test test_cases[] = {
-		/* Expected mremap failures */
-		MAKE_TEST(_4KB, _4KB, _4KB, OVERLAPPING, EXPECT_FAILURE,
-		  "mremap - Source and Destination Regions Overlapping"),
-		MAKE_TEST(_4KB, _1KB, _4KB, NON_OVERLAPPING, EXPECT_FAILURE,
-		  "mremap - Destination Address Misaligned (1KB-aligned)"),
-		MAKE_TEST(_1KB, _4KB, _4KB, NON_OVERLAPPING, EXPECT_FAILURE,
-		  "mremap - Source Address Misaligned (1KB-aligned)"),
-
-		/* Src addr PTE aligned */
-		MAKE_TEST(PTE, PTE, _8KB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "8KB mremap - Source PTE-aligned, Destination PTE-aligned"),
-
-		/* Src addr 1MB aligned */
-		MAKE_TEST(_1MB, PTE, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2MB mremap - Source 1MB-aligned, Destination PTE-aligned"),
-		MAKE_TEST(_1MB, _1MB, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2MB mremap - Source 1MB-aligned, Destination 1MB-aligned"),
-
-		/* Src addr PMD aligned */
-		MAKE_TEST(PMD, PTE, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "4MB mremap - Source PMD-aligned, Destination PTE-aligned"),
-		MAKE_TEST(PMD, _1MB, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "4MB mremap - Source PMD-aligned, Destination 1MB-aligned"),
-		MAKE_TEST(PMD, PMD, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "4MB mremap - Source PMD-aligned, Destination PMD-aligned"),
-
-		/* Src addr PUD aligned */
-		MAKE_TEST(PUD, PTE, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2GB mremap - Source PUD-aligned, Destination PTE-aligned"),
-		MAKE_TEST(PUD, _1MB, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2GB mremap - Source PUD-aligned, Destination 1MB-aligned"),
-		MAKE_TEST(PUD, PMD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2GB mremap - Source PUD-aligned, Destination PMD-aligned"),
-		MAKE_TEST(PUD, PUD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "2GB mremap - Source PUD-aligned, Destination PUD-aligned"),
-	};
+	page_size = sysconf(_SC_PAGESIZE);
 
-	struct test perf_test_cases[] = {
-		/*
-		 * mremap 1GB region - Page table level aligned time
-		 * comparison.
-		 */
-		MAKE_TEST(PTE, PTE, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "1GB mremap - Source PTE-aligned, Destination PTE-aligned"),
-		MAKE_TEST(PMD, PMD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "1GB mremap - Source PMD-aligned, Destination PMD-aligned"),
-		MAKE_TEST(PUD, PUD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
-		  "1GB mremap - Source PUD-aligned, Destination PUD-aligned"),
-	};
+	/* Expected mremap failures */
+	test_cases[0] =	MAKE_TEST(page_size, page_size, page_size,
+				  OVERLAPPING, EXPECT_FAILURE,
+				  "mremap - Source and Destination Regions Overlapping");
+
+	test_cases[1] = MAKE_TEST(page_size, page_size/4, page_size,
+				  NON_OVERLAPPING, EXPECT_FAILURE,
+				  "mremap - Destination Address Misaligned (1KB-aligned)");
+	test_cases[2] = MAKE_TEST(page_size/4, page_size, page_size,
+				  NON_OVERLAPPING, EXPECT_FAILURE,
+				  "mremap - Source Address Misaligned (1KB-aligned)");
+
+	/* Src addr PTE aligned */
+	test_cases[3] = MAKE_TEST(PTE, PTE, PTE * 2,
+				  NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "8KB mremap - Source PTE-aligned, Destination PTE-aligned");
+
+	/* Src addr 1MB aligned */
+	test_cases[4] = MAKE_TEST(_1MB, PTE, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "2MB mremap - Source 1MB-aligned, Destination PTE-aligned");
+	test_cases[5] = MAKE_TEST(_1MB, _1MB, _2MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "2MB mremap - Source 1MB-aligned, Destination 1MB-aligned");
+
+	/* Src addr PMD aligned */
+	test_cases[6] = MAKE_TEST(PMD, PTE, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "4MB mremap - Source PMD-aligned, Destination PTE-aligned");
+	test_cases[7] =	MAKE_TEST(PMD, _1MB, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "4MB mremap - Source PMD-aligned, Destination 1MB-aligned");
+	test_cases[8] = MAKE_TEST(PMD, PMD, _4MB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "4MB mremap - Source PMD-aligned, Destination PMD-aligned");
+
+	/* Src addr PUD aligned */
+	test_cases[9] = MAKE_TEST(PUD, PTE, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				  "2GB mremap - Source PUD-aligned, Destination PTE-aligned");
+	test_cases[10] = MAKE_TEST(PUD, _1MB, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				   "2GB mremap - Source PUD-aligned, Destination 1MB-aligned");
+	test_cases[11] = MAKE_TEST(PUD, PMD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				   "2GB mremap - Source PUD-aligned, Destination PMD-aligned");
+	test_cases[12] = MAKE_TEST(PUD, PUD, _2GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				   "2GB mremap - Source PUD-aligned, Destination PUD-aligned");
+
+	perf_test_cases[0] =  MAKE_TEST(page_size, page_size, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+					"1GB mremap - Source PTE-aligned, Destination PTE-aligned");
+	/*
+	 * mremap 1GB region - Page table level aligned time
+	 * comparison.
+	 */
+	perf_test_cases[1] = MAKE_TEST(PMD, PMD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				       "1GB mremap - Source PMD-aligned, Destination PMD-aligned");
+	perf_test_cases[2] = MAKE_TEST(PUD, PUD, _1GB, NON_OVERLAPPING, EXPECT_SUCCESS,
+				       "1GB mremap - Source PUD-aligned, Destination PUD-aligned");
 
 	run_perf_tests =  (threshold_mb == VALIDATION_NO_THRESHOLD) ||
 				(threshold_mb * _1MB >= _1GB);
_

* [patch 47/54] selftest/mremap_test: avoid crash with static build
  2021-07-08  0:59 incoming Andrew Morton
                   ` (45 preceding siblings ...)
  2021-07-08  1:09 ` [patch 46/54] selftest/mremap_test: update the test to handle pagesize other than 4K Andrew Morton
@ 2021-07-08  1:10 ` Andrew Morton
  2021-07-08  1:10 ` [patch 48/54] mm/mremap: convert huge PUD move to separate helper Andrew Morton
                   ` (6 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:10 UTC (permalink / raw)
  To: akpm, aneesh.kumar, christophe.leroy, hughd, joel, kaleshsingh,
	kirill.shutemov, linux-mm, mm-commits, mpe, npiggin, sfr,
	torvalds

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: selftest/mremap_test: avoid crash with static build

With a large mmap map size, we can overlap with the text area and using
MAP_FIXED results in unmapping that area.  Switch to MAP_FIXED_NOREPLACE
and handle the EEXIST error.
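
For readers unfamiliar with the flag, here is a minimal userspace sketch of
the behaviour the test now relies on.  It is not part of the patch.  The
helper names map_exactly_at() and noreplace_demo() are made up for
illustration, and the fallback value for MAP_FIXED_NOREPLACE is an
assumption that matches the generic uapi headers only (some architectures
use a different value).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
/* Assumption: generic uapi value, for libcs that predate the flag. */
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/*
 * Hypothetical helper: map len bytes exactly at hint without replacing
 * an existing mapping.  Returns the mapping, or NULL with errno set.
 */
static void *map_exactly_at(void *hint, size_t len)
{
	void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
		       MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	if (p != hint) {
		/* Kernels without the flag treat hint as a plain hint. */
		munmap(p, len);
		errno = EEXIST;
		return NULL;
	}
	return p;
}

/* Returns 0 if an occupied range is refused with EEXIST and left intact. */
static int noreplace_demo(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *first;
	int ret = -1;

	assert(len > 0);
	first = mmap(NULL, len, PROT_READ | PROT_WRITE,
		     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (first == MAP_FAILED)
		return -1;
	first[0] = 42;

	/* The range is occupied, so the request must fail, not clobber it. */
	if (map_exactly_at(first, len) == NULL && errno == EEXIST &&
	    first[0] == 42)
		ret = 0;

	munmap(first, len);
	return ret;
}
```

With plain MAP_FIXED the second mmap() would silently replace the first
mapping, which is how a statically linked test could unmap its own text.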

Link: https://lkml.kernel.org/r/20210616045239.370802-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 tools/testing/selftests/vm/mremap_test.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/tools/testing/selftests/vm/mremap_test.c~selftest-mremap_test-avoid-crash-with-static-build
+++ a/tools/testing/selftests/vm/mremap_test.c
@@ -75,9 +75,10 @@ static void *get_source_mapping(struct c
 retry:
 	addr += c.src_alignment;
 	src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE,
-			MAP_FIXED | MAP_ANONYMOUS | MAP_SHARED, -1, 0);
+			MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_SHARED,
+			-1, 0);
 	if (src_addr == MAP_FAILED) {
-		if (errno == EPERM)
+		if (errno == EPERM || errno == EEXIST)
 			goto retry;
 		goto error;
 	}
_

* [patch 48/54] mm/mremap: convert huge PUD move to separate helper
  2021-07-08  0:59 incoming Andrew Morton
                   ` (46 preceding siblings ...)
  2021-07-08  1:10 ` [patch 47/54] selftest/mremap_test: avoid crash with static build Andrew Morton
@ 2021-07-08  1:10 ` Andrew Morton
  2021-07-08  1:10 ` [patch 49/54] mm/mremap: don't enable optimized PUD move if page table levels is 2 Andrew Morton
                   ` (5 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:10 UTC (permalink / raw)
  To: akpm, aneesh.kumar, christophe.leroy, hughd, joel, kaleshsingh,
	kirill.shutemov, linux-mm, mm-commits, mpe, npiggin, sfr,
	torvalds

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: mm/mremap: convert huge PUD move to separate helper

With TRANSPARENT_HUGEPAGE_PUD enabled the kernel can find huge PUD
entries.  Add a helper to move huge PUD entries on mremap().

This will be used by a later patch to optimize mremap of PUD_SIZE-aligned,
level 4 PTE mapped addresses.

This also makes sure we support mremap on huge PUD entries even with
CONFIG_HAVE_MOVE_PUD disabled.
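
The PUD-level paths in move_page_tables() are only taken when get_extent()
reports a full PUD-sized step.  The userspace model below sketches that
arithmetic so the alignment requirement is easier to see.  It is modelled
on the kernel helper rather than copied from this patch, and the shift
values are an assumption (x86-64 with 4K pages, 64-bit unsigned long).

```c
#include <assert.h>

/* Assumed geometry: 2MB PMD and 1GB PUD; other configurations differ. */
#define PMD_SIZE (1UL << 21)
#define PUD_SIZE (1UL << 30)
#define PMD_MASK (~(PMD_SIZE - 1))
#define PUD_MASK (~(PUD_SIZE - 1))

enum pgt_entry { NORMAL_PMD, HPAGE_PMD, NORMAL_PUD, HPAGE_PUD };

/*
 * How far can one step go without crossing a PMD/PUD boundary in either
 * the source or the destination, and without running past old_end?
 */
static unsigned long get_extent(enum pgt_entry entry, unsigned long old_addr,
				unsigned long old_end, unsigned long new_addr)
{
	unsigned long next, extent, mask, size;

	switch (entry) {
	case HPAGE_PMD:
	case NORMAL_PMD:
		mask = PMD_MASK;
		size = PMD_SIZE;
		break;
	case HPAGE_PUD:
	case NORMAL_PUD:
	default:
		mask = PUD_MASK;
		size = PUD_SIZE;
		break;
	}

	/* Distance to the next boundary on the source side. */
	next = (old_addr + size) & mask;
	extent = next - old_addr;
	if (extent > old_end - old_addr)
		extent = old_end - old_addr;

	/* Clamp again by the distance to the boundary on the destination. */
	next = (new_addr + size) & mask;
	if (extent > next - new_addr)
		extent = next - new_addr;
	return extent;
}

/* Hypothetical wrapper: is a whole-PUD move possible for this step? */
static int can_move_whole_pud(unsigned long old_addr, unsigned long old_end,
			      unsigned long new_addr)
{
	return get_extent(HPAGE_PUD, old_addr, old_end, new_addr) == PUD_SIZE;
}
```

In this model a whole-PUD step needs both addresses PUD-aligned and at
least PUD_SIZE bytes left to move; anything else yields a shorter extent.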

[aneesh.kumar@linux.ibm.com: fix build failure with clang-10]
  Link: https://lore.kernel.org/lkml/YMuOSnJsL9qkxweY@archlinux-ax161
  Link: https://lkml.kernel.org/r/20210619134310.89098-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20210616045239.370802-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mremap.c |   80 +++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 73 insertions(+), 7 deletions(-)

--- a/mm/mremap.c~mm-mremap-convert-huge-pud-move-to-separate-helper
+++ a/mm/mremap.c
@@ -324,10 +324,61 @@ static inline bool move_normal_pud(struc
 }
 #endif
 
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static bool move_huge_pud(struct vm_area_struct *vma, unsigned long old_addr,
+			  unsigned long new_addr, pud_t *old_pud, pud_t *new_pud)
+{
+	spinlock_t *old_ptl, *new_ptl;
+	struct mm_struct *mm = vma->vm_mm;
+	pud_t pud;
+
+	/*
+	 * The destination pud shouldn't be established, free_pgtables()
+	 * should have released it.
+	 */
+	if (WARN_ON_ONCE(!pud_none(*new_pud)))
+		return false;
+
+	/*
+	 * We don't have to worry about the ordering of src and dst
+	 * ptlocks because exclusive mmap_lock prevents deadlock.
+	 */
+	old_ptl = pud_lock(vma->vm_mm, old_pud);
+	new_ptl = pud_lockptr(mm, new_pud);
+	if (new_ptl != old_ptl)
+		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
+
+	/* Clear the pud */
+	pud = *old_pud;
+	pud_clear(old_pud);
+
+	VM_BUG_ON(!pud_none(*new_pud));
+
+	/* Set the new pud */
+	/* mark soft_ditry when we add pud level soft dirty support */
+	set_pud_at(mm, new_addr, new_pud, pud);
+	flush_pud_tlb_range(vma, old_addr, old_addr + HPAGE_PUD_SIZE);
+	if (new_ptl != old_ptl)
+		spin_unlock(new_ptl);
+	spin_unlock(old_ptl);
+
+	return true;
+}
+#else
+static bool move_huge_pud(struct vm_area_struct *vma, unsigned long old_addr,
+			  unsigned long new_addr, pud_t *old_pud, pud_t *new_pud)
+{
+	WARN_ON_ONCE(1);
+	return false;
+
+}
+#endif
+
 enum pgt_entry {
 	NORMAL_PMD,
 	HPAGE_PMD,
 	NORMAL_PUD,
+	HPAGE_PUD,
 };
 
 /*
@@ -347,6 +398,7 @@ static __always_inline unsigned long get
 		mask = PMD_MASK;
 		size = PMD_SIZE;
 		break;
+	case HPAGE_PUD:
 	case NORMAL_PUD:
 		mask = PUD_MASK;
 		size = PUD_SIZE;
@@ -395,6 +447,12 @@ static bool move_pgt_entry(enum pgt_entr
 			move_huge_pmd(vma, old_addr, new_addr, old_entry,
 				      new_entry);
 		break;
+	case HPAGE_PUD:
+		moved = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+			move_huge_pud(vma, old_addr, new_addr, old_entry,
+				      new_entry);
+		break;
+
 	default:
 		WARN_ON_ONCE(1);
 		break;
@@ -414,6 +472,7 @@ unsigned long move_page_tables(struct vm
 	unsigned long extent, old_end;
 	struct mmu_notifier_range range;
 	pmd_t *old_pmd, *new_pmd;
+	pud_t *old_pud, *new_pud;
 
 	old_end = old_addr + len;
 	flush_cache_range(vma, old_addr, old_end);
@@ -429,15 +488,22 @@ unsigned long move_page_tables(struct vm
 		 * PUD level if possible.
 		 */
 		extent = get_extent(NORMAL_PUD, old_addr, old_end, new_addr);
-		if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) {
-			pud_t *old_pud, *new_pud;
 
-			old_pud = get_old_pud(vma->vm_mm, old_addr);
-			if (!old_pud)
+		old_pud = get_old_pud(vma->vm_mm, old_addr);
+		if (!old_pud)
+			continue;
+		new_pud = alloc_new_pud(vma->vm_mm, vma, new_addr);
+		if (!new_pud)
+			break;
+		if (pud_trans_huge(*old_pud) || pud_devmap(*old_pud)) {
+			if (extent == HPAGE_PUD_SIZE) {
+				move_pgt_entry(HPAGE_PUD, vma, old_addr, new_addr,
+					       old_pud, new_pud, need_rmap_locks);
+				/* We ignore and continue on error? */
 				continue;
-			new_pud = alloc_new_pud(vma->vm_mm, vma, new_addr);
-			if (!new_pud)
-				break;
+			}
+		} else if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) {
+
 			if (move_pgt_entry(NORMAL_PUD, vma, old_addr, new_addr,
 					   old_pud, new_pud, need_rmap_locks))
 				continue;
_

* [patch 49/54] mm/mremap: don't enable optimized PUD move if page table levels is 2
  2021-07-08  0:59 incoming Andrew Morton
                   ` (47 preceding siblings ...)
  2021-07-08  1:10 ` [patch 48/54] mm/mremap: convert huge PUD move to separate helper Andrew Morton
@ 2021-07-08  1:10 ` Andrew Morton
  2021-07-08  1:10 ` [patch 50/54] mm/mremap: use pmd/pud_poplulate to update page table entries Andrew Morton
                   ` (4 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:10 UTC (permalink / raw)
  To: akpm, aneesh.kumar, christophe.leroy, hughd, joel, kaleshsingh,
	kirill.shutemov, linux-mm, mm-commits, mpe, npiggin, sfr,
	torvalds

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: mm/mremap: don't enable optimized PUD move if page table levels is 2

With a two-level page table, don't enable move_normal_pud.

Link: https://lkml.kernel.org/r/20210616045239.370802-5-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mremap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/mremap.c~mm-mremap-dont-enable-optimized-pud-move-if-page-table-levels-is-2
+++ a/mm/mremap.c
@@ -276,7 +276,7 @@ static inline bool move_normal_pmd(struc
 }
 #endif
 
-#ifdef CONFIG_HAVE_MOVE_PUD
+#if CONFIG_PGTABLE_LEVELS > 2 && defined(CONFIG_HAVE_MOVE_PUD)
 static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
 		  unsigned long new_addr, pud_t *old_pud, pud_t *new_pud)
 {
_

* [patch 50/54] mm/mremap: use pmd/pud_poplulate to update page table entries
  2021-07-08  0:59 incoming Andrew Morton
                   ` (48 preceding siblings ...)
  2021-07-08  1:10 ` [patch 49/54] mm/mremap: don't enable optimized PUD move if page table levels is 2 Andrew Morton
@ 2021-07-08  1:10 ` Andrew Morton
  2021-07-08  1:10 ` [patch 51/54] mm/mremap: hold the rmap lock in write mode when moving " Andrew Morton
                   ` (3 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:10 UTC (permalink / raw)
  To: akpm, aneesh.kumar, christophe.leroy, hughd, joel, kaleshsingh,
	kirill.shutemov, linux-mm, mm-commits, mpe, npiggin, sfr,
	torvalds

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: mm/mremap: use pmd/pud_poplulate to update page table entries

pmd/pud_populate is the right interface to be used to set the respective
page table entries.  Some architectures like ppc64 do assume that
set_pmd/pud_at can only be used to set a hugepage PTE.  Since we are not
setting up a hugepage PTE here, use the pmd/pud_populate interface.

Link: https://lkml.kernel.org/r/20210616045239.370802-6-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mremap.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/mm/mremap.c~mm-mremap-use-pmd-pud_poplulate-to-update-page-table-entries
+++ a/mm/mremap.c
@@ -26,6 +26,7 @@
 
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
+#include <asm/pgalloc.h>
 
 #include "internal.h"
 
@@ -258,8 +259,7 @@ static bool move_normal_pmd(struct vm_ar
 
 	VM_BUG_ON(!pmd_none(*new_pmd));
 
-	/* Set the new pmd */
-	set_pmd_at(mm, new_addr, new_pmd, pmd);
+	pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
 	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
@@ -306,8 +306,7 @@ static bool move_normal_pud(struct vm_ar
 
 	VM_BUG_ON(!pud_none(*new_pud));
 
-	/* Set the new pud */
-	set_pud_at(mm, new_addr, new_pud, pud);
+	pud_populate(mm, new_pud, pud_pgtable(pud));
 	flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
_

* [patch 51/54] mm/mremap: hold the rmap lock in write mode when moving page table entries.
  2021-07-08  0:59 incoming Andrew Morton
                   ` (49 preceding siblings ...)
  2021-07-08  1:10 ` [patch 50/54] mm/mremap: use pmd/pud_poplulate to update page table entries Andrew Morton
@ 2021-07-08  1:10 ` Andrew Morton
  2021-07-08  1:10 ` [patch 52/54] mm/mremap: allow arch runtime override Andrew Morton
                   ` (2 subsequent siblings)
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:10 UTC (permalink / raw)
  To: akpm, aneesh.kumar, christophe.leroy, hughd, joel, kaleshsingh,
	kirill.shutemov, kirill, linux-mm, mm-commits, mpe, npiggin, sfr,
	stable, torvalds

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: mm/mremap: hold the rmap lock in write mode when moving page table entries.

To avoid a race between rmap walk and mremap, mremap does
take_rmap_locks().  The lock is taken to ensure that the rmap walk doesn't
miss a page table entry due to PTE moves via move_page_tables().  The kernel
further optimizes this lock such that if we are going to find
the newly added vma after the old vma, the rmap lock is not taken.  This
is because the rmap walk would find the vmas in the same order and if we
don't find the page table attached to the older vma we would find it with
the new vma, which we would iterate later.

As explained in commit eb66ae030829 ("mremap: properly flush TLB before
releasing the page") mremap is special in that it doesn't take ownership
of the page.  The optimized version for PUD/PMD aligned mremap also
doesn't hold the ptl lock.  This can result in stale TLB entries as shown
below.

This patch updates the rmap locking requirement in mremap to handle the
race condition explained below with optimized mremap:

Optimized PMD move

    CPU 1                           CPU 2                                   CPU 3

    mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one

    mmap_write_lock_killable()

                                    addr = old_addr
                                    lock(pte_ptl)
    lock(pmd_ptl)
    pmd = *old_pmd
    pmd_clear(old_pmd)
    flush_tlb_range(old_addr)

    *new_pmd = pmd
                                                                            *new_addr = 10; and fills
                                                                            TLB with new addr
                                                                            and old pfn

    unlock(pmd_ptl)
                                    ptep_clear_flush()
                                    old pfn is free.
                                                                            Stale TLB entry

Optimized PUD move also suffers from a similar race.  Both of the above
race conditions can be fixed if we force the mremap path to take the rmap
lock.

Link: https://lkml.kernel.org/r/20210616045239.370802-7-aneesh.kumar@linux.ibm.com
Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
Fixes: c49dd3401802 ("mm: speedup mremap on 1GB or larger regions")
Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mremap.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/mremap.c~mm-mremap-hold-the-rmap-lock-in-write-mode-when-moving-page-table-entries
+++ a/mm/mremap.c
@@ -504,7 +504,7 @@ unsigned long move_page_tables(struct vm
 		} else if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) {
 
 			if (move_pgt_entry(NORMAL_PUD, vma, old_addr, new_addr,
-					   old_pud, new_pud, need_rmap_locks))
+					   old_pud, new_pud, true))
 				continue;
 		}
 
@@ -531,7 +531,7 @@ unsigned long move_page_tables(struct vm
 			 * moving at the PMD level if possible.
 			 */
 			if (move_pgt_entry(NORMAL_PMD, vma, old_addr, new_addr,
-					   old_pmd, new_pmd, need_rmap_locks))
+					   old_pmd, new_pmd, true))
 				continue;
 		}
 
_

* [patch 52/54] mm/mremap: allow arch runtime override
  2021-07-08  0:59 incoming Andrew Morton
                   ` (50 preceding siblings ...)
  2021-07-08  1:10 ` [patch 51/54] mm/mremap: hold the rmap lock in write mode when moving " Andrew Morton
@ 2021-07-08  1:10 ` Andrew Morton
  2021-07-08  1:10 ` [patch 53/54] powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache Andrew Morton
  2021-07-08  1:10 ` [patch 54/54] powerpc/mm: enable HAVE_MOVE_PMD support Andrew Morton
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:10 UTC (permalink / raw)
  To: akpm, aneesh.kumar, christophe.leroy, hughd, joel, kaleshsingh,
	kirill.shutemov, kirill, linux-mm, mm-commits, mpe, npiggin, sfr,
	torvalds

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: mm/mremap: allow arch runtime override

Patch series "Speedup mremap on ppc64", v8.

This patchset enables MOVE_PMD/MOVE_PUD support on power.  This requires
the platform to support updating higher-level page tables without updating
page table entries.  It also requires invalidating the Page Walk Cache on
architectures that have one.


This patch (of 3):

Architectures like ppc64 support faster mremap only with radix
translation.  Hence allow a runtime check w.r.t support for fast mremap.
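
The override relies on the usual "define the name to itself" idiom: the
arch header provides the function and a same-named macro, and generic code
supplies a fallback only if the macro is absent.  A standalone sketch of
the idiom follows.  Every DEMO_/demo_ name is hypothetical and merely
stands in for kernel config symbols and radix_enabled().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for Kconfig symbols and arch runtime state. */
#define DEMO_HAVE_MOVE_PMD 1
#define DEMO_HAVE_MOVE_PUD 0
static bool demo_radix_enabled = true;

/* Remove this define to fall back to the generic version below. */
#define DEMO_ARCH_HAS_OVERRIDE 1

/* ---- what the arch header would contribute ---- */
#ifdef DEMO_ARCH_HAS_OVERRIDE
#define arch_supports_page_table_move arch_supports_page_table_move
static inline bool arch_supports_page_table_move(void)
{
	/* Decided at runtime, e.g. by the translation mode in use. */
	return demo_radix_enabled;
}
#endif

/* ---- generic code: only used when no arch override exists ---- */
#ifndef arch_supports_page_table_move
#define arch_supports_page_table_move arch_supports_page_table_move
static inline bool arch_supports_page_table_move(void)
{
	return DEMO_HAVE_MOVE_PMD || DEMO_HAVE_MOVE_PUD;
}
#endif

/* Hypothetical caller: bail out early when the fast path is unavailable. */
static bool try_fast_move(void)
{
	if (!arch_supports_page_table_move())
		return false;	/* caller falls back to the PTE-level move */
	return true;
}
```

Because the macro expands to its own name, callers use an ordinary function
call while the preprocessor can still test whether an override exists.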

Link: https://lkml.kernel.org/r/20210616045735.374532-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20210616045735.374532-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/powerpc/include/asm/tlb.h |    6 ++++++
 mm/mremap.c                    |   15 ++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

--- a/arch/powerpc/include/asm/tlb.h~mm-mremap-allow-arch-runtime-override
+++ a/arch/powerpc/include/asm/tlb.h
@@ -83,5 +83,11 @@ static inline int mm_is_thread_local(str
 }
 #endif
 
+#define arch_supports_page_table_move arch_supports_page_table_move
+static inline bool arch_supports_page_table_move(void)
+{
+	return radix_enabled();
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_TLB_H */
--- a/mm/mremap.c~mm-mremap-allow-arch-runtime-override
+++ a/mm/mremap.c
@@ -25,7 +25,7 @@
 #include <linux/userfaultfd_k.h>
 
 #include <asm/cacheflush.h>
-#include <asm/tlbflush.h>
+#include <asm/tlb.h>
 #include <asm/pgalloc.h>
 
 #include "internal.h"
@@ -210,6 +210,15 @@ static void move_ptes(struct vm_area_str
 		drop_rmap_locks(vma);
 }
 
+#ifndef arch_supports_page_table_move
+#define arch_supports_page_table_move arch_supports_page_table_move
+static inline bool arch_supports_page_table_move(void)
+{
+	return IS_ENABLED(CONFIG_HAVE_MOVE_PMD) ||
+		IS_ENABLED(CONFIG_HAVE_MOVE_PUD);
+}
+#endif
+
 #ifdef CONFIG_HAVE_MOVE_PMD
 static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 		  unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
@@ -218,6 +227,8 @@ static bool move_normal_pmd(struct vm_ar
 	struct mm_struct *mm = vma->vm_mm;
 	pmd_t pmd;
 
+	if (!arch_supports_page_table_move())
+		return false;
 	/*
 	 * The destination pmd shouldn't be established, free_pgtables()
 	 * should have released it.
@@ -284,6 +295,8 @@ static bool move_normal_pud(struct vm_ar
 	struct mm_struct *mm = vma->vm_mm;
 	pud_t pud;
 
+	if (!arch_supports_page_table_move())
+		return false;
 	/*
 	 * The destination pud shouldn't be established, free_pgtables()
 	 * should have released it.
_

* [patch 53/54] powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache
  2021-07-08  0:59 incoming Andrew Morton
                   ` (51 preceding siblings ...)
  2021-07-08  1:10 ` [patch 52/54] mm/mremap: allow arch runtime override Andrew Morton
@ 2021-07-08  1:10 ` Andrew Morton
  2021-07-08  1:10 ` [patch 54/54] powerpc/mm: enable HAVE_MOVE_PMD support Andrew Morton
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:10 UTC (permalink / raw)
  To: akpm, aneesh.kumar, christophe.leroy, hughd, joel, kaleshsingh,
	kirill.shutemov, linux-mm, mm-commits, mpe, npiggin, sfr,
	torvalds

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache

flush_tlb_range is special in that we don't specify the page size used for
the translation.  Hence when flushing TLB we flush the translation cache
for all possible page sizes.  The kernel also uses the same interface when
moving page tables around.  Such a move requires us to flush the page walk
cache.

Instead of adding another interface to force a page walk cache flush, update
flush_tlb_range to flush the page walk cache if the range flushed is at
least PMD_SIZE.  A page table move will always involve an invalidate range
of at least PMD_SIZE.

Running a microbenchmark with mprotect and parallel memory access didn't
show any observable performance impact.
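
The decision described above can be sketched in userspace as below.  This
is only a model of the flush_pid/flush_pwc selection in the diff: the page
shift, PMD size and ceiling are illustrative assumptions, and the fullmm
case and the local/global distinction are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_SHIFT		16		/* assumed 64K base pages */
#define DEMO_PMD_SIZE		(1UL << 21)	/* assumed 2MB PMD range */
#define DEMO_FLUSH_CEILING	33UL		/* illustrative threshold */

struct flush_plan {
	bool flush_pid;	/* invalidate everything cached for the PID */
	bool flush_pwc;	/* range flush plus a page walk cache flush */
};

/* Hypothetical helper mirroring the choice made for one range flush. */
static struct flush_plan plan_range_flush(unsigned long start,
					  unsigned long end)
{
	unsigned long nr_pages = (end - start) >> DEMO_PAGE_SHIFT;
	struct flush_plan plan;

	/* Large ranges are cheaper to handle as a full PID flush. */
	plan.flush_pid = nr_pages > DEMO_FLUSH_CEILING;

	/*
	 * A full PID flush already covers the page walk cache.  Otherwise
	 * force a PWC flush once the range spans at least a PMD, since
	 * that is what a page table move looks like.
	 */
	plan.flush_pwc = !plan.flush_pid && (end - start) >= DEMO_PMD_SIZE;
	return plan;
}
```

With these assumed values a single-page flush stays a plain range flush, a
2MB flush adds the PWC flush, and a 1GB flush becomes a full PID flush.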

Link: https://lkml.kernel.org/r/20210616045735.374532-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/powerpc/include/asm/book3s/64/tlbflush-radix.h |    2 
 arch/powerpc/mm/book3s64/radix_hugetlbpage.c        |    8 +
 arch/powerpc/mm/book3s64/radix_tlb.c                |   44 ++++++----
 3 files changed, 36 insertions(+), 18 deletions(-)

--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h~powerpc-book3s64-mm-update-flush_tlb_range-to-flush-page-walk-cache
+++ a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -64,6 +64,8 @@ extern void radix__flush_hugetlb_tlb_ran
 					   unsigned long start, unsigned long end);
 extern void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 					 unsigned long end, int psize);
+void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
+				      unsigned long end, int psize);
 extern void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
 				       unsigned long start, unsigned long end);
 extern void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
--- a/arch/powerpc/mm/book3s64/radix_hugetlbpage.c~powerpc-book3s64-mm-update-flush_tlb_range-to-flush-page-walk-cache
+++ a/arch/powerpc/mm/book3s64/radix_hugetlbpage.c
@@ -32,7 +32,13 @@ void radix__flush_hugetlb_tlb_range(stru
 	struct hstate *hstate = hstate_file(vma->vm_file);
 
 	psize = hstate_get_psize(hstate);
-	radix__flush_tlb_range_psize(vma->vm_mm, start, end, psize);
+	/*
+	 * Flush PWC even if we get PUD_SIZE hugetlb invalidate to keep this simpler.
+	 */
+	if (end - start >= PUD_SIZE)
+		radix__flush_tlb_pwc_range_psize(vma->vm_mm, start, end, psize);
+	else
+		radix__flush_tlb_range_psize(vma->vm_mm, start, end, psize);
 }
 
 /*
--- a/arch/powerpc/mm/book3s64/radix_tlb.c~powerpc-book3s64-mm-update-flush_tlb_range-to-flush-page-walk-cache
+++ a/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1111,14 +1111,13 @@ static unsigned long tlb_local_single_pa
 
 static inline void __radix__flush_tlb_range(struct mm_struct *mm,
 					    unsigned long start, unsigned long end)
-
 {
 	unsigned long pid;
 	unsigned int page_shift = mmu_psize_defs[mmu_virtual_psize].shift;
 	unsigned long page_size = 1UL << page_shift;
 	unsigned long nr_pages = (end - start) >> page_shift;
 	bool fullmm = (end == TLB_FLUSH_ALL);
-	bool flush_pid;
+	bool flush_pid, flush_pwc = false;
 	enum tlb_flush_type type;
 
 	pid = mm->context.id;
@@ -1137,8 +1136,16 @@ static inline void __radix__flush_tlb_ra
 		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
 	else
 		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
+	/*
+	 * full pid flush already does the PWC flush. if it is not full pid
+	 * flush check the range is more than PMD and force a pwc flush
+	 * mremap() depends on this behaviour.
+	 */
+	if (!flush_pid && (end - start) >= PMD_SIZE)
+		flush_pwc = true;
 
 	if (!mmu_has_feature(MMU_FTR_GTSE) && type == FLUSH_TYPE_GLOBAL) {
+		unsigned long type = H_RPTI_TYPE_TLB;
 		unsigned long tgt = H_RPTI_TARGET_CMMU;
 		unsigned long pg_sizes = psize_to_rpti_pgsize(mmu_virtual_psize);
 
@@ -1146,19 +1153,20 @@ static inline void __radix__flush_tlb_ra
 			pg_sizes |= psize_to_rpti_pgsize(MMU_PAGE_2M);
 		if (atomic_read(&mm->context.copros) > 0)
 			tgt |= H_RPTI_TARGET_NMMU;
-		pseries_rpt_invalidate(pid, tgt, H_RPTI_TYPE_TLB, pg_sizes,
-				       start, end);
+		if (flush_pwc)
+			type |= H_RPTI_TYPE_PWC;
+		pseries_rpt_invalidate(pid, tgt, type, pg_sizes, start, end);
 	} else if (flush_pid) {
+		/*
+		 * We are now flushing a range larger than PMD size force a RIC_FLUSH_ALL
+		 */
 		if (type == FLUSH_TYPE_LOCAL) {
-			_tlbiel_pid(pid, RIC_FLUSH_TLB);
+			_tlbiel_pid(pid, RIC_FLUSH_ALL);
 		} else {
 			if (cputlb_use_tlbie()) {
-				if (mm_needs_flush_escalation(mm))
-					_tlbie_pid(pid, RIC_FLUSH_ALL);
-				else
-					_tlbie_pid(pid, RIC_FLUSH_TLB);
+				_tlbie_pid(pid, RIC_FLUSH_ALL);
 			} else {
-				_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_TLB);
+				_tlbiel_pid_multicast(mm, pid, RIC_FLUSH_ALL);
 			}
 		}
 	} else {
@@ -1174,6 +1182,9 @@ static inline void __radix__flush_tlb_ra
 
 		if (type == FLUSH_TYPE_LOCAL) {
 			asm volatile("ptesync": : :"memory");
+			if (flush_pwc)
+				/* For PWC, only one flush is needed */
+				__tlbiel_pid(pid, 0, RIC_FLUSH_PWC);
 			__tlbiel_va_range(start, end, pid, page_size, mmu_virtual_psize);
 			if (hflush)
 				__tlbiel_va_range(hstart, hend, pid,
@@ -1181,6 +1192,8 @@ static inline void __radix__flush_tlb_ra
 			ppc_after_tlbiel_barrier();
 		} else if (cputlb_use_tlbie()) {
 			asm volatile("ptesync": : :"memory");
+			if (flush_pwc)
+				__tlbie_pid(pid, RIC_FLUSH_PWC);
 			__tlbie_va_range(start, end, pid, page_size, mmu_virtual_psize);
 			if (hflush)
 				__tlbie_va_range(hstart, hend, pid,
@@ -1188,10 +1201,10 @@ static inline void __radix__flush_tlb_ra
 			asm volatile("eieio; tlbsync; ptesync": : :"memory");
 		} else {
 			_tlbiel_va_range_multicast(mm,
-					start, end, pid, page_size, mmu_virtual_psize, false);
+					start, end, pid, page_size, mmu_virtual_psize, flush_pwc);
 			if (hflush)
 				_tlbiel_va_range_multicast(mm,
-					hstart, hend, pid, PMD_SIZE, MMU_PAGE_2M, false);
+					hstart, hend, pid, PMD_SIZE, MMU_PAGE_2M, flush_pwc);
 		}
 	}
 out:
@@ -1265,9 +1278,6 @@ void radix__flush_all_lpid_guest(unsigne
 	_tlbie_lpid_guest(lpid, RIC_FLUSH_ALL);
 }
 
-static void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
-				  unsigned long end, int psize);
-
 void radix__tlb_flush(struct mmu_gather *tlb)
 {
 	int psize = 0;
@@ -1374,8 +1384,8 @@ void radix__flush_tlb_range_psize(struct
 	return __radix__flush_tlb_range_psize(mm, start, end, psize, false);
 }
 
-static void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
-				  unsigned long end, int psize)
+void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
+				      unsigned long end, int psize)
 {
 	__radix__flush_tlb_range_psize(mm, start, end, psize, true);
 }
_

* [patch 54/54] powerpc/mm: enable HAVE_MOVE_PMD support
  2021-07-08  0:59 incoming Andrew Morton
                   ` (52 preceding siblings ...)
  2021-07-08  1:10 ` [patch 53/54] powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache Andrew Morton
@ 2021-07-08  1:10 ` Andrew Morton
  53 siblings, 0 replies; 75+ messages in thread
From: Andrew Morton @ 2021-07-08  1:10 UTC (permalink / raw)
  To: akpm, aneesh.kumar, christophe.leroy, hughd, joel, kaleshsingh,
	kirill.shutemov, kirill, linux-mm, mm-commits, mpe, npiggin, sfr,
	torvalds

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: powerpc/mm: enable HAVE_MOVE_PMD support

mremap HAVE_MOVE_PMD/PUD optimization time comparison for 1GB region:
1GB mremap - Source PTE-aligned, Destination PTE-aligned
  mremap time:      2292772ns
1GB mremap - Source PMD-aligned, Destination PMD-aligned
  mremap time:      1158928ns
1GB mremap - Source PUD-aligned, Destination PUD-aligned
  mremap time:        63886ns
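
Read as ratios against the PTE-aligned baseline, these timings work out to
roughly 2x for the PMD-aligned case and roughly 36x for the PUD-aligned
case.  The arithmetic is spelled out below using only the three numbers
quoted above; the helper name speedup() is made up for illustration.

```c
#include <assert.h>

/* Timings quoted in the changelog, in nanoseconds. */
static const double pte_ns = 2292772.0;
static const double pmd_ns = 1158928.0;
static const double pud_ns = 63886.0;

/* How many times faster the optimized case is than the baseline. */
static double speedup(double baseline_ns, double optimized_ns)
{
	return baseline_ns / optimized_ns;
}
```

speedup(pte_ns, pmd_ns) and speedup(pte_ns, pud_ns) give the two ratios.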

Link: https://lkml.kernel.org/r/20210616045735.374532-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/powerpc/platforms/Kconfig.cputype |    2 ++
 1 file changed, 2 insertions(+)

--- a/arch/powerpc/platforms/Kconfig.cputype~powerpc-mm-enable-have_move_pmd-support
+++ a/arch/powerpc/platforms/Kconfig.cputype
@@ -102,6 +102,8 @@ config PPC_BOOK3S_64
 	select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
 	select ARCH_SUPPORTS_HUGETLBFS
 	select ARCH_SUPPORTS_NUMA_BALANCING
+	select HAVE_MOVE_PMD
+	select HAVE_MOVE_PUD
 	select IRQ_WORK
 	select PPC_MM_SLICES
 	select PPC_HAVE_KUEP
_
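
The timings in the changelog depend on source and destination sharing the same alignment. A minimal userspace sketch of such a move follows, assuming a 2 MiB PMD span (the real value depends on architecture and page size); `move_aligned_region()` and `ASSUMED_PMD_SIZE` are illustrative names, not part of the patch.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: 2 MiB PMD span; this varies with arch and page size. */
#define ASSUMED_PMD_SIZE (2UL << 20)

static size_t round_up_sz(size_t n, size_t a)
{
	return (n + a - 1) & ~(a - 1);
}

/*
 * Map `len` bytes at an aligned source address, move them with mremap()
 * to an aligned destination, and check that the contents survived.
 * Returns 0 on success, -1 on any failure.
 */
int move_aligned_region(size_t len)
{
	size_t step = round_up_sz(len, ASSUMED_PMD_SIZE);
	size_t span = 2 * step + ASSUMED_PMD_SIZE;
	char *base, *src, *dst, *moved;
	int ret = -1;

	/* Reserve address space so both aligned ranges are known to be free. */
	base = mmap(NULL, span, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	src = (char *)(uintptr_t)round_up_sz((size_t)(uintptr_t)base,
					     ASSUMED_PMD_SIZE);
	dst = src + step;

	if (mmap(src, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		goto out;
	memset(src, 0x5a, len);

	/* With aligned src and dst the kernel may move whole page-table entries. */
	moved = mremap(src, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
	if (moved == MAP_FAILED)
		goto out;
	if (moved == dst && moved[0] == 0x5a && moved[len - 1] == 0x5a)
		ret = 0;
out:
	munmap(base, span);
	return ret;
}
```

Calling `move_aligned_region(4UL << 20)` exercises the aligned path; whether the kernel actually moves PMD or PUD entries rather than individual PTEs is not observable from this sketch and would have to be inferred from timing.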

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 11/54] mm: introduce memfd_secret system call to create "secret" memory areas
  2021-07-08  1:08 ` [patch 11/54] mm: introduce memfd_secret system call to create "secret" memory areas Andrew Morton
@ 2021-07-08  3:13     ` Linus Torvalds
  0 siblings, 0 replies; 75+ messages in thread
From: Linus Torvalds @ 2021-07-08  3:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Arnd Bergmann, Borislav Petkov, Catalin Marinas,
	Christoph Lameter, Dan Williams, Dave Hansen, David Hildenbrand,
	Reshetova, Elena, Roman Gushchin, Hagen Paul Pfeifer,
	Peter Anvin, James Bottomley, James Bottomley,
	Kirill A . Shutemov, Linux-MM, kernel test robot,
	Andrew Lutomirski, Mark Rutland, Ingo Molnar, mm-commits,
	Michael Kerrisk-manpages, Palmer Dabbelt, Palmer Dabbelt,
	Paul Walmsley, Peter Zijlstra, Edgecombe, Rick P, Mike Rapoport,
	Shakeel Butt, Shuah Khan, Thomas Gleixner, Tycho Andersen,
	Al Viro, Will Deacon, Matthew Wilcox

On Wed, Jul 7, 2021 at 6:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> From: Mike Rapoport <rppt@linux.ibm.com>
> Subject: mm: introduce memfd_secret system call to create "secret" memory areas
>
> Introduce "memfd_secret" system call with the ability to create memory
> areas visible only in the context of the owning process and mapped neither
> into other processes nor into the kernel page tables.

Am I missing something?

From what I can tell, this must not be enabled for regular users,
because the secret mapping is effectively mlock'ed into the address
space.

But there do not seem to be any permission checks or any limits, so
this looks like a trivial way for a bad user to force the kernel to
run out of memory.

So this looks entirely unacceptable.

Please tell me what I'm not getting...

             Linus

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 12/54] PM: hibernate: disable when there are active secretmem users
  2021-07-08  1:08 ` [patch 12/54] PM: hibernate: disable when there are active secretmem users Andrew Morton
@ 2021-07-08  3:15     ` Linus Torvalds
  0 siblings, 0 replies; 75+ messages in thread
From: Linus Torvalds @ 2021-07-08  3:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Arnd Bergmann, Borislav Petkov, Catalin Marinas,
	Christoph Lameter, Dan Williams, Dave Hansen, David Hildenbrand,
	Reshetova, Elena, Roman Gushchin, Hagen Paul Pfeifer,
	Peter Anvin, James Bottomley, James Bottomley,
	Kirill A . Shutemov, Linux-MM, kernel test robot,
	Andrew Lutomirski, Mark Rutland, Ingo Molnar, mm-commits,
	Michael Kerrisk-manpages, Palmer Dabbelt, Palmer Dabbelt,
	Paul Walmsley, Peter Zijlstra, Edgecombe, Rick P, Mike Rapoport,
	Shakeel Butt, Shuah Khan, Thomas Gleixner, Tycho Andersen,
	Al Viro, Will Deacon, Matthew Wilcox

On Wed, Jul 7, 2021 at 6:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Prevent hibernation whenever there are active secret memory users.

So now anybody can not only use up all memory resources with these
fake mlock regions, they can also force laptops to run out of battery
and crash.

Again, maybe I'm missing something, but I didn't see any capability
checks or anything at all to limit this once it is enabled.

                    Linus

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 26/54] powerpc: convert to setup_initial_init_mm()
  2021-07-08  1:08 ` [patch 26/54] powerpc: " Andrew Morton
@ 2021-07-08  4:46   ` Christophe Leroy
  0 siblings, 0 replies; 75+ messages in thread
From: Christophe Leroy @ 2021-07-08  4:46 UTC (permalink / raw)
  To: Andrew Morton, benh, jrdr.linux, linux-mm, mm-commits, mpe,
	torvalds, wangkefeng.wang



Le 08/07/2021 à 03:08, Andrew Morton a écrit :
> From: Kefeng Wang <wangkefeng.wang@huawei.com>
> Subject: powerpc: convert to setup_initial_init_mm()
> 
> Use setup_initial_init_mm() helper to simplify code.
> 
> Note klimit is (unsigned long) _end, with new helper, will use _end
> directly.

The patch has been rebased because klimit doesn't exist anymore, so the above sentence should be dropped.

> 
> Link: https://lkml.kernel.org/r/20210608083418.137226-12-wangkefeng.wang@huawei.com
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Souptick Joarder <jrdr.linux@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>   arch/powerpc/kernel/setup-common.c |    5 +----
>   1 file changed, 1 insertion(+), 4 deletions(-)
> 
> --- a/arch/powerpc/kernel/setup-common.c~powerpc-convert-to-setup_initial_init_mm
> +++ a/arch/powerpc/kernel/setup-common.c
> @@ -926,10 +926,7 @@ void __init setup_arch(char **cmdline_p)
>   
>   	klp_init_thread_info(&init_task);
>   
> -	init_mm.start_code = (unsigned long)_stext;
> -	init_mm.end_code = (unsigned long) _etext;
> -	init_mm.end_data = (unsigned long) _edata;
> -	init_mm.brk = (unsigned long)_end;
> +	setup_initial_init_mm(_stext, _etext, _edata, _end);
>   
>   	mm_iommu_init(&init_mm);
>   	irqstack_early_init();
> _
> 
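
For reference, the conversion above replaces four open-coded assignments with one call. A standalone model of what such a helper does is sketched below; `struct mm_model`, `init_mm_model` and `setup_initial_init_mm_model()` are stand-ins invented for illustration, not the kernel's definitions.

```c
/* Stand-in for the four mm_struct fields the helper fills in. */
struct mm_model {
	unsigned long start_code;
	unsigned long end_code;
	unsigned long end_data;
	unsigned long brk;
};

struct mm_model init_mm_model;

/*
 * Model of the helper: takes the linker-symbol addresses as pointers and
 * performs the (unsigned long) casts in one place instead of per arch.
 */
void setup_initial_init_mm_model(void *start_code, void *end_code,
				 void *end_data, void *brk)
{
	init_mm_model.start_code = (unsigned long)start_code;
	init_mm_model.end_code = (unsigned long)end_code;
	init_mm_model.end_data = (unsigned long)end_data;
	init_mm_model.brk = (unsigned long)brk;
}
```

Taking `void *` arguments is what lets callers pass `_stext`, `_etext`, `_edata` and `_end` directly, as the hunk above does.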

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 11/54] mm: introduce memfd_secret system call to create "secret" memory areas
  2021-07-08  3:13     ` Linus Torvalds
  (?)
@ 2021-07-08  5:21     ` Mike Rapoport
  2021-07-08 18:38         ` Linus Torvalds
  -1 siblings, 1 reply; 75+ messages in thread
From: Mike Rapoport @ 2021-07-08  5:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Arnd Bergmann, Borislav Petkov, Catalin Marinas,
	Christoph Lameter, Dan Williams, Dave Hansen, David Hildenbrand,
	Reshetova, Elena, Roman Gushchin, Hagen Paul Pfeifer,
	Peter Anvin, James Bottomley, James Bottomley,
	Kirill A . Shutemov, Linux-MM, kernel test robot,
	Andrew Lutomirski, Mark Rutland, Ingo Molnar, mm-commits,
	Michael Kerrisk-manpages, Palmer Dabbelt, Palmer Dabbelt,
	Paul Walmsley, Peter Zijlstra, Edgecombe, Rick P, Shakeel Butt,
	Shuah Khan, Thomas Gleixner, Tycho Andersen, Al Viro,
	Will Deacon, Matthew Wilcox

On Wed, Jul 07, 2021 at 08:13:10PM -0700, Linus Torvalds wrote:
> On Wed, Jul 7, 2021 at 6:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > Subject: mm: introduce memfd_secret system call to create "secret" memory areas
> >
> > Introduce "memfd_secret" system call with the ability to create memory
> > areas visible only in the context of the owning process and mapped neither
> > into other processes nor into the kernel page tables.
> 
> Am I missing something?
> 
> From what I can tell, this must not be enabled for regular users,
> because the secret mapping is effectively mlock'ed into the address
> space.
> 
> But there do not seem to be any permission checks or any limits, so
> this looks like a trivial way for a bad user to force the kernel to
> run out of memory.

This feature is off by default and should be explicitly enabled by a system
administrator. 
When it is enabled, a user cannot exceed RLIMIT_MEMLOCK.
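
A minimal userspace sketch of how these two conditions would surface to a caller, assuming the syscall number is exported as `__NR_memfd_secret` by the installed headers; `secret_area_roundtrip()` is a hypothetical helper name, and the errno values in the comments are the expected ones rather than a guarantee.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a secret area of `len` bytes, write to it and read it back.
 * Returns 0 on success or a negative errno value.
 */
int secret_area_roundtrip(size_t len)
{
#ifndef __NR_memfd_secret
	(void)len;
	return -ENOSYS;		/* headers predate the syscall */
#else
	char *p;
	int ret = 0;
	int fd = (int)syscall(__NR_memfd_secret, 0);

	if (fd < 0)
		return -errno;	/* expected: ENOSYS while the feature is off */

	if (ftruncate(fd, (off_t)len) < 0) {
		ret = -errno;
		goto out;
	}

	/* The memlock limit is applied when mapping, not at fd creation. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		ret = -errno;	/* expected: EAGAIN past RLIMIT_MEMLOCK */
		goto out;
	}

	memset(p, 0x42, len);
	if (p[0] != 0x42 || p[len - 1] != 0x42)
		ret = -EIO;
	munmap(p, len);
out:
	close(fd);
	return ret;
#endif
}
```

On a kernel without the feature enabled, `secret_area_roundtrip(4096)` should simply report a negative errno, so the sketch is safe to run anywhere.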
 
-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 12/54] PM: hibernate: disable when there are active secretmem users
  2021-07-08  3:15     ` Linus Torvalds
  (?)
@ 2021-07-08  5:30     ` Mike Rapoport
  -1 siblings, 0 replies; 75+ messages in thread
From: Mike Rapoport @ 2021-07-08  5:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Arnd Bergmann, Borislav Petkov, Catalin Marinas,
	Christoph Lameter, Dan Williams, Dave Hansen, David Hildenbrand,
	Reshetova, Elena, Roman Gushchin, Hagen Paul Pfeifer,
	Peter Anvin, James Bottomley, James Bottomley,
	Kirill A . Shutemov, Linux-MM, kernel test robot,
	Andrew Lutomirski, Mark Rutland, Ingo Molnar, mm-commits,
	Michael Kerrisk-manpages, Palmer Dabbelt, Palmer Dabbelt,
	Paul Walmsley, Peter Zijlstra, Edgecombe, Rick P, Shakeel Butt,
	Shuah Khan, Thomas Gleixner, Tycho Andersen, Al Viro,
	Will Deacon, Matthew Wilcox

On Wed, Jul 07, 2021 at 08:15:53PM -0700, Linus Torvalds wrote:
> On Wed, Jul 7, 2021 at 6:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > Prevent hibernation whenever there are active secret memory users.
> 
> So now anybody can not only use up all memory resources with these
> fake mlock regions, they can also force laptops to run out of battery
> and crash.

Again, this feature should be explicitly enabled on the command line, so
people that prefer hibernation to suspend-to-ram won't enable secretmem.

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 11/54] mm: introduce memfd_secret system call to create "secret" memory areas
  2021-07-08  5:21     ` Mike Rapoport
@ 2021-07-08 18:38         ` Linus Torvalds
  0 siblings, 0 replies; 75+ messages in thread
From: Linus Torvalds @ 2021-07-08 18:38 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Arnd Bergmann, Borislav Petkov, Catalin Marinas,
	Christoph Lameter, Dan Williams, Dave Hansen, David Hildenbrand,
	Reshetova, Elena, Roman Gushchin, Hagen Paul Pfeifer,
	Peter Anvin, James Bottomley, James Bottomley,
	Kirill A . Shutemov, Linux-MM, kernel test robot,
	Andrew Lutomirski, Mark Rutland, Ingo Molnar, mm-commits,
	Michael Kerrisk-manpages, Palmer Dabbelt, Palmer Dabbelt,
	Paul Walmsley, Peter Zijlstra, Edgecombe, Rick P, Shakeel Butt,
	Shuah Khan, Thomas Gleixner, Tycho Andersen, Al Viro,
	Will Deacon, Matthew Wilcox

On Wed, Jul 7, 2021 at 10:22 PM Mike Rapoport <rppt@linux.ibm.com> wrote:
>
> This feature is off by default and should be explicitly enabled by a system
> administrator.

I really don't think that matters, since people would go "oh, I want
secretmem" without being aware of the consequences.

But:

> When it is enabled, a user cannot exceed RLIMIT_MEMLOCK.

I had missed that, even though it was mentioned in the long commit
description. I just read the patch, and was looking at the
secretmem_file_create() and missed how the limit was there in the
mmap path.
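
The limit in the mmap path amounts to page accounting against RLIMIT_MEMLOCK. A simplified model of that kind of check is sketched below, loosely following the shape of the kernel's mlock_future_check(); `struct mm_account`, `memlock_check_model()` and the 4 KiB page size are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the per-process state the check reads. */
struct mm_account {
	unsigned long locked_pages;	/* pages already counted as locked */
	unsigned long memlock_limit;	/* RLIMIT_MEMLOCK, in bytes */
	bool cap_ipc_lock;		/* models capable(CAP_IPC_LOCK) */
};

/*
 * Returns 0 if mapping `len` more bytes stays within the limit (or the
 * caller is privileged), -EAGAIN otherwise.
 */
int memlock_check_model(const struct mm_account *mm, unsigned long len)
{
	unsigned long locked = (len >> MODEL_PAGE_SHIFT) + mm->locked_pages;
	unsigned long limit = mm->memlock_limit >> MODEL_PAGE_SHIFT;

	if (locked > limit && !mm->cap_ipc_lock)
		return -EAGAIN;
	return 0;
}
```

With a 64 KiB limit and nothing locked yet, a 64 KiB request passes and a request one page larger fails, which is the behaviour an unprivileged user would see.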

So I'm fine with this.

I still suspect that the "don't hibernate" should maybe at least alert
the sysadmin about *why* the hibernate failed, but let's see if that
ends up being an actual problem.

                 Linus

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 11/54] mm: introduce memfd_secret system call to create "secret" memory areas
  2021-07-08 18:38         ` Linus Torvalds
  (?)
@ 2021-07-08 20:13         ` Hagen Paul Pfeifer
  2021-07-09 15:44           ` Mike Rapoport
  -1 siblings, 1 reply; 75+ messages in thread
From: Hagen Paul Pfeifer @ 2021-07-08 20:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Rapoport, Andrew Morton, Arnd Bergmann, Borislav Petkov,
	Catalin Marinas, Christoph Lameter, Dan Williams, Dave Hansen,
	David Hildenbrand, Reshetova, Elena, Roman Gushchin, Peter Anvin,
	James Bottomley, James Bottomley, Kirill A . Shutemov, Linux-MM,
	kernel test robot, Andrew Lutomirski, Mark Rutland, Ingo Molnar,
	mm-commits, Michael Kerrisk-manpages, Palmer Dabbelt,
	Palmer Dabbelt, Paul Walmsley, Peter Zijlstra, Edgecombe, Rick P,
	Shakeel Butt, Shuah Khan, Thomas Gleixner, Tycho Andersen,
	Al Viro, Will Deacon, Matthew Wilcox

* Linus Torvalds | 2021-07-08 11:38:51 [-0700]:

Hello Mike, Linus

>> This feature is off by default and should be explicitly enabled by a system
>> administrator.
>>
>> When it is enabled, a user cannot exceed RLIMIT_MEMLOCK.

Just an idea/proposal:

this feature could be granted based on capabilities (a new or an existing one,
hopefully not CAP_SYS_ADMIN). Capabilities would provide a very convenient,
simple and fine-grained way to use this, at least from a user perspective. Or
am I forgetting something, Mike?

If capability is the way, I think RLIMIT_MEMLOCK would also be redundant
in my view. It would be "just another parameter" which can only be set wrong
(too low or too high) and somehow always wrong by default. But yes, it doesn't
really hurt either, so I personally wouldn't care about that knob.

Hagen



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 11/54] mm: introduce memfd_secret system call to create "secret" memory areas
  2021-07-08 20:13         ` Hagen Paul Pfeifer
@ 2021-07-09 15:44           ` Mike Rapoport
  0 siblings, 0 replies; 75+ messages in thread
From: Mike Rapoport @ 2021-07-09 15:44 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Borislav Petkov,
	Catalin Marinas, Christoph Lameter, Dan Williams, Dave Hansen,
	David Hildenbrand, Reshetova, Elena, Roman Gushchin, Peter Anvin,
	James Bottomley, James Bottomley, Kirill A . Shutemov, Linux-MM,
	kernel test robot, Andrew Lutomirski, Mark Rutland, Ingo Molnar,
	mm-commits, Michael Kerrisk-manpages, Palmer Dabbelt,
	Palmer Dabbelt, Paul Walmsley, Peter Zijlstra, Edgecombe, Rick P,
	Shakeel Butt, Shuah Khan, Thomas Gleixner, Tycho Andersen,
	Al Viro, Will Deacon, Matthew Wilcox

Hello Hagen,

On Thu, Jul 08, 2021 at 10:13:23PM +0200, Hagen Paul Pfeifer wrote:
> * Linus Torvalds | 2021-07-08 11:38:51 [-0700]:
> 
> Hello Mike, Linus
> 
> >> This feature is off by default and should be explicitly enabled by a system
> >> administrator.
> >>
> >> When it is enabled, a user cannot exceed RLIMIT_MEMLOCK.
> 
> Just an idea/proposal:
> 
> this feature could be granted based on capabilities (a new or an existing one,
> hopefully not CAP_SYS_ADMIN). Capabilities would provide a very convenient,
> simple and fine-grained way to use this, at least from a user perspective. Or
> am I forgetting something, Mike?

Our preference is to have secretmem available to everybody.

As James nicely put it [1]:

	I don't think dividing the world into people who can and can't use
	secret memory would be useful since the design is to be usable for
	anyone who might have a secret to keep; it would become like the
	kvm group permissions: something which is theoretically an access
	control but which in practise is given to everyone on the system.
 

[1] https://lore.kernel.org/lkml/73738cda43236b5ac2714e228af362b67a712f5d.camel@linux.ibm.com/

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 07/54] mm/slub: use stackdepot to save stack trace in objects
  2021-07-08  1:07 ` [patch 07/54] mm/slub: use stackdepot to save stack trace in objects Andrew Morton
@ 2021-07-16  7:39   ` Christoph Hellwig
  2021-07-16  8:57     ` Vlastimil Babka
  2021-07-16 20:12       ` Linus Torvalds
  0 siblings, 2 replies; 75+ messages in thread
From: Christoph Hellwig @ 2021-07-16  7:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: cl, glittao, iamjoonsoo.kim, linux-mm, mm-commits, penberg,
	rdunlap, rientjes, torvalds, vbabka, linux-xfs

This somewhat unexpectedly causes a crash when running the xfs/433 test
in xfstests for me.  Reverting the commit fixes the problem:

xfs/433 files ... [  138.422742] run fstests xfs/433 at 2021-07-16 07:30:42
[  140.128145] XFS (vdb): Mounting V5 Filesystem
[  140.160450] XFS (vdb): Ending clean mount
[  140.171782] xfs filesystem being mounted at /mnt/test supports timestamps un)
[  140.966560] XFS (vdc): Mounting V5 Filesystem
[  140.987911] XFS (vdc): Ending clean mount
[  141.000104] xfs filesystem being mounted at /mnt/scratch supports timestamps)
[  145.130156] XFS (vdc): Unmounting Filesystem
[  145.365230] XFS (vdc): Mounting V5 Filesystem
[  145.394542] XFS (vdc): Ending clean mount
[  145.409232] xfs filesystem being mounted at /mnt/scratch supports timestamps)
[  145.471384] XFS (vdc): Injecting error (false) at file fs/xfs/xfs_buf.c, lin"
[  145.478561] XFS (vdc): Injecting error (false) at file fs/xfs/xfs_buf.c, lin"
[  145.486070] XFS (vdc): Injecting error (false) at file fs/xfs/xfs_buf.c, lin"
[  145.492248] XFS (vdc): Injecting error (false) at file fs/xfs/xfs_buf.c, lin"
[  145.599964] XFS (vdb): Unmounting Filesystem
[  145.958340] BUG: kernel NULL pointer dereference, address: 0000000000000020
[  145.961760] #PF: supervisor read access in kernel mode
[  145.964278] #PF: error_code(0x0000) - not-present page
[  145.966758] PGD 0 P4D 0 
[  145.968041] Oops: 0000 [#1] PREEMPT SMP PTI
[  145.970077] CPU: 3 PID: 14172 Comm: xfs_scrub Not tainted 5.13.0+ #601
[  145.973243] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.144
[  145.977312] RIP: 0010:xfs_inode_hasattr+0x19/0x30
[  145.979626] Code: 83 c6 05 b2 55 75 02 01 e8 39 40 e4 00 eb b6 66 90 31 c0 80
[  145.989446] RSP: 0018:ffffc900070eba08 EFLAGS: 00010206
[  145.992280] RAX: ffffffff00ff0000 RBX: 0000000000000000 RCX: 0000000000000001
[  145.995970] RDX: 0000000000000000 RSI: ffffffff82fdd33f RDI: ffff88810dbe16c0
[  145.999945] RBP: ffff88810dbe16c0 R08: ffff888110e14348 R09: ffff888110e14348
[  146.003932] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  146.007854] R13: ffff888110d99000 R14: ffff888110d99000 R15: ffffffff834acd60
[  146.011765] FS:  00007f2bf29d7700(0000) GS:ffff88813bd80000(0000) knlGS:00000
[  146.016127] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  146.019297] CR2: 0000000000000020 CR3: 0000000110c96000 CR4: 00000000000006e0
[  146.023315] Call Trace:
[  146.024726]  xfs_attr_inactive+0x152/0x350
[  146.027059]  xfs_inactive+0x18a/0x240
[  146.029141]  xfs_fs_destroy_inode+0xcc/0x2d0
[  146.031311]  destroy_inode+0x36/0x70
[  146.033130]  xfs_bulkstat_one_int+0x243/0x340
[  146.035342]  xfs_bulkstat_iwalk+0x19/0x30
[  146.037562]  xfs_iwalk_ag_recs+0xef/0x1e0
[  146.039845]  xfs_iwalk_run_callbacks+0x9f/0x140
[  146.042550]  xfs_iwalk_ag+0x230/0x2f0
[  146.044601]  xfs_iwalk+0x139/0x200
[  146.046505]  ? xfs_bulkstat_one_int+0x340/0x340
[  146.049151]  xfs_bulkstat+0xc4/0x130
[  146.050771]  ? xfs_flags2diflags+0xe0/0xe0
[  146.052309]  xfs_ioc_bulkstat.constprop.0.isra.0+0xbf/0x120
[  146.054200]  xfs_file_ioctl+0xb6/0xef0
[  146.055474]  ? lock_is_held_type+0xd5/0x130
[  146.056867]  ? find_held_lock+0x2b/0x80
[  146.058241]  ? lock_release+0x13c/0x2e0
[  146.059385]  ? lock_is_held_type+0xd5/0x130
[  146.060435]  ? __fget_files+0xce/0x1d0
[  146.061385]  __x64_sys_ioctl+0x7e/0xb0
[  146.062333]  do_syscall_64+0x3b/0x90
[  146.063284]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  146.064572] RIP: 0033:0x7f2bf2df5427
[  146.065600] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c8
[  146.070244] RSP: 002b:00007f2bf29d6bd8 EFLAGS: 00000246 ORIG_RAX: 00000000000
[  146.072015] RAX: ffffffffffffffda RBX: 00007fffe44b8010 RCX: 00007f2bf2df5427
[  146.073692] RDX: 00007f2be4000b20 RSI: 000000008040587f RDI: 0000000000000003
[  146.075322] RBP: 00007f2be4000b20 R08: 00007f2be4003b70 R09: 0000000000000077
[  146.076962] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f2be4003b70
[  146.078480] R13: 00007fffe44b8010 R14: 00007f2be4000b60 R15: 0000000000000018
[  146.079803] Modules linked in:
[  146.080379] CR2: 0000000000000020
[  146.081196] ---[ end trace 80a6ea90b0ea2a03 ]---
[  146.082130] RIP: 0010:xfs_inode_hasattr+0x19/0x30
[  146.083144] Code: 83 c6 05 b2 55 75 02 01 e8 39 40 e4 00 eb b6 66 90 31 c0 80
[  146.086831] RSP: 0018:ffffc900070eba08 EFLAGS: 00010206
[  146.087816] RAX: ffffffff00ff0000 RBX: 0000000000000000 RCX: 0000000000000001
[  146.089122] RDX: 0000000000000000 RSI: ffffffff82fdd33f RDI: ffff88810dbe16c0
[  146.090477] RBP: ffff88810dbe16c0 R08: ffff888110e14348 R09: ffff888110e14348
[  146.091794] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  146.093096] R13: ffff888110d99000 R14: ffff888110d99000 R15: ffffffff834acd60
[  146.094429] FS:  00007f2bf29d7700(0000) GS:ffff88813bd80000(0000) knlGS:00000
[  146.096002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  146.097079] CR2: 0000000000000020 CR3: 0000000110c96000 CR4: 00000000000006e0
[  146.098479] Kernel panic - not syncing: Fatal exception
[  146.099677] Kernel Offset: disabled
[  146.100397] ---[ end Kernel panic - not syncing: Fatal exception ]---
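
One way to read the oops above: a faulting address as small as 0x20 usually means a member was read through a NULL structure pointer, the address being that member's offset. The sketch below illustrates the arithmetic with a made-up layout; `struct demo_fork` is hypothetical and is NOT the real XFS structure, and nothing here identifies the actual cause of the crash.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout chosen so that one member sits at offset 0x20. */
struct demo_fork {
	uint64_t pad[4];	/* occupies 0x00..0x1f */
	int8_t format;		/* lands at offset 0x20 */
};

/* Address the CPU would fault on when reading `format` through `base`. */
uintptr_t fault_address(uintptr_t base)
{
	return base + offsetof(struct demo_fork, format);
}
```

With a NULL base, `fault_address(0)` yields 0x20, matching the pattern of the CR2 value in the log.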



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 07/54] mm/slub: use stackdepot to save stack trace in objects
  2021-07-16  7:39   ` Christoph Hellwig
@ 2021-07-16  8:57     ` Vlastimil Babka
  2021-07-16  9:12       ` Christoph Hellwig
  2021-07-16 20:12       ` Linus Torvalds
  1 sibling, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2021-07-16  8:57 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton
  Cc: cl, glittao, iamjoonsoo.kim, linux-mm, mm-commits, penberg,
	rdunlap, rientjes, torvalds, linux-xfs

On 7/16/21 9:39 AM, Christoph Hellwig wrote:
> This somewhat unexpectedly causes a crash when running the xfs/433 test
> in xfstests for me.  Reverting the commit fixes the problem:

That's weird, the backtrace doesn't even include SLUB/stackdepot code.
Is that kernel actually booted with the slub_debug option, built with
CONFIG_SLUB_DEBUG_ON, or is some cache created with SLAB_STORE_USER?

> 
> xfs/433 files ... [  138.422742] run fstests xfs/433 at 2021-07-16 07:30:42
> [  140.128145] XFS (vdb): Mounting V5 Filesystem
> [  140.160450] XFS (vdb): Ending clean mount
> [  140.171782] xfs filesystem being mounted at /mnt/test supports timestamps un)
> [  140.966560] XFS (vdc): Mounting V5 Filesystem
> [  140.987911] XFS (vdc): Ending clean mount
> [  141.000104] xfs filesystem being mounted at /mnt/scratch supports timestamps)
> [  145.130156] XFS (vdc): Unmounting Filesystem
> [  145.365230] XFS (vdc): Mounting V5 Filesystem
> [  145.394542] XFS (vdc): Ending clean mount
> [  145.409232] xfs filesystem being mounted at /mnt/scratch supports timestamps)
> [  145.471384] XFS (vdc): Injecting error (false) at file fs/xfs/xfs_buf.c, lin"
> [  145.478561] XFS (vdc): Injecting error (false) at file fs/xfs/xfs_buf.c, lin"
> [  145.486070] XFS (vdc): Injecting error (false) at file fs/xfs/xfs_buf.c, lin"
> [  145.492248] XFS (vdc): Injecting error (false) at file fs/xfs/xfs_buf.c, lin"
> [  145.599964] XFS (vdb): Unmounting Filesystem
> [  145.958340] BUG: kernel NULL pointer dereference, address: 0000000000000020
> [  145.961760] #PF: supervisor read access in kernel mode
> [  145.964278] #PF: error_code(0x0000) - not-present page
> [  145.966758] PGD 0 P4D 0 
> [  145.968041] Oops: 0000 [#1] PREEMPT SMP PTI
> [  145.970077] CPU: 3 PID: 14172 Comm: xfs_scrub Not tainted 5.13.0+ #601
> [  145.973243] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.144
> [  145.977312] RIP: 0010:xfs_inode_hasattr+0x19/0x30
> [  145.979626] Code: 83 c6 05 b2 55 75 02 01 e8 39 40 e4 00 eb b6 66 90 31 c0 80
> [  145.989446] RSP: 0018:ffffc900070eba08 EFLAGS: 00010206
> [  145.992280] RAX: ffffffff00ff0000 RBX: 0000000000000000 RCX: 0000000000000001
> [  145.995970] RDX: 0000000000000000 RSI: ffffffff82fdd33f RDI: ffff88810dbe16c0
> [  145.999945] RBP: ffff88810dbe16c0 R08: ffff888110e14348 R09: ffff888110e14348
> [  146.003932] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [  146.007854] R13: ffff888110d99000 R14: ffff888110d99000 R15: ffffffff834acd60
> [  146.011765] FS:  00007f2bf29d7700(0000) GS:ffff88813bd80000(0000) knlGS:00000
> [  146.016127] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  146.019297] CR2: 0000000000000020 CR3: 0000000110c96000 CR4: 00000000000006e0
> [  146.023315] Call Trace:
> [  146.024726]  xfs_attr_inactive+0x152/0x350
> [  146.027059]  xfs_inactive+0x18a/0x240
> [  146.029141]  xfs_fs_destroy_inode+0xcc/0x2d0
> [  146.031311]  destroy_inode+0x36/0x70
> [  146.033130]  xfs_bulkstat_one_int+0x243/0x340
> [  146.035342]  xfs_bulkstat_iwalk+0x19/0x30
> [  146.037562]  xfs_iwalk_ag_recs+0xef/0x1e0
> [  146.039845]  xfs_iwalk_run_callbacks+0x9f/0x140
> [  146.042550]  xfs_iwalk_ag+0x230/0x2f0
> [  146.044601]  xfs_iwalk+0x139/0x200
> [  146.046505]  ? xfs_bulkstat_one_int+0x340/0x340
> [  146.049151]  xfs_bulkstat+0xc4/0x130
> [  146.050771]  ? xfs_flags2diflags+0xe0/0xe0
> [  146.052309]  xfs_ioc_bulkstat.constprop.0.isra.0+0xbf/0x120
> [  146.054200]  xfs_file_ioctl+0xb6/0xef0
> [  146.055474]  ? lock_is_held_type+0xd5/0x130
> [  146.056867]  ? find_held_lock+0x2b/0x80
> [  146.058241]  ? lock_release+0x13c/0x2e0
> [  146.059385]  ? lock_is_held_type+0xd5/0x130
> [  146.060435]  ? __fget_files+0xce/0x1d0
> [  146.061385]  __x64_sys_ioctl+0x7e/0xb0
> [  146.062333]  do_syscall_64+0x3b/0x90
> [  146.063284]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  146.064572] RIP: 0033:0x7f2bf2df5427
> [  146.065600] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c8
> [  146.070244] RSP: 002b:00007f2bf29d6bd8 EFLAGS: 00000246 ORIG_RAX: 00000000000
> [  146.072015] RAX: ffffffffffffffda RBX: 00007fffe44b8010 RCX: 00007f2bf2df5427
> [  146.073692] RDX: 00007f2be4000b20 RSI: 000000008040587f RDI: 0000000000000003
> [  146.075322] RBP: 00007f2be4000b20 R08: 00007f2be4003b70 R09: 0000000000000077
> [  146.076962] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f2be4003b70
> [  146.078480] R13: 00007fffe44b8010 R14: 00007f2be4000b60 R15: 0000000000000018
> [  146.079803] Modules linked in:
> [  146.080379] CR2: 0000000000000020
> [  146.081196] ---[ end trace 80a6ea90b0ea2a03 ]---
> [  146.082130] RIP: 0010:xfs_inode_hasattr+0x19/0x30
> [  146.083144] Code: 83 c6 05 b2 55 75 02 01 e8 39 40 e4 00 eb b6 66 90 31 c0 80
> [  146.086831] RSP: 0018:ffffc900070eba08 EFLAGS: 00010206
> [  146.087816] RAX: ffffffff00ff0000 RBX: 0000000000000000 RCX: 0000000000000001
> [  146.089122] RDX: 0000000000000000 RSI: ffffffff82fdd33f RDI: ffff88810dbe16c0
> [  146.090477] RBP: ffff88810dbe16c0 R08: ffff888110e14348 R09: ffff888110e14348
> [  146.091794] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [  146.093096] R13: ffff888110d99000 R14: ffff888110d99000 R15: ffffffff834acd60
> [  146.094429] FS:  00007f2bf29d7700(0000) GS:ffff88813bd80000(0000) knlGS:00000
> [  146.096002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  146.097079] CR2: 0000000000000020 CR3: 0000000110c96000 CR4: 00000000000006e0
> [  146.098479] Kernel panic - not syncing: Fatal exception
> [  146.099677] Kernel Offset: disabled
> [  146.100397] ---[ end Kernel panic - not syncing: Fatal exception ]---
> 
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 07/54] mm/slub: use stackdepot to save stack trace in objects
  2021-07-16  8:57     ` Vlastimil Babka
@ 2021-07-16  9:12       ` Christoph Hellwig
  0 siblings, 0 replies; 75+ messages in thread
From: Christoph Hellwig @ 2021-07-16  9:12 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Hellwig, Andrew Morton, cl, glittao, iamjoonsoo.kim,
	linux-mm, mm-commits, penberg, rdunlap, rientjes, torvalds,
	linux-xfs

[-- Attachment #1: Type: text/plain, Size: 527 bytes --]

On Fri, Jul 16, 2021 at 10:57:51AM +0200, Vlastimil Babka wrote:
> On 7/16/21 9:39 AM, Christoph Hellwig wrote:
> > This somewhat unexpectedly causes a crash when running the xfs/433 test
> > in xfstests for me.  Reverting the commit fixes the problem:
> 
> That's weird, the backtrace doesn't even include SLUB/stackdepot code.
> Is that kernel actually booted with slub_debug option/built with
> CONFIG_SLUB_DEBUG_ON or some cache created with SLAB_STORE_USER ?

CONFIG_SLUB_DEBUG_ON is enabled, yes.  Full .config attached.

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 35959 bytes --]

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 07/54] mm/slub: use stackdepot to save stack trace in objects
  2021-07-16  7:39   ` Christoph Hellwig
@ 2021-07-16 20:12       ` Linus Torvalds
  2021-07-16 20:12       ` Linus Torvalds
  1 sibling, 0 replies; 75+ messages in thread
From: Linus Torvalds @ 2021-07-16 20:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Christoph Lameter, glittao, Joonsoo Kim, Linux-MM,
	mm-commits, Pekka Enberg, Randy Dunlap, David Rientjes,
	Vlastimil Babka, linux-xfs

On Fri, Jul 16, 2021 at 12:39 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> This somewhat unexpectedly causes a crash when running the xfs/433 test
> in xfstests for me.  Reverting the commit fixes the problem:

I don't see why that would be the case, but I'm inclined to revert
that commit for another reason: the code doesn't seem to match the
description of the commit.

It used to be that CONFIG_SLUB_DEBUG was a config option that was
harmless and that defaulted to 'y' because there was little downside.
In fact, it's not just "default y", it doesn't even *ask* the user
unless CONFIG_EXPERT is on. Because it was fairly harmless. And then
SLUB_DEBUG_ON was that "do you actually want this code _enabled_".

But now it basically force-enables that STACKDEPOT support too, and
then instead of having an _optional_ CONFIG_STACKTRACE, you basically
have that as being forced on you whether you want active debugging or
not.

Maybe that

        select STACKDEPOT if STACKTRACE_SUPPORT

should have been

        select STACKDEPOT if STACKTRACE

because it used to be that CONFIG_STACKTRACE was somewhat unusual,
and only enabled for special debug cases (admittedly "CONFIG_TRACING"
likely meant that it was fairly widely enabled).

In contrast, STACKTRACE_SUPPORT is basically "this architecture supports it".

So now it seems STACKDEPOT is enabled basically unconditionally.

So I really don't see why it would cause that xfs issue, but I think
there are multiple reasons to just go "Hmm" on that commit.

Comments?

                Linus
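
A toy sketch of the difference between the two `select` conditions
discussed above. This is not how Kconfig is implemented; the function
name `resolve` and its parameters are made up for illustration, and
`stacktrace_requested` stands for "something other than STACKDEPOT
already turned CONFIG_STACKTRACE on". The only facts taken from the
thread are the two candidate `select` lines and that "config
STACKDEPOT" does "select STACKTRACE".

```python
def resolve(slub_debug, stacktrace_support, stacktrace_requested, condition):
    """Return (STACKDEPOT, STACKTRACE) for a toy model of the two variants.

    condition names the symbol guarding `select STACKDEPOT` under
    SLUB_DEBUG: either "STACKTRACE_SUPPORT" or "STACKTRACE".
    """
    if condition == "STACKTRACE_SUPPORT":
        guard = stacktrace_support       # "this architecture supports it"
    else:
        guard = stacktrace_requested     # stack tracing was actually asked for
    stackdepot = slub_debug and guard
    # "config STACKDEPOT" does "select STACKTRACE", so it is forced on too
    stacktrace = stacktrace_requested or stackdepot
    return stackdepot, stacktrace


# Default-ish config: SLUB_DEBUG=y, arch supports stack traces,
# nobody asked for CONFIG_STACKTRACE.
merged = resolve(True, True, False, "STACKTRACE_SUPPORT")
suggested = resolve(True, True, False, "STACKTRACE")
print("if STACKTRACE_SUPPORT:", merged)
print("if STACKTRACE:        ", suggested)
```

In this model the STACKTRACE_SUPPORT variant turns on both STACKDEPOT
and STACKTRACE for a config that never asked for either, while the
STACKTRACE variant leaves both off.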

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 07/54] mm/slub: use stackdepot to save stack trace in objects
  2021-07-16 20:12       ` Linus Torvalds
  (?)
@ 2021-07-16 22:37       ` Vlastimil Babka
  2021-07-17 17:34         ` Randy Dunlap
  -1 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2021-07-16 22:37 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Hellwig
  Cc: Andrew Morton, Christoph Lameter, glittao, Joonsoo Kim, Linux-MM,
	mm-commits, Pekka Enberg, Randy Dunlap, David Rientjes,
	linux-xfs, Geert Uytterhoeven

On 7/16/21 10:12 PM, Linus Torvalds wrote:
> On Fri, Jul 16, 2021 at 12:39 AM Christoph Hellwig <hch@infradead.org> wrote:
>>
>> This somewhat unexpectedly causes a crash when running the xfs/433 test
>> in xfstests for me.  Reverting the commit fixes the problem:
> 
> I don't see why that would be the case, but I'm inclined to revert
> that commit for another reason: the code doesn't seem to match the
> description of the commit.
> 
> It used to be that CONFIG_SLUB_DEBUG was a config option that was
> harmless and that defaulted to 'y' because there was little downside.
> In fact, it's not just "default y", it doesn't even *ask* the user
> unless CONFIG_EXPERT is on. Because it was fairly harmless. And then
> SLUB_DEBUG_ON was that "do you actually want this code _enabled_".
> 
> But now it basically force-enables that STACKDEPOT support too, and
> then instead of having an _optional_ CONFIG_STACKTRACE, you basically
> have that as being forced on you whether you want active debugging or
> not.
> 
> Maybe that
> 
>         select STACKDEPOT if STACKTRACE_SUPPORT
> 
> should have been
> 
>         select STACKDEPOT if STACKTRACE

I recall we tried that and ran into Kconfig recursive dependency hell as
"config STACKDEPOT" does "select STACKTRACE", and after some attempts
ended up with the above.
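
A rough sketch of why that variant loops, as a simplified model rather
than Kconfig's actual checker. The helper names `value_deps` and
`find_cycle` are invented here, and the model assumes a symbol's value
depends on whoever selects it plus the `if` condition on that select.

```python
def value_deps(selects):
    """Map each symbol to the symbols its value depends on.

    `selects` is a list of (selector, target, condition) triples, one per
    `select target [if condition]` line under `config selector`.
    """
    deps = {}
    for selector, target, condition in selects:
        deps.setdefault(target, set()).add(selector)
        if condition is not None:
            deps[target].add(condition)
    return deps


def find_cycle(deps):
    """Return one dependency cycle as a list of symbols, or None."""
    def visit(sym, path):
        if sym in path:
            return path[path.index(sym):] + [sym]
        for dep in sorted(deps.get(sym, ())):
            cycle = visit(dep, path + [sym])
            if cycle:
                return cycle
        return None

    for sym in sorted(deps):
        cycle = visit(sym, [])
        if cycle:
            return cycle
    return None


# Variant we tried: select STACKDEPOT if STACKTRACE
tried = [("SLUB_DEBUG", "STACKDEPOT", "STACKTRACE"),
         ("STACKDEPOT", "STACKTRACE", None)]
# Variant that was merged: select STACKDEPOT if STACKTRACE_SUPPORT
merged = [("SLUB_DEBUG", "STACKDEPOT", "STACKTRACE_SUPPORT"),
          ("STACKDEPOT", "STACKTRACE", None)]

print(find_cycle(value_deps(tried)))
print(find_cycle(value_deps(merged)))
```

In this model STACKDEPOT depends on STACKTRACE through the `if`, and
STACKTRACE depends on STACKDEPOT through the `select`, so the first
variant has a cycle and the second does not.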

> because it used to be that CONFIG_STACKTRACE was somewhat unusual,
> and only enabled for special debug cases (admittedly "CONFIG_TRACING"
> likely meant that it was fairly widely enabled).
> 
> In contrast, STACKTRACE_SUPPORT is basically "this architecture supports it".
> 
> So now it seems STACKDEPOT is enabled basically unconditionally.

It seemed rather harmless as it was just a bit of extra code. But it's
true Geert reports [1] unexpected memory usage which I would have only
expected if actual stacks started to be collected. So I guess we'll have
to look into that.

[1]
https://lore.kernel.org/lkml/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/

> So I really don't see why it would cause that xfs issue, but I think
> there are multiple reasons to just go "Hmm" on that commit.
> 
> Comments?
> 
>                 Linus
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 07/54] mm/slub: use stackdepot to save stack trace in objects
  2021-07-16 22:37       ` Vlastimil Babka
@ 2021-07-17 17:34         ` Randy Dunlap
  2021-07-18  7:29           ` Vlastimil Babka
  0 siblings, 1 reply; 75+ messages in thread
From: Randy Dunlap @ 2021-07-17 17:34 UTC (permalink / raw)
  To: Vlastimil Babka, Linus Torvalds, Christoph Hellwig
  Cc: Andrew Morton, Christoph Lameter, glittao, Joonsoo Kim, Linux-MM,
	mm-commits, Pekka Enberg, David Rientjes, linux-xfs,
	Geert Uytterhoeven

On 7/16/21 3:37 PM, Vlastimil Babka wrote:
> On 7/16/21 10:12 PM, Linus Torvalds wrote:
>> On Fri, Jul 16, 2021 at 12:39 AM Christoph Hellwig <hch@infradead.org> wrote:
>>>
>>> This somewhat unexpectedly causes a crash when running the xfs/433 test
>>> in xfstests for me.  Reverting the commit fixes the problem:
>>
>> I don't see why that would be the case, but I'm inclined to revert
>> that commit for another reason: the code doesn't seem to match the
>> description of the commit.
>>
>> It used to be that CONFIG_SLUB_DEBUG was a config option that was
>> harmless and that defaulted to 'y' because there was little downside.
>> In fact, it's not just "default y", it doesn't even *ask* the user
>> unless CONFIG_EXPERT is on. Because it was fairly harmless. And then
>> SLUB_DEBUG_ON was that "do you actually want this code _enabled_".
>>
>> But now it basically force-enables that STACKDEPOT support too, and
>> then instead of having an _optional_ CONFIG_STACKTRACE, you basically
>> have that as being forced on you whether you want active debugging or
>> not.
>>
>> Maybe that
>>
>>         select STACKDEPOT if STACKTRACE_SUPPORT
>>
>> should have been
>>
>>         select STACKDEPOT if STACKTRACE
> 
> I recall we tried that and ran into Kconfig recursive dependency hell as
> "config STACKDEPOT" does "select STACKTRACE", and after some attempts
> ended up with the above.
> 
>> because it used to be that CONFIG_STACKTRACE was somewhat unusual,
>> and only enabled for special debug cases (admittedly "CONFIG_TRACING"
>> likely meant that it was fairly widely enabled).
>>
>> In contrast, STACKTRACE_SUPPORT is basically "this architecture supports it".
>>
>> So now it seems STACKDEPOT is enabled basically unconditionally.
> 
> It seemed rather harmless as it was just a bit of extra code. But it's
> true Geert reports [1] unexpected memory usage which I would have only
> expected if actual stacks started to be collected. So I guess we'll have
> to look into that.
> 
> [1]
> https://lore.kernel.org/lkml/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/
> 
>> So I really don't see why it would cause that xfs issue, but I think
>> there are multiple reasons to just go "Hmm" on that commit.
>>
>> Comments?
>>
>>                 Linus
>>
> 

There is also the matter of lib/stackdepot.c build errors on ARCH=arc:

https://lore.kernel.org/lkml/202107150600.LkGNb4Vb-lkp@intel.com/


-- 
~Randy


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 07/54] mm/slub: use stackdepot to save stack trace in objects
  2021-07-17 17:34         ` Randy Dunlap
@ 2021-07-18  7:29           ` Vlastimil Babka
  2021-07-18 14:17             ` Randy Dunlap
  0 siblings, 1 reply; 75+ messages in thread
From: Vlastimil Babka @ 2021-07-18  7:29 UTC (permalink / raw)
  To: Randy Dunlap, Linus Torvalds, Christoph Hellwig
  Cc: Andrew Morton, Christoph Lameter, glittao, Joonsoo Kim, Linux-MM,
	mm-commits, Pekka Enberg, David Rientjes, linux-xfs,
	Geert Uytterhoeven

On 7/17/21 7:34 PM, Randy Dunlap wrote:
>>> because it used to be that CONFIG_STACKTRACE was somewhat unusual,
>>> and only enabled for special debug cases (admittedly "CONFIG_TRACING"
>>> likely meant that it was fairly widely enabled).
>>>
>>> In contrast, STACKTRACE_SUPPORT is basically "this architecture supports it".
>>>
>>> So now it seems STACKDEPOT is enabled basically unconditionally.
>>
>> It seemed rather harmless as it was just a bit of extra code. But it's
>> true Geert reports [1] unexpected memory usage which I would have only
>> expected if actual stacks started to be collected. So I guess we'll have
>> to look into that.
>>
>> [1]
>> https://lore.kernel.org/lkml/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/
>>
>>> So I really don't see why it would cause that xfs issue, but I think
>>> there are multiple reasons to just go "Hmm" on that commit.
>>>
>>> Comments?
>>>
>>>                 Linus
>>>
>>
> 
> There is also the matter of lib/stackdepot.c build errors on ARCH=arc:
> 
> https://lore.kernel.org/lkml/202107150600.LkGNb4Vb-lkp@intel.com/

That's being fixed AFAIK?

https://lore.kernel.org/lkml/20210710145033.2804047-1-linux@roeck-us.net/

I'll try to come up with some Kconfig flag set that will make it depend
on STACKTRACE again without recursion issues.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [patch 07/54] mm/slub: use stackdepot to save stack trace in objects
  2021-07-18  7:29           ` Vlastimil Babka
@ 2021-07-18 14:17             ` Randy Dunlap
  0 siblings, 0 replies; 75+ messages in thread
From: Randy Dunlap @ 2021-07-18 14:17 UTC (permalink / raw)
  To: Vlastimil Babka, Linus Torvalds, Christoph Hellwig
  Cc: Andrew Morton, Christoph Lameter, glittao, Joonsoo Kim, Linux-MM,
	mm-commits, Pekka Enberg, David Rientjes, linux-xfs,
	Geert Uytterhoeven

On 7/18/21 12:29 AM, Vlastimil Babka wrote:
> On 7/17/21 7:34 PM, Randy Dunlap wrote:
>>>> because it used to be that CONFIG_STACKTRACE was somewhat unusual,
>>>> and only enabled for special debug cases (admittedly "CONFIG_TRACING"
>>>> likely meant that it was fairly widely enabled).
>>>>
>>>> In contrast, STACKTRACE_SUPPORT is basically "this architecture supports it".
>>>>
>>>> So now it seems STACKDEPOT is enabled basically unconditionally.
>>>
>>> It seemed rather harmless as it was just a bit of extra code. But it's
>>> true Geert reports [1] unexpected memory usage which I would have only
>>> expected if actual stacks started to be collected. So I guess we'll have
>>> to look into that.
>>>
>>> [1]
>>> https://lore.kernel.org/lkml/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/
>>>
>>>> So I really don't see why it would cause that xfs issue, but I think
>>>> there are multiple reasons to just go "Hmm" on that commit.
>>>>
>>>> Comments?
>>>>
>>>>                 Linus
>>>>
>>>
>>
>> There is also the matter of lib/stackdepot.c build errors on ARCH=arc:
>>
>> https://lore.kernel.org/lkml/202107150600.LkGNb4Vb-lkp@intel.com/
> 
> That's being fixed AFAIK?
> 
> https://lore.kernel.org/lkml/20210710145033.2804047-1-linux@roeck-us.net/

Ah, thanks.

> I'll try to come up with some Kconfig flag set that will make it depend
> on STACKTRACE again without recursion issues.
> 


^ permalink raw reply	[flat|nested] 75+ messages in thread

end of thread, other threads:[~2021-07-18 14:17 UTC | newest]

Thread overview: 75+ messages
2021-07-08  0:59 incoming Andrew Morton
2021-07-08  1:07 ` [patch 01/54] lib/test: fix spelling mistakes Andrew Morton
2021-07-08  1:07 ` [patch 02/54] lib: " Andrew Morton
2021-07-08  1:07 ` [patch 03/54] lib: fix spelling mistakes in header files Andrew Morton
2021-07-08  1:07 ` [patch 04/54] hexagon: handle {,SOFT}IRQENTRY_TEXT in linker script Andrew Morton
2021-07-08  1:07 ` [patch 05/54] hexagon: use common DISCARDS macro Andrew Morton
2021-07-08  1:07 ` [patch 06/54] hexagon: select ARCH_WANT_LD_ORPHAN_WARN Andrew Morton
2021-07-08  1:07 ` [patch 07/54] mm/slub: use stackdepot to save stack trace in objects Andrew Morton
2021-07-16  7:39   ` Christoph Hellwig
2021-07-16  8:57     ` Vlastimil Babka
2021-07-16  9:12       ` Christoph Hellwig
2021-07-16 20:12     ` Linus Torvalds
2021-07-16 20:12       ` Linus Torvalds
2021-07-16 22:37       ` Vlastimil Babka
2021-07-17 17:34         ` Randy Dunlap
2021-07-18  7:29           ` Vlastimil Babka
2021-07-18 14:17             ` Randy Dunlap
2021-07-08  1:07 ` [patch 08/54] mmap: make mlock_future_check() global Andrew Morton
2021-07-08  1:07 ` [patch 09/54] riscv/Kconfig: make direct map manipulation options depend on MMU Andrew Morton
2021-07-08  1:07 ` [patch 10/54] set_memory: allow querying whether set_direct_map_*() is actually enabled Andrew Morton
2021-07-08  1:08 ` [patch 11/54] mm: introduce memfd_secret system call to create "secret" memory areas Andrew Morton
2021-07-08  3:13   ` Linus Torvalds
2021-07-08  3:13     ` Linus Torvalds
2021-07-08  5:21     ` Mike Rapoport
2021-07-08 18:38       ` Linus Torvalds
2021-07-08 18:38         ` Linus Torvalds
2021-07-08 20:13         ` Hagen Paul Pfeifer
2021-07-09 15:44           ` Mike Rapoport
2021-07-08  1:08 ` [patch 12/54] PM: hibernate: disable when there are active secretmem users Andrew Morton
2021-07-08  3:15   ` Linus Torvalds
2021-07-08  3:15     ` Linus Torvalds
2021-07-08  5:30     ` Mike Rapoport
2021-07-08  1:08 ` [patch 13/54] arch, mm: wire up memfd_secret system call where relevant Andrew Morton
2021-07-08  1:08 ` [patch 14/54] secretmem: test: add basic selftest for memfd_secret(2) Andrew Morton
2021-07-08  1:08 ` [patch 15/54] mm: fix spelling mistakes in header files Andrew Morton
2021-07-08  1:08 ` [patch 16/54] mm: add setup_initial_init_mm() helper Andrew Morton
2021-07-08  1:08 ` [patch 17/54] arc: convert to setup_initial_init_mm() Andrew Morton
2021-07-08  1:08 ` [patch 18/54] arm: " Andrew Morton
2021-07-08  1:08 ` [patch 19/54] arm64: " Andrew Morton
2021-07-08  1:08 ` [patch 20/54] csky: " Andrew Morton
2021-07-08  1:08 ` [patch 21/54] h8300: " Andrew Morton
2021-07-08  1:08 ` [patch 22/54] m68k: " Andrew Morton
2021-07-08  1:08 ` [patch 23/54] nds32: " Andrew Morton
2021-07-08  1:08 ` [patch 24/54] nios2: " Andrew Morton
2021-07-08  1:08 ` [patch 25/54] openrisc: " Andrew Morton
2021-07-08  1:08 ` [patch 26/54] powerpc: " Andrew Morton
2021-07-08  4:46   ` Christophe Leroy
2021-07-08  1:08 ` [patch 27/54] riscv: " Andrew Morton
2021-07-08  1:08 ` [patch 28/54] s390: " Andrew Morton
2021-07-08  1:09 ` [patch 29/54] sh: " Andrew Morton
2021-07-08  1:09 ` [patch 30/54] x86: " Andrew Morton
2021-07-08  1:09 ` [patch 31/54] buildid: only consider GNU notes for build ID parsing Andrew Morton
2021-07-08  1:09 ` [patch 32/54] buildid: add API to parse build ID out of buffer Andrew Morton
2021-07-08  1:09 ` [patch 33/54] buildid: stash away kernels build ID on init Andrew Morton
2021-07-08  1:09 ` [patch 34/54] dump_stack: add vmlinux build ID to stack traces Andrew Morton
2021-07-08  1:09 ` [patch 35/54] module: add printk formats to add module build ID to stacktraces Andrew Morton
2021-07-08  1:09 ` [patch 36/54] arm64: stacktrace: use %pSb for backtrace printing Andrew Morton
2021-07-08  1:09 ` [patch 37/54] x86/dumpstack: use %pSb/%pBb " Andrew Morton
2021-07-08  1:09 ` [patch 38/54] scripts/decode_stacktrace.sh: support debuginfod Andrew Morton
2021-07-08  1:09 ` [patch 39/54] scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm Andrew Morton
2021-07-08  1:09 ` [patch 40/54] scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path Andrew Morton
2021-07-08  1:09 ` [patch 41/54] buildid: mark some arguments const Andrew Morton
2021-07-08  1:09 ` [patch 42/54] buildid: fix kernel-doc notation Andrew Morton
2021-07-08  1:09 ` [patch 43/54] kdump: use vmlinux_build_id to simplify Andrew Morton
2021-07-08  1:09 ` [patch 44/54] mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t * Andrew Morton
2021-07-08  1:09 ` [patch 45/54] mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t * Andrew Morton
2021-07-08  1:09 ` [patch 46/54] selftest/mremap_test: update the test to handle pagesize other than 4K Andrew Morton
2021-07-08  1:10 ` [patch 47/54] selftest/mremap_test: avoid crash with static build Andrew Morton
2021-07-08  1:10 ` [patch 48/54] mm/mremap: convert huge PUD move to separate helper Andrew Morton
2021-07-08  1:10 ` [patch 49/54] mm/mremap: don't enable optimized PUD move if page table levels is 2 Andrew Morton
2021-07-08  1:10 ` [patch 50/54] mm/mremap: use pmd/pud_poplulate to update page table entries Andrew Morton
2021-07-08  1:10 ` [patch 51/54] mm/mremap: hold the rmap lock in write mode when moving " Andrew Morton
2021-07-08  1:10 ` [patch 52/54] mm/mremap: allow arch runtime override Andrew Morton
2021-07-08  1:10 ` [patch 53/54] powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache Andrew Morton
2021-07-08  1:10 ` [patch 54/54] powerpc/mm: enable HAVE_MOVE_PMD support Andrew Morton
