Linux-mm Archive on lore.kernel.org
* incoming
@ 2021-02-26  1:14 Andrew Morton
  2021-02-26  1:15 ` [patch 001/118] mm: make pagecache tagged lookups return only head pages Andrew Morton
                   ` (118 more replies)
  0 siblings, 119 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm


- The rest of MM.

  Includes kfence - another runtime memory validator.  Not as
  thorough as KASAN, but it has unmeasurable overhead and is intended
  to be usable in production builds.

- Everything else

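  As a rough illustration of how KFENCE is meant to be deployed (a hedged
  sketch based on the Kconfig options and boot parameter added by the
  kfence patches below; the exact option names and defaults are defined
  there), a production build might look like:

	# build-time configuration
	CONFIG_KFENCE=y
	CONFIG_KFENCE_SAMPLE_INTERVAL=100

	# run-time tuning on the kernel command line
	kfence.sample_interval=500	# sample an allocation every 500ms
	kfence.sample_interval=0	# disable KFENCE entirely
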

118 patches, based on 6fbd6cf85a3be127454a1ad58525a3adcf8612ab.

Subsystems affected by this patch series:

  mm/thp
  mm/cma
  mm/vmstat
  mm/memory-hotplug
  mm/mlock
  mm/rmap
  mm/zswap
  mm/zsmalloc
  mm/cleanups
  mm/kfence
  mm/kasan2
  alpha
  procfs
  sysctl
  misc
  core-kernel
  MAINTAINERS
  lib
  bitops
  checkpatch
  init
  coredump
  seq_file
  gdb
  ubsan
  initramfs
  mm/pagemap2

Subsystem: mm/thp

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "Overhaul multi-page lookups for THP", v4:
      mm: make pagecache tagged lookups return only head pages
      mm/shmem: use pagevec_lookup in shmem_unlock_mapping
      mm/swap: optimise get_shadow_from_swap_cache
      mm: add FGP_ENTRY
      mm/filemap: rename find_get_entry to mapping_get_entry
      mm/filemap: add helper for finding pages
      mm/filemap: add mapping_seek_hole_data
      iomap: use mapping_seek_hole_data
      mm: add and use find_lock_entries
      mm: add an 'end' parameter to find_get_entries
      mm: add an 'end' parameter to pagevec_lookup_entries
      mm: remove nr_entries parameter from pagevec_lookup_entries
      mm: pass pvec directly to find_get_entries
      mm: remove pagevec_lookup_entries

    Rik van Riel <riel@surriel.com>:
    Patch series "mm,thp,shm: limit shmem THP alloc gfp_mask", v6:
      mm,thp,shmem: limit shmem THP alloc gfp_mask
      mm,thp,shm: limit gfp mask to no more than specified
      mm,thp,shmem: make khugepaged obey tmpfs mount flags
      mm,shmem,thp: limit shmem THP allocations to requested zones

Subsystem: mm/cma

    Roman Gushchin <guro@fb.com>:
      mm: cma: allocate cma areas bottom-up

    David Hildenbrand <david@redhat.com>:
      mm/cma: expose all pages to the buddy if activation of an area fails
      mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo

    Patrick Daly <pdaly@codeaurora.org>:
      mm: cma: print region name on failure

Subsystem: mm/vmstat

    Johannes Weiner <hannes@cmpxchg.org>:
      mm: vmstat: fix NOHZ wakeups for node stat changes
      mm: vmstat: add some comments on internal storage of byte items

    Jiang Biao <benbjiang@tencent.com>:
      mm/vmstat.c: erase latency in vmstat_shepherd

Subsystem: mm/memory-hotplug

    Dan Williams <dan.j.williams@intel.com>:
    Patch series "mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE", v4:
      mm: move pfn_to_online_page() out of line
      mm: teach pfn_to_online_page() to consider subsection validity
      mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions
      mm: fix memory_failure() handling of dax-namespace metadata

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm/memory_hotplug: rename all existing 'memhp' into 'mhp'

    David Hildenbrand <david@redhat.com>:
      mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/memory_hotplug: use helper function zone_end_pfn() to get end_pfn

    David Hildenbrand <david@redhat.com>:
      drivers/base/memory: don't store phys_device in memory blocks
      Documentation: sysfs/memory: clarify some memory block device properties

    Anshuman Khandual <anshuman.khandual@arm.com>:
    Patch series "mm/memory_hotplug: Pre-validate the address range with platform", v5:
      mm/memory_hotplug: prevalidate the address range being added with platform
      arm64/mm: define arch_get_mappable_range()
      s390/mm: define arch_get_mappable_range()

    David Hildenbrand <david@redhat.com>:
      virtio-mem: check against mhp_get_pluggable_range() which memory we can hotplug

Subsystem: mm/mlock

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/mlock: stop counting mlocked pages when none vma is found

Subsystem: mm/rmap

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/rmap: correct some obsolete comments of anon_vma
      mm/rmap: remove unneeded semicolon in page_not_mapped()
      mm/rmap: fix obsolete comment in __page_check_anon_rmap()
      mm/rmap: use page_not_mapped in try_to_unmap()
      mm/rmap: correct obsolete comment of page_get_anon_vma()
      mm/rmap: fix potential pte_unmap on an not mapped pte

Subsystem: mm/zswap

    Randy Dunlap <rdunlap@infradead.org>:
      mm: zswap: clean up confusing comment

    Tian Tao <tiantao6@hisilicon.com>:
    Patch series "Fix the compatibility of zsmalloc and zswap":
      mm/zswap: add the flag can_sleep_mapped
      mm: set the sleep_mapped to true for zbud and z3fold

Subsystem: mm/zsmalloc

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/zsmalloc.c: convert to use kmem_cache_zalloc in cache_alloc_zspage()

    Rokudo Yan <wu-yan@tcl.com>:
      zsmalloc: account the number of compacted pages correctly

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/zsmalloc.c: use page_private() to access page->private

Subsystem: mm/cleanups

    Guo Ren <guoren@linux.alibaba.com>:
      mm: page-flags.h: Typo fix (It -> If)

    Daniel Vetter <daniel.vetter@ffwll.ch>:
      mm/dmapool: use might_alloc()
      mm/backing-dev.c: use might_alloc()

    Stephen Zhang <stephenzhangzsd@gmail.com>:
      mm/early_ioremap.c: use __func__ instead of function name

Subsystem: mm/kfence

    Alexander Potapenko <glider@google.com>:
    Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7:
      mm: add Kernel Electric-Fence infrastructure
      x86, kfence: enable KFENCE for x86

    Marco Elver <elver@google.com>:
      arm64, kfence: enable KFENCE for ARM64
      kfence: use pt_regs to generate stack trace on faults

    Alexander Potapenko <glider@google.com>:
      mm, kfence: insert KFENCE hooks for SLAB
      mm, kfence: insert KFENCE hooks for SLUB
      kfence, kasan: make KFENCE compatible with KASAN

    Marco Elver <elver@google.com>:
      kfence, Documentation: add KFENCE documentation
      kfence: add test suite
      MAINTAINERS: add entry for KFENCE
      kfence: report sensitive information based on no_hash_pointers

    Alexander Potapenko <glider@google.com>:
    Patch series "Add error_report_end tracepoint to KFENCE and KASAN", v3:
      tracing: add error_report_end trace point
      kfence: use error_report_end tracepoint
      kasan: use error_report_end tracepoint

Subsystem: mm/kasan2

    Andrey Konovalov <andreyknvl@google.com>:
    Patch series "kasan: optimizations and fixes for HW_TAGS", v4:
      kasan, mm: don't save alloc stacks twice
      kasan, mm: optimize kmalloc poisoning
      kasan: optimize large kmalloc poisoning
      kasan: clean up setting free info in kasan_slab_free
      kasan: unify large kfree checks
      kasan: rework krealloc tests
      kasan, mm: fail krealloc on freed objects
      kasan, mm: optimize krealloc poisoning
      kasan: ensure poisoning size alignment
      arm64: kasan: simplify and inline MTE functions
      kasan: inline HW_TAGS helper functions
      kasan: clarify that only first bug is reported in HW_TAGS

Subsystem: alpha

    Randy Dunlap <rdunlap@infradead.org>:
      alpha: remove CONFIG_EXPERIMENTAL from defconfigs

Subsystem: procfs

    Helge Deller <deller@gmx.de>:
      proc/wchan: use printk format instead of lookup_symbol_name()

    Josef Bacik <josef@toxicpanda.com>:
      proc: use kvzalloc for our kernel buffer

Subsystem: sysctl

    Lin Feng <linf@wangsu.com>:
      sysctl.c: fix underflow value setting risk in vm_table

Subsystem: misc

    Randy Dunlap <rdunlap@infradead.org>:
      include/linux: remove repeated words

    Miguel Ojeda <ojeda@kernel.org>:
      treewide: Miguel has moved

Subsystem: core-kernel

    Hubert Jasudowicz <hubert.jasudowicz@gmail.com>:
      groups: use flexible-array member in struct group_info
      groups: simplify struct group_info allocation

    Randy Dunlap <rdunlap@infradead.org>:
      kernel: delete repeated words in comments

Subsystem: MAINTAINERS

    Vlastimil Babka <vbabka@suse.cz>:
      MAINTAINERS: add uapi directories to API/ABI section

Subsystem: lib

    Huang Shijie <sjhuang@iluvatar.ai>:
      lib/genalloc.c: change return type to unsigned long for bitmap_set_ll

    Francis Laniel <laniel_francis@privacyrequired.com>:
      string.h: move fortified functions definitions in a dedicated header.

    Yogesh Lal <ylal@codeaurora.org>:
      lib: stackdepot: add support to configure STACK_HASH_SIZE

    Vijayanand Jitta <vjitta@codeaurora.org>:
      lib: stackdepot: add support to disable stack depot
      lib: stackdepot: fix ignoring return value warning

    Masahiro Yamada <masahiroy@kernel.org>:
      lib/cmdline: remove an unneeded local variable in next_arg()

Subsystem: bitops

    Geert Uytterhoeven <geert+renesas@glider.be>:
      include/linux/bitops.h: spelling s/synomyn/synonym/

Subsystem: checkpatch

    Joe Perches <joe@perches.com>:
      checkpatch: improve blank line after declaration test

    Peng Wang <rocking@linux.alibaba.com>:
      checkpatch: ignore warning designated initializers using NR_CPUS

    Dwaipayan Ray <dwaipayanray1@gmail.com>:
      checkpatch: trivial style fixes

    Joe Perches <joe@perches.com>:
      checkpatch: prefer ftrace over function entry/exit printks
      checkpatch: improve TYPECAST_INT_CONSTANT test message

    Aditya Srivastava <yashsri421@gmail.com>:
      checkpatch: add warning for avoiding .L prefix symbols in assembly files

    Joe Perches <joe@perches.com>:
      checkpatch: add kmalloc_array_node to unnecessary OOM message check

    Chris Down <chris@chrisdown.name>:
      checkpatch: don't warn about colon termination in linker scripts

    Song Liu <songliubraving@fb.com>:
      checkpatch: do not apply "initialise globals to 0" check to BPF progs

Subsystem: init

    Masahiro Yamada <masahiroy@kernel.org>:
      init/version.c: remove Version_<LINUX_VERSION_CODE> symbol
      init: clean up early_param_on_off() macro

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      init/Kconfig: fix a typo in CC_VERSION_TEXT help text

Subsystem: coredump

    Ira Weiny <ira.weiny@intel.com>:
      fs/coredump: use kmap_local_page()

Subsystem: seq_file

    NeilBrown <neilb@suse.de>:
    Patch series "Fix some seq_file users that were recently broken":
      seq_file: document how per-entry resources are managed.
      x86: fix seq_file iteration for pat/memtype.c

Subsystem: gdb

    George Prekas <prekageo@amazon.com>:
      scripts/gdb: fix list_for_each

    Sumit Garg <sumit.garg@linaro.org>:
      kgdb: fix to kill breakpoints on initmem after boot

Subsystem: ubsan

    Andrey Ryabinin <ryabinin.a.a@gmail.com>:
      ubsan: remove overflow checks

Subsystem: initramfs

    Florian Fainelli <f.fainelli@gmail.com>:
      initramfs: panic with memory information

Subsystem: mm/pagemap2

    Huang Pei <huangpei@loongson.cn>:
      MIPS: make userspace mapping young by default

 .mailmap                                            |    1 
 CREDITS                                             |    9 
 Documentation/ABI/testing/sysfs-devices-memory      |   58 -
 Documentation/admin-guide/auxdisplay/cfag12864b.rst |    2 
 Documentation/admin-guide/auxdisplay/ks0108.rst     |    2 
 Documentation/admin-guide/kernel-parameters.txt     |    6 
 Documentation/admin-guide/mm/memory-hotplug.rst     |   20 
 Documentation/dev-tools/index.rst                   |    1 
 Documentation/dev-tools/kasan.rst                   |    8 
 Documentation/dev-tools/kfence.rst                  |  318 +++++++
 Documentation/filesystems/seq_file.rst              |    6 
 MAINTAINERS                                         |   26 
 arch/alpha/configs/defconfig                        |    1 
 arch/arm64/Kconfig                                  |    1 
 arch/arm64/include/asm/cache.h                      |    1 
 arch/arm64/include/asm/kasan.h                      |    1 
 arch/arm64/include/asm/kfence.h                     |   26 
 arch/arm64/include/asm/mte-def.h                    |    2 
 arch/arm64/include/asm/mte-kasan.h                  |   65 +
 arch/arm64/include/asm/mte.h                        |    2 
 arch/arm64/kernel/mte.c                             |   46 -
 arch/arm64/lib/mte.S                                |   16 
 arch/arm64/mm/fault.c                               |    8 
 arch/arm64/mm/mmu.c                                 |   23 
 arch/mips/mm/cache.c                                |   30 
 arch/s390/mm/init.c                                 |    1 
 arch/s390/mm/vmem.c                                 |   14 
 arch/x86/Kconfig                                    |    1 
 arch/x86/include/asm/kfence.h                       |   76 +
 arch/x86/mm/fault.c                                 |   10 
 arch/x86/mm/pat/memtype.c                           |    4 
 drivers/auxdisplay/cfag12864b.c                     |    4 
 drivers/auxdisplay/cfag12864bfb.c                   |    4 
 drivers/auxdisplay/ks0108.c                         |    4 
 drivers/base/memory.c                               |   35 
 drivers/block/zram/zram_drv.c                       |    2 
 drivers/hv/hv_balloon.c                             |    2 
 drivers/virtio/virtio_mem.c                         |   43 
 drivers/xen/balloon.c                               |    2 
 fs/coredump.c                                       |    4 
 fs/iomap/seek.c                                     |  125 --
 fs/proc/base.c                                      |   21 
 fs/proc/proc_sysctl.c                               |    4 
 include/linux/bitops.h                              |    2 
 include/linux/cfag12864b.h                          |    2 
 include/linux/cred.h                                |    2 
 include/linux/fortify-string.h                      |  302 ++++++
 include/linux/gfp.h                                 |    2 
 include/linux/init.h                                |    4 
 include/linux/kasan.h                               |   25 
 include/linux/kfence.h                              |  230 +++++
 include/linux/kgdb.h                                |    2 
 include/linux/khugepaged.h                          |    2 
 include/linux/ks0108.h                              |    2 
 include/linux/mdev.h                                |    2 
 include/linux/memory.h                              |    3 
 include/linux/memory_hotplug.h                      |   33 
 include/linux/memremap.h                            |    6 
 include/linux/mmzone.h                              |   49 -
 include/linux/page-flags.h                          |    4 
 include/linux/pagemap.h                             |   10 
 include/linux/pagevec.h                             |   10 
 include/linux/pgtable.h                             |    8 
 include/linux/ptrace.h                              |    2 
 include/linux/rmap.h                                |    3 
 include/linux/slab_def.h                            |    3 
 include/linux/slub_def.h                            |    3 
 include/linux/stackdepot.h                          |    9 
 include/linux/string.h                              |  282 ------
 include/linux/vmstat.h                              |    6 
 include/linux/zpool.h                               |    3 
 include/linux/zsmalloc.h                            |    2 
 include/trace/events/error_report.h                 |   74 +
 include/uapi/linux/firewire-cdev.h                  |    2 
 include/uapi/linux/input.h                          |    2 
 init/Kconfig                                        |    2 
 init/initramfs.c                                    |   19 
 init/main.c                                         |    6 
 init/version.c                                      |    8 
 kernel/debug/debug_core.c                           |   11 
 kernel/events/core.c                                |    8 
 kernel/events/uprobes.c                             |    2 
 kernel/groups.c                                     |    7 
 kernel/locking/rtmutex.c                            |    4 
 kernel/locking/rwsem.c                              |    2 
 kernel/locking/semaphore.c                          |    2 
 kernel/sched/fair.c                                 |    2 
 kernel/sched/membarrier.c                           |    2 
 kernel/sysctl.c                                     |    8 
 kernel/trace/Makefile                               |    1 
 kernel/trace/error_report-traces.c                  |   12 
 lib/Kconfig                                         |    9 
 lib/Kconfig.debug                                   |    1 
 lib/Kconfig.kfence                                  |   84 +
 lib/Kconfig.ubsan                                   |   17 
 lib/cmdline.c                                       |    7 
 lib/genalloc.c                                      |    3 
 lib/stackdepot.c                                    |   41 
 lib/test_kasan.c                                    |  111 ++
 lib/test_ubsan.c                                    |   49 -
 lib/ubsan.c                                         |   68 -
 mm/Makefile                                         |    1 
 mm/backing-dev.c                                    |    3 
 mm/cma.c                                            |   64 -
 mm/dmapool.c                                        |    3 
 mm/early_ioremap.c                                  |   12 
 mm/filemap.c                                        |  361 +++++---
 mm/huge_memory.c                                    |    6 
 mm/internal.h                                       |    6 
 mm/kasan/common.c                                   |  213 +++-
 mm/kasan/generic.c                                  |    3 
 mm/kasan/hw_tags.c                                  |    2 
 mm/kasan/kasan.h                                    |   97 +-
 mm/kasan/report.c                                   |    8 
 mm/kasan/shadow.c                                   |   78 +
 mm/kfence/Makefile                                  |    6 
 mm/kfence/core.c                                    |  875 +++++++++++++++++++-
 mm/kfence/kfence.h                                  |  126 ++
 mm/kfence/kfence_test.c                             |  860 +++++++++++++++++++
 mm/kfence/report.c                                  |  350 ++++++--
 mm/khugepaged.c                                     |   22 
 mm/memory-failure.c                                 |    6 
 mm/memory.c                                         |    4 
 mm/memory_hotplug.c                                 |  178 +++-
 mm/memremap.c                                       |   23 
 mm/mlock.c                                          |    2 
 mm/page_alloc.c                                     |    1 
 mm/rmap.c                                           |   24 
 mm/shmem.c                                          |  160 +--
 mm/slab.c                                           |   38 
 mm/slab_common.c                                    |   29 
 mm/slub.c                                           |   63 +
 mm/swap.c                                           |   54 -
 mm/swap_state.c                                     |    7 
 mm/truncate.c                                       |  141 ---
 mm/vmstat.c                                         |   35 
 mm/z3fold.c                                         |    1 
 mm/zbud.c                                           |    1 
 mm/zpool.c                                          |   13 
 mm/zsmalloc.c                                       |   22 
 mm/zswap.c                                          |   57 +
 samples/auxdisplay/cfag12864b-example.c             |    2 
 scripts/Makefile.ubsan                              |    2 
 scripts/checkpatch.pl                               |  152 ++-
 scripts/gdb/linux/lists.py                          |    5 
 145 files changed, 5046 insertions(+), 1682 deletions(-)




* [patch 001/118] mm: make pagecache tagged lookups return only head pages
  2021-02-26  1:14 incoming Andrew Morton
@ 2021-02-26  1:15 ` Andrew Morton
  2021-02-26  1:15 ` [patch 002/118] mm/shmem: use pagevec_lookup in shmem_unlock_mapping Andrew Morton
                   ` (117 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:15 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: make pagecache tagged lookups return only head pages

Patch series "Overhaul multi-page lookups for THP", v4.

This THP prep patchset changes several page cache iteration APIs to only
return head pages.

 - It's only possible to tag head pages in the page cache, so only
   return head pages, not all their subpages.
 - Factor a lot of common code out of the various batch lookup routines
 - Add mapping_seek_hole_data()
 - Unify find_get_entries() and pagevec_lookup_entries()
 - Make find_get_entries only return head pages, like find_get_entry().

These are only loosely connected, but they seem to make sense together as
a series.


This patch (of 14):

Pagecache tags are used for dirty page writeback.  Since dirtiness is
tracked on a per-THP basis, we only want to return the head page rather
than each subpage of a tagged page.  All the filesystems which use huge
pages today are in-memory, so there are no tagged huge pages today.

Link: https://lkml.kernel.org/r/20201112212641.27837-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/filemap.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/mm/filemap.c~mm-make-pagecache-tagged-lookups-return-only-head-pages
+++ a/mm/filemap.c
@@ -2062,7 +2062,7 @@ retry:
 EXPORT_SYMBOL(find_get_pages_contig);
 
 /**
- * find_get_pages_range_tag - find and return pages in given range matching @tag
+ * find_get_pages_range_tag - Find and return head pages matching @tag.
  * @mapping:	the address_space to search
  * @index:	the starting page index
  * @end:	The final page index (inclusive)
@@ -2070,8 +2070,9 @@ EXPORT_SYMBOL(find_get_pages_contig);
  * @nr_pages:	the maximum number of pages
  * @pages:	where the resulting pages are placed
  *
- * Like find_get_pages, except we only return pages which are tagged with
- * @tag.   We update @index to index the next page for the traversal.
+ * Like find_get_pages(), except we only return head pages which are tagged
+ * with @tag.  @index is updated to the index immediately after the last
+ * page we return, ready for the next iteration.
  *
  * Return: the number of pages which were found.
  */
@@ -2105,9 +2106,9 @@ unsigned find_get_pages_range_tag(struct
 		if (unlikely(page != xas_reload(&xas)))
 			goto put_page;
 
-		pages[ret] = find_subpage(page, xas.xa_index);
+		pages[ret] = page;
 		if (++ret == nr_pages) {
-			*index = xas.xa_index + 1;
+			*index = page->index + thp_nr_pages(page);
 			goto out;
 		}
 		continue;
_
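
For context, callers of find_get_pages_range_tag() now receive one entry
per THP instead of one entry per subpage.  A minimal, illustrative loop
under the new semantics (not part of this patch; process_dirty_page() is a
hypothetical stand-in for whatever a real caller does with each page):

	struct page *pages[16];
	pgoff_t index = 0;
	unsigned int i, nr;

	while ((nr = find_get_pages_range_tag(mapping, &index, (pgoff_t)-1,
					PAGECACHE_TAG_DIRTY, 16, pages))) {
		for (i = 0; i < nr; i++) {
			/* pages[i] is a head page covering thp_nr_pages() slots */
			process_dirty_page(pages[i]);
			put_page(pages[i]);
		}
	}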



* [patch 002/118] mm/shmem: use pagevec_lookup in shmem_unlock_mapping
  2021-02-26  1:14 incoming Andrew Morton
  2021-02-26  1:15 ` [patch 001/118] mm: make pagecache tagged lookups return only head pages Andrew Morton
@ 2021-02-26  1:15 ` Andrew Morton
  2021-02-26  1:15 ` [patch 003/118] mm/swap: optimise get_shadow_from_swap_cache Andrew Morton
                   ` (116 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:15 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm/shmem: use pagevec_lookup in shmem_unlock_mapping

The comment shows that the reason for using find_get_entries() is now
stale; find_get_pages() will not return 0 if it hits a consecutive run of
swap entries, and I don't believe it has since 2011.  pagevec_lookup() is
a simpler function to use than find_get_pages(), so use it instead.

Link: https://lkml.kernel.org/r/20201112212641.27837-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/shmem.c |   11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

--- a/mm/shmem.c~mm-shmem-use-pagevec_lookup-in-shmem_unlock_mapping
+++ a/mm/shmem.c
@@ -842,7 +842,6 @@ unsigned long shmem_swap_usage(struct vm
 void shmem_unlock_mapping(struct address_space *mapping)
 {
 	struct pagevec pvec;
-	pgoff_t indices[PAGEVEC_SIZE];
 	pgoff_t index = 0;
 
 	pagevec_init(&pvec);
@@ -850,16 +849,8 @@ void shmem_unlock_mapping(struct address
 	 * Minor point, but we might as well stop if someone else SHM_LOCKs it.
 	 */
 	while (!mapping_unevictable(mapping)) {
-		/*
-		 * Avoid pagevec_lookup(): find_get_pages() returns 0 as if it
-		 * has finished, if it hits a row of PAGEVEC_SIZE swap entries.
-		 */
-		pvec.nr = find_get_entries(mapping, index,
-					   PAGEVEC_SIZE, pvec.pages, indices);
-		if (!pvec.nr)
+		if (!pagevec_lookup(&pvec, mapping, &index))
 			break;
-		index = indices[pvec.nr - 1] + 1;
-		pagevec_remove_exceptionals(&pvec);
 		check_move_unevictable_pages(&pvec);
 		pagevec_release(&pvec);
 		cond_resched();
_
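
pagevec_lookup() fills the pagevec with up to PAGEVEC_SIZE pages starting
at *index and advances *index past the last page returned.  A generic
iteration over a mapping (a hedged sketch, not part of this patch;
do_something() is hypothetical) looks like:

	struct pagevec pvec;
	pgoff_t index = 0;
	unsigned int i;

	pagevec_init(&pvec);
	while (pagevec_lookup(&pvec, mapping, &index)) {
		for (i = 0; i < pagevec_count(&pvec); i++)
			do_something(pvec.pages[i]);
		pagevec_release(&pvec);
		cond_resched();
	}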



* [patch 003/118] mm/swap: optimise get_shadow_from_swap_cache
  2021-02-26  1:14 incoming Andrew Morton
  2021-02-26  1:15 ` [patch 001/118] mm: make pagecache tagged lookups return only head pages Andrew Morton
  2021-02-26  1:15 ` [patch 002/118] mm/shmem: use pagevec_lookup in shmem_unlock_mapping Andrew Morton
@ 2021-02-26  1:15 ` Andrew Morton
  2021-02-26  1:15 ` [patch 004/118] mm: add FGP_ENTRY Andrew Morton
                   ` (115 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:15 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm/swap: optimise get_shadow_from_swap_cache

There's no need to get a reference to the page, just load the entry and
see if it's a shadow entry.

Link: https://lkml.kernel.org/r/20201112212641.27837-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/swap_state.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/mm/swap_state.c~mm-swap-optimise-get_shadow_from_swap_cache
+++ a/mm/swap_state.c
@@ -87,11 +87,9 @@ void *get_shadow_from_swap_cache(swp_ent
 	pgoff_t idx = swp_offset(entry);
 	struct page *page;
 
-	page = find_get_entry(address_space, idx);
+	page = xa_load(&address_space->i_pages, idx);
 	if (xa_is_value(page))
 		return page;
-	if (page)
-		put_page(page);
 	return NULL;
 }
 
_



* [patch 004/118] mm: add FGP_ENTRY
  2021-02-26  1:14 incoming Andrew Morton
                   ` (2 preceding siblings ...)
  2021-02-26  1:15 ` [patch 003/118] mm/swap: optimise get_shadow_from_swap_cache Andrew Morton
@ 2021-02-26  1:15 ` Andrew Morton
  2021-02-26  1:15 ` [patch 005/118] mm/filemap: rename find_get_entry to mapping_get_entry Andrew Morton
                   ` (114 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:15 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: add FGP_ENTRY

The functionality of find_lock_entry() and find_get_entry() can be
provided by pagecache_get_page(), which lets us delete find_lock_entry()
and make find_get_entry() static.

Link: https://lkml.kernel.org/r/20201112212641.27837-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/pagemap.h |    1 
 mm/filemap.c            |   44 ++++++--------------------------------
 mm/internal.h           |    3 --
 mm/shmem.c              |    3 +-
 mm/swap_state.c         |    3 +-
 5 files changed, 13 insertions(+), 41 deletions(-)

--- a/include/linux/pagemap.h~mm-add-fgp_entry
+++ a/include/linux/pagemap.h
@@ -315,6 +315,7 @@ pgoff_t page_cache_prev_miss(struct addr
 #define FGP_NOWAIT		0x00000020
 #define FGP_FOR_MMAP		0x00000040
 #define FGP_HEAD		0x00000080
+#define FGP_ENTRY		0x00000100
 
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 		int fgp_flags, gfp_t cache_gfp_mask);
--- a/mm/filemap.c~mm-add-fgp_entry
+++ a/mm/filemap.c
@@ -1658,7 +1658,7 @@ pgoff_t page_cache_prev_miss(struct addr
 }
 EXPORT_SYMBOL(page_cache_prev_miss);
 
-/**
+/*
  * find_get_entry - find and get a page cache entry
  * @mapping: the address_space to search
  * @index: The page cache index.
@@ -1671,7 +1671,7 @@ EXPORT_SYMBOL(page_cache_prev_miss);
  *
  * Return: The head page or shadow entry, %NULL if nothing is found.
  */
-struct page *find_get_entry(struct address_space *mapping, pgoff_t index)
+static struct page *find_get_entry(struct address_space *mapping, pgoff_t index)
 {
 	XA_STATE(xas, &mapping->i_pages, index);
 	struct page *page;
@@ -1708,39 +1708,6 @@ out:
 }
 
 /**
- * find_lock_entry - Locate and lock a page cache entry.
- * @mapping: The address_space to search.
- * @index: The page cache index.
- *
- * Looks up the page at @mapping & @index.  If there is a page in the
- * cache, the head page is returned locked and with an increased refcount.
- *
- * If the slot holds a shadow entry of a previously evicted page, or a
- * swap entry from shmem/tmpfs, it is returned.
- *
- * Context: May sleep.
- * Return: The head page or shadow entry, %NULL if nothing is found.
- */
-struct page *find_lock_entry(struct address_space *mapping, pgoff_t index)
-{
-	struct page *page;
-
-repeat:
-	page = find_get_entry(mapping, index);
-	if (page && !xa_is_value(page)) {
-		lock_page(page);
-		/* Has the page been truncated? */
-		if (unlikely(page->mapping != mapping)) {
-			unlock_page(page);
-			put_page(page);
-			goto repeat;
-		}
-		VM_BUG_ON_PAGE(!thp_contains(page, index), page);
-	}
-	return page;
-}
-
-/**
  * pagecache_get_page - Find and get a reference to a page.
  * @mapping: The address_space to search.
  * @index: The page index.
@@ -1755,6 +1722,8 @@ repeat:
  * * %FGP_LOCK - The page is returned locked.
  * * %FGP_HEAD - If the page is present and a THP, return the head page
  *   rather than the exact page specified by the index.
+ * * %FGP_ENTRY - If there is a shadow / swap / DAX entry, return it
+ *   instead of allocating a new page to replace it.
  * * %FGP_CREAT - If no page is present then a new page is allocated using
  *   @gfp_mask and added to the page cache and the VM's LRU list.
  *   The page is returned locked and with an increased refcount.
@@ -1779,8 +1748,11 @@ struct page *pagecache_get_page(struct a
 
 repeat:
 	page = find_get_entry(mapping, index);
-	if (xa_is_value(page))
+	if (xa_is_value(page)) {
+		if (fgp_flags & FGP_ENTRY)
+			return page;
 		page = NULL;
+	}
 	if (!page)
 		goto no_page;
 
--- a/mm/internal.h~mm-add-fgp_entry
+++ a/mm/internal.h
@@ -60,9 +60,6 @@ static inline void force_page_cache_read
 	force_page_cache_ra(&ractl, &file->f_ra, nr_to_read);
 }
 
-struct page *find_get_entry(struct address_space *mapping, pgoff_t index);
-struct page *find_lock_entry(struct address_space *mapping, pgoff_t index);
-
 /**
  * page_evictable - test whether a page is evictable
  * @page: the page to test
--- a/mm/shmem.c~mm-add-fgp_entry
+++ a/mm/shmem.c
@@ -1812,7 +1812,8 @@ repeat:
 	sbinfo = SHMEM_SB(inode->i_sb);
 	charge_mm = vma ? vma->vm_mm : current->mm;
 
-	page = find_lock_entry(mapping, index);
+	page = pagecache_get_page(mapping, index,
+					FGP_ENTRY | FGP_HEAD | FGP_LOCK, 0);
 	if (xa_is_value(page)) {
 		error = shmem_swapin_page(inode, index, &page,
 					  sgp, gfp, vma, fault_type);
--- a/mm/swap_state.c~mm-add-fgp_entry
+++ a/mm/swap_state.c
@@ -403,7 +403,8 @@ struct page *find_get_incore_page(struct
 {
 	swp_entry_t swp;
 	struct swap_info_struct *si;
-	struct page *page = find_get_entry(mapping, index);
+	struct page *page = pagecache_get_page(mapping, index,
+						FGP_ENTRY | FGP_HEAD, 0);
 
 	if (!page)
 		return page;
_
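
The conversion pattern for former find_lock_entry() callers follows the
shmem and swap_state hunks above; as a hedged, self-contained sketch
(handle_value_entry() is a hypothetical stand-in for the caller's
shadow/swap handling):

	struct page *page;

	page = pagecache_get_page(mapping, index,
				  FGP_ENTRY | FGP_HEAD | FGP_LOCK, 0);
	if (xa_is_value(page)) {
		/* Shadow, swap or DAX entry: no reference was taken. */
		return handle_value_entry(mapping, index, page);
	}
	if (page) {
		/* A real head page, returned locked with a reference held. */
		/* ... use the page ... */
		unlock_page(page);
		put_page(page);
	}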



* [patch 005/118] mm/filemap: rename find_get_entry to mapping_get_entry
  2021-02-26  1:14 incoming Andrew Morton
                   ` (3 preceding siblings ...)
  2021-02-26  1:15 ` [patch 004/118] mm: add FGP_ENTRY Andrew Morton
@ 2021-02-26  1:15 ` Andrew Morton
  2021-02-26  1:15 ` [patch 006/118] mm/filemap: add helper for finding pages Andrew Morton
                   ` (113 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:15 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm/filemap: rename find_get_entry to mapping_get_entry

find_get_entry doesn't "find" anything.  It returns the entry at a
particular index.

Link: https://lkml.kernel.org/r/20201112212641.27837-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/filemap.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/mm/filemap.c~mm-filemap-rename-find_get_entry-to-mapping_get_entry
+++ a/mm/filemap.c
@@ -1659,7 +1659,7 @@ pgoff_t page_cache_prev_miss(struct addr
 EXPORT_SYMBOL(page_cache_prev_miss);
 
 /*
- * find_get_entry - find and get a page cache entry
+ * mapping_get_entry - Get a page cache entry.
  * @mapping: the address_space to search
  * @index: The page cache index.
  *
@@ -1671,7 +1671,8 @@ EXPORT_SYMBOL(page_cache_prev_miss);
  *
  * Return: The head page or shadow entry, %NULL if nothing is found.
  */
-static struct page *find_get_entry(struct address_space *mapping, pgoff_t index)
+static struct page *mapping_get_entry(struct address_space *mapping,
+		pgoff_t index)
 {
 	XA_STATE(xas, &mapping->i_pages, index);
 	struct page *page;
@@ -1747,7 +1748,7 @@ struct page *pagecache_get_page(struct a
 	struct page *page;
 
 repeat:
-	page = find_get_entry(mapping, index);
+	page = mapping_get_entry(mapping, index);
 	if (xa_is_value(page)) {
 		if (fgp_flags & FGP_ENTRY)
 			return page;
_



* [patch 006/118] mm/filemap: add helper for finding pages
  2021-02-26  1:14 incoming Andrew Morton
                   ` (4 preceding siblings ...)
  2021-02-26  1:15 ` [patch 005/118] mm/filemap: rename find_get_entry to mapping_get_entry Andrew Morton
@ 2021-02-26  1:15 ` Andrew Morton
  2021-02-26  1:15 ` [patch 007/118] mm/filemap: add mapping_seek_hole_data Andrew Morton
                   ` (112 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:15 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm/filemap: add helper for finding pages

There is a lot of common code in find_get_entries(),
find_get_pages_range() and find_get_pages_range_tag().  Factor out
find_get_entry() which simplifies all three functions.

[willy@infradead.org: remove VM_BUG_ON_PAGE()]
  Link: https://lkml.kernel.org/r/20201124041507.28996-2-willy@infradead.org
Link: https://lkml.kernel.org/r/20201112212641.27837-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/filemap.c |   97 +++++++++++++++++++++----------------------------
 1 file changed, 42 insertions(+), 55 deletions(-)

--- a/mm/filemap.c~mm-filemap-add-helper-for-finding-pages
+++ a/mm/filemap.c
@@ -1825,6 +1825,42 @@ no_page:
 }
 EXPORT_SYMBOL(pagecache_get_page);
 
+static inline struct page *find_get_entry(struct xa_state *xas, pgoff_t max,
+		xa_mark_t mark)
+{
+	struct page *page;
+
+retry:
+	if (mark == XA_PRESENT)
+		page = xas_find(xas, max);
+	else
+		page = xas_find_marked(xas, max, mark);
+
+	if (xas_retry(xas, page))
+		goto retry;
+	/*
+	 * A shadow entry of a recently evicted page, a swap
+	 * entry from shmem/tmpfs or a DAX entry.  Return it
+	 * without attempting to raise page count.
+	 */
+	if (!page || xa_is_value(page))
+		return page;
+
+	if (!page_cache_get_speculative(page))
+		goto reset;
+
+	/* Has the page moved or been split? */
+	if (unlikely(page != xas_reload(xas))) {
+		put_page(page);
+		goto reset;
+	}
+
+	return page;
+reset:
+	xas_reset(xas);
+	goto retry;
+}
+
 /**
  * find_get_entries - gang pagecache lookup
  * @mapping:	The address_space to search
@@ -1864,42 +1900,21 @@ unsigned find_get_entries(struct address
 		return 0;
 
 	rcu_read_lock();
-	xas_for_each(&xas, page, ULONG_MAX) {
-		if (xas_retry(&xas, page))
-			continue;
-		/*
-		 * A shadow entry of a recently evicted page, a swap
-		 * entry from shmem/tmpfs or a DAX entry.  Return it
-		 * without attempting to raise page count.
-		 */
-		if (xa_is_value(page))
-			goto export;
-
-		if (!page_cache_get_speculative(page))
-			goto retry;
-
-		/* Has the page moved or been split? */
-		if (unlikely(page != xas_reload(&xas)))
-			goto put_page;
-
+	while ((page = find_get_entry(&xas, ULONG_MAX, XA_PRESENT))) {
 		/*
 		 * Terminate early on finding a THP, to allow the caller to
 		 * handle it all at once; but continue if this is hugetlbfs.
 		 */
-		if (PageTransHuge(page) && !PageHuge(page)) {
+		if (!xa_is_value(page) && PageTransHuge(page) &&
+				!PageHuge(page)) {
 			page = find_subpage(page, xas.xa_index);
 			nr_entries = ret + 1;
 		}
-export:
+
 		indices[ret] = xas.xa_index;
 		entries[ret] = page;
 		if (++ret == nr_entries)
 			break;
-		continue;
-put_page:
-		put_page(page);
-retry:
-		xas_reset(&xas);
 	}
 	rcu_read_unlock();
 	return ret;
@@ -1938,30 +1953,16 @@ unsigned find_get_pages_range(struct add
 		return 0;
 
 	rcu_read_lock();
-	xas_for_each(&xas, page, end) {
-		if (xas_retry(&xas, page))
-			continue;
+	while ((page = find_get_entry(&xas, end, XA_PRESENT))) {
 		/* Skip over shadow, swap and DAX entries */
 		if (xa_is_value(page))
 			continue;
 
-		if (!page_cache_get_speculative(page))
-			goto retry;
-
-		/* Has the page moved or been split? */
-		if (unlikely(page != xas_reload(&xas)))
-			goto put_page;
-
 		pages[ret] = find_subpage(page, xas.xa_index);
 		if (++ret == nr_pages) {
 			*start = xas.xa_index + 1;
 			goto out;
 		}
-		continue;
-put_page:
-		put_page(page);
-retry:
-		xas_reset(&xas);
 	}
 
 	/*
@@ -2061,9 +2062,7 @@ unsigned find_get_pages_range_tag(struct
 		return 0;
 
 	rcu_read_lock();
-	xas_for_each_marked(&xas, page, end, tag) {
-		if (xas_retry(&xas, page))
-			continue;
+	while ((page = find_get_entry(&xas, end, tag))) {
 		/*
 		 * Shadow entries should never be tagged, but this iteration
 		 * is lockless so there is a window for page reclaim to evict
@@ -2072,23 +2071,11 @@ unsigned find_get_pages_range_tag(struct
 		if (xa_is_value(page))
 			continue;
 
-		if (!page_cache_get_speculative(page))
-			goto retry;
-
-		/* Has the page moved or been split? */
-		if (unlikely(page != xas_reload(&xas)))
-			goto put_page;
-
 		pages[ret] = page;
 		if (++ret == nr_pages) {
 			*index = page->index + thp_nr_pages(page);
 			goto out;
 		}
-		continue;
-put_page:
-		put_page(page);
-retry:
-		xas_reset(&xas);
 	}
 
 	/*
_



* [patch 007/118] mm/filemap: add mapping_seek_hole_data
  2021-02-26  1:14 incoming Andrew Morton
                   ` (5 preceding siblings ...)
  2021-02-26  1:15 ` [patch 006/118] mm/filemap: add helper for finding pages Andrew Morton
@ 2021-02-26  1:15 ` Andrew Morton
  2021-02-26  1:15 ` [patch 008/118] iomap: use mapping_seek_hole_data Andrew Morton
                   ` (111 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:15 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm/filemap: add mapping_seek_hole_data

Rewrite shmem_seek_hole_data() and move it to filemap.c.

[willy@infradead.org: don't put an xa_is_value() page]
  Link: https://lkml.kernel.org/r/20201124041507.28996-4-willy@infradead.org
Link: https://lkml.kernel.org/r/20201112212641.27837-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/pagemap.h |    2 +
 mm/filemap.c            |   76 ++++++++++++++++++++++++++++++++++++++
 mm/shmem.c              |   74 ++-----------------------------------
 3 files changed, 82 insertions(+), 70 deletions(-)

--- a/include/linux/pagemap.h~mm-filemap-add-mapping_seek_hole_data
+++ a/include/linux/pagemap.h
@@ -760,6 +760,8 @@ extern void __delete_from_page_cache(str
 void replace_page_cache_page(struct page *old, struct page *new);
 void delete_from_page_cache_batch(struct address_space *mapping,
 				  struct pagevec *pvec);
+loff_t mapping_seek_hole_data(struct address_space *, loff_t start, loff_t end,
+		int whence);
 
 /*
  * Like add_to_page_cache_locked, but used to add newly allocated pages:
--- a/mm/filemap.c~mm-filemap-add-mapping_seek_hole_data
+++ a/mm/filemap.c
@@ -2553,6 +2553,82 @@ generic_file_read_iter(struct kiocb *ioc
 }
 EXPORT_SYMBOL(generic_file_read_iter);
 
+static inline bool page_seek_match(struct page *page, bool seek_data)
+{
+	if (xa_is_value(page) || PageUptodate(page))
+		return seek_data;
+	return !seek_data;
+}
+
+static inline
+unsigned int seek_page_size(struct xa_state *xas, struct page *page)
+{
+	if (xa_is_value(page))
+		return PAGE_SIZE << xa_get_order(xas->xa, xas->xa_index);
+	return thp_size(page);
+}
+
+/**
+ * mapping_seek_hole_data - Seek for SEEK_DATA / SEEK_HOLE in the page cache.
+ * @mapping: Address space to search.
+ * @start: First byte to consider.
+ * @end: Limit of search (exclusive).
+ * @whence: Either SEEK_HOLE or SEEK_DATA.
+ *
+ * If the page cache knows which blocks contain holes and which blocks
+ * contain data, your filesystem can use this function to implement
+ * SEEK_HOLE and SEEK_DATA.  This is useful for filesystems which are
+ * entirely memory-based such as tmpfs, and filesystems which support
+ * unwritten extents.
+ *
+ * Return: The requested offset on successs, or -ENXIO if @whence specifies
+ * SEEK_DATA and there is no data after @start.  There is an implicit hole
+ * after @end - 1, so SEEK_HOLE returns @end if all the bytes between @start
+ * and @end contain data.
+ */
+loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
+		loff_t end, int whence)
+{
+	XA_STATE(xas, &mapping->i_pages, start >> PAGE_SHIFT);
+	pgoff_t max = (end - 1) / PAGE_SIZE;
+	bool seek_data = (whence == SEEK_DATA);
+	struct page *page;
+
+	if (end <= start)
+		return -ENXIO;
+
+	rcu_read_lock();
+	while ((page = find_get_entry(&xas, max, XA_PRESENT))) {
+		loff_t pos = xas.xa_index * PAGE_SIZE;
+
+		if (start < pos) {
+			if (!seek_data)
+				goto unlock;
+			start = pos;
+		}
+
+		if (page_seek_match(page, seek_data))
+			goto unlock;
+		start = pos + seek_page_size(&xas, page);
+		if (!xa_is_value(page))
+			put_page(page);
+	}
+	rcu_read_unlock();
+
+	if (seek_data)
+		return -ENXIO;
+	goto out;
+
+unlock:
+	rcu_read_unlock();
+	if (!xa_is_value(page))
+		put_page(page);
+out:
+	if (start > end)
+		return end;
+	return start;
+}
+
 #ifdef CONFIG_MMU
 #define MMAP_LOTSAMISS  (100)
 /*
--- a/mm/shmem.c~mm-filemap-add-mapping_seek_hole_data
+++ a/mm/shmem.c
@@ -2668,86 +2668,20 @@ static ssize_t shmem_file_read_iter(stru
 	return retval ? retval : error;
 }
 
-/*
- * llseek SEEK_DATA or SEEK_HOLE through the page cache.
- */
-static pgoff_t shmem_seek_hole_data(struct address_space *mapping,
-				    pgoff_t index, pgoff_t end, int whence)
-{
-	struct page *page;
-	struct pagevec pvec;
-	pgoff_t indices[PAGEVEC_SIZE];
-	bool done = false;
-	int i;
-
-	pagevec_init(&pvec);
-	pvec.nr = 1;		/* start small: we may be there already */
-	while (!done) {
-		pvec.nr = find_get_entries(mapping, index,
-					pvec.nr, pvec.pages, indices);
-		if (!pvec.nr) {
-			if (whence == SEEK_DATA)
-				index = end;
-			break;
-		}
-		for (i = 0; i < pvec.nr; i++, index++) {
-			if (index < indices[i]) {
-				if (whence == SEEK_HOLE) {
-					done = true;
-					break;
-				}
-				index = indices[i];
-			}
-			page = pvec.pages[i];
-			if (page && !xa_is_value(page)) {
-				if (!PageUptodate(page))
-					page = NULL;
-			}
-			if (index >= end ||
-			    (page && whence == SEEK_DATA) ||
-			    (!page && whence == SEEK_HOLE)) {
-				done = true;
-				break;
-			}
-		}
-		pagevec_remove_exceptionals(&pvec);
-		pagevec_release(&pvec);
-		pvec.nr = PAGEVEC_SIZE;
-		cond_resched();
-	}
-	return index;
-}
-
 static loff_t shmem_file_llseek(struct file *file, loff_t offset, int whence)
 {
 	struct address_space *mapping = file->f_mapping;
 	struct inode *inode = mapping->host;
-	pgoff_t start, end;
-	loff_t new_offset;
 
 	if (whence != SEEK_DATA && whence != SEEK_HOLE)
 		return generic_file_llseek_size(file, offset, whence,
 					MAX_LFS_FILESIZE, i_size_read(inode));
+	if (offset < 0)
+		return -ENXIO;
+
 	inode_lock(inode);
 	/* We're holding i_mutex so we can access i_size directly */
-
-	if (offset < 0 || offset >= inode->i_size)
-		offset = -ENXIO;
-	else {
-		start = offset >> PAGE_SHIFT;
-		end = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
-		new_offset = shmem_seek_hole_data(mapping, start, end, whence);
-		new_offset <<= PAGE_SHIFT;
-		if (new_offset > offset) {
-			if (new_offset < inode->i_size)
-				offset = new_offset;
-			else if (whence == SEEK_DATA)
-				offset = -ENXIO;
-			else
-				offset = inode->i_size;
-		}
-	}
-
+	offset = mapping_seek_hole_data(mapping, offset, inode->i_size, whence);
 	if (offset >= 0)
 		offset = vfs_setpos(file, offset, MAX_LFS_FILESIZE);
 	inode_unlock(inode);
_
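
With mapping_seek_hole_data() available via linux/pagemap.h, an in-memory
or unwritten-extent filesystem's llseek reduces to a thin wrapper.  An
illustrative sketch modelled on the shmem conversion above
(example_llseek() is hypothetical, not part of this patch):

	static loff_t example_llseek(struct file *file, loff_t offset, int whence)
	{
		struct inode *inode = file->f_mapping->host;

		if (whence != SEEK_DATA && whence != SEEK_HOLE)
			return generic_file_llseek(file, offset, whence);
		if (offset < 0)
			return -ENXIO;

		inode_lock(inode);
		/* Holding i_rwsem, so i_size is stable for the search. */
		offset = mapping_seek_hole_data(file->f_mapping, offset,
						i_size_read(inode), whence);
		inode_unlock(inode);

		if (offset >= 0)
			offset = vfs_setpos(file, offset, MAX_LFS_FILESIZE);
		return offset;
	}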



* [patch 008/118] iomap: use mapping_seek_hole_data
  2021-02-26  1:14 incoming Andrew Morton
                   ` (6 preceding siblings ...)
  2021-02-26  1:15 ` [patch 007/118] mm/filemap: add mapping_seek_hole_data Andrew Morton
@ 2021-02-26  1:15 ` Andrew Morton
  2021-02-26  1:15 ` [patch 009/118] mm: add and use find_lock_entries Andrew Morton
                   ` (110 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:15 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: iomap: use mapping_seek_hole_data

Enhance mapping_seek_hole_data() to handle partially uptodate pages and
convert the iomap seek code to call it.

Link: https://lkml.kernel.org/r/20201112212641.27837-9-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/iomap/seek.c |  125 ++++------------------------------------------
 mm/filemap.c    |   37 +++++++++++--
 2 files changed, 43 insertions(+), 119 deletions(-)

--- a/fs/iomap/seek.c~iomap-use-mapping_seek_hole_data
+++ a/fs/iomap/seek.c
@@ -10,122 +10,17 @@
 #include <linux/pagemap.h>
 #include <linux/pagevec.h>
 
-/*
- * Seek for SEEK_DATA / SEEK_HOLE within @page, starting at @lastoff.
- * Returns true if found and updates @lastoff to the offset in file.
- */
-static bool
-page_seek_hole_data(struct inode *inode, struct page *page, loff_t *lastoff,
-		int whence)
-{
-	const struct address_space_operations *ops = inode->i_mapping->a_ops;
-	unsigned int bsize = i_blocksize(inode), off;
-	bool seek_data = whence == SEEK_DATA;
-	loff_t poff = page_offset(page);
-
-	if (WARN_ON_ONCE(*lastoff >= poff + PAGE_SIZE))
-		return false;
-
-	if (*lastoff < poff) {
-		/*
-		 * Last offset smaller than the start of the page means we found
-		 * a hole:
-		 */
-		if (whence == SEEK_HOLE)
-			return true;
-		*lastoff = poff;
-	}
-
-	/*
-	 * Just check the page unless we can and should check block ranges:
-	 */
-	if (bsize == PAGE_SIZE || !ops->is_partially_uptodate)
-		return PageUptodate(page) == seek_data;
-
-	lock_page(page);
-	if (unlikely(page->mapping != inode->i_mapping))
-		goto out_unlock_not_found;
-
-	for (off = 0; off < PAGE_SIZE; off += bsize) {
-		if (offset_in_page(*lastoff) >= off + bsize)
-			continue;
-		if (ops->is_partially_uptodate(page, off, bsize) == seek_data) {
-			unlock_page(page);
-			return true;
-		}
-		*lastoff = poff + off + bsize;
-	}
-
-out_unlock_not_found:
-	unlock_page(page);
-	return false;
-}
-
-/*
- * Seek for SEEK_DATA / SEEK_HOLE in the page cache.
- *
- * Within unwritten extents, the page cache determines which parts are holes
- * and which are data: uptodate buffer heads count as data; everything else
- * counts as a hole.
- *
- * Returns the resulting offset on successs, and -ENOENT otherwise.
- */
-static loff_t
-page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
-		int whence)
-{
-	pgoff_t index = offset >> PAGE_SHIFT;
-	pgoff_t end = DIV_ROUND_UP(offset + length, PAGE_SIZE);
-	loff_t lastoff = offset;
-	struct pagevec pvec;
-
-	if (length <= 0)
-		return -ENOENT;
-
-	pagevec_init(&pvec);
-
-	do {
-		unsigned nr_pages, i;
-
-		nr_pages = pagevec_lookup_range(&pvec, inode->i_mapping, &index,
-						end - 1);
-		if (nr_pages == 0)
-			break;
-
-		for (i = 0; i < nr_pages; i++) {
-			struct page *page = pvec.pages[i];
-
-			if (page_seek_hole_data(inode, page, &lastoff, whence))
-				goto check_range;
-			lastoff = page_offset(page) + PAGE_SIZE;
-		}
-		pagevec_release(&pvec);
-	} while (index < end);
-
-	/* When no page at lastoff and we are not done, we found a hole. */
-	if (whence != SEEK_HOLE)
-		goto not_found;
-
-check_range:
-	if (lastoff < offset + length)
-		goto out;
-not_found:
-	lastoff = -ENOENT;
-out:
-	pagevec_release(&pvec);
-	return lastoff;
-}
-
-
 static loff_t
-iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
+iomap_seek_hole_actor(struct inode *inode, loff_t start, loff_t length,
 		      void *data, struct iomap *iomap, struct iomap *srcmap)
 {
+	loff_t offset = start;
+
 	switch (iomap->type) {
 	case IOMAP_UNWRITTEN:
-		offset = page_cache_seek_hole_data(inode, offset, length,
-						   SEEK_HOLE);
-		if (offset < 0)
+		offset = mapping_seek_hole_data(inode->i_mapping, start,
+				start + length, SEEK_HOLE);
+		if (offset == start + length)
 			return length;
 		fallthrough;
 	case IOMAP_HOLE:
@@ -164,15 +59,17 @@ iomap_seek_hole(struct inode *inode, lof
 EXPORT_SYMBOL_GPL(iomap_seek_hole);
 
 static loff_t
-iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
+iomap_seek_data_actor(struct inode *inode, loff_t start, loff_t length,
 		      void *data, struct iomap *iomap, struct iomap *srcmap)
 {
+	loff_t offset = start;
+
 	switch (iomap->type) {
 	case IOMAP_HOLE:
 		return length;
 	case IOMAP_UNWRITTEN:
-		offset = page_cache_seek_hole_data(inode, offset, length,
-						   SEEK_DATA);
+		offset = mapping_seek_hole_data(inode->i_mapping, start,
+				start + length, SEEK_DATA);
 		if (offset < 0)
 			return length;
 		fallthrough;
--- a/mm/filemap.c~iomap-use-mapping_seek_hole_data
+++ a/mm/filemap.c
@@ -2553,11 +2553,36 @@ generic_file_read_iter(struct kiocb *ioc
 }
 EXPORT_SYMBOL(generic_file_read_iter);
 
-static inline bool page_seek_match(struct page *page, bool seek_data)
+static inline loff_t page_seek_hole_data(struct xa_state *xas,
+		struct address_space *mapping, struct page *page,
+		loff_t start, loff_t end, bool seek_data)
 {
+	const struct address_space_operations *ops = mapping->a_ops;
+	size_t offset, bsz = i_blocksize(mapping->host);
+
 	if (xa_is_value(page) || PageUptodate(page))
-		return seek_data;
-	return !seek_data;
+		return seek_data ? start : end;
+	if (!ops->is_partially_uptodate)
+		return seek_data ? end : start;
+
+	xas_pause(xas);
+	rcu_read_unlock();
+	lock_page(page);
+	if (unlikely(page->mapping != mapping))
+		goto unlock;
+
+	offset = offset_in_thp(page, start) & ~(bsz - 1);
+
+	do {
+		if (ops->is_partially_uptodate(page, offset, bsz) == seek_data)
+			break;
+		start = (start + bsz) & ~(bsz - 1);
+		offset += bsz;
+	} while (offset < thp_size(page));
+unlock:
+	unlock_page(page);
+	rcu_read_lock();
+	return start;
 }
 
 static inline
@@ -2607,9 +2632,11 @@ loff_t mapping_seek_hole_data(struct add
 			start = pos;
 		}
 
-		if (page_seek_match(page, seek_data))
+		pos += seek_page_size(&xas, page);
+		start = page_seek_hole_data(&xas, mapping, page, start, pos,
+				seek_data);
+		if (start < pos)
 			goto unlock;
-		start = pos + seek_page_size(&xas, page);
 		if (!xa_is_value(page))
 			put_page(page);
 	}
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 009/118] mm: add and use find_lock_entries
  2021-02-26  1:14 incoming Andrew Morton
                   ` (7 preceding siblings ...)
  2021-02-26  1:15 ` [patch 008/118] iomap: use mapping_seek_hole_data Andrew Morton
@ 2021-02-26  1:15 ` Andrew Morton
  2021-02-26  1:16 ` [patch 010/118] mm: add an 'end' parameter to find_get_entries Andrew Morton
                   ` (109 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:15 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: add and use find_lock_entries

We have three functions (shmem_undo_range(), truncate_inode_pages_range()
and invalidate_mapping_pages()) which all want exactly this function, so add
it to filemap.c.  Before this patch, shmem_undo_range() would split, in both
the first and second loops through the address space, any compound page
which overlaps either end of the range being punched.  After this patch,
that work is left to the second loop, which is arguably more appropriate
since the first loop is supposed to run through all the pages quickly, and
splitting a page can sleep.
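
For illustration only, here is a minimal (hypothetical) caller sketch of the
lookup-then-process loop the three converted callers share; the real
conversions, including the unfalloc and partial-page handling, are in the
hunks below:

	static void example_drop_range(struct address_space *mapping,
			pgoff_t start, pgoff_t end)
	{
		pgoff_t indices[PAGEVEC_SIZE];
		pgoff_t index = start;
		struct pagevec pvec;
		int i;

		pagevec_init(&pvec);
		while (index < end && find_lock_entries(mapping, index, end - 1,
				&pvec, indices)) {
			index = indices[pagevec_count(&pvec) - 1] + 1;
			for (i = 0; i < pagevec_count(&pvec); i++) {
				struct page *page = pvec.pages[i];

				/* Value entries are swap/shadow/DAX entries. */
				if (xa_is_value(page))
					continue;
				/* Pages come back locked, not under writeback. */
				unlock_page(page);
			}
			pagevec_remove_exceptionals(&pvec);
			/* Drop the references find_lock_entries() took. */
			pagevec_release(&pvec);
			cond_resched();
		}
	}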

[willy@infradead.org: add assertion]
  Link: https://lkml.kernel.org/r/20201124041507.28996-3-willy@infradead.org
Link: https://lkml.kernel.org/r/20201112212641.27837-10-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/filemap.c  |   59 +++++++++++++++++++++++++++++++
 mm/internal.h |    3 +
 mm/shmem.c    |   22 ++---------
 mm/truncate.c |   91 +++++-------------------------------------------
 4 files changed, 78 insertions(+), 97 deletions(-)

--- a/mm/filemap.c~mm-add-and-use-find_lock_entries
+++ a/mm/filemap.c
@@ -1921,6 +1921,65 @@ unsigned find_get_entries(struct address
 }
 
 /**
+ * find_lock_entries - Find a batch of pagecache entries.
+ * @mapping:	The address_space to search.
+ * @start:	The starting page cache index.
+ * @end:	The final page index (inclusive).
+ * @pvec:	Where the resulting entries are placed.
+ * @indices:	The cache indices of the entries in @pvec.
+ *
+ * find_lock_entries() will return a batch of entries from @mapping.
+ * Swap, shadow and DAX entries are included.  Pages are returned
+ * locked and with an incremented refcount.  Pages which are locked by
+ * somebody else or under writeback are skipped.  Only the head page of
+ * a THP is returned.  Pages which are partially outside the range are
+ * not returned.
+ *
+ * The entries have ascending indexes.  The indices may not be consecutive
+ * due to not-present entries, THP pages, pages which could not be locked
+ * or pages under writeback.
+ *
+ * Return: The number of entries which were found.
+ */
+unsigned find_lock_entries(struct address_space *mapping, pgoff_t start,
+		pgoff_t end, struct pagevec *pvec, pgoff_t *indices)
+{
+	XA_STATE(xas, &mapping->i_pages, start);
+	struct page *page;
+
+	rcu_read_lock();
+	while ((page = find_get_entry(&xas, end, XA_PRESENT))) {
+		if (!xa_is_value(page)) {
+			if (page->index < start)
+				goto put;
+			VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
+			if (page->index + thp_nr_pages(page) - 1 > end)
+				goto put;
+			if (!trylock_page(page))
+				goto put;
+			if (page->mapping != mapping || PageWriteback(page))
+				goto unlock;
+			VM_BUG_ON_PAGE(!thp_contains(page, xas.xa_index),
+					page);
+		}
+		indices[pvec->nr] = xas.xa_index;
+		if (!pagevec_add(pvec, page))
+			break;
+		goto next;
+unlock:
+		unlock_page(page);
+put:
+		put_page(page);
+next:
+		if (!xa_is_value(page) && PageTransHuge(page))
+			xas_set(&xas, page->index + thp_nr_pages(page));
+	}
+	rcu_read_unlock();
+
+	return pagevec_count(pvec);
+}
+
+/**
  * find_get_pages_range - gang pagecache lookup
  * @mapping:	The address_space to search
  * @start:	The starting page index
--- a/mm/internal.h~mm-add-and-use-find_lock_entries
+++ a/mm/internal.h
@@ -60,6 +60,9 @@ static inline void force_page_cache_read
 	force_page_cache_ra(&ractl, &file->f_ra, nr_to_read);
 }
 
+unsigned find_lock_entries(struct address_space *mapping, pgoff_t start,
+		pgoff_t end, struct pagevec *pvec, pgoff_t *indices);
+
 /**
  * page_evictable - test whether a page is evictable
  * @page: the page to test
--- a/mm/shmem.c~mm-add-and-use-find_lock_entries
+++ a/mm/shmem.c
@@ -907,12 +907,8 @@ static void shmem_undo_range(struct inod
 
 	pagevec_init(&pvec);
 	index = start;
-	while (index < end) {
-		pvec.nr = find_get_entries(mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE),
-			pvec.pages, indices);
-		if (!pvec.nr)
-			break;
+	while (index < end && find_lock_entries(mapping, index, end - 1,
+			&pvec, indices)) {
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
@@ -927,18 +923,10 @@ static void shmem_undo_range(struct inod
 								index, page);
 				continue;
 			}
+			index += thp_nr_pages(page) - 1;
 
-			VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page);
-
-			if (!trylock_page(page))
-				continue;
-
-			if ((!unfalloc || !PageUptodate(page)) &&
-			    page_mapping(page) == mapping) {
-				VM_BUG_ON_PAGE(PageWriteback(page), page);
-				if (shmem_punch_compound(page, start, end))
-					truncate_inode_page(mapping, page);
-			}
+			if (!unfalloc || !PageUptodate(page))
+				truncate_inode_page(mapping, page);
 			unlock_page(page);
 		}
 		pagevec_remove_exceptionals(&pvec);
--- a/mm/truncate.c~mm-add-and-use-find_lock_entries
+++ a/mm/truncate.c
@@ -326,51 +326,19 @@ void truncate_inode_pages_range(struct a
 
 	pagevec_init(&pvec);
 	index = start;
-	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE),
-			indices)) {
-		/*
-		 * Pagevec array has exceptional entries and we may also fail
-		 * to lock some pages. So we store pages that can be deleted
-		 * in a new pagevec.
-		 */
-		struct pagevec locked_pvec;
-
-		pagevec_init(&locked_pvec);
-		for (i = 0; i < pagevec_count(&pvec); i++) {
-			struct page *page = pvec.pages[i];
-
-			/* We rely upon deletion not changing page->index */
-			index = indices[i];
-			if (index >= end)
-				break;
-
-			if (xa_is_value(page))
-				continue;
-
-			if (!trylock_page(page))
-				continue;
-			WARN_ON(page_to_index(page) != index);
-			if (PageWriteback(page)) {
-				unlock_page(page);
-				continue;
-			}
-			if (page->mapping != mapping) {
-				unlock_page(page);
-				continue;
-			}
-			pagevec_add(&locked_pvec, page);
-		}
-		for (i = 0; i < pagevec_count(&locked_pvec); i++)
-			truncate_cleanup_page(mapping, locked_pvec.pages[i]);
-		delete_from_page_cache_batch(mapping, &locked_pvec);
-		for (i = 0; i < pagevec_count(&locked_pvec); i++)
-			unlock_page(locked_pvec.pages[i]);
+	while (index < end && find_lock_entries(mapping, index, end - 1,
+			&pvec, indices)) {
+		index = indices[pagevec_count(&pvec) - 1] + 1;
 		truncate_exceptional_pvec_entries(mapping, &pvec, indices, end);
+		for (i = 0; i < pagevec_count(&pvec); i++)
+			truncate_cleanup_page(mapping, pvec.pages[i]);
+		delete_from_page_cache_batch(mapping, &pvec);
+		for (i = 0; i < pagevec_count(&pvec); i++)
+			unlock_page(pvec.pages[i]);
 		pagevec_release(&pvec);
 		cond_resched();
-		index++;
 	}
+
 	if (partial_start) {
 		struct page *page = find_lock_page(mapping, start - 1);
 		if (page) {
@@ -539,9 +507,7 @@ static unsigned long __invalidate_mappin
 	int i;
 
 	pagevec_init(&pvec);
-	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
-			indices)) {
+	while (find_lock_entries(mapping, index, end, &pvec, indices)) {
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
@@ -555,39 +521,7 @@ static unsigned long __invalidate_mappin
 							     page);
 				continue;
 			}
-
-			if (!trylock_page(page))
-				continue;
-
-			WARN_ON(page_to_index(page) != index);
-
-			/* Middle of THP: skip */
-			if (PageTransTail(page)) {
-				unlock_page(page);
-				continue;
-			} else if (PageTransHuge(page)) {
-				index += HPAGE_PMD_NR - 1;
-				i += HPAGE_PMD_NR - 1;
-				/*
-				 * 'end' is in the middle of THP. Don't
-				 * invalidate the page as the part outside of
-				 * 'end' could be still useful.
-				 */
-				if (index > end) {
-					unlock_page(page);
-					continue;
-				}
-
-				/* Take a pin outside pagevec */
-				get_page(page);
-
-				/*
-				 * Drop extra pins before trying to invalidate
-				 * the huge page.
-				 */
-				pagevec_remove_exceptionals(&pvec);
-				pagevec_release(&pvec);
-			}
+			index += thp_nr_pages(page) - 1;
 
 			ret = invalidate_inode_page(page);
 			unlock_page(page);
@@ -601,9 +535,6 @@ static unsigned long __invalidate_mappin
 				if (nr_pagevec)
 					(*nr_pagevec)++;
 			}
-
-			if (PageTransHuge(page))
-				put_page(page);
 			count += ret;
 		}
 		pagevec_remove_exceptionals(&pvec);
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 010/118] mm: add an 'end' parameter to find_get_entries
  2021-02-26  1:14 incoming Andrew Morton
                   ` (8 preceding siblings ...)
  2021-02-26  1:15 ` [patch 009/118] mm: add and use find_lock_entries Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 011/118] mm: add an 'end' parameter to pagevec_lookup_entries Andrew Morton
                   ` (108 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: add an 'end' parameter to find_get_entries

This simplifies the callers and leads to a more efficient implementation
since the XArray has this functionality already.
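
The caller-side effect, sketched from the shmem hunk below (before/after,
illustrative only):

	/* Before: the caller clamps the batch size to stay below 'end'. */
	pvec.nr = find_get_entries(mapping, index,
			min(end - index, (pgoff_t)PAGEVEC_SIZE),
			pvec.pages, indices);

	/* After: 'end' is passed down and honoured by the XArray walk itself. */
	pvec.nr = find_get_entries(mapping, index, end - 1,
			PAGEVEC_SIZE, pvec.pages, indices);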

Link: https://lkml.kernel.org/r/20201112212641.27837-11-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/pagemap.h |    4 ++--
 mm/filemap.c            |    9 +++++----
 mm/shmem.c              |   10 ++--------
 mm/swap.c               |    2 +-
 4 files changed, 10 insertions(+), 15 deletions(-)

--- a/include/linux/pagemap.h~mm-add-an-end-parameter-to-find_get_entries
+++ a/include/linux/pagemap.h
@@ -451,8 +451,8 @@ static inline struct page *find_subpage(
 }
 
 unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
-			  unsigned int nr_entries, struct page **entries,
-			  pgoff_t *indices);
+		pgoff_t end, unsigned int nr_entries, struct page **entries,
+		pgoff_t *indices);
 unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start,
 			pgoff_t end, unsigned int nr_pages,
 			struct page **pages);
--- a/mm/filemap.c~mm-add-an-end-parameter-to-find_get_entries
+++ a/mm/filemap.c
@@ -1865,6 +1865,7 @@ reset:
  * find_get_entries - gang pagecache lookup
  * @mapping:	The address_space to search
  * @start:	The starting page cache index
+ * @end:	The final page index (inclusive).
  * @nr_entries:	The maximum number of entries
  * @entries:	Where the resulting entries are placed
  * @indices:	The cache indices corresponding to the entries in @entries
@@ -1888,9 +1889,9 @@ reset:
  *
  * Return: the number of pages and shadow entries which were found.
  */
-unsigned find_get_entries(struct address_space *mapping,
-			  pgoff_t start, unsigned int nr_entries,
-			  struct page **entries, pgoff_t *indices)
+unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
+		pgoff_t end, unsigned int nr_entries, struct page **entries,
+		pgoff_t *indices)
 {
 	XA_STATE(xas, &mapping->i_pages, start);
 	struct page *page;
@@ -1900,7 +1901,7 @@ unsigned find_get_entries(struct address
 		return 0;
 
 	rcu_read_lock();
-	while ((page = find_get_entry(&xas, ULONG_MAX, XA_PRESENT))) {
+	while ((page = find_get_entry(&xas, end, XA_PRESENT))) {
 		/*
 		 * Terminate early on finding a THP, to allow the caller to
 		 * handle it all at once; but continue if this is hugetlbfs.
--- a/mm/shmem.c~mm-add-an-end-parameter-to-find_get_entries
+++ a/mm/shmem.c
@@ -913,8 +913,6 @@ static void shmem_undo_range(struct inod
 			struct page *page = pvec.pages[i];
 
 			index = indices[i];
-			if (index >= end)
-				break;
 
 			if (xa_is_value(page)) {
 				if (unfalloc)
@@ -967,9 +965,8 @@ static void shmem_undo_range(struct inod
 	while (index < end) {
 		cond_resched();
 
-		pvec.nr = find_get_entries(mapping, index,
-				min(end - index, (pgoff_t)PAGEVEC_SIZE),
-				pvec.pages, indices);
+		pvec.nr = find_get_entries(mapping, index, end - 1,
+				PAGEVEC_SIZE, pvec.pages, indices);
 		if (!pvec.nr) {
 			/* If all gone or hole-punch or unfalloc, we're done */
 			if (index == start || end != -1)
@@ -982,9 +979,6 @@ static void shmem_undo_range(struct inod
 			struct page *page = pvec.pages[i];
 
 			index = indices[i];
-			if (index >= end)
-				break;
-
 			if (xa_is_value(page)) {
 				if (unfalloc)
 					continue;
--- a/mm/swap.c~mm-add-an-end-parameter-to-find_get_entries
+++ a/mm/swap.c
@@ -1046,7 +1046,7 @@ unsigned pagevec_lookup_entries(struct p
 				pgoff_t start, unsigned nr_entries,
 				pgoff_t *indices)
 {
-	pvec->nr = find_get_entries(mapping, start, nr_entries,
+	pvec->nr = find_get_entries(mapping, start, ULONG_MAX, nr_entries,
 				    pvec->pages, indices);
 	return pagevec_count(pvec);
 }
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 011/118] mm: add an 'end' parameter to pagevec_lookup_entries
  2021-02-26  1:14 incoming Andrew Morton
                   ` (9 preceding siblings ...)
  2021-02-26  1:16 ` [patch 010/118] mm: add an 'end' parameter to find_get_entries Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 012/118] mm: remove nr_entries parameter from pagevec_lookup_entries Andrew Morton
                   ` (107 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: add an 'end' parameter to pagevec_lookup_entries

This simplifies the callers and uses the existing functionality in
find_get_entries().  We can also drop the final argument of
truncate_exceptional_pvec_entries() and simplify the logic in that
function.

Link: https://lkml.kernel.org/r/20201112212641.27837-12-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/pagevec.h |    5 +---
 mm/swap.c               |    8 +++----
 mm/truncate.c           |   41 +++++++++-----------------------------
 3 files changed, 16 insertions(+), 38 deletions(-)

--- a/include/linux/pagevec.h~mm-add-an-end-parameter-to-pagevec_lookup_entries
+++ a/include/linux/pagevec.h
@@ -26,9 +26,8 @@ struct pagevec {
 void __pagevec_release(struct pagevec *pvec);
 void __pagevec_lru_add(struct pagevec *pvec);
 unsigned pagevec_lookup_entries(struct pagevec *pvec,
-				struct address_space *mapping,
-				pgoff_t start, unsigned nr_entries,
-				pgoff_t *indices);
+		struct address_space *mapping, pgoff_t start, pgoff_t end,
+		unsigned nr_entries, pgoff_t *indices);
 void pagevec_remove_exceptionals(struct pagevec *pvec);
 unsigned pagevec_lookup_range(struct pagevec *pvec,
 			      struct address_space *mapping,
--- a/mm/swap.c~mm-add-an-end-parameter-to-pagevec_lookup_entries
+++ a/mm/swap.c
@@ -1022,6 +1022,7 @@ void __pagevec_lru_add(struct pagevec *p
  * @pvec:	Where the resulting entries are placed
  * @mapping:	The address_space to search
  * @start:	The starting entry index
+ * @end:	The highest index to return (inclusive).
  * @nr_entries:	The maximum number of pages
  * @indices:	The cache indices corresponding to the entries in @pvec
  *
@@ -1042,11 +1043,10 @@ void __pagevec_lru_add(struct pagevec *p
  * found.
  */
 unsigned pagevec_lookup_entries(struct pagevec *pvec,
-				struct address_space *mapping,
-				pgoff_t start, unsigned nr_entries,
-				pgoff_t *indices)
+		struct address_space *mapping, pgoff_t start, pgoff_t end,
+		unsigned nr_entries, pgoff_t *indices)
 {
-	pvec->nr = find_get_entries(mapping, start, ULONG_MAX, nr_entries,
+	pvec->nr = find_get_entries(mapping, start, end, nr_entries,
 				    pvec->pages, indices);
 	return pagevec_count(pvec);
 }
--- a/mm/truncate.c~mm-add-an-end-parameter-to-pagevec_lookup_entries
+++ a/mm/truncate.c
@@ -57,11 +57,10 @@ static void clear_shadow_entry(struct ad
  * exceptional entries similar to what pagevec_remove_exceptionals does.
  */
 static void truncate_exceptional_pvec_entries(struct address_space *mapping,
-				struct pagevec *pvec, pgoff_t *indices,
-				pgoff_t end)
+				struct pagevec *pvec, pgoff_t *indices)
 {
 	int i, j;
-	bool dax, lock;
+	bool dax;
 
 	/* Handled by shmem itself */
 	if (shmem_mapping(mapping))
@@ -75,8 +74,7 @@ static void truncate_exceptional_pvec_en
 		return;
 
 	dax = dax_mapping(mapping);
-	lock = !dax && indices[j] < end;
-	if (lock)
+	if (!dax)
 		xa_lock_irq(&mapping->i_pages);
 
 	for (i = j; i < pagevec_count(pvec); i++) {
@@ -88,9 +86,6 @@ static void truncate_exceptional_pvec_en
 			continue;
 		}
 
-		if (index >= end)
-			continue;
-
 		if (unlikely(dax)) {
 			dax_delete_mapping_entry(mapping, index);
 			continue;
@@ -99,7 +94,7 @@ static void truncate_exceptional_pvec_en
 		__clear_shadow_entry(mapping, index, page);
 	}
 
-	if (lock)
+	if (!dax)
 		xa_unlock_irq(&mapping->i_pages);
 	pvec->nr = j;
 }
@@ -329,7 +324,7 @@ void truncate_inode_pages_range(struct a
 	while (index < end && find_lock_entries(mapping, index, end - 1,
 			&pvec, indices)) {
 		index = indices[pagevec_count(&pvec) - 1] + 1;
-		truncate_exceptional_pvec_entries(mapping, &pvec, indices, end);
+		truncate_exceptional_pvec_entries(mapping, &pvec, indices);
 		for (i = 0; i < pagevec_count(&pvec); i++)
 			truncate_cleanup_page(mapping, pvec.pages[i]);
 		delete_from_page_cache_batch(mapping, &pvec);
@@ -381,8 +376,8 @@ void truncate_inode_pages_range(struct a
 	index = start;
 	for ( ; ; ) {
 		cond_resched();
-		if (!pagevec_lookup_entries(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) {
+		if (!pagevec_lookup_entries(&pvec, mapping, index, end - 1,
+				PAGEVEC_SIZE, indices)) {
 			/* If all gone from start onwards, we're done */
 			if (index == start)
 				break;
@@ -390,23 +385,12 @@ void truncate_inode_pages_range(struct a
 			index = start;
 			continue;
 		}
-		if (index == start && indices[0] >= end) {
-			/* All gone out of hole to be punched, we're done */
-			pagevec_remove_exceptionals(&pvec);
-			pagevec_release(&pvec);
-			break;
-		}
 
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
 			/* We rely upon deletion not changing page->index */
 			index = indices[i];
-			if (index >= end) {
-				/* Restart punch to make sure all gone */
-				index = start - 1;
-				break;
-			}
 
 			if (xa_is_value(page))
 				continue;
@@ -417,7 +401,7 @@ void truncate_inode_pages_range(struct a
 			truncate_inode_page(mapping, page);
 			unlock_page(page);
 		}
-		truncate_exceptional_pvec_entries(mapping, &pvec, indices, end);
+		truncate_exceptional_pvec_entries(mapping, &pvec, indices);
 		pagevec_release(&pvec);
 		index++;
 	}
@@ -513,8 +497,6 @@ static unsigned long __invalidate_mappin
 
 			/* We rely upon deletion not changing page->index */
 			index = indices[i];
-			if (index > end)
-				break;
 
 			if (xa_is_value(page)) {
 				invalidate_exceptional_entry(mapping, index,
@@ -656,16 +638,13 @@ int invalidate_inode_pages2_range(struct
 
 	pagevec_init(&pvec);
 	index = start;
-	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
-			indices)) {
+	while (pagevec_lookup_entries(&pvec, mapping, index, end,
+			PAGEVEC_SIZE, indices)) {
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
 			/* We rely upon deletion not changing page->index */
 			index = indices[i];
-			if (index > end)
-				break;
 
 			if (xa_is_value(page)) {
 				if (!invalidate_exceptional_entry2(mapping,
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 012/118] mm: remove nr_entries parameter from pagevec_lookup_entries
  2021-02-26  1:14 incoming Andrew Morton
                   ` (10 preceding siblings ...)
  2021-02-26  1:16 ` [patch 011/118] mm: add an 'end' parameter to pagevec_lookup_entries Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 013/118] mm: pass pvec directly to find_get_entries Andrew Morton
                   ` (106 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: remove nr_entries parameter from pagevec_lookup_entries

All callers want to fetch the full size of the pvec.

Link: https://lkml.kernel.org/r/20201112212641.27837-13-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/pagevec.h |    2 +-
 mm/swap.c               |    4 ++--
 mm/truncate.c           |    5 ++---
 3 files changed, 5 insertions(+), 6 deletions(-)

--- a/include/linux/pagevec.h~mm-remove-nr_entries-parameter-from-pagevec_lookup_entries
+++ a/include/linux/pagevec.h
@@ -27,7 +27,7 @@ void __pagevec_release(struct pagevec *p
 void __pagevec_lru_add(struct pagevec *pvec);
 unsigned pagevec_lookup_entries(struct pagevec *pvec,
 		struct address_space *mapping, pgoff_t start, pgoff_t end,
-		unsigned nr_entries, pgoff_t *indices);
+		pgoff_t *indices);
 void pagevec_remove_exceptionals(struct pagevec *pvec);
 unsigned pagevec_lookup_range(struct pagevec *pvec,
 			      struct address_space *mapping,
--- a/mm/swap.c~mm-remove-nr_entries-parameter-from-pagevec_lookup_entries
+++ a/mm/swap.c
@@ -1044,9 +1044,9 @@ void __pagevec_lru_add(struct pagevec *p
  */
 unsigned pagevec_lookup_entries(struct pagevec *pvec,
 		struct address_space *mapping, pgoff_t start, pgoff_t end,
-		unsigned nr_entries, pgoff_t *indices)
+		pgoff_t *indices)
 {
-	pvec->nr = find_get_entries(mapping, start, end, nr_entries,
+	pvec->nr = find_get_entries(mapping, start, end, PAGEVEC_SIZE,
 				    pvec->pages, indices);
 	return pagevec_count(pvec);
 }
--- a/mm/truncate.c~mm-remove-nr_entries-parameter-from-pagevec_lookup_entries
+++ a/mm/truncate.c
@@ -377,7 +377,7 @@ void truncate_inode_pages_range(struct a
 	for ( ; ; ) {
 		cond_resched();
 		if (!pagevec_lookup_entries(&pvec, mapping, index, end - 1,
-				PAGEVEC_SIZE, indices)) {
+				indices)) {
 			/* If all gone from start onwards, we're done */
 			if (index == start)
 				break;
@@ -638,8 +638,7 @@ int invalidate_inode_pages2_range(struct
 
 	pagevec_init(&pvec);
 	index = start;
-	while (pagevec_lookup_entries(&pvec, mapping, index, end,
-			PAGEVEC_SIZE, indices)) {
+	while (pagevec_lookup_entries(&pvec, mapping, index, end, indices)) {
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 013/118] mm: pass pvec directly to find_get_entries
  2021-02-26  1:14 incoming Andrew Morton
                   ` (11 preceding siblings ...)
  2021-02-26  1:16 ` [patch 012/118] mm: remove nr_entries parameter from pagevec_lookup_entries Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 014/118] mm: remove pagevec_lookup_entries Andrew Morton
                   ` (105 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: pass pvec directly to find_get_entries

All callers of find_get_entries() use a pvec, so pass it directly instead
of manipulating it in the caller.
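
Sketch of the caller-side change (illustrative only; the real call sites are
converted in the hunks below):

	/* Before: the caller pokes pvec.nr and pvec.pages itself. */
	pvec.nr = find_get_entries(mapping, index, end - 1, PAGEVEC_SIZE,
				   pvec.pages, indices);

	/* After: the pagevec is filled, and pvec->nr set, by the callee. */
	find_get_entries(mapping, index, end - 1, &pvec, indices);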

Link: https://lkml.kernel.org/r/20201112212641.27837-14-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/pagemap.h |    3 +--
 mm/filemap.c            |   21 +++++++++------------
 mm/shmem.c              |    5 ++---
 mm/swap.c               |    4 +---
 4 files changed, 13 insertions(+), 20 deletions(-)

--- a/include/linux/pagemap.h~mm-pass-pvec-directly-to-find_get_entries
+++ a/include/linux/pagemap.h
@@ -451,8 +451,7 @@ static inline struct page *find_subpage(
 }
 
 unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
-		pgoff_t end, unsigned int nr_entries, struct page **entries,
-		pgoff_t *indices);
+		pgoff_t end, struct pagevec *pvec, pgoff_t *indices);
 unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start,
 			pgoff_t end, unsigned int nr_pages,
 			struct page **pages);
--- a/mm/filemap.c~mm-pass-pvec-directly-to-find_get_entries
+++ a/mm/filemap.c
@@ -1866,14 +1866,12 @@ reset:
  * @mapping:	The address_space to search
  * @start:	The starting page cache index
  * @end:	The final page index (inclusive).
- * @nr_entries:	The maximum number of entries
- * @entries:	Where the resulting entries are placed
+ * @pvec:	Where the resulting entries are placed.
  * @indices:	The cache indices corresponding to the entries in @entries
  *
- * find_get_entries() will search for and return a group of up to
- * @nr_entries entries in the mapping.  The entries are placed at
- * @entries.  find_get_entries() takes a reference against any actual
- * pages it returns.
+ * find_get_entries() will search for and return a batch of entries in
+ * the mapping.  The entries are placed in @pvec.  find_get_entries()
+ * takes a reference on any actual pages it returns.
  *
  * The search returns a group of mapping-contiguous page cache entries
  * with ascending indexes.  There may be holes in the indices due to
@@ -1890,15 +1888,12 @@ reset:
  * Return: the number of pages and shadow entries which were found.
  */
 unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
-		pgoff_t end, unsigned int nr_entries, struct page **entries,
-		pgoff_t *indices)
+		pgoff_t end, struct pagevec *pvec, pgoff_t *indices)
 {
 	XA_STATE(xas, &mapping->i_pages, start);
 	struct page *page;
 	unsigned int ret = 0;
-
-	if (!nr_entries)
-		return 0;
+	unsigned nr_entries = PAGEVEC_SIZE;
 
 	rcu_read_lock();
 	while ((page = find_get_entry(&xas, end, XA_PRESENT))) {
@@ -1913,11 +1908,13 @@ unsigned find_get_entries(struct address
 		}
 
 		indices[ret] = xas.xa_index;
-		entries[ret] = page;
+		pvec->pages[ret] = page;
 		if (++ret == nr_entries)
 			break;
 	}
 	rcu_read_unlock();
+
+	pvec->nr = ret;
 	return ret;
 }
 
--- a/mm/shmem.c~mm-pass-pvec-directly-to-find_get_entries
+++ a/mm/shmem.c
@@ -965,9 +965,8 @@ static void shmem_undo_range(struct inod
 	while (index < end) {
 		cond_resched();
 
-		pvec.nr = find_get_entries(mapping, index, end - 1,
-				PAGEVEC_SIZE, pvec.pages, indices);
-		if (!pvec.nr) {
+		if (!find_get_entries(mapping, index, end - 1, &pvec,
+				indices)) {
 			/* If all gone or hole-punch or unfalloc, we're done */
 			if (index == start || end != -1)
 				break;
--- a/mm/swap.c~mm-pass-pvec-directly-to-find_get_entries
+++ a/mm/swap.c
@@ -1046,9 +1046,7 @@ unsigned pagevec_lookup_entries(struct p
 		struct address_space *mapping, pgoff_t start, pgoff_t end,
 		pgoff_t *indices)
 {
-	pvec->nr = find_get_entries(mapping, start, end, PAGEVEC_SIZE,
-				    pvec->pages, indices);
-	return pagevec_count(pvec);
+	return find_get_entries(mapping, start, end, pvec, indices);
 }
 
 /**
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 014/118] mm: remove pagevec_lookup_entries
  2021-02-26  1:14 incoming Andrew Morton
                   ` (12 preceding siblings ...)
  2021-02-26  1:16 ` [patch 013/118] mm: pass pvec directly to find_get_entries Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 015/118] mm,thp,shmem: limit shmem THP alloc gfp_mask Andrew Morton
                   ` (104 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: akpm, dchinner, hannes, hch, hughd, jack, kirill.shutemov,
	linux-mm, mm-commits, torvalds, william.kucharski, willy,
	yang.shi

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm: remove pagevec_lookup_entries

pagevec_lookup_entries() is now just a wrapper around find_get_entries(),
so remove it and convert all its callers.

Link: https://lkml.kernel.org/r/20201112212641.27837-15-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/pagevec.h |    3 ---
 mm/swap.c               |   36 ++----------------------------------
 mm/truncate.c           |    4 ++--
 3 files changed, 4 insertions(+), 39 deletions(-)

--- a/include/linux/pagevec.h~mm-remove-pagevec_lookup_entries
+++ a/include/linux/pagevec.h
@@ -25,9 +25,6 @@ struct pagevec {
 
 void __pagevec_release(struct pagevec *pvec);
 void __pagevec_lru_add(struct pagevec *pvec);
-unsigned pagevec_lookup_entries(struct pagevec *pvec,
-		struct address_space *mapping, pgoff_t start, pgoff_t end,
-		pgoff_t *indices);
 void pagevec_remove_exceptionals(struct pagevec *pvec);
 unsigned pagevec_lookup_range(struct pagevec *pvec,
 			      struct address_space *mapping,
--- a/mm/swap.c~mm-remove-pagevec_lookup_entries
+++ a/mm/swap.c
@@ -1018,43 +1018,11 @@ void __pagevec_lru_add(struct pagevec *p
 }
 
 /**
- * pagevec_lookup_entries - gang pagecache lookup
- * @pvec:	Where the resulting entries are placed
- * @mapping:	The address_space to search
- * @start:	The starting entry index
- * @end:	The highest index to return (inclusive).
- * @nr_entries:	The maximum number of pages
- * @indices:	The cache indices corresponding to the entries in @pvec
- *
- * pagevec_lookup_entries() will search for and return a group of up
- * to @nr_pages pages and shadow entries in the mapping.  All
- * entries are placed in @pvec.  pagevec_lookup_entries() takes a
- * reference against actual pages in @pvec.
- *
- * The search returns a group of mapping-contiguous entries with
- * ascending indexes.  There may be holes in the indices due to
- * not-present entries.
- *
- * Only one subpage of a Transparent Huge Page is returned in one call:
- * allowing truncate_inode_pages_range() to evict the whole THP without
- * cycling through a pagevec of extra references.
- *
- * pagevec_lookup_entries() returns the number of entries which were
- * found.
- */
-unsigned pagevec_lookup_entries(struct pagevec *pvec,
-		struct address_space *mapping, pgoff_t start, pgoff_t end,
-		pgoff_t *indices)
-{
-	return find_get_entries(mapping, start, end, pvec, indices);
-}
-
-/**
  * pagevec_remove_exceptionals - pagevec exceptionals pruning
  * @pvec:	The pagevec to prune
  *
- * pagevec_lookup_entries() fills both pages and exceptional radix
- * tree entries into the pagevec.  This function prunes all
+ * find_get_entries() fills both pages and XArray value entries (aka
+ * exceptional entries) into the pagevec.  This function prunes all
  * exceptionals from @pvec without leaving holes, so that it can be
  * passed on to page-only pagevec operations.
  */
--- a/mm/truncate.c~mm-remove-pagevec_lookup_entries
+++ a/mm/truncate.c
@@ -376,7 +376,7 @@ void truncate_inode_pages_range(struct a
 	index = start;
 	for ( ; ; ) {
 		cond_resched();
-		if (!pagevec_lookup_entries(&pvec, mapping, index, end - 1,
+		if (!find_get_entries(mapping, index, end - 1, &pvec,
 				indices)) {
 			/* If all gone from start onwards, we're done */
 			if (index == start)
@@ -638,7 +638,7 @@ int invalidate_inode_pages2_range(struct
 
 	pagevec_init(&pvec);
 	index = start;
-	while (pagevec_lookup_entries(&pvec, mapping, index, end, indices)) {
+	while (find_get_entries(mapping, index, end, &pvec, indices)) {
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 015/118] mm,thp,shmem: limit shmem THP alloc gfp_mask
  2021-02-26  1:14 incoming Andrew Morton
                   ` (13 preceding siblings ...)
  2021-02-26  1:16 ` [patch 014/118] mm: remove pagevec_lookup_entries Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 016/118] mm,thp,shm: limit gfp mask to no more than specified Andrew Morton
                   ` (103 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: aarcange, akpm, hughd, linux-mm, mgorman, mhocko, mm-commits,
	riel, torvalds, vbabka, willy, xuyu

From: Rik van Riel <riel@surriel.com>
Subject: mm,thp,shmem: limit shmem THP alloc gfp_mask

Patch series "mm,thp,shm: limit shmem THP alloc gfp_mask", v6.

The allocation flags of anonymous transparent huge pages can be controlled
through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can
keep the system from getting bogged down in the page reclaim and
compaction code when many THPs are getting allocated simultaneously.

However, the gfp_mask for shmem THP allocations was not limited by those
configuration settings, and some workloads ended up with all CPUs stuck on
the LRU lock in the page reclaim code, trying to allocate dozens of THPs
simultaneously.

This patch applies the same configured limits to shmem
hugepage allocations, to prevent that from happening.

This way a THP defrag setting of "never" or "defer+madvise" will result in
quick allocation failures without direct reclaim when no 2MB free pages
are available.

With this patch applied, THP allocations for tmpfs will be a little more
aggressive than today for files mmapped with MADV_HUGEPAGE, and a little
less aggressive for files that are not mmapped or mapped without that
flag.


This patch (of 4):

The allocation flags of anonymous transparent huge pages can be controlled
through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can
keep the system from getting bogged down in the page reclaim and
compaction code when many THPs are getting allocated simultaneously.

However, the gfp_mask for shmem THP allocations was not limited by those
configuration settings, and some workloads ended up with all CPUs stuck on
the LRU lock in the page reclaim code, trying to allocate dozens of THPs
simultaneously.

This patch applies the same configured limits to shmem
hugepage allocations, to prevent that from happening.

Controlling the gfp_mask of THP allocations through the knobs in sysfs
allows users to determine the balance between how aggressively the system
tries to allocate THPs at fault time, and how much the application may end
up stalling attempting those allocations.

This way a THP defrag setting of "never" or "defer+madvise" will result in
quick allocation failures without direct reclaim when no 2MB free pages
are available.

With this patch applied, THP allocations for tmpfs will be a little more
aggressive than today for files mmapped with MADV_HUGEPAGE, and a little
less aggressive for files that are not mmapped or mapped without that
flag.

Link: https://lkml.kernel.org/r/20201124194925.623931-1-riel@surriel.com
Link: https://lkml.kernel.org/r/20201124194925.623931-2-riel@surriel.com
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Xu Yu <xuyu@linux.alibaba.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/gfp.h |    2 ++
 mm/huge_memory.c    |    6 +++---
 mm/shmem.c          |    8 +++++---
 3 files changed, 10 insertions(+), 6 deletions(-)

--- a/include/linux/gfp.h~mmthpshmem-limit-shmem-thp-alloc-gfp_mask
+++ a/include/linux/gfp.h
@@ -634,6 +634,8 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_ma
 extern void pm_restrict_gfp_mask(void);
 extern void pm_restore_gfp_mask(void);
 
+extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
+
 #ifdef CONFIG_PM_SLEEP
 extern bool pm_suspended_storage(void);
 #else
--- a/mm/huge_memory.c~mmthpshmem-limit-shmem-thp-alloc-gfp_mask
+++ a/mm/huge_memory.c
@@ -668,9 +668,9 @@ release:
  *	    available
  * never: never stall for any thp allocation
  */
-static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
+gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma)
 {
-	const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
+	const bool vma_madvised = vma && (vma->vm_flags & VM_HUGEPAGE);
 
 	/* Always do synchronous compaction */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
@@ -762,7 +762,7 @@ vm_fault_t do_huge_pmd_anonymous_page(st
 		}
 		return ret;
 	}
-	gfp = alloc_hugepage_direct_gfpmask(vma);
+	gfp = vma_thp_gfp_mask(vma);
 	page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER);
 	if (unlikely(!page)) {
 		count_vm_event(THP_FAULT_FALLBACK);
--- a/mm/shmem.c~mmthpshmem-limit-shmem-thp-alloc-gfp_mask
+++ a/mm/shmem.c
@@ -1519,8 +1519,8 @@ static struct page *shmem_alloc_hugepage
 		return NULL;
 
 	shmem_pseudo_vma_init(&pvma, info, hindex);
-	page = alloc_pages_vma(gfp | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN,
-			HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(), true);
+	page = alloc_pages_vma(gfp, HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(),
+			       true);
 	shmem_pseudo_vma_destroy(&pvma);
 	if (page)
 		prep_transhuge_page(page);
@@ -1776,6 +1776,7 @@ static int shmem_getpage_gfp(struct inod
 	struct page *page;
 	enum sgp_type sgp_huge = sgp;
 	pgoff_t hindex = index;
+	gfp_t huge_gfp;
 	int error;
 	int once = 0;
 	int alloced = 0;
@@ -1862,7 +1863,8 @@ repeat:
 	}
 
 alloc_huge:
-	page = shmem_alloc_and_acct_page(gfp, inode, index, true);
+	huge_gfp = vma_thp_gfp_mask(vma);
+	page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
 	if (IS_ERR(page)) {
 alloc_nohuge:
 		page = shmem_alloc_and_acct_page(gfp, inode,
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 016/118] mm,thp,shm: limit gfp mask to no more than specified
  2021-02-26  1:14 incoming Andrew Morton
                   ` (14 preceding siblings ...)
  2021-02-26  1:16 ` [patch 015/118] mm,thp,shmem: limit shmem THP alloc gfp_mask Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 017/118] mm,thp,shmem: make khugepaged obey tmpfs mount flags Andrew Morton
                   ` (102 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: aarcange, akpm, hughd, linux-mm, mgorman, mhocko, mm-commits,
	riel, torvalds, vbabka, willy, xuyu

From: Rik van Riel <riel@surriel.com>
Subject: mm,thp,shm: limit gfp mask to no more than specified

Matthew Wilcox pointed out that the i915 driver opportunistically
allocates tmpfs memory, but will happily reclaim some of its pool if no
memory is available.

Make sure the gfp mask used to opportunistically allocate a THP is always
at least as restrictive as the original gfp mask.
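
A hypothetical illustration of how limit_gfp_mask() (added below; it is
static in mm/shmem.c, so this is a sketch rather than a real call site)
combines the two masks: a GFP_NOFS limit strips __GFP_FS, an "allow" flag,
from the huge gfp, while "deny" flags such as __GFP_NORETRY set in the limit
are added to it:

	static gfp_t example_huge_gfp(struct vm_area_struct *vma)
	{
		/* Start from the THP defrag policy for this VMA... */
		gfp_t huge_gfp = vma_thp_gfp_mask(vma);

		/* ...but never be less restrictive than a GFP_NOFS caller. */
		return limit_gfp_mask(huge_gfp, GFP_NOFS | __GFP_NORETRY);
	}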

Link: https://lkml.kernel.org/r/20201124194925.623931-3-riel@surriel.com
Signed-off-by: Rik van Riel <riel@surriel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xu Yu <xuyu@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/shmem.c |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

--- a/mm/shmem.c~mmthpshm-limit-gfp-mask-to-no-more-than-specified
+++ a/mm/shmem.c
@@ -1505,6 +1505,26 @@ static struct page *shmem_swapin(swp_ent
 	return page;
 }
 
+/*
+ * Make sure huge_gfp is always more limited than limit_gfp.
+ * Some of the flags set permissions, while others set limitations.
+ */
+static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
+{
+	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
+	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
+	gfp_t result = huge_gfp & ~allowflags;
+
+	/*
+	 * Minimize the result gfp by taking the union with the deny flags,
+	 * and the intersection of the allow flags.
+	 */
+	result |= (limit_gfp & denyflags);
+	result |= (huge_gfp & limit_gfp) & allowflags;
+
+	return result;
+}
+
 static struct page *shmem_alloc_hugepage(gfp_t gfp,
 		struct shmem_inode_info *info, pgoff_t index)
 {
@@ -1864,6 +1884,7 @@ repeat:
 
 alloc_huge:
 	huge_gfp = vma_thp_gfp_mask(vma);
+	huge_gfp = limit_gfp_mask(huge_gfp, gfp);
 	page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
 	if (IS_ERR(page)) {
 alloc_nohuge:
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 017/118] mm,thp,shmem: make khugepaged obey tmpfs mount flags
  2021-02-26  1:14 incoming Andrew Morton
                   ` (15 preceding siblings ...)
  2021-02-26  1:16 ` [patch 016/118] mm,thp,shm: limit gfp mask to no more than specified Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 018/118] mm,shmem,thp: limit shmem THP allocations to requested zones Andrew Morton
                   ` (101 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: aarcange, akpm, hughd, linux-mm, mgorman, mhocko, mm-commits,
	riel, torvalds, vbabka, willy, xuyu

From: Rik van Riel <riel@surriel.com>
Subject: mm,thp,shmem: make khugepaged obey tmpfs mount flags

Currently if thp enabled=[madvise], mounting a tmpfs filesystem with
huge=always and mmapping files from that tmpfs does not result in
khugepaged collapsing those mappings, despite the mount flag indicating
that it should.

Fix that by breaking up the blocks of tests in hugepage_vma_check a little
bit, and testing things in the correct order.

Link: https://lkml.kernel.org/r/20201124194925.623931-4-riel@surriel.com
Fixes: c2231020ea7b ("mm: thp: register mm for khugepaged when merging vma for shmem")
Signed-off-by: Rik van Riel <riel@surriel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xu Yu <xuyu@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/khugepaged.h |    2 ++
 mm/khugepaged.c            |   22 ++++++++++++++++------
 2 files changed, 18 insertions(+), 6 deletions(-)

--- a/include/linux/khugepaged.h~mmthpshmem-make-khugepaged-obey-tmpfs-mount-flags
+++ a/include/linux/khugepaged.h
@@ -3,6 +3,7 @@
 #define _LINUX_KHUGEPAGED_H
 
 #include <linux/sched/coredump.h> /* MMF_VM_HUGEPAGE */
+#include <linux/shmem_fs.h>
 
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -57,6 +58,7 @@ static inline int khugepaged_enter(struc
 {
 	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
 		if ((khugepaged_always() ||
+		     (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) ||
 		     (khugepaged_req_madv() && (vm_flags & VM_HUGEPAGE))) &&
 		    !(vm_flags & VM_NOHUGEPAGE) &&
 		    !test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
--- a/mm/khugepaged.c~mmthpshmem-make-khugepaged-obey-tmpfs-mount-flags
+++ a/mm/khugepaged.c
@@ -442,18 +442,28 @@ static inline int khugepaged_test_exit(s
 static bool hugepage_vma_check(struct vm_area_struct *vma,
 			       unsigned long vm_flags)
 {
-	if ((!(vm_flags & VM_HUGEPAGE) && !khugepaged_always()) ||
-	    (vm_flags & VM_NOHUGEPAGE) ||
+	/* Explicitly disabled through madvise. */
+	if ((vm_flags & VM_NOHUGEPAGE) ||
 	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
 		return false;
 
-	if (shmem_file(vma->vm_file) ||
-	    (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
-	     vma->vm_file &&
-	     (vm_flags & VM_DENYWRITE))) {
+	/* Enabled via shmem mount options or sysfs settings. */
+	if (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) {
 		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
 				HPAGE_PMD_NR);
 	}
+
+	/* THP settings require madvise. */
+	if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
+		return false;
+
+	/* Read-only file mappings need to be aligned for THP to work. */
+	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
+	    (vm_flags & VM_DENYWRITE)) {
+		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
+				HPAGE_PMD_NR);
+	}
+
 	if (!vma->anon_vma || vma->vm_ops)
 		return false;
 	if (vma_is_temporary_stack(vma))
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 018/118] mm,shmem,thp: limit shmem THP allocations to requested zones
  2021-02-26  1:14 incoming Andrew Morton
                   ` (16 preceding siblings ...)
  2021-02-26  1:16 ` [patch 017/118] mm,thp,shmem: make khugepaged obey tmpfs mount flags Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 019/118] mm: cma: allocate cma areas bottom-up Andrew Morton
                   ` (100 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: aarcange, akpm, hughd, linux-mm, mgorman, mhocko, mm-commits,
	riel, torvalds, vbabka, willy, xuyu

From: Rik van Riel <riel@surriel.com>
Subject: mm,shmem,thp: limit shmem THP allocations to requested zones

Hugh pointed out that the gma500 driver uses shmem pages, but needs to
limit them to the DMA32 zone.  Ensure the allocations resulting from the
gfp_mask returned by limit_gfp_mask use the zone flags that were
originally passed to shmem_getpage_gfp.
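
Hypothetical illustration (limit_gfp_mask() is static in mm/shmem.c, and the
exact gma500 mask is assumed here): with a DMA32-constrained limit gfp, the
zone bits of the huge gfp are replaced by the zone bits of the limit, so the
THP is allocated from the zones the caller asked for:

	/*
	 * Zone bits (__GFP_HIGHMEM|__GFP_MOVABLE) from GFP_TRANSHUGE are
	 * dropped; __GFP_DMA32 from the limit is used instead.
	 */
	gfp_t huge_gfp = limit_gfp_mask(GFP_TRANSHUGE, GFP_KERNEL | __GFP_DMA32);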

Link: https://lkml.kernel.org/r/20210224121016.1314ed6d@imladris.surriel.com
Signed-off-by: Rik van Riel <riel@surriel.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xu Yu <xuyu@linux.alibaba.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/shmem.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/mm/shmem.c~mmshmemthp-limit-shmem-thp-allocations-to-requested-zones
+++ a/mm/shmem.c
@@ -1513,7 +1513,11 @@ static gfp_t limit_gfp_mask(gfp_t huge_g
 {
 	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
 	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
-	gfp_t result = huge_gfp & ~allowflags;
+	gfp_t zoneflags = limit_gfp & GFP_ZONEMASK;
+	gfp_t result = huge_gfp & ~(allowflags | GFP_ZONEMASK);
+
+	/* Allow allocations only from the originally specified zones. */
+	result |= zoneflags;
 
 	/*
 	 * Minimize the result gfp by taking the union with the deny flags,
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 019/118] mm: cma: allocate cma areas bottom-up
  2021-02-26  1:14 incoming Andrew Morton
                   ` (17 preceding siblings ...)
  2021-02-26  1:16 ` [patch 018/118] mm,shmem,thp: limit shmem THP allocations to requested zones Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 020/118] mm/cma: expose all pages to the buddy if activation of an area fails Andrew Morton
                   ` (99 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: akpm, guro, iamjoonsoo.kim, linux-mm, mhocko, mm-commits, riel,
	rppt, torvalds, vvghjk1234

From: Roman Gushchin <guro@fb.com>
Subject: mm: cma: allocate cma areas bottom-up

Currently cma areas without a fixed base are allocated close to the end of
the node.  This placement is sub-optimal because of compaction: it brings
pages into the cma area.  In particular, it can bring in hot executable
pages, even if there is plenty of free memory on the machine.  This
results in cma allocation failures.

Instead, let's place cma areas close to the beginning of a node.  In this
case compaction will help to free cma areas, resulting in better cma
allocation success rates.

If there is enough memory, let's try to allocate bottom-up starting at
4GB to exclude any possible interference with DMA32.  On smaller machines
or in a case of a failure, stick with the old behavior.

16GB vm, 2GB cma area:
With this patch:
[    0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
[    0.002928] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
[    0.002930] cma: Reserved 2048 MiB at 0x0000000100000000
[    0.002931] hugetlb_cma: reserved 2048 MiB on node 0

Without this patch:
[    0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
[    0.002930] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
[    0.002933] cma: Reserved 2048 MiB at 0x00000003c0000000
[    0.002934] hugetlb_cma: reserved 2048 MiB on node 0

v2:
  - switched to memblock_set_bottom_up(true), by Mike
  - start with 4GB, by Mike

[guro@fb.com: whitespace fix, per Mike]
  Link: https://lkml.kernel.org/r/20201221170551.GB3428478@carbon.DHCP.thefacebook.com
[guro@fb.com: fix 32-bit warnings]
  Link: https://lkml.kernel.org/r/20201223163537.GA4011967@carbon.DHCP.thefacebook.com
[guro@fb.com: fix 32-bit systems]
[akpm@linux-foundation.org: build fix]
Link: https://lkml.kernel.org/r/20201217201214.3414100-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Wonhyuk Yang <vvghjk1234@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/cma.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

--- a/mm/cma.c~mm-cma-allocate-cma-areas-bottom-up
+++ a/mm/cma.c
@@ -336,6 +336,23 @@ int __init cma_declare_contiguous_nid(ph
 			limit = highmem_start;
 		}
 
+		/*
+		 * If there is enough memory, try a bottom-up allocation first.
+		 * It will place the new cma area close to the start of the node
+		 * and guarantee that the compaction is moving pages out of the
+		 * cma area and not into it.
+		 * Avoid using first 4GB to not interfere with constrained zones
+		 * like DMA/DMA32.
+		 */
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+		if (!memblock_bottom_up() && memblock_end >= SZ_4G + size) {
+			memblock_set_bottom_up(true);
+			addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
+							limit, nid, true);
+			memblock_set_bottom_up(false);
+		}
+#endif
+
 		if (!addr) {
 			addr = memblock_alloc_range_nid(size, alignment, base,
 					limit, nid, true);
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 020/118] mm/cma: expose all pages to the buddy if activation of an area fails
  2021-02-26  1:14 incoming Andrew Morton
                   ` (18 preceding siblings ...)
  2021-02-26  1:16 ` [patch 019/118] mm: cma: allocate cma areas bottom-up Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 021/118] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo Andrew Morton
                   ` (98 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: akpm, david, linux-mm, mhocko, mm-commits, osalvador, peterz,
	richard.weiyang, rppt, tglx, torvalds, ziy

From: David Hildenbrand <david@redhat.com>
Subject: mm/cma: expose all pages to the buddy if activation of an area fails

Right now, if activation fails, we might already have exposed some pages
to the buddy for CMA use (although they will never get actually used by
CMA), and some pages won't be exposed to the buddy at all.

Let's check for "single zone" early and on error, don't expose any pages
for CMA use - instead, expose them to the buddy available for any use. 
Simply call free_reserved_page() on every single page - easier than going
via free_reserved_area(), converting back and forth between pfns and virt
addresses.

In addition, make sure to fix up totalcma_pages properly.

Example: 6 GiB QEMU VM with "... hugetlb_cma=2G movablecore=20% ...":
  [    0.006891] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
  [    0.006893] cma: Reserved 2048 MiB at 0x0000000100000000
  [    0.006893] hugetlb_cma: reserved 2048 MiB on node 0
  ...
  [    0.175433] cma: CMA area hugetlb0 could not be activated

Before this patch:
  # cat /proc/meminfo
  MemTotal:        5867348 kB
  MemFree:         5692808 kB
  MemAvailable:    5542516 kB
  ...
  CmaTotal:        2097152 kB
  CmaFree:         1884160 kB

After this patch:
  # cat /proc/meminfo
  MemTotal:        6077308 kB
  MemFree:         5904208 kB
  MemAvailable:    5747968 kB
  ...
  CmaTotal:              0 kB
  CmaFree:               0 kB

Note: cma_init_reserved_mem() makes sure that we always cover full
pageblocks / MAX_ORDER - 1 pages.

Link: https://lkml.kernel.org/r/20210127101813.6370-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/cma.c |   43 +++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 22 deletions(-)

--- a/mm/cma.c~mm-cma-expose-all-pages-to-the-buddy-if-activation-of-an-area-fails
+++ a/mm/cma.c
@@ -94,34 +94,29 @@ static void cma_clear_bitmap(struct cma
 
 static void __init cma_activate_area(struct cma *cma)
 {
-	unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
-	unsigned i = cma->count >> pageblock_order;
+	unsigned long base_pfn = cma->base_pfn, pfn;
 	struct zone *zone;
 
 	cma->bitmap = bitmap_zalloc(cma_bitmap_maxno(cma), GFP_KERNEL);
 	if (!cma->bitmap)
 		goto out_error;
 
-	WARN_ON_ONCE(!pfn_valid(pfn));
-	zone = page_zone(pfn_to_page(pfn));
-
-	do {
-		unsigned j;
-
-		base_pfn = pfn;
-		for (j = pageblock_nr_pages; j; --j, pfn++) {
-			WARN_ON_ONCE(!pfn_valid(pfn));
-			/*
-			 * alloc_contig_range requires the pfn range
-			 * specified to be in the same zone. Make this
-			 * simple by forcing the entire CMA resv range
-			 * to be in the same zone.
-			 */
-			if (page_zone(pfn_to_page(pfn)) != zone)
-				goto not_in_zone;
-		}
-		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
-	} while (--i);
+	/*
+	 * alloc_contig_range() requires the pfn range specified to be in the
+	 * same zone. Simplify by forcing the entire CMA resv range to be in the
+	 * same zone.
+	 */
+	WARN_ON_ONCE(!pfn_valid(base_pfn));
+	zone = page_zone(pfn_to_page(base_pfn));
+	for (pfn = base_pfn + 1; pfn < base_pfn + cma->count; pfn++) {
+		WARN_ON_ONCE(!pfn_valid(pfn));
+		if (page_zone(pfn_to_page(pfn)) != zone)
+			goto not_in_zone;
+	}
+
+	for (pfn = base_pfn; pfn < base_pfn + cma->count;
+	     pfn += pageblock_nr_pages)
+		init_cma_reserved_pageblock(pfn_to_page(pfn));
 
 	mutex_init(&cma->lock);
 
@@ -135,6 +130,10 @@ static void __init cma_activate_area(str
 not_in_zone:
 	bitmap_free(cma->bitmap);
 out_error:
+	/* Expose all pages to the buddy, they are useless for CMA. */
+	for (pfn = base_pfn; pfn < base_pfn + cma->count; pfn++)
+		free_reserved_page(pfn_to_page(pfn));
+	totalcma_pages -= cma->count;
 	cma->count = 0;
 	pr_err("CMA area %s could not be activated\n", cma->name);
 	return;
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 021/118] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
  2021-02-26  1:14 incoming Andrew Morton
                   ` (19 preceding siblings ...)
  2021-02-26  1:16 ` [patch 020/118] mm/cma: expose all pages to the buddy if activation of an area fails Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 022/118] mm: cma: print region name on failure Andrew Morton
                   ` (97 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: akpm, david, linux-mm, mhocko, mm-commits, osalvador, peterz,
	richard.weiyang, rientjes, rppt, tglx, torvalds, ziy

From: David Hildenbrand <david@redhat.com>
Subject: mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo

Let's count the number of CMA pages per zone and print them in
/proc/zoneinfo.

Having access to the total number of CMA pages per zone is helpful for
debugging purposes to know where exactly the CMA pages ended up, and to
figure out how many pages of a zone might behave differently, even after
some of these pages have already been allocated.

As one example, CMA pages part of a kernel zone cannot be used for
ordinary kernel allocations but instead behave more like ZONE_MOVABLE.

Before this patch, we were only able to get the global nr+free cma pages from
/proc/meminfo and the free cma pages per zone from /proc/zoneinfo.

Example after this patch when booting a 6 GiB QEMU VM with
"hugetlb_cma=2G":
  # cat /proc/zoneinfo | grep cma
          cma      0
        nr_free_cma  0
          cma      0
        nr_free_cma  0
          cma      524288
        nr_free_cma  493016
          cma      0
          cma      0
  # cat /proc/meminfo | grep Cma
  CmaTotal:        2097152 kB
  CmaFree:         1972064 kB
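
For reference, 524288 pages * 4 KiB/page = 2097152 kB, so the per-zone "cma"
counter above accounts exactly for the CmaTotal reported in /proc/meminfo.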

Note: We print even without CONFIG_CMA, just like "nr_free_cma"; this way,
      one can be sure when spotting "cma 0" that there are definitely no
      CMA pages located in a zone.

[david@redhat.com: v2]
  Link: https://lkml.kernel.org/r/20210128164533.18566-1-david@redhat.com
[david@redhat.com: v3]
  Link: https://lkml.kernel.org/r/20210129113451.22085-1-david@redhat.com
Link: https://lkml.kernel.org/r/20210127101813.6370-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mmzone.h |   15 +++++++++++++++
 mm/page_alloc.c        |    1 +
 mm/vmstat.c            |    6 ++++--
 3 files changed, 20 insertions(+), 2 deletions(-)

--- a/include/linux/mmzone.h~mm-page_alloc-count-cma-pages-per-zone-and-print-them-in-proc-zoneinfo
+++ a/include/linux/mmzone.h
@@ -503,6 +503,9 @@ struct zone {
 	 * bootmem allocator):
 	 *	managed_pages = present_pages - reserved_pages;
 	 *
+	 * cma pages is present pages that are assigned for CMA use
+	 * (MIGRATE_CMA).
+	 *
 	 * So present_pages may be used by memory hotplug or memory power
 	 * management logic to figure out unmanaged pages by checking
 	 * (present_pages - managed_pages). And managed_pages should be used
@@ -527,6 +530,9 @@ struct zone {
 	atomic_long_t		managed_pages;
 	unsigned long		spanned_pages;
 	unsigned long		present_pages;
+#ifdef CONFIG_CMA
+	unsigned long		cma_pages;
+#endif
 
 	const char		*name;
 
@@ -624,6 +630,15 @@ static inline unsigned long zone_managed
 	return (unsigned long)atomic_long_read(&zone->managed_pages);
 }
 
+static inline unsigned long zone_cma_pages(struct zone *zone)
+{
+#ifdef CONFIG_CMA
+	return zone->cma_pages;
+#else
+	return 0;
+#endif
+}
+
 static inline unsigned long zone_end_pfn(const struct zone *zone)
 {
 	return zone->zone_start_pfn + zone->spanned_pages;
--- a/mm/page_alloc.c~mm-page_alloc-count-cma-pages-per-zone-and-print-them-in-proc-zoneinfo
+++ a/mm/page_alloc.c
@@ -2168,6 +2168,7 @@ void __init init_cma_reserved_pageblock(
 	}
 
 	adjust_managed_page_count(page, pageblock_nr_pages);
+	page_zone(page)->cma_pages += pageblock_nr_pages;
 }
 #endif
 
--- a/mm/vmstat.c~mm-page_alloc-count-cma-pages-per-zone-and-print-them-in-proc-zoneinfo
+++ a/mm/vmstat.c
@@ -1637,14 +1637,16 @@ static void zoneinfo_show_print(struct s
 		   "\n        high     %lu"
 		   "\n        spanned  %lu"
 		   "\n        present  %lu"
-		   "\n        managed  %lu",
+		   "\n        managed  %lu"
+		   "\n        cma      %lu",
 		   zone_page_state(zone, NR_FREE_PAGES),
 		   min_wmark_pages(zone),
 		   low_wmark_pages(zone),
 		   high_wmark_pages(zone),
 		   zone->spanned_pages,
 		   zone->present_pages,
-		   zone_managed_pages(zone));
+		   zone_managed_pages(zone),
+		   zone_cma_pages(zone));
 
 	seq_printf(m,
 		   "\n        protection: (%ld",
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 022/118] mm: cma: print region name on failure
  2021-02-26  1:14 incoming Andrew Morton
                   ` (20 preceding siblings ...)
  2021-02-26  1:16 ` [patch 021/118] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 023/118] mm: vmstat: fix NOHZ wakeups for node stat changes Andrew Morton
                   ` (96 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: akpm, david, georgi.djakov, linux-mm, minchan, mm-commits, pdaly,
	rdunlap, torvalds

From: Patrick Daly <pdaly@codeaurora.org>
Subject: mm: cma: print region name on failure

Print the name of the CMA region for convenience.  This is useful
information to have when cma_alloc() fails.

[pdaly@codeaurora.org: print the "count" variable]
  Link: https://lkml.kernel.org/r/20210209142414.12768-1-georgi.djakov@linaro.org
Link: https://lkml.kernel.org/r/20210208115200.20286-1-georgi.djakov@linaro.org
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/cma.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/cma.c~mm-cma-print-region-name-on-failure
+++ a/mm/cma.c
@@ -500,8 +500,8 @@ struct page *cma_alloc(struct cma *cma,
 	}
 
 	if (ret && !no_warn) {
-		pr_err("%s: alloc failed, req-size: %zu pages, ret: %d\n",
-			__func__, count, ret);
+		pr_err("%s: %s: alloc failed, req-size: %zu pages, ret: %d\n",
+		       __func__, cma->name, count, ret);
 		cma_debug_show_areas(cma);
 	}
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 023/118] mm: vmstat: fix NOHZ wakeups for node stat changes
  2021-02-26  1:14 incoming Andrew Morton
                   ` (21 preceding siblings ...)
  2021-02-26  1:16 ` [patch 022/118] mm: cma: print region name on failure Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 024/118] mm: vmstat: add some comments on internal storage of byte items Andrew Morton
                   ` (95 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: akpm, hannes, linux-mm, mm-commits, torvalds

From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: vmstat: fix NOHZ wakeups for node stat changes

On NOHZ, the periodic vmstat flushers on each CPU can go to sleep and
won't wake up until stat changes are detected in the per-cpu deltas of the
zone vmstat counters.

In commit 75ef71840539 ("mm, vmstat: add infrastructure for per-node
vmstats") per-node counters were introduced, and subsequently most stats
were moved from the zone to the node level.  However, the node counters
weren't added to the NOHZ wakeup detection.

In theory this can cause per-cpu errors to remain in the user-reported
stats indefinitely.  In practice this only affects a handful of sub-counters
(e.g.  file_mapped, dirty and writeback) because other page state
changes at the node level likely involve a change at the zone level as
well (alloc and free, lru ops).  Also, nobody has complained.

Fix it up for completeness: wake up vmstat refreshing on node changes. 
Also remove the BUILD_BUG_ONs that assert counter sizes; we haven't relied
on them since we added sizeof() to the range calculation in commit
13c9aaf7fa01 ("mm/vmstat.c: fix NUMA statistics updates").

Link: https://lkml.kernel.org/r/20210202184342.118513-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmstat.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

--- a/mm/vmstat.c~mm-vmstat-fix-nohz-wakeups-for-node-stat-changes
+++ a/mm/vmstat.c
@@ -1894,16 +1894,12 @@ static void vmstat_update(struct work_st
  */
 static bool need_update(int cpu)
 {
+	pg_data_t *last_pgdat = NULL;
 	struct zone *zone;
 
 	for_each_populated_zone(zone) {
 		struct per_cpu_pageset *p = per_cpu_ptr(zone->pageset, cpu);
-
-		BUILD_BUG_ON(sizeof(p->vm_stat_diff[0]) != 1);
-#ifdef CONFIG_NUMA
-		BUILD_BUG_ON(sizeof(p->vm_numa_stat_diff[0]) != 2);
-#endif
-
+		struct per_cpu_nodestat *n;
 		/*
 		 * The fast way of checking if there are any vmstat diffs.
 		 */
@@ -1915,6 +1911,13 @@ static bool need_update(int cpu)
 			       sizeof(p->vm_numa_stat_diff[0])))
 			return true;
 #endif
+		if (last_pgdat == zone->zone_pgdat)
+			continue;
+		last_pgdat = zone->zone_pgdat;
+		n = per_cpu_ptr(zone->zone_pgdat->per_cpu_nodestats, cpu);
+		if (memchr_inv(n->vm_node_stat_diff, 0, NR_VM_NODE_STAT_ITEMS *
+			       sizeof(n->vm_node_stat_diff[0])))
+		    return true;
 	}
 	return false;
 }
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 024/118] mm: vmstat: add some comments on internal storage of byte items
  2021-02-26  1:14 incoming Andrew Morton
                   ` (22 preceding siblings ...)
  2021-02-26  1:16 ` [patch 023/118] mm: vmstat: fix NOHZ wakeups for node stat changes Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 025/118] mm/vmstat.c: erase latency in vmstat_shepherd Andrew Morton
                   ` (94 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: akpm, hannes, linux-mm, mm-commits, torvalds

From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: vmstat: add some comments on internal storage of byte items

Byte-accounted items are used for slab object accounting at the cgroup
level, because the objects in a slab page can belong to different cgroups.
At the global level these items always change in multiples of whole slab
pages.  The vmstat code exploits this and stores these items as pages
internally, which allows for more compact per-cpu data.
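
A rough sketch of what that means for a caller (not taken from this patch;
assuming a whole slab page worth of objects is charged to a byte-based item):

	/* the caller passes a byte delta for a *_B item */
	__mod_node_page_state(page_pgdat(page), NR_SLAB_RECLAIMABLE_B, PAGE_SIZE);
	/*
	 * internally: VM_WARN_ON_ONCE(delta & (PAGE_SIZE - 1)); delta >>= PAGE_SHIFT;
	 * so the per-cpu diff records +1 page rather than +4096 bytes
	 */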

This optimization isn't self-evident from the asserts and the division in
the stat update functions.  Provide the reader with some context.

Link: https://lkml.kernel.org/r/20210202184411.118614-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/vmstat.h |    6 ++++++
 mm/vmstat.c            |   12 ++++++++++++
 2 files changed, 18 insertions(+)

--- a/include/linux/vmstat.h~mm-vmstat-add-some-comments-on-internal-storage-of-byte-items
+++ a/include/linux/vmstat.h
@@ -313,6 +313,12 @@ static inline void __mod_node_page_state
 			enum node_stat_item item, int delta)
 {
 	if (vmstat_item_in_bytes(item)) {
+		/*
+		 * Only cgroups use subpage accounting right now; at
+		 * the global level, these items still change in
+		 * multiples of whole pages. Store them as pages
+		 * internally to keep the per-cpu counters compact.
+		 */
 		VM_WARN_ON_ONCE(delta & (PAGE_SIZE - 1));
 		delta >>= PAGE_SHIFT;
 	}
--- a/mm/vmstat.c~mm-vmstat-add-some-comments-on-internal-storage-of-byte-items
+++ a/mm/vmstat.c
@@ -342,6 +342,12 @@ void __mod_node_page_state(struct pglist
 	long t;
 
 	if (vmstat_item_in_bytes(item)) {
+		/*
+		 * Only cgroups use subpage accounting right now; at
+		 * the global level, these items still change in
+		 * multiples of whole pages. Store them as pages
+		 * internally to keep the per-cpu counters compact.
+		 */
 		VM_WARN_ON_ONCE(delta & (PAGE_SIZE - 1));
 		delta >>= PAGE_SHIFT;
 	}
@@ -551,6 +557,12 @@ static inline void mod_node_state(struct
 	long o, n, t, z;
 
 	if (vmstat_item_in_bytes(item)) {
+		/*
+		 * Only cgroups use subpage accounting right now; at
+		 * the global level, these items still change in
+		 * multiples of whole pages. Store them as pages
+		 * internally to keep the per-cpu counters compact.
+		 */
 		VM_WARN_ON_ONCE(delta & (PAGE_SIZE - 1));
 		delta >>= PAGE_SHIFT;
 	}
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 025/118] mm/vmstat.c: erase latency in vmstat_shepherd
  2021-02-26  1:14 incoming Andrew Morton
                   ` (23 preceding siblings ...)
  2021-02-26  1:16 ` [patch 024/118] mm: vmstat: add some comments on internal storage of byte items Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:16 ` [patch 026/118] mm: move pfn_to_online_page() out of line Andrew Morton
                   ` (93 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: akpm, benbjiang, cl, linux-mm, mm-commits, robinlai, torvalds

From: Jiang Biao <benbjiang@tencent.com>
Subject: mm/vmstat.c: erase latency in vmstat_shepherd

Many 100us+ latencies have been detected in vmstat_shepherd() on a CPX
platform with 208 logical CPUs.  vmstat_shepherd is queued every second,
which can make the problem worse.

Add a schedule point in vmstat_shepherd() to erase the latency.

Link: https://lkml.kernel.org/r/20210111035526.1511-1-benbjiang@tencent.com
Signed-off-by: Jiang Biao <benbjiang@tencent.com>
Reported-by: Bin Lai <robinlai@tencent.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmstat.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/mm/vmstat.c~mm-vmstatc-erase-latency-in-vmstat_shepherd
+++ a/mm/vmstat.c
@@ -1980,6 +1980,8 @@ static void vmstat_shepherd(struct work_
 
 		if (!delayed_work_pending(dw) && need_update(cpu))
 			queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
+
+		cond_resched();
 	}
 	put_online_cpus();
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 026/118] mm: move pfn_to_online_page() out of line
  2021-02-26  1:14 incoming Andrew Morton
                   ` (24 preceding siblings ...)
  2021-02-26  1:16 ` [patch 025/118] mm/vmstat.c: erase latency in vmstat_shepherd Andrew Morton
@ 2021-02-26  1:16 ` Andrew Morton
  2021-02-26  1:17 ` [patch 027/118] mm: teach pfn_to_online_page() to consider subsection validity Andrew Morton
                   ` (92 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:16 UTC (permalink / raw)
  To: akpm, cai, dan.j.williams, david, linux-mm, mhocko, mhocko,
	mm-commits, naoya.horiguchi, osalvador, torvalds

From: Dan Williams <dan.j.williams@intel.com>
Subject: mm: move pfn_to_online_page() out of line

Patch series "mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE", v4.

A pfn-walker that uses pfn_to_online_page() may inadvertently translate a
pfn as online and in the page allocator, when it is in fact offline and
managed by a ZONE_DEVICE mapping (details in Patch 3: ("mm: Teach
pfn_to_online_page() about ZONE_DEVICE section collisions")).

The two proposals under consideration are to teach pfn_to_online_page() to
be precise in the presence of mixed-zone sections, or to teach the
memory-add code to drop the System RAM associated with ZONE_DEVICE
collisions.  In order not to regress memory capacity by a few 10s to 100s
of MiB, the approach taken in this set is to add precision to
pfn_to_online_page().

In the course of validating pfn_to_online_page() a couple other fixes
fell out:

1/ soft_offline_page() fails to drop the reference taken in the
   madvise(..., MADV_SOFT_OFFLINE) case.

2/ memory_failure() uses get_dev_pagemap() to lookup ZONE_DEVICE pages,
   however that mapping may contain data pages and metadata raw pfns.
   Introduce pgmap_pfn_valid() to delineate the 2 types and fail the
   handling of raw metadata pfns.


This patch (of 4):

pfn_to_online_page() is already too large to be a macro or an inline
function.  In anticipation of further logic changes / growth, move it out
of line.

No functional change, just code movement.
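
For illustration only (not part of this patch), a typical pfn walker is
expected to consume the helper like this:

	struct page *page = pfn_to_online_page(pfn);

	if (!page)
		continue;	/* offline, or no fully initialized memmap */
	/* page->flags and friends are safe to inspect from here on */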

Link: https://lkml.kernel.org/r/161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/161058499608.1840162.10165648147615238793.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memory_hotplug.h |   17 +----------------
 mm/memory_hotplug.c            |   16 ++++++++++++++++
 2 files changed, 17 insertions(+), 16 deletions(-)

--- a/include/linux/memory_hotplug.h~mm-move-pfn_to_online_page-out-of-line
+++ a/include/linux/memory_hotplug.h
@@ -16,22 +16,7 @@ struct resource;
 struct vmem_altmap;
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-/*
- * Return page for the valid pfn only if the page is online. All pfn
- * walkers which rely on the fully initialized page->flags and others
- * should use this rather than pfn_valid && pfn_to_page
- */
-#define pfn_to_online_page(pfn)					   \
-({								   \
-	struct page *___page = NULL;				   \
-	unsigned long ___pfn = pfn;				   \
-	unsigned long ___nr = pfn_to_section_nr(___pfn);	   \
-								   \
-	if (___nr < NR_MEM_SECTIONS && online_section_nr(___nr) && \
-	    pfn_valid_within(___pfn))				   \
-		___page = pfn_to_page(___pfn);			   \
-	___page;						   \
-})
+struct page *pfn_to_online_page(unsigned long pfn);
 
 /*
  * Types for free bootmem stored in page->lru.next. These have to be in
--- a/mm/memory_hotplug.c~mm-move-pfn_to_online_page-out-of-line
+++ a/mm/memory_hotplug.c
@@ -301,6 +301,22 @@ static int check_hotplug_memory_addressa
 }
 
 /*
+ * Return page for the valid pfn only if the page is online. All pfn
+ * walkers which rely on the fully initialized page->flags and others
+ * should use this rather than pfn_valid && pfn_to_page
+ */
+struct page *pfn_to_online_page(unsigned long pfn)
+{
+	unsigned long nr = pfn_to_section_nr(pfn);
+
+	if (nr < NR_MEM_SECTIONS && online_section_nr(nr) &&
+	    pfn_valid_within(pfn))
+		return pfn_to_page(pfn);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(pfn_to_online_page);
+
+/*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
  * call this function after deciding the zone to which to
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 027/118] mm: teach pfn_to_online_page() to consider subsection validity
  2021-02-26  1:14 incoming Andrew Morton
                   ` (25 preceding siblings ...)
  2021-02-26  1:16 ` [patch 026/118] mm: move pfn_to_online_page() out of line Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 028/118] mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions Andrew Morton
                   ` (91 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, cai, dan.j.williams, david, linux-mm, mhocko, mm-commits,
	naoya.horiguchi, osalvador, torvalds

From: Dan Williams <dan.j.williams@intel.com>
Subject: mm: teach pfn_to_online_page() to consider subsection validity

pfn_to_online_page() is primarily used to filter out offline or fully
uninitialized pages.  Both pfn_valid() and online_section_nr() have a
coarse per-memory-section granularity.  If a section is shared with
partially offline memory (e.g.  part of ZONE_DEVICE) then
pfn_to_online_page() would return a false positive for some pfns.  Fix
this by adding a pfn_section_valid() check, which is subsection aware.
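
For scale (numbers for a typical x86_64 configuration; other configs
differ): a memory section spans 128 MiB while a subsection spans 2 MiB (512
base pages), so the pfn_section_valid() check can reject the ZONE_DEVICE
part of a mixed section at 2 MiB granularity, which online_section_nr()
alone cannot do.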

[mhocko@kernel.org: changelog rewrite]
Link: https://lkml.kernel.org/r/161058500148.1840162.4365921007820501696.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: b13bc35193d9 ("mm/hotplug: invalid PFNs from pfn_to_online_page()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |   23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

--- a/mm/memory_hotplug.c~mm-teach-pfn_to_online_page-to-consider-subsection-validity
+++ a/mm/memory_hotplug.c
@@ -308,11 +308,26 @@ static int check_hotplug_memory_addressa
 struct page *pfn_to_online_page(unsigned long pfn)
 {
 	unsigned long nr = pfn_to_section_nr(pfn);
+	struct mem_section *ms;
 
-	if (nr < NR_MEM_SECTIONS && online_section_nr(nr) &&
-	    pfn_valid_within(pfn))
-		return pfn_to_page(pfn);
-	return NULL;
+	if (nr >= NR_MEM_SECTIONS)
+		return NULL;
+
+	ms = __nr_to_section(nr);
+	if (!online_section(ms))
+		return NULL;
+
+	/*
+	 * Save some code text when online_section() +
+	 * pfn_section_valid() are sufficient.
+	 */
+	if (IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) && !pfn_valid(pfn))
+		return NULL;
+
+	if (!pfn_section_valid(ms, pfn))
+		return NULL;
+
+	return pfn_to_page(pfn);
 }
 EXPORT_SYMBOL_GPL(pfn_to_online_page);
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 028/118] mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions
  2021-02-26  1:14 incoming Andrew Morton
                   ` (26 preceding siblings ...)
  2021-02-26  1:17 ` [patch 027/118] mm: teach pfn_to_online_page() to consider subsection validity Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 029/118] mm: fix memory_failure() handling of dax-namespace metadata Andrew Morton
                   ` (90 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, cai, dan.j.williams, david, linux-mm, mhocko, mm-commits,
	naoya.horiguchi, osalvador, torvalds

From: Dan Williams <dan.j.williams@intel.com>
Subject: mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions

While pfn_to_online_page() is able to determine pfn_valid() at subsection
granularity, it is not able to reliably determine whether a given pfn is
also online if the section mixes ZONE_{NORMAL,MOVABLE} with ZONE_DEVICE.
This means that pfn_to_online_page() may return invalid @page objects.
For example with a memory map like:

100000000-1fbffffff : System RAM
  142000000-143002e16 : Kernel code
  143200000-143713fff : Kernel rodata
  143800000-143b15b7f : Kernel data
  144227000-144ffffff : Kernel bss
1fc000000-2fbffffff : Persistent Memory (legacy)
  1fc000000-2fbffffff : namespace0.0

This command:

echo 0x1fc000000 > /sys/devices/system/memory/soft_offline_page

...succeeds when it should fail.  When it succeeds it touches an
uninitialized page and may crash or cause other damage (see
dissolve_free_huge_page()).

While the memory map above is contrived via the memmap=ss!nn kernel
command line option, the collision happens in practice on shipping
platforms.  The memory controller resources that decode spans of physical
address space are a limited resource.  One technique platform-firmware
uses to conserve those resources is to share a decoder across 2 devices to
keep the address range contiguous.  Unfortunately the unit of operation of
a decoder is 64MiB while the Linux section size is 128MiB.  This results
in situations where, without subsection hotplug, memory mappings with
different lifetimes collide into one object that can only express one
lifetime.
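
Worked through on the map above: with 128 MiB sections, the section spanning
0x1f8000000-0x1ffffffff contains 64 MiB of System RAM (up to 0x1fbffffff)
and 64 MiB of the persistent memory namespace (from 0x1fc000000), so a
single per-section "online" flag cannot describe both halves.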

Update move_pfn_range_to_zone() to flag (SECTION_TAINT_ZONE_DEVICE) a
section that mixes ZONE_DEVICE pfns with other online pfns.  With
SECTION_TAINT_ZONE_DEVICE to delineate, pfn_to_online_page() can fall back
to a slow-path check for ZONE_DEVICE pfns in an online section.  In the
fast path, online_section() returns false for a fully ZONE_DEVICE section.

Because the collision case is rare, and for simplicity, the
SECTION_TAINT_ZONE_DEVICE flag is never cleared once set.

[dan.j.williams@intel.com: fix CONFIG_ZONE_DEVICE=n build]
  Link: https://lkml.kernel.org/r/CAPcyv4iX+7LAgAeSqx7Zw-Zd=ZV9gBv8Bo7oTbwCOOqJoZ3+Yg@mail.gmail.com
Link: https://lkml.kernel.org/r/161058500675.1840162.7887862152161279354.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reported-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mmzone.h |   34 +++++++++++++++++++++++++++-------
 mm/memory_hotplug.c    |   38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+), 7 deletions(-)

--- a/include/linux/mmzone.h~mm-teach-pfn_to_online_page-about-zone_device-section-collisions
+++ a/include/linux/mmzone.h
@@ -918,6 +918,18 @@ static inline int local_memory_node(int
  */
 #define zone_idx(zone)		((zone) - (zone)->zone_pgdat->node_zones)
 
+#ifdef CONFIG_ZONE_DEVICE
+static inline bool zone_is_zone_device(struct zone *zone)
+{
+	return zone_idx(zone) == ZONE_DEVICE;
+}
+#else
+static inline bool zone_is_zone_device(struct zone *zone)
+{
+	return false;
+}
+#endif
+
 /*
  * Returns true if a zone has pages managed by the buddy allocator.
  * All the reclaim decisions have to use this function rather than
@@ -1306,13 +1318,14 @@ extern size_t mem_section_usage_size(voi
  *      which results in PFN_SECTION_SHIFT equal 6.
  * To sum it up, at least 6 bits are available.
  */
-#define	SECTION_MARKED_PRESENT	(1UL<<0)
-#define SECTION_HAS_MEM_MAP	(1UL<<1)
-#define SECTION_IS_ONLINE	(1UL<<2)
-#define SECTION_IS_EARLY	(1UL<<3)
-#define SECTION_MAP_LAST_BIT	(1UL<<4)
-#define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT-1))
-#define SECTION_NID_SHIFT	3
+#define SECTION_MARKED_PRESENT		(1UL<<0)
+#define SECTION_HAS_MEM_MAP		(1UL<<1)
+#define SECTION_IS_ONLINE		(1UL<<2)
+#define SECTION_IS_EARLY		(1UL<<3)
+#define SECTION_TAINT_ZONE_DEVICE	(1UL<<4)
+#define SECTION_MAP_LAST_BIT		(1UL<<5)
+#define SECTION_MAP_MASK		(~(SECTION_MAP_LAST_BIT-1))
+#define SECTION_NID_SHIFT		3
 
 static inline struct page *__section_mem_map_addr(struct mem_section *section)
 {
@@ -1351,6 +1364,13 @@ static inline int online_section(struct
 	return (section && (section->section_mem_map & SECTION_IS_ONLINE));
 }
 
+static inline int online_device_section(struct mem_section *section)
+{
+	unsigned long flags = SECTION_IS_ONLINE | SECTION_TAINT_ZONE_DEVICE;
+
+	return section && ((section->section_mem_map & flags) == flags);
+}
+
 static inline int online_section_nr(unsigned long nr)
 {
 	return online_section(__nr_to_section(nr));
--- a/mm/memory_hotplug.c~mm-teach-pfn_to_online_page-about-zone_device-section-collisions
+++ a/mm/memory_hotplug.c
@@ -308,6 +308,7 @@ static int check_hotplug_memory_addressa
 struct page *pfn_to_online_page(unsigned long pfn)
 {
 	unsigned long nr = pfn_to_section_nr(pfn);
+	struct dev_pagemap *pgmap;
 	struct mem_section *ms;
 
 	if (nr >= NR_MEM_SECTIONS)
@@ -327,6 +328,22 @@ struct page *pfn_to_online_page(unsigned
 	if (!pfn_section_valid(ms, pfn))
 		return NULL;
 
+	if (!online_device_section(ms))
+		return pfn_to_page(pfn);
+
+	/*
+	 * Slowpath: when ZONE_DEVICE collides with
+	 * ZONE_{NORMAL,MOVABLE} within the same section some pfns in
+	 * the section may be 'offline' but 'valid'. Only
+	 * get_dev_pagemap() can determine sub-section online status.
+	 */
+	pgmap = get_dev_pagemap(pfn, NULL);
+	put_dev_pagemap(pgmap);
+
+	/* The presence of a pgmap indicates ZONE_DEVICE offline pfn */
+	if (pgmap)
+		return NULL;
+
 	return pfn_to_page(pfn);
 }
 EXPORT_SYMBOL_GPL(pfn_to_online_page);
@@ -709,6 +726,14 @@ static void __meminit resize_pgdat_range
 	pgdat->node_spanned_pages = max(start_pfn + nr_pages, old_end_pfn) - pgdat->node_start_pfn;
 
 }
+
+static void section_taint_zone_device(unsigned long pfn)
+{
+	struct mem_section *ms = __pfn_to_section(pfn);
+
+	ms->section_mem_map |= SECTION_TAINT_ZONE_DEVICE;
+}
+
 /*
  * Associate the pfn range with the given zone, initializing the memmaps
  * and resizing the pgdat/zone data to span the added pages. After this
@@ -739,6 +764,19 @@ void __ref move_pfn_range_to_zone(struct
 	pgdat_resize_unlock(pgdat, &flags);
 
 	/*
+	 * Subsection population requires care in pfn_to_online_page().
+	 * Set the taint to enable the slow path detection of
+	 * ZONE_DEVICE pages in an otherwise  ZONE_{NORMAL,MOVABLE}
+	 * section.
+	 */
+	if (zone_is_zone_device(zone)) {
+		if (!IS_ALIGNED(start_pfn, PAGES_PER_SECTION))
+			section_taint_zone_device(start_pfn);
+		if (!IS_ALIGNED(start_pfn + nr_pages, PAGES_PER_SECTION))
+			section_taint_zone_device(start_pfn + nr_pages);
+	}
+
+	/*
 	 * TODO now we have a visible range of pages which are not associated
 	 * with their zone properly. Not nice but set_pfnblock_flags_mask
 	 * expects the zone spans the pfn range. All the pages in the range
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 029/118] mm: fix memory_failure() handling of dax-namespace metadata
  2021-02-26  1:14 incoming Andrew Morton
                   ` (27 preceding siblings ...)
  2021-02-26  1:17 ` [patch 028/118] mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 030/118] mm/memory_hotplug: rename all existing 'memhp' into 'mhp' Andrew Morton
                   ` (89 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, cai, dan.j.williams, david, linux-mm, mhocko, mm-commits,
	naoya.horiguchi, osalvador, torvalds

From: Dan Williams <dan.j.williams@intel.com>
Subject: mm: fix memory_failure() handling of dax-namespace metadata

Given that 'struct dev_pagemap' spans both data pages and metadata pages,
be careful to consult the altmap, if present, to delineate metadata.  In fact
the pfn_first() helper already identifies the first valid data pfn, so
export that helper for other code paths via pgmap_pfn_valid().

Other usages of get_dev_pagemap() are not a concern because those are
operating on known data pfns having been looked up by get_user_pages(). 
I.e.  metadata pfns are never user mapped.
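
A rough sketch of the layout this guards against (assuming a namespace that
keeps its page metadata on the device via the altmap):

	range->start              pfn_first()                       range->end
	|--- altmap (metadata pfns) ---|------------ data pfns ------------|

pgmap_pfn_valid() returns true only for the data portion, so
memory_failure() can now reject the metadata pfns with -ENXIO.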

Link: https://lkml.kernel.org/r/161058501758.1840162.4239831989762604527.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 6100e34b2526 ("mm, memory_failure: Teach memory_failure() about dev_pagemap pages")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memremap.h |    6 ++++++
 mm/memory-failure.c      |    6 ++++++
 mm/memremap.c            |   15 +++++++++++++++
 3 files changed, 27 insertions(+)

--- a/include/linux/memremap.h~mm-fix-memory_failure-handling-of-dax-namespace-metadata
+++ a/include/linux/memremap.h
@@ -137,6 +137,7 @@ void *devm_memremap_pages(struct device
 void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap);
 struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 		struct dev_pagemap *pgmap);
+bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn);
 
 unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
 void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
@@ -165,6 +166,11 @@ static inline struct dev_pagemap *get_de
 	return NULL;
 }
 
+static inline bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn)
+{
+	return false;
+}
+
 static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
 {
 	return 0;
--- a/mm/memory-failure.c~mm-fix-memory_failure-handling-of-dax-namespace-metadata
+++ a/mm/memory-failure.c
@@ -1312,6 +1312,12 @@ static int memory_failure_dev_pagemap(un
 		 */
 		put_page(page);
 
+	/* device metadata space is not recoverable */
+	if (!pgmap_pfn_valid(pgmap, pfn)) {
+		rc = -ENXIO;
+		goto out;
+	}
+
 	/*
 	 * Prevent the inode from being freed while we are interrogating
 	 * the address_space, typically this would be handled by
--- a/mm/memremap.c~mm-fix-memory_failure-handling-of-dax-namespace-metadata
+++ a/mm/memremap.c
@@ -80,6 +80,21 @@ static unsigned long pfn_first(struct de
 	return pfn + vmem_altmap_offset(pgmap_altmap(pgmap));
 }
 
+bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn)
+{
+	int i;
+
+	for (i = 0; i < pgmap->nr_range; i++) {
+		struct range *range = &pgmap->ranges[i];
+
+		if (pfn >= PHYS_PFN(range->start) &&
+		    pfn <= PHYS_PFN(range->end))
+			return pfn >= pfn_first(pgmap, i);
+	}
+
+	return false;
+}
+
 static unsigned long pfn_end(struct dev_pagemap *pgmap, int range_id)
 {
 	const struct range *range = &pgmap->ranges[range_id];
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 030/118] mm/memory_hotplug: rename all existing 'memhp' into 'mhp'
  2021-02-26  1:14 incoming Andrew Morton
                   ` (28 preceding siblings ...)
  2021-02-26  1:17 ` [patch 029/118] mm: fix memory_failure() handling of dax-namespace metadata Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 031/118] mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE Andrew Morton
                   ` (88 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, anshuman.khandual, david, gregkh, linux-mm, mm-commits,
	rafael, torvalds

From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm/memory_hotplug: rename all existing 'memhp' into 'mhp'

This renames all 'memhp' instances to 'mhp', except for memhp_default_state
because it is a kernel command line option.  This is just a clean up and
should not cause a functional change.  Let's make it consistent rather than
mixing the two prefixes, in preparation for more users of the 'mhp'
terminology.

Link: https://lkml.kernel.org/r/1611554093-27316-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/base/memory.c          |   10 +++++-----
 include/linux/memory_hotplug.h |    4 ++--
 mm/memory_hotplug.c            |   12 ++++++------
 3 files changed, 13 insertions(+), 13 deletions(-)

--- a/drivers/base/memory.c~mm-memory_hotplug-rename-all-existing-memhp-into-mhp
+++ a/drivers/base/memory.c
@@ -35,7 +35,7 @@ static const char *const online_type_to_
 	[MMOP_ONLINE_MOVABLE] = "online_movable",
 };
 
-int memhp_online_type_from_str(const char *str)
+int mhp_online_type_from_str(const char *str)
 {
 	int i;
 
@@ -253,7 +253,7 @@ static int memory_subsys_offline(struct
 static ssize_t state_store(struct device *dev, struct device_attribute *attr,
 			   const char *buf, size_t count)
 {
-	const int online_type = memhp_online_type_from_str(buf);
+	const int online_type = mhp_online_type_from_str(buf);
 	struct memory_block *mem = to_memory_block(dev);
 	int ret;
 
@@ -387,19 +387,19 @@ static ssize_t auto_online_blocks_show(s
 				       struct device_attribute *attr, char *buf)
 {
 	return sysfs_emit(buf, "%s\n",
-			  online_type_to_str[memhp_default_online_type]);
+			  online_type_to_str[mhp_default_online_type]);
 }
 
 static ssize_t auto_online_blocks_store(struct device *dev,
 					struct device_attribute *attr,
 					const char *buf, size_t count)
 {
-	const int online_type = memhp_online_type_from_str(buf);
+	const int online_type = mhp_online_type_from_str(buf);
 
 	if (online_type < 0)
 		return -EINVAL;
 
-	memhp_default_online_type = online_type;
+	mhp_default_online_type = online_type;
 	return count;
 }
 
--- a/include/linux/memory_hotplug.h~mm-memory_hotplug-rename-all-existing-memhp-into-mhp
+++ a/include/linux/memory_hotplug.h
@@ -116,10 +116,10 @@ extern int arch_add_memory(int nid, u64
 			   struct mhp_params *params);
 extern u64 max_mem_size;
 
-extern int memhp_online_type_from_str(const char *str);
+extern int mhp_online_type_from_str(const char *str);
 
 /* Default online_type (MMOP_*) when new memory blocks are added. */
-extern int memhp_default_online_type;
+extern int mhp_default_online_type;
 /* If movable_node boot option specified */
 extern bool movable_node_enabled;
 static inline bool movable_node_is_enabled(void)
--- a/mm/memory_hotplug.c~mm-memory_hotplug-rename-all-existing-memhp-into-mhp
+++ a/mm/memory_hotplug.c
@@ -67,17 +67,17 @@ void put_online_mems(void)
 bool movable_node_enabled = false;
 
 #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
-int memhp_default_online_type = MMOP_OFFLINE;
+int mhp_default_online_type = MMOP_OFFLINE;
 #else
-int memhp_default_online_type = MMOP_ONLINE;
+int mhp_default_online_type = MMOP_ONLINE;
 #endif
 
 static int __init setup_memhp_default_state(char *str)
 {
-	const int online_type = memhp_online_type_from_str(str);
+	const int online_type = mhp_online_type_from_str(str);
 
 	if (online_type >= 0)
-		memhp_default_online_type = online_type;
+		mhp_default_online_type = online_type;
 
 	return 1;
 }
@@ -1076,7 +1076,7 @@ static int check_hotplug_memory_range(u6
 
 static int online_memory_block(struct memory_block *mem, void *arg)
 {
-	mem->online_type = memhp_default_online_type;
+	mem->online_type = mhp_default_online_type;
 	return device_online(&mem->dev);
 }
 
@@ -1157,7 +1157,7 @@ int __ref add_memory_resource(int nid, s
 		merge_system_ram_resource(res);
 
 	/* online pages if requested */
-	if (memhp_default_online_type != MMOP_OFFLINE)
+	if (mhp_default_online_type != MMOP_OFFLINE)
 		walk_memory_blocks(start, size, NULL, online_memory_block);
 
 	return ret;
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 031/118] mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE
  2021-02-26  1:14 incoming Andrew Morton
                   ` (29 preceding siblings ...)
  2021-02-26  1:17 ` [patch 030/118] mm/memory_hotplug: rename all existing 'memhp' into 'mhp' Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 032/118] mm/memory_hotplug: use helper function zone_end_pfn() to get end_pfn Andrew Morton
                   ` (87 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, anshuman.khandual, boris.ostrovsky, david, haiyangz,
	jasowang, jgross, kys, linmiaohe, linux-mm, mhocko, mm-commits,
	mst, osalvador, pankaj.gupta, richard.weiyang, sstabellini,
	sthemmin, torvalds, wei.liu

From: David Hildenbrand <david@redhat.com>
Subject: mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE

Let's make "MEMHP_MERGE_RESOURCE" consistent with "MHP_NONE", "mhp_t" and
"mhp_flags".  As discussed recently [1], "mhp" is our internal acronym for
memory hotplug now.

[1] https://lore.kernel.org/linux-mm/c37de2d0-28a1-4f7d-f944-cfd7d81c334d@redhat.com/

Link: https://lkml.kernel.org/r/20210126115829.10909-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/hv/hv_balloon.c        |    2 +-
 drivers/virtio/virtio_mem.c    |    2 +-
 drivers/xen/balloon.c          |    2 +-
 include/linux/memory_hotplug.h |    2 +-
 mm/memory_hotplug.c            |    2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

--- a/drivers/hv/hv_balloon.c~mm-memory_hotplug-memhp_merge_resource-mhp_merge_resource
+++ a/drivers/hv/hv_balloon.c
@@ -726,7 +726,7 @@ static void hv_mem_hot_add(unsigned long
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
-				(HA_CHUNK << PAGE_SHIFT), MEMHP_MERGE_RESOURCE);
+				(HA_CHUNK << PAGE_SHIFT), MHP_MERGE_RESOURCE);
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n", ret);
--- a/drivers/virtio/virtio_mem.c~mm-memory_hotplug-memhp_merge_resource-mhp_merge_resource
+++ a/drivers/virtio/virtio_mem.c
@@ -623,7 +623,7 @@ static int virtio_mem_add_memory(struct
 	/* Memory might get onlined immediately. */
 	atomic64_add(size, &vm->offline_size);
 	rc = add_memory_driver_managed(vm->nid, addr, size, vm->resource_name,
-				       MEMHP_MERGE_RESOURCE);
+				       MHP_MERGE_RESOURCE);
 	if (rc) {
 		atomic64_sub(size, &vm->offline_size);
 		dev_warn(&vm->vdev->dev, "adding memory failed: %d\n", rc);
--- a/drivers/xen/balloon.c~mm-memory_hotplug-memhp_merge_resource-mhp_merge_resource
+++ a/drivers/xen/balloon.c
@@ -331,7 +331,7 @@ static enum bp_state reserve_additional_
 	mutex_unlock(&balloon_mutex);
 	/* add_memory_resource() requires the device_hotplug lock */
 	lock_device_hotplug();
-	rc = add_memory_resource(nid, resource, MEMHP_MERGE_RESOURCE);
+	rc = add_memory_resource(nid, resource, MHP_MERGE_RESOURCE);
 	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
--- a/include/linux/memory_hotplug.h~mm-memory_hotplug-memhp_merge_resource-mhp_merge_resource
+++ a/include/linux/memory_hotplug.h
@@ -53,7 +53,7 @@ typedef int __bitwise mhp_t;
  * with this flag set, the resource pointer must no longer be used as it
  * might be stale, or the resource might have changed.
  */
-#define MEMHP_MERGE_RESOURCE	((__force mhp_t)BIT(0))
+#define MHP_MERGE_RESOURCE	((__force mhp_t)BIT(0))
 
 /*
  * Extended parameters for memory hotplug:
--- a/mm/memory_hotplug.c~mm-memory_hotplug-memhp_merge_resource-mhp_merge_resource
+++ a/mm/memory_hotplug.c
@@ -1153,7 +1153,7 @@ int __ref add_memory_resource(int nid, s
 	 * In case we're allowed to merge the resource, flag it and trigger
 	 * merging now that adding succeeded.
 	 */
-	if (mhp_flags & MEMHP_MERGE_RESOURCE)
+	if (mhp_flags & MHP_MERGE_RESOURCE)
 		merge_system_ram_resource(res);
 
 	/* online pages if requested */
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 032/118] mm/memory_hotplug: use helper function zone_end_pfn() to get end_pfn
  2021-02-26  1:14 incoming Andrew Morton
                   ` (30 preceding siblings ...)
  2021-02-26  1:17 ` [patch 031/118] mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 033/118] drivers/base/memory: don't store phys_device in memory blocks Andrew Morton
                   ` (86 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, david, linmiaohe, linux-mm, mm-commits, torvalds

From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory_hotplug: use helper function zone_end_pfn() to get end_pfn

Commit 108bcc96ef70 ("mm: add & use zone_end_pfn() and zone_spans_pfn()")
introduced the helper zone_end_pfn() to calculate the zone end pfn.  But
update_pgdat_span() forgot to use it.

Use this helper and rename local variable zone_end_pfn to end_pfn to avoid
a naming conflict with the existing zone_end_pfn().

Link: https://lkml.kernel.org/r/20210127093211.37714-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/mm/memory_hotplug.c~mm-memory_hotplug-use-helper-function-zone_end_pfn-to-get-end_pfn
+++ a/mm/memory_hotplug.c
@@ -493,20 +493,19 @@ static void update_pgdat_span(struct pgl
 
 	for (zone = pgdat->node_zones;
 	     zone < pgdat->node_zones + MAX_NR_ZONES; zone++) {
-		unsigned long zone_end_pfn = zone->zone_start_pfn +
-					     zone->spanned_pages;
+		unsigned long end_pfn = zone_end_pfn(zone);
 
 		/* No need to lock the zones, they can't change. */
 		if (!zone->spanned_pages)
 			continue;
 		if (!node_end_pfn) {
 			node_start_pfn = zone->zone_start_pfn;
-			node_end_pfn = zone_end_pfn;
+			node_end_pfn = end_pfn;
 			continue;
 		}
 
-		if (zone_end_pfn > node_end_pfn)
-			node_end_pfn = zone_end_pfn;
+		if (end_pfn > node_end_pfn)
+			node_end_pfn = end_pfn;
 		if (zone->zone_start_pfn < node_start_pfn)
 			node_start_pfn = zone->zone_start_pfn;
 	}
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 033/118] drivers/base/memory: don't store phys_device in memory blocks
  2021-02-26  1:14 incoming Andrew Morton
                   ` (31 preceding siblings ...)
  2021-02-26  1:17 ` [patch 032/118] mm/memory_hotplug: use helper function zone_end_pfn() to get end_pfn Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 034/118] Documentation: sysfs/memory: clarify some memory block device properties Andrew Morton
                   ` (85 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, corbet, dave.hansen, david, geert+renesas, gerald.schaefer,
	gregkh, idryomov, linux-mm, mchehab+huawei, mhocko, mm-commits,
	osalvador, rafael, torvalds, trix, vaibhav

From: David Hildenbrand <david@redhat.com>
Subject: drivers/base/memory: don't store phys_device in memory blocks

No need to store the value for each and every memory block, as we can
easily query the value at runtime.  Reshuffle the members to optimize the
memory layout.  Also, let's clarify what the interface was once used for
and why it is legacy nowadays.

"phys_device" was used on s390x in older versions of lsmem[2]/chmem[3],
back when they were still part of s390x-tools.  They were later replaced
by the variants in linux-utils.  For example, RHEL6 and RHEL7 contain
lsmem/chmem from s390-utils.  RHEL8 switched to versions from util-linux
on s390x [4].

"phys_device" was added with sysfs support for memory hotplug in commit
3947be1969a9 ("[PATCH] memory hotplug: sysfs and add/remove functions") in
2005.  It always returned 0.

s390x started returning something != 0 on some setups (if sclp.rzm is set
by HW) in 2010 via commit 57b552ba0b2f ("memory hotplug/s390: set
phys_device").

For s390x, it allowed for identifying which memory block devices belong to
the same storage increment (RZM).  Only if all memory block devices
comprising a single storage increment were offline could the memory
actually be removed in the hypervisor.

Since commit e5d709bb5fb7 ("s390/memory hotplug: provide
memory_block_size_bytes() function") in 2013 a memory block device spans
at least one storage increment - which is why the interface isn't really
helpful/used anymore (except by old lsmem/chmem tools).

There were once RFC patches to make use of "phys_device" in ACPI context;
however, the underlying problem could be solved using different interfaces
[1].

[1] https://patchwork.kernel.org/patch/2163871/
[2] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/lsmem
[3] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/chmem
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1504134

Link: https://lkml.kernel.org/r/20210201181347.13262-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/ABI/testing/sysfs-devices-memory  |    5 +-
 Documentation/admin-guide/mm/memory-hotplug.rst |    4 +-
 drivers/base/memory.c                           |   25 +++++---------
 include/linux/memory.h                          |    3 -
 4 files changed, 15 insertions(+), 22 deletions(-)

--- a/Documentation/ABI/testing/sysfs-devices-memory~drivers-base-memory-dont-store-phys_device-in-memory-blocks
+++ a/Documentation/ABI/testing/sysfs-devices-memory
@@ -26,8 +26,9 @@ Date:		September 2008
 Contact:	Badari Pulavarty <pbadari@us.ibm.com>
 Description:
 		The file /sys/devices/system/memory/memoryX/phys_device
-		is read-only and is designed to show the name of physical
-		memory device.  Implementation is currently incomplete.
+		is read-only;  it is a legacy interface only ever used on s390x
+		to expose the covered storage increment.
+Users:		Legacy s390-tools lsmem/chmem
 
 What:		/sys/devices/system/memory/memoryX/phys_index
 Date:		September 2008
--- a/Documentation/admin-guide/mm/memory-hotplug.rst~drivers-base-memory-dont-store-phys_device-in-memory-blocks
+++ a/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -160,8 +160,8 @@ Under each memory block, you can see 5 f
 
                     "online_movable", "online", "offline" command
                     which will be performed on all sections in the block.
-``phys_device``     read-only: designed to show the name of physical memory
-                    device.  This is not well implemented now.
+``phys_device``	    read-only: legacy interface only ever used on s390x to
+		    expose the covered storage increment.
 ``removable``       read-only: contains an integer value indicating
                     whether the memory block is removable or not
                     removable.  A value of 1 indicates that the memory
--- a/drivers/base/memory.c~drivers-base-memory-dont-store-phys_device-in-memory-blocks
+++ a/drivers/base/memory.c
@@ -290,20 +290,20 @@ static ssize_t state_store(struct device
 }
 
 /*
- * phys_device is a bad name for this.  What I really want
- * is a way to differentiate between memory ranges that
- * are part of physical devices that constitute
- * a complete removable unit or fru.
- * i.e. do these ranges belong to the same physical device,
- * s.t. if I offline all of these sections I can then
- * remove the physical device?
+ * Legacy interface that we cannot remove: s390x exposes the storage increment
+ * covered by a memory block, allowing for identifying which memory blocks
+ * comprise a storage increment. Since a memory block spans complete
+ * storage increments nowadays, this interface is basically unused. Other
+ * archs never exposed != 0.
  */
 static ssize_t phys_device_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
 	struct memory_block *mem = to_memory_block(dev);
+	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 
-	return sysfs_emit(buf, "%d\n", mem->phys_device);
+	return sysfs_emit(buf, "%d\n",
+			  arch_get_memory_phys_device(start_pfn));
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
@@ -488,11 +488,7 @@ static DEVICE_ATTR_WO(soft_offline_page)
 static DEVICE_ATTR_WO(hard_offline_page);
 #endif
 
-/*
- * Note that phys_device is optional.  It is here to allow for
- * differentiation between which *physical* devices each
- * section belongs to...
- */
+/* See phys_device_show(). */
 int __weak arch_get_memory_phys_device(unsigned long start_pfn)
 {
 	return 0;
@@ -574,7 +570,6 @@ int register_memory(struct memory_block
 static int init_memory_block(unsigned long block_id, unsigned long state)
 {
 	struct memory_block *mem;
-	unsigned long start_pfn;
 	int ret = 0;
 
 	mem = find_memory_block_by_id(block_id);
@@ -588,8 +583,6 @@ static int init_memory_block(unsigned lo
 
 	mem->start_section_nr = block_id * sections_per_block;
 	mem->state = state;
-	start_pfn = section_nr_to_pfn(mem->start_section_nr);
-	mem->phys_device = arch_get_memory_phys_device(start_pfn);
 	mem->nid = NUMA_NO_NODE;
 
 	ret = register_memory(mem);
--- a/include/linux/memory.h~drivers-base-memory-dont-store-phys_device-in-memory-blocks
+++ a/include/linux/memory.h
@@ -27,9 +27,8 @@ struct memory_block {
 	unsigned long start_section_nr;
 	unsigned long state;		/* serialized by the dev->lock */
 	int online_type;		/* for passing data to online routine */
-	int phys_device;		/* to which fru does this belong? */
-	struct device dev;
 	int nid;			/* NID for this memory block */
+	struct device dev;
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 034/118] Documentation: sysfs/memory: clarify some memory block device properties
  2021-02-26  1:14 incoming Andrew Morton
                   ` (32 preceding siblings ...)
  2021-02-26  1:17 ` [patch 033/118] drivers/base/memory: don't store phys_device in memory blocks Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 035/118] mm/memory_hotplug: prevalidate the address range being added with platform Andrew Morton
                   ` (84 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, corbet, dave.hansen, david, geert+renesas, gregkh,
	idryomov, Jonathan.Cameron, linux-mm, mchehab+huawei, mhocko,
	mm-commits, osalvador, torvalds

From: David Hildenbrand <david@redhat.com>
Subject: Documentation: sysfs/memory: clarify some memory block device properties

In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory blocks
as removable") we changed the output of the "removable" property of memory
devices to return "1" if and only if the kernel supports memory offlining.

Let's update the documentation, stating that the interface is legacy.  Also
update the documentation of the "state" and "valid_zones" properties.

Link: https://lkml.kernel.org/r/20210201181347.13262-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/ABI/testing/sysfs-devices-memory  |   53 ++++++++------
 Documentation/admin-guide/mm/memory-hotplug.rst |   16 ++--
 2 files changed, 41 insertions(+), 28 deletions(-)

--- a/Documentation/ABI/testing/sysfs-devices-memory~documentation-sysfs-memory-clarify-some-memory-block-device-properties
+++ a/Documentation/ABI/testing/sysfs-devices-memory
@@ -13,13 +13,13 @@ What:		/sys/devices/system/memory/memory
 Date:		June 2008
 Contact:	Badari Pulavarty <pbadari@us.ibm.com>
 Description:
-		The file /sys/devices/system/memory/memoryX/removable
-		indicates whether this memory block is removable or not.
-		This is useful for a user-level agent to determine
-		identify removable sections of the memory before attempting
-		potentially expensive hot-remove memory operation
+		The file /sys/devices/system/memory/memoryX/removable is a
+		legacy interface used to indicate whether a memory block is
+		likely to be offlineable or not.  Newer kernel versions return
+		"1" if and only if the kernel supports memory offlining.
 Users:		hotplug memory remove tools
 		http://www.ibm.com/developerworks/wikis/display/LinuxP/powerpc-utils
+		lsmem/chmem part of util-linux
 
 What:		/sys/devices/system/memory/memoryX/phys_device
 Date:		September 2008
@@ -44,23 +44,25 @@ Date:		September 2008
 Contact:	Badari Pulavarty <pbadari@us.ibm.com>
 Description:
 		The file /sys/devices/system/memory/memoryX/state
-		is read-write.  When read, its contents show the
-		online/offline state of the memory section.  When written,
-		root can toggle the the online/offline state of a removable
-		memory section (see removable file description above)
-		using the following commands::
+		is read-write.  When read, it returns the online/offline
+		state of the memory block.  When written, root can toggle
+		the online/offline state of a memory block using the following
+		commands::
 
 		  # echo online > /sys/devices/system/memory/memoryX/state
 		  # echo offline > /sys/devices/system/memory/memoryX/state
 
-		For example, if /sys/devices/system/memory/memory22/removable
-		contains a value of 1 and
-		/sys/devices/system/memory/memory22/state contains the
-		string "online" the following command can be executed by
-		by root to offline that section::
-
-		  # echo offline > /sys/devices/system/memory/memory22/state
-
+		On newer kernel versions, advanced states can be specified
+		when onlining to select a target zone: "online_movable"
+		selects the movable zone.  "online_kernel" selects the
+		applicable kernel zone (DMA, DMA32, or Normal).  However,
+		after successfully setting one of the advanced states,
+		reading the file will return "online"; the zone information
+		can be obtained via "valid_zones" instead.
+
+		While onlining is unlikely to fail, there are no guarantees
+		that offlining will succeed.  Offlining is more likely to
+		succeed if "valid_zones" indicates "Movable".
 Users:		hotplug memory remove tools
 		http://www.ibm.com/developerworks/wikis/display/LinuxP/powerpc-utils
 
@@ -70,8 +72,19 @@ Date:           July 2014
 Contact:	Zhang Zhen <zhenzhang.zhang@huawei.com>
 Description:
 		The file /sys/devices/system/memory/memoryX/valid_zones	is
-		read-only and is designed to show which zone this memory
-		block can be onlined to.
+		read-only.
+
+		For online memory blocks, it returns in which zone memory
+		provided by a memory block is managed.  If multiple zones
+		apply (not applicable for hotplugged memory), "None" is returned
+		and the memory block cannot be offlined.
+
+		For offline memory blocks, it returns by which zone memory
+		provided by a memory block can be managed when onlining.
+		The first returned zone ("default") will be used when setting
+		the state of an offline memory block to "online".  Only one of
+		the kernel zones (DMA, DMA32, Normal) is applicable for a single
+		memory block.
 
 What:		/sys/devices/system/memoryX/nodeY
 Date:		October 2009
--- a/Documentation/admin-guide/mm/memory-hotplug.rst~documentation-sysfs-memory-clarify-some-memory-block-device-properties
+++ a/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -162,14 +162,14 @@ Under each memory block, you can see 5 f
                     which will be performed on all sections in the block.
 ``phys_device``	    read-only: legacy interface only ever used on s390x to
 		    expose the covered storage increment.
-``removable``       read-only: contains an integer value indicating
-                    whether the memory block is removable or not
-                    removable.  A value of 1 indicates that the memory
-                    block is removable and a value of 0 indicates that
-                    it is not removable. A memory block is removable only if
-                    every section in the block is removable.
-``valid_zones``     read-only: designed to show which zones this memory block
-		    can be onlined to.
+``removable``	    read-only: legacy interface that indicated whether a memory
+		    block was likely to be offlineable or not.  Newer kernel
+		    versions return "1" if and only if the kernel supports
+		    memory offlining.
+``valid_zones``     read-only: designed to show by which zone memory provided by
+		    a memory block is managed, and to show by which zone memory
+		    provided by an offline memory block could be managed when
+		    onlining.
 
 		    The first column shows it`s default zone.
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 035/118] mm/memory_hotplug: prevalidate the address range being added with platform
  2021-02-26  1:14 incoming Andrew Morton
                   ` (33 preceding siblings ...)
  2021-02-26  1:17 ` [patch 034/118] Documentation: sysfs/memory: clarify some memory block device properties Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 036/118] arm64/mm: define arch_get_mappable_range() Andrew Morton
                   ` (83 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, anshuman.khandual, ardb, catalin.marinas, david, gor, hca,
	jasowang, Jonathan.Cameron, linux-mm, mark.rutland, mhocko,
	mm-commits, mst, osalvador, pankaj.gupta.linux, pankaj.gupta,
	richard.weiyang, teawaterz, torvalds, will

From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm/memory_hotplug: prevalidate the address range being added with platform

Patch series "mm/memory_hotplug: Pre-validate the address range with platform", v5.

This series adds a mechanism allowing platforms to weigh in and
prevalidate an incoming address range before proceeding further with the
memory hotplug.  This helps prevent potential platform errors for the
given address range, down the hotplug call chain, which would inevitably
fail the hotplug itself.

This mechanism was suggested by David Hildenbrand during another
discussion with respect to a memory hotplug fix on the arm64 platform.

https://lore.kernel.org/linux-arm-kernel/1600332402-30123-1-git-send-email-anshuman.khandual@arm.com/

This mechanism focuses on the addressability aspect and not the
[sub]section alignment aspect.  Hence check_hotplug_memory_range() and
check_pfn_span() have been left unchanged.


This patch (of 4):


This introduces mhp_range_allowed(), which can be called in various memory
hotplug paths to prevalidate, with the platform, the address range that is
being added.  mhp_range_allowed() calls mhp_get_pluggable_range(), which
provides the applicable address range depending on whether a linear
mapping is required or not.  For ranges that require a linear mapping, it
calls a new arch callback, arch_get_mappable_range(), which the platform
can override.  The new callback thus gives the platform an opportunity to
configure acceptable memory hotplug address ranges in case there are
constraints.

This mechanism will help prevent platform-specific errors deep down during
hotplug calls.  It drops the now-redundant
check_hotplug_memory_addressable() check in __add_pages() and instead adds
a VM_BUG_ON() check to ensure that the range has been validated with
mhp_range_allowed() earlier in the call chain.  Besides that,
mhp_get_pluggable_range() can also be used by potential memory hotplug
callers to obtain the allowed physical range that would go through on a
given platform.

This does not really add any new range checks to generic memory hotplug;
instead, it compensates for the lost checks in arch_add_memory() (where
applicable) and check_hotplug_memory_addressable() with the unified
mhp_range_allowed().
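
As a rough illustration (a hypothetical caller, not part of this series),
a hotplug user that needs a linear mapping could consult the new helper
before calling add_memory():

	#include <linux/range.h>
	#include <linux/memory_hotplug.h>

	/* Hypothetical caller, for illustration only. */
	static int hypothetical_add_region(int nid, u64 start, u64 size)
	{
		/* Physical range that can be hotplugged with a linear mapping. */
		const struct range pluggable = mhp_get_pluggable_range(true);

		if (start < pluggable.start || start + size - 1 > pluggable.end)
			return -ERANGE;	/* not addressable/mappable on this platform */

		return add_memory(nid, start, size, MHP_NONE);
	}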

[akpm@linux-foundation.org: make pagemap_range() return -EINVAL when mhp_range_allowed() fails]
Link: https://lkml.kernel.org/r/1612149902-7867-1-git-send-email-anshuman.khandual@arm.com
Link: https://lkml.kernel.org/r/1612149902-7867-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com> # s390
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memory_hotplug.h |   10 +++
 mm/memory_hotplug.c            |   78 +++++++++++++++++++++++--------
 mm/memremap.c                  |    8 ++-
 3 files changed, 76 insertions(+), 20 deletions(-)

--- a/include/linux/memory_hotplug.h~mm-memory_hotplug-prevalidate-the-address-range-being-added-with-platform
+++ a/include/linux/memory_hotplug.h
@@ -66,6 +66,9 @@ struct mhp_params {
 	pgprot_t pgprot;
 };
 
+bool mhp_range_allowed(u64 start, u64 size, bool need_mapping);
+struct range mhp_get_pluggable_range(bool need_mapping);
+
 /*
  * Zone resizing functions
  *
@@ -266,6 +269,13 @@ static inline bool movable_node_is_enabl
 }
 #endif /* ! CONFIG_MEMORY_HOTPLUG */
 
+/*
+ * Keep this declaration outside CONFIG_MEMORY_HOTPLUG as some
+ * platforms might override and use arch_get_mappable_range()
+ * for internal non memory hotplug purposes.
+ */
+struct range arch_get_mappable_range(void);
+
 #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
 /*
  * pgdat resizing functions
--- a/mm/memory_hotplug.c~mm-memory_hotplug-prevalidate-the-address-range-being-added-with-platform
+++ a/mm/memory_hotplug.c
@@ -107,6 +107,9 @@ static struct resource *register_memory_
 	if (strcmp(resource_name, "System RAM"))
 		flags |= IORESOURCE_SYSRAM_DRIVER_MANAGED;
 
+	if (!mhp_range_allowed(start, size, true))
+		return ERR_PTR(-E2BIG);
+
 	/*
 	 * Make sure value parsed from 'mem=' only restricts memory adding
 	 * while booting, so that memory hotplug won't be impacted. Please
@@ -284,22 +287,6 @@ static int check_pfn_span(unsigned long
 	return 0;
 }
 
-static int check_hotplug_memory_addressable(unsigned long pfn,
-					    unsigned long nr_pages)
-{
-	const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
-
-	if (max_addr >> MAX_PHYSMEM_BITS) {
-		const u64 max_allowed = (1ull << (MAX_PHYSMEM_BITS + 1)) - 1;
-		WARN(1,
-		     "Hotplugged memory exceeds maximum addressable address, range=%#llx-%#llx, maximum=%#llx\n",
-		     (u64)PFN_PHYS(pfn), max_addr, max_allowed);
-		return -E2BIG;
-	}
-
-	return 0;
-}
-
 /*
  * Return page for the valid pfn only if the page is online. All pfn
  * walkers which rely on the fully initialized page->flags and others
@@ -365,9 +352,7 @@ int __ref __add_pages(int nid, unsigned
 	if (WARN_ON_ONCE(!params->pgprot.pgprot))
 		return -EINVAL;
 
-	err = check_hotplug_memory_addressable(pfn, nr_pages);
-	if (err)
-		return err;
+	VM_BUG_ON(!mhp_range_allowed(PFN_PHYS(pfn), nr_pages * PAGE_SIZE, false));
 
 	if (altmap) {
 		/*
@@ -1248,6 +1233,61 @@ out_unlock:
 }
 EXPORT_SYMBOL_GPL(add_memory_driver_managed);
 
+/*
+ * Platforms should define arch_get_mappable_range() that provides
+ * maximum possible addressable physical memory range for which the
+ * linear mapping could be created. The platform returned address
+ * range must adhere to these following semantics.
+ *
+ * - range.start <= range.end
+ * - Range includes both end points [range.start..range.end]
+ *
+ * There is also a fallback definition provided here, allowing the
+ * entire possible physical address range in case any platform does
+ * not define arch_get_mappable_range().
+ */
+struct range __weak arch_get_mappable_range(void)
+{
+	struct range mhp_range = {
+		.start = 0UL,
+		.end = -1ULL,
+	};
+	return mhp_range;
+}
+
+struct range mhp_get_pluggable_range(bool need_mapping)
+{
+	const u64 max_phys = (1ULL << MAX_PHYSMEM_BITS) - 1;
+	struct range mhp_range;
+
+	if (need_mapping) {
+		mhp_range = arch_get_mappable_range();
+		if (mhp_range.start > max_phys) {
+			mhp_range.start = 0;
+			mhp_range.end = 0;
+		}
+		mhp_range.end = min_t(u64, mhp_range.end, max_phys);
+	} else {
+		mhp_range.start = 0;
+		mhp_range.end = max_phys;
+	}
+	return mhp_range;
+}
+EXPORT_SYMBOL_GPL(mhp_get_pluggable_range);
+
+bool mhp_range_allowed(u64 start, u64 size, bool need_mapping)
+{
+	struct range mhp_range = mhp_get_pluggable_range(need_mapping);
+	u64 end = start + size;
+
+	if (start < end && start >= mhp_range.start && (end - 1) <= mhp_range.end)
+		return true;
+
+	pr_warn("Hotplug memory [%#llx-%#llx] exceeds maximum addressable range [%#llx-%#llx]\n",
+		start, end, mhp_range.start, mhp_range.end);
+	return false;
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * Confirm all pages in a range [start, end) belong to the same zone (skipping
--- a/mm/memremap.c~mm-memory_hotplug-prevalidate-the-address-range-being-added-with-platform
+++ a/mm/memremap.c
@@ -200,6 +200,7 @@ static void dev_pagemap_percpu_release(s
 static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
 		int range_id, int nid)
 {
+	const bool is_private = pgmap->type == MEMORY_DEVICE_PRIVATE;
 	struct range *range = &pgmap->ranges[range_id];
 	struct dev_pagemap *conflict_pgmap;
 	int error, is_ram;
@@ -245,6 +246,11 @@ static int pagemap_range(struct dev_page
 	if (error)
 		goto err_pfn_remap;
 
+	if (!mhp_range_allowed(range->start, range_len(range), !is_private)) {
+		error = -EINVAL;
+		goto err_pfn_remap;
+	}
+
 	mem_hotplug_begin();
 
 	/*
@@ -258,7 +264,7 @@ static int pagemap_range(struct dev_page
 	 * the CPU, we do want the linear mapping and thus use
 	 * arch_add_memory().
 	 */
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+	if (is_private) {
 		error = add_pages(nid, PHYS_PFN(range->start),
 				PHYS_PFN(range_len(range)), params);
 	} else {
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 036/118] arm64/mm: define arch_get_mappable_range()
  2021-02-26  1:14 incoming Andrew Morton
                   ` (34 preceding siblings ...)
  2021-02-26  1:17 ` [patch 035/118] mm/memory_hotplug: prevalidate the address range being added with platform Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 037/118] s390/mm: " Andrew Morton
                   ` (82 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, anshuman.khandual, ardb, catalin.marinas, david, gor, hca,
	jasowang, Jonathan.Cameron, linux-mm, mark.rutland, mhocko,
	mm-commits, mst, osalvador, pankaj.gupta.linux, pankaj.gupta,
	richard.weiyang, teawaterz, torvalds, will

From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: arm64/mm: define arch_get_mappable_range()

This overrides arch_get_mappable_range() on the arm64 platform, which will
be used with the recently added generic framework.  It drops
inside_linear_region() and the subsequent check in arch_add_memory(), which
are no longer required.  It also adds a VM_BUG_ON() check to ensure that
mhp_range_allowed() has already been called.

Link: https://lkml.kernel.org/r/1612149902-7867-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm64/mm/mmu.c |   15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

--- a/arch/arm64/mm/mmu.c~arm64-mm-define-arch_get_mappable_range
+++ a/arch/arm64/mm/mmu.c
@@ -1444,16 +1444,19 @@ static void __remove_pgd_mapping(pgd_t *
 	free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
 }
 
-static bool inside_linear_region(u64 start, u64 size)
+struct range arch_get_mappable_range(void)
 {
+	struct range mhp_range;
+
 	/*
 	 * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
 	 * accommodating both its ends but excluding PAGE_END. Max physical
 	 * range which can be mapped inside this linear mapping range, must
 	 * also be derived from its end points.
 	 */
-	return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
-	       (start + size - 1) <= __pa(PAGE_END - 1);
+	mhp_range.start = __pa(_PAGE_OFFSET(vabits_actual));
+	mhp_range.end =  __pa(PAGE_END - 1);
+	return mhp_range;
 }
 
 int arch_add_memory(int nid, u64 start, u64 size,
@@ -1461,11 +1464,7 @@ int arch_add_memory(int nid, u64 start,
 {
 	int ret, flags = 0;
 
-	if (!inside_linear_region(start, size)) {
-		pr_err("[%llx %llx] is outside linear mapping region\n", start, start + size);
-		return -EINVAL;
-	}
-
+	VM_BUG_ON(!mhp_range_allowed(start, size, true));
 	if (rodata_full || debug_pagealloc_enabled())
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 037/118] s390/mm: define arch_get_mappable_range()
  2021-02-26  1:14 incoming Andrew Morton
                   ` (35 preceding siblings ...)
  2021-02-26  1:17 ` [patch 036/118] arm64/mm: define arch_get_mappable_range() Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 038/118] virtio-mem: check against mhp_get_pluggable_range() which memory we can hotplug Andrew Morton
                   ` (81 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, anshuman.khandual, ardb, catalin.marinas, david, gor, hca,
	jasowang, Jonathan.Cameron, linux-mm, mark.rutland, mhocko,
	mm-commits, mst, osalvador, pankaj.gupta.linux, pankaj.gupta,
	richard.weiyang, teawaterz, torvalds, will

From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: s390/mm: define arch_get_mappable_range()

This overrides arch_get_mappable_range() on the s390 platform, which will
be used with the recently added generic framework.  It modifies the
existing range check in vmem_add_mapping() to use
arch_get_mappable_range().  It also adds a VM_BUG_ON() check to ensure
that mhp_range_allowed() has already been called on the hotplug path.

Link: https://lkml.kernel.org/r/1612149902-7867-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/s390/mm/init.c |    1 +
 arch/s390/mm/vmem.c |   14 +++++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

--- a/arch/s390/mm/init.c~s390-mm-define-arch_get_mappable_range
+++ a/arch/s390/mm/init.c
@@ -297,6 +297,7 @@ int arch_add_memory(int nid, u64 start,
 	if (WARN_ON_ONCE(params->pgprot.pgprot != PAGE_KERNEL.pgprot))
 		return -EINVAL;
 
+	VM_BUG_ON(!mhp_range_allowed(start, size, true));
 	rc = vmem_add_mapping(start, size);
 	if (rc)
 		return rc;
--- a/arch/s390/mm/vmem.c~s390-mm-define-arch_get_mappable_range
+++ a/arch/s390/mm/vmem.c
@@ -4,6 +4,7 @@
  *    Author(s): Heiko Carstens <heiko.carstens@de.ibm.com>
  */
 
+#include <linux/memory_hotplug.h>
 #include <linux/memblock.h>
 #include <linux/pfn.h>
 #include <linux/mm.h>
@@ -532,11 +533,22 @@ void vmem_remove_mapping(unsigned long s
 	mutex_unlock(&vmem_mutex);
 }
 
+struct range arch_get_mappable_range(void)
+{
+	struct range mhp_range;
+
+	mhp_range.start = 0;
+	mhp_range.end =  VMEM_MAX_PHYS - 1;
+	return mhp_range;
+}
+
 int vmem_add_mapping(unsigned long start, unsigned long size)
 {
+	struct range range = arch_get_mappable_range();
 	int ret;
 
-	if (start + size > VMEM_MAX_PHYS ||
+	if (start < range.start ||
+	    start + size > range.end + 1 ||
 	    start + size < start)
 		return -ERANGE;
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 038/118] virtio-mem: check against mhp_get_pluggable_range() which memory we can hotplug
  2021-02-26  1:14 incoming Andrew Morton
                   ` (36 preceding siblings ...)
  2021-02-26  1:17 ` [patch 037/118] s390/mm: " Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 039/118] mm/mlock: stop counting mlocked pages when none vma is found Andrew Morton
                   ` (80 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, anshuman.khandual, ardb, catalin.marinas, david, gor, hca,
	jasowang, Jonathan.Cameron, linux-mm, mark.rutland, mhocko,
	mm-commits, mst, osalvador, pankaj.gupta.linux, pankaj.gupta,
	richard.weiyang, teawaterz, torvalds, will

From: David Hildenbrand <david@redhat.com>
Subject: virtio-mem: check against mhp_get_pluggable_range() which memory we can hotplug

Right now, we only check against MAX_PHYSMEM_BITS - but it turns out there
are more restrictions on which memory we can actually hotplug, especially
on arm64 or s390x once we support them: we might receive something like
-E2BIG or -ERANGE from add_memory_driver_managed(), stopping device
operation.

So, check which memory we can add right when initializing the device, and
warn the user.  Try adding only actually pluggable ranges: in the worst
case, no memory provided by our device is pluggable.

In the usual case, we expect all device memory to be pluggable, and in
corner cases only some memory at the end of the device-managed memory
region to not be pluggable.
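
A condensed sketch of the clamping this patch performs (simplified, not
the verbatim driver code; the function name is illustrative only):

	#include <linux/range.h>
	#include <linux/minmax.h>
	#include <linux/memory_hotplug.h>

	/*
	 * Which part of the device-managed region [addr, addr + size) can
	 * actually be hotplugged on this platform?
	 */
	static bool sketch_usable_part(u64 addr, u64 size, u64 *first, u64 *last)
	{
		const struct range pluggable = mhp_get_pluggable_range(true);

		*first = max(addr, pluggable.start);
		*last = min(addr + size - 1, pluggable.end);

		/* Nothing is pluggable if the device lies entirely outside. */
		return *first <= *last;
	}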

Link: https://lkml.kernel.org/r/1612149902-7867-5-git-send-email-anshuman.khandual@arm.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/virtio/virtio_mem.c |   41 ++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 14 deletions(-)

--- a/drivers/virtio/virtio_mem.c~virtio-mem-check-against-mhp_get_pluggable_range-which-memory-we-can-hotplug
+++ a/drivers/virtio/virtio_mem.c
@@ -2222,7 +2222,7 @@ static int virtio_mem_unplug_pending_mb(
  */
 static void virtio_mem_refresh_config(struct virtio_mem *vm)
 {
-	const uint64_t phys_limit = 1UL << MAX_PHYSMEM_BITS;
+	const struct range pluggable_range = mhp_get_pluggable_range(true);
 	uint64_t new_plugged_size, usable_region_size, end_addr;
 
 	/* the plugged_size is just a reflection of what _we_ did previously */
@@ -2234,15 +2234,25 @@ static void virtio_mem_refresh_config(st
 	/* calculate the last usable memory block id */
 	virtio_cread_le(vm->vdev, struct virtio_mem_config,
 			usable_region_size, &usable_region_size);
-	end_addr = vm->addr + usable_region_size;
-	end_addr = min(end_addr, phys_limit);
+	end_addr = min(vm->addr + usable_region_size - 1,
+		       pluggable_range.end);
 
-	if (vm->in_sbm)
-		vm->sbm.last_usable_mb_id =
-					 virtio_mem_phys_to_mb_id(end_addr) - 1;
-	else
-		vm->bbm.last_usable_bb_id =
-				     virtio_mem_phys_to_bb_id(vm, end_addr) - 1;
+	if (vm->in_sbm) {
+		vm->sbm.last_usable_mb_id = virtio_mem_phys_to_mb_id(end_addr);
+		if (!IS_ALIGNED(end_addr + 1, memory_block_size_bytes()))
+			vm->sbm.last_usable_mb_id--;
+	} else {
+		vm->bbm.last_usable_bb_id = virtio_mem_phys_to_bb_id(vm,
+								     end_addr);
+		if (!IS_ALIGNED(end_addr + 1, vm->bbm.bb_size))
+			vm->bbm.last_usable_bb_id--;
+	}
+	/*
+	 * If we cannot plug any of our device memory (e.g., nothing in the
+	 * usable region is addressable), the last usable memory block id will
+	 * be smaller than the first usable memory block id. We'll stop
+	 * attempting to add memory with -ENOSPC from our main loop.
+	 */
 
 	/* see if there is a request to change the size */
 	virtio_cread_le(vm->vdev, struct virtio_mem_config, requested_size,
@@ -2364,7 +2374,7 @@ static int virtio_mem_init_vq(struct vir
 
 static int virtio_mem_init(struct virtio_mem *vm)
 {
-	const uint64_t phys_limit = 1UL << MAX_PHYSMEM_BITS;
+	const struct range pluggable_range = mhp_get_pluggable_range(true);
 	uint64_t sb_size, addr;
 	uint16_t node_id;
 
@@ -2405,9 +2415,10 @@ static int virtio_mem_init(struct virtio
 	if (!IS_ALIGNED(vm->addr + vm->region_size, memory_block_size_bytes()))
 		dev_warn(&vm->vdev->dev,
 			 "The alignment of the physical end address can make some memory unusable.\n");
-	if (vm->addr + vm->region_size > phys_limit)
+	if (vm->addr < pluggable_range.start ||
+	    vm->addr + vm->region_size - 1 > pluggable_range.end)
 		dev_warn(&vm->vdev->dev,
-			 "Some memory is not addressable. This can make some memory unusable.\n");
+			 "Some device memory is not addressable/pluggable. This can make some memory unusable.\n");
 
 	/*
 	 * We want subblocks to span at least MAX_ORDER_NR_PAGES and
@@ -2429,7 +2440,8 @@ static int virtio_mem_init(struct virtio
 				     vm->sbm.sb_size;
 
 		/* Round up to the next full memory block */
-		addr = vm->addr + memory_block_size_bytes() - 1;
+		addr = max_t(uint64_t, vm->addr, pluggable_range.start) +
+		       memory_block_size_bytes() - 1;
 		vm->sbm.first_mb_id = virtio_mem_phys_to_mb_id(addr);
 		vm->sbm.next_mb_id = vm->sbm.first_mb_id;
 	} else {
@@ -2450,7 +2462,8 @@ static int virtio_mem_init(struct virtio
 		}
 
 		/* Round up to the next aligned big block */
-		addr = vm->addr + vm->bbm.bb_size - 1;
+		addr = max_t(uint64_t, vm->addr, pluggable_range.start) +
+		       vm->bbm.bb_size - 1;
 		vm->bbm.first_bb_id = virtio_mem_phys_to_bb_id(vm, addr);
 		vm->bbm.next_bb_id = vm->bbm.first_bb_id;
 	}
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 039/118] mm/mlock: stop counting mlocked pages when none vma is found
  2021-02-26  1:14 incoming Andrew Morton
                   ` (37 preceding siblings ...)
  2021-02-26  1:17 ` [patch 038/118] virtio-mem: check against mhp_get_pluggable_range() which memory we can hotplug Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 040/118] mm/rmap: correct some obsolete comments of anon_vma Andrew Morton
                   ` (79 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, david, linmiaohe, linux-mm, mm-commits, torvalds

From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/mlock: stop counting mlocked pages when none vma is found

find_vma() returns the first vma with addr < vm_end, so when it returns
NULL there is no vma that satisfies addr < vm_end.  Thus it is meaningless
to traverse the vma list below because we cannot find any vma to count
mlocked pages for.  Stop counting mlocked pages in this case to save some
vma list traversal cycles.

Link: https://lkml.kernel.org/r/20210204110705.17586-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mlock.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/mlock.c~mm-mlock-stop-counting-mlocked-pages-when-none-vma-is-found
+++ a/mm/mlock.c
@@ -622,7 +622,7 @@ static unsigned long count_mm_mlocked_pa
 
 	vma = find_vma(mm, start);
 	if (vma == NULL)
-		vma = mm->mmap;
+		return 0;
 
 	for (; vma ; vma = vma->vm_next) {
 		if (start >= vma->vm_end)
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 040/118] mm/rmap: correct some obsolete comments of anon_vma
  2021-02-26  1:14 incoming Andrew Morton
                   ` (38 preceding siblings ...)
  2021-02-26  1:17 ` [patch 039/118] mm/mlock: stop counting mlocked pages when none vma is found Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 041/118] mm/rmap: remove unneeded semicolon in page_not_mapped() Andrew Morton
                   ` (78 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, linmiaohe, linux-mm, mm-commits, riel, torvalds

From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/rmap: correct some obsolete comments of anon_vma

Commit 2b575eb64f7a ("mm: convert anon_vma->lock to a mutex") changed the
spinlock used to serialize access to the vma list into a mutex.  Further,
commit 5a505085f043 ("mm/rmap: Convert the struct anon_vma::mutex to an
rwsem") converted that mutex to an rwsem to solve a scalability problem.
So replace "spinlock" with "rwsem" in the comments to keep them up to
date.

Link: https://lkml.kernel.org/r/20210123072459.25903-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/rmap.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/rmap.c~mm-rmap-correct-some-obsolete-comments-of-anon_vma
+++ a/mm/rmap.c
@@ -168,7 +168,7 @@ static void anon_vma_chain_link(struct v
  *
  * Anon-vma allocations are very subtle, because we may have
  * optimistically looked up an anon_vma in page_lock_anon_vma_read()
- * and that may actually touch the spinlock even in the newly
+ * and that may actually touch the rwsem even in the newly
  * allocated vma (it depends on RCU to make sure that the
  * anon_vma isn't actually destroyed).
  *
@@ -359,7 +359,7 @@ int anon_vma_fork(struct vm_area_struct
 		goto out_error_free_anon_vma;
 
 	/*
-	 * The root anon_vma's spinlock is the lock actually used when we
+	 * The root anon_vma's rwsem is the lock actually used when we
 	 * lock any of the anon_vmas in this anon_vma tree.
 	 */
 	anon_vma->root = pvma->anon_vma->root;
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 041/118] mm/rmap: remove unneeded semicolon in page_not_mapped()
  2021-02-26  1:14 incoming Andrew Morton
                   ` (39 preceding siblings ...)
  2021-02-26  1:17 ` [patch 040/118] mm/rmap: correct some obsolete comments of anon_vma Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:17 ` [patch 042/118] mm/rmap: fix obsolete comment in __page_check_anon_rmap() Andrew Morton
                   ` (77 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, linmiaohe, linux-mm, mm-commits, torvalds

From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/rmap: remove unneeded semicolon in page_not_mapped()

Remove extra semicolon without any functional change intended.

Link: https://lkml.kernel.org/r/20210127093425.39640-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/rmap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/rmap.c~mm-rmap-remove-unneeded-semicolon-in-page_not_mapped
+++ a/mm/rmap.c
@@ -1784,7 +1784,7 @@ bool try_to_unmap(struct page *page, enu
 static int page_not_mapped(struct page *page)
 {
 	return !page_mapped(page);
-};
+}
 
 /**
  * try_to_munlock - try to munlock a page
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 042/118] mm/rmap: fix obsolete comment in __page_check_anon_rmap()
  2021-02-26  1:14 incoming Andrew Morton
                   ` (40 preceding siblings ...)
  2021-02-26  1:17 ` [patch 041/118] mm/rmap: remove unneeded semicolon in page_not_mapped() Andrew Morton
@ 2021-02-26  1:17 ` Andrew Morton
  2021-02-26  1:18 ` [patch 043/118] mm/rmap: use page_not_mapped in try_to_unmap() Andrew Morton
                   ` (76 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:17 UTC (permalink / raw)
  To: akpm, linmiaohe, linux-mm, mm-commits, torvalds

From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/rmap: fix obsolete comment in __page_check_anon_rmap()

Commit 21333b2b66b8 ("ksm: no debug in page_dup_rmap()") has reverted
page_dup_rmap() to an inline atomic_inc of mapcount.  So page_dup_rmap()
does not call __page_check_anon_rmap() anymore.

Link: https://lkml.kernel.org/r/20210128110209.50857-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/rmap.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/mm/rmap.c~mm-rmap-fix-obsolete-comment-in-__page_check_anon_rmap
+++ a/mm/rmap.c
@@ -1086,8 +1086,7 @@ static void __page_check_anon_rmap(struc
 	 * be set up correctly at this point.
 	 *
 	 * We have exclusion against page_add_anon_rmap because the caller
-	 * always holds the page locked, except if called from page_dup_rmap,
-	 * in which case the page is already known to be setup.
+	 * always holds the page locked.
 	 *
 	 * We have exclusion against page_add_new_anon_rmap because those pages
 	 * are initially only visible via the pagetables, and the pte is locked
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 043/118] mm/rmap: use page_not_mapped in try_to_unmap()
  2021-02-26  1:14 incoming Andrew Morton
                   ` (41 preceding siblings ...)
  2021-02-26  1:17 ` [patch 042/118] mm/rmap: fix obsolete comment in __page_check_anon_rmap() Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 044/118] mm/rmap: correct obsolete comment of page_get_anon_vma() Andrew Morton
                   ` (75 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: akpm, linmiaohe, linux-mm, mm-commits, torvalds

From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/rmap: use page_not_mapped in try_to_unmap()

page_mapcount_is_zero() accurately calculates how many mappings a hugepage
has, only to check the result against 0.  This is a waste of CPU time.  We
can do this via page_not_mapped() to save some possible atomic_read
cycles.  Remove the function page_mapcount_is_zero() as it is not used
anymore, and move page_not_mapped() above try_to_unmap() to avoid an
"identifier undeclared" compilation error.

Link: https://lkml.kernel.org/r/20210130084904.35307-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/rmap.c |   11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

--- a/mm/rmap.c~mm-rmap-use-page_not_mapped-in-try_to_unmap
+++ a/mm/rmap.c
@@ -1736,9 +1736,9 @@ static bool invalid_migration_vma(struct
 	return vma_is_temporary_stack(vma);
 }
 
-static int page_mapcount_is_zero(struct page *page)
+static int page_not_mapped(struct page *page)
 {
-	return !total_mapcount(page);
+	return !page_mapped(page);
 }
 
 /**
@@ -1756,7 +1756,7 @@ bool try_to_unmap(struct page *page, enu
 	struct rmap_walk_control rwc = {
 		.rmap_one = try_to_unmap_one,
 		.arg = (void *)flags,
-		.done = page_mapcount_is_zero,
+		.done = page_not_mapped,
 		.anon_lock = page_lock_anon_vma_read,
 	};
 
@@ -1780,11 +1780,6 @@ bool try_to_unmap(struct page *page, enu
 	return !page_mapcount(page) ? true : false;
 }
 
-static int page_not_mapped(struct page *page)
-{
-	return !page_mapped(page);
-}
-
 /**
  * try_to_munlock - try to munlock a page
  * @page: the page to be munlocked
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 044/118] mm/rmap: correct obsolete comment of page_get_anon_vma()
  2021-02-26  1:14 incoming Andrew Morton
                   ` (42 preceding siblings ...)
  2021-02-26  1:18 ` [patch 043/118] mm/rmap: use page_not_mapped in try_to_unmap() Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 045/118] mm/rmap: fix potential pte_unmap on an not mapped pte Andrew Morton
                   ` (74 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: a.p.zijlstra, akpm, linmiaohe, linux-mm, mm-commits, torvalds

From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/rmap: correct obsolete comment of page_get_anon_vma()

Since commit 746b18d421da ("mm: use refcounts for page_lock_anon_vma()"),
page_lock_anon_vma() has been renamed to page_get_anon_vma() and converted
to return an anon_vma with an increased refcount.  But the relevant
comment was never updated.

Link: https://lkml.kernel.org/r/20210203093215.31990-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/rmap.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/rmap.c~mm-rmap-correct-obsolete-comment-of-page_get_anon_vma
+++ a/mm/rmap.c
@@ -462,8 +462,8 @@ void __init anon_vma_init(void)
  * Getting a lock on a stable anon_vma from a page off the LRU is tricky!
  *
  * Since there is no serialization what so ever against page_remove_rmap()
- * the best this function can do is return a locked anon_vma that might
- * have been relevant to this page.
+ * the best this function can do is return a refcount increased anon_vma
+ * that might have been relevant to this page.
  *
  * The page might have been remapped to a different anon_vma or the anon_vma
  * returned may already be freed (and even reused).
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 045/118] mm/rmap: fix potential pte_unmap on an not mapped pte
  2021-02-26  1:14 incoming Andrew Morton
                   ` (43 preceding siblings ...)
  2021-02-26  1:18 ` [patch 044/118] mm/rmap: correct obsolete comment of page_get_anon_vma() Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 046/118] mm: zswap: clean up confusing comment Andrew Morton
                   ` (73 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: 0x7f454c46, akpm, bgeffon, hannes, keescook, kirill.shutemov,
	linmiaohe, linux-mm, louhongxiang, mike.kravetz, mm-commits,
	natechancellor, ndesaulniers, richard.weiyang, sedat.dilek,
	shakeelb, torvalds, vbabka, walken

From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/rmap: fix potential pte_unmap on an not mapped pte

For a PMD-mapped page (usually THP), pvmw->pte is NULL.  For a PTE-mapped
THP, pvmw->pte is mapped.  But for HugeTLB pages, pvmw->pte is not mapped
but is set to the relevant page table entry.  So in
page_vma_mapped_walk_done(), we may do pte_unmap() on a HugeTLB pte that
is not mapped.  Fix this by checking pvmw->page against PageHuge() before
trying to do pte_unmap().

Link: https://lkml.kernel.org/r/20210127093349.39081-1-linmiaohe@huawei.com
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hongxiang Lou <louhongxiang@huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michel Lespinasse <walken@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/rmap.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/include/linux/rmap.h~mm-rmap-fix-potential-pte_unmap-on-an-not-mapped-pte
+++ a/include/linux/rmap.h
@@ -213,7 +213,8 @@ struct page_vma_mapped_walk {
 
 static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
 {
-	if (pvmw->pte)
+	/* HugeTLB pte is set to the relevant page table entry without pte_mapped. */
+	if (pvmw->pte && !PageHuge(pvmw->page))
 		pte_unmap(pvmw->pte);
 	if (pvmw->ptl)
 		spin_unlock(pvmw->ptl);
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 046/118] mm: zswap: clean up confusing comment
  2021-02-26  1:14 incoming Andrew Morton
                   ` (44 preceding siblings ...)
  2021-02-26  1:18 ` [patch 045/118] mm/rmap: fix potential pte_unmap on an not mapped pte Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 047/118] mm/zswap: add the flag can_sleep_mapped Andrew Morton
                   ` (72 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: akpm, ddstreet, linux-mm, mm-commits, rdunlap, sjenning,
	sjennings, torvalds, vitaly.wool, weijie.yang

From: Randy Dunlap <rdunlap@infradead.org>
Subject: mm: zswap: clean up confusing comment

Correct wording and change one duplicated word (it) to "it is".

Link: https://lkml.kernel.org/r/20201221042848.13980-1-rdunlap@infradead.org
Fixes: 0ab0abcf5115 ("mm/zswap: refactor the get/put routines")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Weijie Yang <weijie.yang@samsung.com>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/zswap.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/zswap.c~mm-zswap-clean-up-confusing-comment
+++ a/mm/zswap.c
@@ -1022,10 +1022,10 @@ static int zswap_writeback_entry(struct
 
 	/*
 	* if we get here due to ZSWAP_SWAPCACHE_EXIST
-	* a load may happening concurrently
-	* it is safe and okay to not free the entry
+	* a load may be happening concurrently.
+	* it is safe and okay to not free the entry.
 	* if we free the entry in the following put
-	* it it either okay to return !0
+	* it is also okay to return !0
 	*/
 fail:
 	spin_lock(&tree->lock);
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 047/118] mm/zswap: add the flag can_sleep_mapped
  2021-02-26  1:14 incoming Andrew Morton
                   ` (45 preceding siblings ...)
  2021-02-26  1:18 ` [patch 046/118] mm: zswap: clean up confusing comment Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 048/118] mm: set the sleep_mapped to true for zbud and z3fold Andrew Morton
                   ` (71 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: akpm, bigeasy, colin.king, dan.carpenter, ddstreet, efault,
	linux-mm, mm-commits, natechancellor, sjenning, song.bao.hua,
	tiantao6, torvalds, vitaly.wool

From: Tian Tao <tiantao6@hisilicon.com>
Subject: mm/zswap: add the flag can_sleep_mapped

Patch series "Fix the compatibility of zsmalloc and zswap".

Patch #1 adds a flag to zpool, which zswap then uses to determine whether
zpool drivers such as zbud/z3fold/zsmalloc will enter an atomic context
after mapping.

The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context, since its map function keeps preemption disabled, while
zbud/z3fold do not require an atomic context.  So patch #2 sets the
sleep_mapped flag to true for zbud/z3fold, indicating that they can sleep
after mapping.  zsmalloc does not support sleeping after mapping, so the
flag is not set to true for it.


This patch (of 2):

Add a flag to zpool named "can_sleep_mapped" and set it to true for
zbud/z3fold; the flag is not set for zsmalloc, so its default value is
false.  Zswap can then take the current path when the flag is true; when
it is false, it copies the data from src to a temporary buffer, unmaps the
handle, takes the mutex, and processes the buffer instead of src, to avoid
a sleeping function being called from atomic context.
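
The resulting pattern for a zpool consumer (a condensed sketch of what
zswap does after this patch, with the function name and error handling
simplified for illustration):

	#include <linux/zpool.h>
	#include <linux/slab.h>
	#include <linux/string.h>

	static int access_mapped_data(struct zpool *pool, unsigned long handle,
				      unsigned int len)
	{
		u8 *src, *tmp = NULL;

		src = zpool_map_handle(pool, handle, ZPOOL_MM_RO);
		if (!zpool_can_sleep_mapped(pool)) {
			/*
			 * The backend (e.g. zsmalloc) maps in atomic context:
			 * work on a copy and drop the mapping before sleeping.
			 */
			tmp = kmalloc(len, GFP_ATOMIC);
			if (!tmp) {
				zpool_unmap_handle(pool, handle);
				return -ENOMEM;
			}
			memcpy(tmp, src, len);
			zpool_unmap_handle(pool, handle);
			src = tmp;
		}

		/* ... work on src that may sleep (e.g. under a mutex) ... */

		if (zpool_can_sleep_mapped(pool))
			zpool_unmap_handle(pool, handle);
		else
			kfree(tmp);
		return 0;
	}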

[natechancellor@gmail.com: add return value in zswap_frontswap_load]
  Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
  Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
  Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
  Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com
Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/zpool.h |    3 ++
 mm/zpool.c            |   13 ++++++++++
 mm/zswap.c            |   51 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 62 insertions(+), 5 deletions(-)

--- a/include/linux/zpool.h~mm-zswap-add-the-flag-can_sleep_mapped
+++ a/include/linux/zpool.h
@@ -73,6 +73,7 @@ u64 zpool_get_total_size(struct zpool *p
  * @malloc:	allocate mem from a pool.
  * @free:	free mem from a pool.
  * @shrink:	shrink the pool.
+ * @sleep_mapped: whether zpool driver can sleep during map.
  * @map:	map a handle.
  * @unmap:	unmap a handle.
  * @total_size:	get total size of a pool.
@@ -100,6 +101,7 @@ struct zpool_driver {
 	int (*shrink)(void *pool, unsigned int pages,
 				unsigned int *reclaimed);
 
+	bool sleep_mapped;
 	void *(*map)(void *pool, unsigned long handle,
 				enum zpool_mapmode mm);
 	void (*unmap)(void *pool, unsigned long handle);
@@ -112,5 +114,6 @@ void zpool_register_driver(struct zpool_
 int zpool_unregister_driver(struct zpool_driver *driver);
 
 bool zpool_evictable(struct zpool *pool);
+bool zpool_can_sleep_mapped(struct zpool *pool);
 
 #endif
--- a/mm/zpool.c~mm-zswap-add-the-flag-can_sleep_mapped
+++ a/mm/zpool.c
@@ -23,6 +23,7 @@ struct zpool {
 	void *pool;
 	const struct zpool_ops *ops;
 	bool evictable;
+	bool can_sleep_mapped;
 
 	struct list_head list;
 };
@@ -183,6 +184,7 @@ struct zpool *zpool_create_pool(const ch
 	zpool->pool = driver->create(name, gfp, ops, zpool);
 	zpool->ops = ops;
 	zpool->evictable = driver->shrink && ops && ops->evict;
+	zpool->can_sleep_mapped = driver->sleep_mapped;
 
 	if (!zpool->pool) {
 		pr_err("couldn't create %s pool\n", type);
@@ -393,6 +395,17 @@ bool zpool_evictable(struct zpool *zpool
 	return zpool->evictable;
 }
 
+/**
+ * zpool_can_sleep_mapped - Test if zpool can sleep while a handle is mapped.
+ * @zpool:	The zpool to test
+ *
+ * Returns: true if zpool can sleep; false otherwise.
+ */
+bool zpool_can_sleep_mapped(struct zpool *zpool)
+{
+	return zpool->can_sleep_mapped;
+}
+
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Dan Streetman <ddstreet@ieee.org>");
 MODULE_DESCRIPTION("Common API for compressed memory storage");
--- a/mm/zswap.c~mm-zswap-add-the-flag-can_sleep_mapped
+++ a/mm/zswap.c
@@ -935,13 +935,19 @@ static int zswap_writeback_entry(struct
 	struct scatterlist input, output;
 	struct crypto_acomp_ctx *acomp_ctx;
 
-	u8 *src;
+	u8 *src, *tmp = NULL;
 	unsigned int dlen;
 	int ret;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_NONE,
 	};
 
+	if (!zpool_can_sleep_mapped(pool)) {
+		tmp = kmalloc(PAGE_SIZE, GFP_ATOMIC);
+		if (!tmp)
+			return -ENOMEM;
+	}
+
 	/* extract swpentry from data */
 	zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO);
 	swpentry = zhdr->swpentry; /* here */
@@ -955,6 +961,7 @@ static int zswap_writeback_entry(struct
 		/* entry was invalidated */
 		spin_unlock(&tree->lock);
 		zpool_unmap_handle(pool, handle);
+		kfree(tmp);
 		return 0;
 	}
 	spin_unlock(&tree->lock);
@@ -979,6 +986,14 @@ static int zswap_writeback_entry(struct
 		dlen = PAGE_SIZE;
 		src = (u8 *)zhdr + sizeof(struct zswap_header);
 
+		if (!zpool_can_sleep_mapped(pool)) {
+
+			memcpy(tmp, src, entry->length);
+			src = tmp;
+
+			zpool_unmap_handle(pool, handle);
+		}
+
 		mutex_lock(acomp_ctx->mutex);
 		sg_init_one(&input, src, entry->length);
 		sg_init_table(&output, 1);
@@ -1033,7 +1048,11 @@ fail:
 	spin_unlock(&tree->lock);
 
 end:
-	zpool_unmap_handle(pool, handle);
+	if (zpool_can_sleep_mapped(pool))
+		zpool_unmap_handle(pool, handle);
+	else
+		kfree(tmp);
+
 	return ret;
 }
 
@@ -1235,7 +1254,7 @@ static int zswap_frontswap_load(unsigned
 	struct zswap_entry *entry;
 	struct scatterlist input, output;
 	struct crypto_acomp_ctx *acomp_ctx;
-	u8 *src, *dst;
+	u8 *src, *dst, *tmp;
 	unsigned int dlen;
 	int ret;
 
@@ -1253,15 +1272,33 @@ static int zswap_frontswap_load(unsigned
 		dst = kmap_atomic(page);
 		zswap_fill_page(dst, entry->value);
 		kunmap_atomic(dst);
+		ret = 0;
 		goto freeentry;
 	}
 
+	if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
+
+		tmp = kmalloc(entry->length, GFP_ATOMIC);
+		if (!tmp) {
+			ret = -ENOMEM;
+			goto freeentry;
+		}
+	}
+
 	/* decompress */
 	dlen = PAGE_SIZE;
 	src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
 	if (zpool_evictable(entry->pool->zpool))
 		src += sizeof(struct zswap_header);
 
+	if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
+
+		memcpy(tmp, src, entry->length);
+		src = tmp;
+
+		zpool_unmap_handle(entry->pool->zpool, entry->handle);
+	}
+
 	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
 	mutex_lock(acomp_ctx->mutex);
 	sg_init_one(&input, src, entry->length);
@@ -1271,7 +1308,11 @@ static int zswap_frontswap_load(unsigned
 	ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
 	mutex_unlock(acomp_ctx->mutex);
 
-	zpool_unmap_handle(entry->pool->zpool, entry->handle);
+	if (zpool_can_sleep_mapped(entry->pool->zpool))
+		zpool_unmap_handle(entry->pool->zpool, entry->handle);
+	else
+		kfree(tmp);
+
 	BUG_ON(ret);
 
 freeentry:
@@ -1279,7 +1320,7 @@ freeentry:
 	zswap_entry_put(tree, entry);
 	spin_unlock(&tree->lock);
 
-	return 0;
+	return ret;
 }
 
 /* frees an entry in zswap */
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 048/118] mm: set the sleep_mapped to true for zbud and z3fold
  2021-02-26  1:14 incoming Andrew Morton
                   ` (46 preceding siblings ...)
  2021-02-26  1:18 ` [patch 047/118] mm/zswap: add the flag can_sleep_mapped Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 049/118] mm/zsmalloc.c: convert to use kmem_cache_zalloc in cache_alloc_zspage() Andrew Morton
                   ` (70 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: akpm, bigeasy, ddstreet, efault, linux-mm, mm-commits, sjenning,
	song.bao.hua, tiantao6, torvalds, vitaly.wool

From: Tian Tao <tiantao6@hisilicon.com>
Subject: mm: set the sleep_mapped to true for zbud and z3fold

The zpool framework gained a flag to indicate whether a zpool driver can
sleep after mapping.  This patch sets it to true for z3fold and zbud.

Link: https://lkml.kernel.org/r/1611035683-12732-3-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/z3fold.c |    1 +
 mm/zbud.c   |    1 +
 2 files changed, 2 insertions(+)

--- a/mm/z3fold.c~mm-set-the-sleep_mapped-to-true-for-zbud-and-z3fold
+++ a/mm/z3fold.c
@@ -1771,6 +1771,7 @@ static u64 z3fold_zpool_total_size(void
 
 static struct zpool_driver z3fold_zpool_driver = {
 	.type =		"z3fold",
+	.sleep_mapped = true,
 	.owner =	THIS_MODULE,
 	.create =	z3fold_zpool_create,
 	.destroy =	z3fold_zpool_destroy,
--- a/mm/zbud.c~mm-set-the-sleep_mapped-to-true-for-zbud-and-z3fold
+++ a/mm/zbud.c
@@ -203,6 +203,7 @@ static u64 zbud_zpool_total_size(void *p
 
 static struct zpool_driver zbud_zpool_driver = {
 	.type =		"zbud",
+	.sleep_mapped = true,
 	.owner =	THIS_MODULE,
 	.create =	zbud_zpool_create,
 	.destroy =	zbud_zpool_destroy,
_



* [patch 049/118] mm/zsmalloc.c: convert to use kmem_cache_zalloc in cache_alloc_zspage()
  2021-02-26  1:14 incoming Andrew Morton
                   ` (47 preceding siblings ...)
  2021-02-26  1:18 ` [patch 048/118] mm: set the sleep_mapped to true for zbud and z3fold Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 050/118] zsmalloc: account the number of compacted pages correctly Andrew Morton
                   ` (69 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: akpm, linmiaohe, linux-mm, minchan, mm-commits,
	sergey.senozhatsky, torvalds

From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/zsmalloc.c: convert to use kmem_cache_zalloc in cache_alloc_zspage()

We always memset the zspage allocated via cache_alloc_zspage(), so it is
more convenient to use kmem_cache_zalloc() in cache_alloc_zspage() than to
have the caller do it manually.
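
For illustration only (not part of the patch), this is the slab-cache
analogue of replacing malloc()+memset() with calloc() in plain C:

    #include <stdlib.h>
    #include <string.h>

    struct obj { int magic; void *head; };

    /* before: allocate, then zero by hand */
    struct obj *alloc_old(void)
    {
            struct obj *o = malloc(sizeof(*o));

            if (o)
                    memset(o, 0, sizeof(*o));
            return o;
    }

    /* after: let the allocator hand back zeroed memory */
    struct obj *alloc_new(void)
    {
            return calloc(1, sizeof(struct obj));
    }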

Link: https://lkml.kernel.org/r/20210114120032.25885-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/zsmalloc.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/mm/zsmalloc.c~mm-zsmallocc-convert-to-use-kmem_cache_zalloc-in-cache_alloc_zspage
+++ a/mm/zsmalloc.c
@@ -357,7 +357,7 @@ static void cache_free_handle(struct zs_
 
 static struct zspage *cache_alloc_zspage(struct zs_pool *pool, gfp_t flags)
 {
-	return kmem_cache_alloc(pool->zspage_cachep,
+	return kmem_cache_zalloc(pool->zspage_cachep,
 			flags & ~(__GFP_HIGHMEM|__GFP_MOVABLE));
 }
 
@@ -1064,7 +1064,6 @@ static struct zspage *alloc_zspage(struc
 	if (!zspage)
 		return NULL;
 
-	memset(zspage, 0, sizeof(struct zspage));
 	zspage->magic = ZSPAGE_MAGIC;
 	migrate_lock_init(zspage);
 
_



* [patch 050/118] zsmalloc: account the number of compacted pages correctly
  2021-02-26  1:14 incoming Andrew Morton
                   ` (48 preceding siblings ...)
  2021-02-26  1:18 ` [patch 049/118] mm/zsmalloc.c: convert to use kmem_cache_zalloc in cache_alloc_zspage() Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 051/118] mm/zsmalloc.c: use page_private() to access page->private Andrew Morton
                   ` (68 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: akpm, linux-mm, minchan, mm-commits, sergey.senozhatsky, stable,
	torvalds, wu-yan

From: Rokudo Yan <wu-yan@tcl.com>
Subject: zsmalloc: account the number of compacted pages correctly

Multiple paths may perform zram compaction concurrently:
1. auto-compaction triggered during memory reclaim
2. userspace utilities writing to the zram sysfs compaction node

So multiple threads may call zs_shrinker_scan()/zs_compact() concurrently.
But pages_compacted is a per zsmalloc pool variable, and updates to it are
not serialized: each update happens under a per size class lock, so updates
made from different size classes can race.  There are two issues here:
1. pages_compacted may not equal the total number of pages actually freed
(because of the concurrent additions).
2. zs_shrinker_scan() may not return the correct number of pages freed by
the current shrinker invocation.

The fix is simple:
1. account the number of pages freed in zs_compact() locally.
2. use an atomic variable, pages_compacted, to accumulate the total.
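
A minimal userspace analogue of the fixed accounting (illustrative only;
compact_once() stands in for zs_compact()):

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_long total_compacted = 0; /* like pool->stats.pages_compacted */

    static long compact_once(long freed)    /* pages freed by this call only */
    {
            atomic_fetch_add(&total_compacted, freed);
            return freed;                   /* caller sees only its own delta */
    }

    int main(void)
    {
            long a = compact_once(3);       /* e.g. shrinker-triggered pass */
            long b = compact_once(5);       /* e.g. sysfs-triggered pass */

            printf("deltas: %ld %ld, total: %ld\n",
                   a, b, atomic_load(&total_compacted));
            return 0;
    }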

Link: https://lkml.kernel.org/r/20210202122235.26885-1-wu-yan@tcl.com
Fixes: 860c707dca155a56 ("zsmalloc: account the number of compacted pages")
Signed-off-by: Rokudo Yan <wu-yan@tcl.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/block/zram/zram_drv.c |    2 +-
 include/linux/zsmalloc.h      |    2 +-
 mm/zsmalloc.c                 |   17 +++++++++++------
 3 files changed, 13 insertions(+), 8 deletions(-)

--- a/drivers/block/zram/zram_drv.c~zsmalloc-account-the-number-of-compacted-pages-correctly
+++ a/drivers/block/zram/zram_drv.c
@@ -1081,7 +1081,7 @@ static ssize_t mm_stat_show(struct devic
 			zram->limit_pages << PAGE_SHIFT,
 			max_used << PAGE_SHIFT,
 			(u64)atomic64_read(&zram->stats.same_pages),
-			pool_stats.pages_compacted,
+			atomic_long_read(&pool_stats.pages_compacted),
 			(u64)atomic64_read(&zram->stats.huge_pages),
 			(u64)atomic64_read(&zram->stats.huge_pages_since));
 	up_read(&zram->init_lock);
--- a/include/linux/zsmalloc.h~zsmalloc-account-the-number-of-compacted-pages-correctly
+++ a/include/linux/zsmalloc.h
@@ -35,7 +35,7 @@ enum zs_mapmode {
 
 struct zs_pool_stats {
 	/* How many pages were migrated (freed) */
-	unsigned long pages_compacted;
+	atomic_long_t pages_compacted;
 };
 
 struct zs_pool;
--- a/mm/zsmalloc.c~zsmalloc-account-the-number-of-compacted-pages-correctly
+++ a/mm/zsmalloc.c
@@ -2212,11 +2212,13 @@ static unsigned long zs_can_compact(stru
 	return obj_wasted * class->pages_per_zspage;
 }
 
-static void __zs_compact(struct zs_pool *pool, struct size_class *class)
+static unsigned long __zs_compact(struct zs_pool *pool,
+				  struct size_class *class)
 {
 	struct zs_compact_control cc;
 	struct zspage *src_zspage;
 	struct zspage *dst_zspage = NULL;
+	unsigned long pages_freed = 0;
 
 	spin_lock(&class->lock);
 	while ((src_zspage = isolate_zspage(class, true))) {
@@ -2246,7 +2248,7 @@ static void __zs_compact(struct zs_pool
 		putback_zspage(class, dst_zspage);
 		if (putback_zspage(class, src_zspage) == ZS_EMPTY) {
 			free_zspage(pool, class, src_zspage);
-			pool->stats.pages_compacted += class->pages_per_zspage;
+			pages_freed += class->pages_per_zspage;
 		}
 		spin_unlock(&class->lock);
 		cond_resched();
@@ -2257,12 +2259,15 @@ static void __zs_compact(struct zs_pool
 		putback_zspage(class, src_zspage);
 
 	spin_unlock(&class->lock);
+
+	return pages_freed;
 }
 
 unsigned long zs_compact(struct zs_pool *pool)
 {
 	int i;
 	struct size_class *class;
+	unsigned long pages_freed = 0;
 
 	for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) {
 		class = pool->size_class[i];
@@ -2270,10 +2275,11 @@ unsigned long zs_compact(struct zs_pool
 			continue;
 		if (class->index != i)
 			continue;
-		__zs_compact(pool, class);
+		pages_freed += __zs_compact(pool, class);
 	}
+	atomic_long_add(pages_freed, &pool->stats.pages_compacted);
 
-	return pool->stats.pages_compacted;
+	return pages_freed;
 }
 EXPORT_SYMBOL_GPL(zs_compact);
 
@@ -2290,13 +2296,12 @@ static unsigned long zs_shrinker_scan(st
 	struct zs_pool *pool = container_of(shrinker, struct zs_pool,
 			shrinker);
 
-	pages_freed = pool->stats.pages_compacted;
 	/*
 	 * Compact classes and calculate compaction delta.
 	 * Can run concurrently with a manually triggered
 	 * (by user) compaction.
 	 */
-	pages_freed = zs_compact(pool) - pages_freed;
+	pages_freed = zs_compact(pool);
 
 	return pages_freed ? pages_freed : SHRINK_STOP;
 }
_



* [patch 051/118] mm/zsmalloc.c: use page_private() to access page->private
  2021-02-26  1:14 incoming Andrew Morton
                   ` (49 preceding siblings ...)
  2021-02-26  1:18 ` [patch 050/118] zsmalloc: account the number of compacted pages correctly Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 052/118] mm: page-flags.h: Typo fix (It -> If) Andrew Morton
                   ` (67 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: akpm, linmiaohe, linux-mm, minchan, mm-commits, ngupta, torvalds

From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/zsmalloc.c: use page_private() to access page->private

It is recommended to use the helper macro page_private() to access the
private field of struct page.  Use the helper to eliminate the remaining
direct access.
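
For reference, the helper is just a thin wrapper around the field access,
roughly:

    #define page_private(page)	((page)->private)

so the change is purely a readability/consistency cleanup.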

Link: https://lkml.kernel.org/r/20210203091857.20017-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/zsmalloc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/zsmalloc.c~mm-zsmallocc-use-page_private-to-access-page-private
+++ a/mm/zsmalloc.c
@@ -816,7 +816,7 @@ static int get_pages_per_zspage(int clas
 
 static struct zspage *get_zspage(struct page *page)
 {
-	struct zspage *zspage = (struct zspage *)page->private;
+	struct zspage *zspage = (struct zspage *)page_private(page);
 
 	BUG_ON(zspage->magic != ZSPAGE_MAGIC);
 	return zspage;
_



* [patch 052/118] mm: page-flags.h: Typo fix (It -> If)
  2021-02-26  1:14 incoming Andrew Morton
                   ` (50 preceding siblings ...)
  2021-02-26  1:18 ` [patch 051/118] mm/zsmalloc.c: use page_private() to access page->private Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 053/118] mm/dmapool: use might_alloc() Andrew Morton
                   ` (66 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: akpm, alexander.h.duyck, david, guoren, linux-mm, mm-commits,
	osalvador, steven.price, torvalds

From: Guo Ren <guoren@linux.alibaba.com>
Subject: mm: page-flags.h: Typo fix (It -> If)

The "If" was wrongly spelled as "It".

Link: https://lkml.kernel.org/r/1608959036-91409-1-git-send-email-guoren@kernel.org
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Steven Price <steven.price@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/page-flags.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/page-flags.h~mm-page-flagsh-typo-fix-it-if
+++ a/include/linux/page-flags.h
@@ -810,7 +810,7 @@ static inline void ClearPageSlabPfmemall
 
 /*
  * Flags checked when a page is freed.  Pages being freed should not have
- * these flags set.  It they are, there is a problem.
+ * these flags set.  If they are, there is a problem.
  */
 #define PAGE_FLAGS_CHECK_AT_FREE				\
 	(1UL << PG_lru		| 1UL << PG_locked	|	\
@@ -821,7 +821,7 @@ static inline void ClearPageSlabPfmemall
 
 /*
  * Flags checked when a page is prepped for return by the page allocator.
- * Pages being prepped should not have these flags set.  It they are set,
+ * Pages being prepped should not have these flags set.  If they are set,
  * there has been a kernel bug or struct page corruption.
  *
  * __PG_HWPOISON is exceptional because it needs to be kept beyond page's
_



* [patch 053/118] mm/dmapool: use might_alloc()
  2021-02-26  1:14 incoming Andrew Morton
                   ` (51 preceding siblings ...)
  2021-02-26  1:18 ` [patch 052/118] mm: page-flags.h: Typo fix (It -> If) Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 054/118] mm/backing-dev.c: " Andrew Morton
                   ` (65 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: akpm, daniel.vetter, daniel.vetter, linux-mm, mm-commits, torvalds

From: Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: mm/dmapool: use might_alloc()

Now that my little helper might_alloc() has landed, use it more.  On top
of the existing might_sleep_if() check, this also brings in lockdep
coverage through the fs_reclaim annotations.
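
For reference, the helper in <linux/sched/mm.h> roughly amounts to:

    static inline void might_alloc(gfp_t gfp_mask)
    {
            fs_reclaim_acquire(gfp_mask);
            fs_reclaim_release(gfp_mask);

            might_sleep_if(gfpflags_allow_blocking(gfp_mask));
    }

so callers keep the old might_sleep_if() check and additionally get lockdep
coverage of the reclaim dependency.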

Link: https://lkml.kernel.org/r/20210113135009.3606813-1-daniel.vetter@ffwll.ch
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/dmapool.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/dmapool.c~mm-dmapool-use-might_alloc
+++ a/mm/dmapool.c
@@ -28,6 +28,7 @@
 #include <linux/mutex.h>
 #include <linux/poison.h>
 #include <linux/sched.h>
+#include <linux/sched/mm.h>
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/spinlock.h>
@@ -319,7 +320,7 @@ void *dma_pool_alloc(struct dma_pool *po
 	size_t offset;
 	void *retval;
 
-	might_sleep_if(gfpflags_allow_blocking(mem_flags));
+	might_alloc(mem_flags);
 
 	spin_lock_irqsave(&pool->lock, flags);
 	list_for_each_entry(page, &pool->page_list, page_list) {
_



* [patch 054/118] mm/backing-dev.c: use might_alloc()
  2021-02-26  1:14 incoming Andrew Morton
                   ` (52 preceding siblings ...)
  2021-02-26  1:18 ` [patch 053/118] mm/dmapool: use might_alloc() Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 055/118] mm/early_ioremap.c: use __func__ instead of function name Andrew Morton
                   ` (64 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: akpm, daniel.vetter, daniel.vetter, linux-mm, mm-commits, torvalds

From: Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: mm/backing-dev.c: use might_alloc()

Now that my little helper might_alloc() has landed, use it more.  On top
of the existing might_sleep_if() check, this also brings in lockdep
coverage through the fs_reclaim annotations.

[akpm@linux-foundation.org: include linux/sched/mm.h]
Link: https://lkml.kernel.org/r/20210113135009.3606813-2-daniel.vetter@ffwll.ch
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/backing-dev.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/backing-dev.c~bdi-use-might_alloc
+++ a/mm/backing-dev.c
@@ -8,6 +8,7 @@
 #include <linux/fs.h>
 #include <linux/pagemap.h>
 #include <linux/mm.h>
+#include <linux/sched/mm.h>
 #include <linux/sched.h>
 #include <linux/module.h>
 #include <linux/writeback.h>
@@ -578,7 +579,7 @@ struct bdi_writeback *wb_get_create(stru
 {
 	struct bdi_writeback *wb;
 
-	might_sleep_if(gfpflags_allow_blocking(gfp));
+	might_alloc(gfp);
 
 	if (!memcg_css->parent)
 		return &bdi->wb;
_



* [patch 055/118] mm/early_ioremap.c: use __func__ instead of function name
  2021-02-26  1:14 incoming Andrew Morton
                   ` (53 preceding siblings ...)
  2021-02-26  1:18 ` [patch 054/118] mm/backing-dev.c: " Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 056/118] mm: add Kernel Electric-Fence infrastructure Andrew Morton
                   ` (63 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, stephenzhangzsd, torvalds

From: Stephen Zhang <stephenzhangzsd@gmail.com>
Subject: mm/early_ioremap.c: use __func__ instead of function name

It is better to use __func__ than to hard-code the function name.
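
__func__ is the predefined identifier that always expands to the enclosing
function's name, so log messages stay correct if the function is renamed.
A trivial stand-alone illustration:

    #include <stdio.h>

    static void early_iounmap_demo(void)
    {
            /* prints "early_iounmap_demo: bad address" */
            printf("%s: bad address\n", __func__);
    }

    int main(void)
    {
            early_iounmap_demo();
            return 0;
    }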

Link: https://lkml.kernel.org/r/1611385587-4209-1-git-send-email-stephenzhangzsd@gmail.com
Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/early_ioremap.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/mm/early_ioremap.c~mm-early_ioremapc-use-__func__-instead-of-function-name
+++ a/mm/early_ioremap.c
@@ -181,17 +181,17 @@ void __init early_iounmap(void __iomem *
 		}
 	}
 
-	if (WARN(slot < 0, "early_iounmap(%p, %08lx) not found slot\n",
-		 addr, size))
+	if (WARN(slot < 0, "%s(%p, %08lx) not found slot\n",
+		  __func__, addr, size))
 		return;
 
 	if (WARN(prev_size[slot] != size,
-		 "early_iounmap(%p, %08lx) [%d] size not consistent %08lx\n",
-		 addr, size, slot, prev_size[slot]))
+		 "%s(%p, %08lx) [%d] size not consistent %08lx\n",
+		  __func__, addr, size, slot, prev_size[slot]))
 		return;
 
-	WARN(early_ioremap_debug, "early_iounmap(%p, %08lx) [%d]\n",
-	     addr, size, slot);
+	WARN(early_ioremap_debug, "%s(%p, %08lx) [%d]\n",
+	      __func__, addr, size, slot);
 
 	virt_addr = (unsigned long)addr;
 	if (WARN_ON(virt_addr < fix_to_virt(FIX_BTMAP_BEGIN)))
_



* [patch 056/118] mm: add Kernel Electric-Fence infrastructure
  2021-02-26  1:14 incoming Andrew Morton
                   ` (54 preceding siblings ...)
  2021-02-26  1:18 ` [patch 055/118] mm/early_ioremap.c: use __func__ instead of function name Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:18 ` [patch 057/118] x86, kfence: enable KFENCE for x86 Andrew Morton
                   ` (62 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, bp, catalin.marinas, cl, corbet,
	dave.hansen, dvyukov, edumazet, elver, glider, gregkh, hdanton,
	hpa, iamjoonsoo.kim, jannh, joern, keescook, linux-mm, luto,
	mark.rutland, mingo, mm-commits, paulmck, penberg, peterz,
	rientjes, sjpark, tglx, torvalds, vbabka, will

From: Alexander Potapenko <glider@google.com>
Subject: mm: add Kernel Electric-Fence infrastructure

Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7.

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.  This
series enables KFENCE for the x86 and arm64 architectures, and adds
KFENCE hooks to the SLAB and SLUB allocators.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design is that, with
enough total uptime, KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly accumulate
enough total uptime is to deploy the tool across a large fleet of
machines.

KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error.

Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval,
the next allocation through the main allocator (SLAB or SLUB) returns a
guarded allocation from the KFENCE object pool. At this point, the timer
is reset, and the next allocation is set up after the expiration of the
interval.

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE.
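
Concretely (illustrative only; the actual SLAB/SLUB hooks are added later
in this series), an allocator fast path consults KFENCE roughly like:

    void *obj = kfence_alloc(cache, size, flags); /* almost always NULL */

    if (unlikely(obj))
            return obj;             /* took a guarded KFENCE slot */
    /* ... otherwise continue with the regular slab fast path ... */

where kfence_alloc() is the static-branch-guarded inline added below.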

The KFENCE memory pool is of fixed size, and if the pool is exhausted no
further KFENCE allocations occur. The default config is conservative
with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
pages).
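
(Worked out: (255 + 1) * 2 pages * 4 KiB/page = 512 pages = 2 MiB, which
matches the KFENCE_POOL_SIZE definition added below.)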

We have verified by running synthetic benchmarks (sysbench I/O,
hackbench) and production server-workload benchmarks that a kernel with
KFENCE (using sample intervals 100-500ms) is performance-neutral
compared to a non-KFENCE baseline kernel.

KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
properties. The name "KFENCE" is a homage to the Electric Fence Malloc
Debugger [2].

For more details, see Documentation/dev-tools/kfence.rst added in the
series -- also viewable here:

	https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst

[1] http://llvm.org/docs/GwpAsan.html
[2] https://linux.die.net/man/3/efence


This patch (of 9):

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design is that, with
enough total uptime, KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly accumulate
enough total uptime is to deploy the tool across a large fleet of
machines.

KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error. To detect out-of-bounds
writes to memory within the object's page itself, KFENCE also uses
pattern-based redzones. The following figure illustrates the page
layout:

  ---+-----------+-----------+-----------+-----------+-----------+---
     | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
     | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
     | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
     | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
     | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
     | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
  ---+-----------+-----------+-----------+-----------+-----------+---

Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval, a
guarded allocation from the KFENCE object pool is returned to the main
allocator (SLAB or SLUB). At this point, the timer is reset, and the
next allocation is set up after the expiration of the interval.

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. To date, we have verified by running synthetic
benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE
is performance-neutral compared to the non-KFENCE baseline.

For more details, see Documentation/dev-tools/kfence.rst (added later in
the series).

[elver@google.com: fix parameter description for kfence_object_start()]
  Link: https://lkml.kernel.org/r/20201106092149.GA2851373@elver.google.com
[elver@google.com: avoid stalling work queue task without allocations]
  Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com
  Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com
[elver@google.com: fix potential deadlock due to wake_up()]
  Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com
  Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com
[elver@google.com: add option to use KFENCE without static keys]
  Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com
[elver@google.com: add missing copyright and description headers]
  Link: https://lkml.kernel.org/r/20210118092159.145934-1-elver@google.com
Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: SeongJae Park <sjpark@amazon.de>
Co-developed-by: Marco Elver <elver@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/kfence.h |  216 ++++++++++
 init/main.c            |    3 
 lib/Kconfig.debug      |    1 
 lib/Kconfig.kfence     |   67 +++
 mm/Makefile            |    1 
 mm/kfence/Makefile     |    3 
 mm/kfence/core.c       |  840 +++++++++++++++++++++++++++++++++++++++
 mm/kfence/kfence.h     |  113 +++++
 mm/kfence/report.c     |  240 +++++++++++
 9 files changed, 1484 insertions(+)

--- /dev/null
+++ a/include/linux/kfence.h
@@ -0,0 +1,216 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Kernel Electric-Fence (KFENCE). Public interface for allocator and fault
+ * handler integration. For more info see Documentation/dev-tools/kfence.rst.
+ *
+ * Copyright (C) 2020, Google LLC.
+ */
+
+#ifndef _LINUX_KFENCE_H
+#define _LINUX_KFENCE_H
+
+#include <linux/mm.h>
+#include <linux/types.h>
+
+#ifdef CONFIG_KFENCE
+
+/*
+ * We allocate an even number of pages, as it simplifies calculations to map
+ * address to metadata indices; effectively, the very first page serves as an
+ * extended guard page, but otherwise has no special purpose.
+ */
+#define KFENCE_POOL_SIZE ((CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE)
+extern char *__kfence_pool;
+
+#ifdef CONFIG_KFENCE_STATIC_KEYS
+#include <linux/static_key.h>
+DECLARE_STATIC_KEY_FALSE(kfence_allocation_key);
+#else
+#include <linux/atomic.h>
+extern atomic_t kfence_allocation_gate;
+#endif
+
+/**
+ * is_kfence_address() - check if an address belongs to KFENCE pool
+ * @addr: address to check
+ *
+ * Return: true or false depending on whether the address is within the KFENCE
+ * object range.
+ *
+ * KFENCE objects live in a separate page range and are not to be intermixed
+ * with regular heap objects (e.g. KFENCE objects must never be added to the
+ * allocator freelists). Failing to do so may and will result in heap
+ * corruptions, therefore is_kfence_address() must be used to check whether
+ * an object requires specific handling.
+ *
+ * Note: This function may be used in fast-paths, and is performance critical.
+ * Future changes should take this into account; for instance, we want to avoid
+ * introducing another load and therefore need to keep KFENCE_POOL_SIZE a
+ * constant (until immediate patching support is added to the kernel).
+ */
+static __always_inline bool is_kfence_address(const void *addr)
+{
+	/*
+	 * The non-NULL check is required in case the __kfence_pool pointer was
+	 * never initialized; keep it in the slow-path after the range-check.
+	 */
+	return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_POOL_SIZE && addr);
+}
+
+/**
+ * kfence_alloc_pool() - allocate the KFENCE pool via memblock
+ */
+void __init kfence_alloc_pool(void);
+
+/**
+ * kfence_init() - perform KFENCE initialization at boot time
+ *
+ * Requires that kfence_alloc_pool() was called before. This sets up the
+ * allocation gate timer, and requires that workqueues are available.
+ */
+void __init kfence_init(void);
+
+/**
+ * kfence_shutdown_cache() - handle shutdown_cache() for KFENCE objects
+ * @s: cache being shut down
+ *
+ * Before shutting down a cache, one must ensure there are no remaining objects
+ * allocated from it. Because KFENCE objects are not referenced from the cache
+ * directly, we need to check them here.
+ *
+ * Note that shutdown_cache() is internal to SL*B, and kmem_cache_destroy() does
+ * not return if allocated objects still exist: it prints an error message and
+ * simply aborts destruction of a cache, leaking memory.
+ *
+ * If the only such objects are KFENCE objects, we will not leak the entire
+ * cache, but instead try to provide more useful debug info by making allocated
+ * objects "zombie allocations". Objects may then still be used or freed (which
+ * is handled gracefully), but usage will result in showing KFENCE error reports
+ * which include stack traces to the user of the object, the original allocation
+ * site, and caller to shutdown_cache().
+ */
+void kfence_shutdown_cache(struct kmem_cache *s);
+
+/*
+ * Allocate a KFENCE object. Allocators must not call this function directly,
+ * use kfence_alloc() instead.
+ */
+void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags);
+
+/**
+ * kfence_alloc() - allocate a KFENCE object with a low probability
+ * @s:     struct kmem_cache with object requirements
+ * @size:  exact size of the object to allocate (can be less than @s->size
+ *         e.g. for kmalloc caches)
+ * @flags: GFP flags
+ *
+ * Return:
+ * * NULL     - must proceed with allocating as usual,
+ * * non-NULL - pointer to a KFENCE object.
+ *
+ * kfence_alloc() should be inserted into the heap allocation fast path,
+ * allowing it to transparently return KFENCE-allocated objects with a low
+ * probability using a static branch (the probability is controlled by the
+ * kfence.sample_interval boot parameter).
+ */
+static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
+{
+#ifdef CONFIG_KFENCE_STATIC_KEYS
+	if (static_branch_unlikely(&kfence_allocation_key))
+#else
+	if (unlikely(!atomic_read(&kfence_allocation_gate)))
+#endif
+		return __kfence_alloc(s, size, flags);
+	return NULL;
+}
+
+/**
+ * kfence_ksize() - get actual amount of memory allocated for a KFENCE object
+ * @addr: pointer to a heap object
+ *
+ * Return:
+ * * 0     - not a KFENCE object, must call __ksize() instead,
+ * * non-0 - this many bytes can be accessed without causing a memory error.
+ *
+ * kfence_ksize() returns the number of bytes requested for a KFENCE object at
+ * allocation time. This number may be less than the object size of the
+ * corresponding struct kmem_cache.
+ */
+size_t kfence_ksize(const void *addr);
+
+/**
+ * kfence_object_start() - find the beginning of a KFENCE object
+ * @addr: address within a KFENCE-allocated object
+ *
+ * Return: address of the beginning of the object.
+ *
+ * SL[AU]B-allocated objects are laid out within a page one by one, so it is
+ * easy to calculate the beginning of an object given a pointer inside it and
+ * the object size. The same is not true for KFENCE, which places a single
+ * object at either end of the page. This helper function is used to find the
+ * beginning of a KFENCE-allocated object.
+ */
+void *kfence_object_start(const void *addr);
+
+/**
+ * __kfence_free() - release a KFENCE heap object to KFENCE pool
+ * @addr: object to be freed
+ *
+ * Requires: is_kfence_address(addr)
+ *
+ * Release a KFENCE object and mark it as freed.
+ */
+void __kfence_free(void *addr);
+
+/**
+ * kfence_free() - try to release an arbitrary heap object to KFENCE pool
+ * @addr: object to be freed
+ *
+ * Return:
+ * * false - object doesn't belong to KFENCE pool and was ignored,
+ * * true  - object was released to KFENCE pool.
+ *
+ * Release a KFENCE object and mark it as freed. May be called on any object,
+ * even non-KFENCE objects, to simplify integration of the hooks into the
+ * allocator's free codepath. The allocator must check the return value to
+ * determine if it was a KFENCE object or not.
+ */
+static __always_inline __must_check bool kfence_free(void *addr)
+{
+	if (!is_kfence_address(addr))
+		return false;
+	__kfence_free(addr);
+	return true;
+}
+
+/**
+ * kfence_handle_page_fault() - perform page fault handling for KFENCE pages
+ * @addr: faulting address
+ *
+ * Return:
+ * * false - address outside KFENCE pool,
+ * * true  - page fault handled by KFENCE, no additional handling required.
+ *
+ * A page fault inside KFENCE pool indicates a memory error, such as an
+ * out-of-bounds access, a use-after-free or an invalid memory access. In these
+ * cases KFENCE prints an error message and marks the offending page as
+ * present, so that the kernel can proceed.
+ */
+bool __must_check kfence_handle_page_fault(unsigned long addr);
+
+#else /* CONFIG_KFENCE */
+
+static inline bool is_kfence_address(const void *addr) { return false; }
+static inline void kfence_alloc_pool(void) { }
+static inline void kfence_init(void) { }
+static inline void kfence_shutdown_cache(struct kmem_cache *s) { }
+static inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) { return NULL; }
+static inline size_t kfence_ksize(const void *addr) { return 0; }
+static inline void *kfence_object_start(const void *addr) { return NULL; }
+static inline void __kfence_free(void *addr) { }
+static inline bool __must_check kfence_free(void *addr) { return false; }
+static inline bool __must_check kfence_handle_page_fault(unsigned long addr) { return false; }
+
+#endif
+
+#endif /* _LINUX_KFENCE_H */
--- a/init/main.c~mm-add-kernel-electric-fence-infrastructure
+++ a/init/main.c
@@ -40,6 +40,7 @@
 #include <linux/security.h>
 #include <linux/smp.h>
 #include <linux/profile.h>
+#include <linux/kfence.h>
 #include <linux/rcupdate.h>
 #include <linux/moduleparam.h>
 #include <linux/kallsyms.h>
@@ -824,6 +825,7 @@ static void __init mm_init(void)
 	 */
 	page_ext_init_flatmem();
 	init_mem_debugging_and_hardening();
+	kfence_alloc_pool();
 	report_meminit();
 	mem_init();
 	/* page_owner must be initialized after buddy is ready */
@@ -955,6 +957,7 @@ asmlinkage __visible void __init __no_sa
 	hrtimers_init();
 	softirq_init();
 	timekeeping_init();
+	kfence_init();
 
 	/*
 	 * For best initial stack canary entropy, prepare it after:
--- a/lib/Kconfig.debug~mm-add-kernel-electric-fence-infrastructure
+++ a/lib/Kconfig.debug
@@ -938,6 +938,7 @@ config DEBUG_STACKOVERFLOW
 	  If in doubt, say "N".
 
 source "lib/Kconfig.kasan"
+source "lib/Kconfig.kfence"
 
 endmenu # "Memory Debugging"
 
--- /dev/null
+++ a/lib/Kconfig.kfence
@@ -0,0 +1,67 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config HAVE_ARCH_KFENCE
+	bool
+
+menuconfig KFENCE
+	bool "KFENCE: low-overhead sampling-based memory safety error detector"
+	depends on HAVE_ARCH_KFENCE && !KASAN && (SLAB || SLUB)
+	select STACKTRACE
+	help
+	  KFENCE is a low-overhead sampling-based detector of heap out-of-bounds
+	  access, use-after-free, and invalid-free errors. KFENCE is designed
+	  to have negligible cost to permit enabling it in production
+	  environments.
+
+	  Note that, KFENCE is not a substitute for explicit testing with tools
+	  such as KASAN. KFENCE can detect a subset of bugs that KASAN can
+	  detect, albeit at very different performance profiles. If you can
+	  afford to use KASAN, continue using KASAN, for example in test
+	  environments. If your kernel targets production use, and cannot
+	  enable KASAN due to its cost, consider using KFENCE.
+
+if KFENCE
+
+config KFENCE_STATIC_KEYS
+	bool "Use static keys to set up allocations"
+	default y
+	depends on JUMP_LABEL # To ensure performance, require jump labels
+	help
+	  Use static keys (static branches) to set up KFENCE allocations. Using
+	  static keys is normally recommended, because it avoids a dynamic
+	  branch in the allocator's fast path. However, with very low sample
+	  intervals, or on systems that do not support jump labels, a dynamic
+	  branch may still be an acceptable performance trade-off.
+
+config KFENCE_SAMPLE_INTERVAL
+	int "Default sample interval in milliseconds"
+	default 100
+	help
+	  The KFENCE sample interval determines the frequency with which heap
+	  allocations will be guarded by KFENCE. May be overridden via boot
+	  parameter "kfence.sample_interval".
+
+	  Set this to 0 to disable KFENCE by default, in which case only
+	  setting "kfence.sample_interval" to a non-zero value enables KFENCE.
+
+config KFENCE_NUM_OBJECTS
+	int "Number of guarded objects available"
+	range 1 65535
+	default 255
+	help
+	  The number of guarded objects available. For each KFENCE object, 2
+	  pages are required; with one containing the object and two adjacent
+	  ones used as guard pages.
+
+config KFENCE_STRESS_TEST_FAULTS
+	int "Stress testing of fault handling and error reporting" if EXPERT
+	default 0
+	help
+	  The inverse probability with which to randomly protect KFENCE object
+	  pages, resulting in spurious use-after-frees. The main purpose of
+	  this option is to stress test KFENCE with concurrent error reports
+	  and allocations/frees. A value of 0 disables stress testing logic.
+
+	  Only for KFENCE testing; set to 0 if you are not a KFENCE developer.
+
+endif # KFENCE
--- /dev/null
+++ a/mm/kfence/core.c
@@ -0,0 +1,840 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KFENCE guarded object allocator and fault handling.
+ *
+ * Copyright (C) 2020, Google LLC.
+ */
+
+#define pr_fmt(fmt) "kfence: " fmt
+
+#include <linux/atomic.h>
+#include <linux/bug.h>
+#include <linux/debugfs.h>
+#include <linux/kcsan-checks.h>
+#include <linux/kfence.h>
+#include <linux/list.h>
+#include <linux/lockdep.h>
+#include <linux/memblock.h>
+#include <linux/moduleparam.h>
+#include <linux/random.h>
+#include <linux/rcupdate.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+
+#include <asm/kfence.h>
+
+#include "kfence.h"
+
+/* Disables KFENCE on the first warning assuming an irrecoverable error. */
+#define KFENCE_WARN_ON(cond)                                                   \
+	({                                                                     \
+		const bool __cond = WARN_ON(cond);                             \
+		if (unlikely(__cond))                                          \
+			WRITE_ONCE(kfence_enabled, false);                     \
+		__cond;                                                        \
+	})
+
+/* === Data ================================================================= */
+
+static bool kfence_enabled __read_mostly;
+
+static unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL;
+
+#ifdef MODULE_PARAM_PREFIX
+#undef MODULE_PARAM_PREFIX
+#endif
+#define MODULE_PARAM_PREFIX "kfence."
+
+static int param_set_sample_interval(const char *val, const struct kernel_param *kp)
+{
+	unsigned long num;
+	int ret = kstrtoul(val, 0, &num);
+
+	if (ret < 0)
+		return ret;
+
+	if (!num) /* Using 0 to indicate KFENCE is disabled. */
+		WRITE_ONCE(kfence_enabled, false);
+	else if (!READ_ONCE(kfence_enabled) && system_state != SYSTEM_BOOTING)
+		return -EINVAL; /* Cannot (re-)enable KFENCE on-the-fly. */
+
+	*((unsigned long *)kp->arg) = num;
+	return 0;
+}
+
+static int param_get_sample_interval(char *buffer, const struct kernel_param *kp)
+{
+	if (!READ_ONCE(kfence_enabled))
+		return sprintf(buffer, "0\n");
+
+	return param_get_ulong(buffer, kp);
+}
+
+static const struct kernel_param_ops sample_interval_param_ops = {
+	.set = param_set_sample_interval,
+	.get = param_get_sample_interval,
+};
+module_param_cb(sample_interval, &sample_interval_param_ops, &kfence_sample_interval, 0600);
+
+/* The pool of pages used for guard pages and objects. */
+char *__kfence_pool __ro_after_init;
+EXPORT_SYMBOL(__kfence_pool); /* Export for test modules. */
+
+/*
+ * Per-object metadata, with one-to-one mapping of object metadata to
+ * backing pages (in __kfence_pool).
+ */
+static_assert(CONFIG_KFENCE_NUM_OBJECTS > 0);
+struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS];
+
+/* Freelist with available objects. */
+static struct list_head kfence_freelist = LIST_HEAD_INIT(kfence_freelist);
+static DEFINE_RAW_SPINLOCK(kfence_freelist_lock); /* Lock protecting freelist. */
+
+#ifdef CONFIG_KFENCE_STATIC_KEYS
+/* The static key to set up a KFENCE allocation. */
+DEFINE_STATIC_KEY_FALSE(kfence_allocation_key);
+#endif
+
+/* Gates the allocation, ensuring only one succeeds in a given period. */
+atomic_t kfence_allocation_gate = ATOMIC_INIT(1);
+
+/* Statistics counters for debugfs. */
+enum kfence_counter_id {
+	KFENCE_COUNTER_ALLOCATED,
+	KFENCE_COUNTER_ALLOCS,
+	KFENCE_COUNTER_FREES,
+	KFENCE_COUNTER_ZOMBIES,
+	KFENCE_COUNTER_BUGS,
+	KFENCE_COUNTER_COUNT,
+};
+static atomic_long_t counters[KFENCE_COUNTER_COUNT];
+static const char *const counter_names[] = {
+	[KFENCE_COUNTER_ALLOCATED]	= "currently allocated",
+	[KFENCE_COUNTER_ALLOCS]		= "total allocations",
+	[KFENCE_COUNTER_FREES]		= "total frees",
+	[KFENCE_COUNTER_ZOMBIES]	= "zombie allocations",
+	[KFENCE_COUNTER_BUGS]		= "total bugs",
+};
+static_assert(ARRAY_SIZE(counter_names) == KFENCE_COUNTER_COUNT);
+
+/* === Internals ============================================================ */
+
+static bool kfence_protect(unsigned long addr)
+{
+	return !KFENCE_WARN_ON(!kfence_protect_page(ALIGN_DOWN(addr, PAGE_SIZE), true));
+}
+
+static bool kfence_unprotect(unsigned long addr)
+{
+	return !KFENCE_WARN_ON(!kfence_protect_page(ALIGN_DOWN(addr, PAGE_SIZE), false));
+}
+
+static inline struct kfence_metadata *addr_to_metadata(unsigned long addr)
+{
+	long index;
+
+	/* The checks do not affect performance; only called from slow-paths. */
+
+	if (!is_kfence_address((void *)addr))
+		return NULL;
+
+	/*
+	 * May be an invalid index if called with an address at the edge of
+	 * __kfence_pool, in which case we would report an "invalid access"
+	 * error.
+	 */
+	index = (addr - (unsigned long)__kfence_pool) / (PAGE_SIZE * 2) - 1;
+	if (index < 0 || index >= CONFIG_KFENCE_NUM_OBJECTS)
+		return NULL;
+
+	return &kfence_metadata[index];
+}
+
+static inline unsigned long metadata_to_pageaddr(const struct kfence_metadata *meta)
+{
+	unsigned long offset = (meta - kfence_metadata + 1) * PAGE_SIZE * 2;
+	unsigned long pageaddr = (unsigned long)&__kfence_pool[offset];
+
+	/* The checks do not affect performance; only called from slow-paths. */
+
+	/* Only call with a pointer into kfence_metadata. */
+	if (KFENCE_WARN_ON(meta < kfence_metadata ||
+			   meta >= kfence_metadata + CONFIG_KFENCE_NUM_OBJECTS))
+		return 0;
+
+	/*
+	 * This metadata object only ever maps to 1 page; verify that the stored
+	 * address is in the expected range.
+	 */
+	if (KFENCE_WARN_ON(ALIGN_DOWN(meta->addr, PAGE_SIZE) != pageaddr))
+		return 0;
+
+	return pageaddr;
+}
+
+/*
+ * Update the object's metadata state, including updating the alloc/free stacks
+ * depending on the state transition.
+ */
+static noinline void metadata_update_state(struct kfence_metadata *meta,
+					   enum kfence_object_state next)
+{
+	struct kfence_track *track =
+		next == KFENCE_OBJECT_FREED ? &meta->free_track : &meta->alloc_track;
+
+	lockdep_assert_held(&meta->lock);
+
+	/*
+	 * Skip over 1 (this) functions; noinline ensures we do not accidentally
+	 * skip over the caller by never inlining.
+	 */
+	track->num_stack_entries = stack_trace_save(track->stack_entries, KFENCE_STACK_DEPTH, 1);
+	track->pid = task_pid_nr(current);
+
+	/*
+	 * Pairs with READ_ONCE() in
+	 *	kfence_shutdown_cache(),
+	 *	kfence_handle_page_fault().
+	 */
+	WRITE_ONCE(meta->state, next);
+}
+
+/* Write canary byte to @addr. */
+static inline bool set_canary_byte(u8 *addr)
+{
+	*addr = KFENCE_CANARY_PATTERN(addr);
+	return true;
+}
+
+/* Check canary byte at @addr. */
+static inline bool check_canary_byte(u8 *addr)
+{
+	if (likely(*addr == KFENCE_CANARY_PATTERN(addr)))
+		return true;
+
+	atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
+	kfence_report_error((unsigned long)addr, addr_to_metadata((unsigned long)addr),
+			    KFENCE_ERROR_CORRUPTION);
+	return false;
+}
+
+/* __always_inline this to ensure we won't do an indirect call to fn. */
+static __always_inline void for_each_canary(const struct kfence_metadata *meta, bool (*fn)(u8 *))
+{
+	const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE);
+	unsigned long addr;
+
+	lockdep_assert_held(&meta->lock);
+
+	/*
+	 * We'll iterate over each canary byte per-side until fn() returns
+	 * false. However, we'll still iterate over the canary bytes to the
+	 * right of the object even if there was an error in the canary bytes to
+	 * the left of the object. Specifically, if check_canary_byte()
+	 * generates an error, showing both sides might give more clues as to
+	 * what the error is about when displaying which bytes were corrupted.
+	 */
+
+	/* Apply to left of object. */
+	for (addr = pageaddr; addr < meta->addr; addr++) {
+		if (!fn((u8 *)addr))
+			break;
+	}
+
+	/* Apply to right of object. */
+	for (addr = meta->addr + meta->size; addr < pageaddr + PAGE_SIZE; addr++) {
+		if (!fn((u8 *)addr))
+			break;
+	}
+}
+
+static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t gfp)
+{
+	struct kfence_metadata *meta = NULL;
+	unsigned long flags;
+	struct page *page;
+	void *addr;
+
+	/* Try to obtain a free object. */
+	raw_spin_lock_irqsave(&kfence_freelist_lock, flags);
+	if (!list_empty(&kfence_freelist)) {
+		meta = list_entry(kfence_freelist.next, struct kfence_metadata, list);
+		list_del_init(&meta->list);
+	}
+	raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags);
+	if (!meta)
+		return NULL;
+
+	if (unlikely(!raw_spin_trylock_irqsave(&meta->lock, flags))) {
+		/*
+		 * This is extremely unlikely -- we are reporting on a
+		 * use-after-free, which locked meta->lock, and the reporting
+		 * code via printk calls kmalloc() which ends up in
+		 * kfence_alloc() and tries to grab the same object that we're
+		 * reporting on. While it has never been observed, lockdep does
+		 * report that there is a possibility of deadlock. Fix it by
+		 * using trylock and bailing out gracefully.
+		 */
+		raw_spin_lock_irqsave(&kfence_freelist_lock, flags);
+		/* Put the object back on the freelist. */
+		list_add_tail(&meta->list, &kfence_freelist);
+		raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags);
+
+		return NULL;
+	}
+
+	meta->addr = metadata_to_pageaddr(meta);
+	/* Unprotect if we're reusing this page. */
+	if (meta->state == KFENCE_OBJECT_FREED)
+		kfence_unprotect(meta->addr);
+
+	/*
+	 * Note: for allocations made before RNG initialization, will always
+	 * return zero. We still benefit from enabling KFENCE as early as
+	 * possible, even when the RNG is not yet available, as this will allow
+	 * KFENCE to detect bugs due to earlier allocations. The only downside
+	 * is that the out-of-bounds accesses detected are deterministic for
+	 * such allocations.
+	 */
+	if (prandom_u32_max(2)) {
+		/* Allocate on the "right" side, re-calculate address. */
+		meta->addr += PAGE_SIZE - size;
+		meta->addr = ALIGN_DOWN(meta->addr, cache->align);
+	}
+
+	addr = (void *)meta->addr;
+
+	/* Update remaining metadata. */
+	metadata_update_state(meta, KFENCE_OBJECT_ALLOCATED);
+	/* Pairs with READ_ONCE() in kfence_shutdown_cache(). */
+	WRITE_ONCE(meta->cache, cache);
+	meta->size = size;
+	for_each_canary(meta, set_canary_byte);
+
+	/* Set required struct page fields. */
+	page = virt_to_page(meta->addr);
+	page->slab_cache = cache;
+
+	raw_spin_unlock_irqrestore(&meta->lock, flags);
+
+	/* Memory initialization. */
+
+	/*
+	 * We check slab_want_init_on_alloc() ourselves, rather than letting
+	 * SL*B do the initialization, as otherwise we might overwrite KFENCE's
+	 * redzone.
+	 */
+	if (unlikely(slab_want_init_on_alloc(gfp, cache)))
+		memzero_explicit(addr, size);
+	if (cache->ctor)
+		cache->ctor(addr);
+
+	if (CONFIG_KFENCE_STRESS_TEST_FAULTS && !prandom_u32_max(CONFIG_KFENCE_STRESS_TEST_FAULTS))
+		kfence_protect(meta->addr); /* Random "faults" by protecting the object. */
+
+	atomic_long_inc(&counters[KFENCE_COUNTER_ALLOCATED]);
+	atomic_long_inc(&counters[KFENCE_COUNTER_ALLOCS]);
+
+	return addr;
+}
+
+static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool zombie)
+{
+	struct kcsan_scoped_access assert_page_exclusive;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&meta->lock, flags);
+
+	if (meta->state != KFENCE_OBJECT_ALLOCATED || meta->addr != (unsigned long)addr) {
+		/* Invalid or double-free, bail out. */
+		atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
+		kfence_report_error((unsigned long)addr, meta, KFENCE_ERROR_INVALID_FREE);
+		raw_spin_unlock_irqrestore(&meta->lock, flags);
+		return;
+	}
+
+	/* Detect racy use-after-free, or incorrect reallocation of this page by KFENCE. */
+	kcsan_begin_scoped_access((void *)ALIGN_DOWN((unsigned long)addr, PAGE_SIZE), PAGE_SIZE,
+				  KCSAN_ACCESS_SCOPED | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ASSERT,
+				  &assert_page_exclusive);
+
+	if (CONFIG_KFENCE_STRESS_TEST_FAULTS)
+		kfence_unprotect((unsigned long)addr); /* To check canary bytes. */
+
+	/* Restore page protection if there was an OOB access. */
+	if (meta->unprotected_page) {
+		kfence_protect(meta->unprotected_page);
+		meta->unprotected_page = 0;
+	}
+
+	/* Check canary bytes for memory corruption. */
+	for_each_canary(meta, check_canary_byte);
+
+	/*
+	 * Clear memory if init-on-free is set. While we protect the page, the
+	 * data is still there, and after a use-after-free is detected, we
+	 * unprotect the page, so the data is still accessible.
+	 */
+	if (!zombie && unlikely(slab_want_init_on_free(meta->cache)))
+		memzero_explicit(addr, meta->size);
+
+	/* Mark the object as freed. */
+	metadata_update_state(meta, KFENCE_OBJECT_FREED);
+
+	raw_spin_unlock_irqrestore(&meta->lock, flags);
+
+	/* Protect to detect use-after-frees. */
+	kfence_protect((unsigned long)addr);
+
+	kcsan_end_scoped_access(&assert_page_exclusive);
+	if (!zombie) {
+		/* Add it to the tail of the freelist for reuse. */
+		raw_spin_lock_irqsave(&kfence_freelist_lock, flags);
+		KFENCE_WARN_ON(!list_empty(&meta->list));
+		list_add_tail(&meta->list, &kfence_freelist);
+		raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags);
+
+		atomic_long_dec(&counters[KFENCE_COUNTER_ALLOCATED]);
+		atomic_long_inc(&counters[KFENCE_COUNTER_FREES]);
+	} else {
+		/* See kfence_shutdown_cache(). */
+		atomic_long_inc(&counters[KFENCE_COUNTER_ZOMBIES]);
+	}
+}
+
+static void rcu_guarded_free(struct rcu_head *h)
+{
+	struct kfence_metadata *meta = container_of(h, struct kfence_metadata, rcu_head);
+
+	kfence_guarded_free((void *)meta->addr, meta, false);
+}
+
+static bool __init kfence_init_pool(void)
+{
+	unsigned long addr = (unsigned long)__kfence_pool;
+	struct page *pages;
+	int i;
+
+	if (!__kfence_pool)
+		return false;
+
+	if (!arch_kfence_init_pool())
+		goto err;
+
+	pages = virt_to_page(addr);
+
+	/*
+	 * Set up object pages: they must have PG_slab set, to avoid freeing
+	 * these as real pages.
+	 *
+	 * We also want to avoid inserting kfence_free() in the kfree()
+	 * fast-path in SLUB, and therefore need to ensure kfree() correctly
+	 * enters __slab_free() slow-path.
+	 */
+	for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
+		if (!i || (i % 2))
+			continue;
+
+		/* Verify we do not have a compound head page. */
+		if (WARN_ON(compound_head(&pages[i]) != &pages[i]))
+			goto err;
+
+		__SetPageSlab(&pages[i]);
+	}
+
+	/*
+	 * Protect the first 2 pages. The first page is mostly unnecessary, and
+	 * merely serves as an extended guard page. However, adding one
+	 * additional page in the beginning gives us an even number of pages,
+	 * which simplifies the mapping of address to metadata index.
+	 */
+	for (i = 0; i < 2; i++) {
+		if (unlikely(!kfence_protect(addr)))
+			goto err;
+
+		addr += PAGE_SIZE;
+	}
+
+	for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) {
+		struct kfence_metadata *meta = &kfence_metadata[i];
+
+		/* Initialize metadata. */
+		INIT_LIST_HEAD(&meta->list);
+		raw_spin_lock_init(&meta->lock);
+		meta->state = KFENCE_OBJECT_UNUSED;
+		meta->addr = addr; /* Initialize for validation in metadata_to_pageaddr(). */
+		list_add_tail(&meta->list, &kfence_freelist);
+
+		/* Protect the right redzone. */
+		if (unlikely(!kfence_protect(addr + PAGE_SIZE)))
+			goto err;
+
+		addr += 2 * PAGE_SIZE;
+	}
+
+	return true;
+
+err:
+	/*
+	 * Only release unprotected pages, and do not try to go back and change
+	 * page attributes due to risk of failing to do so as well. If changing
+	 * page attributes for some pages fails, it is very likely that it also
+	 * fails for the first page, and therefore expect addr==__kfence_pool in
+	 * most failure cases.
+	 */
+	memblock_free_late(__pa(addr), KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool));
+	__kfence_pool = NULL;
+	return false;
+}
+
+/* === DebugFS Interface ==================================================== */
+
+static int stats_show(struct seq_file *seq, void *v)
+{
+	int i;
+
+	seq_printf(seq, "enabled: %i\n", READ_ONCE(kfence_enabled));
+	for (i = 0; i < KFENCE_COUNTER_COUNT; i++)
+		seq_printf(seq, "%s: %ld\n", counter_names[i], atomic_long_read(&counters[i]));
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(stats);
+
+/*
+ * debugfs seq_file operations for /sys/kernel/debug/kfence/objects.
+ * start_object() and next_object() return the object index + 1, because NULL is used
+ * to stop iteration.
+ */
+static void *start_object(struct seq_file *seq, loff_t *pos)
+{
+	if (*pos < CONFIG_KFENCE_NUM_OBJECTS)
+		return (void *)((long)*pos + 1);
+	return NULL;
+}
+
+static void stop_object(struct seq_file *seq, void *v)
+{
+}
+
+static void *next_object(struct seq_file *seq, void *v, loff_t *pos)
+{
+	++*pos;
+	if (*pos < CONFIG_KFENCE_NUM_OBJECTS)
+		return (void *)((long)*pos + 1);
+	return NULL;
+}
+
+static int show_object(struct seq_file *seq, void *v)
+{
+	struct kfence_metadata *meta = &kfence_metadata[(long)v - 1];
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&meta->lock, flags);
+	kfence_print_object(seq, meta);
+	raw_spin_unlock_irqrestore(&meta->lock, flags);
+	seq_puts(seq, "---------------------------------\n");
+
+	return 0;
+}
+
+static const struct seq_operations object_seqops = {
+	.start = start_object,
+	.next = next_object,
+	.stop = stop_object,
+	.show = show_object,
+};
+
+static int open_objects(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &object_seqops);
+}
+
+static const struct file_operations objects_fops = {
+	.open = open_objects,
+	.read = seq_read,
+	.llseek = seq_lseek,
+};
+
+static int __init kfence_debugfs_init(void)
+{
+	struct dentry *kfence_dir = debugfs_create_dir("kfence", NULL);
+
+	debugfs_create_file("stats", 0444, kfence_dir, NULL, &stats_fops);
+	debugfs_create_file("objects", 0400, kfence_dir, NULL, &objects_fops);
+	return 0;
+}
+
+late_initcall(kfence_debugfs_init);
+
+/* === Allocation Gate Timer ================================================ */
+
+/*
+ * Set up delayed work, which will enable and disable the static key. We need to
+ * use a work queue (rather than a simple timer), since enabling and disabling a
+ * static key cannot be done from an interrupt.
+ *
+ * Note: Toggling a static branch currently causes IPIs, and here we'll end up
+ * with a total of 2 IPIs to all CPUs. If this ends up a problem in future (with
+ * more aggressive sampling intervals), we could get away with a variant that
+ * avoids IPIs, at the cost of not immediately capturing allocations if the
+ * instructions remain cached.
+ */
+static struct delayed_work kfence_timer;
+static void toggle_allocation_gate(struct work_struct *work)
+{
+	if (!READ_ONCE(kfence_enabled))
+		return;
+
+	/* Enable static key, and await allocation to happen. */
+	atomic_set(&kfence_allocation_gate, 0);
+#ifdef CONFIG_KFENCE_STATIC_KEYS
+	static_branch_enable(&kfence_allocation_key);
+	/*
+	 * Await an allocation. Timeout after 1 second, in case the kernel stops
+	 * doing allocations, to avoid stalling this worker task for too long.
+	 */
+	{
+		unsigned long end_wait = jiffies + HZ;
+
+		do {
+			set_current_state(TASK_UNINTERRUPTIBLE);
+			if (atomic_read(&kfence_allocation_gate) != 0)
+				break;
+			schedule_timeout(1);
+		} while (time_before(jiffies, end_wait));
+		__set_current_state(TASK_RUNNING);
+	}
+	/* Disable static key and reset timer. */
+	static_branch_disable(&kfence_allocation_key);
+#endif
+	schedule_delayed_work(&kfence_timer, msecs_to_jiffies(kfence_sample_interval));
+}
+static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate);
+
+/* === Public interface ===================================================== */
+
+void __init kfence_alloc_pool(void)
+{
+	if (!kfence_sample_interval)
+		return;
+
+	__kfence_pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
+
+	if (!__kfence_pool)
+		pr_err("failed to allocate pool\n");
+}
+
+void __init kfence_init(void)
+{
+	/* Setting kfence_sample_interval to 0 on boot disables KFENCE. */
+	if (!kfence_sample_interval)
+		return;
+
+	if (!kfence_init_pool()) {
+		pr_err("%s failed\n", __func__);
+		return;
+	}
+
+	WRITE_ONCE(kfence_enabled, true);
+	schedule_delayed_work(&kfence_timer, 0);
+	pr_info("initialized - using %lu bytes for %d objects", KFENCE_POOL_SIZE,
+		CONFIG_KFENCE_NUM_OBJECTS);
+	if (IS_ENABLED(CONFIG_DEBUG_KERNEL))
+		pr_cont(" at 0x%px-0x%px\n", (void *)__kfence_pool,
+			(void *)(__kfence_pool + KFENCE_POOL_SIZE));
+	else
+		pr_cont("\n");
+}
+
+void kfence_shutdown_cache(struct kmem_cache *s)
+{
+	unsigned long flags;
+	struct kfence_metadata *meta;
+	int i;
+
+	for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) {
+		bool in_use;
+
+		meta = &kfence_metadata[i];
+
+		/*
+		 * If we observe some inconsistent cache and state pair where we
+		 * should have returned false here, cache destruction is racing
+		 * with either kmem_cache_alloc() or kmem_cache_free(). Taking
+		 * the lock will not help, as different critical section
+		 * serialization will have the same outcome.
+		 */
+		if (READ_ONCE(meta->cache) != s ||
+		    READ_ONCE(meta->state) != KFENCE_OBJECT_ALLOCATED)
+			continue;
+
+		raw_spin_lock_irqsave(&meta->lock, flags);
+		in_use = meta->cache == s && meta->state == KFENCE_OBJECT_ALLOCATED;
+		raw_spin_unlock_irqrestore(&meta->lock, flags);
+
+		if (in_use) {
+			/*
+			 * This cache still has allocations, and we should not
+			 * release them back into the freelist so they can still
+			 * safely be used and retain the kernel's default
+			 * behaviour of keeping the allocations alive (leak the
+			 * cache); however, they effectively become "zombie
+			 * allocations" as the KFENCE objects are the only ones
+			 * still in use and the owning cache is being destroyed.
+			 *
+			 * We mark them freed, so that any subsequent use shows
+			 * more useful error messages that will include stack
+			 * traces of the user of the object, the original
+			 * allocation, and caller to shutdown_cache().
+			 */
+			kfence_guarded_free((void *)meta->addr, meta, /*zombie=*/true);
+		}
+	}
+
+	for (i = 0; i < CONFIG_KFENCE_NUM_OBJECTS; i++) {
+		meta = &kfence_metadata[i];
+
+		/* See above. */
+		if (READ_ONCE(meta->cache) != s || READ_ONCE(meta->state) != KFENCE_OBJECT_FREED)
+			continue;
+
+		raw_spin_lock_irqsave(&meta->lock, flags);
+		if (meta->cache == s && meta->state == KFENCE_OBJECT_FREED)
+			meta->cache = NULL;
+		raw_spin_unlock_irqrestore(&meta->lock, flags);
+	}
+}
+
+void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
+{
+	/*
+	 * allocation_gate only needs to become non-zero, so it doesn't make
+	 * sense to continue writing to it and pay the associated contention
+	 * cost, in case we have a large number of concurrent allocations.
+	 */
+	if (atomic_read(&kfence_allocation_gate) || atomic_inc_return(&kfence_allocation_gate) > 1)
+		return NULL;
+
+	if (!READ_ONCE(kfence_enabled))
+		return NULL;
+
+	if (size > PAGE_SIZE)
+		return NULL;
+
+	return kfence_guarded_alloc(s, size, flags);
+}
+
+size_t kfence_ksize(const void *addr)
+{
+	const struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr);
+
+	/*
+	 * Read locklessly -- if there is a race with __kfence_alloc(), this is
+	 * either a use-after-free or invalid access.
+	 */
+	return meta ? meta->size : 0;
+}
+
+void *kfence_object_start(const void *addr)
+{
+	const struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr);
+
+	/*
+	 * Read locklessly -- if there is a race with __kfence_alloc(), this is
+	 * either a use-after-free or invalid access.
+	 */
+	return meta ? (void *)meta->addr : NULL;
+}
+
+void __kfence_free(void *addr)
+{
+	struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr);
+
+	/*
+	 * If the objects of the cache are SLAB_TYPESAFE_BY_RCU, defer freeing
+	 * the object, as the object page may be recycled for other-typed
+	 * objects once it has been freed. meta->cache may be NULL if the cache
+	 * was destroyed.
+	 */
+	if (unlikely(meta->cache && (meta->cache->flags & SLAB_TYPESAFE_BY_RCU)))
+		call_rcu(&meta->rcu_head, rcu_guarded_free);
+	else
+		kfence_guarded_free(addr, meta, false);
+}
+
+bool kfence_handle_page_fault(unsigned long addr)
+{
+	const int page_index = (addr - (unsigned long)__kfence_pool) / PAGE_SIZE;
+	struct kfence_metadata *to_report = NULL;
+	enum kfence_error_type error_type;
+	unsigned long flags;
+
+	if (!is_kfence_address((void *)addr))
+		return false;
+
+	if (!READ_ONCE(kfence_enabled)) /* If disabled at runtime ... */
+		return kfence_unprotect(addr); /* ... unprotect and proceed. */
+
+	atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
+
+	if (page_index % 2) {
+		/* This is a redzone, report a buffer overflow. */
+		struct kfence_metadata *meta;
+		int distance = 0;
+
+		meta = addr_to_metadata(addr - PAGE_SIZE);
+		if (meta && READ_ONCE(meta->state) == KFENCE_OBJECT_ALLOCATED) {
+			to_report = meta;
+			/* Data race ok; distance calculation approximate. */
+			distance = addr - data_race(meta->addr + meta->size);
+		}
+
+		meta = addr_to_metadata(addr + PAGE_SIZE);
+		if (meta && READ_ONCE(meta->state) == KFENCE_OBJECT_ALLOCATED) {
+			/* Data race ok; distance calculation approximate. */
+			if (!to_report || distance > data_race(meta->addr) - addr)
+				to_report = meta;
+		}
+
+		if (!to_report)
+			goto out;
+
+		raw_spin_lock_irqsave(&to_report->lock, flags);
+		to_report->unprotected_page = addr;
+		error_type = KFENCE_ERROR_OOB;
+
+		/*
+		 * If the object was freed before we took the lock, we can still
+		 * report this as an OOB -- the report will simply show the
+		 * stacktrace of the free as well.
+		 */
+	} else {
+		to_report = addr_to_metadata(addr);
+		if (!to_report)
+			goto out;
+
+		raw_spin_lock_irqsave(&to_report->lock, flags);
+		error_type = KFENCE_ERROR_UAF;
+		/*
+		 * We may race with __kfence_alloc(), and it is possible that a
+		 * freed object may be reallocated. We simply report this as a
+		 * use-after-free, with the stack trace showing the place where
+		 * the object was re-allocated.
+		 */
+	}
+
+out:
+	if (to_report) {
+		kfence_report_error(addr, to_report, error_type);
+		raw_spin_unlock_irqrestore(&to_report->lock, flags);
+	} else {
+		/* This may be a UAF or OOB access, but we can't be sure. */
+		kfence_report_error(addr, NULL, KFENCE_ERROR_INVALID);
+	}
+
+	return kfence_unprotect(addr); /* Unprotect and let access proceed. */
+}
--- /dev/null
+++ a/mm/kfence/kfence.h
@@ -0,0 +1,113 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Kernel Electric-Fence (KFENCE). For more info please see
+ * Documentation/dev-tools/kfence.rst.
+ *
+ * Copyright (C) 2020, Google LLC.
+ */
+
+#ifndef MM_KFENCE_KFENCE_H
+#define MM_KFENCE_KFENCE_H
+
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+#include "../slab.h" /* for struct kmem_cache */
+
+/* For non-debug builds, avoid leaking kernel pointers into dmesg. */
+#ifdef CONFIG_DEBUG_KERNEL
+#define PTR_FMT "%px"
+#else
+#define PTR_FMT "%p"
+#endif
+
+/*
+ * Get the canary byte pattern for @addr. Use a pattern that varies based on the
+ * lower 3 bits of the address, so that corruptions overwriting the canary
+ * with similar constant values are still detected with high probability.
+ */
+#define KFENCE_CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7))
+
+/* Maximum stack depth for reports. */
+#define KFENCE_STACK_DEPTH 64
+
+/* KFENCE object states. */
+enum kfence_object_state {
+	KFENCE_OBJECT_UNUSED,		/* Object is unused. */
+	KFENCE_OBJECT_ALLOCATED,	/* Object is currently allocated. */
+	KFENCE_OBJECT_FREED,		/* Object was allocated, and then freed. */
+};
+
+/* Alloc/free tracking information. */
+struct kfence_track {
+	pid_t pid;
+	int num_stack_entries;
+	unsigned long stack_entries[KFENCE_STACK_DEPTH];
+};
+
+/* KFENCE metadata per guarded allocation. */
+struct kfence_metadata {
+	struct list_head list;		/* Freelist node; access under kfence_freelist_lock. */
+	struct rcu_head rcu_head;	/* For delayed freeing. */
+
+	/*
+	 * Lock protecting below data; to ensure consistency of the below data,
+	 * since the following may execute concurrently: __kfence_alloc(),
+	 * __kfence_free(), kfence_handle_page_fault(). However, note that we
+	 * cannot grab the same metadata off the freelist twice, and multiple
+	 * __kfence_alloc() cannot run concurrently on the same metadata.
+	 */
+	raw_spinlock_t lock;
+
+	/* The current state of the object; see above. */
+	enum kfence_object_state state;
+
+	/*
+	 * Allocated object address; cannot be calculated from size, because of
+	 * alignment requirements.
+	 *
+	 * Invariant: ALIGN_DOWN(addr, PAGE_SIZE) is constant.
+	 */
+	unsigned long addr;
+
+	/*
+	 * The size of the original allocation.
+	 */
+	size_t size;
+
+	/*
+	 * The kmem_cache cache of the last allocation; NULL if never allocated
+	 * or the cache has already been destroyed.
+	 */
+	struct kmem_cache *cache;
+
+	/*
+	 * In case of an invalid access, the page that was unprotected; we
+	 * optimistically only store one address.
+	 */
+	unsigned long unprotected_page;
+
+	/* Allocation and free stack information. */
+	struct kfence_track alloc_track;
+	struct kfence_track free_track;
+};
+
+extern struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS];
+
+/* KFENCE error types for report generation. */
+enum kfence_error_type {
+	KFENCE_ERROR_OOB,		/* Detected an out-of-bounds access. */
+	KFENCE_ERROR_UAF,		/* Detected a use-after-free access. */
+	KFENCE_ERROR_CORRUPTION,	/* Detected a memory corruption on free. */
+	KFENCE_ERROR_INVALID,		/* Invalid access of unknown type. */
+	KFENCE_ERROR_INVALID_FREE,	/* Invalid free. */
+};
+
+void kfence_report_error(unsigned long address, const struct kfence_metadata *meta,
+			 enum kfence_error_type type);
+
+void kfence_print_object(struct seq_file *seq, const struct kfence_metadata *meta);
+
+#endif /* MM_KFENCE_KFENCE_H */
--- /dev/null
+++ a/mm/kfence/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_KFENCE) := core.o report.o
--- /dev/null
+++ a/mm/kfence/report.c
@@ -0,0 +1,240 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KFENCE reporting.
+ *
+ * Copyright (C) 2020, Google LLC.
+ */
+
+#include <stdarg.h>
+
+#include <linux/kernel.h>
+#include <linux/lockdep.h>
+#include <linux/printk.h>
+#include <linux/seq_file.h>
+#include <linux/stacktrace.h>
+#include <linux/string.h>
+
+#include <asm/kfence.h>
+
+#include "kfence.h"
+
+/* Helper function to either print to a seq_file or to console. */
+__printf(2, 3)
+static void seq_con_printf(struct seq_file *seq, const char *fmt, ...)
+{
+	va_list args;
+
+	va_start(args, fmt);
+	if (seq)
+		seq_vprintf(seq, fmt, args);
+	else
+		vprintk(fmt, args);
+	va_end(args);
+}
+
+/*
+ * Get the number of stack entries to skip to get out of MM internals. @type is
+ * optional, and if set to NULL, assumes an allocation or free stack.
+ */
+static int get_stack_skipnr(const unsigned long stack_entries[], int num_entries,
+			    const enum kfence_error_type *type)
+{
+	char buf[64];
+	int skipnr, fallback = 0;
+	bool is_access_fault = false;
+
+	if (type) {
+		/* Depending on error type, find different stack entries. */
+		switch (*type) {
+		case KFENCE_ERROR_UAF:
+		case KFENCE_ERROR_OOB:
+		case KFENCE_ERROR_INVALID:
+			is_access_fault = true;
+			break;
+		case KFENCE_ERROR_CORRUPTION:
+		case KFENCE_ERROR_INVALID_FREE:
+			break;
+		}
+	}
+
+	for (skipnr = 0; skipnr < num_entries; skipnr++) {
+		int len = scnprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skipnr]);
+
+		if (is_access_fault) {
+			if (!strncmp(buf, KFENCE_SKIP_ARCH_FAULT_HANDLER, len))
+				goto found;
+		} else {
+			if (str_has_prefix(buf, "kfence_") || str_has_prefix(buf, "__kfence_") ||
+			    !strncmp(buf, "__slab_free", len)) {
+				/*
+				 * In case of tail calls from any of the below
+				 * to any of the above.
+				 */
+				fallback = skipnr + 1;
+			}
+
+			/* Also the *_bulk() variants by only checking prefixes. */
+			if (str_has_prefix(buf, "kfree") ||
+			    str_has_prefix(buf, "kmem_cache_free") ||
+			    str_has_prefix(buf, "__kmalloc") ||
+			    str_has_prefix(buf, "kmem_cache_alloc"))
+				goto found;
+		}
+	}
+	if (fallback < num_entries)
+		return fallback;
+found:
+	skipnr++;
+	return skipnr < num_entries ? skipnr : 0;
+}
+
+static void kfence_print_stack(struct seq_file *seq, const struct kfence_metadata *meta,
+			       bool show_alloc)
+{
+	const struct kfence_track *track = show_alloc ? &meta->alloc_track : &meta->free_track;
+
+	if (track->num_stack_entries) {
+		/* Skip allocation/free internals stack. */
+		int i = get_stack_skipnr(track->stack_entries, track->num_stack_entries, NULL);
+
+		/* stack_trace_seq_print() does not exist; open code our own. */
+		for (; i < track->num_stack_entries; i++)
+			seq_con_printf(seq, " %pS\n", (void *)track->stack_entries[i]);
+	} else {
+		seq_con_printf(seq, " no %s stack\n", show_alloc ? "allocation" : "deallocation");
+	}
+}
+
+void kfence_print_object(struct seq_file *seq, const struct kfence_metadata *meta)
+{
+	const int size = abs(meta->size);
+	const unsigned long start = meta->addr;
+	const struct kmem_cache *const cache = meta->cache;
+
+	lockdep_assert_held(&meta->lock);
+
+	if (meta->state == KFENCE_OBJECT_UNUSED) {
+		seq_con_printf(seq, "kfence-#%zd unused\n", meta - kfence_metadata);
+		return;
+	}
+
+	seq_con_printf(seq,
+		       "kfence-#%zd [0x" PTR_FMT "-0x" PTR_FMT
+		       ", size=%d, cache=%s] allocated by task %d:\n",
+		       meta - kfence_metadata, (void *)start, (void *)(start + size - 1), size,
+		       (cache && cache->name) ? cache->name : "<destroyed>", meta->alloc_track.pid);
+	kfence_print_stack(seq, meta, true);
+
+	if (meta->state == KFENCE_OBJECT_FREED) {
+		seq_con_printf(seq, "\nfreed by task %d:\n", meta->free_track.pid);
+		kfence_print_stack(seq, meta, false);
+	}
+}
+
+/*
+ * Show bytes at @address that differ from the expected canary values, up to
+ * @bytes_to_show.
+ */
+static void print_diff_canary(unsigned long address, size_t bytes_to_show,
+			      const struct kfence_metadata *meta)
+{
+	const unsigned long show_until_addr = address + bytes_to_show;
+	const u8 *cur, *end;
+
+	/* Do not show contents of object nor read into following guard page. */
+	end = (const u8 *)(address < meta->addr ? min(show_until_addr, meta->addr)
+						: min(show_until_addr, PAGE_ALIGN(address)));
+
+	pr_cont("[");
+	for (cur = (const u8 *)address; cur < end; cur++) {
+		if (*cur == KFENCE_CANARY_PATTERN(cur))
+			pr_cont(" .");
+		else if (IS_ENABLED(CONFIG_DEBUG_KERNEL))
+			pr_cont(" 0x%02x", *cur);
+		else /* Do not leak kernel memory in non-debug builds. */
+			pr_cont(" !");
+	}
+	pr_cont(" ]");
+}
+
+void kfence_report_error(unsigned long address, const struct kfence_metadata *meta,
+			 enum kfence_error_type type)
+{
+	unsigned long stack_entries[KFENCE_STACK_DEPTH] = { 0 };
+	int num_stack_entries = stack_trace_save(stack_entries, KFENCE_STACK_DEPTH, 1);
+	int skipnr = get_stack_skipnr(stack_entries, num_stack_entries, &type);
+	const ptrdiff_t object_index = meta ? meta - kfence_metadata : -1;
+
+	/* Require non-NULL meta, except if KFENCE_ERROR_INVALID. */
+	if (WARN_ON(type != KFENCE_ERROR_INVALID && !meta))
+		return;
+
+	if (meta)
+		lockdep_assert_held(&meta->lock);
+	/*
+	 * Because we may generate reports in printk-unfriendly parts of the
+	 * kernel, such as scheduler code, the use of printk() could deadlock.
+	 * Until such time that all printing code here is safe in all parts of
+	 * the kernel, accept the risk, and just get our message out (given the
+	 * system might already behave unpredictably due to the memory error).
+	 * As such, also disable lockdep to hide warnings, and avoid disabling
+	 * lockdep for the rest of the kernel.
+	 */
+	lockdep_off();
+
+	pr_err("==================================================================\n");
+	/* Print report header. */
+	switch (type) {
+	case KFENCE_ERROR_OOB: {
+		const bool left_of_object = address < meta->addr;
+
+		pr_err("BUG: KFENCE: out-of-bounds in %pS\n\n", (void *)stack_entries[skipnr]);
+		pr_err("Out-of-bounds access at 0x" PTR_FMT " (%luB %s of kfence-#%zd):\n",
+		       (void *)address,
+		       left_of_object ? meta->addr - address : address - meta->addr,
+		       left_of_object ? "left" : "right", object_index);
+		break;
+	}
+	case KFENCE_ERROR_UAF:
+		pr_err("BUG: KFENCE: use-after-free in %pS\n\n", (void *)stack_entries[skipnr]);
+		pr_err("Use-after-free access at 0x" PTR_FMT " (in kfence-#%zd):\n",
+		       (void *)address, object_index);
+		break;
+	case KFENCE_ERROR_CORRUPTION:
+		pr_err("BUG: KFENCE: memory corruption in %pS\n\n", (void *)stack_entries[skipnr]);
+		pr_err("Corrupted memory at 0x" PTR_FMT " ", (void *)address);
+		print_diff_canary(address, 16, meta);
+		pr_cont(" (in kfence-#%zd):\n", object_index);
+		break;
+	case KFENCE_ERROR_INVALID:
+		pr_err("BUG: KFENCE: invalid access in %pS\n\n", (void *)stack_entries[skipnr]);
+		pr_err("Invalid access at 0x" PTR_FMT ":\n", (void *)address);
+		break;
+	case KFENCE_ERROR_INVALID_FREE:
+		pr_err("BUG: KFENCE: invalid free in %pS\n\n", (void *)stack_entries[skipnr]);
+		pr_err("Invalid free of 0x" PTR_FMT " (in kfence-#%zd):\n", (void *)address,
+		       object_index);
+		break;
+	}
+
+	/* Print stack trace and object info. */
+	stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr, 0);
+
+	if (meta) {
+		pr_err("\n");
+		kfence_print_object(NULL, meta);
+	}
+
+	/* Print report footer. */
+	pr_err("\n");
+	dump_stack_print_info(KERN_ERR);
+	pr_err("==================================================================\n");
+
+	lockdep_on();
+
+	if (panic_on_warn)
+		panic("panic_on_warn set ...\n");
+
+	/* We encountered a memory unsafety error, taint the kernel! */
+	add_taint(TAINT_BAD_PAGE, LOCKDEP_STILL_OK);
+}
--- a/mm/Makefile~mm-add-kernel-electric-fence-infrastructure
+++ a/mm/Makefile
@@ -81,6 +81,7 @@ obj-$(CONFIG_PAGE_POISONING) += page_poi
 obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_SLUB) += slub.o
 obj-$(CONFIG_KASAN)	+= kasan/
+obj-$(CONFIG_KFENCE) += kfence/
 obj-$(CONFIG_FAILSLAB) += failslab.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 obj-$(CONFIG_MEMTEST)		+= memtest.o
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 057/118] x86, kfence: enable KFENCE for x86
  2021-02-26  1:14 incoming Andrew Morton
                   ` (55 preceding siblings ...)
  2021-02-26  1:18 ` [patch 056/118] mm: add Kernel Electric-Fence infrastructure Andrew Morton
@ 2021-02-26  1:18 ` Andrew Morton
  2021-02-26  1:19 ` [patch 058/118] arm64, kfence: enable KFENCE for ARM64 Andrew Morton
                   ` (61 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:18 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, bp, catalin.marinas, cl, corbet,
	dave.hansen, dvyukov, edumazet, elver, glider, gregkh, hdanton,
	hpa, iamjoonsoo.kim, jannh, joern, keescook, linux-mm, luto,
	mark.rutland, mingo, mm-commits, paulmck, penberg, peterz,
	rientjes, sjpark, tglx, torvalds, vbabka, will

From: Alexander Potapenko <glider@google.com>
Subject: x86, kfence: enable KFENCE for x86

Add architecture specific implementation details for KFENCE and enable
KFENCE for the x86 architecture. In particular, this implements the
required interface in <asm/kfence.h> for setting up the pool and
providing helper functions for protecting and unprotecting pages.

For x86, we need to ensure that the pool uses 4K pages, which is done
using the set_memory_4k() helper function.
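
Condensed from the diff below (not additional code), protecting or
unprotecting a pool page on x86 amounts to toggling the present bit of that
page's PTE and flushing only the local TLB, since IPIs must be avoided here:

	if (protect)
		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
	else
		set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));

	/* Best-effort: flush only this CPU's TLB; lazy faulting covers the rest. */
	flush_tlb_one_kernel(addr);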

[elver@google.com: add missing copyright and description header]
  Link: https://lkml.kernel.org/r/20210118092159.145934-2-elver@google.com
Link: https://lkml.kernel.org/r/20201103175841.3495947-3-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Marco Elver <elver@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/Kconfig              |    1 
 arch/x86/include/asm/kfence.h |   70 ++++++++++++++++++++++++++++++++
 arch/x86/mm/fault.c           |    5 ++
 3 files changed, 76 insertions(+)

--- /dev/null
+++ a/arch/x86/include/asm/kfence.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * x86 KFENCE support.
+ *
+ * Copyright (C) 2020, Google LLC.
+ */
+
+#ifndef _ASM_X86_KFENCE_H
+#define _ASM_X86_KFENCE_H
+
+#include <linux/bug.h>
+#include <linux/kfence.h>
+
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+#include <asm/set_memory.h>
+#include <asm/tlbflush.h>
+
+/*
+ * The page fault handler entry function, up to which the stack trace is
+ * truncated in reports.
+ */
+#define KFENCE_SKIP_ARCH_FAULT_HANDLER "asm_exc_page_fault"
+
+/* Force 4K pages for __kfence_pool. */
+static inline bool arch_kfence_init_pool(void)
+{
+	unsigned long addr;
+
+	for (addr = (unsigned long)__kfence_pool; is_kfence_address((void *)addr);
+	     addr += PAGE_SIZE) {
+		unsigned int level;
+
+		if (!lookup_address(addr, &level))
+			return false;
+
+		if (level != PG_LEVEL_4K)
+			set_memory_4k(addr, 1);
+	}
+
+	return true;
+}
+
+/* Protect the given page and flush TLB. */
+static inline bool kfence_protect_page(unsigned long addr, bool protect)
+{
+	unsigned int level;
+	pte_t *pte = lookup_address(addr, &level);
+
+	if (WARN_ON(!pte || level != PG_LEVEL_4K))
+		return false;
+
+	/*
+	 * We need to avoid IPIs, as we may get KFENCE allocations or faults
+	 * with interrupts disabled. Therefore, the below is best-effort, and
+	 * does not flush TLBs on all CPUs. We can tolerate some inaccuracy;
+	 * lazy fault handling takes care of faults after the page is PRESENT.
+	 */
+
+	if (protect)
+		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
+	else
+		set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
+
+	/* Flush this CPU's TLB. */
+	flush_tlb_one_kernel(addr);
+	return true;
+}
+
+#endif /* _ASM_X86_KFENCE_H */
--- a/arch/x86/Kconfig~x86-kfence-enable-kfence-for-x86
+++ a/arch/x86/Kconfig
@@ -151,6 +151,7 @@ config X86
 	select HAVE_ARCH_JUMP_LABEL_RELATIVE
 	select HAVE_ARCH_KASAN			if X86_64
 	select HAVE_ARCH_KASAN_VMALLOC		if X86_64
+	select HAVE_ARCH_KFENCE
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_MMAP_RND_BITS		if MMU
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if MMU && COMPAT
--- a/arch/x86/mm/fault.c~x86-kfence-enable-kfence-for-x86
+++ a/arch/x86/mm/fault.c
@@ -9,6 +9,7 @@
 #include <linux/kdebug.h>		/* oops_begin/end, ...		*/
 #include <linux/extable.h>		/* search_exception_tables	*/
 #include <linux/memblock.h>		/* max_low_pfn			*/
+#include <linux/kfence.h>		/* kfence_handle_page_fault	*/
 #include <linux/kprobes.h>		/* NOKPROBE_SYMBOL, ...		*/
 #include <linux/mmiotrace.h>		/* kmmio_handler, ...		*/
 #include <linux/perf_event.h>		/* perf_sw_event		*/
@@ -680,6 +681,10 @@ page_fault_oops(struct pt_regs *regs, un
 	if (IS_ENABLED(CONFIG_EFI))
 		efi_crash_gracefully_on_page_fault(address);
 
+	/* Only not-present faults should be handled by KFENCE. */
+	if (!(error_code & X86_PF_PROT) && kfence_handle_page_fault(address))
+		return;
+
 oops:
 	/*
 	 * Oops. The kernel tried to access some bad page. We'll have to
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 058/118] arm64, kfence: enable KFENCE for ARM64
  2021-02-26  1:14 incoming Andrew Morton
                   ` (56 preceding siblings ...)
  2021-02-26  1:18 ` [patch 057/118] x86, kfence: enable KFENCE for x86 Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26  1:19 ` [patch 059/118] kfence: use pt_regs to generate stack trace on faults Andrew Morton
                   ` (60 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, bp, catalin.marinas, cl, corbet,
	dave.hansen, dvyukov, edumazet, elver, glider, gregkh, hdanton,
	hpa, iamjoonsoo.kim, jannh, joern, keescook, linux-mm, luto,
	mark.rutland, mingo, mm-commits, paulmck, penberg, peterz,
	rientjes, sjpark, tglx, torvalds, vbabka, will

From: Marco Elver <elver@google.com>
Subject: arm64, kfence: enable KFENCE for ARM64

Add architecture specific implementation details for KFENCE and enable
KFENCE for the arm64 architecture. In particular, this implements the
required interface in <asm/kfence.h>.

KFENCE requires that attributes for pages from its memory pool can
individually be set. Therefore, force the entire linear map to be mapped
at page granularity. Doing so may result in extra memory allocated for
page tables in case rodata=full is not set; however, currently
CONFIG_RODATA_FULL_DEFAULT_ENABLED=y is the default, and the common case
is therefore not affected by this change.
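
The page-granularity requirement exists because KFENCE toggles the validity
of a single page at a time. Condensed from the diff below, the arm64
protection hook is simply:

	set_memory_valid(addr, 1, !protect);	/* addr is one page in __kfence_pool */

and set_memory_valid() can only operate on ranges mapped at page granularity,
hence the NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS flags in arch_add_memory().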

[elver@google.com: add missing copyright and description header]
  Link: https://lkml.kernel.org/r/20210118092159.145934-3-elver@google.com
Link: https://lkml.kernel.org/r/20201103175841.3495947-4-elver@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm64/Kconfig              |    1 +
 arch/arm64/include/asm/kfence.h |   24 ++++++++++++++++++++++++
 arch/arm64/mm/fault.c           |    4 ++++
 arch/arm64/mm/mmu.c             |    8 +++++++-
 4 files changed, 36 insertions(+), 1 deletion(-)

--- /dev/null
+++ a/arch/arm64/include/asm/kfence.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * arm64 KFENCE support.
+ *
+ * Copyright (C) 2020, Google LLC.
+ */
+
+#ifndef __ASM_KFENCE_H
+#define __ASM_KFENCE_H
+
+#include <asm/cacheflush.h>
+
+#define KFENCE_SKIP_ARCH_FAULT_HANDLER "el1_sync"
+
+static inline bool arch_kfence_init_pool(void) { return true; }
+
+static inline bool kfence_protect_page(unsigned long addr, bool protect)
+{
+	set_memory_valid(addr, 1, !protect);
+
+	return true;
+}
+
+#endif /* __ASM_KFENCE_H */
--- a/arch/arm64/Kconfig~arm64-kfence-enable-kfence-for-arm64
+++ a/arch/arm64/Kconfig
@@ -140,6 +140,7 @@ config ARM64
 	select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
 	select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN
 	select HAVE_ARCH_KASAN_HW_TAGS if (HAVE_ARCH_KASAN && ARM64_MTE)
+	select HAVE_ARCH_KFENCE
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_MMAP_RND_BITS
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
--- a/arch/arm64/mm/fault.c~arm64-kfence-enable-kfence-for-arm64
+++ a/arch/arm64/mm/fault.c
@@ -10,6 +10,7 @@
 #include <linux/acpi.h>
 #include <linux/bitfield.h>
 #include <linux/extable.h>
+#include <linux/kfence.h>
 #include <linux/signal.h>
 #include <linux/mm.h>
 #include <linux/hardirq.h>
@@ -389,6 +390,9 @@ static void __do_kernel_fault(unsigned l
 	} else if (addr < PAGE_SIZE) {
 		msg = "NULL pointer dereference";
 	} else {
+		if (kfence_handle_page_fault(addr))
+			return;
+
 		msg = "paging request";
 	}
 
--- a/arch/arm64/mm/mmu.c~arm64-kfence-enable-kfence-for-arm64
+++ a/arch/arm64/mm/mmu.c
@@ -1465,7 +1465,13 @@ int arch_add_memory(int nid, u64 start,
 	int ret, flags = 0;
 
 	VM_BUG_ON(!mhp_range_allowed(start, size, true));
-	if (rodata_full || debug_pagealloc_enabled())
+
+	/*
+	 * KFENCE requires the linear map to be mapped at page granularity, so that
+	 * it is possible to protect/unprotect single pages in the KFENCE pool.
+	 */
+	if (rodata_full || debug_pagealloc_enabled() ||
+	    IS_ENABLED(CONFIG_KFENCE))
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
 
 	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 059/118] kfence: use pt_regs to generate stack trace on faults
  2021-02-26  1:14 incoming Andrew Morton
                   ` (57 preceding siblings ...)
  2021-02-26  1:19 ` [patch 058/118] arm64, kfence: enable KFENCE for ARM64 Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26  1:19 ` [patch 060/118] mm, kfence: insert KFENCE hooks for SLAB Andrew Morton
                   ` (59 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, dvyukov, elver, glider, jannh, linux-mm, mark.rutland,
	mm-commits, torvalds

From: Marco Elver <elver@google.com>
Subject: kfence: use pt_regs to generate stack trace on faults

Instead of removing the fault handling portion of the stack trace based on
the fault handler's name, just use struct pt_regs directly.

Change kfence_handle_page_fault() to take a struct pt_regs, and plumb it
through to kfence_report_error() for out-of-bounds, use-after-free, or
invalid access errors, where pt_regs is used to generate the stack trace.

If the kernel is a DEBUG_KERNEL, also show registers for more information.
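
Condensed from the diff below, the trace selection in kfence_report_error()
becomes:

	if (regs) {
		num_stack_entries = stack_trace_save_regs(regs, stack_entries,
							  KFENCE_STACK_DEPTH, 0);
	} else {
		num_stack_entries = stack_trace_save(stack_entries, KFENCE_STACK_DEPTH, 1);
		skipnr = get_stack_skipnr(stack_entries, num_stack_entries, &type);
	}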

Link: https://lkml.kernel.org/r/20201105092133.2075331-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm64/include/asm/kfence.h |    2 
 arch/arm64/mm/fault.c           |    2 
 arch/x86/include/asm/kfence.h   |    6 --
 arch/x86/mm/fault.c             |    2 
 include/linux/kfence.h          |    5 +-
 mm/kfence/core.c                |   10 ++--
 mm/kfence/kfence.h              |    4 -
 mm/kfence/report.c              |   63 +++++++++++++++++-------------
 8 files changed, 48 insertions(+), 46 deletions(-)

--- a/arch/arm64/include/asm/kfence.h~kfence-use-pt_regs-to-generate-stack-trace-on-faults
+++ a/arch/arm64/include/asm/kfence.h
@@ -10,8 +10,6 @@
 
 #include <asm/cacheflush.h>
 
-#define KFENCE_SKIP_ARCH_FAULT_HANDLER "el1_sync"
-
 static inline bool arch_kfence_init_pool(void) { return true; }
 
 static inline bool kfence_protect_page(unsigned long addr, bool protect)
--- a/arch/arm64/mm/fault.c~kfence-use-pt_regs-to-generate-stack-trace-on-faults
+++ a/arch/arm64/mm/fault.c
@@ -390,7 +390,7 @@ static void __do_kernel_fault(unsigned l
 	} else if (addr < PAGE_SIZE) {
 		msg = "NULL pointer dereference";
 	} else {
-		if (kfence_handle_page_fault(addr))
+		if (kfence_handle_page_fault(addr, regs))
 			return;
 
 		msg = "paging request";
--- a/arch/x86/include/asm/kfence.h~kfence-use-pt_regs-to-generate-stack-trace-on-faults
+++ a/arch/x86/include/asm/kfence.h
@@ -16,12 +16,6 @@
 #include <asm/set_memory.h>
 #include <asm/tlbflush.h>
 
-/*
- * The page fault handler entry function, up to which the stack trace is
- * truncated in reports.
- */
-#define KFENCE_SKIP_ARCH_FAULT_HANDLER "asm_exc_page_fault"
-
 /* Force 4K pages for __kfence_pool. */
 static inline bool arch_kfence_init_pool(void)
 {
--- a/arch/x86/mm/fault.c~kfence-use-pt_regs-to-generate-stack-trace-on-faults
+++ a/arch/x86/mm/fault.c
@@ -682,7 +682,7 @@ page_fault_oops(struct pt_regs *regs, un
 		efi_crash_gracefully_on_page_fault(address);
 
 	/* Only not-present faults should be handled by KFENCE. */
-	if (!(error_code & X86_PF_PROT) && kfence_handle_page_fault(address))
+	if (!(error_code & X86_PF_PROT) && kfence_handle_page_fault(address, regs))
 		return;
 
 oops:
--- a/include/linux/kfence.h~kfence-use-pt_regs-to-generate-stack-trace-on-faults
+++ a/include/linux/kfence.h
@@ -186,6 +186,7 @@ static __always_inline __must_check bool
 /**
  * kfence_handle_page_fault() - perform page fault handling for KFENCE pages
  * @addr: faulting address
+ * @regs: current struct pt_regs (can be NULL, but shows full stack trace)
  *
  * Return:
  * * false - address outside KFENCE pool,
@@ -196,7 +197,7 @@ static __always_inline __must_check bool
  * cases KFENCE prints an error message and marks the offending page as
  * present, so that the kernel can proceed.
  */
-bool __must_check kfence_handle_page_fault(unsigned long addr);
+bool __must_check kfence_handle_page_fault(unsigned long addr, struct pt_regs *regs);
 
 #else /* CONFIG_KFENCE */
 
@@ -209,7 +210,7 @@ static inline size_t kfence_ksize(const
 static inline void *kfence_object_start(const void *addr) { return NULL; }
 static inline void __kfence_free(void *addr) { }
 static inline bool __must_check kfence_free(void *addr) { return false; }
-static inline bool __must_check kfence_handle_page_fault(unsigned long addr) { return false; }
+static inline bool __must_check kfence_handle_page_fault(unsigned long addr, struct pt_regs *regs) { return false; }
 
 #endif
 
--- a/mm/kfence/core.c~kfence-use-pt_regs-to-generate-stack-trace-on-faults
+++ a/mm/kfence/core.c
@@ -216,7 +216,7 @@ static inline bool check_canary_byte(u8
 		return true;
 
 	atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
-	kfence_report_error((unsigned long)addr, addr_to_metadata((unsigned long)addr),
+	kfence_report_error((unsigned long)addr, NULL, addr_to_metadata((unsigned long)addr),
 			    KFENCE_ERROR_CORRUPTION);
 	return false;
 }
@@ -351,7 +351,7 @@ static void kfence_guarded_free(void *ad
 	if (meta->state != KFENCE_OBJECT_ALLOCATED || meta->addr != (unsigned long)addr) {
 		/* Invalid or double-free, bail out. */
 		atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
-		kfence_report_error((unsigned long)addr, meta, KFENCE_ERROR_INVALID_FREE);
+		kfence_report_error((unsigned long)addr, NULL, meta, KFENCE_ERROR_INVALID_FREE);
 		raw_spin_unlock_irqrestore(&meta->lock, flags);
 		return;
 	}
@@ -766,7 +766,7 @@ void __kfence_free(void *addr)
 		kfence_guarded_free(addr, meta, false);
 }
 
-bool kfence_handle_page_fault(unsigned long addr)
+bool kfence_handle_page_fault(unsigned long addr, struct pt_regs *regs)
 {
 	const int page_index = (addr - (unsigned long)__kfence_pool) / PAGE_SIZE;
 	struct kfence_metadata *to_report = NULL;
@@ -829,11 +829,11 @@ bool kfence_handle_page_fault(unsigned l
 
 out:
 	if (to_report) {
-		kfence_report_error(addr, to_report, error_type);
+		kfence_report_error(addr, regs, to_report, error_type);
 		raw_spin_unlock_irqrestore(&to_report->lock, flags);
 	} else {
 		/* This may be a UAF or OOB access, but we can't be sure. */
-		kfence_report_error(addr, NULL, KFENCE_ERROR_INVALID);
+		kfence_report_error(addr, regs, NULL, KFENCE_ERROR_INVALID);
 	}
 
 	return kfence_unprotect(addr); /* Unprotect and let access proceed. */
--- a/mm/kfence/kfence.h~kfence-use-pt_regs-to-generate-stack-trace-on-faults
+++ a/mm/kfence/kfence.h
@@ -105,8 +105,8 @@ enum kfence_error_type {
 	KFENCE_ERROR_INVALID_FREE,	/* Invalid free. */
 };
 
-void kfence_report_error(unsigned long address, const struct kfence_metadata *meta,
-			 enum kfence_error_type type);
+void kfence_report_error(unsigned long address, struct pt_regs *regs,
+			 const struct kfence_metadata *meta, enum kfence_error_type type);
 
 void kfence_print_object(struct seq_file *seq, const struct kfence_metadata *meta);
 
--- a/mm/kfence/report.c~kfence-use-pt_regs-to-generate-stack-trace-on-faults
+++ a/mm/kfence/report.c
@@ -10,6 +10,7 @@
 #include <linux/kernel.h>
 #include <linux/lockdep.h>
 #include <linux/printk.h>
+#include <linux/sched/debug.h>
 #include <linux/seq_file.h>
 #include <linux/stacktrace.h>
 #include <linux/string.h>
@@ -41,7 +42,6 @@ static int get_stack_skipnr(const unsign
 {
 	char buf[64];
 	int skipnr, fallback = 0;
-	bool is_access_fault = false;
 
 	if (type) {
 		/* Depending on error type, find different stack entries. */
@@ -49,8 +49,12 @@ static int get_stack_skipnr(const unsign
 		case KFENCE_ERROR_UAF:
 		case KFENCE_ERROR_OOB:
 		case KFENCE_ERROR_INVALID:
-			is_access_fault = true;
-			break;
+			/*
+			 * kfence_handle_page_fault() may be called with pt_regs
+			 * set to NULL; in that case we'll simply show the full
+			 * stack trace.
+			 */
+			return 0;
 		case KFENCE_ERROR_CORRUPTION:
 		case KFENCE_ERROR_INVALID_FREE:
 			break;
@@ -60,26 +64,21 @@ static int get_stack_skipnr(const unsign
 	for (skipnr = 0; skipnr < num_entries; skipnr++) {
 		int len = scnprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skipnr]);
 
-		if (is_access_fault) {
-			if (!strncmp(buf, KFENCE_SKIP_ARCH_FAULT_HANDLER, len))
-				goto found;
-		} else {
-			if (str_has_prefix(buf, "kfence_") || str_has_prefix(buf, "__kfence_") ||
-			    !strncmp(buf, "__slab_free", len)) {
-				/*
-				 * In case of tail calls from any of the below
-				 * to any of the above.
-				 */
-				fallback = skipnr + 1;
-			}
-
-			/* Also the *_bulk() variants by only checking prefixes. */
-			if (str_has_prefix(buf, "kfree") ||
-			    str_has_prefix(buf, "kmem_cache_free") ||
-			    str_has_prefix(buf, "__kmalloc") ||
-			    str_has_prefix(buf, "kmem_cache_alloc"))
-				goto found;
+		if (str_has_prefix(buf, "kfence_") || str_has_prefix(buf, "__kfence_") ||
+		    !strncmp(buf, "__slab_free", len)) {
+			/*
+			 * In case of tail calls from any of the below
+			 * to any of the above.
+			 */
+			fallback = skipnr + 1;
 		}
+
+		/* Also the *_bulk() variants by only checking prefixes. */
+		if (str_has_prefix(buf, "kfree") ||
+		    str_has_prefix(buf, "kmem_cache_free") ||
+		    str_has_prefix(buf, "__kmalloc") ||
+		    str_has_prefix(buf, "kmem_cache_alloc"))
+			goto found;
 	}
 	if (fallback < num_entries)
 		return fallback;
@@ -157,13 +156,20 @@ static void print_diff_canary(unsigned l
 	pr_cont(" ]");
 }
 
-void kfence_report_error(unsigned long address, const struct kfence_metadata *meta,
-			 enum kfence_error_type type)
+void kfence_report_error(unsigned long address, struct pt_regs *regs,
+			 const struct kfence_metadata *meta, enum kfence_error_type type)
 {
 	unsigned long stack_entries[KFENCE_STACK_DEPTH] = { 0 };
-	int num_stack_entries = stack_trace_save(stack_entries, KFENCE_STACK_DEPTH, 1);
-	int skipnr = get_stack_skipnr(stack_entries, num_stack_entries, &type);
 	const ptrdiff_t object_index = meta ? meta - kfence_metadata : -1;
+	int num_stack_entries;
+	int skipnr = 0;
+
+	if (regs) {
+		num_stack_entries = stack_trace_save_regs(regs, stack_entries, KFENCE_STACK_DEPTH, 0);
+	} else {
+		num_stack_entries = stack_trace_save(stack_entries, KFENCE_STACK_DEPTH, 1);
+		skipnr = get_stack_skipnr(stack_entries, num_stack_entries, &type);
+	}
 
 	/* Require non-NULL meta, except if KFENCE_ERROR_INVALID. */
 	if (WARN_ON(type != KFENCE_ERROR_INVALID && !meta))
@@ -227,7 +233,10 @@ void kfence_report_error(unsigned long a
 
 	/* Print report footer. */
 	pr_err("\n");
-	dump_stack_print_info(KERN_ERR);
+	if (IS_ENABLED(CONFIG_DEBUG_KERNEL) && regs)
+		show_regs(regs);
+	else
+		dump_stack_print_info(KERN_ERR);
 	pr_err("==================================================================\n");
 
 	lockdep_on();
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 060/118] mm, kfence: insert KFENCE hooks for SLAB
  2021-02-26  1:14 incoming Andrew Morton
                   ` (58 preceding siblings ...)
  2021-02-26  1:19 ` [patch 059/118] kfence: use pt_regs to generate stack trace on faults Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26  1:19 ` [patch 061/118] mm, kfence: insert KFENCE hooks for SLUB Andrew Morton
                   ` (58 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, bp, catalin.marinas, cl, corbet,
	dave.hansen, dvyukov, edumazet, elver, glider, gregkh, hdanton,
	hpa, iamjoonsoo.kim, jannh, joern, keescook, linux-mm, luto,
	mark.rutland, mingo, mm-commits, paulmck, penberg, peterz,
	rientjes, sjpark, tglx, torvalds, vbabka, will

From: Alexander Potapenko <glider@google.com>
Subject: mm, kfence: insert KFENCE hooks for SLAB

Inserts KFENCE hooks into the SLAB allocator.

To pass the originally requested size to KFENCE, add an argument
'orig_size' to slab_alloc*(). The additional argument is required to
preserve the requested original size for kmalloc() allocations, which
uses size classes (e.g. an allocation of 272 bytes will return an object
of size 512). Therefore, kmem_cache::size does not represent the
kmalloc-caller's requested size, and we must introduce the argument
'orig_size' to propagate the originally requested size to KFENCE.

Without the originally requested size, we would not be able to detect
out-of-bounds accesses for objects placed at the end of a KFENCE object
page if the requested size is not equal to the kmalloc size class it was
bucketed into.

When KFENCE is disabled, there is no additional overhead, since
slab_alloc*() functions are __always_inline.
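
As a small illustration (the concrete numbers below are only an example, not
taken from this patch), consider:

	char *buf = kmalloc(272, GFP_KERNEL);	/* served from the kmalloc-512 cache */

	buf[300] = 0;	/* inside the 512-byte size class, but 28 bytes past the
			 * 272 bytes that were actually requested */

With only kmem_cache::size (512) to go by, KFENCE could not tell that this
access is out of bounds of the original request; with orig_size == 272 it can
place and check the guarded object against the requested size and report the
access as out-of-bounds.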

Link: https://lkml.kernel.org/r/20201103175841.3495947-5-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Marco Elver <elver@google.com>

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/slab_def.h |    3 ++
 mm/kfence/core.c         |    2 +
 mm/slab.c                |   38 ++++++++++++++++++++++++++++---------
 mm/slab_common.c         |    5 +++-
 4 files changed, 38 insertions(+), 10 deletions(-)

--- a/include/linux/slab_def.h~mm-kfence-insert-kfence-hooks-for-slab
+++ a/include/linux/slab_def.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_SLAB_DEF_H
 #define	_LINUX_SLAB_DEF_H
 
+#include <linux/kfence.h>
 #include <linux/reciprocal_div.h>
 
 /*
@@ -114,6 +115,8 @@ static inline unsigned int obj_to_index(
 static inline int objs_per_slab_page(const struct kmem_cache *cache,
 				     const struct page *page)
 {
+	if (is_kfence_address(page_address(page)))
+		return 1;
 	return cache->num;
 }
 
--- a/mm/kfence/core.c~mm-kfence-insert-kfence-hooks-for-slab
+++ a/mm/kfence/core.c
@@ -317,6 +317,8 @@ static void *kfence_guarded_alloc(struct
 	/* Set required struct page fields. */
 	page = virt_to_page(meta->addr);
 	page->slab_cache = cache;
+	if (IS_ENABLED(CONFIG_SLAB))
+		page->s_mem = addr;
 
 	raw_spin_unlock_irqrestore(&meta->lock, flags);
 
--- a/mm/slab.c~mm-kfence-insert-kfence-hooks-for-slab
+++ a/mm/slab.c
@@ -100,6 +100,7 @@
 #include	<linux/seq_file.h>
 #include	<linux/notifier.h>
 #include	<linux/kallsyms.h>
+#include	<linux/kfence.h>
 #include	<linux/cpu.h>
 #include	<linux/sysctl.h>
 #include	<linux/module.h>
@@ -3208,7 +3209,7 @@ must_grow:
 }
 
 static __always_inline void *
-slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
+slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_size,
 		   unsigned long caller)
 {
 	unsigned long save_flags;
@@ -3221,6 +3222,10 @@ slab_alloc_node(struct kmem_cache *cache
 	if (unlikely(!cachep))
 		return NULL;
 
+	ptr = kfence_alloc(cachep, orig_size, flags);
+	if (unlikely(ptr))
+		goto out_hooks;
+
 	cache_alloc_debugcheck_before(cachep, flags);
 	local_irq_save(save_flags);
 
@@ -3253,6 +3258,7 @@ slab_alloc_node(struct kmem_cache *cache
 	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && ptr)
 		memset(ptr, 0, cachep->object_size);
 
+out_hooks:
 	slab_post_alloc_hook(cachep, objcg, flags, 1, &ptr);
 	return ptr;
 }
@@ -3290,7 +3296,7 @@ __do_cache_alloc(struct kmem_cache *cach
 #endif /* CONFIG_NUMA */
 
 static __always_inline void *
-slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
+slab_alloc(struct kmem_cache *cachep, gfp_t flags, size_t orig_size, unsigned long caller)
 {
 	unsigned long save_flags;
 	void *objp;
@@ -3301,6 +3307,10 @@ slab_alloc(struct kmem_cache *cachep, gf
 	if (unlikely(!cachep))
 		return NULL;
 
+	objp = kfence_alloc(cachep, orig_size, flags);
+	if (unlikely(objp))
+		goto out;
+
 	cache_alloc_debugcheck_before(cachep, flags);
 	local_irq_save(save_flags);
 	objp = __do_cache_alloc(cachep, flags);
@@ -3311,6 +3321,7 @@ slab_alloc(struct kmem_cache *cachep, gf
 	if (unlikely(slab_want_init_on_alloc(flags, cachep)) && objp)
 		memset(objp, 0, cachep->object_size);
 
+out:
 	slab_post_alloc_hook(cachep, objcg, flags, 1, &objp);
 	return objp;
 }
@@ -3416,6 +3427,12 @@ free_done:
 static __always_inline void __cache_free(struct kmem_cache *cachep, void *objp,
 					 unsigned long caller)
 {
+	if (is_kfence_address(objp)) {
+		kmemleak_free_recursive(objp, cachep->flags);
+		__kfence_free(objp);
+		return;
+	}
+
 	if (unlikely(slab_want_init_on_free(cachep)))
 		memset(objp, 0, cachep->object_size);
 
@@ -3482,7 +3499,7 @@ void ___cache_free(struct kmem_cache *ca
  */
 void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
 {
-	void *ret = slab_alloc(cachep, flags, _RET_IP_);
+	void *ret = slab_alloc(cachep, flags, cachep->object_size, _RET_IP_);
 
 	trace_kmem_cache_alloc(_RET_IP_, ret,
 			       cachep->object_size, cachep->size, flags);
@@ -3515,7 +3532,7 @@ int kmem_cache_alloc_bulk(struct kmem_ca
 
 	local_irq_disable();
 	for (i = 0; i < size; i++) {
-		void *objp = __do_cache_alloc(s, flags);
+		void *objp = kfence_alloc(s, s->object_size, flags) ?: __do_cache_alloc(s, flags);
 
 		if (unlikely(!objp))
 			goto error;
@@ -3548,7 +3565,7 @@ kmem_cache_alloc_trace(struct kmem_cache
 {
 	void *ret;
 
-	ret = slab_alloc(cachep, flags, _RET_IP_);
+	ret = slab_alloc(cachep, flags, size, _RET_IP_);
 
 	ret = kasan_kmalloc(cachep, ret, size, flags);
 	trace_kmalloc(_RET_IP_, ret,
@@ -3574,7 +3591,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace);
  */
 void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 {
-	void *ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_);
+	void *ret = slab_alloc_node(cachep, flags, nodeid, cachep->object_size, _RET_IP_);
 
 	trace_kmem_cache_alloc_node(_RET_IP_, ret,
 				    cachep->object_size, cachep->size,
@@ -3592,7 +3609,7 @@ void *kmem_cache_alloc_node_trace(struct
 {
 	void *ret;
 
-	ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_);
+	ret = slab_alloc_node(cachep, flags, nodeid, size, _RET_IP_);
 
 	ret = kasan_kmalloc(cachep, ret, size, flags);
 	trace_kmalloc_node(_RET_IP_, ret,
@@ -3673,7 +3690,7 @@ static __always_inline void *__do_kmallo
 	cachep = kmalloc_slab(size, flags);
 	if (unlikely(ZERO_OR_NULL_PTR(cachep)))
 		return cachep;
-	ret = slab_alloc(cachep, flags, caller);
+	ret = slab_alloc(cachep, flags, size, caller);
 
 	ret = kasan_kmalloc(cachep, ret, size, flags);
 	trace_kmalloc(caller, ret,
@@ -4172,7 +4189,10 @@ void __check_heap_object(const void *ptr
 	BUG_ON(objnr >= cachep->num);
 
 	/* Find offset within object. */
-	offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);
+	if (is_kfence_address(ptr))
+		offset = ptr - kfence_object_start(ptr);
+	else
+		offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);
 
 	/* Allow address range falling entirely within usercopy region. */
 	if (offset >= cachep->useroffset &&
--- a/mm/slab_common.c~mm-kfence-insert-kfence-hooks-for-slab
+++ a/mm/slab_common.c
@@ -12,6 +12,7 @@
 #include <linux/memory.h>
 #include <linux/cache.h>
 #include <linux/compiler.h>
+#include <linux/kfence.h>
 #include <linux/module.h>
 #include <linux/cpu.h>
 #include <linux/uaccess.h>
@@ -430,6 +431,7 @@ static void slab_caches_to_rcu_destroy_w
 	rcu_barrier();
 
 	list_for_each_entry_safe(s, s2, &to_destroy, list) {
+		kfence_shutdown_cache(s);
 #ifdef SLAB_SUPPORTS_SYSFS
 		sysfs_slab_release(s);
 #else
@@ -455,6 +457,7 @@ static int shutdown_cache(struct kmem_ca
 		list_add_tail(&s->list, &slab_caches_to_rcu_destroy);
 		schedule_work(&slab_caches_to_rcu_destroy_work);
 	} else {
+		kfence_shutdown_cache(s);
 #ifdef SLAB_SUPPORTS_SYSFS
 		sysfs_slab_unlink(s);
 		sysfs_slab_release(s);
@@ -1235,7 +1238,7 @@ size_t ksize(const void *objp)
 	if (unlikely(ZERO_OR_NULL_PTR(objp)) || !kasan_check_byte(objp))
 		return 0;
 
-	size = __ksize(objp);
+	size = kfence_ksize(objp) ?: __ksize(objp);
 	/*
 	 * We assume that ksize callers could use whole allocated area,
 	 * so we need to unpoison this area.
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 061/118] mm, kfence: insert KFENCE hooks for SLUB
  2021-02-26  1:14 incoming Andrew Morton
                   ` (59 preceding siblings ...)
  2021-02-26  1:19 ` [patch 060/118] mm, kfence: insert KFENCE hooks for SLAB Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26  1:19 ` [patch 062/118] kfence, kasan: make KFENCE compatible with KASAN Andrew Morton
                   ` (57 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, bp, catalin.marinas, cl, corbet,
	dave.hansen, dvyukov, edumazet, elver, glider, gregkh, hdanton,
	hpa, iamjoonsoo.kim, jannh, joern, keescook, linux-mm, luto,
	mark.rutland, mingo, mm-commits, paulmck, penberg, peterz,
	rientjes, sjpark, tglx, torvalds, vbabka, will

From: Alexander Potapenko <glider@google.com>
Subject: mm, kfence: insert KFENCE hooks for SLUB

Inserts KFENCE hooks into the SLUB allocator.

To pass the originally requested size to KFENCE, add an argument
'orig_size' to slab_alloc*(). The additional argument is required to
preserve the requested original size for kmalloc() allocations, which
use size classes (e.g. an allocation of 272 bytes will return an object
of size 512). Therefore, kmem_cache::size does not represent the
kmalloc-caller's requested size, and we must introduce the argument
'orig_size' to propagate the originally requested size to KFENCE.

Without the originally requested size, we would not be able to detect
out-of-bounds accesses for objects placed at the end of a KFENCE object
page if the requested size is not equal to the kmalloc size class it was
bucketed into.

When KFENCE is disabled, there is no additional overhead, since
slab_alloc*() functions are __always_inline.

Link: https://lkml.kernel.org/r/20201103175841.3495947-6-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Co-developed-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/slub_def.h |    3 +
 mm/kfence/core.c         |    2 +
 mm/slub.c                |   60 ++++++++++++++++++++++++++++---------
 3 files changed, 51 insertions(+), 14 deletions(-)

--- a/include/linux/slub_def.h~mm-kfence-insert-kfence-hooks-for-slub
+++ a/include/linux/slub_def.h
@@ -7,6 +7,7 @@
  *
  * (C) 2007 SGI, Christoph Lameter
  */
+#include <linux/kfence.h>
 #include <linux/kobject.h>
 #include <linux/reciprocal_div.h>
 
@@ -185,6 +186,8 @@ static inline unsigned int __obj_to_inde
 static inline unsigned int obj_to_index(const struct kmem_cache *cache,
 					const struct page *page, void *obj)
 {
+	if (is_kfence_address(obj))
+		return 0;
 	return __obj_to_index(cache, page_address(page), obj);
 }
 
--- a/mm/kfence/core.c~mm-kfence-insert-kfence-hooks-for-slub
+++ a/mm/kfence/core.c
@@ -317,6 +317,8 @@ static void *kfence_guarded_alloc(struct
 	/* Set required struct page fields. */
 	page = virt_to_page(meta->addr);
 	page->slab_cache = cache;
+	if (IS_ENABLED(CONFIG_SLUB))
+		page->objects = 1;
 	if (IS_ENABLED(CONFIG_SLAB))
 		page->s_mem = addr;
 
--- a/mm/slub.c~mm-kfence-insert-kfence-hooks-for-slub
+++ a/mm/slub.c
@@ -27,6 +27,7 @@
 #include <linux/ctype.h>
 #include <linux/debugobjects.h>
 #include <linux/kallsyms.h>
+#include <linux/kfence.h>
 #include <linux/memory.h>
 #include <linux/math64.h>
 #include <linux/fault-inject.h>
@@ -1570,6 +1571,11 @@ static inline bool slab_free_freelist_ho
 	void *old_tail = *tail ? *tail : *head;
 	int rsize;
 
+	if (is_kfence_address(next)) {
+		slab_free_hook(s, next);
+		return true;
+	}
+
 	/* Head and tail of the reconstructed freelist */
 	*head = NULL;
 	*tail = NULL;
@@ -2809,7 +2815,7 @@ static __always_inline void maybe_wipe_o
  * Otherwise we can simply pick the next object from the lockless free list.
  */
 static __always_inline void *slab_alloc_node(struct kmem_cache *s,
-		gfp_t gfpflags, int node, unsigned long addr)
+		gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
 {
 	void *object;
 	struct kmem_cache_cpu *c;
@@ -2820,6 +2826,11 @@ static __always_inline void *slab_alloc_
 	s = slab_pre_alloc_hook(s, &objcg, 1, gfpflags);
 	if (!s)
 		return NULL;
+
+	object = kfence_alloc(s, orig_size, gfpflags);
+	if (unlikely(object))
+		goto out;
+
 redo:
 	/*
 	 * Must read kmem_cache cpu data via this cpu ptr. Preemption is
@@ -2892,20 +2903,21 @@ redo:
 	if (unlikely(slab_want_init_on_alloc(gfpflags, s)) && object)
 		memset(kasan_reset_tag(object), 0, s->object_size);
 
+out:
 	slab_post_alloc_hook(s, objcg, gfpflags, 1, &object);
 
 	return object;
 }
 
 static __always_inline void *slab_alloc(struct kmem_cache *s,
-		gfp_t gfpflags, unsigned long addr)
+		gfp_t gfpflags, unsigned long addr, size_t orig_size)
 {
-	return slab_alloc_node(s, gfpflags, NUMA_NO_NODE, addr);
+	return slab_alloc_node(s, gfpflags, NUMA_NO_NODE, addr, orig_size);
 }
 
 void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
 {
-	void *ret = slab_alloc(s, gfpflags, _RET_IP_);
+	void *ret = slab_alloc(s, gfpflags, _RET_IP_, s->object_size);
 
 	trace_kmem_cache_alloc(_RET_IP_, ret, s->object_size,
 				s->size, gfpflags);
@@ -2917,7 +2929,7 @@ EXPORT_SYMBOL(kmem_cache_alloc);
 #ifdef CONFIG_TRACING
 void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 {
-	void *ret = slab_alloc(s, gfpflags, _RET_IP_);
+	void *ret = slab_alloc(s, gfpflags, _RET_IP_, size);
 	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags);
 	ret = kasan_kmalloc(s, ret, size, gfpflags);
 	return ret;
@@ -2928,7 +2940,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace);
 #ifdef CONFIG_NUMA
 void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
 {
-	void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_);
+	void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, s->object_size);
 
 	trace_kmem_cache_alloc_node(_RET_IP_, ret,
 				    s->object_size, s->size, gfpflags, node);
@@ -2942,7 +2954,7 @@ void *kmem_cache_alloc_node_trace(struct
 				    gfp_t gfpflags,
 				    int node, size_t size)
 {
-	void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_);
+	void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, size);
 
 	trace_kmalloc_node(_RET_IP_, ret,
 			   size, s->size, gfpflags, node);
@@ -2976,6 +2988,9 @@ static void __slab_free(struct kmem_cach
 
 	stat(s, FREE_SLOWPATH);
 
+	if (kfence_free(head))
+		return;
+
 	if (kmem_cache_debug(s) &&
 	    !free_debug_processing(s, page, head, tail, cnt, addr))
 		return;
@@ -3220,6 +3235,13 @@ int build_detached_freelist(struct kmem_
 		df->s = cache_from_obj(s, object); /* Support for memcg */
 	}
 
+	if (is_kfence_address(object)) {
+		slab_free_hook(df->s, object);
+		__kfence_free(object);
+		p[size] = NULL; /* mark object processed */
+		return size;
+	}
+
 	/* Start new detached freelist */
 	df->page = page;
 	set_freepointer(df->s, object, NULL);
@@ -3295,8 +3317,14 @@ int kmem_cache_alloc_bulk(struct kmem_ca
 	c = this_cpu_ptr(s->cpu_slab);
 
 	for (i = 0; i < size; i++) {
-		void *object = c->freelist;
+		void *object = kfence_alloc(s, s->object_size, flags);
 
+		if (unlikely(object)) {
+			p[i] = object;
+			continue;
+		}
+
+		object = c->freelist;
 		if (unlikely(!object)) {
 			/*
 			 * We may have removed an object from c->freelist using
@@ -4021,7 +4049,7 @@ void *__kmalloc(size_t size, gfp_t flags
 	if (unlikely(ZERO_OR_NULL_PTR(s)))
 		return s;
 
-	ret = slab_alloc(s, flags, _RET_IP_);
+	ret = slab_alloc(s, flags, _RET_IP_, size);
 
 	trace_kmalloc(_RET_IP_, ret, size, s->size, flags);
 
@@ -4069,7 +4097,7 @@ void *__kmalloc_node(size_t size, gfp_t
 	if (unlikely(ZERO_OR_NULL_PTR(s)))
 		return s;
 
-	ret = slab_alloc_node(s, flags, node, _RET_IP_);
+	ret = slab_alloc_node(s, flags, node, _RET_IP_, size);
 
 	trace_kmalloc_node(_RET_IP_, ret, size, s->size, flags, node);
 
@@ -4095,6 +4123,7 @@ void __check_heap_object(const void *ptr
 	struct kmem_cache *s;
 	unsigned int offset;
 	size_t object_size;
+	bool is_kfence = is_kfence_address(ptr);
 
 	ptr = kasan_reset_tag(ptr);
 
@@ -4107,10 +4136,13 @@ void __check_heap_object(const void *ptr
 			       to_user, 0, n);
 
 	/* Find offset within object. */
-	offset = (ptr - page_address(page)) % s->size;
+	if (is_kfence)
+		offset = ptr - kfence_object_start(ptr);
+	else
+		offset = (ptr - page_address(page)) % s->size;
 
 	/* Adjust for redzone and reject if within the redzone. */
-	if (kmem_cache_debug_flags(s, SLAB_RED_ZONE)) {
+	if (!is_kfence && kmem_cache_debug_flags(s, SLAB_RED_ZONE)) {
 		if (offset < s->red_left_pad)
 			usercopy_abort("SLUB object in left red zone",
 				       s->name, to_user, offset, n);
@@ -4527,7 +4559,7 @@ void *__kmalloc_track_caller(size_t size
 	if (unlikely(ZERO_OR_NULL_PTR(s)))
 		return s;
 
-	ret = slab_alloc(s, gfpflags, caller);
+	ret = slab_alloc(s, gfpflags, caller, size);
 
 	/* Honor the call site pointer we received. */
 	trace_kmalloc(caller, ret, size, s->size, gfpflags);
@@ -4558,7 +4590,7 @@ void *__kmalloc_node_track_caller(size_t
 	if (unlikely(ZERO_OR_NULL_PTR(s)))
 		return s;
 
-	ret = slab_alloc_node(s, gfpflags, node, caller);
+	ret = slab_alloc_node(s, gfpflags, node, caller, size);
 
 	/* Honor the call site pointer we received. */
 	trace_kmalloc_node(caller, ret, size, s->size, gfpflags, node);
_



* [patch 062/118] kfence, kasan: make KFENCE compatible with KASAN
  2021-02-26  1:14 incoming Andrew Morton
                   ` (60 preceding siblings ...)
  2021-02-26  1:19 ` [patch 061/118] mm, kfence: insert KFENCE hooks for SLUB Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26  1:19 ` [patch 063/118] kfence, Documentation: add KFENCE documentation Andrew Morton
                   ` (56 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, bp, catalin.marinas, cl, corbet,
	dave.hansen, dvyukov, edumazet, elver, glider, gregkh, hdanton,
	hpa, iamjoonsoo.kim, jannh, joern, keescook, linux-mm, luto,
	mark.rutland, mingo, mm-commits, paulmck, penberg, peterz,
	rientjes, sjpark, tglx, torvalds, vbabka, will

From: Alexander Potapenko <glider@google.com>
Subject: kfence, kasan: make KFENCE compatible with KASAN

Make KFENCE compatible with KASAN. Currently this helps test KFENCE
itself: KASAN can catch potential corruption of KFENCE state, as well as
corruption that results from freepointer corruption in the main
allocators.
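
The mechanism, visible in the hunks below, is for the KASAN hooks to bail
out early on KFENCE-managed addresses, since those objects carry no KASAN
metadata. A rough userspace sketch of that guard pattern (the pool array,
the counter, and the simplified kasan_poison() signature are stand-ins
invented for this example, not the kernel definitions):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    #define POOL_SIZE 4096

    static char kfence_pool[POOL_SIZE]; /* stand-in for the KFENCE pool   */
    static int shadow_updates;          /* counts simulated shadow writes */

    static bool is_kfence_address(const void *addr)
    {
            const char *p = addr;

            return p >= kfence_pool && p < kfence_pool + POOL_SIZE;
    }

    static void kasan_poison(const void *addr, size_t size)
    {
            if (is_kfence_address(addr))    /* skip KFENCE memory */
                    return;
            shadow_updates++;   /* simplified stand-in for a shadow write */
            (void)size;
    }

    int main(void)
    {
            char regular_obj[32];

            kasan_poison(kfence_pool + 64, 32);             /* skipped  */
            kasan_poison(regular_obj, sizeof(regular_obj)); /* counted  */
            printf("shadow updates: %d\n", shadow_updates); /* prints 1 */
            return 0;
    }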

[akpm@linux-foundation.org: merge fixup]
[andreyknvl@google.com: untag addresses for KFENCE]
  Link: https://lkml.kernel.org/r/9dc196006921b191d25d10f6e611316db7da2efc.1611946152.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/20201103175841.3495947-7-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Co-developed-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/Kconfig.kfence |    2 +-
 mm/kasan/common.c  |    6 ++++++
 mm/kasan/generic.c |    3 ++-
 mm/kasan/kasan.h   |   21 ++++++++++++++++++---
 mm/kasan/shadow.c  |   13 +++++++++++++
 5 files changed, 40 insertions(+), 5 deletions(-)

--- a/lib/Kconfig.kfence~kfence-kasan-make-kfence-compatible-with-kasan
+++ a/lib/Kconfig.kfence
@@ -5,7 +5,7 @@ config HAVE_ARCH_KFENCE
 
 menuconfig KFENCE
 	bool "KFENCE: low-overhead sampling-based memory safety error detector"
-	depends on HAVE_ARCH_KFENCE && !KASAN && (SLAB || SLUB)
+	depends on HAVE_ARCH_KFENCE && (SLAB || SLUB)
 	select STACKTRACE
 	help
 	  KFENCE is a low-overhead sampling-based detector of heap out-of-bounds
--- a/mm/kasan/common.c~kfence-kasan-make-kfence-compatible-with-kasan
+++ a/mm/kasan/common.c
@@ -335,6 +335,9 @@ static bool ____kasan_slab_free(struct k
 	tagged_object = object;
 	object = kasan_reset_tag(object);
 
+	if (is_kfence_address(object))
+		return false;
+
 	if (unlikely(nearest_obj(cache, virt_to_head_page(object), object) !=
 	    object)) {
 		kasan_report_invalid_free(tagged_object, ip);
@@ -413,6 +416,9 @@ static void *____kasan_kmalloc(struct km
 	if (unlikely(object == NULL))
 		return NULL;
 
+	if (is_kfence_address(kasan_reset_tag(object)))
+		return (void *)object;
+
 	redzone_start = round_up((unsigned long)(object + size),
 				KASAN_GRANULE_SIZE);
 	redzone_end = round_up((unsigned long)object + cache->object_size,
--- a/mm/kasan/generic.c~kfence-kasan-make-kfence-compatible-with-kasan
+++ a/mm/kasan/generic.c
@@ -14,6 +14,7 @@
 #include <linux/init.h>
 #include <linux/kasan.h>
 #include <linux/kernel.h>
+#include <linux/kfence.h>
 #include <linux/kmemleak.h>
 #include <linux/linkage.h>
 #include <linux/memblock.h>
@@ -331,7 +332,7 @@ void kasan_record_aux_stack(void *addr)
 	struct kasan_alloc_meta *alloc_meta;
 	void *object;
 
-	if (!(page && PageSlab(page)))
+	if (is_kfence_address(addr) || !(page && PageSlab(page)))
 		return;
 
 	cache = page->slab_cache;
--- a/mm/kasan/kasan.h~kfence-kasan-make-kfence-compatible-with-kasan
+++ a/mm/kasan/kasan.h
@@ -3,6 +3,7 @@
 #define __MM_KASAN_KASAN_H
 
 #include <linux/kasan.h>
+#include <linux/kfence.h>
 #include <linux/stackdepot.h>
 
 #ifdef CONFIG_KASAN_HW_TAGS
@@ -331,14 +332,28 @@ static inline u8 kasan_random_tag(void)
 
 static inline void kasan_poison(const void *address, size_t size, u8 value)
 {
-	hw_set_mem_tag_range(kasan_reset_tag(address),
+	address = kasan_reset_tag(address);
+
+	/* Skip KFENCE memory if called explicitly outside of sl*b. */
+	if (is_kfence_address(address))
+		return;
+
+	hw_set_mem_tag_range((void *)address,
 			round_up(size, KASAN_GRANULE_SIZE), value);
 }
 
 static inline void kasan_unpoison(const void *address, size_t size)
 {
-	hw_set_mem_tag_range(kasan_reset_tag(address),
-			round_up(size, KASAN_GRANULE_SIZE), get_tag(address));
+	u8 tag = get_tag(address);
+
+	address = kasan_reset_tag(address);
+
+	/* Skip KFENCE memory if called explicitly outside of sl*b. */
+	if (is_kfence_address(address))
+		return;
+
+	hw_set_mem_tag_range((void *)address,
+			round_up(size, KASAN_GRANULE_SIZE), tag);
 }
 
 static inline bool kasan_byte_accessible(const void *addr)
--- a/mm/kasan/shadow.c~kfence-kasan-make-kfence-compatible-with-kasan
+++ a/mm/kasan/shadow.c
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <linux/kasan.h>
 #include <linux/kernel.h>
+#include <linux/kfence.h>
 #include <linux/kmemleak.h>
 #include <linux/memory.h>
 #include <linux/mm.h>
@@ -84,6 +85,10 @@ void kasan_poison(const void *address, s
 	address = kasan_reset_tag(address);
 	size = round_up(size, KASAN_GRANULE_SIZE);
 
+	/* Skip KFENCE memory if called explicitly outside of sl*b. */
+	if (is_kfence_address(address))
+		return;
+
 	shadow_start = kasan_mem_to_shadow(address);
 	shadow_end = kasan_mem_to_shadow(address + size);
 
@@ -102,6 +107,14 @@ void kasan_unpoison(const void *address,
 	 */
 	address = kasan_reset_tag(address);
 
+	/*
+	 * Skip KFENCE memory if called explicitly outside of sl*b. Also note
+	 * that calls to ksize(), where size is not a multiple of machine-word
+	 * size, would otherwise poison the invalid portion of the word.
+	 */
+	if (is_kfence_address(address))
+		return;
+
 	kasan_poison(address, size, tag);
 
 	if (size & KASAN_GRANULE_MASK) {
_



* [patch 063/118] kfence, Documentation: add KFENCE documentation
  2021-02-26  1:14 incoming Andrew Morton
                   ` (61 preceding siblings ...)
  2021-02-26  1:19 ` [patch 062/118] kfence, kasan: make KFENCE compatible with KASAN Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26  1:19 ` [patch 064/118] kfence: add test suite Andrew Morton
                   ` (55 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, bp, catalin.marinas, cl, corbet,
	dave.hansen, dvyukov, edumazet, elver, glider, gregkh, hdanton,
	hpa, iamjoonsoo.kim, jannh, joern, keescook, linux-mm, luto,
	mark.rutland, mingo, mm-commits, paulmck, penberg, peterz,
	rientjes, sjpark, tglx, torvalds, vbabka, will

From: Marco Elver <elver@google.com>
Subject: kfence, Documentation: add KFENCE documentation

Add KFENCE documentation in dev-tools/kfence.rst, and add it to the
documentation index.

[elver@google.com: add missing copyright header to documentation]
  Link: https://lkml.kernel.org/r/20210118092159.145934-4-elver@google.com
Link: https://lkml.kernel.org/r/20201103175841.3495947-8-elver@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/dev-tools/index.rst  |    1 
 Documentation/dev-tools/kfence.rst |  298 +++++++++++++++++++++++++++
 lib/Kconfig.kfence                 |    2 
 3 files changed, 301 insertions(+)

--- a/Documentation/dev-tools/index.rst~kfence-documentation-add-kfence-documentation
+++ a/Documentation/dev-tools/index.rst
@@ -22,6 +22,7 @@ whole; patches welcome!
    ubsan
    kmemleak
    kcsan
+   kfence
    gdb-kernel-debugging
    kgdb
    kselftest
--- /dev/null
+++ a/Documentation/dev-tools/kfence.rst
@@ -0,0 +1,298 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. Copyright (C) 2020, Google LLC.
+
+Kernel Electric-Fence (KFENCE)
+==============================
+
+Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
+error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
+invalid-free errors.
+
+KFENCE is designed to be enabled in production kernels, and has near zero
+performance overhead. Compared to KASAN, KFENCE trades performance for
+precision. The main motivation behind KFENCE's design, is that with enough
+total uptime KFENCE will detect bugs in code paths not typically exercised by
+non-production test workloads. One way to quickly achieve a large enough total
+uptime is when the tool is deployed across a large fleet of machines.
+
+Usage
+-----
+
+To enable KFENCE, configure the kernel with::
+
+    CONFIG_KFENCE=y
+
+To build a kernel with KFENCE support, but disabled by default (to enable, set
+``kfence.sample_interval`` to non-zero value), configure the kernel with::
+
+    CONFIG_KFENCE=y
+    CONFIG_KFENCE_SAMPLE_INTERVAL=0
+
+KFENCE provides several other configuration options to customize behaviour (see
+the respective help text in ``lib/Kconfig.kfence`` for more info).
+
+Tuning performance
+~~~~~~~~~~~~~~~~~~
+
+The most important parameter is KFENCE's sample interval, which can be set via
+the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
+sample interval determines the frequency with which heap allocations will be
+guarded by KFENCE. The default is configurable via the Kconfig option
+``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
+disables KFENCE.
+
+The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
+further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
+255), the number of available guarded objects can be controlled. Each object
+requires 2 pages, one for the object itself and the other one used as a guard
+page; object pages are interleaved with guard pages, and every object page is
+therefore surrounded by two guard pages.
+
+The total memory dedicated to the KFENCE memory pool can be computed as::
+
+    ( #objects + 1 ) * 2 * PAGE_SIZE
+
+Using the default config, and assuming a page size of 4 KiB, results in
+dedicating 2 MiB to the KFENCE memory pool.
+
+Note: On architectures that support huge pages, KFENCE will ensure that the
+pool is using pages of size ``PAGE_SIZE``. This will result in additional page
+tables being allocated.
+
+Error reports
+~~~~~~~~~~~~~
+
+A typical out-of-bounds access looks like this::
+
+    ==================================================================
+    BUG: KFENCE: out-of-bounds in test_out_of_bounds_read+0xa3/0x22b
+
+    Out-of-bounds access at 0xffffffffb672efff (1B left of kfence-#17):
+     test_out_of_bounds_read+0xa3/0x22b
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated by task 507:
+     test_alloc+0xf3/0x25b
+     test_out_of_bounds_read+0x98/0x22b
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+    ==================================================================
+
+The header of the report provides a short summary of the function involved in
+the access. It is followed by more detailed information about the access and
+its origin. Note that, real kernel addresses are only shown for
+``CONFIG_DEBUG_KERNEL=y`` builds.
+
+Use-after-free accesses are reported as::
+
+    ==================================================================
+    BUG: KFENCE: use-after-free in test_use_after_free_read+0xb3/0x143
+
+    Use-after-free access at 0xffffffffb673dfe0 (in kfence-#24):
+     test_use_after_free_read+0xb3/0x143
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated by task 507:
+     test_alloc+0xf3/0x25b
+     test_use_after_free_read+0x76/0x143
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    freed by task 507:
+     test_use_after_free_read+0xa8/0x143
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+    ==================================================================
+
+KFENCE also reports on invalid frees, such as double-frees::
+
+    ==================================================================
+    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171
+
+    Invalid free of 0xffffffffb6741000:
+     test_double_free+0xdc/0x171
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated by task 507:
+     test_alloc+0xf3/0x25b
+     test_double_free+0x76/0x171
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    freed by task 507:
+     test_double_free+0xa8/0x171
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+    ==================================================================
+
+KFENCE also uses pattern-based redzones on the other side of an object's guard
+page, to detect out-of-bounds writes on the unprotected side of the object.
+These are reported on frees::
+
+    ==================================================================
+    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184
+
+    Corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ] (in kfence-#69):
+     test_kmalloc_aligned_oob_write+0xef/0x184
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated by task 507:
+     test_alloc+0xf3/0x25b
+     test_kmalloc_aligned_oob_write+0x57/0x184
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+    ==================================================================
+
+For such errors, the address where the corruption occurred as well as the
+invalidly written bytes (offset from the address) are shown; in this
+representation, '.' denote untouched bytes. In the example above ``0xac`` is
+the value written to the invalid address at offset 0, and the remaining '.'
+denote that no following bytes have been touched. Note that, real values are
+only shown for ``CONFIG_DEBUG_KERNEL=y`` builds; to avoid information
+disclosure for non-debug builds, '!' is used instead to denote invalidly
+written bytes.
+
+And finally, KFENCE may also report on invalid accesses to any protected page
+where it was not possible to determine an associated object, e.g. if adjacent
+object pages had not yet been allocated::
+
+    ==================================================================
+    BUG: KFENCE: invalid access in test_invalid_access+0x26/0xe0
+
+    Invalid access at 0xffffffffb670b00a:
+     test_invalid_access+0x26/0xe0
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+    ==================================================================
+
+DebugFS interface
+~~~~~~~~~~~~~~~~~
+
+Some debugging information is exposed via debugfs:
+
+* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.
+
+* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
+  allocated via KFENCE, including those already freed but protected.
+
+Implementation Details
+----------------------
+
+Guarded allocations are set up based on the sample interval. After expiration
+of the sample interval, the next allocation through the main allocator (SLAB or
+SLUB) returns a guarded allocation from the KFENCE object pool (allocation
+sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
+the next allocation is set up after the expiration of the interval. To "gate" a
+KFENCE allocation through the main allocator's fast-path without overhead,
+KFENCE relies on static branches via the static keys infrastructure. The static
+branch is toggled to redirect the allocation to KFENCE.
+
+KFENCE objects each reside on a dedicated page, at either the left or right
+page boundaries selected at random. The pages to the left and right of the
+object page are "guard pages", whose attributes are changed to a protected
+state, and cause page faults on any attempted access. Such page faults are then
+intercepted by KFENCE, which handles the fault gracefully by reporting an
+out-of-bounds access, and marking the page as accessible so that the faulting
+code can (wrongly) continue executing (set ``panic_on_warn`` to panic instead).
+
+To detect out-of-bounds writes to memory within the object's page itself,
+KFENCE also uses pattern-based redzones. For each object page, a redzone is set
+up for all non-object memory. For typical alignments, the redzone is only
+required on the unguarded side of an object. Because KFENCE must honor the
+cache's requested alignment, special alignments may result in unprotected gaps
+on either side of an object, all of which are redzoned.
+
+The following figure illustrates the page layout::
+
+    ---+-----------+-----------+-----------+-----------+-----------+---
+       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
+       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
+       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
+       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
+       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
+       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
+    ---+-----------+-----------+-----------+-----------+-----------+---
+
+Upon deallocation of a KFENCE object, the object's page is again protected and
+the object is marked as freed. Any further access to the object causes a fault
+and KFENCE reports a use-after-free access. Freed objects are inserted at the
+tail of KFENCE's freelist, so that the least recently freed objects are reused
+first, and the chances of detecting use-after-frees of recently freed objects
+is increased.
+
+Interface
+---------
+
+The following describes the functions which are used by allocators as well as
+page handling code to set up and deal with KFENCE allocations.
+
+.. kernel-doc:: include/linux/kfence.h
+   :functions: is_kfence_address
+               kfence_shutdown_cache
+               kfence_alloc kfence_free __kfence_free
+               kfence_ksize kfence_object_start
+               kfence_handle_page_fault
+
+Related Tools
+-------------
+
+In userspace, a similar approach is taken by `GWP-ASan
+<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
+a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
+directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
+similar but non-sampling approach, that also inspired the name "KFENCE", can be
+found in the userspace `Electric Fence Malloc Debugger
+<https://linux.die.net/man/3/efence>`_.
+
+In the kernel, several tools exist to debug memory access errors, and in
+particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
+is more precise, relying on compiler instrumentation, this comes at a
+performance cost.
+
+It is worth highlighting that KASAN and KFENCE are complementary, with
+different target environments. For instance, KASAN is the better debugging-aid,
+where test cases or reproducers exists: due to the lower chance to detect the
+error, it would require more effort using KFENCE to debug. Deployments at scale
+that cannot afford to enable KASAN, however, would benefit from using KFENCE to
+discover bugs due to code paths not exercised by test cases or fuzzers.
--- a/lib/Kconfig.kfence~kfence-documentation-add-kfence-documentation
+++ a/lib/Kconfig.kfence
@@ -13,6 +13,8 @@ menuconfig KFENCE
 	  to have negligible cost to permit enabling it in production
 	  environments.
 
+	  See <file:Documentation/dev-tools/kfence.rst> for more details.
+
 	  Note that, KFENCE is not a substitute for explicit testing with tools
 	  such as KASAN. KFENCE can detect a subset of bugs that KASAN can
 	  detect, albeit at very different performance profiles. If you can
_



* [patch 064/118] kfence: add test suite
  2021-02-26  1:14 incoming Andrew Morton
                   ` (62 preceding siblings ...)
  2021-02-26  1:19 ` [patch 063/118] kfence, Documentation: add KFENCE documentation Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26  1:19 ` [patch 065/118] MAINTAINERS: add entry for KFENCE Andrew Morton
                   ` (54 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, bp, catalin.marinas, cl, corbet,
	dave.hansen, dvyukov, edumazet, elver, glider, gregkh, hdanton,
	hpa, iamjoonsoo.kim, jannh, joern, keescook, linux-mm, luto,
	mark.rutland, mingo, mm-commits, paulmck, penberg, peterz,
	rientjes, sjpark, tglx, torvalds, vbabka, will

From: Marco Elver <elver@google.com>
Subject: kfence: add test suite

Add a KFENCE test suite that exercises various error detection scenarios,
using KUnit for test organization. Since KFENCE's interface for obtaining
error reports is the console, the tests verify that KFENCE writes the
expected reports to the console.
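
As an aside, a small userspace sketch of the substring-matching idea the
test relies on (the observed lines are copied from the example reports in
the KFENCE documentation; the real report_matches() builds its expected
strings from an expect_report struct and captures lines via the 'console'
tracepoint):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Match captured console lines against the expected title and access line. */
    static bool report_matches(const char *lines[2], const char *want_title,
                               const char *want_access)
    {
            return strstr(lines[0], want_title) && strstr(lines[1], want_access);
    }

    int main(void)
    {
            const char *observed[2] = {
                    "BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa3/0x22b",
                    "Out-of-bounds read at 0xffffffffb672efff (1B left of kfence-#17):",
            };

            printf("match: %d\n",
                   report_matches(observed, "BUG: KFENCE: out-of-bounds read",
                                  "Out-of-bounds read at")); /* prints 1 */
            return 0;
    }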

[elver@google.com: fix typo in test]
  Link: https://lkml.kernel.org/r/X9lHQExmHGvETxY4@elver.google.com
[elver@google.com: show access type in report]
  Link: https://lkml.kernel.org/r/20210111091544.3287013-2-elver@google.com
Link: https://lkml.kernel.org/r/20201103175841.3495947-9-elver@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/dev-tools/kfence.rst |   12 
 arch/arm64/mm/fault.c              |    2 
 arch/x86/mm/fault.c                |    3 
 include/linux/kfence.h             |    9 
 lib/Kconfig.kfence                 |   13 
 mm/kfence/Makefile                 |    3 
 mm/kfence/core.c                   |   11 
 mm/kfence/kfence.h                 |    2 
 mm/kfence/kfence_test.c            |  858 +++++++++++++++++++++++++++
 mm/kfence/report.c                 |   27 
 10 files changed, 915 insertions(+), 25 deletions(-)

--- a/arch/arm64/mm/fault.c~kfence-add-test-suite
+++ a/arch/arm64/mm/fault.c
@@ -390,7 +390,7 @@ static void __do_kernel_fault(unsigned l
 	} else if (addr < PAGE_SIZE) {
 		msg = "NULL pointer dereference";
 	} else {
-		if (kfence_handle_page_fault(addr, regs))
+		if (kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
 			return;
 
 		msg = "paging request";
--- a/arch/x86/mm/fault.c~kfence-add-test-suite
+++ a/arch/x86/mm/fault.c
@@ -682,7 +682,8 @@ page_fault_oops(struct pt_regs *regs, un
 		efi_crash_gracefully_on_page_fault(address);
 
 	/* Only not-present faults should be handled by KFENCE. */
-	if (!(error_code & X86_PF_PROT) && kfence_handle_page_fault(address, regs))
+	if (!(error_code & X86_PF_PROT) &&
+	    kfence_handle_page_fault(address, error_code & X86_PF_WRITE, regs))
 		return;
 
 oops:
--- a/Documentation/dev-tools/kfence.rst~kfence-add-test-suite
+++ a/Documentation/dev-tools/kfence.rst
@@ -65,9 +65,9 @@ Error reports
 A typical out-of-bounds access looks like this::
 
     ==================================================================
-    BUG: KFENCE: out-of-bounds in test_out_of_bounds_read+0xa3/0x22b
+    BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa3/0x22b
 
-    Out-of-bounds access at 0xffffffffb672efff (1B left of kfence-#17):
+    Out-of-bounds read at 0xffffffffb672efff (1B left of kfence-#17):
      test_out_of_bounds_read+0xa3/0x22b
      kunit_try_run_case+0x51/0x85
      kunit_generic_run_threadfn_adapter+0x16/0x30
@@ -94,9 +94,9 @@ its origin. Note that, real kernel addre
 Use-after-free accesses are reported as::
 
     ==================================================================
-    BUG: KFENCE: use-after-free in test_use_after_free_read+0xb3/0x143
+    BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143
 
-    Use-after-free access at 0xffffffffb673dfe0 (in kfence-#24):
+    Use-after-free read at 0xffffffffb673dfe0 (in kfence-#24):
      test_use_after_free_read+0xb3/0x143
      kunit_try_run_case+0x51/0x85
      kunit_generic_run_threadfn_adapter+0x16/0x30
@@ -193,9 +193,9 @@ where it was not possible to determine a
 object pages had not yet been allocated::
 
     ==================================================================
-    BUG: KFENCE: invalid access in test_invalid_access+0x26/0xe0
+    BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0
 
-    Invalid access at 0xffffffffb670b00a:
+    Invalid read at 0xffffffffb670b00a:
      test_invalid_access+0x26/0xe0
      kunit_try_run_case+0x51/0x85
      kunit_generic_run_threadfn_adapter+0x16/0x30
--- a/include/linux/kfence.h~kfence-add-test-suite
+++ a/include/linux/kfence.h
@@ -186,6 +186,7 @@ static __always_inline __must_check bool
 /**
  * kfence_handle_page_fault() - perform page fault handling for KFENCE pages
  * @addr: faulting address
+ * @is_write: is access a write
  * @regs: current struct pt_regs (can be NULL, but shows full stack trace)
  *
  * Return:
@@ -197,7 +198,7 @@ static __always_inline __must_check bool
  * cases KFENCE prints an error message and marks the offending page as
  * present, so that the kernel can proceed.
  */
-bool __must_check kfence_handle_page_fault(unsigned long addr, struct pt_regs *regs);
+bool __must_check kfence_handle_page_fault(unsigned long addr, bool is_write, struct pt_regs *regs);
 
 #else /* CONFIG_KFENCE */
 
@@ -210,7 +211,11 @@ static inline size_t kfence_ksize(const
 static inline void *kfence_object_start(const void *addr) { return NULL; }
 static inline void __kfence_free(void *addr) { }
 static inline bool __must_check kfence_free(void *addr) { return false; }
-static inline bool __must_check kfence_handle_page_fault(unsigned long addr, struct pt_regs *regs) { return false; }
+static inline bool __must_check kfence_handle_page_fault(unsigned long addr, bool is_write,
+							 struct pt_regs *regs)
+{
+	return false;
+}
 
 #endif
 
--- a/lib/Kconfig.kfence~kfence-add-test-suite
+++ a/lib/Kconfig.kfence
@@ -66,4 +66,17 @@ config KFENCE_STRESS_TEST_FAULTS
 
 	  Only for KFENCE testing; set to 0 if you are not a KFENCE developer.
 
+config KFENCE_KUNIT_TEST
+	tristate "KFENCE integration test suite" if !KUNIT_ALL_TESTS
+	default KUNIT_ALL_TESTS
+	depends on TRACEPOINTS && KUNIT
+	help
+	  Test suite for KFENCE, testing various error detection scenarios with
+	  various allocation types, and checking that reports are correctly
+	  output to console.
+
+	  Say Y here if you want the test to be built into the kernel and run
+	  during boot; say M if you want the test to build as a module; say N
+	  if you are unsure.
+
 endif # KFENCE
--- a/mm/kfence/core.c~kfence-add-test-suite
+++ a/mm/kfence/core.c
@@ -216,7 +216,7 @@ static inline bool check_canary_byte(u8
 		return true;
 
 	atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
-	kfence_report_error((unsigned long)addr, NULL, addr_to_metadata((unsigned long)addr),
+	kfence_report_error((unsigned long)addr, false, NULL, addr_to_metadata((unsigned long)addr),
 			    KFENCE_ERROR_CORRUPTION);
 	return false;
 }
@@ -355,7 +355,8 @@ static void kfence_guarded_free(void *ad
 	if (meta->state != KFENCE_OBJECT_ALLOCATED || meta->addr != (unsigned long)addr) {
 		/* Invalid or double-free, bail out. */
 		atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
-		kfence_report_error((unsigned long)addr, NULL, meta, KFENCE_ERROR_INVALID_FREE);
+		kfence_report_error((unsigned long)addr, false, NULL, meta,
+				    KFENCE_ERROR_INVALID_FREE);
 		raw_spin_unlock_irqrestore(&meta->lock, flags);
 		return;
 	}
@@ -770,7 +771,7 @@ void __kfence_free(void *addr)
 		kfence_guarded_free(addr, meta, false);
 }
 
-bool kfence_handle_page_fault(unsigned long addr, struct pt_regs *regs)
+bool kfence_handle_page_fault(unsigned long addr, bool is_write, struct pt_regs *regs)
 {
 	const int page_index = (addr - (unsigned long)__kfence_pool) / PAGE_SIZE;
 	struct kfence_metadata *to_report = NULL;
@@ -833,11 +834,11 @@ bool kfence_handle_page_fault(unsigned l
 
 out:
 	if (to_report) {
-		kfence_report_error(addr, regs, to_report, error_type);
+		kfence_report_error(addr, is_write, regs, to_report, error_type);
 		raw_spin_unlock_irqrestore(&to_report->lock, flags);
 	} else {
 		/* This may be a UAF or OOB access, but we can't be sure. */
-		kfence_report_error(addr, regs, NULL, KFENCE_ERROR_INVALID);
+		kfence_report_error(addr, is_write, regs, NULL, KFENCE_ERROR_INVALID);
 	}
 
 	return kfence_unprotect(addr); /* Unprotect and let access proceed. */
--- a/mm/kfence/kfence.h~kfence-add-test-suite
+++ a/mm/kfence/kfence.h
@@ -105,7 +105,7 @@ enum kfence_error_type {
 	KFENCE_ERROR_INVALID_FREE,	/* Invalid free. */
 };
 
-void kfence_report_error(unsigned long address, struct pt_regs *regs,
+void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *regs,
 			 const struct kfence_metadata *meta, enum kfence_error_type type);
 
 void kfence_print_object(struct seq_file *seq, const struct kfence_metadata *meta);
--- /dev/null
+++ a/mm/kfence/kfence_test.c
@@ -0,0 +1,858 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test cases for KFENCE memory safety error detector. Since the interface with
+ * which KFENCE's reports are obtained is via the console, this is the output we
+ * should verify. For each test case checks the presence (or absence) of
+ * generated reports. Relies on 'console' tracepoint to capture reports as they
+ * appear in the kernel log.
+ *
+ * Copyright (C) 2020, Google LLC.
+ * Author: Alexander Potapenko <glider@google.com>
+ *         Marco Elver <elver@google.com>
+ */
+
+#include <kunit/test.h>
+#include <linux/jiffies.h>
+#include <linux/kernel.h>
+#include <linux/kfence.h>
+#include <linux/mm.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/tracepoint.h>
+#include <trace/events/printk.h>
+
+#include "kfence.h"
+
+/* Report as observed from console. */
+static struct {
+	spinlock_t lock;
+	int nlines;
+	char lines[2][256];
+} observed = {
+	.lock = __SPIN_LOCK_UNLOCKED(observed.lock),
+};
+
+/* Probe for console output: obtains observed lines of interest. */
+static void probe_console(void *ignore, const char *buf, size_t len)
+{
+	unsigned long flags;
+	int nlines;
+
+	spin_lock_irqsave(&observed.lock, flags);
+	nlines = observed.nlines;
+
+	if (strnstr(buf, "BUG: KFENCE: ", len) && strnstr(buf, "test_", len)) {
+		/*
+		 * KFENCE report and related to the test.
+		 *
+		 * The provided @buf is not NUL-terminated; copy no more than
+		 * @len bytes and let strscpy() add the missing NUL-terminator.
+		 */
+		strscpy(observed.lines[0], buf, min(len + 1, sizeof(observed.lines[0])));
+		nlines = 1;
+	} else if (nlines == 1 && (strnstr(buf, "at 0x", len) || strnstr(buf, "of 0x", len))) {
+		strscpy(observed.lines[nlines++], buf, min(len + 1, sizeof(observed.lines[0])));
+	}
+
+	WRITE_ONCE(observed.nlines, nlines); /* Publish new nlines. */
+	spin_unlock_irqrestore(&observed.lock, flags);
+}
+
+/* Check if a report related to the test exists. */
+static bool report_available(void)
+{
+	return READ_ONCE(observed.nlines) == ARRAY_SIZE(observed.lines);
+}
+
+/* Information we expect in a report. */
+struct expect_report {
+	enum kfence_error_type type; /* The type of error. */
+	void *fn; /* Function pointer to expected function where access occurred. */
+	char *addr; /* Address at which the bad access occurred. */
+	bool is_write; /* Is access a write. */
+};
+
+static const char *get_access_type(const struct expect_report *r)
+{
+	return r->is_write ? "write" : "read";
+}
+
+/* Check observed report matches information in @r. */
+static bool report_matches(const struct expect_report *r)
+{
+	bool ret = false;
+	unsigned long flags;
+	typeof(observed.lines) expect;
+	const char *end;
+	char *cur;
+
+	/* Double-checked locking. */
+	if (!report_available())
+		return false;
+
+	/* Generate expected report contents. */
+
+	/* Title */
+	cur = expect[0];
+	end = &expect[0][sizeof(expect[0]) - 1];
+	switch (r->type) {
+	case KFENCE_ERROR_OOB:
+		cur += scnprintf(cur, end - cur, "BUG: KFENCE: out-of-bounds %s",
+				 get_access_type(r));
+		break;
+	case KFENCE_ERROR_UAF:
+		cur += scnprintf(cur, end - cur, "BUG: KFENCE: use-after-free %s",
+				 get_access_type(r));
+		break;
+	case KFENCE_ERROR_CORRUPTION:
+		cur += scnprintf(cur, end - cur, "BUG: KFENCE: memory corruption");
+		break;
+	case KFENCE_ERROR_INVALID:
+		cur += scnprintf(cur, end - cur, "BUG: KFENCE: invalid %s",
+				 get_access_type(r));
+		break;
+	case KFENCE_ERROR_INVALID_FREE:
+		cur += scnprintf(cur, end - cur, "BUG: KFENCE: invalid free");
+		break;
+	}
+
+	scnprintf(cur, end - cur, " in %pS", r->fn);
+	/* The exact offset won't match, remove it; also strip module name. */
+	cur = strchr(expect[0], '+');
+	if (cur)
+		*cur = '\0';
+
+	/* Access information */
+	cur = expect[1];
+	end = &expect[1][sizeof(expect[1]) - 1];
+
+	switch (r->type) {
+	case KFENCE_ERROR_OOB:
+		cur += scnprintf(cur, end - cur, "Out-of-bounds %s at", get_access_type(r));
+		break;
+	case KFENCE_ERROR_UAF:
+		cur += scnprintf(cur, end - cur, "Use-after-free %s at", get_access_type(r));
+		break;
+	case KFENCE_ERROR_CORRUPTION:
+		cur += scnprintf(cur, end - cur, "Corrupted memory at");
+		break;
+	case KFENCE_ERROR_INVALID:
+		cur += scnprintf(cur, end - cur, "Invalid %s at", get_access_type(r));
+		break;
+	case KFENCE_ERROR_INVALID_FREE:
+		cur += scnprintf(cur, end - cur, "Invalid free of");
+		break;
+	}
+
+	cur += scnprintf(cur, end - cur, " 0x" PTR_FMT, (void *)r->addr);
+
+	spin_lock_irqsave(&observed.lock, flags);
+	if (!report_available())
+		goto out; /* A new report is being captured. */
+
+	/* Finally match expected output to what we actually observed. */
+	ret = strstr(observed.lines[0], expect[0]) && strstr(observed.lines[1], expect[1]);
+out:
+	spin_unlock_irqrestore(&observed.lock, flags);
+	return ret;
+}
+
+/* ===== Test cases ===== */
+
+#define TEST_PRIV_WANT_MEMCACHE ((void *)1)
+
+/* Cache used by tests; if NULL, allocate from kmalloc instead. */
+static struct kmem_cache *test_cache;
+
+static size_t setup_test_cache(struct kunit *test, size_t size, slab_flags_t flags,
+			       void (*ctor)(void *))
+{
+	if (test->priv != TEST_PRIV_WANT_MEMCACHE)
+		return size;
+
+	kunit_info(test, "%s: size=%zu, ctor=%ps\n", __func__, size, ctor);
+
+	/*
+	 * Use SLAB_NOLEAKTRACE to prevent merging with existing caches. Any
+	 * other flag in SLAB_NEVER_MERGE also works. Use SLAB_ACCOUNT to
+	 * allocate via memcg, if enabled.
+	 */
+	flags |= SLAB_NOLEAKTRACE | SLAB_ACCOUNT;
+	test_cache = kmem_cache_create("test", size, 1, flags, ctor);
+	KUNIT_ASSERT_TRUE_MSG(test, test_cache, "could not create cache");
+
+	return size;
+}
+
+static void test_cache_destroy(void)
+{
+	if (!test_cache)
+		return;
+
+	kmem_cache_destroy(test_cache);
+	test_cache = NULL;
+}
+
+static inline size_t kmalloc_cache_alignment(size_t size)
+{
+	return kmalloc_caches[kmalloc_type(GFP_KERNEL)][kmalloc_index(size)]->align;
+}
+
+/* Must always inline to match stack trace against caller. */
+static __always_inline void test_free(void *ptr)
+{
+	if (test_cache)
+		kmem_cache_free(test_cache, ptr);
+	else
+		kfree(ptr);
+}
+
+/*
+ * If this should be a KFENCE allocation, and on which side the allocation and
+ * the closest guard page should be.
+ */
+enum allocation_policy {
+	ALLOCATE_ANY, /* KFENCE, any side. */
+	ALLOCATE_LEFT, /* KFENCE, left side of page. */
+	ALLOCATE_RIGHT, /* KFENCE, right side of page. */
+	ALLOCATE_NONE, /* No KFENCE allocation. */
+};
+
+/*
+ * Try to get a guarded allocation from KFENCE. Uses either kmalloc() or the
+ * current test_cache if set up.
+ */
+static void *test_alloc(struct kunit *test, size_t size, gfp_t gfp, enum allocation_policy policy)
+{
+	void *alloc;
+	unsigned long timeout, resched_after;
+	const char *policy_name;
+
+	switch (policy) {
+	case ALLOCATE_ANY:
+		policy_name = "any";
+		break;
+	case ALLOCATE_LEFT:
+		policy_name = "left";
+		break;
+	case ALLOCATE_RIGHT:
+		policy_name = "right";
+		break;
+	case ALLOCATE_NONE:
+		policy_name = "none";
+		break;
+	}
+
+	kunit_info(test, "%s: size=%zu, gfp=%x, policy=%s, cache=%i\n", __func__, size, gfp,
+		   policy_name, !!test_cache);
+
+	/*
+	 * 100x the sample interval should be more than enough to ensure we get
+	 * a KFENCE allocation eventually.
+	 */
+	timeout = jiffies + msecs_to_jiffies(100 * CONFIG_KFENCE_SAMPLE_INTERVAL);
+	/*
+	 * Especially for non-preemption kernels, ensure the allocation-gate
+	 * timer can catch up: after @resched_after, every failed allocation
+	 * attempt yields, to ensure the allocation-gate timer is scheduled.
+	 */
+	resched_after = jiffies + msecs_to_jiffies(CONFIG_KFENCE_SAMPLE_INTERVAL);
+	do {
+		if (test_cache)
+			alloc = kmem_cache_alloc(test_cache, gfp);
+		else
+			alloc = kmalloc(size, gfp);
+
+		if (is_kfence_address(alloc)) {
+			struct page *page = virt_to_head_page(alloc);
+			struct kmem_cache *s = test_cache ?: kmalloc_caches[kmalloc_type(GFP_KERNEL)][kmalloc_index(size)];
+
+			/*
+			 * Verify that various helpers return the right values
+			 * even for KFENCE objects; these are required so that
+			 * memcg accounting works correctly.
+			 */
+			KUNIT_EXPECT_EQ(test, obj_to_index(s, page, alloc), 0U);
+			KUNIT_EXPECT_EQ(test, objs_per_slab_page(s, page), 1);
+
+			if (policy == ALLOCATE_ANY)
+				return alloc;
+			if (policy == ALLOCATE_LEFT && IS_ALIGNED((unsigned long)alloc, PAGE_SIZE))
+				return alloc;
+			if (policy == ALLOCATE_RIGHT &&
+			    !IS_ALIGNED((unsigned long)alloc, PAGE_SIZE))
+				return alloc;
+		} else if (policy == ALLOCATE_NONE)
+			return alloc;
+
+		test_free(alloc);
+
+		if (time_after(jiffies, resched_after))
+			cond_resched();
+	} while (time_before(jiffies, timeout));
+
+	KUNIT_ASSERT_TRUE_MSG(test, false, "failed to allocate from KFENCE");
+	return NULL; /* Unreachable. */
+}
+
+static void test_out_of_bounds_read(struct kunit *test)
+{
+	size_t size = 32;
+	struct expect_report expect = {
+		.type = KFENCE_ERROR_OOB,
+		.fn = test_out_of_bounds_read,
+		.is_write = false,
+	};
+	char *buf;
+
+	setup_test_cache(test, size, 0, NULL);
+
+	/*
+	 * If we don't have our own cache, adjust based on alignment, so that we
+	 * actually access guard pages on either side.
+	 */
+	if (!test_cache)
+		size = kmalloc_cache_alignment(size);
+
+	/* Test both sides. */
+
+	buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT);
+	expect.addr = buf - 1;
+	READ_ONCE(*expect.addr);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+	test_free(buf);
+
+	buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT);
+	expect.addr = buf + size;
+	READ_ONCE(*expect.addr);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+	test_free(buf);
+}
+
+static void test_out_of_bounds_write(struct kunit *test)
+{
+	size_t size = 32;
+	struct expect_report expect = {
+		.type = KFENCE_ERROR_OOB,
+		.fn = test_out_of_bounds_write,
+		.is_write = true,
+	};
+	char *buf;
+
+	setup_test_cache(test, size, 0, NULL);
+	buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT);
+	expect.addr = buf - 1;
+	WRITE_ONCE(*expect.addr, 42);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+	test_free(buf);
+}
+
+static void test_use_after_free_read(struct kunit *test)
+{
+	const size_t size = 32;
+	struct expect_report expect = {
+		.type = KFENCE_ERROR_UAF,
+		.fn = test_use_after_free_read,
+		.is_write = false,
+	};
+
+	setup_test_cache(test, size, 0, NULL);
+	expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+	test_free(expect.addr);
+	READ_ONCE(*expect.addr);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_double_free(struct kunit *test)
+{
+	const size_t size = 32;
+	struct expect_report expect = {
+		.type = KFENCE_ERROR_INVALID_FREE,
+		.fn = test_double_free,
+	};
+
+	setup_test_cache(test, size, 0, NULL);
+	expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+	test_free(expect.addr);
+	test_free(expect.addr); /* Double-free. */
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_invalid_addr_free(struct kunit *test)
+{
+	const size_t size = 32;
+	struct expect_report expect = {
+		.type = KFENCE_ERROR_INVALID_FREE,
+		.fn = test_invalid_addr_free,
+	};
+	char *buf;
+
+	setup_test_cache(test, size, 0, NULL);
+	buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+	expect.addr = buf + 1; /* Free on invalid address. */
+	test_free(expect.addr); /* Invalid address free. */
+	test_free(buf); /* No error. */
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_corruption(struct kunit *test)
+{
+	size_t size = 32;
+	struct expect_report expect = {
+		.type = KFENCE_ERROR_CORRUPTION,
+		.fn = test_corruption,
+	};
+	char *buf;
+
+	setup_test_cache(test, size, 0, NULL);
+
+	/* Test both sides. */
+
+	buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT);
+	expect.addr = buf + size;
+	WRITE_ONCE(*expect.addr, 42);
+	test_free(buf);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+
+	buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT);
+	expect.addr = buf - 1;
+	WRITE_ONCE(*expect.addr, 42);
+	test_free(buf);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * KFENCE is unable to detect an OOB if the allocation's alignment requirements
+ * leave a gap between the object and the guard page. Specifically, an
+ * allocation of e.g. 73 bytes is aligned on 8 and 128 bytes for SLUB or SLAB
+ * respectively. Therefore it is impossible for the allocated object to
+ * contiguously line up with the right guard page.
+ *
+ * However, we test that an access to memory beyond the gap results in KFENCE
+ * detecting an OOB access.
+ */
+static void test_kmalloc_aligned_oob_read(struct kunit *test)
+{
+	const size_t size = 73;
+	const size_t align = kmalloc_cache_alignment(size);
+	struct expect_report expect = {
+		.type = KFENCE_ERROR_OOB,
+		.fn = test_kmalloc_aligned_oob_read,
+		.is_write = false,
+	};
+	char *buf;
+
+	buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT);
+
+	/*
+	 * The object is offset to the right, so there won't be an OOB to the
+	 * left of it.
+	 */
+	READ_ONCE(*(buf - 1));
+	KUNIT_EXPECT_FALSE(test, report_available());
+
+	/*
+	 * @buf must be aligned on @align, therefore buf + size belongs to the
+	 * same page -> no OOB.
+	 */
+	READ_ONCE(*(buf + size));
+	KUNIT_EXPECT_FALSE(test, report_available());
+
+	/* Overflowing by @align bytes will result in an OOB. */
+	expect.addr = buf + size + align;
+	READ_ONCE(*expect.addr);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+
+	test_free(buf);
+}
+
+static void test_kmalloc_aligned_oob_write(struct kunit *test)
+{
+	const size_t size = 73;
+	struct expect_report expect = {
+		.type = KFENCE_ERROR_CORRUPTION,
+		.fn = test_kmalloc_aligned_oob_write,
+	};
+	char *buf;
+
+	buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT);
+	/*
+	 * The object is offset to the right, so we won't get a page
+	 * fault immediately after it.
+	 */
+	expect.addr = buf + size;
+	WRITE_ONCE(*expect.addr, READ_ONCE(*expect.addr) + 1);
+	KUNIT_EXPECT_FALSE(test, report_available());
+	test_free(buf);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test cache shrinking and destroying with KFENCE. */
+static void test_shrink_memcache(struct kunit *test)
+{
+	const size_t size = 32;
+	void *buf;
+
+	setup_test_cache(test, size, 0, NULL);
+	KUNIT_EXPECT_TRUE(test, test_cache);
+	buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+	kmem_cache_shrink(test_cache);
+	test_free(buf);
+
+	KUNIT_EXPECT_FALSE(test, report_available());
+}
+
+static void ctor_set_x(void *obj)
+{
+	/* Every object has at least 8 bytes. */
+	memset(obj, 'x', 8);
+}
+
+/* Ensure that SL*B does not modify KFENCE objects on bulk free. */
+static void test_free_bulk(struct kunit *test)
+{
+	int iter;
+
+	for (iter = 0; iter < 5; iter++) {
+		const size_t size = setup_test_cache(test, 8 + prandom_u32_max(300), 0,
+						     (iter & 1) ? ctor_set_x : NULL);
+		void *objects[] = {
+			test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT),
+			test_alloc(test, size, GFP_KERNEL, ALLOCATE_NONE),
+			test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT),
+			test_alloc(test, size, GFP_KERNEL, ALLOCATE_NONE),
+			test_alloc(test, size, GFP_KERNEL, ALLOCATE_NONE),
+		};
+
+		kmem_cache_free_bulk(test_cache, ARRAY_SIZE(objects), objects);
+		KUNIT_ASSERT_FALSE(test, report_available());
+		test_cache_destroy();
+	}
+}
+
+/* Test init-on-free works. */
+static void test_init_on_free(struct kunit *test)
+{
+	const size_t size = 32;
+	struct expect_report expect = {
+		.type = KFENCE_ERROR_UAF,
+		.fn = test_init_on_free,
+		.is_write = false,
+	};
+	int i;
+
+	if (!IS_ENABLED(CONFIG_INIT_ON_FREE_DEFAULT_ON))
+		return;
+	/* Assume it hasn't been disabled on command line. */
+
+	setup_test_cache(test, size, 0, NULL);
+	expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+	for (i = 0; i < size; i++)
+		expect.addr[i] = i + 1;
+	test_free(expect.addr);
+
+	for (i = 0; i < size; i++) {
+		/*
+		 * This may fail if the page was recycled by KFENCE and then
+		 * written to again -- this however, is near impossible with a
+		 * default config.
+		 */
+		KUNIT_EXPECT_EQ(test, expect.addr[i], (char)0);
+
+		if (!i) /* Only check first access to not fail test if page is ever re-protected. */
+			KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+	}
+}
+
+/* Ensure that constructors work properly. */
+static void test_memcache_ctor(struct kunit *test)
+{
+	const size_t size = 32;
+	char *buf;
+	int i;
+
+	setup_test_cache(test, size, 0, ctor_set_x);
+	buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+
+	for (i = 0; i < 8; i++)
+		KUNIT_EXPECT_EQ(test, buf[i], (char)'x');
+
+	test_free(buf);
+
+	KUNIT_EXPECT_FALSE(test, report_available());
+}
+
+/* Test that memory is zeroed if requested. */
+static void test_gfpzero(struct kunit *test)
+{
+	const size_t size = PAGE_SIZE; /* PAGE_SIZE so we can use ALLOCATE_ANY. */
+	char *buf1, *buf2;
+	int i;
+
+	if (CONFIG_KFENCE_SAMPLE_INTERVAL > 100) {
+		kunit_warn(test, "skipping ... would take too long\n");
+		return;
+	}
+
+	setup_test_cache(test, size, 0, NULL);
+	buf1 = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+	for (i = 0; i < size; i++)
+		buf1[i] = i + 1;
+	test_free(buf1);
+
+	/* Try to get same address again -- this can take a while. */
+	for (i = 0;; i++) {
+		buf2 = test_alloc(test, size, GFP_KERNEL | __GFP_ZERO, ALLOCATE_ANY);
+		if (buf1 == buf2)
+			break;
+		test_free(buf2);
+
+		if (i == CONFIG_KFENCE_NUM_OBJECTS) {
+			kunit_warn(test, "giving up ... cannot get same object back\n");
+			return;
+		}
+	}
+
+	for (i = 0; i < size; i++)
+		KUNIT_EXPECT_EQ(test, buf2[i], (char)0);
+
+	test_free(buf2);
+
+	KUNIT_EXPECT_FALSE(test, report_available());
+}
+
+static void test_invalid_access(struct kunit *test)
+{
+	const struct expect_report expect = {
+		.type = KFENCE_ERROR_INVALID,
+		.fn = test_invalid_access,
+		.addr = &__kfence_pool[10],
+		.is_write = false,
+	};
+
+	READ_ONCE(__kfence_pool[10]);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test SLAB_TYPESAFE_BY_RCU works. */
+static void test_memcache_typesafe_by_rcu(struct kunit *test)
+{
+	const size_t size = 32;
+	struct expect_report expect = {
+		.type = KFENCE_ERROR_UAF,
+		.fn = test_memcache_typesafe_by_rcu,
+		.is_write = false,
+	};
+
+	setup_test_cache(test, size, SLAB_TYPESAFE_BY_RCU, NULL);
+	KUNIT_EXPECT_TRUE(test, test_cache); /* Want memcache. */
+
+	expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+	*expect.addr = 42;
+
+	rcu_read_lock();
+	test_free(expect.addr);
+	KUNIT_EXPECT_EQ(test, *expect.addr, (char)42);
+	/*
+	 * Up to this point, memory should not have been freed yet, and
+	 * therefore there should be no KFENCE report from the above access.
+	 */
+	rcu_read_unlock();
+
+	/* Above access to @expect.addr should not have generated a report! */
+	KUNIT_EXPECT_FALSE(test, report_available());
+
+	/* Only after rcu_barrier() is the memory guaranteed to be freed. */
+	rcu_barrier();
+
+	/* Expect use-after-free. */
+	KUNIT_EXPECT_EQ(test, *expect.addr, (char)42);
+	KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test krealloc(). */
+static void test_krealloc(struct kunit *test)
+{
+	const size_t size = 32;
+	const struct expect_report expect = {
+		.type = KFENCE_ERROR_UAF,
+		.fn = test_krealloc,
+		.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY),
+		.is_write = false,
+	};
+	char *buf = expect.addr;
+	int i;
+
+	KUNIT_EXPECT_FALSE(test, test_cache);
+	KUNIT_EXPECT_EQ(test, ksize(buf), size); /* Precise size match after KFENCE alloc. */
+	for (i = 0; i < size; i++)
+		buf[i] = i + 1;
+
+	/* Check that we successfully change the size. */
+	buf = krealloc(buf, size * 3, GFP_KERNEL); /* Grow. */
+	/* Note: Might no longer be a KFENCE alloc. */
+	KUNIT_EXPECT_GE(test, ksize(buf), size * 3);
+	for (i = 0; i < size; i++)
+		KUNIT_EXPECT_EQ(test, buf[i], (char)(i + 1));
+	for (; i < size * 3; i++) /* Fill to extra bytes. */
+		buf[i] = i + 1;
+
+	buf = krealloc(buf, size * 2, GFP_KERNEL); /* Shrink. */
+	KUNIT_EXPECT_GE(test, ksize(buf), size * 2);
+	for (i = 0; i < size * 2; i++)
+		KUNIT_EXPECT_EQ(test, buf[i], (char)(i + 1));
+
+	buf = krealloc(buf, 0, GFP_KERNEL); /* Free. */
+	KUNIT_EXPECT_EQ(test, (unsigned long)buf, (unsigned long)ZERO_SIZE_PTR);
+	KUNIT_ASSERT_FALSE(test, report_available()); /* No reports yet! */
+
+	READ_ONCE(*expect.addr); /* Ensure krealloc() actually freed earlier KFENCE object. */
+	KUNIT_ASSERT_TRUE(test, report_matches(&expect));
+}
+
+/* Test that some objects from a bulk allocation belong to KFENCE pool. */
+static void test_memcache_alloc_bulk(struct kunit *test)
+{
+	const size_t size = 32;
+	bool pass = false;
+	unsigned long timeout;
+
+	setup_test_cache(test, size, 0, NULL);
+	KUNIT_EXPECT_TRUE(test, test_cache); /* Want memcache. */
+	/*
+	 * 100x the sample interval should be more than enough to ensure we get
+	 * a KFENCE allocation eventually.
+	 */
+	timeout = jiffies + msecs_to_jiffies(100 * CONFIG_KFENCE_SAMPLE_INTERVAL);
+	do {
+		void *objects[100];
+		int i, num = kmem_cache_alloc_bulk(test_cache, GFP_ATOMIC, ARRAY_SIZE(objects),
+						   objects);
+		if (!num)
+			continue;
+		for (i = 0; i < ARRAY_SIZE(objects); i++) {
+			if (is_kfence_address(objects[i])) {
+				pass = true;
+				break;
+			}
+		}
+		kmem_cache_free_bulk(test_cache, num, objects);
+		/*
+		 * kmem_cache_alloc_bulk() disables interrupts, and calling it
+		 * in a tight loop may not give KFENCE a chance to switch the
+		 * static branch. Call cond_resched() to let KFENCE chime in.
+		 */
+		cond_resched();
+	} while (!pass && time_before(jiffies, timeout));
+
+	KUNIT_EXPECT_TRUE(test, pass);
+	KUNIT_EXPECT_FALSE(test, report_available());
+}
+
+/*
+ * KUnit does not provide a way to provide arguments to tests, and we encode
+ * additional info in the name. Set up 2 tests per test case, one using the
+ * default allocator, and another using a custom memcache (suffix '-memcache').
+ */
+#define KFENCE_KUNIT_CASE(test_name)						\
+	{ .run_case = test_name, .name = #test_name },				\
+	{ .run_case = test_name, .name = #test_name "-memcache" }
+
+static struct kunit_case kfence_test_cases[] = {
+	KFENCE_KUNIT_CASE(test_out_of_bounds_read),
+	KFENCE_KUNIT_CASE(test_out_of_bounds_write),
+	KFENCE_KUNIT_CASE(test_use_after_free_read),
+	KFENCE_KUNIT_CASE(test_double_free),
+	KFENCE_KUNIT_CASE(test_invalid_addr_free),
+	KFENCE_KUNIT_CASE(test_corruption),
+	KFENCE_KUNIT_CASE(test_free_bulk),
+	KFENCE_KUNIT_CASE(test_init_on_free),
+	KUNIT_CASE(test_kmalloc_aligned_oob_read),
+	KUNIT_CASE(test_kmalloc_aligned_oob_write),
+	KUNIT_CASE(test_shrink_memcache),
+	KUNIT_CASE(test_memcache_ctor),
+	KUNIT_CASE(test_invalid_access),
+	KUNIT_CASE(test_gfpzero),
+	KUNIT_CASE(test_memcache_typesafe_by_rcu),
+	KUNIT_CASE(test_krealloc),
+	KUNIT_CASE(test_memcache_alloc_bulk),
+	{},
+};
+
+/* ===== End test cases ===== */
+
+static int test_init(struct kunit *test)
+{
+	unsigned long flags;
+	int i;
+
+	spin_lock_irqsave(&observed.lock, flags);
+	for (i = 0; i < ARRAY_SIZE(observed.lines); i++)
+		observed.lines[i][0] = '\0';
+	observed.nlines = 0;
+	spin_unlock_irqrestore(&observed.lock, flags);
+
+	/* Any test with 'memcache' in its name will want a memcache. */
+	if (strstr(test->name, "memcache"))
+		test->priv = TEST_PRIV_WANT_MEMCACHE;
+	else
+		test->priv = NULL;
+
+	return 0;
+}
+
+static void test_exit(struct kunit *test)
+{
+	test_cache_destroy();
+}
+
+static struct kunit_suite kfence_test_suite = {
+	.name = "kfence",
+	.test_cases = kfence_test_cases,
+	.init = test_init,
+	.exit = test_exit,
+};
+static struct kunit_suite *kfence_test_suites[] = { &kfence_test_suite, NULL };
+
+static void register_tracepoints(struct tracepoint *tp, void *ignore)
+{
+	check_trace_callback_type_console(probe_console);
+	if (!strcmp(tp->name, "console"))
+		WARN_ON(tracepoint_probe_register(tp, probe_console, NULL));
+}
+
+static void unregister_tracepoints(struct tracepoint *tp, void *ignore)
+{
+	if (!strcmp(tp->name, "console"))
+		tracepoint_probe_unregister(tp, probe_console, NULL);
+}
+
+/*
+ * We only want to do tracepoints setup and teardown once, therefore we have to
+ * customize the init and exit functions and cannot rely on kunit_test_suite().
+ */
+static int __init kfence_test_init(void)
+{
+	/*
+	 * Because we want to be able to build the test as a module, we need to
+	 * iterate through all known tracepoints, since the static registration
+	 * won't work here.
+	 */
+	for_each_kernel_tracepoint(register_tracepoints, NULL);
+	return __kunit_test_suites_init(kfence_test_suites);
+}
+
+static void kfence_test_exit(void)
+{
+	__kunit_test_suites_exit(kfence_test_suites);
+	for_each_kernel_tracepoint(unregister_tracepoints, NULL);
+	tracepoint_synchronize_unregister();
+}
+
+late_initcall(kfence_test_init);
+module_exit(kfence_test_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Alexander Potapenko <glider@google.com>, Marco Elver <elver@google.com>");
--- a/mm/kfence/Makefile~kfence-add-test-suite
+++ a/mm/kfence/Makefile
@@ -1,3 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_KFENCE) := core.o report.o
+
+CFLAGS_kfence_test.o := -g -fno-omit-frame-pointer -fno-optimize-sibling-calls
+obj-$(CONFIG_KFENCE_KUNIT_TEST) += kfence_test.o
--- a/mm/kfence/report.c~kfence-add-test-suite
+++ a/mm/kfence/report.c
@@ -156,7 +156,12 @@ static void print_diff_canary(unsigned l
 	pr_cont(" ]");
 }
 
-void kfence_report_error(unsigned long address, struct pt_regs *regs,
+static const char *get_access_type(bool is_write)
+{
+	return is_write ? "write" : "read";
+}
+
+void kfence_report_error(unsigned long address, bool is_write, struct pt_regs *regs,
 			 const struct kfence_metadata *meta, enum kfence_error_type type)
 {
 	unsigned long stack_entries[KFENCE_STACK_DEPTH] = { 0 };
@@ -194,17 +199,19 @@ void kfence_report_error(unsigned long a
 	case KFENCE_ERROR_OOB: {
 		const bool left_of_object = address < meta->addr;
 
-		pr_err("BUG: KFENCE: out-of-bounds in %pS\n\n", (void *)stack_entries[skipnr]);
-		pr_err("Out-of-bounds access at 0x" PTR_FMT " (%luB %s of kfence-#%zd):\n",
-		       (void *)address,
+		pr_err("BUG: KFENCE: out-of-bounds %s in %pS\n\n", get_access_type(is_write),
+		       (void *)stack_entries[skipnr]);
+		pr_err("Out-of-bounds %s at 0x" PTR_FMT " (%luB %s of kfence-#%zd):\n",
+		       get_access_type(is_write), (void *)address,
 		       left_of_object ? meta->addr - address : address - meta->addr,
 		       left_of_object ? "left" : "right", object_index);
 		break;
 	}
 	case KFENCE_ERROR_UAF:
-		pr_err("BUG: KFENCE: use-after-free in %pS\n\n", (void *)stack_entries[skipnr]);
-		pr_err("Use-after-free access at 0x" PTR_FMT " (in kfence-#%zd):\n",
-		       (void *)address, object_index);
+		pr_err("BUG: KFENCE: use-after-free %s in %pS\n\n", get_access_type(is_write),
+		       (void *)stack_entries[skipnr]);
+		pr_err("Use-after-free %s at 0x" PTR_FMT " (in kfence-#%zd):\n",
+		       get_access_type(is_write), (void *)address, object_index);
 		break;
 	case KFENCE_ERROR_CORRUPTION:
 		pr_err("BUG: KFENCE: memory corruption in %pS\n\n", (void *)stack_entries[skipnr]);
@@ -213,8 +220,10 @@ void kfence_report_error(unsigned long a
 		pr_cont(" (in kfence-#%zd):\n", object_index);
 		break;
 	case KFENCE_ERROR_INVALID:
-		pr_err("BUG: KFENCE: invalid access in %pS\n\n", (void *)stack_entries[skipnr]);
-		pr_err("Invalid access at 0x" PTR_FMT ":\n", (void *)address);
+		pr_err("BUG: KFENCE: invalid %s in %pS\n\n", get_access_type(is_write),
+		       (void *)stack_entries[skipnr]);
+		pr_err("Invalid %s at 0x" PTR_FMT ":\n", get_access_type(is_write),
+		       (void *)address);
 		break;
 	case KFENCE_ERROR_INVALID_FREE:
 		pr_err("BUG: KFENCE: invalid free in %pS\n\n", (void *)stack_entries[skipnr]);
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 065/118] MAINTAINERS: add entry for KFENCE
  2021-02-26  1:14 incoming Andrew Morton
                   ` (63 preceding siblings ...)
  2021-02-26  1:19 ` [patch 064/118] kfence: add test suite Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26  1:19 ` [patch 066/118] kfence: report sensitive information based on no_hash_pointers Andrew Morton
                   ` (53 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, bp, catalin.marinas, cl, corbet,
	dave.hansen, dvyukov, edumazet, elver, glider, gregkh, hdanton,
	hpa, iamjoonsoo.kim, jannh, joern, keescook, linux-mm, luto,
	mark.rutland, mingo, mm-commits, paulmck, penberg, peterz,
	rientjes, sjpark, tglx, torvalds, vbabka, will

From: Marco Elver <elver@google.com>
Subject: MAINTAINERS: add entry for KFENCE

Add entry for KFENCE maintainers.

Link: https://lkml.kernel.org/r/20201103175841.3495947-10-elver@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: SeongJae Park <sjpark@amazon.de>
Co-developed-by: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 MAINTAINERS |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/MAINTAINERS~maintainers-add-entry-for-kfence
+++ a/MAINTAINERS
@@ -9867,6 +9867,18 @@ F:	include/linux/keyctl.h
 F:	include/uapi/linux/keyctl.h
 F:	security/keys/
 
+KFENCE
+M:	Alexander Potapenko <glider@google.com>
+M:	Marco Elver <elver@google.com>
+R:	Dmitry Vyukov <dvyukov@google.com>
+L:	kasan-dev@googlegroups.com
+S:	Maintained
+F:	Documentation/dev-tools/kfence.rst
+F:	arch/*/include/asm/kfence.h
+F:	include/linux/kfence.h
+F:	lib/Kconfig.kfence
+F:	mm/kfence/
+
 KFIFO
 M:	Stefani Seibold <stefani@seibold.net>
 S:	Maintained
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 066/118] kfence: report sensitive information based on no_hash_pointers
  2021-02-26  1:14 incoming Andrew Morton
                   ` (64 preceding siblings ...)
  2021-02-26  1:19 ` [patch 065/118] MAINTAINERS: add entry for KFENCE Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26  1:19 ` [patch 067/118] tracing: add error_report_end trace point Andrew Morton
                   ` (52 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, andreyknvl, dvyukov, elver, glider, jannh, linux-mm,
	mm-commits, timur, torvalds

From: Marco Elver <elver@google.com>
Subject: kfence: report sensitive information based on no_hash_pointers

We cannot rely on CONFIG_DEBUG_KERNEL to decide if we're running a "debug
kernel" where we can safely show potentially sensitive information in the
kernel log.

Instead, simply rely on the newly introduced "no_hash_pointers" parameter to
print unhashed kernel pointers, as well as to decide whether our reports can
include other potentially sensitive information such as registers and corrupted
bytes.
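
As an illustrative aside (not part of this patch): a debug machine that
should emit fully detailed KFENCE reports would be booted with something
like

    no_hash_pointers kfence.sample_interval=100

on the kernel command line, whereas production kernels simply omit the
parameter and keep hashed pointers and '!' markers.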

Link: https://lkml.kernel.org/r/20210223082043.1972742-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Cc: Timur Tabi <timur@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/dev-tools/kfence.rst |    8 ++++----
 mm/kfence/core.c                   |   10 +++-------
 mm/kfence/kfence.h                 |    7 -------
 mm/kfence/kfence_test.c            |    2 +-
 mm/kfence/report.c                 |   18 ++++++++++--------
 5 files changed, 18 insertions(+), 27 deletions(-)

--- a/Documentation/dev-tools/kfence.rst~kfence-report-sensitive-information-based-on-no_hash_pointers
+++ a/Documentation/dev-tools/kfence.rst
@@ -88,8 +88,8 @@ A typical out-of-bounds access looks lik
 
 The header of the report provides a short summary of the function involved in
 the access. It is followed by more detailed information about the access and
-its origin. Note that, real kernel addresses are only shown for
-``CONFIG_DEBUG_KERNEL=y`` builds.
+its origin. Note that, real kernel addresses are only shown when using the
+kernel command line option ``no_hash_pointers``.
 
 Use-after-free accesses are reported as::
 
@@ -184,8 +184,8 @@ invalidly written bytes (offset from the
 representation, '.' denote untouched bytes. In the example above ``0xac`` is
 the value written to the invalid address at offset 0, and the remaining '.'
 denote that no following bytes have been touched. Note that, real values are
-only shown for ``CONFIG_DEBUG_KERNEL=y`` builds; to avoid information
-disclosure for non-debug builds, '!' is used instead to denote invalidly
+only shown if the kernel was booted with ``no_hash_pointers``; to avoid
+information disclosure otherwise, '!' is used instead to denote invalidly
 written bytes.
 
 And finally, KFENCE may also report on invalid accesses to any protected page
--- a/mm/kfence/core.c~kfence-report-sensitive-information-based-on-no_hash_pointers
+++ a/mm/kfence/core.c
@@ -646,13 +646,9 @@ void __init kfence_init(void)
 
 	WRITE_ONCE(kfence_enabled, true);
 	schedule_delayed_work(&kfence_timer, 0);
-	pr_info("initialized - using %lu bytes for %d objects", KFENCE_POOL_SIZE,
-		CONFIG_KFENCE_NUM_OBJECTS);
-	if (IS_ENABLED(CONFIG_DEBUG_KERNEL))
-		pr_cont(" at 0x%px-0x%px\n", (void *)__kfence_pool,
-			(void *)(__kfence_pool + KFENCE_POOL_SIZE));
-	else
-		pr_cont("\n");
+	pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE,
+		CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool,
+		(void *)(__kfence_pool + KFENCE_POOL_SIZE));
 }
 
 void kfence_shutdown_cache(struct kmem_cache *s)
--- a/mm/kfence/kfence.h~kfence-report-sensitive-information-based-on-no_hash_pointers
+++ a/mm/kfence/kfence.h
@@ -16,13 +16,6 @@
 
 #include "../slab.h" /* for struct kmem_cache */
 
-/* For non-debug builds, avoid leaking kernel pointers into dmesg. */
-#ifdef CONFIG_DEBUG_KERNEL
-#define PTR_FMT "%px"
-#else
-#define PTR_FMT "%p"
-#endif
-
 /*
  * Get the canary byte pattern for @addr. Use a pattern that varies based on the
  * lower 3 bits of the address, to detect memory corruptions with higher
--- a/mm/kfence/kfence_test.c~kfence-report-sensitive-information-based-on-no_hash_pointers
+++ a/mm/kfence/kfence_test.c
@@ -146,7 +146,7 @@ static bool report_matches(const struct
 		break;
 	}
 
-	cur += scnprintf(cur, end - cur, " 0x" PTR_FMT, (void *)r->addr);
+	cur += scnprintf(cur, end - cur, " 0x%p", (void *)r->addr);
 
 	spin_lock_irqsave(&observed.lock, flags);
 	if (!report_available())
--- a/mm/kfence/report.c~kfence-report-sensitive-information-based-on-no_hash_pointers
+++ a/mm/kfence/report.c
@@ -19,6 +19,8 @@
 
 #include "kfence.h"
 
+extern bool no_hash_pointers;
+
 /* Helper function to either print to a seq_file or to console. */
 __printf(2, 3)
 static void seq_con_printf(struct seq_file *seq, const char *fmt, ...)
@@ -118,7 +120,7 @@ void kfence_print_object(struct seq_file
 	}
 
 	seq_con_printf(seq,
-		       "kfence-#%zd [0x" PTR_FMT "-0x" PTR_FMT
+		       "kfence-#%zd [0x%p-0x%p"
 		       ", size=%d, cache=%s] allocated by task %d:\n",
 		       meta - kfence_metadata, (void *)start, (void *)(start + size - 1), size,
 		       (cache && cache->name) ? cache->name : "<destroyed>", meta->alloc_track.pid);
@@ -148,7 +150,7 @@ static void print_diff_canary(unsigned l
 	for (cur = (const u8 *)address; cur < end; cur++) {
 		if (*cur == KFENCE_CANARY_PATTERN(cur))
 			pr_cont(" .");
-		else if (IS_ENABLED(CONFIG_DEBUG_KERNEL))
+		else if (no_hash_pointers)
 			pr_cont(" 0x%02x", *cur);
 		else /* Do not leak kernel memory in non-debug builds. */
 			pr_cont(" !");
@@ -201,7 +203,7 @@ void kfence_report_error(unsigned long a
 
 		pr_err("BUG: KFENCE: out-of-bounds %s in %pS\n\n", get_access_type(is_write),
 		       (void *)stack_entries[skipnr]);
-		pr_err("Out-of-bounds %s at 0x" PTR_FMT " (%luB %s of kfence-#%zd):\n",
+		pr_err("Out-of-bounds %s at 0x%p (%luB %s of kfence-#%zd):\n",
 		       get_access_type(is_write), (void *)address,
 		       left_of_object ? meta->addr - address : address - meta->addr,
 		       left_of_object ? "left" : "right", object_index);
@@ -210,24 +212,24 @@ void kfence_report_error(unsigned long a
 	case KFENCE_ERROR_UAF:
 		pr_err("BUG: KFENCE: use-after-free %s in %pS\n\n", get_access_type(is_write),
 		       (void *)stack_entries[skipnr]);
-		pr_err("Use-after-free %s at 0x" PTR_FMT " (in kfence-#%zd):\n",
+		pr_err("Use-after-free %s at 0x%p (in kfence-#%zd):\n",
 		       get_access_type(is_write), (void *)address, object_index);
 		break;
 	case KFENCE_ERROR_CORRUPTION:
 		pr_err("BUG: KFENCE: memory corruption in %pS\n\n", (void *)stack_entries[skipnr]);
-		pr_err("Corrupted memory at 0x" PTR_FMT " ", (void *)address);
+		pr_err("Corrupted memory at 0x%p ", (void *)address);
 		print_diff_canary(address, 16, meta);
 		pr_cont(" (in kfence-#%zd):\n", object_index);
 		break;
 	case KFENCE_ERROR_INVALID:
 		pr_err("BUG: KFENCE: invalid %s in %pS\n\n", get_access_type(is_write),
 		       (void *)stack_entries[skipnr]);
-		pr_err("Invalid %s at 0x" PTR_FMT ":\n", get_access_type(is_write),
+		pr_err("Invalid %s at 0x%p:\n", get_access_type(is_write),
 		       (void *)address);
 		break;
 	case KFENCE_ERROR_INVALID_FREE:
 		pr_err("BUG: KFENCE: invalid free in %pS\n\n", (void *)stack_entries[skipnr]);
-		pr_err("Invalid free of 0x" PTR_FMT " (in kfence-#%zd):\n", (void *)address,
+		pr_err("Invalid free of 0x%p (in kfence-#%zd):\n", (void *)address,
 		       object_index);
 		break;
 	}
@@ -242,7 +244,7 @@ void kfence_report_error(unsigned long a
 
 	/* Print report footer. */
 	pr_err("\n");
-	if (IS_ENABLED(CONFIG_DEBUG_KERNEL) && regs)
+	if (no_hash_pointers && regs)
 		show_regs(regs);
 	else
 		dump_stack_print_info(KERN_ERR);
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 067/118] tracing: add error_report_end trace point
  2021-02-26  1:14 incoming Andrew Morton
                   ` (65 preceding siblings ...)
  2021-02-26  1:19 ` [patch 066/118] kfence: report sensitive information based on no_hash_pointers Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26 14:03   ` Steven Rostedt
  2021-02-26  1:19 ` [patch 068/118] kfence: use error_report_end tracepoint Andrew Morton
                   ` (51 subsequent siblings)
  118 siblings, 1 reply; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, andreyknvl, dvyukov, elver, glider, gregkh, linux-mm,
	mingo, mm-commits, pmladek, rostedt, sergey.senozhatsky,
	torvalds, vbabka

From: Alexander Potapenko <glider@google.com>
Subject: tracing: add error_report_end trace point

Patch series "Add error_report_end tracepoint to KFENCE and KASAN", v3.

This patchset adds a tracepoint, error_report_end, that is to be used by
KFENCE, KASAN, and potentially other bug detection tools, when they print
an error report.  One of the possible use cases is userspace collection of
kernel error reports: interested parties can subscribe to the tracing
event via tracefs, and get notified when an error report occurs.


This patch (of 3):

Introduce error_report_end tracepoint.  It can be used in debugging tools
like KASAN, KFENCE, etc.  to provide extensions to the error reporting
mechanisms (e.g.  allow tests to hook into error reporting, ease error report
collection from production kernels).  Another benefit would be making use
of ftrace for debugging or benchmarking the tools themselves.

Should we need it, the tracepoint name leaves us with the possibility to
introduce a complementary error_report_start tracepoint in the future.
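
As an illustrative sketch (not part of this patch), a userspace collector
could enable the event via tracefs and stream reports from the trace pipe.
The paths below assume tracefs is mounted at /sys/kernel/tracing and that
the program runs as root:

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/types.h>
  #include <unistd.h>

  int main(void)
  {
          char buf[4096];
          ssize_t n;
          int fd;

          /* Enable the error_report:error_report_end trace event. */
          fd = open("/sys/kernel/tracing/events/error_report/error_report_end/enable",
                    O_WRONLY);
          if (fd < 0)
                  return 1;
          write(fd, "1", 1);
          close(fd);

          /* Each line formatted by TP_printk() ends in "[kfence] <id>" or "[kasan] <id>". */
          fd = open("/sys/kernel/tracing/trace_pipe", O_RDONLY);
          if (fd < 0)
                  return 1;
          while ((n = read(fd, buf, sizeof(buf))) > 0)
                  fwrite(buf, 1, n, stdout);
          close(fd);
          return 0;
  }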

Link: https://lkml.kernel.org/r/20210121131915.1331302-1-glider@google.com
Link: https://lkml.kernel.org/r/20210121131915.1331302-2-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/trace/events/error_report.h |   74 ++++++++++++++++++++++++++
 kernel/trace/Makefile               |    1 
 kernel/trace/error_report-traces.c  |   12 ++++
 3 files changed, 87 insertions(+)

--- /dev/null
+++ a/include/trace/events/error_report.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Declarations for error reporting tracepoints.
+ *
+ * Copyright (C) 2021, Google LLC.
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM error_report
+
+#if !defined(_TRACE_ERROR_REPORT_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_ERROR_REPORT_H
+
+#include <linux/tracepoint.h>
+
+#ifndef __ERROR_REPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY
+#define __ERROR_REPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY
+
+enum error_detector {
+	ERROR_DETECTOR_KFENCE,
+	ERROR_DETECTOR_KASAN
+};
+
+#endif /* __ERROR_REPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY */
+
+#define error_detector_list	\
+	EM(ERROR_DETECTOR_KFENCE, "kfence")	\
+	EMe(ERROR_DETECTOR_KASAN, "kasan")
+/* Always end the list with an EMe. */
+
+#undef EM
+#undef EMe
+
+#define EM(a, b)	TRACE_DEFINE_ENUM(a);
+#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
+
+error_detector_list
+
+#undef EM
+#undef EMe
+
+#define EM(a, b) { a, b },
+#define EMe(a, b) { a, b }
+
+#define show_error_detector_list(val) \
+	__print_symbolic(val, error_detector_list)
+
+DECLARE_EVENT_CLASS(error_report_template,
+		    TP_PROTO(enum error_detector error_detector, unsigned long id),
+		    TP_ARGS(error_detector, id),
+		    TP_STRUCT__entry(__field(enum error_detector, error_detector)
+					     __field(unsigned long, id)),
+		    TP_fast_assign(__entry->error_detector = error_detector;
+				   __entry->id = id;),
+		    TP_printk("[%s] %lx",
+			      show_error_detector_list(__entry->error_detector),
+			      __entry->id));
+
+/**
+ * error_report_end - called after printing the error report
+ * @error_detector:	short string describing the error detection tool
+ * @id:			pseudo-unique descriptor identifying the report
+ *			(e.g. the memory access address)
+ *
+ * This event occurs right after a debugging tool finishes printing the error
+ * report.
+ */
+DEFINE_EVENT(error_report_template, error_report_end,
+	     TP_PROTO(enum error_detector error_detector, unsigned long id),
+	     TP_ARGS(error_detector, id));
+
+#endif /* _TRACE_ERROR_REPORT_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--- /dev/null
+++ a/kernel/trace/error_report-traces.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Error reporting trace points.
+ *
+ * Copyright (C) 2021, Google LLC.
+ */
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/error_report.h>
+
+EXPORT_TRACEPOINT_SYMBOL_GPL(error_report_end);
+
--- a/kernel/trace/Makefile~tracing-add-error_report_end-trace-point
+++ a/kernel/trace/Makefile
@@ -81,6 +81,7 @@ obj-$(CONFIG_SYNTH_EVENTS) += trace_even
 obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o
 obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o
 obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe.o
+obj-$(CONFIG_TRACEPOINTS) += error_report-traces.o
 obj-$(CONFIG_TRACEPOINTS) += power-traces.o
 ifeq ($(CONFIG_PM),y)
 obj-$(CONFIG_TRACEPOINTS) += rpm-traces.o
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 068/118] kfence: use error_report_end tracepoint
  2021-02-26  1:14 incoming Andrew Morton
                   ` (66 preceding siblings ...)
  2021-02-26  1:19 ` [patch 067/118] tracing: add error_report_end trace point Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26  1:19 ` [patch 069/118] kasan: " Andrew Morton
                   ` (50 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, andreyknvl, dvyukov, elver, glider, gregkh, linux-mm,
	mingo, mm-commits, pmladek, rostedt, sergey.senozhatsky,
	torvalds, vbabka

From: Alexander Potapenko <glider@google.com>
Subject: kfence: use error_report_end tracepoint

Make it possible to trace KFENCE error reporting.  A good use case is
watching for trace events from userspace to detect and process memory
corruption reports from the kernel.
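
For completeness, a hedged sketch of an in-kernel consumer (say, a small
test module that wants to count KFENCE reports) could attach a probe
directly, using only the register/unregister helpers that DEFINE_EVENT()
generates; the module and function names below are made up for
illustration:

  #include <linux/module.h>
  #include <linux/printk.h>
  #include <trace/events/error_report.h>

  /* Probe signature: data pointer first, then the tracepoint arguments. */
  static void probe_report_end(void *ignore, enum error_detector detector,
                               unsigned long id)
  {
          if (detector == ERROR_DETECTOR_KFENCE)
                  pr_info("KFENCE report finished, id=0x%lx\n", id);
  }

  static int __init report_hook_init(void)
  {
          return register_trace_error_report_end(probe_report_end, NULL);
  }

  static void __exit report_hook_exit(void)
  {
          unregister_trace_error_report_end(probe_report_end, NULL);
          tracepoint_synchronize_unregister();
  }

  module_init(report_hook_init);
  module_exit(report_hook_exit);
  MODULE_LICENSE("GPL");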

Link: https://lkml.kernel.org/r/20210121131915.1331302-3-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kfence/report.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/mm/kfence/report.c~kfence-use-error_report_end-tracepoint
+++ a/mm/kfence/report.c
@@ -14,6 +14,7 @@
 #include <linux/seq_file.h>
 #include <linux/stacktrace.h>
 #include <linux/string.h>
+#include <trace/events/error_report.h>
 
 #include <asm/kfence.h>
 
@@ -248,6 +249,7 @@ void kfence_report_error(unsigned long a
 		show_regs(regs);
 	else
 		dump_stack_print_info(KERN_ERR);
+	trace_error_report_end(ERROR_DETECTOR_KFENCE, address);
 	pr_err("==================================================================\n");
 
 	lockdep_on();
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 069/118] kasan: use error_report_end tracepoint
  2021-02-26  1:14 incoming Andrew Morton
                   ` (67 preceding siblings ...)
  2021-02-26  1:19 ` [patch 068/118] kfence: use error_report_end tracepoint Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26  1:19 ` [patch 070/118] kasan, mm: don't save alloc stacks twice Andrew Morton
                   ` (49 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, andreyknvl, dvyukov, elver, glider, gregkh, linux-mm,
	mingo, mm-commits, pmladek, rostedt, sergey.senozhatsky,
	torvalds, vbabka

From: Alexander Potapenko <glider@google.com>
Subject: kasan: use error_report_end tracepoint

Make it possible to trace KASAN error reporting.  A good use case is
watching for trace events from userspace to detect and process memory
corruption reports from the kernel.

Link: https://lkml.kernel.org/r/20210121131915.1331302-4-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kasan/report.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--- a/mm/kasan/report.c~kasan-use-error_report_end-tracepoint
+++ a/mm/kasan/report.c
@@ -25,6 +25,7 @@
 #include <linux/module.h>
 #include <linux/sched/task_stack.h>
 #include <linux/uaccess.h>
+#include <trace/events/error_report.h>
 
 #include <asm/sections.h>
 
@@ -84,8 +85,9 @@ static void start_report(unsigned long *
 	pr_err("==================================================================\n");
 }
 
-static void end_report(unsigned long *flags)
+static void end_report(unsigned long *flags, unsigned long addr)
 {
+	trace_error_report_end(ERROR_DETECTOR_KASAN, addr);
 	pr_err("==================================================================\n");
 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
 	spin_unlock_irqrestore(&report_lock, *flags);
@@ -355,7 +357,7 @@ void kasan_report_invalid_free(void *obj
 	print_address_description(object, tag);
 	pr_err("\n");
 	print_memory_metadata(object);
-	end_report(&flags);
+	end_report(&flags, (unsigned long)object);
 }
 
 static void __kasan_report(unsigned long addr, size_t size, bool is_write,
@@ -401,7 +403,7 @@ static void __kasan_report(unsigned long
 		dump_stack();
 	}
 
-	end_report(&flags);
+	end_report(&flags, addr);
 }
 
 bool kasan_report(unsigned long addr, size_t size, bool is_write,
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 070/118] kasan, mm: don't save alloc stacks twice
  2021-02-26  1:14 incoming Andrew Morton
                   ` (68 preceding siblings ...)
  2021-02-26  1:19 ` [patch 069/118] kasan: " Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26  1:19 ` [patch 071/118] kasan, mm: optimize kmalloc poisoning Andrew Morton
                   ` (48 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan, mm: don't save alloc stacks twice

Patch series "kasan: optimizations and fixes for HW_TAGS", v4.

This patchset makes the HW_TAGS mode more efficient, mostly by reworking
poisoning approaches and simplifying/inlining some internal helpers.

With this change, the overhead of HW_TAGS annotations, excluding setting
and checking memory tags, is ~3%.  The performance impact caused by tags
will be unknown until we have hardware that supports MTE.

As a side-effect, this patchset speeds up generic KASAN by ~15%.


This patch (of 13):

Currently KASAN saves allocation stacks in both kasan_slab_alloc() and
kasan_kmalloc() annotations.  This patch changes KASAN to save allocation
stacks for slab objects from kmalloc caches in kasan_kmalloc() only, and
stacks for other slab objects in kasan_slab_alloc() only.

This change requires ____kasan_kmalloc() to know whether the object
belongs to a kmalloc cache.  This is implemented by adding a flag field to
the kasan_info structure.  That flag is only set for kmalloc caches via a
new kasan_cache_create_kmalloc() annotation.
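
To illustrate the resulting behaviour (the cache name below is
hypothetical), each object now records exactly one allocation stack:

  kmem_cache_alloc(my_cache, GFP_KERNEL) -> kasan_slab_alloc(): stack saved here
  kmalloc(64, GFP_KERNEL)                -> kasan_slab_alloc(): skipped, is_kmalloc cache
                                          -> kasan_kmalloc():   stack saved here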

Link: https://lkml.kernel.org/r/cover.1612546384.git.andreyknvl@google.com
Link: https://lkml.kernel.org/r/7c673ebca8d00f40a7ad6f04ab9a2bddeeae2097.1612546384.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/kasan.h |    9 +++++++++
 mm/kasan/common.c     |   18 ++++++++++++++----
 mm/slab_common.c      |    1 +
 3 files changed, 24 insertions(+), 4 deletions(-)

--- a/include/linux/kasan.h~kasan-mm-dont-save-alloc-stacks-twice
+++ a/include/linux/kasan.h
@@ -83,6 +83,7 @@ static inline void kasan_disable_current
 struct kasan_cache {
 	int alloc_meta_offset;
 	int free_meta_offset;
+	bool is_kmalloc;
 };
 
 #ifdef CONFIG_KASAN_HW_TAGS
@@ -143,6 +144,13 @@ static __always_inline void kasan_cache_
 		__kasan_cache_create(cache, size, flags);
 }
 
+void __kasan_cache_create_kmalloc(struct kmem_cache *cache);
+static __always_inline void kasan_cache_create_kmalloc(struct kmem_cache *cache)
+{
+	if (kasan_enabled())
+		__kasan_cache_create_kmalloc(cache);
+}
+
 size_t __kasan_metadata_size(struct kmem_cache *cache);
 static __always_inline size_t kasan_metadata_size(struct kmem_cache *cache)
 {
@@ -278,6 +286,7 @@ static inline void kasan_free_pages(stru
 static inline void kasan_cache_create(struct kmem_cache *cache,
 				      unsigned int *size,
 				      slab_flags_t *flags) {}
+static inline void kasan_cache_create_kmalloc(struct kmem_cache *cache) {}
 static inline size_t kasan_metadata_size(struct kmem_cache *cache) { return 0; }
 static inline void kasan_poison_slab(struct page *page) {}
 static inline void kasan_unpoison_object_data(struct kmem_cache *cache,
--- a/mm/kasan/common.c~kasan-mm-dont-save-alloc-stacks-twice
+++ a/mm/kasan/common.c
@@ -210,6 +210,11 @@ void __kasan_cache_create(struct kmem_ca
 		*size = optimal_size;
 }
 
+void __kasan_cache_create_kmalloc(struct kmem_cache *cache)
+{
+	cache->kasan_info.is_kmalloc = true;
+}
+
 size_t __kasan_metadata_size(struct kmem_cache *cache)
 {
 	if (!kasan_stack_collection_enabled())
@@ -394,17 +399,22 @@ void __kasan_slab_free_mempool(void *ptr
 	}
 }
 
-static void set_alloc_info(struct kmem_cache *cache, void *object, gfp_t flags)
+static void set_alloc_info(struct kmem_cache *cache, void *object,
+				gfp_t flags, bool is_kmalloc)
 {
 	struct kasan_alloc_meta *alloc_meta;
 
+	/* Don't save alloc info for kmalloc caches in kasan_slab_alloc(). */
+	if (cache->kasan_info.is_kmalloc && !is_kmalloc)
+		return;
+
 	alloc_meta = kasan_get_alloc_meta(cache, object);
 	if (alloc_meta)
 		kasan_set_track(&alloc_meta->alloc_track, flags);
 }
 
 static void *____kasan_kmalloc(struct kmem_cache *cache, const void *object,
-				size_t size, gfp_t flags, bool keep_tag)
+				size_t size, gfp_t flags, bool is_kmalloc)
 {
 	unsigned long redzone_start;
 	unsigned long redzone_end;
@@ -423,7 +433,7 @@ static void *____kasan_kmalloc(struct km
 				KASAN_GRANULE_SIZE);
 	redzone_end = round_up((unsigned long)object + cache->object_size,
 				KASAN_GRANULE_SIZE);
-	tag = assign_tag(cache, object, false, keep_tag);
+	tag = assign_tag(cache, object, false, is_kmalloc);
 
 	/* Tag is ignored in set_tag without CONFIG_KASAN_SW/HW_TAGS */
 	kasan_unpoison(set_tag(object, tag), size);
@@ -431,7 +441,7 @@ static void *____kasan_kmalloc(struct km
 			   KASAN_KMALLOC_REDZONE);
 
 	if (kasan_stack_collection_enabled())
-		set_alloc_info(cache, (void *)object, flags);
+		set_alloc_info(cache, (void *)object, flags, is_kmalloc);
 
 	return set_tag(object, tag);
 }
--- a/mm/slab_common.c~kasan-mm-dont-save-alloc-stacks-twice
+++ a/mm/slab_common.c
@@ -643,6 +643,7 @@ struct kmem_cache *__init create_kmalloc
 		panic("Out of memory when creating slab %s\n", name);
 
 	create_boot_cache(s, name, size, flags, useroffset, usersize);
+	kasan_cache_create_kmalloc(s);
 	list_add(&s->list, &slab_caches);
 	s->refcount = 1;
 	return s;
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 071/118] kasan, mm: optimize kmalloc poisoning
  2021-02-26  1:14 incoming Andrew Morton
                   ` (69 preceding siblings ...)
  2021-02-26  1:19 ` [patch 070/118] kasan, mm: don't save alloc stacks twice Andrew Morton
@ 2021-02-26  1:19 ` Andrew Morton
  2021-02-26  1:20 ` [patch 072/118] kasan: optimize large " Andrew Morton
                   ` (47 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:19 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan, mm: optimize kmalloc poisoning

For allocations from kmalloc caches, kasan_kmalloc() always follows
kasan_slab_alloc().  Currently, both of them unpoison the whole object,
which is unnecessary.

This patch provides separate implementations for both annotations:
kasan_slab_alloc() unpoisons the whole object, and kasan_kmalloc() only
poisons the redzone.

For generic KASAN, the redzone start might not be aligned to
KASAN_GRANULE_SIZE.  Therefore, the poisoning is split into two parts:
kasan_poison_last_granule() poisons the unaligned part, and then
kasan_poison() poisons the rest.
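
A worked example (assuming the generic mode, KASAN_GRANULE_SIZE == 8 and a
kmalloc-32 object, i.e. object_size == 32): for kmalloc(20), the shadow is
now updated as follows:

  offsets [ 0, 16): already unpoisoned by kasan_slab_alloc(), left untouched
  offsets [16, 24): kasan_poison_last_granule() stores 20 & 7 == 4 in the
                    shadow byte, so only bytes 16..19 remain accessible
  offsets [24, 32): kasan_poison() marks the granule KASAN_KMALLOC_REDZONE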

This patch also clarifies alignment guarantees of each of the poisoning
functions and drops the unnecessary round_up() call for redzone_end.

With this change, the early SLUB cache annotation needs to be changed to
kasan_slab_alloc(), as kasan_kmalloc() doesn't unpoison objects now.  The
number of poisoned bytes for objects in this cache stays the same, as
kmem_cache_node->object_size is equal to sizeof(struct kmem_cache_node).

Link: https://lkml.kernel.org/r/7e3961cb52be380bc412860332063f5f7ce10d13.1612546384.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kasan/common.c |   93 +++++++++++++++++++++++++++++---------------
 mm/kasan/kasan.h  |   43 +++++++++++++++++++-
 mm/kasan/shadow.c |   28 +++++++------
 mm/slub.c         |    3 -
 4 files changed, 119 insertions(+), 48 deletions(-)

--- a/mm/kasan/common.c~kasan-mm-optimize-kmalloc-poisoning
+++ a/mm/kasan/common.c
@@ -278,22 +278,12 @@ void __kasan_poison_object_data(struct k
  *    based on objects indexes, so that objects that are next to each other
  *    get different tags.
  */
-static u8 assign_tag(struct kmem_cache *cache, const void *object,
-			bool init, bool keep_tag)
+static u8 assign_tag(struct kmem_cache *cache, const void *object, bool init)
 {
 	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
 		return 0xff;
 
 	/*
-	 * 1. When an object is kmalloc()'ed, two hooks are called:
-	 *    kasan_slab_alloc() and kasan_kmalloc(). We assign the
-	 *    tag only in the first one.
-	 * 2. We reuse the same tag for krealloc'ed objects.
-	 */
-	if (keep_tag)
-		return get_tag(object);
-
-	/*
 	 * If the cache neither has a constructor nor has SLAB_TYPESAFE_BY_RCU
 	 * set, assign a tag when the object is being allocated (init == false).
 	 */
@@ -325,7 +315,7 @@ void * __must_check __kasan_init_slab_ob
 	}
 
 	/* Tag is ignored in set_tag() without CONFIG_KASAN_SW/HW_TAGS */
-	object = set_tag(object, assign_tag(cache, object, true, false));
+	object = set_tag(object, assign_tag(cache, object, true));
 
 	return (void *)object;
 }
@@ -413,12 +403,46 @@ static void set_alloc_info(struct kmem_c
 		kasan_set_track(&alloc_meta->alloc_track, flags);
 }
 
+void * __must_check __kasan_slab_alloc(struct kmem_cache *cache,
+					void *object, gfp_t flags)
+{
+	u8 tag;
+	void *tagged_object;
+
+	if (gfpflags_allow_blocking(flags))
+		kasan_quarantine_reduce();
+
+	if (unlikely(object == NULL))
+		return NULL;
+
+	if (is_kfence_address(object))
+		return (void *)object;
+
+	/*
+	 * Generate and assign random tag for tag-based modes.
+	 * Tag is ignored in set_tag() for the generic mode.
+	 */
+	tag = assign_tag(cache, object, false);
+	tagged_object = set_tag(object, tag);
+
+	/*
+	 * Unpoison the whole object.
+	 * For kmalloc() allocations, kasan_kmalloc() will do precise poisoning.
+	 */
+	kasan_unpoison(tagged_object, cache->object_size);
+
+	/* Save alloc info (if possible) for non-kmalloc() allocations. */
+	if (kasan_stack_collection_enabled())
+		set_alloc_info(cache, (void *)object, flags, false);
+
+	return tagged_object;
+}
+
 static void *____kasan_kmalloc(struct kmem_cache *cache, const void *object,
-				size_t size, gfp_t flags, bool is_kmalloc)
+					size_t size, gfp_t flags)
 {
 	unsigned long redzone_start;
 	unsigned long redzone_end;
-	u8 tag;
 
 	if (gfpflags_allow_blocking(flags))
 		kasan_quarantine_reduce();
@@ -429,33 +453,41 @@ static void *____kasan_kmalloc(struct km
 	if (is_kfence_address(kasan_reset_tag(object)))
 		return (void *)object;
 
+	/*
+	 * The object has already been unpoisoned by kasan_slab_alloc() for
+	 * kmalloc() or by ksize() for krealloc().
+	 */
+
+	/*
+	 * The redzone has byte-level precision for the generic mode.
+	 * Partially poison the last object granule to cover the unaligned
+	 * part of the redzone.
+	 */
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+		kasan_poison_last_granule((void *)object, size);
+
+	/* Poison the aligned part of the redzone. */
 	redzone_start = round_up((unsigned long)(object + size),
 				KASAN_GRANULE_SIZE);
-	redzone_end = round_up((unsigned long)object + cache->object_size,
-				KASAN_GRANULE_SIZE);
-	tag = assign_tag(cache, object, false, is_kmalloc);
-
-	/* Tag is ignored in set_tag without CONFIG_KASAN_SW/HW_TAGS */
-	kasan_unpoison(set_tag(object, tag), size);
+	redzone_end = (unsigned long)object + cache->object_size;
 	kasan_poison((void *)redzone_start, redzone_end - redzone_start,
 			   KASAN_KMALLOC_REDZONE);
 
+	/*
+	 * Save alloc info (if possible) for kmalloc() allocations.
+	 * This also rewrites the alloc info when called from kasan_krealloc().
+	 */
 	if (kasan_stack_collection_enabled())
-		set_alloc_info(cache, (void *)object, flags, is_kmalloc);
+		set_alloc_info(cache, (void *)object, flags, true);
 
-	return set_tag(object, tag);
-}
-
-void * __must_check __kasan_slab_alloc(struct kmem_cache *cache,
-					void *object, gfp_t flags)
-{
-	return ____kasan_kmalloc(cache, object, cache->object_size, flags, false);
+	/* Keep the tag that was set by kasan_slab_alloc(). */
+	return (void *)object;
 }
 
 void * __must_check __kasan_kmalloc(struct kmem_cache *cache, const void *object,
 					size_t size, gfp_t flags)
 {
-	return ____kasan_kmalloc(cache, object, size, flags, true);
+	return ____kasan_kmalloc(cache, object, size, flags);
 }
 EXPORT_SYMBOL(__kasan_kmalloc);
 
@@ -496,8 +528,7 @@ void * __must_check __kasan_krealloc(con
 	if (unlikely(!PageSlab(page)))
 		return __kasan_kmalloc_large(object, size, flags);
 	else
-		return ____kasan_kmalloc(page->slab_cache, object, size,
-						flags, true);
+		return ____kasan_kmalloc(page->slab_cache, object, size, flags);
 }
 
 void __kasan_kfree_large(void *ptr, unsigned long ip)
--- a/mm/kasan/kasan.h~kasan-mm-optimize-kmalloc-poisoning
+++ a/mm/kasan/kasan.h
@@ -367,12 +367,51 @@ static inline bool kasan_byte_accessible
 
 #else /* CONFIG_KASAN_HW_TAGS */
 
-void kasan_poison(const void *address, size_t size, u8 value);
-void kasan_unpoison(const void *address, size_t size);
+/**
+ * kasan_poison - mark the memory range as unaccessible
+ * @addr - range start address, must be aligned to KASAN_GRANULE_SIZE
+ * @size - range size
+ * @value - value that's written to metadata for the range
+ *
+ * The size gets aligned to KASAN_GRANULE_SIZE before marking the range.
+ */
+void kasan_poison(const void *addr, size_t size, u8 value);
+
+/**
+ * kasan_unpoison - mark the memory range as accessible
+ * @addr - range start address, must be aligned to KASAN_GRANULE_SIZE
+ * @size - range size
+ *
+ * For the tag-based modes, the @size gets aligned to KASAN_GRANULE_SIZE before
+ * marking the range.
+ * For the generic mode, the last granule of the memory range gets partially
+ * unpoisoned based on the @size.
+ */
+void kasan_unpoison(const void *addr, size_t size);
+
 bool kasan_byte_accessible(const void *addr);
 
 #endif /* CONFIG_KASAN_HW_TAGS */
 
+#ifdef CONFIG_KASAN_GENERIC
+
+/**
+ * kasan_poison_last_granule - mark the last granule of the memory range as
+ * unaccessible
+ * @addr - range start address, must be aligned to KASAN_GRANULE_SIZE
+ * @size - range size
+ *
+ * This function is only available for the generic mode, as it's the only mode
+ * that has partially poisoned memory granules.
+ */
+void kasan_poison_last_granule(const void *address, size_t size);
+
+#else /* CONFIG_KASAN_GENERIC */
+
+static inline void kasan_poison_last_granule(const void *address, size_t size) { }
+
+#endif /* CONFIG_KASAN_GENERIC */
+
 /*
  * Exported functions for interfaces called from assembly or from generated
  * code. Declarations here to avoid warning about missing declarations.
--- a/mm/kasan/shadow.c~kasan-mm-optimize-kmalloc-poisoning
+++ a/mm/kasan/shadow.c
@@ -69,10 +69,6 @@ void *memcpy(void *dest, const void *src
 	return __memcpy(dest, src, len);
 }
 
-/*
- * Poisons the shadow memory for 'size' bytes starting from 'addr'.
- * Memory addresses should be aligned to KASAN_GRANULE_SIZE.
- */
 void kasan_poison(const void *address, size_t size, u8 value)
 {
 	void *shadow_start, *shadow_end;
@@ -83,12 +79,12 @@ void kasan_poison(const void *address, s
 	 * addresses to this function.
 	 */
 	address = kasan_reset_tag(address);
-	size = round_up(size, KASAN_GRANULE_SIZE);
 
 	/* Skip KFENCE memory if called explicitly outside of sl*b. */
 	if (is_kfence_address(address))
 		return;
 
+	size = round_up(size, KASAN_GRANULE_SIZE);
 	shadow_start = kasan_mem_to_shadow(address);
 	shadow_end = kasan_mem_to_shadow(address + size);
 
@@ -96,6 +92,16 @@ void kasan_poison(const void *address, s
 }
 EXPORT_SYMBOL(kasan_poison);
 
+#ifdef CONFIG_KASAN_GENERIC
+void kasan_poison_last_granule(const void *address, size_t size)
+{
+	if (size & KASAN_GRANULE_MASK) {
+		u8 *shadow = (u8 *)kasan_mem_to_shadow(address + size);
+		*shadow = size & KASAN_GRANULE_MASK;
+	}
+}
+#endif
+
 void kasan_unpoison(const void *address, size_t size)
 {
 	u8 tag = get_tag(address);
@@ -115,16 +121,12 @@ void kasan_unpoison(const void *address,
 	if (is_kfence_address(address))
 		return;
 
+	/* Unpoison round_up(size, KASAN_GRANULE_SIZE) bytes. */
 	kasan_poison(address, size, tag);
 
-	if (size & KASAN_GRANULE_MASK) {
-		u8 *shadow = (u8 *)kasan_mem_to_shadow(address + size);
-
-		if (IS_ENABLED(CONFIG_KASAN_SW_TAGS))
-			*shadow = tag;
-		else /* CONFIG_KASAN_GENERIC */
-			*shadow = size & KASAN_GRANULE_MASK;
-	}
+	/* Partially poison the last granule for the generic mode. */
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+		kasan_poison_last_granule(address, size);
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
--- a/mm/slub.c~kasan-mm-optimize-kmalloc-poisoning
+++ a/mm/slub.c
@@ -3579,8 +3579,7 @@ static void early_kmem_cache_node_alloc(
 	init_object(kmem_cache_node, n, SLUB_RED_ACTIVE);
 	init_tracking(kmem_cache_node, n);
 #endif
-	n = kasan_kmalloc(kmem_cache_node, n, sizeof(struct kmem_cache_node),
-		      GFP_KERNEL);
+	n = kasan_slab_alloc(kmem_cache_node, n, GFP_KERNEL);
 	page->freelist = get_freepointer(kmem_cache_node, n);
 	page->inuse = 1;
 	page->frozen = 0;
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 072/118] kasan: optimize large kmalloc poisoning
  2021-02-26  1:14 incoming Andrew Morton
                   ` (70 preceding siblings ...)
  2021-02-26  1:19 ` [patch 071/118] kasan, mm: optimize kmalloc poisoning Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 073/118] kasan: clean up setting free info in kasan_slab_free Andrew Morton
                   ` (46 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan: optimize large kmalloc poisoning

Similarly to kasan_kmalloc(), kasan_kmalloc_large() doesn't need to
unpoison the object, as it is already unpoisoned by alloc_pages() (or by
ksize() for krealloc()).

This patch changes kasan_kmalloc_large() to only poison the redzone.

Link: https://lkml.kernel.org/r/33dee5aac0e550ad7f8e26f590c9b02c6129b4a3.1612546384.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kasan/common.c |   20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

--- a/mm/kasan/common.c~kasan-optimize-large-kmalloc-poisoning
+++ a/mm/kasan/common.c
@@ -494,7 +494,6 @@ EXPORT_SYMBOL(__kasan_kmalloc);
 void * __must_check __kasan_kmalloc_large(const void *ptr, size_t size,
 						gfp_t flags)
 {
-	struct page *page;
 	unsigned long redzone_start;
 	unsigned long redzone_end;
 
@@ -504,12 +503,23 @@ void * __must_check __kasan_kmalloc_larg
 	if (unlikely(ptr == NULL))
 		return NULL;
 
-	page = virt_to_page(ptr);
+	/*
+	 * The object has already been unpoisoned by kasan_alloc_pages() for
+	 * alloc_pages() or by ksize() for krealloc().
+	 */
+
+	/*
+	 * The redzone has byte-level precision for the generic mode.
+	 * Partially poison the last object granule to cover the unaligned
+	 * part of the redzone.
+	 */
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+		kasan_poison_last_granule(ptr, size);
+
+	/* Poison the aligned part of the redzone. */
 	redzone_start = round_up((unsigned long)(ptr + size),
 				KASAN_GRANULE_SIZE);
-	redzone_end = (unsigned long)ptr + page_size(page);
-
-	kasan_unpoison(ptr, size);
+	redzone_end = (unsigned long)ptr + page_size(virt_to_page(ptr));
 	kasan_poison((void *)redzone_start, redzone_end - redzone_start,
 		     KASAN_PAGE_REDZONE);
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 073/118] kasan: clean up setting free info in kasan_slab_free
  2021-02-26  1:14 incoming Andrew Morton
                   ` (71 preceding siblings ...)
  2021-02-26  1:20 ` [patch 072/118] kasan: optimize large " Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 074/118] kasan: unify large kfree checks Andrew Morton
                   ` (45 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan: clean up setting free info in kasan_slab_free

Put the kasan_stack_collection_enabled() check and the kasan_set_free_info() call
next to each other.

The way this was previously implemented was a minor optimization that
relied on the fact that kasan_stack_collection_enabled() is always
true for generic KASAN.  The confusion that this brings outweighs saving
a few instructions.

Link: https://lkml.kernel.org/r/f838e249be5ab5810bf54a36ef5072cfd80e2da7.1612546384.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kasan/common.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- a/mm/kasan/common.c~kasan-clean-up-setting-free-info-in-kasan_slab_free
+++ a/mm/kasan/common.c
@@ -350,13 +350,11 @@ static bool ____kasan_slab_free(struct k
 
 	kasan_poison(object, cache->object_size, KASAN_KMALLOC_FREE);
 
-	if (!kasan_stack_collection_enabled())
-		return false;
-
 	if ((IS_ENABLED(CONFIG_KASAN_GENERIC) && !quarantine))
 		return false;
 
-	kasan_set_free_info(cache, object, tag);
+	if (kasan_stack_collection_enabled())
+		kasan_set_free_info(cache, object, tag);
 
 	return kasan_quarantine_put(cache, object);
 }
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 074/118] kasan: unify large kfree checks
  2021-02-26  1:14 incoming Andrew Morton
                   ` (72 preceding siblings ...)
  2021-02-26  1:20 ` [patch 073/118] kasan: clean up setting free info in kasan_slab_free Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 075/118] kasan: rework krealloc tests Andrew Morton
                   ` (44 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan: unify large kfree checks

Unify checks in kasan_kfree_large() and in kasan_slab_free_mempool() for
large allocations as it's done for small kfree() allocations.

With this change, kasan_slab_free_mempool() starts checking that the first
byte of the memory that's being freed is accessible.

Link: https://lkml.kernel.org/r/14ffc4cd867e0b1ed58f7527e3b748a1b4ad08aa.1612546384.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/kasan.h |   16 ++++++++--------
 mm/kasan/common.c     |   36 ++++++++++++++++++++++++++----------
 2 files changed, 34 insertions(+), 18 deletions(-)

--- a/include/linux/kasan.h~kasan-unify-large-kfree-checks
+++ a/include/linux/kasan.h
@@ -200,6 +200,13 @@ static __always_inline bool kasan_slab_f
 	return false;
 }
 
+void __kasan_kfree_large(void *ptr, unsigned long ip);
+static __always_inline void kasan_kfree_large(void *ptr)
+{
+	if (kasan_enabled())
+		__kasan_kfree_large(ptr, _RET_IP_);
+}
+
 void __kasan_slab_free_mempool(void *ptr, unsigned long ip);
 static __always_inline void kasan_slab_free_mempool(void *ptr)
 {
@@ -247,13 +254,6 @@ static __always_inline void * __must_che
 	return (void *)object;
 }
 
-void __kasan_kfree_large(void *ptr, unsigned long ip);
-static __always_inline void kasan_kfree_large(void *ptr)
-{
-	if (kasan_enabled())
-		__kasan_kfree_large(ptr, _RET_IP_);
-}
-
 /*
  * Unlike kasan_check_read/write(), kasan_check_byte() is performed even for
  * the hardware tag-based mode that doesn't rely on compiler instrumentation.
@@ -302,6 +302,7 @@ static inline bool kasan_slab_free(struc
 {
 	return false;
 }
+static inline void kasan_kfree_large(void *ptr) {}
 static inline void kasan_slab_free_mempool(void *ptr) {}
 static inline void *kasan_slab_alloc(struct kmem_cache *s, void *object,
 				   gfp_t flags)
@@ -322,7 +323,6 @@ static inline void *kasan_krealloc(const
 {
 	return (void *)object;
 }
-static inline void kasan_kfree_large(void *ptr) {}
 static inline bool kasan_check_byte(const void *address)
 {
 	return true;
--- a/mm/kasan/common.c~kasan-unify-large-kfree-checks
+++ a/mm/kasan/common.c
@@ -364,6 +364,31 @@ bool __kasan_slab_free(struct kmem_cache
 	return ____kasan_slab_free(cache, object, ip, true);
 }
 
+static bool ____kasan_kfree_large(void *ptr, unsigned long ip)
+{
+	if (ptr != page_address(virt_to_head_page(ptr))) {
+		kasan_report_invalid_free(ptr, ip);
+		return true;
+	}
+
+	if (!kasan_byte_accessible(ptr)) {
+		kasan_report_invalid_free(ptr, ip);
+		return true;
+	}
+
+	/*
+	 * The object will be poisoned by kasan_free_pages() or
+	 * kasan_slab_free_mempool().
+	 */
+
+	return false;
+}
+
+void __kasan_kfree_large(void *ptr, unsigned long ip)
+{
+	____kasan_kfree_large(ptr, ip);
+}
+
 void __kasan_slab_free_mempool(void *ptr, unsigned long ip)
 {
 	struct page *page;
@@ -377,10 +402,8 @@ void __kasan_slab_free_mempool(void *ptr
 	 * KMALLOC_MAX_SIZE, and kmalloc falls back onto page_alloc.
 	 */
 	if (unlikely(!PageSlab(page))) {
-		if (ptr != page_address(page)) {
-			kasan_report_invalid_free(ptr, ip);
+		if (____kasan_kfree_large(ptr, ip))
 			return;
-		}
 		kasan_poison(ptr, page_size(page), KASAN_FREE_PAGE);
 	} else {
 		____kasan_slab_free(page->slab_cache, ptr, ip, false);
@@ -539,13 +562,6 @@ void * __must_check __kasan_krealloc(con
 		return ____kasan_kmalloc(page->slab_cache, object, size, flags);
 }
 
-void __kasan_kfree_large(void *ptr, unsigned long ip)
-{
-	if (ptr != page_address(virt_to_head_page(ptr)))
-		kasan_report_invalid_free(ptr, ip);
-	/* The object will be poisoned by kasan_free_pages(). */
-}
-
 bool __kasan_check_byte(const void *address, unsigned long ip)
 {
 	if (!kasan_byte_accessible(address)) {
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 075/118] kasan: rework krealloc tests
  2021-02-26  1:14 incoming Andrew Morton
                   ` (73 preceding siblings ...)
  2021-02-26  1:20 ` [patch 074/118] kasan: unify large kfree checks Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 076/118] kasan, mm: fail krealloc on freed objects Andrew Morton
                   ` (43 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan: rework krealloc tests

This patch reworks KASAN-KUnit tests for krealloc() to:

1. Check both slab and page_alloc based krealloc() implementations.
2. Allow at least one full granule to fit between old and new sizes for
   each KASAN mode, and check accesses to that granule accordingly.

Link: https://lkml.kernel.org/r/c707f128a2bb9f2f05185d1eb52192cf179cf4fa.1612546384.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/test_kasan.c |   91 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 81 insertions(+), 10 deletions(-)

--- a/lib/test_kasan.c~kasan-rework-krealloc-tests
+++ a/lib/test_kasan.c
@@ -252,11 +252,14 @@ static void kmalloc_large_oob_right(stru
 	kfree(ptr);
 }
 
-static void kmalloc_oob_krealloc_more(struct kunit *test)
+static void krealloc_more_oob_helper(struct kunit *test,
+					size_t size1, size_t size2)
 {
 	char *ptr1, *ptr2;
-	size_t size1 = 17;
-	size_t size2 = 19;
+	size_t middle;
+
+	KUNIT_ASSERT_LT(test, size1, size2);
+	middle = size1 + (size2 - size1) / 2;
 
 	ptr1 = kmalloc(size1, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr1);
@@ -264,15 +267,31 @@ static void kmalloc_oob_krealloc_more(st
 	ptr2 = krealloc(ptr1, size2, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr2);
 
-	KUNIT_EXPECT_KASAN_FAIL(test, ptr2[size2 + OOB_TAG_OFF] = 'x');
+	/* All offsets up to size2 must be accessible. */
+	ptr2[size1 - 1] = 'x';
+	ptr2[size1] = 'x';
+	ptr2[middle] = 'x';
+	ptr2[size2 - 1] = 'x';
+
+	/* Generic mode is precise, so unaligned size2 must be inaccessible. */
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+		KUNIT_EXPECT_KASAN_FAIL(test, ptr2[size2] = 'x');
+
+	/* For all modes first aligned offset after size2 must be inaccessible. */
+	KUNIT_EXPECT_KASAN_FAIL(test,
+		ptr2[round_up(size2, KASAN_GRANULE_SIZE)] = 'x');
+
 	kfree(ptr2);
 }
 
-static void kmalloc_oob_krealloc_less(struct kunit *test)
+static void krealloc_less_oob_helper(struct kunit *test,
+					size_t size1, size_t size2)
 {
 	char *ptr1, *ptr2;
-	size_t size1 = 17;
-	size_t size2 = 15;
+	size_t middle;
+
+	KUNIT_ASSERT_LT(test, size2, size1);
+	middle = size2 + (size1 - size2) / 2;
 
 	ptr1 = kmalloc(size1, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr1);
@@ -280,10 +299,60 @@ static void kmalloc_oob_krealloc_less(st
 	ptr2 = krealloc(ptr1, size2, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr2);
 
-	KUNIT_EXPECT_KASAN_FAIL(test, ptr2[size2 + OOB_TAG_OFF] = 'x');
+	/* Must be accessible for all modes. */
+	ptr2[size2 - 1] = 'x';
+
+	/* Generic mode is precise, so unaligned size2 must be inaccessible. */
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+		KUNIT_EXPECT_KASAN_FAIL(test, ptr2[size2] = 'x');
+
+	/* For all modes first aligned offset after size2 must be inaccessible. */
+	KUNIT_EXPECT_KASAN_FAIL(test,
+		ptr2[round_up(size2, KASAN_GRANULE_SIZE)] = 'x');
+
+	/*
+	 * For all modes all size2, middle, and size1 should land in separate
+	 * granules and thus the latter two offsets should be inaccessible.
+	 */
+	KUNIT_EXPECT_LE(test, round_up(size2, KASAN_GRANULE_SIZE),
+				round_down(middle, KASAN_GRANULE_SIZE));
+	KUNIT_EXPECT_LE(test, round_up(middle, KASAN_GRANULE_SIZE),
+				round_down(size1, KASAN_GRANULE_SIZE));
+	KUNIT_EXPECT_KASAN_FAIL(test, ptr2[middle] = 'x');
+	KUNIT_EXPECT_KASAN_FAIL(test, ptr2[size1 - 1] = 'x');
+	KUNIT_EXPECT_KASAN_FAIL(test, ptr2[size1] = 'x');
+
 	kfree(ptr2);
 }
 
+static void krealloc_more_oob(struct kunit *test)
+{
+	krealloc_more_oob_helper(test, 201, 235);
+}
+
+static void krealloc_less_oob(struct kunit *test)
+{
+	krealloc_less_oob_helper(test, 235, 201);
+}
+
+static void krealloc_pagealloc_more_oob(struct kunit *test)
+{
+	/* page_alloc fallback is only implemented for SLUB. */
+	KASAN_TEST_NEEDS_CONFIG_ON(test, CONFIG_SLUB);
+
+	krealloc_more_oob_helper(test, KMALLOC_MAX_CACHE_SIZE + 201,
+					KMALLOC_MAX_CACHE_SIZE + 235);
+}
+
+static void krealloc_pagealloc_less_oob(struct kunit *test)
+{
+	/* page_alloc fallback is only implemented for SLUB. */
+	KASAN_TEST_NEEDS_CONFIG_ON(test, CONFIG_SLUB);
+
+	krealloc_less_oob_helper(test, KMALLOC_MAX_CACHE_SIZE + 235,
+					KMALLOC_MAX_CACHE_SIZE + 201);
+}
+
 static void kmalloc_oob_16(struct kunit *test)
 {
 	struct {
@@ -977,8 +1046,10 @@ static struct kunit_case kasan_kunit_tes
 	KUNIT_CASE(pagealloc_oob_right),
 	KUNIT_CASE(pagealloc_uaf),
 	KUNIT_CASE(kmalloc_large_oob_right),
-	KUNIT_CASE(kmalloc_oob_krealloc_more),
-	KUNIT_CASE(kmalloc_oob_krealloc_less),
+	KUNIT_CASE(krealloc_more_oob),
+	KUNIT_CASE(krealloc_less_oob),
+	KUNIT_CASE(krealloc_pagealloc_more_oob),
+	KUNIT_CASE(krealloc_pagealloc_less_oob),
 	KUNIT_CASE(kmalloc_oob_16),
 	KUNIT_CASE(kmalloc_uaf_16),
 	KUNIT_CASE(kmalloc_oob_in_memset),
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 076/118] kasan, mm: fail krealloc on freed objects
  2021-02-26  1:14 incoming Andrew Morton
                   ` (74 preceding siblings ...)
  2021-02-26  1:20 ` [patch 075/118] kasan: rework krealloc tests Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 077/118] kasan, mm: optimize krealloc poisoning Andrew Morton
                   ` (42 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan, mm: fail krealloc on freed objects

Currently, if krealloc() is called on a freed object with KASAN enabled,
it allocates and returns a new object, but doesn't copy any memory from
the old one as ksize() returns 0.  This makes the caller believe that
krealloc() succeeded (a KASAN report is printed, though).

This patch adds an accessibility check into __do_krealloc().  If the check
fails, krealloc() returns NULL.  This check duplicates the one in ksize();
this is fixed in the following patch.

This patch also adds a KASAN-KUnit test to check krealloc() behaviour when
it's called on a freed object.
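
A minimal sketch of the caller-visible behaviour this enforces (hypothetical
caller code, not part of the patch):

	ptr = kmalloc(64, GFP_KERNEL);
	kfree(ptr);

	/*
	 * With KASAN enabled, the stale pointer is reported as a
	 * use-after-free and krealloc() now fails instead of returning a
	 * fresh object with none of the old data copied.
	 */
	new = krealloc(ptr, 128, GFP_KERNEL);
	if (!new)
		return -ENOMEM;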

Link: https://lkml.kernel.org/r/cbcf7b02be0a1ca11de4f833f2ff0b3f2c9b00c8.1612546384.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/test_kasan.c |   20 ++++++++++++++++++++
 mm/slab_common.c |    3 +++
 2 files changed, 23 insertions(+)

--- a/lib/test_kasan.c~kasan-mm-fail-krealloc-on-freed-objects
+++ a/lib/test_kasan.c
@@ -353,6 +353,25 @@ static void krealloc_pagealloc_less_oob(
 					KMALLOC_MAX_CACHE_SIZE + 201);
 }
 
+/*
+ * Check that krealloc() detects a use-after-free, returns NULL,
+ * and doesn't unpoison the freed object.
+ */
+static void krealloc_uaf(struct kunit *test)
+{
+	char *ptr1, *ptr2;
+	int size1 = 201;
+	int size2 = 235;
+
+	ptr1 = kmalloc(size1, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr1);
+	kfree(ptr1);
+
+	KUNIT_EXPECT_KASAN_FAIL(test, ptr2 = krealloc(ptr1, size2, GFP_KERNEL));
+	KUNIT_ASSERT_PTR_EQ(test, (void *)ptr2, NULL);
+	KUNIT_EXPECT_KASAN_FAIL(test, *(volatile char *)ptr1);
+}
+
 static void kmalloc_oob_16(struct kunit *test)
 {
 	struct {
@@ -1050,6 +1069,7 @@ static struct kunit_case kasan_kunit_tes
 	KUNIT_CASE(krealloc_less_oob),
 	KUNIT_CASE(krealloc_pagealloc_more_oob),
 	KUNIT_CASE(krealloc_pagealloc_less_oob),
+	KUNIT_CASE(krealloc_uaf),
 	KUNIT_CASE(kmalloc_oob_16),
 	KUNIT_CASE(kmalloc_uaf_16),
 	KUNIT_CASE(kmalloc_oob_in_memset),
--- a/mm/slab_common.c~kasan-mm-fail-krealloc-on-freed-objects
+++ a/mm/slab_common.c
@@ -1136,6 +1136,9 @@ static __always_inline void *__do_kreall
 	void *ret;
 	size_t ks;
 
+	if (likely(!ZERO_OR_NULL_PTR(p)) && !kasan_check_byte(p))
+		return NULL;
+
 	ks = ksize(p);
 
 	if (ks >= new_size) {
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 077/118] kasan, mm: optimize krealloc poisoning
  2021-02-26  1:14 incoming Andrew Morton
                   ` (75 preceding siblings ...)
  2021-02-26  1:20 ` [patch 076/118] kasan, mm: fail krealloc on freed objects Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 078/118] kasan: ensure poisoning size alignment Andrew Morton
                   ` (41 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan, mm: optimize krealloc poisoning

Currently, krealloc() always calls ksize(), which unpoisons the whole
object including the redzone.  This is inefficient, as kasan_krealloc()
repoisons the redzone for objects that fit into the same buffer.

This patch changes krealloc() instrumentation to use the uninstrumented
__ksize(), which doesn't unpoison the memory.  Instead, kasan_krealloc() is
changed to unpoison the memory excluding the redzone.

For objects that don't fit into the old allocation, this patch disables
KASAN accessibility checks when copying memory into a new object instead
of unpoisoning it.
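
The copy path then looks roughly as below (this mirrors the mm/slab_common.c
hunk further down):

	/*
	 * The old object's redzone may still be poisoned, so suppress KASAN
	 * checks only for the copy into the new allocation.
	 */
	kasan_disable_current();
	memcpy(ret, kasan_reset_tag(p), ks);
	kasan_enable_current();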

Link: https://lkml.kernel.org/r/9bef90327c9cb109d736c40115684fd32f49e6b0.1612546384.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kasan/common.c |   12 ++++++++++--
 mm/slab_common.c  |   20 ++++++++++++++------
 2 files changed, 24 insertions(+), 8 deletions(-)

--- a/mm/kasan/common.c~kasan-mm-optimize-krealloc-poisoning
+++ a/mm/kasan/common.c
@@ -476,7 +476,7 @@ static void *____kasan_kmalloc(struct km
 
 	/*
 	 * The object has already been unpoisoned by kasan_slab_alloc() for
-	 * kmalloc() or by ksize() for krealloc().
+	 * kmalloc() or by kasan_krealloc() for krealloc().
 	 */
 
 	/*
@@ -526,7 +526,7 @@ void * __must_check __kasan_kmalloc_larg
 
 	/*
 	 * The object has already been unpoisoned by kasan_alloc_pages() for
-	 * alloc_pages() or by ksize() for krealloc().
+	 * alloc_pages() or by kasan_krealloc() for krealloc().
 	 */
 
 	/*
@@ -554,8 +554,16 @@ void * __must_check __kasan_krealloc(con
 	if (unlikely(object == ZERO_SIZE_PTR))
 		return (void *)object;
 
+	/*
+	 * Unpoison the object's data.
+	 * Part of it might already have been unpoisoned, but it's unknown
+	 * how big that part is.
+	 */
+	kasan_unpoison(object, size);
+
 	page = virt_to_head_page(object);
 
+	/* Piggy-back on kmalloc() instrumentation to poison the redzone. */
 	if (unlikely(!PageSlab(page)))
 		return __kasan_kmalloc_large(object, size, flags);
 	else
--- a/mm/slab_common.c~kasan-mm-optimize-krealloc-poisoning
+++ a/mm/slab_common.c
@@ -1136,19 +1136,27 @@ static __always_inline void *__do_kreall
 	void *ret;
 	size_t ks;
 
-	if (likely(!ZERO_OR_NULL_PTR(p)) && !kasan_check_byte(p))
-		return NULL;
-
-	ks = ksize(p);
+	/* Don't use instrumented ksize to allow precise KASAN poisoning. */
+	if (likely(!ZERO_OR_NULL_PTR(p))) {
+		if (!kasan_check_byte(p))
+			return NULL;
+		ks = kfence_ksize(p) ?: __ksize(p);
+	} else
+		ks = 0;
 
+	/* If the object still fits, repoison it precisely. */
 	if (ks >= new_size) {
 		p = kasan_krealloc((void *)p, new_size, flags);
 		return (void *)p;
 	}
 
 	ret = kmalloc_track_caller(new_size, flags);
-	if (ret && p)
-		memcpy(ret, p, ks);
+	if (ret && p) {
+		/* Disable KASAN checks as the object's redzone is accessed. */
+		kasan_disable_current();
+		memcpy(ret, kasan_reset_tag(p), ks);
+		kasan_enable_current();
+	}
 
 	return ret;
 }
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 078/118] kasan: ensure poisoning size alignment
  2021-02-26  1:14 incoming Andrew Morton
                   ` (76 preceding siblings ...)
  2021-02-26  1:20 ` [patch 077/118] kasan, mm: optimize krealloc poisoning Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 079/118] arm64: kasan: simplify and inline MTE functions Andrew Morton
                   ` (40 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan: ensure poisoning size alignment

A previous change, d99f6a10c161 ("kasan: don't round_up too much"),
attempted to simplify the code by adding a round_up(size) call into
kasan_poison().  While this allows having fewer round_up() calls around
the code, it results in round_up() being called multiple times.

This patch removes round_up() of size from kasan_poison() and ensures that
all callers round_up() the size explicitly.  This patch also adds
WARN_ON() alignment checks for address and size to kasan_poison() and
kasan_unpoison().
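
A minimal sketch of the resulting convention (illustrative; the real callers
are in the diff below):

	/* Callers pass a granule-aligned size ... */
	kasan_poison(object, round_up(size, KASAN_GRANULE_SIZE), value);

	/* ... and kasan_poison()/kasan_unpoison() warn on misaligned input: */
	if (WARN_ON((unsigned long)addr & KASAN_GRANULE_MASK))
		return;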

Link: https://lkml.kernel.org/r/3ffe8d4a246ae67a8b5e91f65bf98cd7cba9d7b9.1612546384.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kasan/common.c |    9 ++++++---
 mm/kasan/kasan.h  |   33 ++++++++++++++++++++-------------
 mm/kasan/shadow.c |   37 ++++++++++++++++++++++---------------
 3 files changed, 48 insertions(+), 31 deletions(-)

--- a/mm/kasan/common.c~kasan-ensure-poisoning-size-alignment
+++ a/mm/kasan/common.c
@@ -261,7 +261,8 @@ void __kasan_unpoison_object_data(struct
 
 void __kasan_poison_object_data(struct kmem_cache *cache, void *object)
 {
-	kasan_poison(object, cache->object_size, KASAN_KMALLOC_REDZONE);
+	kasan_poison(object, round_up(cache->object_size, KASAN_GRANULE_SIZE),
+			KASAN_KMALLOC_REDZONE);
 }
 
 /*
@@ -348,7 +349,8 @@ static bool ____kasan_slab_free(struct k
 		return true;
 	}
 
-	kasan_poison(object, cache->object_size, KASAN_KMALLOC_FREE);
+	kasan_poison(object, round_up(cache->object_size, KASAN_GRANULE_SIZE),
+			KASAN_KMALLOC_FREE);
 
 	if ((IS_ENABLED(CONFIG_KASAN_GENERIC) && !quarantine))
 		return false;
@@ -490,7 +492,8 @@ static void *____kasan_kmalloc(struct km
 	/* Poison the aligned part of the redzone. */
 	redzone_start = round_up((unsigned long)(object + size),
 				KASAN_GRANULE_SIZE);
-	redzone_end = (unsigned long)object + cache->object_size;
+	redzone_end = round_up((unsigned long)(object + cache->object_size),
+				KASAN_GRANULE_SIZE);
 	kasan_poison((void *)redzone_start, redzone_end - redzone_start,
 			   KASAN_KMALLOC_REDZONE);
 
--- a/mm/kasan/kasan.h~kasan-ensure-poisoning-size-alignment
+++ a/mm/kasan/kasan.h
@@ -330,30 +330,37 @@ static inline u8 kasan_random_tag(void)
 
 #ifdef CONFIG_KASAN_HW_TAGS
 
-static inline void kasan_poison(const void *address, size_t size, u8 value)
+static inline void kasan_poison(const void *addr, size_t size, u8 value)
 {
-	address = kasan_reset_tag(address);
+	addr = kasan_reset_tag(addr);
 
 	/* Skip KFENCE memory if called explicitly outside of sl*b. */
-	if (is_kfence_address(address))
+	if (is_kfence_address(addr))
 		return;
 
-	hw_set_mem_tag_range((void *)address,
-			round_up(size, KASAN_GRANULE_SIZE), value);
+	if (WARN_ON((unsigned long)addr & KASAN_GRANULE_MASK))
+		return;
+	if (WARN_ON(size & KASAN_GRANULE_MASK))
+		return;
+
+	hw_set_mem_tag_range((void *)addr, size, value);
 }
 
-static inline void kasan_unpoison(const void *address, size_t size)
+static inline void kasan_unpoison(const void *addr, size_t size)
 {
-	u8 tag = get_tag(address);
+	u8 tag = get_tag(addr);
 
-	address = kasan_reset_tag(address);
+	addr = kasan_reset_tag(addr);
 
 	/* Skip KFENCE memory if called explicitly outside of sl*b. */
-	if (is_kfence_address(address))
+	if (is_kfence_address(addr))
+		return;
+
+	if (WARN_ON((unsigned long)addr & KASAN_GRANULE_MASK))
 		return;
+	size = round_up(size, KASAN_GRANULE_SIZE);
 
-	hw_set_mem_tag_range((void *)address,
-			round_up(size, KASAN_GRANULE_SIZE), tag);
+	hw_set_mem_tag_range((void *)addr, size, tag);
 }
 
 static inline bool kasan_byte_accessible(const void *addr)
@@ -370,7 +377,7 @@ static inline bool kasan_byte_accessible
 /**
  * kasan_poison - mark the memory range as unaccessible
  * @addr - range start address, must be aligned to KASAN_GRANULE_SIZE
- * @size - range size
+ * @size - range size, must be aligned to KASAN_GRANULE_SIZE
  * @value - value that's written to metadata for the range
  *
  * The size gets aligned to KASAN_GRANULE_SIZE before marking the range.
@@ -380,7 +387,7 @@ void kasan_poison(const void *addr, size
 /**
  * kasan_unpoison - mark the memory range as accessible
  * @addr - range start address, must be aligned to KASAN_GRANULE_SIZE
- * @size - range size
+ * @size - range size, can be unaligned
  *
  * For the tag-based modes, the @size gets aligned to KASAN_GRANULE_SIZE before
  * marking the range.
--- a/mm/kasan/shadow.c~kasan-ensure-poisoning-size-alignment
+++ a/mm/kasan/shadow.c
@@ -69,7 +69,7 @@ void *memcpy(void *dest, const void *src
 	return __memcpy(dest, src, len);
 }
 
-void kasan_poison(const void *address, size_t size, u8 value)
+void kasan_poison(const void *addr, size_t size, u8 value)
 {
 	void *shadow_start, *shadow_end;
 
@@ -78,55 +78,62 @@ void kasan_poison(const void *address, s
 	 * some of the callers (e.g. kasan_poison_object_data) pass tagged
 	 * addresses to this function.
 	 */
-	address = kasan_reset_tag(address);
+	addr = kasan_reset_tag(addr);
 
 	/* Skip KFENCE memory if called explicitly outside of sl*b. */
-	if (is_kfence_address(address))
+	if (is_kfence_address(addr))
 		return;
 
-	size = round_up(size, KASAN_GRANULE_SIZE);
-	shadow_start = kasan_mem_to_shadow(address);
-	shadow_end = kasan_mem_to_shadow(address + size);
+	if (WARN_ON((unsigned long)addr & KASAN_GRANULE_MASK))
+		return;
+	if (WARN_ON(size & KASAN_GRANULE_MASK))
+		return;
+
+	shadow_start = kasan_mem_to_shadow(addr);
+	shadow_end = kasan_mem_to_shadow(addr + size);
 
 	__memset(shadow_start, value, shadow_end - shadow_start);
 }
 EXPORT_SYMBOL(kasan_poison);
 
 #ifdef CONFIG_KASAN_GENERIC
-void kasan_poison_last_granule(const void *address, size_t size)
+void kasan_poison_last_granule(const void *addr, size_t size)
 {
 	if (size & KASAN_GRANULE_MASK) {
-		u8 *shadow = (u8 *)kasan_mem_to_shadow(address + size);
+		u8 *shadow = (u8 *)kasan_mem_to_shadow(addr + size);
 		*shadow = size & KASAN_GRANULE_MASK;
 	}
 }
 #endif
 
-void kasan_unpoison(const void *address, size_t size)
+void kasan_unpoison(const void *addr, size_t size)
 {
-	u8 tag = get_tag(address);
+	u8 tag = get_tag(addr);
 
 	/*
 	 * Perform shadow offset calculation based on untagged address, as
 	 * some of the callers (e.g. kasan_unpoison_object_data) pass tagged
 	 * addresses to this function.
 	 */
-	address = kasan_reset_tag(address);
+	addr = kasan_reset_tag(addr);
 
 	/*
 	 * Skip KFENCE memory if called explicitly outside of sl*b. Also note
 	 * that calls to ksize(), where size is not a multiple of machine-word
 	 * size, would otherwise poison the invalid portion of the word.
 	 */
-	if (is_kfence_address(address))
+	if (is_kfence_address(addr))
+		return;
+
+	if (WARN_ON((unsigned long)addr & KASAN_GRANULE_MASK))
 		return;
 
-	/* Unpoison round_up(size, KASAN_GRANULE_SIZE) bytes. */
-	kasan_poison(address, size, tag);
+	/* Unpoison all granules that cover the object. */
+	kasan_poison(addr, round_up(size, KASAN_GRANULE_SIZE), tag);
 
 	/* Partially poison the last granule for the generic mode. */
 	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
-		kasan_poison_last_granule(address, size);
+		kasan_poison_last_granule(addr, size);
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 079/118] arm64: kasan: simplify and inline MTE functions
  2021-02-26  1:14 incoming Andrew Morton
                   ` (77 preceding siblings ...)
  2021-02-26  1:20 ` [patch 078/118] kasan: ensure poisoning size alignment Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 080/118] kasan: inline HW_TAGS helper functions Andrew Morton
                   ` (39 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: arm64: kasan: simplify and inline MTE functions

This change provides a simpler implementation of mte_get_mem_tag(),
mte_get_random_tag(), and mte_set_mem_tag_range().

Simplifications include removing system_supports_mte() checks, as these
functions are only called from the KASAN runtime, which has already checked
system_supports_mte().  Besides that, size and address alignment checks
are removed from mte_set_mem_tag_range(), as KASAN now does those.

This change also moves these functions into the asm/mte-kasan.h header and
implements mte_set_mem_tag_range() via inline assembly to avoid
unnecessary function calls.

[vincenzo.frascino@arm.com: fix warning in mte_get_random_tag()]
  Link: https://lkml.kernel.org/r/20210211152208.23811-1-vincenzo.frascino@arm.com
Link: https://lkml.kernel.org/r/a26121b294fdf76e369cb7a74351d1c03a908930.1612546384.git.andreyknvl@google.com
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm64/include/asm/cache.h     |    1 
 arch/arm64/include/asm/kasan.h     |    1 
 arch/arm64/include/asm/mte-def.h   |    2 
 arch/arm64/include/asm/mte-kasan.h |   65 +++++++++++++++++++++++----
 arch/arm64/include/asm/mte.h       |    2 
 arch/arm64/kernel/mte.c            |   46 -------------------
 arch/arm64/lib/mte.S               |   16 ------
 7 files changed, 60 insertions(+), 73 deletions(-)

--- a/arch/arm64/include/asm/cache.h~arm64-kasan-simplify-and-inline-mte-functions
+++ a/arch/arm64/include/asm/cache.h
@@ -6,7 +6,6 @@
 #define __ASM_CACHE_H
 
 #include <asm/cputype.h>
-#include <asm/mte-kasan.h>
 
 #define CTR_L1IP_SHIFT		14
 #define CTR_L1IP_MASK		3
--- a/arch/arm64/include/asm/kasan.h~arm64-kasan-simplify-and-inline-mte-functions
+++ a/arch/arm64/include/asm/kasan.h
@@ -6,6 +6,7 @@
 
 #include <linux/linkage.h>
 #include <asm/memory.h>
+#include <asm/mte-kasan.h>
 #include <asm/pgtable-types.h>
 
 #define arch_kasan_set_tag(addr, tag)	__tag_set(addr, tag)
--- a/arch/arm64/include/asm/mte-def.h~arm64-kasan-simplify-and-inline-mte-functions
+++ a/arch/arm64/include/asm/mte-def.h
@@ -11,4 +11,6 @@
 #define MTE_TAG_SIZE		4
 #define MTE_TAG_MASK		GENMASK((MTE_TAG_SHIFT + (MTE_TAG_SIZE - 1)), MTE_TAG_SHIFT)
 
+#define __MTE_PREAMBLE		ARM64_ASM_PREAMBLE ".arch_extension memtag\n"
+
 #endif /* __ASM_MTE_DEF_H  */
--- a/arch/arm64/include/asm/mte.h~arm64-kasan-simplify-and-inline-mte-functions
+++ a/arch/arm64/include/asm/mte.h
@@ -8,8 +8,6 @@
 #include <asm/compiler.h>
 #include <asm/mte-def.h>
 
-#define __MTE_PREAMBLE		ARM64_ASM_PREAMBLE ".arch_extension memtag\n"
-
 #ifndef __ASSEMBLY__
 
 #include <linux/bitfield.h>
--- a/arch/arm64/include/asm/mte-kasan.h~arm64-kasan-simplify-and-inline-mte-functions
+++ a/arch/arm64/include/asm/mte-kasan.h
@@ -11,11 +11,14 @@
 
 #include <linux/types.h>
 
+#ifdef CONFIG_ARM64_MTE
+
 /*
- * The functions below are meant to be used only for the
- * KASAN_HW_TAGS interface defined in asm/memory.h.
+ * These functions are meant to be only used from KASAN runtime through
+ * the arch_*() interface defined in asm/memory.h.
+ * These functions don't include system_supports_mte() checks,
+ * as KASAN only calls them when MTE is supported and enabled.
  */
-#ifdef CONFIG_ARM64_MTE
 
 static inline u8 mte_get_ptr_tag(void *ptr)
 {
@@ -25,9 +28,54 @@ static inline u8 mte_get_ptr_tag(void *p
 	return tag;
 }
 
-u8 mte_get_mem_tag(void *addr);
-u8 mte_get_random_tag(void);
-void *mte_set_mem_tag_range(void *addr, size_t size, u8 tag);
+/* Get allocation tag for the address. */
+static inline u8 mte_get_mem_tag(void *addr)
+{
+	asm(__MTE_PREAMBLE "ldg %0, [%0]"
+		: "+r" (addr));
+
+	return mte_get_ptr_tag(addr);
+}
+
+/* Generate a random tag. */
+static inline u8 mte_get_random_tag(void)
+{
+	void *addr;
+
+	asm(__MTE_PREAMBLE "irg %0, %0"
+		: "=r" (addr));
+
+	return mte_get_ptr_tag(addr);
+}
+
+/*
+ * Assign allocation tags for a region of memory based on the pointer tag.
+ * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
+ * size must be non-zero and MTE_GRANULE_SIZE aligned.
+ */
+static inline void mte_set_mem_tag_range(void *addr, size_t size, u8 tag)
+{
+	u64 curr, end;
+
+	if (!size)
+		return;
+
+	curr = (u64)__tag_set(addr, tag);
+	end = curr + size;
+
+	do {
+		/*
+		 * 'asm volatile' is required to prevent the compiler from
+		 * moving the statement outside of the loop.
+		 */
+		asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
+			     :
+			     : "r" (curr)
+			     : "memory");
+
+		curr += MTE_GRANULE_SIZE;
+	} while (curr != end);
+}
 
 void mte_enable_kernel(void);
 void mte_init_tags(u64 max_tag);
@@ -46,13 +94,14 @@ static inline u8 mte_get_mem_tag(void *a
 {
 	return 0xFF;
 }
+
 static inline u8 mte_get_random_tag(void)
 {
 	return 0xFF;
 }
-static inline void *mte_set_mem_tag_range(void *addr, size_t size, u8 tag)
+
+static inline void mte_set_mem_tag_range(void *addr, size_t size, u8 tag)
 {
-	return addr;
 }
 
 static inline void mte_enable_kernel(void)
--- a/arch/arm64/kernel/mte.c~arm64-kasan-simplify-and-inline-mte-functions
+++ a/arch/arm64/kernel/mte.c
@@ -19,7 +19,6 @@
 #include <asm/barrier.h>
 #include <asm/cpufeature.h>
 #include <asm/mte.h>
-#include <asm/mte-kasan.h>
 #include <asm/ptrace.h>
 #include <asm/sysreg.h>
 
@@ -88,51 +87,6 @@ int memcmp_pages(struct page *page1, str
 	return ret;
 }
 
-u8 mte_get_mem_tag(void *addr)
-{
-	if (!system_supports_mte())
-		return 0xFF;
-
-	asm(__MTE_PREAMBLE "ldg %0, [%0]"
-	    : "+r" (addr));
-
-	return mte_get_ptr_tag(addr);
-}
-
-u8 mte_get_random_tag(void)
-{
-	void *addr;
-
-	if (!system_supports_mte())
-		return 0xFF;
-
-	asm(__MTE_PREAMBLE "irg %0, %0"
-	    : "+r" (addr));
-
-	return mte_get_ptr_tag(addr);
-}
-
-void *mte_set_mem_tag_range(void *addr, size_t size, u8 tag)
-{
-	void *ptr = addr;
-
-	if ((!system_supports_mte()) || (size == 0))
-		return addr;
-
-	/* Make sure that size is MTE granule aligned. */
-	WARN_ON(size & (MTE_GRANULE_SIZE - 1));
-
-	/* Make sure that the address is MTE granule aligned. */
-	WARN_ON((u64)addr & (MTE_GRANULE_SIZE - 1));
-
-	tag = 0xF0 | tag;
-	ptr = (void *)__tag_set(ptr, tag);
-
-	mte_assign_mem_tag_range(ptr, size);
-
-	return ptr;
-}
-
 void mte_init_tags(u64 max_tag)
 {
 	static bool gcr_kernel_excl_initialized;
--- a/arch/arm64/lib/mte.S~arm64-kasan-simplify-and-inline-mte-functions
+++ a/arch/arm64/lib/mte.S
@@ -149,19 +149,3 @@ SYM_FUNC_START(mte_restore_page_tags)
 
 	ret
 SYM_FUNC_END(mte_restore_page_tags)
-
-/*
- * Assign allocation tags for a region of memory based on the pointer tag
- *   x0 - source pointer
- *   x1 - size
- *
- * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
- * size must be non-zero and MTE_GRANULE_SIZE aligned.
- */
-SYM_FUNC_START(mte_assign_mem_tag_range)
-1:	stg	x0, [x0]
-	add	x0, x0, #MTE_GRANULE_SIZE
-	subs	x1, x1, #MTE_GRANULE_SIZE
-	b.gt	1b
-	ret
-SYM_FUNC_END(mte_assign_mem_tag_range)
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 080/118] kasan: inline HW_TAGS helper functions
  2021-02-26  1:14 incoming Andrew Morton
                   ` (78 preceding siblings ...)
  2021-02-26  1:20 ` [patch 079/118] arm64: kasan: simplify and inline MTE functions Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 081/118] kasan: clarify that only first bug is reported in HW_TAGS Andrew Morton
                   ` (38 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan: inline HW_TAGS helper functions

Mark all static functions in common.c and kasan.h that are used for
hardware tag-based KASAN as inline to avoid unnecessary function calls.

Link: https://lkml.kernel.org/r/2c94a2af0657f2b95b9337232339ff5ffa643ab5.1612546384.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kasan/common.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

--- a/mm/kasan/common.c~kasan-inline-hw_tags-helper-functions
+++ a/mm/kasan/common.c
@@ -279,7 +279,8 @@ void __kasan_poison_object_data(struct k
  *    based on objects indexes, so that objects that are next to each other
  *    get different tags.
  */
-static u8 assign_tag(struct kmem_cache *cache, const void *object, bool init)
+static inline u8 assign_tag(struct kmem_cache *cache,
+					const void *object, bool init)
 {
 	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
 		return 0xff;
@@ -321,8 +322,8 @@ void * __must_check __kasan_init_slab_ob
 	return (void *)object;
 }
 
-static bool ____kasan_slab_free(struct kmem_cache *cache, void *object,
-			      unsigned long ip, bool quarantine)
+static inline bool ____kasan_slab_free(struct kmem_cache *cache,
+				void *object, unsigned long ip, bool quarantine)
 {
 	u8 tag;
 	void *tagged_object;
@@ -366,7 +367,7 @@ bool __kasan_slab_free(struct kmem_cache
 	return ____kasan_slab_free(cache, object, ip, true);
 }
 
-static bool ____kasan_kfree_large(void *ptr, unsigned long ip)
+static inline bool ____kasan_kfree_large(void *ptr, unsigned long ip)
 {
 	if (ptr != page_address(virt_to_head_page(ptr))) {
 		kasan_report_invalid_free(ptr, ip);
@@ -461,8 +462,8 @@ void * __must_check __kasan_slab_alloc(s
 	return tagged_object;
 }
 
-static void *____kasan_kmalloc(struct kmem_cache *cache, const void *object,
-					size_t size, gfp_t flags)
+static inline void *____kasan_kmalloc(struct kmem_cache *cache,
+				const void *object, size_t size, gfp_t flags)
 {
 	unsigned long redzone_start;
 	unsigned long redzone_end;
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 081/118] kasan: clarify that only first bug is reported in HW_TAGS
  2021-02-26  1:14 incoming Andrew Morton
                   ` (79 preceding siblings ...)
  2021-02-26  1:20 ` [patch 080/118] kasan: inline HW_TAGS helper functions Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 082/118] alpha: remove CONFIG_EXPERIMENTAL from defconfigs Andrew Morton
                   ` (37 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan: clarify that only first bug is reported in HW_TAGS

Hardware tag-based KASAN only reports the first found bug. After that MTE
tag checking gets disabled. Clarify this in comments and documentation.

Link: https://lkml.kernel.org/r/00383ba88a47c3f8342d12263c24bdf95527b07d.1612546384.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/dev-tools/kasan.rst |    8 ++++++--
 mm/kasan/hw_tags.c                |    2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

--- a/Documentation/dev-tools/kasan.rst~kasan-clarify-that-only-first-bug-is-reported-in-hw_tags
+++ a/Documentation/dev-tools/kasan.rst
@@ -155,7 +155,7 @@ Boot parameters
 ~~~~~~~~~~~~~~~
 
 Hardware tag-based KASAN mode (see the section about various modes below) is
-intended for use in production as a security mitigation. Therefore it supports
+intended for use in production as a security mitigation. Therefore, it supports
 boot parameters that allow to disable KASAN competely or otherwise control
 particular KASAN features.
 
@@ -165,7 +165,8 @@ particular KASAN features.
   traces collection (default: ``on``).
 
 - ``kasan.fault=report`` or ``=panic`` controls whether to only print a KASAN
-  report or also panic the kernel (default: ``report``).
+  report or also panic the kernel (default: ``report``). Note, that tag
+  checking gets disabled after the first reported bug.
 
 For developers
 ~~~~~~~~~~~~~~
@@ -295,6 +296,9 @@ Note, that enabling CONFIG_KASAN_HW_TAGS
 enabled. Even when kasan.mode=off is provided, or when the hardware doesn't
 support MTE (but supports TBI).
 
+Hardware tag-based KASAN only reports the first found bug. After that MTE tag
+checking gets disabled.
+
 What memory accesses are sanitised by KASAN?
 --------------------------------------------
 
--- a/mm/kasan/hw_tags.c~kasan-clarify-that-only-first-bug-is-reported-in-hw_tags
+++ a/mm/kasan/hw_tags.c
@@ -48,7 +48,7 @@ EXPORT_SYMBOL(kasan_flag_enabled);
 /* Whether to collect alloc/free stack traces. */
 DEFINE_STATIC_KEY_FALSE(kasan_flag_stacktrace);
 
-/* Whether panic or disable tag checking on fault. */
+/* Whether to panic or print a report and disable tag checking on fault. */
 bool kasan_flag_panic __ro_after_init;
 
 /* kasan=off/on */
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 082/118] alpha: remove CONFIG_EXPERIMENTAL from defconfigs
  2021-02-26  1:14 incoming Andrew Morton
                   ` (80 preceding siblings ...)
  2021-02-26  1:20 ` [patch 081/118] kasan: clarify that only first bug is reported in HW_TAGS Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 083/118] proc/wchan: use printk format instead of lookup_symbol_name() Andrew Morton
                   ` (36 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: akpm, gregkh, ink, keescook, linux-mm, mattst88, mm-commits,
	rdunlap, rth, torvalds

From: Randy Dunlap <rdunlap@infradead.org>
Subject: alpha: remove CONFIG_EXPERIMENTAL from defconfigs

Since CONFIG_EXPERIMENTAL was removed in 2013, go ahead and drop it
from any defconfig files.

Link: https://lkml.kernel.org/r/20210115005956.29408-1-rdunlap@infradead.org
Fixes: 3d374d09f16f ("final removal of CONFIG_EXPERIMENTAL")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/alpha/configs/defconfig |    1 -
 1 file changed, 1 deletion(-)

--- a/arch/alpha/configs/defconfig~alpha-remove-config_experimental-from-defconfigs
+++ a/arch/alpha/configs/defconfig
@@ -1,4 +1,3 @@
-CONFIG_EXPERIMENTAL=y
 CONFIG_SYSVIPC=y
 CONFIG_POSIX_MQUEUE=y
 CONFIG_LOG_BUF_SHIFT=14
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 083/118] proc/wchan: use printk format instead of lookup_symbol_name()
  2021-02-26  1:14 incoming Andrew Morton
                   ` (81 preceding siblings ...)
  2021-02-26  1:20 ` [patch 082/118] alpha: remove CONFIG_EXPERIMENTAL from defconfigs Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 084/118] proc: use kvzalloc for our kernel buffer Andrew Morton
                   ` (35 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: adobriyan, akpm, deller, linux-mm, mm-commits, torvalds

From: Helge Deller <deller@gmx.de>
Subject: proc/wchan: use printk format instead of lookup_symbol_name()

To resolve the symbol function name for wchan, use the printk format
specifier %ps instead of manually looking up the symbol function name
via lookup_symbol_name().
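
For reference (not part of this patch): %ps prints just the symbol name for a
code address, matching what lookup_symbol_name() produced, while %pS also
appends the offset:

	printk("%ps\n", (void *)wchan);	/* e.g. schedule_timeout */
	printk("%pS\n", (void *)wchan);	/* e.g. schedule_timeout+0x8c/0x13c */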

Link: https://lkml.kernel.org/r/20201217165413.GA1959@ls3530.fritz.box
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/proc/base.c |   21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

--- a/fs/proc/base.c~proc-wchan-use-printk-format-instead-of-lookup_symbol_name
+++ a/fs/proc/base.c
@@ -67,7 +67,6 @@
 #include <linux/mm.h>
 #include <linux/swap.h>
 #include <linux/rcupdate.h>
-#include <linux/kallsyms.h>
 #include <linux/stacktrace.h>
 #include <linux/resource.h>
 #include <linux/module.h>
@@ -386,19 +385,17 @@ static int proc_pid_wchan(struct seq_fil
 			  struct pid *pid, struct task_struct *task)
 {
 	unsigned long wchan;
-	char symname[KSYM_NAME_LEN];
 
-	if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
-		goto print0;
+	if (ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
+		wchan = get_wchan(task);
+	else
+		wchan = 0;
+
+	if (wchan)
+		seq_printf(m, "%ps", (void *) wchan);
+	else
+		seq_putc(m, '0');
 
-	wchan = get_wchan(task);
-	if (wchan && !lookup_symbol_name(wchan, symname)) {
-		seq_puts(m, symname);
-		return 0;
-	}
-
-print0:
-	seq_putc(m, '0');
 	return 0;
 }
 #endif /* CONFIG_KALLSYMS */
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 084/118] proc: use kvzalloc for our kernel buffer
  2021-02-26  1:14 incoming Andrew Morton
                   ` (82 preceding siblings ...)
  2021-02-26  1:20 ` [patch 083/118] proc/wchan: use printk format instead of lookup_symbol_name() Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 085/118] sysctl.c: fix underflow value setting risk in vm_table Andrew Morton
                   ` (34 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: adobriyan, akpm, hch, josef, linux-mm, mm-commits, torvalds,
	vbabka, viro, willy

From: Josef Bacik <josef@toxicpanda.com>
Subject: proc: use kvzalloc for our kernel buffer

Since

  sysctl: pass kernel pointers to ->proc_handler

we have been pre-allocating a buffer to copy the data from the proc
handlers into, and then copying that to userspace.  The problem is this
just blindly kzalloc()'s the buffer size passed in from the read, which in
the case of our 'cat' binary was 64KiB.  Order-4 allocations are not
awesome, and since we can potentially allocate up to our maximum order,
use kvzalloc for these buffers.
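
For context, kvzalloc() tries kzalloc() first and falls back to vzalloc() when
the physically contiguous allocation would be too costly, so the buffer has to
be released with kvfree() rather than kfree() (general pattern, names as in
the hunk below):

	kbuf = kvzalloc(count + 1, GFP_KERNEL);	/* kmalloc first, vmalloc fallback */
	if (!kbuf)
		goto out;
	/* ... */
	kvfree(kbuf);	/* correct for either allocation path */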

[willy@infradead.org: changelog tweaks]
Link: https://lkml.kernel.org/r/6345270a2c1160b89dd5e6715461f388176899d1.1612972413.git.josef@toxicpanda.com
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
CC: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/proc/proc_sysctl.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/proc/proc_sysctl.c~proc-use-vmalloc-for-our-kernel-buffer
+++ a/fs/proc/proc_sysctl.c
@@ -571,7 +571,7 @@ static ssize_t proc_sys_call_handler(str
 	error = -ENOMEM;
 	if (count >= KMALLOC_MAX_SIZE)
 		goto out;
-	kbuf = kzalloc(count + 1, GFP_KERNEL);
+	kbuf = kvzalloc(count + 1, GFP_KERNEL);
 	if (!kbuf)
 		goto out;
 
@@ -600,7 +600,7 @@ static ssize_t proc_sys_call_handler(str
 
 	error = count;
 out_free_buf:
-	kfree(kbuf);
+	kvfree(kbuf);
 out:
 	sysctl_head_finish(head);
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 085/118] sysctl.c: fix underflow value setting risk in vm_table
  2021-02-26  1:14 incoming Andrew Morton
                   ` (83 preceding siblings ...)
  2021-02-26  1:20 ` [patch 084/118] proc: use kvzalloc for our kernel buffer Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:20 ` [patch 086/118] include/linux: remove repeated words Andrew Morton
                   ` (33 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: adobriyan, akpm, ebiederm, linf, linux-mm, mm-commits, torvalds

From: Lin Feng <linf@wangsu.com>
Subject: sysctl.c: fix underflow value setting risk in vm_table

Apart from subsystem-specific .proc_handler handlers, all ctl_tables with
extra1 and extra2 members set should use proc_dointvec_minmax instead of
proc_dointvec; otherwise the limits set in extra* never take effect, and
echoing underflow values (negative numbers) is likely to make the system
unstable.

For vfs_cache_pressure and zone_reclaim_mode in particular, -1 is not a
valid value, but we can currently set it.  The kernel may then crash.

# echo -1 > /proc/sys/vm/vfs_cache_pressure
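
The extra1/extra2 bounds are only consulted by the *_minmax handlers; plain
proc_dointvec ignores them, so SYSCTL_ZERO has no effect there.  A sketch of
the intended pattern (illustrative entry, not taken from this patch):

	{
		.procname	= "example_knob",
		.data		= &example_knob,
		.maxlen		= sizeof(example_knob),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= SYSCTL_ZERO,	/* enforces a minimum of 0 */
	},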

Link: https://lkml.kernel.org/r/20201223105535.2875-1-linf@wangsu.com
Signed-off-by: Lin Feng <linf@wangsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/sysctl.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/kernel/sysctl.c~sysctlc-fix-underflow-value-setting-risk-in-vm_table
+++ a/kernel/sysctl.c
@@ -2962,7 +2962,7 @@ static struct ctl_table vm_table[] = {
 		.data		= &block_dump,
 		.maxlen		= sizeof(block_dump),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 	},
 	{
@@ -2970,7 +2970,7 @@ static struct ctl_table vm_table[] = {
 		.data		= &sysctl_vfs_cache_pressure,
 		.maxlen		= sizeof(sysctl_vfs_cache_pressure),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 	},
 #if defined(HAVE_ARCH_PICK_MMAP_LAYOUT) || \
@@ -2980,7 +2980,7 @@ static struct ctl_table vm_table[] = {
 		.data		= &sysctl_legacy_va_layout,
 		.maxlen		= sizeof(sysctl_legacy_va_layout),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 	},
 #endif
@@ -2990,7 +2990,7 @@ static struct ctl_table vm_table[] = {
 		.data		= &node_reclaim_mode,
 		.maxlen		= sizeof(node_reclaim_mode),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 	},
 	{
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 086/118] include/linux: remove repeated words
  2021-02-26  1:14 incoming Andrew Morton
                   ` (84 preceding siblings ...)
  2021-02-26  1:20 ` [patch 085/118] sysctl.c: fix underflow value setting risk in vm_table Andrew Morton
@ 2021-02-26  1:20 ` Andrew Morton
  2021-02-26  1:21 ` [patch 087/118] treewide: Miguel has moved Andrew Morton
                   ` (32 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:20 UTC (permalink / raw)
  To: akpm, dmitry.torokhov, kwankhede, linux-mm, mm-commits, oleg,
	rdunlap, stefanr, torvalds

From: Randy Dunlap <rdunlap@infradead.org>
Subject: include/linux: remove repeated words

Drop the doubled word "for" in a comment. {firewire-cdev.h}
Drop the doubled word "in" in a comment. {input.h}
Drop the doubled word "a" in a comment. {mdev.h}
Drop the doubled word "the" in a comment. {ptrace.h}

Link: https://lkml.kernel.org/r/20210126232444.22861-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mdev.h               |    2 +-
 include/linux/ptrace.h             |    2 +-
 include/uapi/linux/firewire-cdev.h |    2 +-
 include/uapi/linux/input.h         |    2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

--- a/include/linux/mdev.h~include-linux-remove-repeated-words
+++ a/include/linux/mdev.h
@@ -42,7 +42,7 @@ struct device *mdev_get_iommu_device(str
  *			@mdev: mdev_device structure on of mediated device
  *			      that is being created
  *			Returns integer: success (0) or error (< 0)
- * @remove:		Called to free resources in parent device's driver for a
+ * @remove:		Called to free resources in parent device's driver for
  *			a mediated device. It is mandatory to provide 'remove'
  *			ops.
  *			@mdev: mdev_device device structure which is being
--- a/include/linux/ptrace.h~include-linux-remove-repeated-words
+++ a/include/linux/ptrace.h
@@ -171,7 +171,7 @@ static inline void ptrace_event(int even
  *
  * Check whether @event is enabled and, if so, report @event and @pid
  * to the ptrace parent.  @pid is reported as the pid_t seen from the
- * the ptrace parent's pid namespace.
+ * ptrace parent's pid namespace.
  *
  * Called without locks.
  */
--- a/include/uapi/linux/firewire-cdev.h~include-linux-remove-repeated-words
+++ a/include/uapi/linux/firewire-cdev.h
@@ -844,7 +844,7 @@ struct fw_cdev_queue_iso {
  * struct fw_cdev_start_iso - Start an isochronous transmission or reception
  * @cycle:	Cycle in which to start I/O.  If @cycle is greater than or
  *		equal to 0, the I/O will start on that cycle.
- * @sync:	Determines the value to wait for for receive packets that have
+ * @sync:	Determines the value to wait for receive packets that have
  *		the %FW_CDEV_ISO_SYNC bit set
  * @tags:	Tag filter bit mask.  Only valid for isochronous reception.
  *		Determines the tag values for which packets will be accepted.
--- a/include/uapi/linux/input.h~include-linux-remove-repeated-words
+++ a/include/uapi/linux/input.h
@@ -84,7 +84,7 @@ struct input_id {
  * in units per radian.
  * When INPUT_PROP_ACCELEROMETER is set the resolution changes.
  * The main axes (ABS_X, ABS_Y, ABS_Z) are then reported in
- * in units per g (units/g) and in units per degree per second
+ * units per g (units/g) and in units per degree per second
  * (units/deg/s) for rotational axes (ABS_RX, ABS_RY, ABS_RZ).
  */
 struct input_absinfo {
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 087/118] treewide: Miguel has moved
  2021-02-26  1:14 incoming Andrew Morton
                   ` (85 preceding siblings ...)
  2021-02-26  1:20 ` [patch 086/118] include/linux: remove repeated words Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 088/118] groups: use flexible-array member in struct group_info Andrew Morton
                   ` (31 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, ojeda, torvalds

From: Miguel Ojeda <ojeda@kernel.org>
Subject: treewide: Miguel has moved

Update contact info.

Link: https://lkml.kernel.org/r/20210206162524.GA11520@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 .mailmap                                            |    1 
 CREDITS                                             |    9 ++-----
 Documentation/admin-guide/auxdisplay/cfag12864b.rst |    2 -
 Documentation/admin-guide/auxdisplay/ks0108.rst     |    2 -
 MAINTAINERS                                         |   12 +++++-----
 drivers/auxdisplay/cfag12864b.c                     |    4 +--
 drivers/auxdisplay/cfag12864bfb.c                   |    4 +--
 drivers/auxdisplay/ks0108.c                         |    4 +--
 include/linux/cfag12864b.h                          |    2 -
 include/linux/ks0108.h                              |    2 -
 samples/auxdisplay/cfag12864b-example.c             |    2 -
 11 files changed, 21 insertions(+), 23 deletions(-)

--- a/CREDITS~treewide-update-my-contact-info
+++ a/CREDITS
@@ -2841,14 +2841,11 @@ S: Subiaco, 6008
 S: Perth, Western Australia
 S: Australia
 
-N: Miguel Ojeda Sandonis
-E: miguel.ojeda.sandonis@gmail.com
-W: http://miguelojeda.es
-W: http://jair.lab.fi.uva.es/~migojed/
+N: Miguel Ojeda
+E: ojeda@kernel.org
+W: https://ojeda.dev
 D: Author of the ks0108, cfag12864b and cfag12864bfb auxiliary display drivers.
 D: Maintainer of the auxiliary display drivers tree (drivers/auxdisplay/*)
-S: C/ Mieses 20, 9-B
-S: Valladolid 47009
 S: Spain
 
 N: Peter Oruba
--- a/Documentation/admin-guide/auxdisplay/cfag12864b.rst~treewide-update-my-contact-info
+++ a/Documentation/admin-guide/auxdisplay/cfag12864b.rst
@@ -3,7 +3,7 @@ cfag12864b LCD Driver Documentation
 ===================================
 
 :License:		GPLv2
-:Author & Maintainer:	Miguel Ojeda Sandonis
+:Author & Maintainer:	Miguel Ojeda <ojeda@kernel.org>
 :Date:			2006-10-27
 
 
--- a/Documentation/admin-guide/auxdisplay/ks0108.rst~treewide-update-my-contact-info
+++ a/Documentation/admin-guide/auxdisplay/ks0108.rst
@@ -3,7 +3,7 @@ ks0108 LCD Controller Driver Documentati
 ==========================================
 
 :License:		GPLv2
-:Author & Maintainer:	Miguel Ojeda Sandonis
+:Author & Maintainer:	Miguel Ojeda <ojeda@kernel.org>
 :Date:			2006-10-27
 
 
--- a/drivers/auxdisplay/cfag12864b.c~treewide-update-my-contact-info
+++ a/drivers/auxdisplay/cfag12864b.c
@@ -5,7 +5,7 @@
  * Description: cfag12864b LCD driver
  *     Depends: ks0108
  *
- *      Author: Copyright (C) Miguel Ojeda Sandonis
+ *      Author: Copyright (C) Miguel Ojeda <ojeda@kernel.org>
  *        Date: 2006-10-31
  */
 
@@ -376,5 +376,5 @@ module_init(cfag12864b_init);
 module_exit(cfag12864b_exit);
 
 MODULE_LICENSE("GPL v2");
-MODULE_AUTHOR("Miguel Ojeda Sandonis <miguel.ojeda.sandonis@gmail.com>");
+MODULE_AUTHOR("Miguel Ojeda <ojeda@kernel.org>");
 MODULE_DESCRIPTION("cfag12864b LCD driver");
--- a/drivers/auxdisplay/cfag12864bfb.c~treewide-update-my-contact-info
+++ a/drivers/auxdisplay/cfag12864bfb.c
@@ -5,7 +5,7 @@
  * Description: cfag12864b LCD framebuffer driver
  *     Depends: cfag12864b
  *
- *      Author: Copyright (C) Miguel Ojeda Sandonis
+ *      Author: Copyright (C) Miguel Ojeda <ojeda@kernel.org>
  *        Date: 2006-10-31
  */
 
@@ -171,5 +171,5 @@ module_init(cfag12864bfb_init);
 module_exit(cfag12864bfb_exit);
 
 MODULE_LICENSE("GPL v2");
-MODULE_AUTHOR("Miguel Ojeda Sandonis <miguel.ojeda.sandonis@gmail.com>");
+MODULE_AUTHOR("Miguel Ojeda <ojeda@kernel.org>");
 MODULE_DESCRIPTION("cfag12864b LCD framebuffer driver");
--- a/drivers/auxdisplay/ks0108.c~treewide-update-my-contact-info
+++ a/drivers/auxdisplay/ks0108.c
@@ -5,7 +5,7 @@
  * Description: ks0108 LCD Controller driver
  *     Depends: parport
  *
- *      Author: Copyright (C) Miguel Ojeda Sandonis
+ *      Author: Copyright (C) Miguel Ojeda <ojeda@kernel.org>
  *        Date: 2006-10-31
  */
 
@@ -182,6 +182,6 @@ module_init(ks0108_init);
 module_exit(ks0108_exit);
 
 MODULE_LICENSE("GPL v2");
-MODULE_AUTHOR("Miguel Ojeda Sandonis <miguel.ojeda.sandonis@gmail.com>");
+MODULE_AUTHOR("Miguel Ojeda <ojeda@kernel.org>");
 MODULE_DESCRIPTION("ks0108 LCD Controller driver");
 
--- a/include/linux/cfag12864b.h~treewide-update-my-contact-info
+++ a/include/linux/cfag12864b.h
@@ -4,7 +4,7 @@
  *     Version: 0.1.0
  * Description: cfag12864b LCD driver header
  *
- *      Author: Copyright (C) Miguel Ojeda Sandonis
+ *      Author: Copyright (C) Miguel Ojeda <ojeda@kernel.org>
  *        Date: 2006-10-12
  */
 
--- a/include/linux/ks0108.h~treewide-update-my-contact-info
+++ a/include/linux/ks0108.h
@@ -4,7 +4,7 @@
  *     Version: 0.1.0
  * Description: ks0108 LCD Controller driver header
  *
- *      Author: Copyright (C) Miguel Ojeda Sandonis
+ *      Author: Copyright (C) Miguel Ojeda <ojeda@kernel.org>
  *        Date: 2006-10-31
  */
 
--- a/.mailmap~treewide-update-my-contact-info
+++ a/.mailmap
@@ -237,6 +237,7 @@ Maxime Ripard <mripard@kernel.org> <maxi
 Mayuresh Janorkar <mayur@ti.com>
 Michael Buesch <m@bues.ch>
 Michel Dänzer <michel@tungstengraphics.com>
+Miguel Ojeda <ojeda@kernel.org> <miguel.ojeda.sandonis@gmail.com>
 Mike Rapoport <rppt@kernel.org> <mike@compulab.co.il>
 Mike Rapoport <rppt@kernel.org> <mike.rapoport@gmail.com>
 Mike Rapoport <rppt@kernel.org> <rppt@linux.ibm.com>
--- a/MAINTAINERS~treewide-update-my-contact-info
+++ a/MAINTAINERS
@@ -2982,7 +2982,7 @@ F:	include/uapi/linux/audit.h
 F:	kernel/audit*
 
 AUXILIARY DISPLAY DRIVERS
-M:	Miguel Ojeda Sandonis <miguel.ojeda.sandonis@gmail.com>
+M:	Miguel Ojeda <ojeda@kernel.org>
 S:	Maintained
 F:	drivers/auxdisplay/
 F:	include/linux/cfag12864b.h
@@ -4128,13 +4128,13 @@ F:	scripts/extract-cert.c
 F:	scripts/sign-file.c
 
 CFAG12864B LCD DRIVER
-M:	Miguel Ojeda Sandonis <miguel.ojeda.sandonis@gmail.com>
+M:	Miguel Ojeda <ojeda@kernel.org>
 S:	Maintained
 F:	drivers/auxdisplay/cfag12864b.c
 F:	include/linux/cfag12864b.h
 
 CFAG12864BFB LCD FRAMEBUFFER DRIVER
-M:	Miguel Ojeda Sandonis <miguel.ojeda.sandonis@gmail.com>
+M:	Miguel Ojeda <ojeda@kernel.org>
 S:	Maintained
 F:	drivers/auxdisplay/cfag12864bfb.c
 F:	include/linux/cfag12864b.h
@@ -4304,7 +4304,7 @@ S:	Supported
 F:	drivers/infiniband/hw/usnic/
 
 CLANG-FORMAT FILE
-M:	Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
+M:	Miguel Ojeda <ojeda@kernel.org>
 S:	Maintained
 F:	.clang-format
 
@@ -4444,7 +4444,7 @@ S:	Maintained
 F:	drivers/platform/x86/compal-laptop.c
 
 COMPILER ATTRIBUTES
-M:	Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
+M:	Miguel Ojeda <ojeda@kernel.org>
 S:	Maintained
 F:	include/linux/compiler_attributes.h
 
@@ -9939,7 +9939,7 @@ F:	include/linux/kprobes.h
 F:	kernel/kprobes.c
 
 KS0108 LCD CONTROLLER DRIVER
-M:	Miguel Ojeda Sandonis <miguel.ojeda.sandonis@gmail.com>
+M:	Miguel Ojeda <ojeda@kernel.org>
 S:	Maintained
 F:	Documentation/admin-guide/auxdisplay/ks0108.rst
 F:	drivers/auxdisplay/ks0108.c
--- a/samples/auxdisplay/cfag12864b-example.c~treewide-update-my-contact-info
+++ a/samples/auxdisplay/cfag12864b-example.c
@@ -4,7 +4,7 @@
  *     Version: 0.1.0
  * Description: cfag12864b LCD userspace example program
  *
- *      Author: Copyright (C) Miguel Ojeda Sandonis
+ *      Author: Copyright (C) Miguel Ojeda <ojeda@kernel.org>
  *        Date: 2006-10-31
  */
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 088/118] groups: use flexible-array member in struct group_info
  2021-02-26  1:14 incoming Andrew Morton
                   ` (86 preceding siblings ...)
  2021-02-26  1:21 ` [patch 087/118] treewide: Miguel has moved Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 089/118] groups: simplify struct group_info allocation Andrew Morton
                   ` (30 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, hubert.jasudowicz, linux-mm, mikelley, mm-commits, mortonm,
	peterz, thomascedeno, torvalds, xiang

From: Hubert Jasudowicz <hubert.jasudowicz@gmail.com>
Subject: groups: use flexible-array member in struct group_info

Replace the zero-length array with a flexible-array member, as
recommended by the docs.
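
As a minimal illustrative sketch (not code from this patch), a
flexible-array member is declared without a size and the allocation
still accounts for the trailing elements explicitly:

	struct group_info {
		atomic_t	usage;
		int		ngroups;
		kgid_t		gid[];		/* flexible array member */
	};

	/* hypothetical caller: room for 'n' gids follows the struct */
	gi = kmalloc(struct_size(gi, gid, n), GFP_KERNEL);

Unlike the old gid[0] form, the compiler knows the array must be the
last member of the struct, so misuse (e.g. taking sizeof of the member)
can be diagnosed.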

Link: https://lkml.kernel.org/r/155995eed35c3c1bdcc56e69d8997c8e4c46740a.1611620846.git.hubert.jasudowicz@gmail.com
Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@gmail.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Micah Morton <mortonm@chromium.org>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Thomas Cedeno <thomascedeno@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/cred.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/cred.h~groups-use-flexible-array-member-in-struct-group_info
+++ a/include/linux/cred.h
@@ -25,7 +25,7 @@ struct inode;
 struct group_info {
 	atomic_t	usage;
 	int		ngroups;
-	kgid_t		gid[0];
+	kgid_t		gid[];
 } __randomize_layout;
 
 /**
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 089/118] groups: simplify struct group_info allocation
  2021-02-26  1:14 incoming Andrew Morton
                   ` (87 preceding siblings ...)
  2021-02-26  1:21 ` [patch 088/118] groups: use flexible-array member in struct group_info Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 090/118] kernel: delete repeated words in comments Andrew Morton
                   ` (29 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, hubert.jasudowicz, linux-mm, mikelley, mm-commits, mortonm,
	peterz, thomascedeno, torvalds, xiang

From: Hubert Jasudowicz <hubert.jasudowicz@gmail.com>
Subject: groups: simplify struct group_info allocation

Combine the separate kmalloc and vmalloc calls into a single kvmalloc()
call.  Use the struct_size() macro instead of calculating the size
directly.
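
As a rough sketch of the effect (not the exact resulting code),
struct_size() computes the same size as the old open-coded expression
but with overflow checking, and kvmalloc() folds the
kmalloc-then-vmalloc fallback into one call:

	/* old: sizeof(struct group_info) + sizeof(kgid_t) * gidsetsize */
	size_t len = struct_size(gi, gid, gidsetsize);

	/* tries kmalloc first, falls back to vmalloc if that fails */
	gi = kvmalloc(len, GFP_KERNEL_ACCOUNT);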

Link: https://lkml.kernel.org/r/ba9ba5beea9a44b7196c41a0d9528abd5f20dd2e.1611620846.git.hubert.jasudowicz@gmail.com
Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@gmail.com>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Micah Morton <mortonm@chromium.org>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Thomas Cedeno <thomascedeno@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/groups.c |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

--- a/kernel/groups.c~groups-simplify-struct-group_info-allocation
+++ a/kernel/groups.c
@@ -15,12 +15,7 @@
 struct group_info *groups_alloc(int gidsetsize)
 {
 	struct group_info *gi;
-	unsigned int len;
-
-	len = sizeof(struct group_info) + sizeof(kgid_t) * gidsetsize;
-	gi = kmalloc(len, GFP_KERNEL_ACCOUNT|__GFP_NOWARN|__GFP_NORETRY);
-	if (!gi)
-		gi = __vmalloc(len, GFP_KERNEL_ACCOUNT);
+	gi = kvmalloc(struct_size(gi, gid, gidsetsize), GFP_KERNEL_ACCOUNT);
 	if (!gi)
 		return NULL;
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 090/118] kernel: delete repeated words in comments
  2021-02-26  1:14 incoming Andrew Morton
                   ` (88 preceding siblings ...)
  2021-02-26  1:21 ` [patch 089/118] groups: simplify struct group_info allocation Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 091/118] MAINTAINERS: add uapi directories to API/ABI section Andrew Morton
                   ` (28 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: acme, akpm, juri.lelli, linux-mm, mathieu.desnoyers, mingo,
	mm-commits, paulmck, peterz, rdunlap, torvalds, vincent.guittot,
	will

From: Randy Dunlap <rdunlap@infradead.org>
Subject: kernel: delete repeated words in comments

Drop repeated words in kernel/events/.
{if, the, that, with, time}

Drop repeated words in kernel/locking/.
{it, no, the}

Drop repeated words in kernel/sched/.
{in, not}

Link: https://lkml.kernel.org/r/20210127023412.26292-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Will Deacon <will@kernel.org>	[kernel/locking/]
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/events/core.c       |    8 ++++----
 kernel/events/uprobes.c    |    2 +-
 kernel/locking/rtmutex.c   |    4 ++--
 kernel/locking/rwsem.c     |    2 +-
 kernel/locking/semaphore.c |    2 +-
 kernel/sched/fair.c        |    2 +-
 kernel/sched/membarrier.c  |    2 +-
 7 files changed, 11 insertions(+), 11 deletions(-)

--- a/kernel/events/core.c~kernel-delete-repeated-words-in-comments
+++ a/kernel/events/core.c
@@ -269,7 +269,7 @@ static void event_function_call(struct p
 	if (!event->parent) {
 		/*
 		 * If this is a !child event, we must hold ctx::mutex to
-		 * stabilize the the event->ctx relation. See
+		 * stabilize the event->ctx relation. See
 		 * perf_event_ctx_lock().
 		 */
 		lockdep_assert_held(&ctx->mutex);
@@ -1303,7 +1303,7 @@ static void put_ctx(struct perf_event_co
  * life-time rules separate them. That is an exiting task cannot fork, and a
  * spawning task cannot (yet) exit.
  *
- * But remember that that these are parent<->child context relations, and
+ * But remember that these are parent<->child context relations, and
  * migration does not affect children, therefore these two orderings should not
  * interact.
  *
@@ -1442,7 +1442,7 @@ static u64 primary_event_id(struct perf_
 /*
  * Get the perf_event_context for a task and lock it.
  *
- * This has to cope with with the fact that until it is locked,
+ * This has to cope with the fact that until it is locked,
  * the context could get moved to another task.
  */
 static struct perf_event_context *
@@ -2486,7 +2486,7 @@ static void perf_set_shadow_time(struct
 	 * But this is a bit hairy.
 	 *
 	 * So instead, we have an explicit cgroup call to remain
-	 * within the time time source all along. We believe it
+	 * within the time source all along. We believe it
 	 * is cleaner and simpler to understand.
 	 */
 	if (is_cgroup_event(event))
--- a/kernel/events/uprobes.c~kernel-delete-repeated-words-in-comments
+++ a/kernel/events/uprobes.c
@@ -1733,7 +1733,7 @@ void uprobe_free_utask(struct task_struc
 }
 
 /*
- * Allocate a uprobe_task object for the task if if necessary.
+ * Allocate a uprobe_task object for the task if necessary.
  * Called when the thread hits a breakpoint.
  *
  * Returns:
--- a/kernel/locking/rtmutex.c~kernel-delete-repeated-words-in-comments
+++ a/kernel/locking/rtmutex.c
@@ -1420,7 +1420,7 @@ rt_mutex_fasttrylock(struct rt_mutex *lo
 }
 
 /*
- * Performs the wakeup of the the top-waiter and re-enables preemption.
+ * Performs the wakeup of the top-waiter and re-enables preemption.
  */
 void rt_mutex_postunlock(struct wake_q_head *wake_q)
 {
@@ -1819,7 +1819,7 @@ struct task_struct *rt_mutex_next_owner(
  *			been started.
  * @waiter:		the pre-initialized rt_mutex_waiter
  *
- * Wait for the the lock acquisition started on our behalf by
+ * Wait for the lock acquisition started on our behalf by
  * rt_mutex_start_proxy_lock(). Upon failure, the caller must call
  * rt_mutex_cleanup_proxy_lock().
  *
--- a/kernel/locking/rwsem.c~kernel-delete-repeated-words-in-comments
+++ a/kernel/locking/rwsem.c
@@ -1048,7 +1048,7 @@ rwsem_down_write_slowpath(struct rw_sema
 
 		/*
 		 * If there were already threads queued before us and:
-		 *  1) there are no no active locks, wake the front
+		 *  1) there are no active locks, wake the front
 		 *     queued process(es) as the handoff bit might be set.
 		 *  2) there are no active writers and some readers, the lock
 		 *     must be read owned; so we try to wake any read lock
--- a/kernel/locking/semaphore.c~kernel-delete-repeated-words-in-comments
+++ a/kernel/locking/semaphore.c
@@ -119,7 +119,7 @@ EXPORT_SYMBOL(down_killable);
  * @sem: the semaphore to be acquired
  *
  * Try to acquire the semaphore atomically.  Returns 0 if the semaphore has
- * been acquired successfully or 1 if it it cannot be acquired.
+ * been acquired successfully or 1 if it cannot be acquired.
  *
  * NOTE: This return value is inverted from both spin_trylock and
  * mutex_trylock!  Be careful about this when converting code.
--- a/kernel/sched/fair.c~kernel-delete-repeated-words-in-comments
+++ a/kernel/sched/fair.c
@@ -5126,7 +5126,7 @@ static void do_sched_cfs_slack_timer(str
 /*
  * When a group wakes up we want to make sure that its quota is not already
  * expired/exceeded, otherwise it may be allowed to steal additional ticks of
- * runtime as update_curr() throttling can not not trigger until it's on-rq.
+ * runtime as update_curr() throttling can not trigger until it's on-rq.
  */
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
 {
--- a/kernel/sched/membarrier.c~kernel-delete-repeated-words-in-comments
+++ a/kernel/sched/membarrier.c
@@ -454,7 +454,7 @@ static int sync_runqueues_membarrier_sta
 
 	/*
 	 * For each cpu runqueue, if the task's mm match @mm, ensure that all
-	 * @mm's membarrier state set bits are also set in in the runqueue's
+	 * @mm's membarrier state set bits are also set in the runqueue's
 	 * membarrier state. This ensures that a runqueue scheduling
 	 * between threads which are users of @mm has its membarrier state
 	 * updated.
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 091/118] MAINTAINERS: add uapi directories to API/ABI section
  2021-02-26  1:14 incoming Andrew Morton
                   ` (89 preceding siblings ...)
  2021-02-26  1:21 ` [patch 090/118] kernel: delete repeated words in comments Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-27 19:18   ` Michael Kerrisk (man-pages)
  2021-02-26  1:21 ` [patch 092/118] lib/genalloc.c: change return type to unsigned long for bitmap_set_ll Andrew Morton
                   ` (27 subsequent siblings)
  118 siblings, 1 reply; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, david, linux-mm, mm-commits, mtk.manpages, torvalds, vbabka

From: Vlastimil Babka <vbabka@suse.cz>
Subject: MAINTAINERS: add uapi directories to API/ABI section

Let's add include/uapi/ and arch/*/include/uapi/ to the API/ABI section,
so that for patches modifying them, get_maintainer.pl suggests CCing
linux-api@ and people don't forget.

Link: https://lkml.kernel.org/r/20210217174745.13591-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: David Hildenbrand <david@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 MAINTAINERS |    2 ++
 1 file changed, 2 insertions(+)

--- a/MAINTAINERS~maintainers-add-uapi-directories-to-api-abi-section
+++ a/MAINTAINERS
@@ -261,6 +261,8 @@ ABI/API
 L:	linux-api@vger.kernel.org
 F:	include/linux/syscalls.h
 F:	kernel/sys_ni.c
+F:	include/uapi/
+F:	arch/*/include/uapi/
 
 ABIT UGURU 1,2 HARDWARE MONITOR DRIVER
 M:	Hans de Goede <hdegoede@redhat.com>
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 092/118] lib/genalloc.c: change return type to unsigned long for bitmap_set_ll
  2021-02-26  1:14 incoming Andrew Morton
                   ` (90 preceding siblings ...)
  2021-02-26  1:21 ` [patch 091/118] MAINTAINERS: add uapi directories to API/ABI section Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 093/118] string.h: move fortified functions definitions in a dedicated header Andrew Morton
                   ` (26 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, sjhuang, torvalds

From: Huang Shijie <sjhuang@iluvatar.ai>
Subject: lib/genalloc.c: change return type to unsigned long for bitmap_set_ll

Just as with bitmap_clear_ll(), change the return type of
bitmap_set_ll() to unsigned long to avoid possible overflow in the
future.

Link: https://lkml.kernel.org/r/20210105031644.2771-1-sjhuang@iluvatar.ai
Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/genalloc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/lib/genalloc.c~lib-genalloc-change-return-type-to-unsigned-long-for-bitmap_set_ll
+++ a/lib/genalloc.c
@@ -81,7 +81,8 @@ static int clear_bits_ll(unsigned long *
  * users set the same bit, one user will return remain bits, otherwise
  * return 0.
  */
-static int bitmap_set_ll(unsigned long *map, unsigned long start, unsigned long nr)
+static unsigned long
+bitmap_set_ll(unsigned long *map, unsigned long start, unsigned long nr)
 {
 	unsigned long *p = map + BIT_WORD(start);
 	const unsigned long size = start + nr;
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 093/118] string.h: move fortified functions definitions in a dedicated header.
  2021-02-26  1:14 incoming Andrew Morton
                   ` (91 preceding siblings ...)
  2021-02-26  1:21 ` [patch 092/118] lib/genalloc.c: change return type to unsigned long for bitmap_set_ll Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 094/118] lib: stackdepot: add support to configure STACK_HASH_SIZE Andrew Morton
                   ` (25 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, laniel_francis, linux-mm, mm-commits, torvalds

From: Francis Laniel <laniel_francis@privacyrequired.com>
Subject: string.h: move fortified functions definitions in a dedicated header.

This patch adds fortify-string.h to contain the fortified function
definitions.  Thus, the code is better separated, and compile time is
approximately 1% faster for people who do not set CONFIG_FORTIFY_SOURCE.
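
For context, a minimal sketch of what the fortified wrappers catch (the
names below are placeholders, not part of this move): with
CONFIG_FORTIFY_SOURCE=y, a size known at compile time that exceeds the
destination object triggers a build error, while non-constant sizes are
checked at run time via fortify_panic():

	char buf[8];

	memcpy(buf, src, 16);	/* constant size: __write_overflow() at build time */

	len = some_runtime_len();
	memcpy(buf, src, len);	/* checked at run time; panics if len > 8 */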

Link: https://lkml.kernel.org/r/20210111092141.22946-1-laniel_francis@privacyrequired.com
Link: https://lkml.kernel.org/r/20210111092141.22946-2-laniel_francis@privacyrequired.com
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/fortify-string.h |  302 +++++++++++++++++++++++++++++++
 include/linux/string.h         |  282 ----------------------------
 2 files changed, 303 insertions(+), 281 deletions(-)

--- /dev/null
+++ a/include/linux/fortify-string.h
@@ -0,0 +1,302 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_FORTIFY_STRING_H_
+#define _LINUX_FORTIFY_STRING_H_
+
+
+#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
+extern void *__underlying_memchr(const void *p, int c, __kernel_size_t size) __RENAME(memchr);
+extern int __underlying_memcmp(const void *p, const void *q, __kernel_size_t size) __RENAME(memcmp);
+extern void *__underlying_memcpy(void *p, const void *q, __kernel_size_t size) __RENAME(memcpy);
+extern void *__underlying_memmove(void *p, const void *q, __kernel_size_t size) __RENAME(memmove);
+extern void *__underlying_memset(void *p, int c, __kernel_size_t size) __RENAME(memset);
+extern char *__underlying_strcat(char *p, const char *q) __RENAME(strcat);
+extern char *__underlying_strcpy(char *p, const char *q) __RENAME(strcpy);
+extern __kernel_size_t __underlying_strlen(const char *p) __RENAME(strlen);
+extern char *__underlying_strncat(char *p, const char *q, __kernel_size_t count) __RENAME(strncat);
+extern char *__underlying_strncpy(char *p, const char *q, __kernel_size_t size) __RENAME(strncpy);
+#else
+#define __underlying_memchr	__builtin_memchr
+#define __underlying_memcmp	__builtin_memcmp
+#define __underlying_memcpy	__builtin_memcpy
+#define __underlying_memmove	__builtin_memmove
+#define __underlying_memset	__builtin_memset
+#define __underlying_strcat	__builtin_strcat
+#define __underlying_strcpy	__builtin_strcpy
+#define __underlying_strlen	__builtin_strlen
+#define __underlying_strncat	__builtin_strncat
+#define __underlying_strncpy	__builtin_strncpy
+#endif
+
+__FORTIFY_INLINE char *strncpy(char *p, const char *q, __kernel_size_t size)
+{
+	size_t p_size = __builtin_object_size(p, 1);
+
+	if (__builtin_constant_p(size) && p_size < size)
+		__write_overflow();
+	if (p_size < size)
+		fortify_panic(__func__);
+	return __underlying_strncpy(p, q, size);
+}
+
+__FORTIFY_INLINE char *strcat(char *p, const char *q)
+{
+	size_t p_size = __builtin_object_size(p, 1);
+
+	if (p_size == (size_t)-1)
+		return __underlying_strcat(p, q);
+	if (strlcat(p, q, p_size) >= p_size)
+		fortify_panic(__func__);
+	return p;
+}
+
+__FORTIFY_INLINE __kernel_size_t strlen(const char *p)
+{
+	__kernel_size_t ret;
+	size_t p_size = __builtin_object_size(p, 1);
+
+	/* Work around gcc excess stack consumption issue */
+	if (p_size == (size_t)-1 ||
+		(__builtin_constant_p(p[p_size - 1]) && p[p_size - 1] == '\0'))
+		return __underlying_strlen(p);
+	ret = strnlen(p, p_size);
+	if (p_size <= ret)
+		fortify_panic(__func__);
+	return ret;
+}
+
+extern __kernel_size_t __real_strnlen(const char *, __kernel_size_t) __RENAME(strnlen);
+__FORTIFY_INLINE __kernel_size_t strnlen(const char *p, __kernel_size_t maxlen)
+{
+	size_t p_size = __builtin_object_size(p, 1);
+	__kernel_size_t ret = __real_strnlen(p, maxlen < p_size ? maxlen : p_size);
+
+	if (p_size <= ret && maxlen != ret)
+		fortify_panic(__func__);
+	return ret;
+}
+
+/* defined after fortified strlen to reuse it */
+extern size_t __real_strlcpy(char *, const char *, size_t) __RENAME(strlcpy);
+__FORTIFY_INLINE size_t strlcpy(char *p, const char *q, size_t size)
+{
+	size_t ret;
+	size_t p_size = __builtin_object_size(p, 1);
+	size_t q_size = __builtin_object_size(q, 1);
+
+	if (p_size == (size_t)-1 && q_size == (size_t)-1)
+		return __real_strlcpy(p, q, size);
+	ret = strlen(q);
+	if (size) {
+		size_t len = (ret >= size) ? size - 1 : ret;
+
+		if (__builtin_constant_p(len) && len >= p_size)
+			__write_overflow();
+		if (len >= p_size)
+			fortify_panic(__func__);
+		__underlying_memcpy(p, q, len);
+		p[len] = '\0';
+	}
+	return ret;
+}
+
+/* defined after fortified strnlen to reuse it */
+extern ssize_t __real_strscpy(char *, const char *, size_t) __RENAME(strscpy);
+__FORTIFY_INLINE ssize_t strscpy(char *p, const char *q, size_t size)
+{
+	size_t len;
+	/* Use string size rather than possible enclosing struct size. */
+	size_t p_size = __builtin_object_size(p, 1);
+	size_t q_size = __builtin_object_size(q, 1);
+
+	/* If we cannot get size of p and q default to call strscpy. */
+	if (p_size == (size_t) -1 && q_size == (size_t) -1)
+		return __real_strscpy(p, q, size);
+
+	/*
+	 * If size can be known at compile time and is greater than
+	 * p_size, generate a compile time write overflow error.
+	 */
+	if (__builtin_constant_p(size) && size > p_size)
+		__write_overflow();
+
+	/*
+	 * This call protects from read overflow, because len will default to q
+	 * length if it smaller than size.
+	 */
+	len = strnlen(q, size);
+	/*
+	 * If len equals size, we will copy only size bytes which leads to
+	 * -E2BIG being returned.
+	 * Otherwise we will copy len + 1 because of the final '\O'.
+	 */
+	len = len == size ? size : len + 1;
+
+	/*
+	 * Generate a runtime write overflow error if len is greater than
+	 * p_size.
+	 */
+	if (len > p_size)
+		fortify_panic(__func__);
+
+	/*
+	 * We can now safely call vanilla strscpy because we are protected from:
+	 * 1. Read overflow thanks to call to strnlen().
+	 * 2. Write overflow thanks to above ifs.
+	 */
+	return __real_strscpy(p, q, len);
+}
+
+/* defined after fortified strlen and strnlen to reuse them */
+__FORTIFY_INLINE char *strncat(char *p, const char *q, __kernel_size_t count)
+{
+	size_t p_len, copy_len;
+	size_t p_size = __builtin_object_size(p, 1);
+	size_t q_size = __builtin_object_size(q, 1);
+
+	if (p_size == (size_t)-1 && q_size == (size_t)-1)
+		return __underlying_strncat(p, q, count);
+	p_len = strlen(p);
+	copy_len = strnlen(q, count);
+	if (p_size < p_len + copy_len + 1)
+		fortify_panic(__func__);
+	__underlying_memcpy(p + p_len, q, copy_len);
+	p[p_len + copy_len] = '\0';
+	return p;
+}
+
+__FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
+{
+	size_t p_size = __builtin_object_size(p, 0);
+
+	if (__builtin_constant_p(size) && p_size < size)
+		__write_overflow();
+	if (p_size < size)
+		fortify_panic(__func__);
+	return __underlying_memset(p, c, size);
+}
+
+__FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
+{
+	size_t p_size = __builtin_object_size(p, 0);
+	size_t q_size = __builtin_object_size(q, 0);
+
+	if (__builtin_constant_p(size)) {
+		if (p_size < size)
+			__write_overflow();
+		if (q_size < size)
+			__read_overflow2();
+	}
+	if (p_size < size || q_size < size)
+		fortify_panic(__func__);
+	return __underlying_memcpy(p, q, size);
+}
+
+__FORTIFY_INLINE void *memmove(void *p, const void *q, __kernel_size_t size)
+{
+	size_t p_size = __builtin_object_size(p, 0);
+	size_t q_size = __builtin_object_size(q, 0);
+
+	if (__builtin_constant_p(size)) {
+		if (p_size < size)
+			__write_overflow();
+		if (q_size < size)
+			__read_overflow2();
+	}
+	if (p_size < size || q_size < size)
+		fortify_panic(__func__);
+	return __underlying_memmove(p, q, size);
+}
+
+extern void *__real_memscan(void *, int, __kernel_size_t) __RENAME(memscan);
+__FORTIFY_INLINE void *memscan(void *p, int c, __kernel_size_t size)
+{
+	size_t p_size = __builtin_object_size(p, 0);
+
+	if (__builtin_constant_p(size) && p_size < size)
+		__read_overflow();
+	if (p_size < size)
+		fortify_panic(__func__);
+	return __real_memscan(p, c, size);
+}
+
+__FORTIFY_INLINE int memcmp(const void *p, const void *q, __kernel_size_t size)
+{
+	size_t p_size = __builtin_object_size(p, 0);
+	size_t q_size = __builtin_object_size(q, 0);
+
+	if (__builtin_constant_p(size)) {
+		if (p_size < size)
+			__read_overflow();
+		if (q_size < size)
+			__read_overflow2();
+	}
+	if (p_size < size || q_size < size)
+		fortify_panic(__func__);
+	return __underlying_memcmp(p, q, size);
+}
+
+__FORTIFY_INLINE void *memchr(const void *p, int c, __kernel_size_t size)
+{
+	size_t p_size = __builtin_object_size(p, 0);
+
+	if (__builtin_constant_p(size) && p_size < size)
+		__read_overflow();
+	if (p_size < size)
+		fortify_panic(__func__);
+	return __underlying_memchr(p, c, size);
+}
+
+void *__real_memchr_inv(const void *s, int c, size_t n) __RENAME(memchr_inv);
+__FORTIFY_INLINE void *memchr_inv(const void *p, int c, size_t size)
+{
+	size_t p_size = __builtin_object_size(p, 0);
+
+	if (__builtin_constant_p(size) && p_size < size)
+		__read_overflow();
+	if (p_size < size)
+		fortify_panic(__func__);
+	return __real_memchr_inv(p, c, size);
+}
+
+extern void *__real_kmemdup(const void *src, size_t len, gfp_t gfp) __RENAME(kmemdup);
+__FORTIFY_INLINE void *kmemdup(const void *p, size_t size, gfp_t gfp)
+{
+	size_t p_size = __builtin_object_size(p, 0);
+
+	if (__builtin_constant_p(size) && p_size < size)
+		__read_overflow();
+	if (p_size < size)
+		fortify_panic(__func__);
+	return __real_kmemdup(p, size, gfp);
+}
+
+/* defined after fortified strlen and memcpy to reuse them */
+__FORTIFY_INLINE char *strcpy(char *p, const char *q)
+{
+	size_t p_size = __builtin_object_size(p, 1);
+	size_t q_size = __builtin_object_size(q, 1);
+	size_t size;
+
+	if (p_size == (size_t)-1 && q_size == (size_t)-1)
+		return __underlying_strcpy(p, q);
+	size = strlen(q) + 1;
+	/* test here to use the more stringent object size */
+	if (p_size < size)
+		fortify_panic(__func__);
+	memcpy(p, q, size);
+	return p;
+}
+
+/* Don't use these outside the FORITFY_SOURCE implementation */
+#undef __underlying_memchr
+#undef __underlying_memcmp
+#undef __underlying_memcpy
+#undef __underlying_memmove
+#undef __underlying_memset
+#undef __underlying_strcat
+#undef __underlying_strcpy
+#undef __underlying_strlen
+#undef __underlying_strncat
+#undef __underlying_strncpy
+
+#endif /* _LINUX_FORTIFY_STRING_H_ */
--- a/include/linux/string.h~stringh-move-fortified-functions-definitions-in-a-dedicated-header
+++ a/include/linux/string.h
@@ -266,287 +266,7 @@ void __read_overflow3(void) __compiletim
 void __write_overflow(void) __compiletime_error("detected write beyond size of object passed as 1st parameter");
 
 #if !defined(__NO_FORTIFY) && defined(__OPTIMIZE__) && defined(CONFIG_FORTIFY_SOURCE)
-
-#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
-extern void *__underlying_memchr(const void *p, int c, __kernel_size_t size) __RENAME(memchr);
-extern int __underlying_memcmp(const void *p, const void *q, __kernel_size_t size) __RENAME(memcmp);
-extern void *__underlying_memcpy(void *p, const void *q, __kernel_size_t size) __RENAME(memcpy);
-extern void *__underlying_memmove(void *p, const void *q, __kernel_size_t size) __RENAME(memmove);
-extern void *__underlying_memset(void *p, int c, __kernel_size_t size) __RENAME(memset);
-extern char *__underlying_strcat(char *p, const char *q) __RENAME(strcat);
-extern char *__underlying_strcpy(char *p, const char *q) __RENAME(strcpy);
-extern __kernel_size_t __underlying_strlen(const char *p) __RENAME(strlen);
-extern char *__underlying_strncat(char *p, const char *q, __kernel_size_t count) __RENAME(strncat);
-extern char *__underlying_strncpy(char *p, const char *q, __kernel_size_t size) __RENAME(strncpy);
-#else
-#define __underlying_memchr	__builtin_memchr
-#define __underlying_memcmp	__builtin_memcmp
-#define __underlying_memcpy	__builtin_memcpy
-#define __underlying_memmove	__builtin_memmove
-#define __underlying_memset	__builtin_memset
-#define __underlying_strcat	__builtin_strcat
-#define __underlying_strcpy	__builtin_strcpy
-#define __underlying_strlen	__builtin_strlen
-#define __underlying_strncat	__builtin_strncat
-#define __underlying_strncpy	__builtin_strncpy
-#endif
-
-__FORTIFY_INLINE char *strncpy(char *p, const char *q, __kernel_size_t size)
-{
-	size_t p_size = __builtin_object_size(p, 1);
-	if (__builtin_constant_p(size) && p_size < size)
-		__write_overflow();
-	if (p_size < size)
-		fortify_panic(__func__);
-	return __underlying_strncpy(p, q, size);
-}
-
-__FORTIFY_INLINE char *strcat(char *p, const char *q)
-{
-	size_t p_size = __builtin_object_size(p, 1);
-	if (p_size == (size_t)-1)
-		return __underlying_strcat(p, q);
-	if (strlcat(p, q, p_size) >= p_size)
-		fortify_panic(__func__);
-	return p;
-}
-
-__FORTIFY_INLINE __kernel_size_t strlen(const char *p)
-{
-	__kernel_size_t ret;
-	size_t p_size = __builtin_object_size(p, 1);
-
-	/* Work around gcc excess stack consumption issue */
-	if (p_size == (size_t)-1 ||
-	    (__builtin_constant_p(p[p_size - 1]) && p[p_size - 1] == '\0'))
-		return __underlying_strlen(p);
-	ret = strnlen(p, p_size);
-	if (p_size <= ret)
-		fortify_panic(__func__);
-	return ret;
-}
-
-extern __kernel_size_t __real_strnlen(const char *, __kernel_size_t) __RENAME(strnlen);
-__FORTIFY_INLINE __kernel_size_t strnlen(const char *p, __kernel_size_t maxlen)
-{
-	size_t p_size = __builtin_object_size(p, 1);
-	__kernel_size_t ret = __real_strnlen(p, maxlen < p_size ? maxlen : p_size);
-	if (p_size <= ret && maxlen != ret)
-		fortify_panic(__func__);
-	return ret;
-}
-
-/* defined after fortified strlen to reuse it */
-extern size_t __real_strlcpy(char *, const char *, size_t) __RENAME(strlcpy);
-__FORTIFY_INLINE size_t strlcpy(char *p, const char *q, size_t size)
-{
-	size_t ret;
-	size_t p_size = __builtin_object_size(p, 1);
-	size_t q_size = __builtin_object_size(q, 1);
-	if (p_size == (size_t)-1 && q_size == (size_t)-1)
-		return __real_strlcpy(p, q, size);
-	ret = strlen(q);
-	if (size) {
-		size_t len = (ret >= size) ? size - 1 : ret;
-		if (__builtin_constant_p(len) && len >= p_size)
-			__write_overflow();
-		if (len >= p_size)
-			fortify_panic(__func__);
-		__underlying_memcpy(p, q, len);
-		p[len] = '\0';
-	}
-	return ret;
-}
-
-/* defined after fortified strnlen to reuse it */
-extern ssize_t __real_strscpy(char *, const char *, size_t) __RENAME(strscpy);
-__FORTIFY_INLINE ssize_t strscpy(char *p, const char *q, size_t size)
-{
-	size_t len;
-	/* Use string size rather than possible enclosing struct size. */
-	size_t p_size = __builtin_object_size(p, 1);
-	size_t q_size = __builtin_object_size(q, 1);
-
-	/* If we cannot get size of p and q default to call strscpy. */
-	if (p_size == (size_t) -1 && q_size == (size_t) -1)
-		return __real_strscpy(p, q, size);
-
-	/*
-	 * If size can be known at compile time and is greater than
-	 * p_size, generate a compile time write overflow error.
-	 */
-	if (__builtin_constant_p(size) && size > p_size)
-		__write_overflow();
-
-	/*
-	 * This call protects from read overflow, because len will default to q
-	 * length if it smaller than size.
-	 */
-	len = strnlen(q, size);
-	/*
-	 * If len equals size, we will copy only size bytes which leads to
-	 * -E2BIG being returned.
-	 * Otherwise we will copy len + 1 because of the final '\O'.
-	 */
-	len = len == size ? size : len + 1;
-
-	/*
-	 * Generate a runtime write overflow error if len is greater than
-	 * p_size.
-	 */
-	if (len > p_size)
-		fortify_panic(__func__);
-
-	/*
-	 * We can now safely call vanilla strscpy because we are protected from:
-	 * 1. Read overflow thanks to call to strnlen().
-	 * 2. Write overflow thanks to above ifs.
-	 */
-	return __real_strscpy(p, q, len);
-}
-
-/* defined after fortified strlen and strnlen to reuse them */
-__FORTIFY_INLINE char *strncat(char *p, const char *q, __kernel_size_t count)
-{
-	size_t p_len, copy_len;
-	size_t p_size = __builtin_object_size(p, 1);
-	size_t q_size = __builtin_object_size(q, 1);
-	if (p_size == (size_t)-1 && q_size == (size_t)-1)
-		return __underlying_strncat(p, q, count);
-	p_len = strlen(p);
-	copy_len = strnlen(q, count);
-	if (p_size < p_len + copy_len + 1)
-		fortify_panic(__func__);
-	__underlying_memcpy(p + p_len, q, copy_len);
-	p[p_len + copy_len] = '\0';
-	return p;
-}
-
-__FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
-{
-	size_t p_size = __builtin_object_size(p, 0);
-	if (__builtin_constant_p(size) && p_size < size)
-		__write_overflow();
-	if (p_size < size)
-		fortify_panic(__func__);
-	return __underlying_memset(p, c, size);
-}
-
-__FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
-{
-	size_t p_size = __builtin_object_size(p, 0);
-	size_t q_size = __builtin_object_size(q, 0);
-	if (__builtin_constant_p(size)) {
-		if (p_size < size)
-			__write_overflow();
-		if (q_size < size)
-			__read_overflow2();
-	}
-	if (p_size < size || q_size < size)
-		fortify_panic(__func__);
-	return __underlying_memcpy(p, q, size);
-}
-
-__FORTIFY_INLINE void *memmove(void *p, const void *q, __kernel_size_t size)
-{
-	size_t p_size = __builtin_object_size(p, 0);
-	size_t q_size = __builtin_object_size(q, 0);
-	if (__builtin_constant_p(size)) {
-		if (p_size < size)
-			__write_overflow();
-		if (q_size < size)
-			__read_overflow2();
-	}
-	if (p_size < size || q_size < size)
-		fortify_panic(__func__);
-	return __underlying_memmove(p, q, size);
-}
-
-extern void *__real_memscan(void *, int, __kernel_size_t) __RENAME(memscan);
-__FORTIFY_INLINE void *memscan(void *p, int c, __kernel_size_t size)
-{
-	size_t p_size = __builtin_object_size(p, 0);
-	if (__builtin_constant_p(size) && p_size < size)
-		__read_overflow();
-	if (p_size < size)
-		fortify_panic(__func__);
-	return __real_memscan(p, c, size);
-}
-
-__FORTIFY_INLINE int memcmp(const void *p, const void *q, __kernel_size_t size)
-{
-	size_t p_size = __builtin_object_size(p, 0);
-	size_t q_size = __builtin_object_size(q, 0);
-	if (__builtin_constant_p(size)) {
-		if (p_size < size)
-			__read_overflow();
-		if (q_size < size)
-			__read_overflow2();
-	}
-	if (p_size < size || q_size < size)
-		fortify_panic(__func__);
-	return __underlying_memcmp(p, q, size);
-}
-
-__FORTIFY_INLINE void *memchr(const void *p, int c, __kernel_size_t size)
-{
-	size_t p_size = __builtin_object_size(p, 0);
-	if (__builtin_constant_p(size) && p_size < size)
-		__read_overflow();
-	if (p_size < size)
-		fortify_panic(__func__);
-	return __underlying_memchr(p, c, size);
-}
-
-void *__real_memchr_inv(const void *s, int c, size_t n) __RENAME(memchr_inv);
-__FORTIFY_INLINE void *memchr_inv(const void *p, int c, size_t size)
-{
-	size_t p_size = __builtin_object_size(p, 0);
-	if (__builtin_constant_p(size) && p_size < size)
-		__read_overflow();
-	if (p_size < size)
-		fortify_panic(__func__);
-	return __real_memchr_inv(p, c, size);
-}
-
-extern void *__real_kmemdup(const void *src, size_t len, gfp_t gfp) __RENAME(kmemdup);
-__FORTIFY_INLINE void *kmemdup(const void *p, size_t size, gfp_t gfp)
-{
-	size_t p_size = __builtin_object_size(p, 0);
-	if (__builtin_constant_p(size) && p_size < size)
-		__read_overflow();
-	if (p_size < size)
-		fortify_panic(__func__);
-	return __real_kmemdup(p, size, gfp);
-}
-
-/* defined after fortified strlen and memcpy to reuse them */
-__FORTIFY_INLINE char *strcpy(char *p, const char *q)
-{
-	size_t p_size = __builtin_object_size(p, 1);
-	size_t q_size = __builtin_object_size(q, 1);
-	size_t size;
-	if (p_size == (size_t)-1 && q_size == (size_t)-1)
-		return __underlying_strcpy(p, q);
-	size = strlen(q) + 1;
-	/* test here to use the more stringent object size */
-	if (p_size < size)
-		fortify_panic(__func__);
-	memcpy(p, q, size);
-	return p;
-}
-
-/* Don't use these outside the FORITFY_SOURCE implementation */
-#undef __underlying_memchr
-#undef __underlying_memcmp
-#undef __underlying_memcpy
-#undef __underlying_memmove
-#undef __underlying_memset
-#undef __underlying_strcat
-#undef __underlying_strcpy
-#undef __underlying_strlen
-#undef __underlying_strncat
-#undef __underlying_strncpy
+#include <linux/fortify-string.h>
 #endif
 
 /**
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 094/118] lib: stackdepot: add support to configure STACK_HASH_SIZE
  2021-02-26  1:14 incoming Andrew Morton
                   ` (92 preceding siblings ...)
  2021-02-26  1:21 ` [patch 093/118] string.h: move fortified functions definitions in a dedicated header Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 095/118] lib: stackdepot: add support to disable stack depot Andrew Morton
                   ` (24 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, glider, linux-mm, minchan, mm-commits, torvalds, vinmenon,
	vjitta, ylal

From: Yogesh Lal <ylal@codeaurora.org>
Subject: lib: stackdepot: add support to configure STACK_HASH_SIZE

Use CONFIG_STACK_HASH_ORDER to configure STACK_HASH_SIZE.

The aim is to have a configurable value for STACK_HASH_SIZE, so that it
can be tuned depending on the use case.

One example is page owner: CONFIG_PAGE_OWNER works only if page_owner=on
is passed on the kernel command line of a CONFIG_PAGE_OWNER configured
system.  Thus, unless the admin enables it via the command line option,
stackdepot will just waste 8M of memory without any consumer.

Making it configurable and using a lower value helps to enable features
like CONFIG_PAGE_OWNER without any significant overhead.
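
For example, the hash table is an array of stack_record pointers, so on
a 64-bit system the default order of 20 costs (1 << 20) * 8 bytes = 8 MiB,
while the minimum order of 12 costs (1 << 12) * 8 bytes = 32 KiB.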

Link: https://lkml.kernel.org/r/1611749198-24316-1-git-send-email-vjitta@codeaurora.org
Signed-off-by: Yogesh Lal <ylal@codeaurora.org>
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/Kconfig      |    9 +++++++++
 lib/stackdepot.c |    3 +--
 2 files changed, 10 insertions(+), 2 deletions(-)

--- a/lib/Kconfig~lib-stackdepot-add-support-to-configure-stack_hash_size
+++ a/lib/Kconfig
@@ -651,6 +651,15 @@ config STACKDEPOT
 	bool
 	select STACKTRACE
 
+config STACK_HASH_ORDER
+	int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
+	range 12 20
+	default 20
+	depends on STACKDEPOT
+	help
+	 Select the hash size as a power of 2 for the stackdepot hash table.
+	 Choose a lower value to reduce the memory impact.
+
 config SBITMAP
 	bool
 
--- a/lib/stackdepot.c~lib-stackdepot-add-support-to-configure-stack_hash_size
+++ a/lib/stackdepot.c
@@ -141,8 +141,7 @@ static struct stack_record *depot_alloc_
 	return stack;
 }
 
-#define STACK_HASH_ORDER 20
-#define STACK_HASH_SIZE (1L << STACK_HASH_ORDER)
+#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
 #define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
 #define STACK_HASH_SEED 0x9747b28c
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 095/118] lib: stackdepot: add support to disable stack depot
  2021-02-26  1:14 incoming Andrew Morton
                   ` (93 preceding siblings ...)
  2021-02-26  1:21 ` [patch 094/118] lib: stackdepot: add support to configure STACK_HASH_SIZE Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 096/118] lib: stackdepot: fix ignoring return value warning Andrew Morton
                   ` (23 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, glider, linux-mm, minchan, mm-commits, torvalds, vinmenon,
	vjitta, ylal

From: Vijayanand Jitta <vjitta@codeaurora.org>
Subject: lib: stackdepot: add support to disable stack depot

Add a kernel parameter stack_depot_disable to disable stack depot, so
that the stack hash table doesn't consume any memory when stack depot is
disabled.

The use case is CONFIG_PAGE_OWNER without page_owner=on.  Without this
patch, stackdepot will consume the memory for the hashtable.  By default,
it's 8M, which is never trivial.

With this option, on a CONFIG_PAGE_OWNER configured system with
page_owner=off and stack_depot_disable on the kernel command line, we
can save the memory otherwise wasted on the hashtable.
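
For example, booting a CONFIG_PAGE_OWNER=y kernel with

	stack_depot_disable=on

on the command line (and without page_owner=on) skips the hash table
allocation entirely.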

[akpm@linux-foundation.org: fix CONFIG_STACKDEPOT=n build]
Link: https://lkml.kernel.org/r/1611749198-24316-2-git-send-email-vjitta@codeaurora.org
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yogesh Lal <ylal@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/admin-guide/kernel-parameters.txt |    6 ++
 include/linux/stackdepot.h                      |    9 +++
 init/main.c                                     |    2 
 lib/stackdepot.c                                |   32 ++++++++++++--
 4 files changed, 45 insertions(+), 4 deletions(-)

--- a/Documentation/admin-guide/kernel-parameters.txt~lib-stackdepot-add-support-to-disable-stack-depot
+++ a/Documentation/admin-guide/kernel-parameters.txt
@@ -5182,6 +5182,12 @@
 			growing up) the main stack are reserved for no other
 			mapping. Default value is 256 pages.
 
+	stack_depot_disable= [KNL]
+			Setting this to true through kernel command line will
+			disable the stack depot thereby saving the static memory
+			consumed by the stack hash table. By default this is set
+			to false.
+
 	stacktrace	[FTRACE]
 			Enabled the stack tracer on boot up.
 
--- a/include/linux/stackdepot.h~lib-stackdepot-add-support-to-disable-stack-depot
+++ a/include/linux/stackdepot.h
@@ -21,4 +21,13 @@ unsigned int stack_depot_fetch(depot_sta
 
 unsigned int filter_irq_stacks(unsigned long *entries, unsigned int nr_entries);
 
+#ifdef CONFIG_STACKDEPOT
+int stack_depot_init(void);
+#else
+static inline int stack_depot_init(void)
+{
+	return 0;
+}
+#endif	/* CONFIG_STACKDEPOT */
+
 #endif
--- a/init/main.c~lib-stackdepot-add-support-to-disable-stack-depot
+++ a/init/main.c
@@ -97,6 +97,7 @@
 #include <linux/mem_encrypt.h>
 #include <linux/kcsan.h>
 #include <linux/init_syscalls.h>
+#include <linux/stackdepot.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -827,6 +828,7 @@ static void __init mm_init(void)
 	init_mem_debugging_and_hardening();
 	kfence_alloc_pool();
 	report_meminit();
+	stack_depot_init();
 	mem_init();
 	/* page_owner must be initialized after buddy is ready */
 	page_ext_init_flatmem_late();
--- a/lib/stackdepot.c~lib-stackdepot-add-support-to-disable-stack-depot
+++ a/lib/stackdepot.c
@@ -31,6 +31,7 @@
 #include <linux/stackdepot.h>
 #include <linux/string.h>
 #include <linux/types.h>
+#include <linux/memblock.h>
 
 #define DEPOT_STACK_BITS (sizeof(depot_stack_handle_t) * 8)
 
@@ -145,9 +146,32 @@ static struct stack_record *depot_alloc_
 #define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
 #define STACK_HASH_SEED 0x9747b28c
 
-static struct stack_record *stack_table[STACK_HASH_SIZE] = {
-	[0 ...	STACK_HASH_SIZE - 1] = NULL
-};
+static bool stack_depot_disable;
+static struct stack_record **stack_table;
+
+static int __init is_stack_depot_disabled(char *str)
+{
+	kstrtobool(str, &stack_depot_disable);
+	if (stack_depot_disable) {
+		pr_info("Stack Depot is disabled\n");
+		stack_table = NULL;
+	}
+	return 0;
+}
+early_param("stack_depot_disable", is_stack_depot_disabled);
+
+int __init stack_depot_init(void)
+{
+	if (!stack_depot_disable) {
+		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
+		int i;
+
+		stack_table = memblock_alloc(size, size);
+		for (i = 0; i < STACK_HASH_SIZE;  i++)
+			stack_table[i] = NULL;
+	}
+	return 0;
+}
 
 /* Calculate hash for a stack */
 static inline u32 hash_stack(unsigned long *entries, unsigned int size)
@@ -241,7 +265,7 @@ depot_stack_handle_t stack_depot_save(un
 	unsigned long flags;
 	u32 hash;
 
-	if (unlikely(nr_entries == 0))
+	if (unlikely(nr_entries == 0) || stack_depot_disable)
 		goto fast_exit;
 
 	hash = hash_stack(entries, nr_entries);
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 096/118] lib: stackdepot: fix ignoring return value warning
  2021-02-26  1:14 incoming Andrew Morton
                   ` (94 preceding siblings ...)
  2021-02-26  1:21 ` [patch 095/118] lib: stackdepot: add support to disable stack depot Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 097/118] lib/cmdline: remove an unneeded local variable in next_arg() Andrew Morton
                   ` (22 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, torvalds, vjitta

From: Vijayanand Jitta <vjitta@codeaurora.org>
Subject: lib: stackdepot: fix ignoring return value warning

Fix the below 'ignoring return value' warning for kstrtobool() in the
is_stack_depot_disabled() function.

lib/stackdepot.c: In function 'is_stack_depot_disabled':
lib/stackdepot.c:154:2: warning: ignoring return value of 'kstrtobool'
declared with attribute 'warn_unused_result' [-Wunused-result]

Link: https://lkml.kernel.org/r/1612163048-28026-1-git-send-email-vjitta@codeaurora.org
Fixes: b9779abb09a8 ("lib: stackdepot: add support to disable stack depot")
Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/stackdepot.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/lib/stackdepot.c~lib-stackdepot-add-support-to-disable-stack-depot-fix-2
+++ a/lib/stackdepot.c
@@ -151,8 +151,10 @@ static struct stack_record **stack_table
 
 static int __init is_stack_depot_disabled(char *str)
 {
-	kstrtobool(str, &stack_depot_disable);
-	if (stack_depot_disable) {
+	int ret;
+
+	ret = kstrtobool(str, &stack_depot_disable);
+	if (!ret && stack_depot_disable) {
 		pr_info("Stack Depot is disabled\n");
 		stack_table = NULL;
 	}
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 097/118] lib/cmdline: remove an unneeded local variable in next_arg()
  2021-02-26  1:14 incoming Andrew Morton
                   ` (95 preceding siblings ...)
  2021-02-26  1:21 ` [patch 096/118] lib: stackdepot: fix ignoring return value warning Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 098/118] include/linux/bitops.h: spelling s/synomyn/synonym/ Andrew Morton
                   ` (21 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, linux-mm, masahiroy, mm-commits, torvalds

From: Masahiro Yamada <masahiroy@kernel.org>
Subject: lib/cmdline: remove an unneeded local variable in next_arg()

The local variable 'next' is unneeded because you can simply advance the
existing pointer 'args'.

Link: https://lkml.kernel.org/r/20210201014707.3828753-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/cmdline.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/lib/cmdline.c~lib-cmdline-remove-an-unneeded-local-variable-in-next_arg
+++ a/lib/cmdline.c
@@ -228,7 +228,6 @@ char *next_arg(char *args, char **param,
 {
 	unsigned int i, equals = 0;
 	int in_quote = 0, quoted = 0;
-	char *next;
 
 	if (*args == '"') {
 		args++;
@@ -266,10 +265,10 @@ char *next_arg(char *args, char **param,
 
 	if (args[i]) {
 		args[i] = '\0';
-		next = args + i + 1;
+		args += i + 1;
 	} else
-		next = args + i;
+		args += i;
 
 	/* Chew up trailing spaces. */
-	return skip_spaces(next);
+	return skip_spaces(args);
 }
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 098/118] include/linux/bitops.h: spelling s/synomyn/synonym/
  2021-02-26  1:14 incoming Andrew Morton
                   ` (96 preceding siblings ...)
  2021-02-26  1:21 ` [patch 097/118] lib/cmdline: remove an unneeded local variable in next_arg() Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 099/118] checkpatch: improve blank line after declaration test Andrew Morton
                   ` (20 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, geert+renesas, linux-mm, mm-commits, torvalds

From: Geert Uytterhoeven <geert+renesas@glider.be>
Subject: include/linux/bitops.h: spelling s/synomyn/synonym/

Fix a misspelling of "synonym".

Link: https://lkml.kernel.org/r/20210108105305.2028120-1-geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/bitops.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/bitops.h~bitops-spelling-s-synomyn-synonym
+++ a/include/linux/bitops.h
@@ -214,7 +214,7 @@ static inline int get_count_order_long(u
  * __ffs64 - find first set bit in a 64 bit word
  * @word: The 64 bit word
  *
- * On 64 bit arches this is a synomyn for __ffs
+ * On 64 bit arches this is a synonym for __ffs
  * The result is not defined if no bits are set, so check that @word
  * is non-zero before calling this.
  */
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 099/118] checkpatch: improve blank line after declaration test
  2021-02-26  1:14 incoming Andrew Morton
                   ` (97 preceding siblings ...)
  2021-02-26  1:21 ` [patch 098/118] include/linux/bitops.h: spelling s/synomyn/synonym/ Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 100/118] checkpatch: ignore warning designated initializers using NR_CPUS Andrew Morton
                   ` (19 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, joe, linux-mm, mm-commits, torvalds

From: Joe Perches <joe@perches.com>
Subject: checkpatch: improve blank line after declaration test

Avoid multiple false positives by ignoring attributes.

Various attributes like volatile and ____cacheline_aligned_in_smp cause
checkpatch to emit invalid "Missing a blank line after declarations"
messages.
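
For instance, something like the following (an illustrative example, not
taken from a real report) used to warn about a missing blank line after
declarations, because the attribute kept the second line from being
recognised as a declaration:

	int ret;
	volatile int hw_ready;

The same happened with attributes such as ____cacheline_aligned_in_smp.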

Use copies of $sline and $prevline, remove $Attribute and $Sparse uses
from the copies, and use the existing tests to avoid these false
positives.

Miscellanea:

o Add volatile to $Attribute

This also reduces checkpatch runtime a bit by moving the indentation
comparison test to the start of the block to avoid multiple unnecessary
regex tests.

Link: https://lkml.kernel.org/r/9015fd00742bf4e5b824ad6d7fd7189530958548.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |   52 ++++++++++++++++++++++------------------
 1 file changed, 29 insertions(+), 23 deletions(-)

--- a/scripts/checkpatch.pl~checkpatch-improve-blank-line-after-declaration-test
+++ a/scripts/checkpatch.pl
@@ -382,6 +382,7 @@ our $InitAttribute = qr{$InitAttributeDa
 # We need \b after 'init' otherwise 'initconst' will cause a false positive in a check
 our $Attribute	= qr{
 			const|
+			volatile|
 			__percpu|
 			__nocast|
 			__safe|
@@ -3776,43 +3777,48 @@ sub process {
 		}
 
 # check for missing blank lines after declarations
-		if ($sline =~ /^\+\s+\S/ &&			#Not at char 1
-			# actual declarations
-		    ($prevline =~ /^\+\s+$Declare\s*$Ident\s*[=,;:\[]/ ||
+# (declarations must have the same indentation and not be at the start of line)
+		if (($prevline =~ /\+(\s+)\S/) && $sline =~ /^\+$1\S/) {
+			# use temporaries
+			my $sl = $sline;
+			my $pl = $prevline;
+			# remove $Attribute/$Sparse uses to simplify comparisons
+			$sl =~ s/\b(?:$Attribute|$Sparse)\b//g;
+			$pl =~ s/\b(?:$Attribute|$Sparse)\b//g;
+			if (($pl =~ /^\+\s+$Declare\s*$Ident\s*[=,;:\[]/ ||
 			# function pointer declarations
-		     $prevline =~ /^\+\s+$Declare\s*\(\s*\*\s*$Ident\s*\)\s*[=,;:\[\(]/ ||
+			     $pl =~ /^\+\s+$Declare\s*\(\s*\*\s*$Ident\s*\)\s*[=,;:\[\(]/ ||
 			# foo bar; where foo is some local typedef or #define
-		     $prevline =~ /^\+\s+$Ident(?:\s+|\s*\*\s*)$Ident\s*[=,;\[]/ ||
+			     $pl =~ /^\+\s+$Ident(?:\s+|\s*\*\s*)$Ident\s*[=,;\[]/ ||
 			# known declaration macros
-		     $prevline =~ /^\+\s+$declaration_macros/) &&
+			     $pl =~ /^\+\s+$declaration_macros/) &&
 			# for "else if" which can look like "$Ident $Ident"
-		    !($prevline =~ /^\+\s+$c90_Keywords\b/ ||
+			    !($pl =~ /^\+\s+$c90_Keywords\b/ ||
 			# other possible extensions of declaration lines
-		      $prevline =~ /(?:$Compare|$Assignment|$Operators)\s*$/ ||
+			      $pl =~ /(?:$Compare|$Assignment|$Operators)\s*$/ ||
 			# not starting a section or a macro "\" extended line
-		      $prevline =~ /(?:\{\s*|\\)$/) &&
+			      $pl =~ /(?:\{\s*|\\)$/) &&
 			# looks like a declaration
-		    !($sline =~ /^\+\s+$Declare\s*$Ident\s*[=,;:\[]/ ||
+			    !($sl =~ /^\+\s+$Declare\s*$Ident\s*[=,;:\[]/ ||
 			# function pointer declarations
-		      $sline =~ /^\+\s+$Declare\s*\(\s*\*\s*$Ident\s*\)\s*[=,;:\[\(]/ ||
+			      $sl =~ /^\+\s+$Declare\s*\(\s*\*\s*$Ident\s*\)\s*[=,;:\[\(]/ ||
 			# foo bar; where foo is some local typedef or #define
-		      $sline =~ /^\+\s+$Ident(?:\s+|\s*\*\s*)$Ident\s*[=,;\[]/ ||
+			      $sl =~ /^\+\s+$Ident(?:\s+|\s*\*\s*)$Ident\s*[=,;\[]/ ||
 			# known declaration macros
-		      $sline =~ /^\+\s+$declaration_macros/ ||
+			      $sl =~ /^\+\s+$declaration_macros/ ||
 			# start of struct or union or enum
-		      $sline =~ /^\+\s+(?:static\s+)?(?:const\s+)?(?:union|struct|enum|typedef)\b/ ||
+			      $sl =~ /^\+\s+(?:static\s+)?(?:const\s+)?(?:union|struct|enum|typedef)\b/ ||
 			# start or end of block or continuation of declaration
-		      $sline =~ /^\+\s+(?:$|[\{\}\.\#\"\?\:\(\[])/ ||
+			      $sl =~ /^\+\s+(?:$|[\{\}\.\#\"\?\:\(\[])/ ||
 			# bitfield continuation
-		      $sline =~ /^\+\s+$Ident\s*:\s*\d+\s*[,;]/ ||
+			      $sl =~ /^\+\s+$Ident\s*:\s*\d+\s*[,;]/ ||
 			# other possible extensions of declaration lines
-		      $sline =~ /^\+\s+\(?\s*(?:$Compare|$Assignment|$Operators)/) &&
-			# indentation of previous and current line are the same
-		    (($prevline =~ /\+(\s+)\S/) && $sline =~ /^\+$1\S/)) {
-			if (WARN("LINE_SPACING",
-				 "Missing a blank line after declarations\n" . $hereprev) &&
-			    $fix) {
-				fix_insert_line($fixlinenr, "\+");
+			      $sl =~ /^\+\s+\(?\s*(?:$Compare|$Assignment|$Operators)/)) {
+				if (WARN("LINE_SPACING",
+					 "Missing a blank line after declarations\n" . $hereprev) &&
+				    $fix) {
+					fix_insert_line($fixlinenr, "\+");
+				}
 			}
 		}
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 100/118] checkpatch: ignore warning designated initializers using NR_CPUS
  2021-02-26  1:14 incoming Andrew Morton
                   ` (98 preceding siblings ...)
  2021-02-26  1:21 ` [patch 099/118] checkpatch: improve blank line after declaration test Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 101/118] checkpatch: trivial style fixes Andrew Morton
                   ` (18 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, joe, linux-mm, mm-commits, rocking, torvalds

From: Peng Wang <rocking@linux.alibaba.com>
Subject: checkpatch: ignore warning designated initializers using NR_CPUS

Some fields such as max_write_len want to reserve as much room as
possible to cope with the biggest possible NR_CPUS.  An example below:

kernel/cgroup/cpuset.c:
static struct cftype legacy_files[] = {
        {
                .name = "cpus",
                .seq_show = cpuset_common_seq_show,
                .write = cpuset_write_resmask,
                .max_write_len = (100U + 6 * NR_CPUS),
                .private = FILE_CPULIST,
        },
	...
}
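
By contrast, a plain expression use of NR_CPUS (hypothetical sketch) is
still flagged, since only #if/#define lines, array bounds and, now,
designated initializers are excluded:

        pr_info("built for %d CPUs\n", NR_CPUS); /* still warns; prefer num_possible_cpus() */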

Link: https://lkml.kernel.org/r/5d4998aa8a8ac7efada2c7daffa9e73559f8b186.1609331255.git.rocking@linux.alibaba.com
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/scripts/checkpatch.pl~checkpatch-ignore-warning-designated-initializers-using-nr_cpus
+++ a/scripts/checkpatch.pl
@@ -7025,12 +7025,14 @@ sub process {
 
 # use of NR_CPUS is usually wrong
 # ignore definitions of NR_CPUS and usage to define arrays as likely right
+# ignore designated initializers using NR_CPUS
 		if ($line =~ /\bNR_CPUS\b/ &&
 		    $line !~ /^.\s*\s*#\s*if\b.*\bNR_CPUS\b/ &&
 		    $line !~ /^.\s*\s*#\s*define\b.*\bNR_CPUS\b/ &&
 		    $line !~ /^.\s*$Declare\s.*\[[^\]]*NR_CPUS[^\]]*\]/ &&
 		    $line !~ /\[[^\]]*\.\.\.[^\]]*NR_CPUS[^\]]*\]/ &&
-		    $line !~ /\[[^\]]*NR_CPUS[^\]]*\.\.\.[^\]]*\]/)
+		    $line !~ /\[[^\]]*NR_CPUS[^\]]*\.\.\.[^\]]*\]/ &&
+		    $line !~ /^.\s*\.\w+\s*=\s*.*\bNR_CPUS\b/)
 		{
 			WARN("NR_CPUS",
 			     "usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc\n" . $herecurr);
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 101/118] checkpatch: trivial style fixes
  2021-02-26  1:14 incoming Andrew Morton
                   ` (99 preceding siblings ...)
  2021-02-26  1:21 ` [patch 100/118] checkpatch: ignore warning designated initializers using NR_CPUS Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 102/118] checkpatch: prefer ftrace over function entry/exit printks Andrew Morton
                   ` (17 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, dwaipayanray1, joe, linux-mm, mm-commits, torvalds

From: Dwaipayan Ray <dwaipayanray1@gmail.com>
Subject: checkpatch: trivial style fixes

Indentations should use tabs wherever possible.
Replace spaces by tabs for indents.

Link: https://lkml.kernel.org/r/20210105103044.40282-1-dwaipayanray1@gmail.com
Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/scripts/checkpatch.pl~checkpatch-trivial-style-fixes
+++ a/scripts/checkpatch.pl
@@ -2974,7 +2974,7 @@ sub process {
 				}
 				if (!defined $lines[$linenr]) {
 					WARN("BAD_SIGN_OFF",
-                                             "Co-developed-by: must be immediately followed by Signed-off-by:\n" . "$here\n" . $rawline);
+					     "Co-developed-by: must be immediately followed by Signed-off-by:\n" . "$here\n" . $rawline);
 				} elsif ($rawlines[$linenr] !~ /^\s*signed-off-by:\s*(.*)/i) {
 					WARN("BAD_SIGN_OFF",
 					     "Co-developed-by: must be immediately followed by Signed-off-by:\n" . "$here\n" . $rawline . "\n" .$rawlines[$linenr]);
@@ -2997,8 +2997,8 @@ sub process {
 			if (ERROR("GERRIT_CHANGE_ID",
 			          "Remove Gerrit Change-Id's before submitting upstream\n" . $herecurr) &&
 			    $fix) {
-                                fix_delete_line($fixlinenr, $rawline);
-                        }
+				fix_delete_line($fixlinenr, $rawline);
+			}
 		}
 
 # Check if the commit log is in a possible stack dump
@@ -3240,10 +3240,10 @@ sub process {
 				next if ($start_char =~ /^\S$/);
 				next if (index(" \t.,;?!", $end_char) == -1);
 
-                                # avoid repeating hex occurrences like 'ff ff fe 09 ...'
-                                if ($first =~ /\b[0-9a-f]{2,}\b/i) {
-                                        next if (!exists($allow_repeated_words{lc($first)}));
-                                }
+				# avoid repeating hex occurrences like 'ff ff fe 09 ...'
+				if ($first =~ /\b[0-9a-f]{2,}\b/i) {
+					next if (!exists($allow_repeated_words{lc($first)}));
+				}
 
 				if (WARN("REPEATED_WORD",
 					 "Possible repeated word: '$first'\n" . $herecurr) &&
@@ -4423,7 +4423,7 @@ sub process {
 			WARN("STATIC_CONST_CHAR_ARRAY",
 			     "char * array declaration might be better as static const\n" .
 				$herecurr);
-               }
+		}
 
 # check for sizeof(foo)/sizeof(foo[0]) that could be ARRAY_SIZE(foo)
 		if ($line =~ m@\bsizeof\s*\(\s*($Lval)\s*\)@) {
@@ -5276,7 +5276,7 @@ sub process {
 		    $lines[$linenr - 3] !~ /^[ +]\s*$Ident\s*:/) {
 			WARN("RETURN_VOID",
 			     "void function return statements are not generally useful\n" . $hereprev);
-               }
+		}
 
 # if statements using unnecessary parentheses - ie: if ((foo == bar))
 		if ($perl_version_ok &&
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 102/118] checkpatch: prefer ftrace over function entry/exit printks
  2021-02-26  1:14 incoming Andrew Morton
                   ` (100 preceding siblings ...)
  2021-02-26  1:21 ` [patch 101/118] checkpatch: trivial style fixes Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 103/118] checkpatch: improve TYPECAST_INT_CONSTANT test message Andrew Morton
                   ` (16 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, dan.carpenter, gregkh, joe, linux-mm, mm-commits, torvalds

From: Joe Perches <joe@perches.com>
Subject: checkpatch: prefer ftrace over function entry/exit printks

Prefer using ftrace over function entry/exit logging messages.

Warn on various function entry/exit-only logging messages that use
only __func__, with or without descriptive decoration.
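
A minimal sketch of the kind of logging the new TRACING_LOGGING check
flags (hypothetical example, not from the patch):

        static int foo_open(struct inode *inode, struct file *file)
        {
                pr_debug("%s: enter\n", __func__);      /* flagged: prefer ftrace */
                return 0;
        }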

Link: https://lkml.kernel.org/r/47c01081533a417c99c9a80a4cd537f8c308503f.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |   35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

--- a/scripts/checkpatch.pl~checkpatch-prefer-ftrace-over-function-entry-exit-printks
+++ a/scripts/checkpatch.pl
@@ -507,6 +507,30 @@ our $signature_tags = qr{(?xi:
 	Cc:
 )};
 
+our $tracing_logging_tags = qr{(?xi:
+	[=-]*> |
+	<[=-]* |
+	\[ |
+	\] |
+	start |
+	called |
+	entered |
+	entry |
+	enter |
+	in |
+	inside |
+	here |
+	begin |
+	exit |
+	end |
+	done |
+	leave |
+	completed |
+	out |
+	return |
+	[\.\!:\s]*
+)};
+
 sub edit_distance_min {
 	my (@arr) = @_;
 	my $len = scalar @arr;
@@ -5972,6 +5996,17 @@ sub process {
 			     "Prefer using '\"%s...\", __func__' to using '$context_function', this function's name, in a string\n" . $herecurr);
 		}
 
+# check for unnecessary function tracing like uses
+# This does not use $logFunctions because there are many instances like
+# 'dprintk(FOO, "%s()\n", __func__);' which do not match $logFunctions
+		if ($rawline =~ /^\+.*\([^"]*"$tracing_logging_tags{0,3}%s(?:\s*\(\s*\)\s*)?$tracing_logging_tags{0,3}(?:\\n)?"\s*,\s*__func__\s*\)\s*;/) {
+			if (WARN("TRACING_LOGGING",
+				 "Unnecessary ftrace-like logging - prefer using ftrace\n" . $herecurr) &&
+			    $fix) {
+                                fix_delete_line($fixlinenr, $rawline);
+			}
+		}
+
 # check for spaces before a quoted newline
 		if ($rawline =~ /^.*\".*\s\\n/) {
 			if (WARN("QUOTED_WHITESPACE_BEFORE_NEWLINE",
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 103/118] checkpatch: improve TYPECAST_INT_CONSTANT test message
  2021-02-26  1:14 incoming Andrew Morton
                   ` (101 preceding siblings ...)
  2021-02-26  1:21 ` [patch 102/118] checkpatch: prefer ftrace over function entry/exit printks Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:21 ` [patch 104/118] checkpatch: add warning for avoiding .L prefix symbols in assembly files Andrew Morton
                   ` (15 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, dgilbert, joe, linux-mm, mm-commits, torvalds

From: Joe Perches <joe@perches.com>
Subject: checkpatch: improve TYPECAST_INT_CONSTANT test message

Improve the TYPECAST_INT_CONSTANT test by showing the suggested conversion
for various types of uses, e.g. (unsigned int)1 to 1U.
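
A hypothetical before/after pair of the kind the improved message now
spells out:

        /* before */    #define FOO_MASK ((unsigned long)1 << 12)
        /* after  */    #define FOO_MASK (1UL << 12)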

Link: https://lkml.kernel.org/r/ecefe8dcb93fe7028311b69dd297ba52224233d4.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

--- a/scripts/checkpatch.pl~checkpatch-improve-typecast_int_constant-test-message
+++ a/scripts/checkpatch.pl
@@ -6518,18 +6518,18 @@ sub process {
 		if ($line =~ /(\(\s*$C90_int_types\s*\)\s*)($Constant)\b/) {
 			my $cast = $1;
 			my $const = $2;
+			my $suffix = "";
+			my $newconst = $const;
+			$newconst =~ s/${Int_type}$//;
+			$suffix .= 'U' if ($cast =~ /\bunsigned\b/);
+			if ($cast =~ /\blong\s+long\b/) {
+			    $suffix .= 'LL';
+			} elsif ($cast =~ /\blong\b/) {
+			    $suffix .= 'L';
+			}
 			if (WARN("TYPECAST_INT_CONSTANT",
-				 "Unnecessary typecast of c90 int constant\n" . $herecurr) &&
+				 "Unnecessary typecast of c90 int constant - '$cast$const' could be '$const$suffix'\n" . $herecurr) &&
 			    $fix) {
-				my $suffix = "";
-				my $newconst = $const;
-				$newconst =~ s/${Int_type}$//;
-				$suffix .= 'U' if ($cast =~ /\bunsigned\b/);
-				if ($cast =~ /\blong\s+long\b/) {
-					$suffix .= 'LL';
-				} elsif ($cast =~ /\blong\b/) {
-					$suffix .= 'L';
-				}
 				$fixed[$fixlinenr] =~ s/\Q$cast\E$const\b/$newconst$suffix/;
 			}
 		}
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 104/118] checkpatch: add warning for avoiding .L prefix symbols in assembly files
  2021-02-26  1:14 incoming Andrew Morton
                   ` (102 preceding siblings ...)
  2021-02-26  1:21 ` [patch 103/118] checkpatch: improve TYPECAST_INT_CONSTANT test message Andrew Morton
@ 2021-02-26  1:21 ` Andrew Morton
  2021-02-26  1:22 ` [patch 105/118] checkpatch: add kmalloc_array_node to unnecessary OOM message check Andrew Morton
                   ` (14 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:21 UTC (permalink / raw)
  To: akpm, broonie, dwaipayanray1, joe, jpoimboe, linux-mm,
	lukas.bulwahn, mm-commits, ndesaulniers, torvalds, yashsri421

From: Aditya Srivastava <yashsri421@gmail.com>
Subject: checkpatch: add warning for avoiding .L prefix symbols in assembly files

objtool requires that all code must be contained in an ELF symbol.  Symbol
names that have a '.L' prefix do not emit symbol table entries, as they
have special meaning for the assembler.

'.L' prefixed symbols can be used within a code region, but should be
avoided for denoting a range of code via 'SYM_*_START/END' annotations.

Add a new check to emit a warning when a '.L' prefixed symbol is used in a
'.S' file to denote a range of code via a 'SYM_*_START/END' annotation
pair.

Link: https://lkml.kernel.org/r/20210123190459.9701-1-yashsri421@gmail.com
Link: https://lore.kernel.org/lkml/20210112210154.GI4646@sirena.org.uk
Signed-off-by: Aditya Srivastava <yashsri421@gmail.com>
Suggested-by: Mark Brown <broonie@kernel.org>
Acked-by: Joe Perches <joe@perches.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Aditya Srivastava <yashsri421@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/scripts/checkpatch.pl~checkpatch-add-warning-for-avoiding-l-prefix-symbols-in-assembly-files
+++ a/scripts/checkpatch.pl
@@ -3599,6 +3599,13 @@ sub process {
 			}
 		}
 
+# check for .L prefix local symbols in .S files
+		if ($realfile =~ /\.S$/ &&
+		    $line =~ /^\+\s*(?:[A-Z]+_)?SYM_[A-Z]+_(?:START|END)(?:_[A-Z_]+)?\s*\(\s*\.L/) {
+			WARN("AVOID_L_PREFIX",
+			     "Avoid using '.L' prefixed local symbol names for denoting a range of code via 'SYM_*_START/END' annotations; see Documentation/asm-annotations.rst\n" . $herecurr);
+		}
+
 # check we are in a valid source file C or perl if not then ignore this hunk
 		next if ($realfile !~ /\.(h|c|pl|dtsi|dts)$/);
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 105/118] checkpatch: add kmalloc_array_node to unnecessary OOM message check
  2021-02-26  1:14 incoming Andrew Morton
                   ` (103 preceding siblings ...)
  2021-02-26  1:21 ` [patch 104/118] checkpatch: add warning for avoiding .L prefix symbols in assembly files Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26  1:22 ` [patch 106/118] checkpatch: don't warn about colon termination in linker scripts Andrew Morton
                   ` (13 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, joe, kuba, linux-mm, mm-commits, torvalds

From: Joe Perches <joe@perches.com>
Subject: checkpatch: add kmalloc_array_node to unnecessary OOM message check

commit 5799b255c491 ("include/linux/slab.h: add kmalloc_array_node() and
kcalloc_node()") was added in 2017.  Update the unnecessary OOM message
test to include it.
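
A hypothetical allocation the check now covers - the explicit message is
redundant because the allocator already logs a warning and a stack trace
on failure:

        p = kmalloc_array_node(n, sizeof(*p), GFP_KERNEL, node);
        if (!p) {
                pr_err("failed to allocate p\n");       /* now flagged as an unnecessary OOM message */
                return -ENOMEM;
        }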

Link: https://lkml.kernel.org/r/b9dc4a808b1518e08ab8761480d9872e5d18e7cd.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/scripts/checkpatch.pl~checkpatch-add-kmalloc_array_node-to-unnecessary-oom-message-check
+++ a/scripts/checkpatch.pl
@@ -487,7 +487,7 @@ our $logFunctions = qr{(?x:
 
 our $allocFunctions = qr{(?x:
 	(?:(?:devm_)?
-		(?:kv|k|v)[czm]alloc(?:_node|_array)? |
+		(?:kv|k|v)[czm]alloc(?:_array)?(?:_node)? |
 		kstrdup(?:_const)? |
 		kmemdup(?:_nul)?) |
 	(?:\w+)?alloc_skb(?:_ip_align)? |
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 106/118] checkpatch: don't warn about colon termination in linker scripts
  2021-02-26  1:14 incoming Andrew Morton
                   ` (104 preceding siblings ...)
  2021-02-26  1:22 ` [patch 105/118] checkpatch: add kmalloc_array_node to unnecessary OOM message check Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26  1:22 ` [patch 107/118] checkpatch: do not apply "initialise globals to 0" check to BPF progs Andrew Morton
                   ` (12 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, chris, joe, linux-mm, mm-commits, torvalds

From: Chris Down <chris@chrisdown.name>
Subject: checkpatch: don't warn about colon termination in linker scripts

This check erroneously flags cases like the one in my recent printk
enumeration patch[0], where the spaces are syntactic, and `section:' vs. 
`section :' is syntactically important:

    ERROR: space prohibited before that ':' (ctx:WxW)
    #258: FILE: include/asm-generic/vmlinux.lds.h:314:
    +       .printk_fmts : AT(ADDR(.printk_fmts) - LOAD_OFFSET) {

0: https://lore.kernel.org/patchwork/patch/1375749/

Link: https://lkml.kernel.org/r/YBwhqsc2TIVeid3t@chrisdown.name
Link: https://lkml.kernel.org/r/YB6UsjCOy1qrrlSD@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/scripts/checkpatch.pl~checkpatch-dont-warn-about-colon-termination-in-linker-scripts
+++ a/scripts/checkpatch.pl
@@ -5044,7 +5044,7 @@ sub process {
 				# A colon needs no spaces before when it is
 				# terminating a case value or a label.
 				} elsif ($opv eq ':C' || $opv eq ':L') {
-					if ($ctx =~ /Wx./) {
+					if ($ctx =~ /Wx./ and $realfile !~ m@.*\.lds\.h$@) {
 						if (ERROR("SPACING",
 							  "space prohibited before that '$op' $at\n" . $hereptr)) {
 							$good = rtrim($fix_elements[$n]) . trim($fix_elements[$n + 1]);
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 107/118] checkpatch: do not apply "initialise globals to 0" check to BPF progs
  2021-02-26  1:14 incoming Andrew Morton
                   ` (105 preceding siblings ...)
  2021-02-26  1:22 ` [patch 106/118] checkpatch: don't warn about colon termination in linker scripts Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26  1:22 ` [patch 108/118] init/version.c: remove Version_<LINUX_VERSION_CODE> symbol Andrew Morton
                   ` (11 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, joe, linux-mm, mm-commits, songliubraving, torvalds

From: Song Liu <songliubraving@fb.com>
Subject: checkpatch: do not apply "initialise globals to 0" check to BPF progs

BPF programs explicitly initialise global variables to 0 to make sure
clang (v10 or older) does not put the variables in the common section.
Skip the "initialise globals to 0" check for BPF programs to eliminate
error messages like:

    ERROR: do not initialise globals to 0
    #19: FILE: samples/bpf/tracex1_kern.c:21:
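
A hypothetical BPF program global of the kind affected (sketch):

        /* in samples/bpf/foo_kern.c - the explicit zero keeps older clang from
         * placing the variable in the common section; no longer reported */
        int dropped_packets = 0;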

Link: https://lkml.kernel.org/r/20210209211954.490077-1-songliubraving@fb.com
Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- a/scripts/checkpatch.pl~checkpatch-do-not-apply-initialise-globals-to-0-check-to-bpf-progs
+++ a/scripts/checkpatch.pl
@@ -2453,6 +2453,15 @@ sub get_raw_comment {
 	return $comment;
 }
 
+sub exclude_global_initialisers {
+	my ($realfile) = @_;
+
+	# Do not check for BPF programs (tools/testing/selftests/bpf/progs/*.c, samples/bpf/*_kern.c, *.bpf.c).
+	return $realfile =~ m@^tools/testing/selftests/bpf/progs/.*\.c$@ ||
+		$realfile =~ m@^samples/bpf/.*_kern\.c$@ ||
+		$realfile =~ m@/bpf/.*\.bpf\.c$@;
+}
+
 sub process {
 	my $filename = shift;
 
@@ -4358,7 +4367,8 @@ sub process {
 		}
 
 # check for global initialisers.
-		if ($line =~ /^\+$Type\s*$Ident(?:\s+$Modifier)*\s*=\s*($zero_initializer)\s*;/) {
+		if ($line =~ /^\+$Type\s*$Ident(?:\s+$Modifier)*\s*=\s*($zero_initializer)\s*;/ &&
+		    !exclude_global_initialisers($realfile)) {
 			if (ERROR("GLOBAL_INITIALISERS",
 				  "do not initialise globals to $1\n" . $herecurr) &&
 			    $fix) {
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 108/118] init/version.c: remove Version_<LINUX_VERSION_CODE> symbol
  2021-02-26  1:14 incoming Andrew Morton
                   ` (106 preceding siblings ...)
  2021-02-26  1:22 ` [patch 107/118] checkpatch: do not apply "initialise globals to 0" check to BPF progs Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26  1:22 ` [patch 109/118] init: clean up early_param_on_off() macro Andrew Morton
                   ` (10 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, guilak, linux-mm, masahiroy, mchehab, mm-commits, rdunlap,
	rlrevell, torvalds

From: Masahiro Yamada <masahiroy@kernel.org>
Subject: init/version.c: remove Version_<LINUX_VERSION_CODE> symbol

This code hunk creates a Version_<LINUX_VERSION_CODE> symbol if
CONFIG_KALLSYMS is disabled.  For example, building the kernel v5.10 for
allnoconfig creates the following symbol:

  $ nm vmlinux | grep Version_
  c116b028 B Version_330240

There is no in-tree user of this symbol.

Commit 197dcffc8ba0 ("init/version.c: define version_string only if
CONFIG_KALLSYMS is not defined") mentions that Version_* is only used
with ksymoops.

However, a commit in the pre-git era [1] had added the statement,
"ksymoops is useless on 2.6.  Please use the Oops in its original format".

That statement existed until commit 4eb9241127a0 ("Documentation:
admin-guide: update bug-hunting.rst") finally removed the stale
ksymoops information.

This symbol is no longer needed.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=ad68b2f085f5c79e4759ca2d13947b3c885ee831

Link: https://lkml.kernel.org/r/20210120033452.2895170-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Daniel Guilak <guilak@linux.vnet.ibm.com>
Cc: Lee Revell <rlrevell@joe-job.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 init/version.c |    8 --------
 1 file changed, 8 deletions(-)

--- a/init/version.c~init-versionc-remove-version_linux_version_code-symbol
+++ a/init/version.c
@@ -16,14 +16,6 @@
 #include <linux/version.h>
 #include <linux/proc_ns.h>
 
-#ifndef CONFIG_KALLSYMS
-#define version(a) Version_ ## a
-#define version_string(a) version(a)
-
-extern int version_string(LINUX_VERSION_CODE);
-int version_string(LINUX_VERSION_CODE);
-#endif
-
 struct uts_namespace init_uts_ns = {
 	.ns.count = REFCOUNT_INIT(2),
 	.name = {
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 109/118] init: clean up early_param_on_off() macro
  2021-02-26  1:14 incoming Andrew Morton
                   ` (107 preceding siblings ...)
  2021-02-26  1:22 ` [patch 108/118] init/version.c: remove Version_<LINUX_VERSION_CODE> symbol Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26  1:22 ` [patch 110/118] init/Kconfig: fix a typo in CC_VERSION_TEXT help text Andrew Morton
                   ` (9 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, joe, johan, linux-mm, masahiroy, mm-commits, ndesaulniers,
	ojeda, torvalds

From: Masahiro Yamada <masahiroy@kernel.org>
Subject: init: clean up early_param_on_off() macro

Use early_param() to define early_param_on_off().

Link: https://lkml.kernel.org/r/20210201041532.4025025-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Nick Desaulniers <ndesaulniers@gooogle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/init.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/init.h~init-clean-up-early_param_on_off-macro
+++ a/include/linux/init.h
@@ -338,14 +338,14 @@ struct obs_kernel_param {
 		var = 1;						\
 		return 0;						\
 	}								\
-	__setup_param(str_on, parse_##var##_on, parse_##var##_on, 1);	\
+	early_param(str_on, parse_##var##_on);				\
 									\
 	static int __init parse_##var##_off(char *arg)			\
 	{								\
 		var = 0;						\
 		return 0;						\
 	}								\
-	__setup_param(str_off, parse_##var##_off, parse_##var##_off, 1)
+	early_param(str_off, parse_##var##_off)
 
 /* Relies on boot_command_line being set */
 void __init parse_early_param(void);
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 110/118] init/Kconfig: fix a typo in CC_VERSION_TEXT help text
  2021-02-26  1:14 incoming Andrew Morton
                   ` (108 preceding siblings ...)
  2021-02-26  1:22 ` [patch 109/118] init: clean up early_param_on_off() macro Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26  1:22 ` [patch 111/118] fs/coredump: use kmap_local_page() Andrew Morton
                   ` (8 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, nathan, rdunlap, torvalds, unixbhaskar

From: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Subject: init/Kconfig: fix a typo in CC_VERSION_TEXT help text

s/compier/compiler/

Link: https://lkml.kernel.org/r/20210224223325.29099-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 init/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/init/Kconfig~init-kconfig-fix-a-typo-in-cc_version_text-help-text
+++ a/init/Kconfig
@@ -19,7 +19,7 @@ config CC_VERSION_TEXT
 	    CC_VERSION_TEXT so it is recorded in include/config/auto.conf.cmd.
 	    When the compiler is updated, Kconfig will be invoked.
 
-	  - Ensure full rebuild when the compier is updated
+	  - Ensure full rebuild when the compiler is updated
 	    include/linux/kconfig.h contains this option in the comment line so
 	    fixdep adds include/config/cc/version/text.h into the auto-generated
 	    dependency. When the compiler is updated, syncconfig will touch it
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 111/118] fs/coredump: use kmap_local_page()
  2021-02-26  1:14 incoming Andrew Morton
                   ` (109 preceding siblings ...)
  2021-02-26  1:22 ` [patch 110/118] init/Kconfig: fix a typo in CC_VERSION_TEXT help text Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26  1:22 ` [patch 112/118] seq_file: document how per-entry resources are managed Andrew Morton
                   ` (7 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, ira.weiny, linux-mm, mm-commits, torvalds, viro

From: Ira Weiny <ira.weiny@intel.com>
Subject: fs/coredump: use kmap_local_page()

In dump_user_range() there is no reason for the mapping to be global.  Use
kmap_local_page() rather than kmap.

Link: https://lkml.kernel.org/r/20210203223328.558945-1-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/coredump.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/coredump.c~fs-coredump-use-kmap_local_page
+++ a/fs/coredump.c
@@ -897,10 +897,10 @@ int dump_user_range(struct coredump_para
 		 */
 		page = get_dump_page(addr);
 		if (page) {
-			void *kaddr = kmap(page);
+			void *kaddr = kmap_local_page(page);
 
 			stop = !dump_emit(cprm, kaddr, PAGE_SIZE);
-			kunmap(page);
+			kunmap_local(kaddr);
 			put_page(page);
 		} else {
 			stop = !dump_skip(cprm, PAGE_SIZE);
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 112/118] seq_file: document how per-entry resources are managed.
  2021-02-26  1:14 incoming Andrew Morton
                   ` (110 preceding siblings ...)
  2021-02-26  1:22 ` [patch 111/118] fs/coredump: use kmap_local_page() Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26  1:22 ` [patch 113/118] x86: fix seq_file iteration for pat/memtype.c Andrew Morton
                   ` (6 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, corbet, dave.hansen, davem, linux-mm, lucien.xin, luto,
	marcelo.leitner, mingo, mm-commits, neilb, nhorman, peterz,
	stable, torvalds, viro, vyasevich

From: NeilBrown <neilb@suse.de>
Subject: seq_file: document how per-entry resources are managed.

Patch series "Fix some seq_file users that were recently broken".

A recent change to seq_file broke some users which were using seq_file
in a non-"standard" way ...  though the "standard" isn't documented, so
they can be excused.  The result is a possible leak - of memory in one
case, of references to a 'transport' in the other.

These three patches:
 1/ document and explain the problem
 2/ fix the problem user in x86
 3/ fix the problem user in net/sctp


This patch (of 3):

Users of seq_file will sometimes find it convenient to take a resource,
such as a lock or memory allocation, in the ->start or ->next operations. 
These are per-entry resources, distinct from per-session resources which
are taken in ->start and released in ->stop.

The preferred management of these is to release the resource on the
subsequent call to ->next or ->stop.

However prior to Commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file
iteration code and interface") it happened that ->show would always be
called after ->start or ->next, and a few users chose to release the
resource in ->show.

This is no longer reliable.  Since the mentioned commit, ->next will
always come after a successful ->show (to ensure m->index is updated
correctly), so the original ordering cannot be maintained.

This patch updates the documentation to clearly state the required
behaviour.  Other patches will fix the few problematic users.
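
A minimal sketch of the preferred pattern (hypothetical iterator, not
taken from any of these patches): the entry handed out by ->start or
->next is released by the *following* ->next or ->stop call, never by
->show:

        static void *foo_next(struct seq_file *m, void *v, loff_t *pos)
        {
                kfree(v);                       /* release the previous entry */
                ++*pos;
                return foo_get_entry(*pos);     /* hypothetical helper allocating the next one */
        }

        static void foo_stop(struct seq_file *m, void *v)
        {
                kfree(v);                       /* release whatever entry was last handed out */
        }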

[akpm@linux-foundation.org: fix typo, per Willy]
Link: https://lkml.kernel.org/r/161248518659.21478.2484341937387294998.stgit@noble1
Link: https://lkml.kernel.org/r/161248539020.21478.3147971477400875336.stgit@noble1
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/filesystems/seq_file.rst |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/Documentation/filesystems/seq_file.rst~seq_file-document-how-per-entry-resources-are-managed
+++ a/Documentation/filesystems/seq_file.rst
@@ -217,6 +217,12 @@ between the calls to start() and stop(),
 is a reasonable thing to do. The seq_file code will also avoid taking any
 other locks while the iterator is active.
 
+The iterater value returned by start() or next() is guaranteed to be
+passed to a subsequent next() or stop() call.  This allows resources
+such as locks that were taken to be reliably released.  There is *no*
+guarantee that the iterator will be passed to show(), though in practice
+it often will be.
+
 
 Formatted output
 ================
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 113/118] x86: fix seq_file iteration for pat/memtype.c
  2021-02-26  1:14 incoming Andrew Morton
                   ` (111 preceding siblings ...)
  2021-02-26  1:22 ` [patch 112/118] seq_file: document how per-entry resources are managed Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26  1:22 ` [patch 114/118] scripts/gdb: fix list_for_each Andrew Morton
                   ` (5 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, corbet, dave.hansen, davem, linux-mm, lucien.xin, luto,
	marcelo.leitner, mingo, mm-commits, neilb, nhorman, peterz,
	stable, torvalds, viro, vyasevich

From: NeilBrown <neilb@suse.de>
Subject: x86: fix seq_file iteration for pat/memtype.c

The memtype seq_file iterator allocates a buffer in the ->start and ->next
functions and frees it in the ->show function.  The preferred handling for
such resources is to free them in the subsequent ->next or ->stop function
call.

Since Commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration
code and interface") there is no guarantee that ->show will be called
after ->next, so this function can now leak memory.

So move the freeing of the buffer to ->next and ->stop.

Link: https://lkml.kernel.org/r/161248539022.21478.13874455485854739066.stgit@noble1
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/mm/pat/memtype.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/mm/pat/memtype.c~x86-fix-seq_file-iteration-for-pat-memtypec
+++ a/arch/x86/mm/pat/memtype.c
@@ -1164,12 +1164,14 @@ static void *memtype_seq_start(struct se
 
 static void *memtype_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
+	kfree(v);
 	++*pos;
 	return memtype_get_idx(*pos);
 }
 
 static void memtype_seq_stop(struct seq_file *seq, void *v)
 {
+	kfree(v);
 }
 
 static int memtype_seq_show(struct seq_file *seq, void *v)
@@ -1181,8 +1183,6 @@ static int memtype_seq_show(struct seq_f
 			entry_print->end,
 			cattr_name(entry_print->type));
 
-	kfree(entry_print);
-
 	return 0;
 }
 
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 114/118] scripts/gdb: fix list_for_each
  2021-02-26  1:14 incoming Andrew Morton
                   ` (112 preceding siblings ...)
  2021-02-26  1:22 ` [patch 113/118] x86: fix seq_file iteration for pat/memtype.c Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26  1:22 ` [patch 115/118] kgdb: fix to kill breakpoints on initmem after boot Andrew Morton
                   ` (4 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, jan.kiszka, kbingham, linux-mm, mm-commits, prekageo, torvalds

From: George Prekas <prekageo@amazon.com>
Subject: scripts/gdb: fix list_for_each

If the list is uninitialized (next pointer is NULL), list_for_each gets
stuck in an infinite loop. Print a message and treat the list as empty.
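
A minimal sketch of the situation being guarded against (hypothetical
kernel-side example): a list_head in zeroed storage that was never passed
to INIT_LIST_HEAD(), so its next pointer is NULL rather than pointing at
itself:

        static struct list_head pending;        /* .next == NULL until INIT_LIST_HEAD(&pending) runs */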

Link: https://lkml.kernel.org/r/4ae23bb1-c333-f669-da2d-fa35c4f49018@amazon.com
Signed-off-by: George Prekas <prekageo@amazon.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/gdb/linux/lists.py |    5 +++++
 1 file changed, 5 insertions(+)

--- a/scripts/gdb/linux/lists.py~scripts-gdb-fix-list_for_each
+++ a/scripts/gdb/linux/lists.py
@@ -27,6 +27,11 @@ def list_for_each(head):
         raise TypeError("Must be struct list_head not {}"
                            .format(head.type))
 
+    if head['next'] == 0:
+        gdb.write("list_for_each: Uninitialized list '{}' treated as empty\n"
+                     .format(head.address))
+        return
+
     node = head['next'].dereference()
     while node.address != head.address:
         yield node.address
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 115/118] kgdb: fix to kill breakpoints on initmem after boot
  2021-02-26  1:14 incoming Andrew Morton
                   ` (113 preceding siblings ...)
  2021-02-26  1:22 ` [patch 114/118] scripts/gdb: fix list_for_each Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26  1:22 ` [patch 116/118] ubsan: remove overflow checks Andrew Morton
                   ` (3 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, daniel.thompson, dianders, jason.wessel, linux-mm,
	mhiramat, mm-commits, peterz, rostedt, sumit.garg, torvalds

From: Sumit Garg <sumit.garg@linaro.org>
Subject: kgdb: fix to kill breakpoints on initmem after boot

Currently, breakpoints in the kernel .init.text section are not handled
correctly: they can still be removed even after the corresponding pages
have been freed.

Fix this by killing .init.text section breakpoints just before the
initmem pages are freed.

Doug: "HW breakpoints aren't handled by this patch but it's probably
not such a big deal".

Link: https://lkml.kernel.org/r/20210224081652.587785-1-sumit.garg@linaro.org
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Suggested-by: Doug Anderson <dianders@chromium.org>
Acked-by: Doug Anderson <dianders@chromium.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/kgdb.h      |    2 ++
 init/main.c               |    1 +
 kernel/debug/debug_core.c |   11 +++++++++++
 3 files changed, 14 insertions(+)

--- a/include/linux/kgdb.h~kgdb-fix-to-kill-breakpoints-on-initmem-after-boot
+++ a/include/linux/kgdb.h
@@ -359,9 +359,11 @@ extern atomic_t			kgdb_active;
 extern bool dbg_is_early;
 extern void __init dbg_late_init(void);
 extern void kgdb_panic(const char *msg);
+extern void kgdb_free_init_mem(void);
 #else /* ! CONFIG_KGDB */
 #define in_dbg_master() (0)
 #define dbg_late_init()
 static inline void kgdb_panic(const char *msg) {}
+static inline void kgdb_free_init_mem(void) { }
 #endif /* ! CONFIG_KGDB */
 #endif /* _KGDB_H_ */
--- a/init/main.c~kgdb-fix-to-kill-breakpoints-on-initmem-after-boot
+++ a/init/main.c
@@ -1426,6 +1426,7 @@ static int __ref kernel_init(void *unuse
 	async_synchronize_full();
 	kprobe_free_init_mem();
 	ftrace_free_init_mem();
+	kgdb_free_init_mem();
 	free_initmem();
 	mark_readonly();
 
--- a/kernel/debug/debug_core.c~kgdb-fix-to-kill-breakpoints-on-initmem-after-boot
+++ a/kernel/debug/debug_core.c
@@ -455,6 +455,17 @@ setundefined:
 	return 0;
 }
 
+void kgdb_free_init_mem(void)
+{
+	int i;
+
+	/* Clear init memory breakpoints. */
+	for (i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
+		if (init_section_contains((void *)kgdb_break[i].bpt_addr, 0))
+			kgdb_break[i].state = BP_UNDEFINED;
+	}
+}
+
 #ifdef CONFIG_KGDB_KDB
 void kdb_dump_stack_on_cpu(int cpu)
 {
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 116/118] ubsan: remove overflow checks
  2021-02-26  1:14 incoming Andrew Morton
                   ` (114 preceding siblings ...)
  2021-02-26  1:22 ` [patch 115/118] kgdb: fix to kill breakpoints on initmem after boot Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26  1:22 ` [patch 117/118] initramfs: panic with memory information Andrew Morton
                   ` (2 subsequent siblings)
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, dvyukov, jpoimboe, keescook, linux-mm, mm-commits, peterz,
	rdunlap, ryabinin.a.a, sfr, torvalds, viro

From: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Subject: ubsan: remove overflow checks

Since GCC 8.0, -fsanitize=signed-integer-overflow doesn't work with
-fwrapv: -fwrapv makes signed overflow defined behaviour and GCC
essentially disables the ubsan checks.  On GCC < 8.0, -fwrapv has no
influence on the -fsanitize=signed-integer-overflow setting, so it kind
of works but generates false positives and violates uaccess rules:

lib/iov_iter.o: warning: objtool: iovec_from_user()+0x22d: call to
__ubsan_handle_add_overflow() with UACCESS enabled

Disable the signed overflow checks to avoid these problems.  Remove the
unsigned overflow checks as well.  Unsigned overflow checking appeared as
a side effect of commit cdf8a76fda4a ("ubsan: move cc-option tests into
Kconfig"), but it never worked (the kernel doesn't boot).  And unsigned
overflow is allowed by the C standard, so checking for it is pointless.
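
A minimal sketch of why the signed check is moot under -fwrapv
(assumption: GCC 8 or newer):

        int x = INT_MAX;

        x += 1; /* -fwrapv defines this to wrap to INT_MIN, so no
                 * __ubsan_handle_add_overflow() call is instrumented */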

Link: https://lkml.kernel.org/r/20210209232348.20510-1-ryabinin.a.a@gmail.com
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/Kconfig.ubsan      |   17 ---------
 lib/test_ubsan.c       |   49 ----------------------------
 lib/ubsan.c            |   68 ---------------------------------------
 scripts/Makefile.ubsan |    2 -
 4 files changed, 136 deletions(-)

--- a/lib/Kconfig.ubsan~ubsan-remove-overflow-checks
+++ a/lib/Kconfig.ubsan
@@ -112,23 +112,6 @@ config UBSAN_UNREACHABLE
 	  This option enables -fsanitize=unreachable which checks for control
 	  flow reaching an expected-to-be-unreachable position.
 
-config UBSAN_SIGNED_OVERFLOW
-	bool "Perform checking for signed arithmetic overflow"
-	default UBSAN
-	depends on $(cc-option,-fsanitize=signed-integer-overflow)
-	help
-	  This option enables -fsanitize=signed-integer-overflow which checks
-	  for overflow of any arithmetic operations with signed integers.
-
-config UBSAN_UNSIGNED_OVERFLOW
-	bool "Perform checking for unsigned arithmetic overflow"
-	depends on $(cc-option,-fsanitize=unsigned-integer-overflow)
-	depends on !X86_32 # avoid excessive stack usage on x86-32/clang
-	help
-	  This option enables -fsanitize=unsigned-integer-overflow which checks
-	  for overflow of any arithmetic operations with unsigned integers. This
-	  currently causes x86 to fail to boot.
-
 config UBSAN_OBJECT_SIZE
 	bool "Perform checking for accesses beyond the end of objects"
 	default UBSAN
--- a/lib/test_ubsan.c~ubsan-remove-overflow-checks
+++ a/lib/test_ubsan.c
@@ -11,51 +11,6 @@ typedef void(*test_ubsan_fp)(void);
 			#config, IS_ENABLED(config) ? "y" : "n");	\
 	} while (0)
 
-static void test_ubsan_add_overflow(void)
-{
-	volatile int val = INT_MAX;
-	volatile unsigned int uval = UINT_MAX;
-
-	UBSAN_TEST(CONFIG_UBSAN_SIGNED_OVERFLOW);
-	val += 2;
-
-	UBSAN_TEST(CONFIG_UBSAN_UNSIGNED_OVERFLOW);
-	uval += 2;
-}
-
-static void test_ubsan_sub_overflow(void)
-{
-	volatile int val = INT_MIN;
-	volatile unsigned int uval = 0;
-	volatile int val2 = 2;
-
-	UBSAN_TEST(CONFIG_UBSAN_SIGNED_OVERFLOW);
-	val -= val2;
-
-	UBSAN_TEST(CONFIG_UBSAN_UNSIGNED_OVERFLOW);
-	uval -= val2;
-}
-
-static void test_ubsan_mul_overflow(void)
-{
-	volatile int val = INT_MAX / 2;
-	volatile unsigned int uval = UINT_MAX / 2;
-
-	UBSAN_TEST(CONFIG_UBSAN_SIGNED_OVERFLOW);
-	val *= 3;
-
-	UBSAN_TEST(CONFIG_UBSAN_UNSIGNED_OVERFLOW);
-	uval *= 3;
-}
-
-static void test_ubsan_negate_overflow(void)
-{
-	volatile int val = INT_MIN;
-
-	UBSAN_TEST(CONFIG_UBSAN_SIGNED_OVERFLOW);
-	val = -val;
-}
-
 static void test_ubsan_divrem_overflow(void)
 {
 	volatile int val = 16;
@@ -155,10 +110,6 @@ static void test_ubsan_object_size_misma
 }
 
 static const test_ubsan_fp test_ubsan_array[] = {
-	test_ubsan_add_overflow,
-	test_ubsan_sub_overflow,
-	test_ubsan_mul_overflow,
-	test_ubsan_negate_overflow,
 	test_ubsan_shift_out_of_bounds,
 	test_ubsan_out_of_bounds,
 	test_ubsan_load_invalid_value,
--- a/lib/ubsan.c~ubsan-remove-overflow-checks
+++ a/lib/ubsan.c
@@ -163,74 +163,6 @@ static void ubsan_epilogue(void)
 	}
 }
 
-static void handle_overflow(struct overflow_data *data, void *lhs,
-			void *rhs, char op)
-{
-
-	struct type_descriptor *type = data->type;
-	char lhs_val_str[VALUE_LENGTH];
-	char rhs_val_str[VALUE_LENGTH];
-
-	if (suppress_report(&data->location))
-		return;
-
-	ubsan_prologue(&data->location, type_is_signed(type) ?
-			"signed-integer-overflow" :
-			"unsigned-integer-overflow");
-
-	val_to_string(lhs_val_str, sizeof(lhs_val_str), type, lhs);
-	val_to_string(rhs_val_str, sizeof(rhs_val_str), type, rhs);
-	pr_err("%s %c %s cannot be represented in type %s\n",
-		lhs_val_str,
-		op,
-		rhs_val_str,
-		type->type_name);
-
-	ubsan_epilogue();
-}
-
-void __ubsan_handle_add_overflow(void *data,
-				void *lhs, void *rhs)
-{
-
-	handle_overflow(data, lhs, rhs, '+');
-}
-EXPORT_SYMBOL(__ubsan_handle_add_overflow);
-
-void __ubsan_handle_sub_overflow(void *data,
-				void *lhs, void *rhs)
-{
-	handle_overflow(data, lhs, rhs, '-');
-}
-EXPORT_SYMBOL(__ubsan_handle_sub_overflow);
-
-void __ubsan_handle_mul_overflow(void *data,
-				void *lhs, void *rhs)
-{
-	handle_overflow(data, lhs, rhs, '*');
-}
-EXPORT_SYMBOL(__ubsan_handle_mul_overflow);
-
-void __ubsan_handle_negate_overflow(void *_data, void *old_val)
-{
-	struct overflow_data *data = _data;
-	char old_val_str[VALUE_LENGTH];
-
-	if (suppress_report(&data->location))
-		return;
-
-	ubsan_prologue(&data->location, "negation-overflow");
-
-	val_to_string(old_val_str, sizeof(old_val_str), data->type, old_val);
-
-	pr_err("negation of %s cannot be represented in type %s:\n",
-		old_val_str, data->type->type_name);
-
-	ubsan_epilogue();
-}
-EXPORT_SYMBOL(__ubsan_handle_negate_overflow);
-
-
 void __ubsan_handle_divrem_overflow(void *_data, void *lhs, void *rhs)
 {
 	struct overflow_data *data = _data;
--- a/scripts/Makefile.ubsan~ubsan-remove-overflow-checks
+++ a/scripts/Makefile.ubsan
@@ -8,8 +8,6 @@ ubsan-cflags-$(CONFIG_UBSAN_LOCAL_BOUNDS
 ubsan-cflags-$(CONFIG_UBSAN_SHIFT)		+= -fsanitize=shift
 ubsan-cflags-$(CONFIG_UBSAN_DIV_ZERO)		+= -fsanitize=integer-divide-by-zero
 ubsan-cflags-$(CONFIG_UBSAN_UNREACHABLE)	+= -fsanitize=unreachable
-ubsan-cflags-$(CONFIG_UBSAN_SIGNED_OVERFLOW)	+= -fsanitize=signed-integer-overflow
-ubsan-cflags-$(CONFIG_UBSAN_UNSIGNED_OVERFLOW)	+= -fsanitize=unsigned-integer-overflow
 ubsan-cflags-$(CONFIG_UBSAN_OBJECT_SIZE)	+= -fsanitize=object-size
 ubsan-cflags-$(CONFIG_UBSAN_BOOL)		+= -fsanitize=bool
 ubsan-cflags-$(CONFIG_UBSAN_ENUM)		+= -fsanitize=enum
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 117/118] initramfs: panic with memory information
  2021-02-26  1:14 incoming Andrew Morton
                   ` (115 preceding siblings ...)
  2021-02-26  1:22 ` [patch 116/118] ubsan: remove overflow checks Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26  1:22 ` [patch 118/118] MIPS: make userspace mapping young by default Andrew Morton
  2021-02-26 17:55 ` incoming Linus Torvalds
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, arnd, brho, f.fainelli, linux-mm, mm-commits, torvalds

From: Florian Fainelli <f.fainelli@gmail.com>
Subject: initramfs: panic with memory information

On systems with large amounts of reserved memory we may fail to
successfully complete unpack_to_rootfs() and be left with:

 Kernel panic - not syncing: write error

This is not very helpful for understanding what happened, so let's wrap
the panic() calls with a surrounding show_mem() so that we have a chance
of understanding the memory conditions leading to these allocation
failures.

[akpm@linux-foundation.org: replace macro with C function]
Link: https://lkml.kernel.org/r/20210114231517.1854379-1-f.fainelli@gmail.com
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Barret Rhoden <brho@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 init/initramfs.c |   19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

--- a/init/initramfs.c~initramfs-panic-with-memory-information
+++ a/init/initramfs.c
@@ -11,6 +11,7 @@
 #include <linux/utime.h>
 #include <linux/file.h>
 #include <linux/memblock.h>
+#include <linux/mm.h>
 #include <linux/namei.h>
 #include <linux/init_syscalls.h>
 
@@ -45,6 +46,16 @@ static void __init error(char *x)
 		message = x;
 }
 
+static void panic_show_mem(const char *fmt, ...)
+{
+	va_list args;
+
+	show_mem(0, NULL);
+	va_start(args, fmt);
+	panic(fmt, args);
+	va_end(args);
+}
+
 /* link hash */
 
 #define N_ALIGN(len) ((((len) + 1) & ~3) + 2)
@@ -80,7 +91,7 @@ static char __init *find_link(int major,
 	}
 	q = kmalloc(sizeof(struct hash), GFP_KERNEL);
 	if (!q)
-		panic("can't allocate link hash entry");
+		panic_show_mem("can't allocate link hash entry");
 	q->major = major;
 	q->minor = minor;
 	q->ino = ino;
@@ -125,7 +136,7 @@ static void __init dir_add(const char *n
 {
 	struct dir_entry *de = kmalloc(sizeof(struct dir_entry), GFP_KERNEL);
 	if (!de)
-		panic("can't allocate dir_entry buffer");
+		panic_show_mem("can't allocate dir_entry buffer");
 	INIT_LIST_HEAD(&de->list);
 	de->name = kstrdup(name, GFP_KERNEL);
 	de->mtime = mtime;
@@ -460,7 +471,7 @@ static char * __init unpack_to_rootfs(ch
 	name_buf = kmalloc(N_ALIGN(PATH_MAX), GFP_KERNEL);
 
 	if (!header_buf || !symlink_buf || !name_buf)
-		panic("can't allocate buffers");
+		panic_show_mem("can't allocate buffers");
 
 	state = Start;
 	this_header = 0;
@@ -607,7 +618,7 @@ static int __init populate_rootfs(void)
 	/* Load the built in initramfs */
 	char *err = unpack_to_rootfs(__initramfs_start, __initramfs_size);
 	if (err)
-		panic("%s", err); /* Failed to decompress INTERNAL initramfs */
+		panic_show_mem("%s", err); /* Failed to decompress INTERNAL initramfs */
 
 	if (!initrd_start || IS_ENABLED(CONFIG_INITRAMFS_FORCE))
 		goto done;
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* [patch 118/118] MIPS: make userspace mapping young by default
  2021-02-26  1:14 incoming Andrew Morton
                   ` (116 preceding siblings ...)
  2021-02-26  1:22 ` [patch 117/118] initramfs: panic with memory information Andrew Morton
@ 2021-02-26  1:22 ` Andrew Morton
  2021-02-26 17:55 ` incoming Linus Torvalds
  118 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26  1:22 UTC (permalink / raw)
  To: akpm, ambrosehua, chenhc, christophe.leroy, gaojuxin, huangpei,
	jiaxun.yang, linux-mm, lixuefeng, maobibo, mm-commits, npiggin,
	paulburton, torvalds, tsbogend, yangtiezhu, zhangfx

From: Huang Pei <huangpei@loongson.cn>
Subject: MIPS: make userspace mapping young by default

The MIPS page fault path (except for huge pages) takes 3 exceptions (1 TLB
Miss + 2 TLB Invalid), but the second TLB Invalid exception is just
triggered by __update_tlb from do_page_fault writing the TLB without
_PAGE_VALID set.  With this patch, the userspace mapping prot is made
young by default (with both _PAGE_VALID and _PAGE_YOUNG set), so it only
takes 1 TLB Miss + 1 TLB Invalid exception.

Remove pte_sw_mkyoung without polluting MM code and bring the page fault
delay of MIPS on par with other architectures.

Link: https://lkml.kernel.org/r/20210204013942.8398-1-huangpei@loongson.cn
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: <huangpei@loongson.cn>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: <ambrosehua@gmail.com>
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Li Xuefeng <lixuefeng@loongson.cn>
Cc: Yang Tiezhu <yangtiezhu@loongson.cn>
Cc: Gao Juxin <gaojuxin@loongson.cn>
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/mips/mm/cache.c    |   30 ++++++++++++++++--------------
 include/linux/pgtable.h |    8 --------
 mm/memory.c             |    4 ----
 3 files changed, 16 insertions(+), 26 deletions(-)

--- a/arch/mips/mm/cache.c~mips-make-userspace-mapping-young-by-default
+++ a/arch/mips/mm/cache.c
@@ -157,29 +157,31 @@ unsigned long _page_cachable_default;
 EXPORT_SYMBOL(_page_cachable_default);
 
 #define PM(p)	__pgprot(_page_cachable_default | (p))
+#define PVA(p)	PM(_PAGE_VALID | _PAGE_ACCESSED | (p))
 
 static inline void setup_protection_map(void)
 {
 	protection_map[0]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
-	protection_map[1]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
-	protection_map[2]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
-	protection_map[3]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
-	protection_map[4]  = PM(_PAGE_PRESENT);
-	protection_map[5]  = PM(_PAGE_PRESENT);
-	protection_map[6]  = PM(_PAGE_PRESENT);
-	protection_map[7]  = PM(_PAGE_PRESENT);
+	protection_map[1]  = PVA(_PAGE_PRESENT | _PAGE_NO_EXEC);
+	protection_map[2]  = PVA(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
+	protection_map[3]  = PVA(_PAGE_PRESENT | _PAGE_NO_EXEC);
+	protection_map[4]  = PVA(_PAGE_PRESENT);
+	protection_map[5]  = PVA(_PAGE_PRESENT);
+	protection_map[6]  = PVA(_PAGE_PRESENT);
+	protection_map[7]  = PVA(_PAGE_PRESENT);
 
 	protection_map[8]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_NO_READ);
-	protection_map[9]  = PM(_PAGE_PRESENT | _PAGE_NO_EXEC);
-	protection_map[10] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE |
+	protection_map[9]  = PVA(_PAGE_PRESENT | _PAGE_NO_EXEC);
+	protection_map[10] = PVA(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE |
 				_PAGE_NO_READ);
-	protection_map[11] = PM(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE);
-	protection_map[12] = PM(_PAGE_PRESENT);
-	protection_map[13] = PM(_PAGE_PRESENT);
-	protection_map[14] = PM(_PAGE_PRESENT | _PAGE_WRITE);
-	protection_map[15] = PM(_PAGE_PRESENT | _PAGE_WRITE);
+	protection_map[11] = PVA(_PAGE_PRESENT | _PAGE_NO_EXEC | _PAGE_WRITE);
+	protection_map[12] = PVA(_PAGE_PRESENT);
+	protection_map[13] = PVA(_PAGE_PRESENT);
+	protection_map[14] = PVA(_PAGE_PRESENT);
+	protection_map[15] = PVA(_PAGE_PRESENT);
 }
 
+#undef _PVA
 #undef PM
 
 void cpu_cache_init(void)
--- a/include/linux/pgtable.h~mips-make-userspace-mapping-young-by-default
+++ a/include/linux/pgtable.h
@@ -432,14 +432,6 @@ static inline void ptep_set_wrprotect(st
  * To be differentiate with macro pte_mkyoung, this macro is used on platforms
  * where software maintains page access bit.
  */
-#ifndef pte_sw_mkyoung
-static inline pte_t pte_sw_mkyoung(pte_t pte)
-{
-	return pte;
-}
-#define pte_sw_mkyoung	pte_sw_mkyoung
-#endif
-
 #ifndef pte_savedwrite
 #define pte_savedwrite pte_write
 #endif
--- a/mm/memory.c~mips-make-userspace-mapping-young-by-default
+++ a/mm/memory.c
@@ -2902,7 +2902,6 @@ static vm_fault_t wp_page_copy(struct vm
 		}
 		flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 		entry = mk_pte(new_page, vma->vm_page_prot);
-		entry = pte_sw_mkyoung(entry);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 
 		/*
@@ -3560,7 +3559,6 @@ static vm_fault_t do_anonymous_page(stru
 	__SetPageUptodate(page);
 
 	entry = mk_pte(page, vma->vm_page_prot);
-	entry = pte_sw_mkyoung(entry);
 	if (vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry));
 
@@ -3745,8 +3743,6 @@ void do_set_pte(struct vm_fault *vmf, st
 
 	if (prefault && arch_wants_old_prefaulted_pte())
 		entry = pte_mkold(entry);
-	else
-		entry = pte_sw_mkyoung(entry);
 
 	if (write)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
_


^ permalink raw reply	[flat|nested] 281+ messages in thread

* Re: [patch 067/118] tracing: add error_report_end trace point
  2021-02-26  1:19 ` [patch 067/118] tracing: add error_report_end trace point Andrew Morton
@ 2021-02-26 14:03   ` Steven Rostedt
  0 siblings, 0 replies; 281+ messages in thread
From: Steven Rostedt @ 2021-02-26 14:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: andreyknvl, dvyukov, elver, glider, gregkh, linux-mm, mingo,
	mm-commits, pmladek, sergey.senozhatsky, torvalds, vbabka

On Thu, 25 Feb 2021 17:19:44 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> +#define show_error_detector_list(val) \
> +	__print_symbolic(val, error_detector_list)
> +
> +DECLARE_EVENT_CLASS(error_report_template,
> +		    TP_PROTO(enum error_detector error_detector, unsigned long id),
> +		    TP_ARGS(error_detector, id),
> +		    TP_STRUCT__entry(__field(enum error_detector, error_detector)
> +					     __field(unsigned long, id)),
> +		    TP_fast_assign(__entry->error_detector = error_detector;
> +				   __entry->id = id;),
> +		    TP_printk("[%s] %lx",
> +			      show_error_detector_list(__entry->error_detector),
> +			      __entry->id));
> +
> +/**

This doesn't need to change right now, but FYI, do not follow checkpatch
formatting for TRACE_EVENT() and friend macros. The above is really hard to
read for a trace event. It should look like this:

DECLARE_EVENT_CLASS(error_report_template,

	TP_PROTO(enum error_detector error_detector, unsigned long id),

	TP_ARGS(error_detector, id),

	TP_STRUCT__entry(
		__field(enum error_detector, error_detector)
		__field(unsigned long, id)
	),

	TP_fast_assign(
		__entry->error_detector = error_detector;
		__entry->id = id;
	),

	TP_printk("[%s] %lx",
		  show_error_detector_list(__entry->error_detector),
		  __entry->id)
);


As it's not really a macro, but code, and see, it's MUCH easier to read!
Because we see the prototype, the structure definition, the code that
assigns that structure, and how to print it. Following what checkpatch
says, is equivalent to writing code like this:

void trace_error_report_template
		    (enum error_detector error_detector, unsigned long id) {
		    struct entry {enum error_detector error_detector;
		                  unsigned long id;};
		    __entry->error_detector = error_detector;
				   __entry->id = id;
		    printk("[%s] %lx",
			   show_error_detector_list(__entry->error_detector),
			   __entry->id)); }


It doesn't need to be fixed now. I'll try to remember to fix it after it
lands in my tree.

-- Steve


^ permalink raw reply	[flat|nested] 281+ messages in thread

* Re: incoming
  2021-02-26  1:14 incoming Andrew Morton
                   ` (117 preceding siblings ...)
  2021-02-26  1:22 ` [patch 118/118] MIPS: make userspace mapping young by default Andrew Morton
@ 2021-02-26 17:55 ` Linus Torvalds
  2021-02-26 19:16   ` incoming Andrew Morton
  118 siblings, 1 reply; 281+ messages in thread
From: Linus Torvalds @ 2021-02-26 17:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mm-commits, Linux-MM

On Thu, Feb 25, 2021 at 5:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> - The rest of MM.
>
>   Includes kfence - another runtime memory validator.  Not as
>   thorough as KASAN, but it has unmeasurable overhead and is intended
>   to be usable in production builds.
>
> - Everything else

Just to clarify: you have nothing else really pending?

I'm hoping to just do -rc1 this weekend after all - despite my late
start due to loss of power for several days.

I'll allow late stragglers with good reason through, but the fewer of
those there are, the better, of course.

Thanks,
                   Linus


^ permalink raw reply	[flat|nested] 281+ messages in thread

* Re: incoming
  2021-02-26 17:55 ` incoming Linus Torvalds
@ 2021-02-26 19:16   ` Andrew Morton
  0 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-02-26 19:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, Linux-MM

On Fri, 26 Feb 2021 09:55:27 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Feb 25, 2021 at 5:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > - The rest of MM.
> >
> >   Includes kfence - another runtime memory validator.  Not as
> >   thorough as KASAN, but it has unmeasurable overhead and is intended
> >   to be usable in production builds.
> >
> > - Everything else
> 
> Just to clarify: you have nothing else really pending?

Yes, that's it from me for -rc1.




^ permalink raw reply	[flat|nested] 281+ messages in thread

* Re: [patch 091/118] MAINTAINERS: add uapi directories to API/ABI section
  2021-02-26  1:21 ` [patch 091/118] MAINTAINERS: add uapi directories to API/ABI section Andrew Morton
@ 2021-02-27 19:18   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 281+ messages in thread
From: Michael Kerrisk (man-pages) @ 2021-02-27 19:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Linux-MM, mm-commits, Linus Torvalds, Vlastimil Babka

Hello,

On Fri, 26 Feb 2021 at 02:21, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> From: Vlastimil Babka <vbabka@suse.cz>
> Subject: MAINTAINERS: add uapi directories to API/ABI section
>
> Let's add include/uapi/ and arch/*/include/uapi/ to API/ABI section, so
> that for patches modifying them, get_maintainers.pl suggests CCing
> linux-api@ so people don't forget.
>
> Link: https://lkml.kernel.org/r/20210217174745.13591-1-vbabka@suse.cz
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Reported-by: David Hildenbrand <david@redhat.com>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  MAINTAINERS |    2 ++
>  1 file changed, 2 insertions(+)
>
> --- a/MAINTAINERS~maintainers-add-uapi-directories-to-api-abi-section
> +++ a/MAINTAINERS
> @@ -261,6 +261,8 @@ ABI/API
>  L:     linux-api@vger.kernel.org
>  F:     include/linux/syscalls.h
>  F:     kernel/sys_ni.c
> +F:     include/uapi/
> +F:     arch/*/include/uapi/
>
>  ABIT UGURU 1,2 HARDWARE MONITOR DRIVER
>  M:     Hans de Goede <hdegoede@redhat.com>

Can we please NOT do this.

Please see commit b14fd334ff3dc47b31c3592c166af5ea42b204d3.

In the past, include/uapi/ was listed as a file trigger, and the
consequence was a lot of low signal noise on linux-api@, until finally
I removed the trigger.

Thanks,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


^ permalink raw reply	[flat|nested] 281+ messages in thread

* incoming
@ 2021-07-23 22:49 Andrew Morton
  0 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-07-23 22:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

15 patches, based on 704f4cba43d4ed31ef4beb422313f1263d87bc55.

Subsystems affected by this patch series:

  mm/userfaultfd
  mm/kfence
  mm/highmem
  mm/pagealloc
  mm/memblock
  mm/pagecache
  mm/secretmem
  mm/pagemap
  mm/hugetlbfs

Subsystem: mm/userfaultfd

    Peter Collingbourne <pcc@google.com>:
    Patch series "userfaultfd: do not untag user pointers", v5:
      userfaultfd: do not untag user pointers
      selftest: use mmap instead of posix_memalign to allocate memory

Subsystem: mm/kfence

    Weizhao Ouyang <o451686892@gmail.com>:
      kfence: defer kfence_test_init to ensure that kunit debugfs is created

    Alexander Potapenko <glider@google.com>:
      kfence: move the size check to the beginning of __kfence_alloc()
      kfence: skip all GFP_ZONEMASK allocations

Subsystem: mm/highmem

    Christoph Hellwig <hch@lst.de>:
      mm: call flush_dcache_page() in memcpy_to_page() and memzero_page()
      mm: use kmap_local_page in memzero_page
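
The two mm/highmem entries above rework memzero_page() to use a CPU-local
kmap and to flush the dcache after writing through the kernel alias.  As a
rough illustration of that pattern (a sketch, not the exact upstream hunk),
the helper ends up shaped roughly like this:

static inline void memzero_page(struct page *page, size_t offset, size_t len)
{
	/* Non-sleeping, CPU-local mapping of the page. */
	char *addr = kmap_local_page(page);

	memset(addr + offset, 0, len);
	/* Keep user-visible aliases coherent on architectures with
	 * aliasing data caches. */
	flush_dcache_page(page);
	kunmap_local(addr);
}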

Subsystem: mm/pagealloc

    Sergei Trofimovich <slyfox@gentoo.org>:
      mm: page_alloc: fix page_poison=1 / INIT_ON_ALLOC_DEFAULT_ON interaction

Subsystem: mm/memblock

    Mike Rapoport <rppt@linux.ibm.com>:
      memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions

Subsystem: mm/pagecache

    Roman Gushchin <guro@fb.com>:
      writeback, cgroup: remove wb from offline list before releasing refcnt
      writeback, cgroup: do not reparent dax inodes

Subsystem: mm/secretmem

    Mike Rapoport <rppt@linux.ibm.com>:
      mm/secretmem: wire up ->set_page_dirty

Subsystem: mm/pagemap

    Muchun Song <songmuchun@bytedance.com>:
      mm: mmap_lock: fix disabling preemption directly

    Qi Zheng <zhengqi.arch@bytedance.com>:
      mm: fix the deadlock in finish_fault()

Subsystem: mm/hugetlbfs

    Mike Kravetz <mike.kravetz@oracle.com>:
      hugetlbfs: fix mount mode command line processing

 Documentation/arm64/tagged-address-abi.rst |   26 ++++++++++++++++++--------
 fs/fs-writeback.c                          |    3 +++
 fs/hugetlbfs/inode.c                       |    2 +-
 fs/userfaultfd.c                           |   26 ++++++++++++--------------
 include/linux/highmem.h                    |    6 ++++--
 include/linux/memblock.h                   |    4 ++--
 mm/backing-dev.c                           |    2 +-
 mm/kfence/core.c                           |   19 ++++++++++++++++---
 mm/kfence/kfence_test.c                    |    2 +-
 mm/memblock.c                              |    3 ++-
 mm/memory.c                                |   11 ++++++++++-
 mm/mmap_lock.c                             |    4 ++--
 mm/page_alloc.c                            |   29 ++++++++++++++++-------------
 mm/secretmem.c                             |    1 +
 tools/testing/selftests/vm/userfaultfd.c   |    6 ++++--
 15 files changed, 93 insertions(+), 51 deletions(-)



^ permalink raw reply	[flat|nested] 281+ messages in thread

* incoming
@ 2021-07-15  4:26 Andrew Morton
  0 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-07-15  4:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

13 patches, based on 40226a3d96ef8ab8980f032681c8bfd46d63874e.

Subsystems affected by this patch series:

  mm/kasan
  mm/pagealloc
  mm/rmap
  mm/hmm
  hfs
  mm/hugetlb

Subsystem: mm/kasan

    Marco Elver <elver@google.com>:
      mm: move helper to check slub_debug_enabled

    Yee Lee <yee.lee@mediatek.com>:
      kasan: add memzero init for unaligned size at DEBUG

    Marco Elver <elver@google.com>:
      kasan: fix build by including kernel.h

Subsystem: mm/pagealloc

    Matteo Croce <mcroce@microsoft.com>:
      Revert "mm/page_alloc: make should_fail_alloc_page() static"

    Mel Gorman <mgorman@techsingularity.net>:
      mm/page_alloc: avoid page allocator recursion with pagesets.lock held

    Yanfei Xu <yanfei.xu@windriver.com>:
      mm/page_alloc: correct return value when failing at preparing

    Chuck Lever <chuck.lever@oracle.com>:
      mm/page_alloc: further fix __alloc_pages_bulk() return value

Subsystem: mm/rmap

    Christoph Hellwig <hch@lst.de>:
      mm: fix the try_to_unmap prototype for !CONFIG_MMU

Subsystem: mm/hmm

    Alistair Popple <apopple@nvidia.com>:
      lib/test_hmm: remove set but unused page variable

Subsystem: hfs

    Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>:
    Patch series "hfs: fix various errors", v2:
      hfs: add missing clean-up in hfs_fill_super
      hfs: fix high memory mapping in hfs_bnode_read
      hfs: add lock nesting notation to hfs_find_init

Subsystem: mm/hugetlb

    Joao Martins <joao.m.martins@oracle.com>:
      mm/hugetlb: fix refs calculation from unaligned @vaddr

 fs/hfs/bfind.c        |   14 +++++++++++++-
 fs/hfs/bnode.c        |   25 ++++++++++++++++++++-----
 fs/hfs/btree.h        |    7 +++++++
 fs/hfs/super.c        |   10 +++++-----
 include/linux/kasan.h |    1 +
 include/linux/rmap.h  |    4 +++-
 lib/test_hmm.c        |    2 --
 mm/hugetlb.c          |    5 +++--
 mm/kasan/kasan.h      |   12 ++++++++++++
 mm/page_alloc.c       |   30 ++++++++++++++++++++++--------
 mm/slab.h             |   15 +++++++++++----
 mm/slub.c             |   14 --------------
 12 files changed, 97 insertions(+), 42 deletions(-)



^ permalink raw reply	[flat|nested] 281+ messages in thread

* incoming
@ 2021-07-08  0:59 Andrew Morton
  0 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-07-08  0:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

54 patches, based on a931dd33d370896a683236bba67c0d6f3d01144d.

Subsystems affected by this patch series:

  lib
  mm/slub
  mm/secretmem
  mm/cleanups
  mm/init
  debug
  mm/pagemap
  mm/mremap

Subsystem: lib

    Zhen Lei <thunder.leizhen@huawei.com>:
      lib/test: fix spelling mistakes
      lib: fix spelling mistakes
      lib: fix spelling mistakes in header files

Subsystem: mm/slub

    Nathan Chancellor <nathan@kernel.org>:
    Patch series "hexagon: Fix build error with CONFIG_STACKDEPOT and select CONFIG_ARCH_WANT_LD_ORPHAN_WARN":
      hexagon: handle {,SOFT}IRQENTRY_TEXT in linker script
      hexagon: use common DISCARDS macro
      hexagon: select ARCH_WANT_LD_ORPHAN_WARN

    Oliver Glitta <glittao@gmail.com>:
      mm/slub: use stackdepot to save stack trace in objects

Subsystem: mm/secretmem

    Mike Rapoport <rppt@linux.ibm.com>:
    Patch series "mm: introduce memfd_secret system call to create "secret" memory areas", v20:
      mmap: make mlock_future_check() global
      riscv/Kconfig: make direct map manipulation options depend on MMU
      set_memory: allow querying whether set_direct_map_*() is actually enabled
      mm: introduce memfd_secret system call to create "secret" memory areas
      PM: hibernate: disable when there are active secretmem users
      arch, mm: wire up memfd_secret system call where relevant
      secretmem: test: add basic selftest for memfd_secret(2)
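
For orientation on the memfd_secret() series above, a hypothetical
userspace usage sketch (the syscall number 447 and the MAP_SHARED
requirement are assumptions based on how the series was proposed; there
was no glibc wrapper at the time, so syscall(2) is used directly):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_memfd_secret
#define __NR_memfd_secret 447	/* assumed number, adjust for your headers */
#endif

int main(void)
{
	int fd = syscall(__NR_memfd_secret, 0);
	if (fd < 0 || ftruncate(fd, 4096) < 0) {
		perror("memfd_secret/ftruncate");
		return 1;
	}
	/* Pages backing this mapping are dropped from the kernel direct
	 * map, so they are not visible via /dev/mem, /proc/kcore, etc. */
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	strcpy(p, "not in the direct map");
	munmap(p, 4096);
	close(fd);
	return 0;
}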

Subsystem: mm/cleanups

    Zhen Lei <thunder.leizhen@huawei.com>:
      mm: fix spelling mistakes in header files

Subsystem: mm/init

    Kefeng Wang <wangkefeng.wang@huawei.com>:
    Patch series "init_mm: cleanup ARCH's text/data/brk setup code", v3:
      mm: add setup_initial_init_mm() helper
      arc: convert to setup_initial_init_mm()
      arm: convert to setup_initial_init_mm()
      arm64: convert to setup_initial_init_mm()
      csky: convert to setup_initial_init_mm()
      h8300: convert to setup_initial_init_mm()
      m68k: convert to setup_initial_init_mm()
      nds32: convert to setup_initial_init_mm()
      nios2: convert to setup_initial_init_mm()
      openrisc: convert to setup_initial_init_mm()
      powerpc: convert to setup_initial_init_mm()
      riscv: convert to setup_initial_init_mm()
      s390: convert to setup_initial_init_mm()
      sh: convert to setup_initial_init_mm()
      x86: convert to setup_initial_init_mm()

Subsystem: debug

    Stephen Boyd <swboyd@chromium.org>:
    Patch series "Add build ID to stacktraces", v6:
      buildid: only consider GNU notes for build ID parsing
      buildid: add API to parse build ID out of buffer
      buildid: stash away kernels build ID on init
      dump_stack: add vmlinux build ID to stack traces
      module: add printk formats to add module build ID to stacktraces
      arm64: stacktrace: use %pSb for backtrace printing
      x86/dumpstack: use %pSb/%pBb for backtrace printing
      scripts/decode_stacktrace.sh: support debuginfod
      scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm
      scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path
      buildid: mark some arguments const
      buildid: fix kernel-doc notation
      kdump: use vmlinux_build_id to simplify

Subsystem: mm/pagemap

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
      mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *
      mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *

Subsystem: mm/mremap

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
    Patch series "mrermap fixes", v2:
      selftest/mremap_test: update the test to handle pagesize other than 4K
      selftest/mremap_test: avoid crash with static build
      mm/mremap: convert huge PUD move to separate helper
      mm/mremap: don't enable optimized PUD move if page table levels is 2
      mm/mremap: use pmd/pud_populate to update page table entries
      mm/mremap: hold the rmap lock in write mode when moving page table entries.
    Patch series "Speedup mremap on ppc64", v8:
      mm/mremap: allow arch runtime override
      powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache
      powerpc/mm: enable HAVE_MOVE_PMD support

 Documentation/core-api/printk-formats.rst           |   11 
 arch/alpha/include/asm/pgtable.h                    |    8 
 arch/arc/mm/init.c                                  |    5 
 arch/arm/include/asm/pgtable-3level.h               |    2 
 arch/arm/kernel/setup.c                             |    5 
 arch/arm64/include/asm/Kbuild                       |    1 
 arch/arm64/include/asm/cacheflush.h                 |    6 
 arch/arm64/include/asm/kfence.h                     |    2 
 arch/arm64/include/asm/pgtable.h                    |    8 
 arch/arm64/include/asm/set_memory.h                 |   17 +
 arch/arm64/include/uapi/asm/unistd.h                |    1 
 arch/arm64/kernel/machine_kexec.c                   |    1 
 arch/arm64/kernel/setup.c                           |    5 
 arch/arm64/kernel/stacktrace.c                      |    2 
 arch/arm64/mm/mmu.c                                 |    7 
 arch/arm64/mm/pageattr.c                            |   13 
 arch/csky/kernel/setup.c                            |    5 
 arch/h8300/kernel/setup.c                           |    5 
 arch/hexagon/Kconfig                                |    1 
 arch/hexagon/kernel/vmlinux.lds.S                   |    9 
 arch/ia64/include/asm/pgtable.h                     |    4 
 arch/m68k/include/asm/motorola_pgtable.h            |    2 
 arch/m68k/kernel/setup_mm.c                         |    5 
 arch/m68k/kernel/setup_no.c                         |    5 
 arch/mips/include/asm/pgtable-64.h                  |    8 
 arch/nds32/kernel/setup.c                           |    5 
 arch/nios2/kernel/setup.c                           |    5 
 arch/openrisc/kernel/setup.c                        |    5 
 arch/parisc/include/asm/pgtable.h                   |    4 
 arch/powerpc/include/asm/book3s/64/pgtable.h        |   11 
 arch/powerpc/include/asm/book3s/64/tlbflush-radix.h |    2 
 arch/powerpc/include/asm/nohash/64/pgtable-4k.h     |    6 
 arch/powerpc/include/asm/nohash/64/pgtable.h        |    6 
 arch/powerpc/include/asm/tlb.h                      |    6 
 arch/powerpc/kernel/setup-common.c                  |    5 
 arch/powerpc/mm/book3s64/radix_hugetlbpage.c        |    8 
 arch/powerpc/mm/book3s64/radix_pgtable.c            |    6 
 arch/powerpc/mm/book3s64/radix_tlb.c                |   44 +-
 arch/powerpc/mm/pgtable_64.c                        |    4 
 arch/powerpc/platforms/Kconfig.cputype              |    2 
 arch/riscv/Kconfig                                  |    4 
 arch/riscv/include/asm/pgtable-64.h                 |    4 
 arch/riscv/include/asm/unistd.h                     |    1 
 arch/riscv/kernel/setup.c                           |    5 
 arch/s390/kernel/setup.c                            |    5 
 arch/sh/include/asm/pgtable-3level.h                |    4 
 arch/sh/kernel/setup.c                              |    5 
 arch/sparc/include/asm/pgtable_32.h                 |    6 
 arch/sparc/include/asm/pgtable_64.h                 |   10 
 arch/um/include/asm/pgtable-3level.h                |    2 
 arch/x86/entry/syscalls/syscall_32.tbl              |    1 
 arch/x86/entry/syscalls/syscall_64.tbl              |    1 
 arch/x86/include/asm/pgtable.h                      |    8 
 arch/x86/kernel/dumpstack.c                         |    2 
 arch/x86/kernel/setup.c                             |    5 
 arch/x86/mm/init_64.c                               |    4 
 arch/x86/mm/pat/set_memory.c                        |    4 
 arch/x86/mm/pgtable.c                               |    2 
 include/asm-generic/pgtable-nop4d.h                 |    2 
 include/asm-generic/pgtable-nopmd.h                 |    2 
 include/asm-generic/pgtable-nopud.h                 |    4 
 include/linux/bootconfig.h                          |    4 
 include/linux/buildid.h                             |   10 
 include/linux/compaction.h                          |    4 
 include/linux/cpumask.h                             |    2 
 include/linux/crash_core.h                          |   12 
 include/linux/debugobjects.h                        |    2 
 include/linux/hmm.h                                 |    2 
 include/linux/hugetlb.h                             |    6 
 include/linux/kallsyms.h                            |   21 +
 include/linux/list_lru.h                            |    4 
 include/linux/lru_cache.h                           |    8 
 include/linux/mm.h                                  |    3 
 include/linux/mmu_notifier.h                        |    8 
 include/linux/module.h                              |    9 
 include/linux/nodemask.h                            |    6 
 include/linux/percpu-defs.h                         |    2 
 include/linux/percpu-refcount.h                     |    2 
 include/linux/pgtable.h                             |    4 
 include/linux/scatterlist.h                         |    2 
 include/linux/secretmem.h                           |   54 +++
 include/linux/set_memory.h                          |   12 
 include/linux/shrinker.h                            |    2 
 include/linux/syscalls.h                            |    1 
 include/linux/vmalloc.h                             |    4 
 include/uapi/asm-generic/unistd.h                   |    7 
 include/uapi/linux/magic.h                          |    1 
 init/Kconfig                                        |    1 
 init/main.c                                         |    2 
 kernel/crash_core.c                                 |   50 ---
 kernel/kallsyms.c                                   |  104 +++++--
 kernel/module.c                                     |   42 ++
 kernel/power/hibernate.c                            |    5 
 kernel/sys_ni.c                                     |    2 
 lib/Kconfig.debug                                   |   17 -
 lib/asn1_encoder.c                                  |    2 
 lib/buildid.c                                       |   80 ++++-
 lib/devres.c                                        |    2 
 lib/dump_stack.c                                    |   13 
 lib/dynamic_debug.c                                 |    2 
 lib/fonts/font_pearl_8x8.c                          |    2 
 lib/kfifo.c                                         |    2 
 lib/list_sort.c                                     |    2 
 lib/nlattr.c                                        |    4 
 lib/oid_registry.c                                  |    2 
 lib/pldmfw/pldmfw.c                                 |    2 
 lib/reed_solomon/test_rslib.c                       |    2 
 lib/refcount.c                                      |    2 
 lib/rhashtable.c                                    |    2 
 lib/sbitmap.c                                       |    2 
 lib/scatterlist.c                                   |    4 
 lib/seq_buf.c                                       |    2 
 lib/sort.c                                          |    2 
 lib/stackdepot.c                                    |    2 
 lib/test_bitops.c                                   |    2 
 lib/test_bpf.c                                      |    2 
 lib/test_kasan.c                                    |    2 
 lib/test_kmod.c                                     |    6 
 lib/test_scanf.c                                    |    2 
 lib/vsprintf.c                                      |   10 
 mm/Kconfig                                          |    4 
 mm/Makefile                                         |    1 
 mm/gup.c                                            |   12 
 mm/init-mm.c                                        |    9 
 mm/internal.h                                       |    3 
 mm/mlock.c                                          |    3 
 mm/mmap.c                                           |    5 
 mm/mremap.c                                         |  108 ++++++-
 mm/secretmem.c                                      |  254 +++++++++++++++++
 mm/slub.c                                           |   79 +++--
 scripts/checksyscalls.sh                            |    4 
 scripts/decode_stacktrace.sh                        |   89 +++++-
 tools/testing/selftests/vm/.gitignore               |    1 
 tools/testing/selftests/vm/Makefile                 |    3 
 tools/testing/selftests/vm/memfd_secret.c           |  296 ++++++++++++++++++++
 tools/testing/selftests/vm/mremap_test.c            |  116 ++++---
 tools/testing/selftests/vm/run_vmtests.sh           |   17 +
 137 files changed, 1470 insertions(+), 442 deletions(-)



^ permalink raw reply	[flat|nested] 281+ messages in thread

* Re: incoming
  2021-07-03  0:28 ` incoming Linus Torvalds
@ 2021-07-03  1:06   ` Linus Torvalds
  0 siblings, 0 replies; 281+ messages in thread
From: Linus Torvalds @ 2021-07-03  1:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux-MM, mm-commits

On Fri, Jul 2, 2021 at 5:28 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Commit e058a84bfddc42ba356a2316f2cf1141974625c9 is good, and looking
> at the pulls and merges I've done since, this -mm series looks like
> the obvious culprit.

No, unless my bisection is wrong, the -mm branch is innocent, and was
discarded from the suspects on the very first bisection trial.

So never mind.

             Linus


^ permalink raw reply	[flat|nested] 281+ messages in thread

* Re: incoming
  2021-07-01  1:46 incoming Andrew Morton
@ 2021-07-03  0:28 ` Linus Torvalds
  2021-07-03  1:06   ` incoming Linus Torvalds
  0 siblings, 1 reply; 281+ messages in thread
From: Linus Torvalds @ 2021-07-03  0:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux-MM, mm-commits

On Wed, Jun 30, 2021 at 6:46 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> This is the rest of the -mm tree, less 66 patches which are dependent on
> things which are (or were recently) in linux-next.  I'll trickle that
> material over next week.

I haven't bisected this yet, but with the current -git I'm getting

   watchdog: BUG: soft lockup - CPU#41 stuck for 49s!

and the common call chain seems to be in flush_tlb_mm_range ->
on_each_cpu_cond_mask.

Commit e058a84bfddc42ba356a2316f2cf1141974625c9 is good, and looking
at the pulls and merges I've done since, this -mm series looks like
the obvious culprit.

I'll go start bisection, but I thought I'd give a heads-up in case
somebody else has seen TLB-flush-related lockups and already figured
out the guilty party..

                 Linus


^ permalink raw reply	[flat|nested] 281+ messages in thread

* incoming
@ 2021-07-01  1:46 Andrew Morton
  2021-07-03  0:28 ` incoming Linus Torvalds
  0 siblings, 1 reply; 281+ messages in thread
From: Andrew Morton @ 2021-07-01  1:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits


This is the rest of the -mm tree, less 66 patches which are dependent on
things which are (or were recently) in linux-next.  I'll trickle that
material over next week.


192 patches, based on 7cf3dead1ad70c72edb03e2d98e1f3dcd332cdb2 plus the
June 28 sendings.

Subsystems affected by this patch series:

  mm/hugetlb
  mm/userfaultfd
  mm/vmscan
  mm/kconfig
  mm/proc
  mm/z3fold
  mm/zbud
  mm/ras
  mm/mempolicy
  mm/memblock
  mm/migration
  mm/thp
  mm/nommu
  mm/kconfig
  mm/madvise
  mm/memory-hotplug
  mm/zswap
  mm/zsmalloc
  mm/zram
  mm/cleanups
  mm/kfence
  mm/hmm
  procfs
  sysctl
  misc
  core-kernel
  lib
  lz4
  checkpatch
  init
  kprobes
  nilfs2
  hfs
  signals
  exec
  kcov
  selftests
  compress/decompress
  ipc

Subsystem: mm/hugetlb

    Muchun Song <songmuchun@bytedance.com>:
    Patch series "Free some vmemmap pages of HugeTLB page", v23:
      mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c
      mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP
      mm: hugetlb: gather discrete indexes of tail page
      mm: hugetlb: free the vmemmap pages associated with each HugeTLB page
      mm: hugetlb: defer freeing of HugeTLB pages
      mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
      mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap
      mm: memory_hotplug: disable memmap_on_memory when hugetlb_free_vmemmap enabled
      mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate

    Shixin Liu <liushixin2@huawei.com>:
      mm/debug_vm_pgtable: move {pmd/pud}_huge_tests out of CONFIG_TRANSPARENT_HUGEPAGE
      mm/debug_vm_pgtable: remove redundant pfn_{pmd/pte}() and fix one comment mistake

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanup and fixup for huge_memory:, v3:
      mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK
      mm/huge_memory.c: use page->deferred_list
      mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()
      mm/huge_memory.c: remove unnecessary tlb_remove_page_size() for huge zero pmd
      mm/huge_memory.c: don't discard hugepage if other processes are mapping it

    Christophe Leroy <christophe.leroy@csgroup.eu>:
    Patch series "Subject: [PATCH v2 0/5] Implement huge VMAP and VMALLOC on powerpc 8xx", v2:
      mm/hugetlb: change parameters of arch_make_huge_pte()
      mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge
      mm/vmalloc: enable mapping of huge pages at pte level in vmap
      mm/vmalloc: enable mapping of huge pages at pte level in vmalloc
      powerpc/8xx: add support for huge pages on VMAP and VMALLOC

    Nanyong Sun <sunnanyong@huawei.com>:
      khugepaged: selftests: remove debug_cow

    Mina Almasry <almasrymina@google.com>:
      mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY

    Muchun Song <songmuchun@bytedance.com>:
    Patch series "Split huge PMD mapping of vmemmap pages", v4:
      mm: sparsemem: split the huge PMD mapping of vmemmap pages
      mm: sparsemem: use huge PMD mapping for vmemmap pages
      mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON

    Mike Kravetz <mike.kravetz@oracle.com>:
    Patch series "Fix prep_compound_gigantic_page ref count adjustment":
      hugetlb: remove prep_compound_huge_page cleanup
      hugetlb: address ref count racing in prep_compound_gigantic_page

    Naoya Horiguchi <naoya.horiguchi@nec.com>:
      mm/hwpoison: disable pcp for page_handle_poison()

Subsystem: mm/userfaultfd

    Peter Xu <peterx@redhat.com>:
    Patch series "userfaultfd/selftests: A few cleanups", v2:
      userfaultfd/selftests: use user mode only
      userfaultfd/selftests: remove the time() check on delayed uffd
      userfaultfd/selftests: dropping VERIFY check in locking_thread
      userfaultfd/selftests: only dump counts if mode enabled
      userfaultfd/selftests: unify error handling
    Patch series "mm/uffd: Misc fix for uffd-wp and one more test":
      mm/thp: simplify copying of huge zero page pmd when fork
      mm/userfaultfd: fix uffd-wp special cases for fork()
      mm/userfaultfd: fail uffd-wp registration if not supported
      mm/pagemap: export uffd-wp protection information
      userfaultfd/selftests: add pagemap uffd-wp test

    Axel Rasmussen <axelrasmussen@google.com>:
    Patch series "userfaultfd: add minor fault handling for shmem", v6:
      userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
      userfaultfd/shmem: support minor fault registration for shmem
      userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
      userfaultfd/shmem: advertise shmem minor fault support
      userfaultfd/shmem: modify shmem_mfill_atomic_pte to use install_pte()
      userfaultfd/selftests: use memfd_create for shmem test type
      userfaultfd/selftests: create alias mappings in the shmem test
      userfaultfd/selftests: reinitialize test context in each test
      userfaultfd/selftests: exercise minor fault handling shmem support

Subsystem: mm/vmscan

    Yu Zhao <yuzhao@google.com>:
      mm/vmscan.c: fix potential deadlock in reclaim_pages()
      include/trace/events/vmscan.h: remove mm_vmscan_inactive_list_is_low

    Miaohe Lin <linmiaohe@huawei.com>:
      mm: workingset: define macro WORKINGSET_SHIFT

Subsystem: mm/kconfig

    Kefeng Wang <wangkefeng.wang@huawei.com>:
      mm/kconfig: move HOLES_IN_ZONE into mm

Subsystem: mm/proc

    Mike Rapoport <rppt@linux.ibm.com>:
      docs: proc.rst: meminfo: briefly describe gaps in memory accounting

    David Hildenbrand <david@redhat.com>:
    Patch series "fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages", v3:
      fs/proc/kcore: drop KCORE_REMAP and KCORE_OTHER
      fs/proc/kcore: pfn_is_ram check only applies to KCORE_RAM
      fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages
      mm: introduce page_offline_(begin|end|freeze|thaw) to synchronize setting PageOffline()
      virtio-mem: use page_offline_(start|end) when setting PageOffline()
      fs/proc/kcore: use page_offline_(freeze|thaw)

Subsystem: mm/z3fold

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanup and fixup for z3fold":
      mm/z3fold: define macro NCHUNKS as TOTAL_CHUNKS - ZHDR_CHUNKS
      mm/z3fold: avoid possible underflow in z3fold_alloc()
      mm/z3fold: remove magic number in z3fold_create_pool()
      mm/z3fold: remove unused function handle_to_z3fold_header()
      mm/z3fold: fix potential memory leak in z3fold_destroy_pool()
      mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page

Subsystem: mm/zbud

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanups for zbud", v2:
      mm/zbud: reuse unbuddied[0] as buddied in zbud_pool
      mm/zbud: don't export any zbud API

Subsystem: mm/ras

    YueHaibing <yuehaibing@huawei.com>:
      mm/compaction: use DEVICE_ATTR_WO macro

    Liu Xiang <liu.xiang@zlingsmart.com>:
      mm: compaction: remove duplicate !list_empty(&sublist) check

    Wonhyuk Yang <vvghjk1234@gmail.com>:
      mm/compaction: fix 'limit' in fast_isolate_freepages

Subsystem: mm/mempolicy

    Feng Tang <feng.tang@intel.com>:
    Patch series "mm/mempolicy: some fix and semantics cleanup", v4:
      mm/mempolicy: cleanup nodemask intersection check for oom
      mm/mempolicy: don't handle MPOL_LOCAL like a fake MPOL_PREFERRED policy
      mm/mempolicy: unify the parameter sanity check for mbind and set_mempolicy

    Yang Shi <shy828301@gmail.com>:
      mm: mempolicy: don't have to split pmd for huge zero page

    Ben Widawsky <ben.widawsky@intel.com>:
      mm/mempolicy: use unified 'nodes' for bind/interleave/prefer policies

Subsystem: mm/memblock

    Mike Rapoport <rppt@linux.ibm.com>:
    Patch series "arm64: drop pfn_valid_within() and simplify pfn_valid()", v4:
      include/linux/mmzone.h: add documentation for pfn_valid()
      memblock: update initialization of reserved pages
      arm64: decouple check whether pfn is in linear map from pfn_valid()
      arm64: drop pfn_valid_within() and simplify pfn_valid()

    Anshuman Khandual <anshuman.khandual@arm.com>:
      arm64/mm: drop HAVE_ARCH_PFN_VALID

Subsystem: mm/migration

    Muchun Song <songmuchun@bytedance.com>:
      mm: migrate: fix missing update page_private to hugetlb_page_subpool

Subsystem: mm/thp

    Collin Fijalkovich <cfijalkovich@google.com>:
      mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs

    Yang Shi <shy828301@gmail.com>:
      mm: memory: add orig_pmd to struct vm_fault
      mm: memory: make numa_migrate_prep() non-static
      mm: thp: refactor NUMA fault handling
      mm: migrate: account THP NUMA migration counters correctly
      mm: migrate: don't split THP for misplaced NUMA page
      mm: migrate: check mapcount for THP instead of refcount
      mm: thp: skip make PMD PROT_NONE if THP migration is not supported

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm/thp: make ARCH_ENABLE_SPLIT_PMD_PTLOCK dependent on PGTABLE_LEVELS > 2

    Yang Shi <shy828301@gmail.com>:
      mm: rmap: make try_to_unmap() void function

    Hugh Dickins <hughd@google.com>:
      mm/thp: remap_page() is only needed on anonymous THP
      mm: hwpoison_user_mappings() try_to_unmap() with TTU_SYNC

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/thp: fix strncpy warning

Subsystem: mm/nommu

    Chen Li <chenli@uniontech.com>:
      nommu: remove __GFP_HIGHMEM in vmalloc/vzalloc

    Liam Howlett <liam.howlett@oracle.com>:
      mm/nommu: unexport do_munmap()

Subsystem: mm/kconfig

    Kefeng Wang <wangkefeng.wang@huawei.com>:
      mm: generalize ZONE_[DMA|DMA32]

Subsystem: mm/madvise

    David Hildenbrand <david@redhat.com>:
    Patch series "mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables", v2:
      mm: make variable names for populate_vma_page_range() consistent
      mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables
      MAINTAINERS: add tools/testing/selftests/vm/ to MEMORY MANAGEMENT
      selftests/vm: add protection_keys_32 / protection_keys_64 to gitignore
      selftests/vm: add test for MADV_POPULATE_(READ|WRITE)

Subsystem: mm/memory-hotplug

    Liam Mark <lmark@codeaurora.org>:
      mm/memory_hotplug: rate limit page migration warnings

    Oscar Salvador <osalvador@suse.de>:
      mm,memory_hotplug: drop unneeded locking

Subsystem: mm/zswap

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanup and fixup for zswap":
      mm/zswap.c: remove unused function zswap_debugfs_exit()
      mm/zswap.c: avoid unnecessary copy-in at map time
      mm/zswap.c: fix two bugs in zswap_writeback_entry()

Subsystem: mm/zsmalloc

    Zhaoyang Huang <zhaoyang.huang@unisoc.com>:
      mm: zram: amend SLAB_RECLAIM_ACCOUNT on zspage_cachep

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanup for zsmalloc":
      mm/zsmalloc.c: remove confusing code in obj_free()
      mm/zsmalloc.c: improve readability for async_free_zspage()

Subsystem: mm/zram

    Yue Hu <huyue2@yulong.com>:
      zram: move backing_dev under macro CONFIG_ZRAM_WRITEBACK

Subsystem: mm/cleanups

    Hyeonggon Yoo <42.hyeyoo@gmail.com>:
      mm: fix typos and grammar error in comments

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm: define default value for FIRST_USER_ADDRESS

    Zhen Lei <thunder.leizhen@huawei.com>:
      mm: fix spelling mistakes

    Mel Gorman <mgorman@techsingularity.net>:
    Patch series "Clean W=1 build warnings for mm/":
      mm/vmscan: remove kerneldoc-like comment from isolate_lru_pages
      mm/vmalloc: include header for prototype of set_iounmap_nonlazy
      mm/page_alloc: make should_fail_alloc_page() static
      mm/mapping_dirty_helpers: remove double Note in kerneldoc
      mm/memcontrol.c: fix kerneldoc comment for mem_cgroup_calculate_protection
      mm/memory_hotplug: fix kerneldoc comment for __try_online_node
      mm/memory_hotplug: fix kerneldoc comment for __remove_memory
      mm/zbud: add kerneldoc fields for zbud_pool
      mm/z3fold: add kerneldoc fields for z3fold_pool
      mm/swap: make swap_address_space an inline function
      mm/mmap_lock: remove dead code for !CONFIG_TRACING configurations
      mm/page_alloc: move prototype for find_suitable_fallback
      mm/swap: make NODE_DATA an inline function on CONFIG_FLATMEM

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm/thp: define default pmd_pgtable()

Subsystem: mm/kfence

    Marco Elver <elver@google.com>:
      kfence: unconditionally use unbound work queue

Subsystem: mm/hmm

    Alistair Popple <apopple@nvidia.com>:
    Patch series "Add support for SVM atomics in Nouveau", v11:
      mm: remove special swap entry functions
      mm/swapops: rework swap entry manipulation code
      mm/rmap: split try_to_munlock from try_to_unmap
      mm/rmap: split migration into its own function
      mm: rename migrate_pgmap_owner
      mm/memory.c: allow different return codes for copy_nonpresent_pte()
      mm: device exclusive memory access
      mm: selftests for exclusive device memory
      nouveau/svm: refactor nouveau_range_fault
      nouveau/svm: implement atomic SVM access

Subsystem: procfs

    Marcelo Henrique Cerri <marcelo.cerri@canonical.com>:
      proc: Avoid mixing integer types in mem_rw()

    ZHOUFENG <zhoufeng.zf@bytedance.com>:
      fs/proc/kcore.c: add mmap interface

    Kalesh Singh <kaleshsingh@google.com>:
      procfs: allow reading fdinfo with PTRACE_MODE_READ
      procfs/dmabuf: add inode number to /proc/*/fdinfo

Subsystem: sysctl

    Jiapeng Chong <jiapeng.chong@linux.alibaba.com>:
      sysctl: remove redundant assignment to first

Subsystem: misc

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      drm: include only needed headers in ascii85.h

Subsystem: core-kernel

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      kernel.h: split out panic and oops helpers

Subsystem: lib

    Zhen Lei <thunder.leizhen@huawei.com>:
      lib: decompress_bunzip2: remove an unneeded semicolon

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
    Patch series "lib/string_helpers: get rid of ugly *_escape_mem_ascii()", v3:
      lib/string_helpers: switch to use BIT() macro
      lib/string_helpers: move ESCAPE_NP check inside 'else' branch in a loop
      lib/string_helpers: drop indentation level in string_escape_mem()
      lib/string_helpers: introduce ESCAPE_NA for escaping non-ASCII
      lib/string_helpers: introduce ESCAPE_NAP to escape non-ASCII and non-printable
      lib/string_helpers: allow to append additional characters to be escaped
      lib/test-string_helpers: print flags in hexadecimal format
      lib/test-string_helpers: get rid of trailing comma in terminators
      lib/test-string_helpers: add test cases for new features
      MAINTAINERS: add myself as designated reviewer for generic string library
      seq_file: introduce seq_escape_mem()
      seq_file: add seq_escape_str() as replica of string_escape_str()
      seq_file: convert seq_escape() to use seq_escape_str()
      nfsd: avoid non-flexible API in seq_quote_mem()
      seq_file: drop unused *_escape_mem_ascii()

    Trent Piepho <tpiepho@gmail.com>:
      lib/math/rational.c: fix divide by zero
      lib/math/rational: add Kunit test cases

    Zhen Lei <thunder.leizhen@huawei.com>:
      lib/decompressors: fix spelling mistakes
      lib/mpi: fix spelling mistakes

    Alexey Dobriyan <adobriyan@gmail.com>:
      lib: memscan() fixlet
      lib: uninline simple_strtoull()

    Matteo Croce <mcroce@microsoft.com>:
      lib/test_string.c: allow module removal

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      kernel.h: split out kstrtox() and simple_strtox() to a separate header

Subsystem: lz4

    Rajat Asthana <thisisrast7@gmail.com>:
      lz4_decompress: declare LZ4_decompress_safe_withPrefix64k static

    Dimitri John Ledkov <dimitri.ledkov@canonical.com>:
      lib/decompress_unlz4.c: correctly handle zero-padding around initrds.

Subsystem: checkpatch

    Guenter Roeck <linux@roeck-us.net>:
      checkpatch: scripts/spdxcheck.py now requires python3

    Joe Perches <joe@perches.com>:
      checkpatch: improve the indented label test

    Guenter Roeck <linux@roeck-us.net>:
      checkpatch: do not complain about positive return values starting with EPOLL

Subsystem: init

    Andrew Halaney <ahalaney@redhat.com>:
      init: print out unknown kernel parameters

Subsystem: kprobes

    Barry Song <song.bao.hua@hisilicon.com>:
      kprobes: remove duplicated strong free_insn_page in x86 and s390

Subsystem: nilfs2

    Colin Ian King <colin.king@canonical.com>:
      nilfs2: remove redundant continue statement in a while-loop

Subsystem: hfs

    Zhen Lei <thunder.leizhen@huawei.com>:
      hfsplus: remove unnecessary oom message

    Chung-Chiang Cheng <shepjeng@gmail.com>:
      hfsplus: report create_date to kstat.btime

Subsystem: signals

    Al Viro <viro@zeniv.linux.org.uk>:
      x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned

Subsystem: exec

    Alexey Dobriyan <adobriyan@gmail.com>:
      exec: remove checks in __register_binfmt()

Subsystem: kcov

    Marco Elver <elver@google.com>:
      kcov: add __no_sanitize_coverage to fix noinstr for all architectures

Subsystem: selftests

    Dave Hansen <dave.hansen@linux.intel.com>:
    Patch series "selftests/vm/pkeys: Bug fixes and a new test":
      selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
      selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
      selftests/vm/pkeys: refill shadow register after implicit kernel write
      selftests/vm/pkeys: exercise x86 XSAVE init state

Subsystem: compress/decompress

    Yu Kuai <yukuai3@huawei.com>:
      lib/decompressors: remove set but not used variable 'level'

Subsystem: ipc

    Vasily Averin <vvs@virtuozzo.com>:
    Patch series "ipc: allocations cleanup", v2:
      ipc sem: use kvmalloc for sem_undo allocation
      ipc: use kmalloc for msg_queue and shmid_kernel

    Manfred Spraul <manfred@colorfullife.com>:
      ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
      ipc/util.c: use binary search for max_idx

 Documentation/admin-guide/kernel-parameters.txt    |   35 
 Documentation/admin-guide/mm/hugetlbpage.rst       |   11 
 Documentation/admin-guide/mm/memory-hotplug.rst    |   13 
 Documentation/admin-guide/mm/pagemap.rst           |    2 
 Documentation/admin-guide/mm/userfaultfd.rst       |    3 
 Documentation/core-api/kernel-api.rst              |    7 
 Documentation/filesystems/proc.rst                 |   48 
 Documentation/vm/hmm.rst                           |   19 
 Documentation/vm/unevictable-lru.rst               |   33 
 MAINTAINERS                                        |   10 
 arch/alpha/Kconfig                                 |    5 
 arch/alpha/include/asm/pgalloc.h                   |    1 
 arch/alpha/include/asm/pgtable.h                   |    1 
 arch/alpha/include/uapi/asm/mman.h                 |    3 
 arch/alpha/kernel/setup.c                          |    2 
 arch/arc/include/asm/pgalloc.h                     |    2 
 arch/arc/include/asm/pgtable.h                     |    8 
 arch/arm/Kconfig                                   |    3 
 arch/arm/include/asm/pgalloc.h                     |    1 
 arch/arm64/Kconfig                                 |   15 
 arch/arm64/include/asm/hugetlb.h                   |    3 
 arch/arm64/include/asm/memory.h                    |    2 
 arch/arm64/include/asm/page.h                      |    4 
 arch/arm64/include/asm/pgalloc.h                   |    1 
 arch/arm64/include/asm/pgtable.h                   |    2 
 arch/arm64/kernel/setup.c                          |    1 
 arch/arm64/kvm/mmu.c                               |    2 
 arch/arm64/mm/hugetlbpage.c                        |    5 
 arch/arm64/mm/init.c                               |   51 
 arch/arm64/mm/ioremap.c                            |    4 
 arch/arm64/mm/mmu.c                                |   22 
 arch/csky/include/asm/pgalloc.h                    |    2 
 arch/csky/include/asm/pgtable.h                    |    1 
 arch/hexagon/include/asm/pgtable.h                 |    4 
 arch/ia64/Kconfig                                  |    7 
 arch/ia64/include/asm/pal.h                        |    1 
 arch/ia64/include/asm/pgalloc.h                    |    1 
 arch/ia64/include/asm/pgtable.h                    |    1 
 arch/m68k/Kconfig                                  |    5 
 arch/m68k/include/asm/mcf_pgalloc.h                |    2 
 arch/m68k/include/asm/mcf_pgtable.h                |    2 
 arch/m68k/include/asm/motorola_pgalloc.h           |    1 
 arch/m68k/include/asm/motorola_pgtable.h           |    2 
 arch/m68k/include/asm/pgtable_mm.h                 |    1 
 arch/m68k/include/asm/sun3_pgalloc.h               |    1 
 arch/microblaze/Kconfig                            |    4 
 arch/microblaze/include/asm/pgalloc.h              |    2 
 arch/microblaze/include/asm/pgtable.h              |    2 
 arch/mips/Kconfig                                  |   10 
 arch/mips/include/asm/pgalloc.h                    |    1 
 arch/mips/include/asm/pgtable-32.h                 |    1 
 arch/mips/include/asm/pgtable-64.h                 |    1 
 arch/mips/include/uapi/asm/mman.h                  |    3 
 arch/mips/kernel/relocate.c                        |    1 
 arch/mips/sgi-ip22/ip22-reset.c                    |    1 
 arch/mips/sgi-ip32/ip32-reset.c                    |    1 
 arch/nds32/include/asm/pgalloc.h                   |    5 
 arch/nios2/include/asm/pgalloc.h                   |    1 
 arch/nios2/include/asm/pgtable.h                   |    2 
 arch/openrisc/include/asm/pgalloc.h                |    2 
 arch/openrisc/include/asm/pgtable.h                |    1 
 arch/parisc/include/asm/pgalloc.h                  |    1 
 arch/parisc/include/asm/pgtable.h                  |    2 
 arch/parisc/include/uapi/asm/mman.h                |    3 
 arch/parisc/kernel/pdc_chassis.c                   |    1 
 arch/powerpc/Kconfig                               |    6 
 arch/powerpc/include/asm/book3s/pgtable.h          |    1 
 arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h   |    5 
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h       |   43 
 arch/powerpc/include/asm/nohash/32/pgtable.h       |    1 
 arch/powerpc/include/asm/nohash/64/pgtable.h       |    2 
 arch/powerpc/include/asm/pgalloc.h                 |    5 
 arch/powerpc/include/asm/pgtable.h                 |    6 
 arch/powerpc/kernel/setup-common.c                 |    1 
 arch/powerpc/platforms/Kconfig.cputype             |    1 
 arch/riscv/Kconfig                                 |    5 
 arch/riscv/include/asm/pgalloc.h                   |    2 
 arch/riscv/include/asm/pgtable.h                   |    2 
 arch/s390/Kconfig                                  |    6 
 arch/s390/include/asm/pgalloc.h                    |    3 
 arch/s390/include/asm/pgtable.h                    |    5 
 arch/s390/kernel/ipl.c                             |    1 
 arch/s390/kernel/kprobes.c                         |    5 
 arch/s390/mm/pgtable.c                             |    2 
 arch/sh/include/asm/pgalloc.h                      |    1 
 arch/sh/include/asm/pgtable.h                      |    2 
 arch/sparc/Kconfig                                 |    5 
 arch/sparc/include/asm/pgalloc_32.h                |    1 
 arch/sparc/include/asm/pgalloc_64.h                |    1 
 arch/sparc/include/asm/pgtable_32.h                |    3 
 arch/sparc/include/asm/pgtable_64.h                |    8 
 arch/sparc/kernel/sstate.c                         |    1 
 arch/sparc/mm/hugetlbpage.c                        |    6 
 arch/sparc/mm/init_64.c                            |    1 
 arch/um/drivers/mconsole_kern.c                    |    1 
 arch/um/include/asm/pgalloc.h                      |    1 
 arch/um/include/asm/pgtable-2level.h               |    1 
 arch/um/include/asm/pgtable-3level.h               |    1 
 arch/um/kernel/um_arch.c                           |    1 
 arch/x86/Kconfig                                   |   17 
 arch/x86/include/asm/desc.h                        |    1 
 arch/x86/include/asm/pgalloc.h                     |    2 
 arch/x86/include/asm/pgtable_types.h               |    2 
 arch/x86/kernel/cpu/mshyperv.c                     |    1 
 arch/x86/kernel/kprobes/core.c                     |    6 
 arch/x86/kernel/setup.c                            |    1 
 arch/x86/mm/init_64.c                              |   21 
 arch/x86/mm/pgtable.c                              |   34 
 arch/x86/purgatory/purgatory.c                     |    2 
 arch/x86/xen/enlighten.c                           |    1 
 arch/xtensa/include/asm/pgalloc.h                  |    2 
 arch/xtensa/include/asm/pgtable.h                  |    1 
 arch/xtensa/include/uapi/asm/mman.h                |    3 
 arch/xtensa/platforms/iss/setup.c                  |    1 
 drivers/block/zram/zram_drv.h                      |    2 
 drivers/bus/brcmstb_gisb.c                         |    1 
 drivers/char/ipmi/ipmi_msghandler.c                |    1 
 drivers/clk/analogbits/wrpll-cln28hpc.c            |    4 
 drivers/edac/altera_edac.c                         |    1 
 drivers/firmware/google/gsmi.c                     |    1 
 drivers/gpu/drm/nouveau/include/nvif/if000c.h      |    1 
 drivers/gpu/drm/nouveau/nouveau_svm.c              |  162 ++-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h      |    1 
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c |    6 
 drivers/hv/vmbus_drv.c                             |    1 
 drivers/hwtracing/coresight/coresight-cpu-debug.c  |    1 
 drivers/leds/trigger/ledtrig-activity.c            |    1 
 drivers/leds/trigger/ledtrig-heartbeat.c           |    1 
 drivers/leds/trigger/ledtrig-panic.c               |    1 
 drivers/misc/bcm-vk/bcm_vk_dev.c                   |    1 
 drivers/misc/ibmasm/heartbeat.c                    |    1 
 drivers/misc/pvpanic/pvpanic.c                     |    1 
 drivers/net/ipa/ipa_smp2p.c                        |    1 
 drivers/parisc/power.c                             |    1 
 drivers/power/reset/ltc2952-poweroff.c             |    1 
 drivers/remoteproc/remoteproc_core.c               |    1 
 drivers/s390/char/con3215.c                        |    1 
 drivers/s390/char/con3270.c                        |    1 
 drivers/s390/char/sclp.c                           |    1 
 drivers/s390/char/sclp_con.c                       |    1 
 drivers/s390/char/sclp_vt220.c                     |    1 
 drivers/s390/char/zcore.c                          |    1 
 drivers/soc/bcm/brcmstb/pm/pm-arm.c                |    1 
 drivers/staging/olpc_dcon/olpc_dcon.c              |    1 
 drivers/video/fbdev/hyperv_fb.c                    |    1 
 drivers/virtio/virtio_mem.c                        |    2 
 fs/Kconfig                                         |   15 
 fs/exec.c                                          |    3 
 fs/hfsplus/inode.c                                 |    5 
 fs/hfsplus/xattr.c                                 |    1 
 fs/nfsd/nfs4state.c                                |    2 
 fs/nilfs2/btree.c                                  |    1 
 fs/open.c                                          |   13 
 fs/proc/base.c                                     |    6 
 fs/proc/fd.c                                       |   20 
 fs/proc/kcore.c                                    |  136 ++
 fs/proc/task_mmu.c                                 |   34 
 fs/seq_file.c                                      |   43 
 fs/userfaultfd.c                                   |   15 
 include/asm-generic/bug.h                          |    3 
 include/linux/ascii85.h                            |    3 
 include/linux/bootmem_info.h                       |   68 +
 include/linux/compat.h                             |    2 
 include/linux/compiler-clang.h                     |   17 
 include/linux/compiler-gcc.h                       |    6 
 include/linux/compiler_types.h                     |    2 
 include/linux/huge_mm.h                            |   74 -
 include/linux/hugetlb.h                            |   80 +
 include/linux/hugetlb_cgroup.h                     |   19 
 include/linux/kcore.h                              |    3 
 include/linux/kernel.h                             |  227 ----
 include/linux/kprobes.h                            |    1 
 include/linux/kstrtox.h                            |  155 ++
 include/linux/memblock.h                           |    4 
 include/linux/memory_hotplug.h                     |   27 
 include/linux/mempolicy.h                          |    9 
 include/linux/memremap.h                           |    2 
 include/linux/migrate.h                            |   27 
 include/linux/mm.h                                 |   18 
 include/linux/mm_types.h                           |    2 
 include/linux/mmu_notifier.h                       |   26 
 include/linux/mmzone.h                             |   27 
 include/linux/mpi.h                                |    4 
 include/linux/page-flags.h                         |   22 
 include/linux/panic.h                              |   98 +
 include/linux/panic_notifier.h                     |   12 
 include/linux/pgtable.h                            |   44 
 include/linux/rmap.h                               |   13 
 include/linux/seq_file.h                           |   10 
 include/linux/shmem_fs.h                           |   19 
 include/linux/signal.h                             |    2 
 include/linux/string.h                             |    7 
 include/linux/string_helpers.h                     |   31 
 include/linux/sunrpc/cache.h                       |    1 
 include/linux/swap.h                               |   19 
 include/linux/swapops.h                            |  171 +--
 include/linux/thread_info.h                        |    1 
 include/linux/userfaultfd_k.h                      |    5 
 include/linux/vmalloc.h                            |   15 
 include/linux/zbud.h                               |   23 
 include/trace/events/vmscan.h                      |   41 
 include/uapi/asm-generic/mman-common.h             |    3 
 include/uapi/linux/mempolicy.h                     |    1 
 include/uapi/linux/userfaultfd.h                   |    7 
 init/main.c                                        |   42 
 ipc/msg.c                                          |    6 
 ipc/sem.c                                          |   25 
 ipc/shm.c                                          |    6 
 ipc/util.c                                         |   44 
 ipc/util.h                                         |    3 
 kernel/hung_task.c                                 |    1 
 kernel/kexec_core.c                                |    1 
 kernel/kprobes.c                                   |    2 
 kernel/panic.c                                     |    1 
 kernel/rcu/tree.c                                  |    2 
 kernel/signal.c                                    |   14 
 kernel/sysctl.c                                    |    4 
 kernel/trace/trace.c                               |    1 
 lib/Kconfig.debug                                  |   12 
 lib/decompress_bunzip2.c                           |    6 
 lib/decompress_unlz4.c                             |    8 
 lib/decompress_unlzo.c                             |    3 
 lib/decompress_unxz.c                              |    2 
 lib/decompress_unzstd.c                            |    4 
 lib/kstrtox.c                                      |    5 
 lib/lz4/lz4_decompress.c                           |    2 
 lib/math/Makefile                                  |    1 
 lib/math/rational-test.c                           |   56 +
 lib/math/rational.c                                |   16 
 lib/mpi/longlong.h                                 |    4 
 lib/mpi/mpicoder.c                                 |    6 
 lib/mpi/mpiutil.c                                  |    2 
 lib/parser.c                                       |    1 
 lib/string.c                                       |    2 
 lib/string_helpers.c                               |  142 +-
 lib/test-string_helpers.c                          |  157 ++-
 lib/test_hmm.c                                     |  127 ++
 lib/test_hmm_uapi.h                                |    2 
 lib/test_string.c                                  |    5 
 lib/vsprintf.c                                     |    1 
 lib/xz/xz_dec_bcj.c                                |    2 
 lib/xz/xz_dec_lzma2.c                              |    8 
 lib/zlib_inflate/inffast.c                         |    2 
 lib/zstd/huf.h                                     |    2 
 mm/Kconfig                                         |   16 
 mm/Makefile                                        |    2 
 mm/bootmem_info.c                                  |  127 ++
 mm/compaction.c                                    |   20 
 mm/debug_vm_pgtable.c                              |  109 --
 mm/gup.c                                           |   58 +
 mm/hmm.c                                           |   12 
 mm/huge_memory.c                                   |  269 ++---
 mm/hugetlb.c                                       |  369 +++++--
 mm/hugetlb_vmemmap.c                               |  332 ++++++
 mm/hugetlb_vmemmap.h                               |   53 -
 mm/internal.h                                      |   29 
 mm/kfence/core.c                                   |    4 
 mm/khugepaged.c                                    |   20 
 mm/madvise.c                                       |   66 +
 mm/mapping_dirty_helpers.c                         |    2 
 mm/memblock.c                                      |   28 
 mm/memcontrol.c                                    |    4 
 mm/memory-failure.c                                |   38 
 mm/memory.c                                        |  239 +++-
 mm/memory_hotplug.c                                |  161 ---
 mm/mempolicy.c                                     |  323 ++----
 mm/migrate.c                                       |  268 +----
 mm/mlock.c                                         |   12 
 mm/mmap_lock.c                                     |   59 -
 mm/mprotect.c                                      |   18 
 mm/nommu.c                                         |    5 
 mm/oom_kill.c                                      |    2 
 mm/page_alloc.c                                    |    5 
 mm/page_vma_mapped.c                               |   15 
 mm/rmap.c                                          |  644 +++++++++---
 mm/shmem.c                                         |  125 --
 mm/sparse-vmemmap.c                                |  432 +++++++-
 mm/sparse.c                                        |    1 
 mm/swap.c                                          |    2 
 mm/swapfile.c                                      |    2 
 mm/userfaultfd.c                                   |  249 ++--
 mm/util.c                                          |   40 
 mm/vmalloc.c                                       |   37 
 mm/vmscan.c                                        |   20 
 mm/workingset.c                                    |   10 
 mm/z3fold.c                                        |   39 
 mm/zbud.c                                          |  235 ++--
 mm/zsmalloc.c                                      |    5 
 mm/zswap.c                                         |   26 
 scripts/checkpatch.pl                              |   16 
 tools/testing/selftests/vm/.gitignore              |    3 
 tools/testing/selftests/vm/Makefile                |    5 
 tools/testing/selftests/vm/hmm-tests.c             |  158 +++
 tools/testing/selftests/vm/khugepaged.c            |    4 
 tools/testing/selftests/vm/madv_populate.c         |  342 ++++++
 tools/testing/selftests/vm/pkey-x86.h              |    1 
 tools/testing/selftests/vm/protection_keys.c       |   85 +
 tools/testing/selftests/vm/run_vmtests.sh          |   16 
 tools/testing/selftests/vm/userfaultfd.c           | 1094 ++++++++++-----------
 299 files changed, 6277 insertions(+), 3183 deletions(-)



^ permalink raw reply	[flat|nested] 281+ messages in thread

* incoming
@ 2021-06-29  2:32 Andrew Morton
  0 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-06-29  2:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

192 patches, based on 7cf3dead1ad70c72edb03e2d98e1f3dcd332cdb2.

Subsystems affected by this patch series:

  mm/gup
  mm/pagealloc
  kthread
  ia64
  scripts
  ntfs
  squashfs
  ocfs2
  z
  kernel/watchdog
  mm/slab
  mm/slub
  mm/kmemleak
  mm/dax
  mm/debug
  mm/pagecache
  mm/gup
  mm/swap
  mm/memcg
  mm/pagemap
  mm/mprotect
  mm/bootmem
  mm/dma
  mm/tracing
  mm/vmalloc
  mm/kasan
  mm/initialization
  mm/pagealloc
  mm/memory-failure

Subsystem: mm/gup

    Jann Horn <jannh@google.com>:
      mm/gup: fix try_grab_compound_head() race with split_huge_page()

Subsystem: mm/pagealloc

    Mike Rapoport <rppt@linux.ibm.com>:
      mm/page_alloc: fix memory map initialization for descending nodes

    Mel Gorman <mgorman@techsingularity.net>:
      mm/page_alloc: correct return value of populated elements if bulk array is populated

Subsystem: kthread

    Jonathan Neuschäfer <j.neuschaefer@gmx.net>:
      kthread: switch to new kerneldoc syntax for named variable macro argument

    Petr Mladek <pmladek@suse.com>:
      kthread_worker: fix return value when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()

Subsystem: ia64

    Randy Dunlap <rdunlap@infradead.org>:
      ia64: headers: drop duplicated words

    Arnd Bergmann <arnd@arndb.de>:
      ia64: mca_drv: fix incorrect array size calculation

Subsystem: scripts

    "Steven Rostedt (VMware)" <rostedt@goodmis.org>:
    Patch series "streamline_config.pl: Fix Perl spacing":
      streamline_config.pl: make spacing consistent
      streamline_config.pl: add softtabstop=4 for vim users

    Colin Ian King <colin.king@canonical.com>:
      scripts/spelling.txt: add more spellings to spelling.txt

Subsystem: ntfs

    Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>:
      ntfs: fix validity check for file name attribute

Subsystem: squashfs

    Vincent Whitchurch <vincent.whitchurch@axis.com>:
      squashfs: add option to panic on errors

Subsystem: ocfs2

    Yang Yingliang <yangyingliang@huawei.com>:
      ocfs2: remove unnecessary INIT_LIST_HEAD()

Subsystem: z

    Dan Carpenter <dan.carpenter@oracle.com>:
      ocfs2: fix snprintf() checking

    Colin Ian King <colin.king@canonical.com>:
      ocfs2: remove redundant assignment to pointer queue

    Wan Jiabing <wanjiabing@vivo.com>:
      ocfs2: remove repeated uptodate check for buffer

    Chen Huang <chenhuang5@huawei.com>:
      ocfs2: replace simple_strtoull() with kstrtoull()
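
For orientation, a minimal sketch of the conversion pattern the ocfs2 patch
above moves to; the function and buffer names are hypothetical, not taken
from the ocfs2 code.  Unlike simple_strtoull(), kstrtoull() rejects trailing
garbage and reports overflow, so callers get a real error code instead of a
silently wrong value.

    #include <linux/kernel.h>   /* kstrtoull() */
    #include <linux/errno.h>

    /* Hypothetical store-style handler parsing a user-supplied number. */
    static int example_parse_value(const char *buf, unsigned long long *out)
    {
            unsigned long long val;
            int ret;

            ret = kstrtoull(buf, 0, &val);  /* base 0: decimal or 0x-prefixed hex */
            if (ret)
                    return ret;             /* -EINVAL on junk, -ERANGE on overflow */

            *out = val;
            return 0;
    }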

    Colin Ian King <colin.king@canonical.com>:
      ocfs2: remove redundant initialization of variable ret

Subsystem: kernel/watchdog

    Wang Qing <wangqing@vivo.com>:
      kernel: watchdog: modify the explanation related to watchdog thread
      doc: watchdog: modify the explanation related to watchdog thread
      doc: watchdog: modify the doc related to "watchdog/%u"

Subsystem: mm/slab

    gumingtao <gumingtao1225@gmail.com>:
      slab: use __func__ to trace function name

Subsystem: mm/slub

    Vlastimil Babka <vbabka@suse.cz>:
      kunit: make test->lock irq safe

    Oliver Glitta <glittao@gmail.com>:
      mm/slub, kunit: add a KUnit test for SLUB debugging functionality
      slub: remove resiliency_test() function

    Hyeonggon Yoo <42.hyeyoo@gmail.com>:
      mm, slub: change run-time assertion in kmalloc_index() to compile-time
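
A rough, illustrative sketch of the idea behind the kmalloc_index() change
above: when the requested size is a compile-time constant, an unsupported
value can be rejected by the compiler rather than by a run-time BUG().  The
function name and size limit below are made up, not the actual slab code.

    #include <linux/bug.h>
    #include <linux/build_bug.h>
    #include <linux/compiler.h>
    #include <linux/types.h>

    /* Illustrative pattern only. */
    static __always_inline void example_check_size(size_t size)
    {
            if (__builtin_constant_p(size)) {
                    /* If the constant is out of range, the build fails here. */
                    BUILD_BUG_ON_MSG(size > (32UL << 20),
                                     "unsupported constant size");
            } else {
                    BUG_ON(size > (32UL << 20));
            }
    }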

    Stephen Boyd <swboyd@chromium.org>:
      slub: restore slub_debug=- behavior
      slub: actually use 'message' in restore_bytes()

    Joe Perches <joe@perches.com>:
      slub: indicate slab_fix() uses printf formats

    Stephen Boyd <swboyd@chromium.org>:
      slub: force on no_hash_pointers when slub_debug is enabled

    Faiyaz Mohammed <faiyazm@codeaurora.org>:
      mm: slub: move sysfs slab alloc/free interfaces to debugfs

    Georgi Djakov <quic_c_gdjako@quicinc.com>:
      mm/slub: add taint after the errors are printed

Subsystem: mm/kmemleak

    Yanfei Xu <yanfei.xu@windriver.com>:
      mm/kmemleak: fix possible wrong memory scanning period

Subsystem: mm/dax

    Jan Kara <jack@suse.cz>:
      dax: fix ENOMEM handling in grab_mapping_entry()

Subsystem: mm/debug

    Tang Bin <tangbin@cmss.chinamobile.com>:
      tools/vm/page_owner_sort.c: check malloc() return

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm/debug_vm_pgtable: ensure THP availability via has_transparent_hugepage()

    Nicolas Saenz Julienne <nsaenzju@redhat.com>:
      mm: mmap_lock: use local locks instead of disabling preemption

    Gavin Shan <gshan@redhat.com>:
    Patch series "mm/page_reporting: Make page reporting work on arm64 with 64KB page size", v4:
      mm/page_reporting: fix code style in __page_reporting_request()
      mm/page_reporting: export reporting order as module parameter
      mm/page_reporting: allow driver to specify reporting order
      virtio_balloon: specify page reporting order if needed

Subsystem: mm/pagecache

    Kefeng Wang <wangkefeng.wang@huawei.com>:
      mm: page-writeback: kill get_writeback_state() comments

    Chi Wu <wuchi.zero@gmail.com>:
      mm/page-writeback: Fix performance when BDI's share of ratio is 0.
      mm/page-writeback: update the comment of Dirty position control
      mm/page-writeback: use __this_cpu_inc() in account_page_dirtied()

    Roman Gushchin <guro@fb.com>:
    Patch series "cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups", v9:
      writeback, cgroup: do not switch inodes with I_WILL_FREE flag
      writeback, cgroup: add smp_mb() to cgroup_writeback_umount()
      writeback, cgroup: increment isw_nr_in_flight before grabbing an inode
      writeback, cgroup: switch to rcu_work API in inode_switch_wbs()
      writeback, cgroup: keep list of inodes attached to bdi_writeback
      writeback, cgroup: split out the functional part of inode_switch_wbs_work_fn()
      writeback, cgroup: support switching multiple inodes at once
      writeback, cgroup: release dying cgwbs by switching attached inodes

    Christoph Hellwig <hch@lst.de>:
    Patch series "remove the implicit .set_page_dirty default":
      fs: unexport __set_page_dirty
      fs: move ramfs_aops to libfs
      mm: require ->set_page_dirty to be explicitly wired up

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "Further set_page_dirty cleanups":
      mm/writeback: move __set_page_dirty() to core mm
      mm/writeback: use __set_page_dirty in __set_page_dirty_nobuffers
      iomap: use __set_page_dirty_nobuffers
      fs: remove anon_set_page_dirty()
      fs: remove noop_set_page_dirty()
      mm: move page dirtying prototypes from mm.h

Subsystem: mm/gup

    Peter Xu <peterx@redhat.com>:
    Patch series "mm/gup: Fix pin page write cache bouncing on has_pinned", v2:
      mm/gup_benchmark: support threading

    Andrea Arcangeli <aarcange@redhat.com>:
      mm: gup: allow FOLL_PIN to scale in SMP
      mm: gup: pack has_pinned in MMF_HAS_PINNED

    Christophe Leroy <christophe.leroy@csgroup.eu>:
      mm: pagewalk: fix walk for hugepage tables

Subsystem: mm/swap

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "close various race windows for swap", v6:
      mm/swapfile: use percpu_ref to serialize against concurrent swapoff
      swap: fix do_swap_page() race with swapoff
      mm/swap: remove confusing checking for non_swap_entry() in swap_ra_info()
      mm/shmem: fix shmem_swapin() race with swapoff
    Patch series "Cleanups for swap", v2:
      mm/swapfile: move get_swap_page_of_type() under CONFIG_HIBERNATION
      mm/swap: remove unused local variable nr_shadows
      mm/swap_slots.c: delete meaningless forward declarations
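
The swapoff races above are closed with the generic percpu_ref pattern; a
minimal sketch of that pattern follows.  The structure and function names are
illustrative, not the actual swap code: readers take a cheap per-CPU
reference and back off once teardown has begun, while teardown kills the ref
and waits for outstanding users to drain.

    #include <linux/percpu-refcount.h>
    #include <linux/completion.h>
    #include <linux/gfp.h>
    #include <linux/kernel.h>

    struct example_obj {
            struct percpu_ref users;
            struct completion drained;
    };

    static void example_release(struct percpu_ref *ref)
    {
            struct example_obj *obj = container_of(ref, struct example_obj, users);

            complete(&obj->drained);
    }

    static int example_init(struct example_obj *obj)
    {
            init_completion(&obj->drained);
            return percpu_ref_init(&obj->users, example_release, 0, GFP_KERNEL);
    }

    /* Reader side: cheap per-CPU get, fails once teardown has started. */
    static bool example_use(struct example_obj *obj)
    {
            if (!percpu_ref_tryget_live(&obj->users))
                    return false;           /* object is going away */
            /* ... safely use obj here ... */
            percpu_ref_put(&obj->users);
            return true;
    }

    /* Teardown side: stop new users, then wait for existing ones to finish. */
    static void example_teardown(struct example_obj *obj)
    {
            percpu_ref_kill(&obj->users);
            wait_for_completion(&obj->drained);
            percpu_ref_exit(&obj->users);
    }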

    Huang Ying <ying.huang@intel.com>:
      mm, swap: remove unnecessary smp_rmb() in swap_type_to_swap_info()
      mm: free idle swap cache page after COW
      swap: check mapping_empty() for swap cache before being freed

Subsystem: mm/memcg

    Waiman Long <longman@redhat.com>:
    Patch series "mm/memcg: Reduce kmemcache memory accounting overhead", v6:
      mm/memcg: move mod_objcg_state() to memcontrol.c
      mm/memcg: cache vmstat data in percpu memcg_stock_pcp
      mm/memcg: improve refill_obj_stock() performance
      mm/memcg: optimize user context object stock access
    Patch series "mm: memcg/slab: Fix objcg pointer array handling problem", v4:
      mm: memcg/slab: properly set up gfp flags for objcg pointer array
      mm: memcg/slab: create a new set of kmalloc-cg-<n> caches
      mm: memcg/slab: disable cache merging for KMALLOC_NORMAL caches

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcontrol: fix root_mem_cgroup charging
    Patch series "memcontrol code cleanup and simplification", v3:
      mm: memcontrol: fix page charging in page replacement
      mm: memcontrol: bail out early when !mm in get_mem_cgroup_from_mm
      mm: memcontrol: remove the pgdata parameter of mem_cgroup_page_lruvec
      mm: memcontrol: simplify lruvec_holds_page_lru_lock
      mm: memcontrol: rename lruvec_holds_page_lru_lock to page_matches_lruvec
      mm: memcontrol: simplify the logic of objcg pinning memcg
      mm: memcontrol: move obj_cgroup_uncharge_pages() out of css_set_lock
      mm: vmscan: remove noinline_for_stack

    wenhuizhang <wenhui@gwmail.gwu.edu>:
      memcontrol: use flexible-array member

    Dan Schatzberg <schatzberg.dan@gmail.com>:
    Patch series "Charge loop device i/o to issuing cgroup", v14:
      loop: use worker per cgroup instead of kworker
      mm: charge active memcg when no mm is set
      loop: charge i/o to mem and blk cg

    Huilong Deng <denghuilong@cdjrlc.com>:
      mm: memcontrol: remove trailing semicolon in macros

Subsystem: mm/pagemap

    David Hildenbrand <david@redhat.com>:
    Patch series "perf/binfmt/mm: remove in-tree usage of MAP_EXECUTABLE":
      perf: MAP_EXECUTABLE does not indicate VM_MAYEXEC
      binfmt: remove in-tree usage of MAP_EXECUTABLE
      mm: ignore MAP_EXECUTABLE in ksys_mmap_pgoff()

    Gonzalo Matias Juarez Tello <gmjuareztello@gmail.com>:
      mm/mmap.c: logic of find_vma_intersection repeated in __do_munmap

    Liam Howlett <liam.howlett@oracle.com>:
      mm/mmap: introduce unlock_range() for code cleanup
      mm/mmap: use find_vma_intersection() in do_mmap() for overlap

    Liu Xiang <liu.xiang@zlingsmart.com>:
      mm/memory.c: fix comment of finish_mkwrite_fault()

    Liam Howlett <liam.howlett@oracle.com>:
    Patch series "mm: Add vma_lookup()", v2:
      mm: add vma_lookup(), update find_vma_intersection() comments
      drm/i915/selftests: use vma_lookup() in __igt_mmap()
      arch/arc/kernel/troubleshoot: use vma_lookup() instead of find_vma()
      arch/arm64/kvm: use vma_lookup() instead of find_vma_intersection()
      arch/powerpc/kvm/book3s_hv_uvmem: use vma_lookup() instead of find_vma_intersection()
      arch/powerpc/kvm/book3s: use vma_lookup() in kvmppc_hv_setup_htab_rma()
      arch/mips/kernel/traps: use vma_lookup() instead of find_vma()
      arch/m68k/kernel/sys_m68k: use vma_lookup() in sys_cacheflush()
      x86/sgx: use vma_lookup() in sgx_encl_find()
      virt/kvm: use vma_lookup() instead of find_vma_intersection()
      vfio: use vma_lookup() instead of find_vma_intersection()
      net/ipv4/tcp: use vma_lookup() in tcp_zerocopy_receive()
      drm/amdgpu: use vma_lookup() in amdgpu_ttm_tt_get_user_pages()
      media: videobuf2: use vma_lookup() in get_vaddr_frames()
      misc/sgi-gru/grufault: use vma_lookup() in gru_find_vma()
      kernel/events/uprobes: use vma_lookup() in find_active_uprobe()
      lib/test_hmm: use vma_lookup() in dmirror_migrate()
      mm/ksm: use vma_lookup() in find_mergeable_vma()
      mm/migrate: use vma_lookup() in do_pages_stat_array()
      mm/mremap: use vma_lookup() in vma_to_resize()
      mm/memory.c: use vma_lookup() in __access_remote_vm()
      mm/mempolicy: use vma_lookup() in __access_remote_vm()
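
Since the series above introduces vma_lookup(), a brief sketch of how it
differs from the open-coded find_vma() checks it replaces; the helper below
is hypothetical.  find_vma() returns the first VMA ending above the address,
which may also start above it, so callers had to re-check vm_start;
vma_lookup() returns a VMA only if it actually contains the address.

    #include <linux/mm.h>

    /* Hypothetical helper: is this address backed by a mapping at all? */
    static bool example_addr_is_mapped(struct mm_struct *mm, unsigned long addr)
    {
            struct vm_area_struct *vma;
            bool mapped;

            mmap_read_lock(mm);
            /*
             * Old pattern:
             *     vma = find_vma(mm, addr);
             *     if (vma && vma->vm_start <= addr) ...
             * The new helper does the containment check itself.
             */
            vma = vma_lookup(mm, addr);
            mapped = vma != NULL;
            mmap_read_unlock(mm);

            return mapped;
    }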

    Chen Li <chenli@uniontech.com>:
      mm: update legacy flush_tlb_* to use vma

Subsystem: mm/mprotect

    Peter Collingbourne <pcc@google.com>:
      mm: improve mprotect(R|W) efficiency on pages referenced once

Subsystem: mm/bootmem

    Souptick Joarder <jrdr.linux@gmail.com>:
      h8300: remove unused variable

Subsystem: mm/dma

    YueHaibing <yuehaibing@huawei.com>:
      mm/dmapool: use DEVICE_ATTR_RO macro

Subsystem: mm/tracing

    Vincent Whitchurch <vincent.whitchurch@axis.com>:
      mm, tracing: unify PFN format strings

Subsystem: mm/vmalloc

    "Uladzislau Rezki (Sony)" <urezki@gmail.com>:
    Patch series "vmalloc() vs bulk allocator", v2:
      mm/page_alloc: add an alloc_pages_bulk_array_node() helper
      mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()
      mm/vmalloc: print a warning message first on failure
      mm/vmalloc: remove quoted strings split across lines
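
For context on the vmalloc change above, a hedged sketch of using the new
alloc_pages_bulk_array_node() helper: fill a zero-initialised page array for
one node in a single call, then fall back to the single-page allocator for
whatever the bulk path could not provide.  The surrounding function is made
up, and the helper's exact prototype should be treated as an assumption.

    #include <linux/gfp.h>
    #include <linux/errno.h>

    /* Illustrative: populate pages[0..nr-1] for node 'nid'; pages[] starts zeroed. */
    static int example_fill_pages(int nid, struct page **pages, unsigned int nr)
    {
            unsigned int got;

            /* Bulk path: may populate fewer entries than requested. */
            got = alloc_pages_bulk_array_node(GFP_KERNEL, nid, nr, pages);

            /* Fall back to single-page allocations for the remainder. */
            while (got < nr) {
                    struct page *page = alloc_pages_node(nid, GFP_KERNEL, 0);

                    if (!page)
                            return -ENOMEM; /* caller frees pages[0..got-1] */
                    pages[got++] = page;
            }
            return 0;
    }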

    Uladzislau Rezki <urezki@gmail.com>:
      mm/vmalloc: fallback to a single page allocator

    Rafael Aquini <aquini@redhat.com>:
      mm: vmalloc: add cond_resched() in __vunmap()

Subsystem: mm/kasan

    Alexander Potapenko <glider@google.com>:
      printk: introduce dump_stack_lvl()
      kasan: use dump_stack_lvl(KERN_ERR) to print stacks
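
dump_stack_lvl(), added above, takes a printk level so that backtraces in
error reports (KASAN being the first user) come out at KERN_ERR instead of
the default console level.  A small usage sketch with a made-up error path:

    #include <linux/printk.h>

    /* Hypothetical error path in some subsystem. */
    static void example_report_corruption(const void *addr)
    {
            pr_err("example: corruption detected at %px\n", addr);
            /* Emit the backtrace at the same severity as the message above. */
            dump_stack_lvl(KERN_ERR);
    }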

    David Gow <davidgow@google.com>:
      kasan: test: improve failure message in KUNIT_EXPECT_KASAN_FAIL()

    Daniel Axtens <dja@axtens.net>:
    Patch series "KASAN core changes for ppc64 radix KASAN", v16:
      kasan: allow an architecture to disable inline instrumentation
      kasan: allow architectures to provide an outline readiness check
      mm: define default MAX_PTRS_PER_* in include/pgtable.h
      kasan: use MAX_PTRS_PER_* for early shadow tables

    Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>:
    Patch series "kasan: add memory corruption identification support for hw tag-based kasan", v4:
      kasan: rename CONFIG_KASAN_SW_TAGS_IDENTIFY to CONFIG_KASAN_TAGS_IDENTIFY
      kasan: integrate the common part of two KASAN tag-based modes
      kasan: add memory corruption identification support for hardware tag-based mode

Subsystem: mm/initialization

    Jungseung Lee <js07.lee@samsung.com>:
      mm: report which part of mem is being freed on initmem case

Subsystem: mm/pagealloc

    Mike Rapoport <rppt@linux.ibm.com>:
      mm/mmzone.h: simplify is_highmem_idx()

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "Constify struct page arguments":
      mm: make __dump_page static

    Aaron Tomlin <atomlin@redhat.com>:
      mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/debug: factor PagePoisoned out of __dump_page
      mm/page_owner: constify dump_page_owner
      mm: make compound_head const-preserving
      mm: constify get_pfnblock_flags_mask and get_pfnblock_migratetype
      mm: constify page_count and page_ref_count
      mm: optimise nth_page for contiguous memmap

    Heiner Kallweit <hkallweit1@gmail.com>:
      mm/page_alloc: switch to pr_debug

    Andrii Nakryiko <andrii@kernel.org>:
      kbuild: skip per-CPU BTF generation for pahole v1.18-v1.21

    Mel Gorman <mgorman@techsingularity.net>:
      mm/page_alloc: split per cpu page lists and zone stats
      mm/page_alloc: convert per-cpu list protection to local_lock
      mm/vmstat: convert NUMA statistics to basic NUMA counters
      mm/vmstat: inline NUMA event counter updates
      mm/page_alloc: batch the accounting updates in the bulk allocator
      mm/page_alloc: reduce duration that IRQs are disabled for VM counters
      mm/page_alloc: explicitly acquire the zone lock in __free_pages_ok
      mm/page_alloc: avoid conflating IRQs disabled with zone->lock
      mm/page_alloc: update PGFREE outside the zone lock in __free_pages_ok

    Minchan Kim <minchan@kernel.org>:
      mm: page_alloc: dump migrate-failed pages only at -EBUSY

    Mel Gorman <mgorman@techsingularity.net>:
    Patch series "Calculate pcp->high based on zone sizes and active CPUs", v2:
      mm/page_alloc: delete vm.percpu_pagelist_fraction
      mm/page_alloc: disassociate the pcp->high from pcp->batch
      mm/page_alloc: adjust pcp->high after CPU hotplug events
      mm/page_alloc: scale the number of pages that are batch freed
      mm/page_alloc: limit the number of pages on PCP lists when reclaim is active
      mm/page_alloc: introduce vm.percpu_pagelist_high_fraction

    Dong Aisheng <aisheng.dong@nxp.com>:
      mm: drop SECTION_SHIFT in code comments
      mm/page_alloc: improve memmap_pages dbg msg

    Liu Shixin <liushixin2@huawei.com>:
      mm/page_alloc: fix counting of managed_pages

    Mel Gorman <mgorman@techsingularity.net>:
    Patch series "Allow high order pages to be stored on PCP", v2:
      mm/page_alloc: move free_the_page

    Mike Rapoport <rppt@linux.ibm.com>:
    Patch series "Remove DISCONTIGMEM memory model", v3:
      alpha: remove DISCONTIGMEM and NUMA
      arc: update comment about HIGHMEM implementation
      arc: remove support for DISCONTIGMEM
      m68k: remove support for DISCONTIGMEM
      mm: remove CONFIG_DISCONTIGMEM
      arch, mm: remove stale mentions of DISCONTIGMEM
      docs: remove description of DISCONTIGMEM
      mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
      mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM

    Mel Gorman <mgorman@techsingularity.net>:
      mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
      mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes

Subsystem: mm/memory-failure

    Naoya Horiguchi <naoya.horiguchi@nec.com>:
      mm,hwpoison: send SIGBUS with error virtual address
      mm,hwpoison: make get_hwpoison_page() call get_any_page()

 Documentation/admin-guide/kernel-parameters.txt    |    6 
 Documentation/admin-guide/lockup-watchdogs.rst     |    4 
 Documentation/admin-guide/sysctl/kernel.rst        |   10 
 Documentation/admin-guide/sysctl/vm.rst            |   52 -
 Documentation/dev-tools/kasan.rst                  |    9 
 Documentation/vm/memory-model.rst                  |   45 
 arch/alpha/Kconfig                                 |   22 
 arch/alpha/include/asm/machvec.h                   |    6 
 arch/alpha/include/asm/mmzone.h                    |  100 --
 arch/alpha/include/asm/pgtable.h                   |    4 
 arch/alpha/include/asm/topology.h                  |   39 
 arch/alpha/kernel/core_marvel.c                    |   53 -
 arch/alpha/kernel/core_wildfire.c                  |   29 
 arch/alpha/kernel/pci_iommu.c                      |   29 
 arch/alpha/kernel/proto.h                          |    8 
 arch/alpha/kernel/setup.c                          |   16 
 arch/alpha/kernel/sys_marvel.c                     |    5 
 arch/alpha/kernel/sys_wildfire.c                   |    5 
 arch/alpha/mm/Makefile                             |    2 
 arch/alpha/mm/init.c                               |    3 
 arch/alpha/mm/numa.c                               |  223 ----
 arch/arc/Kconfig                                   |   13 
 arch/arc/include/asm/mmzone.h                      |   40 
 arch/arc/kernel/troubleshoot.c                     |    8 
 arch/arc/mm/init.c                                 |   21 
 arch/arm/include/asm/tlbflush.h                    |   13 
 arch/arm/mm/tlb-v6.S                               |    2 
 arch/arm/mm/tlb-v7.S                               |    2 
 arch/arm64/Kconfig                                 |    2 
 arch/arm64/kvm/mmu.c                               |    2 
 arch/h8300/kernel/setup.c                          |    2 
 arch/ia64/Kconfig                                  |    2 
 arch/ia64/include/asm/pal.h                        |    2 
 arch/ia64/include/asm/spinlock.h                   |    2 
 arch/ia64/include/asm/uv/uv_hub.h                  |    2 
 arch/ia64/kernel/efi_stub.S                        |    2 
 arch/ia64/kernel/mca_drv.c                         |    2 
 arch/ia64/kernel/topology.c                        |    5 
 arch/ia64/mm/numa.c                                |    5 
 arch/m68k/Kconfig.cpu                              |   10 
 arch/m68k/include/asm/mmzone.h                     |   10 
 arch/m68k/include/asm/page.h                       |    2 
 arch/m68k/include/asm/page_mm.h                    |   35 
 arch/m68k/include/asm/tlbflush.h                   |    2 
 arch/m68k/kernel/sys_m68k.c                        |    4 
 arch/m68k/mm/init.c                                |   20 
 arch/mips/Kconfig                                  |    2 
 arch/mips/include/asm/mmzone.h                     |    8 
 arch/mips/include/asm/page.h                       |    2 
 arch/mips/kernel/traps.c                           |    4 
 arch/mips/mm/init.c                                |    7 
 arch/nds32/include/asm/memory.h                    |    6 
 arch/openrisc/include/asm/tlbflush.h               |    2 
 arch/powerpc/Kconfig                               |    2 
 arch/powerpc/include/asm/mmzone.h                  |    4 
 arch/powerpc/kernel/setup_64.c                     |    2 
 arch/powerpc/kernel/smp.c                          |    2 
 arch/powerpc/kexec/core.c                          |    4 
 arch/powerpc/kvm/book3s_hv.c                       |    4 
 arch/powerpc/kvm/book3s_hv_uvmem.c                 |    2 
 arch/powerpc/mm/Makefile                           |    2 
 arch/powerpc/mm/mem.c                              |    4 
 arch/riscv/Kconfig                                 |    2 
 arch/s390/Kconfig                                  |    2 
 arch/s390/include/asm/pgtable.h                    |    2 
 arch/sh/include/asm/mmzone.h                       |    4 
 arch/sh/kernel/topology.c                          |    2 
 arch/sh/mm/Kconfig                                 |    2 
 arch/sh/mm/init.c                                  |    2 
 arch/sparc/Kconfig                                 |    2 
 arch/sparc/include/asm/mmzone.h                    |    4 
 arch/sparc/kernel/smp_64.c                         |    2 
 arch/sparc/mm/init_64.c                            |   12 
 arch/x86/Kconfig                                   |    2 
 arch/x86/ia32/ia32_aout.c                          |    4 
 arch/x86/kernel/cpu/mce/core.c                     |   13 
 arch/x86/kernel/cpu/sgx/encl.h                     |    4 
 arch/x86/kernel/setup_percpu.c                     |    6 
 arch/x86/mm/init_32.c                              |    4 
 arch/xtensa/include/asm/page.h                     |    4 
 arch/xtensa/include/asm/tlbflush.h                 |    4 
 drivers/base/node.c                                |   18 
 drivers/block/loop.c                               |  270 ++++-
 drivers/block/loop.h                               |   15 
 drivers/dax/device.c                               |    2 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c            |    4 
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c |    2 
 drivers/media/common/videobuf2/frame_vector.c      |    2 
 drivers/misc/sgi-gru/grufault.c                    |    4 
 drivers/vfio/vfio_iommu_type1.c                    |    2 
 drivers/virtio/virtio_balloon.c                    |   17 
 fs/adfs/inode.c                                    |    1 
 fs/affs/file.c                                     |    2 
 fs/bfs/file.c                                      |    1 
 fs/binfmt_aout.c                                   |    4 
 fs/binfmt_elf.c                                    |    2 
 fs/binfmt_elf_fdpic.c                              |   11 
 fs/binfmt_flat.c                                   |    2 
 fs/block_dev.c                                     |    1 
 fs/buffer.c                                        |   25 
 fs/configfs/inode.c                                |    8 
 fs/dax.c                                           |    3 
 fs/ecryptfs/mmap.c                                 |   13 
 fs/exfat/inode.c                                   |    1 
 fs/ext2/inode.c                                    |    4 
 fs/ext4/inode.c                                    |    2 
 fs/fat/inode.c                                     |    1 
 fs/fs-writeback.c                                  |  366 +++++---
 fs/fuse/dax.c                                      |    3 
 fs/gfs2/aops.c                                     |    2 
 fs/gfs2/meta_io.c                                  |    2 
 fs/hfs/inode.c                                     |    2 
 fs/hfsplus/inode.c                                 |    2 
 fs/hpfs/file.c                                     |    1 
 fs/iomap/buffered-io.c                             |   27 
 fs/jfs/inode.c                                     |    1 
 fs/kernfs/inode.c                                  |    8 
 fs/libfs.c                                         |   44 
 fs/minix/inode.c                                   |    1 
 fs/nilfs2/mdt.c                                    |    1 
 fs/ntfs/inode.c                                    |    2 
 fs/ocfs2/aops.c                                    |    4 
 fs/ocfs2/cluster/heartbeat.c                       |    7 
 fs/ocfs2/cluster/nodemanager.c                     |    2 
 fs/ocfs2/dlm/dlmmaster.c                           |    2 
 fs/ocfs2/filecheck.c                               |    6 
 fs/ocfs2/stackglue.c                               |    8 
 fs/omfs/file.c                                     |    1 
 fs/proc/task_mmu.c                                 |    2 
 fs/ramfs/inode.c                                   |    9 
 fs/squashfs/block.c                                |    5 
 fs/squashfs/squashfs_fs_sb.h                       |    1 
 fs/squashfs/super.c                                |   86 +
 fs/sysv/itree.c                                    |    1 
 fs/udf/file.c                                      |    1 
 fs/udf/inode.c                                     |    1 
 fs/ufs/inode.c                                     |    1 
 fs/xfs/xfs_aops.c                                  |    4 
 fs/zonefs/super.c                                  |    4 
 include/asm-generic/memory_model.h                 |   37 
 include/asm-generic/pgtable-nop4d.h                |    1 
 include/asm-generic/topology.h                     |    2 
 include/kunit/test.h                               |    5 
 include/linux/backing-dev-defs.h                   |   20 
 include/linux/cpuhotplug.h                         |    2 
 include/linux/fs.h                                 |    6 
 include/linux/gfp.h                                |   13 
 include/linux/iomap.h                              |    1 
 include/linux/kasan.h                              |    7 
 include/linux/kernel.h                             |    2 
 include/linux/kthread.h                            |    2 
 include/linux/memblock.h                           |    6 
 include/linux/memcontrol.h                         |   60 -
 include/linux/mm.h                                 |   53 -
 include/linux/mm_types.h                           |   10 
 include/linux/mman.h                               |    2 
 include/linux/mmdebug.h                            |    3 
 include/linux/mmzone.h                             |   96 +-
 include/linux/page-flags.h                         |   10 
 include/linux/page_owner.h                         |    6 
 include/linux/page_ref.h                           |    4 
 include/linux/page_reporting.h                     |    3 
 include/linux/pageblock-flags.h                    |    2 
 include/linux/pagemap.h                            |    4 
 include/linux/pgtable.h                            |   22 
 include/linux/printk.h                             |    5 
 include/linux/sched/coredump.h                     |    8 
 include/linux/slab.h                               |   59 +
 include/linux/swap.h                               |   19 
 include/linux/swapops.h                            |    5 
 include/linux/vmstat.h                             |   69 -
 include/linux/writeback.h                          |    1 
 include/trace/events/cma.h                         |    4 
 include/trace/events/filemap.h                     |    2 
 include/trace/events/kmem.h                        |   12 
 include/trace/events/page_pool.h                   |    4 
 include/trace/events/pagemap.h                     |    4 
 include/trace/events/vmscan.h                      |    2 
 kernel/cgroup/cgroup.c                             |    1 
 kernel/crash_core.c                                |    4 
 kernel/events/core.c                               |    2 
 kernel/events/uprobes.c                            |    4 
 kernel/fork.c                                      |    1 
 kernel/kthread.c                                   |   19 
 kernel/sysctl.c                                    |   16 
 kernel/watchdog.c                                  |   12 
 lib/Kconfig.debug                                  |   15 
 lib/Kconfig.kasan                                  |   16 
 lib/Makefile                                       |    1 
 lib/dump_stack.c                                   |   20 
 lib/kunit/test.c                                   |   18 
 lib/slub_kunit.c                                   |  152 +++
 lib/test_hmm.c                                     |    5 
 lib/test_kasan.c                                   |   11 
 lib/vsprintf.c                                     |    2 
 mm/Kconfig                                         |   38 
 mm/backing-dev.c                                   |   66 +
 mm/compaction.c                                    |    2 
 mm/debug.c                                         |   27 
 mm/debug_vm_pgtable.c                              |   63 +
 mm/dmapool.c                                       |    5 
 mm/filemap.c                                       |    2 
 mm/gup.c                                           |   81 +
 mm/hugetlb.c                                       |    2 
 mm/internal.h                                      |    9 
 mm/kasan/Makefile                                  |    4 
 mm/kasan/common.c                                  |    6 
 mm/kasan/generic.c                                 |    3 
 mm/kasan/hw_tags.c                                 |   22 
 mm/kasan/init.c                                    |    6 
 mm/kasan/kasan.h                                   |   12 
 mm/kasan/report.c                                  |    6 
 mm/kasan/report_hw_tags.c                          |    5 
 mm/kasan/report_sw_tags.c                          |   45 
 mm/kasan/report_tags.c                             |   51 +
 mm/kasan/shadow.c                                  |    6 
 mm/kasan/sw_tags.c                                 |   45 
 mm/kasan/tags.c                                    |   59 +
 mm/kfence/kfence_test.c                            |    5 
 mm/kmemleak.c                                      |   18 
 mm/ksm.c                                           |    6 
 mm/memblock.c                                      |    8 
 mm/memcontrol.c                                    |  385 ++++++--
 mm/memory-failure.c                                |  344 +++++--
 mm/memory.c                                        |   22 
 mm/memory_hotplug.c                                |    6 
 mm/mempolicy.c                                     |    4 
 mm/migrate.c                                       |    4 
 mm/mmap.c                                          |   54 -
 mm/mmap_lock.c                                     |   33 
 mm/mprotect.c                                      |   52 +
 mm/mremap.c                                        |    5 
 mm/nommu.c                                         |    2 
 mm/page-writeback.c                                |   89 +
 mm/page_alloc.c                                    |  950 +++++++++++++--------
 mm/page_ext.c                                      |    2 
 mm/page_owner.c                                    |    2 
 mm/page_reporting.c                                |   19 
 mm/page_reporting.h                                |    5 
 mm/pagewalk.c                                      |   58 +
 mm/shmem.c                                         |   18 
 mm/slab.h                                          |   24 
 mm/slab_common.c                                   |   60 -
 mm/slub.c                                          |  420 +++++----
 mm/sparse.c                                        |    2 
 mm/swap.c                                          |    4 
 mm/swap_slots.c                                    |    2 
 mm/swap_state.c                                    |   20 
 mm/swapfile.c                                      |  177 +--
 mm/vmalloc.c                                       |  181 ++--
 mm/vmscan.c                                        |   43 
 mm/vmstat.c                                        |  282 ++----
 mm/workingset.c                                    |    2 
 net/ipv4/tcp.c                                     |    4 
 scripts/kconfig/streamline_config.pl               |   76 -
 scripts/link-vmlinux.sh                            |    4 
 scripts/spelling.txt                               |   16 
 tools/testing/selftests/vm/gup_test.c              |   96 +-
 tools/vm/page_owner_sort.c                         |    4 
 virt/kvm/kvm_main.c                                |    2 
 260 files changed, 3989 insertions(+), 2996 deletions(-)



^ permalink raw reply	[flat|nested] 281+ messages in thread

* incoming
@ 2021-06-25  1:38 Andrew Morton
  0 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-06-25  1:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

24 patches, based on 4a09d388f2ab382f217a764e6a152b3f614246f6.

Subsystems affected by this patch series:

  mm/thp
  nilfs2
  mm/vmalloc
  kthread
  mm/hugetlb
  mm/memory-failure
  mm/pagealloc
  MAINTAINERS
  mailmap

Subsystem: mm/thp

    Hugh Dickins <hughd@google.com>:
    Patch series "mm: page_vma_mapped_walk() cleanup and THP fixes":
      mm: page_vma_mapped_walk(): use page for pvmw->page
      mm: page_vma_mapped_walk(): settle PageHuge on entry
      mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
      mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
      mm: page_vma_mapped_walk(): crossing page table boundary
      mm: page_vma_mapped_walk(): add a level of indentation
      mm: page_vma_mapped_walk(): use goto instead of while (1)
      mm: page_vma_mapped_walk(): get vma_address_end() earlier
      mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
      mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()

Subsystem: nilfs2

    Pavel Skripkin <paskripkin@gmail.com>:
      nilfs2: fix memory leak in nilfs_sysfs_delete_device_group

Subsystem: mm/vmalloc

    Claudio Imbrenda <imbrenda@linux.ibm.com>:
    Patch series "mm: add vmalloc_no_huge and use it", v4:
      mm/vmalloc: add vmalloc_no_huge
      KVM: s390: prepare for hugepage vmalloc
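
The new vmalloc_no_huge() above exists for callers that need the area mapped
with small pages (here so KVM on s390 can change protections on individual
pages of the allocation).  A minimal, hypothetical usage sketch, assuming the
helper takes just a size like vmalloc():

    #include <linux/vmalloc.h>

    /* Hypothetical: buffer whose individual pages must stay base-page mapped. */
    static void *example_alloc_page_granular(unsigned long size)
    {
            /*
             * Plain vmalloc() may now use huge mappings; vmalloc_no_huge()
             * guarantees PAGE_SIZE mappings only.  Free with vfree() as usual.
             */
            return vmalloc_no_huge(size);
    }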

    Daniel Axtens <dja@axtens.net>:
      mm/vmalloc: unbreak kasan vmalloc support

Subsystem: kthread

    Petr Mladek <pmladek@suse.com>:
    Patch series "kthread_worker: Fix race between kthread_mod_delayed_work():
      kthread_worker: split code for canceling the delayed work timer
      kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()

Subsystem: mm/hugetlb

    Hugh Dickins <hughd@google.com>:
      mm, futex: fix shared futex pgoff on shmem huge page

Subsystem: mm/memory-failure

    Tony Luck <tony.luck@intel.com>:
    Patch series "mm,hwpoison: fix sending SIGBUS for Action Required MCE", v5:
      mm/memory-failure: use a mutex to avoid memory_failure() races

    Aili Yao <yaoaili@kingsoft.com>:
      mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned

    Naoya Horiguchi <naoya.horiguchi@nec.com>:
      mm/hwpoison: do not lock page again when me_huge_page() successfully recovers

Subsystem: mm/pagealloc

    Rasmus Villemoes <linux@rasmusvillemoes.dk>:
      mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing array

    Mel Gorman <mgorman@techsingularity.net>:
      mm/page_alloc: do bulk array bounds check after checking populated elements

Subsystem: MAINTAINERS

    Marek Behún <kabel@kernel.org>:
      MAINTAINERS: fix Marek's identity again

Subsystem: mailmap

    Marek Behún <kabel@kernel.org>:
      mailmap: add Marek's other e-mail address and identity without diacritics

 .mailmap                |    2 
 MAINTAINERS             |    4 
 arch/s390/kvm/pv.c      |    7 +
 fs/nilfs2/sysfs.c       |    1 
 include/linux/hugetlb.h |   16 ---
 include/linux/pagemap.h |   13 +-
 include/linux/vmalloc.h |    1 
 kernel/futex.c          |    3 
 kernel/kthread.c        |   81 ++++++++++------
 mm/hugetlb.c            |    5 -
 mm/memory-failure.c     |   83 +++++++++++------
 mm/page_alloc.c         |    6 +
 mm/page_vma_mapped.c    |  233 +++++++++++++++++++++++++++---------------------
 mm/vmalloc.c            |   41 ++++++--
 14 files changed, 297 insertions(+), 199 deletions(-)



^ permalink raw reply	[flat|nested] 281+ messages in thread

* incoming
@ 2021-06-16  1:22 Andrew Morton
  0 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-06-16  1:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits


18 patches, based on 94f0b2d4a1d0c52035aef425da5e022bd2cb1c71.

Subsystems affected by this patch series:

  mm/memory-failure
  mm/swap
  mm/slub
  mm/hugetlb
  mm/memory-failure
  coredump
  mm/slub
  mm/thp
  mm/sparsemem

Subsystem: mm/memory-failure

    Naoya Horiguchi <naoya.horiguchi@nec.com>:
      mm,hwpoison: fix race with hugetlb page allocation

Subsystem: mm/swap

    Peter Xu <peterx@redhat.com>:
      mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare

Subsystem: mm/slub

    Kees Cook <keescook@chromium.org>:
    Patch series "Actually fix freelist pointer vs redzoning", v4:
      mm/slub: clarify verification reporting
      mm/slub: fix redzoning for small allocations
      mm/slub: actually fix freelist pointer vs redzoning

Subsystem: mm/hugetlb

    Mike Kravetz <mike.kravetz@oracle.com>:
      mm/hugetlb: expand restore_reserve_on_error functionality

Subsystem: mm/memory-failure

    yangerkun <yangerkun@huawei.com>:
      mm/memory-failure: make sure wait for page writeback in memory_failure

Subsystem: coredump

    Pingfan Liu <kernelfans@gmail.com>:
      crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo

Subsystem: mm/slub

    Andrew Morton <akpm@linux-foundation.org>:
      mm/slub.c: include swab.h

Subsystem: mm/thp

    Xu Yu <xuyu@linux.alibaba.com>:
      mm, thp: use head page in __migration_entry_wait()

    Hugh Dickins <hughd@google.com>:
    Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10:
      mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
      mm/thp: make is_huge_zero_pmd() safe and quicker
      mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
      mm/thp: fix vma_address() if virtual address below file offset

    Jue Wang <juew@google.com>:
      mm/thp: fix page_address_in_vma() on file THP tails

    Hugh Dickins <hughd@google.com>:
      mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()

    Yang Shi <shy828301@gmail.com>:
      mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split

Subsystem: mm/sparsemem

    Miles Chen <miles.chen@mediatek.com>:
      mm/sparse: fix check_usemap_section_nr warnings

 Documentation/vm/slub.rst |   10 +--
 fs/hugetlbfs/inode.c      |    1 
 include/linux/huge_mm.h   |    8 ++
 include/linux/hugetlb.h   |    8 ++
 include/linux/mm.h        |    3 +
 include/linux/rmap.h      |    1 
 include/linux/swapops.h   |   15 +++--
 kernel/crash_core.c       |    1 
 mm/huge_memory.c          |   58 ++++++++++---------
 mm/hugetlb.c              |  137 +++++++++++++++++++++++++++++++++++++---------
 mm/internal.h             |   51 ++++++++++++-----
 mm/memory-failure.c       |   36 +++++++++++-
 mm/memory.c               |   41 +++++++++++++
 mm/migrate.c              |    1 
 mm/page_vma_mapped.c      |   27 +++++----
 mm/pgtable-generic.c      |    5 -
 mm/rmap.c                 |   41 +++++++++----
 mm/slab_common.c          |    3 -
 mm/slub.c                 |   37 +++++-------
 mm/sparse.c               |   13 +++-
 mm/swapfile.c             |    2 
 mm/truncate.c             |   43 ++++++--------
 22 files changed, 388 insertions(+), 154 deletions(-)



^ permalink raw reply	[flat|nested] 281+ messages in thread

* incoming
@ 2021-06-05  3:00 Andrew Morton
  0 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-06-05  3:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

13 patches, based on 16f0596fc1d78a1f3ae4628cff962bb297dc908c.

Subsystems affected by this patch series:

  mips
  mm/kfence
  init
  mm/debug
  mm/pagealloc
  mm/memory-hotplug
  mm/hugetlb
  proc
  mm/kasan
  mm/hugetlb
  lib
  ocfs2
  mailmap

Subsystem: mips

    Thomas Bogendoerfer <tsbogend@alpha.franken.de>:
      Revert "MIPS: make userspace mapping young by default"

Subsystem: mm/kfence

    Marco Elver <elver@google.com>:
      kfence: use TASK_IDLE when awaiting allocation

Subsystem: init

    Mark Rutland <mark.rutland@arm.com>:
      pid: take a reference when initializing `cad_pid`

Subsystem: mm/debug

    Gerald Schaefer <gerald.schaefer@linux.ibm.com>:
      mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()

Subsystem: mm/pagealloc

    Ding Hui <dinghui@sangfor.com.cn>:
      mm/page_alloc: fix counting of free pages after take off from buddy

Subsystem: mm/memory-hotplug

    David Hildenbrand <david@redhat.com>:
      drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64

Subsystem: mm/hugetlb

    Naoya Horiguchi <naoya.horiguchi@nec.com>:
      hugetlb: pass head page to remove_hugetlb_page()

Subsystem: proc

    David Matlack <dmatlack@google.com>:
      proc: add .gitignore for proc-subset-pid selftest

Subsystem: mm/kasan

    Yu Kuai <yukuai3@huawei.com>:
      mm/kasan/init.c: fix doc warning

Subsystem: mm/hugetlb

    Mina Almasry <almasrymina@google.com>:
      mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY

Subsystem: lib

    YueHaibing <yuehaibing@huawei.com>:
      lib: crc64: fix kernel-doc warning

Subsystem: ocfs2

    Junxiao Bi <junxiao.bi@oracle.com>:
      ocfs2: fix data corruption by fallocate

Subsystem: mailmap

    Michel Lespinasse <michel@lespinasse.org>:
      mailmap: use private address for Michel Lespinasse

 .mailmap                                |    3 +
 arch/mips/mm/cache.c                    |   30 ++++++++---------
 drivers/base/memory.c                   |    6 +--
 fs/ocfs2/file.c                         |   55 +++++++++++++++++++++++++++++---
 include/linux/pgtable.h                 |    8 ++++
 init/main.c                             |    2 -
 lib/crc64.c                             |    2 -
 mm/debug_vm_pgtable.c                   |    4 +-
 mm/hugetlb.c                            |   16 +++++++--
 mm/kasan/init.c                         |    4 +-
 mm/kfence/core.c                        |    6 +--
 mm/memory.c                             |    4 ++
 mm/page_alloc.c                         |    2 +
 tools/testing/selftests/proc/.gitignore |    1 
 14 files changed, 107 insertions(+), 36 deletions(-)



^ permalink raw reply	[flat|nested] 281+ messages in thread

* incoming
@ 2021-05-23  0:41 Andrew Morton
  0 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-05-23  0:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

10 patches, based on 4ff2473bdb4cf2bb7d208ccf4418d3d7e6b1652c.

Subsystems affected by this patch series:

  mm/pagealloc
  mm/gup
  ipc
  selftests
  mm/kasan
  kernel/watchdog
  bitmap
  procfs
  lib
  mm/userfaultfd

Subsystem: mm/pagealloc

    Arnd Bergmann <arnd@arndb.de>:
      mm/shuffle: fix section mismatch warning

Subsystem: mm/gup

    Michal Hocko <mhocko@suse.com>:
      Revert "mm/gup: check page posion status for coredump."

Subsystem: ipc

    Varad Gautam <varad.gautam@suse.com>:
      ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry

Subsystem: selftests

    Yang Yingliang <yangyingliang@huawei.com>:
      tools/testing/selftests/exec: fix link error

Subsystem: mm/kasan

    Alexander Potapenko <glider@google.com>:
      kasan: slab: always reset the tag in get_freepointer_safe()

Subsystem: kernel/watchdog

    Petr Mladek <pmladek@suse.com>:
      watchdog: reliable handling of timestamps

Subsystem: bitmap

    Rikard Falkeborn <rikard.falkeborn@gmail.com>:
      linux/bits.h: fix compilation error with GENMASK
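
For reference, GENMASK(h, l), whose definition the fix above touches, expands
to a mask with bits h down to l set.  A small illustrative use with made-up
register field names:

    #include <linux/bits.h>
    #include <linux/types.h>

    #define EXAMPLE_FIELD_MASK   GENMASK(7, 4)   /* bits 7..4 -> 0xf0 */

    static inline u32 example_get_field(u32 reg)
    {
            return (reg & EXAMPLE_FIELD_MASK) >> 4;
    }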

Subsystem: procfs

    Alexey Dobriyan <adobriyan@gmail.com>:
      proc: remove Alexey from MAINTAINERS

Subsystem: lib

    Zhen Lei <thunder.leizhen@huawei.com>:
      lib: kunit: suppress a compilation warning of frame size

Subsystem: mm/userfaultfd

    Mike Kravetz <mike.kravetz@oracle.com>:
      userfaultfd: hugetlbfs: fix new flag usage in error path

 MAINTAINERS                           |    1 -
 fs/hugetlbfs/inode.c                  |    2 +-
 include/linux/bits.h                  |    2 +-
 include/linux/const.h                 |    8 ++++++++
 include/linux/minmax.h                |   10 ++--------
 ipc/mqueue.c                          |    6 ++++--
 ipc/msg.c                             |    6 ++++--
 ipc/sem.c                             |    6 ++++--
 kernel/watchdog.c                     |   34 ++++++++++++++++++++--------------
 lib/Makefile                          |    1 +
 mm/gup.c                              |    4 ----
 mm/internal.h                         |   20 --------------------
 mm/shuffle.h                          |    4 ++--
 mm/slub.c                             |    1 +
 mm/userfaultfd.c                      |   28 ++++++++++++++--------------
 tools/include/linux/bits.h            |    2 +-
 tools/include/linux/const.h           |    8 ++++++++
 tools/testing/selftests/exec/Makefile |    6 +++---
 18 files changed, 74 insertions(+), 75 deletions(-)



^ permalink raw reply	[flat|nested] 281+ messages in thread

* incoming
@ 2021-05-15  0:26 Andrew Morton
  0 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-05-15  0:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

13 patches, based on bd3c9cdb21a2674dd0db70199df884828e37abd4.

Subsystems affected by this patch series:

  mm/hugetlb
  mm/slub
  resource
  squashfs
  mm/userfaultfd
  mm/ksm
  mm/pagealloc
  mm/kasan
  mm/pagemap
  hfsplus
  modprobe
  mm/ioremap

Subsystem: mm/hugetlb

    Peter Xu <peterx@redhat.com>:
    Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2:
      mm/hugetlb: fix F_SEAL_FUTURE_WRITE
      mm/hugetlb: fix cow where page writtable in child

Subsystem: mm/slub

    Vlastimil Babka <vbabka@suse.cz>:
      mm, slub: move slub_debug static key enabling outside slab_mutex

Subsystem: resource

    Alistair Popple <apopple@nvidia.com>:
      kernel/resource: fix return code check in __request_free_mem_region

Subsystem: squashfs

    Phillip Lougher <phillip@squashfs.org.uk>:
      squashfs: fix divide error in calculate_skip()

Subsystem: mm/userfaultfd

    Axel Rasmussen <axelrasmussen@google.com>:
      userfaultfd: release page in error path to avoid BUG_ON

Subsystem: mm/ksm

    Hugh Dickins <hughd@google.com>:
      ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()"

Subsystem: mm/pagealloc

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm: fix struct page layout on 32-bit systems

Subsystem: mm/kasan

    Peter Collingbourne <pcc@google.com>:
      kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled

Subsystem: mm/pagemap

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/filemap: fix readahead return types

Subsystem: hfsplus

    Jouni Roivas <jouni.roivas@tuxera.com>:
      hfsplus: prevent corruption in shrinking truncate

Subsystem: modprobe

    Rasmus Villemoes <linux@rasmusvillemoes.dk>:
      docs: admin-guide: update description for kernel.modprobe sysctl

Subsystem: mm/ioremap

    Christophe Leroy <christophe.leroy@csgroup.eu>:
      mm/ioremap: fix iomap_max_page_shift

 Documentation/admin-guide/sysctl/kernel.rst |    9 ++++---
 fs/hfsplus/extents.c                        |    7 +++--
 fs/hugetlbfs/inode.c                        |    5 ++++
 fs/iomap/buffered-io.c                      |    4 +--
 fs/squashfs/file.c                          |    6 ++--
 include/linux/mm.h                          |   32 ++++++++++++++++++++++++++
 include/linux/mm_types.h                    |    4 +--
 include/linux/pagemap.h                     |    6 ++--
 include/net/page_pool.h                     |   12 +++++++++
 kernel/resource.c                           |    2 -
 lib/test_kasan.c                            |   29 ++++++++++++++++++-----
 mm/hugetlb.c                                |    1 
 mm/ioremap.c                                |    6 ++--
 mm/ksm.c                                    |    3 +-
 mm/shmem.c                                  |   34 ++++++++++++----------------
 mm/slab_common.c                            |   10 ++++++++
 mm/slub.c                                   |    9 -------
 net/core/page_pool.c                        |   12 +++++----
 18 files changed, 129 insertions(+), 62 deletions(-)



^ permalink raw reply	[flat|nested] 281+ messages in thread

* Re: incoming
  2021-05-07  1:01 incoming Andrew Morton
@ 2021-05-07  7:12 ` Linus Torvalds
  0 siblings, 0 replies; 281+ messages in thread
From: Linus Torvalds @ 2021-05-07  7:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mm-commits, Linux-MM

On Thu, May 6, 2021 at 6:01 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> I've been wobbly about the secretmem patches due to doubts about
> whether the feature is sufficiently useful to justify inclusion, but
> developers are now weighing in with helpful information and I've asked Mike
> for an extensively updated [0/n] changelog.  This will take a few days
> to play out so it is possible that I will prevail upon you for a post-rc1
> merge.

Oh, much too late for this release by now.

> If that's a problem, there's always 5.13-rc1.

5.13-rc1 is two days from now, so it would be for 5.14-rc1.  How time -
and version numbers - fly.

             Linus


^ permalink raw reply	[flat|nested] 281+ messages in thread

* incoming
@ 2021-05-07  1:01 Andrew Morton
  2021-05-07  7:12 ` incoming Linus Torvalds
  0 siblings, 1 reply; 281+ messages in thread
From: Andrew Morton @ 2021-05-07  1:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm


This is everything else from -mm for this merge window, with the
possible exception of Mike Rapoport's "secretmem" syscall patch series
(https://lkml.kernel.org/r/20210303162209.8609-1-rppt@kernel.org).

I've been wobbly about the secretmem patches due to doubts about
whether the feature is sufficiently useful to justify inclusion, but
developers are now weighing in with helpful information and I've asked Mike
for an extensively updated [0/n] changelog.  This will take a few days
to play out so it is possible that I will prevail upon you for a post-rc1
merge.  If that's a problem, there's always 5.13-rc1.

91 patches, based on 8ca5297e7e38f2dc8c753d33a5092e7be181fff0, plus
previously sent patches.

Thanks.



Subsystems affected by this patch series:

  alpha
  procfs
  sysctl
  misc
  core-kernel
  bitmap
  lib
  compat
  checkpatch
  epoll
  isofs
  nilfs2
  hpfs
  exit
  fork
  kexec
  gcov
  panic
  delayacct
  gdb
  resource
  selftests
  async
  initramfs
  ipc
  mm/cleanups
  drivers/char
  mm/slub
  spelling

Subsystem: alpha

    Randy Dunlap <rdunlap@infradead.org>:
      alpha: eliminate old-style function definitions
      alpha: csum_partial_copy.c: add function prototypes from <net/checksum.h>

Subsystem: procfs

    Colin Ian King <colin.king@canonical.com>:
      fs/proc/generic.c: fix incorrect pde_is_permanent check

    Alexey Dobriyan <adobriyan@gmail.com>:
      proc: save LOC in __xlate_proc_name()
      proc: mandate ->proc_lseek in "struct proc_ops"
      proc: delete redundant subset=pid check
      selftests: proc: test subset=pid

Subsystem: sysctl

    zhouchuangao <zhouchuangao@vivo.com>:
      proc/sysctl: fix function name error in comments

Subsystem: misc

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      include: remove pagemap.h from blkdev.h

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      kernel.h: drop inclusion in bitmap.h

    Wan Jiabing <wanjiabing@vivo.com>:
      linux/profile.h: remove unnecessary declaration

Subsystem: core-kernel

    Rasmus Villemoes <linux@rasmusvillemoes.dk>:
      kernel/async.c: fix pr_debug statement
      kernel/cred.c: make init_groups static

Subsystem: bitmap

    Yury Norov <yury.norov@gmail.com>:
    Patch series "lib/find_bit: fast path for small bitmaps", v6:
      tools: disable -Wno-type-limits
      tools: bitmap: sync function declarations with the kernel
      tools: sync BITMAP_LAST_WORD_MASK() macro with the kernel
      arch: rearrange headers inclusion order in asm/bitops for m68k, sh and h8300
      lib: extend the scope of small_const_nbits() macro
      tools: sync small_const_nbits() macro with the kernel
      lib: inline _find_next_bit() wrappers
      tools: sync find_next_bit implementation
      lib: add fast path for find_next_*_bit()
      lib: add fast path for find_first_*_bit() and find_last_bit()
      tools: sync lib/find_bit implementation
      MAINTAINERS: add entry for the bitmap API

Subsystem: lib

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      lib/bch.c: fix a typo in the file bch.c

    Wang Qing <wangqing@vivo.com>:
      lib: fix inconsistent indenting in process_bit1()

    ToastC <mrtoastcheng@gmail.com>:
      lib/list_sort.c: fix typo in function description

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      lib/genalloc.c: Fix a typo

    Richard Fitzgerald <rf@opensource.cirrus.com>:
      lib: crc8: pointer to data block should be const

    Zqiang <qiang.zhang@windriver.com>:
      lib: stackdepot: turn depot_lock spinlock to raw_spinlock

    Alex Shi <alexs@kernel.org>:
      lib/percpu_counter: tame kernel-doc compile warning
      lib/genalloc: add parameter description to fix doc compile warning

    Randy Dunlap <rdunlap@infradead.org>:
      lib: parser: clean up kernel-doc

Subsystem: compat

    Masahiro Yamada <masahiroy@kernel.org>:
      include/linux/compat.h: remove unneeded declaration from COMPAT_SYSCALL_DEFINEx()

Subsystem: checkpatch

    Joe Perches <joe@perches.com>:
      checkpatch: warn when missing newline in return sysfs_emit() formats

    Vincent Mailhol <mailhol.vincent@wanadoo.fr>:
      checkpatch: exclude four preprocessor sub-expressions from MACRO_ARG_REUSE

    Christophe JAILLET <christophe.jaillet@wanadoo.fr>:
      checkpatch: improve ALLOC_ARRAY_ARGS test

Subsystem: epoll

    Davidlohr Bueso <dave@stgolabs.net>:
    Patch series "fs/epoll: restore user-visible behavior upon event ready":
      kselftest: introduce new epoll test case
      fs/epoll: restore waking from ep_done_scan()

Subsystem: isofs

    "Gustavo A. R. Silva" <gustavoars@kernel.org>:
      isofs: fix fall-through warnings for Clang

Subsystem: nilfs2

    Liu xuzhi <liu.xuzhi@zte.com.cn>:
      fs/nilfs2: fix misspellings using codespell tool

    Lu Jialin <lujialin4@huawei.com>:
      nilfs2: fix typos in comments

Subsystem: hpfs

    "Gustavo A. R. Silva" <gustavoars@kernel.org>:
      hpfs: replace one-element array with flexible-array member

Subsystem: exit

    Jim Newsome <jnewsome@torproject.org>:
      do_wait: make PIDTYPE_PID case O(1) instead of O(n)

Subsystem: fork

    Rolf Eike Beer <eb@emlix.com>:
      kernel/fork.c: simplify copy_mm()

    Xiaofeng Cao <cxfcosmos@gmail.com>:
      kernel/fork.c: fix typos

Subsystem: kexec

    Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>:
      kernel/crash_core: add crashkernel=auto for vmcore creation

    Joe LeVeque <jolevequ@microsoft.com>:
      kexec: Add kexec reboot string

    Jia-Ju Bai <baijiaju1990@gmail.com>:
      kernel: kexec_file: fix error return code of kexec_calculate_store_digests()

    Pavel Tatashin <pasha.tatashin@soleen.com>:
      kexec: dump kmessage before machine_kexec

Subsystem: gcov

    Johannes Berg <johannes.berg@intel.com>:
      gcov: combine common code
      gcov: simplify buffer allocation
      gcov: use kvmalloc()

    Nick Desaulniers <ndesaulniers@google.com>:
      gcov: clang: drop support for clang-10 and older

Subsystem: panic

    He Ying <heying24@huawei.com>:
      smp: kernel/panic.c - silence warnings

Subsystem: delayacct

    Yafang Shao <laoar.shao@gmail.com>:
      delayacct: clear right task's flag after blkio completes

Subsystem: gdb

    Johannes Berg <johannes.berg@intel.com>:
      gdb: lx-symbols: store the abspath()

    Barry Song <song.bao.hua@hisilicon.com>:
    Patch series "scripts/gdb: clarify the platforms supporting lx_current and add arm64 support", v2:
      scripts/gdb: document lx_current is only supported by x86
      scripts/gdb: add lx_current support for arm64

Subsystem: resource

    David Hildenbrand <david@redhat.com>:
    Patch series "kernel/resource: make walk_system_ram_res() and walk_mem_res() search the whole tree", v2:
      kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources
      kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources
      kernel/resource: remove first_lvl / siblings_only logic

    Alistair Popple <apopple@nvidia.com>:
      kernel/resource: allow region_intersects users to hold resource_lock
      kernel/resource: refactor __request_region to allow external locking
      kernel/resource: fix locking in request_free_mem_region

Subsystem: selftests

    Zhang Yunkai <zhang.yunkai@zte.com.cn>:
      selftests: remove duplicate include

Subsystem: async

    Rasmus Villemoes <linux@rasmusvillemoes.dk>:
      kernel/async.c: stop guarding pr_debug() statements
      kernel/async.c: remove async_unregister_domain()

Subsystem: initramfs

    Rasmus Villemoes <linux@rasmusvillemoes.dk>:
    Patch series "background initramfs unpacking, and CONFIG_MODPROBE_PATH", v3:
      init/initramfs.c: do unpacking asynchronously
      modules: add CONFIG_MODPROBE_PATH

Subsystem: ipc

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      ipc/sem.c: mundane typo fixes

Subsystem: mm/cleanups

    Shijie Luo <luoshijie1@huawei.com>:
      mm: fix some typos and code style problems

Subsystem: drivers/char

    David Hildenbrand <david@redhat.com>:
    Patch series "drivers/char: remove /dev/kmem for good":
      drivers/char: remove /dev/kmem for good
      mm: remove xlate_dev_kmem_ptr()
      mm/vmalloc: remove vwrite()

Subsystem: mm/slub

    Maninder Singh <maninder1.s@samsung.com>:
      arm: print alloc free paths for address in registers

Subsystem: spelling

    Drew Fustini <drew@beagleboard.org>:
      scripts/spelling.txt: add "overlfow"

    zuoqilin <zuoqilin@yulong.com>:
      scripts/spelling.txt: Add "diabled" typo

    Drew Fustini <drew@beagleboard.org>:
      scripts/spelling.txt: add "overflw"

    Colin Ian King <colin.king@canonical.com>:
      mm/slab.c: fix spelling mistake "disired" -> "desired"

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      include/linux/pgtable.h: few spelling fixes

    zhouchuangao <zhouchuangao@vivo.com>:
      kernel/umh.c: fix some spelling mistakes

    Xiaofeng Cao <cxfcosmos@gmail.com>:
      kernel/user_namespace.c: fix typos

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      kernel/up.c: fix typo

    Xiaofeng Cao <caoxiaofeng@yulong.com>:
      kernel/sys.c: fix typo

    dingsenjie <dingsenjie@yulong.com>:
      fs: fat: fix spelling typo of values

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      ipc/sem.c: spelling fix

    Masahiro Yamada <masahiroy@kernel.org>:
      treewide: remove editor modelines and cruft

    Ingo Molnar <mingo@kernel.org>:
      mm: fix typos in comments

    Lu Jialin <lujialin4@huawei.com>:
      mm: fix typos in comments

 Documentation/admin-guide/devices.txt                         |    2 
 Documentation/admin-guide/kdump/kdump.rst                     |    3 
 Documentation/admin-guide/kernel-parameters.txt               |   18 
 Documentation/dev-tools/gdb-kernel-debugging.rst              |    4 
 MAINTAINERS                                                   |   16 
 arch/Kconfig                                                  |   20 
 arch/alpha/include/asm/io.h                                   |    5 
 arch/alpha/kernel/pc873xx.c                                   |    4 
 arch/alpha/lib/csum_partial_copy.c                            |    1 
 arch/arm/configs/dove_defconfig                               |    1 
 arch/arm/configs/magician_defconfig                           |    1 
 arch/arm/configs/moxart_defconfig                             |    1 
 arch/arm/configs/mps2_defconfig                               |    1 
 arch/arm/configs/mvebu_v5_defconfig                           |    1 
 arch/arm/configs/xcep_defconfig                               |    1 
 arch/arm/include/asm/bug.h                                    |    1 
 arch/arm/include/asm/io.h                                     |    5 
 arch/arm/kernel/process.c                                     |   11 
 arch/arm/kernel/traps.c                                       |    1 
 arch/h8300/include/asm/bitops.h                               |    8 
 arch/hexagon/configs/comet_defconfig                          |    1 
 arch/hexagon/include/asm/io.h                                 |    1 
 arch/ia64/include/asm/io.h                                    |    1 
 arch/ia64/include/asm/uaccess.h                               |   18 
 arch/m68k/atari/time.c                                        |    7 
 arch/m68k/configs/amcore_defconfig                            |    1 
 arch/m68k/include/asm/bitops.h                                |    6 
 arch/m68k/include/asm/io_mm.h                                 |    5 
 arch/mips/include/asm/io.h                                    |    5 
 arch/openrisc/configs/or1ksim_defconfig                       |    1 
 arch/parisc/include/asm/io.h                                  |    5 
 arch/parisc/include/asm/pdc_chassis.h                         |    1 
 arch/powerpc/include/asm/io.h                                 |    5 
 arch/s390/include/asm/io.h                                    |    5 
 arch/sh/configs/edosk7705_defconfig                           |    1 
 arch/sh/configs/se7206_defconfig                              |    1 
 arch/sh/configs/sh2007_defconfig                              |    1 
 arch/sh/configs/sh7724_generic_defconfig                      |    1 
 arch/sh/configs/sh7770_generic_defconfig                      |    1 
 arch/sh/configs/sh7785lcr_32bit_defconfig                     |    1 
 arch/sh/include/asm/bitops.h                                  |    5 
 arch/sh/include/asm/io.h                                      |    5 
 arch/sparc/configs/sparc64_defconfig                          |    1 
 arch/sparc/include/asm/io_64.h                                |    5 
 arch/um/drivers/cow.h                                         |    7 
 arch/xtensa/configs/xip_kc705_defconfig                       |    1 
 block/blk-settings.c                                          |    1 
 drivers/auxdisplay/panel.c                                    |    7 
 drivers/base/firmware_loader/main.c                           |    2 
 drivers/block/brd.c                                           |    1 
 drivers/block/loop.c                                          |    1 
 drivers/char/Kconfig                                          |   10 
 drivers/char/mem.c                                            |  231 --------
 drivers/gpu/drm/qxl/qxl_drv.c                                 |    1 
 drivers/isdn/capi/kcapi_proc.c                                |    1 
 drivers/md/bcache/super.c                                     |    1 
 drivers/media/usb/pwc/pwc-uncompress.c                        |    3 
 drivers/net/ethernet/adaptec/starfire.c                       |    8 
 drivers/net/ethernet/amd/atarilance.c                         |    8 
 drivers/net/ethernet/amd/pcnet32.c                            |    7 
 drivers/net/wireless/intersil/hostap/hostap_proc.c            |    1 
 drivers/net/wireless/intersil/orinoco/orinoco_nortel.c        |    8 
 drivers/net/wireless/intersil/orinoco/orinoco_pci.c           |    8 
 drivers/net/wireless/intersil/orinoco/orinoco_plx.c           |    8 
 drivers/net/wireless/intersil/orinoco/orinoco_tmd.c           |    8 
 drivers/nvdimm/btt.c                                          |    1 
 drivers/nvdimm/pmem.c                                         |    1 
 drivers/parport/parport_ip32.c                                |   12 
 drivers/platform/x86/dell/dell_rbu.c                          |    3 
 drivers/scsi/53c700.c                                         |    1 
 drivers/scsi/53c700.h                                         |    1 
 drivers/scsi/ch.c                                             |    6 
 drivers/scsi/esas2r/esas2r_main.c                             |    1 
 drivers/scsi/ips.c                                            |   20 
 drivers/scsi/ips.h                                            |   20 
 drivers/scsi/lasi700.c                                        |    1 
 drivers/scsi/megaraid/mbox_defs.h                             |    2 
 drivers/scsi/megaraid/mega_common.h                           |    2 
 drivers/scsi/megaraid/megaraid_mbox.c                         |    2 
 drivers/scsi/megaraid/megaraid_mbox.h                         |    2 
 drivers/scsi/qla1280.c                                        |   12 
 drivers/scsi/scsicam.c                                        |    1 
 drivers/scsi/sni_53c710.c                                     |    1 
 drivers/video/fbdev/matrox/matroxfb_base.c                    |    9 
 drivers/video/fbdev/vga16fb.c                                 |   10 
 fs/configfs/configfs_internal.h                               |    4 
 fs/configfs/dir.c                                             |    4 
 fs/configfs/file.c                                            |    4 
 fs/configfs/inode.c                                           |    4 
 fs/configfs/item.c                                            |    4 
 fs/configfs/mount.c                                           |    4 
 fs/configfs/symlink.c                                         |    4 
 fs/eventpoll.c                                                |    6 
 fs/fat/fatent.c                                               |    2 
 fs/hpfs/hpfs.h                                                |    3 
 fs/isofs/rock.c                                               |    1 
 fs/nfs/dir.c                                                  |    7 
 fs/nfs/nfs4proc.c                                             |    6 
 fs/nfs/nfs4renewd.c                                           |    6 
 fs/nfs/nfs4state.c                                            |    6 
 fs/nfs/nfs4xdr.c                                              |    6 
 fs/nfsd/nfs4proc.c                                            |    6 
 fs/nfsd/nfs4xdr.c                                             |    6 
 fs/nfsd/xdr4.h                                                |    6 
 fs/nilfs2/cpfile.c                                            |    2 
 fs/nilfs2/ioctl.c                                             |    4 
 fs/nilfs2/segment.c                                           |    4 
 fs/nilfs2/the_nilfs.c                                         |    2 
 fs/ocfs2/acl.c                                                |    4 
 fs/ocfs2/acl.h                                                |    4 
 fs/ocfs2/alloc.c                                              |    4 
 fs/ocfs2/alloc.h                                              |    4 
 fs/ocfs2/aops.c                                               |    4 
 fs/ocfs2/aops.h                                               |    4 
 fs/ocfs2/blockcheck.c                                         |    4 
 fs/ocfs2/blockcheck.h                                         |    4 
 fs/ocfs2/buffer_head_io.c                                     |    4 
 fs/ocfs2/buffer_head_io.h                                     |    4 
 fs/ocfs2/cluster/heartbeat.c                                  |    4 
 fs/ocfs2/cluster/heartbeat.h                                  |    4 
 fs/ocfs2/cluster/masklog.c                                    |    4 
 fs/ocfs2/cluster/masklog.h                                    |    4 
 fs/ocfs2/cluster/netdebug.c                                   |    4 
 fs/ocfs2/cluster/nodemanager.c                                |    4 
 fs/ocfs2/cluster/nodemanager.h                                |    4 
 fs/ocfs2/cluster/ocfs2_heartbeat.h                            |    4 
 fs/ocfs2/cluster/ocfs2_nodemanager.h                          |    4 
 fs/ocfs2/cluster/quorum.c                                     |    4 
 fs/ocfs2/cluster/quorum.h                                     |    4 
 fs/ocfs2/cluster/sys.c                                        |    4 
 fs/ocfs2/cluster/sys.h                                        |    4 
 fs/ocfs2/cluster/tcp.c                                        |    4 
 fs/ocfs2/cluster/tcp.h                                        |    4 
 fs/ocfs2/cluster/tcp_internal.h                               |    4 
 fs/ocfs2/dcache.c                                             |    4 
 fs/ocfs2/dcache.h                                             |    4 
 fs/ocfs2/dir.c                                                |    4 
 fs/ocfs2/dir.h                                                |    4 
 fs/ocfs2/dlm/dlmapi.h                                         |    4 
 fs/ocfs2/dlm/dlmast.c                                         |    4 
 fs/ocfs2/dlm/dlmcommon.h                                      |    4 
 fs/ocfs2/dlm/dlmconvert.c                                     |    4 
 fs/ocfs2/dlm/dlmconvert.h                                     |    4 
 fs/ocfs2/dlm/dlmdebug.c                                       |    4 
 fs/ocfs2/dlm/dlmdebug.h                                       |    4 
 fs/ocfs2/dlm/dlmdomain.c                                      |    4 
 fs/ocfs2/dlm/dlmdomain.h                                      |    4 
 fs/ocfs2/dlm/dlmlock.c                                        |    4 
 fs/ocfs2/dlm/dlmmaster.c                                      |    4 
 fs/ocfs2/dlm/dlmrecovery.c                                    |    4 
 fs/ocfs2/dlm/dlmthread.c                                      |    4 
 fs/ocfs2/dlm/dlmunlock.c                                      |    4 
 fs/ocfs2/dlmfs/dlmfs.c                                        |    4 
 fs/ocfs2/dlmfs/userdlm.c                                      |    4 
 fs/ocfs2/dlmfs/userdlm.h                                      |    4 
 fs/ocfs2/dlmglue.c                                            |    4 
 fs/ocfs2/dlmglue.h                                            |    4 
 fs/ocfs2/export.c                                             |    4 
 fs/ocfs2/export.h                                             |    4 
 fs/ocfs2/extent_map.c                                         |    4 
 fs/ocfs2/extent_map.h                                         |    4 
 fs/ocfs2/file.c                                               |    4 
 fs/ocfs2/file.h                                               |    4 
 fs/ocfs2/filecheck.c                                          |    4 
 fs/ocfs2/filecheck.h                                          |    4 
 fs/ocfs2/heartbeat.c                                          |    4 
 fs/ocfs2/heartbeat.h                                          |    4 
 fs/ocfs2/inode.c                                              |    4 
 fs/ocfs2/inode.h                                              |    4 
 fs/ocfs2/journal.c                                            |    4 
 fs/ocfs2/journal.h                                            |    4 
 fs/ocfs2/localalloc.c                                         |    4 
 fs/ocfs2/localalloc.h                                         |    4 
 fs/ocfs2/locks.c                                              |    4 
 fs/ocfs2/locks.h                                              |    4 
 fs/ocfs2/mmap.c                                               |    4 
 fs/ocfs2/move_extents.c                                       |    4 
 fs/ocfs2/move_extents.h                                       |    4 
 fs/ocfs2/namei.c                                              |    4 
 fs/ocfs2/namei.h                                              |    4 
 fs/ocfs2/ocfs1_fs_compat.h                                    |    4 
 fs/ocfs2/ocfs2.h                                              |    4 
 fs/ocfs2/ocfs2_fs.h                                           |    4 
 fs/ocfs2/ocfs2_ioctl.h                                        |    4 
 fs/ocfs2/ocfs2_lockid.h                                       |    4 
 fs/ocfs2/ocfs2_lockingver.h                                   |    4 
 fs/ocfs2/refcounttree.c                                       |    4 
 fs/ocfs2/refcounttree.h                                       |    4 
 fs/ocfs2/reservations.c                                       |    4 
 fs/ocfs2/reservations.h                                       |    4 
 fs/ocfs2/resize.c                                             |    4 
 fs/ocfs2/resize.h                                             |    4 
 fs/ocfs2/slot_map.c                                           |    4 
 fs/ocfs2/slot_map.h                                           |    4 
 fs/ocfs2/stack_o2cb.c                                         |    4 
 fs/ocfs2/stack_user.c                                         |    4 
 fs/ocfs2/stackglue.c                                          |    4 
 fs/ocfs2/stackglue.h                                          |    4 
 fs/ocfs2/suballoc.c                                           |    4 
 fs/ocfs2/suballoc.h                                           |    4 
 fs/ocfs2/super.c                                              |    4 
 fs/ocfs2/super.h                                              |    4 
 fs/ocfs2/symlink.c                                            |    4 
 fs/ocfs2/symlink.h                                            |    4 
 fs/ocfs2/sysfile.c                                            |    4 
 fs/ocfs2/sysfile.h                                            |    4 
 fs/ocfs2/uptodate.c                                           |    4 
 fs/ocfs2/uptodate.h                                           |    4 
 fs/ocfs2/xattr.c                                              |    4 
 fs/ocfs2/xattr.h                                              |    4 
 fs/proc/generic.c                                             |   13 
 fs/proc/inode.c                                               |   18 
 fs/proc/proc_sysctl.c                                         |    2 
 fs/reiserfs/procfs.c                                          |   10 
 include/asm-generic/bitops/find.h                             |  108 +++
 include/asm-generic/bitops/le.h                               |   38 +
 include/asm-generic/bitsperlong.h                             |   12 
 include/asm-generic/io.h                                      |   11 
 include/linux/align.h                                         |   15 
 include/linux/async.h                                         |    1 
 include/linux/bitmap.h                                        |   11 
 include/linux/bitops.h                                        |   12 
 include/linux/blkdev.h                                        |    1 
 include/linux/compat.h                                        |    1 
 include/linux/configfs.h                                      |    4 
 include/linux/crc8.h                                          |    2 
 include/linux/cred.h                                          |    1 
 include/linux/delayacct.h                                     |   20 
 include/linux/fs.h                                            |    2 
 include/linux/genl_magic_func.h                               |    1 
 include/linux/genl_magic_struct.h                             |    1 
 include/linux/gfp.h                                           |    2 
 include/linux/init_task.h                                     |    1 
 include/linux/initrd.h                                        |    2 
 include/linux/kernel.h                                        |    9 
 include/linux/mm.h                                            |    2 
 include/linux/mmzone.h                                        |    2 
 include/linux/pgtable.h                                       |   10 
 include/linux/proc_fs.h                                       |    1 
 include/linux/profile.h                                       |    3 
 include/linux/smp.h                                           |    8 
 include/linux/swap.h                                          |    1 
 include/linux/vmalloc.h                                       |    7 
 include/uapi/linux/if_bonding.h                               |   11 
 include/uapi/linux/nfs4.h                                     |    6 
 include/xen/interface/elfnote.h                               |   10 
 include/xen/interface/hvm/hvm_vcpu.h                          |   10 
 include/xen/interface/io/xenbus.h                             |   10 
 init/Kconfig                                                  |   12 
 init/initramfs.c                                              |   38 +
 init/main.c                                                   |    1 
 ipc/sem.c                                                     |   12 
 kernel/async.c                                                |   68 --
 kernel/configs/android-base.config                            |    1 
 kernel/crash_core.c                                           |    7 
 kernel/cred.c                                                 |    2 
 kernel/exit.c                                                 |   67 ++
 kernel/fork.c                                                 |   23 
 kernel/gcov/Kconfig                                           |    1 
 kernel/gcov/base.c                                            |   49 +
 kernel/gcov/clang.c                                           |  282 ----------
 kernel/gcov/fs.c                                              |  146 ++++-
 kernel/gcov/gcc_4_7.c                                         |  173 ------
 kernel/gcov/gcov.h                                            |   14 
 kernel/kexec_core.c                                           |    4 
 kernel/kexec_file.c                                           |    4 
 kernel/kmod.c                                                 |    2 
 kernel/resource.c                                             |  198 ++++---
 kernel/sys.c                                                  |   14 
 kernel/umh.c                                                  |    8 
 kernel/up.c                                                   |    2 
 kernel/user_namespace.c                                       |    6 
 lib/bch.c                                                     |    2 
 lib/crc8.c                                                    |    2 
 lib/decompress_unlzma.c                                       |    2 
 lib/find_bit.c                                                |   68 --
 lib/genalloc.c                                                |    7 
 lib/list_sort.c                                               |    2 
 lib/parser.c                                                  |   61 +-
 lib/percpu_counter.c                                          |    2 
 lib/stackdepot.c                                              |    6 
 mm/balloon_compaction.c                                       |    4 
 mm/compaction.c                                               |    4 
 mm/filemap.c                                                  |    2 
 mm/gup.c                                                      |    2 
 mm/highmem.c                                                  |    2 
 mm/huge_memory.c                                              |    6 
 mm/hugetlb.c                                                  |    6 
 mm/internal.h                                                 |    2 
 mm/kasan/kasan.h                                              |    8 
 mm/kasan/quarantine.c                                         |    4 
 mm/kasan/shadow.c                                             |    4 
 mm/kfence/report.c                                            |    2 
 mm/khugepaged.c                                               |    2 
 mm/ksm.c                                                      |    6 
 mm/madvise.c                                                  |    4 
 mm/memcontrol.c                                               |   18 
 mm/memory-failure.c                                           |    2 
 mm/memory.c                                                   |   18 
 mm/mempolicy.c                                                |    6 
 mm/migrate.c                                                  |    8 
 mm/mmap.c                                                     |    4 
 mm/mprotect.c                                                 |    2 
 mm/mremap.c                                                   |    2 
 mm/nommu.c                                                    |   10 
 mm/oom_kill.c                                                 |    2 
 mm/page-writeback.c                                           |    4 
 mm/page_alloc.c                                               |   16 
 mm/page_owner.c                                               |    2 
 mm/page_vma_mapped.c                                          |    2 
 mm/percpu-internal.h                                          |    2 
 mm/percpu.c                                                   |    2 
 mm/pgalloc-track.h                                            |    6 
 mm/rmap.c                                                     |    2 
 mm/slab.c                                                     |    8 
 mm/slub.c                                                     |    2 
 mm/swap.c                                                     |    4 
 mm/swap_slots.c                                               |    2 
 mm/swap_state.c                                               |    2 
 mm/vmalloc.c                                                  |  124 ----
 mm/vmstat.c                                                   |    2 
 mm/z3fold.c                                                   |    2 
 mm/zpool.c                                                    |    2 
 mm/zsmalloc.c                                                 |    6 
 samples/configfs/configfs_sample.c                            |    2 
 scripts/checkpatch.pl                                         |   15 
 scripts/gdb/linux/cpus.py                                     |   23 
 scripts/gdb/linux/symbols.py                                  |    3 
 scripts/spelling.txt                                          |    3 
 tools/include/asm-generic/bitops/find.h                       |   85 ++-
 tools/include/asm-generic/bitsperlong.h                       |    3 
 tools/include/linux/bitmap.h                                  |   18 
 tools/lib/bitmap.c                                            |    4 
 tools/lib/find_bit.c                                          |   56 -
 tools/scripts/Makefile.include                                |    1 
 tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c |   44 +
 tools/testing/selftests/kvm/lib/sparsebit.c                   |    1 
 tools/testing/selftests/mincore/mincore_selftest.c            |    1 
 tools/testing/selftests/powerpc/mm/tlbie_test.c               |    1 
 tools/testing/selftests/proc/Makefile                         |    1 
 tools/testing/selftests/proc/proc-subset-pid.c                |  121 ++++
 tools/testing/selftests/proc/read.c                           |    4 
 tools/usb/hcd-tests.sh                                        |    2 
 343 files changed, 1383 insertions(+), 2119 deletions(-)



^ permalink raw reply	[flat|nested] 281+ messages in thread

* Re: incoming
  2021-05-05 17:44       ` incoming Andrew Morton
@ 2021-05-06  3:19         ` Anshuman Khandual
  0 siblings, 0 replies; 281+ messages in thread
From: Anshuman Khandual @ 2021-05-06  3:19 UTC (permalink / raw)
  To: Andrew Morton, Linus Torvalds; +Cc: Konstantin Ryabitsev, Linux-MM, mm-commits



On 5/5/21 11:14 PM, Andrew Morton wrote:
> On Wed, 5 May 2021 10:10:33 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>> On Tue, May 4, 2021 at 8:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>> Let me resend right now with the same in-reply-to.  Hopefully they will
>>> land in the correct place.
>> Well, you re-sent it twice, and I have three copies in my own mailbox,
>> but they still don't show up on the mm-commits mailing list.
>>
>> So the list hates them for some odd reason.
>>
>> I've picked them up locally, but adding Konstantin to the participants
>> to see if he can see what's up.
>>
>> Konstantin: patches 103/106/107 are missing on lore out of Andrew's
>> series of 143. Odd.
> It's weird.  They don't turn up on linux-mm either, and that's running
> at kvack.org, also majordomo.  They don't get through when sent with
> either heirloom-mailx or with sylpheed.
> 
> Also, it seems that when Anshuman originally sent the patch, linux-mm
> and linux-kernel didn't send it back out.  So perhaps a spam filter
> triggered?
> 
> I'm seeing
> 
> https://lore.kernel.org/linux-arm-kernel/1615278790-18053-3-git-send-email-anshuman.khandual@arm.com/
> 
> which is via linux-arm-kernel@lists.infradead.org but the linux-kernel
> server massacred that patch series.  Searching
> https://lkml.org/lkml/2021/3/9 for "anshuman" only shows 3 of the 7
> emails in the series.

Yeah, these patches have faced problems getting onto the MM/LKML lists
from the very beginning, for some strange reason.


^ permalink raw reply	[flat|nested] 281+ messages in thread

* Re: incoming
  2021-05-05 17:10     ` incoming Linus Torvalds
@ 2021-05-05 17:44       ` Andrew Morton
  2021-05-06  3:19         ` incoming Anshuman Khandual
  0 siblings, 1 reply; 281+ messages in thread
From: Andrew Morton @ 2021-05-05 17:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Konstantin Ryabitsev, Linux-MM, mm-commits


[-- Attachment #1: Type: text/plain, Size: 1387 bytes --]

On Wed, 5 May 2021 10:10:33 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, May 4, 2021 at 8:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > Let me resend right now with the same in-reply-to.  Hopefully they will
> > land in the correct place.
> 
> Well, you re-sent it twice, and I have three copies in my own mailbox,
> but they still don't show up on the mm-commits mailing list.
> 
> So the list hates them for some odd reason.
> 
> I've picked them up locally, but adding Konstantin to the participants
> to see if he can see what's up.
> 
> Konstantin: patches 103/106/107 are missing on lore out of Andrew's
> series of 143. Odd.

It's weird.  They don't turn up on linux-mm either, and that's running
at kvack.org, also majordomo.  They don't get through when sent with
either heirloom-mailx or with sylpheed.

Also, it seems that when Anshuman originally sent the patch, linux-mm
and linux-kernel didn't send it back out.  So perhaps a spam filter
triggered?

I'm seeing

https://lore.kernel.org/linux-arm-kernel/1615278790-18053-3-git-send-email-anshuman.khandual@arm.com/

which is via linux-arm-kernel@lists.infradead.org but the linux-kernel
server massacred that patch series.  Searching
https://lkml.org/lkml/2021/3/9 for "anshuman" only shows 3 of the 7
emails in the series.

One of the emails (as sent by me) is attached, if that helps.



[-- Attachment #2: x.txt --]
[-- Type: text/plain, Size: 21048 bytes --]

Return-Path: <akpm@linux-foundation.org>
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on y
X-Spam-Level: (none)
X-Spam-Status: No, score=-101.5 required=2.5 tests=BAYES_00,T_DKIM_INVALID,
	USER_IN_WHITELIST autolearn=ham autolearn_force=no version=3.4.1
Received: from localhost.localdomain (localhost.localdomain [127.0.0.1])
	by localhost.localdomain (8.15.2/8.15.2/Debian-8ubuntu1) with ESMTP id 1453H2fk032202
	for <akpm@localhost>; Tue, 4 May 2021 20:17:03 -0700
Received: from imap.fastmail.com [66.111.4.135]
	by localhost.localdomain with IMAP (fetchmail-6.3.26)
	for <akpm@localhost> (single-drop); Tue, 04 May 2021 20:17:03 -0700 (PDT)
Received: from compute1.internal (compute1.nyi.internal [10.202.2.41])
	 by sloti11d1t06 (Cyrus 3.5.0-alpha0-442-g5daca166b9-fm-20210428.001-g5daca166) with LMTPA;
	 Tue, 04 May 2021 23:16:31 -0400
X-Cyrus-Session-Id: sloti11d1t06-1620184591-1699471-2-6359664467419938249
X-Sieve: CMU Sieve 3.0
X-Resolved-to: akpm@mbx.kernel.org
X-Delivered-to: akpm@mbx.kernel.org
X-Mail-from: akpm@linux-foundation.org
Received: from mx6 ([10.202.2.205])
  by compute1.internal (LMTPProxy); Tue, 04 May 2021 23:16:31 -0400
Received: from mx6.messagingengine.com (localhost [127.0.0.1])
	by mailmx.nyi.internal (Postfix) with ESMTP id 40796C800E1
	for <akpm@mbx.kernel.org>; Tue,  4 May 2021 23:16:31 -0400 (EDT)
Received: from mx6.messagingengine.com (localhost [127.0.0.1])
    by mx6.messagingengine.com (Authentication Milter) with ESMTP
    id 14870833D7F;
    Tue, 4 May 2021 23:16:31 -0400
ARC-Seal: i=2; a=rsa-sha256; cv=pass; d=messagingengine.com; s=fm2; t=
    1620184591; b=FBo7Gf3JFN+4QYg5Byan0oNm6RESv+sIf5HcaslVNsUd9SOTGS
    yI0+IsXr1CUpGH783hE6fmgEq9SyfOwQVZjdikLaJS1+7u0JtfAYQFU3RORCtXlr
    djJWrScfjVa8nAHX4rQCtzvtPYuzx5w7cTgGgeILgoJMxgLj7EC9xcT8BIf68+9W
    Lw+ohAmcuiKhL2ez+de4SMuwdh3dh2FwAIHQOsSjEU1/NV+WGxMLwYbxWgTrqQGH
    RQIzFNdq30qslW9huK47+e80uHOX2tXwxtshwbThFEn458bdV5LL6Y8Oh4ZWMbv1
    tFgTt515DVedonZknxc07XsXtAjaJyB8bfHw==
ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=
    messagingengine.com; h=date:from:to:subject:message-id
    :in-reply-to; s=fm2; t=1620184591; bh=LuH7mbm3+zp863vKBEqKeoZtnp
    uFxYpIb5oTVwf56Es=; b=m5E1fbz2b+an/X406oY3BuG0Zm4/W05vWAki8Lsnud
    gPCc1LfPUFSuXaMppcEDPbLKprp4hH3T52itK4pivXMQCLEOyme7kVStaLMVTiky
    Xxqh5ZdhOWvygBfda/GjfuLBSbbj2gfm8HPKpbL7CA5foelknIBhJHDzGkJyxetZ
    YagZfVvtdo2OEwnC1mmjUCpKPO5+m5kaZO0ol6rPdl+TV0MKGhjLg+/i6Ia+0nFp
    zDwV4VeACvVcGb2xY7KG5Z+BtqVxeVFn+w5JcqpWUtxEKoSBR4bWARzjwHg6eouh
    7psOOKPTt/NzDKk+3f49lso5KlPiTF2xEU/+5SIttCkQ==
ARC-Authentication-Results: i=2; mx6.messagingengine.com;
    arc=pass (as.1.google.com=pass, ams.1.google.com=pass)
    smtp.remote-ip=209.85.215.198;
    bimi=skipped (DMARC did not pass);
    dkim=pass (1024-bit rsa key sha256) header.d=linux-foundation.org
    header.i=@linux-foundation.org header.b=Gdz/3wY9 header.a=rsa-sha256
    header.s=korg x-bits=1024;
    dmarc=none policy.published-domain-policy=none
    policy.applied-disposition=none policy.evaluated-disposition=none
    (p=none,d=none,d.eval=none) policy.policy-from=p
    header.from=linux-foundation.org;
    iprev=pass smtp.remote-ip=209.85.215.198 (mail-pg1-f198.google.com);
    spf=pass smtp.mailfrom=akpm@linux-foundation.org
    smtp.helo=mail-pg1-f198.google.com;
    x-aligned-from=pass (Address match);
    x-arc-spf=pass
    (google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender)
    smtp.mailfrom=akpm@linux-foundation.org x-arc-instance=1
    x-arc-domain=google.com (Trusted from aar.1.google.com);
    x-csa=none;
    x-google-dkim=fail (message has been altered, 2048-bit rsa key)
    header.d=1e100.net header.i=@1e100.net header.b=VZuDOxUf;
    x-me-sender=none;
    x-ptr=pass smtp.helo=mail-pg1-f198.google.com
    policy.ptr=mail-pg1-f198.google.com;
    x-return-mx=pass header.domain=linux-foundation.org policy.is_org=yes
    (MX Records found: ASPMX.L.GOOGLE.COM,ALT1.ASPMX.L.GOOGLE.COM,ALT2.ASPMX.L.GOOGLE.COM,ALT3.ASPMX.L.GOOGLE.COM,ALT4.ASPMX.L.GOOGLE.COM);
    x-return-mx=pass smtp.domain=linux-foundation.org policy.is_org=yes
    (MX Records found: ASPMX.L.GOOGLE.COM,ALT1.ASPMX.L.GOOGLE.COM,ALT2.ASPMX.L.GOOGLE.COM,ALT3.ASPMX.L.GOOGLE.COM,ALT4.ASPMX.L.GOOGLE.COM);
    x-tls=pass smtp.version=TLSv1.3 smtp.cipher=TLS_AES_256_GCM_SHA384
    smtp.bits=256/256;
    x-vs=clean score=40 state=0
Authentication-Results: mx6.messagingengine.com;
    arc=pass (as.1.google.com=pass, ams.1.google.com=pass)
      smtp.remote-ip=209.85.215.198;
    bimi=skipped (DMARC did not pass);
    dkim=pass (1024-bit rsa key sha256) header.d=linux-foundation.org
      header.i=@linux-foundation.org header.b=Gdz/3wY9 header.a=rsa-sha256
      header.s=korg x-bits=1024;
    dmarc=none policy.published-domain-policy=none
      policy.applied-disposition=none policy.evaluated-disposition=none
      (p=none,d=none,d.eval=none) policy.policy-from=p
      header.from=linux-foundation.org;
    iprev=pass smtp.remote-ip=209.85.215.198 (mail-pg1-f198.google.com);
    spf=pass smtp.mailfrom=akpm@linux-foundation.org
      smtp.helo=mail-pg1-f198.google.com;
    x-aligned-from=pass (Address match);
    x-arc-spf=pass
      (google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender)
      smtp.mailfrom=akpm@linux-foundation.org x-arc-instance=1
      x-arc-domain=google.com (Trusted from aar.1.google.com);
    x-csa=none;
    x-google-dkim=fail (message has been altered, 2048-bit rsa key)
      header.d=1e100.net header.i=@1e100.net header.b=VZuDOxUf;
    x-me-sender=none;
    x-ptr=pass smtp.helo=mail-pg1-f198.google.com
      policy.ptr=mail-pg1-f198.google.com;
    x-return-mx=pass header.domain=linux-foundation.org policy.is_org=yes
      (MX Records found: ASPMX.L.GOOGLE.COM,ALT1.ASPMX.L.GOOGLE.COM,ALT2.ASPMX.L.GOOGLE.COM,ALT3.ASPMX.L.GOOGLE.COM,ALT4.ASPMX.L.GOOGLE.COM);
    x-return-mx=pass smtp.domain=linux-foundation.org policy.is_org=yes
      (MX Records found: ASPMX.L.GOOGLE.COM,ALT1.ASPMX.L.GOOGLE.COM,ALT2.ASPMX.L.GOOGLE.COM,ALT3.ASPMX.L.GOOGLE.COM,ALT4.ASPMX.L.GOOGLE.COM);
    x-tls=pass smtp.version=TLSv1.3 smtp.cipher=TLS_AES_256_GCM_SHA384
      smtp.bits=256/256;
    x-vs=clean score=40 state=0
X-ME-VSCause: gggruggvucftvghtrhhoucdtuddrgeduledrvdefjedgieegucetufdoteggodetrfdotf
    fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggvpdfu
    rfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucgoufhorhhtvggutfgvtg
    hiphdvucdlgedtmdenucfjughrpeffhffvuffkjggfsedttdertddtredtnecuhfhrohhm
    peetnhgurhgvficuofhorhhtohhnuceorghkphhmsehlihhnuhigqdhfohhunhgurghtih
    honhdrohhrgheqnecuggftrfgrthhtvghrnhepjeevfeduveffvddvudetkefhgeduveeu
    geevvdfhhfevhfekkedtieefgfduheeinecuffhomhgrihhnpehkvghrnhgvlhdrohhrgh
    enucfkphepvddtledrkeehrddvudehrdduleekpdduleekrddugeehrddvledrleelnecu
    uegrugftvghpuhhtkfhppeduleekrddugeehrddvledrleelnecuvehluhhsthgvrhfuih
    iivgeptdenucfrrghrrghmpehinhgvthepvddtledrkeehrddvudehrdduleekpdhhvghl
    ohepmhgrihhlqdhpghduqdhfudelkedrghhoohhglhgvrdgtohhmpdhmrghilhhfrhhomh
    epoegrkhhpmheslhhinhhugidqfhhouhhnuggrthhiohhnrdhorhhgqe
X-ME-VSScore: 40
X-ME-VSCategory: clean
X-ME-CSA: none
Received-SPF: pass
    (linux-foundation.org: Sender is authorized to use 'akpm@linux-foundation.org' in 'mfrom' identity (mechanism 'include:_spf.google.com' matched))
    receiver=mx6.messagingengine.com;
    identity=mailfrom;
    envelope-from="akpm@linux-foundation.org";
    helo=mail-pg1-f198.google.com;
    client-ip=209.85.215.198
Received: from mail-pg1-f198.google.com (mail-pg1-f198.google.com [209.85.215.198])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
	(No client certificate requested)
	by mx6.messagingengine.com (Postfix) with ESMTPS
	for <akpm@mbx.kernel.org>; Tue,  4 May 2021 23:16:31 -0400 (EDT)
Received: by mail-pg1-f198.google.com with SMTP id g5-20020a63f4050000b02901f6c7b9a6d0so593624pgi.5
        for <akpm@mbx.kernel.org>; Tue, 04 May 2021 20:16:30 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:dkim-signature:date:from:to:subject:message-id
         :in-reply-to:user-agent;
        bh=LuH7mbm3+zp863vKBEqKeoZtnpuFxYpIb5oTVwf56Es=;
        b=VZuDOxUfeHXJz1/CiFfcxuMVHkmW5RznvqYS+Py8Ub6nHHXprQJGE9Ze3WgH+1ylSe
         NJLEC7xgv15SR9A+e/MT4RTj3OVOwtd1Zi2vPav39a9K4tP+2uL2Ei+5d7FtT3LLZsjo
         feek/DqCGSkJ/EC5woLyU9BBkfLUuQ9/2HiDCk10BMetEfWdor69Slb39NOXES8br02X
         25Btabu9ZCWroyjQj7W5gwGr5Z6Hs2nbnnfAb+e92FalcUD/4ql77lNzRcWGi4/9TT8s
         ntqI2g46Xv+k5LURaRH5CRBpxkkKgzcrioRPYFUHkEgOEWy1hPzg9QPk8ZO35Xm9R9d2
         vl3Q==
X-Gm-Message-State: AOAM531IlYUTVWcMrsTunnxZWB7SKeeOmoZj5mZ1A5tl7N/JlZUueN8L
	tvyRKnvxHr6a5mDaGHN9Tb1N/iCzT0U5oQgRVTxTnj1qFGibRa9+leLQNKX0aGlNg9JiaMfromb
	xyOlCUpVXOlVvchuwTUSTn7rXum+Hh3PWQZm5II/EX+0AkzKqez62Z8U=
X-Received: by 2002:a17:90a:a581:: with SMTP id b1mr32203271pjq.53.1620184589161;
        Tue, 04 May 2021 20:16:29 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJxffoGdRqAjUagWoMVD5p/Lk1KTEDftEhkWh8ewatgDmZLlxh0lO1hxYIdYYwoO5dsJ/i0z
X-Received: by 2002:a17:90a:a581:: with SMTP id b1mr32203198pjq.53.1620184588109;
        Tue, 04 May 2021 20:16:28 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1620184588; cv=none;
        d=google.com; s=arc-20160816;
        b=Fr2b2AMXJr6OeNpSql45tq1korkuDOunp7t+DpARuEBnwvQnKfagyipQ93jywsRf/c
         /i/mP2eTmJwOLWNORClh1MGF/0VfBx1ULoB9W4CI3LpVgGFXGGFis8LTcvUYD5yvhlsV
         50rm2j34iS9lyo04FB/hbhGkwLtUhz2PGkLGuqHspTd+pUpUCf5SLxGJbZC5uCcUEsbO
         8WSDBWyvaCPjFzJQZK60gK70ticKW+fCG1xHtOG4qsFCbqEpFKBy8eVK83OBazo/dQDr
         DOheWNWyw2o/WMP4GpZMvZuj30dx3j8xnBahIpnMIQJaog6wLMcVX9pkQ8UJym3/PGNm
         pO/g==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
        h=user-agent:in-reply-to:message-id:subject:to:from:date
         :dkim-signature;
        bh=LuH7mbm3+zp863vKBEqKeoZtnpuFxYpIb5oTVwf56Es=;
        b=vVN16NPMKjoxSJQ6b36VXFCkZqnmG7wABfilgE069txZqmHpEMyZb8lRStkHy557LM
         Kn7UfJFP3xwsP8ZTCipVDZ6tpFW/hYFU9o4th9G8asWs+MOf9xpWX2LQZ1FTmaao2Fg5
         uCHypz39cnAh0Z1EJfNsTcaTGIrkbBd6zje+mtBgs8hnfH8HcWBYTPCHCCx950Z928tb
         XOPd/Igs7yzD1ioBiGXZj/ciwPbWVTaZXBg4JOZSApxkDMfuMyfyLLOs++EVkyxJHUme
         TmgwvLkixcwEtKF7gIeqEhwvOUSVvilLuJLFVaLumwTcjJ1amVfGcJhBE7LIM9C3SMpA
         rOOg==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@linux-foundation.org header.s=korg header.b="Gdz/3wY9";
       spf=pass (google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org
Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99])
        by mx.google.com with ESMTPS id c85si20173199pfb.8.2021.05.04.20.16.27
        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
        Tue, 04 May 2021 20:16:28 -0700 (PDT)
Received-SPF: pass (google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) client-ip=198.145.29.99;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@linux-foundation.org header.s=korg header.b="Gdz/3wY9";
       spf=pass (google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org
Received: by mail.kernel.org (Postfix) with ESMTPSA id A4DB4610D2;
	Wed,  5 May 2021 03:16:26 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1620184587;
	bh=TxN4wgKcKf2UUem+5pL09m9GL/7U592mEalo2U6vwAU=;
	h=Date:From:To:Subject:In-Reply-To:From;
	b=Gdz/3wY9ktH3hOmn2DAOkfh0JXwPdMJ8xsNQFa9eI25K39Z3iHdRGo9jX3QtMDtog
	 D4Zakt52CQCYsV91c9oCai8KnCTkkAjJq/Ez7p8UHpz97Go3yYYxqg6DDl6d8HCQvN
	 H47dTaZAgeH2sw29bjB9fRzNuTx7k4RAPlqZIpiE=
Date: Tue, 04 May 2021 20:16:26 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, anshuman.khandual@arm.com, aou@eecs.berkeley.edu, arnd@arndb.de, benh@kernel.crashing.org, borntraeger@de.ibm.com, bp@alien8.de, catalin.marinas@arm.com, dalias@libc.org, deller@gmx.de, gor@linux.ibm.com, hca@linux.ibm.com, hpa@zytor.com, James.Bottomley@HansenPartnership.com, linux-mm@kvack.org, linux@armlinux.org.uk, mingo@redhat.com, mm-commits@vger.kernel.org, mpe@ellerman.id.au, palmerdabbelt@google.com, paul.walmsley@sifive.com, paulus@samba.org, tglx@linutronix.de, torvalds@linux-foundation.org, tsbogend@alpha.franken.de, vgupta@synopsys.com, viro@zeniv.linux.org.uk, will@kernel.org, ysato@users.osdn.me
Subject: [patch 103/143] mm: generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS)
Message-ID: <20210505031626.c8o4WL7KE%akpm@linux-foundation.org>
In-Reply-To: <20210504183219.a3cc46aee4013d77402276c5@linux-foundation.org>
User-Agent: s-nail v14.8.16
X-Gm-Original-To: akpm@linux-foundation.org

From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm: generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS)

SYS_SUPPORTS_HUGETLBFS config has duplicate definitions on platforms that
subscribe it.  Instead, just make it a generic option which can be
selected on applicable platforms.  Also rename it as
ARCH_SUPPORTS_HUGETLBFS instead.  This reduces code duplication and makes
it cleaner.
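
To illustrate the pattern (a sketch only, not the patch hunks themselves;
the generic definition below is an assumed form, the real change lives in
the fs/Kconfig hunk of this patch): the option is defined once in common
code, and each architecture that can support hugetlbfs opts in with a
select, possibly behind a condition:

	# defined once in generic code (assumed body; see fs/Kconfig hunk)
	config ARCH_SUPPORTS_HUGETLBFS
		def_bool n

	# per-architecture opt-in, e.g. arch/arm/Kconfig (as in the diff below)
	config ARM
		...
		select ARCH_SUPPORTS_HUGETLBFS if ARM_LPAE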

Link: https://lkml.kernel.org/r/1617259448-22529-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>	[riscv]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm/Kconfig                       |    5 +----
 arch/arm64/Kconfig                     |    4 +---
 arch/mips/Kconfig                      |    6 +-----
 arch/parisc/Kconfig                    |    5 +----
 arch/powerpc/Kconfig                   |    3 ---
 arch/powerpc/platforms/Kconfig.cputype |    6 +++---
 arch/riscv/Kconfig                     |    5 +----
 arch/sh/Kconfig                        |    5 +----
 fs/Kconfig                             |    5 ++++-
 9 files changed, 13 insertions(+), 31 deletions(-)

--- a/arch/arm64/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/arm64/Kconfig
@@ -73,6 +73,7 @@ config ARM64
 	select ARCH_USE_QUEUED_SPINLOCKS
 	select ARCH_USE_SYM_ANNOTATIONS
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
+	select ARCH_SUPPORTS_HUGETLBFS
 	select ARCH_SUPPORTS_MEMORY_FAILURE
 	select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
 	select ARCH_SUPPORTS_LTO_CLANG if CPU_LITTLE_ENDIAN
@@ -1072,9 +1073,6 @@ config HW_PERF_EVENTS
 	def_bool y
 	depends on ARM_PMU
 
-config SYS_SUPPORTS_HUGETLBFS
-	def_bool y
-
 config ARCH_HAS_FILTER_PGPROT
 	def_bool y
 
--- a/arch/arm/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/arm/Kconfig
@@ -31,6 +31,7 @@ config ARM
 	select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
 	select ARCH_SUPPORTS_ATOMIC_RMW
+	select ARCH_SUPPORTS_HUGETLBFS if ARM_LPAE
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_MEMTEST
@@ -1511,10 +1512,6 @@ config HW_PERF_EVENTS
 	def_bool y
 	depends on ARM_PMU
 
-config SYS_SUPPORTS_HUGETLBFS
-       def_bool y
-       depends on ARM_LPAE
-
 config HAVE_ARCH_TRANSPARENT_HUGEPAGE
        def_bool y
        depends on ARM_LPAE
--- a/arch/mips/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/mips/Kconfig
@@ -19,6 +19,7 @@ config MIPS
 	select ARCH_USE_MEMTEST
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
+	select ARCH_SUPPORTS_HUGETLBFS if CPU_SUPPORTS_HUGEPAGES
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select ARCH_WANT_LD_ORPHAN_WARN
@@ -1287,11 +1288,6 @@ config SYS_SUPPORTS_BIG_ENDIAN
 config SYS_SUPPORTS_LITTLE_ENDIAN
 	bool
 
-config SYS_SUPPORTS_HUGETLBFS
-	bool
-	depends on CPU_SUPPORTS_HUGEPAGES
-	default y
-
 config MIPS_HUGE_TLB_SUPPORT
 	def_bool HUGETLB_PAGE || TRANSPARENT_HUGEPAGE
 
--- a/arch/parisc/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/parisc/Kconfig
@@ -12,6 +12,7 @@ config PARISC
 	select ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_NO_SG_CHAIN
+	select ARCH_SUPPORTS_HUGETLBFS if PA20
 	select ARCH_SUPPORTS_MEMORY_FAILURE
 	select DMA_OPS
 	select RTC_CLASS
@@ -138,10 +139,6 @@ config PGTABLE_LEVELS
 	default 3 if 64BIT && PARISC_PAGE_SIZE_4KB
 	default 2
 
-config SYS_SUPPORTS_HUGETLBFS
-	def_bool y if PA20
-
-
 menu "Processor type and features"
 
 choice
--- a/arch/powerpc/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/powerpc/Kconfig
@@ -697,9 +697,6 @@ config ARCH_SPARSEMEM_DEFAULT
 	def_bool y
 	depends on PPC_BOOK3S_64
 
-config SYS_SUPPORTS_HUGETLBFS
-	bool
-
 config ILLEGAL_POINTER_VALUE
 	hex
 	# This is roughly half way between the top of user space and the bottom
--- a/arch/powerpc/platforms/Kconfig.cputype~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/powerpc/platforms/Kconfig.cputype
@@ -40,8 +40,8 @@ config PPC_85xx
 
 config PPC_8xx
 	bool "Freescale 8xx"
+	select ARCH_SUPPORTS_HUGETLBFS
 	select FSL_SOC
-	select SYS_SUPPORTS_HUGETLBFS
 	select PPC_HAVE_KUEP
 	select PPC_HAVE_KUAP
 	select HAVE_ARCH_VMAP_STACK
@@ -95,9 +95,9 @@ config PPC_BOOK3S_64
 	bool "Server processors"
 	select PPC_FPU
 	select PPC_HAVE_PMU_SUPPORT
-	select SYS_SUPPORTS_HUGETLBFS
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
+	select ARCH_SUPPORTS_HUGETLBFS
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select IRQ_WORK
 	select PPC_MM_SLICES
@@ -278,9 +278,9 @@ config FSL_BOOKE
 # this is for common code between PPC32 & PPC64 FSL BOOKE
 config PPC_FSL_BOOK3E
 	bool
+	select ARCH_SUPPORTS_HUGETLBFS if PHYS_64BIT || PPC64
 	select FSL_EMB_PERFMON
 	select PPC_SMP_MUXED_IPI
-	select SYS_SUPPORTS_HUGETLBFS if PHYS_64BIT || PPC64
 	select PPC_DOORBELL
 	default y if FSL_BOOKE
 
--- a/arch/riscv/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/riscv/Kconfig
@@ -30,6 +30,7 @@ config RISCV
 	select ARCH_HAS_STRICT_KERNEL_RWX if MMU
 	select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
+	select ARCH_SUPPORTS_HUGETLBFS if MMU
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
@@ -165,10 +166,6 @@ config ARCH_WANT_GENERAL_HUGETLB
 config ARCH_SUPPORTS_UPROBES
 	def_bool y
 
-config SYS_SUPPORTS_HUGETLBFS
-	depends on MMU
-	def_bool y
-
 config STACKTRACE_SUPPORT
 	def_bool y
 
--- a/arch/sh/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/sh/Kconfig
@@ -101,9 +101,6 @@ config SYS_SUPPORTS_APM_EMULATION
 	bool
 	select ARCH_SUSPEND_POSSIBLE
 
-config SYS_SUPPORTS_HUGETLBFS
-	bool
-
 config SYS_SUPPORTS_SMP
 	bool
 
@@ -175,12 +172,12 @@ config CPU_SH3
 
 config CPU_SH4
 	bool
+	select ARCH_SUPPORTS_HUGETLBFS if MMU
 	select CPU_HAS_INTEVT
 	select CPU_HAS_SR_RB
 	select CPU_HAS_FPU if !CPU_SH4AL_DSP
 	select SH_INTC
 	select SYS_SUPPORTS_SH_TMU
-	select SYS_SUPPORTS_HUGETLBFS if MMU
 
 config CPU_SH4A
 	bool
--- a/fs/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/fs/Kconfig
@@ -223,10 +223,13 @@ config TMPFS_INODE64
 
 	  If unsure, say N.
 
+config ARCH_SUPPORTS_HUGETLBFS
+	def_bool n
+
 config HUGETLBFS
 	bool "HugeTLB file system support"
 	depends on X86 || IA64 || SPARC64 || (S390 && 64BIT) || \
-		   SYS_SUPPORTS_HUGETLBFS || BROKEN
+		   ARCH_SUPPORTS_HUGETLBFS || BROKEN
 	help
 	  hugetlbfs is a filesystem backing for HugeTLB pages, based on
 	  ramfs. For architectures that support it, say Y here and read
_

* Re: incoming
  2021-05-05  3:16   ` incoming Andrew Morton
@ 2021-05-05 17:10     ` Linus Torvalds
  2021-05-05 17:44       ` incoming Andrew Morton
  0 siblings, 1 reply; 281+ messages in thread
From: Linus Torvalds @ 2021-05-05 17:10 UTC (permalink / raw)
  To: Andrew Morton, Konstantin Ryabitsev; +Cc: Linux-MM, mm-commits

On Tue, May 4, 2021 at 8:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Let me resend right now with the same in-reply-to.  Hopefully they will
> land in the correct place.

Well, you re-sent it twice, and I have three copies in my own mailbox,
but they still don't show up on the mm-commits mailing list.

So the list hates them for some odd reason.

I've picked them up locally, but I'm adding Konstantin to the
participants to see if he can see what's up.

Konstantin: patches 103/106/107 are missing on lore out of Andrew's
series of 143. Odd.

             Linus


* Re: incoming
  2021-05-05  1:47 ` incoming Linus Torvalds
@ 2021-05-05  3:16   ` Andrew Morton
  2021-05-05 17:10     ` incoming Linus Torvalds
  0 siblings, 1 reply; 281+ messages in thread
From: Andrew Morton @ 2021-05-05  3:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux-MM, mm-commits

On Tue, 4 May 2021 18:47:19 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, May 4, 2021 at 6:32 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > 143 patches
> 
> Hmm. Only 140 seem to have made it to the list, with 103, 106 and 107 missing.
> 
> Maybe just some mail delay? But at least right now
> 
>    https://lore.kernel.org/mm-commits/
> 
> doesn't show them (and thus 'b4' doesn't work).
> 
> I'll check again later.
> 

Well, that's strange.  I see all three via cc:me, but not on linux-mm or
mm-commits.

Let me resend right now with the same in-reply-to.  Hopefully they will
land in the correct place.


* Re: incoming
  2021-05-05  1:32 incoming Andrew Morton
@ 2021-05-05  1:47 ` Linus Torvalds
  2021-05-05  3:16   ` incoming Andrew Morton
  0 siblings, 1 reply; 281+ messages in thread
From: Linus Torvalds @ 2021-05-05  1:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux-MM, mm-commits

On Tue, May 4, 2021 at 6:32 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 143 patches

Hmm. Only 140 seem to have made it to the list, with 103, 106 and 107 missing.

Maybe just some mail delay? But at least right now

   https://lore.kernel.org/mm-commits/

doesn't show them (and thus 'b4' doesn't work).

I'll check again later.

             Linus


* incoming
@ 2021-05-05  1:32 Andrew Morton
  2021-05-05  1:47 ` incoming Linus Torvalds
  0 siblings, 1 reply; 281+ messages in thread
From: Andrew Morton @ 2021-05-05  1:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits


The remainder of the main mm/ queue.

143 patches, based on 8ca5297e7e38f2dc8c753d33a5092e7be181fff0, plus
previously sent patches.

Subsystems affected by this patch series:

  mm/pagecache
  mm/hugetlb
  mm/userfaultfd
  mm/vmscan
  mm/compaction
  mm/migration
  mm/cma
  mm/ksm
  mm/vmstat
  mm/mmap
  mm/kconfig
  mm/util
  mm/memory-hotplug
  mm/zswap
  mm/zsmalloc
  mm/highmem
  mm/cleanups
  mm/kfence

Subsystem: mm/pagecache

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "Remove nrexceptional tracking", v2:
      mm: introduce and use mapping_empty()
      mm: stop accounting shadow entries
      dax: account DAX entries as nrpages
      mm: remove nrexceptional from inode

    Hugh Dickins <hughd@google.com>:
      mm: remove nrexceptional from inode: remove BUG_ON

Subsystem: mm/hugetlb

    Peter Xu <peterx@redhat.com>:
    Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4:
      hugetlb: pass vma into huge_pte_alloc() and huge_pmd_share()
      hugetlb/userfaultfd: forbid huge pmd sharing when uffd enabled
      mm/hugetlb: move flush_hugetlb_tlb_range() into hugetlb.h
      hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/hugetlb: remove redundant reservation check condition in alloc_huge_page()

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm: generalize HUGETLB_PAGE_SIZE_VARIABLE

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Some cleanups for hugetlb":
      mm/hugetlb: use some helper functions to cleanup code
      mm/hugetlb: optimize the surplus state transfer code in move_hugetlb_state()
      mm/hugetlb_cgroup: remove unnecessary VM_BUG_ON_PAGE in hugetlb_cgroup_migrate()
      mm/hugetlb: simplify the code when alloc_huge_page() failed in hugetlb_no_page()
      mm/hugetlb: avoid calculating fault_mutex_hash in truncate_op case
    Patch series "Cleanup and fixup for khugepaged", v2:
      khugepaged: remove unneeded return value of khugepaged_collapse_pte_mapped_thps()
      khugepaged: reuse the smp_wmb() inside __SetPageUptodate()
      khugepaged: use helper khugepaged_test_exit() in __khugepaged_enter()
      khugepaged: fix wrong result value for trace_mm_collapse_huge_page_isolate()
      mm/huge_memory.c: remove unnecessary local variable ret2
    Patch series "Some cleanups for huge_memory", v3:
      mm/huge_memory.c: rework the function vma_adjust_trans_huge()
      mm/huge_memory.c: make get_huge_zero_page() return bool
      mm/huge_memory.c: rework the function do_huge_pmd_numa_page() slightly
      mm/huge_memory.c: remove redundant PageCompound() check
      mm/huge_memory.c: remove unused macro TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG
      mm/huge_memory.c: use helper function migration_entry_to_page()

    Yanfei Xu <yanfei.xu@windriver.com>:
      mm/khugepaged.c: replace barrier() with READ_ONCE() for a selective variable

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanup for khugepaged":
      khugepaged: use helper function range_in_vma() in collapse_pte_mapped_thp()
      khugepaged: remove unnecessary out label in collapse_huge_page()
      khugepaged: remove meaningless !pte_present() check in khugepaged_scan_pmd()

    Zi Yan <ziy@nvidia.com>:
      mm: huge_memory: a new debugfs interface for splitting THP tests
      mm: huge_memory: debugfs for file-backed THP split

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanup and fixup for hugetlb", v2:
      mm/hugeltb: remove redundant VM_BUG_ON() in region_add()
      mm/hugeltb: simplify the return code of __vma_reservation_common()
      mm/hugeltb: clarify (chg - freed) won't go negative in hugetlb_unreserve_pages()
      mm/hugeltb: handle the error case in hugetlb_fix_reserve_counts()
      mm/hugetlb: remove unused variable pseudo_vma in remove_inode_hugepages()

    Mike Kravetz <mike.kravetz@oracle.com>:
    Patch series "make hugetlb put_page safe for all calling contexts", v5:
      mm/cma: change cma mutex to irq safe spinlock
      hugetlb: no need to drop hugetlb_lock to call cma_release
      hugetlb: add per-hstate mutex to synchronize user adjustments
      hugetlb: create remove_hugetlb_page() to separate functionality
      hugetlb: call update_and_free_page without hugetlb_lock
      hugetlb: change free_pool_huge_page to remove_pool_huge_page
      hugetlb: make free_huge_page irq safe
      hugetlb: add lockdep_assert_held() calls for hugetlb_lock

    Oscar Salvador <osalvador@suse.de>:
    Patch series "Make alloc_contig_range handle Hugetlb pages", v10:
      mm,page_alloc: bail out earlier on -ENOMEM in alloc_contig_migrate_range
      mm,compaction: let isolate_migratepages_{range,block} return error codes
      mm,hugetlb: drop clearing of flag from prep_new_huge_page
      mm,hugetlb: split prep_new_huge_page functionality
      mm: make alloc_contig_range handle free hugetlb pages
      mm: make alloc_contig_range handle in-use hugetlb pages
      mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig

Subsystem: mm/userfaultfd

    Axel Rasmussen <axelrasmussen@google.com>:
    Patch series "userfaultfd: add minor fault handling", v9:
      userfaultfd: add minor fault registration mode
      userfaultfd: disable huge PMD sharing for MINOR registered VMAs
      userfaultfd: hugetlbfs: only compile UFFD helpers if config enabled
      userfaultfd: add UFFDIO_CONTINUE ioctl
      userfaultfd: update documentation to describe minor fault handling
      userfaultfd/selftests: add test exercising minor fault handling

Subsystem: mm/vmscan

    Dave Hansen <dave.hansen@linux.intel.com>:
      mm/vmscan: move RECLAIM* bits to uapi header
      mm/vmscan: replace implicit RECLAIM_ZONE checks with explicit checks

    Yang Shi <shy828301@gmail.com>:
    Patch series "Make shrinker's nr_deferred memcg aware", v10:
      mm: vmscan: use nid from shrink_control for tracepoint
      mm: vmscan: consolidate shrinker_maps handling code
      mm: vmscan: use shrinker_rwsem to protect shrinker_maps allocation
      mm: vmscan: remove memcg_shrinker_map_size
      mm: vmscan: use kvfree_rcu instead of call_rcu
      mm: memcontrol: rename shrinker_map to shrinker_info
      mm: vmscan: add shrinker_info_protected() helper
      mm: vmscan: use a new flag to indicate shrinker is registered
      mm: vmscan: add per memcg shrinker nr_deferred
      mm: vmscan: use per memcg nr_deferred of shrinker
      mm: vmscan: don't need allocate shrinker->nr_deferred for memcg aware shrinkers
      mm: memcontrol: reparent nr_deferred when memcg offline
      mm: vmscan: shrink deferred objects proportional to priority

Subsystem: mm/compaction

    Pintu Kumar <pintu@codeaurora.org>:
      mm/compaction: remove unused variable sysctl_compact_memory

    Charan Teja Reddy <charante@codeaurora.org>:
      mm: compaction: update the COMPACT[STALL|FAIL] events properly

Subsystem: mm/migration

    Minchan Kim <minchan@kernel.org>:
      mm: disable LRU pagevec during the migration temporarily
      mm: replace migrate_[prep|finish] with lru_cache_[disable|enable]
      mm: fs: invalidate BH LRU during page migration

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanup and fixup for mm/migrate.c", v3:
      mm/migrate.c: make putback_movable_page() static
      mm/migrate.c: remove unnecessary rc != MIGRATEPAGE_SUCCESS check in 'else' case
      mm/migrate.c: fix potential indeterminate pte entry in migrate_vma_insert_page()
      mm/migrate.c: use helper migrate_vma_collect_skip() in migrate_vma_collect_hole()
      Revert "mm: migrate: skip shared exec THP for NUMA balancing"

Subsystem: mm/cma

    Minchan Kim <minchan@kernel.org>:
      mm: vmstat: add cma statistics

    Baolin Wang <baolin.wang@linux.alibaba.com>:
      mm: cma: use pr_err_ratelimited for CMA warning

    Liam Mark <lmark@codeaurora.org>:
      mm: cma: add trace events for CMA alloc perf testing

    Minchan Kim <minchan@kernel.org>:
      mm: cma: support sysfs
      mm: cma: add the CMA instance name to cma trace events
      mm: use proper type for cma_[alloc|release]

Subsystem: mm/ksm

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanup and fixup for ksm":
      ksm: remove redundant VM_BUG_ON_PAGE() on stable_tree_search()
      ksm: use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()
      ksm: remove dedicated macro KSM_FLAG_MASK
      ksm: fix potential missing rmap_item for stable_node

    Chengyang Fan <cy.fan@huawei.com>:
      mm/ksm: remove unused parameter from remove_trailing_rmap_items()

Subsystem: mm/vmstat

    Hugh Dickins <hughd@google.com>:
      mm: restore node stat checking in /proc/sys/vm/stat_refresh
      mm: no more EINVAL from /proc/sys/vm/stat_refresh
      mm: /proc/sys/vm/stat_refresh skip checking known negative stats
      mm: /proc/sys/vm/stat_refresh stop checking monotonic numa stats

    Saravanan D <saravanand@fb.com>:
      x86/mm: track linear mapping split events

Subsystem: mm/mmap

    Liam Howlett <liam.howlett@oracle.com>:
      mm/mmap.c: don't unlock VMAs in remap_file_pages()

Subsystem: mm/kconfig

    Anshuman Khandual <anshuman.khandual@arm.com>:
    Patch series "mm: some config cleanups", v2:
      mm: generalize ARCH_HAS_CACHE_LINE_SIZE
      mm: generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS)
      mm: generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE]
      mm: drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION
      mm: drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK
      mm: drop redundant HAVE_ARCH_TRANSPARENT_HUGEPAGE

Subsystem: mm/util

    Joe Perches <joe@perches.com>:
      mm/util.c: reduce mem_dump_obj() object size

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      mm/util.c: fix typo

Subsystem: mm/memory-hotplug

    Pavel Tatashin <pasha.tatashin@soleen.com>:
    Patch series "prohibit pinning pages in ZONE_MOVABLE", v11:
      mm/gup: don't pin migrated cma pages in movable zone
      mm/gup: check every subpage of a compound page during isolation
      mm/gup: return an error on migration failure
      mm/gup: check for isolation errors
      mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN
      mm: apply per-task gfp constraints in fast path
      mm: honor PF_MEMALLOC_PIN for all movable pages
      mm/gup: do not migrate zero page
      mm/gup: migrate pinned pages out of movable zone
      memory-hotplug.rst: add a note about ZONE_MOVABLE and page pinning
      mm/gup: change index type to long as it counts pages
      mm/gup: longterm pin migration cleanup
      selftests/vm: gup_test: fix test flag
      selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages

    Mel Gorman <mgorman@techsingularity.net>:
      mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove

    Oscar Salvador <osalvador@suse.de>:
    Patch series "Allocate memmap from hotadded memory (per device)", v10:
      drivers/base/memory: introduce memory_block_{online,offline}
      mm,memory_hotplug: relax fully spanned sections check

    David Hildenbrand <david@redhat.com>:
      mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count()

    Oscar Salvador <osalvador@suse.de>:
      mm,memory_hotplug: allocate memmap from the added memory range
      acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported
      mm,memory_hotplug: add kernel boot option to enable memmap_on_memory
      x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
      arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE

Subsystem: mm/zswap

    Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
      mm/zswap.c: switch from strlcpy to strscpy

Subsystem: mm/zsmalloc

    zhouchuangao <zhouchuangao@vivo.com>:
      mm/zsmalloc: use BUG_ON instead of if condition followed by BUG.

Subsystem: mm/highmem

    Ira Weiny <ira.weiny@intel.com>:
    Patch series "btrfs: Convert kmap/memset/kunmap to memzero_user()":
      iov_iter: lift memzero_page() to highmem.h
      btrfs: use memzero_page() instead of open coded kmap pattern

    songqiang <songqiang@uniontech.com>:
      mm/highmem.c: fix coding style issue

Subsystem: mm/cleanups

    Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
      mm/mempool: minor coding style tweaks

    Zhang Yunkai <zhang.yunkai@zte.com.cn>:
      mm/process_vm_access.c: remove duplicate include

Subsystem: mm/kfence

    Marco Elver <elver@google.com>:
      kfence: zero guard page after out-of-bounds access
    Patch series "kfence: optimize timer scheduling", v2:
      kfence: await for allocation using wait_event
      kfence: maximize allocation wait timeout duration
      kfence: use power-efficient work queue to run delayed work

 Documentation/ABI/testing/sysfs-kernel-mm-cma     |   25 
 Documentation/admin-guide/kernel-parameters.txt   |   17 
 Documentation/admin-guide/mm/memory-hotplug.rst   |    9 
 Documentation/admin-guide/mm/userfaultfd.rst      |  105 +-
 arch/arc/Kconfig                                  |    9 
 arch/arm/Kconfig                                  |   10 
 arch/arm64/Kconfig                                |   34 
 arch/arm64/mm/hugetlbpage.c                       |    7 
 arch/ia64/Kconfig                                 |   14 
 arch/ia64/mm/hugetlbpage.c                        |    3 
 arch/mips/Kconfig                                 |    6 
 arch/mips/mm/hugetlbpage.c                        |    4 
 arch/parisc/Kconfig                               |    5 
 arch/parisc/mm/hugetlbpage.c                      |    2 
 arch/powerpc/Kconfig                              |   17 
 arch/powerpc/mm/hugetlbpage.c                     |    3 
 arch/powerpc/platforms/Kconfig.cputype            |   16 
 arch/riscv/Kconfig                                |    5 
 arch/s390/Kconfig                                 |   12 
 arch/s390/mm/hugetlbpage.c                        |    2 
 arch/sh/Kconfig                                   |    7 
 arch/sh/mm/Kconfig                                |    8 
 arch/sh/mm/hugetlbpage.c                          |    2 
 arch/sparc/mm/hugetlbpage.c                       |    2 
 arch/x86/Kconfig                                  |   33 
 arch/x86/mm/pat/set_memory.c                      |    8 
 drivers/acpi/acpi_memhotplug.c                    |    5 
 drivers/base/memory.c                             |  105 ++
 fs/Kconfig                                        |    5 
 fs/block_dev.c                                    |    2 
 fs/btrfs/compression.c                            |    5 
 fs/btrfs/extent_io.c                              |   22 
 fs/btrfs/inode.c                                  |   33 
 fs/btrfs/reflink.c                                |    6 
 fs/btrfs/zlib.c                                   |    5 
 fs/btrfs/zstd.c                                   |    5 
 fs/buffer.c                                       |   36 
 fs/dax.c                                          |    8 
 fs/gfs2/glock.c                                   |    3 
 fs/hugetlbfs/inode.c                              |    9 
 fs/inode.c                                        |   11 
 fs/proc/task_mmu.c                                |    3 
 fs/userfaultfd.c                                  |  149 +++
 include/linux/buffer_head.h                       |    4 
 include/linux/cma.h                               |    4 
 include/linux/compaction.h                        |    1 
 include/linux/fs.h                                |    2 
 include/linux/gfp.h                               |    2 
 include/linux/highmem.h                           |    7 
 include/linux/huge_mm.h                           |    3 
 include/linux/hugetlb.h                           |   37 
 include/linux/memcontrol.h                        |   27 
 include/linux/memory.h                            |    8 
 include/linux/memory_hotplug.h                    |   15 
 include/linux/memremap.h                          |    2 
 include/linux/migrate.h                           |   11 
 include/linux/mm.h                                |   28 
 include/linux/mmzone.h                            |   20 
 include/linux/pagemap.h                           |    5 
 include/linux/pgtable.h                           |   12 
 include/linux/sched.h                             |    2 
 include/linux/sched/mm.h                          |   27 
 include/linux/shrinker.h                          |    7 
 include/linux/swap.h                              |   21 
 include/linux/userfaultfd_k.h                     |   55 +
 include/linux/vm_event_item.h                     |    8 
 include/trace/events/cma.h                        |   92 +-
 include/trace/events/migrate.h                    |   25 
 include/trace/events/mmflags.h                    |    7 
 include/uapi/linux/mempolicy.h                    |    7 
 include/uapi/linux/userfaultfd.h                  |   36 
 init/Kconfig                                      |    5 
 kernel/sysctl.c                                   |    2 
 lib/Kconfig.kfence                                |    1 
 lib/iov_iter.c                                    |    8 
 mm/Kconfig                                        |   28 
 mm/Makefile                                       |    6 
 mm/cma.c                                          |   70 +
 mm/cma.h                                          |   25 
 mm/cma_debug.c                                    |    8 
 mm/cma_sysfs.c                                    |  112 ++
 mm/compaction.c                                   |  113 ++
 mm/filemap.c                                      |   24 
 mm/frontswap.c                                    |   12 
 mm/gup.c                                          |  264 +++---
 mm/gup_test.c                                     |   29 
 mm/gup_test.h                                     |    3 
 mm/highmem.c                                      |   11 
 mm/huge_memory.c                                  |  326 +++++++-
 mm/hugetlb.c                                      |  843 ++++++++++++++--------
 mm/hugetlb_cgroup.c                               |    9 
 mm/internal.h                                     |   10 
 mm/kfence/core.c                                  |   61 +
 mm/khugepaged.c                                   |   63 -
 mm/ksm.c                                          |   17 
 mm/list_lru.c                                     |    6 
 mm/memcontrol.c                                   |  137 ---
 mm/memory_hotplug.c                               |  220 +++++
 mm/mempolicy.c                                    |   16 
 mm/mempool.c                                      |    2 
 mm/migrate.c                                      |  103 --
 mm/mlock.c                                        |    4 
 mm/mmap.c                                         |   18 
 mm/oom_kill.c                                     |    2 
 mm/page_alloc.c                                   |   83 +-
 mm/process_vm_access.c                            |    1 
 mm/shmem.c                                        |    2 
 mm/sparse.c                                       |    4 
 mm/swap.c                                         |   69 +
 mm/swap_state.c                                   |    4 
 mm/swapfile.c                                     |    4 
 mm/truncate.c                                     |   19 
 mm/userfaultfd.c                                  |   39 -
 mm/util.c                                         |   26 
 mm/vmalloc.c                                      |    2 
 mm/vmscan.c                                       |  543 +++++++++-----
 mm/vmstat.c                                       |   45 -
 mm/workingset.c                                   |    1 
 mm/zsmalloc.c                                     |    6 
 mm/zswap.c                                        |    2 
 tools/testing/selftests/vm/.gitignore             |    1 
 tools/testing/selftests/vm/Makefile               |    1 
 tools/testing/selftests/vm/gup_test.c             |   38 
 tools/testing/selftests/vm/split_huge_page_test.c |  400 ++++++++++
 tools/testing/selftests/vm/userfaultfd.c          |  164 ++++
 125 files changed, 3596 insertions(+), 1668 deletions(-)



* incoming
@ 2021-04-30  5:52 Andrew Morton
  0 siblings, 0 replies; 281+ messages in thread
From: Andrew Morton @ 2021-04-30  5:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits


A few misc subsystems and some of MM.


178 patches, based on 8ca5297e7e38f2dc8c753d33a5092e7be181fff0.

Subsystems affected by this patch series:

  ia64
  kbuild
  scripts
  sh
  ocfs2
  kfifo
  vfs
  kernel/watchdog
  mm/slab-generic
  mm/slub
  mm/kmemleak
  mm/debug
  mm/pagecache
  mm/msync
  mm/gup
  mm/memremap
  mm/memcg
  mm/pagemap
  mm/mremap
  mm/dma
  mm/sparsemem
  mm/vmalloc
  mm/documentation
  mm/kasan
  mm/initialization
  mm/pagealloc
  mm/memory-failure

Subsystem: ia64

    Zhang Yunkai <zhang.yunkai@zte.com.cn>:
      arch/ia64/kernel/head.S: remove duplicate include

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      arch/ia64/kernel/fsys.S: fix typos
      arch/ia64/include/asm/pgtable.h: minor typo fixes

    Valentin Schneider <valentin.schneider@arm.com>:
      ia64: ensure proper NUMA distance and possible map initialization

    Sergei Trofimovich <slyfox@gentoo.org>:
      ia64: drop unused IA64_FW_EMU ifdef
      ia64: simplify code flow around swiotlb init

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      ia64: trivial spelling fixes

    Sergei Trofimovich <slyfox@gentoo.org>:
      ia64: fix EFI_DEBUG build
      ia64: mca: always make IA64_MCA_DEBUG an expression
      ia64: drop marked broken DISCONTIGMEM and VIRTUAL_MEM_MAP
      ia64: module: fix symbolizer crash on fdescr

Subsystem: kbuild

    Luc Van Oostenryck <luc.vanoostenryck@gmail.com>:
      include/linux/compiler-gcc.h: sparse can do constant folding of __builtin_bswap*()

Subsystem: scripts

    Tom Saeger <tom.saeger@oracle.com>:
      scripts/spelling.txt: add entries for recent discoveries

    Wan Jiabing <wanjiabing@vivo.com>:
      scripts: a new script for checking duplicate struct declaration

Subsystem: sh

    Zhang Yunkai <zhang.yunkai@zte.com.cn>:
      arch/sh/include/asm/tlb.h: remove duplicate include

Subsystem: ocfs2

    Yang Li <yang.lee@linux.alibaba.com>:
      ocfs2: replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTE

    Joseph Qi <joseph.qi@linux.alibaba.com>:
      ocfs2: map flags directly in flags_to_o2dlm()

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      ocfs2: fix a typo

    Jiapeng Chong <jiapeng.chong@linux.alibaba.com>:
      ocfs2/dlm: remove unused function

Subsystem: kfifo

    Dan Carpenter <dan.carpenter@oracle.com>:
      kfifo: fix ternary sign extension bugs

Subsystem: vfs

    Randy Dunlap <rdunlap@infradead.org>:
      vfs: fs_parser: clean up kernel-doc warnings

Subsystem: kernel/watchdog

    Petr Mladek <pmladek@suse.com>:
    Patch series "watchdog/softlockup: Report overall time and some cleanup", v2:
      watchdog: rename __touch_watchdog() to a better descriptive name
      watchdog: explicitly update timestamp when reporting softlockup
      watchdog/softlockup: report the overall time of softlockups
      watchdog/softlockup: remove logic that tried to prevent repeated reports
      watchdog: fix barriers when printing backtraces from all CPUs
      watchdog: cleanup handling of false positives

Subsystem: mm/slab-generic

    Rafael Aquini <aquini@redhat.com>:
      mm/slab_common: provide "slab_merge" option for !IS_ENABLED(CONFIG_SLAB_MERGE_DEFAULT) builds

Subsystem: mm/slub

    Vlastimil Babka <vbabka@suse.cz>:
      mm, slub: enable slub_debug static key when creating cache with explicit debug flags

    Oliver Glitta <glittao@gmail.com>:
      kunit: add a KUnit test for SLUB debugging functionality
      slub: remove resiliency_test() function

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      mm/slub.c: trivial typo fixes

Subsystem: mm/kmemleak

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      mm/kmemleak.c: fix a typo

Subsystem: mm/debug

    Georgi Djakov <georgi.djakov@linaro.org>:
      mm/page_owner: record the timestamp of all pages during free

    zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>:
      mm, page_owner: remove unused parameter in __set_page_owner_handle

    Sergei Trofimovich <slyfox@gentoo.org>:
      mm: page_owner: fetch backtrace only for tracked pages
      mm: page_owner: use kstrtobool() to parse bool option
      mm: page_owner: detect page_owner recursion via task_struct
      mm: page_poison: print page info when corruption is caught

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm/memtest: add ARCH_USE_MEMTEST

Subsystem: mm/pagecache

    Jens Axboe <axboe@kernel.dk>:
    Patch series "Improve IOCB_NOWAIT O_DIRECT reads", v3:
      mm: provide filemap_range_needs_writeback() helper
      mm: use filemap_range_needs_writeback() for O_DIRECT reads
      iomap: use filemap_range_needs_writeback() for O_DIRECT reads

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/filemap: use filemap_read_page in filemap_fault
      mm/filemap: drop check for truncated page after I/O

    Johannes Weiner <hannes@cmpxchg.org>:
      mm: page-writeback: simplify memcg handling in test_clear_page_writeback()

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm: move page_mapping_file to pagemap.h

    Rui Sun <sunrui26@huawei.com>:
      mm/filemap: update stale comment

Subsystem: mm/msync

    Nikita Ermakov <sh1r4s3@mail.si-head.nl>:
      mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start

Subsystem: mm/gup

    Joao Martins <joao.m.martins@oracle.com>:
    Patch series "mm/gup: page unpining improvements", v4:
      mm/gup: add compound page list iterator
      mm/gup: decrement head page once for group of subpages
      mm/gup: add a range variant of unpin_user_pages_dirty_lock()
      RDMA/umem: batch page unpin in __ib_umem_release()

    Yang Shi <shy828301@gmail.com>:
      mm: gup: remove FOLL_SPLIT

Subsystem: mm/memremap

    Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
      mm/memremap.c: fix improper SPDX comment style

Subsystem: mm/memcg

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcontrol: fix kernel stack account

    Shakeel Butt <shakeelb@google.com>:
      memcg: cleanup root memcg checks
      memcg: enable memcg oom-kill for __GFP_NOFAIL

    Johannes Weiner <hannes@cmpxchg.org>:
    Patch series "mm: memcontrol: switch to rstat", v3:
      mm: memcontrol: fix cpuhotplug statistics flushing
      mm: memcontrol: kill mem_cgroup_nodeinfo()
      mm: memcontrol: privatize memcg_page_state query functions
      cgroup: rstat: support cgroup1
      cgroup: rstat: punt root-level optimization to individual controllers
      mm: memcontrol: switch to rstat
      mm: memcontrol: consolidate lruvec stat flushing
      kselftests: cgroup: update kmem test for new vmstat implementation

    Shakeel Butt <shakeelb@google.com>:
      memcg: charge before adding to swapcache on swapin

    Muchun Song <songmuchun@bytedance.com>:
    Patch series "Use obj_cgroup APIs to charge kmem pages", v5:
      mm: memcontrol: slab: fix obtain a reference to a freeing memcg
      mm: memcontrol: introduce obj_cgroup_{un}charge_pages
      mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c
      mm: memcontrol: change ug->dummy_page only if memcg changed
      mm: memcontrol: use obj_cgroup APIs to charge kmem pages
      mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages()
      mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM

    Wan Jiabing <wanjiabing@vivo.com>:
      linux/memcontrol.h: remove duplicate struct declaration

    Johannes Weiner <hannes@cmpxchg.org>:
      mm: page_counter: mitigate consequences of a page_counter underflow

Subsystem: mm/pagemap

    Wang Qing <wangqing@vivo.com>:
      mm/memory.c: do_numa_page(): delete bool "migrated"

    Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
      mm/interval_tree: add comments to improve code readability

    Oscar Salvador <osalvador@suse.de>:
    Patch series "Cleanup and fixups for vmemmap handling", v6:
      x86/vmemmap: drop handling of 4K unaligned vmemmap range
      x86/vmemmap: drop handling of 1GB vmemmap ranges
      x86/vmemmap: handle unpopulated sub-pmd ranges
      x86/vmemmap: optimize for consecutive sections in partial populated PMDs

    Ovidiu Panait <ovidiu.panait@windriver.com>:
      mm, tracing: improve rss_stat tracepoint message

    Christoph Hellwig <hch@lst.de>:
    Patch series "add remap_pfn_range_notrack instead of reinventing it in i915", v2:
      mm: add remap_pfn_range_notrack
      mm: add a io_mapping_map_user helper
      i915: use io_mapping_map_user
      i915: fix remap_io_sg to verify the pgprot

    Huang Ying <ying.huang@intel.com>:
      NUMA balancing: reduce TLB flush via delaying mapping on hint page fault

Subsystem: mm/mremap

    Brian Geffon <bgeffon@google.com>:
    Patch series "mm: Extend MREMAP_DONTUNMAP to non-anonymous mappings", v5:
      mm: extend MREMAP_DONTUNMAP to non-anonymous mappings
      Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio"
      selftests: add a MREMAP_DONTUNMAP selftest for shmem

Subsystem: mm/dma

    Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
      mm/dmapool: switch from strlcpy to strscpy

Subsystem: mm/sparsemem

    Wang Wensheng <wangwensheng4@huawei.com>:
      mm/sparse: add the missing sparse_buffer_fini() in error branch

Subsystem: mm/vmalloc

    Christoph Hellwig <hch@lst.de>:
    Patch series "remap_vmalloc_range cleanups":
      samples/vfio-mdev/mdpy: use remap_vmalloc_range
      mm: unexport remap_vmalloc_range_partial

    Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>:
      mm/vmalloc: use rb_tree instead of list for vread() lookups

    Nicholas Piggin <npiggin@gmail.com>:
    Patch series "huge vmalloc mappings", v13:
      ARM: mm: add missing pud_page define to 2-level page tables
      mm/vmalloc: fix HUGE_VMAP regression by enabling huge pages in vmalloc_to_page
      mm: apply_to_pte_range warn and fail if a large pte is encountered
      mm/vmalloc: rename vmap_*_range vmap_pages_*_range
      mm/ioremap: rename ioremap_*_range to vmap_*_range
      mm: HUGE_VMAP arch support cleanup
      powerpc: inline huge vmap supported functions
      arm64: inline huge vmap supported functions
      x86: inline huge vmap supported functions
      mm/vmalloc: provide fallback arch huge vmap support functions
      mm: move vmap_range from mm/ioremap.c to mm/vmalloc.c
      mm/vmalloc: add vmap_range_noflush variant
      mm/vmalloc: hugepage vmalloc mappings
    Patch series "mm/vmalloc: cleanup after hugepage series", v2:
      mm/vmalloc: remove map_kernel_range
      kernel/dma: remove unnecessary unmap_kernel_range
      powerpc/xive: remove unnecessary unmap_kernel_range
      mm/vmalloc: remove unmap_kernel_range
      mm/vmalloc: improve allocation failure error messages

    Vijayanand Jitta <vjitta@codeaurora.org>:
      mm: vmalloc: prevent use after free in _vm_unmap_aliases

    "Uladzislau Rezki (Sony)" <urezki@gmail.com>:
      lib/test_vmalloc.c: remove two kvfree_rcu() tests
      lib/test_vmalloc.c: add a new 'nr_threads' parameter
      vm/test_vmalloc.sh: adapt for updated driver interface
      mm/vmalloc: refactor the preloading loagic
      mm/vmalloc: remove an empty line

Subsystem: mm/documentation

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/doc: fix fault_flag_allow_retry_first kerneldoc
      mm/doc: fix page_maybe_dma_pinned kerneldoc
      mm/doc: turn fault flags into an enum
      mm/doc: add mm.h and mm_types.h to the mm-api document

    Lukas Bulwahn <lukas.bulwahn@gmail.com>:
    Patch series "kernel-doc and MAINTAINERS clean-up":
      MAINTAINERS: assign pagewalk.h to MEMORY MANAGEMENT
      pagewalk: prefix struct kernel-doc descriptions

Subsystem: mm/kasan

    Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
      mm/kasan: switch from strlcpy to strscpy

    Peter Collingbourne <pcc@google.com>:
      kasan: fix kasan_byte_accessible() to be consistent with actual checks

    Andrey Konovalov <andreyknvl@google.com>:
      kasan: initialize shadow to TAG_INVALID for SW_TAGS
      mm, kasan: don't poison boot memory with tag-based modes
    Patch series "kasan: integrate with init_on_alloc/free", v3:
      arm64: kasan: allow to init memory when setting tags
      kasan: init memory in kasan_(un)poison for HW_TAGS
      kasan, mm: integrate page_alloc init with HW_TAGS
      kasan, mm: integrate slab init_on_alloc with HW_TAGS
      kasan, mm: integrate slab init_on_free with HW_TAGS
      kasan: docs: clean up sections
      kasan: docs: update overview section
      kasan: docs: update usage section
      kasan: docs: update error reports section
      kasan: docs: update boot parameters section
      kasan: docs: update GENERIC implementation details section
      kasan: docs: update SW_TAGS implementation details section
      kasan: docs: update HW_TAGS implementation details section
      kasan: docs: update shadow memory section
      kasan: docs: update ignoring accesses section
      kasan: docs: update tests section

    Walter Wu <walter-zh.wu@mediatek.com>:
      kasan: record task_work_add() call stack

    Andrey Konovalov <andreyknvl@google.com>:
      kasan: detect false-positives in tests

    Zqiang <qiang.zhang@windriver.com>:
      irq_work: record irq_work_queue() call stack

Subsystem: mm/initialization

    Kefeng Wang <wangkefeng.wang@huawei.com>:
      mm: move mem_init_print_info() into mm_init()

Subsystem: mm/pagealloc

    David Hildenbrand <david@redhat.com>:
      mm/page_alloc: drop pr_info_ratelimited() in alloc_contig_range()

    Minchan Kim <minchan@kernel.org>:
      mm: remove lru_add_drain_all in alloc_contig_range

    Yu Zhao <yuzhao@google.com>:
      include/linux/page-flags-layout.h: correctly determine LAST_CPUPID_WIDTH
      include/linux/page-flags-layout.h: cleanups

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "Rationalise __alloc_pages wrappers", v3:
      mm/page_alloc: rename alloc_mask to alloc_gfp
      mm/page_alloc: rename gfp_mask to gfp
      mm/page_alloc: combine __alloc_pages and __alloc_pages_nodemask
      mm/mempolicy: rename alloc_pages_current to alloc_pages
      mm/mempolicy: rewrite alloc_pages documentation
      mm/mempolicy: rewrite alloc_pages_vma documentation
      mm/mempolicy: fix mpol_misplaced kernel-doc

    Minchan Kim <minchan@kernel.org>:
      mm: page_alloc: dump migrate-failed pages

    Geert Uytterhoeven <geert@linux-m68k.org>:
      mm/Kconfig: remove default DISCONTIGMEM_MANUAL

    Kefeng Wang <wangkefeng.wang@huawei.com>:
      mm, page_alloc: avoid page_to_pfn() in move_freepages()

    zhouchuangao <zhouchuangao@vivo.com>:
      mm/page_alloc: duplicate include linux/vmalloc.h

    Mel Gorman <mgorman@techsingularity.net>:
    Patch series "Introduce a bulk order-0 page allocator with two in-tree users", v6:
      mm/page_alloc: rename alloced to allocated
      mm/page_alloc: add a bulk page allocator
      mm/page_alloc: add an array-based interface to the bulk page allocator

    Jesper Dangaard Brouer <brouer@redhat.com>:
      mm/page_alloc: optimize code layout for __alloc_pages_bulk
      mm/page_alloc: inline __rmqueue_pcplist

    Chuck Lever <chuck.lever@oracle.com>:
    Patch series "SUNRPC consumer for the bulk page allocator":
      SUNRPC: set rq_page_end differently
      SUNRPC: refresh rq_pages using a bulk page allocator

    Jesper Dangaard Brouer <brouer@redhat.com>:
      net: page_pool: refactor dma_map into own function page_pool_dma_map
      net: page_pool: use alloc_pages_bulk in refill code path

    Sergei Trofimovich <slyfox@gentoo.org>:
      mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1

    huxiang <huxiang@uniontech.com>:
      mm/page_alloc: redundant definition variables of pfn in for loop

    Mike Rapoport <rppt@linux.ibm.com>:
      mm/mmzone.h: fix existing kernel-doc comments and link them to core-api

Subsystem: mm/memory-failure

    Jane Chu <jane.chu@oracle.com>:
      mm/memory-failure: unnecessary amount of unmapping

 Documentation/admin-guide/kernel-parameters.txt |    7 
 Documentation/admin-guide/mm/transhuge.rst      |    2 
 Documentation/core-api/cachetlb.rst             |    4 
 Documentation/core-api/mm-api.rst               |    6 
 Documentation/dev-tools/kasan.rst               |  355 +++++-----
 Documentation/vm/page_owner.rst                 |    2 
 Documentation/vm/transhuge.rst                  |    5 
 MAINTAINERS                                     |    1 
 arch/Kconfig                                    |   11 
 arch/alpha/mm/init.c