mm-commits Archive on lore.kernel.org
* incoming
@ 2020-01-31  6:10 Andrew Morton
  2020-01-31  6:11 ` [patch 001/118] lib/test_bitmap: correct test data offsets for 32-bit Andrew Morton
                   ` (117 more replies)
  0 siblings, 118 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:10 UTC
  To: Linus Torvalds; +Cc: linux-mm, mm-commits


Most of -mm and quite a number of other subsystems.

MM is fairly quiet this time.  Holidays, I assume.



119 patches, based on 39bed42de2e7d74686a2d5a45638d6a5d7e7d473:

Subsystems affected by this patch series:

  hotfixes
  scripts
  ocfs2
  mm/slub
  mm/kmemleak
  mm/debug
  mm/pagecache
  mm/gup
  mm/swap
  mm/memcg
  mm/pagemap
  mm/tracing
  mm/kasan
  mm/initialization
  mm/pagealloc
  mm/vmscan
  mm/tools
  mm/memblock
  mm/oom-kill
  mm/hugetlb
  mm/migration
  mm/mmap
  mm/memory-hotplug
  mm/zswap
  mm/cleanups
  mm/zram
  misc
  lib
  binfmt
  init
  reiserfs
  exec
  dma-mapping
  kcov

Subsystem: hotfixes

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      lib/test_bitmap: correct test data offsets for 32-bit

    "Theodore Ts'o" <tytso@mit.edu>:
      memcg: fix a crash in wb_workfn when a device disappears

    Dan Carpenter <dan.carpenter@oracle.com>:
      mm/mempolicy.c: fix out of bounds write in mpol_parse_str()

    Pingfan Liu <kernelfans@gmail.com>:
      mm/sparse.c: reset section's mem_map when fully deactivated

    Wei Yang <richardw.yang@linux.intel.com>:
      mm/migrate.c: also overwrite error when it is bigger than zero

    Dan Williams <dan.j.williams@intel.com>:
      mm/memory_hotplug: fix remove_memory() lockdep splat

    Wei Yang <richardw.yang@linux.intel.com>:
      mm: thp: don't need care deferred split queue in memcg charge move path

    Yang Shi <yang.shi@linux.alibaba.com>:
      mm: move_pages: report the number of non-attempted pages

Subsystem: scripts

    Xiong <xndchn@gmail.com>:
      scripts/spelling.txt: add more spellings to spelling.txt

    Luca Ceresoli <luca@lucaceresoli.net>:
      scripts/spelling.txt: add "issus" typo

Subsystem: ocfs2

    Aditya Pakki <pakki001@umn.edu>:
      fs: ocfs: remove unnecessary assertion in dlm_migrate_lockres

    zhengbin <zhengbin13@huawei.com>:
      ocfs2: remove unneeded semicolons

    Masahiro Yamada <masahiroy@kernel.org>:
      ocfs2: make local header paths relative to C files

    Colin Ian King <colin.king@canonical.com>:
      ocfs2/dlm: remove redundant assignment to ret

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider use

    wangyan <wangyan122@huawei.com>:
      ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()
      ocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction

Subsystem: mm/slub

    Yu Zhao <yuzhao@google.com>:
      mm/slub.c: avoid slub allocation while holding list_lock

Subsystem: mm/kmemleak

    He Zhe <zhe.he@windriver.com>:
      mm/kmemleak: turn kmemleak_lock and object->lock to raw_spinlock_t

Subsystem: mm/debug

    Vlastimil Babka <vbabka@suse.cz>:
      mm/debug.c: always print flags in dump_page()

Subsystem: mm/pagecache

    Ira Weiny <ira.weiny@intel.com>:
      mm/filemap.c: clean up filemap_write_and_wait()

Subsystem: mm/gup

    Qiujun Huang <hqjagain@gmail.com>:
      mm: fix gup_pud_range

    Wei Yang <richardw.yang@linux.intel.com>:
      mm/gup.c: use is_vm_hugetlb_page() to check whether to follow huge

    John Hubbard <jhubbard@nvidia.com>:
    Patch series "mm/gup: prereqs to track dma-pinned pages: FOLL_PIN", v12:
      mm/gup: factor out duplicate code from four routines
      mm/gup: move try_get_compound_head() to top, fix minor issues

    Dan Williams <dan.j.williams@intel.com>:
      mm: Cleanup __put_devmap_managed_page() vs ->page_free()

    John Hubbard <jhubbard@nvidia.com>:
      mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
      goldish_pipe: rename local pin_user_pages() routine
      mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
      vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
      mm/gup: allow FOLL_FORCE for get_user_pages_fast()
      IB/umem: use get_user_pages_fast() to pin DMA pages
      media/v4l2-core: set pages dirty upon releasing DMA buffers
      mm/gup: introduce pin_user_pages*() and FOLL_PIN
      goldish_pipe: convert to pin_user_pages() and put_user_page()
      IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
      mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
      drm/via: set FOLL_PIN via pin_user_pages_fast()
      fs/io_uring: set FOLL_PIN via pin_user_pages()
      net/xdp: set FOLL_PIN via pin_user_pages()
      media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page() conversion
      vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
      powerpc: book3s64: convert to pin_user_pages() and put_user_page()
      mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding "1"
      mm, tree-wide: rename put_user_page*() to unpin_user_page*()

Subsystem: mm/swap

    Vasily Averin <vvs@virtuozzo.com>:
      mm/swapfile.c: swap_next should increase position index

Subsystem: mm/memcg

    Kaitao Cheng <pilgrimtao@gmail.com>:
      mm/memcontrol.c: cleanup some useless code

Subsystem: mm/pagemap

    Li Xinhai <lixinhai.lxh@gmail.com>:
      mm/page_vma_mapped.c: explicitly compare pfn for normal, hugetlbfs and THP page

Subsystem: mm/tracing

    Junyong Sun <sunjy516@gmail.com>:
      mm, tracing: print symbol name for kmem_alloc_node call_site events

Subsystem: mm/kasan

    "Gustavo A. R. Silva" <gustavo@embeddedor.com>:
      lib/test_kasan.c: fix memory leak in kmalloc_oob_krealloc_more()

Subsystem: mm/initialization

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      mm/early_ioremap.c: use %pa to print resource_size_t variables

Subsystem: mm/pagealloc

    "Kirill A. Shutemov" <kirill@shutemov.name>:
      mm/page_alloc: skip non present sections on zone initialization

    David Hildenbrand <david@redhat.com>:
      mm: remove the memory isolate notifier
      mm: remove "count" parameter from has_unmovable_pages()

Subsystem: mm/vmscan

    Liu Song <liu.song11@zte.com.cn>:
      mm/vmscan.c: remove unused return value of shrink_node

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/vmscan: remove prefetch_prev_lru_page
      mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE

Subsystem: mm/tools

    Daniel Wagner <dwagner@suse.de>:
      tools/vm/slabinfo: fix sanity checks enabling

Subsystem: mm/memblock

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm/memblock: define memblock_physmem_add()
      memblock: Use __func__ in remaining memblock_dbg() call sites

Subsystem: mm/oom-kill

    David Rientjes <rientjes@google.com>:
      mm, oom: dump stack of victim when reaping failed

Subsystem: mm/hugetlb

    Wei Yang <richardw.yang@linux.intel.com>:
      mm/huge_memory.c: use head to check huge zero page
      mm/huge_memory.c: use head to emphasize the purpose of page
      mm/huge_memory.c: reduce critical section protected by split_queue_lock

Subsystem: mm/migration

    Ralph Campbell <rcampbell@nvidia.com>:
      mm/migrate: remove useless mask of start address
      mm/migrate: clean up some minor coding style
      mm/migrate: add stable check in migrate_vma_insert_page()

    David Rientjes <rientjes@google.com>:
      mm, thp: fix defrag setting if newline is not used

Subsystem: mm/mmap

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/mmap.c: get rid of odd jump labels in find_mergeable_anon_vma()

Subsystem: mm/memory-hotplug

    David Hildenbrand <david@redhat.com>:
    Patch series "mm/memory_hotplug: pass in nid to online_pages()":
      mm/memory_hotplug: pass in nid to online_pages()

    Qian Cai <cai@lca.pw>:
      mm/hotplug: silence a lockdep splat with printk()
      mm/page_isolation: fix potential warning from user

Subsystem: mm/zswap

    Vitaly Wool <vitaly.wool@konsulko.com>:
      mm/zswap.c: add allocation hysteresis if pool limit is hit

    Dan Carpenter <dan.carpenter@oracle.com>:
      zswap: potential NULL dereference on error in init_zswap()

Subsystem: mm/cleanups

    Yu Zhao <yuzhao@google.com>:
      include/linux/mm.h: clean up obsolete check on space in page->flags

    Wei Yang <richardw.yang@linux.intel.com>:
      include/linux/mm.h: remove dead code totalram_pages_set()

    Anshuman Khandual <anshuman.khandual@arm.com>:
      include/linux/memory.h: drop fields 'hw' and 'phys_callback' from struct memory_block

    Hao Lee <haolee.swjtu@gmail.com>:
      mm: fix comments related to node reclaim

Subsystem: mm/zram

    Taejoon Song <taejoon.song@lge.com>:
      zram: try to avoid worst-case scenario on same element pages

    Colin Ian King <colin.king@canonical.com>:
      drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store

Subsystem: misc

    Akinobu Mita <akinobu.mita@gmail.com>:
    Patch series "add header file for kelvin to/from Celsius conversion:
      include/linux/units.h: add helpers for kelvin to/from Celsius conversion
      ACPI: thermal: switch to use <linux/units.h> helpers
      platform/x86: asus-wmi: switch to use <linux/units.h> helpers
      platform/x86: intel_menlow: switch to use <linux/units.h> helpers
      thermal: int340x: switch to use <linux/units.h> helpers
      thermal: intel_pch: switch to use <linux/units.h> helpers
      nvme: hwmon: switch to use <linux/units.h> helpers
      thermal: remove kelvin to/from Celsius conversion helpers from <linux/thermal.h>
      iwlegacy: use <linux/units.h> helpers
      iwlwifi: use <linux/units.h> helpers
      thermal: armada: remove unused TO_MCELSIUS macro
      iio: adc: qcom-vadc-common: use <linux/units.h> helpers

Subsystem: lib

    Mikhail Zaslonko <zaslonko@linux.ibm.com>:
    Patch series "S390 hardware support for kernel zlib", v3:
      lib/zlib: add s390 hardware support for kernel zlib_deflate
      s390/boot: rename HEAP_SIZE due to name collision
      lib/zlib: add s390 hardware support for kernel zlib_inflate
      s390/boot: add dfltcc= kernel command line parameter
      lib/zlib: add zlib_deflate_dfltcc_enabled() function
      btrfs: use larger zlib buffer for s390 hardware compression

    Nathan Chancellor <natechancellor@gmail.com>:
      lib/scatterlist.c: adjust indentation in __sg_alloc_table

    Yury Norov <yury.norov@gmail.com>:
      uapi: rename ext2_swab() to swab() and share globally in swab.h
      lib/find_bit.c: join _find_next_bit{_le}
      lib/find_bit.c: uninline helper _find_next_bit()

Subsystem: binfmt

    Alexey Dobriyan <adobriyan@gmail.com>:
      fs/binfmt_elf.c: smaller code generation around auxv vector fill
      fs/binfmt_elf.c: fix ->start_code calculation
      fs/binfmt_elf.c: don't copy ELF header around
      fs/binfmt_elf.c: better codegen around current->mm
      fs/binfmt_elf.c: make BAD_ADDR() unlikely
      fs/binfmt_elf.c: coredump: allocate core ELF header on stack
      fs/binfmt_elf.c: coredump: delete duplicated overflow check
      fs/binfmt_elf.c: coredump: allow process with empty address space to coredump

Subsystem: init

    Arvind Sankar <nivedita@alum.mit.edu>:
      init/main.c: log arguments and environment passed to init
      init/main.c: remove unnecessary repair_env_string in do_initcall_level
    Patch series "init/main.c: minor cleanup/bugfix of envvar handling", v2:
      init/main.c: fix quoted value handling in unknown_bootoption

    Christophe Leroy <christophe.leroy@c-s.fr>:
      init/main.c: fix misleading "This architecture does not have kernel memory protection" message

Subsystem: reiserfs

    Yunfeng Ye <yeyunfeng@huawei.com>:
      reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()

Subsystem: exec

    Alexey Dobriyan <adobriyan@gmail.com>:
      execve: warn if process starts with executable stack

Subsystem: dma-mapping

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc()

Subsystem: kcov

    Dmitry Vyukov <dvyukov@google.com>:
      kcov: ignore fault-inject and stacktrace

 Documentation/admin-guide/kernel-parameters.txt              |   12 
 Documentation/core-api/index.rst                             |    1 
 Documentation/core-api/pin_user_pages.rst                    |  234 +++++
 Documentation/vm/zswap.rst                                   |   13 
 arch/powerpc/mm/book3s64/iommu_api.c                         |   14 
 arch/s390/boot/compressed/decompressor.c                     |    8 
 arch/s390/boot/ipl_parm.c                                    |   14 
 arch/s390/include/asm/setup.h                                |    7 
 arch/s390/kernel/setup.c                                     |   14 
 drivers/acpi/thermal.c                                       |   34 
 drivers/base/memory.c                                        |   25 
 drivers/block/zram/zram_drv.c                                |   10 
 drivers/gpu/drm/via/via_dmablit.c                            |    6 
 drivers/iio/adc/qcom-vadc-common.c                           |    6 
 drivers/iio/adc/qcom-vadc-common.h                           |    1 
 drivers/infiniband/core/umem.c                               |   21 
 drivers/infiniband/core/umem_odp.c                           |   13 
 drivers/infiniband/hw/hfi1/user_pages.c                      |    4 
 drivers/infiniband/hw/mthca/mthca_memfree.c                  |    8 
 drivers/infiniband/hw/qib/qib_user_pages.c                   |    4 
 drivers/infiniband/hw/qib/qib_user_sdma.c                    |    8 
 drivers/infiniband/hw/usnic/usnic_uiom.c                     |    4 
 drivers/infiniband/sw/siw/siw_mem.c                          |    4 
 drivers/media/v4l2-core/videobuf-dma-sg.c                    |   20 
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h             |    1 
 drivers/net/wireless/intel/iwlegacy/4965-mac.c               |    3 
 drivers/net/wireless/intel/iwlegacy/4965.c                   |   17 
 drivers/net/wireless/intel/iwlegacy/common.h                 |    3 
 drivers/net/wireless/intel/iwlwifi/dvm/dev.h                 |    5 
 drivers/net/wireless/intel/iwlwifi/dvm/devices.c             |    6 
 drivers/nvdimm/pmem.c                                        |    6 
 drivers/nvme/host/hwmon.c                                    |   13 
 drivers/platform/goldfish/goldfish_pipe.c                    |   39 
 drivers/platform/x86/asus-wmi.c                              |    7 
 drivers/platform/x86/intel_menlow.c                          |    9 
 drivers/thermal/armada_thermal.c                             |    2 
 drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c |    7 
 drivers/thermal/intel/intel_pch_thermal.c                    |    3 
 drivers/vfio/vfio_iommu_type1.c                              |   39 
 fs/binfmt_elf.c                                              |  154 +--
 fs/btrfs/compression.c                                       |    2 
 fs/btrfs/zlib.c                                              |  135 ++
 fs/exec.c                                                    |    5 
 fs/fs-writeback.c                                            |    2 
 fs/io_uring.c                                                |    6 
 fs/ocfs2/cluster/quorum.c                                    |    2 
 fs/ocfs2/dlm/Makefile                                        |    2 
 fs/ocfs2/dlm/dlmast.c                                        |    8 
 fs/ocfs2/dlm/dlmcommon.h                                     |    4 
 fs/ocfs2/dlm/dlmconvert.c                                    |    8 
 fs/ocfs2/dlm/dlmdebug.c                                      |    8 
 fs/ocfs2/dlm/dlmdomain.c                                     |    8 
 fs/ocfs2/dlm/dlmlock.c                                       |    8 
 fs/ocfs2/dlm/dlmmaster.c                                     |   10 
 fs/ocfs2/dlm/dlmrecovery.c                                   |   10 
 fs/ocfs2/dlm/dlmthread.c                                     |    8 
 fs/ocfs2/dlm/dlmunlock.c                                     |    8 
 fs/ocfs2/dlmfs/Makefile                                      |    2 
 fs/ocfs2/dlmfs/dlmfs.c                                       |    4 
 fs/ocfs2/dlmfs/userdlm.c                                     |    6 
 fs/ocfs2/dlmglue.c                                           |    2 
 fs/ocfs2/journal.h                                           |    8 
 fs/ocfs2/namei.c                                             |    3 
 fs/reiserfs/stree.c                                          |    3 
 include/linux/backing-dev.h                                  |   10 
 include/linux/bitops.h                                       |    1 
 include/linux/fs.h                                           |    6 
 include/linux/io-mapping.h                                   |    5 
 include/linux/memblock.h                                     |    7 
 include/linux/memory.h                                       |   29 
 include/linux/memory_hotplug.h                               |    3 
 include/linux/mm.h                                           |  116 +-
 include/linux/mmzone.h                                       |    2 
 include/linux/page-isolation.h                               |    8 
 include/linux/swab.h                                         |    1 
 include/linux/thermal.h                                      |   11 
 include/linux/units.h                                        |   84 +
 include/linux/zlib.h                                         |    6 
 include/trace/events/kmem.h                                  |    4 
 include/trace/events/writeback.h                             |   37 
 include/uapi/linux/swab.h                                    |   10 
 include/uapi/linux/sysctl.h                                  |    2 
 init/main.c                                                  |   36 
 kernel/Makefile                                              |    1 
 lib/Kconfig                                                  |    7 
 lib/Makefile                                                 |    2 
 lib/decompress_inflate.c                                     |   13 
 lib/find_bit.c                                               |   82 -
 lib/scatterlist.c                                            |    2 
 lib/test_bitmap.c                                            |    9 
 lib/test_kasan.c                                             |    1 
 lib/zlib_deflate/deflate.c                                   |   85 +
 lib/zlib_deflate/deflate_syms.c                              |    1 
 lib/zlib_deflate/deftree.c                                   |   54 -
 lib/zlib_deflate/defutil.h                                   |  134 ++
 lib/zlib_dfltcc/Makefile                                     |   13 
 lib/zlib_dfltcc/dfltcc.c                                     |   57 +
 lib/zlib_dfltcc/dfltcc.h                                     |  155 +++
 lib/zlib_dfltcc/dfltcc_deflate.c                             |  280 ++++++
 lib/zlib_dfltcc/dfltcc_inflate.c                             |  149 +++
 lib/zlib_dfltcc/dfltcc_syms.c                                |   17 
 lib/zlib_dfltcc/dfltcc_util.h                                |  123 ++
 lib/zlib_inflate/inflate.c                                   |   32 
 lib/zlib_inflate/inflate.h                                   |    8 
 lib/zlib_inflate/infutil.h                                   |   18 
 mm/Makefile                                                  |    1 
 mm/backing-dev.c                                             |    1 
 mm/debug.c                                                   |   18 
 mm/early_ioremap.c                                           |    8 
 mm/filemap.c                                                 |   34 
 mm/gup.c                                                     |  503 ++++++-----
 mm/gup_benchmark.c                                           |    9 
 mm/huge_memory.c                                             |   44 
 mm/kmemleak.c                                                |  112 +-
 mm/memblock.c                                                |   22 
 mm/memcontrol.c                                              |   25 
 mm/memory_hotplug.c                                          |   24 
 mm/mempolicy.c                                               |    6 
 mm/memremap.c                                                |   95 --
 mm/migrate.c                                                 |   77 +
 mm/mmap.c                                                    |   30 
 mm/oom_kill.c                                                |    2 
 mm/page_alloc.c                                              |   83 +
 mm/page_isolation.c                                          |   69 -
 mm/page_vma_mapped.c                                         |   12 
 mm/process_vm_access.c                                       |   32 
 mm/slub.c                                                    |   88 +
 mm/sparse.c                                                  |    2 
 mm/swap.c                                                    |   27 
 mm/swapfile.c                                                |    2 
 mm/vmscan.c                                                  |   24 
 mm/zswap.c                                                   |   88 +
 net/xdp/xdp_umem.c                                           |    4 
 scripts/spelling.txt                                         |   14 
 tools/testing/selftests/vm/gup_benchmark.c                   |    6 
 tools/vm/slabinfo.c                                          |    4 
 136 files changed, 2790 insertions(+), 1358 deletions(-)


* [patch 001/118] lib/test_bitmap: correct test data offsets for 32-bit
  2020-01-31  6:10 incoming Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 002/118] memcg: fix a crash in wb_workfn when a device disappears Andrew Morton
                   ` (116 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC
  To: akpm, andriy.shevchenko, linux-mm, linux, linux, mm-commits,
	stable, torvalds, yury.norov

From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Subject: lib/test_bitmap: correct test data offsets for 32-bit

On 32-bit platforms a long is only 32 bits wide, so indexing the test
array by one element gives the wrong offset for test data that is 64 bits
wide.

Calculate the offset based on BITS_PER_LONG.
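
As a rough userspace illustration of the offset calculation (not the kernel
code itself): the macros below are stand-ins for the kernel macros of the
same names, and chunk_offset() and native_chunk_offset() are hypothetical
helpers introduced only for this sketch.  The word size is passed in as a
parameter so that both the 32-bit and the 64-bit case can be checked on
one machine.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel macros of the same names. */
#define BITS_PER_LONG ((unsigned int)(sizeof(long) * CHAR_BIT))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: index of the first long that belongs to the
 * chunk_idx-th chunk of nbits bits, when each long holds bits_per_long bits.
 */
size_t chunk_offset(unsigned int chunk_idx, unsigned int nbits,
		    unsigned int bits_per_long)
{
	assert(bits_per_long != 0);
	return (size_t)chunk_idx * DIV_ROUND_UP(nbits, bits_per_long);
}

/* Same calculation, using the word size of the machine running the code. */
size_t native_chunk_offset(unsigned int chunk_idx, unsigned int nbits)
{
	return chunk_offset(chunk_idx, nbits, BITS_PER_LONG);
}
```

With nbits = 64, the second chunk starts at index 1 when a long has 64 bits
but at index 2 when a long has 32 bits, which is why a fixed index of 1 is
only correct on 64-bit platforms.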

Link: http://lkml.kernel.org/r/20200109103601.45929-1-andriy.shevchenko@linux.intel.com
Fixes: 30544ed5de43 ("lib/bitmap: introduce bitmap_replace() helper")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/test_bitmap.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/lib/test_bitmap.c~lib-test_bitmap-correct-test-data-offsets-for-32-bit
+++ a/lib/test_bitmap.c
@@ -275,22 +275,23 @@ static void __init test_copy(void)
 static void __init test_replace(void)
 {
 	unsigned int nbits = 64;
+	unsigned int nlongs = DIV_ROUND_UP(nbits, BITS_PER_LONG);
 	DECLARE_BITMAP(bmap, 1024);
 
 	bitmap_zero(bmap, 1024);
-	bitmap_replace(bmap, &exp2[0], &exp2[1], exp2_to_exp3_mask, nbits);
+	bitmap_replace(bmap, &exp2[0 * nlongs], &exp2[1 * nlongs], exp2_to_exp3_mask, nbits);
 	expect_eq_bitmap(bmap, exp3_0_1, nbits);
 
 	bitmap_zero(bmap, 1024);
-	bitmap_replace(bmap, &exp2[1], &exp2[0], exp2_to_exp3_mask, nbits);
+	bitmap_replace(bmap, &exp2[1 * nlongs], &exp2[0 * nlongs], exp2_to_exp3_mask, nbits);
 	expect_eq_bitmap(bmap, exp3_1_0, nbits);
 
 	bitmap_fill(bmap, 1024);
-	bitmap_replace(bmap, &exp2[0], &exp2[1], exp2_to_exp3_mask, nbits);
+	bitmap_replace(bmap, &exp2[0 * nlongs], &exp2[1 * nlongs], exp2_to_exp3_mask, nbits);
 	expect_eq_bitmap(bmap, exp3_0_1, nbits);
 
 	bitmap_fill(bmap, 1024);
-	bitmap_replace(bmap, &exp2[1], &exp2[0], exp2_to_exp3_mask, nbits);
+	bitmap_replace(bmap, &exp2[1 * nlongs], &exp2[0 * nlongs], exp2_to_exp3_mask, nbits);
 	expect_eq_bitmap(bmap, exp3_1_0, nbits);
 }
 
_


* [patch 002/118] memcg: fix a crash in wb_workfn when a device disappears
  2020-01-31  6:10 incoming Andrew Morton
  2020-01-31  6:11 ` [patch 001/118] lib/test_bitmap: correct test data offsets for 32-bit Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 003/118] mm/mempolicy.c: fix out of bounds write in mpol_parse_str() Andrew Morton
                   ` (115 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC
  To: akpm, axboe, clm, linux-mm, mm-commits, stable, tj, torvalds, tytso

From: "Theodore Ts'o" <tytso@mit.edu>
Subject: memcg: fix a crash in wb_workfn when a device disappears

Without memcg, there is a one-to-one mapping between the bdi and
bdi_writeback structures.  In this world, things are fairly
straightforward; the first thing bdi_unregister() does is to shut down the
bdi_writeback structure (or wb), and part of that shutdown ensures that
no other work is queued against the wb, and that the wb is fully drained.

With memcg, however, there is a one-to-many relationship between the bdi
and bdi_writeback structures; that is, there are multiple wb objects which
can all point to a single bdi.  There is a refcount which prevents the bdi
object from being released (and hence, unregistered).  So in theory, the
bdi_unregister() *should* only get called once its refcount goes to zero
(bdi_put will drop the refcount, and when it is zero, release_bdi gets
called, which calls bdi_unregister).

Unfortunately, del_gendisk() in block/genhd.c never got the memo about
the Brave New memcg World, and calls bdi_unregister() directly.  It does
this without informing the file system, or the memcg code, or anything
else.  This causes the root wb associated with the bdi to be unregistered,
but none of the memcg-specific wb's is shut down.  So when one of these
wb's is woken up to do delayed work, it tries to dereference its
wb->bdi->dev to fetch the device name, but unfortunately bdi->dev is now
NULL, thanks to the bdi_unregister() called by del_gendisk().  As a
result, *boom*.

Fortunately, it looks like the rest of the writeback path is perfectly
happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to
create a bdi_dev_name() function which can handle bdi->dev being NULL. 
This also allows us to bulletproof the writeback tracepoints to prevent
them from dereferencing a NULL pointer and crashing the kernel if one is
tracing with memcg's enabled, and an iSCSI device dies or a USB storage
stick is pulled.
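
The idea of a NULL-tolerant name lookup can be sketched in userspace as
follows.  This is only an illustration under stated assumptions: the two
structs are heavily simplified stand-ins for the kernel's struct device and
struct backing_dev_info, the sketch_* names are hypothetical, and the
"(unknown)" fallback string is assumed from the literal the tracepoints
used before this change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel structures (assumption). */
struct device {
	const char *name;
};

struct backing_dev_info {
	struct device *dev;	/* may become NULL once the bdi is unregistered */
};

/* Fallback name, assumed to match the "(unknown)" literal used before. */
const char *sketch_unknown_name = "(unknown)";

/* Return the device name, or the fallback if bdi or bdi->dev is NULL. */
const char *sketch_bdi_dev_name(const struct backing_dev_info *bdi)
{
	if (!bdi || !bdi->dev)
		return sketch_unknown_name;
	return bdi->dev->name;
}

/*
 * Copy the name into a fixed-size buffer, roughly what a tracepoint does
 * with its 32-byte name field.  snprintf() always NUL-terminates.
 */
void sketch_copy_name(char *dst, size_t len,
		      const struct backing_dev_info *bdi)
{
	assert(dst != NULL && len > 0);
	snprintf(dst, len, "%s", sketch_bdi_dev_name(bdi));
}
```

Callers never test bdi->dev themselves; a wb whose device has gone away
simply reports the fallback name instead of dereferencing a NULL pointer.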

The most common way of triggering this will be hot removal of a device
while writeback with memcg enabled is going on.  It was triggering several
times a day in a heavily loaded production environment.

Google Bug Id: 145475544

Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu
Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/fs-writeback.c                |    2 -
 include/linux/backing-dev.h      |   10 +++++++
 include/trace/events/writeback.h |   37 +++++++++++++----------------
 mm/backing-dev.c                 |    1 
 4 files changed, 29 insertions(+), 21 deletions(-)

--- a/fs/fs-writeback.c~memcg-fix-a-crash-in-wb_workfn-when-a-device-disappears
+++ a/fs/fs-writeback.c
@@ -2063,7 +2063,7 @@ void wb_workfn(struct work_struct *work)
 						struct bdi_writeback, dwork);
 	long pages_written;
 
-	set_worker_desc("flush-%s", dev_name(wb->bdi->dev));
+	set_worker_desc("flush-%s", bdi_dev_name(wb->bdi));
 	current->flags |= PF_SWAPWRITE;
 
 	if (likely(!current_is_workqueue_rescuer() ||
--- a/include/linux/backing-dev.h~memcg-fix-a-crash-in-wb_workfn-when-a-device-disappears
+++ a/include/linux/backing-dev.h
@@ -13,6 +13,7 @@
 #include <linux/fs.h>
 #include <linux/sched.h>
 #include <linux/blkdev.h>
+#include <linux/device.h>
 #include <linux/writeback.h>
 #include <linux/blk-cgroup.h>
 #include <linux/backing-dev-defs.h>
@@ -504,4 +505,13 @@ static inline int bdi_rw_congested(struc
 				  (1 << WB_async_congested));
 }
 
+extern const char *bdi_unknown_name;
+
+static inline const char *bdi_dev_name(struct backing_dev_info *bdi)
+{
+	if (!bdi || !bdi->dev)
+		return bdi_unknown_name;
+	return dev_name(bdi->dev);
+}
+
 #endif	/* _LINUX_BACKING_DEV_H */
--- a/include/trace/events/writeback.h~memcg-fix-a-crash-in-wb_workfn-when-a-device-disappears
+++ a/include/trace/events/writeback.h
@@ -67,8 +67,8 @@ DECLARE_EVENT_CLASS(writeback_page_templ
 
 	TP_fast_assign(
 		strscpy_pad(__entry->name,
-			    mapping ? dev_name(inode_to_bdi(mapping->host)->dev) : "(unknown)",
-			    32);
+			    bdi_dev_name(mapping ? inode_to_bdi(mapping->host) :
+					 NULL), 32);
 		__entry->ino = mapping ? mapping->host->i_ino : 0;
 		__entry->index = page->index;
 	),
@@ -111,8 +111,7 @@ DECLARE_EVENT_CLASS(writeback_dirty_inod
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
 
 		/* may be called for files on pseudo FSes w/ unregistered bdi */
-		strscpy_pad(__entry->name,
-			    bdi->dev ? dev_name(bdi->dev) : "(unknown)", 32);
+		strscpy_pad(__entry->name, bdi_dev_name(bdi), 32);
 		__entry->ino		= inode->i_ino;
 		__entry->state		= inode->i_state;
 		__entry->flags		= flags;
@@ -193,7 +192,7 @@ TRACE_EVENT(inode_foreign_history,
 	),
 
 	TP_fast_assign(
-		strncpy(__entry->name, dev_name(inode_to_bdi(inode)->dev), 32);
+		strncpy(__entry->name, bdi_dev_name(inode_to_bdi(inode)), 32);
 		__entry->ino		= inode->i_ino;
 		__entry->cgroup_ino	= __trace_wbc_assign_cgroup(wbc);
 		__entry->history	= history;
@@ -222,7 +221,7 @@ TRACE_EVENT(inode_switch_wbs,
 	),
 
 	TP_fast_assign(
-		strncpy(__entry->name,	dev_name(old_wb->bdi->dev), 32);
+		strncpy(__entry->name,	bdi_dev_name(old_wb->bdi), 32);
 		__entry->ino		= inode->i_ino;
 		__entry->old_cgroup_ino	= __trace_wb_assign_cgroup(old_wb);
 		__entry->new_cgroup_ino	= __trace_wb_assign_cgroup(new_wb);
@@ -255,7 +254,7 @@ TRACE_EVENT(track_foreign_dirty,
 		struct address_space *mapping = page_mapping(page);
 		struct inode *inode = mapping ? mapping->host : NULL;
 
-		strncpy(__entry->name,	dev_name(wb->bdi->dev), 32);
+		strncpy(__entry->name,	bdi_dev_name(wb->bdi), 32);
 		__entry->bdi_id		= wb->bdi->id;
 		__entry->ino		= inode ? inode->i_ino : 0;
 		__entry->memcg_id	= wb->memcg_css->id;
@@ -288,7 +287,7 @@ TRACE_EVENT(flush_foreign,
 	),
 
 	TP_fast_assign(
-		strncpy(__entry->name,	dev_name(wb->bdi->dev), 32);
+		strncpy(__entry->name,	bdi_dev_name(wb->bdi), 32);
 		__entry->cgroup_ino	= __trace_wb_assign_cgroup(wb);
 		__entry->frn_bdi_id	= frn_bdi_id;
 		__entry->frn_memcg_id	= frn_memcg_id;
@@ -318,7 +317,7 @@ DECLARE_EVENT_CLASS(writeback_write_inod
 
 	TP_fast_assign(
 		strscpy_pad(__entry->name,
-			    dev_name(inode_to_bdi(inode)->dev), 32);
+			    bdi_dev_name(inode_to_bdi(inode)), 32);
 		__entry->ino		= inode->i_ino;
 		__entry->sync_mode	= wbc->sync_mode;
 		__entry->cgroup_ino	= __trace_wbc_assign_cgroup(wbc);
@@ -361,9 +360,7 @@ DECLARE_EVENT_CLASS(writeback_work_class
 		__field(ino_t, cgroup_ino)
 	),
 	TP_fast_assign(
-		strscpy_pad(__entry->name,
-			    wb->bdi->dev ? dev_name(wb->bdi->dev) :
-			    "(unknown)", 32);
+		strscpy_pad(__entry->name, bdi_dev_name(wb->bdi), 32);
 		__entry->nr_pages = work->nr_pages;
 		__entry->sb_dev = work->sb ? work->sb->s_dev : 0;
 		__entry->sync_mode = work->sync_mode;
@@ -416,7 +413,7 @@ DECLARE_EVENT_CLASS(writeback_class,
 		__field(ino_t, cgroup_ino)
 	),
 	TP_fast_assign(
-		strscpy_pad(__entry->name, dev_name(wb->bdi->dev), 32);
+		strscpy_pad(__entry->name, bdi_dev_name(wb->bdi), 32);
 		__entry->cgroup_ino = __trace_wb_assign_cgroup(wb);
 	),
 	TP_printk("bdi %s: cgroup_ino=%lu",
@@ -438,7 +435,7 @@ TRACE_EVENT(writeback_bdi_register,
 		__array(char, name, 32)
 	),
 	TP_fast_assign(
-		strscpy_pad(__entry->name, dev_name(bdi->dev), 32);
+		strscpy_pad(__entry->name, bdi_dev_name(bdi), 32);
 	),
 	TP_printk("bdi %s",
 		__entry->name
@@ -463,7 +460,7 @@ DECLARE_EVENT_CLASS(wbc_class,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->name, dev_name(bdi->dev), 32);
+		strscpy_pad(__entry->name, bdi_dev_name(bdi), 32);
 		__entry->nr_to_write	= wbc->nr_to_write;
 		__entry->pages_skipped	= wbc->pages_skipped;
 		__entry->sync_mode	= wbc->sync_mode;
@@ -514,7 +511,7 @@ TRACE_EVENT(writeback_queue_io,
 	),
 	TP_fast_assign(
 		unsigned long *older_than_this = work->older_than_this;
-		strscpy_pad(__entry->name, dev_name(wb->bdi->dev), 32);
+		strscpy_pad(__entry->name, bdi_dev_name(wb->bdi), 32);
 		__entry->older	= older_than_this ?  *older_than_this : 0;
 		__entry->age	= older_than_this ?
 				  (jiffies - *older_than_this) * 1000 / HZ : -1;
@@ -600,7 +597,7 @@ TRACE_EVENT(bdi_dirty_ratelimit,
 	),
 
 	TP_fast_assign(
-		strscpy_pad(__entry->bdi, dev_name(wb->bdi->dev), 32);
+		strscpy_pad(__entry->bdi, bdi_dev_name(wb->bdi), 32);
 		__entry->write_bw	= KBps(wb->write_bandwidth);
 		__entry->avg_write_bw	= KBps(wb->avg_write_bandwidth);
 		__entry->dirty_rate	= KBps(dirty_rate);
@@ -665,7 +662,7 @@ TRACE_EVENT(balance_dirty_pages,
 
 	TP_fast_assign(
 		unsigned long freerun = (thresh + bg_thresh) / 2;
-		strscpy_pad(__entry->bdi, dev_name(wb->bdi->dev), 32);
+		strscpy_pad(__entry->bdi, bdi_dev_name(wb->bdi), 32);
 
 		__entry->limit		= global_wb_domain.dirty_limit;
 		__entry->setpoint	= (global_wb_domain.dirty_limit +
@@ -726,7 +723,7 @@ TRACE_EVENT(writeback_sb_inodes_requeue,
 
 	TP_fast_assign(
 		strscpy_pad(__entry->name,
-			    dev_name(inode_to_bdi(inode)->dev), 32);
+			    bdi_dev_name(inode_to_bdi(inode)), 32);
 		__entry->ino		= inode->i_ino;
 		__entry->state		= inode->i_state;
 		__entry->dirtied_when	= inode->dirtied_when;
@@ -800,7 +797,7 @@ DECLARE_EVENT_CLASS(writeback_single_ino
 
 	TP_fast_assign(
 		strscpy_pad(__entry->name,
-			    dev_name(inode_to_bdi(inode)->dev), 32);
+			    bdi_dev_name(inode_to_bdi(inode)), 32);
 		__entry->ino		= inode->i_ino;
 		__entry->state		= inode->i_state;
 		__entry->dirtied_when	= inode->dirtied_when;
--- a/mm/backing-dev.c~memcg-fix-a-crash-in-wb_workfn-when-a-device-disappears
+++ a/mm/backing-dev.c
@@ -21,6 +21,7 @@ struct backing_dev_info noop_backing_dev
 EXPORT_SYMBOL_GPL(noop_backing_dev_info);
 
 static struct class *bdi_class;
+const char *bdi_unknown_name = "(unknown)";
 
 /*
  * bdi_lock protects bdi_tree and updates to bdi_list. bdi_list has RCU
_

* [patch 003/118] mm/mempolicy.c: fix out of bounds write in mpol_parse_str()
  2020-01-31  6:10 incoming Andrew Morton
  2020-01-31  6:11 ` [patch 001/118] lib/test_bitmap: correct test data offsets for 32-bit Andrew Morton
  2020-01-31  6:11 ` [patch 002/118] memcg: fix a crash in wb_workfn when a device disappears Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 004/118] mm/sparse.c: reset section's mem_map when fully deactivated Andrew Morton
                   ` (114 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: aarcange, akpm, dan.carpenter, hughd, lee.schermerhorn, linux-mm,
	mhocko, mm-commits, stable, torvalds, vbabka

From: Dan Carpenter <dan.carpenter@oracle.com>
Subject: mm/mempolicy.c: fix out of bounds write in mpol_parse_str()

What we are trying to do is change the '=' character to a NUL terminator
and then at the end of the function we restore it back to an '='.  The
problem is there are two error paths where we jump to the end of the
function before we have replaced the '=' with NUL.  We end up putting the
'=' in the wrong place (possibly one element before the start of the
buffer).
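
The ordering problem described above can be reproduced in a small
userspace sketch.  This is not the kernel function: parse_sketch() and its
nodelist_ok and terminate_first parameters are hypothetical names used
only for illustration, and the node list parsing is reduced to a flag that
forces the early error path.

```c
#include <assert.h>
#include <string.h>

/*
 * Userspace model of the terminate/restore pattern ("mode=flags:nodelist").
 * nodelist_ok == 0 forces the early error path; terminate_first selects
 * the patched ordering.  All names here are illustrative only.
 */
static int parse_sketch(char *str, int nodelist_ok, int terminate_first)
{
	char *nodelist, *flags;
	int err = 1;

	assert(str != NULL);
	nodelist = strchr(str, ':');
	flags = strchr(str, '=');

	if (terminate_first && flags)
		*flags++ = '\0';	/* patched order: split before any goto */

	if (nodelist) {
		*nodelist++ = '\0';
		if (!nodelist_ok)
			goto out;	/* early error path */
	}

	if (!terminate_first && flags)
		*flags++ = '\0';	/* old order: skipped by the goto above */

	err = 0;
out:
	/* Restore the separators on the way out. */
	if (nodelist)
		*--nodelist = ':';
	if (flags)
		*--flags = '=';		/* lands one byte early if never advanced */
	return err;
}
```

With the old ordering and a failing node list, the '=' is written one
byte before the real separator; if the string had started with '=', that
byte would be in front of the buffer.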

Link: http://lkml.kernel.org/r/20200115055426.vdjwvry44nfug7yy@kili.mountain
Reported-by: syzbot+e64a13c5369a194d67df@syzkaller.appspotmail.com
Fixes: 095f1fc4ebf3 ("mempolicy: rework shmem mpol parsing and display")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Dmitry Vyukov <dvyukov@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mempolicy.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/mm/mempolicy.c~mm-mempolicyc-fix-out-of-bounds-write-in-mpol_parse_str
+++ a/mm/mempolicy.c
@@ -2821,6 +2821,9 @@ int mpol_parse_str(char *str, struct mem
 	char *flags = strchr(str, '=');
 	int err = 1, mode;
 
+	if (flags)
+		*flags++ = '\0';	/* terminate mode string */
+
 	if (nodelist) {
 		/* NUL-terminate mode or flags string */
 		*nodelist++ = '\0';
@@ -2831,9 +2834,6 @@ int mpol_parse_str(char *str, struct mem
 	} else
 		nodes_clear(nodes);
 
-	if (flags)
-		*flags++ = '\0';	/* terminate mode string */

* [patch 004/118] mm/sparse.c: reset section's mem_map when fully deactivated
  2020-01-31  6:10 incoming Andrew Morton
                   ` (2 preceding siblings ...)
  2020-01-31  6:11 ` [patch 003/118] mm/mempolicy.c: fix out of bounds write in mpol_parse_str() Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 005/118] mm/migrate.c: also overwrite error when it is bigger than zero Andrew Morton
                   ` (113 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, bhe, cai, dan.j.williams, david, k-hagio, kernelfans,
	linux-mm, mhocko, mm-commits, osalvador, stable, torvalds

From: Pingfan Liu <kernelfans@gmail.com>
Subject: mm/sparse.c: reset section's mem_map when fully deactivated

After commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug"),
when a mem section is fully deactivated, section_mem_map still records the
section's start pfn, which is not used any more and will be reassigned
during re-addition.

By analogy with the alloc/free pattern, it is better to clear all fields of
section_mem_map.

Besides this, the stale value breaks the user space tool "makedumpfile" [1],
which assumes that a hot-removed section has a NULL mem_map instead of
checking directly against the SECTION_MARKED_PRESENT bit.  (makedumpfile
would be better off changing that assumption, and needs a patch.)
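
Why encoding a NULL mem_map does not yield zero can be seen in a
simplified userspace model.  The sketch below assumes the encoding simply
subtracts the section's start offset from the mem_map address; it ignores
the flag bits kept in the low bits of section_mem_map, and the constants
and sketch_* names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical values for illustration; real ones depend on the arch. */
#define SKETCH_PAGES_PER_SECTION 32768UL
#define SKETCH_SIZEOF_PAGE       64UL
#define SKETCH_SECTION_BYTES \
	(SKETCH_PAGES_PER_SECTION * SKETCH_SIZEOF_PAGE)

/* Encode: bias the mem_map address by the section's first pfn. */
static unsigned long sketch_encode_mem_map(unsigned long mem_map,
					   unsigned long section_nr)
{
	/* Keep the multiplication below from wrapping. */
	assert(section_nr <= ULONG_MAX / SKETCH_SECTION_BYTES);
	return mem_map - section_nr * SKETCH_SECTION_BYTES;
}

/* Decode: undo the bias to get the mem_map address back. */
static unsigned long sketch_decode_mem_map(unsigned long coded,
					   unsigned long section_nr)
{
	assert(section_nr <= ULONG_MAX / SKETCH_SECTION_BYTES);
	return coded + section_nr * SKETCH_SECTION_BYTES;
}
```

In this model, encoding 0 for any section other than section 0 gives a
non-zero value, so a tool that tests the stored field against NULL would
treat the removed section as still populated.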

The bug can be reproduced on IBM POWERVM by running "drmgr -c mem -r -q 5",
triggering a crash, and saving the vmcore with makedumpfile.

[1]: makedumpfile, commit e73016540293 ("[v1.6.7] Update version")

Link: http://lkml.kernel.org/r/1579487594-28889-1-git-send-email-kernelfans@gmail.com
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/sparse.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/sparse.c~mm-sparse-reset-sections-mem_map-when-fully-deactivated
+++ a/mm/sparse.c
@@ -789,7 +789,7 @@ static void section_deactivate(unsigned
 			ms->usage = NULL;
 		}
 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
-		ms->section_mem_map = sparse_encode_mem_map(NULL, section_nr);
+		ms->section_mem_map = (unsigned long)NULL;
 	}
 
 	if (section_is_early && memmap)
_

* [patch 005/118] mm/migrate.c: also overwrite error when it is bigger than zero
  2020-01-31  6:10 incoming Andrew Morton
                   ` (3 preceding siblings ...)
  2020-01-31  6:11 ` [patch 004/118] mm/sparse.c: reset section's mem_map when fully deactivated Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 006/118] mm/memory_hotplug: fix remove_memory() lockdep splat Andrew Morton
                   ` (112 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, cl, jhubbard, linux-mm, mhocko, mm-commits, richardw.yang,
	stable, torvalds, vbabka, yang.shi

From: Wei Yang <richardw.yang@linux.intel.com>
Subject: mm/migrate.c: also overwrite error when it is bigger than zero

If we get here after successfully adding the page to the list, err is 1 to
indicate that the page is queued in the list.

Current code has two problems:

  * on success, 0 is not returned
  * on error, if add_page_for_migration() returns 1 and the following
    do_move_pages_to_node() sets err1, err1 is not returned since err
    is 1

Both behaviors break the user interface.
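
The two problems can be checked with a userspace model of the last few
lines of the function.  The helper names below are hypothetical; err and
err1 play the same roles as in the description above.

```c
#include <assert.h>
#include <errno.h>

/* err: value on reaching out_flush; err1: result of the final flush. */
static int tail_before_patch(int err, int err1)
{
	if (!err)
		err = err1;
	return err;
}

static int tail_after_patch(int err, int err1)
{
	if (err >= 0)
		err = err1;
	return err;
}

/* Walk through the cases named in the changelog. */
static void tail_sketch_check(void)
{
	/* Page queued (err == 1), flush succeeded: 0 is expected. */
	assert(tail_before_patch(1, 0) == 1);	/* stray 1 leaks out */
	assert(tail_after_patch(1, 0) == 0);

	/* Page queued, flush failed: the flush error must win. */
	assert(tail_before_patch(1, -ENOMEM) == 1);	/* error is lost */
	assert(tail_after_patch(1, -ENOMEM) == -ENOMEM);

	/* An earlier negative error is still preserved. */
	assert(tail_after_patch(-EFAULT, 0) == -EFAULT);
}
```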

Link: http://lkml.kernel.org/r/20200119065753.21694-1-richardw.yang@linux.intel.com
Fixes: e0153fc2c760 ("mm: move_pages: return valid node id in status if the page is already on the target node")
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/migrate.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/migrate.c~mm-migratec-also-overwrite-error-when-it-is-bigger-than-zero
+++ a/mm/migrate.c
@@ -1676,7 +1676,7 @@ out_flush:
 	err1 = do_move_pages_to_node(mm, &pagelist, current_node);
 	if (!err1)
 		err1 = store_status(status, start, current_node, i - start);
-	if (!err)
+	if (err >= 0)
 		err = err1;
 out:
 	return err;
_

* [patch 006/118] mm/memory_hotplug: fix remove_memory() lockdep splat
  2020-01-31  6:10 incoming Andrew Morton
                   ` (4 preceding siblings ...)
  2020-01-31  6:11 ` [patch 005/118] mm/migrate.c: also overwrite error when it is bigger than zero Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 007/118] mm: thp: don't need care deferred split queue in memcg charge move path Andrew Morton
                   ` (111 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, dan.j.williams, dave.hansen, david, linux-mm, mhocko,
	mm-commits, pasha.tatashin, stable, torvalds, vishal.l.verma

From: Dan Williams <dan.j.williams@intel.com>
Subject: mm/memory_hotplug: fix remove_memory() lockdep splat

The daxctl unit test for the dax_kmem driver currently triggers the (false
positive) lockdep splat below.  It results from the fact that
remove_memory_block_devices() is invoked under the mem_hotplug_lock()
causing lockdep entanglements with cpu_hotplug_lock() and sysfs (kernfs
active state tracking).  It is a false positive because the sysfs
attribute path triggering the memory remove is not the same attribute path
associated with the memory-block device.

sysfs_break_active_protection() is not applicable since there is no real
deadlock conflict.  Instead, move memory-block device removal outside the
lock.  The mem_hotplug_lock() is not needed to synchronize the
memory-block device removal vs the page online state; that is already
handled by lock_device_hotplug().  Specifically, lock_device_hotplug() is
sufficient to allow try_remove_memory() to check the offline state of the
memblocks and be assured that any in progress online attempts are flushed
/ blocked by kernfs_drain() / attribute removal.

The add_memory() path safely creates memblock devices under the
mem_hotplug_lock().  There is no kernfs active state synchronization in
the memblock device_register() path, so nothing to fix there.

This change is only possible thanks to the recent change that refactored
memory block device removal out of arch_remove_memory() (commit
4c4b7f9ba948 ("mm/memory_hotplug: remove memory block devices before
arch_remove_memory()")), and David's due diligence tracking down the
guarantees afforded by kernfs_drain().  Not flagged for -stable since this
only impacts ongoing development and lockdep validation, not a runtime
issue.

    ======================================================
    WARNING: possible circular locking dependency detected
    5.5.0-rc3+ #230 Tainted: G           OE
    ------------------------------------------------------
    lt-daxctl/6459 is trying to acquire lock:
    ffff99c7f0003510 (kn->count#241){++++}, at: kernfs_remove_by_name_ns+0x41/0x80

    but task is already holding lock:
    ffffffffa76a5450 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x20/0xe0

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #2 (mem_hotplug_lock.rw_sem){++++}:
           __lock_acquire+0x39c/0x790
           lock_acquire+0xa2/0x1b0
           get_online_mems+0x3e/0xb0
           kmem_cache_create_usercopy+0x2e/0x260
           kmem_cache_create+0x12/0x20
           ptlock_cache_init+0x20/0x28
           start_kernel+0x243/0x547
           secondary_startup_64+0xb6/0xc0

    -> #1 (cpu_hotplug_lock.rw_sem){++++}:
           __lock_acquire+0x39c/0x790
           lock_acquire+0xa2/0x1b0
           cpus_read_lock+0x3e/0xb0
           online_pages+0x37/0x300
           memory_subsys_online+0x17d/0x1c0
           device_online+0x60/0x80
           state_store+0x65/0xd0
           kernfs_fop_write+0xcf/0x1c0
           vfs_write+0xdb/0x1d0
           ksys_write+0x65/0xe0
           do_syscall_64+0x5c/0xa0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe

    -> #0 (kn->count#241){++++}:
           check_prev_add+0x98/0xa40
           validate_chain+0x576/0x860
           __lock_acquire+0x39c/0x790
           lock_acquire+0xa2/0x1b0
           __kernfs_remove+0x25f/0x2e0
           kernfs_remove_by_name_ns+0x41/0x80
           remove_files.isra.0+0x30/0x70
           sysfs_remove_group+0x3d/0x80
           sysfs_remove_groups+0x29/0x40
           device_remove_attrs+0x39/0x70
           device_del+0x16a/0x3f0
           device_unregister+0x16/0x60
           remove_memory_block_devices+0x82/0xb0
           try_remove_memory+0xb5/0x130
           remove_memory+0x26/0x40
           dev_dax_kmem_remove+0x44/0x6a [kmem]
           device_release_driver_internal+0xe4/0x1c0
           unbind_store+0xef/0x120
           kernfs_fop_write+0xcf/0x1c0
           vfs_write+0xdb/0x1d0
           ksys_write+0x65/0xe0
           do_syscall_64+0x5c/0xa0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe

    other info that might help us debug this:

    Chain exists of:
      kn->count#241 --> cpu_hotplug_lock.rw_sem --> mem_hotplug_lock.rw_sem

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(mem_hotplug_lock.rw_sem);
                                   lock(cpu_hotplug_lock.rw_sem);
                                   lock(mem_hotplug_lock.rw_sem);
      lock(kn->count#241);

     *** DEADLOCK ***

No fixes tag as this has been a long standing issue that predated the
addition of kernfs lockdep annotations.
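
The dependency chain in the splat can be modelled with a toy lock-order
graph.  This is only a sketch of the general idea of recording
"acquired while held" edges and refusing one that closes a cycle; it is
not how lockdep is implemented, and every name in it is hypothetical.

```c
#include <assert.h>

enum lock_class { KN_COUNT, CPU_HOTPLUG, MEM_HOTPLUG, NR_CLASSES };

/* dep[a][b] != 0 means "b was acquired while a was held". */
static int dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static int reachable(int from, int to, int *seen)
{
	int next;

	if (from == to)
		return 1;
	seen[from] = 1;
	for (next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return 1;
	return 0;
}

/*
 * Record "acquired while held".  Returns 1 without recording anything
 * if the new edge would close a cycle, 0 otherwise.
 */
static int add_dependency(enum lock_class held, enum lock_class acquired)
{
	int seen[NR_CLASSES] = { 0 };

	assert(held < NR_CLASSES && acquired < NR_CLASSES);
	if (reachable(acquired, held, seen))
		return 1;
	dep[held][acquired] = 1;
	return 0;
}
```

Feeding in kn->count -> cpu_hotplug_lock and cpu_hotplug_lock ->
mem_hotplug_lock, then trying mem_hotplug_lock -> kn->count, reports the
cycle.  Removing the memory-block devices before taking the
mem_hotplug_lock means that last edge is never recorded.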

Link: http://lkml.kernel.org/r/157991441887.2763922.4770790047389427325.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/mm/memory_hotplug.c~mm-memory_hotplug-fix-remove_memory-lockdep-splat
+++ a/mm/memory_hotplug.c
@@ -1764,8 +1764,6 @@ static int __ref try_remove_memory(int n
 
 	BUG_ON(check_hotplug_memory_range(start, size));
 
-	mem_hotplug_begin();

* [patch 007/118] mm: thp: don't need care deferred split queue in memcg charge move path
  2020-01-31  6:10 incoming Andrew Morton
                   ` (5 preceding siblings ...)
  2020-01-31  6:11 ` [patch 006/118] mm/memory_hotplug: fix remove_memory() lockdep splat Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 008/118] mm: move_pages: report the number of non-attempted pages Andrew Morton
                   ` (110 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, hannes, kirill.shutemov, linux-mm, mhocko, mm-commits,
	richardw.yang, rientjes, stable, torvalds, vdavydov.dev,
	yang.shi

From: Wei Yang <richardw.yang@linux.intel.com>
Subject: mm: thp: don't need care deferred split queue in memcg charge move path

If compound is true, the page is a PMD mapped THP, which implies that
it is not linked to any defer list.  So the first code chunk will never
be executed.

For the same reason, it would not be proper to add this page to a
defer list, so the second code chunk is not correct.

Based on this, remove the defer list related code.

[yang.shi@linux.alibaba.com: better patch title]
Link: http://lkml.kernel.org/r/20200117233836.3434-1-richardw.yang@linux.intel.com
Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>    [5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |   18 ------------------
 1 file changed, 18 deletions(-)

--- a/mm/memcontrol.c~mm-thp-remove-the-defer-list-related-code-since-this-will-not-happen
+++ a/mm/memcontrol.c
@@ -5340,14 +5340,6 @@ static int mem_cgroup_move_account(struc
 		__mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages);
 	}
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (compound && !list_empty(page_deferred_list(page))) {
-		spin_lock(&from->deferred_split_queue.split_queue_lock);
-		list_del_init(page_deferred_list(page));
-		from->deferred_split_queue.split_queue_len--;
-		spin_unlock(&from->deferred_split_queue.split_queue_lock);
-	}
-#endif
 	/*
 	 * It is safe to change page->mem_cgroup here because the page
 	 * is referenced, charged, and isolated - we can't race with
@@ -5357,16 +5349,6 @@ static int mem_cgroup_move_account(struc
 	/* caller should have done css_get */
 	page->mem_cgroup = to;
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (compound && list_empty(page_deferred_list(page))) {
-		spin_lock(&to->deferred_split_queue.split_queue_lock);
-		list_add_tail(page_deferred_list(page),
-			      &to->deferred_split_queue.split_queue);
-		to->deferred_split_queue.split_queue_len++;
-		spin_unlock(&to->deferred_split_queue.split_queue_lock);
-	}
-#endif

* [patch 008/118] mm: move_pages: report the number of non-attempted pages
  2020-01-31  6:10 incoming Andrew Morton
                   ` (6 preceding siblings ...)
  2020-01-31  6:11 ` [patch 007/118] mm: thp: don't need care deferred split queue in memcg charge move path Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 009/118] scripts/spelling.txt: add more spellings to spelling.txt Andrew Morton
                   ` (109 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, linux-mm, mhocko, mm-commits, richardw.yang, stable,
	torvalds, yang.shi

From: Yang Shi <yang.shi@linux.alibaba.com>
Subject: mm: move_pages: report the number of non-attempted pages

Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"), the semantics
of move_pages() have changed to return the number of non-migrated pages if
the failures were the result of non-fatal reasons (usually a busy page).
This was an unintentional change that went unnoticed except by LTP tests
which checked for the documented behavior.

There are two ways to deal with this change.  We could go back to the
original behavior and return -EAGAIN whenever migrate_pages is not able to
migrate pages due to non-fatal reasons.  The other option is to simply
continue with the changed semantics and extend the move_pages documentation
to clarify that the call returns either -errno, on invalid input or when
migration simply cannot succeed (e.g.  -ENOMEM, -EBUSY), or the number of
pages that couldn't be migrated due to ephemeral reasons (e.g.  the page is
pinned or locked for other reasons).

This patch implements the second option because this behavior has been in
place for some time without anybody complaining, and new users may already
depend on it.  It also allows slightly easier error handling, as the
caller knows that it is worth retrying when err > 0.

But since with the new semantics the call aborts immediately when
migration fails due to ephemeral reasons, the return value needs to
include the number of non-attempted pages too.
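
The accounting for non-attempted pages is plain arithmetic and can be
sketched in userspace.  The helper name and the int types are
assumptions made for the example; failed stands for a positive return
from the per-node migration step and i for the index at which the loop
aborts.

```c
#include <assert.h>

/*
 * failed:   pages the migration step reported as not migrated (> 0)
 * nr_pages: total number of pages the caller asked to move
 * i:        index of the page being processed when the loop aborts
 *
 * Pages i + 1 .. nr_pages - 1 were never attempted, so they are added
 * to the count reported back to the caller.
 */
static int unmigrated_total(int failed, int nr_pages, int i)
{
	assert(failed > 0 && i >= 0 && i < nr_pages);
	return failed + (nr_pages - i - 1);
}
```

For example, with 10 pages requested and 2 failures seen when the loop
aborts at index 3, the caller is told that 2 + (10 - 3 - 1) = 8 pages
were not migrated.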

Link: http://lkml.kernel.org/r/1580160527-109104-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Cc: <stable@vger.kernel.org>    [4.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/migrate.c |   25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

--- a/mm/migrate.c~mm-move_pages-report-the-number-of-non-attempted-pages
+++ a/mm/migrate.c
@@ -1627,8 +1627,19 @@ static int do_pages_move(struct mm_struc
 			start = i;
 		} else if (node != current_node) {
 			err = do_move_pages_to_node(mm, &pagelist, current_node);
-			if (err)
+			if (err) {
+				/*
+				 * Positive err means the number of failed
+				 * pages to migrate.  Since we are going to
+				 * abort and return the number of non-migrated
+				 * pages, so need to incude the rest of the
+				 * nr_pages that have not been attempted as
+				 * well.
+				 */
+				if (err > 0)
+					err += nr_pages - i - 1;
 				goto out;
+			}
 			err = store_status(status, start, current_node, i - start);
 			if (err)
 				goto out;
@@ -1659,8 +1670,11 @@ static int do_pages_move(struct mm_struc
 			goto out_flush;
 
 		err = do_move_pages_to_node(mm, &pagelist, current_node);
-		if (err)
+		if (err) {
+			if (err > 0)
+				err += nr_pages - i - 1;
 			goto out;
+		}
 		if (i > start) {
 			err = store_status(status, start, current_node, i - start);
 			if (err)
@@ -1674,6 +1688,13 @@ out_flush:
 
 	/* Make sure we do not overwrite the existing error */
 	err1 = do_move_pages_to_node(mm, &pagelist, current_node);
+	/*
+	 * Don't have to report non-attempted pages here since:
+	 *     - If the above loop is done gracefully all pages have been
+	 *       attempted.
+	 *     - If the above loop is aborted it means a fatal error
+	 *       happened, should return ret.
+	 */
 	if (!err1)
 		err1 = store_status(status, start, current_node, i - start);
 	if (err >= 0)
_

* [patch 009/118] scripts/spelling.txt: add more spellings to spelling.txt
  2020-01-31  6:10 incoming Andrew Morton
                   ` (7 preceding siblings ...)
  2020-01-31  6:11 ` [patch 008/118] mm: move_pages: report the number of non-attempted pages Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 010/118] scripts/spelling.txt: add "issus" typo Andrew Morton
                   ` (108 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, chris.paterson2, colin.king, linux-mm, mm-commits,
	paul.walmsley, sboyd, torvalds, xndchn

From: Xiong <xndchn@gmail.com>
Subject: scripts/spelling.txt: add more spellings to spelling.txt

Here are some of the common spelling mistakes and typos that I've found
while fixing up spelling mistakes in the kernel.  Most of them still exist
in more than two source files.
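
For readers unfamiliar with the file, each entry is a single
"typo||correction" line.  The sketch below shows one way a consumer
could split such a line in C.  It is an illustration only, not the code
checkpatch uses, and it assumes that comment lines start with '#'; the
function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Split one "typo||correction" line in place.  Returns 0 and sets *typo
 * and *fix on success, -1 for comment lines, empty lines, or lines
 * without a "||" separator.
 */
static int split_spelling_line(char *line, char **typo, char **fix)
{
	char *sep;

	assert(line && typo && fix);
	line[strcspn(line, "\n")] = '\0';	/* drop a trailing newline */
	if (line[0] == '#' || line[0] == '\0')
		return -1;
	sep = strstr(line, "||");
	if (!sep)
		return -1;
	*sep = '\0';		/* terminate the typo */
	*typo = line;
	*fix = sep + 2;		/* correction starts after "||" */
	return 0;
}
```

Corrections may contain spaces, as in "alot||a lot", which is why the
split is done on the "||" separator rather than on whitespace.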

Link: http://lkml.kernel.org/r/20191229143626.51238-1-xndchn@gmail.com
Signed-off-by: Xiong <xndchn@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Chris Paterson <chris.paterson2@renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/spelling.txt |   13 +++++++++++++
 1 file changed, 13 insertions(+)

--- a/scripts/spelling.txt~scripts-spellingtxt-add-more-spellings-to-spellingtxt
+++ a/scripts/spelling.txt
@@ -39,6 +39,8 @@ accout||account
 accquire||acquire
 accquired||acquired
 accross||across
+accumalate||accumulate
+accumalator||accumulator
 acessable||accessible
 acess||access
 acessing||accessing
@@ -106,6 +108,7 @@ alogrithm||algorithm
 alot||a lot
 alow||allow
 alows||allows
+alreay||already
 alredy||already
 altough||although
 alue||value
@@ -241,6 +244,7 @@ calender||calendar
 calescing||coalescing
 calle||called
 callibration||calibration
+callled||called
 calucate||calculate
 calulate||calculate
 cancelation||cancellation
@@ -311,6 +315,7 @@ compaibility||compatibility
 comparsion||comparison
 compatability||compatibility
 compatable||compatible
+compatibililty||compatibility
 compatibiliy||compatibility
 compatibilty||compatibility
 compatiblity||compatibility
@@ -330,6 +335,7 @@ comunication||communication
 conbination||combination
 conditionaly||conditionally
 conditon||condition
+condtion||condition
 conected||connected
 conector||connector
 connecetd||connected
@@ -388,6 +394,8 @@ dafault||default
 deafult||default
 deamon||daemon
 debouce||debounce
+decendant||descendant
+decendants||descendants
 decompres||decompress
 decsribed||described
 decription||description
@@ -411,11 +419,13 @@ delare||declare
 delares||declares
 delaring||declaring
 delemiter||delimiter
+delievered||delivered
 demodualtor||demodulator
 demension||dimension
 dependancies||dependencies
 dependancy||dependency
 dependant||dependent
+dependend||dependent
 depreacted||deprecated
 depreacte||deprecate
 desactivate||deactivate
@@ -995,6 +1005,7 @@ peice||piece
 pendantic||pedantic
 peprocessor||preprocessor
 perfoming||performing
+perfomring||performing
 peripherial||peripheral
 permissons||permissions
 peroid||period
@@ -1166,6 +1177,8 @@ retreive||retrieve
 retreiving||retrieving
 retrive||retrieve
 retrived||retrieved
+retrun||return
+retun||return
 retuned||returned
 reudce||reduce
 reuest||request
_

* [patch 010/118] scripts/spelling.txt: add "issus" typo
  2020-01-31  6:10 incoming Andrew Morton
                   ` (8 preceding siblings ...)
  2020-01-31  6:11 ` [patch 009/118] scripts/spelling.txt: add more spellings to spelling.txt Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 011/118] fs: ocfs: remove unnecessary assertion in dlm_migrate_lockres Andrew Morton
                   ` (107 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, linux-mm, luca, mm-commits, torvalds

From: Luca Ceresoli <luca@lucaceresoli.net>
Subject: scripts/spelling.txt: add "issus" typo

Add "issus" and correct it as "issues".

Link: http://lkml.kernel.org/r/20200105221950.8384-1-luca@lucaceresoli.net
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/spelling.txt |    1 +
 1 file changed, 1 insertion(+)

--- a/scripts/spelling.txt~scripts-spellingtxt-add-issus-typo
+++ a/scripts/spelling.txt
@@ -801,6 +801,7 @@ ireelevant||irrelevant
 irrelevent||irrelevant
 isnt||isn't
 isssue||issue
+issus||issues
 iternations||iterations
 itertation||iteration
 itslef||itself
_

* [patch 011/118] fs: ocfs: remove unnecessary assertion in dlm_migrate_lockres
  2020-01-31  6:10 incoming Andrew Morton
                   ` (9 preceding siblings ...)
  2020-01-31  6:11 ` [patch 010/118] scripts/spelling.txt: add "issus" typo Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 012/118] ocfs2: remove unneeded semicolons Andrew Morton
                   ` (106 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, gechangwei, ghe, jiangqi903, jlbec, junxiao.bi, linux-mm,
	mark, mm-commits, pakki001, piaojun, torvalds

From: Aditya Pakki <pakki001@umn.edu>
Subject: fs: ocfs: remove unnecessary assertion in dlm_migrate_lockres

dlm_empty_lockres(), the only caller of dlm_migrate_lockres(), already
checks target against O2NM_MAX_NODES.  Thus, the assertion in
dlm_migrate_lockres() is unnecessary and can be removed.  This patch
removes it.

Link: http://lkml.kernel.org/r/20191218194111.26041-1-pakki001@umn.edu
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/dlm/dlmmaster.c |    2 --
 1 file changed, 2 deletions(-)

--- a/fs/ocfs2/dlm/dlmmaster.c~fs-ocfs-remove-unnecessary-assertion-in-dlm_migrate_lockres
+++ a/fs/ocfs2/dlm/dlmmaster.c
@@ -2554,8 +2554,6 @@ static int dlm_migrate_lockres(struct dl
 	if (!dlm_grab(dlm))
 		return -EINVAL;
 
-	BUG_ON(target == O2NM_MAX_NODES);

* [patch 012/118] ocfs2: remove unneeded semicolons
  2020-01-31  6:10 incoming Andrew Morton
                   ` (10 preceding siblings ...)
  2020-01-31  6:11 ` [patch 011/118] fs: ocfs: remove unnecessary assertion in dlm_migrate_lockres Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 013/118] ocfs2: make local header paths relative to C files Andrew Morton
                   ` (105 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, gechangwei, ghe, hulkci, jlbec, joseph.qi, junxiao.bi,
	linux-mm, mark, mm-commits, piaojun, torvalds, zhengbin13

From: zhengbin <zhengbin13@huawei.com>
Subject: ocfs2: remove unneeded semicolons

Fixes coccicheck warnings:

fs/ocfs2/cluster/quorum.c:76:2-3: Unneeded semicolon
fs/ocfs2/dlmglue.c:573:2-3: Unneeded semicolon

Link: http://lkml.kernel.org/r/6ee3aa16-9078-30b1-df3f-22064950bd98@linux.alibaba.com
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/cluster/quorum.c |    2 +-
 fs/ocfs2/dlmglue.c        |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/fs/ocfs2/cluster/quorum.c~ocfs2-remove-unneeded-semicolon
+++ a/fs/ocfs2/cluster/quorum.c
@@ -73,7 +73,7 @@ static void o2quo_fence_self(void)
 		       "system by restarting ***\n");
 		emergency_restart();
 		break;
-	};
+	}
 }
 
 /* Indicate that a timeout occurred on a heartbeat region write. The
--- a/fs/ocfs2/dlmglue.c~ocfs2-remove-unneeded-semicolon
+++ a/fs/ocfs2/dlmglue.c
@@ -570,7 +570,7 @@ void ocfs2_inode_lock_res_init(struct oc
 			mlog_bug_on_msg(1, "type: %d\n", type);
 			ops = NULL; /* thanks, gcc */
 			break;
-	};
+	}
 
 	ocfs2_build_lock_name(type, OCFS2_I(inode)->ip_blkno,
 			      generation, res->l_name);
_


* [patch 013/118] ocfs2: make local header paths relative to C files
  2020-01-31  6:10 incoming Andrew Morton
                   ` (11 preceding siblings ...)
  2020-01-31  6:11 ` [patch 012/118] ocfs2: remove unneeded semicolons Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 014/118] ocfs2/dlm: remove redundant assignment to ret Andrew Morton
                   ` (104 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, gechangwei, ghe, jiangqi903, jlbec, junxiao.bi, linux-mm,
	mark, masahiroy, mm-commits, piaojun, torvalds

From: Masahiro Yamada <masahiroy@kernel.org>
Subject: ocfs2: make local header paths relative to C files

Gang He reports the failure of building fs/ocfs2/ as an external module of
the kernel installed on the system:

 $ cd fs/ocfs2
 $ make -C /lib/modules/`uname -r`/build M=`pwd` modules

To make it work reliably, I'd recommend removing ccflags-y from the
Makefiles and making header paths relative to the C files.  I think this
is the correct usage of the #include "..." directive.

Link: http://lkml.kernel.org/r/20191227022950.14804-1-ghe@suse.com
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Gang He <ghe@suse.com>
Reported-by: Gang He <ghe@suse.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/dlm/Makefile      |    2 --
 fs/ocfs2/dlm/dlmast.c      |    8 ++++----
 fs/ocfs2/dlm/dlmconvert.c  |    8 ++++----
 fs/ocfs2/dlm/dlmdebug.c    |    8 ++++----
 fs/ocfs2/dlm/dlmdomain.c   |    8 ++++----
 fs/ocfs2/dlm/dlmlock.c     |    8 ++++----
 fs/ocfs2/dlm/dlmmaster.c   |    8 ++++----
 fs/ocfs2/dlm/dlmrecovery.c |    8 ++++----
 fs/ocfs2/dlm/dlmthread.c   |    8 ++++----
 fs/ocfs2/dlm/dlmunlock.c   |    8 ++++----
 fs/ocfs2/dlmfs/Makefile    |    2 --
 fs/ocfs2/dlmfs/dlmfs.c     |    4 ++--
 fs/ocfs2/dlmfs/userdlm.c   |    6 +++---
 13 files changed, 41 insertions(+), 45 deletions(-)

--- a/fs/ocfs2/dlm/dlmast.c~ocfs2-make-local-header-paths-relative-to-c-files
+++ a/fs/ocfs2/dlm/dlmast.c
@@ -23,15 +23,15 @@
 #include <linux/spinlock.h>
 
 
-#include "cluster/heartbeat.h"
-#include "cluster/nodemanager.h"
-#include "cluster/tcp.h"
+#include "../cluster/heartbeat.h"
+#include "../cluster/nodemanager.h"
+#include "../cluster/tcp.h"
 
 #include "dlmapi.h"
 #include "dlmcommon.h"
 
 #define MLOG_MASK_PREFIX ML_DLM
-#include "cluster/masklog.h"
+#include "../cluster/masklog.h"
 
 static void dlm_update_lvb(struct dlm_ctxt *dlm, struct dlm_lock_resource *res,
 			   struct dlm_lock *lock);
--- a/fs/ocfs2/dlm/dlmconvert.c~ocfs2-make-local-header-paths-relative-to-c-files
+++ a/fs/ocfs2/dlm/dlmconvert.c
@@ -23,9 +23,9 @@
 #include <linux/spinlock.h>
 
 
-#include "cluster/heartbeat.h"
-#include "cluster/nodemanager.h"
-#include "cluster/tcp.h"
+#include "../cluster/heartbeat.h"
+#include "../cluster/nodemanager.h"
+#include "../cluster/tcp.h"
 
 #include "dlmapi.h"
 #include "dlmcommon.h"
@@ -33,7 +33,7 @@
 #include "dlmconvert.h"
 
 #define MLOG_MASK_PREFIX ML_DLM
-#include "cluster/masklog.h"
+#include "../cluster/masklog.h"
 
 /* NOTE: __dlmconvert_master is the only function in here that
  * needs a spinlock held on entry (res->spinlock) and it is the
--- a/fs/ocfs2/dlm/dlmdebug.c~ocfs2-make-local-header-paths-relative-to-c-files
+++ a/fs/ocfs2/dlm/dlmdebug.c
@@ -17,9 +17,9 @@
 #include <linux/debugfs.h>
 #include <linux/export.h>
 
-#include "cluster/heartbeat.h"
-#include "cluster/nodemanager.h"
-#include "cluster/tcp.h"
+#include "../cluster/heartbeat.h"
+#include "../cluster/nodemanager.h"
+#include "../cluster/tcp.h"
 
 #include "dlmapi.h"
 #include "dlmcommon.h"
@@ -27,7 +27,7 @@
 #include "dlmdebug.h"
 
 #define MLOG_MASK_PREFIX ML_DLM
-#include "cluster/masklog.h"
+#include "../cluster/masklog.h"
 
 static int stringify_lockname(const char *lockname, int locklen, char *buf,
 			      int len);
--- a/fs/ocfs2/dlm/dlmdomain.c~ocfs2-make-local-header-paths-relative-to-c-files
+++ a/fs/ocfs2/dlm/dlmdomain.c
@@ -20,9 +20,9 @@
 #include <linux/debugfs.h>
 #include <linux/sched/signal.h>
 
-#include "cluster/heartbeat.h"
-#include "cluster/nodemanager.h"
-#include "cluster/tcp.h"
+#include "../cluster/heartbeat.h"
+#include "../cluster/nodemanager.h"
+#include "../cluster/tcp.h"
 
 #include "dlmapi.h"
 #include "dlmcommon.h"
@@ -30,7 +30,7 @@
 #include "dlmdebug.h"
 
 #define MLOG_MASK_PREFIX (ML_DLM|ML_DLM_DOMAIN)
-#include "cluster/masklog.h"
+#include "../cluster/masklog.h"
 
 /*
  * ocfs2 node maps are array of long int, which limits to send them freely
--- a/fs/ocfs2/dlm/dlmlock.c~ocfs2-make-local-header-paths-relative-to-c-files
+++ a/fs/ocfs2/dlm/dlmlock.c
@@ -25,9 +25,9 @@
 #include <linux/delay.h>
 
 
-#include "cluster/heartbeat.h"
-#include "cluster/nodemanager.h"
-#include "cluster/tcp.h"
+#include "../cluster/heartbeat.h"
+#include "../cluster/nodemanager.h"
+#include "../cluster/tcp.h"
 
 #include "dlmapi.h"
 #include "dlmcommon.h"
@@ -35,7 +35,7 @@
 #include "dlmconvert.h"
 
 #define MLOG_MASK_PREFIX ML_DLM
-#include "cluster/masklog.h"
+#include "../cluster/masklog.h"
 
 static struct kmem_cache *dlm_lock_cache;
 
--- a/fs/ocfs2/dlm/dlmmaster.c~ocfs2-make-local-header-paths-relative-to-c-files
+++ a/fs/ocfs2/dlm/dlmmaster.c
@@ -25,9 +25,9 @@
 #include <linux/delay.h>
 
 
-#include "cluster/heartbeat.h"
-#include "cluster/nodemanager.h"
-#include "cluster/tcp.h"
+#include "../cluster/heartbeat.h"
+#include "../cluster/nodemanager.h"
+#include "../cluster/tcp.h"
 
 #include "dlmapi.h"
 #include "dlmcommon.h"
@@ -35,7 +35,7 @@
 #include "dlmdebug.h"
 
 #define MLOG_MASK_PREFIX (ML_DLM|ML_DLM_MASTER)
-#include "cluster/masklog.h"
+#include "../cluster/masklog.h"
 
 static void dlm_mle_node_down(struct dlm_ctxt *dlm,
 			      struct dlm_master_list_entry *mle,
--- a/fs/ocfs2/dlm/dlmrecovery.c~ocfs2-make-local-header-paths-relative-to-c-files
+++ a/fs/ocfs2/dlm/dlmrecovery.c
@@ -26,16 +26,16 @@
 #include <linux/delay.h>
 
 
-#include "cluster/heartbeat.h"
-#include "cluster/nodemanager.h"
-#include "cluster/tcp.h"
+#include "../cluster/heartbeat.h"
+#include "../cluster/nodemanager.h"
+#include "../cluster/tcp.h"
 
 #include "dlmapi.h"
 #include "dlmcommon.h"
 #include "dlmdomain.h"
 
 #define MLOG_MASK_PREFIX (ML_DLM|ML_DLM_RECOVERY)
-#include "cluster/masklog.h"
+#include "../cluster/masklog.h"
 
 static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node);
 
--- a/fs/ocfs2/dlm/dlmthread.c~ocfs2-make-local-header-paths-relative-to-c-files
+++ a/fs/ocfs2/dlm/dlmthread.c
@@ -25,16 +25,16 @@
 #include <linux/delay.h>
 
 
-#include "cluster/heartbeat.h"
-#include "cluster/nodemanager.h"
-#include "cluster/tcp.h"
+#include "../cluster/heartbeat.h"
+#include "../cluster/nodemanager.h"
+#include "../cluster/tcp.h"
 
 #include "dlmapi.h"
 #include "dlmcommon.h"
 #include "dlmdomain.h"
 
 #define MLOG_MASK_PREFIX (ML_DLM|ML_DLM_THREAD)
-#include "cluster/masklog.h"
+#include "../cluster/masklog.h"
 
 static int dlm_thread(void *data);
 static void dlm_flush_asts(struct dlm_ctxt *dlm);
--- a/fs/ocfs2/dlm/dlmunlock.c~ocfs2-make-local-header-paths-relative-to-c-files
+++ a/fs/ocfs2/dlm/dlmunlock.c
@@ -23,15 +23,15 @@
 #include <linux/spinlock.h>
 #include <linux/delay.h>
 
-#include "cluster/heartbeat.h"
-#include "cluster/nodemanager.h"
-#include "cluster/tcp.h"
+#include "../cluster/heartbeat.h"
+#include "../cluster/nodemanager.h"
+#include "../cluster/tcp.h"
 
 #include "dlmapi.h"
 #include "dlmcommon.h"
 
 #define MLOG_MASK_PREFIX ML_DLM
-#include "cluster/masklog.h"
+#include "../cluster/masklog.h"
 
 #define DLM_UNLOCK_FREE_LOCK           0x00000001
 #define DLM_UNLOCK_CALL_AST            0x00000002
--- a/fs/ocfs2/dlmfs/dlmfs.c~ocfs2-make-local-header-paths-relative-to-c-files
+++ a/fs/ocfs2/dlmfs/dlmfs.c
@@ -33,11 +33,11 @@
 
 #include <linux/uaccess.h>
 
-#include "stackglue.h"
+#include "../stackglue.h"
 #include "userdlm.h"
 
 #define MLOG_MASK_PREFIX ML_DLMFS
-#include "cluster/masklog.h"
+#include "../cluster/masklog.h"
 
 
 static const struct super_operations dlmfs_ops;
--- a/fs/ocfs2/dlmfs/Makefile~ocfs2-make-local-header-paths-relative-to-c-files
+++ a/fs/ocfs2/dlmfs/Makefile
@@ -1,6 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-ccflags-y := -I $(srctree)/$(src)/..
-
 obj-$(CONFIG_OCFS2_FS) += ocfs2_dlmfs.o
 
 ocfs2_dlmfs-objs := userdlm.o dlmfs.o
--- a/fs/ocfs2/dlmfs/userdlm.c~ocfs2-make-local-header-paths-relative-to-c-files
+++ a/fs/ocfs2/dlmfs/userdlm.c
@@ -21,12 +21,12 @@
 #include <linux/types.h>
 #include <linux/crc32.h>
 
-#include "ocfs2_lockingver.h"
-#include "stackglue.h"
+#include "../ocfs2_lockingver.h"
+#include "../stackglue.h"
 #include "userdlm.h"
 
 #define MLOG_MASK_PREFIX ML_DLMFS
-#include "cluster/masklog.h"
+#include "../cluster/masklog.h"
 
 
 static inline struct user_lock_res *user_lksb_to_lock_res(struct ocfs2_dlm_lksb *lksb)
--- a/fs/ocfs2/dlm/Makefile~ocfs2-make-local-header-paths-relative-to-c-files
+++ a/fs/ocfs2/dlm/Makefile
@@ -1,6 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-ccflags-y := -I $(srctree)/$(src)/..


* [patch 014/118] ocfs2/dlm: remove redundant assignment to ret
  2020-01-31  6:10 incoming Andrew Morton
                   ` (12 preceding siblings ...)
  2020-01-31  6:11 ` [patch 013/118] ocfs2: make local header paths relative to C files Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 015/118] ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider use Andrew Morton
                   ` (103 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, colin.king, gechangwei, ghe, jlbec, joseph.qi, junxiao.bi,
	linux-mm, mark, mm-commits, piaojun, torvalds

From: Colin Ian King <colin.king@canonical.com>
Subject: ocfs2/dlm: remove redundant assignment to ret

The variable ret is being initialized with a value that is never read and
it is being updated later with a new value.  The initialization is
redundant and can be removed.

Addresses Coverity ("Unused value")
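
As a rough illustration only (this is not the kernel code), the pattern
Coverity flags can be sketched in user-space C.  send_request() and
do_requery() are hypothetical names made up for this sketch; the point is
that ret is always assigned before it is read, so an initializer would be
a dead store.

```c
#include <errno.h>

/* Hypothetical stand-in for the call whose result ends up in ret. */
static inline int send_request(int nodenum)
{
	return nodenum < 0 ? -EINVAL : 0;
}

/* Hypothetical caller, loosely shaped like a requery function. */
static inline int do_requery(int nodenum)
{
	int ret;	/* no initializer: it would be overwritten before any read */

	ret = send_request(nodenum);
	if (ret < 0)
		return ret;

	return 0;
}
```

Writing "int ret = -EINVAL;" here would not change behavior, which is why
the initializer can be dropped.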

Link: http://lkml.kernel.org/r/20191202164833.62865-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/dlm/dlmrecovery.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/ocfs2/dlm/dlmrecovery.c~ocfs2-dlm-remove-redundant-assignment-to-ret
+++ a/fs/ocfs2/dlm/dlmrecovery.c
@@ -1668,7 +1668,7 @@ static int dlm_lockres_master_requery(st
 int dlm_do_master_requery(struct dlm_ctxt *dlm, struct dlm_lock_resource *res,
 			  u8 nodenum, u8 *real_master)
 {
-	int ret = -EINVAL;
+	int ret;
 	struct dlm_master_requery req;
 	int status = DLM_LOCK_RES_OWNER_UNKNOWN;
 
_


* [patch 015/118] ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider use
  2020-01-31  6:10 incoming Andrew Morton
                   ` (13 preceding siblings ...)
  2020-01-31  6:11 ` [patch 014/118] ocfs2/dlm: remove redundant assignment to ret Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 016/118] ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans() Andrew Morton
                   ` (102 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, gechangwei, ghe, jlbec, joseph.qi,
	junxiao.bi, linux-mm, mark, mm-commits, piaojun, skalluru,
	torvalds

From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Subject: ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider use

There are already users of the BITS_TO_BYTES() macro, and there will be
more.  Move it to bitops.h for wider use.

In the case of ocfs2 the replacement is identical.

As for bnx2x, there are two places where the floor version is used.  In
the first case it calculates the number of structures that can fit in one
memory page.  Here the ceiling variant is obviously correct, and the
original code might have a potential bug if the number of bits % 8 is not
0.  In the second case the macro is used to calculate the bytes
transmitted in one microsecond.  This works without any change for all
speeds that are a multiple of 1Gbps; for the rest the new code gives the
ceiling value.  For instance, 100Mbps gives 13 bytes, while the old code
gives 12 bytes and the arithmetically correct value is 12.5 bytes.  The
value is then used to set up a timer threshold, which in any case has its
own margins due to its resolution.  I don't see an issue with slightly
shifting thresholds for low speed connections; the card is supposed to
utilize the highest available rate, which is usually 10Gbps.
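
For illustration only (this is not part of the patch), the rounding
difference between the two variants can be reproduced in user-space C.
The two macro bodies are taken from the definitions this patch removes;
the _FLOOR/_CEIL suffixes and the bits_per_usec() helper are made up for
this sketch.

```c
#define BITS_PER_BYTE 8

/* Floor variant, as formerly defined in bnx2x_init.h. */
#define BITS_TO_BYTES_FLOOR(x) ((x) / 8)

/* Ceiling variant, as formerly defined in dlmcommon.h. */
#define BITS_TO_BYTES_CEIL(bits) (((bits) + BITS_PER_BYTE - 1) / BITS_PER_BYTE)

/* Hypothetical helper: bits sent in one microsecond at a rate given in Mbps. */
static inline unsigned int bits_per_usec(unsigned int mbps)
{
	/* 1 Mbps is 1,000,000 bits per second, i.e. 1 bit per microsecond. */
	return mbps;
}
```

With these definitions, BITS_TO_BYTES_FLOOR(bits_per_usec(100)) evaluates
to 12 and BITS_TO_BYTES_CEIL(bits_per_usec(100)) to 13, while both give
125 for 1000 Mbps.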

Link: http://lkml.kernel.org/r/20200108121316.22411-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h |    1 -
 fs/ocfs2/dlm/dlmcommon.h                         |    4 ----
 include/linux/bitops.h                           |    1 +
 3 files changed, 1 insertion(+), 5 deletions(-)

--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h~ocfs2-dlm-move-bits_to_bytes-to-bitopsh-for-wider-use
+++ a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h
@@ -296,7 +296,6 @@ static inline void bnx2x_dcb_config_qm(s
  *    possible, the driver should only write the valid vnics into the internal
  *    ram according to the appropriate port mode.
  */
-#define BITS_TO_BYTES(x) ((x)/8)
 
 /* CMNG constants, as derived from system spec calculations */
 
--- a/fs/ocfs2/dlm/dlmcommon.h~ocfs2-dlm-move-bits_to_bytes-to-bitopsh-for-wider-use
+++ a/fs/ocfs2/dlm/dlmcommon.h
@@ -688,10 +688,6 @@ struct dlm_begin_reco
 	__be32 pad2;
 };
 
-
-#define BITS_PER_BYTE 8
-#define BITS_TO_BYTES(bits) (((bits)+BITS_PER_BYTE-1)/BITS_PER_BYTE)


* [patch 016/118] ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (14 preceding siblings ...)
  2020-01-31  6:11 ` [patch 015/118] ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider use Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 017/118] ocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction Andrew Morton
                   ` (101 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, gechangwei, ghe, jiangqi903, jlbec, junxiao.bi, linux-mm,
	mark, mm-commits, piaojun, torvalds, wangyan122

From: wangyan <wangyan122@huawei.com>
Subject: ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()

I found a NULL pointer dereference in ocfs2_update_inode_fsync_trans().
handle->h_transaction may be NULL in this situation:

ocfs2_file_write_iter
  ->__generic_file_write_iter
      ->generic_perform_write
        ->ocfs2_write_begin
          ->ocfs2_write_begin_nolock
            ->ocfs2_write_cluster_by_desc
              ->ocfs2_write_cluster
                ->ocfs2_mark_extent_written
                  ->ocfs2_change_extent_flag
                    ->ocfs2_split_extent
                      ->ocfs2_try_to_merge_extent
                        ->ocfs2_extend_rotate_transaction
                          ->ocfs2_extend_trans
                            ->jbd2_journal_restart
                              ->jbd2__journal_restart
                                // handle->h_transaction is NULL here
                                ->handle->h_transaction = NULL;
                                ->start_this_handle
                                  /* journal aborted due to storage
                                     network disconnection, return error */
                                  ->return -EROFS;
                         /* line 3806 in ocfs2_try_to_merge_extent (),
                            it will ignore ret error. */
                        ->ret = 0;
        ->...
        ->ocfs2_write_end
          ->ocfs2_write_end_nolock
            ->ocfs2_update_inode_fsync_trans
              // NULL pointer dereference
              ->oi->i_sync_tid = handle->h_transaction->t_tid;

The oops from the NULL pointer dereference is as follows:
    JBD2: Detected IO errors while flushing file data on dm-11-45
    Aborting journal on device dm-11-45.
    JBD2: Error -5 detected when updating journal superblock for dm-11-45.
    (dd,22081,3):ocfs2_extend_trans:474 ERROR: status = -30
    (dd,22081,3):ocfs2_try_to_merge_extent:3877 ERROR: status = -30
    Unable to handle kernel NULL pointer dereference at
    virtual address 0000000000000008
    Mem abort info:
      ESR = 0x96000004
      Exception class = DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
    Data abort info:
      ISV = 0, ISS = 0x00000004
      CM = 0, WnR = 0
    user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000e74e1338
    [0000000000000008] pgd=0000000000000000
    Internal error: Oops: 96000004 [#1] SMP
    Process dd (pid: 22081, stack limit = 0x00000000584f35a9)
    CPU: 3 PID: 22081 Comm: dd Kdump: loaded
    Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
    pstate: 60400009 (nZCv daif +PAN -UAO)
    pc : ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2]
    lr : ocfs2_write_end_nolock+0x2a0/0x550 [ocfs2]
    sp : ffff0000459fba70
    x29: ffff0000459fba70 x28: 0000000000000000
    x27: ffff807ccf7f1000 x26: 0000000000000001
    x25: ffff807bdff57970 x24: ffff807caf1d4000
    x23: ffff807cc79e9000 x22: 0000000000001000
    x21: 000000006c6cd000 x20: ffff0000091d9000
    x19: ffff807ccb239db0 x18: ffffffffffffffff
    x17: 000000000000000e x16: 0000000000000007
    x15: ffff807c5e15bd78 x14: 0000000000000000
    x13: 0000000000000000 x12: 0000000000000000
    x11: 0000000000000000 x10: 0000000000000001
    x9 : 0000000000000228 x8 : 000000000000000c
    x7 : 0000000000000fff x6 : ffff807a308ed6b0
    x5 : ffff7e01f10967c0 x4 : 0000000000000018
    x3 : d0bc661572445600 x2 : 0000000000000000
    x1 : 000000001b2e0200 x0 : 0000000000000000
    Call trace:
     ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2]
     ocfs2_write_end+0x4c/0x80 [ocfs2]
     generic_perform_write+0x108/0x1a8
     __generic_file_write_iter+0x158/0x1c8
     ocfs2_file_write_iter+0x668/0x950 [ocfs2]
     __vfs_write+0x11c/0x190
     vfs_write+0xac/0x1c0
     ksys_write+0x6c/0xd8
     __arm64_sys_write+0x24/0x30
     el0_svc_common+0x78/0x130
     el0_svc_handler+0x38/0x78
     el0_svc+0x8/0xc

To prevent the NULL pointer dereference in this situation, check
is_handle_aborted() before using handle->h_transaction->t_tid.
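
The guard can be sketched in user-space C as follows.  This is a minimal
model, not the kernel code: the struct definitions are heavily simplified
stand-ins for the jbd2 and ocfs2 types, and handle_is_aborted() is a
hypothetical substitute for is_handle_aborted() that is assumed to report
a handle with no attached transaction as aborted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the jbd2/ocfs2 types. */
struct transaction {
	unsigned int t_tid;
};

struct handle {
	struct transaction *h_transaction;	/* NULL after a failed restart */
	bool h_aborted;
};

struct inode_info {
	unsigned int i_sync_tid;
	unsigned int i_datasync_tid;
};

/* Assumed behavior: a handle without a transaction counts as aborted. */
static inline bool handle_is_aborted(const struct handle *handle)
{
	return handle->h_aborted || handle->h_transaction == NULL;
}

/* Only touch h_transaction once the handle is known to be usable. */
static inline void update_inode_fsync_trans(const struct handle *handle,
					    struct inode_info *oi,
					    int datasync)
{
	if (!handle_is_aborted(handle)) {
		oi->i_sync_tid = handle->h_transaction->t_tid;
		if (datasync)
			oi->i_datasync_tid = handle->h_transaction->t_tid;
	}
}
```

With a handle whose h_transaction is NULL the function leaves both tids
untouched instead of dereferencing the pointer.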

Link: http://lkml.kernel.org/r/03e750ab-9ade-83aa-b000-b9e81e34e539@huawei.com
Signed-off-by: Yan Wang <wangyan122@huawei.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/journal.h |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--- a/fs/ocfs2/journal.h~ocfs2-fix-a-null-pointer-dereference-when-call-ocfs2_update_inode_fsync_trans
+++ a/fs/ocfs2/journal.h
@@ -597,9 +597,11 @@ static inline void ocfs2_update_inode_fs
 {
 	struct ocfs2_inode_info *oi = OCFS2_I(inode);
 
-	oi->i_sync_tid = handle->h_transaction->t_tid;
-	if (datasync)
-		oi->i_datasync_tid = handle->h_transaction->t_tid;
+	if (!is_handle_aborted(handle)) {
+		oi->i_sync_tid = handle->h_transaction->t_tid;
+		if (datasync)
+			oi->i_datasync_tid = handle->h_transaction->t_tid;
+	}
 }
 
 #endif /* OCFS2_JOURNAL_H */
_


* [patch 017/118] ocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction
  2020-01-31  6:10 incoming Andrew Morton
                   ` (15 preceding siblings ...)
  2020-01-31  6:11 ` [patch 016/118] ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans() Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:11 ` [patch 018/118] mm/slub.c: avoid slub allocation while holding list_lock Andrew Morton
                   ` (100 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, gechangwei, ghe, jiangqi903, jlbec, junxiao.bi, linux-mm,
	mark, mm-commits, piaojun, torvalds, wangyan122

From: wangyan <wangyan122@huawei.com>
Subject: ocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction

For consistency, use ocfs2_update_inode_fsync_trans() to access t_tid in
handle->h_transaction.

Link: http://lkml.kernel.org/r/6ff9a312-5f7d-0e27-fb51-bc4e062fcd97@huawei.com
Signed-off-by: Yan Wang <wangyan122@huawei.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/namei.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/fs/ocfs2/namei.c~ocfs2-use-ocfs2_update_inode_fsync_trans-to-access-t_tid-in-handle-h_transaction
+++ a/fs/ocfs2/namei.c
@@ -586,8 +586,7 @@ static int __ocfs2_mknod_locked(struct i
 			mlog_errno(status);
 	}
 
-	oi->i_sync_tid = handle->h_transaction->t_tid;
-	oi->i_datasync_tid = handle->h_transaction->t_tid;
+	ocfs2_update_inode_fsync_trans(handle, inode, 1);
 
 leave:
 	if (status < 0) {
_


* [patch 018/118] mm/slub.c: avoid slub allocation while holding list_lock
  2020-01-31  6:10 incoming Andrew Morton
                   ` (16 preceding siblings ...)
  2020-01-31  6:11 ` [patch 017/118] ocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction Andrew Morton
@ 2020-01-31  6:11 ` Andrew Morton
  2020-01-31  6:12 ` [patch 019/118] mm/kmemleak: turn kmemleak_lock and object->lock to raw_spinlock_t Andrew Morton
                   ` (99 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:11 UTC (permalink / raw)
  To: akpm, cl, iamjoonsoo.kim, kirill.shutemov, linux-mm, mm-commits,
	penberg, penguin-kernel, rientjes, torvalds, yuzhao

From: Yu Zhao <yuzhao@google.com>
Subject: mm/slub.c: avoid slub allocation while holding list_lock

If we are already under list_lock, don't call kmalloc().  Otherwise we
will run into a deadlock because kmalloc() also tries to grab the same
lock.

Fix the problem by using a static bitmap instead.
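
The idea can be sketched in user-space C.  This is an analogy under
stated assumptions, not the slub code: a C11 atomic_flag spinlock stands
in for the kernel spinlock, MAX_OBJS is a hypothetical bound standing in
for MAX_OBJS_PER_PAGE, the free_idx array replaces the walk over
page->freelist, and map_test_bit() is a made-up helper.  Unlike the
patch, the sketch clears the whole map rather than only the bits in use.

```c
#include <limits.h>
#include <stdatomic.h>
#include <string.h>

#define MAX_OBJS 512	/* hypothetical upper bound on objects per page */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* One statically allocated map shared by all callers, with its own lock. */
static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS)];
static atomic_flag object_map_lock = ATOMIC_FLAG_INIT;

/*
 * Take the map lock, clear the map and mark the given free object indices.
 * No allocator call is made, so this cannot recurse into a lock that the
 * caller already holds.
 */
static inline unsigned long *get_map(const unsigned int *free_idx, size_t nr_free)
{
	size_t i;

	while (atomic_flag_test_and_set_explicit(&object_map_lock,
						 memory_order_acquire))
		;	/* spin until the map is free */

	memset(object_map, 0, sizeof(object_map));
	for (i = 0; i < nr_free; i++)
		object_map[free_idx[i] / BITS_PER_LONG] |=
			1UL << (free_idx[i] % BITS_PER_LONG);

	return object_map;
}

/* Release the map so the next caller can reuse it. */
static inline void put_map(unsigned long *map)
{
	(void)map;	/* there is only one map; kept for call symmetry */
	atomic_flag_clear_explicit(&object_map_lock, memory_order_release);
}

/* Hypothetical helper: test one bit of the map. */
static inline int map_test_bit(const unsigned long *map, unsigned int idx)
{
	return (int)((map[idx / BITS_PER_LONG] >> (idx % BITS_PER_LONG)) & 1UL);
}
```

A caller pairs get_map() with put_map() around its scan of the map.  The
cost of this design is that all users are serialized on the one lock,
which seems acceptable for a debugging-only path.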

  WARNING: possible recursive locking detected
  --------------------------------------------
  mount-encrypted/4921 is trying to acquire lock:
  (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

  but task is already holding lock:
  (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&n->list_lock)->rlock);
    lock(&(&n->list_lock)->rlock);

   *** DEADLOCK ***

Link: http://lkml.kernel.org/r/20191108193958.205102-2-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slub.c |   88 +++++++++++++++++++++++++++-------------------------
 1 file changed, 47 insertions(+), 41 deletions(-)

--- a/mm/slub.c~mm-avoid-slub-allocation-while-holding-list_lock
+++ a/mm/slub.c
@@ -439,19 +439,38 @@ static inline bool cmpxchg_double_slab(s
 }
 
 #ifdef CONFIG_SLUB_DEBUG
+static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
+static DEFINE_SPINLOCK(object_map_lock);
+
 /*
  * Determine a map of object in use on a page.
  *
  * Node listlock must be held to guarantee that the page does
  * not vanish from under us.
  */
-static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
+static unsigned long *get_map(struct kmem_cache *s, struct page *page)
 {
 	void *p;
 	void *addr = page_address(page);
 
+	VM_BUG_ON(!irqs_disabled());
+
+	spin_lock(&object_map_lock);
+
+	bitmap_zero(object_map, page->objects);
+
 	for (p = page->freelist; p; p = get_freepointer(s, p))
-		set_bit(slab_index(p, s, addr), map);
+		set_bit(slab_index(p, s, addr), object_map);
+
+	return object_map;
+}
+
+static void put_map(unsigned long *map)
+{
+	VM_BUG_ON(map != object_map);
+	lockdep_assert_held(&object_map_lock);
+
+	spin_unlock(&object_map_lock);
 }
 
 static inline unsigned int size_from_object(struct kmem_cache *s)
@@ -3675,13 +3694,12 @@ static void list_slab_objects(struct kme
 #ifdef CONFIG_SLUB_DEBUG
 	void *addr = page_address(page);
 	void *p;
-	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
-	if (!map)
-		return;
+	unsigned long *map;
+
 	slab_err(s, page, text, s->name);
 	slab_lock(page);
 
-	get_map(s, page, map);
+	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects) {
 
 		if (!test_bit(slab_index(p, s, addr), map)) {
@@ -3689,8 +3707,9 @@ static void list_slab_objects(struct kme
 			print_tracking(s, p);
 		}
 	}
+	put_map(map);
+
 	slab_unlock(page);
-	bitmap_free(map);
 #endif
 }
 
@@ -4384,19 +4403,19 @@ static int count_total(struct page *page
 #endif
 
 #ifdef CONFIG_SLUB_DEBUG
-static void validate_slab(struct kmem_cache *s, struct page *page,
-						unsigned long *map)
+static void validate_slab(struct kmem_cache *s, struct page *page)
 {
 	void *p;
 	void *addr = page_address(page);
+	unsigned long *map;
+
+	slab_lock(page);
 
 	if (!check_slab(s, page) || !on_freelist(s, page, NULL))
-		return;
+		goto unlock;
 
 	/* Now we know that a valid freelist exists */
-	bitmap_zero(map, page->objects);
-
-	get_map(s, page, map);
+	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects) {
 		u8 val = test_bit(slab_index(p, s, addr), map) ?
 			 SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
@@ -4404,18 +4423,13 @@ static void validate_slab(struct kmem_ca
 		if (!check_object(s, page, p, val))
 			break;
 	}
-}
-
-static void validate_slab_slab(struct kmem_cache *s, struct page *page,
-						unsigned long *map)
-{
-	slab_lock(page);
-	validate_slab(s, page, map);
+	put_map(map);
+unlock:
 	slab_unlock(page);
 }
 
 static int validate_slab_node(struct kmem_cache *s,
-		struct kmem_cache_node *n, unsigned long *map)
+		struct kmem_cache_node *n)
 {
 	unsigned long count = 0;
 	struct page *page;
@@ -4424,7 +4438,7 @@ static int validate_slab_node(struct kme
 	spin_lock_irqsave(&n->list_lock, flags);
 
 	list_for_each_entry(page, &n->partial, slab_list) {
-		validate_slab_slab(s, page, map);
+		validate_slab(s, page);
 		count++;
 	}
 	if (count != n->nr_partial)
@@ -4435,7 +4449,7 @@ static int validate_slab_node(struct kme
 		goto out;
 
 	list_for_each_entry(page, &n->full, slab_list) {
-		validate_slab_slab(s, page, map);
+		validate_slab(s, page);
 		count++;
 	}
 	if (count != atomic_long_read(&n->nr_slabs))
@@ -4452,15 +4466,11 @@ static long validate_slab_cache(struct k
 	int node;
 	unsigned long count = 0;
 	struct kmem_cache_node *n;
-	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
-
-	if (!map)
-		return -ENOMEM;
 
 	flush_all(s);
 	for_each_kmem_cache_node(s, node, n)
-		count += validate_slab_node(s, n, map);
-	bitmap_free(map);
+		count += validate_slab_node(s, n);
+
 	return count;
 }
 /*
@@ -4590,18 +4600,17 @@ static int add_location(struct loc_track
 }
 
 static void process_slab(struct loc_track *t, struct kmem_cache *s,
-		struct page *page, enum track_item alloc,
-		unsigned long *map)
+		struct page *page, enum track_item alloc)
 {
 	void *addr = page_address(page);
 	void *p;
+	unsigned long *map;
 
-	bitmap_zero(map, page->objects);
-	get_map(s, page, map);
-
+	map = get_map(s, page);
 	for_each_object(p, s, addr, page->objects)
 		if (!test_bit(slab_index(p, s, addr), map))
 			add_location(t, s, get_track(s, p, alloc));
+	put_map(map);
 }
 
 static int list_locations(struct kmem_cache *s, char *buf,
@@ -4612,11 +4621,9 @@ static int list_locations(struct kmem_ca
 	struct loc_track t = { 0, 0, NULL };
 	int node;
 	struct kmem_cache_node *n;
-	unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
 
-	if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
-				     GFP_KERNEL)) {
-		bitmap_free(map);
+	if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
+			     GFP_KERNEL)) {
 		return sprintf(buf, "Out of memory\n");
 	}
 	/* Push back cpu slabs */
@@ -4631,9 +4638,9 @@ static int list_locations(struct kmem_ca
 
 		spin_lock_irqsave(&n->list_lock, flags);
 		list_for_each_entry(page, &n->partial, slab_list)
-			process_slab(&t, s, page, alloc, map);
+			process_slab(&t, s, page, alloc);
 		list_for_each_entry(page, &n->full, slab_list)
-			process_slab(&t, s, page, alloc, map);
+			process_slab(&t, s, page, alloc);
 		spin_unlock_irqrestore(&n->list_lock, flags);
 	}
 
@@ -4682,7 +4689,6 @@ static int list_locations(struct kmem_ca
 	}
 
 	free_loc_track(&t);
-	bitmap_free(map);
 	if (!t.count)
 		len += sprintf(buf, "No data\n");
 	return len;
_


* [patch 019/118] mm/kmemleak: turn kmemleak_lock and object->lock to raw_spinlock_t
  2020-01-31  6:10 incoming Andrew Morton
                   ` (17 preceding siblings ...)
  2020-01-31  6:11 ` [patch 018/118] mm/slub.c: avoid slub allocation while holding list_lock Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 020/118] mm/debug.c: always print flags in dump_page() Andrew Morton
                   ` (98 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, bigeasy, catalin.marinas, haitao.liu, linux-mm, mm-commits,
	torvalds, yongxin.liu, zhe.he

From: He Zhe <zhe.he@windriver.com>
Subject: mm/kmemleak: turn kmemleak_lock and object->lock to raw_spinlock_t

kmemleak_lock, as an rwlock, can be acquired in atomic context on RT,
which does not work.

Since the kmemleak operation is performed in atomic context, make it a
raw_spinlock_t so it can also be acquired on RT.  This is used for
debugging and is not enabled by default in a production-like environment
(where performance/latency matters), so it makes sense to make it a
raw_spinlock_t instead of trying to get rid of the atomic context.  Also
turn the kmemleak_object->lock into a raw_spinlock_t, which is acquired
(nested) while the kmemleak_lock is held.

The time spent in "echo scan > kmemleak" improved slightly on a 64-core
box with this patch applied after boot.

[bigeasy@linutronix.de: redo the description, update comments. Merge the individual bits:  He Zhe did the kmemleak_lock, Liu Haitao the ->lock and Yongxin Liu forwarded Liu's patch.]
Link: http://lkml.kernel.org/r/20191219170834.4tah3prf2gdothz4@linutronix.de
Link: https://lkml.kernel.org/r/20181218150744.GB20197@arrakis.emea.arm.com
Link: https://lkml.kernel.org/r/1542877459-144382-1-git-send-email-zhe.he@windriver.com
Link: https://lkml.kernel.org/r/20190927082230.34152-1-yongxin.liu@windriver.com
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Liu Haitao <haitao.liu@windriver.com>
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kmemleak.c |  112 ++++++++++++++++++++++++------------------------
 1 file changed, 56 insertions(+), 56 deletions(-)

--- a/mm/kmemleak.c~kmemleak-turn-kmemleak_lock-and-object-lock-to-raw_spinlock_t
+++ a/mm/kmemleak.c
@@ -13,7 +13,7 @@
  *
  * The following locks and mutexes are used by kmemleak:
  *
- * - kmemleak_lock (rwlock): protects the object_list modifications and
+ * - kmemleak_lock (raw_spinlock_t): protects the object_list modifications and
  *   accesses to the object_tree_root. The object_list is the main list
  *   holding the metadata (struct kmemleak_object) for the allocated memory
  *   blocks. The object_tree_root is a red black tree used to look-up
@@ -22,13 +22,13 @@
  *   object_tree_root in the create_object() function called from the
  *   kmemleak_alloc() callback and removed in delete_object() called from the
  *   kmemleak_free() callback
- * - kmemleak_object.lock (spinlock): protects a kmemleak_object. Accesses to
- *   the metadata (e.g. count) are protected by this lock. Note that some
- *   members of this structure may be protected by other means (atomic or
- *   kmemleak_lock). This lock is also held when scanning the corresponding
- *   memory block to avoid the kernel freeing it via the kmemleak_free()
- *   callback. This is less heavyweight than holding a global lock like
- *   kmemleak_lock during scanning
+ * - kmemleak_object.lock (raw_spinlock_t): protects a kmemleak_object.
+ *   Accesses to the metadata (e.g. count) are protected by this lock. Note
+ *   that some members of this structure may be protected by other means
+ *   (atomic or kmemleak_lock). This lock is also held when scanning the
+ *   corresponding memory block to avoid the kernel freeing it via the
+ *   kmemleak_free() callback. This is less heavyweight than holding a global
+ *   lock like kmemleak_lock during scanning.
  * - scan_mutex (mutex): ensures that only one thread may scan the memory for
  *   unreferenced objects at a time. The gray_list contains the objects which
  *   are already referenced or marked as false positives and need to be
@@ -135,7 +135,7 @@ struct kmemleak_scan_area {
  * (use_count) and freed using the RCU mechanism.
  */
 struct kmemleak_object {
-	spinlock_t lock;
+	raw_spinlock_t lock;
 	unsigned int flags;		/* object status flags */
 	struct list_head object_list;
 	struct list_head gray_list;
@@ -191,8 +191,8 @@ static int mem_pool_free_count = ARRAY_S
 static LIST_HEAD(mem_pool_free_list);
 /* search tree for object boundaries */
 static struct rb_root object_tree_root = RB_ROOT;
-/* rw_lock protecting the access to object_list and object_tree_root */
-static DEFINE_RWLOCK(kmemleak_lock);
+/* protecting the access to object_list and object_tree_root */
+static DEFINE_RAW_SPINLOCK(kmemleak_lock);
 
 /* allocation caches for kmemleak internal data */
 static struct kmem_cache *object_cache;
@@ -426,7 +426,7 @@ static struct kmemleak_object *mem_pool_
 	}
 
 	/* slab allocation failed, try the memory pool */
-	write_lock_irqsave(&kmemleak_lock, flags);
+	raw_spin_lock_irqsave(&kmemleak_lock, flags);
 	object = list_first_entry_or_null(&mem_pool_free_list,
 					  typeof(*object), object_list);
 	if (object)
@@ -435,7 +435,7 @@ static struct kmemleak_object *mem_pool_
 		object = &mem_pool[--mem_pool_free_count];
 	else
 		pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
-	write_unlock_irqrestore(&kmemleak_lock, flags);
+	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 
 	return object;
 }
@@ -453,9 +453,9 @@ static void mem_pool_free(struct kmemlea
 	}
 
 	/* add the object to the memory pool free list */
-	write_lock_irqsave(&kmemleak_lock, flags);
+	raw_spin_lock_irqsave(&kmemleak_lock, flags);
 	list_add(&object->object_list, &mem_pool_free_list);
-	write_unlock_irqrestore(&kmemleak_lock, flags);
+	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 }
 
 /*
@@ -514,9 +514,9 @@ static struct kmemleak_object *find_and_
 	struct kmemleak_object *object;
 
 	rcu_read_lock();
-	read_lock_irqsave(&kmemleak_lock, flags);
+	raw_spin_lock_irqsave(&kmemleak_lock, flags);
 	object = lookup_object(ptr, alias);
-	read_unlock_irqrestore(&kmemleak_lock, flags);
+	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 
 	/* check whether the object is still available */
 	if (object && !get_object(object))
@@ -546,11 +546,11 @@ static struct kmemleak_object *find_and_
 	unsigned long flags;
 	struct kmemleak_object *object;
 
-	write_lock_irqsave(&kmemleak_lock, flags);
+	raw_spin_lock_irqsave(&kmemleak_lock, flags);
 	object = lookup_object(ptr, alias);
 	if (object)
 		__remove_object(object);
-	write_unlock_irqrestore(&kmemleak_lock, flags);
+	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 
 	return object;
 }
@@ -585,7 +585,7 @@ static struct kmemleak_object *create_ob
 	INIT_LIST_HEAD(&object->object_list);
 	INIT_LIST_HEAD(&object->gray_list);
 	INIT_HLIST_HEAD(&object->area_list);
-	spin_lock_init(&object->lock);
+	raw_spin_lock_init(&object->lock);
 	atomic_set(&object->use_count, 1);
 	object->flags = OBJECT_ALLOCATED;
 	object->pointer = ptr;
@@ -617,7 +617,7 @@ static struct kmemleak_object *create_ob
 	/* kernel backtrace */
 	object->trace_len = __save_stack_trace(object->trace);
 
-	write_lock_irqsave(&kmemleak_lock, flags);
+	raw_spin_lock_irqsave(&kmemleak_lock, flags);
 
 	untagged_ptr = (unsigned long)kasan_reset_tag((void *)ptr);
 	min_addr = min(min_addr, untagged_ptr);
@@ -649,7 +649,7 @@ static struct kmemleak_object *create_ob
 
 	list_add_tail_rcu(&object->object_list, &object_list);
 out:
-	write_unlock_irqrestore(&kmemleak_lock, flags);
+	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 	return object;
 }
 
@@ -667,9 +667,9 @@ static void __delete_object(struct kmeml
 	 * Locking here also ensures that the corresponding memory block
 	 * cannot be freed when it is being scanned.
 	 */
-	spin_lock_irqsave(&object->lock, flags);
+	raw_spin_lock_irqsave(&object->lock, flags);
 	object->flags &= ~OBJECT_ALLOCATED;
-	spin_unlock_irqrestore(&object->lock, flags);
+	raw_spin_unlock_irqrestore(&object->lock, flags);
 	put_object(object);
 }
 
@@ -739,9 +739,9 @@ static void paint_it(struct kmemleak_obj
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&object->lock, flags);
+	raw_spin_lock_irqsave(&object->lock, flags);
 	__paint_it(object, color);
-	spin_unlock_irqrestore(&object->lock, flags);
+	raw_spin_unlock_irqrestore(&object->lock, flags);
 }
 
 static void paint_ptr(unsigned long ptr, int color)
@@ -798,7 +798,7 @@ static void add_scan_area(unsigned long
 	if (scan_area_cache)
 		area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp));
 
-	spin_lock_irqsave(&object->lock, flags);
+	raw_spin_lock_irqsave(&object->lock, flags);
 	if (!area) {
 		pr_warn_once("Cannot allocate a scan area, scanning the full object\n");
 		/* mark the object for full scan to avoid false positives */
@@ -820,7 +820,7 @@ static void add_scan_area(unsigned long
 
 	hlist_add_head(&area->node, &object->area_list);
 out_unlock:
-	spin_unlock_irqrestore(&object->lock, flags);
+	raw_spin_unlock_irqrestore(&object->lock, flags);
 	put_object(object);
 }
 
@@ -842,9 +842,9 @@ static void object_set_excess_ref(unsign
 		return;
 	}
 
-	spin_lock_irqsave(&object->lock, flags);
+	raw_spin_lock_irqsave(&object->lock, flags);
 	object->excess_ref = excess_ref;
-	spin_unlock_irqrestore(&object->lock, flags);
+	raw_spin_unlock_irqrestore(&object->lock, flags);
 	put_object(object);
 }
 
@@ -864,9 +864,9 @@ static void object_no_scan(unsigned long
 		return;
 	}
 
-	spin_lock_irqsave(&object->lock, flags);
+	raw_spin_lock_irqsave(&object->lock, flags);
 	object->flags |= OBJECT_NO_SCAN;
-	spin_unlock_irqrestore(&object->lock, flags);
+	raw_spin_unlock_irqrestore(&object->lock, flags);
 	put_object(object);
 }
 
@@ -1026,9 +1026,9 @@ void __ref kmemleak_update_trace(const v
 		return;
 	}
 
-	spin_lock_irqsave(&object->lock, flags);
+	raw_spin_lock_irqsave(&object->lock, flags);
 	object->trace_len = __save_stack_trace(object->trace);
-	spin_unlock_irqrestore(&object->lock, flags);
+	raw_spin_unlock_irqrestore(&object->lock, flags);
 
 	put_object(object);
 }
@@ -1233,7 +1233,7 @@ static void scan_block(void *_start, voi
 	unsigned long flags;
 	unsigned long untagged_ptr;
 
-	read_lock_irqsave(&kmemleak_lock, flags);
+	raw_spin_lock_irqsave(&kmemleak_lock, flags);
 	for (ptr = start; ptr < end; ptr++) {
 		struct kmemleak_object *object;
 		unsigned long pointer;
@@ -1268,7 +1268,7 @@ static void scan_block(void *_start, voi
 		 * previously acquired in scan_object(). These locks are
 		 * enclosed by scan_mutex.
 		 */
-		spin_lock_nested(&object->lock, SINGLE_DEPTH_NESTING);
+		raw_spin_lock_nested(&object->lock, SINGLE_DEPTH_NESTING);
 		/* only pass surplus references (object already gray) */
 		if (color_gray(object)) {
 			excess_ref = object->excess_ref;
@@ -1277,7 +1277,7 @@ static void scan_block(void *_start, voi
 			excess_ref = 0;
 			update_refs(object);
 		}
-		spin_unlock(&object->lock);
+		raw_spin_unlock(&object->lock);
 
 		if (excess_ref) {
 			object = lookup_object(excess_ref, 0);
@@ -1286,12 +1286,12 @@ static void scan_block(void *_start, voi
 			if (object == scanned)
 				/* circular reference, ignore */
 				continue;
-			spin_lock_nested(&object->lock, SINGLE_DEPTH_NESTING);
+			raw_spin_lock_nested(&object->lock, SINGLE_DEPTH_NESTING);
 			update_refs(object);
-			spin_unlock(&object->lock);
+			raw_spin_unlock(&object->lock);
 		}
 	}
-	read_unlock_irqrestore(&kmemleak_lock, flags);
+	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 }
 
 /*
@@ -1324,7 +1324,7 @@ static void scan_object(struct kmemleak_
 	 * Once the object->lock is acquired, the corresponding memory block
 	 * cannot be freed (the same lock is acquired in delete_object).
 	 */
-	spin_lock_irqsave(&object->lock, flags);
+	raw_spin_lock_irqsave(&object->lock, flags);
 	if (object->flags & OBJECT_NO_SCAN)
 		goto out;
 	if (!(object->flags & OBJECT_ALLOCATED))
@@ -1344,9 +1344,9 @@ static void scan_object(struct kmemleak_
 			if (start >= end)
 				break;
 
-			spin_unlock_irqrestore(&object->lock, flags);
+			raw_spin_unlock_irqrestore(&object->lock, flags);
 			cond_resched();
-			spin_lock_irqsave(&object->lock, flags);
+			raw_spin_lock_irqsave(&object->lock, flags);
 		} while (object->flags & OBJECT_ALLOCATED);
 	} else
 		hlist_for_each_entry(area, &object->area_list, node)
@@ -1354,7 +1354,7 @@ static void scan_object(struct kmemleak_
 				   (void *)(area->start + area->size),
 				   object);
 out:
-	spin_unlock_irqrestore(&object->lock, flags);
+	raw_spin_unlock_irqrestore(&object->lock, flags);
 }
 
 /*
@@ -1407,7 +1407,7 @@ static void kmemleak_scan(void)
 	/* prepare the kmemleak_object's */
 	rcu_read_lock();
 	list_for_each_entry_rcu(object, &object_list, object_list) {
-		spin_lock_irqsave(&object->lock, flags);
+		raw_spin_lock_irqsave(&object->lock, flags);
 #ifdef DEBUG
 		/*
 		 * With a few exceptions there should be a maximum of
@@ -1424,7 +1424,7 @@ static void kmemleak_scan(void)
 		if (color_gray(object) && get_object(object))
 			list_add_tail(&object->gray_list, &gray_list);
 
-		spin_unlock_irqrestore(&object->lock, flags);
+		raw_spin_unlock_irqrestore(&object->lock, flags);
 	}
 	rcu_read_unlock();
 
@@ -1492,14 +1492,14 @@ static void kmemleak_scan(void)
 	 */
 	rcu_read_lock();
 	list_for_each_entry_rcu(object, &object_list, object_list) {
-		spin_lock_irqsave(&object->lock, flags);
+		raw_spin_lock_irqsave(&object->lock, flags);
 		if (color_white(object) && (object->flags & OBJECT_ALLOCATED)
 		    && update_checksum(object) && get_object(object)) {
 			/* color it gray temporarily */
 			object->count = object->min_count;
 			list_add_tail(&object->gray_list, &gray_list);
 		}
-		spin_unlock_irqrestore(&object->lock, flags);
+		raw_spin_unlock_irqrestore(&object->lock, flags);
 	}
 	rcu_read_unlock();
 
@@ -1519,7 +1519,7 @@ static void kmemleak_scan(void)
 	 */
 	rcu_read_lock();
 	list_for_each_entry_rcu(object, &object_list, object_list) {
-		spin_lock_irqsave(&object->lock, flags);
+		raw_spin_lock_irqsave(&object->lock, flags);
 		if (unreferenced_object(object) &&
 		    !(object->flags & OBJECT_REPORTED)) {
 			object->flags |= OBJECT_REPORTED;
@@ -1529,7 +1529,7 @@ static void kmemleak_scan(void)
 
 			new_leaks++;
 		}
-		spin_unlock_irqrestore(&object->lock, flags);
+		raw_spin_unlock_irqrestore(&object->lock, flags);
 	}
 	rcu_read_unlock();
 
@@ -1681,10 +1681,10 @@ static int kmemleak_seq_show(struct seq_
 	struct kmemleak_object *object = v;
 	unsigned long flags;
 
-	spin_lock_irqsave(&object->lock, flags);
+	raw_spin_lock_irqsave(&object->lock, flags);
 	if ((object->flags & OBJECT_REPORTED) && unreferenced_object(object))
 		print_unreferenced(seq, object);
-	spin_unlock_irqrestore(&object->lock, flags);
+	raw_spin_unlock_irqrestore(&object->lock, flags);
 	return 0;
 }
 
@@ -1714,9 +1714,9 @@ static int dump_str_object_info(const ch
 		return -EINVAL;
 	}
 
-	spin_lock_irqsave(&object->lock, flags);
+	raw_spin_lock_irqsave(&object->lock, flags);
 	dump_object_info(object);
-	spin_unlock_irqrestore(&object->lock, flags);
+	raw_spin_unlock_irqrestore(&object->lock, flags);
 
 	put_object(object);
 	return 0;
@@ -1735,11 +1735,11 @@ static void kmemleak_clear(void)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(object, &object_list, object_list) {
-		spin_lock_irqsave(&object->lock, flags);
+		raw_spin_lock_irqsave(&object->lock, flags);
 		if ((object->flags & OBJECT_REPORTED) &&
 		    unreferenced_object(object))
 			__paint_it(object, KMEMLEAK_GREY);
-		spin_unlock_irqrestore(&object->lock, flags);
+		raw_spin_unlock_irqrestore(&object->lock, flags);
 	}
 	rcu_read_unlock();
 
_


* [patch 020/118] mm/debug.c: always print flags in dump_page()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (18 preceding siblings ...)
  2020-01-31  6:12 ` [patch 019/118] mm/kmemleak: turn kmemleak_lock and object->lock to raw_spinlock_t Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 021/118] mm/filemap.c: clean up filemap_write_and_wait() Andrew Morton
                   ` (97 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, anshuman.khandual, cai, dan.j.williams, david, linux-mm,
	mgorman, mhocko, mhocko, mm-commits, osalvador, pavel.tatashin,
	rcampbell, rppt, torvalds, vbabka

From: Vlastimil Babka <vbabka@suse.cz>
Subject: mm/debug.c: always print flags in dump_page()

Commit 76a1850e4572 ("mm/debug.c: __dump_page() prints an extra line")
inadvertently removed printing of page flags for pages that are neither
anon nor ksm nor have a mapping.  Fix that.

Using pr_cont() again would be a solution, but the commit explicitly
removed its use.  Avoiding the danger of mixing up split lines from
multiple CPUs might be beneficial for near-panic dumps like this, so fix
this without reintroducing pr_cont().
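
As a rough userspace illustration of the approach (not the kernel code
itself): choose an optional type prefix first, then emit the flags line
exactly once, so that no page kind can end up without its flags.  The names
page_kind and format_flags_line are hypothetical, and snprintf() stands in
for pr_warn().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the PageKsm()/PageAnon()/mapping checks. */
enum page_kind { KIND_KSM, KIND_ANON, KIND_FILE, KIND_OTHER };

/*
 * Build the flags line once for every page kind. Only the prefix
 * varies, so no branch can forget to print the flags.
 */
static int format_flags_line(char *buf, size_t len, enum page_kind kind,
			     unsigned long flags)
{
	const char *type = "";

	assert(buf != NULL && len > 0);

	if (kind == KIND_KSM)
		type = "ksm ";
	else if (kind == KIND_ANON)
		type = "anon ";

	/* Single output statement, shared by all page kinds. */
	return snprintf(buf, len, "%sflags: %#lx\n", type, flags);
}
```

Because each line is produced by one formatted call, the output cannot be
split across calls the way a pr_cont() continuation could be.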

Link: http://lkml.kernel.org/r/9f884d5c-ca60-dc7b-219c-c081c755fab6@suse.cz
Fixes: 76a1850e4572 ("mm/debug.c: __dump_page() prints an extra line")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reported-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/debug.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--- a/mm/debug.c~mm-debug-always-print-flags-in-dump_page
+++ a/mm/debug.c
@@ -47,6 +47,7 @@ void __dump_page(struct page *page, cons
 	struct address_space *mapping;
 	bool page_poisoned = PagePoisoned(page);
 	int mapcount;
+	char *type = "";
 
 	/*
 	 * If struct page is poisoned don't access Page*() functions as that
@@ -78,9 +79,9 @@ void __dump_page(struct page *page, cons
 			page, page_ref_count(page), mapcount,
 			page->mapping, page_to_pgoff(page));
 	if (PageKsm(page))
-		pr_warn("ksm flags: %#lx(%pGp)\n", page->flags, &page->flags);
+		type = "ksm ";
 	else if (PageAnon(page))
-		pr_warn("anon flags: %#lx(%pGp)\n", page->flags, &page->flags);
+		type = "anon ";
 	else if (mapping) {
 		if (mapping->host && mapping->host->i_dentry.first) {
 			struct dentry *dentry;
@@ -88,10 +89,11 @@ void __dump_page(struct page *page, cons
 			pr_warn("%ps name:\"%pd\"\n", mapping->a_ops, dentry);
 		} else
 			pr_warn("%ps\n", mapping->a_ops);
-		pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags);
 	}
 	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
 
+	pr_warn("%sflags: %#lx(%pGp)\n", type, page->flags, &page->flags);
+
 hex_only:
 	print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
 			sizeof(unsigned long), page,
_


* [patch 021/118] mm/filemap.c: clean up filemap_write_and_wait()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (19 preceding siblings ...)
  2020-01-31  6:12 ` [patch 020/118] mm/debug.c: always print flags in dump_page() Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 022/118] mm: fix gup_pud_range Andrew Morton
                   ` (96 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, ira.weiny, linux-mm, mm-commits, nborisov, torvalds, willy

From: Ira Weiny <ira.weiny@intel.com>
Subject: mm/filemap.c: clean up filemap_write_and_wait()

At some point filemap_write_and_wait() and filemap_write_and_wait_range()
got the exact same implementation, with the exception of the range being
specified in *_range().

Similar to other functions in fs.h which call *_range(..., 0, LLONG_MAX),
change filemap_write_and_wait() to be a static inline which calls
filemap_write_and_wait_range().
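
The wrapper pattern can be sketched in plain userspace C as follows.  This
is only a model under stated assumptions: mapping_stub, file_off_t,
write_and_wait_range() and write_and_wait() are hypothetical stand-ins, and
the ranged helper here merely records the range it was asked to cover.

```c
#include <assert.h>
#include <limits.h>

/* Userspace stand-in for a file offset type (hypothetical name). */
typedef long long file_off_t;

/* Hypothetical mapping that just records the last requested range. */
struct mapping_stub {
	file_off_t last_start;
	file_off_t last_end;
};

/* Stub for the ranged helper: remembers the range, reports success. */
static int write_and_wait_range(struct mapping_stub *m, file_off_t lstart,
				file_off_t lend)
{
	assert(lstart <= lend);
	m->last_start = lstart;
	m->last_end = lend;
	return 0;
}

/* Whole-file variant: a thin inline wrapper over the ranged helper. */
static inline int write_and_wait(struct mapping_stub *m)
{
	return write_and_wait_range(m, 0, LLONG_MAX);
}
```

Keeping a single implementation means a later fix to the ranged helper
automatically applies to the whole-file variant as well.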

Link: http://lkml.kernel.org/r/20191129160713.30892-1-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/fs.h |    6 +++++-
 mm/filemap.c       |   34 ++++++----------------------------
 2 files changed, 11 insertions(+), 29 deletions(-)

--- a/include/linux/fs.h~mm-clean-up-filemap_write_and_wait
+++ a/include/linux/fs.h
@@ -2737,7 +2737,6 @@ static inline int filemap_fdatawait(stru
 
 extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
 				  loff_t lend);
-extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
 extern int __filemap_fdatawrite_range(struct address_space *mapping,
@@ -2747,6 +2746,11 @@ extern int filemap_fdatawrite_range(stru
 extern int filemap_check_errors(struct address_space *mapping);
 extern void __filemap_set_wb_err(struct address_space *mapping, int err);
 
+static inline int filemap_write_and_wait(struct address_space *mapping)
+{
+	return filemap_write_and_wait_range(mapping, 0, LLONG_MAX);
+}
+
 extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
 						loff_t lend);
 extern int __must_check file_check_and_advance_wb_err(struct file *file);
--- a/mm/filemap.c~mm-clean-up-filemap_write_and_wait
+++ a/mm/filemap.c
@@ -632,33 +632,6 @@ static bool mapping_needs_writeback(stru
 	return mapping->nrpages;
 }
 
-int filemap_write_and_wait(struct address_space *mapping)
-{
-	int err = 0;
-
-	if (mapping_needs_writeback(mapping)) {
-		err = filemap_fdatawrite(mapping);
-		/*
-		 * Even if the above returned error, the pages may be
-		 * written partially (e.g. -ENOSPC), so we wait for it.
-		 * But the -EIO is special case, it may indicate the worst
-		 * thing (e.g. bug) happened, so we avoid waiting for it.
-		 */
-		if (err != -EIO) {
-			int err2 = filemap_fdatawait(mapping);
-			if (!err)
-				err = err2;
-		} else {
-			/* Clear any previously stored errors */
-			filemap_check_errors(mapping);
-		}
-	} else {
-		err = filemap_check_errors(mapping);
-	}
-	return err;
-}
-EXPORT_SYMBOL(filemap_write_and_wait);


* [patch 022/118] mm: fix gup_pud_range
  2020-01-31  6:10 incoming Andrew Morton
                   ` (20 preceding siblings ...)
  2020-01-31  6:12 ` [patch 021/118] mm/filemap.c: clean up filemap_write_and_wait() Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 023/118] mm/gup.c: use is_vm_hugetlb_page() to check whether to follow huge Andrew Morton
                   ` (95 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, aneesh.kumar, hqjagain, jhubbard, linux-mm, mm-commits,
	n-horiguchi, torvalds

From: Qiujun Huang <hqjagain@gmail.com>
Subject: mm: fix gup_pud_range

Sorry for the long delay in following up on this; I hit the problem again.

patch v1   https://lkml.org/lkml/2019/9/20/656

do_machine_check()
  do_memory_failure()
    memory_failure()
      hw_poison_user_mappings()
        try_to_unmap()
          pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));

...and now we have a swap entry that indicates that the page entry
refers to a bad (and poisoned) page of memory, but gup_fast() at this
level of the page table was ignoring swap entries, and incorrectly
assuming that "!pxd_none() == valid and present".

And this was not just a poisoned page problem, but a general swap entry
problem. So, any swap entry type (device memory migration, NUMA migration,
or just regular swapping) could lead to the same problem.

Fix this by checking for pxd_present(), instead of pxd_none().
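
To see why "not none" and "present" differ, here is a toy model in
userspace C.  The encoding is an assumption made purely for illustration
(bit 0 means "present", an all-zero entry means "none"); real page table
layouts are architecture specific, and every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy encoding, an assumption for illustration: bit 0 means "present". */
#define TOY_PRESENT 0x1ULL

typedef uint64_t toy_entry_t;

/* An entry is "none" only when it is completely empty. */
static bool toy_none(toy_entry_t e)
{
	return e == 0;
}

/* An entry is "present" only when the present bit is set. */
static bool toy_present(toy_entry_t e)
{
	return (e & TOY_PRESENT) != 0;
}

/* Old check: anything non-empty is walked, including swap-style entries. */
static bool old_check_walks(toy_entry_t e)
{
	return !toy_none(e);
}

/* Fixed check: only entries that are really present are walked. */
static bool new_check_walks(toy_entry_t e)
{
	return toy_present(e);
}
```

In this model a swap-style entry is non-zero with the present bit clear, so
the old check would walk it while the fixed check bails out.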

Link: http://lkml.kernel.org/r/1578479084-15508-1-git-send-email-hqjagain@gmail.com
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/gup.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/gup.c~mm-fix-gup_pud_range
+++ a/mm/gup.c
@@ -2237,7 +2237,7 @@ static int gup_pud_range(p4d_t p4d, unsi
 		pud_t pud = READ_ONCE(*pudp);
 
 		next = pud_addr_end(addr, end);
-		if (pud_none(pud))
+		if (unlikely(!pud_present(pud)))
 			return 0;
 		if (unlikely(pud_huge(pud))) {
 			if (!gup_huge_pud(pud, pudp, addr, next, flags,
_


* [patch 023/118] mm/gup.c: use is_vm_hugetlb_page() to check whether to follow huge
  2020-01-31  6:10 incoming Andrew Morton
                   ` (21 preceding siblings ...)
  2020-01-31  6:12 ` [patch 022/118] mm: fix gup_pud_range Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 024/118] mm/gup: factor out duplicate code from four routines Andrew Morton
                   ` (94 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, rcampbell, richardw.yang, rientjes,
	torvalds, vbabka

From: Wei Yang <richardw.yang@linux.intel.com>
Subject: mm/gup.c: use is_vm_hugetlb_page() to check whether to follow huge

No functional change; just use the helper function to improve readability,
as is done elsewhere.

Link: http://lkml.kernel.org/r/20200113070322.26627-1-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/gup.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/gup.c~mm-gupc-use-is_vm_hugetlb_page-to-check-whether-to-follow-huge
+++ a/mm/gup.c
@@ -323,7 +323,7 @@ static struct page *follow_pmd_mask(stru
 	pmdval = READ_ONCE(*pmd);
 	if (pmd_none(pmdval))
 		return no_page_table(vma, flags);
-	if (pmd_huge(pmdval) && vma->vm_flags & VM_HUGETLB) {
+	if (pmd_huge(pmdval) && is_vm_hugetlb_page(vma)) {
 		page = follow_huge_pmd(mm, address, pmd, flags);
 		if (page)
 			return page;
@@ -433,7 +433,7 @@ static struct page *follow_pud_mask(stru
 	pud = pud_offset(p4dp, address);
 	if (pud_none(*pud))
 		return no_page_table(vma, flags);
-	if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
+	if (pud_huge(*pud) && is_vm_hugetlb_page(vma)) {
 		page = follow_huge_pud(mm, address, pud, flags);
 		if (page)
 			return page;
_


* [patch 024/118] mm/gup: factor out duplicate code from four routines
  2020-01-31  6:10 incoming Andrew Morton
                   ` (22 preceding siblings ...)
  2020-01-31  6:12 ` [patch 023/118] mm/gup.c: use is_vm_hugetlb_page() to check whether to follow huge Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 025/118] mm/gup: move try_get_compound_head() to top, fix minor issues Andrew Morton
                   ` (93 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/gup: factor out duplicate code from four routines

Patch series "mm/gup: prereqs to track dma-pinned pages: FOLL_PIN", v12.

Overview:

This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017. :)

A new internal gup flag, FOLL_PIN, is introduced and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.

I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".

In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:

    get_user_pages() (sets FOLL_GET)
    put_page()

to this:
    pin_user_pages() (sets FOLL_PIN)
    unpin_user_page()


Testing:

* I've done some overall kernel testing (LTP, and a few other goodies),
  and some directed testing to exercise some of the changes. And as you
  can see, gup_benchmark is enhanced to exercise this. Basically, I've
  been able to runtime test the core get_user_pages() and
  pin_user_pages() and related routines, but not so much on several of
  the call sites--but those are generally just a couple of lines
  changed, each.

  Not much of the kernel is actually using this, which on one hand
  reduces risk quite a lot. But on the other hand, testing coverage
  is low. So I'd love it if, in particular, the Infiniband and PowerPC
  folks could do a smoke test of this series for me.

  Runtime testing for the call sites so far is pretty light:

    * io_uring: Some directed tests from liburing exercise this, and
                they pass.
    * process_vm_access.c: A small directed test passes.
    * gup_benchmark: the enhanced version hits the new gup.c code, and
                     passes.
    * infiniband: Ran rdma-core tests: rdma-core/build/bin/run_tests.py
    * VFIO: compiles (I'm vowing to set up a run time test soon, but it's
                      not ready just yet)
    * powerpc: it compiles...
    * drm/via: compiles...
    * goldfish: compiles...
    * net/xdp: compiles...
    * media/v4l2: compiles...

[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/


This patch (of 22):

There are four locations in gup.c that have a fair amount of code
duplication.  This means that changing one requires making the same
changes in four places, not to mention reading the same code four times,
and wondering if there are subtle differences.

Factor out the common code into static functions, thus reducing the
overall line count and the code's complexity.

Also, take the opportunity to slightly improve the efficiency of the error
cases, by doing a mass subtraction of the refcount, surrounded by
get_page()/put_page().

Also, further simplify (slightly) by waiting until the successful end
of each routine to increment *nr.
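
The "mass subtraction" idea can be modelled in userspace roughly like this.
It is a sketch only: toy_page uses a plain int where the kernel uses an
atomic refcount, and toy_put_page() and toy_put_refs() are hypothetical
names that merely mirror the shape of the helper added below.

```c
#include <assert.h>

/* Toy page with a plain counter; the kernel uses an atomic refcount. */
struct toy_page {
	int refcount;
	int put_calls;	/* how many times the per-reference put path ran */
};

/* Stand-in for put_page(): drops one reference via the slower path. */
static void toy_put_page(struct toy_page *page)
{
	page->refcount--;
	page->put_calls++;
}

/*
 * Drop @refs references: subtract all but one in a single step, then
 * let one final put handle the last reference (and any release logic).
 */
static void toy_put_refs(struct toy_page *page, int refs)
{
	assert(refs >= 1 && page->refcount >= refs);
	if (refs > 1)
		page->refcount -= refs - 1;
	toy_put_page(page);
}
```

Dropping four references this way costs one subtraction plus one put,
rather than four separate puts.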

Link: http://lkml.kernel.org/r/20200107224558.2362728-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/gup.c |   95 ++++++++++++++++++++++-------------------------------
 1 file changed, 40 insertions(+), 55 deletions(-)

--- a/mm/gup.c~mm-gup-factor-out-duplicate-code-from-four-routines
+++ a/mm/gup.c
@@ -1978,6 +1978,29 @@ static int __gup_device_huge_pud(pud_t p
 }
 #endif
 
+static int record_subpages(struct page *page, unsigned long addr,
+			   unsigned long end, struct page **pages)
+{
+	int nr;
+
+	for (nr = 0; addr != end; addr += PAGE_SIZE)
+		pages[nr++] = page++;
+
+	return nr;
+}
+
+static void put_compound_head(struct page *page, int refs)
+{
+	VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
+	/*
+	 * Calling put_page() for each ref is unnecessarily slow. Only the last
+	 * ref needs a put_page().
+	 */
+	if (refs > 1)
+		page_ref_sub(page, refs - 1);
+	put_page(page);
+}
+
 #ifdef CONFIG_ARCH_HAS_HUGEPD
 static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
 				      unsigned long sz)
@@ -2007,32 +2030,20 @@ static int gup_hugepte(pte_t *ptep, unsi
 	/* hugepages are never "special" */
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
-	refs = 0;
 	head = pte_page(pte);
-
 	page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
-	do {
-		VM_BUG_ON(compound_head(page) != head);
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
+	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_get_compound_head(head, refs);
-	if (!head) {
-		*nr -= refs;
+	if (!head)
 		return 0;
-	}
 
 	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-		/* Could be optimized better */
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
+		put_compound_head(head, refs);
 		return 0;
 	}
 
+	*nr += refs;
 	SetPageReferenced(head);
 	return 1;
 }
@@ -2079,28 +2090,19 @@ static int gup_huge_pmd(pmd_t orig, pmd_
 		return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr);
 	}
 
-	refs = 0;
 	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
-	do {
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
+	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_get_compound_head(pmd_page(orig), refs);
-	if (!head) {
-		*nr -= refs;
+	if (!head)
 		return 0;
-	}
 
 	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
+		put_compound_head(head, refs);
 		return 0;
 	}
 
+	*nr += refs;
 	SetPageReferenced(head);
 	return 1;
 }
@@ -2120,28 +2122,19 @@ static int gup_huge_pud(pud_t orig, pud_
 		return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr);
 	}
 
-	refs = 0;
 	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
-	do {
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
+	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_get_compound_head(pud_page(orig), refs);
-	if (!head) {
-		*nr -= refs;
+	if (!head)
 		return 0;
-	}
 
 	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
+		put_compound_head(head, refs);
 		return 0;
 	}
 
+	*nr += refs;
 	SetPageReferenced(head);
 	return 1;
 }
@@ -2157,28 +2150,20 @@ static int gup_huge_pgd(pgd_t orig, pgd_
 		return 0;
 
 	BUILD_BUG_ON(pgd_devmap(orig));
-	refs = 0;
+
 	page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
-	do {
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
+	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_get_compound_head(pgd_page(orig), refs);
-	if (!head) {
-		*nr -= refs;
+	if (!head)
 		return 0;
-	}
 
 	if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) {
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
+		put_compound_head(head, refs);
 		return 0;
 	}
 
+	*nr += refs;
 	SetPageReferenced(head);
 	return 1;
 }
_


* [patch 025/118] mm/gup: move try_get_compound_head() to top, fix minor issues
  2020-01-31  6:10 incoming Andrew Morton
                   ` (23 preceding siblings ...)
  2020-01-31  6:12 ` [patch 024/118] mm/gup: factor out duplicate code from four routines Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 026/118] mm: Cleanup __put_devmap_managed_page() vs ->page_free() Andrew Morton
                   ` (92 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/gup: move try_get_compound_head() to top, fix minor issues

An upcoming patch uses try_get_compound_head() more widely, so move it to
the top of gup.c.

Also fix a tiny spelling error and a checkpatch.pl warning.

Link: http://lkml.kernel.org/r/20200107224558.2362728-3-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/gup.c |   29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

--- a/mm/gup.c~mm-gup-move-try_get_compound_head-to-top-fix-minor-issues
+++ a/mm/gup.c
@@ -29,6 +29,21 @@ struct follow_page_context {
 	unsigned int page_mask;
 };
 
+/*
+ * Return the compound head page with ref appropriately incremented,
+ * or NULL if that failed.
+ */
+static inline struct page *try_get_compound_head(struct page *page, int refs)
+{
+	struct page *head = compound_head(page);
+
+	if (WARN_ON_ONCE(page_ref_count(head) < 0))
+		return NULL;
+	if (unlikely(!page_cache_add_speculative(head, refs)))
+		return NULL;
+	return head;
+}
+
 /**
  * put_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
  * @pages:  array of pages to be maybe marked dirty, and definitely released.
@@ -1807,20 +1822,6 @@ static void __maybe_unused undo_dev_page
 	}
 }
 
-/*
- * Return the compund head page with ref appropriately incremented,
- * or NULL if that failed.
- */
-static inline struct page *try_get_compound_head(struct page *page, int refs)
-{
-	struct page *head = compound_head(page);
-	if (WARN_ON_ONCE(page_ref_count(head) < 0))
-		return NULL;
-	if (unlikely(!page_cache_add_speculative(head, refs)))
-		return NULL;
-	return head;
-}


* [patch 026/118] mm: Cleanup __put_devmap_managed_page() vs ->page_free()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (24 preceding siblings ...)
  2020-01-31  6:12 ` [patch 025/118] mm/gup: move try_get_compound_head() to top, fix minor issues Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 027/118] mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages Andrew Morton
                   ` (91 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: Dan Williams <dan.j.williams@intel.com>
Subject: mm: Cleanup __put_devmap_managed_page() vs ->page_free()

After the removal of the device-public infrastructure there are only two
->page_free() callbacks in the kernel.  One of those is a device-private
callback in the nouveau driver, the other is a generic wakeup needed in
the DAX case.  In the hopes that all ->page_free() callbacks can be
migrated to common core kernel functionality, move the device-private
specific actions in __put_devmap_managed_page() under the
is_device_private_page() conditional, including the ->page_free()
callback.  For the other page types just open-code the generic wakeup.

Yes, the wakeup is only needed in the MEMORY_DEVICE_FS_DAX case, but it
does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
cases.
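
The resulting control flow can be sketched in userspace as follows. This
is an illustration only: the struct, the enum and the function name
put_devmap_page_sketch() are hypothetical, and the returned enum value
merely names the action that the real routine would perform at that
point (wake_up_var(), ->page_free(), and so on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Names the action taken for one reference drop (sketch only). */
enum put_action {
	PUT_STILL_BUSY,		/* other references remain */
	PUT_FINAL,		/* count hit 0: dev_pagemap shutdown path */
	PUT_WAKE_WAITERS,	/* non-private page went idle: generic wakeup */
	PUT_DRIVER_FREE,	/* device-private page: driver ->page_free() */
};

/* Hypothetical stand-in for a ZONE_DEVICE struct page. */
struct page {
	int refcount;
	bool device_private;
	void *mapping;
};

static enum put_action put_devmap_page_sketch(struct page *page)
{
	int count = --page->refcount;	/* models page_ref_dec_return() */

	if (count > 1)
		return PUT_STILL_BUSY;
	if (count == 0)
		return PUT_FINAL;

	/* count == 1: the page is now idle */
	if (!page->device_private)
		return PUT_WAKE_WAITERS;

	/* Only device-private pages get the stale mapping cleared. */
	page->mapping = NULL;
	return PUT_DRIVER_FREE;
}
```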

Link: http://lkml.kernel.org/r/20200107224558.2362728-4-jhubbard@nvidia.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/nvdimm/pmem.c |    6 ---
 mm/memremap.c         |   80 ++++++++++++++++++++++------------------
 2 files changed, 44 insertions(+), 42 deletions(-)

--- a/drivers/nvdimm/pmem.c~mm-cleanup-__put_devmap_managed_page-vs-page_free
+++ a/drivers/nvdimm/pmem.c
@@ -337,13 +337,7 @@ static void pmem_release_disk(void *__pm
 	put_disk(pmem->disk);
 }
 
-static void pmem_pagemap_page_free(struct page *page)
-{
-	wake_up_var(&page->_refcount);
-}
-
 static const struct dev_pagemap_ops fsdax_pagemap_ops = {
-	.page_free		= pmem_pagemap_page_free,
 	.kill			= pmem_pagemap_kill,
 	.cleanup		= pmem_pagemap_cleanup,
 };
--- a/mm/memremap.c~mm-cleanup-__put_devmap_managed_page-vs-page_free
+++ a/mm/memremap.c
@@ -27,7 +27,8 @@ static void devmap_managed_enable_put(vo
 
 static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
 {
-	if (!pgmap->ops || !pgmap->ops->page_free) {
+	if (pgmap->type == MEMORY_DEVICE_PRIVATE &&
+	    (!pgmap->ops || !pgmap->ops->page_free)) {
 		WARN(1, "Missing page_free method\n");
 		return -EINVAL;
 	}
@@ -414,44 +415,51 @@ void __put_devmap_managed_page(struct pa
 {
 	int count = page_ref_dec_return(page);
 
-	/*
-	 * If refcount is 1 then page is freed and refcount is stable as nobody
-	 * holds a reference on the page.
-	 */
-	if (count == 1) {
-		/* Clear Active bit in case of parallel mark_page_accessed */
-		__ClearPageActive(page);
-		__ClearPageWaiters(page);
+	/* still busy */
+	if (count > 1)
+		return;
 
-		mem_cgroup_uncharge(page);
+	/* only triggered by the dev_pagemap shutdown path */
+	if (count == 0) {
+		__put_page(page);
+		return;
+	}
 
-		/*
-		 * When a device_private page is freed, the page->mapping field
-		 * may still contain a (stale) mapping value. For example, the
-		 * lower bits of page->mapping may still identify the page as
-		 * an anonymous page. Ultimately, this entire field is just
-		 * stale and wrong, and it will cause errors if not cleared.
-		 * One example is:
-		 *
-		 *  migrate_vma_pages()
-		 *    migrate_vma_insert_page()
-		 *      page_add_new_anon_rmap()
-		 *        __page_set_anon_rmap()
-		 *          ...checks page->mapping, via PageAnon(page) call,
-		 *            and incorrectly concludes that the page is an
-		 *            anonymous page. Therefore, it incorrectly,
-		 *            silently fails to set up the new anon rmap.
-		 *
-		 * For other types of ZONE_DEVICE pages, migration is either
-		 * handled differently or not done at all, so there is no need
-		 * to clear page->mapping.
-		 */
-		if (is_device_private_page(page))
-			page->mapping = NULL;
+	/* notify page idle for dax */
+	if (!is_device_private_page(page)) {
+		wake_up_var(&page->_refcount);
+		return;
+	}
 
-		page->pgmap->ops->page_free(page);
-	} else if (!count)
-		__put_page(page);
+	/* Clear Active bit in case of parallel mark_page_accessed */
+	__ClearPageActive(page);
+	__ClearPageWaiters(page);
+
+	mem_cgroup_uncharge(page);
+
+	/*
+	 * When a device_private page is freed, the page->mapping field
+	 * may still contain a (stale) mapping value. For example, the
+	 * lower bits of page->mapping may still identify the page as an
+	 * anonymous page. Ultimately, this entire field is just stale
+	 * and wrong, and it will cause errors if not cleared.  One
+	 * example is:
+	 *
+	 *  migrate_vma_pages()
+	 *    migrate_vma_insert_page()
+	 *      page_add_new_anon_rmap()
+	 *        __page_set_anon_rmap()
+	 *          ...checks page->mapping, via PageAnon(page) call,
+	 *            and incorrectly concludes that the page is an
+	 *            anonymous page. Therefore, it incorrectly,
+	 *            silently fails to set up the new anon rmap.
+	 *
+	 * For other types of ZONE_DEVICE pages, migration is either
+	 * handled differently or not done at all, so there is no need
+	 * to clear page->mapping.
+	 */
+	page->mapping = NULL;
+	page->pgmap->ops->page_free(page);
 }
 EXPORT_SYMBOL(__put_devmap_managed_page);
 #endif /* CONFIG_DEV_PAGEMAP_OPS */
_


* [patch 027/118] mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
  2020-01-31  6:10 incoming Andrew Morton
                   ` (25 preceding siblings ...)
  2020-01-31  6:12 ` [patch 026/118] mm: Cleanup __put_devmap_managed_page() vs ->page_free() Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 028/118] goldish_pipe: rename local pin_user_pages() routine Andrew Morton
                   ` (90 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages

An upcoming patch changes and complicates the refcounting and especially
the "put page" aspects of it.  In order to keep everything clean, refactor
the devmap page release routines:

* Rename put_devmap_managed_page() to page_is_devmap_managed(), and
  limit the functionality to "read only": return a bool, with no side
  effects.

* Add a new routine, put_devmap_managed_page(), to handle decrementing
  the refcount for ZONE_DEVICE pages.

* Change callers (just release_pages() and put_page()) to check
  page_is_devmap_managed() before calling the new
  put_devmap_managed_page() routine.  This is a performance point:
  put_page() is a hot path, so we need to avoid non-inline function calls
  where possible.

* Rename __put_devmap_managed_page() to free_devmap_managed_page(), and
  limit the functionality to unconditionally freeing a devmap page.

This is originally based on a separate patch by Ira Weiny, which applied
to an early version of the put_user_page() experiments.  Since then,
Jérôme Glisse suggested the refactoring described above.
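
As a rough illustration of the split described above, the following
userspace sketch models the three routines and the put_page() caller.
The struct fields (devmap_managed, freed, released) and the helper
final_put() are hypothetical stand-ins, the latter for __put_page(); the
real predicate checks a static key and page->pgmap->type rather than a
flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page. */
struct page {
	int refcount;		/* 1-based for devmap pages: 1 means idle */
	bool devmap_managed;	/* stands in for the pgmap->type check */
	int freed;		/* times the devmap free path ran */
	int released;		/* times the ordinary final put ran */
};

/* Read-only predicate: returns a bool and has no side effects. */
static bool page_is_devmap_managed(const struct page *page)
{
	return page->devmap_managed;
}

/* Unconditionally "frees" a devmap page (here: just counts the call). */
static void free_devmap_managed_page(struct page *page)
{
	page->freed++;
}

/* Stand-in for __put_page(). */
static void final_put(struct page *page)
{
	page->released++;
}

/* Out-of-line in the patch; handles the 1-based refcount. */
static void put_devmap_managed_page(struct page *page)
{
	int count;

	if (!page_is_devmap_managed(page))
		return;

	count = --page->refcount;	/* models page_ref_dec_return() */

	if (count == 1)
		free_devmap_managed_page(page);
	else if (!count)
		final_put(page);
}

/* Hot path: cheap inline check first, function call only when needed. */
static void put_page(struct page *page)
{
	if (page_is_devmap_managed(page)) {
		put_devmap_managed_page(page);
		return;
	}

	if (--page->refcount == 0)
		final_put(page);
}
```

Ordinary pages never reach put_devmap_managed_page() in this model,
which is the performance point made above about keeping put_page() free
of non-inline calls where possible.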

Link: http://lkml.kernel.org/r/20200107224558.2362728-5-jhubbard@nvidia.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Suggested-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h |   18 +++++++++++++-----
 mm/memremap.c      |   15 +--------------
 mm/swap.c          |   27 ++++++++++++++++++++++++++-
 3 files changed, 40 insertions(+), 20 deletions(-)

--- a/include/linux/mm.h~mm-devmap-refactor-1-based-refcounting-for-zone_device-pages
+++ a/include/linux/mm.h
@@ -947,9 +947,10 @@ static inline bool is_zone_device_page(c
 #endif
 
 #ifdef CONFIG_DEV_PAGEMAP_OPS
-void __put_devmap_managed_page(struct page *page);
+void free_devmap_managed_page(struct page *page);
 DECLARE_STATIC_KEY_FALSE(devmap_managed_key);
-static inline bool put_devmap_managed_page(struct page *page)
+
+static inline bool page_is_devmap_managed(struct page *page)
 {
 	if (!static_branch_unlikely(&devmap_managed_key))
 		return false;
@@ -958,7 +959,6 @@ static inline bool put_devmap_managed_pa
 	switch (page->pgmap->type) {
 	case MEMORY_DEVICE_PRIVATE:
 	case MEMORY_DEVICE_FS_DAX:
-		__put_devmap_managed_page(page);
 		return true;
 	default:
 		break;
@@ -966,11 +966,17 @@ static inline bool put_devmap_managed_pa
 	return false;
 }
 
+void put_devmap_managed_page(struct page *page);
+
 #else /* CONFIG_DEV_PAGEMAP_OPS */
-static inline bool put_devmap_managed_page(struct page *page)
+static inline bool page_is_devmap_managed(struct page *page)
 {
 	return false;
 }
+
+static inline void put_devmap_managed_page(struct page *page)
+{
+}
 #endif /* CONFIG_DEV_PAGEMAP_OPS */
 
 static inline bool is_device_private_page(const struct page *page)
@@ -1023,8 +1029,10 @@ static inline void put_page(struct page
 	 * need to inform the device driver through callback. See
 	 * include/linux/memremap.h and HMM for details.
 	 */
-	if (put_devmap_managed_page(page))
+	if (page_is_devmap_managed(page)) {
+		put_devmap_managed_page(page);
 		return;
+	}
 
 	if (put_page_testzero(page))
 		__put_page(page);
--- a/mm/memremap.c~mm-devmap-refactor-1-based-refcounting-for-zone_device-pages
+++ a/mm/memremap.c
@@ -411,20 +411,8 @@ struct dev_pagemap *get_dev_pagemap(unsi
 EXPORT_SYMBOL_GPL(get_dev_pagemap);
 
 #ifdef CONFIG_DEV_PAGEMAP_OPS
-void __put_devmap_managed_page(struct page *page)
+void free_devmap_managed_page(struct page *page)
 {
-	int count = page_ref_dec_return(page);
-
-	/* still busy */
-	if (count > 1)
-		return;
-
-	/* only triggered by the dev_pagemap shutdown path */
-	if (count == 0) {
-		__put_page(page);
-		return;
-	}
-
 	/* notify page idle for dax */
 	if (!is_device_private_page(page)) {
 		wake_up_var(&page->_refcount);
@@ -461,5 +449,4 @@ void __put_devmap_managed_page(struct pa
 	page->mapping = NULL;
 	page->pgmap->ops->page_free(page);
 }
-EXPORT_SYMBOL(__put_devmap_managed_page);
 #endif /* CONFIG_DEV_PAGEMAP_OPS */
--- a/mm/swap.c~mm-devmap-refactor-1-based-refcounting-for-zone_device-pages
+++ a/mm/swap.c
@@ -813,8 +813,10 @@ void release_pages(struct page **pages,
 			 * processing, and instead, expect a call to
 			 * put_page_testzero().
 			 */
-			if (put_devmap_managed_page(page))
+			if (page_is_devmap_managed(page)) {
+				put_devmap_managed_page(page);
 				continue;
+			}
 		}
 
 		page = compound_head(page);
@@ -1102,3 +1104,26 @@ void __init swap_setup(void)
 	 * _really_ don't want to cluster much more
 	 */
 }
+
+#ifdef CONFIG_DEV_PAGEMAP_OPS
+void put_devmap_managed_page(struct page *page)
+{
+	int count;
+
+	if (WARN_ON_ONCE(!page_is_devmap_managed(page)))
+		return;
+
+	count = page_ref_dec_return(page);
+
+	/*
+	 * devmap page refcounts are 1-based, rather than 0-based: if
+	 * refcount is 1, then the page is free and the refcount is
+	 * stable because nobody holds a reference on the page.
+	 */
+	if (count == 1)
+		free_devmap_managed_page(page);
+	else if (!count)
+		__put_page(page);
+}
+EXPORT_SYMBOL(put_devmap_managed_page);
+#endif
_


* [patch 028/118] goldish_pipe: rename local pin_user_pages() routine
  2020-01-31  6:10 incoming Andrew Morton
                   ` (26 preceding siblings ...)
  2020-01-31  6:12 ` [patch 027/118] mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 029/118] mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM Andrew Morton
                   ` (89 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: goldish_pipe: rename local pin_user_pages() routine

Avoid naming conflicts: rename local static function from
"pin_user_pages()" to "goldfish_pin_pages()".

An upcoming patch will introduce a global pin_user_pages() function.

Link: http://lkml.kernel.org/r/20200107224558.2362728-6-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/platform/goldfish/goldfish_pipe.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/drivers/platform/goldfish/goldfish_pipe.c~goldish_pipe-rename-local-pin_user_pages-routine
+++ a/drivers/platform/goldfish/goldfish_pipe.c
@@ -257,12 +257,12 @@ static int goldfish_pipe_error_convert(i
 	}
 }
 
-static int pin_user_pages(unsigned long first_page,
-			  unsigned long last_page,
-			  unsigned int last_page_size,
-			  int is_write,
-			  struct page *pages[MAX_BUFFERS_PER_COMMAND],
-			  unsigned int *iter_last_page_size)
+static int goldfish_pin_pages(unsigned long first_page,
+			      unsigned long last_page,
+			      unsigned int last_page_size,
+			      int is_write,
+			      struct page *pages[MAX_BUFFERS_PER_COMMAND],
+			      unsigned int *iter_last_page_size)
 {
 	int ret;
 	int requested_pages = ((last_page - first_page) >> PAGE_SHIFT) + 1;
@@ -354,9 +354,9 @@ static int transfer_max_buffers(struct g
 	if (mutex_lock_interruptible(&pipe->lock))
 		return -ERESTARTSYS;
 
-	pages_count = pin_user_pages(first_page, last_page,
-				     last_page_size, is_write,
-				     pipe->pages, &iter_last_page_size);
+	pages_count = goldfish_pin_pages(first_page, last_page,
+					 last_page_size, is_write,
+					 pipe->pages, &iter_last_page_size);
 	if (pages_count < 0) {
 		mutex_unlock(&pipe->lock);
 		return pages_count;
_


* [patch 029/118] mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
  2020-01-31  6:10 incoming Andrew Morton
                   ` (27 preceding siblings ...)
  2020-01-31  6:12 ` [patch 028/118] goldish_pipe: rename local pin_user_pages() routine Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 030/118] vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call Andrew Morton
                   ` (88 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM

As it says in the updated comment in gup.c: current FOLL_LONGTERM behavior
is incompatible with FAULT_FLAG_ALLOW_RETRY because of the FS DAX check
requirement on vmas.

However, the corresponding restriction in get_user_pages_remote() was
slightly stricter than is actually required: it forbade all FOLL_LONGTERM
callers, but we can actually allow FOLL_LONGTERM callers that do not set
the "locked" arg.

Update the code and comments to loosen the restriction, allowing
FOLL_LONGTERM in some cases.

Also, copy the DAX check ("if a VMA is DAX, don't allow long term
pinning") from the VFIO call site, all the way into the internals of
get_user_pages_remote() and __gup_longterm_locked().  That is:
get_user_pages_remote() calls __gup_longterm_locked(), which in turn calls
check_dax_vmas().  This check will then be removed from the VFIO call site
in a subsequent patch.
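
The loosened rule can be pictured with the userspace sketch below. It is
an assumption-laden model of the description above, not the code from
this patch: SKETCH_FOLL_LONGTERM is an illustrative flag value, and
choose_gup_path() and enum gup_path are hypothetical names that only
record which internal path (__gup_longterm_locked(), with its DAX vma
check, or __get_user_pages_locked()) would be taken.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative value only; the kernel defines the real FOLL_LONGTERM. */
#define SKETCH_FOLL_LONGTERM 0x1u

/* Which internal path the request would take (sketch only). */
enum gup_path {
	GUP_PATH_REJECTED,	/* FOLL_LONGTERM combined with "locked" */
	GUP_PATH_LONGTERM,	/* models __gup_longterm_locked() + DAX check */
	GUP_PATH_LOCKED,	/* models __get_user_pages_locked() */
};

static int choose_gup_path(unsigned int gup_flags, const int *locked,
			   enum gup_path *path)
{
	if (gup_flags & SKETCH_FOLL_LONGTERM) {
		/* Retry-on-fault is still incompatible with the DAX check. */
		if (locked) {
			*path = GUP_PATH_REJECTED;
			return -EINVAL;
		}
		*path = GUP_PATH_LONGTERM;
		return 0;
	}

	*path = GUP_PATH_LOCKED;
	return 0;
}
```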

Thanks to Jason Gunthorpe for pointing out a clean way to fix this, and to
Dan Williams for helping clarify the DAX refactoring.

Link: http://lkml.kernel.org/r/20200107224558.2362728-7-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/gup.c |  174 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 92 insertions(+), 82 deletions(-)

--- a/mm/gup.c~mm-fix-get_user_pages_remotes-handling-of-foll_longterm
+++ a/mm/gup.c
@@ -1111,88 +1111,6 @@ static __always_inline long __get_user_p
 	return pages_done;
 }
 
-/*
- * get_user_pages_remote() - pin user pages in memory
- * @tsk:	the task_struct to use for page fault accounting, or
- *		NULL if faults are not to be recorded.
- * @mm:		mm_struct of target mm
- * @start:	starting user address
- * @nr_pages:	number of pages from start to pin
- * @gup_flags:	flags modifying lookup behaviour
- * @pages:	array that receives pointers to the pages pinned.
- *		Should be at least nr_pages long. Or NULL, if caller
- *		only intends to ensure the pages are faulted in.
- * @vmas:	array of pointers to vmas corresponding to each page.
- *		Or NULL if the caller does not require them.
- * @locked:	pointer to lock flag indicating whether lock is held and
- *		subsequently whether VM_FAULT_RETRY functionality can be
- *		utilised. Lock must initially be held.
- *
- * Returns either number of pages pinned (which may be less than the
- * number requested), or an error. Details about the return value:
- *
- * -- If nr_pages is 0, returns 0.
- * -- If nr_pages is >0, but no pages were pinned, returns -errno.
- * -- If nr_pages is >0, and some pages were pinned, returns the number of
- *    pages pinned. Again, this may be less than nr_pages.
- *
- * The caller is responsible for releasing returned @pages, via put_page().
- *
- * @vmas are valid only as long as mmap_sem is held.
- *
- * Must be called with mmap_sem held for read or write.
- *
- * get_user_pages walks a process's page tables and takes a reference to
- * each struct page that each user address corresponds to at a given
- * instant. That is, it takes the page that would be accessed if a user
- * thread accesses the given user virtual address at that instant.
- *
- * This does not guarantee that the page exists in the user mappings when
- * get_user_pages returns, and there may even be a completely different
- * page there in some cases (eg. if mmapped pagecache has been invalidated
- * and subsequently re faulted). However it does guarantee that the page
- * won't be freed completely. And mostly callers simply care that the page
- * contains data that was valid *at some point in time*. Typically, an IO
- * or similar operation cannot guarantee anything stronger anyway because
- * locks can't be held over the syscall boundary.
- *
- * If gup_flags & FOLL_WRITE == 0, the page must not be written to. If the page
- * is written to, set_page_dirty (or set_page_dirty_lock, as appropriate) must
- * be called after the page is finished with, and before put_page is called.
- *
- * get_user_pages is typically used for fewer-copy IO operations, to get a
- * handle on the memory by some means other than accesses via the user virtual
- * addresses. The pages may be submitted for DMA to devices or accessed via
- * their kernel linear mapping (via the kmap APIs). Care should be taken to
- * use the correct cache flushing APIs.
- *
- * See also get_user_pages_fast, for performance critical applications.
- *
- * get_user_pages should be phased out in favor of
- * get_user_pages_locked|unlocked or get_user_pages_fast. Nothing
- * should use get_user_pages because it cannot pass
- * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault.
- */
-long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
-		unsigned long start, unsigned long nr_pages,
-		unsigned int gup_flags, struct page **pages,
-		struct vm_area_struct **vmas, int *locked)
-{
-	/*
-	 * FIXME: Current FOLL_LONGTERM behavior is incompatible with
-	 * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
-	 * vmas.  As there are no users of this flag in this call we simply
-	 * disallow this option for now.
-	 */
-	if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
-		return -EINVAL;
-
-	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
-				       locked,
-				       gup_flags | FOLL_TOUCH | FOLL_REMOTE);
-}
-EXPORT_SYMBOL(get_user_pages_remote);
-
 /**
  * populate_vma_page_range() -  populate a range of pages in the vma.
  * @vma:   target vma
@@ -1627,6 +1545,98 @@ static __always_inline long __gup_longte
 #endif /* CONFIG_FS_DAX || CONFIG_CMA */
 
 /*
+ * get_user_pages_remote() - pin user pages in memory
+ * @tsk:	the task_struct to use for page fault accounting, or
+ *		NULL if faults are not to be recorded.
+ * @mm:		mm_struct of target mm
+ * @start:	starting user address
+ * @nr_pages:	number of pages from start to pin
+ * @gup_flags:	flags modifying lookup behaviour
+ * @pages:	array that receives pointers to the pages pinned.
+ *		Should be at least nr_pages long. Or NULL, if caller
+ *		only intends to ensure the pages are faulted in.
+ * @vmas:	array of pointers to vmas corresponding to each page.
+ *		Or NULL if the caller does not require them.
+ * @locked:	pointer to lock flag indicating whether lock is held and
+ *		subsequently whether VM_FAULT_RETRY functionality can be
+ *		utilised. Lock must initially be held.
+ *
+ * Returns either number of pages pinned (which may be less than the
+ * number requested), or an error. Details about the return value:
+ *
+ * -- If nr_pages is 0, returns 0.
+ * -- If nr_pages is >0, but no pages were pinned, returns -errno.
+ * -- If nr_pages is >0, and some pages were pinned, returns the number of
+ *    pages pinned. Again, this may be less than nr_pages.
+ *
+ * The caller is responsible for releasing returned @pages, via put_page().
+ *
+ * @vmas are valid only as long as mmap_sem is held.
+ *
+ * Must be called with mmap_sem held for read or write.
+ *
+ * get_user_pages walks a process's page tables and takes a reference to
+ * each struct page that each user address corresponds to at a given
+ * instant. That is, it takes the page that would be accessed if a user
+ * thread accesses the given user virtual address at that instant.
+ *
+ * This does not guarantee that the page exists in the user mappings when
+ * get_user_pages returns, and there may even be a completely different
+ * page there in some cases (eg. if mmapped pagecache has been invalidated
+ * and subsequently re faulted). However it does guarantee that the page
+ * won't be freed completely. And mostly callers simply care that the page
+ * contains data that was valid *at some point in time*. Typically, an IO
+ * or similar operation cannot guarantee anything stronger anyway because
+ * locks can't be held over the syscall boundary.
+ *
+ * If gup_flags & FOLL_WRITE == 0, the page must not be written to. If the page
+ * is written to, set_page_dirty (or set_page_dirty_lock, as appropriate) must
+ * be called after the page is finished with, and before put_page is called.
+ *
+ * get_user_pages is typically used for fewer-copy IO operations, to get a
+ * handle on the memory by some means other than accesses via the user virtual
+ * addresses. The pages may be submitted for DMA to devices or accessed via
+ * their kernel linear mapping (via the kmap APIs). Care should be taken to
+ * use the correct cache flushing APIs.
+ *
+ * See also get_user_pages_fast, for performance critical applications.
+ *
+ * get_user_pages should be phased out in favor of
+ * get_user_pages_locked|unlocked or get_user_pages_fast. Nothing
+ * should use get_user_pages because it cannot pass
+ * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault.
+ */
+long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+		unsigned long start, unsigned long nr_pages,
+		unsigned int gup_flags, struct page **pages,
+		struct vm_area_struct **vmas, int *locked)
+{
+	/*
+	 * Parts of FOLL_LONGTERM behavior are incompatible with
+	 * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
+	 * vmas. However, this only comes up if locked is set, and there are
+	 * callers that do request FOLL_LONGTERM, but do not set locked. So,
+	 * allow what we can.
+	 */
+	if (gup_flags & FOLL_LONGTERM) {
+		if (WARN_ON_ONCE(locked))
+			return -EINVAL;
+		/*
+		 * This will check the vmas (even if our vmas arg is NULL)
+		 * and return -EOPNOTSUPP if DAX isn't allowed in this case:
+		 */
+		return __gup_longterm_locked(tsk, mm, start, nr_pages, pages,
+					     vmas, gup_flags | FOLL_TOUCH |
+					     FOLL_REMOTE);
+	}
+
+	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
+				       locked,
+				       gup_flags | FOLL_TOUCH | FOLL_REMOTE);
+}
+EXPORT_SYMBOL(get_user_pages_remote);
+
+/*
  * This is the same as get_user_pages_remote(), just with a
  * less-flexible calling convention where we assume that the task
  * and mm being operated on are the current task's and don't allow
_


* [patch 030/118] vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
  2020-01-31  6:10 incoming Andrew Morton
                   ` (28 preceding siblings ...)
  2020-01-31  6:12 ` [patch 029/118] mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 031/118] mm/gup: allow FOLL_FORCE for get_user_pages_fast() Andrew Morton
                   ` (87 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call

Update VFIO to take advantage of the recently loosened restriction on
FOLL_LONGTERM with get_user_pages_remote().  Also, now it is possible to
fix a bug: the VFIO caller is logically a FOLL_LONGTERM user, but it
wasn't setting FOLL_LONGTERM.

Also, remove an unnecessary pair of calls that were releasing and
reacquiring the mmap_sem.  There is no need to avoid holding mmap_sem just
in order to call page_to_pfn().

Also, now that the DAX check ("if a VMA is DAX, don't allow long term
pinning") is in the internals of get_user_pages_remote() and
__gup_longterm_locked(), there's no need for it at the VFIO call site.  So
remove it.

Link: http://lkml.kernel.org/r/20200107224558.2362728-8-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/vfio/vfio_iommu_type1.c |   30 +++++-------------------------
 1 file changed, 5 insertions(+), 25 deletions(-)

--- a/drivers/vfio/vfio_iommu_type1.c~vfio-fix-foll_longterm-use-simplify-get_user_pages_remote-call
+++ a/drivers/vfio/vfio_iommu_type1.c
@@ -322,7 +322,6 @@ static int vaddr_get_pfn(struct mm_struc
 {
 	struct page *page[1];
 	struct vm_area_struct *vma;
-	struct vm_area_struct *vmas[1];
 	unsigned int flags = 0;
 	int ret;
 
@@ -330,33 +329,14 @@ static int vaddr_get_pfn(struct mm_struc
 		flags |= FOLL_WRITE;
 
 	down_read(&mm->mmap_sem);
-	if (mm == current->mm) {
-		ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
-				     vmas);
-	} else {
-		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
-					    vmas, NULL);
-		/*
-		 * The lifetime of a vaddr_get_pfn() page pin is
-		 * userspace-controlled. In the fs-dax case this could
-		 * lead to indefinite stalls in filesystem operations.
-		 * Disallow attempts to pin fs-dax pages via this
-		 * interface.
-		 */
-		if (ret > 0 && vma_is_fsdax(vmas[0])) {
-			ret = -EOPNOTSUPP;
-			put_page(page[0]);
-		}
-	}
-	up_read(&mm->mmap_sem);
-
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
+				    page, NULL, NULL);
 	if (ret == 1) {
 		*pfn = page_to_pfn(page[0]);
-		return 0;
+		ret = 0;
+		goto done;
 	}
 
-	down_read(&mm->mmap_sem);
-
 	vaddr = untagged_addr(vaddr);
 
 	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
@@ -366,7 +346,7 @@ static int vaddr_get_pfn(struct mm_struc
 		if (is_invalid_reserved_pfn(*pfn))
 			ret = 0;
 	}


* [patch 031/118] mm/gup: allow FOLL_FORCE for get_user_pages_fast()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (29 preceding siblings ...)
  2020-01-31  6:12 ` [patch 030/118] vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 032/118] IB/umem: use get_user_pages_fast() to pin DMA pages Andrew Morton
                   ` (86 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/gup: allow FOLL_FORCE for get_user_pages_fast()

Commit 817be129e6f2 ("mm: validate get_user_pages_fast flags") allowed
only FOLL_WRITE and FOLL_LONGTERM to be passed to get_user_pages_fast(). 
This, combined with the fact that get_user_pages_fast() falls back to
"slow gup", which *does* accept FOLL_FORCE, leads to an odd situation: if
you need FOLL_FORCE, you cannot call get_user_pages_fast().

There does not appear to be any reason for filtering out FOLL_FORCE. 
There is nothing in the _fast() implementation that requires that we avoid
writing to the pages.  So it appears to have been an oversight.

Fix by allowing FOLL_FORCE to be set for get_user_pages_fast().
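
As a rough illustration of the check being relaxed, here is a userspace
sketch (not kernel code).  The MODEL_* flag bits and
model_gup_fast_check_flags() are made up for this example; only the shape
of the mask test follows the hunk below.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag bits for this userspace model only. */
#define MODEL_FOLL_WRITE	0x01u
#define MODEL_FOLL_GET		0x04u
#define MODEL_FOLL_FORCE	0x10u
#define MODEL_FOLL_LONGTERM	0x10000u

/*
 * Reject any flag outside the allowed set.  allow_force == 0 models the
 * check before this patch, allow_force != 0 models it afterwards.
 */
static int model_gup_fast_check_flags(unsigned int gup_flags, int allow_force)
{
	unsigned int allowed = MODEL_FOLL_WRITE | MODEL_FOLL_LONGTERM;

	if (allow_force)
		allowed |= MODEL_FOLL_FORCE;

	/* Any bit that is set but not allowed makes the call invalid. */
	if (gup_flags & ~allowed)
		return -EINVAL;

	return 0;
}
```

With allow_force set, FOLL_WRITE | FOLL_FORCE passes the check, while an
unrelated flag such as the model's FOLL_GET is still rejected.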

Link: http://lkml.kernel.org/r/20200107224558.2362728-9-jhubbard@nvidia.com
Fixes: 817be129e6f2 ("mm: validate get_user_pages_fast flags")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/gup.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/gup.c~mm-gup-allow-foll_force-for-get_user_pages_fast
+++ a/mm/gup.c
@@ -2411,7 +2411,8 @@ int get_user_pages_fast(unsigned long st
 	unsigned long addr, len, end;
 	int nr = 0, ret = 0;
 
-	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM)))
+	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
+				       FOLL_FORCE)))
 		return -EINVAL;
 
 	start = untagged_addr(start) & PAGE_MASK;
_


* [patch 032/118] IB/umem: use get_user_pages_fast() to pin DMA pages
  2020-01-31  6:10 incoming Andrew Morton
                   ` (30 preceding siblings ...)
  2020-01-31  6:12 ` [patch 031/118] mm/gup: allow FOLL_FORCE for get_user_pages_fast() Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 033/118] media/v4l2-core: set pages dirty upon releasing DMA buffers Andrew Morton
                   ` (85 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: IB/umem: use get_user_pages_fast() to pin DMA pages

Switch ib_umem_get() from get_user_pages() to get_user_pages_fast(), and
get rid of the mmap_sem calls as part of that.  Note that
get_user_pages_fast() will, if necessary, fall back to
__gup_longterm_unlocked(), which takes the mmap_sem as needed.

Link: http://lkml.kernel.org/r/20200107224558.2362728-10-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/infiniband/core/umem.c |   17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

--- a/drivers/infiniband/core/umem.c~ib-umem-use-get_user_pages_fast-to-pin-dma-pages
+++ a/drivers/infiniband/core/umem.c
@@ -257,16 +257,13 @@ struct ib_umem *ib_umem_get(struct ib_de
 	sg = umem->sg_head.sgl;
 
 	while (npages) {
-		down_read(&mm->mmap_sem);
-		ret = get_user_pages(cur_base,
-				     min_t(unsigned long, npages,
-					   PAGE_SIZE / sizeof (struct page *)),
-				     gup_flags | FOLL_LONGTERM,
-				     page_list, NULL);
-		if (ret < 0) {
-			up_read(&mm->mmap_sem);
+		ret = get_user_pages_fast(cur_base,
+					  min_t(unsigned long, npages,
+						PAGE_SIZE /
+						sizeof(struct page *)),
+					  gup_flags | FOLL_LONGTERM, page_list);
+		if (ret < 0)
 			goto umem_release;
-		}
 
 		cur_base += ret * PAGE_SIZE;
 		npages   -= ret;
@@ -274,8 +271,6 @@ struct ib_umem *ib_umem_get(struct ib_de
 		sg = ib_umem_add_sg_table(sg, page_list, ret,
 			dma_get_max_seg_size(device->dma_device),
 			&umem->sg_nents);


* [patch 033/118] media/v4l2-core: set pages dirty upon releasing DMA buffers
  2020-01-31  6:10 incoming Andrew Morton
                   ` (31 preceding siblings ...)
  2020-01-31  6:12 ` [patch 032/118] IB/umem: use get_user_pages_fast() to pin DMA pages Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 034/118] mm/gup: introduce pin_user_pages*() and FOLL_PIN Andrew Morton
                   ` (84 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, stable, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: media/v4l2-core: set pages dirty upon releasing DMA buffers

After DMA is complete, and the device and CPU caches are synchronized,
it's still required to mark the CPU pages as dirty, if the data was coming
from the device.  However, this driver was just issuing a bare put_page()
call, without any set_page_dirty*() call.

Fix the problem, by calling set_page_dirty_lock() if the CPU pages were
potentially receiving data from the device.
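
The release pattern can be sketched in plain userspace C as follows.  This
is only a model: struct model_page, enum model_dma_dir and
model_release_pages() are hypothetical stand-ins for struct page, the DMA
direction and the loop in videobuf_dma_free(), and the dirty flag stands in
for what set_page_dirty_lock() records.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types, for illustration only. */
enum model_dma_dir { MODEL_DMA_TO_DEVICE, MODEL_DMA_FROM_DEVICE };

struct model_page {
	bool dirty;	/* models the page's dirty state */
	int refcount;	/* models the reference taken when pinning */
};

/*
 * Drop one reference per page.  If the device may have written into the
 * pages, mark each page dirty *before* dropping the reference.
 * Returns the number of pages that were marked dirty.
 */
static size_t model_release_pages(struct model_page **pages, size_t nr,
				  enum model_dma_dir dir)
{
	size_t marked = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (dir == MODEL_DMA_FROM_DEVICE) {
			pages[i]->dirty = true;
			marked++;
		}
		pages[i]->refcount--;
	}

	return marked;
}
```

The ordering is the point of the sketch: the dirty marking happens while
the reference is still held, and only for the device-to-memory direction.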

Link: http://lkml.kernel.org/r/20200107224558.2362728-11-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/media/v4l2-core/videobuf-dma-sg.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/media/v4l2-core/videobuf-dma-sg.c~media-v4l2-core-set-pages-dirty-upon-releasing-dma-buffers
+++ a/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -349,8 +349,11 @@ int videobuf_dma_free(struct videobuf_dm
 	BUG_ON(dma->sglen);
 
 	if (dma->pages) {
-		for (i = 0; i < dma->nr_pages; i++)
+		for (i = 0; i < dma->nr_pages; i++) {
+			if (dma->direction == DMA_FROM_DEVICE)
+				set_page_dirty_lock(dma->pages[i]);
 			put_page(dma->pages[i]);
+		}
 		kfree(dma->pages);
 		dma->pages = NULL;
 	}
_


* [patch 034/118] mm/gup: introduce pin_user_pages*() and FOLL_PIN
  2020-01-31  6:10 incoming Andrew Morton
                   ` (32 preceding siblings ...)
  2020-01-31  6:12 ` [patch 033/118] media/v4l2-core: set pages dirty upon releasing DMA buffers Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:12 ` [patch 035/118] goldish_pipe: convert to pin_user_pages() and put_user_page() Andrew Morton
                   ` (83 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/gup: introduce pin_user_pages*() and FOLL_PIN

Introduce pin_user_pages*() variations of get_user_pages*() calls, and
also pin_longterm_pages*() variations.

For now, these are placeholder calls, until the various call sites are
converted to use the correct get_user_pages*() or pin_user_pages*() API.

These variants will eventually all set FOLL_PIN, which is also introduced,
and thoroughly documented.

    pin_user_pages()
    pin_user_pages_remote()
    pin_user_pages_fast()

All pages that are pinned via the above calls, must be unpinned via
put_user_page().
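
To show why the pairing matters, here is a userspace sketch of the
refcount-bias scheme that the new pin_user_pages.rst describes under
"Tracking dma-pinned pages".  It is a model of that design, not of this
patch (which, per the documentation, still increments by +1 for now).  All
model_* names are hypothetical; the value 1024 and the 31-bit counter width
are taken from the documentation text.

```c
#include <assert.h>
#include <stdbool.h>

/* The documentation names 1024 (10 bits) for GUP_PIN_COUNTING_BIAS. */
#define MODEL_PIN_BIAS	1024

struct model_page {
	int refcount;
};

/* Ordinary references move the count by 1. */
static void model_get_page(struct model_page *p) { p->refcount += 1; }
static void model_put_page(struct model_page *p) { p->refcount -= 1; }

/* Pins move the count by the bias, so they must be undone by the bias. */
static void model_pin_page(struct model_page *p) { p->refcount += MODEL_PIN_BIAS; }
static void model_unpin_page(struct model_page *p) { p->refcount -= MODEL_PIN_BIAS; }

/* Fuzzy query: false positives are possible, false negatives are not. */
static bool model_page_dma_pinned(const struct model_page *p)
{
	return p->refcount >= MODEL_PIN_BIAS;
}

/*
 * How many pins fit if each pin adds the bias once per base page of a
 * huge page, with 31 usable counter bits.
 */
static unsigned long long model_max_pins(unsigned long long huge_bytes,
					 unsigned long long base_bytes)
{
	unsigned long long steps = (1ULL << 31) / MODEL_PIN_BIAS;

	return steps / (huge_bytes / base_bytes);
}
```

In this model, releasing a pinned page with model_put_page() instead of
model_unpin_page() leaves it looking pinned, and a 1GB huge page made of
4KB base pages leaves room for about 8 pins, matching the arithmetic in the
documentation.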

The underlying rules are:

* FOLL_PIN is a gup-internal flag, so the call sites should not directly
  set it.  That behavior is enforced with assertions.

* Call sites that want to indicate that they are going to do DirectIO
  ("DIO") or something with similar characteristics, should call a
  get_user_pages()-like wrapper call that sets FOLL_PIN.  These wrappers
  will:

    * Start with "pin_user_pages" instead of "get_user_pages".  That
      makes it easy to find and audit the call sites.

    * Set FOLL_PIN

* For pages that are received via FOLL_PIN, those pages must be returned
  via put_user_page().
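
The rules above can be sketched as a small userspace model.  This is an
illustration under stated assumptions, not the code in this patch (which
introduces the wrappers as placeholders): the MODEL_* flag values and the
model_* functions are hypothetical, and the rejection of FOLL_GET in the
pin wrapper is assumed from the documentation's statement that FOLL_PIN and
FOLL_GET are mutually exclusive for a given call.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag bits for this userspace model only. */
#define MODEL_FOLL_WRITE	0x01u
#define MODEL_FOLL_GET		0x04u
#define MODEL_FOLL_PIN		0x40000u

/* Stand-in for the shared gup internals: records the flags it receives. */
static long model_gup_internal(unsigned int gup_flags, unsigned int *seen)
{
	*seen = gup_flags;
	return 0;
}

/* pin_user_pages*()-like wrapper: the wrapper, not the caller, sets PIN. */
static long model_pin_user_pages(unsigned int gup_flags, unsigned int *seen)
{
	/* Assumption: PIN and GET may not be combined in one call. */
	if (gup_flags & MODEL_FOLL_GET)
		return -EINVAL;

	return model_gup_internal(gup_flags | MODEL_FOLL_PIN, seen);
}

/* get_user_pages*()-like wrapper: callers may not pass PIN directly. */
static long model_get_user_pages(unsigned int gup_flags, unsigned int *seen)
{
	if (gup_flags & MODEL_FOLL_PIN)
		return -EINVAL;

	return model_gup_internal(gup_flags, seen);
}
```

Because only the pin wrapper ever ORs in the PIN bit, a search for the
wrapper name finds every pinning call site, which is the auditing benefit
the rules describe.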

Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases in this
documentation.  (I've reworded it and expanded upon it.)

Link: http://lkml.kernel.org/r/20200107224558.2362728-12-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>		[Documentation]
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/core-api/index.rst          |    1 
 Documentation/core-api/pin_user_pages.rst |  232 ++++++++++++++++++++
 include/linux/mm.h                        |   63 ++++-
 mm/gup.c                                  |  164 ++++++++++++--
 4 files changed, 426 insertions(+), 34 deletions(-)

--- a/Documentation/core-api/index.rst~mm-gup-introduce-pin_user_pages-and-foll_pin
+++ a/Documentation/core-api/index.rst
@@ -31,6 +31,7 @@ Core utilities
    generic-radix-tree
    memory-allocation
    mm-api
+   pin_user_pages
    gfp_mask-from-fs-io
    timekeeping
    boot-time-mm
--- /dev/null
+++ a/Documentation/core-api/pin_user_pages.rst
@@ -0,0 +1,232 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================================
+pin_user_pages() and related calls
+====================================================
+
+.. contents:: :local:
+
+Overview
+========
+
+This document describes the following functions::
+
+ pin_user_pages()
+ pin_user_pages_fast()
+ pin_user_pages_remote()
+
+Basic description of FOLL_PIN
+=============================
+
+FOLL_PIN and FOLL_LONGTERM are flags that can be passed to the get_user_pages*()
+("gup") family of functions. FOLL_PIN has significant interactions and
+interdependencies with FOLL_LONGTERM, so both are covered here.
+
+FOLL_PIN is internal to gup, meaning that it should not appear at the gup call
+sites. This allows the associated wrapper functions  (pin_user_pages*() and
+others) to set the correct combination of these flags, and to check for problems
+as well.
+
+FOLL_LONGTERM, on the other hand, *is* allowed to be set at the gup call sites.
+This is in order to avoid creating a large number of wrapper functions to cover
+all combinations of get*(), pin*(), FOLL_LONGTERM, and more. Also, the
+pin_user_pages*() APIs are clearly distinct from the get_user_pages*() APIs, so
+that's a natural dividing line, and a good point to make separate wrapper calls.
+In other words, use pin_user_pages*() for DMA-pinned pages, and
+get_user_pages*() for other cases. There are four cases described later on in
+this document, to further clarify that concept.
+
+FOLL_PIN and FOLL_GET are mutually exclusive for a given gup call. However,
+multiple threads and call sites are free to pin the same struct pages, via both
+FOLL_PIN and FOLL_GET. It's just the call site that needs to choose one or the
+other, not the struct page(s).
+
+The FOLL_PIN implementation is nearly the same as FOLL_GET, except that FOLL_PIN
+uses a different reference counting technique.
+
+FOLL_PIN is a prerequisite to FOLL_LONGTERM. Another way of saying that is,
+FOLL_LONGTERM is a specific, more restrictive case of FOLL_PIN.
+
+Which flags are set by each wrapper
+===================================
+
+For these pin_user_pages*() functions, FOLL_PIN is OR'd in with whatever gup
+flags the caller provides. The caller is required to pass in a non-null struct
+pages* array, and the function then pins pages by incrementing each by a special
+value. For now, that value is +1, just like get_user_pages*().::
+
+ Function
+ --------
+ pin_user_pages          FOLL_PIN is always set internally by this function.
+ pin_user_pages_fast     FOLL_PIN is always set internally by this function.
+ pin_user_pages_remote   FOLL_PIN is always set internally by this function.
+
+For these get_user_pages*() functions, FOLL_GET might not even be specified.
+Behavior is a little more complex than above. If FOLL_GET was *not* specified,
+but the caller passed in a non-null struct pages* array, then the function
+sets FOLL_GET for you, and proceeds to pin pages by incrementing the refcount
+of each page by +1.::
+
+ Function
+ --------
+ get_user_pages           FOLL_GET is sometimes set internally by this function.
+ get_user_pages_fast      FOLL_GET is sometimes set internally by this function.
+ get_user_pages_remote    FOLL_GET is sometimes set internally by this function.
+
+Tracking dma-pinned pages
+=========================
+
+Some of the key design constraints, and solutions, for tracking dma-pinned
+pages:
+
+* An actual reference count, per struct page, is required. This is because
+  multiple processes may pin and unpin a page.
+
+* False positives (reporting that a page is dma-pinned, when in fact it is not)
+  are acceptable, but false negatives are not.
+
+* struct page may not be increased in size for this, and all fields are already
+  used.
+
+* Given the above, we can overload the page->_refcount field by using, sort of,
+  the upper bits in that field for a dma-pinned count. "Sort of", means that,
+  rather than dividing page->_refcount into bit fields, we simply add a
+  medium-large value (GUP_PIN_COUNTING_BIAS, initially 1024: 10 bits) to
+  page->_refcount. This provides fuzzy behavior: if a page has get_page() called
+  on it 1024 times, then it will appear to have a single dma-pinned count.
+  And again, that's acceptable.
+
+This also leads to limitations: there are only 31-10==21 bits available for a
+counter that increments 10 bits at a time.
+
+TODO: for 1GB and larger huge pages, this is cutting it close. That's because
+when pin_user_pages() follows such pages, it increments the head page by "1"
+(where "1" used to mean "+1" for get_user_pages(), but now means "+1024" for
+pin_user_pages()) for each tail page. So if you have a 1GB huge page:
+
+* There are 256K (18 bits) worth of 4 KB tail pages.
+* There are 21 bits available to count up via GUP_PIN_COUNTING_BIAS (that is,
+  10 bits at a time)
+* There are 21 - 18 == 3 bits available to count. Except that there aren't,
+  because you need to allow for a few normal get_page() calls on the head page,
+  as well. Fortunately, the approach of using addition, rather than "hard"
+  bitfields, within page->_refcount, allows for sharing these bits gracefully.
+  But we're still looking at about 8 references.
+
+This, however, is a missing feature more than anything else, because it's easily
+solved by addressing an obvious inefficiency in the original get_user_pages()
+approach of retrieving pages: stop treating all the pages as if they were
+PAGE_SIZE. Retrieve huge pages as huge pages. The callers need to be aware of
+this, so some work is required. Once that's in place, this limitation mostly
+disappears from view, because there will be ample refcounting range available.
+
+* Callers must specifically request "dma-pinned tracking of pages". In other
+  words, just calling get_user_pages() will not suffice; a new set of functions,
+  pin_user_pages() and related, must be used.
+
+FOLL_PIN, FOLL_GET, FOLL_LONGTERM: when to use which flags
+==========================================================
+
+Thanks to Jan Kara, Vlastimil Babka and several other -mm people, for describing
+these categories:
+
+CASE 1: Direct IO (DIO)
+-----------------------
+There are GUP references to pages that are serving
+as DIO buffers. These buffers are needed for a relatively short time (so they
+are not "long term"). No special synchronization with page_mkclean() or
+munmap() is provided. Therefore, flags to set at the call site are: ::
+
+    FOLL_PIN
+
+...but rather than setting FOLL_PIN directly, call sites should use one of
+the pin_user_pages*() routines that set FOLL_PIN.
+
+CASE 2: RDMA
+------------
+There are GUP references to pages that are serving as DMA
+buffers. These buffers are needed for a long time ("long term"). No special
+synchronization with page_mkclean() or munmap() is provided. Therefore, flags
+to set at the call site are: ::
+
+    FOLL_PIN | FOLL_LONGTERM
+
+NOTE: Some pages, such as DAX pages, cannot be pinned with longterm pins. That's
+because DAX pages do not have a separate page cache, and so "pinning" implies
+locking down file system blocks, which is not (yet) supported in that way.
+
+CASE 3: Hardware with page faulting support
+-------------------------------------------
+Here, a well-written driver doesn't normally need to pin pages at all. However,
+if the driver does choose to do so, it can register MMU notifiers for the range,
+and will be called back upon invalidation. Either way (avoiding page pinning, or
+using MMU notifiers to unpin upon request), there is proper synchronization with
+both filesystem and mm (page_mkclean(), munmap(), etc).
+
+Therefore, neither flag needs to be set.
+
+In this case, ideally, neither get_user_pages() nor pin_user_pages() should be
+called. Instead, the software should be written so that it does not pin pages.
+This allows mm and filesystems to operate more efficiently and reliably.
+
+CASE 4: Pinning for struct page manipulation only
+-------------------------------------------------
+Here, normal GUP calls are sufficient, so neither flag needs to be set.
+
+page_dma_pinned(): the whole point of pinning
+=============================================
+
+The whole point of marking pages as "DMA-pinned" or "gup-pinned" is to be able
+to query, "is this page DMA-pinned?" That allows code such as page_mkclean()
+(and file system writeback code in general) to make informed decisions about
+what to do when a page cannot be unmapped due to such pins.
+
+What to do in those cases is the subject of a years-long series of discussions
+and debates (see the References at the end of this document). It's a TODO item
+here: fill in the details once that's worked out. Meanwhile, it's safe to say
+that having this available: ::
+
+        static inline bool page_dma_pinned(struct page *page)
+
+...is a prerequisite to solving the long-running gup+DMA problem.
+
+Another way of thinking about FOLL_GET, FOLL_PIN, and FOLL_LONGTERM
+===================================================================
+
+Another way of thinking about these flags is as a progression of restrictions:
+FOLL_GET is for struct page manipulation, without affecting the data that the
+struct page refers to. FOLL_PIN is a *replacement* for FOLL_GET, and is for
+short term pins on pages whose data *will* get accessed. As such, FOLL_PIN is
+a "more severe" form of pinning. And finally, FOLL_LONGTERM is an even more
+restrictive case that has FOLL_PIN as a prerequisite: this is for pages that
+will be pinned longterm, and whose data will be accessed.
+
+Unit testing
+============
+This file::
+
+ tools/testing/selftests/vm/gup_benchmark.c
+
+has the following new calls to exercise the new pin*() wrapper functions:
+
+* PIN_FAST_BENCHMARK (./gup_benchmark -a)
+* PIN_BENCHMARK (./gup_benchmark -b)
+
+You can monitor how many total dma-pinned pages have been acquired and released
+since the system was booted, via two new /proc/vmstat entries: ::
+
+    /proc/vmstat/nr_foll_pin_requested
+    /proc/vmstat/nr_foll_pin_requested
+
+Both entries show zero unless CONFIG_DEBUG_VM is set. That is because updating
+these counters causes a noticeable performance drop in put_user_page(), so they
+are only active in debug builds.
+
+References
+==========
+
+* `Some slow progress on get_user_pages() (Apr 2, 2019) <https://lwn.net/Articles/784574/>`_
+* `DMA and get_user_pages() (LPC: Dec 12, 2018) <https://lwn.net/Articles/774411/>`_
+* `The trouble with get_user_pages() (Apr 30, 2018) <https://lwn.net/Articles/753027/>`_
+
+John Hubbard, October, 2019
--- a/include/linux/mm.h~mm-gup-introduce-pin_user_pages-and-foll_pin
+++ a/include/linux/mm.h
@@ -1042,16 +1042,14 @@ static inline void put_page(struct page
  * put_user_page() - release a gup-pinned page
  * @page:            pointer to page to be released
  *
- * Pages that were pinned via get_user_pages*() must be released via
- * either put_user_page(), or one of the put_user_pages*() routines
- * below. This is so that eventually, pages that are pinned via
- * get_user_pages*() can be separately tracked and uniquely handled. In
- * particular, interactions with RDMA and filesystems need special
- * handling.
+ * Pages that were pinned via pin_user_pages*() must be released via either
+ * put_user_page(), or one of the put_user_pages*() routines. This is so that
+ * eventually such pages can be separately tracked and uniquely handled. In
+ * particular, interactions with RDMA and filesystems need special handling.
  *
  * put_user_page() and put_page() are not interchangeable, despite this early
  * implementation that makes them look the same. put_user_page() calls must
- * be perfectly matched up with get_user_page() calls.
+ * be perfectly matched up with pin*() calls.
  */
 static inline void put_user_page(struct page *page)
 {
@@ -1509,9 +1507,16 @@ long get_user_pages_remote(struct task_s
 			    unsigned long start, unsigned long nr_pages,
 			    unsigned int gup_flags, struct page **pages,
 			    struct vm_area_struct **vmas, int *locked);
+long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+			   unsigned long start, unsigned long nr_pages,
+			   unsigned int gup_flags, struct page **pages,
+			   struct vm_area_struct **vmas, int *locked);
 long get_user_pages(unsigned long start, unsigned long nr_pages,
 			    unsigned int gup_flags, struct page **pages,
 			    struct vm_area_struct **vmas);
+long pin_user_pages(unsigned long start, unsigned long nr_pages,
+		    unsigned int gup_flags, struct page **pages,
+		    struct vm_area_struct **vmas);
 long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
 		    unsigned int gup_flags, struct page **pages, int *locked);
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
@@ -1519,6 +1524,8 @@ long get_user_pages_unlocked(unsigned lo
 
 int get_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages);
+int pin_user_pages_fast(unsigned long start, int nr_pages,
+			unsigned int gup_flags, struct page **pages);
 
 int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
 int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
@@ -2583,13 +2590,15 @@ struct page *follow_page(struct vm_area_
 #define FOLL_ANON	0x8000	/* don't do file mappings */
 #define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
+#define FOLL_PIN	0x40000	/* pages must be released via put_user_page() */
 
 /*
- * NOTE on FOLL_LONGTERM:
+ * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
+ * other. Here is what they mean, and how to use them:
  *
  * FOLL_LONGTERM indicates that the page will be held for an indefinite time
- * period _often_ under userspace control.  This is contrasted with
- * iov_iter_get_pages() where usages which are transient.
+ * period _often_ under userspace control.  This is in contrast to
+ * iov_iter_get_pages(), whose usages are transient.
  *
  * FIXME: For pages which are part of a filesystem, mappings are subject to the
  * lifetime enforced by the filesystem and we need guarantees that longterm
@@ -2604,11 +2613,39 @@ struct page *follow_page(struct vm_area_
  * Currently only get_user_pages() and get_user_pages_fast() support this flag
  * and calls to get_user_pages_[un]locked are specifically not allowed.  This
  * is due to an incompatibility with the FS DAX check and
- * FAULT_FLAG_ALLOW_RETRY
+ * FAULT_FLAG_ALLOW_RETRY.
  *
- * In the CMA case: longterm pins in a CMA region would unnecessarily fragment
- * that region.  And so CMA attempts to migrate the page before pinning when
+ * In the CMA case: long term pins in a CMA region would unnecessarily fragment
+ * that region.  And so, CMA attempts to migrate the page before pinning, when
  * FOLL_LONGTERM is specified.
+ *
+ * FOLL_PIN indicates that a special kind of tracking (not just page->_refcount,
+ * but an additional pin counting system) will be invoked. This is intended for
+ * anything that gets a page reference and then touches page data (for example,
+ * Direct IO). This lets the filesystem know that some non-file-system entity is
+ * potentially changing the pages' data. In contrast to FOLL_GET (whose pages
+ * are released via put_page()), FOLL_PIN pages must be released, ultimately, by
+ * a call to put_user_page().
+ *
+ * FOLL_PIN is similar to FOLL_GET: both of these pin pages. They use different
+ * and separate refcounting mechanisms, however, and that means that each has
+ * its own acquire and release mechanisms:
+ *
+ *     FOLL_GET: get_user_pages*() to acquire, and put_page() to release.
+ *
+ *     FOLL_PIN: pin_user_pages*() to acquire, and put_user_page*() to release.
+ *
+ * FOLL_PIN and FOLL_GET are mutually exclusive for a given function call.
+ * (The underlying pages may experience both FOLL_GET-based and FOLL_PIN-based
+ * calls applied to them, and that's perfectly OK. This is a constraint on the
+ * callers, not on the pages.)
+ *
+ * FOLL_PIN should be set internally by the pin_user_pages*() APIs, never
+ * directly by the caller. That's in order to help avoid mismatches when
+ * releasing pages: get_user_pages*() pages must be released via put_page(),
+ * while pin_user_pages*() pages must be released via put_user_page().
+ *
+ * Please see Documentation/core-api/pin_user_pages.rst for more information.
  */
 
 static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
--- a/mm/gup.c~mm-gup-introduce-pin_user_pages-and-foll_pin
+++ a/mm/gup.c
@@ -194,6 +194,10 @@ static struct page *follow_page_pte(stru
 	spinlock_t *ptl;
 	pte_t *ptep, pte;
 
+	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
+	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
+			 (FOLL_PIN | FOLL_GET)))
+		return ERR_PTR(-EINVAL);
 retry:
 	if (unlikely(pmd_bad(*pmd)))
 		return no_page_table(vma, flags);
@@ -811,7 +815,7 @@ static long __get_user_pages(struct task
 
 	start = untagged_addr(start);
 
-	VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET));
+	VM_BUG_ON(!!pages != !!(gup_flags & (FOLL_GET | FOLL_PIN)));
 
 	/*
 	 * If FOLL_FORCE is set then do not force a full fault as the hinting
@@ -1035,7 +1039,16 @@ static __always_inline long __get_user_p
 		BUG_ON(*locked != 1);
 	}
 
-	if (pages)
+	/*
+	 * FOLL_PIN and FOLL_GET are mutually exclusive. Traditional behavior
+	 * is to set FOLL_GET if the caller wants pages[] filled in (but has
+	 * carelessly failed to specify FOLL_GET), so keep doing that, but only
+	 * for FOLL_GET, not for the newer FOLL_PIN.
+	 *
+	 * FOLL_PIN always expects pages to be non-null, but no need to assert
+	 * that here, as any failures will be obvious enough.
+	 */
+	if (pages && !(flags & FOLL_PIN))
 		flags |= FOLL_GET;
 
 	pages_done = 0;
@@ -1606,12 +1619,20 @@ static __always_inline long __gup_longte
  * should use get_user_pages because it cannot pass
  * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault.
  */
+#ifdef CONFIG_MMU
 long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, unsigned long nr_pages,
 		unsigned int gup_flags, struct page **pages,
 		struct vm_area_struct **vmas, int *locked)
 {
 	/*
+	 * FOLL_PIN must only be set internally by the pin_user_pages*() APIs,
+	 * never directly by the caller, so enforce that with an assertion:
+	 */
+	if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
+		return -EINVAL;
+
+	/*
 	 * Parts of FOLL_LONGTERM behavior are incompatible with
 	 * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
 	 * vmas. However, this only comes up if locked is set, and there are
@@ -1636,6 +1657,16 @@ long get_user_pages_remote(struct task_s
 }
 EXPORT_SYMBOL(get_user_pages_remote);
 
+#else /* CONFIG_MMU */
+long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+			   unsigned long start, unsigned long nr_pages,
+			   unsigned int gup_flags, struct page **pages,
+			   struct vm_area_struct **vmas, int *locked)
+{
+	return 0;
+}
+#endif /* !CONFIG_MMU */
+
 /*
  * This is the same as get_user_pages_remote(), just with a
  * less-flexible calling convention where we assume that the task
@@ -1647,6 +1678,13 @@ long get_user_pages(unsigned long start,
 		unsigned int gup_flags, struct page **pages,
 		struct vm_area_struct **vmas)
 {
+	/*
+	 * FOLL_PIN must only be set internally by the pin_user_pages*() APIs,
+	 * never directly by the caller, so enforce that with an assertion:
+	 */
+	if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
+		return -EINVAL;
+
 	return __gup_longterm_locked(current, current->mm, start, nr_pages,
 				     pages, vmas, gup_flags | FOLL_TOUCH);
 }
@@ -2389,30 +2427,15 @@ static int __gup_longterm_unlocked(unsig
 	return ret;
 }
 
-/**
- * get_user_pages_fast() - pin user pages in memory
- * @start:	starting user address
- * @nr_pages:	number of pages from start to pin
- * @gup_flags:	flags modifying pin behaviour
- * @pages:	array that receives pointers to the pages pinned.
- *		Should be at least nr_pages long.
- *
- * Attempt to pin user pages in memory without taking mm->mmap_sem.
- * If not successful, it will fall back to taking the lock and
- * calling get_user_pages().
- *
- * Returns number of pages pinned. This may be fewer than the number
- * requested. If nr_pages is 0 or negative, returns 0. If no pages
- * were pinned, returns -errno.
- */
-int get_user_pages_fast(unsigned long start, int nr_pages,
-			unsigned int gup_flags, struct page **pages)
+static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
+					unsigned int gup_flags,
+					struct page **pages)
 {
 	unsigned long addr, len, end;
 	int nr = 0, ret = 0;
 
 	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
-				       FOLL_FORCE)))
+				       FOLL_FORCE | FOLL_PIN)))
 		return -EINVAL;
 
 	start = untagged_addr(start) & PAGE_MASK;
@@ -2452,4 +2475,103 @@ int get_user_pages_fast(unsigned long st
 
 	return ret;
 }
+
+/**
+ * get_user_pages_fast() - pin user pages in memory
+ * @start:	starting user address
+ * @nr_pages:	number of pages from start to pin
+ * @gup_flags:	flags modifying pin behaviour
+ * @pages:	array that receives pointers to the pages pinned.
+ *		Should be at least nr_pages long.
+ *
+ * Attempt to pin user pages in memory without taking mm->mmap_sem.
+ * If not successful, it will fall back to taking the lock and
+ * calling get_user_pages().
+ *
+ * Returns number of pages pinned. This may be fewer than the number requested.
+ * If nr_pages is 0 or negative, returns 0. If no pages were pinned, returns
+ * -errno.
+ */
+int get_user_pages_fast(unsigned long start, int nr_pages,
+			unsigned int gup_flags, struct page **pages)
+{
+	/*
+	 * FOLL_PIN must only be set internally by the pin_user_pages*() APIs,
+	 * never directly by the caller, so enforce that:
+	 */
+	if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
+		return -EINVAL;
+
+	return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
+}
 EXPORT_SYMBOL_GPL(get_user_pages_fast);
+
+/**
+ * pin_user_pages_fast() - pin user pages in memory without taking locks
+ *
+ * For now, this is a placeholder function, until various call sites are
+ * converted to use the correct get_user_pages*() or pin_user_pages*() API. So,
+ * this is identical to get_user_pages_fast().
+ *
+ * Intended for Case 1 (DIO) in Documentation/core-api/pin_user_pages.rst. It
+ * is NOT intended for Case 2 (RDMA: long-term pins).
+ */
+int pin_user_pages_fast(unsigned long start, int nr_pages,
+			unsigned int gup_flags, struct page **pages)
+{
+	/*
+	 * This is a placeholder, until the pin functionality is activated.
+	 * Until then, just behave like the corresponding get_user_pages*()
+	 * routine.
+	 */
+	return get_user_pages_fast(start, nr_pages, gup_flags, pages);
+}
+EXPORT_SYMBOL_GPL(pin_user_pages_fast);
+
+/**
+ * pin_user_pages_remote() - pin pages of a remote process (task != current)
+ *
+ * For now, this is a placeholder function, until various call sites are
+ * converted to use the correct get_user_pages*() or pin_user_pages*() API. So,
+ * this is identical to get_user_pages_remote().
+ *
+ * Intended for Case 1 (DIO) in Documentation/core-api/pin_user_pages.rst. It
+ * is NOT intended for Case 2 (RDMA: long-term pins).
+ */
+long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
+			   unsigned long start, unsigned long nr_pages,
+			   unsigned int gup_flags, struct page **pages,
+			   struct vm_area_struct **vmas, int *locked)
+{
+	/*
+	 * This is a placeholder, until the pin functionality is activated.
+	 * Until then, just behave like the corresponding get_user_pages*()
+	 * routine.
+	 */
+	return get_user_pages_remote(tsk, mm, start, nr_pages, gup_flags, pages,
+				     vmas, locked);
+}
+EXPORT_SYMBOL(pin_user_pages_remote);
+
+/**
+ * pin_user_pages() - pin user pages in memory for use by other devices
+ *
+ * For now, this is a placeholder function, until various call sites are
+ * converted to use the correct get_user_pages*() or pin_user_pages*() API. So,
+ * this is identical to get_user_pages().
+ *
+ * Intended for Case 1 (DIO) in Documentation/core-api/pin_user_pages.rst. It
+ * is NOT intended for Case 2 (RDMA: long-term pins).
+ */
+long pin_user_pages(unsigned long start, unsigned long nr_pages,
+		    unsigned int gup_flags, struct page **pages,
+		    struct vm_area_struct **vmas)
+{
+	/*
+	 * This is a placeholder, until the pin functionality is activated.
+	 * Until then, just behave like the corresponding get_user_pages*()
+	 * routine.
+	 */
+	return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+}
+EXPORT_SYMBOL(pin_user_pages);
_
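
The flag rules that this patch adds to mm/gup.c can be sketched as a small
userspace model.  This is only an illustration, not kernel code: the
model_*() names are hypothetical, and the FOLL_GET value is an assumption
(the FOLL_LONGTERM and FOLL_PIN values are taken from the mm.h hunk above).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* FOLL_GET's value is assumed here; the other two come from the patch. */
#define FOLL_GET	0x04u
#define FOLL_LONGTERM	0x10000u
#define FOLL_PIN	0x40000u

/* Models the follow_page_pte() check: both flags together is an error. */
static int model_check_exclusive(unsigned int flags)
{
	if ((flags & (FOLL_PIN | FOLL_GET)) == (FOLL_PIN | FOLL_GET))
		return -EINVAL;
	return 0;
}

/*
 * Models the "pages && !(flags & FOLL_PIN)" fixup: default to FOLL_GET
 * when the caller wants pages[] filled in, but never on top of FOLL_PIN.
 */
static unsigned int model_default_flags(unsigned int flags, bool want_pages)
{
	if (want_pages && !(flags & FOLL_PIN))
		flags |= FOLL_GET;
	return flags;
}

/* Models the public get_user_pages*() entry points rejecting FOLL_PIN. */
static int model_gup_entry(unsigned int gup_flags)
{
	if (gup_flags & FOLL_PIN)
		return -EINVAL;
	return 0;
}
```

In this model, a caller that passes FOLL_PIN directly to a get_user_pages*()
entry point gets -EINVAL, while the internal path leaves FOLL_PIN alone and
only adds FOLL_GET for the traditional callers.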


* [patch 035/118] goldish_pipe: convert to pin_user_pages() and put_user_page()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (33 preceding siblings ...)
  2020-01-31  6:12 ` [patch 034/118] mm/gup: introduce pin_user_pages*() and FOLL_PIN Andrew Morton
@ 2020-01-31  6:12 ` Andrew Morton
  2020-01-31  6:13 ` [patch 036/118] IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP Andrew Morton
                   ` (82 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:12 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: goldish_pipe: convert to pin_user_pages() and put_user_page()

1. Call the new global pin_user_pages_fast() from goldfish_pin_pages().

2. As required by pin_user_pages(), release these pages via
   put_user_page().  In this case, do so via put_user_pages_dirty_lock().

That has the side effect of calling set_page_dirty_lock(), instead of
set_page_dirty().  This is probably more accurate.

As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
dealing with a file backed page where we have reference on the inode it
hangs off." [1]

Another side effect is that the release code is simplified because the
page[] loop is now in gup.c instead of here, so just delete the local
release_user_pages() entirely, and call put_user_pages_dirty_lock()
directly, instead.
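
As a rough userspace sketch of why the local loop can go away: the old
per-driver loop and a shared helper that takes a make_dirty boolean do the
same bookkeeping.  Everything named model_*() below is hypothetical, struct
model_page is a stand-in for struct page, and the boolean third argument is
an assumption based on the three-argument put_user_pages_dirty_lock() call
seen elsewhere in this series.  Locking (set_page_dirty_lock() versus
set_page_dirty()) is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: a refcount and a dirty bit. */
struct model_page {
	int refcount;
	bool dirty;
};

/* Shape of the deleted driver-local loop: maybe dirty, then drop the ref. */
static void model_release_user_pages(struct model_page **pages,
				     int pages_count, int is_write,
				     int consumed_size)
{
	int i;

	for (i = 0; i < pages_count; i++) {
		if (!is_write && consumed_size > 0)
			pages[i]->dirty = true;
		pages[i]->refcount--;
	}
}

/* Shape of a shared helper: the caller passes the dirty decision in. */
static void model_put_pages_dirty(struct model_page **pages, size_t npages,
				  bool make_dirty)
{
	size_t i;

	for (i = 0; i < npages; i++) {
		if (make_dirty)
			pages[i]->dirty = true;
		pages[i]->refcount--;
	}
}

/* Runs both paths on separate pages and compares the resulting state. */
static bool model_paths_agree(int is_write, int consumed_size)
{
	struct model_page a = { 1, false }, b = { 1, false };
	struct model_page *old_set[] = { &a };
	struct model_page *new_set[] = { &b };

	model_release_user_pages(old_set, 1, is_write, consumed_size);
	model_put_pages_dirty(new_set, 1, !is_write && consumed_size > 0);

	return a.dirty == b.dirty && a.refcount == b.refcount;
}
```

The point of the sketch is only that the dirty condition moves from inside
the loop to a single argument at the call site.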

[1] https://lore.kernel.org/r/20190723153640.GB720@lst.de

Link: http://lkml.kernel.org/r/20200107224558.2362728-13-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/platform/goldfish/goldfish_pipe.c |   17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

--- a/drivers/platform/goldfish/goldfish_pipe.c~goldish_pipe-convert-to-pin_user_pages-and-put_user_page
+++ a/drivers/platform/goldfish/goldfish_pipe.c
@@ -274,7 +274,7 @@ static int goldfish_pin_pages(unsigned l
 		*iter_last_page_size = last_page_size;
 	}
 
-	ret = get_user_pages_fast(first_page, requested_pages,
+	ret = pin_user_pages_fast(first_page, requested_pages,
 				  !is_write ? FOLL_WRITE : 0,
 				  pages);
 	if (ret <= 0)
@@ -285,18 +285,6 @@ static int goldfish_pin_pages(unsigned l
 	return ret;
 }
 
-static void release_user_pages(struct page **pages, int pages_count,
-			       int is_write, s32 consumed_size)
-{
-	int i;
-
-	for (i = 0; i < pages_count; i++) {
-		if (!is_write && consumed_size > 0)
-			set_page_dirty(pages[i]);
-		put_page(pages[i]);
-	}
-}


* [patch 036/118] IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
  2020-01-31  6:10 incoming Andrew Morton
                   ` (34 preceding siblings ...)
  2020-01-31  6:12 ` [patch 035/118] goldish_pipe: convert to pin_user_pages() and put_user_page() Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 037/118] mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote() Andrew Morton
                   ` (81 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP

Convert InfiniBand to use the new pin_user_pages*() calls.

Also, revert earlier changes to InfiniBand ODP that had it using
put_user_page().  ODP is "Case 3" in
Documentation/core-api/pin_user_pages.rst, which is to say, normal
get_user_pages() and put_page() are the APIs to use there.

The new pin_user_pages*() calls replace the corresponding get_user_pages*()
calls, and set the FOLL_PIN flag.  FOLL_PIN requires the caller to release
the pages via put_user_page*() calls, which InfiniBand was already doing as
part of an earlier commit.
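
The pairing rule behind this split (pinned umem pages versus ODP pages) can
be modeled in userspace as two independent counters.  This is a sketch of
the intended end state only: the model_*() names and the separate pincount
field are assumptions for illustration, and in this early implementation
put_user_page() and put_page() still behave the same.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model with separate get and pin accounting. */
struct model_page {
	int refcount;	/* stands in for page->_refcount */
	int pincount;	/* assumed separate pin counter */
};

static void model_get_page(struct model_page *p)      { p->refcount++; }
static void model_put_page(struct model_page *p)      { p->refcount--; }
static void model_pin_page(struct model_page *p)      { p->pincount++; }
static void model_put_user_page(struct model_page *p) { p->pincount--; }

static bool model_balanced(const struct model_page *p)
{
	return p->refcount == 0 && p->pincount == 0;
}

/* ODP ("Case 3"): get_user_pages() paired with put_page(). */
static bool model_odp_path(void)
{
	struct model_page p = { 0, 0 };

	model_get_page(&p);
	model_put_page(&p);
	return model_balanced(&p);
}

/* Pinned umem: pin_user_pages*() paired with put_user_page*(). */
static bool model_umem_path(void)
{
	struct model_page p = { 0, 0 };

	model_pin_page(&p);
	model_put_user_page(&p);
	return model_balanced(&p);
}

/* Mixing the two families leaves both counters off by one. */
static bool model_mismatched_path(void)
{
	struct model_page p = { 0, 0 };

	model_get_page(&p);
	model_put_user_page(&p);
	return model_balanced(&p);
}
```

Under these assumptions, only the matched pairs return the page to a
balanced state, which is why the ODP release calls go back to put_page().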

Link: http://lkml.kernel.org/r/20200107224558.2362728-14-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/infiniband/core/umem.c              |    2 +-
 drivers/infiniband/core/umem_odp.c          |   13 ++++++-------
 drivers/infiniband/hw/hfi1/user_pages.c     |    2 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c |    2 +-
 drivers/infiniband/hw/qib/qib_user_pages.c  |    2 +-
 drivers/infiniband/hw/qib/qib_user_sdma.c   |    2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c    |    2 +-
 drivers/infiniband/sw/siw/siw_mem.c         |    2 +-
 8 files changed, 13 insertions(+), 14 deletions(-)

--- a/drivers/infiniband/core/umem.c~ib-corehwumem-set-foll_pin-via-pin_user_pages-fix-up-odp
+++ a/drivers/infiniband/core/umem.c
@@ -257,7 +257,7 @@ struct ib_umem *ib_umem_get(struct ib_de
 	sg = umem->sg_head.sgl;
 
 	while (npages) {
-		ret = get_user_pages_fast(cur_base,
+		ret = pin_user_pages_fast(cur_base,
 					  min_t(unsigned long, npages,
 						PAGE_SIZE /
 						sizeof(struct page *)),
--- a/drivers/infiniband/core/umem_odp.c~ib-corehwumem-set-foll_pin-via-pin_user_pages-fix-up-odp
+++ a/drivers/infiniband/core/umem_odp.c
@@ -293,9 +293,8 @@ EXPORT_SYMBOL(ib_umem_odp_release);
  * The function returns -EFAULT if the DMA mapping operation fails. It returns
  * -EAGAIN if a concurrent invalidation prevents us from updating the page.
  *
- * The page is released via put_user_page even if the operation failed. For
- * on-demand pinning, the page is released whenever it isn't stored in the
- * umem.
+ * The page is released via put_page even if the operation failed. For on-demand
+ * pinning, the page is released whenever it isn't stored in the umem.
  */
 static int ib_umem_odp_map_dma_single_page(
 		struct ib_umem_odp *umem_odp,
@@ -348,7 +347,7 @@ static int ib_umem_odp_map_dma_single_pa
 	}
 
 out:
-	put_user_page(page);
+	put_page(page);
 	return ret;
 }
 
@@ -458,7 +457,7 @@ int ib_umem_odp_map_dma_pages(struct ib_
 					ret = -EFAULT;
 					break;
 				}
-				put_user_page(local_page_list[j]);
+				put_page(local_page_list[j]);
 				continue;
 			}
 
@@ -485,8 +484,8 @@ int ib_umem_odp_map_dma_pages(struct ib_
 			 * ib_umem_odp_map_dma_single_page().
 			 */
 			if (npages - (j + 1) > 0)
-				put_user_pages(&local_page_list[j+1],
-					       npages - (j + 1));
+				release_pages(&local_page_list[j+1],
+					      npages - (j + 1));
 			break;
 		}
 	}
--- a/drivers/infiniband/hw/hfi1/user_pages.c~ib-corehwumem-set-foll_pin-via-pin_user_pages-fix-up-odp
+++ a/drivers/infiniband/hw/hfi1/user_pages.c
@@ -106,7 +106,7 @@ int hfi1_acquire_user_pages(struct mm_st
 	int ret;
 	unsigned int gup_flags = FOLL_LONGTERM | (writable ? FOLL_WRITE : 0);
 
-	ret = get_user_pages_fast(vaddr, npages, gup_flags, pages);
+	ret = pin_user_pages_fast(vaddr, npages, gup_flags, pages);
 	if (ret < 0)
 		return ret;
 
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c~ib-corehwumem-set-foll_pin-via-pin_user_pages-fix-up-odp
+++ a/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -472,7 +472,7 @@ int mthca_map_user_db(struct mthca_dev *
 		goto out;
 	}
 
-	ret = get_user_pages_fast(uaddr & PAGE_MASK, 1,
+	ret = pin_user_pages_fast(uaddr & PAGE_MASK, 1,
 				  FOLL_WRITE | FOLL_LONGTERM, pages);
 	if (ret < 0)
 		goto out;
--- a/drivers/infiniband/hw/qib/qib_user_pages.c~ib-corehwumem-set-foll_pin-via-pin_user_pages-fix-up-odp
+++ a/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -108,7 +108,7 @@ int qib_get_user_pages(unsigned long sta
 
 	down_read(&current->mm->mmap_sem);
 	for (got = 0; got < num_pages; got += ret) {
-		ret = get_user_pages(start_page + got * PAGE_SIZE,
+		ret = pin_user_pages(start_page + got * PAGE_SIZE,
 				     num_pages - got,
 				     FOLL_LONGTERM | FOLL_WRITE | FOLL_FORCE,
 				     p + got, NULL);
--- a/drivers/infiniband/hw/qib/qib_user_sdma.c~ib-corehwumem-set-foll_pin-via-pin_user_pages-fix-up-odp
+++ a/drivers/infiniband/hw/qib/qib_user_sdma.c
@@ -670,7 +670,7 @@ static int qib_user_sdma_pin_pages(const
 		else
 			j = npages;
 
-		ret = get_user_pages_fast(addr, j, FOLL_LONGTERM, pages);
+		ret = pin_user_pages_fast(addr, j, FOLL_LONGTERM, pages);
 		if (ret != j) {
 			i = 0;
 			j = ret;
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c~ib-corehwumem-set-foll_pin-via-pin_user_pages-fix-up-odp
+++ a/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -141,7 +141,7 @@ static int usnic_uiom_get_pages(unsigned
 	ret = 0;
 
 	while (npages) {
-		ret = get_user_pages(cur_base,
+		ret = pin_user_pages(cur_base,
 				     min_t(unsigned long, npages,
 				     PAGE_SIZE / sizeof(struct page *)),
 				     gup_flags | FOLL_LONGTERM,
--- a/drivers/infiniband/sw/siw/siw_mem.c~ib-corehwumem-set-foll_pin-via-pin_user_pages-fix-up-odp
+++ a/drivers/infiniband/sw/siw/siw_mem.c
@@ -426,7 +426,7 @@ struct siw_umem *siw_umem_get(u64 start,
 		while (nents) {
 			struct page **plist = &umem->page_chunk[i].plist[got];
 
-			rv = get_user_pages(first_page_va, nents,
+			rv = pin_user_pages(first_page_va, nents,
 					    foll_flags | FOLL_LONGTERM,
 					    plist, NULL);
 			if (rv < 0)
_


* [patch 037/118] mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (35 preceding siblings ...)
  2020-01-31  6:13 ` [patch 036/118] IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 038/118] drm/via: set FOLL_PIN via pin_user_pages_fast() Andrew Morton
                   ` (80 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()

Convert process_vm_access to use the new pin_user_pages_remote() call,
which sets FOLL_PIN.  Setting FOLL_PIN is now required for code that
requires tracking of pinned pages.

Also, release the pages via put_user_page*().

Also, rename "pages" to "pinned_pages", as this makes for easier reading
of process_vm_rw_single_vec().
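
For readers following the loop in process_vm_rw_single_vec(), the per-batch
byte count derived from pinned_pages can be worked through in a small
userspace sketch.  MODEL_PAGE_SIZE (4 KiB) and model_batch_bytes() are
assumptions made for illustration; the arithmetic mirrors the "bytes = ..."
lines in the diff below, and it presumes pinned_pages > 0 and a start
offset smaller than one page.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Bytes that one batch of pinned pages can cover: the whole pinned span,
 * minus the offset into the first page, capped at the remaining length.
 */
static size_t model_batch_bytes(int pinned_pages, size_t start_offset,
				size_t len)
{
	size_t bytes = (size_t)pinned_pages * MODEL_PAGE_SIZE - start_offset;

	if (bytes > len)
		bytes = len;
	return bytes;
}
```

For example, two pinned pages with a 100-byte starting offset cover 8092
bytes, unless fewer bytes than that remain to be copied.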

Link: http://lkml.kernel.org/r/20200107224558.2362728-15-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/process_vm_access.c |   28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

--- a/mm/process_vm_access.c~mm-process_vm_access-set-foll_pin-via-pin_user_pages_remote
+++ a/mm/process_vm_access.c
@@ -42,12 +42,11 @@ static int process_vm_rw_pages(struct pa
 		if (copy > len)
 			copy = len;
 
-		if (vm_write) {
+		if (vm_write)
 			copied = copy_page_from_iter(page, offset, copy, iter);
-			set_page_dirty_lock(page);
-		} else {
+		else
 			copied = copy_page_to_iter(page, offset, copy, iter);
-		}
+
 		len -= copied;
 		if (copied < copy && iov_iter_count(iter))
 			return -EFAULT;
@@ -96,7 +95,7 @@ static int process_vm_rw_single_vec(unsi
 		flags |= FOLL_WRITE;
 
 	while (!rc && nr_pages && iov_iter_count(iter)) {
-		int pages = min(nr_pages, max_pages_per_loop);
+		int pinned_pages = min(nr_pages, max_pages_per_loop);
 		int locked = 1;
 		size_t bytes;
 
@@ -106,14 +105,15 @@ static int process_vm_rw_single_vec(unsi
 		 * current/current->mm
 		 */
 		down_read(&mm->mmap_sem);
-		pages = get_user_pages_remote(task, mm, pa, pages, flags,
-					      process_pages, NULL, &locked);
+		pinned_pages = pin_user_pages_remote(task, mm, pa, pinned_pages,
+						     flags, process_pages,
+						     NULL, &locked);
 		if (locked)
 			up_read(&mm->mmap_sem);
-		if (pages <= 0)
+		if (pinned_pages <= 0)
 			return -EFAULT;
 
-		bytes = pages * PAGE_SIZE - start_offset;
+		bytes = pinned_pages * PAGE_SIZE - start_offset;
 		if (bytes > len)
 			bytes = len;
 
@@ -122,10 +122,12 @@ static int process_vm_rw_single_vec(unsi
 					 vm_write);
 		len -= bytes;
 		start_offset = 0;
-		nr_pages -= pages;
-		pa += pages * PAGE_SIZE;
-		while (pages)
-			put_page(process_pages[--pages]);
+		nr_pages -= pinned_pages;
+		pa += pinned_pages * PAGE_SIZE;
+
+		/* If vm_write is set, the pages need to be made dirty: */
+		put_user_pages_dirty_lock(process_pages, pinned_pages,
+					  vm_write);
 	}
 
 	return rc;
_


* [patch 038/118] drm/via: set FOLL_PIN via pin_user_pages_fast()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (36 preceding siblings ...)
  2020-01-31  6:13 ` [patch 037/118] mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote() Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 039/118] fs/io_uring: set FOLL_PIN via pin_user_pages() Andrew Morton
                   ` (79 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: drm/via: set FOLL_PIN via pin_user_pages_fast()

Convert drm/via to use the new pin_user_pages_fast() call, which sets
FOLL_PIN.  Setting FOLL_PIN is now required for code that requires
tracking of pinned pages, and therefore for any code that calls
put_user_page().

In partial anticipation of this work, the drm/via driver was already
calling put_user_page() instead of put_page().  Therefore, in order to
convert from the get_user_pages()/put_page() model, to the
pin_user_pages()/put_user_page() model, the only change required is to
change get_user_pages_fast() to pin_user_pages_fast().
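
The flags argument at this call site is not changed by the conversion.  As
a userspace sketch of how it is chosen: when the device writes into user
memory, the pages need write access.  The MODEL_* names are hypothetical
and the FOLL_WRITE value is an assumption; only the selection logic mirrors
the call shown in the diff below.

```c
#include <assert.h>

#define MODEL_FOLL_WRITE 0x01u	/* assumed value, for illustration */

enum model_dma_direction {
	MODEL_DMA_TO_DEVICE,	/* device reads from user memory */
	MODEL_DMA_FROM_DEVICE,	/* device writes into user memory */
};

/* Request write access only when the device will modify the pages. */
static unsigned int model_gup_flags_for(enum model_dma_direction dir)
{
	return dir == MODEL_DMA_FROM_DEVICE ? MODEL_FOLL_WRITE : 0;
}
```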

Link: http://lkml.kernel.org/r/20200107224558.2362728-16-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/gpu/drm/via/via_dmablit.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/via/via_dmablit.c~drm-via-set-foll_pin-via-pin_user_pages_fast
+++ a/drivers/gpu/drm/via/via_dmablit.c
@@ -239,7 +239,7 @@ via_lock_all_dma_pages(drm_via_sg_info_t
 	vsg->pages = vzalloc(array_size(sizeof(struct page *), vsg->num_pages));
 	if (NULL == vsg->pages)
 		return -ENOMEM;
-	ret = get_user_pages_fast((unsigned long)xfer->mem_addr,
+	ret = pin_user_pages_fast((unsigned long)xfer->mem_addr,
 			vsg->num_pages,
 			vsg->direction == DMA_FROM_DEVICE ? FOLL_WRITE : 0,
 			vsg->pages);
_

* [patch 039/118] fs/io_uring: set FOLL_PIN via pin_user_pages()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (37 preceding siblings ...)
  2020-01-31  6:13 ` [patch 038/118] drm/via: set FOLL_PIN via pin_user_pages_fast() Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 040/118] net/xdp: " Andrew Morton
                   ` (78 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: fs/io_uring: set FOLL_PIN via pin_user_pages()

Convert fs/io_uring to use the new pin_user_pages() call, which sets
FOLL_PIN.  Setting FOLL_PIN is now required for code that requires
tracking of pinned pages, and therefore for any code that calls
put_user_page().

In partial anticipation of this work, the io_uring code was already
calling put_user_page() instead of put_page().  Therefore, in order to
convert from the get_user_pages()/put_page() model, to the
pin_user_pages()/put_user_page() model, the only change required here is
to change get_user_pages() to pin_user_pages().

Link: http://lkml.kernel.org/r/20200107224558.2362728-17-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/io_uring.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/io_uring.c~fs-io_uring-set-foll_pin-via-pin_user_pages
+++ a/fs/io_uring.c
@@ -6126,7 +6126,7 @@ static int io_sqe_buffer_register(struct
 
 		ret = 0;
 		down_read(&current->mm->mmap_sem);
-		pret = get_user_pages(ubuf, nr_pages,
+		pret = pin_user_pages(ubuf, nr_pages,
 				      FOLL_WRITE | FOLL_LONGTERM,
 				      pages, vmas);
 		if (pret == nr_pages) {
_

* [patch 040/118] net/xdp: set FOLL_PIN via pin_user_pages()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (38 preceding siblings ...)
  2020-01-31  6:13 ` [patch 039/118] fs/io_uring: set FOLL_PIN via pin_user_pages() Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 041/118] media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page() conversion Andrew Morton
                   ` (77 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: net/xdp: set FOLL_PIN via pin_user_pages()

Convert net/xdp to use the new pin_user_pages() call, which sets
FOLL_PIN.  Setting FOLL_PIN is now required for code that requires
tracking of pinned pages.

In partial anticipation of this work, the net/xdp code was already calling
put_user_page() instead of put_page().  Therefore, in order to convert
from the get_user_pages()/put_page() model, to the
pin_user_pages()/put_user_page() model, the only change required here is
to change get_user_pages() to pin_user_pages().

Link: http://lkml.kernel.org/r/20200107224558.2362728-18-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 net/xdp/xdp_umem.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/xdp/xdp_umem.c~net-xdp-set-foll_pin-via-pin_user_pages
+++ a/net/xdp/xdp_umem.c
@@ -291,7 +291,7 @@ static int xdp_umem_pin_pages(struct xdp
 		return -ENOMEM;
 
 	down_read(&current->mm->mmap_sem);
-	npgs = get_user_pages(umem->address, umem->npgs,
+	npgs = pin_user_pages(umem->address, umem->npgs,
 			      gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
 	up_read(&current->mm->mmap_sem);
 
_

* [patch 041/118] media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page() conversion
  2020-01-31  6:10 incoming Andrew Morton
                   ` (39 preceding siblings ...)
  2020-01-31  6:13 ` [patch 040/118] net/xdp: " Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 042/118] vfio, mm: " Andrew Morton
                   ` (76 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page() conversion

1. Change v4l2 from get_user_pages() to pin_user_pages().

2. Because all FOLL_PIN-acquired pages must be released via
put_user_page(), also convert the put_page() call over to
put_user_pages_dirty_lock().
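
As a rough illustration of item 2, the userspace sketch below contrasts the
open-coded release loop with a single batch call.  It is not kernel code:
struct mock_page, release_open_coded() and mock_put_user_pages_dirty_lock()
are hypothetical stand-ins, and the "dirty" and "refcount" fields only model
what set_page_dirty_lock() and put_page() would do to a real struct page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct page; not the kernel's definition. */
struct mock_page {
	int refcount;
	bool dirty;
};

/* Old pattern: open-coded loop, one dirty + one put per page. */
static void release_open_coded(struct mock_page **pages, size_t npages,
			       bool from_device)
{
	for (size_t i = 0; i < npages; i++) {
		if (from_device)
			pages[i]->dirty = true;	/* models set_page_dirty_lock() */
		pages[i]->refcount--;		/* models put_page() */
	}
}

/* New pattern: one batch helper, modelled on put_user_pages_dirty_lock(). */
static void mock_put_user_pages_dirty_lock(struct mock_page **pages,
					   size_t npages, bool make_dirty)
{
	for (size_t i = 0; i < npages; i++) {
		assert(pages[i]->refcount > 0);	/* must have been pinned */
		if (make_dirty)
			pages[i]->dirty = true;
		pages[i]->refcount--;
	}
}
```

In this model both functions leave the pages in the same state, so the
caller's only visible change is replacing the loop with one call.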

Link: http://lkml.kernel.org/r/20200107224558.2362728-19-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/media/v4l2-core/videobuf-dma-sg.c |   11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

--- a/drivers/media/v4l2-core/videobuf-dma-sg.c~media-v4l2-core-pin_user_pages-foll_pin-and-put_user_page-conversion
+++ a/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -183,12 +183,12 @@ static int videobuf_dma_init_user_locked
 	dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
 		data, size, dma->nr_pages);
 
-	err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+	err = pin_user_pages(data & PAGE_MASK, dma->nr_pages,
 			     flags | FOLL_LONGTERM, dma->pages, NULL);
 
 	if (err != dma->nr_pages) {
 		dma->nr_pages = (err >= 0) ? err : 0;
-		dprintk(1, "get_user_pages: err=%d [%d]\n", err,
+		dprintk(1, "pin_user_pages: err=%d [%d]\n", err,
 			dma->nr_pages);
 		return err < 0 ? err : -EINVAL;
 	}
@@ -349,11 +349,8 @@ int videobuf_dma_free(struct videobuf_dm
 	BUG_ON(dma->sglen);
 
 	if (dma->pages) {
-		for (i = 0; i < dma->nr_pages; i++) {
-			if (dma->direction == DMA_FROM_DEVICE)
-				set_page_dirty_lock(dma->pages[i]);
-			put_page(dma->pages[i]);
-		}
+		put_user_pages_dirty_lock(dma->pages, dma->nr_pages,
+					  dma->direction == DMA_FROM_DEVICE);
 		kfree(dma->pages);
 		dma->pages = NULL;
 	}
_

* [patch 042/118] vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
  2020-01-31  6:10 incoming Andrew Morton
                   ` (40 preceding siblings ...)
  2020-01-31  6:13 ` [patch 041/118] media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page() conversion Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 043/118] powerpc: book3s64: convert to pin_user_pages() and put_user_page() Andrew Morton
                   ` (75 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion

1. Change vfio from get_user_pages_remote() to
   pin_user_pages_remote().

2. Because all FOLL_PIN-acquired pages must be released via
   put_user_page(), also convert the put_page() call over to
   put_user_pages_dirty_lock().

Note that this effectively changes the code's behavior in
vfio_iommu_type1.c: put_pfn(): it now ultimately calls
set_page_dirty_lock(), instead of SetPageDirty().  This is probably more
accurate.

As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
dealing with a file backed page where we have reference on the inode it
hangs off." [1]

[1] https://lore.kernel.org/r/20190723153640.GB720@lst.de

Link: http://lkml.kernel.org/r/20200107224558.2362728-20-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/vfio/vfio_iommu_type1.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/drivers/vfio/vfio_iommu_type1.c~vfio-mm-pin_user_pages-foll_pin-and-put_user_page-conversion
+++ a/drivers/vfio/vfio_iommu_type1.c
@@ -309,9 +309,8 @@ static int put_pfn(unsigned long pfn, in
 {
 	if (!is_invalid_reserved_pfn(pfn)) {
 		struct page *page = pfn_to_page(pfn);
-		if (prot & IOMMU_WRITE)
-			SetPageDirty(page);
-		put_page(page);
+
+		put_user_pages_dirty_lock(&page, 1, prot & IOMMU_WRITE);
 		return 1;
 	}
 	return 0;
@@ -329,7 +328,7 @@ static int vaddr_get_pfn(struct mm_struc
 		flags |= FOLL_WRITE;
 
 	down_read(&mm->mmap_sem);
-	ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
+	ret = pin_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
 				    page, NULL, NULL);
 	if (ret == 1) {
 		*pfn = page_to_pfn(page[0]);
_

* [patch 043/118] powerpc: book3s64: convert to pin_user_pages() and put_user_page()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (41 preceding siblings ...)
  2020-01-31  6:13 ` [patch 042/118] vfio, mm: " Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 044/118] mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding "1" Andrew Morton
                   ` (74 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: powerpc: book3s64: convert to pin_user_pages() and put_user_page()

1. Convert from get_user_pages() to pin_user_pages().

2. As required by pin_user_pages(), release these pages via
put_user_page().

Link: http://lkml.kernel.org/r/20200107224558.2362728-21-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/powerpc/mm/book3s64/iommu_api.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/arch/powerpc/mm/book3s64/iommu_api.c~powerpc-book3s64-convert-to-pin_user_pages-and-put_user_page
+++ a/arch/powerpc/mm/book3s64/iommu_api.c
@@ -103,7 +103,7 @@ static long mm_iommu_do_alloc(struct mm_
 	for (entry = 0; entry < entries; entry += chunk) {
 		unsigned long n = min(entries - entry, chunk);
 
-		ret = get_user_pages(ua + (entry << PAGE_SHIFT), n,
+		ret = pin_user_pages(ua + (entry << PAGE_SHIFT), n,
 				FOLL_WRITE | FOLL_LONGTERM,
 				mem->hpages + entry, NULL);
 		if (ret == n) {
@@ -167,9 +167,8 @@ good_exit:
 	return 0;
 
 free_exit:
-	/* free the reference taken */
-	for (i = 0; i < pinned; i++)
-		put_page(mem->hpages[i]);
+	/* free the references taken */
+	put_user_pages(mem->hpages, pinned);
 
 	vfree(mem->hpas);
 	kfree(mem);
@@ -215,7 +214,8 @@ static void mm_iommu_unpin(struct mm_iom
 		if (mem->hpas[i] & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY)
 			SetPageDirty(page);
 
-		put_page(page);
+		put_user_page(page);
+
 		mem->hpas[i] = 0;
 	}
 }
_

* [patch 044/118] mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding "1"
  2020-01-31  6:10 incoming Andrew Morton
                   ` (42 preceding siblings ...)
  2020-01-31  6:13 ` [patch 043/118] powerpc: book3s64: convert to pin_user_pages() and put_user_page() Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 045/118] mm, tree-wide: rename put_user_page*() to unpin_user_page*() Andrew Morton
                   ` (73 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding "1"

Fix the gup benchmark flags to use the symbolic FOLL_WRITE, instead of a
hard-coded "1" value.

Also, clean up the filtering of gup flags a little, by just doing it once
before issuing any of the get_user_pages*() calls.  This makes it harder
to overlook, instead of having little "gup_flags & 1" phrases in the
function calls.
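
The filter-once idea can be sketched in plain userspace C as follows.  This
is only an illustration under stated assumptions: filter_gup_flags() and
longterm_gup_flags() are hypothetical helper names that do not exist in the
patch, and the flag values are the ones quoted from mm.h in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as quoted from include/linux/mm.h in this series. */
#define FOLL_WRITE	0x01u		/* check pte is writable */
#define FOLL_LONGTERM	0x10000u	/* mapping lifetime is indefinite */

/* Hypothetical helper: keep only the flags userspace may request. */
static uint32_t filter_gup_flags(uint32_t user_flags)
{
	return user_flags & FOLL_WRITE;
}

/* Hypothetical helper: flags the long-term benchmark case would pass on. */
static uint32_t longterm_gup_flags(uint32_t user_flags)
{
	return filter_gup_flags(user_flags) | FOLL_LONGTERM;
}
```

Masking once up front means each call site passes the already-filtered value
and no call can accidentally forward an unexpected flag from userspace.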

Link: http://lkml.kernel.org/r/20200107224558.2362728-22-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/gup_benchmark.c                         |    9 ++++++---
 tools/testing/selftests/vm/gup_benchmark.c |    6 +++++-
 2 files changed, 11 insertions(+), 4 deletions(-)

--- a/mm/gup_benchmark.c~mm-gup_benchmark-use-proper-foll_write-flags-instead-of-hard-coding-1
+++ a/mm/gup_benchmark.c
@@ -49,18 +49,21 @@ static int __gup_benchmark_ioctl(unsigne
 			nr = (next - addr) / PAGE_SIZE;
 		}
 
+		/* Filter out most gup flags: only allow a tiny subset here: */
+		gup->flags &= FOLL_WRITE;
+
 		switch (cmd) {
 		case GUP_FAST_BENCHMARK:
-			nr = get_user_pages_fast(addr, nr, gup->flags & 1,
+			nr = get_user_pages_fast(addr, nr, gup->flags,
 						 pages + i);
 			break;
 		case GUP_LONGTERM_BENCHMARK:
 			nr = get_user_pages(addr, nr,
-					    (gup->flags & 1) | FOLL_LONGTERM,
+					    gup->flags | FOLL_LONGTERM,
 					    pages + i, NULL);
 			break;
 		case GUP_BENCHMARK:
-			nr = get_user_pages(addr, nr, gup->flags & 1, pages + i,
+			nr = get_user_pages(addr, nr, gup->flags, pages + i,
 					    NULL);
 			break;
 		default:
--- a/tools/testing/selftests/vm/gup_benchmark.c~mm-gup_benchmark-use-proper-foll_write-flags-instead-of-hard-coding-1
+++ a/tools/testing/selftests/vm/gup_benchmark.c
@@ -18,6 +18,9 @@
 #define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
 #define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
 
+/* Just the flags we need, copied from mm.h: */
+#define FOLL_WRITE	0x01	/* check pte is writable */
+
 struct gup_benchmark {
 	__u64 get_delta_usec;
 	__u64 put_delta_usec;
@@ -85,7 +88,8 @@ int main(int argc, char **argv)
 	}
 
 	gup.nr_pages_per_call = nr_pages;
-	gup.flags = write;
+	if (write)
+		gup.flags |= FOLL_WRITE;
 
 	fd = open("/sys/kernel/debug/gup_benchmark", O_RDWR);
 	if (fd == -1)
_

* [patch 045/118] mm, tree-wide: rename put_user_page*() to unpin_user_page*()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (43 preceding siblings ...)
  2020-01-31  6:13 ` [patch 044/118] mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding "1" Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 046/118] mm/swapfile.c: swap_next should increase position index Andrew Morton
                   ` (72 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, alex.williamson, aneesh.kumar, axboe, bjorn.topel, corbet,
	dan.j.williams, daniel.vetter, hch, hverkuil-cisco, ira.weiny,
	jack, jgg, jgg, jglisse, jhubbard, kirill, leonro, linux-mm,
	mchehab, mm-commits, rppt, torvalds

From: John Hubbard <jhubbard@nvidia.com>
Subject: mm, tree-wide: rename put_user_page*() to unpin_user_page*()

Rename put_user_page*() to unpin_user_page*() in order to provide a
clearer, more symmetric API for pinning and unpinning DMA pages.  This way,
pin_user_pages*() calls match up with unpin_user_pages*() calls, and the
API is a lot closer to being self-explanatory.
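
A minimal userspace sketch of the pairing rule, assuming a simple per-page
pin counter: the mock_* names and struct mock_page are hypothetical and are
not the kernel implementation (at this point in the series the real
unpin_user_page() is simply a wrapper around put_page()).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct page with a pin counter. */
struct mock_page {
	int pincount;
};

/* Models pin_user_pages(): take one pin per page, return pages pinned. */
static long mock_pin_user_pages(struct mock_page **pages, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		pages[i]->pincount++;
	return (long)npages;
}

/* Models unpin_user_page(): drop exactly one pin. */
static void mock_unpin_user_page(struct mock_page *page)
{
	assert(page->pincount > 0);	/* unpin without a pin is a bug */
	page->pincount--;
}

/* Models unpin_user_pages(): drop one pin on each page in the array. */
static void mock_unpin_user_pages(struct mock_page **pages, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		mock_unpin_user_page(pages[i]);
}
```

With matching names, a reviewer can check that every pin call has a
corresponding unpin call by searching for the two prefixes.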

Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/core-api/pin_user_pages.rst   |    2 -
 arch/powerpc/mm/book3s64/iommu_api.c        |    4 +-
 drivers/gpu/drm/via/via_dmablit.c           |    4 +-
 drivers/infiniband/core/umem.c              |    2 -
 drivers/infiniband/hw/hfi1/user_pages.c     |    2 -
 drivers/infiniband/hw/mthca/mthca_memfree.c |    6 +--
 drivers/infiniband/hw/qib/qib_user_pages.c  |    2 -
 drivers/infiniband/hw/qib/qib_user_sdma.c   |    6 +--
 drivers/infiniband/hw/usnic/usnic_uiom.c    |    2 -
 drivers/infiniband/sw/siw/siw_mem.c         |    2 -
 drivers/media/v4l2-core/videobuf-dma-sg.c   |    4 +-
 drivers/platform/goldfish/goldfish_pipe.c   |    4 +-
 drivers/vfio/vfio_iommu_type1.c             |    2 -
 fs/io_uring.c                               |    4 +-
 include/linux/mm.h                          |   26 +++++++-------
 mm/gup.c                                    |   32 +++++++++---------
 mm/process_vm_access.c                      |    4 +-
 net/xdp/xdp_umem.c                          |    2 -
 18 files changed, 55 insertions(+), 55 deletions(-)

--- a/arch/powerpc/mm/book3s64/iommu_api.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/arch/powerpc/mm/book3s64/iommu_api.c
@@ -168,7 +168,7 @@ good_exit:
 
 free_exit:
 	/* free the references taken */
-	put_user_pages(mem->hpages, pinned);
+	unpin_user_pages(mem->hpages, pinned);
 
 	vfree(mem->hpas);
 	kfree(mem);
@@ -214,7 +214,7 @@ static void mm_iommu_unpin(struct mm_iom
 		if (mem->hpas[i] & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY)
 			SetPageDirty(page);
 
-		put_user_page(page);
+		unpin_user_page(page);
 
 		mem->hpas[i] = 0;
 	}
--- a/Documentation/core-api/pin_user_pages.rst~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/Documentation/core-api/pin_user_pages.rst
@@ -219,7 +219,7 @@ since the system was booted, via two new
     /proc/vmstat/nr_foll_pin_requested
 
 Those are both going to show zero, unless CONFIG_DEBUG_VM is set. This is
-because there is a noticeable performance drop in put_user_page(), when they
+because there is a noticeable performance drop in unpin_user_page(), when they
 are activated.
 
 References
--- a/drivers/gpu/drm/via/via_dmablit.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/drivers/gpu/drm/via/via_dmablit.c
@@ -188,8 +188,8 @@ via_free_sg_info(struct pci_dev *pdev, d
 		kfree(vsg->desc_pages);
 		/* fall through */
 	case dr_via_pages_locked:
-		put_user_pages_dirty_lock(vsg->pages, vsg->num_pages,
-					  (vsg->direction == DMA_FROM_DEVICE));
+		unpin_user_pages_dirty_lock(vsg->pages, vsg->num_pages,
+					   (vsg->direction == DMA_FROM_DEVICE));
 		/* fall through */
 	case dr_via_pages_alloc:
 		vfree(vsg->pages);
--- a/drivers/infiniband/core/umem.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/drivers/infiniband/core/umem.c
@@ -54,7 +54,7 @@ static void __ib_umem_release(struct ib_
 
 	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
 		page = sg_page_iter_page(&sg_iter);
-		put_user_pages_dirty_lock(&page, 1, umem->writable && dirty);
+		unpin_user_pages_dirty_lock(&page, 1, umem->writable && dirty);
 	}
 
 	sg_free_table(&umem->sg_head);
--- a/drivers/infiniband/hw/hfi1/user_pages.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/drivers/infiniband/hw/hfi1/user_pages.c
@@ -118,7 +118,7 @@ int hfi1_acquire_user_pages(struct mm_st
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 			     size_t npages, bool dirty)
 {
-	put_user_pages_dirty_lock(p, npages, dirty);
+	unpin_user_pages_dirty_lock(p, npages, dirty);
 
 	if (mm) { /* during close after signal, mm can be NULL */
 		atomic64_sub(npages, &mm->pinned_vm);
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -482,7 +482,7 @@ int mthca_map_user_db(struct mthca_dev *
 
 	ret = pci_map_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
 	if (ret < 0) {
-		put_user_page(pages[0]);
+		unpin_user_page(pages[0]);
 		goto out;
 	}
 
@@ -490,7 +490,7 @@ int mthca_map_user_db(struct mthca_dev *
 				 mthca_uarc_virt(dev, uar, i));
 	if (ret) {
 		pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
-		put_user_page(sg_page(&db_tab->page[i].mem));
+		unpin_user_page(sg_page(&db_tab->page[i].mem));
 		goto out;
 	}
 
@@ -556,7 +556,7 @@ void mthca_cleanup_user_db_tab(struct mt
 		if (db_tab->page[i].uvirt) {
 			mthca_UNMAP_ICM(dev, mthca_uarc_virt(dev, uar, i), 1);
 			pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
-			put_user_page(sg_page(&db_tab->page[i].mem));
+			unpin_user_page(sg_page(&db_tab->page[i].mem));
 		}
 	}
 
--- a/drivers/infiniband/hw/qib/qib_user_pages.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -40,7 +40,7 @@
 static void __qib_release_user_pages(struct page **p, size_t num_pages,
 				     int dirty)
 {
-	put_user_pages_dirty_lock(p, num_pages, dirty);
+	unpin_user_pages_dirty_lock(p, num_pages, dirty);
 }
 
 /**
--- a/drivers/infiniband/hw/qib/qib_user_sdma.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/drivers/infiniband/hw/qib/qib_user_sdma.c
@@ -317,7 +317,7 @@ static int qib_user_sdma_page_to_frags(c
 		 * the caller can ignore this page.
 		 */
 		if (put) {
-			put_user_page(page);
+			unpin_user_page(page);
 		} else {
 			/* coalesce case */
 			kunmap(page);
@@ -631,7 +631,7 @@ static void qib_user_sdma_free_pkt_frag(
 			kunmap(pkt->addr[i].page);
 
 		if (pkt->addr[i].put_page)
-			put_user_page(pkt->addr[i].page);
+			unpin_user_page(pkt->addr[i].page);
 		else
 			__free_page(pkt->addr[i].page);
 	} else if (pkt->addr[i].kvaddr) {
@@ -706,7 +706,7 @@ static int qib_user_sdma_pin_pages(const
 	/* if error, return all pages not managed by pkt */
 free_pages:
 	while (i < j)
-		put_user_page(pages[i++]);
+		unpin_user_page(pages[i++]);
 
 done:
 	return ret;
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -75,7 +75,7 @@ static void usnic_uiom_put_pages(struct
 		for_each_sg(chunk->page_list, sg, chunk->nents, i) {
 			page = sg_page(sg);
 			pa = sg_phys(sg);
-			put_user_pages_dirty_lock(&page, 1, dirty);
+			unpin_user_pages_dirty_lock(&page, 1, dirty);
 			usnic_dbg("pa: %pa\n", &pa);
 		}
 		kfree(chunk);
--- a/drivers/infiniband/sw/siw/siw_mem.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/drivers/infiniband/sw/siw/siw_mem.c
@@ -63,7 +63,7 @@ struct siw_mem *siw_mem_id2obj(struct si
 static void siw_free_plist(struct siw_page_chunk *chunk, int num_pages,
 			   bool dirty)
 {
-	put_user_pages_dirty_lock(chunk->plist, num_pages, dirty);
+	unpin_user_pages_dirty_lock(chunk->plist, num_pages, dirty);
 }
 
 void siw_umem_release(struct siw_umem *umem, bool dirty)
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -349,8 +349,8 @@ int videobuf_dma_free(struct videobuf_dm
 	BUG_ON(dma->sglen);
 
 	if (dma->pages) {
-		put_user_pages_dirty_lock(dma->pages, dma->nr_pages,
-					  dma->direction == DMA_FROM_DEVICE);
+		unpin_user_pages_dirty_lock(dma->pages, dma->nr_pages,
+					    dma->direction == DMA_FROM_DEVICE);
 		kfree(dma->pages);
 		dma->pages = NULL;
 	}
--- a/drivers/platform/goldfish/goldfish_pipe.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/drivers/platform/goldfish/goldfish_pipe.c
@@ -360,8 +360,8 @@ static int transfer_max_buffers(struct g
 
 	*consumed_size = pipe->command_buffer->rw_params.consumed_size;
 
-	put_user_pages_dirty_lock(pipe->pages, pages_count,
-				  !is_write && *consumed_size > 0);
+	unpin_user_pages_dirty_lock(pipe->pages, pages_count,
+				    !is_write && *consumed_size > 0);
 
 	mutex_unlock(&pipe->lock);
 	return 0;
--- a/drivers/vfio/vfio_iommu_type1.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/drivers/vfio/vfio_iommu_type1.c
@@ -310,7 +310,7 @@ static int put_pfn(unsigned long pfn, in
 	if (!is_invalid_reserved_pfn(pfn)) {
 		struct page *page = pfn_to_page(pfn);
 
-		put_user_pages_dirty_lock(&page, 1, prot & IOMMU_WRITE);
+		unpin_user_pages_dirty_lock(&page, 1, prot & IOMMU_WRITE);
 		return 1;
 	}
 	return 0;
--- a/fs/io_uring.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/fs/io_uring.c
@@ -6005,7 +6005,7 @@ static int io_sqe_buffer_unregister(stru
 		struct io_mapped_ubuf *imu = &ctx->user_bufs[i];
 
 		for (j = 0; j < imu->nr_bvecs; j++)
-			put_user_page(imu->bvec[j].bv_page);
+			unpin_user_page(imu->bvec[j].bv_page);
 
 		if (ctx->account_mem)
 			io_unaccount_mem(ctx->user, imu->nr_bvecs);
@@ -6150,7 +6150,7 @@ static int io_sqe_buffer_register(struct
 			 * release any pages we did get
 			 */
 			if (pret > 0)
-				put_user_pages(pages, pret);
+				unpin_user_pages(pages, pret);
 			if (ctx->account_mem)
 				io_unaccount_mem(ctx->user, nr_pages);
 			kvfree(imu->bvec);
--- a/include/linux/mm.h~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/include/linux/mm.h
@@ -1039,27 +1039,27 @@ static inline void put_page(struct page
 }
 
 /**
- * put_user_page() - release a gup-pinned page
+ * unpin_user_page() - release a gup-pinned page
  * @page:            pointer to page to be released
  *
  * Pages that were pinned via pin_user_pages*() must be released via either
- * put_user_page(), or one of the put_user_pages*() routines. This is so that
- * eventually such pages can be separately tracked and uniquely handled. In
+ * unpin_user_page(), or one of the unpin_user_pages*() routines. This is so
+ * that eventually such pages can be separately tracked and uniquely handled. In
  * particular, interactions with RDMA and filesystems need special handling.
  *
- * put_user_page() and put_page() are not interchangeable, despite this early
- * implementation that makes them look the same. put_user_page() calls must
+ * unpin_user_page() and put_page() are not interchangeable, despite this early
+ * implementation that makes them look the same. unpin_user_page() calls must
  * be perfectly matched up with pin*() calls.
  */
-static inline void put_user_page(struct page *page)
+static inline void unpin_user_page(struct page *page)
 {
 	put_page(page);
 }
 
-void put_user_pages_dirty_lock(struct page **pages, unsigned long npages,
-			       bool make_dirty);
+void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
+				 bool make_dirty);
 
-void put_user_pages(struct page **pages, unsigned long npages);
+void unpin_user_pages(struct page **pages, unsigned long npages);
 
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
 #define SECTION_IN_PAGE_FLAGS
@@ -2590,7 +2590,7 @@ struct page *follow_page(struct vm_area_
 #define FOLL_ANON	0x8000	/* don't do file mappings */
 #define FOLL_LONGTERM	0x10000	/* mapping lifetime is indefinite: see below */
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
-#define FOLL_PIN	0x40000	/* pages must be released via put_user_page() */
+#define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
@@ -2625,7 +2625,7 @@ struct page *follow_page(struct vm_area_
  * Direct IO). This lets the filesystem know that some non-file-system entity is
  * potentially changing the pages' data. In contrast to FOLL_GET (whose pages
  * are released via put_page()), FOLL_PIN pages must be released, ultimately, by
- * a call to put_user_page().
+ * a call to unpin_user_page().
  *
  * FOLL_PIN is similar to FOLL_GET: both of these pin pages. They use different
  * and separate refcounting mechanisms, however, and that means that each has
@@ -2633,7 +2633,7 @@ struct page *follow_page(struct vm_area_
  *
  *     FOLL_GET: get_user_pages*() to acquire, and put_page() to release.
  *
- *     FOLL_PIN: pin_user_pages*() to acquire, and put_user_pages to release.
+ *     FOLL_PIN: pin_user_pages*() to acquire, and unpin_user_pages to release.
  *
  * FOLL_PIN and FOLL_GET are mutually exclusive for a given function call.
  * (The underlying pages may experience both FOLL_GET-based and FOLL_PIN-based
@@ -2643,7 +2643,7 @@ struct page *follow_page(struct vm_area_
  * FOLL_PIN should be set internally by the pin_user_pages*() APIs, never
  * directly by the caller. That's in order to help avoid mismatches when
  * releasing pages: get_user_pages*() pages must be released via put_page(),
- * while pin_user_pages*() pages must be released via put_user_page().
+ * while pin_user_pages*() pages must be released via unpin_user_page().
  *
  * Please see Documentation/vm/pin_user_pages.rst for more information.
  */
--- a/mm/gup.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/mm/gup.c
@@ -45,7 +45,7 @@ static inline struct page *try_get_compo
 }
 
 /**
- * put_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
+ * unpin_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
  * @pages:  array of pages to be maybe marked dirty, and definitely released.
  * @npages: number of pages in the @pages array.
  * @make_dirty: whether to mark the pages dirty
@@ -55,19 +55,19 @@ static inline struct page *try_get_compo
  *
  * For each page in the @pages array, make that page (or its head page, if a
  * compound page) dirty, if @make_dirty is true, and if the page was previously
- * listed as clean. In any case, releases all pages using put_user_page(),
- * possibly via put_user_pages(), for the non-dirty case.
+ * listed as clean. In any case, releases all pages using unpin_user_page(),
+ * possibly via unpin_user_pages(), for the non-dirty case.
  *
- * Please see the put_user_page() documentation for details.
+ * Please see the unpin_user_page() documentation for details.
  *
  * set_page_dirty_lock() is used internally. If instead, set_page_dirty() is
  * required, then the caller should a) verify that this is really correct,
  * because _lock() is usually required, and b) hand code it:
- * set_page_dirty_lock(), put_user_page().
+ * set_page_dirty_lock(), unpin_user_page().
  *
  */
-void put_user_pages_dirty_lock(struct page **pages, unsigned long npages,
-			       bool make_dirty)
+void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
+				 bool make_dirty)
 {
 	unsigned long index;
 
@@ -78,7 +78,7 @@ void put_user_pages_dirty_lock(struct pa
 	 */
 
 	if (!make_dirty) {
-		put_user_pages(pages, npages);
+		unpin_user_pages(pages, npages);
 		return;
 	}
 
@@ -106,21 +106,21 @@ void put_user_pages_dirty_lock(struct pa
 		 */
 		if (!PageDirty(page))
 			set_page_dirty_lock(page);
-		put_user_page(page);
+		unpin_user_page(page);
 	}
 }
-EXPORT_SYMBOL(put_user_pages_dirty_lock);
+EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
 
 /**
- * put_user_pages() - release an array of gup-pinned pages.
+ * unpin_user_pages() - release an array of gup-pinned pages.
  * @pages:  array of pages to be marked dirty and released.
  * @npages: number of pages in the @pages array.
  *
- * For each page in the @pages array, release the page using put_user_page().
+ * For each page in the @pages array, release the page using unpin_user_page().
  *
- * Please see the put_user_page() documentation for details.
+ * Please see the unpin_user_page() documentation for details.
  */
-void put_user_pages(struct page **pages, unsigned long npages)
+void unpin_user_pages(struct page **pages, unsigned long npages)
 {
 	unsigned long index;
 
@@ -130,9 +130,9 @@ void put_user_pages(struct page **pages,
 	 * single operation to the head page should suffice.
 	 */
 	for (index = 0; index < npages; index++)
-		put_user_page(pages[index]);
+		unpin_user_page(pages[index]);
 }
-EXPORT_SYMBOL(put_user_pages);
+EXPORT_SYMBOL(unpin_user_pages);
 
 #ifdef CONFIG_MMU
 static struct page *no_page_table(struct vm_area_struct *vma,
--- a/mm/process_vm_access.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/mm/process_vm_access.c
@@ -126,8 +126,8 @@ static int process_vm_rw_single_vec(unsi
 		pa += pinned_pages * PAGE_SIZE;
 
 		/* If vm_write is set, the pages need to be made dirty: */
-		put_user_pages_dirty_lock(process_pages, pinned_pages,
-					  vm_write);
+		unpin_user_pages_dirty_lock(process_pages, pinned_pages,
+					    vm_write);
 	}
 
 	return rc;
--- a/net/xdp/xdp_umem.c~mm-tree-wide-rename-put_user_page-to-unpin_user_page
+++ a/net/xdp/xdp_umem.c
@@ -212,7 +212,7 @@ static int xdp_umem_map_pages(struct xdp
 
 static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
-	put_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
+	unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
 
 	kfree(umem->pgs);
 	umem->pgs = NULL;
_


* [patch 046/118] mm/swapfile.c: swap_next should increase position index
  2020-01-31  6:10 incoming Andrew Morton
                   ` (44 preceding siblings ...)
  2020-01-31  6:13 ` [patch 045/118] mm, tree-wide: rename put_user_page*() to unpin_user_page*() Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 047/118] mm/memcontrol.c: cleanup some useless code Andrew Morton
                   ` (71 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, hughd, jannh, keescook, linux-mm, mm-commits, torvalds, viro, vvs

From: Vasily Averin <vvs@virtuozzo.com>
Subject: mm/swapfile.c: swap_next should increase position index

If a seq_file .next function does not change the position index, a read
after some lseek can generate unexpected output.

In Aug 2018 NeilBrown noted in commit 1f4aace60b0e ("fs/seq_file.c:
simplify seq_file iteration code and interface"): "Some ->next functions do
not increment *pos when they return NULL...  Note that such ->next
functions are buggy and should be fixed.  A simple demonstration is

dd if=/proc/swaps bs=1000 skip=1

Choose any block size larger than the size of /proc/swaps.  This will
always show the whole last line of /proc/swaps"

The problem described is still present.  If you lseek into the middle of
the last output line, the following read will output the end of the last
line and then the whole last line once again.

$ dd if=/proc/swaps bs=1  # usual output
Filename				Type		Size	Used	Priority
/dev/dm-0                               partition	4194812	97536	-2
104+0 records in
104+0 records out
104 bytes copied

$ dd if=/proc/swaps bs=40 skip=1    # last line was generated twice
dd: /proc/swaps: cannot skip to specified offset
v/dm-0                               partition	4194812	97536	-2
/dev/dm-0                               partition	4194812	97536	-2
3+1 records in
3+1 records out
131 bytes copied
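
The contract can be illustrated outside the kernel.  The following is a
minimal userspace sketch, not the seq_file code: toy_start(), the two
toy_next_*() variants and the entries[] table are hypothetical names made
up for this illustration, and the real ->next also takes the seq_file and
the current element.  It only shows why leaving *pos untouched on the NULL
return lets a restarted iteration emit the last entry a second time.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ENTRIES 3

static const char *entries[NR_ENTRIES] = { "swap0", "swap1", "swap2" };

/* start(): return the entry at *pos, or NULL once *pos is past the end. */
static const char *toy_start(long *pos)
{
	return (*pos >= 0 && *pos < NR_ENTRIES) ? entries[*pos] : NULL;
}

/* Buggy next(): advances *pos only when another entry is found. */
static const char *toy_next_buggy(long *pos)
{
	if (*pos + 1 < NR_ENTRIES) {
		++*pos;
		return entries[*pos];
	}
	return NULL;	/* *pos still names the last entry */
}

/* Fixed next(): always advances *pos, even when returning NULL. */
static const char *toy_next_fixed(long *pos)
{
	++*pos;
	return toy_start(pos);
}

/*
 * Walk all entries once, then restart the iteration at the position the
 * first walk left behind and count how many entries come out again.
 */
static int entries_after_restart(const char *(*next)(long *))
{
	long pos = 0;
	int emitted = 0;
	const char *v;

	for (v = toy_start(&pos); v; v = next(&pos))
		;	/* first pass: consume everything */
	for (v = toy_start(&pos); v; v = next(&pos))
		emitted++;
	return emitted;
}

/* Usage: the buggy variant repeats one entry, the fixed one repeats none. */
void toy_seq_demo(void)
{
	assert(entries_after_restart(toy_next_buggy) == 1);
	assert(entries_after_restart(toy_next_fixed) == 0);
}
```

In this toy model the repeated entry plays the role of the duplicated last
line of /proc/swaps shown above.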

https://bugzilla.kernel.org/show_bug.cgi?id=206283

Link: http://lkml.kernel.org/r/bd8cfd7b-ac95-9b91-f9e7-e8438bd5047d@virtuozzo.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jann Horn <jannh@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/swapfile.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/swapfile.c~swap_next-should-increase-position-index
+++ a/mm/swapfile.c
@@ -2737,10 +2737,10 @@ static void *swap_next(struct seq_file *
 	else
 		type = si->type + 1;
 
+	++(*pos);
 	for (; (si = swap_type_to_swap_info(type)); type++) {
 		if (!(si->flags & SWP_USED) || !si->swap_map)
 			continue;
-		++*pos;
 		return si;
 	}
 
_


* [patch 047/118] mm/memcontrol.c: cleanup some useless code
  2020-01-31  6:10 incoming Andrew Morton
                   ` (45 preceding siblings ...)
  2020-01-31  6:13 ` [patch 046/118] mm/swapfile.c: swap_next should increase position index Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 048/118] mm/page_vma_mapped.c: explicitly compare pfn for normal, hugetlbfs and THP page Andrew Morton
                   ` (70 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, hannes, linux-mm, mhocko, mm-commits, pilgrimtao, torvalds,
	vdavydov.dev

From: Kaitao Cheng <pilgrimtao@gmail.com>
Subject: mm/memcontrol.c: cleanup some useless code

Compound page handling in mem_cgroup_migrate() is more convoluted than
necessary.  The state is duplicated in the compound variable, although the
same can be achieved by a trivial PageTransHuge check, and hpage_nr_pages
is already PageTransHuge aware.

It is much simpler to just use hpage_nr_pages for nr_pages and replace the
local variable with a direct PageTransHuge check.

Link: http://lkml.kernel.org/r/20191210160450.3395-1-pilgrimtao@gmail.com
Signed-off-by: Kaitao Cheng <pilgrimtao@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/mm/memcontrol.c~mm-cleanup-some-useless-code
+++ a/mm/memcontrol.c
@@ -6633,7 +6633,6 @@ void mem_cgroup_migrate(struct page *old
 {
 	struct mem_cgroup *memcg;
 	unsigned int nr_pages;
-	bool compound;
 	unsigned long flags;
 
 	VM_BUG_ON_PAGE(!PageLocked(oldpage), oldpage);
@@ -6655,8 +6654,7 @@ void mem_cgroup_migrate(struct page *old
 		return;
 
 	/* Force-charge the new page. The old one will be freed soon */
-	compound = PageTransHuge(newpage);
-	nr_pages = compound ? hpage_nr_pages(newpage) : 1;
+	nr_pages = hpage_nr_pages(newpage);
 
 	page_counter_charge(&memcg->memory, nr_pages);
 	if (do_memsw_account())
@@ -6666,7 +6664,8 @@ void mem_cgroup_migrate(struct page *old
 	commit_charge(newpage, memcg, false);
 
 	local_irq_save(flags);
-	mem_cgroup_charge_statistics(memcg, newpage, compound, nr_pages);
+	mem_cgroup_charge_statistics(memcg, newpage, PageTransHuge(newpage),
+			nr_pages);
 	memcg_check_events(memcg, newpage);
 	local_irq_restore(flags);
 }
_


* [patch 048/118] mm/page_vma_mapped.c: explicitly compare pfn for normal, hugetlbfs and THP page
  2020-01-31  6:10 incoming Andrew Morton
                   ` (46 preceding siblings ...)
  2020-01-31  6:13 ` [patch 047/118] mm/memcontrol.c: cleanup some useless code Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 049/118] mm, tracing: print symbol name for kmem_alloc_node call_site events Andrew Morton
                   ` (69 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, kirill.shutemov, linux-mm, lixinhai.lxh, mhocko,
	mike.kravetz, mm-commits, torvalds

From: Li Xinhai <lixinhai.lxh@gmail.com>
Subject: mm/page_vma_mapped.c: explicitly compare pfn for normal, hugetlbfs and THP page

In check_pte, the pfn of a normal, hugetlbfs or THP page needs to be
compared.  The current implementation applies the comparison as

- normal 4K page: page_pfn <= pfn < page_pfn + 1
- hugetlbfs page:  page_pfn <= pfn < page_pfn + HPAGE_PMD_NR
- THP page: page_pfn <= pfn < page_pfn + HPAGE_PMD_NR

in pfn_in_hpage.  For a hugetlbfs page, it should be page_pfn == pfn.

Now, rename pfn_in_hpage to pfn_is_match to highlight that the comparison
is not only for THP, and compare explicitly for each of these cases.

No impact upon current behavior, just make the code clear.  I think it
is important to make the code clear - comparing hugetlbfs page in range
page_pfn <= pfn < page_pfn + HPAGE_PMD_NR is confusing.
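
The intended comparison can be sketched in plain userspace C.  This is an
illustration under stated assumptions, not the kernel helper: struct
toy_page, enum toy_page_kind and toy_pfn_is_match() are hypothetical
stand-ins for struct page, the PageTransCompound()/PageHuge() tests and
pfn_is_match().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the page type tests used in the patch. */
enum toy_page_kind { TOY_NORMAL, TOY_HUGETLBFS, TOY_THP };

struct toy_page {
	unsigned long pfn;	/* first page frame number */
	unsigned long nr_pages;	/* base pages covered (1 for a normal page) */
	enum toy_page_kind kind;
};

static bool toy_pfn_is_match(const struct toy_page *page, unsigned long pfn)
{
	/* Normal and hugetlbfs pages: only the first pfn matches. */
	if (page->kind != TOY_THP)
		return page->pfn == pfn;

	/* A THP can be referenced by any of its subpages. */
	return pfn >= page->pfn && pfn - page->pfn < page->nr_pages;
}

/* Usage: same pfn range, different answers depending on the page kind. */
void toy_pfn_demo(void)
{
	struct toy_page thp = { 512, 512, TOY_THP };
	struct toy_page huge = { 512, 512, TOY_HUGETLBFS };

	assert(toy_pfn_is_match(&thp, 700));
	assert(!toy_pfn_is_match(&thp, 1024));
	assert(toy_pfn_is_match(&huge, 512));
	assert(!toy_pfn_is_match(&huge, 700));
}
```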

Link: http://lkml.kernel.org/r/1578737885-8890-1-git-send-email-lixinhai.lxh@gmail.com
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_vma_mapped.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/mm/page_vma_mapped.c~mm-page_vma_mappedc-explicitly-compare-pfn-for-normal-hugetlbfs-and-thp-page
+++ a/mm/page_vma_mapped.c
@@ -52,12 +52,16 @@ static bool map_pte(struct page_vma_mapp
 	return true;
 }
 
-static inline bool pfn_in_hpage(struct page *hpage, unsigned long pfn)
+static inline bool pfn_is_match(struct page *page, unsigned long pfn)
 {
-	unsigned long hpage_pfn = page_to_pfn(hpage);
+	unsigned long page_pfn = page_to_pfn(page);
+
+	/* normal page and hugetlbfs page */
+	if (!PageTransCompound(page) || PageHuge(page))
+		return page_pfn == pfn;
 
 	/* THP can be referenced by any subpage */
-	return pfn >= hpage_pfn && pfn - hpage_pfn < hpage_nr_pages(hpage);
+	return pfn >= page_pfn && pfn - page_pfn < hpage_nr_pages(page);
 }
 
 /**
@@ -108,7 +112,7 @@ static bool check_pte(struct page_vma_ma
 		pfn = pte_pfn(*pvmw->pte);
 	}
 
-	return pfn_in_hpage(pvmw->page, pfn);
+	return pfn_is_match(pvmw->page, pfn);
 }
 
 /**
_


* [patch 049/118] mm, tracing: print symbol name for kmem_alloc_node call_site events
  2020-01-31  6:10 incoming Andrew Morton
                   ` (47 preceding siblings ...)
  2020-01-31  6:13 ` [patch 048/118] mm/page_vma_mapped.c: explicitly compare pfn for normal, hugetlbfs and THP page Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 050/118] lib/test_kasan.c: fix memory leak in kmalloc_oob_krealloc_more() Andrew Morton
                   ` (68 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, changbin.du, joel, linux-mm, mingo, mm-commits, rostedt,
	sunjunyong, sunjy516, timmurray, torvalds

From: Junyong Sun <sunjy516@gmail.com>
Subject: mm, tracing: print symbol name for kmem_alloc_node call_site events

Print the call_site ip of kmem_alloc_node using '%pS' to improve the
readability of raw slab trace points.

Link: http://lkml.kernel.org/r/1577949568-4518-1-git-send-email-sunjunyong@xiaomi.com
Signed-off-by: Junyong Sun <sunjunyong@xiaomi.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/trace/events/kmem.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/trace/events/kmem.h~mm-tracing-print-symbol-name-for-kmem_alloc_node-call_site-events
+++ a/include/trace/events/kmem.h
@@ -88,8 +88,8 @@ DECLARE_EVENT_CLASS(kmem_alloc_node,
 		__entry->node		= node;
 	),
 
-	TP_printk("call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d",
-		__entry->call_site,
+	TP_printk("call_site=%pS ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d",
+		(void *)__entry->call_site,
 		__entry->ptr,
 		__entry->bytes_req,
 		__entry->bytes_alloc,
_


* [patch 050/118] lib/test_kasan.c: fix memory leak in kmalloc_oob_krealloc_more()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (48 preceding siblings ...)
  2020-01-31  6:13 ` [patch 049/118] mm, tracing: print symbol name for kmem_alloc_node call_site events Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 051/118] mm/early_ioremap.c: use %pa to print resource_size_t variables Andrew Morton
                   ` (67 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, dvyukov, gustavo, linux-mm, mm-commits, stable, torvalds

From: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Subject: lib/test_kasan.c: fix memory leak in kmalloc_oob_krealloc_more()

In case memory resources for _ptr2_ were allocated, release them before
return.

Notice that in case _ptr1_ happens to be NULL, krealloc() behaves exactly
like kmalloc().
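
A userspace analogue may make the leaking path easier to see.  The sketch
below uses malloc()/realloc()/free() in place of kmalloc()/krealloc()/
kfree(); in standard C, realloc(NULL, size) behaves like malloc(size) and
free(NULL) does nothing.  toy_alloc_then_grow() and its fail_first
parameter are hypothetical, introduced only to force the "first allocation
failed" case.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Returns 0 if both allocations succeeded, -1 otherwise.  Nothing is
 * leaked on either path.
 */
static int toy_alloc_then_grow(size_t size1, size_t size2, int fail_first)
{
	char *ptr1 = fail_first ? NULL : malloc(size1);
	int ptr1_missing = (ptr1 == NULL);
	/* With ptr1 == NULL this acts like malloc(size2) and may succeed. */
	char *ptr2 = realloc(ptr1, size2);

	if (ptr1_missing || !ptr2) {
		/* Here ptr1 is either NULL or still valid (realloc failed). */
		free(ptr1);
		/* The added step: ptr2 may be live when ptr1 was NULL. */
		free(ptr2);
		return -1;
	}

	/* Success: realloc() consumed ptr1, so only ptr2 is released. */
	free(ptr2);
	return 0;
}

/* Usage: the second call exercises the path that used to leak ptr2. */
void toy_alloc_demo(void)
{
	assert(toy_alloc_then_grow(16, 32, 0) == 0);
	assert(toy_alloc_then_grow(16, 32, 1) == -1);
}
```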

Addresses-Coverity-ID: 1490594 ("Resource leak")
Link: http://lkml.kernel.org/r/20200123160115.GA4202@embeddedor
Fixes: 3f15801cdc23 ("lib: add kasan test module")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/test_kasan.c |    1 +
 1 file changed, 1 insertion(+)

--- a/lib/test_kasan.c~lib-test_kasanc-fix-memory-leak-in-kmalloc_oob_krealloc_more
+++ a/lib/test_kasan.c
@@ -158,6 +158,7 @@ static noinline void __init kmalloc_oob_
 	if (!ptr1 || !ptr2) {
 		pr_err("Allocation failed\n");
 		kfree(ptr1);
+		kfree(ptr2);
 		return;
 	}
 
_


* [patch 051/118] mm/early_ioremap.c: use %pa to print resource_size_t variables
  2020-01-31  6:10 incoming Andrew Morton
                   ` (49 preceding siblings ...)
  2020-01-31  6:13 ` [patch 050/118] lib/test_kasan.c: fix memory leak in kmalloc_oob_krealloc_more() Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:13 ` [patch 052/118] mm/page_alloc: skip non present sections on zone initialization Andrew Morton
                   ` (66 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, david, linux-mm, mm-commits, torvalds

From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Subject: mm/early_ioremap.c: use %pa to print resource_size_t variables

%pa takes into consideration special types such as resource_size_t.
Use this specifier instead of explicit casting.

Link: http://lkml.kernel.org/r/20191209165413.56263-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/early_ioremap.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/mm/early_ioremap.c~mm-early_remap-use-%pa-to-print-resource_size_t-variables
+++ a/mm/early_ioremap.c
@@ -121,8 +121,8 @@ __early_ioremap(resource_size_t phys_add
 		}
 	}
 
-	if (WARN(slot < 0, "%s(%08llx, %08lx) not found slot\n",
-		 __func__, (u64)phys_addr, size))
+	if (WARN(slot < 0, "%s(%pa, %08lx) not found slot\n",
+		 __func__, &phys_addr, size))
 		return NULL;
 
 	/* Don't allow wraparound or zero size */
@@ -158,8 +158,8 @@ __early_ioremap(resource_size_t phys_add
 		--idx;
 		--nrpages;
 	}
-	WARN(early_ioremap_debug, "%s(%08llx, %08lx) [%d] => %08lx + %08lx\n",
-	     __func__, (u64)phys_addr, size, slot, offset, slot_virt[slot]);
+	WARN(early_ioremap_debug, "%s(%pa, %08lx) [%d] => %08lx + %08lx\n",
+	     __func__, &phys_addr, size, slot, offset, slot_virt[slot]);
 
 	prev_map[slot] = (void __iomem *)(offset + slot_virt[slot]);
 	return prev_map[slot];
_


* [patch 052/118] mm/page_alloc: skip non present sections on zone initialization
  2020-01-31  6:10 incoming Andrew Morton
                   ` (50 preceding siblings ...)
  2020-01-31  6:13 ` [patch 051/118] mm/early_ioremap.c: use %pa to print resource_size_t variables Andrew Morton
@ 2020-01-31  6:13 ` Andrew Morton
  2020-01-31  6:14 ` [patch 053/118] mm: remove the memory isolate notifier Andrew Morton
                   ` (65 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:13 UTC (permalink / raw)
  To: akpm, bhe, dan.j.williams, david, kirill.shutemov, kirill,
	linux-mm, mgorman, mhocko, mhocko, mm-commits, osalvador,
	pasha.tatashin, torvalds, vbabka, zhi.jin

From: "Kirill A. Shutemov" <kirill@shutemov.name>
Subject: mm/page_alloc: skip non present sections on zone initialization

memmap_init_zone() can be called on ranges with holes during boot.
It will skip any non-valid PFNs one by one.  This works fine as long as
the holes are not too big.

But huge holes in the memory map cause a problem.  It takes over 20
seconds to walk a 32TiB hole.  x86-64 with 5-level paging allows for much
larger holes in the memory map, which would practically hang the system.

Deferred struct page init doesn't help here.  It only works on the present
ranges.

Skipping non-present sections would fix the issue.
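
The skipping idea can be sketched in userspace with a tiny section map.
This is an illustration only: the section size, the presence table and all
toy_* names are assumptions made for the example, standing in for
pfn_to_section_nr(), present_section_nr(), section_nr_to_pfn() and
__highest_present_section_nr.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGES_PER_SECTION	8UL
#define TOY_NR_SECTIONS		6UL

/* Sections 0 and 4 are present; sections 1..3 and 5 are holes. */
static const bool toy_section_present[TOY_NR_SECTIONS] = {
	true, false, false, false, true, false
};
static const unsigned long toy_highest_present_section = 4;

static bool toy_present(unsigned long section_nr)
{
	return section_nr < TOY_NR_SECTIONS && toy_section_present[section_nr];
}

/* Return the next pfn to look at, jumping over non-present sections. */
static unsigned long toy_next_pfn(unsigned long pfn)
{
	unsigned long section_nr;

	pfn++;
	section_nr = pfn / TOY_PAGES_PER_SECTION;
	if (toy_present(section_nr))
		return pfn;

	while (++section_nr <= toy_highest_present_section) {
		if (toy_present(section_nr))
			return section_nr * TOY_PAGES_PER_SECTION;
	}

	/* No present section left: report the largest value. */
	return (unsigned long)-1;
}

/* Usage: stepping past pfn 7 lands on section 4 instead of pfn 8. */
void toy_next_pfn_demo(void)
{
	assert(toy_next_pfn(3) == 4);
	assert(toy_next_pfn(7) == 32);
}
```

With this shape, the cost of crossing a hole is proportional to the number
of sections in it rather than the number of PFNs.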

Link: http://lkml.kernel.org/r/20191230093828.24613-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Jin, Zhi" <zhi.jin@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |   28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

--- a/mm/page_alloc.c~mm-page_alloc-skip-non-present-sections-on-zone-initialization
+++ a/mm/page_alloc.c
@@ -5848,6 +5848,30 @@ overlap_memmap_init(unsigned long zone,
 	return false;
 }
 
+#ifdef CONFIG_SPARSEMEM
+/* Skip PFNs that belong to non-present sections */
+static inline __meminit unsigned long next_pfn(unsigned long pfn)
+{
+	unsigned long section_nr;
+
+	section_nr = pfn_to_section_nr(++pfn);
+	if (present_section_nr(section_nr))
+		return pfn;
+
+	while (++section_nr <= __highest_present_section_nr) {
+		if (present_section_nr(section_nr))
+			return section_nr_to_pfn(section_nr);
+	}
+
+	return -1;
+}
+#else
+static inline __meminit unsigned long next_pfn(unsigned long pfn)
+{
+	return pfn + 1;
+}
+#endif
+
 /*
  * Initially all pages are reserved - free ones are freed
  * up by memblock_free_all() once the early boot process is
@@ -5887,8 +5911,10 @@ void __meminit memmap_init_zone(unsigned
 		 * function.  They do not exist on hotplugged memory.
 		 */
 		if (context == MEMMAP_EARLY) {
-			if (!early_pfn_valid(pfn))
+			if (!early_pfn_valid(pfn)) {
+				pfn = next_pfn(pfn) - 1;
 				continue;
+			}
 			if (!early_pfn_in_nid(pfn, nid))
 				continue;
 			if (overlap_memmap_init(zone, &pfn))
_


* [patch 053/118] mm: remove the memory isolate notifier
  2020-01-31  6:10 incoming Andrew Morton
                   ` (51 preceding siblings ...)
  2020-01-31  6:13 ` [patch 052/118] mm/page_alloc: skip non present sections on zone initialization Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 054/118] mm: remove "count" parameter from has_unmovable_pages() Andrew Morton
                   ` (64 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, anshuman.khandual, cai, dan.j.williams, david, gregkh,
	kernelfans, linux-mm, mhocko, mm-commits, mpe, osalvador,
	pasha.tatashin, rafael, torvalds

From: David Hildenbrand <david@redhat.com>
Subject: mm: remove the memory isolate notifier

Luckily, we have no users left, so we can get rid of it.  Clean up
set_migratetype_isolate() a little bit.

Link: http://lkml.kernel.org/r/20191114131911.11783-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Qian Cai <cai@lca.pw>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/base/memory.c  |   19 -------------------
 include/linux/memory.h |   27 ---------------------------
 mm/page_isolation.c    |   38 ++++----------------------------------
 3 files changed, 4 insertions(+), 80 deletions(-)

--- a/drivers/base/memory.c~mm-remove-the-memory-isolate-notifier
+++ a/drivers/base/memory.c
@@ -70,20 +70,6 @@ void unregister_memory_notifier(struct n
 }
 EXPORT_SYMBOL(unregister_memory_notifier);
 
-static ATOMIC_NOTIFIER_HEAD(memory_isolate_chain);
-
-int register_memory_isolate_notifier(struct notifier_block *nb)
-{
-	return atomic_notifier_chain_register(&memory_isolate_chain, nb);
-}
-EXPORT_SYMBOL(register_memory_isolate_notifier);
-
-void unregister_memory_isolate_notifier(struct notifier_block *nb)
-{
-	atomic_notifier_chain_unregister(&memory_isolate_chain, nb);
-}
-EXPORT_SYMBOL(unregister_memory_isolate_notifier);
-
 static void memory_block_release(struct device *dev)
 {
 	struct memory_block *mem = to_memory_block(dev);
@@ -175,11 +161,6 @@ int memory_notify(unsigned long val, voi
 	return blocking_notifier_call_chain(&memory_chain, val, v);
 }
 
-int memory_isolate_notify(unsigned long val, void *v)
-{
-	return atomic_notifier_call_chain(&memory_isolate_chain, val, v);
-}
-
 /*
  * The probe routines leave the pages uninitialized, just as the bootmem code
  * does. Make sure we do not access them, but instead use only information from
--- a/include/linux/memory.h~mm-remove-the-memory-isolate-notifier
+++ a/include/linux/memory.h
@@ -55,19 +55,6 @@ struct memory_notify {
 	int status_change_nid;
 };
 
-/*
- * During pageblock isolation, count the number of pages within the
- * range [start_pfn, start_pfn + nr_pages) which are owned by code
- * in the notifier chain.
- */
-#define MEM_ISOLATE_COUNT	(1<<0)
-
-struct memory_isolate_notify {
-	unsigned long start_pfn;	/* Start of range to check */
-	unsigned int nr_pages;		/* # pages in range to check */
-	unsigned int pages_found;	/* # pages owned found by callbacks */
-};
-
 struct notifier_block;
 struct mem_section;
 
@@ -94,27 +81,13 @@ static inline int memory_notify(unsigned
 {
 	return 0;
 }
-static inline int register_memory_isolate_notifier(struct notifier_block *nb)
-{
-	return 0;
-}
-static inline void unregister_memory_isolate_notifier(struct notifier_block *nb)
-{
-}
-static inline int memory_isolate_notify(unsigned long val, void *v)
-{
-	return 0;
-}
 #else
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
-extern int register_memory_isolate_notifier(struct notifier_block *nb);
-extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size);
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
-extern int memory_isolate_notify(unsigned long val, void *v);
 extern struct memory_block *find_memory_block(struct mem_section *);
 typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
 extern int walk_memory_blocks(unsigned long start, unsigned long size,
--- a/mm/page_isolation.c~mm-remove-the-memory-isolate-notifier
+++ a/mm/page_isolation.c
@@ -18,9 +18,7 @@
 static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 {
 	struct zone *zone;
-	unsigned long flags, pfn;
-	struct memory_isolate_notify arg;
-	int notifier_ret;
+	unsigned long flags;
 	int ret = -EBUSY;
 
 	zone = page_zone(page);
@@ -35,41 +33,11 @@ static int set_migratetype_isolate(struc
 	if (is_migrate_isolate_page(page))
 		goto out;
 
-	pfn = page_to_pfn(page);
-	arg.start_pfn = pfn;
-	arg.nr_pages = pageblock_nr_pages;
-	arg.pages_found = 0;
-
-	/*
-	 * It may be possible to isolate a pageblock even if the
-	 * migratetype is not MIGRATE_MOVABLE. The memory isolation
-	 * notifier chain is used by balloon drivers to return the
-	 * number of pages in a range that are held by the balloon
-	 * driver to shrink memory. If all the pages are accounted for
-	 * by balloons, are free, or on the LRU, isolation can continue.
-	 * Later, for example, when memory hotplug notifier runs, these
-	 * pages reported as "can be isolated" should be isolated(freed)
-	 * by the balloon driver through the memory notifier chain.
-	 */
-	notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg);
-	notifier_ret = notifier_to_errno(notifier_ret);
-	if (notifier_ret)
-		goto out;
 	/*
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
 	 * We just check MOVABLE pages.
 	 */
-	if (!has_unmovable_pages(zone, page, arg.pages_found, migratetype,
-				 isol_flags))
-		ret = 0;
-
-	/*
-	 * immobile means "not-on-lru" pages. If immobile is larger than
-	 * removable-by-driver pages reported by notifier, we'll fail.
-	 */

* [patch 054/118] mm: remove "count" parameter from has_unmovable_pages()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (52 preceding siblings ...)
  2020-01-31  6:14 ` [patch 053/118] mm: remove the memory isolate notifier Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 055/118] mm/vmscan.c: remove unused return value of shrink_node Andrew Morton
                   ` (63 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, alexander.h.duyck, anshuman.khandual, arunks, cai,
	dan.j.williams, david, glider, kernelfans, linux-mm, mgorman,
	mhocko, mm-commits, mpe, osalvador, pasha.tatashin,
	richardw.yang, rppt, sfr, torvalds, vbabka

From: David Hildenbrand <david@redhat.com>
Subject: mm: remove "count" parameter from has_unmovable_pages()

Now that the memory isolate notifier is gone, the parameter is always 0.
Drop it and clean up has_unmovable_pages().

Link: http://lkml.kernel.org/r/20191114131911.11783-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/page-isolation.h |    4 ++--
 mm/memory_hotplug.c            |    2 +-
 mm/page_alloc.c                |   21 +++++++--------------
 mm/page_isolation.c            |    2 +-
 4 files changed, 11 insertions(+), 18 deletions(-)

--- a/include/linux/page-isolation.h~mm-remove-count-parameter-from-has_unmovable_pages
+++ a/include/linux/page-isolation.h
@@ -33,8 +33,8 @@ static inline bool is_migrate_isolate(in
 #define MEMORY_OFFLINE	0x1
 #define REPORT_FAILURE	0x2
 
-bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
-			 int migratetype, int flags);
+bool has_unmovable_pages(struct zone *zone, struct page *page, int migratetype,
+			 int flags);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable);
--- a/mm/memory_hotplug.c~mm-remove-count-parameter-from-has_unmovable_pages
+++ a/mm/memory_hotplug.c
@@ -1182,7 +1182,7 @@ static bool is_pageblock_removable_noloc
 	if (!zone_spans_pfn(zone, pfn))
 		return false;
 
-	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE,
+	return !has_unmovable_pages(zone, page, MIGRATE_MOVABLE,
 				    MEMORY_OFFLINE);
 }
 
--- a/mm/page_alloc.c~mm-remove-count-parameter-from-has_unmovable_pages
+++ a/mm/page_alloc.c
@@ -8180,17 +8180,15 @@ void *__init alloc_large_system_hash(con
 
 /*
  * This function checks whether pageblock includes unmovable pages or not.
- * If @count is not zero, it is okay to include less @count unmovable pages
  *
  * PageLRU check without isolation or lru_lock could race so that
  * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
  * check without lock_page also may miss some movable non-lru pages at
  * race condition. So you can't expect this function should be exact.
  */
-bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
-			 int migratetype, int flags)
+bool has_unmovable_pages(struct zone *zone, struct page *page, int migratetype,
+			 int flags)
 {
-	unsigned long found;
 	unsigned long iter = 0;
 	unsigned long pfn = page_to_pfn(page);
 	const char *reason = "unmovable page";
@@ -8216,13 +8214,11 @@ bool has_unmovable_pages(struct zone *zo
 		goto unmovable;
 	}
 
-	for (found = 0; iter < pageblock_nr_pages; iter++) {
-		unsigned long check = pfn + iter;
-
-		if (!pfn_valid_within(check))
+	for (; iter < pageblock_nr_pages; iter++) {
+		if (!pfn_valid_within(pfn + iter))
 			continue;
 
-		page = pfn_to_page(check);
+		page = pfn_to_page(pfn + iter);
 
 		if (PageReserved(page))
 			goto unmovable;
@@ -8271,11 +8267,9 @@ bool has_unmovable_pages(struct zone *zo
 		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
 			continue;
 
-		if (__PageMovable(page))
+		if (__PageMovable(page) || PageLRU(page))
 			continue;
 
-		if (!PageLRU(page))
-			found++;
 		/*
 		 * If there are RECLAIMABLE pages, we need to check
 		 * it.  But now, memory offline itself doesn't call
@@ -8289,8 +8283,7 @@ bool has_unmovable_pages(struct zone *zo
 		 * is set to both of a memory hole page and a _used_ kernel
 		 * page at boot.
 		 */
-		if (found > count)
-			goto unmovable;
+		goto unmovable;
 	}
 	return false;
 unmovable:
--- a/mm/page_isolation.c~mm-remove-count-parameter-from-has_unmovable_pages
+++ a/mm/page_isolation.c
@@ -37,7 +37,7 @@ static int set_migratetype_isolate(struc
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
 	 * We just check MOVABLE pages.
 	 */
-	if (!has_unmovable_pages(zone, page, 0, migratetype, isol_flags)) {
+	if (!has_unmovable_pages(zone, page, migratetype, isol_flags)) {
 		unsigned long nr_pages;
 		int mt = get_pageblock_migratetype(page);
 
_
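
A minimal userspace sketch of why dropping "count" from
has_unmovable_pages() preserves behaviour, assuming the loop only needs to
distinguish movable, LRU and other pages.  struct toy_page and both toy_*
functions are hypothetical stand-ins, not kernel code.  With count fixed
at 0, "found > count" is true as soon as the first page that is neither
movable nor on the LRU is seen, so the counter can be replaced by an
immediate failure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the page state the real function inspects. */
struct toy_page {
	bool movable;	/* models __PageMovable(page) */
	bool lru;	/* models PageLRU(page) */
};

/* Old shape: tolerate up to @count pages that are neither movable nor LRU. */
static bool toy_has_unmovable_old(const struct toy_page *pages, size_t n,
				  int count)
{
	unsigned long found = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (pages[i].movable)
			continue;
		if (!pages[i].lru)
			found++;
		if (found > (unsigned long)count)
			return true;
	}
	return false;
}

/* New shape: the first page that is neither movable nor LRU fails the block. */
static bool toy_has_unmovable_new(const struct toy_page *pages, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pages[i].movable || pages[i].lru)
			continue;
		return true;
	}
	return false;
}
```

For any input, toy_has_unmovable_old(pages, n, 0) and
toy_has_unmovable_new(pages, n) return the same value.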

* [patch 055/118] mm/vmscan.c: remove unused return value of shrink_node
  2020-01-31  6:10 incoming Andrew Morton
                   ` (53 preceding siblings ...)
  2020-01-31  6:14 ` [patch 054/118] mm: remove "count" parameter from has_unmovable_pages() Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 056/118] mm/vmscan: remove prefetch_prev_lru_page Andrew Morton
                   ` (62 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, david, linux-mm, liu.song11, mm-commits, torvalds

From: Liu Song <liu.song11@zte.com.cn>
Subject: mm/vmscan.c: remove unused return value of shrink_node

The return value of shrink_node is not used, so remove
unnecessary operations.

Link: http://lkml.kernel.org/r/20191128143524.3223-1-fishland@aliyun.com
Signed-off-by: Liu Song <liu.song11@zte.com.cn>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/mm/vmscan.c~mm-vmscanc-remove-unused-return-value-of-shrink_node
+++ a/mm/vmscan.c
@@ -2695,7 +2695,7 @@ static void shrink_node_memcgs(pg_data_t
 	} while ((memcg = mem_cgroup_iter(target_memcg, memcg, NULL)));
 }
 
-static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
+static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 {
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	unsigned long nr_reclaimed, nr_scanned;
@@ -2874,8 +2874,6 @@ again:
 	 */
 	if (reclaimable)
 		pgdat->kswapd_failures = 0;

* [patch 056/118] mm/vmscan: remove prefetch_prev_lru_page
  2020-01-31  6:10 incoming Andrew Morton
                   ` (54 preceding siblings ...)
  2020-01-31  6:14 ` [patch 055/118] mm/vmscan.c: remove unused return value of shrink_node Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 057/118] mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE Andrew Morton
                   ` (61 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, alex.shi, cai, linux-mm, mm-commits, torvalds

From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/vmscan: remove prefetch_prev_lru_page

This macro has never been used anywhere in the git history, so remove it.

Link: http://lkml.kernel.org/r/1579006500-127143-1-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |   14 --------------
 1 file changed, 14 deletions(-)

--- a/mm/vmscan.c~mm-vmscan-remove-prefetch_prev_lru_page
+++ a/mm/vmscan.c
@@ -146,20 +146,6 @@ struct scan_control {
 	struct reclaim_state reclaim_state;
 };
 
-#ifdef ARCH_HAS_PREFETCH
-#define prefetch_prev_lru_page(_page, _base, _field)			\
-	do {								\
-		if ((_page)->lru.prev != _base) {			\
-			struct page *prev;				\
-									\
-			prev = lru_to_page(&(_page->lru));		\
-			prefetch(&prev->_field);			\
-		}							\
-	} while (0)
-#else
-#define prefetch_prev_lru_page(_page, _base, _field) do { } while (0)
-#endif

* [patch 057/118] mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE
  2020-01-31  6:10 incoming Andrew Morton
                   ` (55 preceding siblings ...)
  2020-01-31  6:14 ` [patch 056/118] mm/vmscan: remove prefetch_prev_lru_page Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 058/118] tools/vm/slabinfo: fix sanity checks enabling Andrew Morton
                   ` (60 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, alex.shi, cl, dwagner, linux-mm, mm-commits, tobin, torvalds

From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE

Commit 1b2ffb7896ad ("[PATCH] Zone reclaim: Allow modification of zone
reclaim behavior") defined RECLAIM_OFF/RECLAIM_ZONE but never used them,
so remove them.

[dwagner@suse.de: fix sanity checks enabling]
  Link: http://lkml.kernel.org/r/20200116131642.642-1-dwagner@suse.de
[akpm@linux-foundation.org: renumber the bits for neatness]
Link: http://lkml.kernel.org/r/1579005573-58923-1-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Tobin C. Harding" <tobin@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- a/mm/vmscan.c~mm-vmscan-remove-unused-reclaim_off-reclaim_zone
+++ a/mm/vmscan.c
@@ -4110,10 +4110,8 @@ module_init(kswapd_init)
  */
 int node_reclaim_mode __read_mostly;
 
-#define RECLAIM_OFF 0
-#define RECLAIM_ZONE (1<<0)	/* Run shrink_inactive_list on the zone */
-#define RECLAIM_WRITE (1<<1)	/* Writeout pages during reclaim */
-#define RECLAIM_UNMAP (1<<2)	/* Unmap pages during reclaim */
+#define RECLAIM_WRITE (1<<0)	/* Writeout pages during reclaim */
+#define RECLAIM_UNMAP (1<<1)	/* Unmap pages during reclaim */
 
 /*
  * Priority for NODE_RECLAIM. This determines the fraction of pages
_

* [patch 058/118] tools/vm/slabinfo: fix sanity checks enabling
  2020-01-31  6:10 incoming Andrew Morton
                   ` (56 preceding siblings ...)
  2020-01-31  6:14 ` [patch 057/118] mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 059/118] mm/memblock: define memblock_physmem_add() Andrew Morton
                   ` (59 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, cl, dwagner, linux-mm, mm-commits, tobin, torvalds

From: Daniel Wagner <dwagner@suse.de>
Subject: tools/vm/slabinfo: fix sanity checks enabling

The sysfs file name for enabling sanity checking is called 'sanity_checks'
and not 'sanity'.

The name of the file has never changed since the introduction of the slub
allocator.  Obviously, most people turn the checks on via the command line
option and not during runtime using slabinfo.

Link: http://lkml.kernel.org/r/20200116131642.642-1-dwagner@suse.de
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Tobin C. Harding" <tobin@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 tools/vm/slabinfo.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/tools/vm/slabinfo.c~tools-vm-slabinfo-fix-sanity-checks-enabling
+++ a/tools/vm/slabinfo.c
@@ -720,11 +720,11 @@ static void slab_debug(struct slabinfo *
 		return;
 
 	if (sanity && !s->sanity_checks) {
-		set_obj(s, "sanity", 1);
+		set_obj(s, "sanity_checks", 1);
 	}
 	if (!sanity && s->sanity_checks) {
 		if (slab_empty(s))
-			set_obj(s, "sanity", 0);
+			set_obj(s, "sanity_checks", 0);
 		else
 			fprintf(stderr, "%s not empty cannot disable sanity checks\n", s->name);
 	}
_
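
A rough sketch of what toggling the attribute amounts to, assuming the
SLUB sysfs directory is /sys/kernel/slab and that the attribute accepts a
decimal integer.  The toy_* helpers are hypothetical and are not the
set_obj() implementation from slabinfo.c.  Writing needs root; opening a
non-existent attribute name such as "sanity" simply fails.

```c
#include <stdio.h>
#include <string.h>

#define TOY_SLAB_SYSFS "/sys/kernel/slab"	/* assumed sysfs location */

/* Build "<sysfs>/<cache>/<attr>"; 0 on success, -1 if it did not fit. */
static int toy_slab_attr_path(char *buf, size_t len, const char *cache,
			      const char *attr)
{
	int n = snprintf(buf, len, "%s/%s/%s", TOY_SLAB_SYSFS, cache, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write @value to the attribute; -1 if the file cannot be opened or written. */
static int toy_slab_set_attr(const char *cache, const char *attr, int value)
{
	char path[256];
	FILE *f;

	if (toy_slab_attr_path(path, sizeof(path), cache, attr))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;	/* e.g. the attribute name does not exist */

	fprintf(f, "%d\n", value);
	return fclose(f) ? -1 : 0;
}
```

With the corrected name, a call would look like
toy_slab_set_attr("kmalloc-64", "sanity_checks", 1), where "kmalloc-64" is
only an example cache name.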

* [patch 059/118] mm/memblock: define memblock_physmem_add()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (57 preceding siblings ...)
  2020-01-31  6:14 ` [patch 058/118] tools/vm/slabinfo: fix sanity checks enabling Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 060/118] memblock: Use __func__ in remaining memblock_dbg() call sites Andrew Morton
                   ` (58 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, anshuman.khandual, borntraeger, gerald.schaefer, gor,
	heiko.carstens, linux-mm, mm-commits, prudo, rppt, schwidefsky,
	torvalds, walling

From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm/memblock: define memblock_physmem_add()

On the s390 platform, the memblock.physmem array is built by calling
memblock_add_range() directly, although it is a low-level function not
intended for use outside of memblock.  Hence, let's conditionally add a
helper function for the physmem array when HAVE_MEMBLOCK_PHYS_MAP is
enabled.  Also use MAX_NUMNODES instead of 0 as the node ID, as
memblock_add() and memblock_reserve() do.  Make memblock_add_range() a
static function, as it is no longer used outside of memblock.

Link: http://lkml.kernel.org/r/1578283835-21969-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Collin Walling <walling@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/s390/kernel/setup.c |   12 +++---------
 include/linux/memblock.h |    7 +++----
 mm/memblock.c            |   14 +++++++++++++-
 3 files changed, 19 insertions(+), 14 deletions(-)

--- a/arch/s390/kernel/setup.c~mm-memblock-define-memblock_physmem_add
+++ a/arch/s390/kernel/setup.c
@@ -759,14 +759,6 @@ static void __init free_mem_detect_info(
 		memblock_free(start, size);
 }
 
-static void __init memblock_physmem_add(phys_addr_t start, phys_addr_t size)
-{
-	memblock_dbg("memblock_physmem_add: [%#016llx-%#016llx]\n",
-		     start, start + size - 1);
-	memblock_add_range(&memblock.memory, start, size, 0, 0);
-	memblock_add_range(&memblock.physmem, start, size, 0, 0);
-}
-
 static const char * __init get_mem_info_source(void)
 {
 	switch (mem_detect.info_source) {
@@ -791,8 +783,10 @@ static void __init memblock_add_mem_dete
 		     get_mem_info_source(), mem_detect.info_source);
 	/* keep memblock lists close to the kernel */
 	memblock_set_bottom_up(true);
-	for_each_mem_detect_block(i, &start, &end)
+	for_each_mem_detect_block(i, &start, &end) {
+		memblock_add(start, end - start);
 		memblock_physmem_add(start, end - start);
+	}
 	memblock_set_bottom_up(false);
 	memblock_dump_all();
 }
--- a/include/linux/memblock.h~mm-memblock-define-memblock_physmem_add
+++ a/include/linux/memblock.h
@@ -113,6 +113,9 @@ int memblock_add(phys_addr_t base, phys_
 int memblock_remove(phys_addr_t base, phys_addr_t size);
 int memblock_free(phys_addr_t base, phys_addr_t size);
 int memblock_reserve(phys_addr_t base, phys_addr_t size);
+#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
+int memblock_physmem_add(phys_addr_t base, phys_addr_t size);
+#endif
 void memblock_trim_memory(phys_addr_t align);
 bool memblock_overlaps_region(struct memblock_type *type,
 			      phys_addr_t base, phys_addr_t size);
@@ -127,10 +130,6 @@ void reset_node_managed_pages(pg_data_t
 void reset_all_zones_managed_pages(void);
 
 /* Low level functions */
-int memblock_add_range(struct memblock_type *type,
-		       phys_addr_t base, phys_addr_t size,
-		       int nid, enum memblock_flags flags);
-
 void __next_mem_range(u64 *idx, int nid, enum memblock_flags flags,
 		      struct memblock_type *type_a,
 		      struct memblock_type *type_b, phys_addr_t *out_start,
--- a/mm/memblock.c~mm-memblock-define-memblock_physmem_add
+++ a/mm/memblock.c
@@ -575,7 +575,7 @@ static void __init_memblock memblock_ins
  * Return:
  * 0 on success, -errno on failure.
  */
-int __init_memblock memblock_add_range(struct memblock_type *type,
+static int __init_memblock memblock_add_range(struct memblock_type *type,
 				phys_addr_t base, phys_addr_t size,
 				int nid, enum memblock_flags flags)
 {
@@ -830,6 +830,18 @@ int __init_memblock memblock_reserve(phy
 	return memblock_add_range(&memblock.reserved, base, size, MAX_NUMNODES, 0);
 }
 
+#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
+int __init_memblock memblock_physmem_add(phys_addr_t base, phys_addr_t size)
+{
+	phys_addr_t end = base + size - 1;
+
+	memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
+		     &base, &end, (void *)_RET_IP_);
+
+	return memblock_add_range(&memblock.physmem, base, size, MAX_NUMNODES, 0);
+}
+#endif
+
 /**
  * memblock_setclr_flag - set or clear flag for a memory region
  * @base: base address of the region
_

* [patch 060/118] memblock: Use __func__ in remaining memblock_dbg() call sites
  2020-01-31  6:10 incoming Andrew Morton
                   ` (58 preceding siblings ...)
  2020-01-31  6:14 ` [patch 059/118] mm/memblock: define memblock_physmem_add() Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 061/118] mm, oom: dump stack of victim when reaping failed Andrew Morton
                   ` (57 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, anshuman.khandual, linux-mm, mm-commits, rppt, torvalds

From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: memblock: Use __func__ in remaining memblock_dbg() call sites

Replace open-coded function name strings with %s (__func__) in all
remaining memblock_dbg() call sites.

Link: http://lkml.kernel.org/r/1578285510-28261-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memblock.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/mm/memblock.c~memblock-use-__func__-in-remaining-memblock_dbg-call-sites
+++ a/mm/memblock.c
@@ -694,7 +694,7 @@ int __init_memblock memblock_add(phys_ad
 {
 	phys_addr_t end = base + size - 1;
 
-	memblock_dbg("memblock_add: [%pa-%pa] %pS\n",
+	memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
 		     &base, &end, (void *)_RET_IP_);
 
 	return memblock_add_range(&memblock.memory, base, size, MAX_NUMNODES, 0);
@@ -795,7 +795,7 @@ int __init_memblock memblock_remove(phys
 {
 	phys_addr_t end = base + size - 1;
 
-	memblock_dbg("memblock_remove: [%pa-%pa] %pS\n",
+	memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
 		     &base, &end, (void *)_RET_IP_);
 
 	return memblock_remove_range(&memblock.memory, base, size);
@@ -813,7 +813,7 @@ int __init_memblock memblock_free(phys_a
 {
 	phys_addr_t end = base + size - 1;
 
-	memblock_dbg("   memblock_free: [%pa-%pa] %pS\n",
+	memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
 		     &base, &end, (void *)_RET_IP_);
 
 	kmemleak_free_part_phys(base, size);
@@ -824,7 +824,7 @@ int __init_memblock memblock_reserve(phy
 {
 	phys_addr_t end = base + size - 1;
 
-	memblock_dbg("memblock_reserve: [%pa-%pa] %pS\n",
+	memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
 		     &base, &end, (void *)_RET_IP_);
 
 	return memblock_add_range(&memblock.reserved, base, size, MAX_NUMNODES, 0);
_
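
A small userspace sketch of the pattern, assuming a printf-style debug
helper.  toy_dbg() and toy_region_add() are hypothetical names; toy_dbg()
formats into a buffer rather than the kernel log.  Because __func__ always
evaluates to the name of the enclosing function, the message prefix cannot
go stale when a function is renamed or its body is copied elsewhere.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for memblock_dbg(): format into a caller buffer. */
#define toy_dbg(buf, len, fmt, ...) snprintf(buf, len, fmt, __VA_ARGS__)

static int toy_region_add(char *buf, size_t len, unsigned long base,
			  unsigned long size)
{
	unsigned long end = base + size - 1;

	/* __func__ is "toy_region_add" here, no hand-written prefix needed. */
	return toy_dbg(buf, len, "%s: [%#lx-%#lx]\n", __func__, base, end);
}
```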

* [patch 061/118] mm, oom: dump stack of victim when reaping failed
  2020-01-31  6:10 incoming Andrew Morton
                   ` (59 preceding siblings ...)
  2020-01-31  6:14 ` [patch 060/118] memblock: Use __func__ in remaining memblock_dbg() call sites Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 062/118] mm/huge_memory.c: use head to check huge zero page Andrew Morton
                   ` (56 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, linux-mm, mhocko, mm-commits, penguin-kernel, rientjes, torvalds

From: David Rientjes <rientjes@google.com>
Subject: mm, oom: dump stack of victim when reaping failed

When a process cannot be oom reaped, for whatever reason, the list of
locks that are held is currently dumped to the kernel log.

Much more interesting is the stack trace of the victim that cannot be
reaped.  If the stack trace is dumped, we have the ability to find related
occurrences in the same kernel code and hopefully solve the issue that is
making it wedged.

Dump the stack trace when a process fails to be oom reaped.

Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2001141519280.200484@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/oom_kill.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/mm/oom_kill.c~mm-oom-dump-stack-of-victim-when-reaping-failed
+++ a/mm/oom_kill.c
@@ -26,6 +26,7 @@
 #include <linux/sched/mm.h>
 #include <linux/sched/coredump.h>
 #include <linux/sched/task.h>
+#include <linux/sched/debug.h>
 #include <linux/swap.h>
 #include <linux/timex.h>
 #include <linux/jiffies.h>
@@ -620,6 +621,7 @@ static void oom_reap_task(struct task_st
 
 	pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
 		task_pid_nr(tsk), tsk->comm);
+	sched_show_task(tsk);
 	debug_show_all_locks();
 
 done:
_

* [patch 062/118] mm/huge_memory.c: use head to check huge zero page
  2020-01-31  6:10 incoming Andrew Morton
                   ` (60 preceding siblings ...)
  2020-01-31  6:14 ` [patch 061/118] mm, oom: dump stack of victim when reaping failed Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 063/118] mm/huge_memory.c: use head to emphasize the purpose of page Andrew Morton
                   ` (55 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, kirill.shutemov, linux-mm, mm-commits, richardw.yang, torvalds

From: Wei Yang <richardw.yang@linux.intel.com>
Subject: mm/huge_memory.c: use head to check huge zero page

The page could be a tail page.  If this is the case, this BUG_ON will
never be triggered.

Link: http://lkml.kernel.org/r/20200110032610.26499-1-richardw.yang@linux.intel.com
Fixes: e9b61f19858a ("thp: reintroduce split_huge_page()")

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/huge_memory.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/huge_memory.c~mm-huge_memoryc-use-head-to-check-huge-zero-page
+++ a/mm/huge_memory.c
@@ -2712,7 +2712,7 @@ int split_huge_page_to_list(struct page
 	unsigned long flags;
 	pgoff_t end;
 
-	VM_BUG_ON_PAGE(is_huge_zero_page(page), page);
+	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
 
_
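
A toy model of why the check has to look at the head page, assuming the
zero-page test is a plain pointer comparison against the head of the huge
zero page.  struct toy_page and the toy_* helpers are hypothetical and
only mimic the head/tail relationship of a compound page.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	struct toy_page *head;	/* NULL for a head page, else its head */
};

/* Models compound_head(): map a tail page to its head, a head to itself. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	return page->head ? page->head : page;
}

/* Models a zero-page check done as a pointer comparison with the head. */
static bool toy_is_zero_page(const struct toy_page *page,
			     const struct toy_page *zero_head)
{
	return page == zero_head;
}
```

In this model a tail page of the zero page never compares equal to
zero_head, so a check on the raw page passes silently, while the same
check on toy_compound_head(page) catches it.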

* [patch 063/118] mm/huge_memory.c: use head to emphasize the purpose of page
  2020-01-31  6:10 incoming Andrew Morton
                   ` (61 preceding siblings ...)
  2020-01-31  6:14 ` [patch 062/118] mm/huge_memory.c: use head to check huge zero page Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 064/118] mm/huge_memory.c: reduce critical section protected by split_queue_lock Andrew Morton
                   ` (54 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, kirill.shutemov, linux-mm, mm-commits, richardw.yang, torvalds

From: Wei Yang <richardw.yang@linux.intel.com>
Subject: mm/huge_memory.c: use head to emphasize the purpose of page

While splitting a huge page, split_huge_page_to_list() checks several
properties of the page.  Currently the checks are done on both page and
head, without making it clear that they apply to the compound page.  If
the page passed to split_huge_page_to_list() is a tail page, readers need
some time to work out whether a check applies to the compound page or to
the tail page itself.

To make this explicit, use head instead of page for those checks.  After
this change it is clearer that the checks are on the compound page, while
page is used to do the split and to dump an error message on failure.

Link: http://lkml.kernel.org/r/20200110032610.26499-2-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/huge_memory.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

--- a/mm/huge_memory.c~mm-huge_memoryc-use-head-to-emphasize-the-purpose-of-page
+++ a/mm/huge_memory.c
@@ -2704,7 +2704,7 @@ int split_huge_page_to_list(struct page
 {
 	struct page *head = compound_head(page);
 	struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
-	struct deferred_split *ds_queue = get_deferred_split_queue(page);
+	struct deferred_split *ds_queue = get_deferred_split_queue(head);
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
 	int count, mapcount, extra_pins, ret;
@@ -2713,10 +2713,10 @@ int split_huge_page_to_list(struct page
 	pgoff_t end;
 
 	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
-	VM_BUG_ON_PAGE(!PageLocked(page), page);
-	VM_BUG_ON_PAGE(!PageCompound(page), page);
+	VM_BUG_ON_PAGE(!PageLocked(head), head);
+	VM_BUG_ON_PAGE(!PageCompound(head), head);
 
-	if (PageWriteback(page))
+	if (PageWriteback(head))
 		return -EBUSY;
 
 	if (PageAnon(head)) {
@@ -2767,7 +2767,7 @@ int split_huge_page_to_list(struct page
 		goto out_unlock;
 	}
 
-	mlocked = PageMlocked(page);
+	mlocked = PageMlocked(head);
 	unmap_page(head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
@@ -2800,10 +2800,10 @@ int split_huge_page_to_list(struct page
 			list_del(page_deferred_list(head));
 		}
 		if (mapping) {
-			if (PageSwapBacked(page))
-				__dec_node_page_state(page, NR_SHMEM_THPS);
+			if (PageSwapBacked(head))
+				__dec_node_page_state(head, NR_SHMEM_THPS);
 			else
-				__dec_node_page_state(page, NR_FILE_THPS);
+				__dec_node_page_state(head, NR_FILE_THPS);
 		}
 
 		spin_unlock(&ds_queue->split_queue_lock);
_

* [patch 064/118] mm/huge_memory.c: reduce critical section protected by split_queue_lock
  2020-01-31  6:10 incoming Andrew Morton
                   ` (62 preceding siblings ...)
  2020-01-31  6:14 ` [patch 063/118] mm/huge_memory.c: use head to emphasize the purpose of page Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 065/118] mm/migrate: remove useless mask of start address Andrew Morton
                   ` (53 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, kirill.shutemov, linux-mm, mm-commits, richardw.yang, torvalds

From: Wei Yang <richardw.yang@linux.intel.com>
Subject: mm/huge_memory.c: reduce critical section protected by split_queue_lock

split_queue_lock protects data in struct deferred_split.  We can release
the lock after deleting the page from deferred_split_queue.

This patch moves the THP accounting, which was introduced in commit
65c453778aea ("mm, rmap: account shmem thp pages"), out of the lock
protection.

Link: http://lkml.kernel.org/r/20200110025516.23996-1-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/huge_memory.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/huge_memory.c~mm-huge_memoryc-reduce-critical-section-protected-by-split_queue_lock
+++ a/mm/huge_memory.c
@@ -2799,6 +2799,7 @@ int split_huge_page_to_list(struct page
 			ds_queue->split_queue_len--;
 			list_del(page_deferred_list(head));
 		}
+		spin_unlock(&ds_queue->split_queue_lock);
 		if (mapping) {
 			if (PageSwapBacked(head))
 				__dec_node_page_state(head, NR_SHMEM_THPS);
@@ -2806,7 +2807,6 @@ int split_huge_page_to_list(struct page
 				__dec_node_page_state(head, NR_FILE_THPS);
 		}
 
-		spin_unlock(&ds_queue->split_queue_lock);
 		__split_huge_page(page, list, end, flags);
 		if (PageSwapCache(head)) {
 			swp_entry_t entry = { .val = page_private(head) };
_

* [patch 065/118] mm/migrate: remove useless mask of start address
  2020-01-31  6:10 incoming Andrew Morton
                   ` (63 preceding siblings ...)
  2020-01-31  6:14 ` [patch 064/118] mm/huge_memory.c: reduce critical section protected by split_queue_lock Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 066/118] mm/migrate: clean up some minor coding style Andrew Morton
                   ` (52 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, bharata, chris, hch, jgg, jglisse, jhubbard, linux-mm,
	mhocko, mm-commits, rcampbell, torvalds

From: Ralph Campbell <rcampbell@nvidia.com>
Subject: mm/migrate: remove useless mask of start address

Addresses passed to walk_page_range() callback functions are already page
aligned and don't need to be masked with PAGE_MASK.

Link: http://lkml.kernel.org/r/20200107211208.24595-2-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/migrate.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/migrate.c~mm-migrate-remove-useless-mask-of-start-address
+++ a/mm/migrate.c
@@ -2156,7 +2156,7 @@ static int migrate_vma_collect_hole(unsi
 	struct migrate_vma *migrate = walk->private;
 	unsigned long addr;
 
-	for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) {
+	for (addr = start; addr < end; addr += PAGE_SIZE) {
 		migrate->src[migrate->npages] = MIGRATE_PFN_MIGRATE;
 		migrate->dst[migrate->npages] = 0;
 		migrate->npages++;
@@ -2173,7 +2173,7 @@ static int migrate_vma_collect_skip(unsi
 	struct migrate_vma *migrate = walk->private;
 	unsigned long addr;
 
-	for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) {
+	for (addr = start; addr < end; addr += PAGE_SIZE) {
 		migrate->dst[migrate->npages] = 0;
 		migrate->src[migrate->npages++] = 0;
 	}
_


* [patch 066/118] mm/migrate: clean up some minor coding style
  2020-01-31  6:10 incoming Andrew Morton
                   ` (64 preceding siblings ...)
  2020-01-31  6:14 ` [patch 065/118] mm/migrate: remove useless mask of start address Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 067/118] mm/migrate: add stable check in migrate_vma_insert_page() Andrew Morton
                   ` (51 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, bharata, chris, hch, jgg, jglisse, jhubbard, linux-mm,
	mhocko, mm-commits, rcampbell, torvalds

From: Ralph Campbell <rcampbell@nvidia.com>
Subject: mm/migrate: clean up some minor coding style

Fix some comment typos and clean up coding style in preparation for the
next patch.  No functional changes.
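
The main style change is to funnel the repeated "unlock, cancel charge,
abort" sequences through a single label.  A minimal userspace sketch of that
pattern follows; struct demo_state and demo_insert() are hypothetical, with
plain counters standing in for the PTE lock and the memcg charge.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_state {
	int lock_depth;		/* stands in for the PTE lock */
	int charges;		/* stands in for the memcg charge */
	bool migrated;
};

static void demo_insert(struct demo_state *s, bool present, bool is_zero,
			bool uffd_missing)
{
	assert(s);
	s->charges++;		/* charge taken before the lock */
	s->lock_depth++;	/* take the lock */

	if (present && !is_zero)
		goto unlock_abort;
	if (uffd_missing)
		goto unlock_abort;

	s->lock_depth--;	/* success: unlock, keep the charge */
	s->migrated = true;
	return;

unlock_abort:
	/* One shared exit path undoes the lock and the charge. */
	s->lock_depth--;
	s->charges--;
	s->migrated = false;
}
```

Every failure path leaves the counters balanced, which is the property the
duplicated cleanup blocks previously had to maintain by hand.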

Link: http://lkml.kernel.org/r/20200107211208.24595-3-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Acked-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/migrate.c |   34 +++++++++++++---------------------
 1 file changed, 13 insertions(+), 21 deletions(-)

--- a/mm/migrate.c~mm-migrate-clean-up-some-minor-coding-style
+++ a/mm/migrate.c
@@ -986,7 +986,7 @@ static int move_to_new_page(struct page
 		}
 
 		/*
-		 * Anonymous and movable page->mapping will be cleard by
+		 * Anonymous and movable page->mapping will be cleared by
 		 * free_pages_prepare so don't reset it here for keeping
 		 * the type to work PageAnon, for example.
 		 */
@@ -1199,8 +1199,7 @@ out:
 		/*
 		 * A page that has been migrated has all references
 		 * removed and will be freed. A page that has not been
-		 * migrated will have kepts its references and be
-		 * restored.
+		 * migrated will have kept its references and be restored.
 		 */
 		list_del(&page->lru);
 
@@ -2779,27 +2778,18 @@ static void migrate_vma_insert_page(stru
 	if (pte_present(*ptep)) {
 		unsigned long pfn = pte_pfn(*ptep);
 
-		if (!is_zero_pfn(pfn)) {
-			pte_unmap_unlock(ptep, ptl);
-			mem_cgroup_cancel_charge(page, memcg, false);
-			goto abort;
-		}
+		if (!is_zero_pfn(pfn))
+			goto unlock_abort;
 		flush = true;
-	} else if (!pte_none(*ptep)) {
-		pte_unmap_unlock(ptep, ptl);
-		mem_cgroup_cancel_charge(page, memcg, false);
-		goto abort;
-	}
+	} else if (!pte_none(*ptep))
+		goto unlock_abort;
 
 	/*
-	 * Check for usefaultfd but do not deliver the fault. Instead,
+	 * Check for userfaultfd but do not deliver the fault. Instead,
 	 * just back off.
 	 */
-	if (userfaultfd_missing(vma)) {
-		pte_unmap_unlock(ptep, ptl);
-		mem_cgroup_cancel_charge(page, memcg, false);
-		goto abort;
-	}
+	if (userfaultfd_missing(vma))
+		goto unlock_abort;
 
 	inc_mm_counter(mm, MM_ANONPAGES);
 	page_add_new_anon_rmap(page, vma, addr, false);
@@ -2823,6 +2813,9 @@ static void migrate_vma_insert_page(stru
 	*src = MIGRATE_PFN_MIGRATE;
 	return;
 
+unlock_abort:
+	pte_unmap_unlock(ptep, ptl);
+	mem_cgroup_cancel_charge(page, memcg, false);
 abort:
 	*src &= ~MIGRATE_PFN_MIGRATE;
 }
@@ -2855,9 +2848,8 @@ void migrate_vma_pages(struct migrate_vm
 		}
 
 		if (!page) {
-			if (!(migrate->src[i] & MIGRATE_PFN_MIGRATE)) {
+			if (!(migrate->src[i] & MIGRATE_PFN_MIGRATE))
 				continue;
-			}
 			if (!notified) {
 				notified = true;
 
_


* [patch 067/118] mm/migrate: add stable check in migrate_vma_insert_page()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (65 preceding siblings ...)
  2020-01-31  6:14 ` [patch 066/118] mm/migrate: clean up some minor coding style Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 068/118] mm, thp: fix defrag setting if newline is not used Andrew Morton
                   ` (50 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, bharata, chris, hch, jgg, jglisse, jhubbard, linux-mm,
	mhocko, mm-commits, rcampbell, torvalds

From: Ralph Campbell <rcampbell@nvidia.com>
Subject: mm/migrate: add stable check in migrate_vma_insert_page()

migrate_vma_insert_page() closely follows the code in:
  __handle_mm_fault()
    handle_pte_fault()
      do_anonymous_page()

Add a call to check_stable_address_space() after locking the page table
entry and before inserting a ZONE_DEVICE private zero page mapping, similar
to page faulting a new anonymous page.

Link: http://lkml.kernel.org/r/20200107211208.24595-4-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/migrate.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/mm/migrate.c~mm-migrate-add-stable-check-in-migrate_vma_insert_page
+++ a/mm/migrate.c
@@ -48,6 +48,7 @@
 #include <linux/page_owner.h>
 #include <linux/sched/mm.h>
 #include <linux/ptrace.h>
+#include <linux/oom.h>
 
 #include <asm/tlbflush.h>
 
@@ -2695,6 +2696,14 @@ int migrate_vma_setup(struct migrate_vma
 }
 EXPORT_SYMBOL(migrate_vma_setup);
 
+/*
+ * This code closely matches the code in:
+ *   __handle_mm_fault()
+ *     handle_pte_fault()
+ *       do_anonymous_page()
+ * to map in an anonymous zero page but the struct page will be a ZONE_DEVICE
+ * private page.
+ */
 static void migrate_vma_insert_page(struct migrate_vma *migrate,
 				    unsigned long addr,
 				    struct page *page,
@@ -2775,6 +2784,9 @@ static void migrate_vma_insert_page(stru
 
 	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
 
+	if (check_stable_address_space(mm))
+		goto unlock_abort;
+
 	if (pte_present(*ptep)) {
 		unsigned long pfn = pte_pfn(*ptep);
 
_


* [patch 068/118] mm, thp: fix defrag setting if newline is not used
  2020-01-31  6:10 incoming Andrew Morton
                   ` (66 preceding siblings ...)
  2020-01-31  6:14 ` [patch 067/118] mm/migrate: add stable check in migrate_vma_insert_page() Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 069/118] mm/mmap.c: get rid of odd jump labels in find_mergeable_anon_vma() Andrew Morton
                   ` (49 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, linux-mm, mgorman, mm-commits, rientjes, torvalds, vbabka

From: David Rientjes <rientjes@google.com>
Subject: mm, thp: fix defrag setting if newline is not used

If thp defrag setting "defer" is used and a newline is *not* used when
writing to the sysfs file, this is interpreted as the "defer+madvise"
option.

This is because we do prefix matching and if five characters are written
without a newline, the current code ends up comparing to the first five
bytes of the "defer+madvise" option and using that instead.

Use the more appropriate sysfs_streq() that handles the trailing newline
for us.  Since this doubles as a nice cleanup, do it in enabled_store() as
well.


The current implementation relies on prefix matching: the number of
bytes compared is either the number of bytes written or the length of
the option being compared.  With a newline, "defer\n" does not match
"defer+madvise"; without a newline, however, "defer" is considered to
match "defer+madvise" (prefix matching is only comparing the first five
bytes).  End result is that writing "defer" is broken unless it has an
additional trailing character.

This means that writing "madv" in the past would match and set
"madvise".  With strict checking, that no longer is the case but it is
unlikely anybody is currently doing this.
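
The difference can be reproduced in userspace.  The sketch below is an
illustration under stated assumptions, not kernel code: prefix_match()
mimics the old memcmp()/min() comparison, streq_nl() is a hand-written
approximation of what sysfs_streq() does (equality that tolerates one
trailing newline), and the *_defrag_choice() helpers are hypothetical
wrappers that walk the options in the same order as defrag_store().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static const char *const demo_opts[] = {
	"always", "defer+madvise", "defer", "madvise", "never",
};
#define DEMO_NR_OPTS (sizeof(demo_opts) / sizeof(demo_opts[0]))

/* Old behaviour: compare only min(strlen(opt), count) bytes. */
static bool prefix_match(const char *opt, const char *buf, size_t count)
{
	size_t n = strlen(opt);

	assert(buf);
	if (count < n)
		n = count;
	return memcmp(opt, buf, n) == 0;
}

/* Approximation of sysfs_streq(): equal, ignoring one trailing newline. */
static bool streq_nl(const char *s1, const char *s2)
{
	while (*s1 && *s1 == *s2) {
		s1++;
		s2++;
	}
	if (*s1 == *s2)
		return true;
	if (!*s1 && *s2 == '\n' && !s2[1])
		return true;
	if (*s1 == '\n' && !s1[1] && !*s2)
		return true;
	return false;
}

/* First option accepted by the old prefix comparison, or NULL. */
static const char *old_defrag_choice(const char *buf, size_t count)
{
	for (size_t i = 0; i < DEMO_NR_OPTS; i++)
		if (prefix_match(demo_opts[i], buf, count))
			return demo_opts[i];
	return NULL;
}

/* First option accepted by the strict comparison, or NULL. */
static const char *new_defrag_choice(const char *buf)
{
	for (size_t i = 0; i < DEMO_NR_OPTS; i++)
		if (streq_nl(buf, demo_opts[i]))
			return demo_opts[i];
	return NULL;
}
```

With these helpers, old_defrag_choice("defer", 5) picks "defer+madvise"
while new_defrag_choice("defer") picks "defer", and the abbreviated "madv"
is accepted only by the old comparison.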

Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2001171411020.56385@chino.kir.corp.google.com
Fixes: 21440d7eb904 ("mm, thp: add new defer+madvise defrag option")
Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/huge_memory.c |   24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

--- a/mm/huge_memory.c~mm-thp-fix-defrag-setting-if-newline-is-not-used
+++ a/mm/huge_memory.c
@@ -177,16 +177,13 @@ static ssize_t enabled_store(struct kobj
 {
 	ssize_t ret = count;
 
-	if (!memcmp("always", buf,
-		    min(sizeof("always")-1, count))) {
+	if (sysfs_streq(buf, "always")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
-	} else if (!memcmp("madvise", buf,
-			   min(sizeof("madvise")-1, count))) {
+	} else if (sysfs_streq(buf, "madvise")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
-	} else if (!memcmp("never", buf,
-			   min(sizeof("never")-1, count))) {
+	} else if (sysfs_streq(buf, "never")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
 	} else
@@ -250,32 +247,27 @@ static ssize_t defrag_store(struct kobje
 			    struct kobj_attribute *attr,
 			    const char *buf, size_t count)
 {
-	if (!memcmp("always", buf,
-		    min(sizeof("always")-1, count))) {
+	if (sysfs_streq(buf, "always")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
-	} else if (!memcmp("defer+madvise", buf,
-		    min(sizeof("defer+madvise")-1, count))) {
+	} else if (sysfs_streq(buf, "defer+madvise")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
-	} else if (!memcmp("defer", buf,
-		    min(sizeof("defer")-1, count))) {
+	} else if (sysfs_streq(buf, "defer")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
-	} else if (!memcmp("madvise", buf,
-			   min(sizeof("madvise")-1, count))) {
+	} else if (sysfs_streq(buf, "madvise")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags);
-	} else if (!memcmp("never", buf,
-			   min(sizeof("never")-1, count))) {
+	} else if (sysfs_streq(buf, "never")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
_


* [patch 069/118] mm/mmap.c: get rid of odd jump labels in find_mergeable_anon_vma()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (67 preceding siblings ...)
  2020-01-31  6:14 ` [patch 068/118] mm, thp: fix defrag setting if newline is not used Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 070/118] mm/memory_hotplug: pass in nid to online_pages() Andrew Morton
                   ` (48 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, david, jhubbard, linmiaohe, linux-mm, mm-commits,
	richardw.yang, rientjes, torvalds

From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/mmap.c: get rid of odd jump labels in find_mergeable_anon_vma()

The jump labels try_prev and none are not really needed in
find_mergeable_anon_vma(); eliminate them to improve readability.
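
The resulting control flow, "try the next neighbour, then the previous one,
else NULL", can be sketched in userspace as below.  struct demo_vma,
demo_reusable() and demo_find_mergeable() are hypothetical stand-ins; the
real reusable_anon_vma() performs compatibility checks that are reduced
here to returning the neighbour's pointer.

```c
#include <assert.h>
#include <stddef.h>

struct demo_vma {
	struct demo_vma *next, *prev;
	void *anon;	/* stand-in for an anon_vma pointer, NULL if none */
};

/* Hypothetical stand-in for reusable_anon_vma(): no compatibility checks. */
static void *demo_reusable(const struct demo_vma *neighbour)
{
	return neighbour->anon;
}

static void *demo_find_mergeable(const struct demo_vma *vma)
{
	void *anon = NULL;

	assert(vma);

	/* Try the next neighbour first. */
	if (vma->next) {
		anon = demo_reusable(vma->next);
		if (anon)
			return anon;
	}

	/* Then try the previous neighbour. */
	if (vma->prev)
		anon = demo_reusable(vma->prev);

	/* May still be NULL if neither neighbour had anything reusable. */
	return anon;
}
```

Initialising the result to NULL is what lets the single final return cover
both the "found in prev" and "found nothing" cases without a label.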

Link: http://lkml.kernel.org/r/1574079844-17493-1-git-send-email-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mmap.c |   30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

--- a/mm/mmap.c~mm-get-rid-of-odd-jump-labels-in-find_mergeable_anon_vma
+++ a/mm/mmap.c
@@ -1270,26 +1270,22 @@ static struct anon_vma *reusable_anon_vm
  */
 struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *vma)
 {
-	struct anon_vma *anon_vma;
-	struct vm_area_struct *near;
+	struct anon_vma *anon_vma = NULL;
 
-	near = vma->vm_next;
-	if (!near)
-		goto try_prev;
+	/* Try next first. */
+	if (vma->vm_next) {
+		anon_vma = reusable_anon_vma(vma->vm_next, vma, vma->vm_next);
+		if (anon_vma)
+			return anon_vma;
+	}
 
-	anon_vma = reusable_anon_vma(near, vma, near);
-	if (anon_vma)
-		return anon_vma;
-try_prev:
-	near = vma->vm_prev;
-	if (!near)
-		goto none;
+	/* Try prev next. */
+	if (vma->vm_prev)
+		anon_vma = reusable_anon_vma(vma->vm_prev, vma->vm_prev, vma);
 
-	anon_vma = reusable_anon_vma(near, near, vma);
-	if (anon_vma)
-		return anon_vma;
-none:
 	/*
+	 * We might reach here with anon_vma == NULL if we can't find
+	 * any reusable anon_vma.
 	 * There's no absolute need to look only at touching neighbours:
 	 * we could search further afield for "compatible" anon_vmas.
 	 * But it would probably just be a waste of time searching,
@@ -1297,7 +1293,7 @@ none:
 	 * We're trying to allow mprotect remerging later on,
 	 * not trying to minimize memory used for anon_vmas.
 	 */
-	return NULL;
+	return anon_vma;
 }
 
 /*
_


* [patch 070/118] mm/memory_hotplug: pass in nid to online_pages()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (68 preceding siblings ...)
  2020-01-31  6:14 ` [patch 069/118] mm/mmap.c: get rid of odd jump labels in find_mergeable_anon_vma() Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:14 ` [patch 071/118] mm/hotplug: silence a lockdep splat with printk() Andrew Morton
                   ` (47 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, anshuman.khandual, dan.j.williams, david, gregkh, linux-mm,
	mhocko, mm-commits, osalvador, pasha.tatashin, rafael, torvalds

From: David Hildenbrand <david@redhat.com>
Subject: mm/memory_hotplug: pass in nid to online_pages()

Patch series "mm/memory_hotplug: pass in nid to online_pages()".

Simplify onlining code and get rid of find_memory_block(). Pass in the
nid from the memory block we are trying to online directly, instead of
manually looking it up.


This patch (of 2):

There is no need to look up the memory block; we can pass in the nid directly.

Link: http://lkml.kernel.org/r/20200113113354.6341-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/base/memory.c          |    6 +++---
 include/linux/memory_hotplug.h |    3 ++-
 mm/memory_hotplug.c            |   13 ++-----------
 3 files changed, 7 insertions(+), 15 deletions(-)

--- a/drivers/base/memory.c~mm-memory_hotplug-pass-in-nid-to-online_pages
+++ a/drivers/base/memory.c
@@ -206,7 +206,7 @@ static bool pages_correctly_probed(unsig
  */
 static int
 memory_block_action(unsigned long start_section_nr, unsigned long action,
-		    int online_type)
+		    int online_type, int nid)
 {
 	unsigned long start_pfn;
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
@@ -219,7 +219,7 @@ memory_block_action(unsigned long start_
 		if (!pages_correctly_probed(start_pfn))
 			return -EBUSY;
 
-		ret = online_pages(start_pfn, nr_pages, online_type);
+		ret = online_pages(start_pfn, nr_pages, online_type, nid);
 		break;
 	case MEM_OFFLINE:
 		ret = offline_pages(start_pfn, nr_pages);
@@ -245,7 +245,7 @@ static int memory_block_change_state(str
 		mem->state = MEM_GOING_OFFLINE;
 
 	ret = memory_block_action(mem->start_section_nr, to_state,
-				mem->online_type);
+				  mem->online_type, mem->nid);
 
 	mem->state = ret ? from_state_req : to_state;
 
--- a/include/linux/memory_hotplug.h~mm-memory_hotplug-pass-in-nid-to-online_pages
+++ a/include/linux/memory_hotplug.h
@@ -94,7 +94,8 @@ extern int zone_grow_free_lists(struct z
 extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages);
 extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 /* VM interface that may be used by firmware interface */
-extern int online_pages(unsigned long, unsigned long, int);
+extern int online_pages(unsigned long pfn, unsigned long nr_pages,
+			int online_type, int nid);
 extern int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn,
 	unsigned long *valid_start, unsigned long *valid_end);
 extern unsigned long __offline_isolated_pages(unsigned long start_pfn,
--- a/mm/memory_hotplug.c~mm-memory_hotplug-pass-in-nid-to-online_pages
+++ a/mm/memory_hotplug.c
@@ -783,27 +783,18 @@ struct zone * zone_for_pfn_range(int onl
 	return default_zone_for_pfn(nid, start_pfn, nr_pages);
 }
 
-int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)
+int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
+		       int online_type, int nid)
 {
 	unsigned long flags;
 	unsigned long onlined_pages = 0;
 	struct zone *zone;
 	int need_zonelists_rebuild = 0;
-	int nid;
 	int ret;
 	struct memory_notify arg;
-	struct memory_block *mem;
 
 	mem_hotplug_begin();
 
-	/*
-	 * We can't use pfn_to_nid() because nid might be stored in struct page
-	 * which is not yet initialized. Instead, we find nid from memory block.
-	 */
-	mem = find_memory_block(__pfn_to_section(pfn));
-	nid = mem->nid;
-	put_device(&mem->dev);


* [patch 071/118] mm/hotplug: silence a lockdep splat with printk()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (69 preceding siblings ...)
  2020-01-31  6:14 ` [patch 070/118] mm/memory_hotplug: pass in nid to online_pages() Andrew Morton
@ 2020-01-31  6:14 ` Andrew Morton
  2020-01-31  6:15 ` [patch 072/118] mm/page_isolation: fix potential warning from user Andrew Morton
                   ` (46 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:14 UTC (permalink / raw)
  To: akpm, cai, david, linux-mm, mhocko, mm-commits, peterz, pmladek,
	rostedt, sergey.senozhatsky.work, torvalds

From: Qian Cai <cai@lca.pw>
Subject: mm/hotplug: silence a lockdep splat with printk()

It is not that hard to trigger lockdep splats by calling printk from under
zone->lock.  Most of them are false positives caused by lock chains
introduced early in the boot process and they do not cause any real
problems (although most of the early boot lock dependencies could happen
after boot as well).  There are some console drivers which do allocate
from the printk context as well and those should be fixed.  In any case,
false positives are not that trivial to work around and it is far from
optimal to lose lockdep functionality for something that is a non-issue.

So change has_unmovable_pages() so that it no longer calls dump_page()
itself - instead it returns a "struct page *" of the unmovable page back
to the caller so that in the case of a has_unmovable_pages() failure, the
caller can call dump_page() after releasing zone->lock.  Also, make
dump_page() able to report a CMA page as well, so the reason string
from has_unmovable_pages() can be removed.

Even though has_unmovable_pages doesn't hold any reference to the returned
page this should be reasonably safe for the purpose of reporting the page
(dump_page) because it cannot be hotremoved in the context of memory
unplug.  The state of the page might change but that is the case even with
the existing code as zone->lock only plays a role for free pages.
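
The "find under the lock, report after unlocking" shape can be sketched in
userspace as follows.  This is an illustration only: all demo_* names are
hypothetical, a boolean stands in for zone->lock, and demo_report() merely
counts calls where the kernel would call dump_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_page {
	bool movable;
};

struct demo_zone {
	bool locked;		/* stand-in for zone->lock */
	int reports_locked;	/* reports issued while the lock was held */
	int reports_unlocked;	/* reports issued after unlocking */
};

/* Called with the lock held: only locate the offender, do not print. */
static struct demo_page *demo_find_unmovable(struct demo_zone *z,
					     struct demo_page *pages, size_t n)
{
	assert(z->locked);
	for (size_t i = 0; i < n; i++)
		if (!pages[i].movable)
			return &pages[i];
	return NULL;
}

/* Stand-in for dump_page(): records whether the lock was held. */
static void demo_report(struct demo_zone *z, const struct demo_page *page)
{
	(void)page;
	if (z->locked)
		z->reports_locked++;
	else
		z->reports_unlocked++;
}

/* Returns 0 on success, -1 if an unmovable page was found. */
static int demo_isolate(struct demo_zone *z, struct demo_page *pages,
			size_t n, bool report_failure)
{
	struct demo_page *unmovable;

	z->locked = true;
	unmovable = demo_find_unmovable(z, pages, n);
	z->locked = false;

	/* Reporting is deferred until the lock has been dropped. */
	if (unmovable && report_failure)
		demo_report(z, unmovable);

	return unmovable ? -1 : 0;
}
```

Returning the pointer rather than a bool is what allows the caller to choose
when to report, at the cost of the unreferenced-page caveat described above.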

While at it, remove a similar but unnecessary debug-only printk() as well.
A sample of one of those lockdep splats is,

WARNING: possible circular locking dependency detected
------------------------------------------------------
test.sh/8653 is trying to acquire lock:
ffffffff865a4460 (console_owner){-.-.}, at:
console_unlock+0x207/0x750

but task is already holding lock:
ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
__offline_isolated_pages+0x179/0x3e0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&(&zone->lock)->rlock){-.-.}:
       __lock_acquire+0x5b3/0xb40
       lock_acquire+0x126/0x280
       _raw_spin_lock+0x2f/0x40
       rmqueue_bulk.constprop.21+0xb6/0x1160
       get_page_from_freelist+0x898/0x22c0
       __alloc_pages_nodemask+0x2f3/0x1cd0
       alloc_pages_current+0x9c/0x110
       allocate_slab+0x4c6/0x19c0
       new_slab+0x46/0x70
       ___slab_alloc+0x58b/0x960
       __slab_alloc+0x43/0x70
       __kmalloc+0x3ad/0x4b0
       __tty_buffer_request_room+0x100/0x250
       tty_insert_flip_string_fixed_flag+0x67/0x110
       pty_write+0xa2/0xf0
       n_tty_write+0x36b/0x7b0
       tty_write+0x284/0x4c0
       __vfs_write+0x50/0xa0
       vfs_write+0x105/0x290
       redirected_tty_write+0x6a/0xc0
       do_iter_write+0x248/0x2a0
       vfs_writev+0x106/0x1e0
       do_writev+0xd4/0x180
       __x64_sys_writev+0x45/0x50
       do_syscall_64+0xcc/0x76c
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #2 (&(&port->lock)->rlock){-.-.}:
       __lock_acquire+0x5b3/0xb40
       lock_acquire+0x126/0x280
       _raw_spin_lock_irqsave+0x3a/0x50
       tty_port_tty_get+0x20/0x60
       tty_port_default_wakeup+0xf/0x30
       tty_port_tty_wakeup+0x39/0x40
       uart_write_wakeup+0x2a/0x40
       serial8250_tx_chars+0x22e/0x440
       serial8250_handle_irq.part.8+0x14a/0x170
       serial8250_default_handle_irq+0x5c/0x90
       serial8250_interrupt+0xa6/0x130
       __handle_irq_event_percpu+0x78/0x4f0
       handle_irq_event_percpu+0x70/0x100
       handle_irq_event+0x5a/0x8b
       handle_edge_irq+0x117/0x370
       do_IRQ+0x9e/0x1e0
       ret_from_intr+0x0/0x2a
       cpuidle_enter_state+0x156/0x8e0
       cpuidle_enter+0x41/0x70
       call_cpuidle+0x5e/0x90
       do_idle+0x333/0x370
       cpu_startup_entry+0x1d/0x1f
       start_secondary+0x290/0x330
       secondary_startup_64+0xb6/0xc0

-> #1 (&port_lock_key){-.-.}:
       __lock_acquire+0x5b3/0xb40
       lock_acquire+0x126/0x280
       _raw_spin_lock_irqsave+0x3a/0x50
       serial8250_console_write+0x3e4/0x450
       univ8250_console_write+0x4b/0x60
       console_unlock+0x501/0x750
       vprintk_emit+0x10d/0x340
       vprintk_default+0x1f/0x30
       vprintk_func+0x44/0xd4
       printk+0x9f/0xc5

-> #0 (console_owner){-.-.}:
       check_prev_add+0x107/0xea0
       validate_chain+0x8fc/0x1200
       __lock_acquire+0x5b3/0xb40
       lock_acquire+0x126/0x280
       console_unlock+0x269/0x750
       vprintk_emit+0x10d/0x340
       vprintk_default+0x1f/0x30
       vprintk_func+0x44/0xd4
       printk+0x9f/0xc5
       __offline_isolated_pages.cold.52+0x2f/0x30a
       offline_isolated_pages_cb+0x17/0x30
       walk_system_ram_range+0xda/0x160
       __offline_pages+0x79c/0xa10
       offline_pages+0x11/0x20
       memory_subsys_offline+0x7e/0xc0
       device_offline+0xd5/0x110
       state_store+0xc6/0xe0
       dev_attr_store+0x3f/0x60
       sysfs_kf_write+0x89/0xb0
       kernfs_fop_write+0x188/0x240
       __vfs_write+0x50/0xa0
       vfs_write+0x105/0x290
       ksys_write+0xc6/0x160
       __x64_sys_write+0x43/0x50
       do_syscall_64+0xcc/0x76c
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

Chain exists of:
  console_owner --> &(&port->lock)->rlock --> &(&zone->lock)->rlock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&zone->lock)->rlock);
                               lock(&(&port->lock)->rlock);
                               lock(&(&zone->lock)->rlock);
  lock(console_owner);

 *** DEADLOCK ***

9 locks held by test.sh/8653:
 #0: ffff88839ba7d408 (sb_writers#4){.+.+}, at:
vfs_write+0x25f/0x290
 #1: ffff888277618880 (&of->mutex){+.+.}, at:
kernfs_fop_write+0x128/0x240
 #2: ffff8898131fc218 (kn->count#115){.+.+}, at:
kernfs_fop_write+0x138/0x240
 #3: ffffffff86962a80 (device_hotplug_lock){+.+.}, at:
lock_device_hotplug_sysfs+0x16/0x50
 #4: ffff8884374f4990 (&dev->mutex){....}, at:
device_offline+0x70/0x110
 #5: ffffffff86515250 (cpu_hotplug_lock.rw_sem){++++}, at:
__offline_pages+0xbf/0xa10
 #6: ffffffff867405f0 (mem_hotplug_lock.rw_sem){++++}, at:
percpu_down_write+0x87/0x2f0
 #7: ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
__offline_isolated_pages+0x179/0x3e0
 #8: ffffffff865a4920 (console_lock){+.+.}, at:
vprintk_emit+0x100/0x340

stack backtrace:
Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10,
BIOS U34 05/21/2019
Call Trace:
 dump_stack+0x86/0xca
 print_circular_bug.cold.31+0x243/0x26e
 check_noncircular+0x29e/0x2e0
 check_prev_add+0x107/0xea0
 validate_chain+0x8fc/0x1200
 __lock_acquire+0x5b3/0xb40
 lock_acquire+0x126/0x280
 console_unlock+0x269/0x750
 vprintk_emit+0x10d/0x340
 vprintk_default+0x1f/0x30
 vprintk_func+0x44/0xd4
 printk+0x9f/0xc5
 __offline_isolated_pages.cold.52+0x2f/0x30a
 offline_isolated_pages_cb+0x17/0x30
 walk_system_ram_range+0xda/0x160
 __offline_pages+0x79c/0xa10
 offline_pages+0x11/0x20
 memory_subsys_offline+0x7e/0xc0
 device_offline+0xd5/0x110
 state_store+0xc6/0xe0
 dev_attr_store+0x3f/0x60
 sysfs_kf_write+0x89/0xb0
 kernfs_fop_write+0x188/0x240
 __vfs_write+0x50/0xa0
 vfs_write+0x105/0x290
 ksys_write+0xc6/0x160
 __x64_sys_write+0x43/0x50
 do_syscall_64+0xcc/0x76c
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Link: http://lkml.kernel.org/r/20200117181200.20299-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/page-isolation.h |    4 ++--
 mm/debug.c                     |   10 +++++++++-
 mm/page_alloc.c                |   23 ++++++++++-------------
 mm/page_isolation.c            |   11 ++++++++++-
 4 files changed, 31 insertions(+), 17 deletions(-)

--- a/include/linux/page-isolation.h~mm-hotplug-silence-a-lockdep-splat-with-printk
+++ a/include/linux/page-isolation.h
@@ -33,8 +33,8 @@ static inline bool is_migrate_isolate(in
 #define MEMORY_OFFLINE	0x1
 #define REPORT_FAILURE	0x2
 
-bool has_unmovable_pages(struct zone *zone, struct page *page, int migratetype,
-			 int flags);
+struct page *has_unmovable_pages(struct zone *zone, struct page *page,
+				 int migratetype, int flags);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable);
--- a/mm/debug.c~mm-hotplug-silence-a-lockdep-splat-with-printk
+++ a/mm/debug.c
@@ -46,6 +46,13 @@ void __dump_page(struct page *page, cons
 {
 	struct address_space *mapping;
 	bool page_poisoned = PagePoisoned(page);
+	/*
+	 * Accessing the pageblock without the zone lock. It could change to
+	 * "isolate" again in the meantime, but since we are just dumping the
+	 * state for debugging, it should be fine to accept a bit of
+	 * inaccuracy here due to racing.
+	 */
+	bool page_cma = is_migrate_cma_page(page);
 	int mapcount;
 	char *type = "";
 
@@ -92,7 +99,8 @@ void __dump_page(struct page *page, cons
 	}
 	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
 
-	pr_warn("%sflags: %#lx(%pGp)\n", type, page->flags, &page->flags);
+	pr_warn("%sflags: %#lx(%pGp)%s\n", type, page->flags, &page->flags,
+		page_cma ? " CMA" : "");
 
 hex_only:
 	print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
--- a/mm/page_alloc.c~mm-hotplug-silence-a-lockdep-splat-with-printk
+++ a/mm/page_alloc.c
@@ -8185,13 +8185,17 @@ void *__init alloc_large_system_hash(con
  * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
  * check without lock_page also may miss some movable non-lru pages at
  * race condition. So you can't expect this function should be exact.
+ *
+ * Returns a page without holding a reference. If the caller wants to
+ * dereference that page (e.g., dumping), it has to make sure that that it
+ * cannot get removed (e.g., via memory unplug) concurrently.
+ *
  */
-bool has_unmovable_pages(struct zone *zone, struct page *page, int migratetype,
-			 int flags)
+struct page *has_unmovable_pages(struct zone *zone, struct page *page,
+				 int migratetype, int flags)
 {
 	unsigned long iter = 0;
 	unsigned long pfn = page_to_pfn(page);
-	const char *reason = "unmovable page";
 
 	/*
 	 * TODO we could make this much more efficient by not checking every
@@ -8208,9 +8212,8 @@ bool has_unmovable_pages(struct zone *zo
 		 * so consider them movable here.
 		 */
 		if (is_migrate_cma(migratetype))
-			return false;
+			return NULL;
 
-		reason = "CMA page";
 		goto unmovable;
 	}
 
@@ -8285,12 +8288,10 @@ bool has_unmovable_pages(struct zone *zo
 		 */
 		goto unmovable;
 	}
-	return false;
+	return NULL;
 unmovable:
 	WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
-	if (flags & REPORT_FAILURE)
-		dump_page(pfn_to_page(pfn + iter), reason);
-	return true;
+	return pfn_to_page(pfn + iter);
 }
 
 #ifdef CONFIG_CONTIG_ALLOC
@@ -8694,10 +8695,6 @@ __offline_isolated_pages(unsigned long s
 		BUG_ON(!PageBuddy(page));
 		order = page_order(page);
 		offlined_pages += 1 << order;
-#ifdef CONFIG_DEBUG_VM
-		pr_info("remove from free list %lx %d %lx\n",
-			pfn, 1 << order, end_pfn);
-#endif
 		del_page_from_free_area(page, &zone->free_area[order]);
 		pfn += (1 << order);
 	}
--- a/mm/page_isolation.c~mm-hotplug-silence-a-lockdep-splat-with-printk
+++ a/mm/page_isolation.c
@@ -17,6 +17,7 @@
 
 static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 {
+	struct page *unmovable = NULL;
 	struct zone *zone;
 	unsigned long flags;
 	int ret = -EBUSY;
@@ -37,7 +38,8 @@ static int set_migratetype_isolate(struc
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
 	 * We just check MOVABLE pages.
 	 */
-	if (!has_unmovable_pages(zone, page, migratetype, isol_flags)) {
+	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
+	if (!unmovable) {
 		unsigned long nr_pages;
 		int mt = get_pageblock_migratetype(page);
 
@@ -54,6 +56,13 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 	if (!ret)
 		drain_all_pages(zone);
+	else if ((isol_flags & REPORT_FAILURE) && unmovable)
+		/*
+		 * printk() with zone->lock held will guarantee to trigger a
+		 * lockdep splat, so defer it here.
+		 */
+		dump_page(unmovable, "unmovable page");
+
 	return ret;
 }
 
_

* [patch 072/118] mm/page_isolation: fix potential warning from user
  2020-01-31  6:10 incoming Andrew Morton
                   ` (70 preceding siblings ...)
  2020-01-31  6:14 ` [patch 071/118] mm/hotplug: silence a lockdep splat with printk() Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 073/118] mm/zswap.c: add allocation hysteresis if pool limit is hit Andrew Morton
                   ` (45 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akpm, cai, david, linux-mm, mhocko, mhocko, mm-commits, torvalds

From: Qian Cai <cai@lca.pw>
Subject: mm/page_isolation: fix potential warning from user

It makes sense to call the WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE)
from start_isolate_page_range(), but we should avoid triggering it from
userspace, i.e., from is_mem_section_removable(), because a non-root user
could crash the system that way if panic_on_warn is set.

While at it, simplify the code a bit by removing an unnecessary jump
label.

Link: http://lkml.kernel.org/r/20200120163915.1469-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c     |   11 ++++-------
 mm/page_isolation.c |   18 +++++++++++-------
 2 files changed, 15 insertions(+), 14 deletions(-)

--- a/mm/page_alloc.c~mm-page_isolation-fix-potential-warning-from-user
+++ a/mm/page_alloc.c
@@ -8214,7 +8214,7 @@ struct page *has_unmovable_pages(struct
 		if (is_migrate_cma(migratetype))
 			return NULL;
 
-		goto unmovable;
+		return page;
 	}
 
 	for (; iter < pageblock_nr_pages; iter++) {
@@ -8224,7 +8224,7 @@ struct page *has_unmovable_pages(struct
 		page = pfn_to_page(pfn + iter);
 
 		if (PageReserved(page))
-			goto unmovable;
+			return page;
 
 		/*
 		 * If the zone is movable and we have ruled out all reserved
@@ -8244,7 +8244,7 @@ struct page *has_unmovable_pages(struct
 			unsigned int skip_pages;
 
 			if (!hugepage_migration_supported(page_hstate(head)))
-				goto unmovable;
+				return page;
 
 			skip_pages = compound_nr(head) - (page - head);
 			iter += skip_pages - 1;
@@ -8286,12 +8286,9 @@ struct page *has_unmovable_pages(struct
 		 * is set to both of a memory hole page and a _used_ kernel
 		 * page at boot.
 		 */
-		goto unmovable;
+		return page;
 	}
 	return NULL;
-unmovable:
-	WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
-	return pfn_to_page(pfn + iter);
 }
 
 #ifdef CONFIG_CONTIG_ALLOC
--- a/mm/page_isolation.c~mm-page_isolation-fix-potential-warning-from-user
+++ a/mm/page_isolation.c
@@ -54,14 +54,18 @@ static int set_migratetype_isolate(struc
 
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
-	if (!ret)
+	if (!ret) {
 		drain_all_pages(zone);
-	else if ((isol_flags & REPORT_FAILURE) && unmovable)
-		/*
-		 * printk() with zone->lock held will guarantee to trigger a
-		 * lockdep splat, so defer it here.
-		 */
-		dump_page(unmovable, "unmovable page");
+	} else {
+		WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
+
+		if ((isol_flags & REPORT_FAILURE) && unmovable)
+			/*
+			 * printk() with zone->lock held will likely trigger a
+			 * lockdep splat, so defer it here.
+			 */
+			dump_page(unmovable, "unmovable page");
+	}
 
 	return ret;
 }
_

* [patch 073/118] mm/zswap.c: add allocation hysteresis if pool limit is hit
  2020-01-31  6:10 incoming Andrew Morton
                   ` (71 preceding siblings ...)
  2020-01-31  6:15 ` [patch 072/118] mm/page_isolation: fix potential warning from user Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 074/118] zswap: potential NULL dereference on error in init_zswap() Andrew Morton
                   ` (44 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akpm, ddstreet, linux-mm, mm-commits, torvalds, vitaly.wool

From: Vitaly Wool <vitaly.wool@konsulko.com>
Subject: mm/zswap.c: add allocation hysteresis if pool limit is hit

zswap will always try to shrink the pool when it is full.  Under high
pressure this results in pages being flipped in and out of the zswap pool
without any real benefit, and overall system performance drops.  The
previous discussion on this subject [1] ended with a suggestion to
implement a sort of hysteresis: once the limit has been hit, refuse to take
pages into the zswap pool until it has sufficient space again.  This is my
take on this.

Hysteresis is controlled with a sysfs-configurable module parameter (namely,
/sys/module/zswap/parameters/accept_threshold_percent).  It specifies the
threshold, as a percentage of the maximum pool size, at which zswap starts
accepting pages again after it became full.  Setting this parameter to 100
disables the hysteresis and restores the pre-hysteresis behavior.
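
The accept/reject decision described above can be sketched in userspace C
as follows.  This is only an illustration, not the kernel code: struct
pool_state and the *_sketch-style helper names are hypothetical stand-ins
for the zswap globals, all sizes are in pages, and the real store path also
queues shrink work when the pool is full.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the zswap globals; all sizes are in pages. */
struct pool_state {
	unsigned long total_ram_pages;   /* totalram_pages() in the kernel */
	unsigned long pool_pages;        /* current pool size */
	unsigned int max_pool_percent;   /* e.g. 20 */
	unsigned int accept_thr_percent; /* e.g. 90, relative to the max pool size */
	bool reached_full;               /* mirrors zswap_pool_reached_full */
};

/* Same comparison as zswap_is_full(): the pool exceeds its limit. */
static bool pool_is_full(const struct pool_state *s)
{
	return s->total_ram_pages * s->max_pool_percent / 100 < s->pool_pages;
}

/* Same comparison as zswap_can_accept(): the pool is below the threshold. */
static bool pool_can_accept(const struct pool_state *s)
{
	return s->total_ram_pages * s->accept_thr_percent / 100 *
			s->max_pool_percent / 100 > s->pool_pages;
}

/* Returns true if a new page would be taken into the pool. */
static bool store_allowed(struct pool_state *s)
{
	if (pool_is_full(s)) {
		s->reached_full = true;	/* the kernel also queues shrink work */
		return false;
	}

	if (s->reached_full) {
		if (!pool_can_accept(s))
			return false;	/* still above the accept threshold */
		s->reached_full = false;
	}

	return true;
}
```

With 1000 pages of RAM, max_pool_percent = 20 and accept_thr_percent = 90,
the limit is 200 pages and the accept threshold is 180 pages: once the pool
has exceeded 200 pages, stores are refused until it drops below 180.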

[1] https://lkml.org/lkml/2019/11/8/949

Link: http://lkml.kernel.org/r/20200108200118.15563-1-vitaly.wool@konsulko.com
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/vm/zswap.rst |   13 +++++
 mm/zswap.c                 |   85 ++++++++++++++++++++++-------------
 2 files changed, 67 insertions(+), 31 deletions(-)

--- a/Documentation/vm/zswap.rst~zswap-add-allocation-hysteresis-if-pool-limit-is-hit
+++ a/Documentation/vm/zswap.rst
@@ -130,6 +130,19 @@ checking for the same-value filled pages
 existing pages which are marked as same-value filled pages remain stored
 unchanged in zswap until they are either loaded or invalidated.
 
+To prevent zswap from shrinking pool when zswap is full and there's a high
+pressure on swap (this will result in flipping pages in and out zswap pool
+without any real benefit but with a performance drop for the system), a
+special parameter has been introduced to implement a sort of hysteresis to
+refuse taking pages into zswap pool until it has sufficient space if the limit
+has been hit. To set the threshold at which zswap would start accepting pages
+again after it became full, use the sysfs ``accept_threshold_percent``
+attribute, e.g.::
+
+	echo 80 > /sys/module/zswap/parameters/accept_threshold_percent
+
+Setting this parameter to 100 will disable the hysteresis.
+
 A debugfs interface is provided for various statistic about pool size, number
 of pages stored, same-value filled pages and various counters for the reasons
 pages are rejected.
--- a/mm/zswap.c~zswap-add-allocation-hysteresis-if-pool-limit-is-hit
+++ a/mm/zswap.c
@@ -32,6 +32,7 @@
 #include <linux/swapops.h>
 #include <linux/writeback.h>
 #include <linux/pagemap.h>
+#include <linux/workqueue.h>
 
 /*********************************
 * statistics
@@ -65,6 +66,11 @@ static u64 zswap_reject_kmemcache_fail;
 /* Duplicate store was encountered (rare) */
 static u64 zswap_duplicate_entry;
 
+/* Shrinker work queue */
+static struct workqueue_struct *shrink_wq;
+/* Pool limit was hit, we need to calm down */
+static bool zswap_pool_reached_full;
+
 /*********************************
 * tunables
 **********************************/
@@ -109,6 +115,11 @@ module_param_cb(zpool, &zswap_zpool_para
 static unsigned int zswap_max_pool_percent = 20;
 module_param_named(max_pool_percent, zswap_max_pool_percent, uint, 0644);
 
+/* The threshold for accepting new pages after the max_pool_percent was hit */
+static unsigned int zswap_accept_thr_percent = 90; /* of max pool size */
+module_param_named(accept_threshold_percent, zswap_accept_thr_percent,
+		   uint, 0644);
+
 /* Enable/disable handling same-value filled pages (enabled by default) */
 static bool zswap_same_filled_pages_enabled = true;
 module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
@@ -123,7 +134,8 @@ struct zswap_pool {
 	struct crypto_comp * __percpu *tfm;
 	struct kref kref;
 	struct list_head list;
-	struct work_struct work;
+	struct work_struct release_work;
+	struct work_struct shrink_work;
 	struct hlist_node node;
 	char tfm_name[CRYPTO_MAX_ALG_NAME];
 };
@@ -214,6 +226,13 @@ static bool zswap_is_full(void)
 			DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE);
 }
 
+static bool zswap_can_accept(void)
+{
+	return totalram_pages() * zswap_accept_thr_percent / 100 *
+				zswap_max_pool_percent / 100 >
+			DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE);
+}
+
 static void zswap_update_total_size(void)
 {
 	struct zswap_pool *pool;
@@ -501,6 +520,16 @@ static struct zswap_pool *zswap_pool_fin
 	return NULL;
 }
 
+static void shrink_worker(struct work_struct *w)
+{
+	struct zswap_pool *pool = container_of(w, typeof(*pool),
+						shrink_work);
+
+	if (zpool_shrink(pool->zpool, 1, NULL))
+		zswap_reject_reclaim_fail++;
+	zswap_pool_put(pool);
+}
+
 static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
 {
 	struct zswap_pool *pool;
@@ -551,6 +580,7 @@ static struct zswap_pool *zswap_pool_cre
 	 */
 	kref_init(&pool->kref);
 	INIT_LIST_HEAD(&pool->list);
+	INIT_WORK(&pool->shrink_work, shrink_worker);
 
 	zswap_pool_debug("created", pool);
 
@@ -624,7 +654,8 @@ static int __must_check zswap_pool_get(s
 
 static void __zswap_pool_release(struct work_struct *work)
 {
-	struct zswap_pool *pool = container_of(work, typeof(*pool), work);
+	struct zswap_pool *pool = container_of(work, typeof(*pool),
+						release_work);
 
 	synchronize_rcu();
 
@@ -647,8 +678,8 @@ static void __zswap_pool_empty(struct kr
 
 	list_del_rcu(&pool->list);
 
-	INIT_WORK(&pool->work, __zswap_pool_release);
-	schedule_work(&pool->work);
+	INIT_WORK(&pool->release_work, __zswap_pool_release);
+	schedule_work(&pool->release_work);
 
 	spin_unlock(&zswap_pools_lock);
 }
@@ -942,22 +973,6 @@ end:
 	return ret;
 }
 
-static int zswap_shrink(void)
-{
-	struct zswap_pool *pool;
-	int ret;
-
-	pool = zswap_pool_last_get();
-	if (!pool)
-		return -ENOENT;
-
-	ret = zpool_shrink(pool->zpool, 1, NULL);
-
-	zswap_pool_put(pool);
-
-	return ret;
-}
-
 static int zswap_is_page_same_filled(void *ptr, unsigned long *value)
 {
 	unsigned int pos;
@@ -1011,21 +1026,23 @@ static int zswap_frontswap_store(unsigne
 
 	/* reclaim space if needed */
 	if (zswap_is_full()) {
+		struct zswap_pool *pool;
+
 		zswap_pool_limit_hit++;
-		if (zswap_shrink()) {
-			zswap_reject_reclaim_fail++;
-			ret = -ENOMEM;
-			goto reject;
-		}
+		zswap_pool_reached_full = true;
+		pool = zswap_pool_last_get();
+		if (pool)
+			queue_work(shrink_wq, &pool->shrink_work);
+		ret = -ENOMEM;
+		goto reject;
+	}
 
-		/* A second zswap_is_full() check after
-		 * zswap_shrink() to make sure it's now
-		 * under the max_pool_percent
-		 */
-		if (zswap_is_full()) {
+	if (zswap_pool_reached_full) {
+	       if (!zswap_can_accept()) {
 			ret = -ENOMEM;
 			goto reject;
-		}
+		} else
+			zswap_pool_reached_full = false;
 	}
 
 	/* allocate entry */
@@ -1332,11 +1349,17 @@ static int __init init_zswap(void)
 		zswap_enabled = false;
 	}
 
+	shrink_wq = create_workqueue("zswap-shrink");
+	if (!shrink_wq)
+		goto fallback_fail;
+
 	frontswap_register_ops(&zswap_frontswap_ops);
 	if (zswap_debugfs_init())
 		pr_warn("debugfs initialization failed\n");
 	return 0;
 
+fallback_fail:
+	zswap_pool_destroy(pool);
 hp_fail:
 	cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
 dstmem_fail:
_

* [patch 074/118] zswap: potential NULL dereference on error in init_zswap()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (72 preceding siblings ...)
  2020-01-31  6:15 ` [patch 073/118] mm/zswap.c: add allocation hysteresis if pool limit is hit Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 075/118] include/linux/mm.h: clean up obsolete check on space in page->flags Andrew Morton
                   ` (43 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akpm, dan.carpenter, linux-mm, mm-commits, torvalds, vitaly.wool

From: Dan Carpenter <dan.carpenter@oracle.com>
Subject: zswap: potential NULL dereference on error in init_zswap()

The "pool" pointer can be NULL at the end of init_zswap().  (We would
allocate a new pool later in that situation.)  So in the error handling we
need to make sure pool is a valid pointer before calling
"zswap_pool_destroy(pool);", because that function dereferences its
argument.

Link: http://lkml.kernel.org/r/20200114050902.og32fkllkod5ycf5@kili.mountain
Fixes: 93d4dfa9fbd0 ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/zswap.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/zswap.c~zswap-potential-null-dereference-on-error-in-init_zswap
+++ a/mm/zswap.c
@@ -1359,7 +1359,8 @@ static int __init init_zswap(void)
 	return 0;
 
 fallback_fail:
-	zswap_pool_destroy(pool);
+	if (pool)
+		zswap_pool_destroy(pool);
 hp_fail:
 	cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
 dstmem_fail:
_

* [patch 075/118] include/linux/mm.h: clean up obsolete check on space in page->flags
  2020-01-31  6:10 incoming Andrew Morton
                   ` (73 preceding siblings ...)
  2020-01-31  6:15 ` [patch 074/118] zswap: potential NULL dereference on error in init_zswap() Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 076/118] include/linux/mm.h: remove dead code totalram_pages_set() Andrew Morton
                   ` (42 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akpm, arnd, david, linux-mm, mm-commits, torvalds, yuzhao

From: Yu Zhao <yuzhao@google.com>
Subject: include/linux/mm.h: clean up obsolete check on space in page->flags

The check was intended to make sure we don't overrun page flags.  But it's
obsolete because it includes neither LAST_CPUPID_WIDTH nor KASAN_TAG_WIDTH.

Just remove the check since we already have it covered in
linux/page-flags-layout.h (near the end of the file).

Link: http://lkml.kernel.org/r/20191208183508.89177-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h |    4 ----
 1 file changed, 4 deletions(-)

--- a/include/linux/mm.h~mm-clean-up-obsolete-check-on-space-in-page-flags
+++ a/include/linux/mm.h
@@ -916,10 +916,6 @@ vm_fault_t finish_mkwrite_fault(struct v
 
 #define ZONEID_PGSHIFT		(ZONEID_PGOFF * (ZONEID_SHIFT != 0))
 
-#if SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS
-#error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS
-#endif

* [patch 076/118] include/linux/mm.h: remove dead code totalram_pages_set()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (74 preceding siblings ...)
  2020-01-31  6:15 ` [patch 075/118] include/linux/mm.h: clean up obsolete check on space in page->flags Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 077/118] include/linux/memory.h: drop fields 'hw' and 'phys_callback' from struct memory_block Andrew Morton
                   ` (41 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akpm, david, linux-mm, mm-commits, richardw.yang, torvalds

From: Wei Yang <richardw.yang@linux.intel.com>
Subject: include/linux/mm.h: remove dead code totalram_pages_set()

totalram_pages_set() was introduced in commit ca79b0c211af ("mm:
convert totalram_pages and totalhigh_pages variables to atomic"), but
no one uses it.

Link: http://lkml.kernel.org/r/20191218005543.24146-1-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h |    5 -----
 1 file changed, 5 deletions(-)

--- a/include/linux/mm.h~mm-remove-dead-code-totalram_pages_set
+++ a/include/linux/mm.h
@@ -70,11 +70,6 @@ static inline void totalram_pages_add(lo
 	atomic_long_add(count, &_totalram_pages);
 }
 
-static inline void totalram_pages_set(long val)
-{
-	atomic_long_set(&_totalram_pages, val);
-}

* [patch 077/118] include/linux/memory.h: drop fields 'hw' and 'phys_callback' from struct memory_block
  2020-01-31  6:10 incoming Andrew Morton
                   ` (75 preceding siblings ...)
  2020-01-31  6:15 ` [patch 076/118] include/linux/mm.h: remove dead code totalram_pages_set() Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 078/118] mm: fix comments related to node reclaim Andrew Morton
                   ` (40 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akpm, anshuman.khandual, dan.j.williams, david, linux-mm, mhocko,
	mm-commits, pasha.tatashin, torvalds

From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: include/linux/memory.h: drop fields 'hw' and 'phys_callback' from struct memory_block

The memory_block structure elements 'hw' and 'phys_callback' are not used.
They were originally added by commit 3947be1969a9 ("[PATCH] memory
hotplug: sysfs and add/remove functions") but never seem to have been
used.  Just drop them now.

Link: http://lkml.kernel.org/r/1576728650-13867-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memory.h |    2 --
 1 file changed, 2 deletions(-)

--- a/include/linux/memory.h~mm-drop-elements-hw-and-phys_callback-from-struct-memory_block
+++ a/include/linux/memory.h
@@ -29,8 +29,6 @@ struct memory_block {
 	int section_count;		/* serialized by mem_sysfs_mutex */
 	int online_type;		/* for passing data to online routine */
 	int phys_device;		/* to which fru does this belong? */
-	void *hw;			/* optional pointer to fw/hw data */
-	int (*phys_callback)(struct memory_block *);
 	struct device dev;
 	int nid;			/* NID for this memory block */
 };
_

* [patch 078/118] mm: fix comments related to node reclaim
  2020-01-31  6:10 incoming Andrew Morton
                   ` (76 preceding siblings ...)
  2020-01-31  6:15 ` [patch 077/118] include/linux/memory.h: drop fields 'hw' and 'phys_callback' from struct memory_block Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 079/118] zram: try to avoid worst-case scenario on same element pages Andrew Morton
                   ` (39 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akpm, anshuman.khandual, haolee.swjtu, linux-mm, mgorman,
	mm-commits, torvalds

From: Hao Lee <haolee.swjtu@gmail.com>
Subject: mm: fix comments related to node reclaim

As zone reclaim has been replaced by node reclaim, this patch fixes related
comments.

Link: http://lkml.kernel.org/r/20191126141346.GA22665@haolee.github.io
Signed-off-by: Hao Lee <haolee.swjtu@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mmzone.h      |    2 +-
 include/uapi/linux/sysctl.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/mmzone.h~mm-fix-comments-related-to-node-reclaim
+++ a/include/linux/mmzone.h
@@ -758,7 +758,7 @@ typedef struct pglist_data {
 
 #ifdef CONFIG_NUMA
 	/*
-	 * zone reclaim becomes active if more unmapped pages exist.
+	 * node reclaim becomes active if more unmapped pages exist.
 	 */
 	unsigned long		min_unmapped_pages;
 	unsigned long		min_slab_pages;
--- a/include/uapi/linux/sysctl.h~mm-fix-comments-related-to-node-reclaim
+++ a/include/uapi/linux/sysctl.h
@@ -195,7 +195,7 @@ enum
 	VM_MIN_UNMAPPED=32,	/* Set min percent of unmapped pages */
 	VM_PANIC_ON_OOM=33,	/* panic at out-of-memory */
 	VM_VDSO_ENABLED=34,	/* map VDSO into new processes? */
-	VM_MIN_SLAB=35,		 /* Percent pages ignored by zone reclaim */
+	VM_MIN_SLAB=35,		 /* Percent pages ignored by node reclaim */
 };
 
 
_

* [patch 079/118] zram: try to avoid worst-case scenario on same element pages
  2020-01-31  6:10 incoming Andrew Morton
                   ` (77 preceding siblings ...)
  2020-01-31  6:15 ` [patch 078/118] mm: fix comments related to node reclaim Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 080/118] drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store Andrew Morton
                   ` (38 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akpm, axboe, linux-mm, minchan, mm-commits,
	sergey.senozhatsky.work, taejoon.song, torvalds

From: Taejoon Song <taejoon.song@lge.com>
Subject: zram: try to avoid worst-case scenario on same element pages

The worst-case scenario when looking for same-element pages is a page in
which almost all elements look the same at first glance and only the last
few elements differ.

Since identical elements tend to be grouped at the beginning of a page,
comparing the first element with the last element before looping through
all elements gives us a chance to detect non-same-element pages quickly.

1. The test was done on an LG webOS TV (64-bit arch)
2. Dump the swap-out pages (~819200 pages)
3. Analyze the pages offline with a simple test script which counts the
   number of iterations and measures the speed

On a 64-bit arch, the worst-case iteration count is PAGE_SIZE / 8 bytes = 512.
The speed is based only on the time spent in the page_same_filled()
function.  The average results are listed below:

                                   Num of Iter    Speed(MB/s)
Looping-Forward (Orig)                 38            99265
Looping-Backward                       36           102725
Last-element-check (This Patch)        33           125072

The result shows that the average iteration count decreases by 13% and the
speed increases by about 26% with this patch.  This patch does not increase
the overall time complexity, though.

I also ran a simpler version which uses a backward loop.  Just looping
backward also gives some improvement, but less than this patch.
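
The last-element check can be tried outside the kernel with a sketch like
the one below.  It is a userspace approximation only: SKETCH_PAGE_SIZE and
page_same_filled_sketch() are names made up for this example, and a 4096-byte
page is assumed.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size, for illustration only */

/*
 * Returns true and stores the repeated value in *element if every
 * unsigned long in the page-sized buffer at ptr has the same value.
 */
static bool page_same_filled_sketch(const void *ptr, unsigned long *element)
{
	const unsigned long *page = ptr;
	const size_t last_pos = SKETCH_PAGE_SIZE / sizeof(*page) - 1;
	const unsigned long val = page[0];
	size_t pos;

	/* Cheap early exit: compare the first element with the last one. */
	if (val != page[last_pos])
		return false;

	/* The last element is already checked, so stop just before it. */
	for (pos = 1; pos < last_pos; pos++) {
		if (val != page[pos])
			return false;
	}

	*element = val;
	return true;
}
```

A page that differs only in its final element is rejected after a single
comparison instead of after a full scan.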

[taejoon.song@lge.com: fix off-by-one]
  Link: http://lkml.kernel.org/r/1578642001-11765-1-git-send-email-taejoon.song@lge.com
Link: http://lkml.kernel.org/r/1575424418-16119-1-git-send-email-taejoon.song@lge.com
Signed-off-by: Taejoon Song <taejoon.song@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/block/zram/zram_drv.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/block/zram/zram_drv.c~zram-try-to-avoid-worst-case-scenario-on-same-element-pages
+++ a/drivers/block/zram/zram_drv.c
@@ -207,14 +207,17 @@ static inline void zram_fill_page(void *
 
 static bool page_same_filled(void *ptr, unsigned long *element)
 {
-	unsigned int pos;
 	unsigned long *page;
 	unsigned long val;
+	unsigned int pos, last_pos = PAGE_SIZE / sizeof(*page) - 1;
 
 	page = (unsigned long *)ptr;
 	val = page[0];
 
-	for (pos = 1; pos < PAGE_SIZE / sizeof(*page); pos++) {
+	if (val != page[last_pos])
+		return false;
+
+	for (pos = 1; pos < last_pos; pos++) {
 		if (val != page[pos])
 			return false;
 	}
_

* [patch 080/118] drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store
  2020-01-31  6:10 incoming Andrew Morton
                   ` (78 preceding siblings ...)
  2020-01-31  6:15 ` [patch 079/118] zram: try to avoid worst-case scenario on same element pages Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 081/118] include/linux/units.h: add helpers for kelvin to/from Celsius conversion Andrew Morton
                   ` (37 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akpm, axboe, colin.king, linux-mm, minchan, mm-commits,
	sergey.senozhatsky, torvalds

From: Colin Ian King <colin.king@canonical.com>
Subject: drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store

Currently, when an error code such as -EIO or -ENOSPC is set in the
for-loop of writeback_store(), it is overwritten by the ret = len
assignment at the end of the function and the error code is lost.
Fix this by assigning ret = len at the start of the function and removing
the assignment from the end, so that ret is preserved when an error
code is assigned to it.
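
The pattern can be shown with a small userspace sketch.  store_sketch() and
its fail_at parameter are hypothetical and only model the control flow; they
are not part of zram.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical store handler: walks n items and fails with -EIO at index
 * fail_at (pass a negative fail_at for the success case).
 */
static ssize_t store_sketch(size_t len, int n, int fail_at)
{
	ssize_t ret = (ssize_t)len;	/* success value assigned once, up front */
	int i;

	for (i = 0; i < n; i++) {
		if (i == fail_at) {
			ret = -EIO;	/* error recorded here ... */
			break;
		}
	}

	/* ... and not overwritten by a trailing "ret = len" before returning. */
	return ret;
}
```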

Addresses Coverity ("Unused value")

Link: http://lkml.kernel.org/r/20191128122958.178290-1-colin.king@canonical.com
Fixes: a939888ec38b ("zram: support idle/huge page writeback")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/block/zram/zram_drv.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/block/zram/zram_drv.c~zram-fix-error-return-codes-not-being-returned-in-writeback_store
+++ a/drivers/block/zram/zram_drv.c
@@ -629,7 +629,7 @@ static ssize_t writeback_store(struct de
 	struct bio bio;
 	struct bio_vec bio_vec;
 	struct page *page;
-	ssize_t ret;
+	ssize_t ret = len;
 	int mode;
 	unsigned long blk_idx = 0;
 
@@ -765,7 +765,6 @@ next:
 
 	if (blk_idx)
 		free_block_bdev(zram, blk_idx);
-	ret = len;
 	__free_page(page);
 release_init_lock:
 	up_read(&zram->init_lock);
_

* [patch 081/118] include/linux/units.h: add helpers for kelvin to/from Celsius conversion
  2020-01-31  6:10 incoming Andrew Morton
                   ` (79 preceding siblings ...)
  2020-01-31  6:15 ` [patch 080/118] drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 082/118] ACPI: thermal: switch to use <linux/units.h> helpers Andrew Morton
                   ` (36 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akinobu.mita, akpm, amit.kucheria, andy.shevchenko, andy, axboe,
	daniel.lezcano, dvhart, emmanuel.grumbach, hch, jdelvare, jic23,
	johannes.berg, Jonathan.Cameron, kbusch, knaack.h, kvalo, lars,
	linux-mm, linux, luciano.coelho, mm-commits, pmeerw, rui.zhang,
	sagi, sgruszka, sujith.thomas, torvalds

From: Akinobu Mita <akinobu.mita@gmail.com>
Subject: include/linux/units.h: add helpers for kelvin to/from Celsius conversion

Patch series "add header file for kelvin to/from Celsius conversion
helpers", v4.

There are several helper macros to convert kelvin to/from Celsius in
<linux/thermal.h> for thermal drivers.  These are useful for any other
drivers or subsystems, but it's odd to include <linux/thermal.h> just for
the helpers.

This adds a new <linux/units.h> that provides the equivalent inline
functions for any drivers or subsystems, and switches all the users of
conversion helpers in <linux/thermal.h> to use <linux/units.h> helpers.


This patch (of 12):

There are several helper macros to convert kelvin to/from Celsius in
<linux/thermal.h> for thermal drivers.  These are useful for any other
drivers or subsystems, but it's odd to include <linux/thermal.h> just for
the helpers.

This adds a new <linux/units.h> that provides the equivalent inline
functions for any drivers or subsystems.  It is intended to replace the
helpers in <linux/thermal.h>.
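
As a rough illustration of the arithmetic these helpers perform, here is a
userspace sketch of two of the conversions.  The *_sketch names and
div_round_closest() are made up for this example; div_round_closest() is
only a stand-in for the kernel's DIV_ROUND_CLOSEST() and assumes a positive
divisor.

```c
#define ABS_ZERO_MILLICELSIUS	(-273150L)
#define MILLI_PER_DEGREE	1000L
#define MILLI_PER_DECIDEGREE	100L

/* Round-to-nearest division; assumes d > 0. */
static long div_round_closest(long x, long d)
{
	return x >= 0 ? (x + d / 2) / d : (x - d / 2) / d;
}

/* Decidegrees Kelvin -> whole degrees Celsius, via millidegrees. */
static long deci_kelvin_to_celsius_sketch(long t)
{
	long mc = t * MILLI_PER_DECIDEGREE + ABS_ZERO_MILLICELSIUS;

	return div_round_closest(mc, MILLI_PER_DEGREE);
}

/* Whole degrees Celsius -> decidegrees Kelvin, via millidegrees. */
static long celsius_to_deci_kelvin_sketch(long t)
{
	long mk = t * MILLI_PER_DEGREE - ABS_ZERO_MILLICELSIUS;

	return div_round_closest(mk, MILLI_PER_DECIDEGREE);
}
```

For example, 2982 decikelvin is 298200 mK, i.e. 25050 m°C, which rounds to
25 °C; converting 25 °C back gives 298150 mK, which rounds to 2982
decikelvin.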

Link: http://lkml.kernel.org/r/1576386975-7941-2-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/units.h |   84 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)

--- /dev/null
+++ a/include/linux/units.h
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_UNITS_H
+#define _LINUX_UNITS_H
+
+#include <linux/kernel.h>
+
+#define ABSOLUTE_ZERO_MILLICELSIUS -273150
+
+static inline long milli_kelvin_to_millicelsius(long t)
+{
+	return t + ABSOLUTE_ZERO_MILLICELSIUS;
+}
+
+static inline long millicelsius_to_milli_kelvin(long t)
+{
+	return t - ABSOLUTE_ZERO_MILLICELSIUS;
+}
+
+#define MILLIDEGREE_PER_DEGREE 1000
+#define MILLIDEGREE_PER_DECIDEGREE 100
+
+static inline long kelvin_to_millicelsius(long t)
+{
+	return milli_kelvin_to_millicelsius(t * MILLIDEGREE_PER_DEGREE);
+}
+
+static inline long millicelsius_to_kelvin(long t)
+{
+	t = millicelsius_to_milli_kelvin(t);
+
+	return DIV_ROUND_CLOSEST(t, MILLIDEGREE_PER_DEGREE);
+}
+
+static inline long deci_kelvin_to_celsius(long t)
+{
+	t = milli_kelvin_to_millicelsius(t * MILLIDEGREE_PER_DECIDEGREE);
+
+	return DIV_ROUND_CLOSEST(t, MILLIDEGREE_PER_DEGREE);
+}
+
+static inline long celsius_to_deci_kelvin(long t)
+{
+	t = millicelsius_to_milli_kelvin(t * MILLIDEGREE_PER_DEGREE);
+
+	return DIV_ROUND_CLOSEST(t, MILLIDEGREE_PER_DECIDEGREE);
+}
+
+/**
+ * deci_kelvin_to_millicelsius_with_offset - convert Kelvin to Celsius
+ * @t: temperature value in decidegrees Kelvin
+ * @offset: difference between Kelvin and Celsius in millidegrees
+ *
+ * Return: temperature value in millidegrees Celsius
+ */
+static inline long deci_kelvin_to_millicelsius_with_offset(long t, long offset)
+{
+	return t * MILLIDEGREE_PER_DECIDEGREE - offset;
+}
+
+static inline long deci_kelvin_to_millicelsius(long t)
+{
+	return milli_kelvin_to_millicelsius(t * MILLIDEGREE_PER_DECIDEGREE);
+}
+
+static inline long millicelsius_to_deci_kelvin(long t)
+{
+	t = millicelsius_to_milli_kelvin(t);
+
+	return DIV_ROUND_CLOSEST(t, MILLIDEGREE_PER_DECIDEGREE);
+}
+
+static inline long kelvin_to_celsius(long t)
+{
+	return t + DIV_ROUND_CLOSEST(ABSOLUTE_ZERO_MILLICELSIUS,
+				     MILLIDEGREE_PER_DEGREE);
+}
+
+static inline long celsius_to_kelvin(long t)
+{
+	return t - DIV_ROUND_CLOSEST(ABSOLUTE_ZERO_MILLICELSIUS,
+				     MILLIDEGREE_PER_DEGREE);
+}
+
+#endif /* _LINUX_UNITS_H */
_


* [patch 082/118] ACPI: thermal: switch to use <linux/units.h> helpers
  2020-01-31  6:10 incoming Andrew Morton
                   ` (80 preceding siblings ...)
  2020-01-31  6:15 ` [patch 081/118] include/linux/units.h: add helpers for kelvin to/from Celsius conversion Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 083/118] platform/x86: asus-wmi: " Andrew Morton
                   ` (35 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akinobu.mita, akpm, amit.kucheria, andy.shevchenko, andy, axboe,
	daniel.lezcano, dvhart, emmanuel.grumbach, hch, jdelvare, jic23,
	johannes.berg, Jonathan.Cameron, kbusch, knaack.h, kvalo, lars,
	linux-mm, linux, luciano.coelho, mm-commits, pmeerw, rui.zhang,
	sagi, sgruszka, sujith.thomas, torvalds

From: Akinobu Mita <akinobu.mita@gmail.com>
Subject: ACPI: thermal: switch to use <linux/units.h> helpers

This switches the ACPI thermal zone driver to use
celsius_to_deci_kelvin(), deci_kelvin_to_celsius(), and
deci_kelvin_to_millicelsius_with_offset() in <linux/units.h> instead of
helpers in <linux/thermal.h>.

This is preparation for centralizing the kelvin to/from Celsius conversion
helpers in <linux/units.h>.
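
The diff in this patch also changes kelvin_offset from 2731/2732 to
273100/273200, because the new helper takes the offset in millidegrees rather
than decidegrees.  A minimal user-space sketch of why the result is unchanged
follows; OLD_DK_TO_MC_WITH_OFFSET and guess_offset_millidegrees() are
hypothetical names, and the check of the critical trip's valid flag is left
out.

```c
/* Old macro form from <linux/thermal.h>: offset in tenths of a degree. */
#define OLD_DK_TO_MC_WITH_OFFSET(t, off)	(((t) - (off)) * 100)

/* New helper form from <linux/units.h>: offset in millidegrees. */
static inline long deci_kelvin_to_millicelsius_with_offset(long t, long offset)
{
	return t * 100 - offset;
}

/*
 * Sketch of the offset guess made by the ACPI thermal driver, using the
 * millidegree values from this patch (validity check omitted).
 */
static inline long guess_offset_millidegrees(long critical_deci_kelvin)
{
	if (critical_deci_kelvin % 5 == 1)
		return 273100;

	return 273200;
}
```

Since (t - off) * 100 equals t * 100 - off * 100, scaling the stored offset
by 100 keeps the reported temperature identical.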

Link: http://lkml.kernel.org/r/1576386975-7941-3-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/acpi/thermal.c |   34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

--- a/drivers/acpi/thermal.c~acpi-thermal-switch-to-use-linux-unitsh-helpers
+++ a/drivers/acpi/thermal.c
@@ -27,6 +27,7 @@
 #include <linux/acpi.h>
 #include <linux/workqueue.h>
 #include <linux/uaccess.h>
+#include <linux/units.h>
 
 #define PREFIX "ACPI: "
 
@@ -172,7 +173,7 @@ struct acpi_thermal {
 	struct acpi_handle_list devices;
 	struct thermal_zone_device *thermal_zone;
 	int tz_enabled;
-	int kelvin_offset;
+	int kelvin_offset;	/* in millidegrees */
 	struct work_struct thermal_check_work;
 };
 
@@ -297,7 +298,8 @@ static int acpi_thermal_trips_update(str
 			if (crt == -1) {
 				tz->trips.critical.flags.valid = 0;
 			} else if (crt > 0) {
-				unsigned long crt_k = CELSIUS_TO_DECI_KELVIN(crt);
+				unsigned long crt_k = celsius_to_deci_kelvin(crt);
+
 				/*
 				 * Allow override critical threshold
 				 */
@@ -333,7 +335,7 @@ static int acpi_thermal_trips_update(str
 		if (psv == -1) {
 			status = AE_SUPPORT;
 		} else if (psv > 0) {
-			tmp = CELSIUS_TO_DECI_KELVIN(psv);
+			tmp = celsius_to_deci_kelvin(psv);
 			status = AE_OK;
 		} else {
 			status = acpi_evaluate_integer(tz->device->handle,
@@ -413,7 +415,7 @@ static int acpi_thermal_trips_update(str
 					break;
 				if (i == 1)
 					tz->trips.active[0].temperature =
-						CELSIUS_TO_DECI_KELVIN(act);
+						celsius_to_deci_kelvin(act);
 				else
 					/*
 					 * Don't allow override higher than
@@ -421,9 +423,9 @@ static int acpi_thermal_trips_update(str
 					 */
 					tz->trips.active[i - 1].temperature =
 						(tz->trips.active[i - 2].temperature <
-						CELSIUS_TO_DECI_KELVIN(act) ?
+						celsius_to_deci_kelvin(act) ?
 						tz->trips.active[i - 2].temperature :
-						CELSIUS_TO_DECI_KELVIN(act));
+						celsius_to_deci_kelvin(act));
 				break;
 			} else {
 				tz->trips.active[i].temperature = tmp;
@@ -519,7 +521,7 @@ static int thermal_get_temp(struct therm
 	if (result)
 		return result;
 
-	*temp = DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET(tz->temperature,
+	*temp = deci_kelvin_to_millicelsius_with_offset(tz->temperature,
 							tz->kelvin_offset);
 	return 0;
 }
@@ -624,7 +626,7 @@ static int thermal_get_trip_temp(struct
 
 	if (tz->trips.critical.flags.valid) {
 		if (!trip) {
-			*temp = DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET(
+			*temp = deci_kelvin_to_millicelsius_with_offset(
 				tz->trips.critical.temperature,
 				tz->kelvin_offset);
 			return 0;
@@ -634,7 +636,7 @@ static int thermal_get_trip_temp(struct
 
 	if (tz->trips.hot.flags.valid) {
 		if (!trip) {
-			*temp = DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET(
+			*temp = deci_kelvin_to_millicelsius_with_offset(
 				tz->trips.hot.temperature,
 				tz->kelvin_offset);
 			return 0;
@@ -644,7 +646,7 @@ static int thermal_get_trip_temp(struct
 
 	if (tz->trips.passive.flags.valid) {
 		if (!trip) {
-			*temp = DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET(
+			*temp = deci_kelvin_to_millicelsius_with_offset(
 				tz->trips.passive.temperature,
 				tz->kelvin_offset);
 			return 0;
@@ -655,7 +657,7 @@ static int thermal_get_trip_temp(struct
 	for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE &&
 		tz->trips.active[i].flags.valid; i++) {
 		if (!trip) {
-			*temp = DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET(
+			*temp = deci_kelvin_to_millicelsius_with_offset(
 				tz->trips.active[i].temperature,
 				tz->kelvin_offset);
 			return 0;
@@ -672,7 +674,7 @@ static int thermal_get_crit_temp(struct
 	struct acpi_thermal *tz = thermal->devdata;
 
 	if (tz->trips.critical.flags.valid) {
-		*temperature = DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET(
+		*temperature = deci_kelvin_to_millicelsius_with_offset(
 				tz->trips.critical.temperature,
 				tz->kelvin_offset);
 		return 0;
@@ -692,7 +694,7 @@ static int thermal_get_trend(struct ther
 
 	if (type == THERMAL_TRIP_ACTIVE) {
 		int trip_temp;
-		int temp = DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET(
+		int temp = deci_kelvin_to_millicelsius_with_offset(
 					tz->temperature, tz->kelvin_offset);
 		if (thermal_get_trip_temp(thermal, trip, &trip_temp))
 			return -EINVAL;
@@ -1043,9 +1045,9 @@ static void acpi_thermal_guess_offset(st
 {
 	if (tz->trips.critical.flags.valid &&
 	    (tz->trips.critical.temperature % 5) == 1)
-		tz->kelvin_offset = 2731;
+		tz->kelvin_offset = 273100;
 	else
-		tz->kelvin_offset = 2732;
+		tz->kelvin_offset = 273200;
 }
 
 static void acpi_thermal_check_fn(struct work_struct *work)
@@ -1087,7 +1089,7 @@ static int acpi_thermal_add(struct acpi_
 	INIT_WORK(&tz->thermal_check_work, acpi_thermal_check_fn);
 
 	pr_info(PREFIX "%s [%s] (%ld C)\n", acpi_device_name(device),
-		acpi_device_bid(device), DECI_KELVIN_TO_CELSIUS(tz->temperature));
+		acpi_device_bid(device), deci_kelvin_to_celsius(tz->temperature));
 	goto end;
 
 free_memory:
_


* [patch 083/118] platform/x86: asus-wmi: switch to use <linux/units.h> helpers
  2020-01-31  6:10 incoming Andrew Morton
                   ` (81 preceding siblings ...)
  2020-01-31  6:15 ` [patch 082/118] ACPI: thermal: switch to use <linux/units.h> helpers Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 084/118] platform/x86: intel_menlow: " Andrew Morton
                   ` (34 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akinobu.mita, akpm, amit.kucheria, andy.shevchenko, andy, axboe,
	daniel.lezcano, dvhart, emmanuel.grumbach, hch, jdelvare, jic23,
	johannes.berg, Jonathan.Cameron, kbusch, knaack.h, kvalo, lars,
	linux-mm, linux, luciano.coelho, mm-commits, pmeerw, rui.zhang,
	sagi, sgruszka, sujith.thomas, torvalds

From: Akinobu Mita <akinobu.mita@gmail.com>
Subject: platform/x86: asus-wmi: switch to use <linux/units.h> helpers

The asus-wmi driver doesn't implement the thermal device functionality
directly, so including <linux/thermal.h> just for DECI_KELVIN_TO_CELSIUS()
is a bit odd.

This switches the asus-wmi driver to use deci_kelvin_to_millicelsius() in
<linux/units.h>.

The format string is changed from %d to %ld because the helper returns long.
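
A rough user-space sketch of the two conversions is shown here.  The function
names old_deci_kelvin_to_celsius() and format_temp() are hypothetical, and
the driver's actual replacement line is not reproduced; only the helper named
in this changelog and the removed macro are modelled.

```c
#include <stdio.h>

/* Function form of the old DECI_KELVIN_TO_CELSIUS() macro. */
static inline long old_deci_kelvin_to_celsius(long t)
{
	long d = t - 2732;

	return d >= 0 ? (d + 5) / 10 : (d - 5) / 10;
}

/* Same arithmetic as the <linux/units.h> helper of this name. */
static inline long deci_kelvin_to_millicelsius(long t)
{
	return t * 100 - 273150;
}

/* Hypothetical formatter: the helper returns long, so "%ld" is needed. */
static inline int format_temp(char *buf, size_t len, long deci_kelvin)
{
	return snprintf(buf, len, "%ld\n",
			deci_kelvin_to_millicelsius(deci_kelvin));
}
```

With these definitions, a reading of 3005 decikelvin gives 27000 via the old
round-then-scale form and 27350 via the direct millidegree conversion.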

Link: http://lkml.kernel.org/r/1576386975-7941-4-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/platform/x86/asus-wmi.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/drivers/platform/x86/asus-wmi.c~platform-x86-asus-wmi-switch-to-use-linux-unitsh-helpers
+++ a/drivers/platform/x86/asus-wmi.c
@@ -33,9 +33,9 @@
 #include <linux/seq_file.h>
 #include <linux/platform_data/x86/asus-wmi.h>
 #include <linux/platform_device.h>
-#include <linux/thermal.h>
 #include <linux/acpi.h>
 #include <linux/dmi.h>
+#include <linux/units.h>
 
 #include <acpi/battery.h>
 #include <acpi/video.h>
@@ -1514,9 +1514,8 @@ static ssize_t asus_hwmon_temp1(struct d
 	if (err < 0)
 		return err;
 
-	value = DECI_KELVIN_TO_CELSIUS((value & 0xFFFF)) * 1000;


* [patch 084/118] platform/x86: intel_menlow: switch to use <linux/units.h> helpers
  2020-01-31  6:10 incoming Andrew Morton
                   ` (82 preceding siblings ...)
  2020-01-31  6:15 ` [patch 083/118] platform/x86: asus-wmi: " Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 085/118] thermal: int340x: " Andrew Morton
                   ` (33 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akinobu.mita, akpm, amit.kucheria, andy.shevchenko, andy, axboe,
	daniel.lezcano, dvhart, emmanuel.grumbach, hch, jdelvare, jic23,
	johannes.berg, Jonathan.Cameron, kbusch, knaack.h, kvalo, lars,
	linux-mm, linux, luciano.coelho, mm-commits, pmeerw, rui.zhang,
	sagi, sgruszka, sujith.thomas, torvalds

From: Akinobu Mita <akinobu.mita@gmail.com>
Subject: platform/x86: intel_menlow: switch to use <linux/units.h> helpers

This switches the intel_menlow driver to use deci_kelvin_to_celsius() and
celsius_to_deci_kelvin() in <linux/units.h> instead of helpers in
<linux/thermal.h>.

This is preparation for centralizing the kelvin to/from Celsius conversion
helpers in <linux/units.h>.

This also removes a trailing space, while we're at it.

Link: http://lkml.kernel.org/r/1576386975-7941-5-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/platform/x86/intel_menlow.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/drivers/platform/x86/intel_menlow.c~platform-x86-intel_menlow-switch-to-use-linux-unitsh-helpers
+++ a/drivers/platform/x86/intel_menlow.c
@@ -22,6 +22,7 @@
 #include <linux/slab.h>
 #include <linux/thermal.h>
 #include <linux/types.h>
+#include <linux/units.h>
 
 MODULE_AUTHOR("Thomas Sujith");
 MODULE_AUTHOR("Zhang Rui");
@@ -302,8 +303,10 @@ static ssize_t aux_show(struct device *d
 	int result;
 
 	result = sensor_get_auxtrip(attr->handle, idx, &value);
+	if (result)
+		return result;
 
-	return result ? result : sprintf(buf, "%lu", DECI_KELVIN_TO_CELSIUS(value));
+	return sprintf(buf, "%lu", deci_kelvin_to_celsius(value));
 }
 
 static ssize_t aux0_show(struct device *dev,
@@ -332,8 +335,8 @@ static ssize_t aux_store(struct device *
 	if (value < 0)
 		return -EINVAL;
 
-	result = sensor_set_auxtrip(attr->handle, idx, 
-				    CELSIUS_TO_DECI_KELVIN(value));
+	result = sensor_set_auxtrip(attr->handle, idx,
+				    celsius_to_deci_kelvin(value));
 	return result ? result : count;
 }
 
_


* [patch 085/118] thermal: int340x: switch to use <linux/units.h> helpers
  2020-01-31  6:10 incoming Andrew Morton
                   ` (83 preceding siblings ...)
  2020-01-31  6:15 ` [patch 084/118] platform/x86: intel_menlow: " Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 086/118] thermal: intel_pch: " Andrew Morton
                   ` (32 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akinobu.mita, akpm, amit.kucheria, andy.shevchenko, andy, axboe,
	daniel.lezcano, dvhart, emmanuel.grumbach, hch, jdelvare, jic23,
	johannes.berg, Jonathan.Cameron, kbusch, knaack.h, kvalo, lars,
	linux-mm, linux, luciano.coelho, mm-commits, pmeerw, rui.zhang,
	sagi, sgruszka, sujith.thomas, torvalds

From: Akinobu Mita <akinobu.mita@gmail.com>
Subject: thermal: int340x: switch to use <linux/units.h> helpers

This switches the int340x thermal zone driver to use
deci_kelvin_to_millicelsius() and millicelsius_to_deci_kelvin() in
<linux/units.h> instead of helpers in <linux/thermal.h>.

This is preparation for centralizing the kelvin to/from Celsius conversion
helpers in <linux/units.h>.
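
A small user-space sketch of the two helpers used here, showing that a
decikelvin value read from firmware survives a round trip through
millidegrees Celsius.  div_round_closest() is a hypothetical stand-in for the
kernel's DIV_ROUND_CLOSEST(), valid only for a positive divisor.

```c
/* Assumed stand-in for DIV_ROUND_CLOSEST(); only valid for d > 0. */
static inline long div_round_closest(long x, long d)
{
	return x >= 0 ? (x + d / 2) / d : (x - d / 2) / d;
}

/* Tenths of a kelvin to millidegrees Celsius: exact, no rounding. */
static inline long deci_kelvin_to_millicelsius(long t)
{
	return t * 100 - 273150;
}

/* Millidegrees Celsius to tenths of a kelvin, rounded to nearest. */
static inline long millicelsius_to_deci_kelvin(long t)
{
	return div_round_closest(t + 273150, 100);
}
```

Only the millicelsius-to-decikelvin direction loses precision, and it rounds
to the nearest tenth of a kelvin instead of truncating.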

Link: http://lkml.kernel.org/r/1576386975-7941-6-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c~thermal-int340x-switch-to-use-linux-unitsh-helpers
+++ a/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
@@ -8,6 +8,7 @@
 #include <linux/init.h>
 #include <linux/acpi.h>
 #include <linux/thermal.h>
+#include <linux/units.h>
 #include "int340x_thermal_zone.h"
 
 static int int340x_thermal_get_zone_temp(struct thermal_zone_device *zone,
@@ -34,7 +35,7 @@ static int int340x_thermal_get_zone_temp
 		*temp = (unsigned long)conv_temp * 10;
 	} else
 		/* _TMP returns the temperature in tenths of degrees Kelvin */
-		*temp = DECI_KELVIN_TO_MILLICELSIUS(tmp);
+		*temp = deci_kelvin_to_millicelsius(tmp);
 
 	return 0;
 }
@@ -116,7 +117,7 @@ static int int340x_thermal_set_trip_temp
 
 	snprintf(name, sizeof(name), "PAT%d", trip);
 	status = acpi_execute_simple_method(d->adev->handle, name,
-			MILLICELSIUS_TO_DECI_KELVIN(temp));
+			millicelsius_to_deci_kelvin(temp));
 	if (ACPI_FAILURE(status))
 		return -EIO;
 
@@ -163,7 +164,7 @@ static int int340x_thermal_get_trip_conf
 	if (ACPI_FAILURE(status))
 		return -EIO;
 
-	*temp = DECI_KELVIN_TO_MILLICELSIUS(r);
+	*temp = deci_kelvin_to_millicelsius(r);
 
 	return 0;
 }
_


* [patch 086/118] thermal: intel_pch: switch to use <linux/units.h> helpers
  2020-01-31  6:10 incoming Andrew Morton
                   ` (84 preceding siblings ...)
  2020-01-31  6:15 ` [patch 085/118] thermal: int340x: " Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 087/118] nvme: hwmon: " Andrew Morton
                   ` (31 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akinobu.mita, akpm, amit.kucheria, andy.shevchenko, andy, axboe,
	daniel.lezcano, dvhart, emmanuel.grumbach, hch, jdelvare, jic23,
	johannes.berg, Jonathan.Cameron, kbusch, knaack.h, kvalo, lars,
	linux-mm, linux, luciano.coelho, mm-commits, pmeerw, rui.zhang,
	sagi, sgruszka, sujith.thomas, torvalds

From: Akinobu Mita <akinobu.mita@gmail.com>
Subject: thermal: intel_pch: switch to use <linux/units.h> helpers

This switches the Intel PCH thermal driver to use
deci_kelvin_to_millicelsius() in <linux/units.h> instead of helpers in
<linux/thermal.h>.

This is preparation for centralizing the kelvin to/from Celsius conversion
helpers in <linux/units.h>.

Link: http://lkml.kernel.org/r/1576386975-7941-7-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/thermal/intel/intel_pch_thermal.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/thermal/intel/intel_pch_thermal.c~thermal-intel_pch-switch-to-use-linux-unitsh-helpers
+++ a/drivers/thermal/intel/intel_pch_thermal.c
@@ -13,6 +13,7 @@
 #include <linux/pci.h>
 #include <linux/acpi.h>
 #include <linux/thermal.h>
+#include <linux/units.h>
 #include <linux/pm.h>
 
 /* Intel PCH thermal Device IDs */
@@ -93,7 +94,7 @@ static void pch_wpt_add_acpi_psv_trip(st
 		if (ACPI_SUCCESS(status)) {
 			unsigned long trip_temp;
 
-			trip_temp = DECI_KELVIN_TO_MILLICELSIUS(r);
+			trip_temp = deci_kelvin_to_millicelsius(r);
 			if (trip_temp) {
 				ptd->psv_temp = trip_temp;
 				ptd->psv_trip_id = *nr_trips;
_


* [patch 087/118] nvme: hwmon: switch to use <linux/units.h> helpers
  2020-01-31  6:10 incoming Andrew Morton
                   ` (85 preceding siblings ...)
  2020-01-31  6:15 ` [patch 086/118] thermal: intel_pch: " Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:15 ` [patch 088/118] thermal: remove kelvin to/from Celsius conversion helpers from <linux/thermal.h> Andrew Morton
                   ` (30 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akinobu.mita, akpm, amit.kucheria, andy.shevchenko, andy, axboe,
	daniel.lezcano, dvhart, emmanuel.grumbach, hch, jdelvare, jic23,
	johannes.berg, Jonathan.Cameron, kbusch, knaack.h, kvalo, lars,
	linux-mm, linux, luciano.coelho, mm-commits, pmeerw, rui.zhang,
	sagi, sgruszka, sujith.thomas, torvalds

From: Akinobu Mita <akinobu.mita@gmail.com>
Subject: nvme: hwmon: switch to use <linux/units.h> helpers

This switches the nvme driver to use kelvin_to_millicelsius() and
millicelsius_to_kelvin() in <linux/units.h>.
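
For comparison, a user-space sketch that places the driver's removed local
macros next to function forms of the <linux/units.h> helpers.  Here
div_round_closest() is a hypothetical stand-in for the kernel's
DIV_ROUND_CLOSEST(), valid only for a positive divisor, and both the macros
and the functions are written in terms of it.

```c
/* Assumed stand-in for DIV_ROUND_CLOSEST(); only valid for d > 0. */
static inline long div_round_closest(long x, long d)
{
	return x >= 0 ? (x + d / 2) / d : (x - d / 2) / d;
}

/* The local macros removed from drivers/nvme/host/hwmon.c. */
#define MILLICELSIUS_TO_KELVIN(t) div_round_closest((t) + 273150, 1000)
#define KELVIN_TO_MILLICELSIUS(t) ((t) * 1000L - 273150)

/* Function forms with the same arithmetic as the shared helpers. */
static inline long kelvin_to_millicelsius(long t)
{
	return t * 1000 - 273150;
}

static inline long millicelsius_to_kelvin(long t)
{
	return div_round_closest(t + 273150, 1000);
}
```

Both forms use the same 273150 offset and the same rounding, so the values
reported through hwmon should not change with this conversion.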

Link: http://lkml.kernel.org/r/1576386975-7941-8-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/nvme/host/hwmon.c |   13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

--- a/drivers/nvme/host/hwmon.c~nvme-hwmon-switch-to-use-linux-unitsh-helpers
+++ a/drivers/nvme/host/hwmon.c
@@ -5,14 +5,11 @@
  */
 
 #include <linux/hwmon.h>
+#include <linux/units.h>
 #include <asm/unaligned.h>
 
 #include "nvme.h"
 
-/* These macros should be moved to linux/temperature.h */
-#define MILLICELSIUS_TO_KELVIN(t) DIV_ROUND_CLOSEST((t) + 273150, 1000)
-#define KELVIN_TO_MILLICELSIUS(t) ((t) * 1000L - 273150)
-
 struct nvme_hwmon_data {
 	struct nvme_ctrl *ctrl;
 	struct nvme_smart_log log;
@@ -35,7 +32,7 @@ static int nvme_get_temp_thresh(struct n
 		return -EIO;
 	if (ret < 0)
 		return ret;
-	*temp = KELVIN_TO_MILLICELSIUS(status & NVME_TEMP_THRESH_MASK);
+	*temp = kelvin_to_millicelsius(status & NVME_TEMP_THRESH_MASK);
 
 	return 0;
 }
@@ -46,7 +43,7 @@ static int nvme_set_temp_thresh(struct n
 	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
 	int ret;
 
-	temp = MILLICELSIUS_TO_KELVIN(temp);
+	temp = millicelsius_to_kelvin(temp);
 	threshold |= clamp_val(temp, 0, NVME_TEMP_THRESH_MASK);
 
 	if (under)
@@ -88,7 +85,7 @@ static int nvme_hwmon_read(struct device
 	case hwmon_temp_min:
 		return nvme_get_temp_thresh(data->ctrl, channel, true, val);
 	case hwmon_temp_crit:
-		*val = KELVIN_TO_MILLICELSIUS(data->ctrl->cctemp);
+		*val = kelvin_to_millicelsius(data->ctrl->cctemp);
 		return 0;
 	default:
 		break;
@@ -105,7 +102,7 @@ static int nvme_hwmon_read(struct device
 			temp = get_unaligned_le16(log->temperature);
 		else
 			temp = le16_to_cpu(log->temp_sensor[channel - 1]);
-		*val = KELVIN_TO_MILLICELSIUS(temp);
+		*val = kelvin_to_millicelsius(temp);
 		break;
 	case hwmon_temp_alarm:
 		*val = !!(log->critical_warning & NVME_SMART_CRIT_TEMPERATURE);
_


* [patch 088/118] thermal: remove kelvin to/from Celsius conversion helpers from <linux/thermal.h>
  2020-01-31  6:10 incoming Andrew Morton
                   ` (86 preceding siblings ...)
  2020-01-31  6:15 ` [patch 087/118] nvme: hwmon: " Andrew Morton
@ 2020-01-31  6:15 ` Andrew Morton
  2020-01-31  6:16 ` [patch 089/118] iwlegacy: use <linux/units.h> helpers Andrew Morton
                   ` (29 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:15 UTC (permalink / raw)
  To: akinobu.mita, akpm, amit.kucheria, andy.shevchenko, andy, axboe,
	daniel.lezcano, dvhart, emmanuel.grumbach, hch, jdelvare, jic23,
	johannes.berg, Jonathan.Cameron, kbusch, knaack.h, kvalo, lars,
	linux-mm, linux, luciano.coelho, mm-commits, pmeerw, rui.zhang,
	sagi, sgruszka, sujith.thomas, torvalds

From: Akinobu Mita <akinobu.mita@gmail.com>
Subject: thermal: remove kelvin to/from Celsius conversion helpers from <linux/thermal.h>

This removes the kelvin to/from Celsius conversion helper macros from
<linux/thermal.h>, now that their users have been switched to the inline
helper functions in <linux/units.h>.

Link: http://lkml.kernel.org/r/1576386975-7941-9-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/thermal.h |   11 -----------
 1 file changed, 11 deletions(-)

--- a/include/linux/thermal.h~thermal-remove-kelvin-to-from-celsius-conversion-helpers-from-linux-thermalh
+++ a/include/linux/thermal.h
@@ -32,17 +32,6 @@
 /* use value, which < 0K, to indicate an invalid/uninitialized temperature */
 #define THERMAL_TEMP_INVALID	-274000
 
-/* Unit conversion macros */
-#define DECI_KELVIN_TO_CELSIUS(t)	({			\
-	long _t = (t);						\
-	((_t-2732 >= 0) ? (_t-2732+5)/10 : (_t-2732-5)/10);	\
-})
-#define CELSIUS_TO_DECI_KELVIN(t)	((t)*10+2732)
-#define DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET(t, off) (((t) - (off)) * 100)
-#define DECI_KELVIN_TO_MILLICELSIUS(t) DECI_KELVIN_TO_MILLICELSIUS_WITH_OFFSET(t, 2732)
-#define MILLICELSIUS_TO_DECI_KELVIN_WITH_OFFSET(t, off) (((t) / 100) + (off))
-#define MILLICELSIUS_TO_DECI_KELVIN(t) MILLICELSIUS_TO_DECI_KELVIN_WITH_OFFSET(t, 2732)


* [patch 089/118] iwlegacy: use <linux/units.h> helpers
  2020-01-31  6:10 incoming Andrew Morton
                   ` (87 preceding siblings ...)
  2020-01-31  6:15 ` [patch 088/118] thermal: remove kelvin to/from Celsius conversion helpers from <linux/thermal.h> Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 090/118] iwlwifi: " Andrew Morton
                   ` (28 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akinobu.mita, akpm, amit.kucheria, andy.shevchenko, andy, axboe,
	daniel.lezcano, dvhart, emmanuel.grumbach, hch, jdelvare, jic23,
	johannes.berg, kbusch, knaack.h, kvalo, lars, linux-mm, linux,
	luciano.coelho, mm-commits, pmeerw, rui.zhang, sagi, sfr,
	sgruszka, sujith.thomas, torvalds

From: Akinobu Mita <akinobu.mita@gmail.com>
Subject: iwlegacy: use <linux/units.h> helpers

This switches the iwlegacy driver to use celsius_to_kelvin() and
kelvin_to_celsius() in <linux/units.h>.
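
A user-space sketch of the two whole-degree helpers follows.
div_round_closest() is a hypothetical stand-in for the kernel's
DIV_ROUND_CLOSEST(), valid only for a positive divisor.

```c
#define ABSOLUTE_ZERO_MILLICELSIUS	(-273150)
#define MILLIDEGREE_PER_DEGREE		1000

/* Assumed stand-in for DIV_ROUND_CLOSEST(); only valid for d > 0. */
static inline long div_round_closest(long x, long d)
{
	return x >= 0 ? (x + d / 2) / d : (x - d / 2) / d;
}

/* The offset rounds to -273, so this is t - 273 in whole degrees. */
static inline long kelvin_to_celsius(long t)
{
	return t + div_round_closest(ABSOLUTE_ZERO_MILLICELSIUS,
				     MILLIDEGREE_PER_DEGREE);
}

static inline long celsius_to_kelvin(long t)
{
	return t - div_round_closest(ABSOLUTE_ZERO_MILLICELSIUS,
				     MILLIDEGREE_PER_DEGREE);
}
```

Because the rounded offset is a whole number of degrees, the two functions
are exact inverses of each other.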

[akinobu.mita@gmail.com: fix build warnings with format string]
  Link: http://lkml.kernel.org/r/1579014483-9226-1-git-send-email-akinobu.mita@gmail.com
  Link: https://lore.kernel.org/r/20200106171452.201c3b4c@canb.auug.org.au
Link: http://lkml.kernel.org/r/1576386975-7941-10-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/net/wireless/intel/iwlegacy/4965-mac.c |    3 +-
 drivers/net/wireless/intel/iwlegacy/4965.c     |   17 +++++++--------
 drivers/net/wireless/intel/iwlegacy/common.h   |    3 --
 3 files changed, 11 insertions(+), 12 deletions(-)

--- a/drivers/net/wireless/intel/iwlegacy/4965.c~iwlegacy-use-linux-unitsh-helpers
+++ a/drivers/net/wireless/intel/iwlegacy/4965.c
@@ -17,6 +17,7 @@
 #include <linux/sched.h>
 #include <linux/skbuff.h>
 #include <linux/netdevice.h>
+#include <linux/units.h>
 #include <net/mac80211.h>
 #include <linux/etherdevice.h>
 #include <asm/unaligned.h>
@@ -1104,7 +1105,7 @@ il4965_fill_txpower_tbl(struct il_priv *
 	/* get current temperature (Celsius) */
 	current_temp = max(il->temperature, IL_TX_POWER_TEMPERATURE_MIN);
 	current_temp = min(il->temperature, IL_TX_POWER_TEMPERATURE_MAX);
-	current_temp = KELVIN_TO_CELSIUS(current_temp);
+	current_temp = kelvin_to_celsius(current_temp);
 
 	/* select thermal txpower adjustment params, based on channel group
 	 *   (same frequency group used for mimo txatten adjustment) */
@@ -1610,8 +1611,8 @@ il4965_hw_get_temperature(struct il_priv
 	temperature =
 	    (temperature * 97) / 100 + TEMPERATURE_CALIB_KELVIN_OFFSET;
 
-	D_TEMP("Calibrated temperature: %dK, %dC\n", temperature,
-	       KELVIN_TO_CELSIUS(temperature));
+	D_TEMP("Calibrated temperature: %dK, %ldC\n", temperature,
+	       kelvin_to_celsius(temperature));
 
 	return temperature;
 }
@@ -1670,12 +1671,12 @@ il4965_temperature_calib(struct il_priv
 
 	if (il->temperature != temp) {
 		if (il->temperature)
-			D_TEMP("Temperature changed " "from %dC to %dC\n",
-			       KELVIN_TO_CELSIUS(il->temperature),
-			       KELVIN_TO_CELSIUS(temp));
+			D_TEMP("Temperature changed " "from %ldC to %ldC\n",
+			       kelvin_to_celsius(il->temperature),
+			       kelvin_to_celsius(temp));
 		else
-			D_TEMP("Temperature " "initialized to %dC\n",
-			       KELVIN_TO_CELSIUS(temp));
+			D_TEMP("Temperature " "initialized to %ldC\n",
+			       kelvin_to_celsius(temp));
 	}
 
 	il->temperature = temp;
--- a/drivers/net/wireless/intel/iwlegacy/4965-mac.c~iwlegacy-use-linux-unitsh-helpers
+++ a/drivers/net/wireless/intel/iwlegacy/4965-mac.c
@@ -27,6 +27,7 @@
 #include <linux/firmware.h>
 #include <linux/etherdevice.h>
 #include <linux/if_arp.h>
+#include <linux/units.h>
 
 #include <net/mac80211.h>
 
@@ -6468,7 +6469,7 @@ il4965_set_hw_params(struct il_priv *il)
 	il->hw_params.valid_rx_ant = il->cfg->valid_rx_ant;
 
 	il->hw_params.ct_kill_threshold =
-	   CELSIUS_TO_KELVIN(CT_KILL_THRESHOLD_LEGACY);
+	   celsius_to_kelvin(CT_KILL_THRESHOLD_LEGACY);
 
 	il->hw_params.sens = &il4965_sensitivity;
 	il->hw_params.beacon_time_tsf_bits = IL4965_EXT_BEACON_TIME_POS;
--- a/drivers/net/wireless/intel/iwlegacy/common.h~iwlegacy-use-linux-unitsh-helpers
+++ a/drivers/net/wireless/intel/iwlegacy/common.h
@@ -779,9 +779,6 @@ struct il_sensitivity_ranges {
 	u16 nrg_th_cca;
 };
 
-#define KELVIN_TO_CELSIUS(x) ((x)-273)
-#define CELSIUS_TO_KELVIN(x) ((x)+273)


* [patch 090/118] iwlwifi: use <linux/units.h> helpers
  2020-01-31  6:10 incoming Andrew Morton
                   ` (88 preceding siblings ...)
  2020-01-31  6:16 ` [patch 089/118] iwlegacy: use <linux/units.h> helpers Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 091/118] thermal: armada: remove unused TO_MCELSIUS macro Andrew Morton
                   ` (27 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akinobu.mita, akpm, amit.kucheria, andy.shevchenko, andy, axboe,
	daniel.lezcano, dvhart, emmanuel.grumbach, hch, jdelvare, jic23,
	johannes.berg, Jonathan.Cameron, kbusch, knaack.h, kvalo, lars,
	linux-mm, linux, luciano.coelho, mm-commits, pmeerw, rui.zhang,
	sagi, sgruszka, sujith.thomas, torvalds

From: Akinobu Mita <akinobu.mita@gmail.com>
Subject: iwlwifi: use <linux/units.h> helpers

This switches the iwlwifi driver to use celsius_to_kelvin() and
kelvin_to_celsius() in <linux/units.h>.

Link: http://lkml.kernel.org/r/1576386975-7941-11-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Luca Coelho <luciano.coelho@intel.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/net/wireless/intel/iwlwifi/dvm/dev.h     |    5 -----
 drivers/net/wireless/intel/iwlwifi/dvm/devices.c |    6 ++++--
 2 files changed, 4 insertions(+), 7 deletions(-)

--- a/drivers/net/wireless/intel/iwlwifi/dvm/dev.h~iwlwifi-use-linux-unitsh-helpers
+++ a/drivers/net/wireless/intel/iwlwifi/dvm/dev.h
@@ -237,11 +237,6 @@ struct iwl_sensitivity_ranges {
 	u16 nrg_th_cca;
 };
 
-
-#define KELVIN_TO_CELSIUS(x) ((x)-273)
-#define CELSIUS_TO_KELVIN(x) ((x)+273)
-
-
 /******************************************************************************
  *
  * Functions implemented in core module which are forward declared here
--- a/drivers/net/wireless/intel/iwlwifi/dvm/devices.c~iwlwifi-use-linux-unitsh-helpers
+++ a/drivers/net/wireless/intel/iwlwifi/dvm/devices.c
@@ -10,6 +10,8 @@
  *
  *****************************************************************************/
 
+#include <linux/units.h>
+
 /*
  * DVM device-specific data & functions
  */
@@ -345,7 +347,7 @@ static s32 iwl_temp_calib_to_offset(stru
 static void iwl5150_set_ct_threshold(struct iwl_priv *priv)
 {
 	const s32 volt2temp_coef = IWL_5150_VOLTAGE_TO_TEMPERATURE_COEFF;
-	s32 threshold = (s32)CELSIUS_TO_KELVIN(CT_KILL_THRESHOLD_LEGACY) -
+	s32 threshold = (s32)celsius_to_kelvin(CT_KILL_THRESHOLD_LEGACY) -
 			iwl_temp_calib_to_offset(priv);
 
 	priv->hw_params.ct_kill_threshold = threshold * volt2temp_coef;
@@ -381,7 +383,7 @@ static void iwl5150_temperature(struct i
 	vt = le32_to_cpu(priv->statistics.common.temperature);
 	vt = vt / IWL_5150_VOLTAGE_TO_TEMPERATURE_COEFF + offset;
 	/* now vt hold the temperature in Kelvin */
-	priv->temperature = KELVIN_TO_CELSIUS(vt);
+	priv->temperature = kelvin_to_celsius(vt);
 	iwl_tt_handler(priv);
 }
 
_


* [patch 091/118] thermal: armada: remove unused TO_MCELSIUS macro
  2020-01-31  6:10 incoming Andrew Morton
                   ` (89 preceding siblings ...)
  2020-01-31  6:16 ` [patch 090/118] iwlwifi: " Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 092/118] iio: adc: qcom-vadc-common: use <linux/units.h> helpers Andrew Morton
                   ` (26 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akinobu.mita, akpm, amit.kucheria, andy.shevchenko, andy, axboe,
	daniel.lezcano, dvhart, emmanuel.grumbach, hch, jdelvare, jic23,
	johannes.berg, Jonathan.Cameron, kbusch, knaack.h, kvalo, lars,
	linux-mm, linux, luciano.coelho, mm-commits, pmeerw, rui.zhang,
	sagi, sgruszka, sujith.thomas, torvalds

From: Akinobu Mita <akinobu.mita@gmail.com>
Subject: thermal: armada: remove unused TO_MCELSIUS macro

This removes the unused TO_MCELSIUS() macro.

Link: http://lkml.kernel.org/r/1576386975-7941-12-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/thermal/armada_thermal.c |    2 --
 1 file changed, 2 deletions(-)

--- a/drivers/thermal/armada_thermal.c~thermal-armada-remove-unused-to_mcelsius-macro
+++ a/drivers/thermal/armada_thermal.c
@@ -21,8 +21,6 @@
 
 #include "thermal_core.h"
 
-#define TO_MCELSIUS(c)			((c) * 1000)


* [patch 092/118] iio: adc: qcom-vadc-common: use <linux/units.h> helpers
  2020-01-31  6:10 incoming Andrew Morton
                   ` (90 preceding siblings ...)
  2020-01-31  6:16 ` [patch 091/118] thermal: armada: remove unused TO_MCELSIUS macro Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 093/118] lib/zlib: add s390 hardware support for kernel zlib_deflate Andrew Morton
                   ` (25 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akinobu.mita, akpm, amit.kucheria, andy.shevchenko, andy, axboe,
	daniel.lezcano, dvhart, emmanuel.grumbach, hch, jdelvare, jic23,
	johannes.berg, Jonathan.Cameron, kbusch, knaack.h, kvalo, lars,
	linux-mm, linux, luciano.coelho, mm-commits, pmeerw, rui.zhang,
	sagi, sgruszka, sujith.thomas, torvalds

From: Akinobu Mita <akinobu.mita@gmail.com>
Subject: iio: adc: qcom-vadc-common: use <linux/units.h> helpers

This switches qcom-vadc-common to use milli_kelvin_to_millicelsius()
in <linux/units.h>.
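
A minimal userspace sketch of the equivalence this relies on, assuming the
helper applies the same 273150 offset as the removed KELVINMIL_CELSIUSMIL
constant. The sketch_ names are hypothetical and are not the kernel
definitions.

```c
/* Hypothetical stand-in for the <linux/units.h> helper. */
#define SKETCH_ABSOLUTE_ZERO_MILLICELSIUS (-273150L)

static inline long sketch_milli_kelvin_to_millicelsius(long t)
{
	return t + SKETCH_ABSOLUTE_ZERO_MILLICELSIUS;
}

/* The open-coded form that the driver used before this patch. */
#define SKETCH_KELVINMIL_CELSIUSMIL 273150

static inline int sketch_old_scale(int millikelvin)
{
	millikelvin -= SKETCH_KELVINMIL_CELSIUSMIL;
	return millikelvin;
}
```

Both forms map 298150 mK to 25000 mC, so the switch is not expected to
change the values the driver reports.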

Link: http://lkml.kernel.org/r/1576386975-7941-13-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/iio/adc/qcom-vadc-common.c |    6 +++---
 drivers/iio/adc/qcom-vadc-common.h |    1 -
 2 files changed, 3 insertions(+), 4 deletions(-)

--- a/drivers/iio/adc/qcom-vadc-common.c~iio-adc-qcom-vadc-common-use-linux-unitsh-helpers
+++ a/drivers/iio/adc/qcom-vadc-common.c
@@ -6,6 +6,7 @@
 #include <linux/log2.h>
 #include <linux/err.h>
 #include <linux/module.h>
+#include <linux/units.h>
 
 #include "qcom-vadc-common.h"
 
@@ -236,8 +237,7 @@ static int qcom_vadc_scale_die_temp(cons
 		voltage = 0;
 	}
 
-	voltage -= KELVINMIL_CELSIUSMIL;
-	*result_mdec = voltage;
+	*result_mdec = milli_kelvin_to_millicelsius(voltage);
 
 	return 0;
 }
@@ -325,7 +325,7 @@ static int qcom_vadc_scale_hw_calib_die_
 {
 	*result_mdec = qcom_vadc_scale_code_voltage_factor(adc_code,
 				prescale, data, 2);
-	*result_mdec -= KELVINMIL_CELSIUSMIL;
+	*result_mdec = milli_kelvin_to_millicelsius(*result_mdec);
 
 	return 0;
 }
--- a/drivers/iio/adc/qcom-vadc-common.h~iio-adc-qcom-vadc-common-use-linux-unitsh-helpers
+++ a/drivers/iio/adc/qcom-vadc-common.h
@@ -38,7 +38,6 @@
 #define VADC_AVG_SAMPLES_MAX			512
 #define ADC5_AVG_SAMPLES_MAX			16
 
-#define KELVINMIL_CELSIUSMIL			273150
 #define PMIC5_CHG_TEMP_SCALE_FACTOR		377500
 #define PMIC5_SMB_TEMP_CONSTANT			419400
 #define PMIC5_SMB_TEMP_SCALE_FACTOR		356
_


* [patch 093/118] lib/zlib: add s390 hardware support for kernel zlib_deflate
  2020-01-31  6:10 incoming Andrew Morton
                   ` (91 preceding siblings ...)
  2020-01-31  6:16 ` [patch 092/118] iio: adc: qcom-vadc-common: use <linux/units.h> helpers Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 094/118] s390/boot: rename HEAP_SIZE due to name collision Andrew Morton
                   ` (24 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akpm, borntraeger, clm, dsterba, edward6, gor, heiko.carstens,
	iii, josef, linux-mm, mm-commits, rpurdie, torvalds, zaslonko

From: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Subject: lib/zlib: add s390 hardware support for kernel zlib_deflate

Patch series "S390 hardware support for kernel zlib", v3.

The IBM z15 mainframe introduces the new DFLTCC instruction.  It
implements the deflate algorithm in hardware (Nest Acceleration Unit - NXU)
with estimated compression and decompression performance orders of
magnitude faster than the current zlib.

This patchset adds s390 hardware compression support to kernel zlib.  The
code is based on the userspace zlib implementation:

	https://github.com/madler/zlib/pull/410

The coding style is also preserved for future maintainability.  Only a
limited set of userspace zlib functions is represented in the kernel.
Apart from that, all memory allocation must be performed in advance.
Thus, the workarea structures are extended with the parameter lists
required for the DEFLATE CONVERSION CALL instruction.
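
The preallocation constraint can be pictured with the following userspace
sketch. The struct layouts and sketch_ names are hypothetical; only the
two ideas are taken from the diff below: the hardware state is embedded in
the workspace at a doubleword-aligned offset, and the window is
over-allocated by one page so that a page-aligned portion can be used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real layouts live in defutil.h and dfltcc.h. */
struct sketch_sw_state {
	int level;
	unsigned pending;
};

struct sketch_hw_param {
	_Alignas(8) uint64_t words[4];	/* placeholder parameter block */
};

/* One preallocated workspace carries both the software and the hardware
 * state, so nothing has to be allocated once compression has started. */
struct sketch_workspace {
	struct sketch_sw_state sw;
	struct sketch_hw_param hw;
	unsigned char *window;
};

/* Compile-time check in the spirit of the static_assert in the patch. */
_Static_assert(offsetof(struct sketch_workspace, hw) % 8 == 0,
	       "hardware state must be doubleword aligned");

/* Round a pointer up to the next multiple of align (a power of two). */
static inline unsigned char *sketch_ptr_align(unsigned char *p, size_t align)
{
	uintptr_t v = (uintptr_t)p;

	return (unsigned char *)((v + align - 1) & ~(uintptr_t)(align - 1));
}
```

Rounding up can skip as many as align - 1 bytes, which is why the buffer
has to be one page larger than the window itself.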

Since kernel zlib itself does not support gzip headers, only the Adler-32
checksum is processed (it can also be produced by the DFLTCC facility).
As in the userspace implementation, kernel zlib will compress in hardware
on level 1 and in software on all other levels.  Decompression will always
happen in hardware (when enabled).

Two DFLTCC compression calls produce the same results only when both are
made on machines of the same generation and the respective buffers have
the same offset relative to the start of the page.  Therefore, care should
be taken when using hardware compression if reproducible results are
desired.  However, it always produces standard-conforming output that can
still be inflated.

The new kernel command line parameter 'dfltcc' is introduced to configure
s390 zlib hardware support:

    Format: { on | off | def_only | inf_only | always }
     on:       s390 zlib hardware support for compression on
               level 1 and decompression (default)
     off:      No s390 zlib hardware support
     def_only: s390 zlib hardware support for deflate
               only (compression on level 1)
     inf_only: s390 zlib hardware support for inflate
               only (decompression)
     always:   Same as 'on' but ignores the selected compression
               level always using hardware support (used for debugging)
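
The effect of each setting can be summarised in a small userspace sketch.
The enum and function names are hypothetical and this is not the parser
added by the series. Treating an unrecognised string as the default is an
assumption of the sketch, and the level rule follows the description
above.

```c
#include <string.h>

/* Hypothetical mode names; the accepted strings are the ones listed above. */
enum sketch_dfltcc_mode {
	SKETCH_DFLTCC_OFF,
	SKETCH_DFLTCC_ON,
	SKETCH_DFLTCC_DEF_ONLY,
	SKETCH_DFLTCC_INF_ONLY,
	SKETCH_DFLTCC_ALWAYS,
};

/* Map the parameter string to a mode. */
static inline enum sketch_dfltcc_mode sketch_parse_dfltcc(const char *arg)
{
	if (!arg)
		return SKETCH_DFLTCC_ON;
	if (!strcmp(arg, "off"))
		return SKETCH_DFLTCC_OFF;
	if (!strcmp(arg, "def_only"))
		return SKETCH_DFLTCC_DEF_ONLY;
	if (!strcmp(arg, "inf_only"))
		return SKETCH_DFLTCC_INF_ONLY;
	if (!strcmp(arg, "always"))
		return SKETCH_DFLTCC_ALWAYS;
	return SKETCH_DFLTCC_ON;	/* "on" and, by assumption, anything else */
}

/* Would compression at this level be handed to the hardware? */
static inline int sketch_hw_deflate(enum sketch_dfltcc_mode mode, int level)
{
	switch (mode) {
	case SKETCH_DFLTCC_ON:
	case SKETCH_DFLTCC_DEF_ONLY:
		return level == 1;
	case SKETCH_DFLTCC_ALWAYS:
		return 1;
	default:
		return 0;
	}
}

/* Would decompression be handed to the hardware? */
static inline int sketch_hw_inflate(enum sketch_dfltcc_mode mode)
{
	return mode == SKETCH_DFLTCC_ON ||
	       mode == SKETCH_DFLTCC_INF_ONLY ||
	       mode == SKETCH_DFLTCC_ALWAYS;
}
```

With "on", a level 6 deflate stays in software while inflate still uses
the hardware; "always" differs only in ignoring the compression level.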

The main purpose of integrating NXU support into the kernel zlib is the
use of hardware deflate in the btrfs filesystem with on-the-fly
compression enabled.  Apart from that, hardware support can also be used
during boot for decompressing the kernel or the ramdisk image.

With the patch for btrfs expanding zlib buffer from 1 to 4 pages (patch 6)
the following performance results have been achieved using the ramdisk
with btrfs.  These are relative numbers based on throughput rate and
compression ratio for zlib level 1:

  Input data              Deflate rate   Inflate rate   Compression ratio
                          NXU/Software   NXU/Software   NXU/Software
  stream of zeroes        1.46           1.02           1.00
  random ASCII data       10.44          3.00           0.96
  ASCII text (dickens)    6.21           3.33           0.94
  binary data (vmlinux)   8.37           3.90           1.02

This means that s390 hardware deflate can provide up to 10 times faster
compression (on level 1) and up to 4 times faster decompression (at all
compression levels) for btrfs zlib.

Disclaimer: Performance results are based on IBM internal tests using DD
command-line utility on btrfs on a Fedora 30 based internal driver in
native LPAR on a z15 system.  Results may vary based on individual
workload, configuration and software levels.


This patch (of 9):

Create zlib_dfltcc library with the s390 DEFLATE CONVERSION CALL
implementation and related compression functions.  Update zlib_deflate
functions with the hooks for s390 hardware support and adjust workspace
structures with extra parameter lists required for hardware deflate.
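
The hook mechanism can be sketched in miniature as follows. All sketch_
names and the placeholder return values are hypothetical; only the shape
is taken from the diff below: a hook that either handles the call and
reports a block state, or evaluates to 0 so that the existing software
path runs.

```c
/* Hypothetical miniature of the hook pattern; not the kernel's names. */
enum sketch_block_state { SKETCH_NEED_MORE, SKETCH_BLOCK_DONE };

struct sketch_stream {
	int level;		/* requested compression level */
	int hw_enabled;		/* stand-in for "facility present and enabled" */
};

/* Hardware path: returns 1 and fills *result when it handled the call,
 * 0 to let the caller fall back to software. */
static inline int sketch_hw_hook(struct sketch_stream *strm,
				 enum sketch_block_state *result)
{
	if (!strm->hw_enabled || strm->level != 1)
		return 0;
	*result = SKETCH_BLOCK_DONE;
	return 1;
}

/* Without hardware support the patch defines the hook as a constant 0;
 * building with SKETCH_NO_HW gives the same effect here. */
#ifdef SKETCH_NO_HW
#define SKETCH_DEFLATE_HOOK(strm, result) 0
#else
#define SKETCH_DEFLATE_HOOK(strm, result) sketch_hw_hook((strm), (result))
#endif

/* Placeholder for the existing software compressor. */
static inline enum sketch_block_state
sketch_sw_deflate(struct sketch_stream *strm)
{
	(void)strm;
	return SKETCH_NEED_MORE;
}

static inline enum sketch_block_state
sketch_deflate(struct sketch_stream *strm)
{
	enum sketch_block_state bstate;

	/* Same shape as the zlib_deflate() change: use the hook's result
	 * if it handled the call, otherwise run the software path. */
	bstate = SKETCH_DEFLATE_HOOK(strm, &bstate) ? bstate :
		 sketch_sw_deflate(strm);
	return bstate;
}
```

Because the fallback macro is a constant 0, a build without hardware
support can leave the hook branch out at compile time and behave as
before.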

Link: http://lkml.kernel.org/r/20200103223334.20669-2-zaslonko@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Co-developed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Eduard Shishkin <edward6@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/Kconfig                      |    7 
 lib/Makefile                     |    1 
 lib/zlib_deflate/deflate.c       |   79 ++++----
 lib/zlib_deflate/deftree.c       |   54 -----
 lib/zlib_deflate/defutil.h       |  134 +++++++++++++-
 lib/zlib_dfltcc/Makefile         |   11 +
 lib/zlib_dfltcc/dfltcc.c         |   52 +++++
 lib/zlib_dfltcc/dfltcc.h         |  115 ++++++++++++
 lib/zlib_dfltcc/dfltcc_deflate.c |  274 +++++++++++++++++++++++++++++
 lib/zlib_dfltcc/dfltcc_syms.c    |   17 +
 lib/zlib_dfltcc/dfltcc_util.h    |  110 +++++++++++
 11 files changed, 752 insertions(+), 102 deletions(-)

--- a/lib/Kconfig~lib-zlib-add-s390-hardware-support-for-kernel-zlib_deflate
+++ a/lib/Kconfig
@@ -278,6 +278,13 @@ config ZLIB_DEFLATE
 	tristate
 	select BITREVERSE
 
+config ZLIB_DFLTCC
+	def_bool y
+	depends on S390
+	prompt "Enable s390x DEFLATE CONVERSION CALL support for kernel zlib"
+	help
+	 Enable s390x hardware support for zlib in the kernel.
+
 config LZO_COMPRESS
 	tristate
 
--- a/lib/Makefile~lib-zlib-add-s390-hardware-support-for-kernel-zlib_deflate
+++ a/lib/Makefile
@@ -140,6 +140,7 @@ obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
 obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
 obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
+obj-$(CONFIG_ZLIB_DFLTCC) += zlib_dfltcc/
 obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
 obj-$(CONFIG_BCH) += bch.o
 obj-$(CONFIG_LZO_COMPRESS) += lzo/
--- a/lib/zlib_deflate/deflate.c~lib-zlib-add-s390-hardware-support-for-kernel-zlib_deflate
+++ a/lib/zlib_deflate/deflate.c
@@ -52,16 +52,18 @@
 #include <linux/zutil.h>
 #include "defutil.h"
 
+/* architecture-specific bits */
+#ifdef CONFIG_ZLIB_DFLTCC
+#  include "../zlib_dfltcc/dfltcc.h"
+#else
+#define DEFLATE_RESET_HOOK(strm) do {} while (0)
+#define DEFLATE_HOOK(strm, flush, bstate) 0
+#define DEFLATE_NEED_CHECKSUM(strm) 1
+#endif
 
 /* ===========================================================================
  *  Function prototypes.
  */
-typedef enum {
-    need_more,      /* block not completed, need more input or more output */
-    block_done,     /* block flush performed */
-    finish_started, /* finish started, need only more output at next deflate */
-    finish_done     /* finish done, accept no more input or output */
-} block_state;
 
 typedef block_state (*compress_func) (deflate_state *s, int flush);
 /* Compression function. Returns the block state after the call. */
@@ -72,7 +74,6 @@ static block_state deflate_fast   (defla
 static block_state deflate_slow   (deflate_state *s, int flush);
 static void lm_init        (deflate_state *s);
 static void putShortMSB    (deflate_state *s, uInt b);
-static void flush_pending  (z_streamp strm);
 static int read_buf        (z_streamp strm, Byte *buf, unsigned size);
 static uInt longest_match  (deflate_state *s, IPos cur_match);
 
@@ -98,6 +99,25 @@ static  void check_match (deflate_state
  * See deflate.c for comments about the MIN_MATCH+1.
  */
 
+/* Workspace to be allocated for deflate processing */
+typedef struct deflate_workspace {
+    /* State memory for the deflator */
+    deflate_state deflate_memory;
+#ifdef CONFIG_ZLIB_DFLTCC
+    /* State memory for s390 hardware deflate */
+    struct dfltcc_state dfltcc_memory;
+#endif
+    Byte *window_memory;
+    Pos *prev_memory;
+    Pos *head_memory;
+    char *overlay_memory;
+} deflate_workspace;
+
+#ifdef CONFIG_ZLIB_DFLTCC
+/* dfltcc_state must be doubleword aligned for DFLTCC call */
+static_assert(offsetof(struct deflate_workspace, dfltcc_memory) % 8 == 0);
+#endif
+
 /* Values for max_lazy_match, good_match and max_chain_length, depending on
  * the desired pack level (0..9). The values given below have been tuned to
  * exclude worst case performance for pathological files. Better values may be
@@ -207,7 +227,15 @@ int zlib_deflateInit2(
      */
     next = (char *) mem;
     next += sizeof(*mem);
+#ifdef CONFIG_ZLIB_DFLTCC
+    /*
+     *  DFLTCC requires the window to be page aligned.
+     *  Thus, we overallocate and take the aligned portion of the buffer.
+     */
+    mem->window_memory = (Byte *) PTR_ALIGN(next, PAGE_SIZE);
+#else
     mem->window_memory = (Byte *) next;
+#endif
     next += zlib_deflate_window_memsize(windowBits);
     mem->prev_memory = (Pos *) next;
     next += zlib_deflate_prev_memsize(windowBits);
@@ -277,6 +305,8 @@ int zlib_deflateReset(
     zlib_tr_init(s);
     lm_init(s);
 
+    DEFLATE_RESET_HOOK(strm);
+
     return Z_OK;
 }
 
@@ -294,35 +324,6 @@ static void putShortMSB(
     put_byte(s, (Byte)(b & 0xff));
 }   
 
-/* =========================================================================
- * Flush as much pending output as possible. All deflate() output goes
- * through this function so some applications may wish to modify it
- * to avoid allocating a large strm->next_out buffer and copying into it.
- * (See also read_buf()).
- */
-static void flush_pending(
-	z_streamp strm
-)
-{
-    deflate_state *s = (deflate_state *) strm->state;
-    unsigned len = s->pending;
-
-    if (len > strm->avail_out) len = strm->avail_out;
-    if (len == 0) return;
-
-    if (strm->next_out != NULL) {
-	memcpy(strm->next_out, s->pending_out, len);
-	strm->next_out += len;
-    }
-    s->pending_out += len;
-    strm->total_out += len;
-    strm->avail_out  -= len;
-    s->pending -= len;
-    if (s->pending == 0) {
-        s->pending_out = s->pending_buf;
-    }
-}
-
 /* ========================================================================= */
 int zlib_deflate(
 	z_streamp strm,
@@ -404,7 +405,8 @@ int zlib_deflate(
         (flush != Z_NO_FLUSH && s->status != FINISH_STATE)) {
         block_state bstate;
 
-	bstate = (*(configuration_table[s->level].func))(s, flush);
+	bstate = DEFLATE_HOOK(strm, flush, &bstate) ? bstate :
+		 (*(configuration_table[s->level].func))(s, flush);
 
         if (bstate == finish_started || bstate == finish_done) {
             s->status = FINISH_STATE;
@@ -503,7 +505,8 @@ static int read_buf(
 
     strm->avail_in  -= len;
 
-    if (!((deflate_state *)(strm->state))->noheader) {
+    if (!DEFLATE_NEED_CHECKSUM(strm)) {}
+    else if (!((deflate_state *)(strm->state))->noheader) {
         strm->adler = zlib_adler32(strm->adler, strm->next_in, len);
     }
     memcpy(buf, strm->next_in, len);
--- a/lib/zlib_deflate/deftree.c~lib-zlib-add-s390-hardware-support-for-kernel-zlib_deflate
+++ a/lib/zlib_deflate/deftree.c
@@ -76,11 +76,6 @@ static const uch bl_order[BL_CODES]
  * probability, to avoid transmitting the lengths for unused bit length codes.
  */
 
-#define Buf_size (8 * 2*sizeof(char))
-/* Number of bits used within bi_buf. (bi_buf might be implemented on
- * more than 16 bits on some systems.)
- */
-
 /* ===========================================================================
  * Local data. These are initialized only once.
  */
@@ -147,7 +142,6 @@ static void send_all_trees (deflate_stat
 static void compress_block (deflate_state *s, ct_data *ltree,
                            ct_data *dtree);
 static void set_data_type  (deflate_state *s);
-static void bi_windup      (deflate_state *s);
 static void bi_flush       (deflate_state *s);
 static void copy_block     (deflate_state *s, char *buf, unsigned len,
                            int header);
@@ -170,54 +164,6 @@ static void copy_block     (deflate_stat
  */
 
 /* ===========================================================================
- * Send a value on a given number of bits.
- * IN assertion: length <= 16 and value fits in length bits.
- */
-#ifdef DEBUG_ZLIB
-static void send_bits      (deflate_state *s, int value, int length);
-
-static void send_bits(
-	deflate_state *s,
-	int value,  /* value to send */
-	int length  /* number of bits */
-)
-{
-    Tracevv((stderr," l %2d v %4x ", length, value));
-    Assert(length > 0 && length <= 15, "invalid length");
-    s->bits_sent += (ulg)length;
-
-    /* If not enough room in bi_buf, use (valid) bits from bi_buf and
-     * (16 - bi_valid) bits from value, leaving (width - (16-bi_valid))
-     * unused bits in value.
-     */
-    if (s->bi_valid > (int)Buf_size - length) {
-        s->bi_buf |= (value << s->bi_valid);
-        put_short(s, s->bi_buf);
-        s->bi_buf = (ush)value >> (Buf_size - s->bi_valid);
-        s->bi_valid += length - Buf_size;
-    } else {
-        s->bi_buf |= value << s->bi_valid;
-        s->bi_valid += length;
-    }
-}
-#else /* !DEBUG_ZLIB */
-
-#define send_bits(s, value, length) \
-{ int len = length;\
-  if (s->bi_valid > (int)Buf_size - len) {\
-    int val = value;\
-    s->bi_buf |= (val << s->bi_valid);\
-    put_short(s, s->bi_buf);\
-    s->bi_buf = (ush)val >> (Buf_size - s->bi_valid);\
-    s->bi_valid += len - Buf_size;\
-  } else {\
-    s->bi_buf |= (value) << s->bi_valid;\
-    s->bi_valid += len;\
-  }\
-}
-#endif /* DEBUG_ZLIB */
-
-/* ===========================================================================
  * Initialize the various 'constant' tables. In a multi-threaded environment,
  * this function may be called by two threads concurrently, but this is
  * harmless since both invocations do exactly the same thing.
--- a/lib/zlib_deflate/defutil.h~lib-zlib-add-s390-hardware-support-for-kernel-zlib_deflate
+++ a/lib/zlib_deflate/defutil.h
@@ -1,5 +1,7 @@
+#ifndef DEFUTIL_H
+#define DEFUTIL_H
 
-
+#include <linux/zutil.h>
 
 #define Assert(err, str) 
 #define Trace(dummy) 
@@ -238,17 +240,13 @@ typedef struct deflate_state {
 
 } deflate_state;
 
-typedef struct deflate_workspace {
-    /* State memory for the deflator */
-    deflate_state deflate_memory;
-    Byte *window_memory;
-    Pos *prev_memory;
-    Pos *head_memory;
-    char *overlay_memory;
-} deflate_workspace;
-
+#ifdef CONFIG_ZLIB_DFLTCC
+#define zlib_deflate_window_memsize(windowBits) \
+	(2 * (1 << (windowBits)) * sizeof(Byte) + PAGE_SIZE)
+#else
 #define zlib_deflate_window_memsize(windowBits) \
 	(2 * (1 << (windowBits)) * sizeof(Byte))
+#endif
 #define zlib_deflate_prev_memsize(windowBits) \
 	((1 << (windowBits)) * sizeof(Pos))
 #define zlib_deflate_head_memsize(memLevel) \
@@ -293,6 +291,24 @@ void zlib_tr_stored_type_only (deflate_s
 }
 
 /* ===========================================================================
+ * Reverse the first len bits of a code, using straightforward code (a faster
+ * method would use a table)
+ * IN assertion: 1 <= len <= 15
+ */
+static inline unsigned  bi_reverse(
+    unsigned code, /* the value to invert */
+    int len        /* its bit length */
+)
+{
+    register unsigned res = 0;
+    do {
+        res |= code & 1;
+        code >>= 1, res <<= 1;
+    } while (--len > 0);
+    return res >> 1;
+}
+
+/* ===========================================================================
  * Flush the bit buffer, keeping at most 7 bits in it.
  */
 static inline void bi_flush(deflate_state *s)
@@ -325,3 +341,101 @@ static inline void bi_windup(deflate_sta
 #endif
 }
 
+typedef enum {
+    need_more,      /* block not completed, need more input or more output */
+    block_done,     /* block flush performed */
+    finish_started, /* finish started, need only more output at next deflate */
+    finish_done     /* finish done, accept no more input or output */
+} block_state;
+
+#define Buf_size (8 * 2*sizeof(char))
+/* Number of bits used within bi_buf. (bi_buf might be implemented on
+ * more than 16 bits on some systems.)
+ */
+
+/* ===========================================================================
+ * Send a value on a given number of bits.
+ * IN assertion: length <= 16 and value fits in length bits.
+ */
+#ifdef DEBUG_ZLIB
+static void send_bits      (deflate_state *s, int value, int length);
+
+static void send_bits(
+    deflate_state *s,
+    int value,  /* value to send */
+    int length  /* number of bits */
+)
+{
+    Tracevv((stderr," l %2d v %4x ", length, value));
+    Assert(length > 0 && length <= 15, "invalid length");
+    s->bits_sent += (ulg)length;
+
+    /* If not enough room in bi_buf, use (valid) bits from bi_buf and
+     * (16 - bi_valid) bits from value, leaving (width - (16-bi_valid))
+     * unused bits in value.
+     */
+    if (s->bi_valid > (int)Buf_size - length) {
+        s->bi_buf |= (value << s->bi_valid);
+        put_short(s, s->bi_buf);
+        s->bi_buf = (ush)value >> (Buf_size - s->bi_valid);
+        s->bi_valid += length - Buf_size;
+    } else {
+        s->bi_buf |= value << s->bi_valid;
+        s->bi_valid += length;
+    }
+}
+#else /* !DEBUG_ZLIB */
+
+#define send_bits(s, value, length) \
+{ int len = length;\
+  if (s->bi_valid > (int)Buf_size - len) {\
+    int val = value;\
+    s->bi_buf |= (val << s->bi_valid);\
+    put_short(s, s->bi_buf);\
+    s->bi_buf = (ush)val >> (Buf_size - s->bi_valid);\
+    s->bi_valid += len - Buf_size;\
+  } else {\
+    s->bi_buf |= (value) << s->bi_valid;\
+    s->bi_valid += len;\
+  }\
+}
+#endif /* DEBUG_ZLIB */
+
+static inline void zlib_tr_send_bits(
+    deflate_state *s,
+    int value,
+    int length
+)
+{
+    send_bits(s, value, length);
+}
+
+/* =========================================================================
+ * Flush as much pending output as possible. All deflate() output goes
+ * through this function so some applications may wish to modify it
+ * to avoid allocating a large strm->next_out buffer and copying into it.
+ * (See also read_buf()).
+ */
+static inline void flush_pending(
+	z_streamp strm
+)
+{
+    deflate_state *s = (deflate_state *) strm->state;
+    unsigned len = s->pending;
+
+    if (len > strm->avail_out) len = strm->avail_out;
+    if (len == 0) return;
+
+    if (strm->next_out != NULL) {
+	memcpy(strm->next_out, s->pending_out, len);
+	strm->next_out += len;
+    }
+    s->pending_out += len;
+    strm->total_out += len;
+    strm->avail_out  -= len;
+    s->pending -= len;
+    if (s->pending == 0) {
+        s->pending_out = s->pending_buf;
+    }
+}
+#endif /* DEFUTIL_H */
--- /dev/null
+++ a/lib/zlib_dfltcc/dfltcc.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: Zlib
+/* dfltcc.c - SystemZ DEFLATE CONVERSION CALL support. */
+
+#include <linux/zutil.h>
+#include "dfltcc_util.h"
+#include "dfltcc.h"
+
+char *oesc_msg(
+    char *buf,
+    int oesc
+)
+{
+    if (oesc == 0x00)
+        return NULL; /* Successful completion */
+    else {
+#ifdef STATIC
+        return NULL; /* Ignore for pre-boot decompressor */
+#else
+        sprintf(buf, "Operation-Ending-Supplemental Code is 0x%.2X", oesc);
+        return buf;
+#endif
+    }
+}
+
+void dfltcc_reset(
+    z_streamp strm,
+    uInt size
+)
+{
+    struct dfltcc_state *dfltcc_state =
+        (struct dfltcc_state *)((char *)strm->state + size);
+    struct dfltcc_qaf_param *param =
+        (struct dfltcc_qaf_param *)&dfltcc_state->param;
+
+    /* Initialize available functions */
+    if (is_dfltcc_enabled()) {
+        dfltcc(DFLTCC_QAF, param, NULL, NULL, NULL, NULL, NULL);
+        memmove(&dfltcc_state->af, param, sizeof(dfltcc_state->af));
+    } else
+        memset(&dfltcc_state->af, 0, sizeof(dfltcc_state->af));
+
+    /* Initialize parameter block */
+    memset(&dfltcc_state->param, 0, sizeof(dfltcc_state->param));
+    dfltcc_state->param.nt = 1;
+
+    /* Initialize tuning parameters */
+    dfltcc_state->level_mask = DFLTCC_LEVEL_MASK;
+    dfltcc_state->block_size = DFLTCC_BLOCK_SIZE;
+    dfltcc_state->block_threshold = DFLTCC_FIRST_FHT_BLOCK_SIZE;
+    dfltcc_state->dht_threshold = DFLTCC_DHT_MIN_SAMPLE_SIZE;
+    dfltcc_state->param.ribm = DFLTCC_RIBM;
+}
--- /dev/null
+++ a/lib/zlib_dfltcc/dfltcc_deflate.c
@@ -0,0 +1,274 @@
+// SPDX-License-Identifier: Zlib
+
+#include "../zlib_deflate/defutil.h"
+#include "dfltcc_util.h"
+#include "dfltcc.h"
+#include <linux/zutil.h>
+
+/*
+ * Compress.
+ */
+int dfltcc_can_deflate(
+    z_streamp strm
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+    struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
+
+    /* Unsupported compression settings */
+    if (!dfltcc_are_params_ok(state->level, state->w_bits, state->strategy,
+                              dfltcc_state->level_mask))
+        return 0;
+
+    /* Unsupported hardware */
+    if (!is_bit_set(dfltcc_state->af.fns, DFLTCC_GDHT) ||
+            !is_bit_set(dfltcc_state->af.fns, DFLTCC_CMPR) ||
+            !is_bit_set(dfltcc_state->af.fmts, DFLTCC_FMT0))
+        return 0;
+
+    return 1;
+}
+
+static void dfltcc_gdht(
+    z_streamp strm
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+    struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
+    size_t avail_in = strm->avail_in;
+
+    dfltcc(DFLTCC_GDHT,
+           param, NULL, NULL,
+           &strm->next_in, &avail_in, NULL);
+}
+
+static dfltcc_cc dfltcc_cmpr(
+    z_streamp strm
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+    struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
+    size_t avail_in = strm->avail_in;
+    size_t avail_out = strm->avail_out;
+    dfltcc_cc cc;
+
+    cc = dfltcc(DFLTCC_CMPR | HBT_CIRCULAR,
+                param, &strm->next_out, &avail_out,
+                &strm->next_in, &avail_in, state->window);
+    strm->total_in += (strm->avail_in - avail_in);
+    strm->total_out += (strm->avail_out - avail_out);
+    strm->avail_in = avail_in;
+    strm->avail_out = avail_out;
+    return cc;
+}
+
+static void send_eobs(
+    z_streamp strm,
+    const struct dfltcc_param_v0 *param
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+
+    zlib_tr_send_bits(
+          state,
+          bi_reverse(param->eobs >> (15 - param->eobl), param->eobl),
+          param->eobl);
+    flush_pending(strm);
+    if (state->pending != 0) {
+        /* The remaining data is located in pending_out[0:pending]. If someone
+         * calls put_byte() - this might happen in deflate() - the byte will be
+         * placed into pending_buf[pending], which is incorrect. Move the
+         * remaining data to the beginning of pending_buf so that put_byte() is
+         * usable again.
+         */
+        memmove(state->pending_buf, state->pending_out, state->pending);
+        state->pending_out = state->pending_buf;
+    }
+#ifdef ZLIB_DEBUG
+    state->compressed_len += param->eobl;
+#endif
+}
+
+int dfltcc_deflate(
+    z_streamp strm,
+    int flush,
+    block_state *result
+)
+{
+    deflate_state *state = (deflate_state *)strm->state;
+    struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
+    struct dfltcc_param_v0 *param = &dfltcc_state->param;
+    uInt masked_avail_in;
+    dfltcc_cc cc;
+    int need_empty_block;
+    int soft_bcc;
+    int no_flush;
+
+    if (!dfltcc_can_deflate(strm))
+        return 0;
+
+again:
+    masked_avail_in = 0;
+    soft_bcc = 0;
+    no_flush = flush == Z_NO_FLUSH;
+
+    /* Trailing empty block. Switch to software, except when Continuation Flag
+     * is set, which means that DFLTCC has buffered some output in the
+     * parameter block and needs to be called again in order to flush it.
+     */
+    if (flush == Z_FINISH && strm->avail_in == 0 && !param->cf) {
+        if (param->bcf) {
+            /* A block is still open, and the hardware does not support closing
+             * blocks without adding data. Thus, close it manually.
+             */
+            send_eobs(strm, param);
+            param->bcf = 0;
+        }
+        return 0;
+    }
+
+    if (strm->avail_in == 0 && !param->cf) {
+        *result = need_more;
+        return 1;
+    }
+
+    /* There is an open non-BFINAL block, we are not going to close it just
+     * yet, we have compressed more than DFLTCC_BLOCK_SIZE bytes and we see
+     * more than DFLTCC_DHT_MIN_SAMPLE_SIZE bytes. Open a new block with a new
+     * DHT in order to adapt to a possibly changed input data distribution.
+     */
+    if (param->bcf && no_flush &&
+            strm->total_in > dfltcc_state->block_threshold &&
+            strm->avail_in >= dfltcc_state->dht_threshold) {
+        if (param->cf) {
+            /* We need to flush the DFLTCC buffer before writing the
+             * End-of-block Symbol. Mask the input data and proceed as usual.
+             */
+            masked_avail_in += strm->avail_in;
+            strm->avail_in = 0;
+            no_flush = 0;
+        } else {
+            /* DFLTCC buffer is empty, so we can manually write the
+             * End-of-block Symbol right away.
+             */
+            send_eobs(strm, param);
+            param->bcf = 0;
+            dfltcc_state->block_threshold =
+                strm->total_in + dfltcc_state->block_size;
+            if (strm->avail_out == 0) {
+                *result = need_more;
+                return 1;
+            }
+        }
+    }
+
+    /* The caller gave us too much data. Pass only one block worth of
+     * uncompressed data to DFLTCC and mask the rest, so that on the next
+     * iteration we start a new block.
+     */
+    if (no_flush && strm->avail_in > dfltcc_state->block_size) {
+        masked_avail_in += (strm->avail_in - dfltcc_state->block_size);
+        strm->avail_in = dfltcc_state->block_size;
+    }
+
+    /* When we have an open non-BFINAL deflate block and caller indicates that
+     * the stream is ending, we need to close an open deflate block and open a
+     * BFINAL one.
+     */
+    need_empty_block = flush == Z_FINISH && param->bcf && !param->bhf;
+
+    /* Translate stream to parameter block */
+    param->cvt = CVT_ADLER32;
+    if (!no_flush)
+        /* We need to close a block. Always do this in software - when there is
+         * no input data, the hardware will not honor BCC. */
+        soft_bcc = 1;
+    if (flush == Z_FINISH && !param->bcf)
+        /* We are about to open a BFINAL block, set Block Header Final bit
+         * until the stream ends.
+         */
+        param->bhf = 1;
+    /* DFLTCC-CMPR will write to next_out, so make sure that buffers with
+     * higher precedence are empty.
+     */
+    Assert(state->pending == 0, "There must be no pending bytes");
+    Assert(state->bi_valid < 8, "There must be less than 8 pending bits");
+    param->sbb = (unsigned int)state->bi_valid;
+    if (param->sbb > 0)
+        *strm->next_out = (Byte)state->bi_buf;
+    if (param->hl)
+        param->nt = 0; /* Honor history */
+    param->cv = strm->adler;
+
+    /* When opening a block, choose a Huffman-Table Type */
+    if (!param->bcf) {
+        if (strm->total_in == 0 && dfltcc_state->block_threshold > 0) {
+            param->htt = HTT_FIXED;
+        }
+        else {
+            param->htt = HTT_DYNAMIC;
+            dfltcc_gdht(strm);
+        }
+    }
+
+    /* Deflate */
+    do {
+        cc = dfltcc_cmpr(strm);
+        if (strm->avail_in < 4096 && masked_avail_in > 0)
+            /* We are about to call DFLTCC with a small input buffer, which is
+             * inefficient. Since there is masked data, there will be at least
+             * one more DFLTCC call, so skip the current one and make the next
+             * one handle more data.
+             */
+            break;
+    } while (cc == DFLTCC_CC_AGAIN);
+
+    /* Translate parameter block to stream */
+    strm->msg = oesc_msg(dfltcc_state->msg, param->oesc);
+    state->bi_valid = param->sbb;
+    if (state->bi_valid == 0)
+        state->bi_buf = 0; /* Avoid accessing next_out */
+    else
+        state->bi_buf = *strm->next_out & ((1 << state->bi_valid) - 1);
+    strm->adler = param->cv;
+
+    /* Unmask the input data */
+    strm->avail_in += masked_avail_in;
+    masked_avail_in = 0;
+
+    /* If we encounter an error, it means there is a bug in the DFLTCC call */
+    Assert(cc != DFLTCC_CC_OP2_CORRUPT || param->oesc == 0, "BUG");
+
+    /* Update Block-Continuation Flag. It will be used to check whether to call
+     * GDHT the next time.
+     */
+    if (cc == DFLTCC_CC_OK) {
+        if (soft_bcc) {
+            send_eobs(strm, param);
+            param->bcf = 0;
+            dfltcc_state->block_threshold =
+                strm->total_in + dfltcc_state->block_size;
+        } else
+            param->bcf = 1;
+        if (flush == Z_FINISH) {
+            if (need_empty_block)
+                /* Make the current deflate() call also close the stream */
+                return 0;
+            else {
+                bi_windup(state);
+                *result = finish_done;
+            }
+        } else {
+            if (flush == Z_FULL_FLUSH)
+                param->hl = 0; /* Clear history */
+            *result = flush == Z_NO_FLUSH ? need_more : block_done;
+        }
+    } else {
+        param->bcf = 1;
+        *result = need_more;
+    }
+    if (strm->avail_in != 0 && strm->avail_out != 0)
+        goto again; /* deflate() must use all input or all output */
+    return 1;
+}
+
--- /dev/null
+++ a/lib/zlib_dfltcc/dfltcc.h
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: Zlib
+#ifndef DFLTCC_H
+#define DFLTCC_H
+
+#include "../zlib_deflate/defutil.h"
+
+/*
+ * Tuning parameters.
+ */
+#define DFLTCC_LEVEL_MASK 0x2 /* DFLTCC compression for level 1 only */
+#define DFLTCC_BLOCK_SIZE 1048576
+#define DFLTCC_FIRST_FHT_BLOCK_SIZE 4096
+#define DFLTCC_DHT_MIN_SAMPLE_SIZE 4096
+#define DFLTCC_RIBM 0
+
+/*
+ * Parameter Block for Query Available Functions.
+ */
+struct dfltcc_qaf_param {
+    char fns[16];
+    char reserved1[8];
+    char fmts[2];
+    char reserved2[6];
+};
+
+static_assert(sizeof(struct dfltcc_qaf_param) == 32);
+
+#define DFLTCC_FMT0 0
+
+/*
+ * Parameter Block for Generate Dynamic-Huffman Table, Compress and Expand.
+ */
+struct dfltcc_param_v0 {
+    uint16_t pbvn;                     /* Parameter-Block-Version Number */
+    uint8_t mvn;                       /* Model-Version Number */
+    uint8_t ribm;                      /* Reserved for IBM use */
+    unsigned reserved32 : 31;
+    unsigned cf : 1;                   /* Continuation Flag */
+    uint8_t reserved64[8];
+    unsigned nt : 1;                   /* New Task */
+    unsigned reserved129 : 1;
+    unsigned cvt : 1;                  /* Check Value Type */
+    unsigned reserved131 : 1;
+    unsigned htt : 1;                  /* Huffman-Table Type */
+    unsigned bcf : 1;                  /* Block-Continuation Flag */
+    unsigned bcc : 1;                  /* Block Closing Control */
+    unsigned bhf : 1;                  /* Block Header Final */
+    unsigned reserved136 : 1;
+    unsigned reserved137 : 1;
+    unsigned dhtgc : 1;                /* DHT Generation Control */
+    unsigned reserved139 : 5;
+    unsigned reserved144 : 5;
+    unsigned sbb : 3;                  /* Sub-Byte Boundary */
+    uint8_t oesc;                      /* Operation-Ending-Supplemental Code */
+    unsigned reserved160 : 12;
+    unsigned ifs : 4;                  /* Incomplete-Function Status */
+    uint16_t ifl;                      /* Incomplete-Function Length */
+    uint8_t reserved192[8];
+    uint8_t reserved256[8];
+    uint8_t reserved320[4];
+    uint16_t hl;                       /* History Length */
+    unsigned reserved368 : 1;
+    uint16_t ho : 15;                  /* History Offset */
+    uint32_t cv;                       /* Check Value */
+    unsigned eobs : 15;                /* End-of-block Symbol */
+    unsigned reserved431: 1;
+    uint8_t eobl : 4;                  /* End-of-block Length */
+    unsigned reserved436 : 12;
+    unsigned reserved448 : 4;
+    uint16_t cdhtl : 12;               /* Compressed-Dynamic-Huffman Table
+                                          Length */
+    uint8_t reserved464[6];
+    uint8_t cdht[288];
+    uint8_t reserved[32];
+    uint8_t csb[1152];
+};
+
+static_assert(sizeof(struct dfltcc_param_v0) == 1536);
+
+#define CVT_CRC32 0
+#define CVT_ADLER32 1
+#define HTT_FIXED 0
+#define HTT_DYNAMIC 1
+
+/*
+ *  Extension of inflate_state and deflate_state for DFLTCC.
+ */
+struct dfltcc_state {
+    struct dfltcc_param_v0 param;      /* Parameter block */
+    struct dfltcc_qaf_param af;        /* Available functions */
+    uLong level_mask;                  /* Levels on which to use DFLTCC */
+    uLong block_size;                  /* New block each X bytes */
+    uLong block_threshold;             /* New block after total_in > X */
+    uLong dht_threshold;               /* New block only if avail_in >= X */
+    char msg[64];                      /* Buffer for strm->msg */
+};
+
+/* Resides right after inflate_state or deflate_state */
+#define GET_DFLTCC_STATE(state) ((struct dfltcc_state *)((state) + 1))
+
+/* External functions */
+int dfltcc_can_deflate(z_streamp strm);
+int dfltcc_deflate(z_streamp strm,
+                   int flush,
+                   block_state *result);
+void dfltcc_reset(z_streamp strm, uInt size);
+
+#define DEFLATE_RESET_HOOK(strm) \
+    dfltcc_reset((strm), sizeof(deflate_state))
+
+#define DEFLATE_HOOK dfltcc_deflate
+
+#define DEFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_deflate((strm)))
+
+#endif /* DFLTCC_H */
--- /dev/null
+++ a/lib/zlib_dfltcc/dfltcc_syms.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * linux/lib/zlib_dfltcc/dfltcc_syms.c
+ *
+ * Exported symbols for the s390 zlib dfltcc support.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/zlib.h>
+#include "dfltcc.h"
+
+EXPORT_SYMBOL(dfltcc_can_deflate);
+EXPORT_SYMBOL(dfltcc_deflate);
+EXPORT_SYMBOL(dfltcc_reset);
+MODULE_LICENSE("GPL");
--- /dev/null
+++ a/lib/zlib_dfltcc/dfltcc_util.h
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: Zlib
+#ifndef DFLTCC_UTIL_H
+#define DFLTCC_UTIL_H
+
+#include <linux/zutil.h>
+#include <asm/facility.h>
+
+/*
+ * C wrapper for the DEFLATE CONVERSION CALL instruction.
+ */
+typedef enum {
+    DFLTCC_CC_OK = 0,
+    DFLTCC_CC_OP1_TOO_SHORT = 1,
+    DFLTCC_CC_OP2_TOO_SHORT = 2,
+    DFLTCC_CC_OP2_CORRUPT = 2,
+    DFLTCC_CC_AGAIN = 3,
+} dfltcc_cc;
+
+#define DFLTCC_QAF 0
+#define DFLTCC_GDHT 1
+#define DFLTCC_CMPR 2
+#define DFLTCC_XPND 4
+#define HBT_CIRCULAR (1 << 7)
+#define HB_BITS 15
+#define HB_SIZE (1 << HB_BITS)
+#define DFLTCC_FACILITY 151
+
+static inline dfltcc_cc dfltcc(
+    int fn,
+    void *param,
+    Byte **op1,
+    size_t *len1,
+    const Byte **op2,
+    size_t *len2,
+    void *hist
+)
+{
+    Byte *t2 = op1 ? *op1 : NULL;
+    size_t t3 = len1 ? *len1 : 0;
+    const Byte *t4 = op2 ? *op2 : NULL;
+    size_t t5 = len2 ? *len2 : 0;
+    register int r0 __asm__("r0") = fn;
+    register void *r1 __asm__("r1") = param;
+    register Byte *r2 __asm__("r2") = t2;
+    register size_t r3 __asm__("r3") = t3;
+    register const Byte *r4 __asm__("r4") = t4;
+    register size_t r5 __asm__("r5") = t5;
+    int cc;
+
+    __asm__ volatile(
+                     ".insn rrf,0xb9390000,%[r2],%[r4],%[hist],0\n"
+                     "ipm %[cc]\n"
+                     : [r2] "+r" (r2)
+                     , [r3] "+r" (r3)
+                     , [r4] "+r" (r4)
+                     , [r5] "+r" (r5)
+                     , [cc] "=r" (cc)
+                     : [r0] "r" (r0)
+                     , [r1] "r" (r1)
+                     , [hist] "r" (hist)
+                     : "cc", "memory");
+    t2 = r2; t3 = r3; t4 = r4; t5 = r5;
+
+    if (op1)
+        *op1 = t2;
+    if (len1)
+        *len1 = t3;
+    if (op2)
+        *op2 = t4;
+    if (len2)
+        *len2 = t5;
+    return (cc >> 28) & 3;
+}
+
+static inline int is_bit_set(
+    const char *bits,
+    int n
+)
+{
+    return bits[n / 8] & (1 << (7 - (n % 8)));
+}
+
+static inline void turn_bit_off(
+    char *bits,
+    int n
+)
+{
+    bits[n / 8] &= ~(1 << (7 - (n % 8)));
+}
+
+static inline int dfltcc_are_params_ok(
+    int level,
+    uInt window_bits,
+    int strategy,
+    uLong level_mask
+)
+{
+    return (level_mask & (1 << level)) != 0 &&
+        (window_bits == HB_BITS) &&
+        (strategy == Z_DEFAULT_STRATEGY);
+}
+
+static inline int is_dfltcc_enabled(void)
+{
+    return test_facility(DFLTCC_FACILITY);
+}
+
+char *oesc_msg(char *buf, int oesc);
+
+#endif /* DFLTCC_UTIL_H */
--- /dev/null
+++ a/lib/zlib_dfltcc/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# This is a modified version of zlib, which does all memory
+# allocation ahead of time.
+#
+# This is the code for s390 zlib hardware support.
+#
+
+obj-$(CONFIG_ZLIB_DFLTCC) += zlib_dfltcc.o
+
+zlib_dfltcc-objs := dfltcc.o dfltcc_deflate.o dfltcc_syms.o
_
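
For readers following send_eobs() in the patch above, here is a minimal
sketch, not part of the patch, of how the End-of-block Symbol appears to be
prepared for the bit buffer.  It assumes, going by the ">> (15 - eobl)" shift,
that the code sits left-aligned in the 15-bit eobs field.  reverse_bits() and
eob_code_for_bitbuf() are hypothetical names; reverse_bits() mirrors the loop
in bi_reverse().

```c
#include <assert.h>

/* Reverse the low `len` bits of `code`; same loop shape as bi_reverse(). */
static unsigned reverse_bits(unsigned code, int len)
{
    unsigned res = 0;

    assert(len >= 1 && len <= 15);
    do {
        res |= code & 1;   /* copy the current lowest bit */
        code >>= 1;
        res <<= 1;         /* make room for the next bit */
    } while (--len > 0);
    return res >> 1;       /* undo the final extra shift */
}

/*
 * eobs: 15-bit field with the code assumed left-aligned.
 * eobl: code length in bits.
 * Returns the value in the bit order the LSB-first bit buffer expects.
 */
static unsigned eob_code_for_bitbuf(unsigned eobs, int eobl)
{
    return reverse_bits(eobs >> (15 - eobl), eobl);
}
```

With eobs = 0x4000 and eobl = 3 the extracted code is binary 100, and the
value handed to the bit buffer is binary 001.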


* [patch 094/118] s390/boot: rename HEAP_SIZE due to name collision
  2020-01-31  6:10 incoming Andrew Morton
                   ` (92 preceding siblings ...)
  2020-01-31  6:16 ` [patch 093/118] lib/zlib: add s390 hardware support for kernel zlib_deflate Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 095/118] lib/zlib: add s390 hardware support for kernel zlib_inflate Andrew Morton
                   ` (23 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akpm, borntraeger, linux-mm, mm-commits, torvalds, zaslonko

From: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Subject: s390/boot: rename HEAP_SIZE due to name collision

Change the conflicting macro name in preparation for zlib_inflate hardware
support.
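
A small sketch, not part of the patch, of why the rename helps.  It assumes
the collision comes from the zlib deflate headers, which are understood to
define their own HEAP_SIZE for the Huffman heap (2*L_CODES+1), and which get
pulled into the boot decompressor once the DFLTCC code is included.  The
definitions below are stand-ins, and boot_heap_end() and
huffman_heap_entries() are hypothetical helpers.

```c
#include <assert.h>

/* Stand-in for the zlib definition (assumed): Huffman heap entry count. */
#define L_CODES   286              /* 256 literals + end-of-block + 29 lengths */
#define HEAP_SIZE (2 * L_CODES + 1)

/* Boot decompressor heap size under its new, non-conflicting name. */
#define BOOT_HEAP_SIZE 0x10000

/* End of the boot heap, given the address where the image ends. */
static unsigned long boot_heap_end(unsigned long end)
{
    assert(end + BOOT_HEAP_SIZE > end);   /* guard against wrap-around */
    return end + BOOT_HEAP_SIZE;
}

/* Both meanings can now be used in one translation unit. */
static int huffman_heap_entries(void)
{
    return HEAP_SIZE;
}
```

With a single shared name, one of the two definitions would silently win or
trigger a redefinition warning, depending on include order.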

Link: http://lkml.kernel.org/r/20200103223334.20669-3-zaslonko@linux.ibm.com
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/s390/boot/compressed/decompressor.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/arch/s390/boot/compressed/decompressor.c~s390-boot-rename-heap_size-due-to-name-collision
+++ a/arch/s390/boot/compressed/decompressor.c
@@ -30,13 +30,13 @@ extern unsigned char _compressed_start[]
 extern unsigned char _compressed_end[];
 
 #ifdef CONFIG_HAVE_KERNEL_BZIP2
-#define HEAP_SIZE	0x400000
+#define BOOT_HEAP_SIZE	0x400000
 #else
-#define HEAP_SIZE	0x10000
+#define BOOT_HEAP_SIZE	0x10000
 #endif
 
 static unsigned long free_mem_ptr = (unsigned long) _end;
-static unsigned long free_mem_end_ptr = (unsigned long) _end + HEAP_SIZE;
+static unsigned long free_mem_end_ptr = (unsigned long) _end + BOOT_HEAP_SIZE;
 
 #ifdef CONFIG_KERNEL_GZIP
 #include "../../../../lib/decompress_inflate.c"
@@ -62,7 +62,7 @@ static unsigned long free_mem_end_ptr =
 #include "../../../../lib/decompress_unxz.c"
 #endif
 
-#define decompress_offset ALIGN((unsigned long)_end + HEAP_SIZE, PAGE_SIZE)
+#define decompress_offset ALIGN((unsigned long)_end + BOOT_HEAP_SIZE, PAGE_SIZE)
 
 unsigned long mem_safe_offset(void)
 {
_
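
As an aside on the decompress_offset definition touched above: a minimal
sketch, not part of the patch, of the round-up it performs.  align_up() is a
hypothetical stand-in for the kernel's ALIGN() and assumes a power-of-two
alignment; toy_decompress_offset() is likewise illustrative, with the page
size passed in rather than taken from PAGE_SIZE.

```c
#include <assert.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* First page-aligned address past the boot heap. */
static unsigned long toy_decompress_offset(unsigned long end,
                                           unsigned long heap_size,
                                           unsigned long page_size)
{
    return align_up(end + heap_size, page_size);
}
```

An address that is already aligned is returned unchanged; anything else moves
up to the next page boundary.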


* [patch 095/118] lib/zlib: add s390 hardware support for kernel zlib_inflate
  2020-01-31  6:10 incoming Andrew Morton
                   ` (93 preceding siblings ...)
  2020-01-31  6:16 ` [patch 094/118] s390/boot: rename HEAP_SIZE due to name collision Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 096/118] s390/boot: add dfltcc= kernel command line parameter Andrew Morton
                   ` (22 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akpm, borntraeger, clm, dsterba, edward6, gor, heiko.carstens,
	iii, josef, linux-mm, mm-commits, rpurdie, torvalds, zaslonko

From: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Subject: lib/zlib: add s390 hardware support for kernel zlib_inflate

Add decompression functions to zlib_dfltcc library.  Update zlib_inflate
functions with the hooks for s390 hardware support and adjust workspace
structures with extra parameter lists required for hardware inflate
decompression.
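
To illustrate how such a hook can steer the inflate state machine, here is a
toy sketch, not the patch's code.  It models a hook that returns one of three
actions: continue the loop with the mode it has set, leave the loop, or fall
through to the software path.  All names (toy_state, toy_hook, toy_inflate,
the MODE_* and ACTION_* values) are hypothetical.

```c
#include <assert.h>

typedef enum { ACTION_CONTINUE, ACTION_BREAK, ACTION_SOFTWARE } hook_action;

enum toy_mode { MODE_TYPEDO, MODE_CHECK, MODE_DONE };

struct toy_state {
    enum toy_mode mode;
    int hw_available;    /* stands in for a "can inflate in hardware" test */
    int force_software;  /* stands in for a request hardware cannot serve */
    int hw_calls;
    int sw_calls;
};

/* Toy hook: decides what the state machine should do next. */
static hook_action toy_hook(struct toy_state *s, int input_left)
{
    assert(s);
    if (s->force_software)
        return ACTION_SOFTWARE;  /* let the software path handle it */
    if (!input_left)
        return ACTION_BREAK;     /* nothing to do until more input arrives */
    s->hw_calls++;
    s->mode = MODE_CHECK;        /* pretend hardware finished the data */
    return ACTION_CONTINUE;
}

/* Returns 1 when the stream is done, 0 when it had to leave early. */
static int toy_inflate(struct toy_state *s, int input_left)
{
    for (;;) {
        switch (s->mode) {
        case MODE_TYPEDO:
            if (s->hw_available) {
                hook_action a = toy_hook(s, input_left);

                if (a == ACTION_CONTINUE)
                    break;       /* leave the switch, re-dispatch on mode */
                if (a == ACTION_BREAK)
                    return 0;    /* leave the loop entirely */
                /* ACTION_SOFTWARE: fall through to the software path */
            }
            s->sw_calls++;
            s->mode = MODE_CHECK;
            break;
        case MODE_CHECK:
            s->mode = MODE_DONE;
            break;
        case MODE_DONE:
            return 1;
        }
    }
}
```

The point of the three-way result is that the hook can take over a state
entirely, defer to the caller for more input or output, or decline and let the
unmodified software code run.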

Link: http://lkml.kernel.org/r/20200103223334.20669-4-zaslonko@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Co-developed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Eduard Shishkin <edward6@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/decompress_inflate.c         |   13 ++
 lib/zlib_dfltcc/Makefile         |    2 
 lib/zlib_dfltcc/dfltcc.h         |   28 +++++
 lib/zlib_dfltcc/dfltcc_inflate.c |  143 +++++++++++++++++++++++++++++
 lib/zlib_inflate/inflate.c       |   32 ++++--
 lib/zlib_inflate/inflate.h       |    8 +
 lib/zlib_inflate/infutil.h       |   18 +++
 7 files changed, 233 insertions(+), 11 deletions(-)

--- a/lib/decompress_inflate.c~lib-zlib-add-s390-hardware-support-for-kernel-zlib_inflate
+++ a/lib/decompress_inflate.c
@@ -10,6 +10,10 @@
 #include "zlib_inflate/inftrees.c"
 #include "zlib_inflate/inffast.c"
 #include "zlib_inflate/inflate.c"
+#ifdef CONFIG_ZLIB_DFLTCC
+#include "zlib_dfltcc/dfltcc.c"
+#include "zlib_dfltcc/dfltcc_inflate.c"
+#endif
 
 #else /* STATIC */
 /* initramfs et al: linked */
@@ -76,7 +80,12 @@ STATIC int INIT __gunzip(unsigned char *
 	}
 
 	strm->workspace = malloc(flush ? zlib_inflate_workspacesize() :
+#ifdef CONFIG_ZLIB_DFLTCC
+	/* Always allocate the full workspace for DFLTCC */
+				 zlib_inflate_workspacesize());
+#else
 				 sizeof(struct inflate_state));
+#endif
 	if (strm->workspace == NULL) {
 		error("Out of memory while allocating workspace");
 		goto gunzip_nomem4;
@@ -123,10 +132,14 @@ STATIC int INIT __gunzip(unsigned char *
 
 	rc = zlib_inflateInit2(strm, -MAX_WBITS);
 
+#ifdef CONFIG_ZLIB_DFLTCC
+	/* Always keep the window for DFLTCC */
+#else
 	if (!flush) {
 		WS(strm)->inflate_state.wsize = 0;
 		WS(strm)->inflate_state.window = NULL;
 	}
+#endif
 
 	while (rc == Z_OK) {
 		if (strm->avail_in == 0) {
--- a/lib/zlib_dfltcc/dfltcc.h~lib-zlib-add-s390-hardware-support-for-kernel-zlib_inflate
+++ a/lib/zlib_dfltcc/dfltcc.h
@@ -104,6 +104,14 @@ int dfltcc_deflate(z_streamp strm,
                    int flush,
                    block_state *result);
 void dfltcc_reset(z_streamp strm, uInt size);
+int dfltcc_can_inflate(z_streamp strm);
+typedef enum {
+    DFLTCC_INFLATE_CONTINUE,
+    DFLTCC_INFLATE_BREAK,
+    DFLTCC_INFLATE_SOFTWARE,
+} dfltcc_inflate_action;
+dfltcc_inflate_action dfltcc_inflate(z_streamp strm,
+                                     int flush, int *ret);
 
 #define DEFLATE_RESET_HOOK(strm) \
     dfltcc_reset((strm), sizeof(deflate_state))
@@ -112,4 +120,24 @@ void dfltcc_reset(z_streamp strm, uInt s
 
 #define DEFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_deflate((strm)))
 
+#define INFLATE_RESET_HOOK(strm) \
+    dfltcc_reset((strm), sizeof(struct inflate_state))
+
+#define INFLATE_TYPEDO_HOOK(strm, flush) \
+    if (dfltcc_can_inflate((strm))) { \
+        dfltcc_inflate_action action; \
+\
+        RESTORE(); \
+        action = dfltcc_inflate((strm), (flush), &ret); \
+        LOAD(); \
+        if (action == DFLTCC_INFLATE_CONTINUE) \
+            break; \
+        else if (action == DFLTCC_INFLATE_BREAK) \
+            goto inf_leave; \
+    }
+
+#define INFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_inflate((strm)))
+
+#define INFLATE_NEED_UPDATEWINDOW(strm) (!dfltcc_can_inflate((strm)))
+
 #endif /* DFLTCC_H */
--- /dev/null
+++ a/lib/zlib_dfltcc/dfltcc_inflate.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: Zlib
+
+#include "../zlib_inflate/inflate.h"
+#include "dfltcc_util.h"
+#include "dfltcc.h"
+#include <linux/zutil.h>
+
+/*
+ * Expand.
+ */
+int dfltcc_can_inflate(
+    z_streamp strm
+)
+{
+    struct inflate_state *state = (struct inflate_state *)strm->state;
+    struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
+
+    /* Unsupported compression settings */
+    if (state->wbits != HB_BITS)
+        return 0;
+
+    /* Unsupported hardware */
+    return is_bit_set(dfltcc_state->af.fns, DFLTCC_XPND) &&
+               is_bit_set(dfltcc_state->af.fmts, DFLTCC_FMT0);
+}
+
+static int dfltcc_was_inflate_used(
+    z_streamp strm
+)
+{
+    struct inflate_state *state = (struct inflate_state *)strm->state;
+    struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
+
+    return !param->nt;
+}
+
+static int dfltcc_inflate_disable(
+    z_streamp strm
+)
+{
+    struct inflate_state *state = (struct inflate_state *)strm->state;
+    struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
+
+    if (!dfltcc_can_inflate(strm))
+        return 0;
+    if (dfltcc_was_inflate_used(strm))
+        /* DFLTCC has already decompressed some data. Since there is not
+         * enough information to resume decompression in software, the call
+         * must fail.
+         */
+        return 1;
+    /* DFLTCC was not used yet - decompress in software */
+    memset(&dfltcc_state->af, 0, sizeof(dfltcc_state->af));
+    return 0;
+}
+
+static dfltcc_cc dfltcc_xpnd(
+    z_streamp strm
+)
+{
+    struct inflate_state *state = (struct inflate_state *)strm->state;
+    struct dfltcc_param_v0 *param = &GET_DFLTCC_STATE(state)->param;
+    size_t avail_in = strm->avail_in;
+    size_t avail_out = strm->avail_out;
+    dfltcc_cc cc;
+
+    cc = dfltcc(DFLTCC_XPND | HBT_CIRCULAR,
+                param, &strm->next_out, &avail_out,
+                &strm->next_in, &avail_in, state->window);
+    strm->avail_in = avail_in;
+    strm->avail_out = avail_out;
+    return cc;
+}
+
+dfltcc_inflate_action dfltcc_inflate(
+    z_streamp strm,
+    int flush,
+    int *ret
+)
+{
+    struct inflate_state *state = (struct inflate_state *)strm->state;
+    struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
+    struct dfltcc_param_v0 *param = &dfltcc_state->param;
+    dfltcc_cc cc;
+
+    if (flush == Z_BLOCK) {
+        /* DFLTCC does not support stopping on block boundaries */
+        if (dfltcc_inflate_disable(strm)) {
+            *ret = Z_STREAM_ERROR;
+            return DFLTCC_INFLATE_BREAK;
+        } else
+            return DFLTCC_INFLATE_SOFTWARE;
+    }
+
+    if (state->last) {
+        if (state->bits != 0) {
+            strm->next_in++;
+            strm->avail_in--;
+            state->bits = 0;
+        }
+        state->mode = CHECK;
+        return DFLTCC_INFLATE_CONTINUE;
+    }
+
+    if (strm->avail_in == 0 && !param->cf)
+        return DFLTCC_INFLATE_BREAK;
+
+    if (!state->window || state->wsize == 0) {
+        state->mode = MEM;
+        return DFLTCC_INFLATE_CONTINUE;
+    }
+
+    /* Translate stream to parameter block */
+    param->cvt = CVT_ADLER32;
+    param->sbb = state->bits;
+    param->hl = state->whave; /* Software and hardware history formats match */
+    param->ho = (state->write - state->whave) & ((1 << HB_BITS) - 1);
+    if (param->hl)
+        param->nt = 0; /* Honor history for the first block */
+    param->cv = state->flags ? REVERSE(state->check) : state->check;
+
+    /* Inflate */
+    do {
+        cc = dfltcc_xpnd(strm);
+    } while (cc == DFLTCC_CC_AGAIN);
+
+    /* Translate parameter block to stream */
+    strm->msg = oesc_msg(dfltcc_state->msg, param->oesc);
+    state->last = cc == DFLTCC_CC_OK;
+    state->bits = param->sbb;
+    state->whave = param->hl;
+    state->write = (param->ho + param->hl) & ((1 << HB_BITS) - 1);
+    state->check = state->flags ? REVERSE(param->cv) : param->cv;
+    if (cc == DFLTCC_CC_OP2_CORRUPT && param->oesc != 0) {
+        /* Report an error if stream is corrupted */
+        state->mode = BAD;
+        return DFLTCC_INFLATE_CONTINUE;
+    }
+    state->mode = TYPEDO;
+    /* Break if operands are exhausted, otherwise continue looping */
+    return (cc == DFLTCC_CC_OP1_TOO_SHORT || cc == DFLTCC_CC_OP2_TOO_SHORT) ?
+        DFLTCC_INFLATE_BREAK : DFLTCC_INFLATE_CONTINUE;
+}
--- a/lib/zlib_dfltcc/Makefile~lib-zlib-add-s390-hardware-support-for-kernel-zlib_inflate
+++ a/lib/zlib_dfltcc/Makefile
@@ -8,4 +8,4 @@
 
 obj-$(CONFIG_ZLIB_DFLTCC) += zlib_dfltcc.o
 
-zlib_dfltcc-objs := dfltcc.o dfltcc_deflate.o dfltcc_syms.o
+zlib_dfltcc-objs := dfltcc.o dfltcc_deflate.o dfltcc_inflate.o dfltcc_syms.o
--- a/lib/zlib_inflate/inflate.c~lib-zlib-add-s390-hardware-support-for-kernel-zlib_inflate
+++ a/lib/zlib_inflate/inflate.c
@@ -15,6 +15,16 @@
 #include "inffast.h"
 #include "infutil.h"
 
+/* architecture-specific bits */
+#ifdef CONFIG_ZLIB_DFLTCC
+#  include "../zlib_dfltcc/dfltcc.h"
+#else
+#define INFLATE_RESET_HOOK(strm) do {} while (0)
+#define INFLATE_TYPEDO_HOOK(strm, flush) do {} while (0)
+#define INFLATE_NEED_UPDATEWINDOW(strm) 1
+#define INFLATE_NEED_CHECKSUM(strm) 1
+#endif
+
 int zlib_inflate_workspacesize(void)
 {
     return sizeof(struct inflate_workspace);
@@ -42,6 +52,7 @@ int zlib_inflateReset(z_streamp strm)
     state->write = 0;
     state->whave = 0;
 
+    INFLATE_RESET_HOOK(strm);
     return Z_OK;
 }
 
@@ -66,7 +77,15 @@ int zlib_inflateInit2(z_streamp strm, in
         return Z_STREAM_ERROR;
     }
     state->wbits = (unsigned)windowBits;
+#ifdef CONFIG_ZLIB_DFLTCC
+    /*
+     * DFLTCC requires the window to be page aligned.
+     * Thus, we overallocate and take the aligned portion of the buffer.
+     */
+    state->window = PTR_ALIGN(&WS(strm)->working_window[0], PAGE_SIZE);
+#else
     state->window = &WS(strm)->working_window[0];
+#endif
 
     return zlib_inflateReset(strm);
 }
@@ -227,11 +246,6 @@ static int zlib_inflateSyncPacket(z_stre
         bits -= bits & 7; \
     } while (0)
 
-/* Reverse the bytes in a 32-bit value */
-#define REVERSE(q) \
-    ((((q) >> 24) & 0xff) + (((q) >> 8) & 0xff00) + \
-     (((q) & 0xff00) << 8) + (((q) & 0xff) << 24))
-
 /*
    inflate() uses a state machine to process as much input data and generate as
    much output data as possible before returning.  The state machine is
@@ -395,6 +409,7 @@ int zlib_inflate(z_streamp strm, int flu
             if (flush == Z_BLOCK) goto inf_leave;
 	    /* fall through */
         case TYPEDO:
+            INFLATE_TYPEDO_HOOK(strm, flush);
             if (state->last) {
                 BYTEBITS();
                 state->mode = CHECK;
@@ -692,7 +707,7 @@ int zlib_inflate(z_streamp strm, int flu
                 out -= left;
                 strm->total_out += out;
                 state->total += out;
-                if (out)
+                if (INFLATE_NEED_CHECKSUM(strm) && out)
                     strm->adler = state->check =
                         UPDATE(state->check, put - out, out);
                 out = left;
@@ -726,7 +741,8 @@ int zlib_inflate(z_streamp strm, int flu
      */
   inf_leave:
     RESTORE();
-    if (state->wsize || (state->mode < CHECK && out != strm->avail_out))
+    if (INFLATE_NEED_UPDATEWINDOW(strm) &&
+            (state->wsize || (state->mode < CHECK && out != strm->avail_out)))
         zlib_updatewindow(strm, out);
 
     in -= strm->avail_in;
@@ -734,7 +750,7 @@ int zlib_inflate(z_streamp strm, int flu
     strm->total_in += in;
     strm->total_out += out;
     state->total += out;
-    if (state->wrap && out)
+    if (INFLATE_NEED_CHECKSUM(strm) && state->wrap && out)
         strm->adler = state->check =
             UPDATE(state->check, strm->next_out - out, out);
 
--- a/lib/zlib_inflate/inflate.h~lib-zlib-add-s390-hardware-support-for-kernel-zlib_inflate
+++ a/lib/zlib_inflate/inflate.h
@@ -11,6 +11,8 @@
    subject to change. Applications should only use zlib.h.
  */
 
+#include "inftrees.h"
+
 /* Possible inflate modes between inflate() calls */
 typedef enum {
     HEAD,       /* i: waiting for magic header */
@@ -108,4 +110,10 @@ struct inflate_state {
     unsigned short work[288];   /* work area for code table building */
     code codes[ENOUGH];         /* space for code tables */
 };
+
+/* Reverse the bytes in a 32-bit value */
+#define REVERSE(q) \
+    ((((q) >> 24) & 0xff) + (((q) >> 8) & 0xff00) + \
+     (((q) & 0xff00) << 8) + (((q) & 0xff) << 24))
+
 #endif
--- a/lib/zlib_inflate/infutil.h~lib-zlib-add-s390-hardware-support-for-kernel-zlib_inflate
+++ a/lib/zlib_inflate/infutil.h
@@ -12,14 +12,28 @@
 #define _INFUTIL_H
 
 #include <linux/zlib.h>
+#ifdef CONFIG_ZLIB_DFLTCC
+#include "../zlib_dfltcc/dfltcc.h"
+#include <asm/page.h>
+#endif
 
 /* memory allocation for inflation */
 
 struct inflate_workspace {
 	struct inflate_state inflate_state;
-	unsigned char working_window[1 << MAX_WBITS];
+#ifdef CONFIG_ZLIB_DFLTCC
+	struct dfltcc_state dfltcc_state;
+	unsigned char working_window[(1 << MAX_WBITS) + PAGE_SIZE];
+#else
+	unsigned char working_window[(1 << MAX_WBITS)];
+#endif
 };
 
-#define WS(z) ((struct inflate_workspace *)(z->workspace))
+#ifdef CONFIG_ZLIB_DFLTCC
+/* dfltcc_state must be doubleword aligned for DFLTCC call */
+static_assert(offsetof(struct inflate_workspace, dfltcc_state) % 8 == 0);
+#endif
+
+#define WS(strm) ((struct inflate_workspace *)(strm->workspace))
 
 #endif
_

* [patch 096/118] s390/boot: add dfltcc= kernel command line parameter
  2020-01-31  6:10 incoming Andrew Morton
                   ` (94 preceding siblings ...)
  2020-01-31  6:16 ` [patch 095/118] lib/zlib: add s390 hardware support for kernel zlib_inflate Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 097/118] lib/zlib: add zlib_deflate_dfltcc_enabled() function Andrew Morton
                   ` (21 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akpm, borntraeger, clm, dsterba, edward6, gor, heiko.carstens,
	iii, josef, linux-mm, mm-commits, rpurdie, torvalds, zaslonko

From: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Subject: s390/boot: add dfltcc= kernel command line parameter

Add the new kernel command line parameter 'dfltcc=' to configure s390 zlib
hardware support.

Format: { on | off | def_only | inf_only | always }
 on:       s390 zlib hardware support for compression on
           level 1 and decompression (default)
 off:      No s390 zlib hardware support
 def_only: s390 zlib hardware support for deflate
           only (compression on level 1)
 inf_only: s390 zlib hardware support for inflate
           only (decompression)
 always:   Same as 'on' but ignores the selected compression
           level always using hardware support (used for debugging)
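
As a rough illustration of how the values above map onto the modes this
patch introduces, here is a minimal user-space sketch.  The enum mirrors
the ZLIB_DFLTCC_* constants; parse_dfltcc(), may_deflate() and
may_inflate() are hypothetical helper names, not functions from the patch,
and the NULL check on the value is an added assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values mirror the ZLIB_DFLTCC_* constants added to asm/setup.h. */
enum dfltcc_mode {
	DFLTCC_DISABLED = 0,
	DFLTCC_FULL = 1,
	DFLTCC_DEFLATE_ONLY = 2,
	DFLTCC_INFLATE_ONLY = 3,
	DFLTCC_FULL_DEBUG = 4,
};

/*
 * Hypothetical helper: translate a dfltcc= value into a mode.
 * A missing or unrecognized value leaves the current mode untouched.
 */
static enum dfltcc_mode parse_dfltcc(const char *val, enum dfltcc_mode current)
{
	assert(current <= DFLTCC_FULL_DEBUG);

	if (val == NULL)
		return current;
	if (!strcmp(val, "off"))
		return DFLTCC_DISABLED;
	if (!strcmp(val, "on"))
		return DFLTCC_FULL;
	if (!strcmp(val, "def_only"))
		return DFLTCC_DEFLATE_ONLY;
	if (!strcmp(val, "inf_only"))
		return DFLTCC_INFLATE_ONLY;
	if (!strcmp(val, "always"))
		return DFLTCC_FULL_DEBUG;
	return current;
}

/* Hardware deflate is allowed unless the mode is off or inflate-only. */
static int may_deflate(enum dfltcc_mode mode)
{
	return mode != DFLTCC_DISABLED && mode != DFLTCC_INFLATE_ONLY;
}

/* Hardware inflate is allowed unless the mode is off or deflate-only. */
static int may_inflate(enum dfltcc_mode mode)
{
	return mode != DFLTCC_DISABLED && mode != DFLTCC_DEFLATE_ONLY;
}
```

The two predicates correspond to the early returns this patch adds to
dfltcc_can_deflate() and dfltcc_can_inflate().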

Link: http://lkml.kernel.org/r/20200103223334.20669-5-zaslonko@linux.ibm.com
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Eduard Shishkin <edward6@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/admin-guide/kernel-parameters.txt |   12 ++++++++++++
 arch/s390/boot/ipl_parm.c                       |   14 ++++++++++++++
 arch/s390/include/asm/setup.h                   |    7 +++++++
 arch/s390/kernel/setup.c                        |    2 ++
 lib/zlib_dfltcc/dfltcc.c                        |    5 ++++-
 lib/zlib_dfltcc/dfltcc.h                        |    1 +
 lib/zlib_dfltcc/dfltcc_deflate.c                |    6 ++++++
 lib/zlib_dfltcc/dfltcc_inflate.c                |    6 ++++++
 lib/zlib_dfltcc/dfltcc_util.h                   |    4 +++-
 9 files changed, 55 insertions(+), 2 deletions(-)

--- a/arch/s390/boot/ipl_parm.c~s390-boot-add-dfltcc=-kernel-command-line-parameter
+++ a/arch/s390/boot/ipl_parm.c
@@ -14,6 +14,7 @@
 char __bootdata(early_command_line)[COMMAND_LINE_SIZE];
 struct ipl_parameter_block __bootdata_preserved(ipl_block);
 int __bootdata_preserved(ipl_block_valid);
+unsigned int __bootdata_preserved(zlib_dfltcc_support) = ZLIB_DFLTCC_FULL;
 
 unsigned long __bootdata(vmalloc_size) = VMALLOC_DEFAULT_SIZE;
 unsigned long __bootdata(memory_end);
@@ -229,6 +230,19 @@ void parse_boot_command_line(void)
 		if (!strcmp(param, "vmalloc") && val)
 			vmalloc_size = round_up(memparse(val, NULL), PAGE_SIZE);
 
+		if (!strcmp(param, "dfltcc") && val) {
+			if (!strcmp(val, "off"))
+				zlib_dfltcc_support = ZLIB_DFLTCC_DISABLED;
+			else if (!strcmp(val, "on"))
+				zlib_dfltcc_support = ZLIB_DFLTCC_FULL;
+			else if (!strcmp(val, "def_only"))
+				zlib_dfltcc_support = ZLIB_DFLTCC_DEFLATE_ONLY;
+			else if (!strcmp(val, "inf_only"))
+				zlib_dfltcc_support = ZLIB_DFLTCC_INFLATE_ONLY;
+			else if (!strcmp(val, "always"))
+				zlib_dfltcc_support = ZLIB_DFLTCC_FULL_DEBUG;
+		}
+
 		if (!strcmp(param, "noexec")) {
 			rc = kstrtobool(val, &enabled);
 			if (!rc && !enabled)
--- a/arch/s390/include/asm/setup.h~s390-boot-add-dfltcc=-kernel-command-line-parameter
+++ a/arch/s390/include/asm/setup.h
@@ -79,6 +79,13 @@ struct parmarea {
 	char command_line[ARCH_COMMAND_LINE_SIZE];	/* 0x10480 */
 };
 
+extern unsigned int zlib_dfltcc_support;
+#define ZLIB_DFLTCC_DISABLED		0
+#define ZLIB_DFLTCC_FULL		1
+#define ZLIB_DFLTCC_DEFLATE_ONLY	2
+#define ZLIB_DFLTCC_INFLATE_ONLY	3
+#define ZLIB_DFLTCC_FULL_DEBUG		4
+
 extern int noexec_disabled;
 extern int memory_end_set;
 extern unsigned long memory_end;
--- a/arch/s390/kernel/setup.c~s390-boot-add-dfltcc=-kernel-command-line-parameter
+++ a/arch/s390/kernel/setup.c
@@ -111,6 +111,8 @@ unsigned long __bootdata_preserved(__ete
 unsigned long __bootdata_preserved(__sdma);
 unsigned long __bootdata_preserved(__edma);
 unsigned long __bootdata_preserved(__kaslr_offset);
+unsigned int __bootdata_preserved(zlib_dfltcc_support);
+EXPORT_SYMBOL(zlib_dfltcc_support);
 
 unsigned long VMALLOC_START;
 EXPORT_SYMBOL(VMALLOC_START);
--- a/Documentation/admin-guide/kernel-parameters.txt~s390-boot-add-dfltcc=-kernel-command-line-parameter
+++ a/Documentation/admin-guide/kernel-parameters.txt
@@ -834,6 +834,18 @@
 			dump out devices still on the deferred probe list after
 			retrying.
 
+	dfltcc=		[HW,S390]
+			Format: { on | off | def_only | inf_only | always }
+			on:       s390 zlib hardware support for compression on
+			          level 1 and decompression (default)
+			off:      No s390 zlib hardware support
+			def_only: s390 zlib hardware support for deflate
+			          only (compression on level 1)
+			inf_only: s390 zlib hardware support for inflate
+			          only (decompression)
+			always:   Same as 'on' but ignores the selected compression
+			          level always using hardware support (used for debugging)
+
 	dhash_entries=	[KNL]
 			Set number of hash buckets for dentry cache.
 
--- a/lib/zlib_dfltcc/dfltcc.c~s390-boot-add-dfltcc=-kernel-command-line-parameter
+++ a/lib/zlib_dfltcc/dfltcc.c
@@ -44,7 +44,10 @@ void dfltcc_reset(
     dfltcc_state->param.nt = 1;
 
     /* Initialize tuning parameters */
-    dfltcc_state->level_mask = DFLTCC_LEVEL_MASK;
+    if (zlib_dfltcc_support == ZLIB_DFLTCC_FULL_DEBUG)
+        dfltcc_state->level_mask = DFLTCC_LEVEL_MASK_DEBUG;
+    else
+        dfltcc_state->level_mask = DFLTCC_LEVEL_MASK;
     dfltcc_state->block_size = DFLTCC_BLOCK_SIZE;
     dfltcc_state->block_threshold = DFLTCC_FIRST_FHT_BLOCK_SIZE;
     dfltcc_state->dht_threshold = DFLTCC_DHT_MIN_SAMPLE_SIZE;
--- a/lib/zlib_dfltcc/dfltcc_deflate.c~s390-boot-add-dfltcc=-kernel-command-line-parameter
+++ a/lib/zlib_dfltcc/dfltcc_deflate.c
@@ -3,6 +3,7 @@
 #include "../zlib_deflate/defutil.h"
 #include "dfltcc_util.h"
 #include "dfltcc.h"
+#include <asm/setup.h>
 #include <linux/zutil.h>
 
 /*
@@ -15,6 +16,11 @@ int dfltcc_can_deflate(
     deflate_state *state = (deflate_state *)strm->state;
     struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
 
+    /* Check for kernel dfltcc command line parameter */
+    if (zlib_dfltcc_support == ZLIB_DFLTCC_DISABLED ||
+            zlib_dfltcc_support == ZLIB_DFLTCC_INFLATE_ONLY)
+        return 0;
+
     /* Unsupported compression settings */
     if (!dfltcc_are_params_ok(state->level, state->w_bits, state->strategy,
                               dfltcc_state->level_mask))
--- a/lib/zlib_dfltcc/dfltcc.h~s390-boot-add-dfltcc=-kernel-command-line-parameter
+++ a/lib/zlib_dfltcc/dfltcc.h
@@ -8,6 +8,7 @@
  * Tuning parameters.
  */
 #define DFLTCC_LEVEL_MASK 0x2 /* DFLTCC compression for level 1 only */
+#define DFLTCC_LEVEL_MASK_DEBUG 0x3fe /* DFLTCC compression for all levels */
 #define DFLTCC_BLOCK_SIZE 1048576
 #define DFLTCC_FIRST_FHT_BLOCK_SIZE 4096
 #define DFLTCC_DHT_MIN_SAMPLE_SIZE 4096
--- a/lib/zlib_dfltcc/dfltcc_inflate.c~s390-boot-add-dfltcc=-kernel-command-line-parameter
+++ a/lib/zlib_dfltcc/dfltcc_inflate.c
@@ -3,6 +3,7 @@
 #include "../zlib_inflate/inflate.h"
 #include "dfltcc_util.h"
 #include "dfltcc.h"
+#include <asm/setup.h>
 #include <linux/zutil.h>
 
 /*
@@ -15,6 +16,11 @@ int dfltcc_can_inflate(
     struct inflate_state *state = (struct inflate_state *)strm->state;
     struct dfltcc_state *dfltcc_state = GET_DFLTCC_STATE(state);
 
+    /* Check for kernel dfltcc command line parameter */
+    if (zlib_dfltcc_support == ZLIB_DFLTCC_DISABLED ||
+            zlib_dfltcc_support == ZLIB_DFLTCC_DEFLATE_ONLY)
+        return 0;
+
     /* Unsupported compression settings */
     if (state->wbits != HB_BITS)
         return 0;
--- a/lib/zlib_dfltcc/dfltcc_util.h~s390-boot-add-dfltcc=-kernel-command-line-parameter
+++ a/lib/zlib_dfltcc/dfltcc_util.h
@@ -4,6 +4,7 @@
 
 #include <linux/zutil.h>
 #include <asm/facility.h>
+#include <asm/setup.h>
 
 /*
  * C wrapper for the DEFLATE CONVERSION CALL instruction.
@@ -102,7 +103,8 @@ static inline int dfltcc_are_params_ok(
 
 static inline int is_dfltcc_enabled(void)
 {
-    return test_facility(DFLTCC_FACILITY);
+    return (zlib_dfltcc_support != ZLIB_DFLTCC_DISABLED &&
+            test_facility(DFLTCC_FACILITY));
 }
 
 char *oesc_msg(char *buf, int oesc);
_

* [patch 097/118] lib/zlib: add zlib_deflate_dfltcc_enabled() function
  2020-01-31  6:10 incoming Andrew Morton
                   ` (95 preceding siblings ...)
  2020-01-31  6:16 ` [patch 096/118] s390/boot: add dfltcc= kernel command line parameter Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 098/118] btrfs: use larger zlib buffer for s390 hardware compression Andrew Morton
                   ` (20 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akpm, borntraeger, clm, dsterba, edward6, gor, heiko.carstens,
	iii, josef, linux-mm, mm-commits, rpurdie, torvalds, zaslonko

From: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Subject: lib/zlib: add zlib_deflate_dfltcc_enabled() function

Add a new function to zlib.h that checks whether the s390 Deflate-Conversion
facility is installed and enabled.
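
The check combines two conditions: the dfltcc= setting must not be "off",
and the machine must report the facility.  A minimal user-space sketch of
that logic, together with the compile-time fallback used when DFLTCC is
not built in, might look as follows.  HAVE_DFLTCC, the facility_installed
argument and deflate_hw_enabled() are hypothetical stand-ins for
CONFIG_ZLIB_DFLTCC, test_facility(DFLTCC_FACILITY) and
zlib_deflate_dfltcc_enabled().

```c
#include <assert.h>
#include <stdbool.h>

#define DFLTCC_MODE_DISABLED 0	/* mirrors ZLIB_DFLTCC_DISABLED */

/* Both conditions must hold for the hardware path to be usable. */
static bool is_hw_enabled(unsigned int mode, bool facility_installed)
{
	return mode != DFLTCC_MODE_DISABLED && facility_installed;
}

/* Without hardware support compiled in, the hook collapses to a constant. */
#ifdef HAVE_DFLTCC
#define DEFLATE_HW_ENABLED(mode, installed) is_hw_enabled((mode), (installed))
#else
#define DEFLATE_HW_ENABLED(mode, installed) false
#endif

/* Exported-style wrapper, so callers never see the #ifdef. */
static bool deflate_hw_enabled(unsigned int mode, bool facility_installed)
{
	(void)mode;			/* unused in the fallback build */
	(void)facility_installed;
	return DEFLATE_HW_ENABLED(mode, facility_installed);
}
```

Keeping the #ifdef behind a macro lets a caller such as btrfs ask the
question unconditionally on every architecture.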

Link: http://lkml.kernel.org/r/20200103223334.20669-6-zaslonko@linux.ibm.com
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Eduard Shishkin <edward6@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/zlib.h            |    6 ++++++
 lib/zlib_deflate/deflate.c      |    6 ++++++
 lib/zlib_deflate/deflate_syms.c |    1 +
 lib/zlib_dfltcc/dfltcc.h        |   11 +++++++++++
 lib/zlib_dfltcc/dfltcc_util.h   |    9 ---------
 5 files changed, 24 insertions(+), 9 deletions(-)

--- a/include/linux/zlib.h~lib-zlib-add-zlib_deflate_dfltcc_enabled-function
+++ a/include/linux/zlib.h
@@ -191,6 +191,12 @@ extern int zlib_deflate_workspacesize (i
    exceed those passed here.
 */
 
+extern int zlib_deflate_dfltcc_enabled (void);
+/*
+   Returns 1 if Deflate-Conversion facility is installed and enabled,
+   otherwise 0.
+*/
+
 /* 
 extern int deflateInit (z_streamp strm, int level);
 
--- a/lib/zlib_deflate/deflate.c~lib-zlib-add-zlib_deflate_dfltcc_enabled-function
+++ a/lib/zlib_deflate/deflate.c
@@ -59,6 +59,7 @@
 #define DEFLATE_RESET_HOOK(strm) do {} while (0)
 #define DEFLATE_HOOK(strm, flush, bstate) 0
 #define DEFLATE_NEED_CHECKSUM(strm) 1
+#define DEFLATE_DFLTCC_ENABLED() 0
 #endif
 
 /* ===========================================================================
@@ -1138,3 +1139,8 @@ int zlib_deflate_workspacesize(int windo
         + zlib_deflate_head_memsize(memLevel)
         + zlib_deflate_overlay_memsize(memLevel);
 }
+
+int zlib_deflate_dfltcc_enabled(void)
+{
+	return DEFLATE_DFLTCC_ENABLED();
+}
--- a/lib/zlib_deflate/deflate_syms.c~lib-zlib-add-zlib_deflate_dfltcc_enabled-function
+++ a/lib/zlib_deflate/deflate_syms.c
@@ -12,6 +12,7 @@
 #include <linux/zlib.h>
 
 EXPORT_SYMBOL(zlib_deflate_workspacesize);
+EXPORT_SYMBOL(zlib_deflate_dfltcc_enabled);
 EXPORT_SYMBOL(zlib_deflate);
 EXPORT_SYMBOL(zlib_deflateInit2);
 EXPORT_SYMBOL(zlib_deflateEnd);
--- a/lib/zlib_dfltcc/dfltcc.h~lib-zlib-add-zlib_deflate_dfltcc_enabled-function
+++ a/lib/zlib_dfltcc/dfltcc.h
@@ -3,6 +3,8 @@
 #define DFLTCC_H
 
 #include "../zlib_deflate/defutil.h"
+#include <asm/facility.h>
+#include <asm/setup.h>
 
 /*
  * Tuning parameters.
@@ -14,6 +16,8 @@
 #define DFLTCC_DHT_MIN_SAMPLE_SIZE 4096
 #define DFLTCC_RIBM 0
 
+#define DFLTCC_FACILITY 151
+
 /*
  * Parameter Block for Query Available Functions.
  */
@@ -113,6 +117,11 @@ typedef enum {
 } dfltcc_inflate_action;
 dfltcc_inflate_action dfltcc_inflate(z_streamp strm,
                                      int flush, int *ret);
+static inline int is_dfltcc_enabled(void)
+{
+    return (zlib_dfltcc_support != ZLIB_DFLTCC_DISABLED &&
+            test_facility(DFLTCC_FACILITY));
+}
 
 #define DEFLATE_RESET_HOOK(strm) \
     dfltcc_reset((strm), sizeof(deflate_state))
@@ -121,6 +130,8 @@ dfltcc_inflate_action dfltcc_inflate(z_s
 
 #define DEFLATE_NEED_CHECKSUM(strm) (!dfltcc_can_deflate((strm)))
 
+#define DEFLATE_DFLTCC_ENABLED() is_dfltcc_enabled()
+
 #define INFLATE_RESET_HOOK(strm) \
     dfltcc_reset((strm), sizeof(struct inflate_state))
 
--- a/lib/zlib_dfltcc/dfltcc_util.h~lib-zlib-add-zlib_deflate_dfltcc_enabled-function
+++ a/lib/zlib_dfltcc/dfltcc_util.h
@@ -3,8 +3,6 @@
 #define DFLTCC_UTIL_H
 
 #include <linux/zutil.h>
-#include <asm/facility.h>
-#include <asm/setup.h>
 
 /*
  * C wrapper for the DEFLATE CONVERSION CALL instruction.
@@ -24,7 +22,6 @@ typedef enum {
 #define HBT_CIRCULAR (1 << 7)
 #define HB_BITS 15
 #define HB_SIZE (1 << HB_BITS)
-#define DFLTCC_FACILITY 151
 
 static inline dfltcc_cc dfltcc(
     int fn,
@@ -101,12 +98,6 @@ static inline int dfltcc_are_params_ok(
         (strategy == Z_DEFAULT_STRATEGY);
 }
 
-static inline int is_dfltcc_enabled(void)
-{
-    return (zlib_dfltcc_support != ZLIB_DFLTCC_DISABLED &&
-            test_facility(DFLTCC_FACILITY));
-}

* [patch 098/118] btrfs: use larger zlib buffer for s390 hardware compression
  2020-01-31  6:10 incoming Andrew Morton
                   ` (96 preceding siblings ...)
  2020-01-31  6:16 ` [patch 097/118] lib/zlib: add zlib_deflate_dfltcc_enabled() function Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 099/118] lib/scatterlist.c: adjust indentation in __sg_alloc_table Andrew Morton
                   ` (19 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akpm, borntraeger, clm, dsterba, edward6, gor, heiko.carstens,
	iii, josef, linux-mm, mm-commits, rpurdie, torvalds, zaslonko

From: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Subject: btrfs: use larger zlib buffer for s390 hardware compression

In order to benefit from s390 zlib hardware compression support, increase
the btrfs zlib workspace buffer size from 1 to 4 pages (if s390 zlib
hardware support is enabled on the machine).  This brings up to 60% better
performance in hardware on s390 compared to the PAGE_SIZE buffer and much
more compared to the software zlib processing in btrfs.  In case of memory
pressure, fall back to a single page buffer during workspace allocation.

Data compressed with the larger input buffer still conforms to the zlib
standard and can therefore also be decompressed on systems that use only
a PAGE_SIZE buffer for btrfs zlib.
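
The buffer arithmetic behind this change can be sketched in user space as
below.  SKETCH_PAGE_SIZE is an assumed 4096-byte page, and input_pages(),
input_bytes() and bytes_to_page_end() are hypothetical helper names that
only mirror the expressions in the diff.  The modulo in the last helper
matters because, with a four-page buffer, buf_offset can exceed one page.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, illustration only */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Number of input pages to stage in the workspace for the next call. */
static unsigned long input_pages(unsigned long bytes_left,
				 unsigned long buf_size)
{
	unsigned long needed =
		(bytes_left + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

	/* The buffer is either one page or a whole multiple of pages. */
	assert(buf_size >= SKETCH_PAGE_SIZE &&
	       buf_size % SKETCH_PAGE_SIZE == 0);
	return min_ul(needed, buf_size / SKETCH_PAGE_SIZE);
}

/* Bytes offered to deflate in that call (avail_in in the patch). */
static unsigned long input_bytes(unsigned long bytes_left,
				 unsigned long buf_size)
{
	return min_ul(bytes_left, buf_size);
}

/* Bytes that can be copied before crossing a page boundary. */
static unsigned long bytes_to_page_end(unsigned long buf_offset)
{
	return SKETCH_PAGE_SIZE - (buf_offset % SKETCH_PAGE_SIZE);
}
```

When input_pages() yields 1, the patch skips the copy and points next_in
directly at the mapped page, so the single-page case behaves as before.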

Link: http://lkml.kernel.org/r/20200108105103.29028-1-zaslonko@linux.ibm.com
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Eduard Shishkin <edward6@linux.ibm.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/btrfs/compression.c |    2 
 fs/btrfs/zlib.c        |  135 ++++++++++++++++++++++++++++-----------
 2 files changed, 101 insertions(+), 36 deletions(-)

--- a/fs/btrfs/compression.c~btrfs-use-larger-zlib-buffer-for-s390-hardware-compression
+++ a/fs/btrfs/compression.c
@@ -1290,7 +1290,7 @@ int btrfs_decompress_buf2page(const char
 	/* copy bytes from the working buffer into the pages */
 	while (working_bytes > 0) {
 		bytes = min_t(unsigned long, bvec.bv_len,
-				PAGE_SIZE - buf_offset);
+				PAGE_SIZE - (buf_offset % PAGE_SIZE));
 		bytes = min(bytes, working_bytes);
 
 		kaddr = kmap_atomic(bvec.bv_page);
--- a/fs/btrfs/zlib.c~btrfs-use-larger-zlib-buffer-for-s390-hardware-compression
+++ a/fs/btrfs/zlib.c
@@ -20,9 +20,13 @@
 #include <linux/refcount.h>
 #include "compression.h"
 
+/* workspace buffer size for s390 zlib hardware support */
+#define ZLIB_DFLTCC_BUF_SIZE    (4 * PAGE_SIZE)
+
 struct workspace {
 	z_stream strm;
 	char *buf;
+	unsigned int buf_size;
 	struct list_head list;
 	int level;
 };
@@ -61,7 +65,21 @@ struct list_head *zlib_alloc_workspace(u
 			zlib_inflate_workspacesize());
 	workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL);
 	workspace->level = level;
-	workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	workspace->buf = NULL;
+	/*
+	 * In case of s390 zlib hardware support, allocate a larger workspace
+	 * buffer. If the allocator fails, fall back to a single page buffer.
+	 */
+	if (zlib_deflate_dfltcc_enabled()) {
+		workspace->buf = kmalloc(ZLIB_DFLTCC_BUF_SIZE,
+					 __GFP_NOMEMALLOC | __GFP_NORETRY |
+					 __GFP_NOWARN | GFP_NOIO);
+		workspace->buf_size = ZLIB_DFLTCC_BUF_SIZE;
+	}
+	if (!workspace->buf) {
+		workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+		workspace->buf_size = PAGE_SIZE;
+	}
 	if (!workspace->strm.workspace || !workspace->buf)
 		goto fail;
 
@@ -85,6 +103,7 @@ int zlib_compress_pages(struct list_head
 	struct page *in_page = NULL;
 	struct page *out_page = NULL;
 	unsigned long bytes_left;
+	unsigned int in_buf_pages;
 	unsigned long len = *total_out;
 	unsigned long nr_dest_pages = *out_pages;
 	const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
@@ -102,9 +121,6 @@ int zlib_compress_pages(struct list_head
 	workspace->strm.total_in = 0;
 	workspace->strm.total_out = 0;
 
-	in_page = find_get_page(mapping, start >> PAGE_SHIFT);
-	data_in = kmap(in_page);
-
 	out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
 	if (out_page == NULL) {
 		ret = -ENOMEM;
@@ -114,12 +130,51 @@ int zlib_compress_pages(struct list_head
 	pages[0] = out_page;
 	nr_pages = 1;
 
-	workspace->strm.next_in = data_in;
+	workspace->strm.next_in = workspace->buf;
+	workspace->strm.avail_in = 0;
 	workspace->strm.next_out = cpage_out;
 	workspace->strm.avail_out = PAGE_SIZE;
-	workspace->strm.avail_in = min(len, PAGE_SIZE);
 
 	while (workspace->strm.total_in < len) {
+		/*
+		 * Get next input pages and copy the contents to
+		 * the workspace buffer if required.
+		 */
+		if (workspace->strm.avail_in == 0) {
+			bytes_left = len - workspace->strm.total_in;
+			in_buf_pages = min(DIV_ROUND_UP(bytes_left, PAGE_SIZE),
+					   workspace->buf_size / PAGE_SIZE);
+			if (in_buf_pages > 1) {
+				int i;
+
+				for (i = 0; i < in_buf_pages; i++) {
+					if (in_page) {
+						kunmap(in_page);
+						put_page(in_page);
+					}
+					in_page = find_get_page(mapping,
+								start >> PAGE_SHIFT);
+					data_in = kmap(in_page);
+					memcpy(workspace->buf + i * PAGE_SIZE,
+					       data_in, PAGE_SIZE);
+					start += PAGE_SIZE;
+				}
+				workspace->strm.next_in = workspace->buf;
+			} else {
+				if (in_page) {
+					kunmap(in_page);
+					put_page(in_page);
+				}
+				in_page = find_get_page(mapping,
+							start >> PAGE_SHIFT);
+				data_in = kmap(in_page);
+				start += PAGE_SIZE;
+				workspace->strm.next_in = data_in;
+			}
+			workspace->strm.avail_in = min(bytes_left,
+						       (unsigned long) workspace->buf_size);
+		}
+
 		ret = zlib_deflate(&workspace->strm, Z_SYNC_FLUSH);
 		if (ret != Z_OK) {
 			pr_debug("BTRFS: deflate in loop returned %d\n",
@@ -161,33 +216,43 @@ int zlib_compress_pages(struct list_head
 		/* we're all done */
 		if (workspace->strm.total_in >= len)
 			break;
-
-		/* we've read in a full page, get a new one */
-		if (workspace->strm.avail_in == 0) {
-			if (workspace->strm.total_out > max_out)
-				break;
-
-			bytes_left = len - workspace->strm.total_in;
-			kunmap(in_page);
-			put_page(in_page);
-
-			start += PAGE_SIZE;
-			in_page = find_get_page(mapping,
-						start >> PAGE_SHIFT);
-			data_in = kmap(in_page);
-			workspace->strm.avail_in = min(bytes_left,
-							   PAGE_SIZE);
-			workspace->strm.next_in = data_in;
-		}
+		if (workspace->strm.total_out > max_out)
+			break;
 	}
 	workspace->strm.avail_in = 0;
-	ret = zlib_deflate(&workspace->strm, Z_FINISH);
-	zlib_deflateEnd(&workspace->strm);
-
-	if (ret != Z_STREAM_END) {
-		ret = -EIO;
-		goto out;
+	/*
+	 * Call deflate with Z_FINISH flush parameter providing more output
+	 * space but no more input data, until it returns with Z_STREAM_END.
+	 */
+	while (ret != Z_STREAM_END) {
+		ret = zlib_deflate(&workspace->strm, Z_FINISH);
+		if (ret == Z_STREAM_END)
+			break;
+		if (ret != Z_OK && ret != Z_BUF_ERROR) {
+			zlib_deflateEnd(&workspace->strm);
+			ret = -EIO;
+			goto out;
+		} else if (workspace->strm.avail_out == 0) {
+			/* get another page for the stream end */
+			kunmap(out_page);
+			if (nr_pages == nr_dest_pages) {
+				out_page = NULL;
+				ret = -E2BIG;
+				goto out;
+			}
+			out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+			if (out_page == NULL) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			cpage_out = kmap(out_page);
+			pages[nr_pages] = out_page;
+			nr_pages++;
+			workspace->strm.avail_out = PAGE_SIZE;
+			workspace->strm.next_out = cpage_out;
+		}
 	}
+	zlib_deflateEnd(&workspace->strm);
 
 	if (workspace->strm.total_out >= workspace->strm.total_in) {
 		ret = -E2BIG;
@@ -231,7 +296,7 @@ int zlib_decompress_bio(struct list_head
 
 	workspace->strm.total_out = 0;
 	workspace->strm.next_out = workspace->buf;
-	workspace->strm.avail_out = PAGE_SIZE;
+	workspace->strm.avail_out = workspace->buf_size;
 
 	/* If it's deflate, and it's got no preset dictionary, then
 	   we can tell zlib to skip the adler32 check. */
@@ -270,7 +335,7 @@ int zlib_decompress_bio(struct list_head
 		}
 
 		workspace->strm.next_out = workspace->buf;
-		workspace->strm.avail_out = PAGE_SIZE;
+		workspace->strm.avail_out = workspace->buf_size;
 
 		if (workspace->strm.avail_in == 0) {
 			unsigned long tmp;
@@ -320,7 +385,7 @@ int zlib_decompress(struct list_head *ws
 	workspace->strm.total_in = 0;
 
 	workspace->strm.next_out = workspace->buf;
-	workspace->strm.avail_out = PAGE_SIZE;
+	workspace->strm.avail_out = workspace->buf_size;
 	workspace->strm.total_out = 0;
 	/* If it's deflate, and it's got no preset dictionary, then
 	   we can tell zlib to skip the adler32 check. */
@@ -364,7 +429,7 @@ int zlib_decompress(struct list_head *ws
 			buf_offset = 0;
 
 		bytes = min(PAGE_SIZE - pg_offset,
-			    PAGE_SIZE - buf_offset);
+			    PAGE_SIZE - (buf_offset % PAGE_SIZE));
 		bytes = min(bytes, bytes_left);
 
 		kaddr = kmap_atomic(dest_page);
@@ -375,7 +440,7 @@ int zlib_decompress(struct list_head *ws
 		bytes_left -= bytes;
 next:
 		workspace->strm.next_out = workspace->buf;
-		workspace->strm.avail_out = PAGE_SIZE;
+		workspace->strm.avail_out = workspace->buf_size;
 	}
 
 	if (ret != Z_STREAM_END && bytes_left != 0)
_

* [patch 099/118] lib/scatterlist.c: adjust indentation in __sg_alloc_table
  2020-01-31  6:10 incoming Andrew Morton
                   ` (97 preceding siblings ...)
  2020-01-31  6:16 ` [patch 098/118] btrfs: use larger zlib buffer for s390 hardware compression Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 100/118] uapi: rename ext2_swab() to swab() and share globally in swab.h Andrew Morton
                   ` (18 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, natechancellor, torvalds

From: Nathan Chancellor <natechancellor@gmail.com>
Subject: lib/scatterlist.c: adjust indentation in __sg_alloc_table

Clang warns:

../lib/scatterlist.c:314:5: warning: misleading indentation; statement
is not part of the previous 'if' [-Wmisleading-indentation]
                        return -ENOMEM;
                        ^
../lib/scatterlist.c:311:4: note: previous statement is here
                        if (prv)
                        ^
1 warning generated.

This warning occurs because there is a space before the tab on this line. 
Remove it so that the indentation is consistent with the Linux kernel
coding style and clang no longer warns.

Link: http://lkml.kernel.org/r/20191218033606.11942-1-natechancellor@gmail.com
Link: https://github.com/ClangBuiltLinux/linux/issues/830
Fixes: edce6820a9fd ("scatterlist: prevent invalid free when alloc fails")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/scatterlist.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/lib/scatterlist.c~lib-scatterlist-adjust-indentation-in-__sg_alloc_table
+++ a/lib/scatterlist.c
@@ -311,7 +311,7 @@ int __sg_alloc_table(struct sg_table *ta
 			if (prv)
 				table->nents = ++table->orig_nents;
 
- 			return -ENOMEM;
+			return -ENOMEM;
 		}
 
 		sg_init_table(sg, alloc_size);
_

* [patch 100/118] uapi: rename ext2_swab() to swab() and share globally in swab.h
  2020-01-31  6:10 incoming Andrew Morton
                   ` (98 preceding siblings ...)
  2020-01-31  6:16 ` [patch 099/118] lib/scatterlist.c: adjust indentation in __sg_alloc_table Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 101/118] lib/find_bit.c: join _find_next_bit{_le} Andrew Morton
                   ` (17 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akpm, allison, joe, linux-mm, mm-commits, tglx, torvalds,
	vilhelm.gray, yury.norov

From: Yury Norov <yury.norov@gmail.com>
Subject: uapi: rename ext2_swab() to swab() and share globally in swab.h

ext2_swab() is defined locally in lib/find_bit.c.  However, it is specific
neither to ext2 nor to bitmaps.

There are many potential users of it, so rename it to just swab() and move
it to include/uapi/linux/swab.h.

The ABI guarantees that the size of unsigned long corresponds to
BITS_PER_LONG, so drop the unneeded casts.
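
For readers outside the kernel tree, the idea can be approximated in
portable C as shown below.  This is only a sketch: the *_sketch names are
hypothetical, ULONG_MAX stands in for BITS_PER_LONG, and the open-coded
swaps stand in for the kernel's __swab32()/__swab64().

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The sketch only handles the two widths the kernel helper handles. */
static_assert(sizeof(unsigned long) == 4 || sizeof(unsigned long) == 8,
	      "unsupported unsigned long width");

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swab32_sketch(uint32_t x)
{
	return ((x & 0x000000ffU) << 24) | ((x & 0x0000ff00U) << 8) |
	       ((x & 0x00ff0000U) >> 8) | ((x & 0xff000000U) >> 24);
}

/* Reverse eight bytes: swap each half, then exchange the halves. */
static uint64_t swab64_sketch(uint64_t x)
{
	return ((uint64_t)swab32_sketch((uint32_t)x) << 32) |
	       swab32_sketch((uint32_t)(x >> 32));
}

/* Pick the swap that matches the native width of unsigned long. */
static unsigned long swab_long_sketch(unsigned long y)
{
#if ULONG_MAX > 0xffffffffUL
	return (unsigned long)swab64_sketch(y);
#else
	return (unsigned long)swab32_sketch(y);
#endif
}
```

Swapping twice returns the original value, which is a convenient sanity
check when experimenting with the helper.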

Link: http://lkml.kernel.org/r/20200103202846.21616-1-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Cc: Allison Randal <allison@lohutok.net>
Cc: Joe Perches <joe@perches.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/swab.h      |    1 +
 include/uapi/linux/swab.h |   10 ++++++++++
 lib/find_bit.c            |   16 ++--------------
 3 files changed, 13 insertions(+), 14 deletions(-)

--- a/include/linux/swab.h~uapi-rename-ext2_swab-to-swab-and-share-globally-in-swabh
+++ a/include/linux/swab.h
@@ -7,6 +7,7 @@
 # define swab16 __swab16
 # define swab32 __swab32
 # define swab64 __swab64
+# define swab __swab
 # define swahw32 __swahw32
 # define swahb32 __swahb32
 # define swab16p __swab16p
--- a/include/uapi/linux/swab.h~uapi-rename-ext2_swab-to-swab-and-share-globally-in-swabh
+++ a/include/uapi/linux/swab.h
@@ -4,6 +4,7 @@
 
 #include <linux/types.h>
 #include <linux/compiler.h>
+#include <asm/bitsperlong.h>
 #include <asm/swab.h>
 
 /*
@@ -132,6 +133,15 @@ static inline __attribute_const__ __u32
 	__fswab64(x))
 #endif
 
+static __always_inline unsigned long __swab(const unsigned long y)
+{
+#if BITS_PER_LONG == 64
+	return __swab64(y);
+#else /* BITS_PER_LONG == 32 */
+	return __swab32(y);
+#endif
+}
+
 /**
  * __swahw32 - return a word-swapped 32-bit value
  * @x: value to wordswap
--- a/lib/find_bit.c~uapi-rename-ext2_swab-to-swab-and-share-globally-in-swabh
+++ a/lib/find_bit.c
@@ -149,18 +149,6 @@ EXPORT_SYMBOL(find_last_bit);
 
 #ifdef __BIG_ENDIAN
 
-/* include/linux/byteorder does not support "unsigned long" type */
-static inline unsigned long ext2_swab(const unsigned long y)
-{
-#if BITS_PER_LONG == 64
-	return (unsigned long) __swab64((u64) y);
-#elif BITS_PER_LONG == 32
-	return (unsigned long) __swab32((u32) y);
-#else
-#error BITS_PER_LONG not defined
-#endif
-}


* [patch 101/118] lib/find_bit.c: join _find_next_bit{_le}
  2020-01-31  6:10 incoming Andrew Morton
                   ` (99 preceding siblings ...)
  2020-01-31  6:16 ` [patch 100/118] uapi: rename ext2_swab() to swab() and share globally in swab.h Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 102/118] lib/find_bit.c: uninline helper _find_next_bit() Andrew Morton
                   ` (16 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akpm, allison, joe, linux-mm, mm-commits, tglx, torvalds,
	vilhelm.gray, yury.norov

From: Yury Norov <yury.norov@gmail.com>
Subject: lib/find_bit.c: join _find_next_bit{_le}

_find_next_bit and _find_next_bit_le are very similar functions.  It's
possible to join them by adding one parameter and a couple of simple
checks.  This simplifies maintenance and makes it possible to shrink the
size of .text by un-inlining the unified function (in the following patch).
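
The shape of the unified helper can be sketched in userspace as follows.
This is a simplified model under stated assumptions, not the kernel
function: the names find_next_bit_sketch() and swab_ul() are hypothetical,
the second bitmap (addr2) is left out, and __builtin_ctzl() (GCC/Clang)
stands in for __ffs().

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Portable byte reversal of one word (hypothetical helper). */
static unsigned long swab_ul(unsigned long x)
{
	unsigned long r = 0;

	for (size_t i = 0; i < sizeof(x); i++) {
		r = (r << 8) | (x & 0xff);
		x >>= 8;
	}
	return r;
}

/*
 * One search routine for both byte orders: "invert" selects searching
 * for zero bits (~0UL) or one bits (0UL), "le" selects the byte-swapped
 * interpretation of each word.
 */
static unsigned long find_next_bit_sketch(const unsigned long *addr,
		unsigned long nbits, unsigned long start,
		unsigned long invert, int le)
{
	unsigned long tmp, mask, bit;

	if (start >= nbits)
		return nbits;

	tmp = addr[start / BITS_PER_LONG] ^ invert;

	/* Handle the first word: ignore bits below "start". */
	mask = ~0UL << (start % BITS_PER_LONG);
	if (le)
		mask = swab_ul(mask);
	tmp &= mask;

	start -= start % BITS_PER_LONG;

	while (!tmp) {
		start += BITS_PER_LONG;
		if (start >= nbits)
			return nbits;
		tmp = addr[start / BITS_PER_LONG] ^ invert;
	}

	if (le)
		tmp = swab_ul(tmp);

	bit = start + (unsigned long)__builtin_ctzl(tmp);
	return bit < nbits ? bit : nbits;
}
```

The only differences between the two byte orders are the swapped mask for
the first word and the swap of the final word before counting trailing
zeros, which is why a single extra parameter is enough.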

Link: http://lkml.kernel.org/r/20200103202846.21616-2-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Cc: Allison Randal <allison@lohutok.net>
Cc: Joe Perches <joe@perches.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/find_bit.c |   64 +++++++++++++----------------------------------
 1 file changed, 19 insertions(+), 45 deletions(-)

--- a/lib/find_bit.c~lib-find_bitc-join-_find_next_bit_le
+++ a/lib/find_bit.c
@@ -17,9 +17,9 @@
 #include <linux/export.h>
 #include <linux/kernel.h>
 
-#if !defined(find_next_bit) || !defined(find_next_zero_bit) || \
-		!defined(find_next_and_bit)
-
+#if !defined(find_next_bit) || !defined(find_next_zero_bit) ||			\
+	!defined(find_next_bit_le) || !defined(find_next_zero_bit_le) ||	\
+	!defined(find_next_and_bit)
 /*
  * This is a common helper function for find_next_bit, find_next_zero_bit, and
  * find_next_and_bit. The differences are:
@@ -29,9 +29,9 @@
  */
 static inline unsigned long _find_next_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long nbits,
-		unsigned long start, unsigned long invert)
+		unsigned long start, unsigned long invert, unsigned long le)
 {
-	unsigned long tmp;
+	unsigned long tmp, mask;
 
 	if (unlikely(start >= nbits))
 		return nbits;
@@ -42,7 +42,12 @@ static inline unsigned long _find_next_b
 	tmp ^= invert;
 
 	/* Handle 1st word. */
-	tmp &= BITMAP_FIRST_WORD_MASK(start);
+	mask = BITMAP_FIRST_WORD_MASK(start);
+	if (le)
+		mask = swab(mask);
+
+	tmp &= mask;
+
 	start = round_down(start, BITS_PER_LONG);
 
 	while (!tmp) {
@@ -56,6 +61,9 @@ static inline unsigned long _find_next_b
 		tmp ^= invert;
 	}
 
+	if (le)
+		tmp = swab(tmp);
+
 	return min(start + __ffs(tmp), nbits);
 }
 #endif
@@ -67,7 +75,7 @@ static inline unsigned long _find_next_b
 unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
 			    unsigned long offset)
 {
-	return _find_next_bit(addr, NULL, size, offset, 0UL);
+	return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
 }
 EXPORT_SYMBOL(find_next_bit);
 #endif
@@ -76,7 +84,7 @@ EXPORT_SYMBOL(find_next_bit);
 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 				 unsigned long offset)
 {
-	return _find_next_bit(addr, NULL, size, offset, ~0UL);
+	return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
 }
 EXPORT_SYMBOL(find_next_zero_bit);
 #endif
@@ -86,7 +94,7 @@ unsigned long find_next_and_bit(const un
 		const unsigned long *addr2, unsigned long size,
 		unsigned long offset)
 {
-	return _find_next_bit(addr1, addr2, size, offset, 0UL);
+	return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
 }
 EXPORT_SYMBOL(find_next_and_bit);
 #endif
@@ -149,45 +157,11 @@ EXPORT_SYMBOL(find_last_bit);
 
 #ifdef __BIG_ENDIAN
 
-#if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
-static inline unsigned long _find_next_bit_le(const unsigned long *addr1,
-		const unsigned long *addr2, unsigned long nbits,
-		unsigned long start, unsigned long invert)
-{
-	unsigned long tmp;
-
-	if (unlikely(start >= nbits))
-		return nbits;
-
-	tmp = addr1[start / BITS_PER_LONG];
-	if (addr2)
-		tmp &= addr2[start / BITS_PER_LONG];
-	tmp ^= invert;
-
-	/* Handle 1st word. */
-	tmp &= swab(BITMAP_FIRST_WORD_MASK(start));
-	start = round_down(start, BITS_PER_LONG);
-
-	while (!tmp) {
-		start += BITS_PER_LONG;
-		if (start >= nbits)
-			return nbits;
-
-		tmp = addr1[start / BITS_PER_LONG];
-		if (addr2)
-			tmp &= addr2[start / BITS_PER_LONG];
-		tmp ^= invert;
-	}
-
-	return min(start + __ffs(swab(tmp)), nbits);
-}
-#endif


* [patch 102/118] lib/find_bit.c: uninline helper _find_next_bit()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (100 preceding siblings ...)
  2020-01-31  6:16 ` [patch 101/118] lib/find_bit.c: join _find_next_bit{_le} Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 103/118] fs/binfmt_elf.c: smaller code generation around auxv vector fill Andrew Morton
                   ` (15 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: akpm, allison, joe, linux-mm, mm-commits, tglx, torvalds,
	vilhelm.gray, yury.norov

From: Yury Norov <yury.norov@gmail.com>
Subject: lib/find_bit.c: uninline helper _find_next_bit()

It saves about 23% of .text on arm64 (1012 -> 776 bytes), and more on BE
architectures.

Before:
$ size lib/find_bit.o
   text    data     bss     dec     hex filename
   1012      56       0    1068     42c lib/find_bit.o

After:
$ size lib/find_bit.o
   text    data     bss     dec     hex filename
    776      56       0     832     340 lib/find_bit.o

Link: http://lkml.kernel.org/r/20200103202846.21616-3-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Allison Randal <allison@lohutok.net>
Cc: William Breathitt Gray <vilhelm.gray@gmail.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/find_bit.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/lib/find_bit.c~lib-find_bitc-uninline-helper-_find_next_bit
+++ a/lib/find_bit.c
@@ -27,7 +27,7 @@
  *    searching it for one bits.
  *  - The optional "addr2", which is anded with "addr1" if present.
  */
-static inline unsigned long _find_next_bit(const unsigned long *addr1,
+static unsigned long _find_next_bit(const unsigned long *addr1,
 		const unsigned long *addr2, unsigned long nbits,
 		unsigned long start, unsigned long invert, unsigned long le)
 {
_


* [patch 103/118] fs/binfmt_elf.c: smaller code generation around auxv vector fill
  2020-01-31  6:10 incoming Andrew Morton
                   ` (101 preceding siblings ...)
  2020-01-31  6:16 ` [patch 102/118] lib/find_bit.c: uninline helper _find_next_bit() Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 104/118] fs/binfmt_elf.c: fix ->start_code calculation Andrew Morton
                   ` (14 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: adobriyan, akpm, linux-mm, mm-commits, torvalds

From: Alexey Dobriyan <adobriyan@gmail.com>
Subject: fs/binfmt_elf.c: smaller code generation around auxv vector fill

Filling the auxv vector as an array with an index (auxv[i++] = ...)
generates terrible code.  "saved_auxv" should be reworked because it is the
worst member of mm_struct by size/usefulness ratio, but do that later.

Meanwhile, help gcc a little with the *auxv++ idiom.
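
A minimal userspace sketch of the pointer-bump idiom, including the
"clear the rest" and "advance past AT_NULL" steps, might look like this.
The struct fake_mm type, the AUXV_WORDS size and the fill_auxv() name are
all hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

#define AUXV_WORDS 8	/* hypothetical size, much smaller than the real one */

struct fake_mm {
	unsigned long saved_auxv[AUXV_WORDS];
};

/*
 * Store one (id, val) pair with the *p++ idiom, zero the remainder of
 * the array, and return how many words are in use including the
 * terminating AT_NULL pair.
 */
static size_t fill_auxv(struct fake_mm *mm, unsigned long id,
			unsigned long val)
{
	unsigned long *p = mm->saved_auxv;

	*p++ = id;
	*p++ = val;

	/* AT_NULL is zero; clear everything from p to the end of the array */
	memset(p, 0, (char *)mm->saved_auxv + sizeof(mm->saved_auxv) -
			(char *)p);

	/* Advance past the AT_NULL entry */
	p += 2;

	/* Recover the element count from the pointer difference */
	return (size_t)(p - mm->saved_auxv);
}
```

The index is no longer carried through every store; it is derived once at
the end from the pointer difference.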

Space savings on x86_64:

	add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-127 (-127)
	Function                                     old     new   delta
	load_elf_binary                             5470    5343    -127

Link: http://lkml.kernel.org/r/20191208172301.GD19716@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/binfmt_elf.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

--- a/fs/binfmt_elf.c~elf-smaller-code-generation-around-auxv-vector-fill
+++ a/fs/binfmt_elf.c
@@ -176,7 +176,7 @@ create_elf_tables(struct linux_binprm *b
 	unsigned char k_rand_bytes[16];
 	int items;
 	elf_addr_t *elf_info;
-	int ei_index = 0;
+	int ei_index;
 	const struct cred *cred = current_cred();
 	struct vm_area_struct *vma;
 
@@ -230,8 +230,8 @@ create_elf_tables(struct linux_binprm *b
 	/* update AT_VECTOR_SIZE_BASE if the number of NEW_AUX_ENT() changes */
 #define NEW_AUX_ENT(id, val) \
 	do { \
-		elf_info[ei_index++] = id; \
-		elf_info[ei_index++] = val; \
+		*elf_info++ = id; \
+		*elf_info++ = val; \
 	} while (0)
 
 #ifdef ARCH_DLINFO
@@ -275,12 +275,13 @@ create_elf_tables(struct linux_binprm *b
 	}
 #undef NEW_AUX_ENT
 	/* AT_NULL is zero; clear the rest too */
-	memset(&elf_info[ei_index], 0,
-	       sizeof current->mm->saved_auxv - ei_index * sizeof elf_info[0]);
+	memset(elf_info, 0, (char *)current->mm->saved_auxv +
+			sizeof(current->mm->saved_auxv) - (char *)elf_info);
 
 	/* And advance past the AT_NULL entry.  */
-	ei_index += 2;
+	elf_info += 2;
 
+	ei_index = elf_info - (elf_addr_t *)current->mm->saved_auxv;
 	sp = STACK_ADD(p, ei_index);
 
 	items = (argc + 1) + (envc + 1) + 1;
@@ -338,7 +339,7 @@ create_elf_tables(struct linux_binprm *b
 	current->mm->env_end = p;
 
 	/* Put the elf_info on the stack in the right place.  */
-	if (copy_to_user(sp, elf_info, ei_index * sizeof(elf_addr_t)))
+	if (copy_to_user(sp, current->mm->saved_auxv, ei_index * sizeof(elf_addr_t)))
 		return -EFAULT;
 	return 0;
 }
_


* [patch 104/118] fs/binfmt_elf.c: fix ->start_code calculation
  2020-01-31  6:10 incoming Andrew Morton
                   ` (102 preceding siblings ...)
  2020-01-31  6:16 ` [patch 103/118] fs/binfmt_elf.c: smaller code generation around auxv vector fill Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 105/118] fs/binfmt_elf.c: don't copy ELF header around Andrew Morton
                   ` (13 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: adobriyan, akpm, linux-mm, mm-commits, torvalds

From: Alexey Dobriyan <adobriyan@gmail.com>
Subject: fs/binfmt_elf.c: fix ->start_code calculation

Only executable segments should be accounted to ->start_code, just as they
already (correctly) are to ->end_code.
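
The effect of the extra flag test can be sketched on a simplified segment
table.  The struct seg_sketch type and the lowest_exec_vaddr() name are
hypothetical; the PF_* values are the standard ELF program header flag
bits.

```c
#include <stddef.h>

/* Standard ELF segment permission bits */
#define PF_X 0x1
#define PF_W 0x2
#define PF_R 0x4

/* Hypothetical, stripped-down view of a program header */
struct seg_sketch {
	unsigned long vaddr;
	unsigned int flags;
};

/* Lowest start address among executable segments, ~0UL if there is none. */
static unsigned long lowest_exec_vaddr(const struct seg_sketch *segs,
				       size_t n)
{
	unsigned long start_code = ~0UL;

	for (size_t i = 0; i < n; i++) {
		unsigned long k = segs[i].vaddr;

		/* Without the PF_X test, any lower segment would win. */
		if ((segs[i].flags & PF_X) && k < start_code)
			start_code = k;
	}
	return start_code;
}
```

With a read-only segment mapped below the text segment, the unconditional
comparison would report the read-only segment's address as the start of
code; the flag test skips it.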

Link: http://lkml.kernel.org/r/20191208171410.GB19716@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/binfmt_elf.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/binfmt_elf.c~elf-fix-start_code-calculation
+++ a/fs/binfmt_elf.c
@@ -999,7 +999,7 @@ out_free_interp:
 			}
 		}
 		k = elf_ppnt->p_vaddr;
-		if (k < start_code)
+		if ((elf_ppnt->p_flags & PF_X) && k < start_code)
 			start_code = k;
 		if (start_data < k)
 			start_data = k;
_


* [patch 105/118] fs/binfmt_elf.c: don't copy ELF header around
  2020-01-31  6:10 incoming Andrew Morton
                   ` (103 preceding siblings ...)
  2020-01-31  6:16 ` [patch 104/118] fs/binfmt_elf.c: fix ->start_code calculation Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:16 ` [patch 106/118] fs/binfmt_elf.c: better codegen around current->mm Andrew Morton
                   ` (12 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: adobriyan, akpm, linux-mm, mm-commits, torvalds

From: Alexey Dobriyan <adobriyan@gmail.com>
Subject: fs/binfmt_elf.c: don't copy ELF header around

The ELF header is already read into bprm->buf[] by the generic execve code,
so use it in place.

This saves a memcpy and means only one header (for the interpreter) has to
be allocated instead of two (64 bytes instead of 128 on 64-bit).

Link: http://lkml.kernel.org/r/20191208171242.GA19716@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/binfmt_elf.c |   55 ++++++++++++++++++++++------------------------
 1 file changed, 27 insertions(+), 28 deletions(-)

--- a/fs/binfmt_elf.c~elf-dont-copy-elf-header-around
+++ a/fs/binfmt_elf.c
@@ -161,8 +161,9 @@ static int padzero(unsigned long elf_bss
 #endif
 
 static int
-create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
-		unsigned long load_addr, unsigned long interp_load_addr)
+create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
+		unsigned long load_addr, unsigned long interp_load_addr,
+		unsigned long e_entry)
 {
 	unsigned long p = bprm->p;
 	int argc = bprm->argc;
@@ -251,7 +252,7 @@ create_elf_tables(struct linux_binprm *b
 	NEW_AUX_ENT(AT_PHNUM, exec->e_phnum);
 	NEW_AUX_ENT(AT_BASE, interp_load_addr);
 	NEW_AUX_ENT(AT_FLAGS, 0);
-	NEW_AUX_ENT(AT_ENTRY, exec->e_entry);
+	NEW_AUX_ENT(AT_ENTRY, e_entry);
 	NEW_AUX_ENT(AT_UID, from_kuid_munged(cred->user_ns, cred->uid));
 	NEW_AUX_ENT(AT_EUID, from_kuid_munged(cred->user_ns, cred->euid));
 	NEW_AUX_ENT(AT_GID, from_kgid_munged(cred->user_ns, cred->gid));
@@ -690,12 +691,13 @@ static int load_elf_binary(struct linux_
 	int bss_prot = 0;
 	int retval, i;
 	unsigned long elf_entry;
+	unsigned long e_entry;
 	unsigned long interp_load_addr = 0;
 	unsigned long start_code, end_code, start_data, end_data;
 	unsigned long reloc_func_desc __maybe_unused = 0;
 	int executable_stack = EXSTACK_DEFAULT;
+	struct elfhdr *elf_ex = (struct elfhdr *)bprm->buf;
 	struct {
-		struct elfhdr elf_ex;
 		struct elfhdr interp_elf_ex;
 	} *loc;
 	struct arch_elf_state arch_state = INIT_ARCH_ELF_STATE;
@@ -706,30 +708,27 @@ static int load_elf_binary(struct linux_
 		retval = -ENOMEM;
 		goto out_ret;
 	}
-	
-	/* Get the exec-header */
-	loc->elf_ex = *((struct elfhdr *)bprm->buf);
 
 	retval = -ENOEXEC;
 	/* First of all, some simple consistency checks */
-	if (memcmp(loc->elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
+	if (memcmp(elf_ex->e_ident, ELFMAG, SELFMAG) != 0)
 		goto out;
 
-	if (loc->elf_ex.e_type != ET_EXEC && loc->elf_ex.e_type != ET_DYN)
+	if (elf_ex->e_type != ET_EXEC && elf_ex->e_type != ET_DYN)
 		goto out;
-	if (!elf_check_arch(&loc->elf_ex))
+	if (!elf_check_arch(elf_ex))
 		goto out;
-	if (elf_check_fdpic(&loc->elf_ex))
+	if (elf_check_fdpic(elf_ex))
 		goto out;
 	if (!bprm->file->f_op->mmap)
 		goto out;
 
-	elf_phdata = load_elf_phdrs(&loc->elf_ex, bprm->file);
+	elf_phdata = load_elf_phdrs(elf_ex, bprm->file);
 	if (!elf_phdata)
 		goto out;
 
 	elf_ppnt = elf_phdata;
-	for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
+	for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++) {
 		char *elf_interpreter;
 
 		if (elf_ppnt->p_type != PT_INTERP)
@@ -783,7 +782,7 @@ out_free_interp:
 	}
 
 	elf_ppnt = elf_phdata;
-	for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++)
+	for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++)
 		switch (elf_ppnt->p_type) {
 		case PT_GNU_STACK:
 			if (elf_ppnt->p_flags & PF_X)
@@ -793,7 +792,7 @@ out_free_interp:
 			break;
 
 		case PT_LOPROC ... PT_HIPROC:
-			retval = arch_elf_pt_proc(&loc->elf_ex, elf_ppnt,
+			retval = arch_elf_pt_proc(elf_ex, elf_ppnt,
 						  bprm->file, false,
 						  &arch_state);
 			if (retval)
@@ -837,7 +836,7 @@ out_free_interp:
 	 * still possible to return an error to the code that invoked
 	 * the exec syscall.
 	 */
-	retval = arch_check_elf(&loc->elf_ex,
+	retval = arch_check_elf(elf_ex,
 				!!interpreter, &loc->interp_elf_ex,
 				&arch_state);
 	if (retval)
@@ -850,8 +849,8 @@ out_free_interp:
 
 	/* Do this immediately, since STACK_TOP as used in setup_arg_pages
 	   may depend on the personality.  */
-	SET_PERSONALITY2(loc->elf_ex, &arch_state);
-	if (elf_read_implies_exec(loc->elf_ex, executable_stack))
+	SET_PERSONALITY2(*elf_ex, &arch_state);
+	if (elf_read_implies_exec(*elf_ex, executable_stack))
 		current->personality |= READ_IMPLIES_EXEC;
 
 	if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
@@ -878,7 +877,7 @@ out_free_interp:
 	/* Now we do a little grungy work by mmapping the ELF image into
 	   the correct location in memory. */
 	for(i = 0, elf_ppnt = elf_phdata;
-	    i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
+	    i < elf_ex->e_phnum; i++, elf_ppnt++) {
 		int elf_prot, elf_flags;
 		unsigned long k, vaddr;
 		unsigned long total_size = 0;
@@ -922,9 +921,9 @@ out_free_interp:
 		 * If we are loading ET_EXEC or we have already performed
 		 * the ET_DYN load_addr calculations, proceed normally.
 		 */
-		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
+		if (elf_ex->e_type == ET_EXEC || load_addr_set) {
 			elf_flags |= MAP_FIXED;
-		} else if (loc->elf_ex.e_type == ET_DYN) {
+		} else if (elf_ex->e_type == ET_DYN) {
 			/*
 			 * This logic is run once for the first LOAD Program
 			 * Header for ET_DYN binaries to calculate the
@@ -973,7 +972,7 @@ out_free_interp:
 			load_bias = ELF_PAGESTART(load_bias - vaddr);
 
 			total_size = total_mapping_size(elf_phdata,
-							loc->elf_ex.e_phnum);
+							elf_ex->e_phnum);
 			if (!total_size) {
 				retval = -EINVAL;
 				goto out_free_dentry;
@@ -991,7 +990,7 @@ out_free_interp:
 		if (!load_addr_set) {
 			load_addr_set = 1;
 			load_addr = (elf_ppnt->p_vaddr - elf_ppnt->p_offset);
-			if (loc->elf_ex.e_type == ET_DYN) {
+			if (elf_ex->e_type == ET_DYN) {
 				load_bias += error -
 				             ELF_PAGESTART(load_bias + vaddr);
 				load_addr += load_bias;
@@ -1032,7 +1031,7 @@ out_free_interp:
 		}
 	}
 
-	loc->elf_ex.e_entry += load_bias;
+	e_entry = elf_ex->e_entry + load_bias;
 	elf_bss += load_bias;
 	elf_brk += load_bias;
 	start_code += load_bias;
@@ -1075,7 +1074,7 @@ out_free_interp:
 		allow_write_access(interpreter);
 		fput(interpreter);
 	} else {
-		elf_entry = loc->elf_ex.e_entry;
+		elf_entry = e_entry;
 		if (BAD_ADDR(elf_entry)) {
 			retval = -EINVAL;
 			goto out_free_dentry;
@@ -1093,8 +1092,8 @@ out_free_interp:
 		goto out;
 #endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
 
-	retval = create_elf_tables(bprm, &loc->elf_ex,
-			  load_addr, interp_load_addr);
+	retval = create_elf_tables(bprm, elf_ex,
+			  load_addr, interp_load_addr, e_entry);
 	if (retval < 0)
 		goto out;
 	current->mm->end_code = end_code;
@@ -1112,7 +1111,7 @@ out_free_interp:
 		 * growing down), and into the unused ELF_ET_DYN_BASE region.
 		 */
 		if (IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) &&
-		    loc->elf_ex.e_type == ET_DYN && !interpreter)
+		    elf_ex->e_type == ET_DYN && !interpreter)
 			current->mm->brk = current->mm->start_brk =
 				ELF_ET_DYN_BASE;
 
_


* [patch 106/118] fs/binfmt_elf.c: better codegen around current->mm
  2020-01-31  6:10 incoming Andrew Morton
                   ` (104 preceding siblings ...)
  2020-01-31  6:16 ` [patch 105/118] fs/binfmt_elf.c: don't copy ELF header around Andrew Morton
@ 2020-01-31  6:16 ` Andrew Morton
  2020-01-31  6:17 ` [patch 107/118] fs/binfmt_elf.c: make BAD_ADDR() unlikely Andrew Morton
                   ` (11 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:16 UTC (permalink / raw)
  To: adobriyan, akpm, linux-mm, mm-commits, torvalds

From: Alexey Dobriyan <adobriyan@gmail.com>
Subject: fs/binfmt_elf.c: better codegen around current->mm

The "current->mm" pointer is stable in general; execve(2) is one of the few
exceptions.  The compiler can't treat it as stable, but it _is_ stable most
of the time.  During the ELF loading process ->mm becomes stable right
after flush_old_exec().

Help the compiler by caching current->mm; otherwise it keeps refetching it.
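
The caching pattern itself is simple and can be sketched outside the
kernel.  The types and the set_code_range() name below are hypothetical.
One plausible reason for the refetching, offered here as an assumption, is
that the compiler cannot prove that intervening stores and calls leave the
->mm field unchanged, so it conservatively reloads it for every access.

```c
/* Hypothetical stand-ins for mm_struct and task_struct */
struct mm_sketch {
	unsigned long start_code;
	unsigned long end_code;
};

struct task_sketch {
	struct mm_sketch *mm;
};

static void set_code_range(struct task_sketch *task,
			   unsigned long start, unsigned long end)
{
	/* Load task->mm once; later accesses go through the local copy. */
	struct mm_sketch *mm = task->mm;

	mm->start_code = start;
	mm->end_code = end;
}
```

The local variable tells the compiler that the pointer does not change
between the stores, which is something it cannot always deduce on its own.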

	add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-141 (-141)
	Function                                     old     new   delta
	elf_core_dump                               5062    5039     -23
	load_elf_binary                             5426    5308    -118

Note: other cases are left as is because caching there is either a
pessimisation or makes no difference to binary size.

Link: http://lkml.kernel.org/r/20191215124755.GB21124@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/binfmt_elf.c |   52 ++++++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 24 deletions(-)

--- a/fs/binfmt_elf.c~elf-better-codegen-around-current-mm
+++ a/fs/binfmt_elf.c
@@ -165,6 +165,7 @@ create_elf_tables(struct linux_binprm *b
 		unsigned long load_addr, unsigned long interp_load_addr,
 		unsigned long e_entry)
 {
+	struct mm_struct *mm = current->mm;
 	unsigned long p = bprm->p;
 	int argc = bprm->argc;
 	int envc = bprm->envc;
@@ -227,7 +228,7 @@ create_elf_tables(struct linux_binprm *b
 		return -EFAULT;
 
 	/* Create the ELF interpreter info */
-	elf_info = (elf_addr_t *)current->mm->saved_auxv;
+	elf_info = (elf_addr_t *)mm->saved_auxv;
 	/* update AT_VECTOR_SIZE_BASE if the number of NEW_AUX_ENT() changes */
 #define NEW_AUX_ENT(id, val) \
 	do { \
@@ -276,13 +277,13 @@ create_elf_tables(struct linux_binprm *b
 	}
 #undef NEW_AUX_ENT
 	/* AT_NULL is zero; clear the rest too */
-	memset(elf_info, 0, (char *)current->mm->saved_auxv +
-			sizeof(current->mm->saved_auxv) - (char *)elf_info);
+	memset(elf_info, 0, (char *)mm->saved_auxv +
+			sizeof(mm->saved_auxv) - (char *)elf_info);
 
 	/* And advance past the AT_NULL entry.  */
 	elf_info += 2;
 
-	ei_index = elf_info - (elf_addr_t *)current->mm->saved_auxv;
+	ei_index = elf_info - (elf_addr_t *)mm->saved_auxv;
 	sp = STACK_ADD(p, ei_index);
 
 	items = (argc + 1) + (envc + 1) + 1;
@@ -301,7 +302,7 @@ create_elf_tables(struct linux_binprm *b
 	 * Grow the stack manually; some architectures have a limit on how
 	 * far ahead a user-space access may be in order to grow the stack.
 	 */
-	vma = find_extend_vma(current->mm, bprm->p);
+	vma = find_extend_vma(mm, bprm->p);
 	if (!vma)
 		return -EFAULT;
 
@@ -310,7 +311,7 @@ create_elf_tables(struct linux_binprm *b
 		return -EFAULT;
 
 	/* Populate list of argv pointers back to argv strings. */
-	p = current->mm->arg_end = current->mm->arg_start;
+	p = mm->arg_end = mm->arg_start;
 	while (argc-- > 0) {
 		size_t len;
 		if (__put_user((elf_addr_t)p, sp++))
@@ -322,10 +323,10 @@ create_elf_tables(struct linux_binprm *b
 	}
 	if (__put_user(0, sp++))
 		return -EFAULT;
-	current->mm->arg_end = p;
+	mm->arg_end = p;
 
 	/* Populate list of envp pointers back to envp strings. */
-	current->mm->env_end = current->mm->env_start = p;
+	mm->env_end = mm->env_start = p;
 	while (envc-- > 0) {
 		size_t len;
 		if (__put_user((elf_addr_t)p, sp++))
@@ -337,10 +338,10 @@ create_elf_tables(struct linux_binprm *b
 	}
 	if (__put_user(0, sp++))
 		return -EFAULT;
-	current->mm->env_end = p;
+	mm->env_end = p;
 
 	/* Put the elf_info on the stack in the right place.  */
-	if (copy_to_user(sp, current->mm->saved_auxv, ei_index * sizeof(elf_addr_t)))
+	if (copy_to_user(sp, mm->saved_auxv, ei_index * sizeof(elf_addr_t)))
 		return -EFAULT;
 	return 0;
 }
@@ -701,6 +702,7 @@ static int load_elf_binary(struct linux_
 		struct elfhdr interp_elf_ex;
 	} *loc;
 	struct arch_elf_state arch_state = INIT_ARCH_ELF_STATE;
+	struct mm_struct *mm;
 	struct pt_regs *regs;
 
 	loc = kmalloc(sizeof(*loc), GFP_KERNEL);
@@ -1096,11 +1098,13 @@ out_free_interp:
 			  load_addr, interp_load_addr, e_entry);
 	if (retval < 0)
 		goto out;
-	current->mm->end_code = end_code;
-	current->mm->start_code = start_code;
-	current->mm->start_data = start_data;
-	current->mm->end_data = end_data;
-	current->mm->start_stack = bprm->p;
+
+	mm = current->mm;
+	mm->end_code = end_code;
+	mm->start_code = start_code;
+	mm->start_data = start_data;
+	mm->end_data = end_data;
+	mm->start_stack = bprm->p;
 
 	if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) {
 		/*
@@ -1111,12 +1115,11 @@ out_free_interp:
 		 * growing down), and into the unused ELF_ET_DYN_BASE region.
 		 */
 		if (IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) &&
-		    elf_ex->e_type == ET_DYN && !interpreter)
-			current->mm->brk = current->mm->start_brk =
-				ELF_ET_DYN_BASE;
+		    elf_ex->e_type == ET_DYN && !interpreter) {
+			mm->brk = mm->start_brk = ELF_ET_DYN_BASE;
+		}
 
-		current->mm->brk = current->mm->start_brk =
-			arch_randomize_brk(current->mm);
+		mm->brk = mm->start_brk = arch_randomize_brk(mm);
 #ifdef compat_brk_randomized
 		current->brk_randomized = 1;
 #endif
@@ -1574,6 +1577,7 @@ static void fill_siginfo_note(struct mem
  */
 static int fill_files_note(struct memelfnote *note)
 {
+	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned count, size, names_ofs, remaining, n;
 	user_long_t *data;
@@ -1581,7 +1585,7 @@ static int fill_files_note(struct memelf
 	char *name_base, *name_curpos;
 
 	/* *Estimated* file count and total data size needed */
-	count = current->mm->map_count;
+	count = mm->map_count;
 	if (count > UINT_MAX / 64)
 		return -EINVAL;
 	size = count * 64;
@@ -1599,7 +1603,7 @@ static int fill_files_note(struct memelf
 	name_base = name_curpos = ((char *)data) + names_ofs;
 	remaining = size - names_ofs;
 	count = 0;
-	for (vma = current->mm->mmap; vma != NULL; vma = vma->vm_next) {
+	for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
 		struct file *file;
 		const char *filename;
 
@@ -1633,10 +1637,10 @@ static int fill_files_note(struct memelf
 	data[0] = count;
 	data[1] = PAGE_SIZE;
 	/*
-	 * Count usually is less than current->mm->map_count,
+	 * Count usually is less than mm->map_count,
 	 * we need to move filenames down.
 	 */
-	n = current->mm->map_count - count;
+	n = mm->map_count - count;
 	if (n != 0) {
 		unsigned shift_bytes = n * 3 * sizeof(data[0]);
 		memmove(name_base - shift_bytes, name_base,
_


* [patch 107/118] fs/binfmt_elf.c: make BAD_ADDR() unlikely
  2020-01-31  6:10 incoming Andrew Morton
                   ` (105 preceding siblings ...)
  2020-01-31  6:16 ` [patch 106/118] fs/binfmt_elf.c: better codegen around current->mm Andrew Morton
@ 2020-01-31  6:17 ` Andrew Morton
  2020-01-31  6:17 ` [patch 108/118] fs/binfmt_elf.c: coredump: allocate core ELF header on stack Andrew Morton
                   ` (10 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:17 UTC (permalink / raw)
  To: adobriyan, akpm, linux-mm, mm-commits, torvalds

From: Alexey Dobriyan <adobriyan@gmail.com>
Subject: fs/binfmt_elf.c: make BAD_ADDR() unlikely

If some mapping goes past TASK_SIZE, it will be rejected by the kernel,
which means no such userspace binaries exist.

Mark every such check as unlikely.
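
For readers outside the kernel tree, the annotation can be sketched in
userspace as below.  unlikely() is written out the way the kernel defines
it, on top of __builtin_expect() (GCC/Clang).  TASK_SIZE_SKETCH and
check_entry() are hypothetical; the real TASK_SIZE depends on the
architecture.

```c
#include <errno.h>

/* Branch hint: the condition is expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical address-space limit, chosen only for this sketch */
#define TASK_SIZE_SKETCH 0xC0000000UL

#define BAD_ADDR(x) (unlikely((unsigned long)(x) >= TASK_SIZE_SKETCH))

/* Reject an entry point that lies outside the user address range. */
static int check_entry(unsigned long entry)
{
	if (BAD_ADDR(entry))
		return -EINVAL;
	return 0;
}
```

The hint does not change the result of the test; it only lets the compiler
lay out the error path away from the hot path.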

Link: http://lkml.kernel.org/r/20191215124355.GA21124@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/binfmt_elf.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/binfmt_elf.c~elf-make-bad_addr-unlikely
+++ a/fs/binfmt_elf.c
@@ -97,7 +97,7 @@ static struct linux_binfmt elf_format =
 	.min_coredump	= ELF_EXEC_PAGESIZE,
 };
 
-#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
+#define BAD_ADDR(x) (unlikely((unsigned long)(x) >= TASK_SIZE))
 
 static int set_brk(unsigned long start, unsigned long end, int prot)
 {
_


* [patch 108/118] fs/binfmt_elf.c: coredump: allocate core ELF header on stack
  2020-01-31  6:10 incoming Andrew Morton
                   ` (106 preceding siblings ...)
  2020-01-31  6:17 ` [patch 107/118] fs/binfmt_elf.c: make BAD_ADDR() unlikely Andrew Morton
@ 2020-01-31  6:17 ` Andrew Morton
  2020-01-31  6:17 ` [patch 109/118] fs/binfmt_elf.c: coredump: delete duplicated overflow check Andrew Morton
                   ` (9 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:17 UTC (permalink / raw)
  To: adobriyan, akpm, linux-mm, mm-commits, torvalds

From: Alexey Dobriyan <adobriyan@gmail.com>
Subject: fs/binfmt_elf.c: coredump: allocate core ELF header on stack

The comment says the ELF header is "too large to be on stack".  At 64 bytes
on 64-bit, it is not large by any means.
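
The 64-byte figure can be checked with a standalone layout of the 64-bit
ELF file header.  The struct name below is hypothetical and the field
widths follow the ELF-64 object file format; _Static_assert requires C11.

```c
#include <stdint.h>

/* Hypothetical standalone copy of the 64-bit ELF file header layout */
struct elf64_hdr_sketch {
	unsigned char e_ident[16];	/* magic, class, data encoding, ... */
	uint16_t e_type;
	uint16_t e_machine;
	uint32_t e_version;
	uint64_t e_entry;		/* entry point virtual address */
	uint64_t e_phoff;		/* program header table offset */
	uint64_t e_shoff;		/* section header table offset */
	uint32_t e_flags;
	uint16_t e_ehsize;
	uint16_t e_phentsize;
	uint16_t e_phnum;
	uint16_t e_shentsize;
	uint16_t e_shnum;
	uint16_t e_shstrndx;
};

/* 16 + 2 + 2 + 4 + 8 + 8 + 8 + 4 + 6 * 2 = 64 bytes, with no padding */
_Static_assert(sizeof(struct elf64_hdr_sketch) == 64,
	       "64-bit ELF header should be 64 bytes");
```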

Link: http://lkml.kernel.org/r/20191222143850.GA24341@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/binfmt_elf.c |   16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

--- a/fs/binfmt_elf.c~elf-coredump-allocate-core-elf-header-on-stack
+++ a/fs/binfmt_elf.c
@@ -2186,7 +2186,7 @@ static int elf_core_dump(struct coredump
 	int segs, i;
 	size_t vma_data_size = 0;
 	struct vm_area_struct *vma, *gate_vma;
-	struct elfhdr *elf = NULL;
+	struct elfhdr elf;
 	loff_t offset = 0, dataoff;
 	struct elf_note_info info = { };
 	struct elf_phdr *phdr4note = NULL;
@@ -2207,10 +2207,6 @@ static int elf_core_dump(struct coredump
 	 * exists while dumping the mm->vm_next areas to the core file.
 	 */
   
-	/* alloc memory for large data structures: too large to be on stack */
-	elf = kmalloc(sizeof(*elf), GFP_KERNEL);
-	if (!elf)
-		goto out;
 	/*
 	 * The number of segs are recored into ELF header as 16bit value.
 	 * Please check DEFAULT_MAX_MAP_COUNT definition when you modify here.
@@ -2234,7 +2230,7 @@ static int elf_core_dump(struct coredump
 	 * Collect all the non-memory information about the process for the
 	 * notes.  This also sets up the file header.
 	 */
-	if (!fill_note_info(elf, e_phnum, &info, cprm->siginfo, cprm->regs))
+	if (!fill_note_info(&elf, e_phnum, &info, cprm->siginfo, cprm->regs))
 		goto cleanup;
 
 	has_dumped = 1;
@@ -2242,7 +2238,7 @@ static int elf_core_dump(struct coredump
 	fs = get_fs();
 	set_fs(KERNEL_DS);
 
-	offset += sizeof(*elf);				/* Elf header */
+	offset += sizeof(elf);				/* Elf header */
 	offset += segs * sizeof(struct elf_phdr);	/* Program headers */
 
 	/* Write notes phdr entry */
@@ -2285,12 +2281,12 @@ static int elf_core_dump(struct coredump
 		shdr4extnum = kmalloc(sizeof(*shdr4extnum), GFP_KERNEL);
 		if (!shdr4extnum)
 			goto end_coredump;
-		fill_extnum_info(elf, shdr4extnum, e_shoff, segs);
+		fill_extnum_info(&elf, shdr4extnum, e_shoff, segs);
 	}
 
 	offset = dataoff;
 
-	if (!dump_emit(cprm, elf, sizeof(*elf)))
+	if (!dump_emit(cprm, &elf, sizeof(elf)))
 		goto end_coredump;
 
 	if (!dump_emit(cprm, phdr4note, sizeof(*phdr4note)))
@@ -2374,8 +2370,6 @@ cleanup:
 	kfree(shdr4extnum);
 	kvfree(vma_filesz);
 	kfree(phdr4note);
-	kfree(elf);
-out:
 	return has_dumped;
 }
 
_

* [patch 109/118] fs/binfmt_elf.c: coredump: delete duplicated overflow check
  2020-01-31  6:10 incoming Andrew Morton
                   ` (107 preceding siblings ...)
  2020-01-31  6:17 ` [patch 108/118] fs/binfmt_elf.c: coredump: allocate core ELF header on stack Andrew Morton
@ 2020-01-31  6:17 ` Andrew Morton
  2020-01-31  6:17 ` [patch 110/118] fs/binfmt_elf.c: coredump: allow process with empty address space to coredump Andrew Morton
                   ` (8 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:17 UTC (permalink / raw)
  To: adobriyan, akpm, linux-mm, mm-commits, torvalds

From: Alexey Dobriyan <adobriyan@gmail.com>
Subject: fs/binfmt_elf.c: coredump: delete duplicated overflow check

array_size() already performs the overflow check, so the open-coded one is
redundant.
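
A userspace model of that behaviour, as a sketch only: array_size_model()
is a made-up name, and __builtin_mul_overflow() (GCC/Clang) is used here in
place of the kernel's own overflow helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of array_size(): multiply two sizes and saturate to SIZE_MAX
 * when the product does not fit in size_t.
 */
static size_t array_size_model(size_t a, size_t b)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes))
		return SIZE_MAX;
	return bytes;
}
```

An allocation request of SIZE_MAX bytes cannot succeed, so an overflowing
segment count ends up on the existing allocation-failure path without a
separate comparison against ULONG_MAX.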

Link: http://lkml.kernel.org/r/20191222144009.GB24341@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/binfmt_elf.c |    2 --
 1 file changed, 2 deletions(-)

--- a/fs/binfmt_elf.c~elf-coredump-delete-duplicated-overflow-check
+++ a/fs/binfmt_elf.c
@@ -2257,8 +2257,6 @@ static int elf_core_dump(struct coredump
 
 	dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
 
-	if (segs - 1 > ULONG_MAX / sizeof(*vma_filesz))
-		goto end_coredump;
 	vma_filesz = kvmalloc(array_size(sizeof(*vma_filesz), (segs - 1)),
 			      GFP_KERNEL);
 	if (ZERO_OR_NULL_PTR(vma_filesz))
_

* [patch 110/118] fs/binfmt_elf.c: coredump: allow process with empty address space to coredump
  2020-01-31  6:10 incoming Andrew Morton
                   ` (108 preceding siblings ...)
  2020-01-31  6:17 ` [patch 109/118] fs/binfmt_elf.c: coredump: delete duplicated overflow check Andrew Morton
@ 2020-01-31  6:17 ` Andrew Morton
  2020-01-31  6:17 ` [patch 111/118] init/main.c: log arguments and environment passed to init Andrew Morton
                   ` (7 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:17 UTC (permalink / raw)
  To: adobriyan, akpm, linux-mm, mm-commits, torvalds

From: Alexey Dobriyan <adobriyan@gmail.com>
Subject: fs/binfmt_elf.c: coredump: allow process with empty address space to coredump

Unmapping the whole address space at once with

	munmap(0, (1ULL<<47) - 4096)

or an equivalent call creates an empty coredump.

It is a silly way to exit, but the register contents may still be useful.

The right to coredump is a fundamental right of a process!
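
The difference between the old and new checks on vma_filesz can be
sketched in userspace.  The two macros below mirror the definitions in
include/linux/slab.h and are reproduced for illustration only; the two
helper functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors include/linux/slab.h: what a zero-sized allocation returns. */
#define ZERO_SIZE_PTR ((void *)16)
#define ZERO_OR_NULL_PTR(x) \
	((unsigned long)(x) <= (unsigned long)ZERO_SIZE_PTR)

/* Old check: gives up on allocation failure and on a zero-sized request. */
static int old_check_aborts(const void *p)
{
	return ZERO_OR_NULL_PTR(p);
}

/* New check: gives up only on allocation failure. */
static int new_check_aborts(const void *p)
{
	return !p;
}
```

With zero VMAs the array has zero elements, so the allocation yields
ZERO_SIZE_PTR; only the new check lets the dump continue and write out the
register state.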

Link: http://lkml.kernel.org/r/20191222150137.GA1277@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/binfmt_elf.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

--- a/fs/binfmt_elf.c~elf-coredump-allow-process-with-empty-address-space-to-coredump
+++ a/fs/binfmt_elf.c
@@ -1595,6 +1595,10 @@ static int fill_files_note(struct memelf
 	if (size >= MAX_FILE_NOTE_SIZE) /* paranoia check */
 		return -EINVAL;
 	size = round_up(size, PAGE_SIZE);
+	/*
+	 * "size" can be 0 here legitimately.
+	 * Let it ENOMEM and omit NT_FILE section which will be empty anyway.
+	 */
 	data = kvmalloc(size, GFP_KERNEL);
 	if (ZERO_OR_NULL_PTR(data))
 		return -ENOMEM;
@@ -2257,9 +2261,13 @@ static int elf_core_dump(struct coredump
 
 	dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
 
+	/*
+	 * Zero vma process will get ZERO_SIZE_PTR here.
+	 * Let coredump continue for register state at least.
+	 */
 	vma_filesz = kvmalloc(array_size(sizeof(*vma_filesz), (segs - 1)),
 			      GFP_KERNEL);
-	if (ZERO_OR_NULL_PTR(vma_filesz))
+	if (!vma_filesz)
 		goto end_coredump;
 
 	for (i = 0, vma = first_vma(current, gate_vma); vma != NULL;
_

* [patch 111/118] init/main.c: log arguments and environment passed to init
  2020-01-31  6:10 incoming Andrew Morton
                   ` (109 preceding siblings ...)
  2020-01-31  6:17 ` [patch 110/118] fs/binfmt_elf.c: coredump: allow process with empty address space to coredump Andrew Morton
@ 2020-01-31  6:17 ` Andrew Morton
  2020-01-31  6:17 ` [patch 112/118] init/main.c: remove unnecessary repair_env_string in do_initcall_level Andrew Morton
                   ` (6 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:17 UTC (permalink / raw)
  To: akpm, cmetcalf, krzysiek, linux-mm, mm-commits, nivedita, torvalds

From: Arvind Sankar <nivedita@alum.mit.edu>
Subject: init/main.c: log arguments and environment passed to init

Extend logging in `run_init_process` to also show the arguments and
environment that we are passing to init.

Link: http://lkml.kernel.org/r/20191212180023.24339-2-nivedita@alum.mit.edu
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 init/main.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/init/main.c~init-mainc-log-arguments-and-environment-passed-to-init
+++ a/init/main.c
@@ -1043,8 +1043,16 @@ static void __init do_pre_smp_initcalls(
 
 static int run_init_process(const char *init_filename)
 {
+	const char *const *p;
+
 	argv_init[0] = init_filename;
 	pr_info("Run %s as init process\n", init_filename);
+	pr_debug("  with arguments:\n");
+	for (p = argv_init; *p; p++)
+		pr_debug("    %s\n", *p);
+	pr_debug("  with environment:\n");
+	for (p = envp_init; *p; p++)
+		pr_debug("    %s\n", *p);
 	return do_execve(getname_kernel(init_filename),
 		(const char __user *const __user *)argv_init,
 		(const char __user *const __user *)envp_init);
_

* [patch 112/118] init/main.c: remove unnecessary repair_env_string in do_initcall_level
  2020-01-31  6:10 incoming Andrew Morton
                   ` (110 preceding siblings ...)
  2020-01-31  6:17 ` [patch 111/118] init/main.c: log arguments and environment passed to init Andrew Morton
@ 2020-01-31  6:17 ` Andrew Morton
  2020-01-31  6:17 ` [patch 113/118] init/main.c: fix quoted value handling in unknown_bootoption Andrew Morton
                   ` (5 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:17 UTC (permalink / raw)
  To: akpm, cmetcalf, krzysiek, linux-mm, mm-commits, nivedita, torvalds

From: Arvind Sankar <nivedita@alum.mit.edu>
Subject: init/main.c: remove unnecessary repair_env_string in do_initcall_level

Since commit 08746a65c296 ("init: fix in-place parameter modification
regression"), parse_args in do_initcall_level is called on a copy of
saved_command_line.  It is unnecessary to call repair_env_string during
this parsing, as this copy is not used for anything later.

Remove the now unnecessary arguments from repair_env_string as well.

Link: http://lkml.kernel.org/r/20191212180023.24339-3-nivedita@alum.mit.edu
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 init/main.c |   16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

--- a/init/main.c~init-mainc-remove-unnecessary-repair_env_string-in-do_initcall_level
+++ a/init/main.c
@@ -246,8 +246,7 @@ static int __init loglevel(char *str)
 early_param("loglevel", loglevel);
 
 /* Change NUL term back to "=", to make "param" the whole string. */
-static int __init repair_env_string(char *param, char *val,
-				    const char *unused, void *arg)
+static void __init repair_env_string(char *param, char *val)
 {
 	if (val) {
 		/* param=val or param="val"? */
@@ -260,7 +259,6 @@ static int __init repair_env_string(char
 		} else
 			BUG();
 	}
-	return 0;
 }
 
 /* Anything after -- gets handed straight to init. */
@@ -272,7 +270,7 @@ static int __init set_init_arg(char *par
 	if (panic_later)
 		return 0;
 
-	repair_env_string(param, val, unused, NULL);
+	repair_env_string(param, val);
 
 	for (i = 0; argv_init[i]; i++) {
 		if (i == MAX_INIT_ARGS) {
@@ -292,7 +290,7 @@ static int __init set_init_arg(char *par
 static int __init unknown_bootoption(char *param, char *val,
 				     const char *unused, void *arg)
 {
-	repair_env_string(param, val, unused, NULL);
+	repair_env_string(param, val);
 
 	/* Handle obsolete-style parameters */
 	if (obsolete_checksetup(param))
@@ -991,6 +989,12 @@ static const char *initcall_level_names[
 	"late",
 };
 
+static int __init ignore_unknown_bootoption(char *param, char *val,
+			       const char *unused, void *arg)
+{
+	return 0;
+}
+
 static void __init do_initcall_level(int level)
 {
 	initcall_entry_t *fn;
@@ -1000,7 +1004,7 @@ static void __init do_initcall_level(int
 		   initcall_command_line, __start___param,
 		   __stop___param - __start___param,
 		   level, level,
-		   NULL, &repair_env_string);
+		   NULL, ignore_unknown_bootoption);
 
 	trace_initcall_level(initcall_level_names[level]);
 	for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++)
_

* [patch 113/118] init/main.c: fix quoted value handling in unknown_bootoption
  2020-01-31  6:10 incoming Andrew Morton
                   ` (111 preceding siblings ...)
  2020-01-31  6:17 ` [patch 112/118] init/main.c: remove unnecessary repair_env_string in do_initcall_level Andrew Morton
@ 2020-01-31  6:17 ` Andrew Morton
  2020-01-31  6:17 ` [patch 114/118] init/main.c: fix misleading "This architecture does not have kernel memory protection" message Andrew Morton
                   ` (4 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:17 UTC (permalink / raw)
  To: akpm, cmetcalf, krzysiek, linux-mm, mm-commits, nivedita, torvalds

From: Arvind Sankar <nivedita@alum.mit.edu>
Subject: init/main.c: fix quoted value handling in unknown_bootoption

Patch series "init/main.c: minor cleanup/bugfix of envvar handling", v2.

unknown_bootoption passes unrecognized command line arguments to init as
either environment variables or arguments.  Some of the logic in the
function is broken for quoted command line arguments.

When an argument of the form param="value" is processed by parse_args and
passed to unknown_bootoption, the command line has
  param\0"value\0
with val pointing to the beginning of value.  The helper function
repair_env_string is then used to restore the '=' character that was
removed by parse_args, and strip the quotes off fully.  This results in
  param=value\0\0
and val ends up pointing to the 'a' instead of the 'v' in value.  This bug
was introduced when repair_env_string was refactored into a separate
function, and the decrement of val in repair_env_string became dead code.

This causes two problems in unknown_bootoption in the two places where the
val pointer is used as a substitute for the length of param:

1. An argument of the form param=".value" is misinterpreted as a
   potential module parameter, with the result that it will not be placed
   in init's environment.

2. An argument of the form param="value" is checked to see if param is
   an existing environment variable that should be overwritten, but the
   comparison is off-by-one and compares 'param=v' instead of 'param='
   against the existing environment.  So passing, for example,
   TERM="vt100" on the command line results in init being passed both
   TERM=linux and TERM=vt100 in its environment.

Patch 1 adds logging for the arguments and environment passed to init and
is independent of the rest: it can be dropped if this is unnecessarily
verbose.

Patch 2 removes repair_env_string from initcall parameter parsing in
do_initcall_level, as that uses a separate copy of the command line now
and the repairing is no longer necessary.

Patch 3 fixes the bug in unknown_bootoption by recording the length of
param explicitly instead of implying it from val-param.


This patch (of 3):

Commit a99cd1125189 ("init: fix bug where environment vars can't be passed
via boot args") introduced two minor bugs in unknown_bootoption by
factoring out the quoted value handling into a separate function.

When value is quoted, repair_env_string will move the value up 1 byte to
strip the quotes, so val in unknown_bootoption no longer points to the
actual location of the value.

The result is that an argument of the form param=".value" is mistakenly
treated as a potential module parameter and is not placed in init's
environment, and an argument of the form param="value" can result in a
duplicate environment variable: eg TERM="vt100" on the command line will
result in both TERM=linux and TERM=vt100 being placed into init's
environment.

Fix this by recording the length of the param before calling
repair_env_string instead of relying on val.
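
The off-by-one can be reproduced in userspace.  The sketch below copies the
logic of repair_env_string() as it stands after this patch (with BUG()
dropped); matches_existing() is a hypothetical helper that performs the
environment comparison either the old way (val - param) or the fixed way
(len + 1).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace copy of the helper's logic; the BUG() branch is omitted. */
static void repair_env_string(char *param, char *val)
{
	if (!val)
		return;
	if (val == param + strlen(param) + 1) {
		/* param=val: restore the '=' that parse_args overwrote */
		val[-1] = '=';
	} else if (val == param + strlen(param) + 2) {
		/* param="val": restore '=' and shift the value over the quote */
		val[-2] = '=';
		memmove(val - 1, val, strlen(val) + 1);
	}
}

/*
 * Returns 1 if the repaired param is treated as overriding "existing".
 * use_len selects the fixed comparison over the old one.
 */
static int matches_existing(char *param, char *val, const char *existing,
			    int use_len)
{
	size_t len = strlen(param);	/* recorded before the repair */

	repair_env_string(param, val);
	return !strncmp(param, existing,
			use_len ? len + 1 : (size_t)(val - param));
}
```

Given a buffer holding TERM\0"vt100 with val pointing at the 'v', the old
comparison covers "TERM=v" and so fails to match TERM=linux, which is how
the duplicate entry arises.  The fixed comparison covers only "TERM=" and
matches.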

Link: http://lkml.kernel.org/r/20191212180023.24339-4-nivedita@alum.mit.edu
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 init/main.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/init/main.c~init-mainc-fix-quoted-value-handling-in-unknown_bootoption
+++ a/init/main.c
@@ -255,7 +255,6 @@ static void __init repair_env_string(cha
 		else if (val == param+strlen(param)+2) {
 			val[-2] = '=';
 			memmove(val-1, val, strlen(val)+1);
-			val--;
 		} else
 			BUG();
 	}
@@ -290,6 +289,8 @@ static int __init set_init_arg(char *par
 static int __init unknown_bootoption(char *param, char *val,
 				     const char *unused, void *arg)
 {
+	size_t len = strlen(param);
+
 	repair_env_string(param, val);
 
 	/* Handle obsolete-style parameters */
@@ -297,7 +298,7 @@ static int __init unknown_bootoption(cha
 		return 0;
 
 	/* Unused module parameter. */
-	if (strchr(param, '.') && (!val || strchr(param, '.') < val))
+	if (strnchr(param, len, '.'))
 		return 0;
 
 	if (panic_later)
@@ -311,7 +312,7 @@ static int __init unknown_bootoption(cha
 				panic_later = "env";
 				panic_param = param;
 			}
-			if (!strncmp(param, envp_init[i], val - param))
+			if (!strncmp(param, envp_init[i], len+1))
 				break;
 		}
 		envp_init[i] = param;
_

* [patch 114/118] init/main.c: fix misleading "This architecture does not have kernel memory protection" message
  2020-01-31  6:10 incoming Andrew Morton
                   ` (112 preceding siblings ...)
  2020-01-31  6:17 ` [patch 113/118] init/main.c: fix quoted value handling in unknown_bootoption Andrew Morton
@ 2020-01-31  6:17 ` Andrew Morton
  2020-01-31  6:17 ` [patch 115/118] reiserfs: prevent NULL pointer dereference in reiserfs_insert_item() Andrew Morton
                   ` (3 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:17 UTC (permalink / raw)
  To: akpm, benh, christophe.leroy, keescook, linux-mm, mm-commits,
	mpe, paulus, torvalds

From: Christophe Leroy <christophe.leroy@c-s.fr>
Subject: init/main.c: fix misleading "This architecture does not have kernel memory protection" message

This message leads to thinking that memory protection is not implemented
for the said architecture, whereas absence of CONFIG_STRICT_KERNEL_RWX
only means that memory protection has not been selected at compile time.

Don't print this message when CONFIG_ARCH_HAS_STRICT_KERNEL_RWX is
selected by the architecture.  Instead, print "Kernel memory protection
not selected by kernel config."

Link: http://lkml.kernel.org/r/62477e446d9685459d4f27d193af6ff1bd69d55f.1578557581.git.christophe.leroy@c-s.fr
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 init/main.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/init/main.c~init-fix-misleading-this-architecture-does-not-have-kernel-memory-protection-message
+++ a/init/main.c
@@ -1104,6 +1104,11 @@ static void mark_readonly(void)
 	} else
 		pr_info("Kernel memory protection disabled.\n");
 }
+#elif defined(CONFIG_ARCH_HAS_STRICT_KERNEL_RWX)
+static inline void mark_readonly(void)
+{
+	pr_warn("Kernel memory protection not selected by kernel config.\n");
+}
 #else
 static inline void mark_readonly(void)
 {
_

* [patch 115/118] reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (113 preceding siblings ...)
  2020-01-31  6:17 ` [patch 114/118] init/main.c: fix misleading "This architecture does not have kernel memory protection" message Andrew Morton
@ 2020-01-31  6:17 ` Andrew Morton
  2020-01-31  6:17 ` [patch 116/118] execve: warn if process starts with executable stack Andrew Morton
                   ` (2 subsequent siblings)
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:17 UTC (permalink / raw)
  To: akpm, hushiyuan, jack, linfeilong, linux-mm, mm-commits,
	torvalds, yeyunfeng, zhengbin13

From: Yunfeng Ye <yeyunfeng@huawei.com>
Subject: reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()

The variable inode may be NULL in reiserfs_insert_item(), but there is
no check before accessing the member of inode.

Fix this by adding NULL pointer check before calling reiserfs_debug().

Link: http://lkml.kernel.org/r/79c5135d-ff25-1cc9-4e99-9f572b88cc00@huawei.com
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Cc: zhengbin <zhengbin13@huawei.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Cc: Feilong Lin <linfeilong@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/reiserfs/stree.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/fs/reiserfs/stree.c~reiserfs-prevent-null-pointer-dereference-in-reiserfs_insert_item
+++ a/fs/reiserfs/stree.c
@@ -2246,7 +2246,8 @@ error_out:
 	/* also releases the path */
 	unfix_nodes(&s_ins_balance);
 #ifdef REISERQUOTA_DEBUG
-	reiserfs_debug(th->t_super, REISERFS_DEBUG_CODE,
+	if (inode)
+		reiserfs_debug(th->t_super, REISERFS_DEBUG_CODE,
 		       "reiserquota insert_item(): freeing %u id=%u type=%c",
 		       quota_bytes, inode->i_uid, head2type(ih));
 #endif
_

* [patch 116/118] execve: warn if process starts with executable stack
  2020-01-31  6:10 incoming Andrew Morton
                   ` (114 preceding siblings ...)
  2020-01-31  6:17 ` [patch 115/118] reiserfs: prevent NULL pointer dereference in reiserfs_insert_item() Andrew Morton
@ 2020-01-31  6:17 ` Andrew Morton
  2020-01-31  6:17 ` [patch 117/118] include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc() Andrew Morton
  2020-01-31  6:17 ` [patch 118/118] kcov: ignore fault-inject and stacktrace Andrew Morton
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:17 UTC (permalink / raw)
  To: adobriyan, akpm, dan.carpenter, ebiederm, linux-mm, mm-commits,
	torvalds, will

From: Alexey Dobriyan <adobriyan@gmail.com>
Subject: execve: warn if process starts with executable stack

There have been a few episodes of silent downgrade to an executable stack
over the years:

1) linking an innocent-looking assembly file will silently add an
   executable stack if the proper linker options are not given as well:

	$ cat f.S
	.intel_syntax noprefix
	.text
	.globl f
	f:
	        ret

	$ cat main.c
	void f(void);
	int main(void)
	{
	        f();
	        return 0;
	}

	$ gcc main.c f.S
	$ readelf -l ./a.out
	  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                         0x0000000000000000 0x0000000000000000  RWE    0x10
			 					 ^^^

2) converting a nested function (a GNU C extension) into a closure
https://nullprogram.com/blog/2019/11/15/

	void intsort2(int *base, size_t nmemb, _Bool invert)
	{
	    int cmp(const void *a, const void *b)
	    {
	        int r = *(int *)a - *(int *)b;
	        return invert ? -r : r;
	    }
	    qsort(base, nmemb, sizeof(*base), cmp);
	}

will silently require stack trampolines while the non-closure version will
not.

Without doubt this behaviour is documented somewhere.  Add a warning so
that developers and users can at least notice.  After so many years of
x86_64 having proper executable stack support it should not cause too many
problems.
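
The kernel-side check is on the stack VMA's flags.  As a userspace
counterpart for finding which binaries would trigger the warning, the same
information readelf prints above can be read from the program headers.
This is a sketch assuming glibc's <elf.h> and 64-bit ELF;
gnu_stack_executable() is a hypothetical helper, and loading the program
headers from a file is left out.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Report what PT_GNU_STACK asks for.
 * Returns 1 for an executable stack, 0 for a non-executable one, and
 * -1 when there is no PT_GNU_STACK entry (the default is then up to
 * the architecture).
 */
static int gnu_stack_executable(const Elf64_Phdr *phdr, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (phdr[i].p_type == PT_GNU_STACK)
			return (phdr[i].p_flags & PF_X) ? 1 : 0;
	}
	return -1;
}
```

The "RWE" flags in the readelf output above correspond to
PF_R | PF_W | PF_X, for which this helper returns 1.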

Link: http://lkml.kernel.org/r/20191208171918.GC19716@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Will Deacon <will@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/exec.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/fs/exec.c~execve-warn-if-process-starts-with-executable-stack
+++ a/fs/exec.c
@@ -761,6 +761,11 @@ int setup_arg_pages(struct linux_binprm
 		goto out_unlock;
 	BUG_ON(prev != vma);
 
+	if (unlikely(vm_flags & VM_EXEC)) {
+		pr_warn_once("process '%pD4' started with executable stack\n",
+			     bprm->file);
+	}
+
 	/* Move stack pages down in memory. */
 	if (stack_shift) {
 		ret = shift_arg_pages(vma, stack_shift);
_

* [patch 117/118] include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc()
  2020-01-31  6:10 incoming Andrew Morton
                   ` (115 preceding siblings ...)
  2020-01-31  6:17 ` [patch 116/118] execve: warn if process starts with executable stack Andrew Morton
@ 2020-01-31  6:17 ` Andrew Morton
  2020-01-31  6:17 ` [patch 118/118] kcov: ignore fault-inject and stacktrace Andrew Morton
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:17 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, linux-mm, mm-commits, torvalds

From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Subject: include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc()

Use the PHYS_PFN() macro in io_mapping_map_atomic_wc() instead of the
open-coded variant.
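
The equivalence is easy to see in isolation.  In this userspace sketch
PHYS_PFN() mirrors the definition in include/linux/pfn.h, PAGE_SHIFT is
assumed to be 12 (4 KiB pages), and open_coded_pfn() is a made-up name for
the expression being removed.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT))

/* The open-coded form that the patch replaces. */
static unsigned long open_coded_pfn(uint64_t phys_addr)
{
	return (unsigned long)(phys_addr >> PAGE_SHIFT);
}
```

Both forms shift the physical address right by PAGE_SHIFT and truncate to
unsigned long, so the change is purely cosmetic.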

Link: http://lkml.kernel.org/r/20191209165624.56351-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/io-mapping.h |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/include/linux/io-mapping.h~io-mapping-use-phys_pfn-macro-in-io_mapping_map_atomic_wc
+++ a/include/linux/io-mapping.h
@@ -28,6 +28,7 @@ struct io_mapping {
 
 #ifdef CONFIG_HAVE_ATOMIC_IOMAP
 
+#include <linux/pfn.h>
 #include <asm/iomap.h>
 /*
  * For small address space machines, mapping large objects
@@ -64,12 +65,10 @@ io_mapping_map_atomic_wc(struct io_mappi
 			 unsigned long offset)
 {
 	resource_size_t phys_addr;
-	unsigned long pfn;
 
 	BUG_ON(offset >= mapping->size);
 	phys_addr = mapping->base + offset;
-	pfn = (unsigned long) (phys_addr >> PAGE_SHIFT);
-	return iomap_atomic_prot_pfn(pfn, mapping->prot);
+	return iomap_atomic_prot_pfn(PHYS_PFN(phys_addr), mapping->prot);
 }
 
 static inline void
_

* [patch 118/118] kcov: ignore fault-inject and stacktrace
  2020-01-31  6:10 incoming Andrew Morton
                   ` (116 preceding siblings ...)
  2020-01-31  6:17 ` [patch 117/118] include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc() Andrew Morton
@ 2020-01-31  6:17 ` Andrew Morton
  117 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-01-31  6:17 UTC (permalink / raw)
  To: akpm, andreyknvl, dvyukov, linux-mm, mm-commits, torvalds

From: Dmitry Vyukov <dvyukov@google.com>
Subject: kcov: ignore fault-inject and stacktrace

Don't instrument 3 more files that contain debugging facilities and
produce large amounts of uninteresting coverage for every syscall.  The
following snippets are sprinkled all over the place in kcov traces in a
debugging kernel.  We already try to disable instrumentation of stack
unwinding code and of most debug facilities.  I guess we did not use
fault-inject.c at the time, and stacktrace.c was somehow missed (or
something has changed in kernel/configs).  This change both speeds up kcov
(kernel doesn't need to store these PCs, user-space doesn't need to
process them) and frees trace buffer capacity for more useful coverage.

should_fail
lib/fault-inject.c:149
fail_dump
lib/fault-inject.c:45

stack_trace_save
kernel/stacktrace.c:124
stack_trace_consume_entry
kernel/stacktrace.c:86
stack_trace_consume_entry
kernel/stacktrace.c:89
... a hundred frames skipped ...
stack_trace_consume_entry
kernel/stacktrace.c:93
stack_trace_consume_entry
kernel/stacktrace.c:86

Link: http://lkml.kernel.org/r/20200116111449.217744-1-dvyukov@gmail.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/Makefile |    1 +
 lib/Makefile    |    1 +
 mm/Makefile     |    1 +
 3 files changed, 3 insertions(+)

--- a/kernel/Makefile~kcov-ignore-fault-inject-and-stacktrace
+++ a/kernel/Makefile
@@ -27,6 +27,7 @@ KCOV_INSTRUMENT_softirq.o := n
 # and produce insane amounts of uninteresting coverage.
 KCOV_INSTRUMENT_module.o := n
 KCOV_INSTRUMENT_extable.o := n
+KCOV_INSTRUMENT_stacktrace.o := n
 # Don't self-instrument.
 KCOV_INSTRUMENT_kcov.o := n
 KASAN_SANITIZE_kcov.o := n
--- a/lib/Makefile~kcov-ignore-fault-inject-and-stacktrace
+++ a/lib/Makefile
@@ -16,6 +16,7 @@ KCOV_INSTRUMENT_rbtree.o := n
 KCOV_INSTRUMENT_list_debug.o := n
 KCOV_INSTRUMENT_debugobjects.o := n
 KCOV_INSTRUMENT_dynamic_debug.o := n
+KCOV_INSTRUMENT_fault-inject.o := n
 
 # Early boot use of cmdline, don't instrument it
 ifdef CONFIG_AMD_MEM_ENCRYPT
--- a/mm/Makefile~kcov-ignore-fault-inject-and-stacktrace
+++ a/mm/Makefile
@@ -20,6 +20,7 @@ KCOV_INSTRUMENT_kmemleak.o := n
 KCOV_INSTRUMENT_memcontrol.o := n
 KCOV_INSTRUMENT_mmzone.o := n
 KCOV_INSTRUMENT_vmstat.o := n
+KCOV_INSTRUMENT_failslab.o := n
 
 CFLAGS_init-mm.o += $(call cc-disable-warning, override-init)
 CFLAGS_init-mm.o += $(call cc-disable-warning, initializer-overrides)
_

* Re: incoming
  2021-01-12 23:48 incoming Andrew Morton
@ 2021-01-15 23:32 ` Linus Torvalds
  0 siblings, 0 replies; 263+ messages in thread
From: Linus Torvalds @ 2021-01-15 23:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux-MM, mm-commits

On Tue, Jan 12, 2021 at 3:48 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 10 patches, based on e609571b5ffa3528bf85292de1ceaddac342bc1c.

Whee. I had completely dropped the ball on this - I had built my usual
"akpm" branch with the patches, but then had completely forgotten
about it after doing my basic build tests.

I tend to leave it for a while to see if people send belated ACK/NAK's
for the patches, but that "for a while" is typically "overnight", not
several days.

So if you ever notice that I haven't merged your patch submission, and
you haven't seen me comment on them, feel free to ping me to remind
me.

Because it might just have gotten lost in the shuffle for some random
reason. Admittedly it's rare - I think this is the first time I just
randomly noticed three days later that I'd never done the actual merge
of the patch-series.

               Linus

* incoming
@ 2021-01-12 23:48 Andrew Morton
  2021-01-15 23:32 ` incoming Linus Torvalds
  0 siblings, 1 reply; 263+ messages in thread
From: Andrew Morton @ 2021-01-12 23:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

10 patches, based on e609571b5ffa3528bf85292de1ceaddac342bc1c.

Subsystems affected by this patch series:

  mm/slub
  mm/pagealloc
  mm/memcg
  mm/kasan
  mm/vmalloc
  mm/migration
  mm/hugetlb
  MAINTAINERS
  mm/memory-failure
  mm/process_vm_access

Subsystem: mm/slub

    Jann Horn <jannh@google.com>:
      mm, slub: consider rest of partial list if acquire_slab() fails

Subsystem: mm/pagealloc

    Hailong liu <liu.hailong6@zte.com.cn>:
      mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint

Subsystem: mm/memcg

    Hugh Dickins <hughd@google.com>:
      mm/memcontrol: fix warning in mem_cgroup_page_lruvec()

Subsystem: mm/kasan

    Hailong Liu <liu.hailong6@zte.com.cn>:
      arm/kasan: fix the array size of kasan_early_shadow_pte[]

Subsystem: mm/vmalloc

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/vmalloc.c: fix potential memory leak

Subsystem: mm/migration

    Jan Stancek <jstancek@redhat.com>:
      mm: migrate: initialize err in do_migrate_pages

Subsystem: mm/hugetlb

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/hugetlb: fix potential missing huge page size info

Subsystem: MAINTAINERS

    Vlastimil Babka <vbabka@suse.cz>:
      MAINTAINERS: add Vlastimil as slab allocators maintainer

Subsystem: mm/memory-failure

    Oscar Salvador <osalvador@suse.de>:
      mm,hwpoison: fix printing of page flags

Subsystem: mm/process_vm_access

    Andrew Morton <akpm@linux-foundation.org>:
      mm/process_vm_access.c: include compat.h

 MAINTAINERS                |    1 +
 include/linux/kasan.h      |    6 +++++-
 include/linux/memcontrol.h |    2 +-
 mm/hugetlb.c               |    2 +-
 mm/kasan/init.c            |    3 ++-
 mm/memory-failure.c        |    2 +-
 mm/mempolicy.c             |    2 +-
 mm/page_alloc.c            |   31 ++++++++++++++++---------------
 mm/process_vm_access.c     |    1 +
 mm/slub.c                  |    2 +-
 mm/vmalloc.c               |    4 +++-
 11 files changed, 33 insertions(+), 23 deletions(-)


* incoming
@ 2020-12-29 23:13 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-12-29 23:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

16 patches, based on dea8dcf2a9fa8cc540136a6cd885c3beece16ec3.

Subsystems affected by this patch series:

  mm/selftests
  mm/hugetlb
  kbuild
  checkpatch
  mm/pagecache
  mm/mremap
  mm/kasan
  misc
  lib
  mm/slub

Subsystem: mm/selftests

    Harish <harish@linux.ibm.com>:
      selftests/vm: fix building protection keys test

Subsystem: mm/hugetlb

    Mike Kravetz <mike.kravetz@oracle.com>:
      mm/hugetlb: fix deadlock in hugetlb_cow error path

Subsystem: kbuild

    Masahiro Yamada <masahiroy@kernel.org>:
      Revert "kbuild: avoid static_assert for genksyms"

Subsystem: checkpatch

    Joe Perches <joe@perches.com>:
      checkpatch: prefer strscpy to strlcpy

Subsystem: mm/pagecache

    Souptick Joarder <jrdr.linux@gmail.com>:
      mm: add prototype for __add_to_page_cache_locked()

    Baoquan He <bhe@redhat.com>:
      mm: memmap defer init doesn't work as expected

Subsystem: mm/mremap

    Kalesh Singh <kaleshsingh@google.com>:
      mm/mremap.c: fix extent calculation

    Nicholas Piggin <npiggin@gmail.com>:
      mm: generalise COW SMC TLB flushing race comment

Subsystem: mm/kasan

    Walter Wu <walter-zh.wu@mediatek.com>:
      kasan: fix null pointer dereference in kasan_record_aux_stack

Subsystem: misc

    Randy Dunlap <rdunlap@infradead.org>:
      local64.h: make <asm/local64.h> mandatory

    Huang Shijie <sjhuang@iluvatar.ai>:
      sizes.h: add SZ_8G/SZ_16G/SZ_32G macros

    Josh Poimboeuf <jpoimboe@redhat.com>:
      kdev_t: always inline major/minor helper functions

Subsystem: lib

    Huang Shijie <sjhuang@iluvatar.ai>:
      lib/genalloc: fix the overflow when size is too big

    Ilya Leoshkevich <iii@linux.ibm.com>:
      lib/zlib: fix inflating zlib streams on s390

    Randy Dunlap <rdunlap@infradead.org>:
      zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c

Subsystem: mm/slub

    Roman Gushchin <guro@fb.com>:
      mm: slub: call account_slab_page() after slab page initialization

 arch/alpha/include/asm/local64.h    |    1 -
 arch/arc/include/asm/Kbuild         |    1 -
 arch/arm/include/asm/Kbuild         |    1 -
 arch/arm64/include/asm/Kbuild       |    1 -
 arch/csky/include/asm/Kbuild        |    1 -
 arch/h8300/include/asm/Kbuild       |    1 -
 arch/hexagon/include/asm/Kbuild     |    1 -
 arch/ia64/include/asm/local64.h     |    1 -
 arch/ia64/mm/init.c                 |    4 ++--
 arch/m68k/include/asm/Kbuild        |    1 -
 arch/microblaze/include/asm/Kbuild  |    1 -
 arch/mips/include/asm/Kbuild        |    1 -
 arch/nds32/include/asm/Kbuild       |    1 -
 arch/openrisc/include/asm/Kbuild    |    1 -
 arch/parisc/include/asm/Kbuild      |    1 -
 arch/powerpc/include/asm/Kbuild     |    1 -
 arch/riscv/include/asm/Kbuild       |    1 -
 arch/s390/include/asm/Kbuild        |    1 -
 arch/sh/include/asm/Kbuild          |    1 -
 arch/sparc/include/asm/Kbuild       |    1 -
 arch/x86/include/asm/local64.h      |    1 -
 arch/xtensa/include/asm/Kbuild      |    1 -
 include/asm-generic/Kbuild          |    1 +
 include/linux/build_bug.h           |    5 -----
 include/linux/kdev_t.h              |   22 +++++++++++-----------
 include/linux/mm.h                  |   12 ++++++++++--
 include/linux/sizes.h               |    3 +++
 lib/genalloc.c                      |   25 +++++++++++++------------
 lib/zlib_dfltcc/Makefile            |    2 +-
 lib/zlib_dfltcc/dfltcc.c            |    6 +++++-
 lib/zlib_dfltcc/dfltcc_deflate.c    |    3 +++
 lib/zlib_dfltcc/dfltcc_inflate.c    |    4 ++--
 lib/zlib_dfltcc/dfltcc_syms.c       |   17 -----------------
 mm/hugetlb.c                        |   22 +++++++++++++++++++++-
 mm/kasan/generic.c                  |    2 ++
 mm/memory.c                         |    8 +++++---
 mm/memory_hotplug.c                 |    2 +-
 mm/mremap.c                         |    4 +++-
 mm/page_alloc.c                     |    8 +++++---
 mm/slub.c                           |    5 ++---
 scripts/checkpatch.pl               |    6 ++++++
 tools/testing/selftests/vm/Makefile |   10 +++++-----
 42 files changed, 101 insertions(+), 91 deletions(-)


^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: incoming
  2020-12-22 19:58 incoming Andrew Morton
@ 2020-12-22 21:43 ` Linus Torvalds
  0 siblings, 0 replies; 263+ messages in thread
From: Linus Torvalds @ 2020-12-22 21:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux-MM, mm-commits

On Tue, Dec 22, 2020 at 11:58 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> 60 patches, based on 8653b778e454a7708847aeafe689bce07aeeb94e.

I see that you enabled renaming in the patches. Lovely.

Can you also enable it in the diffstat?

>  74 files changed, 2869 insertions(+), 1553 deletions(-)

With -M in the diffstat, you should have seen

 72 files changed, 2775 insertions(+), 1460 deletions(-)

and if you add "--summary", you'll also see the rename part of the file
create/delete summary:

 rename mm/kasan/{tags_report.c => report_sw_tags.c} (78%)

which is often nice to see in addition to the line stats..

           Linus
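
The two summary lines quoted in this reply can be compared mechanically.
The following is a minimal sketch, not part of the original message: it
parses both totals and prints how much the diffstat shrinks once rename
detection (-M) is enabled. The helper name parse_summary and the regular
expression are illustrative assumptions; only the two summary strings
come from the thread.

```python
import re

# Matches a diffstat summary line such as
# " 74 files changed, 2869 insertions(+), 1553 deletions(-)".
# The insertion and deletion parts are optional, since a diffstat
# omits them when the count is zero.
SUMMARY_RE = re.compile(
    r"(\d+) files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?"
)


def parse_summary(line):
    """Return (files, insertions, deletions) from a diffstat summary line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"not a diffstat summary line: {line!r}")
    return tuple(int(g) if g else 0 for g in m.groups())


# Totals quoted in the thread: first without, then with rename detection.
without_renames = parse_summary(
    " 74 files changed, 2869 insertions(+), 1553 deletions(-)")
with_renames = parse_summary(
    " 72 files changed, 2775 insertions(+), 1460 deletions(-)")

# Per-field reduction obtained by enabling rename detection.
saved = tuple(a - b for a, b in zip(without_renames, with_renames))
print("files: -%d, insertions: -%d, deletions: -%d" % saved)
```

A renamed file that is not detected as such is counted once as a deletion
and once as a creation, so the reduction in the totals is roughly the
content that was merely moved rather than changed.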

^ permalink raw reply	[flat|nested] 263+ messages in thread

* incoming
@ 2020-12-22 19:58 Andrew Morton
  2020-12-22 21:43 ` incoming Linus Torvalds
  0 siblings, 1 reply; 263+ messages in thread
From: Andrew Morton @ 2020-12-22 19:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits


60 patches, based on 8653b778e454a7708847aeafe689bce07aeeb94e.

Subsystems affected by this patch series:

  mm/kasan

Subsystem: mm/kasan

    Andrey Konovalov <andreyknvl@google.com>:
    Patch series "kasan: add hardware tag-based mode for arm64", v11:
      kasan: drop unnecessary GPL text from comment headers
      kasan: KASAN_VMALLOC depends on KASAN_GENERIC
      kasan: group vmalloc code
      kasan: shadow declarations only for software modes
      kasan: rename (un)poison_shadow to (un)poison_range
      kasan: rename KASAN_SHADOW_* to KASAN_GRANULE_*
      kasan: only build init.c for software modes
      kasan: split out shadow.c from common.c
      kasan: define KASAN_MEMORY_PER_SHADOW_PAGE
      kasan: rename report and tags files
      kasan: don't duplicate config dependencies
      kasan: hide invalid free check implementation
      kasan: decode stack frame only with KASAN_STACK_ENABLE
      kasan, arm64: only init shadow for software modes
      kasan, arm64: only use kasan_depth for software modes
      kasan, arm64: move initialization message
      kasan, arm64: rename kasan_init_tags and mark as __init
      kasan: rename addr_has_shadow to addr_has_metadata
      kasan: rename print_shadow_for_address to print_memory_metadata
      kasan: rename SHADOW layout macros to META
      kasan: separate metadata_fetch_row for each mode
      kasan: introduce CONFIG_KASAN_HW_TAGS

    Vincenzo Frascino <vincenzo.frascino@arm.com>:
      arm64: enable armv8.5-a asm-arch option
      arm64: mte: add in-kernel MTE helpers
      arm64: mte: reset the page tag in page->flags
      arm64: mte: add in-kernel tag fault handler
      arm64: kasan: allow enabling in-kernel MTE
      arm64: mte: convert gcr_user into an exclude mask
      arm64: mte: switch GCR_EL1 in kernel entry and exit
      kasan, mm: untag page address in free_reserved_area

    Andrey Konovalov <andreyknvl@google.com>:
      arm64: kasan: align allocations for HW_TAGS
      arm64: kasan: add arch layer for memory tagging helpers
      kasan: define KASAN_GRANULE_SIZE for HW_TAGS
      kasan, x86, s390: update undef CONFIG_KASAN
      kasan, arm64: expand CONFIG_KASAN checks
      kasan, arm64: implement HW_TAGS runtime
      kasan, arm64: print report from tag fault handler
      kasan, mm: reset tags when accessing metadata
      kasan, arm64: enable CONFIG_KASAN_HW_TAGS
      kasan: add documentation for hardware tag-based mode

    Vincenzo Frascino <vincenzo.frascino@arm.com>:
      kselftest/arm64: check GCR_EL1 after context switch

    Andrey Konovalov <andreyknvl@google.com>:
    Patch series "kasan: boot parameters for hardware tag-based mode", v4:
      kasan: simplify quarantine_put call site
      kasan: rename get_alloc/free_info
      kasan: introduce set_alloc_info
      kasan, arm64: unpoison stack only with CONFIG_KASAN_STACK
      kasan: allow VMAP_STACK for HW_TAGS mode
      kasan: remove __kasan_unpoison_stack
      kasan: inline kasan_reset_tag for tag-based modes
      kasan: inline random_tag for HW_TAGS
      kasan: open-code kasan_unpoison_slab
      kasan: inline (un)poison_range and check_invalid_free
      kasan: add and integrate kasan boot parameters
      kasan, mm: check kasan_enabled in annotations
      kasan, mm: rename kasan_poison_kfree
      kasan: don't round_up too much
      kasan: simplify assign_tag and set_tag calls
      kasan: clarify comment in __kasan_kfree_large
      kasan: sanitize objects when metadata doesn't fit
      kasan, mm: allow cache merging with no metadata
      kasan: update documentation

 Documentation/dev-tools/kasan.rst                         |  274 ++-
 arch/Kconfig                                              |    8 
 arch/arm64/Kconfig                                        |    9 
 arch/arm64/Makefile                                       |    7 
 arch/arm64/include/asm/assembler.h                        |    2 
 arch/arm64/include/asm/cache.h                            |    3 
 arch/arm64/include/asm/esr.h                              |    1 
 arch/arm64/include/asm/kasan.h                            |   17 
 arch/arm64/include/asm/memory.h                           |   15 
 arch/arm64/include/asm/mte-def.h                          |   16 
 arch/arm64/include/asm/mte-kasan.h                        |   67 
 arch/arm64/include/asm/mte.h                              |   22 
 arch/arm64/include/asm/processor.h                        |    2 
 arch/arm64/include/asm/string.h                           |    5 
 arch/arm64/include/asm/uaccess.h                          |   23 
 arch/arm64/kernel/asm-offsets.c                           |    3 
 arch/arm64/kernel/cpufeature.c                            |    3 
 arch/arm64/kernel/entry.S                                 |   41 
 arch/arm64/kernel/head.S                                  |    2 
 arch/arm64/kernel/hibernate.c                             |    5 
 arch/arm64/kernel/image-vars.h                            |    2 
 arch/arm64/kernel/kaslr.c                                 |    3 
 arch/arm64/kernel/module.c                                |    6 
 arch/arm64/kernel/mte.c                                   |  124 +
 arch/arm64/kernel/setup.c                                 |    2 
 arch/arm64/kernel/sleep.S                                 |    2 
 arch/arm64/kernel/smp.c                                   |    2 
 arch/arm64/lib/mte.S                                      |   16 
 arch/arm64/mm/copypage.c                                  |    9 
 arch/arm64/mm/fault.c                                     |   59 
 arch/arm64/mm/kasan_init.c                                |   41 
 arch/arm64/mm/mteswap.c                                   |    9 
 arch/arm64/mm/proc.S                                      |   23 
 arch/arm64/mm/ptdump.c                                    |    6 
 arch/s390/boot/string.c                                   |    1 
 arch/x86/boot/compressed/misc.h                           |    1 
 arch/x86/kernel/acpi/wakeup_64.S                          |    2 
 include/linux/kasan-checks.h                              |    2 
 include/linux/kasan.h                                     |  423 ++++-
 include/linux/mm.h                                        |   24 
 include/linux/moduleloader.h                              |    3 
 include/linux/page-flags-layout.h                         |    2 
 include/linux/sched.h                                     |    2 
 include/linux/string.h                                    |    2 
 init/init_task.c                                          |    2 
 kernel/fork.c                                             |    4 
 lib/Kconfig.kasan                                         |   71 
 lib/test_kasan.c                                          |    2 
 lib/test_kasan_module.c                                   |    2 
 mm/kasan/Makefile                                         |   33 
 mm/kasan/common.c                                         | 1006 +++-----------
 mm/kasan/generic.c                                        |   72 -
 mm/kasan/generic_report.c                                 |   13 
 mm/kasan/hw_tags.c                                        |  276 +++
 mm/kasan/init.c                                           |   25 
 mm/kasan/kasan.h                                          |  195 ++
 mm/kasan/quarantine.c                                     |   35 
 mm/kasan/report.c                                         |  363 +----
 mm/kasan/report_generic.c                                 |  169 ++
 mm/kasan/report_hw_tags.c                                 |   44 
 mm/kasan/report_sw_tags.c                                 |   22 
 mm/kasan/shadow.c                                         |  528 +++++++
 mm/kasan/sw_tags.c                                        |   34 
 mm/kasan/tags.c                                           |    7 
 mm/kasan/tags_report.c                                    |    7 
 mm/mempool.c                                              |    4 
 mm/page_alloc.c                                           |    9 
 mm/page_poison.c                                          |    2 
 mm/ptdump.c                                               |   13 
 mm/slab_common.c                                          |    5 
 mm/slub.c                                                 |   29 
 scripts/Makefile.lib                                      |    2 
 tools/testing/selftests/arm64/mte/Makefile                |    2 
 tools/testing/selftests/arm64/mte/check_gcr_el1_cswitch.c |  155 ++
 74 files changed, 2869 insertions(+), 1553 deletions(-)


^ permalink raw reply	[flat|nested] 263+ messages in thread

* incoming
@ 2020-12-18 22:00 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-12-18 22:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm


78 patches, based on a409ed156a90093a03fe6a93721ddf4c591eac87.

Subsystems affected by this patch series:

  mm/memcg
  epoll
  mm/kasan
  mm/cleanups
  epoll

Subsystem: mm/memcg

    Alex Shi <alex.shi@linux.alibaba.com>:
    Patch series "bail out early for memcg disable":
      mm/memcg: bail early from swap accounting if memcg disabled
      mm/memcg: warning on !memcg after readahead page charged

    Wei Yang <richard.weiyang@gmail.com>:
      mm/memcg: remove unused definitions

    Shakeel Butt <shakeelb@google.com>:
      mm, kvm: account kvm_vcpu_mmap to kmemcg

    Hui Su <sh_def@163.com>:
      mm/memcontrol:rewrite mem_cgroup_page_lruvec()

Subsystem: epoll

    Soheil Hassas Yeganeh <soheil@google.com>:
    Patch series "simplify ep_poll":
      epoll: check for events when removing a timed out thread from the wait queue
      epoll: simplify signal handling
      epoll: pull fatal signal checks into ep_send_events()
      epoll: move eavail next to the list_empty_careful check
      epoll: simplify and optimize busy loop logic
      epoll: pull all code between fetch_events and send_event into the loop
      epoll: replace gotos with a proper loop
      epoll: eliminate unnecessary lock for zero timeout

Subsystem: mm/kasan

    Andrey Konovalov <andreyknvl@google.com>:
    Patch series "kasan: add hardware tag-based mode for arm64", v11:
      kasan: drop unnecessary GPL text from comment headers
      kasan: KASAN_VMALLOC depends on KASAN_GENERIC
      kasan: group vmalloc code
      kasan: shadow declarations only for software modes
      kasan: rename (un)poison_shadow to (un)poison_range
      kasan: rename KASAN_SHADOW_* to KASAN_GRANULE_*
      kasan: only build init.c for software modes
      kasan: split out shadow.c from common.c
      kasan: define KASAN_MEMORY_PER_SHADOW_PAGE
      kasan: rename report and tags files
      kasan: don't duplicate config dependencies
      kasan: hide invalid free check implementation
      kasan: decode stack frame only with KASAN_STACK_ENABLE
      kasan, arm64: only init shadow for software modes
      kasan, arm64: only use kasan_depth for software modes
      kasan, arm64: move initialization message
      kasan, arm64: rename kasan_init_tags and mark as __init
      kasan: rename addr_has_shadow to addr_has_metadata
      kasan: rename print_shadow_for_address to print_memory_metadata
      kasan: rename SHADOW layout macros to META
      kasan: separate metadata_fetch_row for each mode
      kasan: introduce CONFIG_KASAN_HW_TAGS

    Vincenzo Frascino <vincenzo.frascino@arm.com>:
      arm64: enable armv8.5-a asm-arch option
      arm64: mte: add in-kernel MTE helpers
      arm64: mte: reset the page tag in page->flags
      arm64: mte: add in-kernel tag fault handler
      arm64: kasan: allow enabling in-kernel MTE
      arm64: mte: convert gcr_user into an exclude mask
      arm64: mte: switch GCR_EL1 in kernel entry and exit
      kasan, mm: untag page address in free_reserved_area

    Andrey Konovalov <andreyknvl@google.com>:
      arm64: kasan: align allocations for HW_TAGS
      arm64: kasan: add arch layer for memory tagging helpers
      kasan: define KASAN_GRANULE_SIZE for HW_TAGS
      kasan, x86, s390: update undef CONFIG_KASAN
      kasan, arm64: expand CONFIG_KASAN checks
      kasan, arm64: implement HW_TAGS runtime
      kasan, arm64: print report from tag fault handler
      kasan, mm: reset tags when accessing metadata
      kasan, arm64: enable CONFIG_KASAN_HW_TAGS
      kasan: add documentation for hardware tag-based mode

    Vincenzo Frascino <vincenzo.frascino@arm.com>:
      kselftest/arm64: check GCR_EL1 after context switch

    Andrey Konovalov <andreyknvl@google.com>:
    Patch series "kasan: boot parameters for hardware tag-based mode", v4:
      kasan: simplify quarantine_put call site
      kasan: rename get_alloc/free_info
      kasan: introduce set_alloc_info
      kasan, arm64: unpoison stack only with CONFIG_KASAN_STACK
      kasan: allow VMAP_STACK for HW_TAGS mode
      kasan: remove __kasan_unpoison_stack
      kasan: inline kasan_reset_tag for tag-based modes
      kasan: inline random_tag for HW_TAGS
      kasan: open-code kasan_unpoison_slab
      kasan: inline (un)poison_range and check_invalid_free
      kasan: add and integrate kasan boot parameters
      kasan, mm: check kasan_enabled in annotations
      kasan, mm: rename kasan_poison_kfree
      kasan: don't round_up too much
      kasan: simplify assign_tag and set_tag calls
      kasan: clarify comment in __kasan_kfree_large
      kasan: sanitize objects when metadata doesn't fit
      kasan, mm: allow cache merging with no metadata
      kasan: update documentation

Subsystem: mm/cleanups

    Colin Ian King <colin.king@canonical.com>:
      mm/Kconfig: fix spelling mistake "whats" -> "what's"

Subsystem: epoll

    Willem de Bruijn <willemb@google.com>:
    Patch series "add epoll_pwait2 syscall", v4:
      epoll: convert internal api to timespec64
      epoll: add syscall epoll_pwait2
      epoll: wire up syscall epoll_pwait2
      selftests/filesystems: expand epoll with epoll_pwait2

 Documentation/dev-tools/kasan.rst                             |  274 +-
 arch/Kconfig                                                  |    8 
 arch/alpha/kernel/syscalls/syscall.tbl                        |    1 
 arch/arm/tools/syscall.tbl                                    |    1 
 arch/arm64/Kconfig                                            |    9 
 arch/arm64/Makefile                                           |    7 
 arch/arm64/include/asm/assembler.h                            |    2 
 arch/arm64/include/asm/cache.h                                |    3 
 arch/arm64/include/asm/esr.h                                  |    1 
 arch/arm64/include/asm/kasan.h                                |   17 
 arch/arm64/include/asm/memory.h                               |   15 
 arch/arm64/include/asm/mte-def.h                              |   16 
 arch/arm64/include/asm/mte-kasan.h                            |   67 
 arch/arm64/include/asm/mte.h                                  |   22 
 arch/arm64/include/asm/processor.h                            |    2 
 arch/arm64/include/asm/string.h                               |    5 
 arch/arm64/include/asm/uaccess.h                              |   23 
 arch/arm64/include/asm/unistd.h                               |    2 
 arch/arm64/include/asm/unistd32.h                             |    2 
 arch/arm64/kernel/asm-offsets.c                               |    3 
 arch/arm64/kernel/cpufeature.c                                |    3 
 arch/arm64/kernel/entry.S                                     |   41 
 arch/arm64/kernel/head.S                                      |    2 
 arch/arm64/kernel/hibernate.c                                 |    5 
 arch/arm64/kernel/image-vars.h                                |    2 
 arch/arm64/kernel/kaslr.c                                     |    3 
 arch/arm64/kernel/module.c                                    |    6 
 arch/arm64/kernel/mte.c                                       |  124 +
 arch/arm64/kernel/setup.c                                     |    2 
 arch/arm64/kernel/sleep.S                                     |    2 
 arch/arm64/kernel/smp.c                                       |    2 
 arch/arm64/lib/mte.S                                          |   16 
 arch/arm64/mm/copypage.c                                      |    9 
 arch/arm64/mm/fault.c                                         |   59 
 arch/arm64/mm/kasan_init.c                                    |   41 
 arch/arm64/mm/mteswap.c                                       |    9 
 arch/arm64/mm/proc.S                                          |   23 
 arch/arm64/mm/ptdump.c                                        |    6 
 arch/ia64/kernel/syscalls/syscall.tbl                         |    1 
 arch/m68k/kernel/syscalls/syscall.tbl                         |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl                   |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl                     |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl                     |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl                     |    1 
 arch/parisc/kernel/syscalls/syscall.tbl                       |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl                      |    1 
 arch/s390/boot/string.c                                       |    1 
 arch/s390/kernel/syscalls/syscall.tbl                         |    1 
 arch/sh/kernel/syscalls/syscall.tbl                           |    1 
 arch/sparc/kernel/syscalls/syscall.tbl                        |    1 
 arch/x86/boot/compressed/misc.h                               |    1 
 arch/x86/entry/syscalls/syscall_32.tbl                        |    1 
 arch/x86/entry/syscalls/syscall_64.tbl                        |    1 
 arch/x86/kernel/acpi/wakeup_64.S                              |    2 
 arch/x86/kvm/x86.c                                            |    2 
 arch/xtensa/kernel/syscalls/syscall.tbl                       |    1 
 fs/eventpoll.c                                                |  359 ++-
 include/linux/compat.h                                        |    6 
 include/linux/kasan-checks.h                                  |    2 
 include/linux/kasan.h                                         |  423 ++--
 include/linux/memcontrol.h                                    |  137 -
 include/linux/mm.h                                            |   24 
 include/linux/mmdebug.h                                       |   13 
 include/linux/moduleloader.h                                  |    3 
 include/linux/page-flags-layout.h                             |    2 
 include/linux/sched.h                                         |    2 
 include/linux/string.h                                        |    2 
 include/linux/syscalls.h                                      |    5 
 include/uapi/asm-generic/unistd.h                             |    4 
 init/init_task.c                                              |    2 
 kernel/fork.c                                                 |    4 
 kernel/sys_ni.c                                               |    2 
 lib/Kconfig.kasan                                             |   71 
 lib/test_kasan.c                                              |    2 
 lib/test_kasan_module.c                                       |    2 
 mm/Kconfig                                                    |    2 
 mm/kasan/Makefile                                             |   33 
 mm/kasan/common.c                                             | 1006 ++--------
 mm/kasan/generic.c                                            |   72 
 mm/kasan/generic_report.c                                     |   13 
 mm/kasan/hw_tags.c                                            |  294 ++
 mm/kasan/init.c                                               |   25 
 mm/kasan/kasan.h                                              |  204 +-
 mm/kasan/quarantine.c                                         |   35 
 mm/kasan/report.c                                             |  363 +--
 mm/kasan/report_generic.c                                     |  169 +
 mm/kasan/report_hw_tags.c                                     |   44 
 mm/kasan/report_sw_tags.c                                     |   22 
 mm/kasan/shadow.c                                             |  541 +++++
 mm/kasan/sw_tags.c                                            |   34 
 mm/kasan/tags.c                                               |    7 
 mm/kasan/tags_report.c                                        |    7 
 mm/memcontrol.c                                               |   53 
 mm/mempool.c                                                  |    4 
 mm/page_alloc.c                                               |    9 
 mm/page_poison.c                                              |    2 
 mm/ptdump.c                                                   |   13 
 mm/slab_common.c                                              |    5 
 mm/slub.c                                                     |   29 
 scripts/Makefile.lib                                          |    2 
 tools/testing/selftests/arm64/mte/Makefile                    |    2 
 tools/testing/selftests/arm64/mte/check_gcr_el1_cswitch.c     |  155 +
 tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c |   72 
 virt/kvm/coalesced_mmio.c                                     |    2 
 virt/kvm/kvm_main.c                                           |    2 
 105 files changed, 3268 insertions(+), 1873 deletions(-)


^ permalink raw reply	[flat|nested] 263+ messages in thread

* incoming
@ 2020-12-16  4:41 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-12-16  4:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm


- lots of little subsystems

- some post-linux-next MM material.  Most of this awaits more merging
  of other trees.


95 patches, based on 489e9fea66f31086f85d9a18e61e4791d94a56a4.

Subsystems affected by this patch series:

  mm/swap
  mm/memory-hotplug
  alpha
  procfs
  misc
  core-kernel
  bitmap
  lib
  lz4
  bitops
  checkpatch
  nilfs
  kdump
  rapidio
  gcov
  bfs
  relay
  resource
  ubsan
  reboot
  fault-injection
  lzo
  apparmor
  mm/pagemap
  mm/cleanups
  mm/gup

Subsystem: mm/swap

    Zhaoyang Huang <huangzhaoyang@gmail.com>:
      mm: fix a race on nr_swap_pages

Subsystem: mm/memory-hotplug

    Laurent Dufour <ldufour@linux.ibm.com>:
      mm/memory_hotplug: quieting offline operation

Subsystem: alpha

    Thomas Gleixner <tglx@linutronix.de>:
      alpha: replace bogus in_interrupt()

Subsystem: procfs

    Randy Dunlap <rdunlap@infradead.org>:
      procfs: delete duplicated words + other fixes

    Anand K Mistry <amistry@google.com>:
      proc: provide details on indirect branch speculation

    Alexey Dobriyan <adobriyan@gmail.com>:
      proc: fix lookup in /proc/net subdirectories after setns(2)

    Hui Su <sh_def@163.com>:
      fs/proc: make pde_get() return nothing

Subsystem: misc

    Christophe Leroy <christophe.leroy@csgroup.eu>:
      asm-generic: force inlining of get_order() to work around gcc10 poor decision

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      kernel.h: split out mathematical helpers

Subsystem: core-kernel

    Hui Su <sh_def@163.com>:
      kernel/acct.c: use #elif instead of #end and #elif

Subsystem: bitmap

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      include/linux/bitmap.h: convert bitmap_empty() / bitmap_full() to return boolean

    "Ma, Jianpeng" <jianpeng.ma@intel.com>:
      bitmap: remove unused function declaration

Subsystem: lib

    Geert Uytterhoeven <geert@linux-m68k.org>:
      lib/test_free_pages.c: add basic progress indicators

    "Gustavo A. R. Silva" <gustavoars@kernel.org>:
    Patch series "lib/stackdepot.c: Replace one-element array with flexible-array member":
      lib/stackdepot.c: replace one-element array with flexible-array member
      lib/stackdepot.c: use flex_array_size() helper in memcpy()
      lib/stackdepot.c: use array_size() helper in jhash2()

    Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
      lib/test_lockup.c: minimum fix to get it compiled on PREEMPT_RT

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      lib/list_kunit: follow new file name convention for KUnit tests
      lib/linear_ranges_kunit: follow new file name convention for KUnit tests
      lib/bits_kunit: follow new file name convention for KUnit tests
      lib/cmdline: fix get_option() for strings starting with hyphen
      lib/cmdline: allow NULL to be an output for get_option()
      lib/cmdline_kunit: add a new test suite for cmdline API

    Jakub Jelinek <jakub@redhat.com>:
      ilog2: improve ilog2 for constant arguments

    Nick Desaulniers <ndesaulniers@google.com>:
      lib/string: remove unnecessary #undefs

    Daniel Axtens <dja@axtens.net>:
    Patch series "Fortify strscpy()", v7:
      lib: string.h: detect intra-object overflow in fortified string functions
      lkdtm: tests for FORTIFY_SOURCE

    Francis Laniel <laniel_francis@privacyrequired.com>:
      string.h: add FORTIFY coverage for strscpy()
      drivers/misc/lkdtm: add new file in LKDTM to test fortified strscpy
      drivers/misc/lkdtm/lkdtm.h: correct wrong filenames in comment

    Alexey Dobriyan <adobriyan@gmail.com>:
      lib: cleanup kstrto*() usage

Subsystem: lz4

    Gao Xiang <hsiangkao@redhat.com>:
      lib/lz4: explicitly support in-place decompression

Subsystem: bitops

    Syed Nayyar Waris <syednwaris@gmail.com>:
    Patch series "Introduce the for_each_set_clump macro", v12:
      bitops: introduce the for_each_set_clump macro
      lib/test_bitmap.c: add for_each_set_clump test cases
      gpio: thunderx: utilize for_each_set_clump macro
      gpio: xilinx: utilize generic bitmap_get_value and _set_value

Subsystem: checkpatch

    Dwaipayan Ray <dwaipayanray1@gmail.com>:
      checkpatch: add new exception to repeated word check

    Aditya Srivastava <yashsri421@gmail.com>:
      checkpatch: fix false positives in REPEATED_WORD warning

    Łukasz Stelmach <l.stelmach@samsung.com>:
      checkpatch: ignore generated CamelCase defines and enum values

    Joe Perches <joe@perches.com>:
      checkpatch: prefer static const declarations
      checkpatch: allow --fix removal of unnecessary break statements

    Dwaipayan Ray <dwaipayanray1@gmail.com>:
      checkpatch: extend attributes check to handle more patterns

    Tom Rix <trix@redhat.com>:
      checkpatch: add a fixer for missing newline at eof

    Joe Perches <joe@perches.com>:
      checkpatch: update __attribute__((section("name"))) quote removal

    Aditya Srivastava <yashsri421@gmail.com>:
      checkpatch: add fix option for GERRIT_CHANGE_ID

    Joe Perches <joe@perches.com>:
      checkpatch: add __alias and __weak to suggested __attribute__ conversions

    Dwaipayan Ray <dwaipayanray1@gmail.com>:
      checkpatch: improve email parsing
      checkpatch: fix spelling errors and remove repeated word

    Aditya Srivastava <yashsri421@gmail.com>:
      checkpatch: avoid COMMIT_LOG_LONG_LINE warning for signature tags

    Dwaipayan Ray <dwaipayanray1@gmail.com>:
      checkpatch: fix unescaped left brace

    Aditya Srivastava <yashsri421@gmail.com>:
      checkpatch: add fix option for ASSIGNMENT_CONTINUATIONS
      checkpatch: add fix option for LOGICAL_CONTINUATIONS
      checkpatch: add fix and improve warning msg for non-standard signature

    Dwaipayan Ray <dwaipayanray1@gmail.com>:
      checkpatch: add warning for unnecessary use of %h[xudi] and %hh[xudi]
      checkpatch: add warning for lines starting with a '#' in commit log
      checkpatch: fix TYPO_SPELLING check for words with apostrophe

    Joe Perches <joe@perches.com>:
      checkpatch: add printk_once and printk_ratelimit to prefer pr_<level> warning

Subsystem: nilfs

    Alex Shi <alex.shi@linux.alibaba.com>:
      fs/nilfs2: remove some unused macros to tame gcc

Subsystem: kdump

    Alexander Egorenkov <egorenar@linux.ibm.com>:
      kdump: append uts_namespace.name offset to VMCOREINFO

Subsystem: rapidio

    Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
      rapidio: remove unused rio_get_asm() and rio_get_device()

Subsystem: gcov

    Nick Desaulniers <ndesaulniers@google.com>:
      gcov: remove support for GCC < 4.9

    Alex Shi <alex.shi@linux.alibaba.com>:
      gcov: fix kernel-doc markup issue

Subsystem: bfs

    Randy Dunlap <rdunlap@infradead.org>:
      bfs: don't use WARNING: string when it's just info.

Subsystem: relay

    Jani Nikula <jani.nikula@intel.com>:
    Patch series "relay: cleanup and const callbacks", v2:
      relay: remove unused buf_mapped and buf_unmapped callbacks
      relay: require non-NULL callbacks in relay_open()
      relay: make create_buf_file and remove_buf_file callbacks mandatory
      relay: allow the use of const callback structs
      drm/i915: make relay callbacks const
      ath10k: make relay callbacks const
      ath11k: make relay callbacks const
      ath9k: make relay callbacks const
      blktrace: make relay callbacks const

Subsystem: resource

    Mauro Carvalho Chehab <mchehab+huawei@kernel.org>:
      kernel/resource.c: fix kernel-doc markups

Subsystem: ubsan

    Kees Cook <keescook@chromium.org>:
    Patch series "Clean up UBSAN Makefile", v2:
      ubsan: remove redundant -Wno-maybe-uninitialized
      ubsan: move cc-option tests into Kconfig
      ubsan: disable object-size sanitizer under GCC
      ubsan: disable UBSAN_TRAP for all*config
      ubsan: enable for all*config builds
      ubsan: remove UBSAN_MISC in favor of individual options
      ubsan: expand tests and reporting

    Dmitry Vyukov <dvyukov@google.com>:
      kcov: don't instrument with UBSAN

    Zou Wei <zou_wei@huawei.com>:
      lib/ubsan.c: mark type_check_kinds with static keyword

Subsystem: reboot

    Matteo Croce <mcroce@microsoft.com>:
      reboot: refactor and comment the cpu selection code
      reboot: allow to specify reboot mode via sysfs
      reboot: remove cf9_safe from allowed types and rename cf9_force
    Patch series "reboot: sysfs improvements":
      reboot: allow to override reboot type if quirks are found
      reboot: hide from sysfs not applicable settings

Subsystem: fault-injection

    Barnabás Pőcze <pobrn@protonmail.com>:
      fault-injection: handle EI_ETYPE_TRUE

Subsystem: lzo

    Jason Yan <yanaijie@huawei.com>:
      lib/lzo/lzo1x_compress.c: make lzogeneric1x_1_compress() static

Subsystem: apparmor

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      apparmor: remove duplicate macro list_entry_is_head()

Subsystem: mm/pagemap

    Christoph Hellwig <hch@lst.de>:
    Patch series "simplify follow_pte a bit":
      mm: unexport follow_pte_pmd
      mm: simplify follow_pte{,pmd}

Subsystem: mm/cleanups

    Haitao Shi <shihaitao1@huawei.com>:
      mm: fix some spelling mistakes in comments

Subsystem: mm/gup

    Jann Horn <jannh@google.com>:
      mmap locking API: don't check locking if the mm isn't live yet
      mm/gup: assert that the mmap lock is held in __get_user_pages()

 Documentation/ABI/testing/sysfs-kernel-reboot    |   32 
 Documentation/admin-guide/kdump/vmcoreinfo.rst   |    6 
 Documentation/dev-tools/ubsan.rst                |    1 
 Documentation/filesystems/proc.rst               |    2 
 MAINTAINERS                                      |    5 
 arch/alpha/kernel/process.c                      |    2 
 arch/powerpc/kernel/vmlinux.lds.S                |    4 
 arch/s390/pci/pci_mmio.c                         |    4 
 drivers/gpio/gpio-thunderx.c                     |   11 
 drivers/gpio/gpio-xilinx.c                       |   61 -
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c       |    2 
 drivers/misc/lkdtm/Makefile                      |    1 
 drivers/misc/lkdtm/bugs.c                        |   50 +
 drivers/misc/lkdtm/core.c                        |    3 
 drivers/misc/lkdtm/fortify.c                     |   82 ++
 drivers/misc/lkdtm/lkdtm.h                       |   19 
 drivers/net/wireless/ath/ath10k/spectral.c       |    2 
 drivers/net/wireless/ath/ath11k/spectral.c       |    2 
 drivers/net/wireless/ath/ath9k/common-spectral.c |    2 
 drivers/rapidio/rio.c                            |   81 --
 fs/bfs/inode.c                                   |    2 
 fs/dax.c                                         |    9 
 fs/exec.c                                        |    8 
 fs/nfs/callback_proc.c                           |    5 
 fs/nilfs2/segment.c                              |    5 
 fs/proc/array.c                                  |   28 
 fs/proc/base.c                                   |    2 
 fs/proc/generic.c                                |   24 
 fs/proc/internal.h                               |   10 
 fs/proc/proc_net.c                               |   20 
 include/asm-generic/bitops/find.h                |   19 
 include/asm-generic/getorder.h                   |    2 
 include/linux/bitmap.h                           |   67 +-
 include/linux/bitops.h                           |   24 
 include/linux/dcache.h                           |    1 
 include/linux/iommu-helper.h                     |    4 
 include/linux/kernel.h                           |  173 -----
 include/linux/log2.h                             |    3 
 include/linux/math.h                             |  177 +++++
 include/linux/mm.h                               |    6 
 include/linux/mm_types.h                         |   10 
 include/linux/mmap_lock.h                        |   16 
 include/linux/proc_fs.h                          |    8 
 include/linux/rcu_node_tree.h                    |    2 
 include/linux/relay.h                            |   29 
 include/linux/rio_drv.h                          |    3 
 include/linux/string.h                           |   75 +-
 include/linux/units.h                            |    2 
 kernel/Makefile                                  |    3 
 kernel/acct.c                                    |    7 
 kernel/crash_core.c                              |    1 
 kernel/fail_function.c                           |    6 
 kernel/gcov/gcc_4_7.c                            |   10 
 kernel/reboot.c                                  |  308 ++++++++-
 kernel/relay.c                                   |  111 ---
 kernel/resource.c                                |   24 
 kernel/trace/blktrace.c                          |    2 
 lib/Kconfig.debug                                |   11 
 lib/Kconfig.ubsan                                |  154 +++-
 lib/Makefile                                     |    7 
 lib/bits_kunit.c                                 |   75 ++
 lib/cmdline.c                                    |   20 
 lib/cmdline_kunit.c                              |  100 +++
 lib/errname.c                                    |    1 
 lib/error-inject.c                               |    2 
 lib/errseq.c                                     |    1 
 lib/find_bit.c                                   |   17 
 lib/linear_ranges_kunit.c                        |  228 +++++++
 lib/list-test.c                                  |  748 -----------------------
 lib/list_kunit.c                                 |  748 +++++++++++++++++++++++
 lib/lz4/lz4_decompress.c                         |    6 
 lib/lz4/lz4defs.h                                |    1 
 lib/lzo/lzo1x_compress.c                         |    2 
 lib/math/div64.c                                 |    4 
 lib/math/int_pow.c                               |    2 
 lib/math/int_sqrt.c                              |    3 
 lib/math/reciprocal_div.c                        |    9 
 lib/stackdepot.c                                 |   11 
 lib/string.c                                     |    4 
 lib/test_bitmap.c                                |  143 ++++
 lib/test_bits.c                                  |   75 --
 lib/test_firmware.c                              |    9 
 lib/test_free_pages.c                            |    5 
 lib/test_kmod.c                                  |   26 
 lib/test_linear_ranges.c                         |  228 -------
 lib/test_lockup.c                                |   16 
 lib/test_ubsan.c                                 |   74 ++
 lib/ubsan.c                                      |    2 
 mm/filemap.c                                     |    2 
 mm/gup.c                                         |    2 
 mm/huge_memory.c                                 |    2 
 mm/khugepaged.c                                  |    2 
 mm/memblock.c                                    |    2 
 mm/memory.c                                      |   36 -
 mm/memory_hotplug.c                              |    2 
 mm/migrate.c                                     |    2 
 mm/page_ext.c                                    |    2 
 mm/swapfile.c                                    |   11 
 scripts/Makefile.ubsan                           |   49 -
 scripts/checkpatch.pl                            |  495 +++++++++++----
 security/apparmor/apparmorfs.c                   |    3 
 tools/testing/selftests/lkdtm/tests.txt          |    1 
 102 files changed, 3022 insertions(+), 1899 deletions(-)


* Re: incoming
  2020-12-15 22:49   ` incoming Linus Torvalds
@ 2020-12-15 22:55     ` Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-12-15 22:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux-MM, mm-commits

On Tue, 15 Dec 2020 14:49:24 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Dec 15, 2020 at 2:48 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I will try to apply it on top of my merge of your previous series instead.
> 
> Yes, then it applies cleanly. So apparently we just have different
> concepts of what really constitutes a "base" for applying your series.
> 

Oops, sorry, yes, the "based on" thing was wrong because I had two
series in flight simultaneously.  I've never tried that before.

* Re: incoming
  2020-12-15 22:48 ` incoming Linus Torvalds
@ 2020-12-15 22:49   ` Linus Torvalds
  2020-12-15 22:55     ` incoming Andrew Morton
  0 siblings, 1 reply; 263+ messages in thread
From: Linus Torvalds @ 2020-12-15 22:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux-MM, mm-commits

On Tue, Dec 15, 2020 at 2:48 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I will try to apply it on top of my merge of your previous series instead.

Yes, then it applies cleanly. So apparently we just have different
concepts of what really constitutes a "base" for applying your series.

              Linus

* Re: incoming
  2020-12-15 20:32 incoming Andrew Morton
  2020-12-15 21:00 ` incoming Linus Torvalds
@ 2020-12-15 22:48 ` Linus Torvalds
  2020-12-15 22:49   ` incoming Linus Torvalds
  1 sibling, 1 reply; 263+ messages in thread
From: Linus Torvalds @ 2020-12-15 22:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux-MM, mm-commits

On Tue, Dec 15, 2020 at 12:32 PM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> - more MM work: a memcg scalability improvement
>
> 19 patches, based on 148842c98a24e508aecb929718818fbf4c2a6ff3.

With your re-send, I get all patches, but they don't actually apply cleanly.

Is that base correct?

I get

  error: patch failed: mm/huge_memory.c:2750
  error: mm/huge_memory.c: patch does not apply
  Patch failed at 0004 mm/thp: narrow lru locking

for that patch "[patch 04/19] mm/thp: narrow lru locking", and that's
definitely true: the patch fragment has

@@ -2750,7 +2751,7 @@ int split_huge_page_to_list(struct page
                                __dec_lruvec_page_state(head, NR_FILE_THPS);
                }

-               __split_huge_page(page, list, end, flags);
+               __split_huge_page(page, list, end);
                ret = 0;
        } else {
                if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {

but that __dec_lruvec_page_state() conversion was done by your
previous commit series.

So I have the feeling that what you actually mean by "base" isn't
actually really the base for that series at all..

I will try to apply it on top of my merge of your previous series instead.

              Linus

* Re: incoming
  2020-12-15 20:32 incoming Andrew Morton
@ 2020-12-15 21:00 ` Linus Torvalds
  2020-12-15 22:48 ` incoming Linus Torvalds
  1 sibling, 0 replies; 263+ messages in thread
From: Linus Torvalds @ 2020-12-15 21:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux-MM, mm-commits

On Tue, Dec 15, 2020 at 12:32 PM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> - more MM work: a memcg scalability improvement
>
> 19 patches, based on 148842c98a24e508aecb929718818fbf4c2a6ff3.

I'm not seeing patch 10/19 at all.

And patch 19/19 is corrupted and has an attachment with a '^P'
character in it. I could fix it up, but with the missing patch in the
middle I'm not going to even try. 'b4' is also very unhappy about that
patch 19/19.

I don't know what went wrong, but I'll ignore this send - please
re-send the series at your leisure, ok?

            Linus

* incoming
@ 2020-12-15 20:32 Andrew Morton
  2020-12-15 21:00 ` incoming Linus Torvalds
  2020-12-15 22:48 ` incoming Linus Torvalds
  0 siblings, 2 replies; 263+ messages in thread
From: Andrew Morton @ 2020-12-15 20:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits


- more MM work: a memcg scalability improvement

19 patches, based on 148842c98a24e508aecb929718818fbf4c2a6ff3.

Subsystems affected by this patch series:


    Alex Shi <alex.shi@linux.alibaba.com>:
    Patch series "per memcg lru lock", v21:
      mm/thp: move lru_add_page_tail() to huge_memory.c
      mm/thp: use head for head page in lru_add_page_tail()
      mm/thp: simplify lru_add_page_tail()
      mm/thp: narrow lru locking
      mm/vmscan: remove unnecessary lruvec adding
      mm/rmap: stop store reordering issue on page->mapping

    Hugh Dickins <hughd@google.com>:
      mm: page_idle_get_page() does not need lru_lock

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/memcg: add debug checking in lock_page_memcg
      mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn
      mm/lru: move lock into lru_note_cost
      mm/vmscan: remove lruvec reget in move_pages_to_lru
      mm/mlock: remove lru_lock on TestClearPageMlocked
      mm/mlock: remove __munlock_isolate_lru_page()
      mm/lru: introduce TestClearPageLRU()
      mm/compaction: do page isolation first in compaction
      mm/swap.c: serialize memcg changes in pagevec_lru_move_fn
      mm/lru: replace pgdat lru_lock with lruvec lock

    Alexander Duyck <alexander.h.duyck@linux.intel.com>:
      mm/lru: introduce relock_page_lruvec()

    Hugh Dickins <hughd@google.com>:
      mm/lru: revise the comments of lru_lock

 Documentation/admin-guide/cgroup-v1/memcg_test.rst |   15 -
 Documentation/admin-guide/cgroup-v1/memory.rst     |   23 -
 Documentation/trace/events-kmem.rst                |    2 
 Documentation/vm/unevictable-lru.rst               |   22 -
 include/linux/memcontrol.h                         |  110 +++++++
 include/linux/mm_types.h                           |    2 
 include/linux/mmzone.h                             |    6 
 include/linux/page-flags.h                         |    1 
 include/linux/swap.h                               |    4 
 mm/compaction.c                                    |   98 ++++---
 mm/filemap.c                                       |    4 
 mm/huge_memory.c                                   |  109 ++++---
 mm/memcontrol.c                                    |   84 +++++-
 mm/mlock.c                                         |   93 ++----
 mm/mmzone.c                                        |    1 
 mm/page_alloc.c                                    |    1 
 mm/page_idle.c                                     |    4 
 mm/rmap.c                                          |   12 
 mm/swap.c                                          |  292 ++++++++-------------
 mm/vmscan.c                                        |  239 ++++++++---------
 mm/workingset.c                                    |    2 
 21 files changed, 644 insertions(+), 480 deletions(-)


* Re: incoming
  2020-12-15  3:30   ` incoming Linus Torvalds
@ 2020-12-15 14:04     ` Konstantin Ryabitsev
  0 siblings, 0 replies; 263+ messages in thread
From: Konstantin Ryabitsev @ 2020-12-15 14:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, mm-commits, Linux-MM

On Mon, Dec 14, 2020 at 07:30:54PM -0800, Linus Torvalds wrote:
> > All the patches except for _one_ get a nice little green check-mark
> > next to them when I use 'git am' on this series.
> >
> > The one that did not was [patch 192/200].
> >
> > I have no idea why
> 
> Hmm. It looks like that patch is the only one in the series with the
> ">From" marker in the commit message, from the silly "clarify that
> this isn't the first line in a new message in mbox format".
> 
> And "b4 am" has turned the single ">" into two, making the stupid
> marker worse, and actually corrupting the end result.

It's a bug in b4 that I overlooked. Public-inbox emits mboxrd-formatted 
.mbox files, while Python's mailbox.mbox consumes mboxo only. The main 
distinction between the two is precisely that mboxrd will convert 
">From " into ">>From " in an attempt to avoid corruption during
escape/unescape (it didn't end up fixing the problem 100% and mostly 
introduced incompatibilities like this one).
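
A minimal sketch of the distinction described above, assuming the
commonly documented mboxo and mboxrd quoting rules. The helper names are
made up for illustration; this is not the code used by b4, public-inbox,
or Python's mailbox module.

```python
import re

# Hypothetical helpers for illustration only.
# mboxo quotes only body lines that start with "From ".
_FROM_LINE = re.compile(r"^From ", re.MULTILINE)
# mboxrd also quotes lines that are already quoted (">From ", ">>From ", ...).
_MAYBE_QUOTED_FROM = re.compile(r"^(>*)From ", re.MULTILINE)
# Reversing mboxrd strips exactly one leading ">" from such lines.
_RD_ESCAPED = re.compile(r"^>(>*From )", re.MULTILINE)
# An mboxo-style reader can only safely undo a single ">From ".
_O_ESCAPED = re.compile(r"^>From ", re.MULTILINE)


def escape_mboxo(body: str) -> str:
    """Quote "From " lines the mboxo way (lossy for existing ">From ")."""
    return _FROM_LINE.sub(">From ", body)


def escape_mboxrd(body: str) -> str:
    """Quote "From " lines the mboxrd way (adds one ">" to every level)."""
    return _MAYBE_QUOTED_FROM.sub(r">\1From ", body)


def unescape_mboxrd(body: str) -> str:
    """Undo escape_mboxrd() by removing one ">" per quoted line."""
    return _RD_ESCAPED.sub(r"\1", body)


def unescape_mboxo(body: str) -> str:
    """Undo only the single-level ">From " quoting."""
    return _O_ESCAPED.sub("From ", body)


# A commit message that already contains a ">From" marker.
body = "text\n>From the commit log\nFrom here on\n"
stored = escape_mboxrd(body)          # what an mboxrd producer would emit
round_trip = unescape_mboxrd(stored)  # a matching reader restores the body
mismatched = unescape_mboxo(stored)   # an mboxo-style reader leaves ">>From"
```

In this sketch the mismatched reader keeps the doubled marker in the
commit message, which is the kind of corruption reported for
[patch 192/200].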

I have a fix in master/stable-0.6.y and I'll release a 0.6.2 before the 
end of the week.

Thanks for the report.

-K

* Re: incoming
  2020-12-15  3:25 ` incoming Linus Torvalds
@ 2020-12-15  3:30   ` Linus Torvalds
  2020-12-15 14:04     ` incoming Konstantin Ryabitsev
  0 siblings, 1 reply; 263+ messages in thread
From: Linus Torvalds @ 2020-12-15  3:30 UTC (permalink / raw)
  To: Andrew Morton, Konstantin Ryabitsev; +Cc: mm-commits, Linux-MM

On Mon, Dec 14, 2020 at 7:25 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> All the patches except for _one_ get a nice little green check-mark
> next to them when I use 'git am' on this series.
>
> The one that did not was [patch 192/200].
>
> I have no idea why

Hmm. It looks like that patch is the only one in the series with the
">From" marker in the commit message, from the silly "clarify that
this isn't the first line in a new message in mbox format".

And "b4 am" has turned the single ">" into two, making the stupid
marker worse, and actually corrupting the end result.

Coincidence? Or cause?

            Linus

* Re: incoming
  2020-12-15  3:02 incoming Andrew Morton
@ 2020-12-15  3:25 ` Linus Torvalds
  2020-12-15  3:30   ` incoming Linus Torvalds
  0 siblings, 1 reply; 263+ messages in thread
From: Linus Torvalds @ 2020-12-15  3:25 UTC (permalink / raw)
  To: Andrew Morton, Konstantin Ryabitsev; +Cc: mm-commits, Linux-MM

On Mon, Dec 14, 2020 at 7:02 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 200 patches, based on 2c85ebc57b3e1817b6ce1a6b703928e113a90442.

I haven't actually processed the patches yet, but I have a question
for Konstantin wrt b4.

All the patches except for _one_ get a nice little green check-mark
next to them when I use 'git am' on this series.

The one that did not was [patch 192/200].

I have no idea why - and it doesn't matter a lot to me, it just stood
out as being different. I'm assuming Andrew has started doing patch
attestation, and that patch failed. But if so, maybe Konstantin wants
to know what went wrong.

Konstantin?

            Linus

* incoming
@ 2020-12-15  3:02 Andrew Morton
  2020-12-15  3:25 ` incoming Linus Torvalds
  0 siblings, 1 reply; 263+ messages in thread
From: Andrew Morton @ 2020-12-15  3:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm


- a few random little subsystems

- almost all of the MM patches which are staged ahead of linux-next
  material.  I'll trickle the post-linux-next work in as the dependents
  get merged up.


200 patches, based on 2c85ebc57b3e1817b6ce1a6b703928e113a90442.

Subsystems affected by this patch series:

  kthread
  kbuild
  ide
  ntfs
  ocfs2
  arch
  mm/slab-generic
  mm/slab
  mm/slub
  mm/dax
  mm/debug
  mm/pagecache
  mm/gup
  mm/swap
  mm/shmem
  mm/memcg
  mm/pagemap
  mm/mremap
  mm/hmm
  mm/vmalloc
  mm/documentation
  mm/kasan
  mm/pagealloc
  mm/memory-failure
  mm/hugetlb
  mm/vmscan
  mm/z3fold
  mm/compaction
  mm/oom-kill
  mm/migration
  mm/cma
  mm/page-poison
  mm/userfaultfd
  mm/zswap
  mm/zsmalloc
  mm/uaccess
  mm/zram
  mm/cleanups

Subsystem: kthread

    Rob Clark <robdclark@chromium.org>:
      kthread: add kthread_work tracepoints

    Petr Mladek <pmladek@suse.com>:
      kthread_worker: document CPU hotplug handling

Subsystem: kbuild

    Petr Vorel <petr.vorel@gmail.com>:
      uapi: move constants from <linux/kernel.h> to <linux/const.h>

Subsystem: ide

    Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
      ide/falcon: remove in_interrupt() usage
      ide: remove BUG_ON(in_interrupt() || irqs_disabled()) from ide_unregister()

Subsystem: ntfs

    Alex Shi <alex.shi@linux.alibaba.com>:
      fs/ntfs: remove unused variables
      fs/ntfs: remove unused variable attr_len

Subsystem: ocfs2

    Tom Rix <trix@redhat.com>:
      fs/ocfs2/cluster/tcp.c: remove unneeded break

    Mauricio Faria de Oliveira <mfo@canonical.com>:
      ocfs2: ratelimit the 'max lookup times reached' notice

Subsystem: arch

    Colin Ian King <colin.king@canonical.com>:
      arch/Kconfig: fix spelling mistakes

Subsystem: mm/slab-generic

    Hui Su <sh_def@163.com>:
      mm/slab_common.c: use list_for_each_entry in dump_unreclaimable_slab()

    Bartosz Golaszewski <bgolaszewski@baylibre.com>:
    Patch series "slab: provide and use krealloc_array()", v3:
      mm: slab: clarify krealloc()'s behavior with __GFP_ZERO
      mm: slab: provide krealloc_array()
      ALSA: pcm: use krealloc_array()
      vhost: vringh: use krealloc_array()
      pinctrl: use krealloc_array()
      edac: ghes: use krealloc_array()
      drm: atomic: use krealloc_array()
      hwtracing: intel: use krealloc_array()
      dma-buf: use krealloc_array()

    Vlastimil Babka <vbabka@suse.cz>:
      mm, slab, slub: clear the slab_cache field when freeing page

Subsystem: mm/slab

    Alexander Popov <alex.popov@linux.com>:
      mm/slab: perform init_on_free earlier

Subsystem: mm/slub

    Vlastimil Babka <vbabka@suse.cz>:
      mm, slub: use kmem_cache_debug_flags() in deactivate_slab()

    Bharata B Rao <bharata@linux.ibm.com>:
      mm/slub: let number of online CPUs determine the slub page order

Subsystem: mm/dax

    Dan Williams <dan.j.williams@intel.com>:
      device-dax/kmem: use struct_size()

Subsystem: mm/debug

    Zhenhua Huang <zhenhuah@codeaurora.org>:
      mm: fix page_owner initializing issue for arm32

    Liam Mark <lmark@codeaurora.org>:
      mm/page_owner: record timestamp and pid

Subsystem: mm/pagecache

    Kent Overstreet <kent.overstreet@gmail.com>:
    Patch series "generic_file_buffered_read() improvements", v2:
      mm/filemap.c: break generic_file_buffered_read up into multiple functions
      mm/filemap.c: generic_file_buffered_read() now uses find_get_pages_contig

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/truncate: add parameter explanation for invalidate_mapping_pagevec

    Hailong Liu <carver4lio@163.com>:
      mm/filemap.c: remove else after a return

Subsystem: mm/gup

    John Hubbard <jhubbard@nvidia.com>:
    Patch series "selftests/vm: gup_test, hmm-tests, assorted improvements", v3:
      mm/gup_benchmark: rename to mm/gup_test
      selftests/vm: use a common gup_test.h
      selftests/vm: rename run_vmtests --> run_vmtests.sh
      selftests/vm: minor cleanup: Makefile and gup_test.c
      selftests/vm: only some gup_test items are really benchmarks
      selftests/vm: gup_test: introduce the dump_pages() sub-test
      selftests/vm: run_vmtests.sh: update and clean up gup_test invocation
      selftests/vm: hmm-tests: remove the libhugetlbfs dependency
      selftests/vm: 2x speedup for run_vmtests.sh

    Barry Song <song.bao.hua@hisilicon.com>:
      mm/gup_test.c: mark gup_test_init as __init function
      mm/gup_test: GUP_TEST depends on DEBUG_FS

    Jason Gunthorpe <jgg@nvidia.com>:
    Patch series "Add a seqcount between gup_fast and copy_page_range()", v4:
      mm/gup: reorganize internal_get_user_pages_fast()
      mm/gup: prevent gup_fast from racing with COW during fork
      mm/gup: remove the vma allocation from gup_longterm_locked()
      mm/gup: combine put_compound_head() and unpin_user_page()

Subsystem: mm/swap

    Ralph Campbell <rcampbell@nvidia.com>:
      mm: handle zone device pages in release_pages()

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/swapfile.c: use helper function swap_count() in add_swap_count_continuation()
      mm/swap_state: skip meaningless swap cache readahead when ra_info.win == 0
      mm/swapfile.c: remove unnecessary out label in __swap_duplicate()
      mm/swapfile.c: use memset to fill the swap_map with SWAP_HAS_CACHE

    Jeff Layton <jlayton@kernel.org>:
      mm: remove pagevec_lookup_range_nr_tag()

Subsystem: mm/shmem

    Hui Su <sh_def@163.com>:
      mm/shmem.c: make shmem_mapping() inline

    Randy Dunlap <rdunlap@infradead.org>:
      tmpfs: fix Documentation nits

Subsystem: mm/memcg

    Johannes Weiner <hannes@cmpxchg.org>:
      mm: memcontrol: add file_thp, shmem_thp to memory.stat

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcontrol: remove unused mod_memcg_obj_state()

    Miaohe Lin <linmiaohe@huawei.com>:
      mm: memcontrol: eliminate redundant check in __mem_cgroup_insert_exceeded()

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcg/slab: fix return of child memcg objcg for root memcg
      mm: memcg/slab: fix use after free in obj_cgroup_charge

    Shakeel Butt <shakeelb@google.com>:
      mm/rmap: always do TTU_IGNORE_ACCESS

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/memcg: update page struct member in comments

    Roman Gushchin <guro@fb.com>:
      mm: memcg: fix obsolete code comments
    Patch series "mm: memcg: deprecate cgroup v1 non-hierarchical mode", v1:
      mm: memcg: deprecate the non-hierarchical mode
      docs: cgroup-v1: reflect the deprecation of the non-hierarchical mode
      cgroup: remove obsoleted broken_hierarchy and warned_broken_hierarchy

    Hui Su <sh_def@163.com>:
      mm/page_counter: use page_counter_read in page_counter_set_max

    Lukas Bulwahn <lukas.bulwahn@gmail.com>:
      mm: memcg: remove obsolete memcg_has_children()

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcg/slab: rename *_lruvec_slab_state to *_lruvec_kmem_state

    Kaixu Xia <kaixuxia@tencent.com>:
      mm: memcontrol: assign boolean values to a bool variable

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/memcg: remove incorrect comment

    Shakeel Butt <shakeelb@google.com>:
    Patch series "memcg: add pagetable consumption to memory.stat", v2:
      mm: move lruvec stats update functions to vmstat.h
      mm: memcontrol: account pagetables per node

Subsystem: mm/pagemap

    Dan Williams <dan.j.williams@intel.com>:
      xen/unpopulated-alloc: consolidate pgmap manipulation

    Kalesh Singh <kaleshsingh@google.com>:
    Patch series "Speed up mremap on large regions", v4:
      kselftests: vm: add mremap tests
      mm: speedup mremap on 1GB or larger regions
      arm64: mremap speedup - enable HAVE_MOVE_PUD
      x86: mremap speedup - Enable HAVE_MOVE_PUD

    John Hubbard <jhubbard@nvidia.com>:
      mm: cleanup: remove unused tsk arg from __access_remote_vm

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/mapping_dirty_helpers: enhance the kernel-doc markups
      mm/page_vma_mapped.c: add colon to fix kernel-doc markups error for check_pte

    Axel Rasmussen <axelrasmussen@google.com>:
      mm: mmap_lock: add tracepoints around lock acquisition

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      sparc: fix handling of page table constructor failure
      mm: move free_unref_page to mm/internal.h

Subsystem: mm/mremap

    Dmitry Safonov <dima@arista.com>:
    Patch series "mremap: move_vma() fixes":
      mm/mremap: account memory on do_munmap() failure
      mm/mremap: for MREMAP_DONTUNMAP check security_vm_enough_memory_mm()
      mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio
      vm_ops: rename .split() callback to .may_split()
      mremap: check if it's possible to split original vma
      mm: forbid splitting special mappings

Subsystem: mm/hmm

    Daniel Vetter <daniel.vetter@ffwll.ch>:
      mm: track mmu notifiers in fs_reclaim_acquire/release
      mm: extract might_alloc() debug check
      locking/selftests: add testcases for fs_reclaim

Subsystem: mm/vmalloc

    Andrew Morton <akpm@linux-foundation.org>:
      mm/vmalloc.c:__vmalloc_area_node(): avoid 32-bit overflow

    "Uladzislau Rezki (Sony)" <urezki@gmail.com>:
      mm/vmalloc: use free_vm_area() if an allocation fails
      mm/vmalloc: rework the drain logic

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/vmalloc: add 'align' parameter explanation for pvm_determine_end_from_reverse

    Baolin Wang <baolin.wang@linux.alibaba.com>:
      mm/vmalloc.c: remove unnecessary return statement

    Waiman Long <longman@redhat.com>:
      mm/vmalloc: Fix unlock order in s_stop()

Subsystem: mm/documentation

    Alex Shi <alex.shi@linux.alibaba.com>:
      docs/vm: remove unused 3 items explanation for /proc/vmstat

Subsystem: mm/kasan

    Vincenzo Frascino <vincenzo.frascino@arm.com>:
      mm/vmalloc.c: fix kasan shadow poisoning size

    Walter Wu <walter-zh.wu@mediatek.com>:
    Patch series "kasan: add workqueue stack for generic KASAN", v5:
      workqueue: kasan: record workqueue stack
      kasan: print workqueue stack
      lib/test_kasan.c: add workqueue test case
      kasan: update documentation for generic kasan

    Marco Elver <elver@google.com>:
      lkdtm: disable KASAN for rodata.o

Subsystem: mm/pagealloc

    Mike Rapoport <rppt@linux.ibm.com>:
    Patch series "arch, mm: deprecate DISCONTIGMEM", v2:
      alpha: switch from DISCONTIGMEM to SPARSEMEM
      ia64: remove custom __early_pfn_to_nid()
      ia64: remove 'ifdef CONFIG_ZONE_DMA32' statements
      ia64: discontig: paging_init(): remove local max_pfn calculation
      ia64: split virtual map initialization out of paging_init()
      ia64: forbid using VIRTUAL_MEM_MAP with FLATMEM
      ia64: make SPARSEMEM default and disable DISCONTIGMEM
      arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
      arm, arm64: move free_unused_memmap() to generic mm
      arc: use FLATMEM with freeing of unused memory map instead of DISCONTIGMEM
      m68k/mm: make node data and node setup depend on CONFIG_DISCONTIGMEM
      m68k/mm: enable use of generic memory_model.h for !DISCONTIGMEM
      m68k: deprecate DISCONTIGMEM
    Patch series "arch, mm: improve robustness of direct map manipulation", v7:
      mm: introduce debug_pagealloc_{map,unmap}_pages() helpers
      PM: hibernate: make direct map manipulations more explicit
      arch, mm: restore dependency of __kernel_map_pages() on DEBUG_PAGEALLOC
      arch, mm: make kernel_page_present() always available

    Vlastimil Babka <vbabka@suse.cz>:
    Patch series "disable pcplists during memory offline", v3:
      mm, page_alloc: clean up pageset high and batch update
      mm, page_alloc: calculate pageset high and batch once per zone
      mm, page_alloc: remove setup_pageset()
      mm, page_alloc: simplify pageset_update()
      mm, page_alloc: cache pageset high and batch in struct zone
      mm, page_alloc: move draining pcplists to page isolation users
      mm, page_alloc: disable pcplists during memory offline

    Miaohe Lin <linmiaohe@huawei.com>:
      include/linux/page-flags.h: remove unused __[Set|Clear]PagePrivate

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/page-flags: fix comment
      mm/page_alloc: add __free_pages() documentation

    Zou Wei <zou_wei@huawei.com>:
      mm/page_alloc: mark some symbols with static keyword

    David Hildenbrand <david@redhat.com>:
      mm/page_alloc: clear all pages in post_alloc_hook() with init_on_alloc=1

    Lin Feng <linf@wangsu.com>:
      init/main: fix broken buffer_init when DEFERRED_STRUCT_PAGE_INIT set

    Lorenzo Stoakes <lstoakes@gmail.com>:
      mm: page_alloc: refactor setup_per_zone_lowmem_reserve()

    Muchun Song <songmuchun@bytedance.com>:
      mm/page_alloc: speed up the iteration of max_order

Subsystem: mm/memory-failure

    Oscar Salvador <osalvador@suse.de>:
    Patch series "HWpoison: further fixes and cleanups", v5:
      mm,hwpoison: drain pcplists before bailing out for non-buddy zero-refcount page
      mm,hwpoison: take free pages off the buddy freelists
      mm,hwpoison: drop unneeded pcplist draining
    Patch series "HWPoison: Refactor get page interface", v2:
      mm,hwpoison: refactor get_any_page
      mm,hwpoison: disable pcplists before grabbing a refcount
      mm,hwpoison: remove drain_all_pages from shake_page
      mm,memory_failure: always pin the page in madvise_inject_error
      mm,hwpoison: return -EBUSY when migration fails

Subsystem: mm/hugetlb

    Hui Su <sh_def@163.com>:
      mm/hugetlb.c: just use put_page_testzero() instead of page_count()

    Ralph Campbell <rcampbell@nvidia.com>:
      include/linux/huge_mm.h: remove extern keyword

    Alex Shi <alex.shi@linux.alibaba.com>:
      khugepaged: add parameter explanations for kernel-doc markup

    Liu Xiang <liu.xiang@zlingsmart.com>:
      mm: hugetlb: fix type of delta parameter and related local variables in gather_surplus_pages()

    Oscar Salvador <osalvador@suse.de>:
      mm,hugetlb: remove unneeded initialization

    Dan Carpenter <dan.carpenter@oracle.com>:
      hugetlb: fix an error code in hugetlb_reserve_pages()

Subsystem: mm/vmscan

    Johannes Weiner <hannes@cmpxchg.org>:
      mm: don't wake kswapd prematurely when watermark boosting is disabled

    Lukas Bulwahn <lukas.bulwahn@gmail.com>:
      mm/vmscan: drop unneeded assignment in kswapd()

    "logic.yu" <hymmsx.yu@gmail.com>:
      mm/vmscan.c: remove the filename in the top of file comment

    Muchun Song <songmuchun@bytedance.com>:
      mm/page_isolation: do not isolate the max order page

Subsystem: mm/z3fold

    Vitaly Wool <vitaly.wool@konsulko.com>:
    Patch series "z3fold: stability / rt fixes":
      z3fold: simplify freeing slots
      z3fold: stricter locking and more careful reclaim
      z3fold: remove preempt disabled sections for RT

Subsystem: mm/compaction

    Yanfei Xu <yanfei.xu@windriver.com>:
      mm/compaction: rename 'start_pfn' to 'iteration_start_pfn' in compact_zone()

    Hui Su <sh_def@163.com>:
      mm/compaction: move compaction_suitable's comment to right place
      mm/compaction: make defer_compaction and compaction_deferred static

Subsystem: mm/oom-kill

    Hui Su <sh_def@163.com>:
      mm/oom_kill: change comment and rename is_dump_unreclaim_slabs()

Subsystem: mm/migration

    Long Li <lonuxli.64@gmail.com>:
      mm/migrate.c: fix comment spelling

    Ralph Campbell <rcampbell@nvidia.com>:
      mm/migrate.c: optimize migrate_vma_pages() mmu notifier

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm: support THPs in zero_user_segments

    Yang Shi <shy828301@gmail.com>:
    Patch series "mm: misc migrate cleanup and improvement", v3:
      mm: truncate_complete_page() does not exist any more
      mm: migrate: simplify the logic for handling permanent failure
      mm: migrate: skip shared exec THP for NUMA balancing
      mm: migrate: clean up migrate_prep{_local}
      mm: migrate: return -ENOSYS if THP migration is unsupported

    Stephen Zhang <starzhangzsd@gmail.com>:
      mm: migrate: remove unused parameter in migrate_vma_insert_page()

Subsystem: mm/cma

    Lecopzer Chen <lecopzer.chen@mediatek.com>:
      mm/cma.c: remove redundant cma_mutex lock

    Charan Teja Reddy <charante@codeaurora.org>:
      mm: cma: improve pr_debug log in cma_release()

Subsystem: mm/page-poison

    Vlastimil Babka <vbabka@suse.cz>:
    Patch series "cleanup page poisoning", v3:
      mm, page_alloc: do not rely on the order of page_poison and init_on_alloc/free parameters
      mm, page_poison: use static key more efficiently
      kernel/power: allow hibernation with page_poison sanity checking
      mm, page_poison: remove CONFIG_PAGE_POISONING_NO_SANITY
      mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO

Subsystem: mm/userfaultfd

    Lokesh Gidra <lokeshgidra@google.com>:
    Patch series "Control over userfaultfd kernel-fault handling", v6:
      userfaultfd: add UFFD_USER_MODE_ONLY
      userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob

    Axel Rasmussen <axelrasmussen@google.com>:
      userfaultfd: selftests: make __{s,u}64 format specifiers portable

    Peter Xu <peterx@redhat.com>:
    Patch series "userfaultfd: selftests: Small fixes":
      userfaultfd/selftests: always dump something in modes
      userfaultfd/selftests: fix retval check for userfaultfd_open()
      userfaultfd/selftests: hint the test runner on required privilege

Subsystem: mm/zswap

    Joe Perches <joe@perches.com>:
      mm/zswap: make struct kernel_param_ops definitions const

    YueHaibing <yuehaibing@huawei.com>:
      mm/zswap: fix passing zero to 'PTR_ERR' warning

    Barry Song <song.bao.hua@hisilicon.com>:
      mm/zswap: move to use crypto_acomp API for hardware acceleration

Subsystem: mm/zsmalloc

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/zsmalloc.c: rework the list_add code in insert_zspage()

Subsystem: mm/uaccess

    Colin Ian King <colin.king@canonical.com>:
      mm/process_vm_access: remove redundant initialization of iov_r

Subsystem: mm/zram

    Minchan Kim <minchan@kernel.org>:
      zram: support page writeback
      zram: add stat to gather incompressible pages since zram set up

    Rui Salvaterra <rsalvaterra@gmail.com>:
      zram: break the strict dependency from lzo

Subsystem: mm/cleanups

    Mauro Carvalho Chehab <mchehab+huawei@kernel.org>:
      mm: fix kernel-doc markups

    Joe Perches <joe@perches.com>:
    Patch series "mm: Convert sysfs sprintf family to sysfs_emit", v2:
      mm: use sysfs_emit for struct kobject * uses
      mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
      mm:backing-dev: use sysfs_emit in macro defining functions
      mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
      mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at

    "Gustavo A. R. Silva" <gustavoars@kernel.org>:
      mm: fix fall-through warnings for Clang

    Alexey Dobriyan <adobriyan@gmail.com>:
      mm: cleanup kstrto*() usage

 /mmap_lock.h                                         |  107 ++
 a/Documentation/admin-guide/blockdev/zram.rst        |    6 
 a/Documentation/admin-guide/cgroup-v1/memcg_test.rst |    8 
 a/Documentation/admin-guide/cgroup-v1/memory.rst     |   42 
 a/Documentation/admin-guide/cgroup-v2.rst            |   11 
 a/Documentation/admin-guide/mm/transhuge.rst         |   15 
 a/Documentation/admin-guide/sysctl/vm.rst            |   15 
 a/Documentation/core-api/memory-allocation.rst       |    4 
 a/Documentation/core-api/pin_user_pages.rst          |    8 
 a/Documentation/dev-tools/kasan.rst                  |    5 
 a/Documentation/filesystems/tmpfs.rst                |    8 
 a/Documentation/vm/memory-model.rst                  |    3 
 a/Documentation/vm/page_owner.rst                    |   12 
 a/arch/Kconfig                                       |   21 
 a/arch/alpha/Kconfig                                 |    8 
 a/arch/alpha/include/asm/mmzone.h                    |   14 
 a/arch/alpha/include/asm/page.h                      |    7 
 a/arch/alpha/include/asm/pgtable.h                   |   12 
 a/arch/alpha/include/asm/sparsemem.h                 |   18 
 a/arch/alpha/kernel/setup.c                          |    1 
 a/arch/arc/Kconfig                                   |    3 
 a/arch/arc/include/asm/page.h                        |   20 
 a/arch/arc/mm/init.c                                 |   29 
 a/arch/arm/Kconfig                                   |   12 
 a/arch/arm/kernel/vdso.c                             |    9 
 a/arch/arm/mach-bcm/Kconfig                          |    1 
 a/arch/arm/mach-davinci/Kconfig                      |    1 
 a/arch/arm/mach-exynos/Kconfig                       |    1 
 a/arch/arm/mach-highbank/Kconfig                     |    1 
 a/arch/arm/mach-omap2/Kconfig                        |    1 
 a/arch/arm/mach-s5pv210/Kconfig                      |    1 
 a/arch/arm/mach-tango/Kconfig                        |    1 
 a/arch/arm/mm/init.c                                 |   78 -
 a/arch/arm64/Kconfig                                 |    9 
 a/arch/arm64/include/asm/cacheflush.h                |    1 
 a/arch/arm64/include/asm/pgtable.h                   |    1 
 a/arch/arm64/kernel/vdso.c                           |   41 
 a/arch/arm64/mm/init.c                               |   68 -
 a/arch/arm64/mm/pageattr.c                           |   12 
 a/arch/ia64/Kconfig                                  |   11 
 a/arch/ia64/include/asm/meminit.h                    |    2 
 a/arch/ia64/mm/contig.c                              |   88 --
 a/arch/ia64/mm/discontig.c                           |   44 -
 a/arch/ia64/mm/init.c                                |   14 
 a/arch/ia64/mm/numa.c                                |   30 
 a/arch/m68k/Kconfig.cpu                              |   31 
 a/arch/m68k/include/asm/page.h                       |    2 
 a/arch/m68k/include/asm/page_mm.h                    |    7 
 a/arch/m68k/include/asm/virtconvert.h                |    7 
 a/arch/m68k/mm/init.c                                |   10 
 a/arch/mips/vdso/genvdso.c                           |    4 
 a/arch/nds32/mm/mm-nds32.c                           |    6 
 a/arch/powerpc/Kconfig                               |    5 
 a/arch/riscv/Kconfig                                 |    4 
 a/arch/riscv/include/asm/pgtable.h                   |    2 
 a/arch/riscv/include/asm/set_memory.h                |    1 
 a/arch/riscv/mm/pageattr.c                           |   31 
 a/arch/s390/Kconfig                                  |    4 
 a/arch/s390/configs/debug_defconfig                  |    2 
 a/arch/s390/configs/defconfig                        |    2 
 a/arch/s390/kernel/vdso.c                            |   11 
 a/arch/sparc/Kconfig                                 |    4 
 a/arch/sparc/mm/init_64.c                            |    2 
 a/arch/x86/Kconfig                                   |    5 
 a/arch/x86/entry/vdso/vma.c                          |   17 
 a/arch/x86/include/asm/set_memory.h                  |    1 
 a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c          |    2 
 a/arch/x86/kernel/tboot.c                            |    1 
 a/arch/x86/mm/pat/set_memory.c                       |    6 
 a/drivers/base/node.c                                |    2 
 a/drivers/block/zram/Kconfig                         |   42 
 a/drivers/block/zram/zcomp.c                         |    2 
 a/drivers/block/zram/zram_drv.c                      |   29 
 a/drivers/block/zram/zram_drv.h                      |    1 
 a/drivers/dax/device.c                               |    4 
 a/drivers/dax/kmem.c                                 |    2 
 a/drivers/dma-buf/sync_file.c                        |    3 
 a/drivers/edac/ghes_edac.c                           |    4 
 a/drivers/firmware/efi/efi.c                         |    1 
 a/drivers/gpu/drm/drm_atomic.c                       |    3 
 a/drivers/hwtracing/intel_th/msu.c                   |    2 
 a/drivers/ide/falconide.c                            |    2 
 a/drivers/ide/ide-probe.c                            |    3 
 a/drivers/misc/lkdtm/Makefile                        |    1 
 a/drivers/pinctrl/pinctrl-utils.c                    |    2 
 a/drivers/vhost/vringh.c                             |    3 
 a/drivers/virtio/virtio_balloon.c                    |    6 
 a/drivers/xen/unpopulated-alloc.c                    |   14 
 a/fs/aio.c                                           |    5 
 a/fs/ntfs/file.c                                     |    5 
 a/fs/ntfs/inode.c                                    |    2 
 a/fs/ntfs/logfile.c                                  |    3 
 a/fs/ocfs2/cluster/tcp.c                             |    1 
 a/fs/ocfs2/namei.c                                   |    4 
 a/fs/proc/kcore.c                                    |    2 
 a/fs/proc/meminfo.c                                  |    2 
 a/fs/userfaultfd.c                                   |   20 
 a/include/linux/cgroup-defs.h                        |   15 
 a/include/linux/compaction.h                         |   12 
 a/include/linux/fs.h                                 |    2 
 a/include/linux/gfp.h                                |    2 
 a/include/linux/highmem.h                            |   19 
 a/include/linux/huge_mm.h                            |   93 --
 a/include/linux/memcontrol.h                         |  148 ---
 a/include/linux/migrate.h                            |    4 
 a/include/linux/mm.h                                 |  118 +-
 a/include/linux/mm_types.h                           |    8 
 a/include/linux/mmap_lock.h                          |   94 ++
 a/include/linux/mmzone.h                             |   50 -
 a/include/linux/page-flags.h                         |    6 
 a/include/linux/page_ext.h                           |    8 
 a/include/linux/pagevec.h                            |    3 
 a/include/linux/poison.h                             |    4 
 a/include/linux/rmap.h                               |    1 
 a/include/linux/sched/mm.h                           |   16 
 a/include/linux/set_memory.h                         |    5 
 a/include/linux/shmem_fs.h                           |    6 
 a/include/linux/slab.h                               |   18 
 a/include/linux/vmalloc.h                            |    8 
 a/include/linux/vmstat.h                             |  104 ++
 a/include/trace/events/sched.h                       |   84 +
 a/include/uapi/linux/const.h                         |    5 
 a/include/uapi/linux/ethtool.h                       |    2 
 a/include/uapi/linux/kernel.h                        |    9 
 a/include/uapi/linux/lightnvm.h                      |    2 
 a/include/uapi/linux/mroute6.h                       |    2 
 a/include/uapi/linux/netfilter/x_tables.h            |    2 
 a/include/uapi/linux/netlink.h                       |    2 
 a/include/uapi/linux/sysctl.h                        |    2 
 a/include/uapi/linux/userfaultfd.h                   |    9 
 a/init/main.c                                        |    6 
 a/ipc/shm.c                                          |    8 
 a/kernel/cgroup/cgroup.c                             |   12 
 a/kernel/fork.c                                      |    3 
 a/kernel/kthread.c                                   |   29 
 a/kernel/power/hibernate.c                           |    2 
 a/kernel/power/power.h                               |    2 
 a/kernel/power/snapshot.c                            |   52 +
 a/kernel/ptrace.c                                    |    2 
 a/kernel/workqueue.c                                 |    3 
 a/lib/locking-selftest.c                             |   47 +
 a/lib/test_kasan_module.c                            |   29 
 a/mm/Kconfig                                         |   25 
 a/mm/Kconfig.debug                                   |   28 
 a/mm/Makefile                                        |    4 
 a/mm/backing-dev.c                                   |    8 
 a/mm/cma.c                                           |    6 
 a/mm/compaction.c                                    |   29 
 a/mm/filemap.c                                       |  823 ++++++++++---------
 a/mm/gup.c                                           |  329 ++-----
 a/mm/gup_benchmark.c                                 |  210 ----
 a/mm/gup_test.c                                      |  299 ++++++
 a/mm/gup_test.h                                      |   40 
 a/mm/highmem.c                                       |   52 +
 a/mm/huge_memory.c                                   |   86 +
 a/mm/hugetlb.c                                       |   28 
 a/mm/init-mm.c                                       |    1 
 a/mm/internal.h                                      |    5 
 a/mm/kasan/generic.c                                 |    3 
 a/mm/kasan/report.c                                  |    4 
 a/mm/khugepaged.c                                    |   58 -
 a/mm/ksm.c                                           |   50 -
 a/mm/madvise.c                                       |   14 
 a/mm/mapping_dirty_helpers.c                         |    6 
 a/mm/memblock.c                                      |   80 +
 a/mm/memcontrol.c                                    |  170 +--
 a/mm/memory-failure.c                                |  322 +++----
 a/mm/memory.c                                        |   24 
 a/mm/memory_hotplug.c                                |   44 -
 a/mm/mempolicy.c                                     |    8 
 a/mm/migrate.c                                       |  183 ++--
 a/mm/mm_init.c                                       |    1 
 a/mm/mmap.c                                          |   22 
 a/mm/mmap_lock.c                                     |  230 +++++
 a/mm/mmu_notifier.c                                  |    7 
 a/mm/mmzone.c                                        |   14 
 a/mm/mremap.c                                        |  282 ++++--
 a/mm/nommu.c                                         |    8 
 a/mm/oom_kill.c                                      |   14 
 a/mm/page_alloc.c                                    |  517 ++++++-----
 a/mm/page_counter.c                                  |    4 
 a/mm/page_ext.c                                      |   10 
 a/mm/page_isolation.c                                |   18 
 a/mm/page_owner.c                                    |   17 
 a/mm/page_poison.c                                   |   56 -
 a/mm/page_vma_mapped.c                               |    9 
 a/mm/process_vm_access.c                             |    2 
 a/mm/rmap.c                                          |    9 
 a/mm/shmem.c                                         |   39 
 a/mm/slab.c                                          |   10 
 a/mm/slab.h                                          |    9 
 a/mm/slab_common.c                                   |   10 
 a/mm/slob.c                                          |    6 
 a/mm/slub.c                                          |  156 +--
 a/mm/swap.c                                          |   12 
 a/mm/swap_state.c                                    |    7 
 a/mm/swapfile.c                                      |   14 
 a/mm/truncate.c                                      |   18 
 a/mm/vmalloc.c                                       |  105 +-
 a/mm/vmscan.c                                        |   21 
 a/mm/vmstat.c                                        |    6 
 a/mm/workingset.c                                    |    8 
 a/mm/z3fold.c                                        |  215 ++--
 a/mm/zsmalloc.c                                      |   11 
 a/mm/zswap.c                                         |  193 +++-
 a/sound/core/pcm_lib.c                               |    4 
 a/tools/include/linux/poison.h                       |    6 
 a/tools/testing/selftests/vm/.gitignore              |    4 
 a/tools/testing/selftests/vm/Makefile                |   41 
 a/tools/testing/selftests/vm/check_config.sh         |   31 
 a/tools/testing/selftests/vm/config                  |    2 
 a/tools/testing/selftests/vm/gup_benchmark.c         |  143 ---
 a/tools/testing/selftests/vm/gup_test.c              |  258 +++++
 a/tools/testing/selftests/vm/hmm-tests.c             |   10 
 a/tools/testing/selftests/vm/mremap_test.c           |  344 +++++++
 a/tools/testing/selftests/vm/run_vmtests             |   51 -
 a/tools/testing/selftests/vm/userfaultfd.c           |   94 --
 217 files changed, 4817 insertions(+), 3369 deletions(-)


* incoming
@ 2020-12-11 21:35 Andrew Morton
From: Andrew Morton @ 2020-12-11 21:35 UTC
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

8 patches, based on 33dc9614dc208291d0c4bcdeb5d30d481dcd2c4c.

Subsystems affected by this patch series:

  mm/pagecache
  proc
  selftests
  kbuild
  mm/kasan
  mm/hugetlb

Subsystem: mm/pagecache

    Andrew Morton <akpm@linux-foundation.org>:
      revert "mm/filemap: add static for function __add_to_page_cache_locked"

Subsystem: proc

    Miles Chen <miles.chen@mediatek.com>:
      proc: use untagged_addr() for pagemap_read addresses

Subsystem: selftests

    Arnd Bergmann <arnd@arndb.de>:
      selftest/fpu: avoid clang warning

Subsystem: kbuild

    Arnd Bergmann <arnd@arndb.de>:
      kbuild: avoid static_assert for genksyms
      initramfs: fix clang build failure
      elfcore: fix building with clang

Subsystem: mm/kasan

    Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>:
      kasan: fix object remaining in offline per-cpu quarantine

Subsystem: mm/hugetlb

    Gerald Schaefer <gerald.schaefer@linux.ibm.com>:
      mm/hugetlb: clear compound_nr before freeing gigantic pages

 fs/proc/task_mmu.c        |    8 ++++++--
 include/linux/build_bug.h |    5 +++++
 include/linux/elfcore.h   |   22 ++++++++++++++++++++++
 init/initramfs.c          |    2 +-
 kernel/Makefile           |    1 -
 kernel/elfcore.c          |   26 --------------------------
 lib/Makefile              |    3 ++-
 mm/filemap.c              |    2 +-
 mm/hugetlb.c              |    1 +
 mm/kasan/quarantine.c     |   39 +++++++++++++++++++++++++++++++++++++++
 10 files changed, 77 insertions(+), 32 deletions(-)


* incoming
@ 2020-12-06  6:14 Andrew Morton
From: Andrew Morton @ 2020-12-06  6:14 UTC
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

12 patches, based on 33256ce194110874d4bc90078b577c59f9076c59.

Subsystems affected by this patch series:

  lib
  coredump
  mm/memcg
  mm/zsmalloc
  mm/swap
  mailmap
  mm/selftests
  mm/pagecache
  mm/hugetlb
  mm/pagemap

Subsystem: lib

    Randy Dunlap <rdunlap@infradead.org>:
      zlib: export S390 symbols for zlib modules

Subsystem: coredump

    Menglong Dong <dong.menglong@zte.com.cn>:
      coredump: fix core_pattern parse error

Subsystem: mm/memcg

    Roman Gushchin <guro@fb.com>:
      mm: memcg/slab: fix obj_cgroup_charge() return value handling

    Yang Shi <shy828301@gmail.com>:
      mm: list_lru: set shrinker map bit when child nr_items is not zero

Subsystem: mm/zsmalloc

    Minchan Kim <minchan@kernel.org>:
      mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING

Subsystem: mm/swap

    Qian Cai <qcai@redhat.com>:
      mm/swapfile: do not sleep with a spin lock held

Subsystem: mailmap

    Uwe Kleine-König <u.kleine-koenig@pengutronix.de>:
      mailmap: add two more addresses of Uwe Kleine-König

Subsystem: mm/selftests

    Xingxing Su <suxingxing@loongson.cn>:
      tools/testing/selftests/vm: fix build error

    Axel Rasmussen <axelrasmussen@google.com>:
      userfaultfd: selftests: fix SIGSEGV if huge mmap fails

Subsystem: mm/pagecache

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/filemap: add static for function __add_to_page_cache_locked

Subsystem: mm/hugetlb

    Mike Kravetz <mike.kravetz@oracle.com>:
      hugetlb_cgroup: fix offline of hugetlb cgroup with reservations

Subsystem: mm/pagemap

    Liu Zixian <liuzixian4@huawei.com>:
      mm/mmap.c: fix mmap return value when vma is merged after call_mmap()

 .mailmap                                 |    2 +
 arch/arm/configs/omap2plus_defconfig     |    1 
 fs/coredump.c                            |    3 +
 include/linux/zsmalloc.h                 |    1 
 lib/zlib_dfltcc/dfltcc_inflate.c         |    3 +
 mm/Kconfig                               |   13 -------
 mm/filemap.c                             |    2 -
 mm/hugetlb_cgroup.c                      |    8 +---
 mm/list_lru.c                            |   10 ++---
 mm/mmap.c                                |   26 ++++++--------
 mm/slab.h                                |   40 +++++++++++++---------
 mm/swapfile.c                            |    4 +-
 mm/zsmalloc.c                            |   54 -------------------------------
 tools/testing/selftests/vm/Makefile      |    4 ++
 tools/testing/selftests/vm/userfaultfd.c |   25 +++++++++-----
 15 files changed, 75 insertions(+), 121 deletions(-)


* incoming
@ 2020-11-22  6:16 Andrew Morton
From: Andrew Morton @ 2020-11-22  6:16 UTC
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

8 patches, based on a349e4c659609fd20e4beea89e5c4a4038e33a95.

Subsystems affected by this patch series:

  mm/madvise
  kbuild
  mm/pagemap
  mm/readahead
  mm/memcg
  mm/userfaultfd
  vfs-akpm
  mm/madvise

Subsystem: mm/madvise

    Eric Dumazet <edumazet@google.com>:
      mm/madvise: fix memory leak from process_madvise

Subsystem: kbuild

    Nick Desaulniers <ndesaulniers@google.com>:
      compiler-clang: remove version check for BPF Tracing

Subsystem: mm/pagemap

    Dan Williams <dan.j.williams@intel.com>:
      mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports

Subsystem: mm/readahead

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm: fix readahead_page_batch for retry entries

Subsystem: mm/memcg

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcg/slab: fix root memcg vmstats

Subsystem: mm/userfaultfd

    Gerald Schaefer <gerald.schaefer@linux.ibm.com>:
      mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()

Subsystem: vfs-akpm

    Yicong Yang <yangyicong@hisilicon.com>:
      libfs: fix error cast of negative value in simple_attr_write()

Subsystem: mm/madvise

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm: fix madvise WILLNEED performance problem

 arch/ia64/include/asm/sparsemem.h    |    6 ++++++
 arch/powerpc/include/asm/mmzone.h    |    5 +++++
 arch/powerpc/include/asm/sparsemem.h |    5 ++---
 arch/powerpc/mm/mem.c                |    1 +
 arch/x86/include/asm/sparsemem.h     |   10 ++++++++++
 arch/x86/mm/numa.c                   |    2 ++
 drivers/dax/Kconfig                  |    1 -
 fs/libfs.c                           |    6 ++++--
 include/linux/compiler-clang.h       |    2 ++
 include/linux/memory_hotplug.h       |   14 --------------
 include/linux/numa.h                 |   30 +++++++++++++++++++++++++++++-
 include/linux/pagemap.h              |    2 ++
 mm/huge_memory.c                     |    9 ++++-----
 mm/madvise.c                         |    4 +---
 mm/memcontrol.c                      |    9 +++++++--
 mm/memory_hotplug.c                  |   18 ------------------
 16 files changed, 75 insertions(+), 49 deletions(-)


* incoming
@ 2020-11-14  6:51 Andrew Morton
From: Andrew Morton @ 2020-11-14  6:51 UTC
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

14 patches, based on 9e6a39eae450b81c8b2c8cbbfbdf8218e9b40c81.

Subsystems affected by this patch series:

  mm/migration
  mm/vmscan
  mailmap
  mm/slub
  mm/gup
  kbuild
  reboot
  kernel/watchdog
  mm/memcg
  mm/hugetlbfs
  panic
  ocfs2

Subsystem: mm/migration

    Zi Yan <ziy@nvidia.com>:
      mm/compaction: count pages and stop correctly during page isolation
      mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate

Subsystem: mm/vmscan

    Nicholas Piggin <npiggin@gmail.com>:
      mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit

Subsystem: mailmap

    Dmitry Baryshkov <dbaryshkov@gmail.com>:
      mailmap: fix entry for Dmitry Baryshkov/Eremin-Solenikov

Subsystem: mm/slub

    Laurent Dufour <ldufour@linux.ibm.com>:
      mm/slub: fix panic in slab_alloc_node()

Subsystem: mm/gup

    Jason Gunthorpe <jgg@nvidia.com>:
      mm/gup: use unpin_user_pages() in __gup_longterm_locked()

Subsystem: kbuild

    Arvind Sankar <nivedita@alum.mit.edu>:
      compiler.h: fix barrier_data() on clang

Subsystem: reboot

    Matteo Croce <mcroce@microsoft.com>:
    Patch series "fix parsing of reboot= cmdline", v3:
      Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
      reboot: fix overflow parsing reboot cpu number

Subsystem: kernel/watchdog

    Santosh Sivaraj <santosh@fossix.org>:
      kernel/watchdog: fix watchdog_allowed_mask not used warning

Subsystem: mm/memcg

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcontrol: fix missing wakeup polling thread

Subsystem: mm/hugetlbfs

    Mike Kravetz <mike.kravetz@oracle.com>:
      hugetlbfs: fix anon huge page migration race

Subsystem: panic

    Christophe Leroy <christophe.leroy@csgroup.eu>:
      panic: don't dump stack twice on warn

Subsystem: ocfs2

    Wengang Wang <wen.gang.wang@oracle.com>:
      ocfs2: initialize ip_next_orphan

 .mailmap                       |    5 +-
 fs/ocfs2/super.c               |    1 
 include/asm-generic/barrier.h  |    1 
 include/linux/compiler-clang.h |    6 --
 include/linux/compiler-gcc.h   |   19 --------
 include/linux/compiler.h       |   18 +++++++-
 include/linux/memcontrol.h     |   11 ++++-
 kernel/panic.c                 |    3 -
 kernel/reboot.c                |   28 ++++++------
 kernel/watchdog.c              |    4 -
 mm/compaction.c                |   12 +++--
 mm/gup.c                       |   14 ++++--
 mm/hugetlb.c                   |   90 ++---------------------------------------
 mm/memory-failure.c            |   36 +++++++---------
 mm/migrate.c                   |   46 +++++++++++---------
 mm/rmap.c                      |    5 --
 mm/slub.c                      |    2 
 mm/vmscan.c                    |    5 +-
 18 files changed, 119 insertions(+), 187 deletions(-)


* incoming
@ 2020-11-02  1:06 Andrew Morton
From: Andrew Morton @ 2020-11-02  1:06 UTC
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

15 patches, based on 3cea11cd5e3b00d91caf0b4730194039b45c5891.

Subsystems affected by this patch series:

  mm/memremap
  mm/memcg
  mm/slab-generic
  mm/kasan
  mm/mempolicy
  signals
  lib
  mm/pagecache
  kthread
  mm/oom-kill
  mm/pagemap
  epoll
  core-kernel

Subsystem: mm/memremap

    Ralph Campbell <rcampbell@nvidia.com>:
      mm/mremap_pages: fix static key devmap_managed_key updates

Subsystem: mm/memcg

    Mike Kravetz <mike.kravetz@oracle.com>:
      hugetlb_cgroup: fix reservation accounting

    zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>:
      mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg

    Roman Gushchin <guro@fb.com>:
      mm: memcg: link page counters to root if use_hierarchy is false

Subsystem: mm/slab-generic

Subsystem: mm/kasan

    Andrey Konovalov <andreyknvl@google.com>:
      kasan: adopt KUNIT tests to SW_TAGS mode

Subsystem: mm/mempolicy

    Shijie Luo <luoshijie1@huawei.com>:
      mm: mempolicy: fix potential pte_unmap_unlock pte error

Subsystem: signals

    Oleg Nesterov <oleg@redhat.com>:
      ptrace: fix task_join_group_stop() for the case when current is traced

Subsystem: lib

    Vasily Gorbik <gor@linux.ibm.com>:
      lib/crc32test: remove extra local_irq_disable/enable

Subsystem: mm/pagecache

    Jason Yan <yanaijie@huawei.com>:
      mm/truncate.c: make __invalidate_mapping_pages() static

Subsystem: kthread

    Zqiang <qiang.zhang@windriver.com>:
      kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled

Subsystem: mm/oom-kill

    Charles Haithcock <chaithco@redhat.com>:
      mm, oom: keep oom_adj under or at upper limit when printing

Subsystem: mm/pagemap

    Jason Gunthorpe <jgg@nvidia.com>:
      mm: always have io_remap_pfn_range() set pgprot_decrypted()

Subsystem: epoll

    Soheil Hassas Yeganeh <soheil@google.com>:
      epoll: check ep_events_available() upon timeout
      epoll: add a selftest for epoll timeout race

Subsystem: core-kernel

    Lukas Bulwahn <lukas.bulwahn@gmail.com>:
      kernel/hung_task.c: make type annotations consistent

 fs/eventpoll.c                                                |   16 +
 fs/proc/base.c                                                |    2 
 include/linux/mm.h                                            |    9 
 include/linux/pgtable.h                                       |    4 
 kernel/hung_task.c                                            |    3 
 kernel/kthread.c                                              |    3 
 kernel/signal.c                                               |   19 -
 lib/crc32test.c                                               |    4 
 lib/test_kasan.c                                              |  149 +++++++---
 mm/hugetlb.c                                                  |   20 -
 mm/memcontrol.c                                               |   25 +
 mm/mempolicy.c                                                |    6 
 mm/memremap.c                                                 |   39 +-
 mm/truncate.c                                                 |    2 
 tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c |   95 ++++++
 15 files changed, 290 insertions(+), 106 deletions(-)



* incoming
@ 2020-10-17 23:13 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-10-17 23:13 UTC
  To: Linus Torvalds; +Cc: mm-commits, linux-mm


40 patches, based on 9d9af1007bc08971953ae915d88dc9bb21344b53.

Subsystems affected by this patch series:

  ia64
  mm/memcg
  mm/migration
  mm/pagemap
  mm/gup
  mm/madvise
  mm/vmalloc
  misc

Subsystem: ia64

    Krzysztof Kozlowski <krzk@kernel.org>:
      ia64: fix build error with !COREDUMP

Subsystem: mm/memcg

    Roman Gushchin <guro@fb.com>:
      mm, memcg: rework remote charging API to support nesting
    Patch series "mm: kmem: kernel memory accounting in an interrupt context":
      mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current()
      mm: kmem: remove redundant checks from get_obj_cgroup_from_current()
      mm: kmem: prepare remote memcg charging infra for interrupt contexts
      mm: kmem: enable kernel memcg accounting from interrupt contexts

Subsystem: mm/migration

    Joonsoo Kim <iamjoonsoo.kim@lge.com>:
      mm/memory-failure: remove a wrapper for alloc_migration_target()
      mm/memory_hotplug: remove a wrapper for alloc_migration_target()

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/migrate: avoid possible unnecessary process right check in kernel_move_pages()

Subsystem: mm/pagemap

    "Liam R. Howlett" <Liam.Howlett@Oracle.com>:
      mm/mmap: add inline vma_next() for readability of mmap code
      mm/mmap: add inline munmap_vma_range() for code readability

Subsystem: mm/gup

    Jann Horn <jannh@google.com>:
      mm/gup_benchmark: take the mmap lock around GUP
      binfmt_elf: take the mmap lock around find_extend_vma()
      mm/gup: assert that the mmap lock is held in __get_user_pages()

    John Hubbard <jhubbard@nvidia.com>:
    Patch series "selftests/vm: gup_test, hmm-tests, assorted improvements", v2:
      mm/gup_benchmark: rename to mm/gup_test
      selftests/vm: use a common gup_test.h
      selftests/vm: rename run_vmtests --> run_vmtests.sh
      selftests/vm: minor cleanup: Makefile and gup_test.c
      selftests/vm: only some gup_test items are really benchmarks
      selftests/vm: gup_test: introduce the dump_pages() sub-test
      selftests/vm: run_vmtests.sh: update and clean up gup_test invocation
      selftests/vm: hmm-tests: remove the libhugetlbfs dependency
      selftests/vm: 10x speedup for hmm-tests

Subsystem: mm/madvise

    Minchan Kim <minchan@kernel.org>:
    Patch series "introduce memory hinting API for external process", v9:
      mm/madvise: pass mm to do_madvise
      pid: move pidfd_get_pid() to pid.c
      mm/madvise: introduce process_madvise() syscall: an external memory hinting API

Subsystem: mm/vmalloc

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "remove alloc_vm_area", v4:
      mm: update the documentation for vfree

    Christoph Hellwig <hch@lst.de>:
      mm: add a VM_MAP_PUT_PAGES flag for vmap
      mm: add a vmap_pfn function
      mm: allow a NULL fn callback in apply_to_page_range
      zsmalloc: switch from alloc_vm_area to get_vm_area
      drm/i915: use vmap in shmem_pin_map
      drm/i915: stop using kmap in i915_gem_object_map
      drm/i915: use vmap in i915_gem_object_map
      xen/xenbus: use apply_to_page_range directly in xenbus_map_ring_pv
      x86/xen: open code alloc_vm_area in arch_gnttab_valloc
      mm: remove alloc_vm_area
    Patch series "two small vmalloc cleanups":
      mm: cleanup the gfp_mask handling in __vmalloc_area_node
      mm: remove the filename in the top of file comment in vmalloc.c

Subsystem: misc

    Tian Tao <tiantao6@hisilicon.com>:
      mm: remove duplicate include statement in mmu.c

 Documentation/core-api/pin_user_pages.rst   |    8 
 arch/alpha/kernel/syscalls/syscall.tbl      |    1 
 arch/arm/mm/mmu.c                           |    1 
 arch/arm/tools/syscall.tbl                  |    1 
 arch/arm64/include/asm/unistd.h             |    2 
 arch/arm64/include/asm/unistd32.h           |    2 
 arch/ia64/kernel/Makefile                   |    2 
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 
 arch/s390/configs/debug_defconfig           |    2 
 arch/s390/configs/defconfig                 |    2 
 arch/s390/kernel/syscalls/syscall.tbl       |    1 
 arch/sh/kernel/syscalls/syscall.tbl         |    1 
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 
 arch/x86/xen/grant-table.c                  |   27 +-
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 
 drivers/gpu/drm/i915/Kconfig                |    1 
 drivers/gpu/drm/i915/gem/i915_gem_pages.c   |  136 ++++------
 drivers/gpu/drm/i915/gt/shmem_utils.c       |   78 +-----
 drivers/xen/xenbus/xenbus_client.c          |   30 +-
 fs/binfmt_elf.c                             |    3 
 fs/buffer.c                                 |    6 
 fs/io_uring.c                               |    2 
 fs/notify/fanotify/fanotify.c               |    5 
 fs/notify/inotify/inotify_fsnotify.c        |    5 
 include/linux/memcontrol.h                  |   12 
 include/linux/mm.h                          |    2 
 include/linux/pid.h                         |    1 
 include/linux/sched/mm.h                    |   43 +--
 include/linux/syscalls.h                    |    2 
 include/linux/vmalloc.h                     |    7 
 include/uapi/asm-generic/unistd.h           |    4 
 kernel/exit.c                               |   19 -
 kernel/pid.c                                |   19 +
 kernel/sys_ni.c                             |    1 
 mm/Kconfig                                  |   24 +
 mm/Makefile                                 |    2 
 mm/gup.c                                    |    2 
 mm/gup_benchmark.c                          |  225 ------------------
 mm/gup_test.c                               |  295 +++++++++++++++++++++--
 mm/gup_test.h                               |   40 ++-
 mm/madvise.c                                |  125 ++++++++--
 mm/memcontrol.c                             |   83 ++++--
 mm/memory-failure.c                         |   18 -
 mm/memory.c                                 |   16 -
 mm/memory_hotplug.c                         |   46 +--
 mm/migrate.c                                |   71 +++--
 mm/mmap.c                                   |   74 ++++-
 mm/nommu.c                                  |    7 
 mm/percpu.c                                 |    3 
 mm/slab.h                                   |    3 
 mm/vmalloc.c                                |  147 +++++------
 mm/zsmalloc.c                               |   10 
 tools/testing/selftests/vm/.gitignore       |    3 
 tools/testing/selftests/vm/Makefile         |   40 ++-
 tools/testing/selftests/vm/check_config.sh  |   31 ++
 tools/testing/selftests/vm/config           |    2 
 tools/testing/selftests/vm/gup_benchmark.c  |  143 -----------
 tools/testing/selftests/vm/gup_test.c       |  260 ++++++++++++++++++--
 tools/testing/selftests/vm/hmm-tests.c      |   12 
 tools/testing/selftests/vm/run_vmtests      |  334 --------------------------
 tools/testing/selftests/vm/run_vmtests.sh   |  350 +++++++++++++++++++++++++++-
 70 files changed, 1580 insertions(+), 1224 deletions(-)



* Re: incoming
  2020-10-16  2:40 incoming Andrew Morton
@ 2020-10-16  3:03 ` Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-10-16  3:03 UTC
  To: Linus Torvalds, mm-commits, linux-mm

And... I forgot to set in-reply-to :(

Shall resend, omitting linux-mm.


* incoming
@ 2020-10-16  2:40 Andrew Morton
  2020-10-16  3:03 ` incoming Andrew Morton
  0 siblings, 1 reply; 263+ messages in thread
From: Andrew Morton @ 2020-10-16  2:40 UTC
  To: Linus Torvalds; +Cc: mm-commits, linux-mm


- most of the rest of mm/

- various other subsystems

156 patches, based on 578a7155c5a1894a789d4ece181abf9d25dc6b0d.

Subsystems affected by this patch series:

  mm/dax
  mm/debug
  mm/thp
  mm/readahead
  mm/page-poison
  mm/util
  mm/memory-hotplug
  mm/zram
  mm/cleanups
  misc
  core-kernel
  get_maintainer
  MAINTAINERS
  lib
  bitops
  checkpatch
  binfmt
  ramfs
  autofs
  nilfs
  rapidio
  panic
  relay
  kgdb
  ubsan
  romfs
  fault-injection

Subsystem: mm/dax

    Dan Williams <dan.j.williams@intel.com>:
      device-dax/kmem: fix resource release

Subsystem: mm/debug

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
    Patch series "mm/debug_vm_pgtable fixes", v4:
      powerpc/mm: add DEBUG_VM WARN for pmd_clear
      powerpc/mm: move setting pte specific flags to pfn_pte
      mm/debug_vm_pgtable/ppc64: avoid setting top bits in radom value
      mm/debug_vm_pgtables/hugevmap: use the arch helper to identify huge vmap support.
      mm/debug_vm_pgtable/savedwrite: enable savedwrite test with CONFIG_NUMA_BALANCING
      mm/debug_vm_pgtable/THP: mark the pte entry huge before using set_pmd/pud_at
      mm/debug_vm_pgtable/set_pte/pmd/pud: don't use set_*_at to update an existing pte entry
      mm/debug_vm_pgtable/locks: move non page table modifying test together
      mm/debug_vm_pgtable/locks: take correct page table lock
      mm/debug_vm_pgtable/thp: use page table depost/withdraw with THP
      mm/debug_vm_pgtable/pmd_clear: don't use pmd/pud_clear on pte entries
      mm/debug_vm_pgtable/hugetlb: disable hugetlb test on ppc64
      mm/debug_vm_pgtable: avoid none pte in pte_clear_test
      mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped.

Subsystem: mm/thp

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "Fix read-only THP for non-tmpfs filesystems":
      XArray: add xa_get_order
      XArray: add xas_split
      mm/filemap: fix storing to a THP shadow entry
    Patch series "Remove assumptions of THP size":
      mm/filemap: fix page cache removal for arbitrary sized THPs
      mm/memory: remove page fault assumption of compound page size
      mm/page_owner: change split_page_owner to take a count

    "Kirill A. Shutemov" <kirill@shutemov.name>:
      mm/huge_memory: fix total_mapcount assumption of page size
      mm/huge_memory: fix split assumption of page size

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/huge_memory: fix page_trans_huge_mapcount assumption of THP size
      mm/huge_memory: fix can_split_huge_page assumption of THP size
      mm/rmap: fix assumptions of THP size
      mm/truncate: fix truncation for pages of arbitrary size
      mm/page-writeback: support tail pages in wait_for_stable_page
      mm/vmscan: allow arbitrary sized pages to be paged out
      fs: add a filesystem flag for THPs
      fs: do not update nr_thps for mappings which support THPs

    Huang Ying <ying.huang@intel.com>:
      mm: fix a race during THP splitting

Subsystem: mm/readahead

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "Readahead patches for 5.9/5.10":
      mm/readahead: add DEFINE_READAHEAD
      mm/readahead: make page_cache_ra_unbounded take a readahead_control
      mm/readahead: make do_page_cache_ra take a readahead_control

    David Howells <dhowells@redhat.com>:
      mm/readahead: make ondemand_readahead take a readahead_control
      mm/readahead: pass readahead_control to force_page_cache_ra

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/readahead: add page_cache_sync_ra and page_cache_async_ra

    David Howells <dhowells@redhat.com>:
      mm/filemap: fold ra_submit into do_sync_mmap_readahead
      mm/readahead: pass a file_ra_state into force_page_cache_ra

Subsystem: mm/page-poison

    Naoya Horiguchi <naoya.horiguchi@nec.com>:
    Patch series "HWPOISON: soft offline rework", v7:
      mm,hwpoison: cleanup unused PageHuge() check
      mm, hwpoison: remove recalculating hpage
      mm,hwpoison-inject: don't pin for hwpoison_filter

    Oscar Salvador <osalvador@suse.de>:
      mm,hwpoison: unexport get_hwpoison_page and make it static
      mm,hwpoison: refactor madvise_inject_error
      mm,hwpoison: kill put_hwpoison_page
      mm,hwpoison: unify THP handling for hard and soft offline
      mm,hwpoison: rework soft offline for free pages
      mm,hwpoison: rework soft offline for in-use pages
      mm,hwpoison: refactor soft_offline_huge_page and __soft_offline_page
      mm,hwpoison: return 0 if the page is already poisoned in soft-offline

    Naoya Horiguchi <naoya.horiguchi@nec.com>:
      mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
      mm,hwpoison: double-check page count in __get_any_page()

    Oscar Salvador <osalvador@suse.de>:
      mm,hwpoison: try to narrow window race for free pages

    Mateusz Nosek <mateusznosek0@gmail.com>:
      mm/page_poison.c: replace bool variable with static key

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/vmstat.c: use helper macro abs()

Subsystem: mm/util

    Bartosz Golaszewski <bgolaszewski@baylibre.com>:
      mm/util.c: update the kerneldoc for kstrdup_const()

    Jann Horn <jannh@google.com>:
      mm/mmu_notifier: fix mmget() assert in __mmu_interval_notifier_insert

Subsystem: mm/memory-hotplug

    David Hildenbrand <david@redhat.com>:
    Patch series "mm/memory_hotplug: online_pages()/offline_pages() cleanups", v2:
      mm/memory_hotplug: inline __offline_pages() into offline_pages()
      mm/memory_hotplug: enforce section granularity when onlining/offlining
      mm/memory_hotplug: simplify page offlining
      mm/page_alloc: simplify __offline_isolated_pages()
      mm/memory_hotplug: drop nr_isolate_pageblock in offline_pages()
      mm/page_isolation: simplify return value of start_isolate_page_range()
      mm/memory_hotplug: simplify page onlining
      mm/page_alloc: drop stale pageblock comment in memmap_init_zone*()
      mm: pass migratetype into memmap_init_zone() and move_pfn_range_to_zone()
      mm/memory_hotplug: mark pageblocks MIGRATE_ISOLATE while onlining memory
    Patch series "selective merging of system ram resources", v4:
      kernel/resource: make release_mem_region_adjustable() never fail
      kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED
      mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG
      mm/memory_hotplug: prepare passing flags to add_memory() and friends
      mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System RAM resources
      virtio-mem: try to merge system ram resources
      xen/balloon: try to merge system ram resources
      hv_balloon: try to merge system ram resources
      kernel/resource: make iomem_resource implicit in release_mem_region_adjustable()

    Laurent Dufour <ldufour@linux.ibm.com>:
      mm: don't panic when links can't be created in sysfs

    David Hildenbrand <david@redhat.com>:
    Patch series "mm: place pages to the freelist tail when onlining and undoing isolation", v2:
      mm/page_alloc: convert "report" flag of __free_one_page() to a proper flag
      mm/page_alloc: place pages to tail in __putback_isolated_page()
      mm/page_alloc: move pages to tail in move_to_free_list()
      mm/page_alloc: place pages to tail in __free_pages_core()
      mm/memory_hotplug: update comment regarding zone shuffling

Subsystem: mm/zram

    Douglas Anderson <dianders@chromium.org>:
      zram: failing to decompress is WARN_ON worthy

Subsystem: mm/cleanups

    YueHaibing <yuehaibing@huawei.com>:
      mm/slab.h: remove duplicate include

    Wei Yang <richard.weiyang@linux.alibaba.com>:
      mm/page_reporting.c: drop stale list head check in page_reporting_cycle

    Ira Weiny <ira.weiny@intel.com>:
      mm/highmem.c: clean up endif comments

    Yu Zhao <yuzhao@google.com>:
      mm: use self-explanatory macros rather than "2"

    Miaohe Lin <linmiaohe@huawei.com>:
      mm: fix some broken comments

    Chen Tao <chentao3@hotmail.com>:
      mm: fix some comments formatting

    Xiaofei Tan <tanxiaofei@huawei.com>:
      mm/workingset.c: fix some doc warnings

    Miaohe Lin <linmiaohe@huawei.com>:
      mm: use helper function put_write_access()

    Mike Rapoport <rppt@linux.ibm.com>:
      include/linux/mmzone.h: remove unused early_pfn_valid()

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm: rename page_order() to buddy_order()

Subsystem: misc

    Randy Dunlap <rdunlap@infradead.org>:
      fs: configfs: delete repeated words in comments

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      kernel.h: split out min()/max() et al. helpers

Subsystem: core-kernel

    Liao Pingfang <liao.pingfang@zte.com.cn>:
      kernel/sys.c: replace do_brk with do_brk_flags in comment of prctl_set_mm_map()

    Randy Dunlap <rdunlap@infradead.org>:
      kernel/: fix repeated words in comments
      kernel: acct.c: fix some kernel-doc nits

Subsystem: get_maintainer

    Joe Perches <joe@perches.com>:
      get_maintainer: add test for file in VCS

Subsystem: MAINTAINERS

    Joe Perches <joe@perches.com>:
      get_maintainer: exclude MAINTAINERS file(s) from --git-fallback

    Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>:
      MAINTAINERS: jarkko.sakkinen@linux.intel.com -> jarkko@kernel.org

Subsystem: lib

    Randy Dunlap <rdunlap@infradead.org>:
      lib: bitmap: delete duplicated words
      lib: libcrc32c: delete duplicated words
      lib: decompress_bunzip2: delete duplicated words
      lib: dynamic_queue_limits: delete duplicated words + fix typo
      lib: earlycpio: delete duplicated words
      lib: radix-tree: delete duplicated words
      lib: syscall: delete duplicated words
      lib: test_sysctl: delete duplicated words
      lib/mpi/mpi-bit.c: fix spello of "functions"

    Stephen Boyd <swboyd@chromium.org>:
      lib/idr.c: document calling context for IDA APIs mustn't use locks
      lib/idr.c: document that ida_simple_{get,remove}() are deprecated

    Christophe JAILLET <christophe.jaillet@wanadoo.fr>:
      lib/scatterlist.c: avoid a double memset

    Miaohe Lin <linmiaohe@huawei.com>:
      lib/percpu_counter.c: use helper macro abs()

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      include/linux/list.h: add a macro to test if entry is pointing to the head

    Dan Carpenter <dan.carpenter@oracle.com>:
      lib/test_hmm.c: fix an error code in dmirror_allocate_chunk()

    Tobias Jordan <kernel@cdqe.de>:
      lib/crc32.c: fix trivial typo in preprocessor condition

Subsystem: bitops

    Wei Yang <richard.weiyang@linux.alibaba.com>:
      bitops: simplify get_count_order_long()
      bitops: use the same mechanism for get_count_order[_long]

Subsystem: checkpatch

    Jerome Forissier <jerome@forissier.org>:
      checkpatch: add --kconfig-prefix

    Joe Perches <joe@perches.com>:
      checkpatch: move repeated word test
      checkpatch: add test for comma use that should be semicolon

    Rikard Falkeborn <rikard.falkeborn@gmail.com>:
      const_structs.checkpatch: add phy_ops

    Nicolas Boichat <drinkcat@chromium.org>:
      checkpatch: warn if trace_printk and friends are called

    Rikard Falkeborn <rikard.falkeborn@gmail.com>:
      const_structs.checkpatch: add pinctrl_ops and pinmux_ops

    Joe Perches <joe@perches.com>:
      checkpatch: warn on self-assignments
      checkpatch: allow not using -f with files that are in git

    Dwaipayan Ray <dwaipayanray1@gmail.com>:
      checkpatch: extend author Signed-off-by check for split From: header

    Joe Perches <joe@perches.com>:
      checkpatch: emit a warning on embedded filenames

    Dwaipayan Ray <dwaipayanray1@gmail.com>:
      checkpatch: fix multi-statement macro checks for while blocks.

    Łukasz Stelmach <l.stelmach@samsung.com>:
      checkpatch: fix false positive on empty block comment lines

    Dwaipayan Ray <dwaipayanray1@gmail.com>:
      checkpatch: add new warnings to author signoff checks.

Subsystem: binfmt

    Chris Kennelly <ckennelly@google.com>:
    Patch series "Selecting Load Addresses According to p_align", v3:
      fs/binfmt_elf: use PT_LOAD p_align values for suitable start address
      tools/testing/selftests: add self-test for verifying load alignment

    Jann Horn <jannh@google.com>:
    Patch series "Fix ELF / FDPIC ELF core dumping, and use mmap_lock properly in there", v5:
      binfmt_elf_fdpic: stop using dump_emit() on user pointers on !MMU
      coredump: let dump_emit() bail out on short writes
      coredump: refactor page range dumping into common helper
      coredump: rework elf/elf_fdpic vma_dump_size() into common helper
      binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot
      mm/gup: take mmap_lock in get_dump_page()
      mm: remove the now-unnecessary mmget_still_valid() hack

Subsystem: ramfs

    Matthew Wilcox (Oracle) <willy@infradead.org>:
      ramfs: fix nommu mmap with gaps in the page cache

Subsystem: autofs

    Matthew Wilcox <willy@infradead.org>:
      autofs: harden ioctl table

Subsystem: nilfs

    Wang Hai <wanghai38@huawei.com>:
      nilfs2: fix some kernel-doc warnings for nilfs2

Subsystem: rapidio

    Souptick Joarder <jrdr.linux@gmail.com>:
      rapidio: fix error handling path

    Jing Xiangfeng <jingxiangfeng@huawei.com>:
      rapidio: fix the missed put_device() for rio_mport_add_riodev

Subsystem: panic

    Alexey Kardashevskiy <aik@ozlabs.ru>:
      panic: dump registers on panic_on_warn

Subsystem: relay

    Sudip Mukherjee <sudipm.mukherjee@gmail.com>:
      kernel/relay.c: drop unneeded initialization

Subsystem: kgdb

    Ritesh Harjani <riteshh@linux.ibm.com>:
      scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command
      scripts/gdb/tasks: add headers and improve spacing format

Subsystem: ubsan

    Elena Petrova <lenaptr@google.com>:
      sched.h: drop in_ubsan field when UBSAN is in trap mode

    George Popescu <georgepope@android.com>:
      ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang

Subsystem: romfs

    Libing Zhou <libing.zhou@nokia-sbell.com>:
      ROMFS: support inode blocks calculation

Subsystem: fault-injection

    Albert van der Linde <alinde@google.com>:
    Patch series "add fault injection to user memory access", v3:
      lib, include/linux: add usercopy failure capability
      lib, uaccess: add failure injection to usercopy functions

 .mailmap                                          |    1 
 Documentation/admin-guide/kernel-parameters.txt   |    1 
 Documentation/core-api/xarray.rst                 |   14 
 Documentation/fault-injection/fault-injection.rst |    7 
 MAINTAINERS                                       |    6 
 arch/ia64/mm/init.c                               |    4 
 arch/powerpc/include/asm/book3s/64/pgtable.h      |   29 +
 arch/powerpc/include/asm/nohash/pgtable.h         |    5 
 arch/powerpc/mm/pgtable.c                         |    5 
 arch/powerpc/platforms/powernv/memtrace.c         |    2 
 arch/powerpc/platforms/pseries/hotplug-memory.c   |    2 
 drivers/acpi/acpi_memhotplug.c                    |    3 
 drivers/base/memory.c                             |    3 
 drivers/base/node.c                               |   33 +-
 drivers/block/zram/zram_drv.c                     |    2 
 drivers/dax/kmem.c                                |   50 ++-
 drivers/hv/hv_balloon.c                           |    4 
 drivers/infiniband/core/uverbs_main.c             |    3 
 drivers/rapidio/devices/rio_mport_cdev.c          |   18 -
 drivers/s390/char/sclp_cmd.c                      |    2 
 drivers/vfio/pci/vfio_pci.c                       |   38 +-
 drivers/virtio/virtio_mem.c                       |    5 
 drivers/xen/balloon.c                             |    4 
 fs/autofs/dev-ioctl.c                             |    8 
 fs/binfmt_elf.c                                   |  267 +++-------------
 fs/binfmt_elf_fdpic.c                             |  176 ++--------
 fs/configfs/dir.c                                 |    2 
 fs/configfs/file.c                                |    2 
 fs/coredump.c                                     |  238 +++++++++++++-
 fs/ext4/verity.c                                  |    4 
 fs/f2fs/verity.c                                  |    4 
 fs/inode.c                                        |    2 
 fs/nilfs2/bmap.c                                  |    2 
 fs/nilfs2/cpfile.c                                |    6 
 fs/nilfs2/page.c                                  |    1 
 fs/nilfs2/sufile.c                                |    4 
 fs/proc/task_mmu.c                                |   18 -
 fs/ramfs/file-nommu.c                             |    2 
 fs/romfs/super.c                                  |    1 
 fs/userfaultfd.c                                  |   28 -
 include/linux/bitops.h                            |   13 
 include/linux/blkdev.h                            |    1 
 include/linux/bvec.h                              |    6 
 include/linux/coredump.h                          |   13 
 include/linux/fault-inject-usercopy.h             |   22 +
 include/linux/fs.h                                |   28 -
 include/linux/idr.h                               |   13 
 include/linux/ioport.h                            |   15 
 include/linux/jiffies.h                           |    3 
 include/linux/kernel.h                            |  150 ---------
 include/linux/list.h                              |   29 +
 include/linux/memory_hotplug.h                    |   42 +-
 include/linux/minmax.h                            |  153 +++++++++
 include/linux/mm.h                                |    5 
 include/linux/mmzone.h                            |   17 -
 include/linux/node.h                              |   16 
 include/linux/nodemask.h                          |    2 
 include/linux/page-flags.h                        |    6 
 include/linux/page_owner.h                        |    6 
 include/linux/pagemap.h                           |  111 ++++++
 include/linux/sched.h                             |    2 
 include/linux/sched/mm.h                          |   25 -
 include/linux/uaccess.h                           |   12 
 include/linux/vmstat.h                            |    2 
 include/linux/xarray.h                            |   22 +
 include/ras/ras_event.h                           |    3 
 kernel/acct.c                                     |   10 
 kernel/cgroup/cpuset.c                            |    2 
 kernel/dma/direct.c                               |    2 
 kernel/fork.c                                     |    4 
 kernel/futex.c                                    |    2 
 kernel/irq/timings.c                              |    2 
 kernel/jump_label.c                               |    2 
 kernel/kcsan/encoding.h                           |    2 
 kernel/kexec_core.c                               |    2 
 kernel/kexec_file.c                               |    2 
 kernel/kthread.c                                  |    2 
 kernel/livepatch/state.c                          |    2 
 kernel/panic.c                                    |   12 
 kernel/pid_namespace.c                            |    2 
 kernel/power/snapshot.c                           |    2 
 kernel/range.c                                    |    3 
 kernel/relay.c                                    |    2 
 kernel/resource.c                                 |  114 +++++--
 kernel/smp.c                                      |    2 
 kernel/sys.c                                      |    2 
 kernel/user_namespace.c                           |    2 
 lib/Kconfig.debug                                 |    7 
 lib/Kconfig.ubsan                                 |   14 
 lib/Makefile                                      |    1 
 lib/bitmap.c                                      |    2 
 lib/crc32.c                                       |    2 
 lib/decompress_bunzip2.c                          |    2 
 lib/dynamic_queue_limits.c                        |    4 
 lib/earlycpio.c                                   |    2 
 lib/fault-inject-usercopy.c                       |   39 ++
 lib/find_bit.c                                    |    1 
 lib/hexdump.c                                     |    1 
 lib/idr.c                                         |    9 
 lib/iov_iter.c                                    |    5 
 lib/libcrc32c.c                                   |    2 
 lib/math/rational.c                               |    2 
 lib/math/reciprocal_div.c                         |    1 
 lib/mpi/mpi-bit.c                                 |    2 
 lib/percpu_counter.c                              |    2 
 lib/radix-tree.c                                  |    2 
 lib/scatterlist.c                                 |    2 
 lib/strncpy_from_user.c                           |    3 
 lib/syscall.c                                     |    2 
 lib/test_hmm.c                                    |    2 
 lib/test_sysctl.c                                 |    2 
 lib/test_xarray.c                                 |   65 ++++
 lib/usercopy.c                                    |    5 
 lib/xarray.c                                      |  208 ++++++++++++
 mm/Kconfig                                        |    2 
 mm/compaction.c                                   |    6 
 mm/debug_vm_pgtable.c                             |  267 ++++++++--------
 mm/filemap.c                                      |   58 ++-
 mm/gup.c                                          |   73 ++--
 mm/highmem.c                                      |    4 
 mm/huge_memory.c                                  |   47 +-
 mm/hwpoison-inject.c                              |   18 -
 mm/internal.h                                     |   47 +-
 mm/khugepaged.c                                   |    2 
 mm/madvise.c                                      |   52 ---
 mm/memory-failure.c                               |  357 ++++++++++------------
 mm/memory.c                                       |    7 
 mm/memory_hotplug.c                               |  223 +++++--------
 mm/memremap.c                                     |    3 
 mm/migrate.c                                      |   11 
 mm/mmap.c                                         |    7 
 mm/mmu_notifier.c                                 |    2 
 mm/page-writeback.c                               |    1 
 mm/page_alloc.c                                   |  289 +++++++++++------
 mm/page_isolation.c                               |   16 
 mm/page_owner.c                                   |   10 
 mm/page_poison.c                                  |   20 -
 mm/page_reporting.c                               |    4 
 mm/readahead.c                                    |  174 ++++------
 mm/rmap.c                                         |   10 
 mm/shmem.c                                        |    2 
 mm/shuffle.c                                      |    2 
 mm/slab.c                                         |    2 
 mm/slab.h                                         |    1 
 mm/slub.c                                         |    2 
 mm/sparse.c                                       |    2 
 mm/swap_state.c                                   |    2 
 mm/truncate.c                                     |    6 
 mm/util.c                                         |    3 
 mm/vmscan.c                                       |    5 
 mm/vmstat.c                                       |    8 
 mm/workingset.c                                   |    2 
 scripts/Makefile.ubsan                            |   10 
 scripts/checkpatch.pl                             |  238 ++++++++++----
 scripts/const_structs.checkpatch                  |    3 
 scripts/gdb/linux/proc.py                         |   15 
 scripts/gdb/linux/tasks.py                        |    9 
 scripts/get_maintainer.pl                         |    9 
 tools/testing/selftests/exec/.gitignore           |    1 
 tools/testing/selftests/exec/Makefile             |    9 
 tools/testing/selftests/exec/load_address.c       |   68 ++++
 161 files changed, 2532 insertions(+), 1864 deletions(-)



* incoming
@ 2020-10-13 23:46 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-10-13 23:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

181 patches, based on 029f56db6ac248769f2c260bfaf3c3c0e23e904c.

Subsystems affected by this patch series:

  kbuild
  scripts
  ntfs
  ocfs2
  vfs
  mm/slab
  mm/slub
  mm/kmemleak
  mm/dax
  mm/debug
  mm/pagecache
  mm/fadvise
  mm/gup
  mm/swap
  mm/memremap
  mm/memcg
  mm/selftests
  mm/pagemap
  mm/mincore
  mm/hmm
  mm/dma
  mm/memory-failure
  mm/vmalloc
  mm/documentation
  mm/kasan
  mm/pagealloc
  mm/hugetlb
  mm/vmscan
  mm/z3fold
  mm/zbud
  mm/compaction
  mm/mempolicy
  mm/mempool
  mm/memblock
  mm/oom-kill
  mm/migration

Subsystem: kbuild

    Nick Desaulniers <ndesaulniers@google.com>:
    Patch series "set clang minimum version to 10.0.1", v3:
      compiler-clang: add build check for clang 10.0.1
      Revert "kbuild: disable clang's default use of -fmerge-all-constants"
      Revert "arm64: bti: Require clang >= 10.0.1 for in-kernel BTI support"
      Revert "arm64: vdso: Fix compilation with clang older than 8"
      Partially revert "ARM: 8905/1: Emit __gnu_mcount_nc when using Clang 10.0.0 or newer"

    Marco Elver <elver@google.com>:
      kasan: remove mentions of unsupported Clang versions

    Nick Desaulniers <ndesaulniers@google.com>:
      compiler-gcc: improve version error
      compiler.h: avoid escaped section names
      export.h: fix section name for CONFIG_TRIM_UNUSED_KSYMS for Clang

    Lukas Bulwahn <lukas.bulwahn@gmail.com>:
      kbuild: doc: describe proper script invocation

Subsystem: scripts

    Wang Qing <wangqing@vivo.com>:
      scripts/spelling.txt: increase error-prone spell checking

    Naoki Hayama <naoki.hayama@lineo.co.jp>:
      scripts/spelling.txt: add "arbitrary" typo

    Borislav Petkov <bp@suse.de>:
      scripts/decodecode: add the capability to supply the program counter

Subsystem: ntfs

    Rustam Kovhaev <rkovhaev@gmail.com>:
      ntfs: add check for mft record size in superblock

Subsystem: ocfs2

    Randy Dunlap <rdunlap@infradead.org>:
      ocfs2: delete repeated words in comments

    Gang He <ghe@suse.com>:
      ocfs2: fix potential soft lockup during fstrim

Subsystem: vfs

    Randy Dunlap <rdunlap@infradead.org>:
      fs/xattr.c: fix kernel-doc warnings for setxattr & removexattr

    Luo Jiaxing <luojiaxing@huawei.com>:
      fs_parse: mark fs_param_bad_value() as static

Subsystem: mm/slab

    Mateusz Nosek <mateusznosek0@gmail.com>:
      mm/slab.c: clean code by removing redundant if condition

    tangjianqiang <wyqt1985@gmail.com>:
      include/linux/slab.h: fix a typo error in comment

Subsystem: mm/slub

    Abel Wu <wuyun.wu@huawei.com>:
      mm/slub.c: branch optimization in free slowpath
      mm/slub: fix missing ALLOC_SLOWPATH stat when bulk alloc
      mm/slub: make add_full() condition more explicit

Subsystem: mm/kmemleak

    Davidlohr Bueso <dave@stgolabs.net>:
      mm/kmemleak: rely on rcu for task stack scanning

    Hui Su <sh_def@163.com>:
      mm,kmemleak-test.c: move kmemleak-test.c to samples dir

Subsystem: mm/dax

    Dan Williams <dan.j.williams@intel.com>:
    Patch series "device-dax: Support sub-dividing soft-reserved ranges", v5:
      x86/numa: cleanup configuration dependent command-line options
      x86/numa: add 'nohmat' option
      efi/fake_mem: arrange for a resource entry per efi_fake_mem instance
      ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device
      resource: report parent to walk_iomem_res_desc() callback
      mm/memory_hotplug: introduce default phys_to_target_node() implementation
      ACPI: HMAT: attach a device for each soft-reserved range
      device-dax: drop the dax_region.pfn_flags attribute
      device-dax: move instance creation parameters to 'struct dev_dax_data'
      device-dax: make pgmap optional for instance creation
      device-dax/kmem: introduce dax_kmem_range()
      device-dax/kmem: move resource name tracking to drvdata
      device-dax/kmem: replace release_resource() with release_mem_region()
      device-dax: add an allocation interface for device-dax instances
      device-dax: introduce 'struct dev_dax' typed-driver operations
      device-dax: introduce 'seed' devices
      drivers/base: make device_find_child_by_name() compatible with sysfs inputs
      device-dax: add resize support
      mm/memremap_pages: convert to 'struct range'
      mm/memremap_pages: support multiple ranges per invocation
      device-dax: add dis-contiguous resource support
      device-dax: introduce 'mapping' devices

    Joao Martins <joao.m.martins@oracle.com>:
      device-dax: make align a per-device property

    Dan Williams <dan.j.williams@intel.com>:
      device-dax: add an 'align' attribute

    Joao Martins <joao.m.martins@oracle.com>:
      dax/hmem: introduce dax_hmem.region_idle parameter
      device-dax: add a range mapping allocation attribute

Subsystem: mm/debug

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/debug.c: do not dereference i_ino blindly

    John Hubbard <jhubbard@nvidia.com>:
      mm, dump_page: rename head_mapcount() --> head_compound_mapcount()

Subsystem: mm/pagecache

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "Return head pages from find_*_entry", v2:
      mm: factor find_get_incore_page out of mincore_page
      mm: use find_get_incore_page in memcontrol
      mm: optimise madvise WILLNEED
      proc: optimise smaps for shmem entries
      i915: use find_lock_page instead of find_lock_entry
      mm: convert find_get_entry to return the head page
      mm/shmem: return head page from find_lock_entry
      mm: add find_lock_head
      mm/filemap: fix filemap_map_pages for THP

Subsystem: mm/fadvise

    Yafang Shao <laoar.shao@gmail.com>:
      mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED

Subsystem: mm/gup

    Barry Song <song.bao.hua@hisilicon.com>:
      mm/gup_benchmark: update the documentation in Kconfig
      mm/gup_benchmark: use pin_user_pages for FOLL_LONGTERM flag
      mm/gup: don't permit users to call get_user_pages with FOLL_LONGTERM

    John Hubbard <jhubbard@nvidia.com>:
      mm/gup: protect unpin_user_pages() against npages==-ERRNO

Subsystem: mm/swap

    Gao Xiang <hsiangkao@redhat.com>:
      swap: rename SWP_FS to SWAP_FS_OPS to avoid ambiguity

    Yu Zhao <yuzhao@google.com>:
      mm: remove activate_page() from unuse_pte()
      mm: remove superfluous __ClearPageActive()

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/swap.c: fix confusing comment in release_pages()
      mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache()
      mm/page_io.c: remove useless out label in __swap_writepage()
      mm/swap.c: fix incomplete comment in lru_cache_add_inactive_or_unevictable()
      mm/swapfile.c: remove unnecessary goto out in _swap_info_get()
      mm/swapfile.c: fix potential memory leak in sys_swapon

Subsystem: mm/memremap

    Ira Weiny <ira.weiny@intel.com>:
      mm/memremap.c: convert devmap static branch to {inc,dec}

Subsystem: mm/memcg

    "Gustavo A. R. Silva" <gustavoars@kernel.org>:
      mm: memcontrol: use flex_array_size() helper in memcpy()
      mm: memcontrol: use the preferred form for passing the size of a structure type

    Roman Gushchin <guro@fb.com>:
      mm: memcg/slab: fix racy access to page->mem_cgroup in mem_cgroup_from_obj()

    Miaohe Lin <linmiaohe@huawei.com>:
      mm: memcontrol: correct the comment of mem_cgroup_iter()

    Waiman Long <longman@redhat.com>:
    Patch series "mm/memcg: Miscellaneous cleanups and streamlining", v2:
      mm/memcg: clean up obsolete enum charge_type
      mm/memcg: simplify mem_cgroup_get_max()
      mm/memcg: unify swap and memsw page counters

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcontrol: add the missing numa_stat interface for cgroup v2

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/page_counter: correct the obsolete func name in the comment of page_counter_try_charge()
      mm: memcontrol: reword obsolete comment of mem_cgroup_unmark_under_oom()

    Bharata B Rao <bharata@linux.ibm.com>:
      mm: memcg/slab: uncharge during kmem_cache_free_bulk()

    Ralph Campbell <rcampbell@nvidia.com>:
      mm/memcg: fix device private memcg accounting

Subsystem: mm/selftests

    John Hubbard <jhubbard@nvidia.com>:
    Patch series "selftests/vm: fix some minor aggravating factors in the Makefile":
      selftests/vm: fix false build success on the second and later attempts
      selftests/vm: fix incorrect gcc invocation in some cases

Subsystem: mm/pagemap

    Matthew Wilcox <willy@infradead.org>:
      mm: account PMD tables like PTE tables

    Yanfei Xu <yanfei.xu@windriver.com>:
      mm/memory.c: fix typo in __do_fault() comment
      mm/memory.c: replace vmf->vma with variable vma

    Wei Yang <richard.weiyang@linux.alibaba.com>:
      mm/mmap: rename __vma_unlink_common() to __vma_unlink()
      mm/mmap: leverage vma_rb_erase_ignore() to implement vma_rb_erase()

    Chinwen Chang <chinwen.chang@mediatek.com>:
    Patch series "Try to release mmap_lock temporarily in smaps_rollup", v4:
      mmap locking API: add mmap_lock_is_contended()
      mm: smaps*: extend smap_gather_stats to support specified beginning
      mm: proc: smaps_rollup: do not stall write attempts on mmap_lock

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "Fix PageDoubleMap":
      mm: move PageDoubleMap bit
      mm: simplify PageDoubleMap with PF_SECOND policy

    Wei Yang <richard.weiyang@linux.alibaba.com>:
      mm/mmap: leave adjust_next as virtual address instead of page frame number

    Randy Dunlap <rdunlap@infradead.org>:
      mm/memory.c: fix spello of "function"

    Wei Yang <richard.weiyang@linux.alibaba.com>:
      mm/mmap: not necessary to check mapping separately
      mm/mmap: check on file instead of the rb_root_cached of its address_space

    Miaohe Lin <linmiaohe@huawei.com>:
      mm: use helper function mapping_allow_writable()
      mm/mmap.c: use helper function allow_write_access() in __remove_shared_vm_struct()

    Liao Pingfang <liao.pingfang@zte.com.cn>:
      mm/mmap.c: replace do_brk with do_brk_flags in comment of insert_vm_struct()

    Peter Xu <peterx@redhat.com>:
      mm: remove src/dst mm parameter in copy_page_range()

Subsystem: mm/mincore

    yuleixzhang <yulei.kernel@gmail.com>:
      include/linux/huge_mm.h: remove mincore_huge_pmd declaration

Subsystem: mm/hmm

    Ralph Campbell <rcampbell@nvidia.com>:
      tools/testing/selftests/vm/hmm-tests.c: use the new SKIP() macro
      lib/test_hmm.c: remove unused dmirror_zero_page

Subsystem: mm/dma

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      mm/dmapool.c: replace open-coded list_for_each_entry_safe()
      mm/dmapool.c: replace hard coded function name with __func__

Subsystem: mm/memory-failure

    Xianting Tian <tian.xianting@h3c.com>:
      mm/memory-failure: do pgoff calculation before for_each_process()

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/memory-failure.c: remove unused macro `writeback'

Subsystem: mm/vmalloc

    Hui Su <sh_def@163.com>:
      mm/vmalloc.c: update the comment in __vmalloc_area_node()
      mm/vmalloc.c: fix the comment of find_vm_area

Subsystem: mm/documentation

    Alexander Gordeev <agordeev@linux.ibm.com>:
      docs/vm: fix 'mm_count' vs 'mm_users' counter confusion

Subsystem: mm/kasan

    Patricia Alfonso <trishalfonso@google.com>:
    Patch series "KASAN-KUnit Integration", v14:
      kasan/kunit: add KUnit Struct to Current Task
      KUnit: KASAN Integration
      KASAN: port KASAN Tests to KUnit
      KASAN: Testing Documentation

    David Gow <davidgow@google.com>:
      mm: kasan: do not panic if both panic_on_warn and kasan_multishot set

Subsystem: mm/pagealloc

    David Hildenbrand <david@redhat.com>:
    Patch series "mm / virtio-mem: support ZONE_MOVABLE", v5:
      mm/page_alloc: tweak comments in has_unmovable_pages()
      mm/page_isolation: exit early when pageblock is isolated in set_migratetype_isolate()
      mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
      mm/page_isolation: cleanup set_migratetype_isolate()
      virtio-mem: don't special-case ZONE_MOVABLE
      mm: document semantics of ZONE_MOVABLE

    Li Xinhai <lixinhai.lxh@gmail.com>:
      mm, isolation: avoid checking unmovable pages across pageblock boundary

    Mateusz Nosek <mateusznosek0@gmail.com>:
      mm/page_alloc.c: clean code by removing unnecessary initialization
      mm/page_alloc.c: micro-optimization remove unnecessary branch
      mm/page_alloc.c: fix early params garbage value accesses
      mm/page_alloc.c: clean code by merging two functions

    Yanfei Xu <yanfei.xu@windriver.com>:
      mm/page_alloc.c: __perform_reclaim should return 'unsigned long'

    Mateusz Nosek <mateusznosek0@gmail.com>:
      mmzone: clean code by removing unused macro parameter

    Ralph Campbell <rcampbell@nvidia.com>:
      mm: move call to compound_head() in release_pages()

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/page_alloc.c: fix freeing non-compound pages

    Michal Hocko <mhocko@suse.com>:
      include/linux/gfp.h: clarify usage of GFP_ATOMIC in !preemptible contexts

Subsystem: mm/hugetlb

    Baoquan He <bhe@redhat.com>:
    Patch series "mm/hugetlb: Small cleanup and improvement", v2:
      mm/hugetlb.c: make is_hugetlb_entry_hwpoisoned return bool
      mm/hugetlb.c: remove the unnecessary non_swap_entry()
      doc/vm: fix typo in the hugetlb admin documentation

    Wei Yang <richard.weiyang@linux.alibaba.com>:
    Patch series "mm/hugetlb: code refine and simplification", v4:
      mm/hugetlb: not necessary to coalesce regions recursively
      mm/hugetlb: remove VM_BUG_ON(!nrg) in get_file_region_entry_from_cache()
      mm/hugetlb: use list_splice to merge two list at once
      mm/hugetlb: count file_region to be added when regions_needed != NULL
      mm/hugetlb: a page from buddy is not on any list
      mm/hugetlb: narrow the hugetlb_lock protection area during preparing huge page
      mm/hugetlb: take the free hpage during the iteration directly

    Mike Kravetz <mike.kravetz@oracle.com>:
      hugetlb: add lockdep check for i_mmap_rwsem held in huge_pmd_share

Subsystem: mm/vmscan

    Chunxin Zang <zangchunxin@bytedance.com>:
      mm/vmscan: fix infinite loop in drop_slab_node

    Hui Su <sh_def@163.com>:
      mm/vmscan: fix comments for isolate_lru_page()

Subsystem: mm/z3fold

    Hui Su <sh_def@163.com>:
      mm/z3fold.c: use xx_zalloc instead xx_alloc and memset

Subsystem: mm/zbud

    Xiang Chen <chenxiang66@hisilicon.com>:
      mm/zbud: remove redundant initialization

Subsystem: mm/compaction

    Mateusz Nosek <mateusznosek0@gmail.com>:
      mm/compaction.c: micro-optimization remove unnecessary branch
      include/linux/compaction.h: clean code by removing unused enum value

    John Hubbard <jhubbard@nvidia.com>:
      selftests/vm: 8x compaction_test speedup

Subsystem: mm/mempolicy

    Wei Yang <richard.weiyang@linux.alibaba.com>:
      mm/mempolicy: remove or narrow the lock on current
      mm: remove unused alloc_page_vma_node()

Subsystem: mm/mempool

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/mempool: add 'else' to split mutually exclusive case

Subsystem: mm/memblock

    Mike Rapoport <rppt@linux.ibm.com>:
    Patch series "memblock: seasonal cleaning^w cleanup", v3:
      KVM: PPC: Book3S HV: simplify kvm_cma_reserve()
      dma-contiguous: simplify cma_early_percent_memory()
      arm, xtensa: simplify initialization of high memory pages
      arm64: numa: simplify dummy_numa_init()
      h8300, nds32, openrisc: simplify detection of memory extents
      riscv: drop unneeded node initialization
      microblaze: drop unneeded NUMA and sparsemem initializations
      memblock: make for_each_memblock_type() iterator private
      memblock: make memblock_debug and related functionality private
      memblock: reduce number of parameters in for_each_mem_range()
      arch, mm: replace for_each_memblock() with for_each_mem_pfn_range()
      arch, drivers: replace for_each_membock() with for_each_mem_range()
      x86/setup: simplify initrd relocation and reservation
      x86/setup: simplify reserve_crashkernel()
      memblock: remove unused memblock_mem_size()
      memblock: implement for_each_reserved_mem_region() using __next_mem_region()
      memblock: use separate iterators for memory and reserved regions

Subsystem: mm/oom-kill

    Suren Baghdasaryan <surenb@google.com>:
      mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary

Subsystem: mm/migration

    Ralph Campbell <rcampbell@nvidia.com>:
      mm/migrate: remove cpages-- in migrate_vma_finalize()
      mm/migrate: remove obsolete comment about device public

 .clang-format                                |    7 
 Documentation/admin-guide/cgroup-v2.rst      |   69 +
 Documentation/admin-guide/mm/hugetlbpage.rst |    2 
 Documentation/dev-tools/kasan.rst            |   74 +
 Documentation/dev-tools/kmemleak.rst         |    2 
 Documentation/kbuild/makefiles.rst           |   20 
 Documentation/vm/active_mm.rst               |    2 
 Documentation/x86/x86_64/boot-options.rst    |    4 
 MAINTAINERS                                  |    2 
 Makefile                                     |    9 
 arch/arm/Kconfig                             |    2 
 arch/arm/include/asm/tlb.h                   |    1 
 arch/arm/kernel/setup.c                      |   18 
 arch/arm/mm/init.c                           |   59 -
 arch/arm/mm/mmu.c                            |   39 
 arch/arm/mm/pmsa-v7.c                        |   23 
 arch/arm/mm/pmsa-v8.c                        |   17 
 arch/arm/xen/mm.c                            |    7 
 arch/arm64/Kconfig                           |    2 
 arch/arm64/kernel/machine_kexec_file.c       |    6 
 arch/arm64/kernel/setup.c                    |    4 
 arch/arm64/kernel/vdso/Makefile              |    7 
 arch/arm64/mm/init.c                         |   11 
 arch/arm64/mm/kasan_init.c                   |   10 
 arch/arm64/mm/mmu.c                          |   11 
 arch/arm64/mm/numa.c                         |   15 
 arch/c6x/kernel/setup.c                      |    9 
 arch/h8300/kernel/setup.c                    |    8 
 arch/microblaze/mm/init.c                    |   23 
 arch/mips/cavium-octeon/dma-octeon.c         |   14 
 arch/mips/kernel/setup.c                     |   31 
 arch/mips/netlogic/xlp/setup.c               |    2 
 arch/nds32/kernel/setup.c                    |    8 
 arch/openrisc/kernel/setup.c                 |    9 
 arch/openrisc/mm/init.c                      |    8 
 arch/powerpc/kernel/fadump.c                 |   61 -
 arch/powerpc/kexec/file_load_64.c            |   16 
 arch/powerpc/kvm/book3s_hv_builtin.c         |   12 
 arch/powerpc/kvm/book3s_hv_uvmem.c           |   14 
 arch/powerpc/mm/book3s64/hash_utils.c        |   16 
 arch/powerpc/mm/book3s64/radix_pgtable.c     |   10 
 arch/powerpc/mm/kasan/kasan_init_32.c        |    8 
 arch/powerpc/mm/mem.c                        |   31 
 arch/powerpc/mm/numa.c                       |    7 
 arch/powerpc/mm/pgtable_32.c                 |    8 
 arch/riscv/mm/init.c                         |   36 
 arch/riscv/mm/kasan_init.c                   |   10 
 arch/s390/kernel/setup.c                     |   27 
 arch/s390/mm/page-states.c                   |    6 
 arch/s390/mm/vmem.c                          |    7 
 arch/sh/mm/init.c                            |    9 
 arch/sparc/mm/init_64.c                      |   12 
 arch/x86/include/asm/numa.h                  |    8 
 arch/x86/kernel/e820.c                       |   16 
 arch/x86/kernel/setup.c                      |   56 -
 arch/x86/mm/numa.c                           |   13 
 arch/x86/mm/numa_emulation.c                 |    3 
 arch/x86/xen/enlighten_pv.c                  |    2 
 arch/xtensa/mm/init.c                        |   55 -
 drivers/acpi/numa/hmat.c                     |   76 -
 drivers/acpi/numa/srat.c                     |    9 
 drivers/base/core.c                          |    2 
 drivers/bus/mvebu-mbus.c                     |   12 
 drivers/dax/Kconfig                          |    6 
 drivers/dax/Makefile                         |    3 
 drivers/dax/bus.c                            | 1237 +++++++++++++++++++++++----
 drivers/dax/bus.h                            |   34 
 drivers/dax/dax-private.h                    |   74 +
 drivers/dax/device.c                         |  164 +--
 drivers/dax/hmem.c                           |   56 -
 drivers/dax/hmem/Makefile                    |    8 
 drivers/dax/hmem/device.c                    |  100 ++
 drivers/dax/hmem/hmem.c                      |   93 +-
 drivers/dax/kmem.c                           |  236 ++---
 drivers/dax/pmem/compat.c                    |    2 
 drivers/dax/pmem/core.c                      |   36 
 drivers/firmware/efi/x86_fake_mem.c          |   12 
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c    |    4 
 drivers/gpu/drm/nouveau/nouveau_dmem.c       |   15 
 drivers/irqchip/irq-gic-v3-its.c             |    2 
 drivers/nvdimm/badrange.c                    |   26 
 drivers/nvdimm/claim.c                       |   13 
 drivers/nvdimm/nd.h                          |    3 
 drivers/nvdimm/pfn_devs.c                    |   13 
 drivers/nvdimm/pmem.c                        |   27 
 drivers/nvdimm/region.c                      |   21 
 drivers/pci/p2pdma.c                         |   12 
 drivers/virtio/virtio_mem.c                  |   47 -
 drivers/xen/unpopulated-alloc.c              |   45 
 fs/fs_parser.c                               |    2 
 fs/ntfs/inode.c                              |    6 
 fs/ocfs2/alloc.c                             |    6 
 fs/ocfs2/localalloc.c                        |    2 
 fs/proc/base.c                               |    3 
 fs/proc/task_mmu.c                           |  104 +-
 fs/xattr.c                                   |   22 
 include/acpi/acpi_numa.h                     |   14 
 include/kunit/test.h                         |    5 
 include/linux/acpi.h                         |    2 
 include/linux/compaction.h                   |    3 
 include/linux/compiler-clang.h               |    8 
 include/linux/compiler-gcc.h                 |    2 
 include/linux/compiler.h                     |    2 
 include/linux/dax.h                          |    8 
 include/linux/export.h                       |    2 
 include/linux/fs.h                           |    4 
 include/linux/gfp.h                          |    6 
 include/linux/huge_mm.h                      |    3 
 include/linux/kasan.h                        |    6 
 include/linux/memblock.h                     |   90 +
 include/linux/memcontrol.h                   |   13 
 include/linux/memory_hotplug.h               |   23 
 include/linux/memremap.h                     |   15 
 include/linux/mm.h                           |   36 
 include/linux/mmap_lock.h                    |    5 
 include/linux/mmzone.h                       |   37 
 include/linux/numa.h                         |   11 
 include/linux/oom.h                          |    1 
 include/linux/page-flags.h                   |   42 
 include/linux/pagemap.h                      |   43 
 include/linux/range.h                        |    6 
 include/linux/sched.h                        |    4 
 include/linux/sched/coredump.h               |    1 
 include/linux/slab.h                         |    2 
 include/linux/swap.h                         |   10 
 include/linux/swap_slots.h                   |    2 
 kernel/dma/contiguous.c                      |   11 
 kernel/fork.c                                |   25 
 kernel/resource.c                            |   11 
 lib/Kconfig.debug                            |    9 
 lib/Kconfig.kasan                            |   31 
 lib/Makefile                                 |    5 
 lib/kunit/test.c                             |   13 
 lib/test_free_pages.c                        |   42 
 lib/test_hmm.c                               |   65 -
 lib/test_kasan.c                             |  732 ++++++---------
 lib/test_kasan_module.c                      |  111 ++
 mm/Kconfig                                   |    4 
 mm/Makefile                                  |    1 
 mm/compaction.c                              |    5 
 mm/debug.c                                   |   18 
 mm/dmapool.c                                 |   46 -
 mm/fadvise.c                                 |    9 
 mm/filemap.c                                 |   78 -
 mm/gup.c                                     |   44 
 mm/gup_benchmark.c                           |   23 
 mm/huge_memory.c                             |    4 
 mm/hugetlb.c                                 |  100 +-
 mm/internal.h                                |    3 
 mm/kasan/report.c                            |   34 
 mm/kmemleak-test.c                           |   99 --
 mm/kmemleak.c                                |    8 
 mm/madvise.c                                 |   21 
 mm/memblock.c                                |  102 --
 mm/memcontrol.c                              |  262 +++--
 mm/memory-failure.c                          |    5 
 mm/memory.c                                  |  147 +--
 mm/memory_hotplug.c                          |   10 
 mm/mempolicy.c                               |    8 
 mm/mempool.c                                 |   18 
 mm/memremap.c                                |  344 ++++---
 mm/migrate.c                                 |    3 
 mm/mincore.c                                 |   28 
 mm/mmap.c                                    |   45 
 mm/oom_kill.c                                |    2 
 mm/page_alloc.c                              |   82 -
 mm/page_counter.c                            |    2 
 mm/page_io.c                                 |   14 
 mm/page_isolation.c                          |   41 
 mm/shmem.c                                   |   19 
 mm/slab.c                                    |    4 
 mm/slab.h                                    |   50 -
 mm/slub.c                                    |   33 
 mm/sparse.c                                  |   10 
 mm/swap.c                                    |   14 
 mm/swap_slots.c                              |    3 
 mm/swap_state.c                              |   38 
 mm/swapfile.c                                |   12 
 mm/truncate.c                                |   58 -
 mm/vmalloc.c                                 |    6 
 mm/vmscan.c                                  |    5 
 mm/z3fold.c                                  |    3 
 mm/zbud.c                                    |    1 
 samples/Makefile                             |    1 
 samples/kmemleak/Makefile                    |    3 
 samples/kmemleak/kmemleak-test.c             |   99 ++
 scripts/decodecode                           |   29 
 scripts/spelling.txt                         |    4 
 tools/testing/nvdimm/dax-dev.c               |   28 
 tools/testing/nvdimm/test/iomap.c            |    2 
 tools/testing/selftests/vm/Makefile          |   17 
 tools/testing/selftests/vm/compaction_test.c |   11 
 tools/testing/selftests/vm/gup_benchmark.c   |   14 
 tools/testing/selftests/vm/hmm-tests.c       |    4 
 194 files changed, 4273 insertions(+), 2777 deletions(-)



* incoming
@ 2020-10-11  6:15 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-10-11  6:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

5 patches, based on da690031a5d6d50a361e3f19f3eeabd086a6f20d.

Subsystems affected by this patch series:

  MAINTAINERS
  mm/pagemap
  mm/swap
  mm/hugetlb

Subsystem: MAINTAINERS

    Kees Cook <keescook@chromium.org>:
      MAINTAINERS: change hardening mailing list

    Antoine Tenart <atenart@kernel.org>:
      MAINTAINERS: Antoine Tenart's email address

Subsystem: mm/pagemap

    Miaohe Lin <linmiaohe@huawei.com>:
      mm: mmap: Fix general protection fault in unlink_file_vma()

Subsystem: mm/swap

    Minchan Kim <minchan@kernel.org>:
      mm: validate inode in mapping_set_error()

Subsystem: mm/hugetlb

    Vijay Balakrishna <vijayb@linux.microsoft.com>:
      mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged

 .mailmap                   |    4 +++-
 MAINTAINERS                |    8 ++++----
 include/linux/khugepaged.h |    5 +++++
 include/linux/pagemap.h    |    3 ++-
 mm/khugepaged.c            |   13 +++++++++++--
 mm/mmap.c                  |    6 +++++-
 mm/page_alloc.c            |    3 +++
 7 files changed, 33 insertions(+), 9 deletions(-)



* incoming
@ 2020-10-03  5:20 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-10-03  5:20 UTC
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

3 patches, based on d3d45f8220d60a0b2aaaacf8fb2be4e6ffd9008e.

Subsystems affected by this patch series:

  mm/slub
  mm/cma
  scripts

Subsystem: mm/slub

    Eric Farman <farman@linux.ibm.com>:
      mm, slub: restore initial kmem_cache flags

Subsystem: mm/cma

    Joonsoo Kim <iamjoonsoo.kim@lge.com>:
      mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

Subsystem: scripts

    Eric Biggers <ebiggers@google.com>:
      scripts/spelling.txt: fix malformed entry

 mm/page_alloc.c      |   19 ++++++++++++++++---
 mm/slub.c            |    6 +-----
 scripts/spelling.txt |    2 +-
 3 files changed, 18 insertions(+), 9 deletions(-)



* incoming
@ 2020-09-26  4:17 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-09-26  4:17 UTC
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

9 patches, based on 7c7ec3226f5f33f9c050d85ec20f18419c622ad6.

Subsystems affected by this patch series:

  mm/thp
  mm/memcg
  mm/gup
  mm/migration
  lib
  x86
  mm/memory-hotplug

Subsystem: mm/thp

    Gao Xiang <hsiangkao@redhat.com>:
      mm, THP, swap: fix allocating cluster for swapfile by mistake

Subsystem: mm/memcg

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcontrol: fix missing suffix of workingset_restore

Subsystem: mm/gup

    Vasily Gorbik <gor@linux.ibm.com>:
      mm/gup: fix gup_fast with dynamic page table folding

Subsystem: mm/migration

    Zi Yan <ziy@nvidia.com>:
      mm/migrate: correct thp migration stats

Subsystem: lib

    Nick Desaulniers <ndesaulniers@google.com>:
      lib/string.c: implement stpcpy

    Jason Yan <yanaijie@huawei.com>:
      lib/memregion.c: include memregion.h

Subsystem: x86

    Mikulas Patocka <mpatocka@redhat.com>:
      arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback

Subsystem: mm/memory-hotplug

    Laurent Dufour <ldufour@linux.ibm.com>:
    Patch series "mm: fix memory to node bad links in sysfs", v3:
      mm: replace memmap_context by meminit_context
      mm: don't rely on system state to detect hot-plug operations

 Documentation/admin-guide/cgroup-v2.rst |   25 ++++++---
 arch/ia64/mm/init.c                     |    6 +-
 arch/s390/include/asm/pgtable.h         |   42 +++++++++++----
 arch/x86/lib/usercopy_64.c              |    2 
 drivers/base/node.c                     |   85 ++++++++++++++++++++------------
 include/linux/mm.h                      |    2 
 include/linux/mmzone.h                  |   11 +++-
 include/linux/node.h                    |   11 ++--
 include/linux/pgtable.h                 |   10 +++
 lib/memregion.c                         |    1 
 lib/string.c                            |   24 +++++++++
 mm/gup.c                                |   18 +++---
 mm/memcontrol.c                         |    4 -
 mm/memory_hotplug.c                     |    5 +
 mm/migrate.c                            |    7 +-
 mm/page_alloc.c                         |   10 +--
 mm/swapfile.c                           |    2 
 17 files changed, 181 insertions(+), 84 deletions(-)



* incoming
@ 2020-09-19  4:19 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-09-19  4:19 UTC
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

15 patches, based on 92ab97adeefccf375de7ebaad9d5b75d4125fe8b.

Subsystems affected by this patch series:

  mailmap
  mm/hotfixes
  mm/thp
  mm/memory-hotplug
  misc
  kcsan

Subsystem: mailmap

    Kees Cook <keescook@chromium.org>:
      mailmap: add older email addresses for Kees Cook

Subsystem: mm/hotfixes

    Hugh Dickins <hughd@google.com>:
    Patch series "mm: fixes to past from future testing":
      ksm: reinstate memcg charge on copied pages
      mm: migration of hugetlbfs page skip memcg
      shmem: shmem_writepage() split unlikely i915 THP
      mm: fix check_move_unevictable_pages() on THP
      mlock: fix unevictable_pgs event counts on THP

    Byron Stanoszek <gandalf@winds.org>:
      tmpfs: restore functionality of nr_inodes=0

    Muchun Song <songmuchun@bytedance.com>:
      kprobes: fix kill kprobe which has been marked as gone

Subsystem: mm/thp

    Ralph Campbell <rcampbell@nvidia.com>:
      mm/thp: fix __split_huge_pmd_locked() for migration PMD

    Christophe Leroy <christophe.leroy@csgroup.eu>:
      selftests/vm: fix display of page size in map_hugetlb

Subsystem: mm/memory-hotplug

    Pavel Tatashin <pasha.tatashin@soleen.com>:
      mm/memory_hotplug: drain per-cpu pages again during memory offline

Subsystem: misc

    Tobias Klauser <tklauser@distanz.ch>:
      ftrace: let ftrace_enable_sysctl take a kernel pointer buffer
      stackleak: let stack_erasing_sysctl take a kernel pointer buffer
      fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype

Subsystem: kcsan

    Changbin Du <changbin.du@gmail.com>:
      kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments'

 .mailmap                                 |    4 ++
 fs/fs-writeback.c                        |    2 -
 include/linux/ftrace.h                   |    3 --
 include/linux/stackleak.h                |    2 -
 kernel/kprobes.c                         |    9 +++++-
 kernel/stackleak.c                       |    2 -
 kernel/trace/ftrace.c                    |    3 --
 lib/Kconfig.debug                        |    4 --
 mm/huge_memory.c                         |   42 ++++++++++++++++---------------
 mm/ksm.c                                 |    4 ++
 mm/memory_hotplug.c                      |   14 ++++++++++
 mm/migrate.c                             |    3 +-
 mm/mlock.c                               |   24 +++++++++++------
 mm/page_isolation.c                      |    8 +++++
 mm/shmem.c                               |   20 +++++++++++---
 mm/swap.c                                |    6 ++--
 mm/vmscan.c                              |   10 +++++--
 tools/testing/selftests/vm/map_hugetlb.c |    2 -
 18 files changed, 111 insertions(+), 51 deletions(-)



* incoming
@ 2020-09-04 23:34 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-09-04 23:34 UTC
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

19 patches, based on 59126901f200f5fc907153468b03c64e0081b6e6.

Subsystems affected by this patch series:

  mm/memcg
  mm/slub
  MAINTAINERS
  mm/pagemap
  ipc
  fork
  checkpatch
  mm/madvise
  mm/migration
  mm/hugetlb
  lib

Subsystem: mm/memcg

    Michal Hocko <mhocko@suse.com>:
      memcg: fix use-after-free in uncharge_batch

    Xunlei Pang <xlpang@linux.alibaba.com>:
      mm: memcg: fix memcg reclaim soft lockup

Subsystem: mm/slub

    Eugeniu Rosca <erosca@de.adit-jv.com>:
      mm: slub: fix conversion of freelist_corrupted()

Subsystem: MAINTAINERS

    Robert Richter <rric@kernel.org>:
      MAINTAINERS: update Cavium/Marvell entries

    Nick Desaulniers <ndesaulniers@google.com>:
      MAINTAINERS: add LLVM maintainers

    Randy Dunlap <rdunlap@infradead.org>:
      MAINTAINERS: IA64: mark Status as Odd Fixes only

Subsystem: mm/pagemap

    Joerg Roedel <jroedel@suse.de>:
      mm: track page table modifications in __apply_to_page_range()

Subsystem: ipc

    Tobias Klauser <tklauser@distanz.ch>:
      ipc: adjust proc_ipc_sem_dointvec definition to match prototype

Subsystem: fork

    Tobias Klauser <tklauser@distanz.ch>:
      fork: adjust sysctl_max_threads definition to match prototype

Subsystem: checkpatch

    Mrinal Pandey <mrinalmni@gmail.com>:
      checkpatch: fix the usage of capture group ( ... )

Subsystem: mm/madvise

    Yang Shi <shy828301@gmail.com>:
      mm: madvise: fix vma user-after-free

Subsystem: mm/migration

    Alistair Popple <alistair@popple.id.au>:
      mm/migrate: fixup setting UFFD_WP flag
      mm/rmap: fixup copying of soft dirty and uffd ptes

    Ralph Campbell <rcampbell@nvidia.com>:
    Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()":
      mm/migrate: remove unnecessary is_zone_device_page() check
      mm/migrate: preserve soft dirty in remove_migration_pte()

Subsystem: mm/hugetlb

    Li Xinhai <lixinhai.lxh@gmail.com>:
      mm/hugetlb: try preferred node first when alloc gigantic page from cma

    Muchun Song <songmuchun@bytedance.com>:
      mm/hugetlb: fix a race between hugetlb sysctl handlers

    David Howells <dhowells@redhat.com>:
      mm/khugepaged.c: fix khugepaged's request size in collapse_file

Subsystem: lib

    Jason Gunthorpe <jgg@nvidia.com>:
      include/linux/log2.h: add missing () around n in roundup_pow_of_two()

 MAINTAINERS           |   32 ++++++++++++++++----------------
 include/linux/log2.h  |    2 +-
 ipc/ipc_sysctl.c      |    2 +-
 kernel/fork.c         |    2 +-
 mm/hugetlb.c          |   49 +++++++++++++++++++++++++++++++++++++------------
 mm/khugepaged.c       |    2 +-
 mm/madvise.c          |    2 +-
 mm/memcontrol.c       |    6 ++++++
 mm/memory.c           |   37 ++++++++++++++++++++++++-------------
 mm/migrate.c          |   31 +++++++++++++++++++------------
 mm/rmap.c             |    9 +++++++--
 mm/slub.c             |   12 ++++++------
 mm/vmscan.c           |    8 ++++++++
 scripts/checkpatch.pl |    4 ++--
 14 files changed, 130 insertions(+), 68 deletions(-)



* incoming
@ 2020-08-21  0:41 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-08-21  0:41 UTC
  To: Linus Torvalds; +Cc: mm-commits, linux-mm

11 patches, based on 7eac66d0456fe12a462e5c14c68e97c7460989da.

Subsystems affected by this patch series:

  misc
  mm/hugetlb
  mm/vmalloc
  mm/misc
  romfs
  relay
  uprobes
  squashfs
  mm/cma
  mm/pagealloc

Subsystem: misc

    Nick Desaulniers <ndesaulniers@google.com>:
      mailmap: add Andi Kleen

Subsystem: mm/hugetlb

    Xu Wang <vulab@iscas.ac.cn>:
      hugetlb_cgroup: convert comma to semicolon

    Hugh Dickins <hughd@google.com>:
      khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()

Subsystem: mm/vmalloc

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
      mm/vunmap: add cond_resched() in vunmap_pmd_range

Subsystem: mm/misc

    Leon Romanovsky <leonro@nvidia.com>:
      mm/rodata_test.c: fix missing function declaration

Subsystem: romfs

    Jann Horn <jannh@google.com>:
      romfs: fix uninitialized memory leak in romfs_dev_read()

Subsystem: relay

    Wei Yongjun <weiyongjun1@huawei.com>:
      kernel/relay.c: fix memleak on destroy relay channel

Subsystem: uprobes

    Hugh Dickins <hughd@google.com>:
      uprobes: __replace_page() avoid BUG in munlock_vma_page()

Subsystem: squashfs

    Phillip Lougher <phillip@squashfs.org.uk>:
      squashfs: avoid bio_alloc() failure with 1Mbyte blocks

Subsystem: mm/cma

    Doug Berger <opendmb@gmail.com>:
      mm: include CMA pages in lowmem_reserve at boot

Subsystem: mm/pagealloc

    Charan Teja Reddy <charante@codeaurora.org>:
      mm, page_alloc: fix core hung in free_pcppages_bulk()

 .mailmap                |    1 +
 fs/romfs/storage.c      |    4 +---
 fs/squashfs/block.c     |    6 +++++-
 kernel/events/uprobes.c |    2 +-
 kernel/relay.c          |    1 +
 mm/hugetlb_cgroup.c     |    4 ++--
 mm/khugepaged.c         |    2 +-
 mm/page_alloc.c         |    7 ++++++-
 mm/rodata_test.c        |    1 +
 mm/vmalloc.c            |    2 ++
 10 files changed, 21 insertions(+), 9 deletions(-)



* incoming
@ 2020-08-15  0:29 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-08-15  0:29 UTC
  To: Linus Torvalds; +Cc: linux-mm, mm-commits


39 patches, based on b923f1247b72fc100b87792fd2129d026bb10e66.

Subsystems affected by this patch series:

  mm/hotfixes
  lz4
  exec
  mailmap
  mm/thp
  autofs
  mm/madvise
  sysctl
  mm/kmemleak
  mm/misc
  lib

Subsystem: mm/hotfixes

    Mike Rapoport <rppt@linux.ibm.com>:
      asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one()

    Baoquan He <bhe@redhat.com>:
      Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone"

Subsystem: lz4

    Nick Terrell <terrelln@fb.com>:
      lz4: fix kernel decompression speed

Subsystem: exec

    Kees Cook <keescook@chromium.org>:
    Patch series "Fix S_ISDIR execve() errno":
      exec: restore EACCES of S_ISDIR execve()
      selftests/exec: add file type errno tests

Subsystem: mailmap

    Greg Kurz <groug@kaod.org>:
      mailmap: add entry for Greg Kurz

Subsystem: mm/thp

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "THP prep patches":
      mm: store compound_nr as well as compound_order
      mm: move page-flags include to top of file
      mm: add thp_order
      mm: add thp_size
      mm: replace hpage_nr_pages with thp_nr_pages
      mm: add thp_head
      mm: introduce offset_in_thp

Subsystem: autofs

    Randy Dunlap <rdunlap@infradead.org>:
      fs: autofs: delete repeated words in comments

Subsystem: mm/madvise

    Minchan Kim <minchan@kernel.org>:
    Patch series "introduce memory hinting API for external process", v8:
      mm/madvise: pass task and mm to do_madvise
      pid: move pidfd_get_pid() to pid.c
      mm/madvise: introduce process_madvise() syscall: an external memory hinting API
      mm/madvise: check fatal signal pending of target process

Subsystem: sysctl

    Xiaoming Ni <nixiaoming@huawei.com>:
      all arch: remove system call sys_sysctl

Subsystem: mm/kmemleak

    Qian Cai <cai@lca.pw>:
      mm/kmemleak: silence KCSAN splats in checksum

Subsystem: mm/misc

    Qian Cai <cai@lca.pw>:
      mm/frontswap: mark various intentional data races
      mm/page_io: mark various intentional data races
      mm/swap_state: mark various intentional data races

    Kirill A. Shutemov <kirill@shutemov.name>:
      mm/filemap.c: fix a data race in filemap_fault()

    Qian Cai <cai@lca.pw>:
      mm/swapfile: fix and annotate various data races
      mm/page_counter: fix various data races at memsw
      mm/memcontrol: fix a data race in scan count
      mm/list_lru: fix a data race in list_lru_count_one
      mm/mempool: fix a data race in mempool_free()
      mm/rmap: annotate a data race at tlb_flush_batched
      mm/swap.c: annotate data races for lru_rotate_pvecs
      mm: annotate a data race in page_zonenum()

    Romain Naour <romain.naour@gmail.com>:
      include/asm-generic/vmlinux.lds.h: align ro_after_init

    Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>:
      sh: clkfwk: remove r8/r16/r32
      sh: use generic strncpy()

Subsystem: lib

    Krzysztof Kozlowski <krzk@kernel.org>:
    Patch series "iomap: Constify ioreadX() iomem argument", v3:
      iomap: constify ioreadX() iomem argument (as in generic implementation)
      rtl818x: constify ioreadX() iomem argument (as in generic implementation)
      ntb: intel: constify ioreadX() iomem argument (as in generic implementation)
      virtio: pci: constify ioreadX() iomem argument (as in generic implementation)

 .mailmap                                               |    1 
 arch/alpha/include/asm/core_apecs.h                    |    6 
 arch/alpha/include/asm/core_cia.h                      |    6 
 arch/alpha/include/asm/core_lca.h                      |    6 
 arch/alpha/include/asm/core_marvel.h                   |    4 
 arch/alpha/include/asm/core_mcpcia.h                   |    6 
 arch/alpha/include/asm/core_t2.h                       |    2 
 arch/alpha/include/asm/io.h                            |   12 -
 arch/alpha/include/asm/io_trivial.h                    |   16 -
 arch/alpha/include/asm/jensen.h                        |    2 
 arch/alpha/include/asm/machvec.h                       |    6 
 arch/alpha/kernel/core_marvel.c                        |    2 
 arch/alpha/kernel/io.c                                 |   12 -
 arch/alpha/kernel/syscalls/syscall.tbl                 |    3 
 arch/arm/configs/am200epdkit_defconfig                 |    1 
 arch/arm/tools/syscall.tbl                             |    3 
 arch/arm64/include/asm/unistd.h                        |    2 
 arch/arm64/include/asm/unistd32.h                      |    6 
 arch/ia64/kernel/syscalls/syscall.tbl                  |    3 
 arch/m68k/kernel/syscalls/syscall.tbl                  |    3 
 arch/microblaze/kernel/syscalls/syscall.tbl            |    3 
 arch/mips/configs/cu1000-neo_defconfig                 |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl              |    3 
 arch/mips/kernel/syscalls/syscall_n64.tbl              |    3 
 arch/mips/kernel/syscalls/syscall_o32.tbl              |    3 
 arch/parisc/include/asm/io.h                           |    4 
 arch/parisc/kernel/syscalls/syscall.tbl                |    3 
 arch/parisc/lib/iomap.c                                |   72 +++---
 arch/powerpc/kernel/iomap.c                            |   28 +-
 arch/powerpc/kernel/syscalls/syscall.tbl               |    3 
 arch/s390/kernel/syscalls/syscall.tbl                  |    3 
 arch/sh/configs/dreamcast_defconfig                    |    1 
 arch/sh/configs/espt_defconfig                         |    1 
 arch/sh/configs/hp6xx_defconfig                        |    1 
 arch/sh/configs/landisk_defconfig                      |    1 
 arch/sh/configs/lboxre2_defconfig                      |    1 
 arch/sh/configs/microdev_defconfig                     |    1 
 arch/sh/configs/migor_defconfig                        |    1 
 arch/sh/configs/r7780mp_defconfig                      |    1 
 arch/sh/configs/r7785rp_defconfig                      |    1 
 arch/sh/configs/rts7751r2d1_defconfig                  |    1 
 arch/sh/configs/rts7751r2dplus_defconfig               |    1 
 arch/sh/configs/se7206_defconfig                       |    1 
 arch/sh/configs/se7343_defconfig                       |    1 
 arch/sh/configs/se7619_defconfig                       |    1 
 arch/sh/configs/se7705_defconfig                       |    1 
 arch/sh/configs/se7750_defconfig                       |    1 
 arch/sh/configs/se7751_defconfig                       |    1 
 arch/sh/configs/secureedge5410_defconfig               |    1 
 arch/sh/configs/sh03_defconfig                         |    1 
 arch/sh/configs/sh7710voipgw_defconfig                 |    1 
 arch/sh/configs/sh7757lcr_defconfig                    |    1 
 arch/sh/configs/sh7763rdp_defconfig                    |    1 
 arch/sh/configs/shmin_defconfig                        |    1 
 arch/sh/configs/titan_defconfig                        |    1 
 arch/sh/include/asm/string_32.h                        |   26 --
 arch/sh/kernel/iomap.c                                 |   22 -
 arch/sh/kernel/syscalls/syscall.tbl                    |    3 
 arch/sparc/kernel/syscalls/syscall.tbl                 |    3 
 arch/x86/entry/syscalls/syscall_32.tbl                 |    3 
 arch/x86/entry/syscalls/syscall_64.tbl                 |    4 
 arch/xtensa/kernel/syscalls/syscall.tbl                |    3 
 drivers/mailbox/bcm-pdc-mailbox.c                      |    2 
 drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h |    6 
 drivers/ntb/hw/intel/ntb_hw_gen1.c                     |    2 
 drivers/ntb/hw/intel/ntb_hw_gen3.h                     |    2 
 drivers/ntb/hw/intel/ntb_hw_intel.h                    |    2 
 drivers/nvdimm/btt.c                                   |    4 
 drivers/nvdimm/pmem.c                                  |    6 
 drivers/sh/clk/cpg.c                                   |   25 --
 drivers/virtio/virtio_pci_modern.c                     |    6 
 fs/autofs/dev-ioctl.c                                  |    4 
 fs/io_uring.c                                          |    2 
 fs/namei.c                                             |    4 
 include/asm-generic/iomap.h                            |   28 +-
 include/asm-generic/pgalloc.h                          |    2 
 include/asm-generic/vmlinux.lds.h                      |    1 
 include/linux/compat.h                                 |    5 
 include/linux/huge_mm.h                                |   58 ++++-
 include/linux/io-64-nonatomic-hi-lo.h                  |    4 
 include/linux/io-64-nonatomic-lo-hi.h                  |    4 
 include/linux/memcontrol.h                             |    2 
 include/linux/mm.h                                     |   16 -
 include/linux/mm_inline.h                              |    6 
 include/linux/mm_types.h                               |    1 
 include/linux/pagemap.h                                |    6 
 include/linux/pid.h                                    |    1 
 include/linux/syscalls.h                               |    4 
 include/linux/sysctl.h                                 |    6 
 include/uapi/asm-generic/unistd.h                      |    4 
 kernel/Makefile                                        |    2 
 kernel/exit.c                                          |   17 -
 kernel/pid.c                                           |   17 +
 kernel/sys_ni.c                                        |    3 
 kernel/sysctl_binary.c                                 |  171 --------------
 lib/iomap.c                                            |   30 +-
 lib/lz4/lz4_compress.c                                 |    4 
 lib/lz4/lz4_decompress.c                               |   18 -
 lib/lz4/lz4defs.h                                      |   10 
 lib/lz4/lz4hc_compress.c                               |    2 
 mm/compaction.c                                        |    2 
 mm/filemap.c                                           |   22 +
 mm/frontswap.c                                         |    8 
 mm/gup.c                                               |    2 
 mm/internal.h                                          |    4 
 mm/kmemleak.c                                          |    2 
 mm/list_lru.c                                          |    2 
 mm/madvise.c                                           |  190 ++++++++++++++--
 mm/memcontrol.c                                        |   10 
 mm/memory.c                                            |    4 
 mm/memory_hotplug.c                                    |    7 
 mm/mempolicy.c                                         |    2 
 mm/mempool.c                                           |    2 
 mm/migrate.c                                           |   18 -
 mm/mlock.c                                             |    9 
 mm/page_alloc.c                                        |    5 
 mm/page_counter.c                                      |   13 -
 mm/page_io.c                                           |   12 -
 mm/page_vma_mapped.c                                   |    6 
 mm/rmap.c                                              |   10 
 mm/swap.c                                              |   21 -
 mm/swap_state.c                                        |   10 
 mm/swapfile.c                                          |   33 +-
 mm/vmscan.c                                            |    6 
 mm/vmstat.c                                            |   12 -
 mm/workingset.c                                        |    6 
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl     |    2 
 tools/perf/arch/s390/entry/syscalls/syscall.tbl        |    2 
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl      |    2 
 tools/testing/selftests/exec/.gitignore                |    1 
 tools/testing/selftests/exec/Makefile                  |    5 
 tools/testing/selftests/exec/non-regular.c             |  196 +++++++++++++++++
 132 files changed, 815 insertions(+), 614 deletions(-)



* incoming
@ 2020-08-12  1:29 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-08-12  1:29 UTC
  To: Linus Torvalds; +Cc: mm-commits, linux-mm


- Most of the rest of MM

- various other subsystems


165 patches, based on 00e4db51259a5f936fec1424b884f029479d3981.

Subsystems affected by this patch series:

  mm/memcg
  mm/hugetlb
  mm/vmscan
  mm/proc
  mm/compaction
  mm/mempolicy
  mm/oom-kill
  mm/hugetlbfs
  mm/migration
  mm/thp
  mm/cma
  mm/util
  mm/memory-hotplug
  mm/cleanups
  mm/uaccess
  alpha
  misc
  sparse
  bitmap
  lib
  lz4
  bitops
  checkpatch
  autofs
  minix
  nilfs
  ufs
  fat
  signals
  kmod
  coredump
  exec
  kdump
  rapidio
  panic
  kcov
  kgdb
  ipc
  mm/migration
  mm/gup
  mm/pagemap

Subsystem: mm/memcg

    Roman Gushchin <guro@fb.com>:
    Patch series "mm: memcg accounting of percpu memory", v3:
      percpu: return number of released bytes from pcpu_free_area()
      mm: memcg/percpu: account percpu memory to memory cgroups
      mm: memcg/percpu: per-memcg percpu memory statistics
      mm: memcg: charge memcg percpu memory to the parent cgroup
      kselftests: cgroup: add perpcu memory accounting test

Subsystem: mm/hugetlb

    Muchun Song <songmuchun@bytedance.com>:
      mm/hugetlb: add mempolicy check in the reservation routine

Subsystem: mm/vmscan

    Joonsoo Kim <iamjoonsoo.kim@lge.com>:
    Patch series "workingset protection/detection on the anonymous LRU list", v7:
      mm/vmscan: make active/inactive ratio as 1:1 for anon lru
      mm/vmscan: protect the workingset on anonymous LRU
      mm/workingset: prepare the workingset detection infrastructure for anon LRU
      mm/swapcache: support to handle the shadow entries
      mm/swap: implement workingset detection for anonymous LRU
      mm/vmscan: restore active/inactive ratio for anonymous LRU

Subsystem: mm/proc

    Michal Koutný <mkoutny@suse.com>:
      /proc/PID/smaps: consistent whitespace output format

Subsystem: mm/compaction

    Nitin Gupta <nigupta@nvidia.com>:
      mm: proactive compaction
      mm: fix compile error due to COMPACTION_HPAGE_ORDER
      mm: use unsigned types for fragmentation score

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/compaction: correct the comments of compact_defer_shift

Subsystem: mm/mempolicy

    Krzysztof Kozlowski <krzk@kernel.org>:
      mm: mempolicy: fix kerneldoc of numa_map_to_online_node()

    Wenchao Hao <haowenchao22@gmail.com>:
      mm/mempolicy.c: check parameters first in kernel_get_mempolicy

    Yanfei Xu <yanfei.xu@windriver.com>:
      include/linux/mempolicy.h: fix typo

Subsystem: mm/oom-kill

    Yafang Shao <laoar.shao@gmail.com>:
      mm, oom: make the calculation of oom badness more accurate

    Michal Hocko <mhocko@suse.com>:
      doc, mm: sync up oom_score_adj documentation
      doc, mm: clarify /proc/<pid>/oom_score value range

    Yafang Shao <laoar.shao@gmail.com>:
      mm, oom: show process exiting information in __oom_kill_process()

Subsystem: mm/hugetlbfs

    Mike Kravetz <mike.kravetz@oracle.com>:
      hugetlbfs: prevent filesystem stacking of hugetlbfs
      hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem

Subsystem: mm/migration

    Ralph Campbell <rcampbell@nvidia.com>:
    Patch series "mm/migrate: optimize migrate_vma_setup() for holes":
      mm/migrate: optimize migrate_vma_setup() for holes
      mm/migrate: add migrate-shared test for migrate_vma_*()

Subsystem: mm/thp

    Yang Shi <yang.shi@linux.alibaba.com>:
      mm: thp: remove debug_cow switch

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm/vmstat: add events for THP migration without split

Subsystem: mm/cma

    Jianqun Xu <jay.xu@rock-chips.com>:
      mm/cma.c: fix NULL pointer dereference when cma could not be activated

    Barry Song <song.bao.hua@hisilicon.com>:
    Patch series "mm: fix the names of general cma and hugetlb cma", v2:
      mm: cma: fix the name of CMA areas
      mm: hugetlb: fix the name of hugetlb CMA

    Mike Kravetz <mike.kravetz@oracle.com>:
      cma: don't quit at first error when activating reserved areas

Subsystem: mm/util

    Waiman Long <longman@redhat.com>:
      include/linux/sched/mm.h: optimize current_gfp_context()

    Krzysztof Kozlowski <krzk@kernel.org>:
      mm: mmu_notifier: fix and extend kerneldoc

Subsystem: mm/memory-hotplug

    Daniel Jordan <daniel.m.jordan@oracle.com>:
      x86/mm: use max memory block size on bare metal

    Jia He <justin.he@arm.com>:
      mm/memory_hotplug: introduce default dummy memory_add_physaddr_to_nid()
      mm/memory_hotplug: fix unpaired mem_hotplug_begin/done

    Charan Teja Reddy <charante@codeaurora.org>:
      mm, memory_hotplug: update pcp lists everytime onlining a memory block

Subsystem: mm/cleanups

    Randy Dunlap <rdunlap@infradead.org>:
      mm: drop duplicated words in <linux/pgtable.h>
      mm: drop duplicated words in <linux/mm.h>
      include/linux/highmem.h: fix duplicated words in a comment
      include/linux/frontswap.h: drop duplicated word in a comment
      include/linux/memcontrol.h: drop duplicate word and fix spello

    Arvind Sankar <nivedita@alum.mit.edu>:
      sh/mm: drop unused MAX_PHYSADDR_BITS
      sparc: drop unused MAX_PHYSADDR_BITS

    Randy Dunlap <rdunlap@infradead.org>:
      mm/compaction.c: delete duplicated word
      mm/filemap.c: delete duplicated word
      mm/hmm.c: delete duplicated word
      mm/hugetlb.c: delete duplicated words
      mm/memcontrol.c: delete duplicated words
      mm/memory.c: delete duplicated words
      mm/migrate.c: delete duplicated word
      mm/nommu.c: delete duplicated words
      mm/page_alloc.c: delete or fix duplicated words
      mm/shmem.c: delete duplicated word
      mm/slab_common.c: delete duplicated word
      mm/usercopy.c: delete duplicated word
      mm/vmscan.c: delete or fix duplicated words
      mm/zpool.c: delete duplicated word and fix grammar
      mm/zsmalloc.c: fix duplicated words

Subsystem: mm/uaccess

    Christoph Hellwig <hch@lst.de>:
    Patch series "clean up address limit helpers", v2:
      syscalls: use uaccess_kernel in addr_limit_user_check
      nds32: use uaccess_kernel in show_regs
      riscv: include <asm/pgtable.h> in <asm/uaccess.h>
      uaccess: remove segment_eq
      uaccess: add force_uaccess_{begin,end} helpers
      exec: use force_uaccess_begin during exec and exit

Subsystem: alpha

    Luc Van Oostenryck <luc.vanoostenryck@gmail.com>:
      alpha: fix annotation of io{read,write}{16,32}be()

Subsystem: misc

    Randy Dunlap <rdunlap@infradead.org>:
      include/linux/compiler-clang.h: drop duplicated word in a comment
      include/linux/exportfs.h: drop duplicated word in a comment
      include/linux/async_tx.h: drop duplicated word in a comment
      include/linux/xz.h: drop duplicated word

    Christoph Hellwig <hch@lst.de>:
      kernel: add a kernel_wait helper

    Feng Tang <feng.tang@intel.com>:
      ./Makefile: add debug option to enable function aligned on 32 bytes

    Arvind Sankar <nivedita@alum.mit.edu>:
      kernel.h: remove duplicate include of asm/div64.h

    "Alexander A. Klimov" <grandmaster@al2klimov.de>:
      include/: replace HTTP links with HTTPS ones

    Matthew Wilcox <willy@infradead.org>:
      include/linux/poison.h: remove obsolete comment

Subsystem: sparse

    Luc Van Oostenryck <luc.vanoostenryck@gmail.com>:
      sparse: group the defines by functionality

Subsystem: bitmap

    Stefano Brivio <sbrivio@redhat.com>:
    Patch series "lib: Fix bitmap_cut() for overlaps, add test":
      lib/bitmap.c: fix bitmap_cut() for partial overlapping case
      lib/test_bitmap.c: add test for bitmap_cut()

Subsystem: lib

    Luc Van Oostenryck <luc.vanoostenryck@gmail.com>:
      lib/generic-radix-tree.c: remove unneeded __rcu

    Geert Uytterhoeven <geert@linux-m68k.org>:
      lib/test_bitops: do the full test during module init

    Wei Yongjun <weiyongjun1@huawei.com>:
      lib/test_lockup.c: make symbol 'test_works' static

    Tiezhu Yang <yangtiezhu@loongson.cn>:
      lib/Kconfig.debug: make TEST_LOCKUP depend on module
      lib/test_lockup.c: fix return value of test_lockup_init()

    "Alexander A. Klimov" <grandmaster@al2klimov.de>:
      lib/: replace HTTP links with HTTPS ones

    "Kars Mulder" <kerneldev@karsmulder.nl>:
      kstrto*: correct documentation references to simple_strto*()
      kstrto*: do not describe simple_strto*() as obsolete/replaced

Subsystem: lz4

    Nick Terrell <terrelln@fb.com>:
      lz4: fix kernel decompression speed

Subsystem: bitops

    Rikard Falkeborn <rikard.falkeborn@gmail.com>:
      lib/test_bits.c: add tests of GENMASK

Subsystem: checkpatch

    Joe Perches <joe@perches.com>:
      checkpatch: add test for possible misuse of IS_ENABLED() without CONFIG_
      checkpatch: add --fix option for ASSIGN_IN_IF

    Quentin Monnet <quentin@isovalent.com>:
      checkpatch: fix CONST_STRUCT when const_structs.checkpatch is missing

    Joe Perches <joe@perches.com>:
      checkpatch: add test for repeated words
      checkpatch: remove missing switch/case break test

Subsystem: autofs

    Randy Dunlap <rdunlap@infradead.org>:
      autofs: fix doubled word

Subsystem: minix

    Eric Biggers <ebiggers@google.com>:
    Patch series "fs/minix: fix syzbot bugs and set s_maxbytes":
      fs/minix: check return value of sb_getblk()
      fs/minix: don't allow getting deleted inodes
      fs/minix: reject too-large maximum file size
      fs/minix: set s_maxbytes correctly
      fs/minix: fix block limit check for V1 filesystems
      fs/minix: remove expected error message in block_to_path()

Subsystem: nilfs

    Eric Biggers <ebiggers@google.com>:
    Patch series "nilfs2 updates":
      nilfs2: only call unlock_new_inode() if I_NEW

    Joe Perches <joe@perches.com>:
      nilfs2: convert __nilfs_msg to integrate the level and format
      nilfs2: use a more common logging style

Subsystem: ufs

    Colin Ian King <colin.king@canonical.com>:
      fs/ufs: avoid potential u32 multiplication overflow

Subsystem: fat

    Yubo Feng <fengyubo3@huawei.com>:
      fatfs: switch write_lock to read_lock in fat_ioctl_get_attributes

    "Alexander A. Klimov" <grandmaster@al2klimov.de>:
      VFAT/FAT/MSDOS FILESYSTEM: replace HTTP links with HTTPS ones

    OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
      fat: fix fat_ra_init() for data clusters == 0

Subsystem: signals

    Helge Deller <deller@gmx.de>:
      fs/signalfd.c: fix inconsistent return codes for signalfd4

Subsystem: kmod

    Tiezhu Yang <yangtiezhu@loongson.cn>:
    Patch series "kmod/umh: a few fixes":
      selftests: kmod: use variable NAME in kmod_test_0001()
      kmod: remove redundant "be an" in the comment
      test_kmod: avoid potential double free in trigger_config_run_type()

Subsystem: coredump

    Lepton Wu <ytht.net@gmail.com>:
      coredump: add %f for executable filename

Subsystem: exec

    Kees Cook <keescook@chromium.org>:
    Patch series "Relocate execve() sanity checks", v2:
      exec: change uselib(2) IS_SREG() failure to EACCES
      exec: move S_ISREG() check earlier
      exec: move path_noexec() check earlier

Subsystem: kdump

    Vijay Balakrishna <vijayb@linux.microsoft.com>:
      kdump: append kernel build-id string to VMCOREINFO

Subsystem: rapidio

    "Gustavo A. R. Silva" <gustavoars@kernel.org>:
      drivers/rapidio/devices/rio_mport_cdev.c: use struct_size() helper
      drivers/rapidio/rio-scan.c: use struct_size() helper
      rapidio/rio_mport_cdev: use array_size() helper in copy_{from,to}_user()

Subsystem: panic

    Tiezhu Yang <yangtiezhu@loongson.cn>:
      kernel/panic.c: make oops_may_print() return bool
      lib/Kconfig.debug: fix typo in the help text of CONFIG_PANIC_TIMEOUT

    Yue Hu <huyue2@yulong.com>:
      panic: make print_oops_end_marker() static

Subsystem: kcov

    Marco Elver <elver@google.com>:
      kcov: unconditionally add -fno-stack-protector to compiler options

    Wei Yongjun <weiyongjun1@huawei.com>:
      kcov: make some symbols static

Subsystem: kgdb

    Nick Desaulniers <ndesaulniers@google.com>:
      scripts/gdb: fix python 3.8 SyntaxWarning

Subsystem: ipc

    Alexey Dobriyan <adobriyan@gmail.com>:
      ipc: uninline functions

    Liao Pingfang <liao.pingfang@zte.com.cn>:
      ipc/shm.c: remove the superfluous break

Subsystem: mm/migration

    Joonsoo Kim <iamjoonsoo.kim@lge.com>:
    Patch series "clean-up the migration target allocation functions", v5:
      mm/page_isolation: prefer the node of the source page
      mm/migrate: move migration helper from .h to .c
      mm/hugetlb: unify migration callbacks
      mm/migrate: clear __GFP_RECLAIM to make the migration callback consistent with regular THP allocations
      mm/migrate: introduce a standard migration target allocation function
      mm/mempolicy: use a standard migration target allocation callback
      mm/page_alloc: remove a wrapper for alloc_migration_target()

Subsystem: mm/gup

    Joonsoo Kim <iamjoonsoo.kim@lge.com>:
      mm/gup: restrict CMA region by using allocation scope API
      mm/hugetlb: make hugetlb migration callback CMA aware
      mm/gup: use a standard migration target allocation callback

Subsystem: mm/pagemap

    Peter Xu <peterx@redhat.com>:
    Patch series "mm: Page fault accounting cleanups", v5:
      mm: do page fault accounting in handle_mm_fault
      mm/alpha: use general page fault accounting
      mm/arc: use general page fault accounting
      mm/arm: use general page fault accounting
      mm/arm64: use general page fault accounting
      mm/csky: use general page fault accounting
      mm/hexagon: use general page fault accounting
      mm/ia64: use general page fault accounting
      mm/m68k: use general page fault accounting
      mm/microblaze: use general page fault accounting
      mm/mips: use general page fault accounting
      mm/nds32: use general page fault accounting
      mm/nios2: use general page fault accounting
      mm/openrisc: use general page fault accounting
      mm/parisc: use general page fault accounting
      mm/powerpc: use general page fault accounting
      mm/riscv: use general page fault accounting
      mm/s390: use general page fault accounting
      mm/sh: use general page fault accounting
      mm/sparc32: use general page fault accounting
      mm/sparc64: use general page fault accounting
      mm/x86: use general page fault accounting
      mm/xtensa: use general page fault accounting
      mm: clean up the last pieces of page fault accountings
      mm/gup: remove task_struct pointer for all gup code

 Documentation/admin-guide/cgroup-v2.rst         |    4 
 Documentation/admin-guide/sysctl/kernel.rst     |    3 
 Documentation/admin-guide/sysctl/vm.rst         |   15 +
 Documentation/filesystems/proc.rst              |   11 -
 Documentation/vm/page_migration.rst             |   27 +++
 Makefile                                        |    4 
 arch/alpha/include/asm/io.h                     |    8 
 arch/alpha/include/asm/uaccess.h                |    2 
 arch/alpha/mm/fault.c                           |   10 -
 arch/arc/include/asm/segment.h                  |    3 
 arch/arc/kernel/process.c                       |    2 
 arch/arc/mm/fault.c                             |   20 --
 arch/arm/include/asm/uaccess.h                  |    4 
 arch/arm/kernel/signal.c                        |    2 
 arch/arm/mm/fault.c                             |   27 ---
 arch/arm64/include/asm/uaccess.h                |    2 
 arch/arm64/kernel/sdei.c                        |    2 
 arch/arm64/mm/fault.c                           |   31 ---
 arch/arm64/mm/numa.c                            |   10 -
 arch/csky/include/asm/segment.h                 |    2 
 arch/csky/mm/fault.c                            |   15 -
 arch/h8300/include/asm/segment.h                |    2 
 arch/hexagon/mm/vm_fault.c                      |   11 -
 arch/ia64/include/asm/uaccess.h                 |    2 
 arch/ia64/mm/fault.c                            |   11 -
 arch/ia64/mm/numa.c                             |    2 
 arch/m68k/include/asm/segment.h                 |    2 
 arch/m68k/include/asm/tlbflush.h                |    6 
 arch/m68k/mm/fault.c                            |   16 -
 arch/microblaze/include/asm/uaccess.h           |    2 
 arch/microblaze/mm/fault.c                      |   11 -
 arch/mips/include/asm/uaccess.h                 |    2 
 arch/mips/kernel/unaligned.c                    |   27 +--
 arch/mips/mm/fault.c                            |   16 -
 arch/nds32/include/asm/uaccess.h                |    2 
 arch/nds32/kernel/process.c                     |    2 
 arch/nds32/mm/alignment.c                       |    7 
 arch/nds32/mm/fault.c                           |   21 --
 arch/nios2/include/asm/uaccess.h                |    2 
 arch/nios2/mm/fault.c                           |   16 -
 arch/openrisc/include/asm/uaccess.h             |    2 
 arch/openrisc/mm/fault.c                        |   11 -
 arch/parisc/include/asm/uaccess.h               |    2 
 arch/parisc/mm/fault.c                          |   10 -
 arch/powerpc/include/asm/uaccess.h              |    3 
 arch/powerpc/mm/copro_fault.c                   |    7 
 arch/powerpc/mm/fault.c                         |   13 -
 arch/riscv/include/asm/uaccess.h                |    6 
 arch/riscv/mm/fault.c                           |   18 --
 arch/s390/include/asm/uaccess.h                 |    2 
 arch/s390/kvm/interrupt.c                       |    2 
 arch/s390/kvm/kvm-s390.c                        |    2 
 arch/s390/kvm/priv.c                            |    8 
 arch/s390/mm/fault.c                            |   18 --
 arch/s390/mm/gmap.c                             |    4 
 arch/sh/include/asm/segment.h                   |    3 
 arch/sh/include/asm/sparsemem.h                 |    4 
 arch/sh/kernel/traps_32.c                       |   12 -
 arch/sh/mm/fault.c                              |   13 -
 arch/sh/mm/init.c                               |    9 -
 arch/sparc/include/asm/sparsemem.h              |    1 
 arch/sparc/include/asm/uaccess_32.h             |    2 
 arch/sparc/include/asm/uaccess_64.h             |    2 
 arch/sparc/mm/fault_32.c                        |   15 -
 arch/sparc/mm/fault_64.c                        |   13 -
 arch/um/kernel/trap.c                           |    6 
 arch/x86/include/asm/uaccess.h                  |    2 
 arch/x86/mm/fault.c                             |   19 --
 arch/x86/mm/init_64.c                           |    9 +
 arch/x86/mm/numa.c                              |    1 
 arch/xtensa/include/asm/uaccess.h               |    2 
 arch/xtensa/mm/fault.c                          |   17 -
 drivers/firmware/arm_sdei.c                     |    5 
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c     |    2 
 drivers/infiniband/core/umem_odp.c              |    2 
 drivers/iommu/amd/iommu_v2.c                    |    2 
 drivers/iommu/intel/svm.c                       |    3 
 drivers/rapidio/devices/rio_mport_cdev.c        |    7 
 drivers/rapidio/rio-scan.c                      |    8 
 drivers/vfio/vfio_iommu_type1.c                 |    4 
 fs/coredump.c                                   |   17 +
 fs/exec.c                                       |   38 ++--
 fs/fat/Kconfig                                  |    2 
 fs/fat/fatent.c                                 |    3 
 fs/fat/file.c                                   |    4 
 fs/hugetlbfs/inode.c                            |    6 
 fs/minix/inode.c                                |   48 ++++-
 fs/minix/itree_common.c                         |    8 
 fs/minix/itree_v1.c                             |   16 -
 fs/minix/itree_v2.c                             |   15 -
 fs/minix/minix.h                                |    1 
 fs/namei.c                                      |   10 -
 fs/nilfs2/alloc.c                               |   38 ++--
 fs/nilfs2/btree.c                               |   42 ++--
 fs/nilfs2/cpfile.c                              |   10 -
 fs/nilfs2/dat.c                                 |   14 -
 fs/nilfs2/direct.c                              |   14 -
 fs/nilfs2/gcinode.c                             |    2 
 fs/nilfs2/ifile.c                               |    4 
 fs/nilfs2/inode.c                               |   32 +--
 fs/nilfs2/ioctl.c                               |   37 ++--
 fs/nilfs2/mdt.c                                 |    2 
 fs/nilfs2/namei.c                               |    6 
 fs/nilfs2/nilfs.h                               |   18 +-
 fs/nilfs2/page.c                                |   11 -
 fs/nilfs2/recovery.c                            |   32 +--
 fs/nilfs2/segbuf.c                              |    2 
 fs/nilfs2/segment.c                             |   38 ++--
 fs/nilfs2/sufile.c                              |   29 +--
 fs/nilfs2/super.c                               |   73 ++++----
 fs/nilfs2/sysfs.c                               |   29 +--
 fs/nilfs2/the_nilfs.c                           |   85 ++++-----
 fs/open.c                                       |    6 
 fs/proc/base.c                                  |   11 +
 fs/proc/task_mmu.c                              |    4 
 fs/signalfd.c                                   |   10 -
 fs/ufs/super.c                                  |    2 
 include/asm-generic/uaccess.h                   |    4 
 include/clocksource/timer-ti-dm.h               |    2 
 include/linux/async_tx.h                        |    2 
 include/linux/btree.h                           |    2 
 include/linux/compaction.h                      |    6 
 include/linux/compiler-clang.h                  |    2 
 include/linux/compiler_types.h                  |   44 ++---
 include/linux/crash_core.h                      |    6 
 include/linux/delay.h                           |    2 
 include/linux/dma/k3-psil.h                     |    2 
 include/linux/dma/k3-udma-glue.h                |    2 
 include/linux/dma/ti-cppi5.h                    |    2 
 include/linux/exportfs.h                        |    2 
 include/linux/frontswap.h                       |    2 
 include/linux/fs.h                              |   10 +
 include/linux/generic-radix-tree.h              |    2 
 include/linux/highmem.h                         |    2 
 include/linux/huge_mm.h                         |    7 
 include/linux/hugetlb.h                         |   53 ++++--
 include/linux/irqchip/irq-omap-intc.h           |    2 
 include/linux/jhash.h                           |    2 
 include/linux/kernel.h                          |   12 -
 include/linux/leds-ti-lmu-common.h              |    2 
 include/linux/memcontrol.h                      |   12 +
 include/linux/mempolicy.h                       |   18 +-
 include/linux/migrate.h                         |   42 +---
 include/linux/mm.h                              |   20 +-
 include/linux/mmzone.h                          |   17 +
 include/linux/oom.h                             |    4 
 include/linux/pgtable.h                         |   12 -
 include/linux/platform_data/davinci-cpufreq.h   |    2 
 include/linux/platform_data/davinci_asp.h       |    2 
 include/linux/platform_data/elm.h               |    2 
 include/linux/platform_data/gpio-davinci.h      |    2 
 include/linux/platform_data/gpmc-omap.h         |    2 
 include/linux/platform_data/mtd-davinci-aemif.h |    2 
 include/linux/platform_data/omap-twl4030.h      |    2 
 include/linux/platform_data/uio_pruss.h         |    2 
 include/linux/platform_data/usb-omap.h          |    2 
 include/linux/poison.h                          |    4 
 include/linux/sched/mm.h                        |    8 
 include/linux/sched/task.h                      |    1 
 include/linux/soc/ti/k3-ringacc.h               |    2 
 include/linux/soc/ti/knav_qmss.h                |    2 
 include/linux/soc/ti/ti-msgmgr.h                |    2 
 include/linux/swap.h                            |   25 ++
 include/linux/syscalls.h                        |    2 
 include/linux/uaccess.h                         |   20 ++
 include/linux/vm_event_item.h                   |    3 
 include/linux/wkup_m3_ipc.h                     |    2 
 include/linux/xxhash.h                          |    2 
 include/linux/xz.h                              |    4 
 include/linux/zlib.h                            |    2 
 include/soc/arc/aux.h                           |    2 
 include/trace/events/migrate.h                  |   17 +
 include/uapi/linux/auto_dev-ioctl.h             |    2 
 include/uapi/linux/elf.h                        |    2 
 include/uapi/linux/map_to_7segment.h            |    2 
 include/uapi/linux/types.h                      |    2 
 include/uapi/linux/usb/ch9.h                    |    2 
 ipc/sem.c                                       |    3 
 ipc/shm.c                                       |    4 
 kernel/Makefile                                 |    2 
 kernel/crash_core.c                             |   50 +++++
 kernel/events/callchain.c                       |    5 
 kernel/events/core.c                            |    5 
 kernel/events/uprobes.c                         |    8 
 kernel/exit.c                                   |   18 +-
 kernel/futex.c                                  |    2 
 kernel/kcov.c                                   |    6 
 kernel/kmod.c                                   |    5 
 kernel/kthread.c                                |    5 
 kernel/panic.c                                  |    4 
 kernel/stacktrace.c                             |    5 
 kernel/sysctl.c                                 |   11 +
 kernel/umh.c                                    |   29 ---
 lib/Kconfig.debug                               |   27 ++-
 lib/Makefile                                    |    1 
 lib/bitmap.c                                    |    4 
 lib/crc64.c                                     |    2 
 lib/decompress_bunzip2.c                        |    2 
 lib/decompress_unlzma.c                         |    6 
 lib/kstrtox.c                                   |   20 --
 lib/lz4/lz4_compress.c                          |    4 
 lib/lz4/lz4_decompress.c                        |   18 +-
 lib/lz4/lz4defs.h                               |   10 +
 lib/lz4/lz4hc_compress.c                        |    2 
 lib/math/rational.c                             |    2 
 lib/rbtree.c                                    |    2 
 lib/test_bitmap.c                               |   58 ++++++
 lib/test_bitops.c                               |   18 +-
 lib/test_bits.c                                 |   75 ++++++++
 lib/test_kmod.c                                 |    2 
 lib/test_lockup.c                               |    6 
 lib/ts_bm.c                                     |    2 
 lib/xxhash.c                                    |    2 
 lib/xz/xz_crc32.c                               |    2 
 lib/xz/xz_dec_bcj.c                             |    2 
 lib/xz/xz_dec_lzma2.c                           |    2 
 lib/xz/xz_lzma2.h                               |    2 
 lib/xz/xz_stream.h                              |    2 
 mm/cma.c                                        |   40 +---
 mm/cma.h                                        |    4 
 mm/compaction.c                                 |  207 +++++++++++++++++++++--
 mm/filemap.c                                    |    2 
 mm/gup.c                                        |  195 ++++++----------------
 mm/hmm.c                                        |    5 
 mm/huge_memory.c                                |   23 --
 mm/hugetlb.c                                    |   93 ++++------
 mm/internal.h                                   |    9 -
 mm/khugepaged.c                                 |    2 
 mm/ksm.c                                        |    3 
 mm/maccess.c                                    |   22 +-
 mm/memcontrol.c                                 |   42 +++-
 mm/memory-failure.c                             |    7 
 mm/memory.c                                     |  107 +++++++++---
 mm/memory_hotplug.c                             |   30 ++-
 mm/mempolicy.c                                  |   49 +----
 mm/migrate.c                                    |  151 ++++++++++++++---
 mm/mmu_notifier.c                               |    9 -
 mm/nommu.c                                      |    4 
 mm/oom_kill.c                                   |   24 +-
 mm/page_alloc.c                                 |   14 +
 mm/page_isolation.c                             |   21 --
 mm/percpu-internal.h                            |   55 ++++++
 mm/percpu-km.c                                  |    5 
 mm/percpu-stats.c                               |   36 ++--
 mm/percpu-vm.c                                  |    5 
 mm/percpu.c                                     |  208 +++++++++++++++++++++---
 mm/process_vm_access.c                          |    2 
 mm/rmap.c                                       |    2 
 mm/shmem.c                                      |    5 
 mm/slab_common.c                                |    2 
 mm/swap.c                                       |   13 -
 mm/swap_state.c                                 |   80 +++++++--
 mm/swapfile.c                                   |    4 
 mm/usercopy.c                                   |    2 
 mm/userfaultfd.c                                |    2 
 mm/vmscan.c                                     |   36 ++--
 mm/vmstat.c                                     |   32 +++
 mm/workingset.c                                 |   23 +-
 mm/zpool.c                                      |    8 
 mm/zsmalloc.c                                   |    2 
 scripts/checkpatch.pl                           |  116 +++++++++----
 scripts/gdb/linux/rbtree.py                     |    4 
 security/tomoyo/domain.c                        |    2 
 tools/testing/selftests/cgroup/test_kmem.c      |   70 +++++++-
 tools/testing/selftests/kmod/kmod.sh            |    4 
 tools/testing/selftests/vm/hmm-tests.c          |   35 ++++
 virt/kvm/async_pf.c                             |    2 
 virt/kvm/kvm_main.c                             |    2 
 268 files changed, 2481 insertions(+), 1551 deletions(-)


* incoming
@ 2020-08-07  6:16 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-08-07  6:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm


- A few MM hotfixes

- kthread, tools, scripts, ntfs and ocfs2

- Some of MM



163 patches, based on d6efb3ac3e6c19ab722b28bdb9252bae0b9676b6.

Subsystems affected by this patch series:

  mm/pagemap
  mm/hotfixes
  mm/pagealloc
  kthread
  tools
  scripts
  ntfs
  ocfs2
  mm/slab-generic
  mm/slab
  mm/slub
  mm/kcsan
  mm/debug
  mm/pagecache
  mm/gup
  mm/swap
  mm/shmem
  mm/memcg
  mm/pagemap
  mm/mremap
  mm/mincore
  mm/sparsemem
  mm/vmalloc
  mm/kasan
  mm/pagealloc
  mm/hugetlb
  mm/vmscan

Subsystem: mm/pagemap

    Yang Shi <yang.shi@linux.alibaba.com>:
      mm/memory.c: avoid access flag update TLB flush for retried page fault

Subsystem: mm/hotfixes

    Ralph Campbell <rcampbell@nvidia.com>:
      mm/migrate: fix migrate_pgmap_owner w/o CONFIG_MMU_NOTIFIER

Subsystem: mm/pagealloc

    David Hildenbrand <david@redhat.com>:
      mm/shuffle: don't move pages between zones and don't read garbage memmaps

Subsystem: kthread

    Peter Zijlstra <peterz@infradead.org>:
      mm: fix kthread_use_mm() vs TLB invalidate

    Ilias Stamatis <stamatis.iliass@gmail.com>:
      kthread: remove incorrect comment in kthread_create_on_cpu()

Subsystem: tools

    "Alexander A. Klimov" <grandmaster@al2klimov.de>:
      tools/: replace HTTP links with HTTPS ones

    Gaurav Singh <gaurav1086@gmail.com>:
      tools/testing/selftests/cgroup/cgroup_util.c: cg_read_strcmp: fix null pointer dereference

Subsystem: scripts

    Jialu Xu <xujialu@vimux.org>:
      scripts/tags.sh: collect compiled source precisely

    Nikolay Borisov <nborisov@suse.com>:
      scripts/bloat-o-meter: Support comparing library archives

    Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
      scripts/decode_stacktrace.sh: skip missing symbols
      scripts/decode_stacktrace.sh: guess basepath if not specified
      scripts/decode_stacktrace.sh: guess path to modules
      scripts/decode_stacktrace.sh: guess path to vmlinux by release name

    Joe Perches <joe@perches.com>:
      const_structs.checkpatch: add regulator_ops

    Colin Ian King <colin.king@canonical.com>:
      scripts/spelling.txt: add more spellings to spelling.txt

Subsystem: ntfs

    Luca Stefani <luca.stefani.ge1@gmail.com>:
      ntfs: fix ntfs_test_inode and ntfs_init_locked_inode function type

Subsystem: ocfs2

    Gang He <ghe@suse.com>:
      ocfs2: fix remounting needed after setfacl command

    Randy Dunlap <rdunlap@infradead.org>:
      ocfs2: suballoc.h: delete a duplicated word

    Junxiao Bi <junxiao.bi@oracle.com>:
      ocfs2: change slot number type s16 to u16

    "Alexander A. Klimov" <grandmaster@al2klimov.de>:
      ocfs2: replace HTTP links with HTTPS ones

    Pavel Machek <pavel@ucw.cz>:
      ocfs2: fix unbalanced locking

Subsystem: mm/slab-generic

    Waiman Long <longman@redhat.com>:
      mm, treewide: rename kzfree() to kfree_sensitive()

    William Kucharski <william.kucharski@oracle.com>:
      mm: ksize() should silently accept a NULL pointer

Subsystem: mm/slab

    Kees Cook <keescook@chromium.org>:
    Patch series "mm: Expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB":
      mm/slab: expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB
      mm/slab: add naive detection of double free

    Long Li <lonuxli.64@gmail.com>:
      mm, slab: check GFP_SLAB_BUG_MASK before alloc_pages in kmalloc_order

    Xiao Yang <yangx.jy@cn.fujitsu.com>:
      mm/slab.c: update outdated kmem_list3 in a comment

Subsystem: mm/slub

    Vlastimil Babka <vbabka@suse.cz>:
    Patch series "slub_debug fixes and improvements":
      mm, slub: extend slub_debug syntax for multiple blocks
      mm, slub: make some slub_debug related attributes read-only
      mm, slub: remove runtime allocation order changes
      mm, slub: make remaining slub_debug related attributes read-only
      mm, slub: make reclaim_account attribute read-only
      mm, slub: introduce static key for slub_debug()
      mm, slub: introduce kmem_cache_debug_flags()
      mm, slub: extend checks guarded by slub_debug static key
      mm, slab/slub: move and improve cache_from_obj()
      mm, slab/slub: improve error reporting and overhead of cache_from_obj()

    Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
      mm/slub.c: drop lockdep_assert_held() from put_map()

Subsystem: mm/kcsan

    Marco Elver <elver@google.com>:
      mm, kcsan: instrument SLAB/SLUB free with "ASSERT_EXCLUSIVE_ACCESS"

Subsystem: mm/debug

    Anshuman Khandual <anshuman.khandual@arm.com>:
    Patch series "mm/debug_vm_pgtable: Add some more tests", v5:
      mm/debug_vm_pgtable: add tests validating arch helpers for core MM features
      mm/debug_vm_pgtable: add tests validating advanced arch page table helpers
      mm/debug_vm_pgtable: add debug prints for individual tests
      Documentation/mm: add descriptions for arch page table helpers

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "Improvements for dump_page()", v2:
      mm/debug: handle page->mapping better in dump_page
      mm/debug: dump compound page information on a second line
      mm/debug: print head flags in dump_page
      mm/debug: switch dump_page to get_kernel_nofault
      mm/debug: print the inode number in dump_page
      mm/debug: print hashed address of struct page

    John Hubbard <jhubbard@nvidia.com>:
      mm, dump_page: do not crash with bad compound_mapcount()

Subsystem: mm/pagecache

    Yang Shi <yang.shi@linux.alibaba.com>:
      mm: filemap: clear idle flag for writes
      mm: filemap: add missing FGP_ flags in kerneldoc comment for pagecache_get_page

Subsystem: mm/gup

    Tang Yizhou <tangyizhou@huawei.com>:
      mm/gup.c: fix the comment of return value for populate_vma_page_range()

Subsystem: mm/swap

    Zhen Lei <thunder.leizhen@huawei.com>:
    Patch series "clean up some functions in mm/swap_slots.c":
      mm/swap_slots.c: simplify alloc_swap_slot_cache()
      mm/swap_slots.c: simplify enable_swap_slots_cache()
      mm/swap_slots.c: remove redundant check for swap_slot_cache_initialized

    Krzysztof Kozlowski <krzk@kernel.org>:
      mm: swap: fix kerneldoc of swap_vma_readahead()

    Xianting Tian <xianting_tian@126.com>:
      mm/page_io.c: use blk_io_schedule() for avoiding task hung in sync io

Subsystem: mm/shmem

    Chris Down <chris@chrisdown.name>:
    Patch series "tmpfs: inode: Reduce risk of inum overflow", v7:
      tmpfs: per-superblock i_ino support
      tmpfs: support 64-bit inums per-sb

Subsystem: mm/memcg

    Roman Gushchin <guro@fb.com>:
      mm: kmem: make memcg_kmem_enabled() irreversible
    Patch series "The new cgroup slab memory controller", v7:
      mm: memcg: factor out memcg- and lruvec-level changes out of __mod_lruvec_state()
      mm: memcg: prepare for byte-sized vmstat items
      mm: memcg: convert vmstat slab counters to bytes
      mm: slub: implement SLUB version of obj_to_index()

    Johannes Weiner <hannes@cmpxchg.org>:
      mm: memcontrol: decouple reference counting from page accounting

    Roman Gushchin <guro@fb.com>:
      mm: memcg/slab: obj_cgroup API
      mm: memcg/slab: allocate obj_cgroups for non-root slab pages
      mm: memcg/slab: save obj_cgroup for non-root slab objects
      mm: memcg/slab: charge individual slab objects instead of pages
      mm: memcg/slab: deprecate memory.kmem.slabinfo
      mm: memcg/slab: move memcg_kmem_bypass() to memcontrol.h
      mm: memcg/slab: use a single set of kmem_caches for all accounted allocations
      mm: memcg/slab: simplify memcg cache creation
      mm: memcg/slab: remove memcg_kmem_get_cache()
      mm: memcg/slab: deprecate slab_root_caches
      mm: memcg/slab: remove redundant check in memcg_accumulate_slabinfo()
      mm: memcg/slab: use a single set of kmem_caches for all allocations
      kselftests: cgroup: add kernel memory accounting tests
      tools/cgroup: add memcg_slabinfo.py tool

    Shakeel Butt <shakeelb@google.com>:
      mm: memcontrol: account kernel stack per node

    Roman Gushchin <guro@fb.com>:
      mm: memcg/slab: remove unused argument by charge_slab_page()
      mm: slab: rename (un)charge_slab_page() to (un)account_slab_page()
      mm: kmem: switch to static_branch_likely() in memcg_kmem_enabled()
      mm: memcontrol: avoid workload stalls when lowering memory.high

    Chris Down <chris@chrisdown.name>:
    Patch series "mm, memcg: reclaim harder before high throttling", v2:
      mm, memcg: reclaim more aggressively before high allocator throttling
      mm, memcg: unify reclaim retry limits with page allocator

    Yafang Shao <laoar.shao@gmail.com>:
    Patch series "mm, memcg: memory.{low,min} reclaim fix & cleanup", v4:
      mm, memcg: avoid stale protection values when cgroup is above protection

    Chris Down <chris@chrisdown.name>:
      mm, memcg: decouple e{low,min} state mutations from protection checks

    Yafang Shao <laoar.shao@gmail.com>:
      memcg, oom: check memcg margin for parallel oom

    Johannes Weiner <hannes@cmpxchg.org>:
      mm: memcontrol: restore proper dirty throttling when memory.high changes
      mm: memcontrol: don't count limit-setting reclaim as memory pressure

    Michal Koutný <mkoutny@suse.com>:
      mm/page_counter.c: fix protection usage propagation

Subsystem: mm/pagemap

    Ralph Campbell <rcampbell@nvidia.com>:
      mm: remove redundant check non_swap_entry()

    Alex Zhang <zhangalex@google.com>:
      mm/memory.c: make remap_pfn_range() reject unaligned addr

    Mike Rapoport <rppt@linux.ibm.com>:
    Patch series "mm: cleanup usage of <asm/pgalloc.h>":
      mm: remove unneeded includes of <asm/pgalloc.h>
      opeinrisc: switch to generic version of pte allocation
      xtensa: switch to generic version of pte allocation
      asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()
      asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one()
      asm-generic: pgalloc: provide generic pgd_free()
      mm: move lib/ioremap.c to mm/

    Joerg Roedel <jroedel@suse.de>:
      mm: move p?d_alloc_track to separate header file

    Zhen Lei <thunder.leizhen@huawei.com>:
      mm/mmap: optimize a branch judgment in ksys_mmap_pgoff()

    Feng Tang <feng.tang@intel.com>:
    Patch series "make vm_committed_as_batch aware of vm overcommit policy", v6:
      proc/meminfo: avoid open coded reading of vm_committed_as
      mm/util.c: make vm_memory_committed() more accurate
      percpu_counter: add percpu_counter_sync()
      mm: adjust vm_committed_as_batch according to vm overcommit policy

    Anshuman Khandual <anshuman.khandual@arm.com>:
    Patch series "arm64: Enable vmemmap mapping from device memory", v4:
      mm/sparsemem: enable vmem_altmap support in vmemmap_populate_basepages()
      mm/sparsemem: enable vmem_altmap support in vmemmap_alloc_block_buf()
      arm64/mm: enable vmem_altmap support for vmemmap mappings

    Miaohe Lin <linmiaohe@huawei.com>:
      mm: mmap: merge vma after call_mmap() if possible

    Peter Collingbourne <pcc@google.com>:
      mm: remove unnecessary wrapper function do_mmap_pgoff()

Subsystem: mm/mremap

    Wei Yang <richard.weiyang@linux.alibaba.com>:
    Patch series "mm/mremap: cleanup move_page_tables() a little", v5:
      mm/mremap: it is sure to have enough space when extent meets requirement
      mm/mremap: calculate extent in one place
      mm/mremap: start addresses are properly aligned

Subsystem: mm/mincore

    Ricardo Cañuelo <ricardo.canuelo@collabora.com>:
      selftests: add mincore() tests

Subsystem: mm/sparsemem

    Wei Yang <richard.weiyang@linux.alibaba.com>:
      mm/sparse: never partially remove memmap for early section
      mm/sparse: only sub-section aligned range would be populated

    Mike Rapoport <rppt@linux.ibm.com>:
      mm/sparse: cleanup the code surrounding memory_present()

Subsystem: mm/vmalloc

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      vmalloc: convert to XArray

    "Uladzislau Rezki (Sony)" <urezki@gmail.com>:
      mm/vmalloc: simplify merge_or_add_vmap_area()
      mm/vmalloc: simplify augment_tree_propagate_check()
      mm/vmalloc: switch to "propagate()" callback
      mm/vmalloc: update the header about KVA rework

    Mike Rapoport <rppt@linux.ibm.com>:
      mm: vmalloc: remove redundant assignment in unmap_kernel_range_noflush()

    "Uladzislau Rezki (Sony)" <urezki@gmail.com>:
      mm/vmalloc.c: remove BUG() from the find_va_links()

Subsystem: mm/kasan

    Marco Elver <elver@google.com>:
      kasan: improve and simplify Kconfig.kasan
      kasan: update required compiler versions in documentation

    Walter Wu <walter-zh.wu@mediatek.com>:
    Patch series "kasan: memorize and print call_rcu stack", v8:
      rcu: kasan: record and print call_rcu() call stack
      kasan: record and print the free track
      kasan: add tests for call_rcu stack recording
      kasan: update documentation for generic kasan

    Vincenzo Frascino <vincenzo.frascino@arm.com>:
      kasan: remove kasan_unpoison_stack_above_sp_to()

    Walter Wu <walter-zh.wu@mediatek.com>:
      lib/test_kasan.c: fix KASAN unit tests for tag-based KASAN

    Andrey Konovalov <andreyknvl@google.com>:
    Patch series "kasan: support stack instrumentation for tag-based mode", v2:
      kasan: don't tag stacks allocated with pagealloc
      efi: provide empty efi_enter_virtual_mode implementation
      kasan, arm64: don't instrument functions that enable kasan
      kasan: allow enabling stack tagging for tag-based mode
      kasan: adjust kasan_stack_oob for tag-based mode

Subsystem: mm/pagealloc

    Vlastimil Babka <vbabka@suse.cz>:
      mm, page_alloc: use unlikely() in task_capc()

    Jaewon Kim <jaewon31.kim@samsung.com>:
      page_alloc: consider highatomic reserve in watermark fast

    Charan Teja Reddy <charante@codeaurora.org>:
      mm, page_alloc: skip ->waternark_boost for atomic order-0 allocations

    David Hildenbrand <david@redhat.com>:
      mm: remove vm_total_pages
      mm/page_alloc: remove nr_free_pagecache_pages()
      mm/memory_hotplug: document why shuffle_zone() is relevant
      mm/shuffle: remove dynamic reconfiguration

    Wei Yang <richard.weiyang@linux.alibaba.com>:
      mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits
      mm/page_alloc.c: extract the common part in pfn_to_bitidx()
      mm/page_alloc.c: simplify pageblock bitmap access
      mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask()

    Qian Cai <cai@lca.pw>:
      mm/page_alloc: silence a KASAN false positive

    Wei Yang <richard.weiyang@linux.alibaba.com>:
      mm/page_alloc: fallbacks at most has 3 elements

    Muchun Song <songmuchun@bytedance.com>:
      mm/page_alloc.c: skip setting nodemask when we are in interrupt

    Joonsoo Kim <iamjoonsoo.kim@lge.com>:
      mm/page_alloc: fix memalloc_nocma_{save/restore} APIs

Subsystem: mm/hugetlb

    "Alexander A. Klimov" <grandmaster@al2klimov.de>:
      mm: thp: replace HTTP links with HTTPS ones

    Peter Xu <peterx@redhat.com>:
      mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible

    Hugh Dickins <hughd@google.com>:
      khugepaged: collapse_pte_mapped_thp() flush the right range
      khugepaged: collapse_pte_mapped_thp() protect the pmd lock
      khugepaged: retract_page_tables() remember to test exit
      khugepaged: khugepaged_test_exit() check mmget_still_valid()

Subsystem: mm/vmscan

    dylan-meiners <spacct.spacct@gmail.com>:
      mm/vmscan.c: fix typo

    Shakeel Butt <shakeelb@google.com>:
      mm: vmscan: consistent update to pgrefill

 Documentation/admin-guide/kernel-parameters.txt        |    2 
 Documentation/dev-tools/kasan.rst                      |   10 
 Documentation/filesystems/dlmfs.rst                    |    2 
 Documentation/filesystems/ocfs2.rst                    |    2 
 Documentation/filesystems/tmpfs.rst                    |   18 
 Documentation/vm/arch_pgtable_helpers.rst              |  258 +++++
 Documentation/vm/memory-model.rst                      |    9 
 Documentation/vm/slub.rst                              |   51 -
 arch/alpha/include/asm/pgalloc.h                       |   21 
 arch/alpha/include/asm/tlbflush.h                      |    1 
 arch/alpha/kernel/core_irongate.c                      |    1 
 arch/alpha/kernel/core_marvel.c                        |    1 
 arch/alpha/kernel/core_titan.c                         |    1 
 arch/alpha/kernel/machvec_impl.h                       |    2 
 arch/alpha/kernel/smp.c                                |    1 
 arch/alpha/mm/numa.c                                   |    1 
 arch/arc/mm/fault.c                                    |    1 
 arch/arc/mm/init.c                                     |    1 
 arch/arm/include/asm/pgalloc.h                         |   12 
 arch/arm/include/asm/tlb.h                             |    1 
 arch/arm/kernel/machine_kexec.c                        |    1 
 arch/arm/kernel/smp.c                                  |    1 
 arch/arm/kernel/suspend.c                              |    1 
 arch/arm/mach-omap2/omap-mpuss-lowpower.c              |    1 
 arch/arm/mm/hugetlbpage.c                              |    1 
 arch/arm/mm/init.c                                     |    9 
 arch/arm/mm/mmu.c                                      |    1 
 arch/arm64/include/asm/pgalloc.h                       |   39 
 arch/arm64/kernel/setup.c                              |    2 
 arch/arm64/kernel/smp.c                                |    1 
 arch/arm64/mm/hugetlbpage.c                            |    1 
 arch/arm64/mm/init.c                                   |    6 
 arch/arm64/mm/ioremap.c                                |    1 
 arch/arm64/mm/mmu.c                                    |   63 -
 arch/csky/include/asm/pgalloc.h                        |    7 
 arch/csky/kernel/smp.c                                 |    1 
 arch/hexagon/include/asm/pgalloc.h                     |    7 
 arch/ia64/include/asm/pgalloc.h                        |   24 
 arch/ia64/include/asm/tlb.h                            |    1 
 arch/ia64/kernel/process.c                             |    1 
 arch/ia64/kernel/smp.c                                 |    1 
 arch/ia64/kernel/smpboot.c                             |    1 
 arch/ia64/mm/contig.c                                  |    1 
 arch/ia64/mm/discontig.c                               |    4 
 arch/ia64/mm/hugetlbpage.c                             |    1 
 arch/ia64/mm/tlb.c                                     |    1 
 arch/m68k/include/asm/mmu_context.h                    |    2 
 arch/m68k/include/asm/sun3_pgalloc.h                   |    7 
 arch/m68k/kernel/dma.c                                 |    2 
 arch/m68k/kernel/traps.c                               |    3 
 arch/m68k/mm/cache.c                                   |    2 
 arch/m68k/mm/fault.c                                   |    1 
 arch/m68k/mm/kmap.c                                    |    2 
 arch/m68k/mm/mcfmmu.c                                  |    1 
 arch/m68k/mm/memory.c                                  |    1 
 arch/m68k/sun3x/dvma.c                                 |    2 
 arch/microblaze/include/asm/pgalloc.h                  |    6 
 arch/microblaze/include/asm/tlbflush.h                 |    1 
 arch/microblaze/kernel/process.c                       |    1 
 arch/microblaze/kernel/signal.c                        |    1 
 arch/microblaze/mm/init.c                              |    3 
 arch/mips/include/asm/pgalloc.h                        |   19 
 arch/mips/kernel/setup.c                               |    8 
 arch/mips/loongson64/numa.c                            |    1 
 arch/mips/sgi-ip27/ip27-memory.c                       |    2 
 arch/mips/sgi-ip32/ip32-memory.c                       |    1 
 arch/nds32/mm/mm-nds32.c                               |    2 
 arch/nios2/include/asm/pgalloc.h                       |    7 
 arch/openrisc/include/asm/pgalloc.h                    |   33 
 arch/openrisc/include/asm/tlbflush.h                   |    1 
 arch/openrisc/kernel/or32_ksyms.c                      |    1 
 arch/parisc/include/asm/mmu_context.h                  |    1 
 arch/parisc/include/asm/pgalloc.h                      |   12 
 arch/parisc/kernel/cache.c                             |    1 
 arch/parisc/kernel/pci-dma.c                           |    1 
 arch/parisc/kernel/process.c                           |    1 
 arch/parisc/kernel/signal.c                            |    1 
 arch/parisc/kernel/smp.c                               |    1 
 arch/parisc/mm/hugetlbpage.c                           |    1 
 arch/parisc/mm/init.c                                  |    5 
 arch/parisc/mm/ioremap.c                               |    2 
 arch/powerpc/include/asm/tlb.h                         |    1 
 arch/powerpc/mm/book3s64/hash_hugetlbpage.c            |    1 
 arch/powerpc/mm/book3s64/hash_pgtable.c                |    1 
 arch/powerpc/mm/book3s64/hash_tlb.c                    |    1 
 arch/powerpc/mm/book3s64/radix_hugetlbpage.c           |    1 
 arch/powerpc/mm/init_32.c                              |    1 
 arch/powerpc/mm/init_64.c                              |    4 
 arch/powerpc/mm/kasan/8xx.c                            |    1 
 arch/powerpc/mm/kasan/book3s_32.c                      |    1 
 arch/powerpc/mm/mem.c                                  |    3 
 arch/powerpc/mm/nohash/40x.c                           |    1 
 arch/powerpc/mm/nohash/8xx.c                           |    1 
 arch/powerpc/mm/nohash/fsl_booke.c                     |    1 
 arch/powerpc/mm/nohash/kaslr_booke.c                   |    1 
 arch/powerpc/mm/nohash/tlb.c                           |    1 
 arch/powerpc/mm/numa.c                                 |    1 
 arch/powerpc/mm/pgtable.c                              |    1 
 arch/powerpc/mm/pgtable_64.c                           |    1 
 arch/powerpc/mm/ptdump/hashpagetable.c                 |    2 
 arch/powerpc/mm/ptdump/ptdump.c                        |    1 
 arch/powerpc/platforms/pseries/cmm.c                   |    1 
 arch/riscv/include/asm/pgalloc.h                       |   18 
 arch/riscv/mm/fault.c                                  |    1 
 arch/riscv/mm/init.c                                   |    3 
 arch/s390/crypto/prng.c                                |    4 
 arch/s390/include/asm/tlb.h                            |    1 
 arch/s390/include/asm/tlbflush.h                       |    1 
 arch/s390/kernel/machine_kexec.c                       |    1 
 arch/s390/kernel/ptrace.c                              |    1 
 arch/s390/kvm/diag.c                                   |    1 
 arch/s390/kvm/priv.c                                   |    1 
 arch/s390/kvm/pv.c                                     |    1 
 arch/s390/mm/cmm.c                                     |    1 
 arch/s390/mm/init.c                                    |    1 
 arch/s390/mm/mmap.c                                    |    1 
 arch/s390/mm/pgtable.c                                 |    1 
 arch/sh/include/asm/pgalloc.h                          |    4 
 arch/sh/kernel/idle.c                                  |    1 
 arch/sh/kernel/machine_kexec.c                         |    1 
 arch/sh/mm/cache-sh3.c                                 |    1 
 arch/sh/mm/cache-sh7705.c                              |    1 
 arch/sh/mm/hugetlbpage.c                               |    1 
 arch/sh/mm/init.c                                      |    7 
 arch/sh/mm/ioremap_fixed.c                             |    1 
 arch/sh/mm/numa.c                                      |    3 
 arch/sh/mm/tlb-sh3.c                                   |    1 
 arch/sparc/include/asm/ide.h                           |    1 
 arch/sparc/include/asm/tlb_64.h                        |    1 
 arch/sparc/kernel/leon_smp.c                           |    1 
 arch/sparc/kernel/process_32.c                         |    1 
 arch/sparc/kernel/signal_32.c                          |    1 
 arch/sparc/kernel/smp_32.c                             |    1 
 arch/sparc/kernel/smp_64.c                             |    1 
 arch/sparc/kernel/sun4m_irq.c                          |    1 
 arch/sparc/mm/highmem.c                                |    1 
 arch/sparc/mm/init_64.c                                |    1 
 arch/sparc/mm/io-unit.c                                |    1 
 arch/sparc/mm/iommu.c                                  |    1 
 arch/sparc/mm/tlb.c                                    |    1 
 arch/um/include/asm/pgalloc.h                          |    9 
 arch/um/include/asm/pgtable-3level.h                   |    3 
 arch/um/kernel/mem.c                                   |   17 
 arch/x86/ia32/ia32_aout.c                              |    1 
 arch/x86/include/asm/mmu_context.h                     |    1 
 arch/x86/include/asm/pgalloc.h                         |   42 
 arch/x86/kernel/alternative.c                          |    1 
 arch/x86/kernel/apic/apic.c                            |    1 
 arch/x86/kernel/mpparse.c                              |    1 
 arch/x86/kernel/traps.c                                |    1 
 arch/x86/mm/fault.c                                    |    1 
 arch/x86/mm/hugetlbpage.c                              |    1 
 arch/x86/mm/init_32.c                                  |    2 
 arch/x86/mm/init_64.c                                  |   12 
 arch/x86/mm/kaslr.c                                    |    1 
 arch/x86/mm/pgtable_32.c                               |    1 
 arch/x86/mm/pti.c                                      |    1 
 arch/x86/platform/uv/bios_uv.c                         |    1 
 arch/x86/power/hibernate.c                             |    2 
 arch/xtensa/include/asm/pgalloc.h                      |   46 
 arch/xtensa/kernel/xtensa_ksyms.c                      |    1 
 arch/xtensa/mm/cache.c                                 |    1 
 arch/xtensa/mm/fault.c                                 |    1 
 crypto/adiantum.c                                      |    2 
 crypto/ahash.c                                         |    4 
 crypto/api.c                                           |    2 
 crypto/asymmetric_keys/verify_pefile.c                 |    4 
 crypto/deflate.c                                       |    2 
 crypto/drbg.c                                          |   10 
 crypto/ecc.c                                           |    8 
 crypto/ecdh.c                                          |    2 
 crypto/gcm.c                                           |    2 
 crypto/gf128mul.c                                      |    4 
 crypto/jitterentropy-kcapi.c                           |    2 
 crypto/rng.c                                           |    2 
 crypto/rsa-pkcs1pad.c                                  |    6 
 crypto/seqiv.c                                         |    2 
 crypto/shash.c                                         |    2 
 crypto/skcipher.c                                      |    2 
 crypto/testmgr.c                                       |    6 
 crypto/zstd.c                                          |    2 
 drivers/base/node.c                                    |   10 
 drivers/block/xen-blkback/common.h                     |    1 
 drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c    |    2 
 drivers/crypto/allwinner/sun8i-ss/sun8i-ss-cipher.c    |    2 
 drivers/crypto/amlogic/amlogic-gxl-cipher.c            |    4 
 drivers/crypto/atmel-ecc.c                             |    2 
 drivers/crypto/caam/caampkc.c                          |   28 
 drivers/crypto/cavium/cpt/cptvf_main.c                 |    6 
 drivers/crypto/cavium/cpt/cptvf_reqmanager.c           |   12 
 drivers/crypto/cavium/nitrox/nitrox_lib.c              |    4 
 drivers/crypto/cavium/zip/zip_crypto.c                 |    6 
 drivers/crypto/ccp/ccp-crypto-rsa.c                    |    6 
 drivers/crypto/ccree/cc_aead.c                         |    4 
 drivers/crypto/ccree/cc_buffer_mgr.c                   |    4 
 drivers/crypto/ccree/cc_cipher.c                       |    6 
 drivers/crypto/ccree/cc_hash.c                         |    8 
 drivers/crypto/ccree/cc_request_mgr.c                  |    2 
 drivers/crypto/marvell/cesa/hash.c                     |    2 
 drivers/crypto/marvell/octeontx/otx_cptvf_main.c       |    6 
 drivers/crypto/marvell/octeontx/otx_cptvf_reqmgr.h     |    2 
 drivers/crypto/nx/nx.c                                 |    4 
 drivers/crypto/virtio/virtio_crypto_algs.c             |   12 
 drivers/crypto/virtio/virtio_crypto_core.c             |    2 
 drivers/iommu/ipmmu-vmsa.c                             |    1 
 drivers/md/dm-crypt.c                                  |   32 
 drivers/md/dm-integrity.c                              |    6 
 drivers/misc/ibmvmc.c                                  |    6 
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c |    2 
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c         |    6 
 drivers/net/ppp/ppp_mppe.c                             |    6 
 drivers/net/wireguard/noise.c                          |    4 
 drivers/net/wireguard/peer.c                           |    2 
 drivers/net/wireless/intel/iwlwifi/pcie/rx.c           |    2 
 drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c      |    6 
 drivers/net/wireless/intel/iwlwifi/pcie/tx.c           |    6 
 drivers/net/wireless/intersil/orinoco/wext.c           |    4 
 drivers/s390/crypto/ap_bus.h                           |    4 
 drivers/staging/ks7010/ks_hostif.c                     |    2 
 drivers/staging/rtl8723bs/core/rtw_security.c          |    2 
 drivers/staging/wlan-ng/p80211netdev.c                 |    2 
 drivers/target/iscsi/iscsi_target_auth.c               |    2 
 drivers/xen/balloon.c                                  |    1 
 drivers/xen/privcmd.c                                  |    1 
 fs/Kconfig                                             |   21 
 fs/aio.c                                               |    6 
 fs/binfmt_elf_fdpic.c                                  |    1 
 fs/cifs/cifsencrypt.c                                  |    2 
 fs/cifs/connect.c                                      |   10 
 fs/cifs/dfs_cache.c                                    |    2 
 fs/cifs/misc.c                                         |    8 
 fs/crypto/inline_crypt.c                               |    5 
 fs/crypto/keyring.c                                    |    6 
 fs/crypto/keysetup_v1.c                                |    4 
 fs/ecryptfs/keystore.c                                 |    4 
 fs/ecryptfs/messaging.c                                |    2 
 fs/hugetlbfs/inode.c                                   |    2 
 fs/ntfs/dir.c                                          |    2 
 fs/ntfs/inode.c                                        |   27 
 fs/ntfs/inode.h                                        |    4 
 fs/ntfs/mft.c                                          |    4 
 fs/ocfs2/Kconfig                                       |    6 
 fs/ocfs2/acl.c                                         |    2 
 fs/ocfs2/blockcheck.c                                  |    2 
 fs/ocfs2/dlmglue.c                                     |    8 
 fs/ocfs2/ocfs2.h                                       |    4 
 fs/ocfs2/suballoc.c                                    |    4 
 fs/ocfs2/suballoc.h                                    |    2 
 fs/ocfs2/super.c                                       |    4 
 fs/proc/meminfo.c                                      |   10 
 include/asm-generic/pgalloc.h                          |   80 +
 include/asm-generic/tlb.h                              |    1 
 include/crypto/aead.h                                  |    2 
 include/crypto/akcipher.h                              |    2 
 include/crypto/gf128mul.h                              |    2 
 include/crypto/hash.h                                  |    2 
 include/crypto/internal/acompress.h                    |    2 
 include/crypto/kpp.h                                   |    2 
 include/crypto/skcipher.h                              |    2 
 include/linux/efi.h                                    |    4 
 include/linux/fs.h                                     |   17 
 include/linux/huge_mm.h                                |    2 
 include/linux/kasan.h                                  |    4 
 include/linux/memcontrol.h                             |  209 +++-
 include/linux/mm.h                                     |   86 -
 include/linux/mm_types.h                               |    5 
 include/linux/mman.h                                   |    4 
 include/linux/mmu_notifier.h                           |   13 
 include/linux/mmzone.h                                 |   54 -
 include/linux/pageblock-flags.h                        |   30 
 include/linux/percpu_counter.h                         |    4 
 include/linux/sched/mm.h                               |    8 
 include/linux/shmem_fs.h                               |    3 
 include/linux/slab.h                                   |   11 
 include/linux/slab_def.h                               |    9 
 include/linux/slub_def.h                               |   31 
 include/linux/swap.h                                   |    2 
 include/linux/vmstat.h                                 |   14 
 init/Kconfig                                           |    9 
 init/main.c                                            |    2 
 ipc/shm.c                                              |    2 
 kernel/fork.c                                          |   54 -
 kernel/kthread.c                                       |    8 
 kernel/power/snapshot.c                                |    2 
 kernel/rcu/tree.c                                      |    2 
 kernel/scs.c                                           |    2 
 kernel/sysctl.c                                        |    2 
 lib/Kconfig.kasan                                      |   39 
 lib/Makefile                                           |    1 
 lib/ioremap.c                                          |  287 -----
 lib/mpi/mpiutil.c                                      |    6 
 lib/percpu_counter.c                                   |   19 
 lib/test_kasan.c                                       |   87 +
 mm/Kconfig                                             |    6 
 mm/Makefile                                            |    2 
 mm/debug.c                                             |  103 +-
 mm/debug_vm_pgtable.c                                  |  666 +++++++++++++
 mm/filemap.c                                           |    9 
 mm/gup.c                                               |    3 
 mm/huge_memory.c                                       |   14 
 mm/hugetlb.c                                           |   25 
 mm/ioremap.c                                           |  289 +++++
 mm/kasan/common.c                                      |   41 
 mm/kasan/generic.c                                     |   43 
 mm/kasan/generic_report.c                              |    1 
 mm/kasan/kasan.h                                       |   25 
 mm/kasan/quarantine.c                                  |    1 
 mm/kasan/report.c                                      |   54 -
 mm/kasan/tags.c                                        |   37 
 mm/khugepaged.c                                        |   75 -
 mm/memcontrol.c                                        |  832 ++++++++++-------
 mm/memory.c                                            |   15 
 mm/memory_hotplug.c                                    |   11 
 mm/migrate.c                                           |    6 
 mm/mm_init.c                                           |   20 
 mm/mmap.c                                              |   45 
 mm/mremap.c                                            |   19 
 mm/nommu.c                                             |    6 
 mm/oom_kill.c                                          |    2 
 mm/page-writeback.c                                    |    6 
 mm/page_alloc.c                                        |  226 ++--
 mm/page_counter.c                                      |    6 
 mm/page_io.c                                           |    2 
 mm/pgalloc-track.h                                     |   51 +
 mm/shmem.c                                             |  133 ++
 mm/shuffle.c                                           |   46 
 mm/shuffle.h                                           |   17 
 mm/slab.c                                              |  129 +-
 mm/slab.h                                              |  755 ++++++---------
 mm/slab_common.c                                       |  829 ++--------------
 mm/slob.c                                              |   12 
 mm/slub.c                                              |  680 ++++---------
 mm/sparse-vmemmap.c                                    |   62 -
 mm/sparse.c                                            |   31 
 mm/swap_slots.c                                        |   45 
 mm/swap_state.c                                        |    2 
 mm/util.c                                              |   52 +
 mm/vmalloc.c                                           |  176 +--
 mm/vmscan.c                                            |   39 
 mm/vmstat.c                                            |   38 
 mm/workingset.c                                        |    6 
 net/atm/mpoa_caches.c                                  |    4 
 net/bluetooth/ecdh_helper.c                            |    6 
 net/bluetooth/smp.c                                    |   24 
 net/core/sock.c                                        |    2 
 net/ipv4/tcp_fastopen.c                                |    2 
 net/mac80211/aead_api.c                                |    4 
 net/mac80211/aes_gmac.c                                |    2 
 net/mac80211/key.c                                     |    2 
 net/mac802154/llsec.c                                  |   20 
 net/sctp/auth.c                                        |    2 
 net/sunrpc/auth_gss/gss_krb5_crypto.c                  |    4 
 net/sunrpc/auth_gss/gss_krb5_keys.c                    |    6 
 net/sunrpc/auth_gss/gss_krb5_mech.c                    |    2 
 net/tipc/crypto.c                                      |   10 
 net/wireless/core.c                                    |    2 
 net/wireless/ibss.c                                    |    4 
 net/wireless/lib80211_crypt_tkip.c                     |    2 
 net/wireless/lib80211_crypt_wep.c                      |    2 
 net/wireless/nl80211.c                                 |   24 
 net/wireless/sme.c                                     |    6 
 net/wireless/util.c                                    |    2 
 net/wireless/wext-sme.c                                |    2 
 scripts/Makefile.kasan                                 |    3 
 scripts/bloat-o-meter                                  |    2 
 scripts/coccinelle/free/devm_free.cocci                |    4 
 scripts/coccinelle/free/ifnullfree.cocci               |    4 
 scripts/coccinelle/free/kfree.cocci                    |    6 
 scripts/coccinelle/free/kfreeaddr.cocci                |    2 
 scripts/const_structs.checkpatch                       |    1 
 scripts/decode_stacktrace.sh                           |   85 +
 scripts/spelling.txt                                   |   19 
 scripts/tags.sh                                        |   18 
 security/apparmor/domain.c                             |    4 
 security/apparmor/include/file.h                       |    2 
 security/apparmor/policy.c                             |   24 
 security/apparmor/policy_ns.c                          |    6 
 security/apparmor/policy_unpack.c                      |   14 
 security/keys/big_key.c                                |    6 
 security/keys/dh.c                                     |   14 
 security/keys/encrypted-keys/encrypted.c               |   14 
 security/keys/trusted-keys/trusted_tpm1.c              |   34 
 security/keys/user_defined.c                           |    6 
 tools/cgroup/memcg_slabinfo.py                         |  226 ++++
 tools/include/linux/jhash.h                            |    2 
 tools/lib/rbtree.c                                     |    2 
 tools/lib/traceevent/event-parse.h                     |    2 
 tools/testing/ktest/examples/README                    |    2 
 tools/testing/ktest/examples/crosstests.conf           |    2 
 tools/testing/selftests/Makefile                       |    1 
 tools/testing/selftests/cgroup/.gitignore              |    1 
 tools/testing/selftests/cgroup/Makefile                |    2 
 tools/testing/selftests/cgroup/cgroup_util.c           |    2 
 tools/testing/selftests/cgroup/test_kmem.c             |  382 +++++++
 tools/testing/selftests/mincore/.gitignore             |    2 
 tools/testing/selftests/mincore/Makefile               |    6 
 tools/testing/selftests/mincore/mincore_selftest.c     |  361 +++++++
 397 files changed, 5547 insertions(+), 4072 deletions(-)



* incoming
@ 2020-07-24  4:14 Andrew Morton
  0 siblings, 0 replies; 263+ messages in thread
From: Andrew Morton @ 2020-07-24  4:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm


15 patches, based on f37e99aca03f63aa3f2bd13ceaf769455d12c4b0.

Subsystems affected by this patch series:

  mm/pagemap
  mm/shmem
  mm/hotfixes
  mm/memcg
  mm/hugetlb
  mailmap
  squashfs
  scripts
  io-mapping
  MAINTAINERS
  gdb

Subsystem: mm/pagemap

    Yang Shi <yang.shi@linux.alibaba.com>:
      mm/memory.c: avoid access flag update TLB flush for retried page fault

    "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>:
      mm/mmap.c: close race between munmap() and expand_upwards()/downwards()

Subsystem: mm/shmem

    Chengguang Xu <cgxu519@mykernel.net>:
      vfs/xattr: mm/shmem: kernfs: release simple xattr entry in a right way

Subsystem: mm/hotfixes

    Tom Rix <trix@redhat.com>:
      mm: initialize return of vm_insert_pages

    Bhupesh Sharma <bhsharma@redhat.com>:
      mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages()

Subsystem: mm/memcg

    Hugh Dickins <hughd@google.com>:
      mm/memcg: fix refcount error while moving and swapping

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcg/slab: fix memory leak at non-root kmem_cache destroy

Subsystem: mm/hugetlb

    Barry Song <song.bao.hua@hisilicon.com>:
      mm/hugetlb: avoid hardcoding while checking if cma is enabled

    "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>:
      khugepaged: fix null-pointer dereference due to race

Subsystem: mailmap

    Mike Rapoport <rppt@linux.ibm.com>:
      mailmap: add entry for Mike Rapoport

Subsystem: squashfs

    Phillip Lougher <phillip@squashfs.org.uk>:
      squashfs: fix length field overlap check in metadata reading

Subsystem: scripts

    Pi-Hsun Shih <pihsun@chromium.